Context Engineering - @jackzhangpython

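<p align="right"><font color="#3f3f3f">July 2, 2025</font></p>

> Original post: [Context Engineering for Agents](https://blog.langchain.com/context-engineering-for-agents/)

## Summary

Agents need context to perform tasks. Context engineering is the art and science of filling the context window with just the right information at each step of an agent's trajectory. In this post, we break down some common context engineering strategies (write, select, compress, and isolate) by reviewing various popular agents and papers. We then explain how LangGraph is designed to support them! Also, see our video on context engineering [here](https://youtu.be/4GiqzUHD5AA?ref=blog.langchain.com).

## Context engineering

**[Key note]: This analogy matters. It likens the LLM's context window to a computer's RAM, which gives an intuitive frame for understanding why context engineering is important.**

As Andrej Karpathy puts it, LLMs are like a [new kind of operating system](https://www.youtube.com/watch?si=-aKY-x57ILAmWTdw&t=620&v=LCEmiRjPEtQ&feature=youtu.be&ref=blog.langchain.com). The LLM is like the CPU, and its [context window](https://docs.anthropic.com/en/docs/build-with-claude/context-windows?ref=blog.langchain.com) is like the RAM, serving as the model's working memory. Just like RAM, the LLM context window has limited [capacity](https://lilianweng.github.io/posts/2023-06-23-agent/?ref=blog.langchain.com) to handle various sources of context. And just as an operating system curates what fits into a CPU's RAM, we can think of "context engineering" as playing a similar role. [Karpathy summarizes this well](https://x.com/karpathy/status/1937902205765607626?ref=blog.langchain.com): context engineering is the "...delicate art and science of filling the context window with just the right information for the next step."

What types of context do we need to manage when building LLM applications? Context engineering is an [umbrella](https://x.com/dexhorthy/status/1933283008863482067?ref=blog.langchain.com) that applies across a few different context types:

- **Instructions** – prompts, memories, few-shot examples, tool descriptions, etc.
- **Knowledge** – facts, memories, etc.
- **Tools** – feedback from tool calls

## Context engineering for agents

**[Key note]: This passage identifies the core challenge agents face: long-running tasks accumulate large numbers of tokens, which causes several kinds of performance problems. This is the root reason context engineering becomes critical.**
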
This year, interest in [agents](https://www.anthropic.com/engineering/building-effective-agents?ref=blog.langchain.com) has grown tremendously as LLMs get better at [reasoning](https://platform.openai.com/docs/guides/reasoning?api-mode=responses&ref=blog.langchain.com) and [tool calling](https://www.anthropic.com/engineering/building-effective-agents?ref=blog.langchain.com). Agents interleave [LLM calls and tool calls](https://www.anthropic.com/engineering/building-effective-agents?ref=blog.langchain.com), often for [long-running tasks](https://blog.langchain.com/introducing-ambient-agents/), and use tool feedback to decide the next step.

However, long-running tasks and the accumulated feedback from tool calls mean that agents often use a large number of tokens. This can cause several problems: the agent can [exceed the size of the context window](https://cognition.ai/blog/kevin-32b?ref=blog.langchain.com), cost and latency can balloon, or agent performance can degrade. Drew Breunig [nicely outlined](https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html?ref=blog.langchain.com) specific ways that longer context can cause performance problems, including:

- [Context poisoning: when a hallucination makes it into the context](https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html?ref=blog.langchain.com#context-poisoning)
- [Context distraction: when the context overwhelms the training](https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html?ref=blog.langchain.com#context-distraction)
- [Context confusion: when superfluous context influences the response](https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html?ref=blog.langchain.com#context-confusion)
- [Context clash: when parts of the context disagree](https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html?ref=blog.langchain.com#context-clash)

**[Key note]: These four context failure modes are central to understanding why context engineering is necessary. Each represents a different kind of performance degradation.**

With this in mind, [Cognition](https://cognition.ai/blog/dont-build-multi-agents?ref=blog.langchain.com) called out the importance of context engineering:

> "Context engineering" ... is effectively the #1 job of engineers building AI agents.

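
To make the token-accumulation problem described above concrete, here is a minimal sketch of how tool feedback piles up across turns. It is only an illustration: the 4-characters-per-token estimate, the tiny `CONTEXT_WINDOW_TOKENS` limit, and the function names are assumptions, not part of any real tokenizer or agent framework.

```python
# Hypothetical sketch: how accumulated tool feedback grows the context.
CONTEXT_WINDOW_TOKENS = 200  # assumed tiny limit so the overflow is visible


def estimate_tokens(text: str) -> int:
    """Rough heuristic (an assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def run_turns(
    tool_outputs: list[str], limit: int = CONTEXT_WINDOW_TOKENS
) -> list[tuple[int, int, bool]]:
    """Append each tool output to the context and record
    (turn number, estimated total tokens, whether the limit is exceeded)."""
    context: list[str] = []
    history: list[tuple[int, int, bool]] = []
    for turn, output in enumerate(tool_outputs, start=1):
        context.append(output)  # nothing is ever removed from the context
        total = sum(estimate_tokens(chunk) for chunk in context)
        history.append((turn, total, total > limit))
    return history


if __name__ == "__main__":
    # Five identical fake tool results of 280 characters each
    fake_tool_outputs = ["search result " * 20] * 5
    for turn, total, over in run_turns(fake_tool_outputs):
        print(f"turn {turn}: ~{total} tokens, over limit: {over}")
```

Because nothing is removed from the context, the estimated total grows linearly with the number of turns and eventually crosses the limit.
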
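[Anthropic](https://www.anthropic.com/engineering/built-multi-agent-research-system?ref=blog.langchain.com) also laid out the need clearly:

> Agents often engage in conversations spanning hundreds of turns, requiring careful context management strategies.

So, how are people tackling this challenge today? We group common strategies for agent context engineering into four buckets (write, select, compress, and isolate) and give examples of each by reviewing some popular agent products and papers. We then explain how LangGraph is designed to support them!

**[Key note]: The following four strategies form the core framework of this article. Each represents a different approach to managing context.**

## Write context

Writing context means saving it outside the context window to help an agent perform a task.

### Scratchpads

When humans solve tasks, we take notes and remember things for future, related tasks. Agents are gaining these capabilities too! Note-taking via a "[scratchpad](https://www.anthropic.com/engineering/claude-think-tool?ref=blog.langchain.com)" is one approach to persisting information while an agent performs a task. The idea is to save information outside the context window so that it remains available to the agent. [Anthropic's multi-agent researcher](https://www.anthropic.com/engineering/built-multi-agent-research-system?ref=blog.langchain.com) shows a clear example:

> The lead researcher begins by thinking through the approach and saving its plan to memory to persist the context, since if the context window exceeds 200,000 tokens it will be truncated and it is important to retain the plan.

Scratchpads can be implemented in a few different ways. They can be a [tool call](https://www.anthropic.com/engineering/claude-think-tool?ref=blog.langchain.com) that simply [writes to a file](https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem?ref=blog.langchain.com). They can also be a field in a runtime [state object](https://langchain-ai.github.io/langgraph/concepts/low_level/?ref=blog.langchain.com#state) that persists during the session. In either case, scratchpads let agents save useful information to help them accomplish a task.

### Memories

Scratchpads help agents solve a task within a given session (or [thread](https://langchain-ai.github.io/langgraph/concepts/persistence/?ref=blog.langchain.com#threads)), but sometimes agents benefit from remembering things across many sessions! [Reflexion](https://arxiv.org/abs/2303.11366?ref=blog.langchain.com) introduced the idea of reflecting after each agent turn and reusing these self-generated memories. [Generative Agents](https://ar5iv.labs.arxiv.org/html/2304.03442?ref=blog.langchain.com) periodically synthesized memories from collections of past agent feedback. These concepts made their way into popular products such as [ChatGPT](https://help.openai.com/en/articles/8590148-memory-faq?ref=blog.langchain.com), [Cursor](https://forum.cursor.com/t/0-51-memories-feature/98509?ref=blog.langchain.com), and [Windsurf](https://docs.windsurf.com/windsurf/cascade/memories?ref=blog.langchain.com), which all have mechanisms to auto-generate long-term memories that persist across sessions based on user-agent interactions.

## Select context

Selecting context means pulling it into the context window to help an agent perform a task.

### Scratchpad

The mechanism for selecting context from a scratchpad depends on how the scratchpad is implemented. If it is a [tool](https://www.anthropic.com/engineering/claude-think-tool?ref=blog.langchain.com), the agent can simply read it by making a tool call. If it is part of the agent's runtime state, the developer can choose which parts of the state to expose to the agent at each step. This gives fine-grained control over which scratchpad context is exposed to the LLM in later turns.

### Memories

If agents have the ability to save memories, they also need the ability to select memories relevant to the task at hand. This can be useful for a few reasons. Agents might select few-shot examples ([episodic](https://langchain-ai.github.io/langgraph/concepts/memory/?ref=blog.langchain.com#memory-types) [memories](https://arxiv.org/pdf/2309.02427?ref=blog.langchain.com)) as examples of desired behavior, instructions ([procedural](https://langchain-ai.github.io/langgraph/concepts/memory/?ref=blog.langchain.com#memory-types) [memories](https://arxiv.org/pdf/2309.02427?ref=blog.langchain.com)) to steer behavior, or facts ([semantic](https://langchain-ai.github.io/langgraph/concepts/memory/?ref=blog.langchain.com#memory-types) [memories](https://arxiv.org/pdf/2309.02427?ref=blog.langchain.com)) as task-relevant context.

One challenge is ensuring that relevant memories are selected. Some popular agents simply use a narrow set of files that are always pulled into context. For example, many code agents use specific files to save instructions ("procedural" memories) or, in some cases, examples ("episodic" memories). Claude Code uses [CLAUDE.md](http://claude.md/?ref=blog.langchain.com). [Cursor](https://docs.cursor.com/context/rules?ref=blog.langchain.com) and [Windsurf](https://windsurf.com/editor/directory?ref=blog.langchain.com) use rules files.

But if an agent stores a larger [collection](https://langchain-ai.github.io/langgraph/concepts/memory/?ref=blog.langchain.com#collection) of facts and/or relationships (e.g., [semantic](https://langchain-ai.github.io/langgraph/concepts/memory/?ref=blog.langchain.com#memory-types) memories), selection is harder. [ChatGPT](https://help.openai.com/en/articles/8590148-memory-faq?ref=blog.langchain.com) is a good example of a popular product that stores and selects from a large collection of user-specific memories.

Embeddings and/or [knowledge](https://arxiv.org/html/2501.13956v1?ref=blog.langchain.com#:~:text=In%20Zep%2C%20memory%20is%20powered,subgraph%2C%20and%20a%20community%20subgraph) [graphs](https://neo4j.com/blog/developer/graphiti-knowledge-graph-memory/?ref=blog.langchain.com#:~:text=changes%20since%20updates%20can%20trigger,and%20holistic%20memory%20for%20agentic) for memory indexing are commonly used to assist with selection. Still, memory selection is challenging. At the AI Engineer World's Fair, [Simon Willison shared](https://simonwillison.net/2025/Jun/6/six-months-in-llms/?ref=blog.langchain.com) an example of selection gone wrong: ChatGPT fetched his location from memories and unexpectedly injected it into a requested image. This kind of unexpected or undesired memory retrieval can make some users feel like the context window "no longer belongs to them"!

### Tools

Agents use tools, but they can become overloaded if given too many. This is often because tool descriptions overlap, leaving the model confused about which tool to use. One approach is to [apply RAG (retrieval-augmented generation) to tool descriptions](https://arxiv.org/abs/2410.14594?ref=blog.langchain.com) so that only the most relevant tools for a task are fetched. Some [recent papers](https://arxiv.org/abs/2505.03275?ref=blog.langchain.com) show that this can improve tool selection accuracy 3-fold.

### Knowledge

**[Key note]: This passage highlights the importance and complexity of RAG in code agents, especially the challenges of working with large codebases.**

[RAG](https://github.com/langchain-ai/rag-from-scratch?ref=blog.langchain.com) is a rich topic, and it [can be a central context engineering challenge](https://x.com/_mohansolo/status/1899630246862966837?ref=blog.langchain.com). Code agents are some of the best examples of RAG in large-scale production. Varun from Windsurf captures some of these challenges well:

> Indexing code ≠ context retrieval ... We are doing indexing and embedding search ... [with] AST parsing code and chunking along semantically meaningful boundaries ... embedding search becomes unreliable as a retrieval heuristic as the size of the codebase grows ... we must rely on a combination of techniques like grep/file search, knowledge graph based retrieval, and ... a re-ranking step where [context] is ranked in order of relevance.

## Compress context

Compressing context involves retaining only the tokens required to perform a task.

### Context summarization

Agent interactions can span [hundreds of turns](https://www.anthropic.com/engineering/built-multi-agent-research-system?ref=blog.langchain.com) and use token-heavy tool calls. Summarization is one common way to manage these challenges. If you have used Claude Code, you have seen this in action. Claude Code runs "[auto-compact](https://docs.anthropic.com/en/docs/claude-code/costs?ref=blog.langchain.com)" after you exceed 95% of the context window, and it summarizes the full trajectory of user-agent interactions. This kind of compression across an [agent trajectory](https://langchain-ai.github.io/langgraph/concepts/memory/?ref=blog.langchain.com#manage-short-term-memory) can use various strategies, such as [recursive](https://arxiv.org/pdf/2308.15022?ref=blog.langchain.com#:~:text=the%20retrieved%20utterances%20capture%20the,based%203) or [hierarchical](https://alignment.anthropic.com/2025/summarization-for-monitoring/?ref=blog.langchain.com#:~:text=We%20addressed%20these%20issues%20by,of%20our%20computer%20use%20capability) summarization.

It can also be useful to [add summarization](https://github.com/langchain-ai/open_deep_research/blob/e5a5160a398a3699857d00d8569cb7fd0ac48a4f/src/open_deep_research/utils.py?ref=blog.langchain.com#L1407) at specific points in an agent's design. For example, it can be used to post-process certain tool calls (e.g., token-heavy search tools). As a second example, [Cognition](https://cognition.ai/blog/dont-build-multi-agents?ref=blog.langchain.com#a-theory-of-building-long-running-agents) mentioned summarizing at agent-agent boundaries to reduce tokens during knowledge hand-off. Summarization can be a challenge if specific events or decisions need to be captured. [Cognition](https://cognition.ai/blog/dont-build-multi-agents?ref=blog.langchain.com#a-theory-of-building-long-running-agents) uses a fine-tuned model for this, which underscores how much work can go into this step.

### Context trimming

Whereas summarization typically uses an LLM to distill the most relevant pieces of context, trimming can often filter or, as Drew Breunig points out, "[prune](https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.html?ref=blog.langchain.com)" context. This can use hard-coded heuristics, such as removing [older messages](https://python.langchain.com/docs/how_to/trim_messages/?ref=blog.langchain.com) from a list. Drew also mentions [Provence](https://arxiv.org/abs/2501.16214?ref=blog.langchain.com), a trained context pruner for question answering.

## Isolate context

Isolating context involves splitting it up to help an agent perform a task.

### Multi-agent

**[Key note]: Multi-agent setups are the most popular way to isolate context, but they bring challenges such as increased token usage. This section shows the trade-off between performance gains and resource consumption.**

One of the most popular ways to isolate context is to split it across sub-agents. A motivation for the OpenAI [Swarm](https://github.com/openai/swarm?ref=blog.langchain.com) library was [separation of concerns](https://openai.github.io/openai-agents-python/ref/agent/?ref=blog.langchain.com), where a team of agents can handle specific sub-tasks. Each agent has a specific set of tools, its own instructions, and its own context window.
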
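
The idea that each agent has its own tools, instructions, and context window can be sketched in plain Python. This is a hedged illustration, not the Swarm library's API or anyone's actual implementation: `SubAgent`, `orchestrate`, and the tool names are hypothetical, and the LLM call is replaced by a stub.

```python
from dataclasses import dataclass, field


@dataclass
class SubAgent:
    """Hypothetical sub-agent with its own instructions, tools, and context."""

    name: str
    instructions: str
    tools: list[str]
    messages: list[str] = field(default_factory=list)  # private context window

    def run(self, task: str) -> str:
        # A real agent would call an LLM here; this stub only records the task.
        self.messages.append(f"task: {task}")
        result = f"{self.name} finished '{task}' using {', '.join(self.tools)}"
        self.messages.append(f"result: {result}")
        return result


def orchestrate(subtasks: dict[str, str], agents: dict[str, SubAgent]) -> list[str]:
    """Send each sub-task to its agent and collect only the final results."""
    return [agents[name].run(task) for name, task in subtasks.items()]


if __name__ == "__main__":
    agents = {
        "searcher": SubAgent("searcher", "Find sources.", ["web_search"]),
        "coder": SubAgent("coder", "Write code.", ["python_repl"]),
    }
    results = orchestrate(
        {"searcher": "find papers on RAG", "coder": "plot token usage"}, agents
    )
    print(results)
    # The searcher's context contains nothing about the coder's task.
    print(agents["searcher"].messages)
```

The orchestrator only sees each sub-agent's final result, so the intermediate messages of one sub-agent never enter another's context.
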
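Anthropic's [multi-agent researcher](https://www.anthropic.com/engineering/built-multi-agent-research-system?ref=blog.langchain.com) makes the case for multi-agent isolation: many agents with isolated contexts outperformed a single agent, largely because each sub-agent's context window can be allocated to a narrower sub-task. As the blog says:

> [Subagents operate] in parallel with their own context windows, exploring different aspects of the question simultaneously.

Of course, the challenges with multi-agent setups include token use (e.g., up to [15x more tokens](https://www.anthropic.com/engineering/built-multi-agent-research-system?ref=blog.langchain.com) than chat, as reported by Anthropic), the need for careful [prompt engineering](https://www.anthropic.com/engineering/built-multi-agent-research-system?ref=blog.langchain.com) to plan sub-agent work, and coordination of sub-agents.

### Context isolation with environments

Hugging Face's [deep researcher](https://huggingface.co/blog/open-deep-research?ref=blog.langchain.com#:~:text=From%20building%20,it%20can%20still%20use%20it) shows another interesting example of context isolation. Most agents use [tool calling APIs](https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview?ref=blog.langchain.com), which return JSON objects (tool arguments) that can be passed to tools (e.g., a search API) to get tool feedback (e.g., search results). Hugging Face uses a [CodeAgent](https://huggingface.co/papers/2402.01030?ref=blog.langchain.com), which outputs code that contains the desired tool calls. The code then runs in a [sandbox](https://e2b.dev/?ref=blog.langchain.com). Selected context from the tool calls (e.g., return values) is then passed back to the LLM. This allows context to be isolated from the LLM in the environment. Hugging Face noted that this is a particularly good way to isolate token-heavy objects:

> [Code Agents allow for] a better handling of state ... Need to store this image / audio / other for later use? No problem, just assign it as a variable [in your state] and you [use it later].

### State

It is worth calling out that an agent's runtime [state object](https://langchain-ai.github.io/langgraph/concepts/low_level/?ref=blog.langchain.com#state) can also be a good way to isolate context. This can serve the same purpose as sandboxing. A state object can be designed with a [schema](https://langchain-ai.github.io/langgraph/concepts/low_level/?ref=blog.langchain.com#schema) whose fields context can be written to. One field of the schema (e.g., messages) can be exposed to the LLM at each turn of the agent, while the schema isolates information in other fields for more selective use.

## Context engineering with LangSmith / LangGraph

**[Key note]: This part gives practical implementation guidance and stresses the importance of setting up observability and evaluation infrastructure before starting on context engineering.**
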
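So, how can you apply these ideas? Before you start, two foundational pieces are helpful. First, make sure you have a way to [look at your data](https://hamel.dev/blog/posts/evals/?ref=blog.langchain.com) and track token usage across your agent. This helps identify where context engineering effort is best applied. [LangSmith](https://docs.smith.langchain.com/?ref=blog.langchain.com) is well suited to agent [tracing/observability](https://docs.smith.langchain.com/observability?ref=blog.langchain.com) and offers a good way to do this. Second, make sure you have a simple way to test whether context engineering hurts or improves agent performance. LangSmith supports [agent evaluation](https://docs.smith.langchain.com/evaluation/tutorials/agents?ref=blog.langchain.com) to test the impact of any context engineering effort.

### Write context

LangGraph was designed with both thread-scoped ([short-term](https://langchain-ai.github.io/langgraph/concepts/memory/?ref=blog.langchain.com#short-term-memory)) and [long-term memory](https://langchain-ai.github.io/langgraph/concepts/memory/?ref=blog.langchain.com#long-term-memory) in mind. Short-term memory uses [checkpointing](https://langchain-ai.github.io/langgraph/concepts/persistence/?ref=blog.langchain.com) to persist [agent state](https://langchain-ai.github.io/langgraph/concepts/low_level/?ref=blog.langchain.com#state) across all steps of an agent. This is very useful as a "scratchpad", allowing you to write information to state and fetch it at any step in the agent's trajectory.

LangGraph's long-term memory lets you persist context across many sessions with your agent. It is flexible, allowing you to save a small set of [files](https://langchain-ai.github.io/langgraph/concepts/memory/?ref=blog.langchain.com#profile) (e.g., a user profile or rules) or a larger [collection](https://langchain-ai.github.io/langgraph/concepts/memory/?ref=blog.langchain.com#collection) of memories. In addition, [LangMem](https://langchain-ai.github.io/langmem/?ref=blog.langchain.com) provides a broad set of useful abstractions to aid with LangGraph memory management.

### Select context

Within each node (step) of a LangGraph agent, you can fetch [state](https://langchain-ai.github.io/langgraph/concepts/low_level/?ref=blog.langchain.com#state). This gives you fine-grained control over what context you present to the LLM at each agent step.

In addition, LangGraph's long-term memory is accessible within each node and supports various types of retrieval (e.g., fetching files as well as [embedding-based retrieval on a memory collection](https://langchain-ai.github.io/langgraph/cloud/reference/cli/?ref=blog.langchain.com#adding-semantic-search-to-the-store)). For an overview of long-term memory, see [our Deeplearning.ai course](https://www.deeplearning.ai/short-courses/long-term-agentic-memory-with-langgraph/?ref=blog.langchain.com). For an entry point to memory applied to a specific agent, see our [Ambient Agents](https://academy.langchain.com/courses/ambient-agents?ref=blog.langchain.com) course. It shows how to use LangGraph memory in a long-running agent that can manage your email and learn from your feedback.

For tool selection, the [LangGraph Bigtool](https://github.com/langchain-ai/langgraph-bigtool?ref=blog.langchain.com) library is a good way to apply semantic search over tool descriptions. This helps select the most relevant tools for a task when working with a large collection of tools. Finally, we have several [tutorials and videos](https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_agentic_rag/?ref=blog.langchain.com) that show how to use various types of RAG with LangGraph.

### Compress context

Because LangGraph [is a low-level orchestration framework](https://blog.langchain.com/how-to-think-about-agent-frameworks/), you [lay out your agent as a set of nodes](https://www.youtube.com/watch?v=aHCDrAbH_go&ref=blog.langchain.com), [define](https://blog.langchain.com/how-to-think-about-agent-frameworks/) the logic within each node, and define a state object that is passed between them. This control offers several ways to compress context.

One common approach is to use a message list as your agent state and [summarize or trim](https://langchain-ai.github.io/langgraph/how-tos/memory/add-memory/?ref=blog.langchain.com#manage-short-term-memory) it periodically using [a few built-in utilities](https://langchain-ai.github.io/langgraph/how-tos/memory/add-memory/?ref=blog.langchain.com#manage-short-term-memory). However, you can also add logic to post-process [tool calls](https://github.com/langchain-ai/open_deep_research/blob/e5a5160a398a3699857d00d8569cb7fd0ac48a4f/src/open_deep_research/utils.py?ref=blog.langchain.com#L1407) or work phases of your agent in a few different ways. You can add summarization nodes at specific points, or add summarization logic to your tool calling node to compress the output of specific tool calls.

### Isolate context
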
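LangGraph is designed around a [state](https://langchain-ai.github.io/langgraph/concepts/low_level/?ref=blog.langchain.com#state) object, allowing you to specify a state schema and access state at each agent step. For example, you can store context from tool calls in certain fields of state, isolating them from the LLM until that context is required. In addition to state, LangGraph supports the use of sandboxes for context isolation. See this [repo](https://github.com/jacoblee93/mini-chat-langchain?tab=readme-ov-file&ref=blog.langchain.com) for an example LangGraph agent that uses [an E2B sandbox](https://e2b.dev/?ref=blog.langchain.com) for tool calls. See this [video](https://www.youtube.com/watch?v=FBnER2sxt0w&ref=blog.langchain.com) for an example of sandboxing with Pyodide, where state can be persisted. LangGraph also has a lot of support for building multi-agent architectures, such as the [supervisor](https://github.com/langchain-ai/langgraph-supervisor-py?ref=blog.langchain.com) and [swarm](https://github.com/langchain-ai/langgraph-swarm-py?ref=blog.langchain.com) libraries. You can [watch](https://www.youtube.com/watch?v=4nZl32FwU-o&ref=blog.langchain.com) [these](https://www.youtube.com/watch?v=JeyDrn1dSUQ&ref=blog.langchain.com) [videos](https://www.youtube.com/watch?v=B_0TNuYi56w&ref=blog.langchain.com) for more detail on using multi-agent setups with LangGraph.

## Conclusion

**[Key note]: This conclusion sums up context engineering as a craft to be mastered, and how LangGraph and LangSmith support a feedback loop of continuous improvement.**

Context engineering is becoming a craft that agent builders should aim to master. Here, we covered a few common patterns seen across many popular agents today:

- **Write context** - saving it outside the context window to help an agent perform a task.
- **Select context** - pulling it into the context window to help an agent perform a task.
- **Compress context** - retaining only the tokens required to perform a task.
- **Isolate context** - splitting it up to help an agent perform a task.

LangGraph makes it easy to implement each of these strategies, and LangSmith provides an easy way to test your agent and track context usage. Together, LangGraph and LangSmith enable a virtuous feedback loop: identify the best opportunity to apply context engineering, implement it, test it, and repeat.

---

## Key concepts summary

**The four strategies of context engineering:**

1. **Write**: save information outside the context window (scratchpads, memories)
2. **Select**: intelligently retrieve relevant information into the context window (RAG, tool selection)
3. **Compress**: reduce token usage (summarization, trimming)
4. **Isolate**: split up context management (multi-agent, sandboxes, state management)

**Core challenges:**

- Context poisoning, distraction, confusion, and clash
- Increased token cost and latency
- Context window size limits
- Memory management for long-running tasks

This article provides a systematic framework for understanding and implementing context management for agents, and it offers useful guidance for building efficient AI agent systems.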
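
To make the four strategies summarized above concrete, here is a minimal sketch in plain Python. It does not use LangGraph's API. `AgentState`, `write_note`, `select_notes`, `trim_messages`, and `build_prompt` are hypothetical names, and simple word overlap stands in for the embedding-based retrieval discussed in the article.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Hypothetical runtime state; only selected parts reach the model."""

    messages: list[str] = field(default_factory=list)
    scratchpad: dict[str, str] = field(default_factory=dict)  # write
    tool_blobs: dict[str, str] = field(default_factory=dict)  # isolate


def write_note(state: AgentState, key: str, note: str) -> None:
    """Write: persist information outside the prompt."""
    state.scratchpad[key] = note


def select_notes(state: AgentState, query: str, k: int = 1) -> list[str]:
    """Select: rank notes by word overlap with the query
    (an assumed stand-in for embedding search)."""
    query_words = set(query.lower().split())
    scored = [
        (len(query_words & set(note.lower().split())), note)
        for note in state.scratchpad.values()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [note for score, note in scored[:k] if score > 0]


def trim_messages(state: AgentState, keep_last: int) -> None:
    """Compress: keep only the most recent messages (hard-coded heuristic)."""
    state.messages = state.messages[-keep_last:] if keep_last > 0 else []


def build_prompt(state: AgentState, query: str) -> str:
    """Assemble the context from selected notes and recent messages.
    Token-heavy tool_blobs stay isolated in state and are never included."""
    return "\n".join(select_notes(state, query) + state.messages + [query])


if __name__ == "__main__":
    state = AgentState(messages=[f"message {i}" for i in range(10)])
    write_note(state, "plan", "plan: survey context engineering papers")
    write_note(state, "style", "user prefers short answers")
    state.tool_blobs["page"] = "<html>" + "x" * 10_000 + "</html>"
    trim_messages(state, keep_last=3)
    print(build_prompt(state, "which papers are in the plan"))
```

In this sketch the prompt contains one selected note, the three most recent messages, and the query, while the large tool output remains in state for later, selective use.
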