<p align="right"><font color="#3f3f3f">2025年09月12日</font></p>
---
# Context Engineering for Agents - Lance Martin, LangChain
# 智能体的上下文工程 - Lance Martin, LangChain
## 引言与背景 / Introduction & Background
**0:04** Hey everyone, welcome to the latest Latent Space podcast. This is Alessio, founder of Kernel Labs, and I'm joined by swyx, founder of Smol AI. Hello. Hello. We are so happy to be in the remote studio with Lance Martin from LangChain, LangGraph, and everything else he does. Welcome.
**0:04** 大家好,欢迎来到最新一期的Latent Space播客。我是Alessio,Kernel Labs的创始人,与我一起的是Smol AI的创始人swyx。你好,你好。我们非常高兴能在远程工作室与来自LangChain、LangGraph的Lance Martin交流。欢迎。
**0:15** It's great to be here. I'm a longtime listener of the pod, and it's finally great to be on. Yeah, you've been part of our orbit for a while. You spoke at one of the AI Engineer Summits, and obviously we're pretty close with LangChain.
**0:15** 很高兴来到这里。我是这个播客的长期听众,终于能够参与进来真是太好了。是的,你已经在我们的圈子里有一段时间了。你在AI Engineer Summit的活动上演讲过,而且显然我们与LangChain关系很密切。
## 上下文工程的起源 / Origins of Context Engineering
**0:33** I think recently, though, you've also been doing a lot of tutorials. I remember you did the O1 deep researcher (sorry, R1 deep researcher), which is a pretty popular project, and async ambient agents. But the thing that really prompted me to reach out and say, okay, it's finally time for the Lance Martin pod, is your recent work on context engineering, which is all the rage. How'd you get into it?
**0:33** 我认为最近你做了很多教程。我记得你做过O1深度研究员(不对,是R1深度研究员),这是一个很受欢迎的项目,还有异步环境智能体。但真正促使我联系你说"是时候做Lance Martin播客了"的是你最近关于上下文工程的工作,这非常火热。你是怎么开始研究这个的?
**1:00** Well, you know, it's funny. Buzzwords often emerge when people have a shared experience. I think lots of people started building agents early to mid this year, quote-unquote the year of agents. And what happened is that when you put together an agent, it's just tool calling in a loop. It's relatively simple to lay out, but it's actually quite tricky to get to work well.
**1:00** 嗯,你知道,这很有趣。流行词往往在人们有共同经历时出现。我认为很多人在今年早期、年中开始构建智能体,所谓的"智能体之年"。我认为发生的情况是,当你组装一个智能体时,它只是循环调用工具。布局相对简单,但实际上要让它运行良好是相当棘手的。
**1:24** And in particular, managing context with agents is a hard problem. Karpathy put out that tweet kind of canonizing the term context engineering, and he gave a nice definition: context engineering is the challenge of feeding an LLM just the right context for the next step. That's highly applicable to agents, and I think it really resonated with a lot of people.
**1:24** 特别是,管理智能体的上下文是一个难题。Karpathy发了那条推文,将"上下文工程"这个术语正式化,他提到了一个很好的定义:上下文工程是为语言模型的下一步提供恰当上下文的挑战,这非常适用于智能体,我认为这真的引起了很多人的共鸣。
## 提示工程 vs 上下文工程 / Prompt Engineering vs Context Engineering
**2:06** How do you define the line between prompt engineering and context engineering? Is prompt optimization context engineering in your mind? I think people are kind of confused: are we replacing the term? What is it?
**2:06** 你如何定义提示工程和上下文工程之间的界限?在你看来,提示优化算是上下文工程吗?我觉得人们有点困惑,比如我们是在替换这个术语吗,到底是什么?
**2:24** Well, I think prompt engineering is kind of a subset of context engineering. When we moved from chat models and chat interactions to agents, there was a big shift. With chat models, working with ChatGPT, the human message is really the primary input, and of course a lot of time and effort is spent crafting the right message that's passed to the model.
**2:24** 我认为提示工程是上下文工程的一个子集。当我们从聊天模型和聊天交互转向智能体时,发生了一个重大转变。对于使用ChatGPT的聊天模型,人类消息确实是主要输入,当然很多时间和精力都花在制作传递给模型的正确消息上。
**2:42** With agents the game is a bit trickier, though, because the agent is getting context not just from the human: context is now flowing in from tool calls during the agent trajectory. And I think this was really the key challenge that I and many others observed. When you put together an agent, you're not only managing the system instructions, system prompt, and user instructions; you also have to manage all the context that's flowing in at each step over the course of a large number of tool calls.
**2:42** 但对于智能体来说,情况就更棘手了,因为智能体不仅从人类那里获取上下文,而且在智能体执行过程中,上下文还通过工具调用流入。所以我认为这确实是我和许多人观察到的关键挑战——当你组装一个智能体时,你不仅要管理系统指令、系统提示和用户指令,还必须管理在大量工具调用过程中每一步流入的所有上下文。
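A minimal sketch of this naive pattern, assuming a stubbed model and a hypothetical `search` tool (neither is a real API), shows how every tool result lands in the message history:

```python
def search(query: str) -> str:
    # Hypothetical tool: a real web/file search would return a token-heavy payload.
    return f"<results for {query!r}: ...thousands of tokens...>"

def stub_llm(messages: list) -> dict:
    # Stand-in for the model call: request three searches, then answer.
    n_tool_results = sum(1 for m in messages if m["role"] == "tool")
    if n_tool_results < 3:
        return {"role": "assistant",
                "tool_call": {"name": "search", "args": {"query": f"step {n_tool_results}"}}}
    return {"role": "assistant", "content": "final answer"}

def agent_loop(user_input: str) -> list:
    messages = [{"role": "user", "content": user_input}]
    while True:
        reply = stub_llm(messages)
        messages.append(reply)
        if "tool_call" not in reply:
            return messages
        result = search(**reply["tool_call"]["args"])
        # Naive plumbing: the full tool result is appended,
        # so the context grows with every single call.
        messages.append({"role": "tool", "content": result})
```

With real research tools, each of those appended tool messages can be thousands of tokens, which is exactly the accumulation problem described above.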
## 上下文窗口与性能问题 / Context Window & Performance Issues
**3:12** And I think there have been a number of good pieces on this. Manus put out a great piece talking about context engineering with Manus, and they made the point that a typical Manus task is around 50 tool calls. Anthropic's multi-agent research is another nice example: they mention that a typical production agent, probably referring to Claude Code but it could be other agents they've produced, is hundreds of tool calls.
**3:12** 我认为关于这个话题有很多好文章。Manus发表了一篇关于上下文工程的精彩文章,他们指出典型的Manus任务大约需要50次工具调用。Anthropic的多智能体研究是另一个很好的例子。他们提到典型的生产智能体——这可能指的是Claude Code或者他们开发的其他智能体——需要数百次工具调用。
**3:38** So when I had my first experience with this, and I think many people have had it: you put together an agent. You're sold the story that it's just tool calling in a loop, pretty simple. You put it together. I was building deep research, and these research tool calls are pretty token-heavy. Suddenly you find that your deep researcher, with a naive tool-calling loop, is using 500,000 tokens. It was like one to two dollars per run.
**3:38** 所以当我第一次遇到这种情况时,我想很多人都有这种经历。你组装了一个智能体。别人告诉你这很简单,只是循环调用工具而已。你把它组装起来。我在构建深度研究工具。这些资源工具调用消耗大量token。突然你发现,例如,我的深度研究员使用简单的工具调用循环时使用了50万个token。每次运行成本大约1到2美元。
**4:04** I think this is an experience many people had. The challenge is realizing that building agents is actually a little bit tricky, because if you just naively plumb in the context from each of those tool calls, you hit the context window of the LLM; that's the obvious problem. But also, as Jeff from Chroma spoke about on a recent pod, there are all these weird and idiosyncratic failure modes as context gets longer.
**4:04** 我认为这是很多人都有的经历。挑战在于意识到构建智能体实际上有点棘手,因为如果你只是天真地将每次工具调用的上下文直接输入,你就会触及语言模型的上下文窗口限制,这是显而易见的问题。但Chroma的Jeff在最近的播客中也谈到了这一点。随着上下文变长,会出现各种奇怪和特殊的失败模式。
**4:30** So Jeff has that nice report on context rot. You have both of these problems happening. If you build a naive agent, context is flowing in from all these tool calls, which could be dozens to hundreds. There's degradation in performance with respect to context length, and also the trivial problem of hitting the context window itself.
**4:30** Jeff有一份关于上下文衰退的精彩报告。所以你会同时遇到这两个问题。如果你构建一个简单的智能体,上下文会从所有这些工具调用中流入。可能有几十到几百次。性能会随着上下文长度而下降。还有触及上下文窗口本身这个显而易见的问题。
## 上下文工程的五大类别 / Five Categories of Context Engineering
**4:49** So this was, I think, the motivation for this new idea: it's actually very important to engineer the context that you're feeding to an agent. And that spawned a bunch of different ideas that I put together in the blog post, drawn from Anthropic, from my own experience, from Manus, and from others, that people are using to handle this.
**4:49** 所以我认为这就是这个新想法的动机——精心设计你输入给智能体的上下文实际上非常重要。这衍生出一系列不同的想法,我在博客文章中整理了这些想法,人们正在使用这些方法来处理这个问题,这些方法来自Anthropic、我自己的经验、Manus和其他人。
**6:02** How do you define the five categories? I mean, I understand what offload means, but can you maybe go deeper?
**6:02** 你如何定义这五个类别?我的意思是我理解"卸载"是什么意思,但你能更深入地解释一下吗?
### 1. 卸载(Offload)/ Offload
**6:26** Yeah, let's walk through these. So when I talked about naive agents: the first time I built an agent, the agent makes a bunch of tool calls, those tool calls are passed back to the LLM at each turn, and you naively just plumb all that context back. And of course what you see is that the context window grows significantly, because this tool feedback is accumulating in your message history.
**6:26** 好的,让我们逐一讲解这些。当我谈到简单智能体时,第一次构建智能体时,智能体会进行大量工具调用。这些工具调用在每一轮都会传回语言模型,你只是简单地将所有上下文传回。当然,你会看到上下文窗口显著增长,因为这些工具反馈在你的消息历史中累积。
**6:50** So a perspective that Manus shared, which I thought was really good, is that it's important and useful to offload context. Don't just naively send back the full context of each of your tool calls. You can actually offload it, and they talk about offloading it to disk. They talked about this idea of using the file system as externalized memory, rather than just writing back the full contents of your tool calls, which could be token-heavy.
**6:50** Manus分享的一个观点我认为非常好,那就是卸载上下文很重要且有用。不要只是天真地将每次工具调用的完整上下文发送回去。你实际上可以卸载它,他们谈到将其卸载到磁盘。他们谈到了这个想法:使用文件系统作为外部化内存,而不是直接写回可能消耗大量token的工具调用的完整内容。
**7:12** Write those to disk and you can write back a summary. It could be a URL something so the agent knows it's retrieved a thing. It can fetch that on demand but you're not just naively pushing all that raw context back to the model. So that's this offloading concept.
**7:12** 将这些内容写入磁盘,你可以写回一个摘要。可能是一个URL之类的东西,这样智能体知道它检索了某个东西。它可以按需获取,但你不是简单地将所有原始上下文推送回模型。这就是卸载的概念。
**7:32** Note that it could be a file system, but it could also be, for example, agent state. LangGraph, for example, has this notion of state, so it could be the agent runtime state object, or it could be the file system. The point is that you're not just plumbing all the context from your tool calls back into the agent's message history. You're saving it in an externalized system and fetching it as needed. This saves token cost significantly. So that's the offloading concept.
**7:32** 注意,它可以是文件系统,也可以是例如智能体状态。例如,LangGraph有状态的概念。它可以是智能体运行时状态对象,也可以是文件系统。但关键是你不只是将工具调用的所有上下文传回智能体的消息历史。你将其保存在外部化系统中,按需获取。这大大节省了token成本。这就是卸载的概念。
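The offloading idea above can be sketched as follows; the scratch directory, summarizer stub, and pointer format are illustrative assumptions, not any framework's API:

```python
import hashlib
import tempfile
from pathlib import Path

SCRATCH = Path(tempfile.gettempdir()) / "agent_scratch"

def summarize(text: str, max_chars: int = 120) -> str:
    # Placeholder for a real LLM summarization call.
    return text[:max_chars] + ("..." if len(text) > max_chars else "")

def offload_tool_result(raw: str) -> dict:
    """Write the full tool output to disk; return only a summary plus a pointer."""
    SCRATCH.mkdir(parents=True, exist_ok=True)
    path = SCRATCH / (hashlib.sha256(raw.encode()).hexdigest()[:12] + ".txt")
    path.write_text(raw)
    # Only this small dict goes back into the message history.
    return {"summary": summarize(raw), "path": str(path)}

def fetch_on_demand(pointer: dict) -> str:
    """The agent can re-read the full content later if a step needs it."""
    return Path(pointer["path"]).read_text()
```

The message history only ever sees the short summary and path, while the full content stays recoverable on disk.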
**摘要生成的重要性 / Importance of Summary Generation**
**8:05** I guess my question on offloading is: what's the minimum summary, metadata, or whatever that you need to keep in the context to let the model understand what's in the offloaded context? If you're doing deep research, obviously you're offloading the full pages, but how do you generate an effective summary or blurb about what's in the file?
**8:05** 我想关于卸载的问题是,你需要在上下文中保留的最小摘要、元数据或其他信息是什么,以便让模型理解卸载的上下文中有什么。如果你在进行深度研究,显然你要卸载完整的页面,但你如何生成关于文件内容的有效摘要或简介?
**8:24** So this is actually a very interesting and important point, and I'll give an example from what I did with Open Deep Research. Open Deep Research is a deep research agent that I've been working on for about a year, and it's now, according to Deep Research Bench, the best performing deep research agent, at least on that particular benchmark.
**8:24** 这实际上是一个非常有趣且重要的点。我会举一个我在开放深度研究中所做的例子。开放深度研究是我开发了大约一年的深度研究智能体,根据深度研究基准测试,它现在是表现最好的深度研究智能体,至少在那个特定基准上是这样。
**8:41** So it's pretty good. Listen, it's not as good as OpenAI's Deep Research, which uses end-to-end RL, but it's fully open source and it's pretty strong. So I just do carefully prompted summarization. I try to prompt the summarization model to give an exhaustive set of bullet points of the key things in the post, just so the agent can know whether to retrieve the full context later.
**8:41** 所以它相当不错。听着,它不如使用端到端强化学习的OpenAI Deep Research那么好。但它是完全开源的,而且相当强大。所以我只是进行仔细提示的摘要生成。我尝试提示摘要模型给出帖子中关键内容的详尽要点列表,这样智能体就可以知道是否稍后检索完整上下文。
**9:06** So I think it's about prompting carefully for recall if you're doing summarization: compressing the content, but making sure all the key bullet points the LLM needs in order to know what's in that piece of full context are preserved. That's actually very important in this kind of summarization step.
**9:06** 所以,我认为如果你为了召回而仔细进行摘要,压缩它,但要确保语言模型知道那部分完整上下文中的所有关键要点,这在进行这种摘要步骤时实际上非常重要。
**9:26** Now, Cognition had a really nice blog post talking about this as well, and they mentioned you can really spend a lot of time on summarization, so I don't want to trivialize it. But at least in my experience, carefully prompting a model to capture exactly what's needed has worked quite effectively. In that post, they even talk about using a fine-tuned model for performing summarization.
**9:26** Cognition有一篇非常好的博客文章也谈到了这一点,他们提到你确实可以在摘要上花费大量时间,所以我不想轻视它,但至少我的经验是它工作得相当有效。仔细提示模型以准确捕获。在这篇文章中,他们甚至谈到了使用微调模型来进行摘要。
**9:43** In their case, they're talking about agent-to-agent boundaries and summarizing, for example, message history. But the same challenges apply to summarizing, say, the full contents of token-heavy tool calls so the model knows what's in context. So I basically spent a lot of time prompt engineering to make sure my summaries capture, with high recall, what's in the document, but compress the content significantly.
**9:43** 在这种情况下,他们谈论的是智能体边界和摘要,例如消息历史。但同样的挑战也适用于摘要,例如消耗大量token的工具调用的完整内容,以便模型知道上下文中有什么。所以我基本上花了很多时间进行提示工程,以确保我的摘要能够高召回率地捕获文档中的内容,但大幅压缩内容。
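A recall-first summarization prompt of the kind described above might look like this; the wording is a sketch, not the actual prompt used in Open Deep Research:

```python
# Illustrative prompt: compress heavily, but ask for exhaustive bullets so the
# agent can decide later whether to re-fetch the full document.
SUMMARY_PROMPT = """\
Summarize the document below for an agent that may need to retrieve it later.
Requirements:
- Give an exhaustive bullet list of every key topic, claim, and named entity.
- Favor recall over brevity in the bullets, but drop all filler prose.
- End with one line: "Retrieve the full text if the task involves: <topics>".

Document:
{document}
"""

def build_summary_request(document: str) -> list:
    # The messages you would send to a cheap summarization model.
    return [{"role": "user", "content": SUMMARY_PROMPT.format(document=document)}]
```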
### 2. 上下文隔离(多智能体)/ Context Isolation (Multi-Agent)
**10:29** I do think the compression point was also part of the findings at yesterday's context engineering meetup that Chroma hosted: you do want frequent compression, because you don't want to hit the raw context limit. Yeah, I'm not sure there's much else to say; offloading is important and you should probably do it. There was also a really interesting link, I think Dex was linking it to the concept of multi-agents: the reason you want multi-agents is that you can compress and load in different things based on the role of each agent, and a single agent probably would not have all the context.
**10:29** 我确实认为压缩也是昨天在Chroma主办的上下文工程聚会上的发现之一,你确实需要频繁压缩,因为你不想触及上下文原始限制。是的,我不确定还有什么要说的,卸载很重要,你应该这样做。还有一个非常有趣的联系,我想是Dex将其与多智能体的概念联系起来,为什么你想要多智能体,是因为你可以根据智能体的角色压缩和加载不同的东西,单个智能体可能无法拥有所有上下文。
**10:53** Yeah, that's exactly right, and actually one of the other big themes I talk about quite a bit is context isolation with multi-agent, and I do think this links back to the Cognition take, which is interesting. Their argument against multi-agent is that multi-agent can be hard. Correct. And they're arguing a few different things.
**10:53** 是的,你知道这完全正确,实际上我强调和谈论的另一个重要主题是多智能体的上下文隔离,我确实认为这与Cognition的观点有关。这很有趣。他们反对多智能体的论点是多智能体可能很难。正确。他们论证的是几个不同的东西。
**11:11** One of the main things is that it is difficult to communicate sufficient context to sub-agents. They talk a lot about spending time on that summarization or compression step; they even use a fine-tuned model to ensure all the relevant information gets through. They actually show their system a bit further down as kind of a linear agent, but even at those agent-to-agent boundaries, they talk a lot about being careful about how you compress information and pass it between agents.
**11:11** 主要问题之一是很难向子智能体传达足够的上下文。他们大量谈论在摘要或压缩步骤上花费时间。他们甚至使用微调模型来确保所有相关信息。是的。所以他们在下面展示了一种线性智能体。但即使在这些智能体边界处,他们也大量谈论如何小心地压缩信息并在智能体之间传递。
**代码智能体中的挑战 / Challenges in Code Agents**
**11:45** Yeah, I think the biggest question for me, and coding is kind of the main use case that I have, is that I still haven't figured out how much value there is in showing how the implementation was made. If you have a sub-agent that writes tests, or a sub-agent that does different things, how much do you need to explain to it about how the codebase got to its current state?
**11:45** 是的,我认为对我来说最大的问题是——编码是我的主要用例。我认为我仍然没有弄清楚向编写测试的子智能体或执行不同任务的子智能体展示实现过程有多大价值,你需要向它解释多少关于代码库是如何到达当前状态的信息?
**12:11** And then, does it only need to return the test back into the context of the main agent? If it has to fix some code to match the task, should it say that to the main agent? The deep research use case is clear to me, because it's atomic pieces of content that you're going through. But when you have state that depends on interactions between the sub-agents, that's the thing that is still unclear to me.
**12:11** 然后它是否只需要在主智能体的上下文中返回测试?如果它必须修复一些代码以匹配任务,它应该告诉主智能体吗?我认为深度研究用例对我来说是清楚的,因为它就像你正在处理的原子内容片段。但我认为当你在子智能体之间有依赖状态时,这对我来说仍然不清楚。
**读 vs 写任务 / Read vs Write Tasks**
**12:35** So I think that's one of the most important points about this context isolation bucket. Cognition argues, and I actually think it's a very reasonable argument, don't do sub-agents, because each sub-agent implicitly makes decisions, and those decisions can conflict. You have sub-agent one doing a bunch of tasks and sub-agent two doing a bunch of tasks; those decisions may conflict, and then when you try to compile the full result, in your coding example, there could be tricky conflicts.
**12:35** 所以我认为这是关于上下文隔离这个类别的最重要的观点之一。Cognition提出的论点我认为实际上是非常合理的。他们主张不要使用子智能体,因为每个子智能体都会隐式地做出决策,这些决策可能会冲突。所以你有子智能体1执行一堆任务,子智能体2执行一堆任务。这些决策可能会冲突,然后当你尝试在编码示例中编译完整结果时,可能会出现棘手的冲突。
**13:05** I found this to be the case as well, and a perspective I like on this is: use multi-agent in cases where there's very clear and easy parallelization of tasks. Cognition's Walden Yan spoke on this quite a bit. He talks about this idea of read versus write tasks. For example, if each sub-agent is writing some component of your final solution, that's much harder.
**13:05** 我也发现了这种情况,我喜欢的一个观点是,在任务有非常清晰且容易并行化的情况下使用多智能体。Cognition的Walden Yan对此谈了很多。他谈到了读取任务与写入任务的概念。例如,如果每个子智能体都在编写最终解决方案的某个组件,那就困难得多。
**13:28** They have to communicate, like you're saying, and agent-to-agent communication is still quite early. But with deep research it's really only reading: the sub-agents are just doing context collection, and you can do one write from all that shared context after all the sub-agents finish. I found this worked really well for deep research, and Anthropic actually reports on this too.
**13:28** 它们必须像你说的那样通信,而智能体之间的通信仍然处于早期阶段,但对于深度研究来说,它实际上只是读取,它们只是进行上下文收集,你可以在所有子智能体工作后从所有共享上下文中进行写入,我发现这对深度研究非常有效,实际上Anthropic也报告了这一点。
**13:47** So their deep researcher just uses parallelized sub-agents for research collection, and they do the writing in one shot at the end. This works great. So it's a very nuanced point: the kind of problem you apply context isolation to matters significantly.
**13:47** 所以他们的深度研究员只是使用并行化的子智能体进行研究收集,他们在最后一次性完成写入。所以这非常有效。这是一个非常微妙的观点:你将上下文隔离应用于什么样的问题非常重要。
**14:07** Coding may be much harder. In particular, if you're having each sub-agent create one component of your system, there are many potentially conflicting decisions each of the sub-agents is implicitly making, and when you try to compile the full system, there may be lots of conflicts. With research, you're just doing context gathering in each of those sub-agent steps, and you're writing in a single step.
**14:07** 编码可能要困难得多。特别是,如果你让每个子智能体创建系统的一个组件,每个子智能体可能做出许多隐含冲突的决策。当你尝试编译完整系统时,可能会有很多冲突。而对于研究,你只是在每个子智能体步骤中进行上下文收集,然后在单个步骤中写入。
**14:32** So I think this was the key tension between the Cognition take, don't do multi-agents, and the Anthropic take, hey, multi-agents work really well. It depends on the problem you're trying to solve with multi-agents. This was a very subtle and interesting point: what you apply multi-agents to, and how you use them, matters tremendously.
**14:32** 所以,我认为这是Cognition的观点(不要使用多智能体)和Anthropic的观点(多智能体效果很好)之间的关键矛盾。这取决于你试图用多智能体解决什么问题。这是一个非常微妙和有趣的观点。你将多智能体应用于什么以及如何使用它们至关重要。
**14:45** I like the take: apply multi-agents to problems that are easily parallelizable and read-only, for example context gathering for deep research, and do the final quote-unquote write, in this case report writing, at the end. I think this is trickier for coding agents. I did find it interesting that Claude Code now allows for sub-agents, so they obviously have some belief that this can be done well, or at least that it can be done.
**14:45** 我喜欢将多智能体应用于容易并行化的问题的观点,这些问题是只读的,例如深度研究的上下文收集,然后在最后进行所谓的"写入",在这种情况下是报告写作。我认为这对编码智能体来说更棘手。我确实发现Claude Code现在允许子智能体很有趣。所以他们显然相信这可以做得很好,或者至少可以做到。
**15:10** But I still think I kind of agree with Walden's take: it can be very tricky in the case of coding if sub-agents are doing tasks that need to be highly coordinated.
**15:10** 但我仍然认为我实际上有点同意Walden的观点。在编码的情况下,如果子智能体正在执行需要高度协调的任务,这可能非常棘手。
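The read-versus-write pattern above, parallel read-only researchers followed by a single write step, can be sketched like this; the researcher and writer functions are stand-ins for real LLM calls:

```python
from concurrent.futures import ThreadPoolExecutor

def research_subagent(topic: str) -> str:
    # Read-only: gathers context; makes no decisions about the final output.
    return f"notes on {topic}"

def write_report(question: str, notes: list) -> str:
    # A single writer sees all the shared context, so its decisions can't conflict.
    return f"Report on {question}:\n" + "\n".join(notes)

def deep_research(question: str, topics: list) -> str:
    # Sub-agents run in parallel because the read tasks are independent.
    with ThreadPoolExecutor() as pool:
        notes = list(pool.map(research_subagent, topics))
    return write_report(question, notes)
```

Because the sub-agents never write output components, there is nothing for them to implicitly disagree about; the conflict surface collapses into the one final write.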
### 3. 检索(RAG)/ Retrieval (RAG)
**15:33** I think that's a well-explained contrasting comparison. Not much to add there. It's interesting that they have different use cases and different architectures involved. I don't know if that's a permanent thing; it might fall to the bitter lesson, as you would put it. Yes, we should probably talk about some of the other parts of the system that you set up. There are a lot of interesting techniques there.
**15:33** 我认为这是一个解释得很好的对比比较。没有太多要补充的。我认为有趣的是,它们有不同的用例和不同的架构。我不知道这是否是永久的,可能会遵循你所说的"苦涩教训"。是的,我们可能应该谈谈你设置的系统的其他一些部分。那里有很多有趣的技术。
**15:50** Well, let's talk about classic old retrieval. RAG has obviously been in the air for many years now, well before LLMs and this whole wave. One thing I found pretty interesting is that, for example, different code agents take very different approaches to retrieval.
**15:50** 好吧,让我们谈谈经典的老式检索。显然,RAG已经存在很多年了,显然早在大语言模型和这整个浪潮之前。我发现非常有趣的一件事是,例如,不同的代码智能体对检索采取了非常不同的方法。
**索引 vs 智能体检索 / Indexing vs Agentic Retrieval**
**16:06** Varun from Windsurf shared an interesting perspective on how they approach retrieval in the context of Windsurf. They use classic code chunking along carefully designed semantic boundaries and embed those chunks, so classic semantic-similarity vector search and retrieval, but they also combine that with, for example, grep. They also mention knowledge graphs. Then they talk about combining those results and doing reranking. So this is your classic complicated multi-step RAG pipeline.
**16:06** Windsurf的Varun分享了他们在Windsurf上下文中如何处理检索的有趣观点。他们使用经典的代码分块,沿着精心设计的语义边界嵌入这些块。这是经典的语义相似性向量搜索和检索,但他们还将其与例如grep结合。他们还提到知识图谱。然后他们谈论结合这些结果进行重新排序。这是你的经典复杂的多步骤RAG流程。
**16:47** Now, what's interesting is that Boris from Anthropic and Claude Code has taken a very different approach. He's spoken about this quite a bit. Claude Code doesn't do any indexing. It's just doing quote-unquote agentic retrieval: using simple tool calls, for example using grep to poke around your files, no indexing whatsoever, and it obviously works extremely well.
**16:47** 现在有趣的是,来自Anthropic和Claude Code的Boris采取了非常不同的方法。他对此谈了很多。Claude Code不做任何索引。它只是进行所谓的智能体检索。只是使用简单的工具调用,例如使用grep在你的文件中查找,完全不做索引,而且显然效果非常好。
**17:06** So there are very different approaches to RAG and retrieval that different code agents are taking. And this seems to be an interesting, emerging theme: when do you actually need more hardcore indexing, and when can you get away with simple agentic search using very basic file tools?
**17:06** 所以,不同的代码智能体对RAG和检索采取了非常不同的方法。这似乎是一个有趣且正在出现的主题,比如你什么时候真正需要更硬核的索引?什么时候你可以只使用非常基本的文件工具进行简单的智能体搜索?
**llms.txt的方法 / The llms.txt Approach**
**17:41** Yeah, one of the more viral moments from one of our recent podcasts was Boris's pod with us, and Cline also mentioning that they just don't do code indexing, they just use agentic search. And yeah, that's probably a really good 80/20. If you really want to fine-tune it, you probably want to do a little mix, but maybe you don't have to for your needs.
**17:41** 是的,我们最近一个播客中比较火的时刻之一是Boris与我们的那期节目,Cline也提到他们不做代码索引,他们只使用智能体搜索。是的,可能这是一个非常好的80/20法则。如果你真的想微调它,可能你想做一点混合,但也许你不需要为了你的需求这样做。
**17:58** Yeah, I actually just saw Cline posted, I think yesterday, that they only use grep, they don't do indexing. So I think within the retrieval area of context engineering, there are some interesting trade-offs you can make: are you doing classic vector-store-based semantic search and retrieval with a relatively complicated pipeline, like Varun's talking about with Windsurf, or just good old agentic search with basic file tools?
**17:58** 是的,我实际上刚看到Cline昨天发布的内容,谈到他们只使用grep,不做索引。所以我认为在上下文工程的检索领域,你可以做出一些有趣的权衡,即你是在使用像Varun谈论的Windsurf那样相对复杂的经典向量存储语义搜索和检索流程,还是只使用带有基本文件工具的老式智能体搜索。
**18:18** I will note that I actually did a benchmark on this myself; I think there's a shared blog post somewhere. This was a while ago, but I compared three different ways to do retrieval over all the LangGraph documentation for a set of 20 coding questions related to LangGraph.
**18:18** 我要指出,我实际上自己做了一个基准测试。我想有一个共享的博客文章。我现在把它调出来。是的。我自己实际上研究了一下这个。这是一段时间以前的事了,但我比较了三种不同的方法来检索所有Langraph文档,针对一组20个与Langraph相关的编码问题。
**18:50** So I basically wanted to let different code agents write LangGraph for me by retrieving from our docs. I tested Claude Code and Cursor. I used three different approaches for grabbing documentation. First, I took all of our docs, around 3 million tokens, indexed them in a vector store, and just did classical vector store search and retrieval.
**18:50** 所以,我基本上想让不同的代码智能体通过从我们的文档中检索来为我编写Langraph。我测试了Claude Code和Cursor。我使用了三种不同的方法来获取文档。一种是我获取了我们所有的文档,大约300万个token。我将它们索引到向量存储中,然后进行经典的向量存储搜索和检索。
**19:07** Second, I used an llms.txt file with just a simple file-loader tool, so that's more like agentic search. Basically: look at this llms.txt text file, which has the URLs of all of our documents with a basic description of each, and let the LLM, or the code agent in this case, just make tool calls to fetch specific docs of interest.
**19:07** 我还使用了llms.txt,只有一个简单的文件加载工具。这更像是智能体搜索。基本上就是查看这个llms.txt文本文件,其中包含我们所有文档的URL和一些基本描述,然后让语言模型或代码智能体在这种情况下进行工具调用来获取感兴趣的特定文档。
**19:25** And third, I tried context stuffing: take all the docs, 3 million tokens, and just feed them all to the code agent. So these are just some results I found comparing Claude Code to Cursor, and interestingly, what I actually found, and this is only my particular test case, is that llms.txt with good descriptions is extremely effective, and it's very simple. It's basically just a markdown file with the URLs of your documentation and a description of what's in each doc, passed to the code agent along with a simple tool to grab files.
**19:25** 我还尝试了上下文填充,将所有文档300万个token全部输入代码智能体。这些只是我比较Claude Code和Cursor时发现的一些结果,有趣的是我实际上发现——这只是我的特定测试案例——我发现llms.txt配合良好的描述非常有效,这非常简单。它基本上只是一个markdown文件,包含所有文档的URL和文档内容的描述。只需将其传递给代码智能体,再加上一个简单的抓取文件的工具,就非常有效。
**20:07** What happens is the code agent can just say: okay, here's the question, I need to grab this doc and read it. It reads it, grabs the next doc, reads that, and so on. This worked really well for me, and I use it all the time. So I personally don't do vector store indexing; llms.txt with a simple search tool and Claude Code is kind of my go-to.
**20:07** 发生的情况是代码智能体可以说:"好的,这是问题。我需要抓取这个文档并阅读它。"它会阅读它。我需要抓取这个文档,阅读它,阅读它。这对我来说效果非常好,我实际上一直在使用它。所以我个人实际上不做向量存储索引。我实际上使用llms.txt配合简单的搜索工具和Claude Code是我的首选。
**20:25** Claude Code in this case. This was done a few months ago, and these things are always changing, but at that particular point in time, Claude Code actually outperformed Cursor for my test case. That's actually what Claude Code-pilled me. I did this back in April, so I've been on Claude Code since. But that was really it. So this goes to the point that Boris has been making about Claude Code, and Cline as well.
**20:25** 在这种情况下是Claude Code。这是几个月前做的。这些事情总是在变化。在那个特定时间点,Claude Code在我的测试案例中实际上表现优于Cursor。这实际上是Claude Code说服了我。这是我在4月份做的。所以从那时起我就一直在使用Claude Code。但就是这样。所以这证明了Boris关于Claude Code和Cline的观点。
**20:49** You give an LLM access to simple file tools. In this case I actually use an llms.txt to help it out, so it can know what's in each file. It's extremely effective, much simpler, and easier to maintain than building an index. So that's just my own experience as well.
**20:49** 你给语言模型访问简单的文件工具。在这种情况下,我实际上使用llms.txt来帮助它,这样它就可以知道每个文件中有什么,这非常有效,而且比构建索引简单得多,也更容易维护。这也是我自己的经验。
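A minimal sketch of this setup, with hypothetical URLs and an in-memory stand-in for HTTP fetching: two tools are all the agent needs.

```python
# A hand-written llms.txt: one line per page, URL plus a short description.
LLMS_TXT = """\
# LangGraph docs
- [Persistence](https://example.com/persistence.md): checkpoints, threads, state storage
- [Streaming](https://example.com/streaming.md): token and event streaming modes
"""

# In-memory stand-in for fetching pages over HTTP.
DOCS = {
    "https://example.com/persistence.md": "Full persistence doc ...",
    "https://example.com/streaming.md": "Full streaming doc ...",
}

def list_docs() -> str:
    """Tool 1: the agent reads this index to decide which page it needs."""
    return LLMS_TXT

def fetch_doc(url: str) -> str:
    """Tool 2: fetch one page on demand instead of indexing everything up front."""
    return DOCS[url]
```

There is no vector store to build or keep in sync; the descriptions in the index do the routing work.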
**Deep Wiki与LLM生成的描述 / Deep Wiki & LLM-Generated Descriptions**
**21:06** The scaled-up form of llms.txt, which I really like and use quite a bit, is actually DeepWiki from Cognition. So I made a little Chrome extension for myself where, on any repo, including yours, I can just jump to the DeepWiki, and it's kind of an llms.txt, but I also read it. It's just a better wiki.
**21:06** 我真正喜欢并经常使用的llms.txt的扩展形式实际上是Cognition的deep wiki。所以,我为自己做了一个小的Chrome扩展,我可以对任何仓库(包括你的)点击e wiki,这类似于llms.txt,但我也会阅读它。它只是一个更好的wiki。
**21:31** No, no, this is a great example, and I actually think this could be a very nice approach: take a repo and compile it down to some kind of easily readable llms.txt. What I actually found was that even using an LLM to write the descriptions helped a lot. So I have a little package on my GitHub that can rip through documentation and pass each doc to a cheap LLM to write a high-quality summary. This works extremely well.
**21:31** 不不不。这是一个很好的例子。我实际上认为这可能是一个非常好的方法。获取一个仓库,将其编译成某种易于阅读的llms.txt。我实际上发现,甚至使用语言模型来编写描述也很有帮助。所以我在GitHub上有一个小包,它可以浏览文档,然后将其传递给一个便宜的语言模型来编写每个文档的高质量摘要。这非常有效。
**21:55** And so that llms.txt then has LLM-generated descriptions. Yeah, let's see where it is. It's one of my newer repos; I have a million things here. Yeah, this one.
**21:55** 所以那个llms.txt就有了LLM生成的描述。是的,让我看看它在哪里。这是我较新的一个仓库;我这里有太多东西了。是的,这个。
**22:22** This is a little repo. It got almost no attention, but I found it to be very useful. Basically, it's trivial: you just point it at some documentation, it rips through it, grabs all the pages, sends each one to an LLM, the LLM writes a nice description, and it compiles that into an llms.txt file.
**22:22** 这是一个小仓库。几乎没有引起关注,但我发现它非常有用。所以,基本上,你可以——这很简单。你只需将它指向某些文档。它可以浏览它,抓取所有页面,将每个页面发送给语言模型,语言模型编写一个好的描述,将其编译成llms.txt文件。
**22:35** I found that when I did this and then fed it to Claude Code, Claude Code is extremely good at saying: okay, based on the description, here's the page I should load for the question asked. Dead simple. I use this all the time. Well, I use it when I'm trying to generate an llms.txt for new documentation; I've done this for LangGraph and for a few other libraries that I use frequently.
**22:35** 我发现当我这样做并将其输入Claude Code时,Claude Code非常擅长说:"好的,根据描述,这是我应该加载的页面。这是我应该为所提问题加载的页面。"非常简单。我一直在使用这个。嗯,当我试图为新文档生成llms.txt时我会使用它,但我为Langraph做过这个。我为我经常使用的其他几个库做过这个。
**23:00** You just give that to Claude Code, and then Claude Code can rip through and grab docs really effectively. Super simple. The only catch is that I found the descriptions in your llms.txt matter a lot, because the LLM actually has to use the descriptions to know what to read, you know.
**23:00** 你只需将其提供给Claude Code。然后Claude Code可以非常有效地浏览和抓取文档。超级简单。唯一的问题是我发现你的llms.txt中的描述非常重要,因为语言模型实际上必须使用描述来知道要阅读什么。
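The generation step described above can be sketched as follows; this is an illustration of the idea, not Lance's actual package, and `describe` is a stub for the cheap-LLM call:

```python
def describe(page_text: str) -> str:
    # Placeholder for the cheap-LLM call that writes a high-recall description.
    first_line = page_text.strip().splitlines()[0]
    return "covers " + first_line.lstrip("# ").lower()

def build_llms_txt(pages: dict) -> str:
    # One markdown line per page: URL plus its generated description.
    lines = ["# Documentation index"]
    for url, text in pages.items():
        lines.append(f"- [{url}]({url}): {describe(text)}")
    return "\n".join(lines)
```

Since the agent routes entirely off these descriptions, spending a better model or a better prompt on `describe` is where the quality comes from.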
**MCP服务器 / MCP Servers**
**23:15** Anyway, that's just a nice little utility that I use all the time. When we had Cline on, they said the Context7 MCP by Upstash, which is an MCP for project documentation and things like that, was one of the most used. Have you tried it? Have you seen anything else like that that automates some of this stuff away?
**23:15** 无论如何,这只是我一直使用的一个小工具。我们请Cline来节目时,他们说Upstash的Context7 MCP是最常用的之一,它是一个用于项目文档之类的MCP。你试过吗?你见过其他类似的东西,可以自动化一些工作吗?
**23:39** Well, you know, it's funny. We have an MCP server for LangGraph documentation that basically gives, for example, Claude Code an llms.txt file and a simple file search tool. Now, Claude has built-in fetch tools, but at the time we built it, it didn't. It's a very simple MCP server that exposes llms.txt files to, for example, Claude Code. It's called mcpdoc. It's a very simple little utility; I use it all the time. Extremely useful. You can basically just point it at all the llms.txt files you want to work with.
**23:39** 嗯,你知道,这很有趣。我们有一个用于LangGraph文档的MCP服务器,它基本上为例如Claude Code提供llms.txt文件和一个简单的文件搜索工具。现在,Claude有内置的获取工具,但在我们构建它时还没有。这是一个非常简单的MCP服务器,它向例如Claude Code公开llms.txt文件。它叫mcpdoc。这是一个非常简单的小工具,我一直在使用它,非常有用。你基本上可以将它指向你想要使用的所有llms.txt文件。
**24:09** Yeah, this one. Yeah. Well, the MCP docs have an MCP server that you can search the docs with. So it kind of rolls all the way down. But I guess my question is, should this be one server per project, you know, or at some point are you going to have kind of a meta server? And I think part of it is, once you move on from just doing tool calling in servers to doing things like sampling and, you know, prompts and resources and stuff like that, you can do a lot of the extraction in the server itself as well.
**24:09** 是的,这个。是的。MCP文档有一个MCP服务器,你可以用它搜索文档。所以它一直向下延伸。但我想我的问题是,这应该是每个项目一个服务器吗?还是在某个时候你会有一种元服务器?我认为部分原因是,一旦你从仅在服务器中进行工具调用转向做采样、提示和资源之类的事情,你也可以在服务器本身中进行大量提取。
**24:47** And again it goes back to like your point on context engineering. It's like maybe you do all that work not in the context but in the server and then you just put the final piece that you care about in the context. Uh but it seems like very early.
**24:47** 这又回到了你关于上下文工程的观点。也许你在服务器中而不是在上下文中完成所有这些工作,然后你只是将你关心的最终部分放入上下文。但这似乎还很早期。
**24:59** Yeah, this is actually a very interesting point. I've spoken with folks from Anthropic about this quite a bit. I found that storing prompts in MCP servers is actually pretty important, in particular to tell the LLM or code agent how to use the server. And so I actually end up having kind of separate servers for different projects with specific prompts. And you can also have resources, so some servers have specific resources for that particular project in the server itself.
**24:59** 是的,这实际上是一个非常有趣的观点。我与Anthropic的人员就此进行了很多讨论。我发现在MCP服务器中存储提示实际上非常重要,特别是告诉语言模型或代码智能体如何使用服务器。所以我实际上最终为不同的项目创建了单独的服务器,带有特定的提示。有时我还会有资源。所以有些服务器本身为特定项目提供特定资源。
**25:24** So I actually don't mind separating servers project-wise, with project-specific context and prompts necessary for that particular task. Yeah, a lot of people actually may have missed some features of the MCP spec, and you do have prompts in there. It's probably one of the first actual features that they had, which is maybe kind of underrated. People tend to view MCP as just, you know, tool integration, but there's actually a lot of stuff in there, including sampling, which is underrated too.
**25:24** 所以我实际上不介意按项目分离服务器,带有该特定任务所需的项目特定的上下文和提示。是的,很多人实际上可能错过了MCP规范的一些功能,你确实在其中有提示。这可能是他们拥有的第一个实际功能之一,这可能有点被低估了,比如人们倾向于将MCP视为工具集成,但实际上这里有很多东西,包括采样,这也被低估了。
**26:01** Yeah, that's right. That's exactly right. And actually the prompting thing is pretty important, because even to use our little simple mcpdoc server for LangGraph docs, I found it's better of course if you prompt it. But then I had to put in the readme initially, okay, here's how you should prompt it. Of course that prompt can just live in the server itself, and so you can compartmentalize the prompt necessary for the LLM to use the server effectively within the server itself.
**26:01** 是的,完全正确。实际上提示这件事很重要,因为即使要使用我们为LangGraph文档创建的简单mcpdoc服务器,我发现如果你对它进行提示当然会更好,但然后我不得不在readme中最初写上"好的,这是你应该如何提示它",但当然那个提示可以就在服务器本身中,所以你可以将语言模型有效使用服务器所需的提示封装在服务器本身中。
**26:31** And this was a problem I saw initially. A lot of people were using our mcpdoc server and then finding, oh, this doesn't work well. And it's like, oh, it's a skill issue. You need to prompt it better. But then that's our problem. The prompt should actually live in the server and should be available to the code agent, right? So it knows how to use the server, right?
**26:31** 这是我最初看到的一个问题。很多人使用我们的MCP doc服务器,然后发现,哦,这效果不好。就像,哦,这是技能问题。你需要更好地提示它。但那是我们的问题。这个提示实际上应该在服务器中,并且应该对代码智能体可用,对吧?所以它知道如何使用服务器,对吧?
### 4. 减少上下文 / Reducing Context
**26:48** So that's maybe retrieval, and retrieval is a big theme. And it, you know, obviously predates this new term of context engineering, but there's a lot going on in the retrieval bucket. It certainly is an important subset of context engineering. I'm wondering if there are any other trends in retrieval. Before we leave the topic, you know, I think one other thing I was tracking was just ColBERT and the general concept of late interaction. I don't know if you guys do a ton on that, but some sort of in-between element between full agentic retrieval and full pre-indexing, sort of two-phase indexing, maybe, is what I would call it. Any comments on that?
**26:48** 所以,也许检索就是这样,检索是一个大主题。你知道,它显然早于上下文工程这个新术语,但在检索这个桶里有很多事情发生。当然这是上下文工程的一个重要子集。我想知道检索中是否还有其他趋势。在我们离开这个话题之前,你知道,我认为我一直在追踪的另一件事是ColBERT和晚期交互的一般概念。我不知道你们是否在这方面做了很多工作,但某种介于完全智能体和完全预索引之间的元素,某种两阶段索引,也许这就是我所说的。对此有何评论?
**27:26** I haven't personally looked at ColBERT very much. I played with it only a little bit, so I don't have much perspective there, unfortunately.
**27:26** 我个人对ColBERT研究不多。我只是稍微尝试了一下,所以不幸的是我在这方面没有太多观点。
**27:32** All right, happy to move on. We could talk about maybe reducing context briefly. Everyone's had an experience with this, because if you use Claude Code, you hit that point where you've hit 95% of the context window and Claude Code's about to perform compaction.
**27:32** 好的,很高兴继续。我们可以简单谈谈减少上下文。每个人都有这样的经历,因为如果你使用Claude Code,你会达到那种95%的程度,你知道,你已经达到上下文窗口的95%,Claude Code即将执行压缩。
**27:46** So that's a very intuitive and kind of obvious case in which you want to do some kind of context reduction, when you're near the context window. I think an interesting take here, though, is there are a lot of other opportunities for using summarization. We talked about it a little bit previously with offloading, but actually at tool call boundaries is a pretty reasonable place to do some kind of compaction or pruning.
**27:46** 所以这是一个非常直观且明显的案例,当你接近上下文窗口时,你想要进行某种上下文减少。但我认为这里有趣的观点是,还有很多其他使用摘要的机会。我们之前在谈论卸载时稍微谈到了这一点,但实际上在工具调用边界处进行某种压缩或修剪是相当合理的。
**28:03** I use that in open deep research. Hugging Face actually has a very interesting open deep research implementation. It uses not a tool-calling agent but their code agent implementation. So instead of tool calls as JSON, tool calls are actually code blocks. They go to a coding environment that actually runs the code. And one argument they make there is that they perform some kind of summarization or compaction and only send limited context back to the LLM, leaving the raw tool call output itself, which is often token-heavy since we're talking about deep research, in the environment.
**28:03** 我在开放深度研究中使用了这一点,Hugging Face实际上有一个非常有趣的开放深度研究实现。它实际上使用的不是编码智能体,而是代码智能体的实现。所以工具调用不是JSON,而是代码块。它们进入一个实际运行代码的编码环境。他们提出的一个论点是,他们执行某种摘要或压缩,只将有限的上下文发送回语言模型,将原始工具调用本身(通常消耗大量token,因为我们在环境中谈论深度研究)留在环境中。
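The tool-call-boundary compaction idea can be sketched in a few lines. This is not the Hugging Face implementation, just a minimal illustration of the pattern, with a fake tool and made-up names and limits:

```python
# Sketch of compaction at the tool-call boundary: the environment runs
# the tool, keeps the raw (token-heavy) output on its side, and only a
# bounded excerpt is appended to the LLM's message history. All names
# and the character limit are illustrative.

MAX_OBSERVATION_CHARS = 500

def run_tool_with_compaction(tool, args, messages, raw_store):
    raw = tool(**args)            # full observation, possibly huge
    raw_store.append(raw)         # kept in the environment, not in context
    excerpt = raw[:MAX_OBSERVATION_CHARS]
    if len(raw) > MAX_OBSERVATION_CHARS:
        excerpt += f"\n...[truncated, {len(raw)} chars total]"
    # Only the compacted excerpt enters the LLM's context.
    messages.append({"role": "tool", "content": excerpt})
    return excerpt

def fake_search(query):
    # Stand-in for a token-heavy tool like a web search or page fetch.
    return "result line\n" * 200

messages, raw_store = [], []
run_tool_with_compaction(fake_search, {"query": "context rot"}, messages, raw_store)
```

The key property is that the agent loop never sees more than the excerpt, while the environment still holds the full observation.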
**28:47** So that's another example. Anthropic and their multi-agent researcher also do a kind of summarization of findings. So I think you see pruning show up all over the place. It's pretty intuitive. I think an interesting counter to pruning was made by Manus. They make the point, and the kind of warning, that pruning comes with risk, particularly if it's irreversible. And Cognition kind of hits this too.
**28:47** 所以这是另一个例子,Anthropic和他们的多智能体研究员也进行发现的摘要。所以我认为你会看到修剪到处出现。这很直观。我认为Manus提出了一个有趣的反对修剪的观点。他们指出并警告说,修剪伴随着风险,特别是如果它是不可逆的。Cognition也涉及到这一点。
**29:12** They talk about how we have to be very careful with summarization. You can even fine-tune models to do it effectively. That's actually why Manus has the perspective that you should definitely use context offloading. So perform tool calls, offload the observations to, for example, disk so you have them. Then sure, do some kind of pruning or summarization, like Alessio was asking before, to pass useful information back to the LLM, but you still have that raw context available to you, so you don't have lossy compression or lossy summarization.
**29:12** 他们谈到我们必须非常小心地进行摘要。你甚至可以微调模型以有效地完成它。这实际上就是为什么Manus有这样的观点:你绝对应该使用上下文卸载。所以执行工具调用,将观察结果卸载到例如磁盘,这样你就有了它们。然后当然做某种修剪摘要,就像Alessio之前问的那样,将有用的信息传回语言模型,但你仍然有那个原始上下文可用,所以你没有有损压缩或有损摘要。
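The "offload before you prune" pattern might look roughly like this. A hypothetical sketch, not Manus's implementation; the file naming and the summary step are placeholders:

```python
# Sketch of reversible compaction: before pruning an observation from
# context, write the raw version to disk. The agent's context gets only
# a lossy summary plus a pointer it can use to recover the full text
# later. File layout and naming are illustrative.

import tempfile
from pathlib import Path

def offload_observation(raw: str, summary: str, workdir: Path) -> dict:
    path = workdir / f"obs_{len(list(workdir.iterdir()))}.txt"
    path.write_text(raw)  # lossless copy lives on disk
    # Only the summary and a pointer go back into the LLM's context.
    return {"summary": summary, "path": str(path)}

def reload_observation(pointer: dict) -> str:
    # If the summary turns out to be too lossy, the agent can read
    # the raw observation back from disk.
    return Path(pointer["path"]).read_text()

workdir = Path(tempfile.mkdtemp())
ptr = offload_observation(
    raw="full 50k-token tool result...",
    summary="3 relevant findings about context rot",
    workdir=workdir,
)
```

Because the raw text survives on disk, summarization mistakes are recoverable rather than permanent.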
**上下文中毒 / Context Poisoning**
**29:45** So I think that's an important and useful caveat to note on the point of summarization or pruning: you have to be careful about information loss. This is something that people do disagree on, and I'll just flag this, on pruning mistakes, pruning wrong paths. Manus says keep it in, so the model can learn from the mistakes. Some other people would say that once you've made a mistake, it's going to keep going down that path, so you've got to unwind, or you've just got to prune it and tell it: do not do the thing I know to be wrong. So then you just do the other thing.
**29:45** 所以我认为在摘要或修剪方面,这是一个重要且有用的注意事项,你必须小心信息丢失。这是人们确实存在分歧的问题,我只是标记一下关于修剪错误、修剪错误路径的问题。Manus说保留它,这样你就可以从错误中学习。其他一些人会说,一旦你犯了错误,它会继续沿着那条路径走下去,有一个错误,你必须回退,或者你只需要修剪它并告诉它不要做我知道是错误的事情。所以你就做另一件事。
**30:23** I don't know if you have an opinion, but I would call this out. There was someone that spoke yesterday who disagreed with this. So that's actually very interesting. So Drew Breunig has a really nice blog post on context failure modes. He has a few. I'm just going to pull it up. Yes, this one: context poisoning. This is interesting.
**30:23** 我不知道你是否有意见,但我想指出这一点。昨天有人发言不同意这一点。这实际上非常有趣。Drew Breunig有一篇关于上下文失败模式的非常好的博客文章。他有几个。我只是...是的。这个上下文中毒。这很有趣。
**30:43** Drew Breunig has a nice blog post that hits this point. He talks about this theme of context poisoning, and apparently Gemini reports on this in their technical report. So he talked about how, for example, a model can produce a hallucination, and that hallucination then is stuck in the history of the agent, and it can kind of poison the context, so to speak, and steer the agent off track.
**30:43** Drew Breunig有一篇很好的博客文章涉及这一点。他谈到了上下文中毒这个主题,显然Gemini在他们的技术报告中报告了这一点。所以他谈到例如一个模型可以产生幻觉,然后那个幻觉就卡在智能体的历史中,它可以所谓的毒害上下文,并使智能体偏离轨道。
**31:04** And I think he cited a very specific example from Gemini 2.5 playing Pokemon that they mentioned in the tech report. So that's one perspective on this issue: we should be very careful about mistakes in context that can poison the context. That's perspective one. Perspective two is, like you were saying, if an agent makes a mistake, for example calling a tool, you should leave that in so it knows how to correct.
**31:04** 我认为他引用了Gemini 2.5玩Pokemon的一个非常具体的例子,他们在技术报告中提到了这一点。所以这是关于我们应该非常小心上下文中可能毒害上下文的错误这个问题的一个观点。这是观点一。观点二就像你说的,如果智能体犯了错误,例如调用工具,你应该保留它,这样它就知道如何纠正。
**31:29** So I think there is an interesting tension there. I will note it does seem that Claude Code will leave failures in. I notice when I work with it, for example, it'll have an error, the error will get printed, and it'll use that to correct. And in my experience when working with agents, in particular for tool call errors, I actually like to keep them in, personally. That's just been my experience.
**31:29** 所以我认为这里存在一个有趣的矛盾。我注意到Claude Code似乎会保留失败。我注意到当我使用它时,例如它会有一个错误,错误会被打印出来,它会用它来纠正。在我与智能体工作的经验中,特别是对于工具调用错误,我个人实际上喜欢保留它们。这只是我的经验。
I don't try to prune them. Also, for what it's worth, it can be kind of tricky to prune from the message history. You have to decide when to do it, so you're introducing a bunch more code you have to manage. So I'm not sure I love the idea of selectively trying to prune your message history when you're building an agent. It can add more logic you need to manage within your agent scaffolding or harness.
**31:46** 我不会尝试修剪它们。另外,值得一提的是,从消息历史中修剪可能有点棘手。你必须决定什么时候做。所以,如果你引入了一堆你必须管理的更多代码。所以,我不确定我是否喜欢在构建智能体时选择性地尝试修剪消息历史的想法。它可以在你的智能体脚手架或框架中添加更多你需要管理的逻辑。
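The "leave failures in" approach is straightforward to sketch in an agent loop. A minimal illustration with a stubbed tool; the function names are invented for the example:

```python
# Sketch of keeping tool-call errors in the message history: when a
# tool raises, the error text is appended as the tool result instead
# of being pruned, so the model can see what went wrong and correct
# itself on the next turn. The model itself is not shown here.

def call_tool(tools, name, args, messages):
    try:
        result = tools[name](**args)
    except Exception as e:
        # Keep the failure in context rather than pruning it.
        result = f"ERROR: {type(e).__name__}: {e}"
    messages.append({"role": "tool", "name": name, "content": str(result)})
    return result

def divide(a, b):
    return a / b

messages = []
# First call fails; the error stays in history.
call_tool({"divide": divide}, "divide", {"a": 1, "b": 0}, messages)
# The model, seeing the error, issues a corrected call.
call_tool({"divide": divide}, "divide", {"a": 1, "b": 2}, messages)
```

Both turns remain in `messages`, which is exactly the behavior described for Claude Code: the error is printed into context and used for self-correction.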
**32:14** That's a classic sort of precision recall, but like sort of reinvented for uh context in a in an agentic workflow. Exactly. Exactly. Right.
**32:14** 这是经典的精确率-召回率问题,但有点像在智能体工作流的上下文中重新发明。完全正确。完全正确。
### 5. 缓存 / Caching
**32:26** While we're on the topic of Drew, Drew is obviously another really good author. He's coined a bunch of context engineering lore. Any other commentary on stuff that you particularly like or disagree with?
**32:26** 既然我们在谈论Drew,Drew显然是另一位非常好的作者。他创造了一堆上下文工程的知识。你知道的你特别喜欢或不同意的其他评论吗?
**32:39** I'll show you something kind of funny, if you go to his post. So he and I did a meetup on this, and I kind of like this quote from Stuart Brand. It was kind of comical: if you want to know where the future is being made, look for where language is being invented and lawyers are congregating.
**32:39** 我给你看一些有趣的东西。如果你去他的帖子。嗯,我们就此做了一次聚会,我有点喜欢Stuart Brand的这句话。这有点滑稽。如果你想知道未来在哪里被创造,寻找语言被发明和律师聚集的地方。
**32:56** So it was talking about this idea of why buzzwords emerge. And he actually was the one who turned me on to this idea that a term like context engineering catches fire because it captures an experience that many people are having. They don't come out of nowhere. And if you scroll down a little bit, he talks about this. He has a whole post about, I think it's how to build a buzzword.
**32:56** 所以,它谈论的是为什么流行词会出现的想法。他实际上是让我接受这个想法的人:像上下文工程这样的术语之所以火起来,是因为它捕捉到了许多人正在经历的经验。它们不是凭空而来的。如果你向下滚动一点,他谈到了这一点。他有一整篇文章,我认为是关于如何构建流行词的。
But he talks a lot about this idea that successful buzzwords capture a common experience that many of us feel, and I think that's the genesis of context engineering too, largely because many of us build agents. Ooh, there are lots of ways that can be quite tricky, and oh, context engineering is kind of what I've been doing. You hear a number of people saying it, and then it kind of resonates. You say, oh, okay, yes, that describes my experience.
**33:14** 但他大量谈论这个想法,成功的流行词捕捉到了我们许多人感受到的共同经验,我认为这就是上下文工程的起源,也主要是因为我们许多人都在构建智能体。哦,有很多方法可能相当棘手,哦,上下文工程就是我一直在做的事情,你听到很多人说,然后你会产生共鸣。你说,哦,好的,是的,这描述了我的经验。
**33:44** So I think that's just an interesting aside on how language emerges anthropologically in different communities. Well, I mean, I will cosign this, because that's exactly what I used to coin or come up with AI engineer. No, exactly. Just because people were trying to hire software engineers that were more up to speed with AI, and engineers wanted to work at companies that would respect their work, you know, and maybe also come out from the baggage of classical ML engineering.
**33:44** 所以,我认为这只是一个有趣的题外话,关于语言是如何在不同社区中人类学地出现的。嗯,我的意思是,我会支持这一点,因为这正是我用来创造或想出AI工程师的方法。AI工程师。不,确实如此。只是因为人们试图招聘更了解AI的软件工程师,工程师想在尊重他们工作的公司工作,你知道,也许还要摆脱经典机器学习工程的包袱。
**34:09** Uh a lot of AI engineers don't even need to use PyTorch because you can just prompt and do typical software engineering. Um and I think that's probably the right way. uh at least in a world where most models are most of the frontier models are coming from closed labs.
**34:09** 很多AI工程师甚至不需要使用PyTorch,因为你可以只使用提示并进行典型的软件工程。我认为这可能是正确的方式。至少在一个大多数前沿模型来自封闭实验室的世界中。
I think an interesting counter on this is, for example, when people try to create language that doesn't really resonate, that doesn't capture common experience, it tends to flop. Which is to say that buzzwords kind of co-evolve with the ecosystem. They tend to become big and resonate because they actually capture experience. Many people try to coin terms that don't actually resonate and go nowhere. Alessio, do you have experience with that?
**34:28** 我认为一个有趣的反例是,当例如人们试图创造不真正产生共鸣、不能捕捉共同经验的语言时,它往往会失败。也就是说,流行词与生态系统共同进化。它们往往会变得很大并产生共鸣,因为它们实际上捕捉到了经验。许多人试图创造实际上不产生共鸣、一无所获的术语。Alessio,你有这样的经历吗?
**34:52** I'm the worst at naming things. But you do a great job, Sean. Yes, you nailed it. The few ones you put on Latent Space. So, that's right.
**34:52** 我最不擅长命名。但你做得很好,Sean。是的,你做得很好。你在latent space上发布的几个。所以,没错。
**缓存的具体实现 / Caching Implementation**
**35:04** Cool. Well, you know, I wanted to talk about context engineering. Okay. So, sorry, I don't know if I sidetracked you a little bit with... No, that's perfect. The meta stuff on that hits a lot of the major themes. I can maybe just talk very briefly about one more. We could talk about the bitter lesson and some other things. Yeah. If you go back to that table, I just wanted to give Manus a shout, because I thought they had one other very interesting point.
**35:04** 很好。嗯,你知道,我想谈谈上下文工程。好的。所以,抱歉,我不知道我是否让你稍微偏离了话题。不,这很完美。关于那个元主题涉及了很多主要主题。我也许可以简单地再谈一个。我们可以谈谈苦涩教训和其他一些事情。是的。如果你回到那个表格,我只是想给Manus一个致敬,因为我认为他们有另一个非常有趣的观点。
**35:27** Oh, the table that you had. Yes, exactly. So we've talked about offloading, reducing context, retrieval, context isolation. Those are, I think, the big ones you see very commonly used. I do want to highlight Manus. I thought they had a very interesting take here about caching. And it's a good argument.
**35:27** 哦,你的那个表格。是的,确实如此。所以,我们已经谈论了卸载、减少上下文、检索、上下文隔离。我认为这些是你可以看到非常常用的大类。我确实想强调Manus。所以,我认为他们在这里关于缓存有一个非常有趣的观点。这是一个很好的论点。
**35:48** When people have the experience of building an agent, the fact that it runs in a loop and that all those prior tool calls are passed back through every time is quite a shock the first time you build an agent. You accumulate tokens with every tool call, and you incur that token cost on every pass through your agent. And so Manus talks about the idea of just caching your prior message history. It's a good idea. I haven't done it personally, but it seems quite reasonable. So caching reduces both latency and cost significantly.
**35:48** 当人们有构建智能体的经验时,它在循环中运行且所有那些先前的工具调用每次都会传回的事实在第一次构建智能体时是相当震惊的。每次工具调用都会累积token,并且每次通过你的智能体都会产生该token成本。所以,Manus谈到了只缓存你的先前消息历史的想法。这是一个好主意。我个人没有这样做,但似乎相当合理。所以,缓存可以显著降低延迟和成本。
**36:20** Yeah. But don't most other APIs auto-cache for you? I mean, if you're using, like, OpenAI, you would just automatically have a cache hit. I'm actually not sure that's the case. For example, when you're building an agent, you're passing your message history back through every time. As far as I know, it's stateless.
**36:20** 是的。但大多数其他API不是为你自动缓存吗?我的意思是,如果你使用OpenAI,你会自动获得缓存命中。我实际上不确定是这样的。例如,当你构建时,你每次都将消息历史传回。据我所知,它是无状态的。
I think there are different APIs for this across the different providers. But especially if you use the Responses API, the new one, it should be that if you're never modifying the state (which is good for you if you believe you shouldn't compress conversation history, bad for you if you do), then you can just use the Responses API. Everything that you passed in prior is going to be cached, which is kind of nice.
**36:37** 我认为不同提供商有不同的API。但特别是如果你只使用responses API,新的那个。应该是这样的,如果你从不修改状态,这对人们有好处,如果你认为不应该压缩对话历史,对你有好处。如果你这样做,对你不好。如果你从不修改状态,那么你可以只使用responses API。你之前传入的所有内容都将被缓存,这很好。
Anthropic used to require a weird header thing, and they've made it more automatic. Yeah. Okay, so that's a good callout. So I had used Anthropic's caching header explicitly in the past, but it may be the case that caching is automatically done for you, which is fantastic if that's the case. I think it's a good callout from Manus. Yeah, Gemini also introduced implicit caching. Yeah, it's really hard to keep up. You basically have to follow everyone on Twitter and just read everything.
Anthropic过去需要奇怪的头部信息,他们已经使其更加自动化了。是的。好的。好的,这是一个很好的提醒。所以我过去明确使用过Anthropic的缓存头,但可能情况是缓存为你自动完成,如果是这样的话那太棒了。我认为这是对Manus的一个很好的提醒。是的,Gemini也引入了隐式缓存。是的,真的很难跟上。你基本上必须在Twitter上关注每个人,并阅读所有内容。
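For reference, the explicit breakpoint Lance mentions looks roughly like this, following the shape Anthropic's Messages API documents for prompt caching (a `cache_control` block on stable prefix content). This sketch only builds the request payload, makes no API call, and the model name is a placeholder:

```python
# Sketch of an explicit prompt-cache breakpoint in the shape Anthropic's
# Messages API uses: "cache_control": {"type": "ephemeral"} on a content
# block marks everything up to that point as cacheable. This only
# constructs the request payload; providers differ, so treat the exact
# field names as illustrative rather than authoritative.

def build_request(system_prompt: str, history: list[dict], new_user_msg: str) -> dict:
    messages = list(history)
    messages.append({"role": "user", "content": new_user_msg})
    return {
        "model": "claude-sonnet-4-20250514",  # placeholder model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # The stable prefix (system prompt here) is cached, so it
                # is not re-processed on every pass through the agent loop.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": messages,
    }

req = build_request("You are an email assistant.", [], "Summarize my inbox.")
```

In an agent loop you would place the breakpoint after the largest stable prefix (system prompt plus tool definitions), since everything after the breakpoint is re-processed normally.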
So that's my bullet point on it. Yeah. Well, you know, it's interesting, though. So APIs are now supporting caching more and more. That's fantastic. I had used Anthropic's explicit caching header in the past. I do think an important and subtle point here is that caching doesn't solve the long context problem.
**37:25** 嗯,这就是我的要点。是的。是的。是的。是的。是的。好吧,你知道,这很有趣。所以API现在越来越多地支持缓存。太棒了。我过去使用过Anthropic的显式缓存头。我确实认为这里一个重要且微妙的观点是,缓存并不能解决长上下文问题。
So it of course solves the problem of latency and cost. But if you still have 100,000 tokens in context, whether it's cached or not, the LLM is utilizing that context. This came up. I actually asked Anton this in their context rot webinar, and they mentioned that the characterization of context rot that they made would, they'd expect, apply whether or not you're using caching.
**37:44** 所以,它当然解决了延迟和成本的问题。但是如果你在上下文中仍然有10万个token,无论它是否被缓存,语言模型都在利用那个上下文。这出现了。我实际上在他们的上下文衰退聚会或网络研讨会上问了Anton这个问题,他们提到他们对上下文衰退的描述,他们认为无论你是否使用缓存,都会适用。
So caching shouldn't actually help you with the context rot and long context problems. It absolutely helps you with latency and cost. Yeah, I do wonder what else can be cached. I feel like this is definitely a form of lock-in, because you ideally want to be able to run prompts across multiple providers and all that, and yeah, caching is a hard problem.
**38:07** 所以缓存实际上不应该帮助你解决所有上下文衰退和长上下文问题。它绝对帮助你解决延迟和成本问题。是的,我确实在想还有什么可以被缓存。我觉得这绝对是一种锁定形式,因为理想情况下你想要能够在多个提供商之间运行提示,所有这些,是的,缓存是一个难题。
Like, I think ultimately you control your destiny if you can run your own open models, because then you also control the caching. Everything else is just a half approximation of that. That's right. That's exactly right.
**38:33** 我认为最终如果你可以运行自己的开源模型,你就控制了自己的命运,因为那样你也控制了缓存,这里其他一切只是对此的一半近似。没错,完全正确。
## 记忆 vs 上下文工程 / Memory vs Context Engineering
That's the overall broad context engineering picture. Alessio, I don't know if you have any other takes from the meetup yesterday, or questions. No, I think my main take from yesterday was about the quality of compacting. I think one of the charts showed that using the automated compacting of, like, opencode and some of these tools is basically the same as not doing it, in terms of the quality of what you get from the previous instructions.
**38:57** 总体广泛的上下文工程。Alessio,我不知道你是否还有其他来自昨天聚会的看法或问题。没有,我认为我昨天的主要看法是关于压缩的质量。我认为有一个图表显示,使用像开源代码和这些工具的自动压缩基本上与不这样做在从先前指令获得的质量上是一样的。
And I think Jeff at Chroma's take is that curated compacting is like 2x better, but I'm like, how do you do curated compacting? I think that's something that maybe we can do a future blog post on. That's interesting to me, like how do you compact, especially for coding agents, where things can get very, very long.
**39:15** 我认为Jeff在Chroma的看法是,精心策划的压缩好2倍,但我想知道如何进行精心策划的压缩?我认为这是我们也许可以在未来的博客文章中讨论的内容。我认为这对我来说很有趣,比如你如何压缩,特别是编码智能体,那里的内容可能会变得非常非常长。
I think for things like deep research, it's like, look, once I get the report, it's fine, you know. But for coding, it's like, well, I would like to keep building. I found that even when you're writing tests or making changes, having the previous history is helpful to the model. It seems to perform better when it knows why it made certain decisions, and how to extract that in a way that is more token-efficient is still unclear.
**39:34** 我认为对于像深度研究这样的事情,一旦我得到报告就可以了,但对于编码来说,我想继续构建。我发现即使当你在编写测试或进行更改时,拥有先前的历史对模型有帮助,当它知道为什么做出某些决定时,它似乎表现得更好,我认为如何以更节省token的方式提取这些信息仍然不清楚。
So I don't have an answer, but maybe that's a request for work by people listening. Yeah, you know, that's a great point. It actually echoes some of Walden and Dan's points from Cognition too, that the summarization compaction step is just non-trivial. You have to be very careful with it. Devin uses a fine-tuned model for doing summarization within the context of coding.
**39:58** 所以我没有答案,但也许像是对听众的一个工作请求。是的,你知道这是一个很好的观点。它实际上呼应了Cognition的Walden和Dan的一些观点,摘要压缩步骤是非平凡的。你必须非常小心。Devin使用微调模型在编码上下文中进行摘要。
So they obviously spend a lot of time and effort on that particular step. And Manus calls out that they are very careful about information loss whenever they do pruning, compaction, or summarization. They always use a file system to offload things so they can retrieve them. So it's a good callout that compaction is risky when you're building agents, and very tricky.
所以他们显然在那个特定步骤上花费了大量时间和精力。Manus指出,每当他们进行修剪、压缩、摘要时,他们对信息丢失非常小心。他们总是使用文件系统来卸载东西,以便他们可以检索它。所以这是一个很好的提醒,当你构建智能体时,压缩是有风险的,非常棘手。
You know, I think there was previously a lot of interest in memory, and I was thinking about the interplay between memory and context engineering. I mean, are they kind of the same thing? Is it just a rebrand? Are there parts of memory... and you know, you guys recently relaunched LangMem, that's also a form of context engineering, but I don't know if there's a qualitative or philosophical difference.
**40:44** 你知道,我认为之前有很多对记忆的兴趣,我一直在思考记忆和上下文工程之间有趣的相互作用。我的意思是它们是同一件事吗?只是换了个名字吗?记忆的某些部分,你知道你们最近重新推出了LangMem,这也是一种上下文工程形式,但我不知道是否存在质的或哲学上的差异。
Yeah, so that's a good thing to hit. I maybe think about this on two dimensions, writing memories and reading memories, and then the degree of automation on both of those. So take the simplest case, which I actually quite like: Claude Code. How do they do it? Well, for reading memories, they just suck in your CLAUDE.md files every time. So every time you spin up Claude Code, it pulls in all your CLAUDE.md files.
是的,这是一个很好的话题。我可能从两个维度来考虑这个问题:写入记忆和读取记忆,然后两者的自动化程度。采用最简单的情况,我实际上很喜欢Claude Code是如何做到的。对于读取记忆,他们每次都吸入你的CLAUDE.md文件。所以每次你启动Claude Code时,它都会拉入你所有的CLAUDE.md文件。
For writing memories, the user specifies, hey, I want to save this to memory, and then Claude Code writes it to CLAUDE.md. So on this axis of degree of automation across read and write, it's kind of the (0,0) point. It's very simple, kind of very Boris-pilled, super simple, and I actually quite like it.
对于写入记忆,用户指定,"嘿,我想将此保存到记忆中",然后Claude Code将其写入CLAUDE.md。所以在读、写的自动化程度轴上,它有点像原点(0,0)。它非常简单,有点像Boris那种风格,超级简单,我实际上非常喜欢。
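The CLAUDE.md scheme described here (read: pull in every memory file at startup; write: only on an explicit user request) can be sketched as follows. File and function names are illustrative, not Claude Code's actual implementation:

```python
# Sketch of the zero-automation memory scheme: reading is "concatenate
# every memory file, no search, no selection"; writing happens only when
# the user explicitly asks to save something. Paths are illustrative.

import tempfile
from pathlib import Path

MEMORY_FILE = "CLAUDE.md"

def read_memories(roots: list[Path]) -> str:
    # Trivial retrieval: suck in every memory file found under each root.
    parts = [p.read_text() for root in roots for p in root.rglob(MEMORY_FILE)]
    return "\n\n".join(parts)

def save_memory(root: Path, note: str) -> None:
    # Only runs on an explicit user request ("save this to memory").
    path = root / MEMORY_FILE
    existing = path.read_text() if path.exists() else ""
    path.write_text(existing + f"- {note}\n")

project = Path(tempfile.mkdtemp())
save_memory(project, "Prefer concise commit messages")
context = read_memories([project])
```

Both axes sit at zero automation: the system never decides on its own what to write or what to read, which is what keeps the behavior predictable.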
**41:54** Now, the other extreme is maybe ChatGPT. So behind the scenes, ChatGPT decides when to write memories, and it decides when to suck them in. And actually, I thought Simon at AI Engineer had a great talk on this. It wasn't about memory, but he hit memory in the talk, and he mentioned, I don't know if you remember this, but it was a failure mode in image generation, because he wanted an image of a particular scene and it sucked in his location and put it in the image, like it sucked in Half Moon Bay or something and put it in the image, and it was a case of memory retrieval gone wrong. He didn't actually want that.
**41:54** 现在另一个极端可能是ChatGPT。所以在幕后,ChatGPT决定何时写入记忆以及何时吸入它们。实际上我认为AI Engineer的Simon有一个很好的演讲,虽然不是关于记忆的,但他在演讲中触及了记忆,他提到——我不知道你是否记得这个——这是图像生成中的一个失败模式,因为他想要一个特定场景的图像,它吸入了他的位置并将其放入图像中,比如吸入了Half Moon Bay之类的东西并将其放入图像中,这是记忆检索出错的一个案例,他实际上并不想要那个。
So even in a product like ChatGPT, which has spent a lot of time on memory, it's non-trivial. And I think my take is, well, the writing of memories is tricky, like when the system should actually write memories is non-trivial. The reading of memories actually kind of converges with the context engineering theme of retrieval. Like, memory retrieval at large scale is just retrieval, right? I kind of view them as... it's retrieval in a certain context, which is your past conversations.
**42:31** 所以即使在像ChatGPT这样在记忆上花费了大量时间的产品中,这也是非平凡的,我认为我的看法是,记忆的写入很棘手,比如系统实际上应该何时写入记忆是非平凡的,记忆的读取实际上有点与检索的上下文主题趋同,比如大规模的记忆检索就是检索,对吧?我有点将它们视为在特定上下文中的检索,即你的过去对话。
That's right. You know, it is different from retrieval from a knowledge base, different from retrieval on the public web. By the way, this is Simon's write-up on his website, here, where he was just trying to generate images and then suddenly it shows up. That's exactly it. So there you go. Actually, it's a subtle point.
**43:02** 没错,你知道它与从知识库检索不同,与在公共网络上检索不同。顺便说一下,这是Simon在他的网站上的文章,他只是试图生成图像,然后突然它出现了,就是这样。所以就是这样。实际上这是一个微妙的观点。
I don't exactly know what OpenAI does under the hood with respect to memory retrieval. My guess is they're indexing your past conversations and using semantic vector search and probably other things. So it may still be using, you know, some kind of knowledge base or vector store for retrieval. So in that sense, I view it simply as, in the case of sophisticated memory retrieval, it is just a complex RAG system, in the same way we talked about with Varun and building Windsurf: it's kind of a multi-step RAG pipeline.
我不确切知道OpenAI在记忆检索方面在后台做什么。我猜他们正在索引你的过去对话,并使用语义向量搜索和可能其他东西。所以它可能仍然在使用某种知识库或向量存储进行检索。所以从这个意义上说,我只是简单地将其视为,你知道,在复杂记忆检索的情况下,它就像一个复杂的RAG系统,就像我们谈论Varun和构建Windsurf时一样,它是一种多步骤的RAG流程。
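The "memory reading is just retrieval" point can be made concrete with a toy retriever over past conversations. A real system would presumably use embeddings and a vector store, as speculated above; simple word overlap stands in here so the sketch stays self-contained:

```python
# Sketch of memory reading as retrieval: past conversations are indexed
# and scored against the current query, and the top hits are injected
# into context. Word overlap is a deliberately crude stand-in for
# semantic vector search; the stored snippets are made up.

def score(query: str, text: str) -> int:
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t)  # number of shared words

def retrieve_memories(query: str, past_conversations: list[str], k: int = 1) -> list[str]:
    ranked = sorted(past_conversations, key=lambda c: score(query, c), reverse=True)
    return ranked[:k]

past = [
    "user said they live near Half Moon Bay",
    "user asked about LangGraph persistence",
]
hits = retrieve_memories("where does the user live", past)
```

Swap the scoring function for an embedding similarity and the corpus for an indexed conversation store, and this is structurally the same pipeline as RAG over any other knowledge base.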
So I kind of view memories, at least the reading part, as just retrieval. And actually, I quite like Claude's approach. It's very simple: the retrieval is trivial, just suck it in every time. Totally. I would also highlight the semantic differences that you've established, you know, episodic, semantic, procedural, and background memory processing. We've done an episode with the Letta folks on sleep-time compute, which, you know, I think these are just... if you have ambient agents, very long-running agents, you're going to run into this kind of context engineering, which was previously the domain of memory, and I would say that the classic context engineering discussion doesn't have this stuff, not yet.
**43:51** 所以我有点将记忆,至少是读取部分,视为你知道的只是可检索的。实际上我很喜欢Claude的方法,非常简单,检索是微不足道的,每次都吸入它。完全正确。我还要强调你已经建立的语义差异,你知道的情节性、语义性、程序性和后台记忆处理。我们与Letta的人做过一集关于睡眠时间计算的节目,你知道我认为这些就像如果你有环境智能体,长时间运行的智能体,你会遇到这种上下文工程,这以前是记忆的领域,我会说经典的上下文工程讨论还没有这些内容。
Yeah, so actually there's an interesting point there. I did a course on building ambient agents, and I built this little email assistant that I use to run my email. I actually think this is a bit of a sidebar on memory: memory pairs really well with human in the loop. So, for example, in my little email assistant, it's just an agent that runs my email. I have the opportunity to pause it before it sends off an email and correct it if I want, like change the tone of this email, or I can literally just modify the tool call. I have a little UI for that.
**44:31** 是的,所以实际上这里有一个有趣的观点。我做了一个关于构建环境智能体的课程,我构建了一个我用来运行电子邮件的小电子邮件助手。我实际上认为这是记忆中的一个题外话。记忆与人机协同配合得非常好。所以,例如,在我的小电子邮件助手中,它只是一个运行我的电子邮件的智能体。我有机会在它发送电子邮件之前暂停它并在我想要时纠正它,比如改变这封电子邮件的语气,或者我可以直接修改工具调用以拥有一个小UI。
And every time with these ambient agents, when you edit or give it feedback, when you edit the tool calls themselves, that feedback can be sucked into memory. And that's exactly what I do. So I actually think memory pairs very nicely with human in the loop. Like, when you're using human in the loop to make corrections to a system, that should be captured in memory. And so that's a very nice way to use memory in a kind of narrow way that's just capturing user preferences over time.
**45:06** 每次当你有这些环境智能体时,你编辑或给它反馈,你编辑工具调用本身,那个反馈可以被吸入记忆。这正是我所做的。所以我实际上认为记忆与人机协同配合得很好。当你使用人机协同来纠正系统时,应该在记忆中捕获。所以这是一种非常好的以狭义方式使用记忆的方法,随着时间的推移只是捕获用户偏好。
And I actually use an LLM to reflect on the changes I made, reflect on the prior instructions in memory, and just update the instructions based upon my edits. And that's a very simple and effective way to use memory when you're building ambient agents that I quite like. There's a course which you can find on GitHub. And yeah, I mean, you know, you guys have done plenty of talks on agents. That's right.
**45:31** 实际上使用语言模型来反思我所做的更改,反思记忆中的先前指令,并根据我的编辑更新指令。当你构建环境智能体时,这是一种非常简单有效的使用记忆的方法,我很喜欢。有一个课程,你可以在GitHub上找到。是的,我的意思是你们已经做了很多关于智能体的演讲。没错。
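The reflection pattern described here (human edit to a draft tool call, then an LLM folds that correction into the stored instructions) might be sketched like this, with the reflection LLM stubbed out by a simple rule so the example stays runnable:

```python
# Sketch of human-in-the-loop feedback feeding memory: when a human
# corrects an agent's draft (e.g. an email tool call), a reflection step
# updates the standing preference instructions. A real system would make
# an LLM call here; a trivial rule stands in for it.

def reflect(instructions: str, draft: str, human_edit: str) -> str:
    # Stub for an LLM prompt like: "Given the prior instructions, the
    # agent's draft, and the user's edit, rewrite the instructions so
    # future drafts match the user's preference."
    return instructions + f"\n- Prefer {human_edit!r} over drafts like {draft!r}"

memory = "Email preferences:"
draft = "Hey!! Quick ping on this."
human_edit = "Hi, following up on the proposal when you have a moment."
memory = reflect(memory, draft, human_edit)
```

The key design choice is narrow scope: memory only accumulates at human-in-the-loop moments, so every stored preference traces back to an explicit correction rather than an automated guess.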
But I think it's a very good point that it's often kind of confusing when to use memory. I think a very clear place to use it is when you're building agents that have human in the loop, because human in the loop is a great place to update your agent memory with your preferences. So it kind of gets smarter over time and learns your preferences. It's exactly what I do with my little email assistant. So Harrison, I think he's said this publicly, uses an email assistant for all his emails. And he gets a lot as a CEO. I get many fewer because I'm just a lowly guy. But I still use it. And that's a very nice way to use memory, to pair it with human in the loop.
**45:55** 但我认为这是一个很好的观点:何时使用记忆常常让人困惑。一个非常明确的使用场景是当你构建具有人机协同的智能体时,因为人机协同是用你的偏好更新智能体记忆的好时机。这样它随着时间推移变得更聪明,学会你的偏好。这正是我用我的小电子邮件助手做的。Harrison(我想他公开说过这一点)用电子邮件助手处理他所有的邮件。作为首席执行官,他收到很多邮件。我收到的要少得多,因为我只是个普通人,但我仍然在用它。所以使用记忆的一个非常好的方式,就是把它与人机协同配对。
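The loop described here, reflect on a human edit and fold it into stored instructions, can be sketched roughly as below. Everything in this sketch (the `toy_llm` stand-in, the prompt wording, the class names) is hypothetical, not LangChain's implementation; a real system would pass an actual model client as `llm`.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical reflection prompt: ask the model to rewrite stored
# instructions in light of a human correction to a tool call.
REFLECTION_PROMPT = """You maintain instructions for an email assistant.
Current instructions:
{instructions}

The user edited the assistant's proposed tool call:
Proposed: {proposed}
Edited:   {edited}

Rewrite the instructions to capture the preference this edit implies."""

@dataclass
class AssistantMemory:
    instructions: str = "Write emails in a neutral tone."

    def update_from_edit(self, llm: Callable[[str], str],
                         proposed: str, edited: str) -> None:
        # Reflect on the human-in-the-loop correction and fold it into
        # the stored instructions, so future drafts start from the
        # user's preferences instead of repeating the same mistake.
        prompt = REFLECTION_PROMPT.format(
            instructions=self.instructions, proposed=proposed, edited=edited)
        self.instructions = llm(prompt)

# Toy stand-in for a model, just to make the flow concrete and runnable.
def toy_llm(prompt: str) -> str:
    return "Write emails in a casual, friendly tone."

memory = AssistantMemory()
memory.update_from_edit(
    toy_llm,
    proposed='send_email(body="Dear Sir, ...")',
    edited='send_email(body="Hey! ...")',
)
```

The design point is that the human edit itself is the training signal: the pause-before-send UI and the memory update share one code path.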
**46:37** Yeah, totally. I've tried to use the email system before, but, you know, I'm still very married to my Superhuman. Yeah, fair enough. That's right. That's right.
**46:37** 是的,完全正确。我以前尝试过使用电子邮件系统,但你知道,我仍然非常依赖我的Superhuman。是的,公平。没错。没错。
## 苦涩教训 / The Bitter Lesson
**46:48** Okay, so cool. I think that was about the coverage that we planned on context. That's great. You have a little bit on the bitter lesson that we could wrap up with. Yeah, that's a fun theme to hit on a little bit. I'd love to hear your perspective. So there's a great talk from Hyung Won Chung, previously OpenAI, now at MSL, on the bitter lesson in his approach to AI research.
**46:48** 好的,很酷。我认为这大约就是我们计划的关于上下文的内容。太好了。你还有一些关于苦涩教训的内容,我们可以用来收尾。是的,这是一个值得稍微展开的有趣主题。我很想听听你的看法。有一个来自Hyung Won Chung(之前在OpenAI,现在在MSL)的精彩演讲,讲的是苦涩教训在他的AI研究方法中的应用。
**47:12** So the take is that compute, for the same cost, 10xes every 5 years. Of course, we all know, and the history of machine learning has shown (yeah, exactly this slide), that actually capturing this scaling is the most important thing. In particular, algorithms that are more general, with fewer inductive biases and more data and compute, tend to beat algorithms with more hand-tuned features and inductive biases built in.
**47:12** 所以观点是:在相同成本下,计算能力每5年增长10倍。当然我们都知道(没错,正是这张幻灯片),机器学习的历史已经表明,真正抓住这种扩展才是最重要的事。特别是,那些更通用、归纳偏差更少、使用更多数据和计算的算法,往往会击败那些内置了更多手工调整特征和归纳偏差的算法。
**47:45** which is to say just letting a machine learn how to think itself with more compute and data rather than trying to teach a machine how we think tends to be better. So that's kind of the bitter lesson piece simply stated. So his argument is this subtle point that at any point in time when you're for example doing research you typically need to add some amount of structure to get the performance you want at a given level of compute. But over time that structure can bottleneck your further progress.
**47:45** 也就是说,让机器用更多的计算和数据自己学会如何思考,而不是试图教机器按我们的方式思考,往往效果更好。这就是简单陈述的苦涩教训。所以他的论点是一个微妙的观点:在任何时间点,比如你在做研究时,你通常需要添加一定量的结构,才能在给定的计算水平上获得你想要的性能。但随着时间的推移,这种结构可能会成为你进一步进展的瓶颈。
**48:12** And that's what he's showing here: in the low-compute regime, on the left of that x-axis, adding more structure, for example more modeling assumptions, more inductive biases, is better than less. But as compute grows, less structure, and this is exactly the bitter lesson point, less structure, more general, tends to win out. So his argument was: we should add structure at a given point in time in order to get something to work with the level of compute that we have today, but remember to remove it later.
**48:12** 这就是他在这里展示的,在低计算体制中,在x轴的左侧,添加更多结构,例如更多建模假设、更多归纳偏差,比更少的结构更好。但随着计算的增长,更少的结构——这正是苦涩教训的观点——更少的结构、更通用的往往会获胜。所以他的论点是,我们应该在给定时间点添加结构,以便使用我们今天拥有的计算水平使某些东西工作,但要记住稍后移除它。
**48:43** And a lot of his argument was that people often forget to remove that structure later. And I think my link here is that this applies to AI engineering too. If you scroll down, I have the same chart showing my own example of building deep research over the course of a year. So I started with a highly structured research workflow. Didn't use tool calling. I embedded a bunch of assumptions about how research should be conducted.
**48:43** 他的很多论点是,人们经常忘记稍后移除那个结构。我认为我在这里的联系是,我认为这也适用于AI工程。如果你向下滚动,我有同样的图表显示我的小例子,这正是我在一年内构建深度研究的例子。所以我从一个高度结构化的研究工作流开始。没有使用工具调用。我嵌入了一堆关于应该如何进行研究的假设。
**49:10** In particular, don't use tool calling, because everyone knows tool calling is not reliable. This was back in 2024. Decompose the problem into a set of sections, write each of those sections in parallel, and combine them into the final report. What I found is you're building LLM applications on top of models that are improving exponentially. So while the workflow was more reliable than building an agent back in 2024, that flipped pretty quickly as LLMs got better and better and better.
**49:10** 特别是,不要使用工具调用,因为每个人都知道工具调用不可靠。这是在2024年。将问题分解为若干部分,并行撰写各部分,再合并成最终报告。我发现,你是在指数级改进的模型之上构建语言模型应用。所以虽然在2024年,这个工作流比构建智能体更可靠,但随着语言模型越来越好,这一点很快就反转了。
**49:40** And so it's exactly like what was mentioned in the Stanford talk. You have to be constantly reassessing your assumptions when you're building AI applications, given the capabilities of the models. And I talk a lot here about the specific structure I added: the fact that I used the workflow because "we know tool calling doesn't work", this was back in 2024, and the fact that I decomposed the problem because it's how I thought research should be performed. And this basically bottlenecked me.
**49:40** 所以这正如斯坦福演讲中提到的。当你根据模型的能力构建AI应用程序时,你必须不断重新评估你的假设。我在这里大量谈论结构,我添加的特定结构,我使用工作流的事实,因为我们知道工具调用不起作用。这是在2024年。我分解问题的事实,因为这是我认为应该进行研究的方式,这基本上成了我的瓶颈。
**50:10** I couldn't use MCP as MCP got, for example, much more popular. I couldn't take advantage of the fact that tool calling was getting significantly better over time. So then I moved to an agent, started to remove structure, allowed for tool calling, let the agent decide the research path. A subtle mistake that I made, which links back to that point about failing to remove structure: I actually wrote the report sections within each sub-agent.
**50:10** 例如,当MCP变得更受欢迎时,我无法使用MCP。我无法利用工具调用随着时间的推移变得显著更好的事实。所以我转向了智能体,开始移除结构,允许工具调用,让智能体决定研究路径。我犯的一个微妙错误,这回到了未能移除结构的那一点。我实际上在每个子智能体中编写了报告部分。
**50:40** So this links back to what we talked about with sub-agents in isolation. Sub-agents don't communicate effectively with one another. So if you write report sections in each sub-agent, the final report is actually pretty disjoint. This is exactly Alessio's challenge and problem with using multi-agent. So I actually hit that exact problem. So I ripped out the independent writing and did a one-shot writing at the end, and this is the current version of open deep research, which is quite good.
**50:40** 这又回到了我们之前谈到的子智能体彼此隔离的问题。子智能体之间无法有效沟通。所以如果你在每个子智能体中撰写报告章节,最终的报告实际上会相当不连贯。这正是Alessio提到的关于使用多智能体的挑战和问题。我确实碰上了那个问题。所以我删掉了独立写作,改为在最后一次性写作,这就是开放深度研究的当前版本,效果相当不错。
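The fix described here, sub-agents return raw findings and a single writer composes the whole report in one shot, might look like this in outline. The toy agents below are placeholders, not the open deep research code; in practice `research_agent` and `writer` would each be LLM calls.

```python
# Sketch: sub-agents research in isolation and return raw findings,
# NOT finished sections. A single writer then sees all findings at
# once, so the final report stays coherent instead of disjoint.
def run_research(topic, subtopics, research_agent, writer):
    findings = [research_agent(s) for s in subtopics]  # parallelizable
    return writer(topic, findings)  # one-shot write at the end

# Toy stand-ins so the flow is concrete and runnable.
def toy_researcher(subtopic: str) -> str:
    return f"notes on {subtopic}"

def toy_writer(topic: str, findings: list) -> str:
    return f"# {topic}\n" + "\n".join(findings)

report = run_research("context engineering",
                      ["offloading", "caching"],
                      toy_researcher, toy_writer)
```

The earlier, bottlenecked design would have had each sub-agent call its own writer; moving the write after the gather step is the whole change.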
**51:04** And at least on deep research benchmarks, it's the best-performing open deep research assistant, at least that's open source. So it was kind of my own arc, although we do have some results with GPT-5 that are quite strong. So, you know, the models are always getting better, and indeed our open source assistant actually takes advantage and rides that wave. But I actually experienced, I felt like I got bitter-lessoned myself, because I started with a system that was very reliable for the state of models back in early-to-mid 2024, but I was completely bottlenecked as models got better.
**51:04** 至少在深度研究基准上,这是表现最好的开源深度研究助手。所以这是我自己的历程,尽管我们确实有一些GPT-5的结果非常强。你知道,模型总是在变得更好,所以我们的开源助手确实在利用并驾驭这个浪潮。但我实际上亲身体会到了苦涩教训:我从一个对2024年初到年中的模型状态非常可靠的系统开始,但随着模型变得更好,我被完全卡住了。
**51:48** I had to rip out the entire system and rebuild it twice, rechecking my assumptions in order to capture the gains of the models. So I just want to flag, I think this is an interesting point: it's hard to build on top of rapidly improving model capability. And actually, I really enjoyed Boris's talk at AI Engineer on Claude Code. That's a very bitter-lesson build. He talks a lot about the fact that they make Claude Code very simple and very general because of this. They want to give users unfettered access to the model without much scaffolding around it.
**51:48** 我不得不两次拆除整个系统并重建,重新检查我的假设,以便捕捉模型的收益。所以我只是想指出,我认为这是一个有趣的观点:在快速改进的模型能力之上构建是困难的。实际上,我真的很喜欢Boris在AI Engineer上关于Claude Code的演讲。那是一个非常符合苦涩教训的构建。他谈了很多:正因为如此,他们把Claude Code做得非常简单和通用。他们想让用户不受限制地访问模型,周围没有太多脚手架。
**52:21** Yeah, exactly. He hits it in one of these slides. Yeah. I don't know where. Yeah. Yeah. But I think it's an interesting consideration in AI engineering that we're building on top of models that are improving exponentially. And one of the points he makes, a corollary of the bitter lesson, is that more general things around the model tend to win. And so when building applications, we should be thinking about this.
**52:21** 是的,确实如此。他在其中一张幻灯片里提到了这一点。是的。我不知道在哪里。是的。是的。但我认为这是AI工程中一个有趣的考虑:我们是在指数级改进的模型之上构建的。他提出的一个观点,苦涩教训的一个推论,就是围绕模型的更通用的东西往往会获胜。所以在构建应用时,我们应该考虑这一点。
**52:47** We should be adding the structure necessary to get things to work today, but keeping a close eye on models improving rapidly, and removing structure in order to unbottleneck ourselves. I think that was my takeaway. So, I really liked the talk from Hyung Won Chung. I think it's worth everyone listening to, and a lot of the lessons apply to AI engineering.
**52:47** 我们应该添加让今天的系统能够工作的必要结构,但要密切关注快速改进的模型,并及时移除结构来解除我们自己的瓶颈。我认为这是我的收获。所以,我真的很喜欢Hyung Won Chung的演讲,值得每个人一听,其中很多教训也适用于AI工程。
**53:11** I think this is similar to incumbents adopting AI by putting it in existing tools, because you already have the workflow, right? You already have all the structure; you just put AI in and it becomes better. But then the AI-native approaches catch up as the models get better, and there's no way for existing products to remove the structure, because the structure is the product, you know. And that's why Cursor and Windsurf are better than VS Code for the AI-native thing: just because they didn't have to deal with removing things.
**53:11** 我认为这类似于现有公司采用AI:把它放进现有工具,因为你已经有了工作流,也就是已经有了所有结构,你只要放入AI,它就变得更好。但随着模型变得更好,AI原生的方法会赶上来,而现有产品没办法移除结构,因为结构本身就是产品。这就是为什么在AI原生这件事上,Cursor和Windsurf比VS Code更好:只是因为它们不必处理移除旧东西的问题。
**53:41** And why Cognition, again, doesn't even think about the IDE as the first thing; the IDE is just a piece of the agent. And so I think you see this in a lot of markets, which is like, hey, again, if you have a workflow and you put AI in, the workflow is better, but the workflow is not the end goal, you know. And so I think we're now at a place where you should just start without a lot of structure, just because now the models are so good. But I think for the first two, two and a half years of the market, the stance was kind of like: should I just put AI into the workflow that works?
**53:41** 以及为什么Cognition甚至不把IDE当作第一位的东西,IDE只是智能体的一部分。我认为你在很多市场上都能看到这一点:如果你有一个工作流并放入AI,工作流会变得更好,但工作流不是最终目标。所以我认为我们现在处于一个应该不带太多结构就直接开始的阶段,因为现在的模型已经足够好。但我认为在这个市场的前两年半里,大家的姿态更像是:我应该把AI放进已经有效的工作流里吗?
**54:11** Should I rewrite the workflow? But the workflow is not that good, because the models are not that good. But I think we're past that point now. That's an amazing example, actually. If you show your chart again, there's another interesting point in it, which is that in the earlier model regime, the structured approach is actually better. And so there's an interesting take on this.
**54:11** 我应该重写工作流,但工作流不是那么好,因为模型不是那么好。但我认为我们已经过了那个点。这实际上是一个惊人的例子。如果你再次展示你的图表,你的图表中还有另一个有趣的观点。这里有一个有趣的观点是,在早期的模型体制中,结构方法实际上更好。所以对此有一个有趣的看法。
**54:29** So Jared Kaplan, a co-founder of Anthropic, has a great talk at startup school from a couple weeks ago, and he mentions this point that oftentimes building products that explicitly don't quite work yet can be a good approach, because the model under them is improving exponentially and it'll unlock the product. We saw that with Cursor. So part of the Cursor lore is that it did not work particularly well, Claude 3.5 hits, and then boom, it unlocks the product.
**54:29** 所以Anthropic的联合创始人Jared Kaplan几周前在创业学校有一个很棒的演讲,他提到:构建那些明确还不太能用的产品,往往可以是一个好方法,因为它们下面的模型在指数级改进,会解锁这个产品。我们在Cursor身上看到了这一点。Cursor传说的一部分就是它一开始工作得不太好,Claude 3.5出现,然后砰,产品就被解锁了。
**55:03** And so you hit that knee of the curve when the model capability catches up to the product needs. But in that earlier regime, the structured approach appears better. So it's this interesting subtle point that for a while the more-structured approach appears better, and then the model finally hits the capability needed to unlock your product, and suddenly your product just takes off.
**55:03** 所以当模型能力赶上产品需求时,你会遇到曲线的那个拐点。但在那个早期体制中,结构方法似乎更好。所以这是一个有趣的微妙观点,即一段时间内更多结构的方法似乎更好,然后模型最终达到解锁你的产品所需的能力,突然你的产品就起飞了。
**55:21** There's another corollary to this: you can get tricked into thinking your structured approach is indeed better, because it will be better for a while, until the model catches up with less structured approaches. Your chart looks very similar to the Windsurf chart. I've got to bring it up, because I was involved in the writing of this one. Isn't that similar? There's the ceiling, you know: you go slow and then, boom. It's the bitter lesson, but in enterprise SaaS.
**55:21** 这还有另一个推论:你可能会被欺骗,以为你的结构化方法确实更好,因为它会好一段时间,直到模型赶上结构更少的方法。你的图表看起来非常像Windsurf的那张图。我必须把它调出来,因为我参与了那篇文章的写作。是不是很相似?有一个天花板,你知道,先是缓慢前进,然后砰地起飞。这是企业SaaS版的苦涩教训。
**56:02** That's right. That's right. Very similar. Very similar. For me, okay, the lines are important, but to me the bullet points are the main thing. If you understand the bullet points, then you can actually learn from the mistakes of others. I spend a lot of effort on the bullet points. Right.
**56:02** 没错。没错。非常相似。非常相似。对我来说,好吧,线条很重要,但要点才是关键。如果你理解了这些要点,你就真的可以从别人的错误中学习。我在要点上花了很多精力。对的。
## LangGraph与框架 / LangGraph and Frameworks
**56:14** Cool. Yeah. I mean, so generally, you know, there is one spicy take on this, which is: how much is LangGraph aligned with the bitter lesson? Yes, obviously you guys are aware of it, so it's not going to be a surprise. But I do think that making abstractions easy to unwind is very important if you believe in the bitter lesson, which you do.
**56:14** 很好。是的。我的意思是,总的来说,有一个尖锐的观点:LangGraph与苦涩教训有多一致?是的,显然你们意识到了这一点,所以这不会令人惊讶。但我确实认为,如果你相信苦涩教训(你们确实相信),那么让抽象易于拆解是非常重要的。
**56:37** No, no, this is super important actually, and I actually talked about this in the post. Yeah, there's an interesting subtlety when you talk about agent frameworks, and a lot of people are anti-framework. I completely understand and am sympathetic to those points. But I think when people talk about frameworks, there are two different things. So there can be a low-level orchestration framework.
**56:37** 不,不,这实际上非常重要,我实际上在帖子中谈到了这一点。是的,当你谈论智能体框架时有一个有趣的微妙之处,很多人反对框架。我完全理解并同情这些观点。但我认为当人们谈论框架时,有两种不同的东西。所以可以有一个低级编排框架。
**56:54** There's a great talk, for example, from Shopify at Anthropic. They built this orchestration framework called Roast internally, and it's basically LangGraph. It's a way to build internal orchestration workflows with LLMs. Like Roast, LangGraph provides you low-level building blocks, nodes, edges, state, which you can compose into agents or into workflows.
**56:54** 例如,Shopify在Anthropic有一个精彩的演讲。他们在内部构建了一个叫Roast的编排框架,基本上就是LangGraph。这是一种用语言模型构建内部编排工作流的方式。和Roast一样,LangGraph为你提供低级构建块:节点、边、状态,你可以把它们组合成智能体,也可以组合成工作流。
**57:28** I don't hate that. I like working with low-level building blocks; they're pretty easy to tear down and rebuild. In fact, I used LangGraph to build open deep research: I had a workflow, I ripped it out, I rebuilt it as an agent. The building blocks are low-level, just nodes, edges, state. But the thing I'm sympathetic to is that in addition to low-level orchestration frameworks, there are also agent abstractions, like `from framework import agent`. That is actually where you can get into more trouble, because you might not know what's behind that abstraction.
**57:28** 我不讨厌那个。我喜欢使用低级构建块,它们很容易拆除重建。事实上,我就是用LangGraph构建开放深度研究的:我有一个工作流,我把它拆掉,重建为智能体。构建块是低级的,只是节点、边、状态。但我能共情的是,除了低级编排框架之外,还有智能体抽象,比如`from framework import agent`。那才是你可能陷入更多麻烦的地方,因为你可能不知道抽象背后是什么。
**58:04** I think when a lot of people are anti-framework, what they're really saying is they're anti-abstraction. They're largely anti-abstraction, which I'm actually very sympathetic to, and I don't particularly like agent abstractions for this exact reason. And I think Walden Yan made a good point: we're very early in the arc of agents, we're like in the HTML era, and agent abstractions are problematic because you don't know what's under the hood of the abstraction; you don't understand it.
**58:04** 我认为当很多人反对框架时,我认为他们真正要说的是他们也反对抽象。他们在很大程度上反对抽象,我实际上非常同情,我不特别喜欢智能体抽象,正是出于这个原因,我认为Walden Yan提出了一个很好的观点,我们在智能体的演进中非常早期,我们就像在HTML时代,智能体抽象是有问题的,因为你不知道抽象背后需要什么,你不理解它。
**58:28** And if I was building, for example, open deep research with an abstraction, I wouldn't necessarily know how to rip it apart and rebuild it when models got better. So I'm actually wary of abstractions. I'm very sympathetic to that part of the critique of frameworks, but I don't hate low-level orchestration frameworks that just provide nodes and edges that you can recombine in any way you want.
**58:28** 如果我用抽象构建例如开放研究,当模型变得更好时,我不一定知道如何拆解它并重建它,所以我实际上对抽象持谨慎态度。我非常同情对框架批评的那一部分,但我不讨厌只提供节点、边的低级编排框架,你可以以任何你想要的方式重新组合它们。
**58:47** And then the question is: why use orchestration at all? Actually, I use LangGraph because you get some nice things: you get checkpointing, you get state management. It's low-level stuff. That's the way I happen to use LangGraph, that's why I like LangGraph, and that's actually why I've found a lot of customers like LangGraph. It's not necessarily for the agent abstraction, which I agree can be much trickier. Some people like agent abstractions. That's completely fine, as long as you understand what's under the hood.
**58:47** 那么问题是为什么要使用编排?实际上,我使用Langraph是因为你得到一些好处,你得到检查点,你得到状态管理。这是低级的东西。这就是我恰好使用Langraph的方式,这就是为什么我喜欢Langraph,这实际上就是为什么我发现很多客户喜欢Langraph。这不一定是为了智能体抽象,我同意这可能更棘手。有些人喜欢智能体抽象。只要你理解背后是什么,这完全没问题。
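As a rough illustration of the "nodes, edges, state" idea, here is a toy orchestrator. To be clear, this is not the LangGraph API, just a sketch of why low-level building blocks are easy to tear down and recombine: each node is a plain function over shared state, and edges are just a routing table.

```python
# Toy graph orchestrator: nodes are functions over a shared state
# dict, edges say which node runs next. Swapping a workflow for an
# agent is just re-wiring nodes and edges; no abstraction to unwind.
class Graph:
    def __init__(self):
        self.nodes, self.edges = {}, {}

    def add_node(self, name, fn):
        self.nodes[name] = fn

    def add_edge(self, src, dst):
        self.edges[src] = dst

    def run(self, start, state):
        node = start
        while node is not None:
            state = self.nodes[node](state)   # a checkpointer could
            node = self.edges.get(node)       # persist state here
        return state

g = Graph()
g.add_node("plan", lambda s: {**s, "plan": f"research {s['topic']}"})
g.add_node("write", lambda s: {**s, "report": s["plan"] + " -> report"})
g.add_edge("plan", "write")
result = g.run("plan", {"topic": "agents"})
```

The point of the sketch is the tear-down story: deleting a node or redirecting an edge is a one-line change, which is what makes this style of framework compatible with the bitter lesson.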
**59:11** But I think that's a very interesting debate about frameworks. I think the critique is it should be made a little bit more on abstractions because often people don't know what's under the hood.
**59:11** 但我认为这是关于框架的一个非常有趣的辩论。我认为批评应该更多地针对抽象,因为人们通常不知道背后是什么。
## MCP与标准化 / MCP and Standardization
**59:22** For those who are looking for resources, it was a bit hard to find the Shopify talk because it's unlisted. Yeah, it's unlisted now. Exactly. I don't know why it's unlisted, but it's a nice talk. I found it through a Chinese ripoff of the talk. Funny. There you go. Yeah, it's actually hard to find now. I think there should be a BrowseComp where you find obscure YouTube videos, because that's something I'm very good at. Kind of my bread and butter.
**59:22** 对于正在寻找资源的人来说,找Shopify的那场演讲有点困难,因为它是未公开列出的。是的,它现在是未列出的。确实。我不知道为什么,但这是一个很好的演讲。我是通过这场演讲的一个中文翻版找到它的。有趣。就是这样。是的,现在确实很难找到。我认为应该有一个专门找冷门YouTube视频的BrowseComp,因为这是我非常擅长的,算是我的看家本领。
**59:47** It's good. And you know what's funny is this talk follows exactly the arc we often see when we're talking to companies about LangGraph. People want to build agents and workflows internally. Everyone rolls their own. It becomes hard to manage, coordinate, and review code in the context of large organizations. It can be very helpful to have a standard library or framework that people are using, with low-level components that are easily composable.
**59:47** 很好。有趣的是,这场演讲完全符合我们与公司谈论LangGraph时经常看到的路径:人们想在内部构建智能体和工作流,每个人都各自造轮子,在大型组织里,管理、协调和审查代码就变得困难。这时,拥有一个大家共用的标准库或框架,提供易于组合的低级组件,会非常有帮助。
**1:00:12** That's what they built with Roast. That's effectively what LangGraph is, and that's why a lot of people like LangGraph. I actually thought the talk on MCP, I believe it was at AI Engineer, it was John Welsh. Yes. I think that was a super underrated talk. I tried yelling about it. No one listened to me. But, you know, if you've listened this far into the podcast, do us a favor: actually listen to John Welsh's talk. It's actually very good. It's very good.
**1:00:12** 这就是他们用Roast构建的。这实际上就是LangGraph的本质,也是很多人喜欢LangGraph的原因。我实际上认为那场关于MCP的演讲(我相信是在AI Engineer,演讲者是John Welsh)是一场被严重低估的演讲。我一直在到处宣传它,但没有人听我的。不过,如果你已经听到播客的这个位置了,帮我们个忙:真的去听听John Welsh的演讲。它非常好。非常好。
**1:00:40** So, actually, he makes the case for a lot of the reasons why, for example, enterprises, large companies, like LangGraph, which is the fact that when tool calling got good within Anthropic, you know, sometime mid last year, he actually makes this point explicitly. Yes, exactly. It's somewhere right around here. Actually, there's a timeline slide if you go back one or two. It's very good.
**1:00:40** 所以,实际上他论证了为什么企业、大公司会喜欢LangGraph的很多原因:当Anthropic的工具调用在去年年中左右变好时,他实际上明确提出了这一点。是的,确实如此。就在这附近。实际上,如果你往回翻一两张,有一个时间线幻灯片。非常好。
**1:01:03** So, this is very interesting. He mentions, okay, Anthropic tool calling gets good in mid-2024. Everyone's building their own integrations. It becomes complete chaos, and that's actually where MCP came from: let's build a standard protocol for accessing tools. Everyone adopts it. Much easier to review, and you minimize cognitive load.
**1:01:03** 所以,这非常有趣。所以,他提到,好的,所以Anthropic的工具调用在2024年年中变好了。每个人都在构建自己的集成。它变成了完全的混乱,这实际上是MCP的来源。让我们构建一种访问工具的标准协议。每个人都采用它。更容易进行审查,你可以最小化认知负担。
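The standardization argument can be made concrete with a toy tool registry. This is not the MCP wire protocol (MCP is a client-server protocol with JSON-RPC transport); it only illustrates the underlying idea that a shared tool schema, name, description, argument schema, lets any client call any tool uniformly instead of every team hand-wiring integrations.

```python
# Toy standard tool interface: every tool is registered with the same
# shape (name, description, args schema, callable), so a client needs
# one code path to discover and invoke all of them.
TOOLS = {}

def register_tool(name, description, args_schema):
    def deco(fn):
        TOOLS[name] = {"description": description,
                       "args_schema": args_schema,
                       "fn": fn}
        return fn
    return deco

@register_tool("search", "Search the web", {"query": "string"})
def search(query: str) -> str:
    return f"results for {query}"

def call_tool(name, **kwargs):
    # One uniform call path; no per-tool integration code on the client.
    return TOOLS[name]["fn"](**kwargs)
```

The chaos the talk describes is what happens when every team invents its own `TOOLS` shape; a protocol just fixes this shape once for everyone.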
**1:01:27** And this is actually the argument for standardized tooling, whether it be frameworks or otherwise, within larger orgs: practicality. His whole talk is making that very pragmatic point, which is actually why people do tend to like frameworks, for example, in large organizations. Agreed. And then ship it as a gateway. That's the other big thing they do.
**1:01:27** 这实际上是在大型组织中标准化工具的论点,无论是框架还是其他,都是实用性。他的整个演讲都在阐述这个非常务实的观点,这实际上就是为什么人们倾向于喜欢例如在大型组织中的框架。同意。然后将其作为网关发布。这是他们做的另一件大事。
## 结语 / Closing
**1:01:48** That's right, Lance, you've been so generous with your time. Thank you. Any shameless plugs, calls to action, stuff like that? Yeah, if you made it this far, thanks for listening. We have a bunch of different courses I've taught: one on ambient agents, one on building open deep research. I actually was very inspired by a tweet Karpathy had a long time ago about building on-ramps.
**1:01:48** 没错,Lance,你为我们付出了这么多时间,非常慷慨。谢谢你。有什么想推广的、行动号召之类的吗?是的,如果你听到这里,感谢你的聆听。我们有好几门我教过的课程:一门关于环境智能体,一门关于构建开放深度研究。我实际上深受Karpathy很久以前一条关于构建入口(on-ramps)的推文的启发。
**1:02:13** So he talked about how he had his micrograd repo. A few people looked at it, but not that many. He made a YouTube video, and that created an on-ramp, and the repo skyrocketed in popularity. So I like this one-two punch of building a thing like open deep research, then creating a class so people can actually understand how to build it themselves. I kind of like that: build a thing, create an on-ramp for it.
**1:02:13** 所以他谈到他有他的micrograd仓库。少数人看了它,但不是很多。他制作了一个YouTube视频,这创造了一个入口,仓库人气飙升。所以我喜欢这种一二组合,构建一个像开放研究这样的东西,然后创建一个课程,让人们可以实际理解如何自己构建它,我有点喜欢这种构建一个东西为它创建一个入口。
**1:02:38** So I have a class on building open deep research; feel free, it's free. It walks through a bunch of notebooks on how I built it, and you can see the agent is quite good. We even have better results coming out soon with GPT-5. So, if you want an open source deep research agent, have a look at it. It's been pretty fun to build, and that's exactly what I talk about in that bitter lesson blog post as well.
**1:02:38** 所以我有一个关于构建开放深度研究的课程,欢迎参加,完全免费。它通过一组笔记本讲解我是如何构建它的,你可以看到这个智能体相当不错。我们甚至很快会发布基于GPT-5的更好结果。所以,如果你想要一个开源的深度研究智能体,可以看看它。构建它非常有趣,这也正是我在那篇苦涩教训博客文章中谈到的。
**1:02:57** Awesome, Lance. Thank you for joining. Yeah, a lot of fun. Great to be on.
**1:02:57** 太棒了,Lance。感谢你的参与。是的,非常有趣。很高兴参加。
---
**翻译完成**
这是一份完整的关于智能体上下文工程的播客文字稿翻译,涵盖了从上下文工程的起源、五大类别(卸载、上下文隔离、检索、减少上下文、缓存)、记忆与上下文工程的关系,到苦涩教训在AI工程中的应用等多个重要主题。