
How Cursor Works Internally?

You might have read the news that OpenAI is buying Windsurf for a whopping $3B! In other news, Anysphere, the parent company of Cursor, is raising $900M at a valuation of $9B! That’s a lot of money for code-generating applications. However, it seems reasonable when you realize Cursor is currently at $300M in revenue. It is supposedly the fastest-growing software-as-a-service business.

The question that’s been bugging me: why are Cursor and Windsurf special? How do they work internally? Aren’t they just VS Code wrappers?

What Cursor Is and What It Does

Cursor is an AI-first code editor built to boost developer productivity by writing and editing code. It’s a fork of Visual Studio Code (aka VS Code), augmented with powerful AI capabilities. Cursor acts like an intelligent pair programmer integrated directly into the IDE, understanding the project and assisting in real time.

But how? It does so by deeply indexing the codebase and learning the coding style of the user. By indexing the complete codebase as vector embeddings, it can catch errors, suggest improvements, and even refactor code easily.

I am assuming everyone has used Cursor at some point, but here’s a quick rundown of the top features of the app:

  • AI Chat (Contextual Assistant) – Cursor provides a chat sidebar where you can converse with an AI about your code. Unlike a generic chat, it’s aware of your current file, cursor location, and project context. You can ask questions like “Is there a bug in this function?” and get answers based on your actual code.
  • Semantic Codebase Search – Cursor can act as a smart search engine for your codebase. Instead of just keyword matching, it uses semantic search to understand the meaning of your query and find relevant code. For example, you might ask in chat, “Where do we configure logging?” and Cursor will retrieve the code snippets or files that likely contain the configuration. Under the hood, Cursor indexes your entire repository by computing embeddings for each file (i.e. numerical vector representations of code semantics). This process allows it to answer codebase-wide questions effectively. It retrieves the most relevant code chunks and feeds them into the AI’s response.
  • Smart Refactoring and Multi-File Edits – Cursor has the superpower to perform large-scale or logical refactors. These refactors are done through natural language commands. It uses a dedicated (and smaller) edit model, distinct from the main LLM that answers queries.
  • Inline Code Completions (Tab Completion) – Similar to GitHub Copilot, Cursor offers inline code completion as you type, but with enhanced “intelligence.” The AI in Cursor’s Tab feature predicts not just the next token, but potentially the next several lines or the next logical edit you might make based on semantic similarities in your code.
  • Additional Productivity Features like Cmd/Ctrl+K – There are also inline commands like Cmd+K for on-demand code generation or editing – you can select a block of code, press a shortcut, and describe an edit (e.g. “optimize this loop”), and the AI will apply it.

How Cursor Works Under the Hood

Now let’s dive into the technical architecture that makes these features possible. At a high level, Cursor consists of a client-side application (the VS Code-based editor) and a set of backend AI services. Below, we’ll look at how the client and server work together to orchestrate language model prompts, code indexing, and the application of edits.

Client-Side Changes vs VS Code

Cursor’s desktop application is built on a fork of VS Code, which means it reuses the core editor, UI, and extension ecosystem of VS Code. This gives a lot of IDE features for free (text editing, syntax highlighting, language server support, debugging, etc.) upon which Cursor layers its own AI features. The client includes custom-made UI elements like the chat sidebar, the Composer panel, and special shortcuts (Tab, Cmd+K) to invoke AI actions. Because it’s a true fork (not just a plugin), Cursor can tightly integrate AI into the workflow – for example, the autocompletion is woven into the editor’s suggestion engine, and the chat can directly modify files.

Building a custom sandbox: Cursor uses language servers (the same ones VS Code uses for languages like Python, TypeScript, Go, etc.) to get real-time information about the code. This provides features like “go to definition,” “find references,” linting errors, and so on. Cursor leverages this in creative ways. Notably, it implements a concept called the “shadow workspace”: essentially a hidden background workspace that the AI can use to safely test changes and get feedback from language servers. For instance, if the AI writes some code, Cursor can spin up a hidden editor window, apply the AI’s changes there (not in your actual open files), and let the language server report any errors or type-check issues. Those diagnostics are fed back to the AI so it can adjust its suggestions before presenting them to you – super cool!
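To make the shadow-workspace idea concrete, here’s a minimal Python sketch of the same loop: mirror the project into a throwaway directory, apply the AI’s proposed edit there, and collect toolchain diagnostics. Everything here (the function name, the use of py_compile as the checker) is illustrative, not Cursor’s actual implementation, which works through a hidden editor window and real language servers.

```python
import pathlib
import py_compile
import shutil
import tempfile

def check_edit_in_shadow(project_dir: str, rel_path: str, new_source: str) -> list[str]:
    """Apply an AI-proposed edit to a disposable copy of the project and
    report compile errors, without touching the user's real files."""
    diagnostics: list[str] = []
    with tempfile.TemporaryDirectory() as shadow:
        shadow_root = pathlib.Path(shadow) / "workspace"
        shutil.copytree(project_dir, shadow_root)          # mirror the project
        target = shadow_root / rel_path
        target.write_text(new_source)                      # apply the proposed edit
        try:
            py_compile.compile(str(target), doraise=True)  # ask the toolchain for feedback
        except py_compile.PyCompileError as err:
            diagnostics.append(str(err))                   # this is what gets fed back to the model
    return diagnostics
```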

In essence, the client provides the AI with a sandboxed development environment, complete with compilers and linters, to improve the accuracy of its code edits. (This is currently done via an invisible Electron window that mirrors your project; future plans involve kernel-level file system proxies for even faster isolation.)

Beyond the shadow workspace, the client also handles things like the @ symbol context insertion (when you reference @File or @Code in a prompt, the client knows to fetch that file’s content or snippet), and manages the UI for applying AI changes (e.g. the “Play” button for instant apply of chat suggestions). If you use the “instant apply” feature in chat or Composer, the client receives the diff or new code from the AI and applies it to the actual files, possibly showing you a preview or performing a safe merge. We’ll discuss how those AI responses are generated next.
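As a rough illustration of that @-reference handling, a client could resolve such mentions with a simple substitution pass before sending the prompt. The @File(...) syntax below is made up for the example; Cursor’s actual parsing and UI differ.

```python
import re
from pathlib import Path

AT_FILE = re.compile(r"@File\((?P<path>[^)]+)\)")

def resolve_at_references(prompt: str, project_root: str) -> str:
    """Replace each @File(...) token with the referenced file's contents,
    delimited so the model can tell user text from code context."""
    def inline(match: re.Match) -> str:
        path = Path(project_root) / match.group("path").strip()
        return f"\n--- {path.name} ---\n{path.read_text()}\n---\n"
    return AT_FILE.sub(inline, prompt)
```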

LLM Orchestration

While some lightweight processing (like splitting code into chunks for indexing) happens locally, the heavy AI lifting is done by Cursor’s cloud backend. When you invoke an AI feature, the client assembles the necessary context (your prompt, selected code, etc.) and sends a request to Cursor’s backend. The backend is responsible for building the final prompt for the large language model, interfacing with the model, and returning the results to the editor. In fact, even if you configure Cursor to use your own OpenAI API key, the requests still funnel through Cursor’s backend for prompt construction and orchestration. This allows Cursor to insert system instructions, code context, and tool-specific formatting around your query before it hits the language model.
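A toy sketch of what such a backend entry point might look like: pick a model for the task, wrap the user’s query with system rules and code context, and call the model. The names, routing rules, and stubs here are assumptions for illustration only.

```python
SYSTEM_RULES = "You are an AI pair programmer inside an IDE..."  # abridged stand-in

def call_model(model: str, messages: list[dict]) -> str:
    """Stub for the real inference call (OpenAI, Anthropic, or an in-house model)."""
    return f"<completion from {model}>"

def handle_request(feature: str, user_prompt: str, context: str) -> str:
    # Hypothetical routing: cheap in-house model for mechanical edits,
    # a frontier model for open-ended chat.
    model = "fast-apply" if feature in ("tab", "cmd_k_edit") else "gpt-4"
    messages = [
        {"role": "system", "content": SYSTEM_RULES},
        {"role": "user", "content": f"{context}\n\n{user_prompt}"},
    ]
    return call_model(model, messages)
```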

Large and small model orchestration: Cursor uses a mix of AI models – both “frontier” large models (like GPT-4 or Claude 3.5) and purpose-built specialized models. For example, for natural-language chat about code or very complex tasks, it might use a top-tier model (GPT-4) to maximize quality. But for faster autocomplete and routine code edits, Cursor has its own optimized models. In fact, the Cursor team trained a custom code model nicknamed “Copilot++” (inspired by OpenAI’s Codex/Copilot) to better predict the next code edits.

They also developed a specialized “fast apply” model for rapidly applying large code changes. This model was fine-tuned on Cursor-specific data, including examples of Cmd+K edit instructions and their corresponding code diffs, and it performs multi-line edits much faster than GPT-4 can. The custom model (built on a 70-billion-parameter Llama base) runs on Cursor’s servers via an inference engine called Fireworks, and it can generate code with extremely high throughput – over 1000 tokens per second – using an advanced technique called speculative decoding. In short, the backend includes an LLM orchestration layer that picks which model to use for a given task, optimizes the prompt, and leverages performance tricks (like parallel token generation) to deliver results with low latency.
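Speculative decoding is easiest to see in a toy greedy form: a cheap draft model guesses several tokens ahead, and the expensive target model keeps the longest agreeing prefix. The sketch below is a sequential simulation; in real systems the target verifies all k guesses in a single batched forward pass, which is where the speedup comes from.

```python
def speculative_decode(target_next, draft_next, prefix: list[int],
                       n_tokens: int, k: int = 4) -> list[int]:
    """Greedy speculative decoding, reduced to a toy. Output is identical to
    decoding with the target model alone; only the cost profile changes."""
    out = list(prefix)
    goal = len(prefix) + n_tokens
    while len(out) < goal:
        draft = []
        for _ in range(k):                       # draft model races ahead
            draft.append(draft_next(out + draft))
        for tok in draft:
            expected = target_next(out)          # target's own prediction here
            if expected == tok:
                out.append(tok)                  # guess verified: keep it
            else:
                out.append(expected)             # mismatch: keep target's token
                break                            # and discard the rest of the draft
    return out[:goal]
```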

Storing embeddings in a vector DB: The backend services also include the vector database that stores code embeddings for your entire project (more on that below), as well as caching layers and routing logic. All communication is designed with privacy and performance in mind: if Privacy Mode is enabled, the backend won’t retain any of your code or data after fulfilling a request. If Privacy Mode is off, Cursor may log some telemetry or anonymized data to improve its models, but even then the raw code is not persisted on their servers long-term.

Codebase Indexing and Semantic Embeddings

Codebase scanning: One of the core enablers of Cursor’s “project awareness” is its code indexing system. When you first open a project in Cursor, it scans and indexes the entire codebase in the background. How does this work? Cursor splits each file into smaller chunks and computes a vector embedding for each chunk. An embedding is essentially a numerical representation that captures the semantic content of the text (in this case, code). Cursor uses either OpenAI’s embedding models or a custom embedding model to generate these vectors. Each chunk’s embedding is stored in a vector database along with metadata – for example, which file and line numbers it came from.
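A minimal sketch of that indexing pass, assuming a hypothetical embed() callable standing in for the embedding model, and naive fixed-size chunks (the next paragraph describes the smarter splitting Cursor actually uses):

```python
def index_codebase(files: dict[str, str], embed) -> list[dict]:
    """Split each file into chunks, embed each chunk, and keep the
    file/line metadata needed to locate matches later."""
    index = []
    for path, source in files.items():
        lines = source.splitlines()
        for start in range(0, len(lines), 40):       # naive 40-line chunks
            chunk = "\n".join(lines[start:start + 40])
            index.append({
                "file": path,
                "start_line": start + 1,             # metadata for retrieval
                "text": chunk,
                "vector": embed(chunk),              # semantic fingerprint
            })
    return index
```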

The chunks are typically on the order of a few hundred tokens each. Splitting code is necessary both to stay within model token limits and to increase the granularity of search. Cursor uses intelligent strategies for chunking – it won’t just cut blindly every N lines. Tools like tree-sitter (which parses source code into syntax trees) help break the code at logical boundaries (functions, classes) so that each chunk is a coherent block of code. This way, when a chunk is retrieved, it contains a complete construct or thought, which is more useful for the AI to see.
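For Python specifically, the standard library’s ast module can approximate that boundary-aware splitting. Cursor uses tree-sitter so it works across languages; this stdlib version just illustrates the idea:

```python
import ast

def chunk_at_boundaries(source: str) -> list[str]:
    """Split Python source at top-level function/class boundaries so each
    chunk is a coherent construct rather than an arbitrary slice."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append("\n".join(lines[node.lineno - 1:node.end_lineno]))
    return chunks
```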

Using RAG: Once the codebase is indexed into embeddings, semantic search becomes possible. For example, when you ask the chat “Find all places where we call the authenticateUser function,” Cursor will convert that query into an embedding vector and query the vector database for nearest matches. It might retrieve several code chunks across different files that look related (calls to that function, its definition, doc comments mentioning it, etc.). These relevant snippets are then brought back into the context window for the language model. In practical terms, Cursor’s AI will include those code snippets in the prompt it builds for the LLM, often with some annotation like file names. This approach – Retrieval-Augmented Generation (RAG) – means the AI isn’t limited to the code in the file you’re currently editing; it can draw upon any part of your project as long as the index finds it relevant.
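Continuing the sketch from the indexing example above, the retrieval step reduces to embedding the query plus a nearest-neighbor search (a real deployment would use a proper vector database rather than a linear scan):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)                  # epsilon guards zero vectors

def retrieve(query: str, index: list[dict], embed, k: int = 5) -> list[dict]:
    """Embed the query and return the k most similar chunks; these get
    spliced into the LLM prompt - the 'retrieval' in RAG."""
    q = embed(query)
    return sorted(index, key=lambda c: cosine(q, c["vector"]), reverse=True)[:k]
```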

This is how Cursor achieves its “whole-project awareness” in practice.

Prompt Construction and Context Management

When you interact with Cursor’s AI (via chat or a command), a lot is happening behind the scenes to construct an effective prompt for the language model. Cursor’s backend takes various context sources and weaves them together in a prompt according to a certain format. These sources include: the user’s query or instruction, the code context (from the open file or retrieved via semantic search), possibly additional context like documentation or examples, and the conversation history (for chat). There is also a system or role prompt with instructions that guide the model. Here’s Cursor’s leaked custom system prompt.

Token length issues: Managing the context window is critical because language models have token limits. Cursor therefore employs strategies to maximize useful information in the prompt and omit or compress less relevant data. One strategy is windowing or chunking for large outputs – if the task is to refactor a 1000-line file, Cursor might break the task into smaller sections, process them individually (maybe with the model planning and applying changes section by section), and then stitch the results.
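A bare-bones version of that windowing strategy might look like the following, where ask_llm is a hypothetical call into the model and sections are split on line count for brevity (real splitting would respect syntactic boundaries, as with the chunking above):

```python
def refactor_large_file(source: str, instruction: str, ask_llm, window: int = 200) -> str:
    """Refactor a file too large for one context window: process it in
    line-windowed sections and stitch the rewritten pieces back together."""
    lines = source.splitlines()
    rewritten = []
    for start in range(0, len(lines), window):
        section = "\n".join(lines[start:start + window])
        rewritten.append(ask_llm(f"{instruction}\n\nSection:\n{section}"))
    return "\n".join(rewritten)
```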

Cursor’s system also makes use of abstract syntax tree (AST) analysis and static analysis via language servers to enrich context. For example, if you have an error message or a symbol name in your prompt, Cursor could ask the language server for the definition of that symbol or the type information, and include that in the prompt as additional context. The AI might be told, “Here is the definition of function X from file Y,” to better answer a question about X. This kind of integration between traditional tooling (LSP, AST parsing) and LLM is a key part of Cursor’s design to improve accuracy.
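In a real client this lookup goes through the language server (e.g. a go-to-definition request); the ast-based sketch below fakes the same enrichment for Python only, to show the shape of it:

```python
import ast

def definition_of(symbol: str, source: str) -> str | None:
    """Find a function/class definition by name so it can be pasted into
    the prompt as extra context ('Here is the definition of X from file Y')."""
    lines = source.splitlines()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)) \
                and node.name == symbol:
            return "\n".join(lines[node.lineno - 1:node.end_lineno])
    return None
```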

We touched on the Shadow Workspace earlier – that’s another form of context management. In an iterative editing scenario, the AI might propose a code change, then the hidden workspace is used to check the result (e.g. does it compile?) before finalizing the answer. If the check fails, the AI can get the compiler or linter feedback (like “variable foo is undefined”) and incorporate that into a follow-up prompt (essentially a self-refinement loop). This loop can repeat a few times in the background within a single user command, so that by the time the AI presents a diff to you, it’s more likely to be correct and apply cleanly. Keep in mind, all of this is invisible to the user!
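The loop itself is simple to express. Here ask_llm and check are hypothetical hooks; check could be the shadow-workspace checker sketched earlier:

```python
def refine_until_clean(task: str, source: str, ask_llm, check, max_rounds: int = 3) -> str:
    """Self-refinement: propose an edit, run it through the shadow check,
    and feed any diagnostics back into the next attempt."""
    prompt = task
    candidate = source
    for _ in range(max_rounds):
        candidate = ask_llm(f"{prompt}\n\nCurrent code:\n{candidate}")
        errors = check(candidate)                 # e.g. compile/lint diagnostics
        if not errors:
            break                                 # clean: present the diff to the user
        prompt = f"{task}\nFix these diagnostics first: {errors}"
    return candidate
```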

Applying edits: Another important aspect is how edits are represented and applied. Cursor often has the model produce answers as code edits rather than just plain text explanation. For instance, if you ask it to implement a function, the response might be the full function code block ready to insert. In refactor cases, the AI might output a diff or a list of changes. Cursor’s interface can interpret these and apply them to the project.
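One common convention for structured edits (not necessarily the format Cursor’s models emit) is search/replace pairs, which an editor can apply deterministically and reject when stale:

```python
def apply_edits(source: str, edits: list[dict]) -> str:
    """Apply model output expressed as search/replace pairs.
    Failing loudly on a stale edit beats silently mangling the file."""
    for edit in edits:
        if edit["search"] not in source:
            raise ValueError(f"no match for edit: {edit['search'][:40]!r}")
        source = source.replace(edit["search"], edit["replace"], 1)
    return source
```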

Performance Optimizations and Custom Tooling

Cursor’s team has implemented several optimizations to make the experience feel fast and smooth:

  • Specialized Model Tuning – As mentioned, Cursor fine-tuned its own large language model for code edits (“Fast Apply” model). This model is designed to handle code modifications and multi-file edits more reliably than general models.
  • Speculative Decoding for Speed – Cursor leverages an advanced inference technique via Fireworks called speculative decoding. In normal LLM generation, the model generates tokens sequentially, which can be slow. Speculative decoding allows a second “draft” model to guess ahead and generate multiple tokens in parallel, which the main model then quickly verifies.
  • Caching and Session Optimization – On top of caching file data on the backend, Cursor likely caches embedding results and search results. If you ask two similar questions back to back, the second one can reuse the vector search results from the first, where appropriate, instead of hitting the database again (a minimal caching sketch follows this list).
  • Memory and Resource Management – Running heavy models and multiple editor instances can be resource-intensive. The “shadow workspace” feature, for instance, doubles some resource usage (since a hidden VSCode window with language servers is running). Cursor mitigates this by only launching the shadow workspace on demand and tearing it down after some idle time.
  • Extensibility via MCP – As a forward-looking feature, Cursor supports the Model Context Protocol (MCP). This allows external tools or data sources to be hooked into Cursor’s AI. For example, an MCP plugin could let the AI query your database or fetch documentation from an internal wiki when you ask a question.
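As promised above, here’s a minimal caching sketch: memoizing query embeddings so back-to-back similar searches skip a round trip to the embedding model. The embed stub is hypothetical; the memoization pattern is the point.

```python
from functools import lru_cache

def embed(text: str) -> list[float]:
    """Stand-in for a real embedding call (a network round trip in practice)."""
    return [float(ord(c)) for c in text[:16]]

@lru_cache(maxsize=1024)
def embed_cached(text: str) -> tuple[float, ...]:
    # Returns a tuple so the cached value is immutable and can't be
    # corrupted by a caller mutating a shared list.
    return tuple(embed(text))
```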

In conclusion, Cursor’s engineering marries the capabilities of large language models with the practical tooling of an IDE. Indexing the codebase and using RAG gives the AI a working knowledge of your project. By leveraging the VS Code infrastructure, it provides the AI with compiler/linter feedback and a tight loop for applying changes safely. By orchestrating specialized models and caching, it achieves an impressively responsive user experience. All these layers – from the client UI to the backend model servers – work together to improve the developer experience immensely.

Hence, Cursor is used by more than a million developers!


References

Cursor Documentation and Privacy Policy – Details on codebase indexing (embeddings) and data handling.

Fireworks Blog on Cursor – Cursor’s key features and custom LLM (“fast apply”) performance stats.

Developer Insights – Overview of Cursor as a VS Code fork with whole-codebase intelligence.

Semantic Code Search – Explanation of Cursor’s code chunking, embedding, and RAG approach.

Published by Aditya Rohilla

Software Engineer at Abnormal Security | Ex - Amazon, Samsung, Walmart Labs, Myntra. All things Tech and Startups.
