Large Language Models (LLMs) have emerged as foundational infrastructure in the pursuit of Artificial General Intelligence (AGI). Despite their remarkable capabilities in language perception and generation, current LLMs fundamentally lack a unified and structured architecture for handling memory. They primarily rely on parametric memory (knowledge encoded in model weights) and ephemeral activation memory (context-limited runtime states). While emerging methods like Retrieval-Augmented Generation (RAG) incorporate plaintext memory, they lack lifecycle management and multi-modal integration, limiting their capacity for long-term knowledge evolution. To address this, we introduce MemOS, a memory operating system designed for LLMs that, for the first time, elevates memory to a first-class operational resource. It builds unified mechanisms for representation, organization, and governance across three core memory types: parametric, activation, and plaintext. At its core is the MemCube, a standardized memory abstraction that enables tracking, fusion, and migration of heterogeneous memory while offering structured, traceable access across tasks and contexts. MemOS establishes a memory-centric execution framework with strong controllability, adaptability, and evolvability. It fills a critical gap in current LLM infrastructure and lays the groundwork for continual adaptation, personalized intelligence, and cross-platform coordination in next-generation intelligent systems.
Large Language Models (LLMs) are emerging as a foundational pathway toward Artificial General Intelligence (AGI) [39], yet they remain fundamentally limited in supporting robust memory capabilities. Most current architectures rely on implicit parametric memory (knowledge embedded within massive model weights), which is difficult to interpret [37], update [20], or transfer [13]. Although Retrieval-Augmented Generation (RAG)
incorporates external knowledge sources [3, 8, 10, 11, 38], it effectively serves as an ad hoc textual patch and lacks a structured, unified mechanism for memory management. These architectural shortcomings lead to four critical issues in real-world applications: inability to model long-term and multi-turn conversational states; poor adaptability to evolving knowledge; lack of persistent modeling for user preferences and multi-agent workflows; and the emergence of "memory silos" across platforms, hindering the reuse and migration of prior interactions. At the root of these challenges lies a fundamental oversight: current LLMs do not treat memory as an explicit, schedulable, and governable resource.
To address this, we propose MemOS, a memory operating system designed for large language models. MemOS centers memory units as operational resources and establishes a full lifecycle encompassing memory generation, organization, utilization, and evolution. It offers structured representations, unified interfaces, version control, and access governance to overcome systemic limitations in memory handling. Rather than merely extending the RAG paradigm, MemOS introduces a controllable, adaptable, and evolvable memory infrastructure that empowers LLMs to track knowledge updates, internalize user preferences, and maintain behavioral consistency across platforms. This represents a fundamental shift in language model architecture: from systems that merely perceive and generate to systems that remember, adapt, and grow over time.
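The lifecycle described above (generation, organization, utilization, evolution), together with version control and access governance, can be sketched as a minimal data model. This is an illustrative sketch only: the class, field, and method names (`MemoryUnit`, `owner`, `readable_by`, and so on) are assumptions for exposition, not the MemOS API.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class LifecycleStage(Enum):
    """The four lifecycle stages named in the text (illustrative labels)."""
    GENERATED = auto()
    ORGANIZED = auto()
    UTILIZED = auto()
    EVOLVED = auto()


@dataclass
class MemoryUnit:
    """A hypothetical memory unit with version control and access governance."""
    content: str
    owner: str
    stage: LifecycleStage = LifecycleStage.GENERATED
    versions: list = field(default_factory=list)

    def update(self, new_content: str) -> None:
        # Version control: retain the previous content so updates stay traceable.
        self.versions.append(self.content)
        self.content = new_content

    def readable_by(self, user: str) -> bool:
        # Access governance: a trivial owner-only policy as a placeholder
        # for the richer policies a real system would enforce.
        return user == self.owner


unit = MemoryUnit(content="prefers metric units", owner="alice")
unit.update("prefers metric units; timezone UTC+8")
print(len(unit.versions))  # 1 prior version retained
```

The point of the sketch is the separation of concerns: content, ownership, lifecycle state, and version history are explicit fields rather than implicit byproducts of a prompt or a weight update.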
2 Memory in Large Language Models
Figure 1 Memory (Mem) in LLMs.
Research into LLM memory has progressed through three major stages (see Figure 1).
The first is the Memory Definition and Exploration stage, in which researchers classify and analyze memory mechanisms along dimensions such as parametric vs. non-parametric and short-term vs. long-term memory [7, 23, 30]. For implicit memory, pre-training and adapter-based methods embed knowledge directly into model weights, while knowledge editing techniques enable targeted post hoc modifications [1, 2, 5, 9, 14, 19, 24, 26, 32]. KV-caches and hidden states constitute the core of implicit short-term memory, preserving contextual continuity and guiding generation behavior during inference [6, 16, 25, 27, 28]. Explicit short-term memory typically involves prompt concatenation within the context window, but remains limited by context length constraints [18, 21]. Explicit long-term memory leverages external retrieval mechanisms, increasingly adopting structured formats, such as graphs and trees, to improve semantic integration and retrieval efficiency [8, 15, 31, 35].
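The contrast between the two explicit memory forms can be made concrete with a toy sketch. The tiny "context window" and keyword-overlap retrieval below are deliberate simplifications standing in for real tokenized windows and semantic (or graph/tree-structured) retrieval; all function names are illustrative.

```python
def short_term_prompt(history: list[str], limit: int = 4) -> str:
    """Explicit short-term memory: concatenate turns into the prompt.
    Anything beyond the window limit is silently forgotten."""
    kept = history[-limit:]  # older turns fall out of the window
    return "\n".join(kept)


def long_term_retrieve(store: list[str], query: str, k: int = 1) -> list[str]:
    """Explicit long-term memory: fetch from an external store on demand.
    Keyword overlap stands in for learned semantic retrieval."""
    scored = sorted(
        store,
        key=lambda m: -len(set(m.split()) & set(query.split())),
    )
    return scored[:k]


history = ["turn1", "turn2", "turn3", "turn4", "turn5"]
print(short_term_prompt(history))  # turn1 has been lost to the window limit

store = ["user lives in Berlin", "user likes tea", "meeting moved to Friday"]
print(long_term_retrieve(store, "where does the user live"))
```

The failure mode of the first function (silent forgetting at the window boundary) is exactly the context-length constraint noted above, while the second function survives arbitrarily old facts at the cost of a retrieval step.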
The second stage involves the Emergence of Human-like Memory, where systems optimized for long-term persistence, context awareness, and self-reflection begin to exhibit structural and behavioral patterns reminiscent of human memory. Examples include brain-inspired architectures such as HippoRAG and Memory³ [12, 34], as well as systems like PGRAG and Second-Me [17, 29], which support behavior continuity and personalized memory modeling.
The third stage advances toward Systematic Memory Management, integrating tool-based operations with OS-inspired governance frameworks. This includes toolkits such as EasyEdit and Mem0, which support
explicit memory manipulation [4, 33, 36], as well as systems like Letta [22], which implement paged context management and modular invocation. However, these systems still fall short of providing unified scheduling, lifecycle governance, and memory fusion across roles or agents.
3 MemOS Design Philosophy
As AGI continues to evolve into increasingly complex systems characterized by multi-tasking, multi-role collaboration, and multi-modality, language models must move beyond merely "understanding the world": they must also "accumulate experience," "retain memory," and "continuously evolve." However, prevailing architectures remain anchored in static parameters and lack structured modeling and unified management of memory, rendering them inadequate for supporting knowledge updates, state retention, and personalized adaptation. We propose that treating memory as a first-class resource and building a memory-centric execution paradigm is key to enabling continual adaptation and long-term reasoning in future LLMs.
As shown in Figure 2, traditional scaling laws are approaching diminishing returns. The research paradigm is shifting from data- and parameter-centric pretraining to post-training paradigms focused on alignment and fine-tuning. Yet even this refined approach faces dual challenges: diminishing performance gains and increasing engineering complexity. We posit that the next fundamental leap will arise from the ability to continuously model and schedule memory, enabling LLMs to maintain contextual consistency, adapt to evolving knowledge, and support iterative refinement across tasks.
To this end, we introduce MemOS, a prototype system designed to support a new memory-centric training paradigm, where learning and inference are no longer separate phases but parts of a unified, memory-driven process. MemOS not only enables structured memory storage, interface-level invocation, and lifecycle management, but also provides unified scheduling and version-control mechanisms that constitute the foundational infrastructure for sustainable intelligence evolution. In our design vision, MemOS treats memory as a schedulable core resource, breaking down silos between agents, users, applications, and sessions. It adopts evolution as a central management objective, supporting memory recomposition, migration, and fusion to facilitate long-term capability growth. Simultaneously, governance is a foundational pillar: MemOS integrates access control, traceability, and interpretability mechanisms to ensure safe and compliant model operation in complex environments.
Figure 2 The next leap in model capability evolution hinges on the introduction of memory systems, marking a paradigm shift toward "memory training".
4 MemOS
4.1 Types of Memory in MemOS
In MemOS, memory is not merely a container of knowledge, but serves as the continuous substrate for perception, understanding, and action within the model. To systematically support LLMs in evolving across diverse tasks and scenarios, MemOS classifies memory into three core types: Parametric Memory, Activation Memory, and Plaintext Memory. Each type differs in its representation, lifecycle, and invocation mechanism, collectively forming the multi-layered structure of an intelligent agent's cognitive system.
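One way to picture how a single abstraction could span all three types is a tagged record that carries any of them together with provenance. This is only a rough sketch of the idea behind a MemCube-style abstraction; every field name below (`mem_type`, `payload`, `origin`, `lifespan`) is an assumption for illustration, not the actual MemOS schema.

```python
from dataclasses import dataclass
from typing import Literal

# The three core memory types named in the text.
MemoryType = Literal["parametric", "activation", "plaintext"]


@dataclass
class MemCubeSketch:
    """A toy stand-in for a unified memory record: one format that can carry
    any of the three memory types, plus provenance for traceable access."""
    mem_type: MemoryType
    payload: object   # e.g. a weight delta, a KV-cache handle, or plain text
    origin: str       # provenance: which task or session produced it
    lifespan: str     # e.g. "long-term" vs "ephemeral"


cubes = [
    MemCubeSketch("parametric", {"adapter_delta": "..."}, "fine-tune-job-7", "long-term"),
    MemCubeSketch("activation", {"kv_cache": "..."}, "session-42", "ephemeral"),
    MemCubeSketch("plaintext", "user prefers concise answers", "chat-2024-05", "long-term"),
]

# Uniform access across heterogeneous memory: filter by type, trace by origin.
plaintext = [c for c in cubes if c.mem_type == "plaintext"]
print(plaintext[0].origin)  # chat-2024-05
```

Because every record shares one shape regardless of what it stores, operations like tracking, filtering, and migration can be written once rather than per memory type, which is the property the text attributes to the MemCube abstraction.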
Parametric Memory refers to long-term knowledge encoded directly into model weights through pretraining or fine-tuning, embedded within feedforward and attention layers. It can participate in inference without external retrieval. This memory type underpins fundamental language understanding, general knowledge, and skill modules, serving as the backbone for zero-shot generation and capability-driven agents.