License: arXiv.org perpetual non-exclusive license
arXiv:2409.13731v3 [cs.CL] 26 Sep 2024


*: These authors contributed equally to this work.
†: Corresponding author.

KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation


Lei Liang∗,1, Mengshu Sun∗,1, Zhengke Gui∗,1, Zhongshu Zhu1, Ling Zhong1, Peilong Zhao1,
Zhouyu Jiang1, Yuan Qu1, Zhongpu Bo1, Jin Yang1, Huaidong Xiong1, Lin Yuan1, Jun Xu1,
Zaoyang Wang1, Zhiqiang Zhang1, Wen Zhang2, Huajun Chen2, Wenguang Chen1, Jun Zhou†,1

{leywar.liang, mengshu.sms, zhengke.gzk, jun.zhoujun}@antgroup.com
1Ant Group Knowledge Graph Team, 2Zhejiang University. GitHub: https://github.com/OpenSPG/KAG
(September 26, 2024)
Abstract

The recently developed retrieval-augmented generation (RAG) technology enables the efficient construction of domain-specific applications. However, it also has limitations, including the gap between vector similarity and the relevance of knowledge reasoning, as well as insensitivity to knowledge logic such as numerical values, temporal relations, and expert rules, which hinder the effectiveness of professional knowledge services. In this work, we introduce a professional domain knowledge service framework called Knowledge Augmented Generation (KAG). KAG is designed to address the aforementioned challenges by making full use of the complementary advantages of knowledge graphs (KGs) and vector retrieval, and to improve generation and reasoning performance by bidirectionally enhancing large language models (LLMs) and KGs through five key aspects: (1) LLM-friendly knowledge representation, (2) mutual-indexing between knowledge graphs and original chunks, (3) a logical-form-guided hybrid reasoning engine, (4) knowledge alignment with semantic reasoning, and (5) model capability enhancement for KAG. We compared KAG with existing RAG methods on multi-hop question answering and found that it significantly outperforms state-of-the-art methods, achieving a relative F1 improvement of 19.6% on HotpotQA and 33.5% on 2Wiki. We have successfully applied KAG to two professional knowledge Q&A tasks at Ant Group, E-Government Q&A and E-Health Q&A, achieving significantly higher professionalism compared with RAG methods. Furthermore, we will soon natively support KAG on the open-source KG engine OpenSPG, allowing developers to more easily build rigorous knowledge decision-making or convenient information retrieval services. This will facilitate the localized development of KAG, enabling developers to build domain knowledge services with higher accuracy and efficiency.

1 Introduction

Recently, the rapidly advancing Retrieval-Augmented Generation (RAG) [1, 2, 3, 4, 5] technology has been instrumental in equipping Large Language Models (LLMs) with the capability to acquire domain-specific knowledge. This is achieved by leveraging external retrieval systems, thereby significantly reducing the occurrence of answer hallucinations and allowing for the efficient construction of applications in specific domains. To enhance the performance of RAG systems on multi-hop and cross-paragraph tasks, knowledge graphs, renowned for their strong reasoning capabilities, have been introduced into the RAG technical framework, including GraphRAG [6], DALK [7], SUGRE [8], ToG 2.0 [9], GRAG [10], GNN-RAG [11], and HippoRAG [12].

Although RAG and its optimizations have solved most of the hallucination problems caused by a lack of domain-specific knowledge and real-time updated information, the generated text still lacks coherence and logic, rendering it incapable of producing correct and valuable answers, particularly in specialized domains such as law, medicine, and science where analytical reasoning is crucial. This shortcoming can be attributed to two primary reasons. Firstly, real-world business processes typically necessitate inferential reasoning over the specific relationships between pieces of knowledge to gather the information pertinent to answering a question; RAG, however, commonly relies on the similarity of text or vectors for retrieving reference information, which may lead to incomplete and redundant search results. Secondly, real-world processes often involve logical or numerical reasoning, such as determining whether a set of values increases or decreases over a time series, and the next-token prediction mechanism used by language models is still somewhat weak at handling such problems.

In contrast, the technical methodologies of knowledge graphs can be employed to address these issues. Firstly, KGs organize information using explicit semantics; the fundamental knowledge units are SPO triples, comprising entities and the relationships between them [13]. Entities have explicit types, as do relationships. Entities with the same meaning but different surface expressions can be unified through entity normalization, thereby reducing redundancy and enhancing the interconnectedness of knowledge [14]. During retrieval, query syntax (such as SPARQL [15] and SQL [16]) enables the explicit specification of entity types, mitigating noise from identically named or similar entities, and allows inferential knowledge retrieval by specifying relationships according to query requirements, rather than aimlessly expanding into merely similar neighboring content. Meanwhile, since the query results from knowledge graphs have explicit semantics, they can be used as variables with specific meanings. This enables further utilization of the LLM's planning and function-calling capabilities [17], where the retrieval results are substituted as variables into function parameters to complete deterministic inferences such as numerical computations and set operations.
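To make this pattern concrete, the following minimal Python sketch (our own illustration; the endpoint stub, query, and data are hypothetical, not from the paper) shows a KG query result being bound to a typed variable and handed to deterministic logic, instead of asking the LLM to judge the trend from raw text:

def kg_query(sparql: str) -> list[dict]:
    # Stub standing in for a SPARQL endpoint call; returns rows of bindings.
    return [
        {"year": 2021, "revenue": 120.0},
        {"year": 2022, "revenue": 135.5},
        {"year": 2023, "revenue": 151.2},
    ]

rows = kg_query(
    'SELECT ?year ?revenue WHERE { ?c a :Company ; :name "ACME" ; '
    ':report ?r . ?r :year ?year ; :revenue ?revenue } ORDER BY ?year'
)

# The retrieval result is substituted into function parameters, so
# "did revenue increase?" becomes an exact numerical check rather than
# a next-token prediction.
revenues = [row["revenue"] for row in rows]
print("strictly increasing:", all(a < b for a, b in zip(revenues, revenues[1:])))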

To address the above challenges and meet the requirements of professional domain knowledge services, we propose Knowledge Augmented Generation (KAG), which fully leverages the complementary characteristics of KG and RAG techniques. More than merely integrating graph structures into the knowledge base process, it incorporates the semantic types and relationships of the knowledge graph, as well as the Logical Forms commonly used in KGQA (Knowledge Graph Question Answering), into the retrieval and generation process. As shown in Figure 1, this framework involves the optimization of the following five modules:

  • We propose an LLM-friendly knowledge representation framework, LLMFriSPG. Following the DIKW hierarchy of data, information, and knowledge, we upgrade SPG to be friendly to LLMs, making it compatible with both schema-free information extraction and schema-constrained expert knowledge construction on the same knowledge types (such as entity types and event types), and supporting mutual-indexing between the graph structure and the original text chunks, which facilitates the construction of a graph-structure-based inverted index and the unified representation, reasoning, and retrieval of logical forms.
  • We propose a logical-form-guided hybrid solving and reasoning engine. It includes three types of operators: planning, reasoning, and retrieval, transforming natural language questions into a problem-solving process that combines language and symbols. Each step in the process can utilize different operators, such as exact-match retrieval, text retrieval, numerical computation, or semantic reasoning, thereby integrating four distinct problem-solving processes: retrieval, KG reasoning, language reasoning, and numerical computation.
  • We propose a knowledge alignment approach based on semantic reasoning. Domain knowledge is defined as various semantic relations, such as synonymy, hypernymy, and inclusion. Semantic reasoning is performed in both the offline KG indexing and online retrieval phases, allowing fragmented knowledge generated through automated extraction to be aligned and connected through domain knowledge. In the offline indexing phase, this improves the standardization and connectivity of knowledge; in the online Q&A phase, it serves as an accurate bridge between user questions and the index.
  • We propose a model for KAG. To support the capabilities required for the operation of the KAG framework, such as index construction, retrieval, question understanding, semantic reasoning, and summarization, we enhance three specific abilities of general LLMs, natural language understanding (NLU), natural language inference (NLI), and natural language generation (NLG), to achieve better performance in each functional module.

We evaluated the effectiveness of the system on three complex Q&A datasets: 2WikiMultiHopQA[18], MuSiQue[19] and HotpotQA[20]. The evaluation focused on both end-to-end Q&A performance and retrieval effectiveness. Experimental results showed that compared to HippoRAG[12], KAG achieved significant improvements across all three tasks, with F1 scores increasing by 19.6%, 12.2% and 12.5% respectively. Furthermore, retrieval metrics also showed notable enhancements.

KAG is applied in two professional Q&A scenarios within Ant Group: E-Government and E-Health. In the E-Government scenario, it answers users' questions about administrative processes based on a given repository of documents. In E-Health, it responds to inquiries about diseases, symptoms, and treatments, utilizing the provided medical resources. Practical application results indicate that KAG achieves significantly higher accuracy than traditional RAG methods, thereby enhancing the credibility of Q&A applications in professional fields. We will soon natively support KAG on the open-source KG engine OpenSPG, allowing developers to more easily build rigorous knowledge decision-making or convenient information retrieval services.

In summary, we propose a knowledge-augmented technical framework, KAG, targeting professional question-answering scenarios and validate the effectiveness of this framework based on complex question-answering tasks. We present two industry application cases based on Ant Group’s business scenarios and have open-sourced the code to assist developers in building local applications using KAG.

2 Approach

In this section, we first introduce the overall framework of KAG and then discuss five key enhancements in Sections 2.1 to 2.5. As shown in Figure 1, the KAG framework consists of three parts: KAG-Builder, KAG-Solver, and KAG-Model. KAG-Builder builds the offline indexes; in this module, we propose an LLM-friendly knowledge representation framework and mutual-indexing between the knowledge structure and text chunks. In the KAG-Solver module, we introduce a logical-form-guided hybrid reasoning solver that integrates LLM reasoning, knowledge reasoning, and mathematical logic reasoning. Additionally, knowledge alignment by semantic reasoning is used to enhance the accuracy of knowledge representation and retrieval in both KAG-Builder and KAG-Solver. KAG-Model optimizes the capabilities needed by each module based on a general language model, thereby improving the performance of all modules.

Figure 1: The KAG framework. The left side shows KAG-Builder, while the right side displays KAG-Solver. The gray area at the bottom represents KAG-Model.

2.1 LLM Friendly Knowledge Representation

Figure 2: LLMFriSPG: a knowledge representation framework that is friendly to LLMs. Instances and concepts are separated to achieve more effective alignment with LLMs through concepts. In this study, entity instances and event instances are collectively referred to as instances unless otherwise specified. SPG properties are divided into knowledge and information areas, also called static and dynamic areas, which are compatible with decision-making expertise under strong schema constraints and with document-retrieval index knowledge under open information representation. The red dotted line represents the fusion and mining process from information to knowledge. The enhanced document-chunk representation provides traceable and interpretable text context for LLMs.

In order to define a knowledge semantic representation that is friendlier to LLMs, we upgrade SPG in three aspects, deep text-context awareness, dynamic properties, and knowledge stratification, and name the result LLMFriSPG:

\mathcal{M} = \{\mathcal{T}, \rho, \mathcal{C}, \mathcal{L}\}

where \mathcal{M} represents all types defined in LLMFriSPG; \mathcal{T} represents all EntityType (e.g., Person in Figure 2) and EventType classes, together with all pre-defined properties compatible with LPG syntax declarations; and \mathcal{C} represents all ConceptType classes, concepts, and concept relations. It is worth noting that the root node of each concept tree is a ConceptType class compatible with LPG syntax (e.g., TaxoOfPerson in Figure 2), and each concept node has a unique ConceptType class. \rho represents the inductive relations from instances to concepts, and \mathcal{L} represents all executable rules defined on logical relations and logical concepts. For \forall t \in \mathcal{T}:

p_t = \{p_t^c, p_t^f, p_t^b\}

As shown in Figure 2, p_t represents all properties and relations of type t, where p_t^c is the part pre-defined by domain experts, p_t^f is the part added in an ad-hoc manner, and p_t^b is the set of system built-in properties, namely supporting_chunks, description, summary, and belongTo. For any instance e_i, denote typeof(e_i) as t_k. supporting_chunks is the set of all text chunks containing instance e_i; the chunk generation strategy and the maximum chunk length are defined by the user in the KAG-Builder phase. description carries general descriptive information, and its meaning differs depending on what it is attached to: attached to the type t_k, it signifies the global description of that type; attached to an instance e_i, it represents the general descriptive information of e_i consistent with the original document context. description can effectively assist the LLM in understanding the precise meaning of a specific instance or type and can be used in tasks such as information extraction, entity linking, and summary generation. summary represents the summary of e_i or r_j in the original document context. belongTo represents the inductive semantics from instance to concept; each EntityType or EventType can be associated with a ConceptType through belongTo. It is worth noting that: 1) \mathcal{T} and \mathcal{C} serve different functions. The declaration of t adopts the object-oriented principle to better match the LPG representation [21], while \mathcal{C} is managed as a text-based concept tree; this article does not introduce the SPG semantics in detail. 2) p_t^c and p_t^f can be instantiated separately. That is, they share the same class declaration, but in the instance storage space, pre-defined static properties and dynamically added properties can coexist, and instantiating only one of them is also supported. This better balances the application scenarios of professional decision-making and information retrieval: general information-retrieval scenarios mainly instantiate dynamic properties, while professional decision-making scenarios mainly instantiate static properties, so users can strike a balance between ease of use and rigor according to business requirements. 3) p_t^c and p_t^f share the same conceptual terminology. Concepts are general common-sense knowledge independent of specific documents or instances; linking different instances to the same concept node classifies those instances. We can achieve semantic alignment between the LLM and instances through concept graphs, and concepts can also serve as navigation for knowledge retrieval; details are given in Sections 2.4 and 2.3.
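As a concrete (hypothetical) reading of the above, the Python sketch below models an LLMFriSPG instance, separating expert pre-defined static properties (p_t^c), ad-hoc dynamic properties (p_t^f), and the system built-in properties (p_t^b). The class itself and the concept node name are illustrative; the property values reuse the Jay Chou triples cited in Section 2.2.3.

from dataclasses import dataclass, field

@dataclass
class Instance:
    name: str
    spg_class: str                                       # coarse-grained type t
    static_props: dict = field(default_factory=dict)     # p_t^c: expert pre-defined
    dynamic_props: dict = field(default_factory=dict)    # p_t^f: added ad hoc
    # p_t^b: system built-in properties
    supporting_chunks: list = field(default_factory=list)  # ids of source chunks
    description: str = ""   # description consistent with the document context
    summary: str = ""       # summary in the original document context
    belong_to: str = ""     # inductive link to a fine-grained concept node

jay = Instance(
    name="Jay Chou",
    spg_class="Person",
    static_props={"gender": "Male"},                     # schema-constrained
    dynamic_props={"constellation": "Capricorn",         # schema-free openIE
                   "record company": "Universal Music Group"},
    belong_to="TaxoOfPerson/Singer",                     # illustrative concept
)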

Figure 3: Hierarchical representation of knowledge and information.

In order to more accurately define the hierarchical representation of information and knowledge, as shown in Figure 3, we denote KG_cs as the knowledge layer, which represents domain knowledge that complies with the domain schema constraints and has been summarized, integrated, and evaluated; KG_fr as the graph information layer, which represents graph data such as entities and relations obtained through information extraction; and RC as the raw-chunks layer, which represents the original document chunks after semantic segmentation. The KG_cs layer fully complies with the SPG semantic specification and supports knowledge construction and logical rule definition under strict schema constraints; SPG requires that domain knowledge have pre-defined schema constraints. It offers high knowledge accuracy and logical rigor, but because it relies heavily on manual annotation, the labor cost of construction is relatively high and information completeness is limited. KG_fr shares the same EntityTypes, EventTypes, and concept system with KG_cs and provides an effective information supplement to KG_cs. Meanwhile, the supporting_chunks, summary, and description edges built between KG_fr and RC form an inverted index based on the graph structure, making RC an effective original-text-context supplement to KG_fr with high information completeness. As shown in the right part of Figure 3, in a specific domain application, R(KG_cs), R(KG_fr), and R(RC) respectively represent their knowledge coverage in solving the target domain problems. If the application has higher requirements for knowledge accuracy and logical rigor, more structured domain knowledge must be built, at greater expert cost, to increase the coverage R(KG_cs). Conversely, if the application prioritizes retrieval efficiency and tolerates a certain degree of information loss or error, it should increase the coverage R(KG_fr) to fully utilize KAG's automated knowledge construction capabilities and reduce expert labor.

Figure 4: The pipeline of KAG-Builder for unstructured domain documents. From left to right: first, phrases and triples are obtained through information extraction; then disambiguation and fusion are completed through semantic alignment; finally, the constructed KG is written into storage.

2.2 Mutual Indexing

As illustrated in Figure 4, KAG-Builder consists of three coherent processes: structured information acquisition, knowledge semantic alignment, and graph storage writing. The main goals of this module are: 1) building mutual-indexing between the graph structure and the text chunks to add more descriptive context to the graph structure, and 2) using the concept semantic graph to align different knowledge granularities, reducing noise and increasing graph connectivity.

2.2.1 Semantic Chunking

According to the document's structural hierarchy and the inherent logical connections between paragraphs, a semantic chunking process is implemented based on system-built-in prompts. This semantic chunking produces chunks that adhere both to length constraints (specifically, the LLM's context window size) and to semantic coherence, ensuring that the content within each chunk is thematically cohesive. We define the Chunk EntityType in RC, which includes fields such as id, summary, and mainText. Each chunk obtained after semantic segmentation is written into an instance of Chunk, where id is a composite field consisting of articleID, paraCode, and idInPara, concatenated by the connector # so that consecutive chunks are adjacent in the id space. articleID is the globally unique article ID, paraCode is the paragraph code within the article, and idInPara is the sequential code of each chunk within the paragraph. Consequently, adjacency in content corresponds to adjacency in identifiers. Furthermore, a reciprocal relation is established and maintained between the original document and its segmented chunks, facilitating navigation and contextual understanding across different granularities of the document's content. This structured approach to segmentation not only optimizes compatibility with large language models but also preserves and enhances the document's inherent semantic structure and associations.
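A minimal Python sketch of the Chunk record and its composite id, assuming the articleID#paraCode#idInPara scheme described above (class and field names are illustrative, not the paper's API):

from dataclasses import dataclass

@dataclass
class Chunk:
    article_id: str   # globally unique article ID
    para_code: str    # paragraph code within the article
    id_in_para: int   # sequential code of the chunk within the paragraph
    summary: str
    main_text: str

    @property
    def chunk_id(self) -> str:
        # Consecutive chunks are adjacent in the id space, so adjacency
        # in content maps to adjacency of identifiers.
        return f"{self.article_id}#{self.para_code}#{self.id_in_para}"

print(Chunk("doc42", "3", 1, "…", "…").chunk_id)  # -> doc42#3#1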

Figure 5: An example of the KAG-Builder pipeline.

2.2.2 Information Extraction with More Descriptive Context

Given a dataset, we use a fine-tuning-free LLM (such as GPT-3.5, DeepSeek, or QWen) or our fine-tuned model Hum to extract entities, events, concepts, and relations to construct KG_fr, and subsequently construct the mutual-indexing structure between KG_fr and RC, enabling cross-document links through entities and relations. This process includes three steps. First, it extracts the entity set E = {e_1, e_2, e_3, ...} chunk by chunk; second, it extracts the event set EV = {ev_1, ev_2, ev_3, ...} associated with all entities and iteratively extracts the relation set R = {r_1, r_2, r_3, ...} between all entities in E; finally, it completes all hypernym relations between each instance and its spgClass. To facilitate the subsequent knowledge alignment phase and overcome the low discrimination of knowledge phrases in resources such as Wikidata [22] and ConceptNet [23], in the entity extraction phase we use LLMs to generate the built-in properties description, summary, semanticType, spgClass, and descriptionOfSemanticType for each instance e in one pass. As shown in Figure 2, we store them in the instance storage of e according to the structure e.description, e.summary, <e, belongTo, semanticType>, and <e, hasClass, spgClass>.
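The three-step extraction loop could be sketched as follows; llm is a placeholder for any of the models named above, and the prompt wording and JSON return formats are our own assumptions rather than the paper's prompts:

import json

def llm(prompt: str) -> str:
    raise NotImplementedError  # plug in GPT-3.5 / DeepSeek / QWen / Hum

def extract_chunk(chunk_text: str) -> dict:
    # Step 1: entities, with built-in properties generated in one pass.
    entities = json.loads(llm(
        "Extract entities as JSON objects with fields: name, description, "
        "summary, semanticType, spgClass.\nText: " + chunk_text))
    names = [e["name"] for e in entities]
    # Step 2: events tied to those entities, then relations among entities.
    events = json.loads(llm(f"Extract events involving {names}.\nText: {chunk_text}"))
    relations = json.loads(llm(f"Extract (s, p, o) triples among {names}.\nText: {chunk_text}"))
    # Step 3: hypernym edge from each instance to its spgClass.
    relations += [[e["name"], "hasClass", e["spgClass"]] for e in entities]
    return {"entities": entities, "events": events, "relations": relations}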

2.2.3 Domain Knowledge Injection And Constraints

When openIE is applied to professional domains, irrelevant noise is introduced. Previous research [3, 5, 24] has shown that noisy and irrelevant corpora can significantly undermine the performance of LLMs, and it is a challenge to align the granularity of extracted information with domain knowledge. The domain knowledge alignment capabilities in KAG include: 1) Domain term and concept injection. We use an iterative extraction approach: first, we store domain concepts and terms, together with their descriptions, in the KG storage; second, we extract all instances in the document through openIE and perform vector retrieval to obtain the set E_d of all possibly matching concepts and terms; finally, we add E_d to the extraction prompt and perform another extraction to obtain a set E_d^a that is largely aligned with the domain knowledge. 2) Schema-constrained extraction. In vertical professional domains, the data structure across the documents of a given data source, such as drug instructions, physical examination reports, government affairs documents, online order data, or structured data tables, is highly consistent and is therefore better suited to schema-constrained information extraction; structured extraction also makes knowledge management and quality improvement easier. For details on schema-constrained knowledge construction, please refer to SPG1
1: Official site of SPG: https://spg.openkg.cn/en-US
and OneKE [25]; this article does not introduce them in detail. It is worth noting that, as shown in Figure 2, for a given entity type, such as Person, we can pre-define properties and relations such as name, gender, placeOfBirth, (Person, hasFather, Person), and (Person, hasFriend, Person), and can also extract triples directly through openIE, such as (Jay Chou, spgClass, Person), (Jay Chou, constellation, Capricorn), and (Jay Chou, record company, Universal Music Group). 3) Pre-defined knowledge structures by document type. Professional documents such as drug instructions, government affairs documents, and legal definitions generally have a relatively standardized structure. Each type of document can be defined as an entity type, with its different paragraphs as different properties of the entity. Taking government affairs as an example, we can pre-define a GovernmentAffair EntityType with properties such as administrative division, service procedure, required materials, service location, and target group; the divided chunks then become the values of the corresponding properties. If a user asks "What materials are needed to apply for the housing provident fund in Xihu District?", the chunk corresponding to the required materials property can be returned directly, avoiding possible hallucinations caused by LLM re-generation.
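The iterative term-injection procedure in point 1) could look like the sketch below; openie_extract and vector_search are hypothetical stand-ins for the LLM extraction call and the embedding retrieval over the stored domain vocabulary:

def openie_extract(text: str, glossary: list | None = None) -> list:
    # Stand-in for an LLM openIE pass; on the second pass the retrieved
    # domain terms (glossary) are appended to the extraction prompt.
    return []

def vector_search(term_store: dict, query: str, top_k: int = 3) -> list:
    # Stand-in for embedding search over stored domain concepts/terms.
    return list(term_store)[:top_k]

def aligned_extraction(text: str, term_store: dict) -> list:
    mentions = openie_extract(text)            # pass 1: schema-free openIE
    e_d = []                                   # candidate domain terms E_d
    for m in mentions:
        e_d += vector_search(term_store, m)
    return openie_extract(text, glossary=e_d)  # pass 2: aligned set E_d^a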

2.2.4 Mutual indexing between text chunk vectors and knowledge structures

KAG's mutual-indexing is a knowledge management and storage mechanism that conforms to the LLMFriSPG semantic representation. As described in Section 2.1, it includes four core data structures: 1) Shared schemas: coarse-grained types pre-defined as SPG classes at the project level, including EntityTypes, ConceptTypes, and EventTypes; they serve as high-level categories such as Person, Organization, GEOLocation, Date, Creature, Work, and Event. 2) Instance graph: all event and entity instances in KG_cs and KG_fr; instances constructed through schema-free openIE and through schema-constrained structured extraction are both stored as instances in the KG storage. 3) Text chunks: special entity nodes conforming to the definition of the Chunk EntityType. 4) Concept graph: the core component for knowledge alignment, consisting of a series of concepts and concept relations; concept nodes also act as fine-grained types of instances, and through relation prediction, instance nodes can be linked to concept nodes to obtain their fine-grained semantic types. It also includes two storage structures: 1) KG store: stores the KG data structures in an LPG database such as TuGraph or Neo4j. 2) Vector store: stores text and vectors in a vector storage engine such as ElasticSearch, Milvus, or the vector storage embedded in the LPG engine.
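Putting the data structures and stores together, a write path for the mutual index might look like this sketch, reusing the Instance and Chunk sketches above; the kg_store/vector_store client methods are illustrative placeholders, not the API of a specific TuGraph, Neo4j, ElasticSearch, or Milvus driver:

def write_mutual_index(kg_store, vector_store, instance, chunks, embed):
    # Instance node (entity or event) into the LPG store.
    kg_store.upsert_node(instance.name, labels=[instance.spg_class])
    for chunk in chunks:
        # Chunk as a special entity node, plus its text vector.
        kg_store.upsert_node(chunk.chunk_id, labels=["Chunk"])
        vector_store.upsert(chunk.chunk_id, embed(chunk.main_text))
        # Graph-structure-based inverted index: instance -> supporting chunk.
        kg_store.upsert_edge(instance.name, "supporting_chunks", chunk.chunk_id)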

2.3 Logical Form Solver

In the process of solving complex problems, three key steps are involved: planning, reasoning, and retrieval. Disassembling the question is a planning process that determines the next sub-problem to tackle. Reasoning includes retrieving information based on the disassembled question, inferring the answer from the retrieved results, or re-disassembling the sub-question when the retrieved content cannot answer it. Retrieval finds the content that can serve as a reference for the original question or the disassembled sub-questions.

Algorithm 1 Logical Form Solver
1:  memory ← [ ]
2:  query_cur ← query
3:  for round ∈ (0, n) do
4:      lf_list ← LFPlanner(query_cur)
5:      history ← [ ]
6:      for lf ∈ lf_list do
7:          lf_subquery, lf_func ← lf
8:          retrievals_sub, answer_sub ← Reasoner(lf_subquery, lf_func)
9:          history.append([lf_subquery, retrievals_sub, answer_sub])
10:     end for
11:     memory ← Memory(query, history)
12:     if not Judge(query, memory) then
13:         query_cur ← SupplyQuery(query, memory)
14:     end if
15: end for
16: answer ← Generator(query, memory)
17: return answer
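For readability, here is a direct Python transcription of Algorithm 1. LFPlanner, Reasoner, Memory, Judge, SupplyQuery, and Generator are the LLM- or KG-backed components named above and are passed in as callables; as a small deviation from the pseudocode, we add an early exit once Judge deems the question resolved:

def logical_form_solver(query, n, lf_planner, reasoner,
                        build_memory, judge, supply_query, generator):
    memory = []
    query_cur = query
    for _ in range(n):
        history = []
        for lf_subquery, lf_func in lf_planner(query_cur):
            retrievals_sub, answer_sub = reasoner(lf_subquery, lf_func)
            history.append([lf_subquery, retrievals_sub, answer_sub])
        memory = build_memory(query, history)    # global memory for reflection
        if judge(query, memory):                 # question resolved?
            break
        query_cur = supply_query(query, memory)  # supplementary question
    return generator(query, memory)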

Since interactions between different modules in traditional RAG are based on vector representations of natural language, inaccuracies often arise. Inspired by the logical forms commonly used in KGQA, we designed an executable language with reasoning and retrieval capabilities. This language breaks a question down into multiple logical expressions, each of which may include functions for retrieval or logical operations; the mutual indexing described in Section 2.2 makes this possible. Meanwhile, inspired by ReSP [26], we designed a multi-turn solving mechanism based on reflection and global memory. The KAG solving process, shown in Figure 6 and Algorithm 1, first decomposes the current question query_cur into a list of sub-questions lf_list represented in logical form and performs hybrid reasoning to solve them. If an exact answer can be obtained through multi-hop reasoning over structured knowledge, it returns the answer directly. Otherwise, it reflects on the solution results, storing the answers and retrieval results corresponding to lf_list in global memory and determining whether the question is resolved; if not, it generates supplementary questions and proceeds to the next iteration. Sections 2.3.1, 2.3.2, and 2.3.3 introduce the logical form functions for planning, the logical form for reasoning, and the logical form for retrieval, respectively. In general, the proposed logical form language has the following three advantages:

  • The use of symbolic language enhances the rigor and interpretability of problem decomposition and reasoning.
  • It makes full use of the LLMFriSPG hierarchical representation to retrieve factual and textual knowledge, guided by the symbolic graph structure.
  • It integrates the problem decomposition and retrieval processes, reducing system complexity.
Figure 6: An example of logical form execution. The construction process of the KG on the left is shown in Figure 5, and the overall reasoning and iteration process is shown on the right. First, logical form decomposition is performed based on the user's overall question, and logical-form-guided reasoning is then used for retrieval and reasoning. Finally, Generation determines whether the user's question has been answered; if not, a supplementary question is issued and a new round of logical form decomposition and reasoning begins; if so, Generation directly outputs the answer.

Table 13 illustrates a multi-round scenario consistent with Algorithm 1. Although the exact number of plague occurrences could not be determined in the first round, the extracted information indicates: "Venice, the birthplace of Antonio Vivaldi, experienced the devastating Black Death, also known as the Great Plague. This pandemic, caused by Yersinia pestis, led to 75 to 200 million deaths in Eurasia, peaking in Europe from 1347 to 1351. The plague brought significant upheavals to Europe. Although specific occurrence records in Venice aren't detailed, it's clear the city was impacted during the mid-14th century." As shown in Table 13, after two iterations, the answer determined is: 22 times.
13 说明了与伪代码 17 一致的多轮场景。虽然第一轮无法确定瘟疫发生的确切数量,但我们可以提取信息表明:“ 威尼斯,安东尼奥·维瓦尔第的出生地,经历了毁灭性的黑死病,也被称为大瘟疫。这场由鼠疫耶尔森菌引起的大流行导致欧亚大陆有 75 至 2 亿人死亡,在欧洲的高峰期为 1347 年至 1351 年。瘟疫在欧洲带来了重大动荡。虽然威尼斯的具体事件记录并不详细,但很明显这座城市在 14 世纪中叶受到了影响。 如表 13 所示,经过两次迭代,确定的答案为:22 次

2.3.1 Logical Form Planning

The logical form functions are defined in Table 1, with each function representing an execution action. Complex problems are decomposed by planning a combination of these expressions, enabling reasoning about intricate issues.

Function Name   Function Declaration
Retrieval       Retrieval(s=$s_i$:type[name], p=$p_i$:edge, o=$o_i$:type[name], s.prop=value, p.prop=value, o.prop=value)
Sort            Sort(A, direction=min|max, limit=n)
Math            $math_i$ = Math(expr), where expr is in LaTeX syntax and can be used to perform operations on sets, e.g. count: $\|A\|$, sum: $\sum A$
Deduce          Deduce(left=A, right=B, op=entailment|greater|less|equal)
Output          Output(A, B, ...)
Table 1: Functions of logical form.

Retrieval. According to the knowledge or information retrieved from SPO, s, p, and o should not appear repeatedly in the same expression. Constraints can be applied to s, p, and o for querying. For multi-hop queries, multiple retrievals are required. When the current variable refers to a previously mentioned variable, the variable name must be consistent with the referenced variable name, and only the variable name needs to be provided. The knowledge type and name are only specified at the first reference.

Sort. Sort the retrieved results. $A$ is the variable name for the retrieved subject-predicate-object (SPO) triples ($s_i$, $o_i$, or $s.prop$, $p.prop$, $o.prop$). $direction$ specifies the sorting direction, where $direction=min$ means sorting in ascending order and $direction=max$ means sorting in descending order. $limit=n$ indicates outputting the top-$n$ results.

Math. Perform mathematical calculations. $expr$ is in LaTeX syntax and can be used to perform calculations on the retrieved results (sets) or constants. $math_i$ represents the result of the calculation and can be used as a variable name for reference in subsequent actions.

Deduce. Deduce the retrieval or calculation results to answer the question. $A$ and $B$ can be variable names from the retrieved SPO or constants. The operator $op=entailment|greater|less|equal$ represents that $A$ entails $B$, $A$ is greater than $B$, $A$ is less than $B$, and $A$ is equal to $B$, respectively.

Output. Directly output $A, B, ...$ as the answers, where $A$ and $B$ are variable names that reference previously retrieved or calculated results.
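As an illustration of how these functions compose into a plan, the following hand-written decomposition targets the question behind Table 13, "How many times did the plague occur in the birthplace of Antonio Vivaldi?". It is our own construction: the predicate names birthPlace and occurredEvent are hypothetical schema choices, not taken from the paper's examples.

Step1: Retrieval(s=s1:Person[Antonio Vivaldi], p=p1:birthPlace, o=o1:Place)
Step2: Retrieval(s=o1, p=p2:occurredEvent, o=o2:Event[plague])
Step3: math1 = Math(\|o2\|)
Step4: Output(math1)

Step 2 refers back to the variable o1 introduced in Step 1, so only the variable name is given, consistent with the reference rule above; Step 3 counts the retrieved event set.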

2.3.2 Logical Form for Reasoning

When a query expressed in natural language is used for search, its logic is often fuzzy. Consider "find a picture containing vegetables or fruits" versus "find a picture containing vegetables and fruits": whether text search or vector search is used, the similarity between the two queries is very high, yet the corresponding answers are quite different. The same is true for problems involving logical operations such as AND, OR and NOT, and set operations such as intersection and difference. To this end, we use logical forms to express questions so that explicit semantic relations can be represented. Similar to IRCOT, we decompose the complex original problem and plan various execution actions such as multi-step retrieval, numerical reasoning, logical reasoning, and semantic deduction. Each sub-problem is expressed using logical form functions, and dependencies between sub-questions are established through variable references. The inference resolution process for each sub-question is illustrated in Algorithm 2. In this process, the GraphRetrieval module performs KG structure retrieval according to the logical form clause to obtain structured graph results. Another key module, HybridRetrieval, combines sub-problems expressed in natural language with logical functions for comprehensive retrieval of documents and sub-graph information. To understand how logical functions can be utilized to reason about complex problems, refer to the examples in Table 14.


Algorithm 2 Logical Form Reasoner
Input: each sub-query resulting from the decomposition of a question based on the logical form, along with its logical function, denoted as $lf_{subquery}$ and $lf_{func}$
Output: the retrievals and answer of each sub-query, denoted as $retri_{sub}$ and $answer_{sub}$
1: $retri_{kg}$ ← GraphRetrieval($lf_{subquery}$, $lf_{func}$)
2: if $retri_{kg} \neq None$ and $retri_{kg} > threshold$ then
3:     $retri_{sub}$ ← $retri_{kg}$
4: else
5:     $retri_{doc}$ ← HybridRetrieval($lf_{subquery}$, $retri_{kg}$)
6:     $retri_{sub}$ ← $retri_{kg}$, $retri_{doc}$
7: end if
8: $answer_{sub}$ ← Generator($lf_{subquery}$, $retri_{sub}$)
9: return $retri_{sub}$, $answer_{sub}$
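Expressed as code, the fallback logic of Algorithm 2 is compact. The following is a minimal Python sketch of the reasoner, assuming the retrievers and the generator are supplied as callables and that graph results carry a confidence score; the threshold value is an illustrative choice, as the paper does not fix these interfaces:

from typing import Any, Callable, List, Optional, Tuple

def logical_form_reasoner(
    lf_subquery: str,
    lf_func: str,
    graph_retrieval: Callable[[str, str], Optional[Any]],
    hybrid_retrieval: Callable[[str, Optional[Any]], Any],
    generator: Callable[[str, List[Any]], str],
    threshold: float = 0.8,  # assumed confidence cutoff
) -> Tuple[List[Any], str]:
    # Try exact structured retrieval over the KG first.
    retri_kg = graph_retrieval(lf_subquery, lf_func)
    if retri_kg is not None and retri_kg.score > threshold:
        retri_sub = [retri_kg]
    else:
        # Fall back to hybrid retrieval over chunks plus partial graph results.
        retri_doc = hybrid_retrieval(lf_subquery, retri_kg)
        retri_sub = [retri_kg, retri_doc]
    # An LLM generates the sub-answer from whatever was retrieved.
    answer_sub = generator(lf_subquery, retri_sub)
    return retri_sub, answer_sub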

2.3.3 Logical Form for Retrieval

In naive RAG, retrieval is achieved by calculating the similarity (e.g., cosine similarity) between the embeddings of the question and the document chunks, where the semantic representation capability of the embedding models plays a key role. These mainly include sparse encoders (such as BM25) and dense retrievers (pre-trained language models with BERT-like architectures). Sparse and dense embedding approaches capture different relevance features and can benefit from each other by leveraging complementary relevance information.

Existing methods for combining the two generally ensemble the scores of both search methods, but in practice different search methods suit different questions, especially those requiring multi-hop reasoning. When a query involves proper nouns, people, places, times, numbers, or coordinates, the representation ability of pre-trained models is limited and more accurate text indexes are needed. For queries whose expression is closer to a paragraph of text, such as scenes, behaviors, and abstract concepts, the two approaches may need to be coupled within the same question.
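As a baseline for the score-ensemble combination mentioned above, the following sketch fuses BM25 and dense scores with a weighted sum after min-max normalization; the weight alpha and the normalization scheme are our own illustrative choices, not the paper's method:

def minmax(scores):
    # Normalize a non-empty score list to [0, 1].
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def fuse_scores(bm25_scores, dense_scores, alpha=0.5):
    # Weighted-sum ensemble of sparse and dense relevance scores,
    # given per-document scores in the same order.
    sparse = minmax(bm25_scores)
    dense = minmax(dense_scores)
    return [alpha * s + (1 - alpha) * d for s, d in zip(sparse, dense)]

# Example: rank three chunks by the fused score.
fused = fuse_scores([12.1, 3.4, 7.7], [0.83, 0.41, 0.65])
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)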

In the design of the logical form, the two retrieval methods can be combined effectively. When keyword information is needed as an explicit filtering criterion, selection conditions can be specified within the retrieval function to achieve structured retrieval.

For example, for the query "What documents are required to apply for a disability certificate at West Lake, Hangzhou?", the retrieval function could be represented as Retrieval(s=s1:Event[applying for a disability certificate], p=p1:support_chunks, o=o1:Chunk, s.location=West Lake, Hangzhou). This approach leverages different indices (sparse or dense) to facilitate precise or fuzzy searches as needed.

Furthermore, when structured knowledge in the form of SPO cannot be retrieved using logical functions, alternative approaches can be employed. These include semi-structured retrieval, which involves using logical functions to search through chunks of information, and unstructured retrieval. The latter encompasses methods such as Retrieval-Augmented Generation (RAG), where sub-problems expressed in natural language are used to retrieve relevant chunks of text. This highlights the adaptability of the system to leverage different retrieval strategies based on the availability and nature of the information.

2.4 Knowledge Alignment

Constructing the KG index through information extraction and retrieving based on vector similarity has three significant defects in knowledge alignment:

  • Misaligned semantic relations between knowledge. Specific semantic relations, such as contains, causes and isA, are often required between the correct answer and the query, while the similarity relied upon in the retrieval process is a weak semantic measure that lacks properties and direction, which may lead to imprecise retrieval of content.

  • Misaligned knowledge granularity. The differences in knowledge granularity, noise, and irrelevance introduced by OpenIE pose great challenges to knowledge management. Due to the diversity of language expressions, there are numerous synonymous or similar nodes, resulting in low connectivity between knowledge elements and incomplete retrieval recall.

  • Misalignment with the domain knowledge structure. There is a lack of organized, systematic knowledge within specific domains. Knowledge that should be interrelated appears in a fragmented state, leading to a lack of professionalism in the retrieved content.

To solve these problems, we propose a solution that leverages concept graphs to enhance offline indexing and online retrieval through semantic reasoning. This involves tasks such as knowledge instance standardization, instance-to-concept linking, semantic relation completion, and domain knowledge injection. As described in Section 2.2.2, we add descriptive text information to each instance, concept, or relation in the extraction phase to enhance its interpretability and contextual relevance. Meanwhile, as described in Section 2.2.3, KAG supports the injection of domain concepts and terminology knowledge to reduce the noise problem caused by the mismatch of knowledge granularity in vertical domains. The goal of concept reasoning is to make full use of vector retrieval and concept reasoning to complete concept relations based on the aforementioned knowledge structure, enhancing the accuracy and connectivity of the domain KG. Referring to the definition of SPG concept semantics (Semantic Classification of Concept: https://openspg.yuque.com/ndx6g9/ps5q6b/fe5p4nh1zhk6p1d8), as shown in Table 2, we summarize six semantic relations commonly required for retrieval and reasoning. Additional semantic relations can be added based on the specific requirements of the actual scenario.

<var1, synonym, var2>: A synonym relation means that a word or phrase var2 has the same or nearly the same meaning as another word or phrase var1 in the same language and given context. Example: Fast is a synonym of quick.

<var1, isA, var2>: An isA relation means that a hypernym var2 is more generic or abstract than a given word or phrase var1 and encompasses a broader category that the given word belongs to. Example: Car isA Vehicle.

<var1, isPartOf, var2>: An isPartOf relation means that something var1 is a component or constituent of something larger var2. This relation shows that an item is a part of a bigger whole. Example: Wheel isPartOf Car.

<var1, contains, var2>: A contains relation means that something var1 includes or holds something else var2 within it. This indicates that one item has the other as a subset or component. Example: Library contains books.

<var1, belongTo, var2>: A belongTo relation means that something var1 is an instance of concept var2. Example: Chamber belongTo Legislative Body.

<var1, causes, var2>: A causes relation means that one event or action var1 brings about another var2. This indicates a causal relation where one thing directly results in the occurrence of another. Example: Fire causes smoke.

Table 2: Commonly used semantic relations.
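In code, these six relations can be carried as a small enumeration over semantic triples. The sketch below is our own illustration, not an OpenSPG API; the sample triple anticipates the retrieval example in Section 2.4.2:

from enum import Enum
from typing import NamedTuple

class SemanticRelation(str, Enum):
    SYNONYM = "synonym"
    IS_A = "isA"
    IS_PART_OF = "isPartOf"
    CONTAINS = "contains"
    BELONG_TO = "belongTo"
    CAUSES = "causes"

class SemanticTriple(NamedTuple):
    var1: str
    relation: SemanticRelation
    var2: str

triple = SemanticTriple("cataract patient", SemanticRelation.IS_A, "visually impaired")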

2.4.1 Enhance Indexing

The process of enhancing indexing through semantic reasoning, as shown in Figure 5, is implemented as predicting semantic relations or related knowledge elements among index items using LLMs, and encompasses four strategies:

  • Disambiguation and fusion of knowledge instances. Taking entity instance $e_{cur}$ as an example, first, the one-hop relations and description information of $e_{cur}$ are used to predict synonymous relations to obtain the synonym instance set $E_{syn}$ of $e_{cur}$. Then, the fused target entity $e_{tar}$ is determined from $E_{syn}$. Finally, entity fusion rules are used to copy the properties and relations of the remaining instances in $E_{syn}$ to $e_{tar}$, the names of these instances are added to the synonyms of $e_{tar}$, and the remaining instances are then deleted. A sketch of this strategy follows this list.

  • Prediction of relations between instances and concepts. For each knowledge instance (such as an event or entity), we predict its corresponding concept and add the derived triple $<e_i, belongTo, c_j>$ to the knowledge index. As shown in Figure 5, <Chamber, belongTo, Legislative Body> means that Chamber belongs to Legislative Body in the classification.

  • Completion of concepts and relations between concepts. During the extraction process, we use concept reasoning to complete all hypernym and isA relations between semanticType and spgClass. As shown in Figure 5 and Table 2, we obtain in the extraction phase that the semanticType of Chamber is Legislative Body and its spgClass is Organization. Through semantic completion, we can derive <Legislative Body, isA, Government Agency> and <Government Agency, isA, Organization>, making the triple information of the $KG_{fr}$ space more complete and the connectivity of nodes stronger.
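As an illustration of the first strategy, the sketch below fuses a set of predicted synonym instances into a target entity. The Entity structure and the heuristic of choosing the instance with the most relations as the fusion target are our own assumptions; the paper leaves the concrete fusion rules open:

from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class Entity:
    name: str
    properties: Dict[str, str] = field(default_factory=dict)
    relations: Set[Tuple[str, str]] = field(default_factory=set)  # (predicate, object)
    synonyms: Set[str] = field(default_factory=set)

def fuse_synonyms(e_cur: Entity, e_syn: List[Entity]) -> Entity:
    # Fuse the synonym set E_syn of e_cur into a single target entity e_tar.
    candidates = [e_cur] + e_syn
    e_tar = max(candidates, key=lambda e: len(e.relations))  # illustrative choice of target
    for e in candidates:
        if e is e_tar:
            continue
        # Copy properties and relations, then keep the old name as a synonym.
        for k, v in e.properties.items():
            e_tar.properties.setdefault(k, v)
        e_tar.relations |= e.relations
        e_tar.synonyms.add(e.name)
        # The merged instance would be deleted from the KG store at this point.
    return e_tar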

2.4.2 Enhance Retrieval

In the retrieval phase, we utilize semantic relation reasoning to search the KG index based on the phrases and types in the logical form. For the types, mentions, or relations in the logical form, we combine semantic relation reasoning with similarity retrieval, replacing the traditional similarity-only retrieval. This makes the retrieval path professional and logical, leading to the correct answer. First, the hybrid reasoning performs precise type matching and entity linking. If type matching fails, semantic reasoning is performed instead. As shown in Figure 6, if the type Political Party fails to match, semantic reasoning predicts that Political Party contains Political Faction, and reasoning or path calculation then proceeds from Political Faction.
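A minimal sketch of this match-then-reason fallback, with hypothetical helpers (the predict_related callable stands in for the LLM-based semantic relation prediction):

from typing import Callable, Iterable, List, Set, Tuple

def link_type(
    kg_types: Set[str],
    query_type: str,
    predict_related: Callable[[str], Iterable[Tuple[str, str]]],
) -> List[str]:
    # Exact type matching first.
    if query_type in kg_types:
        return [query_type]
    # Otherwise fall back to semantic relation reasoning (e.g. contains/isA),
    # keeping only predicted types that actually exist in the KG index.
    return [t for _, t in predict_related(query_type) if t in kg_types]

# e.g. predict_related("Political Party") might yield
# [("contains", "Political Faction")], from which reasoning proceeds.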


Take another example. If the user query $q_1$ is "Which public places can cataract patients visit for leisure?" and the document content $d_2$ is "The museum is equipped with facilities to provide barrier-free visiting experience services such as touch, voice interpretation, and fully automatic guided tours for the visually impaired.", it is almost impossible to retrieve $d_2$ based on its vector similarity with $q_1$. However, it is easy to retrieve $d_2$ through the semantic relation <cataract patient, isA, visually impaired>.

2.5 KAG-Model

KAG includes two main computational processes: offline index building, and online query and answer generation. In the era of small language models, these two tasks were typically handled by two separate pipelines, each containing multiple task-specific NLP models. This resulted in high application-system complexity, increased setup costs, and inevitable cascading losses due to error propagation between modules. In contrast, large language models, as a capability complex, can potentially integrate these pipelines into a unified, simultaneous end-to-end reasoning process.

As shown in Figure 7, the indexing and QA processes each consist of similar steps. Both pipelines can be abstracted as classification, mention detection, mention relation detection, semantic alignment, embedding, and chunk-, instance-, or query-focused summarization. Among these, classification, mention detection, and mention relation detection can be categorized as NLU, while semantic alignment and embedding can be grouped under NLI. Finally, chunk-, instance-, or query-focused summarization can be classified under NLG. Thus, we conclude that the three fundamental natural language processing capabilities that a RAG system relies on are NLU, NLI, and NLG.

We focused on exploring methods to optimize these three capabilities, which are introduced in Sections 2.5.1, 2.5.2, and 2.5.3 respectively. Additionally, to reduce the cascading loss caused by chaining models into a pipeline, we further explored methods to integrate multiple inference processes into a single inference. Section 2.5.4 discusses how to equip the model with retrieval capabilities to achieve better performance and efficiency through one-pass inference.

Figure 7: The model capabilities required for KAG.

2.5.1 Natural Language Understanding

NLU is one of the most common foundational tasks in natural language processing, including text classification, named entity recognition, relation extraction, subject and object extraction, trigger detection, event argument extraction, event extraction, and machine reading comprehension. We collected over 30 public datasets to enhance understanding capabilities. Experiments found that simply transforming the original datasets into instruction datasets can achieve results comparable to specialized models on trained tasks, but this approach does not improve the model's NLU capabilities on unseen domains. Therefore, we conducted large-scale instruction reconstruction, designing various instruction synthesis strategies to create an NLU instruction dataset with over 20,000 diverse instructions. Supervised fine-tuning of a given base model on this dataset yields enhanced NLU capabilities in downstream tasks. The instruction reconstruction strategy mainly consists of the following three types.

  • Label bucketing [25]: This strategy focuses on label-guided tasks, where the aim is to extract text based on labels or to map text to specified labels, including classification, NER, RE, and EE. When labels in a dataset always co-occur in the training set, the model may learn this pattern and overfit to the dataset, failing to understand the meaning of each label independently. Therefore, during instruction synthesis, we adopt a polling strategy that designates only one label from each training sample as part of a bucket. Additionally, since some labels have similar semantics and can be confused, we group easily confused labels into a single bucket, allowing the model to learn the semantic differences between such labels more effectively.

  • Flexible and diverse input and output formats: The LLM performs inference by following instructions, and a highly consistent input-output format may cause the model to overfit to specific tasks and fail to generalize to unseen formats. Therefore, we process the input and output formats flexibly. The output is handled as five different formatting instructions as well as two types of natural language instructions, and the output format can dynamically be specified as markdown, JSON, natural language, or any format indicated in the examples.

  • Instruction with task guideline: Traditional NLP training often employs a "sea of questions" approach, incorporating a wide variety of data in the training set. This allows the model to infer task requirements during learning, such as whether to include job titles when extracting personal names. For the training of LLMs, we aim for the model to perform tasks like a professional annotator by comprehending the task description. Therefore, for the collected NLU tasks, we summarize the task descriptions through a process of self-reflection within the LLM, creating training data that includes task descriptions within the instructions. Additionally, to enhance task diversity, we implement heuristic strategies to rephrase the task descriptions and answers. This enables the model to understand the differences between task descriptions more accurately and to complete tasks according to the instructions.

We fine-tuned six foundational models: Qwen2, Llama2, Baichuan2, Llama3, Mistral, and Phi3, and used six understanding benchmarks recorded on OpenCompass for performance validation. Table 3 shows that the KAG-Model achieves a significant improvement on NLU tasks.

Models          C3      WSC     XSum    Lambda  Lcsts   Race    Average
GPT4            95.10   74.00   20.10   65.50   12.30   92.35   59.89
Qwen2           92.27   66.35   18.68   62.39   13.07   88.37   56.86
KAG-Qwen2       92.88   70.19   31.33   66.16   18.53   88.17   61.21
Llama2          81.70   50.96   23.29   63.26   15.99   55.64   48.47
KAG-Llama2      82.36   63.46   24.51   65.22   17.51   68.48   53.59
Baichuan2       84.44   66.35   20.81   62.43   16.54   76.85   54.57
KAG-Baichuan2   84.11   66.35   21.51   62.64   17.27   77.18   54.84
Llama3          86.63   65.38   25.84   36.72   0.09    83.76   49.74
KAG-Llama3      83.40   62.50   26.72   54.07   18.45   81.16   54.38
Mistral         67.29   30.77   21.16   59.98   0.78    73.46   42.24
KAG-Mistral     47.29   39.42   21.54   69.09   17.14   72.42   44.48
Phi3            68.60   42.31   0.60    71.74   3.47    73.18   43.32
KAG-Phi3        85.21   25.94   0.36    71.24   15.49   74.00   45.37
Table 3: Enhancement of natural language understanding capabilities in different LLMs by KAG. The experimental results are based on the OpenCompass framework and tested using the "gen" mode. The evaluation metric for C3, WSC, Lambda, and Race is ACC. XSum and Lcsts are measured using ROUGE-1. Race includes Race-middle and Race-high, and their average is taken.

2.5.2 Natural Language Inference

The NLI task is used to infer the semantic relations between given phrases. Typical NLI tasks include entity linking, entity disambiguation, taxonomy expansion, hypernym discovery, and text entailment. In the context of knowledge base Q&A, due to the diversity and ambiguity of natural language expressions, as well as the subtle and varied types of semantic connections between phrases, further alignment or retrieval of related information through NLI tasks built on NLU is often required. As described in Section 2.4, we categorize the key semantic relations in knowledge base applications into six types. Among these, relations such as isA, isPartOf and contains exhibit directional and distance-based partial-order relations. During the reasoning process, it is crucial to determine these semantic relations accurately in order to advance towards the target answer. In traditional approaches, representation pre-training models and KG completion (KGC) models are often trained separately to reason about semantic relations. However, these KGC models tend to focus on learning graph structures and do not fully utilize the textual semantic information that is essential for semantic graph reasoning. LLMs possess richer intrinsic knowledge and can leverage both semantic and structural information to achieve more precise reasoning outcomes. To this end, we collected a high-quality conceptual knowledge base and ontologies from various domains, creating a conceptual knowledge set that includes 8,000 concepts and their semantic relations. Based on this knowledge set, we constructed a training dataset that includes six different types of conceptual reasoning instructions to enhance the semantic reasoning capabilities of a given base model, thereby providing semantic reasoning support for KAG.

Semantic reasoning is one of the core abilities required in the KAG process. We use NLI tasks and general reasoning Q&A tasks to evaluate the ability of our model; the results are shown in Table 4 and Table 5. The evaluation results indicate that our KAG-Model demonstrates a significant improvement on tasks related to semantic reasoning. First, Table 5 shows that on the Hypernym Discovery task (which is consistent in form with the reasoning required in semantically enhanced indexing and retrieval), our fine-tuned KAG-Llama model significantly outperforms Llama3 and ChatGPT-3.5. In addition, the better performance of our model on CMNLI, OCNLI and SIQA compared with Llama3 in Table 4 shows that our model has good general logical reasoning capabilities.

Models        CMNLI   OCNLI   SIQA
Llama3        35.14   32.1    44.27
KAG-Llama3    49.52   44.31   65.81
Table 4: Enhancement of natural language inference capabilities in different LLMs by KAG. The evaluation metric for CMNLI, OCNLI, and SIQA is accuracy.

Models        1A.English   2A.Medical   2B.Music
ChatGPT-3.5   30.04        26.12        28.47
Llama3-8B     23.47        24.26        18.73
KAG-Llama3    38.26        55.14        30.16
Table 5: Hypernym Discovery performance comparison on the SemEval2018-Task9 dataset, measured in MRR.

2.5.3 Natural Language Generation

Models that have not undergone domain adaptation training often exhibit significant differences from the target text in domain logic and writing style. Moreover, acquiring sufficient annotated data in specialized domains is frequently challenging. Therefore, we established two efficient fine-tuning methods for specific domain scenarios, allowing the generation process to better align with scene expectations: K-LoRA and AKGF.

Pre-learning with K-LoRA. First of all, we regard using knowledge to generate answers as the reverse process of extracting knowledge from text. Therefore, by inverting the previously described extraction process, we can create a "triples-to-text" generation task. With extensive fine-tuning on a multitude of instances, the model can be trained to recognize the information format infused by the KG. Additionally, as the target text is domain-specific, the model can acquire the unique linguistic style of that domain. Furthermore, considering efficiency, we continue to utilize LoRA-based SFT. We refer to the LoRA obtained in this step as K-LoRA.

Alignment with KG Feedback. The model may still exhibit hallucinations in its responses due to issues such as overfitting. Inspired by the RLHF (Reinforcement Learning from Human Feedback) approach [27, 28], we let the KG serve as an automated evaluator, providing feedback on the knowledge correctness of the current response and thereby guiding the model towards further optimization. First, we generate a variety of responses for each query by employing diverse input formats or random seeds. Subsequently, we use the KG to score and rank these responses. The scoring process compares each generated answer with the knowledge in the KG to ascertain its correctness, and the reward is determined by the number of correctly matched knowledge triples, as given in Equation (1).

$reward = \log(r_{spo} + \alpha \times r_e)$    (1)

where $\alpha$ is a hyperparameter, $r_{spo}$ represents the number of SPO matches, and $r_e$ represents the number of entity matches.
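Read as code, Equation (1) amounts to counting matched triples and entities against the KG. The sketch below uses exact set membership as the matching predicate, which is a simplification of our own; in practice the matching would need to tolerate paraphrases:

import math
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

def kg_reward(answer_triples: Iterable[Triple],
              kg_triples: Set[Triple],
              alpha: float = 0.5) -> float:
    # reward = log(r_spo + alpha * r_e), Equation (1); alpha = 0.5 is illustrative.
    answer_triples = list(answer_triples)
    kg_entities = {s for s, _, _ in kg_triples} | {o for _, _, o in kg_triples}
    r_spo = sum(1 for t in answer_triples if t in kg_triples)
    ents = {s for s, _, _ in answer_triples} | {o for _, _, o in answer_triples}
    r_e = sum(1 for e in ents if e in kg_entities)
    x = r_spo + alpha * r_e
    return math.log(x) if x > 0 else float("-inf")  # no match at all: worst reward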

We select two biomedical question-answering datasets, CMedQA [29] and BioASQ [30], to evaluate our model. CMedQA is a comprehensive dataset of Chinese medical questions and answers, while BioASQ is an English biomedical dataset. We randomly choose 1,000 instances from each for testing. For CMedQA, we employ the answer texts from the non-selected Q&A pairs as corpora to construct a KG in a weakly supervised manner. Similarly, for BioASQ, we use all the provided reference passages as the domain-specific corpora. Experimental results, shown in Table 6, demonstrate significant enhancement in generation performance. For more details on the specific implementation process, please refer to our paper [31].

Model                 CMedQA             BioASQ
                      Rouge-L   BLEU     Rouge-L   BLEU
ChatGPT-3.5 0-shot    14.20     1.78     21.14     5.93
ChatGPT-3.5 2-shot    14.66     2.53     21.42     6.11
Llama2                14.02     2.86     23.47     7.11
KAG-Llama2            15.44     3.46     24.21     7.79
Table 6: Performance comparison on CMedQA & BioASQ. "CP" indicates "continual pre-trained". We consider continual pre-training a basic method of domain knowledge infusion, on par with other retrieval-based methods; consequently, we do not report on the outcomes of hybrid approaches.

2.5.4 Onepass Inference

Most retrieval-enhanced systems operate as a chain of representation models, retrievers, and generation models, resulting in high system complexity, high construction costs, and the inevitable cascading loss caused by error transfer between modules. We introduce an efficient one-pass unified generation and retrieval (OneGen) model to enable an arbitrary LLM to generate and retrieve in one single forward pass. Inspired by the latest success of LLMs for text embedding, we expand the original vocabulary by adding special tokens (i.e., retrieval tokens) and allocate the retrieval task to retrieval tokens generated in an autoregressive manner. During training, retrieval tokens participate only in representation fine-tuning through contrastive learning, whereas other output tokens are trained using language modeling objectives. At inference time, we use retrieval tokens for efficient retrieval on demand. Unlike the previous pipeline approach, where at least two models are needed for retrieval and generation, OneGen unifies them in one model, eliminating the need for a separate retriever and greatly reducing system complexity.
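The vocabulary-expansion step can be sketched with the Hugging Face transformers API as follows; the token name [RQ] and the model choice are illustrative, and the contrastive training loop over retrieval-token hidden states is omitted:

from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2-7B"  # any causal LM; OneGen is model-agnostic
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# Add a retrieval token; its hidden state at generation time serves as the
# query embedding for retrieval and is trained contrastively, while all
# other positions keep the standard next-token language-modeling loss.
tokenizer.add_special_tokens({"additional_special_tokens": ["[RQ]"]})
model.resize_token_embeddings(len(tokenizer))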

As shown in the experimental results in Table 7, we draw the following conclusions: (1) OneGen is effective in the $R \rightarrow G$ task, and joint training of retrieval and generation yields performance gains on the RAG task. Self-RAG endows LLMs with self-assessment and adaptive retrieval, while OneGen adds self-retrieval. Our method outperforms the original Self-RAG across all datasets, achieving improvements of 3.1pt on the Pub dataset and 2.8pt on the ARC dataset in particular, validating the benefits of joint training. (2) OneGen is highly efficient in training: instruction-finetuned LLMs show strong retrieval capabilities with minimal additional tuning, requiring less and lower-quality retrieval data and achieving comparable performance with just 60K noisy samples and incomplete documents, without synthetic data. For more details on the specific implementation process, please refer to the paper [32].

                          Generation Performance              Retrieval Performance
                          HotpotQA        2WikiMultiHopQA     HotpotQA    2WikiMultiHopQA
Backbone      Retriever   EM      F1      EM      F1          Recall@1    Recall@1
Llama2-7B     Contriever  52.83   65.64   70.02   74.35       73.76       68.75
Llama2-7B     self        54.82   67.93   75.02   78.86       75.90       69.79
Llama3.1-7B   Contriever  53.72   66.46   70.92   75.29       69.79       66.80
Llama3.1-7B   self        55.38   68.35   75.88   79.60       72.55       68.98
Qwen2-1.5B    Contriever  48.55   61.02   68.32   72.66       72.41       67.70
Qwen2-1.5B    self        48.75   60.98   73.84   77.44       72.70       69.27
Qwen2-7B      Contriever  53.32   66.22   70.80   74.86       74.15       69.01
Qwen2-7B      self        55.12   67.60   76.17   79.82       75.68       69.96
Table 7: In RAG for multi-hop QA settings, performance comparison across different datasets using different LLMs.

3 Experiments

3.1 Experimental Settings

Datasets. To evaluate the effectiveness of KAG for knowledge-intensive question answering, we perform experiments on three widely used multi-hop QA datasets: HotpotQA [20], 2WikiMultiHopQA [18], and MuSiQue [19]. For a fair comparison, we follow IRCoT [33] and HippoRAG [12], utilizing 1,000 questions from each validation set and using the retrieval corpus related to the selected questions.

Evaluation Metric. When evaluating QA performance, we use two metrics: Exact Match (EM) and F1 scores. For assessing retrieval performance, we calculate the hit rates based on the Top 2/5 retrieval results, represented as Recall@2 and Recall@5.
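For reference, a common implementation of these answer metrics (token-level F1 with whitespace tokenization, as in standard multi-hop QA evaluation scripts; answer normalization details vary by dataset):

from collections import Counter

def exact_match(pred: str, gold: str) -> float:
    return float(pred.strip().lower() == gold.strip().lower())

def f1_score(pred: str, gold: str) -> float:
    pred_toks, gold_toks = pred.lower().split(), gold.lower().split()
    if not pred_toks or not gold_toks:
        return 0.0
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)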

Comparison Methods. We evaluate our approach against several robust and commonly utilized RAG methods. NativeRAG uses ColBERTv2 [34] as the retriever and directly generates answers based on all retrieved documents [35]. HippoRAG is a RAG framework inspired by human long-term memory that enables LLMs to continuously integrate knowledge across external documents; in this paper, we also use ColBERTv2 [34] as its retriever [12]. IRCoT interleaves chain-of-thought (CoT) generation and knowledge retrieval steps so that the CoT guides the retrieval and vice versa, allowing more relevant information to be retrieved for later reasoning steps; it is a key technology for implementing multi-step retrieval in the existing RAG framework.

3.2 Experimental Results

3.2.1 Overall Results

The end-to-end Q&A performance is shown in Table 8. Within the RAG frameworks leveraging ChatGPT-3.5 as the backbone model, HippoRAG demonstrates superior performance compared to NativeRAG. HippoRAG employs a human long-term memory strategy that facilitates the continuous integration of knowledge from external documents into LLMs, thereby significantly enhancing Q&A capabilities. However, given the substantial economic costs associated with utilizing ChatGPT-3.5, we opted to use the DeepSeek-V2 API as a viable alternative. On average, the performance of the IRCoT + HippoRAG configuration utilizing the DeepSeek-V2 API slightly surpasses that of ChatGPT-3.5. Our KAG framework shows significant performance improvement compared to IRCoT + HippoRAG, with EM increases of 11.5%, 19.8%, and 10.5% on HotpotQA, 2WikiMultiHopQA, and MuSiQue respectively, and F1 improvements of 12.5%, 19.1%, and 12.2%. These advancements in end-to-end performance can largely be attributed to the development of more effective indexing, knowledge alignment, and hybrid solving libraries within our framework. We evaluate the effectiveness of the single-step and multi-step retrievers, with the retrieval performance shown in Table 9. The experimental results make it evident that the multi-step retriever generally outperforms the single-step retriever. Analysis reveals that the content retrieved by the single-step retriever exhibits very high similarity, so single-step retrieval results cannot be used to derive answers for questions that require reasoning; the multi-step retriever alleviates this issue. Our proposed KAG framework directly utilizes the multi-step retriever and significantly enhances retrieval performance through strategies such as mutual-indexing, logical form solving, and knowledge alignment.

Framework             Model         HotpotQA        2WikiMultiHopQA   MuSiQue
                                    EM     F1       EM     F1         EM     F1
NativeRAG [35, 34]    ChatGPT-3.5   43.4   57.7     33.4   43.3       15.5   26.4
HippoRAG [12, 34]     ChatGPT-3.5   41.8   55.0     46.6   59.2       19.2   29.8
IRCoT+NativeRAG       ChatGPT-3.5   45.5   58.4     35.4   45.1       19.1   30.5
IRCoT+HippoRAG        ChatGPT-3.5   45.7   59.2     47.7   62.7       21.9   33.3
IRCoT+HippoRAG        DeepSeek-V2   51.0   63.7     48.0   57.1       26.2   36.5
KAG w/ LFS_{ref_3}    DeepSeek-V2   59.8   74.0     66.3   76.1       35.4   48.2
KAG w/ LFSH_{ref_3}   DeepSeek-V2   62.5   76.2     67.8   76.2       36.7   48.7

Table 8: The end-to-end generation performance of different RAG models on three multi-hop Q&A datasets. The best and second-best values are marked in bold and underline in the original.
              Retriever             HotpotQA           2Wiki              MuSiQue
                                    Recall@2 Recall@5  Recall@2 Recall@5  Recall@2 Recall@5
Single-step   BM25 [36]             55.4     72.2      51.8     61.9      32.3     41.2
              Contriever [37]       57.2     75.5      46.6     57.5      34.8     46.6
              GTR [38]              59.4     73.3      60.2     67.9      37.4     49.1
              RAPTOR [39]           58.1     71.2      46.3     53.8      35.7     45.3
              Proposition [40]      58.7     71.1      56.4     63.1      37.6     49.3
              NativeRAG [35, 34]    64.7     79.3      59.2     68.2      37.9     49.2
              HippoRAG [12, 34]     60.5     77.7      70.7     89.1      40.9     51.9
Multi-step    IRCoT + BM25          65.6     79.0      61.2     75.6      34.2     44.7
              IRCoT + Contriever    65.9     81.6      51.6     63.8      39.1     52.2
              IRCoT + NativeRAG     67.9     82.0      64.1     74.4      41.7     53.7
              IRCoT + HippoRAG      67.0     83.0      75.8     93.9      45.3     57.6
              KAG                   72.8     88.8      65.4     91.9      48.5     65.7

Table 9: The performance of different retrieval models on three multi-hop Q&A datasets.

3.3 Ablation Studies

The objective of this experiment is to investigate in depth the impact of the knowledge alignment and logical form solver modules on the final results. We conduct ablation studies for each module by substituting different methods and analyzing the changes in outcomes.

3.3.1 Knowledge Graph Indexing Ablation

In the graph indexing phase, we propose the following two substitution methods:

1) Mutual Indexing Method. As the baseline method of KAG, following the introductions in Sections 2.1 and 2.2, we use information-extraction methods (such as OpenIE) to extract phrases and triples from document chunks, build the mutual index between graph structure and text chunks according to the hierarchical representation of LLMFriSPG, and write them into KG storage. We denote this method as M_Indexing.
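To make the mutual-indexing step concrete, the following is a minimal Python sketch under toy assumptions: extract_triples is a hypothetical stand-in for the LLM-based OpenIE call, and the index is held in plain dictionaries rather than KG storage.

```python
from collections import defaultdict

def extract_triples(chunk_text):
    # Hypothetical stand-in for the LLM-based OpenIE call; a real system
    # would prompt an LLM here. We hard-code one triple for illustration.
    return [("Michael Curtiz", "directed", "God's Gift To Women")]

def build_mutual_index(chunks):
    graph = defaultdict(list)            # entity -> [(predicate, object, chunk_id)]
    entity_to_chunks = defaultdict(set)  # entity -> ids of chunks mentioning it
    for chunk_id, text in chunks.items():
        for s, p, o in extract_triples(text):
            graph[s].append((p, o, chunk_id))  # each edge keeps a back-reference...
            entity_to_chunks[s].add(chunk_id)  # ...so retrieval can move between
            entity_to_chunks[o].add(chunk_id)  # graph structure and raw text
    return graph, entity_to_chunks

graph, ent2chunk = build_mutual_index(
    {"c1": "Michael Curtiz directed God's Gift To Women."})
print(ent2chunk["Michael Curtiz"])  # {'c1'}
```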

2) Knowledge Alignment Enhancement. This method uses knowledge alignment to enhance the KG mutual-indexing and the logical-form-guided reasoning and retrieval. Following Section 2.4, it mainly completes tasks such as classifying instances and concepts, predicting hypernyms/hyponyms of concepts, completing the semantic relations between concepts, and disambiguating and fusing entities. This enhances the semantic distinction of knowledge and the connectivity between instances, laying a solid foundation for subsequent logical-form-guided reasoning and retrieval. We denote this method as K_Alignment.
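As a hedged illustration of the alignment step, the sketch below fuses entity aliases and adds hypernym (isA) edges using toy dictionaries; a real K_Alignment pipeline would obtain these via the semantic reasoning of Section 2.4 rather than hard-coded tables.

```python
from collections import defaultdict

ALIASES = {"M. Curtiz": "Michael Curtiz"}    # toy entity disambiguation/fusion table
HYPERNYMS = {"God's Gift To Women": "film"}  # toy concept hypernym prediction

def align(graph):
    aligned = defaultdict(list, {k: list(v) for k, v in graph.items()})
    for alias, canon in ALIASES.items():     # 1) fuse alias nodes into one entity
        if alias in aligned:
            aligned[canon].extend(aligned.pop(alias))
    for inst, concept in HYPERNYMS.items():  # 2) add semantic isA edges,
        aligned[inst].append(("isA", concept))  # densifying the graph
    return aligned

g = {"M. Curtiz": [("directed", "God's Gift To Women")],
     "Michael Curtiz": [("bornIn", "Hungary")]}
print(dict(align(g)))
```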

3.3.2 Reasoning and Retrieval Ablation

Multi-round Reflection. We adopt the multi-round reflection mechanism from ReSP [26] to assess whether the Logical Form Solver has fully answered the question. If not, supplementary questions are generated for iterative solving until the information in global memory is sufficient. We analyze the impact of the maximum iteration count $n$ on the results, denoted as $ref_n$; $n=1$ means the reflection mechanism is not enabled. In the reasoning and retrieval phase, we design the following three substitution methods:

1) Chunks Retriever. We define KAG's baseline retrieval strategy with reference to HippoRAG's [12] retrieval capabilities, with the goal of recalling the top-k chunks that support answering the current question. The chunk score is computed by weighting the vector similarity and the personalized PageRank score (see the scoring sketch after this list). We denote this method as $ChunkRetri$, and $ChunkRetri$ with n-round reflection as $CR_{ref_n}$.

2) Logical Form Solver (Enable Graph Retrieval). Next, we employ a Logical Form Solver for reasoning. This method uses pre-defined logical forms to parse and answer questions. First, it exploits the reasoning ability of the KG structure in the $KG_{cs}$ and $KG_{fr}$ spaces, focusing on accuracy and rigor in reasoning. Then, it uses supporting_chunks in $RC$ to supplement retrieval when the previous reasoning step yields no results. We denote this method as $LFS_{ref_n}$, where $n$ is the maximum number of iterations.

3) Logical Form Solver (Enable Hybrid Retrieval). To make full use of the mutual-indexing structure between $KG_{fr}$ and $RC$ and further explore the role of the KG structure in enhancing chunk retrieval, we modify $LFS_{ref_n}$ by disabling the graph-retrieval functionality for direct reasoning; instead, all answers are generated via hybrid retrieval. This lets us evaluate the contribution of graph retrieval to reasoning performance. We denote this method as $LFSH_{ref_n}$ (a sketch contrasting the two solver variants, wrapped in the reflection loop, appears at the end of this subsection).
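As referenced in method 1) above, the following is a minimal sketch of the baseline chunk score: a weighted mix of query-chunk vector similarity and a personalized PageRank score seeded on query entities. The weight alpha, the toy vectors, and the precomputed PPR scores are illustrative assumptions, not the exact production formula.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

def chunk_scores(query_vec, chunk_vecs, ppr_scores, alpha=0.5):
    # score(c) = alpha * sim(query, c) + (1 - alpha) * PPR(c)
    return {cid: alpha * cosine(query_vec, vec)
                 + (1 - alpha) * ppr_scores.get(cid, 0.0)
            for cid, vec in chunk_vecs.items()}

scores = chunk_scores([1.0, 0.0],
                      {"c1": [0.9, 0.1], "c2": [0.1, 0.9]},
                      {"c1": 0.7, "c2": 0.2})
print(sorted(scores, key=scores.get, reverse=True))  # top_k order: ['c1', 'c2']
```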

Through this ablation design, we aim to understand comprehensively and in depth how different graph-indexing and reasoning methods affect the final outcomes, providing strong support for subsequent optimization and improvement.
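The sketch below, referenced in method 3) above, shows under assumed stand-in callables how the solver variants and the reflection mechanism compose: $LFS_{ref_n}$ tries exact KG reasoning first and falls back to chunk retrieval, $LFSH_{ref_n}$ always uses (hybrid) chunk retrieval, and up to $n$ rounds of reflection re-plan when the accumulated evidence is insufficient.

```python
def solve_step(sub_query, graph_reason, retrieve_chunks, enable_graph=True):
    """One LFS/LFSH step: exact KG reasoning with chunk-retrieval fallback."""
    if enable_graph:                               # LFS_{ref_n} behaviour
        spo = graph_reason(sub_query)
        if spo is not None:                        # KG answered the sub-query
            return {"spo": spo, "chunks": []}
    # LFSH_{ref_n} behaviour: always fall through to (hybrid) chunk retrieval
    return {"spo": None, "chunks": retrieve_chunks(sub_query)}

def answer_with_reflection(question, plan, solve, is_sufficient, n=3):
    """Up to n rounds of ReSP-style reflection over a growing global memory."""
    memory = []                                    # global memory of evidence
    for _ in range(n):                             # n = 1 disables reflection
        for sub_query in plan(question, memory):   # (re-)plan the sub-queries
            memory.append(solve(sub_query))
        if is_sufficient(question, memory):        # an LLM judgment in the paper
            break
    return memory
```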

3.3.3 Experimental Results and Discussion

Graph Index    Reasoning      HotpotQA        2Wiki           MuSiQue
                              EM     F1       EM     F1       EM     F1
M_Indexing     CR_{ref_3}     52.4   65.4     48.2   56.0     24.6   36.6
K_Alignment    CR_{ref_3}     54.7   69.5     62.7   72.5     29.6   41.1
               LFS_{ref_1}    59.1   73.4     65.2   74.4     31.3   43.4
               LFS_{ref_3}    59.8   74.0     66.3   76.1     35.4   48.2
               LFSH_{ref_1}   61.5   76.0     66.0   75.0     33.5   44.3
               LFSH_{ref_3}   62.5   76.2     67.8   76.2     36.7   48.7

Table 10: The end-to-end generation performance of different methods on three multi-hop Q&A datasets. The backbone model is the DeepSeek-V2 API. As described in Algorithm 17, $ref_3$ represents a maximum of 3 rounds of reflection, and $ref_1$ a maximum of 1 round, i.e., no reflection is introduced.
Graph Index    Reasoning      HotpotQA        2Wiki           MuSiQue
                              R@2    R@5      R@2    R@5      R@2    R@5
M_Indexing     CR_{ref_3}     61.5   73.8     54.6   59.7     39.3   52.8
K_Alignment    CR_{ref_3}     56.3   83.0     66.3   88.1     40.0   62.3
               LFS_{ref_1}    /      /        /      /        /      /
               LFS_{ref_3}    /      /        /      /        /      /
               LFSH_{ref_1}   55.1   85.0     65.9   92.4     36.1   58.4
               LFSH_{ref_3}   72.7   88.8     65.4   91.9     48.4   65.6

Table 11: The recall performance of different methods across the three datasets. In the $LFS_{ref_n}$ method, some sub-questions are answered by KG reasoning without recalling supporting chunks, so its recall rate is not comparable to the other methods. The backbone model is the DeepSeek-V2 API.
Figure 8: Each of the three test datasets comprises 1000 test problems, with 20 tasks processed concurrently and a maximum iteration count $n$ of 3. The $CR_{ref_3}$ method executes fastest, whereas $LFSH_{ref_3}$ is slowest: $CR_{ref_3}$ outperforms $LFSH_{ref_3}$ in speed by 149%, 101%, and 134% across the three datasets. On the same datasets, $LFS_{ref_3}$ outperforms $LFSH_{ref_3}$ in speed by 13%, 22%, and 18% respectively, with relative F1 losses of 2.6%, 0.1%, and 1.0%.
Figure 9: The connectivity of the graph exhibits a notable rightward shift after applying K_Alignment; the distribution changes of 1-hop, 2-hop, and 3-hop neighbors are shown.

The analysis of the experimental outcomes can be approached from the following two perspectives:

1) Knowledge Graph Indexing. As shown in Table 11, after incorporating knowledge alignment into the KG mutual-indexing, the top-5 recall rates of $CR_{ref_3}$ improved by 9.2%, 28.4%, and 9.5% respectively, an average improvement of 15.7%. As shown in Figure 9, enhanced knowledge alignment significantly increases the relation density, and the frequency-outdegree distribution shifts rightward as a whole:

  • The 1-hop graph exhibits a notable rightward shift, indicating that the added semantic structuring has increased the number of neighbors per node, thereby enhancing the graph's density.
  • The 2-hop and 3-hop graphs display an uneven distribution, sparse on the left and denser on the right. Comparing before and after K_Alignment, the vertices in each dataset shift rightward and the left side becomes sparser, suggesting that nodes with few multi-hop neighbors have gained new neighbors.

This signifies that the newly added semantic relations effectively enhance graph connectivity, thereby improving document recall rates. A small sketch of this connectivity analysis follows.
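The Figure 9 analysis can be reproduced in miniature as below: count the k-hop neighbors of every node before and after adding a semantic edge; a rightward shift of this distribution corresponds to a denser graph. networkx is used here only for convenience and is an assumption, not part of the framework.

```python
import networkx as nx

def khop_counts(G, k):
    # number of distinct neighbors reachable within k hops, per node
    return [len(nx.single_source_shortest_path_length(G, n, cutoff=k)) - 1
            for n in G.nodes]

G = nx.Graph([("a", "b"), ("b", "c")])
G_aligned = G.copy()
G_aligned.add_edge("a", "c")          # e.g. a new semantic (isA/synonym) edge
for k in (1, 2, 3):
    print(k, sorted(khop_counts(G, k)), "->", sorted(khop_counts(G_aligned, k)))
```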

2) Graph Inference Analysis. In terms of recall, $LFSH_{ref_3}$ improves over $CR_{ref_3}$ under the same graph index, with top-5 recall rates increasing by 15%, 32.2%, and 12.7%, an average improvement of 19.9%. This enhancement can be attributed to two main factors:

  • $LFSH_{ref_3}$ decomposes queries into multiple executable steps, with each sub-query retrieving chunks individually. As the time analysis in Figure 8 shows, both $LFSH_{ref_3}$ and $LFS_{ref_3}$ consume more than twice the time of $CR_{ref_3}$, indicating that the improved recall rate is traded off against increased computation time.
  • $LFSH_{ref_3}$ not only retrieves chunks but also integrates the SPO triples produced during execution into the chunk computation, so compared to $CR_{ref_3}$ it retrieves additional query-related relations.

Because $LFS_{ref_3}$ answers some queries directly from retrieved subgraphs, it cannot be compared directly in the recall-rate analysis but can be examined via the F1 metric. Compared to $LFSH_{ref_3}$, $LFS_{ref_3}$ answered questions based on retrieved subgraphs in 33%, 34%, and 18% of cases respectively. $LFS_{ref_3}$ shows F1 decreases of 2.2%, 0.1%, and 0.5%, while its computation time is reduced by 12%, 22%, and 18%.

Analysis of the cases with decreased performance reveals that erroneous or incomplete SPOs produced during the construction phase lead to incorrect sub-query answers and thus wrong final answers; this is detailed in the case study. The reduction in computation time is primarily due to SPO retrieval being more efficient than document-chunk retrieval.

In industrial applications, computation time is a crucial metric. Although $LFS_{ref_n}$ may introduce some errors, these can be mitigated through graph correction and completion. Notably, in the current experiments, a slight decrease in metrics is traded for reduced computation time, which we consider a feasible direction for industrial implementation.

To analyze the impact of the maximum iteration count $n$ on the results: compared to $LFS_{ref_3}$, $LFS_{ref_1}$ decreases F1 by 0.6%, 1.6%, and 4.8% respectively, and within the $LFS_{ref_3}$ experiments the proportions of questions solved in a single iteration are 97.2%, 94.8%, and 87.9%. Compared to $LFSH_{ref_3}$, $LFSH_{ref_1}$ decreases F1 by 0.2%, 1.2%, and 4.4%, and within the $LFSH_{ref_3}$ experiments the single-iteration proportions are 98.3%, 95.2%, and 84.1%, showing a positive correlation with the F1 reduction. Table 13 provides a detailed analysis of the effect of iteration rounds on solving the final answer. Increasing the maximum iteration count lets $LFS_{ref_n}$ re-plan over the existing information when it cannot complete the solution, thereby addressing some otherwise unsolvable cases.

4 Applications

4.1 KAG for E-Government

We used the KAG framework, combined with the Alipay E-government service scenario, to build a Q&A application that answers users' questions about service methods, required materials, service conditions, and service locations. To build this application, we first collected 11,000 documents about government services and, based on the methods described in Section 2, implemented functional modules such as index building, logical-form-guided reasoning and solving, semantic enhancement, and conditional summary generation.

During the offline index-construction phase, a semantic chunking strategy segments the government-service documents to obtain specific matters and their properties, such as the administrative region, service process, required materials, service location, and target audience, together with the corresponding chunks.

In the reasoning and solving phase, a logical function is generated from the given user question and the graph index structure, and the logical form is executed step by step. First, the index entry for the user's administrative region is located precisely. Then, the item name, target group, and similar properties are used for search. Finally, the corresponding chunk is found through the required materials or service process specifically asked about by the user. A minimal sketch of this step-wise execution follows.
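The sketch below illustrates the three-step lookup under an assumed toy index layout; the keys and property names are hypothetical, not the production schema.

```python
# toy index: (region, matter name) -> property -> chunk id
INDEX = {("Hangzhou", "social security card renewal"): {
    "required materials": "chunk_17", "service process": "chunk_18"}}

def solve(region, item, prop):
    matter = INDEX.get((region, item))   # step 1+2: locate region, then matter
    return matter and matter.get(prop)   # step 3: requested property -> chunk

print(solve("Hangzhou", "social security card renewal", "required materials"))
```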

In the semantic enhancement phase, we add two semantic relations between items: synonymy and co-hypernymy. A synonymous relation links items in two different regions with different names but the same meaning, such as renewal of social security card and application for lost social security card. A co-hypernymy relation links two items belonging to different subcategories under the same major category, such as applying for housing provident fund loan for construction of new housing and applying for housing provident fund loan for construction and renovation of new housing, which share the common hypernym applying for housing provident fund loan.

We compared the effects of the two technical solutions, NaiveRAG and KAG, as shown in Table 12. KAG shows significant improvements in both completeness and accuracy over NaiveRAG.

Methods     SampleNum   Precision   Recall
NaiveRAG    492         66.5        52.6
KAG         492         91.6        71.8

Table 12: Ablation experiments of KAG in E-Government Q&A.

4.2 KAG for E-Health

We have developed a medical Q&A application based on the Alipay Health Manager scenario, which answers users' popular-science questions about diseases, symptoms, vaccines, operations, examinations, and laboratory tests, as well as interpretation of medical indicators, medical recommendations, medical insurance policy inquiries, hospital inquiries, and doctor information inquiries. Through a team of medical experts, we curated authoritative medical documents and produced more than 1.8 million entities and more than 400,000 term sets, with a total of more than 5 million relations. Based on this high-quality KG, we also produced more than 700 DSL rules for indicator calculations to answer questions about indicator interpretation. (DSL: https://openspg.yuque.com/ndx6g9/ooil9x/sdtg4q3bw4ka5wmz)


During the knowledge construction phase, a strongly constrained schema is used to define the structure of entities such as diseases, symptoms, medications, and medical examinations precisely. This facilitates accurate answers and accurate knowledge generation, while ensuring the rigor of relations between entities. In the reasoning phase, the logical form is generated from the user's query and then translated into DSL form to query the KG; the query result is returned as triples in the answer. The logical form not only indicates how to query the KG but also carries the key structural information in the user's query (such as city, gender, age, and indicator value). When the logical form is parsed for the graph query, the DSL rules produced by medical experts are also triggered, and the conclusion is returned in triple form. For example, if a user asks about "blood pressure 160", it triggers a rule (shown as an image in the original) that strictly follows the definition of $\mathcal{L}$ in LLMFriSPG, and the conclusion that the person may have hypertension is obtained.
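Since the rule itself appears only as an image in the original, the following is a hedged re-expression of its logic in Python rather than the OpenSPG DSL; the 140 mmHg threshold follows common clinical convention and is an assumption here, not the production rule.

```python
def interpret_blood_pressure(systolic_mmHg):
    # indicator rule: high systolic reading -> possible hypertension,
    # returned in triple form as the KG query result would be
    if systolic_mmHg >= 140:
        return ("Person", "possibleDisease", "hypertension")
    return None

print(interpret_blood_pressure(160))  # ('Person', 'possibleDisease', 'hypertension')
```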

In the semantic enhancement phase, we use the term set to express two semantic relations over concepts: synonymy and hypernymy (a term may have multiple hypernyms). During knowledge construction and the user Q&A phase, entities are aligned with medical terms. For example, among surgery types, the hypernym of deciduous tooth extraction and anterior tooth extraction is tooth extraction; when a user asks only about tooth extraction, all of its hyponyms can be retrieved via the term set, and the related entity information is then retrieved for answering (see the sketch below). With the support of KAG, we achieved a recall rate of 60.67% and a precision rate of 81.32% on an evaluation set sampled from online Q&A queries. In the end-to-end scenario, the accuracy of medical insurance policy inquiries (Beijing, Shanghai, Hangzhou) reached 77.2%, the accuracy on popular-science intents exceeded 94%, and the accuracy on indicator-interpretation intents exceeded 93%.
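As referenced above, the term-set-based expansion can be sketched as follows; the toy term set is an illustrative assumption.

```python
# toy term set: hypernym term -> hyponym terms
HYPONYMS = {"tooth extraction": ["deciduous tooth extraction",
                                 "anterior tooth extraction"]}

def expand_term(term):
    # a question about a hypernym is expanded to all its hyponym terms;
    # entities aligned to any of these terms would be retrieved next
    return [term] + HYPONYMS.get(term, [])

print(expand_term("tooth extraction"))
```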

5 Related Works

5.1 DIKW Pyramid

Following the DIKW pyramid theories [41, 42, 43, 44], data becomes information after it is processed and contextualized, and by integrating information with experience, understanding, and expertise, we gain knowledge. Information is usually obtained from original text with information-extraction technology [45, 46, 47], and knowledge is obtained from information through linking, fusion, analysis, and learning [43, 48, 46]. Information and knowledge are one entity in different forms. There is no unified language to represent data, information, and knowledge: RDF/OWL [49] provides only binary representation in the form of triples, and LPG [21] lacks support for knowledge semantics and classification. SPG [50] (official site: https://spg.openkg.cn/en-US) supports knowledge hierarchy and classification representation, but lacks the text-context support that is friendly to large language models. Our proposed LLMFriSPG supports hierarchical representation from data to information to knowledge, and also provides reverse context-enhanced mutual-indexing.

5.2 Vector Similarity-based RAG

Using an external knowledge base with a traditional search engine provides an effective method for updating the knowledge of LLMs: supporting documents are retrieved by computing the text or vector similarity [1, 4] between the query and the documents, and questions are then answered with the in-context learning of LLMs. However, this method struggles to understand long-distance knowledge associations between documents: simple vector-based retrieval is not suitable for multi-step reasoning or for tracking logical links between different information fragments. To address these challenges, researchers have explored methods such as fine-grained document segmentation, CoT [33], and interactive retrieval [26, 2]. Despite these optimizations, traditional query-chunk similarity methods still struggle to focus accurately on the relations between the key pieces of knowledge in complex questions, resulting in low information density and ineffective association of remote knowledge. This motivates the logical-form-guided solving method introduced in this paper.

5.3 Information Retrieval-based GraphRAG

This type of method uses information-extraction techniques to build entity and relation associations across different documents, which better captures the global information of the whole corpus. Typical tasks in the knowledge-construction phase include graph information extraction and knowledge construction & enhancement. Methods like GraphRAG [51], ToG 2.0 [9], and HippoRAG [12] use OpenIE to extract graph-structured information such as entities and relations; some exploit multi-hop associations between entities to improve the effectiveness of cross-document retrieval [9, 12]; methods like DALK [7] use PubTator Central (PTC) annotations to reduce the noise problem of OpenIE; and some use entity-disambiguation technology to enhance the consistency of graph information [12, 52]. GraphRAG [51] generates element-level and community-level summaries when building offline indexes and uses a QFS [53] method to first compute each summary's partial response to the query and then compute the final response. This inherent characteristic of GraphRAG's hierarchical summarization makes questions such as multi-hop Q&A and incremental document updates difficult to handle. KGs constructed by OpenIE contain a lot of noise or irrelevant information [54, 55, 56]. In terms of the DIKW pyramid, these methods only extract the information-graph structure and make limited attempts at entity disambiguation when transforming information into knowledge, and they do not address issues such as semantic directionality and logical sensitivity. This paper introduces a KAG method that enhances information-to-knowledge conversion based on domain concept semantic-graph alignment.

5.4 KG-based Question Answering

Reasoning based on traditional KGs has good explainability and transparency but is limited by the scale of the domain KG, the comprehensiveness and detailed coverage of its knowledge, and the timeliness of updates [57]. In this paper, we introduce hybrid reasoning to alleviate issues such as knowledge sparsity, inconsistent entity granularity, and high graph-construction costs. The approach leverages KG retrieval and reasoning to enhance generation, rather than completely replacing RAG.

To achieve KG-enhanced generation, KG-based knowledge retrieval and reasoning must be addressed. One approach is information retrieval (IR) [58], which narrows the scope by locating the most relevant entities, relations, or triples for the question. Another approach is semantic parsing (SP) [59, 60], which converts the question from an unstructured natural-language description into an executable database query language (such as SQL, SPARQL [61], or DSL: https://openspg.yuque.com/ndx6g9/ooil9x/sdtg4q3bw4ka5wmz), or first generates structured logical forms (such as S-expressions [62, 63]) and then converts them into query languages.

Although conversational QA over large-scale knowledge bases can be achieved without explicit semantic parsing (e.g., HRED-KVM[64]), most work focuses on exploring context-aware semantic parsers[60, 65, 63].

Some papers use sequence-to-sequence models to directly generate query languages [66, 67]. These methods are developed for a specific query language, sometimes even for a specific dataset, and lack the generality to support different types of structured data. Others use step-by-step query-graph generation and search strategies for semantic parsing [68, 69, 70]; this approach is prone to uncontrollable LLM outputs, making queries unreliable and poorly interpretable. Methods like ChatKBQA [63] and CBR-KBQA [71] generate complete S-expressions and provide various enhancements for the semantic-parsing process. However, the structure of S-expressions is relatively complex, and folding multi-hop questions into them makes them hard for LLMs to understand and inconvenient for integrating KBQA and RAG for comprehensive retrieval. To address these issues, we propose a multi-step decomposed logical form to express the multi-hop retrieval and reasoning process, breaking complex queries into multiple sub-queries with corresponding logical expressions, thereby achieving integrated retrieval of SPOs and chunks. A minimal sketch of such a decomposed plan follows.
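As a hedged illustration of the multi-step decomposed logical form, the sketch below models each step as a sub-question paired with a logical expression and explicit dependencies; the field names and expression syntax are assumptions for illustration, not the framework's exact schema.

```python
from dataclasses import dataclass, field

@dataclass
class LogicalStep:
    sub_query: str          # natural-language sub-question
    expression: str         # executable logical form for this step
    depends_on: list = field(default_factory=list)  # indices of prior steps

plan = [
    LogicalStep("Who directed God's Gift To Women?",
                "Retrieval(s=?d, p=directed, o='God's Gift To Women')"),
    LogicalStep("When was that director born?",
                "Retrieval(s=#0, p=bornOn, o=?date)", depends_on=[0]),
]
for i, step in enumerate(plan):
    print(i, step.sub_query, "|", step.expression)
```

Executing such a plan step by step lets each sub-query retrieve SPO triples and chunks together, which is what enables the integrated retrieval described above.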

5.5 Bidirectional Enhancement of LLMs and KGs

LLMs and KGs are two typical neural and symbolic approaches to knowledge utilization. Since pre-trained language models such as BERT [72], well-performing language models have been used to improve KG tasks, and LLMs with strong generalization capability are especially believed to be helpful across the KG life-cycle. Many works explore the potential of LLMs for in-KG and out-of-KG tasks. For example, using LLMs to generate triples for KG completion proves much cheaper than the traditional human-centric KG construction process, with acceptable accuracy for popular entities [73]. In the past decade, methods for in-KG tasks were designed by learning from KG structures, such as structure-embedding-based methods; text information such as entity names and descriptions was not fully exploited, owing to the limited text-understanding capability of earlier natural language processing methods, until LLMs provided a way. Some works use LLMs for text semantic understanding and reasoning over entities and relations in KG completion [74], rule learning [75], complex logic querying [76], etc. Conversely, KGs are also widely used to improve the performance of LLMs, for example as external resources that provide accurate factual information and mitigate hallucination during answer generation [9], as sources of complex logical question-answering planning data for fine-tuning LLMs to improve their planning and thus logical-reasoning capability [77], and for uncovering associated knowledge that has changed due to editing, enabling better knowledge editing of LLMs [78]. The bidirectional enhancement of LLMs and KGs is widely explored and partially achieved.

6 Limitations

In this article, we have demonstrated the adaptability of the KAG framework in Q&A scenarios in both vertical and open domains. However, the currently released OpenSPG-KAG 0.5 still has major limitations that must be continuously overcome, such as:

Implementing our framework requires multiple LLM calls during the construction and solving phases. A substantial number of intermediate tokens must be generated during the planning stage to facilitate the decomposition of sub-problems and their symbolic representation, which leads to computational and economic overhead: as illustrated in Table 14, problem decomposition outputs not only sub-problems but also logical functions, roughly doubling the number of generated tokens compared to decomposing the sub-problems alone. Meanwhile, all model invocations within the KAG framework, including entity recognition, relation extraction, relation recall, and standardization, currently rely on large models, which significantly increases the overall runtime. In future domain-specific implementations, tasks like relation recall, entity recognition, and standardization could be handled by smaller, domain-specific models to enhance operational efficiency.

Decomposing and planning for complex problems demands a high level of capability. This is currently implemented with LLMs, but planning for complex problems remains a significant challenge. For instance, when the task is to compare who is older, the problem should be decomposed into comparing who was born earlier; directly asking for age is inappropriate when the persons are deceased, since "what is the age" then refers to the age at death, which does not indicate who is older. Decomposing and planning complex problems requires ensuring the model's accuracy, stability, and solvability in problem decomposition and planning. The current version of the KAG framework does not yet address optimizations in these areas. We will further explore how pre-training, SFT, and CoT strategies can improve the model's adaptability to logical forms and its planning and reasoning capabilities.

Question: Which film has the director who is older, God's Gift To Women or Aldri Annet Enn Bråk?

Q1: Which director directed the film God's Gift To Women? A1: Michael Curtiz

Q2: Which director directed the film Aldri Annet Enn Bråk? A2: Edith Carlmar

Q3: What is the age of the director of God's Gift To Women? A3: 74 years old. Michael Curtiz (December 24, 1886 to April 11, 1962)…

Q4: What is the age of the director of Aldri Annet Enn Bråk? A4: 91 years old. Edith Carlmar (Edith Mary Johanne Mathiesen) (15 November 1911 to 17 May 2003) …

Q5: Compare the ages of the two directors to determine which one is older. A5: Edith Carlmar is older. Actually, Michael Curtiz was born earlier.

OpenIE significantly lowers the threshold for building KGs, but it also clearly increases the technical challenges of knowledge alignment. Although the experiments in this article show that knowledge alignment improves the accuracy and connectivity of extracted knowledge, more technical challenges remain, such as optimizing the accuracy of multiple-knowledge extraction (events, rules, pipelines, etc.) and the consistency across multiple rounds of extraction. In addition, schema-constrained knowledge extraction based on the experience of domain experts is a key way to obtain rigorous domain knowledge, although its labor cost is high. These two methods should be applied collaboratively to better balance vertical scenarios' requirements for the rigor of complex decision-making and the convenience of information retrieval. For instance, when team members are extracted from multiple texts and the question asks for the total number of team members, exhaustive extraction is crucial for providing an accurate answer based on the structured search results; incorrect extractions likewise impair response accuracy.

7 Conclusion and Future Work

To build professional knowledge services in vertical domains, fully activate the capabilities and advantages of symbolic KGs and parameterized LLMs, and at the same time significantly reduce the construction cost of domain KGs, we proposed the KAG framework and are working to accelerate its application in professional domains. In this article, we argued in detail that knowledge accuracy, information completeness, and logical rigor are the key characteristics that professional knowledge services must have. We also introduced innovations such as the LLM-friendly knowledge representation, mutual-indexing of knowledge structure and text chunks, knowledge alignment by semantic reasoning, logical-form-guided hybrid reasoning and solving, and the KAG model. Compared with the currently most competitive SOTA methods, KAG achieves significant improvements on public datasets such as HotpotQA, 2WikiMultiHopQA, and MuSiQue. We also conducted case verifications in the E-Government Q&A and E-Health Q&A scenarios of Alipay, further proving the adaptability of the KAG framework in professional domains.

In the future, more work remains to continuously reduce the cost of KG construction and improve the interpretability and transparency of reasoning, such as multiple-knowledge extraction, knowledge alignment based on OneGraph, domain knowledge injection, large-scale instruction synthesis, and hallucination suppression with knowledge-logic constraints.

This study does not cover the enhancement of models for decomposing and planning complex problems, which remains a significant area for future research. In future work, KAG can serve as a reward model, providing feedback and assessing a planning model's accuracy, stability, and solvability through the execution of its planning results, thereby enhancing the capabilities of planning models.

We will also work in depth with the community organization OpenKG to continue to tackle key technical issues in the collaboration between LLMs and KGs.

8 Acknowledgements

This work was completed by the Ant Group Knowledge Graph Team. In addition to the listed authors, contributors include Yuxiao He, Deng Zhao, Xiaodong Yan, Dong Han, Fanzhuang Meng, Yang Lv, Zhiying Yin, and others; thank you all for your continuous innovation and hard work. This work also received strong support from Professor Huajun Chen and Researcher Wen Zhang of Zhejiang University, and Professor Wenguang Chen of the Ant Group Technology Research Institute; thank you all.

Appendix A Example of KAG Solver

Round One
Initial Question: How many times did the plague occur in the birth place of Concerto in C Major Op 3 6’s composer?
Step1: Who is the composer of Concerto in C Major Op 3 6?
Logical Form: Retrieval(s=s1:Work[C Major Op 3 6], p=p1:composer, o=o1:Person)
Graph Retrieval: (antonio vivaldi, composed concerto in, c major op 3 6)
Hybrid Retrieval: not executed
A1: Antonio Vivaldi
Step2: What is the birthplace of this composer?
Logical Form: Retrieval(s=o1, p=p2:birthPlace, o=o2:Place)
Graph Retrieval: not found
Hybrid Retrieval:
doc1: Antonio Vivaldi composed a mandolin concerto (Concerto in C major Op.3 6) …
doc2: Antonio Lucio Vivaldi (Italian: … Born in Venice …
A2: Venice
Step3: How many times did the plague occur in this birthplace?
Logical Form: Retrieval(s=o2, p=p3:plagueOccurrence, o=o3:Plague)
Graph Retrieval: not found
Hybrid Retrieval:
doc1: Antonio Vivaldi composed a mandolin concerto (Concerto in C major Op.3 6) …
doc2: The Black Death, also known as the Great Plague… in Europe from 1347 to 1351…
A3: I don’t know
Step4: Count the number of plague occurrences.
Logical Form: Count(o3)->nums
Graph Retrieval: not found
Hybrid Retrieval: not found
A4: I don’t know
Round Two
Rewritten Question: What specific records or historical accounts detail the number of plague occurrences in Venice during the mid-14th century?
Step1: What specific records or historical accounts detail the number of plague occurrences in Venice during the mid-14th century?
Logical Form: Retrieval(s=s1:City[Venice], p=p3:plagueOccurrencesInMid14thCentury, o=o1:Times)
Graph Retrieval: not found
Hybrid Retrieval:
doc1: In 1466, perhaps 40,000 people died of the plague… Plague occurred in Venice 22 times between 1361 and 1528…
doc2: The Black Death, also known as the Great Plague…
A1: The plague occurred in Venice 22 times between 1361 and 1528. The 1576–77 plague killed 50,000, almost a third of the population.
Final Answer
Question: How many times did the plague occur in the birth place of Concerto in C Major Op 3 6’s composer?
Step1: What specific records or historical accounts detail the number of plague occurrences in Venice during the mid-14th century?
A: 22 times
Table 13: An example of using logical forms to guide question planning, reasoning, retrieval, and answer generation, with multiple rounds of reflection to rephrase the question.
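The transcript in Table 13 shows the solver's core control flow: an LLM planner decomposes the question into logical-form steps, each step is answered by graph retrieval first with hybrid (chunk-level) retrieval as a fallback, and an unsolved round triggers a reflection that rewrites the question using the partial answers already obtained. Below is a minimal Python sketch of that loop. It is not KAG's implementation; every name in it (Step, plan_steps, graph_retrieve, hybrid_retrieve, rewrite_question) is a hypothetical stand-in for the components the table illustrates.

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Step:
    question: str       # natural-language sub-question, e.g. "Who is the composer of ...?"
    logical_form: str   # e.g. "Retrieval(s=s1:Work[...], p=p1:composer, o=o1:Person)"

def solve(question: str,
          plan_steps: Callable[[str], list],                # LLM planner: question -> [Step, ...]
          graph_retrieve: Callable[[str], Optional[str]],   # exact KG lookup; may return None
          hybrid_retrieve: Callable[[str], Optional[str]],  # vector/chunk fallback; may return None
          rewrite_question: Callable[[str, list], str],     # reflection: rewrite with partial answers
          max_rounds: int = 2) -> Optional[str]:
    current = question
    for _ in range(max_rounds):
        answers = []
        solved = True
        for step in plan_steps(current):
            # Graph retrieval first; fall back to hybrid retrieval only when
            # the KG has no match (as in Steps 2-3 of Round One above).
            answer = graph_retrieve(step.logical_form) or hybrid_retrieve(step.question)
            if answer is None:          # the "I don't know" outcome in Steps 3-4 of Round One
                solved = False
                break
            answers.append(answer)
        if solved and answers:
            return answers[-1]
        # Reflection: rephrase the question using what was already resolved,
        # e.g. substituting "Venice" for "the composer's birthplace" in Round Two.
        current = rewrite_question(current, answers)
    return None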

Appendix B Example of Logical Form Reasoner

Numerical Reasoning
Question: Which sports team for which Cristiano Ronaldo played in 2011 was founded last?
Step1: Identify the sports teams Cristiano Ronaldo played for in 2011.
Logical Form: Retrieval(s=s1:Player[Cristiano Ronaldo], p=p1:playedFor, o=o1:SportsTeam, p.PlayedForInYear=2011)
Step2: Determine the foundation year of each identified team.
Logical Form: Retrieval(s=o1, p=p2:foundationYear, o=o2:Year)
Step3: Which team was founded last?
Logical Form: Sort(set=o1, orderby=o2, direction=max, limit=1)
Question: What is the sum of 30 + 6 and the age of the founder of Tesla in 2027?
Step1: What is the sum of 30 + 6?
Logical Form: math1 = Math(30+6)
Step2: Who is the founder of Tesla?
Logical Form: Retrieval(s=s2:Company[Tesla], p=p2:founder, o=o2)
Step3: In which year was the founder of Tesla born?
Logical Form: Retrieval(s=o2, p=p3:yearOfBirth, o=o3)
Step4: How old will the founder of Tesla be in the year 2027?
Logical Form: math4 = Math(2027-o3)
Step5: What is the sum of math1 and math4?
Logical Form: math5 = Math(math1+math4)
Logical Reasoning
Question: Find a picture containing vegetables or fruits.
Step1: Find pictures containing vegetables.
Logical Form: Retrieval(s=s1:Image, p=p2:contains, o=o1:Vegetables)
Step2: Find pictures containing fruits.
Logical Form: Retrieval(s=s2:Image, p=p2:contains, o=o2:Fruits)
Step3: Output s1, s2.
Logical Form: Output(s1, s2)
Question: Find a picture containing vegetables and fruits.
Step1: Find pictures containing vegetables.
Logical Form: Retrieval(s=s1:Image, p=p2:contains, o=o1:Vegetables)
Step2: Find pictures containing fruits.
Logical Form: Retrieval(s=s1, p=p2:contains, o=o2:Fruits)
Step3: Output s1.
Logical Form: Output(s1)
Semantic Deduction
Question: Do I need to present the original ID card when applying for a passport?
Step1: What documents are required to apply for a passport?
Logical Form: Retrieval(s=s1:Event[apply for a passport], p=p1:support_chunks, o=o1:Chunk)
Step2: Does this set of documents include the original identity card?
Logical Form: Deduce(left=o1, right=the original identity card, op=entailment)
Table 14: Examples of reasoning with logical forms
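Table 14 exercises the executable operators of the logical-form language: Retrieval binds graph variables, Math evaluates arithmetic over bound values, Sort and Count aggregate result sets, Deduce performs an entailment check, and Output returns bound variables. The sketch below is a hypothetical, simplified interpreter for these operators over a toy triple store; the operator names follow the table, but the data model and execution logic are illustrative assumptions, not KAG's actual engine.

class LogicalFormInterpreter:
    """Toy executor for the operators shown in Table 14 (illustrative only)."""

    def __init__(self, triples):
        self.triples = triples   # list of (subject, predicate, object) strings
        self.vars: dict = {}     # bindings for s1, o1, math1, ...

    def retrieval(self, var, s, p):
        # Retrieval(s=..., p=..., o=var): match (s, p, ?o) and bind the result set.
        self.vars[var] = [o for (subj, pred, o) in self.triples
                          if subj == s and pred == p]
        return self.vars[var]

    def math(self, var, expression):
        # Math(2027-o3): substitute numeric bindings, then evaluate the expression.
        # Longer names are replaced first so that e.g. "math1" never clobbers "math14".
        for name, value in sorted(self.vars.items(), key=lambda kv: -len(kv[0])):
            if isinstance(value, (int, float)):
                expression = expression.replace(name, str(value))
        self.vars[var] = eval(expression, {"__builtins__": {}})  # sketch only; unsafe for real input
        return self.vars[var]

    def sort(self, items, key, direction="max", limit=1):
        # Sort(set=o1, orderby=o2, direction=max, limit=1); here orderby is a key function.
        return sorted(items, key=key, reverse=(direction == "max"))[:limit]

    def count(self, var):
        # Count(o3) -> number of bound results.
        return len(self.vars.get(var, []))

    def deduce(self, left, right, op="entailment"):
        # Deduce(left=o1, right=..., op=entailment): KAG delegates this judgment to an
        # LLM; a trivial substring containment stands in for entailment here.
        return op == "entailment" and right in str(left)

    def output(self, *names):
        return {name: self.vars.get(name) for name in names}

For the Tesla question, for example, Step1 would call math("math1", "30+6"), Steps 2-3 would bind o2 and o3 via retrieval, and Steps 4-5 would reuse the bindings through math("math4", "2027-o3") and math("math5", "math1+math4"), assuming o3 has been resolved to a single numeric value.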