Graph Foundation Models: A Comprehensive Survey
Zehong Wang†,*,⊛,1, Zheyuan Liu*,1, Tianyi Ma*,1, Jiazheng Li*,2, Zheyuan Zhang*, Xingbo Fu*,3, Yiyang Li*,1, Zhengqing Yuan*,1, Wei Song1, Yijun Ma1, Qingkai Zeng1, Xiusi Chen4, Jianan Zhao6,7, Jundong Li3, Meng Jiang1, Pietro Liò5, Nitesh Chawla1, Chuxu Zhang2, Yanfang Ye⊛,1

† Project Leader, * Major Student Contributors
⊛ Correspondence: Zehong Wang (zwang43@nd.edu), Yanfang Ye (yye7@nd.edu)
1 University of Notre Dame, 2 University of Connecticut, 3 University of Virginia, 4 University of Illinois Urbana-Champaign, 5 University of Cambridge, 6 Mila - Québec AI Institute, 7 Université de Montréal
Abstract
Graph-structured data pervades domains such as social networks, biological systems, knowledge graphs, and recommender systems. While foundation models have transformed natural language processing, vision, and multimodal learning through large-scale pretraining and generalization, extending these capabilities to graphs, which are characterized by non-Euclidean structures and complex relational semantics, poses unique challenges and opens new opportunities. To this end, Graph Foundation Models (GFMs) aim to bring scalable, general-purpose intelligence to structured data, enabling broad transfer across graph-centric tasks and domains. This survey provides a comprehensive overview of GFMs, unifying diverse efforts under a modular framework comprising three key components: backbone architectures, pretraining strategies, and adaptation mechanisms. We categorize GFMs by their generalization scope (universal, task-specific, and domain-specific) and review representative methods, key innovations, and theoretical insights within each category. Beyond methodology, we examine theoretical foundations including transferability and emergent capabilities, and highlight key challenges such as structural alignment, heterogeneity, scalability, and evaluation. Positioned at the intersection of graph learning and general-purpose AI, GFMs are poised to become foundational infrastructure for open-ended reasoning over structured data. This survey consolidates current progress and outlines future directions to guide research in this rapidly evolving field. Resources are available at https://github.com/Zehong-Wang/Awesome-Foundation-Models-on-Graphs.
1 Introduction

The pursuit of a one-model-fits-all paradigm stands as one of the most ambitious and transformative goals in machine learning. This vision aspires to develop highly generalizable models capable of performing a wide spectrum of tasks across diverse domains, without requiring extensive task-specific architecture design or training. Historically, machine learning has been dominated by specialized models tailored to specific data modalities and objectives [1], often requiring handcrafted features [2] and domain-dependent optimization strategies [3]. From early rule-based systems and linear classifiers to the rise of deep learning, the evolution of machine learning has been marked by progressive gains in representation learning, scalability, and task performance [4, 5]. Classical models such as decision trees, support vector machines (SVMs), and k-nearest neighbors (KNN) demonstrated success in low-dimensional and structured settings, but faced challenges when applied to high-dimensional, unstructured, or multimodal data. The emergence of deep learning models, such as convolutional neural networks (CNNs) for vision [6] and recurrent neural networks (RNNs) for sequential data [7, 8], significantly advanced performance in perceptual tasks. Nonetheless, these models still required task-specific tuning, architecture adjustments, and large-scale labeled datasets to achieve robust generalization. A paradigm shift occurred with the development of transfer learning [9] and self-supervised learning [10], which enabled models to learn broadly transferable representations from large-scale unlabeled data. These developments laid the groundwork for the emergence of foundation models [11], which are trained on massive datasets with the objective of acquiring universal knowledge that can be readily adapted to a wide array of downstream tasks.
Foundation models are characterized by their scale, general-purpose nature, and pretraining across heterogeneous data sources. They are built to capture transferable inductive biases, enabling strong performance with minimal task-specific supervision. Scaling laws [12, 13] and data-driven learning paradigms have driven their success across numerous domains, including natural language processing, computer vision, and robotics. For instance, Large Language Models (LLMs) [14, 15] process text by tokenizing input sequences and formulating tasks such as translation, summarization, or reasoning as autoregressive next-token prediction problems. Similarly, Large Vision Models (LVMs) [16, 17, 18] treat visual inputs as sequences of tokens and apply Transformer-based architectures for visual question answering, captioning, or image generation. These models exhibit remarkable zero-shot and few-shot generalization capabilities, enabling rapid adaptation to novel tasks without requiring substantial fine-tuning.

Figure 1: From Task-Specific Graph Models to General-Purpose Graph Foundation Models. This figure contrasts the paradigm shift from traditional Graph Neural Networks (GNNs) to Graph Foundation Models (GFMs). (a) GFMs are pretrained on large-scale graph corpora spanning multiple domains (e.g., social, web, academic, molecular) to acquire broadly transferable representations. Through various adaptation techniques, such as fine-tuning, distillation, prompting, or zero-shot inference, they can generalize across a wide spectrum of downstream tasks, including node classification, link prediction, graph classification, and graph-to-text generation. (b) In contrast, traditional GNNs are typically trained in an end-to-end manner on a single-domain dataset for a specific task, often lacking the scalability and generalization capabilities required for open-world settings. This shift mirrors the transition observed in language and vision domains, where foundation models have redefined the standard for general-purpose intelligence.
In this context, the rise of Graph Foundation Models (GFMs), as illustrated in Figure 1, seeks to extend these capabilities to graph-structured data-an essential yet fundamentally different modality characterized by relational dependencies, permutation invariance, and non-Euclidean geometry [19, 20, 21]. GFMs aspire to offer a unified, pretrainable, and adaptable solution for a wide range of graph-based applications, spanning from molecular property prediction and knowledge graph reasoning to social network analysis and recommendation systems. For instance, OFA [22] operates on eight text-attributed graphs (TAGs), spanning citation networks, Wikipedia networks, knowledge graphs, and molecular graphs, where each node is associated with a textual description. By employing a shared textual encoder, OFA projects these node descriptions into a unified embedding space, thereby aligning node features across graphs. To bridge the gap between pretraining and downstream tasks, it further introduces a prompt graph mechanism tailored to facilitate task adaptation. Similarly, GFT [23] identifies transferable patterns in graph data by modeling them as computation trees. It aligns node representations across graphs via a tree reconstruction task designed to capture cross-domain generalization. A key innovation of GFT lies in its construction of a transferable tree vocabulary, which encodes structural patterns shared across diverse graph domains. Beyond these general-purpose models, various GFMs have been proposed for specific tasks-such as node classification [24, 25], anomaly detection [26], and recommendation systems [27]-or are specialized for particular domains, including knowledge graphs [28, 29], molecular graphs [30, 31], and computation graphs [32, 33].
Existing Surveys. Despite the rapid progress and growing interest in GFMs, the literature still lacks a comprehensive and unified survey that systematically covers the breadth and depth of this emerging field. Existing reviews tend to focus on isolated aspects of GFMs, offering fragmented insights without capturing the full landscape of foundational techniques, design challenges, and research directions. For instance, Liu et al. [34] propose a taxonomy of GFMs based on backbone architectures, categorizing them into GNN-based, LLM-based, and hybrid GNN+LLM models, but their discussion remains limited to methodologies, without delving into applications and theoretical understandings. Zhao et al. [35] center their analysis around pretraining objectives, offering valuable insights into learning paradigms. However, their scope excludes broader system design and theoretical insights. Mao et al. [36] provide a theoretical perspective on transferability within GFMs, shedding light on generalization capacity but omitting concrete methodological advances and empirical systematization. Wang et al. [37] similarly emphasize transferability and emergent abilities, but without encompassing the full architectural, algorithmic, and application-driven spectrum of GFMs. In a complementary direction, Zhao et al. [38] survey methods for cross-domain graph learning, a crucial but singular facet of GFM design. However, effective foundation models must also address cross-task generalization and structural alignment across diverse graph types. Other works such as Wu et al. [39] explore the use of GFMs in specific domains like recommender systems, while recent reviews [40, 41, 42, 43] focus on the integration of GNNs and LLMs, treating them as a subfield rather than part of a cohesive GFM framework.
Our Position. In contrast, our survey aims to bridge these gaps by offering a holistic and systematic review of graph foundation models. We begin by outlining the historical development and foundational challenges behind GFMs, followed by a unified framework that decomposes GFMs into their core components-backbone architectures, pretraining strategies, and adaptation mechanisms. We introduce a comprehensive taxonomy that classifies GFMs into universal, domain-specific, and task-specific paradigms. Moreover, we analyze theoretical foundations (e.g., transferability, emergent ability), benchmark resources, and current limitations. Finally, we synthesize open research challenges and future directions to guide the continued advancement of the field. Our key contributions are summarized as follows:
Challenges in Designing GFMs (Section 3). We identify and categorize the fundamental challenges in building graph foundation models into three core dimensions: feature heterogeneity, structural heterogeneity, and task heterogeneity. These challenges highlight the unique complexities of learning from graph-structured data at scale.
A Unified Framework (Section 4). We propose a unified modular framework that decomposes GFMs into three key components: backbone architectures, pretraining strategies, and adaptation mechanisms. This abstraction facilitates a systematic understanding of diverse design choices and supports composability across methods.
Taxonomy and Comprehensive Review (Sections 5, 6, 7). We introduce a principled taxonomy that classifies GFMs into three categories based on their scope and generalization capacity: universal GFMs, domain-specific GFMs, and task-specific GFMs. For each category, we conduct an extensive literature review¹, detailing design philosophies and summarizing representative models.
Theoretical Foundations (Section 8). We explore the theoretical foundations underpinning GFMs, with a focus on scaling laws, transferability theory, and emerging understanding of generalization in graph-based pretraining. These insights provide formal grounding for the empirical success of GFMs.
Resources and GitHub Repository (Section 9). To support reproducibility and accelerate ongoing research, we compile and release a curated repository of resources, including benchmark datasets, open-source implementations, pretrained models, and a living GitHub collection: https://github.com/Zehong-Wang/Awesome-Foundation-Models-on-Graphs.
Open Questions (Section 10). We conclude by outlining key open problems in the development of GFMs, such as effective alignment across heterogeneous graphs, scalable and efficient adaptation mechanisms, robust evaluation protocols, and deeper theoretical insights. These challenges point to promising avenues for advancing the next generation of general-purpose graph learning systems.
Summary of Future Directions in Building Graph Foundation Models. Despite recent progress, the development of GFMs remains in its infancy, with numerous open challenges spanning scalability, data availability, evaluation, utilization, and theoretical understanding. First, unlike LLMs and VLMs that benefit from established scaling laws, GFMs require more scalable architectures, high-level generative objectives, and unified learning instances to unlock similar performance gains. Second, addressing the data scarcity inherent to graphs calls for automated graph data collection, high-fidelity synthetic generation, and quality-centric dataset curation strategies. Third, evaluating GFMs demands benchmarks that reflect real-world tasks, alongside metrics that capture generalization, robustness, and trustworthiness. Fourth, effectively utilizing GFMs involves improving adaptation mechanisms (e.g., zero-shot and prompt-based learning), identifying high-impact applications beyond traditional graph tasks, and integrating multimodal knowledge representations. Finally, theoretical foundations remain underexplored-key issues include understanding the limits of transferability, resolving cross-domain pattern conflicts, ensuring robustness under distribution shifts, and deriving generalization guarantees. Addressing these open questions is essential to realizing the full potential of GFMs across diverse domains. The comprehensive discussion is provided in Section 10.

¹ As of April 1, 2025.
2 Background
2.1 A Brief History of Graph Learning
Similar to the trajectory observed in natural language processing (NLP) and computer vision (CV), graph machine learning is undergoing a paradigm shift-from highly specialized, task-specific models toward more unified and general-purpose frameworks. This evolution has progressed through several key milestones, as outlined below:
Traditional Graph Learning Methods. Early approaches to graph learning were deeply rooted in classical graph theory and combinatorial optimization [44]. Algorithms such as shortest path computation [45], spectral clustering [46], and graph kernels [47] enabled important applications in network analysis, community detection, and graph matching. However, these methods often relied on handcrafted features, struggled with scalability, and lacked the capacity to learn rich, transferable representations.
Graph Embedding. The integration of representation learning into graphs led to the emergence of graph embedding techniques. Methods such as DeepWalk [48], node2vec [49], and LINE [50] introduced the idea of mapping nodes into low-dimensional continuous vector spaces via random walks or neighborhood sampling. These embeddings proved effective for downstream tasks like node classification, clustering, and link prediction. Nonetheless, they were largely transductive, lacked inductive generalization, and captured structural patterns without accommodating node or edge attributes.
Graph Neural Networks. A transformative step occurred with the introduction of Graph Neural Networks (GNNs) [51], which brought deep learning principles to non-Euclidean graph structures. GNNs utilize message-passing mechanisms [52] to iteratively aggregate and update node representations based on their neighbors. Key models include Graph Convolutional Networks (GCNs) [19], Graph Attention Networks (GATs) [20], and GraphSAGE [21], each improving expressive power and generalization in different contexts. While GNNs advanced the field significantly, they remained constrained by challenges such as oversmoothing, limited receptive fields, and the need for extensive task-specific architecture tuning.
Figure 2: The Evolution of Graph Learning Paradigms. This figure illustrates the historical trajectory of graph learning, highlighting the increasing task-solving capacity over time. (1) Statistical methods (pre-2010s) relied on heuristic-driven techniques, such as spectral analysis and graph kernels, to solve narrowly scoped graph tasks. (2) Graph embeddings (circa 2010) introduced shallow, task-agnostic representations via random walks or matrix factorization, enabling better structural understanding. (3) Graph neural networks (2016 onward) adopted deep learning principles-particularly message passing-to build end-to-end task-specific models capable of capturing semantic dependencies. (4) Graph foundation models (post-2023) represent the latest paradigm, aiming for universal, general-purpose solvers that are pretrained on large-scale graphs and adapted to diverse downstream tasks across domains. This timeline reflects a broader shift from handcrafted, task-bound solutions to scalable, generalizable graph intelligence.
Graph Foundation Models. Inspired by the success of foundation models in NLP and vision, graph learning has recently entered the era of Graph Foundation Models. These models [23, 30, 53] are pretrained on large-scale graphs using self-supervised objectives [54], enabling them to learn universal representations that transfer across tasks and domains. GFMs integrate both structural dependencies and semantic content, exhibiting strong zero-shot and few-shot generalization. Applications span diverse domains, including molecular property prediction, social network analysis, recommendation systems, and knowledge graphs. By decoupling model training from specific tasks, GFMs reduce reliance on labeled data and domain-specific heuristics, moving the field closer to general-purpose graph intelligence.
2.2 Background of Foundation Models
Foundation Models. Foundation models have emerged as a cornerstone in modern artificial intelligence, representing a paradigm shift from narrowly designed task-specific models to highly generalizable systems. These models are defined by large-scale pretraining on diverse and heterogeneous datasets-ranging from web text and books to images, code, and multimodal content-which enables them to develop broad, transferable capabilities across a wide array of domains. The term "foundation model" was popularized by the Stanford Institute for Human-Centered Artificial Intelligence [11], underscoring the shared trends across modalities such as language, vision, code, and audio. Architecturally, foundation models are typically built upon the transformer framework [55], which leverages self-attention mechanisms to effectively capture long-range dependencies and scale across billions of parameters. Prominent examples include GPT [56], BERT [57], PaLM [58], and LLaMA [59] in the language domain, as well as CLIP [60] and DALL•E [60, 61] for vision-language tasks.
A defining strength of foundation models lies in their ability to learn general-purpose representations that are highly adaptable to new tasks with minimal additional supervision. Through exposure to large and diverse pretraining data, these models acquire versatile semantic, syntactic, and structural knowledge, which can be transferred to a wide range of downstream tasks via fine-tuning, prompting, or instruction tuning. Their transformer-based architecture further enhances their capacity to model complex and structured data, making them suitable for tasks involving reasoning, generation, and classification. Notably, foundation models demonstrate strong few-shot and zero-shot learning capabilities by leveraging in-context learning-where the model performs new tasks based solely on input prompts without additional gradient updates. This opens up a flexible interface for users to guide model behavior through natural language or structured prompts.
As these models scale in size and training data, they begin to exhibit emergent behaviors-capabilities that were not explicitly programmed or observed in smaller models [62, 12]. These include logical reasoning, chain-of-thought generation, tool use, and other complex cognitive functions that arise organically during training. While these behaviors highlight the vast potential of foundation models as general-purpose AI systems, they also introduce new challenges. Issues such as data and societal bias, interpretability, safety, and environmental cost become increasingly relevant as these models are integrated into real-world applications. Nonetheless, the ability of foundation models to unify learning across tasks and domains continues to reshape the AI landscape and drive innovation across disciplines.
Graph Foundation Models. Inspired by the success of foundation models in other domains, graph foundation models are proposed as specialized foundation models designed to understand and reason over graph-structured data. Graphs, comprising nodes (entities) and edges (relationships), are widely used to represent complex systems such as social networks, molecular structures, and knowledge graphs. Like other foundation models, GFMs are pre-trained on large, diverse graph datasets to learn general-purpose representations that can be fine-tuned or adapted for various downstream tasks like node classification, link prediction, and graph generation. By capturing the structural and relational properties inherent in graphs, GFMs enable more effective and scalable analysis across domains where interconnected data plays a central role. We summarize the key properties of GFMs in the following.
Pretraining on Large-Scale Graph Data. GFMs are trained on extensive and diverse graph datasets, enabling them to learn generalizable patterns and structural semantics across various domains (e.g., biology, social networks, and knowledge graphs).
General-Purpose Representations. GFMs learn universal node, edge, and graph-level embeddings that can be adapted to a wide range of tasks with minimal fine-tuning or prompting efforts.
Structural Awareness. GFMs inherently capture topological features of graphs, such as connectivity, neighborhood structure, and global graph properties, making them effective in modeling complex relationships.
Transferability. Similar to other foundation models, GFMs can transfer knowledge across tasks and domains, allowing for effective performance even with limited task-specific data.
Few-shot and Zero-shot Capabilities. Once pre-trained, GFMs can often perform new tasks with few or even no labeled examples by leveraging their rich internal representations.
2.3 Definitions & Notations
We introduce key notations and concepts used throughout this paper. Table 1 summarizes the primary symbols and their respective meanings. Throughout the paper, bold uppercase letters denote matrices, while bold lowercase letters represent vectors.
Definition 2.1 (Graph). A graph is represented as $\mathcal{G}=(\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ are the node and edge sets, and $N=|\mathcal{V}|$ and $M=|\mathcal{E}|$ denote the number of nodes and edges. This defines the adjacency matrix $\mathbf{A}$: if and only if $(i, j) \in \mathcal{E}$, then $\mathbf{A}_{ij}=1$; otherwise, $\mathbf{A}_{ij}=0$. This allows us to define the neighborhood of each node in the graph as $\mathcal{N}_{v}=\{u \in \mathcal{V} \mid (u, v) \in \mathcal{E}\}$.
Definition 2.2 (Attributed Graph). If a graph is associated with node features or edge features, it is denoted as an attributed graph, represented as $\mathcal{G}=(\mathbf{X}, \mathbf{A})$, where $\mathbf{X} \in \mathbb{R}^{N \times D}$ and $\mathbf{A} \in \{0,1\}^{N \times N}$ denote the node attributes and adjacency matrix, respectively. The raw attribute of node $v_{i} \in \mathcal{V}$ is represented by $\mathbf{x}_{i} \in \mathbb{R}^{D}$.
Definition 2.3 (Text-Attributed Graph (TAG)). A text-attributed graph is formally defined as $\mathcal{G}=(\mathbf{X}, \mathbf{A}, \mathbf{D})$, where $\mathbf{X}$ denotes node attributes, $\mathbf{A}$ represents the adjacency matrix, and $\mathbf{D}$ encapsulates textual descriptions associated with nodes, edges, or the entire graph. Specifically, the textual description linked to a node $v_{i}$ is denoted as $\mathbf{d}_{v_{i}}$, while $\mathbf{d}_{e_{ij}}$ corresponds to the textual description of edge $e_{ij}$. Additionally, $\mathbf{d}_{\mathcal{G}}$ represents the textual information describing the entire graph $\mathcal{G}$.

Table 1: Summary of Notations

| Symbol | Description |
| :--- | :--- |
| $\mathcal{G}$ | A graph |
| $\mathcal{V}, \mathcal{E}$ | Sets of nodes and edges in graph $\mathcal{G}$ |
| $N, M$ | Number of nodes and edges |
| $v_{i} \in \mathcal{V}$ | A node in graph $\mathcal{G}$ |
| $e_{ij} \in \mathcal{E}$ | An edge in graph $\mathcal{G}$ |
| $\mathbf{X} \in \mathbb{R}^{N \times D}$ | Node attribute matrix for graph $\mathcal{G}$ |
| $\mathbf{x}_{i} \in \mathbb{R}^{D}$ | Feature vector for node $v_{i} \in \mathcal{V}$ |
| $\mathbf{E} \in \mathbb{R}^{M \times D}$ | Edge attribute matrix for graph $\mathcal{G}$ |
| $\mathbf{e}_{ij} \in \mathbb{R}^{D}$ | Feature vector for edge $e_{ij} \in \mathcal{E}$ |
| $\mathbf{A} \in\{0,1\}^{N \times N}$ | Adjacency matrix of graph $\mathcal{G}$ |
| $\mathbf{D}$ | Textual information on graphs |
| $\mathbf{d}_{v_{i}}$ | Text description associated with node $v_{i}$ |
| $\mathbf{d}_{e_{ij}}$ | Text description associated with edge $e_{ij}$ |
| $\mathbf{d}_{\mathcal{G}}$ | Textual description associated with the entire graph |
| $\mathbf{Z} \in \mathbb{R}^{N \times D^{\prime}}$ | Learned node representations |
| $\mathbf{z}_{i} \in \mathbb{R}^{D^{\prime}}$ | Learned representation of node $v_{i} \in \mathcal{V}$ |
| $\mathcal{N}_{v}$ | Neighborhood set of node $v \in \mathcal{V}$ |
| $\mathcal{T}$ | Set of augmentation functions |
| $\mathbf{W}, \boldsymbol{\Theta}, w, \theta$ | Learnable parameters of the model |
| $t \sim \mathcal{T}$ | A specific augmentation function sampled from $\mathcal{T}$ |
| $\lvert \cdot \rvert$ | Cardinality of a set |
| $\|$ | Concatenation operator |
| $\operatorname{GNN}(\cdot)$ | Graph Neural Network (GNN) encoder |
| $\operatorname{LLM}(\cdot)$ | Large Language Model (LLM) encoder |
Definition 2.4 (Graph Neural Network (GNN)). Graph neural networks are a class of neural architectures specifically designed to operate on graph-structured data. Given an attributed graph $\mathcal{G}=(\mathbf{X}, \mathbf{A})$, GNNs learn node, edge, or graph-level representations by recursively aggregating and transforming information from local neighborhoods. GNNs capture the relational and topological structure of graphs and are widely used in tasks such as node classification, link prediction, and graph classification.
Definition 2.5 (Large Language Model (LLM)). Large language models are deep neural architectures, typically based on the Transformer framework, that are pretrained on massive textual corpora to learn general-purpose language representations. Given an input text sequence or query $q$, $\operatorname{LLM}(\cdot)$ produces a contextual output, often modeled via next-token prediction. LLMs exhibit strong capabilities in text understanding, generation, and reasoning, and support a wide range of downstream applications without requiring explicit fine-tuning. When applied to graph-structured data, LLMs can process text-attributed graphs by leveraging associated textual information for structure-aware inference.
Definition 2.6 (Graph Foundation Model (GFM)). Graph foundation models are a class of large-scale models pretrained on extensive cross-domain and cross-task graph datasets. Through pretraining, GFMs acquire transferable knowledge and general-purpose capabilities, demonstrating emergent properties and adaptability across diverse graph-based applications. These include molecular property prediction, recommender systems, social network analysis, and anomaly detection, where GFMs effectively leverage structural and relational information to enhance predictive performance.
3 Challenges in Designing Graph Foundation Models
Graph datasets inherently capture complex real-world phenomena through rich and diverse relational structures. Due to the heterogeneous nature of graph-structured data across domains, designing a single, unified model that generalizes well to various graphs remains a substantial challenge. GFMs are designed to overcome these barriers by learning transferable and adaptable representations across diverse graph settings. In this section, we highlight three fundamental challenges in designing GFMs: feature heterogeneity, structure heterogeneity, and task heterogeneity. These challenges collectively encapsulate the difficulty of building a universally applicable model across diverse graph datasets and learning scenarios.
Addressing feature, structure, and task heterogeneity is central to the development of graph foundation models. We provide a conceptual analysis of each challenge and its root causes below. Strategies to resolve these issues are discussed in depth in Sections 5, 6, and 7.
3.1 Feature Heterogeneity
Feature heterogeneity refers to differences in node, edge, or graph-level features across datasets. This challenge arises from two primary sources: (1) domain-specific differences and (2) preprocessing inconsistencies, as shown in Figure 3(a). Domain-specific differences occur because graphs from different fields encode different types of node and edge attributes.
3.2 Structure Heterogeneity

Structure heterogeneity refers to the differences in topological patterns across graph datasets. These differences significantly impact model performance, as graph structure plays a crucial role in downstream reasoning tasks. For example, social and citation networks often exhibit localized dependencies and motifs such as star-shaped hubs and triangles that capture popularity and community structures. In contrast, molecular graphs exhibit long-range dependencies, characterized by ring structures and $k$-cliques that encode chemical substructures and interactions. These structural patterns vary widely across domains, complicating the development of a GNN model that generalizes effectively across graph types. Increasing model depth or capacity is not a sufficient solution, as it introduces problems such as over-smoothing [67] and over-squashing [68], both of which hinder the propagation of discriminative signals. Moreover, the locality bias inherent in traditional message-passing GNNs limits their ability to capture global structures [69].
Remedy. To improve structural adaptability, several techniques have been proposed, including structure-aware data augmentation [70], graph prompt tuning [71, 72], and discrete structural codebooks [23, 73]. However, current approaches still struggle to adapt to the full spectrum of graph structures found in real-world data.
3.3 Task Heterogeneity
Task heterogeneity captures the diversity of learning objectives in graph-based tasks, each requiring different modeling strategies and inductive biases. Unlike in natural language processing, where many tasks can be reformulated as question-answering, graph tasks vary substantially in formulation and underlying assumptions. Node-level tasks aim to classify individual nodes using information from their local neighborhoods. Success in these tasks often depends on modeling homophily or heterophily in node interactions. Link-level tasks focus on predicting the presence or type of edges between node pairs. These tasks often rely on proximity metrics, such as common neighbors or shortest paths, to infer relational patterns. Graph-level tasks require holistic understanding of the entire graph, necessitating the extraction of subgraph motifs and global dependencies. Additionally, more complex domain-specific tasks, such as reasoning over knowledge graphs or molecule generation, introduce unique modeling requirements, further complicating generalization across task types.
Remedy. Tackling task heterogeneity involves either aligning tasks explicitly or developing task-agnostic methods. Explicit alignment strategies reformulate diverse tasks into a unified objective, such as link prediction [74], subgraph classification [22, 75], or tree classification [23, 76]. Implicit alignment methods aim to learn generalizable representations across tasks without relying on task-specific reformulations [77]. Although progress has been made, a universal GFM capable of seamlessly adapting to varied graph tasks remains a challenging and largely unsolved goal.
4 A Unified Framework of Graph Foundation Models
4.1 Unified Framework
Traditional graph learning models typically operate under a task-specific, end-to-end paradigm. A single graph dataset is provided as input, and a graph neural network is trained directly to solve a designated downstream task, such as node classification, link prediction, or graph classification. In contrast, graph foundation models embrace a paradigm shift: rather than being narrowly optimized for a specific task or dataset, they are pretrained on a diverse collection of graph datasets and subsequently adapted to a broad spectrum of downstream scenarios. Figure 1 illustrates the fundamental distinctions between traditional GNNs and GFMs. Central to this new paradigm is the principle of "pretrain-then-adapt". Given a large-scale graph database $\mathcal{D}_{pt}$ as the source of pretraining data, and a parameterized model backbone $\theta$ (which may be a GNN, Transformer, or even a language model), the goal is to learn generalized graph representations by minimizing a pretraining objective:
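In its general form, writing $\mathcal{L}_{pt}$ for a generic self-supervised pretraining loss (the specific choice varies across methods), this stage can be expressed as:

$$\theta^{*}=\arg \min _{\theta} \mathcal{L}_{pt}\left(\theta ; \mathcal{D}_{pt}\right).$$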
Once pretraining is complete, the model parameters $\theta^{*}$ encapsulate transferable knowledge about graph structure, semantics, and dynamics. This pretrained model can then be directly applied to downstream tasks, or further adapted for improved performance. To enhance task-specific generalization, an additional adaptation stage can be employed. Given downstream data $\mathcal{D}_{\text{adapt}}$ relevant to a target task or domain, the pretrained model parameters $\theta^{*}$ are fine-tuned by minimizing an adaptation loss:
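Analogously, writing $\mathcal{L}_{\text{adapt}}$ for a generic task-specific loss and $\theta^{\dagger}$ for the adapted parameters used at inference time, this stage can be expressed as:

$$\theta^{\dagger}=\arg \min _{\theta} \mathcal{L}_{\text{adapt}}\left(\theta ; \mathcal{D}_{\text{adapt}}\right), \quad \text{with } \theta \text{ initialized from } \theta^{*}.$$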
This unified framework-comprising a backbone architecture, a pretraining strategy, and an adaptation mechanism-forms the core of GFM methodology. We describe each of these components in detail below:
Backbone: The backbone refers to the foundational architecture responsible for processing graph-structured data and learning meaningful representations. It defines how nodes, edges, and global contexts are encoded, integrating structural dependencies and attribute information. Effective backbones can range from GNNs and LLMs to even hybrid architectures. A well-designed backbone is crucial for enabling scalability, multimodal integration, and cross-domain generalization in GFMs.
Pretraining: Pretraining is the stage where the model learns general-purpose graph representations from large-scale, often unlabeled, graph corpora. This is typically achieved via self-supervised objectives such as contrastive learning, graph masking, or predictive modeling of structural and semantic properties. The goal is to endow the model with a rich understanding of universal graph patterns, fostering transferability, robustness, and data efficiency. A powerful pretraining scheme lays the groundwork for zero-shot and few-shot generalization across a multitude of graph tasks and domains.
Adaptation: Adaptation refers to the process of aligning the pretrained model with specific downstream tasks or domains. This can involve fine-tuning all or part of the model parameters, employing lightweight tuning methods, or appending task-specific prediction heads. The adaptation phase ensures that the generalized knowledge from pretraining is effectively leveraged for specialized applications. When designed properly, adaptation enhances task performance, reduces data requirements, and facilitates rapid deployment across different graph scenarios.
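To ground these three components, the sketch below outlines a bare-bones pretrain-then-adapt loop in PyTorch. The names (`backbone`, `pretrain_loss`, `task_head`, `adapt_loss`) are illustrative placeholders rather than components of any specific GFM; a real system would plug in a concrete architecture and self-supervised objective.

```python
import torch
import torch.nn as nn

def pretrain(backbone: nn.Module, pretrain_graphs, pretrain_loss, epochs=100, lr=1e-3):
    """Stage 1: learn general-purpose parameters theta* on a large cross-domain graph corpus."""
    optimizer = torch.optim.Adam(backbone.parameters(), lr=lr)
    for _ in range(epochs):
        for graph in pretrain_graphs:                 # D_pt: pretraining graph database
            optimizer.zero_grad()
            loss = pretrain_loss(backbone, graph)     # e.g., contrastive or reconstruction objective
            loss.backward()
            optimizer.step()
    return backbone                                   # parameters now encode transferable knowledge

def adapt(backbone: nn.Module, task_head: nn.Module, task_data, adapt_loss, epochs=20, lr=1e-4):
    """Stage 2: align the pretrained backbone with a downstream task (full fine-tuning shown)."""
    params = list(backbone.parameters()) + list(task_head.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for graph, labels in task_data:               # D_adapt: labeled downstream data
            optimizer.zero_grad()
            predictions = task_head(backbone(graph))  # task-specific prediction head on top
            loss = adapt_loss(predictions, labels)
            loss.backward()
            optimizer.step()
    return backbone, task_head
```

Lighter-weight alternatives, such as freezing the backbone, prompt tuning, or zero-shot inference, would change only which parameters the second stage updates.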
In subsequent sections, we delve deeper into each component of this framework, surveying representative methods and highlighting emerging trends that shape the design and capabilities of graph foundation models.
4.2 Backbone Architectures
The backbone architectures of GFMs are designed to integrate both structural and semantic information by leveraging the representational capabilities of graph-based models and large language models. Graph models, such as Graph Neural Networks (GNNs) and Graph Transformers, are particularly adept at capturing structural dependencies and relational patterns in graph data.
4.2.1 Graph Model as Predictor

Graph models play a central role in learning from graph-structured data, where nodes represent entities and edges define relational dependencies. Unlike traditional neural networks operating on Euclidean domains such as images or sequences, graph models extend deep learning paradigms to non-Euclidean spaces, enabling powerful representations across a diverse set of domains, including recommender systems [79], social networks [19, 80, 81], molecular property prediction [52], knowledge graphs [82, 83], and healthcare analytics [84].
At the core of graph-based learning lies the message-passing paradigm, where node representations are iteratively updated by aggregating information from their local neighborhoods. Formally, given a graph $\mathcal{G}=(\mathcal{V}, \mathcal{E})$ with node set $\mathcal{V}$ and edge set $\mathcal{E}$, the update rule at the $k$-th layer of a GNN is typically defined as:
$$\mathbf{h}_{v}^{(k)}=\operatorname{UPDATE}\left(\mathbf{h}_{v}^{(k-1)}, \operatorname{AGGREGATE}\left(\left\{\mathbf{h}_{u}^{(k-1)}: u \in \mathcal{N}(v)\right\}\right)\right),$$
where $\mathbf{h}_{v}^{(k)}$ denotes the feature embedding of node $v$ at layer $k$, $\mathcal{N}(v)$ is the set of its neighbors, and the AGGREGATE and UPDATE functions define the information propagation and transformation.
Different graph models instantiate these functions in unique ways. For instance, in Graph Convolutional Networks (GCNs) [19], aggregation is performed via normalized summation: $\mathbf{h}_{v}^{(k)}=\sigma\left(\mathbf{W}^{(k)} \cdot \frac{1}{|\mathcal{N}(v)|} \sum_{u \in \mathcal{N}(v) \cup\{v\}} \mathbf{h}_{u}^{(k-1)}\right)$, where $\mathbf{W}^{(k)}$ is a learnable weight matrix and $\sigma$ is a non-linear activation. Graph Attention Networks (GATs) [20], on the other hand, use attention mechanisms to assign importance scores $\alpha_{vu}$ to neighboring nodes: $\mathbf{h}_{v}^{(k)}=\sigma\left(\sum_{u \in \mathcal{N}(v)} \alpha_{vu} \cdot \mathbf{W}^{(k)} \mathbf{h}_{u}^{(k-1)}\right)$, where $\alpha_{vu}$ is computed via a self-attention mechanism, allowing nodes to dynamically focus on the most relevant neighbors.
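As a concrete illustration of the AGGREGATE/UPDATE pattern, the following is a minimal PyTorch sketch of a GCN-style layer that averages over the self-loop-augmented neighborhood; the class name and the toy graph are illustrative and not drawn from any surveyed method.

```python
import torch
import torch.nn as nn

class SimpleMessagePassingLayer(nn.Module):
    """A minimal GCN-style message-passing layer: mean aggregation over neighbors
    plus a self-loop, followed by a learnable linear update and nonlinearity."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h:   [N, in_dim] node embeddings from the previous layer
        # adj: [N, N] binary adjacency matrix A
        n = adj.size(0)
        adj_hat = adj + torch.eye(n, device=adj.device)      # add self-loops (N(v) ∪ {v})
        deg = adj_hat.sum(dim=1, keepdim=True)               # neighborhood size per node
        aggregated = adj_hat @ h / deg                       # AGGREGATE: mean over neighborhood
        return torch.relu(self.weight(aggregated))           # UPDATE: linear transform + activation

# Usage: two stacked layers on a toy 4-node cycle graph.
adj = torch.tensor([[0., 1., 0., 1.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [1., 0., 1., 0.]])
x = torch.randn(4, 8)
layer1, layer2 = SimpleMessagePassingLayer(8, 16), SimpleMessagePassingLayer(16, 16)
z = layer2(layer1(x, adj), adj)   # [4, 16] node representations after two rounds of message passing
```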
Recent advances have pushed the expressive capacity of GNNs by incorporating global attention mechanisms inspired by Transformers [55]. Graph Transformers [85, 86, 87] replace local aggregation with attention over
all nodes in the graph, enabling the capture of long-range dependencies:

$$\mathbf{h}_{v}^{(k)}=\sum_{u \in \mathcal{V}} \alpha_{vu} \cdot \mathbf{W} \mathbf{h}_{u}^{(k-1)}, \quad \alpha_{vu}=\frac{\exp \left(\phi\left(\mathbf{h}_{v}, \mathbf{h}_{u}\right)\right)}{\sum_{w \in \mathcal{V}} \exp \left(\phi\left(\mathbf{h}_{v}, \mathbf{h}_{w}\right)\right)},$$
where $\phi(\cdot, \cdot)$ denotes a learnable compatibility function (e.g., dot-product), and $\alpha_{vu}$ reflects the attention weight assigned to node $u$ from the perspective of node $v$.
Graph Models without Auxiliary Modules. Several GFMs adopt pure graph-based backbones, relying solely on structural signals without external models. These approaches capitalize on the inductive bias of GNNs for relational learning. For example, MINIMOL [88] introduces a parameter-efficient GNN for molecular property prediction, where atomic features $\mathbf{x}_{i}$ are encoded via a shared initialization function $\mathbf{h}_{i}^{(0)}=f_{\theta}\left(\mathbf{x}_{i}\right)$. JMP [30] proposes a hierarchical representation scheme, with node embeddings updated using a degree-normalized formulation $\mathbf{h}_{i}^{(t)}=\sigma\left(\mathbf{W} \sum_{j \in \mathcal{N}_{i}} \frac{\mathbf{h}_{j}^{(t-1)}}{\sqrt{d_{i} d_{j}}}\right)$, where $d_{i}$ and $d_{j}$ are the degrees of nodes $i$ and $j$, respectively. These approaches demonstrate strong performance in domains with rich relational signals, though they may struggle in scenarios requiring multimodal reasoning or external contextual knowledge.
Graph Models with Auxiliary Language Models. To address the limitations of pure GNNs, several methods incorporate language models as auxiliary modules. These hybrid approaches enable the integration of unstructured textual knowledge into graph representation learning. Two primary strategies have emerged: (i) enhancing node features using LLMs prior to graph encoding, and (ii) employing LLMs as direct encoders over pure graph structure. In the first category, TAPE [89] introduces an LLM-to-LM interpreter that extracts task-specific textual explanations, which are then converted into structured node features. Specifically, given a textual explanation $\mathbf{e}_{v}$ generated by a language model, node embeddings are computed via $\mathbf{h}_{v}=f_{\mathrm{LM}}\left(\mathbf{e}_{v}\right)$, where $f_{\mathrm{LM}}$ is a frozen or fine-tuned language model. These embeddings are then used within the graph model to enhance learning. In the second strategy, LLMs act as standalone encoders of node-level text. For example, OFA [22] proposes a unified textual template to align different graph node descriptions, which are then encoded into a shared embedding space using a pretrained LM. This enables zero-shot generalization across domains by leveraging the linguistic alignment of node semantics. These auxiliary-enhanced graph models highlight the flexibility and effectiveness of hybrid architectures, capable of integrating structural and semantic signals in a unified learning framework.
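As a concrete illustration of the first strategy (an LM produces node features that a graph model then refines), the sketch below assumes a SentenceTransformer-style frozen text encoder and a plain one-step neighborhood average as the graph model; the model name, node texts, and toy graph are placeholders rather than the exact TAPE or OFA setup.

import torch
from sentence_transformers import SentenceTransformer

# Hypothetical textual descriptions attached to a three-node citation graph.
node_texts = [
    "A paper on contrastive pretraining for molecular graphs.",
    "A survey of message-passing neural networks.",
    "A benchmark dataset for citation-network classification.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")          # frozen f_LM
with torch.no_grad():
    X = torch.tensor(encoder.encode(node_texts))           # h_v = f_LM(e_v)

A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])                           # toy adjacency matrix
A_hat = A + torch.eye(3)
H = (A_hat @ X) / A_hat.sum(dim=1, keepdim=True)           # text-derived features smoothed over the graph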
4.2.2 Language Model as Predictor
Language models, initially developed for natural language processing tasks such as machine translation, summarization, and question answering [90], have recently found increasing utility in graph learning applications [43, 41]. By leveraging large-scale pre-trained models [57, 56, 59, 91, 92, 93, 94, 95], these approaches enable structured data to be encoded, reasoned over, and generalized through powerful language-driven representations. A key example is BERT [57], a bidirectional Transformer that optimizes a masked-token prediction (MTP) objective:
$$\mathcal{L}_{\mathrm{MTP}}=-\mathbb{E}_{S \sim \mathcal{D}}\left[\frac{1}{N_{S}} \sum_{i=1}^{N_{S}} \log p_{\theta}\left(s_{i} \mid S_{\backslash i}\right)\right],$$
where $S$ is a sentence sampled from corpus $\mathcal{D}$, $s_{i}$ denotes a masked token, and $N_{S}$ is the sentence length. This objective promotes contextual representation learning across both left and right contexts. In contrast, autoregressive language models such as GPT-3 [56] use next-token prediction (NTP), predicting each token sequentially:
$$\mathcal{L}_{\mathrm{NTP}}=-\mathbb{E}_{S \sim \mathcal{D}}\left[\sum_{i=1}^{N_{S}} \log p_{\theta}\left(s_{i} \mid s_{<i}\right)\right].$$
Scaling model size and training data has led to emergent capabilities, including in-context learning [96], chain-of-thought reasoning [97], and zero-shot generalization [98]. Foundation models such as GPT [14], PaLM [58], and LLaMA [59, 91, 92] exhibit strong compositional reasoning and flexible adaptation, making them promising candidates for graph-based learning when appropriately adapted.
Language Models without Auxiliary Modules. Standalone language models can be used as graph predictors by transforming graph structures into sequential representations, allowing graph reasoning via text processing without relying on GNNs. This approach serializes the graph structure into a textual input, enabling LLMs to interpret node identities, features, and relationships. The transformation pipeline typically follows:
$$\mathcal{G}=(\mathcal{V}, \mathcal{E}, \mathbf{X}) \longrightarrow \mathcal{S}=\left(\operatorname{Tokenize}\left(v_{i}, \mathbf{x}_{i}, \mathcal{N}\left(v_{i}\right)\right)\right)_{v_{i} \in \mathcal{V}},$$
where $\operatorname{Tokenize}(\cdot)$ encodes each node $v_{i}$, its features $\mathbf{x}_{i}$, and its neighborhood $\mathcal{N}\left(v_{i}\right)$ into a structured sequence. Language models then perform inference based solely on this textual input. LangGFM [99] exemplifies this method by converting graphs into natural language templates and demonstrating that LLMs can reason over structural patterns without explicit graph operations. BeyondText [100] further validates LLMs' ability to recover graph topologies and relational dependencies through textual prompts. To optimize these strategies, GLM [101] adopts reinforcement learning to refine prompt engineering, tailoring input formats for graph-aware inference. Despite their flexibility, pure LLM-based methods can struggle to internalize structural inductive biases, particularly for graphs with rich topologies or weak textual signals. To address these challenges, recent work introduces structured tokenization, retrieval mechanisms, and task-specific tuning, leading toward hybrid VLM-style frameworks.
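The sketch below shows one possible serialization of a small attributed graph into a natural-language prompt; the template wording and the question format are illustrative assumptions, not the exact prompts used by LangGFM, BeyondText, or GLM.

def serialize_graph(node_attrs, edges, target_node):
    # Turn a small attributed graph into a textual prompt an LLM can consume.
    lines = ["You are given a graph."]
    for node, attrs in node_attrs.items():
        lines.append(f"Node {node} has attributes: {attrs}.")
    for u, v in edges:
        lines.append(f"Node {u} is connected to node {v}.")
    lines.append(f"Question: which class does node {target_node} belong to?")
    return "\n".join(lines)

node_attrs = {
    "A": "title='Graph neural networks for molecules', venue='ICLR'",
    "B": "title='Attention-based sequence models', venue='NeurIPS'",
    "C": "title='Spectral convolutions on graphs', venue='NeurIPS'",
}
edges = [("A", "B"), ("A", "C")]
prompt = serialize_graph(node_attrs, edges, "A")
# `prompt` can be passed to any instruction-tuned LLM for zero-shot graph inference.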
Language Models with Auxiliary Modules. VLM-style architectures enhance language models by incorporating explicit graph structure through cross-modal alignment, graph-aware tokenization, and auxiliary encoders. Unlike standalone LLM methods, these hybrid architectures use graph models to inform or condition LLMs, facilitating structure-aware language understanding. A common pipeline encodes graphs via a GNN and projects node embeddings into the LLM token space:
$$\mathbf{h}_{i}=f_{\mathrm{GNN}}(\mathcal{G})_{i}, \quad \mathbf{z}_{i}=\rho\left(\mathbf{h}_{i}\right), \quad \mathrm{Input}_{\mathrm{LLM}}=\left[\mathbf{p}_{1}, \ldots, \mathbf{p}_{m}, \mathbf{z}_{1}, \ldots, \mathbf{z}_{N}\right],$$
where $\mathbf{z}_{i}=\rho\left(\mathbf{h}_{i}\right)$ are node embeddings transformed via a projector $\rho$, and $\mathbf{p}_{j}$ are trainable or handcrafted prompt tokens. This enables the LLM to process structural features in a token-compatible format. GraphGPT [102] adopts this approach, applying instruction tuning and contrastive learning to align graph and language representations. GraphTranslator [103] introduces structure-aware regularization to reduce semantic drift between modalities. LLaGA [78] further augments reasoning with a retrieval module that extracts informative subgraphs $\mathcal{G}_{s} \subset \mathcal{G}$ based on task relevance, ensuring the LLM focuses on salient relational patterns. Although these methods are often categorized as LLM-centric, they fundamentally rely on GNN components, and are therefore included in the GNN+LLM hybrid paradigm described in the following sections.
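The sketch below isolates the projection step: embeddings from an arbitrary (frozen) graph encoder are mapped by a small MLP projector into the LLM's hidden width and prepended to a handful of trainable prompt tokens. The two-layer projector, the dimensions, and the token layout are assumptions; GraphGPT, GraphTranslator, and LLaGA each instantiate this interface differently.

import torch
import torch.nn as nn

gnn_dim, llm_dim, num_nodes, num_prompt_tokens = 128, 4096, 6, 4

H = torch.randn(num_nodes, gnn_dim)                # placeholder node embeddings h_i from a frozen GNN

projector = nn.Sequential(                         # projector rho: graph space -> LLM token space
    nn.Linear(gnn_dim, llm_dim),
    nn.GELU(),
    nn.Linear(llm_dim, llm_dim),
)
Z = projector(H)                                   # z_i = rho(h_i)

prompt_tokens = nn.Parameter(torch.randn(num_prompt_tokens, llm_dim))   # trainable p_j

# Soft input prepended to the usual text-token embeddings of the frozen LLM.
soft_input = torch.cat([prompt_tokens, Z], dim=0)  # [num_prompt_tokens + num_nodes, llm_dim]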
4.2.3 Graph-Language Co-Training
Graph-Language Co-Training offers a bidirectional learning framework that integrates graph structural modeling and language-based semantic reasoning. Unlike auxiliary-based approaches where one modality dominates, co-training treats graph and language models as co-equal components, encouraging mutual representation learning. Two major paradigms define this space: (i) Graph-Language Alignment, where shared latent spaces are enforced via contrastive learning, and (ii) Graph-Language Iterative Update, where graph and language representations are jointly optimized through multi-stage or variational objectives.
Graph-Language Alignment. This approach maps structured and unstructured representations into a shared embedding space. Inspired by CLIP-like training from vision-language modeling [60], GraphCLIP [104] introduces a dual-encoder framework comprising a GNN encoder $f^{\mathrm{GNN}}$ and a language encoder $f^{\mathrm{LLM}}$. Given a graph $\mathcal{G}=(\mathcal{V}, \mathcal{E}, \mathbf{X})$ and corresponding node descriptions, representations are learned via contrastive loss:
$$\mathcal{L}_{\text {align }}=-\sum_{(v, t) \in \mathcal{P}} \log \frac{\exp \left(\operatorname{sim}\left(f^{\mathrm{GNN}}(v), f^{\mathrm{LLM}}(t)\right) / \tau\right)}{\sum_{t^{\prime}} \exp \left(\operatorname{sim}\left(f^{\mathrm{GNN}}(v), f^{\mathrm{LLM}}\left(t^{\prime}\right)\right) / \tau\right)},$$
where $\operatorname{sim}(\cdot, \cdot)$ denotes similarity (e.g., cosine), $\mathcal{P}$ is the set of aligned node-text pairs, and $\tau$ is a temperature parameter. ConGraT [105] extends this idea with dual encoders trained via cross-modal supervision, where the GNN reinforces text-derived features. These methods improve generalization across graph-text pairs by unifying their representation spaces.
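A minimal sketch of the alignment objective, assuming a batch of precomputed graph-side and text-side embeddings whose rows are aligned node-text pairs; the symmetric two-direction loss and the temperature value are common choices rather than the exact GraphCLIP or ConGraT objective.

import torch
import torch.nn.functional as F

def graph_text_contrastive_loss(graph_emb, text_emb, tau=0.07):
    # graph_emb, text_emb: [B, d]; row i of each side forms an aligned pair.
    g = F.normalize(graph_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = g @ t.T / tau                          # cosine similarities scaled by temperature
    targets = torch.arange(g.size(0))               # the matching text for graph i sits in row i
    loss_g2t = F.cross_entropy(logits, targets)     # graph-to-text direction
    loss_t2g = F.cross_entropy(logits.T, targets)   # text-to-graph direction
    return 0.5 * (loss_g2t + loss_t2g)

loss = graph_text_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))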
Graph-Language Iterative Update. Beyond contrastive alignment, iterative co-training enforces dynamic interaction between graph and language models through successive refinement. GLEM [106] exemplifies this by introducing a latent variable $\mathbf{z}_{v}$ representing the shared semantics of textual and structural inputs. The generative process is modeled as:
where $\mathbf{z}_{v}$ serves as a bridge for integrating modalities. The model alternates between updating semantic features via textual pseudo-labeling and refining topological embeddings through graph propagation, following an EM-style optimization. Such iterative mechanisms offer more expressive fusion of modalities, enabling the co-evolution of graph and language representations and enhancing performance on text-attributed graphs.
4.2.4 Discussion
The backbone architectures of Graph Foundation Models can be broadly classified into three categories: Graph Models, Language Models, and Hybrid Models. In summary, graph-based backbones offer structural fidelity and efficiency; language-based backbones provide generalization and multimodal flexibility; and hybrid models enable comprehensive reasoning but demand more sophisticated training pipelines.
Graph-Based Backbones (e.g., GNNs, Graph Transformers) are inherently aligned with the structure of graph data. By explicitly modeling node neighborhoods through message-passing or attention-based aggregation, these architectures preserve local connectivity and relational information. They are highly effective for tasks such as node classification, link prediction, and graph classification, where structural dependencies are critical. Furthermore, these models tend to be more parameter-efficient when compared to language-based counterparts. However, they face several limitations: first, their inductive bias toward locality makes it difficult to capture long-range dependencies; second, integrating textual or multimodal information remains non-trivial, often requiring auxiliary encoders or ad-hoc fusion mechanisms.
Language-Based Backbones leverage LLMs by encoding graph components, such as node attributes, edge descriptions, or subgraph patterns, as natural language. This graph-to-text formulation enables the transfer of powerful language understanding capabilities to graph learning tasks. Language-based models are especially useful when graphs are enriched with textual metadata, such as in knowledge graphs, social networks, or biomedical corpora. Their generalization ability also makes them suitable for zero-shot and few-shot learning scenarios. Nevertheless, these models lack explicit structural inductive biases and rely on sequential representations, which can obscure topological nuances and lead to suboptimal performance in tasks requiring precise relational reasoning.
Hybrid Backbones integrate graph-based and language-based architectures, aiming to combine the best of both worlds. These models typically involve dual encoders (e.g., GNNs and LLMs) that interact through co-training, cross-modal attention, or alignment objectives. Hybrid approaches have shown impressive performance across diverse tasks by jointly modeling structure and semantics. For instance, graph encoders can capture the connectivity skeleton, while LLMs encode descriptive context or domain-specific knowledge. However, this synergy comes at a cost: hybrid models often require complex architecture design, increased memory and compute overhead, and careful pretraining or alignment strategies to avoid modality collapse or overfitting.
4.3 Pretraining Strategies
Pretraining serves as a foundational step in the development of GFMs, enabling them to acquire transferable knowledge from large-scale unlabeled or weakly-labeled data. In this section, we provide a comprehensive overview of key pretraining paradigms for GFMs, including supervised pretraining, generative pretraining, and contrastive pretraining, as illustrated in Figure 5. For further detail, we refer readers to dedicated surveys [54, 107].
4.3.1 Supervised Pretraining
Supervised pretraining is a strategy that leverages labeled graph data to guide the model pretraining. Unlike self-supervised methods that rely solely on intrinsic graph signals, supervised pretraining explicitly optimizes the model to predict known labels, encouraging it to learn task-relevant representations from the outset. Formally, let $\mathcal{G}_{\mathrm{pt}}$ denote a large-scale pretraining graph with associated supervision $\mathcal{Y}$, which may correspond to node-level, edge-level, or graph-level labels. The model is trained to minimize a supervised loss function over these labeled graph instances:
$$\theta^{*}=\arg \min _{\theta} \mathcal{L}_{\mathrm{pt}}\left(f_{\theta}\left(\mathcal{G}_{\mathrm{pt}}\right), \mathcal{Y}\right),$$
where $\mathcal{L}_{\mathrm{pt}}$ is typically a task-specific loss, such as cross-entropy for classification or mean absolute error for regression. A core benefit of supervised pretraining is its direct alignment with downstream objectives, which often results in faster convergence and improved performance on similar tasks. Representative methods following this paradigm include OFA [22], which introduces a unified task formulation using nodes-of-interest to standardize various graph prediction tasks, and Prodigy [108], which generates subgraph tasks through structural and semantic decomposition. These approaches illustrate the flexibility of supervised pretraining in capturing both local and global signals across graph domains.
Despite its effectiveness, supervised pretraining is often limited by the availability and cost of acquiring large-scale, high-quality labels [107]. As a result, it is frequently complemented by self-supervised learning to enhance generalization and reduce reliance on labeled data.
4.3.2 Generative Pretraining
Generative pretraining is a foundational learning paradigm in foundation models. The core philosophy behind generative pretraining is to learn universal representations by training models to predict or generate data in its raw form without requiring task-specific labels. This approach assumes that by modeling the data distribution itself, the model acquires broad, transferable knowledge that can be adapted to a wide range of downstream tasks via fine-tuning or prompting. Formally, generative pretraining aims to optimize the likelihood of observed data $\mathcal{D}=\left\{x_{1}, x_{2}, \ldots, x_{N}\right\}$ under a parameterized model $f_{\theta}$:
$$\max _{\theta} \sum_{i=1}^{N} \log p\left(x_{i} ; \theta\right),$$
where $x_{i}$ represents an individual data sample and $p\left(x_{i} ; \theta\right)$ denotes the estimated probability of generating $x_{i}$. In practice, this objective is often instantiated through auto-regressive or auto-encoder objectives, as seen in models like GPT [56] and BERT [57]. We discuss these two approaches in the following.
Auto-Regressive Generation. Autoregressive modeling is a foundational paradigm in generative learning, widely applied in domains such as natural language processing [56] and computer vision [109]. In this framework, the model learns a joint probability distribution over sequential variables by decomposing it into a product of conditional probabilities. Formally, for a sequence of variables $\mathbf{x}=\left(x_{1}, x_{2}, \ldots, x_{T}\right)$, an autoregressive model parameterized by $\theta$ defines the joint likelihood as:
$$p_{\theta}(\mathbf{x})=\prod_{t=1}^{T} p_{\theta}\left(x_{t} \mid \mathbf{x}_{<t}\right),$$
where $\mathbf{x}_{<t}$ denotes all preceding elements in the sequence. This formulation aligns with the structure of Bayesian networks [10], where each variable is conditionally dependent on its predecessors. Autoregressive objectives have been central to next-token prediction in NLP (e.g., GPT [56]) and patch generation in vision models [109].
In the graph domain, autoregressive models extend this concept to structured data by mapping graphs into sequences through node ordering schemes. Given an undirected graph $\mathcal{G}=(\mathbf{X}, \mathbf{A})$ with $n$ nodes and a node permutation $\pi$, the graph can be transformed into a sequence $\mathbf{S}^{\pi}=f_{S}(\mathcal{G}, \pi)=\left(\mathbf{S}_{1}^{\pi}, \ldots, \mathbf{S}_{n}^{\pi}\right)$, where each element $\mathbf{S}_{i}^{\pi}$ encodes the connectivity (i.e., edges) between node $\pi\left(v_{i}\right)$ and its preceding nodes $\pi\left(v_{j}\right), j<i$ in the permutation. The generation process then follows an autoregressive factorization:
$$p\left(\mathbf{S}^{\pi}\right)=\prod_{i=1}^{n} p\left(\mathbf{S}_{i}^{\pi} \mid \mathbf{S}_{<i}^{\pi}\right),$$
where a new node and its connections are generated conditioned on the current graph prefix. At each step $i$, the model maintains a hidden state $h_{i}$ that summarizes the generation history, and uses it to parameterize the output distribution for the next adjacency slice $\mathbf{S}_{i}^{\pi}$:
$$h_{i}=f_{\text {trans }}\left(h_{i-1}, \mathbf{S}_{i-1}^{\pi}\right), \quad p\left(\mathbf{S}_{i}^{\pi} \mid \mathbf{S}_{<i}^{\pi}\right)=f_{\text {out }}\left(h_{i}\right),$$
where $f_{\text {trans }}$ is typically implemented using recurrent neural networks (e.g., GRUs [110], LSTMs [111]) or Transformers [55], and $f_{\text {out }}$ defines the distributional output head (e.g., for edge prediction). This architecture enables autoregressive GFMs to sequentially generate nodes and edges in a structured and history-aware manner. Notable examples of this approach include GraphRNN [112], MolecularRNN [113], and GraphGPT [114], which utilize autoregressive mechanisms for graph generation. Moreover, beyond node-centric generation, several works [115, 116] extend this framework to edge-wise autoregression, preserving the same conditional structure as Equation 14. These methods demonstrate the versatility of autoregressive models in capturing the sequential dependencies embedded within complex graph structures.
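To make the factorization tangible, the sketch below generates a graph node by node: a GRU summarizes the adjacency slices emitted so far, and a linear head outputs Bernoulli probabilities for edges from the new node to earlier nodes. The fixed window of previous nodes and the independent-Bernoulli output head are simplifying assumptions in the spirit of GraphRNN, not a faithful reimplementation.

import torch
import torch.nn as nn

max_prev = 8                                        # each slice S_i is padded to this many previous nodes
f_trans = nn.GRUCell(input_size=max_prev, hidden_size=64)    # history encoder
f_out = nn.Linear(64, max_prev)                               # edge-probability head

def generate_graph(num_nodes):
    h = torch.zeros(1, 64)                          # hidden state summarizing the prefix
    prev_slice = torch.zeros(1, max_prev)           # S_0: nothing precedes the first node
    slices = []
    for i in range(num_nodes):
        h = f_trans(prev_slice, h)                  # h_i = f_trans(h_{i-1}, S_{i-1})
        probs = torch.sigmoid(f_out(h))             # parameters of p(S_i | S_<i)
        new_slice = torch.bernoulli(probs)          # sample edges to earlier nodes
        new_slice[:, i:] = 0                        # node i may only connect to nodes j < i
        slices.append(new_slice)
        prev_slice = new_slice
    return torch.cat(slices, dim=0)                 # [num_nodes, max_prev] lower-triangular encoding

A_seq = generate_graph(num_nodes=6)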
Auto-Encoding Generation. The auto-encoder paradigm, particularly in the context of foundation models, has evolved from classical reconstruction objectives [117, 118, 119] to masked modeling strategies [57, 120]. Inspired by the success of models like BERT in natural language processing [57], the underlying philosophy centers on reconstructing masked or corrupted parts of the input conditioned on the visible context. Formally, given an input sequence $\mathbf{x}=\left(x_{1}, x_{2}, \ldots, x_{T}\right)$ with a subset of tokens masked, the objective is to maximize the likelihood of the masked components given the observed ones:
$$\max _{\theta} \sum_{i \in \mathcal{M}} \log p_{\theta}\left(x_{i} \mid \mathbf{x}_{\backslash \mathcal{M}}\right),$$
where $\mathcal{M} \subset\{1, \ldots, T\}$ is the set of masked indices, and $\mathbf{x}_{\backslash \mathcal{M}}$ denotes the visible (unmasked) tokens. This conditional modeling objective enables the model to learn contextualized and robust representations by leveraging co-occurrence patterns within the input.
Extending this idea to graphs, Graph Auto-Encoders (GAEs) [119] adopt a similar masked modeling philosophy. Rather than reconstructing the entire graph or feature space directly, the model is trained to predict masked components of a graph, such as node features, edge features, or structure, conditioned on the visible graph context [119, 121, 122, 123, 124, 125, 73, 114]. Let $\mathcal{G}=(\mathbf{X}, \mathbf{A})$ denote an attributed graph, and let $t \sim \mathcal{T}$ be a masking or augmentation function that produces a corrupted version $\widetilde{\mathcal{G}}=(\widetilde{\mathbf{X}}, \widetilde{\mathbf{A}})$. A GNN-based encoder is applied to the corrupted graph to yield latent representations $\widetilde{\mathbf{Z}}=f_{\mathrm{GNN}}(\widetilde{\mathcal{G}} ; \theta)$, which are then used to reconstruct the masked components through specialized prediction heads.
For masked feature modeling, the objective is to reconstruct the masked node or edge attributes:
$$\mathcal{L}_{\text {feat }}=-\log p\left(\mathbf{X}_{\mathcal{M}} \mid \widetilde{\mathbf{Z}}\right),$$
where $\mathbf{X}_{\mathcal{M}}$ denotes the masked features and $p(\cdot)$ is the prediction head.
For masked structure modeling, the goal is to recover the masked links or adjacency entries:
$$\mathcal{L}_{\text {struct }}=-\log p\left(\mathbf{A}_{\mathcal{M}} \mid \widetilde{\mathbf{Z}}\right),$$
where $\mathbf{A}_{\mathcal{M}}$ represents the masked structural components.
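The sketch below instantiates masked feature modeling on a toy graph: a fraction of node features is replaced by a mask token, a one-layer GCN-style encoder produces latent codes, and a linear head reconstructs the masked rows, with a mean-squared error standing in for the negative log-likelihood term above. All sizes and the masking rate are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

N, d, hidden = 10, 16, 32
X = torch.randn(N, d)
A = (torch.rand(N, N) < 0.3).float()
A = ((A + A.T) > 0).float()                         # symmetrize the toy adjacency

mask_token = torch.zeros(d)                         # stands in for a learnable [MASK] embedding
encoder = nn.Linear(d, hidden)
decoder = nn.Linear(hidden, d)

mask = torch.rand(N) < 0.3                          # masked node set M
mask[0] = True                                      # ensure at least one node is masked
X_corrupt = X.clone()
X_corrupt[mask] = mask_token                        # corrupted graph

A_hat = A + torch.eye(N)
Z = F.relu(encoder((A_hat @ X_corrupt) / A_hat.sum(dim=1, keepdim=True)))   # latent codes from the corrupted graph

X_rec = decoder(Z)                                  # prediction head
loss_feat = F.mse_loss(X_rec[mask], X[mask])        # reconstruct only the masked features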
4.3.3 Contrastive Pretraining
Contrastive pretraining is a self-supervised learning paradigm that focuses on learning discriminative representations by contrasting positive pairs against negative pairs [126, 127, 128, 129, 130, 131, 132, 133]. The underlying philosophy is to bring semantically similar representations closer in the embedding space while pushing apart dissimilar ones [134, 135, 136], thereby encouraging the model to capture meaningful and generalizable features without relying on manual annotations. Let $\mathcal{D}=\left\{x_{i}\right\}_{i=1}^{N}$ be a dataset, and let $\left(x_{i}, x_{i}^{+}\right)$ denote a positive pair (e.g., different views or augmentations of the same instance), while $\left(x_{i}, x_{j}^{-}\right)$ represents a negative pair where $x_{j}^{-} \neq x_{i}$. A widely used objective is the InfoNCE loss [128], formulated as:
$$\mathcal{L}_{\mathrm{InfoNCE}}=-\sum_{i=1}^{N} \log \frac{\exp \left(\operatorname{sim}\left(f\left(x_{i}\right), f\left(x_{i}^{+}\right)\right) / \tau\right)}{\exp \left(\operatorname{sim}\left(f\left(x_{i}\right), f\left(x_{i}^{+}\right)\right) / \tau\right)+\sum_{j} \exp \left(\operatorname{sim}\left(f\left(x_{i}\right), f\left(x_{j}^{-}\right)\right) / \tau\right)},$$
where $f(\cdot)$ is the encoder network, $\operatorname{sim}(\cdot, \cdot)$ denotes a similarity function (e.g., cosine similarity), and $\tau$ is a temperature parameter. By optimizing this objective, the model learns to distinguish between semantically relevant and irrelevant samples, leading to representations that transfer well to a variety of downstream tasks. Based on the contrastive levels, we categorize the existing contrastive pretraining methods into instance-instance and instance-context contrastive methods.
Instance-Instance Contrastive Learning. Instance-instance contrastive learning focuses on comparing individual instances against one another. The key idea is to encourage representations of the same instance under different views (positive pairs) to be close in the embedding space, while pushing apart representations of different instances (negative pairs). This stands in contrast to instance-to-context learning, which aims to align an instance with a broader semantic context (e.g., prototypes, clusters, or class-level embeddings) [137, 130]. The instance-instance paradigm treats every instance as its own class, leading to instance-discriminative embeddings that are highly generalizable across downstream tasks.
In the graph domain, many works [138, 139, 140, 23, 141, 142, 143, 144] illustrate the standard pipeline for instance-instance contrastive learning over graph-structured data. The typical workflow begins by applying two stochastic augmentations $t_{1}, t_{2} \sim \mathcal{T}$ to a given graph $\mathcal{G}=(\mathbf{X}, \mathbf{A})$, producing two views $\widetilde{\mathcal{G}}_{1}$ and $\widetilde{\mathcal{G}}_{2}$:
$$\widetilde{\mathcal{G}}_{*}=t_{*}(\mathcal{G}), \quad \mathbf{Z}_{*}=f\left(\widetilde{\mathcal{G}}_{*}\right), \quad * \in\{1,2\},$$
where $*$ denotes the view index (either 1 or 2), and $\mathbf{Z}_{*}$ contains the node-level embeddings for the corresponding augmented graph.
For each node $v_{i} \in \mathcal{G}$, a set of positive counterparts $\mathbb{P}\left(v_{i}\right)$ is selected, typically consisting of the same node under the alternate view. The training objective aims to maximize agreement between positive pairs while minimizing it for all others using the InfoNCE loss [128] in Equation 19.
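A compact sketch of this two-view pipeline under simple edge-dropping and feature-masking augmentations, where the positive for each node is its counterpart in the other view and all remaining nodes act as negatives; the augmentation rates, encoder, and temperature are placeholder choices.

import torch
import torch.nn.functional as F

def augment(X, A, drop_edge=0.2, mask_feat=0.2):
    A_aug = A * (torch.rand_like(A) > drop_edge).float()    # randomly drop edges
    X_aug = X * (torch.rand_like(X) > mask_feat).float()    # randomly mask feature dimensions
    return X_aug, A_aug

def encode(X, A, W):
    A_hat = A + torch.eye(A.size(0))
    return F.relu((A_hat @ X) / A_hat.sum(dim=1, keepdim=True) @ W)

def info_nce(Z1, Z2, tau=0.5):
    Z1, Z2 = F.normalize(Z1, dim=-1), F.normalize(Z2, dim=-1)
    logits = Z1 @ Z2.T / tau                 # cross-view similarities of all node pairs
    targets = torch.arange(Z1.size(0))       # positive = the same node in the other view
    return F.cross_entropy(logits, targets)

N, d, h = 12, 16, 32
X, A = torch.randn(N, d), (torch.rand(N, N) < 0.3).float()
W = torch.randn(d, h, requires_grad=True)
X1, A1 = augment(X, A)
X2, A2 = augment(X, A)
Z1, Z2 = encode(X1, A1, W), encode(X2, A2, W)
loss = 0.5 * (info_nce(Z1, Z2) + info_nce(Z2, Z1))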
When negative samples are not explicitly used, as in BGRL [145], a bootstrapped alternative is employed, which avoids direct contrast with negative instances. This bootstrapping loss is defined as:
$$\mathcal{L}_{\mathrm{boot}}=-\frac{1}{N} \sum_{i=1}^{N} \frac{g\left(\mathbf{z}_{i}^{(1)}\right)^{\top} \operatorname{sg}\left[\mathbf{z}_{i}^{(2)}\right]}{\left\|g\left(\mathbf{z}_{i}^{(1)}\right)\right\|\left\|\operatorname{sg}\left[\mathbf{z}_{i}^{(2)}\right]\right\|},$$
where $g(\cdot)$ is a projector and $\operatorname{sg}[\cdot]$ denotes the stop-gradient operation.
While instance-instance contrastive learning has demonstrated impressive results in graph representation learning, a notable limitation arises from the assumption that the positive sample is simply the same node under different augmentations. This assumption can introduce sampling bias, especially in graphs with noisy structure or weak homophily. Addressing the challenge of positive sample selection and exploring more semantically aligned contrastive pairs remain important directions for improving the robustness and generality of graph contrastive learning methods [146].
Instance-Context Contrastive Learning. Beyond instance-instance contrastive learning, an alternative and complementary paradigm is instance-context contrastive learning, which aims to maximize mutual information between local (instance-level) and global (context-level) representations [130, 143, 147, 148]. Unlike the instance-instance approach, which distinguishes between individual views of nodes or subgraphs, instance-context contrastive learning encourages alignment between a node (or substructure) and a global summary of the graph, facilitating the capture of holistic graph-level semantics.
In the graph domain, this idea was first introduced by Deep Graph Infomax (DGI) [137], which adapts the Deep InfoMax framework [149] to graph-structured data. DGI proposes to maximize the local-global mutual information between node representations and a global summary vector, thereby enabling the encoder to learn node embeddings that are contextually grounded in the structure and semantics of the entire graph. Formally, given a graph $\mathcal{G}=(\mathbf{X}, \mathbf{A})$ and a stochastic augmentation $t \sim \mathcal{T}$ that produces a corrupted graph $\widetilde{\mathcal{G}}=(\widetilde{\mathbf{X}}, \widetilde{\mathbf{A}})$, an encoder $f(\cdot)$ is used to map both the original and corrupted graphs into latent space:
$$\mathbf{Z}=f(\mathbf{X}, \mathbf{A}), \quad \widetilde{\mathbf{Z}}=f(\widetilde{\mathbf{X}}, \widetilde{\mathbf{A}}),$$
where $\mathbf{Z}=\left[\mathbf{z}_{1}, \ldots, \mathbf{z}_{N}\right]$ and $\widetilde{\mathbf{Z}}=\left[\widetilde{\mathbf{z}}_{1}, \ldots, \widetilde{\mathbf{z}}_{M}\right]$ denote the node embeddings from the original and corrupted graphs, respectively. To compute a graph-level summary vector, a permutation-invariant readout function $\mathcal{R}(\cdot)$ is applied over the original node embeddings, $\mathbf{s}=\mathcal{R}(\mathbf{Z})$, where $\mathbf{s}$ serves as the global context embedding for the graph.
The contrastive objective is then defined to distinguish between positive pairs $\left(\mathbf{z}_{i}, \mathbf{s}\right)$ sampled from the original graph and negative pairs $\left(\widetilde{\mathbf{z}}_{j}, \mathbf{s}\right)$ from the corrupted one. A discriminator $\mathcal{D}(\cdot, \cdot)$ is trained to assign high scores to positive pairs and low scores to negatives. The training objective is:
$$\mathcal{L}_{\mathrm{DGI}}=-\frac{1}{N+M}\left(\sum_{i=1}^{N} \log \mathcal{D}\left(\mathbf{z}_{i}, \mathbf{s}\right)+\sum_{j=1}^{M} \log \left(1-\mathcal{D}\left(\widetilde{\mathbf{z}}_{j}, \mathbf{s}\right)\right)\right),$$
where $N$ and $M$ are the number of nodes in the original and corrupted graphs, respectively. This local-global objective encourages the model to learn representations that are not only locally expressive but also globally aware. Recent extensions have improved upon DGI by integrating additional structural priors, such as structural mutual information from an information bottleneck perspective [150], or by tailoring the approach for low-resource settings like few-shot node classification [151].
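A schematic sketch of the local-global objective: real node embeddings are scored against a mean-pooled summary vector as positives, and embeddings of a corrupted graph (obtained here by shuffling node features) as negatives, through a bilinear discriminator. This mirrors the DGI recipe only at a high level; the encoder and readout are toy stand-ins.

import torch
import torch.nn as nn
import torch.nn.functional as F

N, d, h = 10, 16, 32
X = torch.randn(N, d)
A = (torch.rand(N, N) < 0.3).float()
enc = nn.Linear(d, h)
W_disc = nn.Parameter(torch.randn(h, h))             # bilinear discriminator D(z, s) = sigmoid(z^T W s)

def encode(X, A):
    A_hat = A + torch.eye(A.size(0))
    return F.relu(enc((A_hat @ X) / A_hat.sum(dim=1, keepdim=True)))

Z = encode(X, A)                                      # embeddings of the original graph (positives)
Z_neg = encode(X[torch.randperm(N)], A)               # corrupted graph: row-shuffled features (negatives)
s = torch.sigmoid(Z.mean(dim=0))                      # readout R(Z): global summary vector

pos = torch.sigmoid(Z @ W_disc @ s)                   # D(z_i, s)
neg = torch.sigmoid(Z_neg @ W_disc @ s)               # D(z~_j, s)
loss = -(torch.log(pos + 1e-8).mean() + torch.log(1 - neg + 1e-8).mean())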
4.3.4 Discussion
The pretraining of GFMs has emerged as a critical driver for improving generalization and transferability across diverse graph learning tasks. While various pretraining paradigms have been developed, each presents distinct benefits and trade-offs depending on the underlying assumptions, supervision signals, and application scenarios. In summary, supervised pretraining provides strong task alignment but requires labeled data; generative approaches enable flexible and scalable learning through reconstruction objectives; and contrastive methods deliver powerful, discriminative representations but rely heavily on augmentation quality and require intensive computation. No single method is universally superior; rather, each presents complementary advantages.
Supervised Pretraining directly optimizes the model with respect to labeled graph data, often resulting in task-aligned and semantically rich representations. This approach is particularly effective when the pretraining task closely matches the downstream objectives, as it enables faster convergence and better performance. However, the main limitation lies in the reliance on large-scale, high-quality labeled datasets, which are costly and time-consuming to obtain. This constraint limits the scalability and generality of supervised pretraining.
Generative Pretraining, including auto-regressive modeling and auto-encoding modeling, encourages models to learn the underlying structure of graph data by reconstructing input components. These methods leverage only the input data itself, enabling label-free pretraining at scale. The key strength of generative approaches is their flexible and scalable framework, which is more efficient than contrastive learning. However, generative models may lack strong task specificity, and their reconstruction-based objectives do not always correlate with downstream performance.
Contrastive Pretraining has gained popularity due to its simplicity and effectiveness in learning discriminative graph representations. Instance-instance contrastive learning excels at producing fine-grained embeddings by distinguishing between positive and negative pairs, while instance-context contrastive learning focuses on aligning local representations with global summaries. These methods often outperform generative models on graphs [121]. Nonetheless, contrastive methods are sensitive to the choice of augmentations, sampling strategies, and the availability of informative negatives. Additionally, computing contrastive loss can be computationally intensive, especially when modeling large or densely connected graphs.
4.4 Adaptation
In this section, we provide a comprehensive overview of adaptation strategies for GFMs, highlighting how pretrained models can be effectively applied to diverse downstream tasks. We categorize these strategies into six key paradigms: transfer learning, distillation, test-time adaptation, graph prompting, in-context learning, and prototype learning, each tailored to specific data conditions, supervision levels, and deployment constraints.
4.4.1 Transfer Learning
Transfer learning is a foundational paradigm in modern machine learning, wherein a model pre-trained on a large source dataset is adapted to a new target domain with limited labeled data. In the context of GFMs, transfer learning enables the reuse of structural and semantic knowledge acquired during large-scale pretraining, thus improving performance and generalization on downstream graph-related tasks. The key idea is to initialize the model with pre-trained parameters, thereby leveraging learned representations and reducing the risk of overfitting, which is particularly beneficial in scenarios with limited task-specific data. Depending on the adaptation strategy, the pre-trained GFM may be directly applied to a downstream task or further fine-tuned using task-specific supervision. Formally, let $\theta$ denote the parameters of a pre-trained GFM, and let $D_{\text {target }}$ be the target dataset comprising graph-structured instances. Fine-tuning optimizes the model parameters by minimizing the task-specific loss:
$$\theta^{*}=\arg \min _{\theta} \sum_{(\mathbf{X}, \mathbf{A}, \mathbf{Y}) \in D_{\text {target }}} \mathcal{L}_{\text {task }}\left(f_{\mathrm{GFM}}(\mathbf{X}, \mathbf{A} ; \theta), \mathbf{Y}\right),$$
where $f_{\mathrm{GFM}}$ is the GFM model operating on node features $\mathbf{X}$ and adjacency matrix $\mathbf{A}$.
Direct Adaptation. In direct adaptation, the pre-trained GFM is applied to the downstream task without any parameter updates. This zero-shot setting evaluates the generalization ability of the pre-trained model in a plug-and-play fashion. For instance, OFA [22] introduces a unified framework that transforms graph data into textual prompts and leverages pre-trained LLMs for graph inference. It constructs a prompt graph $\mathcal{P}=\left(\mathcal{V}_{p}, \mathcal{E}_{p}, \mathcal{R}_{p}\right)$ by appending a virtual prompt node and its relations to the original graph, enabling in-context learning without task-specific fine-tuning.
Full Fine-Tuning. Full fine-tuning involves updating all parameters of the pre-trained model on the target task. This approach offers maximal flexibility and adaptation capacity but can lead to overfitting when the target dataset is small or noisy. It is typically employed when the downstream task is sufficiently different from the pretraining objectives or when high task-specific accuracy is critical [9].
Adaptive Fine-Tuning. Adaptive fine-tuning seeks to selectively update specific layers, modules, or parameters in the model, guided by the characteristics of the target task. This selective tuning reduces computational cost and mitigates overfitting. AUX-TS [152] exemplifies this paradigm by dynamically selecting auxiliary tasks based on their semantic similarity to the target task. The similarity scores are learned and used to weigh each auxiliary signal during the fine-tuning process, enabling task-aware adaptation.
Parameter-Efficient Fine-Tuning (PEFT). PEFT techniques aim to retain the generalization capabilities of pre-trained models while significantly reducing the number of trainable parameters. These methods typically freeze the backbone model and introduce lightweight modules such as adapters or low-rank transformations. For example, AdapterGNN [153] integrates an adapter module into GNN layers, where the transformation is defined as:
$$\mathbf{h}^{\prime}=\mathbf{h}+\mathbf{W}_{\text {up }} \sigma\left(\mathbf{W}_{\text {down }} \operatorname{BN}(\mathbf{h})\right),$$
with $\mathbf{W}_{\text {down }}$ and $\mathbf{W}_{\text {up }}$ being trainable low-dimensional projections and BN denoting batch normalization. Similarly, GraphLoRA [154] applies low-rank adaptation to reduce parameter complexity, and GPF [155] introduces a learnable task-specific vector $\mathbf{p}$, which is concatenated to node features $\mathbf{x}_{i}$, while freezing the backbone GNN.
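The sketch below shows a generic bottleneck adapter wrapped around a frozen layer: only the down- and up-projections (plus a batch normalization) receive gradients, while the backbone stays fixed. The residual placement, zero-initialized up-projection, and bottleneck width are assumptions and do not reproduce AdapterGNN, GraphLoRA, or GPF exactly.

import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, dim, bottleneck=16):
        super().__init__()
        self.bn = nn.BatchNorm1d(dim)
        self.down = nn.Linear(dim, bottleneck)      # W_down
        self.up = nn.Linear(bottleneck, dim)        # W_up
        nn.init.zeros_(self.up.weight)              # start as a (near) identity residual for stability
        nn.init.zeros_(self.up.bias)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(self.bn(h))))

dim = 64
frozen_layer = nn.Linear(dim, dim)                  # stand-in for one pretrained GNN layer
for p in frozen_layer.parameters():
    p.requires_grad = False                         # backbone parameters stay frozen

adapter = BottleneckAdapter(dim)                    # only these parameters are trained
H = torch.randn(32, dim)                            # a batch of node representations
out = adapter(frozen_layer(H))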
4.4.2 Distillation
Knowledge distillation is a model compression technique aimed at transferring the knowledge encoded in a large, powerful model (the teacher) to a smaller, more efficient model (the student). In the context of GFMs, distillation enables the deployment of compact models that maintain competitive performance while reducing inference time and resource consumption. Let $f_{\text {teacher }}$ and $f_{\text {student }}$ denote the outputs of the teacher and student models, respectively. The distillation objective typically combines supervision from ground-truth labels with guidance from the teacher outputs. This can be formalized as:
$$\mathcal{L}_{\mathrm{KD}}=\lambda \mathcal{L}_{\text {true }}\left(f_{\text {student }}, \mathbf{y}\right)+(1-\lambda) \mathcal{L}_{\text {match }}\left(f_{\text {student }}, f_{\text {teacher }}\right),$$
where $\mathcal{L}_{\text {true }}$ denotes the supervised loss with respect to the ground-truth labels, $\mathcal{L}_{\text {match }}$ measures the discrepancy between the student and teacher outputs (typically via the Kullback-Leibler divergence), and $\lambda \in[0,1]$ balances the contributions of the two terms. The rationale behind distillation is that the teacher outputs, often referred to as "soft targets", contain richer supervisory signals than one-hot labels [156]. These include information about inter-class similarities and decision boundaries that are difficult for a student model to infer directly from the data [157]. Moreover, intermediate representations within the teacher model (e.g., node embeddings or attention maps) can be used to further enhance the student's learning through feature-level alignment [158].
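A sketch of the combined objective for node classification, with cross-entropy to ground-truth labels as the supervised term and a temperature-scaled KL divergence between student and teacher logits as the matching term; the temperature, weighting, and random logits are illustrative.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, lam=0.5, T=2.0):
    loss_true = F.cross_entropy(student_logits, labels)          # supervision from hard labels
    loss_match = F.kl_div(                                       # match the teacher's soft targets
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return lam * loss_true + (1.0 - lam) * loss_match

num_nodes, num_classes = 16, 5
student_logits = torch.randn(num_nodes, num_classes, requires_grad=True)
teacher_logits = torch.randn(num_nodes, num_classes)             # produced by the frozen teacher
labels = torch.randint(0, num_classes, (num_nodes,))
loss = distillation_loss(student_logits, teacher_logits, labels)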
In the graph domain, distillation techniques have been extended to incorporate structural and relational knowledge. For instance, G-CRD [159] proposes a contrastive representation distillation framework that preserves global graph topology. Instead of only matching final predictions, G-CRD aligns student and teacher node embeddings by maximizing agreement in a shared representation space, effectively transferring topological cues. Beyond prediction-level and embedding-level supervision, additional distillation strategies include:
Graph Structure Distillation: Preserving relational patterns such as adjacency, edge importance, or motif distributions in the student model [160, 161, 158].
Attention-Based Distillation: Mimicking the attention maps learned by graph attention models (e.g., GAT) to preserve neighbor importance [162, 163].
Multi-View or Multi-Task Distillation: Transferring knowledge from auxiliary tasks or diverse views of the graph to improve robustness [164, 165, 166].
4.4.3 Test-Time Adaptation
Test-Time Adaptation (TTA) [167, 168] refers to the process of adapting a pre-trained GFM during inference, using the test data itself to update the model in real time. Unlike fine-tuning and distillation, which are performed in a dedicated training phase prior to deployment, TTA operates entirely at inference time. This paradigm is particularly useful in scenarios involving distributional shifts [169] between training and test data or where access to labeled target-domain samples is limited or unavailable. TTA typically involves adjusting model parameters on-the-fly in an online fashion [170], as each new test sample or batch is processed. Let $\mathcal{G}_{\text {test }}$ denote a test graph and $f_{\theta}$ the pre-trained GFM with parameters $\theta$. The model is adapted by minimizing a self-supervised loss function defined over the test data:
$$\theta \leftarrow \theta-\eta \nabla_{\theta} \mathcal{L}_{\text {self }}\left(f_{\theta}\left(\mathcal{G}_{\text {test }}\right)\right),$$
where $\eta$ is the learning rate and $\mathcal{L}_{\text {self }}$ represents a self-supervised objective, such as entropy minimization, pseudo-label consistency, or structural smoothness. These objectives do not require labeled data and are often chosen to encourage confident, stable predictions under test-time perturbations.
The key assumption behind TTA is that the incoming test data itself contains useful signals that can help tailor the model to the target distribution. By leveraging this data for real-time adaptation, the model can dynamically adjust to distributional shifts, enhance robustness, and improve prediction accuracy without requiring re-training or access to source-domain data.
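The sketch below shows one simple instantiation, entropy minimization over the unlabeled test graph: embeddings from a frozen pretrained encoder are kept fixed while a small classification head takes a few online gradient steps to reduce prediction entropy. Which parameters are unfrozen, the step count, and the learning rate are illustrative choices.

import torch
import torch.nn as nn
import torch.nn.functional as F

N, d, num_classes = 20, 32, 4
Z_test = torch.randn(N, d)                     # node embeddings from the frozen pretrained GFM
head = nn.Linear(d, num_classes)               # the only part adapted at test time
optimizer = torch.optim.SGD(head.parameters(), lr=1e-3)

for step in range(10):                         # online adaptation on the unlabeled test graph
    probs = F.softmax(head(Z_test), dim=-1)
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=-1).mean()   # self-supervised objective
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()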
Graph Transformation-Based Adaptation. GTrans [171] exemplifies a data-centric TTA approach by performing graph refinement at test time. Rather than updating model parameters directly, GTrans modifies the input graph to better suit the fixed, pre-trained GNN. It learns perturbations on node features and graph topology to minimize a surrogate loss, effectively generating an adapted graph $\mathcal{G}^{\prime}$ that is more compatible with the pretrained model:
$$\mathcal{G}^{\prime}=\mathcal{G}+\delta^{*}, \quad \delta^{*}=\arg \min _{\delta} \mathcal{L}_{\text {surrogate }}\left(f_{\theta}(\mathcal{G}+\delta)\right),$$
where $\delta$ denotes learnable perturbations and $\mathcal{L}_{\text {surrogate }}$ approximates task-specific losses without requiring ground-truth labels. This method improves performance by refining the input data rather than the model.
Test-Time Supervision. Recently, LLM-TTT [172] introduces a novel TTA paradigm that leverages the generative and annotative capabilities of LLMs to aid inference on text-attributed graphs. In this framework, LLMs are used to generate pseudo-labels for unlabeled test nodes, which are then used to adapt the GNN at test time. The two-stage pipeline consists of (i) annotation by the LLM using node descriptions and graph context, and (ii) refinement of the GNN using the generated pseudo-labels.
This strategy demonstrates the synergy between language-based supervision and graph-based reasoning, improving performance under test-time constraints without requiring manual annotations.
4.4.4 Graph Prompting
Graph prompting [155, 173, 174, 72] is an emerging paradigm that adapts GFMs to downstream tasks from a data-centric perspective. Unlike traditional fine-tuning, which updates the model parameters, graph prompting keeps the model frozen and instead learns additional prompt vectors $\mathcal{P}$ that guide the model behavior. This approach draws inspiration from prompting strategies in NLP, where carefully designed inputs can elicit desired behaviors from large language models. Graph prompting methods can be broadly categorized into two genres: data-level prompting, which modifies the input graph data, and representation-level prompting, which adjusts internal representations within the model.
Data-Level Prompting. Data-level prompting adapts the input graph by injecting learnable signals either into the feature space or the structure. Given a graph $\mathcal{G}=(\mathbf{X}, \mathbf{A})$, where $\mathbf{X}$ is the node feature matrix and $\mathbf{A}$ is the adjacency matrix, data-level prompting defines a transformation function $t_{D}$ that produces a prompted graph $\widetilde{\mathcal{G}}=(\widetilde{\mathbf{X}}, \widetilde{\mathbf{A}})$ using prompt vectors $\mathcal{P}$:
$$\widetilde{\mathcal{G}}=(\widetilde{\mathbf{X}}, \widetilde{\mathbf{A}})=t_{D}(\mathcal{G}, \mathcal{P}).$$
A simple yet effective strategy is to modify only the node features. For each node $v \in \mathcal{V}$, a prompt vector $\mathbf{p}_{v}$ is learned and added to the original feature vector, $\widetilde{\mathbf{x}}_{v}=\mathbf{x}_{v}+\mathbf{p}_{v}$. To reduce complexity, some approaches use a shared prompt vector for all nodes, i.e., $\mathbf{p}_{1}=\mathbf{p}_{2}=\cdots=\mathbf{p}_{N}=\mathbf{p}$ [155]. However, shared prompts may lack expressiveness. To address this, recent works introduce an attention-based mechanism over a set of $J$ basis vectors $\left\{\mathbf{b}_{1}, \ldots, \mathbf{b}_{J}\right\}$ [155, 175, 174, 176, 177, 178]. Each node-specific prompt is computed as a weighted combination $\widetilde{\mathbf{x}}_{v}=\mathbf{x}_{v}+\mathbf{p}_{v}=\mathbf{x}_{v}+\sum_{j=1}^{J} w_{v, j} \cdot \mathbf{b}_{j}$, where $w_{v, j}$ denotes the learned attention weight of node $v$ over basis vector $j$.
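A minimal sketch of the basis-vector variant of feature-level prompting: each node attends over J learnable basis vectors and adds the resulting prompt to its input features, while the pretrained backbone (not shown) remains frozen. The linear attention parameterization is one simple option among those used in the works cited above.

import torch
import torch.nn as nn

N, d, J = 10, 32, 4
X = torch.randn(N, d)                          # original node features

basis = nn.Parameter(torch.randn(J, d))        # learnable basis vectors b_1, ..., b_J
attn = nn.Linear(d, J)                         # scores each node against the basis

w = torch.softmax(attn(X), dim=-1)             # w_{v,j}: attention of node v over basis j
P = w @ basis                                  # node-specific prompts p_v = sum_j w_{v,j} b_j
X_prompted = X + P                             # prompted features fed to the frozen GFM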
An alternative strategy is insertion-based prompting, which introduces learnable prompt nodes into the graph. These prompt nodes are connected to existing graph nodes either uniformly [179, 180] or based on similarity metrics [173], resulting in an enriched structure that encourages the model to focus on task-relevant subgraphs.
Representation-Level Prompting. Representation-level prompting modifies latent node representations instead of the input graph. Given a hidden representation $\mathbf{h}_{v}$ for node $v$, a transformation function $t_{R}$ applies the learned prompts to obtain a prompted embedding:
$$\widetilde{\mathbf{h}}_{v}=t_{R}\left(\mathbf{h}_{v}, \mathcal{P}\right).$$
A common approach is to apply an element-wise (Hadamard) product between the representation and a prompt vector, $\widetilde{\mathbf{h}}_{v}=\mathbf{p}_{v} \odot \mathbf{h}_{v}$, where $\mathbf{p}_{v}$ acts as a gating vector that highlights task-relevant dimensions in $\mathbf{h}_{v}$. Similar to data-level prompting, $\mathbf{p}_{v}$ may be shared across nodes [181] or generated in a node-specific manner. For example, ProNoG [182] computes prompt vectors based on multi-hop ego-networks of each node, allowing localized and personalized adaptation.
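A minimal sketch of such gating prompts is given below, assuming a PyTorch setting with placeholder module names; the node-specific variant only loosely mirrors the idea of generating prompts from node context, and methods such as ProNoG differ in detail.

```python
import torch
import torch.nn as nn

class GatingPrompt(nn.Module):
    """Representation-level prompting: h_v <- p_v (Hadamard) h_v, where p_v is
    either one shared gate or generated per node from its own representation."""
    def __init__(self, hidden_dim: int, node_specific: bool = True):
        super().__init__()
        self.node_specific = node_specific
        if node_specific:
            # lightweight generator of node-specific gates (an illustrative choice,
            # not the exact condition network used in published methods)
            self.gate_net = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid())
        else:
            self.gate = nn.Parameter(torch.ones(hidden_dim))   # shared gate vector

    def forward(self, h: torch.Tensor) -> torch.Tensor:        # h: [N, hidden_dim]
        p = self.gate_net(h) if self.node_specific else self.gate
        return p * h                                            # element-wise (Hadamard) product

h = torch.randn(100, 64)                 # hidden representations from a frozen GFM
h_prompted = GatingPrompt(64)(h)          # task-adapted representations
```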
4.4.5 In-Context Learning
In-Context Learning (ICL) [96] is a form of few-shot learning that enables LLMs to adapt to novel tasks without parameter updates. Instead of fine-tuning, the model is conditioned on a small set of input-label pairs, known as demonstrations, that are provided directly in the input sequence. This paradigm capitalizes on the remarkable in-context generalization capabilities of pre-trained language models, such as GPT-3 [56] and its successors. Formally, let the demonstration set be denoted as $\mathcal{C}_{K}=\{(q_{k}, y_{k})\}_{k=1}^{K}$, where each tuple $(q_{k}, y_{k})$ represents a query and its corresponding label. Given a new query $q_{v}$ for a test sample $v$, the foundation model $f$ uses the demonstration set $\mathcal{C}_{K}$ to generate a predicted label:

$$\hat{y}_{v}=f(q_{v} \mid \mathcal{C}_{K}).$$
The model thus performs inference by conditioning on the examples in $\mathcal{C}_{K}$ as contextual guidance, without modifying its internal parameters.

In the context of GFMs, ICL has gained attention as a promising strategy for handling text-attributed graphs, where each graph component (e.g., node, edge, or subgraph) is associated with descriptive textual information.
LLMs can process such textual attributes directly, allowing graph-based tasks to be reformulated as natural language problems solvable via prompting. However, applying ICL to TAGs introduces several challenges, most notably the construction of high-quality demonstration sets. Unlike i.i.d. samples in typical NLP settings, graph data exhibits rich structural dependencies, such as homophily, transitivity, and higher-order relations [33]. As a result, selecting representative and informative demonstrations requires careful consideration of the underlying graph structure. To address this, AskGNN [183] introduces a GNN-based retriever that selects relevant node-label pairs from the graph based on structural and semantic similarity. These selected instances are then formatted as natural language demonstrations and fed to an LLM for prediction. Similarly, retrieval-augmented generation (RAG) techniques [184] have been employed to enhance demonstration quality by dynamically retrieving the most relevant graph samples from an external memory or training set. Beyond node classification, ICL has also been explored in more complex graph tasks such as knowledge graph completion. In this setting, each sample corresponds to a triple (subject, relation, object), and ICL can be used to infer missing entities or relations [185, 186].
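The sketch below illustrates only the demonstration-formatting step for node classification on a text-attributed graph; the prompt template, the retrieved examples, and the label names are hypothetical placeholders rather than those of AskGNN or any other specific system.

```python
def build_icl_prompt(demos, query_text, label_names):
    """Format retrieved (node text, label) pairs as in-context demonstrations
    for node classification on a text-attributed graph."""
    lines = [f"Classify each paper into one of: {', '.join(label_names)}.\n"]
    for text, label in demos:                        # demos chosen by a structure-aware retriever
        lines.append(f"Paper: {text}\nCategory: {label}\n")
    lines.append(f"Paper: {query_text}\nCategory:")   # the LLM completes the label
    return "\n".join(lines)

demos = [("Attention-based sequence transduction ...", "Machine Learning"),
         ("Community detection in social graphs ...", "Data Mining")]
prompt = build_icl_prompt(demos,
                          "Pretraining GNNs on molecular graphs ...",
                          ["Machine Learning", "Data Mining", "Databases"])
# `prompt` is then passed to any instruction-following LLM; no parameters are updated.
```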
4.4.6 Prototype Learning
Prototype learning [187] is a classification paradigm that represents each class by a prototype vector in the embedding space and classifies instances based on proximity to these prototypes [188]. Unlike traditional approaches that rely on dedicated classifiers (e.g., fully connected layers or softmax heads), prototype learning performs classification by comparing instance representations with class prototypes, offering a more interpretable and often parameter-efficient alternative. In the context of GFMs, prototype learning has gained traction due to its compatibility with both node-level and graph-level tasks. Given a learned representation for a node or subgraph, the model assigns a class label corresponding to the closest prototype in embedding space [22]. Formally, the predicted label for an instance $v$ with representation $\mathbf{h}_{v}$ is computed as:

$$\hat{y}_{v}=\arg \min _{c} \operatorname{dist}(\mathbf{h}_{v}, \mathbf{h}_{c}),$$
where $\mathbf{h}_{c}$ denotes the prototype for class $c$ and $\operatorname{dist}(\cdot, \cdot)$ is a distance metric, typically Euclidean or cosine distance. Prototype learning methods can be broadly categorized into two classes based on how the prototypes are generated: (i) prototypes derived from node representations, and (ii) prototypes learned via additional class nodes from external resources.

Prototypes from Node Representations. A common and intuitive strategy is to compute class prototypes by averaging the representations of labeled nodes in the training set [23]. These prototypes serve as centroids that capture the semantic distribution of each class in the embedding space. Specifically, for class $c$, the prototype $\widetilde{\mathbf{h}}_{c}$ is calculated as $\widetilde{\mathbf{h}}_{c}=\operatorname{Mean}(\{\mathbf{h}_{v} \mid y_{v}=c, v \in \mathcal{V}_{l}\})$, where $\mathcal{V}_{l}$ denotes the set of labeled nodes and $\mathbf{h}_{v}$ is the representation of node $v$ obtained from a GFM encoder. This method is entirely parameter-free and leverages the inductive bias of neighborhood aggregation in GNNs. It has been adopted in recent works for few-shot or prompt-based graph learning [181, 189, 182, 190]. Despite its simplicity, it often achieves strong performance, especially when the embeddings are well-separated in the latent space.
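A minimal sketch of this procedure is shown below, assuming node embeddings produced by a frozen GFM encoder; the function names and toy data are illustrative.

```python
import torch

def class_prototypes(h: torch.Tensor, y: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Mean-pool the embeddings of labeled nodes per class: h_c = Mean({h_v : y_v = c})."""
    return torch.stack([h[y == c].mean(dim=0) for c in range(num_classes)])   # [C, d]

def prototype_predict(h_query: torch.Tensor, protos: torch.Tensor) -> torch.Tensor:
    """Assign each query to the nearest prototype (Euclidean distance here;
    cosine distance is an equally common choice)."""
    dists = torch.cdist(h_query, protos)                                      # [M, C]
    return dists.argmin(dim=-1)

# h_lab / y_lab: embeddings and labels of the (few) labeled nodes from a frozen GFM encoder.
h_lab, y_lab = torch.randn(30, 64), torch.randint(0, 3, (30,))
protos = class_prototypes(h_lab, y_lab, num_classes=3)
pred = prototype_predict(torch.randn(10, 64), protos)
```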
Prototypes from Extra Class Nodes. An alternative approach models class prototypes explicitly as learnable nodes within an auxiliary graph. This method constructs a bipartite graph $\mathcal{G}_{g}=(\mathcal{V}_{g}, \mathcal{E}_{g})$, where the node set $\mathcal{V}_{g}$ is composed of data nodes $\mathcal{V}_{d}$ and class nodes $\mathcal{V}_{c}=\{c_{1}, c_{2}, \ldots, c_{C}\}$. Each data node in $\mathcal{V}_{d}$ corresponds to a labeled instance (e.g., a node or graph), while each class node represents a distinct class label. Edges are typically constructed between every data node and every class node to facilitate cross-node interaction. The class prototype $\hat{\mathbf{h}}_{c}$ is then defined as the learned embedding of class node $c$ after message passing on the graph. Such prototype construction has been employed in recent works [108, 22, 23], where LLMs are integrated into a graph prompt framework with class nodes.
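The simplified sketch below conveys the idea, assuming attention-weighted message passing from data nodes into class nodes; real systems run a full GNN over the augmented graph and often initialize class nodes from label-text embeddings, so this is an illustration rather than a faithful reproduction of any cited method.

```python
import torch

def class_node_prototypes(h_data: torch.Tensor, h_cls: torch.Tensor, rounds: int = 2):
    """Sketch: class nodes are extra nodes connected to all data nodes. In each round,
    every class node aggregates attention-weighted messages from the data nodes, and
    its final state is read out as the class prototype."""
    d = h_data.size(1)
    for _ in range(rounds):
        att = torch.softmax(h_cls @ h_data.t() / d ** 0.5, dim=-1)   # [C, N] edge weights
        h_cls = h_cls + att @ h_data                                  # messages into class nodes
    return h_cls                                                      # one prototype per class

# h_cls could be initialized from encoded class descriptions (e.g., label text embeddings).
protos = class_node_prototypes(torch.randn(50, 64), torch.randn(3, 64))
```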
4.4.7 Discussion
The diverse adaptation strategies for GFMs reflect a rich design space tailored to different downstream scenarios, data regimes, and computational constraints. Each method offers distinct benefits and trade-offs in terms of adaptability, scalability, and task performance. Transfer learning and distillation are well-suited for static environments with available training labels; test-time adaptation and prompting enable flexible inference in dynamic or low-resource settings; in-context learning excels in zero-shot generalization; and prototype learning provides interpretable, few-shot-friendly classification. We discuss each strategy in detail below.
Transfer Learning is arguably the most conventional yet effective strategy for adapting GFMs. By initializing from a pre-trained model, it significantly reduces the need for large labeled datasets and accelerates convergence. Full fine-tuning provides maximum flexibility, enabling the model to specialize for task-specific patterns. However, it often requires extensive computational resources and risks overfitting on small target datasets. Adaptive fine-tuning and parameter-efficient fine-tuning mitigate these issues by limiting the number of trainable parameters, though they may underperform when task distributions deviate significantly from the pre-training regime.

Distillation offers a compelling path for compressing and deploying GFMs in resource-constrained environments. By transferring knowledge from a large teacher model to a smaller student, it balances performance and efficiency. Moreover, distillation enables deployment in latency-sensitive applications such as edge devices or mobile platforms. The main limitation lies in the quality of the distilled knowledge: student models may fail to fully capture the nuanced relational patterns encoded by the teacher, especially when the structural complexity of graph data is high.

Test-Time Adaptation addresses the challenge of domain shift without requiring access to source data or retraining. It enables models to dynamically adapt to new distributions, making it well-suited for continual and online learning scenarios. However, TTA is inherently limited by the lack of ground-truth supervision during inference and may suffer from unstable updates if the self-supervised signals are noisy or uninformative. Additionally, TTA assumes that test data arrives in a sequential or batch-wise fashion, which may not align with all practical settings.

Graph Prompting is an efficient alternative to parameter tuning, leveraging learned prompt vectors to guide frozen GFMs toward specific tasks. Its main advantage lies in its modularity: prompts can be reused, swapped, or composed without modifying the base model. Data-level prompting and representation-level prompting both offer flexible mechanisms to influence model behavior. However, prompting often relies on careful prompt engineering or tuning, and performance may plateau if the model lacks sufficient task alignment. Moreover, prompts may not generalize well across tasks or domains without re-optimization.

In-Context Learning enables zero-shot or few-shot adaptation without any gradient updates, making it ideal for quick deployment in low-resource or dynamic environments. When applied to text-attributed graphs, ICL allows LLMs to infer over graph-structured data by conditioning on task demonstrations. This approach eliminates the need for fine-tuning and benefits from the generative capacity of LLMs. However, the effectiveness of ICL hinges on the quality and relevance of demonstrations. Constructing demonstration sets for graphs is particularly challenging due to inter-instance dependencies, and poor demonstration selection can lead to significant performance degradation.

Prototype Learning introduces a parameter-efficient and interpretable classification framework. It is particularly appealing for few-shot learning, where class prototypes derived from limited labeled data provide strong generalization. Methods based on averaging node representations are simple and effective, while graph-based class node construction captures richer semantics. Nonetheless, prototype learning assumes that class clusters are well-formed in the embedding space, which may not hold in noisy or heterophilic graphs. Furthermore, its reliance on distance metrics may overlook complex decision boundaries that could be better captured by discriminative classifiers.
5 Universal Graph Foundation Models

5.1 Design Principle
Graph foundation models are designed to operate across diverse domains and tasks, adapting to various graph structures and distributions. This vision parallels the role of LLMs in natural language processing and VLMs in computer vision. By leveraging insights from existing foundation models, we can formulate principles that guide the development of GFMs capable of addressing cross-domain and cross-task challenges. This section delineates the core characteristics and design principles essential for constructing such models.

Handling Heterogeneous Graph Distributions: Graphs originating from various domains, such as molecular structures, social networks, and financial transaction systems, exhibit significant variability in size, connectivity, node attributes, density, and associated tasks. A GFM must generalize across such heterogeneous distributions with minimal domain-specific retraining. Analogous to LLMs, which leverage next-token prediction to extract semantic representations from extensive text corpora, GFMs require specifically designed self-supervised pretraining objectives to facilitate learning from large-scale, multi-domain graph datasets. The goal is to embed diverse graph structures into a unified latent space that enables meaningful cross-domain knowledge transfer.

Addressing Task Conflicts: Graph-based tasks often involve competing objectives. For instance, node classification requires an understanding of local adjacency structures (e.g., homophily and heterophily), whereas graph classification focuses on higher-order motifs and structural patterns. LLMs reconcile such conflicts by framing all language-related tasks within a unified objective, question answering. Similarly, GFMs must establish a shared inductive bias that harmonizes task representations across graphs. Multi-task learning techniques can further aid in balancing divergent learning objectives, ensuring that the model remains effective across a broad spectrum of graph-based tasks.

Facilitating Positive Transfer: Despite domain shifts, certain high-level graph properties, such as topology patterns and homophily, exhibit consistency across diverse datasets. A well-designed GFM should learn a shared latent space that aligns graphs from different domains while preserving task-specific nuances. The challenge lies in mitigating discrepancies between tasks that possess distinct inductive biases while promoting knowledge transfer [36, 22, 25, 81]. Achieving this requires carefully balancing pretraining objectives with downstream adaptations, employing task-aware mechanisms to align model representations. Effective pretraining strategies should not only capture comprehensive graph semantics but also maintain adaptability, ensuring seamless alignment between pretraining and downstream tasks.

In the following sections, we examine existing universal GFMs through the lens of their approaches to addressing the three core challenges outlined above. Specifically, we categorize these methods across three fundamental levels: (1) model-level, which focuses on model unification strategies to enhance transferability across tasks and domains; (2) pretrain-level, which explores techniques for domain alignment during pretraining to enable cross-domain generalization; and (3) adaptation-level, which investigates downstream task adaptation mechanisms that facilitate efficient fine-tuning and transfer learning.
5.2 Graph Model-Based Universal GFM

Graph model-based universal GFMs primarily follow a pretrain-then-adapt paradigm. Given a large-scale, diverse graph database spanning multiple domains and tasks, the goal is to pretrain a graph encoder that captures transferable structural and semantic patterns, which can then be adapted to unseen graphs for downstream tasks. We summarize existing works based on the backbones, pretraining methods, adaptation strategies, as well as the techniques used in handling feature, structure, and task heterogeneity in Table 2.
5.2.1 Model Unification

A core challenge in building universal GFMs lies in designing GNN encoders capable of generalizing across tasks, domains, and graph topologies. Existing approaches toward model unification can be broadly categorized into two paradigms: (1) explicit unification through task and input reformulation, and (2) implicit unification via architectural generalization and invariance enforcement.
| Method Name | Backbone | Pretrain | Adaptation | Feature Align | Structure Align | Task Align | Github |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| OpenGraph [63] | GNN | Supervised | Finetune | Data - SVD | N/A | Explicit - Subgraph | Link |
| OFA [22] | GNN | Supervised | Graph Prompting, Prototype | Data - Text Attribute | N/A | Explicit - Subgraph | Link |
| MetaFP [191] | GNN | Supervised | Distillation | Model - Projection | N/A | N/A | - |
| SCORE [192] | GNN | Supervised | Test-time Adaptation | Data - Text Attribute, Model - Projection | Data - Augment, Data - Synthesize | Explicit - Link | - |
| STAGE [193] | GNN | Supervised | Test-time Adaptation | Data - Others | N/A | N/A | - |
| HoloGNN [77] | GNN | Contrastive | Finetune | N/A | N/A | Implicit - Regularizer | - |
| FoToM [194] | GNN | Contrastive | Finetune | N/A | Loss - Pretrain | N/A | Link |
| EGI [195] | GNN | Contrastive | Finetune | N/A | N/A | Explicit - Subgraph | Link |
| GIT [76] | GNN | Contrastive | Finetune, Prototype | Data - Text Attribute | N/A | Explicit - Tree | Link |
| RiemannGFM [196] | GNN | Contrastive | Finetune | N/A | Model - Tangent Space | Implicit - Codebook | Link |
| BooG [197] | GNN | Contrastive | Finetune, Prototype | Data - Text Attribute | Data - Augment | Explicit - Subgraph | Link |
| UniGLM [198] | GNN | Contrastive | Finetune, Test-time Adaptation | Data - Text Attribute | N/A | N/A | - |
| AnyGraph [64] | GNN | Contrastive | Finetune, Graph Prompting | Data - SVD, Model - MoE | Data - Augment | Explicit - Subgraph | Link |
| GCC [199] | GNN | Contrastive | Finetune | N/A | N/A | Explicit - Subgraph | Link |
| GraphAlign [200] | GNN | Contrastive | Finetune | Data - Text Attribute | N/A | Implicit - MoE | Link |
| HGPROMPT [190] | GNN | Contrastive | Graph Prompting | Model - Projection | N/A | Explicit - Subgraph | Link |
| OMOG [201] | GNN | Contrastive | Graph Prompting | Data - Text Attribute | Model - MoE | N/A | - |
| SAMGPT [65] | GNN | Contrastive | Graph Prompting | Model - Projection | Model - Prompt Learning | Explicit - Subgraph | - |
| Prodigy [108] | GNN | Contrastive | Graph Prompting | N/A | Loss - Multi-task | Explicit - Subgraph | Link |
| PGT [202] | GNN | Generative | Finetune | N/A | N/A | N/A | - |
| UniGraph [75] | GNN | Generative | Finetune, In-context, Prototype | Data - Text Attribute | Loss - Pretrain | Explicit - Subgraph | Link |
| UniGraph 2 [203] | GNN | Generative | Finetune | Data - Text Attribute | Loss - Pretrain | Explicit - Subgraph, Implicit - MoE | Link |
| UniAug [70] | GNN | Generative | Test-time Adaptation | Data - Node Property | Data - Augment | N/A | Link |
| All in One [173] | GNN | Hybrid | Finetune, Prototype | Data - SVD | Model - Meta Learning | Explicit - Subgraph | Link |
| PatchNet [204] | GNN | Hybrid | Finetune | Model - Projection | N/A | Implicit - Augment | Link |
| AUX-TS [152] | GNN | Hybrid | Finetune | N/A | N/A | Implicit - Aux. Loss | Link |
| GFT [23] | GNN | Hybrid | Finetune, Prototype | Data - Text Attribute | Loss - Pretrain, Model - Codebook | Explicit - Tree | Link |
| ProNoG [182] | GNN | Hybrid | Graph Prompting | N/A | Model - Codebook | Explicit - Subgraph | - |
| MultiGPrompt [205] | GNN | Hybrid | Graph Prompting | Model - Projection | Loss - Pretrain | Explicit - Subgraph | Link |
| GraphPrompt [181] | GNN | Hybrid | Graph Prompting | N/A | N/A | Explicit - Subgraph | Link |
| RAGraph [206] | GNN | Hybrid | Graph Prompting, In-context | N/A | Model - Codebook | N/A | Link |
| IGAP [71] | GNN | Hybrid | Graph Prompting | Model - Projection | Reg | Explicit - Link | - |
Explicit Unification. Explicit unification approaches seek to transform diverse graph-based tasks into a unified prediction format, facilitating a shared pretraining and inference pipeline. Based on task granularity, we identify three major categories: link-level, subgraph-level, and tree-level unification.
Link-Level Unification. One strategy frames all graph-based tasks as link prediction problems by introducing class nodes into the graph and predicting links between these nodes and task-relevant nodes. Unlike traditional architectures that append a classifier atop node embeddings, these models cast classification as a link prediction task:

$$p\left(y_{v_{i}}=c_{k}\right)=\sigma\left(\mathbf{h}_{v_{i}}^{\top} \mathbf{h}_{c_{k}}\right),$$
where $\mathbf{h}_{v_{i}}$ and $\mathbf{h}_{c_{k}}$ denote the embeddings of the target node and class node, respectively, and $\sigma$ is a scoring function such as softmax or sigmoid. GPPT [74] pioneered this approach by combining masked edge prediction and prompting techniques. Subsequent methods [71, 182] extend this framework with prompt tokens for enhanced task alignment. While elegant and general, this formulation may overlook fine-grained substructures essential for certain prediction tasks.
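A minimal sketch of this scoring scheme, using softmax as the scoring function $\sigma$ and placeholder tensors, is shown below.

```python
import torch

def link_level_classify(h_nodes: torch.Tensor, h_classes: torch.Tensor) -> torch.Tensor:
    """Link-level unification: classification is cast as predicting a link between each
    target node and one of C class nodes, scored by an inner product plus softmax."""
    scores = h_nodes @ h_classes.t()            # [N, C] link scores between nodes and class nodes
    probs = torch.softmax(scores, dim=-1)       # sigma(.) chosen as softmax here
    return probs.argmax(dim=-1)                 # predicted class per node

pred = link_level_classify(torch.randn(100, 64), torch.randn(7, 64))
```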
Subgraph-Level Unification. To incorporate local structural context, a second category formulates graph tasks at the subgraph level [199, 181, 173, 108, 22, 75]. This two-stage process first extracts ego-graphs centered around target nodes and then applies a GNN encoder to each subgraph. For instance, in node classification, an ego-graph is constructed around each node, where the label of the induced subgraph corresponds to the label of the central node. The same principle extends to edge- and graph-level tasks, thereby unifying diverse graph tasks at the subgraph level. This is formulated as:

$$\hat{y}_{v_{i}}=f_{\text {Classifier }}\left(f_{\text {GNN }}\left(\mathcal{G}_{v_{i}}\right)\right),$$
where $\mathcal{G}_{v_{i}}$ is the ego-subgraph around node $v_{i}$ within radius $r$. The resulting subgraph embedding encoded via the GNN $f_{\text {GNN }}$ is passed to a classifier $f_{\text {Classifier }}$, which can be a linear head [199] or a class-node-based scoring function [181, 173].

Early works, such as GCC [199], adopt contrastive pretraining to capture structural patterns across multi-domain graphs. This paradigm is further extended by GraphPrompt [181], Prodigy [108], and All in One [173], which introduce prompt learning techniques to enhance alignment between pretraining and downstream tasks. OFA [22] and UniGraph [75] generalize this approach to cross-domain scenarios. This paradigm supports a unified framework for node-, edge-, and graph-level tasks and has shown strong empirical performance in domain adaptation and transfer learning. However, subgraph-level unification suffers from two major limitations: (1) subgraph extraction introduces substantial computational overhead, increasing both time and memory costs, and (2) message-passing GNNs may struggle to capture essential substructures, leading to suboptimal representation learning [207, 208, 209, 210].
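The sketch below illustrates the two-stage procedure with PyTorch Geometric utilities (ego-graph extraction via k_hop_subgraph, followed by a small GNN and mean pooling); the encoder and the linear classifier head are simplified placeholders rather than the architecture of any specific method.

```python
import torch
from torch_geometric.nn import GCNConv, global_mean_pool
from torch_geometric.utils import k_hop_subgraph

class SubgraphClassifier(torch.nn.Module):
    """Subgraph-level unification: classify a node by encoding its r-hop ego-graph
    and pooling it into a single subgraph embedding."""
    def __init__(self, in_dim, hid_dim, num_classes, num_hops=2):
        super().__init__()
        self.num_hops = num_hops
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv2 = GCNConv(hid_dim, hid_dim)
        self.head = torch.nn.Linear(hid_dim, num_classes)   # could also be class-node scoring

    def forward(self, x, edge_index, center):
        # 1) extract the ego-graph around the target node
        subset, sub_edge_index, _, _ = k_hop_subgraph(
            center, self.num_hops, edge_index, relabel_nodes=True)
        h = self.conv1(x[subset], sub_edge_index).relu()
        h = self.conv2(h, sub_edge_index)
        # 2) pool the subgraph into one embedding; its label is the center node's label
        batch = torch.zeros(subset.size(0), dtype=torch.long)
        return self.head(global_mean_pool(h, batch))
```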
Tree-Level Unification. A more recent and efficient alternative introduces virtual nodes that connect to task-relevant nodes, eliminating the need for subgraph extraction. These virtual nodes serve as surrogates for tree-structured representations, with their embeddings used for downstream predictions. For example, in node classification, virtual nodes are linked to all original nodes, and their embeddings, learned via message passing, serve as the final representations:

$$\mathbf{h}_{v}=f_{\mathrm{GNN}}\left(\mathcal{G}^{+}\right)_{[v]}, \quad \mathcal{G}^{+}=\left(\mathcal{V} \cup\{v\}, \mathcal{E} \cup\left\{(v, u) \mid u \in \mathcal{V}_{\text {task }}\right\}\right),$$
where $\mathcal{G}^{+}$ augments the original graph with a virtual node $v$ connected to all task-relevant nodes, $f_{\mathrm{GNN}}(\mathcal{G}^{+})_{[v]}$ denotes the embedding of the virtual node, and $\mathcal{V}_{\text {task }}$ is the set of task-relevant nodes (e.g., all nodes for node classification). The GNN encoder computes the embedding of $v$, which reflects aggregated information from the graph. This design significantly improves efficiency while preserving representation capacity. GFT [23] pioneered this approach, demonstrating both empirical and theoretical evidence that tree similarity correlates with improved task transferability. GIT [76] further formalized the stability, transferability, and generalization of tree-based representations from a theoretical perspective.
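The following sketch, assuming a PyTorch Geometric setup with placeholder names, attaches a single virtual node to a set of task-relevant nodes and reads out its embedding after message passing; it illustrates only the basic virtual-node mechanism, not the full tree-based designs of GFT or GIT.

```python
import torch
from torch_geometric.nn import GCNConv

def add_virtual_node(x, edge_index, task_nodes):
    """Tree-level unification: append one virtual node, connect it bidirectionally to all
    task-relevant nodes, and use its post-message-passing embedding as the prediction target."""
    v = x.size(0)                                                         # index of the virtual node
    x = torch.cat([x, x[task_nodes].mean(dim=0, keepdim=True)], dim=0)   # simple feature init
    new_edges = torch.stack([
        torch.cat([task_nodes, torch.full_like(task_nodes, v)]),         # task node -> virtual node
        torch.cat([torch.full_like(task_nodes, v), task_nodes])])        # virtual node -> task node
    edge_index = torch.cat([edge_index, new_edges], dim=1)
    return x, edge_index, v

x, edge_index = torch.randn(100, 64), torch.randint(0, 100, (2, 400))
task_nodes = torch.arange(100)                    # e.g., a target subgraph, or all nodes
x_aug, ei_aug, v = add_virtual_node(x, edge_index, task_nodes)
h = GCNConv(64, 64)(x_aug, ei_aug)                # one message-passing layer for illustration
h_virtual = h[v]                                  # embedding used for the downstream prediction
```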
Implicit Unification. While explicit task reformulation provides a principled way to unify graph tasks, an alternative line of research focuses on implicit architectural unification, modifying the underlying model design to enable generalization across diverse tasks and domains without altering the task definitions themselves. One representative example is HoloGNN [77], which addresses the limitation that conventional GNN architectures are often hardwired for specific task types, such as node-level or link-level classification. These models may fail to generalize across tasks that involve different permutation symmetries or structural granularity. To overcome this, HoloGNN introduces expansion and reduction maps that explicitly model node-permutation symmetries. By decomposing the input graph into permutation-invariant components and then reconstructing task-specific views, HoloGNN enables a single architecture to adapt flexibly across varied learning tasks. SCORE [192] proposes a relation graph framework: rather than operating directly on input graphs, SCORE constructs a semantic interaction graph that encodes cross-domain entity relationships. By integrating semantic-conditioned message passing, the model dynamically adapts to domain-specific patterns while preserving shared structural and semantic invariants. AnyGraph [64] focuses on enhancing model expressiveness through a mixture-of-experts architecture combined with high-order structural injection. By introducing high-order connectivity patterns and gating mechanisms, AnyGraph captures both local and non-local interactions while maintaining modularity and scalability. OpenGraph [63] tackles graph heterogeneity from a data representation perspective. It introduces a topology-aware tokenizer that converts variable-sized graph structures (e.g., adjacency matrices) into fixed-length sequences suitable for Transformer-based encoders. This tokenizer preserves key topological properties while allowing foundation models to operate uniformly across graphs of different sizes and shapes. Together, these implicit unification strategies demonstrate that architecture design, independent of task reformulation, plays a pivotal role in building transferable and robust GFMs. As graph data continues to grow in complexity and diversity, the interplay between explicit and implicit unification mechanisms will be essential for the development of scalable, general-purpose graph models.
5.2.2 Domain Alignment in Pretraining

Feature Alignment. Ensuring feature consistency across diverse graph structures is fundamental to designing GNN-based GFMs, as a single GNN inherently struggles to generalize across heterogeneous graph features [211]. Existing approaches can be broadly categorized into two paradigms: (1) textual and multimodal feature alignment and (2) model-based feature alignment.
Textual and Multimodal Feature Alignment. To unify heterogeneous graph signals, recent works project textual or multimodal attributes into a shared representation space. Given a graph $\mathcal{G}=(\mathbf{X}, \mathbf{A}, \mathbf{D})$, where $\mathbf{D}=\{\mathbf{d}_{i}\}_{i=1}^{N}$ denotes the set of node-level textual or multimodal descriptions, a pretrained encoder $f_{\text {enc }}$ (e.g., SentenceBERT [212], CLIP [60]) is used to map each description to a latent embedding:

$$\mathbf{h}_{i}=f_{\text {enc }}\left(\mathbf{d}_{i}\right), \quad \forall v_{i} \in \mathcal{V},$$
where $\mathbf{h}_{i} \in \mathbb{R}^{d}$ represents the aligned feature vector of node $v_{i}$ in the shared semantic space. In this framework, $\mathbf{d}_{i}$ may correspond to natural language descriptions, visual content, or a combination thereof, depending on the graph modality.

Models such as OFA [22] utilize fixed templates and SentenceBERT to encode textual descriptions, while UniGraph [75] fine-tunes the encoder $f_{\text {enc }}$ during pretraining to improve alignment across graph domains. UniGLM [198] further adopts contrastive pretraining to train $f_{\text {enc }}$ for stronger discriminative alignment. To extend beyond single-modal representations, UniGraph2 [203] employs CLIP as a unified encoder for both text and image modalities, effectively aligning multimodal node features into a coherent embedding space. In more advanced designs such as GraphAlign [200], a mixture-of-experts model dynamically selects among several encoders using a learnable routing mechanism.
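For instance, a SentenceBERT-style encoder from the sentence-transformers library can map node descriptions into a shared feature space, as in the sketch below; the checkpoint name and the example texts are illustrative choices, not those prescribed by the cited works.

```python
from sentence_transformers import SentenceTransformer   # any SentenceBERT-style encoder

# Encode node-level textual descriptions into a shared feature space.
encoder = SentenceTransformer("all-MiniLM-L6-v2")        # illustrative checkpoint
node_texts = [
    "Title: Graph attention networks. Abstract: We present ...",
    "Title: Inductive representation learning on large graphs. Abstract: ...",
]
node_features = encoder.encode(node_texts, convert_to_tensor=True)   # [N, d] aligned features
# `node_features` can now serve as X for any GNN, regardless of the source domain.
```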
Despite their effectiveness, one practical bottleneck of multimodal feature alignment lies in the limited availability of rich textual or image-labeled graphs. To address this, TANS [211] proposes synthetic graph augmentation by leveraging LLMs to automatically generate textual descriptions from raw graph structures, enabling broader adoption of alignment-based GFMs even in non-textual settings.
Model-Based Feature Alignment. Beyond preprocessing with textual or multimodal encoders, a parallel line of research focuses on architectural innovations that directly align heterogeneous node features through model design. These approaches aim to internally reconcile feature discrepancies across graphs from different domains or modalities, without relying solely on external encoders. One common strategy involves introducing a domain-specific projection function $f_{\text {proj }}$ that transforms raw node features $\mathbf{x}_{i}$ into a unified latent space, $\mathbf{h}_{i}=f_{\text {proj }}(\mathbf{x}_{i}), \forall v_{i} \in \mathcal{V}$. For instance, DARE [191] adopts model reprogramming, where $f_{\text {proj }}$ consists of lightweight input and output adapters appended to a frozen pretrained GNN. This enables effective adaptation across downstream tasks without modifying core model parameters. Such reprogramming reduces training costs and enhances parameter reuse across domains.
Another technique employs singular value decomposition (SVD) to orthogonalize feature spaces before further alignment. Let $\mathbf{X}=\mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^{\top}$ be the SVD of the node features. The transformed features $\mathbf{U} \boldsymbol{\Sigma}$ serve as normalized input embeddings:

$$\mathbf{H}=f_{\text {align }}(\mathbf{U} \boldsymbol{\Sigma}),$$
where f_("align ")f_{\text {align }} can incorporate learnable tokens [65, 71], LLM-based augmentation modules [63], or mixture-of-expert routers [64] to further adapt feature semantics. 可以 f_("align ")f_{\text {align }} 结合可学习的标记[65,71]、基于 LLM 的增强模块[63]或混合专家路由器[64]来进一步适应特征语义。
PatchNet [204] proposes a more compositional approach by constructing graph patches: small, learnable semantic units representing transferable substructures. Let $\mathcal{P}_{i}=\{\mathcal{G}_{i}^{(1)}, \ldots, \mathcal{G}_{i}^{(K)}\}$ denote the set of $K$ patches extracted for node $v_{i}$. Each patch is encoded independently and then aggregated to form the final representation $\mathbf{h}_{i}=\operatorname{Aggregate}(\{f_{\text {patch }}(\mathcal{G}_{i}^{(k)})\}_{k=1}^{K})$, where $f_{\text {patch }}$ is a GNN-based encoder over patches, and Aggregate denotes a pooling or attention mechanism.
Despite their effectiveness in aligning heterogeneous node features, model-based alignment approaches exhibit several notable limitations. A primary drawback lies in their limited generalizability to unseen graphs. Many of these methods, such as domain-specific projection layers, SVD-based transformations, or learnable tokens, are tightly coupled to the feature distribution and structural patterns of the pretraining graphs. As a result, when deployed on graphs from entirely new domains or with previously unseen feature spaces, the alignment mechanisms may fail to preserve semantic consistency or yield meaningful representations.
Structure Alignment. Graphs originating from different domains often exhibit fundamentally distinct structural patterns. For instance, social networks are characterized by high clustering coefficients and frequent triangular motifs, while molecular graphs are dominated by small cycles and functional substructures. Such structural heterogeneity presents a significant challenge for building graph foundation models that generalize across domains. To address this, a range of strategies have been proposed to align structural information during pretraining and inference.
Domain-Invariant Pretraining Objectives. One class of methods seeks to learn domain-invariant structural representations by optimizing pretraining objectives that generalize across diverse graph topologies. FoToM [194] adopts adversarial contrastive learning to disentangle domain-specific features while preserving task-relevant structure. By learning representations that are indistinguishable across domains from the perspective of an adversary, the model acquires a form of structure-aware invariance.

Structural Vocabulary Construction. Another approach involves defining a shared vocabulary [213] of structural motifs that capture recurring patterns across graphs. GFT [23] constructs a tree-based codebook during pretraining that encodes canonical structural features. This codebook remains fixed during inference, providing a consistent basis for interpreting new graphs. Similarly, RiemannGFM [196] introduces a geometric extension, modeling both tree and cycle motifs in Riemannian space to better capture curved and hierarchical graph structures. These vocabulary-based approaches promote structural alignment by grounding graph representations in discrete, reusable structural primitives.

Graph Prompting for Structural Adaptation. Graph prompt learning has also emerged as an effective mechanism for structural alignment. IGAP [71] analyzes spectral discrepancies between graphs and uses learnable prompt tokens to align representations in the spectral domain. Empirical results suggest that graph signals are more transferable in the low-frequency space, which can be amplified through spectral prompt injection. BooG [197] introduces virtual nodes during pretraining to harmonize structural contexts across graph domains. ProNoG [182] extends this further by employing a control network that generates node-specific prompts adaptively, enabling fine-grained structural calibration without manual intervention.

Mixture-of-Experts for Structural Diversity. An alternative perspective on structure alignment emphasizes architectural diversity. OMOG [201] proposes a mixture-of-experts (MoE) framework, motivated by the observation that a single GNN often fails to capture the inductive biases necessary for structurally diverse graphs. OMOG pretrains multiple specialized GNNs, each tuned to a distinct structural domain, and stores them in a model bank. At inference time, a gating function dynamically selects the most relevant expert based on the similarity between the input and pretrained graphs. This expert routing mechanism reduces negative transfer and provides modular flexibility in adapting to novel structures.

Multi-Objective Learning. To effectively balance the inductive biases inherent in different downstream tasks, it is essential to employ diverse pretraining objectives tailored to distinct learning paradigms. A prevalent strategy involves leveraging multi-task learning frameworks, enabling models to jointly optimize multiple objectives [23, 214, 75, 202, 215, 216, 217].
For instance, to comprehensively capture knowledge embedded in graphs, GFT [23] integrates node-level, link-level, and semantic-level reconstruction objectives during pretraining, allowing the model to extract structural and semantic information from multiple perspectives. Similarly, UniGraph [75] co-trains a GNN and an LLM in a unified framework, employing contrastive learning on the GNN side while leveraging masked token prediction on the LLM side. This synergistic approach enhances the model's capability to learn both graph topology and rich semantic representations. PGF [202] simultaneously optimizes feature reconstruction and local graph structure reconstruction under a graph transformer framework, demonstrating its effectiveness in complex industry-scale applications such as game data modeling.

To achieve a better trade-off among competing objectives, recent works have explored advanced optimization techniques, including Pareto optimization, learnable task tokens, and meta-learning. Specifically, MultiGPrompt [205] introduces multiple learnable pretext tokens to bridge the gap between diverse task objectives, such as local-global contrastive learning, global-global contrastive learning, and link prediction. ParetoGNN [216] employs Pareto optimization to balance five distinct pretraining tasks, achieving an optimal tradeoff across objectives. Meanwhile, All in One [173] applies meta-learning to optimize multi-task prompt initialization, improving generalization and adaptability across various tasks.
5.2.3 Downstream Task Adaptation
Fine-Tuning. Fine-tuning the pretrained model or employing linear probing remains the most widely adopted approach for adapting GFMs to downstream tasks, a paradigm that has been extensively utilized in graph self-supervised learning [199]. Traditional graph self-supervised learning methods typically follow a two-step procedure: (1) pretraining the model on a source graph, and (2) appending a linear classifier for downstream classification, either through full fine-tuning or linear probing, where only the final layer is updated. Extending this approach, GFMs aim to generalize to unseen graphs in an inductive setting, a key objective for achieving universal graph learning. However, positive transfer across domains remains challenging. To bridge this gap, researchers have proposed various strategies to facilitate rapid adaptation of pretrained models to downstream tasks.

Transfer Learning. A primary strategy for downstream adaptation is transfer learning, wherein a pretrained model is fine-tuned on new graphs to improve task-specific performance. While effective, this approach is computationally expensive, motivating the development of faster transfer learning techniques. EGI [195] enhances transferability by capturing essential graph structures through ego-graph distribution modeling, linking transferability to local graph Laplacians of the source and target domains. Additionally, AUX-TS [152] identifies discrepancies between the optimization objectives of self-supervised pretraining and downstream tasks, introducing auxiliary tasks with adaptive weighting to bridge this gap and improve adaptation efficiency.

Prompt Learning. Inspired by prompt learning in LLMs, researchers have explored graph prompt learning as an efficient alternative to traditional fine-tuning. Instead of updating the pretrained model, this approach fine-tunes learnable graph prompt tokens, facilitating rapid adaptation to downstream tasks. GPPT [74] pioneers this direction by introducing three key components: (1) prompt addition, which reformulates input nodes as token pairs; (2) prompt answering, where pretrained models estimate linking probabilities between tokens; and (3) prompt tuning, which optimizes pretext task losses with orthogonal initialization and regularization, enabling adaptation without modifying the backbone model. Subsequent works have further refined this paradigm by (1) unifying pretraining and downstream tasks at the subgraph level [181, 108, 173, 22], (2) leveraging multi-scale graph prompt tokens [71, 65], (3) designing advanced prompt templates [173, 108, 22, 190], and (4) incorporating external knowledge sources [206, 192]. These advancements collectively enhance the efficiency and generalizability of graph prompt learning.
Prototype Learning. While graph prompt learning minimizes the number of tunable parameters, it still requires additional tuning efforts. Prototype learning offers an alternative approach by constructing class-specific prototype embeddings and using them for downstream task predictions, a technique widely employed in few-shot learning. Given $k$ classes, each with $n$ samples, class prototypes are formed by averaging the available samples. A new instance is then assigned to the closest prototype. Several works adopt this paradigm to facilitate efficient adaptation. OFA [22], UniGraph [75], and UniGraph2 [203] introduce a query-support framework, where support graphs are used to construct class prototypes. This framework can be further extended to zero-shot learning by leveraging external resources, such as textual class embeddings [22]. GFT [23] follows a similar paradigm but incorporates limited fine-tuning on a minimal set of examples, demonstrating that even a small number of labeled samples can significantly enhance downstream performance.
Structural Augmentation. Beyond model-centric adaptation approaches, structural augmentation has emerged as a complementary strategy for improving downstream task performance. UniAug [70] introduces a universal structural augmentor based on a discrete diffusion model, pretrained exclusively on graph structures from over 1,000 datasets. This augmentor can enhance the structures of downstream graphs through guided generation, benefiting node-, link-, and graph-level tasks. Empirical results suggest that increasing the volume of pretraining data improves downstream generalization, as it enriches the model's understanding of diverse structural patterns across graphs.

5.3 Language Model-Based Universal GFM
This section explores approaches that leverage large language models as predictors for graph-based tasks. LLM-based GFMs have design objective centered around three key components: (1) developing an effective tokenization strategy to transform graph data into text-based representations, (2) post-training LLMs to 本节探讨利用大型语言模型作为基于图形的任务的预测变量的方法。基于 LLM 的 GFM 的设计目标围绕三个关键组成部分:(1) 开发有效的标记化策略,将图形数据转换为基于文本的表示,(2) 训练后 LLM
Table 3: Summary of LLM-based universal GFMs.

| Method Name | Backbone | Pretrain | Adaptation | Feature Align | Structure Align | Task Align | Github |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| LangGFM [99] | LLM | Supervised | Finetune, In-context | Data - Text Attribute | Loss - Pretrain | Explicit - QA | - |
| Meta-Transformer [218] | LLM | Generative | Finetune | N/A | N/A | N/A | Link |
| QueryRAG [184] | LLM | Generative | In-context | Data - Text Attribute | Data - Augment | Explicit - QA | - |
| GraphAgent [219] | LLM | Generative | In-context | Data - Text Attribute | Data - Augment | Explicit - QA | - |
| Graph-ToolFormer [220] | LLM | Generative | In-context | Data - Text Attribute | Data - Augment | Explicit - QA | Link |
| InstructGraph [221] | LLM | Generative | Finetune, In-context | Data - Text Attribute | Data - Augment | Explicit - QA | Link |
| Beyond Text [100] | LLM | Generative | In-context | Data - Text Attribute | Data - Augment | Explicit - QA | - |
| Graph Agent [222] | LLM | Generative | In-context | Data - Text Attribute | Data - Augment | Explicit - QA | - |
| InstructGLM [101] | LLM | Generative | Finetune, In-context | Data - Text Attribute | Data - Augment | Explicit - QA | Link |
| GraphICL [223] | LLM | Generative | In-context | Data - Text Attribute | Data - Augment | Explicit - QA | - |
incorporate graph-specific knowledge, and (3) designing advanced adaptation techniques to enhance alignment with downstream tasks. We summarize the LLM-based universal GFMs in Table 3. This formulation consists of three core components:
1. Graph Tokenization: Define a tokenization function $\mathcal{T}$ that maps a graph $\mathcal{G}$ into a text sequence (a minimal serialization sketch follows this list):

$$\mathbf{s}=\mathcal{T}(\mathcal{G})$$

where $\mathbf{s}=\left[s_{1}, s_{2}, \ldots, s_{L}\right]$ is a sequence of $L$ tokens representing the nodes, edges, and attributes of the input graph.
2. Post-Training on Graph Data: Fine-tune a pretrained LLM with graph-specific objectives (e.g., masked token prediction, topology autoencoding) using the tokenized input $\mathbf{s}$:

$$\theta^{*}=\arg \min_{\theta} \mathcal{L}_{\text{graph}}\left(f_{\mathrm{LLM}}(\mathbf{s}; \theta), y\right)$$

where $\mathcal{L}_{\text{graph}}$ denotes the graph-aware training loss and $y$ is the supervision signal.
3. Downstream Adaptation: For downstream task inference, apply an adaptation function $\mathcal{A}$ (e.g., in-context learning, prompting, instruction tuning) to structure the input prompt and decode predictions:

$$\hat{y}=f_{\mathrm{LLM}}(\mathcal{A}(\mathbf{s}))$$

where $\hat{y}$ is the model prediction adapted to the downstream task format.
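To make step (1) concrete, the following sketch serializes a toy text-attributed graph into a prompt, i.e., one possible instantiation of $\mathbf{s}=\mathcal{T}(\mathcal{G})$; the template wording is an illustrative assumption, not the prompt format of any cited method.

```python
# Serialize a small text-attributed graph into a natural-language prompt.
def serialize_graph(nodes, edges):
    """nodes: {node_id: text attribute}, edges: list of (src, dst) pairs."""
    lines = [f"The graph has {len(nodes)} nodes and {len(edges)} edges."]
    for nid, text in nodes.items():
        neighbors = [dst for src, dst in edges if src == nid]
        lines.append(f"Node {nid}: {text}. It links to nodes {neighbors}.")
    return "\n".join(lines)

nodes = {0: "Paper on graph neural networks", 1: "Paper on language models"}
edges = [(0, 1)]
prompt = serialize_graph(nodes, edges) + "\nQuestion: which topic does node 0 belong to?"
```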
5.3.1 Model Unification
Large language models are inherently designed as task-agnostic architectures capable of solving a wide range of text-based problems within a unified framework. To extend their capabilities to graph-structured data, a critical challenge lies in developing effective conversion schemes that translate graphs into representations amenable to language modeling. Specifically, the goal is to align the inductive biases of graph structures with the sequential and semantic processing capabilities of LLMs. Existing approaches to this problem fall broadly into two categories: (1) natural language conversion, which reformulates graph structures as textual narratives, and (2) structured format conversion, which encodes graphs into organized, code-like representations such as JSON or nested lists.
Natural Language Conversion. Natural language conversion methods [101, 99, 223, 100, 219, 184] represent graph elements (nodes, edges, and attributes) using human-readable descriptions. These techniques leverage the inherent linguistic capabilities of LLMs to perform reasoning over graphs by narratively expressing relational patterns and structural dependencies. For example, in citation graphs, nodes representing academic papers can be described using titles, abstracts, and citation relationships, thereby embedding graph semantics into natural language text [100]. While sharing the core objective of converting graphs into text, existing methods vary significantly in terms of their prompt design, context modeling, and reasoning strategies:
Template Construction: Most approaches begin with handcrafted or automated templates that standardize the textual description of graph elements. These templates capture key graph components-such as node labels, edge types, degrees, or graph-level statistics-and embed them in coherent textual forms. QueryRAG [184], for instance, incorporates detailed structural summaries including adjacency properties and graph size indicators.
Hierarchical Context Inclusion: To improve graph comprehension, several methods incorporate multi-hop neighborhood information [101, 223]. This allows LLMs to access not only direct relationships but also broader structural context. Such descriptions are often layered hierarchically to reflect structural depth.
Task-Specific Reasoning and Prompting: Some models further introduce dynamic reasoning mechanisms, using multi-step prompts, memory updates, and agent-style decision-making [219, 223]. These guided prompts help the model iteratively refine its understanding and decision process, improving performance on complex graph tasks such as traversal, path-finding, or subgraph reasoning.
Structured Format Conversion. Structured conversion approaches [222, 220, 221] provide an alternative to free-form text by encoding graphs into well-defined data structures such as JSON, code blocks, or nested lists. This design preserves graph topology while enabling LLMs to process the information in a structured and interpretable manner. The core idea is to treat graphs as serialized data that LLMs can parse, analyze, and manipulate. These methods typically exhibit the following key characteristics (a minimal JSON sketch follows the list):
Predefined Templates and Tokenization: Structured methods use domain-specific templates to define graph elements. For instance, nodes may be represented as objects with fields for attributes and neighbors, while edges are encoded as relations within nested arrays [220]. Automated tools (e.g., ChatGPT) can populate these templates during preprocessing.
API-Augmented Inference: Several approaches extend the inference pipeline by integrating APIs that enable dynamic graph traversal or data retrieval during the reasoning process. This augmentation facilitates tasks such as knowledge graph completion or personalized recommendation [220].
Hierarchical Context Encoding: Similar to natural language methods, structured formats may also include hierarchical representations of node neighborhoods. GraphAgent [222] demonstrates this by incorporating recursive context trees, improving the LLM's ability to model nested relational dependencies.
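As referenced above, the following sketch encodes the same toy graph in a structured JSON format; the field names are illustrative assumptions rather than the schema of any cited method.

```python
# Encode a toy graph as JSON so an LLM can parse nodes, attributes, and neighbor lists.
import json

graph = {
    "nodes": [
        {"id": 0, "text": "Paper on graph neural networks", "neighbors": [1]},
        {"id": 1, "text": "Paper on language models", "neighbors": [0]},
    ],
    "edges": [{"src": 0, "dst": 1, "relation": "cites"}],
}
prompt = (
    "You are given a graph in JSON format:\n"
    + json.dumps(graph, indent=2)
    + "\nTask: predict the label of node 0."
)
```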
Both paradigms, natural language and structured conversion, offer viable paths to unify graph reasoning under LLM frameworks. Natural language prompts prioritize interpretability and ease of human understanding, while structured formats provide clarity and consistency, particularly for complex or large-scale graph tasks. The choice between these methods often reflects trade-offs in generalization, computational efficiency, and compatibility with downstream applications.
5.3.2 Domain Alignment in Model Training
Once graph templates have been defined and translated into structured textual representations, LLMs can naturally interpret and reason over graph data using their native text-processing capabilities. A straightforward approach involves directly leveraging pretrained LLMs as zero-shot or few-shot predictors. In this setting, graph-formatted prompts are fed into the model, enabling in-context learning without any additional parameter updates [223, 100, 222, 219, 184]. This paradigm capitalizes on the strong generalization and compositional reasoning abilities of LLMs, allowing them to perform tasks through purely prompt-based supervision.
While zero-shot performance is often competitive, empirical evidence suggests that post-training adaptation, particularly techniques such as supervised fine-tuning (SFT) and preference alignment (e.g., PPO, DPO), can substantially improve alignment between LLMs and graph-structured tasks. Accordingly, recent research has explored various strategies to better adapt LLMs to graph reasoning through domain-specific training objectives and multi-task instruction tuning.
Instruction-Tuned Fine-Tuning. InstructGraph [221] introduces a supervised fine-tuning approach that incorporates structured graph reasoning tasks into the LLM training process. By pairing graph-encoded inputs with curated instructions and target outputs, the model learns to follow graph-specific reasoning patterns. Preference alignment techniques such as reinforcement learning with human feedback (RLHF) or direct preference optimization (DPO) are further employed to enhance output faithfulness and task adherence. InstructGLM [101] generalizes this concept through multi-prompt training, where a diverse set of graph tasks, including classification, generation, and summarization, are cast into instruction-based prompts. The LLM is trained across this multi-task corpus to promote generalization and cross-task transfer. This unified training paradigm enables LLMs to handle graph problems with varying input-output formats and structural complexities.
Self-Supervised Graph Alignment. LangGFM [99] proposes an alternative fine-tuning strategy grounded in self-supervised learning (SSL). Inspired by traditional graph pretraining techniques, LangGFM introduces two novel SSL objectives tailored for LLM-based graph understanding: Topology Autoencoding and Feature Masked Autoencoding. The topology autoencoder encourages the model to reconstruct structural information (e.g., edge connections, adjacency statistics) from textual graph descriptions, while the feature masked autoencoder masks node attributes and predicts them from context, analogous to masked language modeling but applied to graph-encoded text. These self-supervised objectives promote alignment between LLM representations and graph topologies without requiring manually labeled data. As a result, LangGFM achieves strong performance on downstream tasks while maintaining label efficiency and robustness to domain shifts.
5.3.3 Downstream Task Adaptation
Zero-Shot Reasoning. Large language models exhibit strong zero-shot generalization capabilities, allowing them to perform a wide range of tasks without explicit fine-tuning. This property extends naturally to LLM-based GFMs, enabling them to solve graph-related tasks directly from textual representations of graph data. By designing appropriate prompts that align graph structures with LLMs' pretraining distributions, researchers have demonstrated that LLMs can perform node classification, link prediction, and reasoning tasks without updating any model parameters [101, 100, 222, 221, 218]. The effectiveness of zero-shot adaptation hinges on high-quality template engineering, where graphs are serialized into descriptive formats that LLMs can interpret semantically.
In-Context Learning. Beyond zero-shot reasoning, LLMs can leverage in-context learning to improve task performance by utilizing demonstrations within input prompts. This approach involves prepending labeled graph instances or structured explanations before queries, injecting task-specific knowledge without requiring model parameter updates. Several works explore this paradigm by employing basic in-context learning strategies using labeled graph instances, effectively guiding LLMs in downstream tasks [100, 222, 221].
To enhance in-context learning, researchers have developed advanced prompting techniques that incorporate structured representations of graph data. GraphICL [223] introduces a structured prompting framework that enables general-purpose LLMs to outperform specialized graph models in resource-constrained and out-of-domain tasks. The designed prompts consist of four key components: (1) task descriptions to define objectives, (2) anchor node text to contextualize relevant entities, (3) structure-aware information to provide topological insights, and (4) labeled demonstrations to facilitate few-shot learning. These elements collectively enable adaptation to node classification and link prediction tasks. GraphAgent [219] reformulates downstream graph tasks as an agent planning problem, where LLMs take structured actions based on graph topology and node content. This framework incorporates a hierarchical memory mechanism that integrates both long-term and short-term memory modules, enabling efficient handling of large-scale graph data. By storing high-quality examples dynamically, GraphAgent significantly improves LLMs' ability to generalize across graph domains. QueryRAG [184] enhances in-context learning by explicitly incorporating query nodes, contextual information from graph neighbors, and corresponding labels into structured prompts. This design ensures that graph structure serves as an inherent contextual feature, allowing LLMs to better capture relational dependencies and improve reasoning over graph data.
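The following sketch assembles an in-context prompt from the four components described for GraphICL (task description, anchor node text, structure-aware information, and labeled demonstrations); the exact wording and helper names are illustrative assumptions, not GraphICL's released templates.

```python
# Assemble a few-shot, structure-aware prompt for an LLM graph predictor.
def build_icl_prompt(task_desc, anchor_text, structure_info, demos, query_text):
    demo_block = "\n".join(
        f"Example: {text}\nLabel: {label}" for text, label in demos
    )
    return (
        f"Task: {task_desc}\n"
        f"Target node: {anchor_text}\n"
        f"Structure: {structure_info}\n"
        f"{demo_block}\n"
        f"Example: {query_text}\nLabel:"
    )

prompt = build_icl_prompt(
    task_desc="Classify the research area of each paper node.",
    anchor_text="Paper on message passing neural networks",
    structure_info="The node has 3 neighbors, all citing GNN papers.",
    demos=[("Paper on convolutional networks for images", "Computer Vision")],
    query_text="Paper on attention over graphs",
)
```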
5.4 Graph-Language Co-Training Universal GFM

Large language models have demonstrated remarkable capabilities in processing unstructured data such as natural language, exhibiting strong generalization in zero-shot and few-shot settings. Their ability to perform compositional reasoning, follow instructions, and adapt to diverse downstream tasks has made them a cornerstone of modern AGI. However, LLMs are inherently limited in their capacity to process structured data such as graphs, which encode high-order dependencies, long-range interactions, and complex relational structures that are not naturally captured by sequential token representations. Although one strategy involves tokenizing graph-structured data into textual sequences to leverage LLMs directly, this transformation often introduces substantial inductive bias mismatch.
Table 4: Summary of GNN + LLM-based universal GFMs.

| Method Name | Backbone | Pretrain | Adaptation | Feature Align | Structure Align | Task Align | Github |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| GOFA [53] | GNN + LLM | Supervised | Finetune, In-context | Data - Text Attribute | Data - Augment | Explicit - QA, Explicit - Subgraph | Link |
| GraphGPT [224] | GNN + LLM | Supervised | Finetune, In-context | Data - Text Attribute | Data - Augment | Explicit - QA | Link |
| GALLM [225] | GNN + LLM | Supervised | Finetune, In-context | Data - Text Attribute | Data - Augment | Explicit - QA | - |
| LLaGA [78] | GNN + LLM | Supervised | In-context | Data - Text Attribute | Data - Augment | Explicit - QA | Link |
| GraphCLIP [104] | GNN + LLM | Contrastive | Finetune | Data - Text Attribute | Loss - Pretrain | Explicit - QA | Link |
| GraphTranslator [103] | GNN + LLM | Generative | Finetune, In-context | Data - Text Attribute | Data - Augment | Explicit - QA | Link |
| PromptGFM [226] | GNN + LLM | Generative | Finetune, In-context | Data - Text Attribute | Data - Augment, Model - Codebook | Explicit - QA | Link |
| GraphPrompter [227] | GNN + LLM | Generative | Finetune | Data - Text Attribute | Data - Augment | Explicit - QA | Link |
| TEA-GLM [228] | GNN + LLM | Generative | Finetune, In-context | Data - Text Attribute | Data - Augment | Explicit - QA | Link |
| NT-LLM [229] | GNN + LLM | Generative | Finetune | Data - Text Attribute | Data - Augment | Explicit - QA | - |
| AskGNN [183] | GNN + LLM | Generative | Finetune, In-context | Data - Text Attribute | Data - Augment, Model - Retriever | Explicit - QA | - |
| GraphAgent [219] | GNN + LLM | Generative | Finetune, In-context | Data - Text Attribute | Data - Augment | Explicit - QA | Link |
Critical structural information, such as neighborhood topologies, connectivity patterns, and graph invariants, is frequently lost or obfuscated during linearization, leading to suboptimal reasoning performance. This challenge motivates the need for hybrid architectures that preserve graph inductive biases while enabling semantic generalization through LLMs.
Drawing inspiration from VLMs, which integrate visual embeddings into LLMs to support multimodal reasoning, recent efforts have explored hybrid Graph-Language Models that combine the representational power of GNNs with the reasoning capabilities of LLMs. These approaches aim to unify structural and semantic signals by bridging the modality gap between graphs and language. The development of such models typically involves three key steps: (1) training a GNN to capture structural information, (2) projecting graph embeddings into the token space of an LLM, and (3) leveraging the LLM for downstream inference tasks. Formally, this formulation consists of the following core components (a minimal sketch follows the list):
1. Graph Encoding: A GNN encoder is trained to extract node- or graph-level embeddings that capture both local and global structural dependencies:

$$\mathbf{H}=f_{\mathrm{GNN}}(\mathbf{X}, \mathbf{A})$$

where $\mathbf{X}$ and $\mathbf{A}$ are the node features and adjacency matrix, respectively, and $\mathbf{H}=\left\{\mathbf{h}_{1}, \ldots, \mathbf{h}_{N}\right\}$ denotes the learned node embeddings.
2. Cross-Modality Projection: The graph embeddings are mapped into the token space of the LLM via a projection function $\rho$, aligning structural information with the language model's input representation:

$$\mathbf{Z}=\rho(\mathbf{H}), \quad \mathbf{Z} \in \mathbb{R}^{N \times d}$$

where $\mathbf{Z}$ serves as the token-aligned graph representation and $d$ is the dimensionality of the LLM token space.
3. Language-Based Inference: The LLM $f_{\mathrm{LLM}}$ consumes a prompt $\mathcal{P}$ that incorporates the projected graph embeddings $\mathbf{Z}$ alongside optional task instructions and demonstration tokens:

$$\hat{y}=f_{\mathrm{LLM}}(\mathcal{P} \,\|\, \mathbf{Z})$$

where $\|$ denotes the concatenation operation and $\hat{y}$ is the model prediction for the downstream task.
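The following sketch wires the three components above together: a toy GNN encoder, a linear projection standing in for $\rho$, and concatenation of the projected graph tokens with embedded text tokens. Module names, dimensions, and the toy encoder are illustrative assumptions.

```python
# Bridge a GNN encoder into the LLM embedding space and prepend graph tokens to a prompt.
import torch
import torch.nn as nn

class MeanGNN(nn.Module):
    """Toy one-layer mean-aggregation encoder standing in for a pretrained GNN."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        return torch.relu(self.lin(adj @ x / deg))

class GraphToTokenBridge(nn.Module):
    def __init__(self, gnn, gnn_dim, llm_dim):
        super().__init__()
        self.gnn = gnn                                  # structural encoder
        self.proj = nn.Linear(gnn_dim, llm_dim)         # cross-modality projection (rho)

    def forward(self, x, adj, text_token_emb):
        h = self.gnn(x, adj)                            # H = GNN(X, A)
        z = self.proj(h)                                # Z = rho(H), token-aligned
        return torch.cat([z, text_token_emb], dim=0)    # prepend graph tokens to the prompt

# toy usage: 4 nodes with 8-dim features, a 10-token text prompt embedded in 32 dims
bridge = GraphToTokenBridge(MeanGNN(8, 16), gnn_dim=16, llm_dim=32)
inputs_embeds = bridge(torch.randn(4, 8), torch.eye(4), torch.randn(10, 32))
# `inputs_embeds` would then be fed to the LLM (e.g., through an inputs_embeds interface)
```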
5.4.1 Model Unification
Recent approaches integrating GNNs with LLMs generally adopt a framework akin to visual-language models, wherein a graph encoder extracts structural features, and an LLM performs reasoning over the extracted representations. Various techniques have been proposed to bridge the gap between graph embeddings and token-based language models, improving alignment, interpretability, and reasoning over graph-structured data.
Several methods focus on encoding graph structures into tokenized representations that align with LLM processing. LLaGA [78] restructures nodes into sequential representations using two distinct templates: Neighborhood Detail, which captures local connectivity, and Hop-Field Overview, which encodes broader structural context. A learned projection module aligns these structured representations with LLM token embeddings. GraphAgent [230] introduces Graph-Token Grounding, mapping nodes and edges into structured Python objects, while Graph Tokenization embeds discrete graph entities for generative and predictive tasks. The model employs an intent recognition module to dynamically adjust system prompts based on user queries. GraphTranslator [103] facilitates interaction between GNNs and LLMs through a translation module that converts graph embeddings into textual representations, with a producer module generating alignment data to ensure consistency between graph structures and natural language descriptions. NT-LLM [229] employs Node Tokenization, selecting anchor nodes to efficiently encode graph structures while preserving topological integrity; this enhances relational representation and reasoning capabilities within LLMs. TEA-GLM [228] refines prompt construction to generalize across diverse graph tasks, structuring prompts into three components: (1) graph information, (2) task descriptions, and (3) multiple-choice answers for cross-dataset reasoning. A dedicated projection module maps graph tokens into instruction-tuned LLMs, ensuring alignment across different tasks.
5.4.2 Domain Alignment in Model Training
Once unified graph-text representations are obtained, domain alignment during model training plays a crucial role in enhancing reasoning capabilities and generalization across diverse graph tasks.
Supervised Alignment. Supervised fine-tuning remains the most prevalent strategy for aligning GNN and LLM components. Typically, graph encoders are appended to LLMs, and the combined architecture is trained end-to-end on labeled datasets [78, 225, 229, 227, 103]. This allows the model to learn tight coupling between structural features and textual reasoning, adapting to specific downstream applications.
Self-Supervised Alignment. To reduce reliance on labeled data, several methods adopt self-supervised learning paradigms. GALLM [225] introduces a text matching objective using both manual and learnable soft category prompts, optimized via backpropagation. The model follows a two-stage pipeline consisting of SSL-based pretraining followed by supervised fine-tuning. GOFA [53] formulates graph understanding as sentence completion and reasoning, unifying diverse tasks such as structural analysis and question answering under a shared prompt space. LangGFM [99] adapts traditional GNN pretraining ideas into LLM-friendly formats, proposing Topology Autoencoding and Feature Masked Autoencoding to enhance structural reasoning via textualized inputs.
CLIP-like Alignment. Inspired by CLIP [60], GraphCLIP [104] applies contrastive learning to align graph representations with textual summaries. Large-scale graph-summary pairs are generated with the help of LLMs and used to train a contrastive encoder-decoder system. This pretraining strategy facilitates zero-shot and few-shot transfer, enabling robust generalization across domains and graph types.
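A minimal sketch of the CLIP-style objective: graph and text embeddings of matched pairs are aligned with a symmetric InfoNCE loss. This plain version is an illustrative simplification, not GraphCLIP's exact training objective.

```python
# Symmetric contrastive alignment between graph and text embeddings of matched pairs.
import torch
import torch.nn.functional as F

def clip_style_loss(graph_emb, text_emb, temperature=0.07):
    """graph_emb, text_emb: [batch, dim]; row i of each tensor is a matched pair."""
    g = F.normalize(graph_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = g @ t.T / temperature                      # pairwise similarities
    targets = torch.arange(g.size(0), device=g.device)  # matched pairs lie on the diagonal
    # symmetric cross-entropy: graph-to-text and text-to-graph
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

loss = clip_style_loss(torch.randn(8, 64), torch.randn(8, 64))
```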
5.4.3 Downstream Task Adaptation
Zero-Shot Reasoning. LLMs inherently support zero-shot reasoning by leveraging their pretrained knowledge and prompt-based adaptation. When coupled with graph representations, this capability enables effective inference on graph tasks such as node classification and link prediction without any additional fine-tuning [78, 53, 230, 227]. Zero-shot adaptation is particularly valuable in low-resource settings, where labeled graph data is scarce or unavailable.
In-Context Learning (ICL). Several methods leverage ICL to improve graph reasoning by embedding task demonstrations directly into prompts. AskGNN [183] enhances classification accuracy through ICL Example Purification, a two-stage method that selects informative support examples via LLM-based scoring and filters class-imbalanced samples. This improves both the representativeness and balance of in-context examples. RAG-based methods such as RAGraph [206] retrieve graph-relevant contexts during inference, dynamically enriching prompts and improving generalization to previously unseen graph structures.
Interpretability. One of the major advantages of using LLMs for graph reasoning is their ability to produce human-readable explanations. GraphTranslator [103] showcases this through multi-turn dialog reasoning over user activity graphs, supporting interpretability across behavioral analytics, social network dynamics, and recommendation scenarios. By combining natural language outputs with structured representations, such models enhance transparency and trustworthiness in graph-based AI systems.
5.5 Discussion
Universal GFMs aim to generalize across graph domains and tasks by leveraging large-scale pretraining and adaptable architectures. Each type exhibits unique strengths and limitations, reflecting inherent trade-offs in model expressiveness, scalability, generalization, and interpretability. In summary, GNN-based models are efficient and structure-aware but semantically limited; LLM-based models are flexible and language-driven but struggle with graph topology; hybrid models aim to unify the best of both worlds but introduce significant system complexity.
Graph Model-Based GFMs. GNN-based universal models are structurally aligned with graph data and excel at capturing topological information through message passing and local neighborhood aggregation. These models are highly effective for tasks where structure is critical. Advantages: (1) Strong inductive biases tailored to graphs, supporting generalization across graph domains. (2) Computationally efficient for large-scale graphs due to localized computations. (3) Well-suited for structural tasks involving graph motifs. Limitations: (1) Limited ability to encode rich semantic information (e.g., textual or multimodal node attributes). (2) Difficulty scaling to out-of-domain tasks without significant architecture or prompt redesign. (3) Struggles with long-range dependencies due to local message passing constraints.
Language Model-Based GFMs. LLM-based approaches [231] leverage pretrained language models to reason over graph-structured data via textual or structured prompts. These methods translate graphs into sequences, enabling zero-shot or in-context reasoning across heterogeneous tasks. Advantages: (1) Exceptional generalization across tasks and domains via instruction following and prompt-based adaptation. (2) Flexibility to support multimodal data (e.g., text and images). (3) High interpretability due to natural language outputs and reasoning chains. Limitations: (1) Loss of structural fidelity when graph topology is linearized into token sequences. (2) Heavy computational cost due to large model sizes and input serialization overhead. (3) Lack of built-in graph inductive biases, requiring extensive template engineering for structural tasks.
Graph-Language Co-Training GFMs. Hybrid models integrate GNNs and LLMs to unify the structural reasoning of GNNs with the semantic and general-purpose reasoning capabilities of LLMs. Examples include LLaGA [78], GraphTranslator [103], and GraphCLIP [104]. These models project graph embeddings into the LLM token space, enabling multimodal and cross-domain inference. Advantages: (1) Combines structural awareness and semantic richness, improving task transferability and expressiveness. (2) Supports both structured and unstructured inputs, making it ideal for real-world applications (e.g., recommendation, knowledge graphs). (3) Enables flexible adaptation through prompting, fine-tuning, or contrastive alignment techniques. Limitations: (1) Complex model architecture with high computational and memory demands. (2) Requires careful alignment between graph embeddings and token representations. (3) Integration of supervision signals across modalities remains a challenging research problem.
6 Task-Specific Graph Foundation Models
6.1 Design Principle
Task-specific GFMs are designed to operate across multiple domains while focusing on solving a single task (e.g., node classification, link prediction, graph generation). Although graphs from different domains may share inductive biases relevant to a particular downstream task, their structural properties can vary significantly. Therefore, an effective task-specific GFM must not only capture task-aware invariances across domains but also align disparate graph distributions to ensure robust generalization. This section outlines the core characteristics and design principles essential for developing task-specific GFMs.
Task-Specific Inductive Bias. Different graph tasks impose distinct inductive biases on the learning process. For instance, node classification often relies on homophily and heterophily principles, link prediction emphasizes local and global connectivity patterns, while graph classification focuses on recognizing meaningful substructures. Despite structural variations across graphs, task-related representations tend to exhibit shared patterns. By leveraging this observation, a well-designed GFM can learn domain-invariant representations while preserving essential task-specific knowledge, ensuring adaptability across diverse datasets.
Cross-Domain Alignment. Graphs from different domains differ in their structural properties, node types, feature distributions, and connectivity patterns. To ensure generalization to unseen graph distributions with minimal retraining, the model must learn representations that remain robust across domains. Achieving this requires techniques such as domain-adaptive architectures, where GNN layers dynamically adjust based on input domain characteristics, or domain-specific feature modulation through attention mechanisms and parameter-efficient adaptation. Such strategies help mitigate performance degradation when transitioning between domains.
Balancing Domain Generalization and Task-Specific Adaptation. A fundamental challenge in task-specific GFMs is balancing domain invariance with task-specific adaptation. Overemphasizing domain understanding may introduce inductive biases that favor specific domains at the expense of task-specific generalization, whereas fully generalized models risk overlooking critical domain-specific patterns, thereby reducing performance. To strike this balance, domain generalization techniques such as domain regularization, normalization, and adversarial training can be employed. These approaches help prevent domain overfitting and catastrophic forgetting, enabling models to maintain both cross-domain adaptability and high task-specific performance.
In the following subsections, we systematically explore the design principles of GFMs across six fundamental graph-related tasks: node classification, link prediction, graph classification, question answering, anomaly detection, and recommendation. For each task, we examine the underlying design philosophies, key challenges, state-of-the-art methodologies, and promising directions for future research.
6.2 Node-level Task
Node-level tasks focus on predicting the properties or roles of individual nodes within a graph. Among them, node classification is the most common downstream task and has gained significant attention in GFMs [250, 251]; it focuses on inferring node labels [23, 22, 252, 253]. Formally, given a graph $\mathcal{G}=\{\mathcal{V}, \mathcal{E}\}$ with node features $\mathcal{X}$, the node classification task aims to learn a function $f: \mathcal{V} \rightarrow \mathcal{Y}$ that maps each node to a label $\hat{y}_{i} \in \mathcal{Y}$ so as to minimize the discrepancy between $\hat{y}_{i}$ and the ground truth $y_{i}$. Conventional GNNs employ message-passing frameworks that aggregate feature information from neighboring nodes into the center node, and further utilize a classifier, such as an MLP, to predict node labels based on the learned representations [254]. However, GFMs for node-level tasks still face the following challenges: (i) graph heterogeneity, (ii) cross-domain alignment, and (iii) balancing domain generalization against task-specific adaptation. In this section, we discuss existing methods for dealing with these challenges.
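A minimal sketch of this setup, with a one-layer mean-aggregation encoder and an MLP-style classifier trained to minimize the discrepancy between $\hat{y}_{i}$ and $y_{i}$; the architecture and names are illustrative.

```python
# Supervised node classification: f maps each node to a class label.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NodeClassifier(nn.Module):
    def __init__(self, in_dim, hid_dim, num_classes):
        super().__init__()
        self.lin = nn.Linear(in_dim, hid_dim)        # message-passing transform
        self.cls = nn.Linear(hid_dim, num_classes)   # MLP-style classifier head

    def forward(self, x, adj):
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        h = torch.relu(self.lin(adj @ x / deg))      # aggregate neighbor features
        return self.cls(h)                           # per-node class logits

x, adj = torch.randn(6, 8), torch.eye(6)
y = torch.randint(0, 3, (6,))                        # ground-truth labels y_i
model = NodeClassifier(8, 16, 3)
loss = F.cross_entropy(model(x, adj), y)             # discrepancy between predictions and y
loss.backward()
```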
Table 5: Summary of task-specific GFMs on node-level tasks.
| Method Name | Task | Backbone | Pretrain | Adaptation | Feature Align | Structure Align | Task Align | Github |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| ENGINE [232] | Node-Level | GNN | Supervised | Finetune | Data - Text Attribute | Loss - Pretrain | N/A | Link |
| LLM-GNN [233] | Node-Level | GNN | Supervised | Finetune | N/A | Data - Augment | N/A | Link |
| GCNMF [234] | Node-Level | GNN | Supervised | Finetune | Data - Others | N/A | N/A | - |
| PCFI [235] | Node-Level | GNN | Supervised | Finetune | Data - Others | N/A | N/A | Link |
| GRAFENNE [236] | Node-Level | GNN | Supervised | Finetune | Data - Others | N/A | N/A | - |
| GraphAny [66] | Node-Level | GNN | Supervised | N/A | Model - Projection | N/A | Implicit - Regularizer | Link |
| E-LLaGNN [237] | Node-Level | GNN | Supervised | Finetune, Adaptation | Data - Text Attribute | Model - Retriever | N/A | - |
| GraphFM [238] | Node-Level | GNN | Supervised | Finetune | Model - Projection | Model - Structure Learning | N/A | - |
| TPP [239] | Node-Level | GNN | Supervised | Test-time Adaptation | N/A | Model - Prompt Learning | N/A | Link |
| SimMLP [158] | Node-Level | GNN | Contrastive | Finetune | N/A | Loss - Pretrain | N/A | Link |
| FUG | Node-Level | GNN | Contrastive | Finetune | Data - SVD | N/A | N/A | Link |
| GraphControl [240] | Node-Level | GNN | Contrastive | Finetune | Model - Projection | Model - Prompt Learning | N/A | Link |
| ZeroG [24] | Node-Level | GNN | Contrastive | Finetune, Prototype | Data - Text Attribute | Loss - Pretrain | Explicit - Subgraph | Link |
| GraphLoRA [154] | Node-Level | GNN | Contrastive | Finetune | Model - Projection | Model - MoE, Model - Structure Learning | Implicit - Regularizer | Link |
| MDGFM [241] | Node-Level | GNN | Contrastive | Graph Prompting | Model - Projection | Model - Prompt Learning | Explicit - Link | - |
| GPT-GNN [114] | Node-Level | GNN | Generative | Finetune | N/A | Loss - Pretrain | N/A | Link |
| GSPT [242] | Node-Level | GNN | Generative | Finetune, Prototype | Data - Text Attribute | Loss - Multi-task | N/A | - |
| GPPT [74] | Node-Level | GNN | Generative | Graph Prompting | N/A | Loss - Pretrain | Explicit - Link | Link |
| GCOPE [25] | Node-Level | GNN | Hybrid | Graph Prompting | Model - Projection | Loss - Pretrain | N/A | Link |
| GDL4LLM [243] | Node-Level | LLM | Generative | Finetune | Data - Node Property | N/A | N/A | - |
| GraphText [231] | Node-Level | LLM | Generative | Test-time Adaptation | Data - Text Attribute | Data - Augment | Explicit - QA | Link |
| LOGIN [244] | Node-Level | GNN + LLM | Supervised | Finetune | Data - Text Attribute | Data - Augment | Explicit - QA | Link |
| LangGSL [245] | Node-Level | GNN + LLM | Supervised | Finetune | Data - Text Attribute | Model - Structure Learning | N/A | - |
| Cella [246] | Node-Level | GNN + LLM | Supervised | Test-time Adaptation | Data - Text Attribute | Model - Structure Learning | N/A | - |
| G2P2 [247] | Node-Level | GNN + LLM | Contrastive | Graph Prompting | Data - Text Attribute | Loss - Pretrain | N/A | - |
| LangTopo [248] | Node-Level | GNN + LLM | Generative | Finetune | Data - Text Attribute | Data - Augment | N/A | - |
| Dr.E [249] | Node-Level | GNN + LLM | Generative | Finetune | Data - Text Attribute | Model - Codebook | Explicit - QA | Link |
6.2.1 Handling Graph Heterogeneity
Heterogeneous Feature Space. To handle the heterogeneous feature space challenge, existing works [141, 254, 255] widely adopt MLPs to map features into a shared latent space, although these approaches may become ineffective in more general cases, such as missing attribute features or dynamic attribute feature spaces. In light of this, earlier works [234, 235] focus on missing feature imputation in graph representation learning, where some node features are absent or their distribution differs from the rest. GCNmf [234] utilizes the Gaussian Mixture Model (GMM) [256] to model missing features; in particular, it computes the expected activation of neurons in the first GNN layer to handle the missing feature issue while learning graph representations simultaneously. Another work, PCFI [235], advances GCNmf in more challenging scenarios, i.e., high rates of missing features, by introducing pseudo-confidence, a channel-wise shortest path distance between missing feature nodes and the nearest known feature nodes. In detail, PCFI proposes a feature imputation scheme that performs channel-wise inter-node diffusion to recover the missing features and node-wise inter-channel propagation to refine the node features. Besides missing feature imputation, several studies [236, 66, 257] have explored the dynamic feature set issue in graphs. For example, GRAFENNE [236] implements an allotropic transformation on graphs, decoupling nodes and attribute features via bipartite encoding. GraphAny [66] computes fully-inductive features based on interactions between graph kernels and achieves inductive generalization across heterogeneous feature spaces. Additionally, GRAFENNE introduces a bipartite message-passing framework for these allotropically transformed graphs, allowing the model parameter size to remain independent of feature dimensions. This approach alleviates the heterogeneous and diverse feature dimension issues in graphs, making the model adaptable to unseen nodes and features. Moreover, FUG [257] introduces a feature-universal contrastive pre-training strategy that avoids the need for model rebuilding and data reshaping when handling feature heterogeneity. Specifically, it designs an encoder with contrastive constraints to emulate the Principal Component Analysis [258] generation of the basis transformation matrix, which is utilized to adapt features in various spaces.
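A minimal sketch of two simple ways to place heterogeneous feature spaces into a shared latent space: per-dataset linear projections (the common MLP-style approach mentioned above) and an SVD-based basis reduction. The dataset names, dimensions, and function names are illustrative assumptions, not the procedure of FUG or any specific method.

```python
# Align raw feature matrices with different dimensionalities into one shared space.
import torch
import torch.nn as nn

shared_dim = 32
datasets = {"cora_like": 1433, "product_like": 100}   # differing raw feature dims
projections = nn.ModuleDict({
    name: nn.Linear(dim, shared_dim) for name, dim in datasets.items()
})

x_a = torch.randn(50, 1433)
z_a = projections["cora_like"](x_a)                   # [50, 32] in the shared space

def svd_align(x, k=shared_dim):
    """Keep node coordinates in the top-k principal directions of the feature matrix."""
    u, s, _ = torch.linalg.svd(x, full_matrices=False)
    return u[:, :k] * s[:k]

z_b = svd_align(torch.randn(40, 100))                 # [40, 32]
```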
Structure Heterogeneity. Besides the heterogeneous feature space issue, structure heterogeneity is another challenge in building GFMs for downstream node classification tasks [259, 253]. GPT-GNN [114], one of the earlier works, proposes a generative pre-training strategy for GNNs, factorized into node attribute and edge generation steps, to capture the inherent dependency between node attributes and graph structure on unlabeled graph data. Building on the pre-training concept, GPPT [74] introduces a prompt mechanism in the fine-tuning stage, which modifies the input nodes into token pairs that are directly applied to pre-trained GNNs. Considering the deployment of GNNs in latency-sensitive applications, SimMLP [158] introduces a simple yet efficient self-supervised framework that learns multilayer perceptrons (MLPs) on graphs by aligning the representations encoded by graph context-aware GNNs and neighborhood-dependency-free MLPs, thereby fully integrating structural information into MLPs.
6.2.2 Cross-domain Alignment
GNN-based Models. The aforementioned methods primarily pre-train and fine-tune GNNs on a single graph dataset, which makes them non-trivial to adapt to cross-domain scenarios where graphs differ in node features and structure types. To handle this, one line of approaches [114, 74] employs pure GNNs for GFM development. GCOPE [25] designs an "All in One and One for All" framework that trains a single model on diverse datasets to handle versatile downstream tasks across a variety of domains. Meanwhile, TPP [239] introduces Graph Class-Incremental Learning (GCIL), which employs Laplacian smoothing to generate task-specific prototypical embeddings for node classification tasks on various graphs. TPP theoretically shows that the task-specific prototypes of the same graph task become nearly identical given a large smoothing step, while those of different tasks remain distinct due to differences in graph structure and node attributes. Moreover, TPP adopts a graph prompting approach for GCIL that learns a small discriminative graph prompt for each task, essentially yielding a separate classification model per graph task and thereby ensuring the trained GCIL model is both replay-free and forget-free. GraphControl [240] leverages universal structural pre-trained models to align the input space across various graphs and incorporates unique characteristics of the target data as conditional inputs; these conditions are integrated into the model during fine-tuning or prompt tuning through ControlNet [260], facilitating various downstream node classification tasks. GraphLoRA [154] is another "pre-train, fine-tune" framework that utilizes a Structure-aware Maximum Mean Discrepancy [261] to align divergent node feature distributions across source and target graphs (a plain MMD variant is sketched below). Besides the pre-trained GNN, GraphLoRA injects another small GNN in the fine-tuning stage to effectively bridge structural distribution gaps while mitigating catastrophic forgetting. Similarly, MDGFM [241] applies token operators for semantic alignment between graphs and further refines each source domain using graph structure learning (GSL), integrating both feature and topology information in the pre-training stage. Moreover, GraphAny [66] attempts a fully-inductive setup with respect to new structures, features, and label spaces, comprising two components: LinearGNNs and an inductive attention module. The LinearGNNs enable efficient inductive inference on unseen graphs, while the inductive attention module learns to adaptively aggregate predictions from multiple LinearGNNs.
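The distribution-alignment term can be illustrated with a plain RBF-kernel MMD between source- and target-graph node representations; note that GraphLoRA's structure-aware variant differs from this simplified sketch.

```python
import torch

def rbf_mmd(x_src: torch.Tensor, x_tgt: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased squared-MMD estimate with an RBF kernel between two sets of node embeddings."""
    def kernel(a, b):
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-d2 / (2 * sigma ** 2))
    return (kernel(x_src, x_src).mean()
            + kernel(x_tgt, x_tgt).mean()
            - 2 * kernel(x_src, x_tgt).mean())

# Used as an auxiliary term during target-graph fine-tuning, e.g.
# loss = task_loss + lambda_align * rbf_mmd(h_source, h_target)
```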
Graph Transformers. Another line of approaches directly employs Transformers for node representation learning across domains. GSPT [242] is a feature-centric pre-training framework for text-attributed graphs (TAGs) that leverages a standard Transformer [55] to learn a unified model for node representations. GraphFM [238] extends GSPT to multi-domain scenarios via a Perceiver-based encoder [262] that compresses domain-specific features into a common latent space. GDL4LLM [243] treats graphs as a new language: it translates graphs into a graph-language corpus and pre-trains transformer-based LLMs on this corpus to enable understanding of graph structures. The LLM is then fine-tuned with a next-token-prediction objective for downstream node classification tasks.
Hybrid Methods. Taking advantage of the exceptional natural language understanding capabilities of LLMs [263, 255], several works [247] also concentrate on hybrid methods that leverage both GNNs and LLMs to develop GFMs. Hybrid methods often operate on Text-Attributed Graph (TAG) datasets [264, 265] and can be further divided into two groups based on the type of classifier used for downstream tasks, i.e., GNNs or LLMs. G2P2 [247] utilizes a GNN as the predictor for text classification and jointly pre-trains text and graph encoders via three graph interaction-based contrastive strategies: (i) text-node interactions, (ii) text-summary interactions, and (iii) node-summary interactions. Another work, ENGINE [232], provides a parameter- and memory-efficient fine-tuning method for textual graphs that couples a GNN with an LLM via a side structure. Moreover, LLM-GNN [233] introduces a label-free node classification pipeline, i.e., "LLMs-as-annotators", which trains GNNs on a small fraction of nodes annotated by LLMs and then classifies the remaining nodes. Following the "LLMs-as-annotators" concept, several approaches [244, 246] have
Table 6: Summary of task-specific GFMs on link-level tasks.

| Method Name | Tasks | Backbone | Pretrain | Adaptation | Feature Align | Structure Align | Task Align | Github |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| MOTIF [271] | Link-Level | GNN | Supervised | Finetune | Data - Text Attribute | Model - Codebook | Explicit - Link | - |
| ULTRA [28] | Link-Level | GNN | Supervised | Finetune | Data - Text Attribute | Model - Codebook | Explicit - Link | Link |
| KG-ICL [272] | Link-Level | GNN | Supervised | Finetune | Data - Text Attribute | Model - Codebook | Explicit - Link | Link |
| UltraQuery [29] | Link-Level | GNN | Supervised | Finetune | Data - Text Attribute | Model - Codebook | Explicit - Link | Link |
| Cross-GCN [273] | Link-Level | GNN | Supervised | Finetune | N/A | Loss - Auxiliary | N/A | - |
| UniLP [274] | Link-Level | GNN | Supervised | Finetune | N/A | N/A | Explicit - Subgraph | - |
| GITL [275] | Link-Level | GNN | Supervised | Distillation | N/A | Loss - Auxiliary | N/A | - |
| CNHHEC [276] | Link-Level | GNN | Supervised | Test-time Adaptation | N/A | N/A | Implicit - Regularizer | - |
| GraphFormers [277] | Link-Level | GNN | Contrastive | Finetune, Prototype | Data - Text Attribute | N/A | N/A | Link |
| ISDEA+ [278] | Link-Level | GNN | Contrastive | Finetune | Data - Text Attribute | Model - Codebook | Explicit - Link | - |
| MTDEA [279] | Link-Level | GNN | Contrastive | Finetune | N/A | Model - Codebook | Explicit - Link | - |
| Edgeformers [280] | Link-Level | GNN | Hybrid | Finetune | Data - Text Attribute | N/A | N/A | Link |
been introduced to advance LLM-GNN to some extent. LOGIN [244] develops an "LLMs-as-Consultants" paradigm that consults LLMs only on low-confidence prediction nodes and augments the original graphs based on the LLM feedback. Similarly, Cella [246] employs an active node selection process to sift out representative nodes based on label inharmonicity and entropy; it then annotates these representative nodes via LLMs and applies a Dirichlet energy-based graph rewiring strategy [266] to minimize the adverse effects of noisy or missing links in the original graphs. In addition to annotation, some studies leverage LLMs to enhance node representations in GFMs. EhLLaGNN [237] samples high-quality neighborhoods through LLMs, followed by on-demand neighborhood feature enhancement using diverse prompts from its prompt catalog; the enhanced neighborhood features are then aggregated with the central node to generate representative node embeddings for downstream node-level tasks.
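A minimal sketch of the basic "LLMs-as-annotators" recipe follows, assuming a hypothetical `llm_annotate` callable and a generic GNN classifier; real systems use confidence- and diversity-aware node selection rather than the random sampling shown here.

```python
import torch
import torch.nn.functional as F

def train_with_llm_labels(gnn, x, edge_index, node_texts, budget=100, epochs=50, lr=1e-2):
    """Query an LLM for labels on a small node budget, then train a GNN on the pseudo-labels."""
    idx = torch.randperm(x.size(0))[:budget]                      # simple random budget selection
    y_pseudo = llm_annotate([node_texts[int(i)] for i in idx])    # hypothetical LLM annotator -> list[int]
    y_pseudo = torch.tensor(y_pseudo)
    opt = torch.optim.Adam(gnn.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        logits = gnn(x, edge_index)                               # any node classifier GNN
        F.cross_entropy(logits[idx], y_pseudo).backward()
        opt.step()
    return gnn
```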
Hybrid Approaches. Another group of works directly uses LLMs as classifiers for downstream node-level tasks. GraphText [231] advocates a framework that enables training-free graph reasoning in text space by translating graphs into natural language. It incorporates the inductive bias of GNNs by constructing a graph-syntax tree and then processing natural language sequences derived from traversing that tree, performing node prediction and reasoning through LLMs. Meanwhile, ZeroG [24] proposes a zero-shot learning framework that first leverages LLMs to encode text graphs into a uniform feature space and then employs a LoRA strategy to train a small language model over sampled subgraph data based on prompts.
As demonstrated in LLMNodeBed [263], LLMs have superior domain generalization ability, while GNNs achieve better task-specific adaptation by modeling structural information. To combine the complementary strengths of LLMs and GNNs for GFMs, several works attempt to align GNN and LLM node representations. Inspired by vector quantization [267], LangTopo [248] proposes a framework that aligns language descriptions of graphs with tokenized topological modeling, enabling LLMs to learn graph structures. Another work, Dr.E [249], introduces a dual-residual vector quantized variational autoencoder that aligns LLMs with graph data in natural language. GSLM [245] designs a co-training pipeline that trains language models and GNNs iteratively: after filtering out noisy information from raw node texts via LLMs, it alternately optimizes an LLM and a GNN, where the LLM generates graph structures, embeddings, and pseudo labels based on the cleaned text attributes, and the GNN in turn refines the graph structure and provides updated pseudo labels back to the LLM.
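A toy illustration of the vector-quantization idea, snapping continuous GNN embeddings onto a discrete codebook so that structure can be expressed as tokens an LLM can consume; this is a generic VQ step, not the exact LangTopo or Dr.E model.

```python
import torch
import torch.nn as nn

class GraphTokenQuantizer(nn.Module):
    """Map GNN node embeddings to nearest codebook entries ("graph tokens")."""
    def __init__(self, num_codes: int = 512, dim: int = 256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z: torch.Tensor):
        d = torch.cdist(z, self.codebook.weight)   # (N, num_codes) distances
        ids = d.argmin(dim=-1)                     # discrete token ids per node
        z_q = self.codebook(ids)
        z_q = z + (z_q - z).detach()               # straight-through estimator for gradients
        return z_q, ids
```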
6.2.4 Future Directions
Although node-level tasks have received significant attention and numerous studies address them, several directions remain under-explored and can be regarded as promising future work. Key areas include graph pattern modeling beyond message-passing frameworks [86], robustness [268], and the explainability of GNNs [269, 270].
6.3 Link-Level Task
In graph learning, link-level tasks focus on understanding and predicting the relationships (or links) between nodes in a graph. These tasks are crucial for analyzing the structure and dynamics of networks such as social networks, biological networks, and recommender systems, and include link prediction, edge classification, network completion, and more. GFMs targeting link-level tasks must generalize and transfer knowledge across different domains and datasets, which raises several challenges. For instance, heterogeneous relationships pose a significant issue, as links can carry a variety of semantics, such as "friendship" versus "transaction" in social networks. During inference, GFMs have to learn transferable representations that enable inference on any graph with arbitrary entity and relation vocabularies. Additionally, temporal dynamics must be considered, as links in dynamic graphs, such as financial networks, necessitate modeling time-sensitive patterns.
6.3.1 Inductive Reasoning Approaches
A large number of graph foundation models designed for link-level tasks focus on the setting where novel nodes and relation types appear at test time.
Knowledge Graph-based Methods. Ultra [28], an approach for learning universal and transferable graph representations, leverages a transformation $\mathcal{G}_{r}=\operatorname{LIFT}(\mathcal{G})$ that converts the original graph $\mathcal{G}=(\mathcal{V}, \mathcal{R}, \mathcal{E})$ into a graph of relations $\mathcal{G}_{r}=\left(\mathcal{V}_{l}, \mathcal{R}_{l}, \mathcal{E}_{l}\right)$. Given a query $(h, q, ?)$, the $d$-dimensional node representations $\mathbf{X} \in \mathbb{R}^{|\mathcal{R}| \times d}$ of $\mathcal{G}_{r}$ can then be obtained. Ultra exploits the invariance of the relational structure and employs relative relation representations, rather than relative entity representations, to parameterize any unseen relation. Furthermore, UltraQuery [29], another graph foundation model, focuses on the inductive zero-shot complex logical query answering (CLQA) problem; its novel inductive relation projection design and a differentiable yet non-parametric fuzzy logical operator make UltraQuery vocabulary-independent and generalizable to new entities and relations. A simplified version of the relation-graph construction is sketched below.
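In this simplified sketch, relations become nodes and two relations are connected whenever they co-occur on a shared entity; ULTRA additionally distinguishes four interaction types (e.g., head-to-head vs. head-to-tail), which this toy version omits.

```python
from collections import defaultdict
from itertools import combinations

def lift_to_relation_graph(triples):
    """Build a graph whose nodes are relation types; edges link relations sharing an entity."""
    by_entity = defaultdict(set)
    for h, r, t in triples:
        by_entity[h].add(r)
        by_entity[t].add(r)
    edges = set()
    for rels in by_entity.values():
        for r1, r2 in combinations(sorted(rels), 2):
            edges.add((r1, r2))
    nodes = {r for _, r, _ in triples}
    return nodes, edges

nodes, edges = lift_to_relation_graph([("a", "born_in", "b"),
                                       ("b", "capital_of", "c"),
                                       ("a", "works_for", "d")])
```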
Double Equivariance-based Methods. Through the lens of double equivariance (over nodes and relation types), [278, 279] address the same problem. Specifically, given a training graph $\mathbf{A}^{tr}$ with node set $\mathcal{V}^{tr}$ and relation set $\mathcal{R}^{tr}$, they aim to learn a model capable of accurately predicting missing triplets in a test graph $\mathbf{A}^{te}$ with node set $\mathcal{V}^{te}$ and relation set $\mathcal{R}^{te}$, involving both new nodes and new relation types: $\mathcal{V}^{tr} \nsubseteq \mathcal{V}^{te}$ and $\mathcal{R}^{tr} \nsubseteq \mathcal{R}^{te}$. MTDEA [279] learns to partition the set of relations into distinct clusters, where each cluster exclusively contains relation types that are exchangeable among themselves, which can be viewed as a multi-task setting. Accordingly, it trains multiple graph models, one per cluster, so that at test time an adaptation procedure assigns new relation types to the most appropriate cluster, ensuring generalization to previously unseen relation types.
Theoretical Analysis. Recently, the authors further demonstrated the generality of double equivariant structural representations in a theoretical framework [278]. They found that Ultra [28] and InGram [281] conform to this framework despite their diverse architectural designs. Beyond that, they proposed a more robust and stable variant of InGram [281], named DEq-InGram, and a modeling framework called ISDEA+, which can transform any GNN designed for homogeneous graphs into a double equivariant model suitable for knowledge graphs. Concurrently, [271] conducted a rigorous study of the expressive power of knowledge graph foundation models and found that the expressive power depends on the motifs used to learn the relation representations. Given a KG $\mathcal{G}=(\mathcal{V}, \mathcal{R}, \mathcal{E})$, they summarize such frameworks in three steps: use a set $\mathcal{F}$ of motifs to compute a relational hypergraph $\operatorname{LIFT}_{\mathcal{F}}(\mathcal{G})$, apply a relation encoder on the hypergraph $\operatorname{LIFT}_{\mathcal{F}}(\mathcal{G})$ to obtain relation representations, and then apply an entity encoder on the KG $\mathcal{G}$ with these relation representations to obtain the final link encodings. Additionally, they designed richer motifs than the binary motifs in existing works to improve the foundation model's performance.
6.3.2 In-Context Learning Approaches
KG-ICL [272], on the other hand, uses graph prompt learning and in-context learning techniques. It extracts a prompt graph $\mathcal{P}_{c}=\left(\mathcal{E}_{pmt} \subseteq \mathcal{E}, \mathcal{R}_{pmt} \subseteq \mathcal{R}, \mathcal{T}_{pmt} \subseteq \mathcal{T}\right)$ from an example fact about the query $(u, q, v)$ and the KG $\mathcal{G}=(\mathcal{E}, \mathcal{R}, \mathcal{T})$, and employs a unified tokenizer to map entities and relations in prompt graphs to predefined tokens for further training the foundation knowledge graph model. Meanwhile, UniLP [274] uses in-context learning to combine the generalizability of heuristic approaches with the pattern-learning capabilities of parametric models. The resulting universal link predictor can autonomously identify connectivity patterns across diverse graphs and be applied immediately to any unseen graph dataset without targeted training.
Table 7: Summary of task-specific GFMs on graph-level tasks.

| Method Name | Tasks | Backbone | Pretrain | Adaptation | Feature Align | Structure Align | Task Align | Github |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| AAGOD [282] | Graph-Level | GNN | Supervised | Finetune | N/A | Data - Augment | N/A | Link |
| G-Adapter [283] | Graph-Level | GNN | Supervised | Finetune | N/A | Loss - Auxiliary | N/A | - |
| GTOT-Tuning [284] | Graph-Level | GNN | Supervised | Finetune | N/A | Loss - Auxiliary | N/A | - |
| CDFSGC [285] | Graph-Level | GNN | Contrastive | Finetune, Prototype | Data - Node Property | Loss - Multi-task | N/A | - |
| L2P-GNN [286] | Graph-Level | GNN | Contrastive | Finetune | N/A | Model - Meta Learning | N/A | Link |
| GROVER [287] | Graph-Level | GNN | Generative | Finetune | N/A | Loss - Pretrain | N/A | - |
| G-TUNING [288] | Graph-Level | GNN | Generative | Finetune | N/A | Model - Structure Learning | Implicit - Codebook | - |
| Mole-BERT [73] | Graph-Level | GNN | Hybrid | Finetune | N/A | Loss - Pretrain, Model - Codebook | N/A | Link |
| AdapterGNN [153] | Graph-Level | GNN | Hybrid | Finetune | N/A | Loss - Auxiliary | N/A | Link |
| Feature-Struct [289] | Graph-Level | GNN | Hybrid | Finetune | Model - Projection | N/A | N/A | - |
| GPF [155] | Graph-Level | GNN | Hybrid | Graph Prompting | N/A | Model - Prompt Learning | N/A | Link |
| PLM-SGT [290] | Graph-Level | LLM | Supervised | Finetune | N/A | Data - Augment | Explicit - QA | - |
| GraphsGPT [291] | Graph-Level | LLM | Generative | Finetune, Prototype | N/A | Data - Augment | Explicit - QA | Link |
| LLM4GraphGen [292] | Graph-Level | LLM | Generative | In-context | Data - Node Property, Data - Text Attribute | Data - Augment | Explicit - QA | - |
| GIMLET [293] | Graph-Level | GNN + LLM | Supervised | In-context | Data - Text Attribute | N/A | Explicit - QA | Link |
| GALLON [294] | Graph-Level | GNN + LLM | Supervised | Distillation | Data - Node Property | Loss - Auxiliary | N/A | - |
| UniMoT [31] | Graph-Level | GNN + LLM | Generative | Finetune | Data - Text Attribute | Model - Codebook, Model - Retriever | N/A | - |
| DiGress [295] | Graph Generation | GNN | Supervised | Test-time Adaptation | N/A | Data - Augment | N/A | - |
| GraphVAE [296] | Graph Generation | GNN | Supervised | Test-time Adaptation | N/A | Data - Augment | N/A | - |
| GDSS [297] | Graph Generation | GNN | Supervised | Test-time Adaptation | N/A | Data - Augment | N/A | Link |
| UniAug [70] | Graph Generation | GNN | Generative | Test-time Adaptation | Data - Node Property | Data - Augment | N/A | Link |
| LLM4GraphGen [292] | Graph Generation | LLM | Generative | In-context | Data - Node Property, Data - Text Attribute | | | |
With the development of the attention mechanism [55], some works combine GNNs and Transformers to exploit the generalization ability of language models. For instance, GraphFormers [277] fuses text encoding and graph aggregation into an iterative workflow by nesting layerwise GNN components alongside the transformer blocks of language models. Edgeformers [280] further incorporates text semantics on edges in a contextualized way for textual-edge networks $\mathcal{G}=(\mathcal{V}, \mathcal{E}, \mathcal{D})$, where each edge $e_{ij} \in \mathcal{E}$ is associated with a document $d_{ij} \in \mathcal{D}$. The inherent generalization capability of language models endows these models with transfer-learning potential and lets them serve as foundation models for link prediction.
6.3.4 Hybrid Methods
Meanwhile, there are also other works that approach link-level transfer from different angles. For example, DGASN [276] studies cross-network homophilous and heterophilous edge classification, where two graphs $\mathcal{G}^{A}$ and $\mathcal{G}^{B}$ and an aligned adjacency matrix $\mathbf{A}^{A, B} \in\{0,1\}^{n_{A} \times n_{B}}$ are given. It employs adversarial domain adaptation to mitigate domain divergence when transferring knowledge from source domains to target domains. In [275], the authors design a new setting, Graph Intersection-induced Transfer Learning (GITL), in which a denser graph may share nodes with a sparse original graph, offering a natural bridge for transferring selective, meaningful knowledge; they also propose a framework that tackles this setting from two angles: training instance optimization and prediction broadcast.
6.3.5 Future Directions
Future graph foundation models for link-level tasks may explore in-context learning in more challenging scenarios, such as dynamic and heterogeneous graphs. Another promising avenue is improving the computational efficiency of GFMs so they scale to real-world large-scale graphs, since current algorithms typically demand high pre-processing costs and extensive training time.
6.4 Graph-Level Task
Graph-level tasks for graph foundation models encompass graph classification, graph regression, graph generation, and more. Numerous researchers are engaged in pre-training and fine-tuning efforts aimed at developing a foundation model that is well suited to cross-domain datasets and tasks. However, designing such algorithms is challenging because graphs vary drastically across domains. For example, molecules follow strict chemical rules (e.g., valency), while social networks exhibit scale-free or community structures. Graphs also range from small (e.g., proteins with tens of nodes) to massive (e.g., citation networks with millions of nodes), requiring flexible architectures. From the task perspective, graph-level objectives (e.g., predicting drug efficacy vs. classifying social networks) require different inductive biases, complicating multi-task learning.
6.4.1 Pre-Training Stages
Early works mainly focused on pre-training strategies for graph models, aiming to build a powerful base model usable for various downstream tasks. In L2P-GNN [286], the authors pre-train the GNN to simulate the fine-tuning process on downstream tasks, thereby directly optimizing the pre-trained model's quick adaptability. During pre-training, L2P-GNN constructs a parent task $\mathcal{T}_{G}$ consisting of $k$ child tasks $\left\{\mathcal{T}_{G}^{1}, \ldots, \mathcal{T}_{G}^{k}\right\}$ for a graph $\mathcal{G}$, and designs a dual adaptation mechanism at both the node and graph levels that uses the intrinsic structure of label-free graph data as self-supervision to learn local and global representations simultaneously. Mole-BERT [73] proposes triplet masked contrastive learning (TMCL), which uses triplets $\left(\mathcal{G}, \mathcal{G}^{M 1}, \mathcal{G}^{M 2}\right)$ for graph-level pre-training to model the heterogeneous semantic similarity between molecules for effective molecule retrieval; together with Masked Atom Modeling (MAM), this pre-training achieves superior performance without requiring any domain knowledge. GROVER [287] combines node-, edge-, and graph-level self-supervised tasks in a GNN-Transformer style framework; after pre-training on large-scale unlabeled molecular datasets, GROVER learns rich implicit knowledge and transfers it easily to downstream graph-level tasks. GraphsGPT [291] designs an end-to-end pure Transformer-based encoder that learns graph words $\mathcal{W}=\operatorname{Graph2Seq}\left([GP]_{1},[GP]_{2}, \ldots,[GP]_{k}, FTSeq\right)$ and a decoder (GraphGPT) that restores the graph structure, $h, p=\operatorname{GraphGPT}([\mathcal{W},[BOS]])$. Pre-trained on 100M molecules, Graph2Seq excels at graph-level tasks including graph classification and regression, while GraphGPT additionally serves as a strong graph generator. A recent work [289] analyzed the extent to which pre-trained GNNs transfer across datasets by measuring the impact of pre-training datasets on downstream generalization and the effect of including feature information via structuralization. A generic masked-attribute pre-training step is sketched below.
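This sketch is in the spirit of masked atom modeling but is not Mole-BERT's exact objective; `encoder` and `decoder` stand for any GNN encoder and per-node reconstruction head, and zero-masking stands in for the special mask token typically used.

```python
import torch
import torch.nn.functional as F

def masked_attribute_step(encoder, decoder, x, edge_index, mask_rate=0.15):
    """One masked-attribute pre-training step: hide node features, encode, reconstruct."""
    mask = torch.rand(x.size(0)) < mask_rate
    x_corrupt = x.clone()
    x_corrupt[mask] = 0.0                    # simple zero-masking of selected nodes
    h = encoder(x_corrupt, edge_index)       # hypothetical GNN encoder
    x_rec = decoder(h[mask])                 # hypothetical per-node decoder head
    return F.mse_loss(x_rec, x[mask])        # reconstruct only the hidden entries
```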
6.4.2 Fine-Tuning Stages
Another line of work centers on the fine-tuning process of graph models.
Full Fine-tuning. G-Tuning [288] identifies the structural divergence between pre-training and downstream graphs and proposes to preserve the generative patterns of the downstream tasks via the graphon, a continuous and symmetric function $W:[0,1]^{2} \rightarrow[0,1]$ indicating the probability of two points $u_{i}, u_{j} \in[0,1]$ forming an edge; this design makes G-Tuning suitable for cross-domain tasks. GTOT-Tuning [284] formulates graph-local knowledge transfer as an Optimal Transport (OT) problem with a structural prior and constructs the GTOT regularizer to constrain the fine-tuned model's behavior. By preserving local feature invariances between the fine-tuned and pre-trained models, GTOT-Tuning attains strong generalization ability.
Parameter-Efficient Fine-tuning. There are also many parameter-efficient fine-tuning (PEFT) approaches. For example, AdapterGNN [153] and G-Adapter [283] both use an adapter module $\mathbf{A}(\mathbf{x})=\mathrm{BN}\left(\mathbf{W}_{up}\left(\operatorname{ReLU}\left(\mathbf{W}_{\text{down}}(\mathbf{x})\right)\right)\right)$ in the GNN setting; by introducing a small number of tunable parameters, they can outperform traditional full fine-tuning while improving the base model's generalization. GPF [155], a prompt learning technique, injects learnable prompt vectors into the feature space of the original datasets. Specifically, given a learnable prompt vector $\boldsymbol{p}_{i}$, node $v_{i}$ obtains a prompted feature vector $\tilde{\boldsymbol{x}}_{i}=\boldsymbol{x}_{i}+\boldsymbol{p}_{i}$, and GPF simply uses a single prompt vector $\boldsymbol{p}$ shared by all nodes, i.e., $\boldsymbol{p}_{1}=\boldsymbol{p}_{2}=\cdots=\boldsymbol{p}_{n}=\boldsymbol{p}$. The authors also provide rigorous derivations demonstrating the universality of GPF and guaranteeing its effectiveness. Their effectiveness and efficiency position these PEFT methods as a compelling alternative to full fine-tuning for downstream adaptation, such as molecular graph classification and regression; both components are sketched below.
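Both pieces can be written down almost directly from the formulas above; the dimensions below are placeholders, and how the adapter is wired into a frozen GNN layer varies across the cited methods.

```python
import torch
import torch.nn as nn

class GNNAdapter(nn.Module):
    """Bottleneck adapter A(x) = BN(W_up(ReLU(W_down(x)))); only these parameters are tuned."""
    def __init__(self, dim: int = 300, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.bn = nn.BatchNorm1d(dim)

    def forward(self, x):
        return self.bn(self.up(torch.relu(self.down(x))))

class GPFPrompt(nn.Module):
    """GPF-style feature prompt: one learnable vector p added to every node feature."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.p = nn.Parameter(torch.zeros(feat_dim))

    def forward(self, x):
        return x + self.p   # x_i + p for all nodes i
```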
6.4.3 LLM-incorporated Approaches
Recently, with the prosperity of LLMs, some works utilize the knowledge inherited from LLMs to help solve graph-level problems.
LLM for Understanding. In an early work [290], the authors first transform graphs into pure text and then fine-tune GPT-2 and GPT-3 using natural language; results on molecular classification demonstrate the promise of this direction. GALLON [294] utilizes multimodal molecular data to learn representations and, by leveraging the multimodality of powerful pre-trained large language models (e.g., GPT-4V) through prompting, extracts prior knowledge $\mathcal{R}_{i}=\operatorname{LLM}\left(\mathcal{P}_{i}, \mathcal{E}_{i}, \mathcal{S}_{i}, \mathcal{I}_{i}\right)$ for each molecule $\mathcal{G}_{i}$. It further distills the advantages of the GNN and the LLM into an MLP, aiming to capture the most effective representations of molecular structures. GIMLET [293] also extends language models to handle graph and text data by applying the transformer mechanism with generalized position embedding and decoupled attention. It feeds molecule graphs $\mathcal{G}$ and task instructions $T$ into the graph-text language model and decodes the output uniformly as text for different tasks, i.e., $\hat{y}_{str}=\operatorname{GIMLET}(\mathcal{G}, T)$, where $\hat{y}_{str}$ is the label string. Instruction-based pre-training expressed in natural language enables GIMLET to transfer to a broad range of zero-shot graph-level tasks. UniMoT [31] introduces a molecule tokenizer specifically designed for LLMs, tokenizing molecules into short sequences of causally dependent molecule tokens. It unifies the molecule modality $\left\{s_{i}\right\}_{i=1}^{M}$ and the text modality $\left\{t_{i}\right\}_{i=1}^{M}$ under a shared token representation and an autoregressive training paradigm. With the help of LLMs and multi-stage training strategies, UniMoT excels at both graph comprehension and generation tasks.
LLM for Generation. LLM4GraphGen [292], on the other hand, explores the ability of LLMs to generate graphs through systematic task designs and extensive experiments. The authors design comprehensive experiments to evaluate the graph generation ability of LLMs via tasks of varying difficulty, including rule-based, distribution-based, and property-based graph generation; experiments with diverse LLMs and prompts reveal insightful observations.
6.4.4 Graph Generation
Graph generation, a crucial subdomain of graph-level tasks, seeks to generate graphs that adhere to specific rules, distributions, or domain-specific properties. Designing GFMs for such tasks requires handling a wide spectrum of structural and semantic complexities across domains. Traditional graph generative models such as GraphVAE [296], GDSS [297], and DiGress [295] are typically limited to single domains and struggle to generalize. Recent works like UniAug [70] address this by incorporating diffusion models that scale across diverse graph distributions, aiming for universality. In particular, UniAug employs a structure-only discrete diffusion model to pre-train on thousands of graphs and augments downstream datasets with guided structure synthesis, improving generalization without relying on feature similarity. These approaches signify a shift toward cross-domain scalability in graph generation, aligning GFMs with the multi-task and multi-domain capabilities observed in language and vision foundation models.
Inspired by the success of large generative models like Stable Diffusion and GPT, recent graph foundation models embrace large-scale pre-training and cross-modal prompting to enhance graph generation. LGGM [299] pre-trains on over 5,000 graphs from 13 domains, encoding diverse structural patterns that enable superior zero-shot and fine-tuned generation. LGGM also introduces Text-to-Graph generation, wherein users provide textual prompts, such as graph domains or structural statistics (e.g., clustering coefficient), to guide generation, leveraging the world knowledge embedded in language models. In parallel, InstructG2I [298] proposes a multimodal approach in which graph-structured data enriched with image and text attributes guides diffusion-based image generation; its graph-conditioned generation is both expressive and controllable, offering smooth interpolation across styles and domains. These pre-training and prompting strategies reflect the growing synergy between LLMs and GNNs, expanding the frontier of GFMs toward open-ended, user-controllable graph synthesis.
6.4.5 Hybrid Approaches
Meanwhile, there are also other relevant lines of work on graph-level models.
Few-shot Graph Classification. For instance, CDFSGC [285] and CDTC [300] study few-shot graph classification across domains. In this setting, we aim to learn a model with good generalization ability that can predict the labels of graphs in the target domain $\mathcal{D}^{T}$, given few-shot annotated examples from the target domain and the source domain data $\mathcal{D}^{S}$, where $\mathcal{D}^{T}$ and $\mathcal{D}^{S}$ have different marginal distributions $\mathcal{P}_{\mathcal{D}^{T}}$
Table 8: Summary of task-specific GFMs on question answering, recommendation, and anomaly detection.

| Method Name | Tasks | Backbone | Pretrain | Adaptation | Feature Align | Structure Align | Task Align | Github |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| MCDGRAPH [301] | Question Answering | GNN | Generative | In-context | Data - Text Attribute | N/A | Explicit - QA | Link |
| GT2Vec [302] | Question Answering | LLM | Contrastive | Finetune | Data - Text Attribute | Data - Augment | Explicit - QA | - |
| GPT-4V-GSR [303] | Question Answering | LLM | Generative | In-context | Data - Text Attribute | N/A | Explicit - QA | - |
| G-Retriever [304] | Question Answering | GNN + LLM | Supervised | In-context | Data - Text Attribute | N/A | Explicit - QA | Link |
| GITA [305] | Question Answering | GNN + LLM | Generative | Finetune | Data - Text Attribute | N/A | Explicit - QA | - |
| GFM-RAG [306] | Question Answering | GNN + LLM | Hybrid | Finetune | Data - Text Attribute | N/A | N/A | Link |
| SR-MDFM [27] | Recommendation | GNN | Supervised | Test-time Adaptation | Data - Text Attribute | Data - Augment | N/A | - |
| PCRec [307] | Recommendation | GNN | Contrastive | Finetune | N/A | Loss - Pretrain | Explicit - Subgraph | - |
| LLMRec [308] | Recommendation | GNN | Hybrid | In-context | Data - Text Attribute | Data - Augment | Explicit - QA | Link |
| VIP5 [309] | Recommendation | LLM | Generative | In-context | Data - Text Attribute | Data - Augment | Explicit - QA | Link |
| 2T [310] | Recommendation | GNN + LLM | Supervised | Finetune | Data - Text Attribute | N/A | N/A | - |
| RLMRec [311] | Recommendation | GNN + LLM | Hybrid | Finetune | Data - Text Attribute | Loss - Pretrain | N/A | Link |
| AnomalyGFM [26] | Anomaly Detection | GNN | Supervised | Prototype | Model - Projection | Model - Prompt Learning | Explicit - Link | Link |
| CDFS-GAD [312] | Anomaly Detection | GNN | Contrastive | Graph Prompting | Model - Projection | Model - Prompt Learning | N/A | - |
| ACT [313] | Anomaly Detection | GNN | Contrastive | Prototype | N/A | Loss - Pretrain | N/A | Link |
| UNPrompt [314] | Anomaly Detection | GNN | Contrastive | Test-time Adaptation | Model - Projection | Data - Augment | N/A | Link |
| ARC [315] | Anomaly Detection | GNN | Generative | Prototype | Model - Projection | N/A | N/A | Link |
| Commander [316] | Anomaly Detection | GNN | Hybrid | Test-time Adaptation | N/A | Loss - Pretrain | N/A | - |
and $\mathcal{P}_{\mathcal{D}^{S}}$, and the label spaces $\mathcal{Y}^{S}$ and $\mathcal{Y}^{T}$ are disjoint. CDFSGC [285] proposes a graph encoder that learns to attend to three congruent views of graphs, one contextual and two topological, to learn representations of task-specific information for fast adaptation and task-agnostic information for knowledge transfer. Coupled with a metric-based meta-learning framework, the method achieves strong performance across three graph classification tasks in different domains. Furthermore, to tackle the domain shift issue, CDTC [300] designs a novel Cross-domain Task Coordinator that leverages a small set of labeled target-domain data as prompt tasks $\left\{\mathbf{p}^{t}\right\}_{t=1}^{T}$, then models the association and discovers the relevance between meta-tasks from the source domain and the prompt tasks. Integrated with an optimization-based meta-learning process and trained end-to-end with reinforcement learning, CDTC excels at multiple cross-domain few-shot graph classification tasks.
Out-of-Distribution Detection. AAGOD [282] endows a well-trained GNN with cross-domain OOD detection ability, where $\mathcal{P}_{\text{in}}$ and $\mathcal{P}_{\text{out}}$ are two distinct distributions defined over the graph space. In the training phase, a graph dataset $\mathcal{D}_{id}=\left\{G^{1}, \ldots, G^{n}\right\}$ sampled from the in-distribution $\mathcal{P}_{in}$ is available for model learning, and the goal at test time is to distinguish whether a graph belongs to the in-distribution $\mathcal{P}_{\text{in}}$ or not. AAGOD requires no modification of the backbone's parameters, instead designing an effective framework with a Learnable Amplifier Generator (LAG) and a Regularized Learning Strategy (RLS).
6.4.6 Future Directions
The future of graph foundation models (GFMs) for graph-level tasks lies in developing unified and flexible architectures, such as Graph Transformers with structural encoding, to capture domain-agnostic properties while enabling hierarchical pooling for multi-scale representations. Scalability challenges can be mitigated through subgraph sampling, linearized attention, and distributed training, while meta-learning and prompt-based fine-tuning will enable rapid adaptation to new domains. Robustness to noise and sparsity can be improved via graph denoising autoencoders and data augmentation. Integrating multi-modal data (e.g., text, images) and creating cross-domain benchmarks with unified metrics will further advance evaluation and standardization. Finally, prioritizing explainability through interpretable pooling and fairness via adversarial debiasing will ensure trustworthy and ethical AI applications, paving the way for GFMs to transform domains like drug discovery, fraud detection, and climate modeling.
6.5 Question Answering
Question Answering (QA) is another type of downstream task for GFMs. Existing studies for QA tasks commonly employ GNNs as enhancers and further leverage LLMs to generate answers. GFM-RAG [306] designs a graph foundation model with retrieval-augmented generation for QA tasks. It first builds a knowledge graph index (KG-index) from the documents to capture the relationships between entities. Then, GFM-RAG feeds the query and the constructed KG-index into the pre-trained GFM retriever to obtain relevant documents for LLM generation. The GFM retriever undergoes large-scale training and can be directly applied to unseen datasets without fine-tuning. Similarly, G-Retriever [304] introduces an additional subgraph construction step before LLM generation for QA tasks. Moreover, G-Retriever provides a GraphQA benchmark that contains three datasets, i.e., ExplaGraphs [317], SceneGraphs [318], and WebQSP [319, 320]. GITA [305] introduces a graph visualizer component to obtain graph visualizations; the constructed visual graphs, along with textual descriptions of the graph structures, are fed into visual language models (VLMs) to perform QA. Another work [303] proposes a paradigm for understanding and reasoning over graph image data by integrating image encoding and multimodal technologies such as OCR. VGCURE [301] introduces a comprehensive benchmark covering 22 tasks to examine the fundamental understanding and reasoning capabilities of VLMs. GT2VEC [302] is a framework that learns joint embeddings of text and graph data using Large Language Models (LLMs) by projecting graph embeddings into the text embedding space and employing contrastive learning for alignment, enhancing semantic coherence between the modalities for QA tasks.
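To make the retrieve-then-generate recipe shared by GFM-RAG and G-Retriever concrete, the following is a minimal sketch of a KG-indexed QA loop. The `retrieve` and `llm` callables stand in for the pretrained GFM retriever and the language model, and the entity linker is a naive placeholder; this is an illustrative assumption, not the actual API of either system.

```python
# Sketch: build a KG index from (head, relation, tail, source_doc) triples,
# retrieve a question-relevant subgraph, and condition an LLM on the evidence.
import networkx as nx

def build_kg_index(triples):
    kg = nx.MultiDiGraph()
    for head, rel, tail, doc_id in triples:
        kg.add_edge(head, tail, relation=rel, doc=doc_id)
    return kg

def link_entities(question, kg):
    # Naive entity linking: keep question tokens that match KG node names.
    return [tok for tok in question.lower().split() if tok in kg]

def graph_rag_answer(question, kg, retrieve, llm, k_hops=2, k_docs=5):
    seeds = link_entities(question, kg)
    # Question-specific subgraph around the linked entities.
    sub = nx.ego_graph(kg, seeds[0], radius=k_hops) if seeds else kg
    docs = [d["doc"] for _, _, d in sub.edges(data=True)]
    evidence = retrieve(question, docs)[:k_docs]   # placeholder for the pretrained GFM retriever
    prompt = "Context:\n" + "\n".join(map(str, evidence)) + f"\n\nQuestion: {question}\nAnswer:"
    return llm(prompt)
```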
Future Directions. Future directions for developing GFMs for QA tasks focus on improving adaptability, scalability, and reasoning capabilities. One promising direction is dynamic graph construction from unstructured data during retrieval, which enables models to create context-specific graphs on the fly [321, 322]. Multimodal integration is another key direction, which allows GFMs to handle textual, visual, and other modalities for richer reasoning capability [323, 294]. Additionally, improving explainability and transparency in GFMs will be critical for applications requiring trust and accountability, such as medical or legal QA systems [323, 324]. 未来方向。未来开发 QA 任务 GFM 的方向侧重于提高适应性、可扩展性和推理能力。一个有前途的方向是在检索过程中从非结构化数据构建动态图,这使模型能够动态创建特定于上下文的图[321,322]。多模态集成是另一个关键方向,它允许 GFM 处理文本、视觉和其他模态,以获得更丰富的推理能力[323,294]。此外,提高 GFM 的可解释性和透明度对于需要信任和问责制的应用至关重要,例如医疗或法律 QA 系统[323,324]。
6.6 Graph Anomaly Detection 6.6 图形异常检测
Graph anomaly detection (GAD) aims to identify anomalous samples, at both the node and structure levels, that deviate from the majority of samples. Early studies [313, 316] focus on cross-domain graph anomaly detection (CD-GAD), i.e., detecting anomalous nodes in an unlabeled target graph by training models over auxiliary, related source graphs with labeled abnormal and normal nodes. These studies commonly employ domain adaptation approaches. Commander [316] introduces three components for cross-domain graph anomaly detection: a domain discriminator for domain alignment, an anomaly classifier to detect anomalies, and an attribute decoder that provides additional signals for assessing node abnormality. ACT [313] likewise proposes a domain adaptation approach that jointly optimizes (i) unsupervised contrastive learning over representations of nodes in the target graph and (ii) anomaly-aware one-class alignment that aligns the contrastive node representations with representations of labeled normal nodes in the source graph. Moreover, in the contrastive learning stage, ACT enforces deviation of normal node representations from labeled anomalous nodes in the source graph. Beyond CD-GAD, CDFS-GAD [312] addresses the more prevalent and complex scenario of cross-domain few-shot graph anomaly detection, where the goal is to identify anomalies within sparsely labeled target graphs using auxiliary graphs from a related yet distinct domain. To this end, CDFS-GAD introduces a prompt-tuning module that extracts domain-specific features tailored to each domain, and further designs an adaptive hypersphere classification loss that enhances the discrimination between normal and abnormal instances via domain-sensitive norms.
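The joint objective described for ACT can be sketched as a contrastive term on the target graph plus a one-class alignment term toward labeled normal source nodes and away from labeled anomalies. The encoders, margin, and weighting below are illustrative assumptions rather than the paper's exact formulation.

```python
# Schematic ACT-style objective: contrastive learning on the target graph plus
# anomaly-aware one-class alignment against labeled source-graph nodes.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.5):
    # Contrastive loss between two augmented views of target-graph node embeddings.
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

def one_class_alignment(z_tgt, z_src_normal, z_src_anom, margin=1.0):
    center = z_src_normal.mean(dim=0, keepdim=True)
    pull = ((z_tgt - center) ** 2).sum(dim=-1).mean()              # align to the normal center
    push = F.relu(margin - torch.cdist(z_tgt, z_src_anom)).mean()  # deviate from labeled anomalies
    return pull + push

def act_style_loss(z_view1, z_view2, z_src_normal, z_src_anom, lam=1.0):
    return info_nce(z_view1, z_view2) + lam * one_class_alignment(z_view1, z_src_normal, z_src_anom)
```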
More recently, several studies have begun to develop GFMs for GAD. ARC [315] designs a "one-for-all" GAD model that detects anomalies across various graph datasets on the fly. It uses in-context learning to extract dataset-specific patterns from the target dataset given a few normal samples at inference time, without any fine-tuning on the target dataset. Other studies focus on zero-shot GAD, i.e., no label information is provided for the target graphs. UNPrompt [314] introduces a simple prompt-tuning module that captures generalized patterns among the latent attributes of normal nodes while minimizing those of abnormal nodes. AnomalyGFM [26] proposes a GFM for graph anomaly detection that leverages graph-agnostic representations to achieve strong zero-shot generalization; when sample labels are available, it also supports few-shot prompt tuning across diverse graph datasets. By aligning learnable normal and abnormal class prototypes with node representation residuals, AnomalyGFM distills discriminative features, enabling effective anomaly measurement in a unified feature space.
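The prototype-residual idea behind AnomalyGFM can be sketched as follows: a graph-agnostic residual (a node embedding minus its neighborhood mean) is scored against learnable normal and abnormal prototypes. The residual definition and scoring rule here are hedged assumptions for illustration, not the published model.

```python
# Sketch of prototype-based zero-shot anomaly scoring in a unified feature space.
import torch
import torch.nn.functional as F

class PrototypeScorer(torch.nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proto_normal = torch.nn.Parameter(torch.randn(dim))
        self.proto_abnormal = torch.nn.Parameter(torch.randn(dim))

    def forward(self, z, adj):
        # Graph-agnostic residual: node embedding minus the mean embedding of its neighbors.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        residual = F.normalize(z - (adj @ z) / deg, dim=-1)
        sim_abnormal = residual @ F.normalize(self.proto_abnormal, dim=0)
        sim_normal = residual @ F.normalize(self.proto_normal, dim=0)
        return sim_abnormal - sim_normal   # higher score => more anomalous
```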
Future Directions. GAD methods for dynamic and heterogeneous graphs represent a promising research direction, as real-world graphs are typically dynamic-with nodes and edges continuously evolving-and heterogeneous in both node attributes and structures, e.g., social networks and financial transaction networks [325, 326]. Enhancing the robustness of GNNs against adversarial attacks and noisy data is another 未来方向。动态图和异构图的 GAD 方法代表了一个有前途的研究方向,因为现实世界的图通常是动态的,节点和边不断演变,并且在节点属性和结构上都是异构的,例如社交网络和金融交易网络[325,326]。增强 GNN 对抗对抗性攻击和嘈杂数据的鲁棒性是另一回事
critical research focus [327]. Furthermore, improving the interpretability of GNN-based anomaly detection models is essential for fostering trust, particularly in high-stakes applications, e.g., cybersecurity and fraud detection, etc [325]. 关键研究重点[327]。此外,提高基于 GNN 的异常检测模型的可解释性对于培养信任至关重要,特别是在高风险应用中,例如网络安全和欺诈检测等[325]。
6.7 Recommendation 6.7 推荐
Recommendation systems, an important branch of AI, are trained to understand the preferences, previous decisions, and characteristics of people and products using data gathered about their interactions. Graph-based recommenders have demonstrated impressive capabilities in capturing complex user-item relationships, making them state-of-the-art approaches. With the development of GFMs and LLMs, they provide a new perspective for modern recommendation systems. The challenges are mainly due to the complexity of scalability and real-time requirements, domain-specific semantics, and dynamic user behavior. Specifically, platforms like Amazon involve graphs with billions of nodes and edges, straining computational resources. Different domains involve distinct interaction semantics (e.g., “purchase” in e-commerce vs. “follow” in social networks) and user interests may shift over time, requiring models to adapt to temporal patterns. 推荐系统是人工智能的一个重要分支,经过训练,可以使用收集的有关人员和产品交互的数据来了解人员和产品的偏好、先前的决策和特征。基于图形的推荐器在捕获复杂的用户-项目关系方面表现出了令人印象深刻的能力,使其成为最先进的方法。随着 GFM 和 LLM 的发展,它们为现代推荐系统提供了新的视角。挑战主要归因于可扩展性和实时需求的复杂性、特定领域的语义和动态用户行为。具体来说,像亚马逊这样的平台涉及具有数十亿个节点和边的图形,导致计算资源紧张。不同的领域涉及不同的交互语义(例如,电子商务中的“购买”与社交网络中的“关注”),用户兴趣可能会随着时间的推移而变化,需要模型适应时间模式。
GNN-based Approaches. An early work, PCRec [307], developed a pre-training Graph Neural Network (GNN) model for the cross-domain recommendation which adopts a contrastive self-supervised pre-training strategy. Then, the pre-trained GNN encoder can be initialized to generate node embeddings on the target domain and fine-tuned by a bipartite recommendation system using a BPR loss. 基于 GNN 的方法。早期的工作 PCRec [307]为跨域推荐开发了一种预训练图神经网络(GNN)模型,该模型采用对比自监督预训练策略。然后,可以初始化预训练的 GNN 编码器,在目标域上生成节点嵌入,并由使用 BPR 损失的二分推荐系统进行微调。
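The fine-tuning stage described for PCRec reduces to optimizing a Bayesian Personalized Ranking (BPR) loss on top of embeddings produced by the pretrained encoder. Below is a minimal sketch of that step; the encoder and negative-sampling logic are placeholders.

```python
# Sketch: fine-tune a pretrained GNN encoder for recommendation with the BPR loss.
import torch
import torch.nn.functional as F

def bpr_loss(user_emb, pos_item_emb, neg_item_emb):
    # Observed (user, positive item) pairs should score higher than sampled negatives.
    pos_scores = (user_emb * pos_item_emb).sum(dim=-1)
    neg_scores = (user_emb * neg_item_emb).sum(dim=-1)
    return -F.logsigmoid(pos_scores - neg_scores).mean()

def finetune_step(encoder, optimizer, graph, users, pos_items, neg_items):
    z = encoder(graph)                                   # embeddings from the (pretrained) GNN
    loss = bpr_loss(z[users], z[pos_items], z[neg_items])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```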
LLM-based Approaches. Targeting the scarcity of implicit feedback signals in recommendations, LLMRec [308] enhances recommender systems by incorporating LLMs to augment user-item interaction edges, item node attributes, and user node profiles by P_(u)=LLM(S_(u),Q_(u)),quadP_(v)=LLM(S_(v),Q_(v))\mathcal{P}_{u}=\operatorname{LLM}\left(S_{u}, Q_{u}\right), \quad \mathcal{P}_{v}=\operatorname{LLM}\left(S_{v}, Q_{v}\right). Training GNN backbones with a denoised data robustification mechanism enabled LLMRec to achieve great performance on various benchmarks. RLMRec [311], as a model-agnostic framework, is another method aiming to enhance existing GNN-based recommenders with LLM-empowered representation learning. RLMRec also utilizes contrastive and generative alignment techniques to align Collaborative Filtering (CF)-side relational embeddings with LLM-side semantic representations, effectively reducing feature noise. In [310], authors presented a graph-based foundation modeling approach tailored to personalization for the first time. They combined the advantages of LLMs and heterogeneous GNNs (HGNNs) and designed a two-tower (2T) architecture so that while the HGNN produces general-purpose embeddings, the 2T component models the sheer size of user-item interaction data in a continuous space. The benefit of such an approach is that it unifies representation learning across various tasks and enables information sharing. 基于 LLM 的方法。针对推荐中隐式反馈信号的稀缺性,LLMRec [308]通过整合 LLM 来增强推荐系统,以增强用户-项目交互边缘、项目节点属性和用户节点配置文件 P_(u)=LLM(S_(u),Q_(u)),quadP_(v)=LLM(S_(v),Q_(v))\mathcal{P}_{u}=\operatorname{LLM}\left(S_{u}, Q_{u}\right), \quad \mathcal{P}_{v}=\operatorname{LLM}\left(S_{v}, Q_{v}\right) 。使用去噪数据鲁棒化机制训练 GNN 主干使 LLMRec 能够在各种基准测试中取得出色的性能。RLMRec [311]作为一个与模型无关的框架,是另一种旨在通过 LLM 赋能的表示学习来增强现有的基于 GNN 的推荐器的方法。RLMRec 还利用对比和生成对齐技术,将协作过滤 (CF) 端关系嵌入与 LLM 端语义表示对齐,有效降低特征噪声。在[310]中,作者首次提出了一种针对个性化量身定制的基于图的基础建模方法。他们结合了 LLM 和异构 GNN (HGNN) 的优势,设计了双塔 (2T) 架构,以便在 HGNN 生成通用嵌入的同时,2T 组件对连续空间中用户-项目交互数据的绝对大小进行建模。这种方法的好处是它统一了各种任务的表示学习并实现了信息共享。
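The LLM-side augmentation used by LLMRec, $\mathcal{P}_{u}=\operatorname{LLM}(S_{u}, Q_{u})$, can be read as prompting the LLM with a user's interaction history $S_{u}$ and a profiling query $Q_{u}$. The sketch below shows this pattern for profiles and candidate edges; the prompt wording and the `llm` callable are assumptions, not LLMRec's implementation.

```python
# Sketch: LLM-based augmentation of user/item profiles and interaction edges
# before training a GNN recommender on the augmented graph.
def augment_profile(llm, history, question):
    # P_u = LLM(S_u, Q_u): summarize a user's history S_u under a profiling query Q_u.
    return llm(f"Interaction history: {history}\n{question}")

def augment_edges(llm, user_profile, candidate_items):
    prompt = (f"User profile: {user_profile}\n"
              f"Candidates: {candidate_items}\n"
              "Return the items this user is most likely to interact with.")
    return llm(prompt)

# Downstream, augmented profiles become node features and proposed edges are added
# (after a denoising/robustification step) before training the GNN backbone.
```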
Future Directions. There are several promising future directions for the advancement of graph foundation models in recommendation systems. For instance, we could design lightweight GFM architectures and distributed training frameworks to enhance scalability and effectively manage large-scale graphs. Additionally, integrating graph structures with multi-modal data, such as text and images, could be explored. Adopting prompt-based learning for task-specific recommendations or introducing dynamic GFMs that capture real-time interactions and user interest shift are also viable strategies to consider. 未来方向。图基础模型在推荐系统中的进步有几个有前途的未来方向。例如,我们可以设计轻量级的 GFM 架构和分布式训练框架,以增强可扩展性并有效管理大规模图。此外,还可以探索将图结构与多模态数据(例如文本和图像)集成。采用基于提示的学习来提供特定于任务的建议,或引入捕捉实时交互和用户兴趣转变的动态 GFM 也是值得考虑的可行策略。
7 Domain-Specific Graph Foundation Models 7 个特定领域的图基础模型
7.1 Design Principle 7.1 设计原则
Despite the general-purpose nature of foundation models, there is growing interest in designing domain-specific GFMs. In this setting, a single model learns shared representations that generalize across related tasks within a specific domain. Constructing such models is non-trivial, as they must effectively capture the underlying principles and key properties of the target domain. For instance, in molecular graphs, the model must recognize and preserve key motifs and functional groups, while in knowledge graphs, it must infer relationships and correlations between triplets. This section outlines the core characteristics and design principles essential for developing domain-specific GFMs. 尽管基础模型具有通用性,但人们对设计特定领域的 GFM 越来越感兴趣。在此设置中,单个模型学习在特定域内的相关任务中泛化的共享表示。构建此类模型并非易事,因为它们必须有效地捕获目标领域的基本原理和关键属性。例如,在分子图中,模型必须识别并保留关键基序和官能团,而在知识图谱中,它必须推断三元组之间的关系和相关性。本节概述了开发特定领域 GFM 所必需的核心特征和设计原则。
Domain-Specific Expertise. Different domains exhibit distinct structural properties and encode unique forms of knowledge. There is no universal inductive bias [36] that can simultaneously capture the diverse characteristics required for every domain. Therefore, domain-specific GFMs necessitate customized model architectures, pretraining paradigms, and adaptation strategies. For example, in molecular graphs, subgraphlevel augmentation is crucial, as essential semantic components (e.g., aromatic rings and functional groups) are preserved at the subgraph level. Conversely, knowledge graphs often require node-level or edge-level augmentation to generate additional triplets and enhance relational reasoning. 特定领域的专业知识。不同的领域表现出不同的结构特性并编码独特的知识形式。没有普遍的归纳偏差[36]可以同时捕获每个域所需的不同特征。因此,特定领域的 GFM 需要定制的模型架构、预训练范式和适应策略。例如,在分子图中,子图级增强至关重要,因为基本语义成分(例如芳环和官能团)在子图级别保留。相反,知识图谱通常需要节点级或边缘级的增强来生成额外的三元组并增强关系推理。
Learning Task-Agnostic Representations. Tasks and graphs within the same domain typically exhibit strong correlations, making it possible to learn a common graph representation that benefits multiple downstream tasks. Techniques such as multi-task learning [30, 328], adversarial learning, augmentation strategies, and domain regularization can be employed to achieve robust and generalizable representations. However, despite these benefits, task interference remains a challenge, potentially leading to negative transfer effects. To mitigate this, incorporating task-aware output heads [329] or advanced task alignment strategies [293, 330] can improve task-specific adaptation while maintaining shared domain knowledge. 学习与任务无关的表示。同一域内的任务和图通常表现出很强的相关性,从而可以学习有利于多个下游任务的通用图表示。可以采用多任务学习[30,328]、对抗学习、增强策略和域正则化等技术来实现鲁棒和可推广的表示。然而,尽管有这些好处,任务干扰仍然是一个挑战,可能导致负面的传输效应。为了缓解这种情况,结合任务感知输出头[329]或高级任务对齐策略[293,330]可以改善特定于任务的适应,同时保持共享的领域知识。
Enhancing Interpretability and Trustworthiness. In many domain-specific applications (e.g., drug discovery, mathematical reasoning, academic research), it is not only important for GFMs to achieve high performance on downstream tasks but also to generate interpretable insights that contribute to domain advancement. For instance, a GFM pretrained on large-scale molecular datasets may aid in the discovery of novel functional groups, reaction mechanisms, or physicochemical properties. To enable such discoveries, domain-specific GFMs must prioritize interpretability and trustworthiness. This may involve incorporating explainable AI techniques, uncertainty quantification, and domain-specific validation mechanisms. 增强可解释性和可信度。在许多特定领域的应用(例如,药物发现、数学推理、学术研究)中,GFM 不仅要在下游任务上实现高性能,而且还要生成有助于领域进步的可解释见解。例如,在大规模分子数据集上预训练的 GFM 可能有助于发现新的官能团、反应机制或理化性质。为了实现此类发现,特定领域的 GFM 必须优先考虑可解释性和可信度。这可能涉及结合可解释的人工智能技术、不确定性量化和特定领域的验证机制。
In the following subsections, we systematically explore the design principles of GFMs across eight distinct domains: molecular graphs, heterogeneous graphs, knowledge graphs, temporal graphs, academia, graph-based mathematical reasoning, causal graphs, and semantic document graphs. For each domain, we discuss the underlying design philosophies, key challenges, state-of-the-art methodologies, and promising directions for future research. 在以下小节中,我们系统地探讨了 GFM 在八个不同领域的设计原则:分子图、异构图、知识图、时间图、学术界、基于图的数学推理、因果图和语义文档图。对于每个领域,我们讨论了潜在的设计理念、主要挑战、最先进的方法以及未来研究的有希望的方向。
7.2 Biology & Molecule Graph 7.2 生物学和分子图
Molecular graphs present unique challenges distinct from other graph domains due to their rich atomic and bond-level structural complexity and chemical feature diversity. Unlike homogeneous graphs, molecular representations must capture chemical symmetries, such as invariance under rotations, translations, and atom permutations [349]. Additionally, molecular properties arise from intricate interactions spanning multiple scales-from local functional groups to global molecular topology-and are influenced by quantum effects and flexible 3D conformations not captured by purely 2D connectivity [350]. Further challenges include scalability to an astronomically large molecular space and limited labeled data due to costly experiments [351]. These issues necessitate specialized graph foundation models integrating symmetry-aware, self-supervised, and physics-informed approaches for reliable molecular prediction and generalization. 分子图由于其丰富的原子和键级结构复杂性以及化学特征多样性,提出了与其他图域不同的独特挑战。与齐次图不同,分子表示必须捕捉化学对称性,例如旋转、平移和原子排列下的不变性[349]。此外,分子特性源于跨越多个尺度的复杂相互作用——从局部官能团到全局分子拓扑——并受到量子效应和柔性三维构象的影响,而纯粹的二维连接无法捕获[350]。进一步的挑战包括可扩展到天文级大分子空间,以及由于实验成本高昂而导致的标记数据有限[351]。这些问题需要专门的图基础模型,集成对称感知、自监督和物理信息方法,以实现可靠的分子预测和泛化。
Table 9: Summary of domain-specific GFMs on biology and molecule graphs.

| Method Name | Domain | Backbone | Pretrain | Adaptation | Feature Align | Structure Align | Task Align | Github |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| MiniMol [88] | Biology & Molecule Graph | GNN | Supervised | Finetune | N/A | Data - Augment | Implicit - Aux. Loss | - |
| DPA-2 [328] | Biology & Molecule Graph | GNN | Supervised | Distillation, Finetune | N/A | Loss - Multi-task | N/A | Link |
| JMP [30] | Biology & Molecule Graph | GNN | Supervised | Finetune | Data - Node Property | Loss - Multi-task | N/A | Link |
| MACE [329] | Biology & Molecule Graph | GNN | Supervised | Finetune | Data - Node Property | N/A | N/A | Link |
| MolGPS [331] | Biology & Molecule Graph | GNN | Supervised | Test-time Adaptation | N/A | N/A | N/A | Link |
| DiG [332] | Biology & Molecule Graph | GNN | Generative | Finetune | Data - Text Attribute | Data - Augment | N/A | Link |
| GROVER [287] | Biology & Molecule Graph | GNN | Generative | Finetune | N/A | Loss - Pretrain | N/A | - |
| GTFM [333] | Biology & Molecule Graph | GNN | Generative | Finetune | N/A | Loss - Multi-task | N/A | - |
| Mole-BERT [73] | Biology & Molecule Graph | GNN | Hybrid | Finetune | N/A | Loss - Pretrain, Model Codebook | N/A | Link |
| MolecularGPT [334] | Biology & Molecule Graph | LLM | Supervised | Finetune, In-context | Data - Text Attribute | N/A | Explicit - QA | Link |
| GP-GPT [335] | Biology & Molecule Graph | LLM | Supervised | Finetune | Data - Text Attribute | Data - Augment | Explicit - QA | - |
| BioBRIDGE [336] | Biology & Molecule Graph | LLM | Contrastive | In-context | Data - Text Attribute | Loss - Pretrain | Explicit - QA | Link |
| GraphsGPT [291] | Biology & Molecule Graph | LLM | Generative | Finetune, Prototype | N/A | Data - Augment | Explicit - QA | Link |
| ESMFold [337] | Biology & Molecule Graph | LLM | Generative | Finetune | Data - Text Attribute | N/A | Explicit - QA | - |
| CaR [338] | Biology & Molecule Graph | LLM | Generative | Finetune, In-context | Data - Text Attribute | Data - Augment | Explicit - QA | Link |
| InstructMol [339] | Biology & Molecule Graph | GNN + LLM | Supervised | Finetune | Data - Text Attribute | Loss - Auxiliary, Model - Retriever | Explicit - QA | Link |
| ALIGNN [340] | Biology & Molecule Graph | GNN + LLM | Supervised | Finetune | Data - Text Attribute | N/A | N/A | Link |
| GIMLET [293] | Biology & Molecule Graph | GNN + LLM | Supervised | In-context | Data - Text Attribute | N/A | Explicit - QA | Link |
| GALLON [294] | Biology & Molecule Graph | GNN + LLM | Supervised | Distillation | Data - Node Property | Loss - Auxiliary | N/A | - |
| Text2Mol [341] | Biology & Molecule Graph | GNN + LLM | Contrastive | Finetune | Data - Text Attribute | Loss - Multi-task | N/A | Link |
| MoleculeSTM [342] | Biology & Molecule Graph | GNN + LLM | Contrastive | Finetune | Data - Text Attribute | Loss - Pretrain | Explicit - QA | Link |
| CLAMP [343] | Biology & Molecule Graph | GNN + LLM | Contrastive | Finetune | Data - Text Attribute | Loss - Pretrain | Explicit - QA | Link |
| GIT-Mol [344] | Biology & Molecule Graph | GNN + LLM | Contrastive | Finetune | Data - Text Attribute | Loss - Multi-task | Explicit - QA | Link |
| MolFM [345] | Biology & Molecule Graph | GNN + LLM | Contrastive | Finetune | Data - Text Attribute | Loss - Pretrain | Explicit - QA | Link |
| MoMu [346] | Biology & Molecule Graph | GNN + LLM | Contrastive | Finetune, In-context | Data - Text Attribute | Loss - Pretrain | Explicit - QA | Link |
| UniMoT [31] | Biology & Molecule Graph | GNN + LLM | Generative | Finetune | Data - Text Attribute | Model - Codebook, Model - Retriever | N/A | Link |
| MolCA [347] | Biology & Molecule Graph | GNN + LLM | Generative | Finetune | Data - Text Attribute | Model - Retriever | Explicit - QA | Link |
| ReLM [348] | Biology & Molecule Graph | GNN + LLM | Generative | In-context | Data - Node Property | Model - Retriever | Explicit - QA | Link |
7.2.1 Graph Model-based Approaches 7.2.1 基于图模型的方法
Graph data format is a natural fit for molecules because each molecule can be represented as a graph of atoms and bonds. Graph-based methods such as message-passing operators in Graph Neural Networks, permit explicit modeling of local and long-range interactions within the graph, providing a structure-aware representation often crucial for accurate property predictions. 图数据格式非常适合分子,因为每个分子都可以表示为原子和键的图。基于图的方法(例如图神经网络中的消息传递运算符)允许对图中的局部和远程交互进行显式建模,提供结构感知表示,这对于准确的属性预测通常至关重要。
3D Graph. One line of research focuses on developing equivariant graph models that incorporate threedimensional (3D) coordinates and symmetries. The Equivariant Foundation Model for Atomistic Materials Chemistry [329] preserves rotational and translational symmetries, thereby enabling accurate classification and link prediction in materials chemistry. Similarly, Joint Multi-domain Pre-training (JMP) [30] employs GemNetOC to bridge small molecules, catalysts, and bulk materials for atomic property prediction, highlighting that an equivariant 3D message-passing strategy can generalize across diverse chemical domains. 3D 图形。其中一项研究重点是开发包含三维 (3D) 坐标和对称性的等变图模型。原子材料化学等变基础模型[329]保留了旋转和平移对称性,从而实现了材料化学中的精确分类和链接预测。同样,联合多域预训练(JMP)[30]采用 GemNetOC 桥接小分子、催化剂和块状材料进行原子性质预测,这突显了等变 3D 消息传递策略可以在不同的化学领域进行推广。
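To illustrate how 3D coordinates enter such models, the following is a generic E(n)-equivariant message-passing update in the style of EGNN: messages depend only on invariant scalars (squared distances), and coordinate updates move along relative position vectors. This is a didactic sketch, not the MACE or JMP architecture.

```python
# Generic equivariant message-passing layer: features h (N, dim), coordinates x (N, 3),
# edge_index (2, E) with edges src -> dst.
import torch
import torch.nn as nn

class EquivariantLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * dim + 1, dim), nn.SiLU())
        self.coord = nn.Linear(dim, 1, bias=False)
        self.upd = nn.Sequential(nn.Linear(2 * dim, dim), nn.SiLU())

    def forward(self, h, x, edge_index):
        src, dst = edge_index
        rel = x[dst] - x[src]                          # relative positions (equivariant)
        dist2 = (rel ** 2).sum(-1, keepdim=True)       # invariant scalar per edge
        m = self.msg(torch.cat([h[dst], h[src], dist2], dim=-1))
        # Coordinate update: move along relative vectors, scaled by invariant messages.
        x = x.index_add(0, dst, rel * self.coord(m))
        # Feature update: aggregate incoming messages per node.
        agg = torch.zeros_like(h).index_add(0, dst, m)
        h = self.upd(torch.cat([h, agg], dim=-1))
        return h, x
```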
2D Graph. Other works concentrate on purely 2D connectivity and multi-task learning. Mole-BERT [73] combines a Graph Isomorphism Network (GIN) with masked-atom modeling and contrastive tasks, advancing property prediction by leveraging learned atom-level representations. Graphium [352] similarly supports multi-task learning across quantum-mechanical and bioassay datasets, employing GCNs or GINE variants. These frameworks underscore how large-scale or multi-task pre-training on 2D molecular graphs can yield strong predictive performance on tasks such as toxicity classification or ADMET endpoints. 2D 图形。其他工作专注于纯粹的 2D 连接和多任务学习。Mole-BERT [73]将图同构网络(GIN)与掩膜原子建模和对比任务相结合,通过利用学习到的原子级表示来推进属性预测。Graphium [352]同样支持跨量子力学和生物测定数据集的多任务学习,采用 GCN 或 GINE 变体。这些框架强调了在二维分子图上进行大规模或多任务预训练如何在毒性分类或 ADMET 终点等任务上产生强大的预测性能。
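The masked-atom objective used in this line of work can be summarized in a few lines: mask a fraction of atom features and train the GNN to recover the masked atom types. Mole-BERT additionally tokenizes atoms with a learned codebook, which this simplified sketch omits.

```python
# Simplified masked-atom pretraining objective for a 2D molecular GNN.
import torch
import torch.nn.functional as F

def masked_atom_loss(gnn, head, x, edge_index, atom_types, mask_ratio=0.15):
    n = x.size(0)
    mask = torch.rand(n) < mask_ratio
    x_masked = x.clone()
    x_masked[mask] = 0.0                      # replace masked atom features with a null feature
    z = gnn(x_masked, edge_index)             # node embeddings from the GNN encoder
    logits = head(z[mask])                    # linear head over the atom-type vocabulary
    return F.cross_entropy(logits, atom_types[mask])
```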
Efficiency-Focused Methods. While many of those graph models aim for maximal accuracy, some recent efforts prioritize parameter efficiency. MiniMol [88] adopts a compact GNN-based design and multi-task pre-training to handle quantum and biological assays simultaneously. By using fewer parameters while retaining robust predictive power, MiniMol exemplifies a trend toward more deployable, foundation-like GNN architectures. Altogether, these approaches show that well-designed message passing-either in 2D or with explicit 3D coordinates-remains a reliable backbone for molecular property prediction, and serves as a springboard for subsequent multimodal or language-driven models. 注重效率的方法。虽然其中许多图形模型都旨在实现最大精度,但最近的一些努力优先考虑参数效率。MiniMol [88]采用基于 GNN 的紧凑设计和多任务预训练,可同时处理量子和生物测定。通过使用更少的参数,同时保留强大的预测能力,MiniMol 体现了更可部署、类似基础的 GNN 架构的趋势。总而言之,这些方法表明,精心设计的消息传递(无论是二维还是显式三维坐标)仍然是分子属性预测的可靠支柱,并作为后续多模态或语言驱动模型的跳板。
7.2.2 Language Model-based Approaches 7.2.2 基于语言模型的方法
Language-based methods are motivated by the remarkable success of Transformers in capturing complex dependencies in sequences, including natural language. Because molecules can be linearized (for instance, via SMILES) or otherwise tokenized, researchers have explored purely Transformer architectures to model chemical or biological sequences, effectively treating them like language data. 基于语言的方法的动机是 Transformer 在捕获序列(包括自然语言)中的复杂依赖关系方面取得的显着成功。由于分子可以线性化(例如,通过 SMILES)或以其他方式标记化,因此研究人员探索了纯粹的 Transformer 架构来模拟化学或生物序列,有效地将它们视为语言数据。
Transformer-based Models. A prominent branch of LM-based research applies Transformers directly to molecular graphs. GROVER [287] merges GNN-like message passing with global self-attention for large-scale property prediction. DiG [353] (built on a Graphormer-style design) models entire equilibrium distributions of molecular conformations, providing thermodynamic insights beyond simple endpoint predictions. GraphGPT [291] views each node and edge as a "token," training on 100 million molecules in a purely Transformer-based manner, whereas the Graph Transformer Foundation Model (GTFM) [333] specializes in ADMET tasks. BioBridge [336], meanwhile, employs a Transformer to align knowledge-graph triplets $(v_{i}, e_{ij}, v_{j})$ across multiple biomedical modalities without relying on GNNs.
Large Language Models. Another direction bypasses graph encoding altogether by treating chemical or biological strings as input to large language models. Formally, given textual descriptions d_(G)\mathbf{d}_{\mathcal{G}}, one example [338] investigates whether an LLM alone, represented as LLM( d_(G)\mathbf{d}_{\mathcal{G}} ), can handle molecular property prediction in zero- or few-shot settings. MolecularGPT [334] enriches SMILES prompts d_(v)\mathbf{d}_{v} with structural “neighbor” demonstrations d_(N_(v))\mathbf{d}_{\mathcal{N}_{v}} to guide predictions. Beyond small molecules, ESMFold [337] leverages an LLM (ESM-2) trained on protein sequences d_("protein ")\mathbf{d}_{\text {protein }} to predict 3D protein structures with atomic-level resolution. Likewise, GP-GPT [335] employs a Llama-based model LLM ( d_("gene ")\mathbf{d}_{\text {gene }} ) to map genomic sequences to phenotypes, converting genomic knowledge into textual prompts. For cross-modal retrieval scenarios, Text2Mol [341] embeds textual queries d_("query ")\mathbf{d}_{\text {query }} and chemical representations (e.g., fingerprints x_(fp)\mathbf{x}_{\mathrm{fp}} ) into a shared embedding space. Collectively, these sequence-based Transformers and LLMs illustrate the versatility of self-attention mechanisms for capturing chemical and biological patterns, even without explicit graph message passing. 大型语言模型。另一个方向是将化学或生物字符串视为大型语言模型的输入,从而完全绕过图编码。形式上,给定文本描述 d_(G)\mathbf{d}_{\mathcal{G}} ,一个例子[338]研究了单独的 LLM(表示为 LLM( d_(G)\mathbf{d}_{\mathcal{G}} )是否可以在零样本或少样本设置中处理分子特性预测。MolecularGPT [334]通过结构“邻居”演示丰富了 SMILES 提示, d_(v)\mathbf{d}_{v}d_(N_(v))\mathbf{d}_{\mathcal{N}_{v}} 以指导预测。除了小分子之外,ESMFold [337]还利用在蛋白质序列 d_("protein ")\mathbf{d}_{\text {protein }} 上训练的 LLM(ESM-2)以原子级分辨率预测 3D 蛋白质结构。同样,GP-GPT [335]采用基于 Llama 的模型 LLM( d_("gene ")\mathbf{d}_{\text {gene }} )将基因组序列映射到表型,将基因组知识转化为文本提示。对于跨模态检索场景,Text2Mol [341]将文本查询 d_("query ")\mathbf{d}_{\text {query }} 和化学表示(例如指纹 x_(fp)\mathbf{x}_{\mathrm{fp}} )嵌入到共享嵌入空间中。总的来说,这些基于序列的 Transformer 和 LLM 说明了自注意力机制在捕获化学和生物模式方面的多功能性,即使没有显式图消息传递。
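The neighbor-demonstration prompting described for MolecularGPT amounts to assembling a few-shot prompt from structurally similar labeled molecules before the query SMILES. The sketch below shows this assembly; the prompt wording, labels, and similarity selection are illustrative assumptions.

```python
# Sketch: build a few-shot prompt with structural "neighbor" demonstrations for an LLM.
def build_prompt(query_smiles, neighbors, property_name):
    demos = "\n".join(
        f"SMILES: {smi}\n{property_name}: {label}" for smi, label in neighbors
    )
    return (f"Predict {property_name} for molecules given similar examples.\n"
            f"{demos}\nSMILES: {query_smiles}\n{property_name}:")

# Example usage (molecules and labels are made up):
prompt = build_prompt(
    "CCO",
    neighbors=[("CCN", "non-toxic"), ("CCC", "non-toxic")],
    property_name="toxicity",
)
```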
GNNs as Auxiliary Models. Some works pass GNN-derived embeddings Z=GNN(X,A)\mathbf{Z}=\mathrm{GNN}(\mathbf{X}, \mathbf{A}) to a Q-Former or similar projection module, producing discrete “molecule tokens” z_("token ")\mathbf{z}_{\text {token }} processed jointly with textual data d_(G)\mathbf{d}_{\mathcal{G}}. MolCA [347] and UniMoT [31] illustrate this principle by injecting graph-encoded structural information into frozen LLMs for tasks such as molecule-to-text generation, retrieval, or captioning. GIT-Mol [344] expands this concept to a three-way multimodal setting ( Z,d_(G),x_(img)\mathbf{Z}, \mathbf{d}_{\mathcal{G}}, \mathbf{x}_{\mathrm{img}} ) involving graphs, images, and text. Similarly, GIMLET [293] and the Molecular Multimodal Foundation Model [346] incorporate GNN-derived features Z\mathbf{Z} into instruction-based or attention-based frameworks alongside textual prompts. InstructMol [339] leverages molecule-text contrastive pre-training to align graph encoders GNN(•) with language models, while a multi-modal structure-text approach [342] employs contrastive objectives for text-based molecule retrieval and editing. GNN 作为辅助模型。一些作品将 Z=GNN(X,A)\mathbf{Z}=\mathrm{GNN}(\mathbf{X}, \mathbf{A}) GNN 衍生的嵌入传递给 Q-Former 或类似的投影模块,产生与文本数据 d_(G)\mathbf{d}_{\mathcal{G}} 联合处理的离散“分子标记” z_("token ")\mathbf{z}_{\text {token }} 。MolCA [347]和 UniMoT [31]通过将图形编码的结构信息注入冻结的 LLM 中来执行分子到文本的生成、检索或标题等任务,说明了这一原理。GIT-Mol [344]将这一概念扩展为涉及图形、图像和文本的三向多模态设置( Z,d_(G),x_(img)\mathbf{Z}, \mathbf{d}_{\mathcal{G}}, \mathbf{x}_{\mathrm{img}} )。同样,GIMLET[293]和分子多模态基础模型[346]将 GNN 衍生的特征 Z\mathbf{Z} 与文本提示一起整合到基于指令或基于注意力的框架中。InstructMol [339]利用分子-文本对比预训练使图编码器 GNN(•)与语言模型保持一致,而多模态结构-文本方法[342]则采用对比目标进行基于文本的分子检索和编辑。
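A minimal way to picture the Q-Former-style projection in these models is a small cross-attention block with learnable queries that compresses GNN node embeddings into a fixed number of "molecule tokens" in the LLM's embedding space. The block below is a loose stand-in for such projectors, not the MolCA or UniMoT implementation; dimensions are illustrative.

```python
# Sketch: project GNN node embeddings into a fixed set of soft tokens for a frozen LLM.
import torch
import torch.nn as nn

class MoleculeTokenizer(nn.Module):
    def __init__(self, gnn_dim, llm_dim, num_tokens=8, num_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_tokens, gnn_dim))
        self.attn = nn.MultiheadAttention(gnn_dim, num_heads, batch_first=True)
        self.proj = nn.Linear(gnn_dim, llm_dim)        # map into the LLM embedding space

    def forward(self, node_emb):                        # node_emb: (num_nodes, gnn_dim)
        q = self.queries.unsqueeze(0)                   # (1, num_tokens, gnn_dim)
        kv = node_emb.unsqueeze(0)                      # (1, num_nodes, gnn_dim)
        tokens, _ = self.attn(q, kv, kv)                # learnable queries attend to the graph
        return self.proj(tokens.squeeze(0))             # (num_tokens, llm_dim), prepended to text embeddings
```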
7.2.3 Hybrid Approaches 7.2.3 混合方法
Although graph models excel at capturing local connectivity and structural nuances, and language models capture global or semantic nuances (especially when text is involved), neither alone may suffice for complex, real-world applications. Hybrid models integrate structural inductive biases from graph models, denoted as GNN(X,A), and language-based reasoning from Transformers or LLMs, represented as LLM( d_(G)\mathbf{d}_{\mathcal{G}} ), enabling more comprehensive representations of molecules for diverse tasks. 尽管图模型擅长捕捉局部连通性和结构细微差别,而语言模型则捕捉全局或语义细微差别(尤其是涉及文本时),但仅靠两者都不足以满足复杂的实际应用。混合模型集成了来自图模型的结构归纳偏差(表示为 GNN(X,A))和来自 Transformer 或 LLM 的基于语言的推理(表示为 LLM( d_(G)\mathbf{d}_{\mathcal{G}} ),能够更全面地表示不同任务的分子。
Task Specialization. Certain hybrid models address broader or more specialized tasks explicitly. ReLM [348] formulates reaction prediction as first generating candidate products using GNN outputs Z_("candidates ")\mathbf{Z}_{\text {candidates }}, which are subsequently ranked by an LLM conditioned on reaction conditions d_("reaction ")\mathbf{d}_{\text {reaction }}. CLAMP [343] aligns molecular encoders Z_("mol ")\mathbf{Z}_{\text {mol }} with text-based bioassay descriptions d_("bioassay ")\mathbf{d}_{\text {bioassay }} for zero-shot discovery tasks, and MolGPS [331] merges message-passing neural networks (MPNN) with Transformer modules to enhance scalability on large supervised datasets. DPA-2 [328] integrates symmetry-preserving GNN layers and Transformer-based attention for multi-task molecular simulations, whereas MolFM [345] unifies graph embeddings, textual Transformer embeddings, and knowledge-graph features Z_("KG ")\mathbf{Z}_{\text {KG }} for cross-modal retrieval. Additional specialized hybrid designs include GALLON [294], which distills representations from both GNN and LLM into a single multilayer perceptron (MLP) hat(y)=MLP(z_(GNN)||z_(LLM))\hat{y}=\operatorname{MLP}\left(\mathbf{z}_{\mathrm{GNN}} \| \mathbf{z}_{\mathrm{LLM}}\right) for molecular property prediction, and Hybrid-LLMGNN [340], which integrates crystallographic embeddings Z_("crystal ")\mathbf{Z}_{\text {crystal }} with language-based features for materials property inference. Together, these hybrid approaches demonstrate the substantial advantages gained from combining graph-level inductive biases with language-based models, resulting in robust performance in tasks such as molecular property prediction, reaction modeling, and text-guided molecular manipulation. 任务专业化。某些混合模型明确地解决更广泛或更专业的任务。ReLM [348]将反应预测表述为首先使用 GNN 输出生成候选产物 Z_("candidates ")\mathbf{Z}_{\text {candidates }} ,随后通过以反应条件 d_("reaction ")\mathbf{d}_{\text {reaction }} 为条件的 LLM 进行排名。CLAMP [343]将分子编码器 Z_("mol ")\mathbf{Z}_{\text {mol }} 与基于文本的生物测定描述 d_("bioassay ")\mathbf{d}_{\text {bioassay }} 对齐,用于零样本发现任务,MolGPS [331]将消息传递神经网络(MPNN)与 Transformer 模块相结合,以增强大型监督数据集的可扩展性。DPA-2 [328]集成了保持对称的 GNN 层和基于 Transformer 的注意力,用于多任务分子模拟,而 MolFM [345]统一了图嵌入、文本 Transformer 嵌入和知识图特征 Z_("KG ")\mathbf{Z}_{\text {KG }} ,用于跨模态检索。其他专门的混合设计包括 GALLON [294],它将 GNN 和 LLM 的表示提炼成一个多层感知器(MLP) hat(y)=MLP(z_(GNN)||z_(LLM))\hat{y}=\operatorname{MLP}\left(\mathbf{z}_{\mathrm{GNN}} \| \mathbf{z}_{\mathrm{LLM}}\right) 以进行分子性质预测,以及 Hybrid-LLMGNN [340],它将晶体学嵌入 Z_("crystal ")\mathbf{Z}_{\text {crystal }} 与基于语言的特征集成在一起,用于材料性质推断。这些混合方法共同展示了将图级归纳偏差与基于语言的模型相结合所获得的巨大优势,从而在分子性质预测、反应建模和文本引导的分子作等任务中具有强大的性能。
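The distillation pattern attributed to GALLON, $\hat{y}=\operatorname{MLP}(\mathbf{z}_{\mathrm{GNN}} \| \mathbf{z}_{\mathrm{LLM}})$, can be sketched as a lightweight student MLP trained to match both ground-truth labels and frozen teacher predictions. The loss weighting and regression setup below are assumptions for illustration.

```python
# Sketch: distill concatenated GNN and LLM molecule representations into one MLP.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistilledMLP(nn.Module):
    def __init__(self, gnn_dim, llm_dim, hidden=256, out_dim=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(gnn_dim + llm_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim)
        )

    def forward(self, z_gnn, z_llm):
        return self.net(torch.cat([z_gnn, z_llm], dim=-1))   # y_hat = MLP(z_GNN || z_LLM)

def distill_step(student, teacher_pred, z_gnn, z_llm, y_true, alpha=0.5):
    pred = student(z_gnn, z_llm)
    # Match both the labels and the frozen teachers' predictions.
    return alpha * F.mse_loss(pred, y_true) + (1 - alpha) * F.mse_loss(pred, teacher_pred)
```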
Table 10: Summary of domain-specific GFMs on computational graphs.

| Method Name | Domain | Backbone | Pretrain | Adaptation | Feature Align | Structure Align | Task Align | Github |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Triplet-GMPNN [354] | Computational Graph | GNN | Supervised | Finetune | Data - Node Property | Loss - Multi-task | N/A | - |
| GraphForge [355] | Computational Graph | LLM | Generative | Finetune | Data - Node Property | Model - Retriever | Explicit - QA | Link |
| HLM-G [356] | Computational Graph | LLM | Generative | Finetune, In-context | Data - Node Property | Data - Augment | N/A | - |
| PathCompare [357] | Computational Graph | LLM | Generative | In-context | Data - Node Property | N/A | Explicit - QA | Link |
| Hyper-BAG and Hyper-COT [358] | Computational Graph | | | | | | | |
Future directions for Graph Foundation Models (GFMs) in molecular graphs center around several promising research trends. Firstly, developing equivariant GNN architectures that explicitly encode symmetries such as rotation ($R$) and translation ($T$) is crucial for accurately modeling molecular properties. Leveraging large-scale self-supervised pre-training on massive unlabeled molecular datasets could significantly enhance generalization and predictive robustness, particularly when handling novel molecular scaffolds. Physics-informed neural networks that integrate quantum-chemical principles and physical laws represent another crucial avenue, enabling more faithful modeling of chemical interactions. Finally, multi-modal retrieval-augmented strategies that combine molecular graph data with textual and structural contexts are essential for improving interpretability and reliability in high-stakes applications, such as drug discovery and materials design.
7.3 Algorithmic Graphs 7.3 算法图
Graphs are fundamental data structures for representing algorithmic processes, combinatorial structures, and mathematical relationships. In this context, we use the term algorithmic graphs to refer to graph-structured inputs that encode procedural tasks-such as shortest path finding, satisfiability checking, sorting, and symbolic mathematics-that require reasoning over discrete structures. 图是表示算法过程、组合结构和数学关系的基本数据结构。在这种情况下,我们使用术语“算法图”来指代对过程任务进行编码的图结构输入,例如最短路径查找、可满足性检查、排序和符号数学,这些任务需要对离散结构进行推理。
Unlike conventional graph learning tasks focused on semantic inference (e.g., classification or recommendation), algorithmic graph problems demand models that can emulate or generalize classical algorithmic behaviors. In this section, we review GFMs designed for algorithmic reasoning, highlighting their ability to capture structural invariants, support multi-step reasoning, and generalize across instances of varying size and complexity. We further summarize representative benchmarks, learning paradigms, and model architectures that drive progress in this emerging direction. 与专注于语义推理(例如分类或推荐)的传统图学习任务不同,算法图问题需要能够模拟或推广经典算法行为的模型。在本节中,我们将回顾专为算法推理而设计的 GFM,重点介绍它们捕获结构不变量、支持多步骤推理以及跨不同大小和复杂性的实例泛化的能力。我们进一步总结了推动这一新兴方向进步的代表性基准、学习范式和模型架构。
7.3.1 Structured Graph Reasoning 7.3.1 结构化图推理
Task-Oriented Approaches. Early works in graph math reasoning focused on building generalist neural solvers capable of processing multiple algorithmic tasks within a unified framework. Triplet-GMPNN [354] introduced a GNN-based GFM designed to solve problems like shortest paths, sorting, and dynamic programming, formalized as learning a mapping Phi:(G,t)|->y\Phi:(\mathcal{G}, t) \mapsto y, where tt denotes the task type and yy the target solution. This multi-task formulation highlights the importance of task-conditioned graph message passing over adjacency matrix A and features X. GCoder [366] frames graph reasoning as structured program generation, learning to synthesize code c=Psi(G)c=\Psi(\mathcal{G}) directly from input graphs. By incorporating reinforcement learning with compiler feedback, GCoder improves program correctness and execution efficiency. This highlights how graph math GFMs can bridge declarative graph representations with executable procedural knowledge. GraphPatternBench [369] evaluates pattern recognition tasks over graph G\mathcal{G}, where models predict structural motifs (cliques, cycles) by learning a binary classification function f:V|->{0,1}f: \mathcal{V} \mapsto\{0,1\}. 以任务为导向的方法。图数学推理的早期工作侧重于构建能够在统一框架内处理多个算法任务的通才神经求解器。Triplet-GMPNN [354]引入了一种基于 GNN 的 GFM,旨在解决最短路径、排序和动态规划等问题,形式化为学习映射 Phi:(G,t)|->y\Phi:(\mathcal{G}, t) \mapsto y ,其中 tt 表示任务类型和 yy 目标解决方案。这种多任务表述强调了任务条件图消息传递邻接矩阵 A 的重要性,并具有 X。GCoder [366]将图推理框架为结构化程序生成,学习直接从输入图合成代码 c=Psi(G)c=\Psi(\mathcal{G}) 。通过将强化学习与编译器反馈相结合,GCoder 提高了程序的正确性和执行效率。这突出了图数学 GFM 如何将声明性图表示与可执行的过程知识联系起来。GraphPatternBench [369] 通过图评估模式识别任务 G\mathcal{G} ,其中模型通过学习二元分类函数 f:V|->{0,1}f: \mathcal{V} \mapsto\{0,1\} 来预测结构基序(派系、周期)。
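The mapping $\Phi:(\mathcal{G}, t) \mapsto y$ implies that the task identity conditions every propagation step. The sketch below shows one simple way to realize task-conditioned message passing with iterative, algorithm-like updates; it mirrors the general recipe of generalist neural solvers rather than the exact Triplet-GMPNN architecture.

```python
# Sketch: a multi-task MPNN whose messages are conditioned on a learned task embedding.
import torch
import torch.nn as nn

class TaskConditionedMPNN(nn.Module):
    def __init__(self, dim, num_tasks, steps=8):
        super().__init__()
        self.task_emb = nn.Embedding(num_tasks, dim)
        self.msg = nn.Linear(3 * dim, dim)
        self.upd = nn.GRUCell(dim, dim)
        self.steps = steps

    def forward(self, h, edge_index, task_id):
        t = self.task_emb(torch.tensor(task_id))        # task-conditioning vector
        src, dst = edge_index
        for _ in range(self.steps):                     # iterative, algorithm-like propagation
            m = self.msg(torch.cat([h[src], h[dst], t.expand(src.size(0), -1)], dim=-1))
            agg = torch.zeros_like(h).index_add(0, dst, torch.relu(m))
            h = self.upd(agg, h)
        return h                                        # decoded downstream into task-specific outputs y
```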
Instruction-Tuned Approaches. Instruction-tuned reasoning is introduced by GraphInstruct [365], where graph tasks are specified via instruction II, and the GFM infers y=Theta(G,I)y=\Theta(\mathcal{G}, I). GraphWiz [32] enhances this process by requiring the GFM to generate interpretable step-by-step reasoning traces [y_(1),y_(2),dots,y_(T)]\left[y_{1}, y_{2}, \ldots, y_{T}\right], aligning intermediate solutions to the graph’s evolving state. MAGMA [370] further evaluates classical algorithms (e.g. BFS, DFS, Dijkstra) using similar intermediate reasoning traces across {G_(t)}\left\{\mathcal{G}_{t}\right\}. Structured graph representation is tackled by GraphLLM [367], which integrates learned graph embeddings z_(i)\mathbf{z}_{i} into pre-trained language models via graph-to-text serialization. GraphToken [368] instead learns structural embeddings directly z_(i)=phi(v_(i),N(v_(i)),A)\mathbf{z}_{i}=\phi\left(v_{i}, \mathcal{N}\left(v_{i}\right), \mathbf{A}\right), enabling the GFM to leverage topological context natively during reasoning. 指令调整方法。GraphInstruct [365]引入了指令调优推理,其中图任务通过指令指定 II ,GFM 推 y=Theta(G,I)y=\Theta(\mathcal{G}, I) 断。GraphWiz [32]通过要求 GFM 生成可解释的分步推理轨迹 [y_(1),y_(2),dots,y_(T)]\left[y_{1}, y_{2}, \ldots, y_{T}\right] ,使中间解与图的演变状态保持一致,从而增强了这一过程。MAGMA [370]进一步评估了经典算法(例如 BFS、DFS、Dijkstra),使用类似的中间推理轨迹 {G_(t)}\left\{\mathcal{G}_{t}\right\} 。GraphLLM [367]解决了结构化图表示问题,它通过图到文本序列化将学习到的图嵌入集成 z_(i)\mathbf{z}_{i} 到预训练的语言模型中。相反,GraphToken [368]直接学习结构嵌入 z_(i)=phi(v_(i),N(v_(i)),A)\mathbf{z}_{i}=\phi\left(v_{i}, \mathcal{N}\left(v_{i}\right), \mathbf{A}\right) ,使 GFM 能够在推理过程中本地利用拓扑上下文。
7.3.2 Benchmarking and Multi-Agent Collaboration 7.3.2 基准测试和多代理协作
Comprehensive benchmarking highlights strengths and limitations of current graph math GFMs. GraphArena [371] evaluates both polynomial-time and NP-complete tasks, tracking correctness and hallucination: ℓ=Eval(y_("pred "),y_("true "))\ell=\operatorname{Eval}\left(y_{\text {pred }}, y_{\text {true }}\right). Additionally, NLGraph [372] expands this to natural language queries over graphs, requiring models to parse task text into formal graph queries q=xi(d)q=\xi(d) before solving them. GPT4Graph [33] focuses on semantic and structural reasoning in graphs, including centrality estimation and graph classification. This benchmark highlights the gap between pre-trained LLMs and structured graph-aware GFMs. Multi-agent collaboration offers another promising direction for complex graph reasoning. GraphTeam [362] employs multiple specialized LLM agents, each responsible for parsing, retrieval, coding, or reasoning. Each agent computes z_(i)^((k))=phi_(k)(G)\mathbf{z}_{i}^{(k)}=\phi_{k}(\mathcal{G}) with intermediate results shared across agents for iterative refinement. GraphTool-Instruction [355] extends this by explicitly incorporating external tool calls, meaning that: y=Tool(q)y=\operatorname{Tool}(q) where qq is the subtask-specific query extracted by the model. GraphAgent-Reasoner [361] applies a similar agent decomposition but further breaks graph problems into node-centric subtasks, with individual agents solving y_(i)=phi(N(v_(i)),x_(i))y_{i}=\phi\left(\mathcal{N}\left(v_{i}\right), \mathbf{x}_{i}\right), with final aggregation yielding global predictions. 全面的基准测试突出了当前图数学 GFM 的优势和局限性。GraphArena [371]评估多项式时间和 NP 完成任务,跟踪正确性和幻觉: ℓ=Eval(y_("pred "),y_("true "))\ell=\operatorname{Eval}\left(y_{\text {pred }}, y_{\text {true }}\right) 。此外,NLGraph [372]将其扩展到对图的自然语言查询,要求模型在求解 q=xi(d)q=\xi(d) 任务文本之前将任务文本解析为正式的图查询。GPT4Graph [33]专注于图中的语义和结构推理,包括中心性估计和图分类。该基准测试凸显了预训练的 LLM 和结构化图感知 GFM 之间的差距。多智能体协作为复杂图推理提供了另一个有前途的方向。GraphTeam [362]采用了多个专门的 LLM 代理,每个代理负责解析、检索、编码或推理。每个代理都 z_(i)^((k))=phi_(k)(G)\mathbf{z}_{i}^{(k)}=\phi_{k}(\mathcal{G}) 使用跨代理共享的中间结果进行计算,以进行迭代细化。GraphTool-Instruction [355]通过显式合并外部工具调用来扩展这一点,这意味着: y=Tool(q)y=\operatorname{Tool}(q) 其中 qq 是模型提取的特定于子任务的查询。GraphAgent-Reasoner [361]应用了类似的代理分解,但进一步将图问题分解为以节点为中心的子任务,由单个代理求解 y_(i)=phi(N(v_(i)),x_(i))y_{i}=\phi\left(\mathcal{N}\left(v_{i}\right), \mathbf{x}_{i}\right) ,最终聚合产生全局预测。
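Tool-augmented designs such as GraphTool-Instruction reduce the model's job to emitting a structured sub-task query $q$, which exact graph algorithms then solve, $y=\operatorname{Tool}(q)$. The following minimal dispatcher illustrates the pattern; the query schema is an assumption, not GraphTool-Instruction's interface.

```python
# Sketch: dispatch LLM-extracted graph queries to exact algorithmic tools.
import networkx as nx

TOOLS = {
    "shortest_path": lambda g, q: nx.shortest_path(g, q["source"], q["target"]),
    "degree":        lambda g, q: g.degree(q["node"]),
    "has_cycle":     lambda g, q: not nx.is_forest(g),
}

def dispatch(graph, query):
    # query, e.g. {"tool": "shortest_path", "source": 0, "target": 5}, as extracted by the LLM.
    return TOOLS[query["tool"]](graph, query)

g = nx.path_graph(6)
print(dispatch(g, {"tool": "shortest_path", "source": 0, "target": 5}))  # [0, 1, 2, 3, 4, 5]
```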
7.3.3 Encoding Strategies and Algorithmic Refinement 7.3.3 编码策略和算法优化
Structure-Aware Encoding Approaches. Encoding graph structures effectively for LLM reasoning is a foundational challenge. Structured JSON encoding [373] serializes each node's neighborhood into hierarchical JSON trees, preserving adjacency information $\mathcal{N}(v_{i})$ directly within the prompt. Graph Linearization [359] encodes $\mathcal{G}$ into a linear sequence of tokens ordered by centrality ranking, $\pi=\operatorname{Order}(\mathcal{V}; \operatorname{Centrality})$, which preserves relational salience during sequence processing. PIE [364] structures graph reasoning as a staged decomposition, where each stage produces structured pseudocode that guides the next. LogDepth Transformer Hierarchy [374] theoretically demonstrates that logarithmic-depth transformers (depth $\log N$) efficiently capture long-range graph dependencies, outperforming standard transformers and GNNs on certain reasoning problems, providing architectural guidance for future GFMs.
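The two serialization ideas above are easy to make concrete: a hierarchical JSON view of each node's neighborhood, and a centrality-ordered linearization of the edge list. The output formats below are illustrative.

```python
# Sketch: (i) structured JSON neighborhood encoding and (ii) centrality-ordered linearization.
import json
import networkx as nx

def to_json_encoding(g):
    # Hierarchical neighborhood encoding, N(v_i) preserved per node.
    return json.dumps({str(v): sorted(map(str, g.neighbors(v))) for v in g.nodes})

def linearize_by_centrality(g):
    # pi = Order(V; Centrality): emit edge tokens following a degree-centrality ordering.
    cent = nx.degree_centrality(g)
    order = sorted(g.nodes, key=lambda v: cent[v], reverse=True)
    rank = {v: i for i, v in enumerate(order)}
    edges = sorted(g.edges, key=lambda e: (rank[e[0]], rank[e[1]]))
    return " ".join(f"({u}-{v})" for u, v in edges)

g = nx.karate_club_graph()
json_context = to_json_encoding(g)           # JSON view for the LLM prompt
linear_context = linearize_by_centrality(g)  # centrality-ordered token sequence
```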
Extended Reasoning Approaches. LLM4Hypergraph [358] extends graph reasoning to hypergraphs $\mathcal{H}=(\mathcal{V}, \mathcal{E})$, where hyperedges connect arbitrary subsets of nodes. The model applies specialized prompting strategies (Hyper-BAG, Hyper-COT) to encode these higher-order relations into the prompt.
NLGIFT [375] measures out-of-distribution generalization by introducing diverse structural shifts (varying degree distributions, node attributes, and edge sparsity), requiring GFMs to adapt dynamically. Traversal remains central to graph math reasoning. PathCompare [357] enhances traversal reasoning by prompting models to compare candidate paths c=Compare(p_(1),p_(2))c=\operatorname{Compare}\left(p_{1}, p_{2}\right), where paths are sequences of edges (v_(i),v_(j))inE\left(v_{i}, v_{j}\right) \in \mathcal{E}. TREETOP [376] adapts these techniques to conversation trees, where nodes represent conversational turns and edges encode reply relations. NLGIFT [375]通过引入不同的结构变化(不同程度的分布、节点属性和边缘稀疏性)来衡量分布外泛化,要求 GFM 动态适应。遍历仍然是图数学推理的核心。PathCompare [357]通过提示模型比较候选路径 c=Compare(p_(1),p_(2))c=\operatorname{Compare}\left(p_{1}, p_{2}\right) 来增强遍历推理,其中路径是边序列 (v_(i),v_(j))inE\left(v_{i}, v_{j}\right) \in \mathcal{E} 。TREETOP [376]将这些技术应用于对话树,其中节点表示对话转折,边编码回复关系。
7.3.4 Future Directions 7.3.4 未来方向
Future graph math GFMs will integrate hybrid retrieval-augmented reasoning, dynamically fetching relevant subgraphs during inference to improve local-global context alignment. Inspired by Thought Propagation [360], 未来的图数学 GFM 将集成混合检索增强推理,在推理过程中动态获取相关子图,以改善局部-全局上下文对齐。受到思想传播的启发[360],
Table 11: Summary of domain-specific GFMs on document graphs.

| Method Name | Domain | Backbone | Pretrain | Adaptation | Feature Align | Structure Align | Task Align | Github |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| METAG [377] | Document Graph | LLM | Hybrid | In-context, Test-time Adaptation | Data - Text Attribute | Model - Retriever | Explicit - QA | Link |
| TAPE [89] | Document Graph | GNN + LLM | Supervised | Finetune, In-context | Data - Text Attribute | N/A | N/A | Link |
| LLM4GraphTopology [378] | Document Graph | GNN + LLM | Supervised | Finetune, In-context | Data - Text Attribute | Data - Augment | N/A | Link |
| G-Prompt [379] | Document Graph | GNN + LLM | Supervised | In-context | Data - Text Attribute | Model - Retriever | N/A | - |
| ConGraT [105] | Document Graph | GNN + LLM | Contrastive | Finetune | Data - Text Attribute | Loss - Pretrain | N/A | Link |
| GLEM [106] | Document Graph | GNN + LLM | Hybrid | Distillation, Finetune | Data - Text Attribute | Loss - Pretrain | N/A | Link |
| PATTON [380] | Document Graph | GNN + LLM | Hybrid | Finetune | Data - Text Attribute | Loss - Pretrain | N/A | Link |
future models may recursively decompose graph problems into subproblems y=sum_(i)theta_(i)(y_(i))y=\sum_{i} \theta_{i}\left(y_{i}\right), where y_(i)y_{i} are solutions to extracted subgraphs. Combining structured retrieval, multi-agent collaboration, and external tool use will enhance adaptability and robustness across real-world tasks. Benchmarks like GraphPatternBench [369], NLGraph [372], and GraphArena [371] underscore the need for realistic datasets spanning scientific, social, and biological domains. Tool-augmented reasoning frameworks (GraphTool-Instruction [355], PIE [364]) point toward hybrid GFMs that blend learned reasoning with structured algorithmic calls, laying the foundation for the next generation of graph math reasoning models. 未来的模型可能会递归地将图问题分解为子问题 y=sum_(i)theta_(i)(y_(i))y=\sum_{i} \theta_{i}\left(y_{i}\right) ,其中 y_(i)y_{i} 是提取的子图的解。结合结构化检索、多代理协作和外部工具使用将增强现实世界任务的适应性和稳健性。GraphPatternBench [369]、NLGraph [372] 和 GraphArena [371] 等基准测试强调了对跨科学、社会和生物领域的真实数据集的需求。工具增强推理框架(GraphTool-Instruction [355]、PIE [364])指向混合 GFM,将学习推理与结构化算法调用相结合,为下一代图数学推理模型奠定了基础。
7.4 Document Network

Semantic document graphs represent networks where nodes $v_i \in \mathcal{V}$ are textual entities (e.g., documents or records) and edges $e_{ij} \in \mathcal{E}$ capture semantic links such as citations or topical similarity. Each node and edge may carry features $\mathbf{x}_i, \mathbf{e}_{ij} \in \mathbb{R}^{D}$, forming attribute matrices $\mathbf{X} \in \mathbb{R}^{N \times D}$ and $\mathbf{E} \in \mathbb{R}^{M \times D}$, with structure defined by the adjacency matrix $\mathbf{A} \in \{0,1\}^{N \times N}$. Semantic document GFMs aim to jointly model textual semantics and graph structure, but face key challenges in content-structure alignment and cross-domain scalability. While LLMs encode rich text features $\mathbf{z}_i^{\text{text}}$ and GNNs propagate structural signals $\mathbf{z}_i^{\text{graph}}$, aligning these representations is difficult in heterogeneous document graphs with diverse relations and evolving content.
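To make the notation concrete, the following is a minimal sketch of such a document graph as a plain data container; the class and field names are illustrative and not tied to any surveyed method.

```python
import numpy as np
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DocumentGraph:
    """Minimal container matching the notation above: N nodes with D-dim features X,
    M edges listed in edge_index, optional edge features E, and adjacency A."""
    X: np.ndarray                      # node attribute matrix, shape (N, D)
    A: np.ndarray                      # adjacency matrix, shape (N, N), entries in {0, 1}
    edge_index: np.ndarray             # edge list, shape (M, 2), rows (i, j)
    E: Optional[np.ndarray] = None     # edge attribute matrix, shape (M, D)
    texts: List[str] = field(default_factory=list)  # raw text attached to each node

    def neighbors(self, i: int) -> np.ndarray:
        """Indices of nodes that node i links to (e.g., papers it cites)."""
        return np.nonzero(self.A[i])[0]

# Toy example: three documents with two citation edges (0 -> 1, 1 -> 2).
g = DocumentGraph(
    X=np.random.randn(3, 8),
    A=np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]]),
    edge_index=np.array([[0, 1], [1, 2]]),
    texts=["paper A", "paper B", "paper C"],
)
print(g.neighbors(0))  # [1]
```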
Prompt-based Integration Approaches. To enhance multi-relation representation learning, METAG [377] proposes a multiplex embedding framework in which a language model encoder generates node embeddings $\mathbf{z}_i$ by dynamically injecting relation-specific prior tokens from $\mathcal{T}$. This yields relation-aware embeddings that adapt to different edge types in $\mathcal{E}$, preserving parameter efficiency while enriching structural semantics. G-Prompt [379] extends task-specific adaptation by introducing a graph adapter-enhanced prompting strategy for TAGs. Given a graph $\mathcal{G}$ with adjacency $\mathbf{A}$, it injects task-aware prompt embeddings into $\operatorname{LLM}(\cdot)$, where local neighborhood features from $\mathcal{N}_v$ are fused into the prompt. This hybrid design retains both global semantics and local structure, improving few-shot and zero-shot node representation learning.
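As a rough illustration of the relation-conditioned encoding idea (a sketch under assumptions, not METAG's actual implementation), the snippet below prepends a learnable relation-specific prior token to a node's token embeddings before a shared Transformer encoder; the module names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class RelationAwareEncoder(nn.Module):
    """Sketch: prepend a learnable relation-specific prior token to the token
    embeddings of a node's text before a shared Transformer encoder, so the same
    encoder yields relation-conditioned node embeddings z_i."""
    def __init__(self, num_relations: int, d_model: int = 256, num_layers: int = 2):
        super().__init__()
        self.relation_tokens = nn.Embedding(num_relations, d_model)  # one prior token per relation
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, token_embeds: torch.Tensor, relation_id: int) -> torch.Tensor:
        # token_embeds: (batch, seq_len, d_model) text-token embeddings of the nodes
        batch = token_embeds.size(0)
        prior = self.relation_tokens.weight[relation_id].expand(batch, 1, -1)
        h = self.encoder(torch.cat([prior, token_embeds], dim=1))
        return h[:, 0]  # read the relation-aware embedding off the prior-token position

enc = RelationAwareEncoder(num_relations=3)
z_cites = enc(torch.randn(2, 16, 256), relation_id=0)       # "cites" view of two nodes
z_same_topic = enc(torch.randn(2, 16, 256), relation_id=1)  # "same-topic" view
print(z_cites.shape)  # torch.Size([2, 256])
```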
Multi-Objective Optimization Approaches. PATTON [380] jointly optimizes two pretraining objectives: network-contextualized masked language modeling (MLM) and masked node prediction. The first reconstructs masked tokens in $\mathbf{x}_i$ using neighborhood context $\mathcal{N}_v$, while the second predicts node identities based on textual embeddings. This dual-task setup aligns local semantic recovery with global graph reasoning, producing node embeddings $\mathbf{z}_i$ that capture both token-level and topological information. ConGraT [105] adopts a dual-encoder design, using a GNN and a pre-trained language model to generate $\mathbf{z}_i^{\text{graph}}$ and $\mathbf{z}_i^{\text{text}}$, respectively. A cross-modal contrastive loss aligns these views by maximizing mutual information, enhancing cross-domain generalization in semantic document GFMs.
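The cross-modal alignment objective can be illustrated with a standard symmetric InfoNCE loss between the two views; this is a generic stand-in for the dual-encoder objective described above, with the temperature and batch handling chosen for illustration.

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(z_graph: torch.Tensor,
                                 z_text: torch.Tensor,
                                 temperature: float = 0.1) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of nodes: the GNN view z_i^graph of node i
    should be closest to its own text view z_i^text among all nodes in the batch."""
    zg = F.normalize(z_graph, dim=-1)
    zt = F.normalize(z_text, dim=-1)
    logits = zg @ zt.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(zg.size(0), device=zg.device)
    loss_g2t = F.cross_entropy(logits, targets)   # graph view retrieves its own text view
    loss_t2g = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_g2t + loss_t2g)

# Toy usage with random embeddings standing in for GNN / LM outputs.
loss = cross_modal_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
print(float(loss))
```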
Hybrid Training Approaches. LLM4GraphTopology [378] refines graph topology by prompting LLMs to assess semantic similarity between node pairs $(v_i, v_j)$, adjusting edges in $\mathbf{A}$ based on similarity thresholds. This process is reinforced by pseudo-label propagation, where LLM-generated labels diffuse through the updated graph to enhance classification and clustering. To support scalable learning, GLEM [106] introduces a variational EM framework that alternates between GNN-based structural reasoning (E-step) and LLM-based semantic encoding (M-step). Node embeddings $\mathbf{z}_i$ are iteratively updated by propagating through $\operatorname{GNN}(\mathbf{A}, \mathbf{X})$ and conditioning language representations on aggregated graph signals. TAPE [89] improves interpretability by combining LLM-generated explanations $d_{v_i}$ with node features $\mathbf{x}_i$, forming enriched inputs $[\mathbf{x}_i \,\|\, d_{v_i}]$ for GNNs and leading to more accurate and interpretable node representations.
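The TAPE-style input enrichment reduces to a simple concatenation once the explanations are embedded; the sketch below assumes a placeholder text encoder and is not the authors' code.

```python
import numpy as np

def enrich_node_features(X: np.ndarray, explanations: list, embed_text) -> np.ndarray:
    """Form enriched inputs [x_i || d_{v_i}] by concatenating each node's original
    features with an embedding of its LLM-generated explanation, then feed the
    result to any downstream GNN. `embed_text` is a placeholder for whatever
    sentence encoder is available (e.g., a frozen LM)."""
    X_text = np.stack([embed_text(d) for d in explanations])   # (N, D_text)
    assert X_text.shape[0] == X.shape[0]
    return np.concatenate([X, X_text], axis=1)                  # (N, D + D_text)

# Toy usage with a hash-seeded stand-in for a real text encoder.
def fake_embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(32)

X = np.random.randn(3, 16)
expl = ["explanation for node 0", "explanation for node 1", "explanation for node 2"]
print(enrich_node_features(X, expl, fake_embed).shape)  # (3, 48)
```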
Table 12: Summary of domain-specific GFMs on heterogeneous graphs.

| Method Name | Domain | Backbone | Pretrain | Adaptation | Feature Align | Structure Align | Task Align | Github |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Heterformer [381] | Heterogeneous Graph | GNN | Supervised | Finetune | Data - Text Attribute | N/A | N/A | Link |
| SELAR [382] | Heterogeneous Graph | GNN | Supervised | Finetune | N/A | Loss - Auxiliary | N/A | Link |
| PT-HGNN [383] | Heterogeneous Graph | GNN | Contrastive | Finetune | N/A | Data - Augment, Loss - Pretrain | Explicit - Subgraph | - |
| HetGPT [384] | Heterogeneous Graph | GNN | Contrastive | Graph Prompting | N/A | Loss - Pretrain | Explicit - Link | - |
| CrossHG-Meta [385] | Heterogeneous Graph | GNN | Hybrid | Test-time Adaptation | N/A | Data - Augment | N/A | - |
| HierPromptLM [386] | Heterogeneous Graph | LLM | Supervised | Finetune | Data - Text Attribute | Loss - Multi-task | N/A | Link |
| THLM [387] | Heterogeneous Graph | GNN + LLM | Supervised | Finetune | Data - Text Attribute | Data - Augment, Loss - Pretrain | N/A | Link |
| GaLM [388] | Heterogeneous Graph | GNN + LLM | Supervised | Finetune | Data - Text Attribute | Loss - Auxiliary | N/A | - |
| GHGRL [389] | Heterogeneous Graph | GNN + LLM | Supervised | Finetune | Data - Text Attribute | N/A | N/A | Link |
| HiGPT [390] | Heterogeneous Graph | GNN + LLM | Contrastive | Finetune | Data - Text Attribute | N/A | Explicit - QA | Link |
Future Directions. The evolution of semantic document graph foundation models reflects a steady progression from static graph encoders to dynamically adaptive graph-text co-modeling pipelines. Future directions include hierarchical prompting that integrates section-level and document-level contexts into $\mathbf{z}_i$, contrastive augmentation that jointly aligns node text, edge descriptions $d_{e_{ij}}$, and global metadata $d_g$, and multi-lingual pre-training that generalizes GFMs across multilingual scientific corpora. These innovations will further position semantic document GFMs as central tools for scholarly discovery, legal document analysis, and large-scale retrieval across knowledge graphs.

7.5 Heterogeneous Graph

A Heterogeneous Graph (HG) [259] is denoted as $\mathcal{G}=(\mathcal{V}, \mathcal{E}, \mathcal{T}, \mathcal{R})$, comprising a node set $\mathcal{V}$ and an edge set $\mathcal{E}$, where $\mathcal{T}$ and $\mathcal{R}$ represent the types of nodes and edges, respectively, under the condition that $|\mathcal{T}|+|\mathcal{R}|>2$. Additionally, $\tau(\cdot)$ and $\varphi(\cdot, \cdot)$ serve as mapping functions that identify the types of nodes and edges, where $\tau(i) \in \mathcal{T}$ for any $i \in \mathcal{V}$, and $\varphi(i, j) \in \mathcal{R}$ for any edge $(i, j) \in \mathcal{E}$. The central task in heterogeneous graph representation learning is to derive node embeddings that accurately reflect both structural and semantic contexts.
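The definition can be mirrored directly in a small data structure; the sketch below encodes $\tau$ and $\varphi$ as dictionaries and checks the $|\mathcal{T}|+|\mathcal{R}|>2$ condition (names are illustrative).

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class HeteroGraph:
    """Minimal container for G = (V, E, T, R) with the type maps tau and varphi."""
    node_types: Dict[int, str] = field(default_factory=dict)         # tau: V -> T
    edge_types: Dict[Tuple[int, int], str] = field(default_factory=dict)  # varphi: E -> R

    def tau(self, i: int) -> str:
        return self.node_types[i]

    def varphi(self, i: int, j: int) -> str:
        return self.edge_types[(i, j)]

    def is_heterogeneous(self) -> bool:
        # |T| + |R| > 2, i.e., more than one node type or more than one edge type overall
        return len(set(self.node_types.values())) + len(set(self.edge_types.values())) > 2

# Toy academic graph: an author writes a paper, and that paper cites another paper.
g = HeteroGraph(
    node_types={0: "author", 1: "paper", 2: "paper"},
    edge_types={(0, 1): "writes", (1, 2): "cites"},
)
print(g.tau(0), g.varphi(1, 2), g.is_heterogeneous())  # author cites True
```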
Graph Model-based Approaches. Early research on heterogeneous graph foundation models predominantly leveraged self-supervised learning methodologies. SELAR [382] introduced a meta-learning approach that systematically balances auxiliary tasks, enhancing primary task performance through optimal node representations. Extending this approach, CrossHG-Meta [385] addressed the critical challenge of few-shot learning, mitigating data scarcity and emphasizing robust generalization across heterogeneous contexts. Concurrently, PT-HGNN [383] applied contrastive learning at both the node and semantic levels, capturing nuanced structural information. Subsequently, HetGPT [384] advanced these methodologies through graph prompting strategies, adapting pre-trained graph neural networks (GNNs) to diverse downstream tasks and thereby marking a shift toward more adaptive and flexible modeling paradigms.

Language Model-based Approaches. Inspired by recent advances in natural language processing, particularly the transformative successes of large language models (LLMs), researchers began integrating language encoders into heterogeneous graph modeling to address heterogeneity through textual embedding strategies. GaLM [388] modified masked language modeling by incorporating structural signals from adjacency matrices $\mathbf{A} \in \{0,1\}^{N \times N}$ along with textual node information, enabling richer node representations. Further developments involved constructing graph-specific "tokens," analogous to language tokens, that effectively encapsulate both structural and semantic attributes. For example, HiGPT [390] generated graph tokens from textual descriptions $d_{v_i}$, processing them through a heterogeneous graph transformer (HGT) and refining the results via inference with LLMs. HierPromptLM [386] similarly extracted metapath-based subgraph tokens using LLMs to construct prompts tailored for downstream fine-tuning tasks. In parallel, methods like GHGRL [389] leveraged the reasoning capabilities of LLMs to discern node types from textual attributes, resulting in semantically enriched node embeddings. The creation of benchmarks such as the Heterogeneous Text-Attributed Graph (HTAG) datasets further facilitates empirical evaluation and validation of these methodologies.

Hybrid Approaches. Most recently, hybrid models synthesizing GNNs and LLMs have achieved state-of-the-art performance by simultaneously capturing structural dependencies and semantic details. Heterformer [381], for instance, integrated transformer-based neighbor aggregation with textual context, effectively unifying structural and semantic representations. Similarly, THLM [387] proposed a dual-encoder pre-training framework that jointly leverages GNN-based structural insights and LLM-derived semantic encodings, producing comprehensive multimodal node embeddings. These hybrid methodologies epitomize a natural evolution in graph modeling by harmonizing complementary strengths, thereby establishing foundations for robust and contextually rich graph models.
Table 13: Summary of domain-specific GFMs on knowledge graphs, academic networks, temporal graphs, and causal graphs.

| Method Name | Domain | Backbone | Pretrain | Adaptation | Feature Align | Structure Align | Task Align | Github |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| MOTIF [271] | Knowledge Graph | GNN | Supervised | Finetune | Data - Text Attribute | Model - Codebook | Explicit - Link | - |
| UltraQuery [29] | Knowledge Graph | GNN | Supervised | Finetune | Data - Text Attribute | Model - Codebook | Explicit - Link | Link |
| ULTRA [28] | Knowledge Graph | GNN | Supervised | Finetune | Data - Text Attribute | Model - Codebook | Explicit - Link | Link |
| KG-ICL [272] | Knowledge Graph | GNN | Supervised | Graph Prompting | Data - Text Attribute | Model - Codebook | Explicit - Link | Link |
| LitFM [391] | Academic Network | GNN | Hybrid | Finetune | Data - Text Attribute | Loss - Multi-task | Explicit - QA | - |
| MiNT [392] | Temporal Graph | GNN + LLM | Supervised | Finetune | N/A | N/A | N/A | Link |
| ZCG [393] | Causal Graph | LLM | Hybrid | In-context | Data - Text Attribute | N/A | Explicit - QA | Link |
Future Directions. Future research in heterogeneous graph foundation models presents numerous promising directions. Key areas include exploring advanced multimodal integration strategies to enhance structural-semantic coherence, improving interpretability by explicitly aligning embeddings from distinct modalities, and developing dynamically adaptive models responsive to evolving graph structures. Moreover, research on computational scalability and standardized evaluation frameworks will be pivotal for advancing robust and contextually comprehensive graph foundation models.

7.6 Knowledge Graph

Designing graph foundation models specifically for knowledge graphs (KGs) presents unique challenges, primarily due to the necessity of effectively inferring complex relationships and capturing implicit correlations among triplets (entity-relation-entity). Unlike general graphs, knowledge graphs require models to handle compositional generalization, logical consistency, and inductive inference, particularly for unseen entities and relations. Current research on graph foundation models for knowledge graphs predominantly addresses two key tasks: logical query answering and inductive link prediction.

Logical Query Answering. In the logical query answering task, the objective is to accurately respond to structured queries involving multi-hop reasoning and logical operators (e.g., intersection, union, negation). Formally, given a knowledge graph $\mathcal{G}=(\mathcal{E}, \mathcal{R}, \mathcal{T})$ with entities $\mathcal{E}$, relations $\mathcal{R}$, and triplets $\mathcal{T} \subseteq \mathcal{E} \times \mathcal{R} \times \mathcal{E}$, models aim to generalize compositionally and dynamically aggregate reasoning signals across queries, including those involving unseen entities and relations. For instance, UltraQuery [29] addresses this by defining both projection and logical operators as vocabulary-independent functions, achieving zero-shot generalization through inductive link prediction. Similarly, KG-ICL [272] utilizes prompt-based reasoning, dynamically constructing query-specific prompt graphs encoded by distinct message-passing networks, thus demonstrating robust generalization capabilities across various KGs. KICGPT [394] further incorporates LLMs into knowledge graph completion, leveraging the textual understanding of LLMs to enhance model capacity.

Inductive Link Prediction. The inductive link prediction task focuses on predicting missing relations or links without relying on learned entity embeddings. The task can be formulated as learning transferable relation representations $r \in \mathcal{R}$ to infer missing links $(e_h, r, e_t)$, even when the entities $e_h, e_t$ are unseen during training. ULTRA [28] addresses this by introducing a relation-based meta-graph structure, propagating information across relations rather than entities and thus enhancing transferability to novel knowledge graphs. Higher-order KGFM [271] extends this by modeling multi-relation motifs to capture complex relational dependencies, significantly improving expressive power for inductive reasoning.
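The relation meta-graph idea can be sketched as follows: relations become nodes, and two relations are linked whenever they share an entity as head or tail. This is a simplification of ULTRA's actual construction (which distinguishes more interaction types and learns over the resulting graph); it is included only to show why the representation is entity-agnostic.

```python
from collections import defaultdict
from itertools import combinations

def relation_meta_graph(triples):
    """Build a graph over relations from (head, relation, tail) triples: two
    relations are linked if they share an entity, labeled by how they share it
    (h2h, t2t, h2t). Entity identities never appear in the output, which is why
    such representations can transfer to KGs with unseen entities."""
    heads, tails = defaultdict(set), defaultdict(set)
    for h, r, t in triples:
        heads[r].add(h)
        tails[r].add(t)
    relations = sorted(heads.keys() | tails.keys())
    meta_edges = set()
    for r1, r2 in combinations(relations, 2):
        if heads[r1] & heads[r2]:
            meta_edges.add((r1, r2, "h2h"))
        if tails[r1] & tails[r2]:
            meta_edges.add((r1, r2, "t2t"))
        if heads[r1] & tails[r2] or tails[r1] & heads[r2]:
            meta_edges.add((r1, r2, "h2t"))
    return relations, meta_edges

triples = [("alice", "works_at", "lab"), ("alice", "authored", "paper"),
           ("bob", "authored", "paper"), ("paper", "published_in", "venue")]
rels, edges = relation_meta_graph(triples)
print(edges)  # e.g., works_at and authored share the head entity "alice"
```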
Future Directions. Future research on graph foundation models for knowledge graphs should prioritize three main directions. First, expanding beyond binary relations to handle complex, higher-order relational interactions, such as temporal or n-ary relationships, is critical. Second, integrating multi-modal data (including textual, visual, and numerical information) into KGs will enable comprehensive multi-modal reasoning capabilities. Lastly, improving model interpretability, controllability, and computational scalability remains essential, particularly for applications in sensitive domains such as healthcare and finance, where transparency and computational efficiency are paramount.

7.7 Temporal Graph

Temporal graphs, denoted as a sequence of evolving graph snapshots $\{\mathcal{G}^{t}=(\mathcal{V}^{t}, \mathcal{E}^{t})\}_{t=1}^{T}$, capture dynamic node interactions over time. Each node $v_i \in \mathcal{V}^{t}$ is associated with a time-dependent feature vector $\mathbf{x}_i^{t} \in \mathbb{R}^{D}$, while each edge $e_{ij}^{t} \in \mathcal{E}^{t}$ carries an evolving feature $\mathbf{e}_{ij}^{t} \in \mathbb{R}^{D}$. Modeling such structures requires simultaneously encoding spatial dependencies within each snapshot $\mathcal{G}^{t}$ and capturing temporal dependencies across graph states over time. Existing temporal GNNs rely heavily on handcrafted diffusion mechanisms to propagate node features across timestamps, and they often struggle to generalize to unseen temporal patterns, especially when nodes and edges are associated with rich text descriptions $d_{v_i}^{t}$ and $d_{e_{ij}}^{t}$ that evolve alongside the graph.
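A minimal snapshot-based representation of such a temporal graph, assuming a fixed feature dimensionality and illustrative names, might look as follows.

```python
import numpy as np
from dataclasses import dataclass
from typing import List

@dataclass
class Snapshot:
    """One snapshot G^t: node features X^t (N_t x D) and adjacency A^t (N_t x N_t)."""
    X: np.ndarray
    A: np.ndarray

def node_history(snapshots: List[Snapshot], i: int) -> np.ndarray:
    """Stack the feature vectors x_i^t of node i across all snapshots in which it
    exists, i.e., the temporal trajectory a model must encode alongside the
    per-snapshot structure."""
    return np.stack([s.X[i] for s in snapshots if i < s.X.shape[0]])

# Toy temporal graph: three random snapshots of a 4-node network.
rng = np.random.default_rng(0)
snaps = [Snapshot(X=rng.standard_normal((4, 8)),
                  A=(rng.random((4, 4)) < 0.3).astype(int)) for _ in range(3)]
print(node_history(snaps, 2).shape)  # (3, 8): node 2 is observed in all three snapshots
```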
Language Model-based Approaches. Recent works have explored the potential of large language models (LLMs) for reasoning over temporal graphs. LLM4DyG [395] represents a pioneering effort in benchmarking LLMs on spatial-temporal graph tasks, treating dynamic graphs as serialized sequences of adjacency matrices $\{\mathbf{A}^{t}\}_{t=1}^{T}$ with evolving node descriptions $\{d_{v_i}^{t}\}$. The study highlights that as graph size $N$ and temporal density $T$ increase, LLM performance declines, emphasizing the challenge of maintaining spatial-temporal consistency at scale. To address this, Disentangled Spatial-Temporal Thoughts (DST2) was introduced, explicitly separating spatial reasoning within $\mathcal{G}^{t}$ from temporal reasoning across successive snapshots $\mathcal{G}^{t-1} \rightarrow \mathcal{G}^{t}$, improving interpretability and prediction accuracy. A related line of work investigates how LLMs can model graph flow dynamics, where node states $y_i^{t}$ evolve based on local neighborhoods and historical attributes. FlowGPT [396] introduces a benchmark assessing LLMs' ability to capture diffusion processes such as SIR (Susceptible-Infected-Removed). By serializing dynamic graphs into time-ordered sequences of node states and adjacency matrices, FlowGPT evaluates how well LLMs trace propagation patterns and identify influential nodes. Together, these works highlight the challenges and potential of LLM-based reasoning over evolving temporal graphs.

Transfer Learning Approaches. To enhance the adaptability of foundation models across diverse temporal graphs, MiNT [392] proposes a multi-network pretraining strategy. Instead of training on a single dynamic graph, MiNT learns from a collection of networks $\{\mathcal{G}^{(k)}\}_{k=1}^{K}$, each spanning its own time horizon. The approach uses a single encoder to generate temporally-aware node representations $\mathbf{z}_i^{(k, t)}=\operatorname{GNN}(\mathbf{x}_i^{(k, t)}, \mathbf{A}^{(k, t)})$ that generalize to unseen graphs. By leveraging structural and temporal diversity, MiNT outperforms traditional temporal GNNs trained on individual datasets, demonstrating the promise of cross-network pretraining for temporal graph foundation models.
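The multi-network pretraining recipe can be sketched as a loop over several snapshot sequences that all update one shared encoder; the tiny one-layer encoder and the next-snapshot link-reconstruction loss below are placeholders for whatever temporal GNN and objective are actually used, not MiNT's implementation.

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Placeholder encoder: one feature transform followed by one propagation step."""
    def __init__(self, d_in=8, d_hid=16):
        super().__init__()
        self.lin = nn.Linear(d_in, d_hid)

    def forward(self, X, A):
        A_hat = A + torch.eye(A.size(0))          # add self-loops
        return torch.relu(A_hat @ self.lin(X))    # simple neighborhood aggregation

def link_loss(Z, A_next):
    """Score all node pairs and reconstruct the next snapshot's adjacency."""
    return nn.functional.binary_cross_entropy_with_logits(Z @ Z.t(), A_next)

encoder = TinyEncoder()
opt = torch.optim.Adam(encoder.parameters(), lr=1e-2)
# Two toy networks, each a sequence of three random snapshots over 5 nodes.
networks = [[(torch.randn(5, 8), torch.randint(0, 2, (5, 5)).float()) for _ in range(3)]
            for _ in range(2)]
for epoch in range(5):
    for graph in networks:                        # iterate over the network collection
        for (X_t, A_t), (_, A_next) in zip(graph, graph[1:]):
            loss = link_loss(encoder(X_t, A_t), A_next)
            opt.zero_grad(); loss.backward(); opt.step()
print("final loss:", float(loss))
```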
Future Directions. These advances highlight the unique challenges of temporal graph reasoning, including disentangled spatial-temporal modeling, cross-graph generalization, and flow-aware sequence modeling. Future work may explore hierarchical temporal abstraction to enrich node representations $\mathbf{z}_i^{t}$ with multi-scale embeddings, enabling reasoning over both short- and long-term patterns. Adaptive serialization that adjusts input granularity based on graph density or event frequency could enhance LLM adaptability. Lastly, hybrid architectures combining GNN-based spatial encoding with LLM-based temporal reasoning show promise for building expressive, interpretable temporal graph foundation models.

7.8 Academic Network

Academic citation graphs $\mathcal{G}=(\mathcal{V}, \mathcal{E})$ encode structural dependencies between research papers, where nodes $v_i \in \mathcal{V}$ represent papers with metadata and text $\mathbf{x}_i \in \mathbb{R}^{D}$, and edges $e_{ij} \in \mathcal{E}$ capture citation links. These graphs require modeling both semantic relevance and citation dynamics while balancing local consistency with global citation flow. Unlike static retrieval, citation graphs reflect evolving scientific discourse, demanding embeddings $\mathbf{z}_i$ that integrate node content, citation context, and temporal relevance. This necessitates dynamically contextualized retrieval, where citation edges complement textual similarity.
Graph-Augmented Retrieval and Generation. LitFM [391] pioneers this space by proposing the first literature foundation model explicitly designed to integrate academic citation graphs into LLM workflows. The framework uses a graph retriever that retrieves structurally relevant papers based on graph proximity in the adjacency matrix $\mathbf{A}$ and citation-aware embeddings $\mathbf{z}_i$, mitigating common LLM failures such as citation hallucination and knowledge incompleteness. Beyond simple graph-enhanced retrieval, LitFM also employs instruction tuning over domain-specific citation graphs, enabling the model to generalize across citation prediction, related work generation, and literature review summarization tasks. The development of LitFM highlights the potential of academic graph foundation models to reshape how scientific literature is processed, summarized, and cited within LLM ecosystems.

Future Directions. Future work could focus on enriching paper embeddings $\mathbf{z}_i$ by incorporating temporal citation patterns and multi-modal signals such as figures, tables, and equation graphs. Additionally, expanding instruction tuning to include task compositions could enable more complex scholarly reasoning tasks. As citation graphs continue to grow in scale and complexity, the fusion of LLM semantic capabilities with structural insights from citation graphs will remain central to building robust academic graph foundation models.
7.9 Causal Graph

Causal graphs, represented as directed acyclic graphs (DAGs) $\mathcal{G}=(\mathcal{V}, \mathcal{E})$, encode cause-effect relationships, where nodes $v_i$ denote variables and directed edges $e_{ij}$ capture causal links. Node features $\mathbf{x}_i \in \mathbb{R}^{D}$ may include contextual data such as text or metadata, forming hybrid semantic-causal structures. Designing causal GFMs poses unique challenges: the graphs are sparse and directional, requiring inference of asymmetric dependencies, and they often originate from noisy, text-heavy data. This demands joint reasoning over textual evidence and graph structure, making semantic grounding and structural consistency critical but difficult to achieve.

Causal Reasoning with Language Models. To address causal graph reasoning, Causal-LLM [393] introduces a zero-shot approach that constructs pairwise queries such as "Does $v_i$ cause $v_j$?" from unstructured text, iteratively building a global causal graph $\mathcal{G}$. Without explicit supervision, it leverages the causal reasoning abilities of LLMs for scalable discovery, though it struggles with indirect chains (e.g., $v_i \rightarrow v_k \rightarrow v_j$), exposing limitations in prompt design. Complementing this, CLEAR [397] proposes a benchmark that evaluates LLMs on twenty causal tasks, including D-separation, backdoor adjustment, and effect estimation. Models are tested on structural and textual views of $\mathcal{G}$, revealing performance drops on graphs with high-degree nodes or long paths. CLEAR also shows that phrasing and context order significantly affect results, highlighting the fragility of current prompting methods for causal reasoning.
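A minimal sketch of the pairwise-query strategy is shown below; the prompt wording and the `ask_llm` callable are hypothetical stand-ins for an actual LLM interface, and the toy answer function only serves to make the example runnable.

```python
from itertools import permutations

def build_causal_graph(variables, ask_llm):
    """Sketch of zero-shot pairwise causal discovery: pose a yes/no query for every
    ordered variable pair and keep the edges answered 'yes'. Purely pairwise querying
    cannot by itself resolve indirect chains such as v_i -> v_k -> v_j."""
    edges = []
    for vi, vj in permutations(variables, 2):
        prompt = f"Does '{vi}' directly cause '{vj}'? Answer yes or no."
        if ask_llm(prompt).strip().lower().startswith("yes"):
            edges.append((vi, vj))
    return edges

# Toy usage with a rule-based stand-in for the LLM.
def fake_llm(prompt: str) -> str:
    return "yes" if "'smoking' directly cause 'lung cancer'" in prompt else "no"

print(build_causal_graph(["smoking", "lung cancer", "yellow fingers"], fake_llm))
# -> [('smoking', 'lung cancer')]
```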
Future Directions. The emergence of causal GFMs presents a promising path toward scalable, interpretable causal discovery across diverse scientific fields. Future research may explore structured causal prompting, where queries incorporate domain-specific causal priors such as a preferred causal ordering (e.g., temporal precedence). Another promising direction is to augment node features $\mathbf{x}_i$ with confidence-aware embeddings derived from multiple noisy sources, enabling the model to express uncertainty-aware causal graphs. Moreover, integrating causal graph learning into multi-modal GFMs, where text, tables, and graphs jointly contribute to causal inference, could significantly enhance causal discovery in data-rich scientific domains. These innovations will further solidify causal graph foundation models as essential tools for automated scientific reasoning and knowledge discovery.

8 Theoretical Understandings

8.1 Emergence and Scaling Law

Foundation models exhibit emergence [62], where increasing model size, data availability, and total compute leads to significant improvements in performance. These phenomena are characterized through neural scaling laws [12, 13], which provide quantitative insights into model behavior under resource expansion. While scaling laws have been extensively studied in LLMs, replicating these observations in graph-based models remains an ongoing challenge.
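For reference, neural scaling laws are typically written as power laws in model and data size. One common parameterization, following the LLM scaling-law literature (the constants are domain-specific fits, not graph-specific results), is

$$
L(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
$$

where $L$ is the pretraining loss, $N$ the number of model parameters, $D$ the amount of training data, $E$ the irreducible loss, and $(A, B, \alpha, \beta)$ fitted constants. Under such a law, doubling the data shrinks the data-limited term by a factor of $2^{-\beta}$; for example, $\beta = 0.2$ corresponds to roughly a 13% reduction of that term.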
8.1.1 Well-Structured Graphs

Current studies on graph scaling laws primarily focus on well-structured graph domains, such as molecular and atomic systems [398, 399, 30, 328, 329]. In these settings, graphs exhibit naturally predefined structures where individual components carry intrinsic semantics, akin to the structured nature of language with its grammatical rules. This structural consistency facilitates the emergence of scaling properties in graph learning.

Empirical evidence supporting the existence of scaling laws in domain-specific graphs has emerged from molecular and atomic modeling. For instance, JMP [30] demonstrated improvements in atomic property prediction across diverse chemical domains by pretraining on large-scale datasets. The pretraining corpus comprised approximately 120 million systems from four distinct sources: OC20 (100M examples), OC22 (8M examples), ANI-1x (2M examples), and Transition-1x (10M examples). These datasets included both equilibrium and non-equilibrium atomic structures, with energy and force labels serving as primary supervision signals. Notably, this work illustrated that a pretrained model could acquire transferable knowledge, requiring minimal labeled data for downstream adaptation. Following a similar paradigm, DPA-2 [328] leveraged large-scale multi-task pretraining over 10 million atomic structures, facilitating knowledge transfer to unseen tasks. These studies [30, 328, 329] reinforce the understanding that scaling graph models with natural structures enhances performance, mirroring trends observed in LLMs and vision models.

To further quantify emergence in molecular graphs, existing studies have sought to determine the critical thresholds of data and model size required to achieve scaling laws [398, 399]. Frey et al. [398] investigated neural scaling behaviors in large-scale graph-based chemical models, identifying key scaling exponents that capture performance improvements as model capacity and dataset size increase. Their findings revealed a scaling exponent of 0.17 for the largest dataset and 0.26 for equivariant GNN-based interatomic potential models, indicating measurable gains in pretraining loss with increased resources. Similarly, Chen et al. [399] conducted an extensive analysis of scaling laws in molecular graphs, examining the impact of data modality, dataset partitioning strategies, pretraining paradigms, and model capacity constraints. Their key insights include: (1) Modality dependence: different molecular representations exhibit distinct scaling behaviors; graph-based and fingerprint-based encodings demonstrate the highest data efficiency, whereas SMILES-based representations exhibit diminished performance improvements as dataset size increases. (2) Pretraining efficacy: pretraining provides significant benefits in low-data regimes but exhibits diminishing returns in high-data scenarios, where negative transfer effects may arise. (3) Dataset partitioning: the efficiency of data utilization varies with the partitioning strategy; random splits yield the highest efficiency, while scaffold-based and imbalanced splits introduce distribution shifts that lower learning effectiveness. (4) Model capacity trade-offs: no straightforward relationship exists between dataset size and optimal model capacity; in some cases, smaller datasets necessitate larger models to achieve peak performance.

Beyond molecular graphs, scaling laws have also been explored in temporal graphs. MiNT [392] analyzed scaling behaviors in dynamic transaction networks, leveraging a collection of 84 temporal graphs. By pretraining on 64 networks and evaluating transferability on 20 unseen networks, MiNT achieved superior zero-shot performance, surpassing models trained on individual datasets. Crucially, the study demonstrated a consistent improvement in performance with increasing numbers of training networks.
8.1.2 General Graphs

While scaling laws have been studied in structured domains, such as molecular and atomic graphs, their applicability to general-purpose GFMs remains largely unexplored. Recent efforts have investigated scaling behavior from both supervised learning [400] and self-supervised learning [401] perspectives. Under supervised learning settings, Liu et al. [400] analyzed scaling behavior in GFMs trained with up to 100 million parameters and 50 million samples. Their findings indicate that model depth plays a pivotal role in scaling performance. Furthermore, they observed that traditional measures of data volume, such as the number of graph instances, are ineffective due to the irregular size of individual graphs. Instead, they proposed using the number of nodes or edges as a more reliable metric for defining scaling laws in graph data. From a self-supervised learning perspective, Ma et al. [401] examined whether existing graph SSL techniques exhibit consistent neural scaling behavior. Their analysis revealed that while SSL loss continues to improve with increasing data and model sizes, downstream performance remains highly sensitive to architectural choices and pretext task design. Unlike in other domains, where larger datasets and models consistently yield better performance, graph SSL methods do not exhibit clear scaling trends. Consequently, they argue that existing graph SSL frameworks may not yet be suitable for training scalable GFMs. In addition, several studies have proposed advanced models [64, 76] and evaluated their scaling behavior. However, these analyses are often constrained to small graph datasets [64] or focus solely on model scaling rather than data scaling effects [76].

These observations raise a fundamental question: do scaling laws inherently exist in graph learning? If not, what requirements must be satisfied to achieve a true scaling law for GFMs? In Section 10.1, we explore these open questions and propose potential directions for future research.
8.2 Transferability

Transferability refers to a model's capability to extract patterns from source tasks and apply this knowledge to enhance performance on related target tasks [402, 403]. The principle behind transferability is that pretrained models capture general, transferable patterns across domains. For example, on textual data, the transferable patterns can be treated as tokens, words, and phrases; on image data, they can be treated as contours, colors, textures, and edges of an image. Understanding transferable patterns is essential for developing graph foundation models. Below, we discuss the transferability of GFMs from the single-task and cross-task perspectives, respectively.

8.2.1 Single-Task Transferability

Node-Level Tasks. Node-level tasks focus on understanding the properties of individual nodes within a graph, where these properties are typically influenced by the attributes of neighboring nodes. Depending on the nature of node interactions, connections can follow one of two fundamental principles: "similarity attracts" (homophily) or "opposites attract" (heterophily). Specifically, homophily means that connected nodes exhibit similar characteristics, whereas heterophily means that connected nodes possess dissimilar attributes. A primary challenge in ensuring the transferability of node-level tasks lies in designing a unified model capable of capturing both homophily and heterophily patterns. Standard GNN architectures struggle to generalize across both types of graphs, often exhibiting poor performance when applied jointly [404]. To address this issue, recent approaches incorporate additional textual descriptions to provide contextual information [24] or employ learnable prediction aggregation mechanisms [66] to adaptively model different node interaction patterns.
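The homophily/heterophily distinction is often quantified by the edge homophily ratio; a minimal computation is sketched below.

```python
import numpy as np

def edge_homophily(A: np.ndarray, labels: np.ndarray) -> float:
    """Edge homophily ratio: fraction of edges joining nodes with the same label.
    Values near 1 indicate homophilous graphs ('similarity attracts'), values near 0
    indicate heterophilous graphs ('opposites attract'); a single GFM is expected to
    handle both regimes."""
    src, dst = np.nonzero(np.triu(A, k=1))          # undirected edges, each counted once
    if len(src) == 0:
        return float("nan")
    return float(np.mean(labels[src] == labels[dst]))

# Toy graph: a triangle of label-0 nodes plus one label-1 node attached to it.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])
labels = np.array([0, 0, 0, 1])
print(edge_homophily(A, labels))  # 3 of 4 edges connect same-label nodes -> 0.75
```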
Link-Level Tasks. Link-level tasks require models to capture relational structures between node pairs, often relying on proximity-based measures to determine the likelihood of a connection. More precisely, if two nodes share common neighborhoods, they are more likely to be linked. Depending on the extent of neighborhood overlap, proximity can be classified into two levels: (1) local proximity, where nodes share direct neighbors, and (2) global proximity, where nodes exhibit high-order neighborhood relationships. Effectively modeling these proximity patterns is challenging, as traditional message-passing GNNs lack the necessary expressiveness to capture link isomorphism [405]. To enhance relational modeling capabilities, advanced techniques such as labeling tricks [405] introduce additional structural knowledge or positional embeddings to enrich link representations, enabling improved expressiveness.
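A small example of such proximity-based measures, assuming an undirected adjacency matrix, is given below; labeling tricks aim to make exactly this kind of pairwise overlap information available to the GNN.

```python
import numpy as np

def local_proximity_scores(A: np.ndarray, i: int, j: int):
    """Two classic proximity heuristics for link prediction: common-neighbor count
    (local proximity) and Jaccard overlap of the two neighborhoods. Plain
    message-passing GNNs compute node embeddings independently and can miss this
    pairwise overlap signal."""
    Ni, Nj = set(np.nonzero(A[i])[0]), set(np.nonzero(A[j])[0])
    common = len(Ni & Nj)
    jaccard = common / len(Ni | Nj) if Ni | Nj else 0.0
    return common, jaccard

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]])
print(local_proximity_scores(A, 0, 3))  # nodes 0 and 3 share neighbors {1, 2} -> (2, 1.0)
```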
Graph-Level Tasks. Graph-level tasks involve learning representations that capture distinctive substructures known as graph motifs: small, recurring patterns that define structural properties within a graph. The complexity of graph-level transferability arises from the fact that motif distributions vary across different graphs, requiring models to identify shared motifs across diverse motif sets, which can potentially be achieved via disentangled representation learning [406] or invariance learning [407]. However, another fundamental challenge lies in the expressiveness limitations of message-passing GNNs, which are inherently constrained by the 1-WL test [210]. Even if shared motif sets can be identified, standard GNN architectures may fail to encode them effectively. To address this limitation, expressive GNNs [408] have been proposed to enhance motif encoding capabilities. For a more detailed discussion, including network analysis, model expressiveness, and stability, we refer readers to a recent study [36] that analyzes these challenges in depth.

8.2.2 Cross-Task Transferability

Graphon. Graphon theory [409, 410, 288] has been explored as a theoretical foundation for identifying transferable patterns in graphs. If two graphs are generated from the same graphon, they are expected to share similar topological properties, which can result in high transferability. Ruiz et al. [409] established theoretical bounds on the embeddings of two graphs sampled from the same graphon, while Cao et al. [410] employed graphon theory to analyze transferability in pretraining and fine-tuning settings. By mapping pre-trained graphs into a graphon space, they ensured transferability if the target graph could be generated within this space. Extending this work, Sun et al. [288] proposed a fine-tuning strategy based on graphon theory. Despite its potential, the applicability of graphon theory to real-world graphs is constrained by its strong underlying assumptions [411]. Additionally, even when these assumptions hold, identifying a shared graphon from a large set of cross-domain graphs remains a significant challenge, limiting the practical utility of graphon-based methods in GFM design.

Substructures. An alternative approach to defining transferable patterns involves leveraging recurring substructures such as triangles, stars, and $k$-cliques [293, 36]. These motifs frequently appear across different graph domains but may carry different semantic meanings. For instance, triangles commonly occur in citation networks, social networks, and molecular graphs, albeit with domain-specific interpretations. Building on this observation, recent studies [173, 22] have proposed subgraph-based learning frameworks, where sampled subgraphs containing such structures are encoded using GNNs for prediction. From a theoretical perspective, these methods leverage graph spectrum analysis to quantify transferability. Levie et al. [412] analyzed transferability through stability, asserting that effective transfer should minimize sensitivity to small perturbations. Similarly, Levie et al. [411] demonstrated that transferability is feasible when different graphs discretize the same underlying space. Zhu et al. [195] further reinforced this perspective by showing that higher similarity in ego-graph distributions correlates with better transferability.

Tree Structures. While substructures (motifs) provide a promising foundation for transferable patterns, they are not always learnable by GNNs. Traditional message-passing GNNs, constrained by the 1-WL test [413, 408], struggle to distinguish certain motifs [207, 208, 210], such as stars, conjoint cycles, and $k$-cliques. To address this limitation, recent works have explored subtree-based representations to define transferable patterns. Wang et al. [23] were the first to propose subtree structures as fundamental transferable patterns, evaluating their effectiveness both empirically and theoretically. Building on this, Wang et al. [76] further established theoretical guarantees on the stability, transferability, and generalization of tree-based patterns. The key advantage of subtree-based methods is that message-passing GNNs can fully capture subtree structures [413]. However, a major limitation is that tree-based representations inherently discard certain structural dependencies, potentially leading to information loss.
9 Dataset Resources
Graph-structured data is prevalent across domains, leading to diverse benchmarks for evaluating graph learning methods. These benchmarks span e-commerce, academic citation networks, knowledge bases, molecular science, temporal graphs, social networks, brain graphs, and images, and collectively cover a wide range of scales, features, and graph-based tasks.
9.1 Tasks and Domains Overview
As shown in Table 14, the datasets span various domains, each characterized by distinct structural and task-related properties. This diversity enables comprehensive evaluation of graph learning methodologies across multiple application scenarios, providing insights into both generalizability and domain-specific performance characteristics.
9.1.1 Tasks
Node Classification. Node classification involves predicting labels for nodes within a graph, such as categorizing academic papers based on research areas or classifying products into specific categories. The task necessitates effective encoding of both node-specific features and neighborhood structural information to achieve optimal performance.
Link Prediction. Link prediction aims to forecast whether an edge exists between pairs of nodes or predict future edges in evolving networks. It is critical in recommendation systems and knowledge base completion. The fundamental challenge lies in effectively modeling node similarity based on both structural proximity and feature compatibility, often requiring sophisticated embedding techniques that capture higher-order connectivity patterns.
Graph Classification. Graph classification tasks predict labels for entire graphs and are widely applied in molecular property prediction and social network analysis. This paradigm necessitates the development of effective graph-level representations that preserve both local substructure information and global topological characteristics. Graph classification frameworks typically implement hierarchical pooling mechanisms to progressively coarsen graph representations, analogous to the pooling operations in convolutional neural networks for computer vision tasks. The efficacy of such approaches often depends on their ability to identify discriminative subgraphs within the broader graph structure.
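The three task families above can share a single encoder and differ only in their prediction heads. The following sketch is a minimal, self-contained illustration of that pattern; the plain-PyTorch mean-aggregation layer, dense adjacency, dot-product link scorer, and mean-pooling readout (and the names SimpleGNNEncoder and MultiTaskGraphModel) are illustrative assumptions rather than any specific published architecture.

```python
import torch
import torch.nn as nn

class SimpleGNNEncoder(nn.Module):
    """One round of mean-aggregation message passing over a dense adjacency matrix."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, hid_dim)

    def forward(self, x, adj):
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1)
        return torch.relu(self.lin((adj @ x) / deg + x))

class MultiTaskGraphModel(nn.Module):
    """A shared encoder with separate heads for the three canonical graph tasks."""
    def __init__(self, in_dim, hid_dim, n_node_classes, n_graph_classes):
        super().__init__()
        self.encoder = SimpleGNNEncoder(in_dim, hid_dim)
        self.node_head = nn.Linear(hid_dim, n_node_classes)
        self.graph_head = nn.Linear(hid_dim, n_graph_classes)

    def forward(self, x, adj, edge_pairs):
        h = self.encoder(x, adj)                                      # node embeddings
        node_logits = self.node_head(h)                               # node classification
        link_scores = (h[edge_pairs[0]] * h[edge_pairs[1]]).sum(-1)   # link prediction
        graph_logits = self.graph_head(h.mean(dim=0))                 # graph classification via mean readout
        return node_logits, link_scores, graph_logits

x, adj = torch.randn(10, 16), (torch.rand(10, 10) > 0.7).float()
model = MultiTaskGraphModel(16, 32, n_node_classes=4, n_graph_classes=2)
node_logits, link_scores, graph_logits = model(x, adj, torch.tensor([[0, 1], [2, 3]]))
```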
9.1.2 Domains
E-commerce. Datasets from e-commerce platforms typically feature large-scale product graphs, where nodes represent products and edges represent user interactions or product similarities. These graphs frequently exhibit complex heterogeneous structures with multiple edge types representing diverse interaction modalities such as co-purchasing, co-viewing, and semantic similarity.
Academia. Academic datasets model citation networks, where nodes represent papers and edges represent citation relationships, enabling research area classification and citation prediction tasks. Such networks typically manifest temporal evolution characteristics, as citations accumulate over time and research trajectories evolve, presenting opportunities for dynamic graph modeling approaches.
Knowledge Bases. Knowledge base datasets comprise structured entities and their interrelations, predominantly employed in link prediction tasks to infer missing relationships or validate entity links. These datasets often incorporate ontological constraints and hierarchical structures that introduce inductive biases beneficial for reasoning tasks.
Molecular Science. Molecular datasets consist of numerous small graphs representing molecular structures, primarily used in property prediction, drug discovery, and biochemical analysis tasks. These graphs exhibit strong regularity in node degree distributions and edge formations, reflecting the physical constraints of chemical bonding principles.
Table 14: Overview of benchmark datasets used in graph learning research. The table categorizes datasets based on their domain, task type, and structural properties, including the number of nodes, edges, and classes. It also indicates whether datasets contain text attributes.
Temporal Graphs. Temporal datasets feature dynamic graphs evolving over time, suitable for tasks such as temporal link prediction and anomaly detection in evolving networks. These datasets capture longitudinal structural transitions, enabling the modeling of evolutionary patterns and temporal dependencies in graph structures.
Social Networks. Social network datasets represent interactions among users, facilitating tasks like community detection, influence prediction, and content recommendation. These networks typically exhibit distinctive properties such as high clustering coefficients, small-world phenomena, and scale-free degree distributions that influence algorithm design considerations.
Brain Graphs. Brain graph datasets model neuronal connectivity or vascular structures, supporting tasks such as neurological disorder diagnosis and anatomical studies. These graphs present unique challenges due to their inherent multi-scale organization, from microscopic neuronal circuits to macroscopic brain regions connected through white matter tracts.
Images. Image-based graph datasets represent visual structures such as fingerprints, enabling tasks related to visual pattern recognition and image classification. These graphs typically encode spatial relationships between visual elements, transforming grid-structured image data into irregular graph structures that capture meaningful object-part relationships.
9.2 Benchmark Descriptions
Recent graph learning research has introduced diverse benchmarks across multiple domains. Text-space Graph Foundation Models [414] provides 13 text-attributed benchmarks ranging from large-scale e-commerce networks (Products: 316K nodes) to smaller academic graphs (Cora), focusing on node and link prediction. Knowledge Base Benchmarks [28] offer text-enhanced versions of standard knowledge graphs (WN18RR, FB15K237) for semantic link prediction tasks. Temporal Graph Benchmarks [415] include evolving datasets such as ICEWS1819 for temporal link prediction in dynamic structures. TAGLAS [416] introduces text-attributed molecular datasets (Chemblpre, PCBA) with hundreds of thousands of graphs for property prediction in drug discovery. Graph Pattern Recognition [369] presents structure-only benchmarks (ENZYMES, MUTAG) to evaluate topology comprehension without text attributes. Special-Purpose Benchmarks cover domain-specific tasks including brain connectivity (ogbl-vessel [419]), recommendation systems (MovieLens [99]), and social networks (Reddit, Flickr [418]).
10 Open Questions
10.1 How to Enhance Scalability?
LLMs achieve scalability through the scaling law, where larger models and more training data lead to significant performance improvements. However, such a trend has yet to emerge in existing GFMs. To establish a true scaling law on graphs, we highlight three key aspects.
Better Graph Backbones. GNNs suffer from intrinsic limitations, including over-smoothing [420], over-squashing [421], inadequate long-range dependency modeling [422], and limited expressiveness in capturing graph substructures [413]. These limitations hinder both model scalability [423] and efficient multi-GPU training [424]. To overcome these challenges, designing a more scalable graph backbone is imperative. Inspired by the success of transformer architectures in NLP and CV foundation models [56, 425], transformer-based architectures have emerged as potential candidates for graph learning [85, 87]. Basic graph transformers tokenize graphs into node sequences, but their applicability is constrained to small-scale graphs due to quadratic complexity [426]. Recent advancements propose leveraging substructures as graph tokens [86], where sequences of substructure patterns serve as representations for nodes, edges, and entire graphs. This method significantly reduces encoding complexity and improves scalability [86], making it a promising direction for future GFMs.
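As a rough illustration of substructure tokenization (a minimal sketch of the general idea, not the tokenizer used in [86]), the code below represents a node as a short sequence of tokens, each summarizing a random-walk-induced subgraph, and runs a standard transformer encoder over that sequence; attention cost then scales with the number of tokens rather than the number of nodes. The walk length, token count, and mean-feature summarization are illustrative assumptions.

```python
import random
import torch
import torch.nn as nn
import networkx as nx

def sample_substructure_tokens(G, feats, node, n_tokens=8, walk_len=4):
    """Represent a node as a sequence of substructure tokens: each token is the
    mean feature of the small subgraph visited by a short random walk from the node."""
    tokens = []
    for _ in range(n_tokens):
        walk, cur = [node], node
        for _ in range(walk_len):
            nbrs = list(G.neighbors(cur))
            if not nbrs:
                break
            cur = random.choice(nbrs)
            walk.append(cur)
        tokens.append(feats[list(set(walk))].mean(dim=0))
    return torch.stack(tokens)                                  # (n_tokens, feat_dim)

G = nx.karate_club_graph()
feats = torch.randn(G.number_of_nodes(), 32)
tokens = sample_substructure_tokens(G, feats, node=0)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True), num_layers=2
)
node_repr = encoder(tokens.unsqueeze(0)).mean(dim=1)            # attention over substructures, not all node pairs
```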
Better Pretraining Objectives. A well-designed pretraining objective is essential for extracting transferable knowledge from large-scale datasets. In NLP and vision, foundation models predominantly employ generative pretraining, such as next-token prediction [56], to capture meaningful semantics. In contrast, most graph self-supervised learning approaches rely on contrastive pretraining, which has shown limited effectiveness compared to generative pretraining in other domains [120]. While some studies have explored reconstruction-based objectives for graphs [121], they primarily focus on low-level semantics, such as nodes and edges. This differs from NLP and CV, where models reconstruct word tokens and image patches, preserving high-level semantics [427]. As a result, existing generative graph methods fail to outperform contrastive learning. To achieve meaningful pretraining for GFMs, it is crucial to shift toward reconstructing high-level semantic structures.
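One way to read this suggestion, sketched below as a toy objective rather than any published method: mask node features, encode the corrupted graph, and reconstruct a pooled neighborhood embedding (a stand-in for "higher-level" semantics) instead of the raw features. The MiniGNN encoder, the neighborhood-mean target, and the 30% mask rate are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class MiniGNN(nn.Module):
    """Tiny mean-aggregation encoder, included only to keep the sketch self-contained."""
    def __init__(self, d_in, d_hid):
        super().__init__()
        self.lin = nn.Linear(d_in, d_hid)

    def forward(self, x, adj):
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        return torch.relu(self.lin((adj @ x) / deg + x))

def high_level_reconstruction_loss(encoder, target_encoder, x, adj, mask_rate=0.3):
    """Toy generative objective: mask node features, encode the corrupted graph, and
    reconstruct a neighborhood-pooled embedding rather than the raw node features."""
    mask = torch.rand(x.size(0)) < mask_rate
    mask[0] = True                                            # ensure at least one masked node
    x_corrupt = x.clone()
    x_corrupt[mask] = 0.0                                     # drop masked node features
    h = encoder(x_corrupt, adj)                               # predictions from the corrupted graph
    with torch.no_grad():                                     # frozen target provides the reconstruction signal
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        target = (adj @ target_encoder(x, adj)) / deg         # neighborhood-pooled target embedding
    return ((h[mask] - target[mask]) ** 2).mean()

enc, tgt = MiniGNN(16, 32), MiniGNN(16, 32)
x, adj = torch.randn(10, 16), (torch.rand(10, 10) > 0.7).float()
loss = high_level_reconstruction_loss(enc, tgt, x, adj)
```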
Better Learning Instances. In LLMs, sentences composed of word tokens serve as basic learning instances, while in VLMs, images consisting of visual patches act as primary learning instances [23]. Training on these instances allows foundation models to acquire transferable and scalable knowledge. However, it remains unclear which learning instances (nodes, edges, or entire graphs) should be scaled in GFMs [36]. Moreover, different graph-based tasks rely on distinct learning instances; node-level tasks focus on nodes, whereas graph-level tasks emphasize graphs. To align instances across tasks, unified learning instances such as subgraphs [173] and trees [23] have been proposed. However, a critical question remains: can scaling these unified learning instances facilitate the acquisition of cross-task knowledge?
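A minimal sketch of the "unified instance" idea, assuming k-hop ego-subgraphs as the instance type (the cited works use more elaborate subgraph and tree constructions): every node yields one instance, which can carry the node's label for node-level tasks or be pooled for graph-level tasks.

```python
import networkx as nx

def ego_subgraph_instances(G: nx.Graph, radius: int = 2) -> dict:
    """Unified learning instances: one k-hop ego-subgraph per node, usable for
    node-level tasks (label = center node's label) and pooled for graph-level tasks."""
    return {v: nx.ego_graph(G, v, radius=radius) for v in G.nodes()}

instances = ego_subgraph_instances(nx.karate_club_graph())
print(len(instances), instances[0].number_of_nodes())
```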
10.2 How to Mitigate Data Scarcity?
The effectiveness of existing foundation models, such as LLMs and LVMs, is largely attributed to their data-driven learning paradigm. Unlike textual and image data, which are readily accessible from online sources, obtaining graph datasets presents a significant challenge. To address the issue of data scarcity in graph learning, we outline three promising directions.
Automated Graph Collection Pipelines. The success of LLMs and LVMs has been facilitated by extensive datasets curated through automated web scraping techniques. Unlike text or images, which can be extracted from publicly available sources, graph construction often requires explicit human curation, as graphs encode semantic and domain-specific relationships. Recent advances in LLMs suggest that automated dataset construction [428] could serve as a viable solution for acquiring graph data. Leveraging such techniques, it is possible to systematically extract structured relationships from diverse sources, including academic repositories, biomedical databases, and online knowledge graphs.
Synthetic Data Generation. Beyond directly collecting new datasets, data augmentation and synthesis techniques have been widely adopted in other domains to mitigate data scarcity. In LLMs, knowledge distillation and data synthesis [429] have emerged as prominent strategies for enhancing model generalization. Similarly, in graph learning, recent advances in generative models, such as graph diffusion models [295], have enabled the augmentation of graph structures with synthetic data. For example, diffusion models have been employed to enhance structural diversity in graph datasets [70], while LLMs have been utilized to generate synthetic text attributes for text-attributed graphs [211]. These techniques provide a pathway to augmenting existing datasets with novel graph instances. Developing robust graph generation techniques that preserve the structural and semantic integrity of real-world graphs remains an important direction for future research.
Acquiring High-Quality Graph Data. While increasing dataset size is a common strategy for improving model performance, recent studies in LLM pretraining [430] and instruction tuning [431] have demonstrated that high-quality data can be more valuable than sheer data volume. In other words, models trained on a small set of high-quality data can achieve performance comparable to or even superior to those trained on large-scale low-quality datasets. Applying this principle to graph learning suggests that curating high-quality graph datasets may yield greater benefits than simply scaling dataset size. However, graph datasets are often incomplete, and evaluating their quality is inherently challenging. Although various data valuation techniques [432, 433] have been proposed to assess data contributions in machine learning, defining quality metrics for graph data remains an open problem. The quality of graph data is highly dependent on the chosen backbone architecture and pretraining strategy, necessitating further research into effective graph data valuation methodologies.
10.3 How to Better Evaluate GFMs?
Evaluating the effectiveness of GFMs is crucial. However, due to their broad applicability across diverse domains, traditional evaluation approaches using small-scale benchmarks are often insufficient. In this section, we discuss two aspects necessary for advancing the evaluation framework.
Developing Advanced Benchmarks. Assessing the power of foundation models requires well-designed benchmarks. However, existing graph benchmarks suffer from several limitations: (1) they often lack transformative real-world applications, (2) they are constructed in ways that do not meaningfully reflect practical use cases, and (3) the benchmarking culture in the graph community has been criticized for its inconsistencies and reproducibility issues [434]. Given these challenges, constructing high-quality, large-scale benchmarks that align with real-world scenarios is essential for evaluating GFMs effectively. Such benchmarks should encompass diverse graph structures, multiple learning tasks, and varying levels of supervision to provide a comprehensive assessment of model capabilities.
Beyond Accuracy: Evaluating Generalization, Robustness, and Trustworthiness. Traditional evaluation metrics, such as accuracy, are insufficient to capture the full potential of GFMs. Beyond raw performance on benchmark datasets, it is critical to assess their generalization ability across different domains, their robustness against adversarial and noisy data, and their trustworthiness in high-stakes applications. Developing novel evaluation metrics and benchmarks that explicitly test these dimensions is necessary for a more holistic understanding of GFM capabilities. Future work should explore methodologies for measuring domain adaptation, reliability under distribution shifts, interpretability, and ethical considerations in graph-based decision-making.
10.4 How to Better Utilize GFMs?
Effectively leveraging GFMs is crucial for maximizing their impact across various domains. In this section, we discuss several key aspects that can enhance the adaptability, applicability, and multimodal integration of GFMs.
Advanced Adaptation Methods. LLMs have demonstrated remarkable capabilities in in-context learning and zero-shot generalization, allowing models to adapt to new tasks without fine-tuning. While recent advancements in graph prompt learning [435] have introduced similar adaptation techniques for GFMs, these approaches still require fine-tuning of prompt tokens to align with downstream tasks. An alternative direction follows the paradigm of VLMs, where LLMs are used to reason over structured graph data [227]. However, efficiently incorporating graph knowledge into this framework remains a significant challenge. This raises an important question: can we develop methods that enable seamless adaptation of GFMs without fine-tuning? Recently, large visual models (LVMs) [109] have demonstrated the feasibility of handling diverse visual tasks in an autoregressive manner without relying on language-based interfaces. Extending this concept to graphs could enable GFMs to perform true zero-shot and in-context learning, eliminating the need for explicit task-specific adaptations.
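For contrast with the fully tuning-free setting discussed above, the sketch below shows the kind of lightweight adaptation that graph prompt learning currently relies on: the pretrained encoder is frozen, and only a small feature-space prompt plus a task head are trained. The additive feature prompt, the PromptedGNN name, and the assumed (x, adj) encoder signature are illustrative simplifications, not the design of any particular prompt-learning method.

```python
import torch
import torch.nn as nn

class PromptedGNN(nn.Module):
    """Graph prompt tuning sketch: freeze the pretrained encoder and train only a small
    additive feature prompt plus a task head on the downstream dataset."""
    def __init__(self, pretrained_encoder, feat_dim, hid_dim, n_classes):
        super().__init__()
        self.encoder = pretrained_encoder                  # any module mapping (x, adj) -> (n, hid_dim)
        for p in self.encoder.parameters():
            p.requires_grad_(False)                        # backbone stays frozen
        self.prompt = nn.Parameter(torch.zeros(feat_dim))  # learnable prompt added to every node feature
        self.head = nn.Linear(hid_dim, n_classes)

    def forward(self, x, adj):
        return self.head(self.encoder(x + self.prompt, adj))

# Usage (assuming `pretrained` maps (x, adj) to hid_dim embeddings):
# model = PromptedGNN(pretrained, feat_dim=16, hid_dim=32, n_classes=4)
# Only model.prompt and model.head receive gradients during downstream training.
```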
Killer Applications. LLMs have demonstrated their capability to handle complex, domain-specific problems. While GFMs have shown success in applications such as social network analysis, drug property prediction, and recommender systems, these tasks can often be addressed effectively using traditional graph learning models, reducing the necessity for a GFM. Thus, it is essential to identify high-impact applications where simple graph learning approaches fall short. These applications should be sufficiently complex and demand capabilities that only GFMs can provide. Potential directions include: (1) chip design, optimizing circuit layouts through learned structural representations [436]; (2) combinatorial optimization, addressing NP-hard graph problems using scalable GFM-based solutions [437]; and (3) relational databases, enhancing query optimization and knowledge extraction from structured database systems [438].
Integrating Multimodal Knowledge. Graph data inherently encapsulates structured knowledge spanning multiple modalities. For instance, molecular data can be represented as graphs, sequences, textual descriptions, or even photographic images; social networks involve diverse data types, including user names, job positions, profile images, structured activity logs, and historical interactions. Designing models that effectively integrate and process multimodal graph information remains a challenge. One crucial question is whether a unified model should be designed to handle different modalities collectively, or if modality-specific models should be employed to leverage their unique strengths.
Human-in-the-Loop. While GFMs offer strong generalization, many real-world scenarios, such as drug discovery, recommendation, and scientific workflows, require iterative human feedback or domain-specific supervision. Integrating human-in-the-loop mechanisms enables model correction, prompt refinement, and interactive adaptation, improving alignment with expert knowledge and practical needs. As GFMs scale toward broader deployment, human involvement will be essential for enhancing interpretability, control, and real-world reliability.
10.5 How to Advance Theoretical Understanding?
A deeper theoretical understanding of GFMs is essential for improving their effectiveness, reliability, and generalizability. In this section, we discuss several areas of theoretical investigation that remain open challenges.
Transferability. Graphs encode complex, often manually defined relationships, making their transferable knowledge less intuitive compared to text or images. While some empirical studies suggest the possibility of transferability even across seemingly unrelated domains, there remains a lack of both theoretical and intuitive explanations for such phenomena. For instance, it is difficult to conceptualize shared transferable patterns between social networks and molecular graphs due to the stark differences in their structural distributions. Recent works [23, 76] have attempted to characterize transferability between graphs from different domains by analyzing subtree distributions. While these studies provide valuable insights, treating graphs solely as compositions of trees inevitably results in the loss of structural information [413]. A comprehensive theoretical framework is needed to define and quantify the transferable knowledge within and across graph domains, paving the way for more effective transfer learning strategies in GFMs.
Pattern Conflict Issue. Identifying cross-task transferable patterns in GFMs is even more challenging due to pattern conflicts. This issue arises when the same structural pattern carries different semantic meanings across diverse domains, leading to potential inconsistencies in learned representations. Consider a pretraining scenario where the model is trained on datasets spanning multiple domains, such as social networks and molecular networks. Suppose the model learns to recognize and leverage triangle structures during pretraining. In social networks, triangles often represent stability, following the principle of "the friend of my friend is my friend." However, in molecular graphs, triangular patterns may indicate instability due to specific chemical constraints. This fundamental discrepancy in interpretation can severely degrade model performance [410, 36]. Existing methods [23, 22] either fail to address or only partially mitigate [76] the pattern conflict issue. Developing strategies to effectively resolve this challenge is crucial for constructing truly generalizable GFMs.
Robustness and Trustworthiness. Real-world graphs exhibit various undesirable properties, including long-tail distributions, incompleteness, class imbalances, limited labeled data, and structural alterations. To develop robust and trustworthy GFMs, it is essential to understand how these models respond to such challenges. A promising direction involves analyzing the stability of GFMs under structural distribution shifts [76]. This includes studying their resilience to adversarial perturbations, handling missing or noisy data, and ensuring fairness in decision-making. Establishing theoretical guarantees for robustness will be crucial for deploying GFMs in high-stakes applications such as healthcare, finance, and cybersecurity.
Generalization. Balancing model fitting and generalization is fundamental in machine learning. Overemphasizing fitting capacity can lead to overfitting on specific datasets, impairing performance on unseen graphs, whereas excessive focus on generalization may compromise predictive accuracy on in-distribution tasks. Understanding the generalization of GFMs is crucial to optimizing this trade-off. A pioneering work [76] established generalization bounds for GFMs using subtree-based learning tokens. However, a broader generalization analysis that extends beyond tree structures is necessary to derive more applicable insights.
11 Conclusion
Graph Foundation Models represent a transformative paradigm in graph machine learning, aspiring to replicate the success of foundation models in natural language processing and computer vision within the structured domain of graphs. In this survey, we present a comprehensive and systematic review of the emerging landscape of GFMs. We begin by contextualizing their development, outlining the fundamental challenges that arise from graph heterogeneity, non-Euclidean structure, and cross-domain transferability. To unify the diverse body of work, we propose a general framework that decomposes GFMs into modular components, encompassing backbone architectures, pretraining strategies, and adaptation mechanisms.
We categorize GFMs into three major classes: universal GFMs, which aim for broad generalization across tasks and domains; task-specific GFMs, which prioritize performance on focused objectives like link prediction or node classification; and domain-specific GFMs, which target specialized applications such as molecules, knowledge graphs, and computational graphs. For each category, we analyze core design principles, review representative methods, and conduct comparative analyses to highlight their relative strengths and limitations. Beyond empirical trends, we further investigate the theoretical underpinnings of GFMs, offering insights into expressiveness, transferability, and generalization guarantees. Despite notable progress, the field faces several open challenges. These include the scalability of GFMs to massive graphs; the integration of multimodal signals; the development of principled evaluation protocols; and the formulation of theoretical foundations that explain transferability and generalization.
Looking ahead, we envision GFMs as foundational infrastructure for general-purpose graph intelligence. Future research should focus on building more scalable, interpretable, and adaptable architectures, expanding graph pretraining corpora across real-world domains, and advancing theoretical frameworks that explain their behavior. By bridging structural inductive biases, GFMs hold immense promise for enabling new potentials in scientific discovery, industrial systems, and decision-making over structured data.
References
[1] Batta Mahesh et al. Machine learning algorithms-a review. IJSR, 2020.
[2] Guozhu Dong and Huan Liu. Feature engineering for machine learning and data analytics. CRC press, 2018.
[3] Hetal Bhavsar and Amit Ganatra. A comparative study of training algorithms for supervised machine learning. IJSCE, 2012.
[4] Ethem Alpaydin. Machine learning. MIT press, 2021.
[5] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 2015.
[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
[7] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 1997.
[8] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv, 2014.
[9] Fuzhen Zhuang, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, Hengshu Zhu, Hui Xiong, and Qing He. A comprehensive survey on transfer learning. Proceedings of the IEEE, 2020.
[10] Xiao Liu, Fanjin Zhang, Zhenyu Hou, Li Mian, Zhaoyu Wang, Jing Zhang, and Jie Tang. Self-supervised learning: Generative or contrastive. TKDE, 2021.
[11] Rishi Bommasani, Drew A Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, et al. On the opportunities and risks of foundation models. arXiv, 2021.
[12] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv, 2020.
[13] Charlie Snell, Jaehoon Lee, Kelvin Xu, and Aviral Kumar. Scaling llm test-time compute optimally can be more effective than scaling model parameters. arXiv, 2024.
[14] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report. arXiv, 2023.
[15] Ce Zhou, Qian Li, Chen Li, Jun Yu, Yixin Liu, Guangjing Wang, Kai Zhang, Cheng Ji, Qiben Yan, Lifang He, et al. A comprehensive survey on pretrained foundation models: A history from bert to chatgpt. arXiv, 2023.
[16] Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, et al. Florence: A new foundation model for computer vision. arXiv, 2021.
[17] Lijun Yu, José Lezama, Nitesh B Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Agrim Gupta, Xiuye Gu, Alexander G Hauptmann, et al. Language model beats diffusion-tokenizer is key to visual generation. In ICLR, 2024.
[18] Yutong Bai, Xinyang Geng, Karttikeya Mangalam, Amir Bar, Alan Yuille, Trevor Darrell, Jitendra Malik, and Alexei A Efros. Sequential modeling enables scalable learning for large vision models. arXiv, 2023.
[19] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
[20] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In ICLR, 2018.
[21] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In NeurIPS, 2017.
[22] Hao Liu, Jiarui Feng, Lecheng Kong, Ningyue Liang, Dacheng Tao, Yixin Chen, and Muhan Zhang. One for all: Towards training one graph model for all classification tasks. In ICLR, 2024.
[23] Zehong Wang, Zheyuan Zhang, Nitesh V Chawla, Chuxu Zhang, and Yanfang Ye. GFT: Graph foundation model with transferable tree vocabulary. In NeurIPS, 2024.
[24] Yuhan Li, Peisong Wang, Zhixun Li, Jeffrey Xu Yu, and Jia Li. Zerog: Investigating cross-dataset zero-shot transferability in graphs. In KDD, 2024.
[25] Haihong Zhao, Aochuan Chen, Xiangguo Sun, Hong Cheng, and Jia Li. All in one and one for all: A simple yet effective method towards cross-domain graph pretraining. In KDD, 2024.
[26] Hezhe Qiao, Chaoxi Niu, Ling Chen, and Guansong Pang. AnomalyGFM: Graph foundation model for zero/few-shot anomaly detection. arXiv, 2025.
[27] Yuqi Gong, Xichen Ding, Yehui Su, Kaiming Shen, Zhongyi Liu, and Guannan Zhang. An unified search and recommendation foundation model for cold-start scenario. In CIKM, 2023.
[28] Mikhail Galkin, Xinyu Yuan, Hesham Mostafa, Jian Tang, and Zhaocheng Zhu. Towards foundation models for knowledge graph reasoning. In ICLR, 2024.
[29] Mikhail Galkin, Jincheng Zhou, Bruno Ribeiro, Jian Tang, and Zhaocheng Zhu. A foundation model for zero-shot logical query reasoning. arXiv, 2024.
[30] Nima Shoghi, Adeesh Kolluru, John R Kitchin, Zachary W Ulissi, C Lawrence Zitnick, and Brandon M Wood. From molecules to materials: Pre-training large generalizable models for atomic property prediction. arXiv, 2023.
[31] Juzheng Zhang, Yatao Bian, Yongqiang Chen, and Quanming Yao. Unimot: Unified molecule-text language model with discrete token representation. arXiv, 2024.
[32] Nuo Chen, Yuhan Li, Jianheng Tang, and Jia Li. Graphwiz: An instruction-following language model for graph computational problems. In KDD, 2024.
[33] Jiayan Guo, Lun Du, and Hengyu Liu. Gpt4graph: Can large language models understand graph structured data? an empirical evaluation and benchmarking. arXiv, 2023.
[34] Jiawei Liu, Cheng Yang, Zhiyuan Lu, Junze Chen, Yibo Li, Mengmei Zhang, Ting Bai, Yuan Fang, Lichao Sun, Philip S Yu, et al. Graph foundation models: Concepts, opportunities and challenges. TPAMI, 2025.
[35] Ziwen Zhao, Yixin Su, Yuhua Li, Yixiong Zou, Ruixuan Li, and Rui Zhang. A survey on self-supervised graph foundation models: Knowledge-based perspective. arXiv, 2024.
[36] Haitao Mao, Zhikai Chen, Wenzhuo Tang, Jianan Zhao, Yao Ma, Tong Zhao, Neil Shah, Michael Galkin, and Jiliang Tang. Graph foundation models are already here. In ICML, 2024.
[37] Yuxiang Wang, Wenqi Fan, Suhang Wang, and Yao Ma. Towards graph foundation models: A transferability perspective. arXiv, 2025.
[38] Haihong Zhao, Chenyi Zi, Aochuan Chen, and Jia Li. A survey of cross-domain graph learning: Progress and future directions. arXiv, 2025.
[39] Bin Wu, Yihang Wang, Yuanhao Zeng, Jiawei Liu, Jiashu Zhao, Cheng Yang, Yawen Li, Long Xia, Dawei Yin, and Chuan Shi. Graph foundation models for recommendation: A comprehensive survey. arXiv, 2025.
[40] Bowen Jin, Gang Liu, Chi Han, Meng Jiang, Heng Ji, and Jiawei Han. Large language models on graphs: A comprehensive survey. TKDE, 2024.
[41] Yuhan Li, Zhixun Li, Peisong Wang, Jia Li, Xiangguo Sun, Hong Cheng, and Jeffrey Xu Yu. A survey of graph meets large language model: Progress and future directions. arXiv, 2023.
[42] Wenqi Fan, Shijie Wang, Jiani Huang, Zhikai Chen, Yu Song, Wenzhuo Tang, Haitao Mao, Hui Liu, Xiaorui Liu, Dawei Yin, et al. Graph machine learning in the era of large language models (llms). arXiv, 2024.
[43] Xubin Ren, Jiabin Tang, Dawei Yin, Nitesh Chawla, and Chao Huang. A survey of large language models for graphs. In KDD, 2024.
[44] John Adrian Bondy and Uppaluri Siva Ramachandra Murty. Graph theory. Springer Publishing Company, Incorporated, 2008.
[45] Stuart E Dreyfus. An appraisal of some shortest-path algorithms. Operations research, 1969.
[46] Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and computing, 2007.
[47] S Vichy N Vishwanathan, Nicol N Schraudolph, Risi Kondor, and Karsten M Borgwardt. Graph kernels. JMLR, 2010.
[48] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. In KDD, 2014.
[49] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In KDD, 2016.
[50] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. Line: Large-scale information network embedding. In WWW, 2015.
[51] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S Yu Philip. A comprehensive survey on graph neural networks. TNNLS, 2020.
[52] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In ICML, 2017.
[53] Lecheng Kong, Jiarui Feng, Hao Liu, Chengsong Huang, Jiaxin Huang, Yixin Chen, and Muhan Zhang. GOFA: A generative one-for-all model for joint graph language modeling. In ICLR, 2025.
[54] Yixin Liu, Ming Jin, Shirui Pan, Chuan Zhou, Yu Zheng, Feng Xia, and S Yu Philip. Graph self-supervised learning: A survey. TKDE, 2022.
[55] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, 2017.
[56] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. In NeurIPS, 2020.
[57] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In NAACL, 2019.
[58] Rohan Anil, Andrew M Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, et al. Palm 2 technical report. arXiv preprint arXiv:2305.10403, 2023.
[59] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models. arXiv, 2023.
[60] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In ICML, 2021.
[61] Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation. In ICML, 2021.
[62] Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, et al. Emergent abilities of large language models. arXiv, 2022.
[63] Lianghao Xia, Ben Kao, and Chao Huang. Opengraph: Towards open graph foundation models. In EMNLP, 2024.
[64] Lianghao Xia and Chao Huang. Anygraph: Graph foundation model in the wild. arXiv, 2024.
[65] Xingtong Yu, Zechuan Gong, Chang Zhou, Yuan Fang, and Hui Zhang. Samgpt: Text-free graph foundation model for multi-domain pre-training and cross-domain adaptation. In WWW, 2025.
[66] Jianan Zhao, Zhaocheng Zhu, Mikhail Galkin, Hesham Mostafa, Michael M Bronstein, and Jian Tang. Fully-inductive node classification on arbitrary graphs. In ICLR, 2025.
[67] Nicolas Keriven. Not too little, not too much: a theoretical analysis of graph (over) smoothing. NeurIPS, 2022.
[68] Jake Topping, Francesco Di Giovanni, Benjamin Paul Chamberlain, Xiaowen Dong, and Michael M Bronstein. Understanding over-squashing and bottlenecks on graphs via curvature. arXiv, 2021.
[69] Yao Ma, Xiaorui Liu, Neil Shah, and Jiliang Tang. Is homophily a necessity for graph neural networks? In ICLR, 2022.
[70] Wenzhuo Tang, Haitao Mao, Danial Dervovic, Ivan Brugere, Saumitra Mishra, Yuying Xie, and Jiliang Tang. Cross-domain graph data scaling: A showcase with diffusion models. arXiv, 2024.
[71] Yuchen Yan, Peiyan Zhang, Zheng Fang, and Qingqing Long. Inductive graph alignment prompt: Bridging the gap between graph pre-training and inductive fine-tuning from spectral perspective. In WWW, 2024.
[72] Xingbo Fu, Yinhan He, and Jundong Li. Edge prompt tuning for graph neural networks. In The Thirteenth International Conference on Learning Representations, 2025.
[73] Jun Xia, Chengshuai Zhao, Bozhen Hu, Zhangyang Gao, Cheng Tan, Yue Liu, Siyuan Li, and Stan Z Li. Mole-bert: Rethinking pre-training graph neural networks for molecules. In The Eleventh International Conference on Learning Representations, 2023.
[74] Mingchen Sun, Kaixiong Zhou, Xin He, Ying Wang, and Xin Wang. Gppt: Graph pre-training and prompt tuning to generalize graph neural networks. In KDD, 2022.
[75] Yufei He and Bryan Hooi. Unigraph: Learning a cross-domain graph foundation model from natural language. arXiv, 2024.
[76] Zehong Wang, Zheyuan Zhang, Tianyi Ma, Nitesh V Chawla, Chuxu Zhang, and Yanfang Ye. Learning cross-task generalities across graphs via task-trees. arXiv, 2024.
[77] Beatrice Bevilacqua, Joshua Robinson, Jure Leskovec, and Bruno Ribeiro. Holographic node representations: Pre-training task-agnostic node embeddings. In ICLR, 2025.
[78] Runjin Chen, Tong Zhao, Ajay Kumar Jaiswal, Neil Shah, and Zhangyang Wang. Llaga: Large language and graph assistant. In ICML, 2024.
[79] Zheyuan Zhang, Zehong Wang, Tianyi Ma, Varun Sameer Taneja, Sofia Nelson, Nhi Ha Lan Le, Keerthiram Murugesan, Mingxuan Ju, Nitesh V Chawla, Chuxu Zhang, et al. Mopi-hfrs: A multi-objective personalized health-aware food recommendation system with llm-enhanced interpretation. arXiv, 2024.
[80] Jiele Wu, Chunhui Zhang, Zheyuan Liu, Erchi Zhang, Steven Wilson, and Chuxu Zhang. Graphbert: Bridging graph and text for malicious behavior detection on social media. In 2022 IEEE International Conference on Data Mining (ICDM), pages 548-557. IEEE, 2022.
[81] Zehong Wang, Zheyuan Zhang, Chuxu Zhang, and Yanfang Ye. Subgraph pooling: Tackling negative transfer on graphs. In IJCAI, 2024.
[82] Xiaojun Chen, Shengbin Jia, and Yang Xiang. A review: Knowledge reasoning over knowledge graph. Expert systems with applications, 2020.
[83] Zheyuan Zhang, Yiyang Li, Nhi Ha Lan Le, Zehong Wang, Tianyi Ma, Vincent Galassi, Keerthiram Murugesan, Nuno Moniz, Werner Geyer, Nitesh V Chawla, et al. Ngqa: A nutritional graph question answering benchmark for personalized health-aware nutritional reasoning. arXiv, 2024.
[84] Zheyuan Zhang, Zehong Wang, Shifu Hou, Evan Hall, Landon Bachman, Jasmine White, Vincent Galassi, Nitesh V Chawla, Chuxu Zhang, and Yanfang Ye. Diet-odin: A novel framework for opioid misuse detection with interpretable dietary patterns. In KDD, 2024.
[85] Vijay Prakash Dwivedi and Xavier Bresson. A generalization of transformer networks to graphs. arXiv, 2020.
[86] Zehong Wang, Zheyuan Zhang, Tianyi Ma, Nitesh V Chawla, Chuxu Zhang, and Yanfang Ye. Neural graph pattern machine. In ICML, 2025.
[87] Ladislav Rampasek, Mikhail Galkin, Vijay Prakash Dwivedi, Anh Tuan Luu, Guy Wolf, and Dominique Beaini. Recipe for a general, powerful, scalable graph transformer. In NeurIPS, 2022.
[88] Kerstin Klaser, Blazej Banaszewski, Samuel Maddrell-Mander, Callum McLean, Luis Müller, Ali Parviz, Shenyang Huang, and Andrew W Fitzgibbon. Minimol: A parameter-efficient foundation model for molecular learning. In ICML 2024 Workshop, 2024.
[89] Xiaoxin He, Xavier Bresson, Thomas Laurent, Adam Perold, Yann LeCun, and Bryan Hooi. Harnessing explanations: Llm-to-lm interpreter for enhanced text-attributed graph representation learning. arXiv, 2023.
[90] Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. A survey of large language models. arXiv, 2023.
[91] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv, 2023.
[92] Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models. arXiv, 2024.
[93] Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, et al. Qwen technical report. arXiv, 2023.
[94] Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv, 2025.
[95] Yaochen Zhu, Liang Wu, Binchi Zhang, Song Wang, Qi Guo, Liangjie Hong, Luke Simon, and Jundong Li. Understanding and modeling job marketplace with pretrained language models. In CIKM, 2024.
[96] Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Tianyu Liu, et al. A survey on in-context learning. arXiv preprint arXiv:2301.00234, 2022.
[97] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. NeurIPS, 2022. [97] 魏杰森、王雪芝、戴尔·舒尔曼斯、马丁·博斯马、夏飞、池爱德、国乐、丹尼·周等。思维链提示引发大型语言模型中的推理。神经 IPS,2022 年。
[98] Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. Large language models are zero-shot reasoners. NeurIPS, 2022. [98] 小岛武、顾世祥、马切尔·里德、松尾丰和岩泽佑介。大型语言模型是零样本推理器。神经 IPS,2022 年。
[99] Tianqianjin Lin, Pengwei Yan, Kaisong Song, Zhuoren Jiang, Yangyang Kang, Jun Lin, Weikang Yuan, Junjie Cao, Changlong Sun, and Xiaozhong Liu. Langgfm: A large language model alone can be a powerful graph foundation model. arXiv, 2024. [99] 林天千金、闫鹏伟、宋开松、江卓仁、康洋洋、林军、袁伟康、曹俊杰、孙长龙、刘晓忠.Langgfm:一个大型语言模型本身就可以成为一个强大的图基础模型。arXiv,2024 年。
[100] Yuntong Hu, Zheng Zhang, and Liang Zhao. Beyond text: A deep dive into large language models’ ability on understanding graph data. arXiv, 2023. [100] 胡云彤、张正、赵良。超越文本:深入探讨大型语言模型理解图数据的能力。arXiv,2023 年。
[101] Ruosong Ye, Caiqi Zhang, Runhui Wang, Shuyuan Xu, and Yongfeng Zhang. Language is all a graph needs. In EACL,2024E A C L, 2024. [101] 叶若松、张采琦、王润辉、徐淑媛、张永峰。语言是图表所需要的一切。在 EACL,2024E A C L, 2024 .
[102] Jiabin Tang, Yuhao Yang, Wei Wei, Lei Shi, Lixin Su, Suqi Cheng, Dawei Yin, and Chao Huang. Graphgpt: Graph instruction tuning for large language models. In SIGIR, 2024. [102] 唐佳彬、杨宇浩、魏巍、石磊、苏立新、程素琦、尹大伟、黄超。Graphgpt:大型语言模型的图指令调优。在 SIGIR,2024 年。
[103] Mengmei Zhang, Mingwei Sun, Peng Wang, Shen Fan, Yanhu Mo, Xiaoxiao Xu, Hong Liu, Cheng Yang, and Chuan Shi. Graphtranslator: Aligning graph model to large language model for open-ended tasks. In WWW,2024W W W, 2024. [103] 张萌梅、孙明伟、王鹏、沈凡、莫彦虎、徐晓晓、刘红、程阳、史川。Graphtranslator:将图形模型与大型语言模型对齐以执行开放式任务。在 WWW,2024W W W, 2024 .
[104] Yun Zhu, Haizhou Shi, Xiaotang Wang, Yongchao Liu, Yaoke Wang, Boci Peng, Chuntao Hong, and Siliang Tang. Graphclip: Enhancing transferability in graph foundation models for text-attributed graphs. arXiv, 2024. [104] 朱云、石海州、王晓棠、刘永超、王耀科、彭博慈、洪春涛、唐思亮。Graphclip:增强文本归因图的图基础模型的可转移性。arXiv,2024 年。
[105] William Brannon, Wonjune Kang, Suyash Fulay, Hang Jiang, Brandon Roy, Deb Roy, and Jad Kabbara. Congrat: Self-supervised contrastive pretraining for joint graph and text embeddings. arXiv, 2023. [105] 威廉·布兰农、康元俊、苏亚什·富莱、杭江、布兰登·罗伊、黛布·罗伊和贾德·卡巴拉。恭喜:联合图和文本嵌入的自监督对比预训练。arXiv,2023 年。
[106] Jianan Zhao, Meng Qu, Chaozhuo Li, Hao Yan, Qian Liu, Rui Li, Xing Xie, and Jian Tang. Learning on large-scale text-attributed graphs via variational inference. arXiv, 2022. [106] 赵嘉楠、孟曲、李朝卓、郝岩、刘倩、李瑞、谢行、唐健。通过变分推理在大规模文本归因图上学习。arXiv,2022 年。
[107] Lirong Wu, Haitao Lin, Cheng Tan, Zhangyang Gao, and Stan Z Li. Self-supervised learning on graphs: Contrastive, generative, or predictive. TKDE, 2021. [107] 吴丽荣、林海涛、谭成、高张阳和李斯坦。图上的自监督学习:对比、生成或预测。TKDE,2021 年。
[108] Qian Huang, Hongyu Ren, Peng Chen, Gregor Kržmanc, Daniel Zeng, Percy S Liang, and Jure Leskovec. Prodigy: Enabling in-context learning over graphs. In NeurIPS, 2023. [108] 黄倩、任宏宇、陈鹏、格雷戈尔·克日曼克、曾丹尼尔、梁珀西和莱斯科维茨。Prodigy:通过图表实现上下文学习。在 NeurIPS,2023 年。
[109] Yutong Bai, Xinyang Geng, Karttikeya Mangalam, Amir Bar, Alan L Yuille, Trevor Darrell, Jitendra Malik, and Alexei A Efros. Sequential modeling enables scalable learning for large vision models. In CVPR, 2024. [109] Yutong Bai、Xinyang Geng、Karttikeya Mangalam、Amir Bar、Alan L Yuille、Trevor Darrell、Jitendra Malik 和 Alexei A Efros。顺序建模支持大型视觉模型的可扩展学习。在 CVPR,2024 年。
[110] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014. [110] Kyunghyun Cho、Bart Van Merriënboer、Caglar Gulcehre、Dzmitry Bahdanau、Fethi Bougares、Holger Schwenk 和 Yoshua Bengio。使用 RNN 编码器-解码器学习短语表示进行统计机器翻译。arXiv 预印本 arXiv:1406.1078,2014 年。
[111] Alex Graves. Long short-term memory. Supervised sequence labelling with recurrent neural networks, pages 37-45, 2012.
[112] Jiaxuan You, Rex Ying, Xiang Ren, William Hamilton, and Jure Leskovec. Graphrnn: Generating realistic graphs with deep auto-regressive models. In ICML, 2018.
[113] Mariya Popova, Mykhailo Shvets, Junier Oliva, and Olexandr Isayev. Molecularrnn: Generating realistic molecular graphs with optimized properties. arXiv, 2019.
[114] Ziniu Hu, Yuxiao Dong, Kuansan Wang, Kai-Wei Chang, and Yizhou Sun. Gpt-gnn: Generative pre-training of graph neural networks. In KDD, 2020.
[115] Davide Bacciu, Alessio Micheli, and Marco Podda. Edge-based sequential graph generation with recurrent neural networks. Neurocomputing, 416:177-189, November 2020.
[116] Nikhil Goyal, Harsh Vardhan Jain, and Sayan Ranu. Graphgen: A scalable approach to domain-agnostic labeled graph generation. In WWW, 2020.
[117] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv, 2013.
[118] Yunchen Pu, Zhe Gan, Ricardo Henao, Xin Yuan, Chunyuan Li, Andrew Stevens, and Lawrence Carin. Variational autoencoder for deep learning of images, labels and captions. In NeurIPS, 2016.
[119] Thomas N Kipf and Max Welling. Variational graph auto-encoders. arXiv, 2016.
[120] Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. In CVPR, 2022.
[121] Zhenyu Hou, Xiao Liu, Yukuo Cen, Yuxiao Dong, Hongxia Yang, Chunjie Wang, and Jie Tang. Graphmae: Self-supervised masked graph autoencoders. In KDD, 2022.
[122] Zhenyu Hou, Yufei He, Yukuo Cen, Xiao Liu, Yuxiao Dong, Evgeny Kharlamov, and Jie Tang. Graphmae2: A decoding-enhanced masked self-supervised graph learner. In WWW, 2023.
[123] Qiaoyu Tan, Ninghao Liu, Xiao Huang, Soo-Hyun Choi, Li Li, Rui Chen, and Xia Hu. S2gae: Self-supervised graph autoencoders are generalizable learners with graph masking. In WSDM, 2023.
[124] Yijun Tian, Kaiwen Dong, Chunhui Zhang, Chuxu Zhang, and Nitesh V Chawla. Heterogeneous graph masked autoencoders. In AAAI, 2023.
[125] Zhiyuan Liu, Yaorui Shi, An Zhang, Enzhi Zhang, Kenji Kawaguchi, Xiang Wang, and Tat-Seng Chua. Rethinking tokenizer and decoder in masked graph modeling for molecules. NeurIPS, 2023.
[126] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In ICML, 2020.
[127] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. In CVPR, 2020.
[128] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv, 2018.
[129] Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, et al. Bootstrap your own latent: A new approach to self-supervised learning. NeurIPS, 2020.
[130] Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, and Armand Joulin. Unsupervised learning of visual features by contrasting cluster assignments. NeurIPS, 2020.
[131] R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, and Yoshua Bengio. Learning deep representations by mutual information estimation and maximization. arXiv, 2018.
[132] Yiyue Qian, Tianyi Ma, Chuxu Zhang, and Yanfang Ye. Dual-level hypergraph contrastive learning with adaptive temperature enhancement. In Companion Proceedings of the ACM Web Conference 2024, 2024.
[133] Yonglong Tian, Dilip Krishnan, and Phillip Isola. Contrastive multiview coding. In ECCV, 2020.
[134] Tongzhou Wang and Phillip Isola. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In ICML, 2020.
[135] Yonglong Tian, Chen Sun, Ben Poole, Dilip Krishnan, Cordelia Schmid, and Phillip Isola. What makes for good views for contrastive learning? NeurIPS, 2020.
[136] Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Supervised contrastive learning. NeurIPS, 2020.
[137] Petar Veličković, William Fedus, William L. Hamilton, Pietro Liò, Yoshua Bengio, and R Devon Hjelm. Deep graph infomax. In ICLR, 2019.
[138] Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, and Yang Shen. Graph contrastive learning with augmentations. NeurIPS, 2020.
[139] Yanqiao Zhu, Yichen Xu, Feng Yu, Qiang Liu, Shu Wu, and Liang Wang. Deep graph contrastive representation learning. arXiv, 2020.
[140] Yanqiao Zhu, Yichen Xu, Feng Yu, Qiang Liu, Shu Wu, and Liang Wang. Graph contrastive learning with adaptive augmentation. In WWW, 2021.
[141] Zehong Wang, Qi Li, Donghua Yu, Xiaolong Han, Xiao-Zhi Gao, and Shigen Shen. Heterogeneous graph contrastive multi-view learning. In SDM, 2023.
[142] Zehong Wang, Donghua Yu, Shigen Shen, Shichao Zhang, Huawen Liu, Shuang Yao, and Maozu Guo. Select your own counterparts: self-supervised graph contrastive learning with positive sampling. TNNLS, 2024.
[143] Kaveh Hassani and Amir Hosein Khasahmadi. Contrastive multi-view representation learning on graphs. In ICML, 2020.
[144] Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Mehdi Azabou, Eva L Dyer, Remi Munos, Petar Veličković, and Michal Valko. Large-scale representation learning on graphs via bootstrapping. In ICLR, 2022.
[145] Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Mehdi Azabou, Eva L Dyer, Remi Munos, Petar Veličković, and Michal Valko. Large-scale representation learning on graphs via bootstrapping. In ICLR, 2021.
[146] Wei Jin, Tyler Derr, Haochen Liu, Yiqi Wang, Suhang Wang, Zitao Liu, and Jiliang Tang. Self-supervised learning on graphs: Deep insights and new direction. arXiv, 2020.
[147] Yizhu Jiao, Yun Xiong, Jiawei Zhang, Yao Zhang, Tianqi Zhang, and Yangyong Zhu. Sub-graph contrast for scalable self-supervised graph representation learning. In ICDM, 2020.
[148] Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep clustering for unsupervised learning of visual features. In ECCV, 2018.
[149] R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, and Yoshua Bengio. Learning deep representations by mutual information estimation and maximization. arXiv, 2019.
[150] Wenting Zhao, Gongping Xu, Zhen Cui, Siqiang Luo, Cheng Long, and Tong Zhang. Deep graph structural infomax. In AAAI, 2023.
[151] Xin Xu, Junping Du, Jie Song, and Zhe Xue. Infomax classification-enhanced learnable network for few-shot node classification. Electronics, 12(1), 2023.
[152] Xueting Han, Zhenhuan Huang, Bang An, and Jing Bai. Adaptive transfer learning on graph neural networks. In KDD, 2021.
[153] Shengrui Li, Xueting Han, and Jing Bai. Adaptergnn: Parameter-efficient fine-tuning improves generalization in gnns. In AAAI, 2024.
[154] Zhe-Rui Yang, Jindong Han, Chang-Dong Wang, and Hao Liu. Graphlora: Structure-aware contrastive low-rank adaptation for cross-graph transfer learning. arXiv, 2024.
[155] Taoran Fang, Yunchao Zhang, Yang Yang, Chunping Wang, and Lei Chen. Universal prompt tuning for graph neural networks. NeurIPS, 2023.
[156] Jianping Gou, Baosheng Yu, Stephen J Maybank, and Dacheng Tao. Knowledge distillation: A survey. IJCV, 2021.
[157] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv, 2015.
[158] Zehong Wang, Zheyuan Zhang, Chuxu Zhang, and Yanfang Ye. Training mlps on graphs without supervision. In WSDM, 2025.
[159] Chaitanya K Joshi, Fayao Liu, Xu Xun, Jie Lin, and Chuan Sheng Foo. On representation knowledge distillation for graph neural networks. TNNLS, 2022.
[160] Shichang Zhang, Yozen Liu, Yizhou Sun, and Neil Shah. Graph-less neural networks: Teaching old mlps new tricks via distillation. arXiv, 2021.
[161] Yijun Tian, Chuxu Zhang, Zhichun Guo, Xiangliang Zhang, and Nitesh Chawla. Learning mlps on graphs: A unified view of effectiveness, robustness, and efficiency. In ICLR, 2022.
[162] Seunghyun Lee and Byung Cheol Song. Graph-based knowledge distillation by multi-head attention network. arXiv, 2019.
[163] Lirong Wu, Haitao Lin, Yufei Huang, and Stan Z Li. Knowledge distillation improves graph structure augmentation for graph neural networks. NeurIPS, 2022.
[164] Jiaqi Ma and Qiaozhu Mei. Graph representation learning via multi-task knowledge distillation. arXiv, 2019.
[165] Yating Ren, Junzhong Ji, Lingfeng Niu, and Minglong Lei. Multi-task self-distillation for graph-based semi-supervised learning. arXiv, 2021.
[166] Yunhui Liu, Zhen Tao, Xiang Zhao, Jianhua Zhao, Tao Zheng, and Tieke He. Learning accurate, efficient, and interpretable mlps on multiplex graphs via node-wise multi-view ensemble distillation. arXiv, 2025.
[167] Dequan Wang, Evan Shelhamer, Shaoteng Liu, Bruno Olshausen, and Trevor Darrell. Tent: Fully test-time adaptation by entropy minimization. arXiv, 2020.
[168] Jian Liang, Ran He, and Tieniu Tan. A comprehensive survey on test-time adaptation under distribution shifts. IJCV, 2025.
[169] Dian Chen, Dequan Wang, Trevor Darrell, and Sayna Ebrahimi. Contrastive test-time adaptation. In CVPR, 2022.
[170] Malik Boudiaf, Romain Mueller, Ismail Ben Ayed, and Luca Bertinetto. Parameter-free online test-time adaptation. In CVPR, 2022.
[171] Wei Jin, Tong Zhao, Jiayuan Ding, Yozen Liu, Jiliang Tang, and Neil Shah. Empowering graph representation learning with test-time graph transformation. arXiv, 2022.
[172] Jiaxin Zhang, Yiqi Wang, Xihong Yang, Siwei Wang, Yu Feng, Yu Shi, Ruichao Ren, En Zhu, and Xinwang Liu. Test-time training on graphs with large language models (llms). In MM, 2024.
[173] Xiangguo Sun, Hong Cheng, Jia Li, Bo Liu, and Jihong Guan. All in one: Multi-task prompting for graph neural networks. In KDD, 2023.
[174] Jiazheng Li, Jundong Li, and Chuxu Zhang. Instance-aware graph prompt learning. Transactions on Machine Learning Research, 2025.
[175] Junhyun Lee, Wooseong Yang, and Jaewoo Kang. Subgraph-level universal prompt tuning. arXiv preprint arXiv:2402.10380, 2024.
[176] Jiapeng Zhu, Zichen Ding, Jianxiang Yu, Jiaqi Tan, Xiang Li, and Weining Qian. Relief: Reinforcement learning empowered graph feature prompt tuning. arXiv preprint arXiv:2408.03195, 2024.
[177] Zhengpin Li, Minhua Lin, Jian Wang, and Suhang Wang. Fairness-aware prompt tuning for graph neural networks. In The Web Conference, 2025.
[178] Bo Jiang, Hao Wu, Ziyan Zhang, Beibei Wang, and Jin Tang. A unified graph selective prompt learning for graph neural networks. arXiv preprint arXiv:2406.10498, 2024.
[179] Yun Zhu, Jianhao Guo, and Siliang Tang. Sgl-pt: A strong graph learner with graph prompt tuning. arXiv preprint arXiv:2302.12449, 2023.
[180] Qingqing Ge, Zeyuan Zhao, Yiding Liu, Anfeng Cheng, Xiang Li, Shuaiqiang Wang, and Dawei Yin. Psp: Pre-training and structure prompt tuning for graph neural networks. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 423-439. Springer, 2024.
[181] Zemin Liu, Xingtong Yu, Yuan Fang, and Xinming Zhang. Graphprompt: Unifying pre-training and downstream tasks for graph neural networks. In WWW, 2023.
[182] Xingtong Yu, Jie Zhang, Yuan Fang, and Renhe Jiang. Non-homophilic graph pre-training and prompt learning. In KDD, 2025.
[183] Zhengyu Hu, Yichuan Li, Zhengyu Chen, Jingang Wang, Han Liu, Kyumin Lee, and Kaize Ding. Let's ask GNN: Empowering large language model for graph in-context learning. In EMNLP, 2024.
[184] Jintang Li, Ruofan Wu, Yuchang Zhu, Huizhe Zhang, Liang Chen, and Zibin Zheng. Are large language models in-context graph learners? arXiv, 2025.
[185] Udari Madhushani Sehwag, Kassiani Papasotiriou, Jared Vann, and Sumitra Ganesh. In-context learning with topological information for knowledge graph completion. arXiv preprint arXiv:2412.08742, 2024.
[186] Hanieh Khorashadizadeh, Nandana Mihindukulasooriya, Sanju Tiwari, Jinghua Groppe, and Sven Groppe. Exploring in-context learning capabilities of foundation models for generating knowledge graphs from text. arXiv preprint arXiv:2305.08804, 2023.
[187] Yaqing Wang, Quanming Yao, James T Kwok, and Lionel M Ni. Generalizing from a few examples: A survey on few-shot learning. CSUR, 2020.
[188] Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. NeurIPS, 2017.
[189] Xingtong Yu, Zhenghao Liu, Yuan Fang, Zemin Liu, Sihong Chen, and Xinming Zhang. Generalized graph prompt: Toward a unification of pre-training and downstream tasks on graphs. IEEE Transactions on Knowledge and Data Engineering, 2024.
[190] Xingtong Yu, Yuan Fang, Zemin Liu, and Xinming Zhang. Hgprompt: Bridging homogeneous and heterogeneous graphs for few-shot prompt learning. In AAAI, 2024.
[191] Yongcheng Jing, Chongbin Yuan, Li Ju, Yiding Yang, Xinchao Wang, and Dacheng Tao. Deep graph reprogramming. In CVPR, 2023.
[192] Kai Wang and Siqiang Luo. Towards graph foundation models: The perspective of zero-shot reasoning on knowledge graphs. arXiv, 2024.
[193] Yangyi Shen, Jincheng Zhou, Beatrice Bevilacqua, Joshua Robinson, Charilaos Kanatsoulis, Jure Leskovec, and Bruno Ribeiro. Zero-shot generalization of GNNs over distinct attribute domains. In ICML Workshop, 2024.
[194] Alex Davies, Riku Green, Nirav Ajmeri, and Telmo Silva Filho. Its all graph to me: Single-model graph representation learning on multiple domains. In NeurIPS 2023 Workshop, 2023.
[195] Qi Zhu, Carl Yang, Yidan Xu, Haonan Wang, Chao Zhang, and Jiawei Han. Transfer learning of graph neural networks with ego-graph information maximization. In NeurIPS, 2021.
[196] Li Sun, Zhenhao Huang, Suyang Zhou, Qiqi Wan, Hao Peng, and Philip Yu. Riemanngfm: Learning a graph foundation model from riemannian geometry. In WWW, 2025.
[197] Yao Cheng, Yige Zhao, Jianxiang Yu, and Xiang Li. Boosting graph foundation model from structural perspective. arXiv, 2024.
[198] Yi Fang, Dongzhe Fan, Sirui Ding, Ninghao Liu, and Qiaoyu Tan. Uniglm: Training one unified language model for text-attributed graphs. In WSDM, 2024.
[199] Jiezhong Qiu, Qibin Chen, Yuxiao Dong, Jing Zhang, Hongxia Yang, Ming Ding, Kuansan Wang, and Jie Tang. Gcc: Graph contrastive coding for graph neural network pre-training. In KDD, 2020.
[200] Zhenyu Hou, Haozhan Li, Yukuo Cen, Jie Tang, and Yuxiao Dong. Graphalign: Pretraining one graph neural network on multiple graphs via feature alignment. arXiv, 2024.
[201] Jingzhe Liu, Haitao Mao, Zhikai Chen, Wenqi Fan, Mingxuan Ju, Tong Zhao, Neil Shah, and Jiliang Tang. One model for one graph: A new perspective for pretraining with cross-domain graphs. arXiv, 2024.
[202] Yufei He, Zhenyu Hou, Yukuo Cen, Feng He, Xu Cheng, and Bryan Hooi. Generalizing graph transformers across diverse graphs and tasks via pre-training on industrial-scale data. arXiv, 2024.
[203] Yufei He, Yuan Sui, Xiaoxin He, Yue Liu, Yifei Sun, and Bryan Hooi. Unigraph2: Learning a unified embedding space to bind multimodal graphs. In WWW, 2025.
[204] Yifei Sun, Yang Yang, Xiao Feng, Zijun Wang, Haoyang Zhong, Chunping Wang, and Lei Chen. Handling feature heterogeneity with learnable graph patches. In KDD, 2025.
[205] Xingtong Yu, Chang Zhou, Yuan Fang, and Xinming Zhang. Multigprompt for multi-task pre-training and prompting on graphs. In WWW, 2024.
[206] Xinke Jiang, Rihong Qiu, Yongxin Xu, Wentao Zhang, Yichen Zhu, Ruizhe Zhang, Yuchen Fang, Xu Chu, Junfeng Zhao, and Yasha Wang. Ragraph: A general retrieval-augmented graph learning framework. In NeurIPS, 2024.
[207] Vikas Garg, Stefanie Jegelka, and Tommi Jaakkola. Generalization and representational limits of graph neural networks. In ICML, 2020.
[208] Zhengdao Chen, Lei Chen, Soledad Villar, and Joan Bruna. Can graph neural networks count substructures? In NeurIPS, 2020.
[209] Christopher Morris, Floris Geerts, Jan Tönshoff, and Martin Grohe. WL meet VC. In ICML, 2023.
[210] Bohang Zhang, Jingchu Gai, Yiheng Du, Qiwei Ye, Di He, and Liwei Wang. Beyond weisfeiler-lehman: A quantitative framework for gnn expressiveness. In ICLR, 2024.
[211] Zehong Wang, Sidney Liu, Zheyuan Zhang, Tianyi Ma, Chuxu Zhang, and Yanfang Ye. Can llms convert graphs to text-attributed graphs? In NAACL, 2025.
[212] Nils Reimers and Iryna Gurevych. Sentence-bert: Sentence embeddings using siamese bert-networks. In EMNLP, 2019.
[213] Ling Yang, Ye Tian, Minkai Xu, Zhongyi Liu, Shenda Hong, Wei Qu, Wentao Zhang, Bin Cui, Muhan Zhang, and Jure Leskovec. Vqgraph: Rethinking graph representation space for bridging gnns and mlps. arXiv, 2023.
[214] Chunhui Zhang, Yijun Tian, Mingxuan Ju, Zheyuan Liu, Yanfang Ye, Nitesh Chawla, and Chuxu Zhang. Chasing all-round graph representation robustness: Model, training, and optimization. In The Eleventh International Conference on Learning Representations, 2022.
[215] Yijun Tian, Chuxu Zhang, Ziyi Kou, Zheyuan Liu, Xiangliang Zhang, and Nitesh V Chawla. Ugmae: A unified framework for graph masked autoencoders. arXiv preprint arXiv:2402.08023, 2024.
[216] Mingxuan Ju, Tong Zhao, Qianlong Wen, Wenhao Yu, Neil Shah, Yanfang Ye, and Chuxu Zhang. Multi-task self-supervised graph neural networks enable stronger task generalization. In ICLR, 2023.
[217] Zheyuan Liu, Chunhui Zhang, Yijun Tian, Erchi Zhang, Chao Huang, Yanfang Ye, and Chuxu Zhang. Fair graph representation learning via diverse mixture-of-experts. In Proceedings of the ACM Web Conference 2023, pages 28-38, 2023.
[218] Yiyuan Zhang, Kaixiong Gong, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Wanli Ouyang, and Xiangyu Yue. Meta-transformer: A unified framework for multimodal learning. arXiv, 2023.
[219] Xinmiao Yu, Meng Qu, Xiaocheng Feng, and Bing Qin. Graphagent: Exploiting large language models for interpretable learning on text-attributed graphs, 2024.
[220] Jiawei Zhang. Graph-toolformer: To empower llms with graph reasoning ability via prompt augmented by chatgpt. arXiv, 2023.
[221] Jianing Wang, Junda Wu, Yupeng Hou, Yao Liu, Ming Gao, and Julian McAuley. Instructgraph: Boosting large language models via graph-centric instruction tuning and preference alignment. arXiv, 2024.
[222] Qinyong Wang, Zhenxiang Gao, and Rong Xu. Graph agent: Explicit reasoning agent for graphs. arXiv, 2023.
[223] Yuanfu Sun, Zhengnan Ma, Yi Fang, Jing Ma, and Qiaoyu Tan. Graphicl: Unlocking graph learning potential in llms through structured prompt design. arXiv, 2025.
[224] Jiabin Tang, Yuhao Yang, Wei Wei, Lei Shi, Lixin Su, Suqi Cheng, Dawei Yin, and Chao Huang. Graphgpt: Graph instruction tuning for large language models. In SIGIR, 2024.
[225] Haitong Luo, Xuying Meng, Suhang Wang, Tianxiang Zhao, Fali Wang, Hanyun Cao, and Yujun Zhang. Enhance graph alignment for large language models. arXiv, 2024.
[226] Xi Zhu, Haochen Xue, Ziwei Zhao, Mingyu Jin, Wujiang Xu, Jingyuan Huang, Qifan Wang, Kaixiong Zhou, and Yongfeng Zhang. LLM as GNN: Graph vocabulary learning for graph foundation model, 2024.
[227] Zheyuan Liu, Xiaoxin He, Yijun Tian, and Nitesh V Chawla. Can we soft prompt llms for graph learning tasks? In WWW, 2024.
[228] Duo Wang, Yuan Zuo, Fengzhi Li, and Junjie Wu. Llms as zero-shot graph learners: Alignment of gnn representations with llm token embeddings. NeurIPS, 2024.
[229] Yanbiao Ji, Chang Liu, Xin Chen, Yue Ding, Dan Luo, Mei Li, Wenqing Lin, and Hongtao Lu. Nt-llm: A novel node tokenizer for integrating graph structure into large language models. arXiv, 2024.
[230] Yuhao Yang, Jiabin Tang, Lianghao Xia, Xingchen Zou, Yuxuan Liang, and Chao Huang. Graphagent: Agentic graph language assistant. arXiv, 2024.
[231] Jianan Zhao, Le Zhuo, Yikang Shen, Meng Qu, Kai Liu, Michael Bronstein, Zhaocheng Zhu, and Jian Tang. Graphtext: Graph reasoning in text space. arXiv, 2023.
[232] Yun Zhu, Yaoke Wang, Haizhou Shi, and Siliang Tang. Efficient tuning and inference for large language models on textual graphs. arXiv, 2024.
[233] Zhikai Chen, Haitao Mao, Hongzhi Wen, Haoyu Han, Wei Jin, Haiyang Zhang, Hui Liu, and Jiliang Tang. Label-free node classification on graphs with large language models (llms). In ICLR, 2023.
[234] Hibiki Taguchi, Xin Liu, and Tsuyoshi Murata. Graph convolutional networks for graphs containing missing features. Future Generation Computer Systems, 2021.
[235] Daeho Um, Jiwoong Park, Seulki Park, and Jin Young Choi. Confidence-based feature imputation for graphs with partially known features. In ICLR, 2023.
[236] Shubham Gupta, Sahil Manchanda, Sayan Ranu, and Srikanta J Bedathur. Grafenne: Learning on graphs with heterogeneous and dynamic feature sets. In International Conference on Machine Learning, 2023.
[237] Ajay Jaiswal, Nurendra Choudhary, Ravinarayana Adkathimar, Muthu P Alagappan, Gaurush Hiranandani, Ying Ding, Zhangyang Wang, Edward W Huang, and Karthik Subbian. All against some: Efficient integration of large language models for message passing in graph neural networks. arXiv, 2024.
[238] Divyansha Lachi, Mehdi Azabou, Vinam Arora, and Eva Dyer. GraphFM: A scalable framework for multi-graph pretraining. arXiv, 2024.
[239] Chaoxi Niu, Guansong Pang, Ling Chen, and Bing Liu. Replay-and-forget-free graph class-incremental learning: A task profiling and prompting approach. arXiv preprint arXiv:2410.10341, 2024.
[240] Yun Zhu, Yaoke Wang, Haizhou Shi, Zhenshuo Zhang, Dian Jiao, and Siliang Tang. Graphcontrol: Adding conditional control to universal graph pre-trained models for graph domain transfer learning. In WWW, 2024.
[241] Shuo Wang, Bokui Wang, Zhixiang Shen, Boyan Deng, and Zhao Kang. Multi-domain graph foundation models: Robust knowledge transfer via topology alignment. arXiv, 2025.
[242] Yu Song, Haitao Mao, Jiachen Xiao, Jingzhe Liu, Zhikai Chen, Wei Jin, Carl Yang, Jiliang Tang, and Hui Liu. A pure transformer pretraining framework on text-attributed graphs. arXiv, 2024.
[243] Huachi Zhou, Jiahe Du, Chuang Zhou, Chang Yang, Yilin Xiao, Yuxuan Xie, and Xiao Huang. Each graph is a new language: Graph learning with llms. arXiv, 2025.
[244] Yiran Qiao, Xiang Ao, Yang Liu, Jiarong Xu, Xiaoqian Sun, and Qing He. Login: A large language model consulted graph neural network training framework. arXiv, 2024.
[245] Guangxin Su, Yifan Zhu, Wenjie Zhang, Hanchen Wang, and Ying Zhang. Bridging large language models and graph structure learning models for robust representation learning. arXiv, 2024.
[246] Taiyan Zhang, Renchi Yang, Mingyu Yan, Xiaochun Ye, Dongrui Fan, and Yurui Lai. Cost-effective label-free node classification with llms. arXiv, 2024.
[247] Zhihao Wen and Yuan Fang. Augmenting low-resource text classification with graph-grounded pretraining and prompting. In SIGIR, 2023.
[248] Zhong Guan, Hongke Zhao, Likang Wu, Ming He, and Jianpin Fan. Langtopo: Aligning language descriptions of graphs with tokenized topological modeling. arXiv, 2024.
[249] Zipeng Liu, Likang Wu, Ming He, Zhong Guan, Hongke Zhao, and Nan Feng. Multi-view empowered structural graph wordification for language models. arXiv, 2024.
[250] Shunxin Xiao, Shiping Wang, Yuanfei Dai, and Wenzhong Guo. Graph neural networks in node classification: survey and evaluation. Machine Vision and Applications, 33(1):4, 2022.
[251] Haitao Mao, Zhikai Chen, Wenzhuo Tang, Jianan Zhao, Yao Ma, Tong Zhao, Neil Shah, Mikhail Galkin, and Jiliang Tang. Position: Graph foundation models are already here. In ICML, 2024.
[252] Hongbin Pei, Bingzhe Wei, Kevin Chen-Chuan Chang, Yu Lei, and Bo Yang. Geom-gcn: Geometric graph convolutional networks. In ICLR, 2020.
[253] Tianyi Ma, Yiyue Qian, Shinan Zhang, Chuxu Zhang, and Yanfang Ye. Adaptive expansion for hypergraph learning. arXiv, 2025.
[254] Yiyue Qian, Chunhui Zhang, Yiming Zhang, Qianlong Wen, Yanfang Ye, and Chuxu Zhang. Co-modality graph contrastive learning for imbalanced node classification. In NeurIPS, 2022.
[255] Tianyi Ma, Yiyue Qian, Zehong Wang, Zheyuan Zhang, Chuxu Zhang, and Yanfang Ye. Llm-empowered class imbalanced graph prompt learning for online drug trafficking detection. arXiv, 2025.
[256] Marek Śmieja, Łukasz Struski, Jacek Tabor, Bartosz Zieliński, and Przemysław Spurek. Processing of missing data by neural networks. In NeurIPS, 2018.
[257] Jitao Zhao, Di Jin, Meng Ge, Lianze Shan, Xin Wang, Dongxiao He, and Zhiyong Feng. Fug: Feature-universal graph contrastive pre-training for graphs with diverse node features. In NeurIPS, 2024.
[258] Hervé Abdi and Lynne J Williams. Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2010.
[259] Xiao Wang, Deyu Bo, Chuan Shi, Shaohua Fan, Yanfang Ye, and Philip S Yu. A survey on heterogeneous graph embedding: methods, techniques, applications and sources. IEEE Transactions on Big Data, 2022.
[260] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In ICCV, 2023.
[261] Karsten M Borgwardt, Arthur Gretton, Malte J Rasch, Hans-Peter Kriegel, Bernhard Schölkopf, and Alex J Smola. Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics, 2006.
[262] Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, et al. Perceiver io: A general architecture for structured inputs & outputs. arXiv, 2021.
[263] Xixi Wu, Yifei Shen, Fangzhou Ge, Caihua Shan, Yizhu Jiao, Xiangguo Sun, and Hong Cheng. A comprehensive analysis on llm-based node classification algorithms. arXiv, 2025.
[264] Tianyi Ma, Yiyue Qian, Chuxu Zhang, and Yanfang Ye. Hypergraph contrastive learning for drug trafficking community detection. In ICDM, 2023.
[265] Yiyue Qian, Tianyi Ma, Chuxu Zhang, and Yanfang Ye. Adaptive graph enhancement for imbalanced multi-relation graph learning. In WSDM, 2025.
[266] Kaixiong Zhou, Xiao Huang, Daochen Zha, Rui Chen, Li Li, Soo-Hyun Choi, and Xia Hu. Dirichlet energy constrained learning for deep graph neural networks. In NeurIPS, 2021.
[267] Robert Gray. Vector quantization. IEEE ASSP Magazine, 1984.
[268] Yonghua Zhu, Lei Feng, Zhenyun Deng, Yang Chen, Robert Amor, and Michael Witbrock. Robust node classification on graph data with graph and label noise. In AAAI, 2024.
[269] Yiqiao Li, Jianlong Zhou, Sunny Verma, and Fang Chen. A survey of explainable graph neural networks: Taxonomy and evaluation metrics. arXiv, 2022.
[270] Antonio Longa, Steve Azzolin, Gabriele Santin, Giulia Cencetti, Pietro Liò, Bruno Lepri, and Andrea Passerini. Explaining the explainers in graph neural networks: a comparative study. ACM Computing Surveys, 57(5):1-37, 2025.
[271] Xingyue Huang, Pablo Barceló, Michael M Bronstein, İsmail İlkan Ceylan, Mikhail Galkin, Juan L Reutter, and Miguel Romero Orth. How expressive are knowledge graph foundation models? arXiv, 2025.
[272] Yuanning Cui, Zequn Sun, and Wei Hu. A prompt-based knowledge graph foundation model for universal in-context reasoning. NeurIPS, 2025.
[273] Meng Jiang. Cross-network learning with partially aligned graph convolutional networks. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 746-755, 2021.
[274] Kaiwen Dong, Haitao Mao, Zhichun Guo, and Nitesh V Chawla. Universal link predictor by in-context learning on graphs. CoRR, 2024.
[275] Wenqing Zheng, Edward W Huang, Nikhil Rao, Zhangyang Wang, and Karthik Subbian. You only transfer what you share: Intersection-induced graph transfer learning for link prediction. TMLR, 2023.
[276] Xiao Shen, Mengqiu Shao, Shirui Pan, Laurence T. Yang, and Xi Zhou. Domain-adaptive graph attention-supervised network for cross-network edge classification. TNNLS, 2024.
[277] Junhan Yang, Zheng Liu, Shitao Xiao, Chaozhuo Li, Defu Lian, Sanjay Agrawal, Amit Singh, Guangzhong Sun, and Xing Xie. Graphformers: Gnn-nested transformers for representation learning on textual graph. NeurIPS, 2021.
[278] Jianfei Gao, Yangze Zhou, Jincheng Zhou, and Bruno Ribeiro. Double equivariance for inductive link prediction for both new nodes and new relation types. arXiv preprint arXiv:2302.01313, 2023.
[279] Jincheng Zhou, Beatrice Bevilacqua, and Bruno Ribeiro. A multi-task perspective for link prediction with new relation types and nodes. In NeurIPS Workshop, 2023.
[280] Bowen Jin, Yu Zhang, Yu Meng, and Jiawei Han. Edgeformers: Graph-empowered transformers for representation learning on textual-edge networks. In ICLR, 2023.
[281] Jaejun Lee, Chanyoung Chung, and Joyce Jiyoung Whang. Ingram: Inductive knowledge graph embedding via relation graphs. In ICML, 2023.
[282] Yuxin Guo, Cheng Yang, Yuluo Chen, Jixi Liu, Chuan Shi, and Junping Du. A data-centric framework to endow graph neural networks with out-of-distribution detection ability. In KDD, 2023.
[283] Anchun Gui, Jinqiang Ye, and Han Xiao. G-adapter: Towards structure-aware parameter-efficient transfer learning for graph transformer networks. In AAAI, 2024.
[284] Jiying Zhang, Xi Xiao, Long-Kai Huang, Yu Rong, and Yatao Bian. Fine-tuning graph neural networks via graph topology induced optimal transport. In IJCAI, 2022.
[285] Kaveh Hassani. Cross-domain few-shot graph classification. In AAAI, 2022.
[286] Yuanfu Lu, Xunqiang Jiang, Yuan Fang, and Chuan Shi. Learning to pre-train graph neural networks. AAAI, 2021.
[287] Yu Rong, Yatao Bian, Tingyang Xu, Weiyang Xie, Ying Wei, Wenbing Huang, and Junzhou Huang. Self-supervised graph transformer on large-scale molecular data. In NeurIPS, 2020.
[288] Yifei Sun, Qi Zhu, Yang Yang, Chunping Wang, Tianyu Fan, Jiajun Zhu, and Lei Chen. Fine-tuning graph neural networks by preserving graph generative patterns. In AAAI, 2024.
[289] Fabrizio Frasca, Fabian Jogl, Moshe Eliasof, Matan Ostrovsky, Carola-Bibiane Schönlieb, Thomas Gärtner, and Haggai Maron. Towards foundation models on graphs: An analysis on cross-dataset transfer of pretrained gnns. arXiv, 2024.
[290] Frederik Wenkel, Guy Wolf, and Boris Knyazev. Pretrained language models to solve graph tasks in natural language. In ICML Workshop, 2023.
[291] Zhangyang Gao, Daize Dong, Cheng Tan, Jun Xia, Bozhen Hu, and Stan Z Li. A graph is worth K words: Euclideanizing graph using pure transformer. arXiv, 2024.
[292] Yang Yao, Xin Wang, Zeyang Zhang, Yijian Qin, Ziwei Zhang, Xu Chu, Yuekui Yang, Wenwu Zhu, and Hong Mei. Exploring the potential of large language models in graph generation. arXiv, 2024.
[293] Haiteng Zhao, Shengchao Liu, Ma Chang, Hannan Xu, Jie Fu, Zhihong Deng, Lingpeng Kong, and Qi Liu. Gimlet: A unified graph-text model for instruction-based molecule zero-shot learning. In NeurIPS, 2023.
[294] Junjie Xu, Zongyu Wu, Minhua Lin, Xiang Zhang, and Suhang Wang. Llm and gnn are complementary: Distilling llm for multimodal graph learning. arXiv, 2024.
[295] Clement Vignac, Igor Krawczuk, Antoine Siraudin, Bohan Wang, Volkan Cevher, and Pascal Frossard. Digress: Discrete denoising diffusion for graph generation. arXiv, 2022.
[296] Martin Simonovsky and Nikos Komodakis. Graphvae: Towards generation of small graphs using variational autoencoders. In ICANN, 2018.
[297] Jaehyeong Jo, Seul Lee, and Sung Ju Hwang. Score-based generative modeling of graphs via the system of stochastic differential equations. In ICML, 2022.
[298] Bowen Jin, Ziqi Pang, Bingjun Guo, Yu-Xiong Wang, Jiaxuan You, and Jiawei Han. Instructg2i: Synthesizing images from multimodal attributed graphs. arXiv, 2024.
[299] Yu Wang, Ryan A Rossi, Namyong Park, Huiyuan Chen, Nesreen K Ahmed, Puja Trivedi, Franck Dernoncourt, Danai Koutra, and Tyler Derr. Large generative graph models. arXiv, 2024.
[300] Qiannan Zhang, Shichao Pei, Qiang Yang, Chuxu Zhang, Nitesh Chawla, and Xiangliang Zhang. Cross-domain few-shot graph classification with a reinforced task coordinator. AAAI, 2023.
[301] Yingjie Zhu, Xuefeng Bai, Kehai Chen, Yang Xiang, and Min Zhang. Benchmarking and improving large vision-language models for fundamental visual graph understanding and reasoning. arXiv, 2024.
[302] Jiacheng Lin, Kun Qian, Haoyu Han, Nurendra Choudhary, Tianxin Wei, Zhongruo Wang, Sahika Genc, Edward W. Huang, Sheng Wang, Karthik Subbian, Danai Koutra, and Jimeng Sun. GT2Vec: Large language models as multi-modal encoders for text and graph-structured data. arXiv, 2025.
[303] Qihang Ai, Jianwu Zhou, Haiyun Jiang, Lemao Liu, and Shuming Shi. When graph data meets multimodal: A new paradigm for graph understanding and reasoning. arXiv, 2023.
[304] Xiaoxin He, Yijun Tian, Yifei Sun, Nitesh V Chawla, Thomas Laurent, Yann LeCun, Xavier Bresson, and Bryan Hooi. G-retriever: Retrieval-augmented generation for textual graph understanding and question answering. In NeurIPS, 2024.
[305] Yanbin Wei, Shuai Fu, Weisen Jiang, Zejian Zhang, Zhixiong Zeng, Qi Wu, James T. Kwok, and Yu Zhang. GITA: Graph to visual and textual integration for vision-language graph reasoning. arXiv, 2024.
[306] Linhao Luo, Zicheng Zhao, Gholamreza Haffari, Dinh Phung, Chen Gong, and Shirui Pan. Gfm-rag: Graph foundation model for retrieval augmented generation. arXiv, 2025.
[307] Chen Wang, Yueqing Liang, Zhiwei Liu, Tao Zhang, and Philip S Yu. Pre-training graph neural network for cross domain recommendation. In 2021 IEEE Third International Conference on Cognitive Machine Intelligence (CogMI), 2021.
[308] Wei Wei, Xubin Ren, Jiabin Tang, Qinyong Wang, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin, and Chao Huang. Llmrec: Large language models with graph augmentation for recommendation. In WSDM, 2024.
[309] Shijie Geng, Juntao Tan, Shuchang Liu, Zuohui Fu, and Yongfeng Zhang. Vip5: Towards multimodal foundation models for recommendation. arXiv, 2023.
[310] Andreas Damianou, Francesco Fabbri, Paul Gigioli, Marco De Nadai, Alice Wang, Enrico Palumbo, and Mounia Lalmas. Towards graph foundation models for personalization. In WWW, 2024.
[311] Xubin Ren, Wei Wei, Lianghao Xia, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin, and Chao Huang. Representation learning with large language models for recommendation. In WWW, 2024.
[312] Jiazhen Chen, Sichao Fu, Zhibin Zhang, Zheng Ma, Mingbin Feng, Tony S Wirjanto, and Qinmu Peng. Towards cross-domain few-shot graph anomaly detection. arXiv, 2024.
[313] Qizhou Wang, Guansong Pang, Mahsa Salehi, Wray Buntine, and Christopher Leckie. Cross-domain graph anomaly detection via anomaly-aware contrastive alignment. In AAAI, 2023.
[314] Chaoxi Niu, Hezhe Qiao, Changlu Chen, Ling Chen, and Guansong Pang. Zero-shot generalist graph anomaly detection with unified neighborhood prompts. arXiv, 2024.
[315] Yixin Liu, Shiyuan Li, Yu Zheng, Qingfeng Chen, Chengqi Zhang, and Shirui Pan. Arc: A generalist graph anomaly detector with in-context learning. In NeurIPS, 2025.
[316] Kaize Ding, Kai Shu, Xuan Shan, Jundong Li, and Huan Liu. Cross-domain graph anomaly detection. IEEE Transactions on Neural Networks and Learning Systems, 33(6):2406-2415, 2021.
[317] Swarnadeep Saha, Prateek Yadav, Lisa Bauer, and Mohit Bansal. Explagraphs: An explanation graph generation task for structured commonsense reasoning. arXiv, 2021.
[318] Drew A Hudson and Christopher D Manning. Gqa: A new dataset for real-world visual reasoning and compositional question answering. In CVPR, 2019.
[319] Wen-tau Yih, Matthew Richardson, Christopher Meek, Ming-Wei Chang, and Jina Suh. The value of semantic parse labeling for knowledge base question answering. In ACL, 2016.
[320] Linhao Luo, Yuan-Fang Li, Gholamreza Haffari, and Shirui Pan. Reasoning on graphs: Faithful and interpretable large language model reasoning. arXiv, 2023.
[321] Yijun Tian, Huan Song, Zichen Wang, Haozhu Wang, Ziqing Hu, Fang Wang, Nitesh V Chawla, and Panpan Xu. Graph neural prompting with large language models. In AAAI, 2024.
[322] Zijian Li, Qingyan Guo, Jiawei Shao, Lei Song, Jiang Bian, Jun Zhang, and Rui Wang. Graph neural network enhanced retrieval for question answering of llms. arXiv preprint arXiv:2406.06572, 2024.
[323] Inyoung Choi, Sukwon Yun, Jiayi Xin, Jie Peng, Tianlong Chen, and Qi Long. Multimodal graph-llm: Leveraging graph-enhanced llms for multimodal healthcare predictions. OpenReview, 2025.
[324] Shiyu Tian, Yangyang Luo, Tianze Xu, Caixia Yuan, Huixing Jiang, Chen Wei, and Xiaojie Wang. Kg-adapter: Enabling knowledge graph integration in large language models through parameter-efficient fine-tuning. In ACL, 2024.
[325] Aabid A Mir, Megat F Zuhairi, and Shahrulniza Musa. Graph anomaly detection with graph convolutional networks. International Journal of Advanced Computer Science & Applications, 14(11), 2023.
[326] Hwan Kim, Byung Suk Lee, Won-Yong Shin, and Sungsu Lim. Graph anomaly detection with graph neural networks: Current status and challenges. IEEE Access, 10:111820-111829, 2022.
[327] Yuanchen Bei, Sheng Zhou, Jinke Shi, Yao Ma, Haishuai Wang, and Jiajun Bu. Guarding graph neural networks for unsupervised graph anomaly detection. arXiv, 2024.
[328] Duo Zhang, Xinzijian Liu, Xiangyu Zhang, Chengqian Zhang, Chun Cai, Hangrui Bi, Yiming Du, Xuejian Qin, Anyang Peng, Jiameng Huang, et al. Dpa-2: a large atomic model as a multi-task learner. npj Computational Materials, 2024. [328] 张铎,刘新子健,张祥宇,张成倩,蔡春,毕航瑞,杜一鸣,秦雪建,彭安阳,黄佳萌,等。Dpa-2:作为多任务学习器的大型原子模型。npj 计算材料,2024 年。
[329] Ilyes Batatia, Philipp Benner, Yuan Chiang, Alin M Elena, Dávid P Kovács, Janosh Riebesell, Xavier R Advincula, Mark Asta, Matthew Avaylon, William J Baldwin, et al. A foundation model for atomistic materials chemistry. arXiv, 2023. [329] 伊利斯·巴塔蒂亚、菲利普·本纳、袁蒋、阿林·埃琳娜、达维德·科瓦奇、贾诺什·里贝塞尔、泽维尔·阿德文库拉、马克·阿斯塔、马修·阿瓦伦、威廉·鲍德温等。原子材料化学的基础模型。arXiv,2023 年。
[330] Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, and Robert Stojnic. Galactica: A large language model for science. arXiv, 2022. [330] 罗斯·泰勒、马辛·卡达斯、吉列姆·库库鲁尔、托马斯·夏洛姆、安东尼·哈茨霍恩、埃尔维斯·萨拉维亚、安德鲁·波尔顿、维克多·克尔克兹和罗伯特·斯托伊尼奇。Galactica:科学大型语言模型。arXiv,2022 年。
[331] Maciej Sypetkowski, Frederik Wenkel, Farimah Poursafaei, Nia Dickson, Karush Suri, Philip Fradkin, and Dominique Beaini. On the scalability of gnns for molecular graphs. NeruIPS, 2025. [331] Maciej Sypetkowski、Frederik Wenkel、Farimah Poursafaei、Nia Dickson、Karush Suri、Philip Fradkin 和 Dominique Beaini。关于分子图的 gnns 的可扩展性。NeruIPS,2025 年。
[332] Shuxin Zheng, Jiyan He, Chang Liu, Yu Shi, Ziheng Lu, Weitao Feng, Fusong Ju, Jiaxi Wang, Jianwei Zhu, Yaosen Min, et al. Towards predicting equilibrium distributions for molecular systems with deep learning. Nature Machine Intelligence, 2024.
[333] Mikolaj Mizera, Arkadii Lin, Eugene Babin, Yury Kashkur, Tatiana Sitnik, Ien An Chan, Arsen Yedige, Maksim Vendin, Shamkhal Baybekov, and Vladimir Aladinskiy. Graph transformer foundation model for modeling admet properties. ChemRxiv, 2024.
[334] Yuyan Liu, Sirui Ding, Sheng Zhou, Wenqi Fan, and Qiaoyu Tan. Moleculargpt: Open large language model (llm) for few-shot molecular property prediction. arXiv, 2024.
[335] Yanjun Lyu, Zihao Wu, Lu Zhang, Jing Zhang, Yiwei Li, Wei Ruan, Zhengliang Liu, Xiaowei Yu, Chao Cao, Tong Chen, et al. Gp-gpt: Large language model for gene-phenotype mapping. arXiv, 2024.
[336] Zifeng Wang, Zichen Wang, Balasubramaniam Srinivasan, Vassilis N Ioannidis, Huzefa Rangwala, and Rishita Anubhai. Biobridge: Bridging biomedical foundation models via knowledge graphs. arXiv, 2023.
[337] Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. BioRxiv, 2022.
[338] Chen Qian, Huayi Tang, Zhirui Yang, Hong Liang, and Yong Liu. Can large language models empower molecular property prediction? arXiv, 2023.
[339] He Cao, Zijing Liu, Xingyu Lu, Yuan Yao, and Yu Li. Instructmol: Multi-modal integration for building a versatile and reliable molecular assistant in drug discovery. arXiv, 2023.
[340] Youjia Li, Vishu Gupta, Muhammed Nur Talha Kilic, Kamal Choudhary, Daniel Wines, Wei-keng Liao, Alok Choudhary, and Ankit Agrawal. Hybrid-llm-gnn: integrating large language models and graph neural networks for enhanced materials property prediction. Digital Discovery, 2025.
[341] Carl Edwards, ChengXiang Zhai, and Heng Ji. Text2mol: Cross-modal molecule retrieval with natural language queries. In EMNLP, 2021.
[342] Shengchao Liu, Weili Nie, Chengpeng Wang, Jiarui Lu, Zhuoran Qiao, Ling Liu, Jian Tang, Chaowei Xiao, and Animashree Anandkumar. Multi-modal molecule structure-text model for text-based retrieval and editing. Nature Machine Intelligence, 2023.
[343] Philipp Seidl, Andreu Vall, Sepp Hochreiter, and Günter Klambauer. Enhancing activity prediction models in drug discovery with the ability to understand human language. In ICML, 2023.
[344] Pengfei Liu, Yiming Ren, Jun Tao, and Zhixiang Ren. Git-mol: A multi-modal large language model for molecular science with graph, image, and text. Computers in Biology and Medicine, 2024.
[345] Yizhen Luo, Kai Yang, Massimo Hong, Xing Yi Liu, and Zaiqing Nie. Molfm: A multimodal molecular foundation model. arXiv, 2023.
[346] Bing Su, Dazhao Du, Zhao Yang, Yujie Zhou, Jiangmeng Li, Anyi Rao, Hao Sun, Zhiwu Lu, and Ji-Rong Wen. A molecular multimodal foundation model associating molecule graphs with natural language. arXiv, 2022.
[347] Zhiyuan Liu, Sihang Li, Yanchen Luo, Hao Fei, Yixin Cao, Kenji Kawaguchi, Xiang Wang, and Tat-Seng Chua. Molca: Molecular graph-language modeling with cross-modal projector and uni-modal adapter. In EMNLP, 2023.
[348] Yaorui Shi, An Zhang, Enzhi Zhang, Zhiyuan Liu, and Xiang Wang. Relm: Leveraging language models for enhanced chemical reaction prediction. arXiv, 2023.
[349] Felix Musil, Andrea Grisafi, Albert P Bartók, Christoph Ortner, Gábor Csányi, and Michele Ceriotti. Physics-inspired structural representations for molecules and materials. Chemical Reviews, 2021.
[350] Hannes Stärk, Dominique Beaini, Gabriele Corso, Prudencio Tossou, Christian Dallago, Stephan Günnemann, and Pietro Liò. 3d infomax improves gnns for molecular property prediction. In ICML, 2022.
[351] Yuyang Wang, Jianren Wang, Zhonglin Cao, and Amir Barati Farimani. Molecular contrastive learning of representations via graph neural networks. Nature Machine Intelligence, 2022.
[352] Dominique Beaini, Shenyang Huang, Joao Alex Cunha, Zhiyi Li, Gabriela Moisescu-Pareja, Oleksandr Dymov, Samuel Maddrell-Mander, Callum McLean, Frederik Wenkel, Luis Müller, et al. Towards foundational models for molecular learning on large-scale multi-task datasets. arXiv, 2023.
[353] Shuxin Zheng, Jiyan He, Chang Liu, Yu Shi, Ziheng Lu, Weitao Feng, Fusong Ju, Jiaxi Wang, Jianwei Zhu, Yaosen Min, et al. Predicting equilibrium distributions for molecular systems with deep learning. Nature Machine Intelligence, 2024.
[354] Borja Ibarz, Vitaly Kurin, George Papamakarios, Kyriacos Nikiforou, Mehdi Bennani, Róbert Csordás, Andrew Joseph Dudzik, Matko Bošnjak, Alex Vitvitskyi, Yulia Rubanova, et al. A generalist neural algorithmic learner. In LoG, 2022.
[355] Rongzheng Wang, Shuang Liang, Qizhi Chen, Jiasheng Zhang, and Ke Qin. Graphtool-instruction: Revolutionizing graph reasoning in llms through decomposed subtask instruction. arXiv, 2024.
[356] Sambhav Khurana, Xiner Li, Shurui Gui, and Shuiwang Ji. A hierarchical language model for interpretable graph reasoning. arXiv, 2024.
[357] Palaash Agrawal, Shavak Vasania, and Cheston Tan. Can llms perform structured graph reasoning? arXiv, 2024.
[358] Yifan Feng, Chengwu Yang, Xingliang Hou, Shaoyi Du, Shihui Ying, Zongze Wu, and Yue Gao. Beyond graphs: Can large language models comprehend hypergraphs? arXiv, 2024.
[359] Christos Xypolopoulos, Guokan Shang, Xiao Fei, Giannis Nikolentzos, Hadi Abdine, Iakovos Evdaimon, Michail Chatzianastasis, Giorgos Stamou, and Michalis Vazirgiannis. Graph linearization methods for reasoning on graphs with large language models. arXiv, 2024.
[360] Junchi Yu, Ran He, and Rex Ying. Thought propagation: An analogical approach to complex reasoning with large language models. arXiv, 2023.
[361] Yuwei Hu, Runlin Lei, Xinyi Huang, Zhewei Wei, and Yongchao Liu. Scalable and accurate graph reasoning with llm-based multi-agents. arXiv, 2024.
[362] Xin Li, Qizhi Chu, Yubin Chen, Yang Liu, Yaoqi Liu, Zekai Yu, Weize Chen, Chen Qian, Chuan Shi, and Cheng Yang. Graphteam: Facilitating large language model-based graph analysis via multi-agent collaboration. arXiv, 2024.
[363] Sheng Ouyang, Yulan Hu, Ge Chen, and Yong Liu. Gundam: Aligning large language models with graph understanding. arXiv, 2024.
[364] Chang Gong, Wanrui Bian, Zhijie Zhang, and Weiguo Zheng. Pseudocode-injection magic: Enabling llms to tackle graph computational tasks. arXiv, 2025.
[365] Zihan Luo, Xiran Song, Hong Huang, Jianxun Lian, Chenhao Zhang, Jinqi Jiang, and Xing Xie. Graphinstruct: Empowering large language models with graph understanding and reasoning capability. arXiv, 2024.
[366] Qifan Zhang, Xiaobin Hong, Jianheng Tang, Nuo Chen, Yuhan Li, Wenzhong Li, Jing Tang, and Jia Li. Gcoder: Improving large language model for generalized graph problem solving. arXiv, 2024.
[367] Ziwei Chai, Tianjie Zhang, Liang Wu, Kaiqiao Han, Xiaohai Hu, Xuanwen Huang, and Yang Yang. Graphllm: Boosting graph reasoning ability of large language model. arXiv, 2023.
[368] Bryan Perozzi, Bahare Fatemi, Dustin Zelle, Anton Tsitsulin, Mehran Kazemi, Rami Al-Rfou, and Jonathan Halcrow. Let your graph do the talking: Encoding structured data for llms. arXiv, 2024.
[369] Xinnan Dai, Haohao Qu, Yifen Shen, Bohang Zhang, Qihao Wen, Wenqi Fan, Dongsheng Li, Jiliang Tang, and Caihua Shan. How do large language models understand graph patterns? a benchmark for graph pattern comprehension. arXiv, 2024.
[370] Alexander K Taylor, Anthony Cuturrufo, Vishal Yathish, Mingyu Derek Ma, and Wei Wang. Are large-language models graph algorithmic reasoners? arXiv, 2024.
[371] Jianheng Tang, Qifan Zhang, Yuhan Li, Nuo Chen, and Jia Li. Grapharena: Evaluating and exploring large language models on graph computation. In ICLR, 2025.
[372] Heng Wang, Shangbin Feng, Tianxing He, Zhaoxuan Tan, Xiaochuang Han, and Yulia Tsvetkov. Can language models solve graph problems in natural language? NeurIPS, 2023.
[373] Kerui Zhu, Bo-Wei Huang, Bowen Jin, Yizhu Jiao, Ming Zhong, Kevin Chang, Shou-De Lin, and Jiawei Han. Investigating instruction tuning large language models on graphs. arXiv, 2024.
[374] Clayton Sanford, Bahare Fatemi, Ethan Hall, Anton Tsitsulin, Mehran Kazemi, Jonathan Halcrow, Bryan Perozzi, and Vahab Mirrokni. Understanding transformer reasoning capabilities via graph algorithms. NeurIPS, 2025.
[375] Yizhuo Zhang, Heng Wang, Shangbin Feng, Zhaoxuan Tan, Xiaochuang Han, Tianxing He, and Yulia Tsvetkov. Can llm graph reasoning generalize beyond pattern memorization? arXiv preprint arXiv:2406.15992, 2024.
[376] Jashn Arora, Rahul Madhavan, Karthikeyan Shanmugam, John Palowitch, and Manish Jain. Treetop: Topology-aware fine-tuning for llm conversation tree understanding. In NeurIPS 2024 Workshop on Fine-Tuning in Modern Machine Learning: Principles and Scalability, 2024.
[377] Bowen Jin, Wentao Zhang, Yu Zhang, Yu Meng, Han Zhao, and Jiawei Han. Learning multiplex representations on text-attributed graphs with one language model encoder. arXiv, 2023.
[378] Shengyin Sun, Yuxiang Ren, Chen Ma, and Xuecang Zhang. Large language models as topological structure enhancers for text-attributed graphs. arXiv, 2023.
[379] Xuanwen Huang, Kaiqiao Han, Dezheng Bao, Quanjin Tao, Zhisheng Zhang, Yang Yang, and Qi Zhu. Prompt-based node feature extractor for few-shot learning on text-attributed graphs. arXiv, 2023.
[380] Bowen Jin, Wentao Zhang, Yu Zhang, Yu Meng, Xinyang Zhang, Qi Zhu, and Jiawei Han. Patton: Language model pretraining on text-rich networks. arXiv, 2023.
[381] Bowen Jin, Yu Zhang, Qi Zhu, and Jiawei Han. Heterformer: Transformer-based deep node representation learning on heterogeneous text-rich networks. In KDD, 2023.
[382] Dasol Hwang, Jinyoung Park, Sunyoung Kwon, KyungMin Kim, Jung-Woo Ha, and Hyunwoo J Kim. Self-supervised auxiliary learning with meta-paths for heterogeneous graphs. NeurIPS, 2020.
[383] Xunqiang Jiang, Tianrui Jia, Yuan Fang, Chuan Shi, Zhe Lin, and Hui Wang. Pre-training on large-scale heterogeneous graph. In KDD, 2021.
[384] Yihong Ma, Ning Yan, Jiayu Li, Masood Mortazavi, and Nitesh V Chawla. Hetgpt: Harnessing the power of prompt tuning in pre-trained heterogeneous graph neural networks. In WWW, 2024.
[385] Qiannan Zhang, Xiaodong Wu, Qiang Yang, Chuxu Zhang, and Xiangliang Zhang. Few-shot heterogeneous graph learning via cross-domain knowledge transfer. In KDD, 2022.
[386] Qiuyu Zhu, Liang Zhang, Qianxiong Xu, and Cheng Long. Hierpromptlm: A pure plm-based framework for representation learning on heterogeneous text-rich networks. arXiv, 2025.
[387] Tao Zou, Le Yu, Yifei Huang, Leilei Sun, and Bowen Du. Pretraining language models with text-attributed heterogeneous graphs. arXiv, 2023.
[388] Han Xie, Da Zheng, Jun Ma, Houyu Zhang, Vassilis N Ioannidis, Xiang Song, Qing Ping, Sheng Wang, Carl Yang, Yi Xu, et al. Graph-aware language model pre-training on a large graph corpus can help multiple graph applications. In KDD, 2023.
[389] Hang Gao, Chenhao Zhang, Fengge Wu, Junsuo Zhao, Changwen Zheng, and Huaping Liu. Bootstrapping heterogeneous graph representation learning via large language models: A generalized approach. arXiv, 2024.
[390] Jiabin Tang, Yuhao Yang, Wei Wei, Lei Shi, Long Xia, Dawei Yin, and Chao Huang. Higpt: Heterogeneous graph language model. In KDD, 2024.
[391] Jiasheng Zhang, Jialin Chen, Ali Maatouk, Ngoc Bui, Qianqian Xie, Leandros Tassiulas, Jie Shao, Hua Xu, and Rex Ying. Litfm: A retrieval augmented structure-aware foundation model for citation graphs. arXiv, 2024.
[392] Kiarash Shamsi, Tran Gia Bao Ngo, Razieh Shirzadkhani, Shenyang Huang, Farimah Poursafaei, Poupak Azad, Reihaneh Rabbany, Baris Coskunuzer, Guillaume Rabusseau, and Cuneyt Gurcan Akcora. Mint: Multi-network training for transfer learning on temporal graphs. arXiv, 2024.
[393] Alessandro Antonucci, Gregorio Piqué, and Marco Zaffalon. Zero-shot causal graph extrapolation from text via llms. arXiv, 2023.
[394] Yanbin Wei, Qiushi Huang, James T Kwok, and Yu Zhang. Kicgpt: Large language model with knowledge in context for knowledge graph completion. In EMNLP, 2023.
[395] Zeyang Zhang, Xin Wang, Ziwei Zhang, Haoyang Li, Yijian Qin, and Wenwu Zhu. Llm4dyg: can large language models solve spatial-temporal problems on dynamic graphs? In KDD, 2024.
[396] Zijian Zhang, Zonghan Zhang, and Zhiqian Chen. Flowgpt: How long can llms trace back and predict the trends of graph dynamics? arXiv, 2024.
[397] Sirui Chen, Mengying Xu, Kun Wang, Xingyu Zeng, Rui Zhao, Shengjie Zhao, and Chaochao Lu. Clear: Can language models really understand causal graphs? arXiv, 2024.
[398] Nathan C Frey, Ryan Soklaski, Simon Axelrod, Siddharth Samsi, Rafael Gomez-Bombarelli, Connor W Coley, and Vijay Gadepally. Neural scaling of deep chemical models. Nature Machine Intelligence, 2023.
[399] Dingshuo Chen, Yanqiao Zhu, Jieyu Zhang, Yuanqi Du, Zhixun Li, Qiang Liu, Shu Wu, and Liang Wang. Uncovering neural scaling laws in molecular representation learning. In NeurIPS, volume 36, 2023.
[400] Jingzhe Liu, Haitao Mao, Zhikai Chen, Tong Zhao, Neil Shah, and Jiliang Tang. Towards neural scaling laws on graphs. arXiv, 2024.
[401] Qian Ma, Haitao Mao, Jingzhe Liu, Zhehua Zhang, Chunlin Feng, Yu Song, Yihan Shao, and Yao Ma. Do neural scaling laws exist on graph self-supervised learning? arXiv, 2024.
[402] Yoshua Bengio. Deep learning of representations for unsupervised and transfer learning. In ICML workshop, 2012.
[403] Junguang Jiang, Yang Shu, Jianmin Wang, and Mingsheng Long. Transferability in deep learning: A survey. arXiv, 2022.
[404] Haitao Mao, Zhikai Chen, Wei Jin, Haoyu Han, Yao Ma, Tong Zhao, Neil Shah, and Jiliang Tang. Demystifying structural disparity in graph neural networks: Can one size fit all? In NeurIPS, 2023.
[405] Muhan Zhang, Pan Li, Yinglong Xia, Kai Wang, and Long Jin. Labeling trick: A theory of using graph neural networks for multi-node representation learning. NeurIPS, 2021.
[406] Luan Tran, Xi Yin, and Xiaoming Liu. Disentangled representation learning gan for pose-invariant face recognition. In CVPR, 2017.
[407] Adrien Bardes, Jean Ponce, and Yann LeCun. Vicreg: Variance-invariance-covariance regularization for self-supervised learning. arXiv, 2021.
[408] Christopher Morris, Martin Ritzert, Matthias Fey, William L Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and Leman go neural: Higher-order graph neural networks. In AAAI, 2019.
[409] Luana Ruiz, Luiz Chamon, and Alejandro Ribeiro. Graphon neural networks and the transferability of graph neural networks. In NeurIPS, 2020.
[410] Yuxuan Cao, Jiarong Xu, Carl Yang, Jiaan Wang, Yunchao Zhang, Chunping Wang, Lei Chen, and Yang Yang. When to pre-train graph neural networks? from data generation perspective! In KDD, 2023.
[411] Ron Levie, Wei Huang, Lorenzo Bucci, Michael Bronstein, and Gitta Kutyniok. Transferability of spectral graph convolutional neural networks. JMLR, 2021.
[412] Ron Levie, Elvin Isufi, and Gitta Kutyniok. On the transferability of spectral graph filters. In SampTA, 2019.
[413] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In ICLR, 2019.
[414] Zhikai Chen, Haitao Mao, Jingzhe Liu, Yu Song, Bingheng Li, Wei Jin, Bahare Fatemi, Anton Tsitsulin, Bryan Perozzi, Hui Liu, et al. Text-space graph foundation models: Comprehensive benchmarks and new insights. arXiv, 2024.
[415] Jiasheng Zhang, Jialin Chen, Menglin Yang, Aosong Feng, Shuang Liang, Jie Shao, and Rex Ying. Dtgb: A comprehensive benchmark for dynamic text-attributed graphs. arXiv, 2024.
[416] Jiarui Feng, Hao Liu, Lecheng Kong, Yixin Chen, and Muhan Zhang. Taglas: An atlas of text-attributed graph datasets in the era of large graph and language models. arXiv, 2024.
[417] Author et al. Toxcast: [dataset details for toxcast], 2024. Accessed via OpenReview, https://openreview.net/pdf?id=Q2sDuwtutB.
[418] Yuhao Xu, Xinqi Liu, Keyu Duan, Yi Fang, Yu-Neng Chuang, Daochen Zha, and Qiaoyu Tan. Graphfm: A comprehensive benchmark for graph foundation model. arXiv preprint arXiv:2406.08310, 2024.
[419] Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open graph benchmark: Datasets for machine learning on graphs. In NeurIPS, 2020.
[420] Guohao Li, Matthias Muller, Ali Thabet, and Bernard Ghanem. Deepgcns: Can gcns go as deep as cnns? In ICCV, 2019.
[421] Uri Alon and Eran Yahav. On the bottleneck of graph neural networks and its practical implications. In ICLR, 2021.
[422] Vijay Prakash Dwivedi, Ladislav Rampášek, Mikhail Galkin, Ali Parviz, Guy Wolf, Anh Tuan Luu, and Dominique Beaini. Long range graph benchmark. In NeurIPS, 2022.
[423] Christopher Morris, Fabrizio Frasca, Nadav Dym, Haggai Maron, Ismail Ilkan Ceylan, Ron Levie, Derek Lim, Michael M. Bronstein, Martin Grohe, and Stefanie Jegelka. Position: Future directions in the theory of graph machine learning. In ICML, 2024.
[424] Zhenkun Cai, Xiao Yan, Yidi Wu, Kaihao Ma, James Cheng, and Fan Yu. Dgcl: An efficient communication library for distributed gnn training. In EuroSys, 2021.
[425] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR, 2021.
[426] Qitian Wu, Wentao Zhao, Chenxiao Yang, Hengrui Zhang, Fan Nie, Haitian Jiang, Yatao Bian, and Junchi Yan. Simplifying and empowering transformers for large-graph representations. In NeurIPS, 2023.
[427] Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, and Nicolas Ballas. Self-supervised learning from images with a joint-embedding predictive architecture. In CVPR, 2023.
[428] Crawl4ai team. Crawl4ai. crawl4ai.com, 2024.
[429] Lin Long, Rui Wang, Ruixuan Xiao, Junbo Zhao, Xiao Ding, Gang Chen, and Haobo Wang. On llms-driven synthetic data generation, curation, and evaluation: A survey. arXiv, 2024.
[430] Zhenghao Lin, Zhibin Gou, Yeyun Gong, Xiao Liu, Yelong Shen, Ruochen Xu, Chen Lin, Yujiu Yang, Jian Jiao, Nan Duan, et al. Rho-1: Not all tokens are what you need. arXiv, 2024.
[431] Mengzhou Xia, Sadhika Malladi, Suchin Gururangan, Sanjeev Arora, and Danqi Chen. Less: Selecting influential data for targeted instruction tuning. arXiv, 2024.
[432] Eyal Winter. The Shapley value. Handbook of Game Theory with Economic Applications, 2002.
[433] Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gürel, Bo Li, Ce Zhang, Dawn Song, and Costas J Spanos. Towards efficient data valuation based on the Shapley value. In AISTATS, 2019.
[434] Maya Bechler-Speicher, Ben Finkelshtein, Fabrizio Frasca, Luis Müller, Jan Tönshoff, Antoine Siraudin, Viktor Zaverkin, Michael M Bronstein, Mathias Niepert, Bryan Perozzi, et al. Position: Graph learning will lose relevance due to poor benchmarks. arXiv, 2025.
[435] Xiangguo Sun, Jiawen Zhang, Xixi Wu, Hong Cheng, Yun Xiong, and Jia Li. Graph prompt learning: A comprehensive survey and beyond. arXiv, 2023.
[436] Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, Joe Wenjie Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, Omkar Pathak, Azade Nova, et al. A graph placement methodology for fast chip design. Nature, 2021.
[437] Frederik Wenkel, Semih Cantürk, Michael Perlmutter, and Guy Wolf. Towards a general gnn framework for combinatorial optimization. arXiv, 2024.
[438] Matthias Fey, Weihua Hu, Kexin Huang, Jan Eric Lenssen, Rishabh Ranjan, Joshua Robinson, Rex Ying, Jiaxuan You, and Jure Leskovec. Position: Relational deep learning-graph representation learning on relational databases. In ICML, 2024.
² Graph Transformers can be seen as performing global message passing between any pairs of nodes.