
Multi-Feature Fusion Strategies for Enhancing Knowledge Graph Embedding

Chenchen Liu$^a$, Fei Pu$^{a,b,*}$, Bailin Yang$^a$ and Lirong Cheng$^c$

$^a$ School of Computer Science and Technology, Zhejiang Gongshang University, Hangzhou, China
$^b$ Economic Forecasting and Policy Simulation Laboratory, Zhejiang Gongshang University, Hangzhou, China
$^c$ School of Humanities and Communications, Zhejiang Gongshang University, Hangzhou, China

ARTICLE INFO

Keywords:

Knowledge graph
Entity-relation interactions
Feature fusion
Convolutional neural network
Link prediction

Abstract

Knowledge Graph Embedding (KGE) constitutes a pivotal technique in the comprehension of large-scale knowledge graphs, serving to effectively represent entities and the intricate relationships that exist among them. Despite advancements in neural network-based KGE models, current approaches still encounter significant challenges. Key challenges include insufficient modeling of intricate relational patterns, difficulties in effectively integrating heterogeneous features, and limited depth in representing the complex interactions between entities and their corresponding relations. This paper proposes a novel multi-feature fusion approach to address these issues by leveraging four specialized modules: the Relation-Conditioned Neighborhood Aggregator (RCNA), the Dynamic Relation Modulator (DRM), the Entity-Relation Attention Module (ERAM), and the Adaptive Feature Fusion Module (AFFM). These modules combine neighborhood, relation-aware, and interaction features through deep neural networks to generate robust and expressive entity and relation embeddings. Our method dynamically adjusts the influence of different features, improving both the accuracy and the generalizability of the embeddings. Extensive experiments on five benchmark knowledge graph datasets, FB15K-237, YAGO3-10, WN18RR, KINSHIP, and UMLS, demonstrate that our approach significantly outperforms state-of-the-art models, achieving superior performance on link prediction tasks.

1. Introduction

Knowledge graphs (KGs) have become a widely adopted data model for structurally representing knowledge and its relationships, with applications in areas such as semantic search [34], recommendation systems [32], natural language processing (NLP) [8], and intelligent question answering [36]. By establishing a multi-layered semantic network wherein nodes represent entities and edges denote relationships, KGs can effectively capture and represent the intricate semantic and structural interconnections among entities.
In real-world applications, most knowledge graphs are incomplete. Chain-based prediction, as an important reasoning method, infers missing information by utilizing the existing knowledge within the graph, thereby filling in the gaps. KGE techniques map entities and relations to a low-dimensional vector space, enabling semantic information within the graph to be computed and reasoned over in vector form. This provides effective support for chain-based prediction, further promoting the inference of missing parts of the graph. Early KGE methods primarily focused on geometric transformation mechanisms, such as TransE [5], TransH [33], and TransD [10], which capture the relational structures between entities through simple spatial transformations. RotatE [24] replaces translation with rotation, enabling the model to capture a wider range of relationship patterns. Meanwhile, traditional bilinear scoring function-based methods, such as DistMult [35] and ComplEx [28],

effectively model the interactions between entities and relations. RESCAL [20] uses a matrix to represent each relation and captures the latent interactions between entities. However, these methods exhibit certain limitations when handling asymmetric relations. With advancements in deep learning technologies, the Convolutional Neural Network (CNN) [17] has been increasingly applied to knowledge graph reasoning tasks. Recent research has explored the use of neural network architectures for KGE, where ConvE [9] employs convolutional filters to capture local interaction patterns, and R-GCN [21] utilizes a relational graph convolutional network to aggregate relational information.
Although neural network-based knowledge graph embedding (KGE) methods have made significant strides in recent years, existing embedding models still face several critical challenges. First, current KGE models predominantly focus on direct entity-relation pairs, failing to fully exploit the rich structural information embedded within local neighborhoods. This limitation restricts the model's ability to capture complex topological patterns, which are essential for understanding the semantics of entities within the context of relations. Traditional approaches rely on static, low-dimensional representations, which are inadequate for encoding the multi-dimensional semantic nature of knowledge graphs. In knowledge graphs, the semantic features of entities and relations are dynamic, influenced by their structural context and co-occurrence patterns. However, static representations are inherently limited in capturing the dynamic evolution of semantic information. Furthermore, existing methods lack sophisticated mechanisms for integrating heterogeneous semantic features. Current models are

often unable to effectively merge these different types of features, resulting in a narrow and incomplete semantic representation. Knowledge graphs typically embody complex and diverse semantics, and developing an embedding model that can effectively integrate these heterogeneous characteristics remains a significant challenge, especially as the scale and heterogeneity of knowledge graphs continue to grow.
To address these limitations, we propose a novel Multi-Feature Fusion Embedding (MFFE) framework that systematically integrates multiple complementary feature representations through specialized neural modules. Our approach recognizes that effective knowledge graph embedding requires the coordinated modeling of diverse structural and semantic aspects, rather than relying on single-feature extraction strategies. We design four core modules: RCNA, DRM, ERAM, and AFFM. Specifically, RCNA constructs a local subgraph tensor and applies a relation-aware convolutional transformation to aggregate neighborhood features. DRM introduces a learnable relation matrix that dynamically modulates the influence of each relation on entities, thereby enhancing the model's expressive capability. ERAM utilizes multi-head attention to capture higher-order interaction features between entities and relations. AFFM employs a dynamic bidirectional gated fusion mechanism to fuse neighborhood features, relation-aware features, and entity-relation interaction features, generating high-quality embedding representations.
The contributions of this research are:
  1. This study presents an advanced embedding approach that effectively integrates multiple feature types, including neighborhood structures, relation-aware information, and entity-relation interactions, to enhance performance in link prediction tasks.
  2. We introduce an innovative feature aggregation technique that leverages a dynamic bidirectional gated fusion mechanism to adaptively combine heterogeneous features.
  3. Our method achieves substantial improvements over state-of-the-art baselines in link prediction across multiple benchmark datasets, including FB15K-237, YAGO3-10, WN18RR, KINSHIP, and UMLS, validating the efficacy of multi-feature dynamic fusion.
2. Related Work

Neural network-based models leverage deep learning techniques to capture complex interactions between entities and relations in knowledge graphs. One of the pioneering models in this category is ConvE, which uses a CNN to apply 2D convolutions to the interaction between entities and relations, extracting richer features from these interactions. InteractE [29] uses a checkerboard reshaping method to enable complete interactions between entities, further enhancing the feature extraction process. HypER [3] uses hypernetwork models that share weights across layers and adaptively combine them based on inputs, allowing for more flexible entity-relation interactions. In ConvR [12], relations are utilized

directly as convolutional kernels, enabling explicit interactions between entities and relations and thereby enhancing the model's ability to capture relational patterns. M-DCN [40] extracts interaction features of entities and relations through multi-scale convolutional kernels, enhancing the expressiveness of the embeddings. JointE [41] combines 1D and 2D convolution to promote entity-relation interaction, and dynamically generates convolutional kernels based on internal embeddings. In MSHE [11], entities, relationships, and joint knowledge sources are constructed via a base multi-source extension framework, followed by hierarchical convolutional networks that capture multi-level semantic features. CALP [39] fuses 2D convolution and self-attention mechanisms to simultaneously capture local and global feature interactions between entities and relations, significantly improving link prediction performance while maintaining low-dimensional embeddings. DTAE [7] constructs adaptive convolutional kernels based on relation representations and combines multi-dimensional attention mechanisms to capture nuanced interactions between entities and relations, while employing dynamic adaptive atrous convolutional networks to expand the receptive field for obtaining global contextual information. Recent studies, such as MGIF [15], have made significant contributions to multi-perspective knowledge graph embedding by integrating global and interaction features. MGIF primarily extracts features through global entity-sharing and relation-sharing convolution operations, which struggle to fully capture the relation-specific neighborhood patterns crucial for complex semantic modeling. Additionally, its feature fusion approach mainly relies on self-attention mechanisms to combine features from different perspectives. In contrast, the MFFE framework introduces a more comprehensive feature extraction strategy, leveraging neighborhood aggregation, dynamic relation-awareness, and interaction information to capture a more diversified and granular set of semantic features. The adaptive bidirectional gated fusion mechanism employed by MFFE allows for more precise integration of heterogeneous features than attention-based fusion methods.
Graph Convolutional Networks (GCNs) leverage structural information from knowledge graphs to learn more expressive embeddings, demonstrating particular effectiveness in modeling complex relational data. R-GCN enhances multi-relational data processing by assigning distinct weights to different relation types, thereby capturing diverse relational patterns. CompGCN [30] proposes various composition operations for neighbor aggregation to model the structural patterns of multi-relational graphs. KE-GCN [38] proposes a joint learning framework that integrates knowledge embedding with graph convolution, simultaneously updating both entity and relation embeddings. This approach overcomes the limitations of traditional GCNs, which are typically restricted to homogeneous graphs or focus solely on node embeddings, and offers a unified solution for joint modeling of entities and relations in heterogeneous knowledge graphs. SHGNet [18] removes the transformation

Fig.1. Overall pipeline of the MFFE framework for knowledge graph embedding.

matrix and nonlinear activation functions from traditional GNNs, retaining only the essential neighborhood aggregation operation and incorporating relation features into the feature propagation process, while selectively aggregating informative features through node aggregation and relation weighting mechanisms. MGTCA [22] employs a mixed geometry message function to integrate information from hyperbolic space, hypersphere space, and Euclidean space to generate rich neighbor messages, and designs a trainable convolutional attention network to achieve autonomous switching of graph neural network types.
Transformer architectures have recently been adapted for KGE. SDFormer [16] incorporates Transformer architectures to further enhance entity-relation interactions through attention mechanisms. KG-BERT [37] reformulates knowledge graph triplets as textual sequences and leverages pretrained language models for knowledge graph completion. StAR [31] introduces a structure-augmented text representation learning framework that optimizes both structural and semantic features for efficient knowledge graph reasoning. Relphormer [4] introduces the Triple2Seq mechanism to dynamically sample context subgraph sequences and incorporates a structurally enhanced self-attention mechanism to retain the structural information of KGs. This effectively addresses the heterogeneity issue faced by traditional Transformers in handling KGs. LP-BERT [14] is a multi-task pre-training framework for knowledge graph completion that combines three pre-training tasks: Masked Language Model, Masked Entity Model, and Masked Relation Model, to simultaneously learn contextual information and structural knowledge of triples.

3. Methodology

In this section, we first delineate the task formulation for addressing the link prediction problem in KGs, and then propose a methodology based on multi-feature fusion embeddings. Fig. 1 provides an overview of our proposed Multi-Feature Fusion Embedding (MFFE) framework, which extracts diverse features from KGs and adaptively fuses them to generate high-quality embeddings for link prediction. Specifically, we introduce an MFFE approach designed to enhance predictive accuracy by effectively integrating diverse feature representations. Our framework comprises four core modules: (a) Relation-Conditioned Neighborhood Aggregator (RCNA), (b) Dynamic Relation Modulator (DRM), (c) Entity-Relation Attention Module (ERAM), and (d) Adaptive Feature Fusion Module (AFFM).

3.1. Task Formulation

A knowledge graph (KG) can be represented as $\mathcal{G} = (\mathcal{E}, \mathcal{R}, \mathcal{T})$, where $\mathcal{E}$ is the set of entities in the graph, $\mathcal{R}$ is the set of relations, and $\mathcal{T}$ is the set of triples, each consisting of a head entity, a relation, and a tail entity.

Each triple in $\mathcal{T}$ is represented as $(e_h, e_r, e_t)$, where $e_h, e_t \in \mathcal{E}$ are the head and tail entities, and $e_r \in \mathcal{R}$ is the relation between them. In the context of link prediction, the task is to predict the missing entity in a given triple $(e_h, e_r, ?)$ or $(?, e_r, e_t)$, where either the head entity $e_h$ or the tail entity $e_t$ is missing.

Formally, the link prediction problem is defined as follows: given a triple $(e_h, e_r, ?)$, the goal is to predict the tail entity $e_t$; given $(?, e_r, e_t)$, the goal is to predict the head entity $e_h$. This task can be represented as:

$$(e_h, e_r, ?) \rightarrow e_t \quad \text{or} \quad (?, e_r, e_t) \rightarrow e_h.$$
Fig.2. Overview of the feature extraction components in the proposed multi-feature fusion knowledge graph embedding framework. It includes: (a) Relation-Conditioned Neighborhood Aggregator (RCNA), (b) Dynamic Relation Modulator (DRM), and (c) Entity-Relation Attention Module (ERAM).

3.2. Relation-Conditioned Neighborhood Aggregator

To effectively capture relationship-specific topological patterns in KGs, this section introduces the RCNA. This module aggregates neighbor features in a structure-sensitive way by dynamically parameterizing relationship embeddings as a convolution kernel during chain reasoning. The module consists of two core stages:

(1) Neighbor Subgraph Tensor Construction

For a target head entity $h \in \mathcal{E}$, a set of $K$ associated triples within its 1-hop neighborhood, denoted as $\{(h, r_i, t_i)\}_{i=1}^{K}$, is systematically aggregated to construct a local subgraph, where $K$ is configured to be 32 by default. Following the embedding of each element into a $d$-dimensional vector,

a three-dimensional structural tensor is subsequently constructed:

$$\mathbf{N}_{\text{subgraph}} = \left[\operatorname{Pad}\left(\left\{\mathbf{h} \oplus \mathbf{r}_i \oplus \mathbf{t}_i\right\}_{i=1}^{32}\right)\right] \in \mathbb{R}^{32 \times 3 \times d}$$
Here, $\oplus$ denotes the concatenation operation, and the padding operation $\operatorname{Pad}$ is applied to ensure a fixed neighborhood size of 32. When $K < 32$, the remaining positions are padded with zero vectors, thereby ensuring that each entity is associated with a tensor of fixed dimensions. In contrast, if an entity has more than 32 neighbors, only the first 32 triples are retained to maintain consistency in tensor dimensions. The tensor preserves the topological ordering of the local subgraph along the neighbor index axis.

(2) Relation-Conditioned Convolution Transformation

The relationship embedding $\mathbf{r} \in \mathbb{R}^d$ from the current chain reasoning stage is projected into a dynamic convolution kernel via a fully connected layer. Specifically, the MLP projects $\mathbf{r}$ into a one-dimensional vector of size $32 \times 3 \times 3 = 288$, which is then reshaped into a three-dimensional tensor:

$$\mathcal{K}_r = \operatorname{Reshape}(\operatorname{MLP}(\mathbf{r})) \in \mathbb{R}^{32 \times 3 \times 3}$$
The convolution kernel operates over a $3 \times 3$ spatial window to model structural interactions within each triple, and utilizes 32 feature channels to learn discriminative representations for distinct neighboring entities. The convolution operation is formally defined as:

$$\mathbf{h}_{\text{neigh}} = \sigma\left(\sum_{i=1}^{32} \sum_{j=1}^{3} \sum_{k=1}^{3} \mathcal{K}_r^{(i,j,k)} * \mathbf{N}_{\text{subgraph}}^{(i,j,k)}\right) \in \mathbb{R}^{d}$$
where $*$ denotes the valid convolution operation on tensor slices, and $\sigma$ is the ReLU activation function. The final output neighbor features $\mathbf{h}_{\text{neigh}}$ will serve as relation-specific contextual representations for subsequent decoding.
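To make the two stages concrete, a minimal PyTorch sketch follows. The zero-padded neighbor inputs and the depthwise `conv2d` with padding along the embedding axis (so the output stays $d$-dimensional) are our assumptions; the text above fixes only the $32 \times 3 \times 3$ kernel shape and a valid convolution over tensor slices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RCNA(nn.Module):
    """Sketch of the Relation-Conditioned Neighborhood Aggregator (assumed layout)."""
    def __init__(self, d, K=32):
        super().__init__()
        self.d, self.K = d, K
        # MLP that projects the relation embedding r into a K*3*3 = 288-dim kernel
        self.kernel_mlp = nn.Sequential(
            nn.Linear(d, d), nn.ReLU(), nn.Linear(d, K * 3 * 3)
        )

    def forward(self, h, r, nbr_r, nbr_t):
        # h, r: (d,); nbr_r, nbr_t: (K, d), zero-padded to exactly K neighbor triples
        K, d = self.K, self.d
        # N_subgraph: one (3, d) slice [h; r_i; t_i] per neighbor -> (K, 3, d)
        N = torch.stack([h.expand(K, d), nbr_r, nbr_t], dim=1)
        # Dynamic kernel conditioned on the query relation: K_r in R^{K x 3 x 3}
        Kr = self.kernel_mlp(r).view(K, 3, 3)
        # Depthwise 2D convolution: each neighbor slice is convolved with its own
        # 3x3 kernel; padding keeps the embedding width d (an assumption)
        out = F.conv2d(N.unsqueeze(0), Kr.unsqueeze(1), padding=1, groups=K)
        # Sum over neighbors and rows, then ReLU -> h_neigh in R^d
        return F.relu(out.sum(dim=(1, 2))).squeeze(0)

# Example with d = 200 and 32 (possibly zero-padded) neighbor triples
agg = RCNA(d=200)
h_neigh = agg(torch.randn(200), torch.randn(200),
              torch.randn(32, 200), torch.randn(32, 200))  # -> (200,)
```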

3.3. Dynamic Relation Modulator

Traditional KGE models, such as RESCAL, represent each relation with a fixed matrix. However, this static representation is insufficient for adapting a relation's influence on entities to the specific requirements of the task or the subtleties of the context. To overcome this limitation, we propose the DRM, which allows for the dynamic adjustment of relation embeddings.
In this module, each relation is initially represented by an embedding vector $\mathbf{r}$. Instead of using a fixed matrix for each relation, we generate a dynamic relation matrix $\mathbf{W}_r$ using a feed-forward neural network $f_\theta$, which takes the relation embedding $\mathbf{r}$ as input:

$$\mathbf{W}_r = f_\theta(\mathbf{r}) = \operatorname{ReLU}\left(\mathbf{r}\mathbf{U}_1 + \mathbf{b}_1\right)\mathbf{U}_2 + \mathbf{b}_2$$
where $\mathbf{U}_1 \in \mathbb{R}^{d \times d}$ and $\mathbf{U}_2 \in \mathbb{R}^{d \times d^2}$ are learnable transformation matrices, and $\mathbf{b}_1 \in \mathbb{R}^{d}$ and $\mathbf{b}_2 \in \mathbb{R}^{d^2}$ are bias vectors.
The output is then reshaped to form the dynamic relation matrix $\mathbf{W}_r \in \mathbb{R}^{d \times d}$.
The relation-aware feature extraction is performed by multiplying the entity embedding $\mathbf{h}$ with the dynamic relation matrix $\mathbf{W}_r$, as follows:

$$\mathbf{h}_{\text{rel}} = \mathbf{h} \cdot \mathbf{W}_r$$
where $\mathbf{h} \in \mathbb{R}^{1 \times d}$ denotes the original entity embedding vector, $\mathbf{W}_r \in \mathbb{R}^{d \times d}$ is the dynamic transformation matrix generated for relation $r$, and $\mathbf{h}_{\text{rel}} \in \mathbb{R}^{1 \times d}$ represents the resulting relation-aware entity representation.
This formulation allows each relation to have its own specific effect on the entity, providing a richer and more expressive representation of the entity-relation interactions. By dynamically adjusting the relation matrix, the model is capable of capturing the diverse and complex nature of the relations in the knowledge graph.
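For clarity, a minimal PyTorch sketch of this modulator is given below; the two-layer network follows the $\mathbf{U}_1$, $\mathbf{U}_2$ formulation above, while the batching and module boundaries are our own choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DRM(nn.Module):
    """Sketch of the Dynamic Relation Modulator: W_r = ReLU(r U1 + b1) U2 + b2."""
    def __init__(self, d):
        super().__init__()
        self.d = d
        self.U1 = nn.Linear(d, d)      # r U_1 + b_1
        self.U2 = nn.Linear(d, d * d)  # (...) U_2 + b_2, reshaped to (d, d)

    def forward(self, h, r):
        # h, r: (batch, d)
        W_r = self.U2(F.relu(self.U1(r))).view(-1, self.d, self.d)
        # Relation-aware entity feature: h_rel = h . W_r
        return torch.bmm(h.unsqueeze(1), W_r).squeeze(1)  # (batch, d)
```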

3.4. Entity-Relation Attention Module

We introduce the ERAM based on multi-head attention to enhance the interaction between entities and relations. Specifically, entity embeddings, relation embeddings, and the dot product of entities and relations are designated as query vectors, key vectors, and value vectors, respectively. The attention weights are dynamically computed through the multi-head attention mechanism, which reflects the varying importance of different relations to the entity representation. These weights highlight key relations and diminish redundant information. The weighted aggregation of these attention weights with the entity-relation representations results in the final feature representation, capturing the refined interaction between the entity and relation.
Formally, for each triple $(s, r, t)$, we assign an embedding vector $\mathbf{e}_s \in \mathbb{R}^d$ to the head entity $s$, and $\mathbf{e}_r \in \mathbb{R}^d$ to the relation $r$. These embeddings are used to form query, key, and value vectors for the multi-head attention mechanism. The query matrix $\mathbf{Q}_s$ is derived from the entity embedding $\mathbf{e}_s$, the key matrix $\mathbf{K}_r$ comes from the relation embedding $\mathbf{e}_r$, and the value matrix $\mathbf{V}_{sr}$ is the dot product of the entity and relation embeddings. The attention matrix $\mathbf{A}_{sr}$, which measures the influence of the relation $r$ on the entity $s$, is calculated as:

$$\mathbf{A}_{sr} = \operatorname{softmax}\left(\frac{\mathbf{e}_s \mathbf{e}_r^T}{\sqrt{d}}\right)$$
where $\mathbf{e}_s, \mathbf{e}_r \in \mathbb{R}^{1 \times d}$ denote the embedding vectors of the head entity and relation, respectively, and $d$ is the embedding dimension.
Subsequently, we compute the weighted feature representation $\mathbf{h}_{\text{int}}$ by aggregating the value vectors using the attention matrix $\mathbf{A}_{sr}$:

$$\mathbf{h}_{\text{int}} = \mathbf{A}_{sr}\left(\mathbf{e}_s \odot \mathbf{e}_r\right)$$
The output $\mathbf{h}_{\text{int}}$ represents the refined feature of the entity after incorporating the relation's influence.
To further enhance the expressiveness of the model, we apply multi-head attention, where multiple sets of query, key, and value projections are used to independently compute attention for each head. The outputs of all heads are concatenated and linearly transformed. Mathematically, the multi-head attention mechanism can be defined as:

$$\operatorname{MultiHead}\left(\mathbf{e}_s, \mathbf{e}_r\right) = \left[\mathbf{h}_{\text{int}_1}, \mathbf{h}_{\text{int}_2}, \ldots, \mathbf{h}_{\text{int}_i}\right]\mathbf{W}_O$$
where $\mathbf{h}_{\text{int}_i}$ denotes the output of the $i$-th attention head, and $\mathbf{W}_O \in \mathbb{R}^{h \cdot d \times d}$ is the output projection matrix.
The ERAM effectively captures the intricate interactions between entities and relations by dynamically adjusting the contribution of different relations to the entity’s representation. The multi-head attention mechanism allows the model to learn from multiple subspaces simultaneously, enhancing its ability to represent complex relational information in KGs.
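For illustration, the following PyTorch sketch mirrors this module under one reading of the attention equation: for a single triple, the softmax of the scalar $\mathbf{e}_s \mathbf{e}_r^T$ is degenerate, so the sketch assumes the softmax acts over embedding dimensions; the per-head projections and the concatenation with $\mathbf{W}_O$ follow the text, while the head count and names are illustrative.

```python
import torch
import torch.nn as nn

class ERAM(nn.Module):
    """Sketch of the Entity-Relation Attention Module (softmax over dimensions assumed)."""
    def __init__(self, d, n_heads=4):
        super().__init__()
        self.d = d
        self.q_proj = nn.ModuleList(nn.Linear(d, d) for _ in range(n_heads))
        self.k_proj = nn.ModuleList(nn.Linear(d, d) for _ in range(n_heads))
        self.W_O = nn.Linear(n_heads * d, d)  # output projection

    def forward(self, e_s, e_r):
        # e_s, e_r: (batch, d)
        heads = []
        for q, k in zip(self.q_proj, self.k_proj):
            # A_sr: scaled entity-relation scores, normalized per dimension
            A = torch.softmax(q(e_s) * k(e_r) / self.d ** 0.5, dim=-1)
            heads.append(A * (e_s * e_r))  # h_int = A_sr (e_s ⊙ e_r)
        return self.W_O(torch.cat(heads, dim=-1))  # MultiHead(e_s, e_r)
```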

3.5. Adaptive Feature Fusion Module

Fig.3. Illustration of the Adaptive Feature Fusion Module (AFFM)
The AFFM is designed to combine the features extracted from the previous three modules: neighborhood features, relation-aware features, and entity-relation interaction features. This module aims to aggregate these features effectively and produce a comprehensive entity representation that captures both local and global graph information.
Let $\mathbf{h}_{\text{neigh}} \in \mathbb{R}^d$, $\mathbf{h}_{\text{rel}} \in \mathbb{R}^d$, and $\mathbf{h}_{\text{int}} \in \mathbb{R}^d$ denote three distinct feature vectors corresponding to neighborhood features, relation-aware features, and entity-relation interaction features, respectively. The module operates through a cascaded architecture comprising bidirectional gated fusion and dynamic weight aggregation, enabling adaptive combination of complementary information across features.
For each feature pair $(\mathbf{h}_i, \mathbf{h}_j)$, where $i, j \in \{\text{neigh}, \text{rel}, \text{int}\}$, the fusion process begins by modeling cross-feature dependencies through a parameterized gating mechanism. A convolutional operator with kernel $\mathbf{W}_{ij} \in \mathbb{R}^{k \times 2d}$ processes the concatenated input $[\mathbf{h}_i; \mathbf{h}_j]$, generating a spatial-channel attention map:

$$\mathbf{S}_{ij} = \sigma\left(\operatorname{Conv}\left([\mathbf{h}_i; \mathbf{h}_j], \mathbf{W}_{ij}\right) + \mathbf{b}_{ij}\right)$$
where $\sigma$ denotes the sigmoid function, and $\mathbf{S}_{ij} \in [0,1]^d$ dynamically quantifies the relevance of each feature dimension. The gated fusion output is then computed as:

$$\mathbf{F}_{ij} = \mathbf{S}_{ij} \odot \mathbf{h}_i + \left(\mathbf{1} - \mathbf{S}_{ij}\right) \odot \mathbf{h}_j,$$
where $\odot$ represents element-wise multiplication. This formulation allows selective emphasis on discriminative patterns while suppressing redundant components between $\mathbf{h}_i$ and $\mathbf{h}_j$.
Each fused pair $\mathbf{F}_{ij}$ undergoes dimension reduction and non-linear transformation via a learnable projection:

$$\hat{\mathbf{F}}_{ij} = \operatorname{ReLU}\left(\mathbf{W}_{\text{proj}}^{\top} \mathbf{F}_{ij} + \mathbf{b}_{\text{proj}}\right),$$
where $\mathbf{W}_{\text{proj}} \in \mathbb{R}^{d \times d'}$ compresses features into a lower-dimensional manifold, enhancing representational efficiency and separability.
The final fused representation $\mathbf{F} \in \mathbb{R}^{d'}$ is synthesized by adaptively combining all pairwise outputs $\hat{\mathbf{F}}_{\text{neigh-rel}}$, $\hat{\mathbf{F}}_{\text{neigh-int}}$, and $\hat{\mathbf{F}}_{\text{rel-int}}$. Dynamic weights $\alpha_{\text{neigh-rel}}$, $\alpha_{\text{neigh-int}}$, $\alpha_{\text{rel-int}}$ are learned through a self-attention mechanism:

$$\alpha_{ij} = \frac{\exp\left(\mathbf{q}^{\top} \hat{\mathbf{F}}_{ij}\right)}{\sum_{kl} \exp\left(\mathbf{q}^{\top} \hat{\mathbf{F}}_{kl}\right)},$$
where $\mathbf{q} \in \mathbb{R}^{d'}$ is a trainable query vector. The aggregated output is computed as:

$$\mathbf{V} = \sum_{i,j} \alpha_{ij} \cdot \hat{\mathbf{F}}_{ij}.$$
The AFFM aggregates these features into a unified entity representation $\mathbf{V}$, which is then used for link prediction tasks. By integrating multiple feature types and processing them at different scales, the AFFM allows the model to capture complex dependencies and interactions within the knowledge graph, improving the overall performance of link prediction.
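A compact PyTorch sketch of the full module follows. The `Conv1d` gate with kernel size 3 and the truncation of the gate output back to $d$ dimensions are our assumptions, since the exact shape of the convolutional gating operator $\mathbf{W}_{ij}$ is not fully specified above.

```python
import torch
import torch.nn as nn

class AFFM(nn.Module):
    """Sketch of the Adaptive Feature Fusion Module (gating shapes assumed)."""
    def __init__(self, d, d_out):
        super().__init__()
        self.d = d
        self.gates = nn.ModuleDict({
            name: nn.Conv1d(1, 1, kernel_size=3, padding=1)  # Conv over [h_i; h_j]
            for name in ("neigh-rel", "neigh-int", "rel-int")
        })
        self.proj = nn.Linear(d, d_out)            # W_proj, b_proj
        self.q = nn.Parameter(torch.randn(d_out))  # trainable query vector

    def fuse(self, name, hi, hj):
        cat = torch.cat([hi, hj], dim=-1).unsqueeze(1)            # (batch, 1, 2d)
        S = torch.sigmoid(self.gates[name](cat)).squeeze(1)[..., : self.d]
        F_ij = S * hi + (1 - S) * hj                              # gated fusion
        return torch.relu(self.proj(F_ij))                        # \hat{F}_ij

    def forward(self, h_neigh, h_rel, h_int):
        pairs = torch.stack([self.fuse("neigh-rel", h_neigh, h_rel),
                             self.fuse("neigh-int", h_neigh, h_int),
                             self.fuse("rel-int", h_rel, h_int)], dim=1)
        alpha = torch.softmax(pairs @ self.q, dim=1)              # dynamic weights
        return (alpha.unsqueeze(-1) * pairs).sum(dim=1)           # V = sum a_ij F_ij
```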

3.6. Training

The training objective of our model is to learn high-quality embeddings by minimizing the discrepancy between predicted and ground-truth triples. For each candidate triple $(h, r, t)$, the model computes a score $\psi(h, r, t)$ through the multi-feature fusion module, which is then normalized to a probabilistic prediction via the sigmoid activation function:

$$y = \operatorname{sigmoid}(\psi(h, r, t))$$
Here, $y \in [0, 1]$ represents the confidence of the triple being valid. We employ the binary cross-entropy loss as the training objective to measure the divergence between predictions and labels:

$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N}\left[\hat{y}_i \log y_i + \left(1 - \hat{y}_i\right) \log\left(1 - y_i\right)\right]$$
where $\hat{y}_i \in \{0, 1\}$ denotes the ground-truth label, and $N$ is the total number of training triples.
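As a minimal illustration, the sigmoid normalization and the binary cross-entropy above can be combined into a single numerically stable call in PyTorch (the scores and labels below are placeholders):

```python
import torch
import torch.nn.functional as F

# psi(h, r, t) scores for a toy batch of 8 candidate triples (placeholder values)
scores = torch.randn(8, requires_grad=True)
labels = torch.randint(0, 2, (8,)).float()   # ground-truth labels y_hat in {0, 1}

# sigmoid + binary cross-entropy fused into one stable operation
loss = F.binary_cross_entropy_with_logits(scores, labels)
loss.backward()  # gradients flow back to the embedding and fusion parameters
```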

4. Experiments

In this section, we describe the experimental setup and evaluate the performance of our proposed model on link prediction tasks within KGs.

4.1. Datasets

We evaluate the efficacy of our model on five widely recognized benchmark datasets for KGs: FB15k-237 [27], YAGO3-10 [23], WN18RR [9], KINSHIP [19], and UMLS [13]. Table 1 summarizes key statistics for each dataset. FB15k-237 is a subset of FB15K [5], with inverse relations removed for a more accurate evaluation. WN18RR is a subset of WordNet, describing the associative features between English words; it retains the symmetric, asymmetric, and combinatorial relationships in the WordNet dataset and eliminates inverse relationships. YAGO3-10 is a large-scale KG derived from YAGO3. KINSHIP captures kinship relationships among Alyawarra tribe members in central Australia, representing traditional Aboriginal family structures. UMLS constitutes a comprehensive biomedical knowledge graph integrating medical vocabularies, with relationships encoding clinical associations between diseases, treatments, and diagnostic procedures.

4.2. Evaluation Metrics

The evaluation of knowledge graph completion follows the standard entity ranking protocol. Given a partial triple with either the head or tail entity missing (e.g., $(?, r, e_j)$ or $(e_i, r, ?)$), the model is required to generate a ranked list of candidate entities from the entire entity space $\mathcal{E}$. We use three primary metrics to assess model performance:
Mean Reciprocal Rank (MRR): This metric computes the average of the reciprocal ranks across all test triples:

$$\operatorname{MRR} = \frac{1}{|\mathcal{T}|} \sum_{i=1}^{|\mathcal{T}|} \frac{1}{\operatorname{rank}_i}$$
where $\operatorname{rank}_i$ denotes the position of the correct entity in the ranked list for the $i$-th test triple. A higher MRR indicates better performance.
Mean Rank (MR): This metric calculates the average rank of the correct entities across all test triples:

$$\mathrm{MR} = \frac{1}{|\mathcal{T}|} \sum_{i=1}^{|\mathcal{T}|} \operatorname{rank}_i$$
Table 1
Dataset statistics.

| Datasets | Entities | Relations | Training Triples | Validation Triples | Test Triples |
| :--- | ---: | ---: | ---: | ---: | ---: |
| FB15k-237 | 14,541 | 237 | 272,115 | 17,535 | 20,466 |
| WN18RR | 40,943 | 11 | 86,835 | 3,034 | 3,134 |
| YAGO3-10 | 123,182 | 37 | 1,079,040 | 5,000 | 5,000 |
| KINSHIP | 104 | 25 | 8,544 | 1,068 | 1,074 |
| UMLS | 132 | 46 | 5,216 | 652 | 661 |
Table 2
Results of the Link Prediction for FB15K-237 and WN18RR datasets. The left block reports FB15K-237 and the right block WN18RR.

| Model | MR | MRR | Hits@1 | Hits@3 | Hits@10 | MR | MRR | Hits@1 | Hits@3 | Hits@10 |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| TransE [5] | 357 | 0.294 | - | - | 0.465 | 3384 | 0.226 | - | - | 0.501 |
| DistMult [35] | 254 | 0.241 | 0.155 | 0.263 | 0.419 | 5510 | 0.430 | 0.390 | 0.440 | 0.490 |
| ComplEx [28] | 339 | 0.247 | 0.158 | 0.275 | 0.428 | 5261 | 0.440 | 0.410 | 0.460 | 0.510 |
| ConvE [9] | 224 | 0.325 | 0.237 | 0.356 | 0.501 | 4187 | 0.430 | 0.400 | 0.440 | 0.520 |
| RotatE [24] | - | 0.333 | 0.240 | 0.368 | 0.522 | - | 0.478 | 0.439 | 0.494 | 0.553 |
| InteractE [29] | 181 | 0.355 | 0.263 | 0.390 | 0.539 | 5105 | 0.465 | 0.433 | 0.481 | 0.526 |
| M-DCN [40] | - | 0.345 | 0.255 | 0.380 | 0.528 | - | 0.475 | 0.440 | 0.485 | 0.540 |
| JointE [41] | - | 0.356 | 0.262 | 0.393 | 0.543 | - | 0.471 | 0.438 | 0.483 | 0.537 |
| SAttLE [1] | - | 0.358 | 0.266 | 0.394 | 0.541 | - | 0.476 | 0.442 | 0.490 | 0.540 |
| SDFormer [16] | 185 | 0.356 | 0.264 | 0.390 | 0.541 | 3633 | 0.458 | 0.425 | 0.471 | 0.528 |
| MSHE [11] | - | 0.356 | 0.264 | 0.392 | 0.544 | - | 0.461 | 0.429 | 0.473 | 0.530 |
| MGIF [15] | 171 | 0.362 | 0.271 | 0.398 | 0.544 | 4198 | 0.475 | 0.433 | 0.494 | 0.557 |
| MFFE | 180 | 0.366 | 0.272 | 0.393 | 0.550 | 4217 | 0.470 | 0.431 | 0.487 | 0.545 |
Table 3
Results of the Link Prediction for the YAGO3-10 dataset.

| Model | MRR | Hits@1 | Hits@3 | Hits@10 |
| :--- | ---: | ---: | ---: | ---: |
| TransE [5] | 0.238 | 0.212 | 0.361 | 0.447 |
| DistMult [35] | 0.340 | 0.240 | 0.380 | 0.540 |
| ComplEx [28] | 0.360 | 0.260 | 0.400 | 0.550 |
| ConvE [9] | 0.440 | 0.350 | 0.490 | 0.620 |
| RotatE [24] | 0.487 | 0.398 | 0.545 | 0.667 |
| InteractE [29] | 0.541 | 0.462 | 0.593 | 0.687 |
| M-DCN [40] | 0.505 | 0.423 | 0.587 | 0.682 |
| JointE [41] | 0.556 | 0.481 | 0.605 | 0.695 |
| MSHE [11] | 0.537 | 0.460 | 0.582 | 0.682 |
| MGIF [15] | 0.557 | 0.483 | 0.601 | 0.691 |
| MFFE | 0.562 | 0.495 | 0.610 | 0.689 |
Lower MR values indicate better performance, as they reflect that correct predictions are ranked higher on average.
Hits@k: This metric measures the proportion of correct entities appearing in the top-k ranked positions:
$$\text{Hits@}k = \frac{1}{|\mathcal{T}|} \sum_{i=1}^{|\mathcal{T}|} \mathbb{I}\left(\operatorname{rank}_i \leq k\right)$$
where $\mathbb{I}(\cdot)$ is the indicator function and $k \in \{1, 3, 10\}$. A higher Hits@k value indicates better model performance.
The evaluation is performed for both head prediction $(?, r, e_j)$ and tail prediction $(e_i, r, ?)$, with final results reported as the macro-average across both directions.
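For concreteness, given the rank of the correct entity for each test query, all three metrics reduce to a few lines (the ranks below are illustrative):

```python
import torch

ranks = torch.tensor([1., 3., 12., 2., 50.])   # rank of the correct entity per query
mrr    = (1.0 / ranks).mean()                  # Mean Reciprocal Rank (higher is better)
mr     = ranks.mean()                          # Mean Rank (lower is better)
hits10 = (ranks <= 10).float().mean()          # Hits@10
```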

4.3. Implementation Details

The model is implemented in PyTorch and trained on an NVIDIA GeForce RTX 3090 GPU. Entity and relation embeddings are initialized randomly and optimized jointly. Hyperparameters such as embedding dimensions, batch sizes, learning rates, and dropout rates are selected via grid search on validation sets, with configurations varying across datasets. Specifically, the embedding dimensions are configured as 150 for FB15K-237 and 200 for YAGO3-10 and WN18RR, reflecting the optimal parameter settings for each respective dataset. Batch sizes are 512 for FB15k-237 and WN18RR, and 256 for YAGO3-10. Learning rates range from $5 \times 10^{-4}$ to $1 \times 10^{-3}$, and dropout rates are between 0.2 and 0.5.
To prevent overfitting, we incorporate regularization strategies: batch normalization after each convolutional layer, and L2 regularization ($\lambda = 10^{-5}$) on relation matrices and attention parameters. The Adam optimizer is used, with dynamic adjustment through gradient momentum. Label smoothing ($\epsilon = 0.1$) is applied to reduce overconfidence on ambiguous triples. Training runs for 1,000 epochs with early stopping based on validation MRR.
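A sketch of this optimization setup is given below; `model` is a placeholder for the assembled MFFE network, and the binary form of the label smoothing is our assumption, as only $\epsilon = 0.1$ is stated above:

```python
import torch
import torch.nn as nn

model = nn.Linear(200, 1)  # placeholder for the assembled MFFE network
optimizer = torch.optim.Adam(model.parameters(),
                             lr=1e-3,            # within the reported 5e-4 to 1e-3 range
                             weight_decay=1e-5)  # L2 regularization, lambda = 1e-5

eps = 0.1                                        # label smoothing coefficient
labels = torch.randint(0, 2, (8,)).float()
smoothed = labels * (1 - eps) + eps / 2          # softened 0/1 targets (assumed binary form)
```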

4.4. Main Results

We conduct extensive evaluations of our Multi-Feature Fusion Embedding (MFFE) model across three standard knowledge graph benchmarks. Table 2 and Table 3 present comprehensive comparisons with 12 baseline models using mean rank (MR), mean reciprocal rank (MRR), and Hits@k metrics.
Table 4
Link prediction results on KINSHIP and UMLS datasets.

| Models | KINSHIP | | | | UMLS | | | |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| | MRR | Hits@1 | Hits@3 | Hits@10 | MRR | Hits@1 | Hits@3 | Hits@10 |
| DistMult [35] | 0.685 | 0.553 | 0.766 | 0.943 | 0.924 | 0.879 | 0.962 | 0.995 |
| ComplEx [28] | 0.864 | 0.780 | 0.935 | 0.977 | 0.944 | 0.914 | 0.972 | 0.994 |
| ConvE [9] | 0.830 | 0.740 | 0.920 | 0.980 | 0.940 | 0.920 | 0.960 | 0.990 |
| TuckER [2] | 0.603 | 0.462 | 0.698 | 0.863 | 0.732 | 0.625 | 0.812 | 0.909 |
| RotatE [24] | 0.651 | 0.504 | 0.755 | 0.932 | 0.744 | 0.636 | 0.822 | 0.939 |
| HypER [3] | 0.840 | 0.755 | 0.912 | 0.982 | 0.894 | 0.822 | 0.957 | 0.984 |
| RNNLogic [6] | 0.722 | 0.598 | 0.814 | 0.949 | 0.842 | 0.772 | 0.891 | 0.965 |
| RulE [26] | 0.736 | 0.615 | 0.824 | 0.957 | 0.867 | 0.797 | 0.925 | 0.972 |
| KRACL [25] | 0.895 | 0.817 | 0.970 | 0.991 | 0.904 | 0.831 | - | 0.995 |
| MFFE | 0.872 | 0.802 | 0.934 | 0.985 | 0.946 | 0.915 | 0.988 | 0.996 |
On the FB15K-237 dataset, which is characterized by complex relational patterns, our MFFE model achieves state-of-the-art performance with an MRR of 0.366 and Hits@10 of 0.550. These results represent relative improvements of 1.1% over the previous best model, MGIF, which achieved an MRR of 0.362 and Hits@10 of 0.544. For YAGO3-10, which contains richer factual knowledge, MFFE obtains the highest MRR (0.562) and Hits@1 (0.495), surpassing MGIF by 0.9% and 2.5%, respectively. While slightly trailing JointE in Hits@10 (0.689 vs. 0.695), our model demonstrates superior precision in top-1 predictions. On the WN18RR dataset, MFFE achieves competitive performance with an MRR of 0.470 and Hits@10 of 0.545. Notably, MFFE outperforms InteractE by 3.6% in Hits@10 (0.545 versus 0.526). However, it remains 1.7% below RotatE, which achieves a Hits@10 score of 0.553.
The proposed fusion mechanism markedly enhances performance on both FB15K-237 and YAGO3-10, outperforming the respective leading baselines on these datasets. This improvement underscores the model's robustness across diverse relational patterns, and its balanced performance across relation types provides compelling evidence for the efficacy of multi-feature integration.
Based on the experimental results presented in Table 4, this study further evaluates MFFE's performance on the small-scale datasets KINSHIP and UMLS. Small-scale datasets are easy to fit, allowing even simple models to achieve satisfactory performance; however, this also increases the risk of overfitting for complex models, potentially compromising their generalization on test sets. On the KINSHIP dataset, MFFE achieves an MRR of 0.872 and Hits@1 of 0.802, outperforming most baseline models. Similarly, on the UMLS dataset, MFFE delivers competitive performance with an MRR of 0.946. Notably, the results reveal that traditional non-neural models such as ComplEx remain competitive on small-scale datasets, indicating the continued relevance of classical approaches in data-scarce scenarios.

4.5. Ablation Study

To validate the contribution of each module, we conducted ablation experiments on FB15k-237.
Based on the ablation results in Table 5, we observe that removing the interaction feature module leads to a 6.8% drop in MRR, highlighting its importance in modeling structural patterns and semantic dependencies. Removing the multi-feature fusion module causes a 4.9% drop in MRR, supporting the need for collaborative enhancement of heterogeneous features through dynamic gating. Although the interaction module contributes most, removing the neighborhood features (MRR -2.7%) or relation-aware features (MRR -4.6%) also reduces performance, indicating that these modules capture complementary features critical for model robustness. Overall, the multi-feature fusion framework significantly improves KGE performance through modular collaboration.

4.6. Evaluation on Different Relation Types

In this section, we conduct a comprehensive analysis of MFFE's performance across various relation categories on the FB15k-237 dataset. We select FB15k-237 for this analysis due to its extensive and diverse collection of relation types. Following established methodology, we categorize relations based on the average number of tails per head and heads per tail into four distinct types: one-to-one (1-1), one-to-many (1-N), many-to-one (N-1), and many-to-many (N-N), as sketched below.
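A minimal sketch of this categorization is shown below. The 1.5 threshold on the averaged head/tail counts follows the convention introduced with TransE [5]; the triple format is an assumption for illustration.

```python
from collections import defaultdict

def categorize_relations(triples, threshold=1.5):
    """Assign each relation to 1-1, 1-N, N-1, or N-N by average tail/head fan-out."""
    tails_per_head = defaultdict(lambda: defaultdict(set))
    heads_per_tail = defaultdict(lambda: defaultdict(set))
    for h, r, t in triples:                 # triples: iterable of (head, relation, tail) IDs
        tails_per_head[r][h].add(t)
        heads_per_tail[r][t].add(h)

    categories = {}
    for r in tails_per_head:
        avg_tails = sum(len(s) for s in tails_per_head[r].values()) / len(tails_per_head[r])
        avg_heads = sum(len(s) for s in heads_per_tail[r].values()) / len(heads_per_tail[r])
        many_tails, many_heads = avg_tails >= threshold, avg_heads >= threshold
        categories[r] = ("N-N" if many_tails and many_heads
                         else "1-N" if many_tails
                         else "N-1" if many_heads
                         else "1-1")
    return categories

# Example: relation 0 links multiple heads and tails, relation 1 is bijective.
print(categorize_relations([(0, 0, 1), (0, 0, 2), (3, 0, 1), (4, 1, 5)]))
```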
Table 6 presents a detailed performance analysis of different knowledge graph embedding models on relation-specific link prediction tasks. We evaluate both head entity prediction (Head Pred) and tail entity prediction (Tail Pred) across the four relation cardinality types, with results reported using Mean Reciprocal Rank (MRR) and Hits@10 metrics.
The experimental results reveal several key insights regarding MFFE's capability in modeling different relation complexities. For one-to-one relations, MFFE demonstrates superior performance in both head and tail prediction tasks, achieving the highest Hits@10 score of 0.598 in both prediction directions, indicating MFFE's effectiveness in capturing simple, bijective relationships.
Table 5
Ablation study on the FB15K-237 dataset.

| Model Variant | MRR | Hits@1 | Hits@3 | Hits@10 |
| :--- | :--- | :--- | :--- | :--- |
| Full Model | 0.366 | 0.272 | 0.393 | 0.550 |
| w/o Neighborhood Features | 0.356 (-2.7%) | 0.267 | 0.389 | 0.537 |
| w/o Relation-aware Features | 0.349 (-4.6%) | 0.261 | 0.381 | 0.526 |
| w/o Interaction Features | 0.341 (-6.8%) | 0.254 | 0.372 | 0.517 |
| w/o Multi-Feature Fusion | 0.348 (-4.9%) | 0.259 | 0.381 | 0.522 |
Table 6
Link prediction performance by relation cardinality on the FB15k-237 dataset.

| | | RotatE | | ConvE | | InteractE | | MFFE | |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| | | MRR | H@10 | MRR | H@10 | MRR | H@10 | MRR | H@10 |
| Head Pred | 1-1 | 0.498 | 0.593 | 0.374 | 0.505 | 0.386 | 0.547 | 0.495 | 0.598 |
| | 1-N | 0.092 | 0.174 | 0.091 | 0.170 | 0.106 | 0.192 | 0.115 | 0.216 |
| | N-1 | 0.471 | 0.674 | 0.444 | 0.644 | 0.466 | 0.647 | 0.472 | 0.641 |
| | N-N | 0.261 | 0.476 | 0.261 | 0.459 | 0.276 | 0.476 | 0.276 | 0.477 |
| Tail Pred | 1-1 | 0.484 | 0.578 | 0.366 | 0.510 | 0.368 | 0.547 | 0.489 | 0.598 |
| | 1-N | 0.749 | 0.674 | 0.762 | 0.878 | 0.777 | 0.881 | 0.775 | 0.877 |
| | N-1 | 0.074 | 0.138 | 0.069 | 0.150 | 0.074 | 0.141 | 0.082 | 0.154 |
| | N-N | 0.364 | 0.608 | 0.375 | 0.603 | 0.395 | 0.617 | 0.412 | 0.627 |
In one-to-many relations, MFFE excels particularly in head prediction, obtaining the best MRR of 0.115 and Hits@10 of 0.216, while maintaining competitive performance in tail prediction with an MRR of 0.775.
For many-to-one relations, MFFE shows strong performance in head prediction with an MRR of 0.472, while achieving the highest scores in tail prediction (MRR: 0.082, Hits@10: 0.154). Most notably, MFFE consistently outperforms the baseline methods on many-to-many relations, the most complex relation type, achieving MRR scores of 0.276 and 0.412 for head and tail prediction, respectively. These results demonstrate that MFFE's multi-scale feature fusion mechanism effectively captures intricate relational patterns, enabling superior modeling of complex many-to-many relationships compared to traditional approaches.

4.7. Parameter Efficiency

Fig. 4 presents the total number of parameters for each model under its optimal configuration on FB15k-237 and WN18RR. On FB15k-237, MFFE employs 13.85 million parameters, higher than ConvE's 4.96 million yet significantly lower than ComplEx (29.56 million) and RotatE (29.32 million). On WN18RR, MFFE expands to 19.01 million parameters, compared to ConvE's 10.18 million, but remains considerably more compact than ComplEx (40.95 million) and RotatE (40.85 million).
While MFFE does incur a greater parameter cost than ConvE, this increase is a necessary trade-off for its more expressive feature fusion capabilities. Importantly, the additional parameter overhead is modest and yields tangible improvements in embedding quality without introducing excessive model complexity.
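For reference, such totals are the standard count of trainable tensors; the PyTorch helper below suffices, where the stand-alone embedding table (|E| = 14,541 entities at d = 150 for FB15K-237) is only a stand-in for the full model.

```python
import torch

def count_parameters(model: torch.nn.Module) -> int:
    """Total number of trainable parameters, as compared in Fig. 4."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# The entity embedding table alone already contributes a large share:
embeddings = torch.nn.Embedding(num_embeddings=14541, embedding_dim=150)
print(count_parameters(embeddings))  # 2181150
```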

Fig. 4. Comparison of model parameter counts on FB15k-237 and WN18RR.

4.8. Convergence Analysis

As illustrated in Fig. 5, the convergence behaviors of the CNN-based models (ConvE, InteractE, and our proposed MFFE) exhibit distinct patterns. Initially, InteractE converges more slowly than ConvE during the first 20 epochs but eventually surpasses it. More notably, our MFFE model demonstrates consistently superior convergence behavior across all evaluation metrics, including MRR, Hits@1, Hits@3, and Hits@10.
In particular, MFFE converges rapidly in the early stages of training, outperforming both ConvE and InteractE by a significant margin, and this advantage is maintained throughout the training process. The rapid and stable convergence of MFFE underscores the effectiveness of its multi-feature fusion mechanism, which enables the model to integrate diverse feature representations efficiently. This capability facilitates more efficient learning and contributes to improved predictive accuracy.

Fig. 5. Training convergence curves of MFFE, ConvE, and InteractE on the FB15K-237 dataset (MRR and Hits@1, 3, 10).

These empirical results strongly support the practical effectiveness and superiority of the proposed MFFE framework in knowledge graph embedding tasks.

5. Conclusions

In this study, we propose a novel multi-feature fusion framework for KGE to address the limitations of existing methods in capturing high-dimensional semantics and complex structural patterns. Our approach integrates neighborhood topology, relation-aware dynamics, and entity-relation interactions through four specialized modules, enabling comprehensive representation learning for entities and relations. Experimental results demonstrate that our method outperforms state-of-the-art models on benchmark datasets. Key innovations include dynamic relation-conditioned convolution for neighborhood aggregation, multi-head attention for entity-relation interaction modeling, and a dynamic bidirectional gated fusion mechanism that adaptively combines heterogeneous features. The success of our framework highlights the significance of explicit feature differentiation and fusion in KGE tasks, demonstrating the importance of combining complementary feature types (neighborhood topology, relation-aware dynamics, and entity-relation interactions) rather than relying on single-feature extraction strategies.

Our dynamic fusion mechanism effectively captures complex relational patterns, particularly excelling in many-to-many relationship modeling.
Despite these significant improvements, several limitations persist. First, the computational complexity increases linearly with neighborhood depth, which may impede application to web-scale knowledge graphs. Second, the current implementation does not fully leverage textual entity descriptions that could enhance semantic representations. Future work will explore lightweight attention mechanisms for scalable subgraph processing and investigate multimodal fusion using textual and image embeddings.

Acknowledgement

This research is supported by Zhejiang Gongshang University “Digital+” Disciplinary Construction Management Project (No.SZJ2022A009) and the project of Economic Forecasting and Policy Simulation Laboratory, Zhejiang Gongshang University (No.2024SYS006).

References

[1] Baghershahi, P., Hosseini, R., Moradi, H.: Self-attention presents low-dimensional knowledge graph embeddings for link prediction. Knowledge-Based Systems 260, 110124 (2023)

[2] Balazevic, I., Allen, C., Hospedales, T.: TuckER: Tensor factorization for knowledge graph completion. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. pp. 5184-5193. Association for Computational Linguistics (2019)

[3] Balažević, I., Allen, C., Hospedales, T.M.: Hypernetwork knowledge graph embeddings. In: Artificial Neural Networks and Machine Learning - ICANN 2019: Workshop and Special Sessions, Munich, Germany, September 17-19, 2019, Proceedings. pp. 553-565. Springer (2019)

[4] Bi, Z., Cheng, S., Chen, J., Liang, X., Xiong, F., Zhang, N.: Relphormer: Relational graph transformer for knowledge graph representations. Neurocomputing 566, 127044 (2024)

[5] Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. Advances in Neural Information Processing Systems 26 (2013)

[6] Cheng, K., Liu, J., Wang, W., Sun, Y.: RLogic: Recursive logical rule learning from knowledge graphs. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. pp. 179-189 (2022)

[7] Deng, W., Zhang, Y., Yu, H., Li, H.: Knowledge graph embedding based on dynamic adaptive atrous convolution and attention mechanism for link prediction. Information Processing & Management 61(3), 103642 (2024)

[8] Dessì, D., Osborne, F., Recupero, D.R., Buscaldi, D., Motta, E.: Generating knowledge graphs by employing natural language processing and machine learning techniques within the scholarly domain. Future Generation Computer Systems 116, 253-264 (2021)

[9] Dettmers, T., Minervini, P., Stenetorp, P., Riedel, S.: Convolutional 2D knowledge graph embeddings. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 32 (2018)

[10] Ji, G., He, S., Xu, L., Liu, K., Zhao, J.: Knowledge graph embedding via dynamic mapping matrix. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). pp. 687-696 (2015)

[11] Jiang, D., Wang, R., Xue, L., Yang, J.: Multisource hierarchical neural network for knowledge graph embedding. Expert Systems with Applications 237, 121446 (2024)

[12] Jiang, X., Wang, Q., Wang, B.: Adaptive convolution for multi-relational learning. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). pp. 978-987 (2019)

[13] Kok, S., Domingos, P.: Statistical predicate invention. In: Proceedings of the 24th International Conference on Machine Learning. pp. 433-440 (2007)

[14] Li, D., Zhu, B., Yang, S., Xu, K., Yi, M., He, Y., Wang, H.: Multi-task pre-training language model for semantic network completion. ACM Transactions on Asian and Low-Resource Language Information Processing 22(11), 1-20 (2023)

[15] Li, D., Shi, F., Wang, X., Zheng, C., Cai, Y., Li, B.: Multi-perspective knowledge graph completion with global and interaction features. Information Sciences 666, 120438 (2024)

[16] Li, D., Xia, T., Wang, J., Shi, F., Zhang, Q., Li, B., Xiong, Y.: SDFormer: A shallow-to-deep feature interaction for knowledge graph embedding. Knowledge-Based Systems 284, 111253 (2024)

[17] Li, Z., Liu, F., Yang, W., Peng, S., Zhou, J.: A survey of convolutional neural networks: analysis, applications, and prospects. IEEE Transactions on Neural Networks and Learning Systems 33(12), 6999-7019 (2021)

[18] Li, Z., Zhang, Q., Zhu, F., Li, D., Zheng, C., Zhang, Y.: Knowledge graph representation learning with simplifying hierarchical feature propagation. Information Processing & Management 60(4), 103348 (2023)

[19] Lin, X.V., Socher, R., Xiong, C.: Multi-hop knowledge graph reasoning with reward shaping. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. pp. 3243-3253 (2018)

[20] Nickel, M., Tresp, V., Kriegel, H.P.: A three-way model for collective learning on multi-relational data. In: ICML. vol. 11, pp. 809-816 (2011)

[21] Schlichtkrull, M., Kipf, T.N., Bloem, P., Van Den Berg, R., Titov, I., Welling, M.: Modeling relational data with graph convolutional networks. In: The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3-7, 2018, Proceedings. pp. 593-607. Springer (2018)

[22] Shang, B., Zhao, Y., Liu, J., Wang, D.: Mixed geometry message and trainable convolutional attention network for knowledge graph completion. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 38, pp. 8966-8974 (2024)

[23] Suchanek, F.M., Kasneci, G., Weikum, G.: YAGO: a core of semantic knowledge. In: Proceedings of the 16th International Conference on World Wide Web. pp. 697-706 (2007)

[24] Sun, Z., Deng, Z., Nie, J., Tang, J.: RotatE: Knowledge graph embedding by relational rotation in complex space. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019 (2019)

[25] Tan, Z., Chen, Z., Feng, S., Zhang, Q., Zheng, Q., Li, J., Luo, M.: KRACL: Contrastive learning with graph context modeling for sparse knowledge graph completion. In: Proceedings of the ACM Web Conference 2023. pp. 2548-2559 (2023)

[26] Tang, X., Zhu, S.C., Liang, Y., Zhang, M.: RulE: Knowledge graph reasoning with rule embedding. In: Findings of the Association for Computational Linguistics: ACL 2024. pp. 4316-4335 (2024)

[27] Toutanova, K., Chen, D., Pantel, P., Poon, H., Choudhury, P., Gamon, M.: Representing text for joint embedding of text and knowledge bases. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. pp. 1499-1509 (2015)

[28] Trouillon, T., Welbl, J., Riedel, S., Gaussier, É., Bouchard, G.: Complex embeddings for simple link prediction. In: International Conference on Machine Learning. pp. 2071-2080. PMLR (2016)

[29] Vashishth, S., Sanyal, S., Nitin, V., Agrawal, N., Talukdar, P.: InteractE: Improving convolution-based knowledge graph embeddings by increasing feature interactions. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 34, pp. 3009-3016 (2020)

[30] Vashishth, S., Sanyal, S., Nitin, V., Talukdar, P.: Composition-based multi-relational graph convolutional networks. arXiv preprint arXiv:1911.03082 (2019)

[31] Wang, B., Shen, T., Long, G., Zhou, T., Wang, Y., Chang, Y.: Structure-augmented text representation learning for efficient knowledge graph completion. In: Proceedings of the Web Conference 2021. pp. 1737-1748 (2021)

[32] Wang, X., He, X., Cao, Y., Liu, M., Chua, T.S.: KGAT: Knowledge graph attention network for recommendation. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. pp. 950-958 (2019)

[33] Wang, Z., Zhang, J., Feng, J., Chen, Z.: Knowledge graph embedding by translating on hyperplanes. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 28 (2014)

[34] Xiong, C., Power, R., Callan, J.: Explicit semantic ranking for academic search via knowledge graph embedding. In: Proceedings of the 26th International Conference on World Wide Web. pp. 1271-1279 (2017)

[35] Yang, B., Yih, S.W.t., He, X., Gao, J., Deng, L.: Embedding entities and relations for learning and inference in knowledge bases. In: Proceedings of the International Conference on Learning Representations (ICLR) (2015)

[36] Yani, M., Krisnadhi, A.A.: Challenges, techniques, and trends of simple knowledge graph question answering: a survey. Information 12(7), 271 (2021)

[37] Yao, L., Mao, C., Luo, Y.: KG-BERT: BERT for knowledge graph completion. arXiv preprint arXiv:1909.03193 (2019)

[38] Yu, D., Yang, Y., Zhang, R., Wu, Y.: Knowledge embedding based graph convolutional network. In: Proceedings of the Web Conference 2021. pp. 1619-1628 (2021)

[39] Zan, S., Ji, W., Zhou, G.: Knowledge graph embeddings based on 2D convolution and self-attention mechanisms for link prediction. Applied Intelligence 55(2), 1-18 (2025)

[40] Zhang, Z., Li, Z., Liu, H., Xiong, N.N.: Multi-scale dynamic convolutional network for knowledge graph embedding. IEEE Transactions on Knowledge and Data Engineering 34(5), 2335-2347 (2020)

[41] Zhou, Z., Wang, C., Feng, Y., Chen, D.: JointE: Jointly utilizing 1D and 2D convolution for knowledge graph embedding. Knowledge-Based Systems 240, 108100 (2022)