
Entity alignment method for aeronautical metrology domain based on multi-perspective entity embedding

Shengjie Kong ᵃ, Xiang Huang ᵃ, Shuanggao Li ᵃ, Gen Li ᵇ, Dong Zhang ᵃ
ᵃ College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China
ᵇ Suzhou Research Institute, Nanjing University of Aeronautics and Astronautics, Suzhou, China

A R T I C L E I N F O

Keywords:

Aeronautical metrology
Entity alignment
Entity embedding
Knowledge graph

Abstract

The accuracy and consistency of metrology data are the cornerstones of the safety and reliability of aircraft throughout aeronautical products’ lifecycles. Due to the heterogeneous nature of metrology data derived from various sources, knowledge silos commonly emerge, complicating the integration and reuse of knowledge. This study introduces an entity alignment model leveraging multi-perspective embedding. It employs a multi-scale graph convolutional network enhanced by a gating mechanism that aggregates multi-hop neighborhood features to capture the structural embeddings of nodes. Additionally, the model utilizes TransD for representing complex relationships and BERT for capturing entity attributes, facilitating more comprehensive entity representations. Entity alignment is then accomplished by integrating structural, relational, and attribute embeddings using a weighted strategy. In this study, we conducted experimental validation on aeronautical metrology data and also assessed our proposed model on five benchmark datasets. The results indicate that our model significantly outperforms comparative models, demonstrating its potential to enhance the management and application of aeronautical metrology data.

1. Introduction

Aviation metrology is crucial in the aviation industry, significantly impacting aircraft design, manufacturing, testing, and maintenance. Metrology is foundational for ensuring product quality: by maintaining consistent, accurate, and effective performance parameters throughout the product's lifecycle, it enables the product to reliably perform its intended tasks at any given time. Given the stringent reliability and safety demands of aviation products, meticulous measurement, calibration, and management of each performance parameter are essential to maintain optimal aircraft conditions during operation [1,2].
With advancements in aviation technology, modern aircraft have significantly improved in function and performance, accompanied by increased complexity. The metrological traceability system, crucial for monitoring aircraft performance, has expanded to encompass the transfer of metrology parameter values across aviation products, parameters, test equipment, calibration equipment, and metrology standards [3,4]. However, data supporting the safe and reliable operation of aircraft suffer from issues such as heterogeneity, ambiguity, and redundancy arising from multiple sources. These data are typically dispersed among metrological traceability manuals and product design
specifications, leading to data islands due to varying formats across departments and a lack of standardized data naming conventions. Current practices in aviation product metrology heavily rely on historical manual queries and professional empirical knowledge. However, the sharing and reuse of metrological knowledge face challenges due to the prevalence of isolated and ambiguous data. This study aims to integrate heterogeneous metrological traceability data from diverse sources to eliminate ambiguity and redundant information. The goal is to establish a unified and standardized aeronautical metrological knowledge system that offers high-quality support for the digitized metrology processes of aeronautical products.
In recent years, the burgeoning fields of big data and artificial intelligence have facilitated the widespread adoption of knowledge graph (KG) across various domains, including healthcare and geography [5,6]. This technology has also introduced innovative approaches to the aviation industry. The aeronautical metrology knowledge graph (AMKG) organically organizes the concepts and their relationships in the metrology traceability system in the form of a graph structure, which lays the foundation for the construction of an intelligent system of aviation metrology [7]. Multi-source heterogeneous aeronautical metrology data will form multiple knowledge graphs for different
scenarios, and there is a large amount of overlapping and complementary knowledge between these knowledge graphs. Knowledge fusion integrates disparate knowledge across multiple knowledge graphs [8,9]. A crucial component of this process, entity alignment (EA), involves determining the equivalence of entities across different knowledge graphs [10,11]. The precision of entity alignment significantly impacts the construction quality of AMKG and, in turn, affects the metrology reliability of aviation products.
Entity alignment methods are typically categorized into three types: similarity-based, knowledge graph embedding-based, and graph neural network-based. These methods have demonstrated effectiveness in aligning entities across cross-lingual knowledge graphs [12-14]. However, the entity alignment of AMKG presents distinct challenges that differ from those encountered in cross-lingual KGs. 1) AMKG encompasses numerous homonymous entities that necessitate differentiation through structural embedding. 2) AMKG features a variety of 1-N, N-1, and N-N metrological traceability relationships, requiring differentiation via relational embedding. 3) The entities in AMKG also possess extensive and significant attribute information, which demands distinction through attribute embedding. To address these challenges, this paper presents an entity alignment method tailored for the aeronautical metrology domain, based on a multi-perspective entity embedding approach. The main contributions of this study are summarized as follows:
  1. A multi-perspective embedded entity alignment model called MEEA is proposed. The model learns entity representations from three perspectives: structural, relational, and attribute embedding, to realize entity alignment in the field of aeronautical metrology.
  2. A multi-scale graph convolutional neural network (MGCN) is proposed to aggregate multi-hop neighbor information and use a gating mechanism to reduce the effect of noisy features.
  3. The proposed MEEA is evaluated on five benchmark datasets and an aeronautical metrology dataset, and the experimental results demonstrate the superiority of MEEA and the effectiveness of each module.
2. Related work

Entity alignment is a crucial task in knowledge fusion, primarily involving three methodologies: similarity-based, knowledge graph embedding-based, and graph neural network-based methods.

2.1. Similarity-based entity alignment method

One of the earliest approaches to entity alignment involves similarity-based methods. These methods determine whether two entities align by calculating their similarity. Techniques include string similarity, which compares features like character composition and word spelling [15], and graph similarity, which evaluates the similarity between neighboring nodes within the entities’ respective subgraphs [16].
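As a concrete illustration of the string-similarity idea, the following sketch matches entity names with Python's standard-library `difflib`. The entity names and the 0.8 threshold are illustrative assumptions, not values from the systems cited above.

```python
# Minimal similarity-based entity matching sketch (illustrative names/threshold).
from difflib import SequenceMatcher

def string_similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] based on matching character blocks (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def align_by_similarity(entities1, entities2, threshold=0.8):
    """Pair each entity in entities1 with its best match above the threshold."""
    pairs = []
    for e1 in entities1:
        best = max(entities2, key=lambda e2: string_similarity(e1, e2))
        if string_similarity(e1, best) >= threshold:
            pairs.append((e1, best))
    return pairs

pairs = align_by_similarity(
    ["pressure gauge", "turbine flowmeter"],
    ["Pressure Gauge", "thermocouple"],
)
```

Such a matcher only sees surface form: "turbine flowmeter" finds no acceptable partner here, which hints at why predefined similarity metrics struggle with variant nomenclatures.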
Scharffe et al. [17] developed the RDF-AI entity alignment framework, comprising five modules: preprocessing, matching, fusion, linking, and post-processing. Notably, the matching module utilizes a fuzzy string matching algorithm [18] that significantly enhances the efficiency of string similarity comparisons. Pershina et al. [16] introduced the HolisticEM model for entity alignment. Initially, a graph of potentially matching entity pairs was constructed by assessing word correlations using Inverse Document Frequency (IDF) [19]. The model then facilitated entity alignment in large-scale knowledge graphs through the calculation of graph similarity, employing a personalized webpage ranking algorithm. However, similarity-based entity alignment methods exhibit several weaknesses. Primarily, these approaches depend on predefined rules and similarity metrics, which hinder their ability to effectively handle the intricate diversity of entity representations within the domain of aeronautical metrology. Additionally, they are
particularly susceptible to noise and lack robustness, which becomes evident when aligning metrological equipment entities with variant nomenclatures. Consequently, these limitations render similarity-based methods less effective for knowledge fusion tasks in aeronautical metrology.

2.2. Knowledge graph embedding-based entity alignment method

The knowledge graph embedding-based method involves embedding entities and relations into a continuous vector space, where entities with similar characteristics are positioned closely. This method computes the vector representation of entities and relations to determine their similarity, facilitating entity alignment. The TransE model [20] represents a seminal approach to representation learning for translations and marks a significant milestone in knowledge graph embedding research. Its core concept revolves around treating relationships as vector transformations within the same vector space, mapping head entities to tail entities.
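The translation idea behind TransE can be sketched in a few lines: a relation is a vector that translates the head-entity embedding toward the tail-entity embedding, so ||h + r − t|| is small for plausible triples. The toy embeddings below are illustrative, not trained parameters.

```python
# TransE-style scoring sketch; toy vectors, not trained embeddings.
import math

def transe_score(h, r, t):
    """L2 distance ||h + r - t||; smaller means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

h = [0.1, 0.2]        # head-entity embedding
r = [0.3, 0.1]        # relation embedding (translation vector)
t_good = [0.4, 0.3]   # tail embedding consistent with h + r
t_bad = [1.0, -1.0]   # unrelated tail embedding

# The consistent tail scores lower (better) than the unrelated one.
```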
Translation-based representation learning offers advantages, including a small number of parameters and concise representations. Consequently, numerous entity alignment models employing knowledge graph embeddings have been developed. For instance, the RotatE model [21] conceptualizes each relationship as a rotation from the source entity to the target entity within a complex vector space, thereby enhancing the model’s capacity to capture inter-entity relationships. The HAKE model [22] situates entities within a polar coordinate system to effectively depict their semantic hierarchies. The ConvE model [23] employs multilayer convolution to elucidate the semantic relationships between entity pairs, facilitating efficient entity alignment. In addition to the previously discussed translation-based methods, semantic matching approaches also play a crucial role in the realm of knowledge graph representation learning. The MtransE model [24] employs the TransE algorithm to learn embeddings for entities in multiple languages, facilitating cross-lingual entity alignment. The BootEA model [25] enhances alignment model performance through the integration of a bootstrapping strategy and self-supervised learning. Additionally, the TransEdge model [26] addresses the challenge of entity matching across graph edges by incorporating unique edge embedding vectors. Zhu et al. [27] summarized the graph embedding-based entity alignment approach, introduced a new EA framework with modules for information aggregation, entity alignment, and post-alignment, and validated its effectiveness through experiments. Tian et al. [28] proposed the ExEA framework to generate explanations for interpreting and correcting embedding-based EA results. This method constructs matching subgraphs and alignment dependency graphs to explain and resolve conflicts in entity alignment. Guo et al. [29] proposed a method for entity alignment in the civil aviation domain using translation embeddings. 
This method separately embeds the semantic descriptions, relations, and attributes of entities, offering advantages for entity alignment tasks within this domain. Wang et al. [30] introduced an entity alignment method based on a Siamese network and multi-attribute importance features. By incorporating a font pre-training model and a character feature layer with a multi-attribute attention mechanism, this method enhances the completeness and reliability of the civil aviation knowledge graph. However, the above models struggle to effectively represent many-to-many metrological traceability relationships in their generation of entity relationship vectors. Furthermore, these models face challenges in fully capturing the semantic information of multi-attribute aeronautical metrology entities, thereby constraining their performance.

2.3. Graph neural network-based entity alignment method

Graph neural network (GNN) approaches leverage graph models to represent entities and their relationships within a knowledge graph. By aggregating neighboring entities, these methods enhance entity embeddings, allowing them to effectively capture both local and global
structural node information. Consequently, GNNs can more accurately identify similar entities across different knowledge graphs.
The GCN-Align model [31], an early entity alignment model, leverages graph convolutional networks to embed entities from diverse languages into a unified vector space. It integrates structural and attribute embeddings to facilitate cross-language entity alignment. The NAEA model [32] uses a neighborhood attention mechanism to assign varying attention weights to entity neighbors, thereby learning entity representations through the aggregation of neighborhood information. Chen et al. [33] proposed a high-order graph neural network HOLI-GNN for knowledge graph entity alignment. This model addresses the issue of excessive smoothing from neighborhood aggregation by introducing a local inflation mechanism. Song et al. [34] developed a weakly supervised group mask network (WSGMN), which enhances target object recognition by integrating contextual information through community instance generation. Recently, unsupervised and self-supervised methods have been introduced to reduce reliance on labeled data. Zeng et al. [35] proposed MultiEM, an unsupervised approach for multitable entity matching. This model improves matching efficiency and effectiveness through enhanced entity representation, hierarchical merging, and density-based pruning. Ge et al. [36] introduced a selfsupervised entity matching framework, CollaborEM, which performs entity matching without manual labeling by leveraging multi-feature collaboration. This framework effectively identifies tuple features in a fault-tolerant manner, generating reliable matching results. Additionally, methods to optimize entity embeddings and neighborhood aggregation are proposed to enhance entity alignment performance further. The GRGCN model [37] integrates entity structure, attributes, and attention mechanisms via multilevel learning, assigning varied weights to each node’s neighbors to enhance spatial information capture. 
The AliNet model [38] employs a gating mechanism to assimilate multi-hop neighborhood information of entities, simultaneously diminishing noise from distant neighbors using an attention mechanism. The DvGNet model [39] addresses both entity and relational interactions, reducing the impact of structural heterogeneity on entity alignment. Given the notable benefits of such methods in enriching entity neighbor information, this study employs GNN to facilitate the structural embedding of nodes. Distinctively, this paper introduces a multi-scale node aggregation strategy designed to improve the aggregation of neighbor information for long-tailed entities.

3. Methodology

This section first introduces foundational concepts related to knowledge graphs and entity alignment. Subsequently, it details the general architecture and individual components of the proposed model.

3.1. Preliminaries

The AMKG expresses the concepts and their relationships in the metrological traceability system as a set of triples, defined as $AMKG=(E, R, A, T^{R}, V)$, where $E$, $R$, and $A$ denote the sets of entities, relations, and attributes, respectively, $T^{R}$ denotes the set of relation triples, and $V$ denotes the attribute values of entities. Each triple $t_{k}=(e_{i}, r_{ij}, e_{j})$ denotes that the head entity $e_{i}$ is connected to the tail entity $e_{j}$ by the relation $r_{ij}$, where $t \in T^{R}$, $e \in E$, and $r \in R$.
The goal of entity alignment is to find the set of remaining aligned entity pairs between $AMKG_{1}=(E_{1}, R_{1}, A_{1}, T_{1}^{R}, V_{1})$ and $AMKG_{2}=(E_{2}, R_{2}, A_{2}, T_{2}^{R}, V_{2})$, based on a given seed set of pre-aligned entity pairs $t=(e_{i}^{1}, e_{i}^{2})$, where $e_{i}^{1} \in AMKG_{1}$ and $e_{i}^{2} \in AMKG_{2}$.
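The formal definitions above can be written out as plain Python data structures. All entity, relation, and attribute names below are illustrative placeholders, not items from the paper's aeronautical metrology dataset.

```python
# Toy encoding of AMKG = (E, R, A, T^R, V); names are hypothetical.
AMKG1 = {
    "E": {"pressure_gauge", "calibration_bench"},          # entities
    "R": {"calibrated_by"},                                # relations
    "A": {"accuracy_class"},                               # attributes
    "T_R": {("pressure_gauge", "calibrated_by", "calibration_bench")},
    "V": {("pressure_gauge", "accuracy_class"): "0.5"},    # attribute values
}
AMKG2 = {
    "E": {"manometer", "calibration_rig"},
    "R": {"calibrated_by"},
    "A": {"accuracy_class"},
    "T_R": {("manometer", "calibrated_by", "calibration_rig")},
    "V": {("manometer", "accuracy_class"): "0.5"},
}

# Seed set of pre-aligned entity pairs (e_i^1, e_i^2); an alignment model
# should then infer the remaining pair ("calibration_bench", "calibration_rig").
seed_pairs = {("pressure_gauge", "manometer")}
```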

3.2. Framework

This paper introduces the Multi-perspective Embedding-based Entity Alignment (MEEA) model, designed to enhance the integrity of AMKG.
By removing ambiguous and redundant entities and integrating diverse aeronautical metrological knowledge from various sources, MEEA ensures the quality of the knowledge graph. Fig. 1 illustrates the model's architecture, which comprises three main components: the input AMKGs, multi-perspective embedding, and entity alignment. First, the inputs consist of two distinct AMKGs, denoted $AMKG_{1}$ and $AMKG_{2}$. In $AMKG_{1}$, semantic information about an entity $e_{i}^{1}$ is acquired from three perspectives: structure, relation, and attribute. The structure embedding employs a multi-scale GNN to aggregate multi-hop neighbor information about $e_{i}^{1}$. Notably, a gating mechanism combines the hidden and output layers of the network, yielding a more precise representation of the entity. The relation embedding is based on the TransD model [40]. The attribute embedding employs the BERT model to assign attention weights to each attribute name and value of entity $e_{i}^{1}$. The same method applies to entity $e_{i}^{2}$ in $AMKG_{2}$. For entity alignment, the alignment between entities $e_{i}^{1}$ and $e_{i}^{2}$ is determined by calculating the cosine distance between their final embedding vectors.
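The final alignment step can be sketched as follows: the three per-perspective embeddings are combined with weights and candidate pairs are compared by cosine similarity. The fixed weights and toy vectors here are illustrative assumptions; in the paper the weighting strategy and embeddings come from the trained model.

```python
# Sketch of weighted multi-perspective fusion + cosine comparison.
# Weights (0.5, 0.3, 0.2) and all vectors are illustrative, not tuned values.
import math

def weighted_fuse(h_struct, h_rel, h_attr, w=(0.5, 0.3, 0.2)):
    """Element-wise weighted sum of structural, relational, attribute embeddings."""
    return [w[0] * s + w[1] * r + w[2] * a
            for s, r, a in zip(h_struct, h_rel, h_attr)]

def cosine_similarity(u, v):
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm = math.sqrt(sum(ui * ui for ui in u)) * math.sqrt(sum(vi * vi for vi in v))
    return dot / norm

e1 = weighted_fuse([1.0, 0.0], [0.9, 0.1], [1.0, 0.2])
e2 = weighted_fuse([0.9, 0.1], [1.0, 0.0], [0.8, 0.3])
sim = cosine_similarity(e1, e2)   # near 1 => candidate aligned pair
```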

3.3. Structure embedding

3.3.1. Multi-scale node aggregation

The core idea of GNN is to aggregate information from neighboring nodes to update the representation of the central node. In the message-passing phase, entities with many neighboring nodes are called key nodes because they can aggregate a large amount of information. Conversely, entities that appear less frequently and have fewer neighboring nodes are called long-tail entities. Long-tail entities are prevalent in AMKGs. These entities typically include rare and specialized metrology equipment and specific metrology protocols, such as ultra-low-temperature gas flow standard devices and turbine flowmeter protocols. Despite their infrequent appearance in AMKGs, they are crucial for ensuring the performance quality of specific aerospace products.
The limitations of traditional GNNs, such as graph convolutional networks (GCN) [41] and graph attention networks (GAT) [42], in handling long-tailed entities are significant. In a single-layer network, as shown in Fig. 2, entity e is connected to only one neighbor, d, resulting in limited information aggregation. While increasing the number of network layers can introduce additional neighbor information, it also leads to over-smoothing: after multiple layers of aggregation, the representations of long-tailed entities become similar to those of other entities, reducing their distinctiveness. To tackle this limitation, this paper introduces the MGCN model. The model integrates an attention mechanism that assigns varied weights to each neighboring node according to its relative importance. This approach is particularly effective for long-tail entities such as entity e. By employing an expanded convolution kernel, the model efficiently aggregates the pertinent features of the two-hop neighbors, nodes c and f. During the iterative propagation process of MGCN, the rich information accumulated by long-tail entities can be disseminated to other entities, thereby enhancing the depth and breadth of AMKG information circulation. The target nodes are able to effectively learn the deep knowledge representations embedded in complex structures, which facilitates entity alignment in AMKG.
For an entity $e_{i}$ in a KG, its embedding in layer $l$ can be computed as:

$$h_{e_{i}}^{l}=\sigma\left(\sum_{e_{j} \in N_{e_{i}}} \sum_{r_{k} \in R_{ij}} \alpha_{ijk}^{l-1} W_{r_{k}} h_{e_{j}}^{l-1}\right)$$
where $\sigma(\cdot)$ is the $\operatorname{LeakyReLU}(\cdot)$ activation function, $N_{e_{i}}$ is the set of neighboring nodes of $e_{i}$, $R_{ij}$ is the set of all relations between $e_{i}$ and $e_{j}$, $r_{k}$ is the $k$-th edge between $e_{i}$ and $e_{j}$, $W_{r_{k}}$ is an orthogonal matrix, which guarantees that the norms and relative distances of entities remain unchanged after transformation, and $\alpha_{ijk}^{l-1}$ is the attention coefficient when aggregating

Fig. 1. Overview of MEEA model.

Fig. 2. Multi-scale node aggregation process.

different neighboring nodes, calculated as:

$$\alpha_{ijk}^{l-1}=\operatorname{Softmax}_{k}\left(a^{T}\left(W_{r_{k}} h_{e_{i}}^{l-1} \oplus W_{r_{k}} h_{e_{j}}^{l-1} \oplus h_{r_{k}}\right)\right)$$
where $\operatorname{Softmax}_{k}$ normalizes the attention coefficients over the $k$ edges between $e_{i}$ and $e_{j}$, $\oplus$ denotes vector concatenation, and $a^{T}$ is a learnable attention weight vector.
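The two equations above can be sketched as a single layer update: an attention score per edge, a softmax over the edges, then a weighted sum of neighbor embeddings passed through LeakyReLU. For simplicity this sketch takes the orthogonal matrices $W_{r_k}$ as the identity, and all vectors are toy values rather than learned parameters.

```python
# Sketch of one attention-weighted aggregation step (W_rk = I assumed).
import math

def leaky_relu(x, slope=0.01):
    return [v if v > 0 else slope * v for v in x]

def aggregate(h_ei, neighbors, h_rels, a):
    """One layer update for entity e_i over its edges (toy version)."""
    # Attention score per edge: a^T (h_ei (+) h_ej (+) h_rk), lists concatenated.
    scores = [sum(ai * fi for ai, fi in zip(a, h_ei + h_ej + h_rk))
              for h_ej, h_rk in zip(neighbors, h_rels)]
    m = max(scores)
    exp_s = [math.exp(s - m) for s in scores]
    total = sum(exp_s)
    alpha = [e / total for e in exp_s]          # Softmax_k over the edges
    # Weighted sum of neighbor embeddings, then the activation sigma.
    agg = [sum(alpha[k] * neighbors[k][d] for k in range(len(neighbors)))
           for d in range(len(h_ei))]
    return leaky_relu(agg)

h_ei = [0.2, 0.1]
neighbors = [[1.0, 0.0], [0.0, 1.0]]            # neighbor embeddings h_ej
h_rels = [[0.5, 0.5], [0.5, 0.5]]               # relation embeddings h_rk
h_next = aggregate(h_ei, neighbors, h_rels, a=[1.0] * 6)
```

With identical relation embeddings the two edges receive equal attention, so the update is simply the mean of the two neighbor embeddings.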
If the long-tailed entity $e_{i}$ has only one neighbor $e_{j}$, its embedding in layer $l$ can be computed as:

$$h_{e_{i}}^{l}=\sigma\left(\operatorname{Average}\left(h_{e_{i}}^{Sg1}+h_{e_{i}}^{Sg2}+h_{e_{i}}^{Sg3}\right)\right)$$
where $h_{e_{i}}^{Sg1}$, $h_{e_{i}}^{Sg2}$, and $h_{e_{i}}^{Sg3}$ are the embeddings of Subgraph1, Subgraph2, and Subgraph3 in Fig. 2, respectively. Solid lines indicate the relationship direction between entities, and dashed lines indicate the direction of information aggregation between entities.
The Subgraph1 phase computes information about the direct neighbor $e_{j}$ of $e_{i}$.

$$h_{e_{i}}^{Sg1}=\alpha_{ijk}^{l-1} W_{r_{k}} h_{e_{j}}^{l-1}$$
The Subgraph2 phase computes information about the indirect neighbors $e_{m}$ of $e_{i}$, i.e., the neighbors of its neighbor $e_{j}$. Here, the information of all neighbors $e_{m}$ of $e_{j}$ is considered.

$$h_{e_{i}}^{Sg2}=\sum_{e_{m} \in N_{e_{j}}} \sum_{r_{o} \in R_{jm}} \alpha_{jmo}^{l-1} W_{r_{o}} h_{e_{m}}^{l-1}$$
where $R_{jm}$ is the set of all relations between $e_{j}$ and $e_{m}$, $r_{o}$ is the $o$ th edge between $e_{j}$ and $e_{m}$, and $W_{r_{o}}$ is the transformation matrix of $r_{o}$.
The Subgraph3 phase computes the two-hop neighbor $e_{n}$ information of $e_{i}$. Unlike Subgraph2, Subgraph3 focuses on all neighbors of $e_{j}$ except $e_{i}$.

$$h_{e_{i}}^{Sg3}=\sum_{e_{n} \in\left(N_{e_{j}}-e_{i}\right)} \sum_{r_{p} \in\left(R_{jm}-R_{ij}\right)} \alpha_{jmp}^{l-1} W_{r_{p}} h_{e_{n}}^{l-1}$$
Therefore, MGCN not only accounts for direct neighbor relationships but also enhances entity representation through multi-hop information transfer, which is crucial for addressing long-tailed entities in knowledge graphs.
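As a concrete illustration of the three subgraph aggregations above, the following NumPy sketch computes a toy layer-$l$ embedding for a long-tailed entity with a single neighbor. All dimensions, weights, and attention coefficients are illustrative stand-ins (random or uniform values), not the paper's trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                   # toy embedding dimension

# previous-layer embeddings: e_i, its single neighbor e_j, and e_j's neighbors
h_ei_prev = rng.normal(size=d)
h_ej = rng.normal(size=d)
neighbors_of_ej = [h_ei_prev, rng.normal(size=d), rng.normal(size=d)]

W_r = rng.normal(size=(d, d)) * 0.1     # shared toy relation transform
alpha = 1.0 / 3.0                       # uniform stand-in attention coefficient

h_sg1 = alpha * W_r @ h_ej                                   # direct neighbor e_j
h_sg2 = sum(alpha * W_r @ h for h in neighbors_of_ej)        # all neighbors of e_j
h_sg3 = sum(alpha * W_r @ h for h in neighbors_of_ej[1:])    # neighbors of e_j except e_i

# layer-l embedding of the long-tailed entity: activation of the averaged subgraphs
h_ei = np.tanh((h_sg1 + h_sg2 + h_sg3) / 3.0)
print(h_ei.shape)
```

In a full implementation the attention coefficients and relation-specific matrices would come from the attention formula above rather than fixed values.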

3.3.2. Gating mechanism

GNNs can capture multi-hop neighbor information through multilayer aggregation. However, this may also introduce noise, since not all neighbor information benefits the model's performance. For this reason, this paper introduces a gating mechanism to regulate the information flow and weight assignment. This mechanism effectively reduces the effect of noise while capturing multi-hop neighbor information, thus improving the accuracy of entity representation. The details of the gating mechanism are shown in Fig. 3, where $\odot$ denotes the Hadamard product and $+$ denotes matrix addition.
Assuming the number of network layers is $l=2$, the embedding of entity $e_{i}$ under the gating mechanism can be computed as:

$$h_{e_{i}}=g \cdot h_{e_{i}}^{1}+(1-g) \cdot h_{e_{i}}^{2}$$
where $g$ is a learnable weight parameter used as a gate to control the combination of the hidden layer and the output layer, and $h_{e_{i}}^{1}$ and $h_{e_{i}}^{2}$ are the hidden-layer and output-layer representations of the network, respectively.
When $k$-hop ($k>2$) neighbor information needs to be aggregated, it can be aggregated through a $k$-layer network. Let $\rho_{1}\left(h_{e_{i}}^{1}, h_{e_{i}}^{2}\right)$ be the gated combination of the one-hop and two-hop neighbor aggregations; the final embedding is then obtained recursively:

$$h_{e_{i}}=\rho_{k-1}\left(\ldots \rho_{2}\left(\rho_{1}\left(h_{e_{i}}^{1}, h_{e_{i}}^{2}\right), h_{e_{i}}^{3}\right) \ldots\right)$$
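A minimal sketch of this gated recursion, using scalar gates as stand-ins for the learned gate parameters:

```python
import numpy as np

def gate_combine(h_a, h_b, g):
    """Gated combination rho(h_a, h_b) = g*h_a + (1-g)*h_b, with g in [0, 1]."""
    return g * h_a + (1.0 - g) * h_b

def aggregate_k_hops(layer_embs, gates):
    """Recursively fold k per-layer embeddings with k-1 gates."""
    h = layer_embs[0]
    for h_next, g in zip(layer_embs[1:], gates):
        h = gate_combine(h, h_next, g)
    return h

# toy example with k = 3 hop levels
h1, h2, h3 = np.ones(4), 2 * np.ones(4), 4 * np.ones(4)
h = aggregate_k_hops([h1, h2, h3], gates=[0.5, 0.5])
print(h)  # 0.5*(0.5*h1 + 0.5*h2) + 0.5*h3, so each entry equals 2.75
```

In the model the gates are trained jointly with the network, letting it decide how much multi-hop information to retain at each level.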
Therefore, the structural embedding loss of an entity can be calculated as:

$$\mathscr{L}_{str}=\sum_{\left(e_{i}, e_{j}\right) \in A^{+}} \sum_{\left(e_{i}^{\prime}, e_{j}^{\prime}\right) \in A^{-}} \max \left(\lambda_{str}+\left\|h_{e_{i}}-h_{e_{j}}\right\|-\left\|h_{e_{i}^{\prime}}-h_{e_{j}^{\prime}}\right\|, 0\right)$$
where $A^{+}$ and $A^{-}$ are the sets of positive and negative samples for entity alignment, $\lambda_{str}$ is the margin hyper-parameter of the structural alignment loss, and $\|\cdot\|$ denotes the $L_{2}$ vector norm.
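This margin-based hinge loss can be sketched directly over toy vectors; the double loop implements the sum over all positive/negative pair combinations:

```python
import numpy as np

def margin_loss(pos_pairs, neg_pairs, margin):
    """Hinge loss: positive pairs should be closer than negative pairs by `margin` (L2)."""
    loss = 0.0
    for (hi, hj) in pos_pairs:
        for (hi_n, hj_n) in neg_pairs:
            d_pos = np.linalg.norm(hi - hj)
            d_neg = np.linalg.norm(hi_n - hj_n)
            loss += max(margin + d_pos - d_neg, 0.0)
    return loss

# an aligned pair at distance 0 vs. a corrupted pair at distance 5
a = np.zeros(3)
b = np.zeros(3)
c = np.array([5.0, 0.0, 0.0])
loss_easy = margin_loss([(a, b)], [(b, c)], margin=2.0)  # max(2 + 0 - 5, 0) = 0.0
loss_hard = margin_loss([(a, c)], [(a, b)], margin=2.0)  # max(2 + 5 - 0, 0) = 7.0
print(loss_easy, loss_hard)
```

The relational and attribute losses later in the section follow the same pattern with different distance terms.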

3.4. Relation embedding

The AMKG comprehensively covers semantic relationships among products, parameters, and metrology equipment, represented as relational triples. Under the assumption that entities with similar characteristics exhibit similar relationships, leveraging this relational information greatly improves entity alignment. Fig. 4 shows several many-to-many relationships within the AMKG, where circles represent entities and solid lines indicate relationships. The traditional TransE model struggles to model these complex relationships effectively.
Inspired by the TransD model [40], this paper embeds entities and relations into distinct semantic spaces. Each head entity and its corresponding tail entity are then projected into the relation space using unique transformation matrices. Within this space, we establish a translation relationship between the head entity and the tail entity, as illustrated in Fig. 5. The entity relation loss is subsequently computed as:
$$\mathscr{L}_{rel}=\sum_{\left(e_{i}, r_{ij}, e_{j}\right) \in T^{R}} \sum_{\left(e_{i}^{\prime}, r_{ij}^{\prime}, e_{j}^{\prime}\right) \in T^{R^{\prime}}} \max \left(\lambda_{rel}+\left\|e_{i} M_{r_{ij} e_{i}}+r_{ij}-e_{j} M_{r_{ij} e_{j}}\right\|-\left\|e_{i}^{\prime} M_{r_{ij}^{\prime} e_{i}^{\prime}}+r_{ij}^{\prime}-e_{j}^{\prime} M_{r_{ij}^{\prime} e_{j}^{\prime}}\right\|, 0\right)$$
where $e_{i}$, $r_{ij}$, and $e_{j}$ are the head entity, relation, and tail entity, respectively, $M_{r_{ij} e_{i}}$ and $M_{r_{ij} e_{j}}$ are the transformation matrices projecting $e_{i}$ and $e_{j}$ into the relation semantic space, $T^{R}$ and $T^{R^{\prime}}$ are the sets of positive and negative triple samples, and $\lambda_{rel}$ is the margin hyper-parameter of the relational alignment loss.
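The translation score inside this loss can be sketched with the standard TransD construction, where the mapping matrix is built from entity and relation projection vectors as $M = r_p e_p^{T} + I$; the model's exact parameterization of $M_{r_{ij} e_{i}}$ may differ:

```python
import numpy as np

def transd_projection(ent_p, rel_p):
    """TransD mapping matrix M = r_p e_p^T + I, projecting an entity into relation space."""
    return np.outer(rel_p, ent_p) + np.eye(len(rel_p), len(ent_p))

def transd_score(h, h_p, r, r_p, t, t_p):
    """Translation score || M_rh h + r - M_rt t ||_2 (lower means a more plausible triple)."""
    M_rh = transd_projection(h_p, r_p)
    M_rt = transd_projection(t_p, r_p)
    return np.linalg.norm(M_rh @ h + r - M_rt @ t)

# with zero projection vectors, M reduces to I and the score is || h + r - t ||
h = np.array([1.0, 0.0]); r = np.array([0.0, 1.0]); t = np.array([1.0, 1.0])
z = np.zeros(2)
score = transd_score(h, z, r, z, t, z)
print(score)  # 0.0 for a perfectly translated triple
```

Because each entity gets its own projection vector, head and tail are mapped into the relation space by different matrices, which is what lets TransD handle the many-to-many relations that TransE cannot.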

3.5. Attribute embedding

In the field of aeronautical metrology, variations often occur in the naming of entities and the structuring of attributes across diverse data sources. As shown in Fig. 6, entities are represented by circles, attribute names by dotted lines, and attribute values by rectangles. Despite differences in entity names, they may correspond to the same real-world entity when their attributes exhibit significant similarity. Employing attribute embedding techniques enhances the representation of entity features, thereby improving the assessment of semantic similarity between entities. This approach increases both the precision and robustness of entity alignment.
The BERT model [43] is based on large-scale unsupervised pretraining and has excellent linguistic representation and feature extraction capabilities. This paper employs the BERT model to initialize the embeddings for attribute names and values, ensuring their distributional consistency.

$$\vec{A}=\left(\vec{a}_{0}, \vec{a}_{1}, \ldots, \vec{a}_{n}\right)=\operatorname{BERT}\left(\left[a_{0}, a_{1}, \ldots, a_{n}\right]\right)$$
$$\vec{V}=\left(\vec{v}_{1}, \vec{v}_{2}, \ldots, \vec{v}_{n}\right)=\operatorname{BERT}\left(\left[v_{1}, v_{2}, \ldots, v_{n}\right]\right)$$
where $\vec{A}$ and $\vec{V}$ denote the sets of attribute name and attribute value embeddings, $\vec{a}_{i}$ denotes the embedding vector of the $i$ th attribute name $a_{i}$, and $\vec{v}_{i}$ denotes the embedding vector of the $i$ th attribute value $v_{i}$.
Different attributes have different importance to an entity and need to be assigned different weights. Based on the attention mechanism, the attention weight $\omega_{i}$ of the $i$ th attribute name vector $\vec{a}_{i}$ can be calculated as:

$$\omega_{i}=\operatorname{Softmax}\left(a^{T} W_{a} \vec{a}_{i}\right)$$
where $W_{a}$ is the weight matrix.

Fig. 3. Details of the gating mechanism.


(a) 1-N relation.
A product has N parameters


(b) N-1 relation.
N master standards form a metrology standard instrument


(c) N-N relation.
N parameters are detected by N metrology devices
Fig. 4. Entity relations in AMKG.

Fig. 5. The TransD model structure.

Fig. 6. Entity Attributes in AMKG.

An entity's attribute information consists of attribute names and corresponding attribute values. Therefore, the attribute embedding of an entity can be formed by concatenating the weighted sums of the attribute-name and attribute-value embedding vectors.

$$a=\sum_{i=0}^{n} \omega_{i} \vec{a}_{i} \oplus \sum_{i=0}^{n} \omega_{i} \vec{v}_{i}$$
Then the entity attribute loss can be calculated by the following equation:

$$\mathscr{L}_{attr}=\sum_{\left(e_{i}, e_{j}\right) \in A^{+}} \sum_{\left(e_{i}^{\prime}, e_{j}^{\prime}\right) \in A^{-}} \max \left(\lambda_{attr}+\left\|a_{e_{i}}-a_{e_{j}}\right\|-\left\|a_{e_{i}^{\prime}}-a_{e_{j}^{\prime}}\right\|, 0\right)$$
where $\lambda_{attr}$ is the margin hyper-parameter of the attribute alignment loss, and the other symbols have the same meaning as in equation (9).
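The attribute-embedding pipeline (attention weights over attribute-name vectors, then weighted sums of names and values concatenated) can be sketched as follows; random vectors stand in for the BERT outputs and the learned parameters $W_a$ and $a$:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attribute_embedding(name_vecs, value_vecs, W_a, a):
    """Attention-weighted sums of attribute-name and attribute-value vectors, concatenated."""
    scores = np.array([a @ (W_a @ n) for n in name_vecs])
    w = softmax(scores)                              # omega_i
    name_part = sum(wi * n for wi, n in zip(w, name_vecs))
    value_part = sum(wi * v for wi, v in zip(w, value_vecs))
    return np.concatenate([name_part, value_part])

rng = np.random.default_rng(1)
d, n_attr = 4, 3
names = [rng.normal(size=d) for _ in range(n_attr)]   # stand-ins for BERT name embeddings
values = [rng.normal(size=d) for _ in range(n_attr)]  # stand-ins for BERT value embeddings
emb = attribute_embedding(names, values, rng.normal(size=(d, d)), rng.normal(size=d))
print(emb.shape)  # (8,) — name half concatenated with value half
```

Note that the same attention weights computed from the names are reused for the values, so an attribute judged important contributes strongly through both halves.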

3.6. Entity alignment

The core of entity alignment lies in learning similar representations for equivalent entities in different knowledge graphs. This method strives to minimize alignment loss, thereby ensuring that representations of aligned entities exhibit minimal discrepancies, whereas those of unaligned entities display marked differences.
Using the described method, we derive separate losses for the structural, relational, and attribute embeddings of aeronautical metrology entities. Subsequently, we aggregate these embeddings from the three perspectives into a unified entity embedding. The final entity alignment loss is then calculated by applying specific weights, as detailed in the equation below.

$$\mathscr{L}=\alpha \mathscr{L}_{str}+\beta \mathscr{L}_{rel}+\gamma \mathscr{L}_{attr}$$
where $\alpha$, $\beta$, and $\gamma$ are the entity embedding weights for structural, relational, and attribute information, respectively; each takes a value in $[0,1]$ and $\alpha+\beta+\gamma=1$.
After the embedding processes, entity vectors in both the source knowledge graph $AMKG_{1}$ and the target knowledge graph $AMKG_{2}$ are unified in the same vector space. For an entity $e_{i}^{1}$ in $AMKG_{1}$, its vector representation is derived using the embedding method that integrates structural, relational, and attribute information. Similarly, the vector representations of entities $e_{i}$ in $AMKG_{2}$ are obtained through the same embedding process. To identify the target entity that best matches the source entity $e_{i}^{1}$, we calculate the cosine similarity between $e_{i}^{1}$ and each entity $e_{i}$ in the target knowledge graph. The entity $e_{i}^{2}$ with the highest similarity score is selected as the aligned entity for $e_{i}^{1}$.

$$e_{i}^{2}=\operatorname{argmax}\left(\cos \left(e_{i}^{1}, e_{i}\right)\right)$$
where $\cos \left(e_{i}^{1}, e_{i}\right)$ denotes the cosine similarity between entities $e_{i}^{1}$ and $e_{i}$. In this way, the correspondence between entities of the two knowledge graphs can be effectively established to realize entity alignment.
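The matching step can be sketched directly: score every target-graph entity by cosine similarity against the source entity and take the argmax. The vectors here are toy stand-ins for the learned unified embeddings:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def align(source_vec, target_vecs):
    """Index of the target entity most similar to the source entity."""
    sims = [cosine(source_vec, t) for t in target_vecs]
    return int(np.argmax(sims))

src = np.array([1.0, 0.0])
targets = [np.array([0.0, 1.0]),    # orthogonal: similarity 0
           np.array([0.9, 0.1]),    # nearly parallel: similarity ~0.99
           np.array([-1.0, 0.0])]   # opposite: similarity -1
best = align(src, targets)
print(best)  # 1
```

In practice the target embeddings would be stacked into a matrix so all similarities are computed in one matrix-vector product.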

4. Experiments

This study evaluates the proposed MEEA model in three aspects. First, baseline comparisons and ablation experiments are conducted on the DBP15K benchmark dataset, analyzing the impact of various parameters on model performance. Second, performance is compared against the baseline models on the latest KG alignment benchmark dataset, DWY-NB. Finally, the model's applicability and effectiveness in real-world AMKG entity alignment tasks are validated through case studies.

4.1. Experiment settings

4.1.1. Dataset

In order to evaluate the performance of the proposed model, the DBP15K benchmark dataset [44] is first selected as the experimental object in this study. This dataset is derived from DBpedia, a commonly used data resource in the field of entity alignment. DBP15K consists of three sub-datasets: DBP15K_ZH_EN (Chinese to English), DBP15K_JA_EN (Japanese to English), and DBP15K_FR_EN (French to English). Each sub-dataset comprises two similar knowledge graphs and includes 15,000 pre-aligned entity pairs, with each entity featuring multiple attributes. Table 1 shows the detailed statistics of DBP15K. In the model training phase, 30% of the entity pairs serve as the training set, while the remaining 70% constitute the test set. The experimental results of the proposed model are the average of five runs using different data splits to ensure the fairness and reliability of the assessment.
Table 1
The statistics of the DBP15K dataset.
| Dataset | | Entities | Relations | Relation triples | Attribute triples |
| :--- | :--- | :--- | :--- | :--- | :--- |
| DBP15K_ZH_EN | Chinese | 66469 | 2830 | 153929 | 379684 |
| | English | 98125 | 2317 | 237674 | 567755 |
| DBP15K_JA_EN | Japanese | 65744 | 2043 | 164373 | 354619 |
| | English | 95680 | 2096 | 233319 | 497230 |
| DBP15K_FR_EN | French | 66858 | 1379 | 192191 | 528665 |
| | English | 105889 | 2209 | 278590 | 576543 |

4.1.2. Parameter settings

In this study, we standardized the configuration of several crucial hyperparameters. Specifically, we set the output vector dimension of the MEEA model to 100 across all layers. The model's weight parameters are optimized with the AdaGrad algorithm at a learning rate of 0.015. Furthermore, we fixed the dropout rate during training at 0.2 and the number of training epochs at 1500. For the structure, relation, and attribute embedding components, we set the margin parameters to $\lambda_{str}=2.0$, $\lambda_{rel}=1.5$, and $\lambda_{attr}=2.0$. Considering the varying scales among sub-datasets, we adjust the number of layers $l$ in the MGCN network accordingly. For the DBP15K_ZH_EN and DBP15K_JA_EN sub-datasets, which are of similar size, we opt for a 2-layer network structure. However, for the larger DBP15K_FR_EN sub-dataset, we increase the number of network layers to 3 to extract deeper structural information.

4.1.3. Baseline and evaluation metrics

In order to fully evaluate the performance of MEEA, several classical and advanced entity alignment models are selected for comparison in this study. As introduced in the related work, these models can be categorized into three groups. The first category comprises models based on translation representation learning, including RotatE [21], HAKE [22], and ConvE [23]. The second category comprises semantic matching-based models, including MtransE [24], BootEA [25], TransEdge [26], and CAEA [29]. The third category comprises graph neural network-based models, including GCN-Align [31], NAEA [32], GRGCN [37], AliNet [38], and DvGNet [39]. Metric reports for these baselines were obtained using the source code and parameter settings presented in the original papers.
Two metrics, Hits@k and Mean Reciprocal Rank (MRR), are widely employed to evaluate the performance of entity alignment tasks. Hits@k assesses the proportion of correctly matched entities ranked within the top $k$ candidates, and MRR computes the average reciprocal rank of the correctly aligned entities across all recommendation lists. Higher values of both metrics signify better entity alignment performance.

$$\text{Hits@}k=\frac{1}{|T|} \sum_{i=1}^{|T|} \mathbb{1}\left(\operatorname{rank}_{i} \leqslant k\right)$$
$$MRR=\frac{1}{|T|} \sum_{i=1}^{|T|} \frac{1}{\operatorname{rank}_{i}}$$
where $|T|$ is the size of the test set $T$, $\mathbb{1}(\cdot)$ is the indicator function, and $\operatorname{rank}_{i}$ is the rank of the $i$ th entity's true aligned entity among the candidate entities.
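Both metrics are straightforward to compute from the rank of each true counterpart, for example:

```python
def hits_at_k(ranks, k):
    """Fraction of test entities whose true match is ranked within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)

def mrr(ranks):
    """Mean reciprocal rank of the true matches."""
    return sum(1.0 / r for r in ranks) / len(ranks)

ranks = [1, 3, 2, 10]            # rank of the true aligned entity per test case
print(hits_at_k(ranks, 1))       # 0.25
print(mrr(ranks))                # (1 + 1/3 + 1/2 + 1/10) / 4 ≈ 0.4833
```

MRR rewards near-misses more smoothly than Hits@1, which is why the two are usually reported together.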

4.2. Experimental results and analysis

4.2.1. Overall performance

Table 2 displays the aggregated performance results of the different methods on the entity alignment task. The data indicate that MEEA consistently achieves or approaches the highest scores across all evaluation metrics. The translation-based representation learning models interpret entity semantics by mapping inter-entity translations. However, these models often falter when managing complex many-to-many relationships, because this approach struggles to effectively capture and model the diversity and complexity of relationships. In contrast, MEEA employs the TransD mechanism to learn relational features between entities. The model effectively enhances its ability to model complex relationships by introducing a relational feature space and an entity projection strategy. The experimental results demonstrate the significant performance improvement of MEEA over the traditional translation-based representation learning models on the entity alignment task.
In addition, MEEA fully considers attribute information in entity alignment and assigns different attention weights to different attributes. Compared to the CAEA model, which also incorporates attribute embedding, MEEA achieves improvements of 0.176, 0.059, and 0.132 in
Table 2
Overall performance comparison of entity alignment.
| Models | DBP15K_ZH_EN (Hits@1 / Hits@10 / MRR) | DBP15K_JA_EN (Hits@1 / Hits@10 / MRR) | DBP15K_FR_EN (Hits@1 / Hits@10 / MRR) |
| :--- | :--- | :--- | :--- |
| RotatE | 0.453 / 0.781 / 0.577 | 0.446 / 0.771 / 0.558 | 0.433 / 0.751 / 0.541 |
| HAKE | 0.288 / 0.588 / 0.391 | 0.319 / 0.607 / 0.421 | 0.319 / 0.638 / 0.428 |
| ConvE | 0.284 / 0.452 / 0.289 | 0.315 / 0.532 / 0.402 | 0.332 / 0.581 / 0.416 |
| MtransE | 0.367 / 0.702 / 0.403 | 0.315 / 0.656 / 0.449 | 0.325 / 0.648 / 0.425 |
| BootEA | 0.629 / 0.848 / 0.703 | 0.622 / 0.854 / 0.701 | 0.653 / 0.874 / 0.731 |
| TransEdge | *0.653* / **0.907** / 0.745 | 0.638 / 0.867 / 0.712 | 0.639 / 0.851 / 0.738 |
| CAEA | 0.603 / 0.788 / *0.755* | 0.613 / *0.886* / 0.691 | 0.561 / 0.867 / 0.711 |
| GCN-Align | 0.417 / 0.753 / 0.538 | 0.422 / 0.779 / 0.558 | 0.403 / 0.761 / 0.548 |
| NAEA | 0.651 / 0.867 / 0.720 | *0.641* / 0.873 / *0.718* | *0.673* / 0.874 / *0.752* |
| GRGCN | 0.457 / 0.759 / 0.684 | 0.468 / 0.771 / 0.658 | 0.472 / 0.761 / 0.675 |
| AliNet | 0.539 / 0.826 / 0.628 | 0.549 / 0.831 / 0.645 | 0.552 / 0.852 / 0.657 |
| DvGNet | 0.534 / 0.844 / 0.638 | 0.538 / 0.863 / 0.651 | 0.557 / *0.881* / 0.668 |
| MEEA | **0.768** / *0.896* / **0.835** | **0.753** / **0.909** / **0.854** | **0.783** / **0.912** / **0.863** |
Note: Bold are the best results, underlined are the second best results.
Hits@1, Hits@10, and MRR metrics, respectively. As shown in Table 2, NAEA is a GNN-based model with performance metrics superior to those based on translation representation learning. This advantage stems from the ability of GNN to enhance entity representation by aggregating multi-hop neighborhood information. Compared with NAEA, our MEEA model improves by 0.113, 0.034, and 0.121 on Hits@1, Hits@10, and MRR, respectively, averaged over the three datasets. This is due to the proposed multi-scale node aggregation strategy as well as the gating mechanism. In the node aggregation phase, MEEA improves the representation of long-tailed entities by enriching them with extensive neighbor information. Moreover, the incorporation of a gating mechanism allows for an effective integration of the hidden and output layers, preserving essential multi-hop neighbor information and filtering out noise in the output. These strategies markedly enhance the model's accuracy and robustness in entity alignment tasks.
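The three metrics reported throughout this section can be computed directly from the rank each source entity's true counterpart receives in the similarity-sorted candidate list. A minimal sketch (the function name and the toy ranks are illustrative, not from the paper):

```python
def evaluate_alignment(ranks, ks=(1, 10)):
    """Compute Hits@k and MRR from 1-based ranks of the true counterparts.

    ranks[i] is the position of source entity i's correct target entity
    after sorting all candidate targets by similarity (rank 1 = top).
    """
    n = len(ranks)
    hits = {k: sum(r <= k for r in ranks) / n for k in ks}
    mrr = sum(1.0 / r for r in ranks) / n
    return hits, mrr

# Toy example: four test entities whose true matches rank 1, 2, 1, and 12.
hits, mrr = evaluate_alignment([1, 2, 1, 12])
print(hits[1], hits[10], round(mrr, 3))  # 0.5 0.75 0.646
```

Hits@k therefore measures how often the correct entity appears in the top k candidates, while MRR rewards placing it as close to the top as possible.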

4.2.2. Ablation study

To validate the efficacy of each component in MEEA, a series of ablation experiments were conducted in this study. These experiments aimed to dissect the individual contributions of the multi-scale graph convolution, gating mechanism, relation embedding, and attribute embedding by systematically removing them one at a time. As shown in Fig. 7, the results demonstrate that the removal of any module leads to a degradation in MEEA’s performance, thereby affirming the significant role of these modules in enhancing the overall model performance.
In this study, the multi-scale graph convolution is first removed and replaced with a traditional GCN [41], yielding the variant MEEA (w/o MGCN). This variant resulted in an average drop of 0.051 in Hits@1 and 0.047 in Hits@10 across the three datasets. These results indicate that multi-scale graph convolution effectively captures multi-hop neighborhood information, enhancing entity structure embedding performance. Next, we replaced the gating mechanism with a plain network output layer, named MEEA (w/o gate). This change led to an average decrease of 0.026 in Hits@1 and 0.017 in Hits@10, suggesting that the gating mechanism effectively mitigates noise introduced by multilayer convolution, proving its effectiveness in entity alignment. We then removed the relational embedding, resulting in MEEA (w/o rel), which showed an average decline of 0.035 in Hits@1 and 0.051 in Hits@10 across the datasets. This outcome demonstrates that the TransD model efficiently captures semantic relationships among entities and better differentiates entities with complex relationships. Finally, we eliminated the attribute embedding, resulting in MEEA (w/o att), causing an average drop of 0.072 in Hits@1 and 0.086 in Hits@10. This underscores the significance of attribute information in entity alignment. The proposed attribute embedding module, based on the BERT model and attention mechanism, substantially improves entity alignment performance.
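The gating idea ablated above — blending a hidden-layer node representation with the output-layer representation through a learned gate — can be sketched as follows. The layer sizes, random initialization, and single element-wise sigmoid gate are illustrative assumptions for the sketch, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension (illustrative)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Gate parameters (learned in practice; randomly initialized for the sketch).
W_g = rng.normal(scale=0.1, size=(d, d))
b_g = np.zeros(d)

def gated_combine(h_hidden, h_out):
    """Blend hidden- and output-layer node features with an element-wise gate.

    The gate decides, per dimension, how much multi-hop information from the
    hidden layer to keep versus the (possibly noisier) output-layer signal.
    """
    g = sigmoid(h_hidden @ W_g + b_g)        # gate values in (0, 1)
    return g * h_hidden + (1.0 - g) * h_out  # convex combination per dimension

h_hidden = rng.normal(size=d)  # e.g. a 1-hop aggregation result
h_out = rng.normal(size=d)     # e.g. a 2-hop aggregation result
h = gated_combine(h_hidden, h_out)
```

Because the gate output lies in (0, 1), each coordinate of the combined vector interpolates between the two inputs, which is what lets such a mechanism suppress noisy output-layer dimensions without discarding multi-hop information.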

4.2.3. Analysis of parameter sensitivity

To investigate how different embedding weight ratios affect entity alignment, we analyzed five weight combinations, labeled W1 to W5: W1 (0.5, 0.3, 0.2), W2 (0.5, 0.2, 0.3), W3 (0.6, 0.2, 0.2), W4 (0.7, 0.1, 0.2), and W5 (0.6, 0.15, 0.25). These weights correspond to the structure loss $\mathscr{L}_{\text{str}}$, relation loss $\mathscr{L}_{\text{rel}}$, and attribute loss $\mathscr{L}_{\text{att}}$, respectively. The results of these experiments are shown in Table 3.
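The weighted combination described here amounts to a weighted sum of the three per-view losses. A minimal sketch of how the combinations W1-W5 enter the computation (the loss values passed in are placeholders):

```python
# Weight combinations for (structure, relation, attribute) losses.
WEIGHTS = {
    "W1": (0.5, 0.3, 0.2),
    "W2": (0.5, 0.2, 0.3),
    "W3": (0.6, 0.2, 0.2),
    "W4": (0.7, 0.1, 0.2),
    "W5": (0.6, 0.15, 0.25),
}

def combined_loss(l_str, l_rel, l_att, setting="W5"):
    """Weighted sum of the structure, relation, and attribute losses."""
    w_str, w_rel, w_att = WEIGHTS[setting]
    return w_str * l_str + w_rel * l_rel + w_att * l_att

# Placeholder per-view loss values, purely for illustration:
# 0.6 * 1 + 0.15 * 2 + 0.25 * 3 = 1.65
print(combined_loss(1.0, 2.0, 3.0, "W5"))
```

Note that each weight triple sums to 1, so changing a combination redistributes influence among the three views rather than rescaling the total loss.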
Table 3 shows the impact of varying weight combinations on MEEA's performance. Optimal performance is achieved with the weight

Fig. 7. Results of ablation experiments.
Table 3
Experimental results with different weight settings.

| Weight settings | ZH_EN Hits@1 | ZH_EN Hits@10 | JA_EN Hits@1 | JA_EN Hits@10 | FR_EN Hits@1 | FR_EN Hits@10 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| W1: (0.5, 0.3, 0.2) | 0.715 | 0.858 | 0.706 | 0.873 | 0.742 | 0.878 |
| W2: (0.5, 0.2, 0.3) | 0.723 | 0.868 | 0.715 | 0.882 | 0.754 | 0.891 |
| W3: (0.6, 0.2, 0.2) | <u>0.754</u> | <u>0.887</u> | 0.737 | 0.895 | **0.785** | <u>0.903</u> |
| W4: (0.7, 0.1, 0.2) | 0.739 | 0.879 | <u>0.742</u> | **0.912** | 0.762 | 0.895 |
| W5: (0.6, 0.15, 0.25) | **0.768** | **0.896** | **0.753** | <u>0.909</u> | <u>0.783</u> | **0.912** |
Note: Bold are the best results, underlined are the second best results.

configuration W5: (0.6, 0.15, 0.25). This superiority primarily stems from the wealth of structural and semantic information encapsulated within the structural embeddings. Typically, the knowledge graphs slated for alignment demonstrate a degree of similarity in both structure and semantics; consequently, a higher allocation of weight to structural embeddings enhances the accuracy of entity alignment. A comparison of the experimental results of W4 and W5 shows that a higher weight for structural embedding is not always better, and its optimal setting is 0.6. The entity alignment task should not depend entirely on structural information, as subgraphs that describe the same facts may differ somewhat in structure. The inclusion of relationship and attribute information concerning entities is pivotal for entity alignment. Furthermore, the comparisons between W1 and W2, and between W3 and W5, show that entity alignment is more effective when the weight assigned to attribute embedding exceeds that assigned to relationship embedding while the structure embedding weight is held fixed. This shows that the rational use of relation and attribute information in a knowledge graph can effectively improve the entity alignment effect. Consequently, in all subsequent experiments within this study, MEEA adopts the weight combination W5: (0.6, 0.15, 0.25).
In examining the factors that influence the performance of entity alignment models, the ratio of pre-aligned seeds emerges as particularly crucial. Prior research has indicated that a model's performance generally improves with an increase in this ratio. However, the manual production of a substantial quantity of precise entity alignment seeds is both costly and labor-intensive. In this study, we varied the proportion of pre-aligned seeds from 10% to 50%, in increments of 10%. For comparative analysis, we employed the knowledge graph embedding-based model CAEA [29], and the graph neural network-based models GRGCN [37] and DvGNet [39].
Fig. 8 shows the variation of Hits@1 and Hits@10 for the four models on the three datasets. The results show that as the number of pre-aligned entity seeds increases, the models are better able to learn the contextual semantic information of the entities, thus improving their generalization ability. MEEA evaluates entity similarity from three dimensions: structural, relational, and attributive, enabling it to capture the semantic connections between entities more comprehensively. Its performance substantially surpasses that of the three comparative models examined. With 10% pre-aligned seeds, MEEA achieved an average of 0.513 in Hits@1 and 0.708 in Hits@10 across the three datasets. These results suggest that the multi-scale GNN approach more effectively captures essential features in the graph structure, thereby enhancing node feature aggregation. Additionally, the MEEA model significantly outperforms the other models on the Hits@1 metric, though its improvement on the Hits@10 metric is less pronounced. These findings suggest that the MEEA model is more effective at identifying the best match for each entity, likely due to the incorporation of a gating mechanism that enhances the accuracy of entity representations.
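The seed-ratio experiment amounts to repeatedly splitting the gold alignment links into a small training portion (the pre-aligned seeds) and a large test portion. A sketch with a synthetic list of aligned pairs (the entity naming scheme is made up for illustration):

```python
import random

def split_seeds(aligned_pairs, train_ratio, seed=42):
    """Hold out a `train_ratio` fraction of gold entity pairs as seeds."""
    pairs = list(aligned_pairs)
    random.Random(seed).shuffle(pairs)  # deterministic shuffle for the sketch
    cut = int(len(pairs) * train_ratio)
    return pairs[:cut], pairs[cut:]  # (training seeds, test pairs)

# Synthetic gold alignment links between two KGs.
pairs = [(f"kg1:e{i}", f"kg2:e{i}") for i in range(1000)]
for ratio in (0.1, 0.2, 0.3, 0.4, 0.5):
    train, test = split_seeds(pairs, ratio)
    print(ratio, len(train), len(test))
```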

Fig. 8. Experimental results with different pre-aligned seeds.

4.2.4. Comparative analysis of models on the DWY-NB dataset

To comprehensively evaluate the performance of the proposed MEEA model, this study also conducts comparative experiments using the latest KG alignment benchmark dataset, DWY-NB [45]. DWY-NB comprises two sub-datasets, DW-NB and DY-NB, which are derived from DBpedia and Wikidata, and from DBpedia and Yago, respectively. DW-NB includes 50,000 aligned entities, while DY-NB contains 15,000. This dataset is characterized by its non-bijective nature, name variation, and large scale, with specific statistics provided in Table 4. During model training, 50% of the aligned entities are allocated to the training set, with the remainder used for testing. The MEEA model's output vector dimension is set to 100, and a learning rate of 0.01 is applied using the AdaGrad optimizer for 1,000 training iterations. The MGCN network consists of two layers, and the MEEA margin hyperparameters are set to $\lambda_{\text{str}} = 2.0$, $\lambda_{\text{rel}} = 1.5$, and $\lambda_{\text{attr}} = 2.0$.
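Margins of this kind typically enter a margin-based ranking loss, which pushes aligned (positive) pairs at least $\lambda$ closer than sampled negative pairs. A minimal sketch; the distance values and the element-wise pairing of positives with negatives are illustrative assumptions, not the paper's exact sampling scheme:

```python
import numpy as np

def margin_ranking_loss(pos_dist, neg_dist, margin):
    """Hinge loss: penalize negatives that are not at least `margin` farther."""
    return np.maximum(0.0, pos_dist + margin - neg_dist).mean()

# Illustrative distances for three aligned pairs and their sampled negatives.
pos = np.array([0.2, 0.5, 1.0])
neg = np.array([3.0, 1.2, 1.5])
# (0 + 1.3 + 1.5) / 3 ≈ 0.933
print(margin_ranking_loss(pos, neg, margin=2.0))
```

Only pairs that violate the margin contribute to the loss, so a larger $\lambda$ (such as the 2.0 used for the structure and attribute views) demands a wider separation between aligned and non-aligned pairs.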
In this study, several state-of-the-art (SOTA) methods are selected for comparison, each of which learns entity embeddings from structural, relational, or attribute perspectives. The methods include AutoAlign [46], MultiKE [47], AttrE [48], and MRAEA [49]. AutoAlign employs attribute character embeddings and predicate neighborhood graph embeddings, supported by a large language model, to compute a unified vector space for entity and predicate embeddings from two knowledge graphs. Notably, AutoAlign does not require pre-aligned seeds. MultiKE integrates entity embeddings by combining name, relation, and attribute perspectives, and uses cross-graph inference to improve alignment between knowledge graphs. AttrE merges structural and attribute character embeddings while applying transfer rules to enrich entity attributes, facilitating alignment across knowledge graphs. MRAEA models cross-linguistic entity embeddings by focusing on node adjacencies and the meta-semantics of their connectivity, introducing new alignment seeds during training via a bi-directional iterative strategy.
Table 5 presents a performance comparison between MEEA and several state-of-the-art methods on the DWY-NB dataset. The results demonstrate that MEEA surpasses the other models on both sub-datasets. This indicates that MEEA effectively captures the structural, relational, and attribute multidimensional features of entities through multi-view embedding learning, which significantly improves alignment accuracy. AutoAlign is the best-performing baseline model; its performance, although slightly lower than MEEA's, still shows strong alignment capability, highlighting the effectiveness of its attribute character embedding, enhanced by large-scale language models, in the alignment task. MultiKE and AttrE, despite integrating information from different perspectives, perform worse than MEEA, likely because they cannot fully capture the complex interactions among entities during fusion. MRAEA, which relies heavily on adjacency and meta-semantic modeling, performs poorly in cross-language embedding tasks, especially in terms of Hits@1. In summary, MEEA's strengths lie in its effective integration of structural, relational, and attribute semantic information, enhanced structural embedding through multi-scale GNN and gating mechanisms, and improved relation and attribute embeddings via the combination of the TransD and BERT models. These enhancements result in superior accuracy and robustness in complex entity alignment tasks.
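For reference, the TransD projection used for the relational view maps each entity into a relation-specific space before applying the translation constraint $\mathbf{h}_{\perp} + \mathbf{r} \approx \mathbf{t}_{\perp}$. A minimal NumPy sketch with equal entity and relation dimensions; the vectors here are random toys, not learned parameters:

```python
import numpy as np

def transd_project(e, e_p, r_p):
    """TransD projection: map entity embedding e via M = r_p e_p^T + I."""
    M = np.outer(r_p, e_p) + np.eye(len(e))
    return M @ e

def transd_score(h, h_p, t, t_p, r, r_p):
    """Lower is better: distance of the translated head from the tail."""
    h_perp = transd_project(h, h_p, r_p)
    t_perp = transd_project(t, t_p, r_p)
    return np.linalg.norm(h_perp + r - t_perp)

rng = np.random.default_rng(1)
d = 4  # toy dimension
h, h_p, t, t_p, r, r_p = (rng.normal(size=d) for _ in range(6))
print(transd_score(h, h_p, t, t_p, r, r_p))
```

Because the projection matrix depends on both the entity and the relation, the same entity is embedded differently under different relations, which is what lets TransD separate entities involved in 1-N, N-1, and N-N patterns.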
Table 4
The statistics of the DWY-NB dataset.

| Subset | Unique entities | Predicates | Relationship triples | Attribute triples |
| :--- | :--- | :--- | :--- | :--- |
| DW-NB: DBpedia | 84,911 | 545 | 203,502 | 221,591 |
| DW-NB: Wikidata | 86,116 | 703 | 198,797 | 223,232 |
| DY-NB: DBpedia | 58,858 | 211 | 87,676 | 173,520 |
| DY-NB: Yago | 60,228 | 91 | 66,546 | 186,328 |
Table 5
Model performance comparison on the DWY-NB dataset.

| Models | DW-NB Hits@1 | DW-NB Hits@10 | DY-NB Hits@1 | DY-NB Hits@10 |
| :--- | :--- | :--- | :--- | :--- |
| AutoAlign | <u>0.887</u> | <u>0.969</u> | <u>0.913</u> | <u>0.956</u> |
| MultiKE | 0.852 | 0.951 | 0.893 | 0.936 |
| AttrE | 0.880 | 0.958 | 0.904 | 0.942 |
| MRAEA | 0.841 | 0.876 | 0.762 | 0.802 |
| MEEA | **0.893** | **0.976** | **0.921** | **0.964** |
Note: Bold are the best results, underlined are the second best results.

4.3. Case study

Knowledge fusion seeks to align new data effectively with pre-existing datasets within the framework of an established AMKG. The primary challenge in this endeavor is entity alignment, which involves the precise integration of identical nodes to merge heterogeneous aeronautical metrology knowledge graphs from diverse sources. This study verifies the effectiveness of MEEA on a real AMKG through a case study and explores the potential application value of entity alignment technology in the field of aviation metrology. We obtained original materials from an aeronautical metrology institute, including metrological traceability manuals, product design specifications, and relevant documents for metrology standard instruments. Based on our previously proposed method [7], an AMKG containing 5,835 entities, 13,651 relationship triples, and 26,552 attribute triples was constructed. To simulate the incomplete and inconsistent nature of real-world aeronautical metrology data, we generated the AMKG$_{\text{align}}$ dataset for alignment in this study. Derived from the original AMKG, AMKG$_{\text{align}}$ underwent two primary modifications. First, 20% of its relational edges were randomly removed. Second, string noise was introduced into 40% of the attribute values. This noise, implemented through synonym vocabulary substitution, aims to enhance the authenticity and intricacy of the dataset, thus more effectively capturing the complexities inherent in real-world aeronautical metrology data for practical applications.
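The two perturbations used to build AMKG$_{\text{align}}$ — dropping 20% of relation edges and injecting synonym noise into 40% of attribute values — can be sketched as follows. The miniature triples and the synonym table are hypothetical stand-ins for the institute's data and the vocabulary used in the study:

```python
import random

def perturb_kg(rel_triples, attr_triples, synonyms,
               edge_drop=0.2, attr_noise=0.4, seed=0):
    """Randomly remove relation edges and substitute synonyms into attributes."""
    rnd = random.Random(seed)
    # Keep each relation triple with probability 1 - edge_drop.
    kept = [t for t in rel_triples if rnd.random() >= edge_drop]
    noisy = []
    for head, attr, value in attr_triples:
        if rnd.random() < attr_noise:
            value = synonyms.get(value, value)  # synonym substitution noise
        noisy.append((head, attr, value))
    return kept, noisy

# Hypothetical miniature AMKG fragment.
rels = [("laser_tracker", "calibrates", "wing_jig"),
        ("gauge", "traces_to", "standard")]
attrs = [("laser_tracker", "accuracy", "high"),
         ("gauge", "range", "wide")]
kept, noisy = perturb_kg(rels, attrs, synonyms={"high": "elevated"})
```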
To comprehensively assess the effectiveness and superiority of the proposed MEEA on the AMKG dataset, this paper conducts comparative experiments with baseline models. Because source code is unavailable for certain models, leading to poor performance or non-reproducibility, we selected nine SOTA models that have demonstrated exceptional performance on the benchmark datasets for evaluation. These models fall into four categories and outperform the remaining baseline models on the AMKG entity alignment task. Specifically, TransEdge [26] and RAGA [50] focus on relational embedding, NAEA [32] and AliNet [38] emphasize structural embedding, CAEA [29], SDEA [51], and BERT-INT [52] emphasize attribute embedding, and finally, AutoAlign [46] and UPLR [53] do not rely on pre-aligned entity seeds. TransEdge is designed for relational context embedding, NAEA is a neighborhood-aware attention representation model, AliNet is a gated multi-hop neighborhood aggregation model, and CAEA specializes in entity alignment within the civil aviation domain. Detailed descriptions of these models can be found in the related work. RAGA, based on a relation-aware graph attention network, employs a global alignment algorithm to enhance entity alignment. SDEA leverages semantic information in attribute values and implicit associations of neighboring entities for alignment. BERT-INT focuses on neighborhood information, particularly the names and attributes of the current entity and its neighbors, computing their interactions to achieve alignment. AutoAlign leverages large-scale language models to align entities and predicates between knowledge graphs without relying on manually generated seed alignments. The UPLR framework learns high-quality entity embeddings from pseudo-labeled datasets with noisy data, eliminating the need for manual labeling. Unlike AutoAlign and UPLR, the other methods use 30% pre-aligned entity pairs.
The experiments are evaluated using Hits@1, Hits@5, and Hits@10 metrics. Table 6 presents the entity alignment results for each model on AMKG, showing that the
Table 6
Entity alignment results on AMKGs.

| Models | Hits@1 | Hits@5 | Hits@10 |
| :--- | :--- | :--- | :--- |
| TransEdge | 0.589 | 0.749 | 0.835 |
| RAGA | 0.634 | 0.769 | 0.856 |
| NAEA | 0.652 | 0.763 | 0.834 |
| AliNet | 0.623 | 0.794 | 0.866 |
| CAEA | 0.713 | 0.845 | 0.901 |
| SDEA | 0.715 | 0.834 | 0.914 |
| BERT-INT | 0.737 | 0.878 | **0.931** |
| AutoAlign | <u>0.752</u> | <u>0.880</u> | 0.917 |
| UPLR | 0.708 | 0.793 | 0.881 |
| MEEA | **0.764** | **0.881** | <u>0.923</u> |
Note: Bold are the best results, underlined are the second best results.

proposed MEEA outperforms the comparison models, demonstrating its effectiveness in integrating entity information from various perspectives. TransEdge functions as a translation mechanism between entity embeddings via contextual relationship representations, while RAGA incorporates relationship information into entities through a self-attentive mechanism. However, models that consider only relational embeddings, like TransEdge and RAGA, do not perform well in aeronautical metrology entity alignment tasks. Although NAEA and AliNet aggregate neighboring node features and learn graph structure representations using attention mechanisms, their performance improvements are still limited. This indicates that relying solely on structural information is insufficient for aeronautical metrology. Our findings show that integrating attribute embeddings of entities in an attribute-rich AMKG significantly enhances entity alignment performance, as evidenced by the results from CAEA, SDEA, and BERT-INT. These models utilize pre-trained models to capture semantic information from attributes, integrating interactions of relations, attributes, and more to achieve alignment. This underscores the necessity of incorporating attribute embeddings in the proposed MEEA, which comprehensively considers structural, relational, and attribute embeddings and demonstrates superior performance in the comparisons with SOTA models. Additionally, AutoAlign and UPLR achieve Hits@10 scores of 0.917 and 0.881, respectively, which are 0.006 and 0.042 lower than the proposed MEEA, without requiring pre-aligned seeds. This disparity can be attributed to the abundance of homonymous entities in AMKG. Pre-aligned seeds give the model a precise set of initially aligned entities, enabling it to better differentiate entity features during training. Moreover, the 1-N, N-1, and N-N relationships in AMKG further complicate alignment. These pre-aligned seeds, serving as positive examples, not only offer the model correct alignment references but also enhance its ability to learn intricate relationships. In contrast, noisy data from aeronautical metrology can hinder the seedless approach, while pre-aligned seeds provide cleaner data, making the seeded setting more suitable for AMKG's entity alignment tasks, especially when dealing with homonymous entities and complex relationships.
In aeronautical metrology, AMKG serves as a robust knowledge base that enables engineers to conduct metrology efficiently across the entire lifecycle of aeronautical products. The proposed entity alignment method enhances the comprehensiveness of AMKG, thereby providing precise support for knowledge-based applications. Enriching AMKG also facilitates the development of prototype systems for aeronautical metrology, supporting diverse engineering applications. Entity alignment technology standardizes and unifies the expression of heterogeneous metrological data from various sources, ensuring data consistency across departments and establishing a reliable foundation for accurately measuring aeronautical products. Engineers can quickly access essential information by querying AMKG, significantly reducing design iteration cycles for new aeronautical products. Moreover, in the aviation manufacturing industry, adherence to domestic and international metrological traceability standards is stringent. Knowledge fusion technology effectively integrates these traceability systems, employing intelligent matching and reasoning capabilities to help engineers establish a comprehensive metrological traceability chain for aeronautical products, thereby enhancing both product quality and measurability.

5. Conclusion

Multi-source heterogeneous data in aeronautical metrology suffer from semantic ambiguity and redundancy, often leading to isolated knowledge clusters within the knowledge graph that hinder the effective sharing and reuse of knowledge. To address this challenge, our study presents an innovative entity alignment model based on multi-perspective embedding. This model derives entity embeddings from three distinct perspectives: structural, relational, and attributive. By computing weighted averages of entity embeddings from each perspective, we identify the target entity most similar to the source entity. This approach facilitates entity alignment, thereby enhancing the quality of the aeronautical metrology knowledge graph. In performance evaluation, our model demonstrates superior performance compared to existing entity alignment methods utilizing knowledge graph embedding and graph neural networks across five benchmark datasets and a specialized dataset in aeronautical metrology. Furthermore, ablation experiments validate the effectiveness of each design module. Future research will explore diverse applications of aeronautical metrology knowledge graphs to better support aeronautical product design and metrology processes.

CRediT authorship contribution statement

Shengjie Kong: Writing - review & editing, Writing - original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Xiang Huang: Writing - review & editing, Supervision, Resources, Project administration, Funding acquisition. Shuanggao Li: Supervision, Project administration. Gen Li: Validation, Software, Investigation. Dong Zhang: Visualization, Software.
孔胜杰:撰写 - 审阅与编辑,撰写 - 原始稿件,视觉化,验证,软件,方法学,调查,正式分析,数据管理,概念化。黄翔:撰写审阅与编辑,监督,资源,项目管理,资金获取。李双高:监督,项目管理。李根:验证,软件,调查。张东:视觉化,软件。

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors gratefully acknowledge the insightful comments by the editors and the anonymous referees.

Data availability

The data that has been used is confidential.

References

[1] Y. Wang, Y. Liu, H. Chen, et al., Combined measurement based wing-fuselage assembly coordination via multiconstraint optimization[J], IEEE Transactions on Instrumentation and Measurement 71 (2022) 1-16.

[2] Y. Gai, J. Zhang, J. Guo, et al., Construction and uncertainty evaluation of largescale measurement system of laser trackers in aircraft assembly[J], Measurement 165 (2020) 108144.

[3] K. Miličević, L. Omrčen, M. Kohler, et al., Trust model concept for IoT blockchain applications as part of the digital transformation of metrology[J], Sensors 22 (13) (2022) 4708.

[4] N. Takegawa, N. Furuichi, Traceability Management System Using Blockchain Technology and Cost Estimation in the Metrology Field[J], Sensors 23 (3) (2023) 1673.

[5] J. Li, Y. Horiguchi, T. Sawaragi, Counterfactual inference to predict causal knowledge graph for relational transfer learning by assimilating expert knowledge-Relational feature transfer learning algorithm[J], Advanced Engineering Informatics 51 (2022) 101516.

[6] P. Westphal, T. Grubenmann, D. Collarana, et al., Spatial concept learning and inference on geospatial polygon data[J], Knowledge-Based Systems 241 (2022) 108233.

[7] S. Kong, X. Huang, X. Zhong, et al., Entity recognition method for airborne products metrological traceability knowledge graph construction[J], Measurement 225 (2024) 114032.

[8] A. Smirnov, T. Levashova, Knowledge fusion patterns: A survey[J], Information Fusion 52 (2019) 31-40.

[9] X. Shen, X. Li, B. Zhou, et al., Dynamic knowledge modeling and fusion method for custom apparel production process based on knowledge graph[J], Advanced Engineering Informatics 55 (2023) 101880.

[10] J. Li, D. Song, H. Wang, et al., Entity alignment for temporal knowledge graphs via adaptive graph networks[J], Knowledge-Based Systems 274 (2023) 110631.

[11] B. Zhou, B. Hua, X. Gu, et al., An end-to-end tabular information-oriented causality event evolutionary knowledge graph for manufacturing documents[J], Advanced Engineering Informatics 50 (2021) 101441.

[12] L. Bai, N. Li, G. Li, et al., Embedding-Based Entity Alignment of Cross-Lingual Temporal Knowledge Graphs[J], Neural Networks 172 (2024) 106143.

[13] H. Huang, C. Li, X. Peng, et al., Cross-knowledge-graph entity alignment via relation prediction[J], Knowledge-Based Systems 240 (2022) 107813.

[14] B. Zhu, T. Bao, L. Liu, et al., Cross-lingual knowledge graph entity alignment based on relation awareness and attribute involvement[J], Applied Intelligence 53 (6) (2023) 6159-6177.

[15] E.S. Ristad, P.N. Yianilos, Learning string-edit distance[J], IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (5) (1998) 522-532.

[16] Pershina M, Yakout M, Chakrabarti K. Holistic entity matching across knowledge graphs[C]//2015 IEEE International Conference on Big Data (Big Data). IEEE, 2015: 1585-1590.

[17] Scharffe F, Liu Y, Zhou C. Rdf-ai: an architecture for rdf datasets matching, fusion and interlink[C]//Proc. IJCAI 2009 workshop on Identity, reference, and knowledge representation (IR-KR), Pasadena (CA US). 2009: 23.

[18] E. Rivas, S.R. Eddy, A dynamic programming algorithm for RNA structure prediction including pseudoknots[J], Journal of Molecular Biology 285 (5) (1999) 2053-2068.

[19] T.K. Moon, The expectation-maximization algorithm[J], IEEE Signal Processing Magazine 13 (6) (1996) 47-60.

[20] A. Bordes, N. Usunier, A. Garcia-Duran, et al., Translating embeddings for modeling multi-relational data[J], Advances in Neural Information Processing Systems 26 (2013).

[21] Sun Z, Deng Z H, Nie J Y, et al. RotatE: Knowledge graph embedding by relational rotation in complex space[J]. arXiv preprint arXiv:1902.10197, 2019.

[22] Z. Zhang, J. Cai, Y. Zhang, et al., Learning Hierarchy-Aware Knowledge Graph Embeddings for Link Prediction[C]//Proceedings of the AAAI Conference on Artificial Intelligence 34 (03) (2020) 3065-3072.

[23] T. Dettmers, P. Minervini, P. Stenetorp, et al., Convolutional 2D Knowledge Graph Embeddings[C]//Proceedings of the AAAI Conference on Artificial Intelligence 32 (1) (2018).

[24] Chen M, Tian Y, Yang M, et al. Multilingual knowledge graph embeddings for cross-lingual knowledge alignment[J]. arXiv preprint arXiv:1611.03954, 2016.

[25] Z. Sun, W. Hu, Q. Zhang, et al., Bootstrapping Entity Alignment with Knowledge Graph Embedding[C]//IJCAI, 2018.

[26] Sun Z, Huang J, Hu W, et al. TransEdge: Translating relation-contextualized embeddings for knowledge graphs[C]//The Semantic Web-ISWC 2019: 18th International Semantic Web Conference, Auckland, New Zealand, October 26-30, 2019, Proceedings, Part I. Springer International Publishing, 2019: 612-629.

[27] B. Zhu, R. Wang, J. Wang, et al., A survey: knowledge graph entity alignment research based on graph embedding[J], Artificial Intelligence Review 57 (9) (2024) 1-58.

[28] X. Tian, Z. Sun, W. Hu, Generating Explanations to Understand and Repair Embedding-based Entity Alignment[C]//2024 IEEE 40th International Conference on Data Engineering (ICDE), 2024, pp. 2205-2217.

[29] D. Guo, J. Wang, A. Fu, Research on Entity Alignment Method in Civil Aviation Equipment Domain Based on Translation Embedding[C], 2023, pp. 204-214.

[30] J. Wang, J. Qu, Z. Zhao, et al., SMAAMA: A named entity alignment method based on Siamese network character feature and multi-attribute importance feature for Chinese civil aviation[J], Journal of King Saud University-Computer and Information Sciences 35 (10) (2023) 101856.

[31] Z. Wang, Q. Lv, X. Lan, et al., Cross-Lingual Knowledge Graph Alignment via Graph Convolutional Networks[C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 349-357.

[32] Q. Zhu, X. Zhou, J. Wu, et al., Neighborhood-Aware Attentional Representation for Multilingual Knowledge Graphs[C]//IJCAI, 2019, pp. 1943-1949.

[33] J. Chen, L. Yang, Z. Wang, et al., Higher-order GNN with Local Inflation for entity alignment[J], Knowledge-Based Systems 293 (2024) 111634.

[34] L. Song, J. Liu, M. Sun, et al., Weakly supervised group mask network for object detection[J], International Journal of Computer Vision 129 (3) (2021) 681-702.

[35] X. Zeng, P. Wang, Y. Mao, et al., MultiEM: Efficient and Effective Unsupervised Multi-Table Entity Matching[C], 2024, pp. 3421-3434.

[36] C. Ge, P. Wang, L. Chen, et al., CollaborEM: A self-supervised entity matching framework using multi-features collaboration[J], IEEE Transactions on Knowledge and Data Engineering 35 (12) (2021) 12139-12152.

[37] Z. Zhao, S. Lin, A cross-linguistic entity alignment method based on graph convolutional neural network and graph attention network[J], Computing (2023) 1-18.

[38] Z. Sun, C. Wang, W. Hu, et al., Knowledge Graph Alignment Network with Gated Multi-Hop Neighborhood Aggregation[C]//Proceedings of the AAAI Conference on Artificial Intelligence 34 (01) (2020) 222-229.

[39] L. Li, J. Dong, X. Qin, Dual-view graph neural network with gating mechanism for entity alignment[J], Applied Intelligence (2023) 1-16.

[40] Ji G, He S, Xu L, et al. Knowledge graph embedding via dynamic mapping matrix [C]//Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (volume 1: Long papers). 2015: 687-696.

[41] Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks[J]. arXiv preprint arXiv:1609.02907, 2016.

[42] Veličković P, Cucurull G, Casanova A, et al. Graph attention networks[J]. arXiv preprint arXiv:1710.10903, 2017.

[43] Devlin J, Chang M W, Lee K, et al. Bert: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018.

[44] Sun Z, Hu W, Li C. Cross-lingual entity alignment via joint attribute-preserving embedding[C]//The Semantic Web-ISWC 2017: 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part I 16. Springer International Publishing, 2017: 628-644.

[45] R. Zhang, B.D. Trisedya, M. Li, et al., A benchmark and comprehensive survey on knowledge graph entity alignment via representation learning[J], The VLDB Journal 31 (5) (2022) 1143-1168.

[46] R. Zhang, Y. Su, B.D. Trisedya, et al., Autoalign: fully automatic and effective knowledge graph alignment enabled by large language models[J], IEEE Transactions on Knowledge and Data Engineering (2023).

[47] Zhang Q, Sun Z, Hu W, et al. Multi-view knowledge graph embedding for entity alignment[J]. arXiv preprint arXiv:1906.02390, 2019.

[48] B.D. Trisedya, J. Qi, R. Zhang, Entity Alignment between Knowledge Graphs Using Attribute Embeddings[C]//Proceedings of the AAAI Conference on Artificial Intelligence 33 (01) (2019) 297-304.

[49] Mao X, Wang W, Xu H, et al. MRAEA: an efficient and robust entity alignment approach for cross-lingual knowledge graph[C]//Proceedings of the 13th International Conference on Web Search and Data Mining. 2020: 420-428.

[50] R. Zhu, M. Ma, P. Wang, RAGA: relation-aware graph attention networks for global entity alignment[C]//Pacific-Asia conference on knowledge discovery and data mining, Springer International Publishing, Cham, 2021, pp. 501-513.

[51] Z. Zhong, M. Zhang, J. Fan, et al., Semantics Driven Embedding Learning for Effective Entity Alignment[C]//2022 IEEE 38th International Conference on Data Engineering (ICDE), 2022, pp. 2127-2140.

[52] Tang X, Zhang J, Chen B, et al. BERT-INT: A BERT-based interaction model for knowledge graph alignment[C]//IJCAI, 2020.

[53] J. Li, D. Song, Uncertainty-aware pseudo label refinery for entity alignment[C]//Proceedings of the ACM Web Conference, 2022, pp. 829-837.

• Corresponding author.
E-mail address: xhuang@nuaa.edu.cn (X. Huang).