Research on Entity Alignment in Chinese Materia Medica Knowledge Graphs for Knowledge Fusion
He Li1, *, Wenshi Shao1, Jiayu Liu1, Jinyuan Zhang1, Wang Shen1, Guimin Wang2
Journal of Modern Information | 2026, 46(3): 30-43
DATA INTELLIGENCE and KNOWLEDGE SERVICE
Affiliations
  • 1 School of Business and Management, Jilin University, Changchun 130012, China
  • 2 Jilin Provincial Library, Changchun 130022, China
Published: 2026-03-01 doi: 10.3969/j.issn.1008-0821.2026.03.003
Purpose/Significance

The digital transformation of Chinese Materia Medica (CMM) classics is critical for bridging ancient pharmacological wisdom with modern drug discovery. However, existing Knowledge Graphs (KGs) for CMM are often constructed in isolation, resulting in fragmented information silos that hinder data interoperability. While Entity Alignment (EA) has become a focal point in the international Semantic Web community, research specifically targeting the alignment of ancient CMM literature remains scarce. Moreover, current state-of-the-art models, designed primarily for modern, high-resource languages, struggle with the unique challenges of ancient Chinese texts: severe structural heterogeneity caused by disparate historical writing styles, high terminological ambiguity where distinct medical concepts share similar characters, and a critical scarcity of high-quality annotated datasets. This study aims to fill this gap by proposing a domain-specific deep learning framework that automates the fusion of multi-source historical medical knowledge.

Method/Process

To overcome these barriers, this paper proposes the Generative Adversarial Fuzzy-boundary Learning (GAFL-Align) model. The study uses two representative classics from different historical eras: Shennong Bencao Jing and Tangye Bencao. After data cleaning, the datasets comprise 3,771 and 3,910 normalized entities, respectively, focusing on core categories such as herbs, symptoms, and diseases. The architecture integrates BERT for deep semantic encoding with Graph Attention Networks (GAT) to capture topological structure. To handle distribution shifts across heterogeneous texts, the model employs a Generative Adversarial Network (GAN) for domain adaptation, mapping entities from both sources into a unified feature space. Furthermore, a novel fuzzy-boundary negative sampling strategy is developed to distinguish "hard negatives", that is, terms with high lexical similarity but distinct medical meanings. To address data scarcity, an iterative self-training mechanism with confidence-aware filtering augments the training set from a limited number of expert-annotated seed pairs.
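The abstract does not spell out how hard negatives are selected or how pseudo-labels are filtered, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes entities are already embedded as plain vectors; the similarity band (`low`, `high`), the mutual-nearest-neighbour rule, the `threshold` value, and all entity ids are hypothetical choices made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def hard_negatives(anchor, positive_id, candidates, low=0.5, high=0.95):
    """Return candidate ids whose similarity to the anchor lies in [low, high].

    Assumed band rule: these candidates resemble the anchor but are not its
    aligned counterpart, so they serve as informative ("hard") negatives.
    """
    picked = []
    for cand_id, vec in candidates.items():
        if cand_id == positive_id:
            continue  # never sample the true match as a negative
        if low <= cosine(anchor, vec) <= high:
            picked.append(cand_id)
    return picked

def confident_pairs(source, target, threshold=0.9):
    """Pseudo-label pairs that are mutual nearest neighbours above a threshold.

    Assumed confidence filter for one self-training round.
    """
    def nearest(vec, pool):
        return max(pool, key=lambda key: cosine(vec, pool[key]))

    pairs = []
    for s_id, s_vec in source.items():
        t_id = nearest(s_vec, target)
        is_mutual = nearest(target[t_id], source) == s_id
        if is_mutual and cosine(s_vec, target[t_id]) >= threshold:
            pairs.append((s_id, t_id))
    return pairs

# Toy embeddings with hypothetical entity ids.
source = {"s1": [1.0, 0.0], "s2": [0.0, 1.0]}
target = {"t1": [0.99, 0.1], "t2": [0.8, 0.6], "t3": [0.0, -1.0]}

negatives = hard_negatives(source["s1"], "t1", target)
new_pairs = confident_pairs(source, target)
```

In a self-training loop, pairs returned by `confident_pairs` would be added to the seed set before the next training round, while `hard_negatives` would supply the contrastive examples for each anchor.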

Result/Conclusion

Experimental results indicate that GAFL-Align achieves a Hits@1 score of 83.59%, significantly outperforming nine baselines, including translation-based models, GNN variants, and Large Language Model (LLM)-augmented approaches such as ChatEA. The model constructs a fused KG containing 6,826 entities, merging heterogeneous data while preserving source-specific attributes. These findings suggest that combining adversarial domain adaptation with fine-grained semantic differentiation is better suited to low-resource historical knowledge fusion than generic LLMs. This research provides a technical foundation for the intelligent organization of CMM heritage, with implications for digital humanities and the global standardization of traditional medicine data.
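Hits@1 is the share of test entities whose correct counterpart is ranked first among all candidates. As a small illustration of how the metric is computed, the sketch below uses hypothetical entity ids and rankings; it does not reproduce the paper's evaluation data.

```python
def hits_at_k(ranked_candidates, gold, k=1):
    """Fraction of source entities whose gold target appears in the top-k ranking."""
    if not gold:
        return 0.0
    hits = sum(
        1
        for src, tgt in gold.items()
        if tgt in ranked_candidates.get(src, [])[:k]
    )
    return hits / len(gold)

# Hypothetical rankings: each source entity maps to targets sorted by score.
ranked = {"a": ["x", "y"], "b": ["y", "z"], "c": ["x", "z"]}
gold = {"a": "x", "b": "z", "c": "z"}

hits1 = hits_at_k(ranked, gold, k=1)  # only "a" has its match ranked first
hits2 = hits_at_k(ranked, gold, k=2)  # all three matches fall in the top two
```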

knowledge fusion  /  entity alignment  /  Chinese Materia Medica classics  /  knowledge graph  /  deep learning
He Li, Wenshi Shao, Jiayu Liu, Jinyuan Zhang, Wang Shen, Guimin Wang. Research on Entity Alignment in Chinese Materia Medica Knowledge Graphs for Knowledge Fusion[J]. Journal of Modern Information, 2026, 46(3): 30-43. DOI: 10.3969/j.issn.1008-0821.2026.03.003
Article Info
  • Received: 2025-12-11
  • Online: 2026-06-05
https://castjournals.cast.org.cn/joweb/xdqb/EN/10.3969/j.issn.1008-0821.2026.03.003