Article(id=1149776907639681095, tenantId=1146029695717560320, journalId=1146123166801305609, issueId=1149776900194791454, articleNumber=null, orderNo=null, doi=10.12404/j.issn.1671-1815.2403519, pmid=null, cstr=null, oa=null, hot=null, price=null, onlineType=0, articleFormat=0, articleType=null, articleTypeStr=research-article, receivedDate=1715529600000, receivedDateStr=2024-05-13, revisedDate=1722441600000, revisedDateStr=2024-08-01, acceptedDate=null, acceptedDateStr=null, onlineDate=1752057776603, onlineDateStr=2025-07-09, pubDate=1744905600000, pubDateStr=2025-04-18, doiRegisterDate=null, doiRegisterDateStr=null, onlineIssueDate=1752057776603, onlineIssueDateStr=2025-07-09, onlineJustAcceptDate=null, onlineJustAcceptDateStr=null, onlineFirstDate=null, onlineFirstDateStr=null, sourceXml=null, magXml=null, createTime=1752057776603, creator=13701087609, updateTime=1752057776603, updator=13701087609, issue=Issue{id=1149776900194791454, tenantId=1146029695717560320, journalId=1146123166801305609, year='2025', volume='25', issue='11', pageStart='4397', pageEnd='4826', issueExtLink='null', onlineDate='null', pubDate='null', beforeIssueId=null, nextIssueId=null, price=null, status=1, issueComplete=1, articleOrder=1, issueType=-1, specialIssue=0, createTime=1752057774827, creator=13701087609, updateTime=1768456666677, updator=13701087609, preIssue=null, nextIssue=null, ext={EN=IssueExt(id=1218558837930512931, tenantId=1146029695717560320, journalId=1146123166801305609, issueId=1149776900194791454, language=EN, specialIssueTitle=, coverIllustrator=, specialIssueEditor=, specialIssueAbout=), CN=IssueExt(id=1218558837930512932, tenantId=1146029695717560320, journalId=1146123166801305609, issueId=1149776900194791454, language=CN, specialIssueTitle=, coverIllustrator=, specialIssueEditor=, specialIssueAbout=)}, issueFiles=null}, startPage=4656, endPage=4665, ext={EN=ArticleExt(id=1149776907941670985, articleId=1149776907639681095, tenantId=1146029695717560320, 
journalId=1146123166801305609, language=EN, title=Named Entity Recognition for Chinese Electronic Medical Records Using MacBERT and Global Pointer Network, columnId=1156262729162810294, journalTitle=Science Technology and Engineering, columnName=Papers·Automation and Computational Technology, runingTitle=null, highlight=null, articleAbstract=

Existing sequence labeling approaches cannot effectively recognize nested entities in Chinese electronic medical records (EMRs). To address this limitation, a named entity recognition model integrating MacBERT and a global pointer network was proposed. First, the MacBERT-large pre-trained model converted the text into context-aware dynamic vectors. Next, the fast gradient method (FGM) was used to generate adversarial samples, which were added to the original vectors and fed together into a BiLSTM (bi-directional long short-term memory) network to capture contextual features. An attention mechanism was then introduced to strengthen the capture of long-distance semantic features. Finally, a global pointer network decoded entities by jointly considering head and tail feature information, yielding better predictions for nested medical entities. Experimental results show that, compared with the global pointer network, a mainstream model with strong recognition performance, the proposed model improves the F1-score by 1.8%, 1.37%, and 1.72% on the CCKS2019 dataset and on two versions of the CMeEE Chinese EMR dataset, respectively, which validates the effectiveness of the proposed approach.
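The global pointer decoding step is what lets the model output nested entities. The pure-Python sketch below shows global-pointer-style span scoring under several assumptions. It handles a single entity type. It uses plain dot-product scores, omitting the rotary position encoding that GlobalPointer implementations commonly add. It applies a decision threshold of 0. It uses hand-picked toy vectors in place of trained MacBERT/BiLSTM/attention outputs. All function names are hypothetical and this is not the authors' code.

```python
# Sketch under stated assumptions: one entity type, no position encoding,
# threshold 0, toy weights. Function names are hypothetical.
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (d_out x d_in) matrix by a length-d_in vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def global_pointer_scores(hidden: List[Vector], w_q: Matrix, w_k: Matrix) -> List[List[float]]:
    """Score every candidate span (i, j) for ONE entity type as s(i, j) = q_i . k_j.

    q_i = w_q h_i is the "head" projection of token i and k_j = w_k h_j is the
    "tail" projection of token j. Spans with i > j are impossible and are
    masked with -inf.
    """
    q = [matvec(w_q, h) for h in hidden]
    k = [matvec(w_k, h) for h in hidden]
    n = len(hidden)
    return [
        [dot(q[i], k[j]) if i <= j else float("-inf") for j in range(n)]
        for i in range(n)
    ]


def decode_spans(scores: List[List[float]], threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Return every (start, end) span whose score exceeds `threshold`.

    Each span is judged independently, so overlapping and nested spans can all
    be returned, whereas a BIO tag sequence gives each token a single label.
    """
    n = len(scores)
    return [(i, j) for i in range(n) for j in range(i, n) if scores[i][j] > threshold]


if __name__ == "__main__":
    # Toy hidden states h_i = [head_feature, tail_feature, 1.0] for 4 tokens.
    hidden = [[1.0, -2.0, 1.0], [1.0, -2.0, 1.0], [-2.0, 1.0, 1.0], [-2.0, -2.0, 1.0]]
    w_q = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # q_i = [head_i, 1]
    w_k = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]  # k_j = [1, tail_j]
    scores = global_pointer_scores(hidden, w_q, w_k)
    print(decode_spans(scores))  # [(0, 2), (1, 2)]: span (1, 2) is nested in (0, 2)
```

Because the decoder fills an n x n score matrix per entity type instead of emitting one tag per token, its cost grows quadratically with sentence length. That is the trade-off for being able to return nested spans.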

, correspAuthors=Dong-dong GUO, authorNote=null, correspAuthorsNote=null, copyrightStatement=null, copyrightOwner=null, extLink=null, articleAbsUrl=null, sourceXml=null, magXml=null, pdfUrl=null, pdf=null, pdfFileSize=null, pdfExtLink=null, richHtmlUrl=null, mobilePdfUrl=null, reviewReport=null, pdfFirstPage=null, abstractGraph=null, abstractGraphContent=null, abstractVideo=null, citation=null, cebUrl=null, magXmlContent=null, mapNumber=null, authorCompany=null, fund=null, authors=null, authorsList=Tian-yu WU, Dong-dong GUO, Wen-qiao LI, Zi-kang LI, Lin MIAO), CN=ArticleExt(id=1149776919991906574, articleId=1149776907639681095, tenantId=1146029695717560320, journalId=1146123166801305609, language=CN, title=基于MacBERT与全局指针网络的中文电子病历命名实体识别, columnId=1156262729783567290, journalTitle=科学技术与工程, columnName=论文·自动化技术、计算机技术, runingTitle=null, highlight=null, articleAbstract=

针对现有序列标注方法不能有效解决中文电子病历嵌套实体识别问题,提出一种基于MacBERT与全局指针网络的中文电子病历命名实体识别模型。首先通过MacBERT-large预训练模型将文本转换为结合语境信息的动态向量,然后使用FGM (fast gradient method)方法生成对抗样本添加至原有向量并一同输入BiLSTM (bi-directional long short-term memory)网络获取上下文特征,并通过引入注意力机制增强长距离语义特征获取,最后利用全局指针网络模型同时考虑头部和尾部的特征信息进行解码以获得更好的医学嵌套实体预测效果。实验结果表明,本文模型相较于识别效果较好的主流模型全局指针网络模型在CCKS2019以及两个版本的CMeEE中文电子病历数据集上F1分别提高了1.8%、1.37%、1.72%,证明了模型的有效性。
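The FGM step in the abstract perturbs the input vectors along the normalized loss gradient before they enter the BiLSTM. The pure-Python sketch below shows that perturbation, assuming the usual L2-normalized form r_adv = epsilon * g / ||g||_2 and an epsilon of 1.0. A hypothetical quadratic toy loss with an analytic gradient stands in for the real NER loss. The abstract does not state epsilon or exactly which tensor is perturbed, so those choices are assumptions.

```python
# Sketch under stated assumptions: L2-normalized FGM, epsilon = 1.0, and a
# hypothetical quadratic toy loss instead of the real NER loss.
import math
from typing import List

Vector = List[float]


def fgm_perturbation(grad: Vector, epsilon: float = 1.0) -> Vector:
    """Fast gradient method: r_adv = epsilon * g / ||g||_2.

    Returns a zero vector when the gradient is zero (no direction to move in).
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


def add_perturbation(embedding: Vector, r_adv: Vector) -> Vector:
    """Adversarial sample = original embedding + perturbation."""
    return [e + r for e, r in zip(embedding, r_adv)]


# Hypothetical toy loss L(e) = 0.5 * ||e - t||^2 and its gradient e - t,
# standing in for the NER loss back-propagated to the MacBERT output vectors.
def toy_loss(embedding: Vector, target: Vector) -> float:
    return 0.5 * sum((e - t) ** 2 for e, t in zip(embedding, target))


def toy_loss_grad(embedding: Vector, target: Vector) -> Vector:
    return [e - t for e, t in zip(embedding, target)]


if __name__ == "__main__":
    emb, target = [1.0, 2.0], [-2.0, -2.0]
    r_adv = fgm_perturbation(toy_loss_grad(emb, target), epsilon=1.0)
    adv = add_perturbation(emb, r_adv)
    # The step follows the gradient direction, so the loss on `adv` is higher.
    print(toy_loss(emb, target), toy_loss(adv, target))
```

In a full training loop, the loss on the perturbed vectors would typically be back-propagated along with the clean loss. The original vectors are then restored before the optimizer step. That loop is omitted here.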

, correspAuthors=郭冬冬, authorNote=null, correspAuthorsNote=
* Dong-dong GUO (郭冬冬, born 1990), male, Han ethnicity, native of Datong, Shanxi, Ph.D. Research interest: natural language processing. E-mail:
, copyrightStatement=null, copyrightOwner=null, extLink=null, articleAbsUrl=null, sourceXml=Wfwb8tgS+LVTuaAyGH/B9A==, magXml=UCGnBxsLHveqGdc80wbnxQ==, pdfUrl=null, pdf=2Wo1F+lIMgGV5POaye5Srg==, pdfFileSize=4178899, pdfExtLink=null, richHtmlUrl=null, mobilePdfUrl=null, reviewReport=null, pdfFirstPage=null, abstractGraph=dxGM8IRkYXDUlarDZElrgg==, abstractGraphContent=null, abstractVideo=null, citation=null, cebUrl=null, magXmlContent=q2mMWe8RfV+bBHzEYsl5ug==, mapNumber=null, authorCompany=null, fund=null, authors=

Tian-yu WU (吴天宇, born 2000), male, Han ethnicity, native of Fengtai, Beijing, master's student. Research interest: knowledge graphs. E-mail:

, authorsList=吴天宇, 郭冬冬, 李文桥, 李子康, 苗琳)}, authors=[Author(id=1218843903353082313, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, orderNo=0, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=767101839@qq.com, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1218843903457939921, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, authorId=1218843903353082313, language=EN, stringName=Tian-yu WU, firstName=Tian-yu, middleName=null, lastName=WU, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=Computer School, Beijing Information Science and Technology University, Beijing 100101, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1218843903655072215, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, authorId=1218843903353082313, language=CN, stringName=吴天宇, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=北京信息科技大学计算机学院, 北京 100101, bio={"content":"

Tian-yu WU (吴天宇, born 2000), male, Han ethnicity, native of Fengtai, Beijing, master's student. Research interest: knowledge graphs. E-mail:

"}, bioImg=null, bioContent=

Tian-yu WU (吴天宇, born 2000), male, Han ethnicity, native of Fengtai, Beijing, master's student. Research interest: knowledge graphs. E-mail:

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1218843903235641793, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, xref=null, ext=[AuthorCompanyExt(id=1218843903244030401, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, companyId=1218843903235641793, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=Computer School, Beijing Information Science and Technology University, Beijing 100101, China), AuthorCompanyExt(id=1218843903256613315, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, companyId=1218843903235641793, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=北京信息科技大学计算机学院, 北京 100101)])]), Author(id=1218843903814455780, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, orderNo=1, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=dongdongguo@bistu.edu.cn, emailSecond=null, emailThird=null, correspondingAuthor=1, authorType=1, ext={EN=AuthorExt(id=1218843903961256432, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, authorId=1218843903814455780, language=EN, stringName=Dong-dong GUO, firstName=Dong-dong, middleName=null, lastName=GUO, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=*, address=Computer School, Beijing Information Science and Technology University, Beijing 100101, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1218843904066114043, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, authorId=1218843903814455780, language=CN, stringName=郭冬冬, firstName=null, middleName=null, lastName=null, 
prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=*, address=北京信息科技大学计算机学院, 北京 100101, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1218843903235641793, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, xref=null, ext=[AuthorCompanyExt(id=1218843903244030401, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, companyId=1218843903235641793, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=Computer School, Beijing Information Science and Technology University, Beijing 100101, China), AuthorCompanyExt(id=1218843903256613315, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, companyId=1218843903235641793, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=北京信息科技大学计算机学院, 北京 100101)])]), Author(id=1218843904175165958, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, orderNo=2, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1218843904359715348, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, authorId=1218843904175165958, language=EN, stringName=Wen-qiao LI, firstName=Wen-qiao, middleName=null, lastName=LI, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=Computer School, Beijing Information Science and Technology University, Beijing 100101, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1218843904485544479, tenantId=1146029695717560320, 
journalId=1146123166801305609, articleId=1149776907639681095, authorId=1218843904175165958, language=CN, stringName=李文桥, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=北京信息科技大学计算机学院, 北京 100101, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1218843903235641793, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, xref=null, ext=[AuthorCompanyExt(id=1218843903244030401, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, companyId=1218843903235641793, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=Computer School, Beijing Information Science and Technology University, Beijing 100101, China), AuthorCompanyExt(id=1218843903256613315, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, companyId=1218843903235641793, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=北京信息科技大学计算机学院, 北京 100101)])]), Author(id=1218843904569430569, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, orderNo=3, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1218843904682676790, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, authorId=1218843904569430569, language=EN, stringName=Zi-kang LI, firstName=Zi-kang, middleName=null, lastName=LI, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=Computer School, Beijing Information Science and Technology 
University, Beijing 100101, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1218843904825283139, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, authorId=1218843904569430569, language=CN, stringName=李子康, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=北京信息科技大学计算机学院, 北京 100101, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1218843903235641793, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, xref=null, ext=[AuthorCompanyExt(id=1218843903244030401, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, companyId=1218843903235641793, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=Computer School, Beijing Information Science and Technology University, Beijing 100101, China), AuthorCompanyExt(id=1218843903256613315, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, companyId=1218843903235641793, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=北京信息科技大学计算机学院, 北京 100101)])]), Author(id=1218843904946917970, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, orderNo=4, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1218843905047581275, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, authorId=1218843904946917970, language=EN, stringName=Lin MIAO, firstName=Lin, middleName=null, lastName=MIAO, prefix=null, 
suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=Computer School, Beijing Information Science and Technology University, Beijing 100101, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1218843905173410412, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, authorId=1218843904946917970, language=CN, stringName=苗琳, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=北京信息科技大学计算机学院, 北京 100101, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1218843903235641793, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, xref=null, ext=[AuthorCompanyExt(id=1218843903244030401, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, companyId=1218843903235641793, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=Computer School, Beijing Information Science and Technology University, Beijing 100101, China), AuthorCompanyExt(id=1218843903256613315, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, companyId=1218843903235641793, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=北京信息科技大学计算机学院, 北京 100101)])])], keywords=[Keyword(id=1218843905383125631, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, language=EN, orderNo=1, keyword=named entity recognition), Keyword(id=1218843905513149066, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, language=EN, orderNo=2, keyword=Chinese electronic medical record), Keyword(id=1218843905638978196, 
tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, language=EN, orderNo=3, keyword=global pointer network), Keyword(id=1218843905760613025, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, language=EN, orderNo=4, keyword=attention mechanism), Keyword(id=1218843905852887725, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, language=CN, orderNo=1, keyword=命名实体识别), Keyword(id=1218843905966133944, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, language=CN, orderNo=2, keyword=中文电子病历), Keyword(id=1218843906091963076, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, language=CN, orderNo=3, keyword=全局指针网络), Keyword(id=1218843906234569427, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, language=CN, orderNo=4, keyword=注意力机制)], refs=[Reference(id=1218843910525341778, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2022, volume=50, issue=12, pageStart=3030, pageEnd=3053, url=null, language=null, rfNumber=[1], rfOrder=0, authorNames=杜晋华, 尹浩, 冯嵩, journalName=电子学报, refType=null, unstructuredReference=杜晋华, 尹浩, 冯嵩. 中文电子病历命名实体识别的研究与进展[J]. 电子学报, 2022, 50(12): 3030-3053., articleTitle=中文电子病历命名实体识别的研究与进展, refAbstract=null), Reference(id=1218843910756028509, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2022, volume=50, issue=12, pageStart=3030, pageEnd=3053, url=null, language=null, rfNumber=[1], rfOrder=1, authorNames=Du Jinhua, Yin Hao, Feng Song, journalName=Acta Electronica Sinica, refType=null, unstructuredReference=Du Jinhua, Yin Hao, Feng Song. Research and progress on named entity recognition of Chinese electronic medical records[J]. 
Acta Electronica Sinica, 2022, 50(12): 3030-3053., articleTitle=Research and progress on named entity recognition of Chinese electronic medical records, refAbstract=null), Reference(id=1218843910877663332, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2024, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=[2], rfOrder=2, authorNames=周冬冬, journalName=中文电子病历命名实体识别研究, refType=null, unstructuredReference=周冬冬. 中文电子病历命名实体识别研究[D]. 大庆: 东北石油大学, 2024., articleTitle=null, refAbstract=null), Reference(id=1218843911024463985, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2024, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=[2], rfOrder=3, authorNames=Zhou Dongdong, journalName=Research on named entity recognition of Chinese electronic medical records, refType=null, unstructuredReference=Zhou Dongdong. Research on named entity recognition of Chinese electronic medical records[D]. Daqing: Northeast Petroleum University, 2024., articleTitle=null, refAbstract=null), Reference(id=1218843911167070335, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2024, volume=60, issue=1, pageStart=15, pageEnd=27, url=null, language=null, rfNumber=[3], rfOrder=4, authorNames=赵继贵, 钱育蓉, 王魁, journalName=计算机工程与应用, refType=null, unstructuredReference=赵继贵, 钱育蓉, 王魁, 等. 中文命名实体识别研究综述[J]. 
计算机工程与应用, 2024, 60(1): 15-27., articleTitle=中文命名实体识别研究综述, refAbstract=null), Reference(id=1218843911250956421, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2024, volume=60, issue=1, pageStart=15, pageEnd=27, url=null, language=null, rfNumber=[3], rfOrder=5, authorNames=Zhao Jigui, Qian Yurong, Wang Kui, journalName=Computer Engineering and Applications, refType=null, unstructuredReference=Zhao Jigui, Qian Yurong, Wang Kui, et al. A review of research on Chinese named entity recognition[J]. Computer Engineering and Applications, 2024, 60(1): 15-27., articleTitle=A review of research on Chinese named entity recognition, refAbstract=null), Reference(id=1218843911401951377, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2019, volume=19, issue=2, pageStart=67, pageEnd=78, url=null, language=null, rfNumber=[4], rfOrder=6, authorNames=Gao Y, Gu L, Wang Y, journalName=BMC Medical Informatics and Decision Making, refType=null, unstructuredReference=Gao Y, Gu L, Wang Y, et al. Constructing a Chinese electronic medical record corpus for named entity recognition on resident admit notes[J]. BMC Medical Informatics and Decision Making, 2019, 19(2): 67-78., articleTitle=Constructing a Chinese electronic medical record corpus for named entity recognition on resident admit notes, refAbstract=null), Reference(id=1218843911531974811, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2024, volume=46, issue=2, pageStart=325, pageEnd=337, url=null, language=null, rfNumber=[5], rfOrder=7, authorNames=吉旭瑞, 魏德健, 张俊忠, journalName=计算机工程与科学, refType=null, unstructuredReference=吉旭瑞, 魏德健, 张俊忠, 等. 中文电子病历信息提取方法研究综述[J]. 
计算机工程与科学, 2024, 46(2): 325-337., articleTitle=中文电子病历信息提取方法研究综述, refAbstract=null), Reference(id=1218843911661998246, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2024, volume=46, issue=2, pageStart=325, pageEnd=337, url=null, language=null, rfNumber=[5], rfOrder=8, authorNames=Ji Xurui, Wei Dejian, Zhang Junzhong, journalName=Computer Engineering and Science, refType=null, unstructuredReference=Ji Xurui, Wei Dejian, Zhang Junzhong, et al. A review of research on information extraction methods for Chinese electronic medical records[J]. Computer Engineering and Science, 2024, 46(2): 325-337., articleTitle=A review of research on information extraction methods for Chinese electronic medical records, refAbstract=null), Reference(id=1218843911771050160, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2023, volume=23, issue=13, pageStart=5655, pageEnd=5661, url=null, language=null, rfNumber=[6], rfOrder=9, authorNames=刘浩, 张建业, 吕张成, journalName=科学技术与工程, refType=null, unstructuredReference=刘浩, 张建业, 吕张成, 等. 面向数控机床设计知识图谱构建的实体识别[J]. 科学技术与工程, 2023, 23(13): 5655-5661., articleTitle=面向数控机床设计知识图谱构建的实体识别, refAbstract=null), Reference(id=1218843911934628033, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2023, volume=23, issue=13, pageStart=5655, pageEnd=5661, url=null, language=null, rfNumber=[6], rfOrder=10, authorNames=Liu Hao, Zhang Jianye, Lü Zhangcheng, journalName=Science Technology and Engineering, refType=null, unstructuredReference=Liu Hao, Zhang Jianye, Lü Zhangcheng, et al. Entity recognition based on knowledge graph construction for CNC machine tool design[J]. 
Science Technology and Engineering, 2023, 23(13): 5655-5661., articleTitle=Entity recognition based on knowledge graph construction for CNC machine tool design, refAbstract=null), Reference(id=1218843912060457165, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2021, volume=42, issue=5, pageStart=444, pageEnd=450, url=null, language=null, rfNumber=[7], rfOrder=11, authorNames=赵辉, 庞海婷, 冯珊珊, journalName=长春工业大学学报, refType=null, unstructuredReference=赵辉, 庞海婷, 冯珊珊, 等. 中文命名实体识别技术综述[J]. 长春工业大学学报, 2021, 42(5): 444-450., articleTitle=中文命名实体识别技术综述, refAbstract=null), Reference(id=1218843912182091992, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2021, volume=42, issue=5, pageStart=444, pageEnd=450, url=null, language=null, rfNumber=[7], rfOrder=12, authorNames=Zhao Hui, Pang Haiting, Feng Shanshan, journalName=Journal of Changchun University of Technology, refType=null, unstructuredReference=Zhao Hui, Pang Haiting, Feng Shanshan, et al. Overview of Chinese named entity recognition technology[J]. Journal of Changchun University of Technology, 2021, 42(5): 444-450., articleTitle=Overview of Chinese named entity recognition technology, refAbstract=null), Reference(id=1218843912354058470, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2019, volume=98, issue=null, pageStart=103289, pageEnd=null, url=null, language=null, rfNumber=[8], rfOrder=13, authorNames=Yin M W, Mou C J, Xiong K N, journalName=Journal of Biomedical Informatics, refType=null, unstructuredReference=Yin M W, Mou C J, Xiong K N, et al. Chinese clinical named entity recognition with radical-level feature and self-attention mechanism[J]. 
Journal of Biomedical Informatics, 2019, 98: 103289., articleTitle=Chinese clinical named entity recognition with radical-level feature and self-attention mechanism, refAbstract=null), Reference(id=1218843912450527470, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2019, volume=54, issue=2, pageStart=104, pageEnd=124, url=null, language=null, rfNumber=[9], rfOrder=14, authorNames=Devlin J, Chang M W, Lee K, journalName=Association for Computational Linguistics, refType=null, unstructuredReference=Devlin J, Chang M W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J]. Association for Computational Linguistics, 2019, 54(2): 104-124., articleTitle=BERT: pre-training of deep bidirectional transformers for language understanding, refAbstract=null), Reference(id=1218843912551190776, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2021, volume=41, issue=3, pageStart=173, pageEnd=176, url=null, language=null, rfNumber=[10], rfOrder=15, authorNames=陈琛, 吴芬琳, journalName=自动化与仪器仪表, refType=null, unstructuredReference=陈琛, 吴芬琳. 基于BERT的电子病历命名实体识别[J]. 自动化与仪器仪表, 2021, 41(3): 173-176., articleTitle=基于BERT的电子病历命名实体识别, refAbstract=null), Reference(id=1218843912660242695, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2021, volume=41, issue=3, pageStart=173, pageEnd=176, url=null, language=null, rfNumber=[10], rfOrder=16, authorNames=Chen Chen, Wu Fenlin, journalName=Automation and Instrumentation, refType=null, unstructuredReference=Chen Chen, Wu Fenlin. Named entity recognition of electronic medical records based on BERT[J]. 
Automation and Instrumentation, 2021, 41(3): 173-176., articleTitle=Named entity recognition of electronic medical records based on BERT, refAbstract=null), Reference(id=1218843912777683218, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2022, volume=36, issue=3, pageStart=36, pageEnd=43, url=null, language=null, rfNumber=[11], rfOrder=17, authorNames=孔令巍, 朱艳辉, 张旭, journalName=湖南工业大学学报, refType=null, unstructuredReference=孔令巍, 朱艳辉, 张旭, 等. 基于对抗训练的中文电子病历命名实体识别[J]. 湖南工业大学学报, 2022, 36(3): 36-43., articleTitle=基于对抗训练的中文电子病历命名实体识别, refAbstract=null), Reference(id=1218843912878346527, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2022, volume=36, issue=3, pageStart=36, pageEnd=43, url=null, language=null, rfNumber=[11], rfOrder=18, authorNames=Kong Lingwei, Zhu Yanhui, Zhang Xu, journalName=Journal of Hunan University of Technology, refType=null, unstructuredReference=Kong Lingwei, Zhu Yanhui, Zhang Xu, et al. Chinese electronic medical record named entity recognition based on adversarial training[J]. Journal of Hunan University of Technology, 2022, 36(3): 36-43., articleTitle=Chinese electronic medical record named entity recognition based on adversarial training, refAbstract=null), Reference(id=1218843912970621228, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2023, volume=44, issue=8, pageStart=1680, pageEnd=1685, url=null, language=null, rfNumber=[12], rfOrder=19, authorNames=陈娜, 孙艳秋, 燕燕, journalName=小型微型计算机系统, refType=null, unstructuredReference=陈娜, 孙艳秋, 燕燕. 结合注意力机制的BERT-BiGRU-CRF中文电子病历命名实体识别[J]. 
小型微型计算机系统, 2023, 44(8): 1680-1685., articleTitle=结合注意力机制的BERT-BiGRU-CRF中文电子病历命名实体识别, refAbstract=null), Reference(id=1218843913121616182, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2023, volume=44, issue=8, pageStart=1680, pageEnd=1685, url=null, language=null, rfNumber=[12], rfOrder=20, authorNames=Chen Na, Sun Yanqiu, Yan Yan, journalName=Journal of Chinese Computer Systems, refType=null, unstructuredReference=Chen Na, Sun Yanqiu, Yan Yan. BERT-BiGRU-CRF Chinese electronic medical record named entity recognition combined with attention mechanism[J]. Journal of Chinese Computer Systems, 2023, 44(8): 1680-1685., articleTitle=BERT-BiGRU-CRF Chinese electronic medical record named entity recognition combined with attention mechanism, refAbstract=null), Reference(id=1218843913230668094, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2022, volume=22, issue=30, pageStart=13370, pageEnd=13377, url=null, language=null, rfNumber=[13], rfOrder=21, authorNames=李洋, 蔡红珍, 邢林林, journalName=科学技术与工程, refType=null, unstructuredReference=李洋, 蔡红珍, 邢林林, 等. 基于对抗迁移的复合材料检测领域命名实体识别[J]. 科学技术与工程, 2022, 22(30): 13370-13377., articleTitle=基于对抗迁移的复合材料检测领域命名实体识别, refAbstract=null), Reference(id=1218843913369080140, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2022, volume=22, issue=30, pageStart=13370, pageEnd=13377, url=null, language=null, rfNumber=[13], rfOrder=22, authorNames=Li Yang, Cai Hongzhen, Xing Linlin, journalName=Science Technology and Engineering, refType=null, unstructuredReference=Li Yang, Cai Hongzhen, Xing Linlin, et al. Named entity recognition in the field of composite material detection based on adversarial transfer[J]. 
Science Technology and Engineering, 2022, 22(30): 13370-13377., articleTitle=Named entity recognition in the field of composite material detection based on adversarial transfer, refAbstract=null), Reference(id=1218843913482326358, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2023, volume=23, issue=17, pageStart=7436, pageEnd=7443, url=null, language=null, rfNumber=[14], rfOrder=23, authorNames=蒋丽媛, 吴亚东, 王书航, journalName=科学技术与工程, refType=null, unstructuredReference=蒋丽媛, 吴亚东, 王书航, 等. 融合笔画特征的命名实体识别方法[J]. 科学技术与工程, 2023, 23(17): 7436-7443., articleTitle=融合笔画特征的命名实体识别方法, refAbstract=null), Reference(id=1218843913608155487, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2023, volume=23, issue=17, pageStart=7436, pageEnd=7443, url=null, language=null, rfNumber=[14], rfOrder=24, authorNames=Jiang Liyuan, Wu Yadong, Wang Shuhang, journalName=Science Technology and Engineering, refType=null, unstructuredReference=Jiang Liyuan, Wu Yadong, Wang Shuhang, et al. A named entity recognition method based on fusion of stroke features[J]. Science Technology and Engineering, 2023, 23(17): 7436-7443., articleTitle=A named entity recognition method based on fusion of stroke features, refAbstract=null), Reference(id=1218843913725596007, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2024, volume=60, issue=11, pageStart=147, pageEnd=155, url=null, language=null, rfNumber=[15], rfOrder=25, authorNames=赵珍珍, 董彦如, 刘静, journalName=计算机工程与应用, refType=null, unstructuredReference=赵珍珍, 董彦如, 刘静, 等. 融合词信息和图注意力的医学命名实体识别[J]. 
计算机工程与应用, 2024, 60(11): 147-155., articleTitle=融合词信息和图注意力的医学命名实体识别, refAbstract=null), Reference(id=1218843913834647922, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2024, volume=60, issue=11, pageStart=147, pageEnd=155, url=null, language=null, rfNumber=[15], rfOrder=26, authorNames=Zhao Zhenzhen, Dong Yanru, Liu Jing, journalName=Computer Engineering and Applications, refType=null, unstructuredReference=Zhao Zhenzhen, Dong Yanru, Liu Jing, et al. Medical named entity recognition based on fusion of word information and graph attention[J]. Computer Engineering and Applications, 2024, 60(11): 147-155., articleTitle=Medical named entity recognition based on fusion of word information and graph attention, refAbstract=null), Reference(id=1218843913960477051, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=1, pageEnd=12, url=http://kns.cnki.net/kcms/detail/21.1106.TP.20240228.1116.013.html, language=null, rfNumber=[16], rfOrder=27, authorNames=陆鑫涛, 孙丽萍, 凌晨, journalName=小型微型计算机系统, refType=null, unstructuredReference=陆鑫涛, 孙丽萍, 凌晨, 等. 融入拼音与词性特征的中文电子病历命名实体识别[J/OL]. 小型微型计算机系统: 1-12[2024-03-06]. http://kns.cnki.net/kcms/detail/21.1106.TP.20240228.1116.013.html., articleTitle=融入拼音与词性特征的中文电子病历命名实体识别, refAbstract=null), Reference(id=1218843914069528965, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=1, pageEnd=12, url=http://kns.cnki.net/kcms/detail/21.1106.TP.20240228.1116.013.html, language=null, rfNumber=[16], rfOrder=28, authorNames=Lu Xintao, Sun Liping, Ling Chen, journalName=Mini microcomputer systems, refType=null, unstructuredReference=Lu Xintao, Sun Liping, Ling Chen, et al. 
Chinese electronic medical record named entity recognition incorporating pinyin and part ofspeech features[J/OL]. Mini microcomputer systems: 1-12 [2024-03-06] http://kns.cnki.net/kcms/detail/21.1106.TP.20240228.1116.013.html., articleTitle=Chinese electronic medical record named entity recognition incorporating pinyin and part ofspeech features, refAbstract=null), Reference(id=1218843914195358093, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2020, volume=null, issue=null, pageStart=657, pageEnd=668, url=null, language=null, rfNumber=[17], rfOrder=29, authorNames=Cui Y M, Che W X, Liu T, journalName=Findings of the Association for Computational Linguistics. Online: EMNLP, refType=null, unstructuredReference=Cui Y M, Che W X, Liu T, et al. Revisiting pre-trained models for Chinese natural language processing[J]. Findings of the Association for Computational Linguistics. Online: EMNLP, 2020: 657-668., articleTitle=Revisiting pre-trained models for Chinese natural language processing, refAbstract=null), Reference(id=1218843914296021397, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2013, volume=null, issue=null, pageStart=null, pageEnd=null, url=https://arxiv.org/abs/1312.6199, language=null, rfNumber=[18], rfOrder=30, authorNames=Szegedy C, Zaremba W, SutskeveR I, journalName=Computer Science, refType=null, unstructuredReference=Szegedy C, Zaremba W, SutskeveR I, et al. Intriguing properties of neural networks[J/OL]. Computer Science, 2013. 
https://arxiv.org/abs/1312.6199., articleTitle=Intriguing properties of neural networks, refAbstract=null), Reference(id=1218843914388296090, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2022, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=[19], rfOrder=31, authorNames=Su J L, Murtadha A, Pan S F, journalName=arXiv:2208.03054, refType=null, unstructuredReference=Su J L, Murtadha A, Pan S F, et al. Global pointer: novel efficient span-based approach for named entity recognition[J]. arXiv:2208.03054, 2022., articleTitle=Global pointer: novel efficient span-based approach for named entity recognition, refAbstract=null), Reference(id=1218843914501542311, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=http://arxiv.org/abs/1412.6572, language=null, rfNumber=[20], rfOrder=32, authorNames=Goodfellow I J, Shlens J, Szegedy C, journalName=null, refType=null, unstructuredReference=Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples[EB/OL]. http://arxiv.org/abs/1412.6572., articleTitle=Explaining and harnessing adversarial examples, refAbstract=null), Reference(id=1218843914648342957, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2020, volume=null, issue=null, pageStart=652, pageEnd=664, url=null, language=null, rfNumber=[21], rfOrder=33, authorNames=Hong Y Z, Wen X L, Kun L Z, journalName=Workshop on Chinese Lexical Semantics, refType=null, unstructuredReference=Hong Y Z, Wen X L, Kun L Z, et al. Building a pediatric medical corpus: word segmentation and named entity annotation[C]// Workshop on Chinese Lexical Semantics. 
Cham: Springer, 2020: 652-664., articleTitle=Building a pediatric medical corpus: word segmentation and named entity annotation, refAbstract=null), Reference(id=1218843914761589176, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2021, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=[22], rfOrder=34, authorNames=Zhang N, Chen M, Bi Z, journalName=arXiv preprint arXiv: 2106.08087, refType=null, unstructuredReference=Zhang N, Chen M, Bi Z, et al. CBLUE: a Chinese biomedical language understanding evaluation benchmark[J]. arXiv preprint arXiv: 2106.08087, 2021., articleTitle=CBLUE: a Chinese biomedical language understanding evaluation benchmark, refAbstract=null), Reference(id=1218843914883224002, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2019, volume=35, issue=10, pageStart=1745, pageEnd=1752, url=null, language=null, rfNumber=[23], rfOrder=35, authorNames=Wang X, Zhang Y, Ren X, journalName=Bioinformatics, refType=null, unstructuredReference=Wang X, Zhang Y, Ren X, et al. Cross-type biomedical named entity recognition with deep multi-task learning[J]. Bioinformatics, 2019, 35(10): 1745-1752., articleTitle=Cross-type biomedical named entity recognition with deep multi-task learning, refAbstract=null), Reference(id=1218843915004858827, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, doi=null, pmid=null, pmcid=null, year=2022, volume=36, issue=10, pageStart=10965, pageEnd=10973, url=null, language=null, rfNumber=[24], rfOrder=36, authorNames=Li J, Fei H, Liu J, journalName=Artificial Intelligence, refType=null, unstructuredReference=Li J, Fei H, Liu J, et al. Unified named entity recognition as word-word relation classification[J]. 
Artificial Intelligence, 2022, 36(10): 10965-10973., articleTitle=Unified named entity recognition as word-word relation classification, refAbstract=null)], funds=[Fund(id=1218843909929750559, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, awardId=2021YFB2600600, language=CN, fundingSource=国家重点研发计划(2021YFB2600600), fundOrder=null, country=null), Fund(id=1218843910030413864, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, awardId=2023XJJ15, language=CN, fundingSource=北京信息科技大学校级科研项目(2023XJJ15), fundOrder=null, country=null), Fund(id=1218843910214963253, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, awardId=2023XJJ17, language=CN, fundingSource=北京信息科技大学校级科研项目(2023XJJ17), fundOrder=null, country=null)], companyList=[AuthorCompany(id=1218843903235641793, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, xref=null, ext=[AuthorCompanyExt(id=1218843903244030401, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, companyId=1218843903235641793, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=Computer School, Beijing Information Science and Technology University, Beijing 100101, China), AuthorCompanyExt(id=1218843903256613315, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, companyId=1218843903235641793, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=北京信息科技大学计算机学院, 北京 100101)])], figs=[ArticleFig(id=1218843906536559349, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, language=EN, label=Fig.1, caption=Structure diagram of Chinese electronic medical record named entity recognition model based on MacBERT and global pointer network, 
figureFileSmall=3TY3o2wfUPuln2yddhibtw==, figureFileBig=dYjo2NfzI9n37ZMpWT/X+g==, tableContent=null), ArticleFig(id=1218843906796606213, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, language=CN, label=图1, caption=基于MacBERT与全局指针网络的中文电子病历命名实体识别模型结构图, figureFileSmall=3TY3o2wfUPuln2yddhibtw==, figureFileBig=dYjo2NfzI9n37ZMpWT/X+g==, tableContent=null), ArticleFig(id=1218843906943406863, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, language=EN, label=Fig.2, caption=Comparison chart of different mask strategies, figureFileSmall=sGKicRHA6bxsGXgWr4gktA==, figureFileBig=951mhJtf2FrDAaSKQh/9nw==, tableContent=null), ArticleFig(id=1218843907052458776, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, language=CN, label=图2, caption=不同掩码策略对比图, figureFileSmall=sGKicRHA6bxsGXgWr4gktA==, figureFileBig=951mhJtf2FrDAaSKQh/9nw==, tableContent=null), ArticleFig(id=1218843907232813861, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, language=EN, label=Fig.3, caption=MacBERT input example diagram, figureFileSmall=wQZbDzvss9nfW56lciRkng==, figureFileBig=FtIbMqi8F5nA/9t9zM2sKw==, tableContent=null), ArticleFig(id=1218843907383808814, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, language=CN, label=图3, caption=MacBERT输入示例图, figureFileSmall=wQZbDzvss9nfW56lciRkng==, figureFileBig=FtIbMqi8F5nA/9t9zM2sKw==, tableContent=null), ArticleFig(id=1218843907463500599, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, language=EN, label=Fig.4, caption=LSTM unit structure diagram, figureFileSmall=YrHznsu4XURL8yBD+H2DNA==, figureFileBig=PunPq8rIgWrSzEf/aXhnxg==, tableContent=null), ArticleFig(id=1218843907572552517, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1149776907639681095, language=CN, label=图4, 
Table 1  Introduction to dataset distribution

| Dataset | Entity types | Training set | Validation set | Test set | Metric |
|---|---|---|---|---|---|
| CCKS2019 | Anatomical site, surgery, disease and diagnosis, drug, laboratory test, imaging examination | 1 000 | 200 | 179 | F1 |
| CMeEE | Disease, clinical manifestation, medical procedure, medical equipment, drug, medical laboratory test item, body part, department, microorganism | 15 000 | 5 000 | 3 000 | F1 |
| CMeEE-V2 | Disease, clinical manifestation, medical procedure, medical equipment, drug, medical laboratory test item, body part, department, microorganism | 15 000 | 5 000 | 3 000 | F1 |
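Table 1 lists F1 as the metric for all three datasets but does not state the matching criterion. A common convention for CCKS2019 and CMeEE is entity-level micro F1, where a predicted entity counts as correct only if its start, end, and type all match a gold entity. The sketch below assumes that convention; `entity_f1` and the `(start, end, type)` tuple layout are illustrative, not the authors' evaluation script. Applied to the CRF output in the second row of Table 5, which misses the nested entity 神经细胞, it gives precision 1.0, recall 2/3, and F1 0.8.

```python
from typing import Iterable, Set, Tuple

# Hypothetical entity layout: (start, end, type), character offsets, end inclusive.
Entity = Tuple[int, int, str]


def entity_f1(gold: Iterable[Set[Entity]], pred: Iterable[Set[Entity]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 with exact span-and-type matching."""
    tp = n_pred = n_gold = 0
    for gold_set, pred_set in zip(gold, pred):
        tp += len(gold_set & pred_set)  # boundaries and type must all agree
        n_pred += len(pred_set)
        n_gold += len(gold_set)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Second row of Table 5: "显微镜下见神经细胞空泡形成" (offsets count Chinese characters)
gold = [{(0, 2, "medical_equipment"), (5, 8, "body_substance"), (5, 10, "clinical_manifestation")}]
crf = [{(0, 2, "medical_equipment"), (5, 10, "clinical_manifestation")}]
print(entity_f1(gold, crf))
```

Scoring whole sentences as sets makes the metric indifferent to how the decoder produced its spans, so CRF and Global Pointer outputs can be compared on equal terms.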
Table 2  Experimental environment

| Item | Configuration |
|---|---|
| Operating system | Windows 10 |
| Programming language | Python 3.8 |
| Deep learning framework | PyTorch 1.9.0 |
| GPU | RTX 3090 |
Table 3  Experimental parameter settings

| Parameter | Value |
|---|---|
| Epochs | 10 |
| Learning rate | $2\times10^{-5}$ |
| LSTM hidden dimension | 512 |
| Dropout | 0.1 |
| Optimizer | Adam |
| Batch size | 16 |
| Maximum sequence length | 256 |
Table 4  Comparison experiment of decoding methods

| No. | Model | CCKS2019 F1/% | CMeEE F1/% |
|---|---|---|---|
| 1 | CRF | 80.14 | 65.09 |
| 2 | Global Pointer | 80.47 | 66.21 |
Table 5  Examples of recognition results using different decoding methods

| Text | CRF | Global Pointer | Ground truth |
|---|---|---|---|
| 患者3月余前于我院诊断为直肠癌 ("the patient was diagnosed with rectal cancer at our hospital more than 3 months ago") | 直肠癌 (disease and diagnosis) | 直肠癌 (disease and diagnosis) | 直肠癌 (disease and diagnosis) |
| 显微镜下见神经细胞空泡形成 ("vacuolation of nerve cells was seen under the microscope") | 显微镜 (medical equipment); 神经细胞空泡 (clinical manifestation) | 显微镜 (medical equipment); 神经细胞 (body substance); 神经细胞空泡 (clinical manifestation) | 显微镜 (medical equipment); 神经细胞 (body substance); 神经细胞空泡 (clinical manifestation) |
Table 6  Comparison of pre-trained models

| No. | Model | F1/% |
|---|---|---|
| 1 | BERT | 73.44 |
| 2 | RoBERTa | 73.66 |
| 3 | RoBERTa-wwm-ext | 73.75 |
| 4 | MacBERT | 73.86 |
| 5 | RoBERTa-wwm-ext-large | 74.69 |
| 6 | MacBERT-large (MABAGP) | 74.84 |
Table 7  Comparison experiment with mainstream models

| No. | Model | CCKS2019 F1/% | CMeEE F1/% | CMeEE-V2 F1/% |
|---|---|---|---|---|
| 1 | BiLSTM+CRF | 75.14 | 59.32 | 66.46 |
| 2 | BERT-BiLSTM-CRF | 78.40 | 63.80 | 69.92 |
| 3 | Deep Biaffine[23] | 78.51 | 63.89 | 71.57 |
| 4 | W2NER[24] | 77.83 | 63.91 | 70.49 |
| 5 | Global Pointer[19] | 78.67 | 64.84 | 73.12 |
| 6 | MABAGP (proposed) | 80.47 | 66.21 | 74.84 |
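The F1 gains over Global Pointer quoted in the abstract are absolute differences between rows 5 and 6 of Table 7, that is, percentage points of F1; relative to the baseline scores they are somewhat larger. The short check below recomputes both from the table values; the helper names are illustrative.

```python
# F1 (%) from rows 5 and 6 of Table 7, in the order (CCKS2019, CMeEE, CMeEE-V2)
DATASETS = ("CCKS2019", "CMeEE", "CMeEE-V2")
GLOBAL_POINTER = (78.67, 64.84, 73.12)
MABAGP = (80.47, 66.21, 74.84)


def absolute_gain(new: float, old: float) -> float:
    """Difference in percentage points of F1, rounded for display."""
    return round(new - old, 2)


def relative_gain(new: float, old: float) -> float:
    """Improvement expressed as a percentage of the baseline F1."""
    return round(100 * (new - old) / old, 2)


for name, old, new in zip(DATASETS, GLOBAL_POINTER, MABAGP):
    print(f"{name}: +{absolute_gain(new, old)} pp, +{relative_gain(new, old)}% relative")
```

Reporting the absolute difference in percentage points avoids confusion with relative improvement, which here is roughly 2.1% to 2.4%.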
Table 8  Ablation experiment

| No. | Model | F1/% |
|---|---|---|
| 1 | Baseline | 79.88 |
| 2 | Baseline+AT | 80.28 |
| 3 | Baseline+Attention | 80.32 |
| 4 | Baseline+AT+Attention | 80.47 |
Named Entity Recognition for Chinese Electronic Medical Records Using MacBERT and Global Pointer Network
Tian-yu WU, Dong-dong GUO*, Wen-qiao LI, Zi-kang LI, Lin MIAO
Science Technology and Engineering | Papers·Automation and Computational Technology, 2025, 25(11): 4656-4665
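  • Computer School, Beijing Information Science and Technology University, Beijing 100101, China
  • WU Tian-yu (born 2000), male, Han, from Fengtai, Beijing; master's student. Research interest: knowledge graphs. E-mail:

Corresponding author:

* GUO Dong-dong (born 1990), male, Han, from Datong, Shanxi; Ph.D. Research interest: natural language processing. E-mail: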
Published: 2025-04-18  DOI: 10.12404/j.issn.1671-1815.2403519



Existing sequence labeling approaches cannot effectively recognize nested entities in Chinese electronic medical records (EMRs). To address this, a named entity recognition model integrating MacBERT and a global pointer network was proposed. First, the MacBERT-large pre-trained model converts the text into context-sensitive dynamic vectors. Next, the fast gradient method (FGM) generates adversarial samples, which are added to the original vectors and fed together into a BiLSTM (bi-directional long short-term memory) network to capture contextual features; an attention mechanism is introduced to strengthen the capture of long-distance semantic features. Finally, a global pointer network decodes by jointly considering head and tail feature information, improving the prediction of nested medical entities. Experimental results show that, compared with the Global Pointer model, a strong mainstream baseline, the proposed model improves F1 by 1.80, 1.37, and 1.72 percentage points on the CCKS2019 dataset and two versions of the CMeEE Chinese EMR dataset, respectively, validating the effectiveness of the approach.

named entity recognition  /  Chinese electronic medical record  /  global pointer network  /  attention mechanism
Tian-yu WU, Dong-dong GUO, Wen-qiao LI, Zi-kang LI, Lin MIAO. Named Entity Recognition for Chinese Electronic Medical Records Using MacBERT and Global Pointer Network[J]. Science Technology and Engineering, 2025, 25(11): 4656-4665. DOI: 10.12404/j.issn.1671-1815.2403519
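With the continuous advance of medical technology and the steady improvement of medical services, large volumes of medical records keep accumulating, posing great challenges for medical management and research. Electronic medical records (EMRs) emerged to manage these records more efficiently and conveniently and have since been widely adopted. EMRs not only greatly improve the efficiency of clinical work but also provide a convenient data source for medical research. Because EMRs typically contain rich medical knowledge, extracting structured medical knowledge from unstructured, publicly available Chinese EMR text has become one of the research hotspots in medical information extraction[1-2].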
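Named entity recognition (NER) identifies entities with specific meanings in text and classifies them into predefined categories such as person names, place names, organization names, dates, and times[3]. NER for Chinese EMRs uses rule templates, machine learning, or deep learning to extract structured medical knowledge that does not involve patient privacy, such as drug names, surgery names, and symptom names, from unstructured EMRs. These data in turn provide solid support for tasks such as medical information extraction and medical knowledge graph construction[4], further advancing medical informatization.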
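However, NER for Chinese EMRs differs from general-domain NER: medical terms are highly specialized and structurally complex, and the text contains many nested medical entities[5]. For example, the EMR sentence "对有心脏损害者应进行床边动态心电监护" ("patients with cardiac damage should receive bedside dynamic ECG monitoring") contains the disease entity "心脏损害" (cardiac damage) and the treatment entity "床边动态心电监护" (bedside dynamic ECG monitoring), and "心脏损害" in turn contains the nested body-part entity "心脏" (heart). Existing sequence labeling methods assign exactly one tag to each character, so they usually cannot recognize both the outer and the nested entity. In addition, EMRs usually involve patient privacy and must be annotated by domain experts, so few public EMR datasets exist, and those that do contain considerable annotation noise. Together, these factors make information extraction from EMRs difficult, and NER for Chinese EMRs remains highly challenging.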
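The limitation of sequence labeling can be made concrete with a small sketch: a BIO scheme assigns one tag per character, so the overlapping spans 心脏损害 and 心脏 cannot both be encoded, whereas a per-label span matrix, the kind of output a global pointer network produces, stores each entity as a single (start, end) cell. The label names and helper functions are illustrative; offsets count Chinese characters.

```python
from typing import Dict, List, Tuple

# (start, end, label) with end inclusive; offsets count Chinese characters
Span = Tuple[int, int, str]


def to_bio(length: int, spans: List[Span]) -> List[str]:
    """Encode spans as one BIO tag per character; fails when spans overlap."""
    tags = ["O"] * length
    for start, end, label in spans:
        if any(tag != "O" for tag in tags[start:end + 1]):
            raise ValueError(f"span {(start, end, label)} overlaps an entity already tagged")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{label}"
    return tags


def to_span_matrix(length: int, spans: List[Span], labels: List[str]) -> Dict[str, List[List[int]]]:
    """One length x length 0/1 matrix per label; cell [i][j] = 1 marks an entity from i to j."""
    matrix = {label: [[0] * length for _ in range(length)] for label in labels}
    for start, end, label in spans:
        matrix[label][start][end] = 1
    return matrix


text = "对有心脏损害者应进行床边动态心电监护"
spans = [(2, 5, "disease"), (2, 3, "body"), (10, 17, "treatment")]

try:
    to_bio(len(text), spans)
except ValueError as err:
    print("BIO cannot encode the nested entity:", err)

matrix = to_span_matrix(len(text), spans, ["disease", "body", "treatment"])
for label, rows in matrix.items():
    for i, row in enumerate(rows):
        for j, cell in enumerate(row):
            if cell:
                print(label, text[i:j + 1])
```

Dropping the nested span makes the BIO encoding succeed again, which is exactly the information a flat tagger loses.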
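Chinese EMRs are an important part of medical informatization, and the accuracy of their NER directly affects how well medical data can be mined and used. Therefore, to improve NER performance on Chinese EMRs and to handle their nested entities, this paper studies NER methods for the Chinese EMR domain and proposes a deep learning model based on MacBERT and a global pointer network. The model recognizes flat medical entities while handling nested medical entities more effectively. In addition, an attention mechanism and adversarial training are introduced to improve the model's feature extraction and robustness, further raising recognition performance on medical named entities. The proposed model offers a new approach to Chinese EMR NER and contributes to the development of medical informatization.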
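Early NER research mainly followed two approaches: predefined rule templates and machine learning. Rule-based methods usually rely on medical domain experts to write fixed rules from the data distribution and then match entities against them; writing rules is time-consuming and error-prone, and the rules transfer poorly to new settings[6]. Machine learning methods learn statistical models from annotated text and treat NER as a sequence labeling task that assigns a tag to each element of the input sequence; however, they depend on extensive feature engineering, and the resulting models generalize poorly.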
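As algorithms and models have evolved, deep neural networks have come to dominate medical information processing and show clear advantages in entity recognition. The main approaches include convolutional neural networks (CNN), recurrent neural networks (RNN), and long short-term memory networks (LSTM), an improved form of RNN[7]. Yin et al.[8] proposed a BiLSTM-CRF (bi-directional long short-term memory-conditional random field) NER model based on Chinese character radicals and further introduced an attention mechanism, achieving an F1 of 93.00% on the Chinese EMR dataset CCKS2017.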
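Deep learning methods usually encode text with word-embedding models. Recent work has found that using the BERT[9] pre-trained model and its variants, such as RoBERTa, RoBERTa-wwm-ext, ALBERT, and MacBERT, as the embedding layer of EMR entity recognition models strengthens semantic representation. Chen et al.[10] replaced word2vec with BERT to generate word vectors and combined it with BiLSTM-CRF; experiments showed that this model outperformed BiLSTM-CRF on Chinese EMR NER. Kong et al.[11] introduced adversarial training samples into a baseline model and improved precision, recall, and F1 over the baseline on the EMR dataset CCKS2021. Chen et al.[12] added an attention mechanism to BERT-BiGRU-CRF to strengthen the capture of long-distance semantic features and outperformed baselines such as BiLSTM-CRF on CCKS2019, an EMR dataset consisting mostly of flat entities. Li et al.[13] combined adversarial training with BERT embeddings in an NER model for composite material inspection, addressing the small data scale, abundant technical terms, and confusing entity boundaries of that domain. Jiang et al.[14] used a stroke-composition encoder to obtain glyph features of Chinese characters and combined them with BERT-BiLSTM-CRF, achieving good results on the Resume dataset. Zhao et al.[15] proposed an NER model that fuses word information and graph attention: it uses MedBERT, which has learned medical knowledge, as the embedding layer, adds word embeddings, and applies graph attention to strengthen learning of contextual relations in medical text, performing well on two medical datasets. Lu et al.[16] proposed an EMR NER method incorporating pinyin and part-of-speech features: on top of BERT embeddings it adds Chinese pinyin features, uses part-of-speech features to constrain the uncertainty of the pinyin features, and fuses the three feature types with a scaled dot-product attention module, reaching F1 scores of 98.66%, 87.25%, and 73.41% on the EMR datasets CCKS2018 and CCKS2019 and the general-domain Weibo dataset, respectively.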
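In summary, existing NER research has made notable progress: by designing more complex neural networks, using more feature engineering, optimizing algorithms, and adjusting model structures, researchers have steadily improved recognition accuracy and efficiency. However, most work on Chinese EMR NER uses sequence labeling and focuses on flat medical entities, while nested medical entities in EMR text have received less attention and results on them remain unsatisfactory. To address these problems and improve NER accuracy in the Chinese EMR domain, this paper proposes MABAGP (MacBERT-adversarial-BiLSTM-attention-global pointer), an NER method combining MacBERT and a global pointer network that recognizes flat medical entities in Chinese EMR text while better resolving nested ones.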
The proposed Chinese EMR NER model, MABAGP, is built on MacBERT-large[17], adversarial training (AT)[18], a bi-directional long short-term memory network (BiLSTM), an attention mechanism, and a global pointer network[19]. Its framework is shown in Fig. 1.
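MABAGP recognizes entities as follows. First, the EMR text is fed to the embedding layer, where MacBERT-large, a Chinese pre-trained language model trained with n-gram whole-word masking and similar-word replacement, converts it into high-dimensional dynamic vector representations. Next, the FGM (fast gradient method) adversarial training algorithm adds perturbations to the original vectors to generate adversarial samples, improving robustness and generalization; the original and adversarial vectors are both fed, in separate forward passes, into the BiLSTM for sequence encoding. A multi-head attention mechanism then strengthens the capture of long-distance semantic features. Finally, in the decoding layer, the global pointer network jointly considers features at entity start and end positions and outputs an entity score matrix, predicting flat entities while better predicting nested medical entities, which completes NER on Chinese EMR text.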
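The decoding step can be illustrated with a minimal pure-Python sketch of one Global Pointer head (one entity type), following the formulation in the Global Pointer paper[19]: start and end representations are projected from the encoder output, rotated with rotary position embeddings (RoPE) so that their dot product depends only on the offset j - i, and every span (i, j) with j ≥ i and a score above 0 is output as an entity. The loss is the multi-label categorical cross-entropy described in that paper. Bias-free projections, tiny dimensions, and all helper names are simplifying assumptions; in MABAGP the input `h` would come from the BiLSTM and attention layers.

```python
import math
from typing import List, Set, Tuple

Vector = List[float]


def rope(vec: Vector, pos: int, base: float = 10000.0) -> Vector:
    """Rotary position embedding: rotate each pair (x[2m], x[2m+1]) by pos * base^(-2m/d)."""
    d = len(vec)
    out = list(vec)
    for m in range(d // 2):
        theta = pos * base ** (-2 * m / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[2 * m], vec[2 * m + 1]
        out[2 * m], out[2 * m + 1] = x0 * c - x1 * s, x0 * s + x1 * c
    return out


def matvec(w: List[Vector], x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def span_scores(h: List[Vector], w_q: List[Vector], w_k: List[Vector]) -> List[List[float]]:
    """Scores s(i, j) for ONE entity type; spans with j < i are masked to -inf."""
    q = [rope(matvec(w_q, h_i), i) for i, h_i in enumerate(h)]
    k = [rope(matvec(w_k, h_j), j) for j, h_j in enumerate(h)]
    n = len(h)
    return [[dot(q[i], k[j]) if j >= i else -math.inf for j in range(n)] for i in range(n)]


def decode(scores: List[List[float]], threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Every span scoring above the threshold is an entity, so nested spans can coexist."""
    n = len(scores)
    return [(i, j) for i in range(n) for j in range(i, n) if scores[i][j] > threshold]


def multilabel_ce(scores: List[List[float]], gold: Set[Tuple[int, int]]) -> float:
    """log(1 + sum over entities of e^-s) + log(1 + sum over non-entities of e^s), upper triangle only."""
    n = len(scores)
    pos = [scores[i][j] for i, j in gold]
    neg = [scores[i][j] for i in range(n) for j in range(i, n) if (i, j) not in gold]
    return math.log1p(sum(math.exp(-s) for s in pos)) + math.log1p(sum(math.exp(s) for s in neg))


# Toy scores for one label over a 3-character sentence: (0, 1) is nested inside (0, 2)
toy = [[-1.0, 2.0, 3.0], [-math.inf, -0.5, -2.0], [-math.inf, -math.inf, -1.0]]
print(decode(toy))
print(multilabel_ce(toy, {(0, 1), (0, 2)}))
```

Because each label has its own matrix and every cell is scored independently, nested and overlapping spans such as (0, 1) inside (0, 2) are returned together, which a per-character tagger cannot do.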
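BERT is pre-trained with the masked language model (MLM) objective. It randomly masks a portion of the characters in the input sequence and asks the model to predict the masked characters from their context. This lets the model understand text bidirectionally and produce context-aware vector representations.

In Chinese, however, a word may carry more meaning than any single character, so masking isolated characters can lose important information. RoBERTa-wwm-ext improves on BERT with larger-scale training data and settings and more training steps. It also replaces character-level random masking with whole word masking (WWM), which masks all characters of a word together. WWM preserves word integrity and reduces information loss, but although it strengthens the model during pre-training, its gains on downstream tasks are limited.

MacBERT, proposed by the HIT-iFLYTEK joint laboratory, is a pre-trained language model that uses MLM as correction (MAC). MAC works in two stages:

- Selection. Building on whole word masking, it chooses tokens to mask as n-grams, selecting 1-gram to 4-gram spans with probabilities of 40%, 30%, 20%, and 10%.
- Replacement. It replaces the selected tokens with similar words instead of the [MASK] token.

Because [MASK] appears in pre-training but never in downstream tasks, standard MLM creates a mismatch between the two. MAC reduces this mismatch and thus improves generalization. The MLM, WWM, and MAC masking strategies are illustrated in Figure 2.

MacBERT has been reported to outperform BERT, RoBERTa, ERNIE, and other pre-trained models on Chinese NLP tasks[17]. Many studies also indicate that larger models usually recognize entities better than smaller ones. This paper therefore uses MacBERT-large, which is trained on a larger corpus and has a more complex architecture, as the encoding (embedding) layer of the model.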
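The n-gram selection and similar-word replacement of MAC can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MacBERT's pre-training code:

- The similar-word table `similar` is a hypothetical hand-written dictionary. MacBERT itself obtains similar words from a synonym toolkit.
- The sentence's own words stand in for the vocabulary in the random-word fallback.
- The overall masking rate and the split among similar-word, random-word, and unchanged replacements are omitted.

```python
import random

# MacBERT n-gram lengths and selection probabilities (as described in the text)
NGRAM_PROBS = [(1, 0.4), (2, 0.3), (3, 0.2), (4, 0.1)]


def sample_ngram_length(u: float) -> int:
    """Map a uniform draw u in [0, 1) to an n-gram length via the cumulative distribution."""
    cumulative = 0.0
    for n, p in NGRAM_PROBS:
        cumulative += p
        if u < cumulative:
            return n
    return NGRAM_PROBS[-1][0]  # guard against floating-point round-off near u = 1


def mac_replace(words, start, n, similar, rng):
    """Replace the n-gram words[start:start + n] with similar words instead of [MASK].

    `similar` is a hypothetical word -> list-of-similar-words table; when a word has
    no entry, fall back to a random word from the sentence (a stand-in for the vocabulary).
    """
    out = list(words)
    vocab = sorted(set(words))
    for k in range(start, min(start + n, len(words))):
        candidates = similar.get(words[k])
        out[k] = rng.choice(candidates) if candidates else rng.choice(vocab)
    return out


rng = random.Random(0)
words = ["患者", "出现", "头痛", "症状"]          # a pre-segmented toy sentence
similar = {"头痛": ["头疼"], "症状": ["表现"]}     # hypothetical similar-word table
n = sample_ngram_length(rng.random())
print(n, mac_replace(words, 2, 2, similar, rng))
```

Replacing with a real word, rather than a [MASK] symbol, is what keeps the pre-training input closer to what the model sees at fine-tuning time.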
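MacBERT-large first tokenizes the input Chinese EMR text and passes the tokens to the embedding layer. As shown in Figure 3, the embedding layer has three components:

- Token embeddings represent the information carried by each token itself.
- Segment embeddings distinguish the order of sentences to capture sentence-level features.
- Position embeddings encode the position of each token in the sentence.

The special tokens [CLS] and [SEP] mark the beginning and the end of the sentence, respectively. The three embeddings are summed to obtain the feature vectors of the input text, which serve as input to the subsequent modules.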
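A toy NumPy sketch of the embedding sum described above. The table sizes and token ids are hypothetical, including the ids standing in for [CLS] and [SEP]. A real BERT embedding layer also applies layer normalization and dropout after the sum; both are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MAX_LEN, SEGMENTS, HIDDEN = 100, 16, 2, 8      # toy sizes, not MacBERT-large's

token_table = rng.normal(size=(VOCAB, HIDDEN))        # token embeddings
segment_table = rng.normal(size=(SEGMENTS, HIDDEN))   # segment embeddings
position_table = rng.normal(size=(MAX_LEN, HIDDEN))   # learned position embeddings


def input_embedding(token_ids, segment_ids):
    """Element-wise sum of token, segment and position embeddings, one row per token."""
    token_ids = np.asarray(token_ids)
    positions = np.arange(len(token_ids))
    return (token_table[token_ids]
            + segment_table[np.asarray(segment_ids)]
            + position_table[positions])


# hypothetical ids: 1 = [CLS], 2 = [SEP], 5 and 7 = two characters of the record
emb = input_embedding([1, 5, 7, 2], [0, 0, 0, 0])
print(emb.shape)  # (4, 8)
```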
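Adversarial training is an effective regularization technique for deep learning. Its core idea is to train the model on both clean samples and adversarial samples, where adversarial samples are generated by applying small but targeted perturbations to the original samples.

Chinese EMR text often contains noise and interference such as semantic vagueness, ambiguity, and annotation errors. Adversarial training forces the model to learn the data distribution better and to resist such noise, so that it copes with small changes in the input. Adversarial training can be simplified as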
$\max_{\boldsymbol{\theta}} P\left(\boldsymbol{y} \mid \boldsymbol{x}+\Delta \boldsymbol{x}, \boldsymbol{\theta}\right)$
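In Eq. (1), x is the input, Δx is the adversarial perturbation, θ denotes the model parameters, and y is the true label. $P(\boldsymbol{y} \mid \boldsymbol{x}+\Delta \boldsymbol{x}, \boldsymbol{\theta})$ is the probability of predicting the true label after the perturbation is added, and Δx is restricted to a given perturbation space.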
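FGM builds on the fast gradient sign method of Goodfellow et al.[20], replacing the sign of the gradient with its L2-normalized value. Take the input vector $\boldsymbol{x}=[\boldsymbol{v}_1, \boldsymbol{v}_2, \dots, \boldsymbol{v}_t]$ as an example. FGM first backs up the word embedding vectors from pre-training, then computes the gradient with respect to x and normalizes it to obtain the perturbation Δx: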
$\Delta x=\varepsilon \frac{g}{\|g\|_{2}}$
In Eq. (2), ε is a scaling factor and g is the gradient:
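$\boldsymbol{g}=\nabla_{\boldsymbol{x}} L\left(\boldsymbol{x}, \boldsymbol{y}, \boldsymbol{\theta}\right)$
In Eq. (3), L(x, y, θ) is the loss of a single sample. The perturbation is added to the original vector x to produce a new vector. The gradient computed on the new vector is added to the original gradient, and the parameters are then updated with the accumulated gradient. The FGM adversarial training algorithm is summarized below.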
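For each input x in the dataset:
(1) Compute the forward loss on x and back-propagate to obtain the gradients.
(2) Compute Δx from the gradient of the embedding matrix and add it to x to obtain the adversarial input x+Δx.
(3) Compute the forward loss on x+Δx, back-propagate, and add the resulting gradients to those from step (1).
(4) Restore the embeddings to their state in step (1).
(5) Update the model parameters with the gradients accumulated in step (3).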
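The five steps can be traced on a toy model. This NumPy sketch makes the following assumptions:

- A single embedding vector `x` feeds a logistic-regression classifier p = sigmoid(w·x). This model is chosen only because its gradients have a closed form. MABAGP applies the same steps to the MacBERT embedding matrix inside a deep network, where a real implementation would obtain gradients by automatic differentiation.
- `epsilon` (ε) and the learning rate `lr` are arbitrary illustrative values.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss_and_grads(x, w, y):
    """Logistic loss of the toy model p = sigmoid(w . x) and its gradients w.r.t. w and x."""
    p = sigmoid(w @ x)
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    dz = p - y                      # dL/d(w . x)
    return loss, dz * x, dz * w     # loss, dL/dw, dL/dx


def fgm_step(x, w, y, epsilon=1.0, lr=0.1):
    """One FGM training step; x is perturbed in place and restored before returning."""
    # (1) clean forward/backward pass
    _, grad_w, grad_x = loss_and_grads(x, w, y)
    # (2) back up the embedding, then add delta_x = epsilon * g / ||g||_2
    backup = x.copy()
    norm = np.linalg.norm(grad_x)
    if norm > 0:
        x += epsilon * grad_x / norm
    # (3) adversarial forward/backward pass; accumulate onto the gradients of step (1)
    adv_loss, adv_grad_w, adv_grad_x = loss_and_grads(x, w, y)
    grad_w = grad_w + adv_grad_w
    grad_x = grad_x + adv_grad_x
    # (4) restore the embedding to its state in step (1)
    x[:] = backup
    # (5) gradient-descent update with the accumulated gradients
    return x - lr * grad_x, w - lr * grad_w, adv_loss


x = np.array([0.5, -0.2, 0.1])   # toy "embedding"
w = np.array([0.3, 0.8, -0.5])   # toy classifier weights
clean_loss, _, _ = loss_and_grads(x, w, 1)
new_x, new_w, adv_loss = fgm_step(x, w, 1)
print(clean_loss < adv_loss)     # True: the perturbation points uphill on the loss
```

Because the perturbation follows the normalized loss gradient, it is the small change of size ε that increases the loss fastest to first order. Training on both passes therefore penalizes sensitivity in exactly that direction.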
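A BiLSTM serves as the feature extractor for the character vectors and captures contextual semantics in medical text. LSTM improves on the recurrent neural network: unlike a conventional RNN, it uses gating mechanisms to selectively store and forget input information. The gates give LSTM stronger memory and learning ability, so it performs better on long sequences and alleviates the vanishing-gradient problem of RNNs. The LSTM unit is shown in Figure 4.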
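The LSTM processes the input sequence from left to right, one time step at a time. The hidden state $\boldsymbol{h}_t$ and cell state $\boldsymbol{c}_t$ at each time step are computed as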
$\boldsymbol{i}_{t}=\sigma\left(\boldsymbol{W}_{i x} \boldsymbol{x}_{t}+\boldsymbol{W}_{i h} \boldsymbol{h}_{t-1}+\boldsymbol{b}_{i}\right)$
$\boldsymbol{f}_{t}=\sigma\left(\boldsymbol{W}_{f x} \boldsymbol{x}_{t}+\boldsymbol{W}_{f h} \boldsymbol{h}_{t-1}+\boldsymbol{b}_{f}\right)$
$\boldsymbol{o}_{t}=\sigma\left(\boldsymbol{W}_{o x} \boldsymbol{x}_{t}+\boldsymbol{W}_{o h} \boldsymbol{h}_{t-1}+\boldsymbol{b}_{o}\right)$
$\tilde{\boldsymbol{c}}_{t}=\tanh \left(\boldsymbol{W}_{\boldsymbol{c x}} \boldsymbol{x}_{t}+\boldsymbol{W}_{\boldsymbol{c h}} \boldsymbol{h}_{t-1}+\boldsymbol{b}_{\boldsymbol{c}}\right)$
$\boldsymbol{c}_{t}=\boldsymbol{f}_{t} \odot \boldsymbol{c}_{t-1}+\boldsymbol{i}_{t} \odot \tilde{\boldsymbol{c}}_{t}$
$\boldsymbol{h}_{t}=\boldsymbol{o}_{t} \odot \tanh \left(\boldsymbol{c}_{t}\right)$
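where $\boldsymbol{i}_t$, $\boldsymbol{f}_t$, $\boldsymbol{o}_t$, and $\tilde{\boldsymbol{c}}_t$ are the input gate, the forget gate, the output gate, and the candidate state representing the current information, respectively. W denotes the corresponding weight matrices and b the biases, σ and tanh are the sigmoid and hyperbolic tangent functions, and ⊙ denotes the element-wise product.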
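A BiLSTM consists of two LSTM layers running in opposite directions. One processes the sequence from its start (forward), and the other from its end (backward). By considering both past and future context, the BiLSTM captures long-range dependencies in the sequence.

In medical text, the meaning of an entity is usually closely tied to the text both before and after it. The BiLSTM captures these bidirectional dependencies and can therefore identify medical entities more accurately. The BiLSTM output $\boldsymbol{h}_t$ at time t is computed as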
$\overrightarrow{\boldsymbol{h}}_{t}=\operatorname{LSTM}\left(\boldsymbol{x}_{t}, \overrightarrow{\boldsymbol{h}}_{t-1}\right)$
$\overleftarrow{\boldsymbol{h}}_{t}=\operatorname{LSTM}\left(\boldsymbol{x}_{t}, \overleftarrow{\boldsymbol{h}}_{t+1}\right)$
$\boldsymbol{h}_{t}=\left[\overrightarrow{\boldsymbol{h}}_{t} ; \overleftarrow{\boldsymbol{h}}_{t}\right]$
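where $\boldsymbol{x}_t$ is the input, $\overrightarrow{\boldsymbol{h}}_t$ is the hidden state of the forward LSTM, and $\overleftarrow{\boldsymbol{h}}_t$ is the hidden state of the backward LSTM. The final BiLSTM output $\boldsymbol{h}_t$ is the concatenation of the two.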
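A NumPy sketch of the LSTM gate equations and the bidirectional concatenation, written for readability rather than speed. The weight initialization (small Gaussian weights, zero biases) and the dimensions are illustrative assumptions. The parameter names mirror the subscripts in the equations.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_lstm(input_dim, hidden_dim, rng):
    """Toy parameters for one direction: W_{*x}, W_{*h}, b_* for gates i, f, o and candidate c."""
    p = {}
    for g in ("i", "f", "o", "c"):
        p[f"W_{g}x"] = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
        p[f"W_{g}h"] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        p[f"b_{g}"] = np.zeros(hidden_dim)
    return p


def lstm_cell(x_t, h_prev, c_prev, p):
    """One time step of the LSTM gate equations."""
    i = sigmoid(p["W_ix"] @ x_t + p["W_ih"] @ h_prev + p["b_i"])        # input gate
    f = sigmoid(p["W_fx"] @ x_t + p["W_fh"] @ h_prev + p["b_f"])        # forget gate
    o = sigmoid(p["W_ox"] @ x_t + p["W_oh"] @ h_prev + p["b_o"])        # output gate
    c_tilde = np.tanh(p["W_cx"] @ x_t + p["W_ch"] @ h_prev + p["b_c"])  # candidate state
    c = f * c_prev + i * c_tilde                                        # cell state
    h = o * np.tanh(c)                                                  # hidden state
    return h, c


def run_lstm(xs, p):
    """Run one LSTM left to right over xs (n x input_dim); return the n hidden states."""
    hidden_dim = p["b_i"].shape[0]
    h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
    states = []
    for x_t in xs:
        h, c = lstm_cell(x_t, h, c, p)
        states.append(h)
    return states


def bilstm(xs, p_fwd, p_bwd):
    """h_t = [forward h_t ; backward h_t] for every position t."""
    fwd = run_lstm(xs, p_fwd)
    bwd = run_lstm(xs[::-1], p_bwd)[::-1]   # backward pass, re-aligned to original positions
    return np.stack([np.concatenate([hf, hb]) for hf, hb in zip(fwd, bwd)])


rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 4))                 # 5 characters, 4-dim inputs
p_fwd, p_bwd = init_lstm(4, 3, rng), init_lstm(4, 3, rng)
H = bilstm(xs, p_fwd, p_bwd)
print(H.shape)  # (5, 6)
```

The backward outputs are reversed once more before concatenation. Without that step, position t would be paired with the backward state of position n-1-t.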
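Long sentences are common in Chinese EMR text. When extracting features, the BiLSTM struggles to capture long-distance dependencies and cannot determine how important each character in a sentence is for recognition. This weakens its semantic understanding of the record and, in turn, the accuracy of medical NER.

This paper therefore augments the BiLSTM module with an attention mechanism. By distributing attention, the model learns more from features closely related to medical entities and less from unrelated features. It also captures long-distance semantic features better, which leads to better medical entity recognition.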
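Let the BiLSTM output be $\boldsymbol{H}=\{\boldsymbol{h}_1, \boldsymbol{h}_2, \dots, \boldsymbol{h}_n\}$, where $\boldsymbol{h}_i$ is the i-th hidden state and n is the sequence length. First, compute the attention score $\boldsymbol{e}_i$, which measures the relevance between the current information and its context: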
$\boldsymbol{e}_{i}=\tanh \left(\boldsymbol{W}_{i} \boldsymbol{h}_{i}+\boldsymbol{b}_{i}\right)$
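In Eq. (13), $\boldsymbol{W}_i$ is a learnable weight matrix and $\boldsymbol{b}_i$ is a bias vector. The softmax function then normalizes the scores $\boldsymbol{e}_i$ to obtain the attention weights $\alpha_i$: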
$\boldsymbol{\alpha}_{i}=\frac{\mathrm{e}^{e_{i}}}{\sum_{j=1}^{n} \mathrm{e}^{e_{j}}}$
Finally, the BiLSTM outputs are weighted by α and summed to give the output of the attention module:
$\operatorname{Attention}(\boldsymbol{H})=\sum_{i=1}^{n} \alpha_{i} \boldsymbol{h}_{i}$
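A NumPy sketch of the attention pooling. One assumption makes it concrete: the weight and bias are shared across positions and map each $\boldsymbol{h}_i$ to a scalar score, so W is written here as a vector `w` and b as a scalar (the text indexes them by i). The softmax subtracts the maximum score before exponentiating, which leaves the weights unchanged but avoids overflow.

```python
import numpy as np


def attention_pool(H, w, b):
    """Attention over BiLSTM states H (n x d): scores, softmax weights, weighted sum.

    Assumes a shared weight vector w (d,) and scalar bias b, so each e_i is a scalar.
    """
    e = np.tanh(H @ w + b)           # e_i = tanh(w . h_i + b), one score per position
    alpha = np.exp(e - e.max())      # softmax numerator, shifted by max(e) for stability
    alpha = alpha / alpha.sum()      # alpha_i, sums to 1
    return alpha @ H, alpha          # sum_i alpha_i h_i, and the weights themselves


H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # three toy BiLSTM states
out, alpha = attention_pool(H, np.array([0.5, -0.5]), 0.0)
print(alpha.round(3), out.round(3))
```

With `w` set to zero every score is equal, and the module reduces to mean pooling of the BiLSTM states.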
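By this point the Chinese EMR text has been encoded by MacBERT-large, perturbed by adversarial training, and passed through the BiLSTM and attention layers for feature extraction. It must still be decoded to output the predicted entities.

The conditional random field (CRF) is the decoder commonly used in NER, but it does not handle the many nested medical entities in Chinese EMRs effectively. This paper therefore uses a global pointer network as the decoding layer. It decodes using features of both the head and the tail of each candidate entity. It predicts non-nested medical entities correctly and achieves better predictions on nested ones.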
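The global pointer network uses an attention-like scoring mechanism. It treats the recognition of multiple medical entity types as a multi-head mechanism in which each head recognizes one entity type.

Let a Chinese EMR text of length n be encoded as $\boldsymbol{x}=[\boldsymbol{v}_1, \boldsymbol{v}_2, \dots, \boldsymbol{v}_n]$. The transformations $\boldsymbol{q}_{i,\alpha}=\boldsymbol{W}_{q,\alpha}\boldsymbol{v}_i+\boldsymbol{b}_{q,\alpha}$ and $\boldsymbol{k}_{i,\alpha}=\boldsymbol{W}_{k,\alpha}\boldsymbol{v}_i+\boldsymbol{b}_{k,\alpha}$ produce the vector sequences $\boldsymbol{q}_\alpha=[\boldsymbol{q}_{1,\alpha}, \boldsymbol{q}_{2,\alpha}, \dots, \boldsymbol{q}_{n,\alpha}]$ and $\boldsymbol{k}_\alpha=[\boldsymbol{k}_{1,\alpha}, \boldsymbol{k}_{2,\alpha}, \dots, \boldsymbol{k}_{n,\alpha}]$. These sequences are used to recognize entities of type α, with the scoring function defined as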
$s_{\alpha}(i, j)=\boldsymbol{q}_{i, \alpha}^{\top} \boldsymbol{k}_{j, \alpha}$
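In Eq. (16), $s_\alpha(i,j)$ is the score that the continuous text span from position i to position j is an entity of type α. To exploit relative position information, rotary positional encoding (RoPE) is introduced: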
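$s_{\alpha}(i, j)=\left(\boldsymbol{R}_{i} \boldsymbol{q}_{i, \alpha}\right)^{\top}\left(\boldsymbol{R}_{j} \boldsymbol{k}_{j, \alpha}\right)=\boldsymbol{q}_{i, \alpha}^{\top} \boldsymbol{R}_{i}^{\top} \boldsymbol{R}_{j} \boldsymbol{k}_{j, \alpha}$
$\boldsymbol{q}_{i, \alpha}^{\top} \boldsymbol{R}_{i}^{\top} \boldsymbol{R}_{j} \boldsymbol{k}_{j, \alpha}=\boldsymbol{q}_{i, \alpha}^{\top} \boldsymbol{R}_{j-i} \boldsymbol{k}_{j, \alpha}$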
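where $\boldsymbol{R}_{i}^{\top}\boldsymbol{R}_{j}=\boldsymbol{R}_{j-i}$.

Separate projections per type would multiply the number of parameters by the number of entity types. To avoid this, a single shared scoring term $(\boldsymbol{W}_q\boldsymbol{h}_i)^{\top}(\boldsymbol{W}_k\boldsymbol{h}_j)$ handles span extraction for all entity types and can be reused when new types are added. A lightweight type-specific term handles classification. The scoring function becomes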
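The identity $\boldsymbol{R}_{i}^{\top}\boldsymbol{R}_{j}=\boldsymbol{R}_{j-i}$ can be checked numerically. The sketch uses the common RoPE construction, in which each consecutive coordinate pair is rotated by angle $i\theta_k$ with $\theta_k=10000^{-2k/d}$. The base 10000 and the pairing layout are assumptions taken from the usual RoPE formulation, not from this paper.

```python
import numpy as np


def rope(vec, pos, base=10000.0):
    """Apply R_pos: rotate each pair (vec[2k], vec[2k+1]) by angle pos * base**(-2k/d)."""
    d = vec.shape[-1]
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    theta = base ** (-np.arange(0, d, 2) / d)   # one frequency per coordinate pair
    cos, sin = np.cos(pos * theta), np.sin(pos * theta)
    x1, x2 = vec[0::2], vec[1::2]
    out = np.empty_like(vec)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out


def rope_score(q, k, i, j):
    """(R_i q)^T (R_j k), which should depend only on the offset j - i."""
    return rope(q, i) @ rope(k, j)


rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)
print(np.isclose(rope_score(q, k, 2, 7), rope_score(q, k, 10, 15)))  # True: same offset 5
```

Because each 2-D rotation satisfies $R(a)^{\top}R(b)=R(b-a)$, shifting both positions by the same amount leaves the score unchanged. This is how the span score comes to encode relative rather than absolute position.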
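$s_{\alpha}(i, j)=\left(\boldsymbol{W}_{q} \boldsymbol{h}_{i}\right)^{\top}\left(\boldsymbol{W}_{k} \boldsymbol{h}_{j}\right)+\boldsymbol{w}_{\alpha}^{\top}\left[\boldsymbol{h}_{i} ; \boldsymbol{h}_{j}\right]$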
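Replacing $\boldsymbol{h}_i$ with $[\boldsymbol{q}_i; \boldsymbol{k}_i]$, where $\boldsymbol{q}_i=\boldsymbol{W}_q\boldsymbol{h}_i$ and $\boldsymbol{k}_i=\boldsymbol{W}_k\boldsymbol{h}_i$, further reduces the number of parameters: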
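$s_{\alpha}(i, j)=\boldsymbol{q}_{i}^{\top} \boldsymbol{k}_{j}+\boldsymbol{w}_{\alpha}^{\top}\left[\boldsymbol{q}_{i} ; \boldsymbol{k}_{i} ; \boldsymbol{q}_{j} ; \boldsymbol{k}_{j}\right]$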
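The shared-plus-type-specific score can be computed for all spans and types at once with broadcasting. In this sketch:

- The encoder output H, the shared projections W_q and W_k, and the per-type rows of `W_type` (one row $\boldsymbol{w}_\alpha$ per type) are random placeholders.
- RoPE and bias terms are left out for brevity.
- Spans with i > j are masked with -inf, so only i ≤ j is scored.

```python
import numpy as np


def efficient_gp_scores(H, W_q, W_k, W_type):
    """s[alpha, i, j] = q_i^T k_j + w_alpha^T [q_i; k_i; q_j; k_j] for every type and span.

    H: (n, d) encoder outputs; W_q, W_k: (d, m) projections shared by all types;
    W_type: (num_types, 4m), row alpha is w_alpha.
    """
    q, k = H @ W_q, H @ W_k                      # (n, m) each
    m = q.shape[1]
    shared = q @ k.T                             # (n, n): q_i^T k_j, shared by all types
    qk = np.concatenate([q, k], axis=1)          # (n, 2m): [q_i; k_i]
    start_part = qk @ W_type[:, :2 * m].T        # (n, T): first half of w_alpha on [q_i; k_i]
    end_part = qk @ W_type[:, 2 * m:].T          # (n, T): second half of w_alpha on [q_j; k_j]
    scores = shared[None] + start_part.T[:, :, None] + end_part.T[:, None, :]
    n = H.shape[0]
    scores[:, np.tril(np.ones((n, n), dtype=bool), k=-1)] = -np.inf   # drop spans with i > j
    return scores


rng = np.random.default_rng(1)
n, d, m, T = 4, 5, 3, 2
H = rng.normal(size=(n, d))
W_q, W_k = rng.normal(size=(d, m)), rng.normal(size=(d, m))
W_type = rng.normal(size=(T, 4 * m))
S = efficient_gp_scores(H, W_q, W_k, W_type)
print(S.shape)  # (2, 4, 4)
```

Only `W_type` grows with the number of entity types (4m values per type). The n×n extraction term `shared` is computed once and reused by every type.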
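Given this position-aware, optimized scoring function $s_\alpha(i,j)$, the model is trained with a multi-label categorical cross-entropy loss in logsumexp form, which balances the positive and negative classes automatically: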
$\begin{aligned} L= & \log \left[1+\sum_{(i, j) \in P_{\alpha}} \mathrm{e}^{-s_{\alpha}(i, j)}\right]+ \\ & \log \left[1+\sum_{(i, j) \in Q_{\alpha}} \mathrm{e}^{s_{\alpha}(i, j)}\right] \end{aligned}$
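In Eq. (21), $P_\alpha$ is the set of start-end pairs of all type-α entities in an EMR sample. $Q_\alpha$ is the set of start-end pairs of all spans that are non-entities or entities of other types. Only pairs with i ≤ j are considered. At decoding time, every text span with $s_\alpha(i,j)>0$ is output as a medical entity of type α, which completes NER on the Chinese EMR.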
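A NumPy sketch of the loss for a single entity type and of the threshold-zero decoding rule. The loss uses the natural logarithm and computes log(1 + Σeˣ) stably through `np.logaddexp`. The gold-span mask, the toy scores, and the type label `bod` are illustrative assumptions.

```python
import numpy as np


def log1p_sum_exp(x):
    """Stable log(1 + sum(exp(x))); returns 0.0 for an empty set."""
    if x.size == 0:
        return 0.0
    return float(np.logaddexp(0.0, np.logaddexp.reduce(x)))


def gp_loss(scores, gold_mask):
    """Loss for one type alpha: log(1 + sum_P e^{-s}) + log(1 + sum_Q e^{s}), over i <= j.

    scores: (n, n) span scores s_alpha(i, j); gold_mask: (n, n) bool, True on spans in P_alpha.
    """
    n = scores.shape[0]
    upper = np.triu(np.ones((n, n), dtype=bool))   # keep only i <= j
    pos = scores[upper & gold_mask]                # P_alpha
    neg = scores[upper & ~gold_mask]               # Q_alpha
    return log1p_sum_exp(-pos) + log1p_sum_exp(neg)


def decode(scores, type_names):
    """Output (i, j, type) for every span with i <= j and s_alpha(i, j) > 0."""
    spans = []
    for a, name in enumerate(type_names):
        rows, cols = np.nonzero(np.triu(scores[a] > 0))
        spans.extend((int(i), int(j), name) for i, j in zip(rows, cols))
    return spans


scores = np.array([[[2.0, -1.0, -3.0],
                    [0.0, -2.0, 1.5],
                    [0.0, 0.0, -4.0]]])            # one type, three characters (toy values)
gold = np.zeros((3, 3), dtype=bool)
gold[0, 0] = gold[1, 2] = True
print(gp_loss(scores[0], gold))
print(decode(scores, ["bod"]))                     # [(0, 0, 'bod'), (1, 2, 'bod')]
```

The loss pushes every gold span's score above 0 and every other span's score below 0. This is why 0 serves as the decision threshold at decoding time, with no per-type threshold to tune.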
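To verify the effectiveness of the proposed model for Chinese EMR NER, experiments were designed and analyzed on three datasets:

- CCKS2019 comes from the 2019 China Conference on Knowledge Graph and Semantic Computing. It was manually compiled by Yidu Cloud based on the distribution of real patient records and contains 1 000 training samples and 379 test samples.
- CMeEE[21] comes from CBLUE[22], the Chinese biomedical language understanding evaluation benchmark. It contains 15 000 training, 5 000 validation, and 3 000 test samples.
- CMeEE-V2 is a revised version of CMeEE that mainly fixes annotation errors in the original data to improve corpus quality; the dataset distribution is unchanged.

Both CMeEE and CMeEE-V2 contain nested medical entities, so they are used to verify the effectiveness of the proposed method on nesting. The distribution of the datasets and the evaluation metrics are shown in Table 1.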
F1 is the main metric for evaluating NER performance on the three datasets, with precision P and recall R as auxiliary metrics; F1 balances precision and recall. The metrics are computed as
$P=\frac{n_{p}}{n_{p}+n_{t}}$
$R=\frac{n_{p}}{n_{c}}$
$F_{1}=\frac{2 P R}{P+R} \times 100 \%$
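where $n_p$ is the number of correctly recognized EMR entities, $n_t$ is the number of incorrectly recognized entities, and $n_c$ is the total number of entities in the corpus to be recognized.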
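The three formulas can be computed from sets of predicted and gold entities. This sketch assumes exact matching, which is usual for entity-level evaluation but not stated in the text: a prediction is correct only when its start, end, and type all match a gold entity, and $n_t$ counts the predictions that match nothing. The spans and type labels below are hypothetical.

```python
def entity_prf(predicted, gold):
    """Entity-level P, R and F1 with exact (start, end, type) matching.

    n_p: correctly recognized entities, n_t: wrongly recognized ones,
    n_c: all gold entities; F1 is returned in percent as in the formula above.
    """
    predicted, gold = set(predicted), set(gold)
    n_p = len(predicted & gold)
    n_t = len(predicted - gold)
    n_c = len(gold)
    p = n_p / (n_p + n_t) if n_p + n_t else 0.0
    r = n_p / n_c if n_c else 0.0
    f1 = 2 * p * r / (p + r) * 100 if p + r else 0.0
    return p, r, f1


# character offsets in "显微镜下见神经细胞空泡形成"; type labels are illustrative
gold = [(0, 2, "equ"), (5, 10, "sym"), (5, 8, "bod")]
pred = [(0, 2, "equ"), (5, 10, "sym")]      # the nested span (5, 8) is missed
print(entity_prf(pred, gold))               # P = 1.0, R = 2/3, F1 close to 80
```

Missing a nested entity costs recall but not precision, which is why a decoder that cannot output nested spans is penalized mainly through R.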
The experimental environment is listed in Table 2 and the experimental parameters in Table 3.
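To compare the roles of global-pointer decoding and CRF decoding in the model, experiments were run on two datasets: CCKS2019, a Chinese EMR dataset without nested entities, and CMeEE, an EMR dataset with nested entities.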
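Table 4 compares the two decoders with all other modules of the proposed model unchanged. Global-pointer decoding improves F1 over CRF by 0.33% on CCKS2019 (no nested entities) and by 1.11% on CMeEE (with nested entities).

The global pointer network takes a global view. It considers the start and end positions of a medical entity together, treating each head-tail pair as a single unit when predicting both non-nested and nested entities. As a result:

- on non-nested entities it matches or slightly exceeds CRF;
- on nested entities it performs clearly better;
- both its training and its prediction can be parallelized.

These results indicate that the global pointer network is an effective decoder for the model.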
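To compare how well the global pointer network and CRF recognize nested medical entities in real EMR samples, test sentences were randomly selected from a Chinese EMR dataset without nested entities and from one with nested entities. The predictions are shown in Table 5.

- Flat entities only. For the text “患者3月余前于我院诊断为直肠癌” (“the patient was diagnosed with rectal cancer at our hospital more than 3 months ago”), both CRF and Global Pointer decoding correctly identify “直肠癌” (rectal cancer) as a disease-and-diagnosis entity, matching the ground truth.
- Flat and nested entities. For “显微镜下见神经细胞空泡形成” (“vacuole formation in nerve cells seen under the microscope”), CRF decoding identifies “显微镜” (microscope) as a medical equipment entity and “神经细胞空泡” (nerve cell vacuoles) as a clinical manifestation entity. However, it misses the nested body entity “神经细胞” (nerve cells). Global Pointer decoding correctly identifies all three entities, matching the ground truth.

This example illustrates the effectiveness of the global pointer network as a decoder and its ability to recognize nested medical entities.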
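A small encoding exercise shows why a CRF that emits one tag per character cannot output both “神经细胞空泡” and the nested “神经细胞”. The sketch assumes a BIO tag scheme for the CRF decoder, which the text does not specify, and uses illustrative type labels. A per-type span matrix, like the one the global pointer network scores, stores both spans without conflict.

```python
def spans_to_bio(n, spans):
    """Encode (start, end, type) spans as one BIO tag per character.

    Raises ValueError on overlapping spans: a single tag sequence cannot hold nested entities.
    """
    tags = ["O"] * n
    for start, end, etype in sorted(spans):
        if any(t != "O" for t in tags[start:end + 1]):
            raise ValueError(f"span {(start, end, etype)} overlaps another entity")
        tags[start] = "B-" + etype
        for k in range(start + 1, end + 1):
            tags[k] = "I-" + etype
    return tags


def spans_to_matrices(n, spans, types):
    """One n x n 0/1 matrix per entity type; cell [start][end] marks an entity."""
    mats = {t: [[0] * n for _ in range(n)] for t in types}
    for start, end, etype in spans:
        mats[etype][start][end] = 1
    return mats


text = "显微镜下见神经细胞空泡形成"
gold = [(0, 2, "equ"), (5, 10, "sym"), (5, 8, "bod")]   # 显微镜 / 神经细胞空泡 / 神经细胞
mats = spans_to_matrices(len(text), gold, ["equ", "sym", "bod"])
print(mats["sym"][5][10], mats["bod"][5][8])          # 1 1: both nested spans are kept
try:
    spans_to_bio(len(text), gold)
except ValueError as err:
    print("BIO cannot encode:", err)
```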
To examine how different pre-trained encoders affect the EMR NER model, pre-trained models commonly used in NER were compared on the Chinese EMR dataset CMeEE-V2.
Table 6 shows the following.
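(1) RoBERTa-wwm-ext obtains word-level representations through whole word masking. It achieves an F1 of 73.75% on CMeEE-V2, outperforming RoBERTa, which only learns character-level representations. MacBERT adds n-gram masking and similar-word replacement on top of whole word masking, which removes the mismatch between pre-training and downstream tasks. It achieves higher F1 scores than both models, and its design appears better suited to the highly specialized Chinese EMR data.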
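(2) Compared with the original BERT, RoBERTa uses dynamic masking, a larger pre-training corpus, longer training, and more pre-training epochs, and it achieves better recognition results. RoBERTa-wwm-ext-large and MacBERT-large use 24-layer Transformer encoders with more parameters; they improve F1 over their base versions by 0.94% and 0.98%, respectively.

These results suggest that, on EMR text, pre-trained models with more training data, more complex architectures, and more parameters tend to perform better. This paper therefore uses MacBERT-large, which has the highest F1, as the embedding layer, giving the best overall recognition performance.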
To verify the effectiveness of the proposed Chinese EMR NER model, it was compared on the three Chinese EMR datasets with common baselines and recent mainstream NER models: BiLSTM-CRF, BERT-BiLSTM-CRF, Global Pointer[19], Deep Biaffine[23], and W2NER[24].
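The results are shown in Table 7.

- BiLSTM-CRF models the text sequence with a bidirectional LSTM to extract contextual information and decodes with CRF.
- BERT-BiLSTM-CRF replaces the randomly initialized character embeddings of BiLSTM-CRF with BERT, improving F1 by 3.26%, 4.52%, and 3.46% on the three datasets. It serves as the baseline for comparison.
- Deep Biaffine uses a biaffine structure to model dependency relations between tokens, which identifies entities indirectly. It improves slightly on the baseline.
- Global Pointer applies global normalization to NER. It uses an attention-like scoring mechanism to produce the final labeling matrix, considers features at both the head and the tail of each span, and adds rotary position encoding. It improves clearly on the baseline on the datasets with nested entities.
- W2NER recasts NER as classifying the relations between word pairs and can handle flat, overlapping, and discontinuous NER in a unified way. However, its F1 on CCKS2019 is 0.57% lower than the baseline, and its gains on CMeEE and CMeEE-V2 are only 0.11% and 0.57%. A possible reason is that, on Chinese EMR data consisting mostly of long texts, W2NER may predict non-flat entity types, which lowers its F1.

The proposed model MABAGP improves F1 over the baseline by 2.07%, 2.41%, and 4.92% on the three datasets. It also improves over Global Pointer, the best-performing mainstream model, by 1.8%, 1.37%, and 1.72%. These results verify the effectiveness and practicality of the proposed model for Chinese EMR NER.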
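The proposed method outperforms the other models on multiple evaluation datasets, and the effects of different decoders and pre-trained models have also been compared. To further examine how adversarial training (AT) and the attention mechanism (Attention) affect recognition, an ablation study was run on CCKS2019, a Chinese EMR dataset consisting mostly of flat medical entities. MacBERT-BiLSTM-GP serves as the baseline (Baseline), to which adversarial training samples and the attention mechanism are added separately.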
Table 8 shows the following.
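(1) Comparing Baseline with Baseline+AT shows that adding adversarial training samples alone improves recognition, raising F1 by 0.4% over the baseline. To counter annotation noise in EMR data, FGM generates adversarial perturbations and adds them to the original EMR text vectors. This forces the model to learn more robust and generalizable feature representations, which improves its performance on unseen data and yields better recognition.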
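(2) Comparing Baseline with Baseline+Attention shows that adding the attention mechanism alone improves recognition, raising F1 by about 0.45%. Chinese EMR sentences are usually long, and the BiLSTM cannot capture long-distance features when extracting context. The attention mechanism captures these long-distance features, allocating more attention to features related to medical named entities and less to unrelated ones. This strengthens the latent semantic association between the current information and its context and improves the accuracy of medical NER.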
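(3) Comparing Baseline with Baseline+AT+Attention (i.e., MABAGP) shows that adding both adversarial training and attention raises F1 by about 0.6%. This confirms the effectiveness of each module, and this configuration gives the best recognition performance.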
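To address annotation noise and the difficulty of handling nested entities in Chinese EMR NER, this paper proposes the NER model MABAGP. The model works as follows:

- The pre-trained MacBERT-large model converts the input EMR text into context-aware dynamic vectors.
- FGM adversarial training adds perturbations to the original vectors to generate adversarial samples.
- The original and adversarial samples are fed into a BiLSTM to capture the contextual semantics of the record.
- An attention mechanism strengthens long-distance semantic features.
- A global pointer network decodes using features of both the head and the tail of each span.

Comparative and ablation experiments on three Chinese EMR datasets demonstrate the effectiveness of the model: it handles nested medical entities and improves Chinese EMR NER. There is still room for improvement: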
(1) A BERT model pre-trained on medical-domain data could be used to achieve better results.
(2) FGM adversarial training improved recognition; other recent adversarial training methods could be compared to select the best one.
(3) The proposed model could be applied to NER in other domains to verify its generalization ability.
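Funding:
  • National Key R&D Program of China (2021YFB2600600)
  • University-level Research Project of Beijing Information Science and Technology University (2023XJJ15)
  • University-level Research Project of Beijing Information Science and Technology University (2023XJJ17)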
References
[1]
Du Jinhua, Yin Hao, Feng Song. Research and progress on named entity recognition of Chinese electronic medical records[J]. Acta Electronica Sinica, 2022, 50(12): 3030-3053.
[2]
Zhou Dongdong. Research on named entity recognition of Chinese electronic medical records[D]. Daqing: Northeast Petroleum University, 2024.
[3]
Zhao Jigui, Qian Yurong, Wang Kui, et al. A review of research on Chinese named entity recognition[J]. Computer Engineering and Applications, 2024, 60(1): 15-27.
[4]
Gao Y, Gu L, Wang Y, et al. Constructing a Chinese electronic medical record corpus for named entity recognition on resident admit notes[J]. BMC Medical Informatics and Decision Making, 2019, 19(2): 67-78.
[5]
Ji Xurui, Wei Dejian, Zhang Junzhong, et al. A review of research on information extraction methods for Chinese electronic medical records[J]. Computer Engineering and Science, 2024, 46(2): 325-337.
[6]
Liu Hao, Zhang Jianye, Lü Zhangcheng, et al. Entity recognition for knowledge graph construction in CNC machine tool design[J]. Science Technology and Engineering, 2023, 23(13): 5655-5661.
[7]
Zhao Hui, Pang Haiting, Feng Shanshan, et al. Overview of Chinese named entity recognition technology[J]. Journal of Changchun University of Technology, 2021, 42(5): 444-450.
[8]
Yin M W, Mou C J, Xiong K N, et al. Chinese clinical named entity recognition with radical-level feature and self-attention mechanism[J]. Journal of Biomedical Informatics, 2019, 98: 103289.
[9]
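Devlin J, Chang M W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis: Association for Computational Linguistics, 2019: 4171-4186.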
[10]
Chen Chen, Wu Fenlin. Named entity recognition of electronic medical records based on BERT[J]. Automation and Instrumentation, 2021, 41(3): 173-176.
[11]
Kong Lingwei, Zhu Yanhui, Zhang Xu, et al. Chinese electronic medical record named entity recognition based on adversarial training[J]. Journal of Hunan University of Technology, 2022, 36(3): 36-43.
[12]
Chen Na, Sun Yanqiu, Yan Yan. BERT-BiGRU-CRF Chinese electronic medical record named entity recognition combined with attention mechanism[J]. Journal of Chinese Computer Systems, 2023, 44(8): 1680-1685.
[13]
Li Yang, Cai Hongzhen, Xing Linlin, et al. Named entity recognition in the field of composite material detection based on adversarial transfer[J]. Science Technology and Engineering, 2022, 22(30): 13370-13377.
[14]
Jiang Liyuan, Wu Yadong, Wang Shuhang, et al. A named entity recognition method based on fusion of stroke features[J]. Science Technology and Engineering, 2023, 23(17): 7436-7443.
[15]
Zhao Zhenzhen, Dong Yanru, Liu Jing, et al. Medical named entity recognition based on fusion of word information and graph attention[J]. Computer Engineering and Applications, 2024, 60(11): 147-155.
[16]
Lu Xintao, Sun Liping, Ling Chen, et al. Chinese electronic medical record named entity recognition incorporating pinyin and part-of-speech features[J/OL]. Journal of Chinese Computer Systems: 1-12[2024-03-06]. http://kns.cnki.net/kcms/detail/21.1106.TP.20240228.1116.013.html.
[17]
Cui Y M, Che W X, Liu T, et al. Revisiting pre-trained models for Chinese natural language processing[C]// Findings of the Association for Computational Linguistics: EMNLP 2020. Online: Association for Computational Linguistics, 2020: 657-668.
[18]
Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks[J/OL]. Computer Science, 2013. https://arxiv.org/abs/1312.6199.
[19]
Su J L, Murtadha A, Pan S F, et al. Global pointer: novel efficient span-based approach for named entity recognition[J]. arXiv:2208.03054, 2022.
[20]
Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples[EB/OL]. http://arxiv.org/abs/1412.6572.
[21]
Hong Y Z, Wen X L, Kun L Z, et al. Building a pediatric medical corpus: word segmentation and named entity annotation[C]// Workshop on Chinese Lexical Semantics. Cham: Springer, 2020: 652-664.
[22]
Zhang N, Chen M, Bi Z, et al. CBLUE: a Chinese biomedical language understanding evaluation benchmark[J]. arXiv preprint arXiv: 2106.08087, 2021.
[23]
Wang X, Zhang Y, Ren X, et al. Cross-type biomedical named entity recognition with deep multi-task learning[J]. Bioinformatics, 2019, 35(10): 1745-1752.
[24]
Li J, Fei H, Liu J, et al. Unified named entity recognition as word-word relation classification[C]// Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 36(10): 10965-10973.
2025, Vol. 25, No. 11
Article information
doi: 10.12404/j.issn.1671-1815.2403519
  • Received: 2024-05-13
  • Revised: 2024-08-01
  • Online first: 2025-07-09
  • Published: 2025-04-18
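Author information
    School of Computer Science, Beijing Information Science and Technology University, Beijing 100101

Corresponding author:

* Guo Dongdong (1990-), male, Han nationality, from Datong, Shanxi, Ph.D. Research interest: natural language processing. E-mail: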
https://castjournals.cast.org.cn/joweb/kxjsygc/CN/10.12404/j.issn.1671-1815.2403519