Article(id=1269679024126194589, tenantId=1146029695717560320, journalId=1269656373470969926, issueId=1269678996485734867, articleNumber=null, orderNo=null, doi=10.3969/j.issn.1008-0821.2026.03.004, pmid=null, cstr=null, oa=null, hot=null, price=null, onlineType=0, articleFormat=0, articleType=null, articleTypeStr=null, receivedDate=1767888000000, receivedDateStr=2026-01-09, revisedDate=null, revisedDateStr=null, acceptedDate=null, acceptedDateStr=null, onlineDate=1780644668846, onlineDateStr=2026-06-05, pubDate=1772294400000, pubDateStr=2026-03-01, doiRegisterDate=null, doiRegisterDateStr=null, onlineIssueDate=1780644668846, onlineIssueDateStr=2026-06-05, onlineJustAcceptDate=null, onlineJustAcceptDateStr=null, onlineFirstDate=null, onlineFirstDateStr=null, sourceXml=null, magXml=null, createTime=1780644668846, creator=13701087609, updateTime=1780644668846, updator=13701087609, issue=Issue{id=1269678996485734867, tenantId=1146029695717560320, journalId=1269656373470969926, year='2026', volume='46', issue='3', pageStart='3', pageEnd='183', issueExtLink='null', onlineDate='null', pubDate='null', beforeIssueId=null, nextIssueId=null, price=null, status=1, issueComplete=1, articleOrder=1, issueType=1, specialIssue=null, createTime=1780644662255, creator=13701087609, updateTime=1780644725097, updator=13701087609, preIssue=null, nextIssue=null, ext={EN=IssueExt(id=1269679260173234368, tenantId=1146029695717560320, journalId=1269656373470969926, issueId=1269678996485734867, language=EN, specialIssueTitle=, coverIllustrator=null, specialIssueEditor=, specialIssueAbout=), CN=IssueExt(id=1269679260173234369, tenantId=1146029695717560320, journalId=1269656373470969926, issueId=1269678996485734867, language=CN, specialIssueTitle=, coverIllustrator=null, specialIssueEditor=, specialIssueAbout=)}, issueFiles=null}, startPage=44, endPage=55, ext={EN=ArticleExt(id=1269679024365269919, articleId=1269679024126194589, tenantId=1146029695717560320, journalId=1269656373470969926, 
language=EN, title=Named Entity Recognition in Beiyang Government Documents Resources Using Large Language Models, columnId=1269679024180728413, journalTitle=Journal of Modern Information, columnName=DATA INTELLIGENCE and KNOWLEDGE SERVICE, runingTitle=null, highlight=null, articleAbstract=
Purpose/Significance

To address the challenges that linguistic complexity, diversity, and scarce annotated data pose for named entity recognition (NER) in Beiyang Government document resources, this paper proposes a large language model-based NER framework adapted to low-resource scenarios. The framework provides methodological support for the structured mining and knowledge reorganization of modern historical documents.

Methods/Process

The framework integrated retrieval-augmented generation (RAG) with parameter-efficient fine-tuning. It used Faiss vector retrieval to select in-context examples dynamically and applied the LoRA strategy to inject domain knowledge into large language models. On a self-built corpus, the study defined seven entity types: persons, places, organizations, times, positions, events, and document types. It then compared two deep learning NER baselines, BERT-BiLSTM-CRF and RoBERTa-BiLSTM-CRF, with Baichuan-4B, DeepSeek-R1, Xunzi-Qwen3-8B, Qwen3-4B, Llama, and GPT-4, and evaluated large language model performance under different sampling strategies.
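
The similarity-based selection of in-context examples can be illustrated with a minimal sketch. It is not the authors' implementation: character-bigram counts and a brute-force cosine ranking stand in for learned embeddings and a Faiss index, and the example pool, the query, and the names `select_examples` and `build_prompt` are hypothetical, introduced only for illustration.

```python
import math
from collections import Counter


def bigram_vector(text):
    """Bag of character bigrams; a stand-in for a learned sentence embedding."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_examples(query, pool, k=2):
    """Return the k annotated examples most similar to the query text."""
    query_vec = bigram_vector(query)
    ranked = sorted(
        pool,
        key=lambda ex: cosine(query_vec, bigram_vector(ex["text"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, examples):
    """Assemble a few-shot prompt from the retrieved examples and the query."""
    lines = ["Extract the named entities from the text."]
    for ex in examples:
        lines.append(f"Text: {ex['text']}")
        lines.append(f"Entities: {ex['entities']}")
    lines.append(f"Text: {query}")
    lines.append("Entities:")
    return "\n".join(lines)


# Hypothetical annotated pool; not taken from the paper's corpus.
POOL = [
    {"text": "The Ministry of Finance issued an order in March.",
     "entities": [("Ministry of Finance", "organization"), ("order", "document type")]},
    {"text": "The governor arrived in Tianjin by train.",
     "entities": [("governor", "position"), ("Tianjin", "place")]},
    {"text": "The Ministry of Education issued a notice in May.",
     "entities": [("Ministry of Education", "organization"), ("notice", "document type")]},
]
QUERY = "The Ministry of Justice issued an order in June."

if __name__ == "__main__":
    print(build_prompt(QUERY, select_examples(QUERY, POOL, k=2)))
```

In a real pipeline the ranking step would presumably be replaced by a nearest-neighbour search over dense embeddings, which is the role Faiss plays in the framework.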

Result/Conclusion

Experiments show that, compared with traditional deep learning models and general-purpose large language models, the paradigm combining LoRA fine-tuning with RAG significantly improves entity recognition performance, achieving an overall F1 score of 0.857. The framework, which pairs RAG with fine-tuned large language models, identifies entities in Beiyang Government document resources accurately. This indicates that large language models are practical and scalable for processing historical documents in low-resource settings.

, correspAuthors=Jun Deng, authorNote=null, correspAuthorsNote=null, copyrightStatement=null, copyrightOwner=null, extLink=null, articleAbsUrl=null, sourceXml=null, magXml=null, pdfUrl=null, pdf=null, pdfFileSize=null, pdfExtLink=null, richHtmlUrl=null, mobilePdfUrl=null, reviewReport=null, pdfFirstPage=null, abstractGraph=null, abstractGraphContent=null, abstractVideo=null, citation=null, cebUrl=null, magXmlContent=null, mapNumber=null, authorCompany=null, fund=null, authors=null, authorsList=Jun Deng, Zishu Zhang, Yubing Pan, Dongyu Ye, Yanyu Chang), CN=ArticleExt(id=1269679026785383344, articleId=1269679024126194589, tenantId=1146029695717560320, journalId=1269656373470969926, language=CN, title=基于大语言模型的北洋政府文书资源命名实体识别研究, columnId=1269679024457544608, journalTitle=现代情报, columnName=数据智能与知识服务, runingTitle=null, highlight=null, articleAbstract=
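Purpose/Significance

To address the difficulty of named entity recognition in Beiyang Government document resources, which stems from linguistic complexity, diversity, and a lack of annotated data, this paper proposes a large language model-based named entity recognition framework suited to low-resource scenarios, providing methodological support for the structured mining and knowledge reorganization of modern historical documents.

Methods/Process

The framework combines retrieval-augmented generation with parameter-efficient fine-tuning: it uses Faiss vector retrieval to build a dynamic in-context example selection mechanism and injects domain knowledge into large language models through the LoRA strategy. Finally, on a self-built corpus, it systematically evaluates deep learning baseline models and large language models under different sampling strategies.

Results/Conclusions

The results show that the Qwen3-4B model combining similarity-based example selection with LoRA fine-tuning performs best, reaching an overall F1 score of 0.857 and achieving accurate entity recognition for Beiyang Government documents, which verifies the practicality and scalability of large language models for low-resource historical document processing.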

, correspAuthors=邓君, authorNote=null, correspAuthorsNote=
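Jun Deng (邓君, b. 1977), female, professor, PhD, doctoral supervisor; research interests: digital information resource management, digital humanities and knowledge services, and archival management and applications.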
, copyrightStatement=null, copyrightOwner=null, extLink=null, articleAbsUrl=null, sourceXml=I1vH7LH5W/WIFAd8NBF9VQ==, magXml=7TChAQN1VNFLD4jNcVK+dg==, pdfUrl=null, pdf=UeR60b5urQ27375P3GDpxw==, pdfFileSize=4404324, pdfExtLink=null, richHtmlUrl=null, mobilePdfUrl=null, reviewReport=null, pdfFirstPage=null, abstractGraph=74y7HDiDB9h1Hzj94EypXg==, abstractGraphContent=null, abstractVideo=null, citation=null, cebUrl=null, magXmlContent=JAvGA6qzt/eGnhM5gCEnsQ==, mapNumber=null, authorCompany=null, fund=null, authors=

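Zishu Zhang (张子姝, b. 1995), female, doctoral student; research interests: digital humanities and knowledge services.

Yubing Pan (潘禹兵, b. 2003), female, master's student; research interests: digital humanities and knowledge services.

Dongyu Ye (叶东宇, b. 1996), male, doctoral student; research interests: natural language processing.

Yanyu Chang (常严予, b. 1998), female, doctoral student; research interests: digital humanities and knowledge services.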

, authorsList=邓君, 张子姝, 潘禹兵, 叶东宇, 常严予)}, authors=[Author(id=1269679027196425142, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, orderNo=0, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=1, authorType=1, ext={EN=AuthorExt(id=1269679027267728313, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, authorId=1269679027196425142, language=EN, stringName=Jun Deng, firstName=Jun, middleName=null, lastName=Deng, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=*, address=1School of Business and Management,Jilin University,Changchun130012,China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1269679027498415034, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, authorId=1269679027196425142, language=CN, stringName=邓君, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=*, address=1吉林大学商学与管理学院,吉林长春130012, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1269679027095761842, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, xref=1, ext=[AuthorCompanyExt(id=1269679027104150451, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, companyId=1269679027095761842, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1School of Business and Management,Jilin University,Changchun130012,China), AuthorCompanyExt(id=1269679027116733364, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, 
companyId=1269679027095761842, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1吉林大学商学与管理学院,吉林长春130012)])]), Author(id=1269679027573912508, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, orderNo=1, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1269679027922039742, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, authorId=1269679027573912508, language=EN, stringName=Zishu Zhang, firstName=Zishu, middleName=null, lastName=Zhang, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=1School of Business and Management,Jilin University,Changchun130012,China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1269679027989148607, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, authorId=1269679027573912508, language=CN, stringName=张子姝, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=1吉林大学商学与管理学院,吉林长春130012, bio={"content":"

Zishu Zhang (张子姝, b. 1995), female, doctoral student; research interests: digital humanities and knowledge services.

"}, bioImg=null, bioContent=

Zishu Zhang (张子姝, b. 1995), female, doctoral student; research interests: digital humanities and knowledge services.

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1269679027095761842, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, xref=1, ext=[AuthorCompanyExt(id=1269679027104150451, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, companyId=1269679027095761842, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1School of Business and Management,Jilin University,Changchun130012,China), AuthorCompanyExt(id=1269679027116733364, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, companyId=1269679027095761842, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1吉林大学商学与管理学院,吉林长春130012)])]), Author(id=1269679028349858753, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, orderNo=2, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1269679028748317635, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, authorId=1269679028349858753, language=EN, stringName=Yubing Pan, firstName=Yubing, middleName=null, lastName=Pan, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=1School of Business and Management,Jilin University,Changchun130012,China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1269679028807037892, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, authorId=1269679028349858753, language=CN, stringName=潘禹兵, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, 
affiliation=null, department=null, xref=null, address=1吉林大学商学与管理学院,吉林长春130012, bio={"content":"

Yubing Pan (潘禹兵, b. 2003), female, master's student; research interests: digital humanities and knowledge services.

"}, bioImg=null, bioContent=

Yubing Pan (潘禹兵, b. 2003), female, master's student; research interests: digital humanities and knowledge services.

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1269679027095761842, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, xref=1, ext=[AuthorCompanyExt(id=1269679027104150451, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, companyId=1269679027095761842, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1School of Business and Management,Jilin University,Changchun130012,China), AuthorCompanyExt(id=1269679027116733364, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, companyId=1269679027095761842, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1吉林大学商学与管理学院,吉林长春130012)])]), Author(id=1269679028886729670, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, orderNo=3, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1269679029171942344, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, authorId=1269679028886729670, language=EN, stringName=Dongyu Ye, firstName=Dongyu, middleName=null, lastName=Ye, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=1School of Business and Management,Jilin University,Changchun130012,China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1269679029285188553, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, authorId=1269679028886729670, language=CN, stringName=叶东宇, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, 
affiliation=null, department=null, xref=null, address=1吉林大学商学与管理学院,吉林长春130012, bio={"content":"

Dongyu Ye (叶东宇, b. 1996), male, doctoral student; research interests: natural language processing.

"}, bioImg=null, bioContent=

Dongyu Ye (叶东宇, b. 1996), male, doctoral student; research interests: natural language processing.

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1269679027095761842, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, xref=1, ext=[AuthorCompanyExt(id=1269679027104150451, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, companyId=1269679027095761842, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1School of Business and Management,Jilin University,Changchun130012,China), AuthorCompanyExt(id=1269679027116733364, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, companyId=1269679027095761842, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1吉林大学商学与管理学院,吉林长春130012)])]), Author(id=1269679029645898699, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, orderNo=4, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1269679030010803149, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, authorId=1269679029645898699, language=EN, stringName=Yanyu Chang, firstName=Yanyu, middleName=null, lastName=Chang, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=1School of Business and Management,Jilin University,Changchun130012,China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1269679030090494926, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, authorId=1269679029645898699, language=CN, stringName=常严予, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, 
affiliation=null, department=null, xref=null, address=1吉林大学商学与管理学院,吉林长春130012, bio={"content":"

Yanyu Chang (常严予, b. 1998), female, doctoral student; research interests: digital humanities and knowledge services.

"}, bioImg=null, bioContent=

Yanyu Chang (常严予, b. 1998), female, doctoral student; research interests: digital humanities and knowledge services.

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1269679027095761842, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, xref=1, ext=[AuthorCompanyExt(id=1269679027104150451, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, companyId=1269679027095761842, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1School of Business and Management,Jilin University,Changchun130012,China), AuthorCompanyExt(id=1269679027116733364, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, companyId=1269679027095761842, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1吉林大学商学与管理学院,吉林长春130012)])])], keywords=[Keyword(id=1269679030472176591, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=EN, orderNo=1, keyword=beiyang government document resources), Keyword(id=1269679030841275344, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=EN, orderNo=2, keyword=large language model), Keyword(id=1269679030937744337, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=EN, orderNo=3, keyword=named entity recognition), Keyword(id=1269679031273288658, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=EN, orderNo=4, keyword=low-resource scenarios), Keyword(id=1269679031361369043, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=EN, orderNo=5, keyword=retrieval augmented generation), Keyword(id=1269679031684330452, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=EN, orderNo=6, keyword=LoRA fine-tuning), Keyword(id=1269679031764022229, 
tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=CN, orderNo=1, keyword=北洋政府文书资源), Keyword(id=1269679032107955158, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=CN, orderNo=2, keyword=大语言模型), Keyword(id=1269679032204424151, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=CN, orderNo=3, keyword=命名实体识别), Keyword(id=1269679032539968472, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=CN, orderNo=4, keyword=低资源场景), Keyword(id=1269679032636437465, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=CN, orderNo=5, keyword=检索增强生成), Keyword(id=1269679032959398874, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=CN, orderNo=6, keyword=LoRA微调)], refs=[Reference(id=1269679035870245875, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=1, rfOrder=0, authorNames=中华人民共和国中央人民政府, journalName=null, refType=null, unstructuredReference=中华人民共和国中央人民政府.《数字中国建设2025年行动方案》近日印发[EB/OL].[2026-02-24].https://www.gov.cn/lianbo/bumen/202505/content_7024041.htm., articleTitle=《数字中国建设2025年行动方案》近日印发, refAbstract=null), Reference(id=1269679035966714868, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2018, volume=36, issue=3, pageStart=57, pageEnd=64, url=null, language=null, rfNumber=2, rfOrder=1, authorNames=郑爽, journalName=大学图书馆学报, refType=null, unstructuredReference=郑爽.清末民初文言统一对阅读文化嬗变的影响[J].大学图书馆学报201836(3):57-64., articleTitle=清末民初文言统一对阅读文化嬗变的影响, refAbstract=null), Reference(id=1269679036071572469, tenantId=1146029695717560320, 
journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=3, rfOrder=2, authorNames=Douze M, Guzhva A, Deng C, journalName=null, refType=null, unstructuredReference=Douze MGuzhva ADeng Cet al.The Faiss Library[EB/OL].[2026-02-24].https://arxiv.org/abs/2401.08281., articleTitle=The Faiss Library, refAbstract=null), Reference(id=1269679036302259190, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=4, rfOrder=3, authorNames=Zhao P, Zhang H, Yu Q, journalName=null, refType=null, unstructuredReference=Zhao PZhang HYu Qet.al.Retrieval-Augmented Generation for AI-Generated Content:A Survey[EB/OL].[2026-02-24].https://arxiv.org/abs/2402.19473v6., articleTitle=Retrieval-Augmented Generation for AI-Generated Content:A Survey, refAbstract=null), Reference(id=1269679036369368055, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=5, rfOrder=4, authorNames=Hu E J, Shen Y, Wallis P, journalName=null, refType=null, unstructuredReference=Hu E JShen YWallis Pet al.LoRA:Low-Rank Adaptation of Large Language Models[EB/OL].[2026-02-24].https://arxiv.org/abs/2106.09685., articleTitle=LoRA:Low-Rank Adaptation of Large Language Models, refAbstract=null), Reference(id=1269679036495197176, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2020, volume=34, issue=1, pageStart=50, pageEnd=70, url=null, language=null, rfNumber=6, rfOrder=5, authorNames=Li J, Sun A, Han J, journalName=IEEE Transactions on Knowledge and Data Engineering, 
refType=null, unstructuredReference=Li JSun AHan Jet al.A Survey on Deep Learning for Named Entity Recognition[J].IEEE Transactions on Knowledge and Data Engineering202034(1):50-70., articleTitle=A Survey on Deep Learning for Named Entity Recognition, refAbstract=null), Reference(id=1269679036709106681, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2023, volume=56, issue=2, pageStart=1, pageEnd=47, url=null, language=null, rfNumber=7, rfOrder=6, authorNames=Ehrmann M, Hamdi A, Pontes EL, journalName=ACM Computing Surveys, refType=null, unstructuredReference=Ehrmann MHamdi APontes ELet al.Named Entity Recognition and Classification in Historical Documents:A Survey[J].ACM Computing Surveys202356(2):1-47., articleTitle=Named Entity Recognition and Classification in Historical Documents:A Survey, refAbstract=null), Reference(id=1269679036805575674, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2024, volume=36, issue=16, pageStart=8995, pageEnd=9022, url=null, language=null, rfNumber=8, rfOrder=7, authorNames=Hu Z, Hou W, Liu X, journalName=Neural Computing and Applications, refType=null, unstructuredReference=Hu ZHou WLiu X.Deep Learning for Named Entity Recognition:A Survey[J].Neural Computing and Applications202436(16):8995-9022., articleTitle=Deep Learning for Named Entity Recognition:A Survey, refAbstract=null), Reference(id=1269679036876878843, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2020, volume=40, issue=4, pageStart=3, pageEnd=12,58, url=null, language=null, rfNumber=9, rfOrder=8, authorNames=李纲, 潘荣清, 毛进, journalName=现代情报, refType=null, unstructuredReference=李纲,潘荣清,毛进,.整合BiLSTM-CRF网络和词典资源的中文电子病历实体识别[J].现代情报202040(4):3-12,58., articleTitle=整合BiLSTM-CRF网络和词典资源的中文电子病历实体识别, refAbstract=null), 
Reference(id=1269679036948182012, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=10, rfOrder=9, authorNames=Su J, Murtadha A, Pan S, journalName=null, refType=null, unstructuredReference=Su JMurtadha APan Set al.Global Pointer:Novel Efficient Span-based Approach for Named Entity Recognition[EB/OL].[2026-02-24].http://arxiv.org/abs/2208.03054., articleTitle=Global Pointer:Novel Efficient Span-based Approach for Named Entity Recognition, refAbstract=null), Reference(id=1269679037015290877, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2024, volume=null, issue=10, pageStart=2594, pageEnd=2615, url=null, language=null, rfNumber=11, rfOrder=10, authorNames=梁佳, 张丽萍, 闫盛, journalName=计算机科学与探索, refType=null, unstructuredReference=梁佳,张丽萍,闫盛,.基于大语言模型的命名实体识别研究进展[J].计算机科学与探索2024(10):2594-2615., articleTitle=基于大语言模型的命名实体识别研究进展, refAbstract=null), Reference(id=1269679037086594046, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=12, rfOrder=11, authorNames=Wei X, Cui X, Cheng N, journalName=null, refType=null, unstructuredReference=Wei XCui XCheng Net al.ChatIE:Zero-Shot Information Extraction via Chatting with ChatGPT[EB/OL].[2026-02-24].https://arxiv.org/abs/2302.10205., articleTitle=ChatIE:Zero-Shot Information Extraction via Chatting with ChatGPT, refAbstract=null), Reference(id=1269679037153702911, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2026, volume=10, issue=1, pageStart=116, pageEnd=132, url=null, language=null, rfNumber=13, rfOrder=12, authorNames=伊豪涵, 
王昊, 周抒, journalName=数据分析与知识发现, refType=null, unstructuredReference=伊豪涵,王昊,周抒,.基于RAG-LATS的古籍零样本命名实体识别方法[J].数据分析与知识发现202610(1):116-132., articleTitle=基于RAG-LATS的古籍零样本命名实体识别方法, refAbstract=null), Reference(id=1269679037220811776, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2024, volume=44, issue=10, pageStart=154, pageEnd=167, url=null, language=null, rfNumber=14, rfOrder=13, authorNames=范颜铄, 周晓英, 王克平, journalName=现代情报, refType=null, unstructuredReference=范颜铄,周晓英,王克平,.融合GPT技术和用户需求的文学类古籍资源关联数据发布研究——以《聊斋志异·司文郎》为例[J].现代情报202444(10):154-167., articleTitle=融合GPT技术和用户需求的文学类古籍资源关联数据发布研究——以《聊斋志异·司文郎》为例, refAbstract=null), Reference(id=1269679037300502528, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2025, volume=69, issue=15, pageStart=112, pageEnd=123, url=null, language=null, rfNumber=15, rfOrder=14, authorNames=杨建梁, 王一多, 黄美雯, journalName=图书情报工作, refType=null, unstructuredReference=杨建梁,王一多,黄美雯,.基于大语言模型的红色档案资源交互式知识发现研究——以《南方局党史资料大事记》为例[J].图书情报工作202569(15):112-123., articleTitle=基于大语言模型的红色档案资源交互式知识发现研究——以《南方局党史资料大事记》为例, refAbstract=null), Reference(id=1269679037388582913, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2023, volume=43, issue=10, pageStart=3, pageEnd=15, url=null, language=null, rfNumber=16, rfOrder=15, authorNames=宋雪雁, 张祥青, 张伟民, journalName=现代情报, refType=null, unstructuredReference=宋雪雁,张祥青,张伟民.水书习俗非物质文化遗产知识元组织与可视化研究[J].现代情报202343(10):3-15., articleTitle=水书习俗非物质文化遗产知识元组织与可视化研究, refAbstract=null), Reference(id=1269679037451497474, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2025, volume=9, issue=6, pageStart=47, pageEnd=62, url=null, language=null, rfNumber=17, rfOrder=16, authorNames=余池, 陈亮, 许海云, journalName=数据分析与知识发现, 
refType=null, unstructuredReference=余池,陈亮,许海云,.基于大语言模型的专利命名实体识别方法研究[J].数据分析与知识发现20259(6):47-62., articleTitle=基于大语言模型的专利命名实体识别方法研究, refAbstract=null), Reference(id=1269679037522800643, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=18, rfOrder=17, authorNames=Xu D, Chen W, Peng W, journalName=null, refType=null, unstructuredReference=Xu DChen WPeng Wet al.Large Language Models for Generative Information Extraction:A Survey[EB/OL].[2026-01-18].https://arxiv.org/abs/2312.17617., articleTitle=Large Language Models for Generative Information Extraction:A Survey, refAbstract=null), Reference(id=1269679037589909508, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2023, volume=7, issue=9, pageStart=12, pageEnd=24, url=null, language=null, rfNumber=19, rfOrder=18, authorNames=张颖怡, 章成志, 周毅, journalName=数据分析与知识发现, refType=null, unstructuredReference=张颖怡,章成志,周毅,.基于ChatGPT的多视角学术论文实体识别:性能测评与可用性研究[J].数据分析与知识发现20237(9):12-24., articleTitle=基于ChatGPT的多视角学术论文实体识别:性能测评与可用性研究, refAbstract=null), Reference(id=1269679037715738629, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2025, volume=null, issue=93, pageStart=102900, pageEnd=null, url=null, language=null, rfNumber=20, rfOrder=19, authorNames=Liu X, Erkoyuncu J A, Fuh J Y H, journalName=Robotics and Computer-Integrated Manufacturing, refType=null, unstructuredReference=Liu XErkoyuncu J AFuh J Y Het.al.Knowledge Extraction for Additive Manufacturing Process via Named Entity Recognition with LLMs[J].Robotics and Computer-Integrated Manufacturing2025(93):102900., articleTitle=Knowledge Extraction for Additive Manufacturing Process via Named Entity Recognition with LLMs, refAbstract=null), 
Reference(id=1269679037799624710, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2025, volume=44, issue=7, pageStart=165, pageEnd=173, url=null, language=null, rfNumber=21, rfOrder=20, authorNames=刘耀文, 夏一雪, 张鹏, journalName=情报杂志, refType=null, unstructuredReference=刘耀文,夏一雪,张鹏,.国家安全情报战略知识图谱构建与检索增强问答框架研究[J].情报杂志202544(7):165-173., articleTitle=国家安全情报战略知识图谱构建与检索增强问答框架研究, refAbstract=null), Reference(id=1269679037862539271, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2022, volume=42, issue=10, pageStart=42, pageEnd=50, url=null, language=null, rfNumber=22, rfOrder=21, authorNames=林立涛, 王东波, 刘江峰, journalName=图书馆论坛, refType=null, unstructuredReference=林立涛,王东波,刘江峰,.数字人文视域下典籍动物命名实体识别研究——以SikuBERT 预训练模型为例[J].图书馆论坛202242(10):42-50., articleTitle=数字人文视域下典籍动物命名实体识别研究——以SikuBERT 预训练模型为例, refAbstract=null), Reference(id=1269679037942231048, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2020, volume=64, issue=11, pageStart=116, pageEnd=124, url=null, language=null, rfNumber=23, rfOrder=22, authorNames=刘忠宝, 党建飞, 张志剑, journalName=图书情报工作, refType=null, unstructuredReference=刘忠宝,党建飞,张志剑.《史记》历史事件自动抽取与事理图谱构建研究[J].图书情报工作202064(11):116- 124., articleTitle=《史记》历史事件自动抽取与事理图谱构建研究, refAbstract=null), Reference(id=1269679038034505737, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2025, volume=9, issue=3, pageStart=147, pageEnd=159, url=null, language=null, rfNumber=24, rfOrder=23, authorNames=斯日古楞, 林民, 郭振东, journalName=数据分析与知识发现, refType=null, unstructuredReference=斯日古楞,林民,郭振东,.基于提示学习和抽取式阅读理解的古籍礼仪实体关系联合抽取方法研究[J].数据分析与知识发现20259(3):147-159., articleTitle=基于提示学习和抽取式阅读理解的古籍礼仪实体关系联合抽取方法研究, refAbstract=null), Reference(id=1269679038126780426, 
tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2022, volume=null, issue=1, pageStart=60, pageEnd=66, url=null, language=null, rfNumber=25, rfOrder=24, authorNames=梁继红, journalName=档案学通讯, refType=null, unstructuredReference=梁继红.走向文本的历史档案数字整理:历史追溯与时代转型(下)[J].档案学通讯2022(1):60-66., articleTitle=走向文本的历史档案数字整理:历史追溯与时代转型(下), refAbstract=null), Reference(id=1269679038198083595, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2022, volume=null, issue=2, pageStart=68, pageEnd=70, url=null, language=null, rfNumber=26, rfOrder=25, authorNames=张蓓, journalName=档案管理, refType=null, unstructuredReference=张蓓.数字人文视野下徽州文书档案开发利用研究[J].档案管理2022(2):68-70., articleTitle=数字人文视野下徽州文书档案开发利用研究, refAbstract=null), Reference(id=1269679038281969676, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2023, volume=43, issue=2, pageStart=49, pageEnd=56,2, url=null, language=null, rfNumber=27, rfOrder=26, authorNames=钟远薪, 王蕾, 杨新涯, journalName=图书馆论坛, refType=null, unstructuredReference=钟远薪,王蕾,杨新涯,.徽州文书文本化语音识别技术应用研究[J].图书馆论坛202343(2):49-56,2., articleTitle=徽州文书文本化语音识别技术应用研究, refAbstract=null), Reference(id=1269679038365855757, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2020, volume=39, issue=5, pageStart=93, pageEnd=99, url=null, language=null, rfNumber=28, rfOrder=27, authorNames=施晓华, 王昕, journalName=图书馆杂志, refType=null, unstructuredReference=施晓华,王昕.数字人文社会网络分析方法应用与研究[J].图书馆杂志202039(5):93-99., articleTitle=数字人文社会网络分析方法应用与研究, refAbstract=null), Reference(id=1269679038508462094, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2021, volume=40, issue=12, pageStart=62, pageEnd=67, url=null, 
language=null, rfNumber=29, rfOrder=28, authorNames=汤萌, 陆星宇, journalName=图书馆杂志, refType=null, unstructuredReference=汤萌,陆星宇.民间文书中账簿资源元数据模型与空间可视化应用研究[J].图书馆杂志202140(12):62-67., articleTitle=民间文书中账簿资源元数据模型与空间可视化应用研究, refAbstract=null), Reference(id=1269679038575570959, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2018, volume=38, issue=3, pageStart=30, pageEnd=36, url=null, language=null, rfNumber=30, rfOrder=29, authorNames=王蕾, 薛玉, 肖鹏, journalName=图书馆论坛, refType=null, unstructuredReference=王蕾,薛玉,肖鹏,.民间历史文献数字人文图书馆构建——以徽州文书数字人文图书馆实践反思为例[J].图书馆论坛201838(3):30-36., articleTitle=民间历史文献数字人文图书馆构建——以徽州文书数字人文图书馆实践反思为例, refAbstract=null), Reference(id=1269679038638485520, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2020, volume=null, issue=6, pageStart=14, pageEnd=21, url=null, language=null, rfNumber=31, rfOrder=30, authorNames=汤萌, 赵思渊, journalName=档案学通讯, refType=null, unstructuredReference=汤萌,赵思渊.民间文书的数字化建设与资源挖掘——以上海交通大学图书馆馆藏为中心[J].档案学通讯2020(6):14-21., articleTitle=民间文书的数字化建设与资源挖掘——以上海交通大学图书馆馆藏为中心, refAbstract=null), Reference(id=1269679038718177297, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2022, volume=null, issue=3, pageStart=47, pageEnd=52, url=null, language=null, rfNumber=32, rfOrder=31, authorNames=姜育彦, 刘雪立, journalName=数字图书馆论坛, refType=null, unstructuredReference=姜育彦,刘雪立.数字人文视域下缩微资料的保护与新生——以Digital Cicognara Library为例[J].数字图书馆论坛2022(3):47-52., articleTitle=数字人文视域下缩微资料的保护与新生——以Digital Cicognara Library为例, refAbstract=null), Reference(id=1269679038793674770, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2018, volume=null, issue=11, pageStart=35, pageEnd=38, url=null, language=null, rfNumber=33, rfOrder=32, authorNames=徐家成, 
journalName=兰台世界, refType=null, unstructuredReference=徐家成.众包模式应用于民国档案数字化工作的设想[J].兰台世界2018(11):35-38., articleTitle=众包模式应用于民国档案数字化工作的设想, refAbstract=null), Reference(id=1269679038877560851, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2025, volume=null, issue=12, pageStart=109, pageEnd=112, url=null, language=null, rfNumber=34, rfOrder=33, authorNames=郭佳, journalName=兰台世界, refType=null, unstructuredReference=郭佳.数字人文与人工智能融合视角下民间文书档案数字化流程重构思考[J].兰台世界2025(12):109-112., articleTitle=数字人文与人工智能融合视角下民间文书档案数字化流程重构思考, refAbstract=null), Reference(id=1269679038948864020, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=2021, volume=null, issue=3, pageStart=88, pageEnd=89,91, url=null, language=null, rfNumber=35, rfOrder=34, authorNames=陈宇, journalName=档案管理, refType=null, unstructuredReference=陈宇.古籍数字人文平台对民国档案开放利用的借鉴意义[J].档案管理2021(3):88-89,91., articleTitle=古籍数字人文平台对民国档案开放利用的借鉴意义, refAbstract=null), Reference(id=1269679039032750101, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=1991, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=36, rfOrder=35, authorNames=中国第二历史档案馆, journalName=中华民国史档案资料汇编·第3辑, refType=null, unstructuredReference=中国第二历史档案馆.中华民国史档案资料汇编·第3辑[M].南京:凤凰出版社,1991., articleTitle=null, refAbstract=null), Reference(id=1269679039108247574, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=1991, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=37, rfOrder=36, authorNames=钱实甫, journalName=北洋政府职官年表, refType=null, unstructuredReference=钱实甫.北洋政府职官年表[M].上海:华东师范大学出版社,1991., articleTitle=null, refAbstract=null), Reference(id=1269679039187939351, 
tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=1934, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=38, rfOrder=37, authorNames=立法院编译处, journalName=中华民国法规汇编, refType=null, unstructuredReference=立法院编译处.中华民国法规汇编[M].上海:中华书局,1934., articleTitle=null, refAbstract=null), Reference(id=1269679039296991256, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=39, rfOrder=38, authorNames=Team Qwen, journalName=null, refType=null, unstructuredReference=Team Qwen.Qwen3:Think Deeper,Act Faster[EB/OL].[2026-02-24].https://qwenlm.github.io/zh/blog/qwen3/., articleTitle=Qwen3:Think Deeper,Act Faster, refAbstract=null), Reference(id=1269679039368294425, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=40, rfOrder=39, authorNames=Touvron H, Lavril T, Izacard G, journalName=null, refType=null, unstructuredReference=Touvron HLavril TIzacard Get al.LLaMA:Open and Efficient Foundation Language Models[EB/OL].[2026-02-24].https://arxiv.org/abs/2302.13971., articleTitle=LLaMA:Open and Efficient Foundation Language Models, refAbstract=null), Reference(id=1269679039447986202, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=41, rfOrder=40, authorNames=Open AI, journalName=null, refType=null, unstructuredReference=Open AI.Introducing GPT-4.1 in the API[EB/OL].[2026-02-24].https://openai.com/index/gpt-4-1/., articleTitle=Introducing GPT-4, 
refAbstract=null), Reference(id=1269679039527677979, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=42, rfOrder=41, authorNames=DeepSeek-AI, Guo D, Yang D, journalName=null, refType=null, unstructuredReference=DeepSeek-AI,Guo DYang Det al.DeepSeek-R1:Incentivizing Reasoning Capability in LLMs via Reinforcement Learning[EB/OL].[2026-02-24].https://arxiv.org/abs/2501.12948., articleTitle=DeepSeek-R1:Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, refAbstract=null), Reference(id=1269679039611564060, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=43, rfOrder=42, authorNames=Yang A, Xiao B, Wang B, journalName=null, refType=null, unstructuredReference=Yang AXiao BWang Bet al.Baichuan 2:Open Large-scale Language Models[EB/OL].[2026-02-24].https://arxiv.org/abs/2309.10305., articleTitle=Baichuan 2:Open Large-scale Language Models, refAbstract=null), Reference(id=1269679039682867229, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=44, rfOrder=43, authorNames=南京农业大学, journalName=null, refType=null, unstructuredReference=南京农业大学.荀子大语言模型[EB/OL].[2026-02-24].https://xunziallm.njau.edu.cn/., articleTitle=荀子大语言模型, refAbstract=null)], funds=null, companyList=[AuthorCompany(id=1269679027095761842, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, xref=1, ext=[AuthorCompanyExt(id=1269679027104150451, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, 
companyId=1269679027095761842, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1School of Business and Management,Jilin University,Changchun130012,China), AuthorCompanyExt(id=1269679027116733364, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, companyId=1269679027095761842, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1吉林大学商学与管理学院,吉林长春130012)])], figs=[ArticleFig(id=1269679033387217883, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=EN, label=Fig.1, caption=Named Entity Recognition Framework for Beiyang Government Document Resources Based on LoRA Fine-Tuning, figureFileSmall=z8etVCW4Y2whBgByDbAeoQ==, figureFileBig=74y7HDiDB9h1Hzj94EypXg==, tableContent=null), ArticleFig(id=1269679033471103964, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=CN, label=图1, caption=基于LoRA微调的北洋政府文书资源命名实体识别框架, figureFileSmall=z8etVCW4Y2whBgByDbAeoQ==, figureFileBig=74y7HDiDB9h1Hzj94EypXg==, tableContent=null), ArticleFig(id=1269679033940866013, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=EN, label=Fig.2, caption=Schematic Diagram of Vector Database Construction and Similar Text Retrieval, figureFileSmall=TJxkOyXCz7881ClPAq1pJA==, figureFileBig=EuNnsjC9sglvWtVRerfvRQ==, tableContent=null), ArticleFig(id=1269679034020557790, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=CN, label=图2, caption=向量库构建与相似文本检索示意图, figureFileSmall=TJxkOyXCz7881ClPAq1pJA==, figureFileBig=EuNnsjC9sglvWtVRerfvRQ==, tableContent=null), ArticleFig(id=1269679034213495775, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=EN, label=Fig.3, caption=Complete Example of Prompt, 
figureFileSmall=9CrxN5bAMn2YFhB4g+Gluw==, figureFileBig=K9fTuj4dHgu1M69eHJyH6w==, tableContent=null), ArticleFig(id=1269679034309964768, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=CN, label=图3, caption=Prompt完整示例, figureFileSmall=9CrxN5bAMn2YFhB4g+Gluw==, figureFileBig=K9fTuj4dHgu1M69eHJyH6w==, tableContent=null), ArticleFig(id=1269679034381267937, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=EN, label=Fig.4, caption=Precision of Different Models on Various Entity Types, figureFileSmall=ETSONKiUqQYl1Bx0t2T9Cg==, figureFileBig=A94lSM6ZVAy+q4Ism83ofQ==, tableContent=null), ArticleFig(id=1269679034452571106, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=CN, label=图4, caption=各模型对不同实体类型识别效果精确率, figureFileSmall=ETSONKiUqQYl1Bx0t2T9Cg==, figureFileBig=A94lSM6ZVAy+q4Ism83ofQ==, tableContent=null), ArticleFig(id=1269679034515485667, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=EN, label=Fig.5, caption=Recall of Different Models on Various Entity Types, figureFileSmall=3h5BsKMOggJLMNU2zwaBog==, figureFileBig=m6aMCK/ucwe/zdUD9OqFIA==, tableContent=null), ArticleFig(id=1269679034700035044, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=CN, label=图5, caption=各模型对不同实体类型识别效果召回率, figureFileSmall=3h5BsKMOggJLMNU2zwaBog==, figureFileBig=m6aMCK/ucwe/zdUD9OqFIA==, tableContent=null), ArticleFig(id=1269679034767143909, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=EN, label=Fig.6, caption=F1-Scores of Different Models on Various Entity Types, figureFileSmall=0vt85voMgJEY1uhP1TTtlw==, figureFileBig=beftjfvTky8XDaMdcxMbpA==, tableContent=null), ArticleFig(id=1269679034834252774, tenantId=1146029695717560320, journalId=1269656373470969926, 
articleId=1269679024126194589, language=CN, label=图6, caption=各模型对不同实体类型识别效果F1值, figureFileSmall=0vt85voMgJEY1uhP1TTtlw==, figureFileBig=beftjfvTky8XDaMdcxMbpA==, tableContent=null), ArticleFig(id=1269679034901361639, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=EN, label=Fig.7, caption=Case Study of Named Entity Recognition in Beiyang Government Document Resources, figureFileSmall=GEmSwkXGq42hBrlGkCQEXQ==, figureFileBig=jsiK8bA+92S2Qh73eNIvlg==, tableContent=null), ArticleFig(id=1269679034976859112, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=CN, label=图7, caption=北洋政府文书资源命名实体识别案例, figureFileSmall=GEmSwkXGq42hBrlGkCQEXQ==, figureFileBig=jsiK8bA+92S2Qh73eNIvlg==, tableContent=null), ArticleFig(id=1269679035035579369, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=EN, label=Tab.1, caption=

Entity Types in Beiyang Government Document Resources

, figureFileSmall=null, figureFileBig=null, tableContent=
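Entity type | Description | Examples
Person | Persons mentioned in Beiyang Government document texts | 袁世凯、段祺瑞、赵稠, etc.
Place | Geographic locations mentioned in Beiyang Government document texts | 奉天、直隶、热河, etc.
Organization | Organizations mentioned in Beiyang Government document texts | 国务院、外交部、内务部, etc.
Time | Time expressions in Beiyang Government document texts | 本年五月十九日、1913年、江, etc.
Position | Official positions of persons mentioned in Beiyang Government document texts | 大总统、河南督军、内阁总理, etc.
Event | Events mentioned in Beiyang Government document texts | 白朗军起义、五四运动, etc.
Document type | Types of official documents in Beiyang Government document texts | 令(谕、示、公布、状)、呈、咨, etc.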
), ArticleFig(id=1269679035136242666, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=CN, label=表1, caption=

北洋政府文书资源实体类型

, figureFileSmall=null, figureFileBig=null, tableContent=
), ArticleFig(id=1269679035207545835, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=EN, label=Tab.2, caption=

Basic Statistics of Annotation Dataset

, figureFileSmall=null, figureFileBig=null, tableContent=
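Entity type | Entities in training set | Entities in validation set
Person | 1,437 | 274
Place | 1,814 | 423
Organization | 190 | 57
Time | 2,213 | 700
Position | 1,828 | 566
Event | 438 | 171
Document type | 627 | 178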
), ArticleFig(id=1269679035278849004, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=CN, label=表2, caption=

标注数据集基本统计情况

, figureFileSmall=null, figureFileBig=null, tableContent=
), ArticleFig(id=1269679035341763565, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=EN, label=Tab.3, caption=

Overall Named Entity Recognition Results of Each Model

, figureFileSmall=null, figureFileBig=null, tableContent=
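Method | Model | Precision | Recall | F1-score
Traditional methods | BERT-BiLSTM-CRF | 0.529 | 0.570 | 0.549
Traditional methods | RoBERTa-BiLSTM-CRF | 0.549 | 0.588 | 0.568
Large language models | Llama-3.3-70B-Instruct | 0.716 | 0.848 | 0.777
Large language models | GPT4.1 | 0.802 | 0.772 | 0.787
Large language models | DeepSeek-R1 | 0.532 | 0.886 | 0.665
LoRA fine-tuning | Baichuan2-7B-Base | 0.560 | 0.554 | 0.557
LoRA fine-tuning | Xunzi-Qwen1.5-4B | 0.684 | 0.725 | 0.704
LoRA fine-tuning combined with RAG | Qwen3-4B-tuned | 0.847 | 0.867 | 0.857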
), ArticleFig(id=1269679035408872430, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=CN, label=表3, caption=

各模型总体命名实体识别结果

, figureFileSmall=null, figureFileBig=null, tableContent=
), ArticleFig(id=1269679035484369903, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=EN, label=Tab.4, caption=

Experimental Results of Sample Selection Strategies

, figureFileSmall=null, figureFileBig=null, tableContent=
Example selection strategy | Precision | Recall | F1-score
ZS | 0.807 | 0.117 | 0.204
RS | 0.797 | 0.849 | 0.822
SS | 0.847 | 0.867 | 0.857
), ArticleFig(id=1269679035551478768, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=CN, label=表4, caption=

样例选择策略实验结果

, figureFileSmall=null, figureFileBig=null, tableContent=
), ArticleFig(id=1269679035677307889, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=EN, label=Tab.5, caption=

Results of Ablation Experiment

, figureFileSmall=null, figureFileBig=null, tableContent=
Model | Precision | Recall | F1-score
Qwen3-4B-tuned | 0.847 | 0.867 | 0.857
Qwen3-4B-lora | 0.687 | 0.698 | 0.692
Qwen3-4B-rag | 0.643 | 0.697 | 0.669
Qwen3-4B | 0.413 | 0.466 | 0.438
), ArticleFig(id=1269679035752805362, tenantId=1146029695717560320, journalId=1269656373470969926, articleId=1269679024126194589, language=CN, label=表5, caption=

消融实验结果

, figureFileSmall=null, figureFileBig=null, tableContent=
)], attaches=null, journal=Journal(id=1269655921169809475, delFlag=0, nameCn=现代情报, nameEn=Journal of Modern Information, nameHistory1=null, nameHistory2=null, issn=1008-0821, eissn=null, cn=22-1182/G3, coden=null, periodic=0, language=CN, oaType=null, ccby=null, superviseOffice=null, ownerOffice=null, pubOffice=null, editorOffice=null, officeType=null, aims=null, clcCode=null, officeProv=null, officeCity=null, officeAddr=null, officeZip=null, officeEmail=null, officePhone=null, editDirector=null, officeDirector=null, officeDirectorPhone=null, officeStaffNum=null, officeEmpNum=null, coverPicUrl=hzZ2kIN9Z2OYfM7GC0mG8g==, journalPrice=null, startedYear=null, abbrevIsoEn=Journal of Modern Information, journalRemark=null, publicationField=null, createdTime=1780639160676, updatedTime=1780639531872, createdBy=18614031015, updatedBy=13701087609, firstLetterCn=J, firstLetterEn=J, subjectCode=Natural Sciences, subjectName=null, subjectCodeEn=Natural Sciences, subjectNameEn=null, picCn=hzZ2kIN9Z2OYfM7GC0mG8g==, picEn=za/Q6fknju6Wnm9GGKKQHw==, jcr=null, cjcr=null, exts=[JournalExt(id=1269657478171099337, language=CN, name=现代情报, nameHistory1=null, nameHistory2=null, managedBy=, sponsoredBy=, publishedBy=, editorOffice=, officeProv=null, officeCity=null, officeAddr=, officeZip=, editDirector=, officeDirector=null, officePhone=null, coverPicUrl=null, journalRemark=, submitArticleUrl=null, websiteUrl=, createdTime=1780639531891, updatedTime=1780639531891, createdBy=13701087609, updatedBy=13701087609, submissionGuidelinesUrl=, submissionAuthorUrl=https://xdqbauthor.manuscriptcloud.com/, submissionEditorUrl=https://xdqbeditor.manuscriptcloud.com/, submissionReviewUrl=https://xdqbauthor.manuscriptcloud.com/, submissionCeEditorUrl=, submissionAeEditorUrl=, option={"copyright":""}), JournalExt(id=1269657478221430986, language=EN, name=Journal of Modern Information, nameHistory1=null, nameHistory2=null, managedBy=, sponsoredBy=, publishedBy=, editorOffice=, officeProv=null, 
officeCity=null, officeAddr=, officeZip=, editDirector=, officeDirector=null, officePhone=null, coverPicUrl=null, journalRemark=, submitArticleUrl=null, websiteUrl=, createdTime=1780639531903, updatedTime=1780639531903, createdBy=13701087609, updatedBy=13701087609, submissionGuidelinesUrl=, submissionAuthorUrl=https://xdqbauthor.manuscriptcloud.com/, submissionEditorUrl=https://xdqbeditor.manuscriptcloud.com/, submissionReviewUrl=https://xdqbauthor.manuscriptcloud.com/, submissionCeEditorUrl=, submissionAeEditorUrl=, option={"copyright":""})], databaseList=null, tenantJournalId=1269656373470969926, websiteList=[Website(id=1269658134734860575, webName=null, webTitle=null, webDomain=null, webCopyrigh=null, webIpcNo=null, seoTitle=null, seoKeywords=null, seoDescription=null, tenantJournalId=null, journalId=1269656373470969926, journalNameCn=null, journalNameEn=null, grayFlag=null, tenantId=1146029695717560320, platformId=null, journalGroupId=null, journalGroupNameCn=null, journalGroupNameEn=null, type=1, domain=https://castjournals.cast.org.cn/joweb/xdqb/CN, language=CN, createTime=1780639688429, createBy=18614031015, updateTime=1780639910823, updateBy=18614031015, name=现代情报-中文, tplId=1146099689490845704, title=现代情报, delFlag=0, indexPage=/home, props=[WebsiteProps(id=1269659212566131092, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1269658134734860575, code=articleTextType, value=kx, createTime=1780639945403, updateTime=1780639945403, creator=18614031015, updator=18614031015), WebsiteProps(id=1269659212540965265, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1269658134734860575, code=banner, value=null, createTime=1780639945397, updateTime=1780639945397, creator=18614031015, updator=18614031015), WebsiteProps(id=1269659212587102615, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1269658134734860575, code=grayFlag, value=0, createTime=1780639945408, updateTime=1780639945408, 
creator=18614031015, updator=18614031015), WebsiteProps(id=1269659212532576656, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1269658134734860575, code=logo, value=https://castjournals.cast.org.cn/joweb/xdqb/CN/file/pic?fileId=hAchGHMHdO6KzdfJ3HY9JQ==, createTime=1780639945395, updateTime=1780639945395, creator=18614031015, updator=18614031015), WebsiteProps(id=1269659212599685529, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1269658134734860575, code=minRunFlag, value=0, createTime=1780639945411, updateTime=1780639945411, creator=18614031015, updator=18614031015), WebsiteProps(id=1269659212557742483, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1269658134734860575, code=picServerUrl, value=https://castjournals.cast.org.cn/joweb/xdqb/CN/file/pic, createTime=1780639945401, updateTime=1780639945401, creator=18614031015, updator=18614031015), WebsiteProps(id=1269659212595491224, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1269658134734860575, code=silenceFlag, value=0, createTime=1780639945410, updateTime=1780639945410, creator=18614031015, updator=18614031015), WebsiteProps(id=1269659212549353874, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1269658134734860575, code=staticResourcePath, value=https://castjournals.cast.org.cn/joweb/cast_kjdb_cn_619/, createTime=1780639945399, updateTime=1780639945399, creator=18614031015, updator=18614031015), WebsiteProps(id=1269659212574519701, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1269658134734860575, code=themeColor, value=null, createTime=1780639945405, updateTime=1780639945405, creator=18614031015, updator=18614031015), WebsiteProps(id=1269659212578714006, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1269658134734860575, code=themeStyle, value=null, createTime=1780639945406, updateTime=1780639945406, creator=18614031015, 
updator=18614031015)]), Website(id=1269658134877466914, webName=null, webTitle=null, webDomain=null, webCopyrigh=null, webIpcNo=null, seoTitle=null, seoKeywords=null, seoDescription=null, tenantJournalId=null, journalId=1269656373470969926, journalNameCn=null, journalNameEn=null, grayFlag=null, tenantId=1146029695717560320, platformId=null, journalGroupId=null, journalGroupNameCn=null, journalGroupNameEn=null, type=1, domain=https://castjournals.cast.org.cn/joweb/xdqb/EN, language=EN, createTime=1780639688462, createBy=18614031015, updateTime=1780639906236, updateBy=18614031015, name=现代情报-英文, tplId=1146101810881728533, title=Journal of Modern Information, delFlag=0, indexPage=/home, props=[WebsiteProps(id=1269659152486920583, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1269658134877466914, code=articleTextType, value=kx, createTime=1780639931079, updateTime=1780639931079, creator=18614031015, updator=18614031015), WebsiteProps(id=1269659152465949060, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1269658134877466914, code=banner, value=null, createTime=1780639931074, updateTime=1780639931074, creator=18614031015, updator=18614031015), WebsiteProps(id=1269659152507892106, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1269658134877466914, code=grayFlag, value=0, createTime=1780639931084, updateTime=1780639931084, creator=18614031015, updator=18614031015), WebsiteProps(id=1269659152453366147, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1269658134877466914, code=logo, value=https://castjournals.cast.org.cn/joweb/xdqb/EN/file/pic?fileId=hAchGHMHdO6KzdfJ3HY9JQ==, createTime=1780639931071, updateTime=1780639931071, creator=18614031015, updator=18614031015), WebsiteProps(id=1269659152524669324, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1269658134877466914, code=minRunFlag, value=0, createTime=1780639931088, 
updateTime=1780639931088, creator=18614031015, updator=18614031015), WebsiteProps(id=1269659152478531974, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1269658134877466914, code=picServerUrl, value=https://castjournals.cast.org.cn/joweb/xdqb/EN/file/pic, createTime=1780639931077, updateTime=1780639931077, creator=18614031015, updator=18614031015), WebsiteProps(id=1269659152516280715, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1269658134877466914, code=silenceFlag, value=0, createTime=1780639931087, updateTime=1780639931087, creator=18614031015, updator=18614031015), WebsiteProps(id=1269659152474337669, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1269658134877466914, code=staticResourcePath, value=https://castjournals.cast.org.cn/joweb/cast_kjdb_en_623/, createTime=1780639931076, updateTime=1780639931076, creator=18614031015, updator=18614031015), WebsiteProps(id=1269659152495309192, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1269658134877466914, code=themeColor, value=null, createTime=1780639931081, updateTime=1780639931081, creator=18614031015, updator=18614031015), WebsiteProps(id=1269659152503697801, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1269658134877466914, code=themeStyle, value=null, createTime=1780639931083, updateTime=1780639931083, creator=18614031015, updator=18614031015)])], journalTitle=现代情报, weixinUrl=null, journalUrl=https://www.xdqb.net/, iacademicId=null, status=1, seqNo=null, journalTitleEn=Journal of Modern Information, journalPhotoCn=hzZ2kIN9Z2OYfM7GC0mG8g==, journalPhotoEn=za/Q6fknju6Wnm9GGKKQHw==, journalFirstLetter=J, journalRecommend=null, journalNew=null, journalCollection=null, jcrJf=null, cjcrJf=null, jcrJfStr=null, cjcrJfStr=null, submissionFirstDecision=null, sciSubjectClassification=null, casSubjectClassification=null, citeScore=null, totalCitationFrequency=null, icpCode=null, 
psCode=null, advertisingLicenseCode=null, copyrightInformation=null, country=null, option=, provinceCode=null, provinceName=null, collectFlag=false), detailUrlCn=https://castjournals.cast.org.cn/joweb/xdqb/CN/10.3969/j.issn.1008-0821.2026.03.004, detailUrlEn=https://castjournals.cast.org.cn/joweb/xdqb/EN/10.3969/j.issn.1008-0821.2026.03.004, pdfUrlCn=https://castjournals.cast.org.cn/joweb/xdqb/CN/PDF/10.3969/j.issn.1008-0821.2026.03.004, pdfUrlEn=https://castjournals.cast.org.cn/joweb/xdqb/EN/PDF/10.3969/j.issn.1008-0821.2026.03.004, aliStartDate=null, aliEndDate=null, collectionFlag=false, citedCount=null, citedUrl=null, reference=null)
Named Entity Recognition in Beiyang Government Documents Resources Using Large Language Models
Jun Deng (邓君)*, Zishu Zhang (张子姝), Yubing Pan (潘禹兵), Dongyu Ye (叶东宇), Yanyu Chang (常严予)
Journal of Modern Information (现代情报) | Data Intelligence and Knowledge Service, 2026, 46(3): 44-55
Affiliations
  • 1 School of Business and Management, Jilin University, Changchun 130012, China
Author information
  • Zishu Zhang (b. 1995), female, PhD candidate; research interests: digital humanities and knowledge services
  • Yubing Pan (b. 2003), female, master's student; research interests: digital humanities and knowledge services
  • Dongyu Ye (b. 1996), male, PhD candidate; research interests: natural language processing
  • Yanyu Chang (b. 1998), female, PhD candidate; research interests: digital humanities and knowledge services

Corresponding author:

Jun Deng (b. 1977), female, professor, PhD, doctoral supervisor; research interests: digital information resource management, digital humanities and knowledge services, archives management and application.
Published: 2026-03-01 doi: 10.3969/j.issn.1008-0821.2026.03.004
目的/意义

针对北洋政府文书资源因语言复杂性、多样性及标注数据缺乏导致的命名实体识别难题,本文提出一种适应低资源场景基于大语言模型的命名实体识别框架,为近代历史文献的结构化挖掘与知识重组提供方法支撑。

方法/过程

该框架融合检索增强生成与高效参数微调,利用Faiss向量检索构建上下文样例动态选取机制,通过LoRA策略对大语言模型进行领域知识注入。最后,在自建语料库上,系统评估深度学习基准模型与不同采样策略下的大语言模型性能。

结果/结论

结果表明,结合相似度样例选择与LoRA微调的Qwen3-4B模型效果最优,总体F1值达0.857,实现对北洋政府文书的精准实体识别,验证了大模型在低资源历史文书处理中的实用性与可扩展性。

北洋政府文书资源  /  大语言模型  /  命名实体识别  /  低资源场景  /  检索增强生成  /  LoRA微调
Purpose/Significance

Addressing the challenges in named entity recognition(NER) for Beiyang Government Document Resources due to linguistic complexity, diversity, and lack of annotation data, this paper proposes a large language model-based NER framework adapted for low-resource scenarios. This framework provides methodological su⁃pport for structured mining and knowledge reorganization of modern historical documents. [Methods/

Process

This framework integrated retrieval-enhanced generation with efficient parameter fine-tuning. It used Faiss vector retrieval to build a dynamic context example selection method and used the LoRA strategy to add domain knowledge to large language models.On a custom corpus, the study designed seven special entity types,including persons, places, organizations, time, positions, events,and document types. The study then compared two deep learning entity recognition methods, BERT-BiLSTM-CRF and RoBERTa-BiLSTM-CRF,with Baichuan-4B, DcepSeck-R1, Xunzi-Qwen3-8B, Qwen3-4B,Llama, and GPT-4. The study evaluated large language models performance under different sampling methods.

Result/Conclusion

Experiments demonstrate that compared to traditional deep learning models and general-purpose large language mo⁃dels, the synergistic paradigm integrating LoRA fine-tuning with RAG significantly enhances entity recognition performance, achieving an overall F1 score of 0.857. A framework that uses RAG with large, fine-tuned language models for named entity recognition in Beiyang Government Document Resources works well together, and it achieves accurate entity identification in these historical records. This shows that large language models are practical and can be scaled when processing historical documents with limited resources.

Beiyang Government document resources  /  large language model  /  named entity recognition  /  low-resource scenarios  /  retrieval-augmented generation  /  LoRA fine-tuning
邓君, 张子姝, 潘禹兵, 叶东宇, 常严予. 基于大语言模型的北洋政府文书资源命名实体识别研究. 现代情报, 2026, 46(3): 44-55. DOI: 10.3969/j.issn.1008-0821.2026.03.004
Jun Deng, Zishu Zhang, Yubing Pan, Dongyu Ye, Yanyu Chang. Named Entity Recognition in Beiyang Government Documents Resources Using Large Language Models[J]. Journal of Modern Information, 2026, 46(3): 44-55. DOI: 10.3969/j.issn.1008-0821.2026.03.004
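The document resources of the Beiyang Government (1912-1928) record in detail the establishment and evolution of government institutions, the promulgation and implementation of laws and regulations, and the concrete workings of public administration during that period. They combine the originality and authority of administrative records with the archival and scholarly value of historical documents. In recent years, national policies such as 《数字中国建设2025年行动方案》 (the Digital China Construction 2025 Action Plan) [1] have explicitly called for "deeply exploring application scenarios for artificial intelligence," giving the intelligent organization and knowledge reorganization of these resources both institutional support and a technical setting. Against this background, using artificial intelligence to systematically mine the semantic information in Beiyang Government documents has become an important technical route for reconstructing the historical knowledge narrative of the period. This shift illustrates how modern information technology is renewing humanities scholarship, and it reflects the gradual move of humanities research from traditional close reading toward data-driven inquiry.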
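Although Beiyang Government documents are first-hand records of institutional transition and governance practice in modern China, agency names changed frequently and the language mixes classical and vernacular Chinese [2]. Because the documents date from a transitional stage of modern Chinese, they lack unified conventions for expressing entities and a stable naming system, so entity ambiguity is high. High-quality annotated data are also scarce, which leaves supervised methods without enough training support. Traditional named entity recognition (NER) methods cannot adapt to such linguistic variation and context dependence, so recognition quality is limited, which directly constrains the structured organization and in-depth mining of these resources. This study therefore uses large language models, introduces Faiss vector retrieval [3] to build a dynamic example index, and combines retrieval-augmented generation (RAG) [4] with low-rank adaptation (LoRA) [5], a parameter-efficient fine-tuning method, to construct an entity recognition framework that incorporates domain knowledge. The framework applies large language models to NER on Republican-era documents and offers a methodological reference for deep semantic mining of similar low-resource, high-complexity historical texts.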
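Named entity recognition, a core task in natural language processing, aims to automatically identify entities of specific types in text [6]. Its development has moved from hand-written rules to data-driven methods and from general to vertical domains. Early work relied on manually built dictionaries and rule matching [7]; statistical machine learning methods [8] then improved accuracy. In recent years, deep learning, with its strong contextual representations, has become the mainstream paradigm for NER, with clear advantages on polysemy, nested structures, and semantic ambiguity. Sequence labeling models such as BiLSTM-CRF, which exploit label dependencies and contextual features, have advanced results on several benchmarks [9]. For nested entities, span-based methods built on global pointers show better structural fit [10].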
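Large language models extend the reach of deep learning. Their strong semantic understanding and in-context learning have reshaped the NER research paradigm [11], making effective entity extraction possible in low-resource and even zero-shot settings. Existing studies show that, with effective prompt design [12] or retrieval-augmented generation [13], large language models can compensate for scarce domain data, with promising applications to ancient books [14], red archives [15], and intangible cultural heritage literature [16]. In vertical domains with dense terminology or distinctive language, however, large language models lack domain knowledge and are prone to hallucination and blurred entity boundaries. Researchers therefore often use continued training on domain data or parameter-efficient fine-tuning [17] to internalize domain knowledge in the model [18] and improve accuracy. Prompt engineering [19] and retrieval augmentation [20], combined with lightweight fine-tuning [21], are thus the main current means of improving large language model performance on domain tasks.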
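In historical documents, a highly specialized vertical domain, NER applications are shifting from general corpora toward domain adaptation and low-resource scenarios. Scholars have used domain pre-trained models such as SikuBERT [22], improved sequence labeling architectures [23], or combined prompt learning with extractive reading comprehension [24], and have outperformed general models on ancient-text entity recognition. These results show that integrating domain knowledge and improving model structure is a workable way to improve NER on historical documents.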
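As digital humanities and information technology interpenetrate, the digital development of Republican-era document resources has become interdisciplinary and multi-layered. Existing research focuses on restructuring digitization workflows, integrating resources, and applying intelligent technologies, and seeks to overcome systemic obstacles such as fragmented resources, missing standards, and poorly matched technology. To move these resources from digital preservation to knowledge mining, studies have established general metadata specifications [25] and, on that basis, introduced analytical tools including text mining [26], semantic annotation [27], social network analysis [28], and GIS visualization [29]. Development projects on Huizhou documents [30] and university archive collections [31] confirm the value of these technologies: enhanced OCR processing [32], crowdsourced processing [33], and linked data [34] have markedly improved the quality and efficiency of text conversion and the depth of multimodal linking. Digital humanities platforms [35] have allowed the value of Republican-era documents to be more fully realized in local history and cultural heritage, and to be creatively transformed in new media contexts. These efforts provide models for sustainable development and show that Republican-era document resources are moving from simple archival retention toward an open, collaborative data ecosystem. Although the field is advancing toward intelligent services and cross-domain integration, existing work concentrates on corpora such as folk documents. For Beiyang Government documents, which combine specific administrative attributes, strict official formats, and high semantic density, a systematic digital processing methodology and suitable recognition models are still lacking.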
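In summary, NER has progressed from rule-driven to data-driven methods and from general to vertical domains. In historical document processing, combining domain pre-trained models with sequence labeling has markedly improved entity recognition on ancient texts. Large language models, with strong contextual understanding and few-shot learning, open a new path for low-resource NER, and combining prompt engineering, retrieval-augmented generation, and parameter-efficient fine-tuning improves their adaptability to specialized domains. However, existing research concentrates on ancient classics with standardized corpora and ample annotation; no systematic NER method exists for documents such as those of the Beiyang Government, which combine complex linguistic features with very scarce annotation. This study therefore builds an NER framework for Beiyang Government document resources. A retrieval augmentation mechanism uses limited annotated samples and the regularities of the documents themselves to dynamically expand context, and LoRA fine-tuning injects domain knowledge while preserving the model's semantic ability. The framework supports recognition of multiple entity types, including persons, organizations, official positions, places, and times, and offers a feasible technical route for intelligent processing of Beiyang Government documents and a methodological reference for NER on low-resource, high-complexity historical texts.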
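To address the limits that scarce annotation and mixed classical-vernacular language place on entity recognition in Beiyang Government documents, this study introduces a dynamic in-context example selection mechanism based on Faiss vector retrieval, uses RAG to strengthen the model's understanding of historical semantics, and applies lightweight LoRA fine-tuning to inject domain knowledge into the large language model efficiently. The result is a similarity-based NER framework for Beiyang Government documents, shown in Figure 1.
The framework comprises three core modules: corpus processing of Beiyang Government document text, similarity-based example retrieval, and entity recognition model evaluation. It aims at the structured mining and knowledge reorganization of Beiyang Government document resources, provides a reusable technical route for intelligent processing of low-resource historical documents in digital humanities, and supports the shift from traditional methods to a data-driven paradigm.
The corpus processing module digitizes and structures the text of Beiyang Government documents. Paper or image-form documents undergo optical character recognition (OCR) to produce machine-readable text. The recognized text is then cleaned and normalized in format, after which high-quality manual or semi-automatic annotation yields a labeled corpus for model training and evaluation, providing the data foundation for subsequent retrieval and recognition.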
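Beiyang Government documents have four defining features: mixed classical and vernacular language, multiple titles for the same position, changing institutions, and scarce annotation. Static example libraries based on lexical overlap or TF-IDF tend to retrieve texts that are only superficially similar, which aggravates hallucination in downstream generative models. To address this, the study adopts the Faiss vector retrieval engine, developed by the Facebook AI team, which can search billion-scale indexes in milliseconds and supports similarity search in high-dimensional spaces. Compared with traditional approaches, coupling the semantic representation of the corpus with knowledge of Beiyang Government documents in Faiss yields a dynamic index suited to the characteristics of these documents and markedly improves the model's ability to capture the complex semantic relations specific to the Republican era.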
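The study uses Faiss to build a dynamic index: the set of candidate vectors encoded by RoBERTa is wrapped as an index, enabling millisecond-level Top-3 recall. Figure 2 shows the construction of the vector store and the retrieval of similar texts; the storage side and the query side share the same RoBERTa model.
The procedure has an offline stage and an online stage. Offline, training-set texts are encoded sentence by sentence with RoBERTa and persisted to the store. Online, one forward pass over the query sentence produces the query vector. To suppress noise in Beiyang documents such as variant spellings of the same term and abbreviated official titles, entities are masked in both the query sentence and the candidate pool, and the cosine similarity after masking serves as the retrieval key. Only the Top-3 results are passed to the prompt, which shortens the context while retaining high information gain and provides concise examples for the subsequent RAG and parameter-efficient fine-tuning. Cosine similarity is computed as in Equation (1):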
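$$\mathrm{Similarity}=\cos(\theta)=\frac{A\cdot B}{\lVert A\rVert\,\lVert B\rVert}=\frac{\sum_{i=1}^{n}A_i\times B_i}{\sqrt{\sum_{i=1}^{n}A_i^{2}}\times\sqrt{\sum_{i=1}^{n}B_i^{2}}}\tag{1}$$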
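Here A and B denote the encoded vectors of different entities, and n is the vector dimension. The closer the similarity is to 1, the more similar the two entities are, and they can be merged under a unified entity name. The closer it is to 0, the less similar they are, and the corresponding entities are kept separate.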
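The masked retrieval step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: character-bigram counts stand in for the RoBERTa encoder, a brute-force scan stands in for the Faiss index, and the mask token `[ENT]` and the function names are hypothetical. How query-side entities are obtained at inference time is not specified in the text, so the sketch simply takes them as input.

```python
import math
from collections import Counter

MASK = "[ENT]"  # hypothetical mask token, not taken from the paper

def mask_entities(sentence, entities):
    """Replace every annotated entity mention with the mask token."""
    # Longest mentions first, so a short mention never splits a longer one.
    for mention in sorted(entities, key=len, reverse=True):
        sentence = sentence.replace(mention, MASK)
    return sentence

def embed(text):
    """Toy stand-in for the RoBERTa encoder: character-bigram counts."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def cosine(a, b):
    """Equation (1) evaluated on sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query, query_entities, pool, k=3):
    """Rank (sentence, entities) pairs in pool by masked cosine similarity."""
    q_vec = embed(mask_entities(query, query_entities))
    scored = [(cosine(q_vec, embed(mask_entities(s, ents))), s) for s, ents in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Because the entities are masked before encoding, two sentences that share a template but name different people or places score as near-identical, which is the behavior the masking step is meant to produce. With dense vectors, the same ranking is commonly obtained by L2-normalizing the vectors and searching an inner-product index.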
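In NER on Beiyang Government documents, prompt construction is an important way to make domain knowledge explicit and to structure semantic associations. Through targeted instructions and guiding examples, it converts the domain-specific knowledge in Beiyang-era documents into semantic association rules the model can perceive and learn, giving the recognition task both knowledge support and semantic guidance. A complete prompt example is shown in Figure 3.
First, Beiyang Government documents carry political, military, and administrative information from a specific historical setting, so entity recognition must take account of the period's institutional environment and historical development. Structured instructions and representative examples make otherwise implicit domain knowledge explicit and help the model capture the deep semantic links between entities and their context. Second, the documents have complex semantic structure, often involving hierarchical relations between positions and institutions and logical links between events and times. The prompt defines explicit rules mapping entity types to textual semantics and supplements them with typical examples, giving the model a reference pattern for judging entity types in complex contexts (for example, classifying "豫南总司令官" as a position entity and "白朗军围攻" as an event entity). Third, prompt construction must also keep knowledge expression interpretable. Besides defining entity types and semantic rules in the instruction, example selection is especially important. The study uses Faiss retrieval to select the three texts most semantically similar to the target text as guiding examples, which broadens the coverage of semantic associations, makes the examples more representative of the domain, and thereby improves the generalization and interpretability of the prompt on this task.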
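A prompt of this shape (instruction, retrieved examples, then the target text) might be assembled as follows. The wording of the instruction, the JSON answer format, and the function name `build_prompt` are assumptions made for illustration; the actual prompt is the one shown in Figure 3, which is not reproduced here.

```python
import json

# The seven entity types named in the paper; the English labels are illustrative.
ENTITY_TYPES = ["person", "place", "organization", "time",
                "position", "event", "document type"]

def build_prompt(target_text, examples, entity_types=ENTITY_TYPES):
    """Assemble instruction + retrieved examples + target text into one prompt.

    examples: list of (sentence, {entity_type: [mentions]}) pairs,
              e.g. the Top-3 texts returned by the retrieval step.
    """
    lines = [
        "Task: extract named entities from a Beiyang Government document.",
        "Entity types: " + ", ".join(entity_types) + ".",
        "Answer with a JSON object mapping each entity type to a list of mentions.",
        "",
    ]
    for i, (sentence, labels) in enumerate(examples, start=1):
        lines.append(f"Example {i}: {sentence}")
        lines.append("Entities: " + json.dumps(labels, ensure_ascii=False))
        lines.append("")
    lines.append(f"Text: {target_text}")
    lines.append("Entities:")
    return "\n".join(lines)
```

Ending the prompt with an open `Entities:` line mirrors the format of the examples, so the model's continuation can be parsed the same way as the example labels.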
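The core of the fine-tuning stage is building an instruction dataset from the prompt instructions above. The study adopts LoRA fine-tuning, which helps avoid catastrophic forgetting when the model is adapted to the specialized NER task on Beiyang Government documents and enables accurate recognition of historical entities such as Republican-era official titles and institution names. The corpus spans a long period, is dense with administrative jargon, and leaves entity relations implicit; LoRA preserves the model's general language understanding, markedly reduces the overfitting risk caused by scarce annotation, and supports multiple rounds of iteration.
The task is formally defined in Equation (2):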
$$\max\; P_m\bigl(\rho \mid \gamma\bigl(\delta(S_1,S_2,\ldots,S_n),\,X\bigr)\bigr)\tag{2}$$
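Here $S_1, S_2, \ldots, S_n$ are the candidate texts stored in the vector store; $\delta$ is the similar-example selection function, which selects the Top-3 most similar items from the candidates; $X$ is the text on which entity recognition is to be performed; $\gamma$ is the prompt construction function; and $\rho$ denotes the label data. The objective is to maximize the probability of the label output by optimizing how the examples are organized.
Following common practice in NER, precision (P), recall (R), and the F1 score are used as evaluation metrics, as given in Equations (3) to (5):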
$$P=\frac{TP}{TP+FP}\tag{3}$$
$$R=\frac{TP}{TP+FN}\tag{4}$$
$$F1=\frac{2\times P\times R}{P+R}\tag{5}$$
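Here TP is the number of entities the model identifies correctly, FP the number it predicts incorrectly, and FN the number of true entities it misses. TP + FP is the total number of entities the model outputs, so P is the proportion of predicted entities that are correct. TP + FN is the number of entities actually present in the text, so R is the proportion of true entities the model recovers. F1 is the harmonic mean of precision and recall.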
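Equations (3) to (5) translate directly into code. The sketch below assumes exact-match scoring, where an entity counts as correct only if its span and type both match the gold annotation; the paper does not state the matching criterion, so that choice and the `(start, end, type)` tuple format are assumptions.

```python
def precision_recall_f1(gold, predicted):
    """Entity-level P, R, F1 with exact span-and-type matching.

    gold, predicted: iterables of hashable entities, e.g. (start, end, type).
    """
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)      # predicted and correct
    fp = len(predicted - gold)      # predicted but wrong
    fn = len(gold - predicted)      # present in gold but missed
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

For instance, with four gold entities and three predictions of which two are correct, P is 2/3, R is 1/2, and F1 is 4/7.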
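To ensure authoritative sources and rigorous construction, the study takes 《中华民国史档案资料汇编·第3辑》 [36], an authoritative publication of the Second Historical Archives of China, as the base corpus. The compilation is arranged in thematic volumes covering politics, military affairs, foreign affairs, finance, economy, culture and education, and popular movements. It mainly contains archives of the Beiyang Government's ministries and offices held by the Archives, supplemented by a small number of documents from the contemporaneous 《政府公报》, so it is closed in period and complete in subject. The whole work was first scanned at high resolution, and OCR produced the raw corpus. Two rounds of manual proofreading then corrected misrecognized characters, redundant symbols, and sentence segmentation errors. Proper names were standardized and dates normalized against authoritative reference works such as 《北洋政府职官年表》 [37] and 《中华民国法规汇编》 [38], yielding a high-quality plain-text corpus. After screening, 6,324 Beiyang Government document texts were collected.
By comparing and analyzing the 6,324 source texts, the study includes entity categories specific to Beiyang Government documents (such as document type) in the annotation scope and defines seven entity categories: person, place, organization, time, position, event, and document type. Definitions and examples are given in Table 1.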
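To keep the small-sample annotation reliable and consistent, two annotators with doctoral training in information resources management were first trained on a unified annotation guideline. A double pre-annotation step followed: both annotators independently labeled the first 10 documents, the results were compared, and disagreements were arbitrated and fixed as worked rules. Using these 10 calibrated texts as templates, 200 documents were then drawn by stratified random sampling for formal annotation. The consistency check in the annotation tool gave inter-annotator agreement (Cohen's kappa) above 0.85 throughout, indicating high annotation quality.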
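Cohen's kappa corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it over two parallel label sequences. The paper does not say at what unit agreement was measured, so treating each token's tag as one item is an assumption.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[lab] * freq_b[lab] for lab in freq_a.keys() | freq_b.keys()) / (n * n)
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)
```

Because most tokens in a sequence-labeling corpus carry the same "outside" label, raw agreement is inflated; kappa discounts that shared majority label, which is why it is the more informative figure to report.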
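The label scheme uses the sequence labeling tag set {B, I, E, S, O}. In total, 10,916 valid entities were annotated across 17,972 data items, forming the annotated corpus. For training and validation, the annotated data were randomly split into a training set and a validation set at a ratio of 4:1. Entity counts by category are shown in Table 2.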
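In the {B, I, E, S, O} scheme, B, I, and E mark the beginning, inside, and end of a multi-character entity, S marks a single-character entity, and O marks everything else. The sketch below converts character spans to tags and performs a 4:1 random split. The tag suffixes (`LOC`, `TIME`), the span boundaries in the usage example, and the fixed seed are illustrative assumptions, not details from the paper.

```python
import random

def to_bioes(text, spans):
    """Return one BIOES tag per character.

    spans: list of (start, end_exclusive, entity_type), non-overlapping.
    """
    tags = ["O"] * len(text)
    for start, end, etype in spans:
        if end - start == 1:
            tags[start] = f"S-{etype}"          # single-character entity
        else:
            tags[start] = f"B-{etype}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{etype}"
            tags[end - 1] = f"E-{etype}"
    return tags

def split_4_to_1(items, seed=0):
    """Shuffle and split into training and validation sets at a 4:1 ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = len(items) * 4 // 5
    return items[:cut], items[cut:]
```

For example, `to_bioes("白朗军固守枣城", [(5, 7, "LOC")])` tags the last two characters as `B-LOC` and `E-LOC`, and a one-character time expression such as "江" receives a single `S-TIME` tag.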
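The Qwen3-4B [39] model was called through the API of a large-model platform and fine-tuned with a parameter-efficient method for this NER task. Fine-tuning settings were: maximum input sequence length 1,024, initial learning rate 5.0e-05, rank 16, and 5 training epochs.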
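To make the rank setting concrete: LoRA keeps a pretrained weight matrix W frozen and learns a low-rank update, so the adapted layer computes y = Wx + (alpha / r) · B(Ax), with A of shape r × d_in and B of shape d_out × r. The sketch below shows that forward pass and the resulting share of trainable parameters in plain Python. The scaling factor alpha is not reported in the paper, and the 1024 × 1024 layer size mentioned in the note after the code is a hypothetical illustration, not the dimensions of Qwen3-4B.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(x, W, A, B, alpha, rank):
    """y = W x + (alpha / rank) * B (A x). W is frozen; only A and B train."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))   # project down to `rank`, then back up
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]

def trainable_fraction(d_out, d_in, rank):
    """Share of parameters LoRA trains for one d_out x d_in weight matrix."""
    return rank * (d_out + d_in) / (d_out * d_in)
```

With rank 16 on a hypothetical 1024 × 1024 layer, `trainable_fraction(1024, 1024, 16)` is 0.03125, so about 3% of that layer's parameters are updated. When B is initialized to zero, the adapted layer starts out identical to the pretrained one.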
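Targeting practical settings with limited resources, controlled cost, strict data privacy, and local deployment, the proposed method focuses on testing the potential of small models on a domain-specific task: coupling LoRA fine-tuning with RAG is intended to give a small model semantic understanding comparable to that of much larger language models. To verify the robustness of the method and the effectiveness of the optimized lightweight model on complex Beiyang document entities, large-scale language models serve as the upper-bound reference, and the experiments follow a controlled comparison design. First, two deep learning NER methods, BERT-BiLSTM-CRF and RoBERTa-BiLSTM-CRF, are reproduced on the self-built corpus to establish the performance ceiling of deep learning methods in a low-resource setting. Second, prompts are designed for large language models including Llama-3.3-70B-Instruct [40], GPT4.1 [41], and DeepSeek-R1 [42] to test general models without fine-tuning, and LoRA fine-tuned models such as Baichuan2-7B-Base [43] and Xunzi-Qwen1.5-4B [44] are compared with the proposed Qwen3-4B model that combines LoRA fine-tuning and RAG. Finally, the best-performing model is evaluated under three strategies (zero-shot, random examples, and similarity-selected examples) to quantify how retrieval strategy affects precision, recall, and F1, thereby demonstrating the effectiveness of the proposed method.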
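Table 3 reports the entity recognition results of each model on Beiyang Government documents. Performance is clearly stratified by technical paradigm, reflecting the particularity and difficulty of text processing in this domain.
Constrained by the mixed classical-vernacular language and scarce annotation, traditional models struggle to capture deep semantic structure and long-range context. The sequence labeling architectures BERT-BiLSTM-CRF and RoBERTa-BiLSTM-CRF perform poorly overall, with F1 scores of 0.549 and 0.568 and neither precision nor recall exceeding 0.6.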
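General-purpose large language models diverge sharply. Without domain fine-tuning, Llama-3.3-70B-Instruct and GPT4.1 reach F1 scores of 0.802 and 0.787 on the strength of their parameter scale and pre-trained knowledge, showing strong generalization. DeepSeek-R1, by contrast, has high recall but insufficient precision, so its F1 falls below that of some smaller models. This indicates that a general model alone may find more entities but, without domain knowledge to constrain it, is prone to serious entity hallucination and misidentification. GPT4.1 has the best precision among the general-purpose models (0.802), showing strong discrimination, but its recall is lower (0.772), so some entities may be missed.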
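In terms of parameter efficiency and domain transfer, LoRA fine-tuned models give uneven results. As Table 3 shows, Baichuan2-7B-Base improves little after fine-tuning (F1 of 0.557), on par with traditional methods. Xunzi-Qwen1.5-4B, which is optimized for ancient texts, surpasses traditional methods (F1 of 0.704), but its precision (0.684) leaves room for improvement. Parameter-efficient fine-tuning alone is therefore insufficient for the complexity of Beiyang Government documents.
The method combining LoRA and RAG performs best. With only 4B parameters, the proposed Qwen3-4B-tuned model uses RAG to impose explicit constraints from an external knowledge base, which effectively curbs hallucination in the generative model. Compared with all baselines, it improves both precision and recall, reaching 0.847 and 0.867, for a final F1 of 0.857. This confirms that, in low-resource settings, combining lightweight fine-tuning with dynamic knowledge retrieval is a feasible route to high-precision mining of Beiyang Government document resources.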
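The results in Figures 4 to 6 show marked performance differences among models on this task and a clear dependence on entity type.
Traditional deep learning methods do acceptably on some frequent, regular entities but are seriously deficient on semantically complex ones. BERT-BiLSTM-CRF and RoBERTa-BiLSTM-CRF obtain F1 scores between 0.54 and 0.75 on person, time, and place entities, showing some stability. However, a static annotated corpus cannot cope with the language variation of the period, such as mixed classical-vernacular text and variant spellings; on semantically complex entities such as events and organizations the models nearly fail, with F1 generally below 0.1, a typical case of overfitting to head classes and learning nothing on tail classes. In position recognition, BERT-BiLSTM-CRF reaches a recall of 0.800 but a precision of only 0.571, meaning that it over-predicts and labels non-position words as entities, which reflects the limits of traditional sequence labeling in deep semantic understanding.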
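General-purpose large language models and the ancient-text model show different strengths. DeepSeek-R1 has strong zero-shot ability on person entities, with an F1 of 0.837, but its F1 on event and document type is only 0.408 and 0.300, showing that without domain knowledge its grasp of specific entity types remains weak. By contrast, Xunzi-Qwen1.5-4B, pre-trained on ancient texts, performs well on position and organization, with F1 scores of 0.781 and 0.735, clearly ahead of Baichuan2-7B-Base, which demonstrates the importance of domain prior knowledge for entity recognition in a specific historical context.
The proposed Qwen3-4B-tuned model, built around RAG, has the best overall performance. It achieves the highest F1 in every category except person, rising to 0.854 on organization and 0.924 on document type. On event entities, where other models do poorly, it raises F1 to 0.654. These results confirm the effectiveness of combining RAG with LoRA fine-tuning: LoRA adapts the model to the sentence structures of Beiyang Government documents, while RAG dynamically retrieves from the external knowledge base to supply contextual constraints and factual support, which suppresses hallucination on low-resource entities and enables accurate recognition of complex, fine-grained entities.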
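This section compares, on the self-built dataset and within the same recognition framework, the proposed similarity-based example selection (Similarity Selection, SS) with two baselines, zero-shot (Zero-Shot, ZS) and random sampling (Random Sample, RS), to assess their effect on NER performance. ZS relies only on the task description and the text to be extracted, with no examples. RS draws examples uniformly at random to build the prompt. SS recalls the Top-3 most relevant texts by RoBERTa cosine similarity after entity masking. The results in Table 4 show how example selection strategy affects NER performance on Beiyang Government documents and indicate that the gains come mainly from the design of the scheme rather than solely from the capability of the underlying large language model.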
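The three strategies differ only in how the prompt examples are chosen, which a single selector function makes explicit. This is a sketch under stated assumptions: the function names are hypothetical, and `char_jaccard` is a toy similarity standing in for the masked RoBERTa cosine similarity used in the paper.

```python
import random

def char_jaccard(a, b):
    """Toy similarity: Jaccard overlap of character sets (stand-in only)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def select_examples(strategy, query, pool, k=3, similarity=char_jaccard, seed=0):
    """Choose prompt examples under the ZS, RS, or SS strategy."""
    if strategy == "ZS":
        return []                                   # task description only
    if strategy == "RS":
        rng = random.Random(seed)                   # uniform draw, reproducible
        return rng.sample(pool, min(k, len(pool)))
    if strategy == "SS":
        ranked = sorted(pool, key=lambda ex: similarity(query, ex), reverse=True)
        return ranked[:k]                           # Top-k most similar
    raise ValueError(f"unknown strategy: {strategy}")
```

Holding the model, the instruction, and k fixed while switching only the `strategy` argument is what lets differences in the metrics be attributed to example selection rather than to the underlying model.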
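ZS keeps high precision (0.807), but recall drops to 0.117 and F1 is only 0.204: without domain adaptation, a general large language model struggles to recognize the proper entities and particular expressions in these documents, so its predictions are fairly reliable but coverage is severely lacking. RS is a strong baseline: in-domain examples lift recall to 0.849 and F1 to 0.822, confirming that a small amount of annotated data is enough for effective domain transfer. The proposed SS method improves further, with F1 reaching 0.857, 0.035 higher than RS, and with better balanced precision and recall. Semantic matching selects high-quality examples closest in context to the target text, which strengthens the model's boundary and category decisions for complex entities such as historical persons, organizations, and events. The comparison shows that, for historical documents with scarce annotation, similarity-based example retrieval makes the most of limited annotated data and offers a reusable technical pattern for low-resource NER in digital humanities.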
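Ablation experiments verify the contribution of each component. With Qwen3-4B as the base model, four configurations are compared: the base Qwen3-4B model (whose instruction contains a task description, task examples, and the original text); Qwen3-4B-lora, which adds LoRA fine-tuning; Qwen3-4B-rag, which adds RAG; and Qwen3-4B-tuned, which adds the proposed method, in which similarity is computed after removing the influence of entities in the text. Table 5 shows the contribution of each module and their combined effect.
The base Qwen3-4B model performs poorly, with an F1 of 0.438, because a general large language model applied directly to Beiyang Government documents faces a large domain gap and cannot capture the semi-classical, semi-vernacular language of the Republican era or its specific entity features. Adding RAG alone raises the F1 of Qwen3-4B-rag to 0.669, because similar-example retrieval supplies key historical knowledge and annotation patterns that help bridge the domain gap. With LoRA fine-tuning alone, Qwen3-4B-lora reaches an F1 of 0.692, showing that parameter-efficient fine-tuning adapts well to NER with limited annotated data. The proposed Qwen3-4B-tuned model, which combines LoRA fine-tuning with RAG, achieves the best F1 of 0.857, 0.419 higher than the base model, with large gains in both precision and recall. The gain comes from the complementarity of the two: LoRA injects domain knowledge to improve representations, and RAG dynamically supplies contextually consistent annotated examples. Computing similarity after removing entities avoids retrieval bias and safeguards example quality, which supports the soundness and effectiveness of the method for Beiyang Government document resources.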
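To illustrate the effect of similarity-based example retrieval, a sample Beiyang Government document is analyzed in Figure 7.
The traditional supervised method completely misses the key time entity "江". In the telegraphic convention of using rhyme characters for dates (韵目代日), "江" stands for the third day of the month. Traditional methods depend heavily on annotated data; if the training set has too few samples of "江" as a time entity, the model cannot recognize it. The zero-shot model, drawing on knowledge of telegrams and rhyme-character dates acquired from large-scale text, correctly labels "江" as a time entity, which shows strong knowledge transfer and zero-shot reasoning. In event recognition, the traditional method tends to cut out shorter, more specific phrases (such as "踞城死守" and "屡诱不出"), while the zero-shot large model identifies more complete event units that read as natural descriptions (such as "白朗军固守枣城"). The large model's understanding of language thus leans toward overall semantics rather than local pattern matching.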
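With RAG, the model recovers several entities that zero-shot missed: it identifies tactical locations such as "前敌" and "人烟稀处", adds relative time entities such as "数日" and "克日" alongside "江", and adds core military actions such as "克复" and "攻破". This is due to the Faiss vector store, which performs millisecond-level Top-3 recall over entity-removed RoBERTa vectors, locates the most similar annotated examples in the index, and activates the model's ability to generalize to entities of the same kind, markedly raising recall. RAG performance nevertheless depends heavily on retrieval quality: if the retrieved examples are imprecise or their coverage is insufficient, the gain is limited. In this case RAG recovers more entities, but its fine-grained entity typing is still imperfect. The experiment indicates that a high-precision Faiss index is a precondition for RAG to be effective.
After further LoRA domain fine-tuning, the model uses the retrieved examples much more efficiently. It outputs events such as "白朗军固守枣城" and "派重炮轰城", showing that it keeps semantic units intact and avoids over-segmentation. It also suppresses expressions such as "人烟稀处", "数日", "克日", and "天主教堂", showing that its decision threshold and confidence for domain entities have been calibrated. Through parameter adjustment and real-time example guidance, the model reaches its best balance of precision, recall, and F1 on Beiyang Government documents.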
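Addressing the high semantic density, strong entity ambiguity, and scarce annotation of Beiyang Government document resources, this study introduces the Faiss vector retrieval engine, encodes candidate texts as dense vectors with RoBERTa, and applies a debiasing strategy that computes cosine similarity after removing entities to build a dynamic example index for these documents. A dual-path strategy combining RAG and LoRA fine-tuning then yields an NER framework for Beiyang Government document resources. Comparative experiments cover both horizontal benchmarking and vertical ablation. On one hand, the framework is compared with the traditional deep learning models BERT-BiLSTM-CRF and RoBERTa-BiLSTM-CRF and with the large language models GPT4.1, Qwen3-4B, Llama-3.3-70B-Instruct, DeepSeek-R1, Baichuan-4B, and Xunzi-Qwen3-8B to identify the best NER method. On the other, three example retrieval strategies (zero-shot, random sampling, and similarity selection) are used to quantify the contribution of retrieval to suppressing hallucination and sharpening boundary decisions. The results show that similarity-based example retrieval combined with LoRA fine-tuning and RAG improves F1 on the self-built corpus by 0.419 over the base model, confirming the framework's clear advantage in bridging the domain gap and suppressing hallucination. By combining dynamic knowledge retrieval with static parameter optimization, the study converts unstructured Beiyang Government documents into structured knowledge units and provides a computable, reusable approach to intelligent organization for document resources with scarce annotation and strong entity ambiguity.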
References
[1] 中华人民共和国中央人民政府. 《数字中国建设2025年行动方案》近日印发[EB/OL]. [2026-02-24]. https://www.gov.cn/lianbo/bumen/202505/content_7024041.htm.
[2] 郑爽. 清末民初文言统一对阅读文化嬗变的影响[J]. 大学图书馆学报, 2018, 36(3): 57-64.
[3] Douze M, Guzhva A, Deng C, et al. The Faiss Library[EB/OL]. [2026-02-24]. https://arxiv.org/abs/2401.08281.
[4] Zhao P, Zhang H, Yu Q, et al. Retrieval-Augmented Generation for AI-Generated Content: A Survey[EB/OL]. [2026-02-24]. https://arxiv.org/abs/2402.19473v6.
[5] Hu E J, Shen Y, Wallis P, et al. LoRA: Low-Rank Adaptation of Large Language Models[EB/OL]. [2026-02-24]. https://arxiv.org/abs/2106.09685.
[6] Li J, Sun A, Han J, et al. A Survey on Deep Learning for Named Entity Recognition[J]. IEEE Transactions on Knowledge and Data Engineering, 2020, 34(1): 50-70.
[7] Ehrmann M, Hamdi A, Pontes E L, et al. Named Entity Recognition and Classification in Historical Documents: A Survey[J]. ACM Computing Surveys, 2023, 56(2): 1-47.
[8] Hu Z, Hou W, Liu X. Deep Learning for Named Entity Recognition: A Survey[J]. Neural Computing and Applications, 2024, 36(16): 8995-9022.
[9] 李纲, 潘荣清, 毛进, 等. 整合BiLSTM-CRF网络和词典资源的中文电子病历实体识别[J]. 现代情报, 2020, 40(4): 3-12, 58.
[10] Su J, Murtadha A, Pan S, et al. Global Pointer: Novel Efficient Span-based Approach for Named Entity Recognition[EB/OL]. [2026-02-24]. http://arxiv.org/abs/2208.03054.
[11] 梁佳, 张丽萍, 闫盛, 等. 基于大语言模型的命名实体识别研究进展[J]. 计算机科学与探索, 2024(10): 2594-2615.
[12] Wei X, Cui X, Cheng N, et al. ChatIE: Zero-Shot Information Extraction via Chatting with ChatGPT[EB/OL]. [2026-02-24]. https://arxiv.org/abs/2302.10205.
[13] 伊豪涵, 王昊, 周抒, 等. 基于RAG-LATS的古籍零样本命名实体识别方法[J]. 数据分析与知识发现, 2026, 10(1): 116-132.
[14] 范颜铄, 周晓英, 王克平, 等. 融合GPT技术和用户需求的文学类古籍资源关联数据发布研究——以《聊斋志异·司文郎》为例[J]. 现代情报, 2024, 44(10): 154-167.
[15] 杨建梁, 王一多, 黄美雯, 等. 基于大语言模型的红色档案资源交互式知识发现研究——以《南方局党史资料大事记》为例[J]. 图书情报工作, 2025, 69(15): 112-123.
[16] 宋雪雁, 张祥青, 张伟民. 水书习俗非物质文化遗产知识元组织与可视化研究[J]. 现代情报, 2023, 43(10): 3-15.
[17] 余池, 陈亮, 许海云, 等. 基于大语言模型的专利命名实体识别方法研究[J]. 数据分析与知识发现, 2025, 9(6): 47-62.
[18] Xu D, Chen W, Peng W, et al. Large Language Models for Generative Information Extraction: A Survey[EB/OL]. [2026-01-18]. https://arxiv.org/abs/2312.17617.
[19] 张颖怡, 章成志, 周毅, 等. 基于ChatGPT的多视角学术论文实体识别:性能测评与可用性研究[J]. 数据分析与知识发现, 2023, 7(9): 12-24.
[20] Liu X, Erkoyuncu J A, Fuh J Y H, et al. Knowledge Extraction for Additive Manufacturing Process via Named Entity Recognition with LLMs[J]. Robotics and Computer-Integrated Manufacturing, 2025(93): 102900.
[21] 刘耀文, 夏一雪, 张鹏, 等. 国家安全情报战略知识图谱构建与检索增强问答框架研究[J]. 情报杂志, 2025, 44(7): 165-173.
[22] 林立涛, 王东波, 刘江峰, 等. 数字人文视域下典籍动物命名实体识别研究——以SikuBERT预训练模型为例[J]. 图书馆论坛, 2022, 42(10): 42-50.
[23] 刘忠宝, 党建飞, 张志剑. 《史记》历史事件自动抽取与事理图谱构建研究[J]. 图书情报工作, 2020, 64(11): 116-124.
[24] 斯日古楞, 林民, 郭振东, 等. 基于提示学习和抽取式阅读理解的古籍礼仪实体关系联合抽取方法研究[J]. 数据分析与知识发现, 2025, 9(3): 147-159.
[25] 梁继红. 走向文本的历史档案数字整理:历史追溯与时代转型(下)[J]. 档案学通讯, 2022(1): 60-66.
[26] 张蓓. 数字人文视野下徽州文书档案开发利用研究[J]. 档案管理, 2022(2): 68-70.
[27] 钟远薪, 王蕾, 杨新涯, 等. 徽州文书文本化语音识别技术应用研究[J]. 图书馆论坛, 2023, 43(2): 49-56, 2.
[28] 施晓华, 王昕. 数字人文社会网络分析方法应用与研究[J]. 图书馆杂志, 2020, 39(5): 93-99.
[29] 汤萌, 陆星宇. 民间文书中账簿资源元数据模型与空间可视化应用研究[J]. 图书馆杂志, 2021, 40(12): 62-67.
[30] 王蕾, 薛玉, 肖鹏, 等. 民间历史文献数字人文图书馆构建——以徽州文书数字人文图书馆实践反思为例[J]. 图书馆论坛, 2018, 38(3): 30-36.
[31] 汤萌, 赵思渊. 民间文书的数字化建设与资源挖掘——以上海交通大学图书馆馆藏为中心[J]. 档案学通讯, 2020(6): 14-21.
[32] 姜育彦, 刘雪立. 数字人文视域下缩微资料的保护与新生——以Digital Cicognara Library为例[J]. 数字图书馆论坛, 2022(3): 47-52.
[33] 徐家成. 众包模式应用于民国档案数字化工作的设想[J]. 兰台世界, 2018(11): 35-38.
[34] 郭佳. 数字人文与人工智能融合视角下民间文书档案数字化流程重构思考[J]. 兰台世界, 2025(12): 109-112.
[35] 陈宇. 古籍数字人文平台对民国档案开放利用的借鉴意义[J]. 档案管理, 2021(3): 88-89, 91.
[36] 中国第二历史档案馆. 中华民国史档案资料汇编·第3辑[M]. 南京: 凤凰出版社, 1991.
[37] 钱实甫. 北洋政府职官年表[M]. 上海: 华东师范大学出版社, 1991.
[38] 立法院编译处. 中华民国法规汇编[M]. 上海: 中华书局, 1934.
[39] Team Qwen. Qwen3: Think Deeper, Act Faster[EB/OL]. [2026-02-24]. https://qwenlm.github.io/zh/blog/qwen3/.
[40] Touvron H, Lavril T, Izacard G, et al. LLaMA: Open and Efficient Foundation Language Models[EB/OL]. [2026-02-24]. https://arxiv.org/abs/2302.13971.
[41] OpenAI. Introducing GPT-4.1 in the API[EB/OL]. [2026-02-24]. https://openai.com/index/gpt-4-1/.
[42] DeepSeek-AI, Guo D, Yang D, et al. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning[EB/OL]. [2026-02-24]. https://arxiv.org/abs/2501.12948.
[43] Yang A, Xiao B, Wang B, et al. Baichuan 2: Open Large-scale Language Models[EB/OL]. [2026-02-24]. https://arxiv.org/abs/2309.10305.
[44] 南京农业大学. 荀子大语言模型[EB/OL]. [2026-02-24]. https://xunziallm.njau.edu.cn/.
Article information
DOI: 10.3969/j.issn.1008-0821.2026.03.004
  • Received: 2026-01-09
  • First online: 2026-06-05
  • Published: 2026-03-01
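Author information
    1. School of Business and Management, Jilin University, Changchun 130012, Jilin, China

Corresponding author:

Deng Jun (b. 1977), female, professor, PhD, doctoral supervisor. Research interests: digital information resource management, digital humanities and knowledge services, archival management and application.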