Article(id=1268884547455320084, tenantId=1146029695717560320, journalId=1268266580820377661, issueId=1268884383122494171, articleNumber=null, orderNo=null, doi=10.3969/j.issn.1001-4632.2026.02.20, pmid=null, cstr=null, oa=null, hot=null, price=null, onlineType=0, articleFormat=0, articleType=null, articleTypeStr=null, receivedDate=1750694400000, receivedDateStr=2025-06-24, revisedDate=1773244800000, revisedDateStr=2026-03-12, acceptedDate=null, acceptedDateStr=null, onlineDate=1780455250847, onlineDateStr=2026-06-03, pubDate=1772294400000, pubDateStr=2026-03-01, doiRegisterDate=null, doiRegisterDateStr=null, onlineIssueDate=1780455250847, onlineIssueDateStr=2026-06-03, onlineJustAcceptDate=null, onlineJustAcceptDateStr=null, onlineFirstDate=null, onlineFirstDateStr=null, sourceXml=null, magXml=null, createTime=1780455250847, creator=13701087609, updateTime=1780455250847, updator=13701087609, issue=Issue{id=1268884383122494171, tenantId=1146029695717560320, journalId=1268266580820377661, year='2026', volume='47', issue='2', pageStart='1', pageEnd='255', issueExtLink='null', onlineDate='null', pubDate='null', beforeIssueId=null, nextIssueId=null, price=null, status=1, issueComplete=1, articleOrder=1, issueType=1, specialIssue=null, createTime=1780455211667, creator=13701087609, updateTime=1780455310713, updator=13701087609, preIssue=null, nextIssue=null, ext={EN=IssueExt(id=1268884798719300557, tenantId=1146029695717560320, journalId=1268266580820377661, issueId=1268884383122494171, language=EN, specialIssueTitle=, coverIllustrator=null, specialIssueEditor=, specialIssueAbout=), CN=IssueExt(id=1268884798723494862, tenantId=1146029695717560320, journalId=1268266580820377661, issueId=1268884383122494171, language=CN, specialIssueTitle=, coverIllustrator=null, specialIssueEditor=, specialIssueAbout=)}, issueFiles=null}, startPage=232, endPage=243, ext={EN=ArticleExt(id=1268884548587782166, articleId=1268884547455320084, tenantId=1146029695717560320, 
journalId=1268266580820377661, language=EN, title=Research on the Application of Intelligent Recognition Technology for Sensitive Railway Ticket Data Based on Data Knowledge Base, columnId=null, journalTitle=China Railway Science, columnName=null, runingTitle=null, highlight=null, articleAbstract=

To address the data security risks arising from the explosive growth of railway passenger transport data, and to achieve intelligent identification and dynamic protection of sensitive information, an intelligent identification technology for sensitive railway ticket data based on a data knowledge base is proposed. First, a three-level knowledge base of "laws and regulations-industry standards-enterprise norms" is constructed. Second, drawing on historical railway ticket data, a multi-level intelligent identification algorithm for sensitive data is designed to identify sensitive information in multi-modal data efficiently and accurately. On this basis, graph technology is introduced to construct a lineage graph of data assets and sensitive data, and sensitive information labels are propagated efficiently among related data nodes according to the topological relationships of data flow. The results show that, for structured data, the proposed technology identifies sensitive information at about 217 000 records per second, roughly twice the rate of the traditional solution. For unstructured data, injecting a domain knowledge graph raises the F1 score of sensitive entity recognition to 91.24% and reduces the context misjudgment rate to 5.88%. For multimedia images, the accuracy of text extraction and sensitive information recognition reaches 93.71%. The technology can significantly improve the accuracy and processing efficiency of sensitive data identification in railway passenger tickets.
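
The label propagation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes that sensitivity labels flow downstream along data-flow edges of the lineage graph until nothing changes. The function name `propagate_labels`, the node names, and the `ID_NUMBER` tag are hypothetical.

```python
from collections import deque


def propagate_labels(edges, seed_labels):
    """Push sensitivity labels downstream along data-flow edges.

    edges:       iterable of (source, target) pairs in the lineage graph
    seed_labels: dict mapping a node to the labels found on it directly
    Returns a dict mapping every labelled node to its set of labels.
    """
    # Adjacency list: node -> nodes that receive data from it.
    downstream = {}
    for src, dst in edges:
        downstream.setdefault(src, []).append(dst)

    labels = {node: set(tags) for node, tags in seed_labels.items()}
    queue = deque(labels)  # start from the directly identified nodes

    while queue:
        node = queue.popleft()
        for child in downstream.get(node, []):
            current = labels.setdefault(child, set())
            missing = labels[node] - current
            if missing:
                # The child gained labels, so its own children need a revisit.
                current |= missing
                queue.append(child)
    return labels


if __name__ == "__main__":
    # Hypothetical lineage: an ID column flows into a report, then an export.
    lineage = [
        ("orders.id_number", "report.passenger_id"),
        ("report.passenger_id", "export.csv"),
        ("orders.train_no", "report.train_no"),
    ]
    result = propagate_labels(lineage, {"orders.id_number": {"ID_NUMBER"}})
    for node in sorted(result):
        print(node, sorted(result[node]))
```

Because label sets only grow and a node is re-queued only when it gains a label, the loop terminates even if the lineage graph contains cycles.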

, correspAuthors=null, authorNote=null, correspAuthorsNote=null, copyrightStatement=null, copyrightOwner=null, extLink=null, articleAbsUrl=null, sourceXml=null, magXml=null, pdfUrl=null, pdf=null, pdfFileSize=null, pdfExtLink=null, richHtmlUrl=null, mobilePdfUrl=null, reviewReport=null, pdfFirstPage=null, abstractGraph=null, abstractGraphContent=null, abstractVideo=null, citation=null, cebUrl=null, magXmlContent=null, mapNumber=null, authorCompany=null, fund=null, authors=null, authorsList=Xiaopei HAO, Zhiyuan YAN, Junfeng ZHANG, Wen LI, Xiangkun LIU, Ruijun SHI), CN=ArticleExt(id=1268884555785207859, articleId=1268884547455320084, tenantId=1146029695717560320, journalId=1268266580820377661, language=CN, title=基于数据知识库的铁路客票敏感数据智能识别技术应用研究, columnId=0, journalTitle=中国铁道科学, columnName=, runingTitle=null, highlight=null, articleAbstract=

为应对铁路客运数据规模激增衍生的数据安全风险,实现敏感信息智能识别与动态防护,提出基于数据知识库的铁路客票敏感数据智能识别技术。通过构建“法律法规—行业标准—企业规范”3级知识库,结合铁路客票历史数据,设计多层次敏感数据智能识别算法,实现对多模态数据中敏感信息的高效精准识别。在此基础上引入图技术,构建数据资产及敏感数据血缘关系图谱,依据数据间流转拓扑关系,完成敏感信息标签在相关数据节点间的高效传播。结果表明:所提技术在结构化数据处理方面,可实现约21.7万条 · s-1的敏感信息识别效率,约为传统方案的2倍;在非结构化数据处理方面,通过领域知识图谱注入,将敏感实体识别的F1值提升至91.24%,上下文误判率下降至5.88%;多媒体图片文本提取及敏感信息识别准确率达93.71%。该技术可显著提升铁路客票敏感数据识别的准确性与处理效率。

, correspAuthors=null, authorNote=null, correspAuthorsNote=
李雯(1987—),男,湖南邵阳人,副研究员。E-mail:
, copyrightStatement=null, copyrightOwner=null, extLink=null, articleAbsUrl=null, sourceXml=Mm8jy/mt1Gr3CZuuiCOQqA==, magXml=8CK1WmY/8UI6XoYiVUCVUQ==, pdfUrl=null, pdf=CfVXle2jW/rDpZkCjiUGHg==, pdfFileSize=1525818, pdfExtLink=null, richHtmlUrl=null, mobilePdfUrl=null, reviewReport=null, pdfFirstPage=null, abstractGraph=HeNmGk6RQioXXP5abmNlLA==, abstractGraphContent=null, abstractVideo=null, citation=null, cebUrl=null, magXmlContent=w4KFwFPGH0ZNkrngYt77yA==, mapNumber=null, authorCompany=null, fund=null, authors=

郝晓培(1990—),男,河南林州人,助理研究员。E-mail:

, authorsList=郝晓培, 阎志远, 张军锋, 李雯, 刘相坤, 石瑞君)}, authors=[Author(id=1268884556217221177, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, orderNo=0, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=linuxstar@126.com, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1268884556309495868, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, authorId=1268884556217221177, language=EN, stringName=Xiaopei HAO, firstName=Xiaopei, middleName=null, lastName=HAO, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=1.Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited, Beijing100081, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1268884556552765501, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, authorId=1268884556217221177, language=CN, stringName=郝晓培, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=1.中国铁道科学研究院集团有限公司 电子计算技术研究所,北京100081, bio={"content":"

郝晓培(1990—),男,河南林州人,助理研究员。E-mail:

"}, bioImg=null, bioContent=

郝晓培(1990—),男,河南林州人,助理研究员。E-mail:

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1268884556133335093, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, xref=1., ext=[AuthorCompanyExt(id=1268884556145918006, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, companyId=1268884556133335093, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1.Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited, Beijing100081, China), AuthorCompanyExt(id=1268884556154306615, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, companyId=1268884556133335093, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1.中国铁道科学研究院集团有限公司 电子计算技术研究所,北京100081)])]), Author(id=1268884556624068671, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, orderNo=1, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1268884556980584513, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, authorId=1268884556624068671, language=EN, stringName=Zhiyuan YAN, firstName=Zhiyuan, middleName=null, lastName=YAN, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=1.Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited, Beijing100081, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1268884557064470594, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, authorId=1268884556624068671, language=CN, stringName=阎志远, firstName=null, 
middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=1.中国铁道科学研究院集团有限公司 电子计算技术研究所,北京100081, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1268884556133335093, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, xref=1., ext=[AuthorCompanyExt(id=1268884556145918006, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, companyId=1268884556133335093, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1.Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited, Beijing100081, China), AuthorCompanyExt(id=1268884556154306615, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, companyId=1268884556133335093, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1.中国铁道科学研究院集团有限公司 电子计算技术研究所,北京100081)])]), Author(id=1268884557404209220, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, orderNo=2, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1268884557488095302, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, authorId=1268884557404209220, language=EN, stringName=Junfeng ZHANG, firstName=Junfeng, middleName=null, lastName=ZHANG, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=1.Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited, Beijing100081, China, bio=null, bioImg=null, bioContent=null, 
aboutCorrespAuthor=null), CN=AuthorExt(id=1268884557815251015, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, authorId=1268884557404209220, language=CN, stringName=张军锋, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=1.中国铁道科学研究院集团有限公司 电子计算技术研究所,北京100081, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1268884556133335093, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, xref=1., ext=[AuthorCompanyExt(id=1268884556145918006, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, companyId=1268884556133335093, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1.Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited, Beijing100081, China), AuthorCompanyExt(id=1268884556154306615, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, companyId=1268884556133335093, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1.中国铁道科学研究院集团有限公司 电子计算技术研究所,北京100081)])]), Author(id=1268884557907525705, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, orderNo=3, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=1556860343@qq.com, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1268884559618801739, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, authorId=1268884557907525705, language=EN, stringName=Wen LI, firstName=Wen, middleName=null, lastName=LI, prefix=null, suffix=null, authorComment=null, 
nameInitials=null, affiliation=null, department=null, xref=null, address=1.Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited, Beijing100081, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1268884559711076428, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, authorId=1268884557907525705, language=CN, stringName=李雯, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=1.中国铁道科学研究院集团有限公司 电子计算技术研究所,北京100081, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1268884556133335093, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, xref=1., ext=[AuthorCompanyExt(id=1268884556145918006, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, companyId=1268884556133335093, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1.Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited, Beijing100081, China), AuthorCompanyExt(id=1268884556154306615, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, companyId=1268884556133335093, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1.中国铁道科学研究院集团有限公司 电子计算技术研究所,北京100081)])]), Author(id=1268884559912403022, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, orderNo=4, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1268884560021454928, tenantId=1146029695717560320, 
journalId=1268266580820377661, articleId=1268884547455320084, authorId=1268884559912403022, language=EN, stringName=Xiangkun LIU, firstName=Xiangkun, middleName=null, lastName=LIU, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=1.Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited, Beijing100081, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1268884560331833425, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, authorId=1268884559912403022, language=CN, stringName=刘相坤, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=1.中国铁道科学研究院集团有限公司 电子计算技术研究所,北京100081, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1268884556133335093, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, xref=1., ext=[AuthorCompanyExt(id=1268884556145918006, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, companyId=1268884556133335093, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1.Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited, Beijing100081, China), AuthorCompanyExt(id=1268884556154306615, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, companyId=1268884556133335093, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1.中国铁道科学研究院集团有限公司 电子计算技术研究所,北京100081)])]), Author(id=1268884560386359379, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, orderNo=5, firstName=null, middleName=null, lastName=null, 
nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1268884560755458133, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, authorId=1268884560386359379, language=EN, stringName=Ruijun SHI, firstName=Ruijun, middleName=null, lastName=SHI, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=1.Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited, Beijing100081, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1268884560822566998, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, authorId=1268884560386359379, language=CN, stringName=石瑞君, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=1.中国铁道科学研究院集团有限公司 电子计算技术研究所,北京100081, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1268884556133335093, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, xref=1., ext=[AuthorCompanyExt(id=1268884556145918006, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, companyId=1268884556133335093, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1.Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited, Beijing100081, China), AuthorCompanyExt(id=1268884556154306615, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, companyId=1268884556133335093, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, 
remark=1.中国铁道科学研究院集团有限公司 电子计算技术研究所,北京100081)])])], keywords=[Keyword(id=1268884561200054359, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, language=EN, orderNo=1, keyword=Sensitive data), Keyword(id=1268884561279746136, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, language=EN, orderNo=2, keyword=Knowledge base), Keyword(id=1268884561590124633, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, language=EN, orderNo=3, keyword=Railway ticket), Keyword(id=1268884561678205018, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, language=EN, orderNo=4, keyword=Intelligent recognition), Keyword(id=1268884562022137947, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, language=EN, orderNo=5, keyword=Label propagation), Keyword(id=1268884562424791132, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, language=EN, orderNo=6, keyword=Lineage graph), Keyword(id=1268884564010238045, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, language=CN, orderNo=1, keyword=敏感数据), Keyword(id=1268884564106707038, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, language=CN, orderNo=2, keyword=知识库), Keyword(id=1268884564186398815, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, language=CN, orderNo=3, keyword=铁路客票), Keyword(id=1268884564521943136, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, language=CN, orderNo=4, keyword=智能识别), Keyword(id=1268884564584857697, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, language=CN, orderNo=5, keyword=标签传播), Keyword(id=1268884564945567842, tenantId=1146029695717560320, 
journalId=1268266580820377661, articleId=1268884547455320084, language=CN, orderNo=6, keyword=血缘关系图谱)], refs=[Reference(id=1268884569920012421, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2021, volume=42, issue=5, pageStart=162, pageEnd=173, url=null, language=null, rfNumber=[1], rfOrder=0, authorNames=单杏花, 张志强, 宁斐, journalName=中国铁道科学, refType=null, unstructuredReference=单杏花,张志强,宁斐,等.中国铁路电子客票关键技术应用与系统实现[J].中国铁道科学,2021,42(5):162-173., articleTitle=中国铁路电子客票关键技术应用与系统实现, refAbstract=null), Reference(id=1268884570008092806, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2021, volume=42, issue=5, pageStart=162, pageEnd=173, url=null, language=null, rfNumber=[1], rfOrder=1, authorNames=SHAN Xinghua, ZHANG Zhiqiang, NING Fei, journalName=China Railway Science, refType=null, unstructuredReference=SHAN Xinghua, ZHANG Zhiqiang, NING Fei, et al. Key Technology Application and System Implementation of China Railway Electronic Ticket [J]. China Railway Science, 2021, 42 (5): 162-173. 
(in Chinese), articleTitle=Key Technology Application and System Implementation of China Railway Electronic Ticket, refAbstract=null), Reference(id=1268884570087784583, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2024, volume=47, issue=23, pageStart=105, pageEnd=112, url=null, language=null, rfNumber=[2], rfOrder=2, authorNames=符泽凡, 姚竟发, 滕桂法, journalName=现代电子技术, refType=null, unstructuredReference=符泽凡,姚竟发,滕桂法.基于BERT模型的网站敏感信息识别及其变体还原技术研究[J].现代电子技术,2024,47(23):105-112., articleTitle=基于BERT模型的网站敏感信息识别及其变体还原技术研究, refAbstract=null), Reference(id=1268884570159087752, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2024, volume=47, issue=23, pageStart=105, pageEnd=112, url=null, language=null, rfNumber=[2], rfOrder=3, authorNames=FU Zefan, YAO Jingfa, TENG Guifa, journalName=Modern Electronics Technique, refType=null, unstructuredReference=FU Zefan, YAO Jingfa, TENG Guifa. Research on Website Sensitive Information Identification and Variant Restoration Technology Based on BERT Model [J]. Modern Electronics Technique, 2024, 47 (23): 105-112. 
(in Chinese), articleTitle=Research on Website Sensitive Information Identification and Variant Restoration Technology Based on BERT Model, refAbstract=null), Reference(id=1268884570259751049, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2016, volume=50, issue=9, pageStart=80, pageEnd=84, url=null, language=null, rfNumber=[3], rfOrder=4, authorNames=李扬, 潘泉, 杨涛, journalName=西安交通大学学报, refType=null, unstructuredReference=李扬,潘泉,杨涛.基于短文本情感分析的敏感信息识别[J].西安交通大学学报,2016,50(9):80-84., articleTitle=基于短文本情感分析的敏感信息识别, refAbstract=null), Reference(id=1268884570343637130, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2016, volume=50, issue=9, pageStart=80, pageEnd=84, url=null, language=null, rfNumber=[3], rfOrder=5, authorNames=LI Yang, PAN Quan, YANG Tao, journalName=Journal of Xi’an Jiaotong University, refType=null, unstructuredReference=LI Yang, PAN Quan, YANG Tao. Sensitive Information Recognition Based on Short Text Sentiment Analysis [J]. Journal of Xi’an Jiaotong University, 2016, 50 (9): 80-84. (in Chinese), articleTitle=Sensitive Information Recognition Based on Short Text Sentiment Analysis, refAbstract=null), Reference(id=1268884570427523211, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2019, volume=12, issue=4/5, pageStart=1385, pageEnd=1398, url=null, language=null, rfNumber=[4], rfOrder=6, authorNames=LI W P, WU H Y, YANG J, journalName=Discrete and Continuous Dynamical Systems-S, refType=null, unstructuredReference=LI W P, WU H Y, YANG J. Intelligent Recognition Algorithm for Social Network Sensitive Information Based on Classification Technology [J]. 
Discrete and Continuous Dynamical Systems-S, 2019, 12 (4/5): 1385-1398., articleTitle=Intelligent Recognition Algorithm for Social Network Sensitive Information Based on Classification Technology, refAbstract=null), Reference(id=1268884570507214988, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2019, volume=7, issue=null, pageStart=21527, pageEnd=21538, url=null, language=null, rfNumber=[5], rfOrder=7, authorNames=XU G, WU X, YAO H, journalName=IEEE Access, refType=null, unstructuredReference=XU G, WU X, YAO H, et al. Research on Topic Recognition of Network Sensitive Information Based on SW-LDA Model [J]. IEEE Access, 2019, 7: 21527-21538., articleTitle=Research on Topic Recognition of Network Sensitive Information Based on SW-LDA Model, refAbstract=null), Reference(id=1268884570582712461, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2019, volume=7, issue=null, pageStart=96177, pageEnd=96190, url=null, language=null, rfNumber=[6], rfOrder=8, authorNames=XU G, YU Z, CHEN Z, journalName=IEEE Access, refType=null, unstructuredReference=XU G, YU Z, CHEN Z, et al. Sensitive Information Topics-Based Sentiment Analysis Method for Big Data [J]. IEEE Access, 2019, 7: 96177-96190., articleTitle=Sensitive Information Topics-Based Sentiment Analysis Method for Big Data, refAbstract=null), Reference(id=1268884570674987150, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2007, volume=null, issue=null, pageStart=255, pageEnd=261, url=null, language=null, rfNumber=[7], rfOrder=9, authorNames=KRASSER S, TANG Y, GOULD J, journalName=null, refType=null, unstructuredReference=KRASSER S, TANG Y, GOULD J, et al. 
Identifying Image Spam Based on Header and File Properties Using C4.5 Decision Trees and Support Vector Machine Learning [C]// 2007 IEEE SMC Information Assurance and Security Workshop. New York: IEEE, 2007: 255-261., articleTitle=Identifying Image Spam Based on Header and File Properties Using C4.5 Decision Trees and Support Vector Machine Learning, refAbstract=null), Reference(id=1268884570750484623, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2018, volume=null, issue=null, pageStart=433, pageEnd=443, url=null, language=null, rfNumber=[8], rfOrder=10, authorNames=WANG W, WU Y, PALAIAHNAKOTE S, journalName=null, refType=null, unstructuredReference=WANG W, WU Y, PALAIAHNAKOTE S, et al. Cloud of Line Distribution for Arbitrary Text Detection in Scene/Video/License Plate Images [C]// Advances in Multimedia Information Processing-PCM 2017. Cham: Springer International Publishing, 2018: 433-443., articleTitle=Cloud of Line Distribution for Arbitrary Text Detection in Scene/Video/License Plate Images, refAbstract=null), Reference(id=1268884570817593488, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2017, volume=39, issue=11, pageStart=2298, pageEnd=2304, url=null, language=null, rfNumber=[9], rfOrder=11, authorNames=SHI B, BAI X, YAO C, journalName=IEEE Transactions on Pattern Analysis and Machine Intelligence, refType=null, unstructuredReference=SHI B, BAI X, YAO C. An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition [J]. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39 (11): 2298-2304., articleTitle=An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition, refAbstract=null), Reference(id=1268884570893090961, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2024, volume=null, issue=5, pageStart=28, pageEnd=30, url=null, language=null, rfNumber=[10], rfOrder=12, authorNames=汪庆, 陈杰, journalName=网络安全技术与应用, refType=null, unstructuredReference=汪庆,陈杰.深度包检测技术中的正则表达式匹配研究综述[J].网络安全技术与应用,2024(5):28-30., articleTitle=深度包检测技术中的正则表达式匹配研究综述, refAbstract=null), Reference(id=1268884570968588434, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2024, volume=null, issue=5, pageStart=28, pageEnd=30, url=null, language=null, rfNumber=[10], rfOrder=13, authorNames=WANG Qing, CHEN Jie, journalName=Network Security Technology & Application, refType=null, unstructuredReference=WANG Qing, CHEN Jie. A Review of Regular Expression Matching Research in Deep Packet Inspection Technology [J]. Network Security Technology & Application, 2024 (5): 28-30. 
(in Chinese), articleTitle=A Review of Regular Expression Matching Research in Deep Packet Inspection Technology, refAbstract=null), Reference(id=1268884571044085907, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2022, volume=44, issue=2, pageStart=581, pageEnd=590, url=null, language=null, rfNumber=[11], rfOrder=14, authorNames=姜海洋, 李雪菲, 杨晔, journalName=电子与信息学报, refType=null, unstructuredReference=姜海洋,李雪菲,杨晔.基于距离比较的AC自动机并行匹配算法[J].电子与信息学报,2022,44(2):581-590., articleTitle=基于距离比较的AC自动机并行匹配算法, refAbstract=null), Reference(id=1268884571111194772, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2022, volume=44, issue=2, pageStart=581, pageEnd=590, url=null, language=null, rfNumber=[11], rfOrder=15, authorNames=JIANG Haiyang, LI Xuefei, YANG Ye, journalName=Journal of Electronics & Information Technology, refType=null, unstructuredReference=JIANG Haiyang, LI Xuefei, YANG Ye, et al. Distance Comparison Based Parallel Pattern Matching [J]. Journal of Electronics & Information Technology, 2022, 44 (2): 581-590. 
(in Chinese), articleTitle=Distance Comparison Based Parallel Pattern Matching, refAbstract=null), Reference(id=1268884571195080853, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2025, volume=53, issue=7, pageStart=155, pageEnd=164, url=null, language=null, rfNumber=[12], rfOrder=16, authorNames=杨虹, 孟晓凯, 俞华, journalName=电力系统保护与控制, refType=null, unstructuredReference=杨虹,孟晓凯,俞华,等.基于BERT模型的主设备缺陷诊断方法研究[J].电力系统保护与控制,2025,53(7):155-164., articleTitle=基于BERT模型的主设备缺陷诊断方法研究, refAbstract=null), Reference(id=1268884571283161238, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2025, volume=53, issue=7, pageStart=155, pageEnd=164, url=null, language=null, rfNumber=[12], rfOrder=17, authorNames=YANG Hong, MENG Xiaokai, YU Hua, journalName=Power System Protection and Control, refType=null, unstructuredReference=YANG Hong, MENG Xiaokai, YU Hua, et al. Research on Primary Equipment Defect Diagnosis Method Based on the BERT Model [J]. Power System Protection and Control, 2025, 53 (7): 155-164. 
in Chinese, articleTitle=Research on Primary Equipment Defect Diagnosis Method Based on the BERT Model, refAbstract=null), Reference(id=1268884572939911319, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2020, volume=42, issue=11, pageStart=1433, pageEnd=1448, url=null, language=null, rfNumber=[13], rfOrder=18, authorNames=白志程, 李擎, 陈鹏, journalName=工程科学学报, refType=null, unstructuredReference=白志程,李擎,陈鹏,.自然场景文本检测技术研究综述[J].工程科学学报202042(11):1433-1448., articleTitle=自然场景文本检测技术研究综述, refAbstract=null), Reference(id=1268884573040574616, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2020, volume=42, issue=11, pageStart=1433, pageEnd=1448, url=null, language=null, rfNumber=[13], rfOrder=19, authorNames=BAI Zhicheng, LI Qing, CHEN Peng, journalName=Chinese Journal of Engineering, refType=null, unstructuredReference=BAI ZhichengLI QingCHEN Penget al. Text Detection in Natural Scenes: a Literature Review [J]. Chinese Journal of Engineering202042 (11): 1433-1448. in Chinese, articleTitle=Text Detection in Natural Scenes: a Literature Review, refAbstract=null), Reference(id=1268884573116072089, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2022, volume=null, issue=null, pageStart=498, pageEnd=517, url=null, language=null, rfNumber=[14], rfOrder=20, authorNames=KIM G, HONG T, YIM M, journalName=null, refType=null, unstructuredReference=KIM GHONG T, YIM M, et al. OCR-Free Document Understanding Transformer [C]// Computer Vision-ECCV 2022. 
Cham: Springer Nature Switzerland, 2022: 498-517., articleTitle=OCR-Free Document Understanding Transformer, refAbstract=null), Reference(id=1268884573204152474, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2021, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=[15], rfOrder=21, authorNames=RADFORD A, KIM J W, HALLACY C, journalName=ArXiv e-Prints, refType=null, unstructuredReference=RADFORD AKIM J WHALLACY Cet al. Learning Transferable Visual Models from Natural Language Supervision [J]. ArXiv e-Prints2021: arXiv:2103.00020 [cs. CV]., articleTitle=Learning Transferable Visual Models from Natural Language Supervision, refAbstract=null), Reference(id=1268884573279649947, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2012, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=[16], rfOrder=22, authorNames=ISLAM N S, RAHMAN M W, JOSE J, journalName=null, refType=null, unstructuredReference=ISLAM N SRAHMAN M WJOSE Jet al. High Performance RDMA-Based Design of HDFS over InfiniBand [C]// SC’12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. New York: IEEE, 2012., articleTitle=High Performance RDMA-Based Design of HDFS over InfiniBand, refAbstract=null), Reference(id=1268884573355147420, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2021, volume=null, issue=null, pageStart=135, pageEnd=146, url=null, language=null, rfNumber=[17], rfOrder=23, authorNames=MONDAL S, MUKHERJEE N, journalName=null, refType=null, unstructuredReference=MONDAL SMUKHERJEE N. Efficient NoSQL Graph Database for Storage and Access of Health Data [C]// Computer Communication, Networking and IoT. 
Singapore: Springer, 2021: 135-146., articleTitle=Efficient NoSQL Graph Database for Storage and Access of Health Data, refAbstract=null), Reference(id=1268884573418061981, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2024, volume=41, issue=1, pageStart=76, pageEnd=82, url=null, language=null, rfNumber=[18], rfOrder=24, authorNames=潘晓华, 金泳, 高扬华, journalName=计算机应用研究, refType=null, unstructuredReference=潘晓华,金泳,高扬华,.面向复杂数据审计需求的数据血缘构建方法[J].计算机应用研究202441(1):76-82., articleTitle=面向复杂数据审计需求的数据血缘构建方法, refAbstract=null), Reference(id=1268884573480976542, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2024, volume=41, issue=1, pageStart=76, pageEnd=82, url=null, language=null, rfNumber=[18], rfOrder=25, authorNames=PAN Xiaohua, JIN Yong, GAO Yanghua, journalName=Application Research of Computers, refType=null, unstructuredReference=PAN XiaohuaJIN YongGAO Yanghuaet al. Data Lineage Construction Method for Complex Data Audit Requirements [J]. Application Research of Computers202441 (1): 76-82. in Chinese, articleTitle=Data Lineage Construction Method for Complex Data Audit Requirements, refAbstract=null), Reference(id=1268884573569056927, tenantId=1146029695717560320, journalId=1268266580820377661, articleId=1268884547455320084, doi=null, pmid=null, pmcid=null, year=2025, volume=71, issue=2, pageStart=3930, pageEnd=3940, url=null, language=null, rfNumber=[19], rfOrder=26, authorNames=XIE Y R, journalName=IEEE Transactions on Consumer Electronics, refType=null, unstructuredReference=XIE Y R. Application of CRNN and OpenGL in Intelligent Landscape Design Systems Utilizing Internet of Things, Explainable Artificial Intelligence, and Drone Technology [J]. 
Figure captions:
Figure 1 Framework of the railway passenger sensitive information recognition system
Figure 2 Technical framework for building the railway sensitive information knowledge base
Figure 3 Example of the railway sensitive information knowledge base
Figure 4 Data processing pipeline of the multimodal sensitive information hybrid recognition engine
Figure 5 Framework of the fused recognition mechanism
Figure 6 Architecture of the sensitive information recognition model for unstructured data
Figure 7 Sensitive information recognition framework for multimedia image data
Figure 8 Data flow relationships in the railway passenger transport system (unit: records)
Figure 9 Data lineage graph
Figure 10 Test procedure for Scenario 3
Table 1 Sensitive information entities (partial)

| Name | Legal basis | Feature description | Typical example | Sensitivity |
| --- | --- | --- | --- | --- |
| ID card number | Personal Information Protection Law, Article 4 | Uniquely bound to a natural person's identity; a leak can lead to fraud and identity theft | 110105******123X | 1.0 |
| Mobile number | Cybersecurity Law, Article 41 | Can locate an individual when combined with other information; easily leads to harassing calls and phishing attacks | 138****1111 | 0.9 |
| Travel record | Data Security Law, Article 3 | Reflects a person's movement trajectory; a leak may threaten personal safety | 20250423, high-speed train G101, Beijing→Shanghai | 1.0 |
| Payment information | Cybersecurity Law, Article 21 | Directly tied to the security of funds; a leak may cause financial loss | Bank card number 6217****1234 | 1.0 |
| Contact address | Personal Information Protection Law, Article 6 | A leaked home address easily creates personal safety risks | ** Road, Haidian District, Beijing | 0.8 |
| Information on minors | Personal Information Protection Law, Article 29 | Information on persons under 14 receives special protection; a leak is more harmful | Child's date of birth, guardian's contact details | 1.0 |
| Prohibited-word lexicon | Laws and regulations / social and cultural norms | Terrorism and prohibited content, pornographic text, politically sensitive content, vulgar abuse, etc. | | 1.0 |
Table 2 Typical sensitive information fields and regular expressions

Field type: ID card number
Regular expression: /^[1-6]\d{5}(18|19|20)\d{2}(0[1-9]|1[0-2])(0[1-9]|[12]\d|3[01])\d{3}[\dXx]$/

Field type: Mobile number
Regular expression: ^(?:(?:\+|00)86)?1(?:3\d|4[5-79]|5[0-35-9]|6[5-7]|7[0-8]|8\d|9[189])\d{8}$

Field type: E-mail address
Regular expression: ^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\.[a-zA-Z]{2,}$
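The patterns in Table 2 can be exercised directly. The following is a minimal Python sketch, not the authors' implementation: the patterns are copied from the table (with the JavaScript-style `/` delimiters of the ID pattern dropped), while the dictionary keys, the function name `classify_field`, and the sample values are illustrative assumptions.

```python
import re

# Patterns copied from Table 2. The "/" delimiters around the ID card pattern
# in the source are regex-literal syntax and are omitted here.
# The keys ("id_card", "mobile", "email") are hypothetical labels.
PATTERNS = {
    "id_card": re.compile(
        r"^[1-6]\d{5}(18|19|20)\d{2}(0[1-9]|1[0-2])(0[1-9]|[12]\d|3[01])\d{3}[\dXx]$"
    ),
    "mobile": re.compile(
        r"^(?:(?:\+|00)86)?1(?:3\d|4[5-79]|5[0-35-9]|6[5-7]|7[0-8]|8\d|9[189])\d{8}$"
    ),
    "email": re.compile(
        r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
        r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\.[a-zA-Z]{2,}$"
    ),
}


def classify_field(value: str) -> list[str]:
    """Return the labels of all Table 2 patterns that match the whole field value."""
    return [name for name, pattern in PATTERNS.items() if pattern.match(value)]
```

Because every pattern is anchored with `^` and `$`, this style of check suits structured fields that hold a single value; finding such values inside free text would need unanchored variants, which the table does not give.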
Table 3 Intent recognition results

| Transformer layers | Focus of feature abstraction |
| --- | --- |
| 1-4 | Local phrase associations: "G3" with "Shanghai→Beijing West"; the correspondence between "ID card" and the two ID numbers; the neighboring association among "student ticket", "child ticket", and "adult fare". |
| 5-8 | Cross-sentence logical associations: the contradiction in "purchased 2 ... but issued at the adult fare"; the binding of "contact number" to "199XXXX8765"; the contrast between "could previously purchase normally" and the current anomaly. |
| 9-12 | Overall intent: "asking why the student ticket purchase is abnormal, and providing identity information and contact details". Core entities: global localization of key information such as ID card number, mobile number, train number, time, and boarding and alighting stations. |
Table 4 Validation results of the sensitive information recognition model for structured data

| Scheme | Dynamic rule extension time/ms | Processing time for one million records/s |
| --- | --- | --- |
| Regular expression | - | 18.0 |
| Traditional AC automaton | 202 | 10.2 |
| Improved AC automaton | 165 | 8.7 |
| Two-stage recognition algorithm | 127 | 4.6 |
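The throughput quoted in the abstract follows from the processing times in Table 4. The short Python check below is only an arithmetic illustration; the variable and function names are assumptions, and the times are those reported in the table for one million records.

```python
# Processing time in seconds for one million records, as reported in Table 4.
SECONDS_PER_MILLION = {
    "regular expression": 18.0,
    "traditional AC automaton": 10.2,
    "improved AC automaton": 8.7,
    "two-stage recognition": 4.6,
}

RECORDS = 1_000_000


def throughput(seconds: float) -> float:
    """Records processed per second for a run over RECORDS records."""
    return RECORDS / seconds


# Records per second for each scheme.
rates = {name: throughput(t) for name, t in SECONDS_PER_MILLION.items()}

# Speed-up of the two-stage algorithm over the traditional AC automaton.
speedup_vs_traditional = (
    SECONDS_PER_MILLION["traditional AC automaton"]
    / SECONDS_PER_MILLION["two-stage recognition"]
)
```

Under these figures the two-stage algorithm processes roughly 217,000 records per second and is a little over twice as fast as the traditional AC automaton, consistent with the abstract.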
Table 5 Validation results of the sensitive information recognition model for unstructured data

| Model | Sensitive entity F1/% | Sensitive information context misjudgment rate/% |
| --- | --- | --- |
| Original BERT-base | 81.22 | 13.22 |
| BERT + rule-based post-processing | 84.27 | 10.21 |
| General BERT | 87.92 | 6.73 |
| Knowledge-enhanced BERT | 91.24 | 5.88 |
Table 6 Comparison of model F1 values on the robustness test set

| Model | F1 |
| --- | --- |
| Original BERT-base | 76.45 |
| BERT + rule-based post-processing | 80.12 |
| General BERT | 84.05 |
| Knowledge-enhanced BERT | 88.66 |
Table 7 Model validation results

| Model | Character accuracy/% | Field completeness/% | Performance degradation on blurred images/% | Average inference time/ms |
| --- | --- | --- | --- | --- |
| CRNN | 80.94 | 62.41 | -21.71 | 91 |
| SATRN | 87.13 | 64.63 | -16.24 | 172 |
| OCR-Transformer | 93.71 | 91.12 | -8.32 | 128 |
Research on the Application of Intelligent Recognition Technology for Sensitive Railway Ticket Data Based on Data Knowledge Base
Xiaopei HAO, Zhiyuan YAN, Junfeng ZHANG, Wen LI, Xiangkun LIU, Ruijun SHI
China Railway Science | 2026, 47(2): 232-243
Author information
  • 1. Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China
  • Xiaopei HAO (born 1990), male, from Linzhou, Henan; assistant research fellow. E-mail:

Corresponding author:

Wen LI (born 1987), male, from Shaoyang, Hunan; associate research fellow. E-mail:
Published: 2026-03-01 doi: 10.3969/j.issn.1001-4632.2026.02.20

To address the data security risks arising from the rapid growth of railway passenger transport data, and to achieve intelligent identification and dynamic protection of sensitive information, an intelligent identification technology for sensitive railway ticket data based on a data knowledge base is proposed. A three-level knowledge base of laws and regulations, industry standards, and enterprise norms is constructed and, combined with historical railway ticket data, used to design a multi-level intelligent identification algorithm that efficiently and accurately identifies sensitive information in multimodal data. On this basis, graph technology is introduced to construct a lineage graph of data assets and sensitive data, and sensitive information labels are propagated efficiently among related data nodes according to the topology of data flows. The results show that, for structured data, the proposed technology identifies sensitive information at about 217,000 records per second, roughly twice the rate of the traditional solution; for unstructured data, injecting a domain knowledge graph raises the F1 value of sensitive entity recognition to 91.24% and lowers the context misjudgment rate to 5.88%; for multimedia images, the accuracy of text extraction and sensitive information recognition reaches 93.71%. The technology can significantly improve the accuracy and processing efficiency of sensitive data identification for railway tickets.

Sensitive data  /  Knowledge base  /  Railway ticket  /  Intelligent recognition  /  Label propagation  /  Lineage graph
Xiaopei HAO, Zhiyuan YAN, Junfeng ZHANG, Wen LI, Xiangkun LIU, Ruijun SHI. Research on the Application of Intelligent Recognition Technology for Sensitive Railway Ticket Data Based on Data Knowledge Base[J]. China Railway Science, 2026, 47(2): 232-243. DOI: 10.3969/j.issn.1001-4632.2026.02.20
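As part of China's critical information infrastructure [1], the railway ticketing system provides travel services to passengers and is also responsible for protecting massive volumes of passenger data. Railway passenger traffic has kept rising in recent years: in 2025, national railway passenger trips exceeded 4.5 billion and daily ticket sales exceeded 20 million. Ticketing data has expanded accordingly and covers passengers' basic personal information, travel information, payment information, and more. These data matter greatly for optimizing railway operations, improving service quality, designing products, and advancing supply-side reform; as China's data-element policies mature, their effective use and circulation can also raise transport efficiency and promote multimodal transport and cross-industry coordination. However, ticketing data contains a large amount of sensitive information. A leak would not only expose passengers' personal information to abuse and cause serious privacy harm, but could also trigger a crisis of public trust in the railway sector, damage its reputation, and thus affect the sustainable development of railway passenger transport. Countries have been strengthening legislation and supervision on data security and privacy protection. The European Union's General Data Protection Regulation sets strict standards for data controllers' accountability, security measures, and breach response. China has successively promulgated the Cybersecurity Law, the Data Security Law, the Personal Information Protection Law, and other laws and regulations that define security responsibilities and compliance requirements in data processing, placing higher demands on the protection of sensitive data in the railway ticketing system. Against this background, identifying sensitive information in the ticketing system's text and multimedia data efficiently and accurately is a solid foundation for turning data into assets, desensitizing sensitive information safely, applying differentiated tiered controls, and releasing value in data-element circulation.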
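Sensitive information identification in text data has been studied extensively. Fu et al. [2] proposed a website sensitive information identification method based on Bidirectional Encoder Representations from Transformers (BERT) combined with a variant-character restoration algorithm, offering a new approach for web content. Li et al. [3] defined 2 639 sensitive keywords in 5 categories (pornography, violence, contraband, cults, and reactionary content) and built an identification method that jointly analyzes sensitive keywords and sentiment polarity. Li et al. [4] extracted support vectors from sensitive information features to train a Support Vector Machine (SVM), improving the detection speed and accuracy of online sensitive information identification. Xu et al. [5-6] incorporated topic clustering: they built a topic identification method for online sensitive information based on weighted latent Dirichlet allocation, fused the resulting topic information with text feature vectors encoded by a Bidirectional Recurrent Neural Network (Bi-RNN), and used an attention mechanism to compute weights for sensitive information identification.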
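Methods for detecting sensitive text in images fall into 2 categories: detection based on traditional visual features and detection based on image text features. Krasser et al. [7] focused on the edge direction coherence vector, scale-invariant feature transform features, and color histograms of images, using these visual features as the discrimination criteria of a linear SVM classifier. Wang et al. [8] combined the cloud of line distribution with the maximally stable extremal regions algorithm to extract text regions, an approach that still detects well under low light. With the continuing development of Optical Character Recognition (OCR), the CRNN algorithm [9] and attention-based text recognition algorithms have become mainstream. CRNN first extracts spatial features of the image with a convolutional neural network, then uses a recurrent neural network to convert the spatial features into sequence features, and finally outputs the corresponding text, which improves recognition performance. Attention-based algorithms introduce an attention weight matrix and dynamically compute the importance of each element from the features of the input sequence, achieving more precise text recognition.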
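This paper proposes an intelligent identification technology for sensitive railway ticket data based on a data knowledge base. Drawing on railway-domain industry dictionaries and data security laws and regulations, legal provisions are semantically parsed and formally mapped to railway passenger transport business rules, forming a knowledge base with 3 linked layers: "laws and regulations-industry standards-enterprise norms". Weighing efficiency against accuracy for each data type, a multi-modal hybrid sensitive information identification engine is designed. Structured data are handled by a dual-mode matching algorithm that combines regular expressions with an improved Aho-Corasick automaton; unstructured text is handled by a knowledge-enhanced BERT model for context-sensitive semantic awareness; multimedia image data are handled by an OCR-Transformer model that extracts text for sensitive information identification. Finally, sensitive data labels are propagated along data lineage, yielding the topological relationships and permission evolution paths of sensitive data as they flow across systems. This addresses the difficulty of putting compliance into practice, low identification accuracy, and weak cross-domain traceability in railway data security governance.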
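To meet the security and compliance requirements of multi-modal railway passenger data flowing across scenarios, the identification system follows a "knowledge-driven, intelligent identification, lineage relationship" approach: a dynamic railway knowledge base is built, a multi-level "rules + AI" collaborative sensitive information identification model system is developed, and data lineage is used to carry sensitive information labels efficiently through the full chain from identification to circulation and processing. The framework of the railway passenger sensitive information identification system is shown in Fig. 1. First, ontology modeling and semantic parsing convert national laws and regulations, industry specifications, and internal enterprise governance standards into a structured knowledge base. The knowledge base contains core logic such as sensitive information categories, data classification and grading, and railway passenger transport terminology, and is updated dynamically through a "dual-drive" (scheduled + manual) mechanism to support multi-dimensional retrieval of compliance information. Second, railway passenger data include structured data, unstructured text, and multimedia images, so a dedicated identification model is needed for each type. Structured data use regular expressions and a predefined sensitive field library for efficient, precise, millisecond-level identification; unstructured text uses a BERT model to capture sensitive information implied by context; multimedia images combine object detection with OCR text extraction to identify and extract the text they contain automatically. Finally, metadata and operation logs are collected and parsed automatically at every stage of the data life cycle (collection, storage, integration, use and presentation, analysis and application, archiving, and destruction) to build a sensitive data flow graph, and sensitive data labels are propagated dynamically according to the rules in the knowledge base.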
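In summary, a dynamic collaboration mechanism between the knowledge base and the multi-modal identification engine can effectively cover the blind spots of traditional rule engines. Data lineage analysis yields the cross-system flow topology of sensitive ticketing data and integrates previously scattered sensitive data management into a coherent management network, forming a standardized sensitive information identification framework that balances security and efficiency.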
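The railway ticketing data knowledge base is built on laws, standards, and specifications. Laws and regulations include the Cybersecurity Law, the Data Security Law, and the Personal Information Protection Law; industry standards include the Guidelines for Data Security Risk Assessment in Transportation (《交通运输数据安全风险评估指南》); enterprise norms include the Regulations on Railway Passenger Transport (《铁路旅客运输规程》) and the Railway Personal Information Protection Standard (《铁路个人信息保护标准》). The DeepSeek large model extracts entities from these documents, for example keywords such as "important data", "sensitive personal information", and "travel records", annotates the relations among them, and structures the semantics of each provision into a knowledge base list that models can read directly. A dual-drive update mechanism of "manual collection + real-time monitoring and crawling" obtains updates automatically from platforms that publish laws and regulations, and the knowledge base is updated after review by compliance experts, which keeps it timely and accurate. The technical framework is shown in Fig. 2.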
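The relevant requirements are mainly unstructured text. First, the documents are preprocessed and split into semantic chunks with clear semantic boundaries. Next, the DeepSeek large language model, which has strong language understanding and reasoning capabilities, takes the semantic chunks of the original documents and a customized prompt template as input and outputs the extracted core information in JSON format, which is parsed into structured data and stored. Finally, the structured semantic chunks are parsed one by one, and an incremental mechanism based on vector retrieval and LLM consistency judgment extends the knowledge base dynamically. The mechanism has 3 core steps: ① semantic extraction, in which DeepSeek and customized prompts extract the target information (entities or relations) in JSON format from each semantic chunk; ② vector representation, in which the extracted results are embedded to obtain vectors; ③ double check, in which vector matching (cosine similarity) first screens for duplicates and an LLM consistency judgment then confirms them, so that only new information is merged into the global set.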
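The double check in step ③ can be illustrated with a minimal sketch. It assumes a toy character-bucket embedding and a string-normalizing judge as stand-ins for the real embedding model and the LLM consistency judgment; the names `toy_embed`, `toy_judge`, `merge_increment`, and the 0.9 threshold are hypothetical and not taken from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def toy_embed(text, dim=16):
    """Hypothetical stand-in for an embedding model: character bucket counts."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    return vec

def toy_judge(a, b):
    """Hypothetical stand-in for the LLM consistency judgment."""
    return a.strip().lower() == b.strip().lower()

def merge_increment(global_set, increment, embed, is_same, threshold=0.9):
    """Merge only new items into global_set, a list of (item, vector) pairs.

    Stage 1 screens candidates by cosine similarity; stage 2 asks the judge
    to confirm whether a candidate really is a duplicate.
    """
    added = []
    for item in increment:
        vec = embed(item)
        candidates = [known for known, known_vec in global_set
                      if cosine_similarity(vec, known_vec) >= threshold]
        if any(is_same(item, known) for known in candidates):
            continue  # confirmed duplicate, do not merge
        global_set.append((item, vec))
        added.append(item)
    return added

entities = []
merge_increment(entities, ["ID card number", "travel record"], toy_embed, toy_judge)
new_items = merge_increment(entities, ["id card number ", "payment information"],
                            toy_embed, toy_judge)
```

Running the judge only on candidates that pass the vector screen keeps the number of expensive consistency calls small, which is presumably the motivation for the two-stage design.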
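On this basis, the mechanism maintains entities and relations dynamically. For entities, the initialization stage extracts an initial entity set from the first document, embeds it, and writes it to a vector database as the full entity set; each later document yields an incremental entity set in the same way, and only entities that the double check confirms as new are merged, which extends the set dynamically. For relations, the JSON-format semantic chunks and the global entity set are fed to DeepSeek together, and customized prompts drive triple extraction. Triples generated during initialization are written to the vector database as the global relation base. The incremental local triples extracted from each later semantic chunk go through the double check, duplicate relations are merged automatically, the relation type system is extended, and the result is written to a graph database to form the compliance knowledge graph.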
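An example of the railway sensitive information knowledge base is shown in Fig. 3. Taking the ID card number as an example, the knowledge base establishes that it is sensitive personal information and presents in full its processing logic, identification rules, and so on in the context of protecting sensitive personal information in the ticketing system. The sensitive information entities generated from the knowledge base are listed in Table 1. In line with compliance requirements, entity weights are maintained dynamically according to how often and how prominently each entity appears in laws and regulations.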
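The multi-modal hybrid sensitive information identification engine identifies sensitive information in the ticketing system's structured data, unstructured text, and multimedia image data, with an identification algorithm designed for the characteristics of each modality. Its data processing pipeline is shown in Fig. 4 and has 2 stages: model training and sensitive information identification. In the training stage, data in the ticketing system are preprocessed and classified, the knowledge base is queried for sensitive information entities (including content, samples, and identification rules), and the preprocessed data and entity information are fed to the corresponding identification algorithm for training. This yields an identification model for each sensitive information entity on each data type, and the model parameters are stored in a database for the real-time identification stage. In the identification stage, the engine loads the trained models, inspects newly generated multi-modal data in real time, and outputs the detection results.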
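The ticketing system currently holds sensitive information such as registration data, frequent contacts, and itinerary trajectories. These are stored in databases and business logs and are identified with regular expressions and an improved AC automaton algorithm.
1) Regular expressions
A regular expression defines a text matching rule by combining ordinary characters with metacharacters, and can identify structured sensitive fields such as ID card numbers, mobile phone numbers, and e-mail addresses. Its advantages are flexible rule definition and no need for complex model training. Its limitations are that computational efficiency falls nonlinearly as data volume grows, semantic generalization is weak, and maintenance costs are high, so it is better suited to small-scale scenarios with explicit rules. Typical sensitive information fields and their regular expressions are listed in Table 2.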
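As Table 2 is not reproduced here, the following sketch shows what such rules might look like. The three patterns are illustrative assumptions (an 18-character resident ID number, an 11-digit mainland mobile number, and a simplified e-mail address), not the expressions used by the authors; the sample values are fictitious.

```python
import re

# Illustrative patterns only; the actual rules of Table 2 are not shown in the text.
PATTERNS = {
    # 6-digit region code, 8-digit birth date, 3-digit sequence, check character
    "id_card": re.compile(
        r"(?<!\d)[1-9]\d{5}(?:19|20)\d{2}(?:0[1-9]|1[0-2])"
        r"(?:0[1-9]|[12]\d|3[01])\d{3}[\dXx](?!\d)"
    ),
    # 11 digits starting with 1 and a second digit in 3-9
    "mobile": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    # Simplified e-mail shape, not a full RFC 5322 validator
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}

def find_sensitive(text):
    """Return (label, matched_text) pairs for every pattern hit in text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits

sample = "Contact 13812345678 or a.b@example.com, ID 320102199001011234."
hits = find_sensitive(sample)
```

The digit lookarounds keep the mobile pattern from firing inside a longer digit run such as an ID number, which is one reason hand-written rule sets become costly to maintain as they grow.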
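2) Improved AC automaton algorithm
To speed up identification, the sensitive data pattern strings in the knowledge base are preloaded into the Trie of the improved AC automaton [11]. The Trie has a layered structure: adding a pattern string updates only the nodes of the corresponding layer, path marks record branch changes, and a character-level hash table maps each node to its children, so that each child lookup during matching takes O(1) time. For example, when matching the ID card number prefix "320", a traditional AC automaton has to backtrack along failure pointers several times ("0"→"2"→"3"), whereas the improved algorithm jumps directly to the corresponding child node based on the length of the prefix already matched, which reduces redundant computation.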
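For orientation, the sketch below implements the classic Aho-Corasick automaton with a per-node hash map (a Python `dict`) for child lookup. It is the textbook baseline, not the authors' improved variant: the layered incremental update and the prefix-length jump described above are not reproduced, and the class and pattern names are illustrative.

```python
from collections import deque

class AhoCorasick:
    """Textbook Aho-Corasick automaton with dict-based child maps."""

    def __init__(self, patterns):
        self.goto = [{}]   # state -> {char: next_state}
        self.fail = [0]    # failure link per state
        self.out = [[]]    # patterns that end at each state
        for pattern in patterns:
            self._insert(pattern)
        self._build_failure_links()

    def _insert(self, pattern):
        state = 0
        for ch in pattern:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = nxt
            state = nxt
        self.out[state].append(pattern)

    def _build_failure_links(self):
        # Breadth-first order guarantees a state's failure target is final
        # before the state's own children are processed.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text):
        """Return (start_index, pattern) for every occurrence in text."""
        state, hits = 0, []
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                hits.append((i - len(pattern) + 1, pattern))
        return hits

automaton = AhoCorasick(["320", "20", "passport"])
matches = automaton.search("id 320102 passport")
```

All patterns are matched in a single pass over the text, so the scan cost grows with the text length and the number of hits rather than with the number of patterns.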
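3) Algorithm fusion
To balance identification efficiency against manual maintenance cost, regular expressions are fused with the improved AC automaton. The overall framework is shown in Fig. 5. For sensitive fields with a fixed format, regular expressions first filter out invalid data quickly. The pre-screened data are then fed to the improved AC automaton, which uses hash-accelerated state transitions and a longest-prefix jump strategy to match multiple sensitive data pattern strings, completing sensitive information identification for structured data. Tests on railway passenger structured data show that, on a data set of tens of millions of records, the mechanism takes only 35 ms to match every 10 000 records, which is markedly more efficient than traditional regular expression matching, and it supports dynamic extension of the sensitive field library.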
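To identify sensitive information implied in railway unstructured text, an identification algorithm is designed. It first extracts short texts from the unstructured text data and applies a knowledge-enhanced BERT short-text sensitive information classification algorithm [12]. It then fuses the passenger transport domain knowledge in the knowledge base with BERT, using a knowledge adapter to integrate the sensitive information dictionary of the railway knowledge base into the lower layers of BERT, which achieves knowledge enhancement. The algorithm has the following 2 core modules.
1) BERT base model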
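First, the BERT model is pre-trained on information from the knowledge base, and the self-attention mechanism captures the co-occurrence of entities in text. For example, the vector of "ID card" is brought closer in high-dimensional space to the vectors of "certificate" and "sensitive personal information", so the model learns that "ID card" belongs to "certificate" and "sensitive personal information". The architecture of the unstructured data sensitive information identification model is shown in Fig. 6. Taking a 12306 customer service dialogue to be inspected as an example, the system first applies 3 embedding layers that turn discrete text tokens into continuous vectors preserving word meaning, position, and segment information. $T_{\mathrm{Embedding}}$ maps each word to a vector that captures general semantics; $S_{\mathrm{Embedding}}$ distinguishes text segments (the example in Fig. 6 is a single segment, so its vectors are all 0); $P_{\mathrm{Embedding}}$ encodes word positions as a position vector table that models token order, computed as
$$P_{p,2i}=\sin\left(\frac{p}{10\,000^{2i/d_{\mathrm{model}}}}\right)$$
$$P_{p,2i+1}=\cos\left(\frac{p}{10\,000^{2i/d_{\mathrm{model}}}}\right)$$
where $p$ is the position index of the token in the sequence, taking values $0, 1, \dots, N-1$; $i$ is the dimension index; $P_{p,2i}$ and $P_{p,2i+1}$ are the position encoding values of the $p$-th token in the even and odd dimensions, respectively; $d_{\mathrm{model}}$ is the hidden dimension.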
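The two formulas can be checked numerically with a few lines of standard-library Python. This is a direct transcription of the sinusoidal encoding under the symbols defined above; the function name and the small sizes used are illustrative choices.

```python
import math

def positional_encoding(num_positions, d_model):
    """Build the position table P with P[p][2i] = sin(.) and P[p][2i+1] = cos(.)."""
    table = []
    for p in range(num_positions):
        row = [0.0] * d_model
        for even_dim in range(0, d_model, 2):      # even_dim plays the role of 2i
            angle = p / (10000 ** (even_dim / d_model))
            row[even_dim] = math.sin(angle)
            if even_dim + 1 < d_model:
                row[even_dim + 1] = math.cos(angle)
        table.append(row)
    return table

pe = positional_encoding(num_positions=3, d_model=4)
```

Position 0 always encodes to alternating 0 and 1, and higher dimensions vary more slowly with position, which is what lets the model distinguish both nearby and distant tokens.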
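Summing the 3 embeddings gives the initial feature matrix $E$:
$$E=T_{\mathrm{Embed}}+S_{\mathrm{Embed}}+P_{\mathrm{Embed}}$$
where $T_{\mathrm{Embed}}$ is the token embedding matrix, which maps each token to a vector representation that captures general semantics; $S_{\mathrm{Embed}}$ is the segment embedding matrix, which distinguishes text segments; $P_{\mathrm{Embed}}$ is the position embedding matrix, which represents the position of each token in the sequence.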
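Next, the self-attention mechanism in the multi-layer Transformer dynamically computes how strongly each word relates to every other word, which captures long-distance dependencies, and the stacked Transformer layers abstract the semantic features of the text layer by layer. Each Transformer layer encodes and updates the output of the previous layer:
$$H_l=f_{\mathrm{Transformer}}(H_{l-1}),\qquad l=1, 2, 3, \dots, L$$
where $H_l$ is the output of layer $l$; $H_{l-1}$ is the output of layer $l-1$; $L$ is the total number of layers.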
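$H_{l-1}$ is linearly transformed into the query vector $Q$, the key vector $K$, and the value vector $V$, and the attention score $A(Q,K,V)$ is computed as
$$A(Q,K,V)=f_{\mathrm{Softmax}}\left(\frac{QK^{\mathrm{T}}}{\sqrt{d_k}}\right)V$$
where $d_k$ is the scaling factor (the dimension of the key vectors); $f_{\mathrm{Softmax}}(\cdot)$ is the normalization function.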
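A small worked instance of this formula, written with plain Python lists so that it needs no external libraries. It covers a single attention head only; the function names and the toy matrices are illustrative.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    output = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        output.append([sum(w * v[c] for w, v in zip(weights, V))
                       for c in range(len(V[0]))])
    return output

# One query that matches both keys equally, so the two values are averaged.
result = attention(Q=[[1.0, 0.0]], K=[[1.0, 0.0], [1.0, 0.0]], V=[[2.0, 0.0], [0.0, 2.0]])
```

Because the query scores both keys identically, the softmax weights are 0.5 each and the output is the mean of the two value vectors.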
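This process is executed 12 times in parallel. The concatenated result goes through a linear transformation and then through a residual connection and layer normalization to complete feature fusion. The combined computation is
$$A_o=L\big(M(Q,K,V)+H_{l-1}\big)$$
where
$$L(x)=\alpha\odot\frac{x-\mu}{\sqrt{\delta^{2}+\varepsilon}}+\beta$$
where $M(Q,K,V)$ is the output of multi-head attention; $L(\cdot)$ is the layer normalization operation; $\mu$ and $\delta$ are the mean and standard deviation over the feature dimension; $\alpha$ and $\beta$ are the scaling and shift parameters over the feature dimension; $\varepsilon$ is a numerical stability term, usually a very small constant; $\odot$ denotes the element-wise product.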
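The residual connection followed by layer normalization can be sketched for a single feature vector as follows. The helper names and the value `eps=1e-5` are assumptions for illustration; the multi-head attention output is passed in as a plain list.

```python
import math

def layer_norm(x, alpha, beta, eps=1e-5):
    """L(x) = alpha * (x - mu) / sqrt(delta^2 + eps) + beta over one feature vector."""
    mu = sum(x) / len(x)
    variance = sum((v - mu) ** 2 for v in x) / len(x)   # delta squared
    scale = math.sqrt(variance + eps)
    return [a * (v - mu) / scale + b for v, a, b in zip(x, alpha, beta)]

def add_and_norm(sublayer_out, residual, alpha, beta):
    """A_o = L(M(Q, K, V) + H_{l-1}) for one position."""
    summed = [s + r for s, r in zip(sublayer_out, residual)]
    return layer_norm(summed, alpha, beta)

ones, zeros = [1.0] * 4, [0.0] * 4
a_o = add_and_norm([0.5, 1.0, 1.5, 2.0], [0.5, 1.0, 1.5, 2.0], ones, zeros)
```

With unit scale and zero shift the normalized vector has approximately zero mean and unit variance, which keeps activations in a stable range as layers are stacked.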
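Through this multi-layer feature encoding and fusion, semantic features progress from local associations to the recognition of global intent. The identification results are listed in Table 3.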
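The processed features are then fed to a feed-forward network (FFN) for further feature extraction. The network consists of 2 linear transformations with a ReLU activation function, and its output goes through a residual connection and layer normalization. The final output is
$$H_l=L\big(f_{\mathrm{FFN}}(A_o)+A_o\big)$$
where
$$f_{\mathrm{FFN}}(A_o)=\max(0,\,A_oW_1+b_1)\,W_2+b_2$$
where $f_{\mathrm{FFN}}(A_o)$ is the output of the FFN; $W_1$ and $W_2$ are the weight matrices of the first and second linear transformations of the FFN; $b_1$ and $b_2$ are the corresponding bias terms; $\max(0,\cdot)$ is the ReLU activation function.
Finally, the output corresponding to the first token of the sequence (the [CLS] token) is taken as the feature vector of the whole text, denoted $H_{\mathrm{cl}}$.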
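2) Knowledge adapter
To strengthen the model's ability to identify sensitive information in the railway domain, a knowledge adapter is designed. It extracts features from the knowledge base and fuses them through an attention mechanism, injecting railway-domain and compliance knowledge into the BERT base model so that domain knowledge and text semantics are deeply combined. First, the sensitive information feature vector $K_{\mathrm{init}}\in\mathbb{R}^{d_k}$ is extracted from the data knowledge base, and the weighted vector sum $K_w$ of the words in the sensitive vocabulary is computed as
$$K_w=\sum_{w\in W}e(w)\cdot g(w)$$
where $W$ is the set of sensitive words; $e(w)$ is the vector of sensitive word $w$; $g(w)$ is the sensitivity level of word $w$.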
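Next, the sensitive pattern library, context rules, and similar knowledge are encoded into a vector $K_r$, which together with $K_w$ forms the knowledge feature vector. The knowledge feature $K'$ that fuses sensitive words, rules, and so on is then obtained as
$$K'=\lambda_1K_{\mathrm{init}}+\lambda_2K_w+\lambda_3K_r$$
where
$$\lambda_1+\lambda_2+\lambda_3=1$$
where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are learnable weighting coefficients that express the importance of the original sensitive features, the sensitive word features, and the rule pattern features, respectively.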
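A minimal numerical sketch of the weighted word vector $K_w$ and the fused feature $K'$ follows. The text does not say how the constraint $\lambda_1+\lambda_2+\lambda_3=1$ is enforced during training; taking a softmax over three raw parameters is one common choice and is an assumption here, as are the function names, the toy vectors, and the sensitivity scores.

```python
import math

def weighted_word_vector(embeddings, sensitivity):
    """K_w = sum over sensitive words w of e(w) * g(w)."""
    dim = len(next(iter(embeddings.values())))
    k_w = [0.0] * dim
    for word, vec in embeddings.items():
        g = sensitivity[word]
        for j, value in enumerate(vec):
            k_w[j] += g * value
    return k_w

def fuse_knowledge(k_init, k_w, k_r, raw_weights):
    """K' = l1*K_init + l2*K_w + l3*K_r, with (l1, l2, l3) = softmax(raw_weights).

    The softmax parameterization is an assumption; it guarantees the
    coefficients are positive and sum to 1.
    """
    peak = max(raw_weights)
    exps = [math.exp(r - peak) for r in raw_weights]
    total = sum(exps)
    lambdas = [e / total for e in exps]
    fused = [lambdas[0] * a + lambdas[1] * b + lambdas[2] * c
             for a, b, c in zip(k_init, k_w, k_r)]
    return fused, lambdas

# Toy two-dimensional word vectors and sensitivity levels (illustrative values).
k_w = weighted_word_vector(
    {"id card": [1.0, 0.0], "itinerary": [0.0, 1.0]},
    {"id card": 0.9, "itinerary": 0.5},
)
k_fused, lambdas = fuse_knowledge([1.0, 1.0], k_w, [0.0, 0.0], [0.0, 0.0, 0.0])
```

With equal raw parameters each coefficient is 1/3, so the fused vector is the plain average of the three knowledge sources; training would shift the weights toward whichever source is most informative.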
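An attention mechanism fuses the domain knowledge feature vector with the text to be inspected. The position-wise adaptive weight $\varphi$ is
$$\varphi=f_{\mathrm{Softmax}}\big(W_3\tanh(W_4Z+W_5K'+b)\big)$$
where $Z$ is the output feature of the text to be inspected; $W_3$ is the weight matrix that maps the hidden layer to $\varphi$; $W_4$ and $W_5$ are the weight matrices that map $Z$ and $K'$ to the hidden layer; $b$ is the bias term.
The weight $\varphi$ reflects how relevant each part of the text is to the knowledge in the knowledge base. The corresponding knowledge feature vector is fed into the BERT base model, and the final output is
$$H_l'=H_l+\eta W_k\cdot K'+\gamma\big(\varphi\odot Z+(1-\varphi)\odot W_tK'\big)$$
where $H_l'$ is the output of the BERT base model; $\eta$ is the global knowledge fusion strength coefficient; $\gamma$ is the global text feature retention coefficient; $W_k$ and $W_t$ are trained weights.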
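This algorithm extracts text that involves sensitive information, as well as prohibited words, from multimedia image data produced during passenger transport operations [13-14]. The image data include passenger passport pictures, ticket photos, scanned certificates, document screenshots, and screenshots of data query results. The framework for identifying sensitive information in multimedia image data is shown in Fig. 7. The algorithm first preprocesses images through contrast enhancement, data augmentation, and image normalization, and then identifies sensitive information in 4 steps: feature extraction, sequence modeling, text prediction, and sensitive information identification. The steps are as follows.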
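1) Feature extraction
The algorithm is built around OCR-Transformer for precise extraction of text from images. A convolutional neural network (CNN) processes the input image and extracts local features such as text strokes and contours. To handle text blurred by poor shooting angles, poor lighting, or worn documents, the multi-layer convolutions of the CNN capture the basic structural features of the text and convert them into a feature vector sequence suitable for later processing. The main steps are as follows.
(1) The input feature $I\in\mathbb{R}^{H'\times W'\times C}$ ($H'$ is the feature map height; $W'$ is the feature map width; $C$ is the number of RGB channels, equal to 3) passes through the CNN for feature extraction. The CNN output feature $F$ is
$$m_i=f_{\mathrm{ReLU}}(\xi_i*m_{i-1}+q_i),\quad i=1, 2, \dots, n;\qquad F=m_n$$
where $f_{\mathrm{ReLU}}(\cdot)$ is the element-wise nonlinear activation; $\xi_i$ is the convolution kernel parameter of layer $i$; $q_i$ is the bias term of layer $i$; $*$ denotes convolution.
(2) Spatial attention weighting is applied to the output feature $F\in\mathbb{R}^{H\times W\times D}$ ($H$ is the feature map height; $W$ is the feature map width; $D$ is the number of feature channels, equal to 512) to strengthen the features of key text regions. The weighted feature $F_{\mathrm{attn}}$ is
$$F_{\mathrm{attn}}=\sigma(W_a*F+b_a)\odot F$$
where $\sigma(\cdot)$ is the Sigmoid function; $W_a$ is the convolution kernel parameter of the spatial attention; $b_a$ is the bias term of the spatial attention weight matrix.
This finally yields the feature vector sequence $X=\{x_1,\dots,x_P\}\in\mathbb{R}^{P\times D}$ ($P$ is the sequence length).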
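2) Sequence modeling
The Transformer encoder and its self-attention mechanism capture long-distance dependencies among the extracted feature vectors and distill the semantic and structural information of the text, which effectively mitigates curved or tilted text caused by shooting conditions, as well as irregular fonts. The model assigns attention weights dynamically according to the correlation between feature vectors and thus focuses on key text regions. The Transformer encoder uses multi-head self-attention [15], whose core computation is
$$f_{\mathrm{MHSA}}(X)=O_{\mathrm{Concat}}(h_1,\dots,h_n)\,\rho$$
where $f_{\mathrm{MHSA}}(\cdot)$ is the multi-head self-attention function; $h_i$ is the $i$-th attention head; $O_{\mathrm{Concat}}(\cdot)$ is the concatenation function that joins the attention heads; $\rho$ is the parameter of the output matrix.
Scaled dot products and the Softmax function give attention weights with global dependencies. After the residual connection and layer normalization, the output is
$$X'=L\big(X+O_{\mathrm{Dropout}}(f_{\mathrm{MHSA}}(X))\big)$$
where $X'$ is the feature after multi-head self-attention and layer normalization; $O_{\mathrm{Dropout}}(\cdot)$ is the dropout function.
The final output feature $X_{\mathrm{enc}}$ is
$$X_{\mathrm{enc}}=L\big(X'+O_{\mathrm{Dropout}}(f_{\mathrm{FFN}}(X'))\big)$$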
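3) Text prediction
The decoder predicts characters iteratively, one at a time, from the feature vector sequence output by the encoder to generate the final text sequence. A beam search algorithm dynamically maintains several candidate paths (beam width $k$) and selects the globally optimal sequence, which improves prediction accuracy. The decoder generates text autoregressively, and the probability distribution at step $t$, $P(y_t\mid Y_{<t},X_{\mathrm{enc}})$, is
$$P(y_t\mid Y_{<t},X_{\mathrm{enc}})=f_{\mathrm{Softmax}}\big(W_y\cdot D(Y_{<t},X_{\mathrm{enc}})+b_y\big)$$
where $y_t$ is the character to be predicted at step $t$; $W_y$ is a learnable weight matrix; $D(Y_{<t},X_{\mathrm{enc}})$ is the feature vector output by the decoder module; $Y_{<t}$ is the character sequence generated so far; $X_{\mathrm{enc}}$ is the full feature vector sequence output by the encoder; $b_y$ is a learnable bias vector.
Beam search keeps the Top-K candidate paths, and the optimal sequence $Y^*$ is
$$Y^*=\underset{Y\in V^{T}}{\arg\max}\sum_{t=1}^{T}\ln P(y_t\mid Y_{<t},X_{\mathrm{enc}})$$
where $V$ is the character set and $T$ is the preset maximum sequence length.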
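The decoding step can be illustrated with a small beam search over a toy next-character model. `TOY_MODEL` and `toy_step` are hypothetical stand-ins for the decoder's Softmax output; the example is constructed so that the greedy choice (beam width 1) misses the highest-scoring sequence.

```python
import math

def beam_search(step_log_probs, beam_width, max_len, eos="<eos>"):
    """Keep the beam_width best prefixes by summed log-probability."""
    beams = [((), 0.0)]
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == eos:
                candidates.append((prefix, score))   # finished path, carry over
                continue
            for token, log_p in step_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + log_p))
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
        if all(prefix and prefix[-1] == eos for prefix, _ in beams):
            break
    return beams[0]

# Hypothetical toy model: next-token probabilities keyed by the prefix so far.
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.5, "y": 0.5},
    ("B",): {"x": 0.9, "y": 0.1},
}

def toy_step(prefix):
    """Return log-probabilities; any prefix of length 2 or more ends the sequence."""
    probs = TOY_MODEL.get(prefix, {"<eos>": 1.0})
    return {token: math.log(p) for token, p in probs.items()}

best_seq, best_score = beam_search(toy_step, beam_width=2, max_len=3)
greedy_seq, greedy_score = beam_search(toy_step, beam_width=1, max_len=3)
```

With a beam width of 2 the search keeps the initially weaker prefix "B" alive and finds the sequence with probability 0.36, whereas the greedy path through "A" only reaches 0.30.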
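4) Sensitive information detection
Depending on the text type, the extracted information is fed to the structured or the unstructured sensitive information identification model for detection.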
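To keep sensitive data label propagation auditable and accurate, the system must be able to trace data flow paths. When a data node carries a potential risk of sensitive data leakage, or when the validity of its sensitive data label has to be verified, data lineage makes it possible to trace back to the original data source and every processing step along the way. Detailed data lineage logs record the operations on and movement of data at each stage, which allows precise tracing of data flow paths and helps prevent sensitive data leakage.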
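1) Data collection and preprocessing
Metadata collection and management plug-ins are designed for the multi-source heterogeneous data sources used by railway passenger transport systems, including relational databases such as Sybase and Postgres, non-relational databases such as HBase and Redis, the HDFS distributed file system [16], the Gbase and SybaseIQ data warehouses, and data processing components such as ETL tools and Flink. The data flow relationships of the railway passenger transport system are shown in Fig. 8. The collected data are cleaned, deduplicated, and standardized; invalid, erroneous, and duplicate data are removed; and multi-source heterogeneous data in inconsistent formats are unified for later analysis.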
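2) Lineage modeling and storage
Data lineage is modeled with the Resource Description Framework (RDF), which represents relationships between data as triples (subject, predicate, object) [17-18]. For example, the production system's "passenger internet order" is synchronized by ETL to the "passenger internet order table" in the lakehouse storage system, and an aggregation over that table then produces the passenger profile indicator table. RDF triples show clearly how data flow and are transformed across systems and storage media. The data lineage graph is shown in Fig. 9.
To store and query the lineage graph efficiently, triples are generated by parsing ETL tasks, stored procedures, and SQL, and are stored in an open-source graph database; a graph query language is used to query and analyze the lineage. For complex data flows, a NoSQL database (such as HBase) can store auxiliary metadata, which further improves the system's ability to describe and retrieve complex lineage chains.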
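Sensitive data labels are propagated over the data lineage graph of the railway passenger transport system with the Breadth-First Search (BFS) algorithm. Starting from nodes marked as sensitive, the algorithm traverses their downstream nodes layer by layer and labels each downstream node according to the propagation rules. For example, starting from the original data table node that contains personal identity information, BFS traverses all downstream table nodes generated from it through ETL and adds the corresponding sensitive data labels according to the propagation rules, so that passengers' sensitive information is protected throughout the passenger transport process.
To speed up label propagation, parallel computing distributes propagation tasks across multiple computing nodes that run simultaneously, and a cache stores sensitive data labels that have already been computed to avoid repeated work. The algorithm manages and protects passengers' sensitive information effectively and keeps the whole data life cycle transparent and traceable, which improves the overall security and reliability of the sensitive information identification system.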
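A single-process sketch of BFS label propagation over lineage triples is given below. It assumes the simplest possible propagation rule (a downstream node inherits every label of its upstream node) and leaves out the parallel execution and caching described above; the table names, predicates, and label are illustrative.

```python
from collections import deque

def propagate_labels(triples, seed_labels):
    """Propagate sensitive labels downstream over (subject, predicate, object) edges.

    Assumed rule: a downstream node inherits all labels of its upstream node.
    A node is revisited only when it gains a new label, so the loop terminates
    even if the lineage graph contains cycles.
    """
    downstream = {}
    for source, _predicate, target in triples:
        downstream.setdefault(source, []).append(target)

    labels = {node: set(tags) for node, tags in seed_labels.items()}
    queue = deque(seed_labels)
    while queue:
        node = queue.popleft()
        for child in downstream.get(node, []):
            child_labels = labels.setdefault(child, set())
            before = len(child_labels)
            child_labels |= labels[node]
            if len(child_labels) > before:
                queue.append(child)
    return labels

lineage = [
    ("orders_prod", "etl_sync", "orders_lake"),
    ("orders_lake", "aggregate", "passenger_profile"),
    ("timetable_prod", "etl_sync", "timetable_lake"),
]
result = propagate_labels(lineage, {"orders_prod": {"personal_identity"}})
```

The order tables and the derived profile table receive the label, while the unrelated timetable branch stays unlabeled. A production rule set would likely be finer-grained, for example dropping a label when an ETL step desensitizes the field.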
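To verify the efficiency of multi-modal sensitive information identification, the following 3 test scenarios are designed.
Scenario 1: part of the ticketing system's structured data is sampled to verify whether identification efficiency can support the data volume produced by an average of 10 million ticket sales per day.
Scenario 2: customer service dialogues and business documents are sampled to test unstructured text and evaluate the model's ability to identify semantically sensitive information.
Scenario 3: passport information and screenshots of database query results are sampled to verify the text recognition ability of the OCR-Transformer model, and the recognized text is fed to the corresponding sensitive information identification model for detection. The process is shown in Fig. 10.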
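1) Hardware environment
The models are verified in a domestically developed IT (Xinchuang) environment on 2 servers, each with a 24-core ARM CPU, 256 GB of memory, a 500 GB SSD data disk, and 4 Cambricon 370 accelerator cards. All algorithms run on the Kylin operating system with Python 3.8.
2) Data sets
Scenario 1 uses internet ticket sales data from the ticketing system, covering more than 200 structured fields such as registered users, frequent contacts, internet orders, and electronic tickets, from which 180 million records are randomly sampled for evaluation. Scenario 2 randomly samples 18 000 railway customer service dialogues and 500 business documents, split 8:2 into training and test sets; the testing stage randomly selects 14 400 annotated dialogues and 400 business documents. To safeguard training accuracy, all unstructured documents are converted to UTF-8 encoding. Scenario 3 randomly samples 5 000 passport images and 500 screenshots of database query results, and also generates image samples with curved, tilted, or irregular fonts; all of these are used for model training and validation.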
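On the same hardware, regular expression matching, the traditional AC automaton, the improved AC automaton, and the two-stage matching algorithm are each used to identify sensitive information in 180 million structured records covering 200 fields. The results are listed in Table 4. As Table 4 shows, the proposed two-stage algorithm takes only 4.6 s per million records, a detection speed of about 217 000 records per second, roughly 2 times that of the improved AC automaton scheme. It also extends dynamically: the time to add rules dynamically is cut to 127 ms, which supports the frequent changes of railway business and allows sensitive information rules to be injected without interruption.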
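The reported throughput follows directly from the per-million timing. As a rough check, and assuming the rate stays constant over the whole sample (an assumption, since Table 4 is not reproduced here), the full 180 million records would take on the order of a quarter of an hour:

$$\frac{10^{6}\ \text{records}}{4.6\ \text{s}}\approx 2.17\times10^{5}\ \text{records/s},\qquad \frac{1.8\times10^{8}\ \text{records}}{2.17\times10^{5}\ \text{records/s}}\approx 8.3\times10^{2}\ \text{s}\approx 13.8\ \text{min}$$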
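For unstructured text, the knowledge-enhanced BERT model is tested on context-sensitive identification. 15 000 dialogues are randomly sampled from the railway customer service corpus and their sensitive entities are annotated; the knowledge-enhanced BERT model is then compared with the original BERT-base model, BERT with rule-based post-processing, and an industry-general BERT model. The results are listed in Table 5. As Table 5 shows, the knowledge-enhanced BERT model reaches a sensitive entity F1 value of 91.24% and lowers the context misjudgment rate to 5.88%, clearly outperforming the other models.
To verify how well the knowledge-enhanced BERT model generalizes to real railway passenger unstructured text, a robustness test set is built that contains several kinds of interference: colloquial expressions, synonymous rewrites, cross-sentence context dependence, and mixed template and free text. The F1 values of the models on this data set are compared in Table 6. As Table 6 shows, the knowledge-enhanced BERT model has the highest F1 value and generalizes better than the other models.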
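OCR-Transformer uses an Encoder-Decoder architecture: the Encoder consists of a 12-layer ResNet and a 6-layer Transformer, the Decoder is a 12-layer Transformer, and spatial position encoding is introduced in the attention mechanism. Training uses the Adam optimizer with a learning rate of 1e-4 and a batch size of 16 for 20 epochs. To verify the model in complex scenes, text with curved, tilted, and irregular fonts is selected for sensitive information identification, and OCR-Transformer is compared with the convolutional recurrent text recognition network CRNN [19] and the self-attention-based text recognition network SATRN. The results are listed in Table 7. As Table 7 shows, the proposed OCR-Transformer text detection model reaches a character accuracy of 93.71%, with a performance degradation of 8.32% on blurred images. In terms of inference efficiency, OCR-Transformer averages 128 ms per inference, which is slower than the lightweight CRNN but clearly faster than the structurally more complex SATRN, striking a good balance between accuracy and efficiency.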
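This paper builds a knowledge-base-driven collaborative identification system for multi-modal sensitive information and proposes an intelligent identification technology for sensitive railway ticket data based on a data knowledge base. For structured data, a dual-mode matching algorithm combining regular expressions with an improved AC automaton is developed, reaching an identification rate of about 217 000 records per second. For unstructured data, a knowledge-enhanced BERT model is designed to improve semantic understanding; by injecting the domain knowledge graph, it raises the F1 value of sensitive entity recognition to 91.24% and lowers the context misjudgment rate to 5.88%. For multimedia files, OCR-Transformer-based sensitive information feature extraction and identification is developed, with an accuracy of 93.71% for image text extraction and sensitive information identification.
By parsing data resource metadata and data processing flows, a data lineage graph is built. Graph theory is combined with knowledge base construction, and multi-source regulations, standards, and industry specifications are integrated into a structured rule engine that supports intelligent assessment of the compliance of data flows. Together with an "identification-desensitization" linkage mechanism and knowledge-base-driven semantic reasoning, this establishes a compliance decision support system covering the whole data life cycle. It offers a practical technical solution for data compliance governance in high-concurrency, multi-modal scenarios and has practical value for data asset operation and data-element market development in the digital economy.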
References
[1] SHAN Xinghua, ZHANG Zhiqiang, NING Fei, et al. Key Technology Application and System Implementation of China Railway Electronic Ticket [J]. China Railway Science, 2021, 42(5): 162-173. (in Chinese)
[2] FU Zefan, YAO Jingfa, TENG Guifa. Research on Website Sensitive Information Identification and Variant Restoration Technology Based on BERT Model [J]. Modern Electronics Technique, 2024, 47(23): 105-112. (in Chinese)
[3] LI Yang, PAN Quan, YANG Tao. Sensitive Information Recognition Based on Short Text Sentiment Analysis [J]. Journal of Xi'an Jiaotong University, 2016, 50(9): 80-84. (in Chinese)
[4] LI W P, WU H Y, YANG J. Intelligent Recognition Algorithm for Social Network Sensitive Information Based on Classification Technology [J]. Discrete and Continuous Dynamical Systems-S, 2019, 12(4/5): 1385-1398.
[5] XU G, WU X, YAO H, et al. Research on Topic Recognition of Network Sensitive Information Based on SW-LDA Model [J]. IEEE Access, 2019, 7: 21527-21538.
[6] XU G, YU Z, CHEN Z, et al. Sensitive Information Topics-Based Sentiment Analysis Method for Big Data [J]. IEEE Access, 2019, 7: 96177-96190.
[7] KRASSER S, TANG Y, GOULD J, et al. Identifying Image Spam Based on Header and File Properties Using C4.5 Decision Trees and Support Vector Machine Learning [C]// 2007 IEEE SMC Information Assurance and Security Workshop. New York: IEEE, 2007: 255-261.
[8] WANG W, WU Y, PALAIAHNAKOTE S, et al. Cloud of Line Distribution for Arbitrary Text Detection in Scene/Video/License Plate Images [C]// Advances in Multimedia Information Processing-PCM 2017. Cham: Springer International Publishing, 2018: 433-443.
[9] SHI B, BAI X, YAO C. An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(11): 2298-2304.
[10] WANG Qing, CHEN Jie. A Review of Regular Expression Matching Research in Deep Packet Inspection Technology [J]. Network Security Technology & Application, 2024(5): 28-30. (in Chinese)
[11] JIANG Haiyang, LI Xuefei, YANG Ye, et al. Distance Comparison Based Parallel Pattern Matching [J]. Journal of Electronics & Information Technology, 2022, 44(2): 581-590. (in Chinese)
[12] YANG Hong, MENG Xiaokai, YU Hua, et al. Research on Primary Equipment Defect Diagnosis Method Based on the BERT Model [J]. Power System Protection and Control, 2025, 53(7): 155-164. (in Chinese)
[13] BAI Zhicheng, LI Qing, CHEN Peng, et al. Text Detection in Natural Scenes: a Literature Review [J]. Chinese Journal of Engineering, 2020, 42(11): 1433-1448. (in Chinese)
[14] KIM G, HONG T, YIM M, et al. OCR-Free Document Understanding Transformer [C]// Computer Vision-ECCV 2022. Cham: Springer Nature Switzerland, 2022: 498-517.
[15] RADFORD A, KIM J W, HALLACY C, et al. Learning Transferable Visual Models from Natural Language Supervision [J]. ArXiv e-Prints, 2021: arXiv:2103.00020 [cs.CV].
[16] ISLAM N S, RAHMAN M W, JOSE J, et al. High Performance RDMA-Based Design of HDFS over InfiniBand [C]// SC'12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. New York: IEEE, 2012.
[17] MONDAL S, MUKHERJEE N. Efficient NoSQL Graph Database for Storage and Access of Health Data [C]// Computer Communication, Networking and IoT. Singapore: Springer, 2021: 135-146.
[18] PAN Xiaohua, JIN Yong, GAO Yanghua, et al. Data Lineage Construction Method for Complex Data Audit Requirements [J]. Application Research of Computers, 2024, 41(1): 76-82. (in Chinese)
[19] XIE Y R. Application of CRNN and OpenGL in Intelligent Landscape Design Systems Utilizing Internet of Things, Explainable Artificial Intelligence, and Drone Technology [J]. IEEE Transactions on Consumer Electronics, 2025, 71(2): 3930-3940.
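Article information
China Railway Science, 2026, Vol. 47, No. 2
doi: 10.3969/j.issn.1001-4632.2026.02.20
  • Received: 2025-06-24
  • Revised: 2026-03-12
  • Published: 2026-03-01
  • First online: 2026-06-03
Author information
    1. Institute of Computing Technologies, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China

Corresponding author:

LI Wen (born 1987), male, from Shaoyang, Hunan, associate research fellow. E-mail: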