Article(id=1149741767722972122, tenantId=1146029695717560320, journalId=1146031787341344770, issueId=1149741761771258326, articleNumber=1003-3033(2024)02-0037-08, orderNo=null, doi=10.16265/j.cnki.issn1003-3033.2024.02.0121, pmid=null, cstr=null, oa=null, hot=null, price=null, onlineType=0, articleFormat=0, articleType=null, articleTypeStr=null, receivedDate=1691942400000, receivedDateStr=2023-08-14, revisedDate=1700409600000, revisedDateStr=2023-11-20, acceptedDate=null, acceptedDateStr=null, onlineDate=1752049398594, onlineDateStr=2025-07-09, pubDate=1709049600000, pubDateStr=2024-02-28, doiRegisterDate=null, doiRegisterDateStr=null, onlineIssueDate=1752049398594, onlineIssueDateStr=2025-07-09, onlineJustAcceptDate=null, onlineJustAcceptDateStr=null, onlineFirstDate=null, onlineFirstDateStr=null, sourceXml=null, magXml=null, createTime=1752049398594, creator=13701087609, updateTime=1752049398594, updator=13701087609, issue=Issue{id=1149741761771258326, tenantId=1146029695717560320, journalId=1146031787341344770, year='2024', volume='34', issue='2', pageStart='1', pageEnd='252', issueExtLink='null', onlineDate='null', pubDate='null', beforeIssueId=null, nextIssueId=null, price=null, status=1, issueComplete=1, articleOrder=1, issueType=-1, specialIssue=0, createTime=1752049397175, creator=13701087609, updateTime=1756468934610, updator=13701087609, preIssue=null, nextIssue=null, ext={EN=IssueExt(id=1168278645379440971, tenantId=1146029695717560320, journalId=1146031787341344770, issueId=1149741761771258326, language=EN, specialIssueTitle=, coverIllustrator=, specialIssueEditor=, specialIssueAbout=), CN=IssueExt(id=1168278645379440972, tenantId=1146029695717560320, journalId=1146031787341344770, issueId=1149741761771258326, language=CN, specialIssueTitle=, coverIllustrator=, specialIssueEditor=, specialIssueAbout=)}, issueFiles=null}, startPage=37, endPage=44, ext={EN=ArticleExt(id=1149741768171762654, articleId=1149741767722972122, tenantId=1146029695717560320, 
journalId=1146031787341344770, language=EN, title=Short text classification of civil aviation intelligent supervision based on character-word fusion, columnId=1149733271128420907, journalTitle=China Safety Science Journal, columnName=Safety social science and safety management, runingTitle=null, highlight=null, articleAbstract=

To address the inefficiency of manually classifying and analyzing civil aviation supervision inspection records, a short text classification model with dual-channel feature extraction was proposed, combining data augmentation with character-word vector fusion. The model classifies supervised matters into issues related to personnel, equipment, facilities and environment, institutional procedures, and organizational responsibilities. To mitigate class imbalance, data augmentation algorithms transformed the original texts to generate new samples, making the sample sizes more balanced across categories. Word vectors and character vectors were concatenated character by character, yielding character vectors that retain word-level features. The fused vectors were then fed separately into a text convolutional neural network (TextCNN) and a bi-directional long short-term memory (BiLSTM) network, which extract local and global features, respectively. Experiments on the civil aviation supervision inspection record dataset show that the proposed model achieves an accuracy of 0.983 7 and an F1 score of 0.983 6. Compared with several character embedding and word embedding models, accuracy improves by 0.4%; compared with commonly used single-channel models, accuracy improves by 3%, which supports the comprehensiveness and effectiveness of the features extracted by the dual-channel model.

, correspAuthors=Yaxi XU, authorNote=null, correspAuthorsNote=null, copyrightStatement=null, copyrightOwner=null, extLink=null, articleAbsUrl=null, sourceXml=null, magXml=null, pdfUrl=null, pdf=null, pdfFileSize=null, pdfExtLink=null, richHtmlUrl=null, mobilePdfUrl=null, reviewReport=null, pdfFirstPage=null, abstractGraph=null, abstractGraphContent=null, abstractVideo=null, citation=null, cebUrl=null, magXmlContent=null, mapNumber=null, authorCompany=null, fund=null, authors=null, authorsList=Xin WANG, Zurui GAN, Yaxi XU, Ke SHI, Tao ZHENG), CN=ArticleExt(id=1149741780800811395, articleId=1149741767722972122, tenantId=1146029695717560320, journalId=1146031787341344770, language=CN, title=基于字词向量融合的民航智慧监管短文本分类, columnId=1149733271296193071, journalTitle=中国安全科学学报, columnName=安全社会科学与安全管理, runingTitle=null, highlight=null, articleAbstract=

为解决民航监管事项所产生的检查记录仅依靠人工进行分类分析导致效率低的问题,提出一种基于数据增强与字词向量融合的双通道特征提取的短文本分类模型,探讨民航监管事项的分类,包括与人、设备设施环境、制度程序和机构职责等相关问题。为解决类别不平衡问题,采用数据增强算法在原始文本上进行变换,生成新的样本,使各个类别的样本数量更加均衡。将字向量和词向量按字融合拼接,得到具有词特征信息的字向量。将字词融合的向量分别送入到文本卷积神经网络(TextCNN)和双向长短期记忆(BiLSTM)模型中进行不同维度的特征提取,从局部的角度和全局的角度分别提取特征,并在民航监管事项检查记录数据集上进行试验。结果表明:该模型准确率为0.983 7,F1值为0.983 6。与一些字嵌入模型和词嵌入模型相对比,准确率提升0.4%。和一些常用的单通道模型相比,准确率提升3%,验证了双通道模型提取的特征具有全面性和有效性。

, correspAuthors=许雅玺, authorNote=null, correspAuthorsNote=
** 许雅玺(1976—),女,四川成都人,硕士,副教授,硕士生导师,主要从事决策分析与优化、数据挖掘等方面的研究。E-mail:
, copyrightStatement=null, copyrightOwner=null, extLink=null, articleAbsUrl=null, sourceXml=pHEl35qUF4WbQvjbm1Kq7Q==, magXml=usBTp7G+RG+ljj8YliCG4A==, pdfUrl=null, pdf=JRpipIlsW891K5/0tL/CkQ==, pdfFileSize=null, pdfExtLink=null, richHtmlUrl=null, mobilePdfUrl=null, reviewReport=null, pdfFirstPage=null, abstractGraph=null, abstractGraphContent=null, abstractVideo=null, citation=null, cebUrl=null, magXmlContent=TAQKlIXCv/8ya9sNTG7vJA==, mapNumber=null, authorCompany=null, fund=null, authors=

王 欣 (1973—),男,四川绵阳人,博士,教授,硕士生导师,主要从事机器学习、数据挖掘、自然语言处理方面的研究。E-mail:

史 珂 高级工程师

郑 涛 副教授

, authorsList=王欣, 干镞锐, 许雅玺, 史珂, 郑涛)}, authors=[Author(id=1168128756959944974, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, orderNo=0, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=cafucwx@163.com, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1168128757027053840, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, authorId=1168128756959944974, language=EN, stringName=Xin WANG, firstName=Xin, middleName=null, lastName=WANG, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, address=1 School of Computer,Civil Aviation Flight University of China,Guanghan Sichuan 618307,China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1168128757094162705, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, authorId=1168128756959944974, language=CN, stringName=王欣, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, address=1 中国民用航空飞行学院 计算机学院,四川 广汉 618307, bio={"img":"YPlZFLsnYTaudXLKon2Nyw==","content":"

王 欣 (1973—),男,四川绵阳人,博士,教授,硕士生导师,主要从事机器学习、数据挖掘、自然语言处理方面的研究。E-mail:

"}, bioImg=YPlZFLsnYTaudXLKon2Nyw==, bioContent=

王 欣 (1973—),男,四川绵阳人,博士,教授,硕士生导师,主要从事机器学习、数据挖掘、自然语言处理方面的研究。E-mail:

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1168128756674732292, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, xref=1, ext=[AuthorCompanyExt(id=1168128756687315205, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, companyId=1168128756674732292, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1 School of Computer,Civil Aviation Flight University of China,Guanghan Sichuan 618307,China), AuthorCompanyExt(id=1168128756695703814, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, companyId=1168128756674732292, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1 中国民用航空飞行学院 计算机学院,四川 广汉 618307)])]), Author(id=1168128757178048787, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, orderNo=1, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1168128757266129173, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, authorId=1168128757178048787, language=EN, stringName=Zurui GAN, firstName=Zurui, middleName=null, lastName=GAN, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, address=1 School of Computer,Civil Aviation Flight University of China,Guanghan Sichuan 618307,China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1168128757333238038, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, authorId=1168128757178048787, language=CN, stringName=干镞锐, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, 
authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, address=1 中国民用航空飞行学院 计算机学院,四川 广汉 618307, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1168128756674732292, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, xref=1, ext=[AuthorCompanyExt(id=1168128756687315205, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, companyId=1168128756674732292, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1 School of Computer,Civil Aviation Flight University of China,Guanghan Sichuan 618307,China), AuthorCompanyExt(id=1168128756695703814, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, companyId=1168128756674732292, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1 中国民用航空飞行学院 计算机学院,四川 广汉 618307)])]), Author(id=1168128757383569688, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, orderNo=2, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=31858255@qq.com, emailSecond=null, emailThird=null, correspondingAuthor=1, authorType=1, ext={EN=AuthorExt(id=1168128757496815898, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, authorId=1168128757383569688, language=EN, stringName=Yaxi XU, firstName=Yaxi, middleName=null, lastName=XU, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=2, **, address=2 School of Economics and Management,Civil Aviation Flight University of China,Guanghan Sichuan 618307,China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1168128757580701979, tenantId=1146029695717560320, 
journalId=1146031787341344770, articleId=1149741767722972122, authorId=1168128757383569688, language=CN, stringName=许雅玺, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=2, **, address=2 中国民用航空飞行学院 经济与管理学院,四川 广汉 618307, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1168128756758618375, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, xref=2, ext=[AuthorCompanyExt(id=1168128756762812680, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, companyId=1168128756758618375, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=2 School of Economics and Management,Civil Aviation Flight University of China,Guanghan Sichuan 618307,China), AuthorCompanyExt(id=1168128756771201289, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, companyId=1168128756758618375, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=2 中国民用航空飞行学院 经济与管理学院,四川 广汉 618307)])]), Author(id=1168128757643616541, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, orderNo=3, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1168128757698142495, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, authorId=1168128757643616541, language=EN, stringName=Ke SHI, firstName=Ke, middleName=null, lastName=SHI, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=3, address=3 Institute of Civil Aviation Supervisor 
Training,Civil Aviation Flight University of China,Guanghan Sichuan 618307,China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1168128757752668448, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, authorId=1168128757643616541, language=CN, stringName=史珂, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=3, address=3 中国民用航空飞行学院 民航监察员培训学院,四川 广汉 618307, bio={"content":"

史 珂 高级工程师

"}, bioImg=null, bioContent=

史 珂 高级工程师

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1168128756859281674, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, xref=3, ext=[AuthorCompanyExt(id=1168128756863475979, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, companyId=1168128756859281674, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=3 Institute of Civil Aviation Supervisor Training,Civil Aviation Flight University of China,Guanghan Sichuan 618307,China), AuthorCompanyExt(id=1168128756871864588, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, companyId=1168128756859281674, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=3 中国民用航空飞行学院 民航监察员培训学院,四川 广汉 618307)])]), Author(id=1168128757807194402, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, orderNo=4, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1168128757861720357, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, authorId=1168128757807194402, language=EN, stringName=Tao ZHENG, firstName=Tao, middleName=null, lastName=ZHENG, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, address=1 School of Computer,Civil Aviation Flight University of China,Guanghan Sichuan 618307,China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1168128757920440615, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, authorId=1168128757807194402, language=CN, stringName=郑涛, firstName=null, middleName=null, 
lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, address=1 中国民用航空飞行学院 计算机学院,四川 广汉 618307, bio={"content":"

郑 涛 副教授

"}, bioImg=null, bioContent=

郑 涛 副教授

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1168128756674732292, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, xref=1, ext=[AuthorCompanyExt(id=1168128756687315205, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, companyId=1168128756674732292, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1 School of Computer,Civil Aviation Flight University of China,Guanghan Sichuan 618307,China), AuthorCompanyExt(id=1168128756695703814, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, companyId=1168128756674732292, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1 中国民用航空飞行学院 计算机学院,四川 广汉 618307)])])], keywords=[Keyword(id=1168128758033686825, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, language=EN, orderNo=1, keyword=character-word vector fusion), Keyword(id=1168128758096601386, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, language=EN, orderNo=2, keyword=civil aviation supervision), Keyword(id=1168128758163710251, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, language=EN, orderNo=3, keyword=short text), Keyword(id=1168128758235013420, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, language=EN, orderNo=4, keyword=text convolutional neural networks(TextCNN)), Keyword(id=1168128758297927981, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, language=EN, orderNo=5, keyword=bi-directional long short-term memory(BiLSTM)), Keyword(id=1168128758356648238, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, language=CN, orderNo=1, keyword=字词向量融合), 
Keyword(id=1168128758411174194, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, language=CN, orderNo=2, keyword=民航监管), Keyword(id=1168128758474088758, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, language=CN, orderNo=3, keyword=短文本), Keyword(id=1168128758578946359, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, language=CN, orderNo=4, keyword=文本卷积神经网络(TextCNN)), Keyword(id=1168128758629278007, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, language=CN, orderNo=5, keyword=双向长短期记忆(BiLSTM))], refs=[Reference(id=1168128761015837030, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2021, volume=null, issue=1, pageStart=25, pageEnd=27, url=null, language=null, rfNumber=[1], rfOrder=0, authorNames=吴剑青, journalName=民航管理, refType=null, unstructuredReference=吴剑青. 民航监管的数字化转型解决方案建议[J]. 民航管理, 2021(1): 25-27., articleTitle=民航监管的数字化转型解决方案建议, refAbstract=null), Reference(id=1168128761095528808, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2021, volume=null, issue=1, pageStart=25, pageEnd=27, url=null, language=null, rfNumber=[1], rfOrder=1, authorNames=WU Jianqing, journalName=Civil Aviation Management, refType=null, unstructuredReference=WU Jianqing. Suggestions on solutions for digital transformation of civil aviation supervision[J]. 
Civil Aviation Management, 2021(1): 25-27., articleTitle=Suggestions on solutions for digital transformation of civil aviation supervision, refAbstract=null), Reference(id=1168128761158443370, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2022, volume=null, issue=1, pageStart=18, pageEnd=22, url=null, language=null, rfNumber=[2], rfOrder=2, authorNames=张恒, 杨骁勇, journalName=大飞机, refType=null, unstructuredReference=张恒, 杨骁勇. 智慧监管怎么管[J]. 大飞机, 2022(1):18-22., articleTitle=智慧监管怎么管, refAbstract=null), Reference(id=1168128761238135148, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2021, volume=31, issue=9, pageStart=1, pageEnd=7, url=null, language=null, rfNumber=[3], rfOrder=3, authorNames=冯文刚, journalName=中国安全科学学报, refType=null, unstructuredReference=冯文刚. 基于深度长短记忆模型的民航安保事件分析[J]. 中国安全科学学报, 2021, 31(9): 1-7., articleTitle=基于深度长短记忆模型的民航安保事件分析, refAbstract=null), Reference(id=1168128761326215534, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2021, volume=31, issue=9, pageStart=1, pageEnd=7, url=null, language=null, rfNumber=[3], rfOrder=4, authorNames=FENG Wen'gang, journalName=China Safety Science Journal, refType=null, unstructuredReference=FENG Wen'gang. Research on civil aviation security event analysis based on deep LSTM model[J]. 
China Safety Science Journal, 2021, 31(9): 1-7., articleTitle=Research on civil aviation security event analysis based on deep LSTM model, refAbstract=null), Reference(id=1168128761443656048, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2022, volume=null, issue=8, pageStart=69, pageEnd=74, url=null, language=null, rfNumber=[4], rfOrder=5, authorNames=杨秀璋, 宋籍文, 武帅, journalName=计算机时代, refType=null, unstructuredReference=杨秀璋, 宋籍文, 武帅, 等. 一种融合Bert预训练和BiLSTM的场景迁移情感分析研究[J]. 计算机时代, 2022(8):69-74,79., articleTitle=一种融合Bert预训练和BiLSTM的场景迁移情感分析研究, refAbstract=null), Reference(id=1168128761535930738, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2022, volume=null, issue=8, pageStart=69, pageEnd=74, url=null, language=null, rfNumber=[4], rfOrder=6, authorNames=YANG Xiuzhang, SONG Jiwen, WU Shuai, journalName=Computer Era, refType=null, unstructuredReference=YANG Xiuzhang, SONG Jiwen, WU Shuai, et al. Research on sentiment analysis of scene migration based on Bert pre-training and BiLSTM[J]. Computer Era, 2022(8):69-74,79., articleTitle=Research on sentiment analysis of scene migration based on Bert pre-training and BiLSTM, refAbstract=null), Reference(id=1168128761607233908, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2022, volume=43, issue=8, pageStart=2311, pageEnd=2316, url=null, language=null, rfNumber=[5], rfOrder=7, authorNames=苗将, 张仰森, 李剑龙, journalName=计算机工程与设计, refType=null, unstructuredReference=苗将, 张仰森, 李剑龙. 基于BERT的中文新闻标题分类[J]. 
计算机工程与设计, 2022, 43(8): 2311-2316., articleTitle=基于BERT的中文新闻标题分类, refAbstract=null), Reference(id=1168128761720480118, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2022, volume=43, issue=8, pageStart=2311, pageEnd=2316, url=null, language=null, rfNumber=[5], rfOrder=8, authorNames=MIAO Jiang, ZHANG Yangsen, LI Jianlong, journalName=Computer Engineering and Design, refType=null, unstructuredReference=MIAO Jiang, ZHANG Yangsen, LI Jianlong. Classification of Chinese news headlines based on BERT[J]. Computer Engineering and Design, 2022, 43(8): 2311-2316., articleTitle=Classification of Chinese news headlines based on BERT, refAbstract=null), Reference(id=1168128761833726328, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2022, volume=32, issue=6, pageStart=103, pageEnd=108, url=null, language=null, rfNumber=[6], rfOrder=9, authorNames=尚麟宇, 尹明, 肖畅, journalName=中国安全科学学报, refType=null, unstructuredReference=尚麟宇, 尹明, 肖畅, 等. 基于 BLS 的铁路安全事件文本分类研究[J]. 中国安全科学学报, 2022, 32(6): 103-108., articleTitle=基于 BLS 的铁路安全事件文本分类研究, refAbstract=null), Reference(id=1168128761997304185, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2022, volume=32, issue=6, pageStart=103, pageEnd=108, url=null, language=null, rfNumber=[6], rfOrder=10, authorNames=SHANG Linyu, YIN Ming, XIAO Chang, journalName=China Safety Science Journal, refType=null, unstructuredReference=SHANG Linyu, YIN Ming, XIAO Chang, et al. Research on text classification of railway safety incidents based on BLS[J]. 
China Safety Science Journal, 2022, 32(6): 103-108., articleTitle=Research on text classification of railway safety incidents based on BLS, refAbstract=null), Reference(id=1168128762076995963, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2023, volume=59, issue=9, pageStart=104, pageEnd=111, url=null, language=null, rfNumber=[7], rfOrder=11, authorNames=辛苗苗, 马丽, 胡博发, journalName=计算机工程与应用, refType=null, unstructuredReference=辛苗苗, 马丽, 胡博发. 融合多粒度信息的文本分类研究[J]. 计算机工程与应用, 2023, 59(9):104-111., articleTitle=融合多粒度信息的文本分类研究, refAbstract=null), Reference(id=1168128762139910524, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2023, volume=59, issue=9, pageStart=104, pageEnd=111, url=null, language=null, rfNumber=[7], rfOrder=12, authorNames=XIN Miaomiao, MA Li, HU Bofa, journalName=Computer Engineering and Applications, refType=null, unstructuredReference=XIN Miaomiao, MA Li, HU Bofa. Research on text classification by fusing multi-granularity information[J]. Computer Engineering and Applications, 2023, 59(9):104-111., articleTitle=Research on text classification by fusing multi-granularity information, refAbstract=null), Reference(id=1168128762211213693, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2022, volume=53, issue=5, pageStart=20, pageEnd=31, url=null, language=null, rfNumber=[8], rfOrder=13, authorNames=付红萍, 陈恺之, 陈志泊, journalName=东北农业大学学报, refType=null, unstructuredReference=付红萍, 陈恺之, 陈志泊. 基于ERNIE-RCNN梅花研究信息文本分类方法[J]. 
东北农业大学学报, 2022, 53(5):20-31., articleTitle=基于ERNIE-RCNN梅花研究信息文本分类方法, refAbstract=null), Reference(id=1168128762353820031, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2022, volume=53, issue=5, pageStart=20, pageEnd=31, url=null, language=null, rfNumber=[8], rfOrder=14, authorNames=FU Hongping, CHEN Kaizhi, CHEN Zhibo, journalName=Journal of Northeast Agricultural University, refType=null, unstructuredReference=FU Hongping, CHEN Kaizhi, CHEN Zhibo. Research on plum blossom research information text classification based on ERNIE-RCNN[J]. Journal of Northeast Agricultural University, 2022, 53(5): 20-31., articleTitle=Research on plum blossom research information text classification based on ERNIE-RCNN, refAbstract=null), Reference(id=1168128762433511808, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=https://arxiv.org/pdf/1901.11196.pdf, language=null, rfNumber=[9], rfOrder=15, authorNames=WEI J, ZOU K, journalName=null, refType=null, unstructuredReference=WEI J, ZOU K. EDA: easy data augmentation techniques for boosting performance on text classification tasks[EB/OL]. (2019-01-31). 
https://arxiv.org/pdf/1901.11196.pdf., articleTitle=EDA: easy data augmentation techniques for boosting performance on text classification tasks, refAbstract=null), Reference(id=1168128762513203585, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2019, volume=null, issue=null, pageStart=4171, pageEnd=4186, url=null, language=null, rfNumber=[10], rfOrder=16, authorNames=DEVLIN J, CHANG M W, LEE K, journalName=Proceedings of Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), refType=null, unstructuredReference=DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]. Proceedings of Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), 2019:4171-4186., articleTitle=BERT: pre-training of deep bidirectional transformers for language understanding, refAbstract=null), Reference(id=1168128762601283971, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2023, volume=32, issue=3, pageStart=647, pageEnd=654, url=null, language=null, rfNumber=[11], rfOrder=17, authorNames=TANG Huanling, ZHU Hui, WEI Hongmin, journalName=Chinese Journal of Electronics, refType=null, unstructuredReference=TANG Huanling, ZHU Hui, WEI Hongmin, et al. Representation of semantic word embeddings based on SLDA and Word2vec model[J]. 
Chinese Journal of Electronics, 2023, 32(3): 647-654., articleTitle=Representation of semantic word embeddings based on SLDA and Word2vec model, refAbstract=null), Reference(id=1168128762806804870, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2022, volume=53, issue=7, pageStart=2068, pageEnd=2076, url=null, language=null, rfNumber=[12], rfOrder=18, authorNames=鲍彤, 罗瑞, 郭婷, journalName=南方农业学报, refType=null, unstructuredReference=鲍彤, 罗瑞, 郭婷, 等. 基于BERT字向量和TextCNN的农业问句分类模型分析[J]. 南方农业学报, 2022, 53(7):2068-2076., articleTitle=基于BERT字向量和TextCNN的农业问句分类模型分析, refAbstract=null), Reference(id=1168128762932633992, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2022, volume=53, issue=7, pageStart=2068, pageEnd=2076, url=null, language=null, rfNumber=[12], rfOrder=19, authorNames=BAO Tong, LUO Rui, GUO Ting, journalName=Journal of Southern Agriculture, refType=null, unstructuredReference=BAO Tong, LUO Rui, GUO Ting, et al. Agricultural question classification model based on BERT word vector and TextCNN[J]. Journal of Southern Agriculture, 2022, 53(7):2068-2076., articleTitle=Agricultural question classification model based on BERT word vector and TextCNN, refAbstract=null), Reference(id=1168128762999742857, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2022, volume=42, issue=5, pageStart=102, pageEnd=109, url=null, language=null, rfNumber=[13], rfOrder=20, authorNames=赵程栋, 庄继晖, 程晓鸣, journalName=广东海洋大学学报, refType=null, unstructuredReference=赵程栋, 庄继晖, 程晓鸣, 等. 基于特征注意力机制的RNN-Bi-LSTM船舶轨迹预测[J]. 
广东海洋大学学报, 2022, 42(5): 102-109., articleTitle=基于特征注意力机制的RNN-Bi-LSTM船舶轨迹预测, refAbstract=null), Reference(id=1168128763087823243, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2022, volume=42, issue=5, pageStart=102, pageEnd=109, url=null, language=null, rfNumber=[13], rfOrder=21, authorNames=ZHAO Chengdong, ZHUANG Jihui, CHENG Xiaoming, journalName=Journal of Guangdong Ocean University, refType=null, unstructuredReference=ZHAO Chengdong, ZHUANG Jihui, CHENG Xiaoming, et al. Ship trajectory prediction of RNN-Bi-LSTM based on characteristic attention mechanism[J]. Journal of Guangdong Ocean University, 2022, 42(5): 102-109., articleTitle=Ship trajectory prediction of RNN-Bi-LSTM based on characteristic attention mechanism, refAbstract=null), Reference(id=1168128763175903628, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2020, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=[14], rfOrder=22, authorNames=王晓明, journalName=基于深度学习的中文文本分类的关键技术研究, refType=null, unstructuredReference=王晓明. 基于深度学习的中文文本分类的关键技术研究[D]. 成都: 电子科技大学, 2020., articleTitle=null, refAbstract=null), Reference(id=1168128763243012494, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2020, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=[14], rfOrder=23, authorNames=WANG Xiaoming, journalName=Research on key technologies of Chinese text classification based on deep learning, refType=null, unstructuredReference=WANG Xiaoming. Research on key technologies of Chinese text classification based on deep learning[D]. 
Chengdu: University of Electronic Science and Technology of China, 2020., articleTitle=null, refAbstract=null), Reference(id=1168128763343675791, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2020, volume=16, issue=1, pageStart=187, pageEnd=188, url=null, language=null, rfNumber=[15], rfOrder=24, authorNames=刘凯洋, journalName=电脑知识与技术, refType=null, unstructuredReference=刘凯洋. 结合Bert字向量和卷积神经网络的新闻文本分类方法[J]. 电脑知识与技术, 2020, 16(1):187-188., articleTitle=结合Bert字向量和卷积神经网络的新闻文本分类方法, refAbstract=null), Reference(id=1168128763414978961, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, doi=null, pmid=null, pmcid=null, year=2020, volume=16, issue=1, pageStart=187, pageEnd=188, url=null, language=null, rfNumber=[15], rfOrder=25, authorNames=LIU Kaiyang, journalName=Computer Knowledge and Technology, refType=null, unstructuredReference=LIU Kaiyang. A Chinese news text classification method of combining Bert character vector and convolutional neural networks[J]. 
Computer Knowledge and Technology, 2020, 16(1):187-188., articleTitle=A Chinese news text classification method of combining Bert character vector and convolutional neural networks, refAbstract=null)], funds=[Fund(id=1168128760705458529, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, awardId=U2033213, language=CN, fundingSource=国家自然科学基金资助(U2033213), fundOrder=null, country=null), Fund(id=1168128760793538913, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, awardId=J2022-048, language=CN, fundingSource=中央高校基本科研业务费专项资金资助(J2022-048), fundOrder=null, country=null), Fund(id=1168128760873230691, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, awardId=J2019-045, language=CN, fundingSource=中央高校基本科研业务费专项资金资助(J2019-045), fundOrder=null, country=null)], companyList=[AuthorCompany(id=1168128756674732292, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, xref=1, ext=[AuthorCompanyExt(id=1168128756687315205, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, companyId=1168128756674732292, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1 School of Computer,Civil Aviation Flight University of China,Guanghan Sichuan 618307,China), AuthorCompanyExt(id=1168128756695703814, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, companyId=1168128756674732292, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1 中国民用航空飞行学院 计算机学院,四川 广汉 618307)]), AuthorCompany(id=1168128756758618375, tenantId=1146029695717560320, journalId=1146031787341344770, articleId=1149741767722972122, xref=2, ext=[AuthorCompanyExt(id=1168128756762812680, tenantId=1146029695717560320, journalId=1146031787341344770, 
Fig.1 Model structure of BWCLA
Fig.2 Fusion model structure of character vector and word vector
Fig.3 Model structure of TextCNN
Fig.4 Different text length
Fig.5 Accuracy curve of each model on the validation set

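Tab.1 Part of the dataset of supervision item records

Text of the supervision item inspection record | Category
Cabin crew are generally unfamiliar with the number and distribution of oxygen masks above the flight attendant seats, in the galley and in the lavatories | People-related problem
The aviation materials warehouse is too small for the current workload; the receiving area is full, and some large parts are still stored in the hangar | Equipment, facilities and environment problem
The company has not established a specific mechanism for unscheduled spot checks and has carried out no inspections | Institutional procedure problem
The crew has not adjusted the dedicated de-icing and snow-removal coordination body to match the latest organizational structure | Institutional responsibility problem
Safety inspection records are not filled in properly | Institutional procedure problem
The airport has not set up a security control center | Institutional responsibility problem
The current training plan lacks training for ambulance drivers and for staff in related posts | Institutional procedure problem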

Tab.2 Category and quantity after data augmentation

Category | Number of samples
People-related | 3 913
Equipment, facilities and environment-related | 3 904
Institutional procedure-related | 3 896
Institutional responsibility-related | 3 921
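
Tab.3 BWCLA model parameters

Parameter | Description | Value
BERT | Pre-trained model | Bert-Chinese-wwm
D1 | Character vector dimension | 768
D2 | Word vector dimension | 768
max_len | Maximum text length | 70
hidden_size | Hidden layer dimension | 384
kernel_size | Convolution kernel sizes | [3,4,5]
n_filters | Number of filter channels | 128
lr | Learning rate | 0.000 1
batch_size | Batch size | 64
epoch | Training epochs | 20
dropout | Dropout rate | 0.4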

Tab.4 Comparison of experimental results of different models

Model | Embedding layer | ACC | P | R | F1
Word2Vec+BiLSTM | Word vector | 0.909 7 | 0.912 3 | 0.909 7 | 0.908 2
Word2Vec+CNN | Word vector | 0.853 3 | 0.854 2 | 0.852 7 | 0.852 7
BERT+BiLSTM | Character vector | 0.958 4 | 0.959 7 | 0.958 4 | 0.958 3
BERT+CNN | Character vector | 0.952 2 | 0.952 1 | 0.952 2 | 0.952 2
BERT+BiLSTM+CNN | Character vector | 0.979 8 | 0.979 8 | 0.980 2 | 0.979 9
BERT+Word2Vec+BiLSTM | Fused character-word vector | 0.972 2 | 0.972 4 | 0.972 2 | 0.972 1
BWCLA | Fused character-word vector | 0.983 7 | 0.983 9 | 0.983 7 | 0.983 6

Short text classification of civil aviation intelligent supervision based on character-word fusion
Xin WANG 1, Zurui GAN 1, Yaxi XU 2,**, Ke SHI 3, Tao ZHENG 1
China Safety Science Journal | Safety social science and safety management, 2024, 34(2): 37-44
Published: 2024-02-28 doi: 10.16265/j.cnki.issn1003-3033.2024.02.0121

Affiliations
  • 1 School of Computer, Civil Aviation Flight University of China, Guanghan Sichuan 618307, China
  • 2 School of Economics and Management, Civil Aviation Flight University of China, Guanghan Sichuan 618307, China
  • 3 Institute of Civil Aviation Supervisor Training, Civil Aviation Flight University of China, Guanghan Sichuan 618307, China

Author information
  • WANG Xin (born 1973), male, from Mianyang, Sichuan; Ph.D., professor, master's supervisor; research interests: machine learning, data mining and natural language processing.
  • SHI Ke, senior engineer
  • ZHENG Tao, associate professor
  • Corresponding author (**): XU Yaxi (born 1976), female, from Chengdu, Sichuan; master's degree, associate professor, master's supervisor; research interests: decision analysis and optimization, and data mining.

To address the inefficiency of manually classifying and analyzing inspection records on civil aviation supervision items, a short text classification model with dual-channel feature extraction was proposed, based on data augmentation and character-word vector fusion. The model classifies supervision items into problems related to people, to equipment, facilities and environment, to institutional procedures, and to institutional responsibilities. To tackle class imbalance, a data augmentation algorithm transformed the original texts to generate new samples, balancing the sample sizes across categories. Character vectors and word vectors were fused by concatenation at the character level, yielding character vectors that carry word-level features. The fused vectors were then fed into TextCNN and BiLSTM to extract features from local and global perspectives respectively, and experiments were run on the civil aviation supervision item inspection record dataset. The results show that the model achieves an accuracy of 0.983 7 and an F1 score of 0.983 6. Compared with some character embedding and word embedding models, accuracy improves by 0.4%; compared with commonly used single-channel models, accuracy improves by 3%, which verifies that the features extracted by the dual-channel model are comprehensive and effective.

character-word vector fusion  /  civil aviation supervision  /  short text  /  text convolutional neural networks (TextCNN)  /  bi-directional long short-term memory (BiLSTM)
Xin WANG, Zurui GAN, Yaxi XU, Ke SHI, Tao ZHENG. Short text classification of civil aviation intelligent supervision based on character-word fusion[J]. China Safety Science Journal, 2024, 34(2): 37-44. DOI: 10.16265/j.cnki.issn1003-3033.2024.02.0121
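Faced with rapidly growing air traffic, efficient safety supervision is the top priority for ensuring the operational safety of civil aviation[1]. Traditional supervision methods can no longer meet the industry's safety supervision needs, so advanced supervision technologies and tools must be actively developed and adopted[2]. Within the smart civil aviation framework, the Civil Aviation Administration of China is vigorously promoting smart supervision and applying new-generation information technologies such as big data and artificial intelligence to improve supervision effectiveness across the board, so that it can keep track of the safe operation of the whole industry, airlines and related organizations in a timely manner and raise the level of safety management[3]. Inspection records of civil aviation supervision items are texts that record the problems found for supervision items during supervision and law-enforcement inspections; they are typically 10 to 70 characters long. Classifying these records is a basic task of civil aviation supervision data analysis. As the data volume grows massively, using natural language processing text classification techniques to classify the records automatically is of great significance for making supervision smarter and more precise and for improving the industry's safety management.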
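Common text classification methods fall into 2 categories: those based on statistical machine learning and those based on deep learning[4]. Statistical learning methods have limited capacity to represent text, whereas deep learning handles problems with many layers and many neurons: each neuron handles a simple task, and adding layers mines deeper meaning from the data, which greatly improves the ability to handle complex problems[5]. Many researchers have therefore studied deep-learning-based text classification. To analyze railway safety incidents more thoroughly, SHANG Linyu et al.[6] proposed a text classification model for railway safety incidents based on the broad learning system, which improved classification accuracy and can better address practical railway safety problems. To extract the core content of texts more efficiently, XIN Miaomiao et al.[7] built vectors at the character, word and sentence levels: they used Word2Vec to build character and word vectors, combined them with Bi-directional Long Short-Term Memory (BiLSTM) to extract the contextual information of the character and word vectors, and used the FastText model to extract sentence-vector features; experiments showed that this method improved classification efficiency. To address the poor performance of common deep learning models on a plum blossom information text dataset, FU Hongping et al.[8] proposed a classification model composed of Enhanced Representation from kNowledge IntEgration (ERNIE) and Text Recurrent Convolutional Neural Networks (TextRCNN). Encoding the text with the pre-trained ERNIE model strengthened feature extraction, and TextRCNN used convolution to extract text features automatically, achieving good results on the text classification task. In existing research, most domain text classification methods rely on either character-based or word-based vector representations. However, word vectors may overlook the semantic information carried by individual characters, while character-level vectors may fail to capture the meaning of word combinations. In addition, these methods usually use a single-channel feature extraction model, which limits their ability to capture the global and local features of a text at the same time.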
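Inspection records of civil aviation supervision items are typical short texts and are also extremely class-imbalanced. The authors therefore propose a short text classification method that combines data augmentation with deep learning. Easy Data Augmentation (EDA) is applied to the imbalanced dataset; character vectors are then embedded with Bidirectional Encoder Representations from Transformers (BERT) and word vectors with Word2Vec; the fused character-word vectors are fed into Text Convolutional Neural Networks (TextCNN) and BiLSTM to extract local and global features respectively; and the outputs of the 2 feature extraction modules are fused for classification, with the aim of improving the effectiveness of civil aviation supervision.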
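The structure of the proposed model, BERT and Word2Vec based on TextCNN and BiLSTM with Attention (BWCLA), is shown in Fig.1. In the preprocessing stage, the dataset is preprocessed by text data augmentation, cleaning, word segmentation and character segmentation. In the embedding layer, the character-segmented sequence is fed into the pre-trained BERT model to obtain character embedding vectors. In parallel, the word-segmented sequences are used to train Word2Vec, yielding a pre-trained model for the civil aviation domain; the word-segmented text is then fed into this model to obtain word embedding vectors. The character and word vectors are concatenated character by character, producing character vectors that carry word-level meaning. The fused vectors are sent to the TextCNN module and the BiLSTM module for further feature extraction. In the output layer, the TextCNN and BiLSTM features are fused for classification.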
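The inspection record dataset used here is an extremely class-imbalanced text dataset. Applying a text classification algorithm to it directly makes it hard to improve accuracy. EDA[9] is therefore used to augment the text dataset with the following 4 operations:
1) Synonym replacement. Several words in the sentence are selected and each is replaced by a randomly chosen synonym.
2) Random insertion. A word is randomly selected from the sentence, a synonym of it is found, and the synonym is inserted at a random position in the sentence.
3) Random swap. 2 words are randomly selected from the sentence and their positions are swapped.
4) Random deletion. Each word in the sentence is deleted at random with a given probability.
EDA quickly generates sentences similar to the original text, which enlarges the sample and balances the classes; it also adds a reasonable amount of noise to the dataset, which improves the model's generalization.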
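The 4 EDA operations can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the synonym table `SYNONYMS`, the function names and the deletion probability are hypothetical, and the sentence is assumed to be already segmented into words.

```python
import random

# Hypothetical toy synonym table; a real system would use a domain thesaurus.
SYNONYMS = {"检查": ["核查"], "记录": ["记载"]}

def synonym_replace(words, n, rng):
    """Replace up to n words that have an entry in SYNONYMS."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out

def random_insert(words, rng):
    """Insert a synonym of one randomly chosen word at a random position."""
    out = list(words)
    candidates = [w for w in out if w in SYNONYMS]
    if candidates:
        synonym = rng.choice(SYNONYMS[rng.choice(candidates)])
        out.insert(rng.randrange(len(out) + 1), synonym)
    return out

def random_swap(words, rng):
    """Swap the positions of 2 randomly chosen words."""
    out = list(words)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_delete(words, p, rng):
    """Delete each word with probability p, keeping at least one word."""
    kept = [w for w in words if rng.random() >= p]
    return kept or [rng.choice(words)]

rng = random.Random(0)
sentence = ["安全", "检查", "记录", "填写", "不规范"]  # assumed segmentation
augmented = [
    synonym_replace(sentence, 1, rng),
    random_insert(sentence, rng),
    random_swap(sentence, rng),
    random_delete(sentence, 0.1, rng),
]
```

Each call returns a new list and leaves the input sentence unchanged, so the original record can be kept alongside its augmented variants when minority classes are oversampled.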
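Text representation is a basic task in text research, and its accuracy affects the results of many downstream tasks; the mainstream approach today represents text through vector embeddings. English sentences take the word as the basic unit, and splitting on spaces followed by word-level embedding meets the needs of most tasks. Chinese sentences, by contrast, take the character as the basic unit, and every character carries some semantic information. Character vectors can express the meaning and context of each character more precisely, whereas combining several characters into a word loses part of that information. Character vectors are also friendlier to the technical terms of a specific domain or task: civil aviation has a large number of technical terms that general-purpose word segmentation methods cannot segment accurately, and character vectors handle such terms better. However, character-level embeddings tend to lose some contextual semantic information, whereas word-level vectors can make full use of the features between words and preserve word-order information, which matters in classification tasks. Character vectors and word vectors are therefore combined to exploit the advantages of both.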
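The pre-trained BERT[10] model is used for character embedding. BERT adopts the Transformer architecture and is pre-trained on large-scale unlabeled text. During pre-training, BERT uses the Masked Language Model (MLM) task, which randomly masks some tokens in the input sequence and requires the model to predict them. BERT also introduces the Next Sentence Prediction (NSP) task so that the model learns the relationship between adjacent sentences. After pre-training, the parameters are saved for use in the subsequent fine-tuning stage.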
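To fuse character-vector and word-vector features more effectively, each character vector is fused with the word vectors of the words that the character may form with its context, so that the character vector also carries word-level feature information and becomes more expressive. To describe the relationship between characters and words, a matrix is introduced whose rows are word indices and whose columns are character indices; it records whether a character belongs to a word. If the j-th word contains the i-th character, the corresponding entry is 1; otherwise it is 0. Most entries are 0, so the matrix is sparse. To speed up model convergence and reduce memory consumption, the sparse matrix is converted into a triple table, a compressed storage format for sparse matrices that keeps only the entries equal to 1. The structure of the fusion model for character vectors and word vectors is shown in Fig.2.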
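As a rough illustration of the triple table, the sketch below stores one `(word_index, char_index, 1)` entry per character instead of a dense word-by-character matrix. The function name is hypothetical, and the sketch assumes a single segmentation in which each character belongs to exactly one word, which is simpler than the "words a character may form with its context" described in the text.

```python
def membership_triples(chars, words):
    """Return (word_index, char_index, 1) for every character covered by a word.

    `words` must be a segmentation of the same sentence, so concatenating
    the words reproduces the character sequence.
    """
    assert "".join(words) == "".join(chars)
    triples = []
    pos = 0
    for j, word in enumerate(words):
        for _ in word:
            triples.append((j, pos, 1))
            pos += 1
    return triples

chars = list("机场未设置安保控制中心")
words = ["机场", "未", "设置", "安保", "控制", "中心"]  # assumed segmentation
triples = membership_triples(chars, words)
dense_cells = len(words) * len(chars)  # size of the equivalent dense matrix
```

For this sentence the dense matrix has 66 cells, while the triple table holds 11 entries, one per character.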
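The character vector and the word vector are fused as follows:
$X = M(Z_1, Z_2)$
where $Z_1$ is the character vector at the current position; $Z_2$ is the word vector of the word formed by the context of the current position; and $M(\cdot)$ is the fusion function, which concatenates the character vector and the word vector.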
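A minimal sketch of the concatenation $M(Z_1, Z_2)$, using toy 3-dimensional character vectors and 2-dimensional word vectors. The function name `fuse` and the triple layout `(word_index, char_index, 1)` are assumptions made for illustration; with the dimensions listed in Tab.3 (768 and 768), plain concatenation would give 1 536-dimensional fused vectors.

```python
def fuse(char_vecs, word_vecs, triples):
    """Concatenate each character vector with the vector of the word it belongs to."""
    word_of_char = {i: j for j, i, _ in triples}
    return [char_vecs[i] + word_vecs[word_of_char[i]] for i in range(len(char_vecs))]

char_vecs = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]  # Z1, one per character
word_vecs = [[1.0, 2.0], [3.0, 4.0]]                             # Z2, one per word
triples = [(0, 0, 1), (0, 1, 1), (1, 2, 1)]  # word 0 covers chars 0-1, word 1 covers char 2
X = fuse(char_vecs, word_vecs, triples)
```

The two characters of word 0 share the same word-vector half, which is how the fused character vector carries word-level information.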
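The input is the fused vector matrix $X=[x_1, x_2, \ldots, x_n]$, where $x$ denotes a fused character-word vector. The forward Long Short-Term Memory (LSTM) network processes the input sequence in order: it computes the hidden state at the current time step from the current character vector and the hidden state of the previous time step, giving the forward hidden states $(h_1, h_2, \ldots, h_T)$. The backward LSTM processes the sequence in reverse order: it computes the hidden state at the current time step from the current character vector and the hidden state of the following time step, giving the backward hidden states $(h_1', h_2', \ldots, h_T')$. The forward LSTM captures the dependencies between the current character and the preceding characters, and the backward LSTM captures those between the current character and the following characters. The gating mechanism in each LSTM cell controls the flow of information and the updating of memory, so long-range dependencies are handled effectively. The forward and backward hidden states are combined to form the feature representation of the sentence. Combining forward and backward information captures the semantic and structural features of the sentence comprehensively.
A unidirectional LSTM can only capture features that depend on the preceding text, which is a limitation. BiLSTM, a variant of LSTM, extracts features in both the forward and backward directions, so it further captures contextual semantic dependencies and produces vectors with contextual semantic features.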
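The gating mechanism is not written out in the source. For reference, the standard LSTM cell is commonly formulated as below; the gate symbols and the superscripted weight names are the usual textbook notation rather than the paper's own, and $W^{(o)}$ here is the output-gate weight, unrelated to the attention matrix $W_o$ used later.

```latex
\begin{aligned}
f_t &= \sigma\left(W^{(f)}[h_{t-1}, x_t] + b^{(f)}\right) \\
i_t &= \sigma\left(W^{(i)}[h_{t-1}, x_t] + b^{(i)}\right) \\
\tilde{c}_t &= \tanh\left(W^{(c)}[h_{t-1}, x_t] + b^{(c)}\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma\left(W^{(o)}[h_{t-1}, x_t] + b^{(o)}\right) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

Assuming the forward and backward states are combined by concatenation, the BiLSTM output at step $t$ is $[h_t; h_t']$, which would have dimension $2 \times 384 = 768$ for the hidden size in Tab.3.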
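Convolutional Neural Networks (CNN)[12] run several convolution kernels of different sizes in parallel, which extracts key local information effectively and gives strong feature extraction ability.
3 convolution kernels of sizes 3, 4 and 5 are used. Given the fused vector matrix $X=[x_1, x_2, \ldots, x_n]$, the 3 kernels each convolve the fused vectors and extract local features of different sizes $[C_1, C_2, C_3]$. For the features extracted by each kernel, the pooling layer applies max pooling to each local feature and keeps the most salient feature $u$ of each feature map. All pooled features $u$ are fed to the fully connected layer and concatenated into the local feature vector $[u_1, u_2, u_3]$, which is the output of the CNN. The structure of the TextCNN model is shown in Fig.3.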
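The input is the fused vector matrix $X=[x_1, x_2, \ldots, x_n]$. The convolution filter is computed as
$C_j = g\left(x_j \circ p + b\right)$
where $C_j$ is the feature vector of the convolutional layer; $g(\cdot)$ is the activation function of the convolutional layer; $\circ$ denotes the convolution operation; $p$ is the convolution kernel; and $b$ is the bias term.
After convolution, max pooling keeps the feature $u$ as the input to the fully connected layer:
$u = \max(C_1, C_2, \ldots, C_n)$
The fully connected layer concatenates all pooled feature values into a feature vector, so that the TextCNN structure extracts the local features of the text:
$U = [u_1, u_2, \ldots, u_n]$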
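The convolution and max-pooling steps can be traced on a toy input. The sketch below is illustrative only: it uses one filter per kernel size (Tab.3 lists 128), assumes ReLU for the activation $g$ (the source does not name it), and sets all kernel weights to 1 so the arithmetic is easy to follow.

```python
def relu(v):
    return max(0.0, v)

def conv_feature_map(X, kernel, bias):
    """Slide a (k x d) kernel over the sequence X (n x d) and apply ReLU."""
    k = len(kernel)
    feature_map = []
    for j in range(len(X) - k + 1):
        s = bias
        for r in range(k):
            s += sum(x * w for x, w in zip(X[j + r], kernel[r]))
        feature_map.append(relu(s))
    return feature_map

def text_cnn_features(X, kernels, biases):
    """Max-over-time pooling of each feature map, concatenated into U."""
    return [max(conv_feature_map(X, p, b)) for p, b in zip(kernels, biases)]

# Toy fused vectors: 6 positions, dimension 2.
X = [[1, 0], [0, 1], [2, 0], [0, 0], [1, 1], [0, 3]]
kernels = [[[1.0, 1.0]] * k for k in (3, 4, 5)]  # kernel sizes 3, 4 and 5
biases = [0.0, 0.0, 0.0]
U = text_cnn_features(X, kernels, biases)  # → [5.0, 7.0, 8.0]
```

With all-ones kernels each feature-map entry is the sum of a window of k rows, so the pooled values are simply the largest window sums for k = 3, 4 and 5.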
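The point of the attention mechanism[13] is to make the network focus on where attention is most needed. Attention is used to assign different weights to the outputs of the BiLSTM and TextCNN models, increasing the influence of important words on the classification result.
For the features extracted by BiLSTM or TextCNN, the feature vector is multiplied by 3 weight matrices to obtain the query vector Q, the key vector K and the value vector V. The equations below take the output feature U of the TextCNN structure as an example.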
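$Q = U \cdot W_Q$
$K = U \cdot W_K$
$V = U \cdot W_V$
where $W_Q$, $W_K$ and $W_V$ are learnable weight matrices.
The resulting query, key and value vectors are split into m attention heads, so that $Q = [Q_1, Q_2, \ldots, Q_m]$, $K = [K_1, K_2, \ldots, K_m]$ and $V = [V_1, V_2, \ldots, V_m]$, where m is the number of heads.
The attention weights are computed for each head:
$A(Q, K, V) = S\left(\frac{Q K^{\mathrm{T}}}{\sqrt{d_k}}\right) V$
$H_i = A(Q_i, K_i, V_i)$
where $A$ is the attention function; $S(\cdot)$ is the softmax function, which normalizes each row; $d_k$ is the dimension of the query or key vectors in each head; and $H_i$ is the output of the i-th attention head.
The vectors of all heads are concatenated to obtain the multi-head attention output:
$MH(Q, K, V) = M(H_1, H_2, \ldots, H_m) W_o$
where $W_o$ is a trainable weight matrix and $M(\cdot)$ denotes concatenation.
The value vectors are weighted by the corresponding attention weights and summed, giving the output features of the BiLSTM or TextCNN module.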
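The per-head computation $S(QK^{\mathrm{T}}/\sqrt{d_k})V$ can be sketched with plain Python lists. This is a single-head illustration under simplifying assumptions: the learned projections $W_Q$, $W_K$, $W_V$, the split into m heads and the output matrix $W_o$ are all omitted, and the helper names are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def attention(Q, K, V):
    """Scaled dot-product attention for one head; returns (output, weights)."""
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V), weights

# One query attending over two identical keys: the weights are uniform,
# so the output is the mean of the value vectors.
out, w = attention(Q=[[1.0, 0.0]], K=[[1.0, 1.0], [1.0, 1.0]], V=[[0.0, 2.0], [4.0, 6.0]])
```

When the keys differ, the query's weight shifts toward the key it is most similar to, which is the mechanism the text relies on to emphasize important words.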
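The feature fusion layer concatenates the features extracted by TextCNN and by BiLSTM. The concatenated features are then fed into the output layer for classification, which outputs the classification result matrix.
The experiments were run with Python 3.8 and the PyTorch deep learning framework on an Intel(R) Core i9-10900K CPU, an NVIDIA GeForce RTX 3090 GPU with 24 GB of video memory, and 96 GB of RAM.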
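The civil aviation supervision item inspection record dataset used in the experiments consists of texts recording the problems found for supervision items during civil aviation supervision and law-enforcement inspections. After screening and removing useless information, 5 720 records in 4 categories were obtained: problems related to people, to equipment, facilities and environment, to institutional procedures, and to institutional responsibilities. The class sizes are extremely imbalanced. Part of the dataset is shown in Tab.1, and the distribution of text lengths is shown as a pie chart in Fig.4.
The civil aviation smart supervision dataset is a short text dataset with an imbalanced class distribution. To address this, the dataset is first deduplicated and the EDA algorithm is then used to balance the class sizes. The class sizes after augmentation are given in Tab.2. The data are split into training, test and validation sets in the ratio 8∶1∶1.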
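A possible way to produce the 8∶1∶1 split is sketched below. The source does not say whether the split is shuffled or stratified by class; the seeded shuffle, the function name and the use of integer indices in place of real records are assumptions for illustration. The total is taken from the class sizes in Tab.2.

```python
import random

def split_811(samples, seed=0):
    """Shuffle and split into training, test and validation sets at 8:1:1."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_test = len(items) // 10
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]  # remainder goes to validation
    return train, test, val

total = 3913 + 3904 + 3896 + 3921  # class sizes after augmentation, from Tab.2
train, test, val = split_811(range(total))
```

With 15 634 augmented records this yields 12 507 training, 1 563 test and 1 564 validation samples.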
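Data preprocessing[14] mainly includes Chinese word segmentation and stop-word removal.
Accuracy (ACC) is the proportion of correctly predicted samples among all samples; precision (P) is the proportion of samples that are both actually and predicted positive among all samples predicted positive; recall (R) is the proportion of samples that are both actually and predicted positive among all actually positive samples; and the F1 score is the harmonic mean of precision and recall[15].
Accuracy is computed as:
$ACC = \frac{TP + TN}{TP + TN + FN + FP}$
Precision is computed as:
$P = \frac{TP}{TP + FP}$
Recall is computed as:
$R = \frac{TP}{TP + FN}$
F1 is computed as:
$F_1 = \frac{2PR}{P + R}$
where TP is the number of samples predicted positive that are actually positive; TN is the number predicted negative that are actually negative; FN is the number predicted negative that are actually positive; and FP is the number predicted positive that are actually negative.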
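The 4 formulas translate directly into code. The counts below are made-up numbers for a single binary (one-vs-rest) case; how the paper averages the metrics over its 4 classes is not stated in the source.

```python
def metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, F1) from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    return acc, p, r, f1

# Hypothetical counts, chosen only to exercise the formulas.
acc, p, r, f1 = metrics(tp=90, tn=80, fp=10, fn=20)
```

Here accuracy is 170/200 = 0.85, precision is 90/100 = 0.9, recall is 90/110 ≈ 0.818 and F1 is 180/210 ≈ 0.857, which sits between precision and recall as a harmonic mean should.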
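The pre-trained BERT model is the Chinese Bert-Chinese-wwm, the loss function is cross-entropy, the number of epochs is 20, and the BiLSTM hidden size is 384. The main parameters of the BWCLA model are listed in Tab.3.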
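To verify the advantage of the proposed BWCLA model, it is compared with several classic classification models on the civil aviation supervision item inspection record dataset.
5 groups of experiments (groups 1 to 5) are designed. In the embedding layer, 3 embeddings are compared: the BERT model, the Word2Vec model and fused character-word vectors. In the feature extraction layer, 3 feature extractors are compared: the BiLSTM model, the CNN model, and BiLSTM and CNN in parallel.
Group 1: Word2Vec+BiLSTM and Word2Vec+CNN. Word2Vec embeds the text as word vectors, which are then fed into BiLSTM or CNN for training and classification.
Group 2: BERT+BiLSTM and BERT+CNN. The pre-trained BERT model produces a character-vector representation of the text, which is then fed into BiLSTM or CNN for training and classification.
Group 3: BERT+BiLSTM+CNN. BERT produces character-level embeddings, which are fed into both BiLSTM and CNN; the results of the 2 channels are then fused.
Group 4: BERT+Word2Vec+BiLSTM. The embedding layer uses the proposed character-word vector fusion, and the fused vectors are fed into BiLSTM for training and classification.
Group 5: BWCLA. Word2Vec and BERT embeddings give a text representation that fuses character and word features; the fused vectors are fed into the BiLSTM and TextCNN modules, and the features of the 2 channels are fused.
The results of the different models on the supervision item inspection record dataset are given in Tab.4.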
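Tab.4 shows that fused character-word vectors give better results than character vectors or word vectors alone, and that the dual-channel model with BiLSTM and CNN in parallel outperforms either model alone. Comparing experiments 1 and 2, BERT yields better vectors than Word2Vec: the F1 scores of the BERT-based models in experiment 2 (BERT+BiLSTM and BERT+CNN) are higher than those in experiment 1. The main reason is that word vectors trained by Word2Vec are static and context-independent, whereas BERT extracts the meaning of the text in greater depth. Experiments 2 and 3 both use BERT character vectors and both classify well, but the multi-channel model, which places BiLSTM (global features) and CNN (local features) in parallel, extracts features better than a single deep learning model and captures the meaning of the text better; F1 rises by 2.77 percentage points (0.979 9 against 0.952 2 for BERT+CNN). Comparing experiments 2 and 4, the fused character-word embedding in experiment 4 improves semantic representation, and F1 rises by 1.38 percentage points (0.972 1 against 0.958 3 for BERT+BiLSTM). The BWCLA model scores higher than the other models on every metric, which verifies the advantage of the proposed model.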
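The 2 F1 gains can be re-derived from the F1 column of Tab.4. Pairing experiment 3 with BERT+CNN and experiment 4 with BERT+BiLSTM is an inference: these are the baselines that reproduce the quoted 2.77 and 1.38. The helper name `gain_pp` is hypothetical.

```python
# F1 values copied from Tab.4 (thin-space digit grouping removed).
f1 = {
    "BERT+BiLSTM": 0.9583,
    "BERT+CNN": 0.9522,
    "BERT+BiLSTM+CNN": 0.9799,
    "BERT+Word2Vec+BiLSTM": 0.9721,
    "BWCLA": 0.9836,
}

def gain_pp(model, baseline):
    """F1 gain of `model` over `baseline`, in percentage points."""
    return round((f1[model] - f1[baseline]) * 100, 2)

dual_channel_gain = gain_pp("BERT+BiLSTM+CNN", "BERT+CNN")   # 2.77
fusion_gain = gain_pp("BERT+Word2Vec+BiLSTM", "BERT+BiLSTM")  # 1.38
```

Both figures are absolute differences in F1 expressed in percentage points, not relative improvements.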
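To show the advantage of the BWCLA model on civil aviation supervision item text classification more directly, the training process of each model is analyzed; the validation accuracy curves during training are shown in Fig.5.
Fig.5 shows that, owing to the characteristics of short texts on civil aviation supervision items, models with a BERT embedding layer are all more accurate than those using Word2Vec. Models built on TextCNN essentially converge after 10 training epochs. Models built on BiLSTM fluctuate during training and need roughly 14 epochs to converge. The dual-channel models show the best validation accuracy curves: they converge after 5 epochs, and their accuracy finally settles at 97% to 98%, higher than the other models in the comparison. The proposed dual-channel model with character-word vector fusion finally converges to 98.38%, better than the 97.96% obtained with BERT character embeddings alone. The experiments thus verify that the proposed model performs better.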
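1) The BWCLA model offers a reference method for handling class imbalance in short texts; it performs well on short text classification for smart supervision, and all evaluation metrics remain at a high level.
2) Fused character-word vectors are used as the text representation, and a dual-channel model is designed to extract global and local features at the same time. The experimental results show that, compared with single-channel models that rely only on character vectors or word vectors, the dual-channel model with character-word vector fusion has a clear advantage in feature extraction. By combining character-level and word-level information, fused vectors therefore represent text data more effectively and improve overall model performance.
3) For the dataset used here, future work could try better data augmentation methods and more complex models to extract deeper features and raise text classification accuracy on domain datasets.
4) The method currently performs well on a domain-specific short text dataset. To verify its generality on a wider range of short texts, it needs to be tested on more common short text datasets. Future work will also test the model's applicability to long texts and explore its generality in text classification.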
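  • Supported by the National Natural Science Foundation of China (U2033213)
  • Supported by the Fundamental Research Funds for the Central Universities (J2022-048)
  • Supported by the Fundamental Research Funds for the Central Universities (J2019-045)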
References
[1]
吴剑青. 民航监管的数字化转型解决方案建议[J]. 民航管理, 2021(1): 25-27.
WU Jianqing. Suggestions on solutions for digital transformation of civil aviation supervision[J]. Civil Aviation Management, 2021(1): 25-27.
[2]
张恒, 杨骁勇. 智慧监管怎么管[J]. 大飞机, 2022(1):18-22.
[3]
冯文刚. 基于深度长短记忆模型的民航安保事件分析[J]. 中国安全科学学报, 2021, 31(9): 1-7.
FENG Wen'gang. Research on civil aviation security event analysis based on deep LSTM model[J]. China Safety Science Journal, 2021, 31(9): 1-7.
[4]
杨秀璋, 宋籍文, 武帅, 等. 一种融合Bert预训练和BiLSTM的场景迁移情感分析研究[J]. 计算机时代, 2022(8):69-74,79.
YANG Xiuzhang, SONG Jiwen, WU Shuai, et al. Research on sentiment analysis of scene migration based on Bert pre-training and BiLSTM[J]. Computer Era, 2022(8):69-74,79.
[5]
苗将, 张仰森, 李剑龙. 基于BERT的中文新闻标题分类[J]. 计算机工程与设计, 2022, 43(8): 2311-2316.
MIAO Jiang, ZHANG Yangsen, LI Jianlong. Classification of Chinese news headlines based on BERT[J]. Computer Engineering and Design, 2022, 43(8): 2311-2316.
[6]
尚麟宇, 尹明, 肖畅, 等. 基于 BLS 的铁路安全事件文本分类研究[J]. 中国安全科学学报, 2022, 32(6): 103-108.
SHANG Linyu, YIN Ming, XIAO Chang, et al. Research on text classification of railway safety incidents based on BLS[J]. China Safety Science Journal, 2022, 32(6): 103-108.
[7]
辛苗苗, 马丽, 胡博发. 融合多粒度信息的文本分类研究[J]. 计算机工程与应用, 2023, 59(9):104-111.
XIN Miaomiao, MA Li, HU Bofa. Research on text classification by fusing multi-granularity information[J]. Computer Engineering and Applications, 2023, 59(9):104-111.
[8]
付红萍, 陈恺之, 陈志泊. 基于ERNIE-RCNN梅花研究信息文本分类方法[J]. 东北农业大学学报, 2022, 53(5):20-31.
FU Hongping, CHEN Kaizhi, CHEN Zhibo. Research on plum blossom research information text classification based on ERNIE-RCNN[J]. Journal of Northeast Agricultural University, 2022, 53(5): 20-31.
[9]
WEI J, ZOU K. EDA: easy data augmentation techniques for boosting performance on text classification tasks[EB/OL]. (2019-01-31). https://arxiv.org/pdf/1901.11196.pdf. https://arxiv.org/pdf/1901.11196.pdf
[10]
DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]. Proceedings of Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), 2019:4171-4186.
[11]
TANG Huanling, ZHU Hui, WEI Hongmin, et al. Representation of semantic word embeddings based on SLDA and Word2vec model[J]. Chinese Journal of Electronics, 2023, 32(3): 647-654.
[12]
鲍彤, 罗瑞, 郭婷, 等. 基于BERT字向量和TextCNN的农业问句分类模型分析[J]. 南方农业学报, 2022, 53(7):2068-2076.
BAO Tong, LUO Rui, GUO Ting, et al. Agricultural question classification model based on BERT word vector and TextCNN[J]. Journal of Southern Agriculture, 2022, 53(7):2068-2076.
[13]
赵程栋, 庄继晖, 程晓鸣, 等. 基于特征注意力机制的RNN-Bi-LSTM船舶轨迹预测[J]. 广东海洋大学学报, 2022, 42(5): 102-109.
ZHAO Chengdong, ZHUANG Jihui, CHENG Xiaoming, et al. Ship trajectory prediction of RNN-Bi-LSTM based on characteristic attention mechanism[J]. Journal of Guangdong Ocean University, 2022, 42(5): 102-109.
[14]
王晓明. 基于深度学习的中文文本分类的关键技术研究[D]. 成都: 电子科技大学, 2020.
WANG Xiaoming. Research on key technologies of chinese text classification based on deep learning[D]. Chengdu: University of Electronic Science and Technology of China, 2020.
[15]
刘凯洋. 结合Bert字向量和卷积神经网络的新闻文本分类方法[J]. 电脑知识与技术, 2020, 16(1):187-188.
LIU Kaiyang. A Chinese news text classification method of combining Bert character vector and convolutional neural networks[J]. Computer Knowledge and Technology, 2020, 16(1):187-188.
2024, Vol. 34, No. 2
doi: 10.16265/j.cnki.issn1003-3033.2024.02.0121
  • Received: 2023-08-14
  • First published online: 2025-07-09
  • Published: 2024-02-28
  • Revised: 2023-11-20
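Author information
    1 School of Computer Science, Civil Aviation Flight University of China, Guanghan, Sichuan 618307
    2 School of Economics and Management, Civil Aviation Flight University of China, Guanghan, Sichuan 618307
    3 Civil Aviation Inspector Training College, Civil Aviation Flight University of China, Guanghan, Sichuan 618307

Corresponding author:

** XU Yaxi (许雅玺, b. 1976), female, from Chengdu, Sichuan; master's degree, associate professor and master's supervisor; research interests include decision analysis and optimization, and data mining. E-mail: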