Article(id=1203753459473428945, tenantId=1146029695717560320, journalId=1146123166801305609, issueId=1203753457208504777, articleNumber=null, orderNo=null, doi=10.12404/j.issn.1671-1815.2307782, pmid=null, cstr=null, oa=null, hot=null, price=null, onlineType=0, articleFormat=0, articleType=null, articleTypeStr=research-article, receivedDate=1696608000000, receivedDateStr=2023-10-07, revisedDate=1729094400000, revisedDateStr=2024-10-17, acceptedDate=null, acceptedDateStr=null, onlineDate=1764926789395, onlineDateStr=2025-12-05, pubDate=1737129600000, pubDateStr=2025-01-18, doiRegisterDate=null, doiRegisterDateStr=null, onlineIssueDate=1764926789395, onlineIssueDateStr=2025-12-05, onlineJustAcceptDate=null, onlineJustAcceptDateStr=null, onlineFirstDate=null, onlineFirstDateStr=null, sourceXml=null, magXml=null, createTime=1764926789395, creator=13701087609, updateTime=1764926789395, updator=13701087609, issue=Issue{id=1203753457208504777, tenantId=1146029695717560320, journalId=1146123166801305609, year='2025', volume='25', issue='2', pageStart='439', pageEnd='878', issueExtLink='null', onlineDate='null', pubDate='null', beforeIssueId=null, nextIssueId=null, price=null, status=1, issueComplete=1, articleOrder=1, issueType=-1, specialIssue=null, createTime=1764926788856, creator=13701087609, updateTime=1764928745558, updator=13701087609, preIssue=null, nextIssue=null, ext={EN=IssueExt(id=1203761664261858014, tenantId=1146029695717560320, journalId=1146123166801305609, issueId=1203753457208504777, language=EN, specialIssueTitle=, coverIllustrator=null, specialIssueEditor=, specialIssueAbout=), CN=IssueExt(id=1203761664261858015, tenantId=1146029695717560320, journalId=1146123166801305609, issueId=1203753457208504777, language=CN, specialIssueTitle=, coverIllustrator=null, specialIssueEditor=, specialIssueAbout=)}, issueFiles=null}, startPage=649, endPage=656, ext={EN=ArticleExt(id=1203753460127740377, articleId=1203753459473428945, tenantId=1146029695717560320, 
journalId=1146123166801305609, language=EN, title=Chinese News Title Classification Based on Lightweight Improved ERNIE-RCNN, columnId=1156262729162810294, journalTitle=Science Technology and Engineering, columnName=Papers·Automation and Computational Technology, runingTitle=null, highlight=null, articleAbstract=

Large-scale pre-trained language models applied to news headlines have huge parameter counts and make inefficient use of contextual semantic features, while recurrent convolutional neural networks (RCNN) neglect the importance of the initial input elements. To address these problems, a news headline classification method was proposed that combines ERNIE (enhanced representation through knowledge integration) improved with a mixture-of-experts (MoE) model and an RCNN with an attention mechanism. First, the text was encoded by the MoE-improved ERNIE; then the attention RCNN classified the text while preserving its word order and features. To strengthen classification, the RCNN was improved by computing context-fused weights for its inputs. When computing the weight of each expert in the MoE, Gumbel-Softmax replaced the traditional Softmax as the gating function, giving better control over the smoothness of the expert weights. Experimental results show that, compared with traditional classification methods, the proposed method has clear advantages while greatly reducing the number of parameters, and its F1 score is 0.51% higher than that of the traditional model. Ablation experiments further confirmed the feasibility of the method for the classification task.
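The abstract names Gumbel-Softmax as the gate for the MoE experts but gives no equations. The following is a minimal, dependency-free sketch of that general technique (Jang et al., ref. [27]), assuming one gate-logit vector per input and dense mixing of all expert outputs. The names `softmax`, `gumbel_softmax` and `moe_combine`, the temperature values, and the toy numbers are hypothetical, not the authors' implementation. The temperature is what controls smoothness: small values push the weights toward a single expert, large values spread them evenly.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Gumbel-Softmax relaxation: perturb each logit with Gumbel(0, 1) noise,
    then apply a temperature-scaled softmax."""
    noisy = []
    for z in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
        noisy.append(z - math.log(-math.log(u)))        # -log(-log(u)) ~ Gumbel(0, 1)
    return softmax(noisy, temperature)


def moe_combine(expert_outputs, gate_logits, temperature=1.0, rng=random):
    """Hypothetical dense MoE step: weight every expert's output vector by its
    Gumbel-Softmax gate value and sum the results."""
    weights = gumbel_softmax(gate_logits, temperature, rng)
    dim = len(expert_outputs[0])
    mixed = [0.0] * dim
    for w, out in zip(weights, expert_outputs):
        for j in range(dim):
            mixed[j] += w * out[j]
    return weights, mixed


if __name__ == "__main__":
    rng = random.Random(0)
    experts = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy expert outputs
    gate_logits = [2.0, 0.5, -1.0]                   # toy gate scores
    for tau in (0.1, 1.0, 5.0):
        weights, mixed = moe_combine(experts, gate_logits, tau, rng)
        print(tau, [round(w, 3) for w in weights], [round(x, 3) for x in mixed])
```

Running the demo should show the gate weights becoming more concentrated at low temperature and closer to uniform at high temperature; the random noise is what lets the gate explore different experts during training.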

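The abstract describes the RCNN improvement only as computing context-fused weights for the inputs, without equations. As a hedged illustration, the sketch below implements the left and right context recurrences of the original RCNN (Lai et al., ref. [8]) and then swaps its max-pooling for a softmax attention over positions, which is one plausible way to give every input element, including the first ones, an explicit weight. The attention query `attn_q`, the zero initial contexts, the omitted biases, and all function names are assumptions for illustration, not the authors' design.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def tanh_vec(values):
    return [math.tanh(v) for v in values]


def vec_add(u, v):
    return [a + b for a, b in zip(u, v)]


def rcnn_attention_encode(emb, W_l, W_sl, W_r, W_sr, W_y, attn_q):
    """Hypothetical RCNN encoder with attention pooling.

    emb:        list of n word vectors, each of size d_e
    W_l, W_r:   d_c x d_c context recurrence matrices
    W_sl, W_sr: d_c x d_e matrices mapping a neighbouring word into the context
    W_y:        d_h x (2*d_c + d_e) projection of [c_l; e; c_r]
    attn_q:     d_h query vector scoring each position (an assumption)
    """
    n, d_c = len(emb), len(W_l)
    # Left context: c_l(w_i) = tanh(W_l c_l(w_{i-1}) + W_sl e(w_{i-1})); c_l(w_1) = 0 here.
    left = [[0.0] * d_c]
    for i in range(1, n):
        left.append(tanh_vec(vec_add(matvec(W_l, left[i - 1]), matvec(W_sl, emb[i - 1]))))
    # Right context, computed from the end of the sequence backwards.
    right = [[0.0] * d_c for _ in range(n)]
    for i in range(n - 2, -1, -1):
        right[i] = tanh_vec(vec_add(matvec(W_r, right[i + 1]), matvec(W_sr, emb[i + 1])))
    # Latent vector per word from the concatenation [c_l; e; c_r] (bias omitted).
    latent = [tanh_vec(matvec(W_y, left[i] + emb[i] + right[i])) for i in range(n)]
    # Attention pooling: a softmax over position scores replaces RCNN's max-pooling.
    scores = [sum(q * h for q, h in zip(attn_q, y)) for y in latent]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    pooled = [sum(a * y[j] for a, y in zip(alphas, latent)) for j in range(len(W_y))]
    return alphas, pooled


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]   # toy word vectors (d_e = 2)
    I2 = [[1.0, 0.0], [0.0, 1.0]]                   # toy 2 x 2 matrices (d_c = 2)
    W_y = [[0.5, 0.0, 1.0, 0.0, 0.0, 0.5],          # d_h = 2, input size 2 + 2 + 2
           [0.0, 0.5, 0.0, 1.0, 0.5, 0.0]]
    alphas, pooled = rcnn_attention_encode(emb, I2, I2, I2, I2, W_y, [1.0, -1.0])
    print(alphas, pooled)
```

Keeping the per-position weights `alphas` explicit makes it easy to inspect how much each word contributes, which max-pooling does not expose.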
, correspAuthors=Zhi-xin ZHANG, authorNote=null, correspAuthorsNote=null, copyrightStatement=null, copyrightOwner=null, extLink=null, articleAbsUrl=null, sourceXml=null, magXml=null, pdfUrl=null, pdf=null, pdfFileSize=null, pdfExtLink=null, richHtmlUrl=null, mobilePdfUrl=null, reviewReport=null, pdfFirstPage=null, abstractGraph=null, abstractGraphContent=null, abstractVideo=null, citation=null, cebUrl=null, magXmlContent=null, mapNumber=null, authorCompany=null, fund=null, authors=null, authorsList=Li LI, Zhi-xin ZHANG, Xiao-long WANG), CN=ArticleExt(id=1203753462682071633, articleId=1203753459473428945, tenantId=1146029695717560320, journalId=1146123166801305609, language=CN, title=基于轻量化改进ERNIE-RCNN的中文新闻标题分类, columnId=1156262729783567290, journalTitle=科学技术与工程, columnName=论文·自动化技术、计算机技术, runingTitle=null, highlight=null, articleAbstract=

针对大型预训练语言模型在处理新闻标题时,面临参数规模庞大、无法高效利用上下文语意特征以及循环卷积神经网络对初始输入元素重要性忽视的问题,提出了一种融合混合专家模型(mixture-of-expert,MoE)的ERNIE与注意力机制的循环卷积神经网络(recurrent convolutional neural networks,RCNN)的新闻标题分类方法。首先,借助MoE改进ERNIE技术进行文本编码,随后利用注意力RCNN在保留文本词序和特征的基础上进行分类。为提高分类能力,通过计算输入的融合上下文权重对RCNN进行改进。在计算MoE中各个专家权重的过程中,选择Gumbel_Softmax作为新型的门控函数以改进传统的Softmax函数,从而更好地控制平滑程度。根据实验结果,发现相较于传统的分类方法,本文研究提出的分类方法展现出显著优势,极大地减少了参数数量。在此基础上,F1相较于传统模型提升了0.51%。经过消融实验的验证,该分类方法在分类任务上的可行性得到了证实。

, correspAuthors=张之欣, authorNote=null, correspAuthorsNote=
* 张之欣(1998—),男,汉族,河南新乡人,硕士研究生。研究方向:自然语言处理。E-mail:
, copyrightStatement=null, copyrightOwner=null, extLink=null, articleAbsUrl=null, sourceXml=dSh2MWJjPlJ04Y5Q5Oi+eQ==, magXml=Fc+pKU7QYwBz2DtcPGij/g==, pdfUrl=null, pdf=zgteTkUTCn7CUqSVvTys+w==, pdfFileSize=3101034, pdfExtLink=null, richHtmlUrl=null, mobilePdfUrl=null, reviewReport=null, pdfFirstPage=null, abstractGraph=TijwtFAnAg7Sy/oK3kPVtQ==, abstractGraphContent=null, abstractVideo=null, citation=null, cebUrl=null, magXmlContent=5FiBR9r3LX+b4IXalt3/Ig==, mapNumber=null, authorCompany=null, fund=null, authors=

李莉(1980—),女,汉族,重庆人,博士,副教授。研究方向:大数据分析、深度学习。E-mail:

, authorsList=李莉, 张之欣, 王小龙)}, authors=[Author(id=1203787147070387075, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, orderNo=0, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=haolily12@163.com, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1203787147338822547, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, authorId=1203787147070387075, language=EN, stringName=Li LI, firstName=Li, middleName=null, lastName=LI, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, 2, address=1 School of Control and Computer Engineering, North China Electric Power University, Baoding 071003,China
2 Hebei Key Laboratory of Knowledge Computing for Energy & Power, Baoding 071003, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1203787147426902942, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, authorId=1203787147070387075, language=CN, stringName=李莉, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, 2, address=1 华北电力大学控制与计算机工程学院, 保定 071003
2 河北省能源电力知识计算重点实验室, 保定 071003, bio={"content":"

李莉(1980—),女,汉族,重庆人,博士,副教授。研究方向:大数据分析、深度学习。E-mail:

"}, bioImg=null, bioContent=

李莉(1980—),女,汉族,重庆人,博士,副教授。研究方向:大数据分析、深度学习。E-mail:

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1203787146780980076, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, xref=1, ext=[AuthorCompanyExt(id=1203787146785174381, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, companyId=1203787146780980076, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1 School of Control and Computer Engineering, North China Electric Power University, Baoding 071003,China), AuthorCompanyExt(id=1203787146793562990, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, companyId=1203787146780980076, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1 华北电力大学控制与计算机工程学院, 保定 071003)]), AuthorCompany(id=1203787146944557945, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, xref=2, ext=[AuthorCompanyExt(id=1203787146952946554, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, companyId=1203787146944557945, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=2 Hebei Key Laboratory of Knowledge Computing for Energy & Power, Baoding 071003, China), AuthorCompanyExt(id=1203787146957140859, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, companyId=1203787146944557945, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=2 河北省能源电力知识计算重点实验室, 保定 071003)])]), Author(id=1203787147544343469, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, orderNo=1, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=2973916737@qq.com, 
emailSecond=null, emailThird=null, correspondingAuthor=1, authorType=1, ext={EN=AuthorExt(id=1203787147678561206, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, authorId=1203787147544343469, language=EN, stringName=Zhi-xin ZHANG, firstName=Zhi-xin, middleName=null, lastName=ZHANG, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, *, address=1 School of Control and Computer Engineering, North China Electric Power University, Baoding 071003,China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1203787147825361865, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, authorId=1203787147544343469, language=CN, stringName=张之欣, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, *, address=1 华北电力大学控制与计算机工程学院, 保定 071003, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1203787146780980076, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, xref=1, ext=[AuthorCompanyExt(id=1203787146785174381, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, companyId=1203787146780980076, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1 School of Control and Computer Engineering, North China Electric Power University, Baoding 071003,China), AuthorCompanyExt(id=1203787146793562990, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, companyId=1203787146780980076, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1 华北电力大学控制与计算机工程学院, 保定 071003)])]), Author(id=1203787147984745428, tenantId=1146029695717560320, 
journalId=1146123166801305609, articleId=1203753459473428945, orderNo=2, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1203787148118963171, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, authorId=1203787147984745428, language=EN, stringName=Xiao-long WANG, firstName=Xiao-long, middleName=null, lastName=WANG, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, address=1 School of Control and Computer Engineering, North China Electric Power University, Baoding 071003,China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1203787149377254387, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, authorId=1203787147984745428, language=CN, stringName=王小龙, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, address=1 华北电力大学控制与计算机工程学院, 保定 071003, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1203787146780980076, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, xref=1, ext=[AuthorCompanyExt(id=1203787146785174381, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, companyId=1203787146780980076, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1 School of Control and Computer Engineering, North China Electric Power University, Baoding 071003,China), AuthorCompanyExt(id=1203787146793562990, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, companyId=1203787146780980076, language=CN, 
country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1 华北电力大学控制与计算机工程学院, 保定 071003)])])], keywords=[Keyword(id=1203787149649883152, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, language=EN, orderNo=1, keyword=MoE (mixture of experts)), Keyword(id=1203787149800878114, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, language=EN, orderNo=2, keyword=ERNIE (enhanced representation through knowledge integration)), Keyword(id=1203787149893152816, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, language=EN, orderNo=3, keyword=attention mechanism), Keyword(id=1203787149998010434, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, language=EN, orderNo=4, keyword=RCNN (recurrent convolutional neural network)), Keyword(id=1203787150119645265, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, language=EN, orderNo=5, keyword=text classification), Keyword(id=1203787150232891490, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, language=CN, orderNo=1, keyword=混合专家系统), Keyword(id=1203787150442606700, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, language=CN, orderNo=2, keyword=知识增强语义表示模型), Keyword(id=1203787150601990265, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, language=CN, orderNo=3, keyword=注意力机制), Keyword(id=1203787150757179531, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, language=CN, orderNo=4, keyword=循环卷积神经网络), Keyword(id=1203787150903980186, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, language=CN, orderNo=5, keyword=文本分类)], refs=[Reference(id=1203787155534492199, 
tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2023, volume=23, issue=33, pageStart=14264, pageEnd=14270, url=null, language=null, rfNumber=[1], rfOrder=0, authorNames=诸林云, 范菁, 曲金帅, journalName=科学技术与工程, refType=null, unstructuredReference=诸林云, 范菁, 曲金帅, 等. 基于BERT与多通道卷积神经网络的细粒度情感分类[J]. 科学技术与工程, 2023, 23(33): 14264-14270., articleTitle=基于BERT与多通道卷积神经网络的细粒度情感分类, refAbstract=null), Reference(id=1203787155656127024, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2023, volume=23, issue=33, pageStart=14264, pageEnd=14270, url=null, language=null, rfNumber=[1], rfOrder=1, authorNames=Zhu Linyun, Fan Jing, Qu Jinshuai, journalName=Science Technology and Engineering, refType=null, unstructuredReference=Zhu Linyun, Fan Jing, Qu Jinshuai, et al. Fine-grained sentiment classification based on BERT and multi-channel convolutional neural networks[J]. Science Technology and Engineering, 2023, 23(33): 14264-14270., articleTitle=Fine-grained sentiment classification based on BERT and multi-channel convolutional neural networks, refAbstract=null), Reference(id=1203787155760984638, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2021, volume=45, issue=11, pageStart=144, pageEnd=151, url=null, language=null, rfNumber=[2], rfOrder=2, authorNames=俞学豪, 赵子岩, 马应龙, journalName=电力系统自动化, refType=null, unstructuredReference=俞学豪, 赵子岩, 马应龙, 等. 基于BR和GBDT的电力信息通信客服系统多标签文本分类[J]. 
电力系统自动化, 2021, 45(11): 144-151., articleTitle=基于BR和GBDT的电力信息通信客服系统多标签文本分类, refAbstract=null), Reference(id=1203787155870036557, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2021, volume=45, issue=11, pageStart=144, pageEnd=151, url=null, language=null, rfNumber=[2], rfOrder=3, authorNames=Yu Xuehao, Zhao Ziyan, Ma Yinglong, journalName=Automation of Electric Power Systems, refType=null, unstructuredReference=Yu Xuehao, Zhao Ziyan, Ma Yinglong, et al. Multi-label text classification of power information communication customer service system based on BR and GBDT[J]. Automation of Electric Power Systems, 2021, 45(11): 144-151., articleTitle=Multi-label text classification of power information communication customer service system based on BR and GBDT, refAbstract=null), Reference(id=1203787155991671388, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2023, volume=17, issue=12, pageStart=2840, pageEnd=2860, url=null, language=null, rfNumber=[3], rfOrder=4, authorNames=孟祥福, 霍红锦, 张霄雁, journalName=计算机科学与探索, refType=null, unstructuredReference=孟祥福, 霍红锦, 张霄雁, 等. 个性化新闻推荐方法研究综述[J]. 计算机科学与探索, 2023, 17(12): 2840-2860., articleTitle=个性化新闻推荐方法研究综述, refAbstract=null), Reference(id=1203787156151054954, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2023, volume=17, issue=12, pageStart=2840, pageEnd=2860, url=null, language=null, rfNumber=[3], rfOrder=5, authorNames=Meng Xiangfu, Huo Hongjin, Zhang Xiaoyan, journalName=Journal of Frontiers of Computer Science and Technology, refType=null, unstructuredReference=Meng Xiangfu, Huo Hongjin, Zhang Xiaoyan, et al. Research review on personalized news recommendation methods[J]. 
Journal of Frontiers of Computer Science and Technology, 2023, 17(12): 2840-2860., articleTitle=Research review on personalized news recommendation methods, refAbstract=null), Reference(id=1203787156281078390, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2023, volume=7, issue=4, pageStart=16, pageEnd=31, url=null, language=null, rfNumber=[4], rfOrder=6, authorNames=华玮, 吴思洋, 俞超, journalName=数据分析与知识发现, refType=null, unstructuredReference=华玮, 吴思洋, 俞超, 等. 面向网络舆情事件的多层次情感分歧度分析方法[J]. 数据分析与知识发现, 2023, 7(4): 16-31., articleTitle=面向网络舆情事件的多层次情感分歧度分析方法, refAbstract=null), Reference(id=1203787156444656258, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2023, volume=7, issue=4, pageStart=16, pageEnd=31, url=null, language=null, rfNumber=[4], rfOrder=7, authorNames=Hua Wei, Wu Siyang, Yu Chao, journalName=Data Analysis and Knowledge Discovery, refType=null, unstructuredReference=Hua Wei, Wu Siyang, Yu Chao, et al. Multi-level emotion divergence analysis method for network public opinion events[J]. Data Analysis and Knowledge Discovery, 2023, 7(4): 16-31., articleTitle=Multi-level emotion divergence analysis method for network public opinion events, refAbstract=null), Reference(id=1203787156612428435, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2021, volume=47, issue=8, pageStart=1, pageEnd=13, url=null, language=null, rfNumber=[5], rfOrder=8, authorNames=王军, 冯孙铖, 程勇, journalName=计算机工程, refType=null, unstructuredReference=王军, 冯孙铖, 程勇. 深度学习的轻量化神经网络结构研究综述[J]. 
计算机工程, 2021, 47(8): 1-13., articleTitle=深度学习的轻量化神经网络结构研究综述, refAbstract=null), Reference(id=1203787156755034781, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2021, volume=47, issue=8, pageStart=1, pageEnd=13, url=null, language=null, rfNumber=[5], rfOrder=9, authorNames=Wang Jun, Feng Suncheng, Cheng Yong, journalName=Computer Engineering, refType=null, unstructuredReference=Wang Jun, Feng Suncheng, Cheng Yong. A review of lightweight neural network structures for deep learning[J]. Computer Engineering, 2021, 47(8): 1-13., articleTitle=A review of lightweight neural network structures for deep learning, refAbstract=null), Reference(id=1203787156880863912, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2014, volume=null, issue=null, pageStart=1746, pageEnd=1751, url=null, language=null, rfNumber=[6], rfOrder=10, authorNames=Kim Y, journalName=Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP), refType=null, unstructuredReference=Kim Y. Convolutional neural networks for sentence classification[C]// Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha: Association for Computational Linguistics, 2014: 1746-1751., articleTitle=Convolutional neural networks for sentence classification, refAbstract=null), Reference(id=1203787156977332914, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=[7], rfOrder=11, authorNames=Liu P, Qiu X, Huang X, journalName=arXiv preprint arXiv: 1605.05101, 2016, refType=null, unstructuredReference=Liu P, Qiu X, Huang X. Recurrent neural network for text classification with multi-task learning[J]. 
arXiv preprint arXiv: 1605.05101, 2016., articleTitle=Recurrent neural network for text classification with multi-task learning, refAbstract=null), Reference(id=1203787157082190530, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=10.1609/aaai.v29i1.9513, pmid=null, pmcid=null, year=2015, volume=29, issue=1, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=[8], rfOrder=12, authorNames=Lai S, Xu L, Liu K, journalName=Proceedings of the AAAI Conference on Artificial Intelligence, refType=null, unstructuredReference=Lai S, Xu L, Liu K, et al. Recurrent convolutional neural networks for text classification[C]// Proceedings of the AAAI Conference on Artificial Intelligence, 2015, 29(1). DOI: 10.1609/aaai.v29i1.9513., articleTitle=Recurrent convolutional neural networks for text classification, refAbstract=null), Reference(id=1203787157874913998, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=[9], rfOrder=13, authorNames=Joulin A, Grave E, Bojanowski P, journalName=arXiv preprint arXiv: 1607.01759, 2016, refType=null, unstructuredReference=Joulin A, Grave E, Bojanowski P, et al. Bag of tricks for efficient text classification[J]. arXiv preprint arXiv: 1607.01759, 2016., articleTitle=Bag of tricks for efficient text classification, refAbstract=null), Reference(id=1203787157975577301, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2017, volume=5, issue=null, pageStart=135, pageEnd=146, url=null, language=null, rfNumber=[10], rfOrder=14, authorNames=Bojanowski P, Grave E, Joulin A, journalName=Transactions of the Association for Computational Linguistics, refType=null, unstructuredReference=Bojanowski P, Grave E, Joulin A, et al. Enriching word vectors with subword information[J]. 
Transactions of the Association for Computational Linguistics, 2017, 5: 135-146., articleTitle=Enriching word vectors with subword information, refAbstract=null), Reference(id=1203787158084629221, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2017, volume=null, issue=null, pageStart=562, pageEnd=570, url=null, language=null, rfNumber=[11], rfOrder=15, authorNames=Johnson R, Zhang T, journalName=Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), refType=null, unstructuredReference=Johnson R, Zhang T. Deep pyramid convolutional neural networks for text categorization[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vancouver: Association for Computational Linguistics, 2017: 562-570., articleTitle=Deep pyramid convolutional neural networks for text categorization, refAbstract=null), Reference(id=1203787158197875438, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2020, volume=56, issue=23, pageStart=12, pageEnd=22, url=null, language=null, rfNumber=[12], rfOrder=16, authorNames=余同瑞, 金冉, 韩晓臻, journalName=计算机工程与应用, refType=null, unstructuredReference=余同瑞, 金冉, 韩晓臻, 等. 自然语言处理预训练模型的研究综述[J]. 计算机工程与应用, 2020, 56(23): 12-22., articleTitle=自然语言处理预训练模型的研究综述, refAbstract=null), Reference(id=1203787158306927352, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2020, volume=56, issue=23, pageStart=12, pageEnd=22, url=null, language=null, rfNumber=[12], rfOrder=17, authorNames=Yu Tongrui, Jin Ran, Han Xiaozhen, journalName=Computer Engineering and Applications, refType=null, unstructuredReference=Yu Tongrui, Jin Ran, Han Xiaozhen, et al. A review of research on pre-training models for natural language processing[J]. 
Computer Engineering and Applications, 2020, 56(23): 12-22., articleTitle=A review of research on pre-training models for natural language processing, refAbstract=null), Reference(id=1203787158390813438, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=[13], rfOrder=18, authorNames=Devlin J, Chang M W, Lee K, journalName=arXiv preprint arXiv: 1810.04805, 2018, refType=null, unstructuredReference=Devlin J, Chang M W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv: 1810.04805, 2018., articleTitle=BERT: pre-training of deep bidirectional transformers for language understanding, refAbstract=null), Reference(id=1203787158504059655, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=[14], rfOrder=19, authorNames=Sun Y, Wang S, Li Y, journalName=arXiv preprint arXiv: 1904.09223, 2019, refType=null, unstructuredReference=Sun Y, Wang S, Li Y, et al. ERNIE: enhanced representation through knowledge integration[J]. arXiv preprint arXiv: 1904.09223, 2019., articleTitle=ERNIE: enhanced representation through knowledge integration, refAbstract=null), Reference(id=1203787158629888787, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2022, volume=22, issue=20, pageStart=8761, pageEnd=8770, url=null, language=null, rfNumber=[15], rfOrder=20, authorNames=杨秀璋, 郭明镇, 候红涛, journalName=科学技术与工程, refType=null, unstructuredReference=杨秀璋, 郭明镇, 候红涛, 等. 融合情感词典的改进BiLSTM-CNN+Attention情感分类算法[J]. 
科学技术与工程, 2022, 22(20): 8761-8770., articleTitle=融合情感词典的改进BiLSTM-CNN+Attention情感分类算法, refAbstract=null), Reference(id=1203787158755717921, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2022, volume=22, issue=20, pageStart=8761, pageEnd=8770, url=null, language=null, rfNumber=[15], rfOrder=21, authorNames=Yang Xiuzhang, Guo Mingzhen, Hou Hongtao, journalName=Science Technology and Engineering, refType=null, unstructuredReference=Yang Xiuzhang, Guo Mingzhen, Hou Hongtao, et al. Improved BiLSTM-CNN+Attention emotion classification algorithm based on integrated emotion dictionary[J]. Science Technology and Engineering, 2022, 22(20): 8761-8770., articleTitle=Improved BiLSTM-CNN+Attention emotion classification algorithm based on integrated emotion dictionary, refAbstract=null), Reference(id=1203787158948655917, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2021, volume=16, issue=2, pageStart=202, pageEnd=209, url=null, language=null, rfNumber=[16], rfOrder=22, authorNames=翟学明, 魏巍, journalName=智能系统学报, refType=null, unstructuredReference=翟学明, 魏巍. 混合神经网络和条件随机场相结合的文本情感分析[J]. 智能系统学报, 2021, 16(2): 202-209., articleTitle=混合神经网络和条件随机场相结合的文本情感分析, refAbstract=null), Reference(id=1203787159187731269, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2021, volume=16, issue=2, pageStart=202, pageEnd=209, url=null, language=null, rfNumber=[16], rfOrder=23, authorNames=Zhai Xueming, Wei Wei, journalName=CAAI Transactions on Intelligent Systems, refType=null, unstructuredReference=Zhai Xueming, Wei Wei. Text sentiment analysis by combining hybrid neural networks and conditional random fields[J]. 
CAAI Transactions on Intelligent Systems, 2021, 16(2): 202-209., articleTitle=Text sentiment analysis by combining hybrid neural networks and conditional random fields, refAbstract=null), Reference(id=1203787159296783181, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2021, volume=35, issue=11, pageStart=70, pageEnd=79, url=null, language=null, rfNumber=[17], rfOrder=24, authorNames=陆晓蕾, 倪斌, journalName=中文信息学报, refType=null, unstructuredReference=陆晓蕾, 倪斌. 基于预训练语言模型的BERT-CNN多层级专利分类研究[J]. 中文信息学报, 2021, 35(11): 70-79., articleTitle=基于预训练语言模型的BERT-CNN多层级专利分类研究, refAbstract=null), Reference(id=1203787159405835094, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2021, volume=35, issue=11, pageStart=70, pageEnd=79, url=null, language=null, rfNumber=[17], rfOrder=25, authorNames=Lu Xiaolei, Ni Bin, journalName=Journal of Chinese Information Processing, refType=null, unstructuredReference=Lu Xiaolei, Ni Bin. Research on BERT-CNN multi-level patent classification based on pre-trained language model[J]. Journal of Chinese Information Processing, 2021, 35(11): 70-79., articleTitle=Research on BERT-CNN multi-level patent classification based on pre-trained language model, refAbstract=null), Reference(id=1203787159556830050, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=1991, volume=3, issue=1, pageStart=79, pageEnd=87, url=null, language=null, rfNumber=[18], rfOrder=26, authorNames=Jacobs R A, Jordan M I, Nowlan S J, journalName=Neural Computation, refType=null, unstructuredReference=Jacobs R A, Jordan M I, Nowlan S J, et al. Adaptive mixtures of local experts[J]. 
Neural Computation, 1991, 3(1): 79-87., articleTitle=Adaptive mixtures of local experts, refAbstract=null), Reference(id=1203787159670076270, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=[19], rfOrder=27, authorNames=Shazeer N, Mirhoseini A, Maziarz K, journalName=arXiv preprint arXiv: 1701.06538, 2017, refType=null, unstructuredReference=Shazeer N, Mirhoseini A, Maziarz K, et al. Outrageously large neural networks: the sparsely-gated mixture-of-experts layer[J]. arXiv preprint arXiv: 1701.06538, 2017., articleTitle=Outrageously large neural networks: the sparsely-gated mixture-of-experts layer, refAbstract=null), Reference(id=1203787159804294011, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=[20], rfOrder=28, authorNames=Lepikhin D, Lee H J, Xu Y, journalName=arXiv preprint arXiv: 2006.16668, 2020, refType=null, unstructuredReference=Lepikhin D, Lee H J, Xu Y, et al. GShard: scaling giant models with conditional computation and automatic sharding[J]. arXiv preprint arXiv: 2006.16668, 2020., articleTitle=GShard: scaling giant models with conditional computation and automatic sharding, refAbstract=null), Reference(id=1203787159951094661, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2022, volume=23, issue=1, pageStart=5232, pageEnd=5270, url=null, language=null, rfNumber=[21], rfOrder=29, authorNames=Fedus W, Zoph B, Shazeer N, journalName=The Journal of Machine Learning Research, refType=null, unstructuredReference=Fedus W, Zoph B, Shazeer N. Switch transformers: scaling to trillion parameter models with simple and efficient sparsity[J]. 
The Journal of Machine Learning Research, 2022, 23(1): 5232-5270., articleTitle=Switch transformers: scaling to trillion parameter models with simple and efficient sparsity, refAbstract=null), Reference(id=1203787160081118094, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=[22], rfOrder=30, authorNames=Du N, Huang Y, Dai A M, journalName=arXiv preprint arXiv: 2112, refType=null, unstructuredReference=Du N, Huang Y, Dai A M, et al. Glam: efficient scaling of language models with mixture-of-experts[J]. arXiv preprint arXiv: 2112.06905, 2021., articleTitle=Glam: efficient scaling of language models with mixture-of-experts, refAbstract=null), Reference(id=1203787160227918744, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2020, volume=33, issue=null, pageStart=1877, pageEnd=1901, url=null, language=null, rfNumber=[23], rfOrder=31, authorNames=Brown T, Mann B, Ryder N, journalName=Advances in Neural Information Processing Systems, refType=null, unstructuredReference=Brown T, Mann B, Ryder N, et al. Language models are few-shot learners[J]. Advances in Neural Information Processing Systems, 2020, 33: 1877-1901., articleTitle=Language models are few-shot learners, refAbstract=null), Reference(id=1203787160408273823, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=[24], rfOrder=32, authorNames=Xue F, Shi Z, Wei F, journalName=arXiv preprint arXiv: 2107, refType=null, unstructuredReference=Xue F, Shi Z, Wei F, et al. Go wider instead of deeper[J]. 
arXiv preprint arXiv: 2107.11817, 2021., articleTitle=Go wider instead of deeper, refAbstract=null), Reference(id=1203787160584434599, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=[25], rfOrder=33, authorNames=Zuo S, Zhang Q, Liang C, journalName=arXiv preprint arXiv: 2204, refType=null, unstructuredReference=Zuo S, Zhang Q, Liang C, et al. Moebert: from bert to mixture-of-experts via importance-guided adaptation[J]. arXiv preprint arXiv: 2204.07675, 2022., articleTitle=Moebert: from bert to mixture-of-experts via importance-guided adaptation, refAbstract=null), Reference(id=1203787160701875122, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2015, volume=14, issue=7, pageStart=38, pageEnd=39, url=null, language=null, rfNumber=[26], rfOrder=34, authorNames=Hinton G, Vinyals O, Dean J, journalName=Computer Science, refType=null, unstructuredReference=Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. Computer Science, 2015, 14(7): 38-39., articleTitle=Distilling the knowledge in a neural network, refAbstract=null), Reference(id=1203787160819315644, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=null, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=[27], rfOrder=35, authorNames=Jang E, Gu S, Poole B, journalName=arXiv preprint arXiv: 1611.01144, 2016, refType=null, unstructuredReference=Jang E, Gu S, Poole B. Categorical reparameterization with gumbel-softmax[J]. 
arXiv preprint arXiv: 1611.01144, 2016., articleTitle=Categorical reparameterization with gumbel-softmax, refAbstract=null), Reference(id=1203787160974504902, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, doi=null, pmid=null, pmcid=null, year=2019, volume=29, issue=null, pageStart=1061, pageEnd=1073, url=null, language=null, rfNumber=[28], rfOrder=36, authorNames=Zhang P, Xue J, Lan C, journalName=IEEE Transactions on Image Processing, refType=null, unstructuredReference=Zhang P, Xue J, Lan C, et al. EleAtt-RNN: adding attentiveness to neurons in recurrent neural networks[J]. IEEE Transactions on Image Processing, 2019, 29: 1061-1073., articleTitle=EleAtt-RNN: adding attentiveness to neurons in recurrent neural networks, refAbstract=null)], funds=null, companyList=[AuthorCompany(id=1203787146780980076, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, xref=1, ext=[AuthorCompanyExt(id=1203787146785174381, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, companyId=1203787146780980076, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1 School of Control and Computer Engineering, North China Electric Power University, Baoding 071003,China), AuthorCompanyExt(id=1203787146793562990, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, companyId=1203787146780980076, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1 华北电力大学控制与计算机工程学院, 保定 071003)]), AuthorCompany(id=1203787146944557945, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, xref=2, ext=[AuthorCompanyExt(id=1203787146952946554, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, companyId=1203787146944557945, language=EN, country=null, 
province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=2 Hebei Key Laboratory of Knowledge Computing for Energy & Power, Baoding 071003, China), AuthorCompanyExt(id=1203787146957140859, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, companyId=1203787146944557945, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=2 河北省能源电力知识计算重点实验室, 保定 071003)])], figs=[ArticleFig(id=1203787151256301749, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, language=EN, label=Fig.1, caption=The model framework of this paper, figureFileSmall=33YEGUqWMq3Thdpbf4v27A==, figureFileBig=78b1tdEretqAiXAgMdIYWQ==, tableContent=null), ArticleFig(id=1203787151386325190, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, language=CN, label=图1, caption=本文的模型框架, figureFileSmall=33YEGUqWMq3Thdpbf4v27A==, figureFileBig=78b1tdEretqAiXAgMdIYWQ==, tableContent=null), ArticleFig(id=1203787151575068894, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, language=EN, label=Fig.2, caption=Improved encoder layer for ERNIE, figureFileSmall=cbTH6X5eHad3vo0qehleXw==, figureFileBig=21BVjSmVUTDaJ+2TKdetlQ==, tableContent=null), ArticleFig(id=1203787151663149290, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, language=CN, label=图2, caption=改进ERNIE的encoder层

G为门控单元

, figureFileSmall=cbTH6X5eHad3vo0qehleXw==, figureFileBig=21BVjSmVUTDaJ+2TKdetlQ==, tableContent=null), ArticleFig(id=1203787151747035381, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, language=EN, label=Fig.3, caption=Sparsely-gated mixture-of-expert layer, figureFileSmall=IJlUhcifRUL4hlH0vZV5yA==, figureFileBig=qAy1NLy5T1z8+c7c3/LcSA==, tableContent=null), ArticleFig(id=1203787151877058825, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, language=CN, label=图3, caption=稀疏门控混合专家系统层

Ei为第i个词向量的加权,i=1,2,…,n

, figureFileSmall=IJlUhcifRUL4hlH0vZV5yA==, figureFileBig=qAy1NLy5T1z8+c7c3/LcSA==, tableContent=null), ArticleFig(id=1203787152007082259, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, language=EN, label=Fig.4, caption=RCNN structure, figureFileSmall=3FI8T1emTmK4eHFbHSjYcg==, figureFileBig=SAP8E+Yep7/DMKCAl5ZlrA==, tableContent=null), ArticleFig(id=1203787152162271529, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, language=CN, label=图4, caption=RCNN结构, figureFileSmall=3FI8T1emTmK4eHFbHSjYcg==, figureFileBig=SAP8E+Yep7/DMKCAl5ZlrA==, tableContent=null), ArticleFig(id=1203787152262934838, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, language=EN, label=Fig.5, caption=EleAtt-RNN structure, figureFileSmall=++aoV32ZtHFN0xop5qYfMA==, figureFileBig=5ny3xDIzatlshnaNNZZBOQ==, tableContent=null), ArticleFig(id=1203787152376181055, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, language=CN, label=图5, caption=EleAtt-RNN结构, figureFileSmall=++aoV32ZtHFN0xop5qYfMA==, figureFileBig=5ny3xDIzatlshnaNNZZBOQ==, tableContent=null), ArticleFig(id=1203787152518787406, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1203753459473428945, language=EN, label=Table 1, caption=

Table 1 Data samples
Input (news title) | Category
Peeling back Real Madrid's victory reveals an issue that must be raised: will those who left be missed? | 7
A well-known actress rewards her staff while working, taking 12 people on a trip to Cannes | 9
Buildings sway at a Changzhou residential complex: 10 apartment blocks have been shaking for over half a year | 1
Schools in many regions explore diversified evaluation systems; students' all-round development gains attention | 3
Shunyi Xiangyue Siji: new 95–115 m² three-bedroom units on sale at 2% off | 1
Table 2 Experimental configuration
Environment | Configuration
Processor | Intel(R) Core(TM) i7-7700K CPU @ 4.20 GHz
Graphics card | NVIDIA GeForce RTX 3080 Ti
Memory | 32 GB
Framework | PyTorch 1.10
Table 3 Comprehensive experimental results
Model | Precision/% | Recall/% | F1/%
TextCNN | 81.83 | 80.71 | 81.27
BiGRU | 79.76 | 79.02 | 79.39
RCNN | 83.77 | 82.15 | 82.95
BERT | 90.68 | 90.55 | 90.61
ERNIE | 91.04 | 90.99 | 91.01
ERNIE+CNN | 92.32 | 92.28 | 92.29
ERNIE+BiGRU | 92.56 | 92.51 | 92.53
ERNIE+RCNN | 93.79 | 93.66 | 93.72
Proposed method | 94.26 | 94.21 | 94.23
Table 4 Results for different numbers of experts
Number of experts | Precision/% | Recall/% | F1/% | Parameters/10^6
4 | 90.84 | 90.85 | 90.84 | 26.64
6 | 91.21 | 91.19 | 91.20 | 27.83
8 | 91.37 | 91.32 | 91.34 | 29.01
10 | 91.51 | 91.45 | 91.48 | 30.19
12 | 91.38 | 91.36 | 91.37 | 31.38
Table 5 Comparison of gating functions
Gating function | Precision/% | Recall/% | F1/%
Softmax | 90.42 | 90.38 | 90.40
Gumbel_Softmax | 91.51 | 91.49 | 91.48
Table 6 Misclassified samples
Input (news title) | Category | Prediction
Sports events team up with celebrity concerts, bringing audiences a double feast of excitement | 7 | 9
17 full-mark essays from the Guangdong college entrance exam; 130 000 candidates scored zero on one Chinese-language question | 3 | 5
Society supports educational equity; children in remote areas gain new knowledge | 4 | 3
Sugar prices stay high for a long time; demand for substitutes such as fructose-glucose syrup is strong | 0 | 4
Table 7 Ablation experiment
Model | Precision/% | Recall/% | F1/% | Parameters/10^6
ERNIE-RCNN | 93.79 | 93.66 | 93.72 | 101.45
ERNIE-Att-RCNN | 94.01 | 93.99 | 93.99 | 103.02
Improved ERNIE-RCNN | 93.96 | 93.98 | 93.97 | 31.77
Proposed model | 94.26 | 94.21 | 94.23 | 33.34
Chinese News Title Classification Based on Lightweight Improved ERNIE-RCNN
Li LI 1, 2, Zhi-xin ZHANG 1, *, Xiao-long WANG 1
Science Technology and Engineering | Papers·Automation and Computational Technology, 2025, 25(2): 649-656
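  • 1 School of Control and Computer Engineering, North China Electric Power University, Baoding 071003, China
  • 2 Hebei Key Laboratory of Knowledge Computing for Energy & Power, Baoding 071003, China
  • Li LI (born 1980), female, Han nationality, from Chongqing; Ph.D., associate professor. Research interests: big data analysis, deep learning. E-mail:

Corresponding author:

* Zhi-xin ZHANG (born 1998), male, Han nationality, from Xinxiang, Henan; master's student. Research interest: natural language processing. E-mail: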
Published: 2025-01-18 DOI: 10.12404/j.issn.1671-1815.2307782

Large pre-trained language models face three problems when processing news titles: they have huge numbers of parameters, they cannot use contextual semantic features efficiently, and recurrent convolutional neural networks ignore the differing importance of the initial input elements. To address these problems, this paper proposes a news title classification method. It combines ERNIE (enhanced representation through knowledge integration), improved with a mixture-of-experts (MoE) model, and a recurrent convolutional neural network (RCNN) with an attention mechanism. First, the text is encoded with the MoE-improved ERNIE. An attention RCNN then classifies the text while preserving its word order and features. To strengthen classification, the RCNN is improved by computing context-fused weights for its inputs. When the weight of each expert in the MoE is computed, Gumbel_Softmax replaces the conventional Softmax as the gating function, which gives better control over the degree of smoothing. Compared with conventional classification methods, the proposed method shows clear advantages in the experiments and greatly reduces the number of parameters. It also improves F1 by 0.51 percentage points over the ERNIE-RCNN baseline. Ablation experiments confirm the feasibility of the method for the classification task.

MoE (mixture of experts)  /  ERNIE (enhanced representation through knowledge integration)  /  attention mechanism  /  RCNN (recurrent convolutional neural network)  /  text classification
Li LI, Zhi-xin ZHANG, Xiao-long WANG. Chinese News Title Classification Based on Lightweight Improved ERNIE-RCNN[J]. Science Technology and Engineering, 2025, 25(2): 649-656. DOI: 10.12404/j.issn.1671-1815.2307782
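Text classification is a crucial task in natural language processing. It is widely used in information-mining applications such as sentiment analysis [1], intelligent customer service [2], news recommendation systems [3] and public-opinion analysis [4]. News text classification is a key subtask of text classification with broad practical value. With the spread of social media, news travels faster and reaches a wider audience. When an emergency event occurs and circulates online, public opinion spreads extremely quickly. If the event is negative news, it can trigger a massive wave of online opinion that regulators struggle to control, which also undermines social stability. Governing online public opinion therefore requires identifying emergency events early, and such events spread on the Internet mainly through news text. For this reason, news title classification is especially important for the early-stage monitoring and management of online public opinion, and research on it is urgently needed.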
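ERNIE (enhanced representation through knowledge integration) integrates semantic information such as knowledge graphs and semantic networks, which strengthens its semantic representation of text and lets it capture textual semantics better. However, this also raises the model's demand for computing resources. To address this, researchers have developed lightweight variants that reduce hardware requirements and thus lower deployment, operation and maintenance costs [5]. Lightweight models also run better on mobile devices. A fast and accurate classifier improves the user experience, letting users obtain news in the categories they care about anytime and anywhere.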
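The pre-trained language model ERNIE has a huge number of parameters and extracts contextual semantic features inefficiently, so deploying it on mobile devices entails a heavy computational load. To solve this problem, this paper proposes a Chinese news title classification method based on a lightweight improved ERNIE-RCNN, with the aim of further improving classification efficiency. Part of the fully connected layers in ERNIE's encoder are replaced with mixture-of-experts layers that process inputs in parallel. This greatly reduces computing-resource requirements without sacrificing accuracy, and the model still produces word embeddings efficiently. The attention mechanism assigns a different weight to each word or character, while RCNN (recurrent convolutional neural network) captures both the sequential information and the spatial features of text. An RCNN with attention can therefore learn contextual semantics efficiently.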
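Kim [6] first applied convolutional neural networks (CNN) to text classification with TextCNN, which performed excellently. Liu et al. [7] then introduced recurrent neural networks (RNN) into text classification and designed TextRNN, which captures longer sequence information effectively. However, TextCNN performs poorly on text with complex context, and TextRNN suffers from exploding gradients on long sequences. TextRCNN [8] was proposed to overcome these shortcomings. It combines the advantages of TextRNN and TextCNN: it trains faster, captures contextual information better and handles somewhat more complex semantics.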
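Nevertheless, CNNs and RNNs still require long training times. To address this, Facebook introduced FastText [9-10], which is faster and more efficient on simple text classification tasks. However, FastText is weak at capturing deep semantic relations. Johnson et al. [11] proposed DPCNN, which addresses TextCNN's difficulty in capturing long-distance dependencies by progressively deepening the network. In short, each text classification algorithm has its own strengths and weaknesses, and practical applications require trade-offs and tuning for the specific situation.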
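OpenAI proposed the generative pre-trained (GPT) model, which introduced a new NLP paradigm that combines pre-training with fine-tuning. After pre-training, a model can be fine-tuned directly for a downstream task, which avoids the cumbersome process of training from scratch [12]. This paradigm has had a significant impact on NLP and has greatly advanced the field. BERT, proposed by Devlin et al. [13], is a pre-trained language model based on a deep Transformer. It makes full use of the semantic information in large-scale unlabeled text and also allows deeper models to be used across NLP tasks.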
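ERNIE [14] optimizes BERT so that it understands entity relations within sentences and extracts semantic information more accurately. Its masking mechanism differs from BERT's: besides masking individual characters, it can mask entire entities. Training the model to predict masked entities helps it capture the relationships between entities. As a result, ERNIE achieves higher performance, accuracy and reliability on NLP tasks and is better suited to Chinese news title classification.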
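Models based on word vectors, on contextual mechanisms, on attention mechanisms and on language models each have advantages and disadvantages, so many researchers combine several models for text classification. Yang Xiuzhang et al. [15] proposed a sentiment classification model that improves BiLSTM-CNN+Attention by fusing a sentiment lexicon. It uses multi-channel attention to extract and fuse the outputs of the CNN and LSTM, and finally applies attention to enhance the sentiment features. Zhai Xueming et al. [16] proposed a text sentiment analysis method that combines hybrid neural networks with conditional random fields. It uses a CNN and a BiGRU to capture the deep semantic information and structural features of text, then uses a conditional random field as the classifier to determine the sentiment category accurately. Lu Xiaolei et al. [17] used the national patent data published by the State Information Center to build a BERT-CNN multi-level patent classification model based on a pre-trained language model. They also examined how global and local strategies differ in multi-level patent text classification.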
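The mixture of experts [18] is a supervised learning method that modularizes a multilayer network and uses a gating network to decide which expert handles, and is trained on, each input. As deep learning has advanced, computational cost has limited further growth in model size. To address this, Shazeer et al. [19] proposed a sparsely gated mixture of experts and applied it to RNNs. It increased model capacity more than 1 000-fold while keeping computation efficient.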
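Lepikhin et al. [20] subsequently extended the mixture-of-experts idea to Transformer models with good results. Fedus et al. [21] proposed Switch Transformer, an efficient large pre-trained model. Its main contribution is a simplified MoE routing algorithm that substantially improves computational efficiency. In 2021 Google released GLaM [22], a very large model roughly seven times the size of GPT-3 [23] (1.2 trillion vs. 175 billion parameters). Thanks to its sparsely gated MoE design, GLaM needed only about one third of the energy used to train GPT-3, and it achieved better overall results than GPT-3 across 29 NLP tasks. Xue et al. [24] proposed WideNet to obtain better results while compressing the number of model parameters. It first shrinks the model by sharing parameters across layers and then uses an MoE design to enlarge model capacity. Zuo et al. [25] combined MoE with knowledge distillation [26] to improve both inference speed and model quality.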
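The proposed model targets three problems: ERNIE's large parameter count, its inefficient extraction of contextual semantic features, and RCNN's neglect of the differing importance of elements in its input vectors. It introduces an SGMoE (sparsely-gated mixture-of-experts) into ERNIE's encoder layers and an attention mechanism into RCNN. The aim is to reduce the number of parameters while still extracting features accurately and using them efficiently.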
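The proposed model is designed for Chinese text classification. Its overall architecture, shown in Fig. 1, has four main components: ① the sentence is split into tokens $W_1, W_2, \dots, W_n$, which form the input layer; ② an improved ERNIE encodes the input Chinese tokens into context-aware word vectors $x_1, x_2, \dots, x_n$; ③ the word vectors are fed into the attention RCNN, which assigns weights according to context to obtain the final global semantic information; ④ a Softmax layer produces the final output.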
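ERNIE carries a massive number of parameters for this task, and compressed models trained with existing knowledge-distillation methods lose significant performance. We therefore adopt a mixture-of-experts structure to increase model capacity and inference speed. The feed-forward networks of the pre-trained model are adapted into multiple expert networks, so that the representational power of the pre-trained model is largely preserved.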
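Inspired by GShard, proposed by Lepikhin et al. [20], the encoder of the improved ERNIE is modified as shown in Fig. 2. The FFN (feed-forward network) layer of every other encoder layer is replaced with an SGMoE layer, and the function that computes the sparse gating values is changed from Softmax to Gumbel_Softmax. During inference, suitable experts are selected adaptively from the many expert networks to balance the load, which improves efficiency.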
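To illustrate the "every other layer" replacement, here is a minimal sketch. It assumes a 12-layer ERNIE-base encoder and also assumes, hypothetically, that the odd-numbered layers (counting from 0) are the ones whose FFN is replaced; the text does not say which half is converted. The function name `encoder_ffn_layout` is illustrative only.

```python
def encoder_ffn_layout(num_layers: int = 12, every: int = 2, offset: int = 1) -> list:
    """Return the feed-forward block type used by each encoder layer.

    Hypothetical rule: 0-based layer i gets an SGMoE block when
    i % every == offset; every other layer keeps its original dense FFN.
    With every=2 this replaces "every other" FFN, as described in the text.
    """
    if not 0 <= offset < every:
        raise ValueError("offset must satisfy 0 <= offset < every")
    return ["SGMoE" if i % every == offset else "FFN" for i in range(num_layers)]


layout = encoder_ffn_layout()
print(layout)
# ['FFN', 'SGMoE', 'FFN', 'SGMoE', 'FFN', 'SGMoE', 'FFN', 'SGMoE', 'FFN', 'SGMoE', 'FFN', 'SGMoE']
```

Half of the encoder layers keep their dense FFN under this rule. Only the other half carry the expert networks that are routed by the gate.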
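A mixture-of-experts system aims to improve classification accuracy by integrating multiple base models. Data from different sources follow different distributions, so a single model usually handles only part of the data well and performs poorly on the rest. Multiple expert models are therefore used for data from different sources. Each expert network specializes in a region of the data, where it classifies better than the other experts. A gating network decides which expert each input is assigned to. The structure is shown in Fig. 3.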
For the input x at the current position, the output is the weighted sum of all experts:
$$y=\sum_{i=1}^{n} G(x_i)\,E(x_i) \tag{1}$$
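where $x_i$ is the word vector of the i-th token; $G(x_i)$ and $E(x_i)$ are the output of the gating network and the output of the i-th expert, respectively; and $y$ is the weighted word vector.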
The gating unit G is a Softmax gate: the input x is mapped to n dimensions, and Softmax is applied to obtain the gating values:
$$G(x)=\operatorname{Softmax}(x\,W_g) \tag{2}$$
where $W_g$ is the weight matrix that maps x to the n-dimensional vector of gate scores.
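The following pure-Python sketch shows Eqs. (1) and (2) for a single token vector. It uses the standard dense form $y=\sum_i G(x)_i E_i(x)$ from the mixture-of-experts literature. For illustration only, each expert is assumed to be a plain linear map; in the model described here the experts replace ERNIE's FFN blocks. All names and values are hypothetical.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def vecmat(x, W):
    """Row vector x (length d) times matrix W (d rows, k columns) -> length-k list."""
    return [sum(x[r] * W[r][c] for r in range(len(x))) for c in range(len(W[0]))]


def moe_forward(x, W_g, expert_weights):
    """Dense softmax-gated mixture of experts for one token vector x.

    W_g:            d x n gating matrix (n = number of experts), Eq. (2).
    expert_weights: n matrices of shape d x d_out; expert i is the
                    hypothetical linear map E_i(x) = x @ expert_weights[i].
    Returns (y, gates), where y = sum_i G(x)_i * E_i(x), as in Eq. (1).
    """
    gates = softmax(vecmat(x, W_g))
    y = [0.0] * len(expert_weights[0][0])
    for g, W_e in zip(gates, expert_weights):
        out = vecmat(x, W_e)
        y = [yi + g * oi for yi, oi in zip(y, out)]
    return y, gates


# Usage: zero gating weights give uniform gates.
x = [1.0, 0.0]
W_g = [[0.0, 0.0], [0.0, 0.0]]
experts = [[[1.0, 0.0], [0.0, 1.0]],   # E_1: identity
           [[2.0, 0.0], [0.0, 2.0]]]   # E_2: doubles the input
y, gates = moe_forward(x, W_g, experts)
print(gates, y)  # [0.5, 0.5] [1.5, 0.0]
```

A sparse variant would keep only the largest gates and renormalize them, so that only the selected experts are evaluated.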
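To account for the differences between experts and for load balancing, sparsity is obtained through TopK sampling, and load balancing is obtained by adding noise to the gating mechanism. TopK sampling is not differentiable, so gradients cannot flow through it and the network cannot be updated. The ordinary Softmax is therefore replaced with the Gumbel_Softmax [27] function, which approximates TopK sampling, provides the randomness that sampling requires, and does not break gradient propagation. It is computed as follows: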
$$G(x)=\operatorname{Gumbel\_Softmax}(x\,W_g) \tag{3}$$
$$y_i=\frac{\exp\left[(\log\pi_i+g_i)/\tau\right]}{\sum_{j=1}^{k}\exp\left[(\log\pi_j+g_j)/\tau\right]} \tag{4}$$
$$\pi_i=x_i\,W_g^{i} \tag{5}$$
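In Eq. (4), $g_i$ and $\tau$ are two additional variables introduced on top of Softmax. $g_i$ is noise drawn from the Gumbel distribution, added to increase the model's flexibility. $\tau$ is the temperature coefficient, which controls the degree of smoothing: as $\tau \to 0$ the output approaches a one-hot sample, and larger $\tau$ makes it more uniform. $\pi_i$ is the probability of the i-th category, the sum runs over the k categories, and $W_g^{i}$ is a weight matrix.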
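Below is a minimal sketch of Gumbel_Softmax gating (Eq. (4)) followed by top-k expert selection. It assumes the per-expert scores $x\,W_g$ play the role of $\log\pi_i$, which is the usual logits convention. It uses the standard Gumbel noise $g_i=-\log(-\log u)$ with u drawn uniformly from (0, 1). The scores, seeds and helper names are hypothetical. In PyTorch, the same operation is available as `torch.nn.functional.gumbel_softmax` (with `tau` and `hard` arguments).

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=None):
    """Sketch of Eq. (4) for one token.

    `logits` stands in for log(pi_i), i.e. the per-expert scores x * W_g.
    Each score gets Gumbel noise g_i = -log(-log(u)), u ~ Uniform(0, 1),
    and is divided by the temperature tau before a softmax.
    """
    rng = rng or random.Random()
    noisy = []
    for score in logits:
        u = rng.random()
        while u == 0.0:  # random() may return 0.0, and log(0) is undefined
            u = rng.random()
        g = -math.log(-math.log(u))
        noisy.append((score + g) / tau)
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(gates, k):
    """Indices of the k largest gate values, i.e. the experts a token is routed to."""
    return sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:k]


scores = [2.0, 1.0, 0.5, 0.1]  # hypothetical x * W_g for 4 experts
soft = gumbel_softmax(scores, tau=1.0, rng=random.Random(0))
sharp = gumbel_softmax(scores, tau=0.1, rng=random.Random(0))
print(max(soft) <= max(sharp))  # True
print(top_k(sharp, 2))
```

Both calls use the same seed, so identical noise enters the softmax and only τ differs. Dividing by a smaller τ can only increase the largest gate value, which is why the first print is `True`.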
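Traditional window-based neural networks are limited by the size of the text window; RCNN overcomes this and performs better on text classification. Given the characteristics of news text classification, an RCNN is therefore added as the deep feature-extraction module. Its structure is shown in Fig. 4.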
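To keep the parameter count low, a BiGRU extracts the contextual information of the text. Its hidden outputs are concatenated with the word vectors to form new word representations: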
$$c_l(w_i)=f\left[W_l\,c_l(w_{i-1})+W_{sl}\,e(w_{i-1})\right] \tag{6}$$
$$c_r(w_i)=f\left[W_r\,c_r(w_{i+1})+W_{sr}\,e(w_{i+1})\right] \tag{7}$$
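where $c_l(w_i)$ and $c_r(w_i)$ are the left (preceding) and right (following) context representations of word $w_i$. $w_i$ is the i-th input word, $e(w_{i-1})$ is the word vector of $w_{i-1}$, and $c_l(w_{i-1})$ is the left-context representation of the previous word. $W_l$ is the hidden-layer transition matrix. $W_{sl}$ is another matrix, which combines the semantics of the current word into the left context of the next word. $W_r$ and $W_{sr}$ play the same roles for the right context, and $f$ is a nonlinear activation function.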
Equations (6) and (7) give the left- and right-context representations of each word. Each word's representation in the network is then defined by Eq. (8):
$$x_i=\left[c_l(w_i),\, e(w_i),\, c_r(w_i)\right] \tag{8}$$
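where $x_i$ is the concatenation of the left context, the word vector and the right context of $w_i$. A Sigmoid activation is applied to this result. The resulting sentence representation passes through a max-pooling layer, and the pooled feature vector is fed to the classifier.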
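Eqs. (6)–(8), followed by the Sigmoid and max pooling, can be sketched in pure Python as below. The sketch follows the plain recurrent form of the equations rather than the BiGRU the text actually uses, and it assumes f = tanh. Following the original RCNN formulation, the left context of the first word and the right context of the last word are given starting vectors. All matrix values are hypothetical.

```python
import math


def matvec(W, v):
    """Matrix W (list of rows) times column vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rcnn_features(emb, W_l, W_sl, W_r, W_sr, c_l_first, c_r_last):
    """Sketch of Eqs. (6)-(8) followed by Sigmoid and max pooling.

    emb:       list of n word vectors e(w_1), ..., e(w_n)
    c_l_first: left context of the first word (no preceding word exists)
    c_r_last:  right context of the last word
    f in Eqs. (6)-(7) is assumed to be tanh.
    """
    n = len(emb)
    c_l = [c_l_first]
    for i in range(1, n):  # Eq. (6): left-to-right recurrence
        pre = [a + b for a, b in zip(matvec(W_l, c_l[i - 1]), matvec(W_sl, emb[i - 1]))]
        c_l.append([math.tanh(v) for v in pre])
    c_r = [None] * n
    c_r[n - 1] = c_r_last
    for i in range(n - 2, -1, -1):  # Eq. (7): right-to-left recurrence
        pre = [a + b for a, b in zip(matvec(W_r, c_r[i + 1]), matvec(W_sr, emb[i + 1]))]
        c_r[i] = [math.tanh(v) for v in pre]
    # Eq. (8): x_i = [c_l(w_i), e(w_i), c_r(w_i)], then Sigmoid as described in the text
    reps = [[1.0 / (1.0 + math.exp(-v)) for v in c_l[i] + emb[i] + c_r[i]] for i in range(n)]
    pooled = [max(col) for col in zip(*reps)]  # max pooling over the n positions
    return reps, pooled


emb = [[1.0], [-1.0], [2.0]]  # three 1-D "word vectors"
one = [[0.5]]                 # hypothetical 1x1 weight matrices
reps, pooled = rcnn_features(emb, one, one, one, one, [0.0], [0.0])
print(len(reps), len(pooled))  # 3 3
```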
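In recurrent neural networks, the elements of the input vector differ in importance, yet this property is often overlooked. To address this, Zhang et al. [28] proposed EleAtt-RNN, a simple and effective structure that gives RNN neurons an attention mechanism (Fig. 5). The model thus allocates weights more deliberately when processing input elements, which improves overall performance.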
The structural unit is computed as
$$a_t=\operatorname{Sigmoid}\left(w_{xa}X_t+w_{ha}h_{t-1}+b_a\right) \tag{9}$$
$$\tilde{x}_t=a_t\odot X_t \tag{10}$$
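where $X_t$ is the word vector of the t-th word, and $h_{t-1}$ is the hidden state that encodes the preceding context of the first t−1 words. $w_{xa}$ and $w_{ha}$ are two randomly initialized weight matrices, and $b_a$ is the bias. $a_t$ is the contribution weight of the t-th word with respect to the preceding context. $\tilde{x}_t$ is the word vector reweighted by the preceding context; the product of $a_t$ and $X_t$ is taken element by element.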
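Features in Chinese text are often unevenly distributed, and different characters and words contribute very differently to the context. To address this, an element-wise attention gate, EleAttG, is introduced into the recurrent neural network. It adaptively strengthens the contribution of important information and suppresses unimportant information, element by element, which improves the model's semantic understanding.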
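Here is a minimal sketch of the element-wise attention gate in Eqs. (9) and (10). It assumes $w_{xa}$ is d×d and $w_{ha}$ is d×h, so that $a_t$ has one weight per element of $X_t$. The weight and bias values are hypothetical and chosen only to make the effect visible.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def eleatt_gate(X_t, h_prev, w_xa, w_ha, b_a):
    """Eq. (9): a_t = Sigmoid(w_xa X_t + w_ha h_{t-1} + b_a)
    Eq. (10): x~_t = a_t * X_t, multiplied element by element.

    w_xa is d x d and w_ha is d x h (lists of rows), so a_t holds one
    weight per element of X_t. A GRU/RNN cell would then consume x~_t
    instead of X_t.
    """
    a_t = []
    for row_x, row_h, b in zip(w_xa, w_ha, b_a):
        z = sum(w * x for w, x in zip(row_x, X_t)) + sum(w * h for w, h in zip(row_h, h_prev)) + b
        a_t.append(sigmoid(z))
    x_tilde = [a * x for a, x in zip(a_t, X_t)]
    return a_t, x_tilde


X_t = [1.0, 2.0]
h_prev = [0.0]
w_xa = [[0.0, 0.0], [0.0, 0.0]]
w_ha = [[0.0], [0.0]]
a_t, x_tilde = eleatt_gate(X_t, h_prev, w_xa, w_ha, [0.0, 0.0])
print(a_t, x_tilde)  # [0.5, 0.5] [0.5, 1.0]
_, suppressed = eleatt_gate(X_t, h_prev, w_xa, w_ha, [0.0, -10.0])
print(suppressed[1] < 1e-3)  # True: a large negative bias nearly zeroes element 2
```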
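The experiments use a subset of the THUCNews news corpus from Tsinghua University containing 100 000 samples in total. For training, validation and testing, 80 000 samples were randomly drawn as the training set and 10 000 as the validation set; the remaining 10 000 form the test set. The average sample length is 24 characters. The dataset covers 10 categories: finance, real estate, stocks, education, technology, society, politics, sports, games and entertainment. These are the targets of the classification experiments below. Data samples are shown in Table 1.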
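Below is a sketch of the 80 000 / 10 000 / 10 000 random split described above. The seed and the placeholder samples are hypothetical. Mapping the ten categories to label ids 0–9 in the order listed is an assumption; it matches the labels shown in Table 1 (e.g. 7 for a football title and 1 for a housing title).

```python
import random

# Category names in the order listed in the text. Treating this order as
# label ids 0-9 is an assumption; it agrees with the labels in Table 1.
CATEGORIES = ["finance", "real estate", "stocks", "education", "technology",
              "society", "politics", "sports", "games", "entertainment"]


def split_dataset(samples, n_train=80_000, n_val=10_000, seed=42):
    """Shuffle with a fixed (hypothetical) seed and split into train/val/test.

    Whatever remains after the training and validation slices is the test set.
    """
    if n_train + n_val > len(samples):
        raise ValueError("not enough samples for the requested split")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Placeholder data standing in for the 100 000 THUCNews titles.
samples = [(f"title {i}", i % len(CATEGORIES)) for i in range(100_000)]
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))  # 80000 10000 10000
print(CATEGORIES[7])                    # sports
```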
The experimental configuration is shown in Table 2.
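Precision, recall and F1 are used to evaluate model performance. Precision is the proportion of samples predicted as positive that are truly positive. Recall is the proportion of truly positive samples that are correctly classified. F1, the harmonic mean of precision and recall, combines both into a more comprehensive measure of performance.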
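TP (true positives) is the number of samples predicted positive that are actually positive, and TN (true negatives) is the number predicted negative that are actually negative. Conversely, FP (false positives) is the number predicted positive that are actually negative, and FN (false negatives) is the number predicted negative that are actually positive.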
Precision and recall are computed as
$$\mathrm{precision}=\frac{TP}{TP+FP}\times 100\% \tag{11}$$
$$\mathrm{recall}=\frac{TP}{TP+FN}\times 100\% \tag{12}$$
F1 is their harmonic mean:
$$F_1=\frac{2\,\mathrm{precision}\cdot\mathrm{recall}}{\mathrm{precision}+\mathrm{recall}}\times 100\% \tag{13}$$
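Eqs. (11) to (13) translate directly into code. With 10 news categories, per-class counts must also be aggregated, and the source does not state how. The `macro_prf1` helper below assumes an unweighted (macro) average of one-vs-rest scores. Both function names are hypothetical.

```python
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Eqs. (11)-(13) for one class, returned in percent."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision * 100, recall * 100, f1 * 100


def macro_prf1(y_true, y_pred, classes):
    """Assumed aggregation: one-vs-rest counts per class, then a plain mean."""
    per_class = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        per_class.append(prf1(tp, fp, fn))
    return tuple(sum(s[k] for s in per_class) / len(per_class) for k in range(3))


# TP=80, FP=20, FN=10: precision 80.0, recall ~88.89, F1 ~84.21 (= 2TP / (2TP + FP + FN)).
print(prf1(tp=80, fp=20, fn=10))
```

The harmonic mean sits closer to the smaller of the two values, so F1 penalizes a model that trades recall away for precision (or the reverse) more than an arithmetic mean would.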
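The model uses the cross-entropy loss function. Cross-entropy measures the difference between the true probability distribution and the predicted probability distribution, and a smaller value indicates better classification. It is defined as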
$$L=-\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M}y_{ic}\log p_{ic} \tag{14}$$
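where $L$ is the loss value; $N$ is the number of samples; $M$ is the number of news categories; $y_{ic}$ is an indicator function (0 or 1) that equals 1 if the true class of sample $i$ is $c$ and 0 otherwise; and $p_{ic}$ is the predicted probability that sample $i$ belongs to class $c$.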
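Eq. (14) can be checked numerically with the short function below. It uses the natural logarithm, as deep-learning frameworks do. A different base only rescales the loss by a constant. The `eps` clamp is an added safeguard against $\log 0$ and is not part of the source formula.

```python
import math


def cross_entropy(y_onehot, probs, eps=1e-12):
    """Eq. (14): mean over N samples of -sum_c y_ic * log p_ic."""
    n = len(y_onehot)
    total = 0.0
    for y_i, p_i in zip(y_onehot, probs):
        # eps keeps the logarithm finite if a predicted probability is exactly 0.
        total += sum(y * math.log(max(p, eps)) for y, p in zip(y_i, p_i))
    return -total / n
```

For two samples with one-hot labels `[[1, 0], [0, 1]]` and predictions `[[0.5, 0.5], [0.25, 0.75]]`, only the true-class terms survive, giving $-(\ln 0.5 + \ln 0.75)/2 \approx 0.490$. A perfectly confident correct prediction contributes 0.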
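The batch size is set to 64 and the number of training epochs to 10. The TextCNN kernel sizes are 3, 4, and 5. The hidden size of BiGRU and RCNN is 256, and that of BERT and ERNIE is 768. The optimizer is Adam with a learning rate of 0.001. The Gumbel_Softmax temperature τ is initialized to 1 and adjusted with a decay rate of 0.01. Training stops early if performance has not improved for more than 1 000 batches.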
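The temperature schedule and the early-stopping rule can be sketched as below. The source gives only "initial value 1, decay rate 0.01". The exponential form, the floor `tau_min = 0.1`, and whether `step` counts batches or epochs are all assumptions. The early-stopping monitor assumes validation loss is the tracked metric. `temperature` and `EarlyStopper` are hypothetical names.

```python
import math


def temperature(step: int, tau0: float = 1.0, decay: float = 0.01, tau_min: float = 0.1) -> float:
    """Hypothetical annealing: tau = max(tau_min, tau0 * exp(-decay * step))."""
    return max(tau_min, tau0 * math.exp(-decay * step))


class EarlyStopper:
    """Stop once validation loss has not improved for more than `patience` batches."""

    def __init__(self, patience: int = 1000):
        self.patience = patience
        self.best = float("inf")
        self.last_improved = 0  # batch index of the most recent improvement

    def should_stop(self, batch_idx: int, val_loss: float) -> bool:
        if val_loss < self.best:
            self.best = val_loss
            self.last_improved = batch_idx
        return batch_idx - self.last_improved > self.patience
```

Lowering τ over training moves the gate from smooth mixing early on toward near one-hot expert selection later, while the floor keeps the softmax from becoming numerically degenerate.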
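In the experiments, RCNN, CNN, and BiGRU all use the same Word2Vec model for word embeddings, and the RNN and RCNN networks are both built on a bidirectional gated recurrent unit network (BiGRU). These models are compared experimentally, their strengths and weaknesses are analyzed, and the proposed method is built on that analysis. The overall comparison results are shown in Table 3.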
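Table 3 shows that, with the same word-vector representation, RCNN reaches an F1 of 82.95% on headline classification, 1.68% and 3.56% higher than TextCNN and BiGRU, respectively. Between BERT and ERNIE, ERNIE improves precision by 0.44%, indicating that ERNIE yields more accurate and complete semantic word representations for headline classification. Finally, compared with ERNIE-CNN, ERNIE-BiGRU, and ERNIE-RCNN, the proposed model improves F1 by 1.94%, 1.7%, and 0.51%, respectively, which shows that it captures context more fully and thereby improves classification accuracy.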
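To examine how the number of experts affects the mixture-of-experts network, five experiments were run with 4, 6, 8, 10, and 12 experts, using ERNIE-base. The results are shown in Table 4.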
As Table 4 shows, 10 experts give the best results while keeping the parameter count relatively small.
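Gumbel_Softmax samples more effectively and controls the degree of smoothing through an adaptively adjusted temperature coefficient. To verify its effectiveness, it was compared with Softmax, with the number of experts fixed at 10; the results are shown in Table 5. Replacing the gating function improves precision by 1.09%, giving better classification.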
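Analysis of the mispredicted samples shows that many have relatively complex contexts or could reasonably be labeled with more than one category. Some misclassified samples are listed in Table 6.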
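An ablation study was conducted to verify that the key modules are well designed and effective, and that the model keeps high accuracy with fewer parameters and less computation. The results are shown in Table 7.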
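Compared with the original ERNIE-RCNN, the proposed model improves precision by 0.47% while reducing the number of parameters to about one third, so it substantially cuts parameters while maintaining high classification performance. The improved Att-RCNN lets input elements attend to their context more effectively and achieves better recognition performance.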
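As new-media platforms keep evolving, news spreads ever faster, which can also create public-opinion risks. Efficient news text classification has therefore become especially important for quickly curbing the spread of harmful opinion. However, news headlines have sparse features and are hard to process, and large models are constrained by latency requirements, among other challenges. We therefore propose a mixture-of-experts news headline classification model that combines ERNIE with an improved RCNN. Experimental results show that the model classifies short texts efficiently while keeping parameters and computation low. Three design choices account for this: embeddings from the ERNIE pretrained language model preserve rich semantic information at the initial feature-extraction stage; the sparsely gated mixture-of-experts strategy substantially reduces parameters and computation; and the element-wise attention gate tightly couples characters and words with their context.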
References
[1]
Zhu Linyun, Fan Jing, Qu Jinshuai, et al. Fine-grained sentiment classification based on BERT and multi-channel convolutional neural networks[J]. Science Technology and Engineering, 2023, 23(33): 14264-14270.
[2]
Yu Xuehao, Zhao Ziyan, Ma Yinglong, et al. Multi-label text classification of power information communication customer service system based on BR and GBDT[J]. Automation of Electric Power Systems, 2021, 45(11): 144-151.
[3]
Meng Xiangfu, Huo Hongjin, Zhang Xiaoyan, et al. Research review on personalized news recommendation methods[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(12): 2840-2860.
[4]
Hua Wei, Wu Siyang, Yu Chao, et al. Multi-level emotion divergence analysis method for network public opinion events[J]. Data Analysis and Knowledge Discovery, 2023, 7(4): 16-31.
[5]
Wang Jun, Feng Suncheng, Cheng Yong. A review of lightweight neural network structures for deep learning[J]. Computer Engineering, 2021, 47(8): 1-13.
[6]
Kim Y. Convolutional neural networks for sentence classification[C]// Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha: Association for Computational Linguistics, 2014: 1746-1751.
[7]
Liu P, Qiu X, Huang X. Recurrent neural network for text classification with multi-task learning[J]. arXiv preprint arXiv: 1605.05101, 2016.
[8]
Lai S, Xu L, Liu K, et al. Recurrent convolutional neural networks for text classification[C]// Proceedings of the AAAI Conference on Artificial Intelligence. AAAI Press, 2015. DOI: 10.1609/aaai.v29i1.9513.
[9]
Joulin A, Grave E, Bojanowski P, et al. Bag of tricks for efficient text classification[J]. arXiv preprint arXiv: 1607.01759, 2016.
[10]
Bojanowski P, Grave E, Joulin A, et al. Enriching word vectors with subword information[J]. Transactions of the Association for Computational Linguistics, 2017, 5: 135-146.
[11]
Johnson R, Zhang T. Deep pyramid convolutional neural networks for text categorization[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vancouver: Association for Computational Linguistics, 2017: 562-570.
[12]
Yu Tongrui, Jin Ran, Han Xiaozhen, et al. A review of research on pre-training models for natural language processing[J]. Computer Engineering and Applications, 2020, 56(23): 12-22.
[13]
Devlin J, Chang M W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv: 1810.04805, 2018.
[14]
Sun Y, Wang S, Li Y, et al. ERNIE: enhanced representation through knowledge integration[J]. arXiv preprint arXiv: 1904.09223, 2019.
[15]
Yang Xiuzhang, Guo Mingzhen, Hou Hongtao, et al. Improved BiLSTM-CNN+Attention sentiment classification algorithm integrating a sentiment dictionary[J]. Science Technology and Engineering, 2022, 22(20): 8761-8770.
[16]
Zhai Xueming, Wei Wei. Text sentiment analysis by combining hybrid neural networks and conditional random fields[J]. CAAI Transactions on Intelligent Systems, 2021, 16(2): 202-209.
[17]
Lu Xiaolei, Ni Bin. Research on BERT-CNN multi-level patent classification based on pre-trained language model[J]. Journal of Chinese Information Processing, 2021, 35(11): 70-79.
[18]
Jacobs R A, Jordan M I, Nowlan S J, et al. Adaptive mixtures of local experts[J]. Neural Computation, 1991, 3(1): 79-87.
[19]
Shazeer N, Mirhoseini A, Maziarz K, et al. Outrageously large neural networks: the sparsely-gated mixture-of-experts layer[J]. arXiv preprint arXiv: 1701.06538, 2017.
[20]
Lepikhin D, Lee H J, Xu Y, et al. GShard: scaling giant models with conditional computation and automatic sharding[J]. arXiv preprint arXiv: 2006.16668, 2020.
[21]
Fedus W, Zoph B, Shazeer N. Switch transformers: scaling to trillion parameter models with simple and efficient sparsity[J]. The Journal of Machine Learning Research, 2022, 23(1): 5232-5270.
[22]
Du N, Huang Y, Dai A M, et al. GLaM: efficient scaling of language models with mixture-of-experts[J]. arXiv preprint arXiv: 2112.06905, 2021.
[23]
Brown T, Mann B, Ryder N, et al. Language models are few-shot learners[J]. Advances in Neural Information Processing Systems, 2020, 33: 1877-1901.
[24]
Xue F, Shi Z, Wei F, et al. Go wider instead of deeper[J]. arXiv preprint arXiv: 2107.11817, 2021.
[25]
Zuo S, Zhang Q, Liang C, et al. MoEBERT: from BERT to mixture-of-experts via importance-guided adaptation[J]. arXiv preprint arXiv: 2204.07675, 2022.
[26]
Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv: 1503.02531, 2015.
[27]
Jang E, Gu S, Poole B. Categorical reparameterization with Gumbel-Softmax[J]. arXiv preprint arXiv: 1611.01144, 2016.
[28]
Zhang P, Xue J, Lan C, et al. EleAtt-RNN: adding attentiveness to neurons in recurrent neural networks[J]. IEEE Transactions on Image Processing, 2019, 29: 1061-1073.
2025, Vol. 25, No. 2
Article information
  • DOI: 10.12404/j.issn.1671-1815.2307782
  • Received: 2023-10-07
  • First published online: 2025-12-05
  • Published: 2025-01-18
Publication history
  • Revised: 2024-10-17
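Author information
    1 School of Control and Computer Engineering, North China Electric Power University, Baoding 071003
    2 Hebei Key Laboratory of Knowledge Computing for Energy & Power, Baoding 071003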

Corresponding author:

* Zhang Zhixin (b. 1998), male, Han, from Xinxiang, Henan; master's student; research interest: natural language processing. E-mail: