New Word Discovery Method in the Energy Field Combining N-Gram and Multiple Attention Mechanism
Yi-han WANG1, Si-jia ZHANG1, 2, 3, *, Heng CAO4, Jia-ning LIU1, Zheng-long ZHANG1
Science Technology and Engineering | 2025, 25(18) : 7668 - 7677
Papers·Automation and Computational Technology
Affiliations
  • 1 College of Information Engineering, Dalian Ocean University/Liaoning Key Laboratory of Marine Information Technology, Dalian 116023, China
  • 2 Key Laboratory of Environment Controlled Aquaculture (Dalian Ocean University), Ministry of Education, Dalian 116023, China
  • 3 Dalian Key Laboratory of Smart Fisheries, Dalian 116023, China
  • 4 Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
Published: 2025-06-28 doi: 10.12404/j.issn.1671-1815.2404063

With the rapid development of the energy industry and continued technological innovation, professional terms and expressions are constantly updated and new words keep emerging. Traditional new word discovery methods, however, often rely on dictionaries or rules and struggle to process and update large numbers of specialized terms efficiently, especially in the fast-changing energy field. Therefore, based on the characteristics of text data in the energy field, a new word discovery method named ENFM (energy field combining N-Gram and multiple attention mechanism) is proposed. First, an N-Gram model is used to process energy-field text, and a candidate list of new words is generated through statistical analysis of word frequency. Then, an ERNIE-BiLSTM-CRF model integrating multiple attention mechanisms is introduced to further improve the accuracy and efficiency of new word discovery. Compared with traditional new word discovery techniques, the method significantly improves both identification accuracy and overall efficiency. On a dataset of energy-field policy texts, the precision, recall and F1 value for new word discovery are 95.71%, 95.56% and 95.63%, respectively. The experimental results show that the method can accurately identify new words, including field-specific words and expressions, in large volumes of energy-field text, and that it significantly improves the recognition of professional energy terms in Chinese word segmentation tasks.
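The candidate-generation step can be illustrated with a minimal sketch. The abstract does not give the N-Gram lengths, frequency threshold, or filtering rules actually used, so the function name `ngram_candidates`, the parameters `n_values` and `min_freq`, the known-word filter, and the toy corpus below are all assumptions made for illustration, not the authors' implementation. The sketch counts character N-Grams in pre-split sentences and keeps frequent strings that are not already in a known-word dictionary.

```python
from collections import Counter


def ngram_candidates(sentences, known_words, n_values=(2, 3, 4), min_freq=2):
    """Return (ngram, count) pairs that are frequent and not in known_words.

    Hypothetical helper: the lengths in n_values and the min_freq threshold
    are illustrative assumptions, not values taken from the paper.
    """
    counts = Counter()
    for sentence in sentences:
        for n in n_values:
            # Slide a window of n characters across the sentence.
            for i in range(len(sentence) - n + 1):
                counts[sentence[i:i + n]] += 1

    candidates = [
        (gram, count)
        for gram, count in counts.items()
        if count >= min_freq and gram not in known_words
    ]
    # Most frequent first; ties broken by the string itself for stable output.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates


# Toy energy-policy sentences (invented for this example).
corpus = ["推进氢能产业发展", "氢能产业规划", "支持氢能技术创新"]
known = {"产业", "发展", "技术", "创新", "规划", "支持", "推进"}

candidates = ngram_candidates(corpus, known)
for gram, count in candidates:
    print(gram, count)
```

A frequency filter of this kind also lets through fragments that are not words, such as partial overlaps of longer terms. In the pipeline the abstract describes, the candidate list is passed on to the ERNIE-BiLSTM-CRF model with multiple attention mechanisms, which is not shown in this sketch.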

Keywords: energy field / new word discovery / pre-trained model / N-Gram / Chinese word segmentation
Yi-han WANG, Si-jia ZHANG, Heng CAO, Jia-ning LIU, Zheng-long ZHANG. New Word Discovery Method in the Energy Field Combining N-Gram and Multiple Attention Mechanism[J]. Science Technology and Engineering, 2025, 25(18): 7668-7677. DOI: 10.12404/j.issn.1671-1815.2404063
Article Info
  • Received: 2024-05-31
  • Revised: 2025-03-05
  • Online: 2025-12-17
  • Published: 2025-06-28
https://castjournals.cast.org.cn/joweb/kxjsygc/EN/10.12404/j.issn.1671-1815.2404063