With the rapid development of the energy industry and continued technological innovation, specialized terms and expressions are constantly updated and new words keep emerging. Traditional new-word discovery methods, however, often rely on dictionaries or rules and struggle to process and update large numbers of specialized terms efficiently, especially in the fast-changing energy field. To address this, and drawing on the characteristics of energy-domain text, ENFM is proposed: a new-word discovery method for the energy field that combines N-Gram with multiple attention mechanisms. First, an N-Gram model processes the energy-domain text, and a candidate list of new words is generated from word-frequency statistics. Next, an ERNIE-BiLSTM-CRF model that integrates multiple attention mechanisms is applied to further improve the accuracy and efficiency of new-word discovery. Compared with traditional new-word discovery techniques, both identification accuracy and overall efficiency improve significantly. On a dataset of energy-policy texts, the method achieves a precision of 95.71%, a recall of 95.56%, and an F1 score of 95.63% for new-word discovery. The experimental results show that the method accurately identifies new words in large volumes of energy-domain text, captures domain-specific words and expressions, and significantly improves the recognition of specialized energy terms in Chinese word segmentation tasks.
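The first stage described above, generating new-word candidates from N-Gram frequency statistics, can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the function name `ngram_candidates`, the character-level n-gram range, the frequency threshold `min_freq`, and the `known_words` lexicon are all assumptions introduced here, and the ERNIE-BiLSTM-CRF stage that would filter these candidates is not shown.

```python
import re
from collections import Counter


def ngram_candidates(sentences, known_words, n_min=2, n_max=4, min_freq=2):
    """Return (gram, frequency) pairs for character n-grams that occur at
    least min_freq times and are absent from known_words.

    All parameters are illustrative assumptions, not values from the paper.
    """
    counts = Counter()
    for sentence in sentences:
        # Split on anything that is not a CJK character or ASCII
        # alphanumeric, so that n-grams never span punctuation.
        for segment in re.split(r"[^\u4e00-\u9fffA-Za-z0-9]+", sentence):
            for n in range(n_min, n_max + 1):
                for i in range(len(segment) - n + 1):
                    counts[segment[i:i + n]] += 1

    candidates = [
        (gram, freq)
        for gram, freq in counts.items()
        if freq >= min_freq and gram not in known_words
    ]
    # Most frequent first; ties go to the longer gram, then to
    # lexicographic order so the result is deterministic.
    candidates.sort(key=lambda item: (-item[1], -len(item[0]), item[0]))
    return candidates


# Toy corpus: three sentences that all mention "虚拟电厂" (virtual power plant).
sample = ["推进虚拟电厂建设", "虚拟电厂参与市场", "支持虚拟电厂发展"]
known = {"虚拟", "电厂"}  # hypothetical existing lexicon

for gram, freq in ngram_candidates(sample, known, min_freq=3):
    print(gram, freq)
```

Frequency alone also surfaces fragments such as partial substrings of a longer term, which is one reason a second, model-based stage like the one described in the abstract would be needed to filter the candidate list.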
| Family | Number of genera | Number of species | Percentage of total species (%) | Genus | Number of species | Percentage of total species (%) |
|---|---|---|---|---|---|---|
| Amanitaceae | 2 | 11 | 5.26 | Amanita | 10 | 4.78 |
| Mycenaceae | 2 | 12 | 5.74 | Inocybe | 5 | 2.39 |
| Polyporaceae | 8 | 14 | 6.70 | Laccaria | 5 | 2.39 |
| Russulaceae | 3 | 23 | 11.00 | Marasmius | 6 | 2.87 |
| | | | | Mycena | 11 | 5.26 |
| | | | | Pluteus | 5 | 2.39 |
| | | | | Russula | 17 | 8.13 |
| | | | | Trametes | 5 | 2.39 |