Existing sequence labeling approaches have difficulty recognizing nested entities in Chinese electronic health records (EHRs). To address this limitation, a named entity recognition model that integrates MacBERT with a global pointer network was proposed. First, the pre-trained MacBERT-large model converted the text into context-sensitive dynamic vectors. Next, the fast gradient method (FGM) was used to generate adversarial samples, which were incorporated into the original vectors and fed into a bidirectional long short-term memory (BiLSTM) network to capture contextual features. An attention mechanism was then introduced to better capture long-distance semantic features. Finally, a global pointer network decoded entities by jointly considering head and tail feature information, improving prediction performance on nested medical entities. Experimental results show that, compared with the state-of-the-art global pointer model, the proposed model improves the F1-score by 1.8%, 1.37%, and 1.72% on the CCKS2019 dataset and two versions of the CMeEE Chinese EHR dataset, respectively, which validates the effectiveness of the proposed approach.
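The two steps that carry most of the mechanism in this pipeline, the FGM perturbation and global pointer span decoding, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the loss gradient is assumed to be supplied by a training framework, the per-token head and tail vectors are given directly rather than produced by MacBERT, the BiLSTM and attention layers, the rotary position encoding and score scaling that full global pointer implementations typically apply are omitted, a decision threshold of 0 is assumed, and the function names `fgm_perturb`, `span_scores` and `decode_spans` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def fgm_perturb(embedding: Vector, grad: Vector, epsilon: float = 1.0) -> Vector:
    """FGM step: shift the embedding along the L2-normalised loss gradient,
    r_adv = epsilon * g / ||g||_2 (gradient assumed to come from autograd)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # No gradient signal, so no adversarial direction: return a copy.
        return list(embedding)
    return [x + epsilon * g / norm for x, g in zip(embedding, grad)]


def span_scores(heads: List[Vector], tails: List[Vector]) -> List[List[float]]:
    """Simplified global-pointer scores for one entity type:
    s(i, j) = q_i . k_j, where q_i is the head vector of token i and k_j the
    tail vector of token j. Spans with start > end are invalid (-inf)."""
    n = len(heads)
    scores = [[-math.inf] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            scores[i][j] = sum(a * b for a, b in zip(heads[i], tails[j]))
    return scores


def decode_spans(scores: List[List[float]], threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Keep every (start, end) span whose score exceeds the threshold.
    Each span is judged independently, so nested spans can all be kept."""
    n = len(scores)
    return [(i, j) for i in range(n) for j in range(i, n) if scores[i][j] > threshold]


# Toy 3-token example with hypothetical 2-d head/tail vectors.
heads = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
tails = [[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]]
print(decode_spans(span_scores(heads, tails)))  # [(0, 1), (0, 2), (1, 2)]
```

In the toy example the outer span (0, 2) is returned together with the inner spans (0, 1) and (1, 2). Because a global pointer scores every start/end pair rather than assigning one tag per token, overlapping and nested entities can be decoded side by side, which a single-label-per-token sequence tagger cannot represent.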
| 科 Family | 属数 Number of genera | 种数 Number of species | 占总种数比例 Percentage of total species (%) | 属 Genus | 种数 Number of species | 占总种数比例 Percentage of total species (%) |
|---|---|---|---|---|---|---|
| 鹅膏菌科 Amanitaceae | 2 | 11 | 5.26 | 鹅膏菌属 Amanita | 10 | 4.78 |
| 小菇科 Mycenaceae | 2 | 12 | 5.74 | 丝盖伞属 Inocybe | 5 | 2.39 |
| 多孔菌科 Polyporaceae | 8 | 14 | 6.70 | 蜡蘑属 Laccaria | 5 | 2.39 |
| 红菇科 Russulaceae | 3 | 23 | 11.00 | 小皮伞属 Marasmius | 6 | 2.87 |
| | | | | 小菇属 Mycena | 11 | 5.26 |
| | | | | 光柄菇属 Pluteus | 5 | 2.39 |
| | | | | 红菇属 Russula | 17 | 8.13 |
| | | | | 栓菌属 Trametes | 5 | 2.39 |