OBJECTIVE To establish an origin classification model for Angelica dahurica with unbalanced sample sizes, based on near-infrared spectroscopy combined with a data-augmented convolutional neural network (CNN) algorithm. METHODS A total of 95 samples of Angelica dahurica were collected, and near-infrared spectra were acquired for each sample over the wavenumber range of 12 500 to 4 000 cm⁻¹. The resulting dataset is small and its samples are unevenly distributed across origins. To improve the generalizability of the model, three data augmentation algorithms were proposed: spectral shifting, spectral noise addition, and spectral combination. To address the sample imbalance, Focal Loss was used as the loss function for training the CNN model. RESULTS The three data augmentation algorithms were first applied to an SVM model. Adding Gaussian noise with a signal-to-noise ratio of 20 to the spectral data had the best effect, raising the accuracy of the model to 84.2%. With Focal Loss as the loss function, the CNN model achieved an accuracy of 94.7%. CONCLUSION Near-infrared spectroscopy combined with a data-augmented CNN algorithm provides a rapid, non-destructive detection method and a reliable data analysis method for the origin traceability of Radix Angelicae Dahuricae, and offers a methodological reference for the origin traceability of other Chinese medicinal materials.
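The augmentation steps and the loss function named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the three augmentations precisely, so the function names (`add_gaussian_noise`, `shift_spectrum`, `combine_spectra`, `focal_loss`), the reading of "signal-to-noise ratio of 20" as 20 dB, the constant-offset form of shifting, the weighted-average form of combination, and the defaults `gamma=2.0` and `alpha=1.0` are all assumptions made for illustration. The focal loss follows the commonly used form, -alpha * (1 - p_t)^gamma * log(p_t).

```python
import math
import random


def add_gaussian_noise(spectrum, snr_db=20.0, rng=None):
    """Return a noisy copy of `spectrum` at the requested signal-to-noise ratio.

    Assumption: the SNR is given in decibels relative to the mean signal power.
    """
    if rng is None:
        rng = random.Random()
    signal_power = sum(x * x for x in spectrum) / len(spectrum)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in spectrum]


def shift_spectrum(spectrum, offset):
    """Shift every intensity by a constant offset (assumed form of spectral shifting)."""
    return [x + offset for x in spectrum]


def combine_spectra(first, second, weight=0.5):
    """Blend two spectra of the same origin by a weighted average (assumed form)."""
    if len(first) != len(second):
        raise ValueError("spectra must have the same number of points")
    return [weight * a + (1.0 - weight) * b for a, b in zip(first, second)]


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample, given predicted class probabilities.

    Confidently classified samples (p_t close to 1) are down-weighted by the
    factor (1 - p_t) ** gamma, so harder samples dominate the loss.
    """
    p_t = min(max(probs[target], 1e-12), 1.0)  # clamp to avoid log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    rng = random.Random(0)
    spectrum = [0.8, 0.9, 1.1, 1.0, 0.7]  # made-up intensities, not real data
    print(add_gaussian_noise(spectrum, snr_db=20.0, rng=rng))
    print(shift_spectrum(spectrum, 0.05))
    print(combine_spectra(spectrum, shift_spectrum(spectrum, 0.1)))
    print(focal_loss([0.7, 0.2, 0.1], target=0))
```

With `gamma=0` and `alpha=1` the focal loss reduces to ordinary cross-entropy, which makes the down-weighting of easy samples straightforward to compare against a standard baseline.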
| Family | Number of genera | Number of species | Percentage of total species (%) | Genus | Number of species | Percentage of total species (%) |
|---|---|---|---|---|---|---|
| Amanitaceae | 2 | 11 | 5.26 | Amanita | 10 | 4.78 |
| Mycenaceae | 2 | 12 | 5.74 | Inocybe | 5 | 2.39 |
| Polyporaceae | 8 | 14 | 6.70 | Laccaria | 5 | 2.39 |
| Russulaceae | 3 | 23 | 11.00 | Marasmius | 6 | 2.87 |
| | | | | Mycena | 11 | 5.26 |
| | | | | Pluteus | 5 | 2.39 |
| | | | | Russula | 17 | 8.13 |
| | | | | Trametes | 5 | 2.39 |