OBJECTIVE To investigate how different data preprocessing algorithms and their combinations, different band selection algorithms, and different classification methods affect model performance when near-infrared spectroscopy is used to classify the genuineness (geo-authenticity) of Scutellaria baicalensis Georgi in Hebei Province. METHODS A total of 138 samples of Scutellaria baicalensis Georgi were collected, and their spectra were acquired over the 12 500 to 4 000 cm⁻¹ range. First, the performance of individual spectral preprocessing methods and of their combinations was compared. Second, competitive adaptive reweighted sampling (CARS), uninformative variable elimination (UVE), the successive projections algorithm (SPA) and principal component analysis (PCA) were compared for infrared spectral band selection and feature extraction. Finally, partial least squares discriminant analysis (PLS-DA), support vector machine (SVM), artificial neural network (ANN), random forest (RF), a conventional one-dimensional convolutional neural network (CNN) and a stacked autoencoder (SAE) were compared for building the attribute classification model for the traditional Chinese medicine. RESULTS The best preprocessing was the combination of mean centering (MC) and multiplicative scatter correction (MSC), with which the overall accuracy reached 92.9 %. The best band selection approach was PCA with 25 retained dimensions, which increased the overall accuracy by 10.7 %. The best classifier was a one-dimensional CNN model built after MSC preprocessing and PCA dimensionality reduction, which achieved 100 % accuracy on the binary geo-authenticity classification.
CONCLUSIONS This study of infrared-spectroscopy-based genuineness classification algorithms provides a fast, non-destructive detection method and a reliable data analysis method for classifying the genuineness of Scutellaria baicalensis, and offers a methodological reference for the traceability of Chinese medicinal materials.
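
The preprocessing and dimensionality-reduction steps named in the abstract (MSC, mean centering, PCA with 25 retained dimensions) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the spectra are synthetic, the function names `msc`, `mean_center` and `pca_scores` are hypothetical, the order "MSC, then mean centering" is an assumption, and the downstream one-dimensional CNN is not shown. Only the sample count (138) and the number of PCA dimensions (25) come from the abstract.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Each spectrum is regressed on a reference spectrum (the mean spectrum
    by default); the fitted offset is subtracted and the slope divided out.
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, deg=1)  # row ≈ intercept + slope * ref
        corrected[i] = (row - intercept) / slope
    return corrected, ref

def mean_center(X):
    """Subtract the column-wise (per-wavenumber) mean."""
    mean = X.mean(axis=0)
    return X - mean, mean

def pca_scores(X_centered, n_components):
    """Project centered data onto its leading principal components via SVD."""
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    components = vt[:n_components]
    return X_centered @ components.T, components

# Synthetic stand-in for the measured spectra (assumption: 400 spectral points).
rng = np.random.default_rng(0)
n_samples, n_points = 138, 400
base = np.sin(np.linspace(0.0, 6.0, n_points)) + 2.0
offset = rng.uniform(-0.2, 0.2, size=n_samples)   # additive scatter
scale = rng.uniform(0.8, 1.2, size=n_samples)     # multiplicative scatter
noise = 0.05 * rng.normal(size=(n_samples, n_points))
X = offset[:, None] + scale[:, None] * (base + noise)

X_msc, ref = msc(X)
X_mc, mu = mean_center(X_msc)
scores, components = pca_scores(X_mc, n_components=25)
print(scores.shape)
```

The resulting `scores` matrix (one 25-dimensional row per sample) is the kind of input a classifier such as PLS-DA, SVM or a 1-D CNN would then be trained on. On real data, the MSC reference and the PCA components should be estimated from the training set only and then applied to the test set.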

| Family | Number of genera | Number of species | Percentage of total species (%) | Genus | Number of species | Percentage of total species (%) |
|---|---|---|---|---|---|---|
| Amanitaceae | 2 | 11 | 5.26 | Amanita | 10 | 4.78 |
| Mycenaceae | 2 | 12 | 5.74 | Inocybe | 5 | 2.39 |
| Polyporaceae | 8 | 14 | 6.70 | Laccaria | 5 | 2.39 |
| Russulaceae | 3 | 23 | 11.00 | Marasmius | 6 | 2.87 |
| | | | | Mycena | 11 | 5.26 |
| | | | | Pluteus | 5 | 2.39 |
| | | | | Russula | 17 | 8.13 |
| | | | | Trametes | 5 | 2.39 |
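
The percentage columns in the table can be checked with simple arithmetic. The total number of species is not stated in this excerpt; the sketch below assumes a total of 209, a value inferred here only because it reproduces every listed percentage to two decimals. The names `TOTAL_SPECIES` and `percent_of_total` are illustrative.

```python
# Assumption: total species count inferred from the table, not stated in the text.
TOTAL_SPECIES = 209

# (number of species, percentage reported in the table)
reported = {
    "Amanitaceae": (11, 5.26),
    "Mycenaceae": (12, 5.74),
    "Polyporaceae": (14, 6.70),
    "Russulaceae": (23, 11.00),
    "Amanita": (10, 4.78),
    "Inocybe": (5, 2.39),
    "Laccaria": (5, 2.39),
    "Marasmius": (6, 2.87),
    "Mycena": (11, 5.26),
    "Pluteus": (5, 2.39),
    "Russula": (17, 8.13),
    "Trametes": (5, 2.39),
}

def percent_of_total(n_species, total=TOTAL_SPECIES):
    """Share of the total species count, in percent, rounded to two decimals."""
    return round(100.0 * n_species / total, 2)

# Collect any taxon whose recomputed percentage differs from the reported one.
mismatches = {
    name: (percent_of_total(n), pct)
    for name, (n, pct) in reported.items()
    if abs(percent_of_total(n) - pct) > 0.005
}
print(mismatches)
```

An empty `mismatches` dictionary means the reported percentages are internally consistent under the assumed total.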