To study the rapid identification of cow-bezoar and its medicinal substitutes using rapid evaporative ionization mass spectrometry (REIMS) coupled with machine learning.
The samples were ionized and analyzed by REIMS in sensitivity mode and negative ion mode, with a scanning range of m/z 50-1200 and a scan time of 0.2 s, using the dry burning method. REIMS data were recorded in continuous mode. The overall distribution of the REIMS data was then examined by cluster analysis and principal component analysis. Several models were established: partial least squares discriminant analysis (PLS-DA), logistic regression (LR), decision tree (DT), random forest (RF), and adaptive boosting (AdaBoost, with LR and DT as the respective base estimators). During model training, synthetic data generated by the GaussianCopula, CTGAN, CopulaGAN, and TVAE algorithms were added to the original training set to form a new training set.
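To make the boosting step concrete, the following is a minimal sketch of discrete AdaBoost with one-split decision trees (stumps) as the base estimator, trained on original rows joined with synthetic rows. It is an illustration only and not the authors' implementation: the function names, the two-feature toy data, the stand-in "synthetic" rows, and the binary labels in {-1, +1} are all assumptions made for the example.

```python
import math

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best one-split tree."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # Rule: predict `sign` when feature j <= t, otherwise -sign.
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if (sign if row[j] <= t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best

def stump_predict(stump, row):
    _, j, t, sign = stump
    return sign if row[j] <= t else -sign

def fit_adaboost(X, y, n_rounds=5):
    """Discrete AdaBoost for labels in {-1, +1}."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # clip to keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: two made-up spectral intensities per sample.
original_X = [[0.9, 0.2], [0.8, 0.3], [0.2, 0.7], [0.1, 0.9]]
original_y = [1, 1, -1, -1]
# Stand-ins for rows that a synthetic-data generator might produce.
synthetic_X = [[0.85, 0.25], [0.15, 0.8]]
synthetic_y = [1, -1]

# The new training set is the original rows joined with the synthetic rows.
train_X = original_X + synthetic_X
train_y = original_y + synthetic_y
model = fit_adaboost(train_X, train_y)
print([predict(model, row) for row in [[0.7, 0.4], [0.05, 0.95]]])
```

In practice a library implementation with multi-class support would likely be used; the sketch only shows how sample weights and estimator weights interact when the training set is enlarged.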
AdaBoost with DT as the base estimator, trained on the new training set, was the best model and accurately predicted cow-bezoar and its medicinal substitutes. On the test set, the accuracy was 0.97, the precision 0.90, the recall 0.97, the F1 score 0.93, and the area under the ROC curve (AUC) 1.00. The probabilities output by the model can also be used flexibly by adjusting the probability threshold to suit the actual drug regulation scenario.
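As an illustration of how a probability threshold trades precision against recall, the sketch below computes accuracy, precision, recall, and F1 from predicted probabilities at two thresholds. The function name `threshold_metrics`, the binary labels, and all numbers are hypothetical and are not taken from the study.

```python
def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Binary metrics after turning probabilities into labels at `threshold`."""
    tp = fp = fn = tn = 0
    for truth, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 1:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(y_true),
            "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels (1 = positive class) and model probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.95, 0.80, 0.40, 0.60, 0.30, 0.20, 0.10, 0.05]

default = threshold_metrics(y_true, y_prob, threshold=0.5)
lenient = threshold_metrics(y_true, y_prob, threshold=0.35)
print(default)
print(lenient)
```

Lowering the threshold in this toy example recovers the positive sample that was missed at 0.5, which is the kind of adjustment a regulator might prefer when missing a positive is more costly than a false alarm.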
Combining REIMS with machine learning enables fast and accurate identification of cow-bezoar and its medicinal substitutes.
| Family | Number of genera | Number of species | Percentage of total species (%) | Genus | Number of species | Percentage of total species (%) |
|---|---|---|---|---|---|---|
| Amanitaceae | 2 | 11 | 5.26 | Amanita | 10 | 4.78 |
| Mycenaceae | 2 | 12 | 5.74 | Inocybe | 5 | 2.39 |
| Polyporaceae | 8 | 14 | 6.70 | Laccaria | 5 | 2.39 |
| Russulaceae | 3 | 23 | 11.00 | Marasmius | 6 | 2.87 |
| | | | | Mycena | 11 | 5.26 |
| | | | | Pluteus | 5 | 2.39 |
| | | | | Russula | 17 | 8.13 |
| | | | | Trametes | 5 | 2.39 |