Traditional data stream clustering methods lack online dimensionality reduction for high-dimensional data, which limits their clustering performance. To address this issue, a method called Scalable Subspace Learning for Clustering Data Streams (S2LCStream) is proposed. First, the method establishes a projection relationship between historical data and new data through scalable subspace learning, projecting the new data into the subspace spanned by the historical data to obtain its clustering assignment in real time. Second, to keep clustering assignments accurate over time, the method checks the consistency of the data distribution on the continuously arriving stream, captures concept drift, and adjusts clustering assignments through a backtracking mechanism to adapt to dynamically changing data distributions. Finally, experiments on multiple real-world datasets show that S2LCStream handles high-dimensional data streams efficiently, maintaining high clustering accuracy under concept drift.
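The projection and drift-check steps described above can be illustrated with a minimal sketch. This is not the S2LCStream algorithm itself: it assumes a plain PCA subspace fitted on historical data, nearest-centroid assignment in that subspace, and a reconstruction residual as a stand-in for the consistency check. All function names, the synthetic data, and the noise levels are hypothetical, and the backtracking mechanism is not covered.

```python
import numpy as np

def fit_subspace(history, k):
    """Return the mean and top-k principal directions of the historical data."""
    mean = history.mean(axis=0)
    # Rows of vt are orthonormal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(history - mean, full_matrices=False)
    return mean, vt[:k]

def project(x, mean, basis):
    """Coordinates of x in the subspace spanned by the rows of basis."""
    return (x - mean) @ basis.T

def assign(x, mean, basis, centroids):
    """Index of the nearest centroid, with distances measured in the subspace."""
    z = project(x, mean, basis)
    return int(np.argmin(np.linalg.norm(centroids - z, axis=1)))

def residual(x, mean, basis):
    """Distance between x and its reconstruction from the subspace."""
    z = project(x, mean, basis)
    return float(np.linalg.norm((x - mean) - z @ basis))

# Synthetic history (assumption): two clusters in 20-D lying near a 2-D plane.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(20, 2)))        # orthonormal plane, shape (20, 2)
latent_centers = np.array([[-5.0, 0.0], [5.0, 0.0]])
labels = np.repeat([0, 1], 100)
latent = latent_centers[labels] + 0.3 * rng.normal(size=(200, 2))
history = latent @ Q.T + 0.01 * rng.normal(size=(200, 20))

mean, basis = fit_subspace(history, k=2)
Z = project(history, mean, basis)
centroids = np.array([Z[labels == c].mean(axis=0) for c in (0, 1)])

# A new point from cluster 1 is assigned by projection alone, without refitting.
new_point = latent_centers[1] @ Q.T
new_label = assign(new_point, mean, basis, centroids)

# A point pushed off the plane has a large residual, hinting at possible drift.
off = rng.normal(size=20)
off -= Q @ (Q.T @ off)                               # remove the in-plane component
off /= np.linalg.norm(off)
drifted = new_point + 3.0 * off
res_in = residual(new_point, mean, basis)
res_drift = residual(drifted, mean, basis)
```

In this sketch, a residual far above those seen on historical data would be the signal to refit the subspace and revisit earlier assignments. The actual threshold and update rule are design choices the abstract does not specify.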
| Family | Number of genera | Number of species | Percentage of total species (%) | Genus | Number of species | Percentage of total species (%) |
|---|---|---|---|---|---|---|
| Amanitaceae | 2 | 11 | 5.26 | Amanita | 10 | 4.78 |
| Mycenaceae | 2 | 12 | 5.74 | Inocybe | 5 | 2.39 |
| Polyporaceae | 8 | 14 | 6.70 | Laccaria | 5 | 2.39 |
| Russulaceae | 3 | 23 | 11.00 | Marasmius | 6 | 2.87 |
| | | | | Mycena | 11 | 5.26 |
| | | | | Pluteus | 5 | 2.39 |
| | | | | Russula | 17 | 8.13 |
| | | | | Trametes | 5 | 2.39 |