Current methods for integrating single-cell RNA sequencing (scRNA-seq) data with spatial transcriptomics (ST) data are typically designed for specific tasks, such as cell-type deconvolution or prediction of the spatial distribution of RNA transcripts. These methods usually offer only a partial analysis of ST data and neglect the complex relationship between the spatial expression patterns underlying cell-type specificity and intercellular cross-talk. Here, we present eMCI, an explainable multimodal correlation integration model based on a deep neural network framework. eMCI fuses scRNA-seq and ST data through different spot–cell correlations to integrate multiple analysis tasks for ST data at the cellular level. First, eMCI achieves better or comparable accuracy in cell-type classification and deconvolution, according to extensive evaluations and comparisons with state-of-the-art methods on both simulated and real ST datasets. Second, by employing an attribution algorithm to dissect the visual input, eMCI can identify key components across spatial domains that are responsible for different cell types and elucidate the spatial expression patterns underlying cell-type specificity and intercellular communication. In particular, we applied eMCI to 3 cross-species datasets (zebrafish melanomas, soybean nodule maturation, and human embryonic lung), where it accurately and efficiently estimated per-spot cell composition and inferred proximal and distal cellular interactions within the spatial and temporal context. In summary, eMCI serves as an integrative analytical framework that better resolves the spatial transcriptome based on existing single-cell datasets and elucidates proximal and distal intercellular signal transduction mechanisms over spatial domains without requiring prior biological reference. This approach is expected to facilitate the discovery of spatial expression patterns of potential biomolecules with cell-type and cell–cell communication specificity.
| 1. | Mutual information (MI), a commonly used correlation analysis tool based on information theory, captures linear and nonlinear dependence between two random variables. MI is zero when the variables are independent, and it increases as the dependency increases. For a cell $c_j$ and a spot $s_k$ ($j = 1, \cdots, n_1$ and $k = 1, \cdots, n_2$), the gene expression vectors are denoted as $U_j = (u_{j1}, \ldots, u_{jm})$ and $V_k = (v_{k1}, \ldots, v_{km})$, respectively, where $m$ represents the number of genes. The MI between $U_j$ and $V_k$ is defined as $\mathrm{MI}(U_j, V_k) = \sum_{u \in U_j} \sum_{v \in V_k} p(u, v) \log \frac{p(u, v)}{p(u)\,p(v)}$, where $p(u, v)$ is the joint probability mass function of $U_j$ and $V_k$, and $p(u)$ and $p(v)$ are the marginal probability mass functions of $U_j$ and $V_k$, respectively. Following another study [48], the above formulation can be expressed as $\mathrm{MI}(U_j, V_k) = \frac{1}{2} \log \frac{\lvert C(U_j) \rvert \, \lvert C(V_k) \rvert}{\lvert C(U_j, V_k) \rvert}$, where $C(U_j)$ and $C(V_k)$ are the covariance matrices of the vectors $U_j$ and $V_k$, $\lvert C \rvert$ is the determinant of matrix $C$, and $C(U_j, V_k)$ is the covariance matrix between the vectors $U_j$ and $V_k$. |
| 2. | The coefficient of determination, often denoted as $R^2$, is a statistical measure representing the proportion of the variation in the dependent variable that is predictable from the independent variable in a regression model. It quantifies the strength of the relationship between the independent and dependent variables. $R^2$ normally ranges from 0 to 1, where high values indicate a strong relationship. The $R^2$ between $U_j$ and $V_k$ is defined as $R^2(U_j, V_k) = 1 - \frac{\sum_{i=1}^{m} (u_{ji} - v_{ki})^2}{\sum_{i=1}^{m} (u_{ji} - \bar{U}_j)^2}$, where $\bar{U}_j$ is the mean of the gene expression vector $U_j$. |
| 3. | The Pearson correlation coefficient, denoted as PCC, focuses exclusively on the level of linear dependence between pairs of variables. It ranges from −1 (perfect negative linear relationship) to +1 (perfect positive linear relationship), with 0 denoting the absence of a linear relationship. PCC measures the similarity of trends and removes the dimensional difference between variables during the calculation. The PCC between $U_j$ and $V_k$ is defined as $\mathrm{PCC}(U_j, V_k) = \frac{\mathbb{E}\left[(U_j - \mu_{U_j})(V_k - \mu_{V_k})\right]}{\sigma_{U_j}\,\sigma_{V_k}}$, where $\mu_{U_j}$, $\mu_{V_k}$ and $\sigma_{U_j}$, $\sigma_{V_k}$ are the means and standard deviations of the expression vectors $U_j$ and $V_k$, respectively. |

The multimodal similarity correlation matrix, containing $n_1$ rows and $n_2$ columns, was therefore calculated using the above measures.
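The three cell–spot measures described above can be illustrated with a minimal NumPy sketch. It is not the eMCI implementation: the function names (`gaussian_mi`, `r_squared`, `pcc`, `correlation_matrix`) and the random toy data are hypothetical. The MI function assumes the covariance-based form, which treats the $m$ genes as joint-Gaussian samples of two scalar variables, and the $R^2$ function assumes that the spot vector plays the role of the prediction for the cell vector.

```python
import numpy as np


def gaussian_mi(u, v):
    """MI in covariance form: 0.5 * log(|C(U)| * |C(V)| / |C(U, V)|).

    Assumption: the m gene values are joint-Gaussian samples and neither
    vector is constant.
    """
    cov = np.cov(u, v)               # 2 x 2 joint covariance matrix C(U, V)
    det_joint = np.linalg.det(cov)
    if det_joint <= 0:               # perfectly dependent vectors: MI diverges
        return np.inf
    return 0.5 * np.log(cov[0, 0] * cov[1, 1] / det_joint)


def r_squared(u, v):
    """Coefficient of determination, with v treated as the prediction of u."""
    ss_res = np.sum((u - v) ** 2)          # residual sum of squares
    ss_tot = np.sum((u - u.mean()) ** 2)   # total sum of squares around mean(u)
    return 1.0 - ss_res / ss_tot


def pcc(u, v):
    """Pearson correlation coefficient between two expression vectors."""
    return np.corrcoef(u, v)[0, 1]


def correlation_matrix(cells, spots, measure):
    """Return the n1 x n2 matrix of measure(cell_j, spot_k).

    cells: array of shape (n1, m); spots: array of shape (n2, m).
    """
    n1, n2 = cells.shape[0], spots.shape[0]
    out = np.empty((n1, n2))
    for j in range(n1):
        for k in range(n2):
            out[j, k] = measure(cells[j], spots[k])
    return out


# Toy data (hypothetical): 4 cells and 3 spots measured over 50 shared genes.
rng = np.random.default_rng(0)
cells = rng.poisson(5.0, size=(4, 50)).astype(float)
spots = rng.poisson(5.0, size=(3, 50)).astype(float)

measures = {"MI": gaussian_mi, "R2": r_squared, "PCC": pcc}
matrices = {name: correlation_matrix(cells, spots, fn)
            for name, fn in measures.items()}
```

Each entry of `matrices` is one $n_1 \times n_2$ cell-by-spot matrix. Under this Gaussian form, MI equals $-\frac{1}{2}\log(1 - \mathrm{PCC}^2)$, so it carries the same information as the magnitude of PCC; a histogram-based or nearest-neighbour MI estimator would be needed to capture dependence that is not linear.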
| Family | Number of genera | Number of species | Percentage of total species (%) | Genus | Number of species | Percentage of total species (%) |
|---|---|---|---|---|---|---|
| Amanitaceae | 2 | 11 | 5.26 | Amanita | 10 | 4.78 |
| Mycenaceae | 2 | 12 | 5.74 | Inocybe | 5 | 2.39 |
| Polyporaceae | 8 | 14 | 6.70 | Laccaria | 5 | 2.39 |
| Russulaceae | 3 | 23 | 11.00 | Marasmius | 6 | 2.87 |
| | | | | Mycena | 11 | 5.26 |
| | | | | Pluteus | 5 | 2.39 |
| | | | | Russula | 17 | 8.13 |
| | | | | Trametes | 5 | 2.39 |