Understanding and Execution of MIA Analysis
Recommended Post : 【Bioinformatics】 Bioinformatics Analysis Table of Contents
1. Background Theory
2. Code
**3. Result 1.** scRNA-seq data validation
**4. Result 2.** ST characterization
**5. Result 3.** MIA (Multimodal Intersection Analysis)
**6. Result 4.** Analysis of scRNA-seq Subpopulations using MIA
**7. Result 5.** Analysis of Cancer Population using MIA
**8. Result 6.** Analysis of Tumor Microenvironment using MIA
**9. Result 7.** Augmentation with TCGA (The Cancer Genome Atlas)
10. Limitations
1. Background Theory
⑴ scRNA-seq
① Advantages : Unbiasedness. High resolution.
② Disadvantages : Loss of spatial information. Lack of knowledge about cellular interaction and organization in the Tumor Microenvironment (TME).
⑵ ST (Spatial Transcriptomics)
① Advantages : Unbiasedness. Includes spatial information.
② Disadvantages : Low cellular resolution. Each spot contains roughly 10-200 cells rather than a single cell.
○ In other words, each spot contains a mixture of different cell types, leading to the loss of cell type information.
⑶ KNN (k-nearest neighbors smoothing) : Used to remove inherent noise from scRNA-seq data.
⑷ MIA (Multimodal Intersection Analysis)
① scRNA-seq and ST are performed in parallel.
② Integrates the two datasets.
2. Code
⑵ Log base 10 of n!
⑶ Using Fisher’s exact test to test statistical equivalence of two sets
⑷ MIA assay (enrichment)
⑸ MIA assay (depletion)
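A minimal Python sketch of ⑵ to ⑸, offered as an illustration rather than the original code. The function names and the argument convention (N background genes, two gene sets of size a and b, overlap k) are assumptions. log10(n!) is computed through the log-gamma function so that large binomial coefficients do not overflow. Enrichment is taken as the upper hypergeometric tail (a one-sided Fisher's exact test) and depletion as the lower tail.

```python
import math

def log10_factorial(n):
    """log10(n!) via the log-gamma function, since lgamma(n + 1) = ln(n!)."""
    return math.lgamma(n + 1) / math.log(10)

def log10_choose(n, k):
    """log10 of the binomial coefficient C(n, k), for 0 <= k <= n."""
    return log10_factorial(n) - log10_factorial(k) - log10_factorial(n - k)

def hypergeom_pmf(k, N, a, b):
    """P(overlap == k) for gene sets of size a and b drawn from N background genes."""
    if k < max(0, a + b - N) or k > min(a, b):
        return 0.0  # overlap outside the feasible range
    log_p = log10_choose(a, k) + log10_choose(N - a, b - k) - log10_choose(N, b)
    return 10 ** log_p

def enrichment_p(k, N, a, b):
    """Upper tail P(overlap >= k): small when the two sets overlap more than expected."""
    return min(1.0, sum(hypergeom_pmf(i, N, a, b) for i in range(k, min(a, b) + 1)))

def depletion_p(k, N, a, b):
    """Lower tail P(overlap <= k): small when the two sets overlap less than expected."""
    lowest = max(0, a + b - N)
    return min(1.0, sum(hypergeom_pmf(i, N, a, b) for i in range(lowest, k + 1)))
```

The MIA score would then be -log10 of the returned P value. For example, with N = 10, a = b = 5 and full overlap k = 5, `enrichment_p` gives 1/252.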
3. Result 1. scRNA-seq Data Validation
⑴ t-SNE projection (Fig. 1b)
① PDAC-A: 1,926 cells. PDAC-B: 1,733 cells.
② Recursive hierarchical clustering scheme used to identify cell types.
○ 1st. KNN smoothing.
○ 2nd. Freeman-Tukey transformation.
○ 3rd. Identification of most variable genes using Fano factor and mean expression.
○ 4th. Clustering using Ward’s criterion.
○ 5th. Naming clusters based on differentially expressed genes (DEGs).
○ 6th. Removal of clusters of low-quality cells by comparing UMI and mitochondrial content among PDAC-A, PDAC-B, and PDAC-C.
○ (Comment) A comparable clustering workflow can also be carried out with the Seurat package.
③ Significance: Understanding the location of cancer clusters.
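The 2nd and 3rd steps of the clustering scheme above can be sketched in Python as follows. This is an illustration under assumptions, not the paper's implementation: the Freeman-Tukey transform is taken as sqrt(x) + sqrt(x + 1), the Fano factor (variance / mean) is computed on raw counts, and genes are simply filtered by a minimum mean and ranked by Fano factor. The names `freeman_tukey`, `fano_factor`, `most_variable_genes` and the `min_mean` cutoff are hypothetical.

```python
import math
from statistics import mean, pvariance

def freeman_tukey(counts):
    """Variance-stabilizing transform for count data: sqrt(x) + sqrt(x + 1)."""
    return [math.sqrt(x) + math.sqrt(x + 1) for x in counts]

def fano_factor(values):
    """Variance-to-mean ratio; returns 0.0 for genes with zero mean."""
    m = mean(values)
    return pvariance(values) / m if m > 0 else 0.0

def most_variable_genes(expr, top_n, min_mean=0.0):
    """expr maps gene -> list of counts per cell.

    Keeps genes whose mean expression exceeds min_mean (assumed filter),
    then returns the top_n genes ranked by Fano factor.
    """
    kept = [g for g, v in expr.items() if mean(v) > min_mean]
    return sorted(kept, key=lambda g: fano_factor(expr[g]), reverse=True)[:top_n]
```

The selected genes would then feed the Ward clustering of the 4th step.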
⑵ Correspondence between PDAC-A and PDAC-B
① Demonstrating experimental reproducibility.
⑶ CNV (copy number variation) profile (Suppl’ Fig. 2)
① Genes are ordered by chromosomal location on the x-axis.
② Cells on the y-axis are grouped automatically into ductal cells, TM4SF1-positive cells, and S100A4-positive cells.
③ Discovery of various cell subpopulations.
○ Ductal cells.
○ Cancer (TM4SF1) in PDAC-A.
○ Cancer (S100A4) in PDAC-A.
○ Cancer (TM4SF1) in PDAC-B.
⑷ Expression profiles of KRT19, TM4SF1, S100A4 on t-SNE: Validation aspect of ⑶
① Similarity between PDAC-A and PDAC-B.
⑸ Double immunofluorescent imaging: Validation aspect of ⑶
① Fig. 1f
○ Mutually exclusive staining for TM4SF1 and S100A4.
○ In PDAC-B, KRT19 colocalizes with TM4SF1 but not with S100A4, consistent with ⑵.
② Suppl’ Fig. 2
○ Different patterns between malignant ducts and non-malignant ducts.
4. Result 2. ST Characterization
⑴ H&E Staining
① Each spot contains 20 ~ 74 nuclei (Suppl’ Fig. 4).
⑵ ST Mapping
① PDAC-A (Fig. 2e): 428 spots.
② PDAC-B (Fig. 2f): 224 spots.
③ (Comment) Cells may have been lost during scRNA-seq: even at 20 nuclei per spot, 428 spots × 20 = 8,560 nuclei ≫ 1,926 cells.
④ Spatial expression of variably expressed genes matches annotated histological regions (Fig. 2c, d).
⑶ PCA (Principal Component Analysis)
① Resulting clusters consistent with independent histological annotations (Fig. 2e, f, Suppl’ Fig. 5g, h).
5. Result 3. MIA (Multimodal Intersection Analysis)
⑴ MIA Procedure
① 1st. Investigate DEGs (P < 10^-5, two-tailed Student’s t-test) for each cell type derived from scRNA-seq.
② 2nd. Investigate DEGs (P < 0.01, two-tailed Student’s t-test) for each region derived from ST datasets.
③ 3rd. Construct a matrix with cell types as rows and regions as columns.
④ 4th. For each entry of the matrix, take the overlap between the DEG sets from ① and ② and evaluate its significance with the hypergeometric cumulative distribution.
⑤ 5th. Interpret significant entries as enrichment. For instance, fibroblasts are enriched in the cancer region.
⑥ Examples
○ Fig. 2g: Expected 14.44, actual result 15.06356.
○ Fig. 5b: Expected 2.92, actual result 2.366678.
○ (Comment) The discrepancy may come from not knowing exactly which background gene set was used.
⑦ (Comment) Given the sample size, a chi-square test of independence might be more appropriate.
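The procedure above can be sketched in Python as follows. This is a hedged illustration, not the paper's code: the background gene universe is passed in explicitly because its exact definition is not stated here, and `mia_matrix` and `hypergeom_upper_tail` are hypothetical names. Each entry is -log10 of the upper-tail hypergeometric P value for the overlap between a cell type's DEGs and a region's DEGs.

```python
from math import comb, log10

def hypergeom_upper_tail(k, N, a, b):
    """P(overlap >= k) for sets of size a and b drawn from N background genes."""
    total = comb(N, b)
    hits = sum(comb(a, i) * comb(N - a, b - i) for i in range(k, min(a, b) + 1))
    return hits / total

def mia_matrix(cell_type_genes, region_genes, background):
    """Return {(cell_type, region): -log10 P} for every cell type / region pair.

    cell_type_genes and region_genes map a name to a set of DEGs;
    background is the set of all genes considered (an assumption of this sketch).
    """
    N = len(background)
    scores = {}
    for cell_type, ct_degs in cell_type_genes.items():
        ct = ct_degs & background
        for region, rg_degs in region_genes.items():
            rg = rg_degs & background
            p = hypergeom_upper_tail(len(ct & rg), N, len(ct), len(rg))
            scores[(cell_type, region)] = -log10(p) if p > 0 else float("inf")
    return scores
```

Depletion would be scored the same way from the lower tail, P(overlap <= k). The choice of `background` changes N and therefore every score.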
⑵ Criterion 1. Fairness of MIA results for enrichments and depletions (Suppl’ Fig. 5i, j).
① (Comment) The less stringent P-value threshold used for ST (P < 0.01) might bias the analysis toward counting weakly related genes as enrichment.
⑶ Criterion 2. Sufficient count of specific genes.
① When the number of cancer region-specific genes is less than 100, fibroblast-specific genes are not significant for cancer (P > 0.05).
② With a sufficiently large number of cancer region-specific genes, the result could become independent of this parameter.
6. Result 4. Analysis of scRNA-seq Subpopulations using MIA
⑴ Division of KRT19-expressing ductal cells into four ductal subpopulations
① Subpopulations (Fig. 3a-d): Ductal cells divided into four subpopulations when analyzed separately using scRNA-seq.
○ APOL1 high/hypoxic: High expression of genes including APOL1, APOL2, APOL3, CA9, DUOX2, ERO1A, etc.
○ Newly discovered sub-population.
○ Centroacinar: High expression of genes including AQP3, CFTR, CRISP3, REG1A, REG1B, REG3A, etc.
○ Pre-existing sub-population from previous studies.
○ Antigen presenting: High expression of genes including C1S, C4A, C4B, CFB, CFH, CD74, HLA-DPA1, HLA-DQA2, HLA-DRA, HLA-DRB1, HLA-DRB5, etc.
○ Newly discovered sub-population.
○ Although B-cells, macrophages, and dendritic cells highly express MHC Class II, epithelial cells in the liver, gastrointestinal tracts, and respiratory tracts also express MHC Class II.
○ These antigen-presenting ductal cells regulate inflammatory response through T-cell activation.
○ Terminal duct: High expression of genes including KRT16, KRT18, TFF1, TFF2, TFF3, etc.
○ Pre-existing sub-population from previous studies.
○ Tissue staining (shown with H&E) confirms that these subpopulations express KRT19, i.e. they are KRT19-expressing ductal subpopulations (Fig. 3e-h).
○ Refer to Suppl’ Fig. 6 for images before merging.
② MIA analysis
○ Ductal subpopulation observed in both PDAC-A and PDAC-B, highly present in duct epithelium.
○ The APOL1 high/hypoxic subpopulation is uniquely enriched in the cancer region, likely owing to low oxygen content, and is scarce in pancreatic tissue.
⑵ Division of macrophages into M1-like and M2-like subpopulations
① Subpopulation (Suppl’ Fig. 7a): Identified using violin plots, suggesting difficulty in separating them using scRNA-seq.
○ Used M1 markers: IL1B, IL1RN, CLEC5A.
○ Used M2 markers: MS4A6A, SDS, CD163.
② MIA analysis (Suppl’ Fig. 7b)
○ M1-like macrophages distributed in stroma and cancer regions, reflecting inflammatory environment. M2-like macrophages distributed in ducts, reflecting tissue-resident macrophages.
⑶ Division of dendritic cells into A and B
① Subpopulation (Suppl’ Fig. 7c): Identified using violin plots, suggesting difficulty in separating them using scRNA-seq.
○ Used A markers: TUBB, TLR5, CLEC4A.
○ Used B markers: C1QA, C1QB, HLA-DRB1.
○ B characterized by abundant complement pathway genes and MHC Class II expression.
② MIA analysis
○ As with macrophages, the two sub-clusters show mutually exclusive spatial patterns, suggesting that each has a unique role.
7. Result 5. Analysis of Cancer Population using MIA
⑴ Problem Statement
① scRNA-seq clearly differentiates TM4SF1-cancer and S100A4-cancer cells in PDAC-A.
② However, MIA analysis suggests both cell types are present in the same region.
③ Are these two cell types colocalized or distributed differently?
⑵ Conducted additional ST analysis for PDAC-A-2 alongside PDAC-A (PDAC-A-1) to define sub-regions
⑶ MIA results (Fig. 4d): Cancer cluster 1 is highly present in the fibroblast-rich region (likely related to stromal response), whereas cancer cluster 2 is less present there.
8. Result 6. Analysis of Tumor Microenvironment using MIA
⑴ Problem Statement
① Recent scRNA-seq studies reveal diverse cancer cell states in cancer types such as glioblastoma, melanoma, and head and neck cancer.
② Interactions between cancer cells and their microenvironment might lead to diverse cancer cell states.
③ Can MIA analysis map these cancer cell states?
⑵ Heatmap (Fig. 5a)
⑶ Coenrichment of genes from stress-response module with cancer stress-response module
① 1st. Cancer stress-response module defined by Elyada et al.’s scRNA-seq-defined inflammatory fibroblast signature.
② 2nd. Genes from stress-response module
○ 2nd-1st. Identified cancer region spots in PDAC-A-1.
○ 2nd-2nd. Colored by cancer stress-response module.
○ 2nd-3rd. Distinguish highly expressing spots from lowly expressing spots.
○ 2nd-4th. Investigate DEGs distinguishing the two regions.
③ 3rd. MIA analysis
○ Strong coenrichment of stress-module spots with inflammatory fibroblasts.
○ Significant coenrichments of monocytes and T/natural killer cells (not as much as inflammatory fibroblasts).
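Steps 2nd-1st to 2nd-4th above can be sketched in Python as follows. Several choices here are assumptions made only for illustration: a spot's module score is the mean expression of the module genes, spots are split at the median score (the actual cutoff is not stated), and genes are ranked by Welch's t statistic as a stand-in for the DEG test. All function names and the data layout (spot -> gene -> expression) are hypothetical.

```python
from math import sqrt
from statistics import mean, median, variance

def module_score(expr, module_genes):
    """Mean expression of the module genes in one spot (assumed scoring rule)."""
    return mean(expr.get(g, 0.0) for g in module_genes)

def split_spots(spot_expr, module_genes):
    """Split spots into high and low groups at the median module score (assumed cutoff)."""
    scores = {s: module_score(e, module_genes) for s, e in spot_expr.items()}
    cutoff = median(scores.values())
    high = sorted(s for s, v in scores.items() if v > cutoff)
    low = sorted(s for s, v in scores.items() if v <= cutoff)
    return high, low

def welch_t(x, y):
    """Welch's t statistic; each group needs at least two values."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se if se > 0 else 0.0

def rank_genes(spot_expr, high, low, genes):
    """Order genes from most to least elevated in high-score spots."""
    t = {g: welch_t([spot_expr[s].get(g, 0.0) for s in high],
                    [spot_expr[s].get(g, 0.0) for s in low]) for g in genes}
    return sorted(genes, key=lambda g: t[g], reverse=True)
```

The top-ranked genes would then serve as the region gene set that MIA compares against each cell type's DEGs.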
⑷ Immunofluorescence
① Immunofluorescence using IL-6, highly expressed in inflammatory fibroblasts.
② DAPI, KRT19, IL-6, and H&E staining conducted.
③ Conclusion 1. Colocalization of IL-6 and KRT19: Matches the result of IL-6 being highly expressed in inflammatory fibroblasts.
④ Conclusion 2. H&E staining confirms epithelial cells in the tissue are indeed malignant cells.
⑤ Final Conclusion: Inflammatory fibroblasts and cancer cells expressing a stress-response gene module are related.
9. Result 7. Augmentation using TCGA (The Cancer Genome Atlas)
⑴ Correlation between expression of identified cancer gene modules and inflammatory fibroblast gene signature
① Conducted on 142 PDAC tumors from TCGA.
② Among hypoxia module, oxidative phosphorylation module, and stress-response module, only stress-response module showed significant Pearson correlation coefficient.
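The correlation in ② can be illustrated with a short Python sketch. It assumes each tumor has already been reduced to one score per module and one score for the inflammatory fibroblast signature (for example the mean expression of the signature genes, which is an assumption here). `pearson_r` is a plain implementation of the Pearson correlation coefficient, not the code used in the paper.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(vx * vy)
```

In use, `x` would hold the module score of each of the 142 tumors and `y` the inflammatory fibroblast score of the same tumors, with one call per module.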
⑵ Used Tirosh et al.’s scRNA-seq data and Thrane et al.’s ST data (Suppl’ Fig. 12)
① Conclusion 1. Colocalization of fibroblasts and endothelial cells: Majority of stromal tissue compartment composed of fibroblasts and endothelial cells, as confirmed by TCGA data.
② Conclusion 2. Macrophages exist only in specific areas within melanoma region.
③ Conclusion 3. CD8+ T cell marker is notably lacking: Implies CD8+ T cell marker’s potential for use in prognosis and therapy response.
10. Limitations
⑴ Limitation 1. ST array doesn’t cover entire tissue and each spot lacks single-cell resolution.
⑵ Limitation 2. Optimal tissue permeabilization condition might not apply to all histological features.
Input : 2021.01.03 23:28