Introduction: The Cancer Genome Atlas (TCGA) is a comprehensive multi-omics database of 33 different cancers, providing a wealth of clinical information and molecular datasets for biomarker discovery (Peng & Croce, 2016) and genomic pipeline development (Hutter & Zenklusen, 2018). MicroRNAs (miRNAs) are small non-coding regulatory RNAs that are excellent classificatory tissue markers due to their abundance and cell-type and disease-stage specificity (Gustafson et al., 2016); miRNA expression data are available from all 33 cancer types in the TCGA.
Objectives and Hypothesis: The primary objective of this study is to design and validate a miRNA-based classifier. We hypothesize that our machine learning algorithm will identify a set of miRNA biomarkers that can discriminate between the 33 types of cancer listed in the TCGA.
Methods: We compiled and preprocessed miRNA expression profiles for the 33 different cancers in TCGA. After preprocessing to remove duplicates, batch effects and technical outliers, unsupervised hierarchical clustering of miRNA expression profiles showed distinct separation of ovarian serous cystadenocarcinoma, glioblastoma multiforme and low-grade glioma, and testicular germ cell tumors from other cancer types (Figure 1). We observed that certain cancers of reproductive system origin grouped together, and apart from other cancer types, during unsupervised clustering. Based on these observations and prior knowledge of tumor anatomy and pathology, we created a hierarchical classifier wherein each cancer type is systematically discriminated from the remaining cancer types, until each cancer is identifiable through a process of elimination. At each step, a feature selection algorithm developed in our lab identified miRNA biomarkers that can classify these cancers first by organ system and subsequently by specific cancer.
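The process-of-elimination structure described above can be illustrated with a minimal sketch. It is not the classifier or the feature selection algorithm used in this study: a nearest-centroid rule stands in for each decision step, the class name `EliminationCascade` is hypothetical, and the two-feature expression values are invented toy data.

```python
from statistics import mean


def centroid(rows):
    """Mean expression vector of a list of samples."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class EliminationCascade:
    """Separate one class per step; a sample rejected at every step gets the last class."""

    def __init__(self, order):
        self.order = order  # class labels in the order they are separated out
        self.steps = []     # (label, centroid of label, centroid of remaining classes)

    def fit(self, samples, labels):
        remaining = list(zip(samples, labels))
        for label in self.order[:-1]:
            target = [s for s, l in remaining if l == label]
            rest = [s for s, l in remaining if l != label]
            # Stand-in decision rule (assumption): nearest centroid, target vs. rest.
            self.steps.append((label, centroid(target), centroid(rest)))
            # Eliminate the class just handled before training the next step.
            remaining = [(s, l) for s, l in remaining if l != label]
        return self

    def predict(self, sample):
        for label, c_target, c_rest in self.steps:
            if sq_dist(sample, c_target) < sq_dist(sample, c_rest):
                return label
        return self.order[-1]


# Toy data (invented): two features per sample, three classes.
samples = [[9, 1], [10, 2], [1, 9], [2, 10], [1, 1], [2, 2]]
labels = ["OV", "OV", "TGCT", "TGCT", "OTHER", "OTHER"]
model = EliminationCascade(["OV", "TGCT", "OTHER"]).fit(samples, labels)
print(model.predict([9, 2]), model.predict([1, 8]), model.predict([2, 1]))
```

In a fuller version, each step would use its own selected miRNA subset rather than all features, which is where a feature selection algorithm would plug in.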
Results: After data preprocessing and filtering at the 90th percentile of expression, the data set included 8287 samples and 617 miRNAs. Feature selection analysis for classifier design effectively separated reproductive cancers from other cancer types.
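The exact filtering rule is not spelled out here, so the following is only one possible reading, offered as a sketch: a miRNA is kept if its 90th-percentile expression across samples reaches a minimum level. The function names, the `min_level` threshold, and the miRNA names and values are all hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0 to 100) using linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def filter_mirnas(expr, q=90, min_level=1.0):
    """Keep miRNAs whose q-th percentile across samples is at least min_level.

    expr maps a miRNA name to its list of per-sample expression values.
    The threshold rule itself is an assumption, not the study's stated method.
    """
    return {name: vals for name, vals in expr.items()
            if percentile(vals, q) >= min_level}


# Toy data (invented): ten samples per miRNA.
expr = {
    "miR-a": [0, 0, 0, 0, 0, 0, 0, 0, 0, 50],  # high in one sample only
    "miR-b": [0] * 10,                           # never expressed
    "miR-c": [3] * 10,                           # uniformly expressed
}
kept = filter_mirnas(expr)
print(sorted(kept))
```

A percentile-based rule of this kind retains miRNAs that are abundant in only a subset of samples, which matters when the goal is to find markers specific to individual cancer types.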
Conclusion: We have developed a miRNA-based classifier for ovarian serous cystadenocarcinoma and testicular germ cell tumors, both of which are reproductive system cancers. We will continue to expand the classifier to all 33 cancers in TCGA and finalize it through validation with unlabeled samples. The resultant classifier for multiple cancers not only has clinical potential for sample diagnosis, but also provides insight into cancer diversity and pathogenesis across organ systems and subtypes.
Gustafson, D., Tyryshkin, K., & Renwick, N. (2016). microRNA-guided diagnostics in clinical samples. Best Practice & Research Clinical Endocrinology & Metabolism, 30(5), 563-575. https://doi.org/10.1016/j.beem.2016.07.002
Hutter, C., & Zenklusen, J. C. (2018). The Cancer Genome Atlas: Creating Lasting Value beyond Its Data. Cell, 173(2), 283-285. https://doi.org/10.1016/j.cell.2018.03.042
Peng, Y., & Croce, C. M. (2016). The role of MicroRNAs in human cancer. Signal Transduction and Targeted Therapy, 1(1), 15004. https://doi.org/10.1038/sigtrans.2015.4
Support or Funding Information
No external support or funding.
Figure 1. Unsupervised clustering using mean Spearman correlation between samples, after preprocessing and filtering at the 90th percentile. Colors on the right correspond to the bar at the bottom and indicate cancer type. miRNA expression is indicated with a color scale: yellow and darker blue represent higher and lower expression, respectively. Ovarian serous cystadenocarcinoma (OV), glioblastoma multiforme and low-grade glioma (GBM and LGG), and testicular germ cell tumors (TGCT) are labelled.
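For readers unfamiliar with the similarity measure named in the caption, a Spearman correlation between two samples can be sketched as the Pearson correlation of their ranks. This is a generic illustration with invented values, not the clustering pipeline used to produce Figure 1; the helper names are hypothetical, and `1 - rho` is one common (assumed) way to turn the correlation into a clustering distance.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


def spearman_distance(x, y):
    """Distance for clustering (assumed convention): 0 when rank orders agree."""
    return 1 - spearman(x, y)


# Toy profiles (invented): same rank order despite different scales.
sample_1 = [1, 2, 3, 4]
sample_2 = [10, 20, 40, 80]
print(spearman(sample_1, sample_2), spearman_distance(sample_1, sample_2[::-1]))
```

Because it depends only on rank order, this measure is insensitive to monotonic differences in scale between samples.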