scholarly journals Compendiums of Cancer Transcriptome for Machine Learning Applications

2018 ◽  
Author(s):  
Su Bin Lim ◽  
Swee Jin Tan ◽  
Wan-Teck Lim ◽  
Chwee Teck Lim

AbstractBackgroundThere exist massive transcriptome profiles in the form of microarray, enabling reuse. The challenge is that they are processed with diverse platforms and preprocessing tools, requiring considerable time and informatics expertise for cross-dataset or cross-cancer analyses. If there exists a single, integrated data source consisting of thousands of samples, similar to TCGA, data-reuse will be facilitated for discovery, analysis, and validation of biomarker-based clinical strategy.FindingsWe present 11 merged microarray-acquired datasets (MMDs) of major cancer types, curating 8,386 patient-derived tumor and tumor-free samples from 95 GEO datasets. Highly concordant MMD-derived patterns of genome-wide differential gene expression were observed with matching TCGA cohorts. Using machine learning algorithms, we show that clinical models trained from all MMDs, except breast MMD, can be directly applied to RNA-seq-acquired TCGA data with an average accuracy of 0.96 in classifying cancer. Machine learning optimized MMD further aids to reveal immune landscape of human cancers critically needed in disease management and clinical interventions.ConclusionsTo facilitate large-scale meta-analysis, we generated a newly curated, unified, large-scale MMD across 11 cancer types. Besides TCGA, this single data source may serve as an excellent training or test set to apply, develop, and refine machine learning algorithms that can be tapped to better define genomic landscape of human cancers.

2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Su Bin Lim ◽  
Swee Jin Tan ◽  
Wan-Teck Lim ◽  
Chwee Teck Lim

Abstract There are massive transcriptome profiles in the form of microarray. The challenge is that they are processed using diverse platforms and preprocessing tools, requiring considerable time and informatics expertise for cross-dataset analyses. If there exists a single, integrated data source, data-reuse can be facilitated for discovery, analysis, and validation of biomarker-based clinical strategy. Here, we present merged microarray-acquired datasets (MMDs) across 11 major cancer types, curating 8,386 patient-derived tumor and tumor-free samples from 95 GEO datasets. Using machine learning algorithms, we show that diagnostic models trained from MMDs can be directly applied to RNA-seq-acquired TCGA data with high classification accuracy. Machine learning optimized MMD further aids to reveal immune landscape across various carcinomas critically needed in disease management and clinical interventions. This unified data source may serve as an excellent training or test set to apply, develop, and refine machine learning algorithms that can be tapped to better define genomic landscape of human cancers.


Cancers ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 2606
Author(s):  
Evi J. van Kempen ◽  
Max Post ◽  
Manoj Mannil ◽  
Benno Kusters ◽  
Mark ter Laan ◽  
...  

Treatment planning and prognosis in glioma treatment are based on the classification into low- and high-grade oligodendroglioma or astrocytoma, which is mainly based on molecular characteristics (IDH1/2- and 1p/19q codeletion status). It would be of great value if this classification could be made reliably before surgery, without biopsy. Machine learning algorithms (MLAs) could play a role in achieving this by enabling glioma characterization on magnetic resonance imaging (MRI) data without invasive tissue sampling. The aim of this study is to provide a performance evaluation and meta-analysis of various MLAs for glioma characterization. Systematic literature search and meta-analysis were performed on the aggregated data, after which subgroup analyses for several target conditions were conducted. This study is registered with PROSPERO, CRD42020191033. We identified 724 studies; 60 and 17 studies were eligible to be included in the systematic review and meta-analysis, respectively. Meta-analysis showed excellent accuracy for all subgroups, with the classification of 1p/19q codeletion status scoring significantly poorer than other subgroups (AUC: 0.748, p = 0.132). There was considerable heterogeneity among some of the included studies. Although promising results were found with regard to the ability of MLA-tools to be used for the non-invasive classification of gliomas, large-scale, prospective trials with external validation are warranted in the future.


2019 ◽  
Author(s):  
Sun Jae Moon ◽  
Jin Seub Hwang ◽  
Rajesh Kana ◽  
John Torous ◽  
Jung Won Kim

BACKGROUND Over the recent years, machine learning algorithms have been more widely and increasingly applied in biomedical fields. In particular, its application has been drawing more attention in the field of psychiatry, for instance, as diagnostic tests/tools for autism spectrum disorder. However, given its complexity and potential clinical implications, there is ongoing need for further research on its accuracy. OBJECTIVE The current study aims to summarize the evidence for the accuracy of use of machine learning algorithms in diagnosing autism spectrum disorder (ASD) through systematic review and meta-analysis. METHODS MEDLINE, Embase, CINAHL Complete (with OpenDissertations), PsyINFO and IEEE Xplore Digital Library databases were searched on November 28th, 2018. Studies, which used a machine learning algorithm partially or fully in classifying ASD from controls and provided accuracy measures, were included in our analysis. Bivariate random effects model was applied to the pooled data in meta-analysis. Subgroup analysis was used to investigate and resolve the source of heterogeneity between studies. True-positive, false-positive, false negative and true-negative values from individual studies were used to calculate the pooled sensitivity and specificity values, draw SROC curves, and obtain area under the curve (AUC) and partial AUC. RESULTS A total of 43 studies were included for the final analysis, of which meta-analysis was performed on 40 studies (53 samples with 12,128 participants). A structural MRI subgroup meta-analysis (12 samples with 1,776 participants) showed the sensitivity at 0.83 (95% CI-0.76 to 0.89), specificity at 0.84 (95% CI -0.74 to 0.91), and AUC/pAUC at 0.90/0.83. An fMRI/deep neural network (DNN) subgroup meta-analysis (five samples with 1,345 participants) showed the sensitivity at 0.69 (95% CI- 0.62 to 0.75), the specificity at 0.66 (95% CI -0.61 to 0.70), and AUC/pAUC at 0.71/0.67. CONCLUSIONS Machine learning algorithms that used structural MRI features in diagnosis of ASD were shown to have accuracy that is similar to currently used diagnostic tools.


Author(s):  
Manjunath Thimmasandra Narayanapppa ◽  
T. P. Puneeth Kumar ◽  
Ravindra S. Hegadi

Recent technological advancements have led to generation of huge volume of data from distinctive domains (scientific sensors, health care, user-generated data, finical companies and internet and supply chain systems) over the past decade. To capture the meaning of this emerging trend the term big data was coined. In addition to its huge volume, big data also exhibits several unique characteristics as compared with traditional data. For instance, big data is generally unstructured and require more real-time analysis. This development calls for new system platforms for data acquisition, storage, transmission and large-scale data processing mechanisms. In recent years analytics industries interest expanding towards the big data analytics to uncover potentials concealed in big data, such as hidden patterns or unknown correlations. The main goal of this chapter is to explore the importance of machine learning algorithms and computational environment including hardware and software that is required to perform analytics on big data.


10.2196/14108 ◽  
2019 ◽  
Vol 6 (12) ◽  
pp. e14108 ◽  
Author(s):  
Sun Jae Moon ◽  
Jinseub Hwang ◽  
Rajesh Kana ◽  
John Torous ◽  
Jung Won Kim

Background In the recent years, machine learning algorithms have been more widely and increasingly applied in biomedical fields. In particular, their application has been drawing more attention in the field of psychiatry, for instance, as diagnostic tests/tools for autism spectrum disorder (ASD). However, given their complexity and potential clinical implications, there is an ongoing need for further research on their accuracy. Objective This study aimed to perform a systematic review and meta-analysis to summarize the available evidence for the accuracy of machine learning algorithms in diagnosing ASD. Methods The following databases were searched on November 28, 2018: MEDLINE, EMBASE, CINAHL Complete (with Open Dissertations), PsycINFO, and Institute of Electrical and Electronics Engineers Xplore Digital Library. Studies that used a machine learning algorithm partially or fully for distinguishing individuals with ASD from control subjects and provided accuracy measures were included in our analysis. The bivariate random effects model was applied to the pooled data in a meta-analysis. A subgroup analysis was used to investigate and resolve the source of heterogeneity between studies. True-positive, false-positive, false-negative, and true-negative values from individual studies were used to calculate the pooled sensitivity and specificity values, draw Summary Receiver Operating Characteristics curves, and obtain the area under the curve (AUC) and partial AUC (pAUC). Results A total of 43 studies were included for the final analysis, of which a meta-analysis was performed on 40 studies (53 samples with 12,128 participants). A structural magnetic resonance imaging (sMRI) subgroup meta-analysis (12 samples with 1776 participants) showed a sensitivity of 0.83 (95% CI 0.76-0.89), a specificity of 0.84 (95% CI 0.74-0.91), and AUC/pAUC of 0.90/0.83. A functional magnetic resonance imaging/deep neural network subgroup meta-analysis (5 samples with 1345 participants) showed a sensitivity of 0.69 (95% CI 0.62-0.75), specificity of 0.66 (95% CI 0.61-0.70), and AUC/pAUC of 0.71/0.67. Conclusions The accuracy of machine learning algorithms for diagnosis of ASD was considered acceptable by few accuracy measures only in cases of sMRI use; however, given the many limitations indicated in our study, further well-designed studies are warranted to extend the potential use of machine learning algorithms to clinical settings. Trial Registration PROSPERO CRD42018117779; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=117779


Sign in / Sign up

Export Citation Format

Share Document