Compendiums of Cancer Transcriptome for Machine Learning Applications

Abstract Background: Massive numbers of transcriptome profiles exist in the form of microarray data, enabling reuse. The challenge is that they are processed with diverse platforms and preprocessing tools, so cross-dataset or cross-cancer analyses require considerable time and informatics expertise. A single, integrated data source consisting of thousands of samples, similar to TCGA, would facilitate data reuse for discovery, analysis, and validation of biomarker-based clinical strategies. Findings: We present 11 merged microarray-acquired datasets (MMDs) of major cancer types, curating 8,386 patient-derived tumor and tumor-free samples from 95 GEO datasets. MMD-derived patterns of genome-wide differential gene expression were highly concordant with those of matching TCGA cohorts. Using machine learning algorithms, we show that clinical models trained on all MMDs, except the breast MMD, can be directly applied to RNA-seq-acquired TCGA data with an average accuracy of 0.96 in classifying cancer. Machine learning-optimized MMDs further help reveal the immune landscape of human cancers, which is critically needed in disease management and clinical interventions. Conclusions: To facilitate large-scale meta-analysis, we generated a newly curated, unified, large-scale MMD across 11 cancer types. Besides TCGA, this single data source may serve as an excellent training or test set to apply, develop, and refine machine learning algorithms that can be tapped to better define the genomic landscape of human cancers.

Compendiums of cancer transcriptomes for machine learning applications

Scientific Data ◽

10.1038/s41597-019-0207-2 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 2

Author(s):

Su Bin Lim ◽

Swee Jin Tan ◽

Wan-Teck Lim ◽

Chwee Teck Lim

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Data Reuse ◽

Rna Seq ◽

Genomic Landscape ◽

Source Data ◽

Machine Learning Applications ◽

Cancer Types ◽

Data Source

Abstract Massive numbers of transcriptome profiles exist in the form of microarray data. The challenge is that they are processed using diverse platforms and preprocessing tools, so cross-dataset analyses require considerable time and informatics expertise. A single, integrated data source would facilitate data reuse for discovery, analysis, and validation of biomarker-based clinical strategies. Here, we present merged microarray-acquired datasets (MMDs) across 11 major cancer types, curating 8,386 patient-derived tumor and tumor-free samples from 95 GEO datasets. Using machine learning algorithms, we show that diagnostic models trained on MMDs can be directly applied to RNA-seq-acquired TCGA data with high classification accuracy. Machine learning-optimized MMDs further help reveal the immune landscape across various carcinomas, which is critically needed in disease management and clinical interventions. This unified data source may serve as an excellent training or test set to apply, develop, and refine machine learning algorithms that can be tapped to better define the genomic landscape of human cancers.

Accuracy of Machine Learning Algorithms for the Classification of Molecular Features of Gliomas on MRI: A Systematic Literature Review and Meta-Analysis

Cancers ◽

10.3390/cancers13112606 ◽

2021 ◽

Vol 13 (11) ◽

pp. 2606

Author(s):

Evi J. van Kempen ◽

Max Post ◽

Manoj Mannil ◽

Benno Kusters ◽

Mark ter Laan ◽

...

Keyword(s):

Machine Learning ◽

Large Scale ◽

Meta Analysis ◽

Learning Algorithms ◽

External Validation ◽

Machine Learning Algorithms ◽

Molecular Characteristics ◽

Aggregated Data ◽

Molecular Features

Treatment planning and prognosis in glioma treatment are based on the classification into low- and high-grade oligodendroglioma or astrocytoma, which is mainly based on molecular characteristics (IDH1/2- and 1p/19q codeletion status). It would be of great value if this classification could be made reliably before surgery, without biopsy. Machine learning algorithms (MLAs) could play a role in achieving this by enabling glioma characterization on magnetic resonance imaging (MRI) data without invasive tissue sampling. The aim of this study is to provide a performance evaluation and meta-analysis of various MLAs for glioma characterization. Systematic literature search and meta-analysis were performed on the aggregated data, after which subgroup analyses for several target conditions were conducted. This study is registered with PROSPERO, CRD42020191033. We identified 724 studies; 60 and 17 studies were eligible to be included in the systematic review and meta-analysis, respectively. Meta-analysis showed excellent accuracy for all subgroups, with the classification of 1p/19q codeletion status scoring significantly poorer than other subgroups (AUC: 0.748, p = 0.132). There was considerable heterogeneity among some of the included studies. Although promising results were found with regard to the ability of MLA-tools to be used for the non-invasive classification of gliomas, large-scale, prospective trials with external validation are warranted in the future.

Efficient Image Retrieval approach for Large-scale Chest X Ray data using Hand-Crafted Features and Machine Learning Algorithms

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i11.890896 ◽

2018 ◽

Vol 6 (11) ◽

pp. 890-896

Author(s):

Irene Getzi S ◽

D. Christopher Durairaj ◽

V Joseph Raj

Keyword(s):

Machine Learning ◽

Image Retrieval ◽

Large Scale ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

X Ray ◽

Chest X Ray

Diagnostic test accuracy for use of machine learning in diagnosis of autism spectrum disorder: A Systematic Review and Meta-Analysis (Preprint)

10.2196/preprints.14108 ◽

2019 ◽

Author(s):

Sun Jae Moon ◽

Jin Seub Hwang ◽

Rajesh Kana ◽

John Torous ◽

Jung Won Kim

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Autism Spectrum Disorder ◽

Meta Analysis ◽

Learning Algorithms ◽

Structural Mri ◽

Autism Spectrum ◽

Machine Learning Algorithms ◽

Spectrum Disorder ◽

Test Accuracy

BACKGROUND In recent years, machine learning algorithms have been increasingly applied in biomedical fields. In particular, their application has been drawing more attention in the field of psychiatry, for instance, as diagnostic tests/tools for autism spectrum disorder. However, given their complexity and potential clinical implications, there is an ongoing need for further research on their accuracy. OBJECTIVE The current study aims to summarize the evidence for the accuracy of machine learning algorithms in diagnosing autism spectrum disorder (ASD) through systematic review and meta-analysis. METHODS MEDLINE, Embase, CINAHL Complete (with OpenDissertations), PsycINFO and IEEE Xplore Digital Library databases were searched on November 28th, 2018. Studies that used a machine learning algorithm partially or fully in classifying ASD from controls and provided accuracy measures were included in our analysis. A bivariate random effects model was applied to the pooled data in the meta-analysis. Subgroup analysis was used to investigate and resolve the source of heterogeneity between studies. True-positive, false-positive, false-negative and true-negative values from individual studies were used to calculate the pooled sensitivity and specificity values, draw SROC curves, and obtain the area under the curve (AUC) and partial AUC (pAUC). RESULTS A total of 43 studies were included for the final analysis, of which meta-analysis was performed on 40 studies (53 samples with 12,128 participants). A structural MRI subgroup meta-analysis (12 samples with 1,776 participants) showed a sensitivity of 0.83 (95% CI 0.76 to 0.89), a specificity of 0.84 (95% CI 0.74 to 0.91), and AUC/pAUC of 0.90/0.83. An fMRI/deep neural network (DNN) subgroup meta-analysis (five samples with 1,345 participants) showed a sensitivity of 0.69 (95% CI 0.62 to 0.75), a specificity of 0.66 (95% CI 0.61 to 0.70), and AUC/pAUC of 0.71/0.67. CONCLUSIONS Machine learning algorithms that used structural MRI features in diagnosis of ASD were shown to have accuracy that is similar to currently used diagnostic tools.

Essentiality of Machine Learning Algorithms for Big Data Computation

Advances in Data Mining and Database Management - Managing and Processing Big Data in Cloud Computing ◽

10.4018/978-1-4666-9767-6.ch011 ◽

2016 ◽

pp. 156-167

Author(s):

Manjunath Thimmasandra Narayanappa ◽

T. P. Puneeth Kumar ◽

Ravindra S. Hegadi

Keyword(s):

Machine Learning ◽

Big Data ◽

Large Scale ◽

Learning Algorithms ◽

Big Data Analytics ◽

Machine Learning Algorithms ◽

Real Time Analysis ◽

Large Scale Data ◽

Computational Environment ◽

Large Scale Data Processing

Recent technological advancements have led to the generation of huge volumes of data from distinct domains (scientific sensors, health care, user-generated data, financial companies, and internet and supply chain systems) over the past decade. The term big data was coined to capture the meaning of this emerging trend. In addition to its huge volume, big data also exhibits several unique characteristics compared with traditional data. For instance, big data is generally unstructured and requires more real-time analysis. This development calls for new system platforms for data acquisition, storage, and transmission, and for large-scale data processing mechanisms. In recent years, the analytics industry's interest has expanded toward big data analytics to uncover potential concealed in big data, such as hidden patterns or unknown correlations. The main goal of this chapter is to explore the importance of machine learning algorithms and of the computational environment, including the hardware and software, required to perform analytics on big data.

Large-Scale Machine Learning Algorithms for Biomedical Data Science

Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics - BCB '19 ◽

10.1145/3307339.3342130 ◽

2019 ◽

Author(s):

Heng Huang

Keyword(s):

Machine Learning ◽

Large Scale ◽

Data Science ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Biomedical Data

Test data reuse for evaluation of adaptive machine learning algorithms: over-fitting to a fixed 'test' dataset and a potential solution

Medical Imaging 2018: Image Perception, Observer Performance, and Technology Assessment ◽

10.1117/12.2293818 ◽

2018 ◽

Cited By ~ 2

Author(s):

Alexej Gossmann ◽

Aria Pezeshk ◽

Berkman Sahiner

Keyword(s):

Machine Learning ◽

Test Data ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Data Reuse ◽

Potential Solution ◽

Test Dataset

Erratum to: Combining semi-automated image analysis techniques with machine learning algorithms to accelerate large-scale genetic studies

GigaScience ◽

10.1093/gigascience/giy043 ◽

2018 ◽

Vol 7 (7) ◽

Author(s):

Jonathan A Atkinson ◽

Guillaume Lobet ◽

Manuel Noll ◽

Patrick E Meyer ◽

Marcus Griffiths ◽

...

Keyword(s):

Machine Learning ◽

Image Analysis ◽

Large Scale ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Automated Image Analysis ◽

Genetic Studies ◽

Analysis Techniques ◽

Image Analysis Techniques

The large scale digital mapping of soil organic carbon using machine learning algorithms

Dokuchaev Soil Bulletin ◽

10.19047/0136-1694-2018-91-46-62 ◽

2018 ◽

Vol 91 ◽

pp. 46-62 ◽

Cited By ~ 1

Author(s):

A. V. Chinilin ◽

I. Yu. Savin

Keyword(s):

Machine Learning ◽

Organic Carbon ◽

Soil Organic Carbon ◽

Large Scale ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Digital Mapping

Accuracy of Machine Learning Algorithms for the Diagnosis of Autism Spectrum Disorder: Systematic Review and Meta-Analysis of Brain Magnetic Resonance Imaging Studies

JMIR Mental Health ◽

10.2196/14108 ◽

2019 ◽

Vol 6 (12) ◽

pp. e14108 ◽

Cited By ~ 1

Author(s):

Sun Jae Moon ◽

Jinseub Hwang ◽

Rajesh Kana ◽

John Torous ◽

Jung Won Kim

Keyword(s):

Magnetic Resonance Imaging ◽

Machine Learning ◽

Systematic Review ◽

Magnetic Resonance ◽

Meta Analysis ◽

Learning Algorithms ◽

Autism Spectrum ◽

Machine Learning Algorithms ◽

Resonance Imaging ◽

Accuracy Measures

Background In recent years, machine learning algorithms have been more widely and increasingly applied in biomedical fields. In particular, their application has been drawing more attention in the field of psychiatry, for instance, as diagnostic tests/tools for autism spectrum disorder (ASD). However, given their complexity and potential clinical implications, there is an ongoing need for further research on their accuracy. Objective This study aimed to perform a systematic review and meta-analysis to summarize the available evidence for the accuracy of machine learning algorithms in diagnosing ASD. Methods The following databases were searched on November 28, 2018: MEDLINE, EMBASE, CINAHL Complete (with Open Dissertations), PsycINFO, and Institute of Electrical and Electronics Engineers Xplore Digital Library. Studies that used a machine learning algorithm partially or fully for distinguishing individuals with ASD from control subjects and provided accuracy measures were included in our analysis. The bivariate random effects model was applied to the pooled data in a meta-analysis. A subgroup analysis was used to investigate and resolve the source of heterogeneity between studies. True-positive, false-positive, false-negative, and true-negative values from individual studies were used to calculate the pooled sensitivity and specificity values, draw Summary Receiver Operating Characteristics curves, and obtain the area under the curve (AUC) and partial AUC (pAUC). Results A total of 43 studies were included for the final analysis, of which a meta-analysis was performed on 40 studies (53 samples with 12,128 participants). A structural magnetic resonance imaging (sMRI) subgroup meta-analysis (12 samples with 1776 participants) showed a sensitivity of 0.83 (95% CI 0.76-0.89), a specificity of 0.84 (95% CI 0.74-0.91), and AUC/pAUC of 0.90/0.83. A functional magnetic resonance imaging/deep neural network subgroup meta-analysis (5 samples with 1345 participants) showed a sensitivity of 0.69 (95% CI 0.62-0.75), specificity of 0.66 (95% CI 0.61-0.70), and AUC/pAUC of 0.71/0.67. Conclusions The accuracy of machine learning algorithms for diagnosis of ASD was considered acceptable by few accuracy measures only in cases of sMRI use; however, given the many limitations indicated in our study, further well-designed studies are warranted to extend the potential use of machine learning algorithms to clinical settings. Trial Registration PROSPERO CRD42018117779; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=117779
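
The Methods above derive sensitivity and specificity from each study's true-positive, false-positive, false-negative, and true-negative counts. The sketch below illustrates only those two definitions. The `StudyCounts` container, the `naive_pool` helper, and the counts are hypothetical, and summing counts across studies is a simplification for illustration; it is not the bivariate random effects model the authors applied.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class StudyCounts:
    """2x2 diagnostic counts for one study (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def sensitivity(c: StudyCounts) -> float:
    """Fraction of actual cases that the classifier flags as cases."""
    return c.tp / (c.tp + c.fn)


def specificity(c: StudyCounts) -> float:
    """Fraction of actual controls that the classifier flags as controls."""
    return c.tn / (c.tn + c.fp)


def naive_pool(studies: Sequence[StudyCounts]) -> StudyCounts:
    """Sum counts across studies.

    Assumption: a deliberately naive pooling that ignores between-study
    heterogeneity; NOT the bivariate random effects model from the abstract.
    """
    return StudyCounts(
        tp=sum(s.tp for s in studies),
        fp=sum(s.fp for s in studies),
        fn=sum(s.fn for s in studies),
        tn=sum(s.tn for s in studies),
    )


# Made-up counts for illustration only; not data from any cited study.
studies = [
    StudyCounts(tp=40, fp=10, fn=10, tn=40),
    StudyCounts(tp=30, fp=5, fn=20, tn=45),
]
pooled = naive_pool(studies)
print(f"sensitivity={sensitivity(pooled):.2f}, specificity={specificity(pooled):.2f}")
```

A real diagnostic meta-analysis would fit sensitivity and specificity jointly with study-level random effects, which is what produces the confidence intervals and SROC curves reported above.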
