A comparative study of machine learning and deep learning algorithms to classify cancer types based on microarray gene expression data

PeerJ Computer Science ◽

10.7717/peerj-cs.270 ◽

2020 ◽

Vol 6 ◽

pp. e270 ◽

Cited By ~ 1

Author(s):

Reinel Tabares-Soto ◽

Simon Orozco-Arias ◽

Victor Romero-Cano ◽

Vanesa Segovia Bucheli ◽

José Luis Rodríguez-Sotelo ◽

...

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Deep Learning ◽

Large Scale ◽

Supervised Machine Learning ◽

Successful Outcome ◽

Cancer Type ◽

Tumor Type ◽

Tumor Identification ◽

Major Interest

Cancer classification is a topic of major interest in medicine since it allows accurate and efficient diagnosis and facilitates a successful outcome in medical treatments. Previous studies have classified human tumors using large-scale RNA profiling and supervised Machine Learning (ML) algorithms to construct a molecular-based classification of carcinoma cells from breast, bladder, adenocarcinoma, colorectal, gastro esophagus, kidney, liver, lung, ovarian, pancreas, and prostate tumors. These datasets are collectively known as the 11_tumor database. Although this database has been used in several works in the ML field, no comparative studies of different algorithms can be found in the literature. Meanwhile, advances in both hardware and software technologies have fostered considerable improvements in the precision of solutions that use ML, such as Deep Learning (DL). In this study, we compare the most widely used algorithms in classical ML and DL to classify the tumors described in the 11_tumor database. We obtained tumor identification accuracies between 90.6% (Logistic Regression) and 94.43% (Convolutional Neural Networks) using k-fold cross-validation. We also show that a tuning process may or may not significantly improve an algorithm's accuracy. Our results demonstrate an efficient and accurate classification method based on gene expression (microarray data) and ML/DL algorithms, which facilitates tumor type prediction in a multi-cancer-type scenario.
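The k-fold cross-validation mentioned in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy two-feature data, the function names, and the nearest-centroid classifier, which stands in for the logistic regression and convolutional network models the study actually compares.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_centroid_fit(X, y):
    """Mean feature vector per class (a deliberately simple stand-in model)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def nearest_centroid_predict(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(centroids, key=sqdist)

def cross_val_accuracy(X, y, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the accuracies."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = nearest_centroid_fit([X[j] for j in train_idx],
                                     [y[j] for j in train_idx])
        hits = sum(nearest_centroid_predict(model, X[j]) == y[j] for j in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / k

# Hypothetical toy "expression profiles": two well-separated classes.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
     [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5]]
y = ["A"] * 5 + ["B"] * 5
print(cross_val_accuracy(X, y, k=5))
```

Each sample is used for testing exactly once, so the averaged score reflects performance on data the model did not see during fitting.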

A deep learning and novelty detection framework for rapid phenotyping in high-content screening

10.1101/134627 ◽

2017 ◽

Cited By ~ 2

Author(s):

Christoph Sommer ◽

Rudolf Hoefler ◽

Matthias Samwer ◽

Daniel W. Gerlich

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Large Scale ◽

Novelty Detection ◽

A Priori ◽

Mitotic Cell ◽

Supervised Machine Learning ◽

High Content Screening ◽

Data Sets ◽

User Training

Supervised machine learning is a powerful and widely used method to analyze high-content screening data. Despite its accuracy, efficiency, and versatility, supervised machine learning has drawbacks, most notably its dependence on a priori knowledge of expected phenotypes and time-consuming classifier training. We provide a solution to these limitations with CellCognition Explorer, a generic novelty detection and deep learning framework. Application to several large-scale screening data sets on nuclear and mitotic cell morphologies demonstrates that CellCognition Explorer enables discovery of rare phenotypes without user training, which has broad implications for improved assay development in high-content screening.

Advanced Machine Learning Models for Large Scale Gene Expression Analysis in Cancer Classification: Deep Learning Versus Classical Models

Communications in Computer and Information Science - Big Data, Cloud and Applications ◽

10.1007/978-3-319-96292-4_17 ◽

2018 ◽

pp. 210-221

Author(s):

Imene Zenbout ◽

Souham Meshoul

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Deep Learning ◽

Expression Analysis ◽

Large Scale ◽

Gene Expression Analysis ◽

Cancer Classification ◽

Learning Models ◽

Classical Models ◽

Machine Learning Models

Development and clinical validation of Lantern Pharma’s AI engine: Response algorithm for drug positioning and rescue (RADR).

Journal of Clinical Oncology ◽

10.1200/jco.2019.37.15_suppl.3114 ◽

2019 ◽

Vol 37 (15_suppl) ◽

pp. 3114-3114

Author(s):

Umesh Kathad ◽

Yuvanesh Vedaraju ◽

Aditya Kulkarni ◽

Gregory Tobin ◽

Panna Sharma

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Machine Learning ◽

Prediction Accuracy ◽

Response Prediction ◽

Supervised Machine Learning ◽

Clinical Validation ◽

Tumor Type ◽

Patient Records ◽

Number Of Patients

Background: The Response Algorithm for Drug positioning and Rescue (RADR) technology is Lantern Pharma's proprietary Artificial Intelligence (AI)-based machine learning approach for biomarker identification and patient stratification. RADR is a combination of three automated modules working sequentially to generate drug- and tumor type-specific gene signatures predictive of response. Methods: RADR integrates genomics, drug sensitivity and systems biology inputs with supervised machine learning strategies and generates gene expression-based responder/non-responder profiles for specific tumor indications with high accuracy, in addition to identifying new correlations of genetic biomarkers with drug activity. Pre-treatment patient gene expression profiles along with corresponding treatment outcomes were used as algorithm inputs. Model training was typically performed using an initial set of genes derived from cancer cell line data when available, and further applied to patient data for model tuning, cross-validation and final gene signature development. Model testing and performance computation were carried out on patient records held out as blinded datasets. Response prediction accuracy and sensitivity were among the model performance metrics calculated. Results: On average, RADR achieved a response prediction accuracy of 80% during clinical validation. We present retrospective analyses performed as part of RADR validation using more than 10 independent datasets of patients from selected cancer types treated with approved drugs including chemotherapy, targeted therapy and immunotherapy agents. For instance, the application of the RADR program to a Paclitaxel trial in breast cancer patients could have potentially reduced the number of patients in the treatment arm from 92 unselected patients to 24 biomarker-selected patients to produce the same number of responders. Also, we cite published evidence correlating genes from RADR-derived biomarkers with increased Paclitaxel sensitivity in breast cancer. Conclusions: The value of the RADR platform architecture is derived from its validation through the analysis of ~17 million oncology-specific clinical data points and ~1000 patient records. By implementing unique biological, statistical and machine learning workflows, Lantern Pharma's RADR technology is capable of deriving robust biomarker panels for pre-selecting true responders for recruitment into clinical trials, which may improve the success rate of oncology drug approvals.
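The Paclitaxel example (92 unselected patients versus 24 biomarker-selected patients yielding the same number of responders) implies a simple arithmetic relationship, sketched below. The function names and the 20% unselected response rate are hypothetical and chosen only for illustration; the abstract does not report the response rate.

```python
def enrichment_factor(n_unselected, n_selected):
    """Factor by which the response rate must rise in the selected arm
    for it to yield the same number of responders as the unselected arm."""
    return n_unselected / n_selected

def required_selected_rate(unselected_rate, n_unselected, n_selected):
    """Response rate needed in the selected arm to match the responder count."""
    responders = unselected_rate * n_unselected
    return responders / n_selected

def max_unselected_rate(n_unselected, n_selected):
    """Upper bound on the unselected response rate: the selected arm cannot
    contain more responders than it has patients."""
    return n_selected / n_unselected

print(enrichment_factor(92, 24))                # about 3.83
print(required_selected_rate(0.20, 92, 24))     # 0.20 is an assumed rate
print(max_unselected_rate(92, 24))              # about 0.26
```

Under these numbers, the selected arm would need a response rate roughly 3.8 times that of the unselected arm, which is only possible if the unselected response rate is at most about 26%.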

A new computational drug repurposing method using established disease–drug pair knowledge

Bioinformatics ◽

10.1093/bioinformatics/btz156 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3672-3678 ◽

Cited By ~ 8

Author(s):

Nafiseh Saberian ◽

Azam Peyvandipour ◽

Michele Donato ◽

Sahar Ansari ◽

Sorin Draghici

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Large Scale ◽

Expression Profiles ◽

Drug Repurposing ◽

Supervised Machine Learning ◽

Supplementary Information ◽

Sources Of Information ◽

Approved Drugs ◽

Fda Approved Drugs

Motivation: Drug repurposing is a potential alternative to the classical drug discovery pipeline. Repurposing involves finding novel indications for already approved drugs. In this work, we present a novel machine learning-based method for drug repurposing. This method explores the anti-similarity between drugs and a disease to uncover new uses for the drugs. More specifically, our proposed method takes into account three sources of information: (i) large-scale gene expression profiles corresponding to human cell lines treated with small molecules, (ii) the gene expression profile of a human disease and (iii) the known relationship between Food and Drug Administration (FDA)-approved drugs and diseases. Using these data, our proposed method learns a similarity metric through a supervised machine learning-based algorithm such that a disease and its associated FDA-approved drugs have a smaller distance than the other disease-drug pairs. Results: We validated our framework by showing that the proposed method, incorporating a distance metric learning technique, can retrieve FDA-approved drugs for their approved indications. Once validated, we used our approach to identify a few strong candidates for repurposing. Availability and implementation: The R scripts are available on demand from the authors. Supplementary information: Supplementary data are available at Bioinformatics online.
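The notion of anti-similarity between a drug and a disease can be sketched as follows. This is not the authors' method: the paper learns a distance metric in a supervised way, whereas this illustration substitutes plain Pearson correlation and ranks drugs by how strongly their expression signature opposes the disease signature. The signatures and drug names are invented.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length expression signatures."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def rank_by_anti_similarity(disease_signature, drug_signatures):
    """Order drugs from most anti-correlated to most correlated with the disease."""
    scores = {name: pearson(disease_signature, sig)
              for name, sig in drug_signatures.items()}
    return sorted(scores, key=scores.get)

# Hypothetical log fold-changes for four genes.
disease = [2.0, -1.0, 1.5, -0.5]
drugs = {
    "drug_a": [-2.0, 1.0, -1.5, 0.5],   # reverses the disease pattern
    "drug_b": [2.0, -1.0, 1.5, -0.5],   # mimics the disease pattern
    "drug_c": [0.5, 0.5, -0.5, -0.5],   # unrelated pattern
}
print(rank_by_anti_similarity(disease, drugs))
```

A learned metric, as described in the abstract, would replace the fixed correlation with a distance tuned so that known disease-drug pairs end up closer than other pairs.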

A deep learning and novelty detection framework for rapid phenotyping in high-content screening

Molecular Biology of the Cell ◽

10.1091/mbc.e17-05-0333 ◽

2017 ◽

Vol 28 (23) ◽

pp. 3428-3436 ◽

Cited By ~ 34

Author(s):

Christoph Sommer ◽

Rudolf Hoefler ◽

Matthias Samwer ◽

Daniel W. Gerlich

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Large Scale ◽

Novelty Detection ◽

A Priori ◽

Mitotic Cell ◽

Supervised Machine Learning ◽

High Content Screening ◽

Data Sets ◽

User Training

Supervised machine learning is a powerful and widely used method for analyzing high-content screening data. Despite its accuracy, efficiency, and versatility, supervised machine learning has drawbacks, most notably its dependence on a priori knowledge of expected phenotypes and time-consuming classifier training. We provide a solution to these limitations with CellCognition Explorer, a generic novelty detection and deep learning framework. Application to several large-scale screening data sets on nuclear and mitotic cell morphologies demonstrates that CellCognition Explorer enables discovery of rare phenotypes without user training, which has broad implications for improved assay development in high-content screening.
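The general idea of novelty detection, flagging samples that deviate from a reference population without training on labelled phenotypes, can be sketched in a few lines. This is a generic stand-in and not CellCognition Explorer's algorithm, which the abstract does not describe; the feature values, function names, and threshold of 3.0 are assumptions.

```python
import statistics

def fit_reference(control):
    """Per-feature mean and sample standard deviation of the control population.
    Assumes at least two control samples and non-constant features."""
    columns = list(zip(*control))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.stdev(c) for c in columns]
    return means, stdevs

def novelty_score(sample, means, stdevs):
    """Root-mean-square z-score of a sample relative to the control population."""
    z2 = [((x - m) / s) ** 2 for x, m, s in zip(sample, means, stdevs)]
    return (sum(z2) / len(z2)) ** 0.5

def find_outliers(samples, means, stdevs, threshold=3.0):
    """Indices of samples whose novelty score exceeds the threshold."""
    return [i for i, s in enumerate(samples)
            if novelty_score(s, means, stdevs) > threshold]

# Hypothetical two-feature morphology measurements.
control = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.1, 2.1], [0.9, 1.9]]
screen = [[1.05, 1.95], [3.0, 0.5], [0.9, 2.1]]
means, stdevs = fit_reference(control)
print(find_outliers(screen, means, stdevs))
```

Only the control population is needed to fit the reference, so no phenotype classes have to be defined in advance.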

Towards Neuromorphic Learning Machines Using Emerging Memory Devices with Brain-Like Energy Efficiency

Journal of Low Power Electronics and Applications ◽

10.3390/jlpea8040034 ◽

2018 ◽

Vol 8 (4) ◽

pp. 34 ◽

Cited By ~ 10

Author(s):

Vishal Saxena ◽

Xinyu Wu ◽

Ira Srivastava ◽

Kehan Zhu

Keyword(s):

Machine Learning ◽

Energy Efficiency ◽

Deep Learning ◽

Large Scale ◽

Learning Algorithms ◽

Memory Devices ◽

Cognitive Computing ◽

Mixed Signal ◽

Von Neumann ◽

Emerging Memory

The ongoing revolution in Deep Learning is redefining the nature of computing, driven by the increasing number of pattern classification and cognitive tasks. Specialized digital hardware for deep learning still predominates, owing to the flexibility offered by software implementation and the maturity of algorithms. However, there is growing demand for cognitive computing at the edge, i.e., on energy-constrained hand-held devices, where digital von Neumann architectures are energy prohibitive. Recent explorations in digital neuromorphic hardware have shown promise, but offer too little neurosynaptic density for scaling to applications such as intelligent cognitive assistants (ICA). Large-scale integration of nanoscale emerging memory devices with Complementary Metal Oxide Semiconductor (CMOS) mixed-signal integrated circuits can herald a new generation of Neuromorphic computers that will transcend the von Neumann bottleneck for cognitive computing tasks. Such hybrid Neuromorphic System-on-a-chip (NeuSoC) architectures promise machine learning capability at chip-scale form factor, and several orders of magnitude improvement in energy efficiency. Practical demonstration of such architectures has been limited because the performance of emerging memory devices falls short of the behavior expected from idealized memristor-based analog synapses, or weights, and novel machine learning algorithms are needed to take advantage of the device behavior. In this article, we review the challenges involved and present a pathway to realize large-scale mixed-signal NeuSoCs, from device arrays and circuits to spike-based deep learning algorithms with ‘brain-like’ energy-efficiency.

Predicting hepatitis B virus–positive metastatic hepatocellular carcinomas using gene expression profiling and supervised machine learning

Nature Medicine ◽

10.1038/nm843 ◽

2003 ◽

Vol 9 (4) ◽

pp. 416-423 ◽

Cited By ~ 574

Author(s):

Qing-Hai Ye ◽

Lun-Xiu Qin ◽

Marshonna Forgues ◽

Ping He ◽

Jin Woo Kim ◽

...

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Hepatitis B Virus ◽

Hepatitis B ◽

Gene Expression Profiling ◽

Expression Profiling ◽

Supervised Machine Learning ◽

Hepatocellular Carcinomas ◽

B Virus

Breast Cancer Type Classification Using Machine Learning

Journal of Personalized Medicine ◽

10.3390/jpm11020061 ◽

2021 ◽

Vol 11 (2) ◽

pp. 61

Author(s):

Jiande Wu ◽

Chindo Hicks

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Machine Learning ◽

Triple Negative Breast Cancer ◽

Triple Negative ◽

Genomic Research ◽

Support Vector ◽

Cancer Type ◽

Classification Models

Background: Breast cancer is a heterogeneous disease defined by molecular types and subtypes. Advances in genomic research have enabled the use of precision medicine in the clinical management of breast cancer. A critical unmet medical need is distinguishing triple negative breast cancer, the most aggressive and lethal form of breast cancer, from non-triple negative breast cancer. Here we propose the use of a machine learning (ML) approach for classification of triple negative breast cancer and non-triple negative breast cancer patients using gene expression data. Methods: We analyzed RNA-sequencing data from 110 triple negative and 992 non-triple negative breast cancer tumor samples from The Cancer Genome Atlas to select the features (genes) used in the development and validation of the classification models. We evaluated four different classification models (Support Vector Machines, K-nearest neighbor, Naïve Bayes and Decision tree), using features selected at different threshold levels to train the models for classifying the two types of breast cancer. For performance evaluation and validation, the proposed methods were applied to independent gene expression datasets. Results: Among the four ML algorithms evaluated, the Support Vector Machine algorithm classified breast cancer into triple negative and non-triple negative breast cancer more accurately and had fewer misclassification errors than the other three algorithms. Conclusions: The prediction results show that ML algorithms are efficient and can be used for classification of breast cancer into triple negative and non-triple negative breast cancer types.
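Because the two classes are imbalanced (110 triple negative versus 992 non-triple negative samples), misclassification counts are easier to interpret alongside a trivial baseline. The sketch below computes accuracy, sensitivity, and specificity for a hypothetical classifier that always predicts the majority class; the function name and this baseline are illustrative assumptions, not results from the paper.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.
    Here 'positive' means triple negative breast cancer (TNBC)."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }

# Baseline that labels every one of the 1102 samples as non-TNBC.
baseline = binary_metrics(tp=0, fn=110, tn=992, fp=0)
print(baseline)
```

This baseline reaches about 90% accuracy while detecting no triple negative cases at all, which is why sensitivity and per-class error counts matter when comparing the four models.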

A Physics-Infused Deep Learning Model for the Prediction of Refractive Indices and Its Use for the Large-Scale Screening of Organic Compound Space

10.26434/chemrxiv.8796950 ◽

2019 ◽

Author(s):

Mojtaba Haghighatlari ◽

Gaurav Vishwakarma ◽

Mohammad Atif Faiz Afzal ◽

Johannes Hachmann

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Large Scale ◽

Organic Molecules ◽

Learning Model ◽

Training Data ◽

Refractive Indices ◽

Learning Models ◽

Deep Learning Model ◽

Machine Learning Models

We present a multitask, physics-infused deep learning model to accurately and efficiently predict refractive indices (RIs) of organic molecules, and we apply it to a library of 1.5 million compounds. We show that it outperforms earlier machine learning models by a significant margin, and that incorporating known physics into data-derived models provides valuable guardrails. Using a transfer learning approach, we augment the model to reproduce results consistent with higher-level computational chemistry training data, but with a considerably reduced number of corresponding calculations. Prediction errors of machine learning models are typically smallest for commonly observed target property values, consistent with the distribution of the training data. However, since our goal is to identify candidates with unusually large RI values, we propose a strategy to boost the performance of our model in the remoter areas of the RI distribution: We bias the model with respect to the under-represented classes of molecules that have values in the high-RI regime. By adopting a metric popular in web search engines, we evaluate our effectiveness in ranking top candidates. We confirm that the models developed in this study can reliably predict the RIs of the top 1,000 compounds, and are thus able to capture their ranking. We believe that this is the first study to develop a data-derived model that ensures the reliability of RI predictions by model augmentation in the extrapolation region on such a large scale. These results underscore the tremendous potential of machine learning in facilitating molecular (hyper)screening approaches on a massive scale and in accelerating the discovery of new compounds and materials, such as organic molecules with high-RI for applications in opto-electronics.
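The abstract does not name the search-engine metric it adopts. One widely used ranking metric of that kind is normalized discounted cumulative gain (NDCG), sketched here purely as an assumption about what such an evaluation could look like. Using the true RI value directly as the graded relevance is also an assumption, and the RI values are invented.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: later ranks contribute less."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(true_values, predicted_values, k):
    """NDCG@k: rank items by predicted value, score the ranking with true values,
    and normalize by the score of the ideal ranking."""
    order = sorted(range(len(predicted_values)),
                   key=lambda i: predicted_values[i], reverse=True)
    ranked_rel = [true_values[i] for i in order[:k]]
    ideal_rel = sorted(true_values, reverse=True)[:k]
    ideal = dcg(ideal_rel)
    return dcg(ranked_rel) / ideal if ideal > 0 else 0.0

# Hypothetical refractive indices for five compounds.
true_ri = [1.45, 1.80, 1.60, 1.95, 1.50]
good_model = [1.50, 1.78, 1.62, 1.90, 1.48]   # preserves the true top-3 order
swapped = [1.50, 1.92, 1.62, 1.85, 1.48]      # swaps the top two compounds
print(ndcg_at_k(true_ri, good_model, k=3))
print(ndcg_at_k(true_ri, swapped, k=3))
```

A metric of this type rewards getting the order of the highest-value candidates right, even when absolute prediction errors are nonzero.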

Sentiment Analysis using various Machine Learning and Deep Learning Techniques

Journal of the Nigerian Society of Physical Sciences ◽

10.46481/jnsps.2021.308 ◽

2021 ◽

pp. 385-394

Author(s):

V Umarani ◽

A Julian ◽

J Deepa

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Process ◽

Learning Techniques

Sentiment analysis has gained a lot of attention from researchers in the last year because it has been widely applied to a variety of application domains such as business, government, education, sports, tourism, biomedicine, and telecommunication services. Sentiment analysis is an automated computational method for studying or evaluating sentiments, feelings, and emotions expressed as comments, feedback, or critiques. The sentiment analysis process can be automated using machine learning techniques, which analyze text patterns faster. Supervised machine learning is the most widely used approach for sentiment analysis. The proposed work discusses the flow of the sentiment analysis process and investigates common supervised machine learning techniques such as multinomial Naive Bayes, Bernoulli Naive Bayes, logistic regression, support vector machine, random forest, K-nearest neighbor, and decision tree, as well as deep learning techniques such as Long Short-Term Memory and Convolutional Neural Network. The work examines these learning methods using a standard data set. The experimental results demonstrate the performance of the various classifiers in terms of precision, recall, F1-score, ROC curve, accuracy, running time, and k-fold cross-validation. They help in appreciating the novelty of the several deep learning techniques and give the user an overview for choosing the right technique for their application.
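Of the classifiers listed, multinomial Naive Bayes is compact enough to sketch directly. The version below uses add-one (Laplace) smoothing and whitespace tokenization. The training sentences, labels, and function names are invented for illustration and are not taken from the paper's data set or code.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(model, text):
    """Return the class with the highest smoothed log-posterior."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(count / n_docs)  # log prior
        for token in text.lower().split():
            if token not in vocab:
                continue  # skip words never seen in training
            # Add-one smoothing keeps unseen class/word pairs from zeroing the score.
            score += math.log((word_counts[label][token] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = ["great service and friendly staff", "really great product",
        "terrible service and rude staff", "really terrible product"]
labels = ["pos", "pos", "neg", "neg"]
model = train_multinomial_nb(docs, labels)
print(predict(model, "great staff"), predict(model, "terrible product"))
```

The Bernoulli variant mentioned in the abstract differs in modelling word presence or absence per document rather than word counts.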
