Biomarker discovery studies for patient stratification using machine learning analysis of omics data: a scoping review

Objective: To review biomarker discovery studies using omics data for patient stratification which led to clinically validated FDA-cleared tests or laboratory developed tests, in order to identify common characteristics and derive recommendations for future biomarker projects. Design: Scoping review. Methods: We searched PubMed, EMBASE and Web of Science to obtain a comprehensive list of articles from the biomedical literature published between January 2000 and July 2021, describing clinically validated biomarker signatures for patient stratification, derived using statistical learning approaches. All documents were screened to retain only peer-reviewed research articles, review articles or opinion articles, covering supervised and unsupervised machine learning applications for omics-based patient stratification. Two reviewers independently confirmed the eligibility. Disagreements were resolved by consensus. We focused the final analysis on omics-based biomarkers which achieved the highest level of validation, that is, clinical approval of the developed molecular signature as a laboratory developed test or FDA-approved test. Results: Overall, 352 articles fulfilled the eligibility criteria. The analysis of validated biomarker signatures identified multiple common methodological and practical features that may explain the successful test development and guide future biomarker projects. These include study design choices to ensure sufficient statistical power for model building and external testing, suitable combinations of non-targeted and targeted measurement technologies, the integration of prior biological knowledge, strict filtering and inclusion/exclusion criteria, and the adequacy of statistical and machine learning methods for discovery and validation. Conclusions: While most clinically validated biomarker models derived from omics data have been developed for personalised oncology, first applications for non-cancer diseases show the potential of multivariate omics biomarker design for other complex disorders. Distinctive characteristics of prior success stories, such as early filtering and robust discovery approaches, continuous improvements in assay design and experimental measurement technology, and rigorous multicohort validation approaches, enable the derivation of specific recommendations for future studies.

Machine learning for single cell genomics data analysis

10.1101/2021.02.04.429763 ◽

2021 ◽

Author(s):

Félix Raimundo ◽

Laetitia Papaxanthos ◽

Céline Vallot ◽

Jean-Philippe Vert

Keyword(s):

Machine Learning ◽

Single Cell ◽

Network Inference ◽

Method Development ◽

Biological Knowledge ◽

Omics Data ◽

Gene Regulatory Network Inference ◽

Multimodal Data ◽

Low Dimensional ◽

Type Classification

Abstract: Single-cell omics technologies produce large quantities of data describing the genomic, transcriptomic or epigenomic profiles of many individual cells in parallel. In order to infer biological knowledge and develop predictive models from these data, machine learning (ML)-based models are increasingly used due to their flexibility, scalability, and impressive success in other fields. In recent years, we have seen a surge of new ML-based method development for low-dimensional representations of single-cell omics data, batch normalization, cell type classification, trajectory inference, gene regulatory network inference or multimodal data integration. To help readers navigate this fast-moving literature, we survey in this review recent advances in ML approaches developed to analyze single-cell omics data, focusing mainly on peer-reviewed publications published in the last two years (2019-2020).

Applications of Machine Learning in Drug Discovery II: Biomarker Discovery, Patient Stratification and Pharmacoeconomics

Artificial Intelligence in Oncology Drug Discovery and Development ◽

10.5772/intechopen.93160 ◽

2020 ◽

Author(s):

John W. Cassidy

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Biomarker Discovery ◽

Patient Stratification ◽

Applications Of Machine Learning

Practicing precision medicine with intelligently integrative clinical and multi-omics data analysis

Human Genomics ◽

10.1186/s40246-020-00287-z ◽

2020 ◽

Vol 14 (1) ◽

Author(s):

Zeeshan Ahmed

Keyword(s):

Machine Learning ◽

Precision Medicine ◽

Ethical Issues ◽

Predictive Biomarkers ◽

Heterogeneous Data ◽

Prognostic Models ◽

Omics Data ◽

Clinical Predictors ◽

Personalized Care ◽

Complex Disorders

Abstract: Precision medicine aims to empower clinicians to predict the most appropriate course of action for patients with complex diseases like cancer, diabetes, cardiomyopathy, and COVID-19. With a progressive interpretation of the clinical, molecular, and genomic factors at play in diseases, more effective and personalized medical treatments are anticipated for many disorders. Understanding a patient's metabolomic and genetic make-up in conjunction with clinical data will contribute significantly to determining predisposition, diagnostic, prognostic, and predictive biomarkers and paths, ultimately providing optimal and personalized care for diverse and targeted chronic and acute diseases. In clinical settings, we need to model clinical and multi-omics data in a timely manner to find statistical patterns across millions of features and to identify underlying biologic pathways, modifiable risk factors, and actionable information that support early detection and prevention of complex disorders, and development of new therapies for better patient care. It is important to calculate quantitative phenotype measurements, evaluate variants in unique genes and interpret them using ACMG guidelines, find the frequency of pathogenic and likely pathogenic variants without disease indicators, and observe autosomal recessive carriers with a phenotype manifestation in the metabolome. Next, ensuring security to reconcile noise, we need to build and train machine-learning prognostic models to meaningfully process multisource heterogeneous data to identify high-risk rare variants and make medically relevant predictions. The goal, today, is to facilitate implementation of mainstream precision medicine to improve the traditional symptom-driven practice of medicine, and allow earlier interventions using predictive diagnostics and tailoring better-personalized treatments. We strongly recommend automated implementation of cutting-edge technologies, utilizing machine learning (ML) and artificial intelligence (AI) approaches for multimodal data aggregation, multifactor examination, development of a knowledgebase of clinical predictors for decision support, and best strategies for dealing with relevant ethical issues.

Machine learning for precision medicine forecasts and challenges when incorporating non omics and omics data

Intelligent Decision Technologies ◽

10.3233/idt-200044 ◽

2021 ◽

Vol 15 (1) ◽

pp. 69-85

Author(s):

J. Susymary ◽

P. Deepalakshmi

Keyword(s):

Machine Learning ◽

Precision Medicine ◽

Data Analytics ◽

Forecast Model ◽

Biological Knowledge ◽

Omics Data ◽

Future Directions ◽

Hard Data ◽

Data Limitations ◽

Future Project

Precision medicine has emerged as a preventive, diagnostic and treatment tool to approach human diseases in a personalized manner. Since precision medicine incorporates omics data and knowledge in personal health records, people who live in industrially polluted areas have an advantage in the medicinal field. Integration of non-omics data and related biological knowledge in term omics data is a reality. The heterogeneous characteristics of non-omics data and high-dimensional omics data make the integration challenging. Hard data analytics problems create better opportunities in analytics. This review cuts across the boundaries of machine learning models for the eventual development of a successful precision medicine forecast model, different strategies for the integration of non-omics data and omics data, limitations and challenges in data integration, and future directions for precision medicine forecasts. The literature also discusses non-omics data, diseases associated with air pollutants, and omics data. This information gives insight into integrated data analytics and their application in future project implications. It intends to motivate researchers and precision medicine forecast model developers toward a global integrative analytical approach.

Computational Methods for Structure-to-Function Analysis of Diet-Derived Catechins-Mediated Targeting of In Vitro Vasculogenic Mimicry

Cancer Informatics ◽

10.1177/11769351211009229 ◽

2021 ◽

Vol 20 ◽

pp. 117693512110092

Author(s):

Abicumaran Uthamacumaran ◽

Narjara Gonzalez Suarez ◽

Abdoulaye Baniré Diallo ◽

Borhane Annabi

Keyword(s):

Machine Learning ◽

Cancer Cells ◽

Structural Changes ◽

Function Analysis ◽

Vasculogenic Mimicry ◽

Machine Learning Algorithms ◽

Emergent Behavior ◽

Molecular Signature ◽

Ovarian Cancer Cells

Background: Vasculogenic mimicry (VM) is an adaptive biological phenomenon wherein cancer cells spontaneously self-organize into 3-dimensional (3D) branching network structures. This emergent behavior is considered central in conferring an invasive, metastatic, and therapy-resistant molecular signature on cancer cells. The quantitative analysis of such complex phenotypic systems could require the use of computational approaches including machine learning algorithms originating from complexity science. Procedures: In vitro 3D VM was performed with SKOV3 and ES2 ovarian cancer cells cultured on Matrigel. Disruption of VM by diet-derived catechins was monitored at 24 hours with pictures taken with an inverted microscope. Three computational algorithms for complex feature extraction relevant for 3D VM, including 2D wavelet analysis, fractal dimension, and percolation clustering scores, were assessed coupled with machine learning classifiers. Results: These algorithms demonstrated the structure-to-function impact of the galloyl moiety on VM for each of the gallated catechins tested, and were shown to be applicable in quantifying the drug-mediated structural changes in VM processes. Conclusions: Our study provides evidence of how appropriate 3D VM compression and feature extractors coupled with classification/regression methods could be efficient to study in vitro drug-induced perturbation of complex processes. Such approaches could be exploited in the development and characterization of drugs targeting VM.
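
As an illustration of one of the feature extractors named in the abstract above, the following is a minimal sketch of a fractal dimension estimate for a binary image. The abstract does not say which estimator the authors used; box counting is assumed here because it is a common choice, and the function names, the box sizes, and the toy images are all hypothetical.

```python
import math


def box_count(pixels, size):
    """Count the boxes of side `size` that contain at least one foreground pixel."""
    return len({(r // size, c // size) for r, c in pixels})


def box_counting_dimension(image, sizes=(1, 2, 4, 8, 16)):
    """Estimate the fractal dimension of a binary image (rows of 0/1 values).

    The estimate is the least-squares slope of log N(s) against log(1/s),
    where N(s) is the number of occupied boxes of side s.
    """
    pixels = [(r, c) for r, row in enumerate(image)
              for c, value in enumerate(row) if value]
    if not pixels:
        raise ValueError("image has no foreground pixels")

    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(pixels, s)) for s in sizes]

    # Ordinary least-squares slope of ys on xs.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Toy inputs (not data from the study): a filled square and a single row.
filled_square = [[1] * 64 for _ in range(64)]
single_row = [[1 if r == 0 else 0 for _ in range(64)] for r in range(64)]

print(box_counting_dimension(filled_square))
print(box_counting_dimension(single_row))
```

On these toy inputs the estimate is close to 2 for the filled square and close to 1 for the single row; a branching network image would be expected to fall between the two. In a pipeline like the one the abstract describes, such a score would be one feature passed to a classifier alongside the wavelet and percolation features.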

Machine learning for the analysis of multi-omics data

Methods ◽

10.1016/j.ymeth.2021.02.005 ◽

2021 ◽

Author(s):

Yanni Sun

Keyword(s):

Machine Learning ◽

Omics Data

Multiple Sclerosis Biomarker Discoveries by Proteomics and Metabolomics Approaches

Biomarker Insights ◽

10.1177/11772719211013352 ◽

2021 ◽

Vol 16 ◽

pp. 117727192110133

Author(s):

Ameneh Jafari ◽

Amirhesam Babajani ◽

Mostafa Rezaei-Tavirani

Keyword(s):

Multiple Sclerosis ◽

Complex Disease ◽

Biomarker Discovery ◽

Response To Treatment ◽

Inflammatory Disorder ◽

Protein Marker ◽

Complex Disorders ◽

New Information ◽

The Central Nervous System ◽

Monitoring Treatment

Multiple sclerosis (MS) is an autoimmune inflammatory disorder of the central nervous system (CNS) resulting in demyelination and axonal loss in the brain and spinal cord. The precise pathogenesis and etiology of this complex disease are still a mystery. Despite many studies aimed at identifying biomarkers, no protein marker has yet been approved for MS. Biomarkers are urgently needed that could clarify pathology and monitor disease progression, response to treatment, and prognosis in MS. Proteomics and metabolomics analyses are powerful tools to identify putative and novel candidate biomarkers. Analysis of different human compartments using proteomics, metabolomics, and bioinformatics approaches has generated new information for further clarification of MS pathology, elucidating the mechanisms of the disease, finding new targets, and monitoring treatment response. Overall, omics approaches can advance different therapeutic and diagnostic aspects of complex disorders such as multiple sclerosis, from biomarker discovery to personalized medicine.

A Detailed Catalogue of Multi-Omics Methodologies for Identification of Putative Biomarkers and Causal Molecular Networks in Translational Cancer Research

International Journal of Molecular Sciences ◽

10.3390/ijms22062822 ◽

2021 ◽

Vol 22 (6) ◽

pp. 2822

Author(s):

Efstathios Iason Vlachavas ◽

Jonas Bohn ◽

Frank Ückert ◽

Sylvia Nürnberg

Keyword(s):

Cancer Research ◽

Clinical Information ◽

Disease Diagnosis ◽

Molecular Data ◽

Molecular Networks ◽

Biological Knowledge ◽

Omics Data ◽

Translational Cancer Research ◽

Using Data ◽

Biological Entities

Recent advances in sequencing and biotechnological methodologies have led to the generation of large volumes of molecular data of different omics layers, such as genomics, transcriptomics, proteomics and metabolomics. Integration of these data with clinical information provides new opportunities to discover how perturbations in biological processes lead to disease. Using data-driven approaches for the integration and interpretation of multi-omics data could stably identify links between structural and functional information and propose causal molecular networks with potential impact on cancer pathophysiology. This knowledge can then be used to improve disease diagnosis, prognosis, prevention, and therapy. This review will summarize and categorize the most current computational methodologies and tools for integration of distinct molecular layers in the context of translational cancer research and personalized therapy. Additionally, the bioinformatics tools Multi-Omics Factor Analysis (MOFA) and netDX will be tested using omics data from public cancer resources, to assess their overall robustness, provide reproducible workflows for gaining biological knowledge from multi-omics data, and to comprehensively understand the significantly perturbed biological entities in distinct cancer types. We show that the performed supervised and unsupervised analyses result in meaningful and novel findings.

FGF23, a novel muscle biomarker detected in the early stages of ALS

Scientific Reports ◽

10.1038/s41598-021-91496-6 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Ying Si ◽

Mohamed Kazamel ◽

Michael Benatar ◽

Joanne Wuu ◽

Yuri Kwon ◽

...

Keyword(s):

Biomarker Discovery ◽

Clinical Symptoms ◽

Early Stage ◽

Fold Increase ◽

Progressive Increase ◽

Molecular Signature ◽

Muscle Membrane ◽

Sequencing Project ◽

Progressive Muscle Weakness ◽

End Stage

Abstract: Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease characterized by progressive muscle weakness. Skeletal muscle is a prime source for biomarker discovery since it is one of the earliest sites to manifest disease pathology. From a prior RNA sequencing project, we identified FGF23 as a potential muscle biomarker in ALS. Here, we validated this finding with a large collection of ALS muscle samples and found a 13-fold increase over normal controls. FGF23 was also increased in the SOD1G93A mouse, beginning at a very early stage and well before the onset of clinical symptoms. FGF23 levels progressively increased through end-stage in the mouse. Immunohistochemistry of ALS muscle showed prominent FGF23 immunoreactivity in the endomysial connective tissue and along the muscle membrane, and immunoreactivity was significantly higher around grouped atrophic fibers compared to non-atrophic fibers. ELISA of plasma samples from the SOD1G93A mouse showed an increase in FGF23 at end-stage, whereas no increase was detected in a large cohort of ALS patients. In conclusion, FGF23 is a novel muscle biomarker in ALS and joins a molecular signature that emerges in very early preclinical stages. The early appearance of FGF23 and its progressive increase with disease progression offer a new direction for exploring the molecular basis and response to the underlying pathology of ALS.
