Outcome‐guided sparse K‐means for disease subtype discovery via integrating phenotypic data with high‐dimensional transcriptomic data

Quantifying and Comparing Phylogenetic Evolutionary Rates for Shape and Other High-Dimensional Phenotypic Data

Systematic Biology ◽

10.1093/sysbio/syt105 ◽

2014 ◽

Vol 63 (2) ◽

pp. 166-177 ◽

Cited By ~ 107

Author(s):

Dean C. Adams

Keyword(s):

Evolutionary Rates ◽

High Dimensional ◽

Phenotypic Data

Transcriptome Deconvolution of Heterogeneous Tumor Samples with Immune Infiltration

10.1101/146795 ◽

2017 ◽

Cited By ~ 1

Author(s):

Zeya Wang ◽

Shaolong Cao ◽

Jeffrey S. Morris ◽

Jaeil Ahn ◽

Rongjie Liu ◽

...

Keyword(s):

Experimental Validation ◽

Expression Profiles ◽

R Package ◽

High Accuracy ◽

High Dimensional ◽

Transcriptomic Data ◽

Conditional Mode ◽

Improve Accuracy ◽

Heterogeneous Tissues ◽

Heterogeneous Tumor

Abstract: Transcriptomic deconvolution in cancer and other heterogeneous tissues remains challenging. Available methods lack the ability to estimate both component-specific proportions and expression profiles for individual samples. We present DeMixT, a new tool to deconvolve high-dimensional data from mixtures of more than two components. DeMixT implements an iterated conditional mode algorithm and a novel gene-set-based component merging approach to improve accuracy. In a series of experimental validation studies and application to TCGA data, DeMixT showed high accuracy. Improved deconvolution is an important step towards linking tumor transcriptomic data with clinical outcomes. An R package, scripts and data are available: https://github.com/wwylab/DeMixT/.

Perspective: Advancing Understanding of Population Nutrient–Health Relations via Metabolomics and Precision Phenotypes

Advances in Nutrition ◽

10.1093/advances/nmz045 ◽

2019 ◽

Vol 10 (6) ◽

pp. 944-952 ◽

Cited By ~ 5

Author(s):

Stephanie Andraos ◽

Melissa Wake ◽

Richard Saffery ◽

David Burgner ◽

Martin Kussmann ◽

...

Keyword(s):

Nutritional Status ◽

New Technologies ◽

Prediction Models ◽

Population Level ◽

Objective Evaluation ◽

High Dimensional ◽

Nutrition And Health ◽

Phenotypic Data ◽

The Public ◽

Diet And Lifestyle

Abstract: Diet and lifestyle are vital to population health, but their true contribution is difficult to quantify using traditional methods. Nutrient–health relations are typically based on epidemiological associations that are assessed at the population level, traditionally using self-reported dietary and lifestyle data. Unfortunately, such measures are inherently inaccurate. New technologies such as metabolomics can measure nutritional and micronutrient profiles in body fluids, providing objective evaluation of nutritional status. A critical step toward accurate health prediction models would be the building of integrated repositories of nutritional measures combining subjective methods of reporting with objective metabolomics profiles and precise phenotypic data. Here we outline a roadmap to achieve this goal and discuss both the advantages and risks of this approach. We also highlight the uncertain associations between the complexity of high-dimensional data generated in ‘omics research (along with the public confusion this may engender) and the rapid adoption of ‘omics approaches by nutrition and health companies to develop nutritional products and services.

Hierarchical classification of microorganisms based on high-dimensional phenotypic data

Journal of Biophotonics ◽

10.1002/jbio.201700047 ◽

2017 ◽

Vol 11 (3) ◽

pp. e201700047 ◽

Cited By ~ 5

Author(s):

Valeria Tafintseva ◽

Evelyne Vigneau ◽

Volha Shapaval ◽

Véronique Cariou ◽

El Mostafa Qannari ◽

...

Keyword(s):

Hierarchical Classification ◽

High Dimensional ◽

Phenotypic Data

Rampant false detection of adaptive phenotypic optimization by ParTI-based Pareto front inference

Molecular Biology and Evolution ◽

10.1093/molbev/msaa330 ◽

2020 ◽

Author(s):

Mengyi Sun ◽

Jianzhi Zhang

Keyword(s):

Pareto Front ◽

Simulated Data ◽

High Dimensional ◽

Phenotypic Data ◽

Cast Doubt ◽

Population Structures ◽

Molecular Phenotypes ◽

Almost All ◽

Pareto Fronts ◽

Gene Expression Levels

Abstract: Organisms face tradeoffs in performing multiple tasks. Identifying the optimal phenotypes maximizing the organismal fitness (or Pareto front) and inferring the relevant tasks allow testing phenotypic adaptations and help delineate evolutionary constraints, tradeoffs, and critical fitness components, so are of broad interest. It has been proposed that Pareto fronts can be identified from high-dimensional phenotypic data, including molecular phenotypes such as gene expression levels, by fitting polytopes (lines, triangles, tetrahedrons, etc.), and a program named ParTI was recently introduced for this purpose. ParTI has identified Pareto fronts and inferred phenotypes best for individual tasks (or archetypes) from numerous datasets such as the beak morphologies of Darwin’s finches and mRNA concentrations in human tumors, implying evolutionary optimizations of the involved traits. Nevertheless, the reliabilities of these findings are unknown. Using real and simulated data that lack evolutionary optimization, we here report extremely high false positive rates of ParTI. The errors arise from phylogenetic relationships or population structures of the organisms analyzed and the flexibility of data analysis in ParTI that is equivalent to p-hacking. Because these problems are virtually universal, our findings cast doubt on almost all ParTI-based results and suggest that reliably identifying Pareto fronts and archetypes from high-dimensional phenotypic data is currently generally difficult.

G2P: Using machine learning to understand and predict genes causing rare neurological disorders

10.1101/288845 ◽

2018 ◽

Cited By ~ 2

Author(s):

Juan A. Botía ◽

Sebastian Guelfi ◽

David Zhang ◽

Karishma D’Sa ◽

Regina Reynolds ◽

...

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Neurological Disease ◽

Neurological Diseases ◽

Missense Mutations ◽

Specific Expression ◽

Phenotypic Data ◽

Disease Subtype ◽

Neurological Phenotype ◽

Significant Enrichment

Abstract: To facilitate precision medicine and neuroscience research, we developed a machine-learning technique that scores the likelihood that a gene, when mutated, will cause a neurological phenotype. We analysed 1126 genes relating to 25 subtypes of Mendelian neurological disease defined by Genomics England (March 2017) together with 154 gene-specific features capturing genetic variation, gene structure and tissue-specific expression and co-expression. We randomly re-sampled genes with no known disease association to develop bootstrapped decision-tree models, which were integrated to generate a decision tree-based ensemble for each disease subtype. Genes generating larger numbers of distinct transcripts and with higher probability of having missense mutations in normal individuals were significantly more likely to cause neurological diseases. Using mouse-mutant phenotypic data we tested the accuracy of gene-phenotype predictions and found that for 88% of all disease subtypes there was a significant enrichment of relevant phenotypic abnormalities when predicted genes were mutated in mice and in many cases mutations produced specific and matching phenotypes. Furthermore, using only newly identified genes included in the Genomics England November 2017 release, we assessed our gene-phenotype predictions and showed an 8.3-fold enrichment relative to chance for correct predictions. Thus, we demonstrate both the explanatory and predictive power of machine-learning-based models in neurological disease.

phenopype: a phenotyping pipeline for Python

10.1101/2021.03.17.435781 ◽

2021 ◽

Author(s):

Moritz D Luerig

Keyword(s):

Computer Vision ◽

Data Collection ◽

High Throughput ◽

Digital Images ◽

State Of The Art ◽

High Dimensional ◽

Phenotypic Data ◽

Scientific Image ◽

Phenotypic Information ◽

High Throughput Phenotyping

Digital images are a ubiquitous way to represent phenotypes. More and more ecologists and evolutionary biologists are using images to capture and analyze high dimensional phenotypic data to understand complex developmental and evolutionary processes. As a consequence, images are being collected at ever increasing rates, already outpacing our abilities for processing and analysis of the contained phenotypic information. phenopype is a high throughput phenotyping package for the programming language Python to support ecologists and evolutionary biologists in extracting high dimensional phenotypic data from digital images. phenopype integrates existing state-of-the-art computer vision functions (using the OpenCV library as a backend), GUI-based interactions, and a project management ecosystem to facilitate rapid data collection and reproducibility. phenopype offers three different workflow types that support users during different stages of scientific image analysis (prototyping, low-throughput, and high-throughput). In the high-throughput workflow, users interact with human-readable YAML configuration files to effectively modify settings for different images. These settings are stored along with processed images and results, so that the acquired phenotypic information becomes highly reproducible. phenopype combines the advantages of the Python environment, with its state-of-the-art computer vision, array manipulation and data handling libraries, and basic GUI capabilities, which allow users to step into the automatic workflow when necessary. Overall, phenopype is aiming to augment, rather than replace the utility of existing Python CV libraries, allowing biologists to focus on rapid and reproducible data collection.

High-dimensional profiling reveals phenotypic heterogeneity and disease-specific alterations of granulocytes in COVID-19

10.1101/2021.01.27.21250591 ◽

2021 ◽

Author(s):

Magda Lourda ◽

Majda Dzidic ◽

Laura Hertwig ◽

Helena Bergsten ◽

Laura M. Palma Medina ◽

...

Keyword(s):

Clinical Features ◽

High Dimensional ◽

Laboratory Measurements ◽

Phenotypic Data ◽

Altered Expression ◽

Peripheral Protein ◽

Clinical Complications ◽

Virus Clearance ◽

Inflammatory State ◽

And Migration

Abstract: Since the outset of the COVID-19 pandemic, increasing evidence suggests that the innate immune responses play an important role in the disease development. A dysregulated inflammatory state has been proposed as a key driver of clinical complications in COVID-19, with a potential detrimental role of granulocytes. However, a comprehensive phenotypic description of circulating granulocytes in SARS-CoV-2-infected patients is lacking. In this study, we used high-dimensional flow cytometry for granulocyte immunophenotyping in peripheral blood collected from COVID-19 patients during acute and convalescent phases. Severe COVID-19 was associated with increased levels of both mature and immature neutrophils, and decreased counts of eosinophils and basophils. Distinct immunotypes were evident in COVID-19 patients, with altered expression of several receptors involved in activation, adhesion and migration of granulocytes (e.g. CD62L, CD11a/b, CD69, CD63, CXCR4). Paired sampling revealed recovery and phenotypic restoration of the granulocytic signature in the convalescent phase. The identified granulocyte immunotypes correlated with distinct sets of soluble inflammatory markers supporting pathophysiologic relevance. Furthermore, clinical features, including multi-organ dysfunction and respiratory function, could be predicted using combined laboratory measurements and immunophenotyping. This study provides a comprehensive granulocyte characterization in COVID-19 and reveals specific immunotypes with potential predictive value for key clinical features associated with COVID-19.

Significance: Accumulating evidence shows that granulocytes are key modulators of the immune response to SARS-CoV-2 infection and their dysregulation could significantly impact COVID-19 severity and patient recovery after virus clearance. In the present study, we identify selected immune traits in neutrophil, eosinophil and basophil subsets associated with the severity of COVID-19 and with peripheral protein profiles. Moreover, computational modeling indicates that the combined use of phenotypic data and laboratory measurements can effectively predict key clinical outcomes in COVID-19 patients. Finally, patient-matched longitudinal analysis shows phenotypic normalization of granulocyte subsets 4 months after hospitalization. Overall, in this work we extend the current understanding of the distinct contribution of granulocyte subsets to COVID-19 pathogenesis.

P2-249: INVESTIGATION OF DNA METHYLATION, EPIGENETIC AGING AND NEUROPSYCHIATRIC SYMPTOMS AMONG OLDER ADULTS WITH AND WITHOUT AMNESTIC MILD COGNITIVE IMPAIRMENT USING HIGH-DIMENSIONAL MOLECULAR AND PHENOTYPIC DATA

Alzheimer's & Dementia ◽

10.1016/j.jalz.2019.06.2656 ◽

2019 ◽

Vol 15 ◽

pp. P678-P679

Author(s):

Chirag M. Vyas ◽

Jennifer R. Gatchel ◽

David Mischoulon ◽

Grace Chang ◽

Joann E. Manson ◽

...

Keyword(s):

Older Adults ◽

DNA Methylation ◽

Cognitive Impairment ◽

Mild Cognitive Impairment ◽

Neuropsychiatric Symptoms ◽

Amnestic Mild Cognitive Impairment ◽

High Dimensional ◽

Phenotypic Data ◽

Epigenetic Aging

High-Dimensional Probability

10.1017/9781108231596 ◽

2018 ◽

Cited By ~ 101

Author(s):

Roman Vershynin

Keyword(s):

High Dimensional
