Outcome‐guided sparse K‐means for disease subtype discovery via integrating phenotypic data with high‐dimensional transcriptomic data

Author(s):  
Lingsong Meng ◽  
Dorina Avram ◽  
George Tseng ◽  
Zhiguang Huo
2017 ◽  
Author(s):  
Zeya Wang ◽  
Shaolong Cao ◽  
Jeffrey S. Morris ◽  
Jaeil Ahn ◽  
Rongjie Liu ◽  
...  

Abstract Transcriptomic deconvolution in cancer and other heterogeneous tissues remains challenging. Available methods lack the ability to estimate both component-specific proportions and expression profiles for individual samples. We present DeMixT, a new tool to deconvolve high-dimensional data from mixtures of more than two components. DeMixT implements an iterated conditional mode algorithm and a novel gene-set-based component merging approach to improve accuracy. In a series of experimental validation studies and application to TCGA data, DeMixT showed high accuracy. Improved deconvolution is an important step towards linking tumor transcriptomic data with clinical outcomes. An R package, scripts and data are available: https://github.com/wwylab/DeMixT/.


2019 ◽  
Vol 10 (6) ◽  
pp. 944-952 ◽  
Author(s):  
Stephanie Andraos ◽  
Melissa Wake ◽  
Richard Saffery ◽  
David Burgner ◽  
Martin Kussmann ◽  
...  

ABSTRACT Diet and lifestyle are vital to population health, but their true contribution is difficult to quantify using traditional methods. Nutrient–health relations are typically based on epidemiological associations that are assessed at the population level, traditionally using self-reported dietary and lifestyle data. Unfortunately, such measures are inherently inaccurate. New technologies such as metabolomics can measure nutritional and micronutrient profiles in body fluids, providing objective evaluation of nutritional status. A critical step toward accurate health prediction models would be the building of integrated repositories of nutritional measures combining subjective methods of reporting with objective metabolomics profiles and precise phenotypic data. Here we outline a roadmap to achieve this goal and discuss both the advantages and risks of this approach. We also highlight the uncertain associations between the complexity of high-dimensional data generated in ‘omics research (along with the public confusion this may engender) and the rapid adoption of ‘omics approaches by nutrition and health companies to develop nutritional products and services.


2017 ◽  
Vol 11 (3) ◽  
pp. e201700047 ◽  
Author(s):  
Valeria Tafintseva ◽  
Evelyne Vigneau ◽  
Volha Shapaval ◽  
Véronique Cariou ◽  
El Mostafa Qannari ◽  
...  

Author(s):  
Mengyi Sun ◽  
Jianzhi Zhang

Abstract Organisms face tradeoffs in performing multiple tasks. Identifying the optimal phenotypes maximizing the organismal fitness (or Pareto front) and inferring the relevant tasks allow testing phenotypic adaptations and help delineate evolutionary constraints, tradeoffs, and critical fitness components, so are of broad interest. It has been proposed that Pareto fronts can be identified from high-dimensional phenotypic data, including molecular phenotypes such as gene expression levels, by fitting polytopes (lines, triangles, tetrahedrons, etc.), and a program named ParTI was recently introduced for this purpose. ParTI has identified Pareto fronts and inferred phenotypes best for individual tasks (or archetypes) from numerous datasets such as the beak morphologies of Darwin’s finches and mRNA concentrations in human tumors, implying evolutionary optimizations of the involved traits. Nevertheless, the reliabilities of these findings are unknown. Using real and simulated data that lack evolutionary optimization, we here report extremely high false positive rates of ParTI. The errors arise from phylogenetic relationships or population structures of the organisms analyzed and the flexibility of data analysis in ParTI that is equivalent to p-hacking. Because these problems are virtually universal, our findings cast doubt on almost all ParTI-based results and suggest that reliably identifying Pareto fronts and archetypes from high-dimensional phenotypic data is currently generally difficult.


2018 ◽  
Author(s):  
Juan A. Botía ◽  
Sebastian Guelfi ◽  
David Zhang ◽  
Karishma D’Sa ◽  
Regina Reynolds ◽  
...  

Abstract To facilitate precision medicine and neuroscience research, we developed a machine-learning technique that scores the likelihood that a gene, when mutated, will cause a neurological phenotype. We analysed 1126 genes relating to 25 subtypes of Mendelian neurological disease defined by Genomics England (March 2017) together with 154 gene-specific features capturing genetic variation, gene structure and tissue-specific expression and co-expression. We randomly re-sampled genes with no known disease association to develop bootstrapped decision-tree models, which were integrated to generate a decision tree-based ensemble for each disease subtype. Genes generating larger numbers of distinct transcripts and with higher probability of having missense mutations in normal individuals were significantly more likely to cause neurological diseases. Using mouse-mutant phenotypic data, we tested the accuracy of gene-phenotype predictions and found that for 88% of all disease subtypes there was a significant enrichment of relevant phenotypic abnormalities when predicted genes were mutated in mice, and in many cases mutations produced specific and matching phenotypes. Furthermore, using only newly identified genes included in the Genomics England November 2017 release, we assessed our gene-phenotype predictions and showed an 8.3-fold enrichment relative to chance for correct predictions. Thus, we demonstrate both the explanatory and predictive power of machine-learning-based models in neurological disease.


2021 ◽  
Author(s):  
Moritz D Luerig

Digital images are a ubiquitous way to represent phenotypes. More and more ecologists and evolutionary biologists are using images to capture and analyze high-dimensional phenotypic data to understand complex developmental and evolutionary processes. As a consequence, images are being collected at ever-increasing rates, already outpacing our abilities for processing and analysis of the contained phenotypic information. phenopype is a high-throughput phenotyping package for the programming language Python to support ecologists and evolutionary biologists in extracting high-dimensional phenotypic data from digital images. phenopype integrates existing state-of-the-art computer vision functions (using the OpenCV library as a backend), GUI-based interactions, and a project management ecosystem to facilitate rapid data collection and reproducibility. phenopype offers three different workflow types that support users during different stages of scientific image analysis (prototyping, low-throughput, and high-throughput). In the high-throughput workflow, users interact with human-readable YAML configuration files to effectively modify settings for different images. These settings are stored along with processed images and results, so that the acquired phenotypic information becomes highly reproducible. phenopype combines the advantages of the Python environment, with its state-of-the-art computer vision, array manipulation and data handling libraries, and basic GUI capabilities, which allow users to step into the automatic workflow when necessary. Overall, phenopype aims to augment, rather than replace, the utility of existing Python CV libraries, allowing biologists to focus on rapid and reproducible data collection.


2021 ◽  
Author(s):  
Magda Lourda ◽  
Majda Dzidic ◽  
Laura Hertwig ◽  
Helena Bergsten ◽  
Laura M. Palma Medina ◽  
...  

Abstract Since the outset of the COVID-19 pandemic, increasing evidence suggests that the innate immune responses play an important role in disease development. A dysregulated inflammatory state has been proposed as a key driver of clinical complications in COVID-19, with a potential detrimental role of granulocytes. However, a comprehensive phenotypic description of circulating granulocytes in SARS-CoV-2-infected patients is lacking. In this study, we used high-dimensional flow cytometry for granulocyte immunophenotyping in peripheral blood collected from COVID-19 patients during acute and convalescent phases. Severe COVID-19 was associated with increased levels of both mature and immature neutrophils, and decreased counts of eosinophils and basophils. Distinct immunotypes were evident in COVID-19 patients, with altered expression of several receptors involved in activation, adhesion and migration of granulocytes (e.g. CD62L, CD11a/b, CD69, CD63, CXCR4). Paired sampling revealed recovery and phenotypic restoration of the granulocytic signature in the convalescent phase. The identified granulocyte immunotypes correlated with distinct sets of soluble inflammatory markers, supporting pathophysiologic relevance. Furthermore, clinical features, including multi-organ dysfunction and respiratory function, could be predicted using combined laboratory measurements and immunophenotyping. This study provides a comprehensive granulocyte characterization in COVID-19 and reveals specific immunotypes with potential predictive value for key clinical features associated with COVID-19.
Significance Accumulating evidence shows that granulocytes are key modulators of the immune response to SARS-CoV-2 infection and their dysregulation could significantly impact COVID-19 severity and patient recovery after virus clearance. In the present study, we identify selected immune traits in neutrophil, eosinophil and basophil subsets associated with severity of COVID-19 and with peripheral protein profiles. Moreover, computational modeling indicates that the combined use of phenotypic data and laboratory measurements can effectively predict key clinical outcomes in COVID-19 patients. Finally, patient-matched longitudinal analysis shows phenotypic normalization of granulocyte subsets 4 months after hospitalization. Overall, in this work we extend the current understanding of the distinct contribution of granulocyte subsets to COVID-19 pathogenesis.

