omic data
Recently Published Documents





2022 ◽  
Malvika Sudhakar ◽  
Raghunathan Rengaswamy ◽  
Karthik Raman

The progression of tumorigenesis starts with a few mutational and structural driver events in the cell. Various cohort-based computational tools exist to identify driver genes but require a large number of samples to produce reliable results. Many studies use different methods to distinguish driver mutations/genes from mutations that have no impact on tumour progression; however, a small fraction of patients show no mutational events in any known driver genes. Current unsupervised methods map somatic and expression data onto a network to identify the perturbation in the network. Our method is the first machine learning model to classify genes as tumour suppressor genes (TSGs), oncogenes (OGs) or neutral, thus assigning the functional impact of the gene in the patient. In this study, we develop a multi-omic approach, PIVOT (Personalised Identification of driVer OGs and TSGs), to train on experimentally or computationally validated mutational and structural driver events. Given the lack of any gold standards for the identification of personalised driver genes, we label the data using four strategies and, based on classification metrics, show that gene-based labelling strategies perform best. We build different models using SNV, RNA, and multi-omic features, to be chosen according to the data available. Our models trained on multi-omic data improved predictions compared to mutation and expression data, achieving an accuracy >0.99 for the BRCA, LUAD and COAD datasets. We show that network and expression-based features contribute the most to PIVOT. Our predictions on the BRCA, COAD and LUAD cancer types reveal commonly altered genes such as TP53 and PIK3CA, which are predicted drivers for multiple cancer types. Along with known driver genes, our models also identify new driver genes such as PRKCA, SOX9 and PSMD4. Our multi-omic model labels both CNV and mutations, with CNV alterations making the larger contribution. While predicting labels for genes mutated in multiple samples, we also label rare driver events occurring in as few as one sample. We also identify genes with dual roles within the same cancer type. Overall, PIVOT labels personalised driver genes as TSGs and OGs and also identifies rare driver genes. PIVOT is available at

2021 ◽  
Lu Lu ◽  
Joshua D Welch

Motivation: LIGER is a widely used R package for single-cell multi-omic data integration. However, many users prefer to analyze their single-cell datasets in Python, which offers an attractive syntax and highly optimized scientific computing libraries for increased efficiency. Results: We developed PyLiger, a Python package for integrating single-cell multi-omic datasets. PyLiger offers faster performance than the previous R implementation (2-5× speedup), interoperability with the AnnData format, flexible on-disk or in-memory analysis capability, and new functionality for gene ontology enrichment analysis. The on-disk capability enables analysis of arbitrarily large single-cell datasets using fixed memory.

Mathematics ◽  
2021 ◽  
Vol 9 (24) ◽  
pp. 3233
Inmaculada Barranco-Chamorro ◽  
Rosa M. Carrillo-García

Confusion matrices are numerical structures that deal with the distribution of errors between different classes or categories in a classification process. From a quality perspective, it is of interest to know whether the confusion between the true class A and the class labelled as B differs from the confusion between the true class B and the class labelled as A. If it does, a problem with the classifier, or of identifiability between classes, may exist. In this paper two statistical methods are considered to deal with this issue. Both of them focus on the study of the off-diagonal cells in confusion matrices. First, McNemar-type tests of marginal homogeneity are considered, which must be followed by a one-versus-all study for every pair of categories. Second, a Bayesian proposal based on the Dirichlet distribution is introduced. This allows us to assess the probabilities of misclassification in a confusion matrix. Three applications, including a set of omic data, have been carried out using the software R.
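
To make the two ideas in this abstract concrete, here is a minimal Python sketch, not the authors' R code. It assumes rows are true classes and columns are assigned labels, uses the standard exact (binomial) form of McNemar's test on a one-versus-all collapse of the matrix, and assumes a symmetric Dirichlet prior with parameter 1 for each row. The function names and the toy matrix are hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value for discordant counts b and c."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant cases, so nothing to test
    k = min(b, c)
    # Under the null hypothesis, b ~ Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def one_vs_all_counts(matrix, i):
    """Collapse to class i vs. the rest and return the two off-diagonal counts."""
    b = sum(matrix[i]) - matrix[i][i]                  # true i, labelled as something else
    c = sum(row[i] for row in matrix) - matrix[i][i]   # true other, labelled as i
    return b, c


def dirichlet_posterior_mean(row, prior=1.0):
    """Posterior mean of one row's label probabilities under a symmetric Dirichlet prior."""
    total = sum(row) + prior * len(row)
    return [(count + prior) / total for count in row]


# Toy 3-class confusion matrix (assumed data, rows = true class).
confusion = [[50, 10, 2], [3, 40, 5], [1, 4, 45]]
b, c = one_vs_all_counts(confusion, 0)
p_value = mcnemar_exact(b, c)
posterior_row_a = dirichlet_posterior_mean(confusion[0])
print(b, c, round(p_value, 4))
print([round(p, 3) for p in posterior_row_a])
```

A small p-value for a class suggests that the errors made on that class and the errors attributed to it are not balanced, while the posterior means give smoothed estimates of the misclassification probabilities in that row.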

2021 ◽  
Leonardo Duarte Rodrigues Alexandre ◽  
Rafael S. Costa ◽  
Rui Henriques

Motivation: Pattern discovery and subspace clustering play a central role in the biological domain, supporting, for instance, putative regulatory module discovery from omic data for both descriptive and predictive ends. In the presence of target variables (e.g. phenotypes), regulatory patterns should further satisfy clearly delineated discriminative power properties, which are well established for categorical outcomes yet largely disregarded for numerical outcomes, such as risk profiles and quantitative phenotypes. Results: DISA (Discriminative and Informative Subspace Assessment), a Python software package, is proposed to assess patterns in the presence of numerical outcomes using well-established measures together with a novel principle able to statistically assess the correlation gain of the subspace against the overall space. Results confirm that discriminative criteria can be soundly extended to numerical outcomes without the drawbacks commonly associated with discretization procedures. A case study is provided to show the properties of the proposed method. Availability: DISA is freely available at under the MIT license.

2021 ◽  
Sarah Hannah Alves ◽  
Cristovao Antunes de Lanna ◽  
Karla Tereza Figueiredo Leite ◽  
Mariana Boroni ◽  
Marley Maria Bernardes Rebuzzi Vellasco

2021 ◽  
Vol 12 (1) ◽  
J. M. Beman ◽  
S. M. Vargas ◽  
J. M. Wilson ◽  
E. Perez-Coronel ◽  
J. S. Karolewski ◽  

Abstract: Oceanic oxygen minimum zones (OMZs) are globally significant sites of biogeochemical cycling where microorganisms deplete dissolved oxygen (DO) to concentrations <20 µM. Amid intense competition for DO in these metabolically challenging environments, aerobic nitrite oxidation may consume significant amounts of DO and help maintain low DO concentrations, but this remains unquantified. Using parallel measurements of oxygen consumption rates and 15N-nitrite oxidation rates applied to both water column profiles and oxygen manipulation experiments, we show that the contribution of nitrite oxidation to overall DO consumption systematically increases as DO declines below 2 µM. Nitrite oxidation can account for all DO consumption only under DO concentrations <393 nM found in and below the secondary chlorophyll maximum. These patterns are consistent across sampling stations and experiments, reflecting coupling between nitrate reduction and nitrite-oxidizing Nitrospina with high oxygen affinity (based on isotopic and omic data). Collectively our results demonstrate that nitrite oxidation plays a pivotal role in the maintenance and biogeochemical dynamics of OMZs.

2021 ◽  
Madalina Ciortan ◽  
Matthieu Defrance

Subspace clustering identifies multiple feature subspaces embedded in a dataset together with the underlying sample clusters. When applied to omic data, subspace clustering is a challenging task, as additional problems have to be addressed: the curse of dimensionality, the imperfect data quality and cluster separation, the presence of multiple subspaces representative of divergent views of the dataset, and the lack of consensus on the best clustering method. First, we propose a computational method, discover, to perform subspace clustering on tabular high-dimensional data by maximizing the internal clustering score (i.e. cluster compactness) of feature subspaces. Our algorithm can be used in both unsupervised and semi-supervised settings. Secondly, by applying our method to a large set of omic datasets (i.e. microarray, bulk RNA-seq, scRNA-seq), we show that the subspace corresponding to the provided ground truth annotations is rarely the most compact one, contrary to what is assumed by methods that maximize the internal quality of clusters. Our results highlight the difficulty of fully validating subspace clusters (justified by the lack of feature annotations). Tested on identifying the ground-truth subspace, our method compared favorably with competing techniques on all datasets. Finally, we propose a suite of techniques to interpret the clustering results biologically in the absence of annotations. We demonstrate that subspace clustering can provide biologically meaningful sample-wise and feature-wise information, typically missed by traditional methods.
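
The idea of scoring feature subspaces by cluster compactness can be illustrated with a small Python sketch. This is not the method described in the abstract: it assumes cluster labels are already given (closer to the semi-supervised setting), uses one minus the ratio of within-cluster to total sum of squares as a stand-in for an internal clustering score, and searches subspaces exhaustively, which is only feasible for toy data. All names and data are hypothetical.

```python
from itertools import combinations


def compactness(data, labels, features):
    """Return 1 - within-cluster SS / total SS over the chosen features (higher is more compact)."""
    def sum_of_squares(rows):
        total = 0.0
        for f in features:
            values = [row[f] for row in rows]
            mean = sum(values) / len(values)
            total += sum((v - mean) ** 2 for v in values)
        return total

    total_ss = sum_of_squares(data)
    if total_ss == 0:
        return 0.0  # constant features carry no cluster structure
    within_ss = sum(
        sum_of_squares([row for row, lab in zip(data, labels) if lab == cluster])
        for cluster in set(labels)
    )
    return 1.0 - within_ss / total_ss


def best_subspace(data, labels, size):
    """Exhaustively pick the feature subset of the given size with the highest compactness."""
    n_features = len(data[0])
    return max(
        combinations(range(n_features), size),
        key=lambda feats: compactness(data, labels, feats),
    )


# Toy data: features 0 and 2 separate the two clusters, feature 1 is noise.
data = [
    [0.0, 5.1, 1.0],
    [0.1, 1.2, 1.1],
    [0.2, 4.0, 0.9],
    [5.0, 2.3, 9.0],
    [5.1, 4.8, 9.2],
    [4.9, 0.7, 9.1],
]
labels = [0, 0, 0, 1, 1, 1]
chosen = best_subspace(data, labels, 2)
print(chosen)
```

On real omic data the number of candidate subspaces grows combinatorially, so a heuristic or optimization-based search would be needed in place of the exhaustive loop.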

2021 ◽  
Vol 23 (Supplement_6) ◽  
pp. vi214-vi214
Alina Pandele ◽  
Alison Woodward ◽  
Sophie Lankford ◽  
Donald Macarthur ◽  
Ian Kamaly-Asl ◽  

Abstract: Ependymoma (EPN) is the second most common malignant paediatric brain tumour, with a five-year survival rate of only 25% following relapse. While molecular heterogeneity between EPN tumours is well understood, little is known concerning spatially distinct intratumour heterogeneity within patients. In this context, we present a multi-omics integration of expression data at transcriptomic and metabolomic levels revealing intratumour heterogeneity and novel therapeutic targets. Surgically resected ependymoma tissue from two epigenetic subgroups, posterior fossa-A (PF-A) and supratentorial RELA, was first homogenised, and polar metabolites, lipids and RNA were simultaneously extracted from the same cellular population. Using liquid chromatography-mass spectrometry (LC-MS) and RNAseq, 115 metabolites and 1580 upregulated genes were identified between the two subgroups, validating the previously reported genetic clustering of these two subtypes. Anatomically distinct regions were sampled in eight PF-A EPN patients, and multi-omic data was compared across 28 intratumour regions, with at least 3 different regions per patient. Integration of genes and metabolites revealed 124 dysregulated metabolic pathways, encompassing 156 genes and 49 metabolites. A large number of interactions occur in the gluconeogenesis and glycine pathways in 6 out of 8 patients, putatively representing therapeutically relevant ubiquitous metabolic pathways critical for EPN survival. Each anatomical region also presented at least one unique gene-metabolite interaction, demonstrating heterogeneity within and across PF-A EPN tumours. A subset of the eight most prevalent genes across patients (GAD1, NT5C, FBP1, FMO3, HK3, TALDO1, NT5E, ALDH3A1) were selected for in vitro metabolic assays using 10 repurposed cytotoxic agents against PF-A EPN cell lines derived from intratumour regions of the same patient. Five of the eight genes map within the gluconeogenesis metabolic pathway, further highlighting its significance within PF-A EPN. This is the first instance where multi-omic data integration and intratumour heterogeneity have been investigated for paediatric EPN, revealing novel potential targets in the context of gene-metabolite correlations.
