Machine Translation between Paired Single-Cell Multi-Omics Data

2021
Author(s): Xabier Martinez-de-Morentin, Sumeer A. Khan, Robert Lehmann, Jesper Tegner, David Gomez-Cabrero

Single-cell multi-omics technologies enable profiling of several data modalities from the same cell. We designed LIBRA, a neural-network-based framework that learns translations between paired multi-omics profiles through a shared latent space. We demonstrate that LIBRA is state-of-the-art for multi-omics clustering. In addition, LIBRA is more robust than existing tools as the number of cells decreases. Once trained on paired data sets, LIBRA predicts multi-omic profiles using only a single data modality from the same biological system.
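The idea of translating one modality into another through a shared latent space can be illustrated with a deliberately tiny sketch. It assumes a linear encoder that maps modality A to a latent vector and a linear decoder that maps that vector to modality B, trained jointly by stochastic gradient descent on squared error. LIBRA itself uses nonlinear neural networks whose architecture is not reproduced here; the toy paired data and the names `train_translator` and `translate` are hypothetical.

```python
import random


def train_translator(pairs, latent_dim=1, lr=0.01, epochs=200, seed=0):
    """Fit a linear encoder E (modality A -> latent z) and decoder D (z -> modality B).

    Minimizes 0.5 * ||D E a - b||^2 per paired cell with plain SGD.
    Returns the learned matrices and the mean squared error of each epoch.
    """
    rng = random.Random(seed)
    a_dim, b_dim = len(pairs[0][0]), len(pairs[0][1])
    E = [[rng.uniform(-0.5, 0.5) for _ in range(a_dim)] for _ in range(latent_dim)]
    D = [[rng.uniform(-0.5, 0.5) for _ in range(latent_dim)] for _ in range(b_dim)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for a, b in pairs:
            # Encode modality A into the shared latent space.
            z = [sum(E[k][i] * a[i] for i in range(a_dim)) for k in range(latent_dim)]
            # Decode the latent vector into a prediction of modality B.
            pred = [sum(D[j][k] * z[k] for k in range(latent_dim)) for j in range(b_dim)]
            err = [pred[j] - b[j] for j in range(b_dim)]
            total += sum(e * e for e in err)
            # Backpropagate through the decoder before updating it.
            grad_z = [sum(err[j] * D[j][k] for j in range(b_dim)) for k in range(latent_dim)]
            for j in range(b_dim):
                for k in range(latent_dim):
                    D[j][k] -= lr * err[j] * z[k]
            for k in range(latent_dim):
                for i in range(a_dim):
                    E[k][i] -= lr * grad_z[k] * a[i]
        history.append(total / len(pairs))
    return E, D, history


def translate(E, D, a):
    """Predict modality B for a cell measured only in modality A."""
    z = [sum(row[i] * a[i] for i in range(len(a))) for row in E]
    return [sum(row[k] * z[k] for k in range(len(z))) for row in D]


# Hypothetical paired data: modality B is a fixed linear function of modality A.
ts = [-1 + 0.1 * i for i in range(21)]
pairs = [((t, 2 * t), (3 * t, -t)) for t in ts]
E, D, history = train_translator(pairs)
print(round(history[0], 4), round(history[-1], 6))
print(translate(E, D, (0.5, 1.0)))  # should be close to (1.5, -0.5)
```

After training on paired cells, only modality A is needed at prediction time, which mirrors the single-modality prediction described above.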

2019
Author(s): Chenling Xu, Romain Lopez, Edouard Mehlman, Jeffrey Regier, Michael I. Jordan, ...

As single-cell transcriptomics becomes a mainstream technology, the natural next step is to integrate the accumulating data in order to achieve a common ontology of cell types and states. However, owing to various nuisance factors of variation, it is not straightforward to compare gene expression levels across data sets or to automatically assign cell type labels in a new data set based on existing annotations. In this manuscript, we demonstrate that our previously developed method, scVI, provides an effective and fully probabilistic approach for the joint representation and analysis of cohorts of single-cell RNA-seq data sets, while accounting for uncertainty caused by biological and measurement noise. We also introduce single-cell ANnotation using Variational Inference (scANVI), a semi-supervised variant of scVI designed to leverage any available cell state annotations, for instance when only one data set in a cohort is annotated, or when only a few cells in a single data set can be labeled using marker genes. We demonstrate that scVI and scANVI compare favorably to existing methods for data integration and cell state annotation in terms of accuracy, scalability, and adaptability to challenging settings such as a hierarchical structure of cell state labels. We further show that, unlike existing methods, scVI and scANVI represent the integrated data sets with a single generative model that can be used directly for any probabilistic decision-making task, using differential expression as our case study. scVI and scANVI are available as open-source software and can readily be used to facilitate cell state annotation and help ensure consistency and reproducibility across studies.
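A fully probabilistic treatment of counts rests on a count likelihood; the original scVI model scores counts under a zero-inflated negative binomial (ZINB). The sketch below evaluates that likelihood for one gene in one cell, parameterized by a mean `mu`, an inverse dispersion `theta`, and a dropout probability `pi`. It is a hand-written illustration of the distribution, not the scvi-tools implementation; in scVI these parameters are produced by a neural decoder, which is omitted here, and the numeric values below are hypothetical.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with mean mu and inverse dispersion theta."""
    return (
        math.lgamma(x + theta)
        - math.lgamma(theta)
        - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Zero-inflated NB: with probability pi the count is a structural zero."""
    if x == 0:
        return math.log(pi + (1 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    return math.log(1 - pi) + nb_log_pmf(x, mu, theta)


# Hypothetical parameters: probability of observing 0 and 3 counts for a gene with mean 5.
print(math.exp(zinb_log_pmf(0, mu=5.0, theta=2.0, pi=0.1)))
print(math.exp(zinb_log_pmf(3, mu=5.0, theta=2.0, pi=0.1)))
```

The negative binomial variance is mu + mu**2 / theta, so a smaller `theta` models more overdispersion than a Poisson with the same mean.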


Author(s): Hai Yang, Rui Chen, Dongdong Li, Zhe Wang

Motivation: The discovery of cancer subtypes can help explore cancer pathogenesis, determine clinical actionability in treatment, and improve patients' survival rates. However, due to the diversity and complexity of multi-omics data, developing integrated clustering algorithms for tumor molecular subtyping remains challenging.
Results: We propose Subtype-GAN, a deep adversarial learning approach based on a multiple-input multiple-output neural network that models complex omics data accurately. Using the latent variables extracted by the neural network, Subtype-GAN applies consensus clustering and a Gaussian mixture model to identify the molecular subtypes of tumor samples. Compared with other state-of-the-art subtyping approaches, Subtype-GAN achieved outstanding performance on benchmark data sets comprising ∼4,000 TCGA tumors from 10 cancer types. On the comparison data set, we found that the clustering scheme of Subtype-GAN is not always similar to that of the deep learning method AE but is identical to that of NEMO, MCCA, VAE, and other strong approaches. Finally, we applied Subtype-GAN to the BRCA data set and automatically obtained the number of subtypes and the subtype labels of 1031 BRCA tumors. Detailed analysis showed that the identified subtypes are clinically meaningful and show distinct patterns in the feature space, demonstrating the practicality of Subtype-GAN.
Availability: The source code and the clustering results of Subtype-GAN across the benchmark data sets are available at https://github.com/haiyang1986/Subtype-GAN.
Supplementary information: Supplementary data are available at Bioinformatics online.
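Subtype-GAN assigns subtypes by fitting a Gaussian mixture model to latent variables. The following sketch shows only that mixture-fitting step: expectation-maximization (EM) for a one-dimensional Gaussian mixture on hypothetical latent values. It omits the adversarial network and the consensus-clustering step used to choose the number of subtypes; the quantile-based initialization and the variance floor are our own choices, not taken from the paper.

```python
import math


def gmm_em_1d(xs, k=2, iters=100, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM; return (weights, means, variances, labels)."""
    n = len(xs)
    s = sorted(xs)
    # Initialize means at evenly spaced quantiles and variances at the pooled variance.
    means = [s[int((j + 0.5) * n / k)] for j in range(k)]
    mean_all = sum(xs) / n
    variances = [max(sum((x - mean_all) ** 2 for x in xs) / n, var_floor)] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each point (log-sum-exp for stability).
        resp = []
        for x in xs:
            logp = [
                math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                for w, m, v in zip(weights, means, variances)
            ]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            z = sum(p)
            resp.append([pj / z for pj in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, var_floor)
    # Hard subtype label = component with the highest responsibility.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels


# Hypothetical one-dimensional latent values for ten tumors.
latent = [-0.2, -0.1, 0.0, 0.1, 0.2, 4.8, 4.9, 5.0, 5.1, 5.2]
weights, means, variances, labels = gmm_em_1d(latent, k=2)
print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In practice the latent space is multi-dimensional and EM is sensitive to initialization, which is one reason to combine it with a consensus procedure rather than trusting a single run.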


2020
Vol 12 (17), pp. 2804
Author(s): Junmin Liu, Yunqiao Feng, Changsheng Zhou, Chunxia Zhang

Pansharpening is a typical image fusion problem that aims to produce a high-resolution multispectral (HRMS) image by integrating a high-spatial-resolution panchromatic (PAN) image with a low-spatial-resolution multispectral (MS) image. Prior work has used either component substitution (CS)-based methods or multiresolution analysis (MRA)-based methods for this purpose. Although these methods are simple and easy to implement, they usually suffer from spatial or spectral distortions and cannot fully exploit the spatial and/or spectral information in PAN and MS images. Considering their complementary performance, and with the goal of combining their advantages, we propose a pansharpening weight network (PWNet) that adaptively averages the fusion results obtained by different methods. PWNet learns adaptive weight maps for different CS-based and MRA-based methods through an end-to-end trainable neural network (NN). As a result, PWNet inherits the data adaptability and flexibility of NNs while maintaining the advantages of traditional methods. Extensive experiments on data sets acquired by three different kinds of satellites demonstrate the superiority of PWNet and its competitiveness with state-of-the-art methods.
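The combination step described above, a per-pixel weighted average of the outputs of several traditional fusion methods, can be sketched as follows. Assumptions: each candidate is a single-band image stored as a nested list, the per-pixel weights come from a softmax over score maps so that they are positive and sum to one, and the score maps (`logits`) are supplied directly instead of being predicted by a trained network as in PWNet. The function name and example images are hypothetical.

```python
import math


def weighted_fusion(candidates, logits):
    """Average K candidate fused images pixel by pixel with softmax-normalized weight maps.

    candidates, logits: lists of K images, each an H x W nested list of floats.
    """
    k = len(candidates)
    h, w = len(candidates[0]), len(candidates[0][0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            top = max(logits[c][i][j] for c in range(k))
            e = [math.exp(logits[c][i][j] - top) for c in range(k)]
            z = sum(e)
            # Convex combination: each output pixel lies between the candidates' values.
            out[i][j] = sum(e[c] / z * candidates[c][i][j] for c in range(k))
    return out


cs_result = [[1.0, 1.0], [1.0, 1.0]]   # hypothetical CS-based output
mra_result = [[3.0, 3.0], [3.0, 3.0]]  # hypothetical MRA-based output
equal = [[[0.0, 0.0], [0.0, 0.0]]] * 2  # equal scores -> equal weights
print(weighted_fusion([cs_result, mra_result], equal))  # [[2.0, 2.0], [2.0, 2.0]]
```

Because the weights vary per pixel, the combination can favor a different method in different image regions, which is the adaptivity the abstract describes.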


2011
Vol 21 (04), pp. 311-317
Author(s): Alexis Marcano-Cedeño, A. Marin-de-la-Barcena, J. Jimenez-Trillo, J. A. Piñuela, D. Andina

The assessment of the risk of credit default is important for financial institutions. Different artificial neural networks (ANNs) have been suggested to tackle the credit scoring problem; however, the resulting error rates are often high. In the search for the best ANN algorithm for credit scoring, this paper contributes the application of an ANN training algorithm inspired by the biological property of neuronal metaplasticity. This algorithm is especially efficient when few patterns of a class are available, or when information inherent to low-probability events is crucial for a successful application, because weight updates are emphasized more for less frequent activations than for more frequent ones. Two well-known, readily available data sets, the Australian and German credit data sets, were used to test the algorithm. The results obtained by this algorithm (AMMLP) were superior to those of state-of-the-art classification algorithms in credit scoring.
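The principle that less frequent inputs should produce larger weight updates can be illustrated on a single logistic neuron rather than a full multilayer perceptron. In this sketch each gradient step is multiplied by the inverse of an assumed input density f(x) = a * exp(-b * ||x||^2), so inputs far from the bulk of the data are treated as rare. The density form, the constants `a` and `b`, the helper names, and the toy imbalanced data are illustrative assumptions, not the exact AMMLP formulation or the credit data sets.

```python
import math


def metaplastic_factor(x, a=1.0, b=0.25):
    """Inverse of an assumed input density f(x) = a * exp(-b * ||x||^2).

    Inputs far from the origin are treated as rare and get a larger factor.
    """
    return math.exp(b * sum(v * v for v in x)) / a


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_neuron(data, lr=0.1, epochs=100, a=1.0, b=0.25):
    """Logistic neuron trained by SGD on cross-entropy, with metaplasticity-scaled steps."""
    w = [0.0] * (len(data[0][0]) + 1)  # last entry is the bias
    for _ in range(epochs):
        for x, y in data:
            xb = list(x) + [1.0]
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, xb)))
            step = lr * metaplastic_factor(x, a, b)
            for i in range(len(w)):
                w[i] -= step * (p - y) * xb[i]  # cross-entropy gradient is (p - y) * x
    return w


def predict(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])))


# Hypothetical imbalanced data: four frequent class-0 patterns, one rare class-1 pattern.
data = [((0.0, 0.0), 0), ((0.1, 0.0), 0), ((0.0, 0.1), 0), ((-0.1, 0.0), 0), ((1.5, 1.5), 1)]
w = train_neuron(data)
print(predict(w, (1.5, 1.5)), predict(w, (0.0, 0.0)))
```

Here the single class-1 pattern receives roughly three times the step size of the patterns near the origin, which is one way to keep a rare class from being drowned out by frequent ones.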


2021
Author(s): Geert-Jan Huizing, Gabriel Peyré, Laura Cantini

The recent advent of high-throughput single-cell molecular profiling is revolutionizing biology and medicine by unveiling the diversity of cell types and states contributing to development and disease. The identification and characterization of cellular heterogeneity is typically achieved through unsupervised clustering, which crucially relies on a similarity metric. We here propose the use of Optimal Transport (OT) as a cell-cell similarity metric for single-cell omics data. OT defines distances to compare, in a geometrically faithful way, high-dimensional data represented as probability distributions. It is thus expected to better capture complex relationships between features and to improve performance over state-of-the-art metrics. To speed up computations and cope with the high dimensionality of single-cell data, we consider the entropic regularization of the classical OT distance. We then extensively benchmark OT against state-of-the-art metrics over thirteen independent datasets, including simulated, scRNA-seq, scATAC-seq and single-cell DNA methylation data. First, we test the ability of the metrics to detect the similarity between cells belonging to the same groups (e.g. cell types, cell lines of origin). Then, we apply unsupervised clustering and test the quality of the resulting clusters. In our in-depth evaluation, OT improves cell-cell similarity inference and cell clustering in all simulated and real scRNA-seq data, while its performance is comparable with Pearson correlation in scATAC-seq and single-cell DNA methylation data. All our analyses are reproducible through the OT-scOmics Jupyter notebook available at https://github.com/ComputationalSystemsBiology/OT-scOmics.
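Entropic regularization makes OT computable with the Sinkhorn algorithm, which alternately rescales the rows and columns of a Gibbs kernel exp(-C / eps). The sketch below compares two cells represented as probability distributions over three features. The cost matrix, the regularization strength `eps`, and the fixed iteration count are illustrative assumptions; OT-scOmics derives its costs from the data, and this is not its code. The returned value is the transport cost (the sum of P_ij * C_ij) of the regularized plan P.

```python
import math


def sinkhorn(a, b, cost, eps=0.5, iters=500):
    """Entropy-regularized OT between histograms a and b with ground cost matrix `cost`.

    Returns the transport plan P and the transport cost sum_ij P_ij * cost_ij.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternate scaling so the plan's row sums match a and its column sums match b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    dist = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, dist


# Two hypothetical cells as normalized profiles over three features,
# with a hypothetical ground cost between features.
cell_1 = [0.5, 0.3, 0.2]
cell_2 = [0.2, 0.3, 0.5]
cost = [[0.0, 0.5, 1.0], [0.5, 0.0, 0.5], [1.0, 0.5, 0.0]]
plan, d12 = sinkhorn(cell_1, cell_2, cost)
_, d11 = sinkhorn(cell_1, cell_1, cost)
print(round(d12, 4), round(d11, 4))
```

Larger `eps` makes the iterations converge faster but blurs the plan; as a side effect the regularized cost of a cell with itself is not zero, which debiased variants such as the Sinkhorn divergence correct for.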


2021
Author(s): Nafiseh Erfanian, A. Ali Heydari, Pablo Ianez, Afshin Derakhshani, Mohammad Ghasemigol, ...

Deep learning (DL) is a branch of machine learning (ML) capable of extracting high-level features from raw inputs in multiple stages. Compared to traditional ML, DL models have provided significant improvements across a range of domains and applications. Single-cell (SC) omics data are often high-dimensional, sparse, and complex, making DL techniques ideal for analyzing and processing such data. We examine DL applications in a variety of single-cell omics (genomics, transcriptomics, proteomics, metabolomics and multi-omics integration) and address whether DL techniques will prove advantageous or whether the SC omics domain poses unique challenges. Through a systematic literature review, we found that DL has not yet revolutionized or addressed the most pressing challenges of the SC omics field. However, DL models for single-cell omics have shown promising results, in many cases outperforming the previous state-of-the-art models, although they often lack the needed biological interpretability. Although such developments have generally been gradual, recent advances reveal that DL methods can offer valuable resources for fast-tracking and advancing SC research.


2019
Vol 21 (1), pp. 365-393
Author(s): Yanxiang Deng, Amanda Finck, Rong Fan

Single-cell omics studies provide unique information regarding cellular heterogeneity at various levels of the central dogma of molecular biology. This knowledge facilitates a deeper understanding of how underlying molecular and architectural changes alter cell behavior, development, and disease processes. The emerging microchip-based tools for single-cell omics analysis are enabling the evaluation of cellular omics with high throughput, improved sensitivity, and reduced cost. We review state-of-the-art microchip platforms for profiling genomics, epigenomics, transcriptomics, proteomics, metabolomics, and multi-omics at single-cell resolution. We also discuss the background of, and challenges in, the analysis of each molecular layer and the integration of multiple levels of omics data, as well as how microchip-based methodologies benefit these fields. Additionally, we examine the advantages and limitations of these approaches. Looking forward, we describe additional challenges and future opportunities that will facilitate the improvement and broad adoption of single-cell omics in life science and medicine.


2021
Author(s): Stefan Canzar, Van Hoan Do, Slobodan Jelic, Soeren Laue, Domagoj Matijevic, ...

Metric multidimensional scaling is one of the classical methods for embedding data into low-dimensional Euclidean space. It creates the low-dimensional embedding by approximately preserving the pairwise distances between the input points. However, current state-of-the-art approaches only scale to a few thousand data points. For larger data sets such as those occurring in single-cell RNA sequencing experiments, the running time becomes prohibitively large and thus alternative methods such as PCA are widely used instead. Here, we propose a neural network based approach for solving the metric multidimensional scaling problem that is orders of magnitude faster than previous state-of-the-art approaches, and hence scales to data sets with up to a few million cells. At the same time, it provides a non-linear mapping between high- and low-dimensional space that can place previously unseen cells in the same embedding.
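Metric MDS seeks low-dimensional coordinates whose pairwise Euclidean distances approximate the input distances, commonly by minimizing the raw stress, the sum over pairs i < j of (||x_i - x_j|| - D_ij)^2. The sketch below minimizes this objective by plain gradient descent on the coordinates themselves. It is the classical, non-neural formulation rather than the paper's network, and the learning rate, iteration count and toy distances are assumptions. A neural approach instead learns a mapping from input features to coordinates, which is what allows previously unseen cells to be placed.

```python
import math
import random


def stress(X, D):
    """Raw stress: squared mismatch between embedded and target distances over all pairs."""
    n = len(X)
    return sum((math.dist(X[i], X[j]) - D[i][j]) ** 2 for i in range(n) for j in range(i + 1, n))


def mds_gradient_descent(D, dim=2, lr=0.01, iters=500, seed=0):
    """Minimize raw stress by gradient descent; returns (coordinates, initial stress)."""
    rng = random.Random(seed)
    n = len(D)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    initial = stress(X, D)
    for _ in range(iters):
        grad = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                diff = [X[i][k] - X[j][k] for k in range(dim)]
                dij = max(math.sqrt(sum(c * c for c in diff)), 1e-12)
                # d/dx_i of (d_ij - D_ij)^2 = 2 (d_ij - D_ij) (x_i - x_j) / d_ij
                coef = 2.0 * (dij - D[i][j]) / dij
                for k in range(dim):
                    grad[i][k] += coef * diff[k]
        for i in range(n):
            for k in range(dim):
                X[i][k] -= lr * grad[i][k]
    return X, initial


# Toy target distances: the corners of a unit square.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
D = [[math.dist(p, q) for q in corners] for p in corners]
X, initial = mds_gradient_descent(D)
print(round(initial, 4), round(stress(X, D), 6))
```

Each iteration touches all n^2 pairs, which illustrates why direct stress minimization becomes prohibitive for millions of cells.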


2021
Vol 8
Author(s): Nico Gerstner, Tim Kehl, Kerstin Lenhof, Lea Eckhart, Lara Schneider, ...

Experimental high-throughput techniques, like next-generation sequencing or microarrays, are nowadays routinely applied to create detailed molecular profiles of cells. In general, these platforms generate high-dimensional and noisy data sets. For their analysis, powerful bioinformatics tools are required to gain novel insights into the biological processes under investigation. Here, we present an overview of the GeneTrail tool suite that offers rich functionality for the analysis and visualization of (epi-)genomic, transcriptomic, miRNomic, and proteomic profiles. Our framework enables the analysis of standard bulk, time-series, and single-cell measurements and includes various state-of-the-art methods to identify potentially deregulated biological processes and to detect driving factors within those deregulated processes. We highlight the capabilities of our web service with an analysis of a single-cell COVID-19 data set that demonstrates its potential for uncovering complex molecular mechanisms. GeneTrail can be accessed freely and without login requirements at http://genetrail.bioinf.uni-sb.de.
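A classic example of a method for identifying potentially deregulated biological processes is over-representation analysis (ORA): a one-sided hypergeometric test of whether a category contains more deregulated genes than expected by chance. The sketch below computes that p-value with Python's standard library. The gene counts are hypothetical, and this illustrates the statistic only; it is not GeneTrail's implementation or API, and multiple-testing correction across categories is omitted.

```python
from math import comb


def ora_pvalue(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the reference set, K: genes in the category,
    n: genes in the test set (e.g. deregulated genes), k: test-set genes in the category.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical example: 20,000 reference genes, a 100-gene pathway,
# 200 deregulated genes of which 8 fall into the pathway (1 expected by chance).
print(ora_pvalue(20000, 100, 200, 8))
```

Exact integer arithmetic via `math.comb` avoids overflow for realistic gene counts; for a real analysis the p-values of all tested categories would then be adjusted for multiple testing.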


2022
Author(s): Huidong Chen, Jayoung Ryu, Michael Vinyard, Adam Lerer, Luca Pinello

Recent advances in single-cell omics technologies enable the individual and joint profiling of cellular measurements including gene expression, epigenetic features, chromatin structure and DNA sequences. Currently, most single-cell analysis pipelines are cluster-centric, i.e., they first cluster cells into non-overlapping cellular states and then extract their defining genomic features. These approaches assume that discrete clusters correspond to biologically relevant subpopulations and do not explicitly model the interactions between different feature types. In addition, single-cell methods are generally designed for a particular task, as distinct single-cell problems are formulated differently. To address these current shortcomings, we present SIMBA, a graph embedding method that jointly embeds single cells and their defining features, such as genes, chromatin accessible regions, and transcription factor binding sequences, into a common latent space. By leveraging the co-embedding of cells and features, SIMBA allows for the study of cellular heterogeneity, clustering-free marker discovery, gene regulation inference, batch effect removal, and omics data integration. SIMBA has been extensively applied to scRNA-seq, scATAC-seq, and dual-omics data. We show that SIMBA provides a single framework that allows diverse single-cell analysis problems to be formulated in a unified way and thus simplifies the development of new analyses and integration of other single-cell modalities.
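Co-embedding starts from a graph in which cells and features are both nodes. A minimal sketch of that input is shown below, assuming (as a simplification) that nonzero values in a cell-by-feature matrix are discretized into levels by hypothetical thresholds and each level becomes an edge type. It is paired with a cosine-similarity query that ranks features for a cell once embeddings exist, similar in spirit to clustering-free marker discovery. The graph-embedding training itself is omitted, the function names are hypothetical, and the embedding vectors in the example are made up.

```python
import math


def build_cell_feature_edges(matrix, cell_ids, feature_ids, thresholds):
    """Turn a cell x feature matrix into (cell, relation, feature) triples.

    Zero entries produce no edge; nonzero values are binned by `thresholds`,
    and the bin index becomes the relation type of the edge.
    """
    edges = []
    for cell, row in zip(cell_ids, matrix):
        for feature, value in zip(feature_ids, row):
            if value <= 0:
                continue
            level = sum(value > t for t in thresholds)
            edges.append((cell, f"level_{level}", feature))
    return edges


def rank_features(cell_vec, feature_vecs):
    """Rank features by cosine similarity to a cell in the shared embedding space."""
    def cosine(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))
    return sorted(feature_vecs, key=lambda f: cosine(cell_vec, feature_vecs[f]), reverse=True)


counts = [[0, 1, 5], [3, 0, 0]]  # hypothetical counts for two cells and three genes
print(build_cell_feature_edges(counts, ["c0", "c1"], ["g0", "g1", "g2"], thresholds=[2]))
# [('c0', 'level_0', 'g1'), ('c0', 'level_1', 'g2'), ('c1', 'level_1', 'g0')]

embeddings = {"g0": (0.9, 0.1), "g1": (0.0, 1.0), "g2": (-1.0, 0.0)}  # hypothetical
print(rank_features((1.0, 0.0), embeddings))  # ['g0', 'g1', 'g2']
```

Because cells and features live in one space, the same similarity query can be run in either direction, for example ranking cells for a given gene.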

