Coupled Co-clustering-based Unsupervised Transfer Learning for the Integrative Analysis of Single-Cell Genomic Data

2020 ◽  
Author(s):  
Pengcheng Zeng ◽  
Jiaxuan Wangwu ◽  
Zhixiang Lin

Abstract Unsupervised methods, such as clustering methods, are essential to the analysis of single-cell genomic data. Most current clustering methods are designed for one data type only, such as scRNA-seq, scATAC-seq or sc-methylation data alone, and only a few have been developed for the integrative analysis of multiple data types. Integrative analysis of multimodal single-cell genomic data sets leverages the power of multiple data sets and can deepen biological insight. We propose a coupled co-clustering-based unsupervised transfer learning algorithm (coupleCoC) for the integrative analysis of multimodal single-cell data. Our proposed coupleCoC builds upon the information-theoretic co-clustering framework. We applied coupleCoC to the integrative analysis of scATAC-seq and scRNA-seq data, sc-methylation and scRNA-seq data, and scRNA-seq data from mouse and human. We demonstrate that coupleCoC improves the overall clustering performance and matches the cell subpopulations across multimodal single-cell genomic data sets. The software and data sets are available at https://github.com/cuhklinlab/coupleCoC.

Author(s):  
Pengcheng Zeng ◽  
Jiaxuan Wangwu ◽  
Zhixiang Lin

Abstract Unsupervised methods, such as clustering methods, are essential to the analysis of single-cell genomic data. Most current clustering methods are designed for one data type only, such as single-cell RNA sequencing (scRNA-seq), single-cell ATAC sequencing (scATAC-seq) or sc-methylation data alone, and only a few have been developed for the integrative analysis of multiple data types. The integrative analysis of multimodal single-cell genomic data sets leverages the power of multiple data sets and can deepen biological insight. In this paper, we propose a coupled co-clustering-based unsupervised transfer learning algorithm (coupleCoC) for the integrative analysis of multimodal single-cell data. Our proposed coupleCoC builds upon the information-theoretic co-clustering framework. In co-clustering, both the cells and the genomic features are clustered simultaneously. Clustering similar genomic features reduces the noise in single-cell data and facilitates the transfer of knowledge across single-cell datasets. We applied coupleCoC to the integrative analysis of scATAC-seq and scRNA-seq data, sc-methylation and scRNA-seq data, and scRNA-seq data from mouse and human. We demonstrate that coupleCoC improves the overall clustering performance and matches the cell subpopulations across multimodal single-cell genomic datasets. Our method coupleCoC is also computationally efficient and can scale up to large datasets. Availability: The software and datasets are available at https://github.com/cuhklinlab/coupleCoC.
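The information-theoretic co-clustering framework that coupleCoC builds on treats the normalized cell-by-feature count matrix as a joint distribution p(x, y). It then looks for cell clusters and feature clusters that lose as little mutual information as possible, I(X;Y) - I(X̂;Ŷ).

The minimal Python sketch below only evaluates that loss for given cluster labels. It does not show the coupling with a source data set or the alternating cluster updates used by coupleCoC. The function names and the toy matrix are hypothetical.

```python
import math


def mutual_information(p):
    """I(X;Y) in nats for a joint distribution p given as a list of rows."""
    row = [sum(r) for r in p]
    col = [sum(c) for c in zip(*p)]
    mi = 0.0
    for i, r in enumerate(p):
        for j, pij in enumerate(r):
            if pij > 0:  # zero cells contribute nothing (0 * log 0 = 0)
                mi += pij * math.log(pij / (row[i] * col[j]))
    return mi


def compress(p, row_labels, col_labels):
    """Sum p(x, y) within each co-cluster to obtain p(x_hat, y_hat)."""
    k, l = max(row_labels) + 1, max(col_labels) + 1
    q = [[0.0] * l for _ in range(k)]
    for i, r in enumerate(p):
        for j, pij in enumerate(r):
            q[row_labels[i]][col_labels[j]] += pij
    return q


def co_clustering_loss(counts, row_labels, col_labels):
    """Information lost by the co-clustering: I(X;Y) - I(X_hat;Y_hat), never negative."""
    total = sum(sum(r) for r in counts)
    p = [[c / total for c in r] for r in counts]  # cells x features -> joint distribution
    return mutual_information(p) - mutual_information(compress(p, row_labels, col_labels))


# Hypothetical toy count matrix: 4 cells x 4 features with two clear blocks.
counts = [[5, 5, 0, 0], [5, 5, 0, 0], [0, 0, 5, 5], [0, 0, 5, 5]]
print(co_clustering_loss(counts, [0, 0, 1, 1], [0, 0, 1, 1]))  # labels respecting the blocks
print(co_clustering_loss(counts, [0, 1, 0, 1], [0, 0, 1, 1]))  # cell labels that mix blocks
```

With labels that respect the two blocks, the loss is numerically zero. Mixing the cells raises it to log 2, and an alternating co-clustering update is designed to avoid that kind of increase.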


Metabolomics ◽  
2019 ◽  
Vol 16 (1) ◽  
Author(s):  
Masoumeh Alinaghi ◽  
Hanne Christine Bertram ◽  
Anders Brunse ◽  
Age K. Smilde ◽  
Johan A. Westerhuis

Abstract Introduction: Integrative analysis of multiple data sets can provide complementary information about the studied biological system. However, fusing multiple biological data sets can be complicated, as the data sets might contain different sources of variation due to underlying experimental factors. Therefore, taking the experimental design of the data sets into account can be important in data fusion. Objectives: In the present work, we aim to incorporate experimental design information into the integrative analysis of multiple designed data sets. Methods: Here we describe penalized exponential ANOVA simultaneous component analysis (PE-ASCA), a new method for the integrative analysis of data sets from multiple compartments or analytical platforms with the same underlying experimental design. Results: Using two simulated cases, the results of simultaneous component analysis (SCA), penalized exponential simultaneous component analysis (P-ESCA) and ANOVA simultaneous component analysis (ASCA) are compared with those of the proposed method. Furthermore, real metabolomics data obtained from NMR analysis of two different brain tissues (hypothalamus and midbrain) from the same piglets, with an underlying experimental design, are investigated by PE-ASCA. Conclusions: This method provides an improved understanding of the common and distinct variation in response to different experimental factors.
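ASCA-type methods start by splitting each column-centered data block into an effect matrix for each design factor plus a residual. Component analysis is then applied to the effect matrices.

The minimal Python sketch below shows only that first decomposition step, for a single hypothetical treatment factor. The penalized simultaneous component model that distinguishes PE-ASCA is not reproduced, and all names and values are illustrative. Because the blocks share one design, the same `levels` list is reused for each block.

```python
def column_means(rows):
    """Mean of each column of a list-of-rows matrix."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def anova_decompose(X, levels):
    """Split X (samples x variables) into grand mean, factor effect matrix and residual.

    levels[i] is the design level of sample i (one factor; labels are hypothetical).
    """
    grand = column_means(X)
    centered = [[x - m for x, m in zip(row, grand)] for row in X]
    level_means = {
        lvl: column_means([row for row, lab in zip(centered, levels) if lab == lvl])
        for lvl in set(levels)
    }
    # Effect matrix X_A: every sample is replaced by the mean of its level.
    effect = [list(level_means[lab]) for lab in levels]
    residual = [[c - e for c, e in zip(crow, erow)] for crow, erow in zip(centered, effect)]
    return grand, effect, residual


def sum_squares(M):
    """Total sum of squares of all entries."""
    return sum(v * v for row in M for v in row)


# Two hypothetical blocks measured on the same four samples share one design.
levels = ["ctrl", "ctrl", "diet", "diet"]
block_a = [[1.0, 2.0], [3.0, 2.0], [5.0, 6.0], [7.0, 6.0]]
block_b = [[0.5, 1.0, 2.0], [0.7, 1.2, 2.2], [0.4, 2.0, 1.0], [0.6, 2.2, 1.2]]
for block in (block_a, block_b):
    _, effect, residual = anova_decompose(block, levels)
    # For a one-factor design, SS(centered) = SS(effect) + SS(residual).
    print(sum_squares(effect), sum_squares(residual))
```

In an ASCA-style analysis, the effect matrices, rather than the raw blocks, are then passed to the component model. This is how the experimental design enters the fusion.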


Author(s):  
Yuhan Hao ◽  
Stephanie Hao ◽  
Erica Andersen-Nissen ◽  
William M. Mauck ◽  
Shiwei Zheng ◽  
...  

Abstract The simultaneous measurement of multiple modalities, known as multimodal analysis, represents an exciting frontier for single-cell genomics and necessitates new computational methods that can define cellular states based on multiple data types. Here, we introduce ‘weighted-nearest neighbor’ analysis, an unsupervised framework to learn the relative utility of each data type in each cell, enabling an integrative analysis of multiple modalities. We apply our procedure to a CITE-seq dataset of hundreds of thousands of human white blood cells alongside a panel of 228 antibodies to construct a multimodal reference atlas of the circulating immune system. We demonstrate that integrative analysis substantially improves our ability to resolve cell states and validate the presence of previously unreported lymphoid subpopulations. Moreover, we demonstrate how to leverage this reference to rapidly map new datasets and to interpret immune responses to vaccination and COVID-19. Our approach represents a broadly applicable strategy to analyze single-cell multimodal datasets, including paired measurements of RNA and chromatin state, and to look beyond the transcriptome towards a unified and multimodal definition of cellular identity. Availability: Installation instructions, documentation, tutorials, and CITE-seq datasets are available at http://www.satijalab.org/seurat
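In weighted-nearest neighbor analysis, each cell receives a weight for each modality, and cell-to-cell similarity is a per-cell weighted combination of within-modality affinities.

The simplified Python sketch below assumes the weights are already given. It uses a fixed-bandwidth exponential kernel on Euclidean distances. Seurat instead learns the weights from how well each modality's neighbors predict a cell's profile and uses a more elaborate kernel, so this is not Seurat's implementation. The function names, bandwidth and toy embeddings are hypothetical.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_neighbors(modalities, weights, k, bandwidth=1.0):
    """Return the k most similar cells for every cell under a weighted affinity.

    modalities: list of per-modality embeddings, each a list of per-cell vectors.
    weights: weights[m][i] is the (assumed, precomputed) weight of modality m for cell i.
    """
    n = len(modalities[0])
    neighbors = []
    for i in range(n):
        scores = []
        for j in range(n):
            if j == i:
                continue
            # Weighted sum of within-modality kernel affinities, using cell i's weights.
            s = sum(weights[m][i] * math.exp(-euclid(emb[i], emb[j]) / bandwidth)
                    for m, emb in enumerate(modalities))
            scores.append((s, j))
        scores.sort(reverse=True)  # highest affinity first
        neighbors.append([j for _, j in scores[:k]])
    return neighbors


# Hypothetical toy data: RNA pairs cells (0, 1) and (2, 3); protein pairs (0, 2) and (1, 3).
rna = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
prot = [[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.1, 5.0]]
print(weighted_neighbors([rna, prot], [[0.9] * 4, [0.1] * 4], k=1))  # RNA-dominated
print(weighted_neighbors([rna, prot], [[0.1] * 4, [0.9] * 4], k=1))  # protein-dominated
```

Shifting the weights from RNA to protein changes cell 0's nearest neighbor from cell 1 to cell 2. Per-cell weighting lets the more informative modality drive the neighbor graph.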


2021 ◽  
Author(s):  
Pengcheng Zeng ◽  
Zhixiang Lin

Abstract Technological advances have enabled us to profile multiple molecular layers at unprecedented single-cell resolution, and the number of available datasets from multiple samples or domains is growing. These datasets, including scRNA-seq, scATAC-seq and sc-methylation data, usually differ in their power to identify unknown cell types through clustering, so methods that integrate multiple datasets can potentially achieve better clustering performance. Here we propose coupleCoC+ for the integrative analysis of single-cell genomic data. coupleCoC+ is a transfer learning method based on the information-theoretic co-clustering framework. In coupleCoC+, we use the information in one dataset, the source data, to facilitate the analysis of another dataset, the target data. coupleCoC+ uses the features linked between the two datasets for effective knowledge transfer, and it also uses the information in the target-data features that are not linked to the source data. In addition, coupleCoC+ matches similar cell types across the source data and the target data. By applying coupleCoC+ to the integrative clustering of mouse cortex scATAC-seq and scRNA-seq data, mouse and human scRNA-seq data, and mouse cortex sc-methylation and scRNA-seq data, we demonstrate that coupleCoC+ improves the overall clustering performance and matches the cell subpopulations across multimodal single-cell genomic datasets. coupleCoC+ converges quickly and is computationally efficient. The software is available at https://github.com/cuhklinlab/coupleCoC_plus.


2017 ◽  
Vol 16 ◽  
pp. 117693511769077
Author(s):  
Sangin Lee ◽  
Faming Liang ◽  
Ling Cai ◽  
Guanghua Xiao

The construction of gene regulatory networks (GRNs) is an essential component of biomedical research to determine disease mechanisms and identify treatment targets. Gaussian graphical models (GGMs) have been widely used to construct GRNs by inferring the conditional dependence among the expression levels of a set of genes. In practice, GRNs obtained from the analysis of a single data set may not be reliable due to limited sample sizes. Therefore, it is important to integrate multiple data sets from comparable studies to improve the construction of a GRN. In this article, we introduce an equivalent measure of partial correlation coefficients in GGMs and then extend the method to construct a GRN by combining the equivalent measures from different sources. Furthermore, we develop a method for multiple data sets with a natural missing mechanism to accommodate differences among platforms in multiple sources of data. Simulation results show that this integrative analysis outperforms the standard methods and can detect hub genes in the true network. The proposed integrative method was applied to 12 lung adenocarcinoma data sets collected from different studies. The constructed network is consistent with current biological knowledge and reveals new insights into lung adenocarcinoma.
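In a Gaussian graphical model, two genes are conditionally independent given all the others exactly when the corresponding entry of the precision matrix Ω = Σ⁻¹ is zero. The partial correlation is ρ_ij = -Ω_ij / sqrt(Ω_ii Ω_jj).

The minimal Python sketch below computes partial correlations from a covariance matrix. It then pools one coefficient across studies with an inverse-variance weighted Fisher z-transform. That pooling rule is a standard meta-analysis device swapped in for illustration; it is not the equivalent measure proposed in the article. The function names and the three-gene covariance are hypothetical.

```python
import math


def invert(A):
    """Gauss-Jordan inverse with partial pivoting (assumes A is invertible)."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        pv = M[c][c]
        M[c] = [v / pv for v in M[c]]  # scale pivot row so the pivot is 1
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def partial_correlations(cov):
    """rho_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj), with Omega = cov^-1."""
    omega = invert(cov)
    n = len(cov)
    return [[1.0 if i == j else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
             for j in range(n)] for i in range(n)]


def combine_fisher_z(rs, ns, n_conditioned):
    """Pool one partial correlation over studies with sample sizes ns.

    Fisher z has approximate variance 1 / (n - n_conditioned - 3), used as inverse weights.
    """
    weights = [n - n_conditioned - 3 for n in ns]
    z = sum(w * math.atanh(r) for w, r in zip(weights, rs)) / sum(weights)
    return math.tanh(z)


# Hypothetical 3-gene chain (gene 0 - gene 1 - gene 2).
cov = [[0.75, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 0.75]]
pc = partial_correlations(cov)
print(pc[0][1], pc[0][2])  # gene 0 and gene 2: partial correlation is numerically zero
print(combine_fisher_z([0.2, 0.6], [13, 13], n_conditioned=1))
```

In the toy chain, genes 0 and 2 have a nonzero marginal covariance (0.25) yet a zero partial correlation. GGM-based network construction relies on exactly this distinction between marginal and conditional dependence.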


2019 ◽  
Author(s):  
Nikola Simidjievski ◽  
Cristian Bodnar ◽  
Ifrah Tariq ◽  
Paul Scherer ◽  
Helena Andres-Terre ◽  
...  

Abstract International initiatives such as the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) are collecting multiple data sets at different genome scales with the aim of identifying novel cancer biomarkers and predicting patient survival. To analyse such data, several machine learning, bioinformatics and statistical methods have been applied, among them neural networks such as autoencoders. Although these models provide a good statistical learning framework for analysing multi-omic and/or clinical data, there is a distinct lack of work on how to integrate diverse patient data and identify the optimal design best suited to the available data. In this paper, we investigate several autoencoder architectures that integrate a variety of cancer patient data types (e.g., multi-omics and clinical data). We perform extensive analyses of these approaches and provide a clear methodological and computational framework for designing systems that enable clinicians to investigate cancer traits and translate the results into clinical applications. We demonstrate how these networks can be designed, built and, in particular, applied to the integrative analysis of heterogeneous breast cancer data. The results show that these approaches yield relevant data representations that, in turn, lead to accurate and stable diagnosis.


2021 ◽  
Vol 17 (6) ◽  
pp. e1009064
Author(s):  
Pengcheng Zeng ◽  
Zhixiang Lin

Technological advances have enabled us to profile multiple molecular layers at unprecedented single-cell resolution, and the number of available datasets from multiple samples or domains is growing. These datasets, including scRNA-seq, scATAC-seq and sc-methylation data, usually differ in their power to identify unknown cell types through clustering, so methods that integrate multiple datasets can potentially achieve better clustering performance. Here we propose coupleCoC+ for the integrative analysis of single-cell genomic data. coupleCoC+ is a transfer learning method based on the information-theoretic co-clustering framework. In coupleCoC+, we use the information in one dataset, the source data, to facilitate the analysis of another dataset, the target data. coupleCoC+ uses the features linked between the two datasets for effective knowledge transfer, and it also uses the information in the target-data features that are not linked to the source data. In addition, coupleCoC+ matches similar cell types across the source data and the target data. By applying coupleCoC+ to the integrative clustering of mouse cortex scATAC-seq and scRNA-seq data, mouse and human scRNA-seq data, mouse cortex sc-methylation and scRNA-seq data, and human blood dendritic cell scRNA-seq data from two batches, we demonstrate that coupleCoC+ improves the overall clustering performance and matches the cell subpopulations across multimodal single-cell genomic datasets. coupleCoC+ converges quickly and is computationally efficient. The software is available at https://github.com/cuhklinlab/coupleCoC_plus.


2021 ◽  
pp. 096973302110032
Author(s):  
Sastrawan Sastrawan ◽  
Jennifer Weller-Newton ◽  
Gabrielle Brand ◽  
Gulzar Malik

Background: In the ever-changing and complex healthcare environment, nurses encounter challenging situations that may involve a clash between their personal and professional values, with a profound impact on their practice. Nevertheless, there is a dearth of literature on how nurses develop their personal–professional values. Aim: The aim of this study was to understand how nurses develop their foundational values as the basis of their value system. Research design: A constructivist grounded theory methodology was employed to collect multiple data sets, including face-to-face focus group and individual interviews, along with anecdotes and reflective stories. Participants and research context: Fifty-four nurses working across various nursing settings in Indonesia were recruited to participate. Ethical considerations: Ethics approval was obtained from the Monash University Human Ethics Committee, project approval number 1553. Findings: Foundational values were acquired through family upbringing, professional nurse education and the reinforcement of organisational/institutional values. These values are framed through three reference points: a religious lens, a humanity perspective and professionalism. This framing results in a unique combination of personal–professional values that make up nurses’ value system. Values are transferred to other nurses, formally or informally, as part of one’s professional responsibility and customary social interaction, via telling and sharing in person or through social media. Discussion: Values and ethics are inherently interwoven in nursing practice. Ethical and moral values are part of professional training, but other values are often buried in a hidden curriculum and are attained and activated through interactions during nurses’ training. Conclusion: Developing a value system is a complex undertaking that involves the basic social processes of attaining, enacting and socialising values. These processes encompass several intertwined entities, such as the sources of values, the pool of foundational values, value perspectives and framings, initial value structures, and methods of value transference.

