Coupled co-clustering-based unsupervised transfer learning for the integrative analysis of single-cell genomic data

Author(s):  
Pengcheng Zeng ◽  
Jiaxuan Wangwu ◽  
Zhixiang Lin

Abstract Unsupervised methods, such as clustering methods, are essential to the analysis of single-cell genomic data. Most current clustering methods are designed for one data type only, such as single-cell RNA sequencing (scRNA-seq), single-cell ATAC sequencing (scATAC-seq) or sc-methylation data alone, and only a few have been developed for the integrative analysis of multiple data types. The integrative analysis of multimodal single-cell genomic data sets leverages the power in multiple data sets and can deepen biological insight. In this paper, we propose a coupled co-clustering-based unsupervised transfer learning algorithm (coupleCoC) for the integrative analysis of multimodal single-cell data. Our proposed coupleCoC builds upon the information-theoretic co-clustering framework. In co-clustering, both the cells and the genomic features are clustered simultaneously. Clustering similar genomic features reduces the noise in single-cell data and facilitates the transfer of knowledge across single-cell datasets. We applied coupleCoC to the integrative analysis of scATAC-seq and scRNA-seq data, sc-methylation and scRNA-seq data, and scRNA-seq data from mouse and human. We demonstrate that coupleCoC improves the overall clustering performance and matches the cell subpopulations across multimodal single-cell genomic datasets. coupleCoC is also computationally efficient and can scale up to large datasets. Availability: The software and datasets are available at https://github.com/cuhklinlab/coupleCoC.
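
coupleCoC builds on information-theoretic co-clustering, in which cells (rows) and genomic features (columns) are grouped simultaneously so that the clustered matrix retains as much of the mutual information between cells and features as possible. The Python sketch below evaluates that objective, the loss I(X;Y) - I(Xhat;Yhat), for a single data set and a fixed pair of row and column assignments. It assumes that a non-negative cell-by-feature matrix is normalised into a joint distribution; it does not implement coupleCoC's coupling across data sets or its optimisation procedure, and all function names are hypothetical.

```python
import numpy as np

def mutual_information(p):
    """Mutual information I(X;Y), in nats, of a joint distribution matrix p."""
    px = p.sum(axis=1, keepdims=True)      # row marginal, shape (m, 1)
    py = p.sum(axis=0, keepdims=True)      # column marginal, shape (1, n)
    mask = p > 0                           # 0 * log 0 is taken as 0
    return float(np.sum(p[mask] * np.log(p[mask] / (px * py)[mask])))

def coclustering_loss(counts, row_labels, col_labels, k, l):
    """Loss in mutual information I(X;Y) - I(Xhat;Yhat) for fixed row/column clusters.

    counts: non-negative cell-by-feature matrix (hypothetical input format)
    row_labels / col_labels: integer cluster index for each row / column
    k, l: number of row clusters and column clusters
    """
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()                        # treat the matrix as a joint distribution
    R = np.eye(k)[row_labels]              # one-hot row memberships, shape (m, k)
    C = np.eye(l)[col_labels]              # one-hot column memberships, shape (n, l)
    q = R.T @ p @ C                        # k x l joint distribution of the co-clusters
    return mutual_information(p) - mutual_information(q)
```

With a block-diagonal count matrix, assigning rows and columns to their true blocks gives zero loss, whereas mixing the row clusters loses log 2 nats; a co-clustering algorithm searches for assignments that make this loss small.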

2020 ◽  
Author(s):  
Pengcheng Zeng ◽  
Jiaxuan WangWu ◽  
Zhixiang Lin

Abstract Unsupervised methods, such as clustering methods, are essential to the analysis of single-cell genomic data. Most current clustering methods are designed for one data type only, such as scRNA-seq, scATAC-seq or sc-methylation data alone, and only a few have been developed for the integrative analysis of multiple data types. Integrative analysis of multimodal single-cell genomic data sets leverages the power in multiple data sets and can deepen biological insight. We propose a coupled co-clustering-based unsupervised transfer learning algorithm (coupleCoC) for the integrative analysis of multimodal single-cell data. Our proposed coupleCoC builds upon the information-theoretic co-clustering framework. We applied coupleCoC to the integrative analysis of scATAC-seq and scRNA-seq data, sc-methylation and scRNA-seq data, and scRNA-seq data from mouse and human. We demonstrate that coupleCoC improves the overall clustering performance and matches the cell subpopulations across multimodal single-cell genomic data sets. The software and data sets are available at https://github.com/cuhklinlab/coupleCoC.


Metabolomics ◽  
2019 ◽  
Vol 16 (1) ◽  
Author(s):  
Masoumeh Alinaghi ◽  
Hanne Christine Bertram ◽  
Anders Brunse ◽  
Age K. Smilde ◽  
Johan A. Westerhuis

Abstract Introduction Integrative analysis of multiple data sets can provide complementary information about the studied biological system. However, fusing multiple biological data sets can be complicated, as the data sets might contain different sources of variation due to underlying experimental factors. Therefore, taking the experimental design of the data sets into account could be important for data fusion. Objectives In the present work, we aim to incorporate experimental design information into the integrative analysis of multiple designed data sets. Methods Here we describe penalized exponential ANOVA simultaneous component analysis (PE-ASCA), a new method for the integrative analysis of data sets from multiple compartments or analytical platforms with the same underlying experimental design. Results Using two simulated cases, the results of simultaneous component analysis (SCA), penalized exponential simultaneous component analysis (P-ESCA) and ANOVA-simultaneous component analysis (ASCA) are compared with those of the proposed method. Furthermore, real metabolomics data obtained from NMR analysis of two different brain tissues (hypothalamus and midbrain) from the same piglets, with an underlying experimental design, are investigated by PE-ASCA. Conclusions This method provides an improved understanding of the common and distinct variation in response to different experimental factors.
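
PE-ASCA combines the ANOVA decomposition used in ASCA with a simultaneous component analysis of several data blocks measured on the same samples. As a minimal sketch, the Python code below performs plain ASCA for a single experimental factor: the blocks (for example, hypothalamus and midbrain metabolite tables for the same piglets) are concatenated column-wise, the factor's effect matrix is built from level means, and its components are obtained by SVD. It deliberately omits the penalties, block scaling and separation of common and distinct components that define PE-ASCA; the function names are hypothetical.

```python
import numpy as np

def effect_matrix(X, levels):
    """ANOVA effect matrix: each row is replaced by the mean of its factor level.

    X is assumed to be column-centred already.
    """
    levels = np.asarray(levels)
    E = np.zeros_like(X)
    for lev in np.unique(levels):
        idx = levels == lev
        E[idx] = X[idx].mean(axis=0)       # level mean broadcast to every sample in the level
    return E

def asca_factor(blocks, levels, n_components=2):
    """Plain single-factor ASCA on horizontally concatenated blocks with shared samples."""
    X = np.hstack(blocks).astype(float)    # samples x (all variables of all blocks)
    X = X - X.mean(axis=0)                 # remove the overall mean
    E = effect_matrix(X, levels)           # variation explained by the factor
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    residual = X - E                       # variation not explained by the factor
    return E, scores, loadings, residual
```

The effect matrix plus the residual reproduces the mean-centred data, and with two balanced factor levels the effect matrix has rank one, so a single component captures the whole factor effect.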


2021 ◽  
Author(s):  
Pengcheng Zeng ◽  
Zhixiang Lin

Abstract Technological advances have enabled us to profile multiple molecular layers at unprecedented single-cell resolution, and the available datasets from multiple samples or domains are growing. These datasets, including scRNA-seq data, scATAC-seq data and sc-methylation data, usually differ in their power to identify the unknown cell types through clustering. Methods that integrate multiple datasets can therefore potentially lead to better clustering performance. Here we propose coupleCoC+ for the integrative analysis of single-cell genomic data. coupleCoC+ is a transfer learning method based on the information-theoretic co-clustering framework. In coupleCoC+, we utilize the information in one dataset, the source data, to facilitate the analysis of another dataset, the target data. coupleCoC+ uses the linked features in the two datasets for effective knowledge transfer, and it also uses the information of the features in the target data that are not linked to the source data. In addition, coupleCoC+ matches similar cell types across the source data and the target data. By applying coupleCoC+ to the integrative clustering of mouse cortex scATAC-seq data and scRNA-seq data, mouse and human scRNA-seq data, and mouse cortex sc-methylation and scRNA-seq data, we demonstrate that coupleCoC+ improves the overall clustering performance and matches the cell subpopulations across multimodal single-cell genomic datasets. coupleCoC+ converges quickly and is computationally efficient. The software is available at https://github.com/cuhklinlab/coupleCoC_plus.


2017 ◽  
Vol 16 ◽  
pp. 117693511769077
Author(s):  
Sangin Lee ◽  
Faming Liang ◽  
Ling Cai ◽  
Guanghua Xiao

The construction of gene regulatory networks (GRNs) is an essential component of biomedical research to determine disease mechanisms and identify treatment targets. Gaussian graphical models (GGMs) have been widely used for constructing GRNs by inferring conditional dependence among the expression levels of a set of genes. In practice, GRNs obtained from the analysis of a single data set may not be reliable because of limited sample sizes. Therefore, it is important to integrate multiple data sets from comparable studies to improve the construction of a GRN. In this article, we introduce an equivalent measure of partial correlation coefficients in GGMs and then extend the method to construct a GRN by combining the equivalent measures from different sources. Furthermore, we develop a method for multiple data sets with a natural missing mechanism to accommodate the differences among platforms in multiple sources of data. Simulation results show that this integrative analysis outperforms the standard methods and can detect hub genes in the true network. The proposed integrative method was applied to 12 lung adenocarcinoma data sets collected from different studies. The constructed network is consistent with current biological knowledge and reveals new insights into lung adenocarcinoma.
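
In a GGM, two genes are conditionally independent given all other genes exactly when their partial correlation is zero, and the partial correlation can be read off the precision matrix Omega (the inverse covariance) as rho_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj). The Python sketch below computes this textbook quantity, assuming more samples than genes so that the sample covariance is invertible. It is not the equivalent measure introduced in the article, whose details are not given here, and the function name is hypothetical.

```python
import numpy as np

def partial_correlations(X):
    """Partial correlation matrix of the columns of X (samples x genes).

    Assumes n_samples > n_genes so the sample covariance matrix is invertible.
    """
    omega = np.linalg.inv(np.cov(X, rowvar=False))   # precision matrix
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)                   # rho_ij = -Omega_ij / sqrt(Omega_ii Omega_jj)
    np.fill_diagonal(pcor, 1.0)                      # a variable is perfectly correlated with itself
    return pcor
```

For a simulated chain x1 -> x2 -> x3, the partial correlation between x1 and x3 is close to zero because they are conditionally independent given x2, even though their marginal correlation is large; an edge-free pair like this is what a GRN built from a GGM would leave unconnected.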


2021 ◽  
Vol 17 (6) ◽  
pp. e1009064
Author(s):  
Pengcheng Zeng ◽  
Zhixiang Lin

Technological advances have enabled us to profile multiple molecular layers at unprecedented single-cell resolution, and the available datasets from multiple samples or domains are growing. These datasets, including scRNA-seq data, scATAC-seq data and sc-methylation data, usually differ in their power to identify the unknown cell types through clustering. Methods that integrate multiple datasets can therefore potentially lead to better clustering performance. Here we propose coupleCoC+ for the integrative analysis of single-cell genomic data. coupleCoC+ is a transfer learning method based on the information-theoretic co-clustering framework. In coupleCoC+, we utilize the information in one dataset, the source data, to facilitate the analysis of another dataset, the target data. coupleCoC+ uses the linked features in the two datasets for effective knowledge transfer, and it also uses the information of the features in the target data that are not linked to the source data. In addition, coupleCoC+ matches similar cell types across the source data and the target data. By applying coupleCoC+ to the integrative clustering of mouse cortex scATAC-seq data and scRNA-seq data, mouse and human scRNA-seq data, mouse cortex sc-methylation and scRNA-seq data, and human blood dendritic cell scRNA-seq data from two batches, we demonstrate that coupleCoC+ improves the overall clustering performance and matches the cell subpopulations across multimodal single-cell genomic datasets. coupleCoC+ converges quickly and is computationally efficient. The software is available at https://github.com/cuhklinlab/coupleCoC_plus.


2021 ◽  
pp. 096973302110032
Author(s):  
Sastrawan Sastrawan ◽  
Jennifer Weller-Newton ◽  
Gabrielle Brand ◽  
Gulzar Malik

Background: In the ever-changing and complex healthcare environment, nurses encounter challenging situations that may involve a clash between their personal and professional values, which can profoundly affect their practice. Nevertheless, there is a dearth of literature on how nurses develop their personal–professional values. Aim: The aim of this study was to understand how nurses develop their foundational values as the base for their value system. Research design: A constructivist grounded theory methodology was employed to collect multiple data sets, including face-to-face focus group and individual interviews, along with anecdotes and reflective stories. Participants and research context: Fifty-four nurses working across various nursing settings in Indonesia were recruited to participate. Ethical considerations: Ethics approval was obtained from the Monash University Human Ethics Committee, project approval number 1553. Findings: Foundational values were acquired through family upbringing, professional nurse education and organisational/institutional values reinforcement. These values are framed through three reference points: a religious lens, a humanity perspective and professionalism. This framing results in a unique combination of personal–professional values that make up nurses’ value system. Values are transferred to other nurses, either formally or informally, as part of one’s professional responsibility and customary social interaction, by telling and sharing in person or through social media. Discussion: Values and ethics are inherently interwoven in nursing practice. Ethical and moral values are part of professional training, but other values are often buried in a hidden curriculum and are attained and activated through interactions during nurses’ training. Conclusion: Developing a value system is a complex undertaking that involves the basic social processes of attaining, enacting and socialising values. These processes encompass several intertwined entities, such as the sources of values, the pool of foundational values, value perspectives and framings, initial value structures, and methods of value transference.


2014 ◽  
Vol 45 (5-6) ◽  
pp. 1325-1354 ◽  
Author(s):  
Emilia Paula Diaconescu ◽  
Philippe Gachon ◽  
John Scinocca ◽  
René Laprise

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Siyu Hou ◽  
Zhaoyang Guo ◽  
Chuangneng Cai ◽  
Xiaobo Jiao

Purpose The purpose of this study is to examine the influence of firm performance on corporate social responsibility (CSR) and the possible moderators of this influence. Despite the significance of CSR, there remains extensive debate about how it is affected by firm performance. Design/methodology/approach The conceptual model is mainly built on goal-setting theory. Based on archival data from multiple data sets on 1,650 companies, collected from 2010 to 2017, the hypotheses are tested using two-stage instrumental variable regression. Findings There is an inverted U-shaped relationship between firm performance and CSR: as firm performance rises, CSR first increases and then decreases. In addition, considering the boundary conditions, state ownership makes the inverted U-shaped curve steeper, while high executive wage concentration makes it flatter. Research limitations/implications This study reconciles the previously contradictory findings on the influence of firm performance on CSR, which have variously reported a positive, negative or neutral relationship between the two. Originality/value This research provides a necessary structure for the CSR literature. By delving deeply into the relationship between firm performance and CSR, it enables scholars to better address the critical management question of whether earning more will lead to doing good.
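
The abstract reports an inverted U-shaped relationship estimated by two-stage instrumental variable regression. A common way to test such a shape, assumed here rather than taken from the paper, is to include firm performance and its square as endogenous regressors and instrument both. The Python sketch below computes two-stage least squares (2SLS) point estimates with NumPy; the variables, instruments and function name are hypothetical, and because naive second-stage standard errors are not valid for 2SLS, none are returned.

```python
import numpy as np

def two_stage_least_squares(y, X_endog, Z, W=None):
    """2SLS point estimates (hypothetical helper).

    y: outcome, shape (n,)
    X_endog: endogenous regressors, shape (n,) or (n, k)
    Z: excluded instruments, shape (n,) or (n, m) with m >= k
    W: optional exogenous controls; an intercept is always included
    Returns [intercept, controls..., endogenous coefficients...].
    """
    n = len(y)
    const = np.ones((n, 1))
    W = const if W is None else np.column_stack([const, W])
    X_endog = np.asarray(X_endog, dtype=float).reshape(n, -1)
    Z = np.asarray(Z, dtype=float).reshape(n, -1)
    # Stage 1: project the endogenous regressors onto controls and instruments.
    first = np.column_stack([W, Z])
    coef1, *_ = np.linalg.lstsq(first, X_endog, rcond=None)
    X_hat = first @ coef1
    # Stage 2: regress the outcome on controls and the fitted endogenous regressors.
    second = np.column_stack([W, X_hat])
    beta, *_ = np.linalg.lstsq(second, y, rcond=None)
    return beta
```

A positive coefficient on performance together with a negative coefficient on its square is consistent with an inverted U; with simulated data in which an unobserved factor drives both performance and the outcome, 2SLS recovers the true curve where ordinary least squares would be biased.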

