scHiCSRS: A Self-Representation Smoothing Method with Gaussian Mixture Model for Imputing single-cell Hi-C Data

2021 ◽  
Author(s):  
Shili Lin ◽  
Qing Xie

Motivation: Single-cell Hi-C techniques make it possible to study cell-to-cell variability in genomic features. However, excess zeros are common in single-cell Hi-C (scHi-C) data, making scHi-C matrices extremely sparse and complicating downstream analysis. The observed zeros arise from two kinds of events: structural zeros, for which the loci never interact due to underlying biological mechanisms, and dropouts or sampling zeros, where the two loci interact but the interaction is not captured due to insufficient sequencing depth. Although quality improvement approaches have been proposed as an intermediate step for analyzing scHi-C data, little has been done to address these two types of zeros. We believe that differentiating between structural zeros and dropouts would benefit downstream analyses such as clustering. Results: We propose scHiCSRS, a self-representation smoothing method that improves data quality, together with a Gaussian mixture model that identifies structural zeros among the observed zeros. scHiCSRS not only takes the spatial dependencies of the scHi-C 2D data structure into account but also borrows information from similar single cells. Through an extensive set of simulation studies, we demonstrate the ability of scHiCSRS to identify structural zeros with high sensitivity and to accurately impute dropout values in sampling zeros. Downstream analyses of three real datasets show that data improved by scHiCSRS yield more accurate clustering of cells than the observed data or data improved by several comparison methods.
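
As a rough illustration of how a Gaussian mixture model could separate the two kinds of zeros, the sketch below fits a two-component univariate mixture by EM to smoothed values at positions that were observed as zero. It is not the scHiCSRS implementation. The input values, the function names, and the rule of treating the low-mean component as the structural-zero candidate are all assumptions made for this example.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def responsibilities(x, weights, means, variances):
    """Posterior probability of each of the two components given x."""
    p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
    total = p[0] + p[1]
    if total == 0.0:
        # Both densities underflowed: assign x to the component with the nearer mean.
        nearer = 0 if abs(x - means[0]) <= abs(x - means[1]) else 1
        return [1.0 if k == nearer else 0.0 for k in range(2)]
    return [p[0] / total, p[1] / total]

def fit_two_component_gmm(values, n_iter=200, min_var=1e-6):
    """Fit a two-component 1D Gaussian mixture with EM.

    Returns (weights, means, variances), ordered so component 0 has the lower mean.
    """
    lo, hi = min(values), max(values)
    means = [lo, hi]                                  # start at the two extremes
    spread = max((hi - lo) ** 2 / 4.0, min_var)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value
        resp = [responsibilities(x, weights, means, variances) for x in values]
        # M-step: re-estimate weight, mean, and variance of each component
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue                              # leave an empty component unchanged
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var_k, min_var)        # floor avoids a collapsed component
    if means[0] > means[1]:
        weights.reverse()
        means.reverse()
        variances.reverse()
    return weights, means, variances

# Hypothetical smoothed values at positions whose observed count was zero.
smoothed = [0.02, 0.05, 0.01, 0.08, 0.03, 0.04, 2.1, 2.6, 1.9, 2.4, 2.2]
weights, means, variances = fit_two_component_gmm(smoothed)

# Assumption: the low-mean component represents structural-zero candidates.
p_structural = [responsibilities(x, weights, means, variances)[0] for x in smoothed]
```

In this toy setting, a position would be flagged as a structural zero when its entry in `p_structural` exceeds a chosen threshold, and the remaining positions would keep their smoothed value as the imputed dropout count.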

2016 ◽  
Vol 8 (11) ◽  
pp. 1133-1144 ◽  
Author(s):  
Susan E. Leggett ◽  
Jea Yun Sim ◽  
Jonathan E. Rubins ◽  
Zachary J. Neronha ◽  
Evelyn Kendall Williams ◽  
...  

Heterogeneous single cells are classified by shape into epithelial and mesenchymal phenotypes using a Gaussian mixture model.


2021 ◽  
Author(s):  
Qing Xie ◽  
Chengong Han ◽  
Victor Jin ◽  
Shili Lin

Single-cell Hi-C techniques enable one to study cell-to-cell variability in chromatin interactions. However, single-cell Hi-C (scHi-C) data suffer severely from sparsity, that is, the existence of excess zeros due to insufficient sequencing depth. Complicating things further is the fact that not all zeros are created equal: some are due to loci truly not interacting because of the underlying biological mechanism (structural zeros), whereas others are indeed due to insufficient sequencing depth (sampling zeros), especially for loci that interact infrequently. Differentiating between structural zeros and sampling zeros is important since correct inference would improve downstream analyses such as clustering and discovery of subtypes. Nevertheless, distinguishing between these two types of zeros has received little attention in the single-cell Hi-C literature, where the issue of sparsity has been addressed mainly as a data quality improvement problem. To fill this gap, we propose HiCImpute, a Bayesian hierarchical model that goes beyond data quality improvement by also identifying observed zeros that are in fact structural zeros. HiCImpute takes the spatial dependencies of the scHi-C 2D data structure into account while also borrowing information from similar single cells and from bulk data, when available. Through an extensive set of analyses of synthetic and real data, we demonstrate the ability of HiCImpute to identify structural zeros with high sensitivity and to accurately impute dropout values in sampling zeros. Downstream analyses using data improved by HiCImpute yielded much more accurate clustering of cell types than using observed data or data improved by several comparison methods. Most significantly, HiCImpute-improved data have led to the identification of subtypes within each of the excitatory neuronal cell types L4 and L5 in the prefrontal cortex.
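
The two sources of information mentioned in this abstract, spatial neighbours within a contact matrix and similar single cells, can be illustrated with a very simple pooling estimate. The sketch below is not the Bayesian model of HiCImpute. The function names, the 3x3 toy matrices, and the assumption that the listed cells have already been judged similar are all hypothetical.

```python
def neighborhood_mean(matrix, i, j, radius=1):
    """Mean of the entries within `radius` of (i, j), including (i, j) itself.

    The window is clipped at the matrix border.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    vals = [matrix[r][c]
            for r in range(max(0, i - radius), min(n_rows, i + radius + 1))
            for c in range(max(0, j - radius), min(n_cols, j + radius + 1))]
    return sum(vals) / len(vals)

def pooled_estimate(cells, i, j, radius=1):
    """Average the neighbourhood mean at (i, j) over a group of cells.

    Assumption: `cells` holds contact matrices of cells already deemed similar.
    """
    return sum(neighborhood_mean(m, i, j, radius) for m in cells) / len(cells)

# Three hypothetical symmetric 3x3 contact matrices; entry (0, 2) is zero in all of them.
cell_a = [[4, 2, 0], [2, 5, 1], [0, 1, 3]]
cell_b = [[5, 3, 0], [3, 6, 0], [0, 0, 4]]
cell_c = [[3, 1, 0], [1, 4, 2], [0, 2, 5]]

estimate = pooled_estimate([cell_a, cell_b, cell_c], 0, 2)
```

A model-based method would weight these contributions and quantify uncertainty rather than take plain averages. The sketch only shows where the borrowed information comes from.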


2019 ◽  
Vol 13 (01) ◽  
pp. 1950020
Author(s):  
Jinghong Wu ◽  
Sijie Niu ◽  
Qiang Chen ◽  
Wen Fan ◽  
Songtao Yuan ◽  
...  

We introduce a method based on Gaussian mixture model (GMM) clustering and level sets to automatically detect intraretinal fluid in diabetic retinopathy (DR) from spectral domain optical coherence tomography (SD-OCT) images. First, each B-scan is segmented using GMM clustering. The initial clustering results are then refined using location and thickness information. Next, the spatial information among every five consecutive B-scans is used to search for potential fluid. Finally, an improved level-set method is used to obtain accurate boundaries. The high sensitivity and accuracy demonstrated here show the method's potential for fluid detection.


Abstract. The Guided Wave (GW) based Structural Health Monitoring (SHM) method is of significant research interest because of its wide monitoring range and high sensitivity. However, many challenges remain in real engineering applications due to complex time-varying conditions, such as changes in temperature and humidity, random dynamic loads, and structural boundary conditions. In this paper, a Gaussian Mixture Model (GMM) is adopted to deal with these problems. A multi-dimensional GMM (MDGMM) is proposed to model the probability distribution of GW features under time-varying conditions. Furthermore, migration indexes of the probability model are studied in order to measure the degree of migration of the MDGMM and thereby reveal crack propagation. Finally, validation in an aircraft fatigue test shows good performance of the MDGMM.
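
The abstract does not say which migration indexes are used. As one plausible example only, the Kullback-Leibler divergence between a baseline distribution and a later distribution grows as the feature distribution drifts. The sketch below covers the simplest case, two univariate Gaussians, for which the divergence has a closed form. The function name and the numbers are assumptions, and a full mixture model would need an approximation instead.

```python
import math

def kl_gaussian(mean_p, var_p, mean_q, var_q):
    """KL divergence KL(P || Q) between univariate Gaussians P and Q."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mean_p - mean_q) ** 2) / var_q
                  - 1.0)

# Hypothetical baseline feature distribution and two later states.
baseline = (0.0, 1.0)                                  # (mean, variance)
no_shift = kl_gaussian(*baseline, 0.0, 1.0)            # identical distribution
small_shift = kl_gaussian(*baseline, 1.0, 1.0)         # mean moved by 1
large_shift = kl_gaussian(*baseline, 2.0, 1.0)         # mean moved by 2
```

With equal variances the index reduces to half the squared mean shift divided by the variance, so it increases monotonically as the distribution moves away from the baseline.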


2018 ◽  
Vol 30 (4) ◽  
pp. 642
Author(s):  
Guichao Lin ◽  
Yunchao Tang ◽  
Xiangjun Zou ◽  
Qing Zhang ◽  
Xiaojie Shi ◽  
...  
