SAILER: Scalable and Accurate Invariant Representation Learning for Single-Cell ATAC-Seq Processing and Integration

2021 ◽  
Author(s):  
Yingxin Cao ◽  
Laiyi Fu ◽  
Jie Wu ◽  
Qinke Peng ◽  
Qing Nie ◽  
...  

Abstract. Motivation: Single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) provides new opportunities to dissect epigenomic heterogeneity and elucidate transcriptional regulatory mechanisms. However, computational modelling of scATAC-seq data is challenging due to its high dimension, extreme sparsity, complex dependencies, and high sensitivity to confounding factors from various sources. Results: Here we propose a new deep generative model framework, named SAILER, for analysing scATAC-seq data. SAILER aims to learn a low-dimensional nonlinear latent representation of each cell that defines its intrinsic chromatin state, invariant to extrinsic confounding factors like read depth and batch effects. SAILER adopts the conventional encoder-decoder framework to learn the latent representation but imposes additional constraints to ensure the independence of the learned representations from the confounding factors. Experimental results on both simulated and real scATAC-seq datasets demonstrate that SAILER learns better and biologically more meaningful representations of cells than other methods. Its noise-free cell embeddings bring significant benefits in downstream analyses: clustering and imputation based on SAILER result in 6.9% and 18.5% improvements over existing methods, respectively. Moreover, because no matrix factorization is involved, SAILER can easily scale to process millions of cells. We implemented SAILER as a software package, freely available to all for large-scale scATAC-seq data analysis. Availability: The software is publicly available at https://github.com/uci-cbcl/[email protected] and [email protected]

2021 ◽  
Author(s):  
Lingfei Wang

Abstract. Single-cell RNA sequencing (scRNA-seq) provides unprecedented technical and statistical potential to study gene regulation but is subject to technical variations and sparsity. Here we present Normalisr, a linear-model-based normalization and statistical hypothesis testing framework that unifies single-cell differential expression, co-expression, and CRISPR scRNA-seq screen analyses. By systematically detecting and removing nonlinear confounding from library size, Normalisr achieves high sensitivity, specificity, speed, and generalizability across multiple scRNA-seq protocols and experimental conditions with unbiased P-value estimation. We use Normalisr to reconstruct robust gene regulatory networks from trans-effects of gRNAs in large-scale CRISPRi scRNA-seq screens and gene-level co-expression networks from conventional scRNA-seq.


2019 ◽  
Author(s):  
Brian Hie ◽  
Hyunghoon Cho ◽  
Benjamin DeMeo ◽  
Bryan Bryson ◽  
Bonnie Berger

SUMMARY. Large-scale single-cell RNA-sequencing (scRNA-seq) studies that profile hundreds of thousands of cells are becoming increasingly common, overwhelming existing analysis pipelines. Here, we describe how to enhance and accelerate single-cell data analysis by summarizing the transcriptomic heterogeneity within a data set using a small subset of cells, which we refer to as a geometric sketch. Our sketches provide more comprehensive visualization of transcriptional diversity, capture rare cell types with high sensitivity, and accurately reveal biological cell types via clustering. Our sketch of umbilical cord blood cells uncovers a rare subpopulation of inflammatory macrophages, which we experimentally validated in vitro. The construction of our sketches is extremely fast, which enabled us to accelerate other crucial resource-intensive tasks such as scRNA-seq data integration. We anticipate that our algorithm will become an increasingly essential step when sharing and analyzing the rapidly growing volume of scRNA-seq data and help enable the democratization of single-cell omics.
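
The idea of sampling evenly across the space a data set occupies, rather than evenly across cells, can be illustrated with a small sketch. The code below is a simplified illustration under stated assumptions and is not the authors' algorithm: it assumes a fixed grid of equal-sized boxes (the function name `grid_sketch` and the `box_size` parameter are hypothetical) and draws cells round-robin from the occupied boxes so that sparse regions are not swamped by dense ones.

```python
import random
from collections import defaultdict


def grid_sketch(points, box_size, n_samples, seed=0):
    """Pick up to n_samples indices, spread evenly over occupied grid boxes.

    Simplified illustration only: a fixed, user-supplied box_size is assumed.
    """
    rng = random.Random(seed)
    # Group point indices by the grid box that contains them.
    boxes = defaultdict(list)
    for idx, point in enumerate(points):
        key = tuple(int(coord // box_size) for coord in point)
        boxes[key].append(idx)
    # Shuffle within each box so the draw inside a box is random.
    pools = list(boxes.values())
    for pool in pools:
        rng.shuffle(pool)
    # Round-robin over boxes: each occupied box gives one cell per pass.
    chosen = []
    while len(chosen) < n_samples and any(pools):
        for pool in pools:
            if pool and len(chosen) < n_samples:
                chosen.append(pool.pop())
    return chosen


# Toy data (hypothetical): 1,000 cells in a dense cluster, 5 in a rare one.
gen = random.Random(1)
dense = [(gen.uniform(0.0, 0.9), gen.uniform(0.0, 0.9)) for _ in range(1000)]
rare = [(gen.uniform(5.0, 5.9), gen.uniform(5.0, 5.9)) for _ in range(5)]
cells = dense + rare

sketch = grid_sketch(cells, box_size=1.0, n_samples=10)
# Indices 1000 and above belong to the rare cluster.
rare_in_sketch = sum(1 for i in sketch if i >= 1000)
```

In this toy example the two clusters fall into two separate boxes, so the ten-cell sketch contains all five rare cells, whereas a uniform random draw of ten cells out of 1,005 would usually contain none of them.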


2019 ◽  
Author(s):  
Ashley D. Sanders ◽  
Sascha Meiers ◽  
Maryam Ghareghani ◽  
David Porubsky ◽  
Hyobin Jeong ◽  
...  

Abstract. Structural variation (SV), where rearrangements delete, duplicate, invert or translocate DNA segments, is a major source of somatic cell variation. It can arise in rapid bursts, mediate genetic heterogeneity, and dysregulate cancer-related pathways. The challenge to systematically discover SVs in single cells remains unsolved, with copy-neutral and complex variants typically escaping detection. We developed single cell tri-channel-processing (scTRIP), a computational framework that jointly integrates read depth, template strand and haplotype phase to comprehensively discover SVs in single cells. We surveyed SV landscapes of 565 single cell genomes, including transformed epithelial cells and patient-derived leukemic samples, and discovered abundant SV classes including inversions, translocations and large-scale genomic rearrangements mediating oncogenic dysregulation. We dissected the ‘molecular karyotype’ of the leukemic samples and examined their clonal structure. Unlike prior methods, scTRIP also enabled direct detection and discrimination of SV mutational processes in individual cells, including breakage-fusion-bridge cycles. scTRIP will facilitate studies of clonal evolution, genetic mosaicism and somatic SV formation, and could improve disease classification for precision medicine.


2019 ◽  
Author(s):  
Florian Mair ◽  
Jami R. Erickson ◽  
Valentin Voillet ◽  
Yannick Simoni ◽  
Timothy Bi ◽  
...  

Summary. High-throughput single-cell RNA sequencing (sc-RNAseq) has become a frequently used tool to assess immune cell function and heterogeneity. Recently, the combined measurement of RNA and protein expression by sequencing was developed, which is commonly known as CITE-Seq. Acquisition of protein expression data along with transcriptome data resolves some of the limitations inherent to assessing transcripts only, but also nearly doubles the sequencing read depth required per single cell. Furthermore, there is still a paucity of analysis tools to visualize combined transcript-protein datasets. Here, we describe a novel targeted transcriptomics approach that combines analysis of over 400 genes with simultaneous measurement of over 40 proteins on more than 25,000 cells. This targeted approach requires only about 1/10 of the read depth compared to a whole transcriptome approach while retaining high sensitivity for low abundance transcripts. To analyze these multi-omic transcript-protein datasets, we adapted One-SENSE for intuitive visualization of the relationship of proteins and transcripts on a single-cell level.


1999 ◽  
Vol 39 (4) ◽  
pp. 55-60 ◽  
Author(s):  
J. Alex ◽  
R. Tschepetzki ◽  
U. Jumar ◽  
F. Obenaus ◽  
K.-H. Rosenwinkel

Activated sludge models are widely used for planning and optimisation of wastewater treatment plants, and online applications are under development to support the operation of complex treatment plants. A proper model is crucial for all of these applications. Parameter calibration is the focus of several papers and applications. An essential precondition for this task is an appropriately defined model structure, which is often given much less attention. Different model structures for a large-scale treatment plant with circulation flow are discussed in this paper. A more systematic method to derive a suitable model structure is applied to this case. Results of a numerical hydraulic model are used for this purpose. The importance of these efforts is shown by the high sensitivity of the simulation results to the selection of the model structure and the hydraulic conditions. Finally, it is shown that model calibration was possible only by adjusting to the hydraulic behaviour and without any changes of biological parameters.


2020 ◽  
Vol 15 (7) ◽  
pp. 750-757
Author(s):  
Jihong Wang ◽  
Yue Shi ◽  
Xiaodan Wang ◽  
Huiyou Chang

Background: At present, using computer methods to predict drug-target interactions (DTIs) is a very important step in the discovery of new drugs and drug relocation processes. The potential DTIs identified by machine learning methods can provide guidance in biochemical or clinical experiments. Objective: The goal of this article is to combine the latest network representation learning methods for drug-target prediction research, improve model prediction capabilities, and promote new drug development. Methods: We use the large-scale information network embedding (LINE) method to extract network topology features of drugs, targets, diseases, etc., integrate features obtained from heterogeneous networks, construct binary classification samples, and use the random forest (RF) method to predict DTIs. Results: The experiments in this paper compare the common classifiers RF, LR, and SVM, as well as the typical network representation learning methods LINE, Node2Vec, and DeepWalk. The combined method LINE-RF achieves the best results, reaching an AUC of 0.9349 and an AUPR of 0.9016. Conclusion: The learning method based on the LINE network can effectively learn hidden features of drugs, targets, diseases and other entities from the network topology. The combination of features learned through multiple networks can enhance the expression ability. RF is an effective method of supervised learning. Therefore, the LINE-RF combination method is a widely applicable method.


Diagnostics ◽  
2021 ◽  
Vol 11 (5) ◽  
pp. 869
Author(s):  
Amedeo De Nicolò ◽  
Valeria Avataneo ◽  
Jessica Cusato ◽  
Alice Palermiti ◽  
Jacopo Mula ◽  
...  

Recently, large-scale screening for COVID-19 has presented a major challenge, limiting timely countermeasures. Therefore, the application of suitable rapid serological tests could provide useful information; however, little evidence regarding their robustness is currently available. In this work, we evaluated and compared the analytical performance of a rapid lateral-flow test (LFA) and a fast semiquantitative fluorescent immunoassay (FIA) for anti-nucleocapsid (anti-NC) antibodies, with the reverse transcriptase real-time PCR assay as the reference. In 222 patients, LFA showed poor sensitivity (55.9%) within two weeks from PCR, while later testing was more reliable (sensitivity of 85.7% and specificity of 93.1%). Moreover, in a subset of 100 patients, FIA showed high sensitivity (89.1%) and specificity (94.1%) after two weeks from PCR. The coupled application for the screening of 183 patients showed satisfactory concordance (K = 0.858). In conclusion, rapid serological tests were largely not useful for early diagnosis, but they showed good performance in later stages of infection. These could be useful for back-tracing and/or to identify potentially immune subjects.
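
The performance measures quoted above (sensitivity, specificity, and a concordance coefficient K, assumed here to be Cohen's kappa) are computed from simple count tables. The following minimal sketch shows the standard formulas; all counts in it are hypothetical and chosen for illustration only, since the abstract does not report the underlying tables.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of reference-positive samples that the test calls positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of reference-negative samples that the test calls negative."""
    return true_neg / (true_neg + false_pos)


def cohen_kappa(both_pos, only_a_pos, only_b_pos, both_neg):
    """Chance-corrected agreement between two binary tests A and B."""
    total = both_pos + only_a_pos + only_b_pos + both_neg
    observed = (both_pos + both_neg) / total
    a_pos = (both_pos + only_a_pos) / total
    b_pos = (both_pos + only_b_pos) / total
    # Agreement expected if A and B were statistically independent.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return (observed - expected) / (1 - expected)


# Hypothetical counts, NOT taken from the study.
sens = sensitivity(true_pos=90, false_neg=10)   # 0.90
spec = specificity(true_neg=95, false_pos=5)    # 0.95
kappa = cohen_kappa(both_pos=40, only_a_pos=5, only_b_pos=5, both_neg=50)
```

With these made-up counts the two tests agree on 90 of 100 samples, yet kappa comes out near 0.80 because about half of that agreement would be expected by chance alone.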


Nanophotonics ◽  
2020 ◽  
Vol 9 (5) ◽  
pp. 1081-1086 ◽  
Author(s):  
Abdoulaye Ndao ◽  
Liyi Hsu ◽  
Wei Cai ◽  
Jeongho Ha ◽  
Junhee Park ◽  
...  

Abstract. One of the key challenges in biology is to understand how individual cells process information and respond to perturbations. However, most of the existing single-cell analysis methods can only provide a glimpse of cell properties at specific time points and are unable to provide cell secretion and protein analysis at single-cell resolution. To address the limits of existing methods and to accelerate discoveries from single-cell studies, we propose and experimentally demonstrate a new sensor based on bound states in the continuum to quantify exosome secretion from a single cell. Our optical sensors demonstrate high-sensitivity refractive index detection. Because of the strong overlap between the medium supporting the mode and the analytes, such an optical cavity has a figure of merit of 677 and sensitivity of 440 nm/RIU. Such results facilitate technological progress for highly conducive optical sensors for different biomedical applications.
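
The two reported numbers can be related under the definition commonly used for refractive-index sensors, FOM = S / FWHM, where S is the sensitivity and FWHM is the resonance linewidth. The abstract does not state which definition the authors used, so the linewidth computed below is an inference under that assumption, not a reported value; the function names and the index change of 0.001 RIU are hypothetical.

```python
def implied_linewidth_nm(sensitivity_nm_per_riu, figure_of_merit):
    """Resonance FWHM implied by FOM = S / FWHM (assumed definition)."""
    return sensitivity_nm_per_riu / figure_of_merit


def resonance_shift_nm(sensitivity_nm_per_riu, delta_n):
    """Wavelength shift for a refractive-index change delta_n (in RIU)."""
    return sensitivity_nm_per_riu * delta_n


# Values quoted in the abstract: S = 440 nm/RIU, FOM = 677.
linewidth = implied_linewidth_nm(440.0, 677.0)   # roughly 0.65 nm
# Hypothetical index change of 0.001 RIU, for illustration only.
shift = resonance_shift_nm(440.0, 0.001)         # 0.44 nm
```

Under this assumed definition, an index change of 0.001 RIU would shift the resonance by about two thirds of its own linewidth, which is why a narrow resonance matters as much as a large sensitivity.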


Author(s):  
A J Rigby ◽  
N Peretto ◽  
R Adam ◽  
P Ade ◽  
M Anderson ◽  
...  

Abstract. Determining the mechanism by which high-mass stars are formed is essential for our understanding of the energy budget and chemical evolution of galaxies. By using the New IRAM KIDs Array 2 (NIKA2) camera on the Institut de Radio Astronomie Millimétrique (IRAM) 30-m telescope, we have conducted high-sensitivity and large-scale mapping of a fraction of the Galactic plane in order to search for signatures of the transition between the high- and low-mass star-forming modes. Here, we present the first results from the Galactic Star Formation with NIKA2 (GASTON) project, a Large Programme at the IRAM 30-m telescope which is mapping ≈2 deg² of the inner Galactic plane (GP), centred on ℓ = 23.9°, b = 0.05°, as well as targets in Taurus and Ophiuchus in 1.15 and 2.00 mm continuum wavebands. In this paper we present the first of the GASTON GP data taken, and present initial science results. We conduct an extraction of structures from the 1.15 mm maps using a dendrogram analysis and, by comparison to the compact source catalogues from Herschel survey data, we identify a population of 321 previously-undetected clumps. Approximately 80 per cent of these new clumps are 70 μm-quiet, and may be considered as starless candidates. We find that this new population of clumps is less massive and cooler, on average, than clumps that have already been identified. Further, by classifying the full sample of clumps based upon their infrared-bright fraction – an indicator of evolutionary stage – we find evidence for clump mass growth, supporting models of clump-fed high-mass star formation.


Cancers ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 2111
Author(s):  
Bo-Wei Zhao ◽  
Zhu-Hong You ◽  
Lun Hu ◽  
Zhen-Hao Guo ◽  
Lei Wang ◽  
...  

Identification of drug-target interactions (DTIs) is a significant step in the drug discovery or repositioning process. Compared with the time-consuming and labor-intensive in vivo experimental methods, computational models can provide high-quality DTI candidates in an instant. In this study, we propose a novel method called LGDTI to predict DTIs based on large-scale graph representation learning. LGDTI can capture the local and global structural information of the graph. Specifically, the first-order neighbor information of nodes can be aggregated by the graph convolutional network (GCN); on the other hand, the high-order neighbor information of nodes can be learned by the graph embedding method called DeepWalk. Finally, the two kinds of features are fed into the random forest classifier to train and predict potential DTIs. The results show that our method obtained an area under the receiver operating characteristic curve (AUROC) of 0.9455 and an area under the precision-recall curve (AUPR) of 0.9491 under 5-fold cross-validation. Moreover, we compare the presented method with some existing state-of-the-art methods. These results imply that LGDTI can efficiently and robustly capture undiscovered DTIs. Moreover, the proposed model is expected to bring new inspiration and provide novel perspectives to relevant researchers.

