Dissecting differential signals in high-throughput data from complex tissues

2019 ◽  
Vol 35 (20) ◽  
pp. 3898-3905 ◽  
Author(s):  
Ziyi Li ◽  
Zhijin Wu ◽  
Peng Jin ◽  
Hao Wu

Abstract Motivation: Samples from clinical practices are often mixtures of different cell types. The high-throughput data obtained from these samples are thus mixed signals. The cell mixture complicates data analysis and will lead to biased results if not properly accounted for. Results: We develop a method to model the high-throughput data from mixed, heterogeneous samples, and to detect differential signals. Our method allows flexible statistical inference for detecting a variety of cell-type specific changes. Extensive simulation studies and analyses of two real datasets demonstrate the favorable performance of our proposed method compared with existing ones serving a similar purpose. Availability and implementation: The proposed method is implemented as an R package and is freely available on GitHub (https://github.com/ziyili20/TOAST). Supplementary information: Supplementary data are available at Bioinformatics online.
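As a rough illustration of why mixed samples need explicit modeling, the sketch below assumes a linear mixing model with two cell types and known proportions, and fits it separately in controls and cases. The abstract does not state the model, so the linear form, the function name `fit_two_cell_types`, and the toy numbers are all assumptions; this is not the TOAST implementation.

```python
def fit_two_cell_types(bulk, props):
    """Least-squares estimate of (mu_a, mu_b) under the assumed model
    bulk_i = p_i * mu_a + (1 - p_i) * mu_b, where p_i is the known
    proportion of cell type A in sample i."""
    # Entries of the 2x2 normal equations for design columns p and (1 - p).
    s_aa = sum(p * p for p in props)
    s_ab = sum(p * (1 - p) for p in props)
    s_bb = sum((1 - p) ** 2 for p in props)
    t_a = sum(p * y for p, y in zip(props, bulk))
    t_b = sum((1 - p) * y for p, y in zip(props, bulk))
    det = s_aa * s_bb - s_ab ** 2
    if det == 0:
        raise ValueError("proportions do not vary enough to separate cell types")
    mu_a = (s_bb * t_a - s_ab * t_b) / det
    mu_b = (s_aa * t_b - s_ab * t_a) / det
    return mu_a, mu_b


# Toy, noise-free data (hypothetical): cell type A changes from 10 to 14
# in cases, while cell type B stays at 5 in both groups.
ctrl_props = [0.2, 0.5, 0.8]
case_props = [0.3, 0.6, 0.9]
ctrl_bulk = [p * 10 + (1 - p) * 5 for p in ctrl_props]
case_bulk = [p * 14 + (1 - p) * 5 for p in case_props]

ctrl_a, ctrl_b = fit_two_cell_types(ctrl_bulk, ctrl_props)
case_a, case_b = fit_two_cell_types(case_bulk, case_props)
print("change in cell type A:", case_a - ctrl_a)
print("change in cell type B:", case_b - ctrl_b)
```

Comparing the raw bulk values between groups would blend the change in cell type A with differences in cell proportions; fitting per-cell-type means keeps the two apart. Real data would also need noise modeling and a formal test, which this sketch omits.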

2018 ◽  
Author(s):  
Ziyi Li ◽  
Zhijin Wu ◽  
Peng Jin ◽  
Hao Wu

Abstract Samples from clinical practices are often mixtures of different cell types. The high-throughput data obtained from these samples are thus mixed signals. The cell mixture complicates data analysis and will lead to biased results if not properly accounted for. We develop a method to model the high-throughput data from mixed, heterogeneous samples, and to detect differential signals. Our method allows flexible statistical inference for detecting a variety of cell-type specific changes. Extensive simulation studies and analyses of two real datasets demonstrate the favorable performance of our proposed method compared with existing ones serving a similar purpose.


2021 ◽  
Author(s):  
Chi Wai Yip ◽  
Divya M. Sivaraman ◽  
Anika V. Prabhu ◽  
Jay W. Shin

Abstract Recent efforts to characterize long non-coding RNAs (lncRNAs) have revealed their functional roles in modulating diverse cellular processes. These include pluripotency maintenance, lineage commitment, carcinogenesis, and pathogenesis of various diseases. By interacting with DNA, RNA and protein, lncRNAs mediate multifaceted mechanisms to regulate transcription, RNA processing, RNA interference and translation. Of more than 173000 discovered lncRNAs, the majority remain functionally unknown. The cell type-specific expression and localization of lncRNAs also suggest that they may have distinct functions across different cell types. This highlights the need to identify functional lncRNAs in different biological processes and diseases through high-throughput (HTP) screening. This review summarizes current work and perspectives on HTP screening of functional lncRNAs, discussing the different technologies, platforms, cellular responses and downstream analyses. We hope to provide a clearer picture of how different technologies can be applied to facilitate efficient functional annotation of lncRNAs.


2016 ◽  
Author(s):  
Elizabeth Baskin ◽  
Rick Farouni ◽  
Ewy A. Mathe

Abstract Summary: Regulatory elements regulate gene transcription, and their location and accessibility are cell-type specific, particularly for enhancers. Mapping and comparing chromatin accessibility between different cell types may identify mechanisms involved in cellular development and disease progression. To streamline and simplify genome-wide differential analysis of regulatory elements using chromatin accessibility data, such as DNase-seq and ATAC-seq, we developed ALTRE (ALTered Regulatory Elements), an R package and associated R Shiny web app. ALTRE makes such analysis accessible to a wide range of users, from novice to practiced computational biologists. Availability: https://github.com/Mathelab/[email protected]


2021 ◽  
Author(s):  
Yajing Hao ◽  
Changwei Shao ◽  
Guofeng Zhao ◽  
Xiang-Dong Fu

Abstract The rapid advance of high-throughput technologies has enabled the generation of two-dimensional or even multi-dimensional high-throughput data, e.g., genome-wide siRNA screen (1st dimension) for multiple changes in gene expression (2nd dimension) in many different cell types or tissues or under different experimental conditions (3rd dimension). We show that the simple Z-based statistic and derivatives are no longer suitable for analyzing such data because of the accumulation of experimental noise and/or off-target effects. Here, we introduce ZetaSuite, a statistical package designed to score and rank hits from two-dimensional screens, construct regulatory networks based on response similarities, and eliminate off-targets. Applying this method to two large cancer dependency screen datasets, we identify not only genes critical for cell fitness, but also those required for constraining cell proliferation. Strikingly, most of those cancer constraining genes function in DNA replication/repair checkpoint, suggesting that cancer cells also need to protect their genomes for long-term survival.


2020 ◽  
Author(s):  
Yupeng Wang ◽  
Rosario B. Jaime-Lara ◽  
Abhrarup Roy ◽  
Ying Sun ◽  
Xinyue Liu ◽  
...  

Abstract We propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of “strong enhancer” chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For any DNA sequence, sequential k-mer (k=5, 7, 9 and 11) fold changes relative to randomly selected non-coding sequences were used as features for deep learning models. Three deep learning models were implemented, including multi-layer perceptron (MLP), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers including gkm-SVM and DanQ, with regard to distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL is able to directly discriminate enhancers from different cell types, which has not been achieved by other enhancer classifiers. Our analysis suggests that both enhancers and their tissue-specificity can be accurately identified according to their sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.
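One plausible reading of the "sequential k-mer fold changes" described above can be sketched as follows: estimate k-mer frequencies in enhancer and background sequences, take their ratio, and replace each k-mer along a sequence with its ratio. The function names, the pseudocount, and the toy sequences are assumptions made for illustration; the SeqEnhDL code may compute its features differently.

```python
from collections import Counter
from itertools import product


def kmer_freqs(seqs, k, pseudocount=1.0):
    """Smoothed relative frequency of every possible k-mer over A/C/G/T."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip windows containing N etc.
                counts[kmer] += 1
    total = sum(counts.values()) + pseudocount * 4 ** k
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: (counts[kmer] + pseudocount) / total for kmer in all_kmers}


def kmer_fold_changes(target_seqs, background_seqs, k):
    """Ratio of k-mer frequency in target sequences to background sequences."""
    target = kmer_freqs(target_seqs, k)
    background = kmer_freqs(background_seqs, k)
    return {kmer: target[kmer] / background[kmer] for kmer in target}


def sequence_features(seq, fold_changes, k):
    """Fold change of each successive k-mer in seq (1.0 for unknown k-mers)."""
    seq = seq.upper()
    return [fold_changes.get(seq[i:i + k], 1.0)
            for i in range(len(seq) - k + 1)]


# Toy usage with k=2; the abstract uses k = 5, 7, 9 and 11.
fc = kmer_fold_changes(["AAAA"], ["CCCC"], k=2)
features = sequence_features("AACC", fc, k=2)
print(features)
```

The resulting list has one value per sequence position, which is the kind of ordered input that an MLP, CNN, or RNN could consume.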


2019 ◽  
Vol 35 (22) ◽  
pp. 4764-4766 ◽  
Author(s):  
Jonathan Cairns ◽  
William R Orchard ◽  
Valeriya Malysheva ◽  
Mikhail Spivakov

Abstract Summary: Capture Hi-C is a powerful approach for detecting chromosomal interactions involving, at least on one end, DNA regions of interest, such as gene promoters. We present Chicdiff, an R package for robust detection of differential interactions in Capture Hi-C data. Chicdiff enhances a state-of-the-art differential testing approach for count data with bespoke normalization and multiple testing procedures that account for specific statistical properties of Capture Hi-C. We validate Chicdiff on published Promoter Capture Hi-C data in human monocytes and CD4+ T cells, identifying multitudes of cell type-specific interactions, and confirming the overall positive association between promoter interactions and gene expression. Availability and implementation: Chicdiff is implemented as an R package that is publicly available at https://github.com/RegulatoryGenomicsGroup/chicdiff. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 12 (1) ◽  
Author(s):  
Wei Lin ◽  
Pawan Noel ◽  
Erkut H. Borazanci ◽  
Jeeyun Lee ◽  
Albert Amini ◽  
...  

Abstract Background: Solid tumors such as pancreatic ductal adenocarcinoma (PDAC) comprise not just tumor cells but also a microenvironment with which the tumor cells constantly interact. Detailed characterization of the cellular composition of the tumor microenvironment is critical to the understanding of the disease and treatment of the patient. Single-cell transcriptomics has been used to study the cellular composition of different solid tumor types including PDAC. However, almost all of those studies used primary tumor tissues. Methods: In this study, we employed a single-cell RNA sequencing technology to profile the transcriptomes of individual cells from dissociated primary tumors or metastatic biopsies obtained from patients with PDAC. Unsupervised clustering analysis as well as a new supervised classification algorithm, SuperCT, was used to identify the different cell types within the tumor tissues. The expression signatures of the different cell types were then compared between primary tumors and metastatic biopsies. The expression levels of the cell type-specific signature genes were also correlated with patient survival using public datasets. Results: Our single-cell RNA sequencing analysis revealed distinct cell types in primary and metastatic PDAC tissues including tumor cells, endothelial cells, cancer-associated fibroblasts (CAFs), and immune cells. The cancer cells showed high inter-patient heterogeneity, whereas the stromal cells were more homogeneous across patients. Immune infiltration varied significantly from patient to patient, with the majority of the immune cells being macrophages and exhausted lymphocytes. We found that the tumor cellular composition was an important factor in defining the PDAC subtypes. Furthermore, the expression levels of cell type-specific markers for EMT+ cancer cells, activated CAFs, and endothelial cells were significantly associated with patient survival.
Conclusions: Taken together, our work identifies significant heterogeneity in cellular compositions of PDAC tumors and between primary tumors and metastatic lesions. Furthermore, the cellular composition was an important factor in defining PDAC subtypes and significantly correlated with patient outcome. These findings provide valuable insights on the PDAC microenvironment and could potentially inform the management of PDAC patients.


2019 ◽  
Vol 47 (19) ◽  
pp. 10027-10039 ◽  
Author(s):  
Eldad David Shulman ◽  
Ran Elkon

Abstract Alternative polyadenylation (APA) is emerging as an important layer of gene regulation because the majority of mammalian protein-coding genes contain multiple polyadenylation (pA) sites in their 3′ UTR. By alteration of 3′ UTR length, APA can considerably affect post-transcriptional gene regulation. Yet, our understanding of APA remains rudimentary. Novel single-cell RNA sequencing (scRNA-seq) techniques allow molecular characterization of different cell types to an unprecedented degree. Notably, the most popular scRNA-seq protocols specifically sequence the 3′ end of transcripts. Building on this property, we implemented a method for analyzing patterns of APA regulation from such data. Analyzing multiple datasets from diverse tissues, we identified widespread modulation of APA in different cell types resulting in global 3′ UTR shortening/lengthening and enhanced cleavage at intronic pA sites. Our results provide a proof-of-concept demonstration that the huge volume of scRNA-seq data accumulating in the public domain offers a unique resource for the exploration of APA based on a very broad collection of cell types and biological conditions.


1985 ◽  
Vol 101 (4) ◽  
pp. 1442-1454 ◽  
Author(s):  
P Cowin ◽  
H P Kapprell ◽  
W W Franke

Desmosomal plaque proteins have been identified in immunoblotting and immunolocalization experiments on a wide range of cell types from several species, using a panel of monoclonal murine antibodies to desmoplakins I and II and a guinea pig antiserum to desmosomal band 5 protein. Specifically, we have taken advantage of the fact that certain antibodies react with both desmoplakins I and II, whereas others react only with desmoplakin I, indicating that desmoplakin I contains unique regions not present on the closely related desmoplakin II. While some of these antibodies recognize epitopes conserved between chick and man, others display a narrow species specificity. The results show that proteins whose size, charge, and biochemical behavior are very similar to those of desmoplakin I and band 5 protein of cow snout epidermis are present in all desmosomes examined. These include examples of simple and pseudostratified epithelia and myocardial tissue, in addition to those of stratified epithelia. In contrast, in immunoblotting experiments, we have detected desmoplakin II only among cells of stratified and pseudostratified epithelial tissues. This suggests that the desmosomal plaque structure varies in its complement of polypeptides in a cell-type specific manner. We conclude that the obligatory desmosomal plaque proteins, desmoplakin I and band 5 protein, are expressed in a coordinate fashion but independently from other differentiation programs of expression such as those specific for either epithelial or cardiac cells.


2015 ◽  
Vol 32 (6) ◽  
pp. 850-858 ◽  
Author(s):  
Sangjin Kim ◽  
Paul Schliekelman

Abstract Motivation: The advent of high throughput data has led to a massive increase in the number of hypothesis tests conducted in many types of biological studies and a concomitant increase in stringency of significance thresholds. Filtering methods, which use independent information to eliminate less promising tests and thus reduce multiple testing, have been widely and successfully applied. However, key questions remain about how to best apply them: When is filtering beneficial and when is it detrimental? How good does the independent information need to be in order for filtering to be effective? How should one choose the filter cutoff that separates tests that pass the filter from those that don’t? Result: We quantify the effect of the quality of the filter information, the filter cutoff and other factors on the effectiveness of the filter and show a number of results: If the filter has a high probability (e.g. 70%) of ranking true positive features highly (e.g. top 10%), then filtering can lead to dramatic increase (e.g. 10-fold) in discovery probability when there is high redundancy in information between hypothesis tests. Filtering is less effective when there is low redundancy between hypothesis tests and its benefit decreases rapidly as the quality of the filter information decreases. Furthermore, the outcome is highly dependent on the choice of filter cutoff. Choosing the cutoff without reference to the data will often lead to a large loss in discovery probability. However, naïve optimization of the cutoff using the data will lead to inflated type I error. We introduce a data-based method for choosing the cutoff that maintains control of the family-wise error rate via a correction factor to the significance threshold. Application of this approach offers as much as a several-fold advantage in discovery probability relative to no filtering, while maintaining type I error control. 
We also introduce a closely related method of P-value weighting that further improves performance. Availability and implementation: R code for calculating the correction factor is available at http://www.stat.uga.edu/people/faculty/paul-schliekelman. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
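The two ideas in this abstract, filtering tests before correction and weighting P-values, can be illustrated with standard textbook procedures: Bonferroni correction over a prespecified number of retained tests, and weighted Bonferroni with weights that average to one. The sketch below is a generic illustration under those assumptions, with hypothetical function names and toy values; it does not reproduce the authors' data-based cutoff selection or their correction factor.

```python
def filtered_bonferroni(pvalues, filter_scores, n_keep, alpha=0.05):
    """Keep the n_keep tests with the highest filter scores, then apply
    Bonferroni over the retained tests only. n_keep must be fixed in
    advance; choosing it from the same P-values can inflate type I error."""
    if not 1 <= n_keep <= len(pvalues):
        raise ValueError("n_keep must be between 1 and the number of tests")
    order = sorted(range(len(pvalues)),
                   key=lambda i: filter_scores[i], reverse=True)
    threshold = alpha / n_keep
    return sorted(i for i in order[:n_keep] if pvalues[i] <= threshold)


def weighted_bonferroni(pvalues, weights, alpha=0.05):
    """Reject test i when p_i <= alpha * w_i / m, after rescaling the
    non-negative weights so that they average to one."""
    m = len(pvalues)
    mean_weight = sum(weights) / m
    if mean_weight <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    return [i for i, (p, w) in enumerate(zip(pvalues, weights))
            if p <= alpha * (w / mean_weight) / m]


# Toy values (hypothetical): scores and weights stand in for independent
# prior information about which tests are promising.
pvals = [0.004, 0.03, 0.0001, 0.2, 0.011]
scores = [0.1, 0.9, 0.8, 0.2, 0.7]
kept_hits = filtered_bonferroni(pvals, scores, n_keep=3)
weighted_hits = weighted_bonferroni(pvals, [1, 3, 3, 1, 2])
print(kept_hits, weighted_hits)
```

Filtering is the special case of weighting in which discarded tests get weight zero and retained tests share the remaining weight equally, which is one way to see why the two approaches are closely related.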

