LanceOtron: a deep learning peak caller for ATAC-seq, ChIP-seq, and DNase-seq

2021 ◽  
Author(s):  
Lance D. Hentges ◽  
Martin J. Sergeant ◽  
Damien J. Downes ◽  
Jim R. Hughes ◽  
Stephen Taylor

Abstract Genomics technologies, such as ATAC-seq, ChIP-seq, and DNase-seq, have revolutionized molecular biology, generating a complete genome’s worth of signal in a single assay. Coupled with the use of genome browsers, researchers can now see and identify important DNA-encoded elements as peaks in an analog signal. Despite the ease with which humans can visually identify peaks, converting these signals into meaningful genome-wide peak calls from such massive datasets requires complex analytical techniques. Current methods use statistical frameworks to identify peaks as sites of significant signal enrichment, discounting that the analog data do not follow any archetypal distribution. Recent advances in artificial intelligence have shown great promise in image recognition, on par with or exceeding human ability, providing an opportunity to reimagine and improve peak calling. We present an interactive and intuitive peak calling framework, LanceOtron, built around image recognition using a wide and deep neural network. We hand-labelled 499 Mb of genomic data, built 5,000 models, and tested with over 100 unique users from labs around the world. In benchmarking open chromatin, transcription factor binding, and chromatin modification datasets, LanceOtron outperforms the long-standing, gold-standard peak caller MACS2 with its increased selectivity and near perfect sensitivity. Additionally, this command-line optional approach allows researchers to easily generate optimal peak calls using only a web interface. Together, the enhanced performance and usability of LanceOtron will improve the reliability and reproducibility of peak calls and subsequent data analysis. This tool highlights the general utility of applying machine learning to genomic data extraction and analysis.

2020 ◽  
Author(s):  
Nanxiang Zhao ◽  
Alan P. Boyle

Abstract Genomic and epigenomic features are captured at a genome-wide level by using high-throughput sequencing technologies. Peak calling is one of the first essential steps in analyzing these features by delineating regions such as open chromatin regions and transcription factor binding sites. Our original peak calling software, F-Seq, has been widely used and shown to be the most sensitive and accurate peak caller for DNase I hypersensitive sites sequencing (DNase-seq) data. However, F-Seq supports neither user-input control datasets nor the reporting of test statistics, limiting its ability to capture systematic and experimental biases and accurately estimate background distributions. Here we present an improved version, F-Seq2, which combines the power of kernel density estimation and a dynamic “continuous” Poisson distribution to robustly account for local biases and resolve ties when ranking candidate peaks. In F-score and motif distance analysis, we demonstrated the superior performance of F-Seq2 over other competing peak callers used by the ENCODE Consortium on simulated and real ATAC-seq and ChIP-seq datasets. The output of F-Seq2 is suitable for irreproducible discovery rate (IDR) analysis because test statistics are calculated for each individual candidate summit and ties are robustly resolved.
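As a rough illustration of the two ingredients named in the abstract above, the sketch below smooths read positions with a Gaussian kernel and then scores the highest summit with a Poisson tail probability. It is not F-Seq2's implementation: it assumes an ordinary discrete Poisson test against a genome-wide background rate rather than the dynamic "continuous" Poisson and local background described by the authors, and every function name and parameter is hypothetical.

```python
import math


def gaussian_kde_signal(read_positions, length, bandwidth):
    """Smooth read positions into a per-base signal with a Gaussian kernel."""
    signal = [0.0] * length
    reach = int(4 * bandwidth)  # ignore kernel mass beyond 4 bandwidths
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    for pos in read_positions:
        lo = max(0, pos - reach)
        hi = min(length - 1, pos + reach)
        for x in range(lo, hi + 1):
            z = (x - pos) / bandwidth
            signal[x] += norm * math.exp(-0.5 * z * z)
    return signal


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam); crude for extremely small tails."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def score_summit(read_positions, length, bandwidth, window):
    """Find the highest point of the smoothed signal and test its read count."""
    signal = gaussian_kde_signal(read_positions, length, bandwidth)
    summit = max(range(length), key=signal.__getitem__)
    lo, hi = summit - window // 2, summit + window // 2
    observed = sum(1 for p in read_positions if lo <= p <= hi)
    # Assumption: background is the genome-wide average read density.
    expected = len(read_positions) * (hi - lo + 1) / length
    return summit, observed, poisson_sf(observed, expected)


# Toy data: sparse background reads plus a cluster of 20 reads near 500.
reads = list(range(0, 2000, 100)) + [490 + i for i in range(20)]
summit, observed, p_value = score_summit(reads, 2000, 10.0, 50)
print(summit, observed, p_value)
```

Using a global background keeps the sketch short. A control-aware caller would instead estimate the expected count from a control dataset or from a local window around each summit.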


2021 ◽  
Author(s):  
Helen H Habib ◽  
Jefferson Mwaisaka ◽  
Kwasi Torpey ◽  
Ernest Tei Maya ◽  
Augustine Ankomah

Abstract Background: Intrapartum mistreatment of women is a globally rising public health and human rights phenomenon. The issue reportedly has severe maternal and neonatal outcomes including mortality, and generally leads to decreased satisfaction with maternity care. Intrapartum mistreatment, despite being ubiquitous, shows a higher incidence among adolescent parturients, who are simultaneously at a higher risk of maternal morbidity and mortality. Studies have suggested that Respectful Maternity Care (RMC) interventions reduce intrapartum mistreatment and improve clinical outcomes for women and neonates in general. However, evidence on the effect of RMC on adolescents is unclear. Hence the specific aim of this study is to synthesise the available evidence relating to the provision of RMC for adolescents during childbirth. Methods: The methodology of the proposed systematic review follows the procedural guideline depicted in the preferred reporting items for systematic review protocol. The review will include all observational and intervention studies conducted between January 1, 1990 and April 30, 2020. Electronic databases including MEDLINE, PubMed, ScienceDirect, Cochrane, CINAHL, PsycINFO, Scopus, Google Scholar, and Web of Science will be searched to retrieve available studies using the appropriate search strings. The search results will be appraised with the Joanna Briggs Institute quality assessment tool. The selection of relevant studies, data extraction, and quality assessment of individual studies will be carried out by two independent authors. Results: A systematic narrative synthesis of the resultant studies will be done, and the relevant themes extracted. Findings will also be summarised in tables. Discussion: Respectful Maternity Care for adolescents holds great promise for improved maternal and neonatal care. However, there is a gap in knowledge on the interventions that work and the extent of their effectiveness. Findings from this study will be beneficial in improving Adolescents Sexual and Reproductive Health and Rights (ASRHR) and reducing maternal mortality, especially for adolescents. Systematic review registration: PROSPERO (Submitted 21 August 2020)


2020 ◽  
Vol 48 (11) ◽  
pp. e62-e62 ◽  
Author(s):  
Qi Song ◽  
Jiyoung Lee ◽  
Shamima Akter ◽  
Matthew Rogers ◽  
Ruth Grene ◽  
...  

Abstract Recent advances in genomic technologies have generated data on large-scale protein–DNA interactions and open chromatin regions for many eukaryotic species. How to identify condition-specific functions of transcription factors using these data has become a major challenge in genomic research. To solve this problem, we have developed a method called ConSReg, which provides a novel approach to integrate regulatory genomic data into predictive machine learning models of key regulatory genes. Using Arabidopsis as a model system, we tested our approach to identify regulatory genes in data sets from single cell gene expression and from abiotic stress treatments. Our results showed that ConSReg accurately predicted transcription factors that regulate differentially expressed genes with an average auROC of 0.84, which is 23.5–25% better than enrichment-based approaches. To further validate the performance of ConSReg, we analyzed an independent data set related to plant nitrogen responses. ConSReg provided better rankings of the correct transcription factors in 61.7% of cases, which is three times better than other plant tools. We applied ConSReg to Arabidopsis single cell RNA-seq data, successfully identifying candidate regulatory genes that control cell wall formation. Our methods provide a new approach to define candidate regulatory genes using integrated genomic data in plants.
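The auROC figure quoted above can be read as the probability that a randomly chosen true regulator is ranked above a randomly chosen non-regulator. The sketch below computes that quantity directly from pairwise comparisons; the scores and labels are invented toy values and the function name is hypothetical, so it illustrates the metric only, not ConSReg itself.

```python
def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: three true regulators (label 1) and two non-regulators (label 0).
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2]
toy_labels = [1, 1, 1, 0, 0]
print(auroc(toy_scores, toy_labels))  # 5 of 6 pairs ordered correctly
```

The pairwise form is quadratic in the number of examples, which is fine for illustration. Rank-based formulas give the same value more efficiently on large datasets.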


1988 ◽  
Vol 51 (2) ◽  
pp. 203-213
Author(s):  
John Wansbrough

Use of the term ‘exegesis’ is now so general that scholars in the field of scriptural studies must have sensed an impingement upon their conventional prerogative. If, perhaps, they are justified in so doing, they might none the less be prepared to acknowledge the value of ancillary functions accumulated in its extension into areas beyond its standard application to literature. While it may be that these can be encompassed in the general shift from self-consciously ‘interpretative’ to epistemologically ‘hermeneutic’, it would seem more practical to identify as ‘exegesis’ any and every act of perception. That, of course, is facilitated by the now conventional notion of ‘text’ espoused by most practitioners of structuralism. Whether one equates every datum of perception as somehow ‘textual’ or, conversely, the perception of every text as dependent upon the totality of experience, does not really matter. ‘Exegesis’ is conveniently inclusive and may be thought of general utility in the service of every taste and all analytical techniques. As such, it is ineluctably present in every transaction of the intellect: one observes, hears, reads, and makes the necessary adjustments in aid of understanding. In the very interests of survival, one seldom elects not to understand. It is the ‘necessary adjustments’ that require description, abundantly documented in the textbooks of literary criticism: from the rhetorical ‘naming of parts’ to contemporary discourse analysis. If it seems difficult to add to that vast corpus of technical terms, it is certainly possible to take a stand in respect of their presumptive efficiency.


2019 ◽  
Author(s):  
Aseel Awdeh ◽  
Marcel Turcotte ◽  
Theodore J. Perkins

Abstract Motivation: Chromatin immunoprecipitation followed by high throughput sequencing (ChIP-seq), initially introduced more than a decade ago, is widely used by the scientific community to detect protein/DNA binding and histone modifications across the genome. Every experiment is prone to noise and bias, and ChIP-seq experiments are no exception. To alleviate bias, the incorporation of control datasets in ChIP-seq analysis is an essential step. The controls are used to account for the background signal, while the remainder of the ChIP-seq signal captures true binding or histone modification. However, a recurrent issue is different types of bias in different ChIP-seq experiments. Depending on which controls are used, different aspects of ChIP-seq bias are better or worse accounted for, and peak calling can produce different results for the same ChIP-seq experiment. Consequently, generating “smart” controls, which model the non-signal effect for a specific ChIP-seq experiment, could enhance contrast and increase the reliability and reproducibility of the results. Results: We propose a peak calling algorithm, Weighted Analysis of ChIP-seq (WACS), which is an extension of the well-known peak caller MACS2. There are two main steps in WACS: First, weights are estimated for each control using non-negative least squares regression. The goal is to customize controls to model the noise distribution for each ChIP-seq experiment. This is then followed by peak calling. We demonstrate that WACS significantly outperforms MACS2 and AIControl, another recent algorithm for generating smart controls, in the detection of enriched regions along the genome, in terms of motif enrichment and reproducibility analyses. Conclusion: This ultimately improves our understanding of ChIP-seq controls and their biases, and shows that WACS results in a better approximation of the noise distribution in controls.
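The weight-estimation step described above can be pictured on toy data as a non-negative least squares fit of binned treatment coverage against binned coverage from several controls. The sketch below is an assumption-laden illustration, not the WACS code: the coordinate-descent solver is chosen here only to keep the example dependency-free, the binned counts are invented, and all names are hypothetical.

```python
def nnls_weights(controls, treatment, iterations=500):
    """Minimise ||treatment - sum_j w_j * controls[j]||^2 subject to w_j >= 0.

    Uses cyclic coordinate descent: each weight is set to its best value
    given the others, then clipped at zero.
    """
    weights = [0.0] * len(controls)
    residual = list(treatment)  # treatment minus current fit (all weights 0)
    norms = [sum(v * v for v in control) for control in controls]
    for _ in range(iterations):
        for j, control in enumerate(controls):
            if norms[j] == 0.0:
                continue  # an all-zero control carries no information
            dot = sum(c * r for c, r in zip(control, residual))
            new_w = max(0.0, weights[j] + dot / norms[j])
            delta = new_w - weights[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, control)]
                weights[j] = new_w
    return weights


# Toy binned coverage for three hypothetical controls.
control_a = [1, 2, 3, 4, 5, 6]
control_b = [6, 5, 4, 3, 2, 1]
control_c = [1, 0, 1, 0, 1, 0]
# Background built as 2 * control_a + 0.5 * control_c, so control_b should get ~0.
background = [2 * a + 0.5 * c for a, c in zip(control_a, control_c)]
print(nnls_weights([control_a, control_b, control_c], background))
```

In practice a library routine such as SciPy's `scipy.optimize.nnls` would be the more usual choice. The weighted combination of controls would then serve as the background against which peaks are called.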


2019 ◽  
Author(s):  
Qi Song ◽  
Jiyoung Lee ◽  
Shamima Akter ◽  
Ruth Grene ◽  
Song Li

Abstract Recent advances in genomic technologies have generated large-scale protein-DNA interaction data and open chromatin regions for multiple plant species. To predict condition-specific gene regulatory networks using these data, we developed the Condition Specific Regulatory network inference engine (ConSReg), which combines heterogeneous genomic data using a sparse linear model followed by feature selection and stability selection to select key regulatory genes. Using Arabidopsis as a model system, we constructed maps of gene regulation under more than 50 experimental conditions including abiotic stresses, cell type-specific expression, and stress responses in individual cell types. Our results show that ConSReg accurately predicted gene expression (average auROC of 0.84) across multiple testing datasets. We found that (1) including open chromatin information from ATAC-seq data significantly improves the performance of ConSReg across all tested datasets; (2) the choice of negative training samples and the length of promoter regions are two key factors that affect model performance. We applied ConSReg to Arabidopsis single cell RNA-seq data of two root cell types (endodermis and cortex) and identified five regulators in the two root cell types. Four out of the five regulators have additional experimental evidence to support their roles in regulating gene expression in Arabidopsis roots. By comparing regulatory maps in abiotic stress responses and cell type-specific experiments, we revealed that transcription factors that regulate tissue-level abiotic stress responses tend to also regulate stress responses in individual cell types in plants.


Author(s):  
Lee Kuan Xin ◽  
Afnizanfaizal Abdullah

The 21st century has been deemed the era of big data, and data-driven research has become a necessity. This holds true not only in the business world but also in the biomedical field. After years of biological data extraction and derivation, and with the advancement of Next Generation Sequencing, genomic data have grown into an ambiguous giant, and the corresponding analysis has not kept pace with that growth. This has resulted in a large amount of unanalysed genomic data. These genomic data contain more than plain information: researchers have discovered the potential of non-coding variants but are still failing to identify their function. Alongside the growth in data volume, hardware and technologies have also advanced. With current technologies, we are able to implement more complex and sophisticated algorithms for analysing these genomic data. Deep learning has become a major interest of researchers, as it has been shown to achieve significant success in deriving insight in various fields. This paper aims to review the current trend of non-coding variant analysis using deep learning approaches.


2021 ◽  
Vol 3 (3) ◽  
Author(s):  
Thomas Faux ◽  
Kalle T Rytkönen ◽  
Mehrad Mahmoudian ◽  
Niklas Paulin ◽  
Sini Junttila ◽  
...  

Abstract Changes in cellular chromatin states fine-tune transcriptional output and ultimately lead to phenotypic changes. Here we propose a novel application of our reproducibility-optimized test statistics (ROTS) to detect differential chromatin states (ATAC-seq) or differential chromatin modification states (ChIP-seq) between conditions. We compare the performance of ROTS to existing and widely used methods for ATAC-seq and ChIP-seq data using both synthetic and real datasets. Our results show that ROTS outperformed other commonly used methods when analyzing ATAC-seq data. ROTS also displayed the most accurate detection of small differences when modeling with synthetic data. We observed that two-step methods that require the use of a separate peak caller often more accurately called enrichment borders, whereas one-step methods without a separate peak calling step were more versatile in calling sub-peaks. The top ranked differential regions detected by the methods had marked correlation with transcriptional differences of the closest genes. Overall, our study provides evidence that ROTS is a useful addition to the available differential peak detection methods to study chromatin and performs especially well when applied to study differential chromatin states in ATAC-seq data.


2018 ◽  
Vol 2018 (4) ◽  
pp. 104-124 ◽  
Author(s):  
Gilad Asharov ◽  
Shai Halevi ◽  
Yehuda Lindell ◽  
Tal Rabin

Abstract The growing availability of genomic data holds great promise for advancing medicine and research, but unlocking its full potential requires adequate methods for protecting the privacy of individuals whose genome data we use. One example of this tension is running Similar Patient Query on remote genomic data: In this setting a doctor who holds the genome of his/her patient may try to find other individuals with “close” genomic data, and use the data of these individuals to help diagnose and find effective treatment for that patient’s conditions. This is clearly a desirable mode of operation. However, the privacy exposure implications are considerable, and so we would like to carry out the above “closeness” computation in a privacy preserving manner. In this work we put forward a new approach for highly efficient secure computation for computing an approximation of the Similar Patient Query problem. We present contributions on two fronts. First, an approximation method that is designed with the goal of achieving efficient private computation. Second, further optimizations of the two-party protocol. Our tests indicate that the approximation method works well: it returns the exact closest records in 98% of the queries and a very good approximation otherwise. As for speed, our protocol implementation takes just a few seconds to run on databases with thousands of records, each of length thousands of alleles, and it scales almost linearly with both the database size and the length of the sequences in it. As an example, in the datasets of the recent iDASH competition, after a one-time preprocessing of around 12 seconds, it takes around a second to find the nearest five records to a query, in a size-500 dataset of length-3500 sequences. This is 2-3 orders of magnitude faster than using state-of-the-art secure protocols with existing edit distance algorithms.
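For orientation, the plaintext task that the protocol above approximates securely can be sketched as an exact edit-distance search for the k closest records. The example below has no privacy protection at all and does not reproduce the authors' approximation or their two-party protocol; it only shows the non-private baseline, with toy sequences and hypothetical names.

```python
import heapq


def edit_distance(a, b):
    """Levenshtein distance between two sequences, using two DP rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def nearest_records(query, database, k):
    """Return the k (distance, record_id) pairs closest to the query."""
    scored = ((edit_distance(query, seq), rid) for rid, seq in database.items())
    return heapq.nsmallest(k, scored)


# Toy database of hypothetical patient sequences.
toy_db = {
    "p1": "ACGTACGT",
    "p2": "ACGTTCGT",
    "p3": "TTTTTTTT",
    "p4": "ACGACGT",
}
print(nearest_records("ACGTACGT", toy_db, 2))
```

Each exact comparison costs time proportional to the product of the two sequence lengths, which helps explain why the abstract stresses an approximation designed for efficient private computation.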


2021 ◽  
Vol 13 (7) ◽  
pp. 164
Author(s):  
Tony Gwyn ◽  
Kaushik Roy ◽  
Mustafa Atay

In the realm of computer security, the username/password standard is becoming increasingly antiquated. Usage of the same username and password across various accounts can leave a user open to potential vulnerabilities. Authentication methods of the future need to maintain the ability to provide secure access without a reduction in speed. Facial recognition technologies are quickly becoming integral parts of user security, allowing for a secondary level of user authentication. Augmenting traditional username and password security with facial biometrics has already seen impressive results; however, studying these techniques is necessary to determine how effective these methods are within various parameters. A Convolutional Neural Network (CNN) is a powerful classification approach which is often used for image identification and verification. Quite recently, CNNs have shown great promise in the area of facial image recognition. The comparative study proposed in this paper offers an in-depth analysis of several state-of-the-art deep learning-based facial recognition technologies, to determine via accuracy and other metrics which of those are most effective. In our study, VGG-16 and VGG-19 showed the highest levels of image recognition accuracy, as well as F1-Score. The most favorable configurations of CNN should be documented as an effective way to potentially augment the current username/password standard by increasing the current method’s security with additional facial biometrics.

