motif finding
Recently Published Documents


TOTAL DOCUMENTS

131
(FIVE YEARS 23)

H-INDEX

17
(FIVE YEARS 2)

F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 517
Author(s):  
Len Taing ◽  
Gali Bai ◽  
Clara Cousins ◽  
Paloma Cejas ◽  
Xintao Qiu ◽  
...  

Motivation: The chromatin profile measured by ATAC-seq, ChIP-seq, or DNase-seq experiments can identify genomic regions critical in regulating gene expression and provide insights into biological processes such as disease and development. However, quality control and processing of chromatin profiling data involve many steps, and different bioinformatics tools are used at each step, which makes the analysis challenging to manage. Results: We developed a Snakemake pipeline called CHIPS (CHromatin enrIchment ProcesSor) to streamline the processing of ChIP-seq, ATAC-seq, and DNase-seq data. The pipeline supports single- and paired-end data and can start from either FASTQ or BAM files. It includes basic steps such as read trimming, mapping, and peak calling. In addition, it calculates quality control metrics such as contamination profiles, the polymerase chain reaction (PCR) bottleneck coefficient, the fraction of reads in peaks, the percentage of peaks overlapping the union of public DNaseI hypersensitivity sites, and the conservation profile of the peaks. For downstream analysis, it carries out peak annotation, motif finding, and regulatory potential calculation for all genes. The pipeline ensures that the processing is robust and reproducible. Availability: CHIPS is available at https://github.com/liulab-dfci/CHIPS.
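Two of the quality control metrics named in this abstract have simple, widely used definitions, and a small sketch may make them concrete. The code below is not taken from CHIPS; it assumes reads are reduced to hypothetical `(chrom, position)` tuples and peaks to half-open `(chrom, start, end)` intervals, and it ignores strand, which real tools usually take into account.

```python
from collections import Counter


def pcr_bottleneck_coefficient(read_positions):
    """PBC = locations covered by exactly one read / distinct locations covered.

    Values near 1 suggest a complex library; low values suggest PCR duplication.
    """
    counts = Counter(read_positions)
    if not counts:
        raise ValueError("no reads supplied")
    single = sum(1 for n in counts.values() if n == 1)
    return single / len(counts)


def fraction_of_reads_in_peaks(read_positions, peaks):
    """FRiP = reads falling inside any peak / total reads."""
    if not read_positions:
        raise ValueError("no reads supplied")
    in_peaks = sum(
        1
        for chrom, pos in read_positions
        if any(c == chrom and start <= pos < end for c, start, end in peaks)
    )
    return in_peaks / len(read_positions)


# Made-up toy data, for illustration only.
reads = [("chr1", 100), ("chr1", 100), ("chr1", 150), ("chr1", 900), ("chr2", 40)]
peaks = [("chr1", 90, 200), ("chr2", 500, 600)]

pbc = pcr_bottleneck_coefficient(reads)
frip = fraction_of_reads_in_peaks(reads, peaks)
print(f"PBC = {pbc:.2f}, FRiP = {frip:.2f}")
```

On this toy input, three of the four distinct locations carry a single read and three of the five reads fall in a peak. The linear scan over peaks is only meant for illustration; real data would need an interval index.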


2021 ◽  
Author(s):  
Mohammad Vahed ◽  
Majid Vahed ◽  
Lana Garmire

Motif discovery and characterization are important for gene regulation analysis. The lack of intuitive and integrative web servers impedes effective use of motifs. Here we describe Bipartite Motifs Learning (BML), a web server that provides a user-friendly portal for online discovery and analysis of sequence motifs, using high-throughput sequencing data as the input. BML utilizes both a position weight matrix (PWM) and a dinucleotide weight matrix (DWM), the latter of which can express the interdependencies of neighboring bases. When input parameters concerning the motifs are given, BML achieves significantly higher accuracy than other available tools for motif finding. When no parameters are given, as may be the case for non-expert users, BML, unlike other tools, employs a learning method to identify motifs automatically and achieves accuracy comparable to the scenario where the parameters are set. The BML web server is freely available at http://motif.t-ridership.com/.
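The difference between a PWM and a DWM can be illustrated with a short sketch. This is not BML's implementation; it is a generic log-odds construction under assumed uniform background frequencies and a pseudocount of 1, and all function names are hypothetical. The toy sites are chosen so that the first two positions are correlated (either `AC` or `TG`), which a PWM cannot see but a DWM can.

```python
import math
from collections import Counter

BASES = "ACGT"
PAIRS = [a + b for a in BASES for b in BASES]


def position_weight_matrix(sites, pseudocount=1.0):
    """Per-position log2 odds of each base against a uniform background."""
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        pwm.append({b: math.log2((counts[b] + pseudocount) / total / 0.25) for b in BASES})
    return pwm


def dinucleotide_weight_matrix(sites, pseudocount=1.0):
    """Log2 odds of each adjacent base pair against a uniform background."""
    total = len(sites) + pseudocount * len(PAIRS)
    dwm = []
    for i in range(len(sites[0]) - 1):
        counts = Counter(site[i:i + 2] for site in sites)
        dwm.append({p: math.log2((counts[p] + pseudocount) / total / (1 / 16)) for p in PAIRS})
    return dwm


def score_pwm(pwm, seq):
    return sum(column[base] for column, base in zip(pwm, seq))


def score_dwm(dwm, seq):
    return sum(column[seq[i:i + 2]] for i, column in enumerate(dwm))


# Made-up aligned binding sites: positions 0 and 1 co-vary (AC or TG).
sites = ["ACGT", "ACGT", "TGGT", "TGGT"]
pwm = position_weight_matrix(sites)
dwm = dinucleotide_weight_matrix(sites)

for candidate in ("ACGT", "AGGT"):
    print(candidate, round(score_pwm(pwm, candidate), 3), round(score_dwm(dwm, candidate), 3))
```

The PWM gives `ACGT` and `AGGT` the same score because each position is scored independently, while the DWM scores `ACGT` higher because the pair `AG` never occurs at the first two positions of the training sites.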


2021 ◽  
Author(s):  
Len Taing ◽  
Clara Cousins ◽  
Gali Bai ◽  
Paloma Cejas ◽  
Xintao Qiu ◽  
...  

Motivation: The chromatin profile measured by ATAC-seq, ChIP-seq, or DNase-seq experiments can identify genomic regions critical in regulating gene expression and provide insights into biological processes such as disease and development. However, quality control and processing of chromatin profiling data involve many steps, and different bioinformatics tools are used at each step, which makes the analysis challenging to manage. Results: We developed a Snakemake pipeline called CHIPS (CHromatin enrichment Processor) to streamline the processing of ChIP-seq, ATAC-seq, and DNase-seq data. The pipeline supports single- and paired-end data and can start from either FASTQ or BAM files. It includes basic steps such as read trimming, mapping, and peak calling. In addition, it calculates quality control metrics such as contamination profiles, the PCR bottleneck coefficient, the fraction of reads in peaks, the percentage of peaks overlapping the union of public DNaseI hypersensitivity sites, and the conservation profile of the peaks. For downstream analysis, it carries out peak annotation, motif finding, and regulatory potential calculation for all genes. The pipeline ensures that the processing is robust and reproducible. Availability: CHIPS is available at https://github.com/liulab-dfci/CHIPS


2020 ◽  
Author(s):  
Yan Wang ◽  
Shuangquan Zhang ◽  
Anjun Ma ◽  
Cankun Wang ◽  
Zhenyu Wu ◽  
...  

Cis-regulatory motif finding is a crucial step in the detection of gene regulatory mechanisms using genomic data. Deep learning (DL) models have been used to identify motifs de novo and have been shown to outperform traditional methods. By 2020, twenty DL models had been developed to identify DNA and RNA motifs, with diverse framework designs and implementation styles. Hence, it is beneficial to systematically compare their performance, which can help researchers select appropriate tools for their motif analyses. Here, we carried out an in-depth assessment of the 20 models using 1,043 genomic sequencing datasets, including 690 ENCODE ChIP-Seq, 126 cancer ChIP-Seq, 172 single-cell cleavage under targets and release using nuclease, and 55 RNA CLIP-Seq datasets. Four metrics were designed and investigated: the accuracy of motif finding, the performance of DNA/RNA sequence classification, algorithm scalability, and tool usability. The assessment results demonstrated the high complementarity of the existing models, and it was determined that the most suitable model depends primarily on the data size and type as well as the model outputs. A webserver was developed to allow efficient access to the identified motifs and effective use of high-performing DL models.
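The abstract does not describe the internals of the 20 models. As general background only, many DL motif finders are built around a convolutional layer whose filters slide over one-hot encoded sequence and behave much like weight matrices. The sketch below shows that single operation in plain Python; the filter is set by hand to favour the hypothetical motif `TGA` rather than learned, and nothing here is taken from any of the assessed tools.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows in A, C, G, T order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def conv_scan(encoded, kernel):
    """Slide the kernel along the sequence (no padding) and apply ReLU."""
    width = len(kernel)
    activations = []
    for start in range(len(encoded) - width + 1):
        total = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(len(BASES))
        )
        activations.append(max(0.0, total))
    return activations


def global_max_pool(activations):
    """Return (position, value) of the strongest activation."""
    best = max(range(len(activations)), key=activations.__getitem__)
    return best, activations[best]


# Hand-set filter: +1 for the favoured base at each position, -1 otherwise.
kernel = [[1.0 if b == target else -1.0 for b in BASES] for target in "TGA"]

activations = conv_scan(one_hot("CCTGACC"), kernel)
position, strength = global_max_pool(activations)
print(activations, position, strength)
```

In a trained model the filter weights would be fitted from data, and the windows that activate a filter most strongly are typically aligned afterwards to recover a motif.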


2020 ◽  
Vol 6 (31) ◽  
pp. eabb3350
Author(s):  
Zhuokun Li ◽  
Xiaojue Wang ◽  
Dongyang Xu ◽  
Dengwei Zhang ◽  
Dan Wang ◽  
...  

Here, we report a sensitive DocMF system that uses next-generation sequencing chips to profile protein-DNA interactions. Using DocMF, we successfully identified a variety of endonuclease recognition sites and the protospacer adjacent motif (PAM) sequences of different CRISPR systems. DocMF can simultaneously screen both 5′ and 3′ PAMs with high coverage. For SpCas9, we found the noncanonical PAMs 5′-NAG-3′ (~5%) and 5′-NGA-3′ (~1.6%), in addition to its common PAM, 5′-NGG-3′ (~89.9%). More relaxed PAM sequences of two uncharacterized Cas endonucleases, VeCas9 and BvCas12a, were extensively characterized using DocMF. Moreover, we observed that dCas9, a DNA binding protein lacking endonuclease activity, preferentially bound to the previously reported 5′-NGG-3′ sequence. In summary, our studies demonstrate that DocMF is the first tool with the capacity to exhaustively assay both the binding and the cutting properties of different DNA binding proteins.
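PAM notation such as 5′-NGG-3′ uses IUPAC codes, where N stands for any base. The sketch below shows how observed three-base PAMs could be tallied against such patterns to obtain percentages like those quoted above. It is only an illustration: the observed sequences are made up, the function names are hypothetical, and this is not the DocMF analysis code.

```python
from collections import Counter

# Subset of IUPAC nucleotide codes sufficient for this example.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "R": "AG", "Y": "CT"}


def matches(pattern, seq):
    """True if seq is the same length as pattern and satisfies every IUPAC code."""
    return len(pattern) == len(seq) and all(
        base in IUPAC[code] for code, base in zip(pattern, seq)
    )


def pam_fractions(observed_pams, patterns):
    """Fraction of observed PAMs matching each pattern (first match wins)."""
    if not observed_pams:
        raise ValueError("no PAM sequences supplied")
    tally = Counter()
    for pam in observed_pams:
        label = next((p for p in patterns if matches(p, pam)), "other")
        tally[label] += 1
    return {label: n / len(observed_pams) for label, n in tally.items()}


# Made-up observations, written 5' to 3'.
observed = ["AGG", "TGG", "CGG", "GGG", "TAG", "CGA", "TTT", "AGG"]
fractions = pam_fractions(observed, ["NGG", "NAG", "NGA"])
print(fractions)
```

The three patterns used here are mutually exclusive, so the first-match rule does not affect the result; with overlapping patterns the order in `patterns` would matter.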


Author(s):  
Kaijian Xia ◽  
Xiang Wu ◽  
Yaqing Mao ◽  
Huanhuan Wang
Keyword(s):  

2020 ◽  
Vol 22 (10) ◽  
pp. 683-693 ◽  
Author(s):  
Xun Wang ◽  
Shudong Wang ◽  
Tao Song

Background: Genes are known as functional patterns in the genome and are presumed to have biological significance. They can indicate binding sites for transcription factors, and they encode certain proteins. Finding genes in biological sequences is a major task in computational biology for unraveling the mechanisms of gene expression. Objective: Planted motif finding problems are a class of mathematical models abstracted from the process of detecting genes in a genome: a specific gene with a number of mutations is planted into a randomly generated background sequence, and gene finding algorithms are then tested to check whether the planted gene can be found in feasible time. Method: In this work, a spectral rotation method based on the triplet periodicity property is proposed to solve planted motif finding problems. Results: The proposed method shows significant tolerance of base mutations in genes. Specifically, genes having a number of substitutions can be detected in randomly generated background sequences. Experimental results on a genomic data set from Saccharomyces cerevisiae reveal that genes can be visually distinguished. It is proposed that genes with about 50% mutations can be detected in randomly generated background sequences. Conclusion: It is found that with about 5 insertions or deletions, this method fails to find the planted genes. In the particular case where the deleted bases are located at the beginning of the gene, that is, bases are not randomly deleted, the tolerance of the method for base deletion is increased.
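The planted motif setup described in the Objective can be sketched in a few lines: a pattern is mutated, hidden in random background sequences, and a search procedure is asked to recover it. The code below generates such an instance with substitutions only and then runs a plain exhaustive median-string search. The exhaustive search is a stand-in chosen for brevity, not the spectral rotation method of the paper, and the motif, lengths, and seed are arbitrary assumptions.

```python
import itertools
import random

BASES = "ACGT"


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def mutate(motif, d, rng):
    """Substitute exactly d positions with a different base."""
    chars = list(motif)
    for i in rng.sample(range(len(chars)), d):
        chars[i] = rng.choice([b for b in BASES if b != chars[i]])
    return "".join(chars)


def plant(motif, d, n_seqs, seq_len, rng):
    """Hide one mutated copy of motif at a random offset in each random sequence."""
    seqs = []
    for _ in range(n_seqs):
        background = [rng.choice(BASES) for _ in range(seq_len)]
        start = rng.randrange(seq_len - len(motif) + 1)
        background[start:start + len(motif)] = mutate(motif, d, rng)
        seqs.append("".join(background))
    return seqs


def total_distance(candidate, seqs):
    """Sum over sequences of the best Hamming distance to any window."""
    k = len(candidate)
    return sum(
        min(hamming(candidate, s[i:i + k]) for i in range(len(s) - k + 1))
        for s in seqs
    )


def median_string(seqs, k):
    """Exhaustively try all 4**k candidates; feasible only for small k."""
    candidates = ("".join(p) for p in itertools.product(BASES, repeat=k))
    return min(candidates, key=lambda c: total_distance(c, seqs))


rng = random.Random(0)  # fixed seed so the toy instance is reproducible
motif, d = "ACGTA", 1
seqs = plant(motif, d, n_seqs=5, seq_len=30, rng=rng)
found = median_string(seqs, len(motif))
print(found, total_distance(found, seqs), total_distance(motif, seqs))
```

The search returns a string whose total distance is no worse than that of the planted motif, but with short motifs and short backgrounds it is not guaranteed to be the planted motif itself, since a random background can contain an equally good match by chance. The cost also grows as 4 to the power of the motif length, which is one reason faster methods are studied.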

