motif prediction
Recently Published Documents


TOTAL DOCUMENTS

36
(FIVE YEARS 7)

H-INDEX

9
(FIVE YEARS 1)

2021 ◽  
Author(s):  
Bishoy Wadie ◽  
Vitalii Kleshchevnikov ◽  
Elissavet Sandaltzopoulou ◽  
Caroline Benz ◽  
Evangelia Petsalaki

Linear motifs have an integral role in dynamic cell functions, including cell signalling, the cell cycle and others. However, due to their small size, low complexity, degenerate nature, and frequent mutations, identifying novel functional motifs is a challenging task. Viral proteins rely extensively on the molecular mimicry of cellular linear motifs for modifying cell signalling and other processes in ways that favour viral infection. This study aims to discover human linear motifs that have also convergently evolved in disordered regions of viral proteins, under the hypothesis that these will be enriched in functional motif instances. We systematically apply computational motif prediction, combined with the implementation of several functional and structural filters, to the most recent publicly available human-viral and human-human protein interaction networks. By limiting the search space to the sequences of viral proteins, we observed an increase in the sensitivity of motif prediction, as well as improved enrichment in known instances compared to the same analysis using only human protein interactions. We identified more than 8,400 motif instances at various confidence levels, 105 of which were supported by all functional and structural filters applied. Overall, we provide a pipeline to improve the identification of functional linear motifs from interactomics datasets and a comprehensive catalogue of putative human motifs that can contribute to our understanding of the human domain-linear motif code and the mechanisms of viral interference with it.
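
As a rough illustration of what scanning a protein sequence for linear motif instances can look like, the sketch below treats a motif as a regular expression. It is not the pipeline described in the abstract: the function name `find_motif_instances`, the example sequence, and the PxxP-style pattern are all assumptions chosen for illustration.

```python
import re

def find_motif_instances(sequence, pattern):
    """Return (1-based start, matched text) for every match, overlaps included."""
    # A zero-width lookahead lets overlapping instances be reported too.
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]

# Hypothetical sequence and pattern, for illustration only.
seq = "MAPSRPLPAPPKQ"
hits = find_motif_instances(seq, "P..P")
for start, text in hits:
    print(start, text)
```

Short, degenerate patterns like this one match by chance very often, which is why the abstract stresses additional functional and structural filters.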


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e10207
Author(s):  
Richard John Tiika ◽  
Jia Wei ◽  
Rui Ma ◽  
Hongshan Yang ◽  
Guangxin Cui ◽  
...  

Background: The WRKY gene family, one of the major transcription factor families in plants, plays crucial regulatory roles in physiological and biological developmental processes, and the adaptation of plants to the environment. However, the systematic study of WRKY structure, expression profiling, and regulatory functions has not been extensively reported in Lycium ruthenicum, although these aspects have been comprehensively studied in most plant species. Methods: In this study, the WRKY genes were identified from a L. ruthenicum transcriptome database by using bioinformatics. The identification, phylogenetic analysis, zinc-finger structures, and conserved motif prediction were extensively explored. Moreover, the expression levels of 23 selected genes with fragments per kilobase of exons per million mapped reads (FPKM) >5 were assayed during different fruit developmental stages with real-time quantitative polymerase chain reaction (RT-qPCR). Results: A total of 73 putative WRKY proteins in the L. ruthenicum transcriptome database were identified and examined. Forty-four proteins with the WRKY domain were identified and divided into three major groups with several subgroups, in accordance with those in other plant species. All 44 LrWRKY proteins contained one or two conserved WRKY domains and a zinc-finger structure. Conserved motif prediction revealed conservation of the WRKY DNA-binding domain in L. ruthenicum proteins. The selected LrWRKY genes exhibited discrete expression patterns during different fruit developmental stages. Interestingly, five LrWRKYs (-20, -21, -28, -30, and -31) were expressed remarkably throughout the fruit developmental stages. Discussion: Our results reveal the characteristics of the LrWRKY gene family, thus laying a foundation for further functional analysis of the WRKY family in L. ruthenicum.
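
The FPKM threshold mentioned in the Methods can be made concrete with a small sketch. It uses the standard FPKM definition (fragments, scaled per kilobase of transcript and per million mapped fragments); the gene names and counts are hypothetical and are not taken from the study.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million mapped fragments."""
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million).
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)

# Hypothetical counts: gene -> (fragments mapped to gene, exon length in bp).
counts = {"geneA": (500, 2000), "geneB": (40, 1000)}
total_mapped = 20_000_000  # hypothetical library size

values = {g: fpkm(n, length, total_mapped) for g, (n, length) in counts.items()}
selected = [g for g, v in values.items() if v > 5]  # same cut-off as the abstract
print(values, selected)
```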


2019 ◽  
Author(s):  
Alexandra Grote ◽  
Yichao Li ◽  
Canhui Liu ◽  
Denis Voronin ◽  
Adam Geber ◽  
...  

Abstract: Filarial nematodes can cause debilitating diseases in humans. They have complicated life cycles involving an insect vector and mammalian hosts, and they go through a number of developmental molts. While whole genome sequences of parasitic worms are now available, very little is known about transcription factor (TF) binding sites and their cognate transcription factors that play a role in regulating development. To address this gap, we developed a novel motif prediction pipeline, Emotif Alpha, that integrates ten different motif discovery algorithms, multiple statistical tests, and a comparative analysis of conserved elements between the filarial worms Brugia malayi and Onchocerca volvulus, and the free-living nematode Caenorhabditis elegans. We identified stage-specific TF binding motifs in B. malayi, with a particular focus on those potentially involved in the L3-L4 molt, a stage important for the establishment of infection in the mammalian host. Using an in vitro molting system, we tested and validated three of these motifs, demonstrating the accuracy of the motif prediction pipeline.


2019 ◽  
Vol 35 (21) ◽  
pp. 4405-4407 ◽  
Author(s):  
Steven Monger ◽  
Michael Troup ◽  
Eddie Ip ◽  
Sally L Dunwoodie ◽  
Eleni Giannoulatou

Abstract. Motivation: In silico prediction tools are essential for identifying variants which create or disrupt cis-splicing motifs. However, there are limited options for genome-scale discovery of splice-altering variants. Results: We have developed Spliceogen, a highly scalable pipeline integrating predictions from some of the individually best performing models for splice motif prediction: MaxEntScan, GeneSplicer, ESRseq and Branchpointer. Availability and implementation: Spliceogen is available as a command line tool which accepts VCF/BED inputs and handles both single nucleotide variants (SNVs) and indels (https://github.com/VCCRI/Spliceogen). SNV databases with prediction scores are also available, covering all possible SNVs at all genomic positions within all Gencode-annotated multi-exon transcripts. Supplementary information: Supplementary data are available at Bioinformatics online.


Biotechnology ◽  
2019 ◽  
pp. 1069-1085
Author(s):  
Andrei Lihu ◽  
Ștefan Holban

De novo motif discovery is essential in understanding the cis-regulatory processes that play a role in gene expression. Finding unknown patterns of unknown lengths in massive amounts of data has long been a major challenge in computational biology. Because algorithms for motif prediction have always suffered from poor performance, there is a constant effort to find better techniques. Evolutionary methods, including swarm intelligence algorithms, have been applied to motif prediction with limited success. However, recently developed methods, such as the Fireworks Algorithm (FWA), which simulates the explosion process of fireworks, may show better prospects. This paper describes a motif finding algorithm based on FWA that maximizes the Kullback-Leibler divergence between candidate solutions and the background noise. Following the terminology of FWA's framework, the candidate motifs are fireworks that generate additional sparks (i.e. derived motifs) in their neighborhood. During the iterations, better sparks can replace the fireworks, as the Fireworks Motif Finder (FW-MF) assumes a one occurrence per sequence mode. The results obtained on a standard benchmark for promoter analysis show that our proof of concept is promising.
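
The objective named in the abstract, the Kullback-Leibler divergence between a candidate motif and the background, can be sketched as a scoring function. This is only the scoring step under stated assumptions (a set of aligned sites of equal width, a per-base background distribution, and an additive pseudocount); it does not reproduce the fireworks search itself, and FW-MF may define its objective differently.

```python
import math
from collections import Counter

def kl_divergence(sites, background, pseudocount=1.0):
    """Sum over motif columns of KL(column frequencies || background), in bits."""
    alphabet = sorted(background)
    width = len(sites[0])
    denom = len(sites) + pseudocount * len(alphabet)
    total = 0.0
    for pos in range(width):
        counts = Counter(site[pos] for site in sites)
        for base in alphabet:
            p = (counts[base] + pseudocount) / denom  # smoothed column frequency
            total += p * math.log2(p / background[base])
    return total

# Hypothetical uniform background and two toy candidate motifs.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
conserved = kl_divergence(["ACGT"] * 4, uniform)
mixed = kl_divergence(["ACGT", "CGTA", "GTAC", "TACG"], uniform)
print(conserved, mixed)
```

A search procedure would prefer the first candidate, because its columns deviate most from the background, while the second is indistinguishable from it.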


2018 ◽  
Vol 466 ◽  
pp. 25-43 ◽  
Author(s):  
Nung Kion Lee ◽  
Xi Li ◽  
Dianhui Wang

2018 ◽  
Author(s):  
Jinyu Yang ◽  
Adam D. Hoppe ◽  
Bingqiang Liu ◽  
Qin Ma

ABSTRACT: Identification of transcription factor binding sites (TFBSs) and cis-regulatory motifs (motifs for short) from genomics datasets provides a powerful view of the rules governing the interactions between TFs and DNA. Existing motif prediction methods, however, are limited by high false positive rates in TFBS identification, contributions from non-sequence-specific binding, and complex and indirect binding mechanisms. High throughput next-generation sequencing data provides unprecedented opportunities to overcome these difficulties, as it provides multiple whole-genome scale measurements of TF binding information. Uncovering this information brings new computational and modeling challenges in high-dimensional data mining and heterogeneous data integration. To improve TFBS identification and novel motif prediction accuracy in the human genome, we developed an advanced computational technique based on deep learning (DL) and high-performance computing, named DESSO. DESSO utilizes a deep neural network and a binomial distribution to optimize the motif prediction. Our results showed that DESSO outperformed existing tools in predicting distinct motifs from the 690 in vivo ENCODE ChIP-Sequencing (ChIP-Seq) datasets for 161 human TFs in 91 cell lines. We also found that protein-protein interactions (PPIs) are prevalent among human TFs, and a total of 61 potential tethering bindings were identified among the 100 TFs in the K562 cell line. To further expand DESSO’s deep-learning capabilities, we included DNA shape features and found that (i) shape information has a strong predictive power for TF-DNA binding specificity; and (ii) it aided in the identification of the shape motifs recognized by human TFs, which in turn contributed to the interpretation of TF-DNA binding in the absence of sequence recognition. DESSO and the analyses it enabled will continue to improve our understanding of how gene expression is controlled by TFs and the complexities of DNA binding.
The source code and the predicted motifs and TFBSs from the 690 ENCODE TF ChIP-Seq datasets are freely available at the DESSO web server: http://bmbl.sdstate.edu/DESSO.
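
The abstract does not say how DESSO applies the binomial distribution. One common use in motif analysis, shown here purely as an assumption, is an upper-tail test of whether a candidate motif occurs in more sequences than a background rate would predict. The function name and the numbers are hypothetical.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical: motif found in 30 of 100 peak sequences; background rate 0.1.
p_value = binomial_upper_tail(30, 100, 0.1)
print(p_value)
```

A small tail probability suggests the motif is enriched relative to the assumed background rate; the exact summation is adequate for modest n, while very large datasets would normally call for a numerical library.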

