motif occurrence
Recently Published Documents


Total documents: 15 (past five years: 4)
H-index: 2 (past five years: 1)

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Bjørn André Bredesen ◽  
Marc Rehmsmeier

Abstract
Background: Cis-regulatory elements (CREs) are DNA sequence segments that regulate gene expression. Among CREs are promoters, enhancers, Boundary Elements (BEs) and Polycomb Response Elements (PREs), all of which are enriched in specific sequence motifs that form particular occurrence landscapes. We have recently introduced a hierarchical machine learning approach (SVM-MOCCA) in which Support Vector Machines (SVMs) are applied on the level of individual motif occurrences, modelling local sequence composition, and then combined for the prediction of whole regulatory elements. We used SVM-MOCCA to predict PREs in Drosophila and found that it was superior to other methods. However, we did not publish a polished implementation of SVM-MOCCA, which would be useful to other researchers, and we only tested SVM-MOCCA with IUPAC motifs and PREs.
Results: We here present an expanded suite for modelling CRE sequences in terms of motif occurrence combinatorics: Motif Occurrence Combinatorics Classification Algorithms (MOCCA). MOCCA contains efficient implementations of several modelling methods, including SVM-MOCCA, and a new method, RF-MOCCA, a Random Forest derivative of SVM-MOCCA. We used SVM-MOCCA and RF-MOCCA to model Drosophila PREs and BEs in cross-validation experiments, making this the first study to model PREs with Random Forests and the first study to apply the hierarchical MOCCA approach to the prediction of BEs. Both models significantly improve generalization to PREs and BEs beyond that of previous methods, including 4-spectrum and motif occurrence frequency Support Vector Machines and Random Forests, with RF-MOCCA yielding the best results.
Conclusion: MOCCA is a flexible and powerful suite of tools for the motif-based modelling of CRE sequences in terms of motif composition. MOCCA can be applied to any new CRE modelling problem where motifs have been identified. MOCCA supports IUPAC and Position Weight Matrix (PWM) motifs. For ease of use, MOCCA implements generation of negative training data, and additionally a mode that requires only that the user specify positives, motifs and a genome. MOCCA is licensed under the MIT license and is available on GitHub at https://github.com/bjornbredesen/MOCCA.
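
To make the notion of IUPAC motif occurrences concrete, the following is a minimal Python sketch, not MOCCA's implementation. It scans the forward strand only and computes a simple occurrences-per-kilobase feature. The function names and the example motif are hypothetical and chosen for illustration; only the IUPAC code table is standard.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Compile an IUPAC motif into a lookahead regex so overlapping hits are found."""
    body = "".join(IUPAC[code] for code in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_occurrences(sequence, motif):
    """Return 0-based start positions of all matches on the forward strand."""
    pattern = iupac_to_regex(motif)
    return [match.start() for match in pattern.finditer(sequence.upper())]


def occurrence_frequencies(sequence, motifs):
    """Return occurrences per kilobase for each named motif (hypothetical feature)."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    length_kb = len(sequence) / 1000.0
    return {
        name: len(find_occurrences(sequence, motif)) / length_kb
        for name, motif in motifs.items()
    }


if __name__ == "__main__":
    # Illustrative sequence and motif, not taken from the paper.
    print(find_occurrences("GAGAGAGTT", "GAGAG"))
    print(occurrence_frequencies("GAGAGAGTT" * 10, {"example": "GAGAG"}))
```

A hierarchical approach such as the one described would go further and model the sequence context around each occurrence; this sketch stops at locating and counting them.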


Author(s):  
Jiayu Zhang ◽  
Zhen Shen ◽  
Zheyu Song ◽  
Jian Luan ◽  
Yezhou Li ◽  
...  

Abstract
Background: Colon cancer is still the most commonly diagnosed malignancy and a leading cause of death worldwide. Apart from living habits, genetic and epigenetic changes are key factors influencing the risk of colon cancer. However, the impact of epigenetic alterations in non-coding RNAs and their consequences for colon cancer have not been fully characterized.
Methods: We detected differential methylation sites (DMSs) in lncRNA promoters and identified lncQTMs by association test. To investigate TF binding affected by DNA methylation, we characterized the occurrence of known TF motifs, collected from the MEME Suite, among the DMSs. We further combined methylome and transcriptome data to construct TF-methylation-lncRNA relationships. To study the role of lncRNAs in drug response, we used pharmacological and lncRNA profiles derived from CCLE and predicted drug response from lncRNA expression levels. We also used combinations of TF-methylation-lncRNA relationships to stratify patient survival with a risk model.
Results: DNA methylation displays a global hypermethylation pattern in lncRNA promoters and tends to be negatively related to the corresponding lncRNAs. Negative lncQTMs located near the TSS have more significant and stronger correlations with the corresponding lncRNAs. Some lncRNAs mediated by the interplay between DNA methylation and TFs are proven markers for colon cancer. In particular, the lncRNAs CAHM, RP11-834C11.4 and LINC00460 are good predictors of response to five drug compounds (17-AAG, Sorafenib, TKI258, RAF265, Topotecan) in colon cancer. We also found that the HES1_cg24685006_RP4-728D4.2 and SREBF1_cg05372727_LINC00460 relationships are prognostic signatures for colon cancer.
Conclusions: These findings suggest that lncRNAs mediated by the interplay between DNA methylation and TFs are promising predictors of drug response. In addition, combined TF-methylation-lncRNA relationships can serve as prognostic signatures for colon cancer.


2019 ◽  
Vol 33 (21) ◽  
pp. 1950237
Author(s):  
Wen-Jie Xie ◽  
Rui-Qi Han ◽  
Wei-Xing Zhou

It is of great significance to identify the characteristics of time series in order to quantify their similarity and to classify different classes of time series. We define six types of triadic time-series motifs and investigate the motif occurrence profiles extracted from the time series. Based on triadic time-series motif profiles, we further propose to estimate the similarity coefficients between different time series and to classify these time series with high accuracy. We validate the method with time series generated from nonlinear dynamic systems (logistic map, chaotic logistic map, chaotic Henon map, chaotic Ikeda map, hyperchaotic generalized Henon map and hyperchaotic folded-tower map) and retrieved from the UCR Time Series Classification Archive. Our analysis shows that the proposed triadic time-series motif analysis performs better than the classic dynamic time warping method in classifying time series for certain datasets investigated in this work.
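
The abstract does not say how the six motif types are defined. As a rough illustration only, the Python sketch below assumes they correspond to the six rank orderings of three consecutive points (ordinal patterns), which may differ from the authors' definition. The helper names, the L1 distance, and the logistic-map parameters are likewise assumptions made for this example.

```python
from itertools import permutations

# The six possible rank orderings of three distinct values (assumed motif types).
PATTERNS = list(permutations(range(3)))


def triad_profile(series):
    """Return the relative frequency of each ordering over consecutive triples."""
    counts = {pattern: 0 for pattern in PATTERNS}
    total = 0
    for i in range(len(series) - 2):
        window = series[i:i + 3]
        if len(set(window)) < 3:
            continue  # skip triples with ties, which have no unique ordering
        # Indices of the window sorted by value, e.g. (0, 1, 2) for a rising triple.
        pattern = tuple(sorted(range(3), key=lambda k: window[k]))
        counts[pattern] += 1
        total += 1
    if total == 0:
        return [0.0] * len(PATTERNS)
    return [counts[pattern] / total for pattern in PATTERNS]


def profile_distance(p, q):
    """L1 distance between two profiles (a stand-in for a similarity coefficient)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def logistic_map(r, x0, n):
    """Generate n points of the logistic map x -> r * x * (1 - x)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


if __name__ == "__main__":
    chaotic = triad_profile(logistic_map(4.0, 0.3, 2000))
    periodic = triad_profile(logistic_map(3.5, 0.3, 2000))
    print(profile_distance(chaotic, periodic))
```

Profiles of this kind are fixed-length vectors, so any standard classifier or nearest-neighbour rule can be applied to them regardless of the length of the original series.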


Biostatistics ◽  
2019 ◽  
Vol 21 (3) ◽  
pp. 625-639
Author(s):  
Ioannis Vardaxis ◽  
Finn Drabløs ◽  
Morten B Rye ◽  
Bo Henry Lindqvist

Summary: We present model-based analysis for ChIA-PET (MACPET), which analyzes paired-end read sequences provided by ChIA-PET for finding binding sites of a protein of interest. MACPET uses information from both tags of each PET and searches for binding sites in a two-dimensional space, while taking into account different noise levels in different genomic regions. MACPET shows favorable results compared with MACS in terms of motif occurrence and spatial resolution. Furthermore, significant binding sites discovered by MACPET are involved in a higher number of significant three-dimensional interactions than those discovered by MACS. MACPET is freely available on Bioconductor.
Keywords: ChIA-PET; MACPET; Model-based clustering; Paired-end tags; Peak-calling algorithm.


2018 ◽  
Author(s):  
Yang Li ◽  
Pengyu Ni ◽  
Shaoqiang Zhang ◽  
Guojun Li ◽  
Zhengchang Su

Abstract: The availability of a large volume of chromatin immunoprecipitation followed by sequencing (ChIP-seq) datasets for various transcription factors (TFs) has provided an unprecedented opportunity to identify all functional TF binding motifs clustered in the enhancers in genomes. However, progress has been largely hindered by the lack of a highly efficient and accurate tool that is fast enough to find not only the target motifs, but also cooperative motifs contained in very large ChIP-seq datasets with a binding peak length of typical enhancers (~1,000 bp). To circumvent this hurdle, we herein present an ultra-fast and highly accurate motif-finding algorithm, ProSampler, with automatic motif length detection. ProSampler first identifies significant k-mers in the dataset and combines highly similar significant k-mers to form preliminary motifs. ProSampler then merges preliminary motifs with subtle similarity using a novel graph-based Gibbs sampler to find core motifs. Finally, ProSampler extends the core motifs by applying a two-proportion z-test to the flanking positions to identify motifs longer than k. As the number of preliminary motifs is much smaller than that of k-mers in a dataset, we greatly reduce the search space of the Gibbs sampler compared with conventional ones. By storing flanking sequences in a hash table, we avoid extensive I/O and the necessity of examining all lengths of motifs in an interval. When evaluated on both synthetic and real ChIP-seq datasets, ProSampler runs orders of magnitude faster than the fastest existing tools while more accurately discovering primary motifs as well as cooperative motifs than do the best existing tools. Using ProSampler, we revealed previously unknown complex motif occurrence patterns in large ChIP-seq datasets, thereby providing insights into the mechanisms of cooperative TF binding for gene transcriptional regulation. Therefore, by allowing fast and accurate mining of the entire ChIP-seq datasets, ProSampler can greatly facilitate the efforts to identify the entire cis-regulatory code in genomes.
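
The extension step mentions a two-proportion z-test on flanking positions. The Python sketch below shows the standard pooled form of that test and one way it might be used to decide whether a base is enriched at a flanking position. It is not ProSampler's code: the abstract does not state which proportions are compared, so the comparison of motif-site flanks against background flanks, the function names, and the 1.96 threshold are all assumptions.

```python
import math
from collections import Counter


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for x1/n1 versus x2/n2."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if standard_error == 0.0:
        return 0.0  # both samples are all-or-nothing for this base
    return (p1 - p2) / standard_error


def enriched_flank_bases(site_flanks, background_flanks, z_threshold=1.96):
    """Return {base: z} for bases over-represented at one flanking position.

    site_flanks: bases observed at the position next to each motif site.
    background_flanks: bases observed at comparable background positions.
    Both inputs and the threshold are hypothetical choices for this sketch.
    """
    site_counts = Counter(site_flanks)
    background_counts = Counter(background_flanks)
    enriched = {}
    for base in "ACGT":
        z = two_proportion_z(
            site_counts[base], len(site_flanks),
            background_counts[base], len(background_flanks),
        )
        if z > z_threshold:
            enriched[base] = z
    return enriched


if __name__ == "__main__":
    sites = list("A" * 80 + "C" * 20)
    background = list("A" * 25 + "C" * 25 + "G" * 25 + "T" * 25)
    print(enriched_flank_bases(sites, background))
```

Under these assumptions, a core motif would be extended by one position whenever some base passes the threshold there, and extension would stop at the first position where none does.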


2018 ◽  
Author(s):  
Ioannis Vardaxis ◽  
Finn Drabløs ◽  
Morten B. Rye ◽  
Bo Henry Lindqvist

Abstract: We present Model-based Analysis for ChIA-PET (MACPET), which analyzes paired-end read sequences provided by ChIA-PET for finding binding sites of a protein of interest. MACPET uses information from both tags of each PET and searches for binding sites in a two-dimensional space, while taking into account different noise levels in different genomic regions. MACPET shows favorable results compared to MACS in terms of motif occurrence, spatial resolution and false discovery rate. Significant binding sites discovered by MACPET are involved in a higher number of significant 3D interactions than those discovered by MACS. MACPET is freely available on Bioconductor.


2017 ◽  
Author(s):  
Gandharva Nagpal ◽  
Kumardeep Chaudhary ◽  
Piyush Agrawal ◽  
Gajendra P.S. Raghava

Abstract
Background: Evidence in the literature strongly supports the potential of immunomodulatory peptides for use as vaccine adjuvants. All mechanisms by which vaccine adjuvants produce immunostimulatory effects directly or indirectly stimulate Antigen Presenting Cells (APCs). While numerous methods have been developed in the past for predicting B-cell and T-cell epitopes, no method is available for predicting peptides that can modulate APCs.
Methods: We named the peptides that can activate APCs A-cell epitopes and developed methods for their prediction in this study. A dataset of experimentally validated A-cell epitopes was collected and compiled from various resources. To predict A-cell epitopes, we developed Support Vector Machine-based machine learning models using different sequence-based features.
Results: A hybrid model developed on a combination of sequence-based features (dipeptide composition and motif occurrence) achieved the highest accuracy of 96.91%, with a Matthews Correlation Coefficient (MCC) of 0.94, on the training dataset. We also evaluated the hybrid models on an independent dataset and achieved a comparable accuracy of 94.93% with an MCC of 0.90.
Conclusion: The models developed in this study were implemented in a web-based platform, VaxinPAD, to predict and design immunomodulatory peptides or A-cell epitopes. This web server, available at http://webs.iiitd.edu.in/raghava/vaxinpad/ and http://crdd.osdd.net/raghava/vaxinpad/, will help researchers design peptide-based vaccine adjuvants.
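
For readers unfamiliar with the features and metric named above, here is a minimal Python sketch of dipeptide composition, a simple motif occurrence count, and the Matthews Correlation Coefficient. It is not the VaxinPAD implementation: the function names, the way the two feature sets are concatenated, and the example motifs are assumptions for illustration.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(peptide):
    """Return the fraction of each of the 400 dipeptides among adjacent residue pairs."""
    peptide = peptide.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(peptide) - 1):
        pair = peptide[i:i + 2]
        if pair in counts:  # ignore pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def motif_occurrence_features(peptide, motifs):
    """Count (possibly overlapping) exact occurrences of each motif string."""
    peptide = peptide.upper()
    return [
        float(sum(peptide.startswith(motif, i) for i in range(len(peptide))))
        for motif in motifs
    ]


def hybrid_features(peptide, motifs):
    """Concatenate both feature sets into one vector (assumed hybrid design)."""
    return dipeptide_composition(peptide) + motif_occurrence_features(peptide, motifs)


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical peptide and motifs, not drawn from the study's dataset.
    vector = hybrid_features("KKLKAL", ["KK", "LK"])
    print(len(vector), vector[-2:])
```

A vector built this way could be passed to any SVM library; MCC is a useful complement to accuracy because it remains informative when the positive and negative classes are unbalanced.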


2015 ◽  
Vol 59 (3) ◽  
pp. 384-402 ◽  
Author(s):  
David L. González-Álvarez ◽  
Miguel A. Vega-Rodríguez ◽  
Álvaro Rubio-Largo

PeerJ ◽  
2014 ◽  
Vol 2 ◽  
pp. e559 ◽  
Author(s):  
Naoki Matsushita ◽  
Shigeto Seno ◽  
Yoichi Takenaka ◽  
Hideo Matsuda
