MoMo: discovery of statistically significant post-translational modification motifs

2018 ◽  
Vol 35 (16) ◽  
pp. 2774-2782 ◽  
Author(s):  
Alice Cheng ◽  
Charles E Grant ◽  
William S Noble ◽  
Timothy L Bailey

Abstract Motivation: Post-translational modifications (PTMs) of proteins are associated with many significant biological functions and can be identified in high throughput using tandem mass spectrometry. Many PTMs are associated with short sequence patterns called ‘motifs’ that help localize the modifying enzyme. Accordingly, many algorithms have been designed to identify these motifs from mass spectrometry data. Accurate statistical confidence estimates for discovered motifs are critically important for interpreting the motifs properly and for designing downstream experimental validation. Results: We describe a method for assigning statistical confidence estimates to PTM motifs, and we demonstrate that this method provides accurate P-values on both simulated and real data. Our methods are implemented in MoMo, a software tool for discovering motifs among sets of PTMs that we make available as a web server and as downloadable source code. MoMo re-implements the two most widely used PTM motif discovery algorithms, motif-x and MoDL, while offering many enhancements. Relative to motif-x, MoMo offers improved statistical confidence estimates and more accurate calculation of motif scores. The MoMo web server offers more proteome databases, more input formats, larger inputs and longer running times than the motif-x web server. Finally, our study demonstrates that the confidence estimates produced by motif-x are inaccurate. This inaccuracy stems in part from the common practice of drawing ‘background’ peptides from an unshuffled proteome database. Our results thus suggest that many of the papers that use motif-x to find motifs may be reporting results that lack statistical support. Availability and implementation: The MoMo web server and source code are provided at http://meme-suite.org. Supplementary information: Supplementary data are available at Bioinformatics online.
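The abstract attributes part of motif-x's inaccurate confidence estimates to drawing background peptides from an unshuffled proteome. As a hedged illustration of the ingredients involved, the sketch below assumes a motif-x-style binomial test of each residue/position pair in the foreground windows against background windows taken from a shuffled proteome. It is not MoMo's own scoring procedure; the function names, the pseudocount and the `alpha` threshold are illustrative assumptions.

```python
import math
import random
from collections import Counter


def binomial_sf(k, n, p):
    """Upper tail P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    if k > n:
        return 0.0
    if k <= 0 or p >= 1.0:
        return 1.0
    if p <= 0.0:
        return 0.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    top = max(log_terms)
    return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))


def shuffle_proteome(proteins, seed=0):
    """Shuffle the residues of each protein independently (keeps composition)."""
    rng = random.Random(seed)
    shuffled = []
    for seq in proteins:
        residues = list(seq)
        rng.shuffle(residues)
        shuffled.append("".join(residues))
    return shuffled


def centred_windows(proteins, central, width):
    """All windows of odd length `width` centred on residue `central`."""
    half = width // 2
    return [
        seq[i - half:i + half + 1]
        for seq in proteins
        for i in range(half, len(seq) - half)
        if seq[i] == central
    ]


def enriched_pairs(foreground, background, alpha=1e-6, pseudocount=1.0):
    """(position, residue, p-value) for residues over-represented in the
    foreground windows relative to the background windows, best first."""
    n, width = len(foreground), len(foreground[0])
    centre = width // 2
    hits = []
    for pos in range(width):
        if pos == centre:
            continue  # the modified residue is fixed in both sets
        fg = Counter(pep[pos] for pep in foreground)
        bg = Counter(pep[pos] for pep in background)
        bg_total = sum(bg.values())
        for residue, k in fg.items():
            # Pseudocount over the 20 standard amino acids avoids p = 0.
            p = (bg[residue] + pseudocount) / (bg_total + 20 * pseudocount)
            p_value = binomial_sf(k, n, p)
            if p_value < alpha:
                hits.append((pos, residue, p_value))
    return sorted(hits, key=lambda hit: hit[2])
```

In use, `background` would come from something like `centred_windows(shuffle_proteome(proteome), "S", width)`, and `foreground` would hold the same-width windows around the observed modification sites. The abstract's point concerns where `background` comes from, not the test itself.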

2018 ◽  
Author(s):  
Alice Cheng ◽  
Charles E. Grant ◽  
William S. Noble ◽  
Timothy L. Bailey

Abstract Motivation: Post-translational modifications (PTMs) of proteins are associated with many significant biological functions and can be identified in high throughput using tandem mass spectrometry. Many PTMs are associated with short sequence patterns called “motifs” that help localize the modifying enzyme. Accordingly, many algorithms have been designed to identify these motifs from mass spectrometry data. Accurate statistical confidence estimates for discovered motifs are critically important for interpreting the motifs properly and for designing downstream experimental validation. Results: We describe a method for assigning statistical confidence estimates to PTM motifs, and we demonstrate that this method provides accurate p-values on both simulated and real data. Our methods are implemented in MoMo, a software tool for discovering motifs among sets of PTMs that we make available as a web server and as downloadable source code. MoMo reimplements the two most widely used PTM motif discovery algorithms, motif-x and MoDL, while offering many enhancements. Relative to motif-x, MoMo offers improved statistical confidence estimates and more accurate calculation of motif scores. The MoMo web server offers more proteome databases, more input formats, larger inputs and longer running times than the motif-x web server. Finally, our study demonstrates that the confidence estimates produced by motif-x are inaccurate. This inaccuracy stems in part from the common practice of drawing “background” peptides from an unshuffled proteome database. Our results thus suggest that many of the hundreds of papers that use motif-x to find motifs may be reporting results that lack statistical support. Availability: http://meme-suite.org


2017 ◽  
Author(s):  
Alice Cheng ◽  
Charles E. Grant ◽  
Timothy L. Bailey ◽  
William Stafford Noble

Abstract Motivation: Post-translational modifications (PTMs) of proteins are associated with many significant biological functions and can be identified in high throughput using tandem mass spectrometry. Many PTMs are associated with short sequence patterns called “motifs” that help localize the modifying enzyme. Accordingly, many algorithms have been designed to identify these motifs from mass spectrometry data. Results: MoMo is a software tool for identifying motifs among sets of PTMs. The program re-implements two previously described algorithms, Motif-X and MoDL, packaging them in a web-accessible user interface. In addition to reading sequence files in FASTA format, MoMo can directly parse output files produced by commonly used mass spectrometry search engines. The resulting motifs are presented to the user in an HTML summary with motif logos and linked text files in MEME motif format. Availability: Source code and web server are available at http://meme-suite.org. Supplementary information: Supplementary figures are available at Bioinformatics online.


Author(s):  
Marcela Aguilera Flores ◽  
Iulia M Lazar

Abstract Summary: The ‘Unknown Mutation Analysis (XMAn)’ database is a compilation of Homo sapiens mutated peptides in FASTA format that was constructed to facilitate the identification of protein sequence alterations by tandem mass spectrometry. The database comprises 2 539 031 non-redundant mutated entries from 17 599 proteins, of which 2 377 103 are missense and 161 928 are nonsense mutations. It can be used in conjunction with search engines that identify peptide amino acid sequences by matching experimental tandem mass spectrometry data to theoretical sequences from a database. Availability and implementation: XMAn v2 can be accessed from github.com/lazarlab/XMAnv2. Supplementary information: Supplementary data are available at Bioinformatics online.
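XMAn stores mutated peptide sequences in FASTA format, split into missense and nonsense mutations. As a hedged illustration of how entries of this kind could be derived from a reference protein, the sketch below does three things. It applies a single substitution (missense) or truncates at a stop (nonsense), keeps a fixed-width window around the site, and drops duplicate peptides so the output is non-redundant. The header layout, the `flank` width and all names are hypothetical and are not taken from XMAn itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    protein_id: str
    position: int  # 1-based residue position
    ref: str       # reference amino acid at `position`
    alt: str       # substituted amino acid, or "*" for a stop (nonsense)


def mutated_peptide(sequence, mut, flank=10):
    """Peptide around the mutation site, or None if the entry is unusable."""
    idx = mut.position - 1
    if not 0 <= idx < len(sequence) or sequence[idx] != mut.ref:
        return None  # reference residue does not match the protein
    start = max(0, idx - flank)
    if mut.alt == "*":
        # Nonsense: translation stops, so only residues before the site remain.
        peptide = sequence[start:idx]
    else:
        end = min(len(sequence), idx + flank + 1)
        peptide = sequence[start:idx] + mut.alt + sequence[idx + 1:end]
    return peptide or None  # an empty truncation product is not an entry


def build_fasta(proteins, mutations, flank=10):
    """FASTA text with one entry per distinct mutated peptide."""
    seen = set()
    records = []
    for mut in mutations:
        peptide = mutated_peptide(proteins[mut.protein_id], mut, flank)
        if peptide is None or peptide in seen:
            continue  # skip unusable or redundant entries
        seen.add(peptide)
        kind = "nonsense" if mut.alt == "*" else "missense"
        header = f">{mut.protein_id}|{mut.ref}{mut.position}{mut.alt}|{kind}"
        records.append(f"{header}\n{peptide}\n")
    return "".join(records)
```

A database built this way can be appended to a normal reference FASTA file so that a standard search engine can match spectra against both wild-type and mutated sequences.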


2018 ◽  
Vol 46 (5) ◽  
pp. 1381-1392 ◽  
Author(s):  
Ivar W. Dilweg ◽  
Remus T. Dame

Post-translational modification (PTM) of histones has been investigated in eukaryotes for years, revealing its widespread occurrence and functional importance. Many PTMs affect chromatin folding and gene activity. Only recently has the occurrence of such modifications been recognized in bacteria. However, it is unclear whether PTM of the bacterial counterparts of eukaryotic histones, nucleoid-associated proteins (NAPs), bears comparable significance. Here, we scrutinize proteome mass spectrometry data for PTMs of the four most abundant NAPs in Escherichia coli (H-NS, HU, IHF and FIS). This approach allowed us to identify a total of 101 unique PTMs in the 11 independent proteomic studies covered in this review. Combining these data with structural and genetic information on these proteins, we describe potential effects of these modifications on protein function (perturbed DNA binding, structural integrity or interactions with other proteins).


2019 ◽  
Vol 35 (22) ◽  
pp. 4632-4639 ◽  
Author(s):  
Yang Li ◽  
Pengyu Ni ◽  
Shaoqiang Zhang ◽  
Guojun Li ◽  
Zhengchang Su

Abstract Motivation: The availability of numerous ChIP-seq datasets for transcription factors (TFs) has provided an unprecedented opportunity to identify all TF binding sites in genomes. However, progress has been hindered by the lack of a highly efficient and accurate tool that finds not only the target motifs but also cooperative motifs in very large datasets. Results: We herein present an ultrafast and accurate motif-finding algorithm, ProSampler, based on a novel numeration method and a Gibbs sampler. ProSampler runs orders of magnitude faster than the fastest existing tools while often identifying the motifs of both the target TFs and their cooperators more accurately. Thus, ProSampler can greatly facilitate efforts to identify the entire cis-regulatory code in genomes. Availability and implementation: Source code and binaries are freely available for download at https://github.com/zhengchangsulab/prosampler. ProSampler is implemented in C++ and supported on Linux, macOS and MS Windows platforms. Supplementary information: Supplementary materials are available at Bioinformatics online.
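ProSampler pairs its own numeration step with a Gibbs sampler. For readers unfamiliar with the latter, the sketch below shows a textbook Gibbs site sampler: one motif occurrence per sequence, a position weight matrix (PWM) with pseudocounts and a uniform background. It is not ProSampler's algorithm, and all names and defaults are illustrative assumptions.

```python
import random

ALPHABET = "ACGT"  # sequences are assumed to contain only these bases


def build_pwm(sites, pseudocount=0.5):
    """Column-wise base probabilities from aligned sites, with pseudocounts."""
    pwm = []
    for j in range(len(sites[0])):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in ALPHABET})
    return pwm


def site_weight(kmer, pwm, background=0.25):
    """Likelihood ratio of a k-mer under the PWM versus a uniform background."""
    ratio = 1.0
    for j, base in enumerate(kmer):
        ratio *= pwm[j][base] / background
    return ratio


def gibbs_sites(seqs, width, iterations=500, seed=0):
    """Return one motif start position per sequence (needs >= 2 sequences)."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        i = rng.randrange(len(seqs))
        # Build the PWM from every sequence except the one being resampled.
        others = [seqs[k][starts[k]:starts[k] + width]
                  for k in range(len(seqs)) if k != i]
        pwm = build_pwm(others)
        seq = seqs[i]
        weights = [site_weight(seq[p:p + width], pwm)
                   for p in range(len(seq) - width + 1)]
        # Sample the new start in proportion to its likelihood ratio.
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

The motif instances are then `[s[p:p + width] for s, p in zip(seqs, starts)]`. Because each update scans every position of one sequence, the cost grows with total sequence length. This is why a fast seeding step matters for large ChIP-seq datasets.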


2015 ◽  
Vol 32 (6) ◽  
pp. 955-957 ◽  
Author(s):  
Filippo Piccinini ◽  
Alexa Kiss ◽  
Peter Horvath

Abstract Motivation: Time-lapse experiments play a key role in studying the dynamic behavior of cells. Single-cell tracking is one of the fundamental tools for such analyses. The vast majority of the recently introduced cell tracking methods are limited to fluorescently labeled cells. An equally important limitation is that most software cannot be effectively used by biologists without reasonable expertise in image processing. Here we present CellTracker, a user-friendly open-source software tool for tracking cells imaged with various imaging modalities, including fluorescent, phase contrast and differential interference contrast (DIC) techniques. Availability and implementation: CellTracker is written in MATLAB (The MathWorks, Inc., USA). It works with Windows, Macintosh and UNIX-based systems. Source code and graphical user interface (GUI) are freely available at: http://celltracker.website/. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (12) ◽  
pp. 3947-3948
Author(s):  
Jose-Jesus Fernandez ◽  
Teobaldo E Torres ◽  
Eva Martin-Solana ◽  
Gerardo F Goya ◽  
Maria-Rosario Fernandez-Fernandez

Abstract Summary: We have developed PolishEM, a software tool that improves the image quality of focused ion beam–scanning electron microscopy (FIB–SEM) stacks. Based on a Gaussian blur model, it automatically estimates and compensates for the blur affecting each individual image. It also corrects artifacts that commonly arise in FIB–SEM (e.g. curtaining). PolishEM has been optimized to process very large FIB–SEM stacks efficiently on standard computers. Availability and implementation: PolishEM has been developed in C. GPL source code and binaries for Linux, OSX and Windows are available at http://www.cnb.csic.es/%7ejjfernandez/polishem. Supplementary information: Supplementary data are available at Bioinformatics online.
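The abstract states only that PolishEM relies on a Gaussian blur model and does not give a deblurring formula. As a hedged illustration of such a model, assume that each observed image $g$ is the true image $f$ convolved with a Gaussian point-spread function of width $\sigma$, plus noise $n$. One standard way to compensate once $\sigma$ has been estimated is a Wiener filter in the Fourier domain (angular frequency $\boldsymbol{\omega}$), with $K$ an assumed noise-to-signal constant. This is a generic formulation, not necessarily the one PolishEM uses.

```latex
\begin{aligned}
g(\mathbf{x}) &= (h_\sigma * f)(\mathbf{x}) + n(\mathbf{x}),
\qquad h_\sigma(\mathbf{x}) = \frac{1}{2\pi\sigma^2}
  \exp\!\left(-\frac{\lVert\mathbf{x}\rVert^2}{2\sigma^2}\right),\\
H_\sigma(\boldsymbol{\omega}) &= \exp\!\left(-\frac{\sigma^2\lVert\boldsymbol{\omega}\rVert^2}{2}\right),
\qquad \hat{F}(\boldsymbol{\omega}) = \frac{H_\sigma(\boldsymbol{\omega})}{H_\sigma(\boldsymbol{\omega})^2 + K}\, G(\boldsymbol{\omega}).
\end{aligned}
```

Because $H_\sigma$ is real and positive, the complex conjugate in the general Wiener filter $H^{*}/(\lvert H\rvert^{2}+K)$ reduces to $H_\sigma$ itself. A larger estimated $\sigma$ attenuates high frequencies more strongly, so the filter restores them more aggressively, up to the limit set by $K$.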


2020 ◽  
Vol 36 (16) ◽  
pp. 4423-4431
Author(s):  
Wenbo Xu ◽  
Yan Tian ◽  
Siye Wang ◽  
Yupeng Cui

Abstract Motivation: The classification of high-throughput protein data based on mass spectrometry (MS) is of great practical significance in medical diagnosis. MS data are generally high-dimensional, which inevitably leads to prohibitive computational cost. To solve this problem, one-bit compressed sensing (CS), an extreme case of quantized CS, has been applied to MS data to select a small set of important features. Although it greatly reduces computational complexity, the current one-bit CS method neither accounts for the unavoidable noise in MS datasets nor exploits the inherent structure of the underlying MS data. Results: We propose two feature selection (FS) methods based on one-bit CS that address the noise and the underlying block sparsity, respectively. In the first method, the FS problem is modeled as a perturbed one-bit CS problem, where the perturbation represents the noise in the MS data. By iterating between perturbation refinement and FS, this method selects the significant features from noisy data. The second method formulates the problem as a perturbed one-bit block CS problem and selects the features block by block. This block-wise extraction is motivated by the observation that the significant features selected by the first method usually cluster in groups. Experiments show that both proposed methods achieve better classification performance on real MS data than the existing method, and that the second outperforms the first. Availability and implementation: The source code of our methods is available at https://github.com/tianyan8023/OBCS. Supplementary information: Supplementary data are available at Bioinformatics online.
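One-bit CS keeps only the signs of the measurements. In the feature-selection setting described here, one natural reading is that each row of `A` is one MS spectrum and `y` holds its ±1 class label. The support of a sparse weight vector `w` with `y ≈ sign(A w)` then gives the selected features. This reading is an assumption; the abstract does not state it. The sketch below uses binary iterative hard thresholding (BIHT), a standard one-bit CS solver, with an optional block-wise thresholding step that echoes the block idea of the second method. It omits the perturbation term the authors use to model noise, so it is not their algorithm, and all names and defaults are illustrative.

```python
import math


def sign(v):
    """One-bit quantizer; zero maps to +1."""
    return 1.0 if v >= 0 else -1.0


def hard_threshold(w, s):
    """Keep the s largest-magnitude entries of w and zero the rest."""
    keep = set(sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)[:s])
    return [w[j] if j in keep else 0.0 for j in range(len(w))]


def block_hard_threshold(w, s_blocks, block_size):
    """Keep the s_blocks contiguous blocks with the largest l2 norm."""
    blocks = [range(b, min(b + block_size, len(w)))
              for b in range(0, len(w), block_size)]
    norms = [math.sqrt(sum(w[j] ** 2 for j in blk)) for blk in blocks]
    top = sorted(range(len(blocks)), key=lambda b: norms[b], reverse=True)[:s_blocks]
    keep = {j for b in top for j in blocks[b]}
    return [w[j] if j in keep else 0.0 for j in range(len(w))]


def biht(A, y, sparsity, iterations=100, step=None, block_size=None):
    """Unit-norm sparse w with sign(A w) ~ y; `sparsity` counts blocks
    when block_size is given, otherwise single features."""
    m, n = len(A), len(A[0])
    tau = step if step is not None else 1.0 / m
    w = [0.0] * n
    for _ in range(iterations):
        # Sign mismatches between the labels and the current prediction.
        residual = [y[i] - sign(sum(A[i][j] * w[j] for j in range(n)))
                    for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        w = [w[j] + tau * grad[j] for j in range(n)]
        if block_size:
            w = block_hard_threshold(w, sparsity, block_size)
        else:
            w = hard_threshold(w, sparsity)
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    return [v / norm for v in w]
```

The selected features are `[j for j, v in enumerate(w) if v != 0.0]`. Passing `block_size` makes the sparsity constraint act on groups of adjacent features, such as neighbouring m/z bins, instead of single features.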

