SAPH-ire TFx – A Recommendation-based Machine Learning Model Captures a Broad Feature Landscape Underlying Functional Post-Translational Modifications

ABSTRACT: Protein post-translational modifications (PTMs) are a rapidly expanding feature class of significant importance in cell biology. Due to a high burden of experimental proof, the number of functional PTMs in the eukaryotic proteome is currently underestimated. Furthermore, not all PTMs are functionally equivalent. Therefore, computational approaches that can confidently recommend the functional potential of experimental PTMs are essential. To address this challenge, we developed SAPH-ire TFx (https://saphire.biosci.gatech.edu/): a multi-feature neural network model and web resource optimized for recommending experimental PTMs with high potential for biological impact. The model is rigorously benchmarked against independent datasets and alternative models, exhibiting unmatched performance in the recall of known functional PTM sites and the recommendation of PTMs that were later confirmed experimentally. An analysis of feature contributions to model outcome provides further insight on the need for multiple rather than single features to capture the breadth of functional data in the public domain. Contact: [email protected] Supplementary Information: See Tables S1-S6 & Figures S1-S4.

SPRINT-Gly: predicting N- and O-linked glycosylation sites of human and mouse proteins by using sequence and predicted structural properties

Bioinformatics ◽

10.1093/bioinformatics/btz215 ◽

2019 ◽

Vol 35 (20) ◽

pp. 4140-4146 ◽

Cited By ~ 10

Author(s):

Ghazaleh Taherzadeh ◽

Abdollah Dehzangi ◽

Maryam Golchin ◽

Yaoqi Zhou ◽

Matthew P Campbell

Keyword(s):

Protein Glycosylation ◽

Supplementary Information ◽

Support Vector ◽

Intercellular Signaling ◽

Post Translational Modifications ◽

Novel Structure ◽

Glycosylation Sites ◽

Improved Performance ◽

Human And Mouse ◽

Fold Cross Validation

Abstract. Motivation: Protein glycosylation is one of the most abundant post-translational modifications and plays an important role in immune responses, intercellular signaling, inflammation and host-pathogen interactions. However, because of the poor ionization efficiency and microheterogeneity of glycopeptides, identifying glycosylation sites is a challenging task, and there is a demand for computational methods. Here, we constructed the largest dataset of human and mouse glycosylation sites to train deep learning neural networks and support vector machine classifiers to predict N- and O-linked glycosylation sites, respectively. Results: The method, called SPRINT-Gly, achieved consistent results between ten-fold cross validation and independent test for predicting human and mouse glycosylation sites. For N-glycosylation, a mouse-trained model performs equally well on human glycoproteins and vice versa; however, because of significant differences in O-linked sites, separate models were generated. Overall, SPRINT-Gly is 18% and 50% higher in Matthews correlation coefficient than the next best method compared in N-linked and O-linked sites, respectively. This improved performance is due to the inclusion of novel structure and sequence-based features. Availability and implementation: http://sparks-lab.org/server/SPRINT-Gly/ Supplementary information: Supplementary data are available at Bioinformatics online.
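
The Matthews correlation coefficient used as the comparison metric above can be computed directly from a binary confusion matrix. The sketch below is a minimal illustration of the standard formula only; the function name and the example counts are hypothetical and are not taken from SPRINT-Gly.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns a value in [-1, 1]; 0.0 is returned when any marginal sum is
    zero, a common convention for the otherwise undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for a site-level predictor (illustration only).
perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)
random_like = matthews_corrcoef(tp=25, tn=25, fp=25, fn=25)
inverted = matthews_corrcoef(tp=0, tn=0, fp=50, fn=50)
```

Because the coefficient uses all four cells of the confusion matrix, it stays informative when glycosylated sites are far rarer than non-glycosylated ones, which is presumably why it is preferred over plain accuracy for this kind of task.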

PmliPred: a method based on hybrid model and fuzzy decision for plant miRNA–lncRNA interaction prediction

Bioinformatics ◽

10.1093/bioinformatics/btaa074 ◽

2020 ◽

Vol 36 (10) ◽

pp. 2986-2992 ◽

Cited By ~ 3

Author(s):

Qiang Kang ◽

Jun Meng ◽

Jun Cui ◽

Yushi Luan ◽

Ming Chen

Keyword(s):

Biological Activities ◽

Learning Model ◽

Supplementary Information ◽

Fuzzy Decision ◽

Interaction Prediction ◽

Machine Learning Model ◽

Plant Mirna ◽

Mechanism Of Interaction ◽

Polymerase Chain ◽

Deep Learning Model

Abstract. Motivation: Studies have indicated that microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) not only play important roles in biological activities, but that their interactions also affect biological processes. A growing number of studies focus on miRNA–lncRNA interactions, but few of them address plants. Predicting these interactions is important for understanding the mechanism of interaction between miRNAs and lncRNAs in plants. Results: This article proposes a new method for plant miRNA–lncRNA interaction prediction (PmliPred). A deep learning model is trained on raw sequences and a shallow machine learning model on manually extracted features; the two are then hybridized based on fuzzy decision for prediction. PmliPred shows better performance and generalization ability than existing methods. Several new miRNA–lncRNA interactions in Solanum lycopersicum, drawn from the candidates predicted by PmliPred, were confirmed by quantitative real-time polymerase chain reaction, which further verifies its effectiveness. Availability and implementation: The source code of PmliPred is freely available at http://bis.zju.edu.cn/PmliPred/. Supplementary information: Supplementary data are available at Bioinformatics online.

Non-Coding RNAs in Cartilage Development: An Updated Review

International Journal of Molecular Sciences ◽

10.3390/ijms20184475 ◽

2019 ◽

Vol 20 (18) ◽

pp. 4475 ◽

Cited By ~ 13

Author(s):

Ehsan Razmara ◽

Amirreza Bitaraf ◽

Hassan Yousefi ◽

Tina H. Nguyen ◽

Masoud Garshasbi ◽

...

Keyword(s):

Cell Biology ◽

Endochondral Ossification ◽

Association Studies ◽

Post Translational Modifications ◽

Cartilage Development ◽

Complex Process ◽

Cartilage Cells ◽

Non Coding Rnas ◽

Skeletal Disorders

In the development of the skeleton, the long bones arise from the process of endochondral ossification (EO), in which cartilage is replaced by bone. This complex process is regulated by various factors including genetic, epigenetic, and environmental elements. It is recognized that DNA methylation, higher-order chromatin structure, and post-translational modifications of histones regulate EO. With emerging understanding, non-coding RNAs (ncRNAs), which consist of microRNAs (miRNAs or miRs) and long non-coding RNAs (lncRNAs), have been identified as another mode of EO regulation. There is expanding experimental evidence on the role of ncRNAs in the differentiation of cartilage cells, as well as in the pathogenesis of several skeletal disorders including osteoarthritis. Cutting-edge technologies such as epigenome-wide association studies have been employed to reveal disease-specific patterns regarding ncRNAs. This opens a new avenue for our understanding of skeletal cell biology, and may also identify potential epigenetic-based biomarkers. In this review, we provide an updated overview of recent advances in the role of ncRNAs, with a particular focus on miRNAs and lncRNAs, in the development of bone from cartilage, as well as their roles in skeletal pathophysiology.

Effect of supplementing Moringa oleifera essential oils on milk quality and fatty acid profile in dairy sheep

Indian Journal of Animal Research ◽

10.18805/ijar.b-3808 ◽

2019 ◽

Author(s):

N. Hemamalini ◽

S. Ezhilmathi ◽

A. Angela Mercy

Keyword(s):

Recombinant Protein ◽

Protein Expression ◽

Cell Biology ◽

Genetic Manipulation ◽

Recombinant Protein Expression ◽

Coli Strain ◽

Expression Vectors ◽

Functional Domain ◽

Post Translational Modifications ◽

E Coli

Escherichia coli is the most extensively used organism in recombinant protein production. It has several advantages, including a very short life cycle, ease of genetic manipulation and well-understood cell biology, which make E. coli a convenient host for recombinant protein expression. Despite these advantages, E. coli also has a few disadvantages, such as coupled transcription and translation and a lack of eukaryotic post-translational modifications. These challenges can be overcome by adopting several strategies, such as using different E. coli expression vectors, changing the gene sequence without altering the functional domain, using modified E. coli strains, changing the culture parameters and co-expressing the protein with a molecular chaperone. In this review, we present the strategies used to enhance recombinant protein expression and stability in E. coli.

Reinspection of a Clinical Proteomics Tumor Analysis Consortium (CPTAC) Dataset with Cloud Computing Reveals Abundant Post-Translational Modifications and Protein Sequence Variants

Cancers ◽

10.3390/cancers13205034 ◽

2021 ◽

Vol 13 (20) ◽

pp. 5034

Author(s):

Amol Prakash ◽

Lorne Taylor ◽

Manu Varkey ◽

Nate Hoxie ◽

Yassene Mohammed ◽

...

Keyword(s):

Cloud Computing ◽

Human Tumors ◽

Clinical Proteomics ◽

Sequence Variants ◽

Proteomic Profiling ◽

High Confidence ◽

Independent Evidence ◽

Post Translational Modifications ◽

Web Resource ◽

Proteomic Data

The Clinical Proteomic Tumor Analysis Consortium (CPTAC) has provided some of the most in-depth analyses of the phenotypes of human tumors ever constructed. Today, the majority of proteomic data analysis is still performed using software housed on desktop computers, which limits the number of sequence variants and post-translational modifications (PTMs) that can be considered. The original CPTAC studies limited the search for PTMs to samples that were chemically enriched for those modified peptides. Similarly, the only sequence variants considered were those with strong evidence at the exon or transcript level. In this multi-institutional collaborative reanalysis, we utilized unbiased protein databases containing millions of human sequence variants in conjunction with hundreds of common post-translational modifications. Using these tools, we identified tens of thousands of high-confidence PTMs and sequence variants. We identified 4132 phosphorylated peptides in nonenriched samples, 93% of which were confirmed in the samples that were chemically enriched for phosphopeptides. In addition, our results cover 90% of the high-confidence variants reported by the original proteogenomics study, without the need for sample-specific next-generation sequencing. Finally, we report fivefold more somatic and germline variants that have independent evidence at the peptide level, including mutations in ERBB2 and BCAS1. In this reanalysis of CPTAC proteomic data with cloud computing, we present an openly available and searchable web resource of the highest-coverage proteomic profiling of human tumors described to date.

Giving more insight for automatic risk prediction during pregnancy with interpretable machine learning

Bulletin of Electrical Engineering and Informatics ◽

10.11591/eei.v10i3.2344 ◽

2021 ◽

Vol 10 (3) ◽

Author(s):

Muhammad Irfan ◽

Setio Basuki ◽

Yufis Azhar

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Classification Model ◽

K Nearest Neighbor ◽

Pregnancy Risk ◽

The Public ◽

Risk Monitoring ◽

Machine Learning Model ◽

Union Operation ◽

Correlation Based Feature Selection

The maternal mortality rate (MMR) reported in the Indonesia intercensal population survey (SUPAS) was considered high. For pregnancy risk detection, the public health center (puskesmas) applies a Poedji Rochjati screening card (KSPR) comprising 20 features. In addition to the KSPR, pregnancy risk monitoring has been assisted with a pregnancy control card. Because the two cards differ in the number of features they record, they need to be reconciled. Our objectives are to determine the most influential features, to explore the links among features on the KSPR and pregnancy control cards, and to build a machine learning model for predicting pregnancy risk. For the first objective, we use correlation-based feature selection (CFS) and the C5.0 algorithm. The second objective was addressed by taking the union of the features produced by the two techniques. In machine learning experiments on these features, the XGBoost algorithm achieved the highest accuracy, 94%, followed by the random forest, Naïve Bayes, and k-nearest neighbor algorithms at 87%, 66%, and 60%, respectively. Interpretability is implemented with SHAP and LIME to provide more insight into the classification model. In conclusion, the features highlighted by the two interpretation approaches were similar, confirming that Cesar was dominant in determining pregnancy risk.
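
The union step described above, which merges the features chosen by two selectors, can be sketched as follows. This is an illustrative assumption about how such a merge might be written, not the authors' code; the function name and the feature names are hypothetical.

```python
def union_of_selected_features(cfs_features, c50_features):
    """Return features chosen by either selector, keeping first-seen order."""
    seen = set()
    merged = []
    for name in list(cfs_features) + list(c50_features):
        if name not in seen:
            seen.add(name)
            merged.append(name)
    return merged


# Hypothetical feature names, for illustration only.
from_cfs = ["age", "parity", "blood_pressure"]
from_c50 = ["parity", "cesar", "age"]
combined = union_of_selected_features(from_cfs, from_c50)
```

Keeping insertion order rather than returning a bare set makes the resulting feature list reproducible across runs, which matters when the columns are later passed to a classifier.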

Understanding the limit of open search in the identification of peptides with post-translational modifications — A simulation-based study

10.1101/289710 ◽

2018 ◽

Author(s):

Jiaan Dai ◽

Fengchao Yu ◽

Ning Li ◽

Weichuan Yu

Keyword(s):

Analytical Model ◽

Mass Spectra ◽

Mass Spectrometry Data ◽

Supplementary Information ◽

Necessary Condition ◽

Tandem Mass ◽

Search Methods ◽

Post Translational Modifications ◽

Tandem Mass Spectra ◽

The Relationship

Abstract. Motivation: Analyzing tandem mass spectrometry data to recognize peptides in a sample is the fundamental task in computational proteomics. Traditional peptide identification algorithms perform well when identifying unmodified peptides. However, when peptides have post-translational modifications (PTMs), these methods cannot provide satisfactory results. Recently, Chick et al., 2015 and Yu et al., 2016 proposed the spectrum-based and tag-based open search methods, respectively, to identify peptides with PTMs. While the performance of these two methods is promising, the identification results vary greatly with respect to the quality of tandem mass spectra and the number of PTMs in peptides. This motivates us to systematically study the relationship between the performance of open search methods and quality parameters of tandem mass spectrum data, as well as the number of PTMs in peptides. Results: Through large-scale simulations, we obtain the performance trend when simulated tandem mass spectra are of different quality. We propose an analytical model to describe the relationship between the probability of obtaining correct identifications and the spectrum quality as well as the number of PTMs. Based on the analytical model, we can quantitatively describe the necessary condition to effectively apply open search methods. Availability: Source codes of the simulation are available at http://bioinformatics.ust.hk/… Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
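
The general idea behind open search, accepting a candidate peptide even when the observed precursor mass differs from its unmodified mass and reading that difference as a possible modification, can be sketched in a few lines. This is a simplified illustration and not the simulation or either of the cited methods; the function names and the 500 Da window are assumptions made for the example, while the residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the peptide termini


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def open_search_delta(observed_mass, sequence, window=500.0):
    """Return the mass shift if the candidate falls inside a wide window.

    A narrow-window search would reject any candidate whose mass differs
    from the observation; here the difference is kept as a putative
    modification mass. Returns None when the shift exceeds the window.
    """
    delta = observed_mass - peptide_mass(sequence)
    return delta if abs(delta) <= window else None


PHOSPHO = 79.96633  # monoisotopic mass shift of phosphorylation
observed = peptide_mass("PEPTIDE") + PHOSPHO
delta = open_search_delta(observed, "PEPTIDE")
```

In this toy case the returned shift matches the phosphorylation mass. With several PTMs on one peptide the single delta becomes a sum of shifts, which illustrates why the number of PTMs is one of the quantities studied above.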

ClonoMatch: a tool for identifying homologous immunoglobulin and T cell receptor sequences in large databases

Bioinformatics ◽

10.1093/bioinformatics/btaa1028 ◽

2020 ◽

Author(s):

Taylor Jones ◽

Samuel B Day ◽

Luke Myers ◽

James E Crowe ◽

Cinque Soto

Keyword(s):

T Cell ◽

T Cell Receptor ◽

Cell Receptor ◽

Supplementary Information ◽

Software Support ◽

The Public ◽

Large Databases ◽

Increasing Demand ◽

Data Collections ◽

Next Generation Sequencing Ngs

Abstract. Summary: B cell receptor (BCR) and T cell receptor (TCR) repertoires are generated through somatic DNA rearrangements and are responsible for the molecular basis of antigen recognition in the immune system. Next-generation sequencing (NGS) of DNA and the falling cost of sequencing due to continued development of these technologies have made sequencing assays an affordable way to characterize the repertoire of adaptive immune receptors (sometimes termed the ‘immunome’). Many new workflows have been developed to take advantage of NGS and have placed the resulting immunome datasets in the public domain. The scale of these NGS datasets has made it challenging to search through the complementarity-determining region 3 (CDR3), which is responsible for imparting specific antibody-antigen interactions. Thus, there is an increasing demand for sequence analysis tools capable of searching through CDR3s from immunome data collections containing millions of sequences. To address this need, we created a software package called ClonoMatch that facilitates rapid searches in bulk immunome data for BCR or TCR sequences based on their CDR3 sequence or V3J clonotype. Availability and implementation: Documentation, software support and the codebase are all available at https://github.com/crowelab/clonomatch. This software is distributed under the GPL v3 license. Supplementary information: Supplementary data are available at Bioinformatics online.
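
As a rough picture of what searching a collection by CDR3 similarity involves, the sketch below does a brute-force scan for same-length sequences within a fixed number of mismatches. It is not ClonoMatch's algorithm or API; the function names, the mismatch threshold and the example sequences are all hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_similar_cdr3(query, database, max_mismatches=1):
    """Return database entries of the same length within the mismatch limit."""
    return [
        seq for seq in database
        if len(seq) == len(query) and hamming(query, seq) <= max_mismatches
    ]


# Hypothetical CDR3 amino-acid sequences, for illustration only.
database = ["CARDYW", "CARDFW", "CAKEFW", "CARDYYW"]
hits = find_similar_cdr3("CARDYW", database, max_mismatches=1)
```

A linear scan like this costs time proportional to the size of the collection for every query, so collections with millions of sequences call for the kind of indexed, purpose-built search the abstract describes.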

Phosphomatics: interactive interrogation of substrate–kinase networks in global phosphoproteomics datasets

Bioinformatics ◽

10.1093/bioinformatics/btaa916 ◽

2020 ◽

Author(s):

Michael G Leeming ◽

Sean O’Callaghan ◽

Luana Licata ◽

Marta Iannuccelli ◽

Prisca Lo Surdo ◽

...

Keyword(s):

Mass Spectrometry ◽

Supplementary Information ◽

The Internet ◽

Phosphorylation Sites ◽

Supplementary Data ◽

Single Experiment ◽

Web Resource ◽

Phosphorylated Peptides

Abstract. Motivation: Mass spectrometry-based phosphoproteomics can routinely identify and quantify thousands of phosphorylated peptides from a single experiment. However, interrogating possible upstream kinases and identifying key literature for phosphorylation sites is laborious and time-consuming. Results: Here, we present Phosphomatics, a publicly available web resource for interrogating phosphoproteomics data. Phosphomatics allows researchers to upload phosphoproteomics data and interrogate possible relationships from a substrate-, kinase- or pathway-centric viewpoint. Availability and implementation: Phosphomatics is freely available via the internet at: https://phosphomatics.com. Supplementary information: Supplementary data are available at Bioinformatics online.

Educating Congress on the importance of investigator-initiated biomedical research: Role of individual investigators and professional societies

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100167743 ◽

1994 ◽

Vol 52 ◽

pp. 2-3

Author(s):

B. R. Brinkley

Keyword(s):

Public Policy ◽

Biomedical Research ◽

Cell Biology ◽

Research Funding ◽

Political Process ◽

Biomedical Science ◽

Blue Ribbon ◽

The Public ◽

Active Involvement ◽

American Society

Although American biomedical science relies heavily on the Federal Government for research funding, individual scientists have traditionally shunned politics and public policy. In years past, scientists were not encouraged to mingle with politicians, most of whom viewed scientists as fuzzballs and eggheads with whom they had little in common. Scientists generally believed that government and society valued their services and would always provide substantial support for research and training. Today, biomedical research funding requires a keen knowledge of the U. S. Congress and the political process. Indeed, our professional survival and that of our students and trainees requires active involvement in Washington politics. We can no longer defer the task of justifying our role in society to institutions or blue ribbon panels of elite science experts. Democratic decision-making at its best is process-oriented, time-consuming, and bottom-up, not top-down. Through its proactive policies involving networking, congressional testimony, education and targeted funding goals, the Public Policy Committee of the American Society for Cell Biology has provided a model strategy for member-oriented commitment to science and public policy.
