Machine learning methods in prediction of protein palmitoylation sites: A brief review

2020 ◽  
Vol 26 ◽  
Author(s):  
Yanwen Li ◽  
Feng Pu ◽  
Jingru Wang ◽  
Zhiguo Zhou ◽  
Chunhua Zhang ◽  
...  

Protein palmitoylation is a fundamental and reversible post-translational lipid modification that is involved in a series of biological processes. Although a large number of experimental studies have explored the molecular mechanism behind the palmitoylation process, computational methods have attracted much attention for their good performance in predicting palmitoylation sites compared with expensive and time-consuming biochemical experiments. The prediction of protein palmitoylation sites helps to reveal the underlying biological mechanism. Therefore, research on the application of machine learning methods to predict palmitoylation sites has become a hot topic in bioinformatics and has promoted development in related fields. In this review, we briefly introduce recent developments in predicting protein palmitoylation sites with machine learning-based methods and discuss their benefits and drawbacks. A perspective on machine learning-based methods for predicting palmitoylation sites is also provided. We hope the review can serve as a guide for related fields.

2020 ◽  
Vol 26 (26) ◽  
pp. 3049-3058
Author(s):  
Ting Liu ◽  
Hua Tang

The number of human deaths caused by malaria is increasing day by day. The mitochondrial proteins of the malaria parasite play vital roles in the organism. To develop effective drugs and vaccines against infection, it is necessary to accurately identify the mitochondrial proteins of the malaria parasite. Although biochemical experiments can provide precise details about these proteins, they are expensive and time-consuming. In this review, we summarized the machine learning-based methods for identifying mitochondrial proteins in the malaria parasite and compared the construction strategies of these computational methods. Finally, we discussed the future development of algorithmic mitochondrial protein recognition.


2021 ◽  
Vol 21 ◽  
Author(s):  
Han Yu ◽  
Zi-Ang Shen ◽  
Yuan-Ke Zhou ◽  
Pu-Feng Du

Long non-coding RNAs (lncRNAs) are a type of RNA with little or no protein-coding ability and a length of more than 200 nucleotides. A large number of studies have indicated that lncRNAs play a significant role in various biological processes, including chromatin organization, epigenetic programming, transcriptional regulation, post-transcriptional processing, and circadian mechanisms at the cellular level. Since lncRNAs perform many of their functions through interactions with proteins, identifying lncRNA-protein interactions is crucial to understanding the molecular functions of lncRNAs. However, because experimental methods are costly and time-consuming, a variety of computational methods have emerged. Recently, many effective and novel machine learning methods have been developed. In general, these methods fall into two categories: semi-supervised learning methods and supervised learning methods. The latter category can be further divided into deep learning-based methods, ensemble learning-based methods, and hybrid methods. In this paper, we focused on supervised learning methods. We summarized the state-of-the-art methods for predicting lncRNA-protein interactions and compared the performance and characteristics of the different methods. Considering the limits of the existing models, we analyzed the open problems and discussed directions for future research.


2021 ◽  
Vol 3 (1) ◽  
pp. 49-55
Author(s):  
R. O. Tkachenko ◽  
I. V. Izonіn ◽  
V. M. Danylyk ◽  
V. Yu. Mykhalevych ◽  
...  

Improving prediction accuracy with artificial intelligence tools is an important task in various industries, in economics, and in medicine. Ensemble learning is one possible way to solve this task. In particular, stacking models built from different machine learning methods, or from different parts of the existing data set, demonstrate high prediction accuracy. However, the need to select the ensemble members and their optimal parameters properly makes the construction of such models time-consuming. This paper proposes a slightly different approach to building a simple but effective ensemble method. The authors developed a new model of stacking of nonlinear SGTM neural-like structures, which uses only one type of ANN as the element base of the ensemble and the same training sample for all members of the ensemble. This approach provides a number of advantages over procedures that build ensembles from different machine learning methods, at least with respect to selecting the optimal parameters for each of them. In our case, a tuple of random hyperparameters for each individual member was used as the basis of the ensemble. That is, each combined SGTM neural-like structure with an additional RBF layer, as a separate member of the ensemble, is trained using different, randomly selected values of the RBF centers and centers of mass. This provides the necessary variety of ensemble elements. Experimental studies of the effectiveness of the developed ensemble were conducted using a real data set. The task is to predict the amount of health insurance costs based on a number of independent attributes. The optimal number of ensemble members, which provides the highest prediction accuracy, is determined experimentally. The results of the developed ensemble are compared with existing methods of this class. The developed ensemble is shown to achieve the highest prediction accuracy with a satisfactory training time.
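The stacking idea described in this abstract (one type of base learner, the same training sample for every member, diversity coming only from randomly drawn RBF parameters) can be illustrated with a minimal sketch. It is not the authors' SGTM neural-like structure: each member here is a plain RBF feature expansion fitted by ridge regression, the meta-learner is a linear model, and the data are synthetic. All function names, the ridge value, and the parameter ranges are assumptions made for illustration.

```python
import math
import random


def rbf_features(x, centers, width):
    """Bias term plus Gaussian RBF activations of a scalar input x."""
    return [1.0] + [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]


def ridge_fit(rows, targets, ridge=1e-3):
    """Solve (A^T A + ridge*I) w = A^T y by Gaussian elimination with pivoting."""
    n = len(rows[0])
    aug = []
    for i in range(n):
        row = [sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
               for j in range(n)]
        row.append(sum(r[i] * t for r, t in zip(rows, targets)))
        aug.append(row)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    weights = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * weights[k] for k in range(i + 1, n))
        weights[i] = (aug[i][n] - tail) / aug[i][i]
    return weights


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def fit_member(xs, ys, rng, n_centers=5):
    """Train one base learner with randomly drawn RBF centers and width (assumed ranges)."""
    centers = [rng.uniform(min(xs), max(xs)) for _ in range(n_centers)]
    width = rng.uniform(0.3, 1.5)
    weights = ridge_fit([rbf_features(x, centers, width) for x in xs], ys)
    return lambda x: dot(weights, rbf_features(x, centers, width))


def fit_stack(xs, ys, n_members=5, seed=0):
    """Fit every member on the same sample, then a linear meta-learner on their outputs."""
    rng = random.Random(seed)
    members = [fit_member(xs, ys, rng) for _ in range(n_members)]
    meta_rows = [[1.0] + [m(x) for m in members] for x in xs]
    meta_w = ridge_fit(meta_rows, ys)
    stacked = lambda x: dot(meta_w, [1.0] + [m(x) for m in members])
    return members, stacked


def make_data(n=60, seed=1):
    """Synthetic regression data (noisy sine), standing in for a real data set."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys
```

Calling `fit_stack(*make_data())` returns the individual members and the stacked predictor. In this sketch the number of members is a plain argument; the abstract reports that the optimal number was determined experimentally, which would correspond to repeating the fit for several values of `n_members` and comparing errors on held-out data.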


Entropy ◽  
2020 ◽  
Vol 23 (1) ◽  
pp. 21
Author(s):  
Yury Rodimkov ◽  
Evgeny Efimenko ◽  
Valentin Volokitin ◽  
Elena Panova ◽  
Alexey Polovinkin ◽  
...  

When entering the phase of big data processing and statistical inference in experimental physics, the efficient use of machine learning methods may require optimal data preprocessing and, in particular, an optimal balance between detail and noise. In experimental studies of strong-field quantum electrodynamics with intense lasers, this balance concerns data binning for the observed distributions of particles and photons. Here we analyze binning with respect to different machine learning methods (Support Vector Machine (SVM), Gradient Boosting Trees (GBT), Fully-Connected Neural Network (FCNN), Convolutional Neural Network (CNN)) using numerical simulations that mimic the expected properties of upcoming experiments. We see that binning can crucially affect the performance of SVM and GBT and, to a lesser extent, FCNN and CNN. This can be interpreted as the latter methods being able to effectively learn the optimal binning, discarding unnecessary information. Nevertheless, given limited training sets, the results indicate that efficiency can be increased by optimizing the binning scale along with other hyperparameters. We present specific measurements of accuracy that can be useful for planning experiments in the specified research area.
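Treating the binning scale as a hyperparameter can be sketched as follows. This is only an illustration of the preprocessing step, not the pipeline used in the paper: the function names, the choice of 64 fine bins, and the merge factors are assumptions. A single fine histogram is built once and then merged into coarser versions, each of which could serve as the input feature vector for a classifier.

```python
import random


def histogram(samples, lo, hi, n_bins):
    """Count samples in n_bins equal-width bins over [lo, hi); values outside are dropped."""
    counts = [0] * n_bins
    bin_width = (hi - lo) / n_bins
    for s in samples:
        if lo <= s < hi:
            # min() guards against a rounding error pushing the index to n_bins.
            counts[min(int((s - lo) / bin_width), n_bins - 1)] += 1
    return counts


def rebin(counts, factor):
    """Merge every `factor` adjacent bins into one coarser bin."""
    if len(counts) % factor != 0:
        raise ValueError("factor must divide the number of bins")
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]


def candidate_binnings(samples, lo, hi, finest=64, factors=(1, 2, 4, 8)):
    """Return {n_bins: counts} for several binning scales built from one fine histogram."""
    fine = histogram(samples, lo, hi, finest)
    return {finest // f: rebin(fine, f) for f in factors}
```

Coarser bins average out statistical noise but discard detail, so the number of bins would be scanned together with the model's other hyperparameters and the value with the best validation accuracy kept.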


2020 ◽  
Vol 15 (7) ◽  
pp. 657-661
Author(s):  
Yingjuan Yang ◽  
Chunlong Fan ◽  
Qi Zhao

In the field of bioinformatics, the prediction of phage virion proteins helps us understand the interaction between a phage and its host cells and promotes the development of new antibacterial drugs. However, traditional experimental methods for identifying phage virion proteins are expensive and inefficient, so more researchers are working to develop new computational methods. In this review, we summarized the machine learning methods proposed in recent years for predicting phage virion proteins and briefly described their advantages and limitations. Finally, some research directions related to phage virion proteins are listed.


2016 ◽  
Vol 26 (09n10) ◽  
pp. 1341-1360 ◽  
Author(s):  
Xinzhi Wang ◽  
Hui Zhang ◽  
Zheng Xu

Sentiment analysis of microblog platforms has received increasing interest from the web mining community in recent years. Current sentiment analysis methods are mainly based on the hypothesis that each word expresses only one sentiment. However, as social psychology holds, human sentiments are prototyped and fuzzy-confined, which conflicts with this hypothesis. This is one of the barriers that impede the computation of the complex public sentiment of web events in microblogs. Therefore, finding a reasonable computational model that combines learning technology with human sentiment cognition theory is a novel idea in event sentiment analysis of microblogs. In this paper, a new sentiment computation approach, defined as the public sentiments discriminator (PSD), is proposed; it considers both fuzzy logic and sentiment complexity. Unlike traditional machine learning methods, PSD is based on the rational hypothesis that sentiments are correlated with each other. A three-level computing structure is employed: sentiment-term level, microblog level, and public sentiment level. Experiments show that the proposed approach, PSD, achieves similar accuracy and F1-measure but more cognitive results when compared with traditional well-known machine learning methods. These experimental studies have confirmed that PSD can generate an interpretable result with no restriction among sentiments.
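The three-level structure (sentiment term, microblog, public sentiment) with graded rather than one-hot sentiment labels can be sketched in a few lines. This is not the PSD algorithm: the lexicon, the membership degrees, and the use of a plain average as the aggregation rule at each level are all assumptions made for illustration, and the correlation between sentiments that PSD relies on is not modeled here.

```python
# Hypothetical lexicon: each term has graded memberships in several sentiments,
# instead of the "one word, one sentiment" hypothesis.
LEXICON = {
    "bittersweet": {"joy": 0.5, "sadness": 0.5},
    "furious": {"anger": 1.0},
    "relieved": {"joy": 0.75, "fear": 0.25},
}


def mean_memberships(vectors):
    """Average a list of {sentiment: degree} dicts, treating missing labels as 0."""
    if not vectors:
        return {}
    labels = sorted({label for v in vectors for label in v})
    return {label: sum(v.get(label, 0.0) for v in vectors) / len(vectors)
            for label in labels}


def microblog_sentiment(text, lexicon=LEXICON):
    """Level 2: combine the term-level memberships found in one microblog."""
    terms = [lexicon[t] for t in text.lower().split() if t in lexicon]
    return mean_memberships(terms)


def public_sentiment(microblogs, lexicon=LEXICON):
    """Level 3: combine microblog-level results, skipping posts with no known terms."""
    per_post = [microblog_sentiment(m, lexicon) for m in microblogs]
    return mean_memberships([p for p in per_post if p])
```

Because every level keeps a full membership vector, the public-level result can express a mix of sentiments for one event rather than a single winning label.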


2020 ◽  
Author(s):  
Sumant Shringari ◽  
Sam Giannakoulias ◽  
John J. Ferrie ◽  
E. James Petersson

Protein-protein interfaces play essential roles in a variety of biological processes and many therapeutic molecules are targeted at these interfaces. However, accurate predictions of the effects of interfacial mutations to identify “hotspots” have remained elusive despite the myriad of modeling and machine learning methods tested. Here, for the first time, we demonstrate that nonlinear reweighting of energy terms from Rosetta, through the use of machine learning, exhibits improved predictability of ΔΔG values associated with interfacial mutations.


2015 ◽  
Vol 2015 ◽  
pp. 1-7 ◽  
Author(s):  
Hao Lin ◽  
Wei Chen

In cells, ion channels are one of the most important classes of membrane proteins; they allow inorganic ions to move across the membrane. A wide range of biological processes involve, and are regulated by, the opening and closing of ion channels. Ion channels can be classified into numerous classes, and different types of ion channels exhibit different functions. Thus, the correct identification of ion channels and their types using computational methods will provide in-depth insights into their function in various biological processes. In this review, we briefly introduce and discuss the recent progress in ion channel prediction using machine learning methods.


2020 ◽  
Vol 21 (10) ◽  
pp. 804-809
Author(s):  
Pengmian Feng ◽  
Lijing Feng

Antioxidants are molecules that can prevent damage to cells caused by free radicals. Recent studies have also demonstrated that antioxidants play roles in preventing diseases. However, the number of known molecules with antioxidant activity is very small. Therefore, it is necessary to identify antioxidants from various resources. In the past several years, a series of computational methods have been proposed to identify antioxidants. In this review, we briefly summarized recent advances in computationally identifying antioxidants. The challenges and future perspectives for identifying antioxidants were also discussed. We hope this review will provide insights for research on antioxidant identification.


2019 ◽  
Vol 20 (3) ◽  
pp. 217-223 ◽  
Author(s):  
Huan-Huan Wei ◽  
Wuritu Yang ◽  
Hua Tang ◽  
Hao Lin

Background: Cell-penetrating peptides (CPPs) are important short peptides that facilitate the cellular uptake of various molecules. CPPs can transport drug molecules through the plasma membrane and deliver these molecules to different cellular organelles. Thus, CPP identification and the related mechanisms have been extensively explored. In order to reveal the penetration mechanisms of a large number of CPPs, it is necessary to develop convenient and fast methods for CPP identification. Methods: Biochemical experiments can provide precise details for accurately identifying CPPs, but these methods are expensive and laborious. To overcome these disadvantages, several computational methods have been developed to identify CPPs. We have performed a review of the development of machine learning methods in CPP identification. This review provides an insight into CPP identification. Results: We summarized the machine learning-based CPP identification methods and compared the construction strategies of 11 different computational methods. Furthermore, we pointed out the limitations and difficulties in predicting CPPs. Conclusion: In this review, the latest studies on CPP identification using machine learning methods were reported. We also discussed the future development direction of CPP recognition with computational methods.

