An Overview on Predicting Protein Subchloroplast Localization by using Machine Learning Methods

2020 ◽  
Vol 21 (12) ◽  
pp. 1229-1241 ◽  
Author(s):  
Meng-Lu Liu ◽  
Wei Su ◽  
Zheng-Xing Guan ◽  
Dan Zhang ◽  
Wei Chen ◽  
...  

The chloroplast is a subcellular organelle of green plants and eukaryotic algae that plays an important role in photosynthesis. Since the function of a protein correlates with its location, knowing its subchloroplast localization helps elucidate its function. However, because of the large number of chloroplast proteins, it is costly and time-consuming to design biological experiments to determine their subchloroplast localizations. To address this problem, twelve computational methods have been developed over the past ten years to predict protein subchloroplast localization. This review summarizes the research progress in this area. We hope the review provides useful guidance for further computational studies of protein subchloroplast localization.

2019 ◽  
Vol 14 (3) ◽  
pp. 178-189 ◽  
Author(s):  
Xiaoyang Jing ◽  
Qimin Dong ◽  
Ruqian Lu ◽  
Qiwen Dong

Background: Protein inter-residue contact prediction plays an important role in protein structure and function research. As a low-dimensional representation of protein tertiary structure, inter-residue contacts can greatly help de novo protein structure prediction methods reduce the conformational search space. Over the past two decades, various methods have been developed for protein inter-residue contact prediction. Objective: We provide a comprehensive and systematic review of protein inter-residue contact prediction methods. Results: Protein inter-residue contact prediction methods are roughly classified into five categories: correlated mutations methods, machine-learning methods, fusion methods, template-based methods and 3D model-based methods. In this paper, we first describe the common definition of protein inter-residue contacts and show their typical applications. Then, we present a comprehensive review of the three main categories of protein inter-residue contact prediction: correlated mutations methods, machine-learning methods and fusion methods. We also analyze the constraints of each category. Furthermore, we compare several representative methods on the CASP11 dataset and discuss their performance in detail. Conclusion: Correlated mutations methods perform better for long-range contacts, while machine-learning methods perform well for short-range contacts. Fusion methods can take advantage of both the machine-learning and correlated mutations methods. Employing more effective fusion strategies could help further improve the performance of fusion methods.


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 2012 ◽  
Author(s):  
Hashem Koohy

In the era of explosion in biological data, machine learning techniques are becoming more popular in life sciences, including biology and medicine. This research note examines the rise and fall of the most commonly used machine learning techniques in life sciences over the past three decades.


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Claus Boye Asmussen ◽  
Charles Møller

Manual exploratory literature reviews should be a thing of the past, as technology and the development of machine learning methods have matured. The learning curve for using machine learning methods is rapidly declining, enabling new possibilities for all researchers. A framework is presented on how to use topic modelling on a large collection of papers for an exploratory literature review and how that can be used for a full literature review. The aim of the paper is to enable the use of topic modelling for researchers by presenting a step-by-step framework on a case and sharing a code template. The framework consists of three steps: pre-processing, topic modelling, and post-processing, where the topic model Latent Dirichlet Allocation is used. The framework enables large numbers of papers to be reviewed in a transparent, reliable, faster, and reproducible way.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Octav Caldararu ◽  
Tom L. Blundell ◽  
Kasper P. Kepp

Background: Prediction of the change in fold stability (ΔΔG) of a protein upon mutation is of major importance to protein engineering and screening of disease-causing variants. Many prediction methods can use 3D structural information to predict ΔΔG. While the performance of these methods has been extensively studied, a new problem has arisen due to the abundance of crystal structures: How precise are these methods in terms of the structure input used, which structure should be used, and how much does it matter? Thus, there is a need to quantify the structural sensitivity of protein stability prediction methods. Results: We computed the structural sensitivity of six widely used prediction methods by use of saturated computational mutagenesis on a diverse set of 87 structures of 25 proteins. Our results show that structural sensitivity varies massively and surprisingly falls into two very distinct groups, with methods that take detailed account of the local environment showing a sensitivity of ~0.6 to 0.8 kcal/mol, whereas machine-learning methods display much lower sensitivity (~0.1 kcal/mol). We also observe that the precision correlates with the accuracy for mutation-type-balanced data sets but not with the generally reported accuracy of the methods, indicating the importance of mutation-type balance in both contexts. Conclusions: The structural sensitivity of stability prediction methods varies greatly and is caused mainly by the models and less by the actual protein structural differences. As a new recommended standard, we therefore suggest that ΔΔG values are evaluated on three protein structures when available and the associated standard deviation reported, to emphasize not just the accuracy but also the precision of the method in a specific study.
Our observation that machine-learning methods deemphasize structure may indicate that folded wild-type structures alone, without the folded mutant and unfolded structures, only add modest value for assessing protein stability effects, and that side-chain-sensitive methods overstate the significance of the folded wild-type structure.
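
The reporting standard suggested in this abstract (ΔΔG evaluated on three structures, with the associated standard deviation) can be illustrated with a minimal sketch. The structure labels and ΔΔG values below are hypothetical placeholders, not data from the study, and the use of the sample (n − 1) standard deviation is an assumption, since the abstract does not say which estimator is intended.

```python
from statistics import mean, stdev

# Hypothetical ΔΔG predictions (kcal/mol) for ONE mutation, obtained by running
# the same prediction method on three different structures of the same protein.
# Labels and numbers are illustrative placeholders only.
ddg_by_structure = {
    "structure_A": 1.42,
    "structure_B": 0.95,
    "structure_C": 1.18,
}


def summarize_ddg(values):
    """Return (mean, sample standard deviation) of ΔΔG predictions.

    The sample (n - 1) standard deviation is an assumption made here;
    the abstract does not specify the estimator.
    """
    values = list(values)
    if len(values) < 2:
        raise ValueError("need at least two structures to estimate a standard deviation")
    return mean(values), stdev(values)


avg, sd = summarize_ddg(ddg_by_structure.values())
print(f"ΔΔG = {avg:.2f} ± {sd:.2f} kcal/mol (n = {len(ddg_by_structure)})")
```

Reporting the spread alongside the mean shows how much a prediction depends on the choice of input structure, which is the precision the abstract distinguishes from accuracy.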


F1000Research ◽  
2018 ◽  
Vol 6 ◽  
pp. 2012 ◽  
Author(s):  
Hashem Koohy

In the era of explosion in biological data, machine learning techniques are becoming more popular in life sciences, including biology and medicine. This research note examines the rise and fall of the most commonly used machine learning techniques in life sciences over the past three decades.


Author(s):  
M.V. Buinevich ◽  
K.E. Izrailov

In recent years, the use of unsafe software, in which the search for vulnerabilities relies on static and dynamic analysis, has remained the main threat to the infosphere. Manual static analysis is extremely time-consuming and requires highly qualified, and therefore scarce, specialists. An alternative is to automate the process using artificial intelligence. This work seeks ways to apply machine learning methods at every stage of the static analysis of program code; to that end, the formal needs of the stages and the capabilities of the methods are studied and correlated. The main result of the study is a generalized domain model; the specific results are 14 solutions to the “key” problems of static analysis of program code using machine learning methods.


2020 ◽  
Vol 21 (10) ◽  
pp. 804-809
Author(s):  
Pengmian Feng ◽  
Lijing Feng

Antioxidants are molecules that can prevent damage to cells caused by free radicals. Recent studies have also demonstrated that antioxidants play roles in preventing diseases. However, the number of known molecules with antioxidant activity is very small. Therefore, it is necessary to identify antioxidants from various resources. In the past several years, a series of computational methods have been proposed to identify antioxidants. In this review, we briefly summarize recent advances in computationally identifying antioxidants. The challenges and future perspectives for identifying antioxidants are also discussed. We hope this review will provide insights into research on antioxidant identification.


Author(s):  
Andrei Dmitri Gavrilov ◽  
Alex Jordache ◽  
Maya Vasdani ◽  
Jack Deng

The current discourse in the machine learning domain converges on the agreement that machine learning methods have emerged as some of the most prominent learning and classification approaches over the past decade. The CNN has become one of the most actively researched and broadly applied deep machine learning methods. However, the training set has a large influence on the accuracy of a network, and it is paramount to create an architecture that supports its maximum training and recognition performance. The problem considered in this article is how to prevent overfitting and underfitting. The deficiencies are addressed by comparing the statistics of CNN image recognition algorithms to the Ising model. Using a two-dimensional square-lattice array, the impact that the learning rate and regularization rate parameters have on the adaptability of CNNs for image classification is evaluated. The obtained results contribute to a better theoretical understanding of a CNN and provide concrete guidance on preventing model overfitting and underfitting when a CNN is applied to image recognition tasks.


2020 ◽  
Vol 29 (10) ◽  
pp. 108704
Author(s):  
Bin Huang ◽  
Yuanyang Du ◽  
Shuai Zhang ◽  
Wenfei Li ◽  
Jun Wang ◽  
...  
