Predicting Protein Thermostability Upon Mutation Using Molecular Dynamics Timeseries Data

2016 ◽  
Author(s):  
Noah Fleming ◽  
Benjamin Kinsella ◽  
Christopher Ing

Abstract A large number of human diseases result from disruptions to protein structure and function caused by missense mutations. Computational methods are frequently employed to assist in the prediction of protein stability upon mutation. These methods utilize a combination of protein sequence data, protein structure data, empirical energy functions, and physicochemical properties of amino acids. In this work, we present the first use of dynamic protein structural features to improve stability predictions upon mutation. This is achieved through a set of timeseries extracted from microsecond-timescale atomistic molecular dynamics simulations of proteins. Standard machine learning algorithms using the mean, variance, and histograms of these timeseries were found to be 60-70% accurate in stability classification based on experimental ΔΔG or protein-chaperone interaction measurements. A recurrent neural network with full treatment of the timeseries data was found to be 80% accurate according to the F1 score. The performance of our models was found to be equal to or better than that of two recently developed machine learning methods for binary classification, as well as two industry-standard stability prediction algorithms. Beyond classification, understanding the molecular basis of protein stability disruption by disease-causing mutations is a significant challenge that impedes the development of drugs and therapies that may be used to treat genetic diseases. The use of dynamic structural features allows for novel insight into the molecular basis of protein disruption by mutation in a diverse set of soluble proteins. To assist in the interpretation of machine learning results, we present a technique for determining the importance of features to a recurrent neural network using Garson's method.
We propose a novel extension of neural interpretation diagrams by implementing Garson’s method to scale each node in the neural interpretation diagram according to its relative importance to the network.
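To make the scaled-node diagrams concrete: Garson's method partitions the absolute input-to-hidden weights within each hidden unit and sums each input's share across units. Below is a minimal sketch for a single-hidden-layer feed-forward network; the weight shapes and values are hypothetical, not taken from the paper, which applies the partitioning to a recurrent network.

```python
import numpy as np

def garson_importance(W_in, w_out):
    """Garson's method for a single-hidden-layer network.

    W_in  : (n_inputs, n_hidden) input-to-hidden weights
    w_out : (n_hidden,) hidden-to-output weights
    Returns relative importances that sum to 1.
    """
    c = np.abs(W_in) * np.abs(w_out)      # contribution of input i through hidden unit j
    r = c / c.sum(axis=0, keepdims=True)  # each input's share within hidden unit j
    q = r.sum(axis=1)                     # total share accumulated per input
    return q / q.sum()

# hypothetical weights: 3 input features, 2 hidden units
W = np.array([[0.8, 0.1],
              [0.1, 0.1],
              [0.1, 0.8]])
v = np.array([1.0, 1.0])
imp = garson_importance(W, v)
print(imp)  # inputs 0 and 2 dominate; input 1 contributes little
```

Scaling each node of a neural interpretation diagram by its `imp` value then gives the proposed importance-weighted diagram.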

Author(s):  
E. Yu. Shchetinin

The recognition of human emotions is one of the most relevant and dynamically developing areas of modern speech technologies, and recognition of emotions in speech (RER) is the most demanded part of them. In this paper, we propose a computer model of emotion recognition based on an ensemble of a bidirectional recurrent neural network with LSTM memory cells and the deep convolutional neural network ResNet18. Computer studies are carried out on the RAVDESS database, which contains emotional human speech. RAVDESS is a data set of 7356 files; the recordings contain the following emotions: 0 – neutral, 1 – calm, 2 – happiness, 3 – sadness, 4 – anger, 5 – fear, 6 – disgust, 7 – surprise. In total, the database contains 16 classes (8 emotions divided into male and female) for a total of 1440 samples (speech only). To train machine learning algorithms and deep neural networks to recognize emotions, the audio recordings must be pre-processed so as to extract the main characteristic features of particular emotions. This was done using Mel-frequency cepstral coefficients, chroma coefficients, and characteristics of the frequency spectrum of the recordings. Computer studies of various neural network models for emotion recognition are carried out on the data described above, and machine learning algorithms are used for comparative analysis. The following models were trained during the experiments: logistic regression (LR), a support vector machine classifier (SVM), decision tree (DT), random forest (RF), gradient boosting over trees (XGBoost), the convolutional neural network CNN (ResNet18), the recurrent neural network RNN (BLSTM), and an ensemble of convolutional and recurrent networks, Stacked CNN-RNN. The results show that the neural networks achieved much higher accuracy in recognizing and classifying emotions than the machine learning algorithms. 
Of the three neural network models presented, the CNN + BLSTM ensemble showed the highest accuracy.
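As a toy illustration of the ensemble step, the simplest late-fusion variant averages the two models' class probabilities. The logits below are made up, and the paper's Stacked CNN-RNN may combine the models differently (e.g. by stacking features rather than averaging outputs):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(logits_cnn, logits_rnn, w_cnn=0.5):
    """Late fusion: weighted average of the two models' class probabilities."""
    p = w_cnn * softmax(logits_cnn) + (1 - w_cnn) * softmax(logits_rnn)
    return p.argmax(axis=-1)

# made-up logits for 2 utterances over the 8 RAVDESS emotion classes
cnn = np.array([[2.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
                [0.1, 0.1, 0.1, 3.0, 0.1, 0.1, 0.1, 0.1]])
rnn = np.array([[1.5, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
                [0.2, 0.2, 0.2, 2.5, 0.2, 0.2, 0.2, 0.2]])
print(ensemble_predict(cnn, rnn))  # → [0 3], i.e. neutral and sadness
```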


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1576 ◽  
Author(s):  
Li Zhu ◽  
Lianghao Huang ◽  
Linyu Fan ◽  
Jinsong Huang ◽  
Faming Huang ◽  
...  

Landslide susceptibility prediction (LSP) modeling is an important and challenging problem. Landslide features are generally uncorrelated or nonlinearly correlated, resulting in limited LSP performance when leveraging conventional machine learning models. In this study, a deep-learning-based model using the long short-term memory (LSTM) recurrent neural network and conditional random field (CRF) in cascade-parallel form was proposed for making LSPs based on remote sensing (RS) images and a geographic information system (GIS). The RS images are the main data sources of landslide-related environmental factors, and a GIS is used to analyze, store, and display spatial big data. The cascade-parallel LSTM-CRF consists of frequency ratio values of environmental factors in the input layers, cascade-parallel LSTM for feature extraction in the hidden layers, and cascade-parallel full connection for classification and CRF for landslide/non-landslide state modeling in the output layers. The cascade-parallel form of LSTM can extract features from different layers and merge them into concrete features. The CRF is used to calculate the energy relationship between two grid points, and the extracted features are further smoothed and optimized. As a case study, the cascade-parallel LSTM-CRF was applied to Shicheng County of Jiangxi Province in China. A total of 2709 landslide grid cells were recorded and 2709 non-landslide grid cells were randomly selected from the study area. The results show that, compared with mainstream traditional machine learning algorithms, such as the multilayer perceptron, logistic regression, and decision tree, the proposed cascade-parallel LSTM-CRF had a higher landslide prediction rate (positive predictive rate: 72.44%, negative predictive rate: 80%, total predictive rate: 75.67%). 
In conclusion, the proposed cascade-parallel LSTM-CRF is a novel data-driven deep learning model that overcomes the limitations of traditional machine learning algorithms and achieves promising results for making LSPs.
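The frequency ratio values used in the input layers can be illustrated with a short sketch; the slope-angle classes and cell counts below are hypothetical:

```python
def frequency_ratio(landslide_counts, total_counts):
    """Frequency ratio per attribute class of an environmental factor:
    (landslide cells in class / all landslide cells) divided by
    (cells in class / all cells).  FR > 1 means the class favours landslides."""
    n_slide = sum(landslide_counts)
    n_total = sum(total_counts)
    return [(ls / n_slide) / (c / n_total)
            for ls, c in zip(landslide_counts, total_counts)]

# hypothetical slope-angle classes: gentle, moderate, steep
fr = frequency_ratio([100, 300, 600], [4000, 3000, 3000])
print([round(x, 2) for x in fr])  # → [0.25, 1.0, 2.0]
```

A vector of such per-factor FR values for each grid cell forms the model input.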


Author(s):  
Kun Xie ◽  
Chong Qiao ◽  
Hong Shen ◽  
Riyi Yang ◽  
Ming Xu ◽  
...  

Abstract Zr-Rh metallic glass has found many applications in vehicle parts, sports equipment, and other products owing to its outstanding mechanical properties, but knowledge of the microstructure behind these superb mechanical properties remains insufficient. Here, we develop a deep neural network potential for the Zr-Rh system using machine learning, which resolves the trade-off between accuracy and efficiency in molecular dynamics simulations and greatly extends the simulation scale in both space and time. The results show that the structural features obtained from the neural network method are in good agreement with those from ab initio molecular dynamics simulations. Furthermore, we build a large model of 5400 atoms to explore the influence of simulation size and cooling rate on the melt-quenching process of Zr77Rh23. Our study lays a foundation for exploring the complex structures of amorphous Zr77Rh23, which is of great significance for its design and practical application.
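A standard structural feature for comparing a machine-learned potential against ab initio trajectories is the pair radial distribution function g(r); whether this is the exact feature used in the paper is an assumption. A minimal sketch for a cubic periodic box, with a synthetic configuration in place of real trajectory frames:

```python
import numpy as np

def radial_distribution(positions, box, r_max, n_bins=50):
    """Pair radial distribution function g(r) for a cubic periodic box."""
    n = len(positions)
    diffs = positions[:, None, :] - positions[None, :, :]
    diffs -= box * np.round(diffs / box)                   # minimum-image convention
    dist = np.sqrt((diffs ** 2).sum(-1))[np.triu_indices(n, 1)]
    hist, edges = np.histogram(dist[dist < r_max], bins=n_bins, range=(0.0, r_max))
    r = 0.5 * (edges[1:] + edges[:-1])                     # bin centres
    shell = 4.0 * np.pi * r ** 2 * (edges[1] - edges[0])   # ideal-gas shell volume
    rho = n / box ** 3
    g = hist / (shell * rho * n / 2)                       # normalise per pair
    return r, g

# synthetic configuration: 200 atoms placed uniformly in a 10 x 10 x 10 box
rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 10.0, size=(200, 3))
r, g = radial_distribution(pos, box=10.0, r_max=4.0)
# for an uncorrelated (ideal-gas) configuration, g(r) fluctuates around 1
```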


Clustering and classification of large data is a very challenging task in data mining. Many machine learning and deep learning systems have been proposed by researchers on different datasets. Data volume, data size, and the structure of the data may affect the time complexity of a system. This paper describes a new document-object classification approach using deep learning (DL) and proposes a recurrent neural network (RNN) for classification with a micro-clustering approach. TF-IDF and a density-based approach are used to store the best features. The framework uses a supervised learning method and extracts a feature set, called BK, for the desired classes. Once the training part is complete, it proceeds to classify the individual test instances with the help of the proposed classification algorithm. The recurrent neural network categorizes each test object according to its weights. The system can work on heterogeneous data sets and generates micro-clusters according to the classification results. Experimental analysis against classical machine learning algorithms was also carried out. The proposed algorithm shows higher accuracy than the existing density-based approach on different data sets.
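The TF-IDF feature-selection step can be sketched in a few lines. This is the plain TF-IDF weighting, not necessarily the exact variant used in the system, and the tokenised documents are made up:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Term frequency times inverse document frequency for tokenised documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))   # document frequency
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

docs = [["deep", "learning", "model"],
        ["deep", "neural", "network"],
        ["density", "based", "clustering"]]
w = tf_idf(docs)
# "deep" occurs in two of the three documents, so it weighs less than "model"
print(w[0]["deep"] < w[0]["model"])  # → True
```

The highest-weighted terms per class would then form the stored feature set.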


Classification is one of the most important supervised and semi-supervised machine learning tasks. Many classification algorithms have already been introduced for existing systems. Class-label classification is an important machine learning task in which one assigns a subset of candidate labels to an unlabeled object. Classification of various document models based on short text, metadata, and heading levels are existing techniques introduced in the literature. Reading and processing whole documents can take considerable time, which increases the time complexity of the entire system. We propose a new document classification method based on deep learning, using NLP and machine learning approaches. The system has several attractive properties: it captures metadata from the abstract section and builds the training set first. Once all documents are processed, it applies an optimization algorithm. A recurrent neural network is used to categorize each object according to its weights and provides the final class label for the entire test dataset. Experimental analysis shows that the system provides better classification accuracy and lower time complexity than classical machine learning algorithms.


Author(s):  
Peipei Jiang ◽  
Liailun Chen ◽  
Min-Feng Wang

Each language is a system of knowledge and skills that allows language users to interact and to express thoughts, hypotheses, feelings, wishes, and everything else that needs to be expressed. Linguistics is the study of these systems in all respects: the composition, usage, and sociology of language, in particular, are the core of linguistics. Machine learning is the research area that allows machines to learn without being explicitly programmed. In linguistics, the analysis of writing is understood to be a foundation for many distinct business applications and is probably most useful when integrated with machine learning methods. Research shows that besides text tagging and algorithm training, there are major problems in the field of Big Data. This article provides a collaborative approach (transfer learning integrated into a recurrent neural network) to analyze the distinct kinds of writing between the linear and non-computational sides of language, and to enhance granularity. The outcome demonstrates stronger incorporation of granularity into the language from both sides. Comparative results of machine learning algorithms are used to determine the best way to analyze and interpret the structure of the language.


In this era of globalization, it is quite likely to come across people or communities who do not share the same language of communication as us. To address the problems this causes, machine translation systems are being developed. Developers at several reputed organizations, such as Google LLC, have been working on algorithms to support machine translation using machine learning techniques such as the artificial neural network (ANN). Several neural machine translation systems have been developed in this regard, but the recurrent neural network (RNN), on the other hand, has not grown much in this field. In our work, we have tried to bring the RNN into the field of machine translation in order to assess the benefits of RNNs over ANNs. The results show how the RNN is able to perform machine translation with proper accuracy.


Abstract. Predictive models are important to help manage high-value assets and to ensure optimal and safe operations. Recently, advanced machine learning algorithms have been applied to solve practical and complex problems, and are of significant interest due to their ability to adaptively ‘learn’ in response to changing environments. This paper reports on the data preparation strategies and the development and predictive capability of a Long Short-Term Memory recurrent neural network model for anaerobic reactors employed at Melbourne Water’s Western Treatment Plant for sewage treatment that includes biogas harvesting. The results show rapid training and higher accuracy in predicting biogas production when historical data, which include significant outliers, are preprocessed with z-score standardisation in comparison to those with max-min normalisation. Furthermore, a trained model with a reduced number of input variables via the feature selection technique based on Pearson’s correlation coefficient is found to yield good performance given sufficient dataset training. It is shown that the overall best performance model comprises the reduced input variables and data processed with z-score standardisation. This initial study provides a useful guide for the implementation of machine learning techniques to develop smarter structures and management towards Industry 4.0 concepts.
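The two preprocessing schemes compared above can be sketched side by side; the biogas readings below are invented, with one large outlier to mimic the reported effect of outliers on max-min scaling:

```python
import numpy as np

def z_score(x):
    """Zero mean, unit variance; an outlier does not compress the inliers' spread."""
    return (x - x.mean()) / x.std()

def max_min(x):
    """Rescale to [0, 1]; one extreme value squeezes everything else."""
    return (x - x.min()) / (x.max() - x.min())

# invented daily biogas readings with a single large outlier
x = np.array([10.0, 11.0, 12.0, 11.5, 10.5, 100.0])
print(np.round(max_min(x)[:5], 3))   # inliers crushed into a narrow band near 0
print(np.round(z_score(x)[:5], 2))   # inliers retain a usable spread
```

With max-min scaling, the outlier defines the range and the informative inlier variation nearly vanishes, which is consistent with the slower training reported for that scheme.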


2020 ◽  
pp. 1-12
Author(s):  
Cao Yanli

Research on the risk pricing of Internet finance online loans not only enriches the theory and methods of online loan pricing, but also helps to improve the level of online loan risk pricing. In order to improve the efficiency of Internet financial supervision, this article builds an Internet financial supervision system based on machine learning algorithms and improved neural network algorithms. On the basis of factor analysis and discretization of the loan data, this paper selects the relatively mature logistic regression model to evaluate the credit risk of the borrower, considering the comprehensive management of credit risk and its matching with income. In addition, following the provisions of the New Basel Accord on expected losses and economic capital, this article combines the credit risk assessment results with relevant factors obtained through regional research to conduct an empirical analysis. The research results show that the model constructed in this paper has a certain reliability.
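The logistic regression credit-scoring step can be sketched generically; the borrower features, labels, and training settings below are all hypothetical, not the article's actual data or model specification:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Logistic regression fitted by batch gradient descent on the log-loss."""
    Xb = np.hstack([np.ones((len(X), 1)), X])   # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))       # predicted default probability
        w -= lr * Xb.T @ (p - y) / len(y)       # gradient of the log-loss
    return w

def predict_default_prob(w, X):
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

# hypothetical borrowers: [debt-to-income ratio, number of overdue payments]
X = np.array([[0.2, 0.0], [0.3, 0.0], [0.8, 3.0], [0.9, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])              # 1 = defaulted
w = fit_logistic(X, y)
print(predict_default_prob(w, X).round(2))      # low risk first, high risk last
```

The fitted probabilities can then feed expected-loss calculations of the kind prescribed by the New Basel Accord.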

