Prediction of MoRFs Based on n-gram Convolutional Neural Network

10.29007/5k4z ◽  
2019 ◽  
Author(s):  
Fang Chun ◽  
Yoshitaka Moriwaki ◽  
Caihong Li ◽  
Kentaro Shimizu

MoRFs usually act as "hub" sites in the interaction networks of intrinsically disordered proteins. As more and more serious diseases are found to be associated with disordered proteins, identifying MoRFs has become increasingly important. In this study, we introduce a multichannel convolutional neural network (CNN) model for MoRF prediction. The model extends the standard one-dimensional CNN with multiple parallel CNNs that read the sequence at different n-gram sizes (groups of residues). In addition, we add an averaging step to refine the output of the machine learning model. When compared with other methods on the same dataset, our approach achieved a balanced accuracy of 0.682 and an AUC of 0.723, the best performance among the single-model approaches.
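
The two ideas in this abstract, reading a sequence at several n-gram sizes and averaging the per-residue output, can be illustrated with a minimal Python sketch. The function names, the n-gram sizes, and the centered averaging window are illustrative assumptions; the abstract does not state which sizes or which averaging scheme the authors used, and no CNN is trained here.

```python
def ngrams(sequence, n):
    """Return all overlapping n-grams (groups of n residues) in a sequence."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def smooth_scores(scores, window=3):
    """Replace each per-residue score by the mean of a centered window.

    The window is truncated at the sequence ends. The window size is an
    assumption made for illustration, not a value taken from the paper.
    """
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


# Each "channel" of a multichannel model would see the same sequence
# tokenised at a different n-gram size (hypothetical sizes 1, 2 and 3).
channels = {n: ngrams("MKVLA", n) for n in (1, 2, 3)}
```

In a real multichannel model, each entry of `channels` would feed its own convolutional branch, and `smooth_scores` would be applied to the predicted per-residue propensities.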

2019 ◽  
Vol 17 (06) ◽  
pp. 1940015
Author(s):  
Chun Fang ◽  
Yoshitaka Moriwaki ◽  
Caihong Li ◽  
Kentaro Shimizu

Molecular recognition features (MoRFs) usually act as “hub” sites in the interaction networks of intrinsically disordered proteins (IDPs). Because an increasing number of serious diseases have been found to be associated with disordered proteins, identifying MoRFs has become increasingly important. In this study, we propose an ensemble learning strategy, named MoRFPred_en, to predict MoRFs from protein sequences. This approach combines four submodels that utilize different sequence-derived features for the prediction, including a multichannel one-dimensional convolutional neural network (CNN_1D multichannel) based model, two deep two-dimensional convolutional neural network (DCNN_2D) based models, and a support vector machine (SVM) based model. When compared with other methods on the same datasets, the MoRFPred_en approach produced better results than existing state-of-the-art MoRF prediction methods, achieving an AUC of 0.762 on the VALIDATION419 dataset, 0.795 on the TEST45 dataset, and 0.776 on the TEST49 dataset. Availability: http://vivace.bi.a.u-tokyo.ac.jp:8008/fang/MoRFPred_en.php .


2019 ◽  
Vol 17 (01) ◽  
pp. 1950004 ◽  
Author(s):  
Chun Fang ◽  
Yoshitaka Moriwaki ◽  
Aikui Tian ◽  
Caihong Li ◽  
Kentaro Shimizu

Molecular recognition features (MoRFs) are key functional regions of intrinsically disordered proteins (IDPs), which play important roles in the molecular interaction network of cells and are implicated in many serious human diseases. Identifying MoRFs is essential for both functional studies of IDPs and drug design. This study applies deep learning to develop a model for improving MoRF prediction. We propose a method named en_DCNNMoRF (ensemble deep convolutional neural network-based MoRF predictor). It combines the outcomes of two independent deep convolutional neural network (DCNN) classifiers that take advantage of different features. The first, DCNNMoRF1, employs a position-specific scoring matrix (PSSM) and 22 types of amino acid-related factors to describe protein sequences. The second, DCNNMoRF2, employs a PSSM and 13 types of amino acid indexes to describe protein sequences. For both single classifiers, a DCNN with a novel two-dimensional attention mechanism was adopted, and an averaging strategy was added to further process the output probabilities of each DCNN model. Finally, en_DCNNMoRF combines the two models by averaging their final scores. When compared with other well-known tools applied to the same datasets, the accuracy of the proposed method was comparable with that of state-of-the-art methods. The related web server can be accessed freely via http://vivace.bi.a.u-tokyo.ac.jp:8008/fang/en_MoRFs.php .
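
The final combination step described above, averaging the scores of two classifiers, might look like the following Python sketch. The function name and the example scores are hypothetical; the abstract says only that the final scores are averaged, so equal weighting per residue is an assumption.

```python
def average_ensemble(scores_a, scores_b):
    """Average two per-residue score lists position by position.

    Equal weights are assumed; the abstract does not mention weighting.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must score the same residues")
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]


# Hypothetical per-residue scores from two classifiers.
combined = average_ensemble([0.25, 1.0, 0.5], [0.75, 0.0, 0.5])
```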


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Mahdi Hashemi

The input to a machine learning model is a one-dimensional feature vector. However, in recent learning models, such as convolutional and recurrent neural networks, two- and three-dimensional feature tensors can also be fed to the model. During training, the machine adjusts its internal parameters to project each feature tensor close to its target. After training, the machine can be used to predict the target for previously unseen feature tensors. This study focuses on the requirement that feature tensors must be of the same size. In other words, the same number of features must be present for each sample. This creates a barrier in processing images and texts, as they usually have different sizes, and thus different numbers of features. In classifying an image using a convolutional neural network (CNN), the input is a three-dimensional tensor, where the value of each pixel in each channel is one feature. The three-dimensional feature tensor must be the same size for all images. However, images are not usually of the same size, and neither are their corresponding feature tensors. Resizing images to the same size without deforming the patterns they contain is a major challenge. This study proposes zero-padding for resizing images to the same size and compares it with the conventional approach of scaling images up (zooming in) using interpolation. Our study showed that zero-padding had no effect on the classification accuracy but considerably reduced the training time. The reason is that neighboring zero input units (pixels) will not activate their corresponding convolutional unit in the next layer. Therefore, the synaptic weights on outgoing links from input units do not need to be updated if those units contain a zero value. Theoretical justification and experimental support are provided in this paper.
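
Zero-padding as described above can be sketched in a few lines of Python. This is a minimal illustration for a single-channel image stored as a list of rows; the function name is hypothetical, and centering the image in the padded canvas is an assumption, since the abstract does not say where the zeros are placed.

```python
def zero_pad(image, target_h, target_w):
    """Place a 2D image (list of rows) in the center of a zero canvas.

    Pixel values are copied unchanged, so patterns are not deformed;
    only zeros are added around them.
    """
    h = len(image)
    w = len(image[0]) if h else 0
    if h > target_h or w > target_w:
        raise ValueError("image is larger than the target size")
    top = (target_h - h) // 2
    left = (target_w - w) // 2
    padded = [[0] * target_w for _ in range(target_h)]
    for r in range(h):
        for c in range(w):
            padded[top + r][left + c] = image[r][c]
    return padded


padded = zero_pad([[1, 2], [3, 4]], 4, 4)
```

For a multi-channel image, the same function could be applied to each channel separately.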


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Da-Wei Li ◽  
Alexandar L. Hansen ◽  
Chunhua Yuan ◽  
Lei Bruschweiler-Li ◽  
Rafael Brüschweiler

The analysis of nuclear magnetic resonance (NMR) spectra for the comprehensive and unambiguous identification and characterization of peaks is a difficult, but critically important step in all NMR analyses of complex biological molecular systems. Here, we introduce DEEP Picker, a deep neural network (DNN)-based approach for peak picking and spectral deconvolution which semi-automates the analysis of two-dimensional NMR spectra. DEEP Picker includes 8 hidden convolutional layers and was trained on a large number of synthetic spectra of known composition with variable degrees of crowdedness. We show that our method is able to correctly identify overlapping peaks, including ones that are challenging for expert spectroscopists and existing computational methods alike. We demonstrate the utility of DEEP Picker on NMR spectra of folded and intrinsically disordered proteins as well as a complex metabolomics mixture, and show how it provides access to valuable NMR information. DEEP Picker should facilitate the semi-automation and standardization of protocols for better consistency and sharing of results within the scientific community.


2013 ◽  
Vol 42 (D1) ◽  
pp. D320-D325 ◽  
Author(s):  
Satoshi Fukuchi ◽  
Takayuki Amemiya ◽  
Shigetaka Sakamoto ◽  
Yukiko Nobe ◽  
Kazuo Hosoda ◽  
...  

2019 ◽  
Author(s):  
Ruchi Lohia ◽  
Reza Salari ◽  
Grace Brannigan

The role of electrostatic interactions and mutations that change charge states in intrinsically disordered proteins (IDPs) is well-established, but many disease-associated mutations in IDPs are charge-neutral. The Val66Met single nucleotide polymorphism (SNP) encodes a hydrophobic-to-hydrophobic mutation at the midpoint of the prodomain of precursor brain-derived neurotrophic factor (BDNF), one of the earliest SNPs to be associated with neuropsychiatric disorders, for which the underlying molecular mechanism is unknown. Here we report on over 250 μs of fully atomistic, explicit-solvent, temperature replica exchange molecular dynamics simulations of the 91-residue BDNF prodomain, for both the V66 and M66 sequences. The simulations correctly reproduced the location of both local and non-local secondary structure changes due to the Val66Met mutation when compared with NMR spectroscopy. We find that the local structure change is mediated via entropic and sequence-specific effects. We show that the highly disordered prodomain can be meaningfully divided into domains based on sequence alone. Monte Carlo simulations of a self-excluding heterogeneous polymer, with monomers representing each domain, suggest the sequence would be effectively segmented by the long, highly disordered polyampholyte near the sequence midpoint. This is qualitatively consistent with observed interdomain contacts within the BDNF prodomain, although contacts between the two segments are enriched relative to the self-excluding polymer. The Val66Met mutation increases interactions across the boundary between the two segments, due in part to a specific Met-Met interaction with a methionine in the other segment. This effect propagates to cause the non-local change in secondary structure around the second methionine, previously observed in NMR. The effect is not mediated simply via changes in interdomain contacts but is also dependent on secondary structure formation around residue 66, indicating a mechanism for secondary structure coupling in disordered proteins.

