SSMFN: a fused spatial and sequential deep learning model for methylation site prediction

2021 ◽  
Vol 7 ◽  
pp. e683
Author(s):  
Favorisen Rosyking Lumbanraja ◽  
Bharuno Mahesworo ◽  
Tjeng Wawan Cenggoro ◽  
Digdo Sudigyo ◽  
Bens Pardamean

Background Conventional in vivo methods for post-translational modification site identification, such as spectrophotometry, Western blotting, and chromatin immunoprecipitation, can be very expensive and time-consuming. Neural networks (NN) are one of the computational approaches that can effectively predict post-translational modification sites. We developed a neural network model, the Sequential and Spatial Methylation Fusion Network (SSMFN), to predict possible methylation sites on protein sequences. Method We designed our model to extract both spatial and sequential information from amino acid sequences. A convolutional neural network (CNN) is applied to harness spatial information, while a long short-term memory (LSTM) network is applied to sequential data. The latent representations of the CNN and LSTM branches are then fused. Afterwards, we compared the performance of our proposed model to state-of-the-art methylation site prediction models on balanced and imbalanced datasets. Results Our model performed better in almost all measurements when trained on the balanced training dataset. On the imbalanced training dataset, all of the models gave better performance since they were trained on more data. In several metrics, our model also surpasses the PRMePred model, which requires laborious effort for feature extraction and selection. Conclusion Our model achieved the best performance across different environments in almost all measurements. Our results also suggest that an NN model trained on a balanced training dataset and tested on an imbalanced dataset will offer high specificity and low sensitivity. Thus, NN models for methylation site prediction should be trained on an imbalanced dataset, since in actual applications there are far more negative samples than positive ones.
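
For illustration, the sketch below shows how a CNN branch and an LSTM branch over the same one-hot-encoded amino-acid window could be fused in Keras. It is a minimal sketch under assumed hyperparameters (window length, layer widths), not the published SSMFN configuration.

```python
# Illustrative fused CNN + LSTM model for methylation site prediction;
# window length and layer sizes are assumptions, not the published SSMFN ones.
import tensorflow as tf
from tensorflow.keras import layers, Model

WINDOW = 33    # assumed length of the amino-acid window around a candidate site
ALPHABET = 20  # one-hot channels, one per standard amino acid

inputs = layers.Input(shape=(WINDOW, ALPHABET))

# Spatial branch: 1D convolution over the sequence window.
cnn = layers.Conv1D(64, kernel_size=5, activation="relu")(inputs)
cnn = layers.MaxPooling1D(pool_size=2)(cnn)
cnn = layers.Flatten()(cnn)

# Sequential branch: LSTM over the same window.
lstm = layers.LSTM(64)(inputs)

# Fuse the latent representations of both branches.
fused = layers.Concatenate()([cnn, lstm])
fused = layers.Dense(64, activation="relu")(fused)
outputs = layers.Dense(1, activation="sigmoid")(fused)  # P(methylation site)

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```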

2021 ◽  
Vol 15 ◽  
Author(s):  
Alhassan Alkuhlani ◽  
Walaa Gad ◽  
Mohamed Roushdy ◽  
Abdel-Badeeh M. Salem

Background: Glycosylation is one of the most common post-translational modifications (PTMs) in organism cells. It plays important roles in several biological processes, including cell-cell interaction, protein folding, antigen recognition, and immune response. In addition, glycosylation is associated with many human diseases such as cancer, diabetes, and coronavirus infections. The experimental techniques for identifying glycosylation sites are time-consuming, expensive, and require extensive laboratory work. Therefore, computational intelligence techniques are becoming very important for glycosylation site prediction. Objective: This paper is a theoretical discussion of the technical aspects of applying biotechnological approaches (e.g., artificial intelligence and machine learning) to digital bioinformatics research and intelligent biocomputing. Computational intelligence techniques have shown efficient results for predicting N-linked, O-linked, and C-linked glycosylation sites. In the last two decades, many studies have been conducted on glycosylation site prediction using these techniques. In this paper, we analyze and compare a wide range of intelligent techniques from these studies across multiple aspects. The current challenges and difficulties facing software developers and knowledge engineers in predicting glycosylation sites are also included. Method: The studies are compared on many criteria, such as databases, feature extraction and selection, machine learning classification methods, evaluation measures, and performance results. Results and conclusions: Many challenges and problems are presented. Consequently, more effort is needed to obtain more accurate prediction models for the three basic types of glycosylation sites.
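
To make the surveyed workflow concrete, the following is a generic, hedged sketch of the window-based pipeline that many of the compared studies follow: encode a fixed residue window around each candidate site, train a classifier, and report standard evaluation measures. The one-hot encoding, window handling, and SVM classifier here are illustrative choices, not any specific study's method.

```python
# Generic window-based glycosylation site prediction pipeline (illustrative only).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import matthews_corrcoef, roc_auc_score

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot_window(window: str) -> np.ndarray:
    """Flatten a one-hot encoding of a fixed-length residue window."""
    vec = np.zeros((len(window), len(AMINO_ACIDS)))
    for i, aa in enumerate(window):
        if aa in AMINO_ACIDS:
            vec[i, AMINO_ACIDS.index(aa)] = 1.0
    return vec.ravel()

def evaluate(windows, labels):
    """Train an SVM on encoded windows and report MCC and AUC on a held-out split."""
    X = np.array([one_hot_window(w) for w in windows])
    y = np.array(labels)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
    clf = SVC(probability=True).fit(X_tr, y_tr)
    prob = clf.predict_proba(X_te)[:, 1]
    return {"MCC": matthews_corrcoef(y_te, prob > 0.5), "AUC": roc_auc_score(y_te, prob)}
```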


2021 ◽  
Vol 83 (6) ◽  
pp. 19-33
Author(s):  
Mohd Rizman Sultan Mohd ◽  
Fazlina Ahmat Ruslan ◽  
Juliana Johari

Solar radiation mapping relies on geographical and meteorological data. To obtain such data, a Geographic Information System (GIS) is required. A GIS is an integrated geographic resource that presents data in terms of spatial information. These data are important for neural networks, as they are used as input parameters in the development of solar radiation prediction models. Solar radiation prediction is one way to map solar irradiance in places where there are insufficient resources or space to build a complete solar radiation measurement station. Since solar radiation prediction requires meteorological and geographical data, this paper gives an overview of GIS-assisted geographical and meteorological data to be used as input parameters for solar radiation mapping, which will eventually feed neural network prediction models developed for the whole of Malaysia. Based on the results, the developed prediction model achieved a coefficient of determination (R²) of 0.9329.
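
A minimal sketch of such a model is shown below: a small neural-network regressor mapping GIS-derived geographical and meteorological inputs to solar radiation and scored by the coefficient of determination. The feature list and network size are assumptions for illustration, not the paper's configuration.

```python
# Neural-network regression of solar radiation from GIS-derived inputs (illustrative).
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def fit_radiation_model(X: np.ndarray, y: np.ndarray) -> float:
    """X: e.g. latitude, longitude, altitude, temperature, humidity, sunshine hours.
    y: measured solar radiation at the corresponding stations.
    Returns the R^2 (coefficient of determination) on a held-out split."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
    model.fit(X_tr, y_tr)
    return r2_score(y_te, model.predict(X_te))
```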


Author(s):  
László Róbert Kolozsvári ◽  
Tamás Bérczes ◽  
András Hajdu ◽  
Rudolf Gesztelyi ◽  
Attila Tiba ◽  
...  

Abstract Objectives: The current form of severe acute respiratory syndrome, called coronavirus disease 2019 (COVID-19) and caused by a coronavirus (SARS-CoV-2), is a major global health problem. The aim of our study was to use the official epidemiological data to predict the possible outcomes of the COVID-19 pandemic using artificial intelligence (AI)-based recurrent neural networks (RNNs), and then to compare and validate the predicted and observed data. Materials and Methods: We used the publicly available datasets of the World Health Organization and Johns Hopkins University to create the training dataset, then used recurrent neural networks with gated recurrent units (long short-term memory, LSTM, units) to create two prediction models. Information collected in the first t time-steps was aggregated with a fully connected (dense) neural network layer and a consequent regression output layer to determine the next predicted value. We used the root mean squared logarithmic error (RMSLE) to compare the predicted and observed data, then recalculated the predictions. Results: Our results underscore that the COVID-19 pandemic is probably a propagated-source epidemic; therefore, repeated peaks on the epidemic curve (rises in the daily number of newly diagnosed infections) are to be anticipated. The errors between the predicted and validated data and trends seem to be low. Conclusions: The influence of this pandemic is great worldwide and impacts our everyday lives. Decision makers in particular must be aware that even if strict public health measures are executed and sustained, future peaks of infections are possible. The AI-based models might be useful prediction tools, and they can be recalculated according to newly observed data to obtain a more precise forecast of the pandemic.
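
As a hedged sketch of the architecture described (LSTM units over the first t time-steps, a dense aggregation layer, and a regression output), the Keras snippet below builds such a model and defines RMSLE for comparing predicted and observed case counts. The look-back window and layer sizes are assumptions, not the authors' settings.

```python
# LSTM-based next-value predictor for daily case counts, plus an RMSLE helper
# (illustrative architecture; window length and widths are assumptions).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Sequential

T_STEPS = 14  # assumed look-back window of daily new-case counts

model = Sequential([
    layers.Input(shape=(T_STEPS, 1)),
    layers.LSTM(32),                      # gated recurrent (LSTM) units
    layers.Dense(16, activation="relu"),  # fully connected aggregation layer
    layers.Dense(1),                      # regression output: next predicted value
])
model.compile(optimizer="adam", loss="mse")

def rmsle(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared logarithmic error between observed and predicted counts."""
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))
```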


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Lin Zhang ◽  
Gangshen Li ◽  
Xiuyu Li ◽  
Honglei Wang ◽  
Shutao Chen ◽  
...  

Abstract Background As a common and abundant RNA methylation modification, N6-methyladenosine (m6A) is widespread in the transcriptomes of various species and is closely related to the occurrence and development of various life processes and diseases. Thus, accurate identification of m6A methylation sites has become a hot topic. Most biological methods rely on high-throughput sequencing technology, which places great demands on sequencing library preparation and data analysis. Various machine learning methods have therefore been proposed that extract different types of sequence-based features and then apply conventional classifiers, such as SVM and RF, for m6A methylation site identification. However, the identification performance relies heavily on the extracted features, which still need to be improved. Results This paper studies feature extraction and classification of m6A methylation sites in a natural language processing way, which integrates feature extraction and classification while taking the upstream and downstream information of m6A sites into account. One-hot encoding, RNA word embedding, and Word2vec are adopted to depict sites from the perspective of the base as well as its upstream and downstream sequence. A BiLSTM model, a well-known sequence model, was then constructed to discriminate sequences with potential m6A sites. Since the three feature extraction methods above focus on different perspectives of m6A sites, an ensemble deep learning predictor (EDLm6APred) was finally constructed for m6A site prediction. Experimental results on human and mouse data sets show that EDLm6APred outperforms each of the single models, indicating that base, upstream, and downstream information are all essential for m6A site detection. Compared with existing m6A methylation site prediction models without genomic features, EDLm6APred obtains an area under the receiver operating characteristic curve of 86.6% on the human data sets, indicating the effectiveness of sequential modeling of RNA. To maximize user convenience, a webserver implementing EDLm6APred was developed and made publicly available at www.xjtlu.edu.cn/biologicalsciences/EDLm6APred. Conclusions Our proposed EDLm6APred method is a reliable predictor for m6A methylation sites.
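
The sketch below shows one branch of such a predictor: an embedding over RNA "words" (k-mers) followed by a BiLSTM and a sigmoid output. Vocabulary size, window length, and layer widths are illustrative assumptions, not the published EDLm6APred settings; an ensemble would average the probabilities of branches built on one-hot, RNA word embedding, and Word2vec inputs.

```python
# One BiLSTM branch of an EDLm6APred-style m6A site predictor (illustrative).
import tensorflow as tf
from tensorflow.keras import layers, Sequential

SEQ_LEN = 41     # assumed length of the RNA window centred on a candidate adenosine
VOCAB_SIZE = 65  # assumed 3-mer vocabulary (4^3 tokens + padding)

branch = Sequential([
    layers.Input(shape=(SEQ_LEN,)),
    layers.Embedding(input_dim=VOCAB_SIZE, output_dim=32),  # RNA word embedding
    layers.Bidirectional(layers.LSTM(32)),                  # BiLSTM over the window
    layers.Dense(1, activation="sigmoid"),                   # P(m6A site)
])
branch.compile(optimizer="adam", loss="binary_crossentropy",
               metrics=[tf.keras.metrics.AUC()])
```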


2021 ◽  
Author(s):  
Hüseyin Cüce ◽  
Ozge Cagcag Yolcu ◽  
Fulya AYDIN TEMEL

Abstract This study aims to evaluate the COD removal performance of classical Fenton and photo-Fenton processes applied to cosmetic wastewater using different prediction models. Besides response surface methodology (RSM), three neural networks were used to more reliably and effectively predict the behavior of the dependent variable at different values of the relevant parameters: a multi-layer perceptron trained by Levenberg-Marquardt (MLP-LM), and a multi-layer perceptron and a single multiplicative neuron model trained by the particle swarm optimization algorithm (MLP-PSO and SMN-PSO). H2O2 doses, Fe(II) doses, and H2O2/Fe(II) ratios were the independent variables of the prediction models used to optimize both processes in batch reactors. The predictions generated for the whole data set were compared with each other, and the prediction performance of the models was evaluated using the RMSE and MAPE error criteria. Regression analysis was also applied to determine the performance of the best model. The results obtained from all prediction tools showed that the model producing the best predictive results in almost all cases, in terms of both criteria, was the SMN-PSO model. In addition, a genetic algorithm was applied to the SMN-PSO model results to find the optimum values for the study. Thus, the optimum parameter values for maximum removal ratios can be determined without the need to perform many different experiments.
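
For readers unfamiliar with the SMN-PSO idea, the following is a minimal sketch of a single multiplicative neuron whose weights are tuned by a basic particle swarm optimizer minimising RMSE. The PSO coefficients, sigmoid output, and assumption that the target (e.g., COD removal ratio) is scaled to [0, 1] are all illustrative, not the study's configuration.

```python
# Single multiplicative neuron (SMN) model trained by a minimal PSO (illustrative).
import numpy as np

def smn_predict(params: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Single multiplicative neuron: sigmoid of prod_i (w_i * x_i + b_i).
    Assumes the target is scaled to [0, 1]."""
    n = X.shape[1]
    w, b = params[:n], params[n:2 * n]
    net = np.prod(w * X + b, axis=1)
    return 1.0 / (1.0 + np.exp(-net))

def pso_train(X: np.ndarray, y: np.ndarray, n_particles: int = 30,
              iters: int = 200, seed: int = 0) -> np.ndarray:
    """Generic particle swarm search over SMN parameters minimising RMSE."""
    rng = np.random.default_rng(seed)
    dim = 2 * X.shape[1]
    pos = rng.uniform(-1, 1, (n_particles, dim))
    vel = np.zeros_like(pos)
    rmse = lambda p: np.sqrt(np.mean((smn_predict(p, X) - y) ** 2))
    pbest, pbest_err = pos.copy(), np.array([rmse(p) for p in pos])
    gbest = pbest[pbest_err.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = pos + vel
        err = np.array([rmse(p) for p in pos])
        improved = err < pbest_err
        pbest[improved], pbest_err[improved] = pos[improved], err[improved]
        gbest = pbest[pbest_err.argmin()].copy()
    return gbest
```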


2012 ◽  
Vol 3 (2) ◽  
pp. 48-50
Author(s):  
Ana Isabel Velasco Fernández ◽  
Ricardo José Rejas Muslera ◽  
Juan Padilla Fernández-Vega ◽  
María Isabel Cepeda González

2021 ◽  
Vol 2 (3) ◽  
Author(s):  
Gustaf Halvardsson ◽  
Johanna Peterson ◽  
César Soto-Valero ◽  
Benoit Baudry

Abstract The automatic interpretation of sign languages is a challenging task, as it requires the use of high-level vision and high-level motion processing systems to provide accurate image perception. In this paper, we use convolutional neural networks (CNNs) and transfer learning to enable computers to interpret signs of the Swedish Sign Language (SSL) hand alphabet. Our model consists of a pre-trained InceptionV3 network combined with the mini-batch gradient descent optimization algorithm. We rely on transfer learning during the pre-training of the model and its data. The final accuracy of the model, based on 8 study subjects and 9400 images, is 85%. Our results indicate that the use of CNNs is a promising approach to interpreting sign languages, and that transfer learning can be used to achieve high testing accuracy despite a small training dataset. Furthermore, we describe the implementation details of our model as a user-friendly web application for interpreting signs.
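
As a hedged illustration of the transfer-learning setup described, the snippet below freezes a pre-trained InceptionV3 backbone and adds a small classification head for hand-alphabet signs. The class count, image size, and optimizer settings are assumptions, not the authors' exact configuration.

```python
# Transfer learning with a frozen InceptionV3 backbone (illustrative settings).
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import InceptionV3

NUM_CLASSES = 26  # assumed number of hand-alphabet signs

base = InceptionV3(weights="imagenet", include_top=False, input_shape=(299, 299, 3))
base.trainable = False  # reuse the pre-trained convolutional features as-is

x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dense(128, activation="relu")(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = Model(base.input, outputs)
# Plain SGD performs mini-batch gradient descent when fit() is given a batch size.
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, batch_size=32, epochs=10)
```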


Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 443
Author(s):  
Chyan-long Jan

Because of financial information asymmetry, stakeholders usually do not know a company's real financial condition until financial distress occurs. Financial distress not only influences a company's operational sustainability and damages the rights and interests of its stakeholders, it may also harm the national economy and society; hence, it is very important to build high-accuracy financial distress prediction models. The purpose of this study is to build high-accuracy and effective financial distress prediction models using two representative deep learning algorithms: deep neural networks (DNN) and convolutional neural networks (CNN). In addition, important variables are selected by the chi-squared automatic interaction detector (CHAID). In this study, the data of Taiwan's listed and OTC sample companies are taken from the Taiwan Economic Journal (TEJ) database for the period from 2000 to 2019, covering 86 companies in financial distress and 258 not in financial distress, for a total of 344 companies. According to the empirical results, with the important variables selected by CHAID and modeling by CNN, the CHAID-CNN model has the highest financial distress prediction accuracy rate of 94.23% and the lowest type I and type II error rates, which are 0.96% and 4.81%, respectively.
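
The sketch below illustrates the general "select important variables, then classify with a CNN" shape of this pipeline. Note that CHAID itself is not available in scikit-learn, so a plain chi-squared SelectKBest step stands in for it here; this is a labelled substitution, not the paper's CHAID-CNN model.

```python
# Variable selection followed by a 1D CNN over the selected financial ratios
# (chi2 SelectKBest is an explicit stand-in for CHAID; all sizes are illustrative).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Sequential
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

def build_pipeline(X: np.ndarray, y: np.ndarray, k: int = 10):
    # chi2 requires non-negative features, hence the min-max scaling
    X_scaled = MinMaxScaler().fit_transform(X)
    selector = SelectKBest(chi2, k=k).fit(X_scaled, y)
    X_sel = selector.transform(X_scaled)[..., np.newaxis]  # shape (samples, k, 1)

    cnn = Sequential([
        layers.Input(shape=(k, 1)),
        layers.Conv1D(16, kernel_size=3, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dense(8, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # P(financial distress)
    ])
    cnn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return selector, cnn, X_sel
```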


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Niraj Thapa ◽  
Meenal Chaudhari ◽  
Anthony A. Iannetta ◽  
Clarence White ◽  
Kaushik Roy ◽  
...  

Abstract Protein phosphorylation, one of the most important post-translational modifications (PTMs), is involved in regulating myriad cellular processes. Herein, we present a novel deep learning based approach for organism-specific protein phosphorylation site prediction in Chlamydomonas reinhardtii, a model algal phototroph. An ensemble model combining convolutional neural networks and long short-term memory (LSTM), named Chlamy-EnPhosSite, achieves the best performance in predicting phosphorylation sites in C. reinhardtii: its best AUC and MCC are 0.90 and 0.64, respectively, on a combined dataset of serine (S) and threonine (T) sites in independent testing, higher than those measures for other predictors. When applied to the entire C. reinhardtii proteome (totaling 1,809,304 S and T sites), Chlamy-EnPhosSite yielded 499,411 phosphorylated sites at a cut-off value of 0.5 and 237,949 phosphorylated sites at a cut-off value of 0.7. These predictions were compared to an experimental dataset of phosphosites identified by liquid chromatography-tandem mass spectrometry (LC–MS/MS) in a blinded study: approximately 89.69% of 2,663 C. reinhardtii S and T phosphorylation sites were successfully predicted by Chlamy-EnPhosSite at a probability cut-off of 0.5, and 76.83% of sites were successfully identified at the more stringent 0.7 cut-off. Interestingly, Chlamy-EnPhosSite also successfully predicted experimentally confirmed phosphorylation sites in a protein sequence (e.g., RPS6 S245) that did not appear in the training dataset, highlighting its prediction accuracy and the power of leveraging predictions to identify biologically relevant PTM sites. These results demonstrate that our method represents a robust and complementary technique for high-throughput phosphorylation site prediction in C. reinhardtii and has the potential to serve as a useful tool for the community. Chlamy-EnPhosSite will contribute to understanding how protein phosphorylation influences various biological processes in this important model microalga.
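
In the spirit of the ensemble-plus-cut-off strategy described above, the short sketch below averages the probabilities of a CNN member and an LSTM member and thresholds them at a chosen cut-off. The member models and cut-off values are stand-ins for illustration, not the trained Chlamy-EnPhosSite models.

```python
# Average member probabilities and call sites above a probability cut-off (illustrative).
import numpy as np

def ensemble_predict(models, X: np.ndarray, cutoff: float = 0.5) -> np.ndarray:
    """Average Keras member probabilities; label a site phosphorylated above the cut-off."""
    probs = np.mean([m.predict(X, verbose=0).ravel() for m in models], axis=0)
    return (probs >= cutoff).astype(int)

# Hypothetical usage with trained members:
# sites_05 = ensemble_predict([cnn_model, lstm_model], windows, cutoff=0.5)
# sites_07 = ensemble_predict([cnn_model, lstm_model], windows, cutoff=0.7)
```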


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Dipendra Jha ◽  
Vishu Gupta ◽  
Logan Ward ◽  
Zijiang Yang ◽  
Christopher Wolverton ◽  
...  

Abstract The application of machine learning (ML) techniques in materials science has attracted significant attention in recent years, due to their impressive ability to efficiently extract data-driven linkages from various input materials representations to their output properties. While the application of traditional ML techniques has become quite ubiquitous, there have been limited applications of more advanced deep learning (DL) techniques, primarily because big materials datasets are relatively rare. Given the demonstrated potential and advantages of DL and the increasing availability of big materials datasets, it is attractive to go for deeper neural networks in a bid to boost model performance, but in reality this leads to performance degradation due to the vanishing gradient problem. In this paper, we address the question of how to enable deeper learning for cases where big materials data are available. Here, we present a general deep learning framework based on individual residual learning (IRNet), composed of very deep neural networks that can work with any vector-based materials representation as input to build accurate property prediction models. We find that the proposed IRNet models can not only successfully alleviate the vanishing gradient problem and enable deeper learning, but also lead to significantly (up to 47%) better model accuracy as compared to plain deep neural networks and traditional ML techniques for a given input materials representation in the presence of big data.
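
As a hedged sketch of the individual-residual idea for fully connected networks, the snippet below gives each dense layer its own shortcut connection so gradients can bypass it, allowing much deeper stacks. The depth, layer width, and regression head are illustrative assumptions, not the published IRNet architecture.

```python
# Very deep fully connected regressor with per-layer residual shortcuts (illustrative).
import tensorflow as tf
from tensorflow.keras import layers, Model

def residual_dense_block(x, units: int):
    """Dense layer with an individual identity (or projection) shortcut."""
    shortcut = x
    y = layers.Dense(units)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    if shortcut.shape[-1] != units:          # project when widths differ
        shortcut = layers.Dense(units)(shortcut)
    return layers.Add()([shortcut, y])

def build_irnet_like(input_dim: int, depth: int = 17, units: int = 256) -> Model:
    """Stack many individually-residual dense blocks over a vector-based representation."""
    inputs = layers.Input(shape=(input_dim,))
    x = inputs
    for _ in range(depth):
        x = residual_dense_block(x, units)
    outputs = layers.Dense(1)(x)             # regression of a material property
    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mae")
    return model
```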

