4mCPred-CNN: Prediction of DNA N4-Methylcytosine in the Mouse Genome Using a Convolutional Neural Network

Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 296
Author(s):  
Zeeshan Abbas ◽  
Hilal Tayara ◽  
Kil To Chong

Among DNA modifications, N4-methylcytosine (4mC) is one of the most significant, and it is linked to cell proliferation and gene expression. To understand its different biological functions, accurate detection of 4mC sites is required. Although several techniques exist for predicting 4mC sites in different genomes, based on both machine learning (ML) and convolutional neural networks (CNNs), there is no CNN-based tool for identifying 4mC sites in the mouse genome. In this article, a CNN-based model named 4mCPred-CNN was developed to classify 4mC locations in the mouse genome. Until now, only two ML-based models were available for this purpose; they used several feature encoding schemes and still left considerable room to improve prediction accuracy. Using only a single feature encoding scheme, one-hot encoding, we outperformed both of the previous ML-based techniques. In a ten-fold validation test, the proposed model, 4mCPred-CNN, achieved an accuracy of 85.71% and a Matthews correlation coefficient (MCC) of 0.717. On an independent dataset, the achieved accuracy was 87.50% with an MCC value of 0.750. These results indicate that the proposed model can be of great use for researchers in the fields of biology and bioinformatics.
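
As a rough illustration of the one-hot encoding mentioned in this abstract, the sketch below maps each base of a DNA sequence to a four-element indicator row. The function name `one_hot_encode`, the A/C/G/T column order, and the all-zero treatment of unknown symbols are assumptions made for illustration; the abstract does not describe the authors' exact preprocessing.

```python
# Assumed column order; the abstract does not specify one.
BASES = "ACGT"


def one_hot_encode(sequence):
    """Return one row per base, with a single 1 in the column of that base."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = []
    for base in sequence.upper():
        row = [0] * len(BASES)
        if base in index:  # unknown symbols such as N stay all-zero (assumption)
            row[index[base]] = 1
        encoded.append(row)
    return encoded


if __name__ == "__main__":
    for row in one_hot_encode("ACGTN"):
        print(row)
```

A matrix of this shape (sequence length by four) is the kind of input a 1D convolutional layer would typically scan along the sequence axis.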

Author(s):  
Dan Zhang ◽  
Zhao-Chun Xu ◽  
Wei Su ◽  
Yu-He Yang ◽  
Hao Lv ◽  
...  

Motivation: Protein carbonylation is one of the most important oxidative stress-induced post-translational modifications, and is generally characterized by stability, irreversibility and relatively early formation. It plays a significant role in orchestrating various biological processes and has already been shown to be related to many diseases. However, the experimental technologies for identifying carbonylation sites are not only costly and time consuming, but also incapable of processing a large number of proteins at a time. Thus, rapidly and effectively identifying carbonylation sites by computational methods will provide key clues for analysing the occurrence and development of diseases. Results: In this study, we developed a predictor called iCarPS to identify carbonylation sites based on sequence information. A novel feature encoding scheme, called residues conical coordinates combined with their physicochemical properties, was proposed to formulate carbonylated and non-carbonylated protein samples. To remove potentially redundant features and improve the prediction performance, a feature selection technique was used. The accuracy and robustness of iCarPS were demonstrated by experiments on training and independent datasets. Comparison with other published methods showed that the proposed method delivers strong performance for carbonylation site identification. Availability and implementation: Based on the proposed model, a user-friendly webserver and a software package were constructed, which can be freely accessed at http://lin-group.cn/server/iCarPS. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 27 ◽  
Author(s):  
Zaheer Ullah Khan ◽  
Dechang Pi

Background: S-sulfenylation (S-sulphenylation, or sulfenic acid) of proteins is a special kind of post-translational modification that plays an important role in various physiological and pathological processes such as cytokine signaling, transcriptional regulation, and apoptosis. Given this significance, and to complement existing wet-lab methods, several computational models have been developed for predicting sulfenylation cysteine sites. However, the performance of these models was not satisfactory due to inefficient feature schemes, severe imbalance issues, and the lack of an intelligent learning engine. Objective: In this study, our motivation is to establish a strong and novel computational predictor for discriminating sulfenylation from non-sulfenylation sites. Methods: We report an innovative bioinformatics feature encoding tool, named DeepSSPred, in which the encoded features are obtained via an n-segmented hybrid feature, and the resampling technique called synthetic minority oversampling is then employed to cope with the severe imbalance between SC-sites (minority class) and non-SC sites (majority class). A state-of-the-art 2D convolutional neural network was employed, with a rigorous 10-fold jackknife cross-validation technique for model validation and authentication. Results: Following the proposed framework, a strong discrete representation of the feature space, the machine learning engine, and an unbiased presentation of the underlying training data yielded an excellent model that outperforms all existing established studies. The proposed approach is 6% higher in MCC than the best existing method. For the independent dataset, the best existing study failed to provide sufficient details. In comparison with the second-best method, the model obtained an increase of 7.5% in accuracy, 1.22% in Sn, 12.91% in Sp and 13.12% in MCC on the training data, and 12.13% in ACC, 27.25% in Sn, 2.25% in Sp, and 30.37% in MCC on an independent dataset. These empirical analyses show the superior performance of the proposed model on both the training and independent datasets in comparison with existing studies. Conclusion: In this research, we have developed a novel sequence-based automated predictor for SC-sites, called DeepSSPred. The empirical simulation outcomes on a training dataset and an independent validation dataset have revealed the efficacy of the proposed model. The good performance of DeepSSPred is due to several factors, such as novel discriminative feature encoding schemes, the SMOTE technique, and careful construction of the prediction model through the tuned 2D-CNN classifier. We believe that our research will provide insight into further prediction of S-sulfenylation characteristics and functionalities. Thus, we hope that our predictor will be significantly helpful for large-scale discrimination of unknown SC-sites in particular and for designing new pharmaceutical drugs in general.
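
To make the synthetic minority oversampling idea in this abstract more concrete, here is a minimal pure-Python sketch of the core interpolation step: each synthetic sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. The function name `smote_like_oversample`, the default `k`, and the toy feature vectors are hypothetical; this is not the DeepSSPred pipeline, and production work would more likely rely on an established library implementation.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric feature vectors (at least two).
    k: number of nearest minority neighbours to choose from (assumed default).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote_like_oversample(toy_minority, n_new=3):
        print(point)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by the minority class, which is why the technique is usually applied to the training folds only.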


Animals ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 104
Author(s):  
Shulin Liang ◽  
Chaoqun Wu ◽  
Wenchao Peng ◽  
Jian-Xin Liu ◽  
Hui-Zeng Sun

The objective of this study was to evaluate the feasibility of using the dry matter intake of the first 2 h after feeding (DMI-2h), body weight (BW), and milk yield to estimate daily DMI in mid- and late-lactation dairy cows fed a ration three times per day. Our dataset included 2840 individual observations from 76 cows enrolled in two studies, of which 2259 observations from 54 cows served as the development dataset (DDS) and 581 observations from 22 cows served as the validation dataset (VDS). The descriptive statistics of these variables were 26.0 ± 2.77 kg/day (mean ± standard deviation) of DMI, 14.9 ± 3.68 kg/day of DMI-2h, 35.0 ± 5.48 kg/day of milk yield, and 636 ± 82.6 kg of BW in DDS, and 23.2 ± 4.72 kg/day of DMI, 12.6 ± 4.08 kg/day of DMI-2h, 30.4 ± 5.85 kg/day of milk yield, and 597 ± 63.7 kg of BW in VDS. A multiple regression analysis was conducted using the REG procedure of SAS to develop the forecasting models for DMI. The proposed prediction equation was: DMI (kg/day) = 8.499 + 0.2725 × DMI-2h (kg/day) + 0.2132 × Milk yield (kg/day) + 0.0095 × BW (kg) (R² = 0.46, mean bias = 0 kg/day, RMSPE = 1.26 kg/day). Moreover, when compared with the prediction equation for DMI in Nutrient Requirements of Dairy Cattle (2001) using the independent dataset (VDS), our proposed model shows a higher R² (0.22 vs. 0.07) and a smaller mean bias (−0.10 vs. 1.52 kg/day) and RMSPE (1.77 vs. 2.34 kg/day). Overall, we constructed a feasible forecasting model with better precision and accuracy in predicting the daily DMI of dairy cows in mid and late lactation when fed a ration three times per day.
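
The prediction equation reported in this abstract can be written as a small function. The coefficients are taken directly from the abstract; the function and parameter names are hypothetical, and body weight is assumed here to be expressed in kg.

```python
def predict_daily_dmi(dmi_2h, milk_yield, body_weight):
    """Estimate daily dry matter intake (kg/day) from the abstract's equation.

    dmi_2h: dry matter intake in the first 2 h after feeding (kg/day)
    milk_yield: milk yield (kg/day)
    body_weight: body weight (assumed to be in kg)
    """
    return 8.499 + 0.2725 * dmi_2h + 0.2132 * milk_yield + 0.0095 * body_weight


if __name__ == "__main__":
    # Development-dataset means reported in the abstract.
    print(round(predict_daily_dmi(14.9, 35.0, 636.0), 2))
```

Evaluated at the development-dataset means, the equation gives about 26.06 kg/day, in line with the reported mean DMI of 26.0 kg/day, as expected for a least-squares fit with an intercept.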


Author(s):  
Huimin Lu ◽  
Rui Yang ◽  
Zhenrong Deng ◽  
Yonglin Zhang ◽  
Guangwei Gao ◽  
...  

Chinese image description generation tasks usually face several challenges, such as single-feature extraction, lack of global information, and lack of detailed description of the image content. To address these limitations, we propose a fuzzy attention-based DenseNet-BiLSTM Chinese image captioning method in this article. In the proposed method, we first improve the densely connected network to extract image features at different scales and to enhance the model’s ability to capture weak features. At the same time, a bidirectional LSTM is used as the decoder to make better use of context information. The introduction of an improved fuzzy attention mechanism effectively improves the correspondence between image features and contextual information. We conduct experiments on the AI Challenger dataset to evaluate the performance of the model. The results show that, compared with other models, our proposed model achieves higher scores on objective quantitative evaluation indicators, including BLEU variants, METEOR, ROUGE-L, and CIDEr. The generated description sentences can accurately express the image content.


Author(s):  
Shiqian He ◽  
Liang Kong ◽  
Jing Chen

Accurate detection of N6-methyladenine (6mA) sites by biochemical experiments will help to reveal their biological functions; however, these wet experiments are laborious and expensive. Therefore, it is necessary to introduce a powerful computational model to identify 6mA sites on a genomic scale, especially for plant genomes. In view of this, we propose a model called iDNA6mA-Rice-DL for the effective identification of 6mA sites in the rice genome; it is an intelligent computing model based on deep learning. Traditional machine learning methods require features to be prepared in advance for analysis. In contrast, our proposed model automatically encodes and extracts key DNA features through an embedded layer and several groups of dense layers. We use an independent dataset to evaluate the generalization ability of our model. An area under the receiver operating characteristic curve (auROC) of 0.98 with an accuracy of 95.96% was obtained. The experimental results demonstrate that our model performs well in predicting 6mA sites in the rice genome. A user-friendly local web server has been established. The Docker image of the local web server can be freely downloaded at https://hub.docker.com/r/his1server/idna6ma-rice-dl .


2020 ◽  
Vol 15 (5) ◽  
pp. 396-407 ◽  
Author(s):  
Saba Amanat ◽  
Adeel Ashraf ◽  
Waqar Hussain ◽  
Nouman Rasool ◽  
Yaser D. Khan

Background: Carboxylation is one of the most biologically important post-translational modifications and occurs on lysine, arginine, and glutamine residues of a protein. Among these three, the covalent attachment of the carboxyl group to the lysine side chain is the most frequent and biologically important type of carboxylation. For studying such biological functions, it is essential to correctly determine the lysine sites sensitive to carboxylation. Objective: Herein, we present a machine learning-based computational model for the prediction of carboxylysine sites. Methods: Various position- and composition-relative features have been incorporated into the PseAAC for the construction of feature vectors, and a neural network is employed as a classifier. The model is validated by jackknife, cross-validation, self-consistency, and independent testing. Results: The results of the self-consistency test showed that the model has 99.76% Acc, 99.76% Sp, 99.76% Sp, and 0.99 MCC. Using the jackknife method, prediction model validation gave 97.07% Acc, while 10-fold cross-validation gave 95.16% Acc. Conclusion: The result of independent dataset testing was 94.3%, which illustrates that the proposed model performs better than the existing model PreLysCar; however, the accuracy can be improved further in the future as the number of known carboxylysine sites in proteins increases.
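
Several abstracts on this page report accuracy (Acc), sensitivity (Sn), specificity (Sp), and the Matthews correlation coefficient (MCC). The sketch below shows how these four metrics follow from the counts of a binary confusion matrix; the function name and the example counts are hypothetical and are not taken from any of the studies listed here.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute Acc, Sn, Sp and MCC from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    sn = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0  # specificity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(classification_metrics(tp=45, tn=40, fp=10, fn=5))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often preferred on imbalanced datasets.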


Information ◽  
2020 ◽  
Vol 11 (6) ◽  
pp. 312 ◽  
Author(s):  
Asma Baccouche ◽  
Sadaf Ahmed ◽  
Daniel Sierra-Sosa ◽  
Adel Elmaghraby

Identifying internet spam has been a challenging problem for decades. Several solutions have succeeded in detecting spam comments in social media or fraudulent emails. However, an adequate strategy for filtering messages is difficult to achieve, as these messages resemble real communications. From the Natural Language Processing (NLP) perspective, Deep Learning models are a good alternative for classifying text once it has been preprocessed. In particular, Long Short-Term Memory (LSTM) networks are among the models that perform well on binary and multi-label text classification problems. In this paper, an approach merging two different data sources, one intended for Spam in social media posts and the other for Fraud classification in emails, is presented. We designed a multi-label LSTM model and trained it on the joint datasets, including text with common bigrams extracted from each independent dataset. The experimental results show that our proposed model is capable of identifying malicious text regardless of the source. The LSTM model trained on the merged dataset outperforms the models trained independently on each dataset.


PLoS ONE ◽  
2012 ◽  
Vol 7 (6) ◽  
pp. e38772 ◽  
Author(s):  
Shao-Ping Shi ◽  
Jian-Ding Qiu ◽  
Xing-Yu Sun ◽  
Sheng-Bao Suo ◽  
Shu-Yun Huang ◽  
...  

2021 ◽  
Vol 11 (20) ◽  
pp. 9578
Author(s):  
Andrew Parker ◽  
Steven Fenton

Objective measurement of perceptually motivated music attributes has application in both target-driven mixing and mastering methodologies and music information retrieval. This work proposes a perceptual model of mix clarity which decomposes a mixed input signal into transient, steady-state, and residual components. Masking thresholds are calculated for each component and their relative relationship is used to determine an overall masking score as the model’s output. Three variants of the model were tested against subjective mix clarity scores gathered from a controlled listening test. The best performing variant achieved a Spearman’s rank correlation of rho = 0.8382 (p < 0.01). Furthermore, the model output was analysed using an independent dataset generated by progressively applying degradation effects to the test stimuli. Analysis of the model suggested a close relationship between the proposed model and the subjective mix clarity scores particularly when masking was measured using linearly spaced analysis bands. Moreover, the presence of noise-like residual signals was shown to have a negative effect on the perceived mix clarity.
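
The Spearman rank correlation quoted in this abstract measures how well two score lists agree in ordering. A minimal pure-Python sketch is given below, computed as the Pearson correlation of average ranks so that ties are handled; the function names and the sample scores are hypothetical and unrelated to the study's data.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    model_scores = [0.2, 0.5, 0.4, 0.9]     # made-up model outputs
    listener_scores = [1.0, 3.0, 2.5, 4.5]  # made-up subjective ratings
    print(spearman_rho(model_scores, listener_scores))
```

Because only ranks are compared, the statistic rewards any monotonic relationship between model output and subjective scores, not just a linear one.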

