DEEPSMP: A deep learning model for predicting the ectodomain shedding events of membrane proteins

Membrane proteins play essential roles in modern medicine. In recent studies, some membrane proteins involved in ectodomain shedding events have been reported as the potential drug targets and biomarkers of some serious diseases. However, there are few effective tools for identifying the shedding event of membrane proteins. So, it is necessary to design an effective tool for predicting shedding event of membrane proteins. In this study, we design an end-to-end prediction model using deep neural networks with long short-term memory (LSTM) units and attention mechanism, to predict the ectodomain shedding events of membrane proteins only by sequence information. Firstly, the evolutional profiles are encoded from original sequences of these proteins by Position-Specific Iterated BLAST (PSI-BLAST) on Uniref50 database. Then, the LSTM units which contain memory cells are used to hold information from past inputs to the network and the attention mechanism is applied to detect sorting signals in proteins regardless of their position in the sequence. Finally, a fully connected dense layer and a softmax layer are used to obtain the final prediction results. Additionally, we also try to reduce overfitting of the model by using dropout, L2 regularization, and bagging ensemble learning in the model training process. In order to ensure the fairness of performance comparison, firstly we use cross validation process on training dataset obtained from an existing paper. The average accuracy and area under a receiver operating characteristic curve (AUC) of five-fold cross-validation are 81.19% and 0.835 using our proposed model, compared to 75% and 0.78 by a previously published tool, respectively. To better validate the performance of the proposed model, we also evaluate the performance of the proposed model on independent test dataset. The accuracy, sensitivity, and specificity are 83.14%, 84.08%, and 81.63% using our proposed model, compared to 70.20%, 71.97%, and 67.35% by the existing model. The experimental results validate that the proposed model can be regarded as a general tool for predicting ectodomain shedding events of membrane proteins. The pipeline of the model and prediction results can be accessed at the following URL: http://www.csbg-jlu.info/DeepSMP/ .

Download Full-text

Multi-Layer Attention Approach for Aspect based Sentiment Analysis

10.5121/csit.2020.101410 ◽

2020 ◽

Author(s):

Xinzhi Ai ◽

Xiaoge Li ◽

Feixiong Hu ◽

Shuting Zhi ◽

Likun Hu

Keyword(s):

Sentiment Analysis ◽

Semantic Information ◽

Short Term Memory ◽

Attention Mechanism ◽

Training Dataset ◽

Emotion Classification ◽

Data Set ◽

New Model ◽

Fine Grained ◽

Proposed Model

Based on the aspect-level sentiment analysis is typical of fine-grained emotional classification that assigns sentiment polarity for each of the aspects in a review. For better handle the emotion classification task, this paper put forward a new model which apply Long Short-Term Memory network combine multiple attention with aspect context. Where multiple attention mechanism (i.e., location attention, content attention and class attention) refers to takes the factors of context location, content semantics and class balancing into consideration. Therefore, the proposed model can adaptively integrate location and semantic information between the aspect targets and their contexts into sentimental features, and overcome the model data variance introduced by the imbalanced training dataset. In addition, the aspect context is encoded on both sides of the aspect target, so as to enhance the ability of the model to capture semantic information. The Multi-Attention mechanism (MATT) and Aspect Context (AC) allow our model to perform better when facing reviews with more complicated structures. The result of this experiment indicate that the accuracy of the new model is up to 80.6% and 75.1% for two datasets in SemEval-2014 Task 4 respectively, While the accuracy of the data set on twitter 71.1%, and 81.6% for the Chinese automotive-domain dataset. Compared with some previous models for sentiment analysis, our model shows a higher accuracy.

Download Full-text

Identification of Helicobacter pylori Membrane Proteins Using Sequence-Based Features

Computational and Mathematical Methods in Medicine ◽

10.1155/2022/7493834 ◽

2022 ◽

Vol 2022 ◽

pp. 1-7

Author(s):

Mujiexin Liu ◽

Hui Chen ◽

Dong Gao ◽

Cai-Yi Ma ◽

Zhao-Yue Zhang

Keyword(s):

Helicobacter Pylori ◽

Membrane Proteins ◽

Cross Validation ◽

Cost Effective ◽

Vital Role ◽

Support Vector ◽

Sequence Information ◽

Common Risk Factor ◽

Proposed Model ◽

H Pylori

Helicobacter pylori (H. pylori) is the most common risk factor for gastric cancer worldwide. The membrane proteins of the H. pylori are involved in bacterial adherence and play a vital role in the field of drug discovery. Thus, an accurate and cost-effective computational model is needed to predict the uncharacterized membrane proteins of H. pylori. In this study, a reliable benchmark dataset consisted of 114 membrane and 219 nonmembrane proteins was constructed based on UniProt. A support vector machine- (SVM-) based model was developed for discriminating H. pylori membrane proteins from nonmembrane proteins by using sequence information. Cross-validation showed that our method achieved good performance with an accuracy of 91.29%. It is anticipated that the proposed model will be useful for the annotation of H. pylori membrane proteins and the development of new anti-H. pylori agents.

Download Full-text

iRNAD: a computational tool for identifying D modification sites in RNA sequence

Bioinformatics ◽

10.1093/bioinformatics/btz358 ◽

2019 ◽

Vol 35 (23) ◽

pp. 4922-4929 ◽

Cited By ~ 31

Author(s):

Zhao-Chun Xu ◽

Peng-Mian Feng ◽

Hui Yang ◽

Wang-Ren Qiu ◽

Wei Chen ◽

...

Keyword(s):

Computational Models ◽

Operating Characteristic ◽

Cross Validation ◽

Characteristic Curve ◽

Support Vector ◽

Final Model ◽

Rna Sequence ◽

Functional Roles ◽

Proposed Model ◽

User Friendly

Abstract Motivation Dihydrouridine (D) is a common RNA post-transcriptional modification found in eukaryotes, bacteria and a few archaea. The modification can promote the conformational flexibility of individual nucleotide bases. And its levels are increased in cancerous tissues. Therefore, it is necessary to detect D in RNA for further understanding its functional roles. Since wet-experimental techniques for the aim are time-consuming and laborious, it is urgent to develop computational models to identify D modification sites in RNA. Results We constructed a predictor, called iRNAD, for identifying D modification sites in RNA sequence. In this predictor, the RNA samples derived from five species were encoded by nucleotide chemical property and nucleotide density. Support vector machine was utilized to perform the classification. The final model could produce the overall accuracy of 96.18% with the area under the receiver operating characteristic curve of 0.9839 in jackknife cross-validation test. Furthermore, we performed a series of validations from several aspects and demonstrated the robustness and reliability of the proposed model. Availability and implementation A user-friendly web-server called iRNAD can be freely accessible at http://lin-group.cn/server/iRNAD, which will provide convenience and guide to users for further studying D modification.

Download Full-text

An LSTM-Based Method with Attention Mechanism for Travel Time Prediction

Sensors ◽

10.3390/s19040861 ◽

2019 ◽

Vol 19 (4) ◽

pp. 861 ◽

Cited By ~ 21

Author(s):

Xiangdong Ran ◽

Zhiguang Shan ◽

Yufei Fang ◽

Chuang Lin

Keyword(s):

Short Term Memory ◽

Attention Mechanism ◽

Traffic Prediction ◽

Travel Time Prediction ◽

Short Term ◽

Term Memory ◽

Proposed Model ◽

Departure Time ◽

Long Short Term Memory

Traffic prediction is based on modeling the complex non-linear spatiotemporal traffic dynamics in road network. In recent years, Long Short-Term Memory has been applied to traffic prediction, achieving better performance. The existing Long Short-Term Memory methods for traffic prediction have two drawbacks: they do not use the departure time through the links for traffic prediction, and the way of modeling long-term dependence in time series is not direct in terms of traffic prediction. Attention mechanism is implemented by constructing a neural network according to its task and has recently demonstrated success in a wide range of tasks. In this paper, we propose an Long Short-Term Memory-based method with attention mechanism for travel time prediction. We present the proposed model in a tree structure. The proposed model substitutes a tree structure with attention mechanism for the unfold way of standard Long Short-Term Memory to construct the depth of Long Short-Term Memory and modeling long-term dependence. The attention mechanism is over the output layer of each Long Short-Term Memory unit. The departure time is used as the aspect of the attention mechanism and the attention mechanism integrates departure time into the proposed model. We use AdaGrad method for training the proposed model. Based on the datasets provided by Highways England, the experimental results show that the proposed model can achieve better accuracy than the Long Short-Term Memory and other baseline methods. The case study suggests that the departure time is effectively employed by using attention mechanism.

Download Full-text

Radiomics Analysis Based on Automatic Image Segmentation of DCE-MRI for Predicting Triple-Negative and Nontriple-Negative Breast Cancer

Computational and Mathematical Methods in Medicine ◽

10.1155/2021/2140465 ◽

2021 ◽

Vol 2021 ◽

pp. 1-7

Author(s):

Mingming Ma ◽

Liangyu Gan ◽

Yuan Jiang ◽

Naishan Qin ◽

Changxin Li ◽

...

Keyword(s):

Breast Cancer ◽

Image Segmentation ◽

Cross Validation ◽

Triple Negative ◽

Characteristic Curve ◽

Three Dimensional ◽

Training Dataset ◽

Breast Cancer Patients ◽

Dce Mri ◽

Model Based

Purpose. To investigate whether quantitative radiomics features extracted from dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) could be used to differentiate triple-negative breast cancer (TNBC) and nontriple-negative breast cancer (non-TNBC). Materials and Methods. This retrospective study included DCE-MRI images of 81 breast cancer patients (44 TNBC and 37 non-TNBC) from August 2018 to October 2019. The MR scans were achieved at a 1.5 T MR scanner. For each patient, the largest tumor mass was selected to analyze. Three-dimensional (3D) images of the regions of interest (ROIs) were automatically segmented on the third DCE phase by a deep learning segmentation model; then, the ROIs were checked and revised by 2 radiologists. DCE-MRI radiomics features were extracted from the 3D tumor volume. The patients were randomly divided into training ( N = 57 ) and test ( N = 24 ) cohorts. The machine learning classifier was built in the training dataset, and 5-fold cross-validation was performed on the training cohort to train and validate. The data of the test cohort were used to investigate the predictive power of the radiomics model in predicting TNBC and non-TNBC. The performance of the model was evaluated by the area under receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity. Results. The radiomics model based on 15 features got the best performance. The AUC achieved 0.741 for the cross-validation, and 0.867 for the independent testing cohort. Conclusion. The radiomics model based on automatic image segmentation of DCE-MRI can be used to distinguish TNBC and non-TNBC.

Download Full-text

Source Code Assessment and Classification Based on Estimated Error Probability Using Attentive LSTM Language Model and Its Application in Programming Education

Applied Sciences ◽

10.3390/app10082973 ◽

2020 ◽

Vol 10 (8) ◽

pp. 2973 ◽

Cited By ~ 2

Author(s):

Md. Mostafizer Rahman ◽

Yutaka Watanobe ◽

Keita Nakamura

Keyword(s):

Error Detection ◽

Error Probability ◽

Short Term Memory ◽

Language Model ◽

Source Code ◽

Attention Mechanism ◽

Error Assessment ◽

Programming Education ◽

Proposed Model ◽

Estimated Error

The rate of software development has increased dramatically. Conventional compilers cannot assess and detect all source code errors. Software may thus contain errors, negatively affecting end-users. It is also difficult to assess and detect source code logic errors using traditional compilers, resulting in software that contains errors. A method that utilizes artificial intelligence for assessing and detecting errors and classifying source code as correct (error-free) or incorrect is thus required. Here, we propose a sequential language model that uses an attention-mechanism-based long short-term memory (LSTM) neural network to assess and classify source code based on the estimated error probability. The attentive mechanism enhances the accuracy of the proposed language model for error assessment and classification. We trained the proposed model using correct source code and then evaluated its performance. The experimental results show that the proposed model has logic and syntax error detection accuracies of 92.2% and 94.8%, respectively, outperforming state-of-the-art models. We also applied the proposed model to the classification of source code with logic and syntax errors. The average precision, recall, and F-measure values for such classification are much better than those of benchmark models. To strengthen the proposed model, we combined the attention mechanism with LSTM to enhance the results of error assessment and detection as well as source code classification. Finally, our proposed model can be effective in programming education and software engineering by improving code writing, debugging, error-correction, and reasoning.

Download Full-text

Automatic Word Spacing of Korean Using Syllable and Morpheme

Applied Sciences ◽

10.3390/app11020626 ◽

2021 ◽

Vol 11 (2) ◽

pp. 626

Author(s):

Jeong-Myeong Choi ◽

Jong-Dae Kim ◽

Chan-Young Park ◽

Yu-Seop Kim

Keyword(s):

Language Processing ◽

Short Term Memory ◽

Sequence Information ◽

Morphological Pattern ◽

Word Level ◽

Proposed Model ◽

Correction Problem ◽

Long Short Term Memory ◽

N Gram ◽

Pattern Information

In Korean, spacing is very important to understand the readability and context of sentences. In addition, in the case of natural language processing for Korean, if a sentence with an incorrect spacing is used, the structure of the sentence is changed, which affects performance. In the previous study, spacing errors were corrected using n-gram based statistical methods and morphological analyzers, and recently many studies using deep learning have been conducted. In this study, we try to solve the spacing error correction problem using both the syllable-level and morpheme-level. The proposed model uses a structure that combines the convolutional neural network layer that can learn syllable and morphological pattern information in sentences and the bidirectional long short-term memory layer that can learn forward and backward sequence information. When evaluating the performance of the proposed model, the accuracy was evaluated at the syllable-level, and also precision, recall, and f1 score were evaluated at the word-level. As a result of the experiment, it was confirmed that performance was improved from the previous study.

Download Full-text

Automatic Sleep Staging Algorithm Based on Time Attention Mechanism

Frontiers in Human Neuroscience ◽

10.3389/fnhum.2021.692054 ◽

2021 ◽

Vol 15 ◽

Author(s):

Li-Xiao Feng ◽

Xin Li ◽

Hong-Yu Wang ◽

Wen-Yin Zheng ◽

Yong-Qing Zhang ◽

...

Keyword(s):

Cross Validation ◽

Conditional Random Field ◽

Recognition Rate ◽

Performance Comparison ◽

Attention Mechanism ◽

Sleep Stages ◽

Sleep Staging ◽

Time Frequency ◽

Proposed Model ◽

Fold Cross Validation

The most important part of sleep quality assessment is the automatic classification of sleep stages. Sleep staging is helpful in the diagnosis of sleep-related diseases. This study proposes an automatic sleep staging algorithm based on the time attention mechanism. Time-frequency and non-linear features are extracted from the physiological signals of six channels and then normalized. The time attention mechanism combined with the two-way bi-directional gated recurrent unit (GRU) was used to reduce computing resources and time costs, and the conditional random field (CRF) was used to obtain information between tags. After five-fold cross-validation on the Sleep-EDF dataset, the values of accuracy, WF1, and Kappa were 0.9218, 0.9177, and 0.8751, respectively. After five-fold cross-validation on the our own dataset, the values of accuracy, WF1, and Kappa were 0.9006, 0.8991, and 0.8664, respectively, which is better than the result of the latest algorithm. In the study of sleep staging, the recognition rate of the N1 stage was low, and the imbalance has always been a problem. Therefore, this study introduces a type of balancing strategy. By adopting the proposed strategy, SEN-N1 and ACC of 0.7 and 0.86, respectively, can be achieved. The experimental results show that compared to the latest method, the proposed model can achieve significantly better performance and significantly improve the recognition rate of the N1 period. The performance comparison of different channels shows that even when the EEG channel was not used, considerable accuracy can be obtained.

Download Full-text

End-to-End Chinese Dialect Discrimination with Self-Attention

10.5121/csit.2021.111425 ◽

2021 ◽

Author(s):

Yangjie Dan ◽

Fan Xu ◽

Mingwen Wang

Keyword(s):

Large Scale ◽

Short Term Memory ◽

Attention Mechanism ◽

Practical Significance ◽

Sequence Information ◽

Discrimination Model ◽

Chinese Dialect ◽

End To End ◽

Low Performance ◽

Phonetic Features

Dialect discrimination has an important practical significance for protecting inheritance of dialects. The traditional dialect discrimination methods pay much attention to the underlying acoustic features, and ignore the meaning of the pronunciation itself, resulting in low performance. This paper systematically explores the validity of the pronunciation features of dialect speech composed of phoneme sequence information for dialect discrimination, and designs an end-to-end dialect discrimination model based on the multi-head self-attention mechanism. Specifically, we first adopt the residual convolution neural network and the multihead self-attention mechanism to effectively extract the phoneme sequence features unique to different dialects to compose the novel phonetic features. Then, we perform dialect discrimination based on the extracted phonetic features using the self-attention mechanism and bi-directional long short-term memory networks. The experimental results on the large-scale benchmark 10-way Chinese dialect corpus released by IFLYTEK 1 show that our model outperforms the state-of-the-art alternatives by large margin.

Download Full-text

A New Pair of Watchful Eyes for Students in Online Courses

Frontiers in Applied Mathematics and Statistics ◽

10.3389/fams.2021.620080 ◽

2021 ◽

Vol 7 ◽

Author(s):

Salman Hussain Raza ◽

Bibhya Nand Sharma ◽

Kaylash Chaudhary

Keyword(s):

Time Series ◽

Predictive Model ◽

Student Performance ◽

Network Architecture ◽

Performance Monitoring ◽

Short Term Memory ◽

Special Kind ◽

Training Dataset ◽

Academic Learning ◽

Proposed Model

While the recent technological advancements have enabled instructors to deliver mathematical concepts and theories beyond the physical boundaries innovatively and interactively, poor performance and low success rate in mathematic courses have always been a major concern of educators. More specifically, in an online learning environment, where students are not physically present in the classroom and access course materials over the network, it is toilsome for course coordinators to track and monitor every student’s academic learning and experiences. Thus, automated student performance monitoring is indispensable since it is easy for online students, especially those underperforming, to be “out of sight,” hence getting derailed and off-track. Since student learning and performance are evolving over time, it is reasonable to consider student performance monitoring as a time-series problem and implement a time-series predictive model to forecast students’ educational progress and achievement. This research paper presents a case study from a higher education institute where interaction data and course achievement of a previously offered online course are used to develop a time-series predictive model using a Long Short-Term Memory network, a special kind of Recurrent Neural Network architecture. The proposed model makes predictions of student status at any given time of the semester by examining the trend or pattern learned in the previous events. The model reported an average classification accuracy of 86 and 84% with the training dataset and testing dataset, respectively. The proposed model is trialed on selected online math courses with exciting yet dissimilar trends recorded.

Download Full-text