SecProCT: In Silico Prediction of Human Secretory Proteins Based on Capsule Network and Transformer

2021 ◽  
Vol 22 (16) ◽  
pp. 9054
Author(s):  
Wei Du ◽  
Xuan Zhao ◽  
Yu Sun ◽  
Lei Zheng ◽  
Ying Li ◽  
...  

Identifying secretory proteins from blood, saliva or other body fluids has become an effective method of diagnosing diseases. Existing secretory protein prediction methods are mainly based on conventional machine learning algorithms and are highly dependent on the feature set from the protein. In this article, we propose a deep learning model based on the capsule network and transformer architecture, SecProCT, to predict secretory proteins using only amino acid sequences. The proposed model was validated using cross-validation and achieved 0.921 and 0.892 accuracy for predicting blood-secretory proteins and saliva-secretory proteins, respectively. The proposed model was also validated on an independent test set and achieved 0.917 and 0.905 accuracy for predicting blood-secretory proteins and saliva-secretory proteins, respectively, which is better than conventional machine learning methods and other deep learning methods for biological sequence analysis. The main contributions of this article are as follows: (1) a deep learning model based on a capsule network and transformer architecture is proposed for predicting secretory proteins, and its results are better than those of existing conventional machine learning methods and deep learning methods for biological sequence analysis; (2) only amino acid sequences are used in the proposed model, which overcomes the high dependence of existing methods on annotated protein features; (3) the proposed model can accurately predict most experimentally verified secretory proteins and cancer protein biomarkers in blood and saliva.
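As a rough illustration of what working from amino acid sequences alone can look like at the input stage, the sketch below converts a protein sequence into fixed-length integer tokens of the kind an embedding layer could consume. The vocabulary order, the padding and unknown-residue indices, the `max_len` value and the name `encode_sequence` are all assumptions made for illustration; the abstract does not describe SecProCT's actual encoding.

```python
# Hypothetical input encoding; not taken from the SecProCT paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
PAD, UNK = 0, 1                        # assumed special token ids
VOCAB = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(seq, max_len=10):
    """Map a protein sequence to a fixed-length list of integer tokens."""
    # Truncate long sequences; unknown residues map to UNK.
    tokens = [VOCAB.get(aa, UNK) for aa in seq.upper()[:max_len]]
    # Right-pad short sequences so every example has the same length.
    return tokens + [PAD] * (max_len - len(tokens))
```

A fixed length is used here only because batched models usually need rectangular inputs; how long real inputs should be is a modelling decision the abstract leaves open.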

2020 ◽  
Vol 12 (12) ◽  
pp. 5074
Author(s):  
Jiyoung Woo ◽  
Jaeseok Yun

Spam posts in web forum discussions cause user inconvenience and lower the value of the web forum as an open source of user opinion. Because the importance of a web post is evaluated in terms of the number of involved authors, such noise distorts the analysis results by adding unnecessary data to the opinion analysis. In this work, an automatic detection model for spam posts in web forums using both conventional machine learning and deep learning is proposed. To automatically differentiate between normal posts and spam, evaluators were asked to recognize spam posts in advance. To construct the machine learning-based model, text features were extracted from posted content using text mining techniques from the perspective of linguistics, and supervised learning was performed to distinguish content noise from normal posts. For the deep learning model, raw text including and excluding special characters was utilized. A comparative analysis of deep neural networks using two different recurrent neural network (RNN) models, the simple RNN and the long short-term memory (LSTM) network, was also performed. Furthermore, the proposed model was applied to two web forums. The experimental results indicate that the deep learning model affords significant improvements over the accuracy of conventional machine learning associated with text features. The accuracy of the proposed model using LSTM reaches 98.56%, and the precision and recall of the noise class reach 99% and 99.53%, respectively.
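The two raw-text inputs mentioned above (with and without special characters) could be produced along the following lines. This is a minimal sketch: the abstract does not define "special characters", so the regular expression, the function names and the sample post are assumptions.

```python
import re


def with_special_characters(text):
    """Keep the raw text; only collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text).strip()


def without_special_characters(text):
    """Drop every character that is not a letter, digit, underscore or whitespace."""
    stripped = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", stripped).strip()


# Made-up forum post, for illustration only.
post = "FREE $$$ click here!!!   now"
raw_variant = with_special_characters(post)
clean_variant = without_special_characters(post)
```

Keeping both variants allows the same recurrent model to be trained twice and compared, since punctuation and symbols may themselves be a spam signal.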


Electronics ◽  
2020 ◽  
Vol 10 (1) ◽  
pp. 39
Author(s):  
Zhiyuan Xie ◽  
Shichang Du ◽  
Jun Lv ◽  
Yafei Deng ◽  
Shiyao Jia

Remaining Useful Life (RUL) prediction is significant in indicating the health status of sophisticated equipment, and it requires historical data because of its complexity. The number and complexity of environmental parameters such as vibration and temperature can cause non-linear states of data, making prediction tremendously difficult. Conventional machine learning models such as support vector machine (SVM), random forest, and back propagation neural network (BPNN), however, have limited capacity to predict accurately. In this paper, a two-phase deep learning model, the attention-convolutional forget-gate recurrent network (AM-ConvFGRNET), is proposed for RUL prediction. The first phase, the forget-gate convolutional recurrent network (ConvFGRNET), is based on a one-dimensional analog long short-term memory (LSTM), which removes all the gates except the forget gate and uses chrono-initialized biases. The second phase is the attention mechanism, which enables the model to extract more specific features for generating an output, compensating for the drawback that FGRNET is a black-box model and improving interpretability. The performance and effectiveness of AM-ConvFGRNET for RUL prediction are validated by comparing it with other machine learning methods and deep learning methods on the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset and a dataset from a ball screw experiment.
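To make the idea of a recurrent unit that keeps only the forget gate more concrete, here is a scalar sketch in the style of simplified LSTM variants, with a chrono-style bias initialization. It is not the authors' ConvFGRNET: the convolutional structure and attention phase are omitted, and the update rule, parameter names and `t_max` value are assumptions, since the abstract gives no equations.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def chrono_forget_bias(t_max, rng):
    """Chrono-style initialization: b_f = log(u), u ~ Uniform(1, t_max - 1)."""
    return math.log(rng.uniform(1.0, t_max - 1.0))


def forget_gate_step(x, h_prev, p):
    """One step of a scalar recurrent unit whose only gate is the forget gate."""
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])
    candidate = math.tanh(p["w_c"] * x + p["u_c"] * h_prev + p["b_c"])
    # The forget gate interpolates between the old state and the new candidate.
    return f * h_prev + (1.0 - f) * candidate


rng = random.Random(0)
# All weights below are arbitrary illustrative values.
params = {"w_f": 0.5, "u_f": 0.1, "b_f": chrono_forget_bias(50, rng),
          "w_c": 0.8, "u_c": -0.3, "b_c": 0.0}
h = 0.0
for x in [0.2, -0.1, 0.4, 0.0]:
    h = forget_gate_step(x, h, params)
```

A non-negative forget bias pushes the gate toward values above 0.5 at the start of training, so the unit initially tends to retain its state over longer spans, which is the usual motivation for chrono initialization.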


Sensors ◽  
2020 ◽  
Vol 20 (24) ◽  
pp. 7078
Author(s):  
Yueting Wang ◽  
Minzan Li ◽  
Ronghua Ji ◽  
Minjuan Wang ◽  
Lihua Zheng

Visible-near-infrared (Vis-NIR) spectroscopy is one of the most important methods for non-destructive and rapid detection of soil total nitrogen (STN) content. In order to find a practical way to build an STN content prediction model, three conventional machine learning methods and one deep learning approach are investigated, and their predictive performances are compared and analyzed using a public dataset called LUCAS Soil (19,019 samples). The three conventional machine learning methods are ordinary least square estimation (OLSE), random forest (RF), and extreme learning machine (ELM), while for the deep learning method, three different structures of convolutional neural network (CNN) incorporating the Inception module are constructed and investigated. In order to clarify the effectiveness of different pre-treatments on predicting STN content, the three conventional machine learning methods are combined with four pre-processing approaches (baseline correction, smoothing, dimensional reduction, and feature selection), and the combinations are investigated, compared, and analyzed. The results indicate that the baseline-corrected and smoothed ELM model reaches practical precision (coefficient of determination (R²) = 0.89, root mean square error of prediction (RMSEP) = 1.60 g/kg, and residual prediction deviation (RPD) = 2.34). Among the three differently structured CNN models, the one with more 1 × 1 convolutions performs better (R² = 0.93; RMSEP = 0.95 g/kg; and RPD = 3.85 in the optimal case). In addition, in order to evaluate the influence of dataset characteristics on the model, the LUCAS dataset was divided into different data subsets according to dataset size, organic carbon (OC) content and countries. The results show that the deep learning method is more effective and practical than conventional machine learning methods and, on the premise of enough data samples, it can be used to build a robust STN content prediction model with high accuracy for the same type of soil with similar agricultural treatment.
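The three figures of merit quoted above can be computed as in the following sketch. It assumes the common definitions (R² as one minus the ratio of residual to total sum of squares, RMSEP as the root mean squared prediction error, RPD as the standard deviation of the reference values divided by RMSEP) and uses the sample standard deviation; the abstract does not state which conventions the authors used, and the numbers in the example are made up.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSEP, RPD) for paired reference and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmsep = math.sqrt(ss_res / n)
    # Assumed convention: sample standard deviation of the reference values.
    rpd = statistics.stdev(y_true) / rmsep
    return r2, rmsep, rpd


# Made-up STN values in g/kg, for illustration only.
r2, rmsep, rpd = regression_metrics([1.0, 2.0, 3.0, 4.0, 5.0],
                                    [1.1, 1.9, 3.2, 3.8, 5.0])
```

Because RPD scales the error by the spread of the reference data, it is only comparable between models evaluated on the same test set.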


Sensors ◽  
2020 ◽  
Vol 20 (11) ◽  
pp. 3085
Author(s):  
Raluca Brehar ◽  
Delia-Alexandrina Mitrea ◽  
Flaviu Vancea ◽  
Tiberiu Marita ◽  
Sergiu Nedevschi ◽  
...  

The emergence of deep-learning methods in different computer vision tasks has been shown to offer increased detection, recognition or segmentation accuracy when large annotated image datasets are available. In the case of medical image processing and computer-aided diagnosis within ultrasound images, where the amount of available annotated data is smaller, a natural question arises: are deep-learning methods better than conventional machine-learning methods? How do the conventional machine-learning methods behave in comparison with deep-learning methods on the same dataset? Based on the study of various deep-learning architectures, a lightweight multi-resolution Convolutional Neural Network (CNN) architecture is proposed. It is suitable for differentiating, within ultrasound images, between hepatocellular carcinoma (HCC) and the cirrhotic parenchyma (PAR) on which HCC had evolved. The proposed deep-learning model is compared with other CNN architectures that have been adapted by transfer learning for the ultrasound binary classification task, and also with conventional machine-learning (ML) solutions trained on textural features. The achieved results show that the deep-learning approach outperforms classical machine-learning solutions by providing a higher classification performance.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Mohanad Mohammed ◽  
Henry Mwambi ◽  
Innocent B. Mboya ◽  
Murtada K. Elbashir ◽  
Bernard Omolo

Cancer tumor classification based on morphological characteristics alone has been shown to have serious limitations. Breast, lung, colorectal, thyroid, and ovarian cancers are the most commonly diagnosed cancers among women. Precise classification of cancers into their types is considered a vital problem for cancer diagnosis and therapy. In this paper, we proposed a stacking ensemble deep learning model based on a one-dimensional convolutional neural network (1D-CNN) to perform a multi-class classification on the five common cancers among women based on RNASeq data. The RNASeq gene expression data was downloaded from Pan-Cancer Atlas using the GDCquery function of the TCGAbiolinks package in the R software. We used the least absolute shrinkage and selection operator (LASSO) as the feature selection method. We compared the results of the proposed model, with and without LASSO, with the results of the single 1D-CNN and of machine learning methods, which include support vector machines with radial basis function, linear, and polynomial kernels; artificial neural networks; k-nearest neighbors; and bagging trees. The results show that the proposed model, with and without LASSO, has a better performance compared to the other classifiers. Also, the results show that the machine learning methods (SVM-R, SVM-L, SVM-P, ANN, KNN, and bagging trees) have better performance with under-sampling than with over-sampling techniques. This is supported by the statistical significance test of accuracy, where the p-values for differences between SVM-R and SVM-P, SVM-R and ANN, and SVM-R and KNN are found to be p = 0.003, p < 0.001, and p < 0.001, respectively. Also, SVM-L had a significant difference compared to ANN (p = 0.009). Moreover, SVM-P and ANN, and SVM-P and KNN, are found to be significantly different, with p < 0.001 and p < 0.001, respectively. In addition, ANN and bagging trees, and ANN and KNN, were found to be significantly different, with p < 0.001 and p = 0.004, respectively. Thus, the proposed model can help in the early detection and diagnosis of cancer in women, and hence aid in designing early treatment strategies to improve survival.


2021 ◽  
Vol 11 (17) ◽  
pp. 7940
Author(s):  
Mohammed Al-Sarem ◽  
Abdullah Alsaeedi ◽  
Faisal Saeed ◽  
Wadii Boulila ◽  
Omair AmeerBakhsh

Spreading rumors on social media is considered a cybercrime that affects people, societies, and governments. For instance, some criminals create rumors and post them on the internet, and other people then help to spread them. Spreading rumors can be an example of cyber abuse, where rumors or lies about the victim are posted on the internet to send threatening messages or to share the victim’s personal information. During pandemics, a large number of rumors spread very quickly on social media, which has dramatic effects on people’s health. Detecting these rumors manually is very difficult for the authorities on these open platforms. Therefore, several researchers have conducted studies on utilizing intelligent methods for detecting such rumors. The detection methods can be classified mainly into machine learning-based and deep learning-based methods. Deep learning methods have comparative advantages over machine learning ones, as they do not require preprocessing and feature engineering processes and their performance has shown superior enhancements in many fields. Therefore, this paper aims to propose a Novel Hybrid Deep Learning Model for Detecting COVID-19-related Rumors on Social Media (LSTM–PCNN). The proposed model is based on a Long Short-Term Memory (LSTM) and Concatenated Parallel Convolutional Neural Networks (PCNN). The experiments were conducted on an ArCOV-19 dataset that included 3157 tweets; 1480 of them were rumors (46.87%) and 1677 tweets were non-rumors (53.12%). The proposed model showed a superior performance compared to other methods in terms of accuracy, recall, precision, and F-score.


Symmetry ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 2012
Author(s):  
Jiameng Gao ◽  
Chengzhong Liu ◽  
Junying Han ◽  
Qinglin Lu ◽  
Hengxing Wang ◽  
...  

Wheat is a very important food crop for mankind. Many new varieties are bred every year. The accurate judgment of wheat varieties can promote the development of the wheat industry and the protection of breeding property rights. Although gene analysis technology can be used to accurately determine wheat varieties, it is costly, time-consuming, and inconvenient. Traditional machine learning methods can significantly reduce the cost and time of wheat cultivar identification, but the accuracy is not high. In recent years, the relatively popular deep learning methods have further improved the accuracy on the basis of traditional machine learning, but it is quite difficult to continue to improve the identification accuracy after the convergence of the deep learning model. Based on the ResNet and SENet models, this paper draws on the idea of the bagging-based ensemble estimator algorithm and proposes a deep learning model for wheat classification, CMPNet, which couples images from the tillering period, the flowering period, and the seed. This convolutional neural network (CNN) model has a symmetrical structure along the direction of the tensor flow. The model uses collected images of different types of wheat in multiple growth periods. First, it uses the transfer learning method of the ResNet-50, SE-ResNet, and SE-ResNeXt models, and then trains on the collected images of 30 kinds of wheat in different growth periods. It then uses the concat layer to connect the output layers of the three models, and finally obtains the wheat classification results through the softmax function. The accuracy of wheat variety identification increased from 92.07% at the seed stage, 95.16% at the tillering stage, and 97.38% at the flowering stage to 99.51%. The model’s single inference time was only 0.0212 s. The model not only significantly improves the classification accuracy of wheat varieties, but also achieves low cost and high efficiency, which makes it a novel and important technology reference for wheat producers, managers, and law enforcement supervisors in the practice of wheat production.
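The fusion step described above (concatenate the outputs of three branches, then apply softmax) can be sketched in plain Python as follows. The dense layer between the concatenation and the softmax, the weight values and the two-class toy setup are assumptions for illustration only; the abstract does not give CMPNet's layer sizes or say what sits between the concat layer and the softmax.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(branch_outputs, weights, biases):
    """Concatenate branch outputs, apply one dense layer, then softmax."""
    fused = [v for branch in branch_outputs for v in branch]  # concat layer
    logits = [sum(w * v for w, v in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


# Toy example: three branches with two outputs each, two classes.
branches = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4]]
weights = [[1, 0, 1, 0, 1, 0],   # class 0 sums the first output of each branch
           [0, 1, 0, 1, 0, 1]]   # class 1 sums the second output
probs = fuse_and_classify(branches, weights, [0.0, 0.0])
```

In a trained network the fusion weights would be learned rather than fixed, so each growth period can contribute differently to the final decision.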


2020 ◽  
Vol 31 (10) ◽  
pp. 1222-1235
Author(s):  
Abhishek Sheetal ◽  
Zhiyu Feng ◽  
Krishna Savani

How can we nudge people to not engage in unethical behaviors, such as hoarding and violating social-distancing guidelines, during the COVID-19 pandemic? Because past research on antecedents of unethical behavior has not provided a clear answer, we turned to machine learning to generate novel hypotheses. We trained a deep-learning model to predict whether or not World Values Survey respondents perceived unethical behaviors as justifiable, on the basis of their responses to 708 other items. The model identified optimism about the future of humanity as one of the top predictors of unethicality. A preregistered correlational study (N = 218 U.S. residents) conceptually replicated this finding. A preregistered experiment (N = 294 U.S. residents) provided causal support: Participants who read a scenario conveying optimism about the COVID-19 pandemic were less willing to justify hoarding and violating social-distancing guidelines than participants who read a scenario conveying pessimism. The findings suggest that optimism can help reduce unethicality, and they document the utility of machine-learning methods for generating novel hypotheses.


2021 ◽  
Vol 13 (7) ◽  
pp. 1360
Author(s):  
A-Xing Zhu ◽  
Fang-He Zhao ◽  
Hao-Bo Pan ◽  
Jun-Zhi Liu

Two main approaches are used in mapping rice paddy distribution from remote sensing images: phenological methods and machine learning methods. The phenological methods can map rice paddy distribution in a simple way but with limited accuracy. Machine learning methods, particularly deep learning methods, that learn the spectral signatures can achieve higher accuracy yet require a large number of field samples. This paper proposed a pheno-deep method to couple the simplicity of the phenological methods and the learning ability of the deep learning methods for mapping rice paddy at high accuracy without the need for field samples. The phenological method was first used to initially delineate the rice paddy for the purpose of creating training samples. These samples were then used to train the deep learning model. The trained deep learning model was applied to map the spatial distribution of rice paddy. The effectiveness of the pheno-deep method was evaluated in Jin’an District, Lu’an City, Anhui Province, China. Results show that the pheno-deep method achieved a high performance, with the overall accuracy, the precision, the recall, and AUC (area under curve) being 88.8%, 87.2%, 91.1%, and 94.4%, respectively. The pheno-deep method achieved a much better performance than the phenological method alone and can overcome the noise in the training samples from the phenological method. The overall accuracy of the pheno-deep method is only 2.4% lower than that of the deep learning method alone trained with field samples, and this difference is not statistically significant. In addition, the pheno-deep method requires no field sampling, which would be a noteworthy advantage for situations when large training samples are difficult to obtain. This study shows that by combining knowledge-based methods with data-driven methods, it is possible to achieve high mapping accuracy of geographic variables using remote sensing even with little field sampling effort.
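The three-step workflow described above (rule-based delineation, training on the resulting samples, mapping with the trained model) is sketched below on toy data. A nearest-centroid classifier stands in for the deep learning model, and the phenological rule, its threshold, the feature vectors and all function names are hypothetical; none of them come from the paper.

```python
def phenology_rule(index_value, threshold=0.5):
    """Hypothetical phenological rule: label a pixel as rice (1) or not (0)."""
    return 1 if index_value > threshold else 0


def fit_centroids(features, labels):
    """Stand-in learner: per-class mean of the feature vectors."""
    centroids = {}
    for cls in set(labels):
        rows = [f for f, y in zip(features, labels) if y == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, feature):
    """Assign the class whose centroid is closest in squared distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2
                                 for a, b in zip(feature, centroids[c])))


# Toy pixels: (phenological index, spectral feature vector); values are made up.
# The last pixel looks like non-rice spectrally but passes the rule's threshold.
pixels = [(0.9, [0.8, 0.2]), (0.7, [0.7, 0.3]), (0.6, [0.75, 0.25]),
          (0.2, [0.2, 0.8]), (0.1, [0.3, 0.7]), (0.55, [0.25, 0.75])]

pseudo_labels = [phenology_rule(idx) for idx, _ in pixels]    # step 1: rule-based samples
model = fit_centroids([f for _, f in pixels], pseudo_labels)  # step 2: train on them
mapped = [predict(model, f) for _, f in pixels]               # step 3: map with the model
```

In this toy case the learner assigns the last pixel to the non-rice class even though the rule labeled it as rice, which illustrates how a model trained on noisy rule-based samples can still correct some of that noise.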

