DeepEnzyPred: A Bi-Layered Deep Learning Framework for Prediction of Bacteriophage Enzymes and Their Sub-Hydrolase Enzymes via a Novel Multi-Level Multi-Threshold Feature Selection Technique

2020 ◽  
Author(s):  
Yu Wang ◽  
ZAHEER ULLAH KHAN ◽  
Shaukat Ali ◽  
Maqsood Hayat

Abstract Background: Bacteriophage, or phage, is a type of virus that replicates inside bacteria. It consists of genetic material surrounded by a protein structure. Bacteriophages play a vital role in phage therapy and genetic engineering, and phage and hydrolase enzyme proteins have a significant impact on the treatment of pathogenic bacterial infections and other diseases. Accurate identification of bacteriophage proteins is important for host subcellular localization, for further understanding of the interaction between phage and hydrolases, and for designing antibacterial drugs. Given the significance of bacteriophage proteins, several computational models have been developed alongside wet-laboratory methods. However, their performance has been limited by inefficient feature schemes, redundancy, noise, and the lack of an intelligent learning engine. We therefore developed an innovative bi-layered model named DeepEnzyPred. An optimal hybrid feature vector was obtained via a novel Multi-Level Multi-Threshold subset feature selection (MLMT-SFS) algorithm, and a two-dimensional convolutional neural network was adopted as the baseline classifier. Results: A hybrid feature set was obtained via a serial combination of CTD and KSAACGP features, and the optimal features were selected with the MLMT-SFS algorithm. Under 5-fold cross-validation, Layer-1 achieved an accuracy of 91.6%, sensitivity of 63.39%, specificity of 95.72%, MCC of 0.6049, and ROC value of 0.8772, while Layer-2 achieved an accuracy of 96.05%, sensitivity of 96.22%, specificity of 95.91%, MCC of 0.9219, and ROC value of 0.9899. Conclusion: This paper presents a robust and effective classification model for bacteriophages and their types. Primitive features were extracted via CTD and KSAACGP, and a novel method (MLMT-SFS) was devised to yield an optimal hybrid feature space from them. The results obtained over the hybrid feature space with the 2D-CNN show excellent classification performance. Based on the recorded results, we believe the developed predictor will be a valuable resource for large-scale discrimination of unknown phage and hydrolase enzymes in particular, and for new antibacterial drug design in pharmaceutical companies in general.
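The sketch below is a rough, illustrative outline of the pipeline the abstract describes (serial fusion of two sequence encodings followed by a small 2D-CNN). The feature extractors, dimensions, and network hyperparameters are placeholders rather than the authors' settings, and the MLMT-SFS selection step is omitted.

```python
# Illustrative sketch only (not the authors' code): serial fusion of two
# per-protein feature encodings followed by a small 2D-CNN classifier.
import numpy as np
import tensorflow as tf

def serial_hybrid(ctd_feats: np.ndarray, ksaacgp_feats: np.ndarray) -> np.ndarray:
    """Serially concatenate the two encodings for each protein."""
    return np.concatenate([ctd_feats, ksaacgp_feats], axis=1)

# Toy data: 200 proteins, assumed 147 CTD + 253 KSAACGP features (placeholder sizes).
rng = np.random.default_rng(0)
X = serial_hybrid(rng.random((200, 147)), rng.random((200, 253)))   # -> (200, 400)
y = rng.integers(0, 2, size=200)                                    # phage vs non-phage

# Reshape the flat hybrid vector into a 2D "feature image" for the CNN.
X_img = X.reshape(-1, 20, 20, 1)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20, 20, 1)),
    tf.keras.layers.Conv2D(16, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # layer-1: phage / non-phage
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_img, y, epochs=3, batch_size=32, verbose=0)
```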

2021 ◽  
Author(s):  
A B Pawar ◽  
M A Jawale ◽  
Ravi Kumar Tirandasu ◽  
Saiprasad Potharaju

High dimensionality is a serious issue in the preprocessing stage of data mining. A large number of features in a dataset leads to several complications when classifying an unknown instance. The initial data space may contain redundant and irrelevant features, which increase memory consumption and confuse the learning model built on them. It is therefore advisable to select the best features before generating the classification model, in order to achieve better accuracy. In this research, we propose a novel feature selection approach based on Symmetrical Uncertainty and the Correlation Coefficient (SU-CCE) for reducing the high-dimensional feature space and increasing classification accuracy. The experiment was performed on a colon cancer microarray dataset with 2000 features, from which the proposed method derived the 38 best features. To measure the strength of the proposed method, the top 38 features extracted by four traditional filter-based methods were compared using various classifiers. After careful investigation of the results, the proposed approach is found to be competitive with most of the traditional methods.
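As a hedged illustration of the general idea (not the paper's exact SU-CCE procedure), the sketch below ranks features by symmetrical uncertainty with the class label and then drops candidates that are strongly Pearson-correlated with an already selected feature; dataset shapes and thresholds are placeholders.

```python
# Illustrative filter-style selection: symmetrical uncertainty + correlation check.
import numpy as np

def entropy(labels: np.ndarray) -> float:
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def symmetrical_uncertainty(x_discrete: np.ndarray, y: np.ndarray) -> float:
    hx, hy = entropy(x_discrete), entropy(y)
    joint = entropy(np.char.add(x_discrete.astype(str), y.astype(str)))  # H(X, Y)
    mi = hx + hy - joint                                                 # I(X; Y)
    return 2.0 * mi / (hx + hy) if (hx + hy) > 0 else 0.0

def su_cc_select(X: np.ndarray, y: np.ndarray, k: int = 38, corr_cut: float = 0.9):
    # discretize each (e.g. gene-expression) feature into quartile bins
    Xd = np.array([np.digitize(c, np.quantile(c, [0.25, 0.5, 0.75])) for c in X.T]).T
    su = np.array([symmetrical_uncertainty(Xd[:, j], y) for j in range(X.shape[1])])
    selected = []
    for j in np.argsort(su)[::-1]:                  # best SU first
        if all(abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) < corr_cut for s in selected):
            selected.append(j)                      # keep only weakly correlated features
        if len(selected) == k:
            break
    return selected

# toy usage: 62 samples x 2000 features (mimicking the colon-cancer dimensions)
rng = np.random.default_rng(1)
X, y = rng.random((62, 2000)), rng.integers(0, 2, 62)
print(su_cc_select(X, y)[:10])
```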


2014 ◽  
Author(s):  
Seyed-Mahdi Khaligh-Razavi ◽  
Linda Henriksson ◽  
Kendrick Kay ◽  
Nikolaus Kriegeskorte

Abstract Studies of the primate visual system have begun to test a wide range of complex computational object-vision models. Realistic models have many parameters, which in practice cannot be fitted using the limited amounts of brain-activity data typically available. Task performance optimization (e.g. using backpropagation to train neural networks) provides major constraints for fitting parameters and discovering nonlinear representational features appropriate for the task (e.g. object classification). Model representations can be compared to brain representations in terms of the representational dissimilarities they predict for an image set. This method, called representational similarity analysis (RSA), enables us to test the representational feature space as is (fixed RSA) or to fit a linear transformation that mixes the nonlinear model features so as to best explain a cortical area’s representational space (mixed RSA). Like voxel/population-receptive-field modelling, mixed RSA uses a training set (different stimuli) to fit one weight per model feature and response channel (voxels here), so as to best predict the response profile across images for each response channel. We analysed response patterns elicited by natural images, which were measured with functional magnetic resonance imaging (fMRI). We found that early visual areas were best accounted for by shallow models, such as a Gabor wavelet pyramid (GWP). The GWP model performed similarly with and without mixing, suggesting that the original features already approximated the representational space, obviating the need for mixing. However, a higher ventral-stream visual representation (lateral occipital region) was best explained by the higher layers of a deep convolutional network, and mixing of its feature set was essential for this model to explain the representation. We suspect that mixing was essential because the convolutional network had been trained to discriminate a set of 1000 categories, whose frequencies in the training set did not match their frequencies in natural experience or their behavioural importance. The latter factors might determine the representational prominence of semantic dimensions in higher-level ventral-stream areas. Our results demonstrate the benefits of testing both the specific representational hypothesis expressed by a model’s original feature space and the hypothesis space generated by linear transformations of that feature space.

Highlights:
- We tested computational models of representations in ventral-stream visual areas.
- We compared representational dissimilarities with and without linear remixing of model features.
- Early visual areas were best explained by shallow models, and higher areas by deep models.
- Unsupervised shallow models performed better without linear remixing of their features.
- A supervised deep convolutional net performed best with linear feature remixing.
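To make the fixed vs. mixed comparison concrete, here is a highly simplified sketch (with random stand-in data, not the study's stimuli or fMRI measurements): mixed RSA fits one ridge-regression weight per model feature and voxel on training stimuli, and both variants are then scored by Spearman-correlating model and brain representational dissimilarity matrices on held-out stimuli.

```python
# Simplified fixed vs. mixed RSA comparison on placeholder data.
import numpy as np
from sklearn.linear_model import Ridge
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_vox = 1000, 120, 500, 300
F_train, F_test = rng.standard_normal((n_train, n_feat)), rng.standard_normal((n_test, n_feat))
B_train, B_test = rng.standard_normal((n_train, n_vox)), rng.standard_normal((n_test, n_vox))

def rdm(patterns):
    """Representational dissimilarity matrix (correlation distance, condensed form)."""
    return pdist(patterns, metric="correlation")

# Fixed RSA: compare the model's original feature space to the brain directly.
fixed_r, _ = spearmanr(rdm(F_test), rdm(B_test))

# Mixed RSA: fit one weight per feature and voxel on training stimuli,
# then compare the *predicted* voxel patterns' RDM to the measured one.
mixing = Ridge(alpha=1.0).fit(F_train, B_train)   # linear map: features -> voxels
B_pred = mixing.predict(F_test)
mixed_r, _ = spearmanr(rdm(B_pred), rdm(B_test))

print(f"fixed RSA: {fixed_r:.3f}   mixed RSA: {mixed_r:.3f}")
```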


2020 ◽  
Vol 27 ◽  
Author(s):  
Zaheer Ullah Khan ◽  
Dechang Pi

Background: S-sulfenylation (S-sulphenylation, or sulfenic acid formation) of proteins is a special kind of post-translational modification that plays an important role in various physiological and pathological processes such as cytokine signaling, transcriptional regulation, and apoptosis. Given this significance, and to complement existing wet-laboratory methods, several computational models have been developed for predicting sulfenylation cysteine sites. However, the performance of these models has not been satisfactory due to inefficient feature schemes, severe class imbalance, and the lack of an intelligent learning engine. Objective: In this study, our motivation is to establish a strong and novel computational predictor for discriminating sulfenylation from non-sulfenylation sites. Methods: We report an innovative bioinformatics feature encoding tool, named DeepSSPred, in which the encoded features are obtained via an n-segmented hybrid feature scheme; the synthetic minority oversampling technique was then employed to cope with the severe imbalance between SC-sites (minority class) and non-SC-sites (majority class). A state-of-the-art 2D convolutional neural network was employed, with rigorous 10-fold cross-validation for model validation and authentication. Results: Following the proposed framework, the strong discrete representation of the feature space, the machine learning engine, and the unbiased presentation of the underlying training data yielded an excellent model that outperforms all existing established studies. The proposed approach is 6% higher in MCC than the first-best method, which failed to provide sufficient details on an independent dataset. Compared with the second-best method, the model achieved increases of 7.5% in accuracy, 1.22% in Sn, 12.91% in Sp, and 13.12% in MCC on the training data, and 12.13% in ACC, 27.25% in Sn, 2.25% in Sp, and 30.37% in MCC on an independent dataset. These empirical analyses show the superior performance of the proposed model on both the training and independent datasets in comparison with existing studies. Conclusion: In this research, we developed a novel sequence-based automated predictor for SC-sites, called DeepSSPred. The empirical simulation outcomes with a training dataset and an independent validation dataset reveal the efficacy of the proposed model. The good performance of DeepSSPred is due to several factors, such as novel discriminative feature encoding schemes, the SMOTE technique, and careful construction of the prediction model through the tuned 2D-CNN classifier. We believe that our work will provide potential insight into the further prediction of S-sulfenylation characteristics and functionalities, and we hope that our predictor will be significantly helpful for large-scale discrimination of unknown SC-sites in particular, and for designing new pharmaceutical drugs in general.
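As an illustration of the rebalancing step named in the abstract, the snippet below applies SMOTE (via imbalanced-learn) to a toy, imbalanced feature matrix; the actual n-segmented hybrid encoding and the tuned 2D-CNN are not reproduced.

```python
# SMOTE rebalancing of a toy minority/majority split before model training.
import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
# toy encoded peptide windows: 1000 negatives vs. 100 positives (assumed imbalance)
X = rng.random((1100, 400))
y = np.array([0] * 1000 + [1] * 100)

X_bal, y_bal = SMOTE(random_state=42).fit_resample(X, y)
print(Counter(y), "->", Counter(y_bal))   # minority class oversampled to parity
```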


Agriculture ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 371
Author(s):  
Yu Jin ◽  
Jiawei Guo ◽  
Huichun Ye ◽  
Jinling Zhao ◽  
Wenjiang Huang ◽  
...  

The remote sensing extraction of large areas of arecanut (Areca catechu L.) planting plays an important role in investigating the distribution of the arecanut planting area and the subsequent adjustment and optimization of regional planting structures. Satellite imagery has previously been used to investigate and monitor the agricultural and forestry vegetation in Hainan. However, monitoring accuracy is affected by the cloudy and rainy climate of this region, as well as the high level of land fragmentation. In this paper, we used PlanetScope imagery at a 3 m spatial resolution over the Hainan arecanut planting area to investigate the high-precision extraction of the arecanut planting distribution based on feature space optimization. First, spectral and textural feature variables were selected to form the initial feature space, followed by the implementation of the random forest algorithm to optimize the feature space. Arecanut planting area extraction models based on the support vector machine (SVM), BP neural network (BPNN), and random forest (RF) classification algorithms were then constructed. The overall classification accuracies of the SVM, BPNN, and RF models optimized by the RF features were 74.82%, 83.67%, and 88.30%, with kappa coefficients of 0.680, 0.795, and 0.853, respectively. The RF model with optimized features exhibited the highest overall classification accuracy and kappa coefficient. The overall accuracies of the SVM, BPNN, and RF models following feature optimization improved by 3.90%, 7.77%, and 7.45%, respectively, compared with the corresponding unoptimized classification models; the kappa coefficients also improved. The results demonstrate the ability of PlanetScope satellite imagery to extract the planting distribution of arecanut. Furthermore, RF is shown to effectively optimize the initial feature space, composed of spectral and textural feature variables, further improving the extraction accuracy of the arecanut planting distribution. This work can act as a theoretical and technical reference for the agricultural and forestry industries.
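A rough sketch of the feature-space optimization idea, using synthetic stand-in data rather than the PlanetScope variables: random-forest importances rank the candidate spectral/textural features, the top-ranked subset is retained, and the final classifiers are trained on that reduced space.

```python
# Feature-space optimization via random-forest importances, then classification.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(0)
X = rng.random((5000, 30))        # 30 candidate spectral/textural features (synthetic)
y = rng.integers(0, 4, 5000)      # e.g. arecanut / other vegetation / built-up / water
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# 1) rank features with a random forest and keep the most important ones
rf_rank = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
top = np.argsort(rf_rank.feature_importances_)[::-1][:12]   # keep top 12 (placeholder)

# 2) train the final classifiers on the optimized feature space
for name, clf in [("SVM", SVC()), ("RF", RandomForestClassifier(random_state=0))]:
    clf.fit(X_tr[:, top], y_tr)
    pred = clf.predict(X_te[:, top])
    print(name, accuracy_score(y_te, pred), cohen_kappa_score(y_te, pred))
```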


Healthcare ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 884
Author(s):  
Antonio García-Domínguez ◽  
Carlos E. Galván-Tejada ◽  
Ramón F. Brena ◽  
Antonio A. Aguileta ◽  
Jorge I. Galván-Tejada ◽  
...  

Children’s healthcare is a relevant issue, especially the prevention of domestic accidents, which has even been defined as a global health problem. Children’s activity classification generally uses sensors embedded in children’s clothing, which can lead to erroneous measurements due to possible damage or mishandling. Having a non-invasive data source for a children’s activity classification model provides reliability to the monitoring system in which it is applied. This work proposes the use of environmental sound as a data source for generating children’s activity classification models, implementing feature selection methods and classification techniques based on Bayesian networks, focused on the recognition of activities that can potentially trigger domestic accidents and applicable in child monitoring systems. Two feature selection techniques were used: the Akaike criterion and genetic algorithms. Likewise, models were generated using three classifiers: naive Bayes, semi-naive Bayes, and tree-augmented naive Bayes. The generated models, combining the feature selection methods and the classifiers used, achieve accuracies greater than 97% in most cases, demonstrating the effectiveness of the proposed approach for recognizing activities that can potentially trigger domestic accidents.
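For orientation only, the snippet below shows a plain Gaussian naive Bayes classifier over toy environmental-sound features; the paper's Akaike/genetic-algorithm feature selection and its semi-naive and tree-augmented Bayesian variants are not reproduced here.

```python
# Gaussian naive Bayes stand-in for the activity-classification stage.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((600, 34))   # e.g. 34 audio features per sound clip (toy values)
y = rng.integers(0, 4, 600)          # activity labels (crying, running, playing, ...)

nb = GaussianNB()
print("CV accuracy:", cross_val_score(nb, X, y, cv=5).mean())
```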


Computers ◽  
2021 ◽  
Vol 10 (6) ◽  
pp. 82
Author(s):  
Ahmad O. Aseeri

Deep learning-based methods have emerged as one of the most effective and practical solutions to a wide range of medical problems, including the diagnosis of cardiac arrhythmias. A critical step toward an early diagnosis of many heart dysfunction diseases is the accurate detection and classification of cardiac arrhythmias, which can be achieved via electrocardiograms (ECGs). Motivated by the desire to enhance conventional clinical methods for diagnosing cardiac arrhythmias, we introduce an uncertainty-aware deep learning-based predictive model for accurate large-scale classification of cardiac arrhythmias, successfully trained and evaluated using three benchmark medical datasets. In addition, considering that the quantification of uncertainty estimates is vital for clinical decision-making, our method incorporates a probabilistic approach to capture the model’s uncertainty using a Bayesian-based approximation method, without introducing additional parameters or significant changes to the network’s architecture. Although many arrhythmia classification solutions with various ECG feature engineering techniques have been reported in the literature, the AI-based probabilistic method introduced in this paper outperforms existing methods, with multiclass F1 scores of 98.62% and 96.73% on the MIT-BIH dataset (20 annotations), 99.23% and 96.94% on the INCART dataset (eight annotations), and 97.25% and 96.73% on the BIDMC dataset (six annotations), for the deep ensemble and probabilistic modes, respectively. We also demonstrate the high performance and statistical reliability of our method in numerical experiments on language modeling using the gating mechanism of recurrent neural networks.
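The abstract describes a Bayesian approximation that adds no parameters and barely changes the architecture; one widely used technique matching that description is Monte Carlo dropout, sketched below on a toy ECG-like input (this is an illustrative assumption, not necessarily the paper's exact method).

```python
# Monte Carlo dropout: keep dropout active at inference and average T stochastic passes.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(180, 1)),                  # toy single-lead ECG beat window
    tf.keras.layers.Conv1D(16, 5, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dropout(0.3),                    # kept active at inference time
    tf.keras.layers.Dense(5, activation="softmax"),  # toy arrhythmia classes
])

x = np.random.default_rng(0).standard_normal((8, 180, 1)).astype("float32")
# Run T stochastic forward passes with dropout enabled (training=True).
probs = np.stack([model(x, training=True).numpy() for _ in range(50)])
mean_pred = probs.mean(axis=0)       # predictive class probabilities
uncertainty = probs.std(axis=0)      # per-class spread as an uncertainty proxy
print(mean_pred.argmax(axis=1), uncertainty.max(axis=1))
```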


Cancers ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 2111
Author(s):  
Bo-Wei Zhao ◽  
Zhu-Hong You ◽  
Lun Hu ◽  
Zhen-Hao Guo ◽  
Lei Wang ◽  
...  

Identification of drug-target interactions (DTIs) is a significant step in the drug discovery or repositioning process. Compared with time-consuming and labor-intensive in vivo experimental methods, computational models can provide high-quality DTI candidates almost instantly. In this study, we propose a novel method called LGDTI to predict DTIs based on large-scale graph representation learning. LGDTI captures both the local and global structural information of the graph: the first-order neighbor information of nodes is aggregated by a graph convolutional network (GCN), while the high-order neighbor information of nodes is learned by the graph embedding method DeepWalk. Finally, the two kinds of features are fed into a random forest classifier to train and predict potential DTIs. The results show that our method obtained an area under the receiver operating characteristic curve (AUROC) of 0.9455 and an area under the precision-recall curve (AUPR) of 0.9491 under 5-fold cross-validation. Moreover, we compared the presented method with some existing state-of-the-art methods. These results imply that LGDTI can efficiently and robustly capture undiscovered DTIs. The proposed model is expected to bring new inspiration and provide novel perspectives to relevant researchers.
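A much-simplified sketch of the two feature streams described above, using a random toy graph rather than a real drug-target network: a one-hop, symmetrically normalized propagation stands in for the GCN's first-order aggregation, DeepWalk embeddings (truncated random walks plus skip-gram) capture higher-order structure, and the concatenated node features are fed to a random forest over candidate node pairs.

```python
# Simplified two-stream node featurization followed by random-forest edge scoring.
import numpy as np
import networkx as nx
from gensim.models import Word2Vec
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
G = nx.erdos_renyi_graph(200, 0.05, seed=0)          # stand-in drug-target graph
X = rng.random((200, 16))                            # initial node attributes (toy)

# --- local view: symmetric-normalized one-hop propagation (GCN-style aggregation)
A = nx.to_numpy_array(G) + np.eye(200)               # adjacency with self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
local_feat = D_inv_sqrt @ A @ D_inv_sqrt @ X

# --- global view: DeepWalk = truncated random walks + skip-gram embeddings
def random_walks(graph, num_walks=10, walk_len=20):
    walks = []
    for _ in range(num_walks):
        for node in graph.nodes():
            walk = [node]
            while len(walk) < walk_len:
                nbrs = list(graph.neighbors(walk[-1]))
                if not nbrs:
                    break
                walk.append(rng.choice(nbrs))
            walks.append([str(n) for n in walk])
    return walks

w2v = Word2Vec(random_walks(G), vector_size=32, window=5, sg=1, min_count=0, seed=0)
global_feat = np.array([w2v.wv[str(n)] for n in G.nodes()])

# --- fuse both views and classify candidate node pairs (placeholder labels)
node_feat = np.concatenate([local_feat, global_feat], axis=1)
edges = np.array(list(G.edges()))[:300]
pair_feat = np.concatenate([node_feat[edges[:, 0]], node_feat[edges[:, 1]]], axis=1)
labels = rng.integers(0, 2, len(pair_feat))
print(RandomForestClassifier(random_state=0).fit(pair_feat, labels).score(pair_feat, labels))
```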


2021 ◽  
Vol 288 ◽  
pp. 125519
Author(s):  
Carole Brunet ◽  
Oumarou Savadogo ◽  
Pierre Baptiste ◽  
Michel A. Bouchard ◽  
Céline Cholez ◽  
...  

Author(s):  
Hillary E. Swann-Thomsen ◽  
Jared Vineyard ◽  
John Hanks ◽  
Rylon Hofacer ◽  
Claire Sitts ◽  
...  

PURPOSE: The goal of this study was to evaluate the performance of a pediatric stratification tool that incorporates health and non-medical determinants to identify children and youth with special health care needs (CYSHCN) according to increasing levels of complexity, and to compare this method to existing tools for pediatric populations. METHODS: This retrospective cohort study examined pediatric patients aged 0 to 21 years who received care at our institution between 2012 and 2015. We used the St. Luke’s Children’s Acuity Tool (SLCAT) to evaluate mean differences in dollars billed, number of encounters, and number of problems on the problem list, and compared the SLCAT to the Pediatric Chronic Conditions Classification System version 2 (CCCv2). RESULTS: The SLCAT assigned pediatric patients to levels reflective of resource utilization and found that children with highly complex chronic conditions had significantly higher utilization than those with mildly or moderately complex conditions. The SLCAT identified 515 patients not identified by the CCCv2, nearly half of whom had a mental/behavioral health diagnosis. CONCLUSIONS: The findings of this study provide evidence that a tiered classification model incorporating all aspects of a child’s care may result in more accurate identification of CYSHCN. This would allow primary care providers and care coordination teams to match patients and families with the appropriate amount and type of care coordination services.


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 556
Author(s):  
Thaer Thaher ◽  
Mahmoud Saheb ◽  
Hamza Turabieh ◽  
Hamouda Chantar

Fake or false information on social media platforms is a significant challenge that deliberately misleads users through rumors, propaganda, or deceptive information about a person, organization, or service. Twitter is one of the most widely used social media platforms, especially in the Arab region, where the number of users is steadily increasing, accompanied by an increase in the rate of fake news. This has drawn the attention of researchers seeking to provide a safe online environment free of misleading information. This paper proposes a smart classification model for the early detection of fake news in Arabic tweets, utilizing Natural Language Processing (NLP) techniques, Machine Learning (ML) models, and the Harris Hawks Optimizer (HHO) as a wrapper-based feature selection approach. An Arabic Twitter corpus composed of 1862 previously annotated tweets was used to assess the efficiency of the proposed model. The Bag of Words (BoW) model is utilized with different term-weighting schemes for feature extraction. Eight well-known learning algorithms are investigated with varying combinations of features, including user-profile, content-based, and word features. Reported results show that Logistic Regression (LR) with Term Frequency-Inverse Document Frequency (TF-IDF) achieves the best rank. Moreover, feature selection based on the binary HHO algorithm plays a vital role in reducing dimensionality, thereby enhancing the learning model’s performance for fake news detection. Interestingly, the proposed BHHO-LR model yields an improvement of 5% compared with previous works on the same dataset.
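As a minimal illustration of the best-performing pairing reported above, the snippet below fits a TF-IDF bag-of-words representation with logistic regression on a few invented English placeholder tweets; the binary HHO wrapper feature selection is omitted.

```python
# TF-IDF bag-of-words + logistic regression on toy placeholder tweets.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = ["breaking: miracle cure found", "official statement released today",
          "shocking secret they hide", "weather forecast for tomorrow"]
labels = [1, 0, 1, 0]                      # 1 = fake, 0 = genuine (toy labels)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(tweets, labels)
print(clf.predict(["miracle secret cure released"]))
```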

