tenfold cross validation
Recently Published Documents


2021
Author(s): Jianmin Xu, Binghua Xu, Yipeng Li, Zhijian Su, Yueping Yao

Aims: This study presents a survival stratification model for gastric cancer based on multi-omics integration using bidirectional deep neural networks (BiDNNs). Methods: Using the survival-related representation features that the BiDNNs yielded by integrating transcriptomics and epigenomics data, K-means clustering was performed to group tumor samples into survival subgroups. The BiDNN-based model was validated using tenfold cross-validation and in two independent confirmation cohorts. Results: Using the BiDNN-based survival stratification model, patients were grouped into two survival subgroups (log-rank p = 9.05E-05). The subgroup classification was robustly validated in tenfold cross-validation (C-index = 0.65 ± 0.02) and in two confirmation cohorts (E-GEOD-26253, C-index = 0.609; E-GEOD-62254, C-index = 0.706). Conclusion: We propose and validate a robust and stable BiDNN-based survival stratification model for gastric cancer.
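The C-index reported above measures how well a predicted risk ordering agrees with observed survival. Below is a minimal sketch of Harrell's concordance index, assuming right-censored data given as survival times, event indicators (1 = event observed), and predicted risk scores where a higher score means a worse prognosis. The function name `harrell_c_index` is hypothetical, tied survival times are simply skipped, and the abstract does not state which C-index variant the authors used.

```python
from itertools import combinations

def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data (sketch).

    A pair is comparable when the subject with the shorter time had an
    observed event. It is concordant when that subject also has the higher
    predicted risk; tied risks count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the subject with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times or censored earlier subject: not comparable
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

For example, `harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1])` finds five comparable pairs, four of them concordant, and returns 0.8. A value of 0.5 corresponds to random ordering.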


2021, Vol 2021, pp. 1-10
Author(s): Abdulaziz Albahr, Marwan Albahar, Mohammed Thanoon, Muhammad Binsawad

Heart diseases are heterogeneous diseases comprising multiple subtypes. Early diagnosis and prognosis of heart disease are essential for the clinical management of patients. In this research, a new computational model for the early prediction of heart disease is proposed. The predictive model incorporates a new regularization method, RSD-ANN, which decays the weights according to the standard deviation of the weight matrices; its results are compared against those of its parent methods. RSD-ANN performs far better than existing methods. In our experiments, the average validation accuracy was 96.30% using either tenfold cross-validation or the holdout method.
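The abstract contrasts tenfold cross-validation with a holdout split. The sketch below shows how the two schemes partition sample indices, assuming a dataset of `n` samples that can be shuffled freely and a hypothetical 20% holdout fraction (the paper's actual split ratio is not stated here). Both function names are illustrative.

```python
import random

def holdout_split(n, test_fraction=0.2, seed=0):
    """Single train/test split: one shuffle, one held-out block."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]

def kfold_splits(n, k=10, seed=0):
    """k-fold cross-validation: every sample lands in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    splits, start = [], 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        splits.append((train, test))
        start += size
    return splits
```

With tenfold cross-validation, a model is trained ten times and the ten test-fold scores are averaged, so every sample is used for testing exactly once; the holdout method yields a single estimate from one test set.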


2021, Vol 21 (1)
Author(s): Xiaoting Liu, Chenhao Fang, Chao Wu, Jianxing Yu, Qi Zhao

Background: Diagnosis-related groups (DRGs) are a payment system that can effectively curb excessive increases in healthcare costs and have been applied as a principal measure in China's healthcare reform. However, expert-oriented DRG grouping is a black box with the drawbacks of upcoding and high cost. Methods: This study proposes a data-based grouping method, designed and updated by machine learning algorithms, which can be trained on real cases or even on simulated cases. It inherits the decision-making rules of expert-oriented grouping and improves performance through continuous, low-cost updates. Five typical classification algorithms were assessed, and suggestions are made for algorithm choice. Kappa coefficients were reported to evaluate grouping performance. Results: Based on tenfold cross-validation, experiments showed that data-based grouping achieved classification performance similar to expert-oriented grouping when suitable algorithms were chosen. Groupings trained on simulated cases were less accurate when tested on real cases than on simulated cases, but the kappa coefficients of the best model were still higher than 0.6. When the grouping was tested in a new DRG system, the update significantly raised the average kappa coefficient from 0.1534 to 0.6435, and with sufficient computational resources the update could be completed in a very short time. Conclusions: As a new potential option, data-based grouping meets the requirements of the DRG system and offers high transparency and low cost in the design and update process.
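Kappa corrects raw agreement for the agreement expected by chance: κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the chance agreement implied by each grouping's marginal frequencies. A minimal sketch of Cohen's kappa between two groupings of the same cases (for example, expert-oriented versus data-based DRG assignments), assuming each grouping is a list of group labels; the abstract does not say which kappa variant was used.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two label assignments of the same cases."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of cases assigned to the same group.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each grouping's marginal group frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[g] * freq_b[g] for g in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both groupings put every case in one and the same group
    return (p_o - p_e) / (1 - p_e)
```

For instance, groupings `["A", "A", "B", "B"]` and `["A", "B", "B", "B"]` agree on 75% of cases with 50% expected by chance, giving κ = 0.5.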


2021, Vol 22 (1)
Author(s): Fei He, Jingyi Li, Rui Wang, Xiaowei Zhao, Ye Han

Background: Several computational tools for predicting protein Ubiquitylation and SUMOylation sites have been proposed to study the regulatory roles of these modifications in gene location, gene expression, and genome replication. However, existing methods generally rely on feature engineering and ignore the natural similarity between these two types of post-translational modification. This study presents the first all-in-one deep network that simultaneously predicts protein Ubiquitylation and SUMOylation sites, as well as their crosstalk sites, from protein sequences. Our deep learning architecture integrates several meta-classifiers that apply deep neural networks to protein sequence information and physico-chemical properties; these were trained in multi-label classification mode to identify Ubiquitylation, SUMOylation, and crosstalk sites simultaneously. Results: In tenfold cross-validation, our method achieved promising AUCs of 0.838, 0.888, and 0.862 on Ubiquitylation, SUMOylation, and crosstalk sites, respectively. The corresponding average precisions (APs) reached 0.683, 0.804, and 0.552, further supporting its effectiveness. Conclusions: The proposed architecture classified ubiquitylated and SUMOylated lysine residues along with their crosstalk sites, and outperformed other well-known Ubiquitylation and SUMOylation site prediction tools.
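Each AUC above can be read as the probability that a randomly chosen positive site receives a higher score than a randomly chosen negative one. A minimal per-label sketch of that pairwise (Mann-Whitney) formulation, assuming binary labels (1 = modified site) and real-valued scores for a single label of the multi-label output; in a multi-label setting it would be computed separately for Ubiquitylation, SUMOylation, and crosstalk sites. The O(n²) loop is for clarity, not speed.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of positive-negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # tied scores count half
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])` ranks three of the four positive-negative pairs correctly and returns 0.75.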


Author(s): Mohammed Alghobiri, Hikmat Ullah Khan, Ahsan Mahmood

The human liver is one of the major organs in the body, and liver disease can cause many problems in human life. As liver disease has become more common, researchers have proposed various data mining techniques to predict it, and these techniques continue to improve for predicting and diagnosing liver disease in humans. In this paper, a real-world liver disease dataset is used to diagnose liver disease. For this purpose, feature selection models are used to select the features that are most important for diagnosing liver disease. After feature selection and splitting the data for training and testing, different classification algorithms, namely naïve Bayes, support vector machine, decision tree, k-nearest neighbor, and logistic regression, are applied to diagnose liver disease. The results are validated with tenfold cross-validation and reach an accuracy of up to 93%.
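Of the classifiers listed, k-nearest neighbor is the simplest to sketch. A minimal version, assuming numeric feature vectors (for example, the selected liver-disease features), Euclidean distance, and a hypothetical k = 3; the paper's actual k and distance metric are not given, and `knn_predict` is an illustrative name.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of `x` by majority vote of its k nearest neighbors."""
    # Rank training samples by Euclidean distance to the query point.
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: math.dist(train_X[i], x),
    )[:k]
    # Majority vote; equal counts resolve to the label seen first,
    # i.e. the one whose nearest member is closest to the query.
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

Inside tenfold cross-validation, `train_X` and `train_y` would be the nine training folds, and `knn_predict` would be called on each sample of the held-out fold.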


2021, Vol 11 (1)
Author(s): Seref Gul, Fatih Rahim, Safak Isin, Fatma Yilmaz, Nuri Ozturk, ...

Circadian rhythm is an important mechanism that controls behavior and biochemical events based on 24 h rhythmicity. Ample evidence indicates that disturbance of this mechanism is associated with different diseases such as cancer, mood disorders, and familial delayed phase sleep disorder. Therefore, drug discovery studies have been initiated using high-throughput screening. Recently, the crystal structures of the core clock proteins responsible for generating circadian rhythm (CLOCK/BMAL1, Cryptochromes (CRY), and Periods) have been solved. The availability of these structures makes the core clock proteins amenable to the in silico design of molecules that regulate their activity. In addition, classifying molecules by features related to their toxicity and activity will improve the accuracy of the drug discovery process. Here, we identified 171 molecules that target functional domains of a core clock protein, CRY1, using structure-based drug design methods. We experimentally determined that 115 molecules were nontoxic and that 21 molecules significantly lengthened the period of circadian rhythm in U2OS cells. We then performed a machine learning study to classify these molecules and identify the features that make them toxic and lengthen the circadian period. Decision tree classifiers (DTC) identified 13 molecular descriptors that predict the toxicity of molecules with a mean accuracy of 79.53% using tenfold cross-validation. Gradient boosting classifiers (XGBC) identified 10 molecular descriptors that predict an increase in circadian period length with a mean accuracy of 86.56% using tenfold cross-validation. Our results suggest that these features can be used in QSAR studies to design novel nontoxic molecules that exhibit period-lengthening activity.
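A "mean accuracy using tenfold cross-validation" is the average of the per-fold test accuracies. A minimal sketch of that aggregation, assuming the true and predicted labels of each test fold are already available; it also returns the sample standard deviation across folds, one common way to report spread (the abstract reports only the mean). The function names are illustrative.

```python
import statistics

def fold_accuracy(y_true, y_pred):
    """Fraction of correct predictions in one test fold."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cv_mean_accuracy(folds):
    """folds: list of (y_true, y_pred) pairs, one per test fold (needs >= 2)."""
    accs = [fold_accuracy(t, p) for t, p in folds]
    return statistics.mean(accs), statistics.stdev(accs)
```

Note that the mean of per-fold accuracies can differ slightly from the pooled accuracy over all predictions when folds have unequal sizes.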


2021, Vol 11 (1)
Author(s): Geng Hong, Xiaoyan Chen, Jianyong Chen, Miao Zhang, Yumeng Ren, ...

Coronavirus disease 2019 (COVID-19) is a new acute respiratory disease that has spread rapidly throughout the world. In this paper, a lightweight convolutional neural network (CNN) model named multi-scale gated multi-head attention depthwise separable CNN (MGMADS-CNN) is proposed, based on an attention mechanism and depthwise separable convolution. A multi-scale gated multi-head attention mechanism is designed to extract effective feature information from COVID-19 X-ray and CT images for classification. Moreover, depthwise separable convolution layers are adopted as MGMADS-CNN's backbone to reduce the model size and number of parameters. LeNet-5, AlexNet, GoogLeNet, ResNet, VGGNet-16, and three MGMADS-CNN models are trained, validated, and tested with tenfold cross-validation on X-ray and CT images. The results show that MGMADS-CNN with three attention layers (MGMADS-3) achieved accuracies of 96.75% on X-ray images and 98.25% on CT images. Its specificity and sensitivity are 98.06% and 96.6% on X-ray images, and 98.17% and 98.05% on CT images. The MGMADS-3 model is only 43.6 MB in size. In addition, MGMADS-3 takes 6.09 ms per X-ray image and 4.23 ms per CT image for detection. These results indicate that MGMADS-3 can detect and classify COVID-19 faster and with higher accuracy and efficiency.
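Depthwise separable convolution reduces model size because it factors one k x k convolution across all channels into a per-channel k x k depthwise filter followed by a 1 x 1 pointwise convolution. A minimal parameter-count sketch (weights only, biases ignored); the kernel and channel sizes in the example are illustrative and not taken from MGMADS-CNN.

```python
def standard_conv_params(k, c_in, c_out):
    # One k x k filter spanning all input channels for each output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution that mixes channels
    return depthwise + pointwise
```

With k = 3, 64 input channels, and 128 output channels, a standard layer needs 73,728 weights while the depthwise separable version needs 8,768, roughly an 8.4x reduction.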


Electronics, 2021, Vol 10 (14), pp. 1740
Author(s): Hui Wen Loh, Chui Ping Ooi, Elizabeth Palmer, Prabal Datta Barua, Sengul Dogan, ...

Parkinson’s disease (PD) is the most common neurodegenerative movement disorder worldwide. It is characterized by a loss of dopaminergic neurons in the substantia nigra of the brain. However, current methods that diagnose PD on the basis of the clinical features of Parkinsonism may lead to misdiagnoses. Hence, noninvasive measures such as electroencephalographic (EEG) recordings of PD patients can serve as an alternative biomarker. In this study, a deep learning model is proposed for automated PD diagnosis. EEG recordings of 16 healthy controls and 15 PD patients were used for analysis. Using the Gabor transform, the EEG recordings were converted into spectrograms, which were used to train the proposed two-dimensional convolutional neural network (2D-CNN) model. The proposed model achieved a high classification accuracy of 99.46% (±0.73) for 3-class classification (healthy controls, and PD patients with and without medication) using tenfold cross-validation. This indicates the potential of the proposed model to automatically and simultaneously detect PD patients and their medication status. The proposed model is ready to be validated with a larger database before implementation as a computer-aided diagnostic (CAD) tool for clinical decision support.
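The Gabor transform is a short-time Fourier transform with a Gaussian window, and its magnitude gives a spectrogram image of the kind fed to the 2D-CNN. A minimal NumPy sketch, assuming a single-channel EEG segment and hypothetical window length, hop size, and Gaussian width (the abstract does not report the authors' settings); `gabor_spectrogram` is an illustrative name.

```python
import numpy as np

def gabor_spectrogram(signal, fs, win_len=256, hop=64, sigma=None):
    """Magnitude of a Gaussian-windowed STFT (a discrete Gabor transform).

    Returns (freqs, times, mag), where mag has shape (n_freqs, n_frames).
    """
    x = np.asarray(signal, dtype=float)
    if x.size < win_len:
        raise ValueError("signal shorter than one window")
    if sigma is None:
        sigma = win_len / 6  # Gaussian width in samples (assumed default)
    n = np.arange(win_len)
    window = np.exp(-0.5 * ((n - (win_len - 1) / 2) / sigma) ** 2)
    starts = np.arange(0, x.size - win_len + 1, hop)
    # Stack the Gaussian-windowed frames, then take a one-sided FFT of each.
    frames = np.stack([x[s:s + win_len] * window for s in starts])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T
    freqs = np.fft.rfftfreq(win_len, d=1 / fs)
    times = (starts + win_len / 2) / fs  # frame centers in seconds
    return freqs, times, mag
```

For a 4 s, 10 Hz sine sampled at 256 Hz, this returns a 129 x 13 magnitude array whose energy peaks in the 10 Hz row; such arrays (often log-scaled) would form the CNN input images.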


2021, Vol 22 (1)
Author(s): Alvaro Ras-Carmona, Marta Gomez-Perosanz, Pedro A. Reche

Motivation: In eukaryotes, proteins targeted for secretion contain a signal peptide, which allows them to proceed through the conventional ER/Golgi-dependent pathway. However, a substantial number of proteins lacking a signal peptide can be secreted through unconventional routes, including one mediated by exosomes. Currently, no method is available to predict protein secretion via exosomes. Results: Here, we first assembled a dataset including the sequences of 2992 proteins secreted by exosomes and 2961 proteins that are not secreted by exosomes. Subsequently, we trained different random forest models on feature vectors derived from the sequences in this dataset. In tenfold cross-validation, the best model was trained on dipeptide composition, reaching an accuracy of 69.88% ± 2.08 and an area under the curve (AUC) of 0.76 ± 0.03. On an independent dataset, this model reached an accuracy of 75.73% and an AUC of 0.840. Based on these results, we developed ExoPred, a web-based tool that uses random forests to predict protein secretion by exosomes. Conclusion: ExoPred is available for free public use at http://imath.med.ucm.es/exopred/. Datasets are available at http://imath.med.ucm.es/exopred/datasets/.
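Dipeptide composition represents a sequence by the relative frequency of each of the 400 ordered pairs of standard amino acids at adjacent positions. A minimal sketch, assuming the 20 standard one-letter codes and that pairs containing any other letter are skipped; ExoPred's exact normalization is not described in the abstract, and the names below are illustrative.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs

def dipeptide_composition(seq):
    """Return a 400-dimensional vector of adjacent-pair frequencies."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```

For example, `"ACAC"` contains the pairs AC, CA, and AC, so its vector has 2/3 at the AC position, 1/3 at CA, and zeros elsewhere. One such vector per protein would form a row of the random forest's training matrix.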


2021, Vol 22 (1)
Author(s): Farshid Shirafkan, Sajjad Gharaghani, Karim Rahimian, Reza Hasan Sajedi, Javad Zahiri

Background: Moonlighting proteins (MPs) are a subclass of multifunctional proteins in which more than one independent, usually distinct, function occurs in a single polypeptide chain. Identifying unknown cellular processes, understanding novel protein mechanisms, improving the prediction of protein functions, and gaining information about protein evolution are the main reasons to study MPs. MPs also play an important role in disease pathways and drug-target discovery. Since detecting MPs experimentally is quite a challenge, most of them have been detected by chance. Therefore, an appropriate computational approach to predict MPs is needed. Results: In this study, we introduced a competent model for distinguishing moonlighting from non-moonlighting proteins using features extracted from protein sequences. We also set up a well-judged scheme for detecting outlier proteins. In total, 37 distinct feature vectors were used to study each protein's impact on detecting MPs, and 8 different classification methods were assessed to find the best performance. To detect outliers, each classification method was executed 100 times with tenfold cross-validation on each feature vector, and proteins that were misclassified 90 times or more were grouped together. This process was applied to every feature vector, and the intersection of these groups was taken as the set of outlier proteins. The results of tenfold cross-validation on a dataset of 351 samples (215 moonlighting and 136 non-moonlighting proteins) reveal that the SVM method on all feature vectors has the highest performance among all methods in this study and other available methods. In addition, the outlier analysis showed that 57 of the 351 proteins in the dataset could be appropriate outlier candidates. Among the outlier proteins were non-MPs (such as P69797) that were misclassified by 8 different classification methods with 16 different feature vectors. Because these proteins were obtained by computational methods, the results of this study raise doubt about whether they are non-moonlighting at all. Conclusions: MPs are difficult to identify experimentally. Using distinct feature vectors, our method enabled the identification of novel moonlighting proteins. The study also suggests that a number of non-MPs are likely to be moonlighting.
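The outlier procedure amounts to counting, for each feature vector, how often each protein is misclassified over 100 repeated tenfold cross-validation runs, flagging proteins misclassified at least 90 times, and intersecting the flagged sets across feature vectors. A minimal sketch of that bookkeeping for a single classifier, assuming a hypothetical `run_cv_once(feature_vector)` callable that returns the set of protein IDs misclassified in one tenfold run; the classifiers themselves are not shown.

```python
from collections import Counter

def flag_outliers(feature_vectors, run_cv_once, n_runs=100, threshold=90):
    """Intersect, across feature vectors, the proteins misclassified in at
    least `threshold` of `n_runs` repeated tenfold cross-validation runs."""
    outliers = None
    for fv in feature_vectors:
        miss_counts = Counter()
        for _ in range(n_runs):
            # `run_cv_once` is a hypothetical hook: one full tenfold CV run.
            miss_counts.update(run_cv_once(fv))
        flagged = {p for p, c in miss_counts.items() if c >= threshold}
        outliers = flagged if outliers is None else outliers & flagged
    return outliers if outliers is not None else set()
```

Repeating the loop for each of the eight classification methods and intersecting again would mirror the abstract's description more closely; the single-classifier version keeps the sketch short.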

