IILLS: predicting virus-receptor interactions based on similarity and semi-supervised learning

2019 ◽  
Vol 20 (S23) ◽  
Author(s):  
Cheng Yan ◽  
Guihua Duan ◽  
Fang-Xiang Wu ◽  
Jianxin Wang

Abstract Background Viral infectious diseases are a serious threat to human health. Receptor binding is the first step in the viral infection of a host. To treat human viral infectious diseases more effectively, hidden virus-receptor interactions must be discovered. However, current computational methods for predicting virus-receptor interactions are limited. Results In this study, we propose a new computational method, IILLS, which predicts virus-receptor interactions by combining a neighbor-based Initial Interaction scores method with the Laplacian regularized Least Squares algorithm. IILLS integrates the known virus-receptor interactions and the amino acid sequences of receptors. Virus similarity is calculated with the Gaussian Interaction Profile (GIP) kernel. For receptors, we compute both the GIP similarity and the sequence similarity; based on the prediction results, the sequence similarity is used as the final receptor similarity. 10-fold cross-validation (10CV) and leave-one-out cross-validation (LOOCV) are used to assess the prediction performance of our method. We also compare our method with three competing methods (BRWH, LapRLS, CMF). Conclusion The experimental results show that IILLS achieves AUC values of 0.8675 and 0.9061 under 10CV and LOOCV, respectively, outperforming the competing methods. In addition, case studies further indicate that IILLS is effective for virus-receptor interaction prediction.
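The GIP kernel mentioned above turns binary interaction profiles into a similarity matrix. The sketch below uses the commonly cited formulation K(i, j) = exp(-γ‖IP(i) − IP(j)‖²), with γ set to a base bandwidth γ′ divided by the mean squared profile norm. It is a minimal illustration under the assumption that IILLS follows this standard form. The default γ′ = 1.0 and the function name `gip_kernel` are hypothetical, and the sketch does not reproduce the IILLS scoring step.

```python
import math
from typing import List


def gip_kernel(profiles: List[List[int]], gamma_prime: float = 1.0) -> List[List[float]]:
    """Gaussian Interaction Profile (GIP) kernel similarity (illustrative sketch).

    profiles[i] is the binary interaction profile of entity i, e.g. one virus's
    row of the virus-receptor adjacency matrix. gamma_prime = 1.0 is an assumed
    default; the bandwidth is normalised by the mean squared profile norm.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    # Guard against an all-zero matrix, where the bandwidth would be undefined.
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * dist_sq)
    return sim


# Rows = viruses, columns = receptors (toy data).
A = [[1, 0, 1], [1, 0, 0], [0, 1, 0]]
virus_sim = gip_kernel(A)
receptor_sim = gip_kernel([list(col) for col in zip(*A)])  # columns as profiles
```

To get the receptor-side GIP similarity, the same function is applied to the columns of the adjacency matrix, as in the last line.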

2016 ◽  
Vol 2016 ◽  
pp. 1-7 ◽  
Author(s):  
Hong-Jhang Chen ◽  
Yii-Jeng Lin ◽  
Pei-Chen Wu ◽  
Wei-Hsiang Hsu ◽  
Wan-Chung Hu ◽  
...  

Traditional Chinese medicine (TCM) formulates treatment according to body constitution (BC) differentiation. Different constitutions have specific metabolic characteristics and different susceptibilities to certain diseases. This study aimed to assess the Yang-Xu constitution using a body constitution questionnaire (BCQ) and clinical blood variables. A BCQ was employed to assess the clinical manifestations of Yang-Xu. A logistic regression model was used to explore the relationship between BC scores and biomarkers. Leave-one-out cross-validation (LOOCV) and K-fold cross-validation were performed to evaluate the accuracy of the predictive model in practice. Decision trees (DTs) were constructed to determine possible relationships between blood biomarkers and BC scores. According to the BCQ analysis, 49% of participants showed no BC and were classified as healthy subjects. Of all participants, 130 samples were selected for further analysis and divided into two groups. One group comprised healthy subjects without any BC (68%), while subjects in the other group, named the sub-healthy group, had three BCs (32%). Six biomarkers, CRE, TSH, HB, MONO, RBC, and LH, were found to have the greatest impact on BCQ outcomes in Yang-Xu subjects. This study indicated significant biochemical differences in Yang-Xu subjects, which may provide a connection between blood variables and the Yang-Xu BC.


Author(s):  
WASIF AFZAL ◽  
RICHARD TORKAR ◽  
ROBERT FELDT

Given the number of algorithms available for classification and prediction in software engineering, a systematic way of assessing their performance is needed. Performance assessment is typically done by partitioning or resampling the original data to alleviate biased estimation. For predictive and classification studies in software engineering, there is no definitive advice on the most appropriate resampling method to use. This is seen as one of the factors preventing general conclusions about which modeling technique or set of predictor variables is most appropriate. Furthermore, the use of a variety of resampling methods makes it impossible to perform any formal meta-analysis of the primary study results. It is therefore desirable to examine the influence of various resampling methods and to quantify possible differences. Objective and method: This study empirically compares five common resampling methods (hold-out validation, repeated random sub-sampling, 10-fold cross-validation, leave-one-out cross-validation and non-parametric bootstrapping) using 8 publicly available data sets, with genetic programming (GP) and multiple linear regression (MLR) as software quality classification approaches. The location of (PF, PD) pairs (probability of false alarm, probability of detection) in the ROC (receiver operating characteristic) space and the area under the ROC curve (AUC) are used as accuracy indicators. Results: In terms of the location of (PF, PD) pairs in the ROC space, bootstrapping results are in the preferred region for 3 of the 8 data sets for GP and for 4 of the 8 data sets for MLR. Based on the AUC measure, there are no significant differences between the resampling methods for either GP or MLR. Conclusion: Certain data set properties may be responsible for the insignificant AUC differences between the resampling methods. These include imbalanced data sets, insignificant predictor variables and high-dimensional data sets. With the current selection of data sets and classification techniques, bootstrapping is the preferred method based on the location of (PF, PD) pairs in the ROC space. Hold-out validation is not a good choice for comparatively small data sets, where leave-one-out cross-validation (LOOCV) performs better. For comparatively large data sets, 10-fold cross-validation performs better than LOOCV.
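The resampling schemes compared above differ only in how the training and test indices are drawn. The sketch below generates k-fold splits, where LOOCV is the special case k = n, and a non-parametric bootstrap split. It is a minimal illustration, not the study's code. Using the out-of-bag indices as the bootstrap test set is one common variant and is an assumption here, and the function names are hypothetical.

```python
import random
from typing import List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def k_fold_splits(n: int, k: int, seed: int = 0) -> List[Split]:
    """Shuffle indices 0..n-1, partition them into k folds, and use each fold
    once as the test set. LOOCV is the special case k == n."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    splits = []
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        splits.append((train, folds[f]))
    return splits


def bootstrap_split(n: int, seed: int = 0) -> Split:
    """Non-parametric bootstrap: draw n training indices with replacement; the
    out-of-bag indices (never drawn) form the test set (assumed variant)."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    test = [i for i in range(n) if i not in drawn]
    return train, test
```

For example, `k_fold_splits(n, 10)` gives 10-fold cross-validation and `k_fold_splits(n, n)` gives LOOCV. Repeating `bootstrap_split` with different seeds yields the bootstrap replicates.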


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Feng Zhou ◽  
Meng-Meng Yin ◽  
Cui-Na Jiao ◽  
Zhen Cui ◽  
Jing-Xiu Zhao ◽  
...  

Abstract Background With the rapid development of advanced biotechnologies, researchers have realized that microRNAs (miRNAs) play critical roles in many serious human diseases. However, experimental identification of new miRNA–disease associations (MDAs) is expensive and time-consuming, and practitioners have shown growing interest in methods for predicting potential MDAs. In recent years, an increasing number of computational methods for predicting novel MDAs have been developed, contributing greatly to research on human diseases and saving considerable time. In this paper, we propose an efficient computational method, bipartite graph-based collaborative matrix factorization (BGCMF), which is well suited to predicting novel MDAs. Results By combining two improved recommendation methods, we generate a new model for predicting MDAs. Because some new miRNAs and diseases have no known associations, we combine a bipartite graph with collaborative matrix factorization to complete the prediction. BGCMF achieves a desirable result, with an AUC of up to 0.9514 ± 0.0007 in five-fold cross-validation experiments. Conclusions Five-fold cross-validation is used to evaluate the capabilities of our method, and simulation experiments are implemented to predict new MDAs. More importantly, the AUC of our method is higher than those of several state-of-the-art methods. Finally, many associations between new miRNAs and new diseases are successfully predicted in simulation experiments, indicating that BGCMF is a useful method for identifying more miRNAs with potential roles in various diseases.
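Matrix-factorization recommenders like the one described above score unobserved miRNA-disease pairs by approximating the association matrix Y as a product of two low-rank factor matrices. The sketch below is plain L2-regularised matrix factorization trained by stochastic gradient descent. It is a deliberately simpler technique than BGCMF, which adds a bipartite graph and collaborative similarity terms that this code does not model. The rank, learning rate, regularisation and epoch count are hypothetical defaults.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def squared_error(Y: Matrix, A: Matrix, B: Matrix) -> float:
    """Sum of squared differences between Y and the reconstruction A·Bᵀ."""
    return sum(
        (Y[i][j] - sum(a * b for a, b in zip(A[i], B[j]))) ** 2
        for i in range(len(Y))
        for j in range(len(Y[0]))
    )


def factorize(Y: Matrix, rank: int = 2, lr: float = 0.05, reg: float = 0.01,
              epochs: int = 500, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Approximate Y (miRNAs x diseases) as A·Bᵀ with L2-regularised SGD.
    Plain matrix factorization, not BGCMF itself."""
    rng = random.Random(seed)
    m, n = len(Y), len(Y[0])
    A = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(m)]
    B = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n)]
    for _ in range(epochs):
        for i in range(m):
            for j in range(n):
                err = Y[i][j] - sum(a * b for a, b in zip(A[i], B[j]))
                for r in range(rank):
                    a_ir, b_jr = A[i][r], B[j][r]  # use pre-update values for both
                    A[i][r] += lr * (err * b_jr - reg * a_ir)
                    B[j][r] += lr * (err * a_ir - reg * b_jr)
    return A, B
```

After training, the reconstructed score A[i]·B[j] for an unobserved pair (i, j) serves as its predicted association strength.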


2020 ◽  
Author(s):  
Zekuan Yu ◽  
Xiaohu Li ◽  
Haitao Sun ◽  
Jian Wang ◽  
Tongtong Zhao ◽  
...  

Abstract Background: To implement real-time diagnosis of the severity of patients infected with novel coronavirus 2019 (COVID-19) and to guide follow-up therapeutic treatment, we collected chest CT scans of 202 patients diagnosed with COVID-19 from three hospitals in Anhui Province, China. Methods: A total of 729 2D axial plane slices, 246 from severe cases and 483 from non-severe cases, were employed in this study. Four pre-trained deep models (Inception-V3, ResNet-50, ResNet-101, DenseNet-201) combined with multiple classifiers (linear discriminant, linear SVM, cubic SVM, KNN and AdaBoost decision tree) were applied to distinguish severe from non-severe COVID-19 cases. Three validation strategies (holdout validation, 10-fold cross-validation and leave-one-out) were employed to validate the feasibility of the proposed pipelines. Results and conclusion: The experimental results demonstrate that classifying features from pre-trained deep models is promising for COVID-19 screening, with the DenseNet-201 and cubic SVM model achieving the best performance. Specifically, it achieved the highest severity classification accuracies of 95.20% and 95.34% for 10-fold cross-validation and leave-one-out, respectively. The established pipeline achieved rapid and accurate identification of COVID-19 severity, which may help physicians make more efficient and reliable decisions.


Author(s):  
Xing Chen ◽  
Tian-Hao Li ◽  
Yan Zhao ◽  
Chun-Chun Wang ◽  
Chi-Chi Zhu

Abstract MicroRNA (miRNA) plays an important role in the occurrence, development, diagnosis and treatment of diseases, and more and more researchers are paying attention to the relationship between miRNAs and diseases. Compared with traditional biological experiments, computational methods that integrate heterogeneous biological data to predict potential associations can effectively save time and cost. Considering the limitations of previous computational models, we developed a deep-belief network model for miRNA-disease association prediction (DBNMDA). We constructed feature vectors for all miRNA-disease pairs to pre-train restricted Boltzmann machines. We then used the positive samples together with an equal number of selected negative samples to fine-tune the DBN and obtain the final prediction scores. Whereas previous supervised models use only pairs with known labels for training, DBNMDA utilizes the information of all miRNA-disease pairs during pre-training. This step could, to some extent, reduce the impact of having too few known associations on prediction accuracy. DBNMDA achieves an AUC of 0.9104 in global leave-one-out cross-validation (LOOCV), an AUC of 0.8232 in local LOOCV and an average AUC of 0.9048 ± 0.0026 in 5-fold cross-validation, all better than those of previous models. In addition, three different types of case studies on three diseases were implemented to demonstrate the accuracy of DBNMDA. As a result, 84% (breast neoplasms), 100% (lung neoplasms) and 88% (esophageal neoplasms) of the top 50 predicted miRNAs were verified by recent literature. We therefore conclude that DBNMDA is an effective method for predicting potential miRNA-disease associations.
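The fine-tuning step above pairs every known (positive) association with an equal number of negatives drawn from the unlabelled pairs. The abstract does not say how those negatives are selected. The sketch below assumes uniform random sampling, which is a common baseline and is not necessarily DBNMDA's strategy. The function name is hypothetical.

```python
import random
from typing import List, Set, Tuple

Pair = Tuple[int, int]  # (miRNA index, disease index)


def balanced_training_pairs(known: Set[Pair], n_mirna: int, n_disease: int,
                            seed: int = 0) -> Tuple[List[Pair], List[Pair]]:
    """Return the known (positive) pairs plus an equal number of negatives,
    drawn uniformly at random (assumed strategy) from the unlabelled pairs."""
    unlabelled = [(m, d) for m in range(n_mirna) for d in range(n_disease)
                  if (m, d) not in known]
    positives = sorted(known)
    # random.sample draws without replacement, so negatives are distinct.
    negatives = random.Random(seed).sample(unlabelled, len(positives))
    return positives, negatives
```

Unsupervised pre-training would still use every pair, as the abstract describes. Only the supervised fine-tuning set is balanced this way.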


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Da Xu ◽  
Hanxiao Xu ◽  
Yusen Zhang ◽  
Mingyi Wang ◽  
Wei Chen ◽  
...  

Abstract Background Microbes are closely related to human health and disease. Identifying disease-related microbes is of great significance for revealing the pathological mechanisms of human diseases and understanding the interaction mechanisms between microbes and humans. It is also useful for the prevention, diagnosis and treatment of human diseases. Because the known disease-related microbes are still insufficient, effective computational methods are needed to reduce the time and cost of biological experiments. Methods In this work, we developed a novel computational method called MDAKRLS to discover potential microbe-disease associations (MDAs) based on Kronecker regularized least squares. Specifically, in addition to the Gaussian interaction profile kernel similarity, we introduced the Hamming interaction profile similarity to measure the similarities of microbes and diseases. We then used the Kronecker product to construct two kinds of Kronecker similarities between microbe-disease pairs. Next, we applied Kronecker regularized least squares with each kind of Kronecker similarity to obtain prediction scores, and calculated the final prediction scores by integrating the contributions of the different similarities. Results The AUC values of global leave-one-out cross-validation and 5-fold cross-validation achieved by MDAKRLS were 0.9327 and 0.9023 ± 0.0015, respectively, significantly higher than those of the five state-of-the-art methods used for comparison. The comparison results also show that MDAKRLS computes faster under two kinds of frameworks. In addition, case studies of inflammatory bowel disease (IBD) and asthma showed that, for each disease, 19 of the top 20 predicted disease-related microbes could be verified by previously published biological or medical literature. Conclusions All the evaluation results demonstrate that MDAKRLS has effective and reliable prediction performance. It may be a useful tool for finding new disease-related microbes and helping biomedical researchers carry out follow-up studies.
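The two building blocks named above can be sketched in a few lines. The first is a profile-based similarity for microbes and diseases. The second is the Kronecker product that lifts two entity-level similarity matrices to a similarity between microbe-disease pairs, so that the entry for pairs (m, d) and (m′, d′) is K_m[m][m′] · K_d[d][d′]. The Hamming interaction profile similarity is assumed here to be 1 minus the fraction of differing profile positions, since the abstract does not give the formula. The regularized least-squares solve that MDAKRLS performs on top of these matrices is not shown.

```python
from typing import List

Matrix = List[List[float]]


def hamming_profile_similarity(profiles: List[List[int]]) -> Matrix:
    """Assumed Hamming interaction profile similarity:
    1 - (number of differing positions) / (profile length)."""
    n, length = len(profiles), len(profiles[0])
    return [[1.0 - sum(a != b for a, b in zip(profiles[i], profiles[j])) / length
             for j in range(n)] for i in range(n)]


def kronecker(K_m: Matrix, K_d: Matrix) -> Matrix:
    """Kronecker product K_m ⊗ K_d. Row r encodes the pair
    (microbe r // p, disease r % p), where p is the number of diseases."""
    p = len(K_d)
    size = len(K_m) * p
    return [[K_m[r // p][c // p] * K_d[r % p][c % p] for c in range(size)]
            for r in range(size)]


# Rows = microbes, columns = diseases (toy data).
Y = [[1, 0, 1], [1, 1, 0]]
K_microbe = hamming_profile_similarity(Y)
K_disease = hamming_profile_similarity([list(col) for col in zip(*Y)])
K_pair = kronecker(K_microbe, K_disease)  # 6 x 6 pair-level similarity
```

The pair matrix grows as (microbes × diseases)², which is why Kronecker-based solvers usually work with the two small factor matrices instead of forming K_pair explicitly.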


2015 ◽  
Vol 80 (4) ◽  
pp. 499-508 ◽  
Author(s):  
Long Jiao ◽  
Xiaofei Wang ◽  
Shan Bing ◽  
Zhiwei Xue ◽  
Hua Li

The quantitative structure-property relationship (QSPR) for the supercooled liquid vapour pressures (PL) of PBDEs was investigated. The molecular distance-edge vector (MDEV) index was used as the structural descriptor. The quantitative relationship between the MDEV index and lgPL was modeled using multivariate linear regression (MLR) and an artificial neural network (ANN), respectively. Leave-one-out cross-validation and k-fold cross-validation were carried out to assess the prediction ability of the developed models. For the MLR method, the prediction root mean square relative error (RMSRE) of leave-one-out and k-fold cross-validation is 9.95 and 9.05, respectively. For the ANN method, the corresponding RMSRE values are 8.75 and 8.31. These results demonstrate that the established models are practicable for predicting the lgPL of PBDEs. The MDEV index is quantitatively related to the lgPL of PBDEs, and both MLR and L-ANN are practicable for modeling this relationship. Compared with MLR, ANN shows slightly higher prediction accuracy. Subsequently, an MLR model, whose regression equation is lgPL = 0.2868 M11 - 0.8449 M12 - 0.0605, and an ANN model, which is a two-input linear network, were developed. The two models can be used to predict the lgPL of each PBDE.
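The reported MLR equation can be applied directly once the two MDEV descriptors M11 and M12 of a congener are known. The sketch below evaluates that equation and computes an RMSRE. The RMSRE is assumed to be the square root of the mean squared relative error, expressed in percent, because the abstract neither defines it nor states its units. The descriptor values in the usage example are placeholders, not real PBDE data.

```python
import math
from typing import Sequence


def predict_lg_pl(m11: float, m12: float) -> float:
    """MLR equation reported in the abstract: lgPL = 0.2868 M11 - 0.8449 M12 - 0.0605."""
    return 0.2868 * m11 - 0.8449 * m12 - 0.0605


def rmsre_percent(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square relative error in percent (assumed definition)."""
    rel = [((p - o) / o) ** 2 for o, p in zip(observed, predicted)]
    return 100.0 * math.sqrt(sum(rel) / len(rel))
```

For instance, placeholder descriptors M11 = M12 = 1.0 give lgPL = 0.2868 - 0.8449 - 0.0605 = -0.6186.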


GigaScience ◽  
2021 ◽  
Vol 10 (9) ◽  
Author(s):  
Shufang Wu ◽  
Zhencheng Fang ◽  
Jie Tan ◽  
Mo Li ◽  
Chunhui Wang ◽  
...  

Abstract Background Prokaryotic viruses, referred to as phages, can be divided into virulent and temperate phages. Distinguishing virulent and temperate phage–derived sequences in metavirome data is important for elucidating their different roles in interactions with bacterial hosts and in the regulation of microbial communities. However, no existing experimental or computational approach can effectively classify such sequences in culture-independent metaviromes. We present a new computational method, DeePhage, which can directly and rapidly classify each read or contig as a virulent or temperate phage–derived fragment. Findings DeePhage uses “one-hot” encoding to represent DNA sequences in detail. Sequence signatures are detected via a convolutional neural network to obtain valuable local features. The accuracy of DeePhage in 5-fold cross-validation reaches as high as 89%, nearly 10% and 30% higher than that of 2 similar tools, PhagePred and PHACTS. On real metavirome data, DeePhage correctly predicts the highest proportion of contigs when BLAST is used as the annotation reference, without apparent preferences. In addition, DeePhage runs 245 and 810 times faster than PhagePred and PHACTS, respectively, under the same computational configuration. By directly detecting temperate viral fragments in metagenome and metavirome data, we further propose a new strategy to explore phage transformations in the microbial community. The ability to detect such transformations provides new insight into potential treatments for human disease. Conclusions DeePhage is a novel tool for rapidly and efficiently identifying 2 kinds of phage fragments, especially in metagenomic analysis. DeePhage is freely available via http://cqb.pku.edu.cn/ZhuLab/DeePhage or https://github.com/shufangwu/DeePhage.
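One-hot encoding, as mentioned above, turns a DNA read into a length × 4 binary matrix that a convolutional network can scan. The sketch below shows the general technique. The A, C, G, T column order and the choice to map ambiguous bases such as N to all-zero rows are assumptions, since the abstract does not specify DeePhage's exact conventions.

```python
from typing import List

BASES = "ACGT"  # assumed column order


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA sequence as an L x 4 matrix (columns A, C, G, T).
    Ambiguous bases such as N become all-zero rows (assumed convention)."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]
```

For example, `one_hot("ACGTN")` yields the four unit rows in order, followed by an all-zero row for N. Lowercase input is accepted because the sequence is upper-cased first.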


2020 ◽  
Vol 17 (4) ◽  
pp. 287-301
Author(s):  
Chang Xu ◽  
Yijie Ding ◽  
Limin Jiang ◽  
Cong Shen ◽  
Gaoyan Zhang ◽  
...  

Background: Ligand-receptor interactions play an important role in the signal transduction required for cellular differentiation, proliferation and immune responses. The analysis of ligand-receptor interactions helps provide a deeper understanding of cellular proliferation, differentiation and other cell processes. Methods: Computational techniques could advance ligand-receptor interaction research in future proteomics studies. In this paper, we propose a novel machine learning method to predict ligand-receptor interactions from amino acid sequences. We extract features from ligand and receptor sequences using the Histogram of Oriented Gradients (HOG) and the Discrete Cosine Transform (DCT). These features are then clustered with the Fuzzy C-Means (FCM) algorithm, yielding multiple training subsets that are used to generate the same number of sub-classifiers. For each sample, we choose the optimal sub-classifier for predicting ligand-receptor interactions according to the similarity between the sample and the training subsets. Observations: To verify performance, we perform five-fold cross-validation experiments on a ligand-receptor interaction dataset and achieve 80.08% accuracy, 82.98% sensitivity and 80.02% specificity. We then test our feature extraction method on two Protein-Protein Interaction (PPI) datasets and achieve accuracies of 93.79% and 87.46%, respectively. Conclusion: Our proposed method can be a useful tool for identifying ligand-receptor interactions. Related data sets and source code are available at https://github.com/guofei-tju/ligand-receptorinteractions.git.
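The routing step described above assigns a sample to the sub-classifier whose training subset it most resembles. The sketch below uses the standard Fuzzy C-Means membership formula u_i = 1 / Σ_j (d_i / d_j)^(2/(m−1)), where d_i is the Euclidean distance to cluster centre i and m > 1 is the fuzzifier. It then picks the cluster with the highest membership. Using FCM membership as the "similarity" is an assumption, because the abstract does not name the measure. The HOG and DCT feature extraction is not shown, and `pick_sub_classifier` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def fcm_memberships(x: Sequence[float], centers: List[Sequence[float]],
                    m: float = 2.0) -> List[float]:
    """Fuzzy C-Means membership of point x in each cluster (fuzzifier m > 1)."""
    dists = [math.dist(x, c) for c in centers]
    # If x coincides with a centre, it belongs fully to that cluster.
    for i, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** p for dj in dists) for di in dists]


def pick_sub_classifier(x: Sequence[float], centers: List[Sequence[float]],
                        m: float = 2.0) -> int:
    """Hypothetical routing rule: use the sub-classifier trained on the
    cluster in which x has the highest membership."""
    u = fcm_memberships(x, centers, m)
    return max(range(len(u)), key=u.__getitem__)


centers = [(0.0, 0.0), (4.0, 0.0)]
u = fcm_memberships((1.0, 0.0), centers)  # distances 1 and 3 give roughly [0.9, 0.1]
```

Memberships always sum to 1 across clusters, so they can also serve as soft weights for combining sub-classifiers instead of picking only one.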


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Xin Liu ◽  
Liang Wang ◽  
Jian Li ◽  
Junfeng Hu ◽  
Xiao Zhang

Abstract Background Malonylation is a recently discovered post-translational modification that is associated with a variety of diseases, such as Type 2 Diabetes Mellitus and different types of cancer. Compared with experimental identification of malonylation sites, computational methods are time-effective and comparatively low-cost. Results In this study, we proposed a novel computational model called Mal-Prec (Malonylation Prediction) for malonylation site prediction, which combines Principal Component Analysis (PCA) and a Support Vector Machine (SVM). One-hot encoding, physicochemical properties, and the composition of k-spaced amino acid pairs were first used to extract sequence features. PCA was then applied to select optimal feature subsets, while an SVM was adopted to predict malonylation sites. Five-fold cross-validation results showed that Mal-Prec achieves better prediction performance than other approaches. AUC (area under the receiver operating characteristic curve) analysis yielded 96.47% and 90.72% on 5-fold cross-validation and the independent data sets, respectively. Conclusion Mal-Prec is a computationally reliable method for identifying malonylation sites in protein sequences. It outperforms existing prediction tools and can serve as a useful tool for identifying and discovering novel malonylation sites in human proteins. Mal-Prec is coded in MATLAB and is publicly available at https://github.com/flyinsky6/Mal-Prec, together with the data sets used in this study.
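The composition of k-spaced amino acid pairs (CKSAAP) counts, for each gap size k, how often every ordered residue pair (a, b) appears with exactly k residues between them, and normalises by the number of such positions. The sketch below illustrates the general encoding. The maximum gap `k_max` is a hypothetical parameter, and the sketch does not reproduce Mal-Prec's window around the modified lysine. Mal-Prec itself is written in MATLAB, so this Python version is illustrative only.

```python
from itertools import product
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(seq: str, k_max: int = 2) -> List[float]:
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    For each gap k, count pairs (seq[i], seq[i + k + 1]) and divide by the
    number of such positions; returns 400 * (k_max + 1) features."""
    features: List[float] = []
    for k in range(k_max + 1):
        counts: Dict[str, int] = dict.fromkeys(PAIRS, 0)
        total = len(seq) - k - 1
        for i in range(max(total, 0)):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard residues
                counts[pair] += 1
        features.extend(counts[p] / total if total > 0 else 0.0 for p in PAIRS)
    return features
```

For the toy peptide "ACDA" with `k_max=1`, the k = 0 block gives AC, CD and DA a frequency of 1/3 each, and the k = 1 block gives AD and CA a frequency of 0.5 each.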

