Machine Learning Techniques for Sequence-based Prediction of Viral-Host Interactions between SARS-CoV-2 and Human Proteins

2020 ◽  
Author(s):  
Lopamudra Dey ◽  
Sanjay Chakraborty ◽  
Anirban Mukhopadhyay

COVID-19 (Coronavirus Disease-19), a disease caused by the SARS-CoV-2 virus, was declared a pandemic by the World Health Organization on March 11, 2020. Over 4.3 million people in more than 200 countries have already been affected by this deadly virus, resulting in almost 0.3 million deaths. Protein-protein interactions (PPIs) play a key role in the cellular process of SARS-CoV-2 infection in the human body. A recent study reported some SARS-CoV-2 proteins that interact with a number of human proteins, while many potential interactions remain to be identified. However, human cells contain a large number of proteins, so it is not possible to check all possible combinations of interactions experimentally. This has led to the development of various computational methods that predict PPIs between the virus and human proteins, with the predictions then validated by biological experiments. This paper presents a prediction model that combines different sequence-based features of human proteins, namely the amino acid composition, pseudo amino acid composition, and the conjoint triad. We have built an ensemble voting classifier using $SVM^{Radial}$, $SVM^{Polynomial}$, and Random Forest techniques, which gives greater accuracy, precision, specificity, recall, and F1 score than all other models used in the work. We have predicted 1326 potential human target proteins using this weighted ensemble classifier. Furthermore, the Gene Ontology (GO) and KEGG pathway enrichments of these predicted human proteins are investigated. This study may encourage the identification of potential targets for more effective anti-COVID drug discovery.
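
As an illustration of the simplest of the sequence-based features named in this abstract, the amino acid composition, the following is a minimal sketch in Python. The function name `amino_acid_composition` and the alphabetical residue ordering are assumptions made here for illustration; the abstract does not describe the authors' implementation, and the pseudo amino acid composition, conjoint triad, and ensemble classifier are not shown.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard residue.

    Characters outside the 20 standard residues (e.g. 'X') are ignored, so
    the fractions sum to 1 whenever at least one standard residue is present.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # Empty or fully non-standard sequence: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Toy sequence, not taken from the paper.
    vector = amino_acid_composition("AACD")
    print(dict(zip(AMINO_ACIDS, vector)))
```

A vector of this kind could be concatenated with other per-protein feature vectors before being passed to a classifier; how the authors combined their features is not specified in the abstract.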

Plants ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 721
Author(s):  
Nozipho P. Sibiya ◽  
Eugenie Kayitesi ◽  
Annah N. Moteetee

A literature survey revealed that several wild indigenous Southern African fruits had not previously been evaluated for their proximate and amino acid composition or for their total energy value (caloric value). Fourteen species, including Carissa macrocarpa, Carpobrotus edulis, Dovyalis caffra, Halleria lucida, Manilkara mochisia, Pappea capensis, Phoenix reclinata, and Syzygium guineense, were analyzed in this study. The nutritional values for several species, such as C. edulis, H. lucida, P. reclinata, and M. mochisia, are reported here for the first time. The following fruits had the highest proximate values: C. macrocarpa (ash at 20.42 mg/100 g), S. guineense (fat at 7.75 mg/100 g), P. reclinata (fiber at 29.89 mg/100 g), and H. lucida (protein at 6.98 mg/100 g and carbohydrates at 36.98 mg/100 g). Essential amino acids such as histidine, isoleucine, lysine, methionine, phenylalanine, tryptophan, and valine were reported in all studied indigenous fruits. The high protein content of H. lucida was reflected in its having the highest histidine content. However, the fruits are a poor source of protein, since the content is lower than the recommended daily intake. The jacket-plum (Pappea capensis), on the other hand, meets and exceeds the required daily intake of lysine (0.0003 g/100 g or 13 mg/kg) recommended by the World Health Organization.


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Lifu Zhang ◽  
Benzhi Dong ◽  
Zhixia Teng ◽  
Ying Zhang ◽  
Liran Juan

Enzymes are proteins that can efficiently catalyze specific biochemical reactions, and they are widely present in the human body. Developing an efficient method to identify human enzymes is vital to select enzymes from the vast number of human proteins and to investigate their functions. Nevertheless, only a limited amount of research has been conducted on the classification of human enzymes and nonenzymes. In this work, we developed a support vector machine- (SVM-) based predictor to classify human enzymes using the amino acid composition (AAC), the composition of k-spaced amino acid pairs (CKSAAP), and selected informative amino acid pairs through the use of a feature selection technique. A training dataset including 1117 human enzymes and 2099 nonenzymes and a test dataset including 684 human enzymes and 1270 nonenzymes were constructed to train and test the proposed model. The results of jackknife cross-validation showed that the overall accuracy was 76.46% for the training set and 76.21% for the test set, which are higher than the 72.6% achieved in previous research. Furthermore, various feature extraction methods and mainstream classifiers were compared in this task, and informative feature parameters of k-spaced amino acid pairs were selected and compared. The results suggest that our classifier can be used in human enzyme identification effectively and efficiently and can help to understand their functions and develop new drugs.
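
The composition of k-spaced amino acid pairs (CKSAAP) mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `cksaap`, the pair ordering, and the choice to normalize by the number of valid pairs counted are all assumptions made here.

```python
from itertools import product

# The 20 standard amino acids in one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, e.g. "AA", "AC", ..., "YY".
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(sequence, k):
    """Return a 400-element list of k-spaced residue-pair frequencies.

    A k-spaced pair is formed by residues at positions i and i + k + 1,
    i.e. two residues separated by exactly k other residues. Pairs that
    contain a non-standard character are skipped, and frequencies are
    normalized by the number of pairs actually counted.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(PAIRS, 0)
    counted = 0
    for i in range(len(seq) - k - 1):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:
            counts[pair] += 1
            counted += 1
    if counted == 0:
        # Sequence too short for this k, or no valid pairs: all-zero vector.
        return [0.0] * len(PAIRS)
    return [counts[p] / counted for p in PAIRS]


if __name__ == "__main__":
    # Toy sequence, not taken from the paper.
    features = cksaap("ACAC", k=1)
    print({p: f for p, f in zip(PAIRS, features) if f > 0})
```

Vectors for several values of k are typically concatenated, which is why a feature selection step such as the one described in the abstract is useful for keeping only the informative pairs.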


2018 ◽  
Author(s):  
Sandip S Panesar ◽  
Rhett N D’Souza ◽  
Fang-Cheng Yeh ◽  
Juan C Fernandez-Miranda

Background: Machine learning (ML) is the application of specialized algorithms to datasets for trend delineation, categorization or prediction. ML techniques have been traditionally applied to large, highly-dimensional databases. Gliomas are a heterogeneous group of primary brain tumors, traditionally graded using histopathological features. Recently the World Health Organization proposed a novel grading system for gliomas incorporating molecular characteristics. We aimed to study whether ML could achieve accurate prognostication of 2-year mortality in a small, highly-dimensional database of glioma patients. Methods: We applied three machine learning techniques, artificial neural networks (ANN), decision trees (DT), and support vector machine (SVM), along with classical logistic regression (LR), to a dataset consisting of 76 glioma patients of all grades. We compared the effect of applying the algorithms to the raw database versus a database where only statistically significant features were included in the algorithmic inputs (feature selection). Results: Raw input consisted of 21 variables and achieved performance of (accuracy/AUC): 70.7%/0.70 for ANN, 68%/0.72 for SVM, 66.7%/0.64 for LR and 65%/0.70 for DT. Feature-selected input consisted of 14 variables and achieved performance of 73.4%/0.75 for ANN, 73.3%/0.74 for SVM, 69.3%/0.73 for LR and 65.2%/0.63 for DT. Conclusions: We demonstrate that these techniques can also be applied to small, yet highly-dimensional datasets. Our ML techniques achieved reasonable performance compared to similar studies in the literature. Though local databases may be small versus larger cancer repositories, we demonstrate that ML techniques can still be applied to their analysis, though traditional statistical methods are of similar benefit.


2020 ◽  
Vol 10 (2) ◽  
pp. 551 ◽  
Author(s):  
Fayez AlFayez ◽  
Mohamed W. Abo El-Soud ◽  
Tarek Gaber

Breast cancer is considered one of the major threats to women’s health all over the world. The World Health Organization (WHO) has reported that 1 in every 12 women could be subject to a breast abnormality during her lifetime. Early detection of breast cancer has been found to be very effective in increasing survival rates. Mammography-based breast cancer screening is the leading technology to achieve this aim. However, it still cannot deal with patients with dense breasts or with tumors smaller than 2 mm. A thermography-based approach can address these problems. In this paper, a thermogram-based breast cancer detection approach is proposed. This approach consists of four phases: (1) image pre-processing using homomorphic filtering, top-hat transform, and adaptive histogram equalization, (2) ROI segmentation using binary masking and K-means clustering, (3) feature extraction using signature boundary, and (4) classification, in which two classifiers, Extreme Learning Machine (ELM) and Multilayer Perceptron (MLP), were used and compared. The proposed approach is evaluated using the public dataset DMR-IR. Various experiment scenarios (e.g., integration between geometrical feature extraction and textural feature extraction) were designed and evaluated using different measurements (i.e., accuracy, sensitivity, and specificity). The results showed that the ELM-based results were better than the MLP-based ones by more than 19%.


Author(s):  
Priyadarshini Soni ◽  
Lubhan Singh ◽  
Prabhat Singh ◽  
Sokindra Kumar

Depression is today the most common psychiatric problem across the world, and stress is the main source of the ailment. According to the World Health Organization, it will be the main cause of morbidity in the world by 2020. Depression can critically affect quality of life, as it is characterized by many symptoms such as unhappiness, lack of interest and pleasure, low energy, feelings of inadequacy and regret, slowed thinking, reduced physical movement, affected speech, altered appetite or sleep, sadness, and an increased risk of suicide. The human body cannot produce tryptophan, an essential amino acid; it must therefore be obtained from the diet. After absorption, L-tryptophan crosses the blood-brain barrier (BBB) via the non-specific L-type amino acid transporter and acts as a precursor to various metabolic pathways in the central nervous system (CNS). The kynurenine pathway is an important pathway of tryptophan (TRP) metabolism that produces many metabolites, such as 3-hydroxykynurenine (3HK), anthranilic acid (AA), kynurenic acid (KYNA), 3-hydroxyanthranilic acid (3HAA), and quinolinic acid (QUIN), known as kynurenines. It has been reported previously that disturbance in neuroprotective and neurotoxic metabolites leads to many psychiatric disorders. This review summarizes the role of kynurenine pathway metabolites in depression.


Amino Acids ◽  
2011 ◽  
Vol 42 (4) ◽  
pp. 1443-1454 ◽  
Author(s):  
Tariq Habib Afridi ◽  
Asifullah Khan ◽  
Yeon Soo Lee
