AlgPred 2.0: an improved method for predicting allergenic proteins and mapping of IgE epitopes

Author(s):  
Neelam Sharma ◽  
Sumeet Patiyal ◽  
Anjali Dhall ◽  
Akshara Pande ◽  
Chakit Arora ◽  
...  

Abstract AlgPred 2.0 is a web server developed for predicting allergenic proteins and allergenic regions in a protein. It is an updated version of AlgPred, developed in 2006. The dataset used for training, testing and validation consists of 10 075 allergens and 10 075 non-allergens. In addition, 10 451 experimentally validated immunoglobulin E (IgE) epitopes were used to identify antigenic regions in a protein. All models were trained on 80% of the data, called the training dataset, and the performance of the models was evaluated using a 5-fold cross-validation technique. The performance of the final model trained on the training dataset was evaluated on the remaining 20% of the data, called the validation dataset; no two proteins in any two sets have more than 40% similarity. First, a Basic Local Alignment Search Tool (BLAST) search was performed against the dataset, and allergens were predicted based on the level of similarity with known allergens. Second, IgE epitopes obtained from the IEDB database were searched in the dataset to predict allergens based on their presence in a protein. Third, motif-based approaches such as multiple EM for motif elicitation/motif alignment and search tool (MEME/MAST) were used to predict allergens. Fourth, allergen prediction models were developed using a wide range of machine learning techniques. Finally, an ensemble approach was used for predicting allergenic proteins by combining the prediction scores of the different approaches. Our best model achieved a maximum area under the receiver operating characteristic curve of 0.98 with a Matthews correlation coefficient of 0.85 on the validation dataset. A web server, AlgPred 2.0, has been developed that allows the prediction of allergens, mapping of IgE epitopes, motif search and BLAST search (https://webs.iiitd.edu.in/raghava/algpred2/).
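The two figures of merit reported in this abstract, the area under the ROC curve and the Matthews correlation coefficient, can be computed from binary labels and prediction scores roughly as in the sketch below. It is plain Python for illustration only: the function names, the toy labels and scores, and the 0.5 decision threshold are assumptions and are not taken from AlgPred 2.0.

```python
import math


def auroc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    Equals the probability that a randomly chosen positive scores higher
    than a randomly chosen negative, with ties counted as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mcc(labels, predictions):
    """Matthews correlation coefficient for binary labels (1 = allergen)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denominator


# Toy data (hypothetical): three allergens followed by three non-allergens.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1]
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed threshold
```

AUROC is threshold-free, whereas MCC depends on the chosen cut-off, which is one reason the two are often reported together.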

2021 ◽  
Author(s):  
Lubna Maryam ◽  
Anjali Dhall ◽  
Sumeet Patiyal ◽  
Salman Sadullah Usmani ◽  
Neelam Sharma ◽  
...  

A number of beta-lactamase variants have the ability to deactivate ceftazidime, the antibiotic most commonly used for treating infections caused by Gram-negative bacteria. In this study, an attempt has been made to develop a method that can predict ceftazidime-resistant strains of bacteria from the amino acid sequences of beta-lactamases. We obtained beta-lactamase proteins from the β-lactamase database, corresponding to 87 ceftazidime-sensitive and 112 ceftazidime-resistant bacterial strains. All models developed in this study were trained, tested, and evaluated on a dataset of 199 beta-lactamase proteins. We generated 9149 features for the beta-lactamases using Pfeature and selected relevant features using different algorithms in the scikit-learn package. A wide range of machine learning techniques (such as KNN, DT, RF, GNB, LR, SVC, XGB) was used to develop prediction models. Our random forest-based model achieved maximum performance, with an AUROC of 0.80 on the training dataset and 0.79 on the validation dataset. The study also revealed that ceftazidime-resistant beta-lactamases have amino acids with non-polar side chains in abundance. In contrast, ceftazidime-sensitive beta-lactamases have amino acids with polar side chains and charged entities in abundance. Finally, we developed a web server, ABCRpred, for the scientific community working in the era of antibiotic resistance to predict the antibiotic resistance/susceptibility of beta-lactamase protein sequences. The server is freely available at http://webs.iiitd.edu.in/raghava/abcrpred/.
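The observation about non-polar versus polar and charged side chains can be illustrated with a simple composition calculation. The sketch below is an assumption-laden example, not the Pfeature implementation: the grouping of the 20 standard amino acids into non-polar, polar and charged classes follows one common textbook convention (other groupings exist), and the function name and example sequence are hypothetical.

```python
# One common textbook grouping of the 20 standard amino acids (an assumption;
# sources differ on residues such as G, C, Y and H).
SIDE_CHAIN_GROUPS = {
    "non_polar": set("AVLIMFWPG"),
    "polar": set("STCYNQ"),
    "charged": set("DEKRH"),
}


def group_composition(sequence):
    """Return the fraction of residues falling in each side-chain group.

    Residues outside the 20 standard amino acids are ignored.
    """
    sequence = sequence.upper()
    counts = {group: 0 for group in SIDE_CHAIN_GROUPS}
    for residue in sequence:
        for group, members in SIDE_CHAIN_GROUPS.items():
            if residue in members:
                counts[group] += 1
                break
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {group: count / total for group, count in counts.items()}


# Hypothetical four-residue example: two non-polar (A, V), two charged (D, K).
example = group_composition("AVDK")
```

Comparing these fractions between the resistant and sensitive sets would be one way to reproduce the kind of compositional trend the abstract describes.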


2021 ◽  
Author(s):  
Anjali Dhall ◽  
Sumeet Patiyal ◽  
Neelam Sharma ◽  
Naorem Leimarembi Devi ◽  
Gajendra P. S. Raghava

Abstract It has been shown in the past that levels of cytokines, including interleukin 6 (IL6), are highly correlated with disease severity in COVID-19 patients. IL6-mediated activation of STAT3 is responsible for proliferating proinflammatory responses that lead to the promotion of cytokine storm. Thus, STAT3 inhibitors may play a crucial role in managing the pathogenesis of COVID-19. This paper describes a method developed for predicting inhibitors against the IL6-mediated STAT3 signaling pathway. The dataset used for training, testing, and evaluation of models contains 1564 small-molecule STAT3 inhibitors and 1671 non-inhibitors. Analysis of the data indicates that rings and aromatic groups are significantly more abundant in STAT3 inhibitors than in non-inhibitors. In order to build models, we generated a wide range of descriptors for each chemical compound. First, we developed models using 2-D and 3-D descriptors and achieved maximum AUCs of 0.84 and 0.73, respectively. Second, fingerprints (FP) were used to build prediction models, which achieved an AUC of 0.86 and an accuracy of 78.70% on the validation dataset. Finally, models developed using hybrid features or descriptors achieved a maximum AUC of 0.87 on the validation dataset. We used our best model to identify STAT3 inhibitors among FDA-approved drugs and found a few drugs (e.g., Tamoxifen and Perindopril) that can be used to manage COVID-19-associated cytokine storm. A web server, “STAT3In” (https://webs.iiitd.edu.in/raghava/stat3in/), has been developed to predict and design STAT3 inhibitors.


Author(s):  
Anjali Dhall ◽  
Sumeet Patiyal ◽  
Neelam Sharma ◽  
Salman Sadullah Usmani ◽  
Gajendra P S Raghava

Abstract Interleukin 6 (IL-6) is a pro-inflammatory cytokine that stimulates acute phase responses, hematopoiesis and specific immune reactions. Recently, it was found that IL-6 plays a vital role in the progression of COVID-19, which is responsible for the high mortality rate. In order to help the scientific community fight COVID-19, we have developed a method for predicting IL-6 inducing peptides/epitopes. The models were trained and tested on 365 experimentally validated IL-6 inducing and 2991 non-inducing peptides extracted from the immune epitope database. Initially, 9149 features of each peptide were computed using Pfeature, which were reduced to 186 features using the SVC-L1 technique. These features were ranked based on their classification ability, and the top 10 features were used for developing prediction models. A wide range of machine learning techniques has been deployed to develop models. The Random Forest-based model achieves a maximum AUROC of 0.84 and 0.83 on the training and independent validation datasets, respectively. We have also identified IL-6 inducing peptides in different proteins of SARS-CoV-2, using our best models, to design a vaccine against COVID-19. A web server named IL-6Pred and a standalone package have been developed for predicting, designing and screening IL-6 inducing peptides (https://webs.iiitd.edu.in/raghava/il6pred/).
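The step of ranking features by classification ability and keeping the top few can be sketched as follows. The abstract does not state which ranking criterion was used, so this example assumes a single-feature AUROC as the measure of classification ability; the function names and the toy feature matrix are hypothetical and do not come from IL-6Pred.

```python
def single_feature_auroc(labels, values):
    """AUROC obtained when one feature is used directly as the score."""
    positives = [v for y, v in zip(labels, values) if y == 1]
    negatives = [v for y, v in zip(labels, values) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def rank_features(rows, labels, top_k=10):
    """Return indices of the top_k features, best first.

    A feature that separates the classes in the reverse direction is just as
    useful, so each feature is scored by max(AUROC, 1 - AUROC).
    """
    n_features = len(rows[0])
    scored = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        auc = single_feature_auroc(labels, column)
        scored.append((max(auc, 1.0 - auc), j))
    # Sort by descending score; break ties by the lower feature index.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scored[:top_k]]


# Toy matrix (hypothetical): 4 peptides x 3 features, 2 inducers then 2 non-inducers.
toy_rows = [
    [0.9, 0.5, 0.3],
    [0.8, 0.1, 0.7],
    [0.2, 0.6, 0.3],
    [0.1, 0.2, 0.7],
]
toy_labels = [1, 1, 0, 0]
top_two = rank_features(toy_rows, toy_labels, top_k=2)
```

A univariate ranking like this ignores interactions between features, which is presumably why it is applied only after an embedded selection step such as SVC-L1 has already reduced the feature set.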


2013 ◽  
Vol 110 (12) ◽  
pp. 2260-2270 ◽  
Author(s):  
Simiao Tian ◽  
Laurence Mioche ◽  
Jean-Baptiste Denis ◽  
Béatrice Morio

The aims of the present study were to propose a multivariate model for simultaneously predicting body, trunk and appendicular fat and lean masses from easily measured variables and to compare its predictive capacity with that of the available univariate models that predict body fat percentage (BF%). The dual-energy X-ray absorptiometry (DXA) dataset (52 % men and 48 % women) with White, Black and Hispanic ethnicities (1999–2004, National Health and Nutrition Examination Survey) was randomly divided into three sub-datasets: a training dataset (TRD), a test dataset (TED) and a validation dataset (VAD), comprising 3835, 1917 and 1917 subjects, respectively. For each sex, several multivariate prediction models were fitted from the TRD using age, weight, height and possibly waist circumference. The most accurate model was selected from the TED and then applied to the VAD and a French DXA dataset (French DB) (526 men and 529 women) to assess the prediction accuracy in comparison with that of five published univariate models, for which adjusted formulas were re-estimated using the TRD. Waist circumference was found to improve the prediction accuracy, especially in men. For BF%, the standard error of prediction (SEP) values were 3·26 (3·75) % for men and 3·47 (3·95) % for women in the VAD (French DB), as good as those of the adjusted univariate models. Moreover, the SEP values for the prediction of body and appendicular lean masses ranged from 1·39 to 2·75 kg for both sexes. The prediction accuracy was best for age < 65 years, BMI < 30 kg/m² and the Hispanic ethnicity. The application of our multivariate model to large populations could be useful to address various public health issues.
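The random division into training, test and validation sub-datasets can be sketched as below. Only the three sizes (3835, 1917 and 1917) come from the abstract; the function name, the fixed seed and the use of integer subject identifiers are assumptions made for illustration, and the study may have stratified the split in ways not shown here.

```python
import random


def partition(subjects, sizes, seed=0):
    """Shuffle subjects and cut them into consecutive parts of the given sizes."""
    subjects = list(subjects)
    if sum(sizes) != len(subjects):
        raise ValueError("sizes must add up to the number of subjects")
    random.Random(seed).shuffle(subjects)  # seeded for reproducibility
    parts = []
    start = 0
    for size in sizes:
        parts.append(subjects[start:start + size])
        start += size
    return parts


# Hypothetical subject identifiers 0..7668 standing in for the DXA records.
training, test, validation = partition(range(7669), [3835, 1917, 1917])
```

Keeping three disjoint parts lets model selection (on the test set) stay separate from the final accuracy estimate (on the validation set).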


2018 ◽  
Vol 2018 ◽  
pp. 1-15 ◽  
Author(s):  
Tanuj Chopra ◽  
Manoranjan Parida ◽  
Naveen Kwatra ◽  
Palika Chopra

The objective of the present study is to develop models to predict the deterioration of pavement distress in an urban road network. Genetic programming (GP) has been used to develop five models for the prediction of pavement distress: Model 1 for cracking progression, Model 2 for ravelling progression, Model 3 for pothole progression, Model 4 for rutting progression, and Model 5 for roughness progression. The data were collected from the roads of Patiala City, Punjab, India; a network of 16 roads was selected for data collection during the years 2012–2015. The data were divided into two sets: a training dataset (data collected during 2012 and 2013) and a validation dataset (data collected during 2014 and 2015). Two fitness functions were used for the evaluation of the models, namely the coefficient of determination (R²) and the root mean square error (RMSE). It is inferred that the GP models predict pavement distress with high accuracy and can help decision makers allocate funds adequately and in a timely manner for the preservation of the urban road network.
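The two fitness functions named in this abstract have standard definitions, which the following plain-Python sketch implements. The function names and the toy observations are assumptions for illustration; they are not data from the Patiala road network.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    if ss_total == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_residual / ss_total


# Toy values (hypothetical distress measurements).
toy_observed = [1.0, 2.0, 3.0]
perfect_fit = [1.0, 2.0, 3.0]
mean_only = [2.0, 2.0, 2.0]  # always predicting the mean gives R^2 = 0
```

RMSE is expressed in the units of the predicted quantity, while R² is unitless, so the two give complementary views of fit when comparing models across different distress types.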


2013 ◽  
Vol 20 (2) ◽  
pp. 233-238 ◽  
Author(s):  
M. Ajmal Ali ◽  
Fahad M. Al-Hemaid ◽  
Ritesh K. Choudhary ◽  
Joongku Lee ◽  
Soo-Yong Kim ◽  
...  

The present study focuses on the status of Reseda pentagyna Abdallah & A.G. Miller (Resedaceae). The internal transcribed spacer (ITS) region of nuclear ribosomal DNA and the chloroplast trnL-F gene of the species in question were sequenced. The Basic Local Alignment Search Tool (BLAST) search showed maximum identity with R. stenostachya. The parsimony analyses of the ITS, trnL-F and combined sequence data revealed a grouping of Reseda species consistent with the established taxonomic sections of the genus; R. pentagyna showed proximity to R. stenostachya (100% bootstrap support), nested within the clade of section Reseda.
DOI: http://dx.doi.org/10.3329/bjpt.v20i2.17397
Bangladesh J. Plant Taxon. 20(2): 233-238, 2013


2021 ◽  
Author(s):  
Neelam Sharma ◽  
Sumeet Patiyal ◽  
Anjali Dhall ◽  
Leimarembi Devi Naorem ◽  
Gajendra P.S. Raghava

Allergy is an abrupt reaction of the immune system that may occur after exposure to allergens such as protein/peptide or chemical allergens. In the past, a number of methods have been developed for classifying protein/peptide-based allergens. To the best of our knowledge, there is no method to classify the allergenicity of chemical compounds. Here, we propose a method named ChAlPred, which can be used to fill this gap by predicting chemical compounds that might cause allergy. In this study, we obtained a dataset of 403 allergenic and 1074 non-allergenic chemical compounds and used 2D, 3D and FP descriptors to train, test and validate our prediction models. The fingerprint analysis of the dataset indicates that PubChemFP129 and GraphFP1014 are more frequent in the allergenic chemical compounds, whereas KRFP890 is highly present in non-allergenic chemical compounds. Our XGB-based model achieved an AUC of 0.89 on the validation dataset using 2D descriptors. The RF-based model outperformed other classifiers using 3D descriptors (AUC = 0.85), FP descriptors (AUC = 0.92), combined descriptors (AUC = 0.93), and the hybrid model (AUC = 0.92) on the validation dataset. In addition, we have also reported some FDA-approved drugs, such as Cefuroxime, Spironolactone, and Tioconazole, which can cause allergic symptoms. A user-friendly web server named ChAlPred has been developed to predict chemical allergens. It can be easily accessed at https://webs.iiitd.edu.in/raghava/chalpred/.
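A fingerprint frequency analysis of the kind mentioned in this abstract can be approximated by comparing how often each fingerprint bit is set in the two classes. The sketch below is an assumption about how such a comparison might be done, not the ChAlPred procedure: the function names are hypothetical, and the bit names FP_A, FP_B and FP_C are placeholders rather than real PubChem, Graph or Klekota-Roth fingerprint identifiers.

```python
def bit_frequency(compounds, bit):
    """Fraction of compounds in which the given fingerprint bit is set.

    Each compound is represented as a set of the names of its 'on' bits.
    """
    if not compounds:
        raise ValueError("need at least one compound")
    return sum(1 for bits in compounds if bit in bits) / len(compounds)


def frequency_difference(allergens, non_allergens):
    """Map each bit to (frequency in allergens) - (frequency in non-allergens).

    Positive values mark bits enriched in allergens, negative values mark
    bits enriched in non-allergens.
    """
    all_bits = set().union(*allergens, *non_allergens)
    return {
        bit: bit_frequency(allergens, bit) - bit_frequency(non_allergens, bit)
        for bit in sorted(all_bits)
    }


# Toy fingerprints with placeholder bit names (hypothetical data).
toy_allergens = [{"FP_A", "FP_B"}, {"FP_A"}]
toy_non_allergens = [{"FP_C"}, {"FP_B", "FP_C"}]
differences = frequency_difference(toy_allergens, toy_non_allergens)
```

With unequal class sizes, as in the 403 versus 1074 compounds described here, comparing fractions rather than raw counts keeps the two classes on the same scale.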


2020 ◽  
Vol 11 ◽  
pp. 374
Author(s):  
Masahito Katsuki ◽  
Yukinari Kakizawa ◽  
Akihiro Nishikawa ◽  
Yasunaga Yamamoto ◽  
Toshiya Uchiyama

Background: Reliable prediction models of subarachnoid hemorrhage (SAH) outcomes are needed for treatment decision-making. The SAFIRE score, which uses only four variables, is a good prediction scoring system. However, building such prediction models requires a large number of samples and time-consuming statistical analysis. Deep learning (DL), a form of artificial intelligence, is attractive, but there have been no reports of prediction models for SAH outcomes using DL. We herein built a prediction model using the DL software Prediction One (Sony Network Communications Inc., Tokyo, Japan) and compared it with the SAFIRE score. Methods: We used data from 153 consecutive aneurysmal SAH patients treated at our hospital between 2012 and 2019. A Modified Rankin Scale (mRS) score of 0–3 at 6 months was defined as a favorable outcome. We randomly divided the patients into a training dataset of 102 patients and an external validation dataset of 51 patients. Prediction One built the prediction model using the training dataset with internal cross-validation. We used both the resulting model and the SAFIRE score to predict outcomes in the external validation dataset. The areas under the curve (AUCs) were compared. Results: The model built by Prediction One using 28 variables had an AUC of 0.848, and its AUC for the validation dataset was 0.953 (95% CI 0.900–1.000). The AUCs calculated using the SAFIRE score were 0.875 for the training dataset and 0.960 for the validation dataset. Conclusion: We easily and quickly built prediction models using Prediction One, even with a small single-center dataset. The accuracy of the model was not markedly inferior to that of previous statistically derived prediction models.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
S. J. White ◽  
M. Moore-Colyer ◽  
E. Marti ◽  
D. Hannant ◽  
V. Gerber ◽  
...  

Abstract Severe equine asthma (sEA), which closely resembles human asthma, is a debilitating and performance-limiting allergic respiratory disorder which affects 14% of horses in the Northern Hemisphere and is associated with increased allergen-specific immunoglobulin E (IgE) against a range of environmental proteins. A comprehensive microarray platform was developed to enable the simultaneous detection of allergen-specific equine IgE in serum against a wide range of putative allergenic proteins. The microarray revealed a plethora of novel pollen, bacteria, mould and arthropod proteins significant in the aetiology of sEA. Moreover, the analyses revealed an association between sEA-affected horses and IgE antibodies specific for proteins derived from latex, which has traditionally been ubiquitous to the horse’s environment in the form of riding surfaces and race tracks. Further work is required to establish the involvement of latex proteins in sEA as a potential risk factor. This work demonstrates a novel and rapid approach to sEA diagnosis, providing a platform for tailored management and the development of allergen-specific immunotherapy.


Author(s):  
O. I. Eyong ◽  
A. T. Owolabi ◽  
A. A. J. Mofunanya ◽  
E. E. Ekpiken

Cucumber mosaic virus (CMV) is one of the most important viral pathogens infecting a wide range of plant species in Nigeria. Mosaic and mottle symptoms were observed on Lagenaria siceraria L. in Adim, Southern Nigeria, in 2018, and the infected leaves were collected for investigation. This research was aimed at characterising the virus responsible for this infection with a view to identifying it. Antigen-coated plate (ACP) enzyme-linked immunosorbent assay (ELISA) and gene sequencing were the methods employed in the characterisation process. The amplified complementary deoxyribonucleic acid (cDNA) was cloned and the nucleotide sequence was determined. The serology results revealed that the virus belonged to the genus Cucumovirus, while the gene sequence, when compared with known virus sequences in GenBank using the Basic Local Alignment Search Tool (BLAST) program available at the National Centre for Biotechnology Information (NCBI), showed 97% sequence homology with Cucumber mosaic virus, confirming it as Cucumber mosaic virus. This is the first report of CMV infecting L. siceraria in Nigeria. Further studies on insect and host range tests of this virus are recommended.

