AllerStat: Finding Statistically Significant Allergen-Specific Patterns in Protein Sequences by Machine Learning

2021 ◽  
Author(s):  
Kento Goto ◽  
Norimasa Tamehiro ◽  
Takumi Yoshida ◽  
Hiroyuki Hanada ◽  
Takuto Sakuma ◽  
...  

Cutting-edge technologies such as genome editing and synthetic biology allow us to produce novel foods and functional proteins. However, their toxicity and allergenicity must be accurately evaluated. Allergic reactions are caused by specific amino-acid sequences in proteins (Allergen-Specific Patterns, ASPs), many of which remain undiscovered. In this study, we introduce a data-driven approach and a machine-learning (ML) method to find undiscovered ASPs. The proposed method enables an exhaustive search for amino-acid subsequences whose frequencies are statistically significantly higher in allergenic proteins. As a proof-of-concept (PoC), we created a database containing 21,154 proteins for which the presence or absence of allergic reactions is already known, and applied the proposed method to it. The ASPs detected in the PoC study were consistent with known biological findings, and the allergenicity prediction accuracy using the detected ASPs was higher than that of existing approaches.
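The idea of searching for subsequences that occur significantly more often in allergenic proteins can be illustrated with a small sketch. This is not the AllerStat algorithm itself: it assumes fixed-length contiguous subsequences (k-mers), a one-sided Fisher exact test, and a simple Bonferroni correction, and the function names `kmers`, `fisher_greater`, and `enriched_patterns` are hypothetical.

```python
from math import comb


def kmers(seq, k):
    """Return the set of distinct contiguous length-k subsequences of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a: allergens containing the pattern, b: allergens without it,
    c: non-allergens containing it,      d: non-allergens without it.
    The p-value is the hypergeometric probability of seeing at least
    `a` allergens with the pattern, given the table margins.
    """
    n_pos, n_neg = a + b, c + d
    n_with = a + c
    total = comb(n_pos + n_neg, n_with)
    upper = min(n_pos, n_with)
    tail = sum(comb(n_pos, x) * comb(n_neg, n_with - x)
               for x in range(a, upper + 1))
    return tail / total


def enriched_patterns(allergens, non_allergens, k, alpha=0.05):
    """List k-mers enriched in allergens after Bonferroni correction.

    Assumption: candidates are all k-mers seen in at least one allergen,
    and the correction factor is the number of such candidates.
    """
    candidates = set().union(*(kmers(s, k) for s in allergens))
    hits = []
    for pat in candidates:
        a = sum(pat in s for s in allergens)
        c = sum(pat in s for s in non_allergens)
        p = fisher_greater(a, len(allergens) - a, c, len(non_allergens) - c)
        p_adj = min(1.0, p * len(candidates))
        if p_adj <= alpha:
            hits.append((pat, p_adj))
    return sorted(hits, key=lambda t: (t[1], t[0]))
```

Calling `enriched_patterns(allergen_seqs, other_seqs, k=3)` on two lists of sequence strings returns the surviving patterns with their adjusted p-values. An exhaustive search over all subsequence lengths would need a much less conservative correction and pruning strategy than this toy version uses.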

2011 ◽  
Vol 81 (23) ◽  
pp. 173-180 ◽  
Author(s):  
Barbara K. Ballmer-Weber

Four to eight percent of the population are estimated to be food-allergic. Most food allergies in adolescents and adults are acquired on the basis of cross-reaction to pollen allergens. These allergens are ubiquitous in the plant kingdom. Therefore, pollen-allergic patients might acquire a multitude of different plant food allergies, and even react to novel foods to which they have never previously been exposed. A curative therapy for food allergy does not yet exist. Food-allergic patients have to rely on strict avoidance diets. The widespread use of industrially processed foods poses a general problem for food-allergic patients. Although the most frequent allergens must be declared openly in the list of ingredients, involuntary contamination with allergy-provoking compounds can occur. The precautionary labelling “may contain” is sometimes applied even if the chance of contamination is very low; on the other hand, foods not declared to contain possible traces of allergenic components may actually contain relevant amounts of allergenic proteins. Switzerland is the only country in Europe with legal regulations on contamination by allergenic food; however, the allowance of 1 g/kg is too high to protect a relevant proportion of food-allergic individuals.


2020 ◽  
Author(s):  
Keiichi Inoue ◽  
Masayuki Karasuyama ◽  
Ryoko Nakamura ◽  
Masae Konno ◽  
Daichi Yamada ◽  
...  

Microbial rhodopsins are photoreceptive membrane proteins utilized as molecular tools in optogenetics. In this paper, a machine learning (ML)-based model was constructed to approximate the relationship between amino acid sequences and absorption wavelengths using ~800 rhodopsins with known absorption wavelengths. This ML-based model was specifically designed for screening rhodopsins that are red-shifted from representative rhodopsins in the same subfamily. Among 5,558 candidate rhodopsins suggested by a protein BLAST search of several protein databases, 40 were selected by the ML-based model. The wavelengths of these 40 selected candidates were experimentally investigated, and 32 (80%) showed red-shift gains. In addition, four showed red-shift gains > 20 nm, and two were found to have desirable ion-transporting properties, indicating that they were potentially useful in optogenetics. These findings suggest that an ML-based model can reduce the cost of exploring new functional proteins.


Life ◽  
2021 ◽  
Vol 11 (8) ◽  
pp. 866
Author(s):  
Sony Hartono Wijaya ◽  
Farit Mochamad Afendi ◽  
Irmanida Batubara ◽  
Ming Huang ◽  
Naoaki Ono ◽  
...  

Background: We performed in silico prediction of the interactions between compounds of Jamu herbs and human proteins by utilizing data-intensive science and machine learning methods. Verifying the proteins that are targeted by compounds of natural herbs will be helpful to select natural herb-based drug candidates. Methods: Initially, data related to compounds, target proteins, and interactions between them were collected from open access databases. Compounds are represented by molecular fingerprints, whereas amino acid sequences are represented by numerical protein descriptors. Then, prediction models that predict the interactions between compounds and target proteins were constructed using support vector machine and random forest. Results: A random forest model constructed based on MACCS fingerprint and amino acid composition obtained the highest accuracy. We used the best model to predict target proteins for 94 important Jamu compounds and assessed the results by supporting evidence from published literature and other sources. There are 27 compounds that can be validated by professional doctors, and those compounds belong to seven efficacy groups. Conclusion: By comparing the efficacy of predicted compounds and the relations of the targeted proteins with diseases, we found that some compounds might be considered as drug candidates.
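To make the "amino acid composition" descriptor mentioned above concrete, the following minimal sketch computes it as the fraction of each of the 20 standard amino acids in a sequence. The function name `amino_acid_composition` and the choice to ignore non-standard residue letters are assumptions for illustration, not details taken from the paper.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list of residue fractions for a protein sequence.

    Assumption: letters outside the 20 standard residues (e.g. 'X')
    are ignored rather than counted.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

A vector like this can be concatenated with a compound fingerprint to form one input row for a classifier such as a random forest.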


Author(s):  
Felix Teufel ◽  
José Juan Almagro Armenteros ◽  
Alexander Rosenberg Johansen ◽  
Magnús Halldór Gíslason ◽  
Silas Irby Pihl ◽  
...  

Signal peptides (SPs) are short amino acid sequences that control protein secretion and translocation in all living organisms. SPs can be predicted from sequence data, but existing algorithms are unable to detect all known types of SPs. We introduce SignalP 6.0, a machine learning model that detects all five SP types and is applicable to metagenomic data.


Molecules ◽  
2018 ◽  
Vol 23 (11) ◽  
pp. 2751 ◽  
Author(s):  
Olga Tarasova ◽  
Nadezhda Biziukova ◽  
Dmitry Filimonov ◽  
Vladimir Poroikov

The high variability of the human immunodeficiency virus (HIV) is an important cause of HIV resistance to reverse transcriptase and protease inhibitors. There are many variants of HIV type 1 (HIV-1) that can be used to model sequence-resistance relationships. Machine learning methods are widely and successfully used in new drug discovery. An emerging body of data regarding the interactions of small drug-like molecules with their protein targets provides the possibility of building models on “structure-property” relationships and analyzing the performance of various machine-learning techniques. In our research, we analyze several different types of descriptors in order to predict the resistance of HIV reverse transcriptase and protease to the marketed antiretroviral drugs using the Random Forest approach. First, we represented amino acid sequences as a set of short peptide fragments, which included several amino acid residues. Second, we represented nucleotide sequences as a set of fragments, which included several nucleotides. We compared these two approaches using open data from the Stanford HIV Drug Resistance Database. We have determined the factors that modulate the performance of prediction: in particular, we observed that the prediction performance depended more on the particular drug than on the type of descriptor used.
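Representing sequences as sets of short fragments can be sketched as follows. The fragment length `n`, the binary presence encoding, and the names `fragments` and `fragment_features` are assumptions for illustration; the abstract does not state the fragment lengths or encoding the authors used. The same code applies to amino acid and nucleotide strings.

```python
def fragments(sequence, n):
    """Return all overlapping length-n fragments of a sequence, in order."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def fragment_features(sequences, n):
    """Encode sequences as binary fragment-presence vectors.

    Returns (vocab, matrix): vocab is the sorted list of fragments seen
    in any sequence, and matrix[i][j] is 1 if sequence i contains
    vocab[j], else 0.
    """
    vocab = sorted({f for s in sequences for f in fragments(s, n)})
    index = {f: j for j, f in enumerate(vocab)}
    matrix = []
    for s in sequences:
        row = [0] * len(vocab)
        for f in fragments(s, n):
            row[index[f]] = 1
        matrix.append(row)
    return vocab, matrix
```

Each row of the resulting matrix could serve as the descriptor vector for one viral variant, with the measured drug resistance as the label.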

