Humanization of antibodies using a machine learning approach on large-scale repertoire data

2021 ◽  
Author(s):  
Mark Chin ◽  
Claire Marks ◽  
Charlotte M Deane

Monoclonal antibody therapeutics are often produced from non-human sources (typically murine), and can therefore generate immunogenic responses in humans. Humanization procedures aim to produce antibody therapeutics that do not elicit an immune response and are safe for human use, without impacting efficacy. Humanization is normally carried out in a largely trial-and-error experimental process. We have built machine learning classifiers that can discriminate between human and non-human antibody variable domain sequences using the large amount of repertoire data now available. Our classifiers consistently outperform existing best-in-class models, and our output scores exhibit a negative relationship with the experimental immunogenicity of existing antibody therapeutics. We used our classifiers to develop a novel, computational humanization tool, Hu-mAb, that suggests mutations to an input sequence to reduce its immunogenicity. For a set of existing therapeutics with known precursor sequences, the mutations suggested by Hu-mAb show significant overlap with those deduced experimentally. Hu-mAb is therefore an effective replacement for trial-and-error humanization experiments, producing similar results in a fraction of the time. Hu-mAb is freely available to use at opig.stats.ox.ac.uk/webapps/humab.

2019 ◽  
Author(s):  
Anton Levitan ◽  
Andrew N. Gale ◽  
Emma K. Dallon ◽  
Darby W. Kozan ◽  
Kyle W. Cunningham ◽  
...  

ABSTRACT In vivo transposon mutagenesis, coupled with deep sequencing, enables large-scale genome-wide mutant screens for genes essential in different growth conditions. We analyzed six large-scale studies performed on haploid strains of three yeast species (Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Candida albicans), each mutagenized with two of three different heterologous transposons (AcDs, Hermes, and PiggyBac). Using a machine-learning approach, we evaluated the ability of the data to predict gene essentiality. Important data features included sufficient numbers and distribution of independent insertion events. All transposons showed some bias in insertion site preference because of jackpot events, and preferences for specific insertion sequences and short-distance vs long-distance insertions. For PiggyBac, a stringent target sequence limited the ability to predict essentiality in genes with few or no target sequences. The machine learning approach also robustly predicted gene function in less well-studied species by leveraging cross-species orthologs. Finally, comparisons of isogenic diploid versus haploid S. cerevisiae isolates identified several genes that are haplo-insufficient, while most essential genes, as expected, were recessive. We provide recommendations for the choice of transposons and the inference of gene essentiality in genome-wide studies of eukaryotic haploid microbes such as yeasts, including species that have been less amenable to classical genetic studies.


Molecules ◽  
2019 ◽  
Vol 24 (11) ◽  
pp. 2097 ◽  
Author(s):  
Ambrose Plante ◽  
Derek M. Shore ◽  
Giulia Morra ◽  
George Khelashvili ◽  
Harel Weinstein

G protein-coupled receptors (GPCRs) play a key role in many cellular signaling mechanisms, and must select among multiple coupling possibilities in a ligand-specific manner in order to carry out a myriad of functions in diverse cellular contexts. Much has been learned about the molecular mechanisms of ligand-GPCR complexes from Molecular Dynamics (MD) simulations. However, exploring ligand-specific differences in the response of a GPCR to diverse ligands, as is required to understand ligand bias and functional selectivity, necessitates creating very large amounts of data from the needed large-scale simulations. This becomes a Big Data problem for the high dimensionality analysis of the accumulated trajectories. Here we describe a new machine learning (ML) approach to the problem that is based on transforming the analysis of GPCR function-related, ligand-specific differences encoded in the MD simulation trajectories into a representation recognizable by state-of-the-art deep learning object recognition technology. We illustrate this method by applying it to recognize the pharmacological classification of ligands bound to the 5-HT2A and D2 subtypes of class-A GPCRs from the serotonin and dopamine families. The ML-based approach is shown to perform the classification task with high accuracy, and we identify the molecular determinants of the classifications in the context of GPCR structure and function. This study builds a framework for the efficient computational analysis of MD Big Data collected for the purpose of understanding ligand-specific GPCR activity.


PLoS ONE ◽  
2020 ◽  
Vol 15 (11) ◽  
pp. e0241239
Author(s):  
Kai On Wong ◽  
Osmar R. Zaïane ◽  
Faith G. Davis ◽  
Yutaka Yasui

Background Canada is an ethnically diverse country, yet the lack of ethnicity information in many large databases impedes effective population research and interventions. Automated ethnicity classification using machine learning has shown potential to address this data gap, but its performance in Canada is largely unknown. This study developed a large-scale machine learning framework to predict ethnicity using a novel set of name and census location features. Methods Using the 1901 census, multiclass and binary classification machine learning pipelines were developed. The 13 ethnic categories examined were Aboriginal (First Nations, Métis, Inuit, and all-combined), Chinese, English, French, Irish, Italian, Japanese, Russian, Scottish, and others. Machine learning algorithms included regularized logistic regression, C-support vector, and naïve Bayes classifiers. Name features consisted of the entire name string, substrings, double-metaphones, and various name-entity patterns, while location features consisted of the entire location string and substrings of province, district, and subdistrict. Predictive performance metrics included sensitivity, specificity, positive predictive value, negative predictive value, F1, Area Under the Curve for Receiver Operating Characteristic curve, and accuracy. Results The census had 4,812,958 unique individuals. For multiclass classification, the highest performance achieved was 76% F1 and 91% accuracy. For binary classifications for Chinese, French, Italian, Japanese, Russian, and others, the F1 ranged from 68% to 95% (median 87%). The lower performance for English, Irish, and Scottish (F1 ranged from 63% to 67%) was likely due to their shared cultural and linguistic heritage. Adding census location features to the name-based models strongly improved the prediction in Aboriginal classification (F1 increased from 50% to 84%).
Conclusions The automated machine learning approach using only name and census location features can predict the ethnicity of Canadians with varying performance by specific ethnic categories.
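The abstract lists "the entire name string" and "substrings" among the name features. A minimal sketch of how such substring features might be extracted is given here, assuming plain character n-grams over a lowercased name. The function name `name_substrings`, the n-gram range, and the example name are illustrative assumptions, not the authors' pipeline.

```python
def name_substrings(name, min_n=2, max_n=3):
    """Return the whole name plus its character n-grams (hypothetical feature set)."""
    cleaned = name.strip().lower()
    features = [cleaned]  # the entire name string is itself a feature
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the name.
        for start in range(len(cleaned) - n + 1):
            features.append(cleaned[start:start + n])
    return features

# Illustrative input only; not taken from the census data.
example_features = name_substrings("Wong")
```

Features of this kind would typically be turned into a sparse count or indicator vector before being passed to a classifier such as regularized logistic regression.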


2021 ◽  
pp. 1-8
Author(s):  
Irzam Hardiansyah ◽  
Linnea Hamrefors ◽  
Monica Siqueiros ◽  
Terje Falck-Ytter ◽  
Kristiina Tammimies

Abstract Accurate zygosity determination is a fundamental step in twin research. Although DNA-based testing is the gold standard for determining zygosity, collecting biological samples is not feasible in all research settings or all families. Previous work has demonstrated the feasibility of zygosity estimation based on questionnaire (physical similarity) data in older twins, but the extent to which this is also a reliable approach in infancy is less well established. Here, we report the accuracy of different questionnaire-based zygosity determination approaches (traditional and machine learning) in 5.5-month-old twins. The participant cohort comprised 284 infant twin pairs (128 dizygotic and 156 monozygotic) who participated in the Babytwins Study Sweden (BATSS). Manual scoring based on an established technique validated in older twins accurately predicted 90.49% of the zygosities with a sensitivity of 91.65% and specificity of 89.06%. The machine learning approach improved the prediction accuracy to 93.10%, with a sensitivity of 91.30% and specificity of 94.29%. Additionally, we quantified the systematic impact of zygosity misclassification on estimates of genetic and environmental influences using simulation-based sensitivity analysis on a separate data set to show the implication of our machine learning accuracy gain. In conclusion, our study demonstrates the feasibility of determining zygosity in very young infant twins using a questionnaire with four items and builds a scalable machine learning model with better metrics, offering a viable alternative to DNA tests in large-scale infant twin studies.
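For readers unfamiliar with the metrics quoted above, the following sketch shows how accuracy, sensitivity, and specificity follow from the four cells of a binary confusion matrix. The counts are hypothetical toy values, not the BATSS data, and which zygosity is treated as the "positive" class is an assumption the abstract does not state.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives detected
        "accuracy": (tp + tn) / total,  # share of all pairs classified correctly
    }

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=45, fp=10, tn=40, fn=5)
```

Because sensitivity and specificity are computed on different subsets of the cohort, a gain in overall accuracy can coincide with a small drop in one of them, as in the figures reported above.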


Catalysts ◽  
2021 ◽  
Vol 11 (8) ◽  
pp. 1001
Author(s):  
Heesoo Park ◽  
El Tayeb Bentria ◽  
Sami Rtimi ◽  
Abdelilah Arredouani ◽  
Halima Bensmail ◽  
...  

Nowadays, most experiments to synthesize and test photocatalytic antimicrobial materials are based on trial and error. More often than not, the mechanism of action of the antimicrobial activity is unknown for a large spectrum of microorganisms. Here, we propose a scheme to speed up the design and optimization of photocatalytic antimicrobial surfaces tailored to give a balanced production of reactive oxygen species (ROS) upon illumination. Using an experiment-to-machine-learning scheme applied to a limited experimental dataset, we built a model that can predict the photocatalytic activity of materials for antimicrobial applications over a wide range of material compositions. This machine-learning-assisted strategy offers the opportunity to reduce the cost, labor, time, and precursors consumed during experiments that are based on trial and error. Our strategy may significantly accelerate the large-scale deployment of photocatalysts as a promising route to mitigate fomite transmission of pathogens (bacteria, viruses, fungi) in hospital settings and public places.


2019 ◽  
Vol 33 (5) ◽  
pp. 825-833 ◽  
Author(s):  
J. M. Bokhorst ◽  
A. Blank ◽  
A. Lugli ◽  
I. Zlobec ◽  
H. Dawson ◽  
...  

Abstract Tumor budding is a promising and cost-effective biomarker with strong prognostic value in colorectal cancer. However, challenges related to interobserver variability persist. Such variability may be reduced by immunohistochemistry and computer-aided tumor bud selection. Development of computer algorithms for this purpose requires unequivocal examples of individual tumor buds. As such, we undertook a large-scale, international, and digital observer study on individual tumor bud assessment. From a pool of 46 colorectal cancer cases with tumor budding, 3000 tumor bud candidates were selected, largely based on digital image analysis algorithms. For each candidate bud, an image patch (size 256 × 256 µm) was extracted from a pan cytokeratin-stained whole-slide image. Members of an International Tumor Budding Consortium (n = 7) were asked to categorize each candidate as either (1) tumor bud, (2) poorly differentiated cluster, or (3) neither, based on current definitions. Agreement was assessed with Cohen’s and Fleiss Kappa statistics. Fleiss Kappa showed moderate overall agreement between observers (0.42 and 0.51), while Cohen’s Kappas ranged from 0.25 to 0.63. Complete agreement by all seven observers was present for only 34% of the 3000 tumor bud candidates, while 59% of the candidates were agreed on by at least five of the seven observers. Despite reports of moderate-to-substantial agreement with respect to tumor budding grade, agreement with respect to individual pan cytokeratin-stained tumor buds is moderate at most. A machine learning approach may prove especially useful for a more robust assessment of individual tumor buds.
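Cohen's kappa, used above for pairwise observer agreement, corrects the raw agreement rate for the agreement expected by chance. The sketch below implements the standard two-rater formula, kappa = (p_observed - p_chance) / (1 - p_chance), on hypothetical toy labels. The label names and the example ratings are illustrative assumptions and do not come from the study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)

# Hypothetical ratings: "bud", "pdc" (poorly differentiated cluster), "neither".
observer_1 = ["bud", "bud", "pdc", "neither", "bud", "neither"]
observer_2 = ["bud", "pdc", "pdc", "neither", "bud", "bud"]
kappa = cohens_kappa(observer_1, observer_2)
```

In this toy case the observers agree on 4 of 6 candidates, yet kappa is only about 0.48 because roughly a third of that agreement would be expected by chance alone.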


Nutrients ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 3195
Author(s):  
Tazman Davies ◽  
Jimmy Chun Yu Louie ◽  
Tailane Scapin ◽  
Simone Pettigrew ◽  
Jason HY Wu ◽  
...  

Underconsumption of dietary fiber is prevalent worldwide and is associated with multiple adverse health conditions. Despite the importance of fiber, the labeling of fiber content on packaged foods and beverages is voluntary in most countries, making it challenging for consumers and policy makers to monitor fiber consumption. Here, we developed a machine learning approach for automated and systematic prediction of fiber content using nutrient information commonly available on packaged products. An Australian packaged food dataset with known fiber content information was divided into training (n = 8986) and test datasets (n = 2455). Utilization of a k-nearest neighbors machine learning algorithm explained a greater proportion of variance in fiber content than an existing manual fiber prediction approach (R² = 0.84 vs. R² = 0.68). Our findings highlight the opportunity to use machine learning to efficiently predict the fiber content of packaged products on a large scale.
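The following is a minimal sketch of k-nearest neighbors regression and the R² statistic used to compare the two approaches. It assumes unweighted Euclidean distance and a plain average of neighbor targets; the two-feature layout and all numbers are hypothetical toy values, not the Australian dataset or the authors' model configuration.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Predict a target by averaging the k nearest training targets."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)

def r_squared(y_true, y_pred):
    """Proportion of variance in y_true explained by y_pred."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical features: (carbohydrate, sugars) in g per 100 g; target: fiber in g.
train_x = [(10.0, 2.0), (50.0, 5.0), (70.0, 3.0), (20.0, 15.0)]
train_y = [1.0, 6.0, 9.0, 2.0]
fiber_estimate = knn_predict(train_x, train_y, (60.0, 4.0), k=2)
```

In practice the features would usually be standardized first, since Euclidean distance is otherwise dominated by whichever nutrient has the largest numeric range.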

