Generative Language Modeling for Antibody Design

2021 ◽  
Author(s):  
Richard W. Shuai ◽  
Jeffrey A. Ruffolo ◽  
Jeffrey J. Gray

Successful development of monoclonal antibodies (mAbs) for therapeutic applications is hindered by developability issues such as low solubility, low thermal stability, high aggregation, and high immunogenicity. The discovery of more developable mAb candidates relies on high-quality antibody libraries for isolating candidates with desirable properties. We present Immunoglobulin Language Model (IgLM), a deep generative language model for generating synthetic libraries by re-designing variable-length spans of antibody sequences. IgLM formulates antibody design as an autoregressive sequence generation task based on text infilling in natural language. We trained IgLM on approximately 558M antibody heavy- and light-chain variable sequences, conditioning on each sequence's chain type and species of origin. We demonstrate that IgLM can be applied to generate synthetic libraries that may accelerate the discovery of therapeutic antibody candidates.
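To make the infilling formulation concrete, the sketch below (not the authors' code; the tag names [HEAVY], [HUMAN], [MASK], [SEP], and [END] are hypothetical) shows how a variable-length span of an antibody sequence can be posed as an autoregressive generation target conditioned on chain type and species.

```python
# Toy illustration of span infilling as conditioned autoregressive generation.
# The conditioning and control tokens are placeholders, not IgLM's vocabulary.

def make_infilling_example(sequence, start, end, chain="[HEAVY]", species="[HUMAN]"):
    """Mask sequence[start:end] and build (prompt, target) strings.

    A language model trained on such pairs continues the prompt by generating
    the masked span after [SEP], so sampling new spans re-designs that region.
    """
    masked = sequence[:start] + "[MASK]" + sequence[end:]
    prompt = f"{chain} {species} {masked} [SEP]"
    target = sequence[start:end] + " [END]"
    return prompt, target


heavy = "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVS"  # example fragment
prompt, target = make_infilling_example(heavy, 30, 40)
print(prompt)
print(target)
```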

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Koichiro Saka ◽  
Taro Kakuzaki ◽  
Shoichi Metsugi ◽  
Daiki Kashiwagi ◽  
Kenji Yoshida ◽  
...  

Molecular evolution is an important step in the development of therapeutic antibodies. However, the current method of affinity maturation is overly costly and labor-intensive because of the repetitive mutation experiments needed to adequately explore sequence space. Here, we employed a sequence generation and prioritization procedure based on a long short-term memory (LSTM) network, a widely used deep generative model, to efficiently discover antibody sequences with higher affinity. We applied our method to the affinity maturation of antibodies against kynurenine, a metabolite related to the niacin synthesis pathway. Kynurenine-binding sequences were enriched through phage display panning using a kynurenine-binding-oriented human synthetic Fab library. We defined binding antibodies using a sequence repertoire from the NGS data to train the LSTM model. We confirmed that the likelihood of sequences generated by the trained LSTM correlated well with binding affinity. The affinity of the generated sequences is over 1800-fold higher than that of the parental clone. Moreover, compared to frequency-based screening using the same dataset, our machine learning approach generated sequences with greater affinity.
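A minimal sketch of the general recipe follows (assumptions: PyTorch, a character-level LSTM over the 20 amino acids, training loop omitted): fit an LSTM language model on binder sequences, then rank candidate sequences by their log-likelihood under the model.

```python
# Sketch only: an LSTM language model over amino-acid sequences and a
# likelihood scorer used to prioritize candidates (training loop omitted).
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
stoi = {a: i + 1 for i, a in enumerate(AMINO_ACIDS)}  # index 0 reserved for BOS

class SeqLSTM(nn.Module):
    def __init__(self, vocab=len(AMINO_ACIDS) + 1, emb=32, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x):
        h, _ = self.lstm(self.embed(x))
        return self.head(h)

def log_likelihood(model, seq):
    """Sum of per-residue log-probabilities under the LSTM (higher = preferred)."""
    ids = torch.tensor([[0] + [stoi[a] for a in seq]])  # prepend BOS token
    logits = model(ids[:, :-1])
    logp = torch.log_softmax(logits, dim=-1)
    targets = ids[:, 1:]
    return logp.gather(-1, targets.unsqueeze(-1)).sum().item()

model = SeqLSTM()  # in practice, trained on NGS-derived binder sequences
candidates = ["ARDYWGQG", "ARDFWGQG"]  # hypothetical CDR variants
ranked = sorted(candidates, key=lambda s: log_likelihood(model, s), reverse=True)
print(ranked)
```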


Author(s):  
Kelvin Guu ◽  
Tatsunori B. Hashimoto ◽  
Yonatan Oren ◽  
Percy Liang

We propose a new generative language model for sentences that first samples a prototype sentence from the training corpus and then edits it into a new sentence. Compared to traditional language models that generate from scratch either left-to-right or by first sampling a latent sentence vector, our prototype-then-edit model improves perplexity on language modeling and generates higher quality outputs according to human evaluation. Furthermore, the model gives rise to a latent edit vector that captures interpretable semantics such as sentence similarity and sentence-level analogies.
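The generative process can be illustrated with the toy sketch below (this is not the paper's neural editor; the word-swap "editor" and synonym table are placeholders): sample a prototype sentence from the corpus, sample a latent edit vector, and decode a new sentence conditioned on both.

```python
# Toy sketch of the prototype-then-edit generative process.
# A real model replaces `edit` with a neural editor conditioned on z.
import random

corpus = [
    "the service was quick and friendly",
    "the food was cold when it arrived",
]
synonyms = {"quick": "fast", "friendly": "welcoming", "cold": "lukewarm"}

def sample_edit_vector(dim=4):
    # stands in for z ~ p(z); the paper uses a continuous latent edit vector
    return [random.gauss(0.0, 1.0) for _ in range(dim)]

def edit(prototype, z):
    # placeholder editor: the latent vector only controls how much to edit
    rate = min(1.0, abs(z[0]))
    words = [synonyms.get(w, w) if random.random() < rate else w
             for w in prototype.split()]
    return " ".join(words)

prototype = random.choice(corpus)            # step 1: sample a prototype
z = sample_edit_vector()                     # step 2: sample a latent edit vector
print(prototype, "->", edit(prototype, z))   # step 3: decode the edited sentence
```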


2020 ◽  
Vol 34 (10) ◽  
pp. 13859-13860
Author(s):  
Yiyuan Li ◽  
Antonios Anastasopoulos ◽  
Alan W. Black

Current grammatical error correction (GEC) models typically treat the task as sequence generation, which requires large amounts of annotated data and limits their applicability in data-limited settings. We incorporate contextual information from a pre-trained language model to make better use of available annotation and to benefit multilingual scenarios. Results show the strong potential of Bidirectional Encoder Representations from Transformers (BERT) for the grammatical error correction task.
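As a hedged illustration of the underlying idea (not the paper's system; the sentence and model choice are assumptions), a pre-trained BERT masked language model can propose replacements for a suspected error position via fill-mask scoring.

```python
# Requires the Hugging Face `transformers` package; downloads bert-base-cased.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-cased")

# Mask the word suspected to be erroneous and let BERT rank replacements.
sentence = "She [MASK] to school every day."  # source might have had "go"
for candidate in unmasker(sentence, top_k=3):
    print(f"{candidate['token_str']:>10}  {candidate['score']:.3f}")
```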


2020 ◽  
Vol 389 ◽  
pp. 93-107
Author(s):  
Jinmeng Wu ◽  
Tingting Mu ◽  
Jeyarajan Thiyagalingam ◽  
John Y. Goulermas

2020 ◽  
Vol 6 (26) ◽  
pp. eaaz1002 ◽  
Author(s):  
Stephen Ferrigno ◽  
Samuel J. Cheyette ◽  
Steven T. Piantadosi ◽  
Jessica F. Cantlon

The question of what computational capacities, if any, differ between humans and nonhuman animals has been at the core of foundational debates in cognitive psychology, anthropology, linguistics, and animal behavior. The capacity to form nested hierarchical representations is hypothesized to be essential to uniquely human thought, but its origins in evolution, development, and culture are controversial. We used a nonlinguistic sequence generation task to test whether subjects generalize sequential groupings of items to a center-embedded, recursive structure. Children (3 to 5 years old), U.S. adults, and adults from a Bolivian indigenous group spontaneously induced recursive structures from ambiguous training data. In contrast, monkeys did so only with additional exposure. We quantify these patterns using a Bayesian mixture model over logically possible strategies. Our results show that recursive hierarchical strategies are robust in human thought, both early in development and across cultures, but the capacity itself is not unique to humans.
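The structural contrast at issue can be shown with a small example (an illustration under assumptions about the bracket-pair stimuli, not the authors' experimental code): a recursive strategy closes items in reverse order of opening (center-embedding), while a crossed strategy closes them in the same order.

```python
# Illustration of center-embedded (recursive) vs crossed sequence orderings.
import random

PAIRS = [("[", "]"), ("(", ")")]

def center_embedded(pairs):
    """Open all brackets, then close them in reverse order: A B b a."""
    opens = [o for o, _ in pairs]
    closes = [c for _, c in reversed(pairs)]
    return " ".join(opens + closes)

def crossed(pairs):
    """Close brackets in the same order they were opened: A B a b."""
    opens = [o for o, _ in pairs]
    closes = [c for _, c in pairs]
    return " ".join(opens + closes)

order = random.sample(PAIRS, k=2)
print("recursive:", center_embedded(order))  # e.g. "( [ ] )"
print("crossed:  ", crossed(order))          # e.g. "( [ ) ]"
```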


2021 ◽  
pp. 1-17
Author(s):  
Luping Liu ◽  
Meiling Wang ◽  
Xiaohai He ◽  
Linbo Qing ◽  
Jin Zhang

Joint extraction of entities and relations from unstructured text is an essential step in constructing a knowledge base. However, relational facts in these texts are often complicated: most contain overlapping triplets, which keeps the joint extraction task challenging. This paper proposes a novel Sequence-to-Sequence (Seq2Seq) framework to handle the overlapping issue, modeling triplet extraction as a sequence generation task. Specifically, a unique cascade structure connects a transformer and a pointer network to extract entities and relations jointly. In this way, sequences are generated at the triplet level, which speeds up the decoding process. In addition, a syntax-guided encoder explicitly integrates the sentence's syntactic structure into the transformer encoder, helping the encoder attend more accurately to syntax-related words. Extensive experiments were conducted on three public datasets, namely NYT24, NYT29, and WebNLG, and comparisons with various baselines show the validity of the model. When a pre-trained BERT model is employed as the encoder, performance improves further: the F1 scores on the three datasets surpass the strongest baseline by 5.7%, 5.6%, and 4.4%, respectively.
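The core idea of casting triplet extraction as sequence generation can be sketched as follows (the linearization format and the <t>/<r> delimiters are assumptions, not the paper's exact scheme): overlapping triplets are serialized into a single target sequence that a Seq2Seq decoder can emit triplet by triplet.

```python
# Sketch: linearize (head, relation, tail) triplets into one decoder target,
# and recover them from generated output. Overlapping entities are handled
# naturally because an entity may appear in several serialized triplets.

def linearize(triplets):
    """Turn (head, relation, tail) triplets into a single target string."""
    return " <t> ".join(f"{h} <r> {r} <r> {t}" for h, r, t in triplets)

def delinearize(target):
    """Recover triplets from a generated sequence."""
    out = []
    for chunk in target.split(" <t> "):
        h, r, t = chunk.split(" <r> ")
        out.append((h, r, t))
    return out

triplets = [("Obama", "born_in", "Hawaii"),
            ("Hawaii", "located_in", "USA")]   # "Hawaii" overlaps two triplets
target = linearize(triplets)
print(target)
print(delinearize(target) == triplets)  # True
```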


2021 ◽  
pp. 1-14
Author(s):  
Ethan Porter ◽  
Yamil R. Velez

Abstract Although placebo conditions are ubiquitous in survey experiments, little evidence guides common practices for their use and selection. How should scholars choose and construct placebos? First, we review the role of placebos in published survey experiments, finding that placebos are used inconsistently. Then, drawing on the medical literature, we clarify the role that placebos play in accounting for nonspecific effects (NSEs), or the effects of ancillary features of experiments. We argue that, in the absence of precise knowledge of NSEs that placebos are adjusting for, researchers should average over a corpus of many placebos. We demonstrate this agnostic approach to placebo construction through the use of GPT-2, a generative language model trained on a database of over 1 million internet news pages. Using GPT-2, we devise 5,000 distinct placebos and administer two experiments (N = 2,975). Our results illustrate how researchers can minimize their role in placebo selection through automated processes. We conclude by offering tools for incorporating computer-generated placebo text vignettes into survey experiments and developing recommendations for best practice.
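A minimal sketch of the automated placebo-generation step is shown below (the prompt and sampling parameters are assumptions, not the authors' settings): sample many short GPT-2 continuations to build a corpus of placebo vignettes over which effects can be averaged.

```python
# Requires the Hugging Face `transformers` package; downloads the gpt2 model.
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="gpt2")
set_seed(0)

prompt = "In local news today,"  # hypothetical neutral prompt
placebos = generator(prompt, max_length=60, num_return_sequences=5,
                     do_sample=True, temperature=0.9)
for p in placebos:
    print(p["generated_text"].replace("\n", " "), "\n---")
```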


2021 ◽  
Author(s):  
Douglas Summers-Stay ◽  
Claire Bonial ◽  
Clare Voss
