Generative Language Modeling for Antibody Design

2021 ◽  
Author(s):  
Richard W. Shuai ◽  
Jeffrey A. Ruffolo ◽  
Jeffrey J. Gray

Successful development of monoclonal antibodies (mAbs) for therapeutic applications is hindered by developability issues such as low solubility, low thermal stability, high aggregation, and high immunogenicity. The discovery of more developable mAb candidates relies on high-quality antibody libraries for isolating candidates with desirable properties. We present Immunoglobulin Language Model (IgLM), a deep generative language model for generating synthetic libraries by re-designing variable-length spans of antibody sequences. IgLM formulates antibody design as an autoregressive sequence generation task based on text infilling in natural language. We trained IgLM on approximately 558M antibody heavy- and light-chain variable sequences, conditioning on each sequence's chain type and species of origin. We demonstrate that IgLM can be applied to generate synthetic libraries that may accelerate the discovery of therapeutic antibody candidates.
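To make the infilling formulation concrete, the sketch below (not the authors' code; the tag names [HEAVY], [HUMAN], [MASK], [SEP], and [END] are hypothetical) shows how a variable-length span of an antibody sequence can be posed as an autoregressive generation target conditioned on chain type and species.

```python
# Toy illustration of span infilling as conditioned autoregressive generation.
# The conditioning and control tokens are placeholders, not IgLM's vocabulary.

def make_infilling_example(sequence, start, end, chain="[HEAVY]", species="[HUMAN]"):
    """Mask sequence[start:end] and build (prompt, target) strings.

    A language model trained on such pairs continues the prompt by generating
    the masked span after [SEP], so sampling new spans re-designs that region.
    """
    masked = sequence[:start] + "[MASK]" + sequence[end:]
    prompt = f"{chain} {species} {masked} [SEP]"
    target = sequence[start:end] + " [END]"
    return prompt, target


heavy = "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVS"  # example fragment
prompt, target = make_infilling_example(heavy, 30, 40)
print(prompt)
print(target)
```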

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Koichiro Saka ◽  
Taro Kakuzaki ◽  
Shoichi Metsugi ◽  
Daiki Kashiwagi ◽  
Kenji Yoshida ◽  
...  

Molecular evolution is an important step in the development of therapeutic antibodies. However, the current method of affinity maturation is overly costly and labor-intensive because of the repetitive mutation experiments needed to adequately explore sequence space. Here, we employed a sequence generation and prioritization procedure based on a long short-term memory (LSTM) network, a widely used deep generative model, to efficiently discover antibody sequences with higher affinity. We applied our method to the affinity maturation of antibodies against kynurenine, a metabolite related to the niacin synthesis pathway. Kynurenine-binding sequences were enriched through phage display panning using a kynurenine-binding-oriented human synthetic Fab library. We defined binding antibodies using a sequence repertoire from the NGS data to train the LSTM model. We confirmed that the likelihood of sequences generated by the trained LSTM correlated well with binding affinity. The affinity of the generated sequences is over 1800-fold higher than that of the parental clone. Moreover, compared to frequency-based screening using the same dataset, our machine learning approach generated sequences with greater affinity.
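A minimal sketch of the general recipe follows (assumptions: PyTorch, a character-level LSTM over the 20 amino acids, training loop omitted): fit an LSTM language model on binder sequences, then rank candidate sequences by their log-likelihood under the model.

```python
# Sketch only: an LSTM language model over amino-acid sequences and a
# likelihood scorer used to prioritize candidates (training loop omitted).
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
stoi = {a: i + 1 for i, a in enumerate(AMINO_ACIDS)}  # index 0 reserved for BOS

class SeqLSTM(nn.Module):
    def __init__(self, vocab=len(AMINO_ACIDS) + 1, emb=32, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x):
        h, _ = self.lstm(self.embed(x))
        return self.head(h)

def log_likelihood(model, seq):
    """Sum of per-residue log-probabilities under the LSTM (higher = preferred)."""
    ids = torch.tensor([[0] + [stoi[a] for a in seq]])  # prepend BOS token
    logits = model(ids[:, :-1])
    logp = torch.log_softmax(logits, dim=-1)
    targets = ids[:, 1:]
    return logp.gather(-1, targets.unsqueeze(-1)).sum().item()

model = SeqLSTM()  # in practice, trained on NGS-derived binder sequences
candidates = ["ARDYWGQG", "ARDFWGQG"]  # hypothetical CDR variants
ranked = sorted(candidates, key=lambda s: log_likelihood(model, s), reverse=True)
print(ranked)
```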


Author(s):  
Kelvin Guu ◽  
Tatsunori B. Hashimoto ◽  
Yonatan Oren ◽  
Percy Liang

We propose a new generative language model for sentences that first samples a prototype sentence from the training corpus and then edits it into a new sentence. Compared to traditional language models that generate from scratch either left-to-right or by first sampling a latent sentence vector, our prototype-then-edit model improves perplexity on language modeling and generates higher quality outputs according to human evaluation. Furthermore, the model gives rise to a latent edit vector that captures interpretable semantics such as sentence similarity and sentence-level analogies.
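The generative process can be illustrated with the toy sketch below (this is not the paper's neural editor; the word-swap "editor" and synonym table are placeholders): sample a prototype sentence from the corpus, sample a latent edit vector, and decode a new sentence conditioned on both.

```python
# Toy sketch of the prototype-then-edit generative process.
# A real model replaces `edit` with a neural editor conditioned on z.
import random

corpus = [
    "the service was quick and friendly",
    "the food was cold when it arrived",
]
synonyms = {"quick": "fast", "friendly": "welcoming", "cold": "lukewarm"}

def sample_edit_vector(dim=4):
    # stands in for z ~ p(z); the paper uses a continuous latent edit vector
    return [random.gauss(0.0, 1.0) for _ in range(dim)]

def edit(prototype, z):
    # placeholder editor: the latent vector only controls how much to edit
    rate = min(1.0, abs(z[0]))
    words = [synonyms.get(w, w) if random.random() < rate else w
             for w in prototype.split()]
    return " ".join(words)

prototype = random.choice(corpus)            # step 1: sample a prototype
z = sample_edit_vector()                     # step 2: sample a latent edit vector
print(prototype, "->", edit(prototype, z))   # step 3: decode the edited sentence
```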


2020 ◽  
Vol 34 (10) ◽  
pp. 13859-13860
Author(s):  
Yiyuan Li ◽  
Antonios Anastasopoulos ◽  
Alan W. Black

Current grammatical error correction (GEC) models typically treat the task as sequence generation, which requires large amounts of annotated data and limits their applicability in data-limited settings. We incorporate contextual information from a pre-trained language model to make better use of available annotation and to benefit multilingual scenarios. Results show the strong potential of Bidirectional Encoder Representations from Transformers (BERT) for the grammatical error correction task.
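As a hedged illustration of the underlying idea (not the paper's system; the sentence and model choice are assumptions), a pre-trained BERT masked language model can propose replacements for a suspected error position via fill-mask scoring.

```python
# Requires the Hugging Face `transformers` package; downloads bert-base-cased.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-cased")

# Mask the word suspected to be erroneous and let BERT rank replacements.
sentence = "She [MASK] to school every day."  # source might have had "go"
for candidate in unmasker(sentence, top_k=3):
    print(f"{candidate['token_str']:>10}  {candidate['score']:.3f}")
```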


2020 ◽  
Vol 389 ◽  
pp. 93-107
Author(s):  
Jinmeng Wu ◽  
Tingting Mu ◽  
Jeyarajan Thiyagalingam ◽  
John Y. Goulermas

2020 ◽  
Vol 6 (26) ◽  
pp. eaaz1002 ◽  
Author(s):  
Stephen Ferrigno ◽  
Samuel J. Cheyette ◽  
Steven T. Piantadosi ◽  
Jessica F. Cantlon

The question of what computational capacities, if any, differ between humans and nonhuman animals has been at the core of foundational debates in cognitive psychology, anthropology, linguistics, and animal behavior. The capacity to form nested hierarchical representations is hypothesized to be essential to uniquely human thought, but its origins in evolution, development, and culture are controversial. We used a nonlinguistic sequence generation task to test whether subjects generalize sequential groupings of items to a center-embedded, recursive structure. Children (3 to 5 years old), U.S. adults, and adults from a Bolivian indigenous group spontaneously induced recursive structures from ambiguous training data. In contrast, monkeys did so only with additional exposure. We quantify these patterns using a Bayesian mixture model over logically possible strategies. Our results show that recursive hierarchical strategies are robust in human thought, both early in development and across cultures, but the capacity itself is not unique to humans.
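The structural contrast at issue can be shown with a small example (an illustration under assumptions about the bracket-pair stimuli, not the authors' experimental code): a recursive strategy closes items in reverse order of opening (center-embedding), while a crossed strategy closes them in the same order.

```python
# Illustration of center-embedded (recursive) vs crossed sequence orderings.
import random

PAIRS = [("[", "]"), ("(", ")")]

def center_embedded(pairs):
    """Open all brackets, then close them in reverse order: A B b a."""
    opens = [o for o, _ in pairs]
    closes = [c for _, c in reversed(pairs)]
    return " ".join(opens + closes)

def crossed(pairs):
    """Close brackets in the same order they were opened: A B a b."""
    opens = [o for o, _ in pairs]
    closes = [c for _, c in pairs]
    return " ".join(opens + closes)

order = random.sample(PAIRS, k=2)
print("recursive:", center_embedded(order))  # e.g. "( [ ] )"
print("crossed:  ", crossed(order))          # e.g. "( [ ) ]"
```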


2021 ◽  
pp. 1-17
Author(s):  
Luping Liu ◽  
Meiling Wang ◽  
Xiaohai He ◽  
Linbo Qing ◽  
Jin Zhang

Joint extraction of entities and relations from unstructured text is an essential step in constructing a knowledge base. However, relational facts in these texts are often complicated: most contain overlapping triplets, which keeps the joint extraction task challenging. This paper proposes a novel Sequence-to-Sequence (Seq2Seq) framework to handle the overlapping issue, modeling triplet extraction as a sequence generation task. Specifically, a unique cascade structure connects a transformer and a pointer network to extract entities and relations jointly. In this way, sequences are generated at the triplet level, which speeds up the decoding process. In addition, a syntax-guided encoder explicitly integrates the sentence's syntactic structure into the transformer encoder, helping the encoder attend more accurately to syntax-related words. Extensive experiments were conducted on three public datasets, namely NYT24, NYT29, and WebNLG, and comparisons with various baselines show the validity of the model. When a pre-trained BERT model is employed as the encoder, performance improves further: the F1 scores on the three datasets surpass the strongest baseline by 5.7%, 5.6%, and 4.4%, respectively.
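The core idea of casting triplet extraction as sequence generation can be sketched as follows (the linearization format and the <t>/<r> delimiters are assumptions, not the paper's exact scheme): overlapping triplets are serialized into a single target sequence that a Seq2Seq decoder can emit triplet by triplet.

```python
# Sketch: linearize (head, relation, tail) triplets into one decoder target,
# and recover them from generated output. Overlapping entities are handled
# naturally because an entity may appear in several serialized triplets.

def linearize(triplets):
    """Turn (head, relation, tail) triplets into a single target string."""
    return " <t> ".join(f"{h} <r> {r} <r> {t}" for h, r, t in triplets)

def delinearize(target):
    """Recover triplets from a generated sequence."""
    out = []
    for chunk in target.split(" <t> "):
        h, r, t = chunk.split(" <r> ")
        out.append((h, r, t))
    return out

triplets = [("Obama", "born_in", "Hawaii"),
            ("Hawaii", "located_in", "USA")]   # "Hawaii" overlaps two triplets
target = linearize(triplets)
print(target)
print(delinearize(target) == triplets)  # True
```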


2021 ◽  
pp. 1-14
Author(s):  
Ethan Porter ◽  
Yamil R. Velez

Abstract Although placebo conditions are ubiquitous in survey experiments, little evidence guides common practices for their use and selection. How should scholars choose and construct placebos? First, we review the role of placebos in published survey experiments, finding that placebos are used inconsistently. Then, drawing on the medical literature, we clarify the role that placebos play in accounting for nonspecific effects (NSEs), or the effects of ancillary features of experiments. We argue that, in the absence of precise knowledge of NSEs that placebos are adjusting for, researchers should average over a corpus of many placebos. We demonstrate this agnostic approach to placebo construction through the use of GPT-2, a generative language model trained on a database of over 1 million internet news pages. Using GPT-2, we devise 5,000 distinct placebos and administer two experiments (N = 2,975). Our results illustrate how researchers can minimize their role in placebo selection through automated processes. We conclude by offering tools for incorporating computer-generated placebo text vignettes into survey experiments and developing recommendations for best practice.
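A minimal sketch of the automated placebo-generation step is shown below (the prompt and sampling parameters are assumptions, not the authors' settings): sample many short GPT-2 continuations to build a corpus of placebo vignettes over which effects can be averaged.

```python
# Requires the Hugging Face `transformers` package; downloads the gpt2 model.
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="gpt2")
set_seed(0)

prompt = "In local news today,"  # hypothetical neutral prompt
placebos = generator(prompt, max_length=60, num_return_sequences=5,
                     do_sample=True, temperature=0.9)
for p in placebos:
    print(p["generated_text"].replace("\n", " "), "\n---")
```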


2021 ◽  
Author(s):  
Douglas Summers-Stay ◽  
Claire Bonial ◽  
Clare Voss
