Multichannel Generative Language Model: Learning All Possible Factorizations Within and Across Channels

Author(s):  
Harris Chan ◽  
Jamie Kiros ◽  
William Chan


Author(s):  
Gretel Liz De la Peña Sarracén ◽  
Paolo Rosso

Abstract The proliferation of harmful content on social media affects a large part of the user community. Therefore, several approaches have emerged to control this phenomenon automatically. However, this is still quite a challenging task. In this paper, we explore offensive language as a particular case of harmful content and focus our study on the analysis of keywords in available datasets composed of offensive tweets. Thus, we aim to identify relevant words in those datasets and analyze how they can affect model learning. For keyword extraction, we propose an unsupervised hybrid approach which combines the multi-head self-attention of BERT with reasoning on a word graph. The attention mechanism captures relationships among words in context while a language model is learned. These relationships are then used to generate a graph, from which we identify the most relevant words using eigenvector centrality. Experiments were performed by means of two mechanisms. On the one hand, we used an information retrieval system to evaluate the impact of the keywords in recovering offensive tweets from a dataset. On the other hand, we evaluated a keyword-based model for offensive language detection. The results highlight some points to consider when training models with the available datasets.
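
The following Python sketch is our own rough reconstruction of the kind of pipeline the abstract describes, not the authors' code: BERT's averaged self-attention weights are treated as a weighted word graph, and tokens are ranked by eigenvector centrality. The model name, the averaging over layers and heads, and the token filtering are assumptions.

```python
import torch
import networkx as nx
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

def keyword_scores(text: str) -> dict:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Average attention over all layers and heads -> (seq_len, seq_len)
    attn = torch.stack(outputs.attentions).mean(dim=(0, 2))[0]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    # Use the averaged attention matrix as the adjacency matrix of a word graph
    graph = nx.from_numpy_array(attn.numpy(), create_using=nx.DiGraph)
    centrality = nx.eigenvector_centrality_numpy(graph, weight="weight")
    return {tok: centrality[i] for i, tok in enumerate(tokens)
            if tok not in ("[CLS]", "[SEP]")}

# Tokens with the highest centrality act as candidate keywords for a tweet
print(sorted(keyword_scores("you are a hateful idiot").items(),
             key=lambda kv: -kv[1])[:5])
```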


Author(s):  
Kelvin Guu ◽  
Tatsunori B. Hashimoto ◽  
Yonatan Oren ◽  
Percy Liang

We propose a new generative language model for sentences that first samples a prototype sentence from the training corpus and then edits it into a new sentence. Compared to traditional language models that generate from scratch either left-to-right or by first sampling a latent sentence vector, our prototype-then-edit model improves perplexity on language modeling and generates higher quality outputs according to human evaluation. Furthermore, the model gives rise to a latent edit vector that captures interpretable semantics such as sentence similarity and sentence-level analogies.
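
A minimal sketch of the two-stage generative process described above, under the assumption of an already-trained editor; `NeuralEditor` is a hypothetical stand-in for the authors' sequence-to-sequence editor, and the Gaussian edit vector is illustrative.

```python
import random
import numpy as np

def sample_sentence(corpus, editor, edit_dim=128):
    prototype = random.choice(corpus)      # step 1: sample a prototype sentence
    z = np.random.randn(edit_dim)          # step 2: sample a latent edit vector
    return editor.decode(prototype, z)     # step 3: edit the prototype into a new sentence

class NeuralEditor:
    """Placeholder editor. A trained editor would attend to the prototype
    and condition on the edit vector z; here it returns the prototype
    unchanged purely to keep the sketch runnable."""
    def decode(self, prototype, z):
        return prototype

corpus = ["the food was great", "service was slow but friendly"]
print(sample_sentence(corpus, NeuralEditor()))
```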


2020 ◽  
Vol 389 ◽  
pp. 93-107
Author(s):  
Jinmeng Wu ◽  
Tingting Mu ◽  
Jeyarajan Thiyagalingam ◽  
John Y. Goulermas

2021 ◽  
pp. 1-14
Author(s):  
Ethan Porter ◽  
Yamil R. Velez

Abstract Although placebo conditions are ubiquitous in survey experiments, little evidence guides common practices for their use and selection. How should scholars choose and construct placebos? First, we review the role of placebos in published survey experiments, finding that placebos are used inconsistently. Then, drawing on the medical literature, we clarify the role that placebos play in accounting for nonspecific effects (NSEs), or the effects of ancillary features of experiments. We argue that, in the absence of precise knowledge of NSEs that placebos are adjusting for, researchers should average over a corpus of many placebos. We demonstrate this agnostic approach to placebo construction through the use of GPT-2, a generative language model trained on a database of over 1 million internet news pages. Using GPT-2, we devise 5,000 distinct placebos and administer two experiments (N = 2,975). Our results illustrate how researchers can minimize their role in placebo selection through automated processes. We conclude by offering tools for incorporating computer-generated placebo text vignettes into survey experiments and developing recommendations for best practice.
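
A hedged sketch of how such a placebo corpus could be produced with off-the-shelf GPT-2 via Hugging Face transformers; the prompt, sampling parameters, and batch size are illustrative assumptions, not the authors' settings.

```python
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="gpt2")
set_seed(42)

prompt = "In local news today,"  # hypothetical neutral prompt
placebos = [
    out["generated_text"]
    for out in generator(prompt, max_length=80, do_sample=True,
                         top_p=0.9, num_return_sequences=50)
]
# Repeat (or batch) until the desired corpus size is reached, e.g. 5,000 texts,
# then randomly assign one placebo per respondent in the control condition.
```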


2021 ◽  
Author(s):  
Douglas Summers-Stay ◽  
Claire Bonial ◽  
Clare Voss

2021 ◽  
Author(s):  
Richard W. Shuai ◽  
Jeffrey A. Ruffolo ◽  
Jeffrey J. Gray

Successful development of monoclonal antibodies (mAbs) for therapeutic applications is hindered by developability issues such as low solubility, low thermal stability, high aggregation, and high immunogenicity. The discovery of more developable mAb candidates relies on high-quality antibody libraries for isolating candidates with desirable properties. We present the Immunoglobulin Language Model (IgLM), a deep generative language model for generating synthetic libraries by re-designing variable-length spans of antibody sequences. IgLM formulates antibody design as an autoregressive sequence generation task based on text-infilling in natural language. We trained IgLM on approximately 558M antibody heavy- and light-chain variable sequences, conditioning on each sequence's chain type and species of origin. We demonstrate that IgLM can be applied to generate synthetic libraries that may accelerate the discovery of therapeutic antibody candidates.
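
The text-infilling formulation can be made concrete with a small sketch: the masked span is moved to the end of the sequence so that an autoregressive model can generate it conditioned on chain type, species, and the flanking context. The tag names, span lengths, and formatting below are assumptions for illustration, not the IgLM implementation.

```python
import random

def format_infilling_example(sequence: str, chain: str, species: str,
                             min_span: int = 5, max_span: int = 15) -> str:
    span_len = random.randint(min_span, min(max_span, len(sequence) - 1))
    start = random.randint(0, len(sequence) - span_len)
    prefix = sequence[:start]
    span = sequence[start:start + span_len]
    suffix = sequence[start + span_len:]
    # Conditioning tags + masked sequence + separator + target span
    return f"[{species}] [{chain}] {prefix}[MASK]{suffix} [SEP] {span} [END]"

example = format_infilling_example(
    "EVQLVESGGGLVQPGGSLRLSCAASGFTFS", chain="HEAVY", species="HUMAN")
print(example)
```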


2008 ◽  
Vol 19 (9) ◽  
pp. 2449-2460 ◽  
Author(s):  
Mei WANG ◽  
Xiang-Dong ZHOU ◽  
Jun-Qi ZHANG ◽  
Hong-Tao XU ◽  
Bai-Le SHI

2021 ◽  
Author(s):  
Su-Jeong Park ◽  
Soon-Seo Park ◽  
Han-Lim Choi ◽  
Kyeong-Soo An ◽  
Young-Gon Kim
