General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference

Author(s):  
Jingfei Du ◽  
Myle Ott ◽  
Haoran Li ◽  
Xing Zhou ◽  
Veselin Stoyanov


2019 ◽  
Vol 1 (3) ◽  
Author(s):  
A. Aziz Altowayan ◽  
Lixin Tao

We consider the following problem: given neural language models (embeddings), each of which is trained on an unknown data set, how can we determine which model would provide a better result when used for feature representation in a downstream task such as text classification or entity recognition? In this paper, we assess the word-similarity measure by analyzing its impact on word embeddings learned from various datasets and how these embeddings perform in a simple classification task. Word representations were learned and assessed under the same conditions. For training word vectors, we used the implementation of Continuous Bag of Words described in [1]. To assess the quality of the vectors, we applied the analogy-questions test for word similarity described in the same paper. Further, to measure the retrieval rate of an embedding model, we introduced a new metric (Average Retrieval Error), which measures the percentage of missing words in the model. We observe that scoring high accuracy on syntactic and semantic similarities between word pairs is not an indicator of better classification results. This observation can be justified by the fact that a domain-specific corpus contributes more to performance than a general-purpose corpus. For reproducibility, we release our experiment scripts and results.
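The Average Retrieval Error metric is defined above as the percentage of task-vocabulary words missing from an embedding model. A minimal sketch under that definition (the dictionary-backed model and the example words below are hypothetical, not the paper's data):

```python
def average_retrieval_error(model_vocab, task_vocab):
    """Percentage of task-vocabulary words missing from the embedding model.

    A lower value means better lexical coverage of the downstream task.
    """
    if not task_vocab:
        return 0.0
    missing = sum(1 for w in task_vocab if w not in model_vocab)
    return 100.0 * missing / len(task_vocab)

# Hypothetical example: a model trained on a general-purpose corpus
# may lack domain-specific terms from the classification task.
model = {"cell": [0.1], "protein": [0.2], "the": [0.3]}
task_words = ["cell", "protein", "apoptosis", "kinase"]
print(average_retrieval_error(model, task_words))  # 50.0
```

A high score flags a coverage mismatch between the training corpus and the task vocabulary, which is consistent with the paper's finding that domain fit matters more than similarity-benchmark accuracy.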


2014 ◽  
Vol 40 (3) ◽  
pp. 687-723 ◽  
Author(s):  
Cyril Allauzen ◽  
Bill Byrne ◽  
Adrià de Gispert ◽  
Gonzalo Iglesias ◽  
Michael Riley

This article describes the use of pushdown automata (PDA) in the context of statistical machine translation and alignment under a synchronous context-free grammar. We use PDAs to compactly represent the space of candidate translations generated by the grammar when applied to an input sentence. General-purpose PDA algorithms for replacement, composition, shortest path, and expansion are presented. We describe HiPDT, a hierarchical phrase-based decoder using the PDA representation and these algorithms. We contrast the complexity of this decoder with a decoder based on a finite-state automata representation, showing that PDAs provide a more suitable framework to achieve exact decoding for larger synchronous context-free grammars and smaller language models. We assess this experimentally on a large-scale Chinese-to-English alignment and translation task. In translation, we propose a two-pass decoding strategy involving a weaker language model in the first pass to address the results of the PDA complexity analysis. We study in depth the experimental conditions and trade-offs in which HiPDT can achieve state-of-the-art performance for large-scale SMT.
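The two-pass strategy described above can be sketched in miniature: a weak first-pass language model (here a unigram model) prunes the candidate space, and only the survivors are rescored with a stronger model (here a bigram model). All candidates and scores below are invented toy values, not the paper's Chinese-to-English system:

```python
import math

# Toy candidate translations with hypothetical translation-model log scores.
candidates = {
    "the cat sat": -1.0,
    "cat the sat": -1.2,
    "the sat cat": -1.1,
}

UNIGRAM = {"the": 0.5, "cat": 0.3, "sat": 0.2}       # weak first-pass LM
BIGRAM = {("the", "cat"): 0.6, ("cat", "sat"): 0.7,  # stronger second-pass LM
          ("the", "sat"): 0.1, ("sat", "cat"): 0.1,
          ("cat", "the"): 0.1, ("sat", "the"): 0.1}

def lm_score(sent, bigram=False):
    words = sent.split()
    if not bigram:
        return sum(math.log(UNIGRAM[w]) for w in words)
    return sum(math.log(BIGRAM.get((a, b), 1e-6))
               for a, b in zip(words, words[1:]))

# First pass: translation score + weak LM, keep only the 2 best candidates.
first = sorted(candidates, key=lambda s: candidates[s] + lm_score(s),
               reverse=True)[:2]
# Second pass: rescore only the survivors with the stronger LM.
best = max(first, key=lambda s: candidates[s] + lm_score(s, bigram=True))
print(best)  # the cat sat
```

The point of the two-pass design is that the cheap model keeps the exact first-pass search tractable over the large PDA-represented space, while the expensive model only ever touches the pruned set.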


2010 ◽  
Vol 17 (4) ◽  
pp. 455-483 ◽  
Author(s):  
YUQING GUO ◽  
HAIFENG WANG ◽  
JOSEF VAN GENABITH

Abstract This paper presents a general-purpose, wide-coverage, probabilistic sentence generator based on dependency n-gram models. This is particularly interesting as many semantic or abstract syntactic input specifications for sentence realisation can be represented as labelled bi-lexical dependencies or typed predicate-argument structures. Our generation method captures the mapping between semantic representations and surface forms by linearising a set of dependencies directly, rather than via the application of grammar rules as in more traditional chart-style or unification-based generators. In contrast to conventional n-gram language models over surface word forms, we exploit structural information and various linguistic features inherent in the dependency representations to constrain the generation space and improve the generation quality. A series of experiments shows that dependency-based n-gram models generalise well to different languages (English and Chinese) and representations (LFG and CoNLL). Compared with state-of-the-art generation systems, our general-purpose sentence realiser is highly competitive with the added advantages of being simple, fast, robust and accurate.
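The linearisation idea can be sketched as follows: enumerate surface orders of a head and its labelled dependents, and keep the order that scores best under an n-gram model over dependency labels. The bigram probabilities below are invented toy values, not estimates from the paper's treebanks:

```python
from itertools import permutations
import math

# Hypothetical label-bigram model: log-probability of label b
# immediately following label a in surface order.
BIGRAM_LOGP = {
    ("<s>", "SUBJ"): math.log(0.6), ("<s>", "OBJ"): math.log(0.1),
    ("<s>", "HEAD"): math.log(0.3),
    ("SUBJ", "HEAD"): math.log(0.7), ("SUBJ", "OBJ"): math.log(0.1),
    ("HEAD", "OBJ"): math.log(0.8), ("HEAD", "SUBJ"): math.log(0.1),
    ("OBJ", "HEAD"): math.log(0.2), ("OBJ", "SUBJ"): math.log(0.1),
    ("HEAD", "</s>"): math.log(0.1), ("OBJ", "</s>"): math.log(0.8),
    ("SUBJ", "</s>"): math.log(0.1),
}

def score(order):
    """Sum of label-bigram log-probabilities along one surface order."""
    labels = ["<s>"] + [lab for lab, _ in order] + ["</s>"]
    return sum(BIGRAM_LOGP.get((a, b), math.log(1e-6))
               for a, b in zip(labels, labels[1:]))

def linearise(head, dependents):
    """Pick the highest-scoring surface order of the head and its dependents."""
    items = [("HEAD", head)] + dependents
    best = max(permutations(items), key=score)
    return " ".join(word for _, word in best)

print(linearise("saw", [("SUBJ", "John"), ("OBJ", "Mary")]))  # John saw Mary
```

A real realiser constrains the search rather than enumerating all permutations, and conditions on richer features than bare labels, but the scoring-and-ranking shape is the same.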


2021 ◽  
Vol 9 ◽  
pp. 226-242
Author(s):  
Zhaofeng Wu ◽  
Hao Peng ◽  
Noah A. Smith

Abstract For natural language processing systems, two kinds of evidence support the use of text representations from neural language models "pretrained" on large unannotated corpora: performance on application-inspired benchmarks (Peters et al., 2018, inter alia), and the emergence of syntactic abstractions in those representations (Tenney et al., 2019, inter alia). On the other hand, the lack of grounded supervision calls into question how well these representations can ever capture meaning (Bender and Koller, 2020). We apply novel probes to recent language models—specifically focusing on predicate-argument structure as operationalized by semantic dependencies (Ivanova et al., 2012)—and find that, unlike syntax, semantics is not brought to the surface by today's pretrained models. We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning, yielding benefits to natural language understanding (NLU) tasks in the GLUE benchmark. This approach demonstrates the potential for general-purpose (rather than task-specific) linguistic supervision, above and beyond conventional pretraining and finetuning. Several diagnostics help to localize the benefits of our approach.
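A probing experiment of the kind described above can be sketched as a linear classifier trained on frozen representations: if the probe reaches high accuracy, the target information is linearly decodable, i.e. "brought to the surface". The vectors and labels below are synthetic stand-ins, not actual model states or semantic-dependency annotations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for frozen contextual features of word pairs (w_i, w_j), with a
# binary label: does a semantic dependency edge hold between them?
# In a real probe the features come from a pretrained model's hidden states.
pairs = rng.normal(size=(200, 16))
true_w = rng.normal(size=16)
labels = (pairs @ true_w > 0).astype(float)  # synthetic, linearly decodable

def train_probe(X, y, lr=0.5, steps=300):
    """Logistic-regression probe trained by gradient descent; the model
    supplying X stays frozen, only the probe weights are learned."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

w = train_probe(pairs, labels)
acc = ((pairs @ w > 0).astype(float) == labels).mean()
print(round(acc, 2))  # near-perfect here, since the labels are linear by design
```

On real representations the interesting outcome is the opposite one: the paper's finding is that such probes recover syntax far better than predicate-argument structure.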


2021 ◽  
Vol 11 (17) ◽  
pp. 7814
Author(s):  
Buddhika Kasthuriarachchy ◽  
Madhu Chetty ◽  
Adrian Shatte ◽  
Darren Walls

Obtaining meaning-rich representations of social media inputs, such as Tweets (unstructured and noisy text), from general-purpose pre-trained language models has become challenging, as these inputs typically deviate from mainstream English usage. The proposed research establishes effective methods for improving the comprehension of noisy texts. For this, we propose a new generic methodology to derive a diverse set of sentence vectors by combining and extracting various linguistic characteristics from the latent representations of multi-layer, pre-trained language models. Further, we clearly establish how BERT, a state-of-the-art pre-trained language model, comprehends the linguistic attributes of Tweets, in order to identify appropriate sentence representations. Five new probing tasks are developed for Tweets, which can serve as benchmark probing tasks to study noisy text comprehension. Experiments are carried out for classification accuracy by deriving the sentence vectors from GloVe-based pre-trained models and Sentence-BERT, and by using different hidden layers from the BERT model. We show that the initial and middle layers of BERT are better at capturing the key linguistic characteristics of noisy texts than its later layers. With complex predictive models, we further show that sentence-vector length matters less for capturing linguistic information, and that the proposed sentence vectors for noisy texts outperform existing state-of-the-art sentence vectors.
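A common way to turn one hidden layer into a sentence vector, as compared across layers above, is masked mean pooling over token vectors. The sketch below uses random arrays in place of real BERT hidden states (which the transformers library exposes via output_hidden_states), so the shapes, not the values, are the point:

```python
import numpy as np

def sentence_vector(hidden_states, layer, attention_mask):
    """Mean-pool the token vectors of one hidden layer into a sentence vector.

    hidden_states: list of (num_tokens, dim) arrays, one per layer
                   (index 0 = embedding layer, as in BERT's hidden-state output).
    attention_mask: (num_tokens,) 0/1 array marking real (non-padding) tokens.
    """
    layer_out = hidden_states[layer]              # (num_tokens, dim)
    mask = attention_mask[:, None].astype(float)  # (num_tokens, 1)
    return (layer_out * mask).sum(axis=0) / mask.sum()

# Stand-in for a BERT-base forward pass: 12 layers + the embedding layer.
rng = np.random.default_rng(0)
states = [rng.normal(size=(5, 8)) for _ in range(13)]
mask = np.array([1, 1, 1, 1, 0])                  # last token is padding

middle = sentence_vector(states, layer=6, attention_mask=mask)
print(middle.shape)  # (8,)
```

Swapping the `layer` argument is all it takes to compare initial, middle, and later layers, which is the axis along which the paper finds noisy-text information to vary.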


Author(s):  
Andri Setyorini ◽  
Niken Setyaningrum

Background: Old age is the final stage of the human life cycle, an inevitable part of life that every individual will experience. At this stage, individuals undergo many physical and mental changes, especially declines in the functions and abilities they once had. A preliminary study at the Tresna Werdha Yogyakarta Social House, Budhi Luhur Unit, found 16 elderly residents experiencing physical immobilization. The social house runs various activities for elderly residents who are still active, but residents with muscle weakness are unable to join these exercises, so Range of Motion (ROM) exercise is needed. Objective: The general purpose of this research is to determine the effect of active-assistive Range of Motion (ROM) exercise on increasing the range of joint motion in elderly people with impaired physical mobility at the Tresna Werdha Yogyakarta Social House, Budhi Luhur Unit. Method: This was a pre-experimental study using a one-group pretest-posttest design, in which the range of joint motion was measured before (pretest) and after (posttest) the ROM exercise. The subjects were all 14 elderly residents with impaired physical mobility at the Tresna Werdha Yogyakarta Social House, Budhi Luhur Unit. Data were analyzed using a paired-sample t-test. Result: The results show that active ROM (Range of Motion) exercise increases the range of joint motion in elderly people with impaired physical mobility at the Tresna Werdha Yogyakarta Social House, Budhi Luhur Unit. Conclusion: Active ROM (Range of Motion) exercise increases the range of joint motion in elderly people with impaired physical mobility at the Tresna Werdha Yogyakarta Social House, Budhi Luhur Unit.

