RetroGAN: A Cyclic Post-Specialization System for Improving Out-of-Knowledge and Rare Word Representations

Author(s):  
Pedro Colon-Hernandez ◽  
Yida Xin ◽  
Henry Lieberman ◽  
Catherine Havasi ◽  
Cynthia Breazeal ◽  
...  
Keyword(s):  
Informatics ◽  
2021 ◽  
Vol 8 (1) ◽  
pp. 20
Author(s):  
Giovanni Bonetta ◽  
Marco Roberti ◽  
Rossella Cancelliere ◽  
Patrick Gallinari

In this paper, we analyze the problem of generating fluent English utterances from tabular data, focusing on the development of a sequence-to-sequence neural model with two major features: the ability to read and generate character-wise, and the ability to switch between generating characters and copying them from the input, an essential feature when inputs contain rare words such as proper names, telephone numbers, or foreign words. Working with characters instead of words is challenging: it can make the training phase harder and raise the probability of errors during inference. Nevertheless, our work shows that these issues can be solved and that the effort is repaid by a fully end-to-end system whose inputs and outputs are not constrained to a predefined vocabulary, as they are in word-based models. Furthermore, our copying technique is integrated with an innovative shift mechanism, which enhances the ability to produce outputs directly from inputs. We assess performance on the E2E dataset, the benchmark used for the E2E NLG challenge, and on a modified version of it, created to highlight the rare word copying capabilities of our model. The results demonstrate clear improvements over the baseline and promising performance compared to recent techniques in the literature.
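
The switch between generating and copying described in this abstract can be pictured as a mixture of two distributions. The sketch below is a generic pointer-generator style blend, not the authors' model: it omits their shift mechanism, and the names `mix_copy_generate`, `p_gen`, `gen_probs`, and `attention` are hypothetical.

```python
def mix_copy_generate(p_gen, gen_probs, attention, input_chars):
    """Blend a generation distribution with a copy distribution.

    p_gen       -- assumed scalar in [0, 1]: probability of generating
    gen_probs   -- dict mapping each vocabulary character to its probability
    attention   -- attention weights over input positions (sums to 1)
    input_chars -- the input characters, aligned with `attention`
    """
    # Generation path: scale the vocabulary distribution by p_gen.
    mixed = {ch: p_gen * p for ch, p in gen_probs.items()}
    # Copy path: each input position contributes its attention weight
    # to the character found there, even if that character is not in
    # the generation vocabulary at all.
    for weight, ch in zip(attention, input_chars):
        mixed[ch] = mixed.get(ch, 0.0) + (1.0 - p_gen) * weight
    return mixed


example = mix_copy_generate(
    p_gen=0.5,
    gen_probs={"a": 0.5, "b": 0.5},
    attention=[0.25, 0.75],
    input_chars=["z", "a"],
)
```

In this toy call the character `z` is absent from the generation vocabulary yet still receives probability mass through the copy path, which is the property that matters for rare inputs such as proper names.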


1983 ◽  
Vol 15 (3) ◽  
pp. 19-39 ◽  
Author(s):  
Peter Freebody ◽  
Richard C. Anderson

Two experiments assessed the effect of vocabulary difficulty on three measures of text comprehension—free recall, summary recall, and sentence recognition. In Experiment 1, the effects of differing proportions of rare-word substitutions were examined. It was found that a high rate of difficult vocabulary (one substance word in three) was required before reliable effects on comprehension were evident. In Experiment 2, difficult vocabulary was placed in important text elements in one form of the passages, and in unimportant elements in another. These forms were contrasted with easy vocabulary forms in their effects on the three comprehension measures. Only on the summary measure was there an overall effect of difficult vocabulary in important elements. The results are discussed in terms of the salience of the signaling value of unfamiliar words.


Author(s):  
Rashmini Naranpanawa ◽  
Ravinga Perera ◽  
Thilakshi Fonseka ◽  
Uthayasanker Thayasivam

Neural machine translation (NMT) performs much better than statistical machine translation (SMT) models when parallel corpora are abundant. However, vanilla NMT operates primarily at the word level with a fixed vocabulary. As a result, low-resource, morphologically rich languages such as Sinhala are strongly affected by the out-of-vocabulary (OOV) and rare word problems. Recent advances in subword techniques have opened up opportunities for low-resource communities by enabling open-vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system using the transformer and explore standard subword techniques on top of it to identify which subword approach has the greatest effect on the English-Sinhala language pair. Our models demonstrate that subword segmentation strategies, combined with state-of-the-art NMT, can perform remarkably well when translating English sentences into a morphologically rich language, even without a large parallel corpus.
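
The abstract does not say which subword techniques were compared. As one illustration of the general idea, the sketch below learns byte-pair-encoding style merges from a toy word-frequency table. It is a minimal teaching version under that assumption, not the authors' pipeline; the function name `learn_bpe` and the toy data are hypothetical.

```python
from collections import Counter


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a {word: frequency} table."""
    # Start with every word split into single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties are broken by the pair itself so the
        # result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges, vocab


merges, vocab = learn_bpe({"abab": 3, "ab": 2}, num_merges=5)
```

Because unseen words can always be split into learned subwords or single characters, a segmentation of this kind keeps the vocabulary open, which is what makes it attractive for rare and out-of-vocabulary words.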


1975 ◽  
Vol 25 (1) ◽  
pp. 26-40
Author(s):  
M. H. B. Marshall

In the opening sentence of 2.6, which is of key significance, the meanings or references of are disputed, rendering the whole passage difficult. In the most widely established version, while —‘Evidence for the statement that Athens grew more than other places because of migration is provided by the following, viz. that...’ This is consistent with taking either Athens or the other places as the subject of the infinitive, ‘Athens grew more’ or ‘the other places grew less’. If recapitulates the end of 2.2, appears to refer to those migrations; but since it is a rare word, abnormal in that sense, and is obscure, Ullrich (Beitr. z. Erkl. des Th., 169 ff.) proposed , which makes a straight reference to migration, disposes of the obscurity, and provides an explicit subject for the infinitive. Gomme accepts this. I disagree, but yet believe that the established version, though capable of much improvement, is the best to date.


2018 ◽  
Vol 50 (1) ◽  
pp. 23-60
Author(s):  
Mohd Hilmi Hamzah ◽  
John Hajek ◽  
Janet Fletcher

This study reports on non-durational acoustic correlates of typologically rare word-initial consonant gemination in Kelantan Malay (KM) by focusing on two acoustic parameters – amplitude and f0. Given the unusual characteristics of the word-initial consonant contrast and its potential maintenance in domain-initial environments, this study sets out to examine the extent to which amplitude and f0 can characterise such a contrast in KM in addition to the cross-linguistically established acoustic correlate of closure duration. The production data involved elicited materials from sixteen KM native speakers. RMS and f0 values were measured at the start of the vowel following stops and sonorants produced in isolation (i.e. utterance-initial position) and in a carrier sentence (i.e. utterance-medial position). Results indicate that the consonant contrast is reflected in systematic differences in (i) vowel onset amplitude and f0 following the target consonant and (ii) the ratios of amplitude and f0 across two syllables of disyllabic words. There are also effects of utterance position, manner of articulation and voicing type on the magnitude of contrast between singletons and geminates, with utterance-initial voiceless stops generally showing the greatest magnitude difference. The conclusion is drawn that the KM word-initial singleton/geminate consonant contrast can be associated with a set of acoustic parameters alongside closure duration.
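
For readers unfamiliar with the amplitude measure named in this abstract, the sketch below computes root-mean-square (RMS) amplitude over a window of samples and a ratio between two windows. The window contents and the names `rms` and `amplitude_ratio` are hypothetical; the study's actual window lengths and tooling are not given in the abstract.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a non-empty window of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def amplitude_ratio(first_window, second_window):
    """Ratio of RMS amplitude between two windows, e.g. two syllables."""
    return rms(first_window) / rms(second_window)


# Toy sample windows standing in for two vowel onsets.
onset_a = [2.0, -2.0, 2.0, -2.0]
onset_b = [1.0, -1.0, 1.0, -1.0]
ratio = amplitude_ratio(onset_a, onset_b)
```

A ratio above 1 here simply means the first window is louder than the second; how such ratios pattern across singletons and geminates is the empirical question the study addresses.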


2007 ◽  
Vol 39 (01) ◽  
pp. 128-140 ◽  
Author(s):  
Etienne Roquain ◽  
Sophie Schbath

We derive a new compound Poisson distribution with explicit parameters to approximate the number of overlapping occurrences of any set of words in a Markovian sequence. Using the Chen-Stein method, we provide a bound for the approximation error. This error converges to 0 under the rare event condition, even for overlapping families, which improves previous results. As a consequence, we also propose Poisson approximations for the declumped count and the number of competing renewals.
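
The quantities being approximated can be made concrete with a small counting sketch. It counts overlapping occurrences of a set of words in a sequence and then the number of clumps, reading a clump as a maximal run of occurrences that overlap one another. That reading, and the names `occurrences` and `count_clumps`, are assumptions for illustration; the paper's formal definitions and its compound Poisson parameters are not reproduced here.

```python
def occurrences(sequence, words):
    """Return sorted (start, end) index pairs of every occurrence of any
    word in `words`, counting overlapping matches. Words must be non-empty."""
    hits = []
    for word in words:
        start = sequence.find(word)
        while start != -1:
            hits.append((start, start + len(word) - 1))
            # Advance by one position so overlapping matches are found.
            start = sequence.find(word, start + 1)
    return sorted(hits)


def count_clumps(hits):
    """Count maximal groups of mutually chained overlapping occurrences."""
    clumps = 0
    clump_end = -1  # last index covered by the current clump
    for start, end in hits:
        if start > clump_end:
            clumps += 1  # no shared position with the previous clump
        clump_end = max(clump_end, end)
    return clumps


hits = occurrences("aaaabaaa", ["aa"])
total = len(hits)
clumps = count_clumps(hits)
```

In the toy sequence the word `aa` occurs five times but forms only two clumps, which shows why overlapping words produce clustered counts that a plain Poisson model describes poorly.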


1996 ◽  
Vol 24 (1) ◽  
pp. 60-69 ◽  
Author(s):  
Terrence M. Barnhardt ◽  
Elizabeth L. Glisky ◽  
Michael R. Polster ◽  
Laurie Elam

2007 ◽  
Vol 28 (3) ◽  
pp. 221-240 ◽  
Author(s):  
Debra Mcginnis ◽  
Nikola N. Saunders ◽  
Ryan J. Burns

2021 ◽  
Vol 8 (1) ◽  
pp. 133-143
Author(s):  
Irina B. Diaghileva ◽  

The article deals with the newspaper “Babochka” as an important source for philological research, which objectively reflects the linguistic processes of its time. Translated articles selected from leading periodicals in Europe and America, creatively revised by the authors, brought the Russian reader into the world media space. The article uses a differential approach that focuses primarily on the dynamic elements of the lexical and semantic system. The newspaper presents the innovations of the early 19th century, including borrowings, foreign language inclusions, complex adjectives formed in Russian, and dialect words. The analysis of the source noted the emergence of new meanings for words already in use, clarified the dating of a number of new lexemes, and identified contexts for their semantization. The work concludes that the rare words and rare word usage recorded in the texts of the newspaper “Babochka” can be considered valuable materials for historical lexicology.


Author(s):  
Masako Fujimoto ◽  
Shigeko Shinohara ◽  
Daichi Mochihashi

The Ikema dialect of Miyako Island in Okinawa, Japan, has typologically rare word-initial and voiced geminate obstruents (e.g. /vva/ ‘you’, /ffa/ ‘child’, /tta/ ‘tongue’, /badda/ ‘side’). These sounds are marked in two ways: voicing throughout geminate obstruents is hard to produce, and initial voiceless plosives seem to be difficult to perceive. This study used real-time magnetic resonance imaging (rt-MRI) to examine the articulatory settings underlying contrasts between singleton and geminate obstruents. Our analyses of two male speakers’ utterances showed the following five characteristics: (i) geminate obstruents in Ikema have longer duration of articulatory constrictions regardless of position and consonant types; (ii) the voiced alveolar plosive geminate /dd/ is articulated with a larger linguopalatal contact than its singleton counterpart, but such difference depends on the speaker for the voiceless plosive pair /tt/–/t/ and the fricative pairs /ss/–/s/ and /zz/–/z/; (iii) alveolar voiceless plosives /t/ and /tt/ have a greater degree of linguopalatal contact than their voiced counterparts /d/ and /dd/, respectively, but fricatives show inter-speaker variation; (iv) fricatives do not show any systematic difference in degree of (midsagittal) linguopalatal contact between geminates and singletons, or between voiceless and voiced consonants; and (v) voiced geminate obstruents are accompanied by pharyngeal expansion for both speakers and by lowering of the larynx for one speaker, and never by lowering of the velum. We also observed that voiced fricatives tend to be realized as affricates, which we interpret as part of the articulatory adjustments for (full) voicing of phonologically voiced geminate fricatives.

