Emerging trends: Subwords, seriously?

2020 ◽  
Vol 26 (3) ◽  
pp. 375-382
Author(s):  
Kenneth Ward Church

Subwords have become very popular, but the BERT and ERNIE tokenizers often produce surprising results. Byte pair encoding (BPE) trains a dictionary with a simple information-theoretic criterion that sidesteps the need for special treatment of unknown words. BPE is more about training (populating a dictionary of word pieces) than inference (parsing an unknown word into word pieces). The parse at inference time can be ambiguous. Which parse should we use? For example, “electroneutral” can be parsed as electron-eu-tral or electro-neutral, and “bidirectional” can be parsed as bid-ire-ction-al or bi-directional. BERT and ERNIE tend to favor the parse with more word pieces. We propose minimizing the number of word pieces. To justify our proposal, we consider a number of criteria, including sound and meaning. The prefix bi- has the desired vowel (unlike bid) and the desired meaning (bi is Latin for two, unlike bid, which is Germanic for offer).
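The contrast between the two kinds of parse can be made concrete with a small sketch. It assumes a hypothetical toy vocabulary containing only the pieces from the examples above, and compares a greedy longest-prefix-first matcher (similar in spirit to WordPiece-style inference; this is an assumption, not a description of the actual BERT or ERNIE code) with a dynamic program that returns the parse with the fewest word pieces, as proposed here. The function names are illustrative.

```python
from typing import AbstractSet, List, Optional


def greedy_longest_match(word: str, vocab: AbstractSet[str]) -> Optional[List[str]]:
    """Scan left to right, always taking the longest vocabulary prefix."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            return None  # no vocabulary piece starts at this position
    return pieces


def min_piece_parse(word: str, vocab: AbstractSet[str]) -> Optional[List[str]]:
    """Dynamic programming: best[i] holds the shortest known parse of word[:i]."""
    n = len(word)
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prefix = best[start]
            piece = word[start:end]
            if prefix is None or piece not in vocab:
                continue
            candidate = prefix + [piece]
            current = best[end]
            # Keep the first parse found among those with the fewest pieces.
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[n]


# Hypothetical toy vocabulary built from the examples in the abstract.
VOCAB = {"bi", "bid", "ire", "ction", "al", "directional",
         "electron", "electro", "eu", "tral", "neutral"}

for w in ("bidirectional", "electroneutral"):
    print(w, greedy_longest_match(w, VOCAB), min_piece_parse(w, VOCAB))
```

With this vocabulary the greedy matcher yields bid-ire-ction-al and electron-eu-tral, while the minimum-piece parse yields bi-directional and electro-neutral. The abstract does not say how ties between equally short parses are broken; this sketch simply keeps the first one it finds.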

1990 ◽  
Vol 37 ◽  
pp. 51-58
Author(s):  
Carolien Schouten-van Parreren

Within the larger framework of a project on Mixed Ability Teaching, a qualitative experiment was carried out on the individual differences between pupils of very different ability ranges learning French. The experiment was intended to provide insight into the nature of the differences in vocabulary learning and reading strategies. Sixty-nine pupils aged 12-15, of very different ability ranges but educated together, were presented with a variety of vocabulary learning and reading tasks. They worked individually or in pairs and were asked to think aloud. The following tasks were used: 1) guessing the meaning of unknown words from the context while reading a story, 2) memorizing the meaning of unknown words with vocabulary cards after reading a story, 3) intensive reading of a relatively difficult illustrated story, 4) recalling the meaning of new words that had (or had not) been acquired incidentally while reading a story, and 5) doing an exercise involving different reading strategies. The analysis of the protocol records focused on the causes of the differences between weak and strong pupils. The differences found could be related to two relevant general strategies: guessing the meaning of an unknown word from the context and analyzing the form of an unknown word. The main results were as follows: 1) the attention of weak pupils tends to be drawn exclusively by one source of information, and weak pupils are not able to integrate information from different sources (advance knowledge, text, word forms, context, illustrations, cues), 2) weak pupils take no account whatsoever of sentence structure, and 3) weak pupils have difficulty generalizing from a new word to an already known word (in the target language or in the mother tongue). The article concludes with some implications for foreign language teaching.


Author(s):  
Fatima Zahrae El Malaki

Do Moroccan EFL learners depend on context to infer the meaning of unknown words occurring in sentences? This study investigates how intermediate and advanced learners infer the meaning of fake words. To this end, the subjects took a test of 60 three-option multiple-choice items. Without using dictionaries, subjects chose an appropriate meaning of the unknown word, an inappropriate meaning, or none of the choices. Chi-square tests were used to determine whether there was a) a statistically significant difference between the three categories and b) a statistically significant difference between the inferencing results of intermediate and advanced learners. The findings demonstrate that context, along with the lexical knowledge of L2 learners, plays the most important role in understanding vocabulary.
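The two chi-square questions above correspond to a goodness-of-fit test across the three answer categories and a test of independence between learner level and answer category. The sketch below computes both Pearson statistics from scratch; the counts are hypothetical placeholders, NOT the study's data, and the equal-expected-counts null hypothesis for the first test is an assumption.

```python
from typing import List, Sequence


def chi_square_goodness_of_fit(observed: Sequence[float]) -> float:
    """Pearson statistic against equal expected counts in every category."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def chi_square_independence(table: List[List[float]]) -> float:
    """Pearson statistic for a groups-by-categories contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts (appropriate, inappropriate, none); NOT the study's data.
answer_counts = [30, 20, 10]
by_level = [[30, 20, 10],   # intermediate learners (hypothetical)
            [20, 20, 20]]   # advanced learners (hypothetical)
print(chi_square_goodness_of_fit(answer_counts))  # df = 3 - 1 = 2
print(chi_square_independence(by_level))          # df = (2 - 1) * (3 - 1) = 2
```

Each statistic is compared with the chi-square distribution for its degrees of freedom (for 2 df the critical value at α = 0.05 is about 5.991). In practice, `scipy.stats.chisquare` and `scipy.stats.chi2_contingency` compute the same statistics together with p-values.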


1989 ◽  
Vol 34 ◽  
pp. 13-25
Author(s):  
Jan H. Hulstijn

This research focused on the incidental learning of the meanings of new word forms occurring in a reading passage. Five experiments compared the retention effects of several ways of orienting readers to the meaning of twelve new word forms ("targets") that occurred in a reading passage otherwise containing simple vocabulary. All experiments used the same four-page Dutch reading passage (on the role of advertising agencies). In experiments I, III, and V, the targets were 12 low-frequency Dutch verbs, and the subjects were adult intermediate learners of Dutch as a second language (65, 45, and 35 Ss, respectively). In experiments II and IV, these Dutch verbs were replaced by twelve pseudo-verbs, and the subjects were adult native speakers of Dutch (98 and 52 Ss, respectively). Cues of various kinds given in the margin of the text oriented readers to the meaning of the targets in different ways. The following orienting cues (experimental conditions) were compared: (1) Translation: translation of the target into L1 (Exp. I), (2) Synonym: a Dutch synonym of the target (Exp. II-V), (3) Context: a sample sentence providing a concise and highly specific context for the target's meaning (Exp. I and II), (4) Multiple Choice: four (Exp. I-III) or two (Exp. IV-V) verbs to choose from, one being a correct synonym and the others giving wrong meanings (distractors), and (5) Control: no cue (Exp. I-II). In all five experiments, Ss read the text and answered six multiple-choice comprehension questions, each pertaining to the meaning of one or two paragraphs. This reading-for-comprehension task was unexpectedly followed by posttests eliciting knowledge of the twelve targets (incidental learning). In experiments IV and V, half of the Ss were told that retention tests would follow the reading task (intentional learning).
The results of these five experiments and the conclusions drawn from them can be summarized as follows:
1. The retention of word meanings in a truly incidental task is very poor indeed. The chance that readers will remember the meaning of an unknown word that occurs only once in the text is minimal.
2. The presence of an orienting cue enhances word meaning retention compared with its absence. Without a cue, readers often spontaneously infer a wrong (although possible) meaning.
3. It follows from 2 that language pedagogy should try to assess the differential effects of various orienting cues, rather than compare giving the meaning to the reader/learner (cue present) with having the reader/learner infer the meaning without any help (cue absent).
4. In three out of four experiments, the Multiple Choice condition had a higher retention effect than the Synonym condition in an incidental (as opposed to intentional) learning setting. With the multiple-choice procedure, however, there is a chance that the reader/learner infers a wrong meaning (a distractor). This procedure should therefore be used only in the classroom, with immediate feedback from the teacher. For unguided reading/learning at home, the synonym (or translation) procedure seems more appropriate.
5. The results of these experiments provide modest evidence for a mental effort hypothesis. The net retention effect (i.e., in an incidental learning task) of conditions in which the reader/learner must infer the meaning of unknown words is higher than that of conditions in which the meaning is given. However, as noted in point 4, it is assumed that language teachers will generally opt for the safer procedure of giving the meaning of an unknown word rather than the (somewhat) more effective procedure of having the reader/learner infer it.


2014 ◽  
Vol 2014 ◽  
pp. 1-7
Author(s):  
Qiuping Huang ◽  
Liangye He ◽  
Derek F. Wong ◽  
Lidia S. Chao

This paper investigates the recognition of unknown words in Chinese parsing. Two methods are proposed to handle this problem. The first is a modification of a character-based model: the emission probability of an unknown word is modeled using the first and last characters of the word, with the aim of reducing the POS tag ambiguity of unknown words and thereby improving parsing performance. The second, a novel method based on graph-based semi-supervised learning (SSL), aims to improve the syntactic parsing of unknown words by discovering additional lexical knowledge from a large amount of unlabeled data. It propagates lexical emission probabilities to unknown words by building similarity graphs over the words of the labeled and unlabeled data, and the derived distributions are incorporated into the parsing process. Empirical results on the Penn Chinese Treebank and the TCT Treebank show that the proposed methods are effective in handling unknown words and improve parsing.
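The propagation step can be illustrated with a minimal sketch. The abstract does not give the paper's graph construction or objective, so this uses plain iterative label propagation as an assumption: labeled words keep their observed tag distributions, and each unlabeled word repeatedly takes the similarity-weighted average of its neighbors' distributions. The graph, weights, and word names below are hypothetical; NN and VV follow Penn Chinese Treebank tag names.

```python
from typing import Dict, List, Tuple

TagDist = Dict[str, float]


def propagate_tag_distributions(
    edges: List[Tuple[str, str, float]],
    seeds: Dict[str, TagDist],
    tags: List[str],
    iterations: int = 50,
) -> Dict[str, TagDist]:
    """Iterative label propagation over a weighted, undirected word graph.

    Edge weights are assumed to be positive similarity scores.
    """
    neighbors: Dict[str, List[Tuple[str, float]]] = {}
    for u, v, weight in edges:
        neighbors.setdefault(u, []).append((v, weight))
        neighbors.setdefault(v, []).append((u, weight))

    # Unlabeled words start from a uniform distribution over tags.
    uniform = {t: 1.0 / len(tags) for t in tags}
    current = {w: dict(seeds.get(w, uniform)) for w in neighbors}

    for _ in range(iterations):
        updated: Dict[str, TagDist] = {}
        for word, nbrs in neighbors.items():
            if word in seeds:
                updated[word] = dict(seeds[word])  # clamp labeled words
                continue
            total = sum(weight for _, weight in nbrs)
            updated[word] = {
                t: sum(weight * current[n].get(t, 0.0) for n, weight in nbrs) / total
                for t in tags
            }
        current = updated
    return current


# Hypothetical graph: the unknown word is three times as similar to a known noun.
edges = [("known_noun", "unknown_word", 3.0), ("known_verb", "unknown_word", 1.0)]
seeds = {"known_noun": {"NN": 1.0, "VV": 0.0}, "known_verb": {"NN": 0.0, "VV": 1.0}}
print(propagate_tag_distributions(edges, seeds, ["NN", "VV"])["unknown_word"])
```

Here the unknown word settles at roughly 0.75 NN and 0.25 VV. In the paper's setting such derived distributions would then feed the parser's emission probabilities; that integration step is not shown.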


2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Yen-Hui Wang

This paper presents an adaptive Business English self-learning system for EFL vocabulary learning. Word recurrence and learner engagement are built into the system: customized texts increase re-exposure to each learner's unknown words, and vocabulary enhancement tasks promote engagement with the target words. To evaluate the system's effectiveness for EFL vocabulary learning, the experimental group read system-screened texts with immediate and repeated encounters with each individual's unknown words and performed vocabulary tasks specific to those words, while the control group read online texts without unknown-word recurrence or vocabulary practice. After one semester, both groups took an online vocabulary test, and an online user satisfaction survey was also administered to the experimental group. The experimental group, which read customized texts that re-exposed learners to previously encountered unknown words and completed individualized vocabulary exercises, performed significantly better in EFL vocabulary learning than the control group. Learners also found the system appealing and showed positive attitudes toward using it. The study demonstrates that the adaptive Business English self-learning system can effectively promote vocabulary growth.
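The abstract does not describe how texts are screened. One hypothetical way to raise re-exposure to a learner's unknown words is to rank candidate texts by how many times they contain those words, as in the sketch below; the function name, the lowercase-letter tokenizer, and the scoring rule are all assumptions, not the system's actual method.

```python
import re
from typing import Dict, List, Set, Tuple


def rank_texts_by_reexposure(
    texts: Dict[str, str], unknown_words: Set[str], k: int = 2
) -> List[Tuple[str, int]]:
    """Rank candidate texts by how often they repeat a learner's unknown words.

    unknown_words is assumed to be lowercase.
    """
    scored: List[Tuple[str, int]] = []
    for text_id, body in texts.items():
        tokens = re.findall(r"[a-z]+", body.lower())
        exposures = sum(1 for tok in tokens if tok in unknown_words)
        scored.append((text_id, exposures))
    # Most re-exposures first; ties broken by text id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical candidate texts and one learner's unknown words.
texts = {
    "t1": "The invoice and the invoice total",
    "t2": "Quarterly revenue rose",
    "t3": "Send the invoice with revenue figures",
}
print(rank_texts_by_reexposure(texts, {"invoice", "revenue"}))
```

Counting raw occurrences favors repetition of a single word; a real system might instead reward covering several different unknown words, which this sketch does not attempt.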


Author(s):  
Myunggwon Hwang ◽  
Pankoo Kim

This paper describes research on automatically constructing a lexical dictionary of unknown words as a form of automatic lexical dictionary expansion. Lexical dictionaries have been applied usefully in various fields of semantic information processing, but they can only process the terms they define. To address this limitation, the concept of an “Unknown Word (UW)” is defined: a UW is a word that is not defined in WordNet, a representative existing lexical dictionary. The paper proposes a new method that constructs a UW lexical dictionary from various document collections scattered across the Web. The authors identify terms related to a UW and measure the semantic relatedness (similarity) between the UW and each related term. Relatedness is obtained by calculating both a probabilistic relationship and a semantic relationship. This approach can extend the UW lexical dictionary with a large number of UWs, and using the UW lexical dictionary together with WordNet can lay a foundation for semantic retrieval.
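The abstract does not give the formulas for combining the two relationships. As an assumption, the sketch below uses normalized pointwise mutual information (NPMI) over document co-occurrence for the probabilistic part and mixes it linearly with a semantic score supplied by the caller (for example, a WordNet-based similarity of the related term). The mixing weight `alpha` and both function names are hypothetical.

```python
import math
from typing import List, Set


def npmi(docs: List[Set[str]], a: str, b: str) -> float:
    """Normalized pointwise mutual information from document co-occurrence."""
    n = len(docs)
    n_a = sum(1 for d in docs if a in d)
    n_b = sum(1 for d in docs if b in d)
    n_ab = sum(1 for d in docs if a in d and b in d)
    if n_ab == 0:
        return -1.0  # never co-occur: the NPMI lower bound
    p_a, p_b, p_ab = n_a / n, n_b / n, n_ab / n
    if p_ab == 1.0:
        return 1.0  # both terms occur in every document
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


def combined_relatedness(prob_score: float, semantic_score: float, alpha: float = 0.5) -> float:
    """Hypothetical linear mix of a probabilistic and a semantic score."""
    return alpha * prob_score + (1.0 - alpha) * semantic_score


# Hypothetical document collection, each document reduced to its set of terms.
docs = [{"uw", "bank"}, {"uw", "bank"}, {"river"}, {"uw"}]
prob = npmi(docs, "uw", "bank")
print(prob, combined_relatedness(prob, semantic_score=0.8))
```

NPMI lies in [-1, 1], so the semantic score should be put on a comparable scale before mixing.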


2020 ◽  
Author(s):  
Mohammed Abdulmalik Ali

This study attempted to answer research questions about the vocabulary discovery strategies that Saudi undergraduate learners use to find the meanings of unknown words: which strategies they use, which they use most and least, how the type of vocabulary learning strategy used relates to their scores on a vocabulary test, and how effective learner-controlled and teacher-controlled strategies are in enhancing learners' ability to discover the meanings of unknown words. The answers are expected to help language instructors determine the most feasible vocabulary learning strategies for improving their students' vocabulary, and thus their language competence. Through purposive sampling, 50 male students participated in this descriptive and analytic study. A questionnaire and a vocabulary test were used to collect data. The findings revealed that, when reading a text, Saudi EFL students figure out the meanings of unknown words mainly by guessing through different sub-strategies. The least used was the social interaction strategy. Students' scores on the vocabulary test also correlated significantly (positively and negatively) with the type of strategy they used. The study concludes that it is vital for teachers and textbook writers to design more activities that train students in effective vocabulary learning strategies, mainly guessing through socially linked contextual clues.


2008 ◽  
Vol 13 (1) ◽  
pp. 99-128 ◽  
Author(s):  
Xiaofei Lu

This paper addresses the problem of classifying Chinese unknown words into fine-grained semantic categories defined in a Chinese thesaurus, Cilin (Mei et al. 1984). We present three novel knowledge-based models that capture the relationship between the semantic categories of an unknown word and those of its component characters in three different ways, and combine two of them with a corpus-based model that uses contextual information to classify unknown words. Experiments show that the combined knowledge-based model outperforms previous methods on the same task, but the use of contextual information does not further improve performance.
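The core intuition, that the categories of known words sharing characters with an unknown word hint at its category, can be sketched as a simple voting baseline. This is NOT one of the paper's three knowledge-based models; the English category labels stand in for real Cilin codes, and the tiny lexicon is hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List, Tuple


def character_category_counts(lexicon: Dict[str, str]) -> DefaultDict[str, Counter]:
    """For each character, count the categories of known words containing it."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for word, category in lexicon.items():
        for char in set(word):  # count each word once per character
            counts[char][category] += 1
    return counts


def rank_categories(word: str, counts: Dict[str, Counter]) -> List[Tuple[str, float]]:
    """Each known character votes for categories in proportion to its counts."""
    scores: Counter = Counter()
    for char in word:
        char_counts = counts.get(char)
        if not char_counts:
            continue  # character never seen in the lexicon
        total = sum(char_counts.values())
        for category, count in char_counts.items():
            scores[category] += count / total
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical lexicon with placeholder category labels (not real Cilin codes).
lexicon = {"汽车": "vehicle", "火车": "vehicle", "电车": "vehicle",
           "电脑": "device", "电话": "device"}
counts = character_category_counts(lexicon)
print(rank_categories("货车", counts))  # truck: 车 appears only in vehicle words
print(rank_categories("电表", counts))  # electricity meter: 电 leans toward device
```

A corpus-based model, as in the paper, would add contextual evidence on top of such character-level scores.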


Author(s):  
Minu Mathew ◽  
Chandra Sekhar Rout

This review details the fundamentals, working principles, and recent developments of Schottky junctions based on 2D materials, emphasizing their improved gas-sensing properties, including low working temperature, high sensitivity, and selectivity.


2011 ◽  
Vol 12 (1) ◽  
pp. 3-11
Author(s):  
Janet Deppe ◽  
Marie Ireland

This paper will provide the school-based speech-language pathologist (SLP) with an overview of the federal requirements for Medicaid, including provider qualifications, “under the direction of” rule, medical necessity, and covered services. Billing, documentation, and reimbursement issues at the state level will be examined. A summary of the findings of the Office of Inspector General audits of state Medicaid plans is included as well as what SLPs need to do in order to ensure that services are delivered appropriately. Emerging trends and advocacy tools will complete the primer on Medicaid services in school settings.

