Constructing Distributed Semantic Lexical Representations using a Machine Readable Dictionary

Author(s):  
Richard F. E. Sutcliffe ◽  
Yuji Matsumoto

This article deals with the acquisition of lexical knowledge, which is instrumental in supporting natural language processing (NLP), an inherently ambiguous process. Existing lexical representations are imprecise, and mostly simple and superficial; a thesaurus is an apt example. The two primary sources for acquiring lexical knowledge are corpora and machine-readable dictionaries (MRDs). Corpora are mostly domain-specific and monolingual, whereas MRD definitions generally consist of a ‘genus term’ followed by a set of differentiae. Auxiliary technical notions in the acquisition process are also covered, such as ‘lexical collocation’ and ‘association’, which refer to the recurrent co-occurrence of words that together form a distinct meaning, one that is lost whenever a synonym replaces either word. The first seminal work on extracting collocations from large text corpora appeared in the early 1990s and used inter-word mutual information to locate collocations. Abundant corpus data is available from the Linguistic Data Consortium (LDC).
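To make the mutual-information idea concrete, here is a minimal sketch of scoring word pairs by pointwise mutual information (PMI), PMI(x, y) = log2(P(x, y) / (P(x) P(y))). It is an illustration under stated assumptions, not the method of the article or of the early-1990s work it mentions: only adjacent word pairs are counted, probabilities are plain relative frequencies, and the function name `pmi_collocations` and its `min_count` threshold are hypothetical.

```python
import math
from collections import Counter


def pmi_collocations(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information.

    Assumptions (illustrative only): a window of one word to the right,
    maximum-likelihood probability estimates, and a hypothetical
    `min_count` cut-off because PMI is unreliable for rare pairs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), c_xy in bigrams.items():
        if c_xy < min_count:
            continue  # skip rare pairs
        p_xy = c_xy / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    # Highest-scoring (most strongly associated) pairs first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

On the toy token list `"the strong tea and the cake and the strong tea and the bread".split()`, the pair ("strong", "tea") ranks first with a score of log2(169/24), about 2.82, because both words occur almost only together, while pairs involving the frequent word "the" score lower. Real collocation extraction uses far larger corpora and usually a wider co-occurrence window.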


1986 ◽  
Vol 65 (1) ◽  
pp. 9
Author(s):  
C.W. Painter

1969 ◽  
Vol 08 (01) ◽  
pp. 07-11 ◽  
Author(s):  
H. B. Newcombe

Methods are described for deriving personal and family histories of birth, marriage, procreation, ill health and death, for large populations, from existing civil registrations of vital events and the routine records of ill health. Computers have been used to group together and »link« the separately derived records pertaining to successive events in the lives of the same individuals and families, rapidly and on a large scale. Most of the records employed are already available as machine readable punchcards and magnetic tapes, for statistical and administrative purposes, and only minor modifications have been made to the manner in which these are produced. As applied to the population of the Canadian province of British Columbia (currently about 2 million people) these methods have already yielded substantial information on the risks of disease: a) in the population, b) in relation to various parental characteristics, and c) as correlated with previous occurrences in the family histories.
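The core linkage step, grouping separately created records that refer to the same person, can be sketched as follows. This is a simplified deterministic rule chosen for illustration, not the procedure described in the paper: records are blocked on a phonetic (American Soundex) code of the surname, the given-name initial and the birth year, and every birth/death pair sharing that key is proposed as a link. The record fields (`id`, `surname`, `given`, `birth_year`) and function names are hypothetical.

```python
from collections import defaultdict

# American Soundex digit classes; vowels, y, h and w carry no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex code, so spelling variants collide."""
    name = "".join(ch for ch in name.lower() if ch.isalpha())
    if not name:
        return ""
    result = [name[0].upper()]
    prev = CODES.get(name[0], "")
    for ch in name[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        if ch not in "hw":  # h and w do not separate equal codes
            prev = code
    return ("".join(result) + "000")[:4]


def blocking_key(rec):
    """Hypothetical key: phonetic surname, given-name initial, birth year."""
    return (soundex(rec["surname"]), rec["given"][:1].upper(), rec["birth_year"])


def link_records(births, deaths):
    """Propose (birth id, death id) links for records sharing a key."""
    index = defaultdict(list)
    for b in births:
        index[blocking_key(b)].append(b)
    links = []
    for d in deaths:
        for b in index.get(blocking_key(d), []):
            links.append((b["id"], d["id"]))
    return links
```

With this rule a birth registered under "Robert, Anna, 1921" links to a death registered under "Rupert, Ann, 1921" (both surnames code to R163), whereas "Smith, John, 1930" and "Smyth, Jon, 1931" do not link because the birth years differ. Production systems typically weigh agreement and disagreement on several fields probabilistically instead of requiring an exact key match.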


1997 ◽  
Vol 9 (1-3) ◽  
pp. 58-77
Author(s):  
Vitaly Kliatskine ◽  
Eugene Shchepin ◽  
Gunnar Thorvaldsen ◽  
Konstantin Zingerman ◽  
Valery Lazarev

In principle, printed source material should be made machine-readable with systems for Optical Character Recognition, rather than being typed once more. Off-the-shelf commercial OCR programs tend, however, to be inadequate for lists with a complex layout. The tax assessment lists covering most nineteenth-century farms in Norway constitute one example of a series of valuable sources that can only be interpreted successfully with specially designed OCR software. This paper considers the problems involved in the recognition of material with a complex table structure, outlining a new algorithmic model based on ‘linked hierarchies’. Within the scope of this model, a variety of tables and layouts can be described and recognized. The ‘linked hierarchies’ model has been implemented in the ‘CRIPT’ OCR software system, which successfully reads tables with a complex structure from several different historical sources.
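As a point of comparison for why complex tables are hard, the sketch below shows the kind of simple layout analysis that works only on clean, regular tables: splitting a binarised region into columns using its vertical projection profile (the count of ink pixels in each pixel column). This is a standard heuristic swapped in for illustration; it is not the ‘linked hierarchies’ model of CRIPT, which the abstract does not describe in detail. The function name `column_spans` and the `min_gap` gutter width are assumptions.

```python
def column_spans(image, min_gap=2):
    """Split a binarised page region into columns by projection profile.

    image: list of equal-length rows of 0 (background) and 1 (ink).
    A run of at least `min_gap` empty pixel columns (an assumed
    threshold) is treated as a gutter between table columns.
    Returns half-open (start, end) pixel ranges, one per column.
    """
    width = len(image[0]) if image else 0
    profile = [sum(row[x] for row in image) for x in range(width)]
    spans = []
    start = None  # first pixel column of the current inked column
    gap = 0       # length of the current run of empty pixel columns
    for x, ink in enumerate(profile):
        if ink:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The column ended just before this run of empty columns.
                spans.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        spans.append((start, width - gap))  # drop trailing blank columns
    return spans
```

A heuristic like this breaks down exactly where historical lists become difficult: headers spanning several columns, ruled cells, skewed scans and entries that overflow their column, which is the motivation for a richer structural model such as the one the paper proposes.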


2020 ◽  
Author(s):  
Pauline Palma ◽  
Marie-France Marin ◽  
K. Onishi ◽  
Debra Titone

Although several studies have focused on novel word learning and consolidation in native (presumably monolingual) speakers, less is known about how bilinguals add novel words to their mental lexicon. Here, we trained 33 English-French bilinguals on novel word-forms that were neighbors to “hermit” English words (i.e., words with no existing neighbors). Importantly, these English words varied in terms of orthographic overlap with their French translation equivalent (i.e., cognates vs. noncognates). We measured explicit recognition of the novel neighbors and the interaction between novel neighbors and English words through a lexical decision task, both before and after a sleep interval. In the lexical decision task, we found evidence of immediate facilitation for English words with novel neighbors, and evidence of competition after a sleep interval for cognate words only. These results suggest that higher quality of existing lexical representations predicts an earlier onset for novel word lexicalization.
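The notion of a “hermit” word depends on how orthographic neighbors are defined. A common operationalisation is Coltheart's N: two words are neighbors if they have the same length and differ by exactly one substituted letter. The sketch below assumes that definition (the study may have used a different one or a specific lexical database) and uses a toy lexicon; the function names and the novel form "hermut" are hypothetical.

```python
def is_neighbor(a, b):
    """Coltheart-style neighbor: same length, exactly one letter differs."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def neighbors(word, lexicon):
    """All neighbors of `word` in `lexicon`, sorted alphabetically."""
    return sorted(w for w in lexicon if is_neighbor(word, w))


def hermits(lexicon):
    """Words in the lexicon that have no orthographic neighbors."""
    words = set(lexicon)
    return sorted(w for w in words if not neighbors(w, words))
```

In the toy lexicon {"cat", "cot", "cut", "dog", "hermit", "banjo"}, the hermits are "banjo", "dog" and "hermit". Adding a trained novel form such as "hermut" gives "hermit" its first neighbor, which is the situation in which facilitation or competition from the new word can be measured. The linear scan per word is fine for a sketch; real lexicons would call for indexing by length or by wildcard patterns.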


Author(s):  
O. Y. Balalaieva

The purpose of the article is to study the development of electronic dictionaries abroad and in Ukraine, using analysis of scientific sources, comparison, generalization and systematization. Electronic dictionaries are found to be a relatively new phenomenon on the lexicographic market: over the decades they have evolved from machine-readable dictionaries, which were exact copies of paper editions, into complex digital lexicographic systems with a powerful array of functions. The stages of development of standalone and online dictionaries are described. Thanks to their advanced search capabilities, speed, simplicity, ease of use, accessibility and compactness, electronic dictionaries have become popular with a wide range of users. Today they are used in many spheres of human activity: scientific, educational, professional and everyday communication. However, an analysis of the current state of Ukrainian electronic resources reveals a shortage of electronic dictionaries of both general and terminological vocabulary. This shortage stems from a number of objective problems, both practical and theoretical, which makes domestic computational lexicography a promising area for further research.

