Automatic Comparable Web Corpora Collection and Bilingual Terminology Extraction for Specialized Dictionary Making

Author(s):  
Antton Gurrutxaga ◽  
Igor Leturia ◽  
Xabier Saralegi ◽  
Iñaki San Vicente
2014 ◽  
Vol 35 ◽  
pp. 879-885
Author(s):  
Kyoko Yanagihori ◽  
Koji Tanaka ◽  
Kazuhiko Tsuda

2010 ◽  
Vol 16 (4) ◽  
pp. 469-491 ◽  
Author(s):  
YVES PEIRSMAN ◽  
DIRK GEERAERTS ◽  
DIRK SPEELMAN

Abstract: Languages are not uniform. Speakers of different language varieties use certain words differently – more or less frequently, or with different meanings. We argue that distributional semantics is the ideal framework for the investigation of such lexical variation. We address two research questions and present our analysis of the lexical variation between Belgian Dutch and Netherlandic Dutch. The first question involves a classic application of distributional models: the automatic retrieval of synonyms. We use corpora of two different language varieties to identify the Netherlandic Dutch synonyms for a set of typically Belgian words. Second, we address the problem of automatically identifying words that are typical of a given lect, either because of their high frequency or because of their divergent meaning. Overall, we show that distributional models are able to identify more lectal markers than traditional keyword methods. Distributional models also have a bias towards a different type of variation. In summary, our results demonstrate how distributional semantics can help research in variational linguistics, with possible future applications in lexicography or terminology extraction.


2020 ◽  
Author(s):  
Anna Hätty ◽  
Dominik Schlechtweg ◽  
Michael Dorna ◽  
Sabine Schulte im Walde

Terminology ◽  
1997 ◽  
Vol 4 (2) ◽  
pp. 225-244 ◽  
Author(s):  
David A. Hull

Translation is a labor-intensive process. We propose a general methodology for automatic terminology extraction and alignment which could substantially reduce the translator's workload. The goal is to take advantage of existing technology in terminology extraction and statistical word alignment to automatically construct a bilingual terminology lexicon by exploiting bilingual parallel aligned corpora. This paper introduces the technology in each area and discusses some simple heuristic methods for using the output from each component to build a bilingual terminology lexicon. The process is illustrated by an in-depth analysis of a single sentence pair.


2019 ◽  
Vol 20 (2) ◽  
pp. 197-211
Author(s):  
Veronique Hoste ◽  
Klaar Vanopstal ◽  
Ayla Rigouts Terryn ◽  
Els Lefever

2020 ◽  
Vol 50 (6) ◽  
pp. 1813-1831 ◽  
Author(s):  
Fethi Fkih ◽  
Mohamed Nazih Omri
