Design and development of Iberia: a corpus of scientific Spanish
Iberia is a synchronic corpus of scientific Spanish designed mainly for terminological studies. In this paper, we describe its design and the infrastructure for its acquisition, processing and exploitation, including mark-up, linguistic annotation, indexing and the user interface. Two pre-processing tasks affecting a large number of words are described in detail: de-hyphenation and identification of text fragments in other languages. We also show how some of the reported statistics, namely, dispersion and association, are used for research on lexis.
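The abstract does not say which dispersion and association measures Iberia reports. As an illustration only, the sketch below assumes two common choices: Gries' deviation of proportions (DP) for dispersion and pointwise mutual information (PMI) for the association between adjacent words. The function names, the representation of the corpus as lists of tokens, and the sample words are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def deviation_of_proportions(word, parts):
    """Gries' DP (assumed measure): 0 means the word is spread across the
    corpus parts in proportion to their sizes; values near 1 mean it is
    concentrated in few parts."""
    total_tokens = sum(len(part) for part in parts)
    total_hits = sum(part.count(word) for part in parts)
    if total_hits == 0:
        raise ValueError(f"{word!r} does not occur in the corpus")
    diff = 0.0
    for part in parts:
        expected = len(part) / total_tokens      # share of the corpus in this part
        observed = part.count(word) / total_hits  # share of the word's hits in this part
        diff += abs(observed - expected)
    return diff / 2


def pmi(w1, w2, tokens):
    """Pointwise mutual information (base 2, assumed measure) of the
    adjacent pair (w1, w2) in a flat token list."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    joint = bigrams[(w1, w2)]
    if joint == 0:
        raise ValueError("the pair does not occur as adjacent tokens")
    p_joint = joint / (n - 1)  # there are n - 1 adjacent pairs
    p1 = unigrams[w1] / n
    p2 = unigrams[w2] / n
    return math.log2(p_joint / (p1 * p2))


# Hypothetical toy data: each inner list stands for one corpus part.
parts = [["ácido", "nítrico", "y", "ácido", "nítrico"], ["el", "corpus", "es", "grande"]]
flat = [tok for part in parts for tok in part]
dp_value = deviation_of_proportions("ácido", parts)
pmi_value = pmi("ácido", "nítrico", flat)
```

A term candidate that is frequent but concentrated in a few documents receives a high DP under this definition, while a high PMI flags word pairs that co-occur more often than their individual frequencies would predict, which is the kind of evidence typically used to spot multi-word terms.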