Marianne Hundt, Nadja Nesselhauf and Carolin Biewer: Corpus linguistics and the web

2010 ◽  
Vol 44 (3) ◽  
pp. 291-293
Author(s):  
Mirko Tavosanis
Tradterm ◽  
2021 ◽  
Vol 37 (2) ◽  
pp. 460-487
Author(s):  
Adauri Brezolin

Although it might seem contradictory to investigate noncanonical phraseological combinations in corpora, corpus linguistics research has revealed that they outnumber canonical forms (Philip 2008). This paper discusses the notion of fixedness by analyzing variant forms of idioms and whether they qualify as wordplay. The Web, our data source, is used to collect such noncanonical occurrences in both English and Portuguese by means of keyword searches on the Google search engine. Our discussion draws mainly on studies of fixed phrases (Kjellmer 1991; Granger & Paquot 2008; Tagnin 2013), phraseological skeletons (Renouf & Sinclair 1991; Philip 2008), and idiom transformations (Veisbergs 1997; Barta 2005). Due attention is also given to search queries for nonstandard forms of fixed expressions in corpora (Philip 2008) and to the translation of idiom-based wordplay (Veisbergs 1997; Brezolin 2020).
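Searching for noncanonical forms of an idiom amounts to keeping its fixed core and leaving some positions open. The Python sketch below illustrates that idea under a simplifying assumption of our own: a skeleton is the canonical idiom written with `*` marking an open slot of one to three words. The functions `skeleton_pattern` and `find_variants` are hypothetical helpers, not the procedure used in the paper, which relied on Google keyword searches.

```python
import re


def skeleton_pattern(skeleton: str, max_slot_words: int = 3) -> re.Pattern:
    """Compile a skeleton such as 'a * in the hand' into a regex.

    Each '*' is an open slot that matches 1 to max_slot_words words;
    every other token must occur literally (case-insensitive).
    """
    slot = r"\w+(?:[\s'-]+\w+){0,%d}" % (max_slot_words - 1)
    parts = [slot if tok == "*" else re.escape(tok) for tok in skeleton.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def find_variants(texts, skeleton, canonical):
    """Return skeleton matches whose wording differs from the canonical form."""
    pattern = skeleton_pattern(skeleton)
    variants = []
    for text in texts:
        for match in pattern.finditer(text):
            # Normalize whitespace and case before comparing to the canonical idiom.
            found = " ".join(match.group(0).split()).lower()
            if found != canonical.lower():
                variants.append(match.group(0))
    return variants
```

For example, `find_variants(texts, "a * in the hand is worth two in the bush", "a bird in the hand is worth two in the bush")` would return "A tweet in the hand is worth two in the bush" from a page containing that pun, while skipping the canonical form. Because Python's `\w` is Unicode-aware for `str` patterns, the same sketch also handles accented Portuguese words.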


2018 ◽  
Vol 5 (2) ◽  
pp. 208-229
Author(s):  
Anne Condamines

Abstract Using the example of the alternation [to fish (det) river(s)]/[to fish prep (det) river(s)], this paper adopts a corpus linguistics approach to show how it can contribute to studies in cognitive semantics, combining statistics with a more qualitative analysis. The main aim is to investigate whether these two constructions (with or without a preposition) correspond to a single meaning with alternations or to two distinct meanings. Two studies, both using the Web as corpus, were carried out to elucidate this issue. The first study compared occurrences of the two constructions on French and English websites and showed that, statistically speaking, the construction without a preposition occurs mainly on angling websites that have an emotional dimension, such as blogs. The second study, focusing solely on English websites, examined the lexical environment of the two constructions and identified distinct semantic classes for each construction, which define two semantic scenarios. These two semantic scenarios were found to correlate closely with the nature of the website. In light of the corpus evidence, the paper concludes in favor of two meanings, each associated with one or the other construction (with or without a preposition). The emotional dimension of the relationship between the angler and the river plays a crucial role in determining whether a preposition appears before river. This conclusion firmly positions the study within cognitive sociolinguistics.
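The first study pairs two steps: identifying which construction a Web occurrence instantiates, and testing whether that choice is independent of the type of website. The Python sketch below shows one way to do both for English. It rests on stated assumptions: the determiner and preposition lists are small hypothetical sets chosen for illustration, and the test is a plain Pearson chi-square on a 2×2 table. The paper's actual statistical procedure is not specified here and may differ.

```python
import re
from collections import Counter

# Hypothetical closed word lists, chosen for illustration only.
DETS = r"(?:a|an|the|this|that|my|our|his|her|their)"
PREPS = r"(?:in|on|at|along|down|up|near|from|by)"
FISH = r"\bfish(?:es|ed|ing)?"
RIVER = r"rivers?\b"

# [to fish prep (det) river(s)] and [to fish (det) river(s)]
WITH_PREP = re.compile(rf"{FISH}\s+{PREPS}\s+(?:{DETS}\s+)?{RIVER}", re.IGNORECASE)
NO_PREP = re.compile(rf"{FISH}\s+(?:{DETS}\s+)?{RIVER}", re.IGNORECASE)


def classify(sentence):
    """Return 'prep', 'no_prep', or None for one sentence."""
    if WITH_PREP.search(sentence):
        return "prep"
    if NO_PREP.search(sentence):
        return "no_prep"
    return None


def contingency(labelled):
    """Count (site_type, construction) pairs from (site_type, sentence) items."""
    counts = Counter()
    for site_type, sentence in labelled:
        construction = classify(sentence)
        if construction is not None:
            counts[(site_type, construction)] += 1
    return counts


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    observed = ((a, b), (c, d))
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2
```

With one degree of freedom, a statistic above 3.841 rejects independence at the 0.05 level. Checking the prepositional pattern first matters, because it keeps a sentence such as "fish in the river" from being misread by the looser pattern.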


Author(s):  
Anna Matamala

Abstract Following an overview of corpus linguistics in audiovisual translation, and more specifically in audio description, this article presents the VIW (Visuals Into Words) project and its resulting corpus. It describes the compilation and annotation processes, highlighting the main challenges encountered. The article also presents the web application that has been developed, explaining in detail its various data visualisation and search options.
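Corpus search interfaces commonly display results as a keyword-in-context (KWIC) concordance. The Python sketch below illustrates that general technique on plain text. It is not a reconstruction of the VIW web application, whose search features are not detailed here. The functions `kwic` and `format_kwic` and the `width` parameter are hypothetical.

```python
import re


def kwic(text, keyword, width=30):
    """Return (left, keyword, right) context triples for each whole-word hit."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    rows = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        rows.append((left, match.group(0), right))
    return rows


def format_kwic(rows, width=30):
    """Right-align left contexts so the keywords line up in one column."""
    return [f"{left:>{width}} [{kw}] {right}" for left, kw, right in rows]
```

Applied to an audio description script, `format_kwic(kwic(script, "door"))` would align every mention of "door" in a single column, with its surrounding description on either side.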


Author(s):  
Marianne Hundt ◽  
Nadja Nesselhauf ◽  
Carolin Biewer

2018 ◽  
Vol 1 ◽  
pp. 1-4
Author(s):  
Luz Angela Rocha S. ◽  
Johnatan Bonilla ◽  
Julio Bernal ◽  
Catherine Duarte ◽  
Alejandro Rodriguez

The Atlas Lingüístico y Etnográfico de Colombia (Linguistic and Ethnographic Atlas of Colombia), known as “ALEC”, is a compilation of the popular spoken Spanish of Colombia's populations, based on research carried out over more than fifty years. The result of this work is a collection of thematic maps organized in six volumes, together with supplements, in analog format. The project entitled “Interactive ALEC” was therefore created, with the main objective of developing a digital, interactive web version of the Linguistic and Ethnographic Atlas of Colombia (1983) and its supplements. To this end, the Corpus Linguistics research group of the Caro y Cuervo Institute and the NIDE research group of the Universidad Distrital “Francisco José de Caldas” have been working together on the design and development of the web Atlas. The web Atlas allows users to visualize and query the spatial information contained in Volume III of the analog ALEC, applying concepts from Geographic Information Systems and web cartography. The objective of this paper is to present the design and development process of the ALEC web prototype as a collection of static and dynamic maps that display spatial information combined with multimedia content, since, in addition to the maps, the full compendium includes images, illustrations, photographs, audio, and textual comments. The interactive ALEC is also a good example of how current geotechnology tools can be used. Such tools are essential for disseminating geolinguistic information over the Internet, broadening access to and distribution of the web Atlas.
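Web cartography stacks commonly exchange point data as GeoJSON, so a digitized atlas map can be represented as one feature per survey locality. The Python sketch below assumes a hypothetical record schema (`locality`, `lat`, `lon`, `form`) that is not taken from the ALEC. It converts such records to a GeoJSON FeatureCollection and filters one form into its own layer, which a dynamic map could switch on and off.

```python
import json


def to_geojson(records):
    """Convert atlas data points into a GeoJSON FeatureCollection.

    Each record is assumed to be a dict with 'locality', 'lat', 'lon' and
    'form' keys (a hypothetical schema, not the ALEC's own).
    """
    features = []
    for rec in records:
        features.append({
            "type": "Feature",
            # RFC 7946 orders positions as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [rec["lon"], rec["lat"]]},
            "properties": {"locality": rec["locality"], "form": rec["form"]},
        })
    return {"type": "FeatureCollection", "features": features}


def layer_for_form(collection, form):
    """Keep only the features recording one lexical form (one map layer)."""
    return {
        "type": "FeatureCollection",
        "features": [f for f in collection["features"]
                     if f["properties"]["form"] == form],
    }
```

The output of `json.dumps(to_geojson(records))` can be loaded by most web mapping libraries. The main pitfall is coordinate order: GeoJSON puts longitude first, which is the reverse of the usual "lat, lon" notation.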


2006 ◽  
Vol 32 (3) ◽  
pp. 295-340 ◽  
Author(s):  
Christoph Ringlstetter ◽  
Klaus U. Schulz ◽  
Stoyan Mihov

Since the Web is by far the largest public repository of natural language texts, recent experiments, methods, and tools in corpus linguistics often use the Web as a corpus. For applications where high accuracy is crucial, one must face the problem that Web documents contain a non-negligible number of orthographic and grammatical errors. In this article we investigate the distribution of orthographic errors of various types in Web pages. As a by-product, we develop methods for efficiently detecting erroneous pages and for marking orthographic errors in acceptable Web documents, thus reducing the number of errors in corpora and linguistic knowledge bases automatically retrieved from the Web.
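The two by-products described above, page-level filtering and token-level marking, can be illustrated with a simple dictionary lookup. The Python sketch below assumes a hypothetical error dictionary that maps known misspellings to corrections, and an arbitrary error-rate threshold. It shows the general shape of such a filter and does not reproduce the article's own methods.

```python
import re

# ASCII-only word tokenizer; a simplifying assumption for English text.
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def page_error_rate(text, error_dict):
    """Share of tokens found in the (hypothetical) error dictionary."""
    tokens = [t.lower() for t in WORD.findall(text)]
    if not tokens:
        return 0.0  # empty pages count as error-free
    errors = sum(1 for t in tokens if t in error_dict)
    return errors / len(tokens)


def mark_errors(text, error_dict):
    """Wrap each known misspelling as [[error|correction]]."""
    def repl(match):
        word = match.group(0)
        correction = error_dict.get(word.lower())
        return f"[[{word}|{correction}]]" if correction else word
    return WORD.sub(repl, text)


def filter_pages(pages, error_dict, threshold=0.02):
    """Drop pages above the error-rate threshold; mark errors in the rest."""
    return [mark_errors(page, error_dict) for page in pages
            if page_error_rate(page, error_dict) <= threshold]
```

Raising the threshold keeps more pages at the cost of more marked errors. A real system would also need dictionaries covering different error types and care to avoid flagging valid rare words.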

