Analyzing the structure of code-switched written texts

2018 ◽  
Vol 18 (1) ◽  
pp. 120-143
Author(s):  
Bruno Estigarribia ◽  
Zachary Wilkins

Abstract As more written language data become available, interest in written language mixing / codeswitching (LM/CS) is increasing (Sebba, Mahootian & Jonsson 2012; Sebba 2013). LM/CS in non-naturalistic (e.g., literary) texts raises issues related to (1) gauging the authenticity and representativity of a textual corpus, and (2) deciding whether categories/mechanisms of spoken LM/CS apply to written LM/CS. We focus on Guarani-Spanish LM/CS (Jopara) as represented in the Paraguayan novel Ramona Quebranto (RQ). We apply the framework of Muysken (1997; 2000; 2013), developed as a taxonomy of spoken LM/CS. Our contribution extends its applicability to written LM/CS. We show that Jopara has a mix of insertional and backflagging strategies, with infrequent alternations.

Author(s):  
Deo Kawalya ◽  
Koen Bostoen ◽  
Gilles-Maurice de Schryver

Abstract This article employs a 4-million-word diachronic corpus to examine how the expression of possibility has evolved in Luganda from the 1890s to the present, focusing on the language’s three main potential markers -yînz-, -sóból- and -andi-, and their historical interaction. It is shown that while the auxiliary -yînz- originally covered the whole modal subdomain of possibility, the auxiliary -sóból- has steadily taken over the more objective categories of dynamic possibility. Currently, -yînz- first and foremost conveys deontic and epistemic possibility. It still prevails in these more subjective modal categories even though the prefix -andi-, a conditional marker in origin, has expressed epistemic possibility since the 1940s, and -sóból- deontic possibility since the 1970s. More generally, this article demonstrates the potential of corpus linguistics for the study of diachronic semantics beyond language comparison. This is an important achievement in Bantu linguistics, where written language data tend to be young.


Virittäjä ◽  
2008 ◽  
Vol 112 (2) ◽  
pp. 162
Author(s):  
Auli Hakulinen ◽  
Lea Laitinen

Anaphoric zero: Grammar and affect (also in Finnish: Anaforinen nolla: Kielioppia ja affekteja)

The article examines the syntactic and semantic properties of the anaphoric zero in spoken and written Finnish. Referentially, the zero is equivalent to the third person pronouns hän 'he/she' or he 'they'. However, the writers started out with the hypothesis that this does not necessarily hold for other possible kinds of meaning conveyed by the two different devices, the anaphoric zero and anaphoric pronouns. In standardised written language the conditions for use of the zero are fairly clear-cut: within a sentence it is mainly used as an anaphoric device, but in a subordinate clause that precedes the main clause it is also used as a forward-looking, anticipatory anaphor. In spoken language as well as in literary prose the syntactic conditions are more flexible. During the course of the research, it was the literary texts that proved especially fruitful for understanding the implications involved in the use of the anaphoric zero.

In earlier work (e.g. Kalliokoski 1990; Heinonen 1995), it has been pointed out that the anaphoric zero typically ties two successive clauses together more tightly than a pronoun would. The writers are able to show that it does something else as well. In talk-in-interaction, it conveys the speaker's commitment to, and often affiliation with, the previous speaker's perspective and stance. In reported speech, both in spoken language and in literary dialogue, the zero can convey the speaker's attitude concerning the thoughts of the person being referred to, for example irony and empathy.

The writers argue that when the zero represents one alternative in a paradigm it is empty only in (morpho)syntactic terms, not in terms of meaning. Whether the speaker chooses a pronoun (hän or he) or a zero, he/she makes a rhetorical choice. The zero alternative creates implications, expressing the speaker's affective stance and attitude in relation to the characters in the story, or his/her interpretation of the speech, thought or behaviour of the co-participant or the story character that he/she is quoting.

It is striking that in more than 90 per cent of the 150 examples used, the verb is at the beginning of the utterance or turn. In the rest of the cases, the verb is often preceded by an epistemic adverb (varmaan 'definitely', tuskin 'hardly'), or the utterance is formed as a fixed construction. The writers hypothesise that the grammar of the anaphoric zero should include verb-initial position as one of its constitutive factors. This factor is typical both for co-ordinated and subordinated sentences of the standard written language that are governed by syntactic rules, and for the turn-initial expressions that arise from the speaker's or narrator's affective stance towards the matter at hand.


Author(s):  
Nicolas Zhou ◽  
Erin M. Corsini ◽  
Shida Jin ◽  
Gregory R. Barbosa ◽  
Trey Kell ◽  
...  

In the first part of this series, we introduced the tools of Big Data, including Not Only Structured Query Language (NoSQL) data warehouses, natural language processing (NLP), optical character recognition (OCR), and the Internet of Things (IoT). There are nuances to the utilization of these analytics tools, which must be well understood by clinicians seeking to take advantage of these innovative research strategies. One must recognize technical challenges to NLP, such as unintended search outcomes and variability in the expression of human written texts. Other caveats include dealing with written texts in image formats, which may ultimately be handled by transformation to text format with OCR, though this technology is still under development. IoT is beginning to be used in cardiac monitoring, medication adherence alerts, lifestyle monitoring, and saving traditional labs from equipment failure catastrophes. These technologies will become more prevalent in the future research landscape, and cardiothoracic surgeons should understand the advantages of these technologies to propel our research to the next level. Experience with and understanding of the technology are needed to build a robust NLP search result, and effective communication with the data management team is a crucial step in successful utilization of these technologies. In this second installment of the series, we provide examples of published investigations utilizing the advanced analytic tools introduced in Part I. We will explain our processes in developing the research question, barriers to achieving the research goals using traditional research methods, tools used to overcome the barriers, and the research findings.


1991 ◽  
Vol 6 (1) ◽  
pp. 73-87 ◽  
Author(s):  
Margaret M. Marshall

Louisiana French Creole (LFC) has clearly been undergoing decreolization in the twentieth century; its exact nature is difficult to determine, since the only evidence from the previous century available up to now has come from literary texts of that time. Language data was elicited from elderly informants whose parents were the last monolingual creole speakers living in the vicinity of Mobile, Alabama. Since communication between the speakers of New Orleans Creole and Mobile Creole was quite commonplace, Mon Louis Island Creole (MLIC) represents new evidence relating to nineteenth century LFC. This study presents an analysis of the MLIC and LFC noun phrase and verb phrase. Mon Louis Island (MLI) speakers use two-stem verbs which are not attested in nineteenth century LFC texts. On the other hand, there are developments in LFC, such as preposed definite articles, that were not documented in MLIC. Thus, the MLIC data might help distinguish the features already present in the nineteenth century from those which represent more recent changes in LFC.


1990 ◽  
Vol 11 ◽  
pp. 181-195 ◽  
Author(s):  
Mick Short

The terms discourse analysis and stylistic analysis mean different things to different people. Most narrowly defined, discourse analysis has only to do with the structure of spoken discourse. Such a definition separates discourse analysis from literary stylistics and pragmatics, the study of how people understand language in context. At the other end of the spectrum, discourse analysis can be carried out on spoken and written texts, and can include matters like textual coherence and cohesion, and the inferencing of meaning by readers or listeners. In this case, it includes pragmatics and much of stylistics within its bounds. Similarly, stylistics can apply just to literary texts or not, and be restricted to the study of style or, on the other hand, include the study of meaning. For the purposes of this review, relatively wide definitions of both areas have been assumed in order to make what follows reasonably comprehensive. The main restriction assumed is that the works discussed will be relevant to the examination of literature in some way. The section on literature instruction will include matters relevant to both native and non-native learners of English, and will also make reference to the integration of literary and language study.


لارك ◽  
2021 ◽  
Vol 2 (41) ◽  
pp. 1258-1241
Author(s):  
Asst. Prof Mayada R.Eesa

In recent years, a large number of analyses have been carried out on what are called discourse markers, which are considered a class of linguistic expressions. Notably, various approaches have been taken, and unsurprisingly various results have been produced as to the theoretical status of discourse markers (e.g., Potts 2005; Blakemore 2002). Although discourse markers are typically considered one of the basic characteristics of oral discourse, they are now also found in written texts. The current study therefore investigates discourse markers in the written language of Iraqi participants in the English Language Proficiency Test, henceforth referred to as ELPT. Throughout this study, we will see how discourse markers can improve the quality of writing and increase comprehension of the text. The research also attempts to measure the participants' knowledge of discourse markers. The aim of this study is to find out whether Iraqi ELPT participants use discourse markers in their writing and how they use them. To achieve this aim, essays written by ELPT participants were analysed.


MANUSYA ◽  
2007 ◽  
Vol 10 (3) ◽  
pp. 4-17 ◽  
Author(s):  
Wirote Aroonmanakun

This paper reports on the progress of Thai National Corpus (TNC) development. The TNC is designed as a general corpus of standard Thai. Only written texts are collected in the first phase. It aims to include at least eighty million words. Various text types produced by various authors are included in the TNC so that it closely represents written language in general. Texts are word-segmented and tagged following the Text Encoding Initiative (TEI) guidelines on text encoding. The TNC was designed as a resource for general applications, such as lexicography, language teaching, and linguistic research. In addition, the TNC is designed to be comparable to the British National Corpus so that a comparative study between the two languages is also possible.


2021 ◽  
Vol 1 (2) ◽  
pp. 63-72
Author(s):  
P. Pahri

Students in Islamic boarding schools have difficulty expressing their thoughts verbally in Arabic, although they have fairly good skills in understanding written texts and reading written language. This study aimed to describe the implementation of the TPR method to improve speaking skills. It used a qualitative method with a descriptive approach. The data were obtained through observation, interviews, and documentation studies. The subjects of this study were Arabic teachers and students selected using the snowball technique. The results indicate that the TPR method can break the ice of communicating in Arabic among students, and that their difficulties in expressing their thoughts can be minimized by getting used to responding to those around them.


2009 ◽  
Vol 30 (3) ◽  
pp. 463-484 ◽  
Author(s):  
SARAH ROBINS ◽  
REBECCA TREIMAN

ABSTRACT In six analyses using the Child Language Data Exchange System known as CHILDES, we explored whether and how parents and their 1.5- to 5-year-old children talk about writing. Parent speech might include information about the similarity between print and speech and about the difference between writing and drawing. Parents could convey similarity between print and speech by using the words say, name, and word to refer to both spoken and written language. Parents could differentiate writing and drawing by making syntactic and semantic distinctions in their discussion of the two symbol systems. Our results indicate that parent speech includes these types of information. However, young children themselves sometimes confuse writing and drawing in their speech.

