Applied Natural Language Processing
Latest Publications


TOTAL DOCUMENTS: 32 (FIVE YEARS: 0)

H-INDEX: 6 (FIVE YEARS: 0)

Published By IGI Global
ISBN: 9781609607418, 9781609607425

Author(s):  
Courtney M. Bell ◽  
Philip M. McCarthy ◽  
Danielle S. McNamara

We use computational linguistic tools to investigate gender differences in language use within the context of marital conflict. Using the Linguistic Inquiry and Word Count (LIWC) tool, differences between genders were significant for the use of self-references, but not for the use of social words or positive and negative emotion words. Using Coh-Metrix, differences were significant for syntactic complexity, global argument overlap, and the density of logical connectors, but not for word frequency, the frequency of causal verbs and particles, global Latent Semantic Analysis (LSA), local argument overlap, or local LSA. These results confirmed some expectations but failed to confirm the majority of those based on the biological theory of gender, which defines gender in terms of biological sex and predicts polarized, static language differences based on the speaker’s gender.
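The dictionary-based counting that LIWC performs can be sketched as follows. The word list and the two speaker turns are invented stand-ins, not the actual LIWC dictionary or study data; only the counting logic is illustrated.

```python
# Minimal sketch of LIWC-style category counting: the rate of a category
# is the proportion of tokens found in that category's word list.
# SELF_REFS is a hypothetical word list, not the proprietary LIWC dictionary.
import re

SELF_REFS = {"i", "me", "my", "mine", "myself"}

def category_rate(text, category):
    """Proportion of tokens that fall in a dictionary category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in category)
    return hits / len(tokens)

# Two invented conflict-discussion turns for illustration.
speaker_a = "I think my feelings were ignored and I felt hurt."
speaker_b = "We should talk about what happened between us."

rate_a = category_rate(speaker_a, SELF_REFS)  # higher self-reference rate
rate_b = category_rate(speaker_b, SELF_REFS)
```

A study like the one above would then test whether such per-speaker rates differ significantly between groups.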


Author(s):  
Khaled Shaalan ◽  
Marwa Magdy ◽  
Aly Fahmy

Arabic is a language of rich and complex morphology. The nature and peculiarity of Arabic make its morphological and phonological rules confusing for second language learners (SLLs). The conjugation of Arabic verbs is central to the formulation of an Arabic sentence because of its richness of form and meaning. In this research, we address issues related to the morphological analysis of ill-formed Arabic verbs in order to identify the source of errors and provide informative feedback to SLLs of Arabic. Edit distance and constraint relaxation techniques are used to demonstrate the capability of the proposed system in generating all possible analyses of erroneous Arabic verbs written by SLLs. Filtering mechanisms are applied to exclude irrelevant constructions and determine the target stem, which is used as the base for constructing the feedback to the learner. The proposed system has been developed and evaluated using real test data, achieving satisfactory results in terms of recall.
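The edit-distance component can be sketched with the standard Levenshtein dynamic program; the candidate stems below are romanized, invented examples, not output of the authors' system.

```python
# Levenshtein edit distance: the kind of measure used to rank candidate
# corrections for a learner's ill-formed verb.
def edit_distance(a, b):
    """Minimum insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# Rank hypothetical (romanized, invented) candidate forms by distance
# to a learner's misspelled form.
candidates = ["katab", "kataba", "kutib"]
learner_form = "katba"
best = min(candidates, key=lambda c: edit_distance(learner_form, c))
```

In a real system, constraint relaxation would further prune candidates using morphological rules before feedback is generated.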


Author(s):  
Cyrus Shaoul ◽  
Chris Westbury

HAL (Hyperspace Analog to Language) is a high-dimensional model of semantic space that uses the global co-occurrence frequency of words in a large corpus of text as the basis for a representation of semantic memory. In the original HAL model, many parameters were set without any a priori rationale. In this chapter we describe a new computer application called the High Dimensional Explorer (HiDEx) that makes it possible to systematically alter the values of the model’s parameters and thereby to examine their effect on the co-occurrence matrix that instantiates the model. New parameter sets give us measures of semantic density that improve the model’s ability to predict behavioral measures. Implications for such models are discussed.
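The kind of weighted co-occurrence counting that underlies HAL can be sketched in a few lines; the window size and the linear distance weighting below are exactly the sort of parameters HiDEx lets a researcher vary, and the values here are purely illustrative.

```python
# Toy HAL-style co-occurrence matrix: within a sliding window, closer
# neighbors get higher weight (weight = window - distance + 1, so a
# neighbor at distance 1 weighs `window` and one at the window edge weighs 1).
from collections import defaultdict

def hal_matrix(tokens, window=3):
    """Sparse forward co-occurrence counts keyed by (target, neighbor)."""
    cooc = defaultdict(float)
    for i, target in enumerate(tokens):
        for d in range(1, window + 1):
            if i + d < len(tokens):
                cooc[(target, tokens[i + d])] += window - d + 1
    return cooc

tokens = "the cat sat on the mat".split()
m = hal_matrix(tokens, window=2)
```

Row vectors drawn from such a matrix serve as the word representations; varying the window and weighting changes the resulting measures of semantic density.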


Author(s):  
Cindy K. Chung ◽  
James W. Pennebaker

Linguistic Inquiry and Word Count (LIWC; Pennebaker, Booth, & Francis, 2007) is a word counting software program that references a dictionary of grammatical, psychological, and content word categories. LIWC has been used to efficiently classify texts along psychological dimensions and to predict behavioral outcomes, making it a text analysis tool widely used in the social sciences. LIWC can be considered a tool for applied natural language processing since, beyond classification, the relative uses of various LIWC categories can reflect the underlying psychology of demographic characteristics, honesty, health, status, relationship quality, group dynamics, or social context. By using a comparison group or longitudinal information, or validation with other psychological measures, LIWC analyses can be informative of a variety of psychological states and behaviors. Combining LIWC categories using new algorithms, or using the processor to assess new categories and languages, further extends the potential applications of LIWC.
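The comparison-group idea mentioned above can be sketched as a z-score of a text's category rate against group statistics. The word list and the group mean and standard deviation below are invented for illustration, not real LIWC norms.

```python
# Sketch of interpreting a LIWC-style category rate against a comparison
# group: express the text's rate as a z-score. POSITIVE_EMOTION and the
# group statistics are hypothetical, not actual LIWC output.
import re

POSITIVE_EMOTION = {"happy", "love", "great", "good", "nice"}

def rate(text, category):
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in category for t in tokens) / max(len(tokens), 1)

# Hypothetical comparison-group statistics for this category.
group_mean, group_sd = 0.05, 0.02

text = "What a great day, I love this good nice weather"
z = (rate(text, POSITIVE_EMOTION) - group_mean) / group_sd
```

A large positive z-score flags the text as unusually high on the category relative to the comparison group, which is what makes the raw counts psychologically interpretable.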


Author(s):  
Arthur C. Graesser ◽  
Vasile Rus ◽  
Zhiqiang Cai ◽  
Xiangen Hu

Automated Question Answering and Asking are two active areas of Natural Language Processing, with the former dominating the past decade and the latter most likely to dominate the next one. Due to the vast amounts of information available electronically in the Internet era, automated Question Answering is needed to fulfill information needs in an efficient and effective manner. Automated Question Answering is the task of providing answers automatically to questions asked in natural language. Typically, the answers are retrieved from large collections of documents. While answering any question is difficult, successful automated solutions to answer some types of questions, so-called factoid questions, have been developed recently, culminating with the recently announced Watson Question Answering system developed by IBM to compete in Jeopardy-like games. The reverse process, automated Question Asking or Generation, is about generating questions from some form of input such as a text, meaning representation, or database. Question Asking/Generation is an important component in the full gamut of learning technologies, from conventional computer-based training to tutoring systems. Advances in Question Asking/Generation are projected to revolutionize learning and dialogue systems. This chapter presents an overview of recent developments in Question Answering and Generation, starting with the landscape of questions that people ask.
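A minimal flavor of question generation can be given with templates over a subject-verb-object triple. This is a toy sketch: real systems extract such triples with a parser, whereas here the triple is hand-supplied and the templates are invented.

```python
# Toy template-based question generation from a hand-supplied SVO triple.
# Full systems parse the input text and select templates by question type.
def generate_questions(subj, verb_base, obj):
    """Yield simple factoid-style questions from a subject-verb-object triple."""
    return [
        f"Who can {verb_base} {obj}?",
        f"What can {subj} {verb_base}?",
    ]

qs = generate_questions("a QA system", "retrieve", "answers from documents")
```

Even this toy version shows why generation is hard: picking the right wh-word and verb form already requires syntactic and semantic analysis of the source sentence.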


Author(s):  
René Venegas

In this chapter I examine three automatic methods for the evaluation of summaries of narrative and expository texts in Spanish. The task consisted of correlating the evaluations made by three raters for 373 summaries with scores provided by latent semantic analysis (LSA). LSA scores were obtained by means of the following three methods: 1) comparison of summaries with the source text, 2) comparison of summaries with a summary approved by consensus, and 3) comparison of summaries with three summaries constructed by three language teachers. The most relevant results are a) a high positive correlation between the evaluations made by the raters (r = 0.642); b) a high positive correlation between the computational methods (r = 0.810); and c) a moderate-to-high positive correlation between the raters’ evaluations and the second and third LSA methods (r = 0.585 and r = 0.604) for summaries of narrative texts. Neither of these two methods differed significantly, in statistical terms, from the correlation among raters when the texts evaluated were predominantly narrative. These results allow us to assert that at least two holistic LSA-based methods are useful for assessing reading comprehension of narrative texts written in Spanish.
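All three methods reduce to comparing a summary's vector with a reference vector, typically via cosine similarity. In real LSA the vectors come from an SVD-reduced term-document matrix; the short vectors below are illustrative stand-ins.

```python
# Cosine similarity, the core comparison behind LSA-based summary scoring.
# The three-dimensional vectors here stand in for real LSA vectors, which
# come from an SVD-reduced term-document space.
import math

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

summary_vec = [0.8, 0.1, 0.3]   # illustrative summary vector
source_vec = [0.7, 0.2, 0.4]    # illustrative source-text vector
score = cosine(summary_vec, source_vec)
```

A summary scoring near 1.0 against the source (or against a consensus or teacher-written summary, for methods 2 and 3) is judged to have captured its content.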


Author(s):  
Aqil Azmi ◽  
Nawaf Al Badia

Hadiths are narrations originating from the words and deeds of Prophet Muhammad. Each hadith starts with a list of narrators involved in transmitting it. A hadith scholar judges a hadith based on the narration chain along with the individual narrators in the chain. In this chapter, we report on a method that automatically extracts the transmission chains from the hadith text and graphically displays them. Computationally, this is a challenging problem. Foremost, each hadith has its own peculiar way of listing narrators, and the text of the hadith is in Arabic, a language rich in morphology. Our proposed solution involves parsing and annotating the hadith text and recognizing the narrators’ names. We use shallow parsing along with a domain-specific grammar to parse the hadith content. Experiments on sample hadiths show our approach to have a very good success rate.
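The shallow-parsing idea can be sketched as splitting a chain on transmission phrases to recover the ordered narrator list. The English phrases below stand in for the Arabic transmission terms, and the names are invented; the authors' actual grammar is far richer than this.

```python
# Sketch of narrator-chain extraction with a shallow, domain-specific
# pattern: split an isnad-like chain on transmission phrases. English
# stand-ins for the Arabic transmission terms; names are invented.
import re

TRANSMISSION = re.compile(r"\s*\b(?:narrated to us|narrated|from|that)\b\s*")

def extract_chain(isnad):
    """Return the ordered list of narrator names in a transmission chain."""
    parts = TRANSMISSION.split(isnad)
    return [p.strip() for p in parts if p.strip()]

chain = extract_chain("Yahya narrated from Malik from Nafi that Ibn Umar")
```

The recovered ordered list is exactly what a scholar (or a graph renderer) needs to display and judge the chain.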


Author(s):  
Philip M. McCarthy ◽  
David Dufty ◽  
Christian F. Hempelmann ◽  
Zhiqiang Cai ◽  
Danielle S. McNamara ◽  
...  

The identification of new versus given information within a text has been frequently investigated by researchers of language and discourse. Despite theoretical advances, an accurate computational method for assessing the degree to which a text contains new versus given information has not previously been implemented. This study discusses a variety of computational new/given systems and analyzes four typical expository and narrative texts against a widely accepted theory of new/given proposed by Prince (1981). Our findings suggest that a latent semantic analysis (LSA) based measure called span outperforms standard LSA in detecting both new and given information in text. Further, span outperforms standard LSA for distinguishing low versus high cohesion versions of text. Our results suggest that span may be a useful variable in a wide array of discourse analyses.
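The intuition behind a span-style measure can be sketched with linear algebra: the "given" part of a sentence vector is its projection onto the subspace spanned by prior sentence vectors, and the residual is the "new" information. The three-dimensional vectors below are toy stand-ins for LSA vectors, and this sketch is an illustration of the general idea, not the authors' exact implementation.

```python
# Sketch of a span-style given/new measure: project a sentence vector onto
# the subspace spanned by prior-sentence vectors; the residual norm,
# relative to the sentence norm, estimates how much is "new".
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def new_information(sentence, prior):
    """Fraction of the sentence vector lying outside the span of prior vectors."""
    # Gram-Schmidt: build an orthonormal basis for the prior-sentence subspace.
    basis = []
    for p in prior:
        r = list(p)
        for b in basis:
            c = dot(r, b)
            r = [x - c * y for x, y in zip(r, b)]
        n = norm(r)
        if n > 1e-12:
            basis.append([x / n for x in r])
    # Remove the sentence's projection onto that subspace; keep the residual.
    resid = list(sentence)
    for b in basis:
        c = dot(resid, b)
        resid = [x - c * y for x, y in zip(resid, b)]
    return norm(resid) / norm(sentence)

prior = [[1, 0, 0], [0, 1, 0]]          # toy vectors for prior sentences
given = new_information([2, 3, 0], prior)  # inside the span: fully given
novel = new_information([0, 0, 5], prior)  # orthogonal: fully new
```

Standard LSA, by contrast, compares the sentence with prior text as a single averaged vector, which is why span can separate new from given information more sharply.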


Author(s):  
Jennifer L. Weston ◽  
Scott A. Crossley ◽  
Danielle S. McNamara

This study examines the relationship between the linguistic features of freewrites and human assessments of freewrite quality. Freewriting is a prewriting strategy that has received little experimental attention, particularly in terms of linguistic differences between high and low quality freewrites. This study builds upon the authors’ previous study, in which linguistic features of freewrites written by 9th and 11th grade students were included in a model of the freewrites’ quality (Weston, Crossley, & McNamara, 2010). The current study reexamines this model using a larger data set of freewrites. The results show that similar linguistic features reported in the Weston et al. model positively correlate with expert ratings in the new data set. Significant predictors in the current model of freewrite quality were total number of words and stem overlap. In addition, analyses suggest that 11th graders, as compared to 9th graders, wrote higher quality and longer freewrites. Overall, the results of this study support the conclusion that better freewrites are longer and more cohesive than poor freewrites.
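A stem-overlap measure of the kind used as a predictor here can be sketched as the proportion of adjacent sentence pairs sharing at least one stem. The crude suffix-stripping "stemmer" below is a stand-in for a real one (such as Porter's), and the sentences are invented.

```python
# Sketch of a stem-overlap cohesion measure: the proportion of adjacent
# sentence pairs that share at least one word stem. The suffix stripper
# is deliberately crude; a real system would use a proper stemmer.
import re

def crude_stem(word):
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def stem_overlap(sentences):
    """Fraction of adjacent sentence pairs sharing at least one stem."""
    stems = [{crude_stem(w) for w in re.findall(r"[a-z]+", s.lower())}
             for s in sentences]
    pairs = list(zip(stems, stems[1:]))
    if not pairs:
        return 0.0
    return sum(bool(a & b) for a, b in pairs) / len(pairs)

score = stem_overlap([
    "The writer drafted quickly.",   # "drafted" -> "draft"
    "Quick drafting builds fluency.",  # "drafting" -> "draft": overlap
    "Lunch was at noon.",            # no shared stem with the previous sentence
])
```

Higher values indicate more lexical cohesion across sentences, one of the two significant predictors of freewrite quality reported above.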


Author(s):  
Philip M. McCarthy ◽  
Shinobu Watanabe ◽  
Travis A. Lamkin

Natural language processing tools, such as Coh-Metrix (see Chapter 11, this volume) and LIWC (see Chapter 12, this volume), have been tremendously successful in offering insight into quantifiable differences between text types. Such quantitative assessments have certainly been highly informative in terms of evaluating theoretical linguistic and psychological categories that distinguish text types (e.g., referential overlap, lexical diversity, positive emotion words, and so forth). Although these identifications are extremely important in revealing ability deficiencies, knowledge gaps, comprehension failures, and underlying psychological phenomena, such assessments can be difficult to interpret because they do not explicitly inform readers and researchers as to which specific linguistic features are driving the text type identification (i.e., the words and word clusters of the text). For example, a tool such as Coh-Metrix informs us that expository texts are more cohesive than narrative texts in terms of sentential referential overlap (McNamara, Louwerse, & Graesser, in press; McCarthy, 2010), but it does not tell us which words (or word clusters) are driving that cohesion. That is, we do not learn which actual words tend to be indicative of the text type differences. These actual words may tend to cluster around certain psychological, cultural, or generic differences, and, as a result, researchers and materials designers who might wish to create or modify text, so as to better meet the needs of readers, are left somewhat in the dark as to which specific language to use. What is needed is a textual analysis tool that offers qualitative output (in addition to quantitative output) that researchers and materials designers might use as a guide to the lexical characteristics of the texts under analysis. The Gramulator is such a tool.
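The kind of qualitative output described above can be sketched as a "differential": n-grams frequent in one corpus but absent from a contrasting corpus, surfacing the actual words behind a quantitative difference. The two one-line corpora below are invented, and this is an illustration of the general idea rather than the Gramulator's exact algorithm.

```python
# Sketch of differential bigrams: bigrams that occur in corpus A but not
# in corpus B, ranked by frequency — the actual word clusters that drive
# a text-type difference. Toy corpora; not the Gramulator's exact method.
from collections import Counter

def bigrams(text):
    toks = text.lower().split()
    return Counter(zip(toks, toks[1:]))

def differentials(corpus_a, corpus_b, top=3):
    """Most frequent bigrams of corpus_a that never occur in corpus_b."""
    a, b = bigrams(corpus_a), bigrams(corpus_b)
    uniq = {bg: n for bg, n in a.items() if bg not in b}
    return [bg for bg, _ in Counter(uniq).most_common(top)]

expository = "the cell membrane controls the cell wall and the cell membrane"
narrative = "the old dog walked past the cell door and the old gate"
diff = differentials(expository, narrative)
```

Here the shared bigram "the cell" is filtered out, while "cell membrane" surfaces as characteristic of the expository sample: exactly the kind of lexical guidance a materials designer needs.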

