Applied Natural Language Processing
Latest Publications


TOTAL DOCUMENTS: 32 (FIVE YEARS: 0)

H-INDEX: 6 (FIVE YEARS: 0)

Published By IGI Global
ISBN: 9781609607418, 9781609607425

Author(s):  
Courtney M. Bell ◽  
Philip M. McCarthy ◽  
Danielle S. McNamara

We use computational linguistic tools to investigate gender differences in language use within the context of marital conflict. Using the Linguistic Inquiry and Word Count (LIWC) tool, differences between genders were significant for the use of self-references, but not for the use of social words or positive and negative emotion words. Using Coh-Metrix, differences were significant for syntactic complexity, global argument overlap, and the density of logical connectors, but not for word frequency, the frequency of causal verbs and particles, global Latent Semantic Analysis (LSA), local argument overlap, or local LSA. These results confirmed some expectations but failed to confirm the majority of those based on the biological theory of gender, which defines gender in terms of biological sex and predicts polarized, static language differences based on the speaker’s gender.
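The dictionary-based counting that LIWC performs can be sketched as follows. The word list and the two speaker turns are invented stand-ins, not the actual LIWC dictionary or study data; only the counting logic is illustrated.

```python
# Minimal sketch of LIWC-style category counting: the rate of a category
# is the proportion of tokens found in that category's word list.
# SELF_REFS is a hypothetical word list, not the proprietary LIWC dictionary.
import re

SELF_REFS = {"i", "me", "my", "mine", "myself"}

def category_rate(text, category):
    """Proportion of tokens that fall in a dictionary category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in category)
    return hits / len(tokens)

# Two invented conflict-discussion turns for illustration.
speaker_a = "I think my feelings were ignored and I felt hurt."
speaker_b = "We should talk about what happened between us."

rate_a = category_rate(speaker_a, SELF_REFS)  # higher self-reference rate
rate_b = category_rate(speaker_b, SELF_REFS)
```

A study like the one above would then test whether such per-speaker rates differ significantly between groups.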


Author(s):  
Khaled Shaalan ◽  
Marwa Magdy ◽  
Aly Fahmy

Arabic is a language of rich and complex morphology. The nature and peculiarity of Arabic make its morphological and phonological rules confusing for second language learners (SLLs). The conjugation of Arabic verbs is central to the formulation of an Arabic sentence because of its richness of form and meaning. In this research, we address issues related to the morphological analysis of ill-formed Arabic verbs in order to identify the source of errors and provide informative feedback to SLLs of Arabic. Edit distance and constraint relaxation techniques are used to demonstrate the capability of the proposed system in generating all possible analyses of erroneous Arabic verbs written by SLLs. Filtering mechanisms are applied to exclude irrelevant constructions and determine the target stem, which is used as the base for constructing the feedback to the learner. The proposed system has been developed and evaluated using real test data, achieving satisfactory results in terms of recall.
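The edit-distance component can be sketched with the standard Levenshtein dynamic program; the candidate stems below are romanized, invented examples, not output of the authors' system.

```python
# Levenshtein edit distance: the kind of measure used to rank candidate
# corrections for a learner's ill-formed verb.
def edit_distance(a, b):
    """Minimum insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# Rank hypothetical (romanized, invented) candidate forms by distance
# to a learner's misspelled form.
candidates = ["katab", "kataba", "kutib"]
learner_form = "katba"
best = min(candidates, key=lambda c: edit_distance(learner_form, c))
```

In a real system, constraint relaxation would further prune candidates using morphological rules before feedback is generated.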


Author(s):  
Cyrus Shaoul ◽  
Chris Westbury

HAL (Hyperspace Analog to Language) is a high-dimensional model of semantic space that uses the global co-occurrence frequency of words in a large corpus of text as the basis for a representation of semantic memory. In the original HAL model, many parameters were set without any a priori rationale. In this chapter we describe a new computer application called the High Dimensional Explorer (HiDEx) that makes it possible to systematically alter the values of the model’s parameters and thereby to examine their effect on the co-occurrence matrix that instantiates the model. New parameter sets give us measures of semantic density that improve the model’s ability to predict behavioral measures. Implications for such models are discussed.
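The kind of weighted co-occurrence counting that underlies HAL can be sketched in a few lines; the window size and the linear distance weighting below are exactly the sort of parameters HiDEx lets a researcher vary, and the values here are purely illustrative.

```python
# Toy HAL-style co-occurrence matrix: within a sliding window, closer
# neighbors get higher weight (weight = window - distance + 1, so a
# neighbor at distance 1 weighs `window` and one at the window edge weighs 1).
from collections import defaultdict

def hal_matrix(tokens, window=3):
    """Sparse forward co-occurrence counts keyed by (target, neighbor)."""
    cooc = defaultdict(float)
    for i, target in enumerate(tokens):
        for d in range(1, window + 1):
            if i + d < len(tokens):
                cooc[(target, tokens[i + d])] += window - d + 1
    return cooc

tokens = "the cat sat on the mat".split()
m = hal_matrix(tokens, window=2)
```

Row vectors drawn from such a matrix serve as the word representations; varying the window and weighting changes the resulting measures of semantic density.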


Author(s):  
Cindy K. Chung ◽  
James W. Pennebaker

Linguistic Inquiry and Word Count (LIWC; Pennebaker, Booth, & Francis, 2007) is a word counting software program that references a dictionary of grammatical, psychological, and content word categories. LIWC has been used to efficiently classify texts along psychological dimensions and to predict behavioral outcomes, making it a text analysis tool widely used in the social sciences. LIWC can be considered a tool for applied natural language processing since, beyond classification, the relative uses of various LIWC categories can reflect the underlying psychology of demographic characteristics, honesty, health, status, relationship quality, group dynamics, or social context. By using a comparison group or longitudinal information, or validation with other psychological measures, LIWC analyses can be informative of a variety of psychological states and behaviors. Combining LIWC categories using new algorithms, or using the processor to assess new categories and languages, further extends the potential applications of LIWC.
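The comparison-group idea mentioned above can be sketched as a z-score of a text's category rate against group statistics. The word list and the group mean and standard deviation below are invented for illustration, not real LIWC norms.

```python
# Sketch of interpreting a LIWC-style category rate against a comparison
# group: express the text's rate as a z-score. POSITIVE_EMOTION and the
# group statistics are hypothetical, not actual LIWC output.
import re

POSITIVE_EMOTION = {"happy", "love", "great", "good", "nice"}

def rate(text, category):
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in category for t in tokens) / max(len(tokens), 1)

# Hypothetical comparison-group statistics for this category.
group_mean, group_sd = 0.05, 0.02

text = "What a great day, I love this good nice weather"
z = (rate(text, POSITIVE_EMOTION) - group_mean) / group_sd
```

A large positive z-score flags the text as unusually high on the category relative to the comparison group, which is what makes the raw counts psychologically interpretable.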


Author(s):  
Arthur C. Graesser ◽  
Vasile Rus ◽  
Zhiqiang Cai ◽  
Xiangen Hu

Automated Question Answering and Asking are two active areas of Natural Language Processing, with the former dominating the past decade and the latter most likely to dominate the next one. Due to the vast amounts of information available electronically in the Internet era, automated Question Answering is needed to fulfill information needs in an efficient and effective manner. Automated Question Answering is the task of providing answers automatically to questions asked in natural language. Typically, the answers are retrieved from large collections of documents. While answering any question is difficult, successful automated solutions to answer some types of questions, so-called factoid questions, have been developed recently, culminating with the recently announced Watson Question Answering system developed by IBM to compete in Jeopardy-like games. The reverse process, automated Question Asking or Generation, is about generating questions from some form of input such as a text, meaning representation, or database. Question Asking/Generation is an important component in the full gamut of learning technologies, from conventional computer-based training to tutoring systems. Advances in Question Asking/Generation are projected to revolutionize learning and dialogue systems. This chapter presents an overview of recent developments in Question Answering and Generation, starting with the landscape of questions that people ask.
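A minimal flavor of question generation can be given with templates over a subject-verb-object triple. This is a toy sketch: real systems extract such triples with a parser, whereas here the triple is hand-supplied and the templates are invented.

```python
# Toy template-based question generation from a hand-supplied SVO triple.
# Full systems parse the input text and select templates by question type.
def generate_questions(subj, verb_base, obj):
    """Yield simple factoid-style questions from a subject-verb-object triple."""
    return [
        f"Who can {verb_base} {obj}?",
        f"What can {subj} {verb_base}?",
    ]

qs = generate_questions("a QA system", "retrieve", "answers from documents")
```

Even this toy version shows why generation is hard: picking the right wh-word and verb form already requires syntactic and semantic analysis of the source sentence.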


Author(s):  
René Venegas

In this chapter I examine three automatic methods for the evaluation of summaries of narrative and expository texts in Spanish. The task consisted of correlating the evaluations made by three raters for 373 summaries with scores provided by latent semantic analysis (LSA). LSA scores were obtained by means of the following three methods: 1) comparison of summaries with the source text, 2) comparison of summaries with a summary approved by consensus, and 3) comparison of summaries with three summaries constructed by three language teachers. The most relevant results are a) a high positive correlation between the evaluations made by the raters (r = 0.642); b) a high positive correlation between the computational methods (r = 0.810); and c) a moderate-to-high positive correlation between the raters’ evaluations and the second and third LSA methods (r = 0.585 and r = 0.604) for summaries of narrative texts. Neither of these two methods differed significantly, in statistical terms, from the correlation among raters when the texts evaluated were predominantly narrative. These results allow us to assert that at least two holistic LSA-based methods are useful for assessing reading comprehension of narrative texts written in Spanish.
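All three methods reduce to comparing a summary's vector with a reference vector, typically via cosine similarity. In real LSA the vectors come from an SVD-reduced term-document matrix; the short vectors below are illustrative stand-ins.

```python
# Cosine similarity, the core comparison behind LSA-based summary scoring.
# The three-dimensional vectors here stand in for real LSA vectors, which
# come from an SVD-reduced term-document space.
import math

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

summary_vec = [0.8, 0.1, 0.3]   # illustrative summary vector
source_vec = [0.7, 0.2, 0.4]    # illustrative source-text vector
score = cosine(summary_vec, source_vec)
```

A summary scoring near 1.0 against the source (or against a consensus or teacher-written summary, for methods 2 and 3) is judged to have captured its content.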


Author(s):  
Aqil Azmi ◽  
Nawaf Al Badia

Hadiths are narrations originating from the words and deeds of Prophet Muhammad. Each hadith starts with a list of narrators involved in transmitting it. A hadith scholar judges a hadith based on the narration chain along with the individual narrators in the chain. In this chapter, we report on a method that automatically extracts the transmission chains from the hadith text and graphically displays them. Computationally, this is a challenging problem. Foremost, each hadith has its own peculiar way of listing narrators, and the text of the hadith is in Arabic, a language rich in morphology. Our proposed solution involves parsing and annotating the hadith text and recognizing the narrators’ names. We use shallow parsing along with a domain-specific grammar to parse the hadith content. Experiments on sample hadiths show our approach to have a very good success rate.
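The shallow-parsing idea can be sketched as splitting a chain on transmission phrases to recover the ordered narrator list. The English phrases below stand in for the Arabic transmission terms, and the names are invented; the authors' actual grammar is far richer than this.

```python
# Sketch of narrator-chain extraction with a shallow, domain-specific
# pattern: split an isnad-like chain on transmission phrases. English
# stand-ins for the Arabic transmission terms; names are invented.
import re

TRANSMISSION = re.compile(r"\s*\b(?:narrated to us|narrated|from|that)\b\s*")

def extract_chain(isnad):
    """Return the ordered list of narrator names in a transmission chain."""
    parts = TRANSMISSION.split(isnad)
    return [p.strip() for p in parts if p.strip()]

chain = extract_chain("Yahya narrated from Malik from Nafi that Ibn Umar")
```

The recovered ordered list is exactly what a scholar (or a graph renderer) needs to display and judge the chain.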


Author(s):  
Philip M. McCarthy ◽  
David Dufty ◽  
Christian F. Hempelmann ◽  
Zhiqiang Cai ◽  
Danielle S. McNamara ◽  
...  

The identification of new versus given information within a text has been frequently investigated by researchers of language and discourse. Despite theoretical advances, an accurate computational method for assessing the degree to which a text contains new versus given information has not previously been implemented. This study discusses a variety of computational new/given systems and analyzes four typical expository and narrative texts against a widely accepted theory of new/given proposed by Prince (1981). Our findings suggest that a latent semantic analysis (LSA) based measure called span outperforms standard LSA in detecting both new and given information in text. Further, span outperforms standard LSA for distinguishing low versus high cohesion versions of text. Our results suggest that span may be a useful variable in a wide array of discourse analyses.
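The intuition behind a span-style measure can be sketched with linear algebra: the "given" part of a sentence vector is its projection onto the subspace spanned by prior sentence vectors, and the residual is the "new" information. The three-dimensional vectors below are toy stand-ins for LSA vectors, and this sketch is an illustration of the general idea, not the authors' exact implementation.

```python
# Sketch of a span-style given/new measure: project a sentence vector onto
# the subspace spanned by prior-sentence vectors; the residual norm,
# relative to the sentence norm, estimates how much is "new".
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def new_information(sentence, prior):
    """Fraction of the sentence vector lying outside the span of prior vectors."""
    # Gram-Schmidt: build an orthonormal basis for the prior-sentence subspace.
    basis = []
    for p in prior:
        r = list(p)
        for b in basis:
            c = dot(r, b)
            r = [x - c * y for x, y in zip(r, b)]
        n = norm(r)
        if n > 1e-12:
            basis.append([x / n for x in r])
    # Remove the sentence's projection onto that subspace; keep the residual.
    resid = list(sentence)
    for b in basis:
        c = dot(resid, b)
        resid = [x - c * y for x, y in zip(resid, b)]
    return norm(resid) / norm(sentence)

prior = [[1, 0, 0], [0, 1, 0]]          # toy vectors for prior sentences
given = new_information([2, 3, 0], prior)  # inside the span: fully given
novel = new_information([0, 0, 5], prior)  # orthogonal: fully new
```

Standard LSA, by contrast, compares the sentence with prior text as a single averaged vector, which is why span can separate new from given information more sharply.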


Author(s):  
Jennifer L. Weston ◽  
Scott A. Crossley ◽  
Danielle S. McNamara

This study examines the relationship between the linguistic features of freewrites and human assessments of freewrite quality. Freewriting is a prewriting strategy that has received little experimental attention, particularly in terms of linguistic differences between high and low quality freewrites. This study builds upon the authors’ previous study, in which linguistic features of freewrites written by 9th and 11th grade students were included in a model of the freewrites’ quality (Weston, Crossley, & McNamara, 2010). The current study reexamines this model using a larger data set of freewrites. The results show that similar linguistic features reported in the Weston et al. model positively correlate with expert ratings in the new data set. Significant predictors in the current model of freewrite quality were total number of words and stem overlap. In addition, analyses suggest that 11th graders, as compared to 9th graders, wrote higher quality and longer freewrites. Overall, the results of this study support the conclusion that better freewrites are longer and more cohesive than poor freewrites.
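A stem-overlap measure of the kind used as a predictor here can be sketched as the proportion of adjacent sentence pairs sharing at least one stem. The crude suffix-stripping "stemmer" below is a stand-in for a real one (such as Porter's), and the sentences are invented.

```python
# Sketch of a stem-overlap cohesion measure: the proportion of adjacent
# sentence pairs that share at least one word stem. The suffix stripper
# is deliberately crude; a real system would use a proper stemmer.
import re

def crude_stem(word):
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def stem_overlap(sentences):
    """Fraction of adjacent sentence pairs sharing at least one stem."""
    stems = [{crude_stem(w) for w in re.findall(r"[a-z]+", s.lower())}
             for s in sentences]
    pairs = list(zip(stems, stems[1:]))
    if not pairs:
        return 0.0
    return sum(bool(a & b) for a, b in pairs) / len(pairs)

score = stem_overlap([
    "The writer drafted quickly.",   # "drafted" -> "draft"
    "Quick drafting builds fluency.",  # "drafting" -> "draft": overlap
    "Lunch was at noon.",            # no shared stem with the previous sentence
])
```

Higher values indicate more lexical cohesion across sentences, one of the two significant predictors of freewrite quality reported above.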


Author(s):  
Philip M. McCarthy ◽  
Shinobu Watanabe ◽  
Travis A. Lamkin

Natural language processing tools, such as Coh-Metrix (see Chapter 11, this volume) and LIWC (see Chapter 12, this volume), have been tremendously successful in offering insight into quantifiable differences between text types. Such quantitative assessments have certainly been highly informative in terms of evaluating theoretical linguistic and psychological categories that distinguish text types (e.g., referential overlap, lexical diversity, positive emotion words, and so forth). Although these identifications are extremely important in revealing ability deficiencies, knowledge gaps, comprehension failures, and underlying psychological phenomena, such assessments can be difficult to interpret because they do not explicitly inform readers and researchers as to which specific linguistic features are driving the text type identification (i.e., the words and word clusters of the text). For example, a tool such as Coh-Metrix informs us that expository texts are more cohesive than narrative texts in terms of sentential referential overlap (McNamara, Louwerse, & Graesser, in press; McCarthy, 2010), but it does not tell us which words (or word clusters) are driving that cohesion. That is, we do not learn which actual words tend to be indicative of the text type differences. These actual words may tend to cluster around certain psychological, cultural, or generic differences, and, as a result, researchers and materials designers who might wish to create or modify text, so as to better meet the needs of readers, are left somewhat in the dark as to which specific language to use. What is needed is a textual analysis tool that offers qualitative output (in addition to quantitative output) that researchers and materials designers might use as a guide to the lexical characteristics of the texts under analysis. The Gramulator is such a tool.
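The kind of qualitative output described above can be sketched as a "differential": n-grams frequent in one corpus but absent from a contrasting corpus, surfacing the actual words behind a quantitative difference. The two one-line corpora below are invented, and this is an illustration of the general idea rather than the Gramulator's exact algorithm.

```python
# Sketch of differential bigrams: bigrams that occur in corpus A but not
# in corpus B, ranked by frequency — the actual word clusters that drive
# a text-type difference. Toy corpora; not the Gramulator's exact method.
from collections import Counter

def bigrams(text):
    toks = text.lower().split()
    return Counter(zip(toks, toks[1:]))

def differentials(corpus_a, corpus_b, top=3):
    """Most frequent bigrams of corpus_a that never occur in corpus_b."""
    a, b = bigrams(corpus_a), bigrams(corpus_b)
    uniq = {bg: n for bg, n in a.items() if bg not in b}
    return [bg for bg, _ in Counter(uniq).most_common(top)]

expository = "the cell membrane controls the cell wall and the cell membrane"
narrative = "the old dog walked past the cell door and the old gate"
diff = differentials(expository, narrative)
```

Here the shared bigram "the cell" is filtered out, while "cell membrane" surfaces as characteristic of the expository sample: exactly the kind of lexical guidance a materials designer needs.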

