Class-based approach to disambiguating Levin verbs

2010 ◽  
Vol 16 (4) ◽  
pp. 391-415 ◽  
Author(s):  
JIANGUO LI ◽  
CHRIS BREW

Lapata and Brew (Computational Linguistics, vol. 30, 2004, pp. 295–313; hereafter LB04) obtain from untagged texts a statistical prior model that is able to generate class preferences for ambiguous Levin verbs (English Verb Classes and Alternations: A Preliminary Investigation, 1993, University of Chicago Press; hereafter Levin). They also show that their informative priors, incorporated into a Naive Bayes classifier deduced from hand-tagged data (HTD), can aid in verb class disambiguation. We re-analyse LB04's prior model and show that a single factor (the joint probability of class and frame) determines the predominant class for a particular verb in a particular frame. This means that the prior model cannot be sensitive to fine-grained lexical distinctions between different individual verbs falling in the same class. We replicate LB04's supervised disambiguation experiments on large-scale data, using deep parsers rather than the shallow parser of LB04. In addition, we introduce a method for training our classifier without using HTD. This relies on knowledge of Levin class memberships to move information from unambiguous to ambiguous instances of each class. We regard this system as unsupervised because it does not rely on human annotation of individual verb instances. Although our unsupervised verb class disambiguator does not match the performance of the ones that make use of HTD, it consistently outperforms the random baseline model. Our experiments also demonstrate that the informative priors derived from untagged texts help improve the performance of the classifier trained on untagged data.
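The single-factor behaviour described in the abstract can be made concrete with a short sketch. The joint probabilities below are toy placeholders, not LB04's estimates; the point is only that the argmax over P(class, frame) never consults the verb itself, so every verb attested in a given frame receives the same predominant class.

```python
# A minimal sketch of the single-factor behaviour, with toy numbers.
# The joint probabilities P(class, frame) below are hypothetical
# placeholders, not LB04's actual estimates.

joint_prob = {
    ("change_of_state", "NP-V"): 0.12,
    ("amuse_type", "NP-V"): 0.05,
    ("change_of_state", "NP-V-NP"): 0.03,
    ("amuse_type", "NP-V-NP"): 0.09,
}

def predominant_class(frame, classes):
    """Return argmax_c P(c, frame).

    The verb never enters the computation, so every verb seen in a
    given frame is assigned the same predominant class: the prior
    cannot draw fine-grained distinctions between verbs in one class.
    """
    return max(classes, key=lambda c: joint_prob.get((c, frame), 0.0))

classes = ["change_of_state", "amuse_type"]
print(predominant_class("NP-V", classes))     # change_of_state
print(predominant_class("NP-V-NP", classes))  # amuse_type
```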

Author(s):  
Yifan Gao ◽  
Yang Zhong ◽  
Daniel Preoţiuc-Pietro ◽  
Junyi Jessy Li

In computational linguistics, specificity quantifies how much detail is engaged in text. It is an important characteristic of speaker intention and language style, and is useful in NLP applications such as summarization and argumentation mining. Yet to date, expert-annotated data for sentence-level specificity are scarce and confined to the news genre. In addition, systems that predict sentence specificity are classifiers trained to produce binary labels (general or specific). We collect a dataset of over 7,000 tweets annotated with specificity on a fine-grained scale. Using this dataset, we train a supervised regression model that accurately estimates specificity in social media posts, reaching a mean absolute error of 0.3578 (for ratings on a scale of 1–5) and 0.73 Pearson correlation, significantly improving over baselines and previous sentence specificity prediction systems. We also present the first large-scale study revealing the social, temporal and mental health factors underlying language specificity on social media.
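As a rough illustration of the task setup (not the authors' system), fine-grained specificity prediction can be framed as plain supervised regression over sentence features; the features, model, and data in this sketch are all illustrative assumptions.

```python
# Illustrative setup only: fine-grained specificity as supervised
# regression. Features, model, and data are assumptions for the sketch,
# not the system described in the abstract.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

# Hypothetical tweets rated for specificity on a 1-5 scale.
train_texts = [
    "Meeting moved to 3pm in room B12.",
    "Things have been happening lately.",
    "Bus 47 skips the Elm St stop after 9pm.",
    "It is what it is.",
]
train_scores = [4.6, 1.4, 4.8, 1.1]

test_texts = ["Flight UA90 delayed 45 minutes at gate C7.",
              "Stuff went on today."]
test_scores = [4.8, 1.2]

vec = TfidfVectorizer(ngram_range=(1, 2))
X_train = vec.fit_transform(train_texts)
X_test = vec.transform(test_texts)

model = Ridge(alpha=1.0).fit(X_train, train_scores)
pred = model.predict(X_test).clip(1, 5)  # keep predictions on the 1-5 scale

print("MAE:", mean_absolute_error(test_scores, pred))
```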


2016 ◽  
Vol 42 (3) ◽  
pp. 537-593 ◽  
Author(s):  
Dong Nguyen ◽  
A. Seza Doğruöz ◽  
Carolyn P. Rosé ◽  
Franciska de Jong

Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of “computational sociolinguistics” that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction, and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions used in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges.


Author(s):  
Baohua Qiang ◽  
Ruidong Chen ◽  
Yuan Xie ◽  
Mingliang Zhou ◽  
Riwei Pan ◽  
...  

In this paper, we propose a hybrid deep neural network-based method for cross-modal image and text retrieval that explores complex cross-modal correlation through multi-layer learning. First, we propose intra-modal and inter-modal representations to achieve a complementary single-modal representation that preserves the correlation between the modalities. Second, we build an association between the modalities through hierarchical learning to further mine the fine-grained latent semantic associations among multimodal data. The experimental results show that our algorithm substantially enhances retrieval performance and consistently outperforms four comparison methods.
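The description above maps onto a generic two-branch cross-modal embedding design. The following PyTorch sketch is an assumption-laden illustration (layer sizes, feature dimensions, and a hinge-based ranking loss over in-batch negatives are all placeholders), not the authors' architecture.

```python
# Generic two-branch cross-modal embedding sketch (PyTorch). All sizes
# and the loss are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalEncoder(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, shared_dim=256):
        super().__init__()
        # Intra-modal encoders: one projection per modality.
        self.img_proj = nn.Sequential(nn.Linear(img_dim, 512), nn.ReLU(),
                                      nn.Linear(512, shared_dim))
        self.txt_proj = nn.Sequential(nn.Linear(txt_dim, 512), nn.ReLU(),
                                      nn.Linear(512, shared_dim))

    def forward(self, img_feats, txt_feats):
        # Inter-modal alignment happens in the shared, L2-normalised space.
        img = F.normalize(self.img_proj(img_feats), dim=-1)
        txt = F.normalize(self.txt_proj(txt_feats), dim=-1)
        return img, txt

def ranking_loss(img, txt, margin=0.2):
    """Bidirectional hinge loss over in-batch negatives."""
    sims = img @ txt.t()            # cosine similarities (features are unit-norm)
    pos = sims.diag().unsqueeze(1)  # matched image-text pairs on the diagonal
    cost_i2t = (margin + sims - pos).clamp(min=0)
    cost_t2i = (margin + sims - pos.t()).clamp(min=0)
    mask = torch.eye(sims.size(0), dtype=torch.bool)
    cost_i2t[mask] = 0
    cost_t2i[mask] = 0
    return cost_i2t.sum() + cost_t2i.sum()

enc = CrossModalEncoder()
img = torch.randn(8, 2048)  # e.g. pooled CNN image features
txt = torch.randn(8, 768)   # e.g. pooled text-encoder features
i, t = enc(img, txt)
print(ranking_loss(i, t))
```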


2009 ◽  
Vol 28 (11) ◽  
pp. 2737-2740
Author(s):  
Xiao ZHANG ◽  
Shan WANG ◽  
Na LIAN

2016 ◽  
Author(s):  
John W. Williams ◽  
Simon Goring ◽  
Eric Grimm ◽  
Jason McLachlan
