Extracting semantic relations using syntax

2021 ◽  
Vol 3 (2) ◽  
pp. 1-16
Author(s):  
Kasper Welbers ◽  
Wouter van Atteveldt ◽  
Jan Kleinnijenhuis

Abstract: Most common methods for automatic text analysis in communication science ignore syntactic information, focusing on the occurrence and co-occurrence of individual words, and sometimes n-grams. This is remarkably effective for some purposes, but poses a limitation for fine-grained analyses into semantic relations such as who does what to whom and according to what source. One tested, effective method for moving beyond this bag-of-words assumption is to use a rule-based approach for labeling and extracting syntactic patterns in dependency trees. Although this method can be used for a variety of purposes, its application is hindered by the lack of dedicated and accessible tools. In this paper we introduce the rsyntax R package, which is designed to make working with dependency trees easier and more intuitive for R users, and provides a framework for combining multiple rules for reliably extracting useful semantic relations.
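
To make the idea of a rule over a dependency tree concrete, here is a minimal sketch in Python. It does not use or reproduce the rsyntax API (rsyntax is an R package); the token table, the relation labels (`nsubj`, `dobj`) and the function names `children` and `extract_svo` are all assumptions made for illustration. The rule simply says: a token with both a subject child and a direct-object child yields a "who does what to whom" triple.

```python
from typing import Dict, List, Optional, Tuple

Token = Dict[str, Optional[object]]

# Hand-written dependency parse of "Mary praised the report".
# This table is hypothetical; a real pipeline would get it from a parser.
TOKENS: List[Token] = [
    {"id": 1, "token": "Mary", "parent": 2, "relation": "nsubj"},
    {"id": 2, "token": "praised", "parent": None, "relation": "ROOT"},
    {"id": 3, "token": "the", "parent": 4, "relation": "det"},
    {"id": 4, "token": "report", "parent": 2, "relation": "dobj"},
]


def children(tokens: List[Token], parent_id: object, relation: str) -> List[Token]:
    """Return the tokens attached to parent_id with the given relation label."""
    return [t for t in tokens if t["parent"] == parent_id and t["relation"] == relation]


def extract_svo(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Apply one rule: any token with a subject and a direct object gives a triple."""
    triples = []
    for tok in tokens:
        for subj in children(tokens, tok["id"], "nsubj"):
            for obj in children(tokens, tok["id"], "dobj"):
                triples.append((str(subj["token"]), str(tok["token"]), str(obj["token"])))
    return triples


print(extract_svo(TOKENS))
```

A bag-of-words representation of the same sentence would record only that "Mary", "praised" and "report" occur together; the rule recovers which of them is the actor and which is the target.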

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Lixue Zou ◽  
Xiwen Liu ◽  
Wray Buntine ◽  
Yanli Liu

Purpose: Full text of a document is a rich source of information that can be used to provide meaningful topics. The purpose of this paper is to demonstrate how to use citation context (CC) in the full text to identify the cited topics and citing topics efficiently and effectively by employing automatic text analysis algorithms.
Design/methodology/approach: The authors present two novel topic models, Citation-Context-LDA (CC-LDA) and Citation-Context-Reference-LDA (CCRef-LDA). CC is leveraged to extract the citing text from the full text, which makes it possible to discover topics with accuracy. CC-LDA incorporates CC, citing text, and their latent relationship, while CCRef-LDA incorporates CC, citing text, their latent relationship and reference information in CC. Collapsed Gibbs sampling is used to achieve an approximate estimation. The capacity of CC-LDA to simultaneously learn cited topics and citing topics together with their links is investigated. Moreover, a topic influence measure method based on CC-LDA is proposed and applied to create links between the two-level topics. In addition, the capacity of CCRef-LDA to discover topic influential references is also investigated.
Findings: The results indicate CC-LDA and CCRef-LDA achieve improved or comparable performance in terms of both perplexity and symmetric Kullback–Leibler (sKL) divergence. Moreover, CC-LDA is effective in discovering the cited topics and citing topics with topic influence, and CCRef-LDA is able to find the cited topic influential references.
Originality/value: The automatic method provides novel knowledge for cited topics and citing topics discovery. Topic influence learnt by our model can link two-level topics and create a semantic topic network. The method can also use topic specificity as a feature to rank references.
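
For readers unfamiliar with the sKL measure mentioned in the findings, the following is a small sketch of a symmetric Kullback–Leibler divergence between two discrete distributions. The abstract does not give the exact formula the authors use; this sketch assumes the common convention of averaging the two directed divergences, and it assumes both distributions are strictly positive wherever the other is. The function names `kl` and `symmetric_kl` are illustrative.

```python
import math
from typing import Sequence


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Directed KL divergence KL(p || q) in nats; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Assumed convention: the average of KL(p || q) and KL(q || p)."""
    return 0.5 * (kl(p, q) + kl(q, p))


# Two hypothetical topic-word distributions over a three-word vocabulary.
topic_a = [0.7, 0.2, 0.1]
topic_b = [0.1, 0.3, 0.6]

print(symmetric_kl(topic_a, topic_b))
```

Unlike the directed divergence, this quantity does not depend on the order of its arguments, which is why it is convenient for comparing pairs of topics.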


2012 ◽  
Vol 56 (1) ◽  
pp. 19-25 ◽  
Author(s):  
Yair Neuman ◽  
Yohai Cohen ◽  
Dan Assaf ◽  
Gabbi Kedma

Author(s):  
Wouter van Atteveldt ◽  
Kasper Welbers ◽  
Mariken van der Velden

Analyzing political text can answer many pressing questions in political science, from understanding political ideology to mapping the effects of censorship in authoritarian states. This makes the study of political text and speech an important part of the political science methodological toolbox. The confluence of increasing availability of large digital text collections, plentiful computational power, and methodological innovations has led to many researchers adopting techniques of automatic text analysis for coding and analyzing textual data. In what is sometimes termed the “text as data” approach, texts are converted to a numerical representation, and various techniques such as dictionary analysis, automatic scaling, topic modeling, and machine learning are used to find patterns in and test hypotheses on these data. These methods all make certain assumptions and need to be validated to assess their fitness for any particular task and domain.
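
As a rough illustration of the "text as data" step described above, the sketch below converts texts to word counts and applies a dictionary analysis to them. The example documents, the tiny word list `NEGATIVE`, and the helper names are all hypothetical; real dictionaries are far larger and, as the abstract stresses, need validation for the task and domain at hand.

```python
import re
from collections import Counter
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_counts(docs: Iterable[str]) -> List[Counter]:
    """Numerical representation: one bag-of-words count vector per document."""
    return [Counter(tokenize(doc)) for doc in docs]


def dictionary_score(counts: Counter, dictionary: Iterable[str]) -> int:
    """Dictionary analysis: total occurrences of the dictionary's words."""
    return sum(counts[word] for word in dictionary)


# Hypothetical documents and a hypothetical one-category dictionary.
DOCS = [
    "The minister praised the reform.",
    "Critics attacked the reform and attacked the minister.",
]
NEGATIVE = {"attacked", "critics"}

COUNTS = term_counts(DOCS)
print([dictionary_score(c, NEGATIVE) for c in COUNTS])
```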


Science ◽  
1970 ◽  
Vol 168 (3929) ◽  
pp. 335-343 ◽  
Author(s):  
G. Salton
