2018 ◽  
Vol 46 (1) ◽  

Damian Trilling & Jelle Boumans
Automated analysis of Dutch language-based texts. An overview and research agenda

While automated methods of content analysis are increasingly popular in today’s communication research, these methods have hardly been adopted by communication scholars studying texts in Dutch. This essay offers an overview of the possibilities and current limitations of automated text analysis approaches in the context of the Dutch language. Particularly for dictionary-based approaches, research is far less prolific than research on the English language. We divide the most common types of content-analytical research questions into three categories: 1) research problems for which automated methods ought to be used, 2) research problems for which automated methods could be used, and 3) research problems for which automated methods (currently) cannot be used. Finally, we give suggestions for the advancement of automated text analysis approaches for Dutch texts.

Keywords: automated content analysis, Dutch, dictionaries, supervised machine learning, unsupervised machine learning


2021 ◽  
pp. 1-12
Author(s):  
Melesio Crespo-Sanchez ◽  
Ivan Lopez-Arevalo ◽  
Edwin Aldana-Bobadilla ◽  
Alejandro Molina-Villegas

In the last few years, text analysis has become a keystone in several domains for solving many real-world problems, such as machine translation, spam detection, and question answering, to mention a few. Many of these tasks can be approached by means of machine learning algorithms. Most of these algorithms take as input a transformation of the text in the form of feature vectors containing an abstraction of the content. Most recent vector representations focus on the semantic component of text; however, we consider that also taking the lexical and syntactic components into account in the abstraction of content could be beneficial for learning tasks. In this work, we propose a content spectral-based text representation applicable to machine learning algorithms for text analysis. This representation integrates the spectra from the lexical, syntactic, and semantic components of text, producing an abstract image that can be processed by both text and image learning algorithms. These components come from feature vectors of the text. To demonstrate the merit of our proposal, we tested it on text classification and reading-complexity score prediction tasks, obtaining promising results.
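
The idea of turning three feature vectors into an image can be illustrated with a minimal sketch. The abstract does not specify how the spectra are computed or combined, so everything here is an assumption made for illustration: the function names `min_max_scale` and `to_image` are hypothetical, the three vectors are assumed to have equal length, and each one is simply min-max scaled and used as one color channel.

```python
def min_max_scale(vector):
    """Scale values to [0, 1]; a constant vector maps to all zeros."""
    lo, hi = min(vector), max(vector)
    if hi == lo:
        return [0.0 for _ in vector]
    return [(v - lo) / (hi - lo) for v in vector]


def to_image(lexical, syntactic, semantic, width):
    """Stack three equal-length feature vectors as the channels of an image.

    Assumption for illustration only: each component becomes one 8-bit
    channel, and the pixels are wrapped into rows of `width` pixels.
    """
    n = len(lexical)
    if not (n == len(syntactic) == len(semantic)):
        raise ValueError("all three feature vectors must have the same length")
    if width <= 0 or n % width != 0:
        raise ValueError("vector length must be a positive multiple of width")
    channels = [min_max_scale(c) for c in (lexical, syntactic, semantic)]
    pixels = [tuple(round(255 * ch[i]) for ch in channels) for i in range(n)]
    return [pixels[start:start + width] for start in range(0, n, width)]


# Toy vectors, not real features from any text.
image = to_image([0, 1, 2, 3], [5, 5, 5, 5], [4, 3, 2, 1], width=2)
```

The resulting nested list has one (lexical, syntactic, semantic) triple per pixel, so it could in principle be passed to an image model, while the unscaled vectors remain usable by a conventional text classifier.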


Author(s):  
C. Thie ◽  
Z. Lock ◽  
D. Smith ◽  
E. Cribb ◽  
A. Ford ◽  
...  

2020 ◽  
Vol 117 (17) ◽  
pp. 9284-9291 ◽  
Author(s):  
Bas Hofstra ◽  
Vivek V. Kulkarni ◽  
Sebastian Munoz-Najar Galvez ◽  
Bryan He ◽  
Dan Jurafsky ◽  
...  

Prior work finds a diversity paradox: Diversity breeds innovation, yet underrepresented groups that diversify organizations have less successful careers within them. Does the diversity paradox hold for scientists as well? We study this by utilizing a near-complete population of ∼1.2 million US doctoral recipients from 1977 to 2015 and following their careers into publishing and faculty positions. We use text analysis and machine learning to answer a series of questions: How do we detect scientific innovations? Are underrepresented groups more likely to generate scientific innovations? And are the innovations of underrepresented groups adopted and rewarded? Our analyses show that underrepresented groups produce higher rates of scientific novelty. However, their novel contributions are devalued and discounted: For example, novel contributions by gender and racial minorities are taken up by other scholars at lower rates than novel contributions by gender and racial majorities, and equally impactful contributions of gender and racial minorities are less likely to result in successful scientific careers than for majority groups. These results suggest there may be unwarranted reproduction of stratification in academic careers that discounts diversity’s role in innovation and partly explains the underrepresentation of some groups in academia.


Author(s):  
A. A. Goncharov ◽  
O. Yu. Inkova

One of the main characteristics of logical-semantic relations (LSRs) between two fragments of a text is that these relations can be either explicit (expressed by some marker, e.g. a connective) or implicit (derived from the interrelation of these fragments’ semantics). Since implicit LSRs do not have any marker, it is difficult to find them in a text (whether automatically or not). In this paper, approaches to analysing implicit LSRs are compared, an original definition for them is offered, and differences between implicit LSRs and LSRs expressed by non-prototypical means are described. A method is proposed to identify implicit LSRs using a parallel corpus and a supracorpora database of connectives. Based on the well-known statement that LSRs can be made explicit by adding connectives in the translation, it is argued here that through selecting pairs in which fragments where a connective is used to express an LSR in the translation correspond to those containing any of the translation stimuli standard for this connective in the source language, it is possible to get an array of contexts in which this LSR is implicit in the source text (or expressed by means other than connectives). This method is then applied to study the French causal connectives car, parce que and puisque using a Russian-French parallel corpus. The corpus data are analysed to obtain information about LSRs, particularly about cases where the causal LSR in Russian is implicit, as well as about the use of causal connectives in French. These results are used to show that the proposed method makes it possible to quickly create a representative array of contexts with implicit LSRs, which can be useful both in text analysis and in machine learning.
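
The selection step can be sketched as a simple filter over aligned fragment pairs. This is a minimal illustration, not the authors' implementation. It assumes that the goal is to keep pairs in which the French translation contains a causal connective while the Russian source contains none of the standard translation stimuli (this reading of the selection criterion is an assumption). The `AlignedPair` type, the `STIMULI` list, and the example sentences are hypothetical, and matching is naive whole-word lookup rather than a query against a connectives database.

```python
import re
from dataclasses import dataclass


@dataclass
class AlignedPair:
    source: str       # Russian fragment
    translation: str  # aligned French fragment


# Connectives named in the abstract, plus their elided forms (qu', puisqu').
CONNECTIVES = ("car", "parce que", "parce qu", "puisque", "puisqu")
# Hypothetical list of standard Russian translation stimuli.
STIMULI = ("потому что", "так как", "поскольку", "ибо")


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def contains_any(text, phrases):
    """True if any phrase occurs in text as a whole-word sequence."""
    padded = " " + normalize(text) + " "
    return any(" " + phrase + " " in padded for phrase in phrases)


def find_implicit_candidates(pairs):
    """Keep pairs with a connective in the translation but no stimulus in the source."""
    return [
        pair for pair in pairs
        if contains_any(pair.translation, CONNECTIVES)
        and not contains_any(pair.source, STIMULI)
    ]


# Invented example pairs, not taken from the corpus used in the paper.
pairs = [
    AlignedPair("Он остался дома, потому что шёл дождь.",
                "Il est resté à la maison parce qu'il pleuvait."),
    AlignedPair("Он остался дома: шёл дождь.",
                "Il est resté à la maison, car il pleuvait."),
    AlignedPair("Шёл дождь.", "Il pleuvait."),
]
candidates = find_implicit_candidates(pairs)
```

Only the second pair is returned here: its French side uses car, while its Russian side expresses the cause without any listed marker. Pairs found this way would still need manual review, since the source may express the relation by means other than connectives.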


2019 ◽  
Vol 8 (2) ◽  
pp. 2038-2040

In digital India, the use of social media such as Twitter, blogs, and various forums is growing rapidly. The volume of data therefore grows day by day, and with data of such high variety and volume, manual analysis would be cumbersome. There is thus an urgent need to process this large amount of data to make it suitable for analysis. As an extensive open-source platform, R has a large and thriving user community that maintains a great number of text analysis packages. In this paper, we analyze movie-related tweets using machine learning in R.

