Summaries

Damian Trilling & Jelle Boumans Automated analysis of Dutch language-based texts. An overview and research agenda While automated methods of content analysis are increasingly popular in today’s communication research, these methods have hardly been adopted by communication scholars studying texts in Dutch. This essay offers an overview of the possibilities and current limitations of automated text analysis approaches in the context of the Dutch language. Particularly in dictionary-based approaches, research is far less prolific as research on the English language. We divide the most common types of content-analytical research questions into three categories: 1) research problems for which automated methods ought to be used, 2) research problems for which automated methods could be used, and 3) research problems for which automated methods (currently) cannot be used. Finally, we give suggestions for the advancement of automated text analysis approaches for Dutch texts. Keywords: automated content analysis, Dutch, dictionaries, supervised machine learning, unsupervised machine learning

Download Full-text

Using Supervised Machine Learning in Automated Content Analysis: An Example Using Relational Uncertainty

Communication Methods and Measures ◽

10.1080/19312458.2019.1650166 ◽

2019 ◽

Vol 13 (4) ◽

pp. 287-304 ◽

Cited By ~ 2

Author(s):

Andrew Pilny ◽

Kelly McAninch ◽

Amanda Slone ◽

Kelsey Moore

Keyword(s):

Machine Learning ◽

Content Analysis ◽

Supervised Machine Learning ◽

Relational Uncertainty ◽

Automated Content Analysis

Download Full-text

CytoCensus, mapping cell identity and division in tissues and organs using machine learning

eLife ◽

10.7554/elife.51085 ◽

2020 ◽

Vol 9 ◽

Author(s):

Martin Hailstone ◽

Dominic Waithe ◽

Tamsin J Samuels ◽

Lu Yang ◽

Ita Costello ◽

...

Keyword(s):

Machine Learning ◽

Automated Analysis ◽

Time Lapse ◽

Supervised Machine Learning ◽

Cell Detection ◽

Automated Identification ◽

User Training ◽

General Utility ◽

Multiple Cell ◽

Mutant Phenotypes

A major challenge in cell and developmental biology is the automated identification and quantitation of cells in complex multilayered tissues. We developed CytoCensus: an easily deployed implementation of supervised machine learning that extends convenient 2D ‘point-and-click’ user training to 3D detection of cells in challenging datasets with ill-defined cell boundaries. In tests on such datasets, CytoCensus outperforms other freely available image analysis software in accuracy and speed of cell detection. We used CytoCensus to count stem cells and their progeny, and to quantify individual cell divisions from time-lapse movies of explanted Drosophila larval brains, comparing wild-type and mutant phenotypes. We further illustrate the general utility and future potential of CytoCensus by analysing the 3D organisation of multiple cell classes in Zebrafish retinal organoids and cell distributions in mouse embryos. CytoCensus opens the possibility of straightforward and robust automated analysis of developmental phenotypes in complex tissues.

Download Full-text

Authorship Identification Using Supervised Learning and n-Grams for Hindi Language

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9058 ◽

2020 ◽

Vol 17 (9) ◽

pp. 4258-4261

Author(s):

Jagadish S. Kallimani ◽

C. P. Chandrika ◽

Aniket Singh ◽

Zaifa Khan

Keyword(s):

Machine Learning ◽

English Language ◽

Supervised Machine Learning ◽

Bag Of Words ◽

Learning Models ◽

The Rich ◽

Authorship Identification ◽

Regional Languages ◽

Authorship Analysis ◽

Machine Learning Models

Authorship Identification pertains to establishing the author of a particular document, currently unknown, based on the documents previously available. The field of authorship identification has been explored so far primarily in the English language, using several supervised and unsupervised machine learning models along with usage of NLP techniques, but work on regional languages is highly limited. This may be due to the lack of collection of proper datasets and preprocessing techniques attributed to the rich morphological and stylistic features in these languages. In this paper we apply some supervised machine learning models, namely SVM and Naïve Bayes to Hindi literature to perform authorship analysis by picking four Hindi authors. We compare and analyze the accuracy which is so obtained using different models and bag of words approach.

Download Full-text

Thematic content analysis using supervised machine learning: An empirical evaluation using German online news

Quality & Quantity ◽

10.1007/s11135-011-9545-7 ◽

2011 ◽

Vol 47 (2) ◽

pp. 761-773 ◽

Cited By ~ 49

Author(s):

Michael Scharkow

Keyword(s):

Machine Learning ◽

Content Analysis ◽

Empirical Evaluation ◽

Online News ◽

Supervised Machine Learning ◽

Thematic Content Analysis ◽

Thematic Content

Download Full-text

Lifespan development in the academy of American poets

Scientific Study of Literature ◽

10.1075/ssol.5.1.04tho ◽

2015 ◽

Vol 5 (1) ◽

pp. 83-98 ◽

Cited By ~ 2

Author(s):

David E. Thomson

Keyword(s):

Multilevel Analysis ◽

Expressive Writing ◽

Text Analysis ◽

English Language ◽

Literary Analysis ◽

Lifespan Development ◽

Past Tense ◽

Significant Relationships ◽

Automated Text Analysis ◽

American Poets

The present study investigated lifespan writing tendencies among members of the Academy of American Poets (N = 411). All original English language poems (N = 2,558) available on the Academy website during 2013 were included provided that each poet was represented by at least two poems. Correlations of the age in which each poet published each poem with established indicators of lifespan development were small to moderate (r’s from -.11 to .16). Contrary to lifespan development for expository and emotionally expressive writing, poets tended to employ past tense and use less emotionally valenced language as they aged. Multilevel analysis revealed no significant relationships between publishing age and maturation outcomes, although that process did indicate various curvilinear relations. I conclude by discussing the implications of automated text analysis on literary analysis of career development.

Download Full-text

Nationality swapping in the Olympic Games 1978–2017: A supervised machine learning approach to analysing discourses of citizenship and nationhood

International Review for the Sociology of Sport ◽

10.1177/1012690218773969 ◽

2018 ◽

Vol 54 (8) ◽

pp. 971-988

Author(s):

Joost Jansen

Keyword(s):

Machine Learning ◽

English Language ◽

State Of The Art ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Olympic Athletes ◽

The Past ◽

Learning Techniques ◽

Machine Learning Approach ◽

Media Reports

While the practice of nationality swapping in sports traces back as far as the Ancient Olympics, it seems to have increased over the past decades. Cases of Olympic athletes who switched their national allegiances are often surrounded with controversy. Two strands of thought could help explain this controversy. First, these cases are believed to be indicative of the marketisation of citizenship. Second, these cases challenge established discourses of national identity as the question ‘who may represent the nation?’ becomes contested. Using state-of-the-art machine learning techniques, I analysed 1534 English language newspaper articles about Olympic athletes who changed their nationalities (1978–2017). The results indicate: (i) that switching national allegiance has not necessarily become more controversial; (ii) that most media reports do not frame nationality switching in economic terms; and (iii) that nationality swapping often occurs fairly unnoticed. I therefore conclude that a marketisation of citizenship is less apparent in nationality switching than some claim. Moreover, nationality switches are often mentioned rather casually, indicating the generally banal character of nationalism. Only under certain conditions does ‘hot’ nationalism spark the issue of nationhood.

Download Full-text

Translation strategies for tourist advertising sites

LAPLAGE EM REVISTA ◽

10.24115/s2446-6220202173d1751p.609-622 ◽

2021 ◽

Vol 7 (3D) ◽

pp. 609-622

Author(s):

Assel S. Tukhtabayeva ◽

Bagila A. Akhatova ◽

Raigul S. Malikova ◽

Emma M. Howes

Keyword(s):

Content Analysis ◽

Text Analysis ◽

English Language ◽

Translation Strategies ◽

Continuous Sampling ◽

Research Material ◽

Translation Methods

This article is devoted to the study of modern strategies for translating tourist-oriented Internet texts. Misuse or omission of expressive means in translated texts may lead to distortion of information and loss of the reader’s interest. Therefore, the aim of the research is to study translation methods and strategies making quality translation of tourist texts possible. The research material comprised tourist texts presented on English-language tourist sites. The following methods were used to reach the objectives: content analysis; the method of continuous sampling, in which the material necessary for the study was collected; method of linguistic and stylistic text analysis. In the course of the analysis, the authors identified the most frequent methods of broadcasting realia: descriptive translation; tracking; transcription; syntactic assimilation.

Download Full-text

Supervised Machine Learning for Text Analysis in R

10.1201/9781003093459 ◽

2021 ◽

Author(s):

Emil Hvitfeldt ◽

Julia Silge

Keyword(s):

Machine Learning ◽

Text Analysis ◽

Supervised Machine Learning

Download Full-text

Uncivil and personal? Comparing patterns of incivility in comments on the Facebook pages of news outlets

New Media & Society ◽

10.1177/1461444818757205 ◽

2018 ◽

Vol 20 (10) ◽

pp. 3678-3699 ◽

Cited By ~ 26

Author(s):

Leona Yi-Fan Su ◽

Michael A Xenos ◽

Kathleen M Rose ◽

Christopher Wirz ◽

Dietram A Scheufele ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Content Analysis ◽

Empirical Evidence ◽

News Media ◽

General Type ◽

Supervised Machine Learning ◽

The Social ◽

Facebook Pages ◽

National Local

Social media and its embedded user commentary are playing increasingly influential roles in the news process. However, researchers’ understanding of the social media commenting environment remains limited, despite rising concerns over uncivil comments. Accordingly, this study used a supervised machine learning–based method of content analysis to examine the extent and patterns of incivility in the comment sections of 42 US news outlets’ Facebook pages over an 18-month period in 2015–2016. These outlets were selected as being broadly representative of national, local, conservative, and liberal-news media. The findings provide the first empirical evidence that both the level and the targets of incivility in the comments posted on news outlets’ Facebook pages vary greatly according to such entities’ general type and ideological stance.

Download Full-text

Sentiment/tone (Automated Content Analysis)

DOCA - Database of Variables for Content Analysis ◽

10.34778/1d ◽

2021 ◽

Author(s):

Valerie Hase

Keyword(s):

Machine Learning ◽

Content Analysis ◽

Sentiment Analysis ◽

Supervised Machine Learning ◽

Data Sets ◽

Financial News ◽

State Of The Union ◽

Automated Content Analysis ◽

Course Material ◽

Newspaper Articles

Sentiment/tone describes the way issues or specific actors are described in coverage. Many analyses differentiate between negative, neutral/balanced or positive sentiment/tone as broader categories, but analyses might also measure expressions of incivility, fear, or happiness, for example, as more granular types of sentiment/tone. Analyses can detect sentiment/tone in full texts (e.g., general sentiment in financial news) or concerning specific issues (e.g., specific sentiment towards the stock market in financial news or a specific actor). The datasets referred to in the table are described in the following paragraph: Puschmann (2019) uses four data sets to demonstrate how sentiment/tone may be analyzed by the computer. Using Sherlock Holmes stories (18th century, N = 12), tweets (2016, N = 18,826), Swiss newspaper articles (2007-2012, N = 21,280), and debate transcripts (2013-2017, N = 205,584), he illustrates how dictionaries may be applied for such a task. Rauh (2019) uses three data sets to validate his organic German language dictionary for sentiment/tone. His data consists of sentences from German parliament speeches (1991-2013, N = 1,500), German-language quasi-sentences from German, Austrian and Swiss party manifestos (1998-2013, N = 14,008) and newspaper, journal and news wire articles (2011-2012, N = 4,038). Silge and Robinson (2020) use six Jane Austen novels to demonstrate how dictionaries may be used for sentiment analysis. Van Atteveldt and Welbers (2020) use state of the Union speeches (1789-2017, N = 58) for the same purpose. The same authors (van Atteveldt & Welbers, 2019) show based on a dataset of N = 2,000 movie reviews how supervised machine learning might also do the trick. In their Quanteda tutorials, Watanabe and Müller (2019) demonstrate the use of dictionaries and supervised machine learning for sentiment analysis on UK newspaper articles (2012-2016, N = 6,000) as well as the same set of movie reviews (n = 2,000). Lastly, Wiedemann and Niekler (2017) use state of the Union speeches (1790-2017, N = 233) to demonstrate how sentiment/tone can be coded automatically via a dictionary approach. Field of application/theoretical foundation: Related to theories of “Framing” and “Bias” in coverage, many analyses are concerned with the way the news evaluates and interprets specific issues and actors. References/combination with other methods of data collection: Manual coding is needed for many automated analyses, including the ones concerned with sentiment. Studies for example use manual content analysis to develop dictionaries, to create training sets on which algorithms used for automated classification are trained, or to validate the results of automated analyses (Song et al., 2020). Table 1. Measurement of “Sentiment/Tone” using automated content analysis. Author(s) Sample Procedure Formal validity check with manual coding as benchmark* Code Puschmann (2019) (a) Sherlock Holmes stories (b) Tweets (c) Swiss newspaper articles (d) German Parliament transcripts Dictionary approach Not reported http://inhaltsanalyse-mit-r.de/sentiment.html Rauh (2018) (a) Bundestag speeches (b) Quasi-sentences from German, Austrian and Swiss party manifestos (c) Newspapers, journals, agency reports Dictionary approach Reported https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/BKBXWD Silge & Robinson (2020) Books by Jane Austen Dictionary approach Not reported https://www.tidytextmining.com/sentiment.html van Atteveldt & Welbers (2020) State of the Union speeches Dictionary approach Reported https://github.com/ccs-amsterdam/r-course-material/blob/master/tutorials/sentiment_analysis.md van Atteveldt & Welbers (2019) Movie reviews Supervised Machine Learning Approach Reported https://github.com/ccs-amsterdam/r-course-material/blob/master/tutorials/r_text_ml.md Watanabe & Müller (2019) Newspaper articles Dictionary approach Not reported https://tutorials.quanteda.io/advanced-operations/targeted-dictionary-analysis/ Watanabe & Müller (2019) Movie reviews Supervised Machine Learning Approach Reported https://tutorials.quanteda.io/machine-learning/nb/ Wiedemann & Niekler (2017) State of the Union speeches Dictionary approach Not reported https://tm4ss.github.io/docs/Tutorial_3_Frequency.html *Please note that many of the sources listed here are tutorials on how to conducted automated analyses – and therefore not focused on the validation of results. Readers should simply read this column as an indication in terms of which sources they can refer to if they are interested in the validation of results. References Puschmann, C. (2019). Automatisierte Inhaltsanalyse mit R. Retrieved from http://inhaltsanalyse-mit-r.de/index.html Rauh, C. (2018). Validating a sentiment dictionary for German political language—A workbench note. Journal of Information Technology & Politics, 15(4), 319–343. doi:10.1080/19331681.2018.1485608 Silge, J., & Robinson, D. (2020). Text mining with R. A tidy approach. Retrieved from https://www.tidytextmining.com/ Song, H., Tolochko, P., Eberl, J.-M., Eisele, O., Greussing, E., Heidenreich, T., Lind, F., Galyga, S., & Boomgaarden, H.G. (2020) In validations we trust? The impact of imperfect human annotations as a gold standard on the quality of validation of automated content analysis. Political Communication, 37(4), 550-572. van Atteveldt, W., & Welbers, K. (2019). Supervised Text Classification. Retrieved from https://github.com/ccs-amsterdam/r-course-material/blob/master/tutorials/r_text_ml.md van Atteveldt, W., & Welbers, K. (2020). Supervised Sentiment Analysis in R. Retrieved from https://github.com/ccs-amsterdam/r-course-material/blob/master/tutorials/sentiment_analysis.md Watanabe, K., & Müller, S. (2019). Quanteda tutorials. Retrieved from https://tutorials.quanteda.io/ Wiedemann, G., Niekler, A. (2017). Hands-on: a five day text mining course for humanists and social scientists in R. Proceedings of the 1st Workshop Teaching NLP for Digital Humanities (Teach4DH@GSCL 2017), Berlin. Retrieved from https://tm4ss.github.io/docs/index.html

Download Full-text