corpus selection Latest Research Papers

Corpus selection and investigation

Dance Lexicon in Shakespeare and His Contemporaries ◽

10.4324/9781003087687-8-11 ◽

2021 ◽

pp. 64-72

Author(s):

Fabio Ciambella

Keyword(s):

Corpus Selection

Digital Methods for Hashtag Engagement Research

Social Media + Society ◽

10.1177/2056305120940697 ◽

2020 ◽

Vol 6 (3) ◽

pp. 205630512094069

Author(s):

Janna Joceli Omena ◽

Elaine Teixeira Rabello ◽

André Goes Mintz

Keyword(s):

Social Media ◽

Online Activity ◽

Online Engagement ◽

Digital Methods ◽

Media Research ◽

Social Media Research ◽

Corpus Selection ◽

The Relationship ◽

Digital Research

This article seeks to contribute to the field of digital research by critically accounting for the relationship between hashtags and their forms of grammatization (the platform techno-materialization process of online activity). We approach hashtags as sociotechnical formations that serve social media research not only as criteria for corpus selection but also as displays of the complexity of online engagement and its entanglement with the technicity of web platforms. Therefore, the study of hashtag engagement requires a grasp of the functioning of the platform itself (technicity) along with the platform grammatization. In this respect, we propose the three-layered (3L) perspective for addressing hashtag engagement. The first layer contemplates potential differences between high-visibility and ordinary hashtag usage culture, its related actors, and content. The second focuses on hashtagging activity and the repurposing of how hashtags can be differently embedded into social media databases. The last layer looks particularly into the images and texts to which hashtags are related. To operationalize the 3L framework, we draw on the case of the “impeachment-cum-coup” of Brazilian president Dilma Rousseff. When cross-read, the three layers add value to one another and also provide different views of the high-visibility and ordinary groups.

Automated Text Classification of News Articles: A Practical Guide

Political Analysis ◽

10.1017/pan.2020.8 ◽

2020 ◽

Vol 29 (1) ◽

pp. 19-42 ◽

Cited By ~ 1

Author(s):

Pablo Barberá ◽

Amber E. Boydstun ◽

Suzanna Linn ◽

Ryan McMahon ◽

Jonathan Nagler

Keyword(s):

New York ◽

New York Times ◽

Fixed Number ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Methodological Choices ◽

Units Of Analysis ◽

Human Validation ◽

Corpus Selection

Automated text analysis methods have made possible the classification of large corpora of text by measures such as topic and tone. Here, we provide a guide to help researchers navigate the consequential decisions they need to make before any measure can be produced from the text. We consider, both theoretically and empirically, the effects of such choices using as a running example efforts to measure the tone of New York Times coverage of the economy. We show that two reasonable approaches to corpus selection yield radically different corpora and we advocate for the use of keyword searches rather than predefined subject categories provided by news archives. We demonstrate the benefits of coding using article segments instead of sentences as units of analysis. We show that, given a fixed number of codings, it is better to increase the number of unique documents coded rather than the number of coders for each document. Finally, we find that supervised machine learning algorithms outperform dictionaries on a number of criteria. Overall, we intend this guide to serve as a reminder to analysts that thoughtfulness and human validation are key to text-as-data methods, particularly in an age when it is all too easy to computationally classify texts without attending to the methodological choices therein.
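The contrast between the two corpus selection approaches named in this abstract can be illustrated with a minimal sketch. Everything here is a hypothetical illustration, not the authors' procedure: the article records, the keyword list, the subject labels, and the helper names `select_by_keywords` and `select_by_subject` are all assumptions made for the example.

```python
import re

def select_by_keywords(articles, keywords):
    """Keep articles whose text contains at least one keyword (whole word, case-insensitive)."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [a for a in articles if pattern.search(a["text"])]

def select_by_subject(articles, subjects):
    """Keep articles tagged with at least one of the given archive subject categories."""
    wanted = {s.lower() for s in subjects}
    return [a for a in articles if wanted & {s.lower() for s in a["subjects"]}]

# Toy records standing in for a news archive (invented for illustration only).
articles = [
    {"id": 1, "text": "Unemployment fell as the economy grew.", "subjects": ["Economy"]},
    {"id": 2, "text": "The recession hit local retailers hard.", "subjects": ["Business"]},
    {"id": 3, "text": "The team won the final.", "subjects": ["Sports"]},
    {"id": 4, "text": "A profile of the new museum director.", "subjects": ["Economy"]},
]

keyword_ids = [a["id"] for a in select_by_keywords(articles, ["economy", "recession", "unemployment"])]
subject_ids = [a["id"] for a in select_by_subject(articles, ["Economy"])]

# Articles that both selection strategies agree on.
overlap = sorted(set(keyword_ids) & set(subject_ids))

print("keyword search:", keyword_ids)
print("subject category:", subject_ids)
print("overlap:", overlap)
```

On this toy data the two strategies share only one article, which mirrors in miniature how two reasonable selection rules can produce quite different corpora.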

Representations of the New Woman in "The Irish Times" and "The Weekly Irish Times". A Preliminary Approach

Oceánide ◽

10.37668/oceanide.v13i.41 ◽

2020 ◽

Vol 13 ◽

pp. 61-68

Author(s):

María Jesús Lorenzo Modia ◽

María Begoña Lasa Álvarez

Keyword(s):

Nineteenth Century ◽

Twentieth Century ◽

New Woman ◽

Turn Of The Century ◽

New Journalism ◽

Letters To The Editor ◽

Political Situation ◽

The New Woman ◽

Corpus Selection

This article presents a preliminary approach to the study of the images of the New Woman in the publications "The Irish Times" and "The Weekly Irish Times" at the turn of the twentieth century. From the theoretical framework of women’s studies the concept of New Woman is analysed in relation to that of New Journalism, which arose at the same time. Additionally, the aetiology and features of the two publications, plus the criteria for corpus selection, are described, and the corpus texts are compared to similar English publications of the period. The complex political situation in Ireland at the turn of the century is also considered. The role of women and the various perceptions of them are analysed, both in the sections of letters to the Editor and in essays. The roles of women in "The Irish Times" and "The Weekly Irish Times" are also compared to those depicted in journals and newspapers addressed to a female readership. The study concludes with excerpts of the two publications in question and the analysis of the contradictory opinions on the lives and roles of women in the nineteenth-century fin de siècle.

Update Frequency and Background Corpus Selection in Dynamic TF-IDF Models for First Story Detection

Communications in Computer and Information Science - Computational Linguistics ◽

10.1007/978-981-15-6168-9_18 ◽

2020 ◽

pp. 206-217

Author(s):

Fei Wang ◽

Robert J. Ross ◽

John D. Kelleher

Keyword(s):

Corpus Selection ◽

Update Frequency ◽

Story Detection

A robust authorship attribution on big period

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v9i4.pp3167-3174 ◽

2019 ◽

Vol 9 (4) ◽

pp. 3167 ◽

Cited By ~ 1

Author(s):

Mubin Shoukat Tamboli ◽

Rajesh Prasad

Keyword(s):

Identification Problem ◽

Authorship Attribution ◽

Support Vector ◽

Writing Style ◽

Author Identification ◽

Time Period ◽

N Gram ◽

Corpus Selection ◽

Writing Sample ◽

Small Period

Authorship attribution is the task of identifying the writer of an unknown text and assigning it to a known writer. Each author's writing style is distinct and can be used for discrimination. Different parameters are responsible for capturing changes in that style. When the writing samples collected for an author belong to a short period, they contribute effectively to identifying an unknown sample. This paper considers the author identification problem in which writing samples from the same time period are not available, and the evidence is instead collected over a long period of time. Character n-gram, word n-gram, and PoS n-gram features are used to build the model, since they capture the writer's style in terms of both content and the statistical characteristics of the writing. We applied a support vector machine algorithm for classification, and the experiments produced effective results. When discriminating among multiple authors, corpus selection and construction were the most tedious tasks, and they were carried out effectively. Accuracy varied by feature type: word and character n-grams showed better accuracy than PoS n-grams.
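As a rough illustration of the character and word n-gram features mentioned in this abstract, the sketch below counts n-grams with the Python standard library. It is not the authors' pipeline: the function names and the sample sentence are hypothetical, PoS n-grams are left out because they need a tagger, and the support vector machine step is omitted.

```python
from collections import Counter

def char_ngrams(text, n):
    """Count overlapping character n-grams in lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def word_ngrams(text, n):
    """Count overlapping word n-grams (as tuples) after whitespace tokenization."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

# Invented sample text, used only to show the shape of the features.
sample = "the cat sat on the mat"
char_counts = char_ngrams(sample, 3)
word_counts = word_ngrams(sample, 2)

print(char_counts.most_common(3))
print(word_counts.most_common(3))
```

Count vectors like these, built per writing sample, are the kind of input a classifier could be trained on.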

Citation Mining of Humanities Journals: The Progress to Date and the Challenges Ahead

Journal of European Periodical Studies ◽

10.21825/jeps.v4i1.10120 ◽

2019 ◽

Vol 4 (1) ◽

pp. 36-53 ◽

Cited By ~ 1

Author(s):

Giovanni Colavizza ◽

Matteo Romanello

Keyword(s):

Large Scale ◽

Web Of Science ◽

Legal Framework ◽

Time Span ◽

Primary Sources ◽

Journal Articles ◽

Current State ◽

Citation Indexes ◽

State Of Research ◽

Corpus Selection

Even large citation indexes such as the Web of Science, Scopus or Google Scholar cover only a small fraction of the literature in the humanities. This coverage decreases markedly going backwards in time. Citation mining of humanities publications, defined as an instance of bibliometric data mining and as a means to the end of building comprehensive citation indexes, remains an open problem. In this contribution we discuss the results of two recent projects in this area: Cited Loci and Linked Books. The former focused on the domain of classics, using journal articles in JSTOR as a corpus; the latter considered the historiography on Venice and a novel corpus of journals and monographs. Both projects attempted to mine citations of all kinds (abbreviated and not, to all types of sources, including primary sources) and considered a wide time span (19th to 21st century). We first discuss the current state of research in citation mining of humanities publications. We then present the various steps involved in this process, from corpus selection to data publication, discussing the peculiarities of the humanities. The approaches taken by the two projects are compared, allowing us to highlight disciplinary differences and commonalities, as well as shared challenges between historiography and classics in this respect. The resulting picture portrays humanities citation mining as a field with great, yet mostly untapped, potential and a few still open challenges. The potential lies in using citations as a means to interconnect digitized collections at a large scale, by making explicit the linking function of bibliographic citations. As for the open challenges, a key issue is the existing need for an integrated metadata infrastructure and an appropriate legal framework to facilitate citation mining in the humanities.

Some Problems in the Corpus Selection of Chinese Grammar in Spring and Autumn Period

Proceedings of the 2nd International Conference on Culture, Education and Economic Development of Modern Society (ICCESE 2018) ◽

10.2991/iccese-18.2018.104 ◽

2018 ◽

Author(s):

Weiming Peng ◽

Jianya Zhang

Keyword(s):

Autumn Period ◽

Spring And Autumn Period ◽

Corpus Selection ◽

Selection Of

Corpus Selection Approaches for Multilingual Parsing from Raw Text to Universal Dependencies

10.18653/v1/k17-3021 ◽

2017 ◽

Author(s):

Ryan Hornby ◽

Clark Taylor ◽

Jungyeul Park

Keyword(s):

Corpus Selection

Chapter 10: Corpus selection and design

Making and Using Word Lists for Language Learning and Testing ◽

10.1075/z.208.10ch10 ◽

2016 ◽

pp. 95-105 ◽

Cited By ~ 1

Author(s):

I.S.P. Nation ◽

Sorell

Keyword(s):

Corpus Selection
