corpus selection
Recently Published Documents


2020 ◽  
Vol 6 (3) ◽  
pp. 205630512094069
Author(s):  
Janna Joceli Omena ◽  
Elaine Teixeira Rabello ◽  
André Goes Mintz

This article seeks to contribute to the field of digital research by critically accounting for the relationship between hashtags and their forms of grammatization, the platform techno-materialization process of online activity. We approach hashtags as sociotechnical formations that serve social media research not only as criteria in corpus selection but also by displaying the complexity of online engagement and its entanglement with the technicity of web platforms. Therefore, the study of hashtag engagement requires grasping the functioning of the platform itself (technicity) along with the platform grammatization. In this respect, we propose the three-layered (3L) perspective for addressing hashtag engagement. The first layer considers potential differences between high-visibility and ordinary hashtag usage culture, its related actors, and content. The second focuses on hashtagging activity and on the repurposing of the different ways hashtags can be embedded into social media databases. The last layer looks particularly into the images and texts to which hashtags are related. To operationalize the 3L framework, we draw on the case of the “impeachment-cum-coup” of Brazilian president Dilma Rousseff. When cross-read, the three layers add value to one another, also providing different visions of the high-visibility and ordinary groups.


2020 ◽  
Vol 29 (1) ◽  
pp. 19-42 ◽  
Author(s):  
Pablo Barberá ◽  
Amber E. Boydstun ◽  
Suzanna Linn ◽  
Ryan McMahon ◽  
Jonathan Nagler

Automated text analysis methods have made possible the classification of large corpora of text by measures such as topic and tone. Here, we provide a guide to help researchers navigate the consequential decisions they need to make before any measure can be produced from the text. We consider, both theoretically and empirically, the effects of such choices using as a running example efforts to measure the tone of New York Times coverage of the economy. We show that two reasonable approaches to corpus selection yield radically different corpora and we advocate for the use of keyword searches rather than predefined subject categories provided by news archives. We demonstrate the benefits of coding using article segments instead of sentences as units of analysis. We show that, given a fixed number of codings, it is better to increase the number of unique documents coded rather than the number of coders for each document. Finally, we find that supervised machine learning algorithms outperform dictionaries on a number of criteria. Overall, we intend this guide to serve as a reminder to analysts that thoughtfulness and human validation are key to text-as-data methods, particularly in an age when it is all too easy to computationally classify texts without attending to the methodological choices therein.


Oceánide ◽  
2020 ◽  
Vol 13 ◽  
pp. 61-68
Author(s):  
María Jesús Lorenzo Modia ◽  
María Begoña Lasa Álvarez

This article presents a preliminary approach to the study of the images of the New Woman in the publications "The Irish Times" and "The Weekly Irish Times" at the turn of the twentieth century. From the theoretical framework of women’s studies, the concept of the New Woman is analysed in relation to that of New Journalism, which arose at the same time. Additionally, the aetiology and features of the two publications, plus the criteria for corpus selection, are described, and the corpus texts are compared to similar English publications of the period. The complex political situation in Ireland at the turn of the century is also considered. The role of women and the various perceptions of them are analysed, both in the sections of letters to the Editor and in essays. The roles of women in "The Irish Times" and "The Weekly Irish Times" are also compared to those depicted in journals and newspapers addressed to a female readership. The study concludes with excerpts from the two publications in question and an analysis of the contradictory opinions on the lives and roles of women in the nineteenth-century fin de siècle.


Author(s):  
Mubin Shoukat Tamboli ◽  
Rajesh Prasad

Authorship attribution is the task of identifying the writer of an unknown text by assigning it to a known writer. Each author's writing style is distinct and can be used for discrimination. Different parameters account for these stylistic differences. When the writing samples collected for an author belong to a short period, they can contribute efficiently to identifying an unknown sample. This paper considers the author identification problem in which writing samples are not available from the same time period; such evidence is collected over a long period of time. Character n-gram, word n-gram, and PoS n-gram features are used to build the model, as they contribute to a writer's style in terms of both content and the statistical characteristics of the writing. We applied a support vector machine algorithm for classification. The experiments produced effective results. When discriminating among multiple authors, corpus selection and construction were the most tedious tasks, and they were implemented effectively. Accuracy was observed to vary with feature type: word and character n-grams showed better accuracy than PoS n-grams.


2019 ◽  
Vol 4 (1) ◽  
pp. 36-53 ◽  
Author(s):  
Giovanni Colavizza ◽  
Matteo Romanello

Even large citation indexes such as the Web of Science, Scopus or Google Scholar cover only a small fraction of the literature in the humanities. This coverage decreases appreciably going backwards in time. Citation mining of humanities publications, defined as an instance of bibliometric data mining and as a means to the end of building comprehensive citation indexes, remains an open problem. In this contribution we discuss the results of two recent projects in this area: Cited Loci and Linked Books. The former focused on the domain of classics, using journal articles in JSTOR as a corpus; the latter considered the historiography on Venice and a novel corpus of journals and monographs. Both projects attempted to mine citations of all kinds (abbreviated and not, to all types of sources, including primary sources) and considered a wide time span (19th to 21st century). We first discuss the current state of research in citation mining of humanities publications. We then present the various steps involved in this process, from corpus selection to data publication, discussing the peculiarities of the humanities. The approaches taken by the two projects are compared, allowing us to highlight disciplinary differences and commonalities, as well as shared challenges between historiography and classics in this respect. The resulting picture portrays humanities citation mining as a field with great, yet mostly untapped, potential and a few challenges that remain open. The potential lies in using citations as a means to interconnect digitized collections at a large scale, by making explicit the linking function of bibliographic citations. As for the open challenges, a key issue is the need for an integrated metadata infrastructure and an appropriate legal framework to facilitate citation mining in the humanities.

