The (real) need for a human touch: testing a human–machine hybrid topic classification workflow on a New York Times corpus

Author(s): Miklos Sebők, Zoltán Kacsuk, Ákos Máté

The classification of the items of ever-increasing textual databases has become an important goal for a number of research groups active in the field of computational social science. Due to the increased amount of text data, there is a growing number of use cases where the initial effort of human classifiers was successfully augmented using supervised machine learning (SML). In this paper, we investigate such a hybrid workflow solution, classifying the lead paragraphs of New York Times front-page articles from 1996 to 2006 according to the policy topic categories (such as education or defense) of the Comparative Agendas Project (CAP). The SML classification is conducted in multiple rounds, and within each round we run the SML algorithm on n samples, and n times if the given algorithm is non-deterministic (e.g., SVM). If all the SML predictions point towards a single label for a document, then it is classified as such (this approach is also called a “voting ensemble”). In the second step, we explore several scenarios, ranging from using the SML ensemble without human validation to incorporating active learning. Using these scenarios, we can quantify the gains from the various workflow versions. We find that combining human coding and validation with an ensemble SML hybrid approach can reduce the need for human coding while maintaining very high precision rates and offering a modest to good level of recall. The modularity of this hybrid workflow allows for various setups to address the idiosyncratic resource bottlenecks that a large-scale text classification project might face.
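The unanimity rule described above (a document is labeled only when every SML prediction agrees) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The function name `unanimous_vote`, the example topic labels, and the convention of returning `None` for documents without agreement (assumed here to be passed on to human coders) are all hypothetical.

```python
from typing import Hashable, List, Optional, Sequence


def unanimous_vote(
    predictions: Sequence[Sequence[Hashable]],
) -> List[Optional[Hashable]]:
    """Combine several SML runs with a unanimity ("voting ensemble") rule.

    predictions[r][d] is the topic label that run r assigned to document d.
    A document keeps a label only if every run agrees on it; otherwise it is
    marked None (assumed here to mean "pass to a human coder").
    """
    if not predictions:
        raise ValueError("at least one run is required")
    n_docs = len(predictions[0])
    if any(len(run) != n_docs for run in predictions):
        raise ValueError("every run must label the same documents")

    combined: List[Optional[Hashable]] = []
    # zip(*predictions) yields, for each document, the tuple of labels
    # it received across all runs.
    for labels in zip(*predictions):
        first = labels[0]
        combined.append(first if all(label == first for label in labels) else None)
    return combined


# Hypothetical labels from three runs over three documents.
runs = [
    ["education", "defense", "health"],
    ["education", "defense", "macroeconomics"],
    ["education", "health", "health"],
]
labels = unanimous_vote(runs)
print(labels)  # ['education', None, None]
coverage = sum(label is not None for label in labels) / len(labels)
print(f"auto-coded share: {coverage:.2f}")
```

Requiring unanimity means a document is auto-coded only when all runs agree. This trades coverage and recall for precision, which is consistent with the high-precision, modest-to-good-recall results reported in the abstract. The documents left as `None` are where human coding or active learning would come in.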

2020, Vol 29 (1), pp. 19-42
Author(s): Pablo Barberá, Amber E. Boydstun, Suzanna Linn, Ryan McMahon, Jonathan Nagler

Automated text analysis methods have made it possible to classify large corpora of text by measures such as topic and tone. Here, we provide a guide to help researchers navigate the consequential decisions they need to make before any measure can be produced from the text. We consider, both theoretically and empirically, the effects of such choices, using as a running example efforts to measure the tone of New York Times coverage of the economy. We show that two reasonable approaches to corpus selection yield radically different corpora, and we advocate for the use of keyword searches rather than predefined subject categories provided by news archives. We demonstrate the benefits of coding using article segments instead of sentences as units of analysis. We show that, given a fixed number of codings, it is better to increase the number of unique documents coded rather than the number of coders for each document. Finally, we find that supervised machine learning algorithms outperform dictionaries on a number of criteria. Overall, we intend this guide to serve as a reminder to analysts that thoughtfulness and human validation are key to text-as-data methods, particularly in an age when it is all too easy to computationally classify texts without attending to the methodological choices therein.


2020, Vol 20 (1)
Author(s): Lawrence J. Trautman

In November 2018, The New York Times ran a front-page story describing how Facebook concealed knowledge and disclosure of Russian-linked activity and exploitation that resulted in Kremlin-led disruption of the 2016 and 2018 U.S. elections through the use of global hate campaigns and propaganda warfare. By mid-December 2018, it became clear that the Russian efforts leading up to the 2016 U.S. elections were much more extensive than previously thought. Two studies conducted for the United States Senate Select Committee on Intelligence (SSCI), by (1) Oxford University’s Computational Propaganda Project and Graphika, and (2) New Knowledge, provide considerable new information and analysis about the Russian Internet Research Agency (IRA) influence operations targeting American citizens. By early 2019, it became apparent that a number of influential and successful high-growth social media platforms had been used by nation-states for propaganda purposes. Over two years earlier, Russia had been called out by the U.S. intelligence community for its meddling in the 2016 American presidential elections. The extent to which prominent social media platforms have been used, either willingly or without their knowledge, by foreign powers continues to be investigated as this Article goes to press. Reporting by The New York Times suggests that it was not until the Facebook board meeting held on September 6, 2017 that the chairman of the board’s audit committee, Erskine Bowles, became aware of Facebook’s internal knowledge of the extent to which Russian operatives had used the Facebook and Instagram platforms for influence campaigns in the United States. As this Article goes to press, the degree to which the allure of advertising revenue blinded Facebook to its complicit role in offering the highest bidder access to Facebook users is not yet fully known.
This Article cannot be a complete chapter on the corporate governance challenge of managing, monitoring, and overseeing individual privacy issues and content integrity on prominent social media platforms. The full extent of Facebook’s experience is just now becoming known, with new revelations yet to come. All interested parties (Facebook users, shareholders, Facebook’s board of directors, government regulatory agencies such as the Federal Trade Commission (FTC) and the Securities and Exchange Commission (SEC), and Congress) must now figure out what has transpired and what to do about it. These and other revelations have resulted in a crisis for Facebook. American democracy has been and continues to be under attack. This Article contributes to the literature by providing background and an account of what is known to date, and it offers recommendations for corrective action.


Author(s): Vivian Johnson

In a letter to the editor of the New York Times, Mark Peck (May 6, 2007), a 10th-grade student, notes “it’s too bad that students have to take the rap for old-style teachers who are still not comfortable with the computer as an educational tool” (p. A22). Mark’s comment was in response to a front-page article that highlighted how little substantive change had occurred in the learning environments of schools that instituted laptop programs. In succinct terms, Mark identifies a major barrier to meaningful adoption of new technologies by stating that “computer-based learning initiatives are not going to take off until teachers are just as excited about them as their students” (p. A22). Mark’s experience as a learner is echoed in a recent report (Education Week, 2007).


2020
Author(s): Sinan Aral, Paramveer S. Dhillon

Most online content publishers have moved to subscription-based business models regulated by digital paywalls. But the managerial implications of such freemium content offerings are not well understood. We, therefore, utilized microlevel user activity data from the New York Times to conduct a large-scale study of the implications of digital paywall design for publishers. Specifically, we use a quasi-experiment that varied the (1) quantity (the number of free articles) and (2) exclusivity (the number of available sections) of free content available through the paywall to investigate the effects of paywall design on content demand, subscriptions, and total revenue. The paywall policy changes we studied suppressed total content demand by about 9.9%, reducing total advertising revenue. However, this decrease was more than offset by increased subscription revenue as the policy change led to a 31% increase in total subscriptions during our seven-month study, yielding net positive revenues of over $230,000. The results confirm an economically significant impact of the newspaper’s paywall design on content demand, subscriptions, and net revenue. Our findings can help structure the scientific discussion about digital paywall design and help managers optimize digital paywalls to maximize readership, revenue, and profit. This paper was accepted by Chris Forman, information systems.


2017, Vol 51 (1), pp. 22-26
Author(s): James L. Gelvin

I want to kick off this discussion with three quotes and a statistic. The first quote is as follows: “The chief purpose [of historical education] is not to fill [someone's] head with a mass of material which he may perhaps put forward again when a college examiner demands its production.” The second—a line from a front page story in The New York Times—reads, “College freshmen throughout the nation reveal a striking ignorance of even the most elementary aspects of United States history.” And the third: We have descended into what some consider the dark age of declining enrollments, professional unemployment, and a growing rejection of history by many students who seem to agree with Henry Ford that history is “bunk.” If we are going to have any real impact on individuals or society, we must do something besides just cover the material. Finally, the statistic: in eight years alone, the number of students majoring in history dropped 40 percent.


1988, Vol 9 (3), pp. 1-9
Author(s): Paul Martin Lester

Mug shots from five U.S. newspapers (USA Today, Chicago Tribune, New Orleans Times-Picayune, New York Times and Los Angeles Times) were analyzed for the same five-day work week of each month in 1986. The 300 front pages yielded 520 head shots among 1,148 photographs. USA Today and newspapers with a similar graphic style use more mug shots without an accompanying article on the front page than more traditionally designed newspapers do.
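The count of 300 front pages follows directly from the sampling design stated above: five newspapers, five weekdays per sampled week, and one sampled week in each of the twelve months of 1986.

```latex
\underbrace{5}_{\text{newspapers}} \times \underbrace{5}_{\text{days per week}} \times \underbrace{12}_{\text{months}} = 300 \ \text{front pages}
```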


1995, Vol 72 (4), pp. 841-850
Author(s): William J. Hughes

Scholars and political actors generally believe that presidents enjoy a period of sanguine rapport with the press gallery during a honeymoon of about two months at the beginning of each new administration. The honeymoon is characterized by a minimum of hostile questions from reporters and relatively gentle media treatment of the new president. However, this content analysis of front-page headlines in the New York Times during the first 100 days of the Eisenhower, Kennedy, Nixon, Carter, Reagan, and Clinton administrations suggests that not all honeymoons are equal.


2018, Vol 7 (3), pp. 337-360
Author(s): Bryan E. Denham

When it publishes a major investigative report or exposé, a prominent news organization can transfer the salience of both an issue and its attributes to other news outlets. Major investigations can also affect how reporters in the same outlet think about an issue in the news. The present study examines intramedia and intermedia agenda-setting effects in the context of sport, drawing on allegations of a state-sponsored doping program in Russian athletics. In May 2016, Dr. Grigory Rodchenkov, a former doping official in Russia, described the program to reporters at the New York Times, and the ensuing front-page story impacted coverage both internally and externally. The current study considers the implications of these effects for sports journalism and individual athletes.

