Constructing Narrative Event Evolutionary Graph for Script Event Prediction

Author(s):  
Zhongyang Li ◽  
Xiao Ding ◽  
Ting Liu

Script event prediction requires a model to predict the subsequent event given an existing event context. Previous models based on event pairs or event chains cannot make full use of dense event connections, which may limit their predictive capability. To remedy this, we propose constructing an event graph to better utilize event network information for script event prediction. In particular, we first extract narrative event chains from a large news corpus, and then construct a narrative event evolutionary graph (NEEG) based on the extracted chains. The NEEG can be seen as a knowledge base that describes event evolutionary principles and patterns. To solve the inference problem on the NEEG, we present a scaled graph neural network (SGNN) to model event interactions and learn better event representations. Instead of computing representations over the whole graph, SGNN processes only the concerned nodes each time, which makes our model feasible for large-scale graphs. By comparing the similarity between input context event representations and candidate event representations, we can choose the most reasonable subsequent event. Experimental results on the widely used New York Times corpus demonstrate that our model significantly outperforms state-of-the-art baselines under the standard multiple-choice narrative cloze evaluation.
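To make the subgraph idea concrete, here is a minimal sketch (not the authors' SGNN; the embeddings, adjacency matrix, single message-passing step, and cosine scoring are all illustrative assumptions) of restricting computation to the concerned nodes:

```python
# A toy sketch of subgraph-restricted scoring: message passing runs only on
# the subgraph induced by context and candidate events, not the whole NEEG.
# All names, sizes, and the single-step update are illustrative assumptions.
import numpy as np

def message_pass(h, adj, w):
    """One message-passing step on a small subgraph."""
    agg = adj @ h                     # sum the states of neighbors
    return np.tanh((h + agg) @ w)    # combine with each node's own state

def score_candidates(context_ids, candidate_ids, embeddings, adj, w):
    nodes = context_ids + candidate_ids           # only the concerned nodes
    sub_adj = adj[np.ix_(nodes, nodes)]           # induced subgraph
    h = message_pass(embeddings[nodes], sub_adj, w)
    ctx = h[:len(context_ids)].mean(axis=0)       # context representation
    cands = h[len(context_ids):]                  # candidate representations
    # cosine similarity between the context and each candidate event
    return cands @ ctx / (np.linalg.norm(cands, axis=1)
                          * np.linalg.norm(ctx) + 1e-9)

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 16))                  # toy graph of 100 events
adj = (rng.random((100, 100)) < 0.05).astype(float)
w = 0.1 * rng.normal(size=(16, 16))
print(score_candidates([1, 2, 3], [10, 11], emb, adj, w))
```

The highest-scoring candidate would then be chosen as the predicted subsequent event.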

Author(s):  
Shangwen Lv ◽  
Wanhui Qian ◽  
Longtao Huang ◽  
Jizhong Han ◽  
Songlin Hu

Scripts represent knowledge of event sequences that can help text understanding. Script event prediction requires measuring the relation between an existing chain and the subsequent event. The dominant approaches focus either on the effects of individual events or on the influence of the chain sequence. However, considering only individual events loses many semantic relations within the event chain, while considering only the sequence of the chain introduces much noise. We observe that both the individual events and the event segments within the chain can facilitate prediction of the subsequent event. This paper develops a self-attention mechanism to focus on diverse event segments within the chain, representing the event chain as a set of event segments. We utilize event-level attention to model the relations between subsequent events and individual events. Then, we propose chain-level attention to model the relations between subsequent events and event segments within the chain. Finally, we integrate event-level and chain-level attention to interact with the chain and predict what happens next. Comprehensive experimental results on the widely used New York Times corpus demonstrate that our model achieves better results than state-of-the-art baselines on the Multi-Choice Narrative Cloze evaluation task.
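The integration of the two attention levels might be sketched as follows; dot-product attention, mean-pooled segments of fixed length, and additive combination are illustrative assumptions rather than the paper's exact design:

```python
# A toy sketch: event-level attention over individual events plus chain-level
# attention over event segments, combined additively to score a candidate.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(candidate, items):
    """Dot-product attention of a candidate vector over a set of items."""
    weights = softmax(items @ candidate)
    return weights @ items

def score_next_event(chain, candidate, seg_len=2):
    # event-level: relations between the candidate and individual events
    event_repr = attend(candidate, chain)
    # chain-level: relations between the candidate and event segments
    segments = np.array([chain[i:i + seg_len].mean(axis=0)
                         for i in range(len(chain) - seg_len + 1)])
    segment_repr = attend(candidate, segments)
    return float((event_repr + segment_repr) @ candidate)

rng = np.random.default_rng(1)
chain = rng.normal(size=(8, 32))       # 8 context events
candidates = rng.normal(size=(5, 32))  # 5 candidate next events
scores = [score_next_event(chain, c) for c in candidates]
print(int(np.argmax(scores)))          # index of the predicted next event
```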


2020 ◽  
Author(s):  
Sinan Aral ◽  
Paramveer S. Dhillon

Most online content publishers have moved to subscription-based business models regulated by digital paywalls, but the managerial implications of such freemium content offerings are not well understood. We therefore use micro-level user activity data from The New York Times to conduct a large-scale study of the implications of digital paywall design for publishers. Specifically, we exploit a quasi-experiment that varied (1) the quantity (the number of free articles) and (2) the exclusivity (the number of available sections) of free content available through the paywall to investigate the effects of paywall design on content demand, subscriptions, and total revenue. The paywall policy changes we studied suppressed total content demand by about 9.9%, reducing total advertising revenue. However, this decrease was more than offset by increased subscription revenue, as the policy change led to a 31% increase in total subscriptions during our seven-month study, yielding net positive revenues of over $230,000. The results confirm an economically significant impact of the newspaper’s paywall design on content demand, subscriptions, and net revenue. Our findings can help structure the scientific discussion about digital paywall design and help managers optimize digital paywalls to maximize readership, revenue, and profit. This paper was accepted by Chris Forman, information systems.


2020 ◽  
Author(s):  
Marco Casolino

Abstract Editorial bias and censorship can be quantified by studying how the occurrence of the word ‘killed’ (‘morti’ in Italian) changes over time and reported location. To this purpose, we have analyzed the complete online archives of the major US newspaper (The New York Times - NYT) and the three major Italian ones (Il Corriere della Sera - CDS, La Repubblica - REP, La Stampa - STA). After 1960 we find a common trend of decreasing coverage given to violent events in all three Italian newspapers (the NYT is more stable), opposite to the growing perceived threat of violence in Italy. In all Italian newspapers we also find that the female/male ratio is about 30% and roughly constant over the years, with only La Repubblica showing an increase in the reporting of female deaths of about 3% per year. Even accounting for the lower female casualty rate, especially in work-related accidents, this hints at the presence of some gender bias in the reporting of violent deaths. Historically, we show evidence of censorship in Italian newspapers during WW1 and the Italian Fascist regime, and estimate that in the period 1923-1943 about 57,000 articles (75%) featuring domestic deaths were censored in Italy. We also find that the number of casualties is often (in up to 26% of cases) artificially increased to the next multiple of 5 or 10 to emphasize the importance of the article. The only exception to this editorial practice is found in domestic articles by Italian newspapers during the Fascist regime, another effect of censorship trying to downplay domestic casualties. Furthermore, we find that in all newspapers the distribution N_k of the number of articles involving k persons killed is described by a power law N_k = A·k^(−γ) for 2 ≤ k ≤ 10^6. The value of γ decreases in wartime and increases in peacetime, reflecting how the state of belligerence of a country is reported. In foreign events, editorial bias results in a break of the power law for 2 ≤ k ≤ 10, with up to 100% of articles missing in comparison to what would be expected from a pure power-law distribution, which describes the distribution of all domestic articles. The suppression of low-casualty articles grows with geographical distance from the publishing nation, at a rate higher by a factor of 5 in the Italian newspapers than in the NYT, and by a factor of 2–4 when considering only countries in Europe (for the Italian newspapers) or the Americas (for the NYT), a sign that geographical distance plays a strong role when reporting among countries that share common social traits. These techniques can be applied in wider contexts, e.g. toward specific ethnic groups, and contribute to quantitatively assessing the freedom of the press in a given country.
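The power-law fit and the low-k shortfall described above can be sketched on synthetic counts; the fit range, exponent, and suppression fraction below are illustrative assumptions, not the paper's estimates:

```python
# A sketch of the power-law diagnostic: fit N_k = A * k^(-gamma) on a log-log
# scale over the unsuppressed tail, then measure how far the observed counts
# at low k fall below the fitted expectation. All numbers are synthetic.
import numpy as np

k = np.arange(2, 1001).astype(float)
counts = 5e4 * k ** -1.8              # ideal power law
counts[:9] *= 0.5                     # simulate missing low-casualty articles

tail = k >= 11                        # fit only where no suppression occurs
slope, intercept = np.polyfit(np.log(k[tail]), np.log(counts[tail]), 1)
gamma_hat = -slope
expected = np.exp(intercept) * k ** -gamma_hat
shortfall = (expected[:9] - counts[:9]) / expected[:9]
print(f"gamma = {gamma_hat:.2f}, mean shortfall for k=2..10: {shortfall.mean():.0%}")
```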


2020 ◽  
Author(s):  
Marco Casolino

Abstract In this work we develop a series of techniques and tools to determine and quantify the presence of bias and censorship in newspapers. These algorithms are tested by analyzing the occurrence of the keywords ‘killed’ and ‘suicide’ (‘morti’ and ‘suicidio’ in Italian) and their changes over time, gender, and reported location in the complete online archives (42 million records) of the major US newspaper (The New York Times) and the three major Italian ones (Il Corriere della Sera, La Repubblica, La Stampa). Since the Italian language distinguishes between female and male forms, these tools reveal the presence of gender bias in all Italian newspapers, with reported single female deaths amounting to about one-third of those involving single men. Analyzing the historical trends, we show evidence of censorship in Italian newspapers both during World War 1 and during the Italian Fascist regime. Censorship in all countries during the World Wars and in Italy during the Fascist period is a historically ascertained fact, but so far there has been no estimate of the amount of censorship in newspaper reporting: in this work we estimate that about 75% of domestic deaths and suicides were not reported. This is also confirmed by statistical analysis of the distribution of the least significant digit of the number of reported deaths. We also find that the distribution of the number of articles vs. the number of deaths reported in articles follows a power law, which is broken (with fewer articles being written) when reporting on few deaths occurring in foreign countries. The lack of articles is found to grow with geographical distance from the nation where the newspaper is printed. Whereas assessing the truth of a single article or debunking what are now called ‘fake news’ requires specific fact-checking and becomes more difficult as time goes by, these methods can be used in historical analysis and to evaluate quantitatively the amount of bias and censorship present in other printed or online publications, and can thus contribute to quantitatively assessing the freedom of the press in a given country. Furthermore, they can be applied in wider contexts, such as the evaluation of bias toward specific ethnic groups or specific accidents.
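The least-significant-digit check is simple to sketch: if some death tolls are rounded up to the next multiple of 5 or 10, trailing digits 0 and 5 become over-represented relative to a roughly uniform baseline. The rounding fraction below is an illustrative assumption:

```python
# A sketch of the last-digit test: compare the frequency of each trailing
# digit in reported death tolls against a uniform baseline. Synthetic data.
import numpy as np

rng = np.random.default_rng(2)
tolls = rng.integers(2, 200, size=10_000)
rounded_up = rng.random(10_000) < 0.25            # illustrative fraction
tolls[rounded_up] = (np.ceil(tolls[rounded_up] / 5) * 5).astype(int)

freq = np.bincount(tolls % 10, minlength=10) / len(tolls)
for digit, f in enumerate(freq):
    print(f"last digit {digit}: {f:.3f}")          # expect spikes at 0 and 5
```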


Author(s):  
Miklos Sebők ◽  
Zoltán Kacsuk ◽  
Ákos Máté

Abstract The classification of the items of ever-increasing textual databases has become an important goal for a number of research groups active in the field of computational social science. Due to the increasing amount of text data, there is a growing number of use cases in which the initial effort of human classifiers has been successfully augmented using supervised machine learning (SML). In this paper, we investigate such a hybrid workflow solution, classifying the lead paragraphs of New York Times front-page articles from 1996 to 2006 according to the policy topic categories (such as education or defense) of the Comparative Agendas Project (CAP). The SML classification is conducted in multiple rounds and, within each round, we run the SML algorithm on n samples, and n times if the given algorithm is non-deterministic (e.g., SVM). If all the SML predictions point towards a single label for a document, then it is classified as such (this approach is also called a “voting ensemble”). In the second step, we explore several scenarios, ranging from using the SML ensemble without human validation to incorporating active learning. Using these scenarios, we can quantify the gains from the various workflow versions. We find that combining human coding and validation with an ensemble SML hybrid approach can reduce the need for human coding while maintaining very high precision rates and offering a modest to good level of recall. The modularity of this hybrid workflow allows for various setups to address the idiosyncratic resource bottlenecks that a large-scale text classification project might face.
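A minimal sketch of the voting-ensemble step, assuming a linear SVM trained on bootstrap resamples via scikit-learn and integer CAP topic codes; the routing convention (label -1 sent to human coders) is illustrative, not the authors' exact pipeline:

```python
# A sketch of unanimous-vote classification: accept the machine label only
# when every ensemble member agrees; otherwise route the document to humans.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.utils import resample

def ensemble_labels(X_train, y_train, X_new, n_members=5, seed=0):
    preds = []
    for i in range(n_members):
        Xs, ys = resample(X_train, y_train, random_state=seed + i)
        preds.append(LinearSVC().fit(Xs, ys).predict(X_new))
    preds = np.vstack(preds)
    unanimous = (preds == preds[0]).all(axis=0)
    # -1 marks documents the ensemble disagrees on: human coding/validation
    return np.where(unanimous, preds[0], -1)
```

Documents labeled -1 would then feed the human-validation or active-learning scenarios discussed above.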


2019 ◽  
Vol 25 (4) ◽  
pp. 533-549
Author(s):  
Chinmay Tumbe

Purpose: The purpose of this paper is to demonstrate the utility of corpus linguistics and digitised newspaper archives in management and organisational history.
Design/methodology/approach: The paper draws its inferences from Google NGram Viewer and five digitised historical newspaper databases – The Times of India, The Financial Times, The Economist, The New York Times and The Wall Street Journal – that contain prints from the nineteenth century.
Findings: The paper argues that corpus linguistics, the quantitative and qualitative analysis of large-scale, real-world, machine-readable text, can be an important method of historical research in management studies, especially for discourse analysis. It shows how this method can be fruitfully used for research in management and organisational history, using term count and cluster analysis. In particular, historical databases of digitised newspapers serve as important corpora for understanding the evolution of specific words and concepts. Corpus linguistics using newspaper archives can potentially serve as a method for periodisation and triangulation in corporate, analytically structured and serial histories, and can also foster cross-country comparisons in the evolution of management concepts.
Research limitations/implications: The paper also shows the limitations of the research method and potential robustness checks while using the method.
Practical implications: The findings of this paper can stimulate new ways of conducting research in management history.
Originality/value: The paper introduces corpus linguistics as a research method in management history for the first time.
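As an illustration of the term-count step, the sketch below assumes the archive has been reduced to (year, text) records; both the record format and the single-term query are simplifications:

```python
# A sketch of counting occurrences of a term per year across archive records.
import re
from collections import Counter

def term_counts_by_year(records, term):
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    counts = Counter()
    for year, text in records:
        counts[year] += len(pattern.findall(text))
    return dict(sorted(counts.items()))

records = [(1901, "The merger of the two firms was announced."),
           (1901, "Another merger followed within weeks."),
           (1955, "No merger took place that year.")]
print(term_counts_by_year(records, "merger"))   # {1901: 2, 1955: 1}
```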


2016 ◽  
Vol 94 (4) ◽  
pp. 1031-1055 ◽  
Author(s):  
Chris J. Vargo ◽  
Lei Guo

This large-scale intermedia agenda-setting analysis examines U.S. online media sources for 2015. The network agenda-setting model showed that media agendas were highly homogeneous and reciprocal. Online partisan media played a leading role in the entire media agenda. Two elite newspapers, The New York Times and The Washington Post, were found to no longer be in control of the news agenda and were more likely to follow online partisan media. This article provides evidence for a nuanced view of the network agenda-setting model; intermedia agenda-setting effects varied by media type, issue type, and time period.


Author(s):  
Junming Huang ◽  
Gavin G. Cook ◽  
Yu Xie

Abstract Do mass media influence people’s opinions of other countries? Using BERT, a deep neural network-based natural language processing model, this study analyzes a large corpus of 267,907 China-related articles published by The New York Times since 1970. The output from The New York Times is then compared to a longitudinal data set constructed from 101 cross-sectional surveys of the American public’s views on China, revealing that the reporting of The New York Times on China in one year explains 54% of the variance in American public opinion on China in the following year. This result confirms hypothesized links between media and public opinion and helps shed light on how mass media can influence public opinion about foreign countries.
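The two-stage design (score the tone of each year's coverage with a language model, then regress next-year opinion on this-year tone) might be sketched as below; the generic Hugging Face sentiment pipeline stands in for the authors' BERT setup, and the character-level truncation is a rough assumption:

```python
# A sketch of the pipeline: average signed sentiment per year of coverage,
# then compute R^2 of a linear fit of next-year opinion on this-year tone.
# The generic sentiment model is a stand-in, not the authors' fine-tuned BERT.
import numpy as np
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

def yearly_tone(articles_by_year):
    """Mean signed sentiment of a year's articles (rough char truncation)."""
    tone = {}
    for year, texts in articles_by_year.items():
        results = classifier([t[:512] for t in texts])
        signed = [r["score"] if r["label"] == "POSITIVE" else -r["score"]
                  for r in results]
        tone[year] = float(np.mean(signed))
    return tone

def variance_explained(tone, next_year_opinion):
    years = sorted(tone)
    x = np.array([tone[y] for y in years])
    y = np.array([next_year_opinion[y] for y in years])
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return 1.0 - resid.var() / y.var()   # share of variance explained
```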


2003 ◽  
Vol 15 (3) ◽  
pp. 98-105 ◽  
Author(s):  
Mark Galliker ◽  
Jan Herman
Keyword(s):  
New York ◽  

Abstract. Using the representation of men and women in the Times and the New York Times as an example, a content-analysis procedure is presented that is particularly suited to the study of electronically stored print media. Co-occurrence analysis is understood as the systematic examination of verbal combinations per counting unit. The problem of selecting the semantic units to be considered in the analysis and presentation of the results is discussed.
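A minimal sketch of co-occurrence counting, taking the sentence as the counting unit; the tokenisation and the choice of unit are illustrative assumptions:

```python
# A sketch of co-occurrence analysis: count the units (here, sentences) in
# which two target words appear together.
import re

def cooccurrence(texts, term_a, term_b):
    hits = 0
    for text in texts:
        for sentence in re.split(r"[.!?]", text):
            words = {w.lower() for w in re.findall(r"\w+", sentence)}
            if term_a in words and term_b in words:
                hits += 1
    return hits

texts = ["The man spoke. The woman answered the man.",
         "A woman and a man arrived together."]
print(cooccurrence(texts, "man", "woman"))   # 2
```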

