Learning to Identify Ambiguous and Misleading News Headlines

Accuracy is one of the basic principles of journalism. However, it is increasingly hard to manage due to the diversity of news media. Some editors of online news tend to use catchy headlines that trick readers into clicking. These headlines are either ambiguous or misleading, degrading the reading experience of the audience. Thus, identifying inaccurate news headlines is a task worth studying. Previous work names these headlines "clickbaits" and mainly focuses on features extracted from the headlines, which limits performance because the consistency between headlines and news bodies is underappreciated. In this paper, we clearly redefine the problem and identify ambiguous and misleading headlines separately. We utilize class sequential rules to exploit structure information when detecting ambiguous headlines. For the identification of misleading headlines, we extract features based on the congruence between headlines and bodies. To make use of the large unlabeled data set, we apply a co-training method and gain an increase in performance. The experimental results show the effectiveness of our methods. We then use our classifiers to detect inaccurate headlines crawled from different sources and conduct a data analysis.
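
The idea of a headline-body congruence feature can be illustrated with a minimal sketch. The abstract does not specify which features the authors extract, so the token-overlap score below, the function names `tokens` and `congruence`, and the sample texts are all assumptions made for illustration only.

```python
import re

def tokens(text):
    """Lowercase alphanumeric word tokens as a set (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def congruence(headline, body):
    """Fraction of headline tokens that also occur in the body, from 0.0 to 1.0.

    Hypothetical stand-in for a congruence feature; not the paper's feature set.
    """
    head = tokens(headline)
    if not head:
        return 0.0
    return len(head & tokens(body)) / len(head)

# Made-up example texts, not taken from the paper's data.
headline = "Scientists find water on Mars"
body = "Scientists announced that a probe detected water ice on Mars."
print(congruence(headline, body))
```

A low score under this sketch would only suggest that the headline uses words the body never mentions; a real classifier would combine several such signals.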

Algorithm for preprocessing and unification of time series based on machine learning for data structuring

Программные системы и вычислительные методы ◽

10.7256/2454-0714.2020.3.33958 ◽

2020 ◽

pp. 40-50

Author(s):

Andrey Sergeevich Kopyrin ◽

Irina Leonidovna Makarova

Keyword(s):

Time Series ◽

Domain Knowledge ◽

Fuzzy Time Series ◽

Data Set ◽

Structure Information ◽

Combined Use ◽

Primary Documents ◽

Preliminary Preparation ◽

Single Data ◽

Different Sources

The subject of the research is the process of collecting and preliminarily preparing data from heterogeneous sources. Economic information is heterogeneous and semi-structured or unstructured in nature. Due to the heterogeneity of the primary documents, as well as the human factor, the initial statistical data may contain a large amount of noise, as well as records whose automatic processing may be very difficult. This makes preprocessing dynamic input data an important precondition for discovering meaningful patterns and domain knowledge, and makes the research topic relevant. Data preprocessing is a series of unique tasks that have led to the emergence of various algorithms and heuristic methods for solving preprocessing tasks such as merging and cleanup and identification of variables. In this work, a preprocessing algorithm is formulated that allows you to bring together into a single database and structure information on time series from different sources. The key modification of the preprocessing method proposed by the authors is the technology of automated data integration. The technology proposed by the authors involves the combined use of methods for constructing a fuzzy time series and machine lexical comparison on the thesaurus network, as well as the use of a universal database built using the MIVAR concept. The preprocessing algorithm forms a single data model with the ability to transform the periodicity and semantics of the data set and integrate data that can come from various sources into a single information bank.
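
Transforming the periodicity of a series, one of the capabilities mentioned above, can be sketched as follows. This is only an illustration under stated assumptions: it sums monthly observations into quarters, and it does not reproduce the authors' fuzzy time series, thesaurus matching, or MIVAR database. The function name `monthly_to_quarterly` and the sample values are hypothetical.

```python
from collections import defaultdict

def monthly_to_quarterly(series):
    """Sum {(year, month): value} observations into {(year, quarter): value}."""
    out = defaultdict(float)
    for (year, month), value in series.items():
        quarter = (month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, and so on
        out[(year, quarter)] += value
    return dict(out)

# Made-up monthly figures for illustration.
monthly = {(2020, 1): 10.0, (2020, 2): 12.0, (2020, 3): 11.0, (2020, 4): 9.0}
quarterly = monthly_to_quarterly(monthly)
print(quarterly)
```

Summation suits flow-type indicators such as sales; a stock-type indicator would more plausibly use the last value or the mean of each quarter.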

What prompts users to click on news headlines? Evidence from unobtrusive data analysis

Aslib Journal of Information Management ◽

10.1108/ajim-04-2019-0097 ◽

2019 ◽

Vol 72 (1) ◽

pp. 49-66 ◽

Cited By ~ 2

Author(s):

Tingting Jiang ◽

Qian Guo ◽

Shunchang Chen ◽

Jiaqi Yang

Keyword(s):

Data Analysis ◽

Design Methodology ◽

Information Science ◽

Online News ◽

Clickstream Data ◽

Content Type ◽

Text Length ◽

News Selection ◽

Log File ◽

News Headlines

Purpose: The headlines of online news are created carefully to influence audience news selection today. The purpose of this paper is to investigate the relationships between news headline presentation and users’ clicking behavior. Design/methodology/approach: Two types of unobtrusive data were collected and analyzed jointly for this purpose. A two-month server log file containing 39,990,200 clickstream records was obtained from an institutional news site. A clickstream data analysis was conducted at the footprint and movement levels, which extracted 98,016 clicks received by 7,120 headlines ever displayed on the homepage. Meanwhile, the presentation of these headlines was characterized from seven dimensions, i.e. position, format, text length, use of numbers, use of punctuation marks, recency and popularity, based on the layout and content crawled from the homepage. Findings: This study identified a series of presentation characteristics that prompted users to click on the headlines, including placing them in the central T-shaped zones, using images, increasing text length properly for greater clarity, using visually distinctive punctuation marks, and providing recency and popularity indicators. Originality/value: The findings have valuable implications for news providers in attracting clicks to their headlines. Also, the successful application of nonreactive methods has significant implications for future user studies in both information science and journalism.
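
Relating clicks to a presentation dimension can be sketched as a simple grouping step. The record layout, the `has_image` attribute, the function name `mean_clicks_by`, and the click counts below are hypothetical and are not drawn from the study's log file; the authors' actual analysis is not described in enough detail here to reproduce.

```python
from collections import defaultdict

def mean_clicks_by(records, attribute):
    """Average clicks per headline, grouped by one presentation attribute."""
    totals = defaultdict(lambda: [0, 0])  # attribute value -> [click sum, headline count]
    for rec in records:
        bucket = totals[rec[attribute]]
        bucket[0] += rec["clicks"]
        bucket[1] += 1
    return {value: clicks / count for value, (clicks, count) in totals.items()}

# Made-up headline records for illustration.
records = [
    {"headline": "A", "has_image": True, "clicks": 30},
    {"headline": "B", "has_image": True, "clicks": 10},
    {"headline": "C", "has_image": False, "clicks": 8},
]
averages = mean_clicks_by(records, "has_image")
print(averages)
```

The same function could be called with any other attribute key present in the records, such as a position or text-length bucket.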

Ghouta Timur Pasca Pembebasan Bashar Al-Assad (Kajian Fenomenologi Edmund Husserl)

JURNAL Al-AZHAR INDONESIA SERI HUMANIORA ◽

10.36722/sh.v6i2.559 ◽

2021 ◽

Vol 6 (2) ◽

pp. 91

Author(s):

Fauziyah Kurniawati

Keyword(s):

Data Analysis ◽

News Media ◽

Research Method ◽

Qualitative Method ◽

Edmund Husserl ◽

Descriptive Analysis ◽

Online News ◽

Research Article ◽

Analysis Technique ◽

The Arab Spring

This research article aims to describe East Ghouta after its deliverance by Bashar al-Assad, from the perspective of Edmund Husserl's phenomenology. The issues to be studied are: (1) how did the conflict in East Ghouta, Syria, start?; and (2) what is the state of East Ghouta after its deliverance by Bashar al-Assad? The objects under study are national and international online news media. The research method used is a qualitative method. Data were collected using watch-and-note techniques. The data analysis technique used is descriptive analysis. To test the validity of the data, triangulation was used. The results of this study are: (1) the East Ghouta conflict in Syria started on March 15, 2011. In addition to the background of the Arab Spring events, it turns out that the President, Bashar al-Assad, is easily inflamed whenever something is not to his liking, which eventually led to hundreds of thousands of civilian lives lost and millions more fleeing; and (2) after 6 years of slipping into a totally inhumane empire, Ghouta was finally freed from the shackles of its own warden by Bashar al-Assad.

Keywords - East Ghouta, deliverance, Bashar al-Assad, phenomenology

ANALISIS KELAYAKAN PENULISAN BERITA PADA PORTAL BERITA ISLAM ONLINE PANCARAN.NET

Mamba'ul 'Ulum ◽

10.54090/mu.11 ◽

2021 ◽

Vol 17 (1) ◽

pp. 37-48

Author(s):

Nurman Ando Setianas Nugroho

Keyword(s):

Data Analysis ◽

Code Of Ethics ◽

Descriptive Analysis ◽

Online News ◽

Online Media ◽

Descriptive Research ◽

Analysis Process ◽

News Value ◽

News Headlines

This research analyzed news quality on an Islamic online news portal in Solo, pancaran.net; concern about the quality of Islamic online media in Solo was the reason for this research. This is descriptive research using a qualitative approach, and the data were analyzed using descriptive analysis. The process was carried out from the time the data were collected; the researchers therefore began the data analysis in the field and continued until the research was complete. The analysis used parameters to determine whether the news had fulfilled the elements of news, and thus whether the news could be said to be of good quality, of lesser quality, or not worthy of publication due to a code of ethics violation. These elements were News Value, 5W + 1H Systematics, Inverted Pyramid Systematics, News Headlines, News Lead, News Content, News Quotations, and Journalistic Code of Ethics. In the analysis, 7 elements were fulfilled in the news on pancaran.net; the news could have been said to be of good quality had the one remaining element also been fulfilled. However, that element, the journalistic code of ethics, was not fulfilled. The research found that the pancaran.net website was not recommended for online news readers in Solo due to violations of the journalistic code of ethics found in the news.

Shocking secret you won’t believe! Emotional arousal in clickbait headlines

Online Information Review ◽

10.1108/oir-05-2018-0172 ◽

2019 ◽

Vol 43 (7) ◽

pp. 1136-1150

Author(s):

Supavich (Fone) Pengnate

Keyword(s):

News Media ◽

Emotional Arousal ◽

Online News ◽

Control Group ◽

Pupillary Dilation ◽

Initial Attempt ◽

Content Type ◽

Unobtrusive Measure ◽

Tracking Device ◽

News Headlines

Purpose: Clickbait has become a popular strategy for attracting online users by enticing them to follow the link to a particular website to read further. The purpose of this paper is to fill a gap in the literature by providing empirical evidence of how clickbait headlines affect online users’ emotional and behavioral responses, specifically emotional arousal and intention to read news. In addition, it is an early attempt to examine pupillary dilation response as an indicator of emotional arousal in the online news context. Design/methodology/approach: An experiment was conducted primarily to examine the levels of emotional arousal evoked by two treatment groups of online news headlines, news and clickbait, compared to a neutral control group. Emotional arousal was assessed using two approaches – pupillary dilation response recorded by an eye-tracking device and the Self-Assessment Manikin (SAM) – and the results were compared. The influence of emotional arousal on intention to read news was hypothesized and tested. Findings: The level of emotional arousal evoked by the headlines varies. In general, clickbait headlines generate a higher level of emotional arousal than do the neutral headlines but a lower level than the news headlines. The results also indicate that the level of emotional arousal measured by pupillary dilation response and by SAM are somewhat consistent. Emotional arousal appears to be a significant predictor of intention to read news. Originality/value: This study is an initial attempt to investigate how clickbait headlines influence online users’ perceptions and responses, which will be of interest to researchers and news media publishers. The current study also provides evidence for adopting pupillary dilation response, an unobtrusive measure of emotional response, as an alternative methodology for future studies that investigate emotional arousal related to textual information in the online news context.

Usage of the machine learning to organize time series and find anomalies

E3S Web of Conferences ◽

10.1051/e3sconf/202022401017 ◽

2020 ◽

Vol 224 ◽

pp. 01017

Author(s):

A.S. Kopyrin ◽

E.V. Vidishcheva ◽

Yu.I. Dreizis

Keyword(s):

Time Series ◽

Subject Area ◽

Fuzzy Time Series ◽

Data Set ◽

Structure Information ◽

The Subject ◽

Single Data ◽

Significant Patterns ◽

Automated Data Integration ◽

Different Sources

The subject of the study is the process of collecting and preparing data from heterogeneous sources and searching it for anomalies. Economic information is naturally heterogeneous and semi-structured or unstructured. This makes pre-processing of dynamic input data an important prerequisite for the detection of significant patterns and knowledge in the subject area, so the topic of research is relevant. Data pre-processing comprises several distinct problems that have led to the emergence of various algorithms and heuristic methods for solving such pre-processing problems as merging and cleaning and identifying variables. In this work, an algorithm for preprocessing and searching for anomalies using LSTM is formulated, which allows you to consolidate into a single database and structure information by time series from different sources, as well as search for anomalies in an automated mode. A key modification of the preprocessing method proposed by the authors is the technology of automated data integration. The technology proposed by the authors involves the joint use of methods for building a fuzzy time series and machine lexical matching on a thesaurus network, as well as the use of a universal database built using the MIVAR concept. The preprocessing algorithm forms a single data model with the possibility of transforming the periodicity and semantics of the data set and integrating into a single information bank data that can come from various sources.
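
Searching a time series for anomalies in an automated mode can be illustrated with a much simpler stand-in than the LSTM the authors use. The sketch below flags a point when it deviates from the mean of the preceding window by more than a chosen number of standard deviations. The function name `rolling_anomalies`, the window size, the threshold, and the sample series are all assumptions, and this rule is not the paper's method.

```python
from statistics import mean, stdev

def rolling_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the preceding window's mean
    by more than `threshold` sample standard deviations of that window."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip windows with zero spread, where the ratio is undefined.
        if sigma > 0 and abs(values[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged

# Made-up series with one obvious spike at index 6.
series = [10.0, 10.5, 9.8, 10.2, 10.1, 10.3, 25.0, 10.0, 9.9]
anomalies = rolling_anomalies(series)
print(anomalies)
```

Note that a spike inflates the spread of the windows that follow it, so this simple rule can miss anomalies that occur shortly after another one.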

Learning to Explain Ambiguous Headlines of Online News

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/588 ◽

2018 ◽

Cited By ~ 1

Author(s):

Tianyu Liu ◽

Wei Wei ◽

Xiaojun Wan

Keyword(s):

State Of The Art ◽

Online News ◽

Svm Classifier ◽

Feature Engineering ◽

Matching Model ◽

Reading Experience ◽

Considerable Portion ◽

Information Gap ◽

Network Methods ◽

News Headlines

With the purpose of attracting clicks, online news publishers and editors use diverse strategies to make their headlines catchy, with a sacrifice of accuracy. Specifically, a considerable portion of news headlines is ambiguous. Such headlines are unclear relative to the content of the story, and largely degrade the reading experience of the audience. In this paper, we focus on dealing with the information gap caused by the ambiguous news headlines. We define a new task of explaining ambiguous headlines with short informative texts, and build a benchmark dataset for evaluation. We address the task by selecting a proper sentence from the news body to resolve the ambiguity in an ambiguous headline. Both feature engineering methods and neural network methods are explored. For feature engineering, we improve a standard SVM classifier with elaborately designed features. For neural networks, we propose an ambiguity-aware neural matching model based on a previous model. Utilizing automatic and manual evaluation metrics, we demonstrate the efficacy and the complementarity of the two methods, and the ambiguity-aware neural matching model achieves the state-of-the-art performance on this challenging task.

GOVERNMENT COMMUNICATION AND INTERNET RESPONSES: PROFILE OF PRIME MINISTER KRIŠJĀNIS KARIŅŠ IN SELECTED DIGITAL MEDIA USERS’ COMMENTS DURING THE COVID-19 PANDEMIC

Environment Technology Resources Proceedings of the International Scientific and Practical Conference ◽

10.17770/etr2021vol2.6571 ◽

2021 ◽

Vol 2 ◽

pp. 78-83

Author(s):

Vineta Kleinberga

Keyword(s):

Digital Media ◽

News Media ◽

Prime Minister ◽

Qualitative Content Analysis ◽

Online News ◽

Analysis Tool ◽

Learning Program ◽

Digital Tool ◽

Data Set ◽

Government Communication

Perceptions play a pivotal role in assessing the efficiency of government communication. Informed by the strategic narrative conceptual framework, this study looks at the perception of government communication in Internet comments during three essential dates in conquering the COVID-19 pandemic in Latvia: the introduction of emergency situations on March 12 and November 6, 2020, and the introduction of a curfew on December 29, 2020. The study uncovers how often and how the main spokesperson in government communication – the Prime Minister of Latvia Krišjānis Kariņš – is framed in comments of three online news media in Latvia (Apollo, Delfi, Tvnet) in Latvian and Russian. Using a digital tool for online comment analysis, the Index of Internet Aggressiveness (IIA), a data set of 244 comments is created, each containing the keyword “Kariņš” in various grammatical cases in Latvian and Russian. Qualitative content analysis is applied to extract and compare the frequency of appearance and the framing of Kariņš over the course of the pandemic in Latvia. The findings reveal that Kariņš appears in comments significantly more after news in Latvian than in Russian, and has been commented on five times more in Delfi than in Tvnet and Apollo together. The comments in Latvian are more aggressive than those in Russian, and their emotional tone increases towards the end of 2020. In the majority of comments the framing is negative, involving attributes of irresponsibility, superficiality, indecisiveness and danger; yet positively framed rigidity and decisiveness of Kariņš can be observed too. IIA is an online comment analysis tool, incorporating a machine learning program, which analyses users’ comments on news on online news sites according to pre-selected keywords to grasp the commenters’ verbal aggressiveness. As of March 2021, the IIA data set consisted of ~25.08 million comments, ~616.62 million word usages in written comments, and ~1357.40 thousand news items.

Critical thinking of young citizens towards news headlines in Chile

Comunicar ◽

10.3916/c54-2018-10 ◽

2018 ◽

Vol 26 (54) ◽

pp. 101-110 ◽

Cited By ~ 1

Author(s):

Matthieu Vernier ◽

Luis Cárcamo ◽

Eliana Scheihing

Keyword(s):

Critical Thinking ◽

News Media ◽

Data Science ◽

Online News ◽

Educational Process ◽

Social Mobilization ◽

Brand Name ◽

The Social ◽

The Face ◽

News Headlines

Strengthening the critical thinking abilities of citizens in the face of news published on the web represents a key challenge for education. Young citizens appear to be vulnerable in the face of poor-quality news or news containing non-explicit ideologies. In the field of data science, computational and statistical techniques have been developed to automatically collect and characterize online news media in real time. Nevertheless, there is still not a lot of interdisciplinary research on how to design data exploration platforms supporting an educational process of critical citizenship. This article explores this opportunity through a case study in Chile analyzing the critical thinking ability of students when facing news dealing with the social mobilization “No+AFP”. From data collected through 4 online exercises conducted by 75 secondary school students, 55 university students and 25 communication specialists, we investigate to what extent young citizens are able to classify news headlines and the ideological orientation of news media outlets. We also question the influence of the media’s brand name and the subjectivity of each participant with regard to the social mobilization “No+AFP”. The results underline the importance of group work, the influence of the brand name and the correlation between critical thinking abilities and having a defined opinion.

Online News Feed Data Mining and Prediction

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k1381.0981119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 409-414

Keyword(s):

Data Mining ◽

Social Media ◽

Online News ◽

Data Set ◽

The Social ◽

News Agencies ◽

Social Media Platforms ◽

Prediction Systems ◽

Prediction Techniques ◽

Different Sources

Data mining and prediction systems have been the center of attention since information retrieval came into existence. Most IT companies spend a lot of resources on such analysis and systems to improve their performance and generate more revenue, depending on the nature of the work that they do. The Online News Feed Prediction System aims to provide an analysis and comparison of various prediction techniques by using different methods of implementation. The UCI repository contains a collection of databases pertaining to different topics. News popularity in multiple social media is one such dataset, containing information about news topics from different sources, sentiment analysis of the title and headline, the topic that they are related to, the publishing date, and the popularity score on various social media platforms. Python, R and Weka have been used on this data set to implement data preprocessing, visualization and prediction techniques like Random Forest, Decision Tree and SVM. Moreover, there is a dataset on the analysis of the score every twenty minutes for the social media platforms chosen. Analysis on these platforms helps in developing a system to reach a wider audience. News agencies can use this system to increase their profit and visibility. This paper aims to realize the ways to obtain these results.
