Learning to Identify Ambiguous and Misleading News Headlines

Author(s):  
Wei Wei ◽  
Xiaojun Wan

Accuracy is one of the basic principles of journalism. However, it is increasingly hard to manage due to the diversity of news media. Some editors of online news tend to use catchy headlines which trick readers into clicking. These headlines are either ambiguous or misleading, degrading the reading experience of the audience. Thus, identifying inaccurate news headlines is a task worth studying. Previous work names these headlines "clickbaits" and mainly focuses on features extracted from the headlines, which limits performance because the consistency between headlines and news bodies is underappreciated. In this paper, we clearly redefine the problem and identify ambiguous and misleading headlines separately. We utilize class sequential rules to exploit structure information when detecting ambiguous headlines. For the identification of misleading headlines, we extract features based on the congruence between headlines and bodies. To make use of the large unlabeled data set, we apply a co-training method and gain an increase in performance. The experimental results show the effectiveness of our methods. We then use our classifiers to detect inaccurate headlines crawled from different sources and conduct a data analysis.
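To make the idea of a headline-body congruence feature concrete, here is a minimal sketch. It is not the authors' feature set: the token-overlap measure and the names `tokenize_text` and `headline_body_overlap` are assumptions chosen for illustration only.

```python
import re


def tokenize_text(text):
    """Lowercase the text and return its alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def headline_body_overlap(headline, body):
    """Fraction of distinct headline tokens that also occur in the body.

    Illustrative assumption: a low value hints that the headline talks
    about things the body never mentions.
    """
    head = set(tokenize_text(headline))
    if not head:
        return 0.0
    return len(head & set(tokenize_text(body))) / len(head)
```

A score like this could serve as one input feature among many for a classifier; on its own it cannot separate misleading headlines from merely paraphrased ones.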

Author(s):  
Andrey Sergeevich Kopyrin ◽  
Irina Leonidovna Makarova

The subject of the research is the process of collecting and preliminary preparation of data from heterogeneous sources. Economic information is heterogeneous and semi-structured or unstructured in nature. Due to the heterogeneity of the primary documents, as well as the human factor, the initial statistical data may contain a large amount of noise, as well as records whose automatic processing may be very difficult. This makes preprocessing dynamic input data an important precondition for discovering meaningful patterns and domain knowledge, and makes the research topic relevant. Data preprocessing is a series of unique tasks that have led to the emergence of various algorithms and heuristic methods for solving preprocessing tasks such as merge and cleanup and identification of variables. In this work, a preprocessing algorithm is formulated that allows you to bring together into a single database and structure information on time series from different sources. The key modification of the preprocessing method proposed by the authors is the technology of automated data integration. The technology proposed by the authors involves the combined use of methods for constructing a fuzzy time series and machine lexical comparison on the thesaurus network, as well as the use of a universal database built using the MIVAR concept. The preprocessing algorithm forms a single data model with the ability to transform the periodicity and semantics of the data set and integrate data that can come from various sources into a single information bank.
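As one possible reading of "transforming the periodicity" of a time series, the sketch below aggregates monthly observations into quarterly totals. The `(year, month)` key layout, the choice of summation, and the name `monthly_to_quarterly` are assumptions for illustration and are not taken from the authors' algorithm.

```python
from collections import defaultdict


def monthly_to_quarterly(series):
    """Sum monthly values into quarterly totals.

    `series` maps (year, month) to a number; the result maps
    (year, quarter) to the sum of that quarter's months.
    """
    quarters = defaultdict(float)
    for (year, month), value in series.items():
        quarter = (month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        quarters[(year, quarter)] += value
    return dict(quarters)
```

Summation suits flow indicators such as turnover; stock indicators such as prices would more plausibly use an average or the last value of the period.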


2019 ◽  
Vol 72 (1) ◽  
pp. 49-66 ◽  
Author(s):  
Tingting Jiang ◽  
Qian Guo ◽  
Shunchang Chen ◽  
Jiaqi Yang

Purpose The headlines of online news are created carefully to influence audience news selection today. The purpose of this paper is to investigate the relationships between news headline presentation and users’ clicking behavior. Design/methodology/approach Two types of unobtrusive data were collected and analyzed jointly for this purpose. A two-month server log file containing 39,990,200 clickstream records was obtained from an institutional news site. A clickstream data analysis was conducted at the footprint and movement levels, which extracted 98,016 clicks received by 7,120 headlines ever displayed on the homepage. Meanwhile, the presentation of these headlines was characterized from seven dimensions, i.e. position, format, text length, use of numbers, use of punctuation marks, recency and popularity, based on the layout and content crawled from the homepage. Findings This study identified a series of presentation characteristics that prompted users to click on the headlines, including placing them in the central T-shaped zones, using images, increasing text length properly for greater clarity, using visually distinctive punctuation marks, and providing recency and popularity indicators. Originality/value The findings have valuable implications for news providers in attracting clicks to their headlines. Also, the successful application of nonreactive methods has significant implications for future user studies in both information science and journalism.
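Three of the seven presentation dimensions named above (text length, use of numbers, use of punctuation marks) can be computed directly from the headline string. The following is a minimal sketch under assumed definitions; the function name `headline_features` and the exact measures are hypothetical, not the study's coding scheme.

```python
import string


def headline_features(headline):
    """Compute three simple presentation features for one headline."""
    return {
        # Length in characters, spaces included.
        "text_length": len(headline),
        # True if at least one digit appears anywhere in the headline.
        "has_number": any(ch.isdigit() for ch in headline),
        # Count of ASCII punctuation characters only.
        "punctuation_count": sum(ch in string.punctuation for ch in headline),
    }
```

Note that `string.punctuation` covers ASCII marks only, so headlines in languages with full-width punctuation would need an extended character set.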


2021 ◽  
Vol 6 (2) ◽  
pp. 91
Author(s):  
Fauziyah Kurniawati

This research article aims to describe East Ghouta after the deliverance of Bashar al-Assad from the perspective of Edmund Husserl's phenomenology. The issues studied are: (1) how did the East Ghouta conflict in Syria start?; and (2) how is East Ghouta after the release of Bashar al-Assad? The object under study is the national and international online news media. The research method is qualitative. Data were collected with watch-and-note techniques. The data analysis technique is descriptive analysis. To test the validity of the data, triangulation was used. The results of this study are: (1) the East Ghouta conflict in Syria started on March 15, 2011. In addition to the background of the Arab Spring events, it turns out that the emotions of the President, Bashar al-Assad, flare up whenever something is not to his liking, which eventually led to hundreds of thousands of civilian lives lost and millions more fleeing; and (2) after 6 years of slipping into a totally inhumane empire, East Ghouta was finally freed from the shackles of its own warden by Bashar al-Assad.

Keywords - East Ghouta, deliverance, Bashar al-Assad, phenomenology


2021 ◽  
Vol 17 (1) ◽  
pp. 37-48
Author(s):  
Nurman Ando Setianas Nugroho

This research analyzed the news quality of an Islamic online news portal in Solo, pancaran.net; concern about the quality of Islamic online media in Solo was the reason for this research. This is descriptive research using a qualitative approach, and the data were analyzed with descriptive analysis. The process was carried out from the moment the data were collected; the researchers therefore began the data analysis in the field and continued until the research was complete. The analysis used parameters to determine whether the news had fulfilled the elements of news, and thus whether the news could be said to be of good quality, of lesser quality, or not worthy of publication due to a code of ethics violation. These elements were news value, 5W + 1H systematics, inverted pyramid systematics, news headlines, news lead, news content, news quotations, and the journalistic code of ethics. In the analysis, 7 elements were fulfilled in the news on pancaran.net; therefore, even though one element had not been fulfilled, the news on pancaran.net could be said to be of good quality with respect to these 7 elements. However, the one element that was not fulfilled was the journalistic code of ethics. This research found that the pancaran.net website was not recommended for online news readers in Solo due to violations of the journalistic code of ethics found in the news.


2019 ◽  
Vol 43 (7) ◽  
pp. 1136-1150
Author(s):  
Supavich (Fone) Pengnate

Purpose Clickbait has become a popular strategy for attracting online users by enticing them to follow the link to a particular website to read further. The purpose of this paper is to fill a gap in the literature by providing empirical evidence of how clickbait headlines affect online users’ emotional and behavioral responses, specifically emotional arousal and intention to read news. In addition, it is an early attempt to examine pupillary dilation response as an indicator of emotional arousal in the online news context. Design/methodology/approach An experiment was conducted primarily to examine the levels of emotional arousal evoked by two treatment groups of online news headlines, news and clickbait, compared to a neutral control group. Emotional arousal was assessed using two approaches – pupillary dilation response recorded by an eye-tracking device and the Self-Assessment Manikin (SAM) – and the results were compared. The influence of emotional arousal on intention to read news was hypothesized and tested. Findings The level of emotional arousal evoked by the headlines varies. In general, clickbait headlines generate a higher level of emotional arousal than do the neutral headlines but a lower level than the news headlines. The results also indicate that the level of emotional arousal measured by pupillary dilation response and by SAM are somewhat consistent. Emotional arousal appears to be a significant predictor of intention to read news. Originality/value This study is an initial attempt to investigate how clickbait headlines influence online users’ perceptions and responses, which will be of interest to researchers and news media publishers. The current study also provides evidence for adopting pupillary dilation response, an unobtrusive measure of emotional response, as an alternative methodology for future studies that investigate emotional arousal related to textual information in the online news context.


2020 ◽  
Vol 224 ◽  
pp. 01017
Author(s):  
A.S. Kopyrin ◽  
E.V. Vidishcheva ◽  
Yu.I. Dreizis

The subject of the study is the process of collecting, preparing, and searching for anomalies in data from heterogeneous sources. Economic information is naturally heterogeneous and semi-structured or unstructured. This makes pre-processing of input dynamic data an important prerequisite for the detection of significant patterns and knowledge in the subject area, so the topic of research is relevant. Pre-processing of data comprises several unique problems that have led to the emergence of various algorithms and heuristic methods for solving such pre-processing problems as merging and cleaning and identifying variables. In this work, an algorithm for preprocessing and searching for anomalies using LSTM is formulated, which allows you to consolidate into a single database and structure information by time series from different sources, as well as search for anomalies in an automated mode. A key modification of the preprocessing method proposed by the authors is the technology of automated data integration. The technology proposed by the authors involves the joint use of methods for building a fuzzy time series and machine lexical matching on a thesaurus network, as well as the use of a universal database built using the MIVAR concept. The preprocessing algorithm forms a single data model with the possibility of transforming the periodicity and semantics of the data set and integrating into a single information bank data that can come from various sources.
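The abstract describes anomaly search with an LSTM, and the details of that model are not given here. As a much simpler stand-in that shows what "searching for anomalies in an automated mode" can mean on a single series, the sketch below flags points by z-score. It is a plain statistical baseline, not an LSTM, and the name `zscore_anomalies` and the default threshold are assumptions.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=3.0):
    """Return indices of values lying more than `threshold` population
    standard deviations away from the mean of the whole series."""
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # a constant series has no outliers under this rule
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]
```

A learned sequence model would instead flag points whose prediction error is unusually large, which lets it account for trend and seasonality that a global z-score ignores.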


Author(s):  
Tianyu Liu ◽  
Wei Wei ◽  
Xiaojun Wan

With the purpose of attracting clicks, online news publishers and editors use diverse strategies to make their headlines catchy, with a sacrifice of accuracy. Specifically, a considerable portion of news headlines is ambiguous. Such headlines are unclear relative to the content of the story, and largely degrade the reading experience of the audience. In this paper, we focus on dealing with the information gap caused by the ambiguous news headlines. We define a new task of explaining ambiguous headlines with short informative texts, and build a benchmark dataset for evaluation. We address the task by selecting a proper sentence from the news body to resolve the ambiguity in an ambiguous headline. Both feature engineering methods and neural network methods are explored. For feature engineering, we improve a standard SVM classifier with elaborately designed features. For neural networks, we propose an ambiguity-aware neural matching model based on a previous model. Utilizing automatic and manual evaluation metrics, we demonstrate the efficacy and the complementarity of the two methods, and the ambiguity-aware neural matching model achieves the state-of-the-art performance on this challenging task.
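The task of selecting a body sentence to explain an ambiguous headline can be pictured with a very small baseline. The sketch below ranks sentences by word overlap with the headline; this is neither the SVM features nor the neural matching model described in the abstract, and the names `word_set` and `pick_explaining_sentence` are hypothetical.

```python
import re


def word_set(text):
    """Return the set of lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_explaining_sentence(headline, body):
    """Return the body sentence sharing the most distinct words with the
    headline, or None if the body has no sentences. Ties go to the
    earliest sentence."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", body) if s.strip()]
    if not sentences:
        return None
    head = word_set(headline)
    return max(sentences, key=lambda s: len(head & word_set(s)))
```

Pure overlap favors sentences that repeat the headline, whereas a useful explanation usually adds the information the headline withholds, which is one reason richer features or learned matching would be needed in practice.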


Author(s):  
Vineta Kleinberga

Perceptions play a pivotal role in assessing the efficiency of government communication. Informed by the strategic narrative conceptual framework, this study looks at the perception of government communication in Internet comments on three essential dates in conquering the COVID-19 pandemic in Latvia: the introduction of emergency situations on March 12 and November 6, 2020, and the introduction of a curfew on December 29, 2020. The study uncovers how often and how the main spokesperson in government communication, the Prime Minister of Latvia Krišjānis Kariņš, is framed in comments on three online news media in Latvia (Apollo, Delfi, Tvnet) in Latvian and Russian. Using a digital tool for online comment analysis, the Index of Internet Aggressiveness (IIA), a data set is created of 244 comments containing the keyword “Kariņš” in various cases in Latvian and Russian. Qualitative content analysis is applied to extract and compare the frequency of appearance and the framing of Kariņš over the course of the pandemic in Latvia. The findings reveal that Kariņš appears in comments significantly more after news in Latvian than in Russian, and has been commented on five times more in Delfi than in Tvnet and Apollo together. The comments in Latvian are more aggressive than those in Russian, and their emotional tone increases towards the end of 2020. In the majority of comments the framing is negative, involving attributes of irresponsibility, superficiality, indecisiveness and danger; yet positively framed rigidity and decisiveness of Kariņš can be observed too. IIA is an online comment analysis tool, incorporating a machine learning program, which analyses users’ comments on news on online news sites according to pre-selected keywords to grasp the commenters’ verbal aggressiveness. As of March 2021, the IIA data set consisted of ~25.08 million comments, ~616.62 million word usages in written comments, and ~1357.40 thousand news items.


Comunicar ◽  
2018 ◽  
Vol 26 (54) ◽  
pp. 101-110 ◽  
Author(s):  
Matthieu Vernier ◽  
Luis Cárcamo ◽  
Eliana Scheihing

Strengthening the critical thinking abilities of citizens in the face of news published on the web represents a key challenge for education. Young citizens appear to be vulnerable in the face of poor-quality news or news containing non-explicit ideologies. In the field of data science, computational and statistical techniques have been developed to automatically collect and characterize online news media in real time. Nevertheless, there is still little interdisciplinary research on how to design data exploration platforms supporting an educational process of critical citizenship. This article explores this opportunity through a case study in Chile analyzing the critical thinking ability of students when facing news dealing with a relevant social event, the social mobilization “No+AFP”. From data collected through 4 online exercises conducted by 75 secondary school students, 55 university students and 25 communication specialists, we investigate to what extent young citizens are able to classify news headlines and the ideological orientation of news media outlets. We also question the influence of the media’s brand name and the subjectivity of each participant with regard to the social mobilization “No+AFP”. The results underline the importance of group work, the influence of the brand name, and the correlation between critical-thinking abilities and having a defined opinion.


Data mining and prediction systems have been the center of attention since information retrieval came into existence. Most IT companies spend a lot of resources on such analysis and systems to improve their performance and generate more revenue, depending on the nature of the work that they do. The Online News Feed Prediction System aims to provide an analysis and comparison of various prediction techniques by using different methods of implementation. The UCI repository contains a collection of databases pertaining to different topics. News popularity in multiple social media is one such dataset, containing information about news topics from different sources, sentiment analysis of the title and headline, the topic that they are related to, the publishing date, and the popularity score on various social media platforms. Python, R and Weka have been used on this data set to implement data preprocessing, visualization and prediction techniques such as Random Forest, Decision Tree and SVM. Moreover, there is a dataset on the analysis of the score for every twenty minutes for the social media platforms chosen. Analysis on these platforms helps in developing a system to reach a wider audience. News agencies can use this system to increase their profit and visibility. This paper aims to realize the ways to obtain these results.
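Comparing prediction techniques ultimately comes down to scoring each model's predictions against the same held-out labels. The sketch below does this with plain accuracy. It assumes the popularity scores have already been turned into class labels, and the names `accuracy` and `rank_models` are illustrative rather than part of the described system.

```python
def accuracy(y_true, y_pred):
    """Share of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rank_models(y_true, predictions):
    """Return (model_name, accuracy) pairs sorted from best to worst.

    `predictions` maps a model name to its list of predicted labels.
    """
    scores = {name: accuracy(y_true, preds) for name, preds in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

If popularity is kept as a continuous score instead, an error measure such as mean absolute error would be the more natural basis for the same ranking.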

