Automatic Identification of Harmful, Aggressive, Abusive, and Offensive Language on the Web: A Survey of Technical Biases Informed by Psychology Literature

2021 ◽  
Vol 4 (3) ◽  
pp. 1-56
Author(s):  
Agathe Balayn ◽  
Jie Yang ◽  
Zoltan Szlavik ◽  
Alessandro Bozzon

The automatic detection of conflictual language (harmful, aggressive, abusive, and offensive language) is essential to providing a healthy conversation environment on the Web. To design and develop detection systems capable of achieving satisfactory performance, a thorough understanding of the nature and properties of the targeted type of conflictual language is of great importance. The scientific communities investigating human psychology and social behavior have studied these languages in detail, but their insights have only partially reached the computer science community. In this survey, we aim both at systematically characterizing the conceptual properties of online conflictual languages and at investigating the extent to which they are reflected in state-of-the-art automatic detection systems. Through an analysis of the psychology literature, we provide a reconciled taxonomy that denotes the ensemble of conflictual languages typically studied in computer science. We then characterize the conceptual mismatches that can be observed between the main semantic and contextual properties of these languages and their treatment in computer science works, and systematically uncover the resulting technical biases in the design of machine learning classification models and the datasets created for their training. Finally, we discuss diverse research opportunities for the computer science community and reflect on broader technical and structural issues.

Author(s):  
Shintaro Yamamoto ◽  
Anne Lauscher ◽  
Simone Paolo Ponzetto ◽  
Goran Glavaš ◽  
Shigeo Morishima

The exponential growth of scientific literature yields the need to support users in both effectively and efficiently analyzing and understanding the body of research work. This exploratory process can be facilitated by providing graphical abstracts: visual summaries of a scientific publication. Accordingly, previous work recently presented an initial study on the automatic identification of a central figure in a scientific publication, to be used as the publication's visual summary. This study, however, was limited to a single (biomedical) domain, primarily because the current state of the art relies on supervised machine learning, which typically requires large amounts of labeled data: the only annotated data set existing until now covered only biomedical publications. In this work, we build a novel benchmark data set for visual summary identification from scientific publications, consisting of papers presented at conferences from several areas of computer science. We couple this contribution with a new self-supervised learning approach that learns a heuristic matching of in-text references to figures with figure captions. Our self-supervised pre-training, executed on a large unlabeled collection of publications, attenuates the need for large annotated data sets for visual summary identification and facilitates domain transfer for this task. We evaluate our self-supervised pre-training for visual summary identification on both the existing biomedical data set and our newly presented computer science data set. The experimental results suggest that the proposed method is able to outperform the previous state of the art without any task-specific annotations.
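The abstract does not detail the matching heuristic itself; purely as a toy illustration of the underlying idea (scoring the sentence around an in-text figure reference against each figure caption by word overlap), with entirely hypothetical caption text:

```python
import re

def match_reference_to_figure(ref_sentence, captions):
    """Score each caption by word overlap with the sentence containing
    the in-text figure reference; return the best-matching figure."""
    def tokens(text):
        return set(re.findall(r"[a-z]+", text.lower()))
    ref = tokens(ref_sentence)
    scores = {fig: len(ref & tokens(cap)) for fig, cap in captions.items()}
    return max(scores, key=scores.get)

# Hypothetical captions, not taken from the paper:
captions = {
    "Figure 1": "Overview of the proposed self-supervised pipeline.",
    "Figure 2": "Accuracy on the biomedical benchmark by training size.",
}
best = match_reference_to_figure(
    "As shown in Figure 1, our pipeline is pre-trained without labels.",
    captions,
)
```

A learned model would replace the raw overlap score with a trained similarity function, but the pairing of reference context and caption is the signal being exploited.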


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 325
Author(s):  
Zhihao Wu ◽  
Baopeng Zhang ◽  
Tianchen Zhou ◽  
Yan Li ◽  
Jianping Fan

In this paper, we develop a practical approach for the automatic detection of discrimination actions in social images. First, an image set is established in which various discrimination actions and relations are manually labeled. To the best of our knowledge, this is the first work to create a dataset for discrimination action recognition and relationship identification. Second, a practical approach is developed to achieve automatic detection and identification of discrimination actions and relationships from social images. Third, the task of relationship identification is seamlessly integrated with the task of discrimination action recognition into a single network called the Co-operative Visual Translation Embedding++ network (CVTransE++). We also compared our proposed method with numerous state-of-the-art methods, and the experimental results demonstrate that it significantly outperforms state-of-the-art approaches.


2021 ◽  
Vol 19 (6) ◽  
pp. 952-960
Author(s):  
Gisela De La Fuente Cortes ◽  
Jose Alejandro Diaz-Mendez ◽  
Guillermo Espinosa Flores-Verdad ◽  
Victor Rodolfo Gonzalez-Diaz

2021 ◽  
Vol 13 (2) ◽  
pp. 50
Author(s):  
Hamed Z. Jahromi ◽  
Declan Delaney ◽  
Andrew Hines

Content is a key influencing factor in Web Quality of Experience (QoE) estimation. A web user's satisfaction can be influenced by how long it takes to render and visualize the visible parts of the web page in the browser, referred to as the Above-the-fold (ATF) time. SpeedIndex (SI) has been widely used to estimate the perceived loading speed of ATF content and as a proxy metric for Web QoE estimation. Web application developers have been actively introducing innovative interactive features, such as animated and multimedia content, aiming to capture users' attention and improve the functionality and utility of web applications. However, the literature shows that, for websites with animated content, the ATF time estimated using state-of-the-art metrics may not accurately match the ATF completion time as perceived by users. This study introduces a new metric, Plausibly Complete Time (PCT), that estimates ATF time as perceived by users for websites with and without animations. PCT can be integrated with SI and web QoE models. The accuracy of the proposed metric is evaluated on two publicly available datasets. The proposed metric shows a high positive Spearman's correlation (rs = 0.89) with the perceived ATF time reported by users for websites with and without animated content. This study demonstrates that using PCT as a KPI in QoE estimation models can improve the robustness of QoE estimation in comparison to using the state-of-the-art ATF time metric. Furthermore, the experimental results show that estimating SI using PCT improves the robustness of SI for websites with animated content. The PCT estimation allows web application designers to identify where poor design has significantly increased ATF time and to refactor their implementation before it impacts the end-user experience.
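For context, the SpeedIndex that PCT builds on integrates visual incompleteness over time (the paper's PCT definition itself is not reproduced here). A minimal sketch of a SpeedIndex-style computation, with made-up sampling times and completeness values:

```python
def speed_index(samples):
    """Approximate a SpeedIndex-style score: the integral of
    (1 - visual completeness) over time, via the trapezoidal rule.
    `samples` is a list of (time_ms, completeness in [0, 1]) pairs."""
    si = 0.0
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        si += (t1 - t0) * (1.0 - (c0 + c1) / 2.0)
    return si

# Hypothetical page reaching full visual completeness at 2000 ms:
samples = [(0, 0.0), (1000, 0.8), (2000, 1.0)]
si = speed_index(samples)
```

Lower scores mean the visible content filled in sooner; an animation that keeps perturbing the completeness signal after the page is effectively done is exactly the case where such metrics mis-estimate perceived ATF time.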


Designs ◽  
2021 ◽  
Vol 5 (3) ◽  
pp. 42
Author(s):  
Eric Lazarski ◽  
Mahmood Al-Khassaweneh ◽  
Cynthia Howard

In recent years, disinformation and "fake news" have been spreading throughout the internet at rates never seen before. This has created the need for fact-checking organizations (groups that seek out claims and comment on their veracity) to spawn worldwide to stem the tide of misinformation. However, even with the many human-powered fact-checking organizations currently in operation, disinformation continues to run rampant throughout the Web, and the existing organizations are unable to keep up. This paper discusses in detail recent advances in using natural language processing to automate fact checking. It follows the entire process of automated fact checking, from detecting claims to checking facts to outputting results. In summary, automated fact checking works well in some cases, though generalized fact checking still needs improvement prior to widespread use.
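The claim-detection, checking, and output stages described above can be sketched schematically; every heuristic and name below is illustrative, not taken from any surveyed system:

```python
def detect_claims(text):
    """Toy claim detector: keep sentences that assert a number
    or contain a copula ('is'/'are'). Real systems use trained
    sentence classifiers for check-worthiness."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [s for s in sentences
            if any(ch.isdigit() for ch in s) or " is " in s or " are " in s]

def check_claim(claim, knowledge_base):
    """Toy checker: look the claim up against known statements.
    Real systems retrieve evidence and run textual entailment."""
    for fact, verdict in knowledge_base.items():
        if fact in claim:
            return verdict
    return "unverified"

knowledge_base = {"water boils at 100": "supported"}
claims = detect_claims("Water boils at 100 degrees Celsius. I like tea.")
verdicts = [check_claim(c.lower(), knowledge_base) for c in claims]
```

The point of the skeleton is the staged structure: claims are first filtered out of raw text, and only those are passed to the (much more expensive) verification step.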


2020 ◽  
pp. 1-21 ◽  
Author(s):  
Clément Dalloux ◽  
Vincent Claveau ◽  
Natalia Grabar ◽  
Lucas Emanuel Silva Oliveira ◽  
Claudia Maria Cabral Moro ◽  
...  

Abstract: The automatic detection of negated content is often a prerequisite in information extraction systems in various domains. This task is especially important in the biomedical domain, where negation plays a central role. In this work, two main contributions are proposed. First, we work with languages that have been poorly addressed up to now: Brazilian Portuguese and French. We developed new corpora for these two languages, manually annotated to mark up the negation cues and their scope. Second, we propose automatic methods based on supervised machine learning for the detection of negation cues and their scopes. The methods prove robust in both languages (Brazilian Portuguese and French) and in cross-domain (general and biomedical language) contexts. The approach is also validated on English data from the state of the art: it yields very good results and outperforms other existing approaches. In addition, the application is accessible and usable online. We expect that these contributions (new annotated corpora, an application accessible online, and cross-domain robustness) will improve the reproducibility of the results and the robustness of NLP applications.
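The paper's methods are supervised machine-learning models; purely to illustrate the task format (one cue label plus scope labels per token, in the spirit of BIO tagging), a toy rule-based tagger:

```python
def tag_negation(tokens, cues=frozenset({"not", "no", "pas", "não"})):
    """Toy negation tagger: mark cue tokens and, as a crude scope
    heuristic, label every following token (to the end of the clause)
    as in-scope. Trained sequence labelers replace both heuristics."""
    tags, in_scope = [], False
    for tok in tokens:
        low = tok.lower()
        if low in cues:
            tags.append("CUE")
            in_scope = True
        elif tok in {".", ","}:          # clause boundary ends the scope
            tags.append("O")
            in_scope = False
        else:
            tags.append("SCOPE" if in_scope else "O")
    return tags

tokens = ["The", "scan", "shows", "no", "sign", "of", "fracture", "."]
tags = tag_negation(tokens)
```

A supervised model learns both where cues occur (including multi-word cues) and how far their scope extends, which the fixed clause-boundary rule above only approximates.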


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
André D. Gomes ◽  
Jens Kobelke ◽  
Jörg Bierlich ◽  
Jan Dellith ◽  
Manfred Rothhardt ◽  
...  

Abstract: The optical Vernier effect consists of overlapping the responses of a sensing and a reference interferometer with slightly shifted interferometric frequencies. The beating modulation thus generated presents highly magnified sensitivity and resolution compared to the sensing interferometer alone, provided the two interferometers are slightly detuned from each other. However, the outcome of such a condition is a large beating modulation, immeasurable by conventional detection systems due to practical limitations of the usable spectral range. We propose a method to surpass this limitation by using a few-mode sensing interferometer instead of a single-mode one. The overlapped response of the different modes produces a measurable envelope, whilst preserving an extremely high magnification factor, an order of magnitude higher than current state-of-the-art performances. Furthermore, we demonstrate the application of this method in the development of a giant-sensitivity fibre refractometer with a sensitivity of around 500 µm/RIU (refractive index unit) and a magnification factor over 850.
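The magnification the abstract refers to follows the standard Vernier relation between the free spectral ranges (FSRs) of the two interferometers: the closer the two FSRs, the larger the magnification, and the wider the beating envelope. The numerical values below are illustrative, not the paper's:

```python
def vernier_magnification(fsr_sensing, fsr_reference):
    """Standard optical Vernier magnification factor:
    M = FSR_ref / |FSR_sens - FSR_ref|.
    M grows without bound as the two FSRs approach each other,
    which is exactly when the envelope becomes too wide to measure."""
    return fsr_reference / abs(fsr_sensing - fsr_reference)

# Hypothetical free spectral ranges in nm (not from the paper):
m = vernier_magnification(fsr_sensing=2.00, fsr_reference=1.98)
```

This trade-off is the limitation the paper targets: a few-mode sensing interferometer keeps the envelope measurable while retaining a very large M.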


ReCALL ◽  
1999 ◽  
Vol 11 (S1) ◽  
pp. 31-39
Author(s):  
Pierre-Yves Foucou ◽  
Natalie Kübler

In this paper, we present the Web-based CALL environment (or WALL) which is currently being experimented with at the University of Paris 13 in the Computer Science Department of the Institut Universitaire de Technologie. Our environment is being developed to teach computer science (CS) English to French-speaking CS students, and will be extended to other languages for specific purposes such as, for example, English or French for banking, law, economics or medicine, where on-line resources are available.

English, and more precisely CS English, is for our students a necessary tool, and not an object of study. The learning activities must therefore stimulate the students' interest and reflection about language phenomena. Our pedagogical objective, relying on research on acquisition (Wokusch 1997), consists in linking various texts together with other documents, such as different types of dictionaries or other types of texts, so that knowledge can be acquired using various appropriate contexts.

Language teachers are not supposed to be experts in fields such as computer science or economics. We aim at helping them to make use of the authentic documents that are related to the subject area in which they teach English. As shown in Foucou and Kübler (1998), the wide range of resources available on the Web can be processed to obtain corpora, i.e. teaching material. Our Web-based environment therefore provides teachers with a series of tools which enable them to access information about the selected specialist subject, select appropriate specialised texts, produce various types of learning activities and evaluate students' progress.

Commonly used textbooks for specialised English offer a wide range of learning activities, but they are based on documents that very quickly become obsolete, and that are sometimes widely modified. Moreover, they are not adaptable to the students' various levels of language. From the students' point of view, working on obsolete texts that are either too easy or too difficult can quickly become demotivating, not to say boring.

In the next section, we present the general architecture of the teaching/learning environment; the method of accessing and using it, for teachers as well as for students, is then described. The following section deals with the actual production of exercises and their limits. We conclude and present some possible research directions.

