Unifying Privacy Policy Detection

2021 ◽  
Vol 2021 (4) ◽  
pp. 480-499
Author(s):  
Henry Hosseini ◽  
Martin Degeling ◽  
Christine Utz ◽  
Thomas Hupperich

Abstract Privacy policies have become a focal point of privacy research. With their goal to reflect the privacy practices of a website, service, or app, they are often the starting point for researchers who analyze the accuracy of claimed data practices, user understanding of practices, or control mechanisms for users. Due to vast differences in structure, presentation, and content, it is often challenging to extract privacy policies from online resources like websites for analysis. In the past, researchers have relied on scrapers tailored to the specific analysis or task, which complicates comparing results across different studies. To unify future research in this field, we developed a toolchain to process website privacy policies and prepare them for research purposes. The core part of this chain is a detector module for English and German, using natural language processing and machine learning to automatically determine whether given texts are privacy or cookie policies. We leverage multiple existing data sets to refine our approach, evaluate it on a recently published longitudinal corpus, and show that it contains a number of misclassified documents. We believe that unifying data preparation for the analysis of privacy policies can help make different studies more comparable and is a step towards more thorough analyses. In addition, we provide insights into common pitfalls that may lead to invalid analyses.

Data ◽  
2021 ◽  
Vol 6 (5) ◽  
pp. 52
Author(s):  
Maria Nefeli Nikiforos ◽  
Yorghos Voutos ◽  
Anthi Drougani ◽  
Phivos Mylonas ◽  
Katia Lida Kermanidis

Mining social web text has been at the heart of the Natural Language Processing and Data Mining research community in the last 15 years. Though most of the reported work is on widely spoken languages, such as English, the significance of approaches that deal with less commonly spoken languages, such as Greek, is evident for reasons of preserving and documenting minority languages, cultural and ethnic diversity, and identifying intercultural similarities and differences. The present work aims at identifying, documenting and comparing social text data sets, as well as mining techniques and applications on social web text that target Modern Greek, focusing on the arising challenges and the potential for future research in the specific less widely spoken language.


2021 ◽  
Vol 5 ◽  
Author(s):  
Ina Vandebroek ◽  
David Picking ◽  
Jessica Tretina ◽  
Jason West ◽  
Michael Grizzle ◽  
...  

Jamaican root tonics are fermented beverages made with the roots, bark, vines (and dried leaves) of several plant species, many of which are wild-harvested in forest areas of this Caribbean island. These tonics are popular across Jamaica, and also appreciated among the Jamaican diaspora in the United States, Canada, and the United Kingdom. Although plants are the focal point of the ethnobotany of root tonics, interviews with 99 knowledgeable Jamaicans across five parishes of the island, with the goal of documenting their knowledge, perceptions, beliefs, and oral histories, showed that studying these tonics solely from a natural sciences perspective would serve as an injustice to the important sociocultural dimensions and symbolism that surround their use. Jamaican explanations about root tonics are filled with metaphorical expressions about the reciprocity between the qualities of “nature” and the strength of the human body. Furthermore, testimonies about the perceived cultural origins, and reasons for using root tonics, provided valuable insights into the extent of human hardship endured historically during slavery, and the continued struggle experienced by many Jamaicans living a subsistence lifestyle today. On the other hand, the popularity of root tonics is also indicative of the resilience of hard-working Jamaicans, and their quest for bodily and mental strength and health in dealing with socioeconomic and other societal challenges. Half of all study participants considered Rastafari the present-day knowledge holders of Jamaican root tonics. Even though these tonics represent a powerful informal symbol of Jamaican biocultural heritage, they lack official recognition and development for the benefit of local producers and vendors. We therefore used a sustainable development conceptual framework consisting of social, cultural, economic, and ecological pillars, to design a road map for a cottage industry for these artisanal producers. 
The four steps of this road map (growing production, growing alliances, transitioning into the formal economy, and safeguarding ecological sustainability) provide a starting point for future research and applied projects to promote this biocultural heritage product prepared with Neglected and Underutilized Species (NUS) of plants.


Author(s):  
Ellen Poplavska ◽  
Thomas B. Norton ◽  
Shomir Wilson ◽  
Norman Sadeh

The European Union’s General Data Protection Regulation (GDPR) has compelled businesses and other organizations to update their privacy policies to state specific information about their data practices. Simultaneously, researchers in natural language processing (NLP) have developed corpora and annotation schemes for extracting salient information from privacy policies, often independently of specific laws. To connect existing NLP research on privacy policies with the GDPR, we introduce a mapping from GDPR provisions to the OPP-115 annotation scheme, which serves as the basis for a growing number of projects to automatically classify privacy policy text. We show that assumptions made in the annotation scheme about the essential topics for a privacy policy reflect many of the same topics that the GDPR requires in these documents. This suggests that OPP-115 continues to be representative of the anatomy of a legally compliant privacy policy, and that the legal assumptions behind it represent the elements of data processing that ought to be disclosed within a policy for transparency. The correspondences we show between OPP-115 and the GDPR suggest the feasibility of bridging existing computational and legal research on privacy policies, benefiting both areas.


2021 ◽  
Vol 13 (3) ◽  
pp. 1589
Author(s):  
Juan Sánchez-Fernández ◽  
Luis-Alberto Casado-Aranda ◽  
Ana-Belén Bastidas-Manzano

The limitations of self-report techniques (i.e., questionnaires or surveys) in measuring consumer response to advertising stimuli have necessitated more objective and accurate tools from the fields of neuroscience and psychology for the study of consumer behavior, resulting in the creation of consumer neuroscience. This recent marketing sub-field stems from a wide range of disciplines and applies multiple types of techniques to diverse advertising subdomains (e.g., advertising constructs, media elements, or prediction strategies). Due to its complex nature and continuous growth, this area of research calls for a clear understanding of its evolution, current scope, and potential domains in the field of advertising. Thus, this current research is among the first to apply a bibliometric approach to clarify the main research streams analyzing advertising persuasion using neuroimaging. Particularly, this paper combines a comprehensive review with performance analysis tools of 203 papers published between 1986 and 2019 in outlets indexed by the ISI Web of Science database. Our findings describe the research tools, journals, and themes that are worth considering in future research. The current study also provides an agenda for future research and therefore constitutes a starting point for advertising academics and professionals intending to use neuroimaging techniques.


2021 ◽  
Vol 13 (4) ◽  
pp. 2121
Author(s):  
Ingrid Vigna ◽  
Angelo Besana ◽  
Elena Comino ◽  
Alessandro Pezzoli

Although increasing concern about climate change has raised awareness of the fundamental role of forest ecosystems, forests are threatened by human-induced impacts worldwide. Among them, wildfire risk is clearly the result of the interaction between human activities, ecological domains, and climate. However, a clear understanding of these interactions is still needed both at the global and local levels. Numerous studies have proven the validity of the socioecological system (SES) approach in addressing this kind of interdisciplinary issue. Therefore, a systematic review of the existing literature on the application of SES frameworks to forest ecosystems is carried out, with a specific focus on wildfire risk management. The results demonstrate the existence of different methodological approaches that can be grouped into seven main categories, which range from qualitative analysis to quantitative spatially explicit investigations. The strengths and limitations of the approaches are discussed, with a specific reference to the geographical setting of the works. The research suggests the importance of local community involvement and local knowledge consideration in wildfire risk management. This review provides a starting point for future research on forest SES and a supporting tool for the development of a sustainable wildfire risk adaptation and mitigation strategy.


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 664
Author(s):  
Nikos Kanakaris ◽  
Nikolaos Giarelis ◽  
Ilias Siachos ◽  
Nikos Karacapilidis

We consider the prediction of future research collaborations as a link prediction problem applied to a scientific knowledge graph. To the best of our knowledge, this is the first work on the prediction of future research collaborations that combines structural and textual information of a scientific knowledge graph through a purposeful integration of graph algorithms and natural language processing techniques. Our work: (i) investigates whether the integration of unstructured textual data into a single knowledge graph affects the performance of a link prediction model, (ii) studies the effect of previously proposed graph-kernel-based approaches on the performance of an ML model, as far as the link prediction problem is concerned, and (iii) proposes a three-phase pipeline that enables the exploitation of structural and textual information, as well as of pre-trained word embeddings. We benchmark the proposed approach against classical link prediction algorithms using accuracy, recall, and precision as our performance metrics. Finally, we empirically test our approach through various feature combinations with respect to the link prediction problem. Our experiments with the new COVID-19 Open Research Dataset demonstrate a significant improvement in the abovementioned performance metrics in the prediction of future research collaborations.
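
To make the notion of a classical link prediction baseline concrete, the following minimal sketch scores unlinked author pairs by the Jaccard coefficient of their neighbor sets and then computes precision and recall. It is an illustration only, not the pipeline described in the abstract: the toy co-authorship graph, the 0.5 threshold, the "future" link set, and the function names `jaccard_scores` and `precision_recall` are all assumptions introduced here.

```python
from itertools import combinations


def jaccard_scores(adjacency):
    """Score each currently unlinked node pair by the Jaccard coefficient
    of the two nodes' neighbor sets (a classical link prediction heuristic)."""
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # the pair is already linked, nothing to predict
        union = adjacency[u] | adjacency[v]
        if union:
            scores[(u, v)] = len(adjacency[u] & adjacency[v]) / len(union)
    return scores


def precision_recall(predicted, actual):
    """Precision and recall of a set of predicted links against the true links."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical toy co-authorship graph (undirected, stored as neighbor sets).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

scores = jaccard_scores(graph)
# Assumed decision threshold: predict a link when the score is at least 0.5.
predicted = {pair for pair, score in scores.items() if score >= 0.5}
# Assumed ground truth: the collaborations that actually appear later.
future_links = {("a", "d")}
precision, recall = precision_recall(predicted, future_links)
```

A heuristic of this kind uses only graph structure; the abstract's contribution is to combine such structural signals with textual information.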


2021 ◽  
pp. 1-13
Author(s):  
Qingtian Zeng ◽  
Xishi Zhao ◽  
Xiaohui Hu ◽  
Hua Duan ◽  
Zhongying Zhao ◽  
...  

Word embeddings have been successfully applied in many natural language processing tasks due to their effectiveness. However, the state-of-the-art algorithms for learning word representations from large amounts of text documents ignore emotional information, which is a significant research problem that must be addressed. To solve this problem, we propose an emotional word embedding (EWE) model for sentiment analysis in this paper. This method first applies pre-trained word vectors to represent document features using two different linear weighting methods. Then, the resulting document vectors are input to a classification model and used to train a text sentiment classifier, which is based on a neural network. In this way, the emotional polarity of the text is propagated into the word vectors. The experimental results on three kinds of real-world data sets demonstrate that the proposed EWE model achieves superior performance on text sentiment prediction, text similarity calculation, and word emotional expression tasks compared to other state-of-the-art models.
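
The first step, representing a document as a linear weighting of pre-trained word vectors, can be sketched as follows. The abstract does not say which two weighting methods the authors use, so uniform weights and term-frequency weights are assumed here purely for illustration; the two-dimensional toy vectors and the function name `weighted_document_vector` are likewise hypothetical.

```python
from collections import Counter


def weighted_document_vector(weights, word_vectors):
    """Combine word vectors into one document vector as a weighted average.

    `weights` maps each token to a non-negative weight; tokens without a
    pre-trained vector are ignored.
    """
    known = {tok: w for tok, w in weights.items() if tok in word_vectors}
    total_weight = sum(known.values())
    if total_weight == 0:
        raise ValueError("no weighted token has a word vector")
    dim = len(next(iter(word_vectors.values())))
    doc_vector = [0.0] * dim
    for tok, w in known.items():
        for i, component in enumerate(word_vectors[tok]):
            doc_vector[i] += w * component / total_weight
    return doc_vector


# Hypothetical 2-dimensional "pre-trained" vectors, for illustration only.
word_vectors = {"good": [1.0, 0.0], "bad": [0.0, 1.0]}
tokens = ["good", "good", "good", "bad"]

# Assumed weighting 1: every distinct token counts equally.
uniform = weighted_document_vector({tok: 1.0 for tok in set(tokens)}, word_vectors)
# Assumed weighting 2: each token is weighted by its frequency in the document.
by_frequency = weighted_document_vector(Counter(tokens), word_vectors)
```

In the approach described above, document vectors like these would then be fed to a neural sentiment classifier; that training step is not shown here.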


2021 ◽  
pp. 002203452110120
Author(s):  
C. Gluck ◽  
S. Min ◽  
A. Oyelakin ◽  
M. Che ◽  
E. Horeth ◽  
...  

The parotid, submandibular, and sublingual glands represent a trio of oral secretory glands whose primary function is to produce saliva, facilitate digestion of food, provide protection against microbes, and maintain oral health. While recent studies have begun to shed light on the global gene expression patterns and profiles of salivary glands, particularly those of mice, relatively little is known about the location and identity of transcriptional control elements. Here we have established the epigenomic landscape of the mouse submandibular salivary gland (SMG) by performing chromatin immunoprecipitation sequencing experiments for 4 key histone marks. Our analysis of the comprehensive SMG data sets and comparisons with those from other adult organs have identified critical enhancers and super-enhancers of the mouse SMG. By further integrating these findings with complementary RNA-sequencing based gene expression data, we have unearthed a number of molecular regulators such as members of the Fox family of transcription factors that are enriched and likely to be functionally relevant for SMG biology. Overall, our studies provide a powerful atlas of cis-regulatory elements that can be leveraged for better understanding the transcriptional control mechanisms of the mouse SMG, discovery of novel genetic switches, and modulating tissue-specific gene expression in a targeted fashion.


2021 ◽  
Vol 14 (2) ◽  
pp. 1-36
Author(s):  
Theja K. Arachchi ◽  
Laurianne Sitbon ◽  
Jinglan Zhang ◽  
Ruwan Gamage ◽  
Priyantha Hewagamage

This article presents how young adults with intellectual disability (ID) from Sri Lanka, who had not previously used the Internet, interacted with Google search while enhancing their web search abilities throughout three web search workshops. Considering the little attention paid to the learning needs of people with ID in the current offering of web search learning tools, we iteratively developed a suite of learning tools to support our participants when they needed help in the web search workshops. We employed an iterative participatory approach, with observations and semi-structured interviews, to reflect on how to design eLearning tools that enhance the participants’ interactions with web search. The qualitative thematic analysis resulted in five distinct themes on strategies to support, build on, and develop the abilities of young adults with ID as they engage with Google search in their native language: application of existing abilities, basic skills to match learning needs, conceptual understanding, animations to facilitate visual memory, and promoting active engagement. These themes will be a starting point for understanding participants’ learning needs and behavior on web search, which would be important for future research on learning support as well as on software design.


2021 ◽  
Vol 54 (3) ◽  
pp. 1-35
Author(s):  
Boubakr Nour ◽  
Hakima Khelifi ◽  
Rasheed Hussain ◽  
Spyridon Mastorakis ◽  
Hassine Moungla

Information-Centric Networking (ICN) has recently emerged as a prominent candidate for the Future Internet Architecture (FIA) that addresses existing issues with the host-centric communication model of the current TCP/IP-based Internet. Named Data Networking (NDN) is one of the most recent and active ICN architectures that provides a clean-slate approach for Internet communication. NDN provides intrinsic content security where security is directly provided to the content instead of the communication channel. Among other security aspects, Access Control (AC) rules specify the privileges for the entities that can access the content. In TCP/IP-based AC systems, due to the client-server communication model, the servers control which clients can access particular content. In contrast, ICN-based networks use content names to drive communication and decouple the content from its original location. This phenomenon leads to the loss of control over the content, causing different challenges for the realization of efficient AC mechanisms. To date, considerable efforts have been made to develop various AC mechanisms in NDN. In this article, we provide a detailed and comprehensive survey of the AC mechanisms in NDN. We follow a holistic approach towards AC in NDN where we first summarize the ICN paradigm, describe the changes from channel-based security to content-based security, and highlight different cryptographic algorithms and security protocols in NDN. We then classify the existing AC mechanisms into two main categories: Encryption-based AC and Encryption-independent AC. Each category has different classes based on the working principle of AC (e.g., Attribute-based AC, Name-based AC, Identity-based AC). Finally, we present the lessons learned from the existing AC mechanisms and identify the challenges of NDN-based AC at large, highlighting future research directions for the community.

