Automated Extraction and Presentation of Data Practices in Privacy Policies

2021 ◽  
Vol 2021 (2) ◽  
pp. 88-110
Author(s):  
Duc Bui ◽  
Kang G. Shin ◽  
Jong-Min Choi ◽  
Junbum Shin

Abstract: Privacy policies are documents, required by laws and regulations, that notify users of the collection, use, and sharing of their personal information by services or applications. While the extraction of personal data objects and the practices applied to them is one of the fundamental steps in automated policy analysis, it remains challenging due to complex policy statements written in vague, legal language. Prior work is limited by small or generated datasets and manually created rules. We formulate the extraction of fine-grained personal data phrases and the corresponding data collection or sharing practices as a sequence-labeling problem that can be solved by an entity-recognition model. We create a large dataset with 4.1k sentences (97k tokens) and 2.6k annotated fine-grained data practices from 30 real-world privacy policies to train and evaluate neural networks. We present a fully automated system, called PI-Extract, which accurately extracts privacy practices with a neural model and outperforms strong rule-based baselines by a large margin. We conduct a user study on the effects of data-practice annotation, which highlights and describes the data practices extracted by PI-Extract, to help users better understand privacy-policy documents. Our experimental results show that the annotation significantly improves users' reading comprehension of policy texts, as indicated by a 26.6% increase in the average total reading score.
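
To make the sequence-labeling formulation concrete, the sketch below decodes BIO-style token tags from an entity-recognition model into labeled data-practice spans. The tag names (COLLECT, SHARE) and the example sentence are illustrative assumptions, not PI-Extract's actual label set.

```python
# Minimal sketch: decode BIO tags from a token-level entity-recognition
# model into (data phrase, practice) spans. The tag names below are
# illustrative assumptions, not PI-Extract's actual label set.

def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (phrase, practice) spans."""
    spans, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            current.append(token)
        else:
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans

tokens = ["We", "collect", "your", "email", "address", "and",
          "share", "your", "location", "with", "partners", "."]
tags   = ["O", "O", "B-COLLECT", "I-COLLECT", "I-COLLECT", "O",
          "O", "B-SHARE", "I-SHARE", "O", "O", "O"]
print(decode_bio(tokens, tags))
# [('your email address', 'COLLECT'), ('your location', 'SHARE')]
```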

2020 ◽  
Vol 2020 (1) ◽  
pp. 47-64 ◽  
Author(s):  
Thomas Linden ◽  
Rishabh Khandelwal ◽  
Hamza Harkous ◽  
Kassem Fawaz

Abstract: The EU General Data Protection Regulation (GDPR) is one of the most demanding and comprehensive privacy regulations of all time. A year after it went into effect, we study its impact on the landscape of online privacy policies. We conduct the first longitudinal, in-depth, and at-scale assessment of privacy policies before and after the GDPR. We gauge the complete consumption cycle of these policies, from first user impressions to compliance assessment. We create a diverse corpus of two sets of 6,278 unique English-language privacy policies from inside and outside the EU, covering their pre-GDPR and post-GDPR versions. The results of our tests and analyses suggest that the GDPR has been a catalyst for a major overhaul of privacy policies inside and outside the EU. This overhaul, manifesting in extensive textual changes, especially for EU-based websites, brings mixed benefits to users. While the privacy policies have become considerably longer, our user study with 470 participants on Amazon MTurk indicates a significant improvement in the visual representation of privacy policies of EU websites from the users' perspective. We further develop a new workflow for the automated assessment of requirements in privacy policies. Using this workflow, we show that privacy policies cover more data practices and are more consistent with seven compliance requirements after the GDPR. We also assess how transparent organizations are about their privacy practices by performing specificity analysis. In this analysis, we find evidence of positive changes triggered by the GDPR, with the specificity level improving on average. Still, we find the landscape of privacy policies to be in a transitional phase; many policies still do not meet several key GDPR requirements, or their improved coverage comes with reduced specificity.
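
As a simplified illustration of what an automated requirement-assessment workflow might check, the sketch below flags whether a policy's text appears to address a few GDPR requirements. The requirement names and keyword cues are assumptions for illustration; the paper's workflow relies on trained classifiers rather than keyword matching.

```python
# Simplified sketch of an automated requirement-coverage check over a
# privacy policy, in the spirit of the paper's assessment workflow.
# Requirement names and keyword cues are illustrative assumptions.
import re

REQUIREMENT_CUES = {
    "right_to_erasure": r"delete|erasure|remove your (personal )?data",
    "data_portability": r"portability|export your data|machine-readable",
    "lawful_basis": r"legal basis|legitimate interest|consent",
}

def coverage(policy_text: str) -> dict:
    """Return which compliance requirements the policy appears to address."""
    text = policy_text.lower()
    return {req: bool(re.search(cue, text))
            for req, cue in REQUIREMENT_CUES.items()}

policy = ("You may request that we delete your data. "
          "We process data based on your consent or our legitimate interest.")
print(coverage(policy))
# {'right_to_erasure': True, 'data_portability': False, 'lawful_basis': True}
```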


Author(s):  
Hye-Chung Kum ◽  
Gurudev Ilangovan ◽  
Qinbo Li ◽  
Yumei Li ◽  
Eric Ragan

Introduction. Privacy-enhancing technologies (PET) measure and protect privacy by preventing unnecessary use of personal data without loss of functionality of the information system. In practice, implementing such a system requires fine-grained access control so that access can be granted in smaller chunks of data. Objectives and Approach. In record linkage, PET has to date mostly meant separating identifiers from sensitive information to allow access to only the necessary part. Moving beyond this norm, we have designed a privacy-enhanced interface for linkage that discloses only the information needed to make good decisions, at the sub-variable level and only when needed, to reduce exposure of personally identifiable information (PII). The system grants access to PII both at (1) the cell level (e.g., only the names of the people in question are released) and (2) the sub-cell level (e.g., only part of a name, a suffix or some characters, is released). Results. In a user study (N=104) where participants tried to link complicated cases (e.g., twins, Sr/Jr, change of last name) using the interface, we found that users given fully masked data (0% of information disclosed) were still able to reach 75% accuracy using supplemental visual markup. The markups depict data discrepancies such as swapped first and last names, transposed characters, differing characters, and missing data. More importantly, with this effective interface we found no statistically significant difference in linkage accuracy (84%) or time taken between users with access to all data and those with access to only 30% of the data. We have released a tutorial where users can experience the balance between information disclosure and accuracy of results on sample data. Conclusion/Implications. Privacy is a major public concern when PII is legitimately accessed to link data. Our study demonstrates that a well-designed privacy-enhanced interface can significantly reduce the exposure of PII when resolving ambiguous linkages, without compromising linkage quality. This research points to a new direction for PET in record linkage beyond encryption.
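
A minimal sketch of the two disclosure mechanisms described above, assuming a simple prefix-based masking rule and a toy discrepancy check; the study's actual interface and visual markup are more sophisticated.

```python
# Minimal sketch of sub-cell-level disclosure for record linkage:
# reveal only a fraction of each field's characters and flag simple
# discrepancies (e.g., swapped first/last names) so a reviewer can
# resolve ambiguous links with little raw PII. The masking rule and
# markup checks are illustrative assumptions, not the study's interface.

def mask(value: str, disclosed: float = 0.3) -> str:
    """Reveal roughly `disclosed` fraction of characters, mask the rest."""
    n_show = round(len(value) * disclosed)
    return value[:n_show] + "*" * (len(value) - n_show)

def markup(rec_a: dict, rec_b: dict) -> list:
    """Flag discrepancies between two candidate records without showing PII."""
    notes = []
    if (rec_a["first"], rec_a["last"]) == (rec_b["last"], rec_b["first"]):
        notes.append("swapped first/last names")
    for field in ("first", "last", "dob"):
        if rec_a[field] != rec_b[field]:
            notes.append(f"mismatch in {field}")
    return notes

a = {"first": "Maria", "last": "Garcia", "dob": "1980-01-02"}
b = {"first": "Garcia", "last": "Maria", "dob": "1980-01-02"}
print({k: mask(v) for k, v in a.items()})  # partially disclosed view
print(markup(a, b))  # ['swapped first/last names', 'mismatch in first', ...]
```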


Author(s):  
Devjani Sen ◽  
Rukhsana Ahmed

With a growing number of health and wellness applications (apps), there is a need to explore exactly what third parties can legally do with personal data. Following a review of the online privacy policies of a select set of mobile health and fitness apps, this chapter assessed the privacy policies of four popular health and fitness apps using a checklist comprising five privacy risk categories. Privacy risks were based on two questions: (a) is important information missing that is needed to make informed decisions about the use of personal data? and (b) is information being shared that might compromise the end-user's right to privacy of that information? The online privacy policy of each selected app was further examined to identify important privacy risks. From this, a separate checklist was completed and compared to reach agreement on the presence or absence of each privacy risk category. The chapter concludes with a set of recommendations for designing privacy policies that govern the sharing of personal information collected from health and fitness apps.
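
As an illustration of how two completed checklists might be compared for agreement, the sketch below computes Cohen's kappa over binary presence/absence ratings; the category names and the choice of statistic are assumptions, since the chapter describes a qualitative comparison rather than a specific metric.

```python
# Minimal sketch of comparing two reviewers' privacy-risk checklists
# for one app and computing chance-corrected agreement. Category names
# and the use of Cohen's kappa are illustrative assumptions.

CATEGORIES = ["missing_info", "third_party_sharing", "unclear_retention",
              "no_opt_out", "weak_security_claims"]

reviewer_a = {"missing_info": True, "third_party_sharing": True,
              "unclear_retention": False, "no_opt_out": True,
              "weak_security_claims": False}
reviewer_b = {"missing_info": True, "third_party_sharing": False,
              "unclear_retention": False, "no_opt_out": True,
              "weak_security_claims": False}

def cohen_kappa(a: dict, b: dict, cats: list) -> float:
    """Chance-corrected agreement for binary presence/absence ratings."""
    n = len(cats)
    observed = sum(a[c] == b[c] for c in cats) / n
    p_yes_a = sum(a[c] for c in cats) / n
    p_yes_b = sum(b[c] for c in cats) / n
    expected = p_yes_a * p_yes_b + (1 - p_yes_a) * (1 - p_yes_b)
    return (observed - expected) / (1 - expected)

print(f"kappa = {cohen_kappa(reviewer_a, reviewer_b, CATEGORIES):.2f}")
```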


E-methodology ◽  
2019 ◽  
Vol 5 (5) ◽  
pp. 100-112
Author(s):  
Paolo Di Sia

Aim. In recent years, social networks have multiplied on the Internet and become ever more widely used, raising doubts about the security of user privacy. This exponential growth has also attracted the attention of malicious actors. The aim of this research is to understand how "attack algorithms" can violate the privacy of millions of people despite privacy policies that do not permit their use. Methods. Building on an analysis of password security on Facebook, I evaluate the problems connected with the use of an attack algorithm in relation to privacy and security. Results. Over the years, Facebook's privacy policies have changed, but with new services it is still possible to trace personal information. Using targeted phishing techniques, it is possible to obtain the access credentials of a substantial percentage of users. This allows attackers to perform online transactions and to view bank accounts and their transactions, call details, credit card numbers, and many other personal data. Conclusions. While we await the power of the future quantum Internet, it is unfortunately possible today to launch an attack exploiting the analysed techniques, and even to improve them to reach higher success rates, thus placing a very high number of users in serious danger.


Author(s):  
Devjani Sen ◽  
Rukhsana Ahmed

Personal applications (apps) collect all sorts of personal information, such as name, email address, age, height, weight, and, in some cases, detailed health information. When using such apps, many users trustfully log everything from diet to sleep patterns. Studies suggest that many applications do not have a privacy policy, or that users cannot access an app's permissions before downloading it to their mobile device. This raises questions regarding the ethics of sharing personal data gathered from health and fitness apps with third parties. Despite the important role of informed consent in the creation of health and fitness mobile applications, the intersection of ethics and the sharing of personal information is understudied and often ignored during the creation of mobile applications. After reviewing the online privacy policies of four mobile health and fitness apps, this chapter concludes with a set of recommendations for designing privacy policies that govern the sharing of personal information collected from health and fitness apps.


2016 ◽  
Vol 58 (5) ◽  
Author(s):  
Florian Schaub ◽  
Travis D. Breaux ◽  
Norman Sadeh

Abstract: Privacy policies are supposed to provide transparency about a service's data practices and help consumers make informed choices about which services to entrust with their personal information. In practice, those privacy policies are typically long, complex documents that are largely ignored by consumers. Even for regulators and data protection authorities, privacy policies are difficult to assess at scale. Crowdsourcing offers the potential to scale the analysis of privacy policies with microtasks, for instance by assessing how specific data practices are addressed in privacy policies or by extracting information about data practices of interest, which can then facilitate further analysis or be provided to users in more effective notice formats. Crowdsourcing the analysis of complex privacy-policy documents to non-expert crowdworkers poses particular challenges. We discuss best practices, lessons learned, and research challenges for crowdsourcing privacy policy analysis.
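
One common way to combine non-expert microtask judgments is majority voting; the sketch below aggregates crowdworker answers for a single policy segment. The question framing and answer set are illustrative assumptions, not the paper's actual task design.

```python
# Minimal sketch of aggregating crowdworker microtask answers by
# majority vote, one common way to combine non-expert judgments such
# as "does this policy segment address data retention?". The question
# and answer set are illustrative assumptions.
from collections import Counter

def aggregate(answers: list) -> tuple:
    """Return the majority answer and the fraction of workers agreeing."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

# Five workers judge one policy segment for one data practice.
worker_answers = ["yes", "yes", "no", "yes", "unsure"]
print(aggregate(worker_answers))  # ('yes', 0.6)
```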


2021 ◽  
Vol 11 (4) ◽  
pp. 1762
Author(s):  
David Sánchez ◽  
Alexandre Viejo ◽  
Montserrat Batet

To comply with the EU General Data Protection Regulation (GDPR), companies managing personal data have been forced to review their privacy policies. However, privacy policies will not solve any problems as long as users do not read them or are unable to understand them. To assist users with both issues, we present a system that automatically assesses privacy policies. Our proposal quantifies the degree of policy compliance with the data protection goals stated by the GDPR and presents clear and intuitive privacy scores to the user. In this way, users become immediately aware of the risks associated with a service and their severity, which empowers them to make informed decisions when accepting (or not) the terms of a service. We leverage manual annotations and machine learning to train a model that automatically tags privacy policies according to their compliance (or not) with the data protection goals of the GDPR. In contrast with related works, we define clear annotation criteria consistent with the GDPR, which enables us to provide not only aggregated scores but also fine-grained ratings that help users understand the reasons for the assessment. The latter is aligned with the concept of explainable artificial intelligence. We have applied our method to the policies of 10 well-known internet services. Our scores are sound and consistent with the results reported in related works.
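
As a rough sketch of how per-goal compliance tags could be turned into fine-grained ratings plus one aggregated score, consider the following; the goal names, tag values, and unweighted average are assumptions, not the paper's actual scoring model.

```python
# Minimal sketch: turn per-goal compliance tags into fine-grained
# ratings and one aggregated privacy score. Goal names, tag values,
# and the unweighted average are illustrative assumptions; the paper
# derives its tags from a trained ML model.

TAG_VALUE = {"compliant": 1.0, "partially_compliant": 0.5,
             "non_compliant": 0.0, "not_mentioned": 0.0}

# Tags a trained classifier might assign to one policy, per GDPR goal.
policy_tags = {
    "purpose_limitation": "compliant",
    "data_minimisation": "partially_compliant",
    "storage_limitation": "not_mentioned",
    "right_to_erasure": "compliant",
}

fine_grained = {goal: TAG_VALUE[tag] for goal, tag in policy_tags.items()}
aggregate_score = sum(fine_grained.values()) / len(fine_grained)

print(fine_grained)                       # per-goal ratings (explainability)
print(f"overall: {aggregate_score:.2f}")  # overall: 0.62
```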


Author(s):  
Sandra C. Henderson ◽  
Charles A. Snyder ◽  
Terry A. Byrd

Electronic commerce (e-commerce) has had a profound effect on the way we conduct business. It has affected economies, markets, industry structures, and the flow of products through the supply chain. Despite the phenomenal growth of e-commerce and its potential impact on business revenues, there are problems with the capabilities of this technology. Organizations are amassing huge quantities of personal data about consumers. As a result, consumers are very concerned about the protection of their personal information and want something done about the problem. This study examined the relationships between consumer privacy concerns, actual e-commerce activity, the importance of privacy policies, and regulatory preference. Using a model developed from existing literature and theory, an online questionnaire was developed to gauge the concerns of consumers. The results indicated that consumers are concerned about the protection of their personal information and feel that privacy policies are important. Consumers also indicated that they preferred government regulation over industry self-regulation to protect their personal information.

