How data-sharing nudges influence people's privacy preferences: A machine learning-based analysis

Author(s):  
Yang Lu ◽  
Shujun Li ◽  
Alex Freitas ◽  
Athina Ioannou
2015 ◽  
Vol 8 (4) ◽  
pp. 64
Author(s):  
Anas A. Hadi ◽  
Jonathan Cazalas

Location-based services are among the fastest-growing technologies: millions of users share their locations through their smart devices. The popularity of such applications, which allow others to access a user's location, raises many privacy issues. Users can set their location privacy preferences manually, but many struggle to configure them properly. One solution is to use machine learning-based methods to predict location privacy preferences automatically, although such models suffer degraded performance when training data is insufficient. Another solution is to make the decision for the intended user based on opinions collected from similar users; user-user collaborative filtering (CF) is an example of this category. In this paper, we introduce an improved machine learning-based predictor. The results show significant performance improvements: accuracy rose from 75.30% to 84.82%, while privacy leakage fell from 11.75% to 7.65%. We also introduce an integrated model that combines machine learning-based and collaborative filtering-based methods to gain the advantages of both.
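The user-user CF idea the abstract mentions can be sketched in a few lines: predict an undecided user's share/deny choice as the similarity-weighted vote of users who have already set that preference. This is a minimal pure-Python illustration of generic user-user CF, not the paper's actual predictor; all function names and the cosine-similarity choice are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two 0/1 preference vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_preference(target, others, item):
    """Predict the target user's choice (1 = share, 0 = deny) for `item`
    as the similarity-weighted vote of users who already set that item.
    Unset preferences are None and are treated as 0 when comparing users."""
    num = den = 0.0
    target_vec = [p or 0 for p in target]
    for prefs in others:
        if prefs[item] is None:
            continue  # this neighbor has not set the item either
        w = cosine(target_vec, [p or 0 for p in prefs])
        num += w * prefs[item]
        den += w
    return num / den if den else 0.5  # no usable neighbors: uncertain
```

For example, a user whose known settings match a neighbor who shares a given item would get a prediction near 1 (share) for that item.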


2022 ◽  
pp. 71-85
Author(s):  
Satvik Tripathi ◽  
Thomas Heinrich Musiolik

Artificial intelligence has a huge array of current and potential applications in healthcare and medicine. Ethical issues arising from algorithmic biases are among the greatest challenges to the generalizability of AI models today. The authors address safety and regulatory barriers that impede data sharing in medicine, as well as potential changes to existing techniques and frameworks that might allow ethical data sharing for machine learning. With these developments in view, they also present different algorithmic models being used to develop machine learning-based medical systems that may eventually be free of sample, annotator, and temporal bias. These AI-based medical imaging models could then be implemented in healthcare facilities and institutions around the world, even in the remotest areas, making diagnosis and patient care both cheaper and more accessible.


2021 ◽  
Vol 11 (18) ◽  
pp. 8705
Author(s):  
Quanying Cheng ◽  
Yunqiang Zhu ◽  
Hongyun Zeng ◽  
Jia Song ◽  
Shu Wang ◽  
...  

Geospatial data sharing is an inevitable requirement for scientific and technological innovation and for economic and social development decisions in the era of big data. With the development of modern information technology, especially Web 2.0, a large number of geospatial data sharing websites (GDSWs) have appeared on the Internet. A GDSW is a point of access to geospatial data that provides a geospatial data inventory. Precisely identifying these websites is the foundation and prerequisite of sharing and utilizing web geospatial data, and is also the main challenge of data sharing at this stage. GDSW identification can be regarded as a binary website classification problem, solvable with popular machine learning methods. However, websites retrieved from the Internet include many blogs, companies, institutions, etc.; using the raw search results directly as machine learning sample data greatly harms classification precision. For this reason, this paper proposes a method to precisely identify GDSWs by combining multi-source semantic information and machine learning. Firstly, based on a keyword set, we used the Baidu search engine to find websites potentially related to geospatial data in the open web environment. Then, we used multi-source semantic information about geospatial data content, morphology, sources, and shared websites to filter out, via a comprehensive similarity calculation, the many websites in the search results that contained geospatial keywords but were unrelated to geospatial data. Finally, the filtered geospatial data websites were used as the machine learning sample data, and the GDSWs were identified and evaluated.
In this paper, training sets are extracted both from the original search data and from the data filtered by multi-source semantics; the two datasets are used to train machine learning classification algorithms (KNN, LR, RF, and SVM), and the same test datasets are predicted. The results show that: (1) among the four classification algorithms, RF and SVM achieve higher classification precision on the original data than the other two. (2) Taking the data filtered by multi-source semantic information as the sample data for machine learning greatly improves the precision of all classification algorithms, with SVM achieving the highest precision of the four. (3) To verify the robustness of this method, different initial sample data were selected and classified using the same method; SVM again achieved the highest classification precision, showing that the proposed method is robust and scalable. Therefore, training on data filtered by multi-source semantic information can effectively improve the classification precision of GDSW identification, and among the four classification algorithms, SVM has the best classification effect. In addition, the method's robustness is of great significance for promoting and facilitating the sharing and utilization of open geospatial data.
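The evaluation pipeline described above (train a classifier on labeled websites, then measure precision on held-out sites) can be sketched with a tiny pure-Python k-NN over binary website features. This is an illustrative stand-in for the paper's KNN/LR/RF/SVM experiments; the feature encoding and all names are assumptions, not taken from the paper.

```python
from collections import Counter

def hamming(a, b):
    """Distance between two binary feature vectors (e.g. has geospatial
    keywords, offers data downloads, exposes a dataset inventory)."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train, x, k=3):
    """train: list of (features, label); label 1 = GDSW, 0 = other site."""
    nearest = sorted(train, key=lambda t: hamming(t[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def precision(train, test, k=3):
    """Precision on the positive (GDSW) class, the metric the paper reports."""
    preds = [(knn_predict(train, x, k), y) for x, y in test]
    tp = sum(1 for p, y in preds if p == 1 and y == 1)
    fp = sum(1 for p, y in preds if p == 1 and y == 0)
    return tp / (tp + fp) if tp + fp else 0.0
```

The paper's key finding maps onto this sketch as: semantic filtering cleans `train` of keyword-matching but irrelevant sites, which raises `precision` on the same `test` set.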


Author(s):  
Ankit Khushal Barai ◽  
Robin Singh Bhadoria ◽  
Jyotshana Bagwari ◽  
Ivan A. Perl

Conventional machine learning (ML) needs centralized training data to be present on a given machine or datacenter. Healthcare, finance, and other institutions where data sharing is prohibited require an approach for training ML models in a secured architecture. Recently, techniques such as federated learning (FL), MIT Media Lab's Split Neural Networks, and blockchain have aimed to address privacy and regulation of data. However, there are differences between the design principles of FL and the requirements of institutions such as healthcare and finance, which call for blockchain-orchestrated FL with the following features: clients with their local data can define access policies to that data; they can define how updated weights are encrypted between the workers and the aggregator using blockchain technology; audit-trail logs of actions undertaken within the network are prepared; and the actual list of participants is kept hidden. This is expected to remove barriers in a range of sectors including healthcare, finance, security, logistics, governance, operations, and manufacturing.
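The core FL step referenced above is that an aggregator combines locally trained weights without ever seeing raw data. Below is a minimal pure-Python sketch of the standard federated averaging (FedAvg) update; the blockchain orchestration, encryption, and audit logging the abstract describes are deliberately omitted, and the function name is illustrative.

```python
def fed_avg(client_weights, client_sizes):
    """Weighted average of client model weights (FedAvg).

    client_weights: one flat weight list per client, trained locally.
    client_sizes:   number of local training samples per client, used
                    so larger datasets contribute proportionally more.
    The aggregator only ever sees weights, never the clients' raw data.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]
```

In the blockchain-orchestrated variant the abstract sketches, each client's weight update would additionally be encrypted in transit and its submission recorded on-chain for the audit trail.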


2020 ◽  
Vol 50 (5) ◽  
pp. 325-339
Author(s):  
Xiaojia Guo ◽  
Yael Grushka-Cockayne ◽  
Bert De Reyck

Improving airport collaborative decision making is at the heart of airport operations centers (APOCs) recently established in several major European airports. In this paper, we describe a project commissioned by Eurocontrol, the organization in charge of the safety and seamless flow of European air traffic. The project’s goal was to examine the opportunities offered by the colocation and real-time data sharing in the APOC at London’s Heathrow airport, arguably the most advanced of its type in Europe. We developed and implemented a pilot study of a real-time data-sharing and collaborative decision-making process, selected to improve the efficiency of Heathrow’s operations. In this paper, we describe the process of how we chose the subject of the pilot, namely the improvement of transfer-passenger flows through the airport, and how we helped Heathrow move from its existing legacy system for managing passenger flows to an advanced machine learning–based approach using real-time inputs. The system, which is now in operation at Heathrow, can predict which passengers are likely to miss their connecting flights, reducing the likelihood that departures will incur delays while waiting for delayed passengers. This can be done by off-loading passengers in advance, by expediting passengers through the airport, or by modifying the departure times of aircraft in advance. By aggregating estimated passenger arrival time at various points throughout the airport, the system also improves passenger experiences at the immigration and security desks by enabling modifications to staffing levels in advance of expected surges in arrivals. The nine-stage framework we present here can support the development and implementation of other real-time, data-driven systems. To the best of our knowledge, the proposed system is the first to use machine learning to model passenger flows in an airport.
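The prediction task at the heart of the Heathrow system, flagging passengers likely to miss their connections from real-time inputs, can be sketched as a simple probabilistic score on the buffer between time remaining and estimated transfer time. This is a toy illustration under assumed names and an assumed logistic form, not Eurocontrol's or Heathrow's actual model.

```python
import math

def miss_probability(minutes_to_departure, transfer_minutes_needed, k=0.2):
    """Logistic estimate of missed-connection risk from the buffer between
    the predicted transfer time and the time remaining before departure.
    k is an illustrative slope; a real model would be fit to flight data."""
    buffer = minutes_to_departure - transfer_minutes_needed
    return 1.0 / (1.0 + math.exp(k * buffer))

def flag_at_risk(passengers, threshold=0.5):
    """passengers: (id, minutes_to_departure, transfer_minutes_needed).
    Return IDs whose estimated miss probability exceeds the threshold,
    so staff can expedite or off-load them in advance."""
    return [pid for pid, left, need in passengers
            if miss_probability(left, need) > threshold]
```

Aggregating such per-passenger estimates over time windows is what would feed the staffing adjustments at immigration and security that the abstract describes.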


Author(s):  
Carlos Sáez ◽  
Nekane Romero ◽  
J Alberto Conejero ◽  
Juan M García-Gómez

Objective: The lack of representative coronavirus disease 2019 (COVID-19) data is a bottleneck for reliable and generalizable machine learning. Data sharing is insufficient without data quality, in which source variability plays an important role. We showcase and discuss potential biases from data source variability for COVID-19 machine learning. Materials and Methods: We used the publicly available nCov2019 dataset, including patient-level data from several countries. We aimed at discovering and classifying severity subgroups using symptoms and comorbidities. Results: Cases from the 2 countries with the highest prevalence were divided into separate subgroups with distinct severity manifestations. This variability can reduce the representativeness of training data with respect to the model's target populations and increase model complexity, at risk of overfitting. Conclusions: Data source variability is a potential contributor to bias in distributed research networks. We call for systematic assessment and reporting of data source variability and data quality in COVID-19 data sharing, as key information for reliable and generalizable machine learning.
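A minimal way to surface the source variability the abstract warns about is to compare outcome rates per data source before pooling. The sketch below is an illustrative pure-Python check, not the paper's subgroup-discovery method; names and the gap statistic are assumptions.

```python
def severity_rate_by_source(records):
    """records: (source, severe) pairs, severe being True/False or 1/0.
    Returns the severe-case rate per source (e.g. per contributing country)."""
    counts = {}
    for source, severe in records:
        n, s = counts.get(source, (0, 0))
        counts[source] = (n + 1, s + int(severe))
    return {src: s / n for src, (n, s) in counts.items()}

def max_source_gap(records):
    """Largest gap between per-source severity rates: a crude red flag
    that pooled training data may not represent any single target
    population, one of the biases the paper describes."""
    rates = severity_rate_by_source(records).values()
    return max(rates) - min(rates)
```

A large gap suggests reporting per-source statistics alongside shared data, in line with the paper's call for systematic data-quality reporting.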

