How data-sharing nudges influence people's privacy preferences: A machine learning-based analysis

Author(s):  
Yang Lu ◽  
Shujun Li ◽  
Alex Freitas ◽  
Athina Ioannou
2015 ◽  
Vol 8 (4) ◽  
pp. 64
Author(s):  
Anas A. Hadi ◽  
Jonathan Cazalas

Location-based services are among the fastest-growing technologies: millions of users share their locations through their smart devices. The popularity of such applications, which allow others to access a user's location, raises many privacy issues. Users can set their location privacy preferences manually, but many struggle to configure them properly. One solution is to use machine learning-based methods to predict location privacy preferences automatically, although such models suffer degraded performance when training data is insufficient. Another solution is to make the decision for the intended user based on opinions collected from similar users; user-user collaborative filtering (CF) is an example of this category. In this paper, we introduce an improved machine learning-based predictor. The results show significant performance improvements: accuracy rose from 75.30% to 84.82%, while privacy leakage fell from 11.75% to 7.65%. We also introduce an integrated model that combines machine learning-based and collaborative filtering-based methods to gain the advantages of both.
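The user-user CF idea the abstract mentions can be sketched in a few lines: predict an undecided user's share/deny choice as the similarity-weighted vote of users who have already set that preference. This is a minimal pure-Python illustration of generic user-user CF, not the paper's actual predictor; all function names and the cosine-similarity choice are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two 0/1 preference vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_preference(target, others, item):
    """Predict the target user's choice (1 = share, 0 = deny) for `item`
    as the similarity-weighted vote of users who already set that item.
    Unset preferences are None and are treated as 0 when comparing users."""
    num = den = 0.0
    target_vec = [p or 0 for p in target]
    for prefs in others:
        if prefs[item] is None:
            continue  # this neighbor has not set the item either
        w = cosine(target_vec, [p or 0 for p in prefs])
        num += w * prefs[item]
        den += w
    return num / den if den else 0.5  # no usable neighbors: uncertain
```

For example, a user whose known settings match a neighbor who shares a given item would get a prediction near 1 (share) for that item.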


2022 ◽  
pp. 71-85
Author(s):  
Satvik Tripathi ◽  
Thomas Heinrich Musiolik

Artificial intelligence has a huge array of current and potential applications in healthcare and medicine. Ethical issues arising from algorithmic biases are among the greatest challenges to the generalizability of AI models today. The authors address safety and regulatory barriers that impede data sharing in medicine, as well as potential changes to existing techniques and frameworks that might allow ethical data sharing for machine learning. With these developments in view, they also present different algorithmic models being used to develop machine learning-based medical systems that may eventually be free of sample, annotator, and temporal bias. These AI-based medical imaging models could then be implemented in healthcare facilities and institutions around the world, even in the remotest areas, making diagnosis and patient care both cheaper and more accessible.


2021 ◽  
Vol 11 (18) ◽  
pp. 8705
Author(s):  
Quanying Cheng ◽  
Yunqiang Zhu ◽  
Hongyun Zeng ◽  
Jia Song ◽  
Shu Wang ◽  
...  

Geospatial data sharing is an inevitable requirement for scientific and technological innovation and for economic and social development decisions in the era of big data. With the development of modern information technology, especially Web 2.0, a large number of geospatial data sharing websites (GDSWs) have appeared on the Internet. A GDSW is a point of access to geospatial data that provides a geospatial data inventory. Precisely identifying these websites is the foundation and prerequisite of sharing and utilizing web geospatial data, and is also the main challenge of data sharing at this stage. GDSW identification can be regarded as a binary website classification problem, solvable with popular machine learning methods. However, websites retrieved from the Internet include many blogs, companies, institutions, etc.; using the raw search results directly as machine learning sample data greatly harms classification precision. For this reason, this paper proposes a method to precisely identify GDSWs by combining multi-source semantic information and machine learning. Firstly, based on a keyword set, we used the Baidu search engine to find websites potentially related to geospatial data in the open web environment. Then, we used multi-source semantic information about geospatial data content, morphology, sources, and shared websites to filter out, via a comprehensive similarity calculation, the many websites in the search results that contained geospatial keywords but were unrelated to geospatial data. Finally, the filtered geospatial data websites were used as the machine learning sample data, and the GDSWs were identified and evaluated.
In this paper, training sets are extracted both from the original search data and from the data filtered by multi-source semantics; the two datasets are used to train machine learning classification algorithms (KNN, LR, RF, and SVM), and the same test datasets are predicted. The results show that: (1) among the four classification algorithms, RF and SVM achieve higher classification precision on the original data than the other two. (2) Taking the data filtered by multi-source semantic information as the sample data for machine learning greatly improves the precision of all classification algorithms, with SVM achieving the highest precision of the four. (3) To verify the robustness of this method, different initial sample data were selected and classified using the same method; SVM again achieved the highest classification precision, showing that the proposed method is robust and scalable. Therefore, training on data filtered by multi-source semantic information can effectively improve the classification precision of GDSW identification, and among the four classification algorithms, SVM has the best classification effect. In addition, the method's robustness is of great significance for promoting and facilitating the sharing and utilization of open geospatial data.
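The evaluation pipeline described above (train a classifier on labeled websites, then measure precision on held-out sites) can be sketched with a tiny pure-Python k-NN over binary website features. This is an illustrative stand-in for the paper's KNN/LR/RF/SVM experiments; the feature encoding and all names are assumptions, not taken from the paper.

```python
from collections import Counter

def hamming(a, b):
    """Distance between two binary feature vectors (e.g. has geospatial
    keywords, offers data downloads, exposes a dataset inventory)."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train, x, k=3):
    """train: list of (features, label); label 1 = GDSW, 0 = other site."""
    nearest = sorted(train, key=lambda t: hamming(t[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def precision(train, test, k=3):
    """Precision on the positive (GDSW) class, the metric the paper reports."""
    preds = [(knn_predict(train, x, k), y) for x, y in test]
    tp = sum(1 for p, y in preds if p == 1 and y == 1)
    fp = sum(1 for p, y in preds if p == 1 and y == 0)
    return tp / (tp + fp) if tp + fp else 0.0
```

The paper's key finding maps onto this sketch as: semantic filtering cleans `train` of keyword-matching but irrelevant sites, which raises `precision` on the same `test` set.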


Author(s):  
Ankit Khushal Barai ◽  
Robin Singh Bhadoria ◽  
Jyotshana Bagwari ◽  
Ivan A. Perl

Conventional machine learning (ML) needs centralized training data to be present on a given machine or datacenter. Healthcare, finance, and other institutions where data sharing is prohibited require an approach for training ML models in a secured architecture. Recently, techniques such as federated learning (FL), MIT Media Lab's Split Neural Networks, and blockchain have aimed to address privacy and regulation of data. However, there are differences between the design principles of FL and the requirements of institutions such as healthcare and finance, which call for blockchain-orchestrated FL with the following features: clients with their local data can define access policies to that data; they can define how updated weights are encrypted between the workers and the aggregator using blockchain technology; audit-trail logs of actions undertaken within the network are prepared; and the actual list of participants is kept hidden. This is expected to remove barriers in a range of sectors including healthcare, finance, security, logistics, governance, operations, and manufacturing.
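The core FL step referenced above is that an aggregator combines locally trained weights without ever seeing raw data. Below is a minimal pure-Python sketch of the standard federated averaging (FedAvg) update; the blockchain orchestration, encryption, and audit logging the abstract describes are deliberately omitted, and the function name is illustrative.

```python
def fed_avg(client_weights, client_sizes):
    """Weighted average of client model weights (FedAvg).

    client_weights: one flat weight list per client, trained locally.
    client_sizes:   number of local training samples per client, used
                    so larger datasets contribute proportionally more.
    The aggregator only ever sees weights, never the clients' raw data.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]
```

In the blockchain-orchestrated variant the abstract sketches, each client's weight update would additionally be encrypted in transit and its submission recorded on-chain for the audit trail.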


2020 ◽  
Vol 50 (5) ◽  
pp. 325-339
Author(s):  
Xiaojia Guo ◽  
Yael Grushka-Cockayne ◽  
Bert De Reyck

Improving airport collaborative decision making is at the heart of airport operations centers (APOCs) recently established in several major European airports. In this paper, we describe a project commissioned by Eurocontrol, the organization in charge of the safety and seamless flow of European air traffic. The project’s goal was to examine the opportunities offered by the colocation and real-time data sharing in the APOC at London’s Heathrow airport, arguably the most advanced of its type in Europe. We developed and implemented a pilot study of a real-time data-sharing and collaborative decision-making process, selected to improve the efficiency of Heathrow’s operations. In this paper, we describe the process of how we chose the subject of the pilot, namely the improvement of transfer-passenger flows through the airport, and how we helped Heathrow move from its existing legacy system for managing passenger flows to an advanced machine learning–based approach using real-time inputs. The system, which is now in operation at Heathrow, can predict which passengers are likely to miss their connecting flights, reducing the likelihood that departures will incur delays while waiting for delayed passengers. This can be done by off-loading passengers in advance, by expediting passengers through the airport, or by modifying the departure times of aircraft in advance. By aggregating estimated passenger arrival time at various points throughout the airport, the system also improves passenger experiences at the immigration and security desks by enabling modifications to staffing levels in advance of expected surges in arrivals. The nine-stage framework we present here can support the development and implementation of other real-time, data-driven systems. To the best of our knowledge, the proposed system is the first to use machine learning to model passenger flows in an airport.
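The prediction task at the heart of the Heathrow system, flagging passengers likely to miss their connections from real-time inputs, can be sketched as a simple probabilistic score on the buffer between time remaining and estimated transfer time. This is a toy illustration under assumed names and an assumed logistic form, not Eurocontrol's or Heathrow's actual model.

```python
import math

def miss_probability(minutes_to_departure, transfer_minutes_needed, k=0.2):
    """Logistic estimate of missed-connection risk from the buffer between
    the predicted transfer time and the time remaining before departure.
    k is an illustrative slope; a real model would be fit to flight data."""
    buffer = minutes_to_departure - transfer_minutes_needed
    return 1.0 / (1.0 + math.exp(k * buffer))

def flag_at_risk(passengers, threshold=0.5):
    """passengers: (id, minutes_to_departure, transfer_minutes_needed).
    Return IDs whose estimated miss probability exceeds the threshold,
    so staff can expedite or off-load them in advance."""
    return [pid for pid, left, need in passengers
            if miss_probability(left, need) > threshold]
```

Aggregating such per-passenger estimates over time windows is what would feed the staffing adjustments at immigration and security that the abstract describes.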


Author(s):  
Carlos Sáez ◽  
Nekane Romero ◽  
J Alberto Conejero ◽  
Juan M García-Gómez

Objective: The lack of representative coronavirus disease 2019 (COVID-19) data is a bottleneck for reliable and generalizable machine learning. Data sharing is insufficient without data quality, in which source variability plays an important role. We showcase and discuss potential biases from data source variability for COVID-19 machine learning. Materials and Methods: We used the publicly available nCov2019 dataset, including patient-level data from several countries. We aimed at discovering and classifying severity subgroups using symptoms and comorbidities. Results: Cases from the 2 countries with the highest prevalence were divided into separate subgroups with distinct severity manifestations. This variability can reduce the representativeness of training data with respect to the model's target populations and increase model complexity, at risk of overfitting. Conclusions: Data source variability is a potential contributor to bias in distributed research networks. We call for systematic assessment and reporting of data source variability and data quality in COVID-19 data sharing, as key information for reliable and generalizable machine learning.
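A minimal way to surface the source variability the abstract warns about is to compare outcome rates per data source before pooling. The sketch below is an illustrative pure-Python check, not the paper's subgroup-discovery method; names and the gap statistic are assumptions.

```python
def severity_rate_by_source(records):
    """records: (source, severe) pairs, severe being True/False or 1/0.
    Returns the severe-case rate per source (e.g. per contributing country)."""
    counts = {}
    for source, severe in records:
        n, s = counts.get(source, (0, 0))
        counts[source] = (n + 1, s + int(severe))
    return {src: s / n for src, (n, s) in counts.items()}

def max_source_gap(records):
    """Largest gap between per-source severity rates: a crude red flag
    that pooled training data may not represent any single target
    population, one of the biases the paper describes."""
    rates = severity_rate_by_source(records).values()
    return max(rates) - min(rates)
```

A large gap suggests reporting per-source statistics alongside shared data, in line with the paper's call for systematic data-quality reporting.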

