Machine learning and EU data-sharing practices: Legal aspects of machine learning training datasets for AI systems

2021 ◽  
pp. 432-453
Author(s):  
Mauritz Kop
Author(s):  
Cahyo Trianggoro ◽  
Tupan Tupan

Research data sharing provides many benefits to the research ecosystem. In the Indonesian context, however, the lack of policy regulating research data sharing mechanisms makes researchers reluctant to share data. Research funders and research institutions play a critical role in developing data-sharing policies, and research on such policies is important for designing policies that encourage the practice. A systematic literature review was conducted to see how data-sharing policies were formulated and implemented in various research institutions. The data were taken from the Scopus and Dimensions indexes using controlled vocabulary. The roles of research institutions and funders, as well as policy instruments, were analyzed to identify patterns among the parties. We examined 23 articles containing data-sharing policies. It was found that funders have the greatest role in determining the design of the data-sharing policy. Funders view research data as an asset of publicly funded research, so its benefits must be returned to the community. Research institutes act as providers of the research infrastructure that contributes to data creation. Meanwhile, researchers, as research actors, need to provide input in developing data-sharing mechanisms and in regulating the data sensitivity and legal aspects of research data sharing.


2020 ◽  
Author(s):  
Johannes Kirchebner ◽  
Moritz Günther ◽  
Martina Sonnweber ◽  
Alice King ◽  
Steffen Lau

Abstract Background: Prolonged forensic psychiatric hospitalizations have raised ethical, economic, and clinical concerns. Due to the confounded nature of factors affecting length of stay of psychiatric offender patients, prior research has called for the application of a new statistical methodology better accommodating this data structure. The present study attempts to investigate factors contributing to long-term hospitalization of schizophrenic offenders referred to a Swiss forensic institution, using machine learning algorithms that are better suited than conventional methods to detect nonlinear dependencies between variables. Methods: In this retrospective file and registry study, multidisciplinary notes of 143 schizophrenic offenders were reviewed using a structured protocol on patients’ characteristics, criminal and medical history and course of treatment. Via a forward selection procedure, the most influential factors for length of stay were preselected. Machine learning algorithms then identified the most efficient model for predicting length-of-stay. Results: Two factors have been identified as being particularly influential for a prolonged forensic hospital stay, both of which are related to aspects of the index offense, namely (attempted) homicide and the extent of the victim's injury. The results are discussed in light of previous research on this topic. Conclusions: In this study, length of stay was determined by legal considerations, but not by factors that can be influenced therapeutically. Results emphasize that forensic risk assessments should be based on different evaluation criteria and not merely on legal aspects.


2022 ◽  
pp. 71-85
Author(s):  
Satvik Tripathi ◽  
Thomas Heinrich Musiolik

Artificial intelligence has a wide array of current and potential applications in healthcare and medicine. Ethical issues arising from algorithmic bias are among the greatest challenges to the generalizability of AI models today. The authors address safety and regulatory barriers that impede data sharing in medicine as well as potential changes to existing techniques and frameworks that might allow ethical data sharing for machine learning. With these developments in view, they also present different algorithmic models that are being used to develop machine learning-based medical systems that will potentially evolve to be free of sample, annotator, and temporal bias. These AI-based medical imaging models will then be fully implemented in healthcare facilities and institutions around the world, even in the remotest areas, making diagnosis and patient care both cheaper and freely accessible.


2017 ◽  
Vol 27 (7) ◽  
pp. 42-42
Author(s):  
Elliot Fry

2021 ◽  
Vol 11 (18) ◽  
pp. 8705
Author(s):  
Quanying Cheng ◽  
Yunqiang Zhu ◽  
Hongyun Zeng ◽  
Jia Song ◽  
Shu Wang ◽  
...  

Geospatial data sharing is an inevitable requirement for scientific and technological innovation and for economic and social development decisions in the era of big data. With the development of modern information technology, especially Web 2.0, a large number of geospatial data sharing websites (GDSW) have been developed on the Internet. A GDSW is a point of access to geospatial data that is able to provide a geospatial data inventory. Precisely identifying these data websites is the foundation and prerequisite of sharing and utilizing web geospatial data, and it is also the main challenge of data sharing at this stage. GDSW identification can be regarded as a binary website classification problem, which can be solved by currently popular machine learning methods. However, the websites obtained from the Internet include a large number of blogs, companies, institutions, etc. If these raw search results are used directly as the sample data for machine learning, classification precision suffers greatly. For this reason, this paper proposes a method to precisely identify GDSW by combining multi-source semantic information and machine learning. First, based on a keyword set, we used the Baidu search engine to find websites that may be related to geospatial data in the open web environment. Then, we used the multi-source semantic information of geospatial data content, morphology, sources, and shared websites to compute a comprehensive similarity and filter out the large number of search results that contained geospatial keywords but were not related to geospatial data. Finally, the filtered geospatial data websites were used as the sample data for machine learning, and the GDSWs were identified and evaluated.
In this paper, training sets are extracted from both the original search data and the data filtered by multi-source semantics, the two datasets are used to train machine learning classification algorithms (KNN, LR, RF, and SVM), and predictions are made on the same test datasets. The results show that: (1) Among the four classification algorithms, RF and SVM achieve higher classification precision on the original data than the other two. (2) When the data filtered by multi-source semantic information are used as the sample data, the precision of all classification algorithms improves greatly, and SVM has the highest precision of the four. (3) To verify the robustness of this method, different initial sample data are selected and classified using the same method. Among the four classification algorithms, the classification precision of SVM is still the highest, which shows that the proposed method is robust and scalable. Therefore, training on data filtered by multi-source semantic information can effectively improve the classification precision of GDSW identification, and SVM has the best classification effect of the four algorithms. In addition, the method has good robustness, which is of great significance for promoting and facilitating the sharing and utilization of open geospatial data.
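The comparison above is stated in terms of classification precision. As a minimal sketch, assuming the abstract means the standard precision metric for a binary classifier (true positives divided by all predicted positives), the calculation looks like the following. The function name, labels, and predictions are hypothetical illustrations and are not taken from the paper.

```python
from typing import Sequence


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return TP / (TP + FP) for binary labels, where 1 marks a GDSW."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # True positives: sites that are GDSW and were predicted as GDSW.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    # False positives: non-GDSW sites (blogs, companies, ...) predicted as GDSW.
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fp == 0:
        return 0.0  # no positive predictions; report 0.0 by convention here
    return tp / (tp + fp)


# Hypothetical ground truth and classifier output for six websites.
labels = [1, 1, 0, 0, 1, 0]
predictions = [1, 0, 1, 0, 1, 0]
score = precision(labels, predictions)
print(f"precision = {score:.3f}")
```

Under this definition, removing unrelated websites from the training sample would be expected to reduce false positives, which is the direction of improvement the abstract reports.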


Author(s):  
Ankit Khushal Barai ◽  
Robin Singh Bhadoria ◽  
Jyotshana Bagwari ◽  
Ivan A. Perl

Conventional machine learning (ML) needs centralized training data to be present on a given machine or datacenter. Healthcare, finance, and other institutions where data sharing is prohibited require an approach for training ML models in a secured architecture. Recently, techniques such as federated learning (FL), MIT Media Lab's split neural networks, and blockchain have aimed to address the privacy and regulation of data. However, there are differences between the design principles of FL and the requirements of institutions such as healthcare and finance, which need blockchain-orchestrated FL with the following features: clients with local data can define access policies for their data and define how updated weights are encrypted between the workers and the aggregator using blockchain technology; the system prepares audit trail logs of actions undertaken within the network; and it keeps the actual list of participants hidden. This is expected to remove barriers in a range of sectors including healthcare, finance, security, logistics, governance, operations, and manufacturing.
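The aggregation step mentioned above, in which workers send updated weights to an aggregator, can be illustrated with a minimal sketch of federated averaging. This assumes a plain sample-weighted average of client weight vectors; the abstract does not specify the aggregation rule, and the encryption, access-policy, and blockchain audit components it describes are deliberately left out. The function name and the client data are hypothetical.

```python
from typing import List, Sequence, Tuple


def federated_average(updates: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Average client weight vectors, weighting each by its local sample count.

    Each update is (weights, n_samples). Only weights leave the client;
    the raw training data never does.
    """
    if not updates:
        raise ValueError("at least one client update is required")
    total_samples = sum(n for _, n in updates)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(updates[0][0])
    averaged = [0.0] * dim
    for weights, n_samples in updates:
        if len(weights) != dim:
            raise ValueError("all clients must send vectors of the same length")
        for i, w in enumerate(weights):
            averaged[i] += w * n_samples / total_samples
    return averaged


# Hypothetical updates from three clients: (local weights, local sample count).
client_updates = [
    ([1.0, 2.0], 10),
    ([3.0, 4.0], 30),
    ([5.0, 6.0], 60),
]
global_weights = federated_average(client_updates)
print(global_weights)
```

In a design like the one the abstract outlines, each update would additionally be encrypted in transit and logged for audit before the aggregator combines it.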

