Data Curation Challenges for Artificial Intelligence

Author(s):  
Ken Chang ◽  
Mishka Gidwani ◽  
Jay B. Patel ◽  
Matthew D. Li ◽  
Jayashree Kalpathy-Cramer
2020 ◽  
Vol 29 (01) ◽  
pp. 051-057 ◽  
Author(s):  
Siaw-Teng Liaw ◽  
Harshana Liyanage ◽  
Craig Kuziemsky ◽  
Amanda L. Terry ◽  
Richard Schreiber ◽  
...  

Summary

Objective: To create practical recommendations for the curation of routinely collected health data and artificial intelligence (AI) in primary care with a focus on ensuring their ethical use.

Methods: We defined data curation as the process of management of data throughout its lifecycle to ensure it can be used into the future. We used a literature review and Delphi exercises to capture insights from the Primary Care Informatics Working Group (PCIWG) of the International Medical Informatics Association (IMIA).

Results: We created six recommendations: (1) Ensure consent and formal process to govern access and sharing throughout the data life cycle; (2) Sustainable data creation/collection requires trust and permission; (3) Pay attention to Extract-Transform-Load (ETL) processes as they may have unrecognised risks; (4) Integrate data governance and data quality management to support clinical practice in integrated care systems; (5) Recognise the need for new processes to address the ethical issues arising from AI in primary care; (6) Apply an ethical framework mapped to the data life cycle, including an assessment of data quality to achieve effective data curation.

Conclusions: The ethical use of data needs to be integrated within the curation process, hence running throughout the data lifecycle. Current information systems may not fully detect the risks associated with ETL and AI; they need careful scrutiny. With distributed integrated care systems where data are often used remote from documentation, harmonised data quality assessment, management, and governance are important. These recommendations should help maintain trust and connectedness in contemporary information systems and planned developments.


2021 ◽  
pp. 026119292110296
Author(s):  
Vinicius M. Alves ◽  
Scott S. Auerbach ◽  
Nicole Kleinstreuer ◽  
John P. Rooney ◽  
Eugene N. Muratov ◽  
...  

New Approach Methodologies (NAMs) that employ artificial intelligence (AI) for predicting adverse effects of chemicals have generated optimistic expectations as alternatives to animal testing. However, the major underappreciated challenge in developing robust and predictive AI models is the impact of the quality of the input data on the model accuracy. Indeed, poor data reproducibility and quality have been frequently cited as factors contributing to the crisis in biomedical research, as well as similar shortcomings in the fields of toxicology and chemistry. In this article, we review the most recent efforts to improve confidence in the robustness of toxicological data and investigate the impact that data curation has on the confidence in model predictions. We also present two case studies demonstrating the effect of data curation on the performance of AI models for predicting skin sensitisation and skin irritation. We show that, whereas models generated with uncurated data had a 7–24% higher correct classification rate (CCR), the perceived performance was, in fact, inflated owing to the high number of duplicates in the training set. We assert that data curation is a critical step in building computational models, to help ensure that reliable predictions of chemical toxicity are achieved through use of the models.
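The duplicate problem described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not say how duplicates were identified or resolved, so the structure key (for example a canonical SMILES string), the rule of dropping compounds with conflicting labels, and the function names `correct_classification_rate`, `deduplicate`, and `leaked_keys` are all assumptions made for illustration. CCR is taken here as the mean of sensitivity and specificity, which is a common definition but is not stated in the abstract.

```python
from collections import defaultdict


def correct_classification_rate(y_true, y_pred):
    """CCR, assumed here to be the mean of sensitivity and specificity (1 = positive)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != 1 and p != 1)
    return 0.5 * (tp / pos + tn / neg)


def deduplicate(records):
    """Keep one record per structure key; drop keys whose labels disagree.

    `records` is a list of (key, label) pairs. The key is a hypothetical
    canonical identifier for the compound (an assumption, not from the source).
    """
    labels_by_key = defaultdict(set)
    for key, label in records:
        labels_by_key[key].add(label)
    return [
        (key, next(iter(labels)))
        for key, labels in labels_by_key.items()
        if len(labels) == 1
    ]


def leaked_keys(train, test):
    """Return the keys that appear in both the training and the test records."""
    train_keys = {key for key, _ in train}
    return sorted({key for key, _ in test if key in train_keys})


# Hypothetical toy data: one exact duplicate and one compound with conflicting labels.
records = [
    ("CCO", 1),
    ("CCO", 1),
    ("c1ccccc1", 0),
    ("CC(=O)O", 1),
    ("CC(=O)O", 0),
]
curated = deduplicate(records)
```

Running `leaked_keys` on a train/test split made before deduplication shows how many test compounds the model has already seen during training. A CCR computed on such a split can look better than it is, which is the kind of inflation the abstract reports.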


2019 ◽  
Vol 1 (6) ◽  
pp. e180095 ◽  
Author(s):  
Mutlu Demirer ◽  
Sema Candemir ◽  
Matthew T. Bigelow ◽  
Sarah M. Yu ◽  
Vikash Gupta ◽  
...  

Author(s):  
David L. Poole ◽  
Alan K. Mackworth
