Optimizing Data Quality Issues in Process Mining to Maximize Valuable Customer Service (Journal of Computational and Theoretical Nanoscience, Vol. 16(5/6), pp. 2259–2264 (2019))

While noting the importance of data quality, existing process mining methodologies (i) do not provide details on how to assess the quality of event data (ii) do not consider how the identification of data quality issues can be exploited in the planning, data extraction and log building phases of any process mining analysis, (iii) do not highlight potential impacts of poor quality data on different types of process analyses. As our key contribution, we develop a process-centric, data quality-driven approach to preparing for a process mining analysis which can be applied to any existing process mining methodology. Our approach, adapted from elements of the well known CRISP-DM data mining methodology, includes conceptual data modeling, quality assessment at both attribute and event level, and trial discovery and conformance to develop understanding of system processes and data properties to inform data extraction. We illustrate our approach in a case study involving the Queensland Ambulance Service (QAS) and Retrieval Services Queensland (RSQ). We describe the detailed preparation for a process mining analysis of retrieval and transport processes (ground and aero-medical) for road-trauma patients in Queensland. Sample datasets obtained from QAS and RSQ are utilised to show how quality metrics, data models and exploratory process mining analyses can be used to (i) identify data quality issues, (ii) anticipate and explain certain observable features in process mining analyses, (iii) distinguish between systemic and occasional quality issues, and (iv) reason about the mechanisms by which identified quality issues may have arisen in the event log. We contend that this knowledge can be used to guide the data extraction and pre-processing stages of a process mining case study to properly align the data with the case study research questions.

Download Full-text

Getting There: Evidence-Based Decision-Making in Road Trauma Prehospital Transport and Care in Queensland

Prehospital and Disaster Medicine ◽

10.1017/s1049023x19001432 ◽

2019 ◽

Vol 34 (s1) ◽

pp. s64-s65

Author(s):

Robert Andrews ◽

Moe Wynn ◽

Arthur ter Hofstede ◽

Kirsten Vallmuur ◽

Emma Bosley ◽

...

Keyword(s):

Emergency Department ◽

Data Quality ◽

Data Science ◽

Hospital Admissions ◽

Process Mining ◽

Transport Processes ◽

Process Models ◽

Actual Behavior ◽

Patient Pathways ◽

Quality Issues

Introduction:Process mining, a branch of data science, aims at deriving an understanding of process behaviors from data collected during executions of the process. In this study, we apply process mining techniques to examine retrieval and transport of road trauma patients in Queensland. Specifically, we use multiple datasets collected from ground and air ambulance, emergency department, and hospital admissions to investigate the various patient pathways and transport modalities from accident to definitive care.Aim:The project aims to answer the question, “Are we providing the right level of care to patients?” We focus on (i) automatically discovering, from historical records, the different care and transport processes, and (ii) identifying and quantifying factors influencing deviance from standard processes, e.g. mechanisms of injury and geospatial (crash and trauma facility) considerations.Methods:We adapted the Cross-Industry Standard Process for Data Mining methodology to Queensland Ambulance Service, Retrieval Services Queensland (aero-medical), and Queensland Health (emergency department and hospital admissions) data. Data linkage and “case” definition emerged as particular challenges. We developed detailed data models, conduct a data quality assessment, and preliminary process mining analyses.Results:Preliminary results only with full results are presented at the conference. A collection of process models, which revealed multiple transport pathways, were automatically discovered from pilot data. Conformance checking showed some variations from expected processing. Systematic analysis of data quality allowed us to distinguish between systemic and occasional quality issues, and anticipate and explain certain observable features in process mining analyses. Results will be validated with domain experts to ensure insights are accurate and actionable.Discussion:Preliminary analysis unearthed challenging data quality issues that impact the use of historical retrieval data for secondary analysis. The automatically discovered process models will facilitate comparison of actual behavior with existing guidelines.

Download Full-text

The assessment of data quality issues for process mining in healthcare using Medical Information Mart for Intensive Care III, a freely available e-health record database

Health Informatics Journal ◽

10.1177/1460458218810760 ◽

2018 ◽

Vol 25 (4) ◽

pp. 1878-1893 ◽

Cited By ~ 11

Author(s):

Angelina Prima Kurniati ◽

Eric Rojas ◽

David Hogg ◽

Geoff Hall ◽

Owen A Johnson

Keyword(s):

Intensive Care ◽

Electronic Health Record ◽

Data Quality ◽

Medical Information ◽

Process Mining ◽

Data Access ◽

Health Record ◽

Healthcare Process ◽

Quality Issues ◽

Electronic Health

There is a growing body of literature on process mining in healthcare. Process mining of electronic health record systems could give benefit into better understanding of the actual processes happened in the patient treatment, from the event log of the hospital information system. Researchers report issues of data access approval, anonymisation constraints, and data quality. One solution to progress methodology development is to use a high-quality, freely available research dataset such as Medical Information Mart for Intensive Care III, a critical care database which contains the records of 46,520 intensive care unit patients over 12 years. Our article aims to (1) explore data quality issues for healthcare process mining using Medical Information Mart for Intensive Care III, (2) provide a structured assessment of Medical Information Mart for Intensive Care III data quality and challenge for process mining, and (3) provide a worked example of cancer treatment as a case study of process mining using Medical Information Mart for Intensive Care III to illustrate an approach and solution to data quality challenges. The electronic health record software was upgraded partway through the period over which data was collected and we use this event to explore the link between electronic health record system design and resulting process models.

Download Full-text

Thinking about police data: Analysts’ perceptions of data quality in Canadian policing

The Police Journal Theory Practice and Principles ◽

10.1177/0032258x211021461 ◽

2021 ◽

pp. 0032258X2110214

Author(s):

Christopher D O’Connor ◽

John Ng ◽

Dallas Hill ◽

Tyler Frederick

Keyword(s):

Big Data ◽

Data Collection ◽

Data Quality ◽

Research Culture ◽

Police Services ◽

Police Data ◽

Data Collection And Analysis ◽

Quality Issues

Policing is increasingly being shaped by data collection and analysis. However, we still know little about the quality of the data police services acquire and utilize. Drawing on a survey of analysts from across Canada, this article examines several data collection, analysis, and quality issues. We argue that as we move towards an era of big data policing it is imperative that police services pay more attention to the quality of the data they collect. We conclude by discussing the implications of ignoring data quality issues and the need to develop a more robust research culture in policing.

Download Full-text

A Power System Disturbance Classification Method Robust to PMU Data Quality Issues

IEEE Transactions on Industrial Informatics ◽

10.1109/tii.2021.3072397 ◽

2021 ◽

pp. 1-1

Author(s):

Zikang Li ◽

Hao Liu ◽

Junbo Zhao ◽

Tianshu Bi ◽

Qixun Yang

Keyword(s):

Power System ◽

Data Quality ◽

Classification Method ◽

Quality Issues

Download Full-text

Between the Spreadsheets

10.29085/9781783305049 ◽

2021 ◽

Author(s):

Susan Walsh

Keyword(s):

Data Quality ◽

Deep Understanding ◽

Data Classification ◽

Dirty Data ◽

Quality Issues ◽

Book Covers ◽

Level Of Experience ◽

The Impact

Dirty data is a problem that costs businesses thousands, if not millions, every year. In organisations large and small across the globe you will hear talk of data quality issues. What you will rarely hear about is the consequences or how to fix it. Between the Spreadsheets: Classifying and Fixing Dirty Data draws on classification expert Susan Walsh's decade of experience in data classification to present a fool-proof method for cleaning and classifying your data. The book covers everything from the very basics of data classification to normalisation, taxonomies and presents the author's proven COAT methodology, helping ensure an organisation's data is Consistent, Organised, Accurate and Trustworthy. A series of data horror stories outlines what can go wrong in managing data, and if it does, how it can be fixed. After reading this book, regardless of your level of experience, not only will you be able to work with your data more efficiently, but you will also understand the impact the work you do with it has, and how it affects the rest of the organisation. Written in an engaging and highly practical manner, Between the Spreadsheets gives readers of all levels a deep understanding of the dangers of dirty data and the confidence and skills to work more efficiently and effectively with it.

Download Full-text

Data Quality: A Negotiator between Paper-based and Digital Records in the Pakistan’s TB Control Program

10.20944/preprints201806.0185.v1 ◽

2018 ◽

Author(s):

Syed Mustafa Ali ◽

Farah Naureen ◽

Arif Noor ◽

Maged Kamel N. Boulos ◽

Javariya Aamir ◽

...

Keyword(s):

Data Quality ◽

Quality Assessment ◽

Control Program ◽

Digital Data ◽

Patient Treatment ◽

Assessment Framework ◽

Healthcare Organizations ◽

Data Quality Assessment ◽

Quality Issues

Background Increasingly, healthcare organizations are using technology for the efficient management of data. The aim of this study was to compare the data quality of digital records with the quality of the corresponding paper-based records by using data quality assessment framework. Methodology We conducted a desk review of paper-based and digital records over the study duration from April 2016 to July 2016 at six enrolled TB clinics. We input all data fields of the patient treatment (TB01) card into a spreadsheet-based template to undertake a field-to-field comparison of the shared fields between TB01 and digital data. Findings A total of 117 TB01 cards were prepared at six enrolled sites, whereas just 50% of the records (n=59; 59 out of 117 TB01 cards) were digitized. There were 1,239 comparable data fields, out of which 65% (n=803) were correctly matched between paper based and digital records. However, 35% of the data fields (n=436) had anomalies, either in paper-based records or in digital records. 1.9 data quality issues were calculated per digital patient record, whereas it was 2.1 issues per record for paper-based record. Based on the analysis of valid data quality issues, it was found that there were more data quality issues in paper-based records (n=123) than in digital records (n=110). Conclusion There were fewer data quality issues in digital records as compared to the corresponding paper-based records. Greater use of mobile data capture and continued use of the data quality assessment framework can deliver more meaningful information for decision making.

Download Full-text

FROM QoD TO QoS - Data Quality Issues in Cloud Computing

Proceedings of the 1st International Conference on Cloud Computing and Services Science ◽

10.5220/0003558606970702 ◽

2011 ◽

Keyword(s):

Cloud Computing ◽

Data Quality ◽

Quality Issues

Download Full-text

Discovering XML Conditional Dependencies for Data Quality Issues

European Journal of Electrical Engineering and Computer Science ◽

10.24018/ejece.2020.4.1.156 ◽

2020 ◽

Vol 4 (1) ◽

Author(s):

Mohammed Ragheb Hakawati ◽

Yasmin Yacob ◽

Amiza Amir ◽

Jabiry M. Mohammed ◽

Khalid Jamal Jadaa

Keyword(s):

Data Quality ◽

Primary Standard ◽

Markup Language ◽

Document Type ◽

Data Dependencies ◽

Master Data ◽

Xml Document ◽

Extensible Markup ◽

Quality Issues ◽

Mining Algorithms

Extensible Markup Language (XML) is emerging as the primary standard for representing and exchanging data, with more than 60% of the total; XML considered the most dominant document type over the web; nevertheless, their quality is not as expected. XML integrity constraint especially XFD plays an important role in keeping the XML dataset as consistent as possible, but their ability to solve data quality issues is still intangible. The main reason is that old-fashioned data dependencies were basically introduced to maintain the consistency of the schema rather than that of the data. The purpose of this study is to introduce a method for discovering pattern tableaus for XML conditional dependencies to be used for enhancing XML document consistency as a part of data quality improvement phases. The notations of the conditional dependencies as new rules are designed mainly for improving data instance and extended traditional XML dependencies by enforcing pattern tableaus of semantically related constants. Subsequent to this, a set of minimal approximate conditional dependencies (XCFD, XCIND) is discovered and learned from the XML tree using a set of mining algorithms. The discovered patterns can be used as a Master data in order to detect inconsistencies that don’t respect the majority of the dataset.

Download Full-text