DATA QUALITY DIMENSIONS, METRICS, AND IMPROVEMENT TECHNIQUES

2021 ◽  
Vol 6 (1) ◽  
pp. 25-44
Author(s):  
Menna Ibrahim Gabr ◽  
Yehia M. Helmy ◽  
Doaa Saad Elzanfaly ◽  
...  

Achieving a high level of data quality is considered one of the most important assets for organizations of any size, whether small, medium or large. Data quality is a central concern for both practitioners and researchers who deal with traditional or big data. The level of data quality is measured through several quality dimensions. A high percentage of current studies focus on assessing and applying data quality on traditional data. As we are in the era of big data, attention should be paid to the tremendous volume of generated and processed data, 80% of which is unstructured. However, initiatives for creating big data quality evaluation models are still under development. This paper investigates the data quality dimensions most commonly used in both traditional and big data in order to identify the metrics and techniques used to measure and handle each dimension. A complete definition of each traditional and big data quality dimension is presented, along with its metrics and handling techniques. Many data quality dimensions can be applied to both traditional and big data, while a small number apply to only one of the two. Few data quality metrics and barely any handling techniques are presented in current works.
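To make the idea of a dimension-level metric concrete, the following is a minimal sketch of two ratio-style measures, completeness and uniqueness. The abstract does not give the paper's own formulas, so the definitions, the function names and the sample records here are illustrative assumptions rather than the authors' method.

```python
# Hypothetical ratio-style metrics; the definitions are common conventions,
# not formulas taken from the paper.

def completeness(records, field):
    """Share of records whose `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def uniqueness(records, key):
    """Share of distinct values among the records that carry `key`."""
    values = [r[key] for r in records if r.get(key) not in (None, "")]
    return len(set(values)) / len(values) if values else 0.0


# Made-up sample data: two e-mail values are missing, one id is duplicated.
sample = [
    {"id": 1, "email": "a@example.org"},
    {"id": 2, "email": ""},
    {"id": 2, "email": "c@example.org"},
    {"id": 4, "email": None},
]

email_completeness = completeness(sample, "email")  # 2 of 4 records filled
id_uniqueness = uniqueness(sample, "id")            # 3 distinct ids among 4
print(email_completeness, id_uniqueness)
```

Both functions return a value between 0 and 1, which makes scores comparable across dimensions; whether empty strings should count as missing is a choice that depends on the dataset.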

Author(s):  
Anandhi Ramasamy ◽  
Soumitra Chowdhury

Although big data has become an integral part of businesses and society, there is still concern about the quality aspects of big data. Past research has focused on identifying various dimensions of big data. However, the research is scattered and there is a need to synthesize the ever-evolving phenomenon of big data. This research aims to provide a systematic literature review of the quality dimensions of big data. Based on a review of 17 articles from academic research, we present a set of key quality dimensions of big data.


2018 ◽  
Vol 44 (6) ◽  
pp. 785-801
Author(s):  
Hong Huang

This article aims to understand the views of genomic scientists with regard to the data quality assurances associated with semiotics and data–information–knowledge (DIK). The communication of signs generated from genomic curation work was found at different semantic levels of DIK that correlate specific data quality dimensions with their respective skills. Syntactic data quality dimensions were ranked highest among all semiotic data quality dimensions, which indicates that scientists devote great effort to data wrangling activities in genome curation work. Semantic- and pragmatic-related sign communications concerned meaningful interpretation and thus required additional adaptive and interpretative skills to deal with data quality issues. This expanded concept of ‘curation’ as sign/semiotic had not previously been explored from either the practical or the theoretical perspective. The findings can inform policy makers and practitioners in developing frameworks and cyberinfrastructure that facilitate the ‘Big Data to Knowledge’ initiatives and advocacy of funding agencies. The findings can also help in planning data quality assurance policies and thus maximise the efficiency of genomic data management. Our results give strong support to the relevance of communicating data quality skills to data quality assurance in genome curation activities.


2021 ◽  
Vol 12 (3) ◽  
pp. 233-247
Author(s):  
Dong-Sik Yang ◽  
Jae-Min Noh ◽  
Seung-Ryol Maeng ◽  
Dong-Jin Park

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Pratima Verma ◽  
Vimal Kumar ◽  
Ankesh Mittal ◽  
Bhawana Rathore ◽  
Ajay Jha ◽  
...  

Purpose: This study aims to provide insight into the operational factors of big data. The operational indicators/factors are categorized into three functional parts, namely synthesis, speed and significance. Based on these factors, an organization enhances its big data analytics (BDA) performance and then selects the data quality dimensions relevant to its success.

Design/methodology/approach: A fuzzy analytic hierarchy process (AHP) based research methodology has been proposed and used to assign the criterion weights and to prioritize the identified speed, synthesis and significance (3S) indicators. Further, the PROMETHEE (Preference Ranking Organization METHod for Enrichment of Evaluations) technique has been used to measure the data quality dimensions, considering 3S as criteria.

Findings: The effective indicators were identified from the past literature, and the model for measuring these indicators was confirmed with industry experts. The results of the fuzzy AHP model show that synthesis is the top-ranked and most significant indicator, followed by speed and significance at the next level. These operational indicators contribute toward BDA and are explored along with the priorities of their sub-categories.

Research limitations/implications: The outcomes of this study will help businesses that are contemplating this technology as a breakthrough, although it is both a challenge and an opportunity for developers and experts. Big data has many risks and challenges related to economic, social, operational and political performance. An understanding of data quality dimensions provides insightful guidance for forecasting demand accurately, solving complex problems and collaborating in supply chain management.

Originality/value: Big data is one of the most popular technology concepts in the market today. People live in a world where every facet of life increasingly depends on big data and data science. This study creates awareness of the role of the 3S indicators in big data quality by prioritizing them using fuzzy AHP and PROMETHEE.
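As a rough illustration of the ranking step described above, the sketch below applies PROMETHEE II net outranking flows to three data quality dimensions scored against the 3S criteria. Everything numeric is an assumption made for illustration: the crisp weights stand in for the output of the fuzzy AHP step, the scores are invented, and the "usual" preference function (full preference on any strict improvement) is only one of several options PROMETHEE allows. The abstract does not state which values or preference functions the authors used.

```python
# PROMETHEE II sketch with hypothetical inputs; not the authors' data or setup.

def promethee_ii(scores, weights):
    """Rank alternatives by net outranking flow.

    scores:  dict mapping alternative -> list of criterion values
             (higher is better on every criterion)
    weights: list of criterion weights that sum to 1
    """
    names = list(scores)
    n = len(names)

    def pi(a, b):
        # Aggregated preference of a over b under the usual criterion:
        # a criterion contributes its full weight when a strictly beats b.
        return sum(w for w, x, y in zip(weights, scores[a], scores[b]) if x > y)

    flows = {}
    for a in names:
        others = [b for b in names if b != a]
        positive = sum(pi(a, b) for b in others) / (n - 1)  # how far a outranks
        negative = sum(pi(b, a) for b in others) / (n - 1)  # how far a is outranked
        flows[a] = positive - negative
    return sorted(flows.items(), key=lambda kv: kv[1], reverse=True)


# Assumed criterion order: synthesis, speed, significance.
weights = [0.5, 0.3, 0.2]        # hypothetical stand-in for fuzzy AHP weights
scores = {
    "accuracy":     [9, 6, 8],   # invented expert scores
    "completeness": [7, 8, 6],
    "timeliness":   [5, 9, 7],
}

ranking = promethee_ii(scores, weights)
for name, net_flow in ranking:
    print(f"{name}: {net_flow:+.2f}")
```

Net flows always sum to zero across the alternatives, which is a quick sanity check on any implementation. A fuller treatment would propagate fuzzy weights and choose a preference function per criterion.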


2018 ◽  
Vol 1 (4) ◽  
pp. 43 ◽  
Author(s):  
Suraj Juddoo ◽  
Carlisle George ◽  
Penny Duquenoy ◽  
David Windridge

In the health industry, the use of data (including Big Data) is of growing importance. The term ‘Big Data’ characterizes data by its volume, and also by its velocity, variety, and veracity. Big Data needs to have effective data governance, which includes measures to manage and control the use of data and to enhance data quality, availability, and integrity. The type and description of data quality can be expressed in terms of the dimensions of data quality. Well-known dimensions are accuracy, completeness, and consistency, amongst others. Since data quality depends on how the data is expected to be used, the most important data quality dimensions depend on the context of use and industry needs. There is a lack of current research focusing on data quality dimensions for Big Data within the health industry; this paper, therefore, investigates the most important data quality dimensions for Big Data within this context. An inner hermeneutic cycle research approach was used to review relevant literature related to data quality for big health datasets in a systematic way and to produce a list of the most important data quality dimensions. Based on a hierarchical framework for organizing data quality dimensions, the highest ranked category of dimensions was determined.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Muslihah Wook ◽  
Nor Asiakin Hasbullah ◽  
Norulzahrah Mohd Zainudin ◽  
Zam Zarina Abdul Jabar ◽  
Suzaimah Ramli ◽  
...  

Abstract: The popularity of big data analytics (BDA) has boosted the interest of organisations in exploiting their large-scale data. This technology can become a strategic stimulus for organisations to achieve competitive advantage and sustainable growth. Previous BDA research, however, has focused on introducing further big data traits, known as Vs, while ignoring the quality of data when examining the application of BDA. Therefore, this study aims to explore the effect of big data traits and data quality dimensions on BDA application. This study formulated 10 hypotheses comprising the relationships among the big data traits, accuracy, believability, completeness, timeliness, ease of operation, and BDA application constructs. A survey was conducted using a questionnaire as the data collection instrument. The partial least squares structural equation modelling technique was then used to analyse the hypothesised relationships between the constructs. The findings revealed that big data traits can significantly affect all constructs for data quality dimensions and that the ease of operation construct has a significant effect on BDA application. This study contributes to the literature by bringing new insights to the field of BDA and may serve as a guideline for future researchers and practitioners when studying BDA application.

