Quality Assurance in Big Data Engineering - A Metareview

Author(s):  
Daniel Staegemann ◽  
Matthias Volk ◽  
Klaus Turowski ◽  
...  

With a continuously increasing amount and complexity of data being produced and captured, traditional ways of storing, processing, analyzing, and presenting data are no longer sufficient, which has led to the emergence of the concept of big data. However, not only is the implementation of the corresponding applications a challenging task, but so is their proper quality assurance. To facilitate the latter, this publication presents a comprehensive structured literature metareview on the topic of big data quality assurance. The results will provide interested researchers and practitioners with a solid foundation for their own quality assurance related endeavors and therefore help in advancing the cause of quality assurance in big data as well as the domain of big data in general. Furthermore, based on the findings of the review, worthwhile directions for future research are identified, providing prospective authors with some guidance in this complex environment.

2018 ◽  
Vol 44 (6) ◽  
pp. 785-801
Author(s):  
Hong Huang

This article aims to understand the views of genomic scientists with regard to the data quality assurances associated with semiotics and data–information–knowledge (DIK). The resulting communication of signs generated from genomic curation work was found within different semantic levels of DIK that correlate specific data quality dimensions with their respective skills. Syntactic data quality dimensions were ranked the highest among all semiotic data quality dimensions, which indicated that scientists devote great effort to data wrangling activities in genome curation work. Semantic- and pragmatic-related sign communications were about meaningful interpretation and thus required additional adaptive and interpretative skills to deal with data quality issues. This expanded concept of ‘curation’ as sign/semiotic was not previously explored from the practical to the theoretical perspectives. The findings inform policy makers and practitioners in developing frameworks and cyberinfrastructure that facilitate the initiatives and advocacies of ‘Big Data to Knowledge’ by funding agencies. The findings from this study can also help plan data quality assurance policies and thus maximise the efficiency of genomic data management. Our results give strong support to the relevance of communicating data quality skills to data quality assurance in genome curation activities.


Author(s):  
Kamalendu Pal

Global retail business has become diverse, and the latest Information Technology (IT) advancements have created new possibilities for managing the deluge of data generated by the world-wide business operations of its supply chain. In this business, external data from social media and supplier networks provide a huge influx to augment existing data. This is combined with data from sensors and intelligent machines, commonly known as Internet of Things (IoT) data. This data, originating from the global retail supply chain, is simply known as Big Data - because of its enormous volume, the velocity with which it arrives in the global retail business environment, its veracity with respect to quality-related issues, and the value it generates for the global supply chain. Many retail product manufacturing companies are trying to find ways to enhance the quality of their operational performance while reducing business support costs. They do this primarily by improving defect tracking and forecasting. These manufacturing and operational improvements, along with a favorable customer experience, remain crucial to thriving in global competition. In recent years, Big Data and its associated technologies have attracted huge research interest from academics, industry practitioners, and government agencies. Big Data-based software applications are widely used within retail supply chain management - in recommendation, prediction, and decision support systems. The spectacular growth of these software systems has enormous potential for improving the daily performance of retail product and service companies. However, data quality problems are increasingly resulting in erroneous testing costs in retail Supply Chain Management (SCM). The heavy investment made in Big Data-based software applications puts increasing pressure on management to justify the quality assurance in these software systems. This chapter discusses data quality and the dimensions of data quality for Big Data applications. It also examines some of the challenges presented by managing the quality and governance of Big Data, and how those can be balanced with the need to deliver usable Big Data-based software systems. Finally, the chapter highlights the importance of data governance; it also covers some of the issues related to Big Data managerial practice and their justifications for achieving application software quality assurance.


Author(s):  
Arun Thotapalli Sundararaman

The study of data quality for data mining applications has always been a complex topic; in recent years, this topic has gained further complexity with the advent of big data as the source for data mining and business intelligence (BI) applications. In a big data environment, data is consumed in various states and various forms serving as input for data mining, and this is the main source of added complexity. These new complexities and challenges arise from the underlying dimensions of big data (volume, variety, velocity, and value) together with the ability to consume data at various stages of transition from raw data to standardized datasets. These have created a need for expanding the traditional data quality (DQ) factors into BDQ (big data quality) factors, besides the need for new BDQ assessment and measurement frameworks for data mining and BI applications. However, very limited advancement has been made in research and industry on the topic of BDQ and its relevance and criticality for data mining and BI applications. Data quality in data mining refers to the quality of the patterns or results of the models built using mining algorithms. DQ for data mining in business intelligence applications should be aligned with the objectives of the BI application. Objective measures, training/modeling approaches, and subjective measures are the three major existing approaches to measuring DQ for data mining. However, there is no agreement yet on definitions, measurements, or interpretations of DQ for data mining. Defining the factors of DQ for data mining and their measurement for a BI system has been one of the major challenges for researchers as well as practitioners. This chapter provides an overview of existing research in the area of BDQ definitions and measurement for data mining for BI, analyzes the gaps therein, and provides a direction for future research and practice in this area.


2003 ◽  
Author(s):  
Stephan D. Fihn ◽  
Mary B. McDonell ◽  
Stephan M. Anderson

2021 ◽  
pp. 1-9
Author(s):  
Bruno Bordoni ◽  
Stevan Walkowski ◽  
Allan Escher ◽  
Bruno Ducoux

The eupneic act in healthy subjects involves a coordinated combination of functional anatomy and neurological activation. Neurologically, a central pattern generator, the components of which are distributed between the brainstem and the spinal cord, is hypothesized to drive the process and is modeled mathematically. A functionally anatomical approach is easier to understand although just as complex. Osteopathic manipulative treatment (OMT) is part of osteopathic medicine, which has many manual techniques to approach the human body, trying to improve the patient’s homeostatic response. The principle on which OMT is based is the stimulation of self-healing processes, researching the intrinsic physiological mechanisms of the person, taking into consideration not only the physical aspect, but also the emotional one and the context in which the patient lives. This article reviews how the diaphragm muscle moves, with a brief discussion on anatomy and the respiratory neural network. The goal is to highlight the critical issues of OMT on the correct positioning of the hands on the posterolateral area of the diaphragm around the diaphragm, trying to respect the existing scientific anatomical-physiological data, and laying a solid foundation for improving the data obtainable from future research. The correct position of the operator’s hands in this area allows a more effective palpatory perception and, consequently, a probably more incisive result on the respiratory function.


Author(s):  
Christopher D O’Connor ◽  
John Ng ◽  
Dallas Hill ◽  
Tyler Frederick

Policing is increasingly being shaped by data collection and analysis. However, we still know little about the quality of the data police services acquire and utilize. Drawing on a survey of analysts from across Canada, this article examines several data collection, analysis, and quality issues. We argue that as we move towards an era of big data policing it is imperative that police services pay more attention to the quality of the data they collect. We conclude by discussing the implications of ignoring data quality issues and the need to develop a more robust research culture in policing.

