A Study on Data Profiling Based on the Statistical Analysis for Big Data Quality Diagnosis

Author(s):  
Won-Jung Jang ◽  
Jong-Yoon Kim ◽  
Bum-Taek Lim ◽  
Gwang-Yong Gim

Data quality is important to all private and government organizations. Data quality issues can arise in different ways. In e-governance, inconsistent, inaccurate, unreliable, and missing data make it difficult to retrieve accurate data for decision making. Big data exhibits a number of common data quality issues, and data profiling helps identify these issues and their causes. Data profiling methods detect errors, inconsistencies, and redundancies in a dataset. Data profiling offers several types of analysis for correcting data: single-column analysis, multi-column analysis, multi-table analysis, and data dependency analysis. Single-column analysis itself comprises several analyses. Among them, pattern matching is used to address inconsistent data and to deliver the data quality that analytic results need within bounded execution time. In organizations, pattern matching is generally performed manually. Pattern matching helps discover the various value patterns within the data and validate those values against an organization's rules. The data pattern profiling method makes it possible to create a valid data set, which can then be used to generate more accurate reports for an organization's future analysis. This study compares the results of the proposed data pattern logic with those of other open source tools and demonstrates the efficiency of the proposed logic.
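As an illustration of single-column pattern matching, the following is a minimal sketch and not the authors' proposed logic. It assumes a common profiling convention in which letters are generalized to "A" and digits to "9"; the function names (`value_pattern`, `profile_column`, `flag_invalid`) and the sample phone numbers are hypothetical.

```python
from collections import Counter


def value_pattern(value: str) -> str:
    """Generalize a value: letters become 'A', digits become '9', other characters are kept."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Count how often each generalized pattern occurs in a column."""
    return Counter(value_pattern(v) for v in values)


def flag_invalid(values, expected_pattern):
    """Return the values whose pattern differs from the expected one."""
    return [v for v in values if value_pattern(v) != expected_pattern]


# Hypothetical column of phone numbers with two inconsistent entries.
phones = ["555-1234", "555-9876", "5551234", "555-12A4"]
patterns = profile_column(phones)
invalid = flag_invalid(phones, "999-9999")
```

The pattern counts show which formats dominate a column, and the values that deviate from the expected pattern are candidates for correction before reporting.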


Author(s):  
Christopher D O’Connor ◽  
John Ng ◽  
Dallas Hill ◽  
Tyler Frederick

Policing is increasingly being shaped by data collection and analysis. However, we still know little about the quality of the data police services acquire and utilize. Drawing on a survey of analysts from across Canada, this article examines several data collection, analysis, and quality issues. We argue that as we move towards an era of big data policing it is imperative that police services pay more attention to the quality of the data they collect. We conclude by discussing the implications of ignoring data quality issues and the need to develop a more robust research culture in policing.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Ikbal Taleb ◽  
Mohamed Adel Serhani ◽  
Chafik Bouhaddioui ◽  
Rachida Dssouli

Big Data is an essential research area for governments, institutions, and private agencies to support their analytics decisions. Big Data concerns everything about data: how it is collected, processed, and analyzed to generate value-added, data-driven insights and decisions. Degradation in Data Quality may result in unpredictable consequences. In that case, confidence in, and the worthiness of, the data and its source are lost. In the Big Data context, data characteristics such as volume, multiple heterogeneous data sources, and fast data generation increase the risk of quality degradation and require efficient mechanisms to check data worthiness. However, ensuring Big Data Quality (BDQ) is a very costly and time-consuming process, since excessive computing resources are required. Maintaining quality through the Big Data lifecycle requires quality profiling and verification before any processing decision is made. A BDQ Management Framework for enhancing the pre-processing activities while strengthening data control is proposed. The proposed framework uses a new concept called the Big Data Quality Profile. This concept captures the quality outline, requirements, attributes, dimensions, scores, and rules. Using the Big Data profiling and sampling components of the framework, a faster and more efficient data quality estimation is initiated before and after an intermediate pre-processing phase. The exploratory profiling component of the framework plays an initial role in quality profiling; it uses a set of predefined quality metrics to evaluate important data quality dimensions. It generates quality rules by applying various pre-processing activities and their related functions. These rules mainly target the Data Quality Profile and result in quality scores for the selected quality attributes.
The framework implementation and dataflow management across various quality management processes are discussed. The paper concludes with ongoing work on framework evaluation and deployment to support quality evaluation decisions.
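The idea of estimating quality scores on a sample rather than on the full data set can be sketched as follows. This is an assumption-laden illustration, not the framework's implementation: the two metrics (completeness and uniqueness), the function names, the `sample_size` and `seed` parameters, and the example rows are all hypothetical.

```python
import random


def completeness(values):
    """Share of values that are neither None nor an empty string."""
    if not values:
        return 0.0
    present = sum(1 for v in values if v is not None and v != "")
    return present / len(values)


def uniqueness(values):
    """Share of distinct values among the non-missing ones."""
    present = [v for v in values if v is not None and v != ""]
    if not present:
        return 0.0
    return len(set(present)) / len(present)


def quality_profile(rows, columns, sample_size, seed=0):
    """Score each column on a random sample of rows (all rows if the data is small)."""
    rng = random.Random(seed)
    sample = rows if len(rows) <= sample_size else rng.sample(rows, sample_size)
    profile = {}
    for col in columns:
        vals = [row.get(col) for row in sample]
        profile[col] = {
            "completeness": completeness(vals),
            "uniqueness": uniqueness(vals),
        }
    return profile


# Hypothetical records; "city" has one empty and one missing value.
rows = [
    {"id": 1, "city": "Seoul"},
    {"id": 2, "city": ""},
    {"id": 3, "city": "Seoul"},
    {"id": 4, "city": None},
]
profile = quality_profile(rows, ["id", "city"], sample_size=10)
```

Running the same function before and after a pre-processing step gives two profiles whose scores can be compared, which is one simple way to check whether the step improved the selected quality attributes.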

