Creating Informative Data Warehouses: Exploring Data and Information Quality through Data Mining

10.28945/2584 ◽  
2002 ◽  
Author(s):  
Herna L. Viktor ◽  
Wayne Motha

Increasingly, large organizations are engaging in data warehousing projects in order to achieve a competitive advantage through the exploration of the information contained therein. It is therefore paramount to ensure that the data warehouse includes high-quality data. However, practitioners agree that improving the quality of data in an organization is a daunting task. This is especially evident in data warehousing projects, which are often initiated “after the fact”. The slightest suspicion of poor-quality data often hinders managers from reaching decisions; they waste hours in discussions to determine what portion of the data should be trusted. Augmenting data warehousing with data mining methods offers a mechanism to explore these vast repositories, enabling decision makers to assess the quality of their data and to unlock a wealth of new knowledge. These methods can be effectively used with the inconsistent, noisy and incomplete data that are commonplace in data warehouses.
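The abstract names no particular mining algorithms, so the following is only a minimal sketch of the idea it describes: running simple mining-style checks over a warehouse extract to surface incomplete and noisy records, so decision makers can see how much of the data deserves trust. The table, column names, and the robust-outlier threshold are hypothetical.

```python
# Illustrative sketch (assumed, not from the paper): profile a warehouse
# extract for incomplete records and statistical outliers.
import statistics

rows = [
    {"customer_id": 1, "revenue": 120.0},
    {"customer_id": 2, "revenue": None},     # incomplete record
    {"customer_id": 3, "revenue": 118.0},
    {"customer_id": 4, "revenue": 122.0},
    {"customer_id": 5, "revenue": 119.0},
    {"customer_id": 6, "revenue": 9000.0},   # likely noise
]

missing = [r for r in rows if r["revenue"] is None]
values = [r["revenue"] for r in rows if r["revenue"] is not None]

# Median/MAD outlier rule: robust, so a single bad value cannot mask
# itself by inflating the spread the way it would with mean and stdev.
med = statistics.median(values)
mad = statistics.median(abs(v - med) for v in values)
outliers = [v for v in values if mad and 0.6745 * abs(v - med) / mad > 3.5]

print(f"{len(missing)} incomplete, {len(outliers)} outlying of {len(rows)} rows")
```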

2017 ◽  
Vol 4 (1) ◽  
pp. 25-31 ◽  
Author(s):  
Diana Effendi

The Information Product Approach (IP Approach) is an information management approach that can be used to manage product information and to analyze data quality. An IP-Map can be used by organizations to facilitate the management of knowledge in collecting, storing, maintaining, and using data in an organized manner. The data management process for academic activities at X University has not yet used the IP approach: the university has paid no attention to managing the quality of its information, concerning itself so far only with the system applications used to automate data management in its academic processes. The IP-Map constructed in this paper can be used as a basis for analyzing the quality of data and information. With the IP-Map, X University is expected to learn which parts of the process need improvement in the quality of data and information management.

Index terms: IP Approach, IP-Map, information quality, data quality.
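An IP-Map models how an information product flows through blocks such as raw data sources, processing, storage, quality checks, and consumers. As a loose illustration only (the block types and the grade-report flow below are assumptions, not taken from the paper), such a map can be captured minimally in code:

```python
# Toy representation of IP-Map block types and one hypothetical academic
# information-product flow; stage names are illustrative assumptions.
from enum import Enum


class Block(Enum):
    SOURCE = "raw data source"
    QUALITY_CHECK = "quality inspection"
    PROCESS = "processing"
    STORAGE = "data storage"
    CONSUMER = "information consumer"


grade_report_flow = [
    (Block.SOURCE, "lecturers submit exam scores"),
    (Block.QUALITY_CHECK, "validate completeness and score ranges"),
    (Block.PROCESS, "compute grade points"),
    (Block.STORAGE, "academic records database"),
    (Block.CONSUMER, "students and faculty receive transcripts"),
]

for block, description in grade_report_flow:
    print(f"{block.name:>13}: {description}")
```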


2020 ◽  
Vol 10 (1) ◽  
pp. 1-16
Author(s):  
Isaac Nyabisa Oteyo ◽  
Mary Esther Muyoka Toili

Researchers in the bio-sciences are increasingly harnessing technology to improve processes that were traditionally pegged on pen and paper and highly manual. The pen-and-paper approach is used mainly to record and capture data from experiment sites. This method is typically slow and prone to errors. Also, bio-science research activities are often undertaken in remote and distributed locations, so the timeliness and quality of the data collected are essential. The manual method is slow to collect quality data and relay it in a timely manner; capturing data manually and relaying it in real time is a daunting task. The data collected has to be associated with the respective specimens (objects or plants). In this paper, we seek to improve specimen labelling and data collection, guided by the following questions: (1) How can data collection in bio-science research be improved? (2) How can specimen labelling be improved in bio-science research activities? We present WebLog, an application that we prototyped to aid researchers in generating specimen labels and collecting data from experiment sites. We use the application to convert the object (specimen) identifiers into quick response (QR) codes and use them to label the specimens. Once a specimen label is successfully scanned, the application automatically invokes the data entry form. The collected data is immediately sent to the server in electronic form for analysis.
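The abstract gives no implementation detail for the labelling step, so the following is only a sketch of the idea using the third-party qrcode Python library; the library choice and the specimen identifier format are assumptions, not necessarily what WebLog uses.

```python
# Hypothetical sketch of QR labelling: encode each specimen identifier as a
# QR image that can be printed and attached to the specimen.
# Requires: pip install qrcode pillow
import qrcode

specimen_ids = ["PLOT-03/BEAN-017", "PLOT-03/BEAN-018"]  # assumed ID format

for specimen_id in specimen_ids:
    img = qrcode.make(specimen_id)                        # build the QR image
    filename = specimen_id.replace("/", "_") + ".png"
    img.save(filename)                                    # printable label
    print("wrote", filename)
```

Scanning such a label yields the original identifier string, which is what would let an application open the matching data entry form automatically.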


2020 ◽  
Vol 17 (1) ◽  
pp. 253-269
Author(s):  
Alaoui El ◽  
Fazziki El ◽  
Fatima Ennaji ◽  
Mohamed Sadgal

The ubiquity of mobile devices and their advanced features have increased the use of crowdsourcing in many areas, such as mobility in smart cities. With the advent of high-quality sensors on smartphones, online communities can easily collect and share information. This information is of great importance to institutions, which must analyze the facts, for example by facilitating the collection of data on crimes and criminals. This paper proposes an approach to developing a crowdsensing framework that allows wider collaboration between citizens and the authorities. The framework takes advantage of an objectivity analysis to ensure the participants' credibility and the reliability of the information, as law enforcement is often hampered by unreliable and poor-quality data. In addition, the proposed framework ensures the protection of users' private data through a de-identification process. Experimental results show that the proposed framework is an interesting tool for improving the quality of crowdsensing information in a government context.
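The abstract does not describe the de-identification process itself; the sketch below shows one common technique, keyed pseudonymization of participant identifiers with an HMAC, purely as an assumed illustration of how private data might be protected while reports remain linkable.

```python
# Illustrative de-identification sketch (technique assumed, not from the
# paper): replace each participant identifier with a keyed pseudonym so
# reports can be linked to a contributor without exposing who they are.
import hashlib
import hmac

SECRET_KEY = b"server-side secret"  # hypothetical; held only by the authority


def pseudonymize(user_id: str) -> str:
    """Return a stable pseudonym; irreversible without the secret key."""
    digest = hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]


report = {"user": "citizen-42", "text": "accident at 5th avenue"}
report["user"] = pseudonymize(report["user"])  # strip the direct identifier
print(report)
```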


2020 ◽  
Vol 9 (1) ◽  
pp. 2535-2539

Data is very valuable, and it is generated in large volumes. Using high-quality data helps people make better decisions, analyses, and predictions, but obtaining it has become a huge task: we are surrounded by data riddled with errors, and data cleaning is a slow, complicated, and costly process. Data polishing is important because errors must be removed from the data before it is transferred to the data warehouse; poor-quality data must be eliminated to obtain the desired results. Error-free data will produce precise and accurate results when queried, so consistent and correct data is required for decision making. The two characteristics of data polishing are data repairing and data association. Association is identifying a homogeneous object and linking it to the most closely associated object. Repairing is the process of making the database reliable by finding and fixing faults. In big data applications we do not use all the existing data, only subsets of appropriate data; association is the process of reducing extensive amounts of raw data to those useful subsets. Once we obtain the appropriate data, it is analyzed, which leads to knowledge [14]. Multiple approaches are used to associate the given data and to achieve meaningful and useful knowledge to fix or repair [12]. Maintaining this polished quality of data is what is referred to as data polishing, yet the objectives of data polishing are usually not properly defined. This paper discusses the goals of data cleaning and different approaches for data cleaning platforms.
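As an assumed illustration only (the paper surveys approaches rather than prescribing one), the sketch below shows the two operations the abstract names: repairing obvious faults, and associating duplicate records that refer to the same real-world object. It uses pandas; the rules and column names are hypothetical.

```python
# Toy sketch of the two data-polishing operations named above; the repair
# rules and column names are illustrative assumptions.
# Requires: pip install pandas
import pandas as pd

raw = pd.DataFrame({
    "name": ["Acme Corp ", "acme corp", "Beta Ltd", None],
    "revenue": [100.0, 100.0, -5.0, 40.0],
})

# Repairing: fix faults that make records unreliable.
repaired = raw.dropna(subset=["name"]).copy()
repaired["name"] = repaired["name"].str.strip().str.lower()
repaired.loc[repaired["revenue"] < 0, "revenue"] = float("nan")  # fault

# Association: link records describing the same object (here, exact match
# on the normalized name) and keep one representative per object.
polished = repaired.drop_duplicates(subset=["name"], keep="first")
print(polished)
```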


2009 ◽  
Vol 11 (2) ◽  
Author(s):  
L. Marshall ◽  
R. De la Harpe

Making decisions in a business intelligence (BI) environment can become extremely challenging and sometimes even impossible if the data on which the decisions are based is of poor quality. It is only possible to utilise data effectively when it is accurate, up to date, complete, and available when needed. BI decision makers and users are in the best position to determine the quality of the data available to them, so it is important to ask the right questions of them; the issues of information quality in the BI environment were therefore established through a literature study. Information-related problems may cause supplier relationships to deteriorate, reduce internal productivity, and erode the business's confidence in IT. Ultimately they can have implications for an organisation's ability to perform and remain competitive. This article aims to identify the underlying factors that prevent information from being easily and effectively utilised, and to understand how these factors can influence the decision-making process, particularly within a BI environment. An exploratory investigation was conducted at a large retail organisation in South Africa to collect empirical data from BI users through unstructured interviews. Some of the main findings indicate specific causes that impact the decisions of BI users, including the accuracy, inconsistency, understandability and availability of information. Key performance measures that are directly impacted by the quality of data used in decision-making include waste, availability, sales and supplier fulfilment. The time spent on investigating and resolving data quality issues has a major impact on productivity. The importance of documentation was highlighted as an issue that requires further investigation. The initial results indicate the value of


Author(s):  
Reinhard Viertl

The results of data warehousing and data mining depend essentially on the quality of the data. Usually data are assumed to be numbers or vectors, but this is often not realistic. In particular, the result of a measurement of a continuous quantity is never a precise number, but more or less non-precise. This kind of uncertainty is also called fuzziness and should not be confused with errors. Data mining techniques have to take care of fuzziness in order to avoid unrealistic results.
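The chapter's formal apparatus is not reproduced in this abstract; as a hedged illustration of the concept, a non-precise observation can be described by a characterizing (membership) function rather than by a single number. The triangular shape and the numeric values below are assumptions chosen for illustration.

```python
# Illustrative sketch only: a triangular characterizing function for a
# non-precise measurement "about 20.0" with spread 0.5 (values assumed).
def triangular(x: float, center: float = 20.0, spread: float = 0.5) -> float:
    """Degree (0..1) to which x is compatible with the fuzzy observation."""
    distance = abs(x - center)
    return max(0.0, 1.0 - distance / spread)


for x in (19.4, 19.8, 20.0, 20.3, 20.6):
    print(f"membership({x}) = {triangular(x):.2f}")
```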


2017 ◽  
Vol 8 (1) ◽  
pp. 74-81 ◽  
Author(s):  
Jana Lalinská ◽  
Jozef Gašparík ◽  
Denis Šipuš

This paper deals with the impact of information quality and with basic methods that can reduce the costs of using low-quality information. First of all, it is important to purify the input data of inconsistencies and to measure the quality of the data; this process helps minimize the causes of poor process quality. The aim of this paper is to identify and minimize the main causes of passenger train delays by comparing the years 2012 and 2013. The passenger trains examined were divided according to three aspects of delay responsibility: type of train, delay code, and the group responsible for the delay.
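As an assumed illustration only (the paper's dataset and tooling are not shown here), a year-on-year comparison of delay causes along the three responsibility aspects could be tabulated as sketched below; all field names and figures are made up.

```python
# Hypothetical sketch: summarize delay minutes by the three responsibility
# aspects the study uses, split by year. Requires: pip install pandas
import pandas as pd

delays = pd.DataFrame({
    "year":       [2012, 2012, 2013, 2013, 2013],
    "train_type": ["regional", "express", "regional", "express", "regional"],
    "delay_code": ["91", "63", "91", "85", "63"],
    "resp_group": ["infrastructure", "carrier", "infrastructure",
                   "weather", "carrier"],
    "delay_min":  [12, 35, 8, 50, 17],
})

# Total delay per aspect and year, to spot the main causes and compare
# 2012 against 2013.
for aspect in ("train_type", "delay_code", "resp_group"):
    print(delays.pivot_table(index=aspect, columns="year",
                             values="delay_min", aggfunc="sum",
                             fill_value=0), "\n")
```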


Author(s):  
Enes Sari ◽  
Levent FAZLI Umur

BACKGROUND: The aim of this study was to evaluate the information quality of YouTube videos on hallux valgus. METHODS: A YouTube search was performed using the keyword 'hallux valgus' to determine the first 300 videos related to hallux valgus. A total of 54 videos met our inclusion criteria and were evaluated for information quality using the DISCERN, Journal of the American Medical Association (JAMA), and hallux valgus information assessment (HAVIA) scores. The number of views, time since the upload date, view rate, number of comments, number of likes, number of dislikes, and video power index (VPI) values were calculated to determine video popularity. Video length (sec), video source, and video content were also noted. The relation between information quality and these factors was statistically evaluated. RESULTS: The mean DISCERN score was 30.35 ± 11.56 (poor quality; range 14-64), the mean JAMA score was 2.28 ± 0.96 (range 1-4), and the mean HAVIA score was 3.63 ± 2.42 (moderate quality; range 0.5-8.5). Although videos uploaded by physicians had higher mean DISCERN, JAMA, and HAVIA scores than videos uploaded by non-physicians, the difference was not statistically significant. Additionally, view rates and VPI values were higher for videos uploaded by health channels, but the difference did not reach statistical significance. A statistically significant positive correlation was found between video length and DISCERN (r = 0.294, p = 0.028) and HAVIA scores (r = 0.326, p = 0.015). CONCLUSIONS: The present study demonstrated that the quality of the information available in YouTube videos about hallux valgus was low and insufficient. Videos containing accurate information from reliable sources are needed to educate patients on hallux valgus, especially on less frequently mentioned topics such as postoperative complications and the healing period.
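As an illustrative aside (the study's own statistics software is not stated), a correlation of the kind reported between video length and quality scores can be computed as sketched below; the numbers are placeholders, not the study's data.

```python
# Illustrative only: Pearson correlation between video length and a quality
# score, mirroring the kind of r/p values reported above. Data is made up.
# Requires: pip install scipy
from scipy.stats import pearsonr

video_length_sec = [95, 210, 340, 120, 600, 450]   # placeholder values
discern_score = [22, 28, 41, 25, 52, 39]           # placeholder values

r, p = pearsonr(video_length_sec, discern_score)
print(f"r = {r:.3f}, p = {p:.3f}")  # conventionally significant if p < 0.05
```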


2018 ◽  
Vol 46 (6) ◽  
pp. 851-877 ◽  
Author(s):  
Abel Kinyondo ◽  
Riccardo Pelizzo

2006 ◽  
Vol 21 (1) ◽  
pp. 67-70 ◽  
Author(s):  
Brian H. Toby

The important Rietveld error indices are defined and discussed. It is shown that while smaller error index values indicate a better fit of a model to the data, wrong models fit to poor-quality data may exhibit smaller error index values than superb models fit to very high-quality data.
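For reference, the most widely used of these indices is the weighted profile R-factor; the standard definition below is consistent with the Rietveld literature, though it is not quoted from this abstract.

```latex
% Weighted profile R-factor: y_{o,i} and y_{c,i} are the observed and
% calculated intensities at step i, and w_i = 1/\sigma^2(y_{o,i}).
R_{wp} = \sqrt{\frac{\sum_i w_i \bigl(y_{o,i} - y_{c,i}\bigr)^2}
               {\sum_i w_i \, y_{o,i}^{2}}}
```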

