From Data Quality to Model Quality

Author(s):  
Tianxing He ◽  
Shengcheng Yu ◽  
Ziyuan Wang ◽  
Jieqiong Li ◽  
Zhenyu Chen
Author(s):  
Victor S. Sheng ◽  
Jing Zhang

With crowdsourcing systems, labels can be obtained at low cost, which facilitates the creation of training sets for prediction model learning. However, the labels obtained from crowdsourcing are often imperfect, which poses great challenges for model learning. Since 2008, the machine learning community has noticed the great opportunities brought by crowdsourcing and has developed a large number of techniques to deal with the inaccuracy, randomness, and uncertainty issues that arise when learning with crowdsourced data. This paper summarizes the technical progress in this field over the past eleven years. We focus on two fundamental issues: data (label) quality and prediction model quality. For data quality, we summarize ground truth inference methods and some machine-learning-based methods that further improve data quality. For prediction model quality, we summarize several learning paradigms developed for the crowdsourcing scenario. Finally, we discuss several promising future research directions to attract researchers to make contributions in crowdsourcing.
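As an illustration of the ground truth inference methods this survey covers, the sketch below implements majority voting, the simplest label-aggregation baseline. The function name, the (item, worker, label) triple layout, and the example votes are assumptions made for this sketch rather than the paper's own code; the survey also discusses more sophisticated probabilistic estimators.

```python
from collections import Counter, defaultdict

def majority_vote(annotations):
    """Infer one integrated label per item from crowdsourced (item, worker, label) triples.

    This is only the simplest ground-truth-inference baseline; EM-style
    estimators that also model worker reliability are common alternatives.
    """
    labels_per_item = defaultdict(list)
    for item, _worker, label in annotations:
        labels_per_item[item].append(label)
    # Pick the most frequent label for each item.
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in labels_per_item.items()}

# Hypothetical usage: three workers label two items.
votes = [("q1", "w1", 1), ("q1", "w2", 1), ("q1", "w3", 0),
         ("q2", "w1", 0), ("q2", "w2", 0), ("q2", "w3", 0)]
print(majority_vote(votes))  # {'q1': 1, 'q2': 0}
```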


2013 ◽  
Vol 69 (7) ◽  
pp. 1215-1222 ◽  
Author(s):  
K. Diederichs ◽  
P. A. Karplus

In macromolecular X-ray crystallography, typical data sets have substantial multiplicity. This can be used to calculate the consistency of repeated measurements and thereby assess data quality. Recently, the properties of a correlation coefficient, CC1/2, that can be used for this purpose were characterized and it was shown that CC1/2 has superior properties compared with 'merging' R values. A derived quantity, CC*, links data and model quality. Using experimental data sets, the behaviour of CC1/2 and the more conventional indicators were compared in two situations of practical importance: merging data sets from different crystals and selectively rejecting weak observations or (merged) unique reflections from a data set. In these situations controlled 'paired-refinement' tests show that even though discarding the weaker data leads to improvements in the merging R values, the refined models based on these data are of lower quality. These results show the folly of such data-filtering practices aimed at improving the merging R values. Interestingly, in all of these tests CC1/2 is the one data-quality indicator whose behaviour accurately reflects which of the alternative data-handling strategies results in the best-quality refined model. Its properties in the presence of systematic error are documented and discussed.
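As a rough illustration of how the half-dataset correlation and the derived quantity CC* relate, here is a minimal Python sketch: repeated measurements of each unique reflection are split at random into two half-datasets, the Pearson correlation of the half-set means gives CC1/2, and CC* = sqrt(2·CC1/2 / (1 + CC1/2)). The function name and the input layout (a mapping from reflection id to its observed intensities) are hypothetical; the CC* relation follows the definition given by Karplus & Diederichs.

```python
import numpy as np

def cc_half_and_cc_star(measurements, rng=None):
    """Estimate CC1/2 from random half-datasets and convert it to CC*.

    `measurements` maps a unique reflection id to a list of its repeated
    intensity observations (multiplicity >= 2 assumed for every reflection).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    half1, half2 = [], []
    for obs in measurements.values():
        obs = rng.permutation(obs)          # random split of repeated measurements
        mid = len(obs) // 2
        half1.append(np.mean(obs[:mid]))
        half2.append(np.mean(obs[mid:]))
    cc_half = np.corrcoef(half1, half2)[0, 1]
    cc_star = np.sqrt(2.0 * cc_half / (1.0 + cc_half))
    return cc_half, cc_star

# Hypothetical usage: four reflections, each measured four times.
rng = np.random.default_rng(7)
data = {h: list(10.0 * (h + 1) + rng.normal(scale=1.0, size=4)) for h in range(4)}
print(cc_half_and_cc_star(data))
```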


EDU-KATA ◽  
2019 ◽  
Vol 5 (2) ◽  
pp. 101-110
Author(s):  
Iva Mursida

The purpose of this research is to describe the development process of Indonesian-language teaching materials containing character values using a problem-based learning model, developed with the four-D development model, and to determine the quality of the resulting materials. Development quality comprises validity, effectiveness, and practicality. The product of this research and development is a set of Indonesian-language teaching materials containing character values built on a problem-based learning model. The development process followed the four-D model with its four stages: 1. Define, 2. Design, 3. Develop, 4. Disseminate. The development quality data consist of validity, effectiveness, and practicality data. The results of this study cover the development process and the development quality data. For validity, the materials obtained a total score of 90.5%. For practicality, they obtained excellent results, with lesson-plan (RPP) implementation at 88%, student response at 87%, and teacher response at 88%. For effectiveness, student activities scored 85% and teacher activities scored 80%.


Author(s):  
Tor Guimaraes ◽  
Youngohc Yoon ◽  
Peter Aiken

The importance of properly managing the quality of organizational data resources is widely recognized. A metadata framework is presented as the critical tool in addressing the necessary requirements to ensure data quality. This is particularly useful in increasingly encountered complex situations where data usage crosses system boundaries. The basic concept of metadata quality as a foundation for data quality engineering is discussed, as well as an extended data life cycle model consisting of eight phases: metadata creation, metadata structuring, metadata refinement, data creation, data utilization, data assessment, data refinement, and data manipulation. This extended model will enable further development of life cycle phase-specific data quality engineering methods. The paper also expands the concept of applicable data quality dimensions, presenting data quality as a function of four distinct components: data value quality, data representation quality, data model quality, and data architecture quality. Each of these, in turn, is described in terms of specific data quality attributes.
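To make the framework's structure concrete, here is a small, hypothetical Python sketch that encodes the eight life cycle phases and the four data quality components as enumerations; the names come directly from the abstract, while the code organization is an illustrative assumption rather than anything proposed by the authors.

```python
from enum import Enum

class LifeCyclePhase(Enum):
    """The eight phases of the extended data life cycle described above."""
    METADATA_CREATION = 1
    METADATA_STRUCTURING = 2
    METADATA_REFINEMENT = 3
    DATA_CREATION = 4
    DATA_UTILIZATION = 5
    DATA_ASSESSMENT = 6
    DATA_REFINEMENT = 7
    DATA_MANIPULATION = 8

class QualityDimension(Enum):
    """The four components of data quality listed in the abstract."""
    DATA_VALUE = "data value quality"
    DATA_REPRESENTATION = "data representation quality"
    DATA_MODEL = "data model quality"
    DATA_ARCHITECTURE = "data architecture quality"
```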


2012 ◽  
Author(s):  
Nurul A. Emran ◽  
Noraswaliza Abdullah ◽  
Nuzaimah Mustafa

2013 ◽  
pp. 97-116 ◽  
Author(s):  
A. Apokin

The author compares several quantitative and qualitative approaches to forecasting in order to find appropriate methods for incorporating technological change in long-range forecasts of the world economy. A number of long-run forecasts (with horizons over 10 years) for the world economy and national economies are reviewed to outline the advantages and drawbacks of different ways of accounting for technological change. The approaches are compared with respect to their sensitivity to data quality and their robustness to model misspecification, and recommendations are offered on the choice of an appropriate technique for long-run forecasts of the world economy in the presence of technological change.


2019 ◽  
Vol 10 (2) ◽  
pp. 117-125
Author(s):  
Dana Kubíčková ◽  
Vladimír Nulíček

The aim of the research project carried out at the University of Finance and Administration is to construct a new bankruptcy model. The intention is to use data from firms that had to cease their activities due to bankruptcy. The most common method for bankruptcy model construction is multivariate discriminant analysis (MDA). It makes it possible to derive the indicators most sensitive to future company failure as parts of the bankruptcy model. One of the assumptions for using the MDA method and ensuring reliable results is the normal distribution and independence of the input data. The results of verifying this assumption, the third stage of the project, are presented in this article. We found that this assumption is met only for a few selected indicators. Better results were achieved for the indicators in the set of prosperous companies and one year prior to failure. The selected indicators intended for the bankruptcy model construction therefore cannot be considered suitable for use with the MDA method.
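As a hedged illustration of the kind of normality check this verification stage implies, the sketch below screens candidate financial indicators with the Shapiro-Wilk test before they would be passed to MDA. The indicator names, the 5% significance level, and the simulated data are illustrative assumptions, not the paper's actual procedure or data.

```python
import numpy as np
from scipy.stats import shapiro

def normality_screen(indicator_matrix, names, alpha=0.05):
    """Shapiro-Wilk screen of candidate financial indicators.

    Returns the names of indicators whose distribution is not significantly
    non-normal at level `alpha`, i.e. those that (by this test alone) do not
    violate the MDA normality assumption.
    """
    keep = []
    for j, name in enumerate(names):
        _stat, p_value = shapiro(indicator_matrix[:, j])
        if p_value >= alpha:
            keep.append(name)
    return keep

# Hypothetical usage with simulated data for three indicators.
rng = np.random.default_rng(1)
X = np.column_stack([rng.normal(size=200),      # roughly normal
                     rng.lognormal(size=200),   # skewed, should be rejected
                     rng.normal(size=200)])
print(normality_screen(X, ["ROA", "current_ratio", "debt_ratio"]))
```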

