From Data Quality to Model Quality

Author(s):  
Tianxing He ◽  
Shengcheng Yu ◽  
Ziyuan Wang ◽  
Jieqiong Li ◽  
Zhenyu Chen
Author(s):  
Victor S. Sheng ◽  
Jing Zhang

With crowdsourcing systems, labels can be obtained at low cost, which facilitates the creation of training sets for prediction model learning. However, the labels obtained from crowdsourcing are often imperfect, which poses great challenges for model learning. Since 2008, the machine learning community has noticed the great opportunities brought by crowdsourcing and has developed a large number of techniques to deal with the inaccuracy, randomness, and uncertainty issues that arise when learning with crowdsourced data. This paper summarizes the technical progress in this field over the past eleven years. We focus on two fundamental issues: data (label) quality and prediction model quality. For data quality, we summarize ground truth inference methods and some machine-learning-based methods that further improve data quality. For prediction model quality, we summarize several learning paradigms developed for the crowdsourcing scenario. Finally, we discuss several promising future research directions to attract researchers to make contributions in crowdsourcing.
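As an illustration of the ground truth inference methods this survey covers, the sketch below implements majority voting, the simplest label-aggregation baseline. The function name, the (item, worker, label) triple layout, and the example votes are assumptions made for this sketch rather than the paper's own code; the survey also discusses more sophisticated probabilistic estimators.

```python
from collections import Counter, defaultdict

def majority_vote(annotations):
    """Infer one integrated label per item from crowdsourced (item, worker, label) triples.

    This is only the simplest ground-truth-inference baseline; EM-style
    estimators that also model worker reliability are common alternatives.
    """
    labels_per_item = defaultdict(list)
    for item, _worker, label in annotations:
        labels_per_item[item].append(label)
    # Pick the most frequent label for each item.
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in labels_per_item.items()}

# Hypothetical usage: three workers label two items.
votes = [("q1", "w1", 1), ("q1", "w2", 1), ("q1", "w3", 0),
         ("q2", "w1", 0), ("q2", "w2", 0), ("q2", "w3", 0)]
print(majority_vote(votes))  # {'q1': 1, 'q2': 0}
```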


2013 ◽  
Vol 69 (7) ◽  
pp. 1215-1222 ◽  
Author(s):  
K. Diederichs ◽  
P. A. Karplus

In macromolecular X-ray crystallography, typical data sets have substantial multiplicity. This can be used to calculate the consistency of repeated measurements and thereby assess data quality. Recently, the properties of a correlation coefficient, CC1/2, that can be used for this purpose were characterized and it was shown that CC1/2 has superior properties compared with 'merging' R values. A derived quantity, CC*, links data and model quality. Using experimental data sets, the behaviour of CC1/2 and the more conventional indicators were compared in two situations of practical importance: merging data sets from different crystals and selectively rejecting weak observations or (merged) unique reflections from a data set. In these situations controlled 'paired-refinement' tests show that even though discarding the weaker data leads to improvements in the merging R values, the refined models based on these data are of lower quality. These results show the folly of such data-filtering practices aimed at improving the merging R values. Interestingly, in all of these tests CC1/2 is the one data-quality indicator whose behaviour accurately reflects which of the alternative data-handling strategies results in the best-quality refined model. Its properties in the presence of systematic error are documented and discussed.
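As a rough illustration of how the half-dataset correlation and the derived quantity CC* relate, here is a minimal Python sketch: repeated measurements of each unique reflection are split at random into two half-datasets, the Pearson correlation of the half-set means gives CC1/2, and CC* = sqrt(2·CC1/2 / (1 + CC1/2)). The function name and the input layout (a mapping from reflection id to its observed intensities) are hypothetical; the CC* relation follows the definition given by Karplus & Diederichs.

```python
import numpy as np

def cc_half_and_cc_star(measurements, rng=None):
    """Estimate CC1/2 from random half-datasets and convert it to CC*.

    `measurements` maps a unique reflection id to a list of its repeated
    intensity observations (multiplicity >= 2 assumed for every reflection).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    half1, half2 = [], []
    for obs in measurements.values():
        obs = rng.permutation(obs)          # random split of repeated measurements
        mid = len(obs) // 2
        half1.append(np.mean(obs[:mid]))
        half2.append(np.mean(obs[mid:]))
    cc_half = np.corrcoef(half1, half2)[0, 1]
    cc_star = np.sqrt(2.0 * cc_half / (1.0 + cc_half))
    return cc_half, cc_star

# Hypothetical usage: four reflections, each measured four times.
rng = np.random.default_rng(7)
data = {h: list(10.0 * (h + 1) + rng.normal(scale=1.0, size=4)) for h in range(4)}
print(cc_half_and_cc_star(data))
```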


EDU-KATA ◽  
2019 ◽  
Vol 5 (2) ◽  
pp. 101-110
Author(s):  
Iva Mursida

The purpose of this research is to describe the development process of Indonesian-language teaching materials containing character values using a problem-based learning model, developed with the four-D development model, and to determine the quality of the resulting materials. Development quality comprises validity, effectiveness, and practicality. The product of this research and development is a set of Indonesian-language teaching materials containing character values built on a problem-based learning model. The development process followed the four-D model with its four stages: 1. Define, 2. Design, 3. Develop, 4. Disseminate. The development quality data consist of validity, effectiveness, and practicality data. The results of this study cover the development process and the development quality data. For validity, the materials obtained a total score of 90.5%. For practicality, they obtained excellent results, with lesson-plan (RPP) implementation at 88%, student response at 87%, and teacher response at 88%. For effectiveness, student activities scored 85% and teacher activities scored 80%.


Author(s):  
Tor Guimaraes ◽  
Youngohc Yoon ◽  
Peter Aiken

The importance of properly managing the quality of organizational data resources is widely recognized. A metadata framework is presented as the critical tool in addressing the necessary requirements to ensure data quality. This is particularly useful in increasingly encountered complex situations where data usage crosses system boundaries. The basic concept of metadata quality as a foundation for data quality engineering is discussed, as well as an extended data life cycle model consisting of eight phases: metadata creation, metadata structuring, metadata refinement, data creation, data utilization, data assessment, data refinement, and data manipulation. This extended model will enable further development of life cycle phase-specific data quality engineering methods. The paper also expands the concept of applicable data quality dimensions, presenting data quality as a function of four distinct components: data value quality, data representation quality, data model quality, and data architecture quality. Each of these, in turn, is described in terms of specific data quality attributes.
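To make the framework's structure concrete, here is a small, hypothetical Python sketch that encodes the eight life cycle phases and the four data quality components as enumerations; the names come directly from the abstract, while the code organization is an illustrative assumption rather than anything proposed by the authors.

```python
from enum import Enum

class LifeCyclePhase(Enum):
    """The eight phases of the extended data life cycle described above."""
    METADATA_CREATION = 1
    METADATA_STRUCTURING = 2
    METADATA_REFINEMENT = 3
    DATA_CREATION = 4
    DATA_UTILIZATION = 5
    DATA_ASSESSMENT = 6
    DATA_REFINEMENT = 7
    DATA_MANIPULATION = 8

class QualityDimension(Enum):
    """The four components of data quality listed in the abstract."""
    DATA_VALUE = "data value quality"
    DATA_REPRESENTATION = "data representation quality"
    DATA_MODEL = "data model quality"
    DATA_ARCHITECTURE = "data architecture quality"
```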


2012 ◽  
Author(s):  
Nurul A. Emran ◽  
Noraswaliza Abdullah ◽  
Nuzaimah Mustafa

2013 ◽  
pp. 97-116 ◽  
Author(s):  
A. Apokin

The author compares several quantitative and qualitative approaches to forecasting in order to find appropriate methods for incorporating technological change in long-range forecasts of the world economy. A number of long-run forecasts (with horizons over 10 years) for the world economy and national economies are reviewed to outline the advantages and drawbacks of different ways of accounting for technological change. The approaches are compared with respect to their sensitivity to data quality and their robustness to model misspecification, and recommendations are offered on the choice of an appropriate technique for long-run forecasts of the world economy in the presence of technological change.


2019 ◽  
Vol 10 (2) ◽  
pp. 117-125
Author(s):  
Dana Kubíčková ◽  
Vladimír Nulíček

The aim of the research project carried out at the University of Finance and Administration is to construct a new bankruptcy model. The intention is to use data from firms that had to cease their activities due to bankruptcy. The most common method for bankruptcy model construction is multivariate discriminant analysis (MDA). It makes it possible to derive the indicators most sensitive to future company failure as parts of the bankruptcy model. One of the assumptions for using the MDA method and ensuring reliable results is the normal distribution and independence of the input data. The results of verifying this assumption, the third stage of the project, are presented in this article. We found that this assumption is met only for a few selected indicators. Better results were achieved for the indicators in the set of prosperous companies and one year prior to failure. The selected indicators intended for the bankruptcy model construction therefore cannot be considered suitable for use with the MDA method.
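As a hedged illustration of the kind of normality check this verification stage implies, the sketch below screens candidate financial indicators with the Shapiro-Wilk test before they would be passed to MDA. The indicator names, the 5% significance level, and the simulated data are illustrative assumptions, not the paper's actual procedure or data.

```python
import numpy as np
from scipy.stats import shapiro

def normality_screen(indicator_matrix, names, alpha=0.05):
    """Shapiro-Wilk screen of candidate financial indicators.

    Returns the names of indicators whose distribution is not significantly
    non-normal at level `alpha`, i.e. those that (by this test alone) do not
    violate the MDA normality assumption.
    """
    keep = []
    for j, name in enumerate(names):
        _stat, p_value = shapiro(indicator_matrix[:, j])
        if p_value >= alpha:
            keep.append(name)
    return keep

# Hypothetical usage with simulated data for three indicators.
rng = np.random.default_rng(1)
X = np.column_stack([rng.normal(size=200),      # roughly normal
                     rng.lognormal(size=200),   # skewed, should be rejected
                     rng.normal(size=200)])
print(normality_screen(X, ["ROA", "current_ratio", "debt_ratio"]))
```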

