Reconciliation of inconsistent data sources by correction for measurement error: The feasibility of parameter re-use

2018 ◽  
Vol 34 (3) ◽  
pp. 317-329 ◽  
Author(s):  
Paulina Pankowska ◽  
Bart Bakker ◽  
Daniel L. Oberski ◽  
Dimitris Pavlopoulos


2021 ◽  
pp. 1-22
Author(s):  
Emily Berg ◽  
Jongho Im ◽  
Zhengyuan Zhu ◽  
Colin Lewis-Beck ◽  
Jie Li

Statistical and administrative agencies often collect information on related parameters. Discrepancies between estimates from distinct data sources can arise from differences in definitions, reference periods, and data collection protocols. Integrating statistical data with administrative data is appealing because it saves data collection costs, reduces respondent burden, and improves the coherence of estimates produced by statistical and administrative agencies. Model-based techniques for combining multiple data sources, such as small area estimation and measurement error models, offer the benefits of transparency, reproducibility, and the ability to quantify estimation uncertainty. Issues associated with integrating statistical data with administrative data are discussed in the context of data from Namibia. The national statistical agency in Namibia produces estimates of crop area using data from probability samples. Simultaneously, the Namibia Ministry of Agriculture, Water, and Forestry obtains crop area estimates through extension programs. We illustrate the use of a structural measurement error model to synthesize the administrative and survey data into a unified estimate of crop area. Limitations of the available data preclude a genuine, thorough application; nonetheless, the illustration of the methodology holds potential use for the general practitioner.
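As a concrete, heavily simplified illustration of how two error-prone sources might be synthesized, the sketch below combines a survey estimate and an administrative estimate of the same latent crop area by inverse-variance weighting. This is a special case of a measurement error model in which both sources are assumed unbiased with known error variances; it is not the authors' actual specification, and all figures are hypothetical.

```python
# Minimal sketch: precision-weighted combination of two error-prone
# measurements of the same latent quantity (e.g., true crop area).
# Simplified special case of a structural measurement error model, NOT
# the authors' specification; variances would normally be estimated.
import math

def composite_estimate(y_survey, var_survey, y_admin, var_admin):
    """Inverse-variance weighted combination of a survey estimate and an
    administrative estimate, both assumed unbiased for the true value."""
    w_survey = 1.0 / var_survey
    w_admin = 1.0 / var_admin
    estimate = (w_survey * y_survey + w_admin * y_admin) / (w_survey + w_admin)
    variance = 1.0 / (w_survey + w_admin)
    return estimate, variance

# Hypothetical crop-area figures (thousand hectares) and error variances.
est, var = composite_estimate(y_survey=262.0, var_survey=18.0,
                              y_admin=240.0, var_admin=30.0)
print(f"combined estimate: {est:.1f}, standard error: {math.sqrt(var):.1f}")
```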


Author(s):  
Lihua Lu ◽  
Hengzhen Zhang ◽  
Xiao-Zhi Gao

Purpose – Data integration combines data residing at different sources and provides users with a unified interface to these data. An important issue in data integration is the existence of conflicts among the different data sources. Data sources may conflict with each other at the data level, which is defined as data inconsistency. This paper addresses this problem and proposes a solution for data inconsistency in data integration. Design/methodology/approach – A relational data model extended with data source quality criteria is first defined. Then, based on the proposed data model, a strategy for resolving data inconsistency is provided. To implement the strategy, a fuzzy multi-attribute decision-making (MADM) approach based on the data source quality criteria is applied to obtain the results. Finally, user feedback strategies are proposed to optimize the result of the fuzzy MADM approach into the final resolution of the inconsistent data. Findings – To evaluate the proposed method, data obtained from sensors are extracted. Experiments are designed and performed to demonstrate the effectiveness of the proposed strategy. The results substantiate that the solution performs better than the other methods on correctness, time cost, and stability. Practical implications – Since inconsistent data collected from sensors are pervasive, the proposed method can mitigate this problem and correct wrong choices to some extent. Originality/value – This paper is the first to study the effect of user feedback on integration results for inconsistent data.
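To make the ranking idea concrete, the sketch below resolves a data-level conflict with a crisp simple additive weighting (SAW) scheme over source quality criteria. It is only a simplified stand-in for the paper's fuzzy MADM approach with user feedback; the criteria, weights, and sensor readings are hypothetical.

```python
# Minimal sketch: resolve a data-level conflict by ranking sources with
# simple additive weighting (SAW) over quality criteria. The paper uses a
# fuzzy MADM approach refined by user feedback; this crisp version only
# illustrates the ranking idea. Criteria, weights, and readings are made up.

CRITERIA_WEIGHTS = {"accuracy": 0.5, "timeliness": 0.3, "completeness": 0.2}

def source_score(quality: dict) -> float:
    """Weighted sum of a source's normalized (0-1) quality ratings."""
    return sum(CRITERIA_WEIGHTS[c] * quality[c] for c in CRITERIA_WEIGHTS)

def resolve_conflict(readings: dict, qualities: dict):
    """Keep the value reported by the highest-scoring source."""
    best = max(readings, key=lambda src: source_score(qualities[src]))
    return best, readings[best]

readings = {"sensor_A": 21.4, "sensor_B": 25.1}   # conflicting values
qualities = {
    "sensor_A": {"accuracy": 0.9, "timeliness": 0.6, "completeness": 0.8},
    "sensor_B": {"accuracy": 0.7, "timeliness": 0.9, "completeness": 0.7},
}
print(resolve_conflict(readings, qualities))       # ('sensor_A', 21.4)
```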


2017 ◽  
Author(s):  
Marko Bachl ◽  
Michael Scharkow

Linkage analysis is a sophisticated media effects research design that reconstructs individual survey respondents' likely exposure to relevant media messages by complementing the survey data with a content analysis. It is an important improvement over survey-only designs: instead of predicting some outcome of interest by media use and implicitly assuming what kind of media messages the respondents were exposed to, linkage analysis explicitly takes the media messages into account (de Vreese & Neijens, 2016; Scharkow & Bachl, 2017; Schuck, Vliegenthart, & de Vreese, 2016; Shoemaker & Reese, 1990; Slater, 2016; Valkenburg & Peter, 2013). The design in its modern form was pioneered by Miller, Goldenberg, and Erbring (1979) and is today considered a “state-of-the-art analysis of the impact of specific news consumption” (Fazekas & Larsen, 2015, p. 196). Its widespread use, especially in the field of political communication, and its still increasing popularity demonstrate the relevance of the design. The main advantage of a linkage analysis is the use of one or more message exposure variables that combine information about media use and media content. However, both constitutive sources are often measured with error: survey respondents are not very good at reporting their media use reliably, and coders will often make some errors when classifying the relevant messages. In this article, we first give a short overview of the prevalence and consequences of measurement error in both data sources. The arguments are based on a literature review and a simulation study that are published elsewhere in full detail (Scharkow & Bachl, 2017). We continue with a discussion of possible remedies in measurement and data analysis. Beyond the obvious need to improve the measures themselves, we highlight the importance of serious diagnostics of measurement quality. Such information can then be incorporated into the data analysis using estimation or imputation approaches, which are introduced in the main section of this chapter. We conclude by noting that (1) the improvement of measurements and the diagnosis of measurement error in both parts of a linkage analysis must be taken seriously; (2) many tools for correcting measurement error in single parts of a linkage analysis already exist and should be used; and (3) methodological research is needed to develop an integrated analysis workflow that accounts for measurement error and uncertainty in both data sources.
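As a small illustration of why such corrections matter, the sketch below simulates classical measurement error in an exposure-style variable, shows the resulting attenuation of a regression slope, and applies the textbook correction of dividing the naive estimate by the (assumed known) reliability. It is a generic errors-in-variables example, not the workflow proposed in the chapter; all quantities are made up.

```python
# Minimal simulation: classical measurement error in an exposure variable
# attenuates the regression slope; dividing the naive estimate by the
# (assumed known) reliability recovers it. Illustrates one standard
# correction only; all numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
true_exposure = rng.normal(0, 1, n)                 # true message exposure
outcome = 0.5 * true_exposure + rng.normal(0, 1, n)

error_var = 0.5                                     # variance of classical error
observed = true_exposure + rng.normal(0, np.sqrt(error_var), n)
reliability = 1.0 / (1.0 + error_var)               # var(true) / var(observed)

naive_slope = np.cov(observed, outcome)[0, 1] / np.var(observed, ddof=1)
corrected_slope = naive_slope / reliability

print(f"true slope 0.50 | naive {naive_slope:.2f} | corrected {corrected_slope:.2f}")
```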


2005 ◽  
Vol 38 (8) ◽  
pp. 939-970 ◽  
Author(s):  
Kirk Bowman ◽  
Fabrice Lehoucq ◽  
James Mahoney

Recent writings on the measurement of political democracy offer sophisticated discussions of problems of conceptualization, operationalization, and aggregation. Yet they have less to say about the error that derives from the use of inaccurate, partial, or misleading data sources. Drawing on evidence from five Central American countries, the authors show that this data-induced measurement error compromises the validity of the principal long-term cross-national scales of democracy. They call for an approach to index construction that relies on case expertise and a wide range of data sources, and they employ this approach in developing an index of political democracy for the Central American countries during the 20th century. The authors’ index draws on a comprehensive set of secondary and primary sources while rigorously pursuing standards of conceptualization, operationalization, and aggregation. The index’s value is illustrated by showing how it suggests new lines of research in the field of Central American politics.


2017 ◽  
Vol 7 (2) ◽  
pp. 367-384 ◽  
Author(s):  
Max Gallop ◽  
Simon Weschle

Many commonly used data sources in the social sciences suffer from non-random measurement error, understood as mis-measurement of a variable that is systematically related to another variable. We argue that studies relying on potentially suspect data should take the threat this poses to inference seriously and address it routinely in a principled manner. In this article, we aid researchers in this task by introducing a sensitivity analysis approach to non-random measurement error. The method can be used with any type of data or statistical model, is simple to execute, and is straightforward to communicate. This makes it possible for researchers to routinely report the robustness of their inference to the presence of non-random measurement error. We demonstrate the sensitivity analysis approach by applying it to two recent studies.
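A generic sketch of the underlying idea: inject error into a predictor whose magnitude is systematically related to the outcome, sweep the strength of that relationship, and track how the estimated coefficient moves. This is only an illustration of a sensitivity analysis for non-random measurement error, not the authors' exact procedure, and all quantities are hypothetical.

```python
# Minimal sensitivity-analysis sketch for non-random measurement error:
# re-estimate a regression slope while injecting error into the predictor
# whose size grows with the outcome, sweeping the dependence strength
# (delta). Generic illustration only; all quantities are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
x = rng.normal(0, 1, n)
y = 1.0 * x + rng.normal(0, 1, n)

def slope(x_obs, y_obs):
    """OLS slope of y on a single (mismeasured) predictor."""
    return np.cov(x_obs, y_obs)[0, 1] / np.var(x_obs, ddof=1)

for delta in (0.0, 0.2, 0.4, 0.6):
    # Error in x that is systematically related to y (non-random).
    x_obs = x + delta * y + rng.normal(0, 0.3, n)
    print(f"delta={delta:.1f}  estimated slope={slope(x_obs, y):.2f}")
```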


Author(s):  
Munesh Chandra Trivedi ◽  
Virendra Kumar Yadav ◽  
Avadhesh Kumar Gupta

A data warehouse generally contains both historical and current data from various data sources. In computing, a data warehouse can be defined as a system created for the analysis and reporting of both types of data. These analysis reports are then used by an organization to make decisions that support its growth. Construction of a data warehouse appears simple: collecting data from the data sources into one place (after extraction, transformation, and loading). However, construction involves several issues, such as inconsistent data, logic conflicts, user acceptance, cost, quality, security, stakeholder contradictions, REST alignment, etc. These issues need to be overcome; otherwise they will lead to unfortunate consequences affecting the organization's growth. The proposed model tries to solve issues such as REST alignment and stakeholder contradictions by involving experts from various domains (technical, analytical, decision makers, management representatives, etc.) during the initialization phase to better understand the requirements, and by mapping these requirements to data sources during the design phase of the data warehouse.
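The sketch below is a minimal extract-transform-load (ETL) pass over two hypothetical source systems, showing where inconsistent data surfaces during warehouse loading and how a naive reconciliation rule might flag it for stakeholder review. The source names, fields, and rule are assumptions for illustration, not part of the proposed model.

```python
# Minimal ETL sketch showing where inconsistency between sources surfaces
# during warehouse loading. Source systems, field names, and the
# reconciliation rule are all hypothetical.

def extract():
    """Pull the same customer record from two hypothetical source systems."""
    crm = {"customer_id": 17, "country": "DE", "revenue": 1200.0}
    billing = {"customer_id": 17, "country": "Germany", "revenue": 1180.0}
    return crm, billing

def transform(crm, billing):
    """Standardize codes and flag fields the sources disagree on."""
    country_map = {"Germany": "DE"}
    billing["country"] = country_map.get(billing["country"], billing["country"])
    conflicts = [k for k in crm if crm[k] != billing[k]]
    return {**billing, **crm}, conflicts   # naive rule: prefer CRM values

def load(record, conflicts, warehouse):
    """Append the unified record; keep conflicts for stakeholder review."""
    warehouse.append({"record": record, "unresolved": conflicts})

warehouse = []
load(*transform(*extract()), warehouse)
print(warehouse)   # one unified record, with 'revenue' flagged as unresolved
```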


2010 ◽  
Vol 69 (8) ◽  
pp. 779-799 ◽  
Author(s):  
Shichao Zhang ◽  
Qingfeng Chen ◽  
Qiang Yang

1999 ◽  
Vol 15 (2) ◽  
pp. 91-98 ◽  
Author(s):  
Lutz F. Hornke

Summary: Item parameters for several hundred items were estimated from empirical data on several thousand subjects. The logistic one-parameter (1PL) and two-parameter (2PL) model estimates were evaluated. However, model fit showed that only a subset of items complied sufficiently; these remaining items were assembled into well-fitting item banks. In several simulation studies, 5000 simulated responses were generated, along with person parameters, in accordance with a computerized adaptive testing (CAT) procedure. A general reliability of .80, or a standard error of measurement of .44, was used as a stopping rule to end testing. We also recorded how often each item was used across all simulees. Person-parameter estimates based on CAT correlated above .90 with the simulated true values. For all 1PL-fitting item banks, most simulees needed more than 20 but fewer than 30 items to reach the preset level of measurement error. However, testing based on item banks that complied with the 2PL revealed that, on average, only 10 items were sufficient to end testing at the same measurement error level. Both results clearly demonstrate the precision and economy of computerized adaptive testing. Empirical evaluations from everyday use will show whether these trends hold up in practice. If so, CAT will become feasible and reasonable with some 150 well-calibrated 2PL items.
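A compact sketch of the kind of adaptive testing loop described here, under the 2PL: administer the item with maximum Fisher information at the current ability estimate, update the estimate by EAP scoring over a grid, and stop once the standard error falls below .44 (roughly reliability .80). The item parameters are randomly generated for illustration and are not the calibrated banks from the study.

```python
# Sketch of a 2PL computerized adaptive test: pick the most informative
# item at the current ability estimate, update the estimate by EAP over a
# grid, and stop when the standard error drops below 0.44. Item parameters
# are randomly generated for illustration.
import numpy as np

rng = np.random.default_rng(7)
n_items = 150
a = rng.uniform(0.8, 2.0, n_items)        # discriminations
b = rng.normal(0.0, 1.0, n_items)         # difficulties
grid = np.linspace(-4, 4, 161)            # ability grid for EAP scoring
prior = np.exp(-0.5 * grid**2)            # standard normal prior (unnormalized)

def p2pl(theta, a_i, b_i):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a_i * (theta - b_i)))

def run_cat(true_theta, se_stop=0.44, max_items=40):
    posterior = prior.copy()
    available = list(range(n_items))
    theta_hat, se, used = 0.0, np.inf, 0
    while available and used < max_items and se > se_stop:
        # Item with maximum Fisher information at the current estimate.
        info = [a[i]**2 * p2pl(theta_hat, a[i], b[i]) *
                (1 - p2pl(theta_hat, a[i], b[i])) for i in available]
        item = available.pop(int(np.argmax(info)))
        # Simulate the response and update the posterior over the grid.
        resp = rng.random() < p2pl(true_theta, a[item], b[item])
        p = p2pl(grid, a[item], b[item])
        posterior *= p if resp else (1 - p)
        posterior /= posterior.sum()
        theta_hat = float(np.sum(grid * posterior))
        se = float(np.sqrt(np.sum((grid - theta_hat)**2 * posterior)))
        used += 1
    return theta_hat, se, used

print(run_cat(true_theta=0.5))   # (ability estimate, standard error, items used)
```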

