Statistical Matching
Recently Published Documents

Total documents: 106 (last five years: 29)
H-index: 13 (last five years: 1)

2021, pp. 1-37
Author(s): Felix Bittmann, Alexander Tekles, Lutz Bornmann

Abstract: Controlling for confounding factors is one of the central aspects of quantitative research. While methods like linear regression models are common, their results can be misleading under certain conditions. We demonstrate how statistical matching can be utilized as an alternative that enables the inspection of post-matching covariate balance. This contribution serves as an empirical demonstration of matching in bibliometrics and discusses its advantages and potential pitfalls. We propose matching as an easy-to-use approach in bibliometrics for estimating effects and removing bias. To exemplify matching, we use data on papers published in Physical Review E and a selection classified as milestone papers. We analyze whether milestone papers score higher than non-milestone papers on a proposed class of indicators for measuring disruptiveness. We consider the disruption indicators DI1, DI5, DI1n, DI5n, and DEP and test which of them performs best, based on the assumption that milestone papers should have higher disruption indicator values than non-milestone papers. Four matching algorithms (propensity score matching (PSM), coarsened exact matching (CEM), entropy balancing (EB), and inverse probability of treatment weighting (IPTW)) are compared. We find that CEM and EB perform best with regard to covariate balance, and that DI5 and DEP perform well for evaluating the disruptiveness of published papers.
Peer review: https://publons.com/publon/10.1162/qss_a_00158
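To make the matching step concrete, the sketch below shows a minimal propensity score matching workflow in Python, assuming an illustrative DataFrame of papers with a binary milestone flag, a few covariates, and a DI5-style outcome column; it is not the authors' implementation, merely one plausible way to run PSM with scikit-learn.

```python
# Minimal propensity score matching (PSM) sketch. The DataFrame `papers`,
# the `milestone` flag, the covariates, and the `di5` outcome column are
# all illustrative names, not the authors' actual data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def psm_effect(papers: pd.DataFrame, covariates, treatment="milestone", outcome="di5"):
    X = papers[covariates].to_numpy()
    t = papers[treatment].to_numpy()
    y = papers[outcome].to_numpy()

    # 1. Estimate propensity scores P(treated | covariates).
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]

    # 2. Match each treated (milestone) unit to its nearest control
    #    on the propensity score.
    treated, control = np.where(t == 1)[0], np.where(t == 0)[0]
    nn = NearestNeighbors(n_neighbors=1).fit(ps[control].reshape(-1, 1))
    _, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
    matched_control = control[idx.ravel()]

    # 3. Average treated-minus-matched-control difference in the outcome,
    #    a rough analogue of the milestone vs. non-milestone comparison.
    return float(np.mean(y[treated] - y[matched_control]))
```

After matching, covariate balance can be checked, for instance via standardized mean differences between milestone papers and their matched controls, which is the post-matching inspection the abstract highlights.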


2021, pp. 1-20
Author(s): Israa Lewaa, Mai Sherif Hafez, Mohamed Ali Ismail

In the era of the data revolution, the sheer availability of data is a resource that should be exploited. Rather than conducting new surveys, researchers can benefit from data that already exist. As enormous amounts of data become available, it is becoming essential to undertake research that integrates data from multiple sources in order to make the best use of them. Statistical Data Integration (SDI) is the statistical tool for this task. SDI can be used to integrate data files that share common units, and it also allows unrelated files with no common units to be merged; the appropriate integration method is determined by the nature of the input data. SDI has two main methods: Record Linkage (RL) and Statistical Matching (SM). SM techniques typically aim to build a complete data file from different sources that do not contain the same units. This paper aims at giving a complete overview of existing SM methods, both classical and recent, in order to provide a unified summary of the various SM techniques along with their drawbacks. Directions for future research are suggested at the end of the paper.
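As a rough illustration of what an SM procedure does, the following Python sketch performs distance hot-deck matching: a recipient file observing common variables X and a target Y is completed with a variable Z taken from the nearest donor record. All data frame and column names are assumptions for the example, not part of the paper.

```python
# Illustrative distance hot-deck statistical matching sketch: the recipient
# file observes the common variables and Y, the donor file observes the
# common variables and Z; Z is imputed into the recipient from the nearest
# donor on the common variables.
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

def hot_deck_match(recipient: pd.DataFrame, donor: pd.DataFrame, common, z_vars):
    # Standardize the common variables so distances are comparable across scales.
    scaler = StandardScaler().fit(pd.concat([recipient[common], donor[common]]))
    rec_X = scaler.transform(recipient[common])
    don_X = scaler.transform(donor[common])

    # For each recipient record, find the closest donor record.
    _, idx = NearestNeighbors(n_neighbors=1).fit(don_X).kneighbors(rec_X)

    # Attach the donor's Z variables to the recipient file.
    fused = recipient.copy()
    fused[z_vars] = donor.iloc[idx.ravel()][z_vars].to_numpy()
    return fused
```

Like most classical SM methods, such a fusion implicitly relies on the conditional independence of Y and Z given the common variables.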


2021, pp. 1-11
Author(s): Riccardo D’Allerto, Meri Raggi

Big Data and the ‘Internet of Things’ are transforming the processes of data collection, storage and use. The relationship between data collected first hand (primary data) and data collected by someone else (secondary data) is becoming more fluid, and new possibilities for data collection are emerging. Data integration has established itself as a reliable strategy for overcoming data shortage and other challenges such as data coverage, quality, time misalignment and representativeness. When two (or more) data sources have units that do not overlap (or overlap only partially) and/or lack unique unit identifiers, the different information they collect can be integrated using Micro Statistical Matching (MiSM). MiSM has been used in the social sciences, politics and economics, but very few applications use agricultural and farm data. We present an example of MiSM data integration between primary and secondary farm data on agricultural holdings in the Emilia-Romagna region (Italy). The novelty of the work lies in the fact that the integration is carried out with non-parametric MiSM, which is compared to predictive mean matching and Bayesian linear regression. Moreover, the matching validity is assessed with a new strategy. The main issues addressed, the lessons learned and the use of the approach in a research field characterised by critical data shortage are discussed.
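For orientation, here is a hedged sketch of predictive mean matching, one of the comparison methods mentioned in the abstract; it is written in Python with illustrative data frames and column names rather than the authors' actual farm data.

```python
# Rough predictive mean matching (PMM) sketch for fusing a single variable Z
# from a donor survey into a recipient survey via common covariates.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import NearestNeighbors

def predictive_mean_match(recipient: pd.DataFrame, donor: pd.DataFrame, common, z_var):
    # Fit Z ~ X on the donor file and predict the mean of Z for both files.
    model = LinearRegression().fit(donor[common], donor[z_var])
    donor_pred = model.predict(donor[common]).reshape(-1, 1)
    rec_pred = model.predict(recipient[common]).reshape(-1, 1)

    # Match on the predicted means, then donate the *observed* donor value,
    # which preserves the empirical distribution of Z better than plugging
    # in the regression prediction directly.
    _, idx = NearestNeighbors(n_neighbors=1).fit(donor_pred).kneighbors(rec_pred)
    fused = recipient.copy()
    fused[z_var] = donor[z_var].to_numpy()[idx.ravel()]
    return fused
```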


2021, Vol. 130, pp. 150-169
Author(s): Pier Luigi Conti, Daniela Marella, Paola Vicard, Vincenzina Vitale

Author(s): Monica Jamali-Phiri, Juba Alyce Kafumba, Malcolm MacLachlan, Emma M. Smith, Ikenna D. Ebuenyi, ...

2020, Vol. 36 (4), pp. 1175-1188
Author(s): Pierre Lamarche, Friderike Oehler, Irene Rioboo

Poverty indicators based purely on income statistics do not reflect the full picture of households' economic well-being. Consumption and wealth are two additional key dimensions that determine people's economic opportunities and material inequalities. We use non-parametric statistical matching methods to join consumption data from the Household Budget Survey to micro data from the European Union Statistics on Income and Living Conditions. In a second step, micro data from the Household Finance and Consumption Survey are joined to produce a common distribution of income, consumption and wealth variables. A variety of indicators, in particular household saving rates, is then produced from this joint data set. Care has to be taken when interpreting the indicators, since the statistical matching rests on strong assumptions and on a limited number of variables common to all three original data sets. We are able to show, however, that the assumptions made are justified by the use of strong proxies as matching variables. Thus, the resulting indicators have the potential to contribute to the analysis of inequality patterns and to enhance the possibilities of social, and possibly fiscal, policy impact analysis.
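As an illustration of the indicator step, the sketch below computes household-level and weighted aggregate saving rates from a fused data set; the column names (disposable_income, consumption, hh_weight) and the exact definition are assumptions for the example, not the authors' specification.

```python
# Sketch of a saving-rate computation on a fused income-consumption data set.
# All column names are hypothetical.
import numpy as np
import pandas as pd

def household_saving_rates(fused: pd.DataFrame) -> pd.Series:
    # Household-level saving rate: share of disposable income not consumed.
    return (fused["disposable_income"] - fused["consumption"]) / fused["disposable_income"]

def aggregate_saving_rate(fused: pd.DataFrame) -> float:
    # Weighted aggregate saving rate over the fused sample.
    income = np.average(fused["disposable_income"], weights=fused["hh_weight"])
    consumption = np.average(fused["consumption"], weights=fused["hh_weight"])
    return 1.0 - consumption / income
```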

