Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference

Although published works rarely include causal estimates from more than a few model specifications, authors usually choose the presented estimates from numerous trial runs readers never see. Given the often large variation in estimates across choices of control variables, functional forms, and other modeling assumptions, how can researchers ensure that the few estimates presented are accurate or representative? How do readers know that publications are not merely demonstrations that it ispossibleto find a specification that fits the author's favorite hypothesis? And how do we evaluate or even define statistical properties like unbiasedness or mean squared error when no unique model or estimator even exists? Matching methods, which offer the promise of causal inference with fewer assumptions, constitute one possible way forward, but crucial results in this fast-growing methodological literature are often grossly misinterpreted. We explain how to avoid these misinterpretations and propose a unified approach that makes it possible for researchers to preprocess data with matching (such as with the easy-to-use software we offer) and then to apply the best parametric techniques they would have used anyway. This procedure makes parametric models produce more accurate and considerably less model-dependent causal inferences.

Download Full-text

A new generic method to improve machine learning applications in official statistics

Statistical Journal of the IAOS ◽

10.3233/sji-210885 ◽

2021 ◽

pp. 1-16

Author(s):

Kevin Kloos

Keyword(s):

Machine Learning ◽

Mean Squared Error ◽

Statistical Properties ◽

Machine Learning Algorithms ◽

Official Statistics ◽

Academic Literature ◽

Misclassification Bias ◽

Squared Error ◽

Machine Learning Applications ◽

Applications Of Machine Learning

The use of machine learning algorithms at national statistical institutes has increased significantly over the past few years. Applications range from new imputation schemes to new statistical output based entirely on machine learning. The results are promising, but recent studies have shown that the use of machine learning in official statistics always introduces a bias, known as misclassification bias. Misclassification bias does not occur in traditional applications of machine learning and therefore it has received little attention in the academic literature. In earlier work, we have collected existing methods that are able to correct misclassification bias. We have compared their statistical properties, including bias, variance and mean squared error. In this paper, we present a new generic method to correct misclassification bias for time series and we derive its statistical properties. Moreover, we show numerically that it has a lower mean squared error than the existing alternatives in a wide variety of settings. We believe that our new method may improve machine learning applications in official statistics and we aspire that our work will stimulate further methodological research in this area.

Download Full-text

Statistical properties of sketching algorithms

Biometrika ◽

10.1093/biomet/asaa062 ◽

2020 ◽

Author(s):

D C Ahfock ◽

W J Astle ◽

S Richardson

Keyword(s):

Data Compression ◽

Mean Squared Error ◽

Signal To Noise Ratio ◽

Statistical Properties ◽

Generation Process ◽

Compression Technique ◽

Conditional Central Limit Theorem ◽

Squared Error ◽

Original Dataset ◽

Science Community

Summary Sketching is a probabilistic data compression technique that has been largely developed by the computer science community. Numerical operations on big datasets can be intolerably slow; sketching algorithms address this issue by generating a smaller surrogate dataset. Typically, inference proceeds on the compressed dataset. Sketching algorithms generally use random projections to compress the original dataset, and this stochastic generation process makes them amenable to statistical analysis. We argue that the sketched data can be modelled as a random sample, thus placing this family of data compression methods firmly within an inferential framework. In particular, we focus on the Gaussian, Hadamard and Clarkson–Woodruff sketches and their use in single-pass sketching algorithms for linear regression with huge samples. We explore the statistical properties of sketched regression algorithms and derive new distributional results for a large class of sketching estimators. A key result is a conditional central limit theorem for data-oblivious sketches. An important finding is that the best choice of sketching algorithm in terms of mean squared error is related to the signal-to-noise ratio in the source dataset. Finally, we demonstrate the theory and the limits of its applicability on two datasets.

Download Full-text

Causal Inference without Balance Checking: Coarsened Exact Matching

Political Analysis ◽

10.1093/pan/mpr013 ◽

2012 ◽

Vol 20 (1) ◽

pp. 1-24 ◽

Cited By ~ 1005

Author(s):

Stefano M. Iacus ◽

Gary King ◽

Giuseppe Porro

Keyword(s):

Causal Inference ◽

Open Source ◽

Open Source Software ◽

Statistical Properties ◽

Exact Matching ◽

Causal Inferences ◽

Coarsened Exact Matching ◽

Practical Applications ◽

Matching Methods ◽

Wide Range

We discuss a method for improving causal inferences called “Coarsened Exact Matching” (CEM), and the new “Monotonic Imbalance Bounding” (MIB) class of matching methods from which CEM is derived. We summarize what is known about CEM and MIB, derive and illustrate several new desirable statistical properties of CEM, and then propose a variety of useful extensions. We show that CEM possesses a wide range of statistical properties not available in most other matching methods but is at the same time exceptionally easy to comprehend and use. We focus on the connection between theoretical properties and practical applications. We also make available easy-to-use open source software forR, Stata, andSPSSthat implement all our suggestions.

Download Full-text

A $C_p$ criterion for semiparametric causal inference

Biometrika ◽

10.1093/biomet/asx054 ◽

2017 ◽

Vol 104 (4) ◽

pp. 845-861 ◽

Cited By ~ 3

Author(s):

Takamichi Baba ◽

Takayuki Kanemori ◽

Yoshiyuki Ninomiya

Keyword(s):

Causal Inference ◽

Mean Squared Error ◽

Real Data ◽

Information Criterion ◽

Potential Outcome ◽

Squared Error ◽

Doubly Robust Estimation ◽

Inverse Probability Weighted ◽

Doubly Robust ◽

Inverse Probability Weighted Estimation

Summary For marginal structural models, which play an important role in causal inference, we consider a model selection problem within a semiparametric framework using inverse-probability-weighted estimation or doubly robust estimation. In this framework, the modelling target is a potential outcome that may be missing, so there is no classical information criterion. We define a mean squared error for treating the potential outcome and derive an asymptotic unbiased estimator as a $C_{p}$ criterion using an ignorable treatment assignment condition. Simulation shows that the proposed criterion outperforms a conventional one by providing smaller squared errors and higher frequencies of selecting the true model in all the settings considered. Moreover, in a real-data analysis we found a clear difference between the two criteria.

Download Full-text

A UNIFIED APPROACH FOR GENERATING OPTIMUM GRADIENT FIR ADAPTIVE ALGORITHMS WITH TIME-VARYING CONVERGENCE FACTORS

Journal of Circuits System and Computers ◽

10.1142/s0218126691000203 ◽

1991 ◽

Vol 01 (01) ◽

pp. 19-42 ◽

Cited By ~ 8

Author(s):

WASFY B. MIKHAEL ◽

FRANK H. WU

Keyword(s):

Computational Complexity ◽

Mean Squared Error ◽

Adaptive Algorithms ◽

Time Varying ◽

Unified Approach ◽

Squared Error ◽

Wide Range ◽

The Mean ◽

Single Data ◽

And Performance

In this paper, a unified approach for generating fast block- and sequential-gradient LMS FIR tapped delay line (TDL) adaptive algorithms is presented. These algorithms employ time-varying convergence factors which are tailored for the adaptive filter coefficients and updated at each block or single data iteration. The convergence factors are chosen to minimize the mean squared error (MSE) and are easily computed from readily available signals. The general formulation leads to three classes of adaptive algorithms. These algorithms. ordered in a descending order of their computational complexity and performance. are: the optimum block adaptive algorithm with individual adaptation of parameters (OBAI), the optimum block adaptive (OBA) and OBA shifting (ODAS) algorithms, and the homogeneous adaptive (HA) algorithm. In this paper, it is shown how each class of algorithms is obtained from the previous one, by a simple trade-off between adaptation performance and computational complexity. Implementation aspects of the generated algorithms are examined and their performance is evaluated and compared with several recently proposed algorithms by means of computer simulations under a wide range of adaptation conditions. The evaluation results show that the generated algorithms have attractive features in the comparisons due to the considerable reduction in the number of iterations required for a given adaptation accuracy. The improvement, however. is achieved at the expense of a relatively modest increase in the number of computations per data sample.

Download Full-text

Use of reflectance spectroscopy to estimate the organic carbon and CaCO3 contents of soils

Agrokémia és Talajtan ◽

10.1556/agrokem.60.2012.2.5 ◽

2012 ◽

Vol 61 (2) ◽

pp. 277-290 ◽

Cited By ~ 1

Author(s):

Ádám Csorba ◽

Vince Láng ◽

László Fenyvesi ◽

Erika Michéli

Keyword(s):

Organic Carbon ◽

Least Squares ◽

Partial Least Squares ◽

Partial Least Squares Regression ◽

Mean Squared Error ◽

Reflectance Spectroscopy ◽

Least Squares Regression ◽

Root Mean Squared Error ◽

Squared Error

Napjainkban egyre nagyobb igény mutatkozik olyan technológiák és módszerek kidolgozására és alkalmazására, melyek lehetővé teszik a gyors, költséghatékony és környezetbarát talajadat-felvételezést és kiértékelést. Ezeknek az igényeknek felel meg a reflektancia spektroszkópia, mely az elektromágneses spektrum látható (VIS) és közeli infravörös (NIR) tartományában (350–2500 nm) végzett reflektancia-mérésekre épül. Figyelembe véve, hogy a talajokról felvett reflektancia spektrum információban nagyon gazdag, és a vizsgált tartományban számos talajalkotó rendelkezik karakterisztikus spektrális „ujjlenyomattal”, egyetlen görbéből lehetővé válik nagyszámú, kulcsfontosságú talajparaméter egyidejű meghatározása. Dolgozatunkban, a reflektancia spektroszkópia alapjaira helyezett, a talajok ösz-szetételének meghatározását célzó módszertani fejlesztés első lépéseit mutatjuk be. Munkánk során talajok szervesszén- és CaCO3-tartalmának megbecslését lehetővé tévő többváltozós matematikai-statisztikai módszerekre (részleges legkisebb négyzetek módszere, partial least squares regression – PLSR) épülő prediktív modellek létrehozását és tesztelését végeztük el. A létrehozott modellek tesztelése során megállapítottuk, hogy az eljárás mindkét talajparaméter esetében magas R2értéket [R2(szerves szén) = 0,815; R2(CaCO3) = 0,907] adott. A becslés pontosságát jelző közepes négyzetes eltérés (root mean squared error – RMSE) érték mindkét paraméter esetében közepesnek mondható [RMSE (szerves szén) = 0,467; RMSE (CaCO3) = 3,508], mely a reflektancia mérési előírások standardizálásával jelentősen javítható. Vizsgálataink alapján arra a következtetésre jutottunk, hogy a reflektancia spektroszkópia és a többváltozós kemometriai eljárások együttes alkalmazásával, gyors és költséghatékony adatfelvételezési és -értékelési módszerhez juthatunk.

Download Full-text

Minimax Mean-Squared Error Location Estimation Using TOA Measurements

IEICE Transactions on Communications ◽

10.1587/transcom.e93.b.2223 ◽

2010 ◽

Vol E93-B (8) ◽

pp. 2223-2225 ◽

Cited By ~ 2

Author(s):

Chih-Chang SHEN ◽

Ann-Chen CHANG

Keyword(s):

Mean Squared Error ◽

Location Estimation ◽

Squared Error ◽

Error Location

Download Full-text

Using Approximation Non-Bayesian Computation with Fuzzy Data to Estimation Inverse Weibull Parameters and Reliability Function

Ibn AL- Haitham Journal For Pure and Applied Science ◽

10.30526/2017.ihsciconf.1811 ◽

2018 ◽

pp. 397

Author(s):

Nadia Hashim Al-Noor ◽

Shurooq A.K. Al-Sultany

Keyword(s):

Maximum Likelihood ◽

Expectation Maximization ◽

Mean Squared Error ◽

Maximum Likelihood Estimators ◽

Reliability Function ◽

Fuzzy Data ◽

Monte Carlo Simulation Study ◽

Weibull Parameters ◽

Squared Error ◽

Newton Raphson

In real situations all observations and measurements are not exact numbers but more or less non-exact, also called fuzzy. So, in this paper, we use approximate non-Bayesian computational methods to estimate inverse Weibull parameters and reliability function with fuzzy data. The maximum likelihood and moment estimations are obtained as non-Bayesian estimation. The maximum likelihood estimators have been derived numerically based on two iterative techniques namely “Newton-Raphson” and the “Expectation-Maximization” techniques. In addition, we provide compared numerically through Monte-Carlo simulation study to obtained estimates of the parameters and reliability function in terms of their mean squared error values and integrated mean squared error values respectively.

Download Full-text

Image Quality Enhancing by Efficient Histogram Equalization

Wasit Journal of Engineering Sciences ◽

10.31185/ejuow.vol2.iss2.29 ◽

2014 ◽

Vol 2 (2) ◽

pp. 47-58

Author(s):

Ismail Sh. Baqer

Keyword(s):

Image Quality ◽

Contrast Enhancement ◽

Mean Squared Error ◽

Signal To Noise Ratio ◽

Histogram Equalization ◽

Gray Level ◽

Signal To Noise ◽

Squared Error ◽

Noise Ratio

A two Level Image Quality enhancement is proposed in this paper. In the first level, Dualistic Sub-Image Histogram Equalization DSIHE method decomposes the original image into two sub-images based on median of original images. The second level deals with spikes shaped noise that may appear in the image after processing. We presents three methods of image enhancement GHE, LHE and proposed DSIHE that improve the visual quality of images. A comparative calculations is being carried out on above mentioned techniques to examine objective and subjective image quality parameters e.g. Peak Signal-to-Noise Ratio PSNR values, entropy H and mean squared error MSE to measure the quality of gray scale enhanced images. For handling gray-level images, convenient Histogram Equalization methods e.g. GHE and LHE tend to change the mean brightness of an image to middle level of the gray-level range limiting their appropriateness for contrast enhancement in consumer electronics such as TV monitors. The DSIHE methods seem to overcome this disadvantage as they tend to preserve both, the brightness and contrast enhancement. Experimental results show that the proposed technique gives better results in terms of Discrete Entropy, Signal to Noise ratio and Mean Squared Error values than the Global and Local histogram-based equalization methods

Download Full-text

Thematic Maps for the Variation of Bearing Capacity of Soil Using SPTs and MATLAB

Geosciences ◽

10.3390/geosciences10090329 ◽

2020 ◽

Vol 10 (9) ◽

pp. 329

Author(s):

Mahdi O. Karkush ◽

Mahmood D. Ahmed ◽

Ammar Abdul-Hassan Sheikha ◽

Ayad Al-Rumaithi

Keyword(s):

Bearing Capacity ◽

Mean Squared Error ◽

Ground Level ◽

The Other ◽

Thematic Maps ◽

First Order ◽

Squared Error ◽

Order Polynomial ◽

Interpolation Polynomials ◽

Penetration Tests

The current study involves placing 135 boreholes drilled to a depth of 10 m below the existing ground level. Three standard penetration tests (SPT) are performed at depths of 1.5, 6, and 9.5 m for each borehole. To produce thematic maps with coordinates and depths for the bearing capacity variation of the soil, a numerical analysis was conducted using MATLAB software. Despite several-order interpolation polynomials being used to estimate the bearing capacity of soil, the first-order polynomial was the best among the other trials due to its simplicity and fast calculations. Additionally, the root mean squared error (RMSE) was almost the same for the all of the tried models. The results of the study can be summarized by the production of thematic maps showing the variation of the bearing capacity of the soil over the whole area of Al-Basrah city correlated with several depths. The bearing capacity of soil obtained from the suggested first-order polynomial matches well with those calculated from the results of SPTs with a deviation of ±30% at a 95% confidence interval.

Download Full-text