THE ANALYSIS OF OUTLYING DATA POINTS USING ROBUST REGRESSION: A MULTIVARIATE PROBLEM-BANK IDENTIFICATION MODEL

1982 ◽  
Vol 13 (1) ◽  
pp. 71-81 ◽  
Author(s):  
David E. Booth

2016 ◽  
Vol 94 (6) ◽  
pp. 337-364 ◽  
Author(s):  
Andrew J. Leone ◽  
Miguel Minutti-Meza ◽  
Charles E. Wasley

ABSTRACT
Accounting studies often encounter observations with extreme values that can influence coefficient estimates and inferences. Two widely used approaches to address influential observations in accounting studies are winsorization and truncation. While expedient, both depend on researcher-selected cutoffs, applied on a variable-by-variable basis, which, unfortunately, can alter legitimate data points. We compare the efficacy of winsorization, truncation, influence diagnostics (Cook's Distance), and robust regression at identifying influential observations. Replication of three published accounting studies shows that the choice impacts estimates and inferences. Simulation evidence shows that winsorization and truncation are ineffective at identifying influential observations. While influence diagnostics and robust regression both outperform winsorization and truncation, overall, robust regression outperforms the other methods. Since robust regression is a theoretically appealing and easily implementable approach based on a model's residuals, we recommend that future accounting studies consider using robust regression, or at least report sensitivity tests using robust regression. JEL Classifications: C12; C13; C18; C51; C52; M41. Data Availability: Data are available from the public sources cited in the text.
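The contrast the abstract draws can be sketched in a few lines. The illustration below is hypothetical (simulated data, arbitrary cutoffs and tuning constant, not the authors' code): winsorization clips a variable at researcher-chosen quantiles, while robust regression, here a simple Huber M-estimator fit by iteratively reweighted least squares, downweights observations with large residuals instead of altering the data.

```python
import numpy as np

def winsorize(x, p=0.01):
    """Clip values outside the [p, 1-p] empirical quantiles (per-variable cutoff)."""
    lo, hi = np.quantile(x, [p, 1 - p])
    return np.clip(x, lo, hi)

def huber_regression(X, y, k=1.345, n_iter=50):
    """Robust regression: Huber M-estimation via iteratively reweighted least
    squares; large residuals are downweighted rather than clipped or dropped."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]          # OLS starting values
    for _ in range(n_iter):
        r = y - X1 @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745  # MAD residual scale
        u = np.abs(r) / max(s, 1e-12)
        w = np.where(u <= k, 1.0, k / u)                  # Huber weights
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)[0]
    return beta

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=200)
x[0], y[0] = 8.0, -30.0                                   # one influential observation

X1 = np.column_stack([np.ones(200), x])
beta_ols = np.linalg.lstsq(X1, y, rcond=None)[0]          # pulled toward the outlier
beta_win = np.linalg.lstsq(np.column_stack([np.ones(200), winsorize(x)]),
                           winsorize(y), rcond=None)[0]   # cutoffs also alter good points
beta_rob = huber_regression(x, y)                         # slope stays near the true 2.0
```

The single influential observation drags the OLS slope well away from the true value of 2, while the residual-based downweighting leaves it essentially unchanged, which is the mechanism behind the abstract's recommendation.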


OR Spectrum ◽  
2021 ◽  
Author(s):  
Nathan Sudermann-Merx ◽  
Steffen Rebennack

Abstract
The design of regression models that are not affected by outliers is an important task that has been the subject of numerous papers within the statistics community over the last decades. Prominent examples of robust regression models are least trimmed squares (LTS), where the k largest squared deviations are ignored, and least trimmed absolute deviations (LTA), which ignores the k largest absolute deviations. The numerical complexity of both models is driven by the number of binary variables and by the value k of ignored deviations. We introduce leveraged least trimmed absolute deviations (LLTA), which exploits the fact that LTA is already immune against y-outliers. Therefore, LLTA only has to be guarded against outlying values in x, so-called leverage points, which, in contrast to y-outliers, can be computed beforehand. Thus, while the mixed-integer formulations of LTS and LTA have as many binary variables as data points, LLTA needs only one binary variable per leverage point, resulting in a significant reduction of binary variables. Based on 11 data sets from the literature, we demonstrate that (1) LLTA's prediction quality improves much faster than LTS and as fast as LTA for increasing values of k and (2) LLTA solves the benchmark problems about 80 times faster than LTS and about five times faster than LTA, in the median.
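A toy sketch of the two ingredients the abstract combines, with illustrative data and thresholds rather than the paper's mixed-integer formulation: the LTA loss simply drops the k largest absolute residuals, and the leverage points that LLTA guards against can be flagged before any fit, for example from the hat-matrix diagonal.

```python
import numpy as np

def lta_objective(beta, X, y, k):
    """LTA loss: sum of absolute residuals with the k largest ones ignored."""
    X1 = np.column_stack([np.ones(len(y)), X])
    r = np.sort(np.abs(y - X1 @ beta))
    return r[:len(y) - k].sum()

def leverage_points(X, factor=2.0):
    """Flag x-outliers via hat values h_ii > factor * p / n -- a rule-of-thumb
    cutoff computable before any fit, the kind of pre-processing LLTA relies on."""
    X1 = np.column_stack([np.ones(len(X)), X])
    h = np.diag(X1 @ np.linalg.inv(X1.T @ X1) @ X1.T)
    p, n = X1.shape[1], len(X)
    return np.where(h > factor * p / n)[0]

rng = np.random.default_rng(1)
x = rng.normal(size=50)
x[:2] = [10.0, -9.0]                    # two leverage points in x
y = 3.0 - x + rng.normal(scale=0.3, size=50)

lev = leverage_points(x)                # identified from x alone, before fitting
full = lta_objective(np.array([3.0, -1.0]), x, y, k=0)
trim = lta_objective(np.array([3.0, -1.0]), x, y, k=2)
```

In the full mixed-integer models, one binary variable per observation decides which residuals are trimmed; the point of LLTA is that only the pre-identified indices in `lev` need such a variable.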


Author(s):  
Zenji Horita ◽  
Ryuzo Nishimachi ◽  
Takeshi Sano ◽  
Minoru Nemoto

Absorption correction is often required in quantitative x-ray microanalysis of thin specimens using the analytical electron microscope. For such correction, it is convenient to use the extrapolation method [1], because the thickness, density, and mass absorption coefficient are not needed; the characteristic x-ray intensities measured for the analysis are the only requirement for the absorption correction. However, to perform the extrapolation, it is imperative to obtain more than two data points at different thicknesses of the identical composition. Thus, the method encounters difficulty in analyzing a region comparable to the beam size, or a specimen with uniform thickness. The purpose of this study is to modify the method so that extrapolation becomes feasible under such limited conditions. The applicability of the new form is examined using a standard sample, and it is then applied to the quantification of phases in a Ni-Al-W ternary alloy.

The earlier equation for the extrapolation method was formulated based on the facts that the magnitude of x-ray absorption increases with increasing thickness and that the intensity of a characteristic x-ray exhibiting negligible absorption in the specimen can be used as a measure of thickness.
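The extrapolation idea can be illustrated numerically. All values below are hypothetical: a measured intensity ratio drifts with thickness because of absorption, so plotting it against the intensity of a weakly absorbed reference line (which serves as the thickness measure) and extrapolating linearly to zero intensity recovers an absorption-free ratio, with no thickness, density, or mass absorption coefficient required.

```python
import numpy as np

# Hypothetical measurements: characteristic intensity ratio k = I_A / I_B taken
# at several spots of different thickness; I_ref is a weakly absorbed line whose
# intensity grows with thickness and so acts as the thickness measure.
I_ref = np.array([100.0, 200.0, 300.0, 400.0])
k_meas = np.array([0.82, 0.76, 0.71, 0.66])   # ratio falls as absorption grows

# Linear extrapolation of the measured ratio to zero reference intensity
# (i.e. zero thickness) removes the absorption effect.
slope, k0 = np.polyfit(I_ref, k_meas, 1)
# k0 is the absorption-free intensity ratio used for quantification
```

The requirement the abstract mentions is visible here: the straight-line extrapolation needs more than two points at different thicknesses, which is exactly what a uniformly thick specimen or a beam-sized region cannot supply.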


1997 ◽  
Vol 78 (02) ◽  
pp. 855-858 ◽  
Author(s):  
Armando Tripodi ◽  
Veena Chantarangkul ◽  
Marigrazia Clerici ◽  
Barbara Negri ◽  
Pier Mannuccio Mannucci

Summary
A key issue for the reliable use of new devices for the laboratory control of oral anticoagulant therapy with the INR is their conformity to the calibration model. In the past, their adequacy has mostly been assessed empirically, without reference to the calibration model and the use of International Reference Preparations (IRP) for thromboplastin. In this study we reviewed the requirements to be fulfilled and applied them to the calibration of a new near-patient testing device (TAS, Cardiovascular Diagnostics) which uses thromboplastin-containing test cards for determination of the INR. On each of 10 working days, citrated whole blood and plasma samples were obtained from 2 healthy subjects and 6 patients on oral anticoagulants. PT testing on whole blood and plasma was done with the TAS, with parallel testing of plasma by the manual technique with the IRP CRM 149S. Conformity to the calibration model was judged satisfactory if the following requirements were met: (i) there was a linear relationship between paired log-PTs (TAS vs. CRM 149S); (ii) the regression line drawn through the patients' data points passed through those of the normals; (iii) the precision of the calibration, expressed as the CV of the slope, was <3%. A good linear relationship was observed for the calibration plots for both plasma and whole blood (r = 0.98). Regression lines drawn through the patients' data points passed through those of the normals. The CVs of the slope were 2.2% in both cases, and the ISIs were 0.965 and 1.000 for whole blood and plasma, respectively. In conclusion, our study shows that near-patient testing devices can be considered reliable tools to measure the INR in patients on oral anticoagulants and provides guidelines for their evaluation.
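Requirements (i) and (iii) can be checked mechanically. The sketch below uses invented paired prothrombin times and ordinary least squares purely for illustration; the actual WHO thromboplastin calibration model prescribes orthogonal regression on log-transformed PTs, so this is a simplification of that procedure, not a reproduction of the study's analysis.

```python
import numpy as np

# Hypothetical paired prothrombin times (seconds): reference method vs. new device
pt_ref = np.array([12.0, 13.5, 18.0, 24.0, 30.0, 36.0, 44.0, 50.0])
pt_dev = np.array([11.9, 13.7, 17.8, 24.4, 30.5, 35.6, 44.6, 49.5])

x, y = np.log(pt_ref), np.log(pt_dev)      # calibration works on paired log-PTs
n = len(x)
slope, intercept = np.polyfit(x, y, 1)
r = np.corrcoef(x, y)[0, 1]                # requirement (i): linearity

# Requirement (iii): precision of the calibration, the CV of the slope, < 3%
resid = y - (slope * x + intercept)
se_slope = np.sqrt(resid @ resid / (n - 2) / ((x - x.mean()) ** 2).sum())
cv_slope = 100.0 * se_slope / slope
```

With such tightly paired data, both criteria are met; requirement (ii) would additionally need the normal and patient subgroups plotted against the same fitted line.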


Author(s):  
Uppuluri Sirisha ◽  
G. Lakshme Eswari

This paper briefly introduces the Internet of Things (IoT) as an intelligent connectivity among physical objects or devices that is driving massive gains in fields such as efficiency, quality of life, and business growth. The IoT is a global network that interconnects around 46 million smart meters in the U.S. alone, producing 1.1 billion data points per day [1]. The total installed base of IoT-connected devices is projected to grow to 75.44 billion globally by 2025, with corresponding growth in business, productivity, government efficiency, lifestyle, etc. This paper addresses serious concerns such as effective security and privacy to ensure confidentiality, integrity, authentication, and access control among the devices.


Author(s):  
Ryan Ka Yau Lai ◽  
Youngah Do

This article explores a method of creating confidence bounds for information-theoretic measures in linguistics, such as entropy, Kullback-Leibler Divergence (KLD), and mutual information. We show that a useful measure of uncertainty can be derived from simple statistical principles, namely the asymptotic distribution of the maximum likelihood estimator (MLE) and the delta method. Three case studies from phonology and corpus linguistics are used to demonstrate how to apply it and examine its robustness against common violations of its assumptions in linguistics, such as insufficient sample size and non-independence of data points.
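For the entropy case, the recipe the abstract describes reduces to a short calculation. The sketch below (hypothetical counts, assuming a multinomial sampling model) estimates Shannon entropy by MLE and applies the delta method, whose standard result gives Var(Ĥ) ≈ (Σᵢ pᵢ(log pᵢ)² − Ĥ²)/n.

```python
import numpy as np

def entropy_ci(counts, z=1.96):
    """MLE of Shannon entropy (in nats) with a delta-method confidence interval,
    assuming the counts are a multinomial sample of size n = sum(counts)."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p = counts / n
    p = p[p > 0]                            # 0 log 0 = 0 by convention
    H = -np.sum(p * np.log(p))
    var = (np.sum(p * np.log(p) ** 2) - H ** 2) / n
    half = z * np.sqrt(var)
    return H, (H - half, H + half)

# Hypothetical category counts, e.g. phoneme frequencies in a small corpus
H, (lo, hi) = entropy_ci([500, 300, 150, 50])
```

The same asymptotic-MLE-plus-delta-method pattern extends to KLD and mutual information with the appropriate gradient; the abstract's caveats (small samples, non-independent tokens) are exactly the conditions under which this normal approximation degrades.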


1979 ◽  
Vol 7 (1) ◽  
pp. 3-13
Author(s):  
F. C. Brenner ◽  
A. Kondo

Abstract Tread wear data are frequently fitted by a straight line having average groove depth as the ordinate and mileage as the abscissa. The authors have observed that the data points are not randomly scattered about the line but exist in runs of six or seven points above the line followed by the same number below the line. Attempts to correlate these cyclic deviations with climatic data failed. Harmonic content analysis of the data for each individual groove showed strong periodic behavior. Groove 1, a shoulder groove, had two important frequencies at 40 960 and 20 480 km (25 600 and 12 800 miles); Grooves 2 and 3, the inside grooves, had important frequencies at 10 240, 13 760, and 20 480 km (6400, 8600, and 12 800 miles), with Groove 4 being similar. A hypothesis is offered as a possible explanation for the phenomenon.
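The analysis pipeline the abstract describes, fit the straight line, then examine the harmonic content of the deviations, can be mimicked on synthetic data. The series below is invented (a linear wear trend plus one periodic component at a 20 480 km wavelength) just to show how a spectral peak recovers the dominant period from residuals about the fitted line.

```python
import numpy as np

# Hypothetical groove-depth readings every 1280 km: linear wear plus a periodic
# component, mimicking the runs of points above/below the fitted line.
km = np.arange(64) * 1280.0
depth = 15.0 - 2e-4 * km + 0.15 * np.sin(2 * np.pi * km / 20480.0)

# Detrend with the usual straight-line fit (depth vs. mileage) ...
resid = depth - np.polyval(np.polyfit(km, depth, 1), km)

# ... then inspect the spectrum of the residuals for periodic behavior.
spec = np.abs(np.fft.rfft(resid))
freqs = np.fft.rfftfreq(len(km), d=1280.0)
dominant_wavelength = 1.0 / freqs[np.argmax(spec[1:]) + 1]   # skip the DC bin
```

A clear peak at one bin identifies the cyclic deviation; on real tread-wear data each groove would yield its own set of peaks, as in the frequencies reported above.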

