Outliers in official statistics

2020 ◽  
Vol 3 (2) ◽  
pp. 669-691
Author(s):  
Kazumi Wada

Abstract: The purpose of this manuscript is to survey important methods for addressing outliers in the production of official statistics. Outliers are often unavoidable in survey statistics. They may reduce the information content of survey datasets and distort estimation at each step of the production process. This paper defines the outliers of concern at each production step and introduces practical methods to cope with them. The statistical production process is roughly divided into three steps. The first step is data cleaning, where the outliers of concern are values that may contain mistakes to be corrected; robust estimators of a mean vector and covariance matrix are introduced for this purpose. The next step is imputation. Among the variety of imputation methods, regression and ratio imputation are the subjects of this paper. The outliers of concern in this step are not erroneous but have extreme values that may distort parameter estimation; robust estimators that are not affected by such remaining outliers are introduced. The final step is estimation and formatting. Care is needed with outliers that have extreme values and large design weights, since they have a considerable influence on the final statistical products. Weight calibration methods that control this influence are discussed, based on the robust weights obtained in the preceding imputation step. A few practical applications are also described briefly, although the multivariate outlier detection methods introduced in this paper are mostly still at the research stage in the field of official statistics.
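As a concrete illustration of the data cleaning step, the sketch below flags candidate outliers using a robust mean vector and covariance matrix. It uses the Minimum Covariance Determinant estimator from scikit-learn as one widely used robust estimator; the specific estimators surveyed in the paper may differ, and the data are synthetic.

```python
# Minimal sketch: flag candidate outliers for data cleaning with a robust
# location vector and scatter matrix (Minimum Covariance Determinant).
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
X = rng.multivariate_normal([10.0, 50.0], [[4.0, 3.0], [3.0, 9.0]], size=500)
X[:10] += [30.0, -40.0]                      # contaminate a few records

mcd = MinCovDet(random_state=0).fit(X)       # robust mean vector and covariance matrix
d2 = mcd.mahalanobis(X)                      # squared robust Mahalanobis distances
cutoff = chi2.ppf(0.975, df=X.shape[1])      # conventional 97.5% chi-square cutoff
flags = d2 > cutoff                          # records to review for possible errors
print(f"{flags.sum()} records flagged for manual review")
```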

2020 ◽  
Vol 36 (2) ◽  
pp. 315-338
Author(s):  
Stefano M. Iacus ◽  
Giuseppe Porro ◽  
Silvia Salini ◽  
Elena Siletti

Abstract: With the increase of social media usage, a huge new source of data has become available. Despite the enthusiasm linked to this revolution, one of the main criticisms of using these data is selection bias, since the reference population is unknown. Nevertheless, many studies show that these data constitute a valuable source because they are more timely and have higher spatial granularity. We propose to adjust statistics based on Twitter data by anchoring them to reliable official statistics through a weighted, space-time, small area estimation model. As a by-product, the proposed method also stabilizes the social media indicators, a welcome property for official statistics. The method can be applied whenever official statistics exist at the proper level of granularity and social media usage within the population is known. As an example, we adjust a subjective well-being indicator of "working conditions" in Italy and combine it with relevant official statistics. The weights depend on broadband coverage and the Twitter rate at the province level, while the analysis is performed at the regional level. The resulting statistics are then compared with survey statistics on the "quality of job" at the macro-economic regional level, showing evidence of similar paths.
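The abstract does not reproduce the fitted small area estimation model, so the following is only a toy composite estimator illustrating the anchoring idea: a social-media indicator is blended with an official statistic, with a weight driven by broadband coverage and Twitter penetration. All variable names and values are illustrative assumptions, not the authors' model.

```python
# Minimal sketch of a composite estimator that anchors a social-media indicator
# to an official statistic. The weighting rule is an illustrative assumption.
import numpy as np

def composite_indicator(twitter_index, official_index, broadband_share, twitter_rate):
    """Blend the two indicators area by area; w -> 1 where social media is well represented."""
    w = np.clip(broadband_share * twitter_rate, 0.0, 1.0)   # crude representativeness weight
    return w * twitter_index + (1.0 - w) * official_index

twitter_index   = np.array([0.62, 0.48, 0.55])   # e.g. subjective "working conditions" scores
official_index  = np.array([0.58, 0.52, 0.50])   # anchor from official statistics
broadband_share = np.array([0.90, 0.60, 0.75])   # province-level broadband coverage
twitter_rate    = np.array([0.40, 0.20, 0.30])   # province-level Twitter usage rate

print(composite_indicator(twitter_index, official_index, broadband_share, twitter_rate))
```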


2020 ◽  
Vol 2020 (10) ◽  
pp. 133-1-133-7
Author(s):  
Jiho Yoon ◽  
Chulhee Lee

In this paper, we propose a new edge detection method for color images based on the Bhattacharyya distance with an adjustable block spacing. First, a Wiener filter was used to remove noise as pre-processing. To calculate the Bhattacharyya distance, a pair of blocks was extracted for each pixel, and the block spacing was adjusted to detect subtle edges. The mean vector and covariance matrix were computed from each block, and from these the Bhattacharyya distance was computed and used to detect edges. By adjusting the block spacing, we were able to detect weak edges that other edge detection methods failed to detect. Experimental results are promising compared with several existing edge detection methods.
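The core per-pixel statistic is the Bhattacharyya distance between the two blocks, computed from their mean vectors and covariance matrices under a Gaussian model. The sketch below shows only that computation; the Wiener pre-filter, block extraction, adjustable spacing, and thresholding into an edge map are omitted, and the sample blocks are synthetic.

```python
# Minimal sketch: Bhattacharyya distance between two color blocks, computed
# from their mean vectors and covariance matrices.
import numpy as np

def bhattacharyya_distance(block_a, block_b):
    """block_a, block_b: (n_pixels, 3) arrays of RGB values from the two blocks."""
    mu_a, mu_b = block_a.mean(axis=0), block_b.mean(axis=0)
    cov_a = np.cov(block_a, rowvar=False)
    cov_b = np.cov(block_b, rowvar=False)
    cov = 0.5 * (cov_a + cov_b)
    diff = mu_a - mu_b
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)
    term_cov = 0.5 * np.log(np.linalg.det(cov) /
                            np.sqrt(np.linalg.det(cov_a) * np.linalg.det(cov_b)))
    return term_mean + term_cov

rng = np.random.default_rng(1)
left = rng.normal([120, 80, 60], 5.0, size=(25, 3))    # 5x5 block on one side of a pixel
right = rng.normal([130, 90, 70], 5.0, size=(25, 3))   # 5x5 block on the other side
print(bhattacharyya_distance(left, right))             # large values indicate an edge
```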


Author(s):  
Yulinda Uswatun Kasanah ◽  
Pratya Poeri Suryadhini ◽  
Murni Astuti

Lean Manufacturing is a method used to increase productivity and reduce costs by minimizing waste in the production process. This study describes the use of lean manufacturing with the Single Minute Exchange of Die (SMED) tool on the PSR production floor at PT XYZ, an automotive manufacturer producing car tires. The research begins by analyzing waste with mapping tools and identifying the causes of waste at the curing workstation. The next stage is analyzing each step of the curing machine setup, which consists of workpiece setup, mold setup, curing setup, and finishing setup. Based on observations, the initial total setup time is 194.05 minutes. The improvements consist of converting internal setup activities into external setup, reducing operator movement, eliminating adjustment, and applying parallel operation with two operators. As a result, the total setup time can be reduced by 127.41 minutes.

Keywords: Lean manufacture, SMED, Setup time, Workstation curing
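As a rough illustration of the SMED bookkeeping behind such figures, the sketch below classifies setup activities as internal or external and totals the time that still stops the machine. The activity names and durations are invented placeholders, not the measurements from the curing workstation.

```python
# Minimal sketch of SMED bookkeeping: classify setup activities as internal
# (machine stopped) or external (performed while the machine runs), then total
# the remaining internal time. Durations are placeholders.
activities = [
    ("workpiece setup", 50.0, "internal"),
    ("mold setup",      60.0, "external"),   # converted from internal to external
    ("curing setup",    45.0, "internal"),
    ("finishing setup", 40.0, "external"),   # converted from internal to external
]

total_before = sum(minutes for _, minutes, _ in activities)  # everything was internal initially
internal_after = sum(minutes for _, minutes, kind in activities if kind == "internal")
print(f"before: {total_before:.2f} min, after: {internal_after:.2f} min, "
      f"reduction: {total_before - internal_after:.2f} min")
```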


2021 ◽  
Vol 15 (10) ◽  
pp. 4625-4636
Author(s):  
Moritz Buchmann ◽  
Michael Begert ◽  
Stefan Brönnimann ◽  
Christoph Marty

Abstract. Daily measurements of snow depth and snowfall can vary strongly over short distances. However, it is not clear whether there is a seasonal dependence in these variations and how they impact common snow climate indicators based on mean values, as well as estimated return levels of extreme events based on maximum values. To analyse the impacts of local-scale variations, we compiled a unique set of parallel snow measurements from the Swiss Alps consisting of 30 station pairs with up to 77 years of parallel data. Station pairs are usually located in the same villages (or within 3 km horizontal and 150 m vertical distance). Investigated snow climate indicators include average snow depth, maximum snow depth, sum of new snow, days with snow on the ground, days with snowfall, and snow onset and disappearance dates, which are calculated for various seasons (December to February (DJF), November to April (NDJFMA), and March to April (MA)). We computed relative and absolute error metrics for all these indicators at each station pair to demonstrate the potential variability. We found the largest relative inter-pair differences for all indicators in spring (MA) and the smallest in DJF. Furthermore, there is hardly any difference between DJF and NDJFMA, which show median variations of less than 5 % for all indicators. Local-scale variability ranges between less than 24 % (DJF) and less than 43 % (MA) for all indicators and 75 % of all station pairs. The highest percentage (90 %) of station pairs with variability of less than 15 % is observed for days with snow on the ground; the lowest (30 %) is observed for average snow depth. Median differences of snow disappearance dates are rather small (3 d) and similar to those found for snow onset dates (2 d). An analysis of potential sunshine duration could not explain the higher variabilities in spring. To analyse the impact of local-scale variations on the estimation of extreme events, 50-year return levels were quantified for maximum snow depth and maximum 3 d new snow sum, which are often used for avalanche prevention measures. The estimated return levels lie within each other's 95 % confidence intervals for all but three station pairs, revealing no striking differences. The findings serve as an important basis for our understanding of the variability of commonly used snow indicators and extremal indices. Knowledge of such variability, in combination with break-detection methods, provides the groundwork for any homogenization efforts regarding snow time series.
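The 50-year return level computation follows standard extreme value practice: fit a generalized extreme value (GEV) distribution to the series of seasonal maxima and read off the level exceeded on average once in 50 years. The sketch below shows this step with synthetic data standing in for the Swiss station records; the authors' exact fitting and uncertainty procedure may differ.

```python
# Minimal sketch of the return-level step: fit a GEV to annual maximum snow
# depths and compute the 50-year return level. Data are synthetic placeholders.
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(42)
annual_max_hs = genextreme.rvs(c=0.1, loc=120, scale=30, size=77, random_state=rng)  # cm

c, loc, scale = genextreme.fit(annual_max_hs)            # ML fit of the GEV parameters
return_level_50 = genextreme.isf(1 / 50, c, loc, scale)  # exceeded on average once in 50 years
print(f"50-year return level: {return_level_50:.0f} cm")
```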


2021 ◽  
Vol 16 (3) ◽  
pp. 177-187
Author(s):  
Şaban Kızılarslan ◽  
Ceren Camkıran

The aim of this study is to compare the performance of robust estimators in the logistic regression model when explanatory variables follow Generalized Extreme Value (GEV) distributions. The presence of extreme values in the logistic regression model negatively affects the bias and efficiency of the classical Maximum Likelihood (ML) estimator. For this reason, robust estimators that are less sensitive to extreme values have been developed. Random variables with extreme values can often be fitted by one of these specific distributions. In this study, the GEV distribution family was examined and five robust estimators were compared for the Fréchet, Gumbel and Weibull distributions. According to the simulation results, the CUBIF estimator is prominent with respect to both bias and efficiency criteria for small samples. In medium and large samples, the MALLOWS estimator has the minimum bias, while the CUBIF estimator has the best efficiency. The same results hold for different contamination ratios and different scale parameter values of the distributions. The simulation findings were supported by an application to real meteorological data.
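The sketch below illustrates only the simulation setup implied by the abstract: a logistic model whose explanatory variable follows a GEV distribution, with a small fraction of contaminated extreme observations, fitted by ordinary maximum likelihood. The robust CUBIF and Mallows estimators compared in the study are not reproduced here, and all parameter values are illustrative.

```python
# Minimal sketch: logistic regression with a GEV-distributed, contaminated
# covariate, fitted by classical ML (the baseline the robust estimators beat).
import numpy as np
import statsmodels.api as sm
from scipy.stats import genextreme

rng = np.random.default_rng(7)
n, contamination = 200, 0.05
x = genextreme.rvs(c=-0.3, size=n, random_state=rng)   # in scipy, c < 0 gives a Fréchet-type heavy tail
x[rng.random(n) < contamination] *= 10.0                # inject extreme values
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x)))              # true logistic model
y = rng.binomial(1, p)

ml_fit = sm.Logit(y, sm.add_constant(x)).fit(disp=False)  # classical ML estimator
print(ml_fit.params)                                       # compare against the true (0.5, 1.0)
```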


Author(s):  
David Wong

The local Moran and local G-statistics are commonly used to identify high-value (hot spot) and low-value (cold spot) spatial clusters for various purposes. However, these popular tools are based on the concept of spatial autocorrelation or association (SA) and do not explicitly consider whether values are high or low enough to deserve attention. The resulting clusters may therefore not include areas with extreme values that practitioners often want to identify when using these tools. Additionally, these tools are based on statistics that assume observed values or estimates are highly accurate, with error levels that can be ignored or are spatially uniform. In this article, problems associated with these popular SA-based cluster detection tools were illustrated. Alternative hot spot-cold spot detection methods that consider estimate error were explored. The class separability classification method was demonstrated to produce useful results. A heuristic hot spot-cold spot identification method was also proposed. Based on user-determined threshold values, areas with estimates exceeding the thresholds were treated as seeds. These seeds, together with neighboring areas with estimates that were not statistically different from those in the seeds at a given confidence level, constituted the hot spots and cold spots. Results from the heuristic method were intuitively meaningful and practically valuable.
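The sketch below implements one plausible reading of the seed-and-grow heuristic described above: areas whose estimates exceed a user threshold become seeds, and neighbors whose estimates are not statistically different from a seed, given both standard errors, are absorbed into the hot spot. Data structures, the two-sample z-test for "not statistically different", and all values are assumptions for illustration.

```python
# Minimal sketch of the heuristic hot spot identification: threshold-based seeds
# plus neighbors that are not statistically different from a seed.
import numpy as np
from scipy.stats import norm

def heuristic_hot_spots(estimates, std_errors, neighbors, threshold, confidence=0.95):
    """neighbors: dict mapping each area index to a list of adjacent area indices."""
    z_crit = norm.ppf(0.5 + confidence / 2.0)
    seeds = set(np.flatnonzero(estimates > threshold))
    hot = set(seeds)
    for i in seeds:
        for j in neighbors[i]:
            z = abs(estimates[i] - estimates[j]) / np.hypot(std_errors[i], std_errors[j])
            if z < z_crit:               # neighbor not statistically different from the seed
                hot.add(j)
    return sorted(hot)

estimates = np.array([10.0, 9.5, 4.0, 9.8, 3.5])
std_errors = np.array([0.5, 0.8, 0.4, 0.6, 0.3])
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}
print(heuristic_hot_spots(estimates, std_errors, neighbors, threshold=9.0))
```

Cold spots would be obtained analogously with a lower threshold and estimates below it.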


2021 ◽  
Vol 9 (SPE2) ◽  
Author(s):  
Tatyana Igorevna Nikitina ◽  
Aleksey Aleksandrovich Nikitin ◽  
Bulat Ildarovich Yakupov

The relevance of the topic stems from the need for a scientific and qualitative understanding of media analytics and its methods as promotion tools, as well as from the need to introduce a media consumption research stage into the film production process in order to popularize and further promote domestic film products among the Russian audience. In the Russian film industry this experience is especially necessary under current economic conditions. The need to ensure the financial success of Russian film products, which are subsidized by the state, has generated interest in new approaches to promotion and to increasing the loyalty of the cinema audience; this is especially important at a time when only 1 out of 10 domestic films recoups its production costs during theatrical distribution.


Author(s):  
Konrad Kania ◽  
Tomasz Rymarczyk

The article describes a quality control system based on optical detection algorithms, which play an increasingly important role in the production process. The development of new systems based on optical detection methods can, to a large degree, improve the production process at its different stages.


Author(s):  
Marco Mata Rodriguez ◽  
Martin Herrera-Trejo ◽  
Arturo Isaías Martínez Enríquez ◽  
Rodolfo Sanchez-Martinez ◽  
M. J. Castro-Román ◽  
...  

The increasing demand for higher inclusion cleanliness levels motivates control over the formation and evolution of inclusions in the steel production process. In this work, the evolution of the chemical composition and size distribution of inclusions throughout a slab production process for Al-killed steel, including ladle furnace (LF) treatment and continuous casting (CC), was followed. The initial solid Al2O3 and Al2O3-MgO inclusions were modified to liquid Al2O3-CaO-MgO inclusions during LF treatment. The evolution of the size distributions during LF treatment was associated with the growth and removal of inclusions, as no new inclusions were created after the deoxidation process, according to a population density function (PDF) analysis. Additionally, the size distributions tended to become similar as the LF treatment progressed regardless of their initial features, whereas they differed during CC. Analysis of the upper tails of the distributions through generalized extreme value theory showed that inclusion distributions shifted from larger to smaller sizes as the process progressed. There were substantial changes in the distributions of large inclusions throughout the LF treatment and between the end of the LF treatment and the start of the CC process. Additionally, the distributions of large inclusions differed at the end of the LF treatment, whereas these differences decreased as CC progressed.
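The population density function analysis mentioned above amounts to binning measured inclusion sizes and normalizing the counts so that samples with different analyzed areas can be compared. The sketch below shows that bookkeeping with invented diameters and an invented analyzed area; the authors' exact normalization may differ.

```python
# Minimal sketch of a population density function (PDF) for inclusion sizes:
# bin the measured diameters and normalize by bin width and analyzed area.
# All values are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(3)
diameters_um = rng.lognormal(mean=0.8, sigma=0.5, size=400)   # inclusion diameters, micrometres
analyzed_area_mm2 = 50.0

bins = np.arange(0.0, 12.0, 1.0)                               # 1 um size classes
counts, edges = np.histogram(diameters_um, bins=bins)
pdf = counts / (np.diff(edges) * analyzed_area_mm2)            # inclusions per um per mm^2

for lo, hi, dens in zip(edges[:-1], edges[1:], pdf):
    print(f"{lo:4.1f}-{hi:4.1f} um: {dens:8.3f} /um/mm^2")
```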


Author(s):  
Anne F. Bushnell ◽  
Sarah Webster ◽  
Lynn S. Perlmutter

Apoptosis, or programmed cell death, is an important mechanism in development and in diverse disease states. The morphological characteristics of apoptosis were first identified using the electron microscope. Since then, DNA laddering on agarose gels was found to correlate well with apoptotic cell death in cultured cells of dissimilar origins. Recently, numerous DNA nick end labeling methods have been developed in an attempt to visualize, at the light microscopic level, the apoptotic cells responsible for DNA laddering. The present studies were designed to compare various tissue processing techniques and staining methods to assess the occurrence of apoptosis in post mortem tissue from Alzheimer's diseased (AD) and control human brains by DNA nick end labeling methods. Three tissue preparation methods and two commercial DNA nick end labeling kits were evaluated: the Apoptag kit from Oncor and the Biotin-21 dUTP 3' end labeling kit from Clontech. The detection methods of the two kits differed in that the Oncor kit used digoxigenin dUTP with an anti-digoxigenin-peroxidase conjugate, whereas the Clontech kit used biotinylated dUTP with avidin-peroxidase. Both used 3,3'-diaminobenzidine (DAB) for final color development.

