Robust Stability Best Subset Selection for Autocorrelated Data Based on Robust Location and Dispersion Estimator

2015 ◽  
Vol 2015 ◽  
pp. 1-8
Author(s):  
Hassan S. Uraibi ◽  
Habshah Midi ◽  
Sohel Rana

The stability selection (multisplit) approach is a variable selection procedure that relies on repeatedly split data to overcome the shortcomings of a single data split. Unfortunately, this procedure yields very poor results in the presence of outliers and other contamination in the original data. The problem becomes more complicated when the regression residuals are serially correlated. This paper presents a new robust stability selection procedure to remedy the combined problem of autocorrelation and outliers. We demonstrate the good performance of the proposed robust selection method using real air quality data and a simulation study.
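The multisplit idea described above can be sketched as follows. This is a minimal, non-robust illustration and not the authors' procedure: it assumes ordinary least squares on random half-samples and a hypothetical selection rule (absolute t-statistic above 2), and it ignores outliers and autocorrelation, which the paper's method is designed to handle. The function name, threshold, and synthetic data are assumptions made for illustration.

```python
import numpy as np

def selection_frequencies(X, y, n_splits=100, t_threshold=2.0, seed=0):
    """Fraction of random half-samples in which each predictor is selected.

    Selection rule (an assumption for this sketch): |t-statistic| > t_threshold
    in an ordinary least squares fit on the half-sample.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    half = n // 2
    counts = np.zeros(p)
    for _ in range(n_splits):
        # Draw a random half of the observations without replacement.
        idx = rng.choice(n, size=half, replace=False)
        Xs = np.column_stack([np.ones(half), X[idx]])  # add an intercept column
        ys = y[idx]
        beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        resid = ys - Xs @ beta
        sigma2 = resid @ resid / (half - p - 1)        # residual variance
        cov = sigma2 * np.linalg.inv(Xs.T @ Xs)        # OLS coefficient covariance
        t_stats = beta[1:] / np.sqrt(np.diag(cov)[1:]) # skip the intercept
        counts += np.abs(t_stats) > t_threshold
    return counts / n_splits

# Synthetic data: only the first two of six predictors affect the response.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(size=200)

freq = selection_frequencies(X, y)
print(np.round(freq, 2))
```

Predictors with a selection frequency near 1 are stable across splits. A single contaminated split can distort an OLS fit, which is the weakness that motivates replacing the fitting step with a robust estimator.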

Author(s):  
RUNZE LI

In this paper, a new variable selection procedure is introduced for the analysis of uniform designs and computer experiments. The new procedure differs from traditional ones in that it deletes insignificant variables and estimates the coefficients of significant variables simultaneously. The new procedure has an oracle property (Fan and Li [8]). It is better than best subset variable selection in terms of computational cost and model stability. It is superior to stepwise regression because it does not ignore stochastic errors during the course of selecting variables. The proposed procedure is illustrated by two examples: one is a typical example of uniform design, and the other is a classical example of a computer experiment.


2019 ◽  
Vol 11 (10) ◽  
pp. 2944 ◽  
Author(s):  
Huijie Zhang ◽  
Ke Ren ◽  
Yiming Lin ◽  
Dezhan Qu ◽  
Zhenxin Li

The huge volume of air quality data now available provides unprecedented opportunities for analyzing pollution. However, because of the high complexity of the data, most traditional analytical methods focus on abstracting it, so these techniques discard the original structure and limit the understanding of the results. Visual analysis is a powerful technique for exploring unknown patterns since it retains the details of the original data and gives visual feedback to users. In this paper, we focus on air quality data and propose AirInsight, an interactive visual analytic system for recognizing, exploring, and summarizing regular patterns, as well as detecting, classifying, and interpreting abnormal cases. Based on the time-varying and multivariate features of air quality data, a dimension reduction method, Composite Least Square Projection (CLSP), is proposed, which allows the data patterns to be interpreted in the context of attributes. On the basis of the observed regular patterns, multiple kinds of abnormal cases are further detected, including multivariate anomalies by the proposed Noise Hierarchical Clustering (NHC) method, abruptly changing timestamps by the Time Diversity (TD) indicator, and cities with unique patterns by the Geographical Surprise (GS) measure. Moreover, we combine TD and GS to group anomalies based on their underlying spatiotemporal correlations. AirInsight includes multiple coordinated views and rich interactive functions to provide contextual information from different aspects and facilitate a comprehensive understanding. In particular, a pair of glyphs is designed to provide a visual representation of the temporal variation in air quality conditions for a user-selected city. Experiments show that CLSP improves the accuracy of Least Square Projection (LSP) and that NHC is able to separate noise. Meanwhile, several case studies and a task-based user evaluation demonstrate that our system is effective and practical for exploring and interpreting multivariate spatiotemporal patterns and anomalies in air quality data.


Methodology ◽  
2018 ◽  
Vol 14 (4) ◽  
pp. 177-188 ◽  
Author(s):  
Martin Schultze ◽  
Michael Eid

Abstract. In the construction of scales intended for use in cross-cultural studies, the selection of items needs to be guided not only by traditional criteria of item quality but also by information about the measurement invariance of the scale. We present an approach to automated item selection which treats the process as a combinatorial optimization problem and aims at finding a scale which fulfils predefined target criteria, such as measurement invariance across cultures. The search for an optimal solution is performed using an adaptation of the [Formula: see text] Ant System algorithm. The approach is illustrated using an application to item selection for a personality scale assuming measurement invariance across multiple countries.


Author(s):  
Ahmad R. Alsaber ◽  
Jiazhu Pan ◽  
Adeeba Al-Hurban 

In environmental research, missing data are often a challenge for statistical modeling. This paper examines advanced techniques for handling missing values in an air quality data set using a multiple imputation (MI) approach. The MCAR, MAR, and NMAR missing-data mechanisms are applied to the data set. Five missing data levels are considered: 5%, 10%, 20%, 30%, and 40%. The imputation method used in this paper is an iterative imputation method, missForest, which is related to the random forest approach. Air quality data sets were gathered from five monitoring stations in Kuwait and aggregated to a daily basis. A logarithmic transformation was applied to all pollutant data in order to normalize their distributions and to minimize skewness. We found high levels of missing values for NO2 (18.4%), CO (18.5%), PM10 (57.4%), SO2 (19.0%), and O3 (18.2%) data. Climatological data (i.e., air temperature, relative humidity, wind direction, and wind speed) were used as control variables for better estimation. The results show that the MAR mechanism had the lowest RMSE and MAE. We conclude that MI using the missForest approach has a high level of accuracy in estimating missing values. MissForest had the lowest imputation error (RMSE and MAE) among the imputation methods compared and, thus, can be considered appropriate for analyzing air quality data.
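The evaluation design described above (remove known values at a given rate, impute them, then score the imputation by RMSE and MAE) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's pipeline: the data are synthetic log-transformed values rather than the Kuwait measurements, only the MCAR mechanism is simulated, and simple column-mean imputation stands in for missForest. All function names are hypothetical.

```python
import numpy as np

def mask_mcar(data, rate, rng):
    """Set each cell to NaN independently with probability `rate` (MCAR)."""
    mask = rng.random(data.shape) < rate
    masked = data.copy()
    masked[mask] = np.nan
    return masked, mask

def impute_column_means(masked):
    """Stand-in imputer: replace each NaN with its column's observed mean."""
    col_means = np.nanmean(masked, axis=0)
    filled = masked.copy()
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled

def imputation_errors(truth, imputed, mask):
    """RMSE and MAE computed only over the cells that were masked."""
    diff = imputed[mask] - truth[mask]
    return np.sqrt(np.mean(diff ** 2)), np.mean(np.abs(diff))

# Synthetic daily "pollutant" series for five variables, log-transformed.
rng = np.random.default_rng(0)
truth = np.log(rng.lognormal(mean=3.0, sigma=0.5, size=(365, 5)))

# The five missing-data levels named in the abstract.
for rate in (0.05, 0.10, 0.20, 0.30, 0.40):
    masked, mask = mask_mcar(truth, rate, rng)
    rmse, mae = imputation_errors(truth, impute_column_means(masked), mask)
    print(f"{rate:.0%} missing: RMSE={rmse:.3f}, MAE={mae:.3f}")
```

Swapping `impute_column_means` for another imputer while keeping the same masks allows methods to be compared on equal terms, which is the kind of comparison the abstract reports.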


2021 ◽  
Vol 138 ◽  
pp. 104976
Author(s):  
Juan José Díaz ◽  
Ivan Mura ◽  
Juan Felipe Franco ◽  
Raha Akhavan-Tabatabaei

2020 ◽  
Vol 91 (4) ◽  
pp. 2127-2140 ◽  
Author(s):  
Glenn Thompson ◽  
John A. Power ◽  
Jochen Braunmiller ◽  
Andrew B. Lockhart ◽  
Lloyd Lynch ◽  
...  

Abstract. An eruption of the Soufrière Hills Volcano (SHV) on the eastern Caribbean island of Montserrat began on 18 July 1995 and continued until February 2010. Within nine days of the eruption onset, an existing four-station analog seismic network (ASN) was expanded to 10 sites. Telemetered data from this network were recorded, processed, and archived locally using a system developed by scientists from the U.S. Geological Survey (USGS) Volcano Disaster Assistance Program (VDAP). In October 1996, a digital seismic network (DSN) was deployed with the ability to capture larger amplitude signals across a broader frequency range. These two networks operated in parallel until December 2004, with separate telemetry and acquisition systems (analysis systems were merged in March 2001). Although the DSN provided better quality data for research, the ASN featured superior real-time monitoring tools and captured valuable data, including the only seismic data from the first 15 months of the eruption. These successes of the ASN have been rather overlooked. This article documents the evolution of the ASN, the VDAP system, the original data captured, and the recovery and conversion of more than 230,000 seismic events from legacy SUDS, Hypo71, and Seislog formats into a Seisan database with waveform data in miniSEED format. No digital catalog existed for these events, but students at the University of South Florida have classified two-thirds of the 40,000 events that were captured between July 1995 and October 1996. Locations and magnitudes were recovered for ∼10,000 of these events. Real-time seismic amplitude measurement, seismic spectral amplitude measurement, and tiltmeter data were also captured. The result is that the ASN seismic dataset is now more discoverable, accessible, and reusable, in accordance with FAIR data principles. These efforts could catalyze new research on the 1995–2010 SHV eruption. Furthermore, many observatories have data in these same legacy data formats and might benefit from the procedures and codes documented here.

