Sampling-Based Partitioning in MapReduce for Skewed Data

Author(s):  
Yujie Xu ◽  
Peng Zou ◽  
Wenyu Qu ◽  
Zhiyang Li ◽  
Keqiu Li ◽  
...  
Sankhya B ◽  
2021 ◽  
Author(s):  
Zhixin Lun ◽  
Ravindra Khattree

2019 ◽  
Vol 2019 ◽  
pp. 1-7
Author(s):  
Chao Zhao ◽  
Jinyan Yang

The standard boxplot is one of the most popular nonparametric tools for detecting outliers in univariate datasets. For Gaussian or symmetric distributions, the chance of an observation falling outside the standard boxplot fences is only about 0.7%. For skewed data, however, such as telemetric rain observations in a real-time flood forecasting system, the probability is significantly higher. To overcome this problem, the medcouple (MC), a measure that is robust to outliers and sensitive to skewness, was used to construct a new robust skewed boxplot fence. Three types of MC-based boxplot fences were analyzed and compared, and the exponential function boxplot fence was selected. On both uncontaminated data and simulated contaminated data, the proposed method produced a lower swamping rate and higher accuracy than the standard boxplot and the semi-interquartile range boxplot. The outcomes of this study demonstrate that the new robust skewed boxplot method is a reasonable choice for detecting outliers in skewed rain distributions.
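To make the idea of an MC-based exponential fence concrete, the following is a minimal sketch, not the authors' implementation. It assumes the widely used adjusted-boxplot form with exponents -4 and 3 applied to the medcouple; the abstract does not state which constants the paper selected. The function names (`medcouple`, `standard_fences`, `adjusted_fences`) are hypothetical, and the medcouple here is a naive O(n^2) version that simply skips pairs tied at the median.

```python
import math
import statistics


def medcouple(values):
    """Naive O(n^2) medcouple. Simplification: pairs in which both
    points equal the median are skipped instead of using the special
    tie kernel."""
    xs = sorted(values)
    m = statistics.median(xs)
    lower = [x for x in xs if x <= m]
    upper = [x for x in xs if x >= m]
    kernel = [
        ((xj - m) - (m - xi)) / (xj - xi)
        for xi in lower
        for xj in upper
        if xj != xi
    ]
    return statistics.median(kernel) if kernel else 0.0


def standard_fences(values):
    """Classical boxplot fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def adjusted_fences(values):
    """Skew-adjusted fences with exponential factors in the medcouple.
    The exponents (-4, 3) are an assumption, not taken from the paper."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    mc = medcouple(values)
    # Swap the exponents for left-skewed data (negative medcouple).
    low_exp, high_exp = (-4.0, 3.0) if mc >= 0 else (-3.0, 4.0)
    lower = q1 - 1.5 * math.exp(low_exp * mc) * iqr
    upper = q3 + 1.5 * math.exp(high_exp * mc) * iqr
    return lower, upper


# Made-up right-skewed sample, for illustration only.
sample = [1, 1.2, 1.5, 2, 2.5, 3, 4, 6, 10, 20]
print("medcouple:", medcouple(sample))
print("standard fences:", standard_fences(sample))
print("adjusted fences:", adjusted_fences(sample))
```

For right-skewed data the medcouple is positive, so the upper fence moves outward and the lower fence moves inward. Fewer regular observations in the long tail are then flagged as outliers, which is the swamping effect the abstract refers to.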


2020 ◽  
Vol 1 (1) ◽  
pp. 9-16
Author(s):  
O. L. Aako ◽  
J. A. Adewara ◽  
K. S. Adekeye ◽  
E. B. Nkemnole

The fundamental assumption of variable control charts is that the data are normally distributed and spread randomly about the mean. Process data are not always normally distributed, so there is a need for control charts that give accurate control limits for monitoring skewed processes. In this study, Shewhart-type control charts were developed for monitoring positively skewed data assumed to follow the Marshall-Olkin Inverse Loglogistic Distribution (MOILLD). Average Run Length (ARL) and Control Limits Interval (CLI) were used to assess the stability and performance of the MOILLD control chart. The results were compared with those of the Classical Shewhart (CS) and Skewness Correction (SC) control charts using the ARL and CLI. The control charts based on MOILLD performed better and were more stable than the CS and SC control charts. It is therefore recommended that, for positively skewed data, a control chart based on the Marshall-Olkin Inverse Loglogistic Distribution be used.
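The role of the normality assumption can be illustrated with a short calculation of the in-control ARL, which for independent observations is the reciprocal of the per-sample false-alarm probability. This is a hedged sketch only: it uses a lognormal distribution as a stand-in for skewed data, not the MOILLD studied in the paper, and the names `normal_cdf` and `arl` are hypothetical helpers.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def arl(signal_probability):
    """Average run length for independent samples: 1 / P(signal)."""
    return 1.0 / signal_probability


# Normal process with classical 3-sigma Shewhart limits.
alpha_normal = 2.0 * (1.0 - normal_cdf(3.0))
arl_normal = arl(alpha_normal)

# Skewed stand-in (assumption): lognormal with log-mean 0, log-sd 1,
# monitored with the same mean +/- 3 sd limits.
mu, sigma = 0.0, 1.0
mean = math.exp(mu + sigma ** 2 / 2.0)
sd = math.sqrt((math.exp(sigma ** 2) - 1.0) * math.exp(2.0 * mu + sigma ** 2))
ucl = mean + 3.0 * sd
lcl = mean - 3.0 * sd

# A lognormal variable is positive, so a non-positive LCL never signals.
p_above = 1.0 - normal_cdf((math.log(ucl) - mu) / sigma)
p_below = normal_cdf((math.log(lcl) - mu) / sigma) if lcl > 0 else 0.0
arl_skewed = arl(p_above + p_below)

print("in-control ARL, normal data:", round(arl_normal, 1))
print("in-control ARL, lognormal data:", round(arl_skewed, 1))
```

Under these assumptions the skewed process raises false alarms far more often than the nominal design value of roughly 370 samples between signals, which is the motivation for distribution-specific limits such as those proposed in the abstract.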

