A Simulation Study of Semiparametric Estimation in Copula Models Based on Minimum Alpha-Divergence

2020 ◽  
Vol 8 (4) ◽  
pp. 834-845
Author(s):  
Morteza Mohammadi ◽  
Mohammad Amini ◽  
Mahdi Emadi

The purpose of this paper is to introduce two semiparametric methods for estimating the copula parameter. These methods are based on minimizing the Alpha-Divergence between a nonparametric estimate of the copula density, obtained with the local likelihood probit-transformation method, and the true copula density function. A Monte Carlo study is performed to measure the performance of these methods based on the Hellinger distance and the Neyman divergence as special cases of the Alpha-Divergence. Simulation results are compared with Maximum Pseudo-Likelihood (MPL) estimation, a conventional estimation method, in well-known bivariate copula models. These results show that the proposed method based on Minimum Pseudo Hellinger Distance estimation performs well for small sample sizes and weak dependence. The parameter estimation methods are applied to a real data set in hydrology.
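For illustration, a minimal sketch of the minimum-distance idea follows; it assumes a Gaussian copula family and uses a plain kernel density estimate on the probit scale rather than the paper's local likelihood probit-transformation estimator, and all function names are placeholders.

```python
# Illustrative sketch of minimum (pseudo) Hellinger distance estimation for a
# Gaussian copula. The nonparametric density here is a plain Gaussian KDE on
# probit-transformed pseudo-observations, not the paper's local likelihood
# probit-transformation estimator.
import numpy as np
from scipy import stats, optimize

def gaussian_copula_density(u, v, rho):
    x, y = stats.norm.ppf(u), stats.norm.ppf(v)
    det = 1.0 - rho ** 2
    return np.exp(-(rho ** 2 * (x ** 2 + y ** 2) - 2 * rho * x * y) / (2 * det)) / np.sqrt(det)

def min_hellinger_rho(data, grid_size=40):
    n = len(data)
    # Pseudo-observations (rescaled ranks), as in maximum pseudo-likelihood.
    u = stats.rankdata(data[:, 0]) / (n + 1)
    v = stats.rankdata(data[:, 1]) / (n + 1)
    # Nonparametric copula density: KDE on the probit scale, mapped back to [0,1]^2.
    z = stats.norm.ppf(np.column_stack([u, v]))
    kde = stats.gaussian_kde(z.T)
    gu = (np.arange(grid_size) + 0.5) / grid_size
    uu, vv = np.meshgrid(gu, gu)
    zz = stats.norm.ppf(np.column_stack([uu.ravel(), vv.ravel()]))
    c_hat = kde(zz.T) / (stats.norm.pdf(zz[:, 0]) * stats.norm.pdf(zz[:, 1]))

    def hellinger2(rho):
        c_theta = gaussian_copula_density(uu.ravel(), vv.ravel(), rho)
        # Squared Hellinger distance approximated on the grid.
        return 1.0 - np.mean(np.sqrt(c_hat * c_theta))

    return optimize.minimize_scalar(hellinger2, bounds=(-0.99, 0.99), method="bounded").x
```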

2018 ◽  
Vol 81 (1) ◽  
Author(s):  
Rahmah Mohd Lokoman ◽  
Fadhilah Yusof

This study focuses on the parametric methods maximum likelihood (ML), inference functions for margins (IFM), and adaptive maximization by parts (AMBP) for estimating the copula dependence parameter. Their performance is compared through simulation and empirical studies. For the empirical study, 44 years of daily rainfall data from Station Kuala Krai and Station Ulu Sekor are used. The correlation between the two stations, 0.4137, is statistically significant. The results from the simulation study show that when the sample size is small (n < 1000) and the correlation level is less than 0.80, IFM has the best performance, while when the sample size is large (n ≥ 1000), AMBP has the best performance at any correlation level. The results from the empirical study also show that AMBP has the best performance when the sample size is large. Thus, to estimate the copula dependence parameter precisely with parametric approaches, IFM is preferred for small sample sizes with correlation below 0.80, and AMBP is preferred for larger sample sizes at any correlation level. The results obtained in this study highlight the importance of estimating the dependence structure of hydrological data. By using the fitted copula, the Malaysian Meteorological Department will be able to generate hydrological events for system performance analyses such as flood and drought control.
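A hedged sketch of the two-step IFM estimator follows; the gamma margins and Gaussian copula are illustrative choices, not necessarily those used for the rainfall data, and all names are placeholders.

```python
# Sketch of the two-step IFM estimator: (1) fit each margin by ML,
# (2) plug the fitted marginal CDFs into the copula likelihood and maximize
# over the dependence parameter. Margin choice (gamma) is illustrative only.
import numpy as np
from scipy import stats, optimize

def ifm_gaussian_copula(x, y):
    # Step 1: marginal ML fits (gamma margins assumed for illustration).
    ax, locx, sx = stats.gamma.fit(x, floc=0)
    ay, locy, sy = stats.gamma.fit(y, floc=0)
    u = stats.gamma.cdf(x, ax, loc=locx, scale=sx)
    v = stats.gamma.cdf(y, ay, loc=locy, scale=sy)
    zx, zy = stats.norm.ppf(u), stats.norm.ppf(v)

    # Step 2: maximize the Gaussian copula log-likelihood over rho.
    def neg_loglik(rho):
        det = 1.0 - rho ** 2
        ll = -0.5 * np.log(det) - (rho ** 2 * (zx ** 2 + zy ** 2) - 2 * rho * zx * zy) / (2 * det)
        return -np.sum(ll)

    res = optimize.minimize_scalar(neg_loglik, bounds=(-0.99, 0.99), method="bounded")
    return res.x  # estimated copula dependence parameter
```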


2019 ◽  
Vol 11 (3) ◽  
pp. 168781401983684 ◽  
Author(s):  
Leilei Cao ◽  
Lulu Cao ◽  
Lei Guo ◽  
Kui Liu ◽  
Xin Ding

It is difficult to obtain enough samples to carry out a full-scale life test on the loader drive axle because of the high cost, yet such an extremely small sample size can hardly meet the statistical requirements of traditional reliability analysis methods. In this work, a method combining virtual sample expansion with the Bootstrap is proposed to evaluate the fatigue reliability of the loader drive axle with an extremely small sample. First, the sample size is expanded by a virtual augmentation method to meet the requirements of the Bootstrap method. Then, a modified Bootstrap method is used to evaluate the fatigue reliability of the expanded sample. Finally, the feasibility and reliability of the method are verified by comparing the results with a semi-empirical estimation method. Moreover, from a practical perspective, the promising results of this study indicate that the proposed method is more efficient than the semi-empirical method. The proposed method provides a new way to evaluate the reliability of costly and complex structures.
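A generic sketch of the idea (virtual expansion of a small sample followed by a bootstrap interval for a reliability quantile) is shown below; the lognormal augmentation rule, the sample values, and the percentile chosen are illustrative assumptions, not the authors' exact modified-Bootstrap procedure.

```python
# Generic sketch: expand a small fatigue-life sample with "virtual" samples drawn
# from a lognormal fit, then use the bootstrap to put a confidence interval on a
# low-percentile life (a common reliability quantity). Not the paper's exact
# modified-Bootstrap procedure; all numbers are placeholders.
import numpy as np

rng = np.random.default_rng(0)
life = np.array([1.8e5, 2.3e5, 2.9e5, 3.4e5, 4.1e5])   # small observed sample (cycles)

# Virtual sample expansion: draw extra samples from a lognormal fitted to the data.
mu, sigma = np.mean(np.log(life)), np.std(np.log(life), ddof=1)
virtual = rng.lognormal(mu, sigma, size=45)
expanded = np.concatenate([life, virtual])

# Bootstrap the 1st-percentile life (life at 99% reliability).
boot = np.array([
    np.percentile(rng.choice(expanded, size=expanded.size, replace=True), 1)
    for _ in range(2000)
])
print("B1 life estimate:", np.percentile(expanded, 1))
print("95% bootstrap CI:", np.percentile(boot, [2.5, 97.5]))
```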


Author(s):  
Carlos Eduardo Thomaz ◽  
Vagner do Amaral ◽  
Gilson Antonio Giraldi ◽  
Edson Caoru Kitani ◽  
João Ricardo Sato ◽  
...  

This chapter describes a multi-linear discriminant method for constructing and quantifying statistically significant changes in human identity photographs. The approach is based on a general multivariate two-stage linear framework that addresses the small sample size problem in high-dimensional spaces. Starting with a 2D data set of frontal face images, the authors determine a most characteristic direction of change by organizing the data according to the patterns of interest. Experiments on publicly available face image sets show that the multi-linear approach produces visually plausible results for gender, facial expression, and aging facial changes in a simple and efficient way. The authors believe that such an approach could be widely applied for modeling and reconstruction in face recognition, and possibly for identifying subjects after a lapse of time.
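As a rough stand-in for the two-stage linear framework, the sketch below uses a generic PCA-then-LDA pipeline to extract a discriminant direction in pixel space; it is not the chapter's exact multi-linear method, and the data and label names are placeholders.

```python
# Two-stage linear sketch for high-dimensional face data with few samples:
# PCA first reduces dimensionality below the sample count, then LDA finds the
# most discriminant direction between two groups of interest (e.g. gender).
# The "most characteristic direction of change" is the LDA axis mapped back to
# pixel space. Generic stand-in, not the chapter's exact method.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def characteristic_direction(images, labels, n_components=20):
    # images: (n_samples, n_pixels) flattened frontal face images; labels: 0/1 groups.
    pca = PCA(n_components=n_components).fit(images)
    scores = pca.transform(images)
    lda = LinearDiscriminantAnalysis().fit(scores, labels)
    w = lda.scalings_[:, 0]              # discriminant axis in PCA space
    direction = pca.components_.T @ w    # back-projected to pixel space
    return direction / np.linalg.norm(direction)
```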


Author(s):  
Xiaoyu Lu ◽  
Szu-Wei Tu ◽  
Wennan Chang ◽  
Changlin Wan ◽  
Jiashi Wang ◽  
...  

Abstract Deconvolution of mouse transcriptomic data is challenged by the fact that mouse models carry various genetic and physiological perturbations, making it questionable to assume fixed cell types and cell type marker genes across different data set scenarios. We developed a Semi-Supervised Mouse data Deconvolution (SSMD) method to study the mouse tissue microenvironment. SSMD features (i) a novel nonparametric method to discover data set-specific cell type signature genes; (ii) a community detection approach for fixing cell types and their marker genes; and (iii) a constrained matrix decomposition method to solve for cell type relative proportions that is robust to diverse experimental platforms. In summary, SSMD addresses several key challenges in the deconvolution of mouse tissue data, including (i) varied cell types and marker genes caused by the highly divergent genotypic and phenotypic conditions of mouse experiments; (ii) diverse experimental platforms of mouse transcriptomics data; (iii) small sample sizes and limited training data sources; and (iv) the need to estimate the proportions of 35 cell types in blood, inflammatory, central nervous, or hematopoietic systems. In silico and experimental validation of SSMD demonstrated its high sensitivity and accuracy in identifying (sub) cell types and predicting cell proportions compared with state-of-the-art methods. A user-friendly R package and a web server for SSMD are released via https://github.com/xiaoyulu95/SSMD.
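A hedged sketch of the final decomposition step only is given below: a generic non-negative least-squares deconvolution per sample, assuming the signature matrix is already known, which is not SSMD's semi-supervised discovery pipeline.

```python
# Generic constrained deconvolution step: given a cell-type signature matrix S
# (genes x cell types) and a bulk expression matrix B (genes x samples), solve
# non-negative least squares per sample and normalize to relative proportions.
# Only the final decomposition step; SSMD's data-driven discovery of signature
# genes via community detection is not shown here.
import numpy as np
from scipy.optimize import nnls

def estimate_proportions(signature, bulk):
    props = []
    for j in range(bulk.shape[1]):
        coef, _ = nnls(signature, bulk[:, j])          # constrained: coefficients >= 0
        total = coef.sum()
        props.append(coef / total if total > 0 else coef)
    return np.array(props)                             # samples x cell types
```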


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Stefan Lenz ◽  
Moritz Hess ◽  
Harald Binder

Abstract
Background: The best way to calculate statistics from medical data is to use the data of individual patients. In some settings, this data is difficult to obtain due to privacy restrictions. In Germany, for example, it is not possible to pool routine data from different hospitals for research purposes without the consent of the patients.
Methods: The DataSHIELD software provides an infrastructure and a set of statistical methods for joint, privacy-preserving analyses of distributed data. The contained algorithms are reformulated to work with aggregated data from the participating sites instead of the individual data. If a desired algorithm is not implemented in DataSHIELD or cannot be reformulated in such a way, using artificial data is an alternative. Generating artificial data is possible using so-called generative models, which are able to capture the distribution of given data. Here, we employ deep Boltzmann machines (DBMs) as generative models. For the implementation, we use the package "BoltzmannMachines" from the Julia programming language and wrap it for use with DataSHIELD, which is based on R.
Results: We present a methodology together with a software implementation that builds on DataSHIELD to create artificial data that preserve complex patterns from distributed individual patient data. Such data sets of artificial patients, which are not linked to real patients, can then be used for joint analyses. As an exemplary application, we conduct a distributed analysis with DBMs on a synthetic data set, which simulates genetic variant data. Patterns from the original data can be recovered in the artificial data using hierarchical clustering of the virtual patients, demonstrating the feasibility of the approach. Additionally, we compare DBMs, variational autoencoders, generative adversarial networks, and multivariate imputation as generative approaches by assessing the utility and disclosure of synthetic data generated from real genetic variant data in a distributed setting with data of a small sample size.
Conclusions: Our implementation adds to DataSHIELD the ability to generate artificial data that can be used for various analyses, e.g., for pattern recognition with deep learning. This also demonstrates more generally how DataSHIELD can be flexibly extended with advanced algorithms from languages other than R.
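As a simplified stand-in for the generative step, the sketch below trains a single-layer Bernoulli RBM (via scikit-learn) on binary variant-like placeholder data and samples artificial patients by Gibbs steps; the actual implementation uses deep Boltzmann machines from the Julia package "BoltzmannMachines" wrapped for DataSHIELD.

```python
# Simplified stand-in for the generative step: train a single-layer Bernoulli RBM
# on binary (0/1) variant-like data and draw artificial patients by Gibbs sampling.
# The paper uses deep Boltzmann machines via the Julia "BoltzmannMachines" package
# wrapped for DataSHIELD; this sketch only illustrates the idea on fake data.
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(1)
real = (rng.random((200, 50)) < 0.2).astype(float)    # placeholder for variant data

rbm = BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=50, random_state=0)
rbm.fit(real)

# Generate artificial patients: start from noise and run repeated Gibbs steps.
synthetic = (rng.random((100, 50)) < 0.5).astype(float)
for _ in range(200):
    synthetic = rbm.gibbs(synthetic)
synthetic = synthetic.astype(int)                      # artificial 0/1 variant matrix
```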


2004 ◽  
Vol 02 (04) ◽  
pp. 669-679 ◽  
Author(s):  
MASATO INOUE ◽  
SHIN-ICHI NISHIMURA ◽  
GEN HORI ◽  
HIROYUKI NAKAHARA ◽  
MICHIKO SAITO ◽  
...  

A gene-expression microarray datum is modeled as an exponential expression signal (log-normal distribution) plus additive noise. A variance-stabilizing transformation based on this model is useful for improving the uniformity of variance, which is often assumed by conventional statistical analysis methods. However, the existing method of estimating the transformation parameters may not be perfect because of its poor handling of outliers. By employing an information normalization technique, we have developed an improved parameter estimation method, which enables statistically more straightforward outlier exclusion and works well even for small sample sizes. Validation of this method with experimental data suggests that it is superior to the conventional method.
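A generic sketch of a variance-stabilizing generalized-log (glog) transform for this signal-plus-additive-noise model is shown below; the noise-scale estimate used there is a crude robust placeholder, not the paper's information-normalization procedure.

```python
# Generic variance-stabilizing transform for the "multiplicative signal +
# additive noise" microarray model: the generalized log (glog). The constant c
# should reflect the additive-noise scale; here it is set from a crude robust
# background estimate for illustration only, not by the paper's information
# normalization procedure.
import numpy as np

def glog(x, c):
    # Behaves like log(x) for large x and stays finite and nearly linear near zero.
    return np.log((x + np.sqrt(x ** 2 + c ** 2)) / 2.0)

def fit_and_transform(intensities):
    # Crude, outlier-resistant noise-scale estimate from the dimmest spots.
    background = np.percentile(intensities, 5)
    c = 1.4826 * np.median(np.abs(intensities[intensities <= background] - background))
    return glog(intensities, max(c, 1e-8))
```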


2015 ◽  
Vol 63 (2) ◽  
pp. 405-411 ◽  
Author(s):  
R. Krupiński

Abstract Most estimators of the shape parameter of the generalized Gaussian distribution (GGD) assume the asymptotic case, in which an infinite number of observations is available, but in practice only a sample of limited size is available. The most popular estimator of the shape parameter, the maximum likelihood (ML) method, has a variance that grows as the sample size decreases; the very high variance for very small sample sizes makes this estimation method very inaccurate. A new, fast, approximate method based on the standardized moment is introduced in this article to overcome this limitation. The relative mean square error (RMSE) was plotted over the shape parameter range 0.3-3 for comparison with other methods. The method does not require any root finding, long look-up table, or multi-step approach; therefore, it is suitable for real-time data processing.
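A generic moment-ratio sketch is shown below: the ratio E|X|/sqrt(E X^2) is monotone in the GGD shape parameter, so it can be inverted by interpolating a precomputed table; this illustrates the moment-based idea but is not the article's specific approximation.

```python
# Generic moment-ratio estimator for the GGD shape parameter beta:
# r(beta) = E|X| / sqrt(E X^2) = Gamma(2/beta) / sqrt(Gamma(1/beta) * Gamma(3/beta)).
# The ratio is monotone in beta, so a small precomputed table plus linear
# interpolation inverts it without iterative root finding. Standard moment
# approach for illustration, not the article's specific approximation.
import numpy as np
from scipy.special import gamma as G

_BETAS = np.linspace(0.1, 5.0, 500)
_RATIOS = G(2.0 / _BETAS) / np.sqrt(G(1.0 / _BETAS) * G(3.0 / _BETAS))

def estimate_ggd_shape(x):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    r = np.mean(np.abs(x)) / np.sqrt(np.mean(x ** 2))
    return float(np.interp(r, _RATIOS, _BETAS))
```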


2017 ◽  
Vol 18 (1) ◽  
pp. 1
Author(s):  
Frida Murtinasari ◽  
Alfian Futuhul Hadi ◽  
Dian Anggraeni

SAE (Small Area Estimation) is often used by researchers, especially statisticians, to estimate the parameters of a subpopulation with a small sample size. Empirical Best Linear Unbiased Prediction (EBLUP) is one of the indirect estimation methods in Small Area Estimation. In the presence of outliers in the data, these methods cannot be guaranteed to yield precise predictions. Robust regression is one approach used in Small Area Estimation models, and the robust approach to estimating such small areas is known as Robust Small Area Estimation. Robust Small Area Estimation is divided into several approaches, namely Maximum Likelihood and M-Estimation. The results show that Robust Small Area Estimation with M-Estimation has the smallest RMSE: 1473.7 (with outliers) and 1279.6 (without outliers). In addition, the research indicates that REBLUP with M-Estimation is more robust to outliers: the RMSE of EBLUP becomes five times larger when only one outlier is included in the data analysis, whereas the RMSE of the REBLUP method remains relatively stable.
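The M-estimation ingredient can be sketched with a Huber robust regression, as below; this is only the robust fitting step (via statsmodels), not the full REBLUP machinery with area random effects, and the data are synthetic placeholders.

```python
# Sketch of the M-estimation ingredient used in robust small area estimation:
# a Huber M-estimator downweights outlying areas when fitting the linking model,
# whereas ordinary least squares (the EBLUP ingredient) does not. The full REBLUP
# also models area random effects, which this sketch omits.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 30)
y = 2.0 * x + rng.normal(0, 1, 30)
y[0] += 50.0                                        # a single outlying area

X = sm.add_constant(x)
ols_fit = sm.OLS(y, X).fit()
m_fit = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()

print("OLS slope:", ols_fit.params[1])              # pulled by the outlier
print("Huber M-estimate slope:", m_fit.params[1])   # close to the true slope 2.0
```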


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Zhihua Wang ◽  
Yongbo Zhang ◽  
Huimin Fu

Reasonable prediction is of significant practical value for the analysis of stochastic and unstable time series with small or limited sample sizes. Motivated by the rolling idea in grey theory and the practical relevance of very short-term forecasting, or one-step-ahead prediction, a novel autoregressive (AR) prediction approach with a rolling mechanism is proposed. In the modeling procedure, a newly developed AR equation, which can be used to model nonstationary time series, is constructed in each prediction step. Meanwhile, the data window for the next one-step-ahead forecast rolls forward by adding the most recently derived prediction result and deleting the first value of the previously used sample. This rolling mechanism is efficient because of its improved forecasting accuracy, its applicability to limited and unstable data situations, and its small computational effort. The general performance, the influence of sample size, nonlinear dynamic mechanisms, the significance of observed trends, and the innovation variance are illustrated and verified with Monte Carlo simulations. The proposed methodology is then applied to several practical data sets, including multiple building settlement sequences and two economic series.
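A minimal sketch of the rolling one-step-ahead AR forecast is given below; the AR order, window handling, and least-squares fitting are illustrative choices rather than the paper's specific AR equation.

```python
# Rolling one-step-ahead AR(p) forecasting: at each step, fit an AR model by
# least squares on the current window, predict the next value, then roll the
# window forward by appending the prediction and dropping the oldest value.
# Order p and window length are illustrative choices.
import numpy as np

def rolling_ar_forecast(series, steps_ahead, p=2):
    window = list(series)
    forecasts = []
    for _ in range(steps_ahead):
        y = np.array(window)
        # Design matrix of lagged values for least-squares AR(p) fitting.
        X = np.column_stack([y[p - k - 1:len(y) - k - 1] for k in range(p)])
        X = np.column_stack([np.ones(len(X)), X])
        target = y[p:]
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        next_val = coef[0] + coef[1:] @ y[-1:-p - 1:-1]
        forecasts.append(next_val)
        window.append(next_val)      # roll: add the newest prediction...
        window.pop(0)                # ...and drop the oldest observation
    return np.array(forecasts)
```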


2020 ◽  
Vol 41 (S1) ◽  
pp. s445-s446
Author(s):  
Megan DiGiorgio ◽  
Lori Moore ◽  
Greg Robbins ◽  
Albert Parker ◽  
James Arbogast

Background: Hand hygiene (HH) has long been a focus in the prevention of healthcare-associated infections. The limitations of direct observation, including small sample size (often 20–100 observations per month) and the Hawthorne effect, have cast doubt on the accuracy of reported compliance rates. As a result, hospitals are exploring the use of automated HH monitoring systems (AHHMS) to overcome the limitations of direct observation and to provide a more robust and realistic estimation of HH behaviors.
Methods: Data analyzed in this study were captured utilizing a group-based AHHMS installed in a number of North American hospitals. Emergency departments, overflow units, and units with <1 year of data were excluded from the study. The final analysis included data from 58 inpatient units in 10 hospitals. Alcohol-based hand rub and soap dispenses (HH events, HHEs) and room entries and exits (HH opportunities, HHOs) were used to calculate unit-level compliance rates. Statistical analysis was performed on the annual number of dispenses and opportunities using a mixed-effects Poisson regression with random effects for facility, unit, and year, and fixed effects for unit type. Interactions were not included in the model based on interaction plots and significance tests. Poisson assumptions were verified with Pearson residual plots.
Results: Over the study period, 222.7 million HHOs and 99 million HHEs were captured in the data set. There was an average of 18.7 beds per unit. The average number of HHOs per unit per day was 3,528, and the average number of HHEs per unit per day was 1,572. The overall median compliance rate was 35.2% (95% CI, 31.5%–39.3%). Unit-to-unit comparisons revealed some significant differences: compliance rates for medical-surgical units were 12.6% higher than for intensive care units (P < .0001).
Conclusions: This is the largest HH data set ever reported. The results illustrate the magnitude of HHOs captured (3,528 per unit per day) by an AHHMS compared with what is possible through direct observation. It has been previously suggested that direct observation samples between 0.5% and 1.7% of all HHOs. In healthcare, it is unprecedented for a patient safety activity that occurs as frequently as HH not to be accurately monitored and reported, especially with HH compliance as low as it is in this multiyear, multicenter study. Furthermore, hospitals relying on direct observation alone are likely insufficiently allocating and deploying valuable resources for improvement efforts based on the scant information obtained. AHHMSs have the potential to introduce a new era in HH improvement.
Funding: GOJO Industries, Inc., provided support for this study.
Disclosures: Lori D. Moore and James W. Arbogast report salary from GOJO.
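For illustration only, the sketch below shows how unit-level compliance can be computed from AHHMS counts and how dispense counts can be modeled with a Poisson regression using opportunities as exposure; the study's mixed-effects structure is simplified to a fixed effect for unit type, and all column names and numbers are hypothetical placeholders.

```python
# Sketch of the compliance calculation and a simplified Poisson model:
# compliance = HH events / HH opportunities per unit, with dispense counts
# modeled by a Poisson GLM using opportunities as the exposure. The study used
# a mixed-effects Poisson regression (random effects for facility, unit, year);
# this sketch keeps only a fixed effect for unit type, and all values are fake.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

units = pd.DataFrame({
    "unit_type": ["med_surg", "med_surg", "icu", "icu"],
    "hh_events": [574_000, 603_500, 498_200, 512_900],                 # annual HHEs
    "hh_opportunities": [1_288_000, 1_301_400, 1_402_300, 1_386_800],  # annual HHOs
})
units["compliance"] = units["hh_events"] / units["hh_opportunities"]

model = smf.glm(
    "hh_events ~ C(unit_type)",
    data=units,
    family=sm.families.Poisson(),
    exposure=units["hh_opportunities"],
).fit()
print(units[["unit_type", "compliance"]])
print(model.summary())
```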

