Integrated conditional moment test and beyond: when the number of covariates is divergent

Biometrika ◽  
2021 ◽  
Author(s):  
Falong Tan ◽  
Lixing Zhu

Summary The classic integrated conditional moment test is a promising method for model checking, and its basic idea has been applied to develop several variants. However, in diverging-dimension scenarios, the integrated conditional moment test may break down and may have completely different limiting properties from those in fixed-dimension cases; the related wild bootstrap approximation would also be invalid. To extend this classic test to diverging-dimension settings, we propose a projected adaptive-to-model version of the integrated conditional moment test. We study the asymptotic properties of the new test under both the null and alternative hypotheses to examine its significance level maintenance and its sensitivity to global and local alternatives that are distinct from the null at the rate n^{-1/2}. The corresponding wild bootstrap approximation still works for the new test in diverging-dimension scenarios. We also derive the consistency and asymptotically linear representation of the least squares estimator of the parameter at the fastest rate of divergence in the literature for nonlinear models. Numerical studies show that the new test greatly enhances the performance of the integrated conditional moment test in high-dimensional cases. We also apply the test to a real data set for illustration.
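
The abstract does not reproduce the test statistic, but the classical fixed-dimension ICM machinery is easy to sketch. The snippet below is a minimal illustration, assuming a Bierens-type statistic with a Gaussian weight function and a linear null model fitted by least squares; it is not the projected adaptive-to-model version proposed in the paper.

```python
import numpy as np

def icm_statistic(x, e):
    """Bierens-type ICM statistic with a Gaussian weight kernel:
    T_n = (1/n) * sum_{i,j} e_i e_j exp(-||x_i - x_j||^2 / 2)."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    return (e @ np.exp(-0.5 * d2) @ e) / len(e)

def wild_bootstrap_pvalue(x, y, fit, b=499, seed=None):
    """p-value by the wild bootstrap: perturb residuals with Rademacher
    weights, rebuild the response, refit, and recompute the statistic."""
    rng = np.random.default_rng(seed)
    yhat = fit(x, y)(x)                 # fitted values under the null model
    e = y - yhat
    t_obs = icm_statistic(x, e)
    t_boot = np.empty(b)
    for r in range(b):
        v = rng.choice([-1.0, 1.0], size=len(y))   # Rademacher weights
        y_star = yhat + v * e
        e_star = y_star - fit(x, y_star)(x)
        t_boot[r] = icm_statistic(x, e_star)
    return (1 + (t_boot >= t_obs).sum()) / (b + 1)

# Linear null model fitted by ordinary least squares.
def ols(x, y):
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda z: np.column_stack([np.ones(len(z)), z]) @ beta

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
y = x[:, 0] + 0.5 * x[:, 1] ** 2 + rng.normal(size=200)  # quadratic term violates the null
print(wild_bootstrap_pvalue(x, y, ols, b=199, seed=1))
```

In fixed dimensions this bootstrap approximates the null distribution well; the paper's point is precisely that this approximation fails when the number of covariates diverges, motivating the projected version.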

2005 ◽  
Vol 01 (01) ◽  
pp. 173-193
Author(s):  
HIROSHI MAMITSUKA

We consider the problem of mining from noisy unsupervised data sets. In the current context of data mining, a data point we call noise is an outlier, generally defined as a point that lies in a low-probability region of the input space. The purpose of our approach is to detect outliers and to perform efficient mining from noisy unsupervised data. We propose a new iterative sampling approach that uses both model-based clustering and the likelihood assigned to each example by a trained probabilistic model to find data points in such low-probability regions. Our method takes an arbitrary probabilistic model as a component model and alternately repeats two steps: sampling non-outliers with high likelihoods (computed by previously obtained models) and training the model on the selected examples. In our experiments, we focused on two-mode and co-occurrence data and empirically evaluated the effectiveness of the proposed method against two other methods on both synthetic and real data sets. The experiments on synthetic data sets showed that the performance advantage of our method over the two other methods became more pronounced at higher noise ratios, for both medium- and large-sized data sets. The experiments on a real noisy data set of protein–protein interactions, a typical co-occurrence data set, further confirmed the ability of our method to detect outliers in a given data set. Extended abstracts of parts of the work presented in this paper have appeared in Refs. 1 and 2.
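
As a rough illustration of the alternating scheme described above, the sketch below uses a Gaussian mixture as a stand-in for the arbitrary component model (the paper works with two-mode and co-occurrence data rather than continuous vectors) and repeats the fit-then-reselect loop.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def iterative_outlier_filter(x, keep=0.9, n_components=2, n_iter=5, seed=0):
    """Alternate between (1) fitting a probabilistic model on the currently
    kept points and (2) re-selecting the `keep` fraction of all points with
    the highest likelihood under that model."""
    idx = np.arange(len(x))                      # start from all points
    for _ in range(n_iter):
        gmm = GaussianMixture(n_components=n_components, random_state=seed)
        gmm.fit(x[idx])
        logp = gmm.score_samples(x)              # per-point log-likelihood
        idx = np.argsort(logp)[-int(keep * len(x)):]   # keep most likely points
    outliers = np.setdiff1d(np.arange(len(x)), idx)
    return idx, outliers

# Toy data: two Gaussian clusters plus uniform background noise.
rng = np.random.default_rng(0)
clusters = np.vstack([rng.normal(0, 1, (450, 2)), rng.normal(6, 1, (450, 2))])
noise = rng.uniform(-5, 11, (100, 2))
x = np.vstack([clusters, noise])
kept, flagged = iterative_outlier_filter(x, keep=0.9)
print(f"{len(flagged)} points flagged as outliers")
```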


2014 ◽  
Vol 0 (0) ◽  
Author(s):  
Dipak D. Patil ◽  
Uttara V. Naik-Nimbalkar

Abstract In this paper we propose a conditional distribution for the waiting time given the queue length. We assume that the number of customers in the queue, that is, the queue length, follows a geometric distribution and that the waiting time of a newly arrived customer follows an exponential distribution. We model the conditional distribution of the waiting time given the queue length using Gumbel's bivariate exponential copula. Parameters are estimated using the likelihood approach with a two-stage estimation procedure. A simulation study indicates the performance of the estimators, and asymptotic properties of the parameter estimators are studied. We apply the model to a real data set to illustrate the dependence between the waiting time and the queue length.
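
A minimal sketch of the two-stage idea, assuming Gumbel's bivariate exponential copula C(u,v) = u + v - 1 + (1-u)(1-v)exp(-theta ln(1-u) ln(1-v)) with theta in [0,1], an exponential waiting-time margin and a geometric queue-length margin on {0,1,...}; the paper's exact parameterization may differ.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def dC_du(u, v, theta):
    """dC/du for Gumbel's bivariate exponential copula; for U uniform this
    is the conditional distribution P(V <= v | U = u)."""
    lu, lv = np.log1p(-u), np.log1p(-v)
    return 1.0 - (1.0 - v) * np.exp(-theta * lu * lv) * (1.0 - theta * lv)

def simulate(n, lam, p, theta, rng):
    """Draw (W, N): W ~ Exp(lam) marginally, N ~ Geometric(p) on {0,1,...},
    linked through the copula via conditional inversion."""
    u = rng.uniform(size=n)
    w = -np.log1p(-u) / lam
    t, q = rng.uniform(size=n), 1.0 - p
    nn = np.zeros(n, dtype=int)
    for i in range(n):
        k = 0
        while dC_du(u[i], 1.0 - q ** (k + 1), theta) < t[i]:
            k += 1
        nn[i] = k
    return w, nn

def fit_two_stage(w, nn):
    """Stage 1: marginal MLEs; stage 2: maximize the copula part in theta."""
    lam, p = 1.0 / w.mean(), 1.0 / (1.0 + nn.mean())
    q = 1.0 - p
    u = 1.0 - np.exp(-lam * w)       # F(w)
    g_hi = 1.0 - q ** (nn + 1)       # G(n)
    g_lo = 1.0 - q ** nn             # G(n-1), equals 0 when n = 0
    def nll(theta):
        pr = dC_du(u, g_hi, theta) - dC_du(u, g_lo, theta)
        return -np.sum(np.log(np.clip(pr, 1e-300, None)))
    res = minimize_scalar(nll, bounds=(0.0, 1.0), method="bounded")
    return lam, p, res.x

rng = np.random.default_rng(0)
w, nn = simulate(2000, lam=0.5, p=0.3, theta=0.7, rng=rng)
print(fit_two_stage(w, nn))          # (lambda_hat, p_hat, theta_hat)
```

Because the queue length is discrete, the second-stage likelihood uses differences of the conditional copula rather than a copula density.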


Filomat ◽  
2007 ◽  
Vol 21 (2) ◽  
pp. 133-152 ◽  
Author(s):  
Vladica Stojanovic ◽  
Biljana Popovic

Well-known models of conditional heteroscedasticity describe various behavioural effects in financial markets. In this paper, we investigate a related model, called Split-ARCH, in some of its stochastic aspects, such as the necessary and sufficient conditions for strong stationarity and the estimation procedure. The basic asymptotic properties of the estimates are also described. The most important part of our work is devoted to the practical use of the Split-ARCH model in analysing the dynamics of real data. We compare Split-ARCH with standard ARCH-type models and show that it is the better stochastic model for explaining the world market prices of some precious metals.
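
The abstract does not give the Split-ARCH equations. Purely for illustration, the sketch below simulates a sign-split ARCH(1) in which the lag coefficient switches with the sign of the previous shock; this is a threshold/GJR-type assumption, not necessarily the paper's specification.

```python
import numpy as np

def simulate_split_arch(n, a0=0.1, a_pos=0.2, a_neg=0.6, seed=0):
    """Illustrative regime-split ARCH(1):
        sigma_t^2 = a0 + a_pos * e_{t-1}^2   if e_{t-1} >= 0
        sigma_t^2 = a0 + a_neg * e_{t-1}^2   if e_{t-1} <  0
    with Gaussian innovations."""
    rng = np.random.default_rng(seed)
    e = np.zeros(n)
    for t in range(1, n):
        a1 = a_pos if e[t - 1] >= 0 else a_neg
        e[t] = np.sqrt(a0 + a1 * e[t - 1] ** 2) * rng.standard_normal()
    return e

# For symmetric innovations, (a_pos + a_neg) / 2 < 1 is the natural analogue
# of the ARCH(1) second-order stationarity condition a1 < 1.
x = simulate_split_arch(5000)
print(x.var())
```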


Symmetry ◽  
2021 ◽  
Vol 13 (5) ◽  
pp. 887
Author(s):  
Subin Cho ◽  
Kyeongjun Lee

In many survival and reliability tests, the withdrawal of units from the test is pre-planned in order to free up testing facilities for other tests, or to save cost and time. It is also known that several risk factors (RiFs) compete to be the immediate cause of failure of an item. In this paper, we derive an inference for a competing risks model (CompRiM) under a generalized type II progressive hybrid censoring scheme (GeTy2PrHCS). We derive the conditional moment generating functions (CondMgfs), distributions and confidence intervals (ConfI) of the scale parameters of the exponential distribution (ExDist) under GeTy2PrHCS with CompRiM. A real data set is analysed to illustrate the validity of the method developed here. From the data, it can be seen that the conditional PDFs of the MLEs are almost symmetrical.
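
The CondMgf-based exact intervals are beyond a short sketch, but the shape of the MLEs under exponential competing risks is simple: each cause-specific rate is the number of failures from that cause divided by the total time on test. The toy example below uses plain administrative censoring rather than GeTy2PrHCS, purely to show the estimator's form.

```python
import numpy as np

def exp_competing_risks_mle(times, causes):
    """MLEs of exponential cause-specific rates under independent exponential
    competing risks: lambda_j_hat = d_j / T, where d_j is the number of
    failures from cause j and T the total time on test (censored units,
    cause code 0, contribute time but no failure)."""
    T = times.sum()
    return {j: (causes == j).sum() / T for j in (1, 2)}

# Latent failure times from two exponential causes; the observed time is the
# minimum, with administrative censoring at time c.
rng = np.random.default_rng(0)
n, lam1, lam2, c = 300, 0.5, 1.0, 1.5
t1 = rng.exponential(1 / lam1, n)
t2 = rng.exponential(1 / lam2, n)
t = np.minimum.reduce([t1, t2, np.full(n, c)])
cause = np.select([t == t1, t == t2], [1, 2], default=0)  # 0 = censored
print(exp_competing_risks_mle(t, cause))
```

Under GeTy2PrHCS, units are additionally removed progressively at pre-fixed stages, which changes the total time on test but not the d_j / T structure of the estimator.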


2019 ◽  
Vol XVI (2) ◽  
pp. 1-11
Author(s):  
Farrukh Jamal ◽  
Hesham Mohammed Reyad ◽  
Soha Othman Ahmed ◽  
Muhammad Akbar Ali Shah ◽  
Emrah Altun

A new three-parameter continuous model called the exponentiated half-logistic Lomax distribution is introduced in this paper. Basic mathematical properties of the proposed model are investigated, including raw and incomplete moments, skewness, kurtosis, generating functions, Rényi entropy, Lorenz, Bonferroni and Zenga curves, probability weighted moments, the stress-strength model, order statistics, and record statistics. The model parameters are estimated by the maximum likelihood criterion, and the behaviour of these estimates is examined through a simulation study. The applicability of the new model is illustrated by applying it to a real data set.
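
The abstract does not state the distribution's form. Assuming the usual exponentiated half-logistic-G construction with a Lomax baseline G(x) = 1 - (1 + x/beta)^(-alpha), i.e. F(x) = [(1-s)/(1+s)]^lambda with s = (1 + x/beta)^(-alpha), a maximum likelihood fit can be sketched as follows; the assumed form should be checked against the paper.

```python
import numpy as np
from scipy.optimize import minimize

def ehl_lomax_logpdf(x, lam, alpha, beta):
    """Assumed density obtained by differentiating
    F(x) = [(1 - s)/(1 + s)]^lam with s = (1 + x/beta)^(-alpha):
    f(x) = 2*lam*alpha/beta * (1+x/beta)^-(alpha+1) * (1-s)^(lam-1) / (1+s)^(lam+1)."""
    s = (1.0 + x / beta) ** (-alpha)
    return (np.log(2 * lam * alpha / beta)
            - (alpha + 1) * np.log1p(x / beta)
            + (lam - 1) * np.log1p(-s)
            - (lam + 1) * np.log1p(s))

def fit_mle(x, start=(1.0, 1.0, 1.0)):
    # Optimize on the log scale so all three parameters stay positive.
    nll = lambda p: -np.sum(ehl_lomax_logpdf(x, *np.exp(p)))
    res = minimize(nll, np.log(start), method="Nelder-Mead")
    return np.exp(res.x)             # (lam, alpha, beta)

# Sanity check on data simulated by inverting the assumed CDF.
rng = np.random.default_rng(0)
lam0, alpha0, beta0 = 2.0, 3.0, 1.5
u = rng.uniform(size=2000)
r = u ** (1 / lam0)
s = (1 - r) / (1 + r)
x = beta0 * (s ** (-1 / alpha0) - 1.0)
print(fit_mle(x))                    # should be near (2.0, 3.0, 1.5)
```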


Author(s):  
Parisa Torkaman

The generalized inverted exponential distribution has been introduced as a lifetime model with good statistical properties. In this paper, estimation of its probability density function and cumulative distribution function is considered using five different estimation methods: the uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS) and percentile (PC) estimators. The performance of these estimation procedures is compared through numerical simulations based on the mean squared error (MSE). The simulation studies show that the UMVU estimator performs better than the others, and that when the sample size is large enough the ML and UMVU estimators are almost equivalent and more efficient than the LS, WLS and PC estimators. Finally, a real data set is analysed for illustration.
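
A compact sketch of the kind of Monte Carlo comparison described, restricted to the ML plug-in estimator of the CDF (the UMVU, LS, WLS and PC estimators are omitted); the CDF F(x) = 1 - (1 - e^(-lambda/x))^alpha follows the standard form of the generalized inverted exponential distribution.

```python
import numpy as np
from scipy.optimize import minimize

def gie_logpdf(x, alpha, lam):
    """log f(x) = log(alpha*lam) - 2*log(x) - lam/x + (alpha-1)*log(1 - e^(-lam/x))."""
    return (np.log(alpha * lam) - 2 * np.log(x) - lam / x
            + (alpha - 1) * np.log1p(-np.exp(-lam / x)))

def gie_cdf(x, alpha, lam):
    return 1.0 - (1.0 - np.exp(-lam / x)) ** alpha

def mle(x):
    # Log-parameterization keeps alpha and lambda positive.
    nll = lambda p: -np.sum(gie_logpdf(x, *np.exp(p)))
    return np.exp(minimize(nll, [0.0, 0.0], method="Nelder-Mead").x)

def rgie(n, alpha, lam, rng):
    u = rng.uniform(size=n)          # inverse-CDF sampling
    return -lam / np.log1p(-(1 - u) ** (1 / alpha))

# Monte Carlo MSE of the ML plug-in estimate of F at a point x0.
rng = np.random.default_rng(0)
alpha0, lam0, x0, reps = 2.0, 1.5, 1.0, 200
err = []
for _ in range(reps):
    a, l = mle(rgie(50, alpha0, lam0, rng))
    err.append(gie_cdf(x0, a, l) - gie_cdf(x0, alpha0, lam0))
print("MSE of ML plug-in F(x0):", np.mean(np.square(err)))
```

The paper's comparison repeats this kind of experiment for each of the five estimators and for both f and F.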


2019 ◽  
Vol 14 (2) ◽  
pp. 148-156
Author(s):  
Nighat Noureen ◽  
Sahar Fazal ◽  
Muhammad Abdul Qadir ◽  
Muhammad Tanvir Afzal

Background: Specific combinations of Histone Modifications (HMs), contributing to the histone code hypothesis, lead to various biological functions. HM combinations have been utilized by various studies to divide the genome into different regions, which have been classified as chromatin states. Mostly, Hidden Markov Model (HMM)-based techniques have been utilized for this purpose. In chromatin studies, data from Next Generation Sequencing (NGS) platforms are used. Chromatin states based on histone modification combinatorics are annotated by mapping them to functional regions of the genome. However, the number of states predicted so far by HMM tools has not been justified biologically. Objective: The present study aimed at providing a computational scheme to identify the underlying hidden states in the data under consideration. Methods: We proposed a computational scheme, HCVS, based on hierarchical clustering and a visualization strategy in order to achieve the objective of the study. Results: We tested our proposed scheme on a real data set of nine cell types comprising nine chromatin marks. The approach successfully identified the state numbers for various possibilities. The results were also compared with one of the existing models and showed good correlation. Conclusion: The HCVS model not only helps in deciding the optimal number of states for particular data but also justifies the results biologically, thereby correlating the computational and biological aspects.
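
The abstract does not specify the HCVS criterion. As a rough stand-in, the sketch below hierarchically clusters genomic bins described by binary histone-mark vectors and tracks within-cluster dispersion as the candidate number of states varies; the mark profiles and noise rate are invented for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic stand-in data: 500 genomic bins x 9 histone marks, generated
# from 4 hypothetical "true" state profiles with 10% bit-flip noise.
rng = np.random.default_rng(0)
n_bins, n_marks = 500, 9
profiles = rng.integers(0, 2, (4, n_marks))
marks = np.vstack([p ^ (rng.random((n_bins // 4, n_marks)) < 0.1)
                   for p in profiles])

Z = linkage(marks, method="ward")            # agglomerative clustering tree
for k in range(2, 8):
    labels = fcluster(Z, t=k, criterion="maxclust")
    # within-cluster sum of squares as a crude elbow criterion
    w = sum(((marks[labels == c] - marks[labels == c].mean(0)) ** 2).sum()
            for c in np.unique(labels))
    print(f"k={k}: within-cluster SS = {w:.1f}")
```

An elbow in the printed dispersion around k = 4 would suggest four underlying states; HCVS additionally maps the chosen states to functional genomic regions for biological justification.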


Genetics ◽  
1996 ◽  
Vol 143 (1) ◽  
pp. 589-602 ◽  
Author(s):  
Peter J E Goss ◽  
R C Lewontin

Abstract Regions of differing constraint, mutation rate or recombination along a sequence of DNA or amino acids lead to a nonuniform distribution of polymorphism within species or fixed differences between species. The power of five tests to reject the null hypothesis of a uniform distribution is studied for four classes of alternate hypothesis. The tests explored are the variance of interval lengths; a modified variance test, which includes covariance between neighboring intervals; the length of the longest interval; the length of the shortest third-order interval; and a composite test. Although there is no uniformly most powerful test over the range of alternate hypotheses tested, the variance and modified variance tests usually have the highest power. Therefore, we recommend that one of these two tests be used to test departure from uniformity in all circumstances. Tables of critical values for the variance and modified variance tests are given. The critical values depend both on the number of events and the number of positions in the sequence. A computer program is available on request that calculates both the critical values for a specified number of events and number of positions as well as the significance level of a given data set.
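
Critical values for these interval-based statistics can also be approximated by Monte Carlo rather than tables. The sketch below does this for the variance-of-intervals test, using continuous positions for simplicity (the paper works with discrete sequence positions, where critical values depend on both the number of events and the number of positions).

```python
import numpy as np

def interval_variance(positions, length):
    """Variance of the interval lengths cut by the ordered event positions
    along a sequence of the given length (including both end intervals)."""
    s = np.sort(positions)
    gaps = np.diff(np.concatenate(([0.0], s, [length])))
    return gaps.var()

def mc_critical_value(n_events, length, alpha=0.05, reps=10000, seed=0):
    """Upper critical value under the null of uniform placement, by simulation."""
    rng = np.random.default_rng(seed)
    stats = [interval_variance(rng.uniform(0, length, n_events), length)
             for _ in range(reps)]
    return np.quantile(stats, 1 - alpha)

# Example: 20 polymorphic sites confined to the first half of the sequence,
# a clustered alternative that inflates the interval-length variance.
rng = np.random.default_rng(1)
obs = interval_variance(rng.uniform(0, 500, 20), 1000)
crit = mc_critical_value(20, 1000)
print(f"statistic {obs:.1f}, 5% critical value {crit:.1f}, reject: {obs > crit}")
```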


2021 ◽  
Vol 13 (9) ◽  
pp. 1703
Author(s):  
He Yan ◽  
Chao Chen ◽  
Guodong Jin ◽  
Jindong Zhang ◽  
Xudong Wang ◽  
...  

The traditional method of constant false-alarm rate detection is based on the assumption of an echo statistical model. Against a background of sea clutter and other interference, its target recognition accuracy is low and its false-alarm rate is high. Therefore, computer vision techniques are widely discussed as a way to improve detection performance. However, the majority of studies have focused on synthetic aperture radar because of its high resolution; for defense radar, detection performance is unsatisfactory because of its low resolution. To this end, we herein propose a novel target detection method for coastal defense radar based on the faster region-based convolutional neural network (Faster R-CNN). The main processing steps are as follows: (1) Faster R-CNN is selected as the sea-surface target detector because of its high target detection accuracy; (2) a modified Faster R-CNN is employed, based on the sparsity and small target size in the data set; and (3) soft non-maximum suppression is exploited to eliminate possibly overlapping detection boxes. Furthermore, detailed comparative experiments based on a real coastal defense radar data set are performed. The mean average precision of the proposed method is improved by 10.86% compared with that of the original Faster R-CNN.
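
Soft non-maximum suppression, in its Gaussian form (Bodla et al.), decays the scores of overlapping boxes instead of deleting them outright. A minimal sketch follows; the paper's exact variant and parameters are not given in the abstract.

```python
import numpy as np

def pairwise_iou(box, others):
    """IoU of one (x1, y1, x2, y2) box against an array of boxes."""
    x1 = np.maximum(box[0], others[:, 0]); y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2]); y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    a = (box[2] - box[0]) * (box[3] - box[1])
    b = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (a + b - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian soft-NMS: repeatedly take the highest-scoring box, then decay
    the scores of the remaining boxes by exp(-iou^2 / sigma)."""
    boxes, scores = boxes.astype(float), scores.astype(float).copy()
    keep, idx = [], np.arange(len(scores))
    while idx.size:
        m = idx[np.argmax(scores[idx])]              # current best box
        keep.append(m)
        idx = idx[idx != m]
        if not idx.size:
            break
        iou = pairwise_iou(boxes[m], boxes[idx])
        scores[idx] *= np.exp(-(iou ** 2) / sigma)   # Gaussian score decay
        idx = idx[scores[idx] > score_thresh]        # drop near-zero scores
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]])
scores = np.array([0.9, 0.8, 0.7])
print(soft_nms(boxes, scores))   # overlapping box is down-weighted, not removed
```

For sparse scenes with small targets, this soft decay reduces the risk of discarding a genuine neighbouring detection that classical hard NMS would suppress.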


2021 ◽  
Vol 1978 (1) ◽  
pp. 012047
Author(s):  
Xiaona Sheng ◽  
Yuqiu Ma ◽  
Jiabin Zhou ◽  
Jingjing Zhou
