Sign, Wilcoxon and Mann-Whitney Tests for Functional Data: An Approach Based on Random Projections

Mathematics, 2020, Vol 9 (1), pp. 44
Author(s): Rafael Meléndez, Ramón Giraldo, Víctor Leiva

The sign, Wilcoxon, and Mann-Whitney tests are nonparametric methods for one- or two-sample problems. Nonparametric methods are alternatives used for testing hypotheses when standard methods based on the Gaussianity assumption are not suitable. Recently, functional data analysis (FDA) has gained relevance in statistical modeling. In FDA, each observation is a curve or function, usually a realization of a stochastic process. In the FDA literature, several methods have been proposed for testing hypotheses with samples coming from Gaussian processes. However, when this assumption is not realistic, other approaches are needed. Clustering and regression methods, among others, for non-Gaussian functional data have been proposed recently. In this paper, we propose extensions of the sign, Wilcoxon, and Mann-Whitney tests to the functional data context as methods for testing hypotheses with one or two samples of non-Gaussian functional data. We use random projections to transform the functional problem into a scalar one and then proceed as in the standard case. Based on a simulation study, we show that the proposed tests perform well. We illustrate the methodology by applying it to a real data set.
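As a rough illustration of the random-projection idea (not the authors' full procedure, which typically combines several projections and adjusts for multiplicity), the following sketch projects two samples of simulated non-Gaussian curves onto one random direction and then applies the standard Mann-Whitney test to the resulting scalar scores. The grid, sample sizes, and choice of projection direction are assumptions made for the example.

```python
# Minimal sketch: reduce a two-sample functional testing problem to a scalar
# one with a single random projection, then apply the Mann-Whitney test.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)                      # common evaluation grid
dt = t[1] - t[0]

# Toy non-Gaussian functional samples (cumulative sums of exponential noise).
X = np.cumsum(rng.exponential(1.0, (30, t.size)), axis=1) / t.size
Y = np.cumsum(rng.exponential(1.2, (30, t.size)), axis=1) / t.size

# Random projection direction (white noise here; other random directions,
# e.g. Brownian trajectories, could be used instead).
v = rng.standard_normal(t.size)
v /= np.sqrt(np.sum(v**2) * dt)                 # normalise in L2

proj_X = np.sum(X * v, axis=1) * dt             # <X_i, v> for each curve
proj_Y = np.sum(Y * v, axis=1) * dt

stat, pval = mannwhitneyu(proj_X, proj_Y, alternative="two-sided")
print(f"Mann-Whitney on projected scores: U={stat:.1f}, p={pval:.3f}")
```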

2016, Vol 5 (4), pp. 1
Author(s): Bander Al-Zahrani

The paper describes estimation of the reliability function of the weighted Weibull distribution. The maximum likelihood estimators of the unknown parameters are obtained. Nonparametric methods, such as the empirical method, a kernel density estimator, and a modified shrinkage estimator, are also provided. The Markov chain Monte Carlo method is used to compute the Bayes estimators under gamma and Jeffreys priors. The performance of the maximum likelihood, nonparametric, and Bayesian estimators is assessed on a real data set.
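The weighted Weibull density itself is not reproduced in this summary, so the sketch below only illustrates the kind of comparison described: an empirical reliability estimate and a kernel-based one set against a parametric maximum likelihood fit, with the standard two-parameter Weibull standing in for the weighted Weibull model.

```python
# Sketch of nonparametric reliability (survival) estimation compared with a
# parametric ML fit. A standard two-parameter Weibull stands in for the
# weighted Weibull of the paper, whose density is not reproduced here.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.weibull(1.5, 200) * 2.0                 # synthetic lifetimes

grid = np.linspace(0.05, x.max(), 100)

# Empirical reliability: R_hat(t) = proportion of observations exceeding t.
R_emp = np.array([(x > t).mean() for t in grid])

# Kernel-type reliability: integrate a Gaussian KDE of the density beyond t.
kde = stats.gaussian_kde(x)
R_kde = np.array([kde.integrate_box_1d(t, np.inf) for t in grid])

# Parametric ML fit (location fixed at 0) and the implied reliability.
c, loc, scale = stats.weibull_min.fit(x, floc=0)
R_mle = stats.weibull_min.sf(grid, c, loc=loc, scale=scale)

print("max |empirical - ML| :", np.abs(R_emp - R_mle).max())
print("max |kernel    - ML| :", np.abs(R_kde - R_mle).max())
```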


Mathematics, 2021, Vol 9 (23), pp. 3074
Author(s): Cristian Preda, Quentin Grimonprez, Vincent Vandewalle

Categorical functional data, represented by paths of a continuous-time stochastic jump process with a finite set of states, are considered. As an extension of multiple correspondence analysis to an infinite set of variables, optimal encodings of the states over time are approximated using an arbitrary finite basis of functions. This allows dimension reduction, optimal representation, and visualisation of data in lower-dimensional spaces. The methodology is implemented in the cfda R package and is illustrated using a real data set in the clustering framework.
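The sketch below is not the cfda API; it only mimics the general idea of encoding categorical paths as state-indicator curves on a time grid and then reducing dimension, here with an ordinary PCA in place of the optimal encodings computed by the package. The state space, grid, and simulation mechanism are illustrative assumptions.

```python
# Toy illustration of the encoding idea for categorical functional data:
# each path is turned into state-indicator curves on a time grid, and a
# PCA on the stacked indicators provides a low-dimensional representation.
import numpy as np

rng = np.random.default_rng(2)
n_paths, n_states, n_times = 50, 3, 60

# Simulate jump paths: piecewise-constant state sequences on the grid.
paths = np.zeros((n_paths, n_times), dtype=int)
for i in range(n_paths):
    state = rng.integers(n_states)
    for j in range(n_times):
        if rng.random() < 0.05:                 # occasional jump
            state = rng.integers(n_states)
        paths[i, j] = state

# Indicator encoding: shape (n_paths, n_states * n_times).
indicators = np.stack([(paths == s).astype(float) for s in range(n_states)],
                      axis=1).reshape(n_paths, -1)

# PCA via SVD of the centred indicator matrix.
centred = indicators - indicators.mean(axis=0)
U, S, Vt = np.linalg.svd(centred, full_matrices=False)
scores = U[:, :2] * S[:2]                        # 2-D representation of the paths
print("explained variance (first two components):",
      np.round((S[:2] ** 2) / (S ** 2).sum(), 3))
```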


Author(s): Sara Leulmi, Fatiha Messaci

We introduce a local linear nonparametric estimator of the generalized regression function of a scalar response variable given a covariate taking values in a semi-metric space. We establish a rate of uniform consistency for the proposed estimators. Then, based on a real data set, we illustrate the performance of a particular estimator studied here with respect to other known estimators.
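As a hedged illustration of regression with a functional covariate in a semi-metric space, the sketch below implements a plain Nadaraya-Watson-type kernel estimator with an L2 semi-metric between discretised curves; the authors' local linear refinement, bandwidth selection, and data are not reproduced.

```python
# Simplified kernel (Nadaraya-Watson type) regression estimate for a scalar
# response given a functional covariate, using the L2 distance between
# discretised curves as the semi-metric.
import numpy as np

def l2_semimetric(x, y, dt):
    return np.sqrt(np.sum((x - y) ** 2) * dt)

def kernel_regression(X_train, y_train, x_new, dt, h):
    """Predict E[Y | X = x_new] with Gaussian kernel weights in the semi-metric."""
    d = np.array([l2_semimetric(xi, x_new, dt) for xi in X_train])
    w = np.exp(-0.5 * (d / h) ** 2)
    return np.sum(w * y_train) / np.sum(w)

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 50)
dt = t[1] - t[0]
X = rng.standard_normal((100, 1)) * np.sin(2 * np.pi * t) + \
    0.1 * rng.standard_normal((100, t.size))
y = np.sum(X ** 2, axis=1) * dt + 0.05 * rng.standard_normal(100)  # Y = ||X||^2 + noise

x_new = 0.5 * np.sin(2 * np.pi * t)
print("prediction:", kernel_regression(X, y, x_new, dt, h=0.5),
      "  truth:", np.sum(x_new ** 2) * dt)
```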


2020, Vol 45 (6), pp. 719-749
Author(s): Eduardo Doval, Pedro Delicado

We propose new methods for identifying and classifying aberrant response patterns (ARPs) by means of functional data analysis. These methods take the person response function (PRF) of an individual and compare it with the pattern that would correspond to a generic individual of the same ability according to the item-person response surface. ARPs correspond to atypical difference functions. The ARP classification is done with functional data clustering applied to the PRFs identified as ARPs. We apply these methods to two sets of simulated data (the first is used to illustrate the ARP identification methods and the second demonstrates classification of the response patterns flagged as ARPs) and a real data set (a Grade 12 science assessment test, SAT, with 32 items answered by 600 examinees). For comparative purposes, ARPs are also identified with three nonparametric person-fit indices (Ht, Modified Caution Index, and ZU3). Our results indicate that the ARP detection ability of one of our proposed methods is comparable to that of person-fit indices. Moreover, the proposed classification methods enable ARPs associated with either spuriously low or spuriously high scores to be distinguished.
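A minimal sketch of the person response function ingredient is given below: an examinee's 0/1 item scores are kernel-smoothed against item difficulty, for data simulated under a simple Rasch-type model (an assumption of this sketch). The comparison with the item-person response surface and the functional clustering of the difference curves are not reproduced.

```python
# Toy sketch of a nonparametric person response function (PRF): the probability
# of a correct answer is smoothed against item difficulty for one examinee.
import numpy as np

rng = np.random.default_rng(4)
n_items = 32
difficulty = np.sort(rng.normal(0, 1, n_items))   # item difficulties
theta = 0.5                                        # examinee ability
p_correct = 1.0 / (1.0 + np.exp(-(theta - difficulty)))
responses = rng.binomial(1, p_correct)             # observed 0/1 pattern

def prf(b_grid, difficulty, responses, h=0.4):
    """Kernel-smoothed probability of success as a function of difficulty b."""
    out = []
    for b in b_grid:
        w = np.exp(-0.5 * ((difficulty - b) / h) ** 2)
        out.append(np.sum(w * responses) / np.sum(w))
    return np.array(out)

b_grid = np.linspace(-2, 2, 9)
print(np.round(prf(b_grid, difficulty, responses), 2))
# A typical (non-aberrant) pattern decreases in b; flat or increasing stretches
# would flag a candidate aberrant response pattern.
```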


2021, Vol 9 (1), pp. 62-81
Author(s): Kjersti Aas, Thomas Nagler, Martin Jullum, Anders Løland

In this paper the goal is to explain predictions from complex machine learning models. One method that has become very popular during the last few years is Shapley values. The original development of Shapley values for prediction explanation relied on the assumption that the features being described were independent. If the features are in fact dependent, this may lead to incorrect explanations. Hence, there have recently been attempts at appropriately modelling/estimating the dependence between the features. Although the previously proposed methods clearly outperform the traditional approach assuming independence, they have their weaknesses. In this paper we propose two new approaches for modelling the dependence between the features. Both approaches are based on vine copulas, which are flexible tools for modelling multivariate non-Gaussian distributions, able to characterise a wide range of complex dependencies. The performance of the proposed methods is evaluated on simulated data sets and a real data set. The experiments demonstrate that the vine copula approaches give more accurate approximations to the true Shapley values than their competitors.
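The following sketch computes exact Shapley values for a tiny three-feature model, approximating each value function v(S) by Monte Carlo sampling of the off-coalition features from the training marginals, i.e. under the independence assumption the paper criticises; the vine-copula approach would replace that step by conditional sampling. The model, covariance matrix, and sample sizes are invented for illustration.

```python
# Exact Shapley values for a 3-feature model, with the value function v(S)
# approximated by averaging the model over sampled values of the features
# outside S. Sampling from the marginals encodes the independence assumption;
# a dependence-aware approach would sample conditionally on the coalition.
import itertools
import math
import numpy as np

rng = np.random.default_rng(5)

def model(X):                                   # toy fitted model f(x1, x2, x3)
    return X[:, 0] + 2 * X[:, 1] * X[:, 2]

X_train = rng.multivariate_normal(np.zeros(3),
                                  [[1, .8, 0], [.8, 1, 0], [0, 0, 1]], 2000)
x = np.array([1.0, 1.0, -1.0])                  # instance to explain
M = 3

def value(S, n_mc=5000):
    """v(S): E[f(x_S, X_Sbar)] with the complement drawn from the training marginals."""
    Z = X_train[rng.integers(len(X_train), size=n_mc)]
    Z[:, list(S)] = x[list(S)]                  # fix the coalition features
    return model(Z).mean()

phi = np.zeros(M)
for j in range(M):
    for r in range(M):
        for S in itertools.combinations([k for k in range(M) if k != j], r):
            w = math.factorial(len(S)) * math.factorial(M - len(S) - 1) / math.factorial(M)
            phi[j] += w * (value(S + (j,)) - value(S))

print("Shapley values:", np.round(phi, 3), " sum:", round(phi.sum(), 3))
print("f(x) - E[f(X)]:", round(model(x[None, :])[0] - value(()), 3))
```

The final two lines check the efficiency property: the Shapley values should sum (up to Monte Carlo error) to the difference between the prediction at x and the average prediction.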


Geophysics, 1990, Vol 55 (12), pp. 1613-1624
Author(s): C. deGroot‐Hedlin, S. Constable

Magnetotelluric (MT) data are inverted for smooth 2-D models using an extension of the existing 1-D algorithm, Occam’s inversion. Since an MT data set consists of a finite number of imprecise data, an infinity of solutions to the inverse problem exists. Fitting field or synthetic electromagnetic data as closely as possible results in theoretical models with a maximum amount of roughness, or structure. However, by relaxing the misfit criterion only a small amount, models which are maximally smooth may be generated. Smooth models are less likely to result in overinterpretation of the data and reflect the true resolving power of the MT method. The models are composed of a large number of rectangular prisms, each having a constant conductivity. A priori information, in the form of boundary locations only or both boundary locations and conductivity, may be included, providing a powerful tool for improving the resolving power of the data. Joint inversion of TE and TM synthetic data generated from known models allows comparison of smooth models with the true structure. In most cases, smoothed versions of the true structure may be recovered in 12–16 iterations. However, resistive features with a size comparable to depth of burial are poorly resolved. Real MT data present problems of non‐Gaussian data errors, the breakdown of the two‐dimensionality assumption and the large number of data in broadband soundings; nevertheless, real data can be inverted using the algorithm.
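A toy, linear analogue of the Occam strategy is sketched below: sweep a regularisation parameter and, among the models whose data misfit meets a target value, keep the one with the smallest first-difference roughness. The actual algorithm repeatedly linearises the 2-D MT forward problem; here a fixed smoothing-kernel matrix stands in for the forward operator, and all sizes and noise levels are assumptions.

```python
# Occam-style inversion on a linear toy problem: among models m that fit the
# data to a target misfit, pick the one with minimal first-difference
# roughness ||R m||^2.
import numpy as np

rng = np.random.default_rng(6)
n_model, n_data = 60, 30
z = np.linspace(0, 1, n_model)

m_true = 1.0 + np.where((z > 0.3) & (z < 0.6), 1.5, 0.0)   # blocky "conductivity"
G = np.exp(-((np.linspace(0, 1, n_data)[:, None] - z[None, :]) ** 2) / 0.01)
sigma = 0.05
d = G @ m_true + sigma * rng.standard_normal(n_data)

R = np.diff(np.eye(n_model), axis=0)            # first-difference roughening matrix
target = n_data * sigma ** 2                    # expected squared misfit

best = None
for mu in np.logspace(-4, 4, 81):               # sweep the trade-off parameter
    m = np.linalg.solve(G.T @ G + mu * R.T @ R, G.T @ d)
    misfit = np.sum((G @ m - d) ** 2)
    rough = np.sum((R @ m) ** 2)
    if misfit <= target and (best is None or rough < best[1]):
        best = (mu, rough, m)

mu, rough, m = best
print(f"chosen mu={mu:.3g}, roughness={rough:.3g}, "
      f"misfit={np.sum((G @ m - d) ** 2):.3g} (target {target:.3g})")
```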


2008, Vol 20 (11), pp. 2696-2714
Author(s): Umberto Picchini, Susanne Ditlevsen, Andrea De Gaetano, Petr Lansky

Stochastic leaky integrate-and-fire (LIF) neuronal models are common theoretical tools for studying properties of real neuronal systems. Experimental data of frequently sampled membrane potential measurements between spikes show that the assumption of constant parameter values is not realistic and that some (random) fluctuations are occurring. In this article, we extend the stochastic LIF model, allowing a noise source determining slow fluctuations in the signal. This is achieved by adding a random variable to one of the parameters characterizing the neuronal input, considering each interspike interval (ISI) as an independent experimental unit with a different realization of this random variable. In this way, the variation of the neuronal input is split into fast (within-interval) and slow (between-intervals) components. A parameter estimation method is proposed, allowing the parameters to be estimated simultaneously over the entire data set. This increases the statistical power, and the average estimate over all ISIs will be improved in the sense of decreased variance of the estimator compared to previous approaches, where the estimation has been conducted separately on each individual ISI. The results obtained on real data show good agreement with classical regression methods.
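A minimal simulation sketch of the model structure (not the estimation method) is given below: an Ornstein-Uhlenbeck-type membrane equation is integrated by Euler-Maruyama, with the input parameter perturbed by an ISI-specific random effect so that the input variation splits into fast within-interval noise and a slow between-interval component. All parameter values are illustrative.

```python
# Euler-Maruyama simulation of a leaky integrate-and-fire membrane equation,
#   dV = (-V/tau + mu_i) dt + sigma dW,
# where the input mu_i = mu + eta_i carries an ISI-specific random effect.
import numpy as np

rng = np.random.default_rng(7)
tau, mu, sigma = 10.0, 2.0, 0.5       # illustrative values (ms, mV/ms, mV/sqrt(ms))
sd_eta = 0.2                           # sd of the slow (between-ISI) component
threshold, v_reset, dt = 12.0, 0.0, 0.1

isis = []
for _ in range(20):                    # each ISI: its own realisation of eta_i
    mu_i = mu + rng.normal(0.0, sd_eta)
    v, t = v_reset, 0.0
    while v < threshold and t < 2000.0:   # time cap as a safety stop
        v += (-v / tau + mu_i) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        t += dt
    isis.append(t)

print(f"mean ISI = {np.mean(isis):.1f} ms, sd = {np.std(isis):.1f} ms")
```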


Author(s): Jozef M. Zurada, Alan S. Levitan, Jian Guan

<p class="MsoNormal" style="text-align: justify; margin: 0in 0.5in 0pt;"><span style="font-size: 10pt;"><span style="font-family: Times New Roman;">Lack of precision is common in property value assessment. Recently non-conventional methods, such as neural networks based methods, have been introduced in property value assessment as an attempt to better address this lack of precision and uncertainty. Although fuzzy logic has been suggested as another possible solution, no other artificial intelligence methods have been applied to real estate value assessment other than neural network based methods. This paper presents the results of using two new non-conventional methods, fuzzy logic and memory-based reasoning, in evaluating residential property values for a real data set. The paper compares the results with those obtained using neural networks and multiple regression. Methods of feature reduction, such as principal component analysis and variable selection, have also been used for possible improvement of the final results.<span style="mso-spacerun: yes;">&nbsp; </span>The results indicate that no single one of the new methods is consistently superior for the given data set.</span></span></p>


2019, Vol XVI (2), pp. 1-11
Author(s): Farrukh Jamal, Hesham Mohammed Reyad, Soha Othman Ahmed, Muhammad Akbar Ali Shah, Emrah Altun

A new three-parameter continuous model, called the exponentiated half-logistic Lomax distribution, is introduced in this paper. Basic mathematical properties of the proposed model are investigated, including raw and incomplete moments, skewness, kurtosis, generating functions, Rényi entropy, Lorenz, Bonferroni, and Zenga curves, probability weighted moments, the stress-strength model, order statistics, and record statistics. The model parameters are estimated by the maximum likelihood criterion, and the behaviour of these estimates is examined through a simulation study. The applicability of the new model is illustrated by applying it to a real data set.
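The exponentiated half-logistic Lomax density is not reproduced in this summary, so the sketch below only shows the generic workflow implied: numerical maximisation of a log-likelihood followed by a small simulation study of the resulting estimates, carried out on the plain Lomax distribution available in SciPy as a stand-in.

```python
# Generic maximum-likelihood workflow: fit by numerical likelihood
# maximisation, then check bias/variance over repeated samples.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(9)
true_shape, true_scale = 2.5, 1.0

def neg_loglik(params, x):
    shape, scale = params
    if shape <= 0 or scale <= 0:
        return np.inf
    return -np.sum(stats.lomax.logpdf(x, shape, loc=0, scale=scale))

estimates = []
for _ in range(200):                            # small simulation study
    x = stats.lomax.rvs(true_shape, scale=true_scale, size=100, random_state=rng)
    res = optimize.minimize(neg_loglik, x0=[1.0, 1.0], args=(x,),
                            method="Nelder-Mead")
    estimates.append(res.x)

estimates = np.array(estimates)
print("mean estimates:", np.round(estimates.mean(axis=0), 3),
      " (true:", [true_shape, true_scale], ")")
print("sd of estimates:", np.round(estimates.std(axis=0), 3))
```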


Author(s): Parisa Torkaman

The generalized inverted exponential distribution is introduced as a lifetime model with good statistical properties. In this paper, the estimation of its probability density function and cumulative distribution function is considered using five different estimation methods: uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS), and percentile (PC) estimators. The performance of these estimation procedures is compared by numerical simulations based on the mean squared error (MSE). The simulation studies show that the UMVU estimator performs better than the others; when the sample size is large enough, the ML and UMVU estimators are almost equivalent and more efficient than the LS, WLS, and PC estimators. Finally, a real data set is analyzed.
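A minimal sketch of the maximum likelihood plug-in estimators of the density and distribution function is given below, using the parametrisation F(x) = 1 - (1 - e^(-λ/x))^α commonly associated with the generalized inverted exponential distribution (an assumption of this sketch); the UMVU, LS, WLS, and PC competitors are not shown.

```python
# ML plug-in estimators of the density f and distribution function F for the
# generalized inverted exponential model, with a small MSE check at one point.
import numpy as np
from scipy import optimize

rng = np.random.default_rng(10)
alpha_true, lam_true = 2.0, 1.5
x0 = 1.0                                         # point at which f and F are estimated

def cdf(x, a, lam):
    return 1.0 - (1.0 - np.exp(-lam / x)) ** a

def pdf(x, a, lam):
    return a * lam / x**2 * np.exp(-lam / x) * (1.0 - np.exp(-lam / x)) ** (a - 1)

def sample(n):
    u = rng.random(n)                            # inverse-transform sampling
    return -lam_true / np.log(1.0 - (1.0 - u) ** (1.0 / alpha_true))

def neg_loglik(p, x):
    a, lam = p
    if a <= 0 or lam <= 0:
        return np.inf
    return -np.sum(np.log(pdf(x, a, lam)))

err_f, err_F = [], []
for _ in range(200):
    x = sample(80)
    a_hat, lam_hat = optimize.minimize(neg_loglik, [1.0, 1.0], args=(x,),
                                       method="Nelder-Mead").x
    err_f.append((pdf(x0, a_hat, lam_hat) - pdf(x0, alpha_true, lam_true)) ** 2)
    err_F.append((cdf(x0, a_hat, lam_hat) - cdf(x0, alpha_true, lam_true)) ** 2)

print("MSE of ML plug-in estimates at x0=1:",
      "f:", round(np.mean(err_f), 5), " F:", round(np.mean(err_F), 5))
```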

