Assessing the Multimodality of a Multivariate Distribution Using Nonparametric Techniques

COMPSTAT ◽  
1998 ◽  
pp. 329-334
Author(s):  
S. Hahn ◽  
P. J. Foster

2015 ◽  
Vol 128 ◽  
pp. 1-14 ◽  
Author(s):  
Andre Carlos Bertolini ◽  
Célio Maschio ◽  
Denis Jose Schiozer

Author(s):  
Claudia Angelini ◽  
Daniela De Canditiis ◽  
Margherita Mutarelli ◽  
Marianna Pensky

The objective of the present paper is to develop a truly functional Bayesian method specifically designed for time-series microarray data. The method allows one to identify differentially expressed genes in a time-course microarray experiment, to rank them and to estimate their expression profiles. Each gene expression profile is modeled as an expansion over an orthonormal basis, where the coefficients and the number of basis functions are estimated from the data. The proposed procedure deals successfully with various technical difficulties that arise in typical microarray experiments, such as a small number of observations, non-uniform sampling intervals and missing or replicated data. The procedure accounts for various types of errors and offers a good compromise between nonparametric techniques and techniques based on normality assumptions. In addition, all evaluations are performed using analytic expressions, so the entire procedure requires very little computational effort. The procedure is studied using both simulated and real data and is compared with recent competing approaches. Finally, the procedure is applied to a case study of a human breast cancer cell line stimulated with estrogen. We succeeded in finding new significant genes that were not identified in earlier work on the same dataset.
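The core of the method above is representing each expression profile as a finite basis expansion whose size is chosen from the data. The following Python sketch, which is only an illustration and not the authors' Bayesian estimator, fits a Legendre-basis expansion to one irregularly sampled profile by least squares and chooses the number of basis functions by BIC; the function names, basis choice and selection rule are assumptions made for the example.

# Minimal sketch (not the paper's Bayesian procedure): a gene expression
# profile observed at irregular time points is represented as an expansion
# over a Legendre basis, with coefficients fit by least squares and the
# number of basis functions chosen by BIC. All names are illustrative.
import numpy as np

def legendre_design(t, n_basis):
    """Design matrix of Legendre polynomials on times rescaled to [-1, 1]."""
    x = 2 * (t - t.min()) / (t.max() - t.min()) - 1
    return np.polynomial.legendre.legvander(x, n_basis - 1)

def fit_profile(t, y, max_basis=6):
    """Pick the expansion size by BIC; return size, coefficients, fitted values."""
    best = None
    for k in range(1, max_basis + 1):
        B = legendre_design(t, k)
        coef, *_ = np.linalg.lstsq(B, y, rcond=None)
        resid = y - B @ coef
        sigma2 = max(np.mean(resid ** 2), 1e-12)
        bic = len(y) * np.log(sigma2) + k * np.log(len(y))
        if best is None or bic < best[0]:
            best = (bic, k, coef)
    _, k, coef = best
    return k, coef, legendre_design(t, k) @ coef

# Irregularly sampled toy profile with noise and non-uniform time points
t = np.array([0.0, 1.0, 2.5, 4.0, 6.0, 9.0, 12.0])
y = np.sin(t / 4.0) + 0.1 * np.random.default_rng(0).standard_normal(t.size)
k, coef, fitted = fit_profile(t, y)
print(k, np.round(coef, 3))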


Symmetry ◽  
2009 ◽  
Vol 1 (2) ◽  
pp. 180-200 ◽  
Author(s):  
Lyudmila Sakhanenko

2019 ◽  
Vol 20 (1) ◽  
pp. 108-118 ◽  
Author(s):  
Wiam Elleuch ◽  
Ali Wali ◽  
Adel M. Alimi

ABSTRACT: The prediction of accurate traffic information such as speed, travel time, and congestion state is a very important task in many Intelligent Transportation Systems (ITS) applications. However, the dynamic changes in traffic conditions make this task harder. In fact, the type of road, such as freeways and highways in urban regions, can influence the driving speeds and the congestion state of the corresponding road. In this paper, we present an NN-based model to predict the congestion state of roads. Our model handles new inputs and distinguishes the dynamic traffic patterns of two different types of roads: highways and freeways. The model has been tested using a large GPS database gathered from vehicles circulating in Tunisia. The NN-based model has shown its capability to detect the nonlinearity of dynamic changes and the different patterns of roads compared with other nonparametric techniques from the literature.
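As a rough illustration of the kind of model described (and not the authors' network, which was trained on the Tunisian GPS database), the Python sketch below fits a small feed-forward neural network that classifies a road segment as congested or not from a speed feature, a road-type indicator and the hour of day; the features, architecture and synthetic data are assumptions made for the example.

# Minimal sketch (not the paper's model): a feed-forward neural network that
# classifies the congestion state of a road segment from speed-related
# features plus a road-type indicator (freeway vs. urban highway).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 2000
road_type = rng.integers(0, 2, n)              # 0 = freeway, 1 = urban highway
free_flow = np.where(road_type == 0, 110, 70)  # rough free-flow speeds, km/h
speed = free_flow * rng.uniform(0.2, 1.0, n)   # observed mean GPS speed
hour = rng.integers(0, 24, n)
congested = (speed / free_flow < 0.5).astype(int)  # toy congestion label

X = np.column_stack([speed, road_type, hour])
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0),
)
model.fit(X, congested)
print(model.predict([[25.0, 1, 8]]))  # slow speed on an urban highway at 8 am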


2007 ◽  
Author(s):  
Αριστείδης Νικολουλόπουλος

Studying associations among multivariate outcomes is an interesting problem in statistical science. The dependence between random variables is completely described by their multivariate distribution. When the multivariate distribution has a simple form, standard methods can be used to make inference. On the other hand, one may create multivariate distributions based on particular assumptions, thus limiting their use. Unfortunately, these limitations occur very often when working with multivariate discrete distributions. Some multivariate discrete distributions used in practice have only certain properties; for example, they allow only for positive dependence or they can have marginal distributions of a given form. Copulas seem to be a promising solution to this problem. Copulas are a currently fashionable way to model multivariate data, as they account for the dependence structure and provide a flexible representation of the multivariate distribution. Furthermore, with copulas the dependence properties can be separated from the marginal properties, and multivariate models with marginal densities of arbitrary form can be constructed, allowing a wide range of possible association structures. In fact, they allow for flexible dependence modelling, different from assuming simple linear correlation structures. However, when copulas are applied to discrete data the marginal parameters also affect the dependence structure, and hence the dependence properties are not fully separated from the marginal properties.

Introducing covariates to describe the dependence by modelling the copula parameters is of special interest in this thesis. Covariate information can describe the dependence either indirectly through the marginal parameters or directly through the parameters of the copula. We examine the case where covariates are used in the marginal and/or copula parameters, aiming at a highly flexible model producing very elegant dependence structures. Furthermore, the literature contains many theoretical results and families of copulas with several properties, but there are few papers that compare copula families and discuss model selection among candidate copula models. This leaves open the question of which copulas are appropriate and whether we are able, from real data, to select the true copula that generated the data among a series of candidates with, perhaps, very similar dependence properties. We examined a large set of candidate copula families, taking into account properties such as concordance and tail dependence. The comparison is made theoretically using Kullback-Leibler distances between them. We selected this distance because it has a close relationship with the log-likelihood and can therefore provide interesting insight into the likelihood-based procedures used in practice. Furthermore, a goodness-of-fit test based on the Mahalanobis distance, computed through a parametric bootstrap, is provided. Moreover, we adopt a model-averaging approach to copula modelling, based on the non-parametric bootstrap. Our intention is not to underestimate variability but to add the additional variability induced by model selection, making the precision of the estimate unconditional on the selected model. Our estimates are thus synthesized from several different candidate copula models and can have a flexible dependence structure.
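To make the Kullback-Leibler comparison concrete, the Python sketch below estimates the KL distance between two candidate copulas by Monte Carlo, using two bivariate Gaussian copulas with different correlation parameters as stand-in candidates; the chosen families, parameter values and sample size are assumptions for illustration and not the thesis's actual candidate set.

# Minimal sketch of comparing two candidate copulas by a Monte Carlo
# Kullback-Leibler distance. The candidates here are two bivariate Gaussian
# copulas with different correlations, chosen only for illustration.
import numpy as np
from scipy.stats import norm

def gaussian_copula_logpdf(u, rho):
    """Log-density of the bivariate Gaussian copula at points u in (0, 1)^2."""
    x = norm.ppf(u)
    q = rho**2 * (x[:, 0]**2 + x[:, 1]**2) - 2 * rho * x[:, 0] * x[:, 1]
    return -0.5 * np.log(1 - rho**2) - q / (2 * (1 - rho**2))

def sample_gaussian_copula(rho, n, rng):
    """Draws from the Gaussian copula: bivariate normal pushed through norm.cdf."""
    cov = [[1.0, rho], [rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    return norm.cdf(z)

rng = np.random.default_rng(0)
u = sample_gaussian_copula(0.6, 100_000, rng)            # draws from copula c1
kl = np.mean(gaussian_copula_logpdf(u, 0.6) - gaussian_copula_logpdf(u, 0.3))
print(f"Estimated KL(c_0.6 || c_0.3) = {kl:.4f}")        # positive; grows with the gap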
Taking into consideration the extensive literature on copulas for multivariate continuous data, we concentrated our interest on fitting copulas to multivariate discrete data. Applications of multivariate copula models for discrete data are limited. Usually one has to trade off between models with limited dependence (e.g. only positive association) and models with flexible dependence but computational intractabilities. For example, elliptical copulas provide a wide range of flexible dependence but do not have closed-form cumulative distribution functions, so one needs to evaluate the multivariate copula, and hence a multivariate integral, repeatedly a large number of times. This can be time consuming and, because of the numerical approach used to evaluate the multivariate integral, may also introduce round-off errors. On the other hand, multivariate Archimedean copulas, partially symmetric m-variate copulas with m − 1 dependence parameters, and copulas that are mixtures of max-infinitely divisible bivariate copulas have closed-form cumulative distribution functions, so computations are easy, but they allow only positive dependence among the random variables. A bridge between these two problems might be a copula family whose distribution function has a simple form while allowing for negative dependence among the variables. We define such a multivariate copula family by exploiting finite mixtures of uncorrelated normal distributions. Since the correlation vanishes within each component, the corresponding cumulative distribution function is simply a product of univariate normal cumulative distribution functions; the mixing operation then introduces dependence. Hence we obtain a kind of flexible dependence that also allows for negative dependence.
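The following Python sketch illustrates the key point of that construction under assumed parameters: a two-component mixture of bivariate normals that are uncorrelated within each component has a joint CDF that is a weighted sum of products of univariate normal CDFs, yet the mixing induces clearly negative dependence. It stops short of the full copula (the margins are not standardized), and the component means and weights are illustrative.

# Minimal sketch of the construction in the last paragraph: a bivariate
# mixture of normals, uncorrelated within each component, so its CDF is a
# weighted sum of products of univariate normal CDFs, while mixing induces
# (here negative) dependence. Parameters are illustrative assumptions.
import numpy as np
from scipy.stats import norm, kendalltau

weights = [0.5, 0.5]
means = [(2.0, -2.0), (-2.0, 2.0)]   # opposite shifts => negative dependence

def mixture_cdf(x, y):
    """Joint CDF: mixture of products of univariate normal CDFs."""
    return sum(w * norm.cdf(x, m1) * norm.cdf(y, m2)
               for w, (m1, m2) in zip(weights, means))

rng = np.random.default_rng(0)
comp = rng.choice(len(weights), size=50_000, p=weights)
mu = np.array(means)[comp]
sample = mu + rng.standard_normal((50_000, 2))   # independent noise within components

tau, _ = kendalltau(sample[:, 0], sample[:, 1])
print(f"Kendall's tau: {tau:.3f}")               # clearly negative
print(f"F(0, 0) = {mixture_cdf(0.0, 0.0):.3f}")  # closed-form joint CDF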


2016 ◽  
Vol 4 (1) ◽  
pp. 54-66
Author(s):  
Nguyen Khanh ◽  
Jimin Lee ◽  
Susan Reiser ◽  
Donna Parsons ◽  
Sara Russell ◽  
...  

A Methodology for Appropriate Testing When Data is Heterogeneous was originally published and copyrighted in the mid-1990s, written in Turbo Pascal for a 16-bit operating system. While working on an ergonomics dissertation (Yearout, 1987), the author determined that the perceptual lighting preference data was heterogeneous and not normal. Drs. Milliken and Johnson, the authors of Analysis of Messy Data Volume I: Designed Experiments (1989), advised that Satterthwaite's approximation with Bonferroni's adjustment, which corrects for pairwise error, be used to analyze the heterogeneous data. This technique of applying linear combinations with adjusted degrees of freedom allowed the use of t-table criteria to make group comparisons without resorting to standard nonparametric techniques. Thus data with unequal variances and unequal sample sizes could be analyzed without losing valuable information. Variances raised to the 4th power were so large that they could not be re-entered into basic calculators. The solution was to develop an original software package, written in Turbo Pascal and distributed on a 7 ¼ inch disk for a 16-bit operating system. Current 32- and 64-bit operating systems and more efficient programming languages have made that software obsolete and unusable; using the old system could result in many incorrect returns or in the system terminating. The purpose of this research was to develop a spreadsheet algorithm with multiple interactive EXCEL worksheets that efficiently applies Satterthwaite's approximation with Bonferroni's adjustment to solve the messy-data problem. To ensure that the pedagogy is accurate, the resulting package was successfully tested in the classroom with academically diverse students. A comparison between this technique and EXCEL's Analysis ToolPak Add-In for a t-test Two-Sample Assuming Unequal Variances was conducted using several different data sets; the EXCEL Add-In returned incorrect significant differences. Engineers, ergonomists, psychologists, and social scientists will find the developed program very useful. A major benefit is that spreadsheets will remain current regardless of evolving operating systems.
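For readers who want the statistical core of the procedure without the spreadsheet, the Python sketch below runs pairwise Welch t-tests, whose degrees of freedom come from Satterthwaite's approximation, at a Bonferroni-adjusted significance level; the group data are synthetic and the sketch is not the published EXCEL package.

# Minimal sketch of the analysis the abstract describes: pairwise Welch
# (Satterthwaite-approximation) t-tests on groups with unequal variances and
# sample sizes, with a Bonferroni adjustment for the number of comparisons.
from itertools import combinations
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
groups = {
    "A": rng.normal(10.0, 1.0, 12),   # small variance, n = 12
    "B": rng.normal(11.5, 4.0, 20),   # large variance, n = 20
    "C": rng.normal(10.2, 2.5, 8),    # unequal n and variance
}

pairs = list(combinations(groups, 2))
alpha = 0.05 / len(pairs)             # Bonferroni-adjusted per-comparison level
for g1, g2 in pairs:
    x, y = groups[g1], groups[g2]
    t, p = stats.ttest_ind(x, y, equal_var=False)   # Welch test, Satterthwaite df
    # Satterthwaite degrees of freedom, computed explicitly for reference
    v1, v2 = x.var(ddof=1) / len(x), y.var(ddof=1) / len(y)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x) - 1) + v2 ** 2 / (len(y) - 1))
    verdict = "significant" if p < alpha else "not significant"
    print(f"{g1} vs {g2}: t = {t:.2f}, df = {df:.1f}, p = {p:.4f} -> {verdict}")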

