Solving High-Dimensional Problems in Statistical Modelling: A Comparative Study

Mathematics, 2021, Vol. 9 (15), pp. 1806
Author(s): Stamatis Choudalakis, Marilena Mitrouli, Athanasios Polychronou, Paraskevi Roupa

In this work, we present numerical methods for parameter estimation in high-dimensional statistical modelling. The solution of these problems is not unique, and a crucial question arises as to how a solution should be chosen. A common choice is the solution with minimum norm, but there are cases in which this solution is inadequate and regularisation techniques must be considered. We classify the specific cases for which regularisation is or is not required. We present a thorough comparison of existing methods, both for estimating the coefficients of models whose design matrices have correlated covariates and for variable selection in supersaturated designs. An extensive analysis of the properties of design matrices with correlated covariates is given, and numerical results for simulated and real data are presented.
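To make the distinction concrete, here is a minimal sketch (not from the paper) contrasting the minimum-norm least-squares solution with a ridge-regularised one on a simulated underdetermined problem; the penalty weight `lam` and all dimensions are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: in an underdetermined model y = X @ beta + noise with
# more covariates than observations, the least-squares solution is not unique.
# np.linalg.pinv returns the minimum-norm solution; ridge regularisation
# shrinks the estimate instead, which helps when covariates are correlated.
rng = np.random.default_rng(0)
n, p = 50, 200                       # fewer observations than covariates
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = 3.0                  # only a few active covariates
y = X @ beta_true + 0.1 * rng.normal(size=n)

beta_min_norm = np.linalg.pinv(X) @ y                             # minimum norm
lam = 1.0                                                         # assumed penalty
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)  # ridge

print("min-norm error:", np.linalg.norm(beta_min_norm - beta_true))
print("ridge error:   ", np.linalg.norm(beta_ridge - beta_true))
```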


2011, Vol. 23 (6), pp. 1605-1622
Author(s): Lingyan Ruan, Ming Yuan, Hui Zou

Finite Gaussian mixture models are widely used in statistics thanks to their great flexibility. However, parameter estimation for Gaussian mixture models with high dimensionality can be challenging because of the large number of parameters that need to be estimated. In this letter, we propose a penalized likelihood estimator to address this difficulty. The ℓ1-type penalty we impose on the inverse covariance matrices encourages sparsity in their entries and therefore helps reduce the effective dimensionality of the problem. We show that the proposed estimate can be efficiently computed using an expectation-maximization algorithm. To illustrate the practical merits of the proposed method, we consider its applications in model-based clustering and mixture discriminant analysis. Numerical experiments with both simulated and real data show that the new method is a valuable tool for high-dimensional data analysis.
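As a rough illustration of the approach described, the sketch below runs a penalised EM for a Gaussian mixture in which the M-step covariance update is replaced by a graphical-lasso fit, so the estimated inverse covariances come out sparse. The component count `K`, the penalty weight `alpha`, and the initialisation are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.covariance import graphical_lasso

def penalized_em(X, K=2, alpha=0.1, n_iter=25, seed=0):
    """Sketch of a penalised EM: the l1 penalty enters through the
    graphical-lasso covariance update in the M-step."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(n, K, replace=False)]           # random initial means
    cov = np.array([np.cov(X.T) + 1e-3 * np.eye(p) for _ in range(K)])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        dens = np.column_stack([
            pi[k] * multivariate_normal.pdf(X, mu[k], cov[k]) for k in range(K)
        ])
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: weighted means, then sparse precisions via graphical lasso.
        nk = resp.sum(axis=0)
        pi = nk / n
        for k in range(K):
            mu[k] = resp[:, k] @ X / nk[k]
            diff = X - mu[k]
            emp_cov = (resp[:, k, None] * diff).T @ diff / nk[k]
            cov[k], _ = graphical_lasso(emp_cov, alpha=alpha)
    return pi, mu, cov

# Usage: two well-separated clusters in 5 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 5)), rng.normal(4, 1, (100, 5))])
pi, mu, cov = penalized_em(X, K=2, alpha=0.1)
```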





2017, Vol. 11 (1), pp. 2-15
Author(s): René Michel, Igor Schnakenburg, Tobias von Martens

Purpose – This paper addresses the effective selection of customers for direct marketing campaigns. It introduces a new method to forecast campaign-related uplifts (also known as incremental response modeling or net scoring). By means of these uplifts, only the most responsive customers are targeted by a campaign. The paper also calculates the financial impact of the new approach compared with classical (gross) scoring methods.

Design/methodology/approach – First, gross and net scoring approaches to customer selection for direct marketing campaigns are compared. It is then shown how net scoring can be applied in practice with regard to different strategic objectives. Next, a new statistic for net scoring based on decision trees is developed. Finally, a business case based on real data from the financial sector is calculated to compare gross and net scoring approaches.

Findings – Whereas gross scoring focuses on customers with a high probability of purchase regardless of whether they are targeted by a campaign, net scoring identifies those customers who are most responsive to campaigns. A common scoring procedure, decision trees, can be enhanced by the new statistic to forecast these campaign-related uplifts. The business case shows that the choice of scoring method has a relevant impact on economic indicators.

Practical implications – The contribution of net scoring to campaign effectiveness and efficiency is shown by the business case. The paper also suggests a framework for customer selection under given strategic objectives, e.g. minimizing costs or maximizing (gross or lift) added value, and presents a new statistic that can be applied to common scoring procedures.

Originality/value – Despite its leverage on the effectiveness of marketing campaigns, only a few contributions have addressed net scoring so far. The new χ²-statistic is a straightforward approach to enhancing decision trees for net scoring. Furthermore, this paper is the first to apply net scoring with regard to different strategic objectives.
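The paper's exact χ²-statistic is not reproduced here, but the sketch below illustrates the general idea of a divergence-based uplift split criterion for decision trees: a candidate split is scored by how strongly the treatment and control response rates diverge in the child nodes. The chi-squared divergence used is a generic choice from the uplift literature, and all function names are hypothetical.

```python
import numpy as np

def chi2_divergence(y_treat, y_ctrl, eps=1e-9):
    """Chi-squared divergence between the Bernoulli response distributions
    of treated and control customers in a node."""
    p_t = y_treat.mean() if y_treat.size else 0.0   # treated response rate
    p_c = y_ctrl.mean() if y_ctrl.size else 0.0     # control response rate
    return (p_t - p_c) ** 2 / (p_c + eps) + (p_t - p_c) ** 2 / (1.0 - p_c + eps)

def split_gain(y, treated, mask):
    """Score a candidate split (boolean `mask`): weighted child divergence
    minus parent divergence. Higher gain means the split separates
    responsive from unresponsive customers more sharply."""
    gain = -chi2_divergence(y[treated], y[~treated])
    for side in (mask, ~mask):
        gain += side.mean() * chi2_divergence(y[side & treated],
                                              y[side & ~treated])
    return gain

# Tiny usage example with simulated purchases (1) and non-purchases (0).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)       # observed responses
treated = rng.random(200) < 0.5        # campaign treatment flags
mask = rng.random(200) < 0.5           # candidate split, e.g. age < 40
print("split gain:", split_gain(y, treated, mask))
```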



2017, Vol. 2 (1)
Author(s): Lisa Parola

This essay stems from the primary need to bring order to the direct and indirect sources available for reconstructing the history of video art in Italy in the seventies. During the research for the author's Ph.D. thesis, it became clear that in most cases it is difficult to establish, in terms of facts, which of the different historiographies should be relied upon to deepen the study of video art in Italy. Beyond legitimate differences of perspective and method, the historiographical narratives all share similar issues and a similar narrative structure. The first aim of the essay is therefore to compare the different historiographic narratives on Italian video art of the seventies, verifying their genealogy, the sources used and the accuracy of the narrated facts. For the selection of the corpus, it was decided to analyse in particular monographic volumes dealing with the history of the origins of video art in Italy, in order to cover a wide range of types of "narration", as in the case of the contemporary art and architecture magazines examined in the second part of the essay. After this selection, for an analytical and comparative study of the various historiographies, the essay focuses on the Terza Biennale Internazionale della Giovane Pittura. Gennaio '70. Comportamenti, oggetti e mediazioni (Third International Biennial of Young Painting. January '70. Behaviors, Objects and Mediations, Bologna, 1970), curated by Renato Barilli, Tommaso Trini, Andrea Emiliani and Maurizio Calvesi: the exhibition which, after Lucio Fontana's pioneering experiments, is said to mark the arrival of videotape (then called "videorecording") in Italy. The account given so far of this exhibition appears more mythological than historical, and is structurally comparable to the many beginnings that historiographies of international video art identify as 'first' and 'generative'. In the first part of the essay, the 'facts' related to Gennaio '70 as narrated by the historiography of video art are compared. In the second part, the survey is carried out through some of the direct sources identified during the research, with the aim of answering the questions raised by the comparison between historiographies. In conclusion, it is important to underline that the tapes containing the videos shown have not been found and appear to have been lost since the end of the exhibition. Nevertheless, a deeper study of the works and documentation presented during the exhibition remains possible thanks to other types of sources, which provide much valuable information about video techniques and practices in Italy at the beginning of the 1970s.



2021
Author(s): Lajos Horváth, Zhenya Liu, Gregory Rice, Yuqian Zhao

The problem of detecting change points in the mean of high-dimensional panel data with potentially strong cross-sectional dependence is considered. Under the assumption that the cross-sectional dependence is captured by an unknown number of common factors, a new CUSUM-type statistic is proposed. We derive its asymptotic properties under three scenarios, depending on the extent to which the common factors are asymptotically dominant. With panel data consisting of N cross-sectional time series of length T, the asymptotic results hold under the mild assumption that min{N, T} → ∞, with an otherwise arbitrary relationship between N and T, allowing the results to apply to most panel data examples. Bootstrap procedures are proposed to approximate the sampling distribution of the test statistics. A Monte Carlo simulation study shows that our test outperforms several existing tests in finite samples in a number of cases, particularly when N is much larger than T. The practical application of the proposed results is demonstrated with real data applications to detecting and estimating change points in the high-dimensional FRED-MD macroeconomic data set.
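As a hedged sketch of the general idea (not the authors' statistic), the code below removes estimated common factors from an N × T panel via a rank-r SVD and then scans a CUSUM of the defactored cross-sectional average for a mean change. The factor count `n_factors` is an assumed tuning choice, and no long-run variance normalisation is applied, which a proper test would require.

```python
import numpy as np

def cusum_change_point(panel, n_factors=1):
    """Scan an N x T panel for a mean change after removing common factors.
    Returns the maximal CUSUM value and the time index at which it occurs."""
    N, T = panel.shape
    centred = panel - panel.mean(axis=1, keepdims=True)
    # Estimate the common component with a rank-r SVD and strip it out.
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    common = (U[:, :n_factors] * s[:n_factors]) @ Vt[:n_factors]
    resid = centred - common
    x = resid.mean(axis=0)                 # defactored cross-sectional average
    csum = np.cumsum(x)
    t = np.arange(1, T + 1)
    # Plain CUSUM; a proper test would also normalise by a long-run variance.
    stat = np.abs(csum - (t / T) * csum[-1]) / np.sqrt(T)
    k = int(stat.argmax())
    return stat[k], k + 1

# Usage: a panel with a mean shift halfway through each series.
rng = np.random.default_rng(0)
N, T = 100, 200
panel = rng.normal(size=(N, T))
panel[:, T // 2:] += 0.5                  # change point at t = 100
print(cusum_change_point(panel))
```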



2018, Vol. 30 (12), pp. 3281-3308
Author(s): Hong Zhu, Li-Zhi Liao, Michael K. Ng

We study a multi-instance (MI) learning dimensionality-reduction algorithm based on sparsity and orthogonality, which is especially useful for high-dimensional MI data sets. We develop a novel algorithm to handle both sparsity and orthogonality constraints, which existing methods do not handle well simultaneously. Our main idea is to formulate an optimization problem in which the sparse term appears in the objective function and orthogonality is imposed as a constraint. The resulting problem can be solved using approximate augmented Lagrangian iterations as the outer loop and inertial proximal alternating linearized minimization (iPALM) iterations as the inner loop. The main advantage of this method is that both sparsity and orthogonality can be satisfied simultaneously. We show the global convergence of the proposed iterative algorithm and demonstrate that it achieves the high sparsity and orthogonality requirements that are important for dimensionality reduction. Experimental results on both synthetic and real data sets show that the proposed algorithm obtains learning performance comparable to that of other tested MI learning algorithms.
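The toy sketch below is not the authors' iPALM scheme; it merely illustrates why sparsity and orthogonality are hard to satisfy at once, the difficulty their formulation is designed to overcome: soft-thresholding (the ℓ1 proximal step) produces zeros, but a subsequent polar-decomposition retraction onto the orthogonality constraint destroys them.

```python
import numpy as np

def soft_threshold(W, tau):
    """Proximal operator of the l1 norm: shrinks entries toward zero."""
    return np.sign(W) * np.maximum(np.abs(W) - tau, 0.0)

def polar_retraction(W):
    """Nearest matrix with orthonormal columns, via the polar decomposition."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt

# Demonstration: thresholding creates zeros, retraction wipes them out again.
rng = np.random.default_rng(1)
W = rng.normal(size=(10, 3))
W_sparse = soft_threshold(W, 0.8)         # tau = 0.8 is an arbitrary choice
W_orth = polar_retraction(W_sparse)
print("zeros after thresholding:", np.sum(W_sparse == 0))
print("zeros after retraction:  ", np.sum(W_orth == 0))
print("orthogonality error:", np.linalg.norm(W_orth.T @ W_orth - np.eye(3)))
```

Handling the sparse term in the objective and the orthogonality as a constraint, as the abstract describes, avoids forcing one property to be re-imposed after the other at every step.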


