Tabu Search for Variable Selection in Classification

Author(s): Silvia Casado Yusta, Joaquín Pacheco Bonrostro

Variable selection plays an important role in classification. When many variables are involved, only those that are really required should be selected before designing a classification method. There are many reasons for selecting only a subset of the variables instead of the whole set of candidates (Reunanen, 2003): (1) it is cheaper to measure only a reduced set of variables; (2) prediction accuracy may be improved by excluding redundant and irrelevant variables; (3) the predictor built from fewer input variables is usually simpler and potentially faster; and (4) knowing which variables are relevant gives insight into the nature of the prediction problem and allows a better understanding of the final classification model. The importance of variable selection before applying classification methods is also pointed out in recent works such as Cai et al. (2007) and Rao and Lakshminarayanan (2007).

The aim in the classification problem is to classify instances that are characterized by attributes or variables: based on a set of examples whose class is known, a set of rules is designed and generalised to classify the instances with the greatest precision possible. There are several methodologies for dealing with this problem: classic discriminant analysis, logistic regression, neural networks, decision trees, instance-based learning, etc. Linear discriminant analysis and logistic regression search for linear functions and then use them for classification purposes, and they remain interesting methodologies.

In this work a new "ad hoc" method for variable selection in classification, specifically in discriminant analysis and logistic regression, is analysed. The method is based on the tabu search metaheuristic and, as shown below, yields better results than the classic methods (stepwise, backward and forward) used by statistical packages such as SPSS or BMDP. The method is designed for two-class problems.
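The abstract does not reproduce the algorithm itself, but a minimal sketch of tabu search for variable selection in this spirit, wrapping a linear discriminant classifier and scoring subsets by cross-validated accuracy, might look as follows; the one-flip neighbourhood, tabu tenure, and scoring function are illustrative assumptions rather than the authors' exact design.

```python
# Minimal sketch of tabu search for variable selection, not the authors'
# exact algorithm: neighbourhood, tenure, and scoring are assumptions.
import numpy as np
from collections import deque
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def score(X, y, mask):
    """Cross-validated accuracy of LDA on the selected variables."""
    if not mask.any():
        return 0.0
    lda = LinearDiscriminantAnalysis()
    return cross_val_score(lda, X[:, mask], y, cv=5).mean()

def tabu_select(X, y, n_iter=100, tenure=7, seed=0):
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    current = rng.random(p) < 0.5            # random initial subset
    best, best_val = current.copy(), score(X, y, current)
    tabu = deque(maxlen=tenure)              # recently flipped variables
    for _ in range(n_iter):
        cand_val, cand_j = -np.inf, None
        for j in range(p):                   # neighbourhood: flip one variable
            trial = current.copy()
            trial[j] = ~trial[j]
            v = score(X, y, trial)
            # tabu moves are skipped unless they beat the best (aspiration)
            if j in tabu and v <= best_val:
                continue
            if v > cand_val:
                cand_val, cand_j = v, j
        if cand_j is None:                   # every move tabu, none aspires
            break
        current[cand_j] = ~current[cand_j]   # take the best admissible move
        tabu.append(cand_j)
        if cand_val > best_val:
            best, best_val = current.copy(), cand_val
    return best, best_val
```

Stepwise, backward and forward selection stop at the first local optimum they reach; the tabu list is what lets this search leave such optima, since the best non-tabu move is taken even when it worsens the current score.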

1978, Vol. 15 (1), pp. 103-112
Author(s): William R. Dillon, Matthew Goldstein, Leon G. Schiffman

Buyer usage behavior data are used to compare the relative performance of linear discriminant analysis and several multinomial classification methods. The potential shortcomings of each of the procedures investigated are cited, and a new method for determining the contribution of a variable to discrimination in the multinomial classification problem is also presented.


2014, Vol. 6 (22), pp. 9037-9044
Author(s): Meilan Ouyang, Zhimin Zhang, Chen Chen, Xinbo Liu, Yizeng Liang

A new method performs classification and variable selection simultaneously to analyze complicated metabolomics datasets.


2020, Vol. 30 (1)
Author(s): Michael O. Olusola, Sydney I. Onyeagu

This paper is centred on a binary classification problem in which a new object with multivariate features is to be assigned to one of two distinct populations, based on historical samples from the two populations. A linear discriminant analysis framework, called minimised sum of deviations by proportion (MSDP), is proposed to model the binary classification problem. In the MSDP formulation, the sum of the proportions of exterior deviations is minimised subject to the group separation constraints, the normalisation constraint, the upper-bound constraints on the proportions of exterior deviations, and the sign-unrestriction and non-negativity constraints. The two-phase method of linear programming is adopted as the solution technique for generating the discriminant function, and the decision rule for group-membership prediction is constructed using the apparent error rate. The performance of MSDP is compared with some existing linear discriminant models on a previously published dataset on road casualties; the MSDP model proved more promising and well suited to this imbalanced dataset.
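The abstract does not give the full MSDP constraint set, so the sketch below implements the classical minimise-sum-of-deviations (MSD) discriminant from the same linear-programming family: exterior deviations are penalised, a gap parameter separates the groups, and a normalisation constraint rules out the trivial zero solution. The proportional deviations and upper-bound constraints specific to MSDP are omitted.

```python
# Sketch of a classical MSD linear-programming discriminant, a simpler
# relative of MSDP; constraint choices here are illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

def msd_discriminant(XA, XB, eps=1.0):
    """Fit w, c so that w.x <= c for group A and w.x >= c + eps for
    group B, minimising the total exterior deviation."""
    nA, p = XA.shape
    nB = XB.shape[0]
    n = nA + nB
    # decision vector: [w (p, free), c (free), d (n, >= 0)]
    cost = np.concatenate([np.zeros(p + 1), np.ones(n)])
    A_ub = np.zeros((n, p + 1 + n))
    b_ub = np.zeros(n)
    A_ub[:nA, :p] = XA        # group A:  w.x_i - c - d_i <= 0
    A_ub[:nA, p] = -1.0
    A_ub[nA:, :p] = -XB       # group B: -w.x_j + c - d_j <= -eps
    A_ub[nA:, p] = 1.0
    b_ub[nA:] = -eps
    A_ub[np.arange(n), p + 1 + np.arange(n)] = -1.0
    # normalisation w.(mean_B - mean_A) = 1 rules out the trivial solution
    A_eq = np.zeros((1, p + 1 + n))
    A_eq[0, :p] = XB.mean(axis=0) - XA.mean(axis=0)
    b_eq = np.array([1.0])
    bounds = [(None, None)] * (p + 1) + [(0, None)] * n
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p], res.x[p]   # weights w and cutoff c

# classify a new object x as group B if w @ x > c + eps / 2, else group A
```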


2019, Vol. 2019, pp. 1-13
Author(s): Yu Cheng, Xu Chen, Xiaohua Ding, Linting Zeng

Car-sharing is becoming an increasingly popular travel mode in China, and many companies, including vehicle enterprises and Internet companies, invest heavily in it. In the early development of their business, however, most of them site car-sharing stations from experience, or essentially at random wherever parking space is available. This results in many stations with low operational efficiency and causes capital loss. This study uses different data sources, together with statistical models and machine learning algorithms, to help car-sharing operators choose optimal locations for new stations and adjust the locations of existing ones. We select Chengdu, which has a huge car-sharing travel demand and several large car-sharing operators, as the research area, and two main operators as the research objects. Instead of focusing on buffers generated around stations, Chengdu is divided into 58,724 square grid cells of 0.5 km × 0.5 km each. We seek a model that estimates a potential travel demand value for each cell from three data sources: order data, population data, and Point of Interest (POI) data. The problem is transformed into a binary form and five methods are implemented: Logistic Regression, Logistic Regression with LASSO, Naive Bayes, Linear Discriminant Analysis, and Quadratic Discriminant Analysis. The optimal model, Logistic Regression with LASSO, is chosen to estimate the probability that demand exists in each cell. Using car-sharing order data from the different operators, an existing order heat value is also computed for each cell. We then analyse and classify all cells into four groups and, for each group, give different suggestions on the optimal location of stations. This study focuses on a more competitive market, identifies the factors that influence order numbers, and gives siting suggestions that take competitors into account. We hope that our research can help operators improve their business and make rational plans.
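As a rough illustration of the selected model, the sketch below fits an L1-penalised (LASSO) logistic regression to grid-level features and estimates a demand probability per cell; the synthetic features are hypothetical stand-ins for the paper's population and POI predictors, not the actual Chengdu data.

```python
# Sketch of the grid-level binary demand model: LASSO-penalised logistic
# regression. Features and labels below are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# X: one row per 0.5 km x 0.5 km grid cell (population, POI counts, ...);
# y: 1 if any car-sharing order originated in the cell, 0 otherwise
rng = np.random.default_rng(0)
X = rng.poisson(5.0, size=(1000, 8)).astype(float)     # placeholder features
y = (X[:, 0] + X[:, 1] + rng.normal(0, 2, 1000) > 10).astype(int)

model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
)
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
model.fit(X, y)
# probability of demand for every grid cell, used to rank candidate stations
p_demand = model.predict_proba(X)[:, 1]
```

The L1 penalty drives the coefficients of uninformative features to exactly zero, which is what lets the fitted model double as a variable-selection step over the population and POI predictors.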


2018, Vol. 33 (3), pp. 799-811
Author(s): John A. Knaff, Charles R. Sampson, Kate D. Musgrave

Abstract: This work describes tropical cyclone rapid intensification forecast aids designed for the western North Pacific tropical cyclone basin and for use at the Joint Typhoon Warning Center. Two statistical methods, linear discriminant analysis and logistic regression, are used to create probabilistic forecasts for seven intensification thresholds: 25-, 30-, 35-, and 40-kt changes in 24 h, 45- and 55-kt changes in 36 h, and a 70-kt change in 48 h (1 kt = 0.514 m s⁻¹). These forecast probabilities are further combined into an equally weighted probability consensus, which triggers deterministic forecasts equal to the intensification thresholds once the consensus probability reaches 40%. These deterministic forecasts are incorporated as additional members into an operational intensity consensus forecast, resulting in an improved intensity consensus for these important and difficult-to-predict cases. The methods are developed on the 2000–15 typhoon seasons, and independent performance is assessed using the 2016 and 2017 typhoon seasons. In many cases the probabilities have skill relative to climatology, and adding the rapid intensification deterministic aids to the operational intensity consensus significantly reduces its negative forecast biases.
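The consensus logic is simple enough to state directly; the sketch below assumes only what the abstract gives, namely two member probabilities per threshold, an equal-weight average, and the 40% trigger, and the probability values in the example are illustrative placeholders rather than operational output.

```python
# Sketch of the equally weighted probability consensus and its 40%
# deterministic trigger; example probabilities are placeholders.
RI_THRESHOLDS = [(25, 24), (30, 24), (35, 24), (40, 24),
                 (45, 36), (55, 36), (70, 48)]  # (kt change, lead time in h)

def ri_consensus(p_lda, p_logistic):
    """Equal-weight average of the two statistical aids' probabilities."""
    return 0.5 * (p_lda + p_logistic)

def deterministic_member(p_cons, delta_kt, trigger=0.40):
    """Once the consensus probability reaches the trigger, emit a
    deterministic forecast equal to the intensification threshold;
    otherwise contribute nothing to the intensity consensus."""
    return delta_kt if p_cons >= trigger else None

# e.g. the 25-kt/24-h threshold with member probabilities 0.35 and 0.50
p = ri_consensus(0.35, 0.50)            # 0.425 >= 0.40
print(deterministic_member(p, 25))      # -> 25, added as a consensus member
```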


2014, Vol. 3 (3), pp. 186-193
Author(s): Mohamad Iman Jamnejad, Hamid Parvin, Hamid Alinejad-Rokny, Ali Heidarzadegan

2017, Vol. 6 (3), pp. 57-60
Author(s): Denis Krivoguz

Modern approaches to regional landslide susceptibility assessment are considered in this paper. Descriptions of the most widely used techniques for landslide susceptibility assessment are presented: logistic regression, the indicator validity method, linear discriminant analysis, and artificial neural networks. The advantages and disadvantages of these techniques are discussed, and the techniques best suited to various analysis conditions are identified. It is concluded that when a large amount of input data is available for the studied region, the most suitable techniques are logistic regression and the indicator validity method, which achieve the most accurate results. When information is lacking, it is more expedient to use linear discriminant analysis or artificial neural networks, which minimize potential analysis inaccuracies.

