Tuning Database-Friendly Random Projection Matrices for Improved Distance Preservation on Specific Data

Author(s):  
Daniel López-Sánchez ◽  
Cyril de Bodt ◽  
John A. Lee ◽  
Angélica González Arrieta ◽  
Juan M. Corchado

Random Projection is one of the most popular and successful dimensionality reduction algorithms for large volumes of data. However, given its stochastic nature, different initializations of the projection matrix can lead to very different levels of performance. This paper presents a guided random search algorithm to mitigate this problem. The proposed method uses a small number of training data samples to iteratively adjust a projection matrix, improving its performance on similarly distributed data. Experimental results show that projection matrices generated with the proposed method result in a better preservation of distances between data samples. Conveniently, this is achieved while preserving the database-friendliness of the projection matrix, as it remains sparse and comprised exclusively of integers after being tuned with our algorithm. Moreover, running the proposed algorithm on a consumer-grade CPU requires only a few seconds.
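
As a concrete illustration, the sketch below tunes an Achlioptas-style sparse {-1, 0, +1} projection matrix by accepting single-entry perturbations only when they improve pairwise distance preservation on a small set of training samples. The perturbation and acceptance rules are illustrative assumptions, not the authors' exact procedure; note that the matrix stays sparse and integer-valued throughout.

```python
import numpy as np
from scipy.spatial.distance import pdist

def sparse_integer_matrix(d, k, rng):
    # Achlioptas-style database-friendly matrix: entries in {-1, 0, +1}
    # with probabilities {1/6, 2/3, 1/6}.
    return rng.choice([-1, 0, 1], size=(d, k), p=[1/6, 2/3, 1/6])

def distortion(X, R):
    # Mean relative error between original and projected pairwise
    # squared distances (lower is better). The sqrt(3/k) factor makes
    # the projection distance-preserving in expectation.
    k = R.shape[1]
    orig = pdist(X, "sqeuclidean")
    proj = pdist(X @ R * np.sqrt(3.0 / k), "sqeuclidean")
    return np.mean(np.abs(proj - orig) / orig)

def tune(X, R, iters=1000, seed=0):
    # Guided random search (illustrative): change one random entry to a
    # different value in {-1, 0, +1}; keep the change only if distance
    # preservation on the training samples improves, else revert.
    rng = np.random.default_rng(seed)
    best = distortion(X, R)
    for _ in range(iters):
        i, j = rng.integers(R.shape[0]), rng.integers(R.shape[1])
        old = R[i, j]
        R[i, j] = rng.choice([v for v in (-1, 0, 1) if v != old])
        new = distortion(X, R)
        if new < best:
            best = new          # accept: matrix stays sparse and integer
        else:
            R[i, j] = old       # revert
    return R, best
```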

Author(s):  
Shuo Chen ◽  
Chen Gong ◽  
Jian Yang ◽  
Ying Tai ◽  
Le Hui ◽  
...  

The central problem for most existing metric learning methods is to find a suitable projection matrix on the differences of all pairs of data points. However, a single unified projection matrix can hardly characterize all data similarities accurately, as practical data are usually very complicated, and adopting one global projection matrix may ignore important local patterns hidden in the dataset. To address this issue, this paper proposes a novel method dubbed “Data-Adaptive Metric Learning” (DAML), which constructs a data-adaptive projection matrix for each data pair by selectively combining a set of learned candidate matrices. As a result, every data pair obtains a specific projection matrix, enabling the proposed DAML to flexibly fit the training data and produce discriminative projection results. The DAML model is formulated as an optimization problem that jointly learns the candidate projection matrices and their sparse combination for every data pair. Nevertheless, over-fitting may occur due to the large number of parameters to be learned. To tackle this issue, we adopt the Total Variation (TV) regularizer to align the scales of the data embeddings produced by all candidate projection matrices, so that the metrics generated by these learned candidates are generally comparable. Furthermore, we extend the basic linear DAML model to a kernelized version (denoted “KDAML”) to handle non-linear cases, and the Iterative Shrinkage-Thresholding Algorithm (ISTA) is employed to solve the optimization model. Extensive experimental results on various applications, including retrieval, classification, and verification, clearly demonstrate the superiority of our algorithm over other state-of-the-art metric learning methodologies.
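
The sparse combination weights are learned with ISTA; the snippet below shows a generic ISTA solver for the L1-regularized least-squares form that the algorithm addresses. Mapping `A`, `b`, and `w` onto DAML's candidate matrices and pair-specific weights is an assumption for illustration, not the paper's exact objective.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the L1 norm, the core of ISTA.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, iters=500):
    # Generic ISTA for: min_w 0.5 * ||A w - b||^2 + lam * ||w||_1.
    # In DAML, w would play the role of the sparse combination weights
    # over candidate projection matrices for one data pair (illustrative).
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ w - b)             # gradient of the smooth term
        w = soft_threshold(w - step * grad, step * lam)
    return w
```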


Mathematics ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 1190
Author(s):  
Mohammad Dehghani ◽  
Zeinab Montazeri ◽  
Štěpán Hubálovský

There are many optimization problems across the disciplines of science that must be solved using an appropriate method. Population-based optimization algorithms are among the most efficient ways to solve such problems: they can provide suitable solutions by randomly searching the problem space, without requiring gradient or derivative information. In this paper, a new optimization algorithm called the Group Mean-Based Optimizer (GMBO) is presented; it can be applied to optimization problems in various fields of science. The main idea in designing the GMBO is to make more effective use of the information carried by different members of the population, based on two selected groups, termed the good group and the bad group. Two new composite members are obtained by averaging each of these groups, and these are used to update the population members. The various stages of the GMBO are described and modeled mathematically for use in solving optimization problems. The performance of the GMBO in providing a suitable quasi-optimal solution is evaluated on a set of 23 standard objective functions of three types: unimodal, high-dimensional multimodal, and fixed-dimensional multimodal. In addition, the optimization results obtained with the proposed GMBO were compared with those of eight widely used optimization algorithms: the Marine Predators Algorithm (MPA), the Tunicate Swarm Algorithm (TSA), the Whale Optimization Algorithm (WOA), the Grey Wolf Optimizer (GWO), Teaching–Learning-Based Optimization (TLBO), the Gravitational Search Algorithm (GSA), Particle Swarm Optimization (PSO), and the Genetic Algorithm (GA). The results indicate the acceptable performance of the proposed GMBO and, based on the analysis and comparison of the results, show that the GMBO is superior to and much more competitive than the other eight algorithms.
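
One plausible reading of the GMBO update is sketched below: the composite good and bad members are the means of the best and worst m individuals, and each member moves toward the good composite and away from the bad one, with greedy acceptance. The exact update equations in the paper may differ.

```python
import numpy as np

def gmbo_step(pop, fitness, m, rng):
    # One illustrative GMBO-style iteration on a population array of
    # shape (pop_size, dim); `fitness` is minimized. This is a sketch,
    # not the paper's exact update rule.
    scores = np.array([fitness(x) for x in pop])
    order = np.argsort(scores)
    good_mean = pop[order[:m]].mean(axis=0)   # composite of the good group
    bad_mean = pop[order[-m:]].mean(axis=0)   # composite of the bad group
    new_pop = pop.copy()
    for i, x in enumerate(pop):
        r1 = rng.random(x.shape)
        r2 = rng.random(x.shape)
        # Move toward the good composite and away from the bad composite.
        cand = x + r1 * (good_mean - x) - r2 * (bad_mean - x)
        if fitness(cand) < scores[i]:         # greedy acceptance
            new_pop[i] = cand
    return new_pop
```

For instance, `gmbo_step(pop, lambda x: np.sum(x**2), m=5, rng=np.random.default_rng(0))` performs one iteration on the sphere benchmark.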


Energies ◽  
2021 ◽  
Vol 14 (9) ◽  
pp. 2402
Author(s):  
David S. Ching ◽  
Cosmin Safta ◽  
Thomas A. Reichardt

Bayesian inference is used to calibrate a bottom-up home power line communication (PLC) network model with unknown loads and wires at frequencies up to 30 MHz. A network topology with over 50 parameters is calibrated using global sensitivity analysis and transitional Markov Chain Monte Carlo (TMCMC). The sensitivity-informed Bayesian inference computes Sobol indices for each network parameter and applies TMCMC to calibrate the most sensitive parameters for a given network topology. A greedy random search with TMCMC is used to refine the discrete random variables of the network. The result is a model that accurately computes the transfer function despite noisy training data and a high-dimensional parameter space. The model is able to infer some parameters of the network used to produce the training data, and it accurately computes the transfer function under extrapolative scenarios.
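
The greedy refinement of the discrete network variables can be pictured as follows; here `score` stands in for the full TMCMC calibration (not reproduced), and the neighborhood structure is a placeholder assumption.

```python
import numpy as np

def greedy_discrete_search(config, score, neighbors, iters=100, seed=0):
    # Schematic greedy random search over discrete network variables
    # (e.g., load or wire types): propose a random neighboring
    # configuration and keep it only when the calibration score
    # improves. `score` would wrap the TMCMC calibration in the paper;
    # here it is any callable returning a misfit to minimize.
    rng = np.random.default_rng(seed)
    best, best_s = config, score(config)
    for _ in range(iters):
        cand = neighbors(best, rng)   # user-supplied proposal (assumption)
        s = score(cand)
        if s < best_s:
            best, best_s = cand, s
    return best, best_s
```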


2017 ◽  
Vol 8 (4) ◽  
pp. 99-112 ◽  
Author(s):  
Rojalina Priyadarshini ◽  
Rabindra Kumar Barik ◽  
Nilamadhab Dash ◽  
Brojo Kishore Mishra ◽  
Rachita Misra

A great deal of research has been carried out globally to design machine classifiers that can predict diabetes from physical and bio-medical parameters. In this work, a hybrid machine learning classifier is proposed as an artificial predictor to correctly distinguish diabetic from non-diabetic people. The classifier is an amalgamation of the widely used K-means algorithm and the Gravitational Search Algorithm (GSA). The GSA is used as an optimization tool to compute the best centroids for the two classes of training data: the positive class (diabetic) and the negative class (non-diabetic). In the K-means algorithm, instead of using random samples as initial cluster centers, the optimized centroids from the GSA serve as the cluster centers. The inherent problem with K-means is the initial placement of cluster centers, which may delay convergence and thereby degrade overall performance; the combined GSA and K-means approach is designed to overcome this problem.
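
In outline, the hybrid replaces K-means' random initialization with GSA-optimized centroids. The sketch below uses scikit-learn's `KMeans` with a user-supplied `init` array; the `optimize_centroids` callable stands in for the GSA stage, which is not implemented here.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_with_seeded_centroids(X_pos, X_neg, optimize_centroids):
    # Hybrid scheme, sketched: `optimize_centroids` stands in for the
    # GSA stage and returns one centroid per class; these replace
    # K-means' random initialization (n_init=1 keeps the seeded start).
    c_pos = optimize_centroids(X_pos)   # centroid for the diabetic class
    c_neg = optimize_centroids(X_neg)   # centroid for the non-diabetic class
    init = np.vstack([c_pos, c_neg])
    X = np.vstack([X_pos, X_neg])
    return KMeans(n_clusters=2, init=init, n_init=1).fit(X)

# Trivial stand-in for the GSA stage: the class mean. (The paper
# optimizes this choice with the Gravitational Search Algorithm.)
mean_centroid = lambda X: X.mean(axis=0)
```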


2018 ◽  
Author(s):  
Christopher McComb ◽  
Jonathan Cagan ◽  
Kenneth Kotovsky

Although insights uncovered by design cognition research are often used to develop methods for human designers, using such insights to inform computational methodologies also has the potential to improve the performance of design algorithms. This paper uses insights from research on design cognition and design teams to inform a better simulated annealing search algorithm. Simulated annealing has already been established as a model of individual problem solving. This paper introduces the Heterogeneous Simulated Annealing Team (HSAT) algorithm, a multi-agent simulated annealing algorithm. Each agent controls an adaptive annealing schedule, allowing the team to develop heterogeneous search strategies. Such diversity is a natural part of engineering design and boosts performance in other multi-agent algorithms. Further, interaction between agents in HSAT is structured to mimic interaction between members of a design team. Performance is compared to several other simulated annealing algorithms, a random search algorithm, and a gradient-based algorithm. Compared to these alternatives, the team-based HSAT algorithm returns better average results with lower variance.
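
A minimal multi-agent sketch in the spirit of HSAT is shown below, assuming each agent adapts its own temperature from its recent acceptance rate; the team interaction described in the paper is omitted for brevity, and the adaptation constants are illustrative.

```python
import numpy as np

def hsat_sketch(f, x0s, iters=2000, seed=0):
    # Multi-agent simulated annealing (illustrative): each agent keeps
    # its own temperature and adapts it every 100 steps based on its
    # acceptance rate, so the team develops heterogeneous schedules.
    rng = np.random.default_rng(seed)
    agents = [{"x": np.asarray(x, dtype=float), "T": 1.0, "acc": 0}
              for x in x0s]
    best_x, best_f = None, np.inf
    for t in range(1, iters + 1):
        for a in agents:
            cand = a["x"] + rng.normal(scale=a["T"], size=a["x"].shape)
            delta = f(cand) - f(a["x"])
            # Standard Metropolis acceptance rule.
            if delta < 0 or rng.random() < np.exp(-delta / a["T"]):
                a["x"], a["acc"] = cand, a["acc"] + 1
            if f(a["x"]) < best_f:
                best_x, best_f = a["x"].copy(), f(a["x"])
        if t % 100 == 0:
            for a in agents:
                rate = a["acc"] / 100
                # Cool when accepting freely, reheat when stuck
                # (constants are illustrative assumptions).
                a["T"] *= 0.7 if rate > 0.4 else 1.1
                a["acc"] = 0
    return best_x, best_f
```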


2016 ◽  
Vol 32 (1) ◽  
pp. 113-127
Author(s):  
Hua Dong ◽  
Glen Meeden

We consider the problem of constructing a synthetic sample from a population of interest that cannot be sampled directly but for which the population means of some of its variables are known. In addition, we assume that we have in hand samples from two similar populations. Using the known population means, we select subsamples from the samples of the other two populations and combine them to construct the synthetic sample. The synthetic sample is obtained by solving an optimization problem in which the known population means are used as constraints. The optimization is achieved through an adaptive random search algorithm. Simulation studies are presented to demonstrate the effectiveness of our approach. We observe that, on average, such synthetic samples behave very much like actual samples from the population of interest. As an application, we consider constructing a one-percent synthetic sample for the missing 1890 decennial sample of the United States.
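
Schematically, the adaptive random search can be rendered as a swap-based local search that drives the subsample means toward the known population means. The sketch below uses a single donor pool for brevity, whereas the paper combines subsamples from two donor populations.

```python
import numpy as np

def synthetic_subsample(pool, target_means, n, iters=5000, seed=0):
    # Illustrative adaptive random search: repeatedly swap one selected
    # unit for one unselected unit and keep the swap when the subsample
    # means move closer to the known population means (the constraints).
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(pool), size=n, replace=False)
    gap = lambda s: np.linalg.norm(pool[s].mean(axis=0) - target_means)
    best = gap(idx)
    for _ in range(iters):
        out_pos = rng.integers(n)
        candidates = np.setdiff1d(np.arange(len(pool)), idx)
        new_idx = idx.copy()
        new_idx[out_pos] = rng.choice(candidates)
        g = gap(new_idx)
        if g < best:
            idx, best = new_idx, g
    return pool[idx]
```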

