Optimal 1-NN prototypes for pathological geometries

Using prototype methods to reduce the size of training datasets can drastically reduce the computational cost of classification with instance-based learning algorithms like the k-Nearest Neighbour classifier. The number and distribution of prototypes required for the classifier to match its original performance is intimately related to the geometry of the training data. As a result, it is often difficult to find the optimal prototypes for a given dataset, and heuristic algorithms are used instead. However, we consider a particularly challenging setting where commonly used heuristic algorithms fail to find suitable prototypes and show that the optimal number of prototypes can instead be found analytically. We also propose an algorithm for finding nearly-optimal prototypes in this setting, and use it to empirically validate the theoretical results. Finally, we show that a parametric prototype generation method that normally cannot solve this pathological setting can actually find optimal prototypes when combined with the results of our theoretical analysis.
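
As a rough illustration of the prototype idea described above, the following minimal sketch classifies points with a 1-NN rule over a reduced prototype set instead of the full training set. The toy data, the function names `nearest_prototype_label` and `accuracy`, and the choice of Euclidean distance are assumptions made for illustration; this is not the algorithm proposed in the paper.

```python
import math

def nearest_prototype_label(x, prototypes):
    """Return the label of the prototype closest to x (Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for point, label in prototypes:
        d = math.dist(x, point)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

def accuracy(samples, prototypes):
    """Fraction of labelled samples that the prototype set classifies correctly."""
    correct = sum(nearest_prototype_label(x, prototypes) == y for x, y in samples)
    return correct / len(samples)

# Hypothetical toy data: two well-separated classes on a line.
train = [((0.0,), "a"), ((0.2,), "a"), ((0.4,), "a"),
         ((1.6,), "b"), ((1.8,), "b"), ((2.0,), "b")]

# Two prototypes (one per class) stand in for the six training points.
prototypes = [((0.2,), "a"), ((1.8,), "b")]

print(accuracy(train, prototypes))
```

In this easy case two prototypes reproduce the labels of all six training points; how many prototypes are needed in harder geometries is the question the abstract addresses.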

ISOGEOMETRIC COLLOCATION METHODS

Mathematical Models and Methods in Applied Sciences ◽

10.1142/s0218202510004878 ◽

2010 ◽

Vol 20 (11) ◽

pp. 2075-2107 ◽

Cited By ~ 213

Author(s):

F. AURICCHIO ◽

L. BEIRÃO DA VEIGA ◽

T. J. R. HUGHES ◽

A. REALI ◽

G. SANGALLI

Keyword(s):

Theoretical Analysis ◽

Isogeometric Analysis ◽

Computational Cost ◽

Basis Functions ◽

Three Dimensions ◽

Collocation Methods ◽

One Dimensional ◽

Numerical Tests ◽

Low Computational Cost ◽

Theoretical Results

We initiate the study of collocation methods for NURBS-based isogeometric analysis. The idea is to connect the superior accuracy and smoothness of NURBS basis functions with the low computational cost of collocation. We develop a one-dimensional theoretical analysis, and perform numerical tests in one, two and three dimensions. The numerical results obtained confirm theoretical results and illustrate the potential of the methodology.

Dimentionality reduction based on binary cooperative particle swarm optimization

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v15.i3.pp1382-1391 ◽

2019 ◽

Vol 15 (3) ◽

pp. 1382

Author(s):

Sharifah Sakinah Syed Ahmad ◽

Ezzatul Farhain Azmi ◽

Fauziah Kasmin ◽

Zuraini Othman

Keyword(s):

Real World ◽

Research Work ◽

Particle Swarm ◽

Training Data ◽

Nearest Neighbour ◽

Classification Rate ◽

Unseen Data ◽

Real World Datasets ◽

Nearest Neighbour Classifier

Although many more complex classification algorithms exist, k-Nearest Neighbour (k-NN) is regarded as one of the most successful approaches to real-world problems. The effectiveness of the classification process relies on the data in the training set. However, applying the k-NN classifier to real-world problems raises several issues. It is computationally expensive, because the complete training set must be stored in order to classify unseen data. It is also intolerant of irrelevant features. In addition, the training data may be imbalanced, with considerably more instances in some classes than in others. Selected training data are therefore used to improve the effectiveness of the k-NN classifier on large datasets. In this research work, an alternative method is presented that improves data selection by performing feature selection and instance selection simultaneously for the k-NN classifier, using Cooperative Binary Particle Swarm Optimisation (CBPSO). The method also addresses the limitations of the k-NN classifier when handling high-dimensional and imbalanced data. A comparison study on 20 real-world datasets taken from the UCI Machine Learning Repository was performed to demonstrate the performance of the approach. The corresponding table of classification rates demonstrates the algorithm’s performance. The experimental results show the efficacy of the proposed approach.

DIAGNOSE EFFECTIVE EVOLUTIONARY PROTOTYPE SELECTION USING AN OVERLAPPING MEASURE

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001409007727 ◽

2009 ◽

Vol 23 (08) ◽

pp. 1527-1548 ◽

Cited By ~ 18

Author(s):

SALVADOR GARCÍA ◽

JOSÉ-RAMÓN CANO ◽

ESTER BERNADÓ-MANSILLA ◽

FRANCISCO HERRERA

Keyword(s):

Computational Cost ◽

Complexity Measure ◽

Selection Strategy ◽

Nearest Neighbour ◽

Prototype Selection ◽

Classification Problems ◽

The Past ◽

Nearest Neighbour Classifier ◽

Selection Algorithms

Evolutionary prototype selection has proven effective in the prototype selection (PS) domain. In most cases it improves on the results of classical prototype selection algorithms, but at a high computational cost. In this paper, we analyze the behavior of the evolutionary prototype selection strategy using a complexity measure for classification problems based on overlapping. In addition, we analyze different values of k for the nearest neighbour classifier in this domain to see their influence on the results of PS methods. The objective is to predict, based on this overlapping measure, when evolutionary prototype selection is effective for a particular problem.

Accurate and Transferable Multitask Prediction of Chemical Properties with an Atoms-in-Molecule Neural Network

10.26434/chemrxiv.7151435.v2 ◽

2018 ◽

Author(s):

Roman Zubatyuk ◽

Justin S. Smith ◽

Jerzy Leszczynski ◽

Olexandr Isayev

Keyword(s):

Neural Network ◽

Molecular System ◽

Computational Cost ◽

Chemical Properties ◽

The State ◽

Molecular Properties ◽

Training Data ◽

Dft Methods ◽

Benchmark Datasets ◽

Quantum Phenomena

Atomic and molecular properties could be evaluated from the fundamental Schrödinger equation and therefore represent different modalities of the same quantum phenomena. Here we present AIMNet, a modular and chemically inspired deep neural network potential. We used AIMNet with multitarget training to learn multiple modalities of the state of the atom in a molecular system. On several benchmark datasets, the resulting model achieves state-of-the-art accuracy, comparable to the results of DFT methods that are orders of magnitude more expensive. It can simultaneously predict several atomic and molecular properties without an increase in computational cost. With AIMNet we show a new dimension of transferability: the ability to learn new targets utilizing multimodal information from previous training. The model can learn implicit solvation energy (like SMD) utilizing only a fraction of the original training data, and achieves a MAD error of 1.1 kcal/mol compared to experimental solvation free energies in the MNSol database.

Optimising the AR Engraved Structure on Light-Guide Facets for a Wide Range of Wavelengths

Optics ◽

10.3390/opt2010002 ◽

2020 ◽

Vol 2 (1) ◽

pp. 25-42

Author(s):

Ioseph Gurwich ◽

Yakov Greenberg ◽

Kobi Harush ◽

Yarden Tzabari

Keyword(s):

Theoretical Analysis ◽

Light Guide ◽

Spectral Band ◽

Angular Range ◽

The Past ◽

Input And Output ◽

Wide Range ◽

The Given ◽

Theoretical Results ◽

Shape And Size

The present study is aimed at designing anti-reflective (AR) engraving on the input–output surfaces of a rectangular light-guide. We estimate AR efficiency by the transmittance level in the angular range determined by the light-guide. Using nano-engraving, we achieve a uniform high transmission over a wide range of wavelengths. In the past, we used smoothed conical pins or indentations on the faces of the light-guide crystal as the engraved structure. Here, we widen the class of pins under consideration, following the physical model developed in the previous paper. We analyze smoothed pyramidal pins with different base shapes. The possible effect of randomizing the pin parameters is also examined. The results demonstrate an optimized engraved structure with parameters depending on the required spectral range and facet format. The predicted level of transmittance is close to 99%, and its flatness (estimated by the standard deviation) in the required wavelength range is 0.2%. The theoretical analysis and numerical calculations indicate that the obtained results demonstrate the best transmission (reflection) we can expect for a facet with the given shape and size for the required spectral band. The approach is equally useful for any other form of the facet. We also discuss a simple way of comparing experimental and theoretical results for a light-guide with the designed input and output features. In this study, as well as in our previous work, we restrict ourselves to rectangular facets. We also consider the limitations on maximal transmission produced by the size and shape of the light-guide facets. The theoretical analysis is performed for an infinite structure and serves as an upper bound on the transmittance for smaller-size apertures.

Maximum Variance Hashing via Column Generation

Mathematical Problems in Engineering ◽

10.1155/2013/379718 ◽

2013 ◽

Vol 2013 ◽

pp. 1-10

Author(s):

Lei Luo ◽

Chao Zhang ◽

Yongrui Qin ◽

Chunyuan Zhang

Keyword(s):

Column Generation ◽

Large Scale ◽

Web Search ◽

Nearest Neighbor ◽

Computational Cost ◽

Multimedia Retrieval ◽

Training Data ◽

Nonlinear Dimensionality Reduction ◽

Maximum Variance ◽

Data Volume

With the explosive growth of the data volume in modern applications such as web search and multimedia retrieval, hashing is becoming increasingly important for efficient nearest neighbor (similar item) search. Recently, a number of data-dependent methods have been developed, reflecting the great potential of learning for hashing. Inspired by the classic nonlinear dimensionality reduction algorithm—maximum variance unfolding, we propose a novel unsupervised hashing method, named maximum variance hashing, in this work. The idea is to maximize the total variance of the hash codes while preserving the local structure of the training data. To solve the derived optimization problem, we propose a column generation algorithm, which directly learns the binary-valued hash functions. We then extend it using anchor graphs to reduce the computational cost. Experiments on large-scale image datasets demonstrate that the proposed method outperforms state-of-the-art hashing methods in many cases.
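
To make the search step concrete, the sketch below shows how binary hash codes are typically compared once they have been learned: by Hamming distance. It does not implement maximum variance hashing or column generation. The codes, the integer encoding, and the function names `hamming` and `nearest_by_hamming` are assumptions chosen for illustration.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")

def nearest_by_hamming(query: int, codes: list) -> int:
    """Index of the stored code with the smallest Hamming distance to the query."""
    return min(range(len(codes)), key=lambda i: hamming(query, codes[i]))

# Hypothetical 8-bit hash codes for three database items.
codes = [0b10110010, 0b10110011, 0b01001100]

# Hypothetical hash code of a query item.
query = 0b10110111

print(nearest_by_hamming(query, codes))
```

Because the comparison reduces to an XOR and a bit count, a linear scan over compact codes is cheap relative to distance computations in the original feature space.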

Cluster-Based Nearest-Neighbour Classifier and Its Application on the Lightning Classification

Journal of Computer Science and Technology ◽

10.1007/s11390-008-9153-8 ◽

2008 ◽

Vol 23 (4) ◽

pp. 573-581 ◽

Cited By ~ 1

Author(s):

Loris Nanni ◽

Alessandra Lumini

Keyword(s):

Nearest Neighbour ◽

Nearest Neighbour Classifier

A Proxy Peng-Robinson EOS for Efficient Modeling of Phase Behavior

10.2118/203914-ms ◽

2021 ◽

Author(s):

Mark Zhao ◽

Ryosuke Okuno

Keyword(s):

Phase Behavior ◽

Computational Cost ◽

Training Data ◽

Petroleum Engineering ◽

Fugacity Coefficient ◽

Fugacity Model ◽

Proxy Model ◽

Reservoir Conditions ◽

Fugacity Coefficients ◽

Proxy Models

Equation-of-state (EOS) compositional simulation is commonly used to model the interplay between phase behavior and fluid flow for various reservoir and surface processes. Because of its computational cost, however, there is a critical need for efficient phase-behavior calculations using an EOS. The objective of this research was to develop a proxy model for the fugacity coefficient based on the Peng-Robinson EOS for rapid multiphase flash in compositional flow simulation. The proxy model as implemented in this research bypasses the calculation of fugacity coefficients when the Peng-Robinson EOS has only one root, which is often the case at reservoir conditions. The proxy fugacity model was trained by artificial neural networks (ANN) with over 30 million fugacity coefficients based on the Peng-Robinson EOS. It accurately predicts the Peng-Robinson fugacity coefficient by using four parameters: Am, Bm, Bi, and ΣxiAij. Since these scalar parameters are general, not specific to particular compositions, pressures, and temperatures, the proxy model is as applicable to petroleum engineering applications as the original Peng-Robinson EOS. The proxy model is applied to multiphase flash calculations (phase-split and stability), where the cubic equation solutions and fugacity coefficient calculations are bypassed when the Peng-Robinson EOS has one root. The original fugacity coefficient is analytically calculated when the EOS has more than one root, but this occurs only occasionally at reservoir conditions. A case study shows the proxy fugacity model gave a speed-up factor of 3.4% in comparison to the conventional EOS calculation. Case studies also demonstrate accurate multiphase flash results (stability and phase split) and interchangeable proxy models for different fluid cases with different (numbers of) components. This is possible because it predicts the Peng-Robinson fugacity in the variable space that is not specific to composition, temperature, and pressure. For the same reason, non-zero binary interaction parameters do not impair the applicability, accuracy, robustness, and efficiency of the model. As the proxy models are specific to individual components, a combination of proxy models can be used to model any mixture of components. Tuning of training hyperparameters and the training data sampling method helped reduce the mean absolute percent error to less than 0.1% in the ANN modeling. To the best of our knowledge, this is the first generalized proxy model of the Peng-Robinson fugacity that is applicable to any mixture. The proposed model retains the conventional flash iteration, the convergence robustness, and the option of manual parameter tuning for fluid characterization.
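
The one-root condition mentioned above can be checked without solving the cubic. The sketch below assumes the standard form of the Peng-Robinson cubic in the compressibility factor Z, with dimensionless mixture parameters A and B, and tests the sign of the cubic discriminant. The function name `has_single_real_root` and the sample values of A and B are hypothetical; the abstract does not state how the authors detect the one-root case.

```python
def has_single_real_root(A: float, B: float) -> bool:
    """True if the Peng-Robinson cubic in Z has exactly one real root.

    Assumed standard form:
        Z^3 - (1 - B) Z^2 + (A - 3B^2 - 2B) Z - (AB - B^2 - B^3) = 0
    """
    # Coefficients of Z^3 + b Z^2 + c Z + d = 0.
    b = -(1.0 - B)
    c = A - 3.0 * B**2 - 2.0 * B
    d = -(A * B - B**2 - B**3)

    # Discriminant of a monic cubic: negative means one real root
    # and a complex-conjugate pair, positive means three distinct real roots.
    disc = 18.0 * b * c * d - 4.0 * b**3 * d + b**2 * c**2 - 4.0 * c**3 - 27.0 * d**2
    return disc < 0.0

# Hypothetical parameter values, chosen only to exercise both branches.
print(has_single_real_root(0.02, 0.01))  # one-root case
print(has_single_real_root(0.10, 0.01))  # three-root case
```

A flash routine along the lines described in the abstract could use such a test to decide whether to call the proxy model or fall back to the analytical fugacity calculation.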

Generalized Field-Development Optimization With Derivative-Free Procedures

SPE Journal ◽

10.2118/163631-pa ◽

2014 ◽

Vol 19 (05) ◽

pp. 891-908 ◽

Cited By ~ 55

Author(s):

Obiajulu J. Isebor ◽

David Echeverría Ciaurri ◽

Louis J. Durlofsky

Keyword(s):

Computational Cost ◽

Optimization Method ◽

Optimal Number ◽

Global Search ◽

Search Method ◽

Pattern Search ◽

Mixed Integer ◽

Categorical Variables ◽

Field Development ◽

Derivative Free

The optimization of general oilfield development problems is considered. Techniques are presented to simultaneously determine the optimal number and type of new wells, the sequence in which they should be drilled, and their corresponding locations and (time-varying) controls. The optimization is posed as a mixed-integer nonlinear programming (MINLP) problem and involves categorical, integer-valued, and real-valued variables. The formulation handles bound, linear, and nonlinear constraints, with the latter treated with filter-based techniques. Noninvasive derivative-free approaches are applied for the optimizations. Methods considered include branch and bound (B&B), a rigorous global-search procedure that requires the relaxation of the categorical variables; mesh adaptive direct search (MADS), a local pattern-search method; particle swarm optimization (PSO), a heuristic global-search method; and a PSO-MADS hybrid. Four example cases involving channelized-reservoir models are presented. The recently developed PSO-MADS hybrid is shown to consistently outperform the standalone MADS and PSO procedures. In the two cases in which B&B is applied, the heuristic PSO-MADS approach is shown to give comparable solutions but at a much lower computational cost. This is significant because B&B provides a systematic search in the categorical variables. We conclude that, although it is demanding in terms of computation, the methodology presented here, with PSO-MADS as the core optimization method, appears to be applicable for realistic reservoir development and management.

Normalised Local Naïve Bayes Nearest-Neighbour Classifier for Offline Writer Identification

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) ◽

10.1109/icdar.2017.168 ◽

2017 ◽

Cited By ~ 2

Author(s):

Hussein Mohammed ◽

Volker Maergner ◽

Thomas Konidaris ◽

H. Siegfried Stiehl

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Nearest Neighbour ◽

Writer Identification ◽

Nearest Neighbour Classifier
