similarity functions Latest Research Papers

Knowledge graph embedding for data mining vs. knowledge graph embedding for link prediction – two sides of the same coin?

Semantic Web ◽

10.3233/sw-212892 ◽

2022 ◽

pp. 1-24

Author(s):

Jan Portisch ◽

Nicolas Heist ◽

Heiko Paulheim

Keyword(s):

Data Mining ◽

Link Prediction ◽

Graph Embedding ◽

Knowledge Graph ◽

Graph Embeddings ◽

Similarity Functions ◽

Evaluation Methodologies ◽

Series Of Experiments ◽

Two Sides ◽

Lower Dimensional

Knowledge Graph Embeddings, i.e., projections of entities and relations to lower dimensional spaces, have been proposed for two purposes: (1) providing an encoding for data mining tasks, and (2) predicting links in a knowledge graph. Both lines of research have been pursued rather in isolation from each other so far, each with their own benchmarks and evaluation methodologies. In this paper, we argue that both tasks are actually related, and we show that the first family of approaches can also be used for the second task and vice versa. In two series of experiments, we provide a comparison of both families of approaches on both tasks, which, to the best of our knowledge, has not been done so far. Furthermore, we discuss the differences in the similarity functions evoked by the different embedding approaches.

Unified Wind Wave Growth and Spectrum Functions for All Water Depths: Field Observations and Model Results

Journal of Physical Oceanography ◽

10.1175/jpo-d-21-0258.1 ◽

2022 ◽

Keyword(s):

Wind Wave ◽

Peak Frequency ◽

Wind Forcing ◽

Spectral Peak ◽

Dimensionless Frequency ◽

Wave Age ◽

Surface Wave Dispersion ◽

Similarity Functions ◽

Spectrum Model ◽

Water Depths

Abstract Wind wave development is governed by the fetch- or duration-limited growth principle that is expressed as a pair of similarity functions relating the dimensionless elevation variance (wave energy) and spectral peak frequency to fetch or duration. Combining the pair of similarity funtions the fetch or duration variable can be removed to form a dimensionless function of elevation variance and spectral peak frequency, which is interepreated as the wave enegry evolution with wave age. The relationship is initially developed for quasi-neural stability and quasi-steady wind forcing conditions. Further analyses show that the same fetch, duration, and wave age similarity functions are applicable to unsteady wind forcing conditions, including rapidly accelerating and decelerating mountain gap wind episodes and tropical cyclone (TC) wind fields. Here it is shown that with the dimensionless frequency converted to dimensionless wavenumber using the surface wave dispersion relationship, the same similarity function is applicable in all water depths. Field data collected in shallow to deep waters and mild to TC wind conditions, and synthetic data generated by spectrum model computations are assembled to illustrate the applicability. For the simulation work, the finite-depth wind wave spectrum model and its shoaling function are formulated for variable spectral slopes. Given wind speed, wave age, and water depth, the measrued and spectrum-computed significant wave heights and the associated growth parameters are in good agreement in forcing conditions from mild to TC winds and in all depths from deep ocean to shallow lake.

Provable randomized rounding for minimum-similarity diversification

Data Mining and Knowledge Discovery ◽

10.1007/s10618-021-00811-2 ◽

2022 ◽

Author(s):

Bruno Ordozgoiti ◽

Ananth Mahadevan ◽

Antonis Matakos ◽

Aristides Gionis

Keyword(s):

Optimization Problems ◽

Randomized Algorithm ◽

Similarity Function ◽

Randomized Rounding ◽

Cardinality Constraint ◽

Similarity Functions ◽

Penalty Term ◽

Combinatorial Optimization Problems ◽

Benchmark Datasets ◽

Approximation Guarantee

AbstractWhen searching for information in a data collection, we are often interested not only in finding relevant items, but also in assembling a diverse set, so as to explore different concepts that are present in the data. This problem has been researched extensively. However, finding a set of items with minimal pairwise similarities can be computationally challenging, and most existing works striving for quality guarantees assume that item relatedness is measured by a distance function. Given the widespread use of similarity functions in many domains, we believe this to be an important gap in the literature. In this paper we study the problem of finding a diverse set of items, when item relatedness is measured by a similarity function. We formulate the diversification task using a flexible, broadly applicable minimization objective, consisting of the sum of pairwise similarities of the selected items and a relevance penalty term. To find good solutions we adopt a randomized rounding strategy, which is challenging to analyze because of the cardinality constraint present in our formulation. Even though this obstacle can be overcome using dependent rounding, we show that it is possible to obtain provably good solutions using an independent approach, which is faster, simpler to implement and completely parallelizable. Our analysis relies on a novel bound for the ratio of Poisson-Binomial densities, which is of independent interest and has potential implications for other combinatorial-optimization problems. We leverage this result to design an efficient randomized algorithm that provides a lower-order additive approximation guarantee. We validate our method using several benchmark datasets, and show that it consistently outperforms the greedy approaches that are commonly used in the literature.

Evolutionary Feature Manipulation in Unsupervised Learning

10.26686/wgtn.17142221 ◽

2021 ◽

Author(s):

◽

Andrew Lensen

Keyword(s):

Machine Learning ◽

Unsupervised Learning ◽

Supervised Learning ◽

Manifold Learning ◽

State Of The Art ◽

Fitness Function ◽

Learning Problems ◽

Similarity Functions ◽

Learning Tasks ◽

Benchmark Datasets

<p>Unsupervised learning is a fundamental category of machine learning that works on data for which no pre-existing labels are available. Unlike in supervised learning, which has such labels, methods that perform unsupervised learning must discover intrinsic patterns within data. The size and complexity of data has increased substantially in recent years, which has necessitated the creation of new techniques for reducing the complexity and dimensionality of data in order to allow humans to understand the knowledge contained within data. This is particularly problematic in unsupervised learning, as the number of possible patterns in a dataset grows exponentially with regard to the number of dimensions. Feature manipulation techniques such as feature selection (FS) and feature construction (FC) are often used in these situations. FS automatically selects the most valuable features (attributes) in a dataset, whereas FC constructs new, more powerful and meaningful features that provide a lower-dimensional space. Evolutionary computation (EC) approaches have become increasingly recognised for their potential to provide high-quality solutions to data mining problems in a reasonable amount of computational time. Unlike other popular techniques such as neural networks, EC methods have global search ability without needing gradient information, which makes them much more flexible and applicable to a wider range of problems. EC approaches have shown significant potential in feature manipulation tasks with methods such as Particle Swarm Optimisation (PSO) commonly used for FS, and Genetic Programming (GP) for FC. The use of EC for feature manipulation has, until now, been predominantly restricted to supervised learning problems. This is a notable gap in the research: if unsupervised learning is even more sensitive to high-dimensionality, then why is EC-based feature manipulation not used for unsupervised learning problems? This thesis provides the first comprehensive investigation into the use of evolutionary feature manipulation for unsupervised learning tasks. It clearly shows the ability of evolutionary feature manipulation to improve both the performance of algorithms and interpretability of solutions in unsupervised learning tasks. A variety of tasks are investigated, including the well-established task of clustering, as well as more recent unsupervised learning problems, such as benchmark dataset creation and manifold learning. This thesis proposes a new PSO-based approach to performing simultaneous FS and clustering. A number of improvements to the state-of-the-art are made, including the introduction of a new medoid-based representation and an improved fitness function. A sophisticated three-stage algorithm, which takes advantage of heuristic techniques to determine the number of clusters and to fine-tune clustering performance is also developed. Empirical evaluation on a range of clustering problems demonstrates a decrease in the number of features used, while also improving the clustering performance. This thesis also introduces two innovative approaches to performing wrapper-based FC in clustering tasks using GP. An initial approach where constructed features are directly provided to the k-means clustering algorithm demonstrates the clear strength of GP-based FC for improving clustering results. A more advanced method is proposed that utilises the functional nature of GP-based FC to evolve more specific, concise, and understandable similarity functions for use in clustering algorithms. These similarity functions provide clear improvements in performance and can be easily interpreted by machine learning practitioners. This thesis demonstrates the ability of evolutionary feature manipulation to solve unsupervised learning tasks that traditional methods have struggled with. The synthesis of benchmark datasets has long been a technique used for evaluating machine learning techniques, but this research is the first to present an approach that automatically creates diverse and challenging redundant features for a given dataset. This thesis introduces a GP-based FC approach that creates difficult benchmark datasets for evaluating FS algorithms. It also makes the intriguing discovery that using a mutual information-based fitness function with GP has the potential to be used to improve supervised learning tasks even when the labels are not utilised. Manifold learning is an approach to dimensionality reduction that aims to reduce dimensionality by discovering the inherent lower-dimensional structure of a dataset. While state-of-the-art manifold learning approaches show impressive performance in reducing data dimensionality, they do so at the cost of removing the ability for humans to understand the data in terms of the original features. By utilising a GP-based approach, this thesis proposes new methods that can perform interpretable manifold learning, which provides deep insight into patterns in the data. These four contributions clearly support the hypothesis that evolutionary feature manipulation has untapped potential in unsupervised learning. This thesis demonstrates that EC-based feature manipulation can be successfully applied to a variety of unsupervised learning tasks with clear improvements in both performance and interpretability. A plethora of future research directions in this area are also discovered, which we hope will lead to further valuable findings in this area.</p>

Evolutionary Feature Manipulation in Unsupervised Learning

10.26686/wgtn.17142221.v1 ◽

2021 ◽

Author(s):

◽

Andrew Lensen

Keyword(s):

Machine Learning ◽

Unsupervised Learning ◽

Supervised Learning ◽

Manifold Learning ◽

State Of The Art ◽

Fitness Function ◽

Learning Problems ◽

Similarity Functions ◽

Learning Tasks ◽

Benchmark Datasets

<p>Unsupervised learning is a fundamental category of machine learning that works on data for which no pre-existing labels are available. Unlike in supervised learning, which has such labels, methods that perform unsupervised learning must discover intrinsic patterns within data. The size and complexity of data has increased substantially in recent years, which has necessitated the creation of new techniques for reducing the complexity and dimensionality of data in order to allow humans to understand the knowledge contained within data. This is particularly problematic in unsupervised learning, as the number of possible patterns in a dataset grows exponentially with regard to the number of dimensions. Feature manipulation techniques such as feature selection (FS) and feature construction (FC) are often used in these situations. FS automatically selects the most valuable features (attributes) in a dataset, whereas FC constructs new, more powerful and meaningful features that provide a lower-dimensional space. Evolutionary computation (EC) approaches have become increasingly recognised for their potential to provide high-quality solutions to data mining problems in a reasonable amount of computational time. Unlike other popular techniques such as neural networks, EC methods have global search ability without needing gradient information, which makes them much more flexible and applicable to a wider range of problems. EC approaches have shown significant potential in feature manipulation tasks with methods such as Particle Swarm Optimisation (PSO) commonly used for FS, and Genetic Programming (GP) for FC. The use of EC for feature manipulation has, until now, been predominantly restricted to supervised learning problems. This is a notable gap in the research: if unsupervised learning is even more sensitive to high-dimensionality, then why is EC-based feature manipulation not used for unsupervised learning problems? This thesis provides the first comprehensive investigation into the use of evolutionary feature manipulation for unsupervised learning tasks. It clearly shows the ability of evolutionary feature manipulation to improve both the performance of algorithms and interpretability of solutions in unsupervised learning tasks. A variety of tasks are investigated, including the well-established task of clustering, as well as more recent unsupervised learning problems, such as benchmark dataset creation and manifold learning. This thesis proposes a new PSO-based approach to performing simultaneous FS and clustering. A number of improvements to the state-of-the-art are made, including the introduction of a new medoid-based representation and an improved fitness function. A sophisticated three-stage algorithm, which takes advantage of heuristic techniques to determine the number of clusters and to fine-tune clustering performance is also developed. Empirical evaluation on a range of clustering problems demonstrates a decrease in the number of features used, while also improving the clustering performance. This thesis also introduces two innovative approaches to performing wrapper-based FC in clustering tasks using GP. An initial approach where constructed features are directly provided to the k-means clustering algorithm demonstrates the clear strength of GP-based FC for improving clustering results. A more advanced method is proposed that utilises the functional nature of GP-based FC to evolve more specific, concise, and understandable similarity functions for use in clustering algorithms. These similarity functions provide clear improvements in performance and can be easily interpreted by machine learning practitioners. This thesis demonstrates the ability of evolutionary feature manipulation to solve unsupervised learning tasks that traditional methods have struggled with. The synthesis of benchmark datasets has long been a technique used for evaluating machine learning techniques, but this research is the first to present an approach that automatically creates diverse and challenging redundant features for a given dataset. This thesis introduces a GP-based FC approach that creates difficult benchmark datasets for evaluating FS algorithms. It also makes the intriguing discovery that using a mutual information-based fitness function with GP has the potential to be used to improve supervised learning tasks even when the labels are not utilised. Manifold learning is an approach to dimensionality reduction that aims to reduce dimensionality by discovering the inherent lower-dimensional structure of a dataset. While state-of-the-art manifold learning approaches show impressive performance in reducing data dimensionality, they do so at the cost of removing the ability for humans to understand the data in terms of the original features. By utilising a GP-based approach, this thesis proposes new methods that can perform interpretable manifold learning, which provides deep insight into patterns in the data. These four contributions clearly support the hypothesis that evolutionary feature manipulation has untapped potential in unsupervised learning. This thesis demonstrates that EC-based feature manipulation can be successfully applied to a variety of unsupervised learning tasks with clear improvements in both performance and interpretability. A plethora of future research directions in this area are also discovered, which we hope will lead to further valuable findings in this area.</p>

Prediction of gradient‐based similarity functions from the Mellor‐Yamada model

Quarterly Journal of the Royal Meteorological Society ◽

10.1002/qj.4161 ◽

2021 ◽

Author(s):

Lech Łobocki ◽

Paola Porretta‐Tomaszewska

Keyword(s):

Similarity Functions ◽

Gradient Based

Survey of Similarity Functions on Neighborhood-Based Collaborative Filtering

Expert Systems with Applications ◽

10.1016/j.eswa.2021.115482 ◽

2021 ◽

pp. 115482

Author(s):

Halime Khojamli ◽

Jafar Razmara

Keyword(s):

Collaborative Filtering ◽

Similarity Functions

Genetic Programming for Evolving Similarity Functions Tailored to Clustering Algorithms

2021 IEEE Congress on Evolutionary Computation (CEC) ◽

10.1109/cec45853.2021.9504855 ◽

2021 ◽

Author(s):

Hayden Andersen ◽

Andrew Lensen ◽

Bing Xue

Keyword(s):

Genetic Programming ◽

Clustering Algorithms ◽

Similarity Functions

Comparison of mutual information and its point similarity implementation for image registration

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i3.pp2613-2620 ◽

2021 ◽

Vol 11 (3) ◽

pp. 2613

Author(s):

Wassim El Hajj Chehade ◽

Peter Rogelj

Keyword(s):

Image Registration ◽

Mutual Information ◽

Similarity Measures ◽

Reference Image ◽

Constant Intensity ◽

Similarity Functions ◽

Moving Image ◽

Intensity Dependence ◽

Registration Error ◽

Better Than

Mutual information (MI) is one of the most popular and widely used similarity measures in image registration. In traditional registration processes, MI is computed in each optimization step to measure the similarity between the reference image and the moving image. The presumption is that whenever MI reaches its highest value, this corresponds to the best match. This paper shows that this presumption is not always valid and this leads to registration error. To overcome this problem, we propose to use point similarity measures (PSM) which in contrast to MI allows constant intensity dependence estimates called point similarity functions (PSF). We compare MI and PSM similarity measures in terms of registration misalignment errors. The result of the comparison confirms that the best alignment is not at the highest value of MI but near to it and it shows that PSM performs better than MI if PSF matches the correct intensity dependence between images. This opens a new direction of research towards the improvement of image registration.

On solution existence of MHD Casson nanofluid transportation across an extending cylinder through porous media and evaluation of priori bounds

Scientific Reports ◽

10.1038/s41598-021-86953-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Sohaib Abdal ◽

Sajjad Hussain ◽

Imran Siddique ◽

Ali Ahmadian ◽

Massimiliano Ferrara

Keyword(s):

Porous Matrix ◽

Momentum Equation ◽

Flow Speed ◽

Existence Of Solution ◽

Similarity Functions ◽

Shooting Technique ◽

Casson Nanofluid ◽

Runge Kutta Method ◽

Large Injection ◽

Flow Through

AbstractIt is a theoretical exportation for mass transpiration and thermal transportation of Casson nanofluid over an extending cylindrical surface. The Stagnation point flow through porous matrix is influenced by magnetic field of uniform strength. Appropriate similarity functions are availed to yield the transmuted system of leading differential equations. Existence for the solution of momentum equation is proved for various values of Casson parameter $$\beta $$ β , magnetic parameter M, porosity parameter $$K_p$$ K p and Reynolds number Re in two situations of mass transpiration (suction/injuction). The core interest for this study aroused to address some analytical aspects. Therefore, existence of solution is proved and uniqueness of this results is discussed with evaluation of bounds for existence of solution. Results for skin friction factor are established to attain accuracy for large injection values. Thermal and concentration profiles are delineated numerically by applying Runge-Kutta method and shooting technique. The flow speed retards against M, $$\beta $$ β and $$K_p$$ K p for both situations of mass injection and suction. The thermal boundary layer improves with Brownian and thermopherotic diffusions.

similarity functions
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Knowledge graph embedding for data mining vs. knowledge graph embedding for link prediction – two sides of the same coin?

Unified Wind Wave Growth and Spectrum Functions for All Water Depths: Field Observations and Model Results

Provable randomized rounding for minimum-similarity diversification

Evolutionary Feature Manipulation in Unsupervised Learning

Evolutionary Feature Manipulation in Unsupervised Learning

Prediction of gradient‐based similarity functions from the Mellor‐Yamada model

Survey of Similarity Functions on Neighborhood-Based Collaborative Filtering

Genetic Programming for Evolving Similarity Functions Tailored to Clustering Algorithms

Comparison of mutual information and its point similarity implementation for image registration

On solution existence of MHD Casson nanofluid transportation across an extending cylinder through porous media and evaluation of priori bounds

Export Citation Format

similarity functionsRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Knowledge graph embedding for data mining vs. knowledge graph embedding for link prediction – two sides of the same coin?

Unified Wind Wave Growth and Spectrum Functions for All Water Depths: Field Observations and Model Results

Provable randomized rounding for minimum-similarity diversification

Evolutionary Feature Manipulation in Unsupervised Learning

Evolutionary Feature Manipulation in Unsupervised Learning

Prediction of gradient‐based similarity functions from the Mellor‐Yamada model

Survey of Similarity Functions on Neighborhood-Based Collaborative Filtering

Genetic Programming for Evolving Similarity Functions Tailored to Clustering Algorithms

Comparison of mutual information and its point similarity implementation for image registration

On solution existence of MHD Casson nanofluid transportation across an extending cylinder through porous media and evaluation of priori bounds

similarity functions
Recently Published Documents