Understanding and Using Common Similarity Measures for Text Analysis

2020
Author(s):  
John R. Ladd

This lesson introduces three common measures for determining how similar texts are to one another: city block distance, Euclidean distance, and cosine distance. You will learn the general principles behind similarity, the different advantages of these measures, and how to calculate each of them using the SciPy Python library.
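As a minimal sketch of the three measures named above, the following uses functions from `scipy.spatial.distance`. The two word-count vectors are hypothetical values chosen for illustration and are not taken from the lesson.

```python
from scipy.spatial.distance import cityblock, cosine, euclidean

# Hypothetical word counts for two short texts over the same
# three-word vocabulary (illustrative values only).
text_a = [1, 2, 3]
text_b = [4, 6, 3]

# City block (Manhattan) distance: |1-4| + |2-6| + |3-3| = 7
manhattan = cityblock(text_a, text_b)

# Euclidean distance: sqrt(3**2 + 4**2 + 0**2) = 5.0
straight_line = euclidean(text_a, text_b)

# Cosine distance: 1 minus the cosine of the angle between the vectors,
# so it depends on the direction of the vectors rather than their length.
angle_based = cosine(text_a, text_b)

print(manhattan, straight_line, round(angle_based, 4))
```

Because cosine distance ignores vector length, it may be the more forgiving choice when texts differ greatly in size, whereas the other two measures grow with raw count differences.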

2017
Vol 2 (1)
pp. 6
Author(s):  
Rania Ahmed Kadry Abdel Gawad Birry

Abstract: Alzheimer’s disease (AD) is a brain disease that causes a slow decline in memory, thinking, and reasoning skills, and it represents a major public health problem. Magnetic resonance imaging (MRI) studies have shown that the brains of people with AD shrink significantly as the disease progresses. This shrinkage appears in specific brain regions such as the hippocampus, a small, curved formation in the limbic system that is involved in forming new memories and is associated with learning and emotions. Medical information from brain MRI is used to detect abnormalities in physiological structures. Structural MRI measurements can detect and follow the evolution of brain atrophy, a marker of disease progression, and therefore support the diagnosis and prediction of AD. The main goal of this research is the automatic early recognition of Alzheimer’s disease, so that deterioration to the stage of complete brain damage can be avoided. Alzheimer’s disease produces visible changes in brain structures, and the aim is to determine at an early stage whether a patient has Alzheimer’s disease or is a normal healthy person. First, image pre-processing is applied to the MRI images, including noise reduction, gray-scale conversion, and binary-scale conversion. Feature extraction follows, with data reduction by cropping and by retaining the low-spatial-frequency components of the Discrete Cosine Transform (DCT). Traditional classification techniques (Euclidean Distance, Chebyshev Distance, Cosine Distance, City Block Distance, and a black pixel counter) are then applied to the resulting vectors. The paper aims to recognize and detect Alzheimer’s-affected brains automatically from MRI, without the need for a clinical expert.
This early recognition would help to postpone disease progression and keep the disease at an almost steady stage. On a dataset of 50 MRI scans (25 normal and 25 AD), the Chebyshev Distance classifier yielded the highest success rate in recognizing Alzheimer’s disease, with an accuracy of 94%, compared with 91.6% for Euclidean Distance, 86.8% for Cosine Distance, 89.6% for City Block Distance, 86.4% for Correlation Distance, and 90% for the black pixel counter.
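The distance-based classification step can be illustrated with a nearest-neighbour rule: a query vector receives the label of the closest training vector under a chosen metric. This is only a sketch under stated assumptions, not the author's pipeline. The feature vectors are made-up stand-ins for DCT features, and `nearest_label` is a hypothetical helper name.

```python
import numpy as np
from scipy.spatial.distance import cdist

def nearest_label(train_vectors, train_labels, query, metric):
    """Return the label of the training vector closest to `query`."""
    distances = cdist(np.atleast_2d(query), train_vectors, metric=metric)[0]
    return train_labels[int(np.argmin(distances))]

# Made-up feature vectors standing in for reduced MRI features.
train_vectors = np.array([
    [1.0, 2.0, 3.0],
    [1.5, 2.5, 2.0],
    [6.0, 1.0, 0.5],
    [7.0, 0.5, 1.0],
])
train_labels = ["normal", "normal", "AD", "AD"]
query = np.array([6.2, 1.0, 0.6])

# Metric names accepted by scipy.spatial.distance.cdist.
metrics = ["euclidean", "chebyshev", "cosine", "cityblock", "correlation"]
predictions = {
    metric: nearest_label(train_vectors, train_labels, query, metric)
    for metric in metrics
}
print(predictions)
```

On real data the metrics can disagree, which is why comparing their accuracy on a labelled dataset, as the abstract describes, is informative.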


Author(s):  
Alexander Sadovski

In many fields, including agroclimatology, pedology, plant introduction, environmental health, and agricultural transfer, detecting areas of similar climate is of significant interest. Numerical methods, including cluster analysis, similarity measures, and other techniques, were used to compare climatic data from Icelandic meteorological stations and to classify the stations by homoclimate. Using Euclidean distance and city-block (Manhattan) distance, data from Iceland, Finland, Sweden, Norway, and the US state of Alaska were analyzed to reveal homoclimes. One conclusion of the study is that Iceland has a climate similar to that of Alaska and Norway. Climate change is already affecting agriculture, with effects unevenly distributed across the world. These changes will undoubtedly lead to a reconsideration of which agricultural crops should be allocated to given areas and to an evaluation of bioclimatic resources in territories with similar climates. The results of this study relate to the territory of Iceland, but the approach of classifying meteorological stations by homoclimate and revealing homoclimes in selected territories is applicable anywhere in the world.


2014
Vol 12 (4)
pp. 3373-3381
Author(s):  
Metty Mustikasari
Sarifuddin Madenda

Content-based image retrieval (CBIR) has recently become an active research area. This paper proposes a technique for retrieving images based on color features and evaluates the performance of the retrieval system. In this retrieval system, Euclidean distance and city block distance are used to measure the similarity of images. The algorithm is tested on the Corel image database provided by James Wang. The performance of the retrieval system is measured in terms of recall and precision. Its effectiveness is also measured by the Average Rank (AVRR) of all relevant retrieved images and the Ideal Average Rank of relevant images (IAVRR). The experimental results show that city block distance achieves higher retrieval performance than Euclidean distance.
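A small sketch of the evaluation described above: rank database images by distance to a query, keep the top k, and compute precision and recall against a known set of relevant images. The colour-histogram values, the relevance set, and the helper names `top_k` and `precision_recall` are hypothetical and are not drawn from the paper or the Corel database.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical 3-bin colour histograms for five database images.
database = np.array([
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.20, 0.20, 0.60],
    [0.75, 0.15, 0.10],
])
relevant = {0, 1, 4}                      # assumed ground-truth relevant images
query = np.array([[0.78, 0.12, 0.10]])
k = 3                                     # number of images retrieved

def top_k(metric):
    """Indices of the k database images closest to the query."""
    distances = cdist(query, database, metric=metric)[0]
    return set(np.argsort(distances)[:k].tolist())

def precision_recall(retrieved):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)

for metric in ("cityblock", "euclidean"):
    print(metric, precision_recall(top_k(metric)))
```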


2017
Vol 7 (2)
pp. 21
Author(s):  
Dwi Nugraheny

One way to match the characteristics of images for similarity is to measure the distance between them. Distance is an important aspect in the development of grouping and regression methods. Before data or objects are grouped for detection, the proximity between data elements must first be determined. This study compares several distance measures, namely Euclidean distance, Manhattan (City Block) distance, and Mahalanobis distance, applied to the detection of cumulonimbus clouds in images using Principal Component Analysis (PCA). The average accuracy of image similarity for cumulonimbus clouds was 93 percent with Euclidean distance, 90 percent with Manhattan (City Block) distance, and 50 percent with Mahalanobis distance.
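To show how the three measures compared here differ, the sketch below computes each with `scipy.spatial.distance`. The two feature vectors and the covariance matrix are assumed values for illustration (for example, two PCA scores per image) and do not come from the study.

```python
import numpy as np
from scipy.spatial.distance import cityblock, euclidean, mahalanobis

# Hypothetical 2-D feature vectors, e.g. two PCA scores per image.
u = np.array([0.0, 0.0])
v = np.array([2.0, 0.0])

# Assumed feature covariance: the first feature varies far more than
# the second. SciPy's mahalanobis() expects the INVERSE covariance.
covariance = np.array([[4.0, 0.0],
                       [0.0, 1.0]])
inverse_covariance = np.linalg.inv(covariance)

d_euclidean = euclidean(u, v)                          # sqrt(2**2) = 2.0
d_manhattan = cityblock(u, v)                          # |2| + |0| = 2.0
d_mahalanobis = mahalanobis(u, v, inverse_covariance)  # sqrt(4 * 0.25) = 1.0

print(d_euclidean, d_manhattan, d_mahalanobis)
```

Mahalanobis distance rescales each direction by the data's variance, so its quality depends on how well the covariance is estimated. With few samples that estimate can be unreliable, which is one possible (unconfirmed here) reason for weaker results.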


2020
Author(s):  
Lauren S Aulet ◽  
Stella F. Lourenco

Human and non-human animals have the remarkable capacity to rapidly estimate the quantity of objects in the environment. The dominant view of this ability posits an abstract numerosity code, uncontaminated by non-numerical visual information. The present study provides novel evidence in contradiction to this view by demonstrating that number and cumulative surface area are perceived holistically, classically known as integral dimensions. Whether assessed explicitly (Experiment 1) or implicitly (Experiment 2), perceived similarity for dot arrays that varied parametrically in number and cumulative area was best modeled by Euclidean, as opposed to city-block, distance within the stimulus space, comparable to other integral dimensions (brightness/saturation and radial frequency components), but different from separable dimensions (shape/color and brightness/size). Moreover, Euclidean distance remained the best-performing model, even when compared to models that controlled for other magnitude properties (e.g., density) or image similarity. These findings suggest that numerosity perception entails the obligatory processing of non-numerical magnitude.
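For reference, the two distance models compared in this study are both special cases of the Minkowski metric. The notation below is an illustrative assumption rather than the authors' own formulation: x and y are two stimuli with coordinates along n dimensions, here number and cumulative area.

```latex
d_r(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{n} \lvert x_i - y_i \rvert^{r} \right)^{1/r},
\qquad
r = 1 \ \text{(city-block)}, \quad r = 2 \ \text{(Euclidean)}
```

With r = 1 the differences along each dimension simply add, whereas with r = 2 they combine into a single straight-line distance.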


2021
Vol 25 (01)
pp. 80-91
Author(s):  
Saba K. Naji
Muthana H. Hamd

Rapid advances in electronics have reinforced the need to establish people's identities, and a variety of identification methods and databases have emerged. In this paper, we compare the results of two texture analysis methods: Local Binary Pattern (LBP) and Local Ternary Pattern (LTP). The comparison is based on facial texture features extracted from 40 and 401 subjects taken from the ORL and UFI databases, respectively. It also takes into account three distance measures: Manhattan Distance (MD), Euclidean Distance (ED), and Cosine Distance (CD). The LBP method reaches its maximum accuracy (99.23%) with Manhattan distance on the ORL database, while the LTP method attains 98.76% with the same distance and database. The UFI facial database is of lower quality, giving recognition rates of 75.98% and 73.82% for LBP and LTP, respectively, with Manhattan distance.


Human-computer interaction (HCI) has recently been gaining significance. HCI-based systems have been designed to recognize different facial expressions. Application areas for face recognition include robotics, safety, and surveillance systems. The captured emotions provide valuable information and help in predicting future actions. The primary emotion categories are fear, neutral, sad, surprise, and happy. From a database of still images, features can be obtained using the Gabor Filter (GF) and the Histogram of Oriented Gradient (HOG), and both techniques are used here to extract features for recognizing facial expressions. This paper focuses on the customized classification of GF and HOG features using the KNN classifier. GF provides texture features, whereas HOG is suited to images with differing lighting conditions. The simplicity and linearity of the KNN classifier make it appealing for the present application. The paper also elaborates on the distances used in KNN classifiers, such as city-block, Euclidean, and correlation distance. Matlab implementations of GF, HOG, and KNN are used for feature extraction and classification, respectively. The results show that city-block distance gives higher accuracy than the other distances.


2015
pp. 744-758
Author(s):  
Soma Panja ◽  
Dilip Roy

This chapter examines the closeness between the optimum portfolio and the portfolio selected by an investor who follows a heuristic approach. There are basically two ways of arriving at an optimum portfolio: minimizing the risk or maximizing the return. In this chapter, the authors propose to strike a balance between the two. The optimum portfolio is obtained through a mathematical programming framework that minimizes portfolio risk subject to a return constraint expressed in terms of a coefficient of optimism (a), where a varies between 0 and 1. The authors also develop four heuristic portfolios, for optimistic investors, pessimistic investors, risk planners, and random selectors. Given the optimum portfolio and a heuristic portfolio, City Block Distance is calculated to measure the departure of the heuristic solution from the optimum solution. Based on daily security-wise data for ten companies listed in the Nifty for the years 2004 to 2008, the authors find that when a lies between 0 and 0.5, the pessimistic investor's decision is mostly closest to the optimum solution, and when a is greater than 0.5, the optimistic investor's decision is mostly nearest to the optimum decision. Near a = 0.5, the solutions of the random selectors and risk planners come closer to the optimum decision. This study may help investors make heuristic investment decisions and, based on their value systems, arrive near the optimum solution.
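The departure measure described above can be sketched as the sum of absolute differences between portfolio weights. All weights and the portfolio names `equal_weight` and `tilted` are hypothetical, chosen only to illustrate the arithmetic. They are not the chapter's heuristics or data.

```python
import numpy as np

# Hypothetical weights over four securities; each portfolio sums to 1.
optimum = np.array([0.40, 0.30, 0.20, 0.10])
heuristics = {
    "equal_weight": np.array([0.25, 0.25, 0.25, 0.25]),
    "tilted": np.array([0.35, 0.35, 0.20, 0.10]),
}

def city_block(p, q):
    """City block distance: sum of absolute weight differences."""
    return float(np.abs(p - q).sum())

# Departure of each heuristic portfolio from the optimum.
departures = {name: city_block(optimum, w) for name, w in heuristics.items()}

# The heuristic with the smallest departure is closest to the optimum.
closest = min(departures, key=departures.get)
print(departures, closest)
```

Here `equal_weight` departs by 0.15 + 0.05 + 0.05 + 0.15 = 0.40 and `tilted` by 0.05 + 0.05 = 0.10, so `tilted` is the closer of the two.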


2019
Vol 35 (13)
pp. 1400-1414
Author(s):
Miriam Rodrigues da Silva
Osmar Abílio de Carvalho
Renato Fontes Guimarães
Roberto Arnaldo Trancoso Gomes
Cristiano Rosa Silva
