Combined cosine-linear regression model similarity with application to handwritten word spotting

Author(s):  
Youssef Elfakir ◽  
Ghizlane Khaissidi ◽  
Mostafa Mrabti ◽  
Driss Chenouni ◽  
Manal Boualam

Similarity and distance measures have been widely used to calculate the similarity or dissimilarity between vector sequences. Document image similarity is the domain that deals with image information, and similarity and distance measures play an important role in matching and pattern recognition. There are several types of similarity measures; in this paper, we survey various distance measures used in image matching and explain the limitations of existing distances. We then introduce the concept of a floating distance, which varies the selected threshold for each word in the decision-making process, based on a combination of linear regression and cosine distance. Experiments are carried out on handwritten Arabic document images from the Gallica library. These experiments show that the proposed floating distance outperforms the traditional distance in a word spotting system.
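The abstract combines a cosine distance with a linear regression that sets a per-word ("floating") threshold, but it does not give the exact formulation. The following is a minimal sketch under explicit assumptions: each word image is already a feature vector, the threshold is predicted from a single hypothetical per-word attribute (here called `word_attr`, e.g. word image width), and per-word training thresholds are assumed to be available. It is not the authors' model.

```python
import numpy as np


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance 1 - cos(theta) between two feature vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - float(np.dot(a, b) / denom)


def fit_threshold_model(word_attr: np.ndarray, thresholds: np.ndarray) -> tuple[float, float]:
    """Least-squares fit: threshold ~= slope * attribute + intercept.

    `word_attr` is a hypothetical per-word attribute and `thresholds` are
    per-word thresholds assumed to have been tuned on training data.
    """
    slope, intercept = np.polyfit(word_attr, thresholds, deg=1)
    return float(slope), float(intercept)


def is_match(query_vec: np.ndarray, candidate_vec: np.ndarray,
             query_attr: float, model: tuple[float, float]) -> bool:
    """Accept a candidate if its cosine distance is within the query's floating threshold."""
    slope, intercept = model
    floating_threshold = slope * query_attr + intercept
    return cosine_distance(query_vec, candidate_vec) <= floating_threshold
```

The design point the sketch illustrates is that the decision threshold is a function of the query word rather than one global constant; which attribute drives that function is left as an assumption.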

Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 436
Author(s):  
Ruirui Zhao ◽  
Minxia Luo ◽  
Shenggang Li

Picture fuzzy sets, which are an extension of intuitionistic fuzzy sets, can better handle inconsistent information in practical applications. A distance measure is an important mathematical tool for calculating the degree of difference between picture fuzzy sets. Although several distance measures for picture fuzzy sets have been constructed, they yield unreasonable and counterintuitive results in some cases. The main reason is that existing distance measures rarely, if ever, consider the refusal degree of picture fuzzy sets. To resolve these unreasonable and counterintuitive cases, in this paper we propose a dynamic distance measure of picture fuzzy sets based on a picture fuzzy point operator. Through a numerical comparison and multi-criteria decision-making problems, we show that the proposed distance measure is reasonable and effective.
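To make the role of the refusal degree concrete, here is a minimal sketch of a normalized Hamming-type distance over picture fuzzy values that includes the refusal degree pi = 1 - mu - eta - nu. This is a generic illustration, not the dynamic distance or the picture fuzzy point operator proposed in the paper. Because the four degrees of each element sum to 1, the per-element sum of absolute differences is at most 2, so dividing by 2n keeps the result in [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PictureFuzzyValue:
    mu: float   # positive membership degree
    eta: float  # neutral membership degree
    nu: float   # negative membership degree

    def __post_init__(self):
        if min(self.mu, self.eta, self.nu) < 0 or self.mu + self.eta + self.nu > 1 + 1e-12:
            raise ValueError("need mu, eta, nu >= 0 and mu + eta + nu <= 1")

    @property
    def refusal(self) -> float:
        # refusal degree pi = 1 - mu - eta - nu
        return 1.0 - self.mu - self.eta - self.nu


def hamming_distance(a: list[PictureFuzzyValue], b: list[PictureFuzzyValue]) -> float:
    """Normalized Hamming distance including the refusal degree; result lies in [0, 1]."""
    if len(a) != len(b) or not a:
        raise ValueError("sets must be non-empty and defined on the same universe")
    total = 0.0
    for x, y in zip(a, b):
        total += (abs(x.mu - y.mu) + abs(x.eta - y.eta)
                  + abs(x.nu - y.nu) + abs(x.refusal - y.refusal))
    return total / (2 * len(a))
```

For example, (0.3, 0.2, 0.1) and (0.3, 0.2, 0.5) differ only in the negative degree, but the refusal degree also shifts from 0.4 to 0, and the refusal term contributes half of the resulting distance of 0.4.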


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Shumpei Haginoya ◽  
Aiko Hanayama ◽  
Tamae Koike

Purpose: The purpose of this paper was to compare the accuracy of linking crimes by geographical proximity across three distance measures: Euclidean (the length of a straight line between two locations), Manhattan (the sum of the north-south and east-west distances), and shortest route distance.
Design/methodology/approach: A total of 194 cases committed by 97 serial residential burglars in Aomori Prefecture, Japan, between 2004 and 2015 were used in the present study. The Mann–Whitney U test was used to compare linked (two offenses committed by the same offender) and unlinked (two offenses committed by different offenders) pairs for each distance measure. Discrimination accuracy between linked and unlinked crime pairs was evaluated using the area under the receiver operating characteristic curve (AUC).
Findings: The Mann–Whitney U test showed that the distances of linked pairs were significantly shorter than those of unlinked pairs for all distance measures. Comparison of the AUCs showed that the shortest route distance achieved significantly higher accuracy than the Euclidean distance, whereas there was no significant difference between the Euclidean and Manhattan distances or between the Manhattan and shortest route distances. These findings partially support the idea that distance measures that take environmental factors into account may identify a crime series more accurately than Euclidean distance.
Research limitations/implications: Although the results suggested a difference between the Euclidean and shortest route distances, it was small, and all distance measures yielded outstanding AUC values, probably because of ceiling effects. Further investigation making the same comparison in a narrower area is needed to avoid this potential inflation of discrimination accuracy.
Practical implications: The shortest route distance might help improve the accuracy of crime linkage based on geographical proximity. However, further investigation is needed before its use can be recommended in practice. Given that the target area in the present study was relatively large, the findings may especially help improve, through the selection of a more effective distance measure, the accuracy of proactive comparative case analysis, which estimates the overall distribution of serial crimes in a region.
Social implications: Improving the accuracy of crime linkage may assist crime investigations and lead to the earlier arrest of offenders.
Originality/value: The results of the present study provide an initial indication of the efficacy of distance measures that take environmental factors into account.
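The two straight-line measures and the AUC evaluation described above can be sketched as follows. The sketch assumes crime locations are given as (east, north) coordinates in a projected system in metres; the shortest route distance is omitted because it requires a road network. The AUC is computed as the probability that a randomly chosen linked pair is closer than a randomly chosen unlinked pair (ties count one half), which equals the Mann–Whitney U statistic divided by n1 * n2.

```python
import math
from itertools import product

Point = tuple[float, float]  # (east, north) in projected metres (assumption)


def euclidean(p: Point, q: Point) -> float:
    """Length of the straight line between two locations."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p: Point, q: Point) -> float:
    """East-west separation plus north-south separation."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def linkage_auc(linked: list[float], unlinked: list[float]) -> float:
    """AUC for 'shorter distance means linked'; equals U / (n1 * n2)."""
    if not linked or not unlinked:
        raise ValueError("need at least one linked and one unlinked pair")
    wins = 0.0
    for d_link, d_unlink in product(linked, unlinked):
        if d_link < d_unlink:
            wins += 1.0
        elif d_link == d_unlink:
            wins += 0.5
    return wins / (len(linked) * len(unlinked))
```

The pairwise loop is quadratic, which is adequate for a few hundred pairs; a rank-based computation of U scales better for large samples.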


PLoS ONE ◽  
2021 ◽  
Vol 16 (4) ◽  
pp. e0249624
Author(s):  
C. B. Scott ◽  
Eric Mjolsness

We define a new family of similarity and distance measures on graphs and explore their theoretical properties in comparison to conventional distance metrics. These measures are defined by the solution(s) to an optimization problem that attempts to find a map minimizing the discrepancy between two graph Laplacian exponential matrices, under norm-preserving and sparsity constraints. Variants of the distance metric are introduced to consider such optimized maps under sparsity constraints as well as fixed time-scaling between the two Laplacians. The objective function of this optimization is multimodal and has a discontinuous slope, and is hence difficult for univariate optimizers to solve. We demonstrate a novel procedure for efficiently calculating these optima for two of our distance measure variants. We present numerical experiments demonstrating that (a) upper bounds of our distance metrics can be used to distinguish between lineages of related graphs; (b) our procedure is faster at finding the required optima, by as much as a factor of 10³; and (c) the upper bounds satisfy the triangle inequality exactly under some assumptions and approximately under others. We also derive an upper bound for the distance between two graph products, in terms of the distance between the two pairs of factors. Additionally, we present several possible applications, including the construction of infinite "graph limits" by means of Cauchy sequences of graphs related to one another by our distance measure.
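As a point of reference for the Laplacian exponential matrices mentioned above, the following sketch computes the simpler diffusion distance between two graphs on the same node set, max over t of the Frobenius norm of exp(-t L1) - exp(-t L2), with L = D - A. It assumes a fixed node correspondence and does not search for the optimized map between graphs of different sizes that the paper's measures are built on. The crude grid over t (an assumption, via the hypothetical `times` parameter) stands in for the univariate optimization that the abstract notes is difficult.

```python
import numpy as np


def laplacian(adj: np.ndarray) -> np.ndarray:
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj


def heat_kernel(lap: np.ndarray, t: float) -> np.ndarray:
    """exp(-t L) via the eigendecomposition of the symmetric Laplacian."""
    vals, vecs = np.linalg.eigh(lap)
    # scale column j of V by exp(-t * lambda_j), then multiply by V^T
    return (vecs * np.exp(-t * vals)) @ vecs.T


def diffusion_distance(adj1: np.ndarray, adj2: np.ndarray, times=None) -> float:
    """Max over a time grid of ||exp(-t L1) - exp(-t L2)||_F (same node set assumed)."""
    if adj1.shape != adj2.shape:
        raise ValueError("this sketch only handles graphs with the same number of nodes")
    if times is None:
        times = np.linspace(0.01, 10.0, 200)  # hypothetical search grid
    l1, l2 = laplacian(adj1), laplacian(adj2)
    return float(max(np.linalg.norm(heat_kernel(l1, t) - heat_kernel(l2, t), "fro")
                     for t in times))
```

Because the maximum is taken only over grid points, the returned value is a lower estimate of the true maximum over t.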


2019 ◽  
Vol 0 (0) ◽  
pp. 1-29 ◽  
Author(s):  
Víctor G. Alfaro-García ◽  
José M. Merigó ◽  
Leobardo Plata-Pérez ◽  
Gerardo G. Alfaro-Calderón ◽  
Anna M. Gil-Lafuente

This paper introduces the induced ordered weighted logarithmic averaging distance (IOWLAD) and multiregion induced ordered weighted logarithmic averaging distance (MR-IOWLAD) operators. The distinctive characteristic of these operators lies in combining distance measures with the complex reordering mechanism of inducing variables and the properties of logarithmic averaging operators. The main advantage of the MR-IOWLAD operators is their design, which is specifically intended to aid decision-making when a set of diverse regions with different properties must be considered. Moreover, the induced weighting vector and the distance measure mechanisms of the operators allow for wider modeling of problems, including heterogeneous information and the complex attitudinal character of experts, when aiming for an ideal scenario. Along with analyzing the main properties of the IOWLAD operators, their families, and specific cases, we also introduce some extensions, such as the induced generalized ordered weighted logarithmic averaging distance (IGOWLAD) operator and Choquet integrals. We present the induced Choquet logarithmic distance averaging (ICLD) operator and the generalized induced Choquet logarithmic distance averaging (IGCLD) operator. Finally, an illustrative example is presented, including real-world information retrieved from the United Nations World Statistics for global regions.
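The reordering mechanism of inducing variables can be illustrated with the induced ordered weighted averaging distance (IOWAD), on which induced distance operators of this kind build: individual distances |x_i - y_i| are sorted by decreasing order-inducing value u_i and then aggregated with a weighting vector. This sketch uses a plain weighted arithmetic mean for the aggregation step; the logarithmic averaging that defines IOWLAD is not reproduced here.

```python
def iowad(u: list[float], x: list[float], y: list[float], w: list[float]) -> float:
    """Induced OWA distance.

    u: order-inducing values, x and y: the two sets being compared,
    w: weighting vector (non-negative, summing to 1) applied in induced order.
    """
    n = len(w)
    if not (len(u) == len(x) == len(y) == n):
        raise ValueError("u, x, y and w must have the same length")
    if any(wj < 0 for wj in w) or abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    # positions sorted by decreasing inducing value (stable for ties)
    order = sorted(range(n), key=lambda i: u[i], reverse=True)
    # the j-th weight multiplies the distance of the argument with the j-th largest u
    return sum(wj * abs(x[i] - y[i]) for wj, i in zip(w, order))
```

With equal weights the operator reduces to the mean distance; unequal weights let the inducing variables (for example, an expert's priority ranking of criteria) decide which individual distances dominate.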


2018 ◽  
Vol 7 (4) ◽  
pp. 9 ◽  
Author(s):  
Shakir F. Kak ◽  
Firas M. Mustafa ◽  
Pedro R. Valente

In the recent past, face recognition has been one of the most popular and successful applications of image processing, and it is widely used in security and biometric applications. New approaches to face identification continually aim at building stronger face recognition algorithms. Real-time face recognition has become a fast-growing, challenging, and interesting area. Human face identification is not a trivial task, especially when faces captured under different lighting conditions and poses must be matched. In this study, the proposed method is tested on the benchmark ORL database, which contains 400 images of 40 persons with variations in pose, lighting, etc. The discrete wavelet transform (DWT) is applied to the ORL database to enhance the accuracy and the recognition rate. The best recognition rate obtained is 99.25%, using 9 training images and 1 test image per person with the cosine distance. The recognition rate increased when a 2-level DWT with the bior5.5 filter was applied to the training image database and the test image. PCA is used for feature extraction and dimensionality reduction, and the Euclidean, Manhattan, and cosine distances are used for the matching process.
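The PCA-plus-distance matching stage described above can be sketched as a nearest-neighbour classifier in the PCA subspace. This is a minimal illustration, assuming each image is already flattened to a row vector; the 2-level bior5.5 DWT preprocessing is omitted to keep the example dependency-free, and the number of components `k` is an arbitrary assumption.

```python
import numpy as np


def fit_pca(train: np.ndarray, k: int):
    """train: (n_images, n_pixels), one flattened image per row."""
    mean = train.mean(axis=0)
    # rows of vt are principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:k]


def project(images: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Coordinates of (centered) images in the PCA subspace."""
    return (images - mean) @ components.T


def distance(a: np.ndarray, b: np.ndarray, metric: str) -> float:
    if metric == "euclidean":
        return float(np.linalg.norm(a - b))
    if metric == "manhattan":
        return float(np.abs(a - b).sum())
    if metric == "cosine":
        return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    raise ValueError(f"unknown metric: {metric}")


def recognize(test_img: np.ndarray, train: np.ndarray, labels: np.ndarray,
              k: int = 20, metric: str = "cosine"):
    """Return the label of the training image nearest to test_img in PCA space."""
    mean, comps = fit_pca(train, k)
    train_feats = project(train, mean, comps)
    test_feat = project(test_img[None, :], mean, comps)[0]
    dists = [distance(test_feat, f, metric) for f in train_feats]
    return labels[int(np.argmin(dists))]
```

In an ORL-style split, `train` would hold 9 images per person and `test_img` the remaining one.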


SPE Journal ◽  
2021 ◽  
pp. 1-25
Author(s):  
Chang Gao ◽  
Juliana Y. Leung

Summary: The steam-assisted gravity drainage (SAGD) recovery process is strongly affected by the spatial distribution of heterogeneous shale barriers. Although detailed compositional flow simulators are available for evaluating SAGD recovery performance, the simulations are usually computationally demanding, which makes it impractical to run them over a large number of reservoir models to assess the impacts of heterogeneity (uncertainties). In recent years, data-driven proxies have been widely proposed to reduce the computational effort; nevertheless, such a proxy must be trained on a large data set of flow simulation cases that ideally span the model parameter space. The question remains: is there a more efficient way to screen a large number of heterogeneous SAGD models? Such techniques could help construct a training data set with less redundancy, and they could also be used to quickly identify a subset of heterogeneous models for detailed flow simulation. In this work, we formulated two distance measures, flow-based and static-based, to quantify the similarity among a set of 3D heterogeneous SAGD models. First, to formulate the flow-based distance measure, a physics-based particle-tracking model is used: Darcy's law and energy balance are integrated to mimic the expansion of the steam chamber; steam particles located at the edge of the chamber release their energy to the surrounding cold bitumen, while detailed fluid displacements are not explicitly simulated. The steam chamber evolution is modeled, and the flow-based distance between two reservoir models is defined as the difference in their chamber sizes over time. Second, to formulate the static-based distance, the Hausdorff distance (Hausdorff 1914) is used; it is often used in image processing to compare two images according to the spatial arrangement and shapes of the objects they contain.
A suite of 3D models is constructed using representative petrophysical properties and operating constraints extracted from several pads in Suncor Energy's Firebag project. The computed distance measures are used to partition the models into different groups. To establish a baseline for comparison, flow simulations are performed on these models to predict the actual chamber evolution and production profiles. The groupings obtained with the proposed flow- and static-based distance measures match reasonably well with those obtained from detailed flow simulations, and the proposed techniques achieve a significant improvement in computational efficiency. They can be used to efficiently screen a large number of reservoir models and to cluster these models into groups with distinct shale heterogeneity characteristics. The approach has significant potential to be integrated with other data-driven approaches to reduce the computational load typically associated with detailed flow simulations of multiple heterogeneous reservoir realizations.
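The static-based measure relies on the Hausdorff distance between two point sets: the largest distance from any point of one set to its nearest neighbour in the other. The following sketch assumes, purely for illustration, that each reservoir model is encoded as a boolean 3D grid in which `True` marks a shale cell, so that the point set is the list of shale-cell indices; the paper's actual encoding and any cell-size scaling are not specified in the abstract.

```python
import numpy as np


def hausdorff(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Hausdorff distance between point sets a (m, d) and b (n, d)."""
    if len(a) == 0 or len(b) == 0:
        raise ValueError("point sets must be non-empty")
    # pairwise Euclidean distances, shape (m, n)
    diff = a[:, None, :] - b[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # each directed distance: farthest point of one set from its nearest point in the other
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


def shale_cells(grid: np.ndarray) -> np.ndarray:
    """(i, j, k) indices of cells flagged as shale in a boolean 3D grid (hypothetical encoding)."""
    return np.argwhere(grid).astype(float)
```

The dense pairwise matrix needs O(m * n) memory, which is fine for a sketch; for large grids, SciPy's `scipy.spatial.distance.directed_hausdorff` (applied in both directions) avoids materializing that matrix.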


2019 ◽  
Vol 59 (4) ◽  
pp. 722-741 ◽  
Author(s):  
Paul Phillips ◽  
Nuno Antonio ◽  
Ana de Almeida ◽  
Luís Nunes

This study examines the relationship between distance measures and a Portuguese data set of 34,622 online hotel reviews, written in Portuguese, Spanish, and English and extracted from Booking.com and TripAdvisor. Based on the country of origin of each review author, geographic and psychic distance measures to Portugal are calculated. Data and text mining analysis provides additional insights into online hotel ratings. The authors confirm that online travelers' evaluations are multifaceted constructs displaying varying patterns of rating behavior across the traveler base. By investigating the contemporary relevance of geographic and psychic distance, this study finds that travelers who are closer to Portugal, in both psychic and geographic terms, give lower rating scores than more distant travelers. The inclusion of psychic and geographic distance is advocated as a salient aspect for future researchers and for practitioners who wish to enhance hotel product and service features.
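The abstract does not say how the geographic distance is computed. One common choice, shown here only as an assumption, is the great-circle (haversine) distance between a reference point for the reviewer's country (for example, its capital) and one for Portugal. Psychic distance is not sketched, since its construction is not described.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # clamp guards against rounding pushing h slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))
```

For instance, with Lisbon at about (38.72, -9.14) and Madrid at about (40.42, -3.70), the function gives roughly 500 km; using capitals as country reference points is itself a simplification.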

