Distance Measures and Stemming Impact on Arabic Document Clustering

The rapid growth of internet usage has increased the volume of information flow, which leads to the problem of information congestion. In unsupervised learning, clustering is considered the most important problem. Large quantity, high dimensionality, and complicated semantics are the difficult issues of document clustering, which focuses on identifying structure in an unlabeled data collection. Clustering is a method in which data items are identified and grouped based on their resemblance to one another and their dissimilarity from other objects. As for what makes a good clustering, it can be shown that there is no absolute “best” criterion independent of the final objective of the clustering. The primary objective of a good document clustering scheme is to minimize the intra-cluster distance between documents while maximizing the inter-cluster distance (using a suitable document distance measure). A distance measure (or, dually, a measure of resemblance) is therefore at the core of document clustering. This assessment gives an overview of the different methods (Vector Space Model, Latent Semantic Indexing, Latent Dirichlet Allocation, Singular Value Decomposition, Doc2Vec Model, Graph model), distance measures (Euclidean Distance, Cosine Similarity, Jaccard Coefficient, Pearson Correlation Coefficient), and evaluation parameters of document clustering. This work is theoretical in nature and aims to cover the overall procedure of document clustering.
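The four distance and similarity measures named in this abstract can be illustrated with a minimal Python sketch. It assumes documents are already represented as equal-length term-weight vectors (for example, term frequencies); the function names are illustrative and the Jaccard variant shown is the extended (Tanimoto) form for weighted vectors, which may differ from the exact definitions used in the paper.

```python
import math

def euclidean_distance(a, b):
    # Straight-line distance between two term-weight vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Cosine of the angle between the vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def jaccard_coefficient(a, b):
    # Extended Jaccard (Tanimoto) coefficient for weighted vectors:
    # dot / (|a|^2 + |b|^2 - dot). Returns 0.0 for two all-zero vectors
    # (a convention chosen for this sketch).
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0

def pearson_correlation(a, b):
    # Linear correlation of the two vectors; 0.0 if either is constant.
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

# Hypothetical term-frequency vectors for two short documents.
doc1 = [1, 0, 2]
doc2 = [0, 1, 2]
print(euclidean_distance(doc1, doc2))
print(cosine_similarity(doc1, doc2))
print(jaccard_coefficient(doc1, doc2))
print(pearson_correlation(doc1, doc2))
```

Note that Euclidean distance grows with dissimilarity, whereas the other three are similarity scores, so a clustering routine would typically convert them (for instance, using 1 minus the similarity) before minimizing.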

Some Aspects on the Utility of Distance Measures in Comparing Two MROC Curves

Mathematical Journal of Interdisciplinary Sciences ◽

10.15415/mjis.2016.51006 ◽

2016 ◽

Vol 5 (1) ◽

pp. 61-80

Author(s):

SAMEERA G ◽

VISHNU VARDHAN R

Keyword(s):

Distance Measures

Comparision of Different Distance Measure Methods in Text Document Clustering

INTERNATIONAL JOURNAL OF RESEARCH AND ENGINEERING ◽

10.21276/ijre.2018.5.7.2 ◽

2018 ◽

Vol 5 (7) ◽

Author(s):

Yin Min Tun ◽

Keyword(s):

Distance Measure ◽

Document Clustering ◽

Text Document ◽

Measure Methods

An Improved B-hill Climbing Optimization Technique for Solving the Text Documents Clustering Problem

Current Medical Imaging Formerly Current Medical Imaging Reviews ◽

10.2174/1573405614666180903112541 ◽

2020 ◽

Vol 16 (4) ◽

pp. 296-306 ◽

Cited By ~ 3

Author(s):

Laith Mohammad Abualigah ◽

Essam Said Hanandeh ◽

Ahamad Tajudin Khader ◽

Mohammed Abdallh Otair ◽

Shishir Kumar Shandilya

Keyword(s):

Optimization Technique ◽

Document Clustering ◽

Text Clustering ◽

Hill Climbing ◽

Text Documents ◽

Clustering Problem ◽

Text Document ◽

Text Information ◽

Amount Of Knowledge ◽

The Hill

Background: Given the increasing volume of text document information on Internet pages, dealing with such a tremendous amount of knowledge becomes highly complex due to its large size. Text clustering is a common optimization problem used to organize a large amount of text information into a subset of comparable and coherent clusters. Aims: This paper presents a novel local clustering technique, namely, β-hill climbing, to solve the problem of text document clustering by modeling the β-hill climbing technique to partition similar documents into the same cluster. Methods: The β parameter is the primary innovation in the β-hill climbing technique. It has been introduced in order to strike a balance between local and global search. Local search methods, such as the k-medoid and k-means techniques, have been successfully applied to solve the problem of text document clustering. Results: Experiments were conducted on eight benchmark standard text datasets with different characteristics taken from the Laboratory of Computational Intelligence (LABIC). The results showed that the proposed β-hill climbing achieved better results than the original hill climbing technique in solving the text clustering problem. Conclusion: The performance of text clustering improves when the β operator is added to hill climbing.
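The idea of balancing local and global search with a β parameter can be sketched roughly as follows. This is not the authors' implementation: it is a minimal illustration that assumes documents are numeric feature vectors, that a solution is a list assigning each document to one of k clusters, and that the objective is the sum of squared distances to cluster centroids. The names `cost` and `beta_hill_climbing` and all default parameter values are hypothetical.

```python
import random

def cost(assign, points, k):
    # Sum of squared Euclidean distances from each point to its cluster centroid.
    total = 0.0
    for c in range(k):
        members = [p for p, a in zip(points, assign) if a == c]
        if not members:
            continue  # an empty cluster contributes nothing
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum(
            sum((m[d] - centroid[d]) ** 2 for d in range(dim)) for m in members
        )
    return total

def beta_hill_climbing(points, k, beta=0.05, iterations=2000, seed=0):
    rng = random.Random(seed)
    current = [rng.randrange(k) for _ in points]  # random initial assignment
    current_cost = cost(current, points, k)
    for _ in range(iterations):
        candidate = list(current)
        # Local move: reassign one randomly chosen document.
        i = rng.randrange(len(points))
        candidate[i] = rng.randrange(k)
        # Beta operator: with probability beta, reset each position at random,
        # which lets the search occasionally jump away from the current region.
        for j in range(len(candidate)):
            if rng.random() < beta:
                candidate[j] = rng.randrange(k)
        candidate_cost = cost(candidate, points, k)
        # Keep the candidate only if it is at least as good as the current solution.
        if candidate_cost <= current_cost:
            current, current_cost = candidate, candidate_cost
    return current, current_cost

# Hypothetical 2-D "documents" forming two loose groups.
docs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, final_cost = beta_hill_climbing(docs, k=2)
print(labels, final_cost)
```

Setting `beta=0` in this sketch reduces it to plain hill climbing with single-document moves, which is one way to see what the β operator adds.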

Deformable Models for Segmentation Based on Local Analysis

Mathematical Problems in Engineering ◽

10.1155/2017/1646720 ◽

2017 ◽

Vol 2017 ◽

pp. 1-13 ◽

Cited By ~ 1

Author(s):

Jimena Olveres ◽

Erik Carbajal-Degante ◽

Boris Escalante-Ramírez ◽

Enrique Vallejo ◽

Carla María García-Moreno

Keyword(s):

Local Binary Patterns ◽

Distance Measures ◽

Local Analysis ◽

Hermite Transform ◽

Active Shape Models ◽

Shape Models ◽

Segmentation Methods ◽

Segmentation Algorithms ◽

Image Modality ◽

New Algorithms

Segmentation tasks in medical imaging represent an exhaustive challenge for scientists since the image acquisition nature yields issues that hamper the correct reconstruction and visualization processes. Depending on the specific image modality, we have to consider limitations such as the presence of noise, vanished edges, or high intensity differences, known, in most cases, as inhomogeneities. New algorithms in segmentation are required to provide a better performance. This paper presents a new unified approach to improve traditional segmentation methods such as Active Shape Models and the Chan-Vese model based on level sets. The approach introduces a combination of local analysis implementations with classic segmentation algorithms that incorporates local texture information given by the Hermite transform and Local Binary Patterns. The mixture of both region-based methods and local descriptors highlights relevant regions by considering extra information which is helpful to delimit structures. We performed segmentation experiments on 2D images including midbrain in Magnetic Resonance Imaging and heart’s left ventricle endocardium in Computed Tomography. Quantitative evaluation was obtained with Dice coefficient and Hausdorff distance measures. Results display a substantial advantage over the original methods when we include our characterization schemes. We propose further research validation on different organ structures with promising results.
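The two evaluation measures mentioned in this abstract have standard textbook definitions, sketched below in Python. The sketch assumes each segmentation is given as a non-empty collection of pixel coordinates; the function names are illustrative, and the paper's own evaluation code may differ (for example, it may compare contours rather than full regions).

```python
import math

def dice_coefficient(mask_a, mask_b):
    # Dice = 2 * |A ∩ B| / (|A| + |B|); 1.0 means identical regions.
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # convention for two empty masks, chosen for this sketch
    return 2 * len(a & b) / (len(a) + len(b))

def hausdorff_distance(points_a, points_b):
    # Symmetric Hausdorff distance: the largest distance from any point in one
    # set to its nearest point in the other set. Both sets must be non-empty.
    def directed(src, dst):
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(points_a, points_b), directed(points_b, points_a))

# Hypothetical pixel sets for a reference and a predicted segmentation.
reference = {(0, 0), (0, 1)}
predicted = {(0, 1), (0, 2)}
print(dice_coefficient(reference, predicted))
print(hausdorff_distance(reference, predicted))
```

Dice rewards overlap, while the Hausdorff distance penalizes the single worst boundary error, so the two are often reported together.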

A Dynamic Distance Measure of Picture Fuzzy Sets and Its Application

Symmetry ◽

10.3390/sym13030436 ◽

2021 ◽

Vol 13 (3) ◽

pp. 436

Author(s):

Ruirui Zhao ◽

Minxia Luo ◽

Shenggang Li

Keyword(s):

Fuzzy Sets ◽

Distance Measure ◽

Distance Measures ◽

Numerical Comparison ◽

Mathematical Tool ◽

Practical Applications ◽

Fuzzy Point ◽

Point Operator ◽

The Difference ◽

Picture Fuzzy Sets

Picture fuzzy sets, which are the extension of intuitionistic fuzzy sets, can deal with inconsistent information better in practical applications. A distance measure is an important mathematical tool to calculate the difference degree between picture fuzzy sets. Although some distance measures of picture fuzzy sets have been constructed, there are some unreasonable and counterintuitive cases. The main reason is that the existing distance measures do not or seldom consider the refusal degree of picture fuzzy sets. In order to solve these unreasonable and counterintuitive cases, in this paper, we propose a dynamic distance measure of picture fuzzy sets based on a picture fuzzy point operator. Through a numerical comparison and multi-criteria decision-making problems, we show that the proposed distance measure is reasonable and effective.

Identifying longitudinal-growth patterns from infancy to childhood: a study comparing multiple clustering techniques

International Journal of Epidemiology ◽

10.1093/ije/dyab021 ◽

2021 ◽

Author(s):

Paraskevi Massara ◽

Charles D G Keown-Stoneman ◽

Lauren Erdman ◽

Eric O Ohuma ◽

Celine Bourdon ◽

...

Keyword(s):

Latent Class ◽

Growth Patterns ◽

Pattern Detection ◽

Distance Measures ◽

Longitudinal Growth ◽

Important Health ◽

Modelling Techniques ◽

Different Populations ◽

Growth Features ◽

Time Invariant

Background: Most studies on children evaluate longitudinal growth as an important health indicator. Different methods have been used to detect growth patterns across childhood, but with no comparison between them to evaluate result consistency. We explored the variation in growth patterns as detected by different clustering and latent class modelling techniques. Moreover, we investigated how the characteristics/features (e.g. slope, tempo, velocity) of longitudinal growth influence pattern detection. Methods: We studied 1134 children from The Applied Research Group for Kids cohort with longitudinal-growth measurements [height, weight, body mass index (BMI)] available from birth until 12 years of age. Growth patterns were identified by latent class mixed models (LCMM) and time-series clustering (TSC) using various algorithms and distance measures. Time-invariant features were extracted from all growth measures. A random forest classifier was used to predict the identified growth patterns for each growth measure using the extracted features. Results: Overall, 72 TSC configurations were tested. For BMI, we identified three growth patterns by both TSC and LCMM. The clustering agreement was 58% between LCMM and TS clusters, whereas it varied between 30.8% and 93.3% within the TSC configurations. The extracted features (n = 67) predicted the identified patterns for each growth measure with accuracy of 82%–89%. Specific feature categories were identified as the most important predictors for patterns of all tested growth measures. Conclusion: Growth-pattern detection is affected by the method employed. This can impact on comparisons across different populations or associations between growth patterns and health outcomes. Growth features can be reliably used as predictors of growth patterns.

A Novel Validated Injectable Colistimethate Sodium Analysis Combining Advanced Chemometrics and Design of Experiments

Molecules ◽

10.3390/molecules26061546 ◽

2021 ◽

Vol 26 (6) ◽

pp. 1546

Author(s):

Ioanna Dagla ◽

Anthony Tsarbopoulos ◽

Evagelos Gikas

Keyword(s):

Design Of Experiments ◽

High Performance ◽

Similarity Index ◽

Principal Component ◽

Partial Least Square ◽

Least Square ◽

Partial Least Square Regression ◽

Distance Measures ◽

Linear Regression Models ◽

Colistimethate Sodium

Colistimethate sodium (CMS) is widely administered for the treatment of life-threatening infections caused by multidrug-resistant Gram-negative bacteria. Until now, the quality control of CMS formulations has been based on microbiological assays. Herein, an ultra-high-performance liquid chromatography coupled to ultraviolet detector methodology was developed for the quantitation of CMS in injectable formulations. The design of experiments was performed for the optimization of the chromatographic parameters. The chromatographic separation was achieved using a Waters Acquity BEH C8 column employing gradient elution with a mobile phase consisting of (A) 0.001 M aq. ammonium formate and (B) methanol/acetonitrile 79/21 (v/v). CMS compounds were detected at 214 nm. In all, 23 univariate linear-regression models were constructed to measure CMS compounds separately, and one partial least-square regression (PLSr) model was constructed to assess the total CMS amount in formulations. The method was validated over the range 100–220 μg mL−1. The developed methodology was employed to analyze several batches of CMS injectable formulations that were also compared against a reference batch employing a Principal Component Analysis, similarity and distance measures, heatmaps and the structural similarity index. The methodology was based on freely available software in order to be readily available for the pharmaceutical industry.

Deep Multi-view Document Clustering with Enhanced Semantic Embedding

Information Sciences ◽

10.1016/j.ins.2021.02.027 ◽

2021 ◽

Author(s):

Ruina Bai ◽

Ruizhang Huang ◽

Yanping Chen ◽

Yongbin Qin

Keyword(s):

Document Clustering ◽

Semantic Embedding