Clusterization of Russian Regions by the Level of Mortgage Developing

In this article, with the aim of optimizing economic policy in mortgage housing lending, Russian regions were clustered using the most suitable method and the results were analyzed. The main limitations of applying the popular k-means clustering algorithm to mortgage analysis are considered, and ways to correct them are suggested. The regions were grouped using median- and medoid-based clustering algorithms, which are more robust to outliers. The results of the k-means, k-medians and k-medoids algorithms were compared, and the optimal number of groups of regions with similar mortgage lending indicators was determined, together with a representative region for each group. A hierarchical clustering algorithm based on Ward's method was also applied, which yielded five mortgage clusters in Russia. Studying the characteristics of these groups of regions will help in designing a mortgage policy that takes regional differences within Russia into account. All calculations were performed in the R programming language; graphics were created in the RStudio development environment.
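The abstract's central point, that medians and medoids resist outliers better than means, can be illustrated with a minimal sketch. It is written in Python rather than R and is not the authors' code: the same assign-and-update loop runs with three interchangeable center functions on hypothetical 2-D data containing one outlier. Assumptions: all three variants assign points by Euclidean distance (textbook k-medians uses Manhattan distance), and the medoid step is the simple alternating variant rather than the full PAM swap search.

```python
import math
from statistics import median


def mean_center(pts):
    """k-means update: coordinate-wise arithmetic mean (pulled by outliers)."""
    return tuple(sum(coord) / len(pts) for coord in zip(*pts))


def median_center(pts):
    """k-medians update: coordinate-wise median (resists a minority of outliers)."""
    return tuple(median(coord) for coord in zip(*pts))


def medoid_center(pts):
    """k-medoids update: the member point with the smallest total distance to the others."""
    return min(pts, key=lambda p: sum(math.dist(p, q) for q in pts))


def cluster(points, init_centers, center_fn, max_iter=100):
    """Alternate nearest-center assignment and center updates until the centers stop changing."""
    centers = list(init_centers)
    groups = []
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Keep the old center if a group happens to be empty.
        new_centers = [center_fn(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups


# Hypothetical data: two tight groups plus one extreme outlier at (40, 40).
TOY_POINTS = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9), (40, 40)]
TOY_INIT = [(1, 1), (8, 8)]

if __name__ == "__main__":
    for name, fn in [("k-means", mean_center), ("k-medians", median_center), ("k-medoids", medoid_center)]:
        centers, _ = cluster(TOY_POINTS, TOY_INIT, fn)
        print(name, centers)
```

With these toy data, k-means ends up spending one of its two clusters on the outlier and merges the two genuine groups, while the median and medoid variants keep the genuine groups apart.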

Handling WSD using Hierarchical Clustering Algorithm with sentences

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset1841120 ◽

2018 ◽

pp. 83-88

Author(s):

Mohana Priya K ◽

Pooja Ragavi S ◽

Krishna Priya G

Keyword(s):

Hierarchical Clustering ◽

Similarity Measure ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Cosine Similarity Measure ◽

Hierarchical Clustering Algorithm ◽

Multiple Levels ◽

Pos Tagger ◽

Sentence Clustering ◽

The Right

Clustering is the process of grouping objects into subsets that are meaningful in the context of a particular problem. It does not rely on predefined classes, and it is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed, and which one is used depends on the application. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels to improve accuracy. A POS tagger is used for tagging, and the Porter stemmer for stemming. The WordNet dictionary is used to determine similarity via the Jiang-Conrath and cosine similarity measures. Grouping is performed according to the highest similarity value, using the mean as the threshold. This paper incorporates many parameters for finding the similarity between words. To identify the disambiguated words, sense identification is performed for adjectives and the results are compared. The SemCor and machine learning datasets are employed. Compared with previous WSD results, our work shows a substantial improvement, reaching 91.2%.
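The cosine-similarity and mean-threshold grouping steps can be sketched as follows. This is a simplified stand-in, not the authors' pipeline: sentences become plain bag-of-words vectors with a small hypothetical stop-word list (no POS tagging, stemming, WordNet senses or Jiang-Conrath scores), and any pair whose similarity exceeds the mean pairwise similarity is linked into the same group, which is one assumed reading of "grouping with a mean threshold".

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; a real pipeline would use a POS tagger and stemmer.
STOP_WORDS = {"the", "a", "an", "we", "on", "of"}


def bag_of_words(sentence):
    """Lower-case word counts with stop words removed (no stemming or WordNet senses)."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine of the angle between two sentences' term-frequency vectors."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(count * vb[word] for word, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def group_sentences(sentences):
    """Link every pair whose similarity exceeds the mean pairwise similarity (needs >= 2 sentences)."""
    sims = {(i, j): cosine_similarity(sentences[i], sentences[j])
            for i, j in combinations(range(len(sentences)), 2)}
    threshold = sum(sims.values()) / len(sims)
    parent = list(range(len(sentences)))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), s in sims.items():
        if s > threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(sentences)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


SENTENCES = [
    "The bank approved the loan",
    "The bank raised the loan rate",
    "We sat on the river bank",
    "The river flooded the bank",
]

if __name__ == "__main__":
    print(group_sentences(SENTENCES))
```

In this toy example the two financial sentences and the two river sentences fall into separate groups. Replacing the bag-of-words score with a WordNet-based measure would only change `cosine_similarity`.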

Hesitant Fuzzy Linguistic Agglomerative Hierarchical Clustering Algorithm and Its Application in Judicial Practice

Mathematics ◽

10.3390/math9040370 ◽

2021 ◽

Vol 9 (4) ◽

pp. 370

Author(s):

Shuangsheng Wu ◽

Jie Lin ◽

Zhenyu Zhang ◽

Yushu Yang

Keyword(s):

Hierarchical Clustering ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Agglomerative Hierarchical Clustering ◽

Research Gaps ◽

Judicial Practice ◽

Linguistic Term ◽

Clustering Effect ◽

Hierarchical Clustering Algorithm ◽

Fuzzy Linguistic

The fuzzy clustering algorithm has become a research hotspot in many fields because of its good clustering performance and data expression ability. However, little research has focused on clustering hesitant fuzzy linguistic term sets (HFLTSs). To fill this research gap, we extend clustering to hesitant fuzzy linguistic information and propose a hesitant fuzzy linguistic agglomerative hierarchical clustering algorithm. Furthermore, we propose a hesitant fuzzy linguistic Boole matrix clustering algorithm and compare the two algorithms. The proposed algorithms are applied in the field of judicial execution, where they provide decision support for the executive judge in determining the focus of investigation and control. A clustering example verifies the algorithms' effectiveness in the context of hesitant fuzzy linguistic decision information.

Improved minimum-minimum roughness algorithm for clustering categorical data

International Journal of ADVANCED AND APPLIED SCIENCES ◽

10.21833/ijaas.2021.10.006 ◽

2021 ◽

Vol 8 (10) ◽

pp. 43-50

Author(s):

Truong et al. ◽

Keyword(s):

Machine Learning ◽

Data Mining ◽

Hierarchical Clustering ◽

Categorical Data ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Experimental Results ◽

Data Sets ◽

Top Down ◽

Hierarchical Clustering Algorithm

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One of the successful pioneering algorithms is the Minimum-Minimum Roughness (MMR) algorithm, a top-down hierarchical clustering algorithm that can handle uncertainty in clustering categorical data. However, MMR tends to choose the category whose leaf node has a lower value but contains more objects, which leads to undesirable clustering results. To overcome this shortcoming, this paper proposes an improved version of MMR for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on real data sets from the UCI repository show that IMMR outperforms MMR in clustering categorical data.

A Quantitative Discriminant Method of Elbow Point for the Optimal Number of Clusters in Clustering Algorithm

10.21203/rs.3.rs-58011/v3 ◽

2021 ◽

Author(s):

Congming Shi ◽

Bingtao Wei ◽

Shoulin Wei ◽

Wen Wang ◽

Hai Liu ◽

...

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Optimal Number ◽

Machine Learning Method ◽

Cluster Number ◽

Number Of Clusters ◽

Public Dataset ◽

Optimal Cluster ◽

Better Than ◽

Optimal Number Of Clusters

Clustering, a traditional machine learning method, plays a significant role in data analysis. Most clustering algorithms depend on a predetermined number of clusters, whereas in practice the number of clusters is usually unknown. Although the Elbow method is one of the most commonly used methods for determining the optimal number of clusters, it depends on manually identifying the elbow point on the visualization curve. As a result, even experienced analysts cannot clearly identify the elbow point when the plotted curve is fairly smooth. To solve this problem, a new elbow point discriminant method is proposed that yields a statistical metric for estimating the optimal number of clusters in a dataset. First, the average degree of distortion obtained by the Elbow method is normalized to the range 0 to 10. Second, the normalized results are used to calculate the cosines of the intersection angles between elbow points. Third, these cosines and the arccosine function are used to compute the intersection angles between elbow points. Finally, the index of the minimal intersection angle is used as the estimated optimal number of clusters. Experimental results on simulated datasets and a well-known public dataset (the Iris dataset) demonstrate that the cluster number estimated by the proposed method is better than that of the widely used Silhouette method.
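The four steps can be sketched as below, under stated assumptions: the x axis keeps the raw k values (so k = 1..7 sits on a scale comparable to the 0 to 10 distortion range), the angle at each interior point is measured between the segments to its two neighbors, and the cosine is obtained from a dot product, which is equivalent to the law of cosines. The distortion values are hypothetical; this is an illustration, not the authors' implementation.

```python
import math


def normalize(values, upper=10.0):
    """Min-max scale values to the range [0, upper]; the abstract uses 0 to 10."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("distortion curve is flat; no elbow to find")
    return [upper * (v - lo) / (hi - lo) for v in values]


def elbow_angles(ks, distortions):
    """Angle (in degrees) at each interior point of the normalized elbow curve."""
    ys = normalize(distortions)
    angles = {}
    for i in range(1, len(ks) - 1):
        # Vectors from the current point to its left and right neighbors.
        ax, ay = ks[i - 1] - ks[i], ys[i - 1] - ys[i]
        bx, by = ks[i + 1] - ks[i], ys[i + 1] - ys[i]
        cos_theta = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
        # Clamp to [-1, 1] to guard against rounding before taking arccos.
        angles[ks[i]] = math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
    return angles


def elbow_k(ks, distortions):
    """The k with the sharpest bend, i.e. the smallest interior angle."""
    angles = elbow_angles(ks, distortions)
    return min(angles, key=angles.get)


# Hypothetical average distortions for k = 1..7.
KS = [1, 2, 3, 4, 5, 6, 7]
DISTORTIONS = [100.0, 40.0, 15.0, 12.0, 10.0, 9.0, 8.0]

if __name__ == "__main__":
    print(elbow_angles(KS, DISTORTIONS))
    print(elbow_k(KS, DISTORTIONS))
```

A straight stretch of the curve gives an angle near 180 degrees, so the smallest angle marks the point where the curve bends most sharply.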

Data Analysis Using Representation Theory and Clustering Algorithms

WSEAS TRANSACTIONS ON COMPUTERS ◽

10.37394/23205.2020.19.38 ◽

2021 ◽

Vol 19 ◽

pp. 310-320

Author(s):

Suboh Alkhushayni ◽

Taeyoung Choi ◽

Du’a Alzaleq

Keyword(s):

Data Analysis ◽

Random Forest ◽

Hierarchical Clustering ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Optimal Number ◽

Categorical Variables ◽

Common Disease ◽

Agglomerative Hierarchical Clustering ◽

Data Set

This work aims to expand knowledge in the area of data analysis through both persistent homology and representations of directed graphs. Specifically, we investigated how homology cluster groups can be analyzed using agglomerative hierarchical clustering algorithms and methods. Additionally, the Wine data set, which is available in RStudio, was analyzed using various clustering algorithms, such as hierarchical clustering, K-Means clustering, and PAM clustering. The goal of the analysis was to find out which clustering method is appropriate for a given numerical data set. By testing the data, we tried to find the optimal clustering algorithm among agglomerative hierarchical clustering and three other methods: K-Means, PAM, and Random Forest. By comparing each model's accuracy with the cultivar coefficients, we concluded that K-Means methods are the most helpful when working with numerical variables. On the other hand, PAM clustering and Gower distance with Random Forest are the most beneficial approaches when working with categorical variables. All these tests can determine the optimal number of clusters for a given data set when the proper analysis is done. Our method can be applied to several industrial areas, such as clinical practice and business. For example, in a clinical setting, patients can be grouped by a common disease, required therapy, and other factors. Similarly, in business, clusters can be formed based on marginal profit, marginal cost, or other economic indicators.
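The abstract recommends PAM and Gower-based approaches for categorical variables. Gower's coefficient is a standard dissimilarity for records that mix numeric and categorical fields, and a minimal sketch of its unweighted form is shown below; the resulting distances could then be passed to a medoid-based method such as PAM (not shown). The patient records are hypothetical, and this is not the authors' code.

```python
def column_ranges(rows, kinds):
    """Range (max - min) of each numeric column; None for categorical columns."""
    ranges = []
    for col, kind in enumerate(kinds):
        if kind == "num":
            values = [row[col] for row in rows]
            ranges.append(max(values) - min(values))
        else:
            ranges.append(None)
    return ranges


def gower_distance(a, b, kinds, ranges):
    """Unweighted Gower distance: mean of per-feature dissimilarities, each in [0, 1]."""
    total = 0.0
    for x, y, kind, r in zip(a, b, kinds, ranges):
        if kind == "num":
            # Range-normalized absolute difference; a constant column contributes 0.
            total += abs(x - y) / r if r else 0.0
        else:
            # Simple matching for categories: 0 if equal, 1 otherwise.
            total += 0.0 if x == y else 1.0
    return total / len(kinds)


# Hypothetical patient records: (age, smoker, region).
KINDS = ("num", "cat", "cat")
ROWS = [(30, "yes", "north"), (50, "no", "north"), (40, "yes", "south")]
RANGES = column_ranges(ROWS, KINDS)

if __name__ == "__main__":
    print(gower_distance(ROWS[0], ROWS[1], KINDS, RANGES))
    print(gower_distance(ROWS[0], ROWS[2], KINDS, RANGES))
```

In R, the same measure is available through `cluster::daisy(x, metric = "gower")`, and its output can be given directly to `cluster::pam`.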

Proposed methodology and tools of econophysics to analyze mesodynamics of economic sectors in Russian regions

Economic Analysis Theory and Practice ◽

10.24891/ea.19.7.1192 ◽

2020 ◽

Vol 19 (7) ◽

pp. 1192-1217

Author(s):

V.K. Semenychev ◽

G.A. Khmeleva ◽

A.A. Korobetskaya

Keyword(s):

A Priori ◽

Russian Regions ◽

Medium Term ◽

Distribution Law ◽

Economic Sectors ◽

Component Distribution ◽

R Programming Language ◽

Temporal Features ◽

Bootstrap Approach ◽

R Programming

Subject. The article analyzes the mesodynamics of economic sectors in Russian regions. Objectives. The aim of the study is to monitor, quantitatively and qualitatively, the evolution of components in twelve basic economic sectors from 2005 to 2017, to describe the regions' homogeneity, stability and balance, and to assess their investment prospects. Methods. We employ identification methods, lowess smoothing, and the bootstrap approach to identify models of mesodynamics on samples of 30–50 values, without a priori knowledge of the distribution law of the stochastic component. Results. The findings confirmed the feasibility and adequacy of the econophysics paradigm for mesodynamic analysis. We justified and tested methods for identifying general mesodynamic models using the R programming language. Conclusions. The proposed methodology and tools will make it possible to develop a knowledge base of the fundamental laws of mesodynamics. They will also help reveal spatial and temporal features of the cyclical development of Russian regions, assess the balance and sustainability of regional development in the medium term, group the regions by cycle length, and identify the most promising ones for investment.
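The bootstrap lets one work with 30 to 50 observations without assuming a distribution law. The core idea can be sketched with a percentile bootstrap confidence interval for a sample statistic. This only illustrates resampling with replacement; it is not the authors' mesodynamics identification procedure, and the sample below is synthetic, generated purely for illustration.

```python
import random
from statistics import mean


def bootstrap_ci(sample, stat=mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat`, with no distributional assumption."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record the statistic of each resample.
    replicates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    lower = replicates[int(alpha / 2 * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic sample of 40 values (e.g. annual growth rates of one sector), for illustration only.
_data_rng = random.Random(42)
SAMPLE = [_data_rng.gauss(2.0, 0.5) for _ in range(40)]

if __name__ == "__main__":
    print(mean(SAMPLE), bootstrap_ci(SAMPLE))
```

Any statistic can be passed as `stat`, for example `statistics.median`, which is where the approach pays off for small samples with unknown distributions.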

A SEQUENCE-ELEMENT-BASED HIERARCHICAL CLUSTERING ALGORITHM FOR CATEGORICAL SEQUENCE DATA

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622005001398 ◽

2005 ◽

Vol 04 (01) ◽

pp. 81-96 ◽

Cited By ~ 5

Author(s):

SEUNG-JOON OH ◽

JAE-YEARN KIM

Keyword(s):

Hierarchical Clustering ◽

Clustering Algorithm ◽

Sequence Data ◽

Clustering Algorithms ◽

Scientific Data ◽

Sequence Element ◽

Hierarchical Clustering Algorithm ◽

Synthetic Datasets ◽

Better Than

Recently, there has been enormous growth in the amount of commercial and scientific data, such as protein sequences, retail transactions, and web logs. Such datasets consist of sequence data with an inherent sequential nature. However, few existing clustering algorithms take sequentiality into account. In this paper, we study how to cluster such sequence datasets. We propose a new measure of the similarity between two sequences: subsets of each sequence are considered, and the more identical subsets two sequences share, the more similar they are. In addition, we propose a hierarchical clustering algorithm and an efficient method for measuring similarity. Using a splice dataset and synthetic datasets, we show that the clusters generated by our approach are of higher quality than those produced by traditional clustering algorithms.
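The abstract does not spell out the exact measure, so the sketch below is a hypothetical stand-in in the same spirit, not the authors' measure: every contiguous subsequence is treated as a "subset", and two sequences are scored by the Jaccard overlap of those sets. Elements can be any hashable values (for example product IDs in a retail log). The number of subsequences grows quadratically with length, so this is only suitable for short sequences.

```python
def contiguous_subsequences(seq):
    """All non-empty contiguous runs of `seq`, as tuples (duplicates collapse into one set entry)."""
    items = tuple(seq)
    return {items[i:j] for i in range(len(items)) for j in range(i + 1, len(items) + 1)}


def sequence_similarity(a, b):
    """Jaccard overlap of the two sequences' contiguous subsequences: 1.0 if identical, 0.0 if disjoint."""
    sa, sb = contiguous_subsequences(a), contiguous_subsequences(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    print(sequence_similarity("ABCD", "ABCE"))  # shares A, B, C, AB, BC, ABC
    print(sequence_similarity("ABCD", "DCBA"))  # shares only single elements
```

Because order is encoded in the longer runs, "ABCD" and its reversal share only their single elements and score lower than "ABCD" and "ABCE", which is the sequential sensitivity the abstract asks for.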

A Novel Local Density Hierarchical Clustering Algorithm Based on Reverse Nearest Neighbors

Mathematical Problems in Engineering ◽

10.1155/2019/2959017 ◽

2019 ◽

Vol 2019 ◽

pp. 1-10

Author(s):

Yaohui Liu ◽

Dong Liu ◽

Fang Yu ◽

Zhengming Ma

Keyword(s):

Hierarchical Clustering ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Local Density ◽

Clustering Algorithms ◽

Real Data ◽

Nearest Neighbors ◽

Clustering Methods ◽

Density Peak ◽

Hierarchical Clustering Algorithm

Clustering is widely used in data analysis, and density-based methods have developed rapidly over the past 10 years. Although state-of-the-art density peak clustering algorithms are efficient and can detect clusters of arbitrary shape, they are essentially still centroid-based methods, albeit non-spherical ones. In this paper, a novel local density hierarchical clustering algorithm based on reverse nearest neighbors, RNN-LDH, is proposed. By constructing and using a reverse nearest neighbor graph, the extended core regions are identified as initial clusters. Then a new local density metric is defined to calculate the density of each object; meanwhile, density hierarchical relationships among the objects are built according to their densities and neighbor relations. Finally, each unclustered object is assigned to one of the initial clusters or labeled as noise. Experiments on synthetic and real data sets show that RNN-LDH outperforms current clustering methods based on density peaks or reverse nearest neighbors.
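RNN-LDH starts from reverse nearest neighbors: a point x is a reverse nearest neighbor of y if y appears among x's k nearest neighbors. The sketch below counts these relationships on hypothetical 2-D points and flags candidate core points with the rule "count ≥ k". That threshold is a common heuristic in RNN-based clustering and an assumption here, not the paper's definition of extended core regions.

```python
def knn_lists(points, k):
    """Indices of the k nearest neighbors of each point (squared Euclidean distance, ties by index)."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    return [
        sorted((j for j in range(len(points)) if j != i),
               key=lambda j: sq_dist(points[i], points[j]))[:k]
        for i in range(len(points))
    ]


def reverse_nn_counts(points, k):
    """How many points list each point among their k nearest neighbors."""
    counts = [0] * len(points)
    for neighbors in knn_lists(points, k):
        for j in neighbors:
            counts[j] += 1
    return counts


def core_points(points, k):
    """Hypothetical core rule: a point is core if at least k points have it as a k-NN."""
    return [i for i, c in enumerate(reverse_nn_counts(points, k)) if c >= k]


# Hypothetical data: a small square of points and one isolated point.
RNN_POINTS = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]

if __name__ == "__main__":
    print(reverse_nn_counts(RNN_POINTS, 2))
    print(core_points(RNN_POINTS, 2))
```

The isolated point appears in nobody's neighbor list, which is how reverse-neighbor methods tend to separate noise from dense regions.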

Comparison of Fuzzy Clustering Methods and Their Applications to Geophysics Data

Applied Computational Intelligence and Soft Computing ◽

10.1155/2009/876361 ◽

2009 ◽

Vol 2009 ◽

pp. 1-16 ◽

Cited By ~ 4

Author(s):

David J. Miller ◽

Carl A. Nelson ◽

Molly Boeka Cannon ◽

Kenneth P. Cannon

Keyword(s):

Fuzzy Clustering ◽

Real World ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Optimal Number ◽

Optimum Number ◽

Clustering Methods ◽

Real World Data ◽

Data Set ◽

World Data

Fuzzy clustering algorithms are helpful when a dataset contains subgroupings of points with indistinct boundaries and overlap between the clusters. Traditional methods have been extensively studied and used on real-world data, but they require users to have some a priori knowledge of the outcome in order to determine how many clusters to look for. Iterative algorithms, by contrast, choose the optimal number of clusters based on one of several performance measures. In this study, the authors compare the performance of three algorithms (fuzzy c-means, Gustafson-Kessel, and an iterative version of Gustafson-Kessel) when clustering a traditional data set as well as real-world geophysics data collected from an archaeological site in Wyoming. Areas of interest were identified using a crisp cutoff value as well as a fuzzy α-cut, to determine which provided better elimination of noise and non-relevant points. Results indicate that the α-cut method eliminates more noise than the crisp cutoff values and that the iterative version of the fuzzy clustering algorithm is able to select an optimum number of subclusters within a point set (in both the traditional and real-world data), leading to proper indication of regions of interest for further expert analysis.
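The α-cut idea can be illustrated with a minimal sketch of plain fuzzy c-means on hypothetical 1-D readings: points whose membership falls below α in every cluster are treated as noise. Gustafson-Kessel, which adapts a covariance matrix per cluster, and the iterative selection of the number of clusters are not shown; the fuzzifier m = 2 and α = 0.9 are illustrative choices, not values from the study.

```python
def memberships(x, centers, m):
    """Fuzzy c-means membership of a 1-D point in each cluster (values sum to 1)."""
    d = [abs(x - c) for c in centers]
    if 0.0 in d:
        # The point sits on a center: share membership among coinciding centers only.
        hits = d.count(0.0)
        return [1.0 / hits if di == 0.0 else 0.0 for di in d]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** p for dk in d) for di in d]


def fuzzy_c_means(xs, init_centers, m=2.0, max_iter=100, tol=1e-9):
    """Alternate membership and weighted-center updates until the centers stop moving."""
    centers = list(init_centers)
    for _ in range(max_iter):
        u = [memberships(x, centers, m) for x in xs]
        new_centers = []
        for j in range(len(centers)):
            w = [row[j] ** m for row in u]  # membership weights raised to the fuzzifier
            new_centers.append(sum(wi * x for wi, x in zip(w, xs)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, [memberships(x, centers, m) for x in xs]


def alpha_cut(u, cluster, alpha):
    """Indices whose membership in `cluster` is at least alpha; the rest are treated as noise."""
    return [i for i, row in enumerate(u) if row[cluster] >= alpha]


# Hypothetical 1-D readings: two groups and one ambiguous point midway between them.
READINGS = [0.8, 1.0, 1.2, 3.0, 4.8, 5.0, 5.2]

if __name__ == "__main__":
    centers, u = fuzzy_c_means(READINGS, [0.0, 6.0])
    print(centers)
    print(alpha_cut(u, 0, 0.9), alpha_cut(u, 1, 0.9))
```

The midway reading at 3.0 has membership close to 0.5 in both clusters, so the α-cut keeps it out of both, whereas a crisp nearest-center rule would force it into one of them.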

Overview of R and RStudio

10.1007/978-3-030-80519-7_2 ◽

2021 ◽

pp. 31-47

Author(s):

Joseph F. Hair ◽

G. Tomas M. Hult ◽

Christian M. Ringle ◽

Marko Sarstedt ◽

Nicholas P. Danks ◽

...

Keyword(s):

Programming Language ◽

Graphical User Interface ◽

Computational Statistics ◽

Development Environment ◽

Integrated Development ◽

Vast Array ◽

Popular Method ◽

R Language ◽

R Programming Language ◽

R Programming

Computational statistics is now an increasingly popular method of analysis that combines a vast array of algorithms, statistical methods, and the power of functional coding. The R programming language, in particular, has benefitted from this development alongside traditional graphical user interface (GUI) software. Today, it has become the language of choice for empirical researchers. In this chapter, we introduce the R programming language as well as its popular development environment, RStudio. We walk the reader through downloading both the R language and the RStudio integrated development environment (IDE). Then we discuss the software layout and demonstrate how to interact with the software. Finally, we address creating and managing R projects and scripts and accessing documentation and help from various sources. This chapter is not intended as a tutorial on writing code in the R programming language. We do, however, provide useful open-source resources for learning R, which can be accessed from the R console in the RStudio environment.
