Classification and Clustering Algorithms of Machine Learning with their Applications

Author(s):  
Ravinder Ahuja ◽  
Aakarsha Chug ◽  
Shaurya Gupta ◽  
Pratyush Ahuja ◽  
Shruti Kohli
2021 ◽  
Vol 11 (1) ◽  
Author(s):  
José Castela Forte ◽  
Galiya Yeshmagambetova ◽  
Maureen L. van der Grinten ◽  
Bart Hiemstra ◽  
Thomas Kaufmann ◽  
...  

Abstract: Critically ill patients constitute a highly heterogeneous population, with seemingly distinct patients having similar outcomes, and patients with the same admission diagnosis having opposite clinical trajectories. We aimed to develop a machine learning methodology that identifies and provides better characterization of patient clusters at high risk of mortality and kidney injury. We analysed prospectively collected data including co-morbidities, clinical examination, and laboratory parameters from a minimally-selected population of 743 patients admitted to the ICU of a Dutch hospital between 2015 and 2017. We compared four clustering methodologies and trained a classifier to predict and validate cluster membership. The contribution of different variables to the predicted cluster membership was assessed using SHapley Additive exPlanations values. We found that deep embedded clustering yielded better results than the traditional clustering algorithms. The best cluster configuration was achieved for 6 clusters. All clusters were clinically recognizable, and differed in in-ICU, 30-day, and 90-day mortality, as well as incidence of acute kidney injury. We identified two high mortality risk clusters with at least 60%, 40%, and 30% increased ICU, 30-day, and 90-day mortality, and a low risk cluster with 25–56% lower mortality risk. This machine learning methodology combining deep embedded clustering and variable importance analysis, which we made publicly available, is a possible solution to challenges previously encountered by clustering analyses in heterogeneous patient populations and may help improve the characterization of risk groups in critical care.


2020 ◽  
Vol 5 (6) ◽  
pp. 651-658 ◽  
Author(s):  
Mirpouya Mirmozaffari ◽  
Azam Boskabadi ◽  
Gohar Azeem ◽  
Reza Massah ◽  
Elahe Boskabadi ◽  
...  

Machine learning is growing quickly, has produced numerous academic discoveries, and is extensively evaluated in several areas. Optimization, as a vital part of machine learning, has attracted much attention from practitioners. The primary purpose of this paper is to combine optimization and machine learning to extract hidden rules, remove unrelated data, identify the most productive Decision-Making Units (DMUs) in the optimization part, and identify the algorithm with the highest accuracy in the machine learning part. In the optimization part, we evaluate the productivity of 30 banks from eight developing countries over the period 2015-2019 using Data Envelopment Analysis (DEA). An additive DEA model for measuring the efficiency of decision processes is used. Additive models are often called Slack-Based Measure (SBM) models; this group of models measures efficiency via slack variables. After applying the proposed model, the Malmquist Productivity Index (MPI) is computed to evaluate the productivity of companies. In the machine learning part, we use a specific two-layer data mining filtering pre-process for clustering algorithms to increase efficiency and to find the superior algorithm. This study tackles data- and methodology-related issues in measuring the productivity of banks in developing countries and highlights the significance of DMU productivity and algorithm accuracy in the banking industry by comparing the suggested models.


2021 ◽  
Vol 8 (10) ◽  
pp. 43-50
Author(s):  
Truong et al. ◽  

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One of the successful and pioneering clustering algorithms is the Minimum-Minimum Roughness algorithm (MMR), a top-down hierarchical clustering algorithm that can handle the uncertainty in clustering categorical data. However, MMR tends to choose the category with less value leaf node with more objects, leading to undesirable clustering results. To overcome this shortcoming, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on real data sets taken from UCI show that the IMMR algorithm outperforms MMR in clustering categorical data.


Clustering mixed and incomplete data has been a frequent research goal in recent years because such data commonly appear in soft-science problems. However, there is a lack of studies evaluating the performance of clustering algorithms for this kind of data. In this paper we present an experimental study of the performance of seven clustering algorithms, each of which uses one of these techniques: partitional, hierarchical, or metaheuristic. All the methods were run over 15 databases from the UCI Machine Learning Repository that have mixed and incomplete data descriptions. In external cluster validation using the Entropy and V-Measure indices, the algorithms that use the last technique showed the best results. Thus, we recommend metaheuristic-based clustering algorithms for clustering data having mixed and incomplete descriptions.
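
As an illustration of the external validation mentioned above, the following is a minimal sketch of the V-Measure, the harmonic mean of homogeneity and completeness, written with the standard library only. The function names and the toy labels are assumptions made for this example; they are not taken from the study, and the Entropy index is not shown.

```python
from collections import Counter
from math import log


def _entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """H(target | given), computed from joint label counts."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(classes, clusters):
    """Return (homogeneity, completeness, V-measure) with equal weighting."""
    h_c, h_k = _entropy(classes), _entropy(clusters)
    # Homogeneity: each cluster should contain members of a single class.
    homogeneity = 1.0 if h_c == 0 else 1.0 - _conditional_entropy(classes, clusters) / h_c
    # Completeness: each class should be assigned to a single cluster.
    completeness = 1.0 if h_k == 0 else 1.0 - _conditional_entropy(clusters, classes) / h_k
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v


# Hypothetical ground-truth classes and two candidate clusterings.
true_classes = [0, 0, 1, 1]
perfect = v_measure(true_classes, [1, 1, 0, 0])      # same partition, renamed labels
uninformative = v_measure(true_classes, [0, 1, 0, 1])  # clusters independent of classes
```

Because the score depends only on how the partitions overlap, renaming cluster labels does not change it, which is why it suits comparisons across different clustering algorithms.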


2021 ◽  
Author(s):  
Natacha Galmiche ◽  
Nello Blaser ◽  
Morten Brun ◽  
Helwig Hauser ◽  
Thomas Spengler ◽  
...  

Probability distributions based on ensemble forecasts are commonly used to assess uncertainty in weather prediction. However, interpreting these distributions is not trivial, especially in the case of multimodality with distinct likely outcomes. The conventional summary employs mean and standard deviation across ensemble members, which works well for unimodal, Gaussian-like distributions. In the case of multimodality this misleads, discarding crucial information. We aim at combining previously developed clustering algorithms in machine learning and topological data analysis to extract useful information such as the number of clusters in an ensemble. Given the chaotic behaviour of the atmosphere, machine learning techniques can provide relevant results even if no, or very little, a priori information about the data is available. In addition, topological methods that analyse the shape of the data can make results explainable. Given an ensemble of univariate time series, a graph is generated whose edges and vertices represent clusters of members, including additional information for each cluster such as the members belonging to them, their uncertainty, and their relevance according to the graph. In the case of multimodality, this approach provides relevant and quantitative information beyond the commonly used mean and standard deviation approach that helps to further characterise the predictability.
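
To make the point about multimodality concrete, here is a small sketch with made-up ensemble values at a single time step. It is not the authors' method: the gap-based grouping in `split_by_gaps` and the threshold of 1.0 are assumptions chosen as a simple stand-in for the clustering and topological analysis described above.

```python
from statistics import mean, stdev


def split_by_gaps(values, gap):
    """Group 1-D values; start a new group when sorted neighbours differ by more than `gap`."""
    ordered = sorted(values)
    groups = [[ordered[0]]]
    for v in ordered[1:]:
        if v - groups[-1][-1] > gap:
            groups.append([v])
        else:
            groups[-1].append(v)
    return groups


# Hypothetical ensemble members at one time step: two distinct likely outcomes.
members = [-5.2, -4.8, -5.0, -4.9, 5.1, 4.9, 5.0, 5.2]

ensemble_mean = mean(members)    # lies between the two modes, near zero
ensemble_spread = stdev(members)  # large, but says nothing about the two modes
groups = split_by_gaps(members, gap=1.0)
group_means = [mean(g) for g in groups]
```

In this toy case the ensemble mean sits in a region where no member actually lies, while the per-group means recover the two outcomes.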


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Qingfeng Zhou ◽  
Chun Janice Wong ◽  
Xian Su

Since the number of bicycles is critical to the sustainable development of dockless PBS, this research introduces a machine learning approach to quantity management using OFO bike operation data in Shenzhen. First, two clustering algorithms were used to identify bicycle gathering areas, and the available bike number and the coefficient of variation of the available bike number were analyzed for each type of bicycle gathering area. Second, five classification algorithms were compared on their accuracy in distinguishing the type of bicycle gathering area using 25 impact factors. Finally, the application of the knowledge gained from the existing dockless bicycle operation data to guide the number planning and management of public bicycles was explored. We found the following. (1) There were 492 OFO bicycle gathering areas that can be divided into four types: high inefficient, normal inefficient, high efficient, and normal efficient. The high inefficient and normal inefficient areas gathered about 110,000 bicycles with low usage. (2) Using more types of bicycle gathering area affects the accuracy of the classification algorithm. Random forest classification had the best performance among the five classification algorithms in identifying bicycle gathering area types, with an accuracy of more than 75%. (3) There were obvious differences in the characteristics of the 25 impact factors across the four types of bicycle gathering areas. It is feasible to use these factors to predict area type in order to optimize the number of available bicycles, reduce operating costs, and improve utilization efficiency. This work helps operators and government understand the characteristics of dockless PBS and contributes to promoting long-term sustainable development of the system through a machine learning approach.


2019 ◽  
Vol 488 (1) ◽  
pp. 1377-1386 ◽  
Author(s):  
V Carruba ◽  
S Aljbaae ◽  
A Lucchini

ABSTRACT Asteroid families are groups of asteroids that share a common origin. They can be the outcome of a collision or be the result of the rotational failure of a parent body or its satellites. Collisional asteroid families have been identified for several decades using hierarchical clustering methods (HCMs) in proper element domains. In this method, the distance of an asteroid from a reference body is computed, and, if it is less than a critical value, the asteroid is added to the family list. The process is then repeated with the new object as a reference, until no new family members are found. Recently, new machine-learning clustering algorithms have been introduced for the purpose of cluster classification. Here, we apply supervised-learning hierarchical clustering algorithms for the purpose of asteroid family identification. The accuracy, precision, and recall values of results obtained with the new method, when compared with classical HCM, show that this approach is able to find family members with an accuracy above 89.5 per cent, and that all asteroids previously identified as family members by traditional methods are consistently retrieved. Values of the area-under-the-curve coefficients for Receiver Operating Characteristic curves are also optimal, with values consistently above 85 per cent. Overall, we identify 6 new families and 13 new clumps in regions where the method can be applied that appear to be consistent and homogeneous in terms of physical and taxonomic properties. Machine-learning clustering algorithms can, therefore, be very efficient and fast tools for the problem of asteroid family identification.
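
The classical HCM loop described above (measure distance to a reference body, add the asteroid if it is below a critical value, repeat with each new member as reference) can be sketched as follows. This is an illustrative sketch only: the Euclidean distance, the cutoff, the function name `grow_family`, and the toy proper elements are assumptions, whereas real HCM applications use a velocity-like metric in proper element space. The supervised-learning variant from the abstract is not shown.

```python
from collections import deque
from math import dist


def grow_family(elements, seed, cutoff):
    """Return sorted indices of bodies chained to `seed` by steps shorter than `cutoff`."""
    family = {seed}
    queue = deque([seed])
    while queue:
        ref = queue.popleft()  # current reference body
        for i, body in enumerate(elements):
            # Assumed metric: plain Euclidean distance between element tuples.
            if i not in family and dist(elements[ref], body) < cutoff:
                family.add(i)
                queue.append(i)  # the new member later serves as a reference
    return sorted(family)


# Toy proper elements (a, e, sin i); the values are illustrative, not real asteroids.
bodies = [
    (2.20, 0.10, 0.05),
    (2.21, 0.10, 0.05),
    (2.22, 0.11, 0.05),
    (2.60, 0.20, 0.15),
]
family = grow_family(bodies, seed=0, cutoff=0.015)
```

Note that the third body is farther than the cutoff from the seed itself and joins the family only through the second body, which is the chaining behaviour the abstract describes.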


2012 ◽  
Vol 2 (1) ◽  
pp. 11-20 ◽  
Author(s):  
Ritu Vijay ◽  
Prerna Mahajan ◽  
Rekha Kandwal

Cluster analysis has been extensively used in machine learning and data mining to discover distribution patterns in the data. Clustering algorithms are generally based on a distance metric in order to partition the data into small groups such that data instances in the same group are more similar than instances belonging to different groups. In this paper the authors extend the concept of Hamming distance to categorical data. As a data processing step, they transform the data into a binary representation. The authors use the proposed algorithm to group data points into clusters. The experiments are carried out on data sets from the UCI Machine Learning Repository to analyze performance. They conclude by stating that the proposed algorithm shows promising results and can be extended to handle numeric as well as mixed data.
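
The binary transformation and distance computation described above might look like the sketch below. It uses a plain one-hot encoding and the ordinary Hamming distance as an assumption; the authors' extended distance and their clustering algorithm are not specified in the abstract and are not reproduced here. The records and helper names are hypothetical.

```python
def one_hot(records):
    """Encode categorical records as binary vectors, one bit per (attribute, value) pair."""
    n_attrs = len(records[0])
    # Sorted list of observed values for each attribute, so bit positions are stable.
    vocab = [sorted({r[j] for r in records}) for j in range(n_attrs)]
    return [
        [int(r[j] == v) for j in range(n_attrs) for v in vocab[j]]
        for r in records
    ]


def hamming(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical categorical records: (colour, size, shape).
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "square"),
]
encoded = one_hot(records)
d01 = hamming(encoded[0], encoded[1])  # records differ in one attribute
d02 = hamming(encoded[0], encoded[2])  # records differ in all three attributes
```

With this encoding every attribute on which two records disagree contributes two differing bits, so the distance is proportional to the number of mismatched attributes.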

