Practical Application Using the Clustering Algorithm

2021 ◽  
Author(s):  
Yoosoo Oh ◽  
Seonghee Min

This chapter surveys clustering algorithms, a form of unsupervised learning among data mining and machine learning techniques. The most popular clustering algorithm is K-means, which represents data as a set of clusters. For the K-means algorithm, finding an appropriate K value for partitioning the training dataset is essential; this value is commonly found experimentally. The elbow method, a heuristic for determining the number of clusters, can also be used. One current applied study is a particulate matter concentration clustering algorithm for estimating particulate matter distribution. The algorithm divides the area around the centers of the fine-dust distribution using K-means clustering and then finds the coordinates of the optimal point according to the distribution of particulate matter values. The training dataset consists of the latitude and longitude of each observatory and its PM10 value, obtained from the AirKorea website provided by the Korea Environment Corporation. The study performed K-means clustering on these feature datasets and experimented with K values to represent the clusters better, varying K from 10 to 23. It then generated 16 labels corresponding to 16 cities in Korea and compared them to the clustering result. Visualizing the clusters on an actual map confirmed whether the clusters of each city were evenly bound. Moreover, the study computed the cluster centers to find the observatory locations that best represent the particulate matter distribution.
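As an illustration of the elbow procedure described above, here is a minimal scikit-learn sketch. The synthetic 2-D blobs stand in for the study's (latitude, longitude, PM10) features, which are not reproduced here, and all parameter choices are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D points stand in for the study's observatory features.
X, _ = make_blobs(n_samples=400, centers=16, cluster_std=0.8, random_state=0)

# Inertia (within-cluster sum of squared distances) for each candidate K;
# the "elbow" of this curve suggests an appropriate number of clusters.
inertias = {}
for k in range(10, 24):  # the study varied K from 10 to 23
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_
# km.cluster_centers_ would then give candidate representative locations.
```

In practice one plots `inertias` against K and picks the K where the curve bends; inertia always decreases with K, so the smallest value alone is not informative.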

2019 ◽  
Vol 9 (19) ◽  
pp. 4036 ◽  
Author(s):  
You ◽  
Wu ◽  
Lee ◽  
Liu

Multi-class classification is a very important technique in engineering applications, e.g., mechanical systems, mechanics and design innovations, and applied materials in nanotechnologies. A large amount of research has been done on single-label classification, where objects are associated with a single category. However, in many application domains an object can belong to two or more categories, and multi-label classification is needed. Traditionally, statistical methods were used; recently, machine learning techniques, in particular neural networks, have been proposed to solve the multi-class classification problem. In this paper, we develop radial basis function (RBF)-based neural network schemes for single-label and multi-label classification, respectively. The number of hidden nodes and the parameters of the basis functions are determined automatically by applying an iterative self-constructing clustering algorithm to the given training dataset, and biases and weights are derived optimally by least squares. Dimensionality reduction techniques are adopted and integrated to help reduce the overfitting problem associated with RBF networks. Experimental results on benchmark datasets are presented to show the effectiveness of the proposed schemes.
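The core of such a scheme can be sketched as follows. Note that plain KMeans stands in here for the paper's iterative self-constructing clustering, and the dataset (Iris), number of centers, and basis width are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
Y = np.eye(3)[y]  # one-hot targets for single-label classification

# Hidden centers from clustering (stand-in for self-constructing clustering).
centers = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X).cluster_centers_
width = 1.0  # basis-function width; tuned in practice

# Gaussian basis activations, plus a bias column.
D = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
H = np.exp(-(D ** 2) / (2 * width ** 2))
H = np.hstack([H, np.ones((len(X), 1))])

# Output weights and biases derived optimally by least squares.
W, *_ = np.linalg.lstsq(H, Y, rcond=None)
pred = (H @ W).argmax(axis=1)
train_acc = (pred == y).mean()
```

The least-squares step is a closed-form fit of the output layer, which is what makes RBF networks attractive once the centers are fixed.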


2015 ◽  
Vol 713-715 ◽  
pp. 2499-2502
Author(s):  
Jiang Kun Mao ◽  
Fan Zhan

Intrusion detection systems, as a proactive network security technology, are a necessary and reasonable addition to static defenses. However, traditional anomaly and misuse detection suffers from missed alarms, high false-alarm rates, and maintenance difficulty. In this paper, an intrusion detection system based on data mining, combining statistics and machine learning techniques, offers great advantages in detection performance, robustness, and self-adaptability. The system improves the K-means clustering algorithm, focusing on two questions: the selection of cluster center nodes and the discrimination of clustering properties. Tests show that the system further enhances detection efficiency.
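The abstract does not specify how the cluster center selection is improved. One widely used remedy for poor initial center choice is distance-weighted seeding in the style of k-means++, sketched below on synthetic data; the function name and parameters are illustrative, not the paper's method.

```python
import numpy as np

def seeded_centers(X, k, rng):
    """k-means++-style seeding: choose each new center with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

rng = np.random.default_rng(0)
# Three well-separated synthetic "traffic feature" blobs.
X = np.vstack([rng.normal(m, 0.3, size=(50, 2))
               for m in ([0, 0], [5, 5], [0, 5])])
centers = seeded_centers(X, 3, rng)
```

Because already-chosen points have zero squared distance, they can never be re-selected, and far-apart regions are favored as seeds.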


Author(s):  
Jong-Yong Lee ◽  
Daesung Lee

Since it is very difficult to replace or recharge the batteries of sensor nodes in a wireless sensor network (WSN), efficient use of the nodes' batteries is a very important issue, deeply related to the lifetime of the network. If a node's energy is exhausted, the node is no longer available, and if a certain fraction of the nodes in a network (50% or 80%) consume their energy completely, the whole network stops working. Therefore, various protocols have been proposed to maintain the network for a long time by minimizing energy consumption. In recent years, a protocol using the K-means clustering algorithm, one of the machine learning techniques, has been proposed. The KCED protocol is proposed here, which considers the residual energy of each node, the cluster center, and the distance to the base station, in order to remedy problems of protocols based on the K-means clustering algorithm, such as how the cluster center is taken into account.
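The KCED scoring itself is not given in this abstract. The following is a hypothetical sketch of how the three factors it names (residual energy, distance to the cluster center, distance to the base station) might be combined into a cluster-head score; all weights, coordinates, and names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
positions = rng.uniform(0, 100, size=(20, 2))  # sensor node coordinates (m)
energy = rng.uniform(0.2, 1.0, size=20)        # residual energy per node (J)
center = positions.mean(axis=0)                # cluster center of these nodes
base_station = np.array([50.0, 175.0])         # base station outside the field

d_center = np.linalg.norm(positions - center, axis=1)
d_bs = np.linalg.norm(positions - base_station, axis=1)

# Higher residual energy and smaller distances give a higher score;
# the weights below are purely illustrative.
score = energy - 0.005 * d_center - 0.002 * d_bs
head = int(np.argmax(score))  # node elected cluster head this round
```

Re-running such an election each round spreads the energy cost of long-range transmission to the base station across the cluster's nodes.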


2021 ◽  
Vol 251 ◽  
pp. 03013
Author(s):  
Leonardo Cristella ◽  

To sustain the harsher conditions of the high-luminosity LHC, the CMS collaboration is designing a novel endcap calorimeter system. The new calorimeter will predominantly use silicon sensors to achieve sufficient radiation tolerance and will maintain highly-granular information in the readout to help mitigate the effects of pileup. In regions characterised by lower radiation levels, small scintillator tiles with individual on-tile SiPM readout are employed. A unique reconstruction framework (TICL: The Iterative CLustering) is being developed to fully exploit the granularity and other significant detector features, such as particle identification and precision timing, with a view to mitigating pileup in the very dense environment of the HL-LHC. The inputs to the framework are clusters of energy deposited in individual calorimeter layers, formed by a density-based algorithm. Recent developments and tuning of the clustering algorithm will be presented. To help reduce the expected pressure on computing resources in the HL-LHC era, the algorithms and their data structures are designed to be executed on GPUs. Preliminary results will be presented on the decrease in clustering time when using GPUs versus CPUs. Ideas for machine-learning techniques to further improve the speed and accuracy of the reconstruction algorithms will be presented.
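As a toy illustration of density-based layer clustering, the sketch below attaches each hit to its nearest higher-density neighbour, in the spirit of density-peak methods; the actual TICL clustering algorithm and its parameters differ, and the 2-D "hits" here are synthetic.

```python
import numpy as np

def density_cluster(points, dc=1.0):
    """Toy density-based clustering: estimate a smooth local density per
    point, then attach each point to its nearest higher-density neighbour;
    points with no nearby higher-density neighbour seed new clusters."""
    d = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    rho = np.exp(-(d / dc) ** 2).sum(axis=1)   # smooth local density
    order = np.argsort(-rho)                   # densest points first
    parent = np.arange(len(points))
    for i in order:
        higher = np.where(rho > rho[i])[0]
        if len(higher):
            j = higher[np.argmin(d[i, higher])]
            if d[i, j] < 2 * dc:               # otherwise i seeds a cluster
                parent[i] = j
    labels = -np.ones(len(points), dtype=int)
    n_clusters = 0
    for i in order:                            # parents are labelled first
        if parent[i] == i:
            labels[i] = n_clusters
            n_clusters += 1
        else:
            labels[i] = labels[parent[i]]
    return labels

rng = np.random.default_rng(0)
hits = np.vstack([rng.normal([0, 0], 0.3, (40, 2)),
                  rng.normal([5, 5], 0.3, (40, 2))])
labels = density_cluster(hits, dc=1.0)
```

The all-pairs distance matrix makes this O(n²), which is exactly the kind of structure that maps naturally onto GPU execution, as the abstract notes.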


2003 ◽  
Vol 23 (17-19) ◽  
pp. 1787-1809 ◽  
Author(s):  
A.P. Karageorgis ◽  
H.G. Kaberi ◽  
A. Tengberg ◽  
V. Zervakis ◽  
P.O.J. Hall ◽  
...  

2020 ◽  
Vol 9 (6) ◽  
pp. 379 ◽  
Author(s):  
Eleonora Grilli ◽  
Fabio Remondino

The use of machine learning techniques for point cloud classification has been investigated extensively in the last decade in the geospatial community, while in the cultural heritage field it has only recently started to be explored. The high complexity and heterogeneity of 3D heritage data, the diversity of possible scenarios, and the different classification purposes each case study might present make it difficult to assemble a large training dataset for learning purposes. An important practical issue that has not yet been explored is the application of a single machine learning model across large and different architectural datasets. This paper tackles this issue by presenting a methodology able to successfully generalise a random forest model, trained on a specific dataset, to unseen scenarios. This is achieved by looking for the best features suitable to identify the classes of interest (e.g., wall, windows, roof and columns).
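The generalisation idea can be illustrated with a toy sketch: train a random forest on simple geometric features of one synthetic wall-plus-roof cloud and evaluate it on a second, unseen cloud. The features used here (height above base, a PCA-based verticality estimate) and all data are illustrative stand-ins for the paper's actual features and heritage datasets.

```python
import numpy as np
from scipy.spatial import cKDTree
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def make_cloud(n):
    """Synthetic building fragment: a vertical wall and a horizontal roof."""
    wall = np.c_[rng.uniform(0, 10, n), np.zeros(n), rng.uniform(0, 3, n)]
    roof = np.c_[rng.uniform(0, 10, n), rng.uniform(0, 5, n), np.full(n, 3.0)]
    pts = np.vstack([wall, roof]) + rng.normal(0, 0.03, (2 * n, 3))
    return pts, np.r_[np.zeros(n, int), np.ones(n, int)]  # 0 = wall, 1 = roof

def geom_features(pts, k=10):
    z = pts[:, 2] - pts[:, 2].min()          # height above the cloud's base
    _, idx = cKDTree(pts).query(pts, k=k)    # k nearest neighbours per point
    vert = np.empty(len(pts))
    for i, nb in enumerate(idx):
        _, vecs = np.linalg.eigh(np.cov(pts[nb].T))
        vert[i] = abs(vecs[2, 0])  # |z| of the estimated surface normal
    return np.c_[z, vert]

X_train, y_train = make_cloud(300)
X_test, y_test = make_cloud(300)  # a second, unseen cloud
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(geom_features(X_train), y_train)
acc = (clf.predict(geom_features(X_test)) == y_test).mean()
```

Because the features are geometric rather than dataset-specific, the model carries over to the second cloud; this mirrors, in miniature, the paper's strategy of choosing features that transfer across buildings.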

