A Research Roadmap of Big Data Clustering Algorithms for Future Internet of Things

Author(s):  
Hind Bangui
Mouzhi Ge
Barbora Buhnova

Due to the massive increase of data in different Internet of Things (IoT) domains, such as healthcare IoT and Smart City IoT, Big Data technologies have emerged as critical tools for analyzing IoT data. Among Big Data technologies, data clustering is one of the essential approaches to processing IoT data. However, how to select a suitable clustering algorithm for IoT data is still unclear. Furthermore, since Big Data technologies are still in their initial stage for different IoT domains, it is valuable to identify and structure the research challenges between Big Data and IoT. Therefore, this article starts by reviewing and comparing the data clustering algorithms that can be applied to IoT datasets, and then extends the discussion to broader IoT contexts such as IoT dynamics and IoT mobile networks. Finally, this article identifies a set of research challenges that form a research roadmap for Big Data research in IoT domains. The proposed roadmap aims to bridge the research gaps between Big Data and various IoT contexts.

Author(s):  
Zhanqiu Yu

To explore applications of Internet of Things (IoT) logistics systems, an IoT big data clustering analysis algorithm based on K-means was discussed. First, using complex event relation and processing technology, big data processing in the IoT was transformed into the extraction and analysis of complex relational schemas, which helps reduce the complexity of processing IoT big data. The traditional K-means algorithm was optimized and improved to meet the demands of RFID big data networks, and K-means cluster analysis was implemented on a Hadoop cloud cluster platform. In addition, building on the traditional clustering algorithm, a center-point selection technique suited to RFID IoT data clustering was adopted. The results showed that clustering efficiency improved to some extent. Finally, an RFID IoT clustering analysis prototype system was designed and implemented, which further tested the feasibility of the approach.
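The abstract does not detail the optimized K-means or its center-point selection technique, so the following is only a minimal sketch of how a plain K-means iteration is commonly split into a map phase (assign points to the nearest center and emit partial sums) and a reduce phase (merge partial sums into new centers). It simulates the phases in-process with plain Python; the function names are hypothetical and no Hadoop API is used.

```python
import math
from collections import defaultdict


def nearest(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])))


def map_phase(partition, centers):
    """Mapper: emit (center_id, (partial_sum, count)) for one data partition."""
    dim = len(centers[0])
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for point in partition:
        i = nearest(point, centers)
        sums[i] = [s + p for s, p in zip(sums[i], point)]
        counts[i] += 1
    return [(i, (sums[i], counts[i])) for i in counts]


def reduce_phase(mapped_outputs, centers):
    """Reducer: merge partial sums per center and recompute the means."""
    dim = len(centers[0])
    sums = {i: [0.0] * dim for i in range(len(centers))}
    counts = {i: 0 for i in range(len(centers))}
    for output in mapped_outputs:
        for i, (partial, n) in output:
            sums[i] = [s + p for s, p in zip(sums[i], partial)]
            counts[i] += n
    # A center that received no points keeps its previous position.
    return [[s / counts[i] for s in sums[i]] if counts[i] else list(centers[i])
            for i in range(len(centers))]


def kmeans_mapreduce(partitions, centers, max_iter=20, tol=1e-9):
    """Repeat map and reduce rounds until the centers stop moving."""
    for _ in range(max_iter):
        mapped = [map_phase(part, centers) for part in partitions]
        new_centers = reduce_phase(mapped, centers)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers
```

For example, with two partitions `[[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]` and initial centers `[(0.0, 0.0), (10.0, 10.0)]`, the loop returns `[[0.0, 0.5], [10.0, 10.5]]`. Because mappers emit only per-center sums and counts, the data shuffled to the reducer grows with the number of centers rather than the number of points.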


2019
Vol 29 (1)
pp. 1496-1513
Author(s):  
Omkaresh Kulkarni
Sudarson Jena
C. H. Sanjay

Recent advancements in information technology and the web have increased the volume of data used in day-to-day life. The result is the big data era, in which the complexity of analyzing big data has become a key research issue. This paper presents a technique called FPWhale-MRF for big data clustering using the MapReduce framework (MRF), proposing two clustering algorithms. In FPWhale-MRF, the mapper function estimates the cluster centroids using the Fractional Tangential-Spherical Kernel clustering algorithm, which is developed by integrating fractional theory into a Tangential-Spherical Kernel clustering approach. The reducer combines the mapper outputs to find the optimal centroids using the proposed Particle-Whale (P-Whale) algorithm. The P-Whale algorithm combines the Whale Optimization Algorithm with Particle Swarm Optimization to improve clustering performance. Two datasets, the localization and skin segmentation datasets, are used in the experiments, and performance is evaluated with two metrics: clustering accuracy and the DB-index. The maximum accuracy attained by FPWhale-MRF is 87.91% on the localization dataset and 90% on the skin segmentation dataset, demonstrating its effectiveness in big data clustering.
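The DB-index named above is the Davies-Bouldin index: for each cluster i, let S_i be the average distance of its members to its centroid c_i; then DB = (1/k) * sum over i of max over j != i of (S_i + S_j) / d(c_i, c_j), and lower values indicate more compact, better-separated clusters. A minimal pure-Python sketch of that standard definition with Euclidean distance follows; it is an illustration of the metric, not the evaluation code used for FPWhale-MRF.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index (lower is better), using Euclidean distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("DB-index needs at least two clusters")
    # Centroid c_i and average intra-cluster scatter S_i of every cluster.
    centroids = {k: [sum(col) / len(ms) for col in zip(*ms)]
                 for k, ms in clusters.items()}
    scatter = {k: sum(math.dist(p, centroids[k]) for p in ms) / len(ms)
               for k, ms in clusters.items()}
    # For each cluster, keep its worst (largest) ratio R_ij over the others.
    # Assumes no two centroids coincide (that would divide by zero).
    worst = [max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                 for j in clusters if j != i)
             for i in clusters]
    return sum(worst) / len(worst)
```

For the four points `(0, 0), (0, 2), (10, 0), (10, 2)`, grouping them by x-coordinate gives an index of 0.2, while grouping them by y-coordinate gives 5.0, so the metric correctly prefers the well-separated partition.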


Author(s):  
Peyakunta Bhargavi
Singaraju Jyothi

With recent developments in sensors, remote sensing has become an important source of information for mapping natural and man-made land cover. The increasing amount of hyperspectral data available from AVIRIS, HyMap, and Hyperion for a wide range of applications has, through its volume, velocity, and variety, contributed to the term big data. Sensing is enabled by Wireless Sensor Network (WSN) technologies to infer and understand environmental indicators, from delicate ecologies and natural resources to urban environments. The communication network creates the Internet of Things (IoT), where sensors and actuators blend with the environment around us and information is shared across platforms to develop a common operating picture (COP). With RFID tags and embedded sensor and actuator nodes, this next revolutionary technology is transforming the Internet into a fully integrated Future Internet. This chapter describes the use of Big Data and the Internet of Things for analyzing and designing various systems based on hyperspectral images.


Author(s):  
B. K. Tripathy
Hari Seetha
M. N. Murty

Data clustering plays a very important role in data mining, machine learning, and image processing. Because modern databases have inherent uncertainties, many uncertainty-based data clustering algorithms have been developed. These include fuzzy c-means, rough c-means, and intuitionistic fuzzy c-means, as well as algorithms based on hybrid models, such as rough fuzzy c-means and rough intuitionistic fuzzy c-means. There are also many variants that improve these algorithms in different directions, such as their kernelized, possibilistic, and possibilistic kernelized versions. However, none of the above algorithms is effective on big data, for various reasons. Researchers have therefore spent the past few years adapting these algorithms so that they can be applied to big data, but such algorithms remain relatively few compared with those for datasets of moderate size. The aim of this chapter is to present the uncertainty-based clustering algorithms developed so far and to propose a few new algorithms that can be developed further.
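Fuzzy c-means, the first algorithm listed, gives every point a degree of membership in every cluster instead of a single label. The sketch below follows the standard (Bezdek) alternating updates with fuzzifier m = 2: centers are means weighted by u^m, and memberships are u_ji = 1 / sum_k (d(x_j, c_i) / d(x_j, c_k))^(2/(m-1)). It is a small in-memory illustration under those assumptions, not one of the big-data variants or new algorithms the chapter discusses; the random initialization and the `seed` parameter are choices made here for reproducibility.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Minimal fuzzy c-means; returns (centers, memberships).

    memberships[j][i] is the degree to which point j belongs to cluster i;
    each row sums to 1. The fuzzifier m must be greater than 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    exponent = 2.0 / (m - 1.0)
    centers = []
    for _ in range(max_iter):
        # Center update: mean of all points weighted by membership ** m.
        centers = []
        for i in range(c):
            w = [u[j][i] ** m for j in range(n)]
            w_sum = sum(w)
            centers.append([sum(w[j] * points[j][d] for j in range(n)) / w_sum
                            for d in range(dim)])
        # Membership update: u_ji = 1 / sum_k (d_ji / d_jk) ** (2 / (m - 1)).
        new_u = []
        for p in points:
            dists = [math.dist(p, ctr) for ctr in centers]
            if 0.0 in dists:
                # Point sits exactly on a center: full membership there.
                k = dists.index(0.0)
                new_u.append([1.0 if i == k else 0.0 for i in range(c)])
            else:
                new_u.append([1.0 / sum((dists[i] / dists[k]) ** exponent
                                        for k in range(c))
                              for i in range(c)])
        delta = max(abs(a - b) for old, new in zip(u, new_u)
                    for a, b in zip(old, new))
        u = new_u
        if delta < tol:
            break
    return centers, u
```

Taking the largest membership in each row recovers a hard label when one is needed, while the full rows keep the uncertainty information that motivates this family of algorithms. Every iteration touches all n points and c centers, which is one reason such methods need adaptation before they scale to big data.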


Author(s):  
Ting Xie
Taiping Zhang

As a powerful unsupervised learning technique, clustering is a fundamental task in big data analysis. However, big data are often high-dimensional, sparse, and noisy, and many traditional clustering algorithms perform poorly on such data in terms of both computational efficiency and clustering accuracy. To alleviate these problems, this paper presents a Feature K-means clustering model on the feature space of big data and introduces a fast algorithm for it based on the Alternating Direction Method of Multipliers (ADMM). We show the equivalence of the Feature K-means model in the original space and the feature space, and prove the convergence of its iterative algorithm. Experimentally, we compare Feature K-means with Spherical K-means and Kernel K-means on several benchmark datasets, including artificial data and four face databases. The results show that the proposed approach is comparable to state-of-the-art algorithms in big data clustering.
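Spherical K-means, one of the baselines in the comparison, is a standard K-means variant for high-dimensional sparse data: vectors are scaled to unit length, each is assigned to the center with the highest cosine similarity, and each center becomes the renormalized sum of its members. The sketch below illustrates only that baseline; the proposed Feature K-means model and its ADMM solver are not reproduced. Passing initial centers explicitly is an assumption made here to keep the example deterministic.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(points, init_centers, max_iter=50):
    """Spherical K-means: returns (labels, unit-length centers)."""
    data = [normalize(p) for p in points]  # assumes no all-zero rows
    centers = [normalize(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: most similar center by cosine, not Euclidean distance.
        new_labels = [max(range(len(centers)), key=lambda i: cosine(x, centers[i]))
                      for x in data]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: renormalized sum of each cluster's unit vectors.
        for i in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == i]
            total = [sum(col) for col in zip(*members)]
            if any(total):  # empty or cancelling clusters keep their old center
                centers[i] = normalize(total)
    return labels, centers
```

Because only direction matters, `(1.0, 0.1)` and `(5.0, 0.4)` land in the same cluster despite their different magnitudes, which is why this variant is popular for sparse document or feature vectors.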


2019
Vol 2019
pp. 1-20
Author(s):  
Ameera M. Almasoud
Hend S. Al-Khalifa
Abdulmalik S. Al-Salman

In biology, researchers need to compare genes or gene products using semantic similarity measures (SSMs). Continuous data growth and diversity in data characteristics constitute what is called big data, and current biological SSMs cannot handle big data. These measures therefore need to be able to cope with data at this scale. We used parallel and distributed processing by splitting the data into multiple partitions and applying SSMs to each partition; this approach helped address big data scalability and computational problems. Our solution involves three steps: splitting the gene ontology (GO), clustering the data, and calculating semantic similarity. To test this method, split-GO and data clustering algorithms were defined, and their performance was assessed in the first two steps. Three of the best SSMs in biology [Resnik, Shortest Semantic Differentiation Distance (SSDD), and SORA] are enhanced by introducing threaded parallel processing, which is used in the third step. Our results demonstrate that introducing threads into SSMs reduced the time needed to calculate semantic similarity between gene pairs and improved the performance of the three SSMs. Average time was reduced by 24.51% for Resnik, 22.93% for SSDD, and 33.68% for SORA. Total time was reduced by 8.88% for Resnik, 23.14% for SSDD, and 39.27% for SORA. Using these threaded measures in the distributed system, combined with the split-GO and data clustering algorithms to split input data based on their similarity, reduced the average time more than dividing the input data equally. Time reduction increased with the number of splits. The time reduction percentage was 24.1%, 39.2%, and 66.6% for Threaded SSDD and 33.0%, 78.2%, and 93.1% for Threaded SORA with 2, 3, and 4 slaves, respectively, and 92.04% for Threaded Resnik with four slaves.
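Resnik's measure, one of the three SSMs enhanced in the study, scores two ontology terms by the information content IC(t) = -log p(t) of their most informative common ancestor. The sketch below uses a hypothetical toy ontology (`PARENTS`) and hypothetical propagated annotation counts (`COUNTS`), not Gene Ontology data, and scores term pairs through a thread pool to mirror, in spirit, the threaded parallel processing described above. It works at the term level only (gene-level scores usually aggregate over many term pairs) and is not the authors' implementation.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy ontology: term -> list of parent terms (a DAG).
PARENTS = {"root": [], "A": ["root"], "B": ["root"],
           "A1": ["A"], "A2": ["A"], "B1": ["B"]}
# Hypothetical annotation counts, already propagated up to ancestors.
COUNTS = {"root": 100, "A": 40, "B": 60, "A1": 10, "A2": 20, "B1": 30}


def ancestors(term):
    """All ancestors of `term`, including the term itself."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def information_content(term):
    """IC(t) = -log p(t) = log(N_root / N_t), with p estimated from counts."""
    return math.log(COUNTS["root"] / COUNTS[term])


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(information_content(t) for t in ancestors(t1) & ancestors(t2))


def threaded_similarities(pairs, workers=4):
    """Score term pairs concurrently with a thread pool (input order preserved)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda pair: resnik(*pair), pairs))
```

Here `resnik("A1", "A2")` is log(100 / 40), because `A` is their most informative shared ancestor, while `resnik("A1", "B1")` is 0.0 because they share only the root. In CPython, threads speed up pure-Python CPU-bound work only when it releases the interpreter lock or waits on I/O, so the pool above shows the structure of the parallelization rather than a guaranteed speed-up.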

