A Research Roadmap of Big Data Clustering Algorithms for Future Internet of Things

Hind Bangui; Mouzhi Ge; Barbora Buhnova

doi:10.4018/ijoci.2019040102


International Journal of Organizational and Collective Intelligence ◽

10.4018/ijoci.2019040102 ◽

2019 ◽

Vol 9 (2) ◽

pp. 16-30 ◽

Cited By ~ 1

Author(s):

Hind Bangui ◽

Mouzhi Ge ◽

Barbora Buhnova

Keyword(s):

Big Data ◽

Internet Of Things ◽

Mobile Networks ◽

Data Clustering ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Future Internet ◽

Research Challenges ◽

Initial Stage ◽

Big Data Technologies

Due to the massive growth of data in Internet of Things (IoT) domains such as healthcare IoT and Smart City IoT, Big Data technologies have emerged as critical tools for analyzing IoT data. Among the Big Data technologies, data clustering is one of the essential approaches to processing IoT data. However, how to select a suitable clustering algorithm for IoT data is still unclear. Furthermore, since Big Data technologies are still in their initial stage in different IoT domains, it is valuable to propose and structure the research challenges between Big Data and IoT. Therefore, this article starts by reviewing and comparing the data clustering algorithms that can be applied to IoT datasets, and then extends the discussion to a broader IoT context such as IoT dynamics and IoT mobile networks. Finally, this article identifies a set of research challenges that form a research roadmap for Big Data research in IoT domains. The proposed research roadmap aims at bridging the research gaps between Big Data and various IoT contexts.


Big Data Clustering Analysis Algorithm for Internet of Things Based on K-Means

International Journal of Distributed Systems and Technologies ◽

10.4018/ijdst.2019010101 ◽

2019 ◽

Vol 10 (1) ◽

pp. 1-12 ◽

Cited By ~ 2

Author(s):

Zhanqiu Yu

Keyword(s):

Big Data ◽

Internet Of Things ◽

Clustering Analysis ◽

Data Clustering ◽

Clustering Algorithm ◽

Prototype System ◽

Point Selection ◽

Logistics System ◽

Relational Schema ◽

Analysis Algorithm

To explore applications of Internet of Things (IoT) logistics systems, an IoT big data clustering analysis algorithm based on K-means is discussed. First, using complex event relations and processing technology, IoT big data processing is transformed into the extraction and analysis of complex relational schemas, which helps reduce the processing complexity of IoT big data. The traditional K-means algorithm is optimized and improved to fit the demands of big data in RFID data networks. A K-means cluster analysis is implemented on a Hadoop cloud cluster platform. In addition, building on the traditional clustering algorithm, a center point selection technique suitable for RFID IoT data clustering is chosen. The results show that the clustering efficiency is improved to some extent. Finally, an RFID IoT clustering analysis prototype system is designed and implemented, which further tests the feasibility of the approach.
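As a rough illustration of how center point selection feeds into K-means, the following minimal sketch seeds the centers with a farthest-point heuristic and then runs the standard assign-and-update loop. The abstract does not say which selection technique the paper uses, so the heuristic here, along with the names `pick_centers` and `kmeans`, is an assumption made purely for illustration.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pick_centers(points, k):
    """Assumed seeding heuristic (not the paper's): start from the first
    point, then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(farthest)
    return centers


def kmeans(points, k, max_iter=100):
    """Plain K-means: assign each point to its nearest center, then move
    each center to the mean of its members, until nothing changes."""
    centers = pick_centers(points, k)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

Spreading the initial centers apart tends to reduce the number of iterations on well-separated data, which is one plausible reading of the efficiency gain the abstract reports.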


Exploring Big Data Clustering Algorithms for Internet of Things Applications

Proceedings of the 3rd International Conference on Internet of Things, Big Data and Security ◽

10.5220/0006773402690276 ◽

2018 ◽

Cited By ~ 4

Author(s):

Hind Bangui ◽

Mouzhi Ge ◽

Barbora Buhnova

Keyword(s):

Big Data ◽

Internet Of Things ◽

Data Clustering ◽

Clustering Algorithms


Fractional Fuzzy Clustering and Particle Whale Optimization-Based MapReduce Framework for Big Data Clustering

Journal of Intelligent Systems ◽

10.1515/jisys-2018-0117 ◽

2019 ◽

Vol 29 (1) ◽

pp. 1496-1513 ◽

Cited By ~ 1

Author(s):

Omkaresh Kulkarni ◽

Sudarson Jena ◽

C. H. Sanjay

Keyword(s):

Big Data ◽

Data Clustering ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Mapreduce Framework ◽

Swarm Optimization ◽

Skin Segmentation ◽

Kernel Clustering ◽

Whale Optimization ◽

Clustering Approach

Recent advancements in information technology and the web have increased the volume of data used in day-to-day life. The result is a big data era, which has become a key research issue because big data is complex to analyze. This paper presents a technique called FPWhale-MRF for big data clustering using the MapReduce framework (MRF), built on two proposed clustering algorithms. In FPWhale-MRF, the mapper function estimates the cluster centroids using the Fractional Tangential-Spherical Kernel clustering algorithm, which is developed by integrating fractional theory into a Tangential-Spherical Kernel clustering approach. The reducer combines the mapper outputs to find the optimal centroids using the proposed Particle-Whale (P-Whale) algorithm. The P-Whale algorithm combines the Whale Optimization Algorithm with Particle Swarm Optimization to improve clustering performance. Two datasets, namely the localization and skin segmentation datasets, are used for the experiments, and performance is evaluated with two metrics: clustering accuracy and DB-index. The maximum accuracy attained by the proposed FPWhale-MRF technique is 87.91% and 90% for the localization and skin segmentation datasets, respectively, which supports its effectiveness in big data clustering.
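The mapper and reducer division of labor can be sketched in a few lines. The example below does not reproduce the Fractional Tangential-Spherical Kernel clustering or the P-Whale optimizer; it swaps in a plain K-means update so that only the MapReduce data flow is visible. All function names are hypothetical, and the partitions are ordinary in-memory lists rather than distributed splits.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((x - c) ** 2 for x, c in zip(point, centroids[j])),
    )


def map_partition(partition, centroids):
    """Mapper: for one data partition, emit per-cluster (vector sum, count)."""
    partial = {}
    for p in partition:
        j = nearest(p, centroids)
        total, count = partial.get(j, ([0.0] * len(p), 0))
        partial[j] = ([t + x for t, x in zip(total, p)], count + 1)
    return partial


def reduce_partials(partials, centroids):
    """Reducer: merge the mappers' partial sums into updated centroids."""
    merged = {}
    for partial in partials:
        for j, (total, count) in partial.items():
            acc_total, acc_count = merged.get(j, ([0.0] * len(total), 0))
            merged[j] = ([a + t for a, t in zip(acc_total, total)], acc_count + count)
    return [
        tuple(v / merged[j][1] for v in merged[j][0]) if j in merged else centroids[j]
        for j in range(len(centroids))
    ]


def mapreduce_kmeans(partitions, centroids, rounds=10):
    """Run a fixed number of map/reduce rounds (stand-in for a real job driver)."""
    for _ in range(rounds):
        partials = [map_partition(part, centroids) for part in partitions]
        centroids = reduce_partials(partials, centroids)
    return centroids
```

Because each mapper emits only one sum and one count per cluster, the data passed to the reducer scales with the number of clusters rather than the number of points, which is the main reason this pattern suits large datasets.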


Big Data in Internet of Things: Architecture and Open Research Challenges

2020 IEEE 23rd International Multitopic Conference (INMIC) ◽

10.1109/inmic50486.2020.9318203 ◽

2020 ◽

Author(s):

Iram Haider ◽

Muhammad Arslan Haider ◽

Arshad Saeed

Keyword(s):

Big Data ◽

Internet Of Things ◽

Research Challenges ◽

Open Research


Automated management of maritime container terminals using internet of things and big data technologies

Proceedings of the 4th International Conference on Smart City Applications - SCA '19 ◽

10.1145/3368756.3369046 ◽

2019 ◽

Author(s):

Farah Al Kaderi ◽

Rim Koulali ◽

Mohamed Rida

Keyword(s):

Big Data ◽

Internet Of Things ◽

Container Terminals ◽

Big Data Technologies ◽

Automated Management


Big Data and Internet of Things for Analysing and Designing Systems Based on Hyperspectral Images

Exploring the Convergence of Big Data and the Internet of Things - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-2947-7.ch017 ◽

2018 ◽

pp. 240-260

Author(s):

Peyakunta Bhargavi ◽

Singaraju Jyothi

Keyword(s):

Big Data ◽

Internet Of Things ◽

Future Internet ◽

Urban Environments ◽

Hyperspectral Data ◽

Hyperspectral Images ◽

The Internet ◽

Sensors And Actuators ◽

Fully Integrated ◽

Data Volume

Recent developments in remote sensing have made sensors an important source of information for mapping natural and man-made land covers. Increasing amounts of hyperspectral data are available from sensors such as AVIRIS, HyMap, and Hyperion for a wide range of applications, and the volume, velocity, and variety of these data have contributed to the term big data. Sensing is enabled by Wireless Sensor Network (WSN) technologies to infer and understand environmental indicators, from delicate ecologies and natural resources to urban environments. The communication network creates the Internet of Things (IoT), where sensors and actuators blend with the environment around us and information is shared across platforms in order to develop a common operating picture (COP). With RFID tags and embedded sensor and actuator nodes, the next revolutionary technology is transforming the Internet into a fully integrated Future Internet. This chapter describes the use of Big Data and the Internet of Things for analyzing and designing various systems based on hyperspectral images.


Uncertainty-Based Clustering Algorithms for Large Data Sets

Modern Technologies for Big Data Classification and Clustering - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-2805-0.ch001 ◽

2018 ◽

pp. 1-33 ◽

Cited By ~ 1

Author(s):

B. K. Tripathy ◽

Hari Seetha ◽

M. N. Murty

Keyword(s):

Big Data ◽

Data Clustering ◽

Clustering Algorithms ◽

Large Data ◽

Large Data Sets ◽

Mining Machine ◽

Data Sets ◽

Fuzzy C Means ◽

Intuitionistic Fuzzy ◽

New Algorithms

Data clustering plays a very important role in data mining, machine learning, and image processing. As modern databases have inherent uncertainties, many uncertainty-based data clustering algorithms have been developed. These include fuzzy c-means, rough c-means, and intuitionistic fuzzy c-means, as well as algorithms based on hybrid models, such as rough fuzzy c-means and rough intuitionistic fuzzy c-means. There are also many variants that improve these algorithms in different directions, such as their kernelised, possibilistic, and possibilistic kernelised versions. However, none of the above algorithms is effective on big data, for various reasons. Researchers have therefore been trying for the past few years to improve these algorithms so that they can be applied to cluster big data. Such algorithms are relatively few in comparison to those for datasets of reasonable size. Our aim in this chapter is to present the uncertainty-based clustering algorithms developed so far and to propose a few new algorithms which can be developed further.
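For readers unfamiliar with the baseline that these variants extend, here is a minimal sketch of standard fuzzy c-means, in which each point holds a degree of membership in every cluster instead of a single hard label. It shows only the textbook algorithm, not the rough, intuitionistic, or kernelised versions discussed in the chapter. The function name, the caller-supplied initial centers, and the fixed iteration count are illustrative assumptions.

```python
def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Textbook fuzzy c-means with fuzzifier m > 1 and iters >= 1.

    points and centers are lists of equal-length tuples. Returns the final
    centers and the membership rows computed in the last iteration.
    """
    exponent = 1.0 / (m - 1.0)  # applied to ratios of squared distances
    memberships = []
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij^2 / d_kj^2)^(1/(m-1))
        memberships = []
        for p in points:
            d2 = [sum((x - c) ** 2 for x, c in zip(p, center)) for center in centers]
            if 0.0 in d2:
                # The point coincides with one or more centers: share the
                # full membership among those centers.
                zeros = d2.count(0.0)
                row = [1.0 / zeros if d == 0.0 else 0.0 for d in d2]
            else:
                row = [1.0 / sum((di / dk) ** exponent for dk in d2) for di in d2]
            memberships.append(row)
        # Center update: mean of the points weighted by membership^m
        new_centers = []
        for j in range(len(centers)):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(centers[j])  # no point belongs to cluster j
                continue
            new_centers.append(tuple(
                sum(w * p[dim] for w, p in zip(weights, points)) / total
                for dim in range(len(points[0]))
            ))
        centers = new_centers
    return centers, memberships
```

Every iteration touches every point and cluster pair, which hints at why the chapter describes such algorithms as hard to apply directly to big data.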


A Wavelet Analysis-Based Big Data Spectral Clustering Algorithm for Electric Internet of Things

Journal of Physics Conference Series ◽

10.1088/1742-6596/1627/1/012007 ◽

2020 ◽

Vol 1627 ◽

pp. 012007

Author(s):

Hao Zhang ◽

Xin Liu ◽

Donglan Liu ◽

Hao Yu

Keyword(s):

Big Data ◽

Internet Of Things ◽

Wavelet Analysis ◽

Spectral Clustering ◽

Clustering Algorithm ◽

Spectral Clustering Algorithm


The fast clustering algorithm for the big data based on K-means

International Journal of Wavelets Multiresolution and Information Processing ◽

10.1142/s0219691320500538 ◽

2020 ◽

Vol 18 (06) ◽

pp. 2050053

Author(s):

Ting Xie ◽

Taiping Zhang

Keyword(s):

Big Data ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Feature Space ◽

Data Sets ◽

Benchmark Data ◽

Clustering Model ◽

Alternating Direction ◽

Learning Technique ◽

Noise Data

As a powerful unsupervised learning technique, clustering is a fundamental task of big data analysis. However, big data is typically high-dimensional, sparse, and noisy, and many traditional clustering algorithms perform poorly on it in terms of both computational efficiency and clustering accuracy. To alleviate these problems, this paper presents a Feature K-means clustering model on the feature space of big data and introduces a fast algorithm for it based on the Alternating Direction Multiplier Method (ADMM). We show the equivalence of the Feature K-means model in the original space and the feature space and prove the convergence of its iterative algorithm. Computationally, we compare Feature K-means with Spherical K-means and Kernel K-means on several benchmark data sets, including artificial data and four face databases. Experiments show that the proposed approach is comparable to the state-of-the-art algorithm in big data clustering.


Handling Big Data Scalability in Biological Domain Using Parallel and Distributed Processing: A Case of Three Biological Semantic Similarity Measures

BioMed Research International ◽

10.1155/2019/6750296 ◽

2019 ◽

Vol 2019 ◽

pp. 1-20 ◽

Cited By ~ 1

Author(s):

Ameera M. Almasoud ◽

Hend S. Al-Khalifa ◽

Abdulmalik S. Al-Salman

Keyword(s):

Big Data ◽

Semantic Similarity ◽

Data Clustering ◽

Input Data ◽

Distributed Processing ◽

Clustering Algorithms ◽

Similarity Measures ◽

Parallel And Distributed Processing ◽

Time Reduction ◽

Improved Performance

In the field of biology, researchers need to compare genes or gene products using semantic similarity measures (SSMs). Continuous data growth and diversity in data characteristics constitute what is called big data, which current biological SSMs cannot handle. Therefore, these measures need the ability to control the size of big data. We used parallel and distributed processing by splitting data into multiple partitions and applying SSMs to each partition; this approach helped manage big data scalability and computational problems. Our solution involves three steps: split gene ontology (GO), data clustering, and semantic similarity calculation. To test this method, split GO and data clustering algorithms were defined and assessed for performance in the first two steps. Three of the best SSMs in biology [Resnik, Shortest Semantic Differentiation Distance (SSDD), and SORA] are enhanced by introducing threaded parallel processing, which is used in the third step. Our results demonstrate that introducing threads in SSMs reduced the time of calculating semantic similarity between gene pairs and improved performance of the three SSMs. Average time was reduced by 24.51% for Resnik, 22.93% for SSDD, and 33.68% for SORA. Total time was reduced by 8.88% for Resnik, 23.14% for SSDD, and 39.27% for SORA. Using these threaded measures in the distributed system, combined with using split GO and data clustering algorithms to split input data based on their similarity, reduced the average time more than did the approach of equally dividing input data. Time reduction increased with increasing number of splits. Time reduction percentage was 24.1%, 39.2%, and 66.6% for Threaded SSDD; 33.0%, 78.2%, and 93.1% for Threaded SORA in the case of 2, 3, and 4 slaves, respectively; and 92.04% for Threaded Resnik in the case of four slaves.
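The partition-then-score pattern described above might look roughly like the sketch below. It does not implement Resnik, SSDD, or SORA: a simple Jaccard overlap of annotation term sets stands in as a placeholder measure, and the gene identifiers, function names, and `workers` parameter are all hypothetical. It illustrates only the structure of scoring each partition on its own thread and makes no claim about speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def jaccard(terms_a, terms_b):
    """Placeholder similarity (not an SSM from the paper): set overlap."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def score_partition(pairs, annotations):
    """Score every gene pair in one partition."""
    return [(a, b, jaccard(annotations[a], annotations[b])) for a, b in pairs]


def score_all(partitions, annotations, workers=4):
    """Score each partition on its own thread, preserving partition order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda part: score_partition(part, annotations), partitions)
        return [row for part in results for row in part]
```

A possible call, using made-up gene and term identifiers:

```python
annotations = {"g1": {"t1", "t2"}, "g2": {"t2", "t3"}, "g3": {"t4"}}
partitions = [[("g1", "g2")], [("g1", "g3")]]
scores = score_all(partitions, annotations)
```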
