Research on network security defence based on big data clustering algorithms

Data clustering plays a very important role in data mining, machine learning, and image processing. Because modern databases carry inherent uncertainties, many uncertainty-based data clustering algorithms have been developed. These include fuzzy c-means, rough c-means, and intuitionistic fuzzy c-means, as well as algorithms based on hybrid models, such as rough fuzzy c-means and rough intuitionistic fuzzy c-means. Many variants improve these algorithms in different directions, including kernelised, possibilistic, and possibilistic kernelised versions. However, none of these algorithms is effective on big data, for various reasons. Researchers have therefore been trying for the past few years to adapt them so that they can be applied to cluster big data. The resulting algorithms are relatively few in comparison with those for datasets of reasonable size. Our aim in this chapter is to present the uncertainty-based clustering algorithms developed so far and to propose a few new algorithms that can be developed further.
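
For readers unfamiliar with fuzzy c-means, the first algorithm named in this abstract, the following is a minimal sketch of the standard textbook formulation, not the chapter's own implementation. The function name `fuzzy_c_means`, the fuzzifier default `m=2.0`, the random initialisation, and the stopping tolerance are all illustrative assumptions. Each point receives a degree of membership in every cluster rather than a single hard label.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length tuples.

    Returns (centers, memberships); memberships[i][j] is the degree to
    which point i belongs to cluster j, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Assumed initialisation: random memberships, each row normalised.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)
            ))

        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dists = [
                max(sum((points[i][d] - centers[j][d]) ** 2
                        for d in range(dim)) ** 0.5, 1e-12)
                for j in range(c)
            ]
            new_row = [
                1.0 / sum((dists[j] / dists[k]) ** (2.0 / (m - 1.0))
                          for k in range(c))
                for j in range(c)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        if shift < tol:  # memberships have stopped changing
            break
    return centers, u
```

Every iteration touches all n points for all c clusters, which hints at why the abstract says such algorithms struggle on big data without modification.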

A Research Roadmap of Big Data Clustering Algorithms for Future Internet of Things

International Journal of Organizational and Collective Intelligence ◽ 10.4018/ijoci.2019040102 ◽ 2019 ◽ Vol 9 (2) ◽ pp. 16-30 ◽ Cited By ~ 1

Author(s): Hind Bangui ◽ Mouzhi Ge ◽ Barbora Buhnova

Keyword(s): Big Data ◽ Internet Of Things ◽ Mobile Networks ◽ Data Clustering ◽ Clustering Algorithm ◽ Clustering Algorithms ◽ Future Internet ◽ Research Challenges ◽ Initial Stage ◽ Big Data Technologies

Due to the massive increase in data across Internet of Things (IoT) domains such as healthcare IoT and Smart City IoT, Big Data technologies have emerged as critical tools for analyzing IoT data. Among these technologies, data clustering is one of the essential approaches to processing IoT data. However, how to select a suitable clustering algorithm for IoT data is still unclear. Furthermore, since Big Data technologies are still in their initial stage in many IoT domains, it is valuable to identify and structure the research challenges between Big Data and IoT. This article therefore starts by reviewing and comparing the data clustering algorithms that can be applied to IoT datasets, and then extends the discussion to a broader IoT context, such as IoT dynamics and IoT mobile networks. Finally, the article identifies a set of research challenges that form a research roadmap for Big Data research in IoT domains. The proposed roadmap aims to bridge the research gaps between Big Data and various IoT contexts.

Handling Big Data Scalability in Biological Domain Using Parallel and Distributed Processing: A Case of Three Biological Semantic Similarity Measures

BioMed Research International ◽ 10.1155/2019/6750296 ◽ 2019 ◽ Vol 2019 ◽ pp. 1-20 ◽ Cited By ~ 1

Author(s): Ameera M. Almasoud ◽ Hend S. Al-Khalifa ◽ Abdulmalik S. Al-Salman

Keyword(s): Big Data ◽ Semantic Similarity ◽ Data Clustering ◽ Input Data ◽ Distributed Processing ◽ Clustering Algorithms ◽ Similarity Measures ◽ Parallel And Distributed Processing ◽ Time Reduction ◽ Improved Performance

In the field of biology, researchers need to compare genes or gene products using semantic similarity measures (SSMs). Continuous data growth and diversity in data characteristics constitute what is called big data, which current biological SSMs cannot handle. These measures therefore need a way to cope with the size of big data. We used parallel and distributed processing, splitting the data into multiple partitions and applying SSMs to each partition; this approach helped manage big data scalability and computational problems. Our solution involves three steps: split gene ontology (GO), data clustering, and semantic similarity calculation. To test this method, split GO and data clustering algorithms were defined and assessed for performance in the first two steps. Three of the best SSMs in biology [Resnik, Shortest Semantic Differentiation Distance (SSDD), and SORA] were enhanced with threaded parallel processing, which is used in the third step. Our results demonstrate that introducing threads in SSMs reduced the time needed to calculate semantic similarity between gene pairs and improved the performance of the three SSMs. Average time was reduced by 24.51% for Resnik, 22.93% for SSDD, and 33.68% for SORA. Total time was reduced by 8.88% for Resnik, 23.14% for SSDD, and 39.27% for SORA. Using these threaded measures in the distributed system, combined with using split GO and data clustering algorithms to split input data based on similarity, reduced the average time more than equally dividing the input data did. Time reduction increased with the number of splits. The time reduction was 24.1%, 39.2%, and 66.6% for Threaded SSDD and 33.0%, 78.2%, and 93.1% for Threaded SORA with 2, 3, and 4 slaves, respectively, and 92.04% for Threaded Resnik with four slaves.
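
The partition-then-score pattern described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: Jaccard overlap of annotation-term sets stands in for the real measures (it is not Resnik, SSDD, or SORA), and the names `jaccard`, `score_partition`, and `parallel_scores` are hypothetical. Each partition maps a gene name to its set of terms, and a thread pool scores the gene pairs of each partition independently.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def jaccard(a, b):
    """Placeholder similarity: overlap of two term sets (an assumption,
    standing in for a real semantic similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_partition(partition):
    """Score every gene pair inside one partition.

    partition: dict mapping gene name -> set of annotation terms.
    Returns {(gene_x, gene_y): score} with gene_x < gene_y.
    """
    names = sorted(partition)
    return {
        (x, y): jaccard(partition[x], partition[y])
        for x, y in combinations(names, 2)
    }


def parallel_scores(partitions, workers=4):
    """Apply score_partition to each partition on a thread pool and
    merge the per-partition results into one dictionary."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part_scores in pool.map(score_partition, partitions):
            results.update(part_scores)
    return results


# Usage with made-up gene names and terms:
example = [
    {"g1": {"t1", "t2"}, "g2": {"t2", "t3"}},
    {"g3": {"t4"}, "g4": {"t4"}},
]
scores = parallel_scores(example, workers=2)
```

Only pairs within the same partition are compared, which is why splitting by similarity (rather than equally) matters in the reported results. In CPython, threads may not speed up CPU-bound pure-Python scoring; a process pool is a common alternative in that case.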

A Brief Account of Iterative Big Data Clustering Algorithms

International Journal of Computer Sciences and Engineering ◽ 10.26438/ijcse/v5i10.292301 ◽ 2017 ◽ Vol 5 (10) ◽ pp. 292-301

Author(s): M. Shankar Lingam ◽ A. M. Sudhakara

Keyword(s): Big Data ◽ Data Clustering ◽ Clustering Algorithms

The Modeling and Simulation of Data Clustering Algorithms in Data Mining with Big Data

Journal of Industrial Integration and Management ◽ 10.1142/s2424862218500173 ◽ 2019 ◽ Vol 04 (01) ◽ pp. 1850017 ◽ Cited By ~ 3

Author(s): Weiru Chen ◽ Jared Oliverio ◽ Jin Ho Kim ◽ Jiayue Shen

Keyword(s): Data Mining ◽ Big Data ◽ Data Reduction ◽ Data Clustering ◽ Clustering Algorithms ◽ High Volume ◽ Clustering Methods ◽ Data Set ◽ Processing Methods ◽ Integration Data

Big Data is a popular cutting-edge technology nowadays. Techniques and algorithms are expanding in different areas, including engineering, biomedicine, and business. Due to the high volume and complexity of Big Data, it is necessary to apply data pre-processing methods when mining data. These methods include data cleaning, data integration, data reduction, and data transformation. Data clustering is the most important step of data reduction. With data clustering, mining on the reduced data set should be more efficient yet still produce quality analytical results. This paper presents the different data clustering methods and related algorithms for data mining with Big Data. Data clustering can increase the efficiency and accuracy of data mining.
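
As an illustration of clustering used as a data-reduction step, the sketch below replaces a data set with k weighted representatives (cluster centroids plus member counts) using plain k-means. It is not the paper's method: the function names, the farthest-first initialisation, and the iteration cap are assumptions made for the example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Assumed deterministic initialisation: start from the first point,
    then repeatedly add the point farthest from the chosen centres."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans_reduce(points, k, iters=50):
    """Reduce `points` to k (centroid, member_count) pairs via k-means."""
    dim = len(points[0])
    centers = farthest_first(points, k)
    assign = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        assign = [
            min(range(k), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members;
        # an empty cluster keeps its previous centre.
        new_centers = []
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                new_centers.append(tuple(
                    sum(p[d] for p in members) / len(members)
                    for d in range(dim)
                ))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    counts = [assign.count(j) for j in range(k)]
    return list(zip(centers, counts))
```

Downstream mining can then run on the k weighted centroids instead of all n records, trading some detail for a much smaller input.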

Analysis of Mahout Big Data Clustering Algorithms

Advances in Intelligent Systems and Computing - Intelligent Communication, Control and Devices ◽ 10.1007/978-981-10-5903-2_105 ◽ 2018 ◽ pp. 999-1008

Author(s): Ishan Sharma ◽ Rajeev Tiwari ◽ Hukam Singh Rana ◽ Abhineet Anand

Keyword(s): Big Data ◽ Data Clustering ◽ Clustering Algorithms

A Quantitative Analysis of Big Data Clustering Algorithms for Market Segmentation in Hospitality Industry

2020 IEEE International Conference on Consumer Electronics (ICCE) ◽ 10.1109/icce46568.2020.9043023 ◽ 2020

Author(s): Avishek Bose ◽ Arslan Munir ◽ Neda Shabani

Keyword(s): Big Data ◽ Quantitative Analysis ◽ Market Segmentation ◽ Data Clustering ◽ Hospitality Industry ◽ Clustering Algorithms

Iterative big data clustering algorithms: a review

Software Practice and Experience ◽ 10.1002/spe.2341 ◽ 2015 ◽ Vol 46 (1) ◽ pp. 107-129 ◽ Cited By ~ 31

Author(s): Amin Mohebi ◽ Saeed Aghabozorgi ◽ Teh Ying Wah ◽ Tutut Herawan ◽ Ramin Yahyapour

Keyword(s): Big Data ◽ Data Clustering ◽ Clustering Algorithms

An Introduction to Clustering Algorithms in Big Data

Encyclopedia of Information Science and Technology, Fifth Edition - Advances in Information Quality and Management ◽ 10.4018/978-1-7998-3479-3.ch040 ◽ 2021 ◽ pp. 559-576

Author(s): Rajit Nair ◽ Amit Bhagat

Keyword(s): Big Data ◽ Single Machine ◽ Data Clustering ◽ Clustering Algorithms ◽ Time Limit ◽ Computation Cost ◽ Different Types ◽ Clustering Approach ◽ Future Path ◽ Parallel Clustering

In big data, clustering is the process through which analysis is performed. Because the data is so large, clustering is very difficult to perform. Big data is typically measured in petabytes and zettabytes, and forming clusters at that scale carries a high computation cost. In this chapter, the authors show how clustering can be performed on big data and describe the different types of clustering approach. The challenge during clustering is to find observations within the time limit. The chapter also covers the possible future path for more advanced clustering algorithms. It covers single-machine clustering and multiple-machine clustering, which also includes parallel clustering.
