Research on Clustering Algorithm of Heterogeneous Network Privacy Big Data Set Based on Cloud Computing

Author(s):  
Ming-hao Ding
2014 ◽  
Vol 687-691 ◽  
pp. 1496-1499
Author(s):  
Yong Lin Leng

Partially missing or blurred attribute values make data incomplete during collection. Incomplete data are generally handled by imputation or by discarding records before clustering. In this paper we propose a new similarity metric algorithm based on an incomplete information system. The algorithm first divides the data set into a complete subset and an incomplete subset; the complete subset is clustered using the affinity propagation algorithm, and each incomplete record is then assigned to the corresponding cluster according to the designed similarity metric. To improve efficiency, a distributed version of the algorithm is designed on cloud computing technology. Experiments demonstrate that the proposed algorithm can cluster incomplete big data directly and improves both accuracy and efficiency.
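The abstract does not specify the similarity metric, so the sketch below uses a common choice for incomplete records, the partial-distance strategy: distance is computed only over attributes present in both records and rescaled by the fraction of usable dimensions, and an incomplete record is then assigned to the cluster whose exemplar (found by affinity propagation on the complete subset) is nearest. Function names are illustrative.

```python
import math

def partial_distance(x, y):
    """Euclidean distance over only the dimensions where both records
    have values (None marks a missing attribute), rescaled by the
    fraction of usable dimensions (partial-distance strategy)."""
    used, total = 0, 0.0
    for a, b in zip(x, y):
        if a is not None and b is not None:
            total += (a - b) ** 2
            used += 1
    if used == 0:
        return float("inf")  # no shared attributes: incomparable
    return math.sqrt(total * len(x) / used)

def assign_incomplete(record, exemplars):
    """Place an incomplete record into the cluster whose exemplar
    (computed earlier on the complete subset) is closest under the
    partial distance."""
    return min(range(len(exemplars)),
               key=lambda k: partial_distance(record, exemplars[k]))
```

With exemplars at (0, 0) and (10, 10), the record (9, None) is assigned to the second cluster because its known attribute is far closer to 10 than to 0.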


2018 ◽  
Vol 9 (3) ◽  
pp. 15-30 ◽  
Author(s):  
S. Vengadeswaran ◽  
S. R. Balasundaram

This article describes how the time taken to execute a query and return its results increases exponentially with data size, leading to longer user waiting times. Hadoop, with its distributed processing capability, is considered an efficient solution for processing such large data. Hadoop's Default Data Placement Strategy (HDDPS) allocates data blocks randomly across the cluster nodes without considering any execution parameters. As a result, blocks required for execution are often unavailable on the local machine and must be transferred across the network, causing a data locality problem. It is also commonly observed that most data-intensive applications show grouping semantics, so only part of the big data set is utilized during query execution. Since such execution parameters and grouping behavior are not considered, the default placement performs poorly, with several shortcomings such as fewer local map task executions, increased query execution time, and higher query latency. To overcome these issues, an Optimal Data Placement Strategy (ODPS) based on grouping semantics is proposed. First, the user history log is dynamically analyzed to identify access patterns, which are depicted as a graph. Markov clustering, a graph clustering algorithm, is applied to identify groupings within the dataset. Then an Optimal Data Placement Algorithm (ODPA) is proposed based on statistical measures estimated from the clustered graph. This in turn reorganizes the default data layouts in HDFS to achieve improved performance for big data sets in a heterogeneous distributed environment. The proposed strategy was tested on a 15-node cluster in a single-rack topology. The results proved more efficient for massive datasets, reducing query execution time by 26% and improving data locality by 38% compared to HDDPS.
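The grouping step above uses Markov clustering (MCL) on the access-pattern graph. A minimal sketch of MCL, assuming a small symmetric co-access adjacency matrix (the paper's actual graph construction and parameters are not given here): alternate expansion (matrix power, spreading random-walk flow) with inflation (elementwise power plus column renormalization, concentrating flow inside dense regions) until groups of co-accessed blocks emerge.

```python
def mcl(adj, expansion=2, inflation=2.0, iters=20):
    """Minimal Markov Clustering sketch on a symmetric adjacency
    matrix: self-loops are added, columns are normalized, and
    expansion/inflation rounds are iterated; rows that retain flow
    are attractors, and their non-zero columns form the clusters."""
    n = len(adj)
    m = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]

    def normalize(mat):  # make each column sum to 1 (column-stochastic)
        for j in range(n):
            s = sum(mat[i][j] for i in range(n))
            for i in range(n):
                mat[i][j] /= s
        return mat

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]

    m = normalize(m)
    for _ in range(iters):
        for _ in range(expansion - 1):   # expansion: random-walk flow
            m = matmul(m, m)
        # inflation: strengthen strong edges, then renormalize
        m = normalize([[v ** inflation for v in row] for row in m])
    # read clusters off attractor rows (rows with remaining mass)
    seen, clusters = set(), []
    for i in range(n):
        if sum(m[i]) > 1e-6:
            c = tuple(j for j in range(n) if m[i][j] > 1e-6)
            if c not in seen:
                seen.add(c)
                clusters.append(list(c))
    return clusters
```

On a graph of four data blocks where blocks 0-1 and 2-3 are co-accessed, MCL recovers the two groupings, which ODPA could then co-locate on the same node.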


Author(s):  
Guolei Zhang ◽  
Jia Li ◽  
Li Hao

In the development of information technology, advances in scientific theory have driven the progress of science and technology, and this progress has in turn changed the way education is delivered. The arrival of the big data era has played an important role in the promotion and dissemination of educational resources, benefiting more and more people. Modern distance education relies on big data and cloud computing and is composed of a series of tools supporting a variety of teaching modes. Clustering algorithms can provide an effective method for evaluating students' personality characteristics and learning status in distance education. However, the traditional K-means clustering algorithm is random and uncertain and has high time complexity, so it does not meet the requirements of large-scale data processing. In this paper, we study a parallel K-means clustering algorithm based on the cloud computing platform Hadoop and give the design and strategy of the algorithm. We then carry out experiments on data sets of several different sizes and compare the performance of the proposed method with a general clustering method. Experimental results show that the accelerated algorithm achieves good speedup at low cost and is suitable for the analysis and mining of large data in distance higher education.
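The paper's Hadoop design is not reproduced here, but the standard way K-means is parallelized in a MapReduce setting can be sketched in a few lines: each map task assigns its split of points to the nearest centroid and emits partial sums, and the reduce step merges the partial sums per centroid into new means. The sequential sketch below mimics that map/reduce split (names are illustrative).

```python
import random

def kmeans_mapreduce(points, k, rounds=10, seed=0):
    """MapReduce-style K-means sketch: the 'map phase' loop stands in
    for mappers emitting (centroid index) -> (partial sum, count), and
    the 'reduce phase' loop stands in for reducers averaging them."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(rounds):
        # map phase: accumulate partial (sum vector, count) per centroid
        partial = {}
        for p in points:
            c = min(range(k), key=lambda i: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[i])))
            s, n = partial.get(c, ([0.0] * len(p), 0))
            partial[c] = ([a + b for a, b in zip(s, p)], n + 1)
        # reduce phase: recompute each centroid as the mean of its points
        for c, (s, n) in partial.items():
            centroids[c] = tuple(v / n for v in s)
    return centroids
```

On Hadoop, the map phase runs once per input split in parallel, which is what gives the speedup the abstract reports; only the small per-centroid partial sums cross the network.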


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Jing Yu ◽  
Hang Li ◽  
Desheng Liu

Medical data are particular and complex, and big data clustering plays a significant role in medicine. Traditional clustering algorithms easily fall into local extrema, producing clustering deviation and poor clustering results. Therefore, in this paper we propose a new medical big data clustering algorithm based on a modified immune evolutionary method in a cloud computing environment to overcome these disadvantages. First, we analyze the big data structure model in the cloud computing environment. Second, we detail the modified immune evolutionary method for clustering medical data, including encoding, constructing the fitness function, and selecting genetic operators. Finally, experiments show that the new approach improves classification accuracy, reduces the error rate, and improves the performance of data mining and feature extraction for medical data clustering.
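The encoding, fitness function, and genetic operators mentioned above follow the usual immune-evolutionary pattern; since the paper's exact operators are not given here, the sketch below is a generic clonal-selection loop under assumed choices: an antibody encodes a candidate set of centroids, fitness is the inverse of the within-cluster squared error, and the elite antibodies are cloned with Gaussian hypermutation, which is what lets the search escape the local extrema that plain K-means falls into.

```python
import random

def fitness(centroids, data):
    """Fitness of an antibody (candidate centroid set): inverse of the
    total within-cluster squared error, so tighter clusters score higher."""
    sse = 0.0
    for p in data:
        sse += min(sum((a - b) ** 2 for a, b in zip(p, c))
                   for c in centroids)
    return 1.0 / (1.0 + sse)

def immune_cluster(data, k, pop=20, gens=40, seed=1):
    """Clonal-selection sketch: keep the best antibodies, clone them,
    and mutate the clones; the elite survive unchanged, so the best
    fitness never decreases across generations."""
    rng = random.Random(seed)
    population = [[list(rng.choice(data)) for _ in range(k)]
                  for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda ab: fitness(ab, data), reverse=True)
        elite = population[:pop // 4]
        clones = []
        for ab in elite:
            for _ in range(pop // len(elite)):
                # hypermutation: jitter each centroid coordinate
                clones.append([[v + rng.gauss(0, 0.5) for v in c]
                               for c in ab])
        population = (elite + clones)[:pop]
    population.sort(key=lambda ab: fitness(ab, data), reverse=True)
    return population[0]
```

The mutation scale (0.5 here) and elite fraction are placeholders; a real implementation would adapt the mutation rate inversely to fitness, as clonal selection algorithms usually do.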


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Kai Zhang ◽  
Wei Guo ◽  
Jian Feng ◽  
Mei Liu

To address the low accuracy and low efficiency of most load forecasting methods, a load forecasting method based on improved deep learning in a cloud computing environment is proposed. First, the preprocessed data set is divided by a spatial grid into several partitions of relatively balanced data volume, so that abnormal data can be detected more effectively. Then a density peak clustering algorithm based on Spark detects abnormal data in each partition, local clusters and abnormal points are merged, and parallel data processing is realized on the Spark cluster computing platform. Finally, a deep belief network classifies the load, the classification results are input into an empirical mode decomposition-gated recurrent unit network model, and the load prediction is obtained through learning. On load data from a power grid, experimental results demonstrate that the mean prediction error of the proposed method stays within 3% in the short term and reaches 0.023 MW, 19.75%, and 2.76% in the long term, outperforming the comparison methods, and that the parallel performance is good, showing the method's feasibility.
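Density peak clustering, used above for per-partition anomaly detection, rests on two statistics per point: the local density rho (neighbours within a cutoff distance) and delta (distance to the nearest point of higher density). Outlier candidates are points with low rho but large delta, i.e. sparse points far from any denser region. A minimal single-machine sketch (the paper's Spark parallelization and cutoff choice are not reproduced):

```python
import math

def density_peaks(points, dc):
    """Compute density-peak statistics: rho[i] counts neighbours within
    cutoff dc; delta[i] is the distance to the nearest higher-density
    point (or the farthest point, for the global density peak)."""
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)]
            for i in range(n)]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

def detect_outliers(points, dc, rho_max=1, delta_min=None):
    """Flag low-density, isolated points: rho at or below rho_max and
    delta above delta_min (defaulting to the cutoff dc)."""
    rho, delta = density_peaks(points, dc)
    if delta_min is None:
        delta_min = dc
    return [i for i in range(len(points))
            if rho[i] <= rho_max and delta[i] > delta_min]
```

In a Spark version, each grid partition would run this locally on its own points, and only the candidate outliers and local cluster summaries would be shuffled for the merge step.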


2016 ◽  
Vol 2016 ◽  
pp. 1-13 ◽  
Author(s):  
Mao Ye ◽  
Wenfen Liu ◽  
Jianghong Wei ◽  
Xuexian Hu

Because of its positive effect in dealing with the curse of dimensionality in big data, random projection for dimensionality reduction has recently become a popular method. In this paper, a theoretical analysis of the influence of random projection on the variability of a data set and the dependence of its dimensions is proposed. Together with this analysis, a new fuzzy c-means (FCM) clustering algorithm with random projection is presented. Empirical results verify that the new algorithm not only preserves the accuracy of original FCM clustering but is also more efficient than the original clustering and clustering with singular value decomposition. A new cluster ensemble approach based on FCM clustering with random projection is also proposed. The new aggregation method efficiently computes the spectral embedding of data with a cluster-center-based representation that scales linearly with data size. Experimental results reveal the efficiency, effectiveness, and robustness of the algorithm compared with state-of-the-art methods.
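The random projection step itself is standard and easy to sketch: multiply each d-dimensional point by a k x d Gaussian matrix scaled by 1/sqrt(k), so that by the Johnson-Lindenstrauss argument pairwise distances are approximately preserved in the k-dimensional image, and FCM can then run on the much shorter vectors. This is a generic sketch, not the paper's exact construction.

```python
import math
import random

def random_projection(data, k, seed=0):
    """Project d-dimensional points to k dimensions with a Gaussian
    random matrix scaled by 1/sqrt(k); pairwise Euclidean distances
    are approximately preserved (Johnson-Lindenstrauss)."""
    rng = random.Random(seed)
    d = len(data[0])
    R = [[rng.gauss(0, 1) / math.sqrt(k) for _ in range(d)]
         for _ in range(k)]
    # each output coordinate is one row of R dotted with the point
    return [tuple(sum(r[j] * p[j] for j in range(d)) for r in R)
            for p in data]
```

Because FCM's per-iteration cost is linear in the dimensionality, running it on the k-dimensional projections rather than the original d-dimensional data is where the reported efficiency gain comes from.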


Author(s):  
Kiran Kumar S V N Madupu

Big Data has a terrific influence on scientific discovery and value creation. This paper presents approaches in data mining and modern technologies in Big Data. Difficulties of data mining, as well as of data mining with big data, are discussed. Some technological developments in data mining, and in data mining with big data, are also presented.


Author(s):  
Monika ◽  
Pardeep Kumar ◽  
Sanjay Tyagi

In a cloud computing environment, Quality of Service (QoS) and cost are the key elements to be taken care of. In today's era of big data, data must be handled properly while satisfying requests. When handling requests over large data, or requests from scientific applications, the flow of information must be sustained. In this paper, a brief introduction to workflow scheduling is given, along with a detailed survey of various scheduling algorithms compared across various parameters.


Author(s):  
Shaveta Bhatia

The epoch of big data presents many opportunities for development in data science, biomedical research, cyber security, and cloud computing. Big data has gained popularity, but it also invites many challenges to its security and privacy. Various threats and attacks, such as data leakage, unauthorized third-party access, viruses, and vulnerabilities, stand against the security of big data. This paper discusses these security threats and their mitigation approaches in the fields of biomedical research, cyber security, and cloud computing.

