Analysis of Artistic Modeling of Opera Stage Clothing Based on Big Data Clustering Algorithm

2021 · Vol 2021 · pp. 1-9
Author(s): Weiwei Luo

To address the problem that traditional methods for analyzing the artistry of stage costumes cannot correct big data clustering results, which leads to deviations in the extraction of costume artistry features, this paper proposes a clothing artistic modeling method based on a big data clustering algorithm. The proposed method provides a database for big data clustering by constructing the attribute set of the big data feature-sequence training set and, at the same time, constructs a second-order cone programming model to correct the clustering results. On this basis, the costume elements of the opera stage are segmented, initialized, and transformed into a binary function. Finally, a convolutional neural network combines the element segmentation results with the big data clustering space state vector to construct a feature extraction model of stage costume artistry. Experimental results show that the model converges well, runs quickly, achieves high accuracy, and has strong feature recognition capability.
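The abstract does not specify how the segmented elements are "transformed into a binary function"; the following minimal Java sketch shows one plausible reading, thresholding a grayscale element map into a binary mask. The class name, input format, and threshold value are illustrative assumptions, not details from the paper.

public final class ElementBinarizer {
    // Thresholds a grayscale element map (values 0-255) into a 0/1 mask.
    // The threshold of 128 is an assumption for illustration only.
    public static int[][] binarize(int[][] gray, int threshold) {
        int[][] mask = new int[gray.length][];
        for (int r = 0; r < gray.length; r++) {
            mask[r] = new int[gray[r].length];
            for (int c = 0; c < gray[r].length; c++) {
                // 1 = pixel belongs to a costume element, 0 = background.
                mask[r][c] = gray[r][c] >= threshold ? 1 : 0;
            }
        }
        return mask;
    }

    public static void main(String[] args) {
        int[][] gray = {{200, 30}, {140, 90}};
        System.out.println(java.util.Arrays.deepToString(binarize(gray, 128)));
        // prints [[1, 0], [1, 0]]
    }
}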

Author(s): B. K. Tripathy, Hari Seetha, M. N. Murty

Data clustering plays a very important role in data mining, machine learning, and image processing. Because modern databases carry inherent uncertainty, many uncertainty-based clustering algorithms have been developed, including fuzzy c-means, rough c-means, intuitionistic fuzzy c-means, and hybrid models such as rough fuzzy c-means and rough intuitionistic fuzzy c-means. Many variants improve these algorithms in different directions, such as their kernelised versions, possibilistic versions, and possibilistic kernelised versions. However, for various reasons, none of these algorithms is effective on big data, so researchers have been working for the past few years to adapt them so that they can be applied to cluster big data; such adapted algorithms remain relatively few compared with those for datasets of moderate size. Our aim in this chapter is to present the uncertainty-based clustering algorithms developed so far and to propose a few new algorithms that can be developed further.
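As a reference point for the algorithms surveyed above, here is a minimal single-machine fuzzy c-means sketch on one-dimensional data, assuming Euclidean distance and fuzzifier m = 2; the big data variants discussed in the chapter distribute these same membership and centre updates rather than changing them. The toy data and iteration count are invented for illustration.

import java.util.Arrays;
import java.util.Random;

public final class FuzzyCMeans {
    public static void main(String[] args) {
        double[] x = {1.0, 1.2, 0.8, 8.0, 8.3, 7.9};   // toy data, two obvious groups
        int c = 2, n = x.length;
        double m = 2.0;                                 // fuzzifier
        double[][] u = new double[c][n];                // membership matrix
        Random rnd = new Random(42);
        for (int j = 0; j < n; j++) {                   // random init, columns sum to 1
            double a = rnd.nextDouble();
            u[0][j] = a;
            u[1][j] = 1 - a;
        }
        double[] centres = new double[c];
        for (int iter = 0; iter < 50; iter++) {
            for (int i = 0; i < c; i++) {               // centre update: weighted mean
                double num = 0, den = 0;
                for (int j = 0; j < n; j++) {
                    double w = Math.pow(u[i][j], m);
                    num += w * x[j];
                    den += w;
                }
                centres[i] = num / den;
            }
            for (int j = 0; j < n; j++) {               // membership update
                for (int i = 0; i < c; i++) {
                    double dij = Math.abs(x[j] - centres[i]) + 1e-12;
                    double sum = 0;
                    for (int k = 0; k < c; k++) {
                        double dkj = Math.abs(x[j] - centres[k]) + 1e-12;
                        sum += Math.pow(dij / dkj, 2.0 / (m - 1));
                    }
                    u[i][j] = 1.0 / sum;
                }
            }
        }
        System.out.println("centres = " + Arrays.toString(centres));
    }
}

Rough c-means and the hybrid models replace the membership update with lower/upper approximation rules, but the overall alternating structure stays the same.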


2016 · pp. 1220-1243
Author(s): Ilias K. Savvas, Georgia N. Sofianidou, M-Tahar Kechadi

Big data refers to data sets whose size is beyond the capabilities of most current hardware and software technologies. The Apache Hadoop software library is a framework for distributed processing of large data sets; HDFS is a distributed file system that provides high-throughput access for data-driven applications, and MapReduce is a software framework for distributed computation over large data sets. Huge collections of raw data require fast and accurate mining processes in order to extract useful knowledge. One of the most popular data mining techniques is the K-means clustering algorithm. In this study, the authors develop a distributed version of the K-means algorithm using the MapReduce framework on the Hadoop Distributed File System. Theoretical and experimental results demonstrate the technique's efficiency; thus, HDFS and MapReduce can be applied to big data with very promising results.
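The paper's code is not shown; the sketch below illustrates how one K-means iteration maps onto Hadoop's Java MapReduce API. The current centroids are assumed to be passed through a hypothetical job configuration property ("kmeans.centroids"); points are lines of comma-separated 2-D coordinates.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class KMeansIteration {

  public static class AssignMapper
      extends Mapper<LongWritable, Text, IntWritable, Text> {
    private double[][] centroids;

    @Override
    protected void setup(Context ctx) {
      // Hypothetical property "x1,y1;x2,y2;..." set by the driver program.
      String[] rows = ctx.getConfiguration().get("kmeans.centroids").split(";");
      centroids = new double[rows.length][];
      for (int i = 0; i < rows.length; i++) {
        String[] p = rows[i].split(",");
        centroids[i] = new double[]{Double.parseDouble(p[0]), Double.parseDouble(p[1])};
      }
    }

    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      String[] p = value.toString().split(",");
      double x = Double.parseDouble(p[0]), y = Double.parseDouble(p[1]);
      int best = 0;
      double bestDist = Double.MAX_VALUE;
      for (int i = 0; i < centroids.length; i++) {      // find nearest centroid
        double dx = x - centroids[i][0], dy = y - centroids[i][1];
        double d = dx * dx + dy * dy;
        if (d < bestDist) { bestDist = d; best = i; }
      }
      ctx.write(new IntWritable(best), value);          // (cluster id, point)
    }
  }

  public static class RecomputeReducer
      extends Reducer<IntWritable, Text, IntWritable, Text> {
    @Override
    protected void reduce(IntWritable id, Iterable<Text> points, Context ctx)
        throws IOException, InterruptedException {
      double sx = 0, sy = 0;
      long n = 0;
      for (Text t : points) {                           // mean of assigned points
        String[] p = t.toString().split(",");
        sx += Double.parseDouble(p[0]);
        sy += Double.parseDouble(p[1]);
        n++;
      }
      ctx.write(id, new Text((sx / n) + "," + (sy / n)));
    }
  }
}

A driver program (omitted here) would run this job once per iteration, feeding the reducer's output centroids back into the next job's configuration until the centroids stabilize.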


Author(s): Hind Bangui, Mouzhi Ge, Barbora Buhnova

Due to the massive increase of data in different Internet of Things (IoT) domains, such as healthcare IoT and Smart City IoT, Big Data technologies have emerged as critical analytics tools for analyzing IoT data. Among these technologies, data clustering is one of the essential approaches to processing IoT data. However, how to select a suitable clustering algorithm for IoT data is still unclear. Furthermore, since Big Data technology is still at an early stage in many IoT domains, it is valuable to propose and structure the research challenges that lie between Big Data and IoT. Therefore, this article starts by reviewing and comparing the data clustering algorithms that can be applied to IoT datasets, and then extends the discussion to broader IoT contexts such as IoT dynamics and IoT mobile networks. Finally, the article identifies a set of research challenges that yield a research roadmap for Big Data research in IoT domains. The proposed roadmap aims to bridge the research gaps between Big Data and various IoT contexts.


Author(s): Padmanathan Anantharaman, H.V. Ramakrishan

As data volumes continue to grow, they quickly exceed the capacity of data warehouses and application databases, forcing IT organizations into costly upgrades of expensive database and data warehouse hardware appliances. At the same time, an enormous amount of data is being generated through the Internet of Things (IoT) as technologies advance and people use them in day-to-day activities; such data is termed Big Data, with its own characteristics and challenges. Frequent itemset mining algorithms aim to uncover frequent itemsets in a transactional database, but as dataset size increases, traditional frequent itemset mining can no longer handle it. The MapReduce programming model solves the problem of large datasets but has a large communication cost, which reduces execution efficiency. This paper proposes a new pre-processing technique: k-means clustering applied before the BigFIM algorithm. ClustBigFIM uses a hybrid approach, clustering with the k-means algorithm to generate clusters from huge datasets, and then using Apriori and Eclat to mine frequent itemsets from the generated clusters under the MapReduce programming model. Results show that the execution efficiency of the ClustBigFIM algorithm increases when k-means clustering is applied before BigFIM as a pre-processing step.
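For reference, this is what the level-wise Apriori mining at the heart of ClustBigFIM looks like in a compact, single-machine Java sketch; in ClustBigFIM this mining would run per k-means cluster under MapReduce. The transactions and support threshold are invented for illustration.

import java.util.*;

public final class AprioriSketch {
    public static void main(String[] args) {
        List<Set<String>> tx = List.of(
            Set.of("bread", "milk"),
            Set.of("bread", "milk", "eggs"),
            Set.of("milk", "eggs"),
            Set.of("bread", "eggs"));
        int minSupport = 2;                              // minimum support count

        // Level 1: candidate singletons drawn from all transactions.
        Set<Set<String>> candidates = new HashSet<>();
        for (Set<String> t : tx)
            for (String item : t) candidates.add(Set.of(item));

        while (!candidates.isEmpty()) {
            // Count support and keep candidates meeting the threshold.
            Map<Set<String>, Integer> counts = new HashMap<>();
            for (Set<String> c : candidates)
                for (Set<String> t : tx)
                    if (t.containsAll(c)) counts.merge(c, 1, Integer::sum);
            List<Set<String>> frequent = new ArrayList<>();
            counts.forEach((c, n) -> {
                if (n >= minSupport) {
                    frequent.add(c);
                    System.out.println(c + " support=" + n);
                }
            });
            // Join step: merge frequent k-itemsets into (k+1)-candidates.
            candidates = new HashSet<>();
            for (int i = 0; i < frequent.size(); i++)
                for (int j = i + 1; j < frequent.size(); j++) {
                    Set<String> merged = new HashSet<>(frequent.get(i));
                    merged.addAll(frequent.get(j));
                    if (merged.size() == frequent.get(i).size() + 1)
                        candidates.add(merged);
                }
        }
    }
}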


2019 · Vol 16 (9) · pp. 3824-3829
Author(s): Deepak Ahlawat, Deepali Gupta

Due to advances in the technological world, there has been a great surge in data, generated mainly by social websites, Internet sites, and similar sources. Large data files are combined to create a big data architecture, and managing data files at such volume is not easy, so modern techniques have been developed to manage bulk data. To organize and utilize such big data, the Hadoop Distributed File System (HDFS) architecture was presented in early 2015; this architecture is used when traditional methods are insufficient to manage the data. In this paper, a novel clustering algorithm is implemented to manage a large amount of data. The concepts and frameworks of Big Data are studied, and a novel algorithm is developed using K-means together with cosine-similarity-based clustering. The developed clustering algorithm is evaluated using precision and recall, and the results obtained successfully address the big data management issue.
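The abstract combines K-means with cosine-based similarity; a minimal sketch of the assignment step under that reading, with hypothetical helper names and toy vectors, might look as follows: each vector joins the centroid of highest cosine similarity.

public final class CosineAssign {
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-12);  // guard zero vectors
    }

    // Returns the index of the centroid most similar to the point.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestSim = -1;
        for (int i = 0; i < centroids.length; i++) {
            double s = cosine(point, centroids[i]);
            if (s > bestSim) { bestSim = s; best = i; }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centroids = {{1, 0}, {0, 1}};
        System.out.println(nearest(new double[]{0.9, 0.1}, centroids)); // prints 0
    }
}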


Author(s): Yao Wu, Long Zheng, Brian Heilig, Guang R Gao

As the attention given to big data grows, cluster computing systems for distributed processing of large data sets have become mainstream and a critical requirement in high-performance distributed systems research. One of the most successful systems is Hadoop, which uses MapReduce as its programming/execution model and uses disks as intermediate storage for processing huge volumes of data. Spark, an in-memory computing engine, solves iterative and interactive problems more efficiently. However, there is now a consensus that neither is the final solution for big data, owing to the MapReduce-like programming model, the synchronous execution model, the restriction to batch processing, and so on. A new solution, indeed a fundamental evolution, is needed to bring big data processing into a new era. In this paper, we introduce a new cluster computing system called HAMR, which supports both batch and stream processing. To achieve better performance, HAMR integrates high-performance computing approaches, i.e., dataflow fundamentals, into a big data solution. More specifically, HAMR is designed entirely around in-memory computing to avoid unnecessary disk access overhead; task scheduling and memory management are fine-grained to expose more parallelism; and asynchronous execution improves the efficiency of compute resource usage while balancing the workload better across the whole cluster. Experimental results show that HAMR can outperform Hadoop MapReduce and Spark by up to 19x and 7x, respectively, in the same cluster environment. Furthermore, HAMR can scale to data sizes well beyond the capabilities of Spark.
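HAMR's internals are not given in the abstract; as a language-level illustration of asynchronous, dataflow-style execution as opposed to stage-by-stage barriers, the sketch below chains stages with Java's CompletableFuture so that each partition flows downstream as soon as it is ready. All names are invented; this is not HAMR's API.

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public final class AsyncDataflowSketch {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<String> partitions = List.of("part-0", "part-1", "part-2");

        // Each partition runs parse -> transform -> sink independently;
        // no partition waits for the others between stages.
        List<CompletableFuture<Void>> pipelines = partitions.stream()
            .map(p -> CompletableFuture
                .supplyAsync(() -> p + ":parsed", pool)        // stage 1
                .thenApplyAsync(s -> s + ":transformed", pool) // stage 2
                .thenAcceptAsync(System.out::println, pool))   // stage 3
            .toList();

        // Only the final join waits; the stages themselves never barrier.
        CompletableFuture.allOf(pipelines.toArray(new CompletableFuture[0])).join();
        pool.shutdown();
    }
}

Contrast this with a synchronous MapReduce-style model, where every map task must finish before any reduce task starts.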


2021 · Vol 2021 · pp. 1-10
Author(s): Wang Zhouhuo

To solve the problem of classifying large-scale human resource data, this study proposes a new parallel classification algorithm for human resource big data based on the Spark platform. On the Spark platform, the algorithm performs the distance calculations and cluster-centre updates for human resource big data and defines the big data clustering process. On this basis, the K-means clustering method is introduced to mine frequent itemsets in the data and to improve the aggregation of similar data, and a fuzzy genetic algorithm is used to assess the balance of the data. The study adopts a selective ensemble method for classifiers of imbalanced human resource data in transmission, and introduces a decision contour matrix to construct an anomaly support model for the set of such classifiers. It identifies the features of human resource big data in parallel, repairs the relevance of the data, and introduces an improved ant colony algorithm, finally realizing the parallel classification algorithm. Experimental results show that the proposed algorithm has low time cost, a good classification effect, and acceptable complexity of the parallel classification rules.
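The paper's Spark code is not shown; a minimal sketch of the distance calculation and cluster-centre update on Spark's Java RDD API, with invented data and centres, could look like this: points are assigned to their nearest centre, and centres are recomputed as per-cluster means via reduceByKey.

import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public final class SparkKMeansStep {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("kmeans-step").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            List<double[]> pts = Arrays.asList(
                new double[]{1, 1}, new double[]{1.2, 0.9},
                new double[]{8, 8}, new double[]{7.9, 8.2});
            double[][] centres = {{0, 0}, {10, 10}};     // current centres
            JavaRDD<double[]> points = sc.parallelize(pts);

            points
                // (cluster id, (point, count 1)) for each point
                .mapToPair(p -> new Tuple2<>(nearest(p, centres),
                                             new Tuple2<>(p, 1L)))
                // sum coordinates and counts per cluster
                .reduceByKey((a, b) -> new Tuple2<>(
                    new double[]{a._1[0] + b._1[0], a._1[1] + b._1[1]},
                    a._2 + b._2))
                // divide sums by counts: the updated centres
                .mapValues(s -> new double[]{s._1[0] / s._2, s._1[1] / s._2})
                .collect()
                .forEach(t -> System.out.println(
                    "centre " + t._1 + " -> " + Arrays.toString(t._2)));
        }
    }

    static int nearest(double[] p, double[][] centres) {
        int best = 0;
        double bestD = Double.MAX_VALUE;
        for (int i = 0; i < centres.length; i++) {
            double dx = p[0] - centres[i][0], dy = p[1] - centres[i][1];
            double d = dx * dx + dy * dy;
            if (d < bestD) { bestD = d; best = i; }
        }
        return best;
    }
}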


2017 · Vol 7 (1.1) · pp. 237
Author(s): MD. A R Quadri, B. Sruthi, A. D. SriRam, B. Lavanya

Java is one of the finest languages for big data because of its write-once, run-anywhere nature. Java 8 introduced features such as lambda expressions and streams, which are helpful for parallel computing. Although these features help with extracting, sorting, and filtering data from collections and arrays, problems remain: streams cannot properly process very large data sets such as big data, and there are further problems when executing in a distributed environment. The streams introduced in Java are restricted to computations inside a single system; there is no mechanism for distributed computing over multiple systems. Moreover, streams hold their data in memory and therefore cannot support huge data sets. This paper addresses the use of Java 8 for massive data in a distributed environment by extending the programming model with distributed streams. Distributed computing over large data sets can then be accomplished by introducing distributed stream frameworks.
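For concreteness, this is the kind of lambda-and-streams pipeline the abstract refers to, written against the standard Java 8 API with invented sample data; the comments note the limitation the paper targets, namely that the parallelism stays inside a single JVM.

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public final class StreamsDemo {
    public static void main(String[] args) {
        List<String> events = Arrays.asList(
            "login:alice", "login:bob", "error:disk", "login:alice", "error:net");

        // parallelStream() splits work across local cores only; the data
        // must already fit in this JVM's memory.
        Map<String, Long> countsByType = events.parallelStream()
            .filter(e -> e.contains(":"))                 // lambda predicate
            .map(e -> e.substring(0, e.indexOf(':')))     // extract event type
            .collect(Collectors.groupingBy(Function.identity(),
                                           Collectors.counting()));

        System.out.println(countsByType);                 // e.g. {error=2, login=3}
    }
}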


2019 · Vol 8 (2) · pp. 2490-2494

Big data is a new technology defined by a large amount of data, from which value can be extracted through capture and analysis. Big data faces many challenges due to features such as volume, velocity, variety, value, complexity, and performance. Many organizations face challenges in devising test strategies for structured and unstructured data validation, establishing a proper testing environment, working with non-relational databases, and maintaining functional testing; these challenges lead to low-quality data in production, delays in execution, and increased cost. MapReduce provides a parallel and scalable programming model for data-intensive business and scientific applications. The performance of a big data application is defined in terms of its response time, maximum online user data capacity, and a certain maximum processing capacity. This work proposes testing healthcare big data, which contains text, image, audio, and video files. Big data documents are tested using two concepts: preprocessing testing and post-processing testing. An SVM algorithm classifies the data from unstructured to structured format. Preprocessing testing checks all the data for accuracy and includes file size testing, file extension testing, and de-duplication testing. Post-processing applies the MapReduce concept so that the data can be fetched easily.
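The abstract names three preprocessing tests; a minimal Java sketch of such checks might look as follows, where the size limit, allowed extensions, and file names are illustrative assumptions rather than values from the paper.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class PreprocessingTests {
    private static final long MAX_SIZE_BYTES = 100L * 1024 * 1024;   // assumed limit
    private static final Set<String> ALLOWED_EXT =
        Set.of("txt", "jpg", "png", "wav", "mp4");                   // assumed types

    public static void main(String[] args) throws Exception {
        // Placeholder paths; point these at real files to run the checks.
        List<Path> files = List.of(Path.of("note.txt"), Path.of("scan.jpg"));
        Set<String> seenDigests = new HashSet<>();
        for (Path f : files) {
            System.out.printf("%s size=%b ext=%b dup=%b%n", f,
                sizeOk(f), extensionOk(f), !isNew(f, seenDigests));
        }
    }

    static boolean sizeOk(Path f) throws IOException {
        return Files.size(f) <= MAX_SIZE_BYTES;          // file size testing
    }

    static boolean extensionOk(Path f) {                 // file extension testing
        String name = f.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot >= 0 && ALLOWED_EXT.contains(name.substring(dot + 1).toLowerCase());
    }

    static boolean isNew(Path f, Set<String> seen) throws Exception {
        // De-duplication testing: identical content hashes to the same digest.
        byte[] hash = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(f));
        StringBuilder hex = new StringBuilder();
        for (byte b : hash) hex.append(String.format("%02x", b));
        return seen.add(hex.toString());
    }
}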

