CSFC: A New Centroid Based Clustering Method to Improve the Efficiency of Storing and Accessing Small Files in Hadoop

doi:10.35940/ijrte.d1014.1284s519


International Journal of Recent Technology and Engineering - 2 ◽


2020 ◽

Vol 8 (4S5) ◽

pp. 122-127

Keyword(s):

Big Data ◽

Clustering Method ◽

Number Of Clusters ◽

Clustering Techniques ◽

Clustering Technique ◽

Memory Size

Computers play a major role in day-to-day life, and as technology advances, the volume of data collected from various fields keeps increasing. These fields produce a large amount of data every second, and this data is not easy to process; such data is called Big Data. A large number of small files is also considered Big Data, and small files are difficult to store and process in Hadoop. Existing methods use merging and clustering techniques to combine small files into larger files of up to 128 MB before sending them to HDFS. The proposed system uses the CSFC (Clustering Small Files based on Centroid) clustering technique, which does not require the number of clusters to be specified in advance, because when the number of clusters is fixed beforehand, all files are forced into that limited number of clusters. In the proposed system, clusters are generated according to the number of related files in the dataset, and related files are combined in a cluster up to 128 MB. If a file is not relevant to an existing cluster, or if the cluster has reached 128 MB, a new cluster is generated and the file is stored in it. Related files are easier to process than unrelated ones. When fetching data from the DataNode, this method produces more efficient results than other clustering techniques.
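The abstract describes the grouping rule only in words. The Python sketch below is one possible reading of it, not the authors' implementation. It assumes that each file carries a numeric feature vector (the hypothetical `features` field, e.g. term frequencies), that "relevant" means cosine similarity to a cluster's centroid of at least a hypothetical `threshold` of 0.8, and that files are processed in a single greedy pass. A new cluster is opened whenever no existing cluster is both relevant and has room under the 128 MB limit, so the number of clusters is never fixed in advance.

```python
import math
from dataclasses import dataclass, field

BLOCK_LIMIT_MB = 128  # combined-size limit per cluster, as stated in the abstract


@dataclass
class SmallFile:
    name: str
    size_mb: float
    features: tuple  # hypothetical feature vector (assumption, e.g. term frequencies)


@dataclass
class Cluster:
    files: list = field(default_factory=list)
    total_mb: float = 0.0
    centroid: tuple = ()

    def add(self, f):
        """Add a file and update the running-mean centroid."""
        n = len(self.files)
        if n == 0:
            self.centroid = tuple(f.features)
        else:
            self.centroid = tuple((c * n + x) / (n + 1)
                                  for c, x in zip(self.centroid, f.features))
        self.files.append(f)
        self.total_mb += f.size_mb


def cosine(a, b):
    """Cosine similarity; assumed here as the 'relevance' measure."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def csfc_sketch(files, threshold=0.8, limit_mb=BLOCK_LIMIT_MB):
    """Hypothetical greedy single pass; the number of clusters is not fixed in advance."""
    clusters = []
    for f in files:
        # A cluster qualifies only if the file is relevant to its centroid AND still fits.
        fits = [c for c in clusters
                if c.total_mb + f.size_mb <= limit_mb
                and cosine(c.centroid, f.features) >= threshold]
        if fits:
            max(fits, key=lambda c: cosine(c.centroid, f.features)).add(f)
        else:
            new = Cluster()  # file is irrelevant to every cluster, or no cluster has room
            new.add(f)
            clusters.append(new)
    return clusters
```

For example, with files a and b (50 MB each, similar vectors), c (40 MB, also similar) and d (10 MB, unrelated), the sketch yields three clusters: {a, b}; {c}, because adding it to the first cluster would exceed 128 MB; and {d}, because it is not relevant to either existing centroid.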


Comparative Study and Analysis of Students results using Clustering Techniques

INFORMATION TECHNOLOGY IN INDUSTRY ◽

10.17762/itii.v9i2.421 ◽

2021 ◽

Vol 9 (2) ◽

pp. 835-842

Author(s):

Mrs. Bhawna Janghel, et al.

Keyword(s):

Academic Performance ◽

Comparative Study ◽

Data Clustering ◽

Quality Education ◽

Clustering Method ◽

Clustering Techniques ◽

Clustering Technique ◽

Weak Student ◽

Using Data

In this paper, a clustering method is used to measure the academic performance of students in schools from the same district. Using a data clustering technique, we can predict which school is best, identify the weak students of a particular school, and identify the results of the best school. This shows which school in the district is better according to the observed techniques. Identifying the best school will help in providing quality education.


Clustering Techniques Within Service Sector

Advances in Business Information Systems and Analytics - Applying Predictive Analytics Within the Service Sector ◽

10.4018/978-1-5225-2148-8.ch005 ◽

2017 ◽

pp. 74-87 ◽

Cited By ~ 1

Author(s):

İbrahim Yazici ◽

Ömer Faruk Beyca ◽

Selim Zaim

Keyword(s):

Data Mining ◽

Social Media ◽

Big Data ◽

Service Sector ◽

Data Availability ◽

Management Tool ◽

Clustering Method ◽

Data Mining Technique ◽

Clustering Techniques ◽

Group A

Due to the recent availability of big data in markets, processing data and making predictions from it have become more difficult, and this difficulty affects management decisions. As a result, the competitiveness of companies depends on analyzing and utilizing big data to achieve company targets. Transforming big data into business advantage has become a vital management tool across all industries. Many data mining techniques are being applied to a wide range of problems, and one of the most frequently used is clustering. Clustering techniques aim to group a set of objects into clusters such that more similar objects are in the same cluster. The main purpose of clustering techniques is segmenting, or grouping, objects. This chapter presents clustering techniques and their use within the service sector, organized by the aim of the clustering and the methodology applied. Based on the papers surveyed, the energy, social media, and banking sectors are the most frequent users of clustering techniques within the service sector, mainly for customer segmentation.


Clustering Using Cyclic Spaces of Reversible Cellular Automata

Complex Systems ◽

10.25088/complexsystems.30.2.205 ◽

2021 ◽

Vol 30 (2) ◽

pp. 205-237

Author(s):

Sukanya Mukherjee ◽


Kamalika Bhattacharjee ◽

Sukanta Das ◽


...

Keyword(s):

Cellular Automata ◽

Cellular Automaton ◽

Configuration Space ◽

Present Level ◽

Number Of Clusters ◽

Clustering Techniques ◽

Clustering Technique ◽

Previous Level ◽

Reversible Cellular Automata ◽

Number Of Cycles

This paper introduces a cycle-based clustering technique using the cyclic spaces of reversible cellular automata (CAs). Traditionally, a cluster consists of close objects, which in the case of CAs necessarily means that the objects belong to the same cycle; that is, they are reachable from each other. Each of the cyclic spaces of a cellular automaton (CA) forms a unique cluster. This paper identifies CA properties based on “reachability” that make the clustering effective. To do that, we first figure out which CA rules contribute to maintaining the minimum intracluster distance. Our CA is then designed with such rules to ensure that a limited number of cycles exist in the configuration space. An iterative strategy is also introduced that can generate a desired number of clusters by merging objects of closely reachable clusters from a previous level in the present level using a unique auxiliary CA. Finally, the performance of our algorithm is measured using some standard benchmark validation indices and compared with existing well-known clustering techniques. It is found that our algorithm is at least on a par with the best algorithms existing today on the metric of these standard validation indices.
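To make the idea of cycles as clusters concrete, here is a small Python sketch under simplifying assumptions: a uniform elementary (two-state, three-neighbour) CA on a ring of `n` cells, with rules given by their Wolfram rule numbers. The paper designs its own CA rules to keep intracluster distance low and adds an auxiliary CA for merging; this sketch does neither. It only checks that the rule is reversible on the ring (the global map is a permutation of the 2^n configurations) and then returns each cycle of the configuration space as one cluster.

```python
from itertools import product


def step(config, rule):
    """One synchronous update of an elementary CA on a ring (periodic boundary)."""
    n = len(config)
    # Neighbourhood (left, centre, right) read as a 3-bit index into the rule number.
    return tuple(
        (rule >> (4 * config[i - 1] + 2 * config[i] + config[(i + 1) % n])) & 1
        for i in range(n)
    )


def cycle_clusters(n, rule):
    """Partition all 2**n configurations into the cycles of a reversible CA."""
    configs = list(product((0, 1), repeat=n))
    image = {c: step(c, rule) for c in configs}
    # Reversible on this ring means the global map is a bijection.
    if len(set(image.values())) != len(configs):
        raise ValueError(f"rule {rule} is not reversible on a ring of {n} cells")
    seen, clusters = set(), []
    for start in configs:
        if start in seen:
            continue
        # Follow the permutation until it returns to an already-visited state.
        cycle, c = [], start
        while c not in seen:
            seen.add(c)
            cycle.append(c)
            c = image[c]
        clusters.append(cycle)
    return clusters
```

For rule 170 (each cell copies its right neighbour, i.e. a cyclic shift), the cycles of a 4-cell ring are the six rotation classes of binary strings of length 4; every configuration in a cluster is reachable from every other one.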


Research on Renewal Probability Problem of Applying Clustering Method Under Big Data

2021 IEEE International Conference on Consumer Electronics and Computer Engineering (ICCECE) ◽

10.1109/iccece51280.2021.9342508 ◽

2021 ◽

Author(s):

Ying Miao ◽

Xinlei Zhao ◽

Jiankai Zuo ◽

Zhongzhi Li ◽

Yilin Yan ◽

...

Keyword(s):

Big Data ◽

Clustering Method ◽

Probability Problem


A survey of clustering techniques for big data analysis

2014 5th International Conference - Confluence The Next Generation Information Technology Summit (Confluence) ◽

10.1109/confluence.2014.6949256 ◽

2014 ◽

Cited By ~ 19

Author(s):

Saurabh Arora ◽

Inderveer Chana

Keyword(s):

Big Data ◽

Data Analysis ◽

Big Data Analysis ◽

Clustering Techniques


Detection of Jihadism in Social Networks Using Big Data Techniques Supported by Graphs and Fuzzy Clustering

Complexity ◽

10.1155/2019/1238780 ◽

2019 ◽

Vol 2019 ◽

pp. 1-13

Author(s):

Cristina Sánchez-Rebollo ◽

Cristina Puente ◽

Rafael Palacios ◽

Claudia Piriz ◽

Juan P. Fuentes ◽

...

Keyword(s):

Social Networks ◽

Big Data ◽

Fuzzy Clustering ◽

Extraction Techniques ◽

Public Database ◽

New Members ◽

Terrorist Organizations ◽

Clustering Techniques ◽

Level Of Activity ◽

Data Architecture

Social networks are being used by terrorist organizations to distribute messages with the intention of influencing people and recruiting new members. The research presented in this paper focuses on the analysis of Twitter messages to detect the leaders orchestrating terrorist networks and their followers. A big data architecture is proposed to analyze messages in real time in order to classify users according to different parameters, such as level of activity, the ability to influence other users, and the contents of their messages. Graphs have been used to analyze how messages propagate through the network, which involves a study of the followers based on retweets and general impact on other users. Then, fuzzy clustering techniques were used to classify users into profiles, with the advantage over other classification techniques of providing a probability for each profile instead of a binary categorization. The algorithms were tested using a public database from Kaggle and other Twitter extraction techniques. The profiles detected automatically by the system were manually analyzed, and the parameters that describe each profile correspond to the type of information that an expert would expect. Future applications are not limited to detecting terrorist activism. Human resources departments can apply profile identification to automatically classify candidates, security teams can detect undesirable clients in the financial or insurance sectors, and immigration officers can extract additional insights with these techniques.
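The abstract does not name the fuzzy clustering algorithm used. Fuzzy c-means is the standard example, so the Python sketch below assumes it: each user (here just a numeric point, for instance activity and influence scores) gets a membership degree in every cluster, and a point's degrees sum to 1, which is what allows a per-profile score instead of a binary label. The fuzzifier `m = 2`, the iteration count, and initialising centres from evenly spaced input points are illustrative choices, not the paper's settings.

```python
import math


def memberships(points, centers, m):
    """Fuzzy c-means membership degrees; each row sums to 1."""
    exp = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        if 0.0 in d:
            # A point lying exactly on a centre belongs fully to that cluster.
            hit = d.index(0.0)
            rows.append([1.0 if j == hit else 0.0 for j in range(len(centers))])
        else:
            rows.append([1.0 / sum((d[j] / d[k]) ** exp for k in range(len(centers)))
                         for j in range(len(centers))])
    return rows


def update_centers(points, u, m):
    """Each centre is the mean of all points weighted by membership**m."""
    centers = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)
        centers.append(tuple(
            sum(wi * p[k] for wi, p in zip(w, points)) / total
            for k in range(len(points[0]))
        ))
    return centers


def fuzzy_c_means(points, c, m=2.0, iters=50):
    """Alternate membership and centre updates for a fixed number of rounds."""
    step = len(points) // c
    centers = [points[j * step] for j in range(c)]  # illustrative initialisation
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)
```

Strictly speaking, fuzzy c-means memberships are degrees of belonging rather than probabilities from a statistical model, but they behave like a per-profile distribution in that each user's scores are non-negative and sum to 1.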


TCLUST: Trimming Approach of Robust Clustering Method

Malaysian Journal of Fundamental and Applied Sciences ◽

10.11113/mjfas.v8n4.154 ◽

2014 ◽

Vol 8 (4) ◽

Author(s):

Muhamad Alias Md. Jedi ◽

Robiah Adnan

Keyword(s):

Clustering Algorithm ◽

Likelihood Function ◽

R Package ◽

Clustering Method ◽

Number Of Clusters ◽

Robust Clustering ◽

Scatter Matrix ◽

Group Assignment ◽

Log Likelihood ◽

Clustering Approach

TCLUST is a statistical clustering method based on a modification of the trimmed k-means clustering algorithm. It is called a “crisp” clustering approach because each observation is either eliminated (trimmed) or assigned to a group. TCLUST strengthens the group assignment by imposing constraints on the cluster scatter matrices. The emphasis of this paper is the restriction on the eigenvalues λ of the scatter matrices. The constraints are imposed in order to maximize the log-likelihood function of the spurious-outlier model. A review of different robust clustering approaches is presented as a comparison to TCLUST. This paper discusses the nature of the TCLUST algorithm, how to determine the number of clusters (groups) properly, and how to measure the strength of group assignment. Finally, the paper shows how the TCLUST R package implements the types of scatter restriction, making the algorithm more flexible in choosing the number of clusters and the trimming proportion.
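TCLUST builds on trimmed k-means. The Python sketch below shows only that underlying trimmed k-means step, not TCLUST itself: at each iteration, points are assigned to their nearest centre, the fraction `alpha` of points farthest from their centre is trimmed, and centres are recomputed from the retained points. It deliberately omits TCLUST's eigenvalue constraints on the scatter matrices and its likelihood-based formulation; the deterministic initialisation and the fixed iteration count are assumptions made for brevity.

```python
import math


def trimmed_k_means(points, k, alpha, iters=20):
    """Plain trimmed k-means: no TCLUST scatter-matrix constraints."""
    n = len(points)
    n_trim = int(alpha * n)  # number of points discarded each round
    step = n // k
    centers = [points[j * step] for j in range(k)]  # simple deterministic start (assumption)
    labels, trimmed = [], set()
    for _ in range(iters):
        # Assign every point to its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        dist = [math.dist(p, centers[labels[i]]) for i, p in enumerate(points)]
        # Trim the n_trim points that lie farthest from their centre.
        trimmed = set(sorted(range(n), key=lambda i: dist[i], reverse=True)[:n_trim])
        # Recompute each centre from its retained (untrimmed) members only.
        for j in range(k):
            members = [points[i] for i in range(n) if labels[i] == j and i not in trimmed]
            if members:  # an empty cluster keeps its previous centre
                centers[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    # Crisp output: each observation is either assigned to a group or trimmed (label -1).
    return centers, [(-1 if i in trimmed else labels[i]) for i in range(n)]
```

The `-1` label mirrors the "crisp" assignment described above: every observation ends up either in exactly one group or eliminated as an outlier.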


A Secure Clustering Technique for Unstructured and Uncertain Big Data

Advances in Intelligent Systems and Computing - Progress in Advanced Computing and Intelligent Engineering ◽

10.1007/978-981-10-6875-1_45 ◽

2017 ◽

pp. 459-466

Author(s):

Md Tabrez Nafis ◽

Ranjit Biswas

Keyword(s):

Big Data ◽

Clustering Technique


Spreading Activation Connectivity Based Approach to Network Clustering

Advances in Wireless Technologies and Telecommunication - Graph Theoretic Approaches for Analyzing Large-Scale Social Networks ◽

10.4018/978-1-5225-2814-2.ch013 ◽

2018 ◽

pp. 207-219

Author(s):

Alexander Troussov ◽

Sergey Maruev ◽

Sergey Vinogradov ◽

Mikhail Zhizhin

Keyword(s):

Big Data ◽

Social Network ◽

Social Systems ◽

Network Models ◽

Real Data ◽

Spreading Activation ◽

Network Clustering ◽

Clustering Method ◽

Network Characteristics ◽

Structure Detection

Techno-social systems generate data that are rather different from the data traditionally studied in social network analysis and other fields. In massive social networks, agents simultaneously participate in several contexts and in different communities. Network models of many real datasets from techno-social systems reflect various dimensionalities and rationales of actors' actions and interactions. The data are inherently multidimensional, where “everything is deeply intertwingled”. The multidimensional nature of Big Data and the emergence of typical network characteristics in Big Data make it reasonable to address the challenges of structure detection in network models, including a) the development of novel methods for local overlapping clustering with outliers, b) with near-linear performance, c) preferably combined with the computation of the structural importance of nodes. In this chapter, the spreading activation connectivity-based clustering method is introduced. The viability of the approach and its advantages are demonstrated on data from VK, the largest European social network.
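The chapter's own algorithm is not given in the abstract. As a generic illustration of spreading activation (not the authors' method), the Python sketch below injects unit activation at a seed node, lets each active node pass a decayed share (hypothetical `decay` parameter) to its neighbours for a few steps, and takes the nodes whose accumulated activation reaches a hypothetical `threshold` as the seed's local cluster. Because each seed yields its own cluster, clusters from different seeds can overlap, and the work stays local to the seed's neighbourhood.

```python
from collections import defaultdict


def spread_activation(graph, seed, decay=0.5, steps=3):
    """Accumulate activation spreading outward from `seed` over an adjacency dict."""
    activation = {seed: 1.0}
    frontier = {seed: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, a in frontier.items():
            nbrs = graph.get(node, [])
            if not nbrs:
                continue
            # Split a decayed share of this node's activation evenly among its neighbours.
            share = decay * a / len(nbrs)
            for nb in nbrs:
                nxt[nb] += share
        for node, a in nxt.items():
            activation[node] = activation.get(node, 0.0) + a
        frontier = nxt
    return activation


def local_cluster(graph, seed, threshold, **kwargs):
    """Hypothetical local cluster: nodes whose total activation reaches `threshold`."""
    act = spread_activation(graph, seed, **kwargs)
    return {node for node, a in act.items() if a >= threshold}
```

On two triangles A-B-C and D-E-F joined by the single edge C-D, seeding at A with the default settings and a threshold of 0.1 returns A's own triangle, {A, B, C}.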


Big Data Clustering Techniques: Recent Advances and Survey

Machine Learning and Data Mining for Emerging Trend in Cyber Dynamics ◽

10.1007/978-3-030-66288-2_3 ◽

2021 ◽

pp. 57-79

Author(s):

Hassan Ibrahim Hayatu ◽

Abdullahi Mohammed ◽

Ahmad Barroon Isma’eel

Keyword(s):

Big Data ◽

Data Clustering ◽

Clustering Techniques ◽

Recent Advances
