Sørensen-Dice Similarity Indexing based Weighted Iterative Clustering for Big Data Analytics

Author(s):  
Kalyana Saravanan ◽  
Angamuthu Tamilarasi

Big data refers to large volumes of data, and a key task is extracting similar data points from such large datasets. Clustering is an essential data mining technique for examining large volumes of data. Several techniques have been developed for handling big datasets; however, they consume considerable time and space, and their accuracy is said to be compromised. To improve clustering accuracy with lower complexity, the Sørensen-Dice Indexing based Weighted Iterative X-means Clustering (SDI-WIXC) technique is introduced. SDI-WIXC groups similar data points with higher clustering accuracy in minimal time. First, data points are collected from the big dataset. Then, using weight values, the dataset is partitioned into ‘X’ clusters. Next, Weighted Iterative X-means Clustering (WIXC) is applied to cluster the data points based on a similarity measure: the Sørensen-Dice indexing process measures the similarity between each cluster's weight value and each data point, and when a data point is found to be similar to a cluster's weight value, it is grouped into that cluster. In addition, the WIXC method refines the cluster assignments through repeated subdivision using a Bayesian probability criterion. This helps group all data points and hence improves clustering accuracy. The experimental evaluation considers clustering accuracy, clustering time, and space complexity with respect to the number of data points. The results show that the proposed SDI-WIXC technique achieves high clustering accuracy with minimal time and space complexity.
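The Sørensen-Dice coefficient for two sets A and B is 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical sets). The sketch below shows only the similarity-based assignment step and assumes, purely for illustration, that each data point and each cluster's weight value can be represented as a set of discrete features (the hypothetical `prototypes` list). The abstract does not specify that representation, and the X-means subdivision with its Bayesian criterion is not shown.

```python
from typing import Hashable, Iterable, List, Set


def dice(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Sørensen-Dice coefficient 2|A ∩ B| / (|A| + |B|) for two sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # convention: two empty sets count as identical
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def assign(points: List[Set[Hashable]], prototypes: List[Set[Hashable]]) -> List[int]:
    """Assign each point to the index of the most Dice-similar prototype.

    `prototypes` is a hypothetical stand-in for per-cluster weight values;
    ties go to the lowest index because max() keeps the first maximum.
    """
    labels = []
    for p in points:
        scores = [dice(p, proto) for proto in prototypes]
        labels.append(max(range(len(prototypes)), key=lambda i: scores[i]))
    return labels


prototypes = [{"a", "b", "c"}, {"x", "y"}]
points = [{"a", "b"}, {"x", "z"}, {"c", "y"}]
print(assign(points, prototypes))  # [0, 1, 1]
```

The third point overlaps both prototypes by one feature, but it lands in cluster 1 because Dice normalises by set size (0.5 against the two-element prototype versus 0.4 against the three-element one).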

Author(s):  
Shweta Kumari

In a business enterprise, an enormous amount of data is generated or processed daily through different data points, and this volume is increasing day by day. Such data is difficult to handle with traditional applications like Excel or similar tools, so big data analytics and its environment may be helpful in the current scenario. This paper discusses ways of managing big data and the impact of computational methodologies. It also covers the domains and areas of applicability, and it explores the scenarios in which computational methods apply, together with their conceptual design, based on previous literature. Machine learning, artificial intelligence, and data mining techniques are discussed for the same environment based on related studies.


2019 ◽  
Vol 8 (3) ◽  
pp. 8178-8184

Big data contains massive amounts of information that are difficult to manage, acquire, store, and analyze. Clustering data is a demanding problem in big data analytics. Existing clustering techniques do not perform efficiently, and their time complexity is high. Furthermore, reducing the dimensionality of big data has not been addressed effectively. To overcome these limitations, a Moore Data Clustering based Bloom Hash Storage (MDC-BHS) technique is proposed. The MDC-BHS technique is designed to reduce the dimensionality of big data in less time through clustering. It uses the Moore Data Clustering (MDC) model to group the data in a big dataset with minimal time consumption. After clustering, it employs the Bloom Hash Storage (BHS) model to store the clustered data with minimal space complexity. The BHS model is a space-efficient probabilistic data structure that uses a hashing function to create hash values for clustered data. As a result, the proposed MDC-BHS technique significantly reduces the dimensionality of larger datasets. The MDC-BHS technique is evaluated experimentally on weather data in terms of clustering time, clustering accuracy, and space complexity with respect to the amount of data. The experimental results demonstrate that the MDC-BHS technique improves clustering accuracy and minimizes space complexity compared with state-of-the-art works.
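A space-efficient probabilistic structure built on hashing is most commonly a Bloom filter: an m-bit array in which each stored item sets k hash-derived bits, so membership queries can return false positives but never false negatives. The sketch below is a generic Bloom filter, not the paper's BHS model, whose internals the abstract does not describe. Deriving k positions from one SHA-256 digest by double hashing is an implementation choice, and the record keys are hypothetical.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter: each item sets k positions in an m-bit array."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # m bits packed into bytes

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit values
        # taken from a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


bf = BloomFilter(num_bits=1024, num_hashes=3)
for key in ["cluster-0:record-17", "cluster-1:record-42"]:  # hypothetical keys
    bf.add(key)
print("cluster-0:record-17" in bf)  # True
```

The filter stores only bits, not the records themselves, which is where the space saving comes from; the trade-off is that a lookup for an item never added may occasionally report `True`.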


Author(s):  
Francisca Vale Lima ◽  
Carlos Costa ◽  
Maribel Yasmina Santos

The large volume of data constantly being generated creates the need to extract useful patterns, trends, or insights from it, raising interest in business intelligence and big data analytics. The volume, velocity, and variety of data highlight the need for concepts such as real-time big data warehouses (RTBDWs). The lack of guidelines and methodological approaches for implementing these systems calls for further research on this recent topic. This chapter proposes an RTBDW architecture that includes the main components and data flows needed to collect, process, store, and analyze the available data, integrating streaming with batch data and enabling real-time decision making. Using Twitter data, several technologies were evaluated to understand their performance. The results were satisfactory and allowed the identification of a methodological approach that can be followed to implement this type of system.
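One common way to integrate streaming with batch data is to answer queries from a periodically recomputed batch view plus an incrementally updated real-time view. The sketch below illustrates only that general pattern with hypothetical hashtag counts; it is not the chapter's architecture, and the class and method names are invented for illustration. It also assumes that each batch run covers every event streamed so far, which is what allows the real-time view to be reset.

```python
from collections import Counter
from typing import Iterable


class HashtagView:
    """Hypothetical serving view merging a batch layer with a streaming layer."""

    def __init__(self) -> None:
        self.batch: Counter = Counter()     # counts recomputed from full history
        self.realtime: Counter = Counter()  # counts since the last batch run

    def ingest_stream(self, hashtags: Iterable[str]) -> None:
        # Streaming path: update incrementally as events arrive.
        self.realtime.update(hashtags)

    def run_batch(self, history: Iterable[str]) -> None:
        # Batch path: recompute from the full history. Assumes the history
        # already includes every streamed event, so the stream view resets.
        self.batch = Counter(history)
        self.realtime.clear()

    def query(self, tag: str) -> int:
        # Serving path: combine both views (Counter returns 0 for unseen tags).
        return self.batch[tag] + self.realtime[tag]


view = HashtagView()
view.run_batch(["#bigdata", "#bigdata", "#ai"])
view.ingest_stream(["#bigdata", "#iot"])
print(view.query("#bigdata"), view.query("#iot"))  # 3 1
```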


Author(s):  
Stephen Dass ◽  
Prabhu J.

This chapter describes how, in the digital data era, a large volume of data has become accessible to data science engineers. With the rapid growth in networking, communication, storage, and data collection capability, Big Data science is growing quickly in every engineering and science domain. This paper aims to study the various analytics methods and tools that can be applied to Big Data. Its main focus is a step-by-step process for handling large volumes and varieties of data efficiently. Rapidly evolving big data tools and platforms have given rise to numerous technologies that influence different Big Data portfolios. In this paper, we discuss analysis tools, processing tools, and querying tools for Big Data in detail. These tools support numerous tasks, such as data capture, storage, classification, sharing, analysis, transfer, search, imaging, and decision making, which also apply to Big Data.


2022 ◽  
pp. 1527-1548
Author(s):  
Stephen Dass ◽  
Prabhu J.



2020 ◽  
Vol 8 (10) ◽  
pp. 15-23
Author(s):  
Michail Angelopoulos ◽  
Christina Kontakou

Industries and utilities increasingly explore case studies and expert advice on the possibilities of intelligent business administration and Big Data analytics. The movement to integrate and meaningfully interpret huge reservoirs of data points is spreading beyond cutting-edge utilities into the mainstream. This project explores the application of Big Data analytics in the energy sector. Within the project’s framework, already developed and operational datasets will be combined with supplemental data from publicly available sources, including prior energy efficiency program participation, enrollment in other energy programs and services, geographic data, customer equipment profiles, demographics, and psychographic customer segmentation categories. Developing and integrating Big Data analytics poses significant technical, budgetary, and project management challenges. There are also organizational challenges in managing change, making decisions within a cooperative framework, and grounding project goals and strategies in the customer experience. The ultimate goal of the project is to develop a reliable database that moves a step closer to energy efficiency.
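Combining operational datasets with supplemental public data typically amounts to a left join on a shared customer identifier: every operational record is kept, and supplemental attributes are attached where a match exists. The sketch below illustrates that step only; the `customer_id` key, the field names, and the rule that operational values win on conflicts are all assumptions, not details from the project.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def enrich(operational: List[Record], supplemental: List[Record],
           key: str = "customer_id") -> List[Record]:
    """Left join: keep every operational record and add supplemental fields
    whose key matches. On a field-name conflict the operational value wins;
    if the supplemental data repeats a key, the last row for it is used."""
    lookup = {row[key]: row for row in supplemental}
    enriched = []
    for row in operational:
        merged = dict(row)  # copy so the input records are not mutated
        for field, value in lookup.get(row[key], {}).items():
            merged.setdefault(field, value)
        enriched.append(merged)
    return enriched


# Hypothetical records: metered consumption plus public program/geography data.
operational = [{"customer_id": 1, "kwh": 320}, {"customer_id": 2, "kwh": 410}]
supplemental = [{"customer_id": 1, "prior_program": True, "region": "north"}]
rows = enrich(operational, supplemental)
print(rows[0])  # {'customer_id': 1, 'kwh': 320, 'prior_program': True, 'region': 'north'}
print(rows[1])  # {'customer_id': 2, 'kwh': 410}
```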


2018 ◽  
Vol 7 (4.5) ◽  
pp. 689
Author(s):  
Sarada. B ◽  
Vinayaka Murthy. M ◽  
Udaya Rani. V

Nowadays, data is increasing exponentially in terms of velocity, variety, and volume, which is known as Big Data. When a dataset has a small number of dimensions, a limited number of clusters, and few data points, existing traditional clustering algorithms give the expected results. In the Big Data age, however, traditional clustering algorithms cannot deliver the expected results on large datasets. There is therefore a need for a new approach that offers better accuracy and computational time for processing large volumes of data. The proposed system architecture combines the canopy, K-means, and RK sorting algorithms on the Hadoop MapReduce framework. The analysis shows that large volumes of data are processed with less computational time and higher accuracy, and that RK sorting requires neither element swaps nor stack space.
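In its common formulation, canopy clustering makes a cheap pass with two distance thresholds T1 > T2: each canopy center collects every remaining point within T1, and points within T2 are removed from further consideration. The canopy centers can then seed K-means. The single-machine sketch below shows that canopy-then-K-means combination under those assumptions; it does not reproduce the paper's MapReduce distribution or the RK sorting step, which the abstract does not define. It uses Euclidean distance via `math.dist` (Python 3.8+), and the thresholds and sample points are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def canopy(points: Sequence[Point], t1: float, t2: float) -> List[Tuple[Point, List[Point]]]:
    """Canopy clustering with loose threshold t1 and tight threshold t2."""
    if not t1 > t2 >= 0:
        raise ValueError("canopy clustering requires t1 > t2 >= 0")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining[0]
        # Loose membership: every remaining point within t1 joins this canopy.
        members = [p for p in remaining if math.dist(p, center) <= t1]
        canopies.append((center, members))
        # Tight removal: points within t2 (including the center) stop being candidates.
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return canopies


def kmeans(points: Sequence[Point], seeds: Sequence[Point], max_iter: int = 100):
    """Lloyd's K-means started from the given seed centers."""
    centers = [tuple(c) for c in seeds]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: mean of assigned points; an empty cluster keeps its center.
        new_centers = []
        for j, c in enumerate(centers):
            group = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(tuple(sum(col) / len(group) for col in zip(*group)) if group else c)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
seeds = [center for center, _ in canopy(points, t1=3.0, t2=2.0)]
centers, labels = kmeans(points, seeds)
print(len(seeds), labels)  # 2 [0, 0, 0, 1, 1, 1]
```

The canopy pass fixes the number of clusters and gives K-means reasonable starting centers, so the expensive iterative step starts close to a good solution instead of from random seeds.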


Author(s):  
Manbir Sandhu ◽  
Purnima, Anuradha Saini

Big data is a fast-growing technology with the scope to mine huge amounts of data for use in various analytic applications. With large amounts of data streaming in from a myriad of sources, such as social media, online transactions, and ubiquitous smart devices, Big Data is garnering attention from stakeholders across academia, banking, government, health care, manufacturing, and retail. Big Data refers to enormous amounts of data generated from disparate sources, together with the analytic techniques used to examine this voluminous data for predictive trends and patterns, to exploit new growth opportunities, to gain insight, to make informed decisions, and to optimize processes. Data-driven decision making is the essence of business establishments. The explosive growth of data is steering business units to tap the potential of Big Data to fuel growth and gain a competitive edge over their competitors. This overwhelming generation of data also brings its share of concerns. This paper discusses the concept of Big Data, its characteristics, the tools and techniques organizations deploy to harness its power, and the daunting issues that hinder the adoption of Business Intelligence in Big Data strategies in organizations.


2019 ◽  
Vol 54 (5) ◽  
pp. 20
Author(s):  
Dheeraj Kumar Pradhan

2020 ◽  
Vol 49 (5) ◽  
pp. 11-17
Author(s):  
Thomas Wrona ◽  
Pauline Reinecke

Big Data & Analytics (BDA) has become a largely unquestioned institution for corporate efficiency and competitive advantage. Too many prominent examples, such as the success of Google or Amazon, seem to confirm the importance of data and algorithms for achieving long-term competitive advantages. Both practice and academia seem to be jumping almost euphorically onto the “data train”. When risks are addressed at all, they are mostly ethical questions. What is often overlooked is that the advantages under discussion arise primarily from an operational efficiency perspective. Strategic effects are discussed, if at all, in relation to business model innovations whose actual degree of innovation has yet to be assessed. The following aims to show that, although BDA can create competitive advantages, it also entails major strategic risks that currently receive little attention.

