CoPart: A Context-based Partitioning Technique for Big Data

2020 ◽  
Author(s):  
Sara Migliorini ◽  
Alberto Belussi ◽  
Elisa Quintarelli ◽  
Damiano Carra

Abstract The MapReduce programming paradigm is frequently used to process and analyse huge amounts of data. This paradigm relies on the ability to apply the same operation in parallel on independent chunks of data. As a consequence, overall performance depends greatly on the way data are partitioned among the various computation nodes. The default partitioning technique provided by systems like Hadoop or Spark basically performs a random subdivision of the input records, without considering their nature or the correlation between them. Even if such an approach can be appropriate in the simplest case, where all the input records always have to be analysed, it becomes a limitation for sophisticated analyses in which correlations between records can be exploited to preliminarily prune unnecessary computations. In this paper we propose a partitioning technique which exploits the notion of context for partitioning data. We design a context-based multi-dimensional partitioning technique, called CoPart, which considers not only the correlation of data w.r.t. contextual attributes, but also the distribution of each contextual dimension in the dataset. We experimentally compare our approach with existing ones, considering both quality criteria and query execution times.

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Sara Migliorini ◽  
Alberto Belussi ◽  
Elisa Quintarelli ◽  
Damiano Carra

Abstract The MapReduce programming paradigm is frequently used to process and analyse huge amounts of data. This paradigm relies on the ability to apply the same operation in parallel on independent chunks of data. As a consequence, overall performance depends greatly on the way data are partitioned among the various computation nodes. The default partitioning technique provided by systems like Hadoop or Spark basically performs a random subdivision of the input records, without considering their nature or the correlation between them. Even if such an approach can be appropriate in the simplest case, where all the input records always have to be analysed, it becomes a limitation for sophisticated analyses in which correlations between records can be exploited to preliminarily prune unnecessary computations. In this paper we design a context-based multi-dimensional partitioning technique, called CoPart, which takes data correlation into account in order to determine how records are subdivided between splits (i.e., units of work assigned to a computation node). More specifically, it considers not only the correlation of data w.r.t. contextual attributes, but also the distribution of each contextual dimension in the dataset. We experimentally compare our approach with existing ones, considering both quality criteria and query execution times.
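The pruning idea in the abstract above can be illustrated with a minimal sketch. It is not the CoPart algorithm: it groups records by a single made-up contextual attribute (`season`) and ignores the balancing of each dimension's distribution that the abstract mentions. The dataset, the attribute name and all function names are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def random_partition(records, num_splits, seed=0):
    """Assign each record to a split uniformly at random (context-agnostic)."""
    rng = random.Random(seed)
    splits = defaultdict(list)
    for rec in records:
        splits[rng.randrange(num_splits)].append(rec)
    return dict(splits)


def context_partition(records, key):
    """Group records by the value of one contextual attribute."""
    splits = defaultdict(list)
    for rec in records:
        splits[rec[key]].append(rec)
    return dict(splits)


def splits_to_scan(splits, key, wanted):
    """Return ids of splits holding at least one record with rec[key] == wanted."""
    return [
        split_id
        for split_id, recs in splits.items()
        if any(rec[key] == wanted for rec in recs)
    ]


# Hypothetical dataset: 1,000 records tagged with an invented "season" context.
seasons = ["winter", "spring", "summer", "autumn"]
records = [{"id": i, "season": seasons[i % 4]} for i in range(1000)]

by_random = random_partition(records, num_splits=4)
by_context = context_partition(records, key="season")

# Splits that a query restricted to season == "summer" would have to read.
random_hits = splits_to_scan(by_random, "season", "summer")
context_hits = splits_to_scan(by_context, "season", "summer")
print(len(random_hits), len(context_hits))
```

Under the context-based grouping, a query restricted to one season only needs the single split for that season, whereas random assignment scatters the matching records so that every split must be read. A real partitioner would also have to keep splits balanced when contextual values are skewed, which this sketch does not attempt.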


Author(s):  
Manbir Sandhu ◽  
Purnima, Anuradha Saini

Big Data is a fast-growing technology with the scope to mine huge amounts of data for use in various analytic applications. With large amounts of data streaming in from a myriad of sources, including social media, online transactions and ubiquitous smart devices, Big Data is garnering attention across stakeholders in academia, banking, government, health care, manufacturing and retail. Big Data refers to an enormous amount of data generated from disparate sources, along with the data analytic techniques used to examine this voluminous data for predictive trends and patterns, to exploit new growth opportunities, to gain insight, to make informed decisions and to optimize processes. Data-driven decision making is the essence of business establishments. The explosive growth of data is steering business units to tap the potential of Big Data to fuel growth and gain an edge over their competitors. The overwhelming generation of data brings with it its share of concerns. This paper discusses the concept of Big Data, its characteristics, the tools and techniques deployed by organizations to harness the power of Big Data, and the daunting issues that hinder the adoption of Business Intelligence in Big Data strategies in organizations.


MedienJournal ◽  
2017 ◽  
Vol 38 (4) ◽  
pp. 50-61 ◽  
Author(s):  
Jan Jagodzinski

This paper will first briefly map out the shift from disciplinary to control societies (what I call designer capitalism; the idea of control comes from Gilles Deleuze) in relation to surveillance and the mediation of life through screen cultures. The paper then shifts to the issues of digitalization in relation to big data, which risk continuing to close off life as zoë, that is, life that is creative rather than captured via attention technologies through marketing techniques and surveillance. The last part of this paper then develops the way artists are able to resist the big data archive by turning the data in on itself to offer viewers and participants a glimpse of the current state of manipulating desire and maintaining copyright in order to keep the future closed rather than potentially open.


Author(s):  
Muhammad Waqar Khan ◽  
Muhammad Asghar Khan ◽  
Muhammad Alam ◽  
Wajahat Ali

During the past few years, data has been growing exponentially, attracting researchers to work on a popular term, Big Data. Big Data is observed in various fields, such as information technology, telecommunication, theoretical computing, mathematics, data mining and data warehousing. Data science is frequently associated with Big Data, as it uses methods to scale down Big Data. Currently, more than 3.2 billion of the world's population is connected to the internet, of which 46% are connected via smartphones. Over 5.5 billion people use cell phones. As technology is rapidly shifting from ordinary cell phones towards smartphones, the proportion of internet use is also growing. It is forecast that by 2020 around 7 billion people around the globe will be using the internet, of which 52% will use their smartphones to connect. In 2050 that figure will reach 95% of the world population. Every device connected to the internet generates data. As the majority of this data is generated by smartphones using applications such as Instagram, WhatsApp, Apple, Google, Google+, Twitter, Flickr etc., this huge amount of data is becoming a big threat to the telecom sector. This paper gives a comparison of the amount of Big Data generated by the telecom industry. Based on the collected data, we use forecasting tools to predict the amount of Big Data that will be generated in the future and also identify the threats that the telecom industry will face from that huge amount of Big Data.


Author(s):  
M Rajeshwari ◽  
A Amirthavalli

In Tamil Nadu, alongside Hinduism and Buddhism, Jainism is one of the three oldest Indian religious traditions still in existence and an integral part of South Indian religious belief and practice. While often employing concepts shared with Hinduism and Buddhism, the result of a common cultural and linguistic background, the Jain tradition must be regarded as an independent phenomenon rather than as a Hindu sect or a Buddhist heresy, as some earlier Western scholars believed. In South India, Jainism is little more than a name. Even serious students of religion in India have paid little attention to it. In a population of nearly 60 crores of people, Jainas may constitute nearly 3 million people. Jainism is the religion of the Jains, who follow the path preached and practised by the Jinas. It is a fully developed and well-established religion and social system that emerged from the 6th century BC. The characteristic feature of this religion is its claim to universality, which it holds essentially in opposition to Brahmanism. It may be said that over the last 2500 years the Jains have contributed a great deal to every sphere of life of the Indian people, both as a religion and as a philosophy. They contributed much to the areas of culture, language, trade and agriculture; in short, the Jains opened up a new era of human ideas and thought. In Indian history, attempts have been made to study Jainism as a religion and its contributions, but attention to the Jain migration into Tamil Nadu and its impacts is limited. This study attempts to explore the historical geography of the Jain centres in Tamil Nadu.


Author(s):  
Vinay Kumar ◽  
Arpana Chaturvedi

With the advent of Social Networking Sites (SNS), large volumes of data are generated daily. Most of these data are multimedia and unstructured, and they are growing exponentially. This exponential growth in the variety, volume and complexity of structured and unstructured data leads to the concept of big data. Managing big data and harnessing its benefits is a real challenge. With increasing access to big data repositories for various applications, security and access control is another aspect that needs to be considered while managing big data. We discuss the areas of application of big data, the opportunities it provides and the challenges faced in managing such huge amounts of data for various applications. Issues related to security against the different perceived threats to big data are also discussed.


2018 ◽  
Vol 10 (9) ◽  
pp. 3215 ◽  
Author(s):  
Pasquale Del Vecchio ◽  
Gioconda Mele ◽  
Valentina Ndou ◽  
Giustina Secundo

This paper aims to contribute to the debate on Open Innovation in the age of Big Data by shedding new light on the role that social networks can play as enabling platforms for tourists’ involvement and sources for the creation and management of valuable knowledge assets. The huge amount of data generated on social media by tourists related to their travel experiences can be a valid source of open innovation. To achieve this aim, this paper presents evidence of a digital tourism experience, through a longitudinal case study of a destination in Apulia, a Southern European region. The findings of the study demonstrate how social Big Data could open up innovation processes that could be of support in defining sustainable tourism experiences in a destination.


2016 ◽  
Vol 1 ◽  
pp. 19-19
Author(s):  
Stephen G. Odaibo ◽  
David G. Odaibo
