Analysis of big data for data-intensive applications

2014, pp. 186-215

Cited By ~ 2

Author(s): Ganesh Chandra Deka

Keyword(s): Cloud Computing, Big Data, Data Processing, Open Source, Data Storage, Big Data Processing, NoSQL Databases, Data Intensive, Huge Data, Data Intensive Applications

NoSQL databases are designed to meet the huge data storage requirements of cloud computing and big data processing. In addition to conventional RDBMS features, NoSQL databases offer many advanced features. Hence, “NoSQL” databases are popularly known as “Not only SQL” databases. A variety of NoSQL databases, each with different features for handling exponentially growing data-intensive applications, are available as both open-source and proprietary options. This chapter discusses some of the popular NoSQL databases and their features in the light of the CAP theorem.
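
The CAP trade-off used to compare NoSQL databases can be shown with a toy model. The sketch below is a minimal illustration that assumes a hypothetical two-replica key-value store. The names `ToyStore`, `Replica` and `Unavailable` are invented for this example and are not any real database's API. During a network partition, a consistency-first ("CP") store refuses a read that might be stale. An availability-first ("AP") store answers with whatever its replica holds.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when a CP-mode store refuses to serve during a partition."""


@dataclass
class Replica:
    # A hypothetical toy replica: its data plus whether it can reach its peer.
    data: dict = field(default_factory=dict)
    connected: bool = True


@dataclass
class ToyStore:
    """Two-replica key-value store; `mode` is "CP" or "AP" (illustrative only)."""
    mode: str
    a: Replica = field(default_factory=Replica)
    b: Replica = field(default_factory=Replica)

    def partition(self) -> None:
        # Simulate a network partition between the two replicas.
        self.a.connected = False
        self.b.connected = False

    def write(self, key, value) -> None:
        # Writes land on replica `a` and are copied to `b` only while connected.
        # A CP store rejects writes it cannot replicate.
        if not self.a.connected and self.mode == "CP":
            raise Unavailable("cannot replicate write during partition")
        self.a.data[key] = value
        if self.a.connected:
            self.b.data[key] = value

    def read_from_b(self, key):
        # CP: refuse a read that might be stale. AP: answer with b's copy.
        if not self.b.connected and self.mode == "CP":
            raise Unavailable("replica b may be stale")
        return self.b.data.get(key)


if __name__ == "__main__":
    ap = ToyStore("AP")
    ap.write("x", 1)
    ap.partition()
    ap.write("x", 2)
    print(ap.read_from_b("x"))  # 1, a stale value served from replica b
```

Take `ToyStore("AP")`, write `x = 1`, partition, then write `x = 2`. `read_from_b("x")` still returns the stale `1`. The same read on a partitioned `ToyStore("CP")` raises `Unavailable` instead. Real systems make this choice with far more nuance, often per operation.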


Significance of Hierarchical and Markov Clustering in Grouping Aware Data Placement for Data Intensive Applications Having Interest Locality

Scalable Computing Practice and Experience, Vol 19 (3), pp. 245-258, 2018

DOI: 10.12694/scpe.v19i3.1375

Author(s): Vengadeswaran Shanmugasundaram, Balasundaram Sadhu Ramakrishnan

Keyword(s): Big Data, Data Placement, Query Execution, Access Pattern, Clustering Techniques, Data Intensive, Markov Clustering, Default Data, Data Intensive Applications, Grouping Behavior

In this data era, massive volumes of data are generated every second in a variety of domains such as geoscience, the social web, finance, e-commerce, health care, climate modelling, physics, astronomy and government. Hadoop is well recognized as the de facto big data processing platform and is widely used in many application domains that process big data. Although it is considered an efficient solution for such complex query processing, it has its own limitations when the data to be processed exhibit interest locality. The data required for query execution follow a grouping behavior in which only part of the big data is accessed frequently. In such scenarios, the time taken to execute a query and return results increases exponentially as the amount of data grows, which leaves the user waiting a long time. The Hadoop default data placement strategy (HDDPS) does not consider such grouping behavior, so it performs inefficiently: fewer map tasks execute locally and query execution time increases. Hence, an Optimal Data Placement Strategy (ODPS) based on grouping semantics was proposed. In this paper we examine the significance of the two most promising clustering techniques, viz. Hierarchical Agglomerative Clustering (HAC) and Markov Clustering (MCL), in grouping-aware data placement for data-intensive applications having interest locality. First, the user access pattern is identified by dynamically analyzing the history log. Then both clustering techniques (HAC and MCL) are separately applied to the access pattern to obtain independent clusters. These clusters are interpreted and validated to extract the Optimal Data Groupings (ODG). Finally, the proposed strategy reorganizes the default data layouts in HDFS based on the ODG to achieve maximum parallel execution per group, subject to the Load Balancer and Rack Awareness.
Our proposed strategy is tested on a 10-node cluster spanning multiple racks, with Hadoop installed on every node, deployed on a cloud platform. The proposed strategy reduces query execution time and significantly improves data locality. It has also proved more efficient for processing massive datasets in a heterogeneous distributed environment. MCL also shows marginally better performance than HAC for queries exhibiting interest locality.
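
The grouping-aware placement pipeline described above has three steps: build the access pattern from the history log, cluster it, then reorganize the layout. The sketch below expresses these as a few functions. It is a minimal sketch under stated assumptions and is not the authors' ODPS. The following are all hypothetical choices:

- the query-log format (one set of block ids per query);
- Jaccard similarity with average-linkage HAC;
- the `threshold` value;
- the round-robin `place` step.

MCL, the Load Balancer and the Rack Awareness constraints are not modeled.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def access_sets(query_log):
    """Map each block id to the set of query indices that read it."""
    sets = {}
    for qi, blocks in enumerate(query_log):
        for blk in blocks:
            sets.setdefault(blk, set()).add(qi)
    return sets


def hac_groups(query_log, threshold=0.5):
    """Average-linkage agglomerative clustering of blocks by the Jaccard
    similarity of their access sets. Merging stops once no pair of clusters
    is at least `threshold` similar (hypothetical parameter)."""
    acc = access_sets(query_log)
    clusters = [[blk] for blk in sorted(acc)]

    def linkage(c1, c2):
        sims = [jaccard(acc[x], acc[y]) for x in c1 for y in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda t: t[1],
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # merge the most similar pair
        del clusters[j]
    return clusters


def place(groups, nodes):
    """Spread each group's blocks round-robin over the nodes, so that a query
    touching one group finds its blocks on as many distinct nodes as possible."""
    layout = {n: [] for n in nodes}
    start = 0
    for group in groups:
        for k, blk in enumerate(group):
            layout[nodes[(start + k) % len(nodes)]].append(blk)
        start += len(group)  # rotate the starting node to balance load
    return layout
```

Consider a log in which queries repeatedly read `b1`/`b2` together and `b3`/`b4`/`b5` together. `hac_groups` returns those two groups, and `place` puts each group's blocks on distinct nodes. A query confined to one group can then run its map tasks locally and in parallel.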


Decoding Big Data Analytics for Emerging Business Through Data-Intensive Applications and Business Intelligence

Big Data Analytics for Sustainable Computing - Advances in Data Mining and Database Management, pp. 66-80, 2020

DOI: 10.4018/978-1-5225-9750-6.ch004

Author(s): Vinay Kellengere Shankarnarayan

Keyword(s): Big Data, Business Intelligence, Business Models, Big Data Analytics, Research Area, Future Perspective, Massive Data, Data Intensive, Tools And Techniques, Data Intensive Applications

In recent years, big data has gained massive popularity among researchers, decision analysts, and data architects across enterprises. Big data had been just another way of saying analytics. In today's world, a company's capital lies in its big data. Consider the world's largest companies: the value they offer comes from their data, which they analyze for proactive benefit. This chapter offers insight into big data and the tools and techniques companies have adopted to deal with data problems. The authors also focus on frameworks and methodologies for handling massive data in order to make more accurate and precise decisions. The chapter begins with the current organizational scenario and what is meant by big data. Next, it outlines various challenges faced by organizations. The authors also examine big data business models, the different frameworks available and how they are categorized. Finally, the conclusion discusses the challenges and the future perspective of this research area.


ScaDS Dresden/Leipzig – A competence center for collaborative big data research

it - Information Technology, Vol 60 (5-6), pp. 327-333, 2018

DOI: 10.1515/itit-2018-0026

Cited By ~ 1

Author(s): René Jäkel, Eric Peukert, Wolfgang E. Nagel, Erhard Rahm

Keyword(s): Big Data, Heterogeneous Data, Data Sets, Data Intensive, Innovative Methods, Huge Data, Wide Range, Resource Requirements, Visualization Of Data, Data Intensive Applications

The efficient and intelligent handling of large, often distributed and heterogeneous data sets increasingly determines scientific and economic competitiveness in most application areas. Mobile applications, social networks, multimedia collections, sensor networks, data-intensive scientific experiments, and complex simulations nowadays generate a huge data deluge. Processing and analyzing these data sets with innovative methods opens up new opportunities for their exploitation and new insights. However, the resulting resource requirements usually exceed the capabilities of state-of-the-art methods for the acquisition, integration, analysis and visualization of data. These challenges are summarized under the term big data. ScaDS Dresden/Leipzig is a Germany-wide competence center for collaborative big data research. It bundles efforts to realize data-intensive applications for a wide range of use cases in science and industry. In this article, we present the basic concept of the competence center and give insights into some of its research topics.


BRPS: A Big Data Placement Strategy for Data Intensive Applications

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), 2016

DOI: 10.1109/icdmw.2016.0120

Cited By ~ 4

Author(s): Lihui Liu, Junping Song, Haibo Wang, Pin Lv

Keyword(s): Big Data, Data Placement, Data Intensive, Data Intensive Applications


Data-intensive applications, challenges, techniques and technologies: A survey on Big Data

Information Sciences, Vol 275, pp. 314-347, 2014

DOI: 10.1016/j.ins.2014.01.015

Cited By ~ 1295

Author(s): C.L. Philip Chen, Chun-Yang Zhang

Keyword(s): Big Data, Data Intensive, Data Intensive Applications


Performance-efficient Recommendation and Prediction Service for Big Data frameworks focusing on Data Compression and In-memory Data Storage Indicators

Scalable Computing Practice and Experience, Vol 22 (4), pp. 401-412, 2021

DOI: 10.12694/scpe.v22i4.1945

Author(s): Hrachya Astsatryan, Arthur Lalayan, Aram Kocharyan, Daniel Hagimont

Keyword(s): Big Data, Data Compression, Data Storage, File Systems, Large Datasets, Data Sets, MapReduce Framework, Data Intensive, Parallel Data, Data Intensive Applications

The MapReduce framework manages Big Data sets by splitting large datasets into a set of distributed blocks and processing them in parallel. Data compression and in-memory file systems are widely used in Big Data processing to reduce resource-intensive I/O operations and correspondingly improve the I/O rate. The article presents a performance-efficient, modular, configurable and robust decision-making service relying on data compression and in-memory data storage indicators. The service consists of Recommendation and Prediction modules. It predicts the execution time of a given job based on metrics and recommends the best configuration parameters to improve the performance of the Hadoop and Spark frameworks. Several CPU-intensive and data-intensive applications and micro-benchmarks have been evaluated to improve performance, including Log Analyzer, WordCount, and K-Means.
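
The MapReduce flow summarized above (split into blocks, map in parallel, shuffle, reduce) is easiest to see on WordCount, one of the benchmarks the article evaluates. This is a single-process sketch, not Hadoop or Spark code. Blocks are held in memory and zlib-compressed to mimic compressed storage, and threads stand in for parallel map tasks. All function names are hypothetical.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(text: str, lines_per_block: int = 2) -> list[bytes]:
    """Split the input into fixed-size blocks and compress each with zlib,
    standing in for compressed blocks on a distributed file system."""
    lines = text.splitlines()
    return [
        zlib.compress("\n".join(lines[i:i + lines_per_block]).encode())
        for i in range(0, len(lines), lines_per_block)
    ]


def map_task(block: bytes) -> list[tuple[str, int]]:
    # Decompress the block, then emit (word, 1) for every word in it.
    words = zlib.decompress(block).decode().lower().split()
    return [(w, 1) for w in words]


def shuffle(mapped: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    # Group intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for pairs in mapped:
        for word, one in pairs:
            groups[word].append(one)
    return groups


def reduce_task(word: str, counts: list[int]) -> tuple[str, int]:
    return word, sum(counts)


def word_count(text: str, workers: int = 4) -> dict[str, int]:
    blocks = split_into_blocks(text)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_task, blocks))               # map phase
        reduced = pool.map(lambda kv: reduce_task(*kv),
                           shuffle(mapped).items())             # reduce phase
        return dict(reduced)
```

Python has a global interpreter lock, so these threads model the task structure rather than deliver real CPU parallelism. The compression step trades CPU time for smaller blocks, which is the I/O trade-off the article's indicators are concerned with.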


MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning

Computational Intelligence and Neuroscience, Vol 2015, pp. 1-13, 2015

DOI: 10.1155/2015/297672

Cited By ~ 22

Author(s): Yang Liu, Jie Yang, Yuan Huang, Lixiong Xu, Siguang Li, ...

Keyword(s): Neural Networks, Big Data, Large Scale, Training Data, Computer Cluster, Data Intensive, Big Data Applications, The Neural Network, Computation Process, Data Intensive Applications

Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow to compute, especially when the data are large. Nowadays, big data has gained momentum in both industry and academia. To fulfill the potential of ANNs for big data applications, the computation process must be sped up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model for data-intensive applications. The parallelization considers three data-intensive scenarios: the volume of classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated on an experimental MapReduce computer cluster in terms of classification accuracy and computational efficiency.
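
A common way to parallelize neural network training with MapReduce is data parallelism. Each mapper computes a gradient over its shard of the training data, and a reducer combines the results. The paper's own parallelization scheme may differ. The sketch below is a hypothetical illustration that uses a single sigmoid neuron (logistic regression) trained by batch gradient descent. The map phase is simulated sequentially.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def map_gradient(weights, shard):
    """Mapper: summed log-loss gradient of one sigmoid neuron over one shard.
    Returns (gradient vector, sample count) so the reducer can average."""
    grad = [0.0] * len(weights)
    for x, y in shard:
        xb = x + [1.0]  # append the bias input
        err = sigmoid(sum(w * xi for w, xi in zip(weights, xb))) - y
        for k, xi in enumerate(xb):
            grad[k] += err * xi
    return grad, len(shard)


def reduce_gradients(partials):
    """Reducer: add the per-shard gradients and divide by the total count."""
    total = sum(n for _, n in partials)
    dim = len(partials[0][0])
    return [sum(g[k] for g, _ in partials) / total for k in range(dim)]


def train(shards, dim, epochs=200, lr=0.5):
    weights = [0.0] * (dim + 1)  # the last weight is the bias
    for _ in range(epochs):
        partials = [map_gradient(weights, s) for s in shards]  # map phase
        grad = reduce_gradients(partials)                      # reduce phase
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def predict(weights, x):
    z = sum(w * xi for w, xi in zip(weights, x + [1.0]))
    return 1 if sigmoid(z) >= 0.5 else 0
```

The reducer divides the summed gradients by the total sample count. Each update therefore equals the full-batch gradient step, however the training data are sharded across mappers.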


Data Intensive Cloud Computing

Big Data, pp. 639-654, 2016

DOI: 10.4018/978-1-4666-9840-6.ch029

Author(s): Jayalakshmi D. S., R. Srinivasan, K. G. Srinivasa

Keyword(s): Cloud Computing, Big Data, Cluster Computing, Resource Provisioning, Data Intensive, Scientific Value, Data Intensive Applications, Cloud Applications, Problem Data, Huge Challenge

Processing Big Data is a huge challenge for today's technology. New ways of computing need to be found, applied and analyzed to make use of Big Data and derive business and scientific value from it. Cloud computing, with its promise of seemingly infinite computing resources, is seen as the solution to this problem. Data-intensive computing on the cloud builds upon already mature parallel and distributed computing technologies such as HPC, grid and cluster computing. However, handling Big Data in the cloud presents its own challenges. In this chapter, we analyze issues specific to data-intensive cloud computing. We also study available solutions in programming models, data distribution and replication, and resource provisioning and scheduling, with reference to data-intensive applications in the cloud. Finally, we identify future research directions for enabling data-intensive applications in cloud environments.
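
Data distribution and replication in cloud storage is often rack-aware. The sketch below is modeled loosely on HDFS's default replica placement for a replication factor of three. The first replica goes on the writer's node, the second on a node in a different rack, and the third on another node in the second replica's rack. The `topology` dictionary format and the function name are hypothetical. The sketch assumes a cluster of at least three nodes with at least `replication` nodes in total.

```python
import random


def place_replicas(topology, writer, replication=3, rng=None):
    """Pick distinct nodes for `replication` copies of one block.

    topology: hypothetical format, rack name -> list of node names.
    Assumes at least three nodes and at least `replication` nodes overall.
    """
    rng = rng or random.Random(0)
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    chosen = [writer]  # 1st replica: the writer's own node (local write)

    if replication >= 2:
        # 2nd replica: a node on a different rack, to survive a rack failure.
        remote = [r for r in topology if r != rack_of[writer]]
        rack = rng.choice(remote) if remote else rack_of[writer]
        chosen.append(rng.choice([n for n in topology[rack] if n not in chosen]))

    if replication >= 3:
        # 3rd replica: another node on the 2nd replica's rack, which keeps the
        # write pipeline to one cross-rack transfer; else any unused node.
        same_rack = [n for n in topology[rack_of[chosen[1]]] if n not in chosen]
        pool = same_rack or [n for n in rack_of if n not in chosen]
        chosen.append(rng.choice(pool))

    # Any further replicas: random unused nodes.
    others = [n for n in rack_of if n not in chosen]
    rng.shuffle(others)
    chosen.extend(others[: max(0, replication - len(chosen))])
    return chosen
```

Two of the three replicas share a remote rack. This limits cross-rack write traffic while the block still survives the loss of any single rack.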


Intelligent Secure Storage Mechanism for Big Data

Webology, Vol 18 (Special Issue 01), pp. 246-261, 2021

DOI: 10.14704/web/v18si01/web18057

Author(s): K.R. Remesh Babu, K.P. Madhu

Keyword(s): Big Data, Data Storage, Big Data Analytics, Business Organizations, Storage Mechanism, Data Intensive, Secure Storage, Huge Data, Efficient Data, Data Intensive Applications

The management of big data has become more important due to the widespread adoption of the Internet of Things in various fields. Developments in technology, science, human habits, etc., generate massive amounts of data, so it is increasingly important to store these data and protect them from attacks. Big data analytics is now a hot topic. The data storage facilities provided by cloud computing have enabled business organizations to overcome the burden of huge data storage and maintenance. In addition, several distributed cloud applications help them analyze these data to make appropriate decisions. The dynamic growth of data and data-intensive applications demands an efficient, intelligent storage mechanism for big data. The proposed system analyzes IP packets for vulnerabilities and classifies data nodes as reliable or unreliable for efficient data storage. Based on the Apriori algorithm, the proposed method automatically classifies the nodes, providing an intelligent secure storage mechanism for distributed big data storage.
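
The Apriori step at the heart of the proposed classifier can be sketched as follows. The abstract does not specify the packet features or the decision rule, so everything beyond the Apriori algorithm itself is hypothetical:

- packets are modeled as sets of feature strings such as `"port:23"`;
- frequent itemsets are mined from packets previously flagged as vulnerable;
- the invented `classify_nodes` rule labels a node unreliable when enough of its packets contain a mined multi-feature pattern.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support (the fraction
    of transactions containing it) is at least `min_support`."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)

    def support(items):
        return sum(1 for t in tsets if items <= t) / n

    # Level-1 candidates: every individual item seen in any transaction.
    level = {frozenset([item]) for t in tsets for item in t}
    frequent = {}
    k = 1
    while level:
        counted = {c: support(c) for c in level}
        current = {c: s for c, s in counted.items() if s >= min_support}
        frequent.update(current)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates, then prune any
        # candidate with an infrequent (k-1)-subset (the Apriori property).
        level = {
            a | b
            for a, b in combinations(current, 2)
            if len(a | b) == k
            and all(frozenset(s) in current for s in combinations(a | b, k - 1))
        }
    return frequent


def classify_nodes(node_packets, attack_patterns, min_hits=0.5):
    """Hypothetical rule: a node is 'unreliable' when at least `min_hits` of its
    packets contain some mined pattern made of two or more features."""
    patterns = [p for p in attack_patterns if len(p) >= 2]
    labels = {}
    for node, packets in node_packets.items():
        hits = sum(1 for pkt in packets if any(p <= set(pkt) for p in patterns))
        ratio = hits / len(packets) if packets else 0.0
        labels[node] = "unreliable" if ratio >= min_hits else "reliable"
    return labels
```

Apriori's pruning step relies on a simple property: every subset of a frequent itemset must itself be frequent. This keeps candidate generation small even when each packet carries many features.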

