Performance Analysis of NoSQL and Relational Databases with CouchDB and MySQL for Application’s Data Storage

In the current context of emerging several types of database systems (relational and non-relational), choosing the type and database system for storing large amounts of data in today’s big data applications has become an important challenge. In this paper, we aimed to provide a comparative evaluation of two popular open-source database management systems (DBMSs): MySQL as a relational DBMS and, more recently, as a non-relational DBMS, and CouchDB as a non-relational DBMS. This comparison was based on performance evaluation of CRUD (CREATE, READ, UPDATE, DELETE) operations for different amounts of data to show how these two databases could be modeled and used in an application and highlight the differences in the response time and complexity. The main objective of the paper was to make a comparative analysis of the impact that each specific DBMS has on application performance when carrying out CRUD requests. To perform the analysis and to ensure the consistency of tests, two similar applications were developed in Java, one using MySQL and the other one using CouchDB database; these applications were further used to evaluate the time responses for each database technology on the same CRUD operations on the database. Finally, a comprehensive discussion based on the results of the analysis was performed that centered on the results obtained and several conclusions were revealed. Advantages and drawbacks for each DBMS are outlined to support a decision for choosing a specific type of DBMS that could be used in a big data application.

Download Full-text

Database Systems for Big Data Storage and Retrieval

Advances in Data Mining and Database Management - Handbook of Research on Big Data Storage and Visualization Techniques ◽

10.4018/978-1-5225-3142-5.ch003 ◽

2018 ◽

pp. 76-100 ◽

Cited By ~ 2

Author(s):

Venkat Gudivada ◽

Amy Apon ◽

Dhana L. Rao

Keyword(s):

Big Data ◽

Data Storage ◽

Database Systems ◽

Query Languages ◽

Research Issues ◽

Storage And Retrieval ◽

Emerging Trends ◽

Big Data Applications ◽

Data Application ◽

Big Data Application

Special needs of Big Data applications have ushered in several new classes of systems for data storage and retrieval. Each class targets the needs of a category of Big Data application. These systems differ greatly in their data models and system architecture, approaches used for high availability and scalability, query languages and client interfaces provided. This chapter begins with a description of the emergence of Big Data and data management requirements of Big Data applications. Several new classes of database management systems have emerged recently to address the needs of Big Data applications. NoSQL is an umbrella term used to refer to these systems. Next, a taxonomy for NoSQL systems is developed and several NoSQL systems are classified under this taxonomy. Characteristics of representative systems in each class are also discussed. The chapter concludes by indicating the emerging trends of NoSQL systems and research issues.

Download Full-text

Performance Evaluation of a Big Data Application on Apache Spark

10.32920/ryerson.14651544 ◽

2021 ◽

Author(s):

Jeanne Alcantara

Keyword(s):

Big Data ◽

Performance Evaluation ◽

Execution Time ◽

Apache Spark ◽

Massive Data ◽

Application Performance ◽

Data Application ◽

Size Number ◽

Big Data Application ◽

The Impact

Apache Spark enables a big data application—one that takes massive data as input and may produce massive data along its execution—to run in parallel on multiple nodes. Hence, for a big data application, performance is a vital issue. This project analyzes a WordCount application using Apache Spark, where the impact on the execution time and average utilization is assessed. To facilitate this assessment, the number of executor cores and the size of executor memory are varied across different sizes of data that the application has to process, and the different number of nodes in the cluster that the application runs on. It is concluded that different pairs (data size, number of nodes in the cluster) require different number of executor cores and different size of executor memory to obtain optimum results for execution time and average node utilization.

Download Full-text

Performance Evaluation of a Big Data Application on Apache Spark

10.32920/ryerson.14651544.v1 ◽

2021 ◽

Author(s):

Jeanne Alcantara

Keyword(s):

Big Data ◽

Performance Evaluation ◽

Execution Time ◽

Apache Spark ◽

Massive Data ◽

Application Performance ◽

Data Application ◽

Size Number ◽

Big Data Application ◽

The Impact

Download Full-text

Preliminary Benefits of Big Data in the Construction Industry: A Case Study

Proceedings of the Institution of Civil Engineers - Management Procurement and Law ◽

10.1680/jmapl.21.00027 ◽

2022 ◽

pp. 1-11

Author(s):

Bernard Tuffour Atuahene ◽

Sittimont Kanjanabootra ◽

Thayaparan Gajendran

Keyword(s):

Big Data ◽

Construction Industry ◽

Construction Projects ◽

Big Data Applications ◽

Data Application ◽

Construction Firm ◽

Big Data Application ◽

Tangible Benefit ◽

Design Construction

Big data applications consist of i) data collection using big data sources, ii) storing and processing the data, and iii) analysing data to gain insights for creating organisational benefit. The influx of digital technologies and digitization in the construction process includes big data as one newly emerging digital technology adopted in the construction industry. Big data application is in a nascent stage in construction, and there is a need to understand the tangible benefit(s) that big data can offer the construction industry. This study explores the benefits of big data in the construction industry. Using a qualitative case study design, construction professionals in an Australian Construction firm were interviewed. The research highlights that the benefits of big data include reduction of litigation amongst projects stakeholders, enablement of near to real-time communication, and facilitation of effective subcontractor selection. By implication, on a broader scale, these benefits can improve contract management, procurement, and management of construction projects. This study contributes to an ongoing discourse on big data application, and more generally, digitization in the construction industry.

Download Full-text

Hybrid Data Protection Framework to Enhance A2O Functionality in Production Database Virtualization

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f1203.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 5691-5697

Keyword(s):

Data Storage ◽

Performance Indicator ◽

Development Trend ◽

Similar Data ◽

Primary Role ◽

City Development ◽

Big Data Applications ◽

Tightly Coupled ◽

Deep Integration ◽

The Impact

Deluge of information flows in the unprecedented scenario of smart city development trend, hence prone to issues on stability, reliability and availability. Smart data storage resources are vulnerable to provide functionality Always Available Online (A2O) due to their inherent heavy dependence on System Down Time (SDT), Redundant Systems and Software Failure (RS2F) or whole/ multiple site failures. In the absence of Production Database Management Services (PDMS), duplicate deployment of similar data on disjoint but similar architecture provides a Tightly Coupled Ultimate System (TCUS), which assures A2O mutually exclusive services. In this paper, we investigated active Data Guard (aDG) and Data Guard (DG) role management or switchover for a real time transition performed for database at standby state to cope up both planned maintenance and accidental RS2F events. We expose our results for deep integration of aDGs with ODB in-terms of Fast Sync to align synchronously at an ease of zero of wait states for disk I/O and configurability to Null Data Loss (NDL). Over a large range of remote or standby databases NDL make it certain to zero failover. The impact of aDG Fast-Start Failover in the cloud proximity make sure guaranteed NDL in synchronously and near NDL protection asynchronously. Hence, avoids unusual overhead impeding disk I/O and eventually on a primary database. We observe the key performance indicator in failover does not restart the standby database for primary role resumption, but introduce cloud proximity as a new primary database and the process is performed without any intervention of manual migration. The reliability of aDG Redo is flexible across not only standby databases but also primary sites running different operating system over diverse hardware platforms. The Redo capability enables migration with minimal downtime for any transaction in the clouds, therefore adds an inevitable functionality to big data applications.

Download Full-text

Federated Learning in Big Data Application and Sharing

Fuzzy Systems and Data Mining VI - Frontiers in Artificial Intelligence and Applications ◽

10.3233/faia200721 ◽

2020 ◽

Author(s):

Jing Yang ◽

Quan Zhang ◽

Kunpeng Liu ◽

Peng Jin ◽

Guoyi Zhao

Keyword(s):

Big Data ◽

Data Privacy ◽

The State ◽

Ideal Model ◽

Local Data ◽

Data Safety ◽

Big Data Applications ◽

Data Application ◽

Big Data Application ◽

Uniform Model

In recent years, electricity big data has extensive applications in the grid companies across the provinces. However, certain problems are encountered including, the inability to generate an ideal model using the isolated data possessed by each company, and the priority concerns for data privacy and safety during big data application and sharing. In this pursuit, the present research envisaged the application of federated learning to protect the local data, and to build a uniform model for different companies affiliated to the State Grid. Federated learning can serve as an essential means for realizing the grid-wide promotion of the achievements of big data applications, while ensuring the data safety.

Download Full-text

The Impact of Big Data Applications on Supply Chain Management

Proceedings of the 6th International Asia Conference on Industrial Engineering and Management Innovation ◽

10.2991/978-94-6239-148-2_13 ◽

2015 ◽

pp. 127-135

Author(s):

Dong-xiang Zhang ◽

Bin Cheng

Keyword(s):

Big Data ◽

Supply Chain ◽

Supply Chain Management ◽

Chain Management ◽

Big Data Applications ◽

The Impact

Download Full-text

ECL-watch: A big data application performance tuning tool in the HPCC systems platform

2017 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata.2017.8258263 ◽

2017 ◽

Cited By ~ 2

Author(s):

Lili Xu ◽

Edin Muharemagic ◽

Flavio Villanustre ◽

Amy Apon

Keyword(s):

Big Data ◽

Performance Tuning ◽

Application Performance ◽

Data Application ◽

Big Data Application ◽

Hpcc Systems

Download Full-text

The Impact of Big Data on Security

Big Data ◽

10.4018/978-1-4666-9840-6.ch068 ◽

2016 ◽

pp. 1495-1518

Author(s):

Mohammad Alaa Hussain Al-Hamami

Keyword(s):

Social Media ◽

Big Data ◽

Management System ◽

Database Management ◽

Database Systems ◽

Structured Data ◽

Database Management System ◽

Unstructured Data ◽

And Behavior ◽

The Impact

Big Data is comprised systems, to remain competitive by techniques emerging due to Big Data. Big Data includes structured data, semi-structured and unstructured. Structured data are those data formatted for use in a database management system. Semi-structured and unstructured data include all types of unformatted data including multimedia and social media content. Among practitioners and applied researchers, the reaction to data available through blogs, Twitter, Facebook, or other social media can be described as a “data rush” promising new insights about consumers' choices and behavior and many other issues. In the past Big Data has been used just by very large organizations, governments and large enterprises that have the ability to create its own infrastructure for hosting and mining large amounts of data. This chapter will show the requirements for the Big Data environments to be protected using the same rigorous security strategies applied to traditional database systems.

Download Full-text

Modeling and Indexing Spatiotemporal Trajectory Data in Non-Relational Databases

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Managing Big Data in Cloud Computing Environments ◽

10.4018/978-1-4666-9834-5.ch006 ◽

2016 ◽

pp. 133-162 ◽

Cited By ~ 7

Author(s):

Berkay Aydin ◽

Vijay Akkineni ◽

Rafal A Angryk

Keyword(s):

Data Storage ◽

Relational Databases ◽

Large Scale ◽

Database Systems ◽

Distributed Database ◽

Data Models ◽

Trajectory Data ◽

Nosql Databases ◽

Distributed Database Systems ◽

Partitioned Data

With the ever-growing nature of spatiotemporal data, it is inevitable to use non-relational and distributed database systems for storing massive spatiotemporal datasets. In this chapter, the important aspects of non-relational (NoSQL) databases for storing large-scale spatiotemporal trajectory data are investigated. Mainly, two data storage schemata are proposed for storing trajectories, which are called traditional and partitioned data models. Additionally spatiotemporal and non-spatiotemporal indexing structures are designed for efficiently retrieving data under different usage scenarios. The results of the experiments exhibit the advantages of utilizing data models and indexing structures for various query types.

Download Full-text