Comparative Analysis of Nosql Specimen with Relational Data Store for Big Data in Cloud

Author(s):  
Sangeeta Gupta

The massive amount of data collected in various fields is challenging to analyze with the available storage technologies. Relational databases are a traditional approach to data storage that is better suited to structured data formats and is constrained by ACID properties. Modern data, in the form of Word documents, PDF files, audio, and video, is unstructured, so tables and schema definitions are not a major concern, and relational databases such as MySQL may not be suitable for serving such big data. An alternative approach is to use the emerging NoSQL databases. This paper presents a comparative analysis of NoSQL stores such as HBase, MongoDB, SimpleDB, and Bigtable against a relational database such as MySQL, and specifies their limitations when applied to real-world problems. It also proposes a solution to overcome these limitations using an integrated data store, which is beneficial over the mentioned NoSQL and MySQL stores in terms of efficiently implementing simple and complex queries and yielding better performance.

Author(s):  
Renu Chaudhary ◽  
Gagangeet Singh

NoSQL databases (commonly interpreted by developers as “not only SQL databases” and not “no SQL”) are an emerging alternative to the most widely used relational databases. As the name suggests, NoSQL does not completely replace SQL but complements it in such a way that the two can coexist. In this paper we discuss the NoSQL data model, the types of NoSQL data stores, the characteristics and features of each data store, the query languages used in NoSQL, the advantages and disadvantages of NoSQL over RDBMS, and the future prospects of NoSQL. Motivation/Background: NoSQL systems can store and index arbitrarily large data sets while serving a large number of concurrent user requests. Method: Many people think NoSQL is a derogatory term created to poke at SQL. In reality, the term means Not Only SQL. The idea is that both technologies can coexist and each has its place. Results: Large-scale data processing (parallel processing over distributed systems); embedded IR (basic machine-to-machine information look-up and retrieval); exploratory analytics on semi-structured data (expert level); large-volume data storage (unstructured, semi-structured, small-packet structured). Conclusions: The motivation of this study is to provide an independent understanding of the strengths and weaknesses of various NoSQL database approaches for supporting applications that process huge volumes of data, as well as a global overview of these non-relational NoSQL databases.


PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0255562
Author(s):  
Eman Khashan ◽  
Ali Eldesouky ◽  
Sally Elghamrawy

The growing popularity of big data analysis and cloud computing has created new big data management standards. Programmers may have to interact with a number of heterogeneous data stores, both SQL and NoSQL, depending on the information they are responsible for. Interacting with heterogeneous data models via numerous APIs and query languages imposes challenging tasks on multi-data processing developers. Indeed, complex queries concerning homogenous data structures cannot currently be performed in a declarative manner when found in single data storage applications and therefore require additional development efforts. Many models have been presented to address complex queries via multistore applications. Some of these models implemented a complex, unified, and fast model, while the efficiency of others is not sufficient for this type of complex database query. This paper provides an automated, fast, and easy unified architecture to solve simple and complex SQL and NoSQL queries over heterogeneous data stores (CQNS). The proposed framework can be used in cloud environments or in any big data application to automatically help developers manage basic and complicated database queries. CQNS consists of three layers: a matching selector layer, a processing layer, and a query execution layer. The matching selector layer is the heart of this architecture: five of the user queries are examined to see whether they match another five queries stored in a single engine in the architecture library. This is achieved through a proposed algorithm that directs the query to the right SQL or NoSQL database engine. Furthermore, CQNS deals with many NoSQL databases, such as MongoDB, Cassandra, Riak, CouchDB, and Neo4j. This paper presents a Spark framework that can handle both SQL and NoSQL databases. Benchmark datasets from four scenarios are used to evaluate the proposed CQNS for querying different NoSQL databases in terms of optimization process performance and query execution time. The results show that CQNS achieves the best latency and throughput in less time among the compared systems.
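The idea of directing a query to the right SQL or NoSQL engine can be illustrated with a minimal sketch. This is not the CQNS matching algorithm, which the abstract does not describe in detail. It is a hypothetical router that guesses the target engine from the surface syntax of the query, and the engine labels and the function name `select_engine` are assumptions made for illustration.

```python
import re

# Hypothetical engine labels; the abstract does not say how CQNS names its engines.
SQL_ENGINE = "sql"
DOCUMENT_ENGINE = "mongodb"
GRAPH_ENGINE = "neo4j"


def select_engine(query: str) -> str:
    """Pick a target engine from the surface syntax of a query string."""
    text = query.strip()
    # Cypher-style graph queries start with MATCH or CREATE followed by a node pattern.
    if re.match(r"^(match|create)\s*\(", text, re.IGNORECASE):
        return GRAPH_ENGINE
    # MongoDB shell-style calls look like db.<collection>.<method>(...).
    if re.match(r"^db\.\w+\.\w+\(", text):
        return DOCUMENT_ENGINE
    # Plain SQL data statements start with one of a few keywords.
    if re.match(r"^(select|insert|update|delete)\b", text, re.IGNORECASE):
        return SQL_ENGINE
    raise ValueError(f"no engine matches query: {query!r}")
```

A real multistore system would parse the query properly and consult a catalog of where each data set lives; syntax sniffing is used here only to keep the sketch short.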


2020 ◽  
Vol 9 (5) ◽  
pp. 331
Author(s):  
Dongming Guo ◽  
Erling Onstein

Geospatial information has been indispensable for many application fields, including traffic planning, urban planning, and energy management. Geospatial data are mainly stored in relational databases that have been developed over several decades, and most geographic information applications are desktop applications. With the arrival of big data, geospatial information applications are also being modified into, e.g., mobile platforms and Geospatial Web Services, which require changeable data schemas, faster query response times, and more flexible scalability than traditional spatial relational databases currently have. To respond to these new requirements, NoSQL (Not only SQL) databases are now being adopted for geospatial data storage, management, and queries. This paper reviews state-of-the-art geospatial data processing in the 10 most popular NoSQL databases. We summarize the supported geometry objects, main geometry functions, spatial indexes, query languages, and data formats of these 10 NoSQL databases. Moreover, the pros and cons of these NoSQL databases are analyzed in terms of geospatial data processing. A literature review and analysis showed that current document databases may be more suitable for massive geospatial data processing than are other NoSQL databases due to their comprehensive support for geometry objects and data formats and their performance, geospatial functions, index methods, and academic development. However, depending on the application scenarios, graph databases, key-value, and wide column databases have their own advantages.


2018 ◽  
Vol 14 (3) ◽  
pp. 44-68 ◽  
Author(s):  
Fatma Abdelhedi ◽  
Amal Ait Brahim ◽  
Gilles Zurfluh

Nowadays, most organizations need to improve their decision-making process using Big Data. To achieve this, they have to store Big Data, perform an analysis, and transform the results into useful and valuable information. Doing so means dealing with new challenges in designing and creating a data warehouse. Traditionally, creating a data warehouse followed a well-governed process based on relational databases. Big Data has challenged this traditional approach, primarily because of the changing nature of data. As a result, using NoSQL databases has become a necessity to handle Big Data challenges. In this article, the authors show how to create a data warehouse on NoSQL systems. They propose the Object2NoSQL process, which generates column-oriented physical models starting from a UML conceptual model. To ensure efficient automatic transformation, they propose a logical model that exhibits a sufficient degree of independence to enable its mapping to one or more column-oriented platforms. The authors evaluate their approach experimentally using a case study in the health care field.


Author(s):  
Zongmin Ma ◽  
Li Yan

The resource description framework (RDF) is a model for representing information resources on the web. With the widespread acceptance of RDF as the de facto standard recommended by W3C (World Wide Web Consortium) for the representation and exchange of information on the web, a huge amount of RDF data is being produced and becoming available. RDF data management is therefore of increasing importance and has attracted attention in the database community as well as the Semantic Web community. Much work has been devoted to proposing different solutions for storing large-scale RDF data efficiently. In order to manage massive RDF data, NoSQL (not only SQL) databases have been used for scalable RDF data storage. This chapter focuses on using various NoSQL databases to store massive RDF data. An up-to-date overview of the current state of the art in RDF data storage in NoSQL databases is provided. The chapter also aims to offer suggestions for future research.
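As a rough illustration of what storing RDF triples in a key-value style store can look like, the sketch below keeps each triple under two keys so that lookups by subject and by predicate are both direct. The chapter's abstract does not prescribe this layout. The two-index design, the class name `TripleStore`, and the in-memory dictionaries standing in for a NoSQL store are all assumptions.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory stand-in for a key-value store holding RDF triples."""

    def __init__(self):
        # Each index maps a key to a set of values, like rows in a key-value store.
        self.by_subject = defaultdict(set)    # subject -> {(predicate, object)}
        self.by_predicate = defaultdict(set)  # predicate -> {(subject, object)}

    def add(self, s, p, o):
        """Write the triple under both keys so either lookup avoids a full scan."""
        self.by_subject[s].add((p, o))
        self.by_predicate[p].add((s, o))

    def objects(self, s, p):
        """Return all objects for a given subject and predicate, sorted."""
        return sorted(o for (pred, o) in self.by_subject.get(s, ()) if pred == p)

    def subjects(self, p, o):
        """Return all subjects for a given predicate and object, sorted."""
        return sorted(s for (s, obj) in self.by_predicate.get(p, ()) if obj == o)
```

Writing each triple more than once trades storage for read speed, which is a common choice when the store has no secondary indexes.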


Author(s):  
Zongmin Ma ◽  
Li Yan

The Resource Description Framework (RDF) is a model for representing information resources on the Web. With the widespread acceptance of RDF as the de facto standard recommended by W3C (World Wide Web Consortium) for the representation and exchange of information on the Web, a huge amount of RDF data is being produced and becoming available. RDF data management is therefore of increasing importance and has attracted attention in the database community as well as the Semantic Web community. Much work has been devoted to proposing different solutions for storing large-scale RDF data efficiently. In order to manage massive RDF data, NoSQL (“not only SQL”) databases have been used for scalable RDF data storage. This chapter focuses on using various NoSQL databases to store massive RDF data. An up-to-date overview of the current state of the art in RDF data storage in NoSQL databases is provided. The chapter also aims to offer suggestions for future research.


Author(s):  
Berkay Aydin ◽  
Vijay Akkineni ◽  
Rafal A Angryk

With the ever-growing volume of spatiotemporal data, the use of non-relational and distributed database systems for storing massive spatiotemporal datasets is inevitable. In this chapter, the important aspects of non-relational (NoSQL) databases for storing large-scale spatiotemporal trajectory data are investigated. Two data storage schemata are proposed for storing trajectories, called the traditional and partitioned data models. Additionally, spatiotemporal and non-spatiotemporal indexing structures are designed for efficiently retrieving data under different usage scenarios. The experimental results show the advantages of using these data models and indexing structures for various query types.
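One way to picture a partitioned trajectory layout is to split each object's trajectory into fixed time buckets and use the object identifier plus the bucket number as the row key. The abstract does not define the chapter's partitioned model, so the sketch below is only an assumed example: the one-hour bucket width, the key format, and the function names are all hypothetical.

```python
from collections import defaultdict

BUCKET_SECONDS = 3600  # assumed partition width: one hour


def row_key(object_id: str, timestamp: int) -> str:
    """Build a row key from the object id and a zero-padded time bucket."""
    bucket = timestamp // BUCKET_SECONDS
    return f"{object_id}#{bucket:010d}"


def partition(points):
    """Group (object_id, timestamp, x, y) points into rows keyed by row_key."""
    rows = defaultdict(list)
    for object_id, ts, x, y in points:
        rows[row_key(object_id, ts)].append((ts, x, y))
    # Keep the samples inside each row in time order.
    for samples in rows.values():
        samples.sort()
    return dict(rows)
```

Zero-padding the bucket number keeps the keys of one object in chronological order under lexicographic sorting, which is what range scans in many wide-column stores rely on.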


2018 ◽  
Vol 8 (2) ◽  
pp. 113-129 ◽  
Author(s):  
Sangeeta Gupta ◽  
Narsimha Gugulothu

The work presented in this article brings to light the security issues with the NoSQL databases MongoDB, HBase, and Cassandra. A literature survey is carried out to identify modern scenarios of applications using NoSQL databases, and their limitations are identified. A solution is proposed by designing a framework to achieve security for web crawler applications using Cassandra, a NoSQL data store. Experimental results are presented to show the effectiveness of the work, which designs an appropriate algorithm to trigger security for a scalable web crawler architecture. Amazon Web Services (AWS), a familiar cloud platform, and Bitnami cloud hosting services are used to procure the required servers and virtual machines. Performance on the virtual machines is compared before and after encrypting and decrypting the voluminous data, and an improvement in efficiency is observed with the proposed model.


Information ◽  
2019 ◽  
Vol 10 (7) ◽  
pp. 241
Author(s):  
Geomar A. Schreiner ◽  
Denio Duarte ◽  
Ronaldo dos S. Melo

Several data-centric applications today produce and manipulate a large volume of data, the so-called Big Data. Traditional databases, in particular relational databases, are not suitable for Big Data management. As a consequence, some approaches have been proposed that allow the definition and manipulation of large relational data sets stored in NoSQL databases through an SQL interface, focusing on scalability and availability. This paper presents a comparative analysis of these approaches based on an architectural classification that organizes them according to their system architectures. Our motivation is that wrapping is a relevant strategy for relational-based applications that intend to move relational data to NoSQL databases (usually maintained in the cloud). We also claim that this research area has some open issues, given that most approaches deal with only a subset of SQL operations or support only specific target NoSQL databases. Our intention with this survey is, therefore, to contribute to the state of the art in this research area and also provide a basis for choosing or even designing a relational-to-NoSQL data wrapping solution.
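To make the wrapping idea concrete, the following sketch translates a very small subset of SQL into a document-store style query description. It does not reproduce any of the surveyed systems. The function name `translate`, the supported grammar (a single-table SELECT with equality conditions joined by AND), and the returned tuple shape are assumptions chosen for illustration.

```python
import re

# Supported subset (assumed): SELECT cols FROM table [WHERE col = value AND ...]
_SELECT = re.compile(
    r"\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?\s*;?\s*",
    re.IGNORECASE,
)
_CONDITION = re.compile(r"\s*(\w+)\s*=\s*(?:'([^']*)'|(\d+))\s*")


def translate(sql):
    """Return (collection, filter_document, projection) for a simple SELECT."""
    match = _SELECT.fullmatch(sql)
    if match is None:
        raise ValueError("unsupported statement")
    cols = match.group("cols").strip()
    # None stands for "all fields", mirroring SELECT *.
    projection = None if cols == "*" else [c.strip() for c in cols.split(",")]
    filter_doc = {}
    where = match.group("where")
    if where:
        for part in re.split(r"\s+AND\s+", where, flags=re.IGNORECASE):
            cond = _CONDITION.fullmatch(part)
            if cond is None:
                raise ValueError(f"unsupported condition: {part!r}")
            column, text, number = cond.groups()
            # Quoted values stay strings; bare digits become integers.
            filter_doc[column] = text if text is not None else int(number)
    return match.group("table"), filter_doc, projection
```

The narrow grammar mirrors the limitation the survey points out: a wrapper that handles only a subset of SQL operations is simple to build, but joins, aggregates, and nested conditions need considerably more machinery.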


Author(s):  
Eman A. Khashan ◽  
Ali I. El Desouky ◽  
Sally M. Elghamrawy

The growth of data on the web poses major challenges. Storing large amounts of data and querying data sources have become necessary features of big data systems. A large number of platforms are used to handle the NoSQL database model, such as Spark, H2O, and Hadoop HDFS/MapReduce, which are suitable for controlling and managing big data. Developers of different applications face difficult tasks when interacting with mixed data models through different APIs and query languages. This paper proposes a Complex SQL Query and NoSQL (CQNS) framework, which acts as an interpreter that sends complex queries received from any data store to the corresponding execution engine. The proposed framework supports application queries and database transformation at the same time, which in turn speeds up the process. Moreover, CQNS handles many NoSQL databases, such as MongoDB and Cassandra. This paper provides a Spark framework that can handle SQL and NoSQL databases. This work also examines the importance of MongoDB block sharding and composition. The Cassandra database deals with two types of partitioning: vertex and edge partitioning. Datasets from four scenarios are used to evaluate the proposed CQNS for querying the various NoSQL databases in terms of optimization performance and query execution time. The results show that, among the compared systems, CQNS achieves optimum latency and productivity in less time.

