An Approach for Schema Extraction of NoSQL Columnar Databases: the HBase Case Study

2021 ◽  
Vol 12 (5) ◽  
Author(s):  
Angelo Augusto Frozza ◽  
Eduardo Dias Defreyn ◽  
Ronaldo Dos Santos Mello

Although NoSQL databases do not require a schema a priori, being aware of the database schema is essential for activities like data integration, data validation, or data interoperability. This paper presents a process for the extraction of columnar NoSQL database schemas. We adopt JSON as a canonical format for data representation, and we validate the proposed process through a prototype tool that is able to extract schemas from the HBase columnar NoSQL database system. HBase was chosen as a case study because it is one of the most popular columnar NoSQL solutions. When compared to related work, we innovate by proposing a simple solution for the inference of column data types for columnar NoSQL databases that store only byte arrays as column values, and a resulting schema that follows the JSON Schema format.
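The idea of inferring column types from raw byte arrays and emitting a JSON Schema can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the heuristics in `infer_json_type`, the row layout (a dict keyed by `family:qualifier`), and both function names are assumptions made for the example. The byte-length rules assume values were written as big-endian fixed-width numbers, which is only a guess about the stored data.

```python
import json
import struct

def infer_json_type(raw: bytes) -> str:
    """Guess a JSON Schema type for one byte-array cell (heuristic, hypothetical)."""
    # Assumption: a single 0x00 or 0xFF byte encodes a boolean.
    if len(raw) == 1 and raw in (b"\x00", b"\xff"):
        return "boolean"
    # Printable UTF-8 text is treated as a string.
    try:
        text = raw.decode("utf-8")
        if text and text.isprintable():
            return "string"
    except UnicodeDecodeError:
        pass
    # Assumption: remaining 2-, 4-, or 8-byte values are fixed-width integers.
    if len(raw) in (2, 4, 8):
        return "integer"
    return "string"  # fallback when nothing else matches

def extract_schema(rows):
    """Build a JSON Schema document from rows of {'family:qualifier': bytes}."""
    families = {}
    for row in rows:
        for column, raw in row.items():
            family, qualifier = column.split(":", 1)
            seen = families.setdefault(family, {}).setdefault(qualifier, set())
            seen.add(infer_json_type(raw))
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
            family: {
                "type": "object",
                "properties": {
                    # One observed type becomes a string, several become a list.
                    q: {"type": sorted(t)[0] if len(t) == 1 else sorted(t)}
                    for q, t in quals.items()
                },
            }
            for family, quals in families.items()
        },
    }

if __name__ == "__main__":
    sample = [{"info:name": b"Ana", "info:age": struct.pack(">i", 30)}]
    print(json.dumps(extract_schema(sample), indent=2))
```

A heuristic like this is inherently ambiguous: a 4-byte integer whose bytes happen to be printable characters would be classified as a string, so a real tool would need additional evidence across many rows.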

2020 ◽  
Author(s):  
Angelo Augusto Frozza ◽  
Eduardo Dias Defreyn ◽  
Ronaldo Dos Santos Mello

Although NoSQL databases do not require a schema a priori, being aware of the database schema is essential for activities like data integration, data validation, or data interoperability. This paper presents a process for the inference of columnar NoSQL database schemas. We validate the proposed process through a prototype tool that is able to extract schemas from the HBase columnar NoSQL database system. HBase was chosen as a case study because it is one of the most popular columnar NoSQL solutions. When compared to related work, we innovate by proposing a simple solution for the inference of column data types for columnar NoSQL databases that store only byte arrays as column values, as well as a generated schema that follows the JSON Schema format.


The chapter presents a real case study of the integration of relational and NoSQL databases. An example from a real project related to vehicle registration, particularly the testing of vehicles for compliance with environmental standards, explains how those two worlds can be integrated. Oracle is used as the relational database, while MongoDB is used as the NoSQL database. The chapter maintains that the COMN notation can be successfully used in the process of modeling both relational and nonrelational data. All three ways of integrating relational and NoSQL databases are tested. The native solution was tested by using native drivers for communication with the Oracle and MongoDB databases. The hybrid solution used a Unity product. The reducing-to-one option (in this case, to SQL) was tested on the Oracle database. The capabilities of the Oracle 12c database to work with both relational and nonrelational data by using SQL were tested.


Author(s):  
Vinod Kumar ◽  
Ramjeevan Singh Thakur

With every passing day, data generation increases exponentially; its volume, variety, and velocity make it challenging to analyze, interpret, and visualize the available data to gain greater insight. Billions of networked sensors are embedded in devices such as smartphones, automobiles, social media sites, laptops, PCs, and industrial machines that operate, generate, and communicate data. Thus, the data obtained from these various sources exist in structured, semi-structured, and unstructured forms. Traditional database systems are not suitable for handling these data formats. Therefore, new tools and techniques have been developed to work with such data, and NoSQL is one of them. Currently, many NoSQL databases are available on the market, each specially designed to solve a specific type of data handling problem; most NoSQL databases are developed with special attention to the problems of business organizations and enterprises. The chapter focuses on various aspects of NoSQL as a tool for handling big data.


2021 ◽  
Vol 342 ◽  
pp. 05001
Author(s):  
Ioan Cristian Schuszter ◽  
Marius Cioca

Fault-tolerant systems are an important discussion subject in our world of interconnected devices. One of the major failure points of every distributed infrastructure is the database. A data migration or an overload of one of the servers could lead to a cascade of failures and service downtime for the users. NoSQL databases sacrifice some of the consistency provided by traditional SQL databases while privileging availability and partition tolerance. This paper presents the design and implementation of a distributed in-memory database that is based on the actor model. The benefits of the actor model and development using functional languages are detailed, and suitable performance metrics are presented. A case study is also performed, showcasing the system’s capacity to quickly recover from the loss of one of its machines and maintain functionality.


Data ◽  
2019 ◽  
Vol 4 (4) ◽  
pp. 148 ◽  
Author(s):  
Obaid Alotaibi ◽  
Eric Pardede

The relational database has been the de facto database choice in most IT applications. In the last decade there has been increasing demand for applications that have to deal with massive and unnormalized data. To satisfy this demand, there has been a big shift toward more relaxed databases in the form of NoSQL databases. Alongside this shift, there is a need for a structured methodology to transform existing data in a relational database (RDB) to a NoSQL database. The transformation from RDB to NoSQL database has become more challenging because there is no current standard for NoSQL databases. The aim of this paper is to propose transformation rules from an RDB schema to various NoSQL database schemas, namely document-based, column-based, and graph-based databases. The rules are applied based on the type of relationships that can appear in data within a database. As a proof of concept, we apply the rules in a case study using three NoSQL databases, namely MongoDB, Cassandra, and Neo4j. A set of queries is run in these databases to demonstrate the correctness of the transformation results. In addition, the completeness of our transformation rules is compared against existing work.
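To make the notion of a relationship-driven transformation rule concrete, the sketch below embeds the "many" side of a one-to-many relationship into its parent record, a common way to map relational rows to a document model. It is a generic illustration and not one of the paper's rules, which the abstract does not spell out; the function name `embed_one_to_many` and the sample table and column names are hypothetical.

```python
from collections import defaultdict

def embed_one_to_many(parents, children, parent_key, foreign_key, field):
    """Return parent documents with their child rows embedded as a list.

    parents, children: lists of dicts standing in for relational rows.
    parent_key: primary-key column of the parent table.
    foreign_key: column in the child table that references the parent.
    field: name of the array field that will hold the embedded children.
    """
    grouped = defaultdict(list)
    for child in children:
        # Drop the foreign key: the nesting itself now expresses the link.
        child_doc = {k: v for k, v in child.items() if k != foreign_key}
        grouped[child[foreign_key]].append(child_doc)
    # Parents without children get an empty list.
    return [{**p, field: grouped.get(p[parent_key], [])} for p in parents]

if __name__ == "__main__":
    customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Bo"}]
    orders = [{"order_id": 10, "customer_id": 1, "total": 5.0}]
    for doc in embed_one_to_many(customers, orders, "id", "customer_id", "orders"):
        print(doc)
```

Embedding suits relationships where children are always read with their parent; a many-to-many relationship would more likely call for references or, in a graph database, explicit edges.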


Azure SQL and Atlas MongoDB NoSQL (Azure instance) are the most popular database solutions. Azure SQL is a relational database management system (RDBMS), in which the data are structured into tables or relations. Atlas MongoDB NoSQL is a non-relational database management system, in which the data are not structured into tables or relations. This research evaluates both the Azure SQL and Atlas MongoDB NoSQL databases. The experiment compares the loading time, response time, and retrieval time of the two databases and determines which one is faster, more efficient, and better performing.


2019 ◽  
Vol 7 (7) ◽  
pp. 351-359
Author(s):  
Yashraj Sharma ◽  
Yashasvi Sharma

In terms of reliability, relational models are useful, but not for systems that involve huge amounts of data; in such cases, non-relational models are much more useful. NoSQL databases are used to store large chunks of data. They are scalable and wide-ranging because they are non-relational and distributed. Relational databases could not manage the data involved in a very large number of Big Data applications; hence the concept of the NoSQL database was introduced. NoSQL has many advantages, which involve not only its own features but also some features of relational database management systems. A major benefit of the NoSQL database is that it is an open source system, which helps it adapt many features for newly developed applications. This paper focuses on understanding the concepts of non-relational database system architecture alongside relational database system architecture and on identifying the advantages and disadvantages of both.


2019 ◽  
Vol 9 (1) ◽  
pp. 561-570
Author(s):  
Khoa Dang ◽  
Igor Trotskii

Ever-growing building energy consumption requires advanced automation and monitoring solutions in order to improve building energy efficiency. Furthermore, aggregation of building automation data, similarly to industrial scenarios, allows for condition monitoring and fault diagnostics of the Heating, Ventilation and Air Conditioning (HVAC) system. For existing buildings, the commissioned SCADA solutions provide historical trends, alarm management, and setpoint curve adjustments, which are essential features for facility management personnel. The development of the Internet of Things (IoT) and Industry 4.0, as well as software microservices, enables higher system integration, data analytics, and rich visualization to be integrated into the existing infrastructure. This paper presents the implementation of a technology stack that can be used as a framework for improving existing and new building automation systems by increasing interconnection and integrating data analytics solutions. The implementation is realized and evaluated for a nearly zero energy building as a case study.

