XML2HBase: Storing and querying large collections of XML documents using a NoSQL database system

Author(s):  
Liang Bao ◽  
Jin Yang ◽  
Chase Q. Wu ◽  
Haiyang Qi ◽  
Xin Zhang ◽  
...  
Author(s):  
Khaled Dehdouh

In the context of big data warehouses, a column-oriented NoSQL database system is considered a storage model well suited to data warehouses and online analysis. NoSQL models make data easy to scale, and columnar storage is suitable for storing and managing massive data, especially for decision-support queries. However, column-oriented NoSQL DBMSs do not offer online analytical processing (OLAP) operators. To build OLAP cubes corresponding to the analysis contexts, the most common approach is to integrate other software such as HIVE or Kylin, which provide a CUBE operator for building data cubes. The cube is then built following a row-oriented approach, which does not fully deliver the benefits of a column-oriented approach. The main contribution of this chapter is a cube operator called MC-CUBE (MapReduce Columnar CUBE), which builds columnar NoSQL cubes following the columnar approach while taking into account the non-relational and distributed nature of the stored data warehouses.


Author(s):  
Vinod Kumar ◽  
Ramjeevan Singh Thakur

Data generation is increasing exponentially with every passing day, and its volume, variety, and velocity make it challenging to analyze, interpret, and visualize the available data to gain greater insight. Billions of networked sensors are embedded in devices such as smartphones, automobiles, social media sites, laptops, PCs, and industrial machines that operate, generate, and communicate data. The data obtained from these sources therefore exists in structured, semi-structured, and unstructured forms. Traditional database systems are not suited to handling these data formats, so new tools and techniques have been developed to work with them. NoSQL is one of them. Many NoSQL databases are currently available, each designed to solve a specific type of data-handling problem, and most were developed with particular attention to the problems of business organizations and enterprises. The chapter focuses on various aspects of NoSQL as a tool for handling big data.


2020 ◽  
Vol 12 (1) ◽  
pp. 1-24
Author(s):  
Khaled Dehdouh ◽  
Omar Boussaid ◽  
Fadila Bentayeb

In the context of Big Data warehouses, a column-oriented NoSQL database system is considered a storage model well suited to data warehouses and online analysis. NoSQL models make data easy to scale, and columnar storage is suitable for storing and managing massive data, especially for decision-support queries. However, column-oriented NoSQL DBMSs do not offer online analytical processing (OLAP) operators. To build OLAP cubes corresponding to the analysis contexts, the most common approach is to integrate other software such as HIVE or Kylin, which provide a CUBE operator for building data cubes. The cube is then built following a row-oriented approach, which does not fully deliver the benefits of a column-oriented approach. The focus of this article is a cube operator called MC-CUBE (MapReduce Columnar CUBE), which builds columnar NoSQL cubes following the columnar approach while taking into account the non-relational and distributed nature of the stored data warehouses.
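
For readers unfamiliar with the CUBE operator mentioned in the abstract above, the following is a minimal in-memory sketch of what such an operator computes: one aggregate per subset of the chosen dimensions. It is not the MC-CUBE algorithm and involves no MapReduce or columnar storage. The function name `cube`, the sample rows, and the choice of SUM as the aggregate are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims` (2^n group-bys in total).

    Hypothetical helper for illustration; not taken from the cited work.
    """
    result = {}
    for size in range(len(dims) + 1):
        for group in combinations(dims, size):
            totals = defaultdict(int)
            for row in rows:
                # The grouping key is the row's values on the chosen dimensions.
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            result[group] = dict(totals)
    return result


# Made-up fact rows with two dimensions and one measure.
sales = [
    {"city": "Lyon", "year": 2019, "amount": 10},
    {"city": "Lyon", "year": 2020, "amount": 5},
    {"city": "Paris", "year": 2020, "amount": 7},
]
cells = cube(sales, ["city", "year"], "amount")
```

Here `cells[()]` holds the grand total, `cells[("city",)]` the per-city totals, and so on for each of the four dimension subsets.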


2001 ◽  
Vol 30 (3) ◽  
pp. 20-26 ◽  
Author(s):  
Jayavel Shanmugasundaram ◽  
Eugene Shekita ◽  
Jerry Kiernan ◽  
Rajasekar Krishnamurthy ◽  
Efstratios Viglas ◽  
...  

2021 ◽  
Vol 12 (5) ◽  
Author(s):  
Angelo Augusto Frozza ◽  
Eduardo Dias Defreyn ◽  
Ronaldo Dos Santos Mello

Although NoSQL databases do not require a schema a priori, being aware of the database schema is essential for activities like data integration, data validation, or data interoperability. This paper presents a process for the extraction of columnar NoSQL database schemas. We adopt JSON as a canonical format for data representation, and we validate the proposed process through a prototype tool that is able to extract schemas from the HBase columnar NoSQL database system. HBase was chosen as a case study because it is one of the most popular columnar NoSQL solutions. When compared to related work, we innovate by proposing a simple solution for the inference of column data types for columnar NoSQL databases that store only byte arrays as column values, and a resulting schema that follows the JSON Schema format.
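
The problem this abstract describes, guessing column types when values are stored only as byte arrays, can be illustrated with a small heuristic sketch. The rules below are assumptions chosen for illustration and are not the inference rules of the cited paper: printable UTF-8 is treated as a string, and 4- or 8-byte values that are not printable text are treated as big-endian integers. The names `infer_type` and `infer_schema` are hypothetical.

```python
import struct


def infer_type(value: bytes) -> str:
    """Guess a type label for a raw byte array (illustrative heuristic only)."""
    try:
        text = value.decode("utf-8")
        if text and text.isprintable():
            return "string"
    except UnicodeDecodeError:
        pass
    # Assumption: fixed-width 4- or 8-byte values are big-endian integers.
    if len(value) in (4, 8):
        return "integer"
    return "unknown"


def infer_schema(row: dict) -> dict:
    """Build a JSON-Schema-like description from one row of raw column values."""
    return {
        "type": "object",
        "properties": {col: {"type": infer_type(raw)} for col, raw in row.items()},
    }


# Made-up row: column names mapped to raw bytes.
row = {
    "info:name": b"alice",
    "info:age": struct.pack(">i", 42),
    "info:blob": b"\xff\xfe\x01",
}
schema = infer_schema(row)
```

Such a heuristic is inherently ambiguous: a 4-byte integer whose bytes all happen to be printable characters would be labeled a string, which is one reason a real tool needs more evidence than a single value.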


Author(s):  
Rodrigo Aniceto ◽  
Rene Xavier ◽  
Maristela Holanda ◽  
Maria Emilia Walter ◽  
Sergio Lifschitz

2020 ◽  
Author(s):  
Angelo Augusto Frozza ◽  
Eduardo Dias Defreyn ◽  
Ronaldo Dos Santos Mello

Although NoSQL databases do not require a schema a priori, being aware of the database schema is essential for activities like data integration, data validation, or data interoperability. This paper presents a process for the inference of columnar NoSQL database schemas. We validate the proposed process through a prototype tool that is able to extract schemas from the HBase columnar NoSQL database system. HBase was chosen as a case study because it is one of the most popular columnar NoSQL solutions. When compared to related work, we innovate by proposing a simple solution for the inference of column data types for columnar NoSQL databases that store only byte arrays as column values, as well as a generated schema that follows the JSON Schema format.


2003 ◽  
pp. 1-35 ◽  
Author(s):  
Nick Bassiliades ◽  
Ioannis Vlahavas ◽  
Dimitros Sampson

In this chapter, we propose the use of first-order logic, in the form of deductive database rules, as a query language for XML data, and we present X-Device, an extension of the deductive object-oriented database system Device, for storing and querying XML data. XML documents are stored in the OODB by automatically mapping the DTD to an object schema. XML elements are treated either as classes or attributes based on their complexity, without losing the relative order of elements in the original document. Furthermore, this chapter describes the extension of the system’s deductive rule query language with second-order variables, general path and ordering expressions, for querying over the stored, tree-structured XML data and constructing XML documents as a result. The extensions were implemented by translating all the extended features into the basic, first-order deductive rule language of Device using meta-data about stored XML objects.


2019 ◽  
Vol 7 (7) ◽  
pp. 351-359
Author(s):  
Yashraj Sharma ◽  
Yashasvi Sharma

In terms of reliability, relational models are useful, but not for systems that involve huge amounts of data; in such cases, non-relational models are much more useful. NoSQL databases are used to store large volumes of data. They are scalable and broadly applicable because they are non-relational and distributed. Relational databases could not manage the data involved in a very large number of Big Data applications, hence the concept of the NoSQL database was introduced. NoSQL has many advantages, which include not only its own features but also some features of relational database management systems. A major benefit of NoSQL databases is that they are open source, which helps in adapting many features for newly developed applications. This paper focuses on understanding the concepts of non-relational database system architecture alongside relational database system architecture and on identifying the advantages and disadvantages of both.

