Ingestion of a Data Lake into a NoSQL Data Warehouse: The Case of Relational Databases

2021
Author(s):
Fatma Abdelhedi
Rym Jemmali
Gilles Zurfluh

Author(s):
Anderson Chaves Carniel
Aried de Aguiar Sa
Vinicius Henrique Porto Brisighello
Marcela Xavier Ribeiro
Renato Bueno
...

2018
Vol 14 (3)
pp. 44-68
Author(s):
Fatma Abdelhedi
Amal Ait Brahim
Gilles Zurfluh

Nowadays, most organizations need to improve their decision-making processes using Big Data. To achieve this, they have to store Big Data, perform analyses, and transform the results into useful and valuable information. Doing so means facing new challenges in designing and creating data warehouses. Traditionally, creating a data warehouse followed a well-governed process based on relational databases. The influence of Big Data has challenged this traditional approach, primarily due to the changing nature of data. As a result, using NoSQL databases has become a necessity for handling Big Data challenges. In this article, the authors show how to create a data warehouse on NoSQL systems. They propose the Object2NoSQL process, which generates column-oriented physical models starting from a UML conceptual model. To ensure an efficient automatic transformation, they propose a logical model that exhibits a sufficient degree of independence to enable its mapping to one or more column-oriented platforms. The authors evaluate their approach through experiments on a case study in the health care field.
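The pipeline the abstract describes (conceptual UML model, then an independent logical model, then column-oriented physical models) can be pictured with a small sketch. This is not the paper's Object2NoSQL rules, only a minimal illustration of the first mapping step under assumed conventions: the Patient class, its attributes, and the single "attrs" column family are all hypothetical choices.

    // Illustrative sketch only: the paper's Object2NoSQL transformation rules
    // are not reproduced here. It shows the general idea of mapping a
    // UML-style class to a column-oriented logical model; all class,
    // attribute, and column-family names are hypothetical.
    import java.util.Map;

    public class UmlToColumnModel {

        // A minimal stand-in for a UML class: a name plus typed attributes.
        record UmlClass(String name, Map<String, String> attributes) {}

        // Logical-model target: one table per class, one column family
        // grouping the class's attributes (a common column-oriented pattern).
        static String toLogicalModel(UmlClass c) {
            StringBuilder ddl = new StringBuilder();
            ddl.append("TABLE ").append(c.name()).append(" {\n");
            ddl.append("  COLUMN_FAMILY attrs {\n");
            c.attributes().forEach((attr, type) ->
                ddl.append("    ").append(attr).append(" : ").append(type).append("\n"));
            ddl.append("  }\n}");
            return ddl.toString();
        }

        public static void main(String[] args) {
            UmlClass patient = new UmlClass("Patient",
                Map.of("id", "String", "name", "String", "birthDate", "Date"));
            // Prints a platform-independent description that could later be
            // translated to a concrete column-oriented store such as HBase.
            System.out.println(toLogicalModel(patient));
        }
    }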


Author(s):  
Dr. C. K. Gomathy

Abstract: Apache Sqoop is mainly used to efficiently transfer large volumes of data between Apache Hadoop and relational databases. It supports tasks such as ETL (extract, transform, load) processing from an enterprise data warehouse into Hadoop, where they can be executed at much lower cost. Here, we first import a table residing in a MySQL database with the help of Sqoop, a command-line interface application. Because new rows may be added and existing rows updated, the import query would normally have to be executed again and again. With our project there is no need to re-run these queries manually: we use a Sqoop job, which stores the complete import command. After the import, we retrieve the data from Hive using Java JDBC and convert it to JSON format, which organizes the data in an easy-to-access manner, using the GSON library. Keywords: Sqoop, JSON, GSON, Maven, JDBC
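As a rough sketch of the pipeline this abstract describes: the comment below defines a saved Sqoop job from standard Sqoop options (a job is re-executed to pick up new and updated rows via incremental import), then the Java code reads the imported Hive table over JDBC and serializes it with GSON. Host names, database, table, and credentials are placeholders, and the hive-jdbc driver plus GSON must be on the classpath.

    // Sketch of the abstract's pipeline: a saved Sqoop job handles repeatable,
    // incremental imports, then Java JDBC reads the Hive table and GSON emits
    // JSON. Hosts, table names, and credentials are hypothetical placeholders.
    //
    // Saved Sqoop job (define once, then re-run to pick up new rows):
    //   sqoop job --create emp_import -- import \
    //     --connect jdbc:mysql://localhost/testdb --username root \
    //     --table employees --hive-import \
    //     --incremental append --check-column id --last-value 0
    //   sqoop job --exec emp_import
    import java.sql.*;
    import java.util.*;
    import com.google.gson.Gson;
    import com.google.gson.GsonBuilder;

    public class HiveToJson {
        public static void main(String[] args) throws Exception {
            // Hive JDBC connection (HiveServer2 default port 10000).
            String url = "jdbc:hive2://localhost:10000/default";
            List<Map<String, Object>> rows = new ArrayList<>();
            try (Connection conn = DriverManager.getConnection(url, "hive", "");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT * FROM employees")) {
                ResultSetMetaData md = rs.getMetaData();
                while (rs.next()) {
                    Map<String, Object> row = new LinkedHashMap<>();
                    for (int i = 1; i <= md.getColumnCount(); i++) {
                        row.put(md.getColumnName(i), rs.getObject(i));
                    }
                    rows.add(row);
                }
            }
            // Serialize the result set to pretty-printed JSON with GSON.
            Gson gson = new GsonBuilder().setPrettyPrinting().create();
            System.out.println(gson.toJson(rows));
        }
    }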


2008
pp. 2364-2370
Author(s):  
Janet Delve

Data Warehousing is now a well-established part of the business and scientific worlds. However, until recently, data warehouses were restricted to modeling essentially numerical data – examples being sales figures in the business arena (e.g. Wal-Mart's data warehouse (Westerman, 2000)) and astronomical data (e.g. SKICAT) in scientific research – with textual data playing a descriptive rather than a central analytic role. The inability of data warehouses to cope with mainly non-numeric data is particularly problematic for humanities research utilizing material such as memoirs and trade directories. Recent innovations have opened up possibilities for non-numeric data warehouses, making them widely accessible to humanities research for the first time. Due to its irregular and complex nature, humanities research data is often difficult to model, and manipulating time shifts in a relational database is problematic, as is fitting such data into a normalized data model. History and linguistics are exemplars of areas where relational databases are cumbersome and which would benefit from the greater freedom afforded by data warehouse dimensional modeling.


2008
pp. 408-428
Author(s):
Manuel Serrano
Coral Calero
Mario Piattini

Data warehouses are large repositories that integrate data from several sources for analysis and decision support. Data warehouse quality is crucial, because a bad data warehouse design may lead to the rejection of the decision support system or may result in non-productive decisions. In recent years, we have been working on the definition and validation of software metrics to assure data warehouse quality. Some of the metrics are adapted directly from previous ones defined for relational databases, and others are specific to data warehouses. In this paper, we present part of the empirical work we have carried out to determine whether the proposed metrics can be used as indicators of data warehouse quality. We previously developed an experiment and a replication of it; here we present a second replication, conducted to assess data warehouse maintainability. As a result of the whole empirical work, we have obtained a subset of the proposed metrics that seem to be good indicators of data warehouse quality.
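The authors' validated metric suite is not reproduced in this listing, but the flavor of such schema-level metrics can be sketched: simple counts over a star schema (fact tables, dimension tables, foreign keys, attributes) of the kind their metrics formalize. The schema below is a made-up example, not one from the paper.

    // Illustrative only: the authors' metric definitions are not reproduced.
    // This computes simple schema-size counts of the kind such metrics
    // formalize; the star schema below is a hypothetical example.
    import java.util.List;

    public class StarSchemaMetrics {

        record Table(String name, boolean isFact, int attributes, int foreignKeys) {}

        public static void main(String[] args) {
            List<Table> schema = List.of(
                new Table("sales_fact", true, 6, 3),   // fact table
                new Table("dim_product", false, 8, 0), // dimension tables
                new Table("dim_store", false, 5, 0),
                new Table("dim_date", false, 7, 0));

            long factTables = schema.stream().filter(Table::isFact).count();
            long dimTables  = schema.stream().filter(t -> !t.isFact()).count();
            int  totalFKs   = schema.stream().mapToInt(Table::foreignKeys).sum();
            int  totalAttrs = schema.stream().mapToInt(Table::attributes).sum();

            // Larger, more connected schemas tend to be harder to maintain,
            // which is what metrics of this kind aim to indicate.
            System.out.printf("fact tables=%d, dimension tables=%d, FKs=%d, attributes=%d%n",
                    factTables, dimTables, totalFKs, totalAttrs);
        }
    }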


Author(s):  
Wilfred Ng
Mark Levene

Data warehousing is a corporate strategy that needs to integrate information from several sources of separately developed Database Management Systems (DBMSs). A future DBMS of a data warehouse should provide adequate facilities to manage a wide range of information arising from such integration. We propose that the capabilities of database languages be enhanced to manipulate user-defined data orderings, since business queries in an enterprise usually involve order. We extend the relational model to incorporate partial orderings into data domains and describe the resulting ordered relational model. We have already defined and implemented a minimal extension of SQL, called OSQL, which allows querying over ordered relational databases. One of the important facilities provided by OSQL is that it allows users to capture the underlying semantics of the ordering of the data for a given application. Herein we demonstrate that OSQL, aided with a package discipline, can be an effective means to manage the inter-related operations and the underlying data domains of a wide range of advanced applications that are vital in data warehousing, such as temporal, incomplete, and fuzzy information. We present the details of the generic operations arising from these applications in the form of three OSQL packages: OSQL_TIME, OSQL_INCOMP, and OSQL_FUZZY.
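OSQL's concrete syntax is not given in the abstract, so the sketch below only illustrates the underlying idea in plain Java: an application-defined ordering of a data domain (here a hypothetical low/medium/high scale) replacing the default alphabetical order. This is the semantics that the ordered relational model captures at the query-language level.

    // OSQL syntax is not reproduced here; this illustrates the core idea the
    // abstract describes: letting an application define the ordering of a
    // data domain instead of relying on the default alphabetical or numeric
    // order. The domain and its ordering below are hypothetical.
    import java.util.*;

    public class UserDefinedOrdering {
        public static void main(String[] args) {
            // Application-defined ordering of a qualitative domain, e.g. for
            // ranked or fuzzy data: "low" < "medium" < "high".
            List<String> domainOrder = List.of("low", "medium", "high");
            Comparator<String> byDomain = Comparator.comparingInt(domainOrder::indexOf);

            List<String> readings = new ArrayList<>(List.of("high", "low", "medium", "low"));
            readings.sort(byDomain);  // ordered by semantics, not alphabetically

            // Alphabetical order would give [high, low, low, medium];
            // the domain ordering yields the intended ranking instead.
            System.out.println(readings);  // [low, low, medium, high]
        }
    }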


2018
Vol 6 (3)
pp. 1-6
Author(s):  
Valdrin Haxhiu

Data warehouses are collections of several databases whose goal is to help companies and corporations make important decisions about their activities. These decisions are derived from analyses performed on the data within the data warehouse. The data come from what companies collect on a daily basis from branches that may be located in different cities, regions, states, and continents. Data entered into a data warehouse are historical data, representing the part of the data that is important for making decisions. These data undergo a transformation process in order to fit the structure of the objects within the databases in the data warehouse, because the structure of relational databases is not similar to that of the (multidimensional) databases within the data warehouse. The former are optimized for daily transactions such as entering, changing, deleting, and retrieving data through simple queries; the latter are optimized for retrieving data through multidimensional queries, which enable us to extract important information. This information helps in making important decisions by revealing the weak and strong points of the company, so that it can invest more in the weak points and reinforce the strong ones, increasing its profits. The goal of this paper is to treat data analysis for decision making from a data warehouse using OLAP (online analytical processing). For this we used the Analysis Services of the Microsoft SQL Server 2016 platform. We analyzed the data of an IT store with branches in different cities in Kosovo and drew conclusions about some sales trends. This paper emphasizes the role of data warehouses in decision making.
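The paper itself works with Microsoft SQL Server 2016 Analysis Services; as a self-contained illustration of the kind of multidimensional aggregation an OLAP query performs, the sketch below rolls up sales facts along two dimensions (city and month) to expose simple trends. All sample data are made up.

    // Standalone illustration of OLAP-style aggregation: total sales by city
    // and month over a tiny in-memory fact table. The cities, months, and
    // amounts are hypothetical, not the paper's data set.
    import java.util.*;
    import java.util.stream.*;

    public class SalesTrends {
        record Sale(String city, String month, double amount) {}

        public static void main(String[] args) {
            List<Sale> facts = List.of(
                new Sale("Prishtina", "2016-01", 1200.0),
                new Sale("Prishtina", "2016-02", 1550.0),
                new Sale("Peja",      "2016-01",  800.0),
                new Sale("Peja",      "2016-02",  950.0));

            // Roll up the fact rows along two dimensions: city, then month.
            Map<String, Map<String, Double>> cube = facts.stream().collect(
                Collectors.groupingBy(Sale::city,
                    Collectors.groupingBy(Sale::month,
                        Collectors.summingDouble(Sale::amount))));

            // Month-over-month totals per city expose simple sales trends.
            cube.forEach((city, byMonth) ->
                System.out.println(city + " -> " + new TreeMap<>(byMonth)));
        }
    }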


Author(s):  
John M. Artz

Data warehousing is an emerging technology that greatly extends the capabilities of relational databases, specifically in the analysis of very large sets of time-oriented data. The emergence of data warehousing has been somewhat eclipsed over the past decade by the simultaneous emergence of Web technologies. However, Web technologies and data warehousing have some natural synergies that are not immediately obvious. First, Web technologies make data warehouse data more easily available to a much wider variety of users. Second, data warehouse technologies can be used to analyze traffic to a Web site in order to gain a much better understanding of its visitors. It is this second synergy that is the focus of this article.
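As an illustration of that second synergy: the raw input for analyzing Web site traffic is typically the server's access log. The sketch below parses one Common Log Format line into the fields a hypothetical page-view fact table might store; the log line and field choices are illustrative, not taken from the article.

    // Illustrative sketch: parsing one Common Log Format line into the
    // dimensions and measures a clickstream fact table might store.
    // The log line and field choices are hypothetical examples.
    import java.util.regex.*;

    public class ClickstreamFact {
        // host ident authuser [timestamp] "request" status bytes
        private static final Pattern CLF = Pattern.compile(
            "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"(\\S+) (\\S+) [^\"]*\" (\\d{3}) (\\d+|-)$");

        public static void main(String[] args) {
            String line = "192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "
                        + "\"GET /products/index.html HTTP/1.0\" 200 2326";
            Matcher m = CLF.matcher(line);
            if (m.matches()) {
                // Each parsed field maps to a dimension or measure of a page-view fact.
                System.out.println("visitor host : " + m.group(1));
                System.out.println("time of hit  : " + m.group(2));
                System.out.println("page (URL)   : " + m.group(4));
                System.out.println("HTTP status  : " + m.group(5));
                System.out.println("bytes sent   : " + m.group(6));
            }
        }
    }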


Author(s):  
Guntis Bārzdiņš
Sergejs Rikačovs
Marta Veilande
Mārtiņš Zviedris

Ontological Re-engineering of Medical Databases

This paper describes data export from multiple medical databases (relational databases) into a single shared Medical Data Warehouse (an RDF database structured according to an integrated OWL ontology). The exported data is conveniently accessible via SPARQL or via ViziQuer, a graphical query language based on the UML profile for OWL. The approach is illustrated on one of the Latvian medical databases, the Injury Register.
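A minimal sketch of the kind of SPARQL access the abstract mentions, written against Apache Jena's standard query API. The ontology namespace, property names, and data file are hypothetical stand-ins for the integrated OWL ontology and the Injury Register export.

    // Querying an RDF export with SPARQL via Apache Jena. The med: namespace,
    // med:Injury class, med:registeredOn property, and the Turtle file name
    // are all hypothetical placeholders.
    import org.apache.jena.query.*;
    import org.apache.jena.rdf.model.*;

    public class InjuryQuery {
        public static void main(String[] args) {
            // Load the exported RDF data into an in-memory model.
            Model model = ModelFactory.createDefaultModel();
            model.read("injury-register.ttl");   // hypothetical export file

            String sparql =
                "PREFIX med: <http://example.org/medical#> " +
                "SELECT ?injury ?date WHERE { " +
                "  ?injury a med:Injury ; med:registeredOn ?date . " +
                "} LIMIT 10";

            try (QueryExecution qe = QueryExecutionFactory.create(
                    QueryFactory.create(sparql), model)) {
                ResultSet results = qe.execSelect();
                while (results.hasNext()) {
                    QuerySolution row = results.next();
                    System.out.println(row.get("injury") + " " + row.get("date"));
                }
            }
        }
    }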

