Review of Contemporary Database Design and Implication for Big Data

2021 ◽  
Vol 12 (4) ◽  
pp. 1-11
Author(s):  
Halima E. Samra ◽  
Alice S. Li ◽  
Ben Soh ◽  
Mohammed A. AlZain

In general, databases provide a single comprehensive view suitable for analysis and relevant information for a variety of organizational purposes. The intent of this paper is to review contemporary database design in terms of data modelling, process modelling, relational databases, and data storage. The review indicates that contemporary relational database architecture provides numerous advantages, such as high consistency and availability. However, it is not suitable for big data because its performance decreases as the data grows and it faces scalability constraints: it cannot scale horizontally, and its vertical growth is limited. An implication here is that big data requires more than a relational database and traditional SQL.

2009 ◽  
pp. 2360-2383
Author(s):  
Guntis Barzdins ◽  
Janis Barzdins ◽  
Karlis Cerans

This chapter introduces the UML profile for OWL as an essential instrument for bridging the gap between legacy relational databases and OWL ontologies. We address one of the long-standing relational database design problems, where the initial conceptual model (a semantically clear domain conceptualization ontology) gets “lost” during conversion into the normalized database schema. The problem is that this “loss” makes the database inaccessible for direct querying by domain experts familiar only with the conceptual model. It can be avoided by exporting the database into RDF according to the original conceptual model (OWL ontology) and formulating semantically clear queries in SPARQL over the RDF database. Through a detailed example we show how the UML/OWL profile facilitates this new and promising approach.
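To make the export-and-query idea concrete, here is a minimal Python sketch using the rdflib library. It is not the authors' implementation: the `http://example.org` namespace, the `Person`/`hasName` vocabulary, and the sample rows are illustrative assumptions standing in for a real conceptual-model ontology and a real relational export.

```python
# A minimal sketch: relational rows are mapped to RDF triples under a
# hypothetical conceptual-model vocabulary, then queried with SPARQL that
# is phrased against the conceptual model rather than the table layout.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

EX = Namespace("http://example.org/ontology#")  # hypothetical ontology namespace
g = Graph()

# Pretend these rows came from a normalized relational table.
rows = [(1, "Alice"), (2, "Bob")]
for person_id, name in rows:
    subject = URIRef(f"http://example.org/person/{person_id}")
    g.add((subject, RDF.type, EX.Person))
    g.add((subject, EX.hasName, Literal(name)))

# A domain expert can now query in the ontology's own terms.
results = g.query("""
    PREFIX ex: <http://example.org/ontology#>
    SELECT ?name WHERE { ?p a ex:Person ; ex:hasName ?name . }
""")
for row in results:
    print(row.name)
```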


2020 ◽  
Vol 10 (23) ◽  
pp. 8524
Author(s):  
Cornelia A. Győrödi ◽  
Diana V. Dumşe-Burescu ◽  
Doina R. Zmaranda ◽  
Robert Ş. Győrödi ◽  
Gianina A. Gabor ◽  
...  

In the current context, in which several types of database systems (relational and non-relational) are emerging, choosing the type of database system for storing large amounts of data in today’s big data applications has become an important challenge. In this paper, we aimed to provide a comparative evaluation of two popular open-source database management systems (DBMSs): MySQL, a relational DBMS that has more recently also offered non-relational capabilities, and CouchDB, a non-relational DBMS. This comparison was based on a performance evaluation of CRUD (CREATE, READ, UPDATE, DELETE) operations for different amounts of data, to show how these two databases could be modeled and used in an application and to highlight the differences in response time and complexity. The main objective of the paper was to make a comparative analysis of the impact that each specific DBMS has on application performance when carrying out CRUD requests. To perform the analysis and to ensure the consistency of the tests, two similar applications were developed in Java, one using MySQL and the other using CouchDB; these applications were then used to evaluate the response times of each database technology on the same CRUD operations. Finally, a comprehensive discussion centered on the results of the analysis was carried out and several conclusions were drawn. Advantages and drawbacks of each DBMS are outlined to support the choice of a specific type of DBMS for a big data application.
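The shape of such a timing harness can be sketched briefly. The authors' test applications were written in Java; the Python sketch below only illustrates the kind of paired CRUD measurement described, and the connection parameters, table, and document shape are assumptions, not the paper's setup.

```python
# A minimal sketch of a CRUD timing harness comparing the two stores.
import time
import couchdb            # pip install couchdb
import mysql.connector    # pip install mysql-connector-python

def timed(label, fn):
    start = time.perf_counter()
    fn()
    print(f"{label}: {time.perf_counter() - start:.4f}s")

# CREATE against MySQL (relational) ...
conn = mysql.connector.connect(host="localhost", user="test",
                               password="test", database="bench")
cur = conn.cursor()
timed("mysql insert", lambda: cur.execute(
    "INSERT INTO users (name, age) VALUES (%s, %s)", ("Alice", 30)))
conn.commit()

# ... and the equivalent CREATE against CouchDB (document store).
server = couchdb.Server("http://admin:admin@localhost:5984/")
db = server["bench"]
timed("couchdb insert", lambda: db.save({"name": "Alice", "age": 30}))
```

The same pattern would be repeated for READ, UPDATE, and DELETE over growing data volumes to reproduce the kind of comparison the paper reports.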


2017 ◽  
Vol 30 (3) ◽  
pp. 503-525
Author(s):  
Kamal Hamaz ◽  
Fouzia Benchikha

Purpose
With the development of systems and applications, the number of users interacting with databases has increased considerably. The relational database model is still considered the most widely used model for data storage and manipulation. However, it does not offer any semantic support for the stored data that could facilitate data access for users. Indeed, a large number of users are intimidated when retrieving data because they are non-technical or have little technical knowledge. To overcome this problem, researchers are continuously developing new techniques for Natural Language Interfaces to Databases (NLIDB). Nowadays, the usage of existing NLIDBs is not widespread due to their deficiencies in understanding natural language (NL) queries. In this sense, the purpose of this paper is to propose a novel method for an intelligent understanding of NL queries using semantically enriched database sources.

Design/methodology/approach
First, a reverse engineering process is applied to extract the relational database's hidden semantics. In the second step, the extracted semantics are further enriched using a domain ontology. After this, all semantics are stored in the same relational database. The NL query processing phase then uses the stored semantics to generate a semantic tree.

Findings
The evaluation part of the work shows the advantages of using a semantically enriched database source to understand NL queries. Additionally, enriching a relational database has given more flexibility in understanding contextual and synonymous words that may be used in an NL query.

Originality/value
Existing NLIDBs are not yet a standard option for interfacing a relational database due to their deficiencies in understanding NL queries. Indeed, the techniques used in the literature have their limits. This paper addresses those limits by identifying NL elements by their semantic nature in order to generate a semantic tree. The semantic tree is a key step towards an intelligent understanding of NL queries to relational databases.
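The role of stored semantics can be illustrated with a toy sketch. This is not the authors' algorithm: the hand-made synonym dictionary below merely stands in for the semantically enriched database, and the `employees` schema is invented for illustration.

```python
# A toy sketch: stored semantics (here a synonym dictionary) resolve
# contextual and synonymous words in a natural language query to schema
# elements before a query is generated.
SEMANTICS = {  # hypothetical schema annotations
    "employee": ("employees", None), "worker": ("employees", None),
    "salary": ("employees", "salary"), "pay": ("employees", "salary"),
}

def to_sql(nl_query: str) -> str:
    table, column = None, "*"
    for word in nl_query.lower().split():
        if word in SEMANTICS:
            t, c = SEMANTICS[word]
            table = table or t
            column = c or column
    return f"SELECT {column} FROM {table}"

print(to_sql("show the pay of every worker"))  # SELECT salary FROM employees
```

A synonym ("pay") and a contextual word ("worker") both resolve to the right schema elements, which is the flexibility the enrichment step is meant to provide.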


Big data is traditionally associated with distributed systems, and this is understandable given that the volume dimension of big data appears to be best accommodated by the continuous addition of resources over a distributed network rather than the continuous upgrade of a central storage resource. Based on this implementation context, non-distributed relational database models are considered volume-inefficient, and a departure from their usage is contemplated by the database community. Distributed systems depend on data partitioning to determine chunks of related data and where in storage they can be accommodated. In existing Database Management Systems (DBMS), data partitioning is automated, which in the opinion of this paper does not give the best results, since partitioning is an NP-hard problem in terms of algorithmic time complexity. The cost imposed by this NP-hardness is shown to be reduced by a partitioning strategy that relies on the discretion of the programmer, which is more effective and flexible, though it requires extra coding effort; such problems are handled more effectively by a combination of programmer discretion and automation than by full automation alone. In this paper, the partitioning process is reviewed and a programmer-based partitioning strategy is implemented for an application with a relational DBMS backend. By doing this, the relational DBMS is made adaptive in the volume dimension of big data, and the ACID properties (atomicity, consistency, isolation, and durability) of the relational database model, which constitute a major attraction especially for applications that process transactions, are thus harnessed. On a more general note, the results of this research suggest that databases can be made adaptive in the areas of their weaknesses, as a one-size-fits-all database management system may no longer be feasible.
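What programmer-directed partitioning can look like is sketched below. The shard names, the keying rule, and the orders schema are illustrative assumptions, not the paper's implementation; the point is only that the routing decision lives in application code rather than in the DBMS.

```python
# A minimal sketch of programmer-directed partitioning: application code,
# not the DBMS, decides which shard holds a row.
SHARDS = ["orders_shard_0", "orders_shard_1", "orders_shard_2"]

def shard_for(customer_id: int) -> str:
    # The programmer's domain knowledge is encoded here: all orders of one
    # customer land on one shard, keeping related data together.
    return SHARDS[customer_id % len(SHARDS)]

def insert_order(customer_id: int, amount: float) -> str:
    table = shard_for(customer_id)
    return f"INSERT INTO {table} (customer_id, amount) VALUES ({customer_id}, {amount})"

print(insert_order(42, 19.99))  # routed to orders_shard_0
```

Because the routing rule is explicit, the programmer can change it (e.g., to partition by region instead of customer) without waiting for the DBMS's automated partitioner to discover a better layout.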


10.28945/3199 ◽  
2008 ◽  
Author(s):  
Milos Bogdanovic ◽  
Aleksandar Stanimirovic ◽  
Nikola Davidovic ◽  
Leonid Stoimenov

Most universities where students study information technology and computer science have an introductory course dealing with the development and design of databases. These courses often include the usage of database design tools. In this paper, the #EER tool is presented, whose task is to make the process of relational database design easier for students and to partially automate it. The tool evolved from experience in using similar tools for educational purposes. It enables fast and efficient development of a relational database conceptual model and its automated compilation into a relational model and further into data definition language (DDL) commands. The #EER tool is based on the extended entity-relationship (EER) model for conceptual modeling of relational databases. The modular architecture of the tool, whose development is based on the usage of design patterns, is also presented, along with the benefits that its usage brings.
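The compilation step the tool automates can be sketched in a few lines. The entity description format below is an illustrative assumption, not the #EER internals; it only shows the general shape of translating a conceptual entity into DDL.

```python
# A minimal sketch: an entity description is compiled into CREATE TABLE DDL.
def entity_to_ddl(name: str, attributes: dict, key: str) -> str:
    cols = [f"  {attr} {sql_type}" + (" PRIMARY KEY" if attr == key else "")
            for attr, sql_type in attributes.items()]
    return f"CREATE TABLE {name} (\n" + ",\n".join(cols) + "\n);"

print(entity_to_ddl("Student",
                    {"student_id": "INTEGER", "name": "VARCHAR(100)"},
                    key="student_id"))
```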


The chapter presents how relational databases answer typical NoSQL features and, vice versa, how NoSQL databases answer typical relational features. Open issues related to the integration of relational and NoSQL databases, as well as next-generation database features, are discussed. The big relational database vendors have continuously worked to incorporate NoSQL features into their databases, just as NoSQL vendors are trying to make their products more like relational databases. The convergence of these two groups of databases has been a driving force in the evolution of the database market, in establishing a new level of focus on resolving big data requirements, and in enabling users to fully use the potential of their data, wherever it is stored, in relational or NoSQL databases. In turn, the database of choice in the future will likely be one that provides the best of both worlds: a flexible data model, high availability, and enterprise reliability.
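One concrete face of this convergence is relational engines offering schemaless JSON columns. The sketch below uses SQLite because its JSON1 functions ship with most Python builds; the events table and its fields are illustrative, and availability of `json_extract` depends on how SQLite was compiled.

```python
# A minimal illustration: a relational engine answering a "NoSQL-style"
# need with a schemaless JSON column queried through ordinary SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("INSERT INTO events (payload) VALUES (?)",
             ('{"type": "click", "user": "alice"}',))

# Relational SQL reaching inside a flexible JSON document.
row = conn.execute(
    "SELECT json_extract(payload, '$.user') FROM events "
    "WHERE json_extract(payload, '$.type') = 'click'").fetchone()
print(row[0])  # alice
```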


Author(s):  
Mirella M. Moro ◽  
Lipyeow Lim ◽  
Yuan-Chi Chang

It is well known that XML has been widely adopted for its flexible and self-describing nature. However, relational data will continue to co-exist with XML for several different reasons, one of which is the high cost of transferring everything to XML. In this context, data designers face the problem of modeling both relational and XML data within an integrated environment. This chapter highlights important questions on hybrid XML-relational database design and discusses use cases, requirements, and deficiencies in existing design methodologies, especially in the light of data and schema evolution. The authors’ analysis results in several design guidelines and a series of challenges to be addressed by future research.
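One common hybrid design keeps stable fields relational while an evolving fragment stays as XML. A minimal Python sketch of that pattern follows; the products schema and the specs fragment are invented for illustration and are not taken from the chapter.

```python
# A minimal sketch of a hybrid XML-relational table: stable fields in
# ordinary columns, an evolving fragment stored as XML text.
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, specs_xml TEXT)")
conn.execute("INSERT INTO products VALUES (1, 'Widget', ?)",
             ("<specs><weight unit='g'>150</weight></specs>",))

# Relational access for the stable part, XML parsing for the flexible part.
name, specs_xml = conn.execute(
    "SELECT name, specs_xml FROM products WHERE id = 1").fetchone()
weight = ET.fromstring(specs_xml).find("weight")
print(name, weight.text, weight.get("unit"))  # Widget 150 g
```

The schema-evolution appeal is visible here: new elements can appear inside `specs_xml` without an ALTER TABLE, which is exactly the tension between flexibility and design discipline the chapter examines.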


2018 ◽  
Vol 7 (2.6) ◽  
pp. 83
Author(s):  
Gourav Bathla ◽  
Rinkle Rani ◽  
Himanshu Aggarwal

Big data is a collection of structured, semi-structured, and unstructured data at large scale. It is generated by social networks, business organizations, and the interactions and views of socially connected users, and it is used for important decision making in business and research organizations. Storage that can process this large scale of data efficiently, extracting important information in a short response time, is the need of the current competitive time. Relational databases, which have ruled storage technology for so long, seem unsuitable for such mixed types of data: the data cannot be represented just in the form of rows and columns in tables. NoSQL (Not only SQL) is a technology complementary to SQL that provides various storage formats readily compatible with the high velocity, large volume, and wide variety of the data. NoSQL databases fall into four categories: column-oriented, key-value, graph-based, and document-oriented databases. There are approximately 120 real solutions existing for these categories; the most commonly used solutions are elaborated in the Introduction section. Several research works have been carried out to analyze these NoSQL technology solutions, but these studies have not mentioned the situations in which a particular data storage technique should be chosen. In this study and analysis, we have tried our best to provide the reader with an answer on technology selection based on specific requirements. In previous research, comparisons among NoSQL data storage techniques have been described using real examples like MongoDB, Neo4J, etc. Our observation is that if users have adequate knowledge of the NoSQL categories and their comparison, it is easy for them to choose the most suitable category first and then select a real solution from that category.
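The category-first selection process the authors recommend can be caricatured in a few lines. The mapping below is a simplified illustration of that guidance, not the paper's actual decision criteria, and the requirement phrases are invented.

```python
# A toy selector: map a requirement to a NoSQL category first, then pick a
# concrete product from that category.
CATEGORY_GUIDE = {
    "aggregate analytics over columns": ("column-oriented", "Cassandra"),
    "simple lookups by key at scale":   ("key-value", "Riak"),
    "highly connected relationships":   ("graph", "Neo4J"),
    "flexible nested records":          ("document-oriented", "MongoDB"),
}

def choose_store(requirement: str) -> str:
    category, example = CATEGORY_GUIDE[requirement]
    return f"{requirement} -> {category} (e.g., {example})"

print(choose_store("highly connected relationships"))
```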


In the present era, as technology emerges widely, the volume of stored data is also increasing enormously; this is the current buzz defined as Big Data. Existing Big Data modelling mostly handles structured data, and no defined approach has been designed for modelling Big Data that includes structured, semi-structured, and unstructured data together. Among the existing challenges of Big Data, the most imperative is modelling it. This paper proposes a generic approach for modelling Big Data. The effectiveness of this innovative approach is demonstrated by modelling oncology data using MongoDB. The resulting model facilitates easy analytics and is independent of context.
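What a mixed-structure document model can look like in MongoDB is sketched below. The oncology fields are invented for illustration and are not the paper's actual model; the point is that one document can carry structured, semi-structured, and unstructured parts together.

```python
# A minimal sketch of mixed-structure data modelled as one MongoDB document.
from pymongo import MongoClient  # pip install pymongo

client = MongoClient("mongodb://localhost:27017/")
cases = client["oncology"]["cases"]

# One document mixes structured fields, semi-structured lists, and free text.
cases.insert_one({
    "patient_id": "P-001",                              # structured
    "diagnoses": [{"code": "C50.9", "stage": "II"}],    # semi-structured
    "clinician_notes": "Patient responding well to therapy.",  # unstructured
})
print(cases.find_one({"patient_id": "P-001"})["diagnoses"][0]["stage"])
```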


PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0255562
Author(s):  
Eman Khashan ◽  
Ali Eldesouky ◽  
Sally Elghamrawy

The growing popularity of big data analysis and cloud computing has created new big data management standards. Sometimes, programmers may interact with a number of heterogeneous data stores, depending on the information they are responsible for: SQL and NoSQL data stores. Interacting with heterogeneous data models via numerous APIs and query languages imposes challenging tasks on multi-data processing developers. Indeed, complex queries concerning homogenous data structures cannot currently be performed in a declarative manner when found in single data storage applications and therefore require additional development efforts. Many models have been presented to address complex queries via multistore applications. Some of these models implement a complex, unified, and fast model, while others are not efficient enough to solve this type of complex database query. This paper provides an automated, fast, and easy unified architecture for solving simple and complex SQL and NoSQL queries over heterogeneous data stores (CQNS). The proposed framework can be used in cloud environments or in any big data application to automatically help developers manage basic and complicated database queries. CQNS consists of three layers: a matching selector layer, a processing layer, and a query execution layer. The matching selector layer is the heart of this architecture: incoming user queries are examined to see whether they match queries stored in a single engine in the architecture library, through a proposed algorithm that directs each query to the right SQL or NoSQL database engine. Furthermore, CQNS deals with many NoSQL databases, such as MongoDB, Cassandra, Riak, CouchDB, and Neo4J. This paper presents a Spark framework that can handle both SQL and NoSQL databases. Four benchmark dataset scenarios are used to evaluate the proposed CQNS for querying different NoSQL databases in terms of optimization process performance and query execution time. The results show that CQNS achieves the best latency and throughput in less time among the compared systems.
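The dispatching idea behind the matching selector layer can be sketched very simply. The detection rule and engine names below are illustrative assumptions, far simpler than the CQNS algorithm itself; they only show what routing a query to the right engine means.

```python
# A highly simplified sketch of query routing: inspect a query and
# dispatch it to a relational or NoSQL engine.
def route_query(query: str) -> str:
    sql_keywords = ("SELECT", "INSERT", "UPDATE", "DELETE")
    if query.lstrip().upper().startswith(sql_keywords):
        return "sql-engine"        # e.g., a MySQL backend
    return "nosql-engine"          # e.g., MongoDB, Cassandra, CouchDB

print(route_query("SELECT * FROM patients"))   # sql-engine
print(route_query("db.patients.find({})"))     # nosql-engine
```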

