Relative Scalability of NoSQL Databases for Genotype Data Manipulation

2018 ◽  
Vol 25 (2) ◽  
pp. 93
Author(s):  
Arthur Lorenzi Almeida ◽  
Vinícius Junqueira Schettino ◽  
Thiago Jesus Rodrigues Barbosa ◽  
Pedro Fernandes Freitas ◽  
Pedro Gabriel Silva Guimarães ◽  
...  

Genotype data manipulation is one of the greatest challenges in bioinformatics and genomics, mainly because of its high dimensionality and unbalanced characteristics. These peculiarities explain why Relational Database Management Systems (RDBMSs), the "de facto" standard storage solution, are not regarded as the best tools for this kind of data. However, Big Data has been pushing the development of modern database systems that might be able to overcome the deficiencies of RDBMSs. In this context, we extended our previous works on the evaluation of relative performance among NoSQL engines from different families, adapting the schema design based on their conclusions in order to achieve better performance, and thus being able to store more SNP markers for each individual. Using the Yahoo! Cloud Serving Benchmark (YCSB) framework, we assessed each database system over hypothetical SNP sequences. Results indicate that although Tarantool has the best overall throughput, MongoDB is less impacted by the increase of SNP markers per individual.

Azure SQL and Atlas MongoDB NoSQL (Azure instance) are among the most popular database solutions. Azure SQL is a Relational Database Management System (RDBMS), in which the data are structured into tables or associations. Atlas MongoDB is a non-relational (NoSQL) database management system, in which the data are held in unstructured form. This research evaluates both the Azure SQL and Atlas MongoDB NoSQL databases. The experiment compares the loading time, response time, and retrieval time of the two databases and determines which one is faster, more efficient, and better performing.


10.28945/4033 ◽  
2018 ◽  
Vol 15 ◽  
pp. 035-042
Author(s):  
Robert Thomas Mason

Aim/Purpose: This paper investigates the changing paradigms for technical skills that are needed by Data Engineers in 2018. Background: A decade ago, data engineers needed technical skills for Relational Database Management Systems (RDBMS), such as Oracle and Microsoft SQL Server. With the advent of Hadoop and NoSQL Databases in recent years, Data Engineers require new skills to support the large distributed datastores (Big Data) that currently exist. Job demand for Data Scientists and Data Engineers has increased over the last five years. Methodology: The research methodology leveraged the Pig programming language, which used MapReduce software hosted on the Amazon Web Services (AWS) Cloud. Data was collected from 100 Indeed.com job advertisements during July of 2017 and then uploaded to the AWS Cloud. Using MapReduce, phrases/words were counted and then sorted. The sorted phrase/word counts were then used to create the list of the 20 top skills needed by a Data Engineer based on the job advertisements. This list was compared to the 20 top skills for a Data Engineer presented by Stitch, which surveyed 6,500 Data Engineers in 2016. Contribution: This paper presents a list of the 20 top technical skills required by a Data Engineer.
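
The count-then-sort step described in the methodology can be sketched in a few lines. This is only an illustrative, single-machine Python approximation of a MapReduce word count, not the Pig script used in the paper. The function names (`map_phase`, `reduce_phase`, `top_terms`) and the tokenization rule are assumptions introduced here, since the abstract does not describe them.

```python
import re
from collections import Counter


def map_phase(ad_text):
    """Map step: emit a (term, 1) pair for every token in one job advertisement."""
    # Assumed tokenization: lowercase runs of letters, digits, '+' and '#',
    # so that skill names such as "c++" or "c#" survive as single tokens.
    for term in re.findall(r"[a-z0-9+#]+", ad_text.lower()):
        yield term, 1


def reduce_phase(pairs):
    """Reduce step: sum the emitted counts per term."""
    counts = Counter()
    for term, n in pairs:
        counts[term] += n
    return counts


def top_terms(ads, k=20):
    """Count terms across all ads, then sort by descending count (ties alphabetical)."""
    pairs = (pair for ad in ads for pair in map_phase(ad))
    counts = reduce_phase(pairs)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:k]
```

Calling `top_terms(list_of_ad_texts)` returns up to 20 `(term, count)` pairs. In practice a stop-word filter and phrase detection would be needed before the result resembles a skills list; those steps are omitted here because the abstract gives no details about them.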


2019 ◽  
Vol 19 (2) ◽  
pp. 117-132
Author(s):  
Fernando Almeida ◽  
Pedro Silva ◽  
Fernando Araújo

Databases provide an efficient way to store, retrieve and analyze data. The Oracle relational database is one of the most popular database management systems and is widely used across a variety of industries and businesses. Therefore, it is important to guarantee that database access and data manipulation are optimized to reduce database system response time. This paper analyzes the performance of the main optimization techniques (Forall, Returning, and Bulk Collect) that can be adopted for Oracle Relational Databases. The results show that adopting the Forall and Bulk Collect approaches brings significant benefits in terms of execution time. Furthermore, the growth rate of the average execution time is lower for Bulk Collect than for Forall. However, adopting the Returning approach does not bring statistically significant benefits.
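
The general idea behind bulk operations, sending many rows per call instead of one statement per row, can be illustrated outside Oracle. The sketch below is only an analogy written with Python's standard `sqlite3` module. It is not PL/SQL, it does not use Forall or Bulk Collect themselves, and it makes no claim about the timings reported in the paper. The table name `readings` and all function names are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical two-column table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
    return conn


def insert_row_by_row(conn, rows):
    """Baseline: one execute() call per row."""
    for row in rows:
        conn.execute("INSERT INTO readings (id, value) VALUES (?, ?)", row)
    conn.commit()


def insert_bulk(conn, rows):
    """Bulk write: one executemany() call for the whole collection
    (loosely analogous to a bulk DML statement)."""
    conn.executemany("INSERT INTO readings (id, value) VALUES (?, ?)", rows)
    conn.commit()


def fetch_bulk(conn, batch_size=1000):
    """Bulk read: pull rows in batches with fetchmany() rather than one at a time
    (loosely analogous to a bulk fetch with a limit)."""
    cur = conn.execute("SELECT id, value FROM readings ORDER BY id")
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        yield batch
```

Both insert functions leave the table in the same state; the difference lies only in how many calls cross the boundary between the program and the database engine, which is the overhead that bulk techniques aim to reduce.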

