Efficient Transfer of Data from RDBMS to HDFS and Conversion to JSON Format

Author(s):  
Dr. C. K. Gomathy

Abstract: Apache Sqoop is mainly used to efficiently transfer large volumes of data between Apache Hadoop and relational databases. It offloads certain tasks, such as ETL (extract, transform, load) processing, from an enterprise data warehouse to Hadoop, where they execute efficiently at much lower cost. We first import a table residing in a MySQL database using Sqoop, a command-line interface application. Because new rows may later be inserted or existing rows updated, the import would otherwise have to be executed again by hand; our project removes this need by defining a Sqoop job that encapsulates the complete set of import commands. After the import, we retrieve the data from Hive using Java JDBC and convert it to JSON format, which organizes the data in an easily accessible way, using the Gson library.

Keywords: Sqoop, JSON, Gson, Maven, JDBC
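
A rough Python equivalent of the retrieval-and-conversion step, using PyHive in place of the paper's Java JDBC client and the standard json module in place of Gson; the Sqoop job in the comment is illustrative, and the connection details and table names are assumptions:

```python
# Assumed setup: a HiveServer2 instance on localhost:10000 holding a table
# populated by a saved Sqoop job along the lines of:
#   sqoop job --create import_employees -- import \
#     --connect jdbc:mysql://localhost/testdb --table employees \
#     --hive-import --incremental append --check-column id
import json
from pyhive import hive  # pip install pyhive

conn = hive.Connection(host="localhost", port=10000, database="default")
cursor = conn.cursor()
cursor.execute("SELECT * FROM employees")

# Hive prefixes result columns with the table name (e.g. "employees.id"),
# so strip the prefix before building per-row dictionaries.
columns = [desc[0].split(".")[-1] for desc in cursor.description]
rows = [dict(zip(columns, row)) for row in cursor.fetchall()]

print(json.dumps(rows, indent=2, default=str))  # organized, easy-to-access JSON
```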

1994, Vol. 05 (05), pp. 805-809
Author(s):  
SALIM G. ANSARI ◽  
PAOLO GIOMMI ◽  
ALBERTO MICOL

On 3 November 1993, ESIS announced its homepage on the World Wide Web (WWW) to the user community. Since then, ESIS has steadily increased its Web support to the astronomical community to include a bibliographic service, the ESIS catalogue documentation and the ESIS Data Browser. More functionality will be added in the near future. All these services share a common ESIS structure that is also used by other ESIS user paradigms such as the ESIS Graphical User Interface (Giommi and Ansari, 1993) and the ESIS Command Line Interface. Following a forms-based paradigm, each ESIS Web application interfaces to the Hypertext Transfer Protocol (HTTP), translating queries to and from the Hypertext Markup Language (HTML) format understood by the NCSA Mosaic interface. In this paper, we discuss the ESIS system and show how each ESIS service works on a World Wide Web client.


Author(s):  
Anderson Chaves Carniel ◽  
Aried de Aguiar Sa ◽  
Vinicius Henrique Porto Brisighello ◽  
Marcela Xavier Ribeiro ◽  
Renato Bueno ◽  
et al.

2018, Vol. 14 (3), pp. 44-68
Author(s):  
Fatma Abdelhedi ◽  
Amal Ait Brahim ◽  
Gilles Zurfluh

Nowadays, most organizations need to improve their decision-making processes using Big Data. To achieve this, they have to store Big Data, perform analyses, and transform the results into useful and valuable information, which raises new challenges in designing and creating data warehouses. Traditionally, creating a data warehouse followed a well-governed process based on relational databases. Big Data has challenged this traditional approach, primarily due to the changing nature of data, and using NoSQL databases has consequently become a necessity for handling Big Data. In this article, the authors show how to create a data warehouse on NoSQL systems. They propose the Object2NoSQL process, which generates column-oriented physical models starting from a UML conceptual model. To ensure efficient automatic transformation, they propose a logical model that exhibits a sufficient degree of independence to enable its mapping to one or more column-oriented platforms. The authors validate their approach with experiments on a case study in the health-care field.
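
A toy sketch of the two-step idea described here, with invented class and attribute names: a platform-independent logical model is derived from a conceptual (UML-like) class, then mapped to one possible column-oriented target (HBase DDL). This illustrates the approach only; it is not the authors' Object2NoSQL implementation:

```python
from dataclasses import dataclass

@dataclass
class UMLClass:              # conceptual level (simplified UML class)
    name: str
    attributes: dict         # attribute name -> thematic group

def to_logical(cls_: UMLClass) -> dict:
    """Logical column-oriented model: one table per class, attributes
    grouped into column families by theme, independent of any platform."""
    families: dict = {}
    for attr, theme in cls_.attributes.items():
        families.setdefault(theme, []).append(attr)
    return {"table": cls_.name.lower(), "families": families}

def to_hbase_ddl(logical: dict) -> str:
    """Physical mapping for one concrete column-oriented platform."""
    cfs = ", ".join(f"'{cf}'" for cf in logical["families"])
    return f"create '{logical['table']}', {cfs}"

patient = UMLClass("Patient", {
    "name": "identity", "birth_date": "identity",
    "diagnosis": "medical", "treatment": "medical",
})
print(to_hbase_ddl(to_logical(patient)))  # create 'patient', 'identity', 'medical'
```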


2021, Vol. 9
Author(s):  
Caio Ribeiro ◽  
Lucas Oliveira ◽  
Romina Batista ◽  
Marcos De Sousa

The use of ultraconserved elements (UCEs) as genetic markers in phylogenomics has become popular and has provided promising results. Although UCE data can be easily obtained from targeted enriched sequencing, the protocol for in silico analysis of UCEs consists of the execution of heterogeneous and complex tools, a challenge for scientists without training in bioinformatics. Developing tools that adopt best practices in research software can lessen this problem by improving the execution of computational experiments, thus promoting better reproducibility. We present UCEasy, an easy-to-install and easy-to-use software package with a simple command-line interface that facilitates the computational analysis of UCEs from sequencing samples, following the best practices of research software. UCEasy is a wrapper that standardises, automates and simplifies the quality control of raw reads, assembly, and extraction and alignment of UCEs, generating at the end a data matrix with different levels of completeness that can be used to infer phylogenetic trees. We demonstrate the functionalities of UCEasy by reproducing the published results of phylogenomic studies of the bird genus Turdus (Aves) and of Adephaga families (Coleoptera), efficiently extracting UCEs from their genomic datasets.
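
The pipeline stages listed above could be driven from a short script; the sketch below is hypothetical, and the subcommand names and flags are assumptions for illustration rather than UCEasy's documented CLI:

```python
# Hypothetical orchestration of the described workflow: quality control of
# raw reads -> assembly -> UCE extraction/alignment -> completeness matrix.
import subprocess

def run(step: list[str]) -> None:
    """Run one pipeline stage, aborting the workflow on the first failure."""
    print("running:", " ".join(step))
    subprocess.run(step, check=True)

reads, out = "raw_reads", "results"
run(["uceasy", "qc", "--input", reads, "--output", f"{out}/clean"])
run(["uceasy", "assembly", "--input", f"{out}/clean", "--output", f"{out}/contigs"])
run(["uceasy", "phylogenomics", "--input", f"{out}/contigs",
     "--percent", "75",  # completeness level of the final data matrix
     "--output", f"{out}/matrix"])
```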


2021
Author(s):  
Fatma Abdelhedi ◽  
Rym Jemmali ◽  
Gilles Zurfluh

Author(s):  
Patrick Moore

As networks have evolved, how they are managed has evolved as well: from manual configuration via a command-line interface (CLI), to script-based automation, and eventually to a template-based approach with workflows that coordinate multiple templates and scripts. The next step in this evolution is the introduction of models to provide a more dynamic capability than is in place today. This chapter discusses three major layers of modelling that should be considered when implementing this approach: device models, focused on the configuration of the hardware itself; service models, focused on the customer- or network-facing services that leverage the hardware-level configuration; and operational models, focused on the people, processes, and tools involved in applying device and service models. This includes the orchestration of activities with other tools, such as operational support systems (OSS) and business support systems (BSS).
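
A schematic sketch of how the first two layers relate, with all class and attribute names invented for illustration: a service model captures customer-facing intent and is rendered into per-device configuration models.

```python
from dataclasses import dataclass

@dataclass
class DeviceModel:            # hardware-level configuration
    hostname: str
    interface: str
    vlan: int

@dataclass
class ServiceModel:           # customer/network-facing intent
    customer: str
    site_a: str
    site_b: str
    vlan: int

    def render(self) -> list[DeviceModel]:
        """Map one service intent onto configuration for each device."""
        return [DeviceModel(self.site_a, "GigabitEthernet0/1", self.vlan),
                DeviceModel(self.site_b, "GigabitEthernet0/1", self.vlan)]

# An operational model would sequence render() output through OSS/BSS
# tooling; here the device-level result is simply printed.
svc = ServiceModel("acme", "pe1.london", "pe2.paris", vlan=120)
for dev in svc.render():
    print(dev)
```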


2008, pp. 2364-2370
Author(s):  
Janet Delve

Data warehousing is now a well-established part of the business and scientific worlds. Until recently, however, data warehouses were restricted to modeling essentially numerical data, examples being sales figures in the business arena (e.g. Wal-Mart's data warehouse) and astronomical data (e.g. SKICAT) in scientific research, with textual data playing a descriptive rather than a central role. The inability of data warehouses to cope with mainly non-numeric data is particularly problematic for humanities research utilizing material such as memoirs and trade directories. Recent innovations have opened up possibilities for non-numeric data warehouses, making them widely accessible to humanities research for the first time. Due to its irregular and complex nature, humanities research data is often difficult to model: manipulating time shifts in a relational database is problematic, as is fitting such data into a normalized data model. History and linguistics are exemplars of areas where relational databases are cumbersome and which would benefit from the greater freedom afforded by data warehouse dimensional modeling.


2019, Vol. 35 (18), pp. 3538-3540
Author(s):  
Mehdi Ali ◽  
Charles Tapley Hoyt ◽  
Daniel Domingo-Fernández ◽  
Jens Lehmann ◽  
Hajira Jabeen

Summary: Knowledge graph embeddings (KGEs) have received significant attention in other domains due to their ability to predict links and create dense representations of graphs' nodes and edges. However, the software ecosystem for applying them to bioinformatics remains limited and inaccessible for users without expertise in programming and machine learning. Therefore, we developed BioKEEN (Biological KnowlEdge EmbeddiNgs) and PyKEEN (Python KnowlEdge EmbeddiNgs) to facilitate their easy use through an interactive command-line interface. Finally, we present a case study in which we used a novel biological pathway mapping resource to predict links that represent pathway crosstalks and hierarchies.

Availability and implementation: BioKEEN and PyKEEN are open-source Python packages publicly available under the MIT License at https://github.com/SmartDataAnalytics/BioKEEN and https://github.com/SmartDataAnalytics/PyKEEN.

Supplementary information: Supplementary data are available at Bioinformatics online.
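
As a starting point, recent releases of PyKEEN also expose a programmatic pipeline alongside the interactive CLI; a minimal sketch follows (the exact API is version-dependent and differs from the 2019 release described here):

```python
from pykeen.pipeline import pipeline

# Train a TransE model on the small built-in "Nations" benchmark graph.
result = pipeline(dataset="Nations", model="TransE")
result.save_to_directory("nations_transe")  # embeddings, metrics, config
```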


2017, Vol. 73 (6), pp. 469-477
Author(s):  
Tom Burnley ◽  
Colin M. Palmer ◽  
Martyn Winn

As part of its remit to provide computational support to the cryo-EM community, the Collaborative Computational Project for Electron cryo-Microscopy (CCP-EM) has produced a software framework which enables easy access to a range of programs and utilities. The resulting software suite incorporates contributions from different collaborators by encapsulating them in Python task wrappers, which are then made accessible via a user-friendly graphical user interface as well as a command-line interface suitable for scripting. The framework includes tools for project and data management. An overview of the design of the framework is given, together with a survey of the functionality at different levels. The current CCP-EM suite has particular strength in the building and refinement of atomic models into cryo-EM reconstructions, which is described in detail.
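
The wrapper pattern described can be illustrated with a generic sketch (these are not CCP-EM's actual classes; all names are invented): one task definition backs both a GUI form and a scriptable command line.

```python
import subprocess
from dataclasses import dataclass

@dataclass
class Task:
    program: str   # external binary being encapsulated
    args: dict     # parameter name -> value

    def command(self) -> list[str]:
        return [self.program, *(f"--{k}={v}" for k, v in self.args.items())]

    def run(self) -> int:
        """Launch the wrapped program; a GUI builds self.args from a form,
        a CLI from argv, but both reuse this single task definition."""
        return subprocess.run(self.command(), check=True).returncode

refine = Task("hypothetical_refine", {"model": "in.pdb", "map": "emd_1234.mrc"})
print(" ".join(refine.command()))
```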

