data staging
Recently Published Documents

TOTAL DOCUMENTS: 49 (five years: 8)
H-INDEX: 7 (five years: 1)

2022 ◽ Vol 71 ◽ pp. 103200 ◽ Author(s): André Fonseca ◽ Camila Sardeto Deolindo ◽ Taisa Miranda ◽ Edgard Morya ◽ Edson Amaro Jr ◽ ...

Author(s): Ardhian Agung Yulianto

While a data warehouse is designed to support the decision-making function, the most time-consuming part of building one is the Extract, Transform, Load (ETL) process. In the case of an academic data warehouse whose data sources are the faculties' distributed databases, integration is not straightforward even though the databases share a typical schema. This paper presents the ETL process for a distributed-database academic data warehouse. Following the Data Flow Thread in the data staging area, a deep analysis is performed to identify all tables in each data source, including content profiling. The cleaning, conforming, and data delivery steps then pour the different data sources into the data warehouse (DW). Since the DW is developed using Kimball's bottom-up multidimensional approach, we identify three types of extraction activity on the source tables: merge, merge-union, and union. The cleaning and conforming steps are established by creating conformed dimensions based on data source analysis, refinement, and hierarchy structure. The final ETL step loads the data into integrated dimension and fact tables through the generation of surrogate keys. These processes run gradually over each distributed database source until everything is incorporated. The technical activities of this distributed-database ETL process can be adopted widely in other domains, provided the designer has advance knowledge of the structure and content of the data sources.
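As a rough illustration of the extraction, conforming, and surrogate-key steps outlined in this abstract, the sketch below unions the same table from two faculty databases, conforms it into a dimension, and generates surrogate keys for the fact load. It is a minimal Python/pandas example under assumed file, table, and column names (student_a.csv, nim, enrollment.csv), not the paper's actual implementation.

# Minimal sketch of the union-style extraction and surrogate-key loading
# described above. All file, table, and column names are hypothetical.
import pandas as pd

# Extract: union the same table from two distributed faculty databases.
src_a = pd.read_csv("student_a.csv")          # faculty A's student table
src_b = pd.read_csv("student_b.csv")          # faculty B's student table
students = pd.concat([src_a, src_b], ignore_index=True)

# Clean and conform: normalize text fields and drop duplicate natural keys.
students["name"] = students["name"].str.strip().str.title()
students = students.drop_duplicates(subset="nim")     # 'nim' = student number (assumed key)

# Deliver: generate a surrogate key for the conformed dimension table.
dim_student = students.reset_index(drop=True)
dim_student["student_sk"] = dim_student.index + 1     # simple sequential surrogate key

# A fact row then references the dimension through the surrogate key,
# looked up via the natural key during the fact-table load.
fact = pd.read_csv("enrollment.csv")                  # hypothetical fact source
fact = fact.merge(dim_student[["nim", "student_sk"]], on="nim", how="left")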


2019 ◽ Vol 90 ◽ pp. 102566 ◽ Author(s): Thaddeus Koehn ◽ Peter Athanas

Electronics ◽ 2019 ◽ Vol 8 (9) ◽ pp. 982 ◽ Author(s): Alberto Cascajo ◽ David E. Singh ◽ Jesus Carretero

This work presents an HPC framework that provides new strategies for resource management and job scheduling, based on executing different applications in shared compute nodes to maximize platform utilization. The framework includes a scalable monitoring tool that analyzes compute-node utilization across the platform. We also introduce an extension of CLARISSE, a middleware for data-staging coordination and control on large-scale HPC platforms, which uses the information provided by the monitor in combination with application-level analysis to detect performance degradation in the running applications. This degradation, caused by applications sharing compute nodes and competing for their resources, is avoided by means of dynamic application migration. A description of the architecture, together with a practical evaluation of the proposal, shows significant improvements of up to 20% in makespan and 10% in energy consumption compared with a non-optimized execution.
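The detect-degradation-then-migrate idea described in this abstract can be pictured with a small control loop: sample an application's progress rate, compare it against a baseline, and request a migration when the slowdown crosses a threshold. This is only an illustrative sketch; the job, monitor, and scheduler objects, the method names monitor.sample(), scheduler.find_underloaded_node(), scheduler.migrate(), and the 20% threshold are assumptions, not CLARISSE's or the framework's real API.

# Illustrative degradation-detection loop (assumed interfaces throughout).
import time

SLOWDOWN_THRESHOLD = 1.20   # assumed: act when the job runs >20% slower than baseline

def watch(job, monitor, scheduler, interval=10.0):
    baseline = monitor.sample(job)          # e.g. iterations/s measured without interference
    while job.running():
        time.sleep(interval)
        current = monitor.sample(job)
        # Degradation shows up as the ratio of baseline rate to current rate.
        if current > 0 and baseline / current > SLOWDOWN_THRESHOLD:
            target = scheduler.find_underloaded_node()
            if target is not None:
                scheduler.migrate(job, target)   # dynamic application migration
                baseline = monitor.sample(job)   # re-baseline after the move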


Author(s): Ardhian Agung Yulianto

While a data warehouse is designed to support the decision-making function, the most time-consuming part of building one is the Extract, Transform, Load (ETL) process. In the case of an academic data warehouse whose data sources are the faculties' distributed databases, integration is not straightforward even though the databases share a typical schema. This paper presents the detailed ETL process following the Data Flow Thread in the data staging area: identifying, profiling, and analyzing the content of all tables in the data sources, then cleaning, conforming the dimensions, and delivering the data to the data warehouse. These processes run gradually over each distributed database source until the data are merged. Dimension and fact tables are generated in a multidimensional model. The ETL tool is Pentaho Data Integration 6.1. ETL testing is done by comparing the data source with the data target, and DW testing is conducted by comparing data analyses between SQL queries and the Saiku Analytics plugin in the Pentaho Business Analytics Server.
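The source-versus-target ETL test mentioned here amounts to a reconciliation check. The sketch below compares row counts and per-column sums between a source table and the loaded warehouse table; it is an illustrative Python/sqlite3 stand-in (the paper performs the comparison with Pentaho and SQL queries), and the database, table, and column names are placeholders.

# Reconciliation-style ETL test: compare cheap fingerprints of source and target.
import sqlite3

def table_fingerprint(conn, table, columns):
    """Return (row_count, {column: sum}) as a cheap comparison fingerprint."""
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    rows = cur.fetchone()[0]
    sums = {}
    for col in columns:
        cur.execute(f"SELECT TOTAL({col}) FROM {table}")   # TOTAL() ignores NULLs
        sums[col] = cur.fetchone()[0]
    return rows, sums

source = sqlite3.connect("faculty_source.db")     # placeholder source database
target = sqlite3.connect("academic_dw.db")        # placeholder data warehouse

# The load is considered consistent if both fingerprints match.
assert table_fingerprint(source, "enrollment", ["credits"]) == \
       table_fingerprint(target, "fact_enrollment", ["credits"])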


2018 ◽ Vol 2 ◽ pp. e28093 ◽ Author(s): Lisa Palmer

How long does it take to digitize 11,000 film-based slides? Converting film to a raster graphic may take a relatively short time, but what is needed to prepare for the process, and once the images are digitized, what work is required to push the data out for public access? And how much does the entire conversion process cost? A case study of a rapid-capture digitization project at the Smithsonian Institution will be reviewed. In early 2016, the Smithsonian Institution National Museum of Natural History (NMNH) Division of Fishes acquired 10,559 film-based slides from world-renowned ichthyologist John (Jack) Randall. The first-generation slides contain images of the color patterns of hundreds of fish species, with locality information for each specimen written on the cardboard slide mount. When Jack began his photography in the 1960s, his images were at the forefront of color photography for fishes. He also collected specimens in remote island archipelagos in the Pacific and Indian Oceans, so many localities were, and continue to be, rare. The species represented on the slides are important to the scientific community, and the collection-event data written on the slide mounts make each image and its metadata an invaluable package of information. Upon receipt of Jack's significant donation, the Division of Fishes received multiple requests from ichthyologists for digital access to the slides. The Division of Fishes immediately implemented a plan to digitally capture the data. In many rapid-capture projects at the Smithsonian, the objects and specimens are digitized first and any associated data is transcribed at some later point. The Division approached this project differently in that the Randall collection was relatively small and Smithsonian staff, primarily interns, were available to transcribe data before image conversion. Post-production work included hiring two contractors to import images and associated metadata into NMNH's collections management system. This presentation will review our processes before, during, and after data conversion. Workflows include transcribing handwritten data, staging and digitizing film, importing data into the EMu client, and using redundancies to ensure data quality.

