Data Archiving Strategies in Data Lakes

Author(s):  
Saurabh Gupta ◽  
Venkata Giri
1990 ◽  
Author(s):  
J. Green ◽
K. Klenk ◽
L. Treinish

1985 ◽  
Author(s):  
G.R. Crane ◽  
W.H. Lee ◽  
M. E. O'Neill

2018 ◽  
Vol 4 (1) ◽  
pp. 87-96
Author(s):  
Yanni Suherman

Research conducted at the Office of Archives and Library of Padang Pariaman Regency aims to examine its library data-processing and data-archiving system. All data processing is still done manually using written documents, and archives pile up at the office. Applying a library and archives information system at the Office of Archives and Library of Padang Pariaman Regency can improve the quality of a service that is currently not optimal. This research used the waterfall model of the System Development Life Cycle (SDLC). The first step of this method was to go directly to the field and conduct interviews and discussions. The resulting information system will assist officers with library data processing and make it easier to search archived data by producing more accurate, effective and efficient reports.


2016 ◽  
Vol 112 (7/8) ◽
Author(s):  
Margaret M. Koopman ◽  
Karin de Jager

Abstract. Digital data archiving and research data management have become increasingly important for institutions in South Africa, particularly since the National Research Foundation, one of the principal South African academic research funders, recommended these practices for the research it funds. A case study undertaken during the second half of 2014 among biological sciences researchers at a South African university explored the state of data management and archiving at the institution and the readiness of researchers to share their digital research data through repositories. It found that while some researchers were already archiving digital data in repositories, neither the researchers nor the university had implemented systematic research data management.


2008 ◽  
Vol 26 (2) ◽  
pp. 345-351 ◽  
Author(s):  
V. Romano ◽  
S. Pau ◽  
M. Pezzopane ◽  
E. Zuccheretti ◽  
B. Zolesi ◽  
...  

Abstract. The eSWua project is based on measurements performed by all the instruments installed by the upper atmosphere physics group of the Istituto Nazionale di Geofisica e Vulcanologia, Italy, and on all the related studies. The aim is to build a hardware-software system that standardizes historical and real-time observations from different instruments. An interactive Web site, supported by a well-organized database, can be a powerful tool for the scientific and technological community in the fields of telecommunications and space weather. The most common and useful database type for our purposes is the relational one, in which data are organized in tables; it allows petabyte-scale data archiving and complete flexibility in data retrieval. The project started in June 2005 and will last until August 2007. In the first phase, the major effort has focused on the design of the hardware and database architecture. The first two databases, for the DPS4 digisonde and GISTM measurements, are complete as regards population, testing and output procedures. Details of the structure and Web tools of these two databases are presented, together with a general description of the project and its technical features.
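As a minimal illustration of the relational approach described above, the Python sketch below uses the standard-library sqlite3 module to store observations from different instruments in one standardized table and retrieve them by time range. The table name observation, its columns and the helper functions open_archive, ingest and query are hypothetical; the abstract does not give the eSWua schema, and SQLite is chosen here only because it needs no server, not because the project used it.

```python
import sqlite3

# Hypothetical schema: one row per measured value, so observations from
# different instruments share a single standardized table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS observation (
    instrument TEXT NOT NULL,  -- instrument label, e.g. 'DPS4' or 'GISTM'
    obs_time   TEXT NOT NULL,  -- ISO 8601 UTC timestamp, same format everywhere
    parameter  TEXT NOT NULL,  -- name of the measured quantity
    value      REAL,
    PRIMARY KEY (instrument, obs_time, parameter)
)
"""


def open_archive(path=":memory:"):
    """Open (or create) the archive database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def ingest(conn, instrument, rows):
    """Insert (obs_time, parameter, value) tuples for one instrument.

    INSERT OR REPLACE means re-ingesting a record with the same key
    replaces the old row instead of failing.
    """
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO observation VALUES (?, ?, ?, ?)",
            [(instrument, t, p, v) for t, p, v in rows],
        )


def query(conn, start, end, instrument=None):
    """Return rows with start <= obs_time < end, optionally for one instrument.

    String comparison is valid because all timestamps use one ISO 8601 format.
    """
    sql = ("SELECT instrument, obs_time, parameter, value FROM observation "
           "WHERE obs_time >= ? AND obs_time < ?")
    args = [start, end]
    if instrument is not None:
        sql += " AND instrument = ?"
        args.append(instrument)
    sql += " ORDER BY obs_time, instrument"
    return conn.execute(sql, args).fetchall()
```

Because every instrument writes to the same table, a single time-range query returns historical and real-time observations together, which is the kind of retrieval flexibility the abstract attributes to the relational model.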


Author(s):  
Holger Frick ◽  
Pia Stieger ◽  
Christoph Scheidegger

More than 60 million specimens are housed in geological and biological collections in numerous museums and botanical gardens across Switzerland. They are of national and international origin. Taken together, they form an entity of high scientific value that is internationally recognized for its contribution to scientific research. Because of Switzerland's federal organisation, natural history collections are located and curated in numerous institutions. So far, no common strategy for digitisation, documentation and long-term data archiving has been developed. This shortcoming has been widely recognized by the parties concerned. Under the lead of the Swiss Academy of Sciences, several organisations have assembled information about Swiss natural history collections. They identified measures to promote the scientific and educational potential of natural history collections in Switzerland (Beer et al. 2019). Through a national initiative, the Swiss Natural History Collections Network (SwissCollNet) aims to unite Swiss natural history collections under a common vision and a common strategy. The goal is to promote the collections themselves and to harness their scientific and educational potential for research and training. SwissCollNet consists of representatives of research, teaching, museums and botanical gardens, the data centers for information on the national fauna and flora, the Swiss Systematics Society and the Swiss node of GBIF, the Global Biodiversity Information Facility. The initiative aims to foster research on natural history collections. It will provide a single decentralised data infrastructure framework for Swiss research related to natural history. It will help harmonise nationwide collection data management, digitisation and long-term data archiving. It will also facilitate the identification of specimens and the revision of taxonomic groups.
New research techniques, fast-evolving computer technologies and internet connectivity create new opportunities for deciphering and using the wealth of information housed in Swiss and international collections. An agreed strategy and research priorities on a national scale will allow smooth and lasting collaboration across all Swiss natural history collections by promoting interoperability and unified access to the collections, and will create opportunities for scientific collaboration and innovation. This national approach will create an internationally compatible research data infrastructure while respecting and integrating regional and decentralized conditions and requirements. It will thus maximize the impact for science, policy and society.


2008 ◽  
Vol 3 (1) ◽  
pp. 29-43 ◽  
Author(s):  
Eugène Dürr ◽  
Kees Van der Meer ◽  
Wim Luxemburg ◽  
Ronald Dekker

The purpose of the DareLux (Data Archiving River Environment Luxembourg) project was to preserve unique and irreplaceable datasets; we chose hydrological data that will be needed for future climate models. The results are an operational archive built with XML containers, the OAI-PMH protocol and an architecture based on web services. The major conclusions are that quality control on ingest is important, that digital rights management demands attention, and that the costs of ingest and retrieval should not be underestimated. We propose a new paradigm for information retrieval for this type of dataset, and we recommend research into visualisation tools for searching and retrieving such datasets.
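Two mechanisms the DareLux abstract names, OAI-PMH harvesting and quality control on ingest, can be sketched with the Python standard library. list_records_url builds an OAI-PMH ListRecords request; the verb, metadataPrefix, from, until and resumptionToken arguments come from the OAI-PMH 2.0 specification, and oai_dc is the Dublin Core format every OAI-PMH repository must support. check_on_ingest is a hypothetical quality gate: the container layout it checks (a root element with station, variable and unit children, plus obs elements carrying time and value attributes) is an assumption for illustration, since the abstract does not describe the DareLux XML schema. The base URL used in testing is a placeholder.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                     until=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    resumptionToken is an exclusive argument in OAI-PMH: when it is given,
    only the verb may accompany it.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
        if until:
            params["until"] = until
    return base_url + "?" + urlencode(params)


# Hypothetical container layout; the real DareLux schema is not given.
REQUIRED = ("station", "variable", "unit")


def check_on_ingest(xml_text):
    """Return a list of problems; an empty list means the container may be ingested."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    # Required descriptive elements must exist and be non-empty.
    problems = [f"missing <{tag}>" for tag in REQUIRED
                if not (root.findtext(tag) or "").strip()]
    # Every observation needs a timestamp and a numeric value.
    for i, obs in enumerate(root.iter("obs")):
        try:
            float(obs.get("value", ""))
        except ValueError:
            problems.append(f"obs {i}: non-numeric value {obs.get('value')!r}")
        if not obs.get("time"):
            problems.append(f"obs {i}: missing time attribute")
    return problems
```

A harvester would fetch the first URL, process the returned records, and repeat with each resumptionToken until the repository stops returning one; an archive following the abstract's advice would accept a container only when check_on_ingest returns an empty list.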


Author(s):  
Y. Xu ◽  
L. P. Xin ◽  
X. H. Han ◽  
H. B. Cai ◽  
L. Huang ◽  
...  

GWAC is designed to have an integrated field of view (FOV) of 5,000 square degrees, of which 1,800 square degrees have already been built. The limiting magnitude of a 10-second exposure on a moonless night is 16 mag in the R band. Each observing night, GWAC produces about 0.7 TB of raw data, and the data processing pipeline generates millions of single-frame alerts. We describe the GWAC Data Processing and Management System (GPMS), including its hardware architecture, database, detection-filtering-validation of transient candidates, data archiving, and user interfaces for checking transients and monitoring the system. GPMS combines general-purpose technology and software from astronomy and computing, and uses advanced techniques such as deep learning. Practical results show that GPMS fully meets the scientific data processing requirements of GWAC. It can detect, filter and validate millions of transient candidates online and feed the final results back to astronomers in real time. During observations from October 2018 to December 2019, we found 102 transients.
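The detection-filtering-validation chain can be illustrated with a deliberately simple Python sketch. It is not the GPMS algorithm, which the abstract does not detail and which also uses deep learning; the function names, the 0.01-degree matching radius and the two-frame requirement are assumptions. Single-frame alerts that coincide with a catalogued source are filtered out, the remaining alerts are greedily grouped by position, and a group is kept as a candidate only if it was detected in at least min_frames distinct frames.

```python
import math


def sep_deg(ra1, dec1, ra2, dec2):
    """Angular separation in degrees (small-angle approximation).

    Adequate for the arcsecond-to-arcminute radii used in matching;
    RA differences are wrapped so that 359.999 and 0.001 are close.
    """
    dra = (ra1 - ra2 + 180.0) % 360.0 - 180.0
    dra *= math.cos(math.radians((dec1 + dec2) / 2.0))
    return math.hypot(dra, dec1 - dec2)


def filter_and_validate(alerts, known_sources, radius_deg=0.01, min_frames=2):
    """Reduce single-frame alerts to validated transient candidates.

    alerts: iterable of (frame_id, ra_deg, dec_deg) tuples.
    known_sources: list of (ra_deg, dec_deg) for catalogued objects.
    Returns a list of (ra_deg, dec_deg, n_frames) for surviving candidates.
    """
    clusters = []  # each entry: [ra, dec, set of frame ids]
    for frame_id, ra, dec in alerts:
        # Filtering: drop alerts that coincide with a known source.
        if any(sep_deg(ra, dec, kra, kdec) < radius_deg
               for kra, kdec in known_sources):
            continue
        # Grouping: attach the alert to the first cluster within the radius.
        for cluster in clusters:
            if sep_deg(ra, dec, cluster[0], cluster[1]) < radius_deg:
                cluster[2].add(frame_id)
                break
        else:
            clusters.append([ra, dec, {frame_id}])
    # Validation: keep only positions seen in enough distinct frames.
    return [(c[0], c[1], len(c[2])) for c in clusters if len(c[2]) >= min_frames]
```

The greedy grouping compares every alert with every cluster and uses a small-angle separation formula, which keeps the sketch short; a system handling millions of alerts per night would need a spatial index instead.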


2015 ◽  
Author(s):  
Peter Weiland ◽  
Ina Dehnhard

The benefits of making research data permanently accessible through data archives are widely recognized: costs can be reduced by reusing existing data, research results can be compared with and validated against results from archived studies, fraud can be detected more easily, and meta-analyses can be conducted. In addition, authors may gain recognition and reputation for producing the datasets. Since 2003, the accredited research data center PsychData (part of the Leibniz Institute for Psychology Information in Trier, Germany) has documented and archived research data from all areas of psychology and related fields. Initially, the main focus was on datasets with a high potential for reuse, e.g. longitudinal studies, large-scale cross-sectional studies, or studies conducted under historically unique conditions. Now, more and more journal publishers and funding agencies require researchers to archive their data and make them accessible to the scientific community, so PsychData also has to serve this need.
In this presentation we report on our experiences in operating a discipline-specific research data archive in a domain where data sharing meets considerable resistance. We will focus on the challenges for data sharing and data reuse in psychology, e.g.:
- the large amount of domain-specific knowledge needed for data curation
- high costs of documenting data because of the wide range of non-standardized measures
- small teams and little established infrastructure compared with the "big data" disciplines
- psychology studies that are not designed for reuse (in contrast to the social sciences)
- data protection
- resistance to sharing data
At the end of the presentation, we will give a brief outlook on DataWiz, a new project funded by the German Research Foundation (DFG), in which tools will be developed to help researchers document their data during the research phase.

