Landmark and IBM get together on data management and storage

First Break ◽  
1997 ◽  
Vol 15 (7) ◽  
Author(s):  
Vincent Breton ◽  
Eddy Caron ◽  
Frederic Desprez ◽  
Gael Le Mahec

As grids become more and more attractive for solving complex problems with high computational and storage requirements, bioinformatics applications are starting to be ported to large-scale platforms. The BLAST kernel, one of the main cornerstones of high-performance genomics, was one of the first applications ported to such platforms. However, while a simple parallelization was enough for a first proof of concept, its use on production platforms required more optimized algorithms. In this chapter, we review existing parallelization and “gridification” approaches as well as related issues such as data management and replication, and present a case study using the DIET middleware over the Grid’5000 experimental platform.
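A common starting point for the "simple parallelization" mentioned above is to split the query set and run independent BLAST jobs on the fragments, merging the tabular outputs afterwards. The sketch below illustrates that idea only; the file names, the blastp binary, and the database name are assumptions, and it does not reflect the DIET/Grid'5000 deployment discussed in the chapter.

```python
# Minimal sketch of query-splitting BLAST parallelization (paths and binary are assumptions).
import subprocess
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def split_fasta(path, n_chunks):
    """Split a FASTA file into n_chunks groups of records (naive, in-memory)."""
    records = Path(path).read_text().split(">")[1:]           # drop leading empty piece
    chunks = [records[i::n_chunks] for i in range(n_chunks)]  # round-robin assignment
    return [">" + ">".join(c) for c in chunks if c]

def run_blast(chunk_fasta, db, out):
    """Run one BLAST job on a query chunk; 'blastp' and the db name are assumptions."""
    Path(out + ".fa").write_text(chunk_fasta)
    subprocess.run(
        ["blastp", "-query", out + ".fa", "-db", db, "-outfmt", "6", "-out", out],
        check=True,
    )
    return out

if __name__ == "__main__":
    chunks = split_fasta("queries.fa", n_chunks=4)
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(run_blast, chunks,
                                ["nr"] * len(chunks),
                                [f"part{i}.tsv" for i in range(len(chunks))]))
    # Concatenate the per-chunk tabular outputs into a single result file.
    Path("blast_all.tsv").write_text("".join(Path(r).read_text() for r in results))
```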


Electronics ◽  
2021 ◽  
Vol 10 (16) ◽  
pp. 1977
Author(s):  
Guangyu Zhu ◽  
Jaehyun Han ◽  
Sangjin Lee ◽  
Yongseok Son

The emergence of non-volatile memories (NVM) brings new opportunities and challenges to data management system design. As an important part of data management systems, several new file systems have been developed to take advantage of the characteristics of NVM. However, these NVM-aware file systems have usually been designed and evaluated using simulation or emulation. To explore the performance and characteristics of these file systems on real hardware, in this article we provide an empirical evaluation of NVM-aware file systems on the first commercially available byte-addressable NVM (i.e., the Intel Optane DC Persistent Memory Module (DCPMM)). First, to compare traditional and NVM-aware file systems, we evaluate the performance of Ext4, XFS, F2FS, Ext4-DAX, XFS-DAX, and NOVA on DCPMMs. To compare DCPMMs with other secondary storage devices, we also conduct the same evaluations on Optane SSDs and NAND-flash SSDs. Second, we observe how remote NUMA node access and device mapper striping affect the performance of DCPMMs. Finally, we evaluate the performance of a database (i.e., MySQL) on DCPMMs with the Ext4 and Ext4-DAX file systems. We summarize several observations from the evaluation results and performance analysis, and we anticipate that these observations will provide implications for various memory and storage systems.
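As a rough illustration of the kind of measurement such an evaluation involves, the sketch below times small synchronous writes on a given mount point so that, for example, an Ext4 mount on a NAND-flash SSD can be compared with an Ext4-DAX mount on a DCPMM. The mount points are placeholders, and the paper's actual methodology and benchmark suite are not reproduced here.

```python
# Hedged micro-benchmark sketch: synchronous small-write latency on a given mount point.
import os, time, statistics

def write_latency(mount_point, block_size=4096, iterations=1000):
    """Return (median, max) per-write latency in seconds for fsync'd writes on mount_point."""
    path = os.path.join(mount_point, "latency_probe.tmp")
    buf = os.urandom(block_size)
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o600)
    samples = []
    try:
        for i in range(iterations):
            t0 = time.perf_counter()
            os.pwrite(fd, buf, i * block_size)  # write at increasing offsets
            os.fsync(fd)                        # force persistence, like a sync-heavy workload
            samples.append(time.perf_counter() - t0)
    finally:
        os.close(fd)
        os.unlink(path)
    return statistics.median(samples), max(samples)

if __name__ == "__main__":
    # Placeholder mount points, e.g. Ext4 on a NAND SSD vs. Ext4-DAX on a DCPMM.
    for mnt in ("/mnt/ext4_ssd", "/mnt/ext4_dax"):
        med, worst = write_latency(mnt)
        print(f"{mnt}: median {med * 1e6:.1f} us, max {worst * 1e6:.1f} us")
```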


2016 ◽  
Vol 11 (1) ◽  
pp. 128-149 ◽  
Author(s):  
Christine L. Borgman ◽  
Milena S. Golshan ◽  
Ashley E. Sands ◽  
Jillian C. Wallis ◽  
Rebekah L. Cummings ◽  
...  

Scientists in all fields face challenges in managing and sustaining access to their research data. The larger and longer term the research project, the more likely scientists are to have resources and dedicated staff to manage their technology and data, leaving those whose work is based on smaller and shorter-term projects at a disadvantage. The volume and variety of data to be managed vary with many factors, only two of which are the number of collaborators and the length of the project. As part of an NSF project to conceptualize the Institute for Empowering Long Tail Research, we explored opportunities offered by Software as a Service (SaaS). These cloud-based services are popular in business because they reduce costs and labor for technology management, and they are gaining ground in scientific environments for similar reasons. We studied three settings where scientists conduct research in small and medium-sized laboratories. Two were NSF Science and Technology Centers (CENS and C-DEBI) and the third was a workshop of natural reserve scientists and managers. These laboratories have highly diverse data and practices, make minimal use of standards for data or metadata, and lack resources for data management or for sustaining access to their data, despite recognizing the need. We found that SaaS could address technical needs for basic document creation, analysis, and storage, but did not support the diverse and rapidly changing needs for sophisticated domain-specific tools and services. The latter are far more challenging knowledge infrastructure requirements, demanding long-term investments by multiple stakeholders.


2014 ◽  
Vol 9 (1) ◽  
pp. 220-230 ◽  
Author(s):  
David Minor ◽  
Matt Critchlow ◽  
Arwen Hutt ◽  
Declan Fleming ◽  
Mary Linn Bergstrom ◽  
...  

In the spring of 2011, the UC San Diego Research Cyberinfrastructure (RCI) Implementation Team invited researchers and research teams to participate in a research curation and data management pilot program. This invitation took the form of a campus-wide solicitation. More than two dozen applications were received and, after due deliberation, the RCI Oversight Committee selected five curation-intensive projects. These projects were chosen based on a number of criteria, including how well they represented campus research, the variety of topics, researcher engagement, and the range of services required. The pilot process began in September 2011 and will be completed in early 2014. Extensive lessons learned from the pilots are being compiled and used in the ongoing design and implementation of the permanent Research Data Curation Program in the UC San Diego Library. In this paper, we present specific implementation details of these services, as well as lessons learned. The program focused on many aspects of contemporary scholarship, including data creation and storage, description and metadata creation, citation and publication, and long-term preservation and access. Based on the lessons learned in our processes, the Research Data Curation Program will provide a suite of services from which campus users can pick and choose as necessary. The program will also provide support for the data management requirements of national funding agencies.


2000 ◽  
pp. 76-91
Author(s):  
David M. Coder

2018 ◽  
Vol 50 ◽  
pp. 02008
Author(s):  
Alberto Lázaro-López ◽  
María L. González-SanJosé ◽  
Vicente D. Gómez-Miguel

The methodology of the Integrated Zoning of the Terroir (ZIT) involves the generation and management of a large volume of sizeable datasets that are interrelated conceptually and spatially and that must be analyzed and stored. These actions are critical in the research process, so there is particular interest in managing the data effectively and efficiently. Databases are widely used because of their capabilities for data integrity, consistency, redundancy control and independence, which make them a way of optimizing resources; the main goal is therefore to develop a spatial database that facilitates the access, management and analysis of ZIT data. The result is TEZISdb, an acronym for Terroir Zoning Information Service Database, a multilocal and multiscalar model that collects thematic and spatial data on all environmental factors of the terroir, together with functions for their analysis.
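For illustration only, the sketch below shows one way a multilocal, multiscalar layout of zones and environmental observations could be expressed relationally. The table and column names are hypothetical, and geometries are kept as WKT text for brevity; a production ZIT database would rely on a proper spatial extension (e.g., PostGIS or SpatiaLite) rather than this simplification.

```python
# Illustrative sketch only: a tiny relational layout for multi-scale terroir observations.
# The real TEZISdb schema is not described here; names below are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE zone (
    zone_id   INTEGER PRIMARY KEY,
    scale     TEXT NOT NULL,   -- e.g. 'plot', 'sector', 'region' (multiscalar)
    locality  TEXT NOT NULL,   -- study area identifier (multilocal)
    geom_wkt  TEXT NOT NULL    -- polygon boundary as WKT, for illustration only
);
CREATE TABLE observation (
    obs_id    INTEGER PRIMARY KEY,
    zone_id   INTEGER NOT NULL REFERENCES zone(zone_id),
    factor    TEXT NOT NULL,   -- environmental factor: soil, climate, ...
    variable  TEXT NOT NULL,   -- e.g. 'pH', 'mean_temperature'
    value     REAL NOT NULL
);
""")

conn.execute("INSERT INTO zone VALUES (1, 'plot', 'area_A', 'POLYGON((0 0,1 0,1 1,0 1,0 0))')")
conn.execute("INSERT INTO observation VALUES (1, 1, 'soil', 'pH', 7.8)")

# Example analysis query: average of each variable per scale level.
for row in conn.execute("""
        SELECT z.scale, o.variable, AVG(o.value)
        FROM observation o JOIN zone z USING (zone_id)
        GROUP BY z.scale, o.variable"""):
    print(row)
```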


1980 ◽  
Vol 45 (3) ◽  
pp. 462-471 ◽  
Author(s):  
Sylvia W. Gaines ◽  
Warren M. Gaines

Rapidly evolving computer technology can play a significant role in all phases of future archaeological research. Technical trends are summarized, including hardware (data processing and storage), software (languages and data management), and data processing and entry. The utility and benefits of these new techniques are explored in relation to several current archaeological projects.


2020 ◽  
Vol 245 ◽  
pp. 04035
Author(s):  
Martin Barisits ◽  
Mikhail Borodin ◽  
Alessandro Di Girolamo ◽  
Johannes Elmsheuser ◽  
Dmitry Golubkov ◽  
...  

The ATLAS experiment at CERN’s LHC stores detector and simulation data in raw and derived data formats across more than 150 Grid sites world-wide, currently about 200 PB on disk and 250 PB on tape in total. Data have different access characteristics due to various computational workflows and can be accessed from different media, such as remote I/O, disk cache on hard disk drives, or SSDs. In addition, larger data centers provide the majority of offline storage capacity via tape systems. For the High-Luminosity LHC (HL-LHC), the estimated data storage requirements are several factors larger than the present forecast of available resources, based on a flat-budget assumption. On the computing side, ATLAS Distributed Computing has been very successful in recent years in integrating high-performance and high-throughput computing and in using opportunistic computing resources for Monte Carlo simulation. Equivalent opportunistic storage, on the other hand, does not exist. ATLAS started the Data Carousel project to increase the usage of less expensive storage, i.e. tape or even commercial storage, so it is not limited exclusively to tape technologies. Data Carousel orchestrates data processing between workload management, data management, and storage services, with the bulk data resident on offline storage. The processing is executed by staging and promptly processing a sliding window of inputs on faster buffer storage, such that only a small percentage of the input data is available at any one time. With this project, we aim to demonstrate that this is a natural way to dramatically reduce our storage cost. The first phase of the project started in the fall of 2018 and focused on I/O tests of the sites’ archiving systems. Phase II now requires tight integration of the workload and data management systems. Additionally, the Data Carousel project studies the feasibility of running multiple computing workflows from tape. The project is progressing very well, and the results presented in this document will be used before LHC Run 3.
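The sliding-window staging described above can be sketched as a loop that keeps only a bounded number of datasets on buffer storage at any time. The functions below are hypothetical placeholders, not the actual ATLAS workload and data management interfaces, and staging is serialized here for brevity, whereas a real carousel would overlap staging of the next inputs with processing of the current ones.

```python
# Conceptual sketch of sliding-window staging from tape to a limited disk buffer.
from collections import deque

def stage_from_tape(dataset):      # placeholder: request a tape-to-disk recall
    print(f"staging {dataset}")

def process(dataset):              # placeholder: run the workflow on buffered data
    print(f"processing {dataset}")

def release_from_buffer(dataset):  # placeholder: free buffer space after use
    print(f"releasing {dataset}")

def carousel(datasets, window=3):
    """Keep at most `window` datasets on buffer storage at any time."""
    pending = deque(datasets)
    staged = deque()
    # Pre-stage the first window of inputs.
    while pending and len(staged) < window:
        d = pending.popleft()
        stage_from_tape(d)
        staged.append(d)
    # Process the oldest staged dataset, release it, then stage the next one.
    while staged:
        current = staged.popleft()
        process(current)
        release_from_buffer(current)
        if pending:
            nxt = pending.popleft()
            stage_from_tape(nxt)
            staged.append(nxt)

if __name__ == "__main__":
    carousel([f"dataset_{i:03d}" for i in range(10)], window=3)
```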

