Database management in scientific computing - II. Data structures and program architecture

1981, Vol 13 (1), pp. 50
Author(s): Michael McGee, Robert Medina, Neil Moore, Jason Stavrinaky

OpenMemDB is an in-memory database implemented solely with wait-free data structures; to date, it is the first and only database developed in this way. OpenMemDB provides linearizable correctness guarantees for all operations executed on the database. It uses a form of snapshot isolation to ensure linearizability, and it avoids the write-skew anomaly that snapshot isolation can introduce by eliminating writes that are based on out-of-date data. OpenMemDB's biggest contribution is its completely wait-free implementation: every operation executed in OpenMemDB is guaranteed to be wait-free and linearizable. The implementation also scales competitively against similar in-memory database management systems. OpenMemDB achieves its best scaling on select-heavy workloads, with a nearly 12x speedup at 16 threads, better scaling than either VoltDB or MemSQL showed in our testing.
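To make the versioning idea concrete, here is a minimal, illustrative Python sketch; the class and method names are hypothetical, and this is not OpenMemDB's wait-free implementation. It only shows the snapshot-isolation pattern the abstract describes: readers reach an immutable snapshot through a single reference load and never block, while a (deliberately strict) first-committer-wins rule rejects any commit whose reads came from an out-of-date snapshot, which rules out write skew.

```python
import threading

class SnapshotStore:
    """Illustrative sketch of snapshot-isolated reads over immutable versions.

    Hypothetical names; shows the versioning idea only, not OpenMemDB's
    actual wait-free data structures."""

    def __init__(self):
        self._version = 0
        self._snapshot = {}                  # immutable once published
        self._write_lock = threading.Lock()  # writers coordinate; readers never take it

    def read(self, key):
        # A reader grabs the current snapshot with a single reference load,
        # so it can never be blocked by a concurrent writer.
        snap = self._snapshot
        return snap.get(key)

    def begin(self):
        # Remember which version this transaction reads from.
        return self._version, self._snapshot

    def commit(self, read_version, updates):
        """First-committer-wins: refuse commits whose reads are stale,
        which eliminates writes based on out-of-date data (write skew)."""
        with self._write_lock:
            if read_version != self._version:
                return False                 # snapshot was out of date -> abort
            new_snap = dict(self._snapshot)  # copy-on-write
            new_snap.update(updates)
            self._snapshot = new_snap        # atomic publish of the new version
            self._version += 1
            return True

store = SnapshotStore()
version, _ = store.begin()
assert store.commit(version, {"x": 1})       # succeeds: snapshot still current
assert store.read("x") == 1
```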


2021
Author(s): Mirko Mälicke

Abstract. Geostatistical methods are widely used across almost all geoscientific disciplines, e.g. for interpolation, re-scaling, data assimilation, or modelling. At its core, geostatistics aims to detect, quantify, describe, analyze, and model the spatial covariance of observations. The variogram, a tool that describes this spatial covariance in a formalized way, is at the heart of every such method. Unfortunately, many applications of geostatistics focus on the interpolation method or the result rather than on the quality of the estimated variogram, not least because estimating a variogram is commonly left to the computer, and some software implementations do not even show the variogram to the user. This is a shortcoming, because the quality of the variogram largely determines whether the application of geostatistics makes sense at all. Furthermore, until a few years ago the Python programming language lacked a mature, well-established, and tested package for variogram estimation. Here I present SciKit-GStat, an open-source Python package for variogram estimation that fits well into established frameworks for scientific computing and puts the focus on the variogram before more sophisticated methods are applied. SciKit-GStat is written in a mutable, object-oriented way that mimics the typical geostatistical analysis workflow. Its main strengths are ease of use and interactivity, so it is usable with little or even no knowledge of Python. Over the last few years, other Python libraries covering geostatistics have developed alongside SciKit-GStat; today, the most important ones can be interfaced from SciKit-GStat. Additionally, established data structures for scientific computing are reused internally, so the user does not have to learn complex data models just to use SciKit-GStat. Common data structures and powerful interfaces let the user combine SciKit-GStat with other packages in established workflows, rather than forcing the user to adopt the author's programming paradigms. SciKit-GStat ships with a large number of predefined procedures, algorithms, and models, such as variogram estimators, theoretical spatial models, and binning algorithms. Common approaches to estimating variograms are covered and can be used out of the box. At the same time, the base class is flexible enough to be adjusted to less common problems as well. Last but not least, care was taken to aid users in implementing new procedures, or even extending the core functionality, so that SciKit-GStat can be extended to use cases not yet covered. With broad documentation, a user guide, tutorials, and good unit-test coverage, SciKit-GStat enables the user to focus on variogram estimation rather than implementation details.
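As a usage illustration, the following minimal example estimates an empirical variogram and fits a theoretical model to synthetic data with the package's documented Variogram class; the data and all parameter choices here are invented for demonstration, not taken from the paper.

```python
import numpy as np
from skgstat import Variogram

# Synthetic sample: 150 random 2-D coordinates and a smooth field plus noise
rng = np.random.default_rng(42)
coords = rng.uniform(0, 100, size=(150, 2))
values = np.sin(coords[:, 0] / 20.0) + rng.normal(0, 0.1, size=150)

# Estimate the empirical variogram and fit a spherical model to it
V = Variogram(coords, values, estimator='matheron',
              model='spherical', n_lags=15)

print(V.describe())   # fitted parameters, e.g. effective range, sill, nugget
V.plot()              # empirical bins together with the fitted model
```

This mirrors the workflow the abstract emphasizes: the variogram is inspected and judged first, before any interpolation or kriging is attempted on top of it.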


1991
Author(s): J.P. Church, J.C. Roberts, R.N. Sims, A.O. Smetana, B.W. Westmoreland

Author(s): Richard C. Millham

Legacy systems, from a data-centric view, can be defined as old, business-critical, standalone systems that have been built around legacy databases, such as IMS or CODASYL, or legacy database management systems, such as ISAM (Brodie & Stonebraker, 1995). Because of the huge scope of legacy systems in the business world (it is estimated that there are 100 billion lines of COBOL code alone in legacy business systems; Bianchi, 2000), data reengineering of legacy systems and their data, along with the related step of program reengineering, constitutes a significant part of the software reengineering market. Data reengineering of legacy systems consists of two steps: the first recognizes the data structures and their semantics; in the second, the data are converted to the new or converted system. Usually, the second step involves substantial changes not only to the data structures but also to the data values of the legacy data themselves (Aebi & Largo, 1994).
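The two steps can be made concrete with a small sketch. The following Python fragment is purely illustrative (the record layout, field names, and value mappings are invented, not taken from any cited system): step one applies a recovered structure and semantics to a fixed-width, COBOL-style record, and step two converts both the structure and the data values for a target system.

```python
from datetime import datetime

# Step 1: the recovered record structure -- field name, start column, width.
# This layout is hypothetical, standing in for what structure recovery yields.
LAYOUT = [
    ("cust_id",    0,  6),   # zero-padded numeric key
    ("name",       6, 20),   # space-padded text
    ("join_date", 26,  6),   # legacy YYMMDD date
    ("status",    32,  1),   # legacy code: 'A' active, 'I' inactive
]

def parse_record(line):
    """Slice one fixed-width legacy line into named raw fields."""
    return {name: line[start:start + width].strip()
            for name, start, width in LAYOUT}

def convert_record(raw):
    """Step 2: convert data values as well as structure for the new system."""
    return {
        "customer_id": int(raw["cust_id"]),
        "name": raw["name"].title(),
        # Reinterpret the two-digit year and emit an ISO 8601 date
        "joined": datetime.strptime(raw["join_date"], "%y%m%d").date().isoformat(),
        "active": raw["status"] == "A",
    }

legacy_line = "000042" + "SMITH, JOHN".ljust(20) + "960115" + "A"
print(convert_record(parse_record(legacy_line)))
# {'customer_id': 42, 'name': 'Smith, John', 'joined': '1996-01-15', 'active': True}
```

Note how the conversion changes values (date encoding, status flags, casing), not just structure, which is exactly the second-step effort Aebi and Largo (1994) point to.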

