Database management in scientific computing - II. Data structures and program architecture

1981, Vol 13 (1), pp. 50
Author(s): Michael McGee, Robert Medina, Neil Moore, Jason Stavrinaky

OpenMemDB is an in-memory database implemented solely with wait-free data structures; to date, it is the first and only database developed in this way. OpenMemDB provides linearizable correctness guarantees for all operations executed on the database. It uses a form of snapshot isolation to ensure linearizability, and it avoids the write-skew anomaly that snapshot isolation can introduce by eliminating writes that are based on out-of-date data. OpenMemDB's biggest contribution is its completely wait-free implementation: every operation executed in OpenMemDB is guaranteed to be wait-free and linearizable. The implementation also scales competitively against similar in-memory database management systems. OpenMemDB achieves its best scaling on select-heavy workloads, with a nearly 12x speedup at 16 threads, better scaling than either VoltDB or MemSQL showed in our testing.
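To make the versioning idea concrete, here is a minimal, illustrative Python sketch; the class and method names are hypothetical, and this is not OpenMemDB's wait-free implementation. It only shows the snapshot-isolation pattern the abstract describes: readers reach an immutable snapshot through a single reference load and never block, while a (deliberately strict) first-committer-wins rule rejects any commit whose reads came from an out-of-date snapshot, which rules out write skew.

```python
import threading

class SnapshotStore:
    """Illustrative sketch of snapshot-isolated reads over immutable versions.

    Hypothetical names; shows the versioning idea only, not OpenMemDB's
    actual wait-free data structures."""

    def __init__(self):
        self._version = 0
        self._snapshot = {}                  # immutable once published
        self._write_lock = threading.Lock()  # writers coordinate; readers never take it

    def read(self, key):
        # A reader grabs the current snapshot with a single reference load,
        # so it can never be blocked by a concurrent writer.
        snap = self._snapshot
        return snap.get(key)

    def begin(self):
        # Remember which version this transaction reads from.
        return self._version, self._snapshot

    def commit(self, read_version, updates):
        """First-committer-wins: refuse commits whose reads are stale,
        which eliminates writes based on out-of-date data (write skew)."""
        with self._write_lock:
            if read_version != self._version:
                return False                 # snapshot was out of date -> abort
            new_snap = dict(self._snapshot)  # copy-on-write
            new_snap.update(updates)
            self._snapshot = new_snap        # atomic publish of the new version
            self._version += 1
            return True

store = SnapshotStore()
version, _ = store.begin()
assert store.commit(version, {"x": 1})       # succeeds: snapshot still current
assert store.read("x") == 1
```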


2021
Author(s): Mirko Mälicke

Abstract. Geostatistical methods are widely used across almost all geoscientific disciplines, e.g. for interpolation, re-scaling, data assimilation, or modelling. At its core, geostatistics aims to detect, quantify, describe, analyze, and model the spatial covariance of observations. The variogram, a tool that describes this spatial covariance in a formalized way, is at the heart of every such method. Unfortunately, many applications of geostatistics focus on the interpolation method or the result rather than on the quality of the estimated variogram, not least because estimating a variogram is commonly left to the computer, and some software implementations do not even show the variogram to the user. This is a shortcoming, because the quality of the variogram largely determines whether the application of geostatistics makes sense at all. Furthermore, until a few years ago the Python programming language lacked a mature, well-established, and tested package for variogram estimation. Here I present SciKit-GStat, an open-source Python package for variogram estimation that fits well into established frameworks for scientific computing and puts the focus on the variogram before more sophisticated methods are applied. SciKit-GStat is written in a mutable, object-oriented way that mimics the typical geostatistical analysis workflow. Its main strengths are ease of use and interactivity, so it is usable with little or even no knowledge of Python. Over the last few years, other Python libraries covering geostatistics have developed alongside SciKit-GStat; today, the most important ones can be interfaced from SciKit-GStat. Additionally, established data structures for scientific computing are reused internally, so the user does not have to learn complex data models just to use SciKit-GStat. Common data structures and powerful interfaces let the user combine SciKit-GStat with other packages in established workflows, rather than forcing the user to adopt the author's programming paradigms. SciKit-GStat ships with a large number of predefined procedures, algorithms, and models, such as variogram estimators, theoretical spatial models, and binning algorithms. Common approaches to estimating variograms are covered and can be used out of the box. At the same time, the base class is flexible enough to be adjusted to less common problems as well. Last but not least, care was taken to aid users in implementing new procedures, or even extending the core functionality, so that SciKit-GStat can be extended to use cases not yet covered. With broad documentation, a user guide, tutorials, and good unit-test coverage, SciKit-GStat enables the user to focus on variogram estimation rather than implementation details.
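As a usage illustration, the following minimal example estimates an empirical variogram and fits a theoretical model to synthetic data with the package's documented Variogram class; the data and all parameter choices here are invented for demonstration, not taken from the paper.

```python
import numpy as np
from skgstat import Variogram

# Synthetic sample: 150 random 2-D coordinates and a smooth field plus noise
rng = np.random.default_rng(42)
coords = rng.uniform(0, 100, size=(150, 2))
values = np.sin(coords[:, 0] / 20.0) + rng.normal(0, 0.1, size=150)

# Estimate the empirical variogram and fit a spherical model to it
V = Variogram(coords, values, estimator='matheron',
              model='spherical', n_lags=15)

print(V.describe())   # fitted parameters, e.g. effective range, sill, nugget
V.plot()              # empirical bins together with the fitted model
```

This mirrors the workflow the abstract emphasizes: the variogram is inspected and judged first, before any interpolation or kriging is attempted on top of it.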


1991
Author(s): J.P. Church, J.C. Roberts, R.N. Sims, A.O. Smetana, B.W. Westmoreland

Author(s): Richard C. Millham

Legacy systems, from a data-centric view, can be defined as old, business-critical, standalone systems that have been built around legacy databases, such as IMS or CODASYL, or legacy database management systems, such as ISAM (Brodie & Stonebraker, 1995). Because of the huge scope of legacy systems in the business world (it is estimated that there are 100 billion lines of COBOL code alone in legacy business systems; Bianchi, 2000), data reengineering of legacy systems and their data, along with the related step of program reengineering, constitutes a significant part of the software reengineering market. Data reengineering of legacy systems consists of two steps: the first recognizes the data structures and their semantics; in the second, the data are converted to the new or converted system. Usually, the second step involves substantial changes not only to the data structures but also to the data values of the legacy data themselves (Aebi & Largo, 1994).
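The two steps can be made concrete with a small sketch. The following Python fragment is purely illustrative (the record layout, field names, and value mappings are invented, not taken from any cited system): step one applies a recovered structure and semantics to a fixed-width, COBOL-style record, and step two converts both the structure and the data values for a target system.

```python
from datetime import datetime

# Step 1: the recovered record structure -- field name, start column, width.
# This layout is hypothetical, standing in for what structure recovery yields.
LAYOUT = [
    ("cust_id",    0,  6),   # zero-padded numeric key
    ("name",       6, 20),   # space-padded text
    ("join_date", 26,  6),   # legacy YYMMDD date
    ("status",    32,  1),   # legacy code: 'A' active, 'I' inactive
]

def parse_record(line):
    """Slice one fixed-width legacy line into named raw fields."""
    return {name: line[start:start + width].strip()
            for name, start, width in LAYOUT}

def convert_record(raw):
    """Step 2: convert data values as well as structure for the new system."""
    return {
        "customer_id": int(raw["cust_id"]),
        "name": raw["name"].title(),
        # Reinterpret the two-digit year and emit an ISO 8601 date
        "joined": datetime.strptime(raw["join_date"], "%y%m%d").date().isoformat(),
        "active": raw["status"] == "A",
    }

legacy_line = "000042" + "SMITH, JOHN".ljust(20) + "960115" + "A"
print(convert_record(parse_record(legacy_line)))
# {'customer_id': 42, 'name': 'Smith, John', 'joined': '1996-01-15', 'active': True}
```

Note how the conversion changes values (date encoding, status flags, casing), not just structure, which is exactly the second-step effort Aebi and Largo (1994) point to.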

