Array Databases

Author(s):  
David Haynes ◽  
2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Peter Baumann ◽  
Dimitar Misev ◽  
Vlad Merticariu ◽  
Bang Pham Huu

Abstract: Multi-dimensional arrays (also known as raster data or gridded data) play a key role in many, if not all, science and engineering domains, where they typically represent spatio-temporal sensor, image, simulation output, or statistics “datacubes”. As classic database technology does not support arrays adequately, such data today are maintained mostly in silo solutions, with architectures that tend to erode and fail to keep up with the increasing requirements on performance and service quality. Array Database systems attempt to close this gap by providing declarative query support for flexible ad-hoc analytics on large n-D arrays, similar to what SQL offers on set-oriented data, XQuery on hierarchical data, and SPARQL and Cypher on graph data. Today, Petascale Array Database installations exist, employing massive parallelism and distributed processing. Hence, questions arise about the technology and standards available, usability, and overall maturity. Several papers have compared models and formalisms, and benchmarks have been undertaken as well, typically comparing two systems against each other. While each of these represents valuable research, to the best of our knowledge there is no comprehensive survey combining model, query language, architecture, practical usability, and performance aspects. The size of this comparison also differentiates our study: 19 systems are compared and four are benchmarked, to an extent and depth clearly exceeding previous papers in the field; for example, subsetting tests were designed so that systems cannot be tuned specifically to these queries. It is hoped that this gives a representative overview to all who want to immerse themselves in the field, as well as clear guidance to those who need to choose the best-suited datacube tool for their application. This article presents results of the Research Data Alliance (RDA) Array Database Assessment Working Group (ADA:WG), a subgroup of the Big Data Interest Group.
It has elicited the state of the art in Array Databases, technically supported by IEEE GRSS and CODATA Germany, to answer the question: how can data scientists and engineers benefit from Array Database technology? As it turns out, Array Databases can offer significant advantages in terms of flexibility, functionality, extensibility, as well as performance and scalability. In total, the database approach of offering analysis-ready “datacubes” heralds a new level of service quality. Investigation shows that there is a lively ecosystem of technology with increasing uptake, and proven array analytics standards are in place. Consequently, such approaches have to be considered a serious option for datacube services in science, engineering and beyond. Tools, though, vary greatly in functionality and performance.


2013 ◽  
Vol 2 (4) ◽  
pp. 33-46 ◽  
Author(s):  
P. K. Nizar Banu ◽  
H. Hannah Inbarani

As microarray databases increase in dimension and complexity, identifying the most informative genes becomes a challenging task. The difficulty is often related to the huge number of genes combined with very few samples. Research in medical data mining addresses this problem by applying techniques from data mining and machine learning to microarray datasets. This paper proposes Unsupervised Tolerance Rough Set based Quick Reduct (U-TRS-QR), a diverse feature selection algorithm that extends the existing equivalent rough sets for unsupervised learning. Genes selected by the proposed method lead to considerably improved class predictions in extensive experiments on two gene expression datasets: Brain Tumor and Colon Cancer. The results indicate consistent improvement across 12 classifiers.


2007 ◽  
Vol 17 (1) ◽  
pp. 151-168 ◽  
Author(s):  
Roberto Cornacchia ◽  
Sándor Héman ◽  
Marcin Zukowski ◽  
Arjen P. de Vries ◽  
Peter Boncz

2020 ◽  
Author(s):  
Peter Baumann

Datacubes form an accepted cornerstone for analysis-ready (and visualization-ready) spatio-temporal data offerings. Beyond the multi-dimensional data structure, the paradigm also suggests rich services that abstract away from the intractable zillions of files and products: actionable datacubes, as established by Array Databases, enable users to ask "any query, any time" without programming. The principle of location-transparent federations establishes a single, coherent information space.
The EarthServer federation is a large, growing data center network offering petabytes of data of critical variety, such as radar and optical satellite data, atmospheric data, elevation data, and thematic cubes like global sea ice. Around CODE-DE and the DIASs, an ecosystem of data has been established that is available to users as a single pool, in particular for efficient distributed data fusion irrespective of data location.
In our talk we present the technology, services, and governance of this unique intercontinental line-up of data centers. A live demo will show distributed datacube fusion.


2004 ◽  
Vol 20 (4) ◽  
pp. 507-517 ◽  
Author(s):  
T. V. Karpinets ◽  
B. D. Foy ◽  
J. M. Frazier
