Journal of Information and Data Management

A Natural Language Interface to Databases (NLIDB) is a database interface that translates a question asked in natural language into a structured query. Aggregation questions express aggregation functions, such as count, sum, average, minimum, and maximum, and optionally a GROUP BY clause and a HAVING clause. NLIDBs deliver good results for standard questions but usually do not handle aggregation questions. The main contribution of this article is a generic module, called GLAMORISE (GeneraL Aggregation MOdule using a RelatIonal databaSE), that extends NLIDBs to cope with aggregation questions. GLAMORISE covers aggregations with ambiguities, timescale differences, aggregations over multiple attributes, the use of superlative adjectives, basic recognition of measurement units, and aggregations over attributes with compound names.
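To make the target of such a translation concrete, the toy sketch below maps a few trigger phrases to SQL aggregate functions and adds a GROUP BY when the question ends with "per X" or "by X". It is a hypothetical keyword matcher, not GLAMORISE: the phrase table, the `to_sql` function, and the table and column names are assumptions for illustration, and it ignores the ambiguities, timescales, measurement units, and compound attribute names the module actually handles.

```python
import re

# Hypothetical trigger phrases for illustration; not GLAMORISE's lexicon.
AGGREGATE_TRIGGERS = {
    "how many": "COUNT",
    "total": "SUM",
    "average": "AVG",
    "lowest": "MIN",
    "highest": "MAX",
}


def to_sql(question: str, table: str, measure: str) -> str:
    """Translate a simple aggregation question into SQL (toy sketch).

    Assumes the table and measure column are already resolved and that an
    optional grouping attribute follows "per" or "by" at the end of the question.
    """
    q = question.lower()
    func = next((f for phrase, f in AGGREGATE_TRIGGERS.items() if phrase in q), None)
    if func is None:
        raise ValueError("no aggregation trigger found in question")
    target = "*" if func == "COUNT" else measure
    grouping = re.search(r"\b(?:per|by)\s+(\w+)\s*\??$", q)
    if grouping:
        attr = grouping.group(1)
        return f"SELECT {attr}, {func}({target}) FROM {table} GROUP BY {attr}"
    return f"SELECT {func}({target}) FROM {table}"
```

For example, `to_sql("What is the average price per category?", "sales", "price")` returns `SELECT category, AVG(price) FROM sales GROUP BY category`.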

Multimodal Provenance-based Analysis of Collaboration in Business Processes

Journal of Information and Data Management ◽

10.5753/jidm.2021.1923 ◽

2021 ◽

Vol 12 (5) ◽

Author(s):

Maria Luiza Falci ◽

Andréa Magalhães ◽

Aline Paes ◽

Vanessa Braganholo ◽

Daniel De Oliveira

Keyword(s):

Present Article ◽

Feasibility Study ◽

Business Process ◽

Business Processes ◽

Free Text ◽

Provenance Analysis ◽

Provenance Data ◽

Multiple Type

Modeling business processes as a set of activities that accomplish goals naturally leads to their being executed many times. Such executions usually produce a large volume of provenance data in different formats, such as text, audio, and video. This multiple-type nature gives rise to multimodal provenance data. Analyzing multimodal provenance data in an integrated way can be complex and error-prone when done manually, as it requires extracting information from free-text, audio, and video files. However, such an analysis may yield valuable insights into the business process. This article presents MINERVA (Multimodal busINEss pRoVenance Analysis), an approach that focuses on identifying improvements that can be implemented in business processes, as well as on collaboration analysis using multimodal provenance data. MINERVA was evaluated through a feasibility study that used data from a consulting company.

Query co-planning for shared execution in key-value stores

Journal of Information and Data Management ◽

10.5753/jidm.2021.1946 ◽

2021 ◽

Vol 12 (5) ◽

Author(s):

Josué Ttito ◽

Renato Marroquín ◽

Sergio Lifschitz ◽

Lewis McGibbney ◽

José Talavera

Keyword(s):

Data Structure ◽

Data Structures ◽

Data Model ◽

Range Queries ◽

Model Data ◽

Data Movement ◽

Segment Tree ◽

Interval Tree ◽

Simple Interface ◽

Arbitrary Objects

Key-value stores offer a straightforward yet powerful data model. Data is modeled as key-value pairs, where values can be arbitrary objects that are written and read using their associated key. In addition to this simple interface, such data stores also provide read operations such as full and range scans. However, the simplicity of the interface makes optimizing data access challenging. This work aims to enable the shared execution of concurrent range and point queries on key-value stores, thereby reducing the overall data movement when executing a complete workload. To accomplish this, we analyze different candidate data structures and propose our own variation of a segment tree, the Updatable Interval Tree. This data structure lets us co-plan and co-execute multiple range queries together, reducing redundant work. As our evaluation shows, this results in more efficient workload execution and higher overall throughput.
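The core idea of shared execution, reading each key once and serving every concurrent query that covers it, can be sketched by merging overlapping query ranges into disjoint scan segments. This is a plain sorted interval-merging sketch, assuming integer keys, inclusive ranges, and an in-memory dict standing in for the store; it is not the authors' Updatable Interval Tree, and `plan_shared_scans` and `execute` are hypothetical names.

```python
import bisect
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # inclusive [start, end]; a point query is (k, k)
Plan = List[Tuple[Range, List[str]]]


def plan_shared_scans(queries: Dict[str, Range]) -> Plan:
    """Merge overlapping queries into disjoint scan segments."""
    plan: Plan = []
    for qid, (start, end) in sorted(queries.items(), key=lambda item: item[1][0]):
        if plan and start <= plan[-1][0][1]:
            # Overlaps the current segment: extend it and attach the query.
            (seg_start, seg_end), ids = plan[-1]
            plan[-1] = ((seg_start, max(seg_end, end)), ids + [qid])
        else:
            plan.append(((start, end), [qid]))
    return plan


def execute(plan: Plan, store: Dict[int, str], queries: Dict[str, Range]):
    """Scan each segment once and route every key to the queries covering it.

    Returns the per-query results and the number of keys read from the store.
    """
    keys = sorted(store)
    results: Dict[str, List[str]] = {qid: [] for qid in queries}
    reads = 0
    for (seg_start, seg_end), ids in plan:
        lo = bisect.bisect_left(keys, seg_start)
        hi = bisect.bisect_right(keys, seg_end)
        for key in keys[lo:hi]:
            reads += 1  # one shared read serves all queries in this segment
            for qid in ids:
                q_start, q_end = queries[qid]
                if q_start <= key <= q_end:
                    results[qid].append(store[key])
    return results, reads
```

With queries `{"q1": (1, 5), "q2": (3, 8), "q3": (20, 20)}` over keys 0 to 24, the plan has two segments and reads 9 keys instead of the 12 needed when each query scans on its own. A sorted merge like this must be recomputed whenever new queries arrive; a tree structure that supports updates avoids that.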

Online Deep Learning Hyperparameter Tuning based on Provenance Analysis

Journal of Information and Data Management ◽

10.5753/jidm.2021.1924 ◽

2021 ◽

Vol 12 (5) ◽

Author(s):

Liliane Kunstmann ◽

Débora Pina ◽

Filipe Silva ◽

Aline Paes ◽

Patrick Valduriez ◽

...

Keyword(s):

Deep Learning ◽

Data Analysis ◽

Fine Tuning ◽

Training Dataset ◽

Final Decision ◽

Provenance Analysis ◽

Online Analysis ◽

Provenance Data ◽

Validation Set ◽

Parameter Values

Training Deep Learning (DL) models requires adjusting a series of hyperparameters. Although there are several tools to automatically choose the best hyperparameter configuration, the user is still the one who makes the final decision. To decide whether training should continue or different configurations should be tried, the user needs to analyze online which hyperparameters are most adequate for the training dataset, observing metrics such as accuracy and loss values. Provenance naturally represents data derivation relationships (e.g., transformations and parameter values), which provide important support for this data analysis. Most existing provenance solutions define their own proprietary data representations to support DL users in choosing the best hyperparameter configuration, which makes data analysis and interoperability difficult. We present Keras-Prov and its extension, named Keras-Prov++, which provides an analytical dashboard to support online hyperparameter fine-tuning. Unlike current mainstream solutions, Keras-Prov automatically captures the provenance data of DL applications using the W3C PROV recommendation, allowing online hyperparameter analysis that helps the user decide whether to change hyperparameter values after observing the performance of the models on a validation set. We provide an experimental evaluation of Keras-Prov++ using AlexNet and a real case study, named DenseED, that acts as a surrogate model for solving equations. During the online analysis, users identified scenarios that suggest reducing the number of epochs to avoid unnecessary executions and fine-tuning the learning rate to improve model accuracy.

Query Answer Reformulation over Knowledge Bases

Journal of Information and Data Management ◽

10.5753/jidm.2021.1914 ◽

2021 ◽

Vol 12 (5) ◽

Author(s):

João Pedro V. Pinheiro ◽

Marco A. Casanova ◽

Elisa S. Menendez

Keyword(s):

Knowledge Base ◽

Knowledge Bases ◽

Proper Treatment ◽

Redundant Data ◽

Query Answer

The answer to a query submitted to a database or a knowledge base is often long and may contain redundant data. The user is frequently forced to browse through a long answer, or to refine and repeat the query until the answer reaches a manageable size. Without proper treatment, consuming the answer may become a tedious task. This article therefore proposes a process that modifies the presentation of a query answer to improve the quality of the user's experience in the context of an RDF knowledge base. The process reorganizes the original query answer by applying heuristics to summarize the results and to select template questions that create a user dialog that guides the presentation of the results. The article also includes experiments based on RDF versions of MusicBrainz, enriched with DBpedia data, and IMDb, each with over 200 million RDF triples. The experiments use sample queries from well-known benchmarks.

An Approach for Schema Extraction of NoSQL Columnar Databases: the HBase Case Study

Journal of Information and Data Management ◽

10.5753/jidm.2021.1966 ◽

2021 ◽

Vol 12 (5) ◽

Author(s):

Angelo Augusto Frozza ◽

Eduardo Dias Defreyn ◽

Ronaldo Dos Santos Mello

Keyword(s):

A Priori ◽

Database System ◽

Data Representation ◽

Data Types ◽

Data Interoperability ◽

Nosql Databases ◽

Prototype Tool ◽

Nosql Database ◽

Integration Data

Although NoSQL databases do not require a schema a priori, being aware of the database schema is essential for activities such as data integration, data validation, and data interoperability. This paper presents a process for extracting the schemas of columnar NoSQL databases. We adopt JSON as a canonical format for data representation, and we validate the proposed process through a prototype tool that extracts schemas from the HBase columnar NoSQL database system. HBase was chosen as a case study because it is one of the most popular columnar NoSQL solutions. Compared to related work, we innovate by proposing a simple solution for inferring column data types in columnar NoSQL databases that store only byte arrays as column values, and by producing a schema that follows the JSON Schema format.
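Because HBase stores every cell value as a byte array, type inference has to guess a type from raw bytes. The sketch below shows one plausible heuristic and emits a JSON Schema object for a column family. The decoding rules (printable UTF-8 text checked for booleans and numbers, other 4- or 8-byte values read as integers) and the names `infer_type`, `merge_types`, and `column_family_schema` are assumptions for illustration, not the rules from the paper; rows are assumed to be already exported as `{qualifier: bytes}` dicts.

```python
import json
from typing import Dict, Iterable, List, Union


def infer_type(value: bytes) -> str:
    """Guess a JSON Schema type for one raw HBase cell value (heuristic)."""
    try:
        text = value.decode("utf-8")
    except UnicodeDecodeError:
        text = None
    if text is not None and text.isprintable():
        if text.lower() in ("true", "false"):
            return "boolean"
        for cast, name in ((int, "integer"), (float, "number")):
            try:
                cast(text)
                return name
            except ValueError:
                pass
        return "string"
    if len(value) in (4, 8):
        # Assumption: non-text 4/8-byte values are big-endian int/long values,
        # as written by HBase's Bytes.toBytes; binary doubles would be misread.
        return "integer"
    return "string"


def merge_types(types: Iterable[str]) -> Union[str, List[str]]:
    """Combine the types observed for one column across sampled rows."""
    found = set(types)
    if found == {"integer", "number"}:
        return "number"  # every integer is also a valid JSON Schema number
    if len(found) == 1:
        return found.pop()
    return sorted(found)  # JSON Schema accepts a list of alternative types


def column_family_schema(rows: List[Dict[str, bytes]]) -> dict:
    """Build a JSON Schema for one column family from sampled rows."""
    seen: Dict[str, List[str]] = {}
    for row in rows:
        for qualifier, value in row.items():
            seen.setdefault(qualifier, []).append(infer_type(value))
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {q: {"type": merge_types(t)} for q, t in seen.items()},
        # Qualifiers present in every sampled row are treated as required.
        "required": sorted(q for q, t in seen.items() if len(t) == len(rows)),
    }


if __name__ == "__main__":
    # Hypothetical sample rows from one column family.
    sample = [
        {"name": b"Alice", "age": b"31", "active": b"true"},
        {"name": b"Bob", "age": (42).to_bytes(8, "big"), "score": b"7.5"},
    ]
    print(json.dumps(column_family_schema(sample), indent=2))
```

The `required` list reflects that columnar stores are sparse: a qualifier missing from some rows is kept as an optional property rather than dropped.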

Using Inverted Index for Fingerprint Search

Journal of Information and Data Management ◽

10.5753/jidm.2021.1918 ◽

2021 ◽

Vol 12 (5) ◽

Author(s):

Johnny Marcos S. Soares ◽

Luciano Barbosa ◽

Paulo Antonio Leal Rego ◽

Regis Pires Magalhães ◽

Jose Antônio F. de Macêdo

Keyword(s):

Information Retrieval ◽

Penetration Rate ◽

Locality Sensitive Hashing ◽

Inverted Index ◽

Text Documents ◽

Data Set ◽

Textual Information ◽

Data Indexing ◽

Biometric Information ◽

Fingerprint Data

Fingerprints are the most widely used biometric information for identifying people. With the growth of fingerprint data, indexing techniques are essential for efficient search. In this work, we devise a solution that applies the traditional inverted index, widely used in textual information retrieval, to fingerprint search. To that end, it first converts fingerprints into text documents using techniques such as Minutia Cylinder-Code and Locality-Sensitive Hashing, and then indexes them in inverted files. In the experimental evaluation, our approach obtained an error rate of 0.42% at a penetration rate of 10% on the FVC2002 DB1a data set, surpassing some established methods.
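The "fingerprint as text document" idea can be sketched with random-hyperplane LSH: each local descriptor is hashed into a short bit-string token, each fingerprint becomes a bag of such tokens, and an ordinary inverted index ranks enrolled fingerprints by shared tokens. This sketch assumes descriptors are already available as fixed-length float vectors (standing in for Minutia Cylinder-Code vectors, which it does not compute); the class `FingerprintIndex` and its parameters `n_bits` and `n_tables` are hypothetical, and the paper's hashing scheme may differ.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Set


class FingerprintIndex:
    """Inverted index over LSH tokens derived from local descriptors."""

    def __init__(self, dim: int, n_bits: int = 8, n_tables: int = 4, seed: int = 0):
        rng = random.Random(seed)
        # One set of random Gaussian hyperplanes per hash table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def _tokens(self, descriptors: Sequence[Sequence[float]]) -> List[str]:
        """Turn descriptors into 'words': one token per descriptor per table."""
        tokens = []
        for d in descriptors:
            for t, planes in enumerate(self.planes):
                # Bit i records which side of hyperplane i the descriptor lies on.
                bits = "".join(
                    "1" if sum(p_i * d_i for p_i, d_i in zip(p, d)) >= 0 else "0"
                    for p in planes
                )
                tokens.append(f"t{t}_{bits}")
        return tokens

    def add(self, fid: str, descriptors: Sequence[Sequence[float]]) -> None:
        """Enroll a fingerprint by adding its id to each token's posting list."""
        for token in self._tokens(descriptors):
            self.postings[token].add(fid)

    def search(self, descriptors: Sequence[Sequence[float]], top_k: int = 5) -> List[str]:
        """Rank enrolled fingerprints by how many query tokens they share."""
        votes: Counter = Counter()
        for token in self._tokens(descriptors):
            for fid in self.postings.get(token, ()):
                votes[fid] += 1
        return [fid for fid, _ in votes.most_common(top_k)]
```

Returning the top `top_k` candidates out of N enrolled fingerprints corresponds to examining a fraction top_k / N of the database, which is what the penetration rate measures.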

A Platform for Keyword Search and its Application for COVID-19 Pandemic Data

Journal of Information and Data Management ◽

10.5753/jidm.2021.1904 ◽

2021 ◽

Vol 12 (5) ◽

Author(s):

Yenier T. Izquierdo ◽

Grettel M. Garcia ◽

Melissa Lemos ◽

Alexandre Novello ◽

Bruno Novelli ◽

...

Keyword(s):

Relational Databases ◽

Keyword Search ◽

Data Retrieval ◽

Technical Skills ◽

Third Party ◽

Attractive Alternative ◽

Database Access ◽

Retrieval Systems ◽

Information Retrieval Systems ◽

Retrieval Mechanism

Keyword search is typically associated with information retrieval systems. Recently, however, keyword search has been extended to relational databases and RDF datasets as an attractive alternative to traditional database access. This paper introduces DANKE, a platform for keyword search over databases, and discusses how third-party applications can be equipped with DANKE to take advantage of a data retrieval mechanism that does not require users to have specific technical skills to search, retrieve, and explore data. The paper ends with the description of an application, called CovidKeyS, which uses DANKE to implement keyword search over three COVID-19 data scenarios.

Privacy-preserving of patients with Differential Privacy: an experimental evaluation in COVID-19 dataset

Journal of Information and Data Management ◽

10.5753/jidm.2021.1947 ◽

2021 ◽

Vol 12 (5) ◽

Author(s):

Manuel E. B. Filho ◽

Eduardo R. Duarte Neto ◽

Javam C. Machado

Keyword(s):

Health Care Providers ◽

Private Information ◽

Data Privacy ◽

Differential Privacy ◽

Health Data ◽

Care Providers ◽

Privacy Concerns ◽

Effective Strategies ◽

The World ◽

New Challenges

The pandemic of the new coronavirus (COVID-19) has brought new challenges to health systems in almost every corner of the world, many of which are overburdened. Data analysis has supported the fight against the coronavirus: through it, government authorities, together with health care providers, adopted effective strategies. Yet those strategies cannot disregard privacy concerns. Privacy is a right of every citizen. Privacy techniques allow health data to be analyzed without exposing individuals' private information. However, a balance between data privacy and utility is essential for a good analysis of the data. This work demonstrates that it is possible to guarantee the privacy of infected patients while maintaining the utility of the data, allowing a sound analysis of it, by visualizing the application of differentially private mechanisms to queries over the data of patients tested in the State of Ceará, Brazil.
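The differentially private mechanisms mentioned above are typically applied to aggregate queries such as counts. As a minimal sketch, assuming a counting query and the standard Laplace mechanism, the function below adds Laplace noise with scale 1/ε to the true count; the patient records, field names, and ε values are hypothetical, and the paper may use other mechanisms or query types.

```python
import random
from typing import Callable, Iterable


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records: Iterable[dict], predicate: Callable[[dict], bool],
                  epsilon: float, rng: random.Random) -> float:
    """Release an epsilon-differentially private count (Laplace mechanism).

    Adding or removing one patient changes a count by at most 1, so the
    L1 sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical records; real data would come from the testing database.
    patients = [{"result": "positive"}] * 30 + [{"result": "negative"}] * 70
    rng = random.Random(2021)
    for eps in (0.1, 1.0, 10.0):
        noisy = private_count(patients, lambda p: p["result"] == "positive", eps, rng)
        print(f"epsilon={eps}: noisy count = {noisy:.1f} (true count 30)")
```

The expected absolute error of each released count is 1/ε, so a smaller ε gives stronger privacy at the cost of lower utility, which is the balance the abstract refers to.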

The Impact of Privacy Regulations on DB Systems

Journal of Information and Data Management ◽

10.5753/jidm.2021.1958 ◽

2021 ◽

Vol 12 (5) ◽

Author(s):

Javam C. Machado ◽

Paulo R. P. Amora

Keyword(s):

Personal Data ◽

Physical World ◽

Research Opportunities ◽

Data Usage ◽

Information Usage ◽

Digital World ◽

The Impact

Personal data usage and collection are activities that used to grow without restriction. However, several laws in the physical world ensure people's rights regarding their privacy and information usage. In recent years, legislators have passed many laws, regulations, and acts to replicate these rights in the digital world. As a result, new constraints, rights, and duties appear in every component of the data usage and collection workflow. In this paper, we discuss the implications of these laws, identify the impacts these regulations introduce to current DBMSs, and survey recent works that aim to solve the problems raised by these impacts, highlighting research opportunities and identifying how solutions to these problems can be achieved.

Journal of Information and Data Management
Latest Publications

Published By Sociedade Brasileira De Computacao - SB

Empowering Natural Language Interfaces to Databases with Aggregations

Multimodal Provenance-based Analysis of Collaboration in Business Processes

Query co-planning for shared execution in key-value stores

Online Deep Learning Hyperparameter Tuning based on Provenance Analysis

Query Answer Reformulation over Knowledge Bases

An Approach for Schema Extraction of NoSQL Columnar Databases: the HBase Case Study

Using Inverted Index for Fingerprint Search

A Platform for Keyword Search and its Application for COVID-19 Pandemic Data

Privacy-preserving of patients with Differential Privacy: an experimental evaluation in COVID-19 dataset

The Impact of Privacy Regulations on DB Systems
