Journal of Information and Data Management
Latest Publications


TOTAL DOCUMENTS

35
(FIVE YEARS 35)

H-INDEX

0
(FIVE YEARS 0)

Published By Sociedade Brasileira De Computacao - SB

ISSN 2178-7107

2021 ◽  
Vol 12 (5) ◽  
Author(s):  
Alexandre F. Novello ◽  
Marco A. Casanova

A Natural Language Interface to Database (NLIDB) refers to a database interface that translates a question asked in natural language into a structured query. Aggregation questions express aggregation functions, such as count, sum, average, minimum and maximum, and optionally a group by clause and a having clause. NLIDBs deliver good results for standard questions but usually do not deal with aggregation questions. The main contribution of this article is a generic module, called GLAMORISE (GeneraL Aggregation MOdule using a RelatIonal databaSE), that extends NLIDBs to cope with aggregation questions. GLAMORISE covers aggregations with ambiguities, timescale differences, aggregations in multiple attributes, the use of superlative adjectives, basic recognition of measurement units, and aggregations in attributes with compound names.
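
As an illustration of what an aggregation question translates to, the sketch below pairs one such question with a structured query that has an aggregation function, a group by clause, and a having clause. It is not GLAMORISE itself: the `employees` table, its columns, and the question are hypothetical, and the translation is written by hand rather than produced by an NLIDB.

```python
import sqlite3

# Hypothetical question: "What is the average salary per department,
# for departments with at least two employees?"
QUERY = """
    SELECT dept, AVG(salary)
    FROM employees
    GROUP BY dept
    HAVING COUNT(*) >= 2
    ORDER BY dept
"""

def run_aggregation(rows):
    """Load (dept, salary) rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (dept TEXT, salary REAL)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    sample = [("sales", 100), ("sales", 200), ("eng", 300), ("eng", 500), ("hr", 50)]
    for dept, avg_salary in run_aggregation(sample):
        print(dept, avg_salary)
```

The "hr" department is filtered out by the having clause because it has a single employee.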


2021 ◽  
Vol 12 (5) ◽  
Author(s):  
Maria Luiza Falci ◽  
Andréa Magalhães ◽  
Aline Paes ◽  
Vanessa Braganholo ◽  
Daniel De Oliveira

Business processes, modeled as sets of activities that accomplish goals, are naturally executed many times. Such executions usually produce a large amount of provenance data in different formats, such as text, audio, and video. This mix of formats gives rise to multimodal provenance data. Analyzing multimodal provenance data in an integrated form may be complex and error-prone when performed manually, as it requires extracting information from free-text, audio, and video files. However, such an analysis may generate valuable insights into the business process. This article presents MINERVA (Multimodal busINEss pRoVenance Analysis), an approach that uses multimodal provenance data to identify improvements that can be implemented in business processes and to support collaboration analysis. MINERVA was evaluated through a feasibility study that used data from a consulting company.


2021 ◽  
Vol 12 (5) ◽  
Author(s):  
Josué Ttito ◽  
Renato Marroquín ◽  
Sergio Lifschitz ◽  
Lewis McGibbney ◽  
José Talavera

Key-value stores offer a straightforward yet powerful data model. Data is modeled as key-value pairs, where values can be arbitrary objects that are written and read using their associated key. In addition to their simple interface, such data stores also provide read operations such as full and range scans. However, the simplicity of this interface makes it challenging to optimize data accesses. This work aims to enable the shared execution of concurrent range and point queries on key-value stores, thus reducing the overall data movement when executing a complete workload. To accomplish this, we analyze different possible data structures and propose our variation of a segment tree, the Updatable Interval Tree. Our data structure helps us co-plan and co-execute multiple range queries and reduces redundant work. As we show in our evaluation, this results in more efficient workload execution and increased overall throughput.
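
The idea of shared execution can be sketched in a few lines. The example below is not the Updatable Interval Tree described in the abstract; it swaps in plain interval merging, and the in-memory `dict` is a hypothetical stand-in for a key-value store. It only shows how overlapping range queries can be served by one scan so that each key is read once.

```python
from bisect import bisect_left, bisect_right

def merge_ranges(ranges):
    """Merge overlapping closed ranges (lo, hi) into disjoint scan ranges."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1]:
            # Overlaps the previous scan range: extend it.
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [tuple(r) for r in merged]

def shared_scan(store, queries):
    """Answer every range query using one scan per merged range.

    Returns (results, scanned), where results maps each query to its
    (key, value) pairs and scanned counts the keys read from the store.
    """
    keys = sorted(store)
    results = {q: [] for q in queries}
    scanned = 0
    for lo, hi in merge_ranges(queries):
        start, end = bisect_left(keys, lo), bisect_right(keys, hi)
        for k in keys[start:end]:
            scanned += 1  # each key is read from the store once
            for q in queries:
                if q[0] <= k <= q[1]:
                    results[q].append((k, store[k]))
    return results, scanned

if __name__ == "__main__":
    store = {i: f"v{i}" for i in range(10)}
    results, scanned = shared_scan(store, [(1, 4), (3, 6), (8, 9)])
    print(scanned, results[(3, 6)])
```

Running the three queries independently would read ten keys in total; the shared scan reads eight, because keys 3 and 4 fall into two overlapping queries but are fetched only once.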


2021 ◽  
Vol 12 (5) ◽  
Author(s):  
Liliane Kunstmann ◽  
Débora Pina ◽  
Filipe Silva ◽  
Aline Paes ◽  
Patrick Valduriez ◽  
...  

Training Deep Learning (DL) models requires adjusting a series of hyperparameters. Although there are several tools that automatically choose the best hyperparameter configuration, the user is still the one who makes the final decision. To decide whether training should continue or a different configuration should be tried, the user needs to analyze online which hyperparameters are most adequate for the training dataset, observing metrics such as accuracy and loss values. Provenance naturally represents data derivation relationships (i.e., transformations, parameter values, etc.), which provide important support for this data analysis. Most existing provenance solutions define their own proprietary data representations to support DL users in choosing the best hyperparameter configuration, which makes data analysis and interoperability difficult. We present Keras-Prov and its extension, named Keras-Prov++, which provides an analytical dashboard to support online hyperparameter fine-tuning. Unlike current mainstream solutions, Keras-Prov automatically captures the provenance data of DL applications using the W3C PROV recommendation, allowing online hyperparameter analysis that helps the user decide whether to change hyperparameter values after observing the performance of the models on a validation set. We provide an experimental evaluation of Keras-Prov++ using AlexNet and a real case study, named DenseED, that acts as a surrogate model for solving equations. During the online analysis, the users identify scenarios that suggest reducing the number of epochs to avoid unnecessary executions and fine-tuning the learning rate to improve model accuracy.


2021 ◽  
Vol 12 (5) ◽  
Author(s):  
João Pedro V. Pinheiro ◽  
Marco A. Casanova ◽  
Elisa S. Menendez

The answer to a query submitted to a database or a knowledge base is often long and may contain redundant data. The user is frequently forced to browse through a long answer or to refine and repeat the query until the answer reaches a manageable size. Without proper treatment, consuming the answer may become a tedious task. This article proposes a process that modifies the presentation of a query answer to improve the quality of the user’s experience in the context of an RDF knowledge base. The process reorganizes the original query answer by applying heuristics to summarize the results and to select template questions that create a user dialog that guides the presentation of the results. The article also includes experiments based on RDF versions of MusicBrainz, enriched with DBpedia data, and IMDb, each with over 200 million RDF triples. The experiments use sample queries from well-known benchmarks.


2021 ◽  
Vol 12 (5) ◽  
Author(s):  
Angelo Augusto Frozza ◽  
Eduardo Dias Defreyn ◽  
Ronaldo Dos Santos Mello

Although NoSQL databases do not require a schema a priori, being aware of the database schema is essential for activities like data integration, data validation, or data interoperability. This paper presents a process for the extraction of columnar NoSQL database schemas. We adopt JSON as a canonical format for data representation, and we validate the proposed process through a prototype tool that is able to extract schemas from the HBase columnar NoSQL database system. HBase was chosen as a case study because it is one of the most popular columnar NoSQL solutions. When compared to related work, we innovate by proposing a simple solution for the inference of column data types for columnar NoSQL databases that store only byte arrays as column values, and a resulting schema that follows the JSON Schema format.


2021 ◽  
Vol 12 (5) ◽  
Author(s):  
Yenier T. Izquierdo ◽  
Grettel M. Garcia ◽  
Melissa Lemos ◽  
Alexandre Novello ◽  
Bruno Novelli ◽  
...  

Keyword search is typically associated with information retrieval systems. However, recently, keyword search has been expanded to relational databases and RDF datasets, as an attractive alternative to traditional database access. This paper introduces DANKE, a platform for keyword search over databases, and discusses how third-party applications can be equipped with DANKE to take advantage of a data retrieval mechanism that does not require users to have specific technical skills for searching, retrieving and exploring data. The paper ends with the description of an application, called CovidKeyS, which uses DANKE to implement keyword search over three COVID-19 data scenarios.


2021 ◽  
Vol 12 (5) ◽  
Author(s):  
Manuel E. B. Filho ◽  
Eduardo R. Duarte Neto ◽  
Javam C. Machado

The pandemic of the new coronavirus (COVID-19) has brought new challenges to health systems in almost every corner of the world, many of them overburdened. Data analysis has supported the fight against the coronavirus: through it, government authorities, together with health care providers, have adopted effective strategies. Yet those strategies cannot disregard privacy concerns, since privacy is a right of each citizen. Privacy techniques allow health data to be analyzed without exposing individuals’ private information. However, a balance between data privacy and utility is essential for a good analysis of the data. This work demonstrates that it is possible to guarantee the privacy of infected patients while maintaining the utility of the data for sound analysis. It does so by visualizing the application of differentially private mechanisms to queries over data on patients tested in the State of Ceará, Brazil.
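
The privacy and utility trade-off mentioned above can be illustrated with the Laplace mechanism, one standard differentially private mechanism. The abstract does not say which mechanisms the authors use, so this is a minimal sketch under that assumption; the patient records, the `private_count` helper, and the chosen epsilon are all hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    # random.expovariate takes the rate, so rate 1/scale gives mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Counting query released with epsilon-differential privacy.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical records: (age, tested_positive)
    patients = [(34, True), (61, False), (47, True), (29, True), (75, False)]
    noisy = private_count(patients, lambda p: p[1], epsilon=0.5, rng=rng)
    print(f"noisy count of positive tests: {noisy:.2f}")
```

A smaller epsilon gives stronger privacy but adds more noise, which is the balance between privacy and utility that the abstract refers to.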


2021 ◽  
Vol 12 (5) ◽  
Author(s):  
Johnny Marcos S. Soares ◽  
Luciano Barbosa ◽  
Paulo Antonio Leal Rego ◽  
Regis Pires Magalhães ◽  
Jose Antônio F. de Macêdo

Fingerprints are the most widely used biometric information for identifying people. With the increase in fingerprint data, indexing techniques are essential for efficient search. In this work, we devise a solution that applies a traditional inverted index, widely used in textual information retrieval, to fingerprint search. To do so, it first converts fingerprints to text documents using techniques such as Minutia Cylinder-Code and Locality-Sensitive Hashing, and then indexes them in inverted files. In the experimental evaluation, our approach obtained an error rate of 0.42% at a penetration rate of 10% on the FVC2002 DB1a data set, surpassing some established methods.
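
The indexing step can be sketched as follows, assuming each fingerprint has already been converted to a list of text tokens. The tokens and fingerprint identifiers below are hypothetical, and the conversion itself (Minutia Cylinder-Code and Locality-Sensitive Hashing) is not reproduced; the sketch only shows how an inverted index ranks candidates by the number of shared tokens.

```python
from collections import Counter, defaultdict

def build_index(docs):
    """Map each token to the set of fingerprint ids that contain it."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in set(tokens):
            index[token].add(doc_id)
    return index

def search(index, query_tokens, top_k=1):
    """Return the top_k fingerprint ids sharing the most tokens with the query."""
    votes = Counter()
    for token in set(query_tokens):
        for doc_id in index.get(token, ()):
            votes[doc_id] += 1
    return [doc_id for doc_id, _ in votes.most_common(top_k)]

if __name__ == "__main__":
    # Hypothetical tokenized fingerprints.
    docs = {
        "fp1": ["a1", "b7", "c3"],
        "fp2": ["a1", "d4", "e9"],
        "fp3": ["x2", "y5"],
    }
    index = build_index(docs)
    print(search(index, ["a1", "b7", "z0"], top_k=2))
```

Only fingerprints that share at least one token with the query are ever touched, which is what keeps the fraction of the database examined (the penetration rate) low.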


2021 ◽  
Vol 12 (5) ◽  
Author(s):  
Maria de Lourdes M. Silva ◽  
Iago C. Chaves ◽  
Javam C. Machado

In this article, we propose a differentially private reverse top-k query. Our strategy retrieves the least frequent data according to a search criterion, with a strong privacy guarantee for the individuals who contributed personal data to the original database. We apply our strategy to public COVID-19 data from the State of Ceará using two different queries. Our experimental results show that, when the chosen budget is suitable, the result of the proposed top-k query is highly similar to the result of a conventional top-k query. It thus provides useful results for researchers while ensuring a low probability of re-identification of individuals, which follows from the properties of differential privacy.

