Cost Modeling and Range Estimation for Top-k Retrieval in Relational Databases

Author(s):  
Anteneh Ayanso ◽  
Paulo B. Goes ◽  
Kumar Mehta

Relational databases have increasingly become the basis for a wide range of applications that require efficient methods for exploratory search and retrieval. Top-k retrieval addresses this need and involves finding a limited number of records whose attribute values are the closest to those specified in a query. One approach in the recent literature is query mapping, which converts top-k queries into equivalent range queries that relational database management systems (RDBMSs) normally support. This approach combines simplicity with practicality by avoiding modifications to the query engine and specialized data structures or indexing techniques for handling top-k queries separately. This paper reviews existing query-mapping techniques in the literature and presents a range query estimation method based on cost modeling. Experiments on real-world and synthetic data sets show that the cost-based range estimation method performs at least as well as prior methods and avoids the need to calibrate workloads on specific database contents.
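To make the query-mapping idea concrete, here is a minimal sketch (the `points` table, the L1 distance, and the growth factor are illustrative choices, not the authors' implementation): the top-k query is answered by issuing an ordinary range (window) query that the RDBMS can optimize as usual, ranking the retrieved tuples in the application, and restarting with a wider window whenever the current result cannot be proven to contain the true top k.

```python
# Minimal sketch of the query-mapping idea: a top-k query is answered by an
# equivalent range (window) query, with the window enlarged and the query
# restarted whenever the result cannot be proven to contain the true top k.
import sqlite3

def top_k_via_range(conn, target, k, half_width=0.05, growth=2.0):
    """Return the k rows of points(x, y) closest to `target` in L1 distance."""
    tx, ty = target
    dist = lambda row: abs(row[0] - tx) + abs(row[1] - ty)
    d = half_width
    while True:
        # Plain range query that the RDBMS evaluates with its usual access paths.
        rows = conn.execute(
            "SELECT x, y FROM points "
            "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ?",
            (tx - d, tx + d, ty - d, ty + d),
        ).fetchall()
        if len(rows) >= k:
            rows.sort(key=dist)
            if dist(rows[k - 1]) <= d:   # k-th match lies safely inside the window
                return rows[:k]
        d *= growth                       # window too small: enlarge and restart

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany("INSERT INTO points VALUES (?, ?)",
                 [(i * 0.01, (i * 37 % 100) * 0.01) for i in range(100)])
print(top_k_via_range(conn, target=(0.5, 0.5), k=3))
```

The containment check works because the L1 ball of radius equal to the k-th candidate distance always fits inside the square window of the same half-width, so no closer tuple can lie outside the window.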

2009 ◽  
Vol 20 (4) ◽  
pp. 1-25 ◽  
Author(s):  
Anteneh Ayanso ◽  
Paulo B. Goes ◽  
Kumar Mehta

Finding efficient methods for supporting top-k relational queries has received significant attention in academic research. One approach in the recent literature is query mapping, in which top-k queries are mapped (translated) into equivalent range queries that relational database management systems (RDBMSs) normally support. This approach combines simplicity with practicality by avoiding modifications to the query engine and specialized data structures or indexing techniques for handling top-k queries separately. However, existing methods following this approach fall short of adequately modeling the problem environment and providing consistent results. In this article, the authors propose a cost-based range estimation model for the query-mapping approach. They provide a methodology for trading off the relevant query execution cost components and mapping a top-k query into a cost-optimal range query for efficient execution. Their experiments on real-world and synthetic data sets show that the proposed strategy not only avoids the need to calibrate workloads on specific database contents, but also performs at least as well as prior methods.
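A highly simplified illustration of the cost trade-off follows (the selectivity model and the cost constants are invented for the sketch and are not the authors' cost model): a narrow window is cheap to evaluate but risks returning fewer than k tuples and forcing a restart, while a wide window avoids restarts at the price of retrieving many unnecessary tuples, so the mapped range is chosen to minimize the expected total cost.

```python
# Illustrative-only sketch of choosing the mapped range by expected cost.
# The selectivity estimate and cost constants below are stand-ins for the
# statistics a real cost model would supply.

def expected_cost(half_width, k, expected_rows, retrieval_cost=1.0, restart_cost=50.0):
    """Expected cost of answering a top-k query with a window of this half-width."""
    n = expected_rows(half_width)          # estimated tuples inside the window
    p_restart = 1.0 if n < k else 0.0      # crude stand-in for P(fewer than k rows)
    return n * retrieval_cost + p_restart * restart_cost

def best_window(k, expected_rows, candidates):
    return min(candidates, key=lambda d: expected_cost(d, k, expected_rows))

# Toy selectivity estimate: uniform data, so tuples grow with the window area.
density = 10_000                           # tuples per unit area (assumed)
rows_in = lambda d: density * (2 * d) ** 2

print(best_window(k=10, expected_rows=rows_in,
                  candidates=[0.01, 0.02, 0.05, 0.1, 0.2]))
```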


Sensors ◽  
2019 ◽  
Vol 19 (14) ◽  
pp. 3158
Author(s):  
Jian Yang ◽  
Xiaojuan Ban ◽  
Chunxiao Xing

With the rapid development of mobile networks and smart terminals, mobile crowdsourcing has attracted the interest of researchers and industry. In this paper, we propose a new solution to the problem of user selection in mobile crowdsourcing systems. Existing user selection schemes mainly fall into two categories: (1) finding a subset of users that maximizes crowdsourcing quality under a given budget constraint; and (2) finding a subset of users that minimizes cost while meeting a minimum crowdsourcing quality requirement. However, these solutions fall short of simultaneously maximizing the quality of service of the task and minimizing costs. Inspired by the marginalism principle in economics, we select a new user only when the marginal gain of the newly joined user is higher than the payment and the marginal cost associated with integration. We model the scheme as a marginalism problem of mobile crowdsourcing user selection (MCUS-marginalism). We rigorously prove that the MCUS-marginalism problem is NP-hard, and propose a greedy random adaptive procedure with annealing randomness (GRASP-AR) to maximize the gain and minimize the cost of the task. The effectiveness and efficiency of our proposed approach are clearly verified by large-scale experimental evaluations on both real-world and synthetic data sets.
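The marginal-gain rule can be sketched as a plain greedy loop (a simplified sketch with an invented coverage-based quality measure and made-up payment fields, not the paper's GRASP-AR procedure): a candidate is added only while the quality gain it contributes exceeds its payment plus the integration cost.

```python
# Greedy sketch of marginalism-based user selection: add a worker only while
# the marginal quality gain exceeds the marginal cost (payment + integration).
# The quality model and the worker records are illustrative assumptions.

def coverage_quality(selected):
    """Toy quality measure: number of distinct skills covered by the selection."""
    return len(set().union(*[w["skills"] for w in selected])) if selected else 0

def select_workers(candidates, integration_cost=0.5):
    selected = []
    remaining = list(candidates)
    while remaining:
        base = coverage_quality(selected)
        # Pick the candidate with the best (gain - cost) margin.
        best = max(remaining,
                   key=lambda w: coverage_quality(selected + [w]) - base
                                 - w["payment"] - integration_cost)
        margin = (coverage_quality(selected + [best]) - base
                  - best["payment"] - integration_cost)
        if margin <= 0:          # no remaining candidate is worth adding
            break
        selected.append(best)
        remaining.remove(best)
    return selected

workers = [
    {"id": 1, "skills": {"gps", "photo"}, "payment": 0.8},
    {"id": 2, "skills": {"photo"}, "payment": 0.2},
    {"id": 3, "skills": {"audio", "gps"}, "payment": 1.2},
]
print([w["id"] for w in select_workers(workers)])
```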


2020 ◽  
Vol 633 ◽  
pp. A46
Author(s):  
L. Siltala ◽  
M. Granvik

Context. The bulk density of an asteroid informs us about its interior structure and composition. To constrain the bulk density, one needs an estimated mass of the asteroid. The mass is estimated by analyzing an asteroid’s gravitational interaction with another object, such as another asteroid during a close encounter. An estimate for the mass has typically been obtained with linearized least-squares methods, even though this family of methods cannot properly describe non-Gaussian parameter distributions. In addition, the uncertainties reported for asteroid masses in the literature are sometimes inconsistent with each other and are suspected to be unrealistically low. Aims. We aim to present a Markov chain Monte Carlo (MCMC) algorithm for the asteroid mass estimation problem based on asteroid-asteroid close encounters. We verify that our algorithm works correctly by applying it to synthetic data sets. We use astrometry available through the Minor Planet Center to estimate masses for a select few example cases and compare our results with results reported in the literature. Methods. Our mass-estimation method is based on the robust adaptive Metropolis algorithm, which has been implemented in the OpenOrb asteroid orbit computation software. Our method has the built-in capability to analyze multiple perturbing asteroids and test asteroids simultaneously. Results. We find that our mass estimates for the synthetic data sets are fully consistent with the ground truth. The nominal masses for the real example cases typically agree with the literature but tend to have greater uncertainties than what is reported in recent literature. Possible reasons for this include different astrometric data sets and weights, different test asteroids, different force models, or different algorithms. For (16) Psyche, the target of NASA’s Psyche mission, our maximum-likelihood mass is approximately 55% of what is reported in the literature. Such a low mass would imply that the bulk density is significantly lower than previously expected and thus disagrees with the theory of (16) Psyche being the metallic core of a protoplanet. We do, however, note that masses reported in recent literature remain within our 3-sigma limits. Conclusions. The new MCMC mass-estimation algorithm performs as expected, but a rigorous comparison with results from a least-squares algorithm using the exact same data set remains to be done. The matters of uncertainties in comparison with other algorithms and correlations between observations also warrant further investigation.
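For illustration, a generic random-walk Metropolis sampler for a single mass-like parameter might look like the sketch below; the Gaussian log-likelihood is a toy stand-in for the astrometric residual model, and this is not the robust adaptive Metropolis implementation in OpenOrb.

```python
# Generic random-walk Metropolis sketch for a single mass-like parameter.
# The likelihood and the "observations" are invented for the illustration.
import math
import random

def log_likelihood(mass, observations, sigma=0.05):
    # Toy model: each observation is a noisy, direct measurement of the mass.
    return -0.5 * sum(((obs - mass) / sigma) ** 2 for obs in observations)

def metropolis(observations, n_steps=20_000, step=0.01, start=1.0, seed=0):
    rng = random.Random(seed)
    current = start
    current_ll = log_likelihood(current, observations)
    chain = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step)
        proposal_ll = log_likelihood(proposal, observations)
        delta = proposal_ll - current_ll
        # Accept with probability min(1, exp(delta)).
        if delta >= 0 or rng.random() < math.exp(delta):
            current, current_ll = proposal, proposal_ll
        chain.append(current)
    return chain

observations = [1.02, 0.97, 1.01, 0.99, 1.03]        # synthetic "measurements"
samples = metropolis(observations)[5_000:]           # discard burn-in
print("posterior mean:", sum(samples) / len(samples))
```

The appeal of MCMC here is that the chain itself characterizes non-Gaussian posteriors, rather than forcing a Gaussian approximation as linearized least squares does.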


2013 ◽  
Vol 748 ◽  
pp. 590-594
Author(s):  
Li Liao ◽  
Yong Gang Lu ◽  
Xu Rong Chen

We propose a novel density estimation method that uses both the k-nearest neighbor (KNN) graph and the potential field of the data points to capture local and global distribution information, respectively. Clustering is performed based on the computed density values: a forest of trees is built with each data point as a tree node, and the clusters are formed according to the trees in the forest. The new clustering method is evaluated against three popular clustering methods, K-means++, Mean Shift, and DBSCAN. Experiments on two synthetic data sets and one real data set show that our approach can effectively improve the clustering results.
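A simplified sketch of this style of clustering is shown below (the specific density formula, the cutoff parameter, and the example points are assumptions for illustration, not the paper's exact method): each point receives a density score combining a k-NN term and a potential-field term, every point is attached to its nearest higher-density neighbor within a cutoff, and the resulting trees define the clusters.

```python
# Simplified sketch of density-based tree clustering: score each point with a
# k-NN (local) term plus a potential-field (global) term, link every point to
# its nearest higher-density neighbor within a cutoff, and read the clusters
# off the resulting trees. The weighting and cutoff are illustrative choices.
import math

def density_scores(points, k=2, sigma=1.0):
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        local = 1.0 / (1e-9 + sum(dists[:k]) / k)                  # k-NN density
        field = sum(math.exp(-(d / sigma) ** 2) for d in dists)    # potential field
        scores.append(local + field)
    return scores

def cluster(points, k=2, cutoff=1.0):
    dens = density_scores(points, k)
    parent = list(range(len(points)))
    for i, p in enumerate(points):
        higher = [j for j in range(len(points)) if dens[j] > dens[i]]
        if higher:
            nearest = min(higher, key=lambda h: math.dist(p, points[h]))
            if math.dist(p, points[nearest]) <= cutoff:   # only link nearby peaks
                parent[i] = nearest
    def root(i):
        return i if parent[i] == i else root(parent[i])
    return [root(i) for i in range(len(points))]   # one root label per tree/cluster

pts = [(0, 0), (0.1, 0.2), (0.2, 0.1), (5, 5), (5.1, 5.2), (5.2, 4.9)]
print(cluster(pts))   # two distinct labels: one per dense group
```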


Author(s):  
Timothy M. Laseter

This case introduces a framework for cost modeling. Two data sets (one for injection-molded plastic parts and another for compressors) allow students to apply the cost-driver framework in conjunction with basic spreadsheet and regression analyses. Although obviously applicable in a course on supply chain management, the case can also be used to teach competitive cost analysis for strategic decision making.
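As a hedged illustration of the cost-driver framework the case teaches, one might regress unit cost on a single candidate driver and use the fitted line as a should-cost benchmark; the driver, the numbers, and the part type below are invented and are not data from the case.

```python
# Hedged illustration of cost-driver regression with made-up data.
# statistics.linear_regression requires Python 3.10+.
from statistics import linear_regression

weights_g = [12, 25, 40, 55, 80, 120]            # candidate cost driver: part weight
unit_costs = [0.31, 0.44, 0.62, 0.75, 1.02, 1.45]

fit = linear_regression(weights_g, unit_costs)
print(f"cost ~ {fit.intercept:.3f} + {fit.slope:.4f} * weight_g")
# The fitted line serves as a should-cost benchmark for quoting new parts.
print("predicted cost for a 60 g part:", fit.intercept + fit.slope * 60)
```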


2019 ◽  
Vol 28 (06) ◽  
pp. 1960002 ◽  
Author(s):  
Brankica Bratić ◽  
Michael E. Houle ◽  
Vladimir Kurbalija ◽  
Vincent Oria ◽  
Miloš Radovanović

The K-nearest neighbor graph (K-NNG) is a data structure used by many machine-learning algorithms. Naive computation of the K-NNG has quadratic time complexity, which in many cases is not efficient enough, creating the need for fast and accurate approximation algorithms. NN-Descent is one such algorithm that is highly efficient, but it has a major drawback: its K-NNG approximations are accurate only on data of low intrinsic dimensionality. This paper presents an experimental analysis of this behavior and investigates possible solutions. Experimental results show that there is a link between the performance of NN-Descent and the phenomenon of hubness, defined as the tendency of intrinsically high-dimensional data to contain hubs: points with large in-degrees in the K-NNG. First, we explain how the presence of the hubness phenomenon causes poor NN-Descent performance. In light of that, we propose four NN-Descent variants to alleviate the observed negative influence of hubs. By evaluating the proposed approaches on several real and synthetic data sets, we conclude that our approaches are more accurate, but often at the cost of higher scan rates.
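Hubness can be observed directly on a brute-force K-NNG, as in the sketch below (naive quadratic construction on random data, not NN-Descent itself): in high-dimensional data a few points accumulate in-degrees far above K.

```python
# Sketch of measuring hubness on a brute-force K-NNG: build the exact graph,
# then count in-degrees. Hubs show up as in-degrees well above k.
# (Naive quadratic construction, not the NN-Descent algorithm.)
import math
import random
from collections import Counter

def knn_graph(points, k):
    graph = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        graph[i] = others[:k]           # k nearest neighbors of point i
    return graph

random.seed(0)
dim, k = 50, 10                          # higher dimensionality -> stronger hubness
points = [tuple(random.random() for _ in range(dim)) for _ in range(300)]
in_degree = Counter(j for nbrs in knn_graph(points, k).values() for j in nbrs)
print("largest in-degrees:", [d for _, d in in_degree.most_common(5)])
```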


Geophysics ◽  
2018 ◽  
Vol 83 (1) ◽  
pp. O1-O13 ◽  
Author(s):  
Anders U. Waldeland ◽  
Hao Zhao ◽  
Jorge H. Faccipieri ◽  
Anne H. Schistad Solberg ◽  
Leiv-J. Gelius

The common-reflection-surface (CRS) method offers a stack with higher signal-to-noise ratio at the cost of a time-consuming semblance search to obtain the stacking parameters. We have developed a fast method for extracting the CRS parameters using local slope and curvature. We estimate the slope and curvature with the gradient structure tensor and quadratic structure tensor on stacked data. This is done under the assumption that a stacking velocity is already available. Our method was compared with an existing slope-based method, in which the slope is extracted from prestack data. An experiment on synthetic data shows that our method has increased robustness against noise compared with the existing method. When applied to two real data sets, our method achieves accuracy comparable with the pragmatic and full semblance searches. Our method has the advantage of being approximately two and four orders of magnitude faster than the semblance searches.
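As a generic illustration of slope estimation with a gradient structure tensor (not the authors' CRS parameter-extraction code; the synthetic section and window size are assumptions), one can accumulate gradient outer products over a local window and read the event slope off the dominant orientation:

```python
# Generic sketch of local-slope estimation with a 2-D gradient structure tensor:
# accumulate gradient products over a window, take the dominant orientation
# (the gradient direction), and note that the event runs perpendicular to it.
import math

def local_slope(img, i, j, half=2):
    jxx = jxt = jtt = 0.0
    for a in range(i - half, i + half + 1):        # rows = time samples
        for b in range(j - half, j + half + 1):    # cols = trace positions
            gt = (img[a + 1][b] - img[a - 1][b]) / 2.0   # central difference in t
            gx = (img[a][b + 1] - img[a][b - 1]) / 2.0   # central difference in x
            jxx += gx * gx
            jxt += gx * gt
            jtt += gt * gt
    theta = 0.5 * math.atan2(2.0 * jxt, jxx - jtt)   # gradient orientation
    return -1.0 / math.tan(theta)                    # event slope dt/dx (perpendicular)

# Synthetic section containing a single dipping event along t = 0.5 * x.
n = 32
section = [[math.exp(-((t - 0.5 * x) ** 2) / 18.0) for x in range(n)]
           for t in range(n)]
print(local_slope(section, i=8, j=16))   # roughly 0.5 for this event
```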


2020 ◽  
Vol 4 (4) ◽  
pp. 1-14
Author(s):  
Farrukh Mahmood ◽  
Shumaila Hashim ◽  
Uzma Iram ◽  
Muhammad Zubair Chishti

Research on wage disparities rarely accounts for differences in the cost of living because of data restrictions, even though wage disparity is a crucial area of interest for economists. This study examines wage disparities between high-wage and low-wage cities in the Punjab and Sindh provinces of Pakistan, with and without accounting for the cost of living, using data from the Pakistan Social and Living Standards Measurement Survey (PSLM) together with the Household Integrated Economic Survey (HIES) for 2005, 2007, 2010, and 2013. Applying the Oaxaca-Blinder estimation method, the findings show that, for both provinces, wage dispersion is higher in the model without the cost of living than in the model that includes it. Moreover, the results reveal that wage dispersion is greater in Punjab province than in Sindh province. For policymakers, our study suggests that the cost of living is an essential component of wage dispersion in Pakistan’s cities and should be considered when formulating wage policy.
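A minimal two-fold Oaxaca-Blinder decomposition with a single regressor can be sketched as follows; the schooling and log-wage numbers are invented, and the actual PSLM/HIES analysis involves many covariates.

```python
# Hedged two-fold Oaxaca-Blinder sketch with one regressor and invented data.
# statistics.linear_regression requires Python 3.10+.
from statistics import linear_regression, mean

# (years of schooling, log wage) for a "high-wage" and a "low-wage" city group
high = [(8, 9.6), (10, 10.1), (12, 10.5), (14, 10.9), (16, 11.4)]
low = [(6, 9.0), (8, 9.3), (10, 9.6), (12, 9.9), (14, 10.2)]

def fit(group):
    xs, ys = zip(*group)
    reg = linear_regression(xs, ys)              # slope and intercept by OLS
    return reg.slope, reg.intercept, mean(xs), mean(ys)

bh, ah, xh, yh = fit(high)
bl, al, xl, yl = fit(low)

gap = yh - yl                                    # mean log-wage gap
explained = bl * (xh - xl)                       # endowment (characteristics) part
unexplained = (ah - al) + xh * (bh - bl)         # coefficient ("price") part
print(f"gap={gap:.3f} explained={explained:.3f} unexplained={unexplained:.3f}")
```

By construction, the explained and unexplained parts sum exactly to the mean gap, which is what makes the decomposition useful for attributing disparities to characteristics versus returns.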


Author(s):  
Cong Gao ◽  
Ping Yang ◽  
Yanping Chen ◽  
Zhongmin Wang ◽  
Yue Wang

With the large-scale deployment of wireless sensor networks, anomaly detection for sensor data is becoming increasingly important in various fields. As a vital form of sensor data, time series exhibit three main types of anomaly: point anomaly, pattern anomaly, and sequence anomaly. In production environments, the analysis of pattern anomalies is the most rewarding. However, the traditional cloud-computing processing model struggles with large amounts of widely distributed data. This paper presents an edge-cloud collaboration architecture for pattern anomaly detection in time series. A task migration algorithm is developed to alleviate the problem of backlogged detection tasks at the edge nodes. In addition, the detection tasks related to long-term and short-term correlation in the time series are allocated to the cloud and the edge nodes, respectively. A multi-dimensional feature representation scheme is devised to perform efficient dimension reduction, and its two key components, trend identification and feature point extraction, are elaborated. Based on the resulting feature representation, pattern anomaly detection is performed with an improved kernel density estimation method. Finally, extensive experiments are conducted on both synthetic and real-world data sets.
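A generic sketch of density-based scoring of pattern features is given below (a plain Gaussian kernel density estimate over invented feature vectors, not the paper's improved estimator or its edge-cloud task split): patterns whose estimated density against a reference window falls below a threshold are flagged as anomalies.

```python
# Generic sketch of kernel-density anomaly scoring for pattern feature vectors.
# The features, reference window, and threshold are illustrative assumptions.
import math

def kde_score(x, reference, bandwidth=0.5):
    """Average Gaussian kernel between feature vector x and the reference set."""
    total = 0.0
    for r in reference:
        d2 = sum((a - b) ** 2 for a, b in zip(x, r))
        total += math.exp(-d2 / (2.0 * bandwidth ** 2))
    return total / len(reference)

# Feature vectors from sliding windows (e.g., trend slope, mean, range).
reference = [(0.10, 20.0, 1.0), (0.12, 20.3, 1.1), (0.09, 19.8, 0.9), (0.11, 20.1, 1.0)]
candidates = {"normal": (0.10, 20.2, 1.05), "anomalous": (0.90, 24.0, 3.0)}

threshold = 1e-3
for name, feat in candidates.items():
    score = kde_score(feat, reference)
    print(name, f"density={score:.4f}", "ANOMALY" if score < threshold else "ok")
```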

