Cost Modeling and Range Estimation for Top-k Retrieval in Relational Databases

Author(s):  
Anteneh Ayanso ◽  
Paulo B. Goes ◽  
Kumar Mehta

Relational databases have increasingly become the basis for a wide range of applications that require efficient methods for exploratory search and retrieval. Top-k retrieval addresses this need and involves finding a limited number of records whose attribute values are the closest to those specified in a query. One approach in the recent literature is query mapping, which converts top-k queries into equivalent range queries that relational database management systems (RDBMSs) normally support. This approach combines simplicity with practicality by avoiding modifications to the query engine and specialized data structures or indexing techniques for handling top-k queries separately. This paper reviews existing query-mapping techniques in the literature and presents a range query estimation method based on cost modeling. Experiments on real-world and synthetic data sets show that the cost-based range estimation method performs at least as well as prior methods and avoids the need to calibrate workloads on specific database contents.
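To make the query-mapping idea concrete, here is a minimal sketch (the `points` table, the L1 distance, and the growth factor are illustrative choices, not the authors' implementation): the top-k query is answered by issuing an ordinary range (window) query that the RDBMS can optimize as usual, ranking the retrieved tuples in the application, and restarting with a wider window whenever the current result cannot be proven to contain the true top k.

```python
# Minimal sketch of the query-mapping idea: a top-k query is answered by an
# equivalent range (window) query, with the window enlarged and the query
# restarted whenever the result cannot be proven to contain the true top k.
import sqlite3

def top_k_via_range(conn, target, k, half_width=0.05, growth=2.0):
    """Return the k rows of points(x, y) closest to `target` in L1 distance."""
    tx, ty = target
    dist = lambda row: abs(row[0] - tx) + abs(row[1] - ty)
    d = half_width
    while True:
        # Plain range query that the RDBMS evaluates with its usual access paths.
        rows = conn.execute(
            "SELECT x, y FROM points "
            "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ?",
            (tx - d, tx + d, ty - d, ty + d),
        ).fetchall()
        if len(rows) >= k:
            rows.sort(key=dist)
            if dist(rows[k - 1]) <= d:   # k-th match lies safely inside the window
                return rows[:k]
        d *= growth                       # window too small: enlarge and restart

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany("INSERT INTO points VALUES (?, ?)",
                 [(i * 0.01, (i * 37 % 100) * 0.01) for i in range(100)])
print(top_k_via_range(conn, target=(0.5, 0.5), k=3))
```

The containment check works because the L1 ball of radius equal to the k-th candidate distance always fits inside the square window of the same half-width, so no closer tuple can lie outside the window.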

2009 ◽  
Vol 20 (4) ◽  
pp. 1-25 ◽  
Author(s):  
Anteneh Ayanso ◽  
Paulo B. Goes ◽  
Kumar Mehta

Finding efficient methods for supporting top-k relational queries has received significant attention in academic research. One approach in the recent literature is query mapping, in which top-k queries are mapped (translated) into equivalent range queries that relational database management systems (RDBMSs) normally support. This approach combines simplicity with practicality by avoiding modifications to the query engine and specialized data structures or indexing techniques for handling top-k queries separately. However, existing methods following this approach fall short of adequately modeling the problem environment and providing consistent results. In this article, the authors propose a cost-based range estimation model for the query-mapping approach. They provide a methodology for trading off the relevant query execution cost components and mapping a top-k query into a cost-optimal range query for efficient execution. Their experiments on real-world and synthetic data sets show that the proposed strategy not only avoids the need to calibrate workloads on specific database contents, but also performs at least as well as prior methods.
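A highly simplified illustration of the cost trade-off follows (the selectivity model and the cost constants are invented for the sketch and are not the authors' cost model): a narrow window is cheap to evaluate but risks returning fewer than k tuples and forcing a restart, while a wide window avoids restarts at the price of retrieving many unnecessary tuples, so the mapped range is chosen to minimize the expected total cost.

```python
# Illustrative-only sketch of choosing the mapped range by expected cost.
# The selectivity estimate and cost constants below are stand-ins for the
# statistics a real cost model would supply.

def expected_cost(half_width, k, expected_rows, retrieval_cost=1.0, restart_cost=50.0):
    """Expected cost of answering a top-k query with a window of this half-width."""
    n = expected_rows(half_width)          # estimated tuples inside the window
    p_restart = 1.0 if n < k else 0.0      # crude stand-in for P(fewer than k rows)
    return n * retrieval_cost + p_restart * restart_cost

def best_window(k, expected_rows, candidates):
    return min(candidates, key=lambda d: expected_cost(d, k, expected_rows))

# Toy selectivity estimate: uniform data, so tuples grow with the window area.
density = 10_000                           # tuples per unit area (assumed)
rows_in = lambda d: density * (2 * d) ** 2

print(best_window(k=10, expected_rows=rows_in,
                  candidates=[0.01, 0.02, 0.05, 0.1, 0.2]))
```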


Sensors ◽  
2019 ◽  
Vol 19 (14) ◽  
pp. 3158
Author(s):  
Jian Yang ◽  
Xiaojuan Ban ◽  
Chunxiao Xing

With the rapid development of mobile networks and smart terminals, mobile crowdsourcing has attracted the interest of researchers and industry. In this paper, we propose a new solution to the problem of user selection in mobile crowdsourcing systems. Existing user selection schemes mainly fall into two categories: (1) finding a subset of users that maximizes crowdsourcing quality under a given budget constraint; and (2) finding a subset of users that minimizes cost while meeting a minimum crowdsourcing quality requirement. However, these solutions fall short of simultaneously maximizing the quality of service of the task and minimizing costs. Inspired by the marginalism principle in economics, we select a new user only when the marginal gain of the newly joined user is higher than the payment and the marginal cost associated with integration. We model the scheme as a marginalism problem of mobile crowdsourcing user selection (MCUS-marginalism). We rigorously prove that the MCUS-marginalism problem is NP-hard, and propose a greedy random adaptive procedure with annealing randomness (GRASP-AR) to maximize the gain and minimize the cost of the task. The effectiveness and efficiency of our proposed approach are clearly verified by large-scale experimental evaluations on both real-world and synthetic data sets.
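The marginal-gain rule can be sketched as a plain greedy loop (a simplified sketch with an invented coverage-based quality measure and made-up payment fields, not the paper's GRASP-AR procedure): a candidate is added only while the quality gain it contributes exceeds its payment plus the integration cost.

```python
# Greedy sketch of marginalism-based user selection: add a worker only while
# the marginal quality gain exceeds the marginal cost (payment + integration).
# The quality model and the worker records are illustrative assumptions.

def coverage_quality(selected):
    """Toy quality measure: number of distinct skills covered by the selection."""
    return len(set().union(*[w["skills"] for w in selected])) if selected else 0

def select_workers(candidates, integration_cost=0.5):
    selected = []
    remaining = list(candidates)
    while remaining:
        base = coverage_quality(selected)
        # Pick the candidate with the best (gain - cost) margin.
        best = max(remaining,
                   key=lambda w: coverage_quality(selected + [w]) - base
                                 - w["payment"] - integration_cost)
        margin = (coverage_quality(selected + [best]) - base
                  - best["payment"] - integration_cost)
        if margin <= 0:          # no remaining candidate is worth adding
            break
        selected.append(best)
        remaining.remove(best)
    return selected

workers = [
    {"id": 1, "skills": {"gps", "photo"}, "payment": 0.8},
    {"id": 2, "skills": {"photo"}, "payment": 0.2},
    {"id": 3, "skills": {"audio", "gps"}, "payment": 1.2},
]
print([w["id"] for w in select_workers(workers)])
```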


2020 ◽  
Vol 633 ◽  
pp. A46
Author(s):  
L. Siltala ◽  
M. Granvik

Context. The bulk density of an asteroid informs us about its interior structure and composition. To constrain the bulk density, one needs an estimated mass of the asteroid. The mass is estimated by analyzing an asteroid’s gravitational interaction with another object, such as another asteroid during a close encounter. An estimate for the mass has typically been obtained with linearized least-squares methods, even though this family of methods cannot properly describe non-Gaussian parameter distributions. In addition, the uncertainties reported for asteroid masses in the literature are sometimes inconsistent with each other and are suspected to be unrealistically low. Aims. We aim to present a Markov chain Monte Carlo (MCMC) algorithm for the asteroid mass estimation problem based on asteroid-asteroid close encounters. We verify that our algorithm works correctly by applying it to synthetic data sets. We use astrometry available through the Minor Planet Center to estimate masses for a select few example cases and compare our results with results reported in the literature. Methods. Our mass-estimation method is based on the robust adaptive Metropolis algorithm, which has been implemented in the OpenOrb asteroid orbit computation software. Our method has the built-in capability to analyze multiple perturbing asteroids and test asteroids simultaneously. Results. We find that our mass estimates for the synthetic data sets are fully consistent with the ground truth. The nominal masses for the real example cases typically agree with the literature but tend to have greater uncertainties than what is reported in recent literature. Possible reasons for this include different astrometric data sets and weights, different test asteroids, different force models, or different algorithms. For (16) Psyche, the target of NASA’s Psyche mission, our maximum-likelihood mass is approximately 55% of what is reported in the literature. Such a low mass would imply that the bulk density is significantly lower than previously expected and thus disagrees with the theory of (16) Psyche being the metallic core of a protoplanet. We do, however, note that masses reported in recent literature remain within our 3-sigma limits. Conclusions. The new MCMC mass-estimation algorithm performs as expected, but a rigorous comparison with results from a least-squares algorithm using the exact same data set remains to be done. The matters of uncertainties in comparison with other algorithms and correlations between observations also warrant further investigation.
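For illustration, a generic random-walk Metropolis sampler for a single mass-like parameter might look like the sketch below; the Gaussian log-likelihood is a toy stand-in for the astrometric residual model, and this is not the robust adaptive Metropolis implementation in OpenOrb.

```python
# Generic random-walk Metropolis sketch for a single mass-like parameter.
# The likelihood and the "observations" are invented for the illustration.
import math
import random

def log_likelihood(mass, observations, sigma=0.05):
    # Toy model: each observation is a noisy, direct measurement of the mass.
    return -0.5 * sum(((obs - mass) / sigma) ** 2 for obs in observations)

def metropolis(observations, n_steps=20_000, step=0.01, start=1.0, seed=0):
    rng = random.Random(seed)
    current = start
    current_ll = log_likelihood(current, observations)
    chain = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step)
        proposal_ll = log_likelihood(proposal, observations)
        delta = proposal_ll - current_ll
        # Accept with probability min(1, exp(delta)).
        if delta >= 0 or rng.random() < math.exp(delta):
            current, current_ll = proposal, proposal_ll
        chain.append(current)
    return chain

observations = [1.02, 0.97, 1.01, 0.99, 1.03]        # synthetic "measurements"
samples = metropolis(observations)[5_000:]           # discard burn-in
print("posterior mean:", sum(samples) / len(samples))
```

The appeal of MCMC here is that the chain itself characterizes non-Gaussian posteriors, rather than forcing a Gaussian approximation as linearized least squares does.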


2013 ◽  
Vol 748 ◽  
pp. 590-594
Author(s):  
Li Liao ◽  
Yong Gang Lu ◽  
Xu Rong Chen

We propose a novel density estimation method that uses both the k-nearest neighbor (KNN) graph and the potential field of the data points to capture local and global distribution information, respectively. Clustering is performed based on the computed density values: a forest of trees is built with each data point as a tree node, and the clusters are formed according to the trees in the forest. The new clustering method is evaluated against three popular clustering methods, K-means++, Mean Shift, and DBSCAN. Experiments on two synthetic data sets and one real data set show that our approach can effectively improve the clustering results.
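A simplified sketch of this style of clustering is shown below (the specific density formula, the cutoff parameter, and the example points are assumptions for illustration, not the paper's exact method): each point receives a density score combining a k-NN term and a potential-field term, every point is attached to its nearest higher-density neighbor within a cutoff, and the resulting trees define the clusters.

```python
# Simplified sketch of density-based tree clustering: score each point with a
# k-NN (local) term plus a potential-field (global) term, link every point to
# its nearest higher-density neighbor within a cutoff, and read the clusters
# off the resulting trees. The weighting and cutoff are illustrative choices.
import math

def density_scores(points, k=2, sigma=1.0):
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        local = 1.0 / (1e-9 + sum(dists[:k]) / k)                  # k-NN density
        field = sum(math.exp(-(d / sigma) ** 2) for d in dists)    # potential field
        scores.append(local + field)
    return scores

def cluster(points, k=2, cutoff=1.0):
    dens = density_scores(points, k)
    parent = list(range(len(points)))
    for i, p in enumerate(points):
        higher = [j for j in range(len(points)) if dens[j] > dens[i]]
        if higher:
            nearest = min(higher, key=lambda h: math.dist(p, points[h]))
            if math.dist(p, points[nearest]) <= cutoff:   # only link nearby peaks
                parent[i] = nearest
    def root(i):
        return i if parent[i] == i else root(parent[i])
    return [root(i) for i in range(len(points))]   # one root label per tree/cluster

pts = [(0, 0), (0.1, 0.2), (0.2, 0.1), (5, 5), (5.1, 5.2), (5.2, 4.9)]
print(cluster(pts))   # two distinct labels: one per dense group
```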


Author(s):  
Timothy M. Laseter

This case introduces a framework for cost modeling. Two data sets (one for injection-molded plastic parts and another for compressors) allow students to apply the cost-driver framework in conjunction with basic spreadsheet and regression analyses. Although obviously applicable in a course on supply chain management, the case can also be used to teach competitive cost analysis for strategic decision making.
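As a hedged illustration of the cost-driver framework the case teaches, one might regress unit cost on a single candidate driver and use the fitted line as a should-cost benchmark; the driver, the numbers, and the part type below are invented and are not data from the case.

```python
# Hedged illustration of cost-driver regression with made-up data.
# statistics.linear_regression requires Python 3.10+.
from statistics import linear_regression

weights_g = [12, 25, 40, 55, 80, 120]            # candidate cost driver: part weight
unit_costs = [0.31, 0.44, 0.62, 0.75, 1.02, 1.45]

fit = linear_regression(weights_g, unit_costs)
print(f"cost ~ {fit.intercept:.3f} + {fit.slope:.4f} * weight_g")
# The fitted line serves as a should-cost benchmark for quoting new parts.
print("predicted cost for a 60 g part:", fit.intercept + fit.slope * 60)
```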


2019 ◽  
Vol 28 (06) ◽  
pp. 1960002 ◽  
Author(s):  
Brankica Bratić ◽  
Michael E. Houle ◽  
Vladimir Kurbalija ◽  
Vincent Oria ◽  
Miloš Radovanović

The K-nearest neighbor graph (K-NNG) is a data structure used by many machine-learning algorithms. Naive computation of the K-NNG has quadratic time complexity, which in many cases is not efficient enough, creating the need for fast and accurate approximation algorithms. NN-Descent is one such algorithm that is highly efficient, but it has a major drawback: its K-NNG approximations are accurate only on data of low intrinsic dimensionality. This paper presents an experimental analysis of this behavior and investigates possible solutions. Experimental results show that there is a link between the performance of NN-Descent and the phenomenon of hubness, defined as the tendency of intrinsically high-dimensional data to contain hubs: points with large in-degrees in the K-NNG. First, we explain how the presence of the hubness phenomenon causes poor NN-Descent performance. In light of that, we propose four NN-Descent variants to alleviate the observed negative influence of hubs. By evaluating the proposed approaches on several real and synthetic data sets, we conclude that our approaches are more accurate, but often at the cost of higher scan rates.
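Hubness can be observed directly on a brute-force K-NNG, as in the sketch below (naive quadratic construction on random data, not NN-Descent itself): in high-dimensional data a few points accumulate in-degrees far above K.

```python
# Sketch of measuring hubness on a brute-force K-NNG: build the exact graph,
# then count in-degrees. Hubs show up as in-degrees well above k.
# (Naive quadratic construction, not the NN-Descent algorithm.)
import math
import random
from collections import Counter

def knn_graph(points, k):
    graph = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        graph[i] = others[:k]           # k nearest neighbors of point i
    return graph

random.seed(0)
dim, k = 50, 10                          # higher dimensionality -> stronger hubness
points = [tuple(random.random() for _ in range(dim)) for _ in range(300)]
in_degree = Counter(j for nbrs in knn_graph(points, k).values() for j in nbrs)
print("largest in-degrees:", [d for _, d in in_degree.most_common(5)])
```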


Geophysics ◽  
2018 ◽  
Vol 83 (1) ◽  
pp. O1-O13 ◽  
Author(s):  
Anders U. Waldeland ◽  
Hao Zhao ◽  
Jorge H. Faccipieri ◽  
Anne H. Schistad Solberg ◽  
Leiv-J. Gelius

The common-reflection-surface (CRS) method offers a stack with higher signal-to-noise ratio at the cost of a time-consuming semblance search to obtain the stacking parameters. We have developed a fast method for extracting the CRS parameters using local slope and curvature. We estimate the slope and curvature with the gradient structure tensor and quadratic structure tensor on stacked data. This is done under the assumption that a stacking velocity is already available. Our method was compared with an existing slope-based method, in which the slope is extracted from prestack data. An experiment on synthetic data shows that our method has increased robustness against noise compared with the existing method. When applied to two real data sets, our method achieves accuracy comparable with the pragmatic and full semblance searches. Our method has the advantage of being approximately two and four orders of magnitude faster than the semblance searches.
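As a generic illustration of slope estimation with a gradient structure tensor (not the authors' CRS parameter-extraction code; the synthetic section and window size are assumptions), one can accumulate gradient outer products over a local window and read the event slope off the dominant orientation:

```python
# Generic sketch of local-slope estimation with a 2-D gradient structure tensor:
# accumulate gradient products over a window, take the dominant orientation
# (the gradient direction), and note that the event runs perpendicular to it.
import math

def local_slope(img, i, j, half=2):
    jxx = jxt = jtt = 0.0
    for a in range(i - half, i + half + 1):        # rows = time samples
        for b in range(j - half, j + half + 1):    # cols = trace positions
            gt = (img[a + 1][b] - img[a - 1][b]) / 2.0   # central difference in t
            gx = (img[a][b + 1] - img[a][b - 1]) / 2.0   # central difference in x
            jxx += gx * gx
            jxt += gx * gt
            jtt += gt * gt
    theta = 0.5 * math.atan2(2.0 * jxt, jxx - jtt)   # gradient orientation
    return -1.0 / math.tan(theta)                    # event slope dt/dx (perpendicular)

# Synthetic section containing a single dipping event along t = 0.5 * x.
n = 32
section = [[math.exp(-((t - 0.5 * x) ** 2) / 18.0) for x in range(n)]
           for t in range(n)]
print(local_slope(section, i=8, j=16))   # roughly 0.5 for this event
```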


2020 ◽  
Vol 4 (4) ◽  
pp. 1-14
Author(s):  
Farrukh Mahmood ◽  
Shumaila Hashim ◽  
Uzma Iram ◽  
Muhammad Zubair Chishti

Research on wage disparities rarely accounts for differences in the cost of living because of data restrictions, even though wage disparity is a crucial area of interest for economists. This study examines wage disparities between high-wage and low-wage cities in the Punjab and Sindh provinces of Pakistan, with and without accounting for the cost of living, using data from the Pakistan Social and Living Standards Measurement Survey (PSLM) together with the Household Integrated Economic Survey (HIES) for 2005, 2007, 2010, and 2013. Applying the Oaxaca-Blinder estimation method, the findings show that, for both provinces, wage dispersion is higher in the model without the cost of living than in the model that includes it. Moreover, the results reveal that wage dispersion is greater in Punjab province than in Sindh province. For policymakers, our study suggests that the cost of living is an essential component of wage dispersion in Pakistan’s cities and should be considered when formulating wage policy.
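A minimal two-fold Oaxaca-Blinder decomposition with a single regressor can be sketched as follows; the schooling and log-wage numbers are invented, and the actual PSLM/HIES analysis involves many covariates.

```python
# Hedged two-fold Oaxaca-Blinder sketch with one regressor and invented data.
# statistics.linear_regression requires Python 3.10+.
from statistics import linear_regression, mean

# (years of schooling, log wage) for a "high-wage" and a "low-wage" city group
high = [(8, 9.6), (10, 10.1), (12, 10.5), (14, 10.9), (16, 11.4)]
low = [(6, 9.0), (8, 9.3), (10, 9.6), (12, 9.9), (14, 10.2)]

def fit(group):
    xs, ys = zip(*group)
    reg = linear_regression(xs, ys)              # slope and intercept by OLS
    return reg.slope, reg.intercept, mean(xs), mean(ys)

bh, ah, xh, yh = fit(high)
bl, al, xl, yl = fit(low)

gap = yh - yl                                    # mean log-wage gap
explained = bl * (xh - xl)                       # endowment (characteristics) part
unexplained = (ah - al) + xh * (bh - bl)         # coefficient ("price") part
print(f"gap={gap:.3f} explained={explained:.3f} unexplained={unexplained:.3f}")
```

By construction, the explained and unexplained parts sum exactly to the mean gap, which is what makes the decomposition useful for attributing disparities to characteristics versus returns.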


Author(s):  
Cong Gao ◽  
Ping Yang ◽  
Yanping Chen ◽  
Zhongmin Wang ◽  
Yue Wang

With the large-scale deployment of wireless sensor networks, anomaly detection for sensor data is becoming increasingly important in various fields. As a vital form of sensor data, time series exhibit three main types of anomaly: point anomaly, pattern anomaly, and sequence anomaly. In production environments, the analysis of pattern anomalies is the most rewarding. However, the traditional cloud-computing processing model struggles with large amounts of widely distributed data. This paper presents an edge-cloud collaboration architecture for pattern anomaly detection in time series. A task migration algorithm is developed to alleviate the problem of backlogged detection tasks at the edge nodes. In addition, the detection tasks related to long-term and short-term correlation in the time series are allocated to the cloud and the edge nodes, respectively. A multi-dimensional feature representation scheme is devised to perform efficient dimension reduction, and its two key components, trend identification and feature point extraction, are elaborated. Based on the resulting feature representation, pattern anomaly detection is performed with an improved kernel density estimation method. Finally, extensive experiments are conducted on both synthetic and real-world data sets.
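A generic sketch of density-based scoring of pattern features is given below (a plain Gaussian kernel density estimate over invented feature vectors, not the paper's improved estimator or its edge-cloud task split): patterns whose estimated density against a reference window falls below a threshold are flagged as anomalies.

```python
# Generic sketch of kernel-density anomaly scoring for pattern feature vectors.
# The features, reference window, and threshold are illustrative assumptions.
import math

def kde_score(x, reference, bandwidth=0.5):
    """Average Gaussian kernel between feature vector x and the reference set."""
    total = 0.0
    for r in reference:
        d2 = sum((a - b) ** 2 for a, b in zip(x, r))
        total += math.exp(-d2 / (2.0 * bandwidth ** 2))
    return total / len(reference)

# Feature vectors from sliding windows (e.g., trend slope, mean, range).
reference = [(0.10, 20.0, 1.0), (0.12, 20.3, 1.1), (0.09, 19.8, 0.9), (0.11, 20.1, 1.0)]
candidates = {"normal": (0.10, 20.2, 1.05), "anomalous": (0.90, 24.0, 3.0)}

threshold = 1e-3
for name, feat in candidates.items():
    score = kde_score(feat, reference)
    print(name, f"density={score:.4f}", "ANOMALY" if score < threshold else "ok")
```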

