Get a Sample for a Discount: Sampling-Based XML Data Pricing
While price and data quality should define the major trade-off for consumers in data markets, prices are usually prescribed by vendors and data quality is not negotiable. In this paper we study a model where data quality can be traded for a discount. We focus on the case of XML documents and consider completeness as the quality dimension. In our setting, the data provider offers an XML document, and sets both the price of the document and a weight for each node of the document, depending on its potential worth. The data consumer proposes a price. If the proposed price is lower than that of the entire document, then the data consumer receives a sample, i.e., a random rooted subtree of the document whose selection depends on the discounted price and the weight of nodes. By requesting several samples, the data consumer can iteratively explore the data in the document. We show that the uniform random sampling of a rooted subtree with prescribed weight is unfortunately intractable. However, we are able to identify several practical cases that are tractable. The first case is uniform random sampling of a rooted subtree with prescribed size; the second case restricts to binary weights. For both these practical cases we present polynomial-time algorithms and explain how they can be integrated into an iterative exploratory sampling approach.
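To make the first tractable case concrete, the following is a minimal sketch of one standard way to sample a rooted subtree of prescribed size uniformly at random: count the rooted subtrees of each size with a dynamic program, then walk down the tree choosing how many nodes each child contributes in proportion to those counts. It is not necessarily the algorithm of the paper. The tree encoding (a dict mapping each node to its list of children) and the names `build_counts`, `sample_subtree` and `uniform_rooted_subtree` are assumptions made for illustration, and the recursion is only suitable for small documents.

```python
import random

def build_counts(tree, v, k, counts, prefixes):
    """Fill counts[v][n] = number of rooted subtrees at v with exactly n nodes (n <= k)."""
    # prefix[i][j]: ways to pick j nodes in total from the first i child subtrees of v
    prefix = [[1] + [0] * k]              # no child used yet: only the empty choice
    for c in tree.get(v, []):
        build_counts(tree, c, k, counts, prefixes)
        opt = [1] + counts[c][1:]         # opt[s]: ways c contributes s nodes (s = 0: c excluded)
        prev = prefix[-1]
        prefix.append([sum(prev[j - s] * opt[s] for s in range(j + 1))
                       for j in range(k + 1)])
    prefixes[v] = prefix
    counts[v] = [0] + prefix[-1][:k]      # v itself always uses one node

def sample_subtree(tree, v, k, counts, prefixes, rng):
    """Draw one rooted subtree at v with k nodes; requires counts[v][k] > 0."""
    chosen = {v}
    children = tree.get(v, [])
    remaining = k - 1
    # Decide each child's share, last child first, proportionally to the
    # number of completions that the share leaves possible.
    for i in range(len(children), 0, -1):
        c = children[i - 1]
        opt = [1] + counts[c][1:]
        weights = [prefixes[v][i - 1][remaining - s] * opt[s]
                   for s in range(remaining + 1)]
        s = rng.choices(range(remaining + 1), weights=weights)[0]
        if s > 0:
            chosen |= sample_subtree(tree, c, s, counts, prefixes, rng)
        remaining -= s
    return chosen

def uniform_rooted_subtree(tree, root, k, rng=random):
    """Return a set of k nodes forming a rooted subtree, each such subtree equally likely."""
    if k < 1:
        raise ValueError("size must be at least 1")
    counts, prefixes = {}, {}
    build_counts(tree, root, k, counts, prefixes)
    if counts[root][k] == 0:
        raise ValueError("no rooted subtree of that size")
    return sample_subtree(tree, root, k, counts, prefixes, rng)

# Hypothetical document: element a has children b and c, and b has child d.
doc = {"a": ["b", "c"], "b": ["d"]}
print(uniform_rooted_subtree(doc, "a", 3))
```

The counting pass costs a number of arithmetic operations polynomial in the number of nodes and the requested size, which matches the polynomial-time claim for this case only in spirit; the weights passed to `random.choices` are converted to floating point, so uniformity is exact only up to that rounding.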