Get a Sample for a Discount: Sampling-Based XML Data Pricing

2017 ◽  
Author(s):  
Ruiming Tang ◽  
Antoine Amarilli ◽  
Pierre Senellart ◽  
Stéphane Bressan

While price and data quality should define the major trade-off for consumers in data markets, prices are usually prescribed by vendors and data quality is not negotiable. In this paper we study a model where data quality can be traded for a discount. We focus on the case of XML documents and consider completeness as the quality dimension. In our setting, the data provider offers an XML document, and sets both the price of the document and a weight for each node of the document, depending on its potential worth. The data consumer proposes a price. If the proposed price is lower than that of the entire document, then the data consumer receives a sample, i.e., a random rooted subtree of the document whose selection depends on the discounted price and the weight of nodes. By requesting several samples, the data consumer can iteratively explore the data in the document. We show that the uniform random sampling of a rooted subtree with prescribed weight is unfortunately intractable. However, we are able to identify several practical cases that are tractable. The first case is uniform random sampling of a rooted subtree with prescribed size; the second case restricts to binary weights. For both these practical cases we present polynomial-time algorithms and explain how they can be integrated into an iterative exploratory sampling approach.
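To make the first tractable case concrete, the following is a minimal sketch of one way to sample a rooted subtree with a prescribed number of nodes uniformly at random. It is an illustration under stated assumptions, not the authors' algorithm: the tree is assumed to be given as a `children` dictionary, and the function names `subtree_tables` and `sample_subtree` are hypothetical. The idea is a standard counting recursion: first count, for every node, how many rooted subtrees of each size exist below it, then walk down from the root and split the remaining size budget among children with probabilities proportional to those counts.

```python
import random

def subtree_tables(children, root, k):
    """Count rooted subtrees by size (hypothetical helper, not from the paper).

    count[v][s]     = number of subtrees rooted at v with exactly s nodes (0 <= s <= k)
    prefix[v][i][j] = number of ways to take j nodes in total below v using only
                      its first i children (each child is absent or a subtree root)
    """
    count, prefix = {}, {}

    def visit(v):
        rows = [[1] + [0] * k]                  # no child considered yet
        for c in children.get(v, []):
            visit(c)
            h = [1] + count[c][1:]              # h[0] = 1: child c is left out
            prev = rows[-1]
            rows.append([sum(prev[j - t] * h[t] for t in range(j + 1))
                         for j in range(k + 1)])
        prefix[v] = rows
        count[v] = [0] + rows[-1][:k]           # v itself accounts for one node

    visit(root)
    return count, prefix

def sample_subtree(children, root, k, rng=random):
    """Return a uniformly random rooted subtree with exactly k nodes, as a set."""
    count, prefix = subtree_tables(children, root, k)
    if count[root][k] == 0:
        raise ValueError("no rooted subtree with that many nodes")
    chosen = []

    def pick(v, s):
        chosen.append(v)
        budget = s - 1                          # nodes still to place below v
        kids = children.get(v, [])
        for i in range(len(kids), 0, -1):       # decide children last to first
            c = kids[i - 1]
            h = [1] + count[c][1:]
            weights = [prefix[v][i - 1][budget - t] * h[t]
                       for t in range(budget + 1)]
            t = rng.choices(range(budget + 1), weights=weights)[0]
            if t > 0:
                pick(c, t)
            budget -= t

    pick(root, k)
    return set(chosen)

# Toy document tree (an assumption for illustration): a has children b and c, b has child d.
toy = {"a": ["b", "c"], "b": ["d"]}
sample = sample_subtree(toy, "a", 3)            # either {a, b, c} or {a, b, d}
```

The counting tables have polynomial size in the number of nodes and the target size, which is consistent with the abstract's claim that the prescribed-size case is tractable. Arbitrary node weights would make the table dimension depend on the numeric weight values, which is where a pseudo-polynomial bound would come from.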

2017 ◽  
Author(s):  
Ruiming Tang ◽  
Antoine Amarilli ◽  
Pierre Senellart ◽  
Stéphane Bressan

While price and data quality should define the major trade-off for consumers in data markets, prices are usually prescribed by vendors and data quality is not negotiable. In this paper we study a model where data quality can be traded for a discount. We focus on the case of XML documents and consider completeness as the quality dimension. In our setting, the data provider offers an XML document, and sets both the price of the document and a weight for each node of the document, depending on its potential worth. The data consumer proposes a price. If the proposed price is lower than that of the entire document, then the data consumer receives a sample, i.e., a random rooted subtree of the document whose selection depends on the discounted price and the weight of nodes. By requesting several samples, the data consumer can iteratively explore the data in the document. We present a pseudo-polynomial time algorithm to select a rooted subtree with prescribed weight uniformly at random, but show that this problem is unfortunately intractable. Yet, we are able to identify several practical cases where our algorithm runs in polynomial time. The first case is uniform random sampling of a rooted subtree with prescribed size rather than weights; the second case restricts to binary weights. As a more challenging scenario for the sampling problem, we also study the uniform sampling of a rooted subtree of prescribed weight and prescribed height. We adapt our pseudo-polynomial time algorithm to this setting and identify tractable cases.


2015 ◽  
Vol 2015 ◽  
pp. 1-11 ◽  
Author(s):  
Yue Zhao ◽  
Ye Yuan ◽  
Guoren Wang

This paper describes a keyword search method for probabilistic XML data based on ELM (extreme learning machine). Keyword search on a probabilistic XML document differs from keyword search on a traditional XML document in that it must take possible-world semantics into account. A probabilistic XML document can be seen as a set of nodes consisting of ordinary nodes and distributional nodes. ELM performs well in text classification applications. XML is typical semistructured data whose labels are self-describing, so the label and context of a node can be treated as the text data of that node. ELM offers significant advantages such as fast learning speed, ease of implementation, and effective node classification. Once the nodes have been classified using ELM, set intersection can compute SLCA quickly over the resulting node sets. In this paper, we adopt ELM to classify nodes and compute probabilities. We propose two algorithms, based on ELM and a probability threshold, to improve overall performance. The experimental results verify the benefits of our methods according to various evaluation metrics.


10.37236/3977 ◽  
2016 ◽  
Vol 23 (1) ◽  
Author(s):  
O. Bodini ◽  
A. Genitrini ◽  
F. Peschanski

In this paper, we study the interleaving – or pure merge – operator that most often characterizes parallelism in concurrency theory. This operator is a principal cause of the so-called combinatorial explosion that makes the analysis of process behaviours, e.g. by model checking, very hard, at least from the point of view of computational complexity. The originality of our approach is to study this combinatorial explosion phenomenon on average, relying on advanced analytic combinatorics techniques. We study various measures that contribute to a better understanding of the process behaviours represented as plane rooted trees: the number of runs (corresponding to the width of the trees), the expected total size of the trees, and their overall shape. Two practical outcomes of our quantitative study are also presented: (1) a linear-time algorithm to compute the probability of a concurrent run prefix, and (2) an efficient algorithm for uniform random sampling of concurrent runs. These provide interesting responses to the combinatorial explosion problem.
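As an illustration of why counting and sampling runs can be cheap even though the number of runs explodes, here is a minimal sketch based on the classical hook-length formula for trees. It assumes a process is a rooted tree in which each node is an action that must occur after its parent, so a run is an ordering of the nodes that respects the tree. The function names and the `children` dictionary representation are assumptions for this example, and the abstract does not confirm that this is the authors' algorithm.

```python
import random
from math import factorial

def subtree_sizes(children, root):
    """Map every node to the number of nodes in the subtree rooted at it."""
    sizes = {}

    def visit(v):
        sizes[v] = 1 + sum(visit(c) for c in children.get(v, []))
        return sizes[v]

    visit(root)
    return sizes

def count_runs(children, root):
    """Number of runs: n! divided by the product of all subtree sizes."""
    sizes = subtree_sizes(children, root)
    denom = 1
    for s in sizes.values():
        denom *= s
    return factorial(len(sizes)) // denom

def sample_run(children, root, rng=random):
    """Draw one run uniformly at random.

    At each step, an enabled action is picked with probability proportional
    to the size of its subtree; by the hook-length formula this makes every
    complete run equally likely.
    """
    sizes = subtree_sizes(children, root)
    available = [root]
    run = []
    while available:
        v = rng.choices(available, weights=[sizes[a] for a in available])[0]
        available.remove(v)
        run.append(v)
        available.extend(children.get(v, []))   # children become enabled
    return run

# Toy process (an assumption for illustration): a, then (b then d) in parallel with c.
toy = {"a": ["b", "c"], "b": ["d"]}
total = count_runs(toy, "a")                    # 4! / (4 * 2 * 1 * 1) = 3
one_run = sample_run(toy, "a")
```

For the toy process the three runs are a b c d, a b d c, and a c b d, which matches the formula.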


2018 ◽  
Vol 1 (1) ◽  
pp. 17
Author(s):  
Saripin Saripin ◽  
Kristi Agust ◽  
Rafni Rafni

Based on the author's observations of the table tennis learning process, the students' backhand ability is poor. This is evident from backhands that are not yet optimal when returning the ball and when performing a backhand smash. The suspected causes are a number of weak physical-condition factors: wrist flexibility, arm and shoulder muscle strength, muscular endurance, balance, back flexibility, and reaction speed. The research instruments were a wrist flexibility test and a backhand test. The sampling technique was purposive random sampling, with a sample of 14 male students of the Penjaskerek study program (Prodi penjaskerek). This is a correlational study. The data were analyzed using the Product Moment correlation formula. Based on the results and the statistical processing of the data, the study concludes that there is a trade-off between wrist flexibility and backhand ability, with a computed r (rhitung) of -0058.


Author(s):  
Mohammed Ragheb Hakawati ◽  
Yasmin Yacob ◽  
Amiza Amir ◽  
Jabiry M. Mohammed ◽  
Khalid Jamal Jadaa

Extensible Markup Language (XML) is emerging as the primary standard for representing and exchanging data; accounting for more than 60% of the total, it is considered the most dominant document type on the web. Nevertheless, the quality of XML documents is not as expected. XML integrity constraints, especially XML functional dependencies (XFDs), play an important role in keeping an XML dataset as consistent as possible, but their ability to solve data quality issues remains unclear. The main reason is that traditional data dependencies were introduced to maintain the consistency of the schema rather than that of the data. The purpose of this study is to introduce a method for discovering pattern tableaus for XML conditional dependencies, to be used for enhancing XML document consistency as part of the data quality improvement phases. Conditional dependencies are new rules designed mainly to improve data instances; they extend traditional XML dependencies by enforcing pattern tableaus of semantically related constants. A set of minimal approximate conditional dependencies (XCFD, XCIND) is then discovered and learned from the XML tree using a set of mining algorithms. The discovered patterns can be used as master data to detect inconsistencies that do not agree with the majority of the dataset.


2021 ◽  
Vol 3 (3) ◽  
pp. 73-77
Author(s):  
Volkan Emirdar ◽  
Gulcin Ekizceli ◽  
Yagmur Dilber ◽  
Sevinc Inan ◽  
Muzaffer Sanci

Objective: The aim of the study is to show the relation of T cells in placental villous fragments with FOXP3, JAK1 and STAT5 receptors in different conditions such as GDM, PE and IUGR placental tissues. Methods: Specimens of ten diabetic placentas, ten preeclamptic placentas, ten intrauterine growth restricted placentas and ten control placentas were collected by systematic uniform random sampling. Immunohistochemical detection of FOXP3, JAK1 and STAT5 was performed in histological sections of each group's placental tissue. The H-score value was derived for each specimen from the percentages of syncytiotrophoblast and syncytial nodes in the placenta and intervillous area, categorized by intensity of staining, with each percentage multiplied by its respective intensity score and the results summed. Results: FOXP3, JAK1 and STAT5 immunoreactivity comparisons are shown for the four groups of placentas. FOXP3 immunoreactions increase significantly in the GDM group. JAK1 and STAT5 immunoreactions decrease significantly in the PE group. STAT5 immunoreactivity was markedly increased in the GDM group. Discussion: The results showed that in different conditions such as PE, GDM and IUGR, T cells in placental villous fragments are related to FOXP3, JAK1 and STAT5 receptors, and that FOXP3 can inactivate PE and IUGR in the placental tissue. In line with other studies, we also confirmed that the JAK-STAT pathway plays an important role in PE, IUGR and GDM placental tissue.

