Get a Sample for a Discount: Sampling-Based XML Data Pricing

While price and data quality should define the major trade-off for consumers in data markets, prices are usually prescribed by vendors and data quality is not negotiable. In this paper we study a model where data quality can be traded for a discount. We focus on the case of XML documents and consider completeness as the quality dimension. In our setting, the data provider offers an XML document, and sets both the price of the document and a weight to each node of the document, depending on its potential worth. The data consumer proposes a price. If the proposed price is lower than that of the entire document, then the data consumer receives a sample, i.e., a random rooted subtree of the document whose selection depends on the discounted price and the weight of nodes. By requesting several samples, the data consumer can iteratively explore the data in the document. We show that the uniform random sampling of a rooted subtree with prescribed weight is unfortunately intractable. However, we are able to identify several practical cases that are tractable. The first case is uniform random sampling of a rooted subtree with prescribed size; the second case restricts to binary weights. For both these practical cases we present polynomial-time algorithms and explain how they can be integrated into an iterative exploratory sampling approach.

A Framework for Sampling-Based XML Data Pricing

10.31219/osf.io/m3xcn ◽

2017 ◽

Author(s):

Ruiming Tang ◽

Antoine Amarilli ◽

Pierre Senellart ◽

Stéphane Bressan

Keyword(s):

Data Quality ◽

Polynomial Time ◽

Polynomial Time Algorithm ◽

Time Algorithm ◽

Uniform Sampling ◽

First Case ◽

Sampling Problem ◽

Uniform Random Sampling ◽

Data Consumer ◽

Pseudo Polynomial Time

While price and data quality should define the major tradeoff for consumers in data markets, prices are usually prescribed by vendors and data quality is not negotiable. In this paper we study a model where data quality can be traded for a discount. We focus on the case of XML documents and consider completeness as the quality dimension. In our setting, the data provider offers an XML document, and sets both the price of the document and a weight to each node of the document, depending on its potential worth. The data consumer proposes a price. If the proposed price is lower than that of the entire document, then the data consumer receives a sample, i.e., a random rooted subtree of the document whose selection depends on the discounted price and the weight of nodes. By requesting several samples, the data consumer can iteratively explore the data in the document. We present a pseudo-polynomial time algorithm to select a rooted subtree with prescribed weight uniformly at random, but show that this problem is unfortunately intractable. Yet, we are able to identify several practical cases where our algorithm runs in polynomial time. The first case is uniform random sampling of a rooted subtree with prescribed size rather than weights; the second case restricts to binary weights. As a more challenging scenario for the sampling problem, we also study the uniform sampling of a rooted subtree of prescribed weight and prescribed height. We adapt our pseudo-polynomial time algorithm to this setting and identify tractable cases.
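To make the prescribed-size case concrete, the following is a minimal sketch of one standard way to sample a rooted subtree of exactly k nodes uniformly at random: count the candidates bottom-up with dynamic programming, then draw top-down in proportion to those counts. It is not taken from the paper and may differ from the authors' algorithm. It assumes that a rooted subtree is a set of nodes that contains the root and, with every node, also its parent. The tree encoding (a dict from node to list of children) and the names `build_tables` and `sample_subtree` are hypothetical.

```python
import random

# Hypothetical encoding: tree maps each node to the list of its children;
# leaves may simply be absent from the dict.


def build_tables(tree, root, k):
    """Count rooted subtrees by size (assumed definition: parent-closed sets).

    cnt[v][s]    = number of rooted subtrees with exactly s nodes inside the
                   subtree hanging at v; cnt[v][0] = 1 stands for "v left out".
    pre[v][i][s] = number of ways to pick s nodes in total from the first i
                   child subtrees of v.
    """
    cnt, pre = {}, {}

    def visit(v):  # recursive for brevity; very deep trees need an explicit stack
        rows = [[1] + [0] * k]  # no children considered yet: only the empty pick
        for c in tree.get(v, []):
            visit(c)
            prev, row = rows[-1], [0] * (k + 1)
            for s in range(k + 1):
                if prev[s]:
                    for t in range(k + 1 - s):
                        row[s + t] += prev[s] * cnt[c][t]
            rows.append(row)
        pre[v] = rows
        # v itself uses one node; the other s - 1 are spread over its children
        cnt[v] = [1] + [rows[-1][s - 1] for s in range(1, k + 1)]

    visit(root)
    return cnt, pre


def sample_subtree(tree, root, k, rng=random):
    """Return a uniformly random rooted subtree with exactly k nodes, as a set."""
    if k < 1:
        raise ValueError("k must be at least 1")
    cnt, pre = build_tables(tree, root, k)
    if cnt[root][k] == 0:
        raise ValueError("no rooted subtree of that size")
    chosen = set()

    def pick(v, size):  # size >= 1, so v is part of the sample
        chosen.add(v)
        remaining = size - 1
        children = tree.get(v, [])
        for i in range(len(children), 0, -1):
            c = children[i - 1]
            # weights[t]: completions that give t nodes to child i and the
            # rest to children 1..i-1
            weights = [pre[v][i - 1][remaining - t] * cnt[c][t]
                       for t in range(remaining + 1)]
            r = rng.randrange(sum(weights))  # exact integer arithmetic
            t = 0
            while r >= weights[t]:
                r -= weights[t]
                t += 1
            if t:
                pick(c, t)
            remaining -= t

    pick(root, k)
    return chosen


if __name__ == "__main__":
    doc = {"r": ["a", "b"], "a": ["c"]}
    print(sample_subtree(doc, "r", 3))
```

Under the same assumptions, charging each node an integer weight instead of 1 in the tables would give a running time that depends on the numeric value of the target weight, which is the usual sense of "pseudo-polynomial".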

Keyword Search over Probabilistic XML Documents Based on Node Classification

Mathematical Problems in Engineering ◽

10.1155/2015/210961 ◽

2015 ◽

Vol 2015 ◽

pp. 1-11 ◽

Cited By ~ 1

Author(s):

Yue Zhao ◽

Ye Yuan ◽

Guoren Wang

Keyword(s):

Keyword Search ◽

Possible World ◽

Xml Data ◽

Fast Learning ◽

Probabilistic Xml ◽

Learning Speed ◽

Xml Document ◽

Probability Threshold ◽

Node Classification ◽

Learning Machine

This paper describes a keyword search method for probabilistic XML data based on ELM (extreme learning machine). Keyword search over a probabilistic XML document differs from keyword search over a traditional XML document in that possible-world semantics must be considered. A probabilistic XML document can be seen as a set of nodes consisting of ordinary nodes and distributional nodes. ELM performs well in text classification applications. XML is typical semistructured data whose labels are self-describing, so the label and context of a node can be treated as the text data of that node. ELM offers significant advantages such as fast learning speed, ease of implementation, and effective node classification. Set intersection can compute the SLCA (smallest lowest common ancestor) quickly over the node sets classified by ELM. In this paper, we adopt ELM to classify nodes and compute probabilities. We propose two algorithms based on ELM and a probability threshold to improve overall performance. The experimental results verify the benefits of our methods according to various evaluation metrics.

Uniform random sampling not recommended for large graph size estimation

Information Sciences ◽

10.1016/j.ins.2017.08.030 ◽

2017 ◽

Vol 421 ◽

pp. 136-153 ◽

Cited By ~ 2

Author(s):

Jianguo Lu ◽

Hao Wang

Keyword(s):

Random Sampling ◽

Size Estimation ◽

Large Graph ◽

Graph Size ◽

Uniform Random Sampling

A Quantitative Study of Pure Parallel Processes

The Electronic Journal of Combinatorics ◽

10.37236/3977 ◽

2016 ◽

Vol 23 (1) ◽

Author(s):

O. Bodini ◽

A. Genitrini ◽

F. Peschanski

Keyword(s):

Quantitative Study ◽

Random Sampling ◽

Linear Time ◽

Time Algorithm ◽

Point Of View ◽

Total Size ◽

Combinatorial Explosion ◽

Analytic Combinatorics ◽

Parallel Processes ◽

Uniform Random Sampling

In this paper, we study the interleaving – or pure merge – operator that most often characterizes parallelism in concurrency theory. This operator is a principal cause of the so-called combinatorial explosion that makes the analysis of process behaviours, e.g. by model-checking, very hard – at least from the point of view of computational complexity. The originality of our approach is to study this combinatorial explosion phenomenon on average, relying on advanced analytic combinatorics techniques. We study various measures that contribute to a better understanding of the process behaviours represented as plane rooted trees: the number of runs (corresponding to the width of the trees), the expected total size of the trees, as well as their overall shape. Two practical outcomes of our quantitative study are also presented: (1) a linear-time algorithm to compute the probability of a concurrent run prefix, and (2) an efficient algorithm for uniform random sampling of concurrent runs. These provide interesting responses to the combinatorial explosion problem.

The Relationship Between Wrist Flexibility and the Backhand in Table Tennis Learning Among Students of the Physical Education, Health and Recreation Program

Journal Of Sport Education (JOPE) ◽

10.31258/jope.1.1.17-21 ◽

2018 ◽

Vol 1 (1) ◽

pp. 17

Author(s):

Saripin Saripin ◽

Kristi Agust ◽

Rafni Rafni

Keyword(s):

Random Sampling ◽

Trade Off

Based on the author's observation of the table tennis learning process, students' backhand ability was poor. This was evident from backhands that were not yet optimal in returning the ball and in performing the backhand smash. A number of weak factors are suspected, such as physical condition: wrist flexibility, arm and shoulder muscle strength, muscular endurance, balance, back flexibility, and reaction speed. The research instruments were a wrist flexibility test and a backhand test. The sampling technique is described as purposive; the study used random sampling with a sample of 14 male students of the physical education, health and recreation study program. This is a correlational study. The data were then analyzed using the product-moment correlation formula. Based on the results and on processing the data with statistical procedures, the study concludes that there is a trade-off between wrist flexibility and backhand ability, with a computed r of -0058.

Discovering XML Conditional Dependencies for Data Quality Issues

European Journal of Electrical Engineering and Computer Science ◽

10.24018/ejece.2020.4.1.156 ◽

2020 ◽

Vol 4 (1) ◽

Author(s):

Mohammed Ragheb Hakawati ◽

Yasmin Yacob ◽

Amiza Amir ◽

Jabiry M. Mohammed ◽

Khalid Jamal Jadaa

Keyword(s):

Data Quality ◽

Primary Standard ◽

Markup Language ◽

Document Type ◽

Data Dependencies ◽

Master Data ◽

Xml Document ◽

Extensible Markup ◽

Quality Issues ◽

Mining Algorithms

Extensible Markup Language (XML) is emerging as the primary standard for representing and exchanging data; at more than 60% of the total, XML is considered the most dominant document type on the web. Nevertheless, the quality of XML data is not as expected. XML integrity constraints, especially XFDs, play an important role in keeping XML datasets as consistent as possible, but their ability to solve data quality issues is still intangible. The main reason is that old-fashioned data dependencies were introduced to maintain the consistency of the schema rather than that of the data. The purpose of this study is to introduce a method for discovering pattern tableaus for XML conditional dependencies, to be used for enhancing XML document consistency as part of the data quality improvement phases. Conditional dependencies, as new rules, are designed mainly to improve data instances, and they extend traditional XML dependencies by enforcing pattern tableaus of semantically related constants. Subsequently, a set of minimal approximate conditional dependencies (XCFD, XCIND) is discovered and learned from the XML tree using a set of mining algorithms. The discovered patterns can be used as master data in order to detect inconsistencies that do not respect the majority of the dataset.

Mapping Bitemporal XML Data Model to XML Document

Lecture Notes in Computer Science - Computer Supported Cooperative Work in Design IV ◽

10.1007/978-3-540-92719-8_31 ◽

2008 ◽

pp. 342-352 ◽

Cited By ~ 1

Author(s):

Na Tang ◽

Yong Tang

Keyword(s):

Data Model ◽

Xml Data ◽

Xml Document

Manual Vs. Automated Assessment Of Mean Linear Intercept Using Systematic Uniform Random Sampling Combined With Statistical Parameter Optimization

10.1164/ajrccm-conference.2011.183.1_meetingabstracts.a5214 ◽

2011 ◽

Author(s):

Joseph P. Foley ◽

Seungtaek Lee ◽

Edit Kurali ◽

Charles Kotzer ◽

Brian J. Bolognese ◽

...

Keyword(s):

Parameter Optimization ◽

Random Sampling ◽

Statistical Parameter ◽

Automated Assessment ◽

Mean Linear Intercept ◽

Uniform Random Sampling

Uniform Random Sampling Product Configurations of Feature Models That Have Numerical Features

Proceedings of the 23rd International Systems and Software Product Line Conference - volume A - SPLC '19 ◽

10.1145/3336294.3336297 ◽

2019 ◽

Cited By ~ 3

Author(s):

Daniel-Jesus Munoz ◽

Jeho Oh ◽

Mónica Pinto ◽

Lidia Fuentes ◽

Don Batory

Keyword(s):

Random Sampling ◽

Feature Models ◽

Uniform Random Sampling

Immunolocalization of FOXP3, JAK1 and STAT5 in Preeclamptic, Intrauterine Growth Restricted and Gestational Diabetic Human Placentas

Aegean Journal of Obstetrics and Gynecology ◽

10.46328/aejog.v3i3.101 ◽

2021 ◽

Vol 3 (3) ◽

pp. 73-77

Author(s):

Volkan Emirdar ◽

Gulcin Ekizceli ◽

Yagmur Dilber ◽

Sevinc Inan ◽

Muzaffer Sanci

Keyword(s):

T Cells ◽

Random Sampling ◽

Group Discussion ◽

Placental Tissue ◽

Intrauterine Growth ◽

Histological Sections ◽

Uniform Random Sampling ◽

Stat Pathway

Objective: The aim of the study was to show the relation of T cells in placental villous fragments with FOXP3, JAK1 and STAT5 receptors in different conditions such as GDM, PE and IUGR placental tissues. Methods: Specimens of ten (10) diabetic placentas, ten (10) preeclamptic placentas, ten (10) intrauterine growth restricted placentas and ten (10) control placentas were collected by systematic uniform random sampling. Immunohistochemical detection of FOXP3, JAK1 and STAT5 was performed in histological sections of each group's placental tissue. The H-score value was derived for each specimen by calculating the sum of the percentage of syncytiotrophoblast and syncytial nodes in the placenta and intervillus area. They were categorized by intensity of staining, multiplied by its respective score. Results: FOXP3, JAK1 and STAT5 immunoreactivity comparisons are shown for the four groups of placentas. FOXP3 immunoreactions significantly increase in the GDM group. JAK1 and STAT5 immunoreactions significantly decrease in the PE group. STAT5 immunoreactivity was found to increase markedly in the GDM group. Discussion: The results showed that in different conditions such as PE, GDM and IUGR, T cells in placental villous fragments are related to FOXP3, JAK1 and STAT5 receptors, and that FOXP3 can inactivate the PE and IUGR in the placental tissue. We have also confirmed, as in other studies, that the JAK-STAT pathway plays an important role in PE, IUGR and GDM placental tissue.
