Modeling, Querying, and Mining Uncertain XML Data

This chapter deals with data mining in uncertain XML data models, whose uncertainty typically comes from imprecise automatic processes. We first review the literature on modeling uncertain data, starting with well-studied relational models and moving then to their semistructured counterparts. We focus on a specific probabilistic XML model, which allows representing arbitrary finite distributions of XML documents, and has been extended to also allow continuous distributions of data values. We summarize previous work on querying this uncertain data model and show how to apply the corresponding techniques to several data mining tasks, exemplified through use cases on two running examples.

A Study of XML Models for Data Mining

Data Mining ◽

10.4018/978-1-4666-2455-9.ch001 ◽

2013 ◽

pp. 1-27

Author(s):

Sangeetha Kutty ◽

Richi Nayak ◽

Tien Tran

Keyword(s):

Data Mining ◽

Data Model ◽

Data Representation ◽

Data Models ◽

Data Mining Techniques ◽

Xml Documents ◽

The Future ◽

Future Data

With the increasing number of XML documents in varied domains, it has become essential to identify ways of finding interesting information in these documents. Data mining techniques can be used to derive this information. However, because XML documents are semi-structured, mining them is affected by the data model used to represent them. In this chapter, we present an overview of the various models for representing XML documents, how these models are used for mining, and some of the issues and challenges inherent in these models. The chapter also provides some insights into future data models of XML documents that effectively capture their two important features, structure and content, for mining.

A Study of XML Models for Data Mining

Advances in Data Mining and Database Management - XML Data Mining ◽

10.4018/978-1-61350-356-0.ch001 ◽

2011 ◽

pp. 1-28

Author(s):

Sangeetha Kutty ◽

Richi Nayak ◽

Tien Tran

Keyword(s):

Data Mining ◽

Data Model ◽

Data Representation ◽

Data Models ◽

Data Mining Techniques ◽

Xml Documents ◽

The Future ◽

Future Data

With the increasing number of XML documents in varied domains, it has become essential to identify ways of finding interesting information in these documents. Data mining techniques can be used to derive this information. However, because XML documents are semi-structured, mining them is affected by the data model used to represent them. In this chapter, we present an overview of the various models for representing XML documents, how these models are used for mining, and some of the issues and challenges inherent in these models. The chapter also provides some insights into future data models of XML documents that effectively capture their two important features, structure and content, for mining.

On the Connections between Relational and XML Probabilistic Data Models

10.31219/osf.io/t6ghw ◽

2017 ◽

Author(s):

Antoine Amarilli ◽

Pierre Senellart

Keyword(s):

Relational Databases ◽

Possible Worlds ◽

Probability Distributions ◽

Uncertain Data ◽

Data Models ◽

Query Complexity ◽

Probabilistic Data ◽

Probabilistic Xml ◽

Compact Representations ◽

The Impact

A number of uncertain data models have been proposed, based on the notion of compact representations of probability distributions over possible worlds. In probabilistic relational models, tuples are annotated with probabilities or formulae over Boolean random variables. In probabilistic XML models, XML trees are augmented with nodes that specify probability distributions over their children. Both kinds of models have been extensively studied, with respect to their expressive power, compactness, and query efficiency, among other things. Probabilistic database systems have also been implemented, in both relational and XML settings. However, these studies have mostly been carried out independently and the translations between relational and XML models, as well as the impact for probabilistic relational databases of results about query complexity in probabilistic XML and vice versa, have not been made explicit: we detail such translations in this article, in both directions, study their impact in terms of complexity results, and present interesting open issues about the connections between relational and XML probabilistic data models.
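
The idea of an XML tree whose special nodes define a distribution over their children can be illustrated with a small sketch. The code below is not taken from the article: the class names `Node` and `Ind`, the sample document, and the label query are all hypothetical. It assumes one simple kind of distributional node, in which each child is kept independently with a given probability, and it enumerates possible worlds by brute force, which is exponential and meant only for illustration.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Node:
    """Ordinary XML element; children may be Node or Ind instances."""
    label: str
    children: list = field(default_factory=list)


@dataclass
class Ind:
    """Hypothetical distributional node: each (probability, Node) choice
    is kept independently of the others."""
    choices: list


def worlds(node):
    """Yield (probability, tree) for every possible world rooted at `node`.

    A tree is a nested tuple: (label, (child_tree, ...)).
    """
    per_child = []  # one list of (probability, subtrees) alternatives per slot
    for child in node.children:
        if isinstance(child, Ind):
            for p, sub in child.choices:
                alts = [(1.0 - p, ())]                            # child absent
                alts += [(p * q, (t,)) for q, t in worlds(sub)]   # child present
                per_child.append(alts)
        else:
            per_child.append([(q, (t,)) for q, t in worlds(child)])
    for combo in product(*per_child):
        prob, kids = 1.0, ()
        for q, subtrees in combo:
            prob *= q
            kids += subtrees
        yield prob, (node.label, kids)


def has_label(tree, label):
    """True if some element in the tree carries the given label."""
    return tree[0] == label or any(has_label(k, label) for k in tree[1])


def query_probability(root, label):
    """Probability that a world contains an element with this label."""
    return sum(p for p, t in worlds(root) if has_label(t, label))


# Made-up document: a person with a certain name and two uncertain fields.
doc = Node("person", [
    Node("name"),
    Ind([(0.7, Node("phone")), (0.4, Node("email"))]),
])

print(len(list(worlds(doc))))           # number of possible worlds
print(query_probability(doc, "phone"))  # probability that a phone element exists
```

A probabilistic relational table with independent tuple probabilities could be handled the same way, with one keep-or-drop alternative per tuple instead of per child.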

TOWARDS THE INTEGRATION OF INDOORGML AND INDOORLOCATIONGML FOR INDOOR APPLICATIONS

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-iv-2-w4-343-2017 ◽

2017 ◽

Vol IV-2/W4 ◽

pp. 343-348 ◽

Cited By ~ 3

Author(s):

L. Liu ◽

S. Zlatanova ◽

Q. Zhu ◽

K. Li

Keyword(s):

Data Model ◽

Data Models ◽

Location Based Services ◽

Use Cases ◽

Data Standards ◽

Indoor Location ◽

Automatic Integration ◽

First Case ◽

Future Work ◽

Further Development

This paper introduces and compares two GML-based data standards for indoor location-based services, IndoorGML and IndoorLocationGML. By elaborating the advantages of both standards and their data models, we conclude that the two data standards are complementary to each other. A joint data model is presented to show the integration of the two standards. IndoorGML can supply the subdivision of buildings for IndoorLocationGML data, and the semantics of locations defined in IndoorLocationGML can be added to IndoorGML. By proposing two use cases, we take the initiative in attempting to combine the use of the two standards. The first case is to collect details from files of the two standards for an indoor path; the second one is to generate verbal directions for indoor guidance from files of the two standards. Some future work is outlined for further development, such as automatic integration of separate data from both standards.

Mining Association Rules from XML Data

Data Mining and Knowledge Discovery Technologies ◽

10.4018/978-1-59904-960-1.ch003 ◽

2008 ◽

pp. 59-71 ◽

Cited By ~ 1

Author(s):

Qin Ding ◽

Gnanasekaran Sundarraj

Keyword(s):

Data Mining ◽

World Wide Web ◽

Association Rules ◽

World Wide ◽

Structured Data ◽

Rule Mining ◽

Xml Data ◽

Xml Documents ◽

The World ◽

Tools And Techniques

With the growing usage of XML in the World Wide Web and elsewhere as a standard for the exchange of data and to represent semi-structured data, there is an imminent need for tools and techniques to perform data mining on XML documents and XML repositories. In this chapter, we propose a framework for association rule mining on XML data. We present a Java-based implementation of the Apriori and the FP-Growth algorithms for this task and compare their performances. We also compare the performance of our implementation with an XQuery-based implementation.
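
As a rough illustration of Apriori applied to XML data, the sketch below parses transactions from a small XML string and finds the frequent itemsets. It is written in Python rather than Java and is not the chapter's implementation. The element names `transactions`, `transaction`, and `item`, the sample data, and the function names are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# Made-up sample data; the element names are assumptions for this sketch.
SAMPLE_XML = """
<transactions>
  <transaction><item>bread</item><item>milk</item></transaction>
  <transaction><item>bread</item><item>butter</item><item>milk</item></transaction>
  <transaction><item>butter</item><item>milk</item></transaction>
  <transaction><item>bread</item><item>butter</item></transaction>
</transactions>
"""


def load_transactions(xml_text):
    """Turn each <transaction> element into a frozenset of item names."""
    root = ET.fromstring(xml_text.strip())
    return [frozenset(i.text for i in t.findall("item"))
            for t in root.findall("transaction")]


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions."""
    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    singletons = {frozenset([i]) for t in transactions for i in t}
    current = {c: n for c, n in count(singletons).items() if n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {c: n for c, n in count(candidates).items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent, from support counts."""
    return frequent[antecedent | consequent] / frequent[antecedent]


transactions = load_transactions(SAMPLE_XML)
frequent = apriori(transactions, min_support=2)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
print(confidence(frequent, frozenset({"bread"}), frozenset({"milk"})))
```

FP-Growth avoids the repeated candidate generation shown here by compressing the transactions into a prefix tree, which is one reason the two algorithms can differ in performance.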

Information Processing Based on Mixed - Classical and Fuzzy - Data Models

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2001.p0044 ◽

2001 ◽

Vol 5 (1) ◽

pp. 44-50

Author(s):

Orsolya Takács ◽

Annamária R. Várkonyi-Kóczy

Keyword(s):

Information Processing ◽

Data Model ◽

Uncertain Data ◽

Data Models ◽

Mixed Data ◽

Fuzzy Data ◽

Classical Probability ◽

Mixed Use ◽

Achievable Accuracy ◽

Data Processing Methods

The model used to represent information during information processing can affect the achievable accuracy and can determine the usability of different calculation methods. The data model must also be able to represent the uncertainty and inaccuracy of both input data and results. The two most popular data models for representing uncertain data are the "classical", probability-based model and the recently introduced fuzzy data model. Both data models have their own calculation and data processing methods, but with the increasing complexity of calculation problems, a method for the mixed use of these data models is needed. This paper deals with possible solutions for information processing based on mixed data models and examines the different conversion methods between fuzzy and probability-theory-based data models.

NONLINEAR PANEL DATA MODELS WITH DISTRIBUTION-FREE CORRELATED RANDOM EFFECTS

Econometric Theory ◽

10.1017/s0266466620000481 ◽

2021 ◽

pp. 1-25

Author(s):

Yu-Chin Hsu ◽

Ji-Liang Shiu

Keyword(s):

Panel Data ◽

Data Model ◽

Conditional Distribution ◽

Unobserved Heterogeneity ◽

Random Effect ◽

Data Models ◽

Panel Data Model ◽

Panel Data Models ◽

Likelihood Functions ◽

Correlated Random Effects

Under a Mundlak-type correlated random effect (CRE) specification, we first show that the average likelihood of a parametric nonlinear panel data model is the convolution of the conditional distribution of the model and the distribution of the unobserved heterogeneity. Hence, the distribution of the unobserved heterogeneity can be recovered by means of a Fourier transformation without imposing a distributional assumption on the CRE specification. We subsequently construct a semiparametric family of average likelihood functions of observables by combining the conditional distribution of the model and the recovered distribution of the unobserved heterogeneity, and show that the parameters in the nonlinear panel data model and in the CRE specification are identifiable. Based on the identification result, we propose a sieve maximum likelihood estimator. Compared with the conventional parametric CRE approaches, the advantage of our method is that it is not subject to misspecification on the distribution of the CRE. Furthermore, we show that the average partial effects are identifiable and extend our results to dynamic nonlinear panel data models.
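
The step from convolution to recovery by Fourier transformation can be sketched with the standard convolution theorem. The notation below is generic and is an assumption of this sketch, not the paper's own: $f$ stands for the average likelihood, $g$ for the conditional distribution of the model, $h$ for the distribution of the unobserved heterogeneity, and a hat denotes the Fourier transform. The paper's actual setting, including its conditioning variables and regularity conditions, is not reproduced here.

```latex
\begin{align*}
  f(y) &= (g * h)(y) = \int g(y - u)\, h(u)\, \mathrm{d}u
       && \text{(convolution)} \\
  \hat{f}(t) &= \hat{g}(t)\, \hat{h}(t)
       && \text{(convolution theorem)} \\
  \hat{h}(t) &= \frac{\hat{f}(t)}{\hat{g}(t)}
       \quad \text{wherever } \hat{g}(t) \neq 0
       && \text{(deconvolution)}
\end{align*}
```

Inverting the transform of $\hat{h}$ then gives $h$ without assuming a parametric form for it, which is the sense in which no distributional assumption is imposed.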

Challenges to the validity of topic reconstruction

Scientometrics ◽

10.1007/s11192-021-03920-3 ◽

2021 ◽

Author(s):

Matthias Held ◽

Grit Laudel ◽

Jochen Gläser

Keyword(s):

Data Model ◽

Ground Truth ◽

Data Models ◽

Bibliographic Coupling ◽

Parameter Setting ◽

Meso Level ◽

Resolution Level ◽

Micro Level ◽

Research Questions ◽

Model Algorithm

In this paper we utilize an opportunity to construct ground truths for topics in the field of atomic, molecular and optical physics. Our research questions in this paper focus on (i) how to construct a ground truth for topics and (ii) the suitability of common algorithms applied to bibliometric networks to reconstruct these topics. We use the ground truths to test two data models (direct citation and bibliographic coupling) with two algorithms (the Leiden algorithm and the Infomap algorithm). Our results are discomforting: none of the four combinations leads to a consistent reconstruction of the ground truths. No combination of data model and algorithm simultaneously reconstructs all micro-level topics at any resolution level. Meso-level topics are not reconstructed at all. This suggests (a) that we are currently unable to predict which combination of data model, algorithm and parameter setting will adequately reconstruct which (types of) topics, and (b) that a combination of several data models, algorithms and parameter settings appears to be necessary to reconstruct all or most topics in a set of papers.

Probabilistic XML data exchange: An algorithm for materializing probabilistic solutions

2010 IEEE International Conference on Progress in Informatics and Computing ◽

10.1109/pic.2010.5687857 ◽

2010 ◽

Author(s):

Haitao Ma ◽

Changyong Yu ◽

Miao Fang

Keyword(s):

Data Exchange ◽

Xml Data ◽

Probabilistic Xml
