Information Processing Based on Mixed - Classical and Fuzzy - Data Models

Author(s):  
Orsolya Takács ◽  
Annamária R. Várkonyi-Kóczy

The model used to represent information during information processing can affect the achievable accuracy and determine the usability of different calculation methods. The data model must also be able to represent uncertainty and inaccuracy in both the input data and the results. The two most popular data models for representing uncertain data are the "classical", probability-based model and the more recently introduced fuzzy data models. Both data models have their own calculation and data processing methods, but with the increasing complexity of calculation problems, a method for the mixed use of these data models is needed. This paper deals with possible solutions for information processing based on mixed data models and examines the different conversion methods between fuzzy and probability theory based data models.
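
A minimal illustrative sketch of one such conversion, assuming discrete data and a simple proportional rescaling; the paper itself examines several conversion methods, of which this is only the most basic:

```python
# Illustrative only, not the paper's specific method: one simple way to move
# between a discrete fuzzy membership function and a probability distribution
# is proportional rescaling.
def membership_to_probability(membership):
    """Normalize a discrete membership function so its values sum to 1."""
    total = sum(membership.values())
    return {x: mu / total for x, mu in membership.items()}

def probability_to_membership(probability):
    """Rescale a discrete distribution so its largest value becomes 1."""
    peak = max(probability.values())
    return {x: p / peak for x, p in probability.items()}

# Example: a fuzzy description of a measured value "about 5".
fuzzy_about_5 = {4: 0.3, 5: 1.0, 6: 0.4}
print(membership_to_probability(fuzzy_about_5))  # {4: ~0.176, 5: ~0.588, 6: ~0.235}
```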

2014 ◽  
Vol 2014 ◽  
pp. 1-9
Author(s):  
Julie Yu-Chih Liu

Functional dependency is the basis of database normalization. Various types of fuzzy functional dependencies have been proposed for fuzzy relational databases and applied to the process of database normalization. However, the problem of achieving lossless join decomposition occurs when employing fuzzy functional dependencies for database normalization in extended possibility-based fuzzy data models. To resolve the problem, this study defines a fuzzy functional dependency based on a notion of approximate equality for extended possibility-based fuzzy relational databases. Examples show that the notion is more applicable than other similarity concepts to research related to the extended possibility-based data model. We provide a decomposition method that uses the proposed fuzzy functional dependency for database normalization and prove the lossless join property of the decomposition method.
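
A hedged sketch of how such a dependency check might look, using a generic tolerance-based closeness measure as a stand-in for the paper's approximate-equality notion; the function names and the example relation are assumptions for illustration:

```python
# Sketch of checking a fuzzy functional dependency X -> Y over a small relation.
# The approximate-equality notion here (numeric closeness mapped to [0, 1] by a
# tolerance parameter) is a generic stand-in for the one defined in the paper,
# and the rule "agreement on Y is at least agreement on X" is one common way
# fuzzy functional dependencies are formalized.
def approx_equal(a, b, tolerance=2.0):
    """Degree in [0, 1] to which two numeric values are approximately equal."""
    return max(0.0, 1.0 - abs(a - b) / tolerance)

def ffd_holds(relation, lhs, rhs, tolerance=2.0):
    """Check X -> Y: every tuple pair must agree on Y at least as much as on X."""
    for i, t1 in enumerate(relation):
        for t2 in relation[i + 1:]:
            agree_x = min(approx_equal(t1[a], t2[a], tolerance) for a in lhs)
            agree_y = min(approx_equal(t1[a], t2[a], tolerance) for a in rhs)
            if agree_y < agree_x:
                return False
    return True

employees = [
    {"experience": 10, "salary": 50},
    {"experience": 11, "salary": 51},
    {"experience": 3, "salary": 30},
]
print(ffd_holds(employees, lhs=["experience"], rhs=["salary"]))  # True
```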


Data Mining ◽  
2013 ◽  
pp. 669-691 ◽  
Author(s):  
Evgeny Kharlamov ◽  
Pierre Senellart

This chapter deals with data mining in uncertain XML data models, whose uncertainty typically comes from imprecise automatic processes. We first review the literature on modeling uncertain data, starting with well-studied relational models and moving then to their semistructured counterparts. We focus on a specific probabilistic XML model, which allows representing arbitrary finite distributions of XML documents, and has been extended to also allow continuous distributions of data values. We summarize previous work on querying this uncertain data model and show how to apply the corresponding techniques to several data mining tasks, exemplified through use cases on two running examples.
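
A toy sketch of the idea behind such a probabilistic XML ("p-document") representation, with the node encoding assumed here for illustration rather than taken from the chapter's formalism: choice nodes annotated with probabilities induce a finite distribution over ordinary XML documents.

```python
from itertools import product

# A "mux" node chooses exactly one child; each choice carries a probability.
# Ordinary nodes keep all their children. Leaves are plain strings.
p_document = {
    "tag": "person",
    "children": [
        {"tag": "name", "children": ["Alice"]},
        {"mux": [
            (0.8, {"tag": "city", "children": ["Paris"]}),
            (0.2, {"tag": "city", "children": ["Lyon"]}),
        ]},
    ],
}

def possible_worlds(node):
    """Enumerate (probability, ordinary-XML-tree) pairs encoded by a p-document."""
    if isinstance(node, str):
        yield 1.0, node
        return
    if "mux" in node:
        for prob, child in node["mux"]:
            for p, world in possible_worlds(child):
                yield prob * p, world
        return
    child_worlds = [list(possible_worlds(c)) for c in node["children"]]
    for combo in product(*child_worlds):
        prob = 1.0
        children = []
        for p, w in combo:
            prob *= p
            children.append(w)
        yield prob, {"tag": node["tag"], "children": children}

for probability, document in possible_worlds(p_document):
    print(round(probability, 2), document)  # 0.8 Paris world, 0.2 Lyon world
```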


2021 ◽  
pp. 1-25
Author(s):  
Yu-Chin Hsu ◽  
Ji-Liang Shiu

Under a Mundlak-type correlated random effect (CRE) specification, we first show that the average likelihood of a parametric nonlinear panel data model is the convolution of the conditional distribution of the model and the distribution of the unobserved heterogeneity. Hence, the distribution of the unobserved heterogeneity can be recovered by means of a Fourier transformation without imposing a distributional assumption on the CRE specification. We subsequently construct a semiparametric family of average likelihood functions of observables by combining the conditional distribution of the model and the recovered distribution of the unobserved heterogeneity, and show that the parameters in the nonlinear panel data model and in the CRE specification are identifiable. Based on the identification result, we propose a sieve maximum likelihood estimator. Compared with the conventional parametric CRE approaches, the advantage of our method is that it is not subject to misspecification on the distribution of the CRE. Furthermore, we show that the average partial effects are identifiable and extend our results to dynamic nonlinear panel data models.
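
In schematic form (with all notation assumed here for illustration rather than taken from the paper), the identification step amounts to a deconvolution: if the average likelihood is the convolution of the model's conditional density f and the heterogeneity density g, then g is recoverable by Fourier inversion.

```latex
% Schematic only: \ell is the average likelihood of the observables, f the
% model's conditional density, g the density of the unobserved heterogeneity u,
% and \mathcal{F} the Fourier transform. None of this notation is taken from
% the paper.
\[
  \ell \;=\; f \ast g
  \quad\Longrightarrow\quad
  \mathcal{F}[g](t) \;=\; \frac{\mathcal{F}[\ell](t)}{\mathcal{F}[f](t)},
  \qquad
  g(u) \;=\; \frac{1}{2\pi}\int_{-\infty}^{\infty}
             e^{-\mathrm{i}\,t u}\,
             \frac{\mathcal{F}[\ell](t)}{\mathcal{F}[f](t)}\;dt .
\]
```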


2021 ◽  
Author(s):  
Matthias Held ◽  
Grit Laudel ◽  
Jochen Gläser

In this paper we utilize an opportunity to construct ground truths for topics in the field of atomic, molecular and optical physics. Our research questions in this paper focus on (i) how to construct a ground truth for topics and (ii) the suitability of common algorithms applied to bibliometric networks to reconstruct these topics. We use the ground truths to test two data models (direct citation and bibliographic coupling) with two algorithms (the Leiden algorithm and the Infomap algorithm). Our results are discomforting: none of the four combinations leads to a consistent reconstruction of the ground truths. No combination of data model and algorithm simultaneously reconstructs all micro-level topics at any resolution level. Meso-level topics are not reconstructed at all. This suggests (a) that we are currently unable to predict which combination of data model, algorithm and parameter setting will adequately reconstruct which (types of) topics, and (b) that a combination of several data models, algorithms and parameter settings appears to be necessary to reconstruct all or most topics in a set of papers.
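
A minimal sketch of how the two data models differ in construction, assuming the corpus is given as a mapping from paper identifiers to the references each paper cites; the clustering step with Leiden or Infomap, and the paper's actual pipeline, are not reproduced here:

```python
from itertools import combinations

# Hypothetical toy corpus: paper id -> set of cited identifiers.
citations = {
    "p1": {"r1", "r2", "p2"},
    "p2": {"r1", "r3"},
    "p3": {"r2", "r3", "p1"},
}

# Direct citation network: an edge whenever one corpus paper cites another.
direct_citation = {(a, b) for a, refs in citations.items() for b in refs if b in citations}

# Bibliographic coupling network: papers are linked with a weight equal to the
# number of references they share.
coupling = {
    (a, b): len(citations[a] & citations[b])
    for a, b in combinations(sorted(citations), 2)
    if citations[a] & citations[b]
}

print(direct_citation)  # {('p1', 'p2'), ('p3', 'p1')}
print(coupling)         # {('p1', 'p2'): 1, ('p1', 'p3'): 1, ('p2', 'p3'): 1}
```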


2010 ◽  
Vol 38 (1) ◽  
pp. 1-39 ◽  
Author(s):  
Bin Jiang ◽  
Jian Pei ◽  
Xuemin Lin ◽  
Yidong Yuan
Keyword(s):  

2021 ◽  
Vol 3 (1) ◽  
pp. 1-13
Author(s):  
Muhammad Anus Hayat Khan ◽  
Ijaz Hussain

Each year more than three thousand people die or sustain serious injuries in traffic accidents. Count data models provide more precise tools for planners and decision makers to conduct proactive road safety planning. We present an exploratory analysis of Road Traffic Accidents (RTAs) and examine the factors affecting RTA frequency in 36 districts of the Punjab over a period of three years (July 1, 2013 to June 30, 2016), using monthly data and panel count data models. Among the models considered, the random parameters Poisson panel count data model is found to fit the data best. The exploratory analysis shows that densely populated districts with large numbers of registered vehicles have more accidents than sparsely populated districts. Most of the variables used to control for variation in RTA counts are found to be highly significant. The application of regression analysis and modeling of RTAs at the district level in the Punjab will help identify districts with high RTA rates, which could support more efficient road safety management in the Punjab.
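
A minimal, hypothetical sketch of a panel count-data workflow in this spirit: a pooled Poisson regression of monthly district-level RTA counts on exposure covariates. The file and variable names are assumptions, and the paper's preferred random parameters Poisson specification is richer than this simplified stand-in:

```python
# Simplified stand-in for the modeling described above; variable names
# (population_density, registered_vehicles) and the input file are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

panel = pd.read_csv("punjab_rta_monthly.csv")  # district, month, accidents, covariates

model = smf.poisson(
    "accidents ~ population_density + registered_vehicles + C(district)",
    data=panel,
).fit()
print(model.summary())
```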


2016 ◽  
Vol 23 (3) ◽  
pp. 178-182
Author(s):  
Andrzej Zygmuniak ◽  
Violetta Sokoła-Szewioła

This study aims to expose the differences between two data models with respect to the code list values they provide. The first is the obligatory model for managing Geodesic Register of Utility Networks databases in Poland [9], and the second is the model originating from the Technical Guidelines issued for the INSPIRE Directive. Since the latter is the basis for managing spatial databases among European parties, correlating these two data models eases harmonization and, in consequence, the exchange of spatial data. The study therefore presents the possibilities for increasing compatibility between the code list values of object attributes provided in both models. In practice, this could increase the competitiveness of entities managing or processing such databases and lead to greater involvement in scientific or research projects in the mining industry. Moreover, since utility networks located in mining areas are under particular protection, the ability to fit the data models more closely to their own needs will allow mining plants to exchange spatial data more efficiently.
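
A hypothetical sketch of the kind of code-list correlation the study describes, mapping national register values onto INSPIRE-style code-list values; every identifier below is invented for illustration and is not an actual entry compared in the paper:

```python
# Invented, illustrative correlation table between a national utility-network
# register code list and INSPIRE-style utility network type values.
national_to_inspire = {
    "power_line": "electricity",
    "gas_pipe": "oilGasChemicals",
    "water_pipe": "water",
    "sewage_pipe": "sewer",
}

def harmonize(national_value):
    """Return the INSPIRE-style counterpart of a national code-list value, if correlated."""
    return national_to_inspire.get(national_value, "unmapped")

print(harmonize("gas_pipe"))  # oilGasChemicals
```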


Author(s):  
Etienne E. Kerre ◽  
Guoqing Chen
Keyword(s):  
