Entity Resolution on Graph Data Set

In this chapter, the authors study entity resolution on graph data sets. Entity resolution on graph data requires a definition of distance between graphs. The authors compute these distances exactly or, for time efficiency, approximately, and then use them to obtain the final entity resolution result. Approximate graph matching algorithms may be index-based, like the NH-Index method, or kernel-function-based, like the G-hash method. Other methods concentrate on new definitions of graph similarity that are easier to compute than traditional ones, like the Web-collection method and the Grafil method. To increase the resolution ability of traditional methods, researchers have proposed methods for recognizing similar graphs, such as graph-bounded simulation and p-homomorphism. Section 8.1 introduces existing methods for defining graph distance, which has a direct impact on the computation of graph similarity. Section 8.2 introduces pair-wise entity resolution on graph data sets, including index techniques, graph-bounded simulation, and graph p-homomorphism.
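The pipeline described above (define a graph distance, compute it, then use it to decide which graphs refer to the same entity) can be illustrated with a minimal sketch. It is not one of the methods named in the abstract: the distance used here is a simple Jaccard distance over undirected edge sets, chosen only because it is easy to compute, and the function names and the `threshold` parameter are hypothetical.

```python
from itertools import combinations


def edge_set(edges):
    """Normalize an undirected edge list into a set of frozensets."""
    return {frozenset(e) for e in edges}


def jaccard_graph_distance(edges_a, edges_b):
    """Illustrative graph distance: 1 - |shared edges| / |all edges|.

    This is an assumption made for the sketch, not NH-Index, G-hash,
    graph-bounded simulation, or p-homomorphism.
    """
    a, b = edge_set(edges_a), edge_set(edges_b)
    if not a and not b:
        return 0.0  # two empty graphs are treated as identical
    return 1.0 - len(a & b) / len(a | b)


def resolve_pairs(graphs, threshold=0.5):
    """Pair-wise resolution: report graph pairs whose distance is small."""
    matches = []
    for (name_a, ga), (name_b, gb) in combinations(sorted(graphs.items()), 2):
        if jaccard_graph_distance(ga, gb) <= threshold:
            matches.append((name_a, name_b))
    return matches


graphs = {
    "g1": [("a", "b"), ("b", "c")],
    "g2": [("b", "a"), ("b", "c"), ("c", "d")],
    "g3": [("x", "y")],
}
print(resolve_pairs(graphs))  # [('g1', 'g2')]
```

Comparing every pair is quadratic in the number of graphs, which is the cost that index-based approaches such as those mentioned in the abstract aim to reduce.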

Author(s):  
Guangqing Hu
Guijian Liu
Dun Wu
Wenyong Zhang
Biao Fu

Abstract: Based on analysis of a large data set and supplementary sampling and analysis for hazardous trace elements in coal samples from the Huainan Coalfield, a generalized contrast-weighted scale index method was used to establish a model to evaluate the grade of coal cleanliness and its regional distribution in the main coal seam (No. 13-1). The results showed that: (1) The contents of Cr, Mn and Ni in the coal seam are relatively high and the average values are greater than 20 μg/g. The contents of Se and Hg are at a high level while most other trace elements are at normal levels. (2) The cleanliness grade of the coal seam is mainly grade III–IV, which corresponds to a relatively good-medium coal cleanliness grade. However, some parts of the seam are at grade V (relatively poor coal cleanliness). (3) Coal of relatively good cleanliness grade (grade III) is distributed mainly in the regions corresponding to the Zhuji-Dingji-Gubei coal mines and in the eastern periphery of the Panji coal mine. Coal of medium cleanliness (grade IV) is distributed mainly in the regions of the Panji-Xiejiaji and Kouzidong coal mines. Relatively poor grade coal (grade V) is distributed in the southwest regions of the coalfield, and the contents of Cr, As and Hg in coal collected from the relatively poor coal cleanliness regions often exceed the regulatory standards for the maximum concentration limits.


2021
Author(s):  
Bo Galle ◽  

We present a detailed global data set of volcanic sulphur dioxide (SO2) emissions during the period 2005-2017. Measurements were obtained by scanning-DOAS instruments of the NOVAC network at 32 volcanoes, and processed using a standardized procedure. We reveal the daily statistics of volcanic gas emissions under a variety of volcanological and meteorological conditions. Data from several volcanoes are presented for the first time. Our results are compared with yearly averages derived from measurements from space by the Aura/OMI instrument and with historical inventories of GEIA. This comparison shows some interesting differences, the reasons for which are briefly discussed. Data is openly available through the web repository at https://novac.chalmers.se/.


Author(s):  
Antonio F. L. Jacob
Eulália C. da Mata
Ádamo L. Santana
Carlos R. L. Francês
João C. W. A. Costa
...  

The Web is providing greater freedom for users to create and obtain information in a more dynamic and appropriate way. One means of obtaining information on this platform, which complements or replaces other forms, is the use of conversation robots or Chatterbots. Several factors must be taken into account for the effective use of this technology; the first of these is the need to employ a team of professionals from various fields to build the knowledge base of the system and provide it with a wide range of responses, i.e. interactions. Ensuring that such a system can be targeted to children is a multidisciplinary task. In this context, this chapter carries out a study of the technology of Chatterbots and shows some of the changes that have been implemented for the effective use of this technology for children. It also highlights the need for a shift away from traditional methods of interaction so that an affective computing model can be implemented.


Author(s):  
Wei Shen ◽  
Jianyong Wang ◽  
Ping Luo ◽  
Min Wang

Relation extraction from Web data has attracted a lot of attention recently. However, little work has been done on enterprise data, despite the urgent need for such work in real applications (e.g., E-discovery). One distinct characteristic of enterprise data (in comparison with Web data) is its low redundancy. Previous work on relation extraction from Web data largely relies on the data's high redundancy level and thus cannot be applied to enterprise data effectively. This chapter reviews related work on relation extraction and introduces an unsupervised hybrid framework, REACTOR, for semantic relation extraction over enterprise data. REACTOR combines a statistical method, classification, and clustering to automatically identify various types of relations among entities appearing in enterprise data. REACTOR was evaluated over a real-world enterprise data set from HP that contains over three million pages, and the experimental results show its effectiveness.


Author(s):  
Simon Giesecke ◽  
Gerriet Reents

In this chapter, we present the Web-based carpooling system ORISS, which was initially developed by a student project group at University of Oldenburg. It is currently being deployed at Carl von Ossietzky University of Oldenburg with support of the DBU (Federal German Foundation for the Environment). We describe the role of carpools in traffic, particularly in commuter traffic, and show perspectives of an increased usage of carpools. A significant impact on the eco-balance of the university can be expected. We explain how Internet technologies and geographic information systems can be used for the arrangement of carpools, and show advantages over traditional methods of carpooling. The concrete architecture of ORISS and the algorithms used are outlined. We conclude the chapter by describing the circumstances of deployment and propose possible future extensions of the system.


Author(s):  
Nanda Kumar

This chapter reviews the different types of personalization systems commonly employed by Web sites and argues that their deployment as Web site interface design decisions may have as big an impact as the personalization systems themselves. To accomplish this, this chapter makes a case for treating Human-Computer Interaction (HCI) issues seriously. It also argues that Web site interface design decisions made by organizations, such as the type and level of personalization employed by a Web site, have a direct impact on the communication capability of that Web site. This chapter also explores the impact of the deployment of personalization systems on users’ loyalty towards the Web site, thus underscoring the practical relevance of these design decisions.


Author(s):  
Claudia Plant ◽  
Christian Böhm

Clustering or finding a natural grouping of a data set is essential for knowledge discovery in many applications. This chapter provides an overview on emerging trends within the vital research area of clustering including subspace and projected clustering, correlation clustering, semi-supervised clustering, spectral clustering and parameter-free clustering. To raise the awareness of the reader for the challenges associated with clustering, the chapter first provides a general problem specification and introduces basic clustering paradigms. The requirements from concrete example applications in life sciences and the web provide the motivation for the discussion of novel approaches to clustering. Thus, this chapter is intended to appeal to all those interested in the state-of-the art in clustering including basic researchers as well as practitioners.


2009
pp. 212-219
Author(s):  
Nanda Kumar

This chapter reviews the different types of personalization systems commonly employed by Web sites and argues that their deployment as Web site interface design decisions may have as big an impact as the personalization systems themselves. To accomplish this, this chapter makes a case for treating Human-Computer Interaction (HCI) issues seriously. It also argues that Web site interface design decisions made by organizations, such as the type and level of personalization employed by a Web site, have a direct impact on the communication capability of that Web site. This chapter also explores the impact of the deployment of personalization systems on users’ loyalty towards the Web site, thus underscoring the practical relevance of these design decisions.


2014
Vol 31 (8)
pp. 1778-1789
Author(s):  
Hongkang Lin

Purpose – The clustering/classification method proposed in this study, designated as the PFV-index method, provides the means to solve the following problems for a data set characterized by imprecision and uncertainty: first, discretizing the continuous values of all the individual attributes within a data set; second, evaluating the optimality of the discretization results; third, determining the optimal number of clusters per attribute; and fourth, improving the classification accuracy (CA) of data sets characterized by uncertainty. The paper aims to discuss these issues. Design/methodology/approach – The proposed method for the solution of the clustering/classification problem, designated as the PFV-index method, combines a particle swarm optimization algorithm, the fuzzy C-means method, variable precision rough sets theory, and a new cluster validity index function. Findings – This method can cluster the values of the individual attributes within the data set and achieves both the optimal number of clusters and the optimal CA. Originality/value – The validity of the proposed approach is investigated by comparing the classification results obtained for UCI data sets with those obtained by the supervised BPNN and decision-tree classification methods.


2011
Vol 2011
pp. 1-21
Author(s):  
G. Rossmanith
H. Modest
C. Räth
A. J. Banday
K. M. Górski
...  

In recent years, non-Gaussianity and statistical isotropy of the Cosmic Microwave Background (CMB) have been investigated with various statistical measures, first and foremost by means of the measurements of the WMAP satellite. In this paper, we focus on the analyses that were accomplished with a measure of local type, the so-called Scaling Index Method (SIM). The SIM is able to detect structural characteristics of a given data set and has proven to be highly valuable in CMB analysis. It was used for comparing the data set with simulations as well as surrogates, which are full-sky maps generated by randomisation of previously selected features of the original map. During these investigations, strong evidence for non-Gaussianities as well as asymmetries and local features could be detected. In combination with the surrogates approach, the SIM detected the highest significances for non-Gaussianity to date.
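To make concrete what a local structural measure of this kind computes, the following minimal sketch evaluates scaling indices for a generic point set. It relies on assumptions that are not stated in the abstract: it uses the commonly cited formulation in which a weighted local density rho(p_i, r) = sum_j exp(-(d_ij / r)^q) is formed and the scaling index is its logarithmic derivative with respect to the scale r, with q = 2 as a typical choice. The function name and parameters are hypothetical, and the CMB-specific step of turning a sky map into a point set is omitted.

```python
import math


def scaling_indices(points, r, q=2.0):
    """Scaling index alpha(p_i, r) for every point in `points`.

    Assumed formulation (not taken from the abstract):
        rho_i   = sum_j exp(-(d_ij / r) ** q)
        alpha_i = d log(rho_i) / d log(r)
                = sum_j q * s_ij * exp(-s_ij) / rho_i,  s_ij = (d_ij / r) ** q
    """
    alphas = []
    for p in points:
        num = 0.0
        den = 0.0
        for other in points:
            s = (math.dist(p, other) / r) ** q
            w = math.exp(-s)
            num += q * s * w
            den += w  # the self term contributes 1, so den is never zero
        alphas.append(num / den)
    return alphas


# Points sampled densely along a straight line: interior points should
# yield a scaling index close to 1 (line-like structure).
line = [(i * 0.01 - 1.0,) for i in range(201)]
alpha_mid = scaling_indices(line, r=0.1)[100]
print(round(alpha_mid, 3))
```

Under this formulation the index behaves like a local dimension estimate: values near 0 indicate point-like structure, values near 1 line-like structure, and values near 2 sheet-like structure at scale r.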

