Entity Resolution on Graph Data Set

Advances in Data Mining and Database Management - Innovative Techniques and Applications of Entity Resolution ◽

10.4018/978-1-4666-5198-2.ch008 ◽

2014 ◽

pp. 171-194

Keyword(s):

Kernel Function ◽

Graph Matching ◽

Entity Resolution ◽

Direct Impact ◽

Traditional Methods ◽

Index Method ◽

Data Set ◽

Collection Method ◽

Graph Data ◽

The Web

In this chapter, the authors study entity resolution on graph data sets. To conduct entity resolution on graph data, the authors first define a distance between graphs. They then compute these distances, either exactly or approximately for time efficiency, and finally use the distances to obtain the entity resolution result. Approximate graph matching algorithms may be index-based, like the NH-Index method, or kernel-function-based, like the G-hash method. Other methods concentrate on providing new definitions of graph similarity that are easier to compute than traditional ones, like the Web-collection method and the Grafil method. To increase the resolution ability of traditional methods, researchers have proposed methods for recognizing similar graphs, such as graph-bounded simulation and p-homomorphism. Section 8.1 introduces existing methods for defining the distance between graphs, which has a direct impact on the computation of graph similarity. Section 8.1 introduces pair-wise entity resolution on graph data sets, including index techniques, graph-bounded simulation, and graph p-homomorphism.
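The pipeline described in this abstract (define a graph distance, compute it, then decide which graphs refer to the same entity) can be illustrated with a small sketch. The distance used here, a Jaccard distance over undirected edge sets, is an assumption chosen for brevity. It is not the NH-Index, G-hash, Web-collection, or Grafil method, and the function names and the threshold value are hypothetical.

```python
from itertools import combinations


def edge_set(edges):
    """Normalize an undirected edge list into a set of frozensets."""
    return {frozenset(edge) for edge in edges}


def jaccard_distance(edges_a, edges_b):
    """Illustrative graph distance: 1 - |shared edges| / |all edges|."""
    a, b = edge_set(edges_a), edge_set(edges_b)
    union = a | b
    if not union:
        return 0.0  # two empty graphs are treated as identical
    return 1.0 - len(a & b) / len(union)


def resolve_pairs(graphs, threshold=0.5):
    """Pair-wise resolution: report graph names whose distance is <= threshold.

    `graphs` maps a record name to its edge list; the threshold is an
    assumed tuning parameter, not a value taken from the chapter.
    """
    matches = []
    for (name_a, ea), (name_b, eb) in combinations(sorted(graphs.items()), 2):
        if jaccard_distance(ea, eb) <= threshold:
            matches.append((name_a, name_b))
    return matches


if __name__ == "__main__":
    records = {
        "r1": [("alice", "acme"), ("alice", "paris"), ("alice", "bob")],
        "r2": [("acme", "alice"), ("alice", "paris"), ("alice", "carol")],
        "r3": [("dave", "initech")],
    }
    print(resolve_pairs(records))
```

Comparing every pair is quadratic in the number of graphs, which is the cost that index-based and approximate methods of the kind named in the abstract aim to reduce.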

Method for evaluation of the cleanliness grade of coal resources in the Huainan Coalfield, Anhui, China: a case study

International Journal of Coal Science & Technology ◽

10.1007/s40789-020-00400-6 ◽

2021 ◽

Author(s):

Guangqing Hu ◽

Guijian Liu ◽

Dun Wu ◽

Wenyong Zhang ◽

Biao Fu

Keyword(s):

Trace Elements ◽

Coal Seam ◽

Regional Distribution ◽

Large Data ◽

Index Method ◽

Data Set ◽

Poor Grade ◽

Huainan Coalfield ◽

Coal Grade ◽

Grade III

Based on analysis of a large data set and supplementary sampling and analysis for hazardous trace elements in coal samples from the Huainan Coalfield, a generalized contrast-weighted scale index method was used to establish a model to evaluate the grade of coal cleanliness and its regional distribution in the main coal seam (No. 13-1). The results showed that: (1) The contents of Cr, Mn and Ni in the coal seam are relatively high and the average values are greater than 20 μg/g. The contents of Se and Hg are at a high level while most other trace elements are at normal levels. (2) The cleanliness grade of the coal seam is mainly grade III–IV, which corresponds to a relatively good-medium coal cleanliness grade. However, some parts of the seam are at grade V (relatively poor coal cleanliness). (3) Coal of relatively good cleanliness grade (grade III) is distributed mainly in the regions corresponding to the Zhuji-Dingji-Gubei coal mines and in the eastern periphery of the Panji coal mine. Coal of medium cleanliness (grade IV) is distributed mainly in the regions of the Panji-Xiejiaji and Kouzidong coal mines. Relatively poor grade coal (grade V) is distributed in the southwest regions of the coalfield, and the contents of Cr, As and Hg in coal collected from the relatively poor coal cleanliness regions often exceed the regulatory standards for the maximum concentration limits.

The NOVAC database of volcanic SO2 emissions

10.5194/egusphere-egu21-7577 ◽

2021 ◽

Author(s):

Bo Galle ◽

Keyword(s):

Sulphur Dioxide ◽

Meteorological Conditions ◽

SO2 Emissions ◽

Volcanic Gas ◽

Data Set ◽

Gas Emissions ◽

First Time ◽

Global Data ◽

Web Repository ◽

The Web

We present a detailed global data set of volcanic sulphur dioxide (SO2) emissions during the period 2005-2017. Measurements were obtained by scanning-DOAS instruments of the NOVAC network at 32 volcanoes, and processed using a standardized procedure. We reveal the daily statistics of volcanic gas emissions under a variety of volcanological and meteorological conditions. Data from several volcanoes are presented for the first time. Our results are compared with yearly averages derived from measurements from space by the Aura/OMI instrument and with historical inventories of GEIA. This comparison shows some interesting differences, the reasons for which are briefly discussed. Data is openly available through the web repository at https://novac.chalmers.se/.

Adapting Chatterbots’ Interaction for Use in Children’s Education

Emerging Research and Trends in Interactivity and the Human-Computer Interface ◽

10.4018/978-1-4666-4623-0.ch021 ◽

2014 ◽

pp. 413-428

Author(s):

Antonio F. L. Jacob ◽

Eulália C. da Mata ◽

Ádamo L. Santana ◽

Carlos R. L. Francês ◽

João C. W. A. Costa ◽

...

Keyword(s):

Knowledge Base ◽

Affective Computing ◽

Traditional Methods ◽

Children's Education ◽

Computing Model ◽

Wide Range ◽

Effective Use ◽

The Web

The Web is providing greater freedom for users to create and obtain information in a more dynamic and appropriate way. One means of obtaining information on this platform, which complements or replaces other forms, is the use of conversation robots or Chatterbots. Several factors must be taken into account for the effective use of this technology, the first of which is the need to employ a team of professionals from various fields to build the knowledge base of the system and provide it with a wide range of responses, i.e. interactions. It is a multidisciplinary task to ensure that the use of this system can be targeted to children. In this context, this chapter carries out a study of the technology of Chatterbots and shows some of the changes that have been implemented for the effective use of this technology for children. It also highlights the need for a shift away from traditional methods of interaction so that an affective computing model can be implemented.

On Semantic Relation Extraction Over Enterprise Data

Innovations, Developments, and Applications of Semantic Web and Information Systems - Advances in Web Technologies and Engineering ◽

10.4018/978-1-5225-5042-6.ch003 ◽

2018 ◽

pp. 62-84 ◽

Cited By ~ 2

Author(s):

Wei Shen ◽

Jianyong Wang ◽

Ping Luo ◽

Min Wang

Keyword(s):

Statistical Method ◽

Real World ◽

Relation Extraction ◽

Semantic Relation ◽

Experimental Results ◽

Web Data ◽

Data Set ◽

Redundancy Level ◽

Hybrid Framework ◽

The Web

Relation extraction from Web data has attracted a lot of attention recently. However, little work has been done on enterprise data, despite the urgent need for such work in real applications (e.g., E-discovery). One distinct characteristic of enterprise data (in comparison with Web data) is its low redundancy. Previous work on relation extraction from Web data largely relies on the data's high redundancy level and thus cannot be applied to enterprise data effectively. This chapter reviews related work on relation extraction and introduces an unsupervised hybrid framework, REACTOR, for semantic relation extraction over enterprise data. REACTOR combines a statistical method, classification, and clustering to automatically identify various types of relations among entities appearing in the enterprise data. REACTOR was evaluated over a real-world enterprise data set from HP that contains over three million pages, and the experimental results show its effectiveness.

ORISS

Information Systems for Sustainable Development ◽

10.4018/978-1-59140-342-5.ch016 ◽

2005 ◽

pp. 260-276 ◽

Cited By ~ 1

Author(s):

Simon Giesecke ◽

Gerriet Reents

Keyword(s):

Information Systems ◽

Geographic Information Systems ◽

Geographic Information ◽

Traditional Methods ◽

Web Based ◽

Internet Technologies ◽

Project Group ◽

The University ◽

The Web

In this chapter, we present the Web-based carpooling system ORISS, which was initially developed by a student project group at University of Oldenburg. It is currently being deployed at Carl von Ossietzky University of Oldenburg with support of the DBU (Federal German Foundation for the Environment). We describe the role of carpools in traffic, particularly in commuter traffic, and show perspectives of an increased usage of carpools. A significant impact on the eco-balance of the university can be expected. We explain how Internet technologies and geographic information systems can be used for the arrangement of carpools, and show advantages over traditional methods of carpooling. The concrete architecture of ORISS and the algorithms used are outlined. We conclude the chapter by describing the circumstances of deployment and propose possible future extensions of the system.

Personalization Systems and Their Deployment as Web Site Interface Design

Web Systems Design and Online Consumer Behavior ◽

10.4018/978-1-59140-327-2.ch008 ◽

2011 ◽

pp. 147-155

Author(s):

Nanda Kumar

Keyword(s):

Human Computer Interaction ◽

Web Sites ◽

Interface Design ◽

Web Site ◽

Practical Relevance ◽

Direct Impact ◽

Design Decisions ◽

Communication Capability ◽

The Impact ◽

The Web

This chapter reviews the different types of personalization systems commonly employed by Web sites and argues that their deployment as Web site interface design decisions may have as big an impact as the personalization systems themselves. To accomplish this, this chapter makes a case for treating Human-Computer Interaction (HCI) issues seriously. It also argues that Web site interface design decisions made by organizations, such as the type and level of personalization employed by a Web site, have a direct impact on the communication capability of that Web site. This chapter also explores the impact of the deployment of personalization systems on users’ loyalty towards the Web site, thus underscoring the practical relevance of these design decisions.

Novel Trends in Clustering

Evolving Application Domains of Data Warehousing and Mining ◽

10.4018/978-1-60566-816-1.ch009 ◽

2010 ◽

pp. 185-211 ◽

Cited By ~ 1

Author(s):

Claudia Plant ◽

Christian Böhm

Keyword(s):

Knowledge Discovery ◽

General Problem ◽

Life Sciences ◽

Research Area ◽

Data Set ◽

Correlation Clustering ◽

Projected Clustering ◽

Emerging Trends ◽

Novel Approaches ◽

The Web

Clustering, or finding a natural grouping of a data set, is essential for knowledge discovery in many applications. This chapter provides an overview of emerging trends within the vital research area of clustering, including subspace and projected clustering, correlation clustering, semi-supervised clustering, spectral clustering and parameter-free clustering. To raise the reader's awareness of the challenges associated with clustering, the chapter first provides a general problem specification and introduces basic clustering paradigms. The requirements from concrete example applications in life sciences and the web provide the motivation for the discussion of novel approaches to clustering. Thus, this chapter is intended to appeal to all those interested in the state of the art in clustering, including basic researchers as well as practitioners.

Personalization Systems and Their Deployment as Web Site Interface Design Decisions

Human Computer Interaction ◽

10.4018/978-1-87828-991-9.ch016 ◽

2009 ◽

pp. 212-219

Author(s):

Nanda Kumar

Keyword(s):

Human Computer Interaction ◽

Web Sites ◽

Interface Design ◽

Web Site ◽

Practical Relevance ◽

Direct Impact ◽

Design Decisions ◽

Communication Capability ◽

The Impact ◽

The Web

A classification approach based on variable precision rough sets and cluster validity index function

Engineering Computations ◽

10.1108/ec-11-2012-0297 ◽

2014 ◽

Vol 31 (8) ◽

pp. 1778-1789

Author(s):

Hongkang Lin

Keyword(s):

Optimal Number ◽

Data Sets ◽

Cluster Validity ◽

Cluster Validity Index ◽

Index Method ◽

Data Set ◽

Content Type ◽

The Individual ◽

Variable Precision Rough Sets ◽

Optimal Number Of Clusters

Purpose – The clustering/classification method proposed in this study, designated as the PFV-index method, provides the means to solve the following problems for a data set characterized by imprecision and uncertainty: first, discretizing the continuous values of all the individual attributes within a data set; second, evaluating the optimality of the discretization results; third, determining the optimal number of clusters per attribute; and fourth, improving the classification accuracy (CA) of data sets characterized by uncertainty. The paper aims to discuss these issues. Design/methodology/approach – The proposed method for the solution of the clustering/classification problem, designated as the PFV-index method, combines a particle swarm optimization algorithm, the fuzzy C-means method, variable precision rough sets theory, and a new cluster validity index function. Findings – This method can cluster the values of the individual attributes within the data set and achieves both the optimal number of clusters and the optimal CA. Originality/value – The validity of the proposed approach is investigated by comparing the classification results obtained for UCI data sets with those obtained by the supervised BPNN and decision-tree classification methods.
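One ingredient named in this abstract, fuzzy C-means applied to the continuous values of a single attribute, can be sketched as follows. This is a minimal illustration of textbook fuzzy C-means in one dimension, not the PFV-index method itself: the particle swarm optimization, rough-set, and validity-index parts are omitted, and the function names, the evenly spaced initialization, and the default fuzzifier m = 2 are assumptions.

```python
def fuzzy_c_means_1d(values, n_clusters, m=2.0, n_iter=100, tol=1e-9):
    """Textbook fuzzy C-means on one continuous attribute.

    Returns (centers, memberships); memberships[i][k] is the degree to
    which values[i] belongs to cluster k, computed in the final iteration.
    """
    lo, hi = min(values), max(values)
    # Assumed deterministic initialization: centers evenly spaced over the range.
    if n_clusters == 1:
        centers = [(lo + hi) / 2.0]
    else:
        centers = [lo + k * (hi - lo) / (n_clusters - 1) for k in range(n_clusters)]
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if any(d == 0.0 for d in dists):
                # The point sits exactly on one or more centers: share full
                # membership among those centers to avoid dividing by zero.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(hits)
                row = [h / total for h in hits]
            else:
                row = [1.0 / sum((dk / dj) ** exponent for dj in dists) for dk in dists]
            memberships.append(row)
        # Each new center is the mean of the values weighted by membership**m.
        new_centers = []
        for k in range(n_clusters):
            weights = [row[k] ** m for row in memberships]
            new_centers.append(sum(w * x for w, x in zip(weights, values)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


def discretize(values, n_clusters):
    """Map each continuous value to the index of its highest-membership cluster."""
    _, memberships = fuzzy_c_means_1d(values, n_clusters)
    return [max(range(n_clusters), key=lambda k: row[k]) for row in memberships]


if __name__ == "__main__":
    attribute = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
    print(discretize(attribute, 2))
```

In this sketch the number of clusters is passed in by hand. Choosing it per attribute is the job the abstract assigns to the cluster validity index.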

Search for Non-Gaussianities in the WMAP Data with the Scaling Index Method

Advances in Astronomy ◽

10.1155/2011/174873 ◽

2011 ◽

Vol 2011 ◽

pp. 1-21 ◽

Cited By ~ 5

Author(s):

G. Rossmanith ◽

H. Modest ◽

C. Räth ◽

A. J. Banday ◽

K. M. Górski ◽

...

Keyword(s):

Structural Characteristics ◽

Microwave Background ◽

Index Method ◽

Data Set ◽

Local Type ◽

Statistical Measures ◽

WMAP Data ◽

Statistical Isotropy ◽

Scaling Index ◽

Scaling Index Method

In recent years, non-Gaussianity and statistical isotropy of the Cosmic Microwave Background (CMB) have been investigated with various statistical measures, first and foremost by means of the measurements of the WMAP satellite. In this paper, we focus on the analyses that were accomplished with a measure of local type, the so-called Scaling Index Method (SIM). The SIM is able to detect structural characteristics of a given data set and has proven to be highly valuable in CMB analysis. It was used for comparing the data set with simulations as well as surrogates, which are full-sky maps generated by randomisation of previously selected features of the original map. During these investigations, strong evidence for non-Gaussianities as well as asymmetries and local features could be detected. In combination with the surrogates approach, the SIM detected the highest significances for non-Gaussianity to date.
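To make the idea of a local scaling measure concrete, the sketch below computes a weighted scaling index for each point of a point set. It assumes the index is the logarithmic derivative, with respect to the scale r, of a Gaussian-weighted local point count rho(r) = sum_j exp(-(d_ij / r)^q). Whether the paper uses exactly this weighting, the exponent q = 2, or this embedding of the data is not stated in the abstract and is an assumption here, as is the function name.

```python
import math


def scaling_indices(points, r, q=2.0):
    """Weighted scaling index alpha_i = d log(rho_i) / d log(r) for each point.

    rho_i(r) = sum_j exp(-(d_ij / r)**q), so the log-derivative is
    alpha_i = sum_j q * s_ij * exp(-s_ij) / sum_j exp(-s_ij), with s_ij = (d_ij / r)**q.
    The sum includes j = i, which keeps the denominator at least 1.
    """
    alphas = []
    for p in points:
        numerator = 0.0
        denominator = 0.0
        for other in points:
            s = (math.dist(p, other) / r) ** q
            w = math.exp(-s)
            numerator += q * s * w
            denominator += w
        alphas.append(numerator / denominator)
    return alphas


if __name__ == "__main__":
    # Points along a line, probed at a scale of two grid steps.
    line = [(float(i), 0.0) for i in range(50)]
    print(scaling_indices(line, r=2.0)[25])
```

Under this definition, points in clustered, line-like, and sheet-like surroundings receive different index values, which is the sense in which such a measure characterizes local structure.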
