A Novel Approach of Deduplication on Indian Demographic Variation for Large Structured Data

Author(s):  
Krishnanjan Bhattacharjee ◽  
Chahat Garg ◽  
S. Shivakarthik ◽  
Swati Mehta ◽  
Ajai Kumar ◽  
...  
Author(s):  
Khayra Bencherif ◽  
Mimoun Malki ◽  
Djamel Amar Bensaber

This article describes how the Linked Open Data Cloud project allows data providers to publish structured data on the web according to the Linked Data principles. In this context, several link discovery frameworks have been developed for connecting entities contained in knowledge bases. To achieve high effectiveness in the link discovery task, a suitable link configuration is required to specify the similarity conditions. Unfortunately, such configurations are specified manually, which makes the link discovery task tedious and difficult for users. In this article, the authors address this drawback by proposing a novel approach for the automatic determination of link specifications. The proposed approach is based on a neural network model that combines a set of existing metrics into a compound one. The authors evaluate the effectiveness of the proposed approach in three experiments using real data sets from the LOD Cloud. In addition, the proposed approach is compared against existing link specification approaches and shown to outperform them in most experiments.
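
As a rough illustration of combining several similarity metrics into one compound measure with a neural model, the sketch below feeds two common string metrics (token Jaccard and normalized edit distance) into a single hand-weighted logistic neuron. The weights, bias, threshold and choice of metrics are illustrative assumptions, not the paper's learned configuration.

```python
import math

def levenshtein(a, b):
    # classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def metrics(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    jaccard = len(ta & tb) / len(ta | tb) if ta | tb else 1.0
    lev = 1 - levenshtein(a, b) / max(len(a), len(b), 1)
    return [jaccard, lev]

def compound_similarity(a, b, weights=(2.5, 2.5), bias=-2.0):
    # a single "neuron": sigmoid of a weighted sum of the base metrics
    z = bias + sum(w * m for w, m in zip(weights, metrics(a, b)))
    return 1 / (1 + math.exp(-z))

def is_link(a, b, threshold=0.5):
    # a link specification: declare a match when the compound score clears a threshold
    return compound_similarity(a, b) >= threshold
```

In the actual approach the weights would be trained rather than hand-set; the point here is only that one learned function can subsume several manually tuned similarity conditions.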


2018 ◽  
Vol 10 (10) ◽  
pp. 98
Author(s):  
Prakash Hardaha ◽  
Shailendra Singh

Due to the exponential growth of data and data services, visiting multiple webs/apps raises three issues for a user: (1) consumption of extra bytes; (2) the time-consuming process of navigating within the webs/apps; (3) the tedious task of remembering the addresses of webs/apps along with their credentials. Data mashup is a set of techniques and user-friendly approaches that not only resolves the above issues but also allows an ordinary user to fetch required data from multiple disparate data sources and to create an integrated view in a self-defined digital place. In this paper, we propose an extension of the existing REST protocol called Structured Data REST (SDRest) and a novel, user-friendly approach that allows even ordinary users to develop end-to-end data mashups, using the innovative concepts of the Structured Data Mashup Box (SDMB) and the One Time Configuration (OTC)-Any Time Access (ATA) models. Our implementation shows that pre-mashup configuration can easily be performed by an ordinary user, and that an integrated user interface view of the end-user data mashup can be created without any technical knowledge or programming. We have also evaluated the proposed work by comparing it with related works, and found that it offers a user-friendly, configurable approach built on current state-of-the-art techniques, involving not only the ordinary user but also the mashup service provider and the data service provider in developing public, private and hybrid data mashups.
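
The One Time Configuration / Any Time Access idea can be caricatured in a few lines: the user saves a view configuration once, and a single call later assembles the integrated view from the registered sources. The source names, fields and in-memory "fetchers" below are hypothetical stand-ins for real data services, not part of the SDRest protocol itself.

```python
# hypothetical registered data sources; in practice these would be REST endpoints
SOURCES = {
    "weather": lambda: {"city": "Pune", "temp_c": 31},
    "news":    lambda: {"headline": "Monsoon arrives early"},
}

# One Time Configuration: the user picks fields once and the config is persisted
config = {"view": ["weather.temp_c", "news.headline"]}

def access(config):
    """Any Time Access: evaluate the saved config into one integrated view."""
    cache = {name: fetch() for name, fetch in SOURCES.items()}
    view = {}
    for path in config["view"]:
        source, field = path.split(".")
        view[path] = cache[source][field]
    return view

mashup = access(config)
```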


Author(s):  
Protima Banerjee

Over the past few decades, data mining has emerged as a field of research critical to understanding and assimilating the large stores of data accumulated by corporations, government agencies, and laboratories. Early on, mining algorithms and techniques were limited to relational data sets coming directly from On-Line Transaction Processing (OLTP) systems, or from a consolidated enterprise data warehouse. However, recent work has begun to extend the limits of data mining strategies to include “semi-structured data such as HTML and XML texts, symbolic sequences, ordered trees and relations represented by advanced logics” (Washio and Motoda, 2003). The goal of any data mining endeavor is to detect and extract patterns in the data sets being examined. Semantic data mining is a novel approach that makes use of graph topology, one of the most fundamental and generic mathematical constructs, together with semantic meaning, to scan semi-structured data for patterns. This technique has the potential to be especially powerful, as graph data representation can capture many types of semantic relationships. Current research efforts in this field are focused on utilizing graph-structured semantic information to derive complex and meaningful relationships in a wide variety of application areas, national security and web mining being foremost among these. In this article, we review significant segments of recent data mining research that feed into semantic data mining and describe some promising application areas.
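
The core idea of scanning graph-structured semantic data for patterns can be shown with triple matching over a toy knowledge graph. The triples and the "?"-variable convention below are illustrative (loosely SPARQL-like), not any specific system's API.

```python
# a tiny semantic graph as (subject, predicate, object) triples
triples = [
    ("alice", "works_for", "acme"),
    ("bob", "works_for", "acme"),
    ("alice", "knows", "bob"),
    ("acme", "located_in", "berlin"),
]

def match(pattern, triples):
    """Return variable bindings for a triple pattern; '?'-prefixed terms are variables."""
    results = []
    for s, p, o in triples:
        binding, ok = {}, True
        for term, value in zip(pattern, (s, p, o)):
            if term.startswith("?"):
                binding[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            results.append(binding)
    return results

# pattern query: who works for acme?
employees = [b["?x"] for b in match(("?x", "works_for", "acme"), triples)]
```

Real semantic data mining composes many such patterns over large graphs; the single-pattern scan here is just the smallest building block.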


2018 ◽  
Vol 48 (3) ◽  
pp. 1175-1218 ◽  
Author(s):  
Matthias A. Fahrenwaldt ◽  
Stefan Weber ◽  
Kerstin Weske

We develop a novel approach for pricing cyber insurance contracts. The considered cyber threats, such as viruses and worms, diffuse in a structured data network. The spread of the cyber infection is modeled by an interacting Markov chain. Conditional on the underlying infection, the occurrence and size of claims are described by a marked point process. We introduce and analyze a new polynomial approximation of claims together with a mean-field approach that makes it possible to compute aggregate expected losses and prices of cyber insurance. Numerical case studies demonstrate the impact of the network topology and indicate that higher-order approximations are indispensable for the analysis of non-linear claims.
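
A minimal sketch of the modeling pipeline, under strong simplifying assumptions: discrete-time SIS infection dynamics on a small star network, Monte Carlo estimation of per-node infection probabilities, and a flat claim size per infected node. The rates, topology and claim model are invented for illustration; the paper's interacting Markov chain, marked point process and mean-field analysis are far richer.

```python
import random

# star network: node 0 is the hub; rates are illustrative
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
BETA, DELTA = 0.3, 0.1   # per-edge infection and per-step recovery probabilities

def step(state, rng):
    new = {}
    for v, infected in state.items():
        if infected:
            new[v] = rng.random() > DELTA          # stays infected unless it recovers
        else:
            # each infected neighbour independently transmits with probability BETA
            new[v] = any(state[u] and rng.random() < BETA for u in adj[v])
    return new

def expected_infected(T=50, runs=2000, seed=7):
    rng = random.Random(seed)
    totals = {v: 0 for v in adj}
    for _ in range(runs):
        state = {v: v == 0 for v in adj}           # infection starts at the hub
        for _ in range(T):
            state = step(state, rng)
        for v in state:
            totals[v] += state[v]
    return {v: totals[v] / runs for v in totals}   # Monte Carlo infection probabilities

probs = expected_infected()
loss_per_node = 100.0                              # flat claim size, illustrative
premium = loss_per_node * sum(probs.values())      # expected aggregate loss
```

Even this toy version shows the topology effect the case studies highlight: the hub's infection probability differs from the leaves', so expected losses depend on where in the network a policyholder sits.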


Author(s):  
JIAN-WEI TIAN ◽  
WEN-HUI QI ◽  
XIAO-XIAO LIU

A great deal of data on the Web lies in hidden databases, or the deep Web. Most deep Web data is not directly available and can only be accessed through query interfaces. Current research on deep Web search has focused on crawling deep Web data via Web interfaces with keyword queries. However, these keyword-based methods have inherent limitations because of the multi-attribute and top-k features of the deep Web. In this paper we propose a novel approach for siphoning structured data with structured queries. First, in order to retrieve all the data in hidden databases without repetition, we model the hidden database as a hierarchy tree. Under this theoretical framework, data retrieval is transformed into a tree-traversal problem. We also propose techniques to narrow the query space by using a heuristic rule, based on mutual information, to guide the traversal process. We conduct extensive experiments over real deep Web sites and controlled databases to illustrate the coverage and efficiency of our techniques.
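
The hierarchy-tree view can be sketched as follows: a mock top-k interface reports whether results overflowed, and the crawler drills down one attribute level only on overflow, so every record is eventually retrieved without repetition. The schema, domains and k are invented for illustration, and the mutual-information heuristic for ordering attributes is omitted.

```python
# a mock hidden database exposed only through a top-k query interface
RECORDS = [
    {"make": "ford", "year": 2018}, {"make": "ford", "year": 2019},
    {"make": "ford", "year": 2020}, {"make": "bmw", "year": 2019},
    {"make": "bmw", "year": 2020},
]
K = 2  # the interface returns at most K matching records

def query(constraints):
    hits = [r for r in RECORDS
            if all(r[a] == v for a, v in constraints.items())]
    return hits[:K], len(hits) > K       # visible results, overflow flag

DOMAINS = {"make": ["ford", "bmw"], "year": [2018, 2019, 2020]}

def crawl(constraints=None, depth=0, attrs=("make", "year")):
    """DFS over the query hierarchy tree; drill down only when results overflow."""
    constraints = constraints or {}
    hits, overflow = query(constraints)
    if not overflow or depth == len(attrs):
        return {tuple(sorted(r.items())) for r in hits}
    seen = set()
    attr = attrs[depth]
    for value in DOMAINS[attr]:
        seen |= crawl({**constraints, attr: value}, depth + 1, attrs)
    return seen

all_records = crawl()
```

The heuristic rule in the paper would choose which attribute to branch on next; here the branching order is simply fixed.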


Author(s):  
Jaemin Yoo ◽  
Hyunsik Jeon ◽  
U Kang

Given graph-structured data, how can we train a robust classifier in a semi-supervised setting that performs well without neighborhood information? In this work, we propose belief propagation networks (BPN), a novel approach to train a deep neural network in a hard inductive setting, where the test data are given without neighborhood information. BPN uses a differentiable classifier to compute the prior distributions of nodes, and then diffuses the priors through the graphical structure, independently from the prior computation. This separable structure improves the generalization performance of BPN for isolated test instances, compared with previous approaches that jointly use the features and the neighborhood without distinction. As a result, BPN outperforms state-of-the-art methods on four datasets by an average margin of 2.4 percentage points in accuracy.
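
A simplified stand-in for BPN's two separated stages, on toy data: a feature-only "classifier" produces per-node priors, and a diffusion step then mixes each prior with neighbours' beliefs; an isolated test node simply keeps its prior. This caricature is closer to label propagation than true belief propagation and is meant only to show why separating the stages helps isolated instances.

```python
import math

features = {0: 2.0, 1: 1.5, 2: -1.8, 3: -2.2, 4: 0.1}   # one feature per node
edges = {0: [1], 1: [0, 4], 2: [3], 3: [2, 4], 4: [1, 3]}

def prior(x):
    # stand-in for the differentiable classifier: logistic over the feature
    return 1 / (1 + math.exp(-x))

def diffuse(priors, steps=10, alpha=0.5):
    """Mix each node's prior with its neighbours' current beliefs."""
    belief = dict(priors)
    for _ in range(steps):
        belief = {
            v: (1 - alpha) * priors[v]
               + alpha * sum(belief[u] for u in edges[v]) / len(edges[v])
            for v in belief
        }
    return belief

priors = {v: prior(x) for v, x in features.items()}
beliefs = diffuse(priors)

def classify_isolated(x):
    # a node with no neighborhood information falls back to its prior alone
    return prior(x) > 0.5
```

Because the prior computation never consumes neighborhood features, the same classifier applies unchanged to isolated test instances, which is the property the hard inductive setting demands.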


2021 ◽  
Vol 5 (OOPSLA) ◽  
pp. 1-29
Author(s):  
Stefan Malewski ◽  
Michael Greenberg ◽  
Éric Tanter

Dynamically-typed languages offer easy interaction with ad hoc data such as JSON and S-expressions; statically-typed languages offer powerful tools for working with structured data, notably algebraic datatypes, which are a core feature of typed languages both functional and otherwise. Gradual typing aims to reconcile dynamic and static typing smoothly. The gradual typing literature has extensively focused on the computational aspect of types, such as type safety, effects, noninterference, or parametricity, but the application of graduality to data structuring mechanisms has been much less explored. While row polymorphism and set-theoretic types have been studied in the context of gradual typing, algebraic datatypes in particular have not, which is surprising considering their wide use in practice. We develop, formalize, and prototype a novel approach to gradually structured data with algebraic datatypes. Gradually structured data bridges the gap between traditional algebraic datatypes and flexible data management mechanisms such as tagged data in dynamic languages, or polymorphic variants in OCaml. We illustrate the key ideas of gradual algebraic datatypes through the evolution of a small server application from dynamic to progressively more static checking, formalize a core functional language with gradually structured data, and establish its metatheory, including the gradual guarantees.
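
The gap that gradually structured data bridges can be evoked even in a unityped language: the sketch below treats shapes as openly tagged data, and a strict flag toggles between tolerating unknown constructors (dynamic style) and rejecting them (closed, static style). This is an informal analogy in Python, not the paper's calculus.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    """Tagged data, as a dynamic language would pass it around."""
    tag: str
    payload: object

def area(shape, strict=False):
    if shape.tag == "circle":
        return 3.14159 * shape.payload ** 2
    if shape.tag == "square":
        return shape.payload ** 2
    if strict:
        # "static" mode: the datatype is closed, unknown constructors are errors
        raise TypeError(f"unknown constructor {shape.tag!r}")
    return None  # "dynamic" mode: ad hoc tags are tolerated
```

Gradual algebraic datatypes make this migration from open to closed checking a property of the type system, with the gradual guarantees ensuring each tightening step behaves predictably.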


Author(s):  
Anusha A R

With the rapid growth in the number and dimension of databases and database applications holding healthcare records, it is necessary to design a system that achieves automatic extraction of facts from huge tables. At the same time, there is a challenge in managing unstructured data, as analyzing it and extracting actionable intelligence from it is highly difficult. Preprocessing is an important task and a critical step in text mining, regular-expression processing and information retrieval. The extraction of key data from unstructured data is often difficult. The objective of this project is to transform unstructured healthcare data into structured data, in particular to gain insight and to generate appropriate structured data.
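
A minimal sketch of the regex-based preprocessing step, assuming a hypothetical note format and field names: each pattern pulls one field out of the free text, and missing fields become None so the output rows stay uniform.

```python
import re

# a hypothetical free-text clinical note; the format and fields are illustrative
note = ("Patient: John Doe, Age: 54. BP 130/85 mmHg recorded on 12/03/2021. "
        "Diagnosis: Type 2 Diabetes.")

PATTERNS = {
    "name":      r"Patient:\s*([A-Za-z ]+?),",
    "age":       r"Age:\s*(\d{1,3})",
    "bp":        r"BP\s*(\d{2,3}/\d{2,3})",
    "date":      r"on\s*(\d{2}/\d{2}/\d{4})",
    "diagnosis": r"Diagnosis:\s*([^.]+)",
}

def to_structured(text):
    """Map a free-text note to a flat record; absent fields become None."""
    record = {}
    for field, pattern in PATTERNS.items():
        m = re.search(pattern, text)
        record[field] = m.group(1).strip() if m else None
    return record

row = to_structured(note)
```

A batch of such rows can then be loaded into a table for the downstream analysis the project targets.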

