CDEF: Conceptual Data Extraction Framework for Heterogeneous Data

Author(s):  
Apurva Kulkarni
Chandrashekar Ramanathan
2021
pp. 016555152198964
Author(s):  
Yohann Chasseray
Anne-Marie Barthe-Delanoë
Stéphane Négny
Jean-Marc Le Lann

As the next step in the development of intelligent computing systems is the integration of human expertise and knowledge, building robust, computable and well-documented knowledge bases is a priority. Ontologies partially answer this challenge by providing formalisms for knowledge representation. However, one major remaining task is the population of these ontologies in concrete applications. Based on Model-Driven Engineering principles, a generic metamodel for the extraction of heterogeneous data is presented in this article. The metamodel has been designed with two objectives: (1) genericity with respect to the source of the collected pieces of knowledge and (2) adherence to a structure close to an ontological structure. An example instantiation of the metamodel for textual data in the chemistry domain is also given, together with an insight into how this metamodel could be integrated into a larger automated, domain-independent ontology population framework.
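
As a rough illustration of what such an extraction metamodel could look like, the following Python sketch models extracted pieces of knowledge with a structure close to an ontological one (concepts, relations, provenance). The class and field names are illustrative assumptions, not the authors' actual metamodel.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Concept:
        """Candidate ontology class extracted from some source."""
        label: str
        source: str              # provenance: document, database, sensor, ...

    @dataclass
    class Relation:
        """Candidate ontology property linking two extracted concepts."""
        subject: Concept
        predicate: str
        obj: Concept

    @dataclass
    class KnowledgeFragment:
        """Source-independent container mirroring an ontological structure."""
        concepts: List[Concept] = field(default_factory=list)
        relations: List[Relation] = field(default_factory=list)

    # Example instantiation for a textual source in the chemistry domain
    ethanol = Concept("ethanol", source="chemistry_corpus.txt")
    solvent = Concept("solvent", source="chemistry_corpus.txt")
    fragment = KnowledgeFragment(concepts=[ethanol, solvent],
                                 relations=[Relation(ethanol, "isA", solvent)])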


Author(s):  
Anna Bernasconi
Arif Canakoglu
Marco Masseroli
Stefano Ceri

Thousands of new experimental datasets become available every day; in many cases they are produced within the scope of large cooperative efforts involving laboratories spread all over the world, and are typically open for public use. Although the potential collective amount of available information is huge, the effective combination of such public sources is hindered by data heterogeneity, as the datasets exhibit a wide variety of notations and formats, concerning both experimental values and metadata. Data integration is therefore becoming a fundamental activity, to be performed prior to data analysis and biological knowledge discovery. It consists of successive steps of data extraction, normalization, matching and enrichment; applied to heterogeneous data sources, it builds multiple perspectives over the genome, leading to the identification of meaningful relationships that could not be perceived when using incompatible data formats. In this paper, we first describe a technological pipeline from data production to data integration; we then propose a taxonomy of genomic data players (distinguishing contributors, repository hosts, consortia, integrators and consumers) and apply it to describe about 30 important players in genomic data management. We focus specifically on the integrator players, analyse the issues they face in solving the genomic data integration challenges, and evaluate the computational environments that they provide for following up data integration with visualization and analysis tools.
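
For a concrete, if drastically simplified, picture of the extraction, normalization, matching and enrichment steps mentioned above, here is a Python sketch that maps heterogeneous metadata records onto a common attribute set and matches free-text values against a small controlled vocabulary; the repository names, key mappings and vocabulary are invented for the example.

    # Hypothetical illustration of metadata normalization and matching
    # across heterogeneous genomic repositories (names and mappings invented).
    RAW_RECORDS = [
        {"repo": "RepoA", "Cell line": "K562", "Assay": "ChIP-seq"},
        {"repo": "RepoB", "cell_type": "K-562", "experiment": "chip seq"},
    ]

    # extraction/normalization: repository-specific keys -> common schema
    KEY_MAP = {"Cell line": "cell_line", "cell_type": "cell_line",
               "Assay": "assay", "experiment": "assay"}

    # matching/enrichment: align free-text values to a controlled vocabulary
    VOCAB = {"k562": "K562", "k-562": "K562",
             "chip-seq": "ChIP-seq", "chip seq": "ChIP-seq"}

    def normalize(record):
        """Map repository-specific keys to a shared schema and align values."""
        out = {"repo": record["repo"]}
        for key, value in record.items():
            if key in KEY_MAP:
                out[KEY_MAP[key]] = VOCAB.get(value.strip().lower(), value)
        return out

    integrated = [normalize(r) for r in RAW_RECORDS]
    print(integrated)   # both records now share the same attributes and spellings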


2018
Vol 7 (4.37)
pp. 168
Author(s):  
Nadia Ibrahim
Alaa Hassan
Marwah Nihad

This study considers techniques for large-scale data extraction, including the detection of patterns and hidden relationships between factors and the retrieval of the required information. Rapid analysis of massive data can lead to innovation and to concepts of theoretical value. Compared with mining traditional data sets, mining vast amounts of interdependent heterogeneous data can expand knowledge and ideas about the target domain. In this research we studied data mining on the Internet, where the networks used to extract data from different locations can be complex; web technology has been used to extract and analyse such data (Marwah et al., 2016). We extracted information from large numbers of web pages, examined the pages of each site using Java code, and stored the extracted information in a dedicated database for each web page. We used the data network function to evaluate and categorize the pages found, identifying trusted and risky web pages, and exported the data to CSV files. These data were then examined and classified using WEKA to obtain accurate results. We conclude from the results that the applied data mining algorithms outperform other techniques in the classification and extraction of data and in performance.
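
The extract-store-classify-export workflow described above can be sketched as follows; the authors used Java and WEKA, so this Python fragment, with scikit-learn standing in for WEKA and with invented page features and toy training labels, is only an illustration of the idea.

    import csv
    from sklearn.tree import DecisionTreeClassifier

    # features extracted per page: [external links, uses HTTPS, number of forms]
    pages = {"http://example.org/a": [12, 1, 0],
             "http://example.org/b": [340, 0, 5]}

    train_X = [[10, 1, 0], [400, 0, 6], [15, 1, 1], [250, 0, 4]]
    train_y = ["trusted", "risky", "trusted", "risky"]   # hand-labelled examples

    clf = DecisionTreeClassifier().fit(train_X, train_y)

    # export extracted features plus the predicted label to CSV for later analysis
    with open("pages.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "ext_links", "https", "forms", "label"])
        for url, feats in pages.items():
            writer.writerow([url, *feats, clf.predict([feats])[0]])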


2015
Vol 09 (03)
pp. 353-372
Author(s):  
Qingliang Miao
Ruiyu Fang
Yao Meng

The development of modern health care and clinical practice increases the need for nutritional and medical data extraction and integration across heterogeneous data sources. It would be useful for researchers and patients if relevant information could be extracted and organized as easily shared, machine-processable linked data. In this paper, we describe an automatic approach that extracts and publishes nutritional linked data, including nutritional concepts and relationships extracted from nutritional data sources, and links these data with the Linked Open Data cloud. In particular, a CRF-based approach is used to mine food, ingredient and disease entities and their relationships from nutritional text. An extended nutritional ontology is then used to organize the extracted data. Finally, we assign semantic links between food, ingredient and disease entities and equivalent entities in DBpedia, Diseasome and LinkedCT.
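
A minimal sketch of CRF-based entity tagging in the spirit of the approach described above, using the sklearn-crfsuite package; the tokens, handcrafted features and the FOOD/INGREDIENT/DISEASE tag set are assumptions for illustration, not the authors' actual model or training data.

    import sklearn_crfsuite

    def token_features(sent, i):
        """Simple handcrafted features for one token (illustrative only)."""
        return {"lower": sent[i].lower(),
                "is_title": sent[i].istitle(),
                "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
                "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>"}

    train_sents = [["Spinach", "is", "rich", "in", "iron"],
                   ["Garlic", "may", "reduce", "hypertension"]]
    train_tags = [["B-FOOD", "O", "O", "O", "B-INGREDIENT"],
                  ["B-FOOD", "O", "O", "B-DISEASE"]]

    X = [[token_features(s, i) for i in range(len(s))] for s in train_sents]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit(X, train_tags)

    test = ["Broccoli", "contains", "calcium"]
    print(crf.predict([[token_features(test, i) for i in range(len(test))]]))

The predicted entity spans could then be linked to equivalent resources in DBpedia, Diseasome or LinkedCT, for instance by label lookup against those datasets.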


Author(s):  
W.J. de Ruijter
M.R. McCartney
David J. Smith
J.K. Weiss

Further advances in resolution enhancement of transmission electron microscopes can be expected from digital processing of image data recorded with slow-scan CCD cameras. Image recording with these new cameras is essential because of their high sensitivity, extreme linearity and negligible geometric distortion. Furthermore, digital image acquisition allows on-line processing, which yields virtually immediate reconstruction results. At present, the most promising techniques for exit-surface wave reconstruction are electron holography and the recently proposed focal variation method. The latter method is based on image processing applied to a series of images recorded at equally spaced defocus values. Exit-surface wave reconstruction using the focal variation method as proposed by Van Dyck and Op de Beeck proceeds in two stages. First, the complex image wave is retrieved by data extraction from a parabola situated in three-dimensional Fourier space. Then the objective-lens spherical aberration, astigmatism and defocus are corrected by simply dividing the image wave by the wave aberration function calculated with the appropriate objective-lens aberration coefficients, which yields the exit-surface wave.
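
As a worked illustration of the final correction step (dividing the retrieved image wave, in Fourier space, by the phase factor built from the wave aberration function), here is a NumPy sketch; the aberration function below includes only defocus and spherical aberration, and all numerical values are placeholders rather than parameters from the paper.

    import numpy as np

    n, px = 256, 0.05e-9            # image size (pixels) and pixel size (m), assumed
    wavelength = 2.51e-12           # electron wavelength at about 200 kV (m)
    defocus, cs = -50e-9, 0.5e-3    # defocus (m) and spherical aberration Cs (m)

    image_wave = np.ones((n, n), dtype=complex)   # stands in for the retrieved image wave

    qx = np.fft.fftfreq(n, d=px)                  # spatial frequencies (1/m)
    q2 = qx[None, :] ** 2 + qx[:, None] ** 2      # |q|^2 on the 2-D grid

    # wave aberration function: defocus + spherical aberration (astigmatism omitted)
    chi = np.pi * wavelength * defocus * q2 + 0.5 * np.pi * cs * wavelength ** 3 * q2 ** 2

    # divide by exp(-i*chi) in Fourier space to recover the exit-surface wave
    exit_wave = np.fft.ifft2(np.fft.fft2(image_wave) / np.exp(-1j * chi))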


Author(s):  
Mitsuji MUNEYASU
Nayuta JINDA
Yuuya MORITANI
Soh YOSHIDA

2013
Vol 7 (2)
pp. 574-579
Author(s):  
Dr Sunitha Abburu
G. Suresh Babu

Day by day, the volume of information available on the web is growing significantly. Information on the web exists in several forms: structured, semi-structured and unstructured. The majority of information on the web is presented in web pages, and the information presented in web pages is semi-structured. However, the information required for a given context is scattered across different web documents. It is difficult to analyse the large volumes of semi-structured information presented in web pages and to make decisions based on that analysis. The current research work proposes a framework for a system that extracts information from various sources and prepares reports based on the knowledge built from the analysis. This simplifies data extraction, data consolidation, data analysis and decision making based on the information presented in web pages. The proposed framework integrates web crawling, information extraction and data mining technologies for better information analysis that supports effective decision making. It enables people and organizations to extract information from various web sources and to perform an effective analysis of the extracted data. The proposed framework is applicable to any application domain; manufacturing, sales, tourism and e-learning are a few example applications. The framework has been implemented and tested for effectiveness, and the results are promising.
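
To make the crawl-extract-analyse integration concrete, here is a small Python sketch using requests and BeautifulSoup; the seed URL, the extracted fields and the word-frequency "analysis" step are placeholders for whatever a real application domain (manufacturing, sales, tourism, e-learning, ...) would require.

    from collections import Counter
    import requests
    from bs4 import BeautifulSoup

    def crawl(seed_urls, max_pages=10):
        """Fetch pages breadth-first, keeping each page's text and outgoing links."""
        seen, queue, records = set(), list(seed_urls), []
        while queue and len(records) < max_pages:
            url = queue.pop(0)
            if url in seen:
                continue
            seen.add(url)
            soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
            records.append({"url": url, "text": soup.get_text(" ", strip=True)})
            queue.extend(a["href"] for a in soup.find_all("a", href=True)
                         if a["href"].startswith("http"))
        return records

    def analyse(records, top=10):
        """Toy 'data mining' step: most frequent terms across the crawled pages."""
        words = Counter(w.lower() for r in records for w in r["text"].split())
        return words.most_common(top)

    if __name__ == "__main__":
        print(analyse(crawl(["https://example.com"])))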

