CDEF: Conceptual Data Extraction Framework for Heterogeneous Data

Author(s):  
Apurva Kulkarni
Chandrashekar Ramanathan
2021
pp. 016555152198964
Author(s):  
Yohann Chasseray
Anne-Marie Barthe-Delanoë
Stéphane Négny
Jean-Marc Le Lann

As the next step in the development of intelligent computing systems is the integration of human expertise and knowledge, building robust, computable and well-documented knowledge bases is a priority. Ontologies partially answer this challenge by providing formalisms for knowledge representation. However, one major remaining task is the population of these ontologies in concrete applications. Based on Model-Driven Engineering principles, a generic metamodel for the extraction of heterogeneous data is presented in this article. The metamodel has been designed with two objectives: (1) genericity with respect to the source of the collected pieces of knowledge and (2) adherence to a structure close to an ontological structure. An example instantiation of the metamodel for textual data in the chemistry domain is also given, together with an insight into how this metamodel could be integrated into a larger automated, domain-independent ontology population framework.
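
As a rough illustration of what such an extraction metamodel could look like, the following Python sketch models extracted pieces of knowledge with a structure close to an ontological one (concepts, relations, provenance). The class and field names are illustrative assumptions, not the authors' actual metamodel.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Concept:
        """Candidate ontology class extracted from some source."""
        label: str
        source: str              # provenance: document, database, sensor, ...

    @dataclass
    class Relation:
        """Candidate ontology property linking two extracted concepts."""
        subject: Concept
        predicate: str
        obj: Concept

    @dataclass
    class KnowledgeFragment:
        """Source-independent container mirroring an ontological structure."""
        concepts: List[Concept] = field(default_factory=list)
        relations: List[Relation] = field(default_factory=list)

    # Example instantiation for a textual source in the chemistry domain
    ethanol = Concept("ethanol", source="chemistry_corpus.txt")
    solvent = Concept("solvent", source="chemistry_corpus.txt")
    fragment = KnowledgeFragment(concepts=[ethanol, solvent],
                                 relations=[Relation(ethanol, "isA", solvent)])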


Author(s):  
Anna Bernasconi
Arif Canakoglu
Marco Masseroli
Stefano Ceri

Thousands of new experimental datasets become available every day; in many cases they are produced within the scope of large cooperative efforts involving laboratories spread all over the world, and are typically open for public use. Although the potential collective amount of available information is huge, the effective combination of such public sources is hindered by data heterogeneity, as the datasets exhibit a wide variety of notations and formats, concerning both experimental values and metadata. Data integration is therefore becoming a fundamental activity, to be performed prior to data analysis and biological knowledge discovery. It consists of successive steps of data extraction, normalization, matching and enrichment; applied to heterogeneous data sources, it builds multiple perspectives over the genome, leading to the identification of meaningful relationships that could not be perceived when using incompatible data formats. In this paper, we first describe a technological pipeline from data production to data integration; we then propose a taxonomy of genomic data players (distinguishing contributors, repository hosts, consortia, integrators and consumers) and apply it to describe about 30 important players in genomic data management. We focus specifically on the integrator players, analyse the issues they face in solving the genomic data integration challenges, and evaluate the computational environments that they provide for following up data integration with visualization and analysis tools.
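
For a concrete, if drastically simplified, picture of the extraction, normalization, matching and enrichment steps mentioned above, here is a Python sketch that maps heterogeneous metadata records onto a common attribute set and matches free-text values against a small controlled vocabulary; the repository names, key mappings and vocabulary are invented for the example.

    # Hypothetical illustration of metadata normalization and matching
    # across heterogeneous genomic repositories (names and mappings invented).
    RAW_RECORDS = [
        {"repo": "RepoA", "Cell line": "K562", "Assay": "ChIP-seq"},
        {"repo": "RepoB", "cell_type": "K-562", "experiment": "chip seq"},
    ]

    # extraction/normalization: repository-specific keys -> common schema
    KEY_MAP = {"Cell line": "cell_line", "cell_type": "cell_line",
               "Assay": "assay", "experiment": "assay"}

    # matching/enrichment: align free-text values to a controlled vocabulary
    VOCAB = {"k562": "K562", "k-562": "K562",
             "chip-seq": "ChIP-seq", "chip seq": "ChIP-seq"}

    def normalize(record):
        """Map repository-specific keys to a shared schema and align values."""
        out = {"repo": record["repo"]}
        for key, value in record.items():
            if key in KEY_MAP:
                out[KEY_MAP[key]] = VOCAB.get(value.strip().lower(), value)
        return out

    integrated = [normalize(r) for r in RAW_RECORDS]
    print(integrated)   # both records now share the same attributes and spellings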


2018
Vol 7 (4.37)
pp. 168
Author(s):  
Nadia Ibrahim
Alaa Hassan
Marwah Nihad

This study considers techniques for large-scale data extraction, including the detection of patterns and hidden relationships between factors and the retrieval of the required information. Rapid analysis of massive data can lead to innovation and to concepts of theoretical value. Compared with mining traditional data sets, mining vast amounts of interdependent heterogeneous data can expand knowledge and ideas about the target domain. In this research we studied data mining on the Internet, where the networks used to extract data from different locations can be complex; web technology has been used to extract and analyse such data (Marwah et al., 2016). We extracted information from large numbers of web pages, examined the pages of each site using Java code, and stored the extracted information in a dedicated database for each web page. We used the data network function to evaluate and categorize the pages found, identifying trusted and risky web pages, and exported the data to CSV files. These data were then examined and classified using WEKA to obtain accurate results. We conclude from the results that the applied data mining algorithms outperform other techniques in the classification and extraction of data and in performance.
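
The extract-store-classify-export workflow described above can be sketched as follows; the authors used Java and WEKA, so this Python fragment, with scikit-learn standing in for WEKA and with invented page features and toy training labels, is only an illustration of the idea.

    import csv
    from sklearn.tree import DecisionTreeClassifier

    # features extracted per page: [external links, uses HTTPS, number of forms]
    pages = {"http://example.org/a": [12, 1, 0],
             "http://example.org/b": [340, 0, 5]}

    train_X = [[10, 1, 0], [400, 0, 6], [15, 1, 1], [250, 0, 4]]
    train_y = ["trusted", "risky", "trusted", "risky"]   # hand-labelled examples

    clf = DecisionTreeClassifier().fit(train_X, train_y)

    # export extracted features plus the predicted label to CSV for later analysis
    with open("pages.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "ext_links", "https", "forms", "label"])
        for url, feats in pages.items():
            writer.writerow([url, *feats, clf.predict([feats])[0]])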


2015
Vol 09 (03)
pp. 353-372
Author(s):  
Qingliang Miao
Ruiyu Fang
Yao Meng

The development of modern health care and clinical practice increases the need for nutritional and medical data extraction and integration across heterogeneous data sources. It would be useful for researchers and patients if relevant information could be extracted and organized as easily shared, machine-processable linked data. In this paper, we describe an automatic approach that extracts and publishes nutritional linked data, including nutritional concepts and relationships extracted from nutritional data sources, and links these data with the Linked Open Data cloud. In particular, a CRF-based approach is used to mine food, ingredient and disease entities and their relationships from nutritional text. An extended nutritional ontology is then used to organize the extracted data. Finally, we assign semantic links between food, ingredient and disease entities and equivalent entities in DBpedia, Diseasome and LinkedCT.
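
A minimal sketch of CRF-based entity tagging in the spirit of the approach described above, using the sklearn-crfsuite package; the tokens, handcrafted features and the FOOD/INGREDIENT/DISEASE tag set are assumptions for illustration, not the authors' actual model or training data.

    import sklearn_crfsuite

    def token_features(sent, i):
        """Simple handcrafted features for one token (illustrative only)."""
        return {"lower": sent[i].lower(),
                "is_title": sent[i].istitle(),
                "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
                "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>"}

    train_sents = [["Spinach", "is", "rich", "in", "iron"],
                   ["Garlic", "may", "reduce", "hypertension"]]
    train_tags = [["B-FOOD", "O", "O", "O", "B-INGREDIENT"],
                  ["B-FOOD", "O", "O", "B-DISEASE"]]

    X = [[token_features(s, i) for i in range(len(s))] for s in train_sents]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit(X, train_tags)

    test = ["Broccoli", "contains", "calcium"]
    print(crf.predict([[token_features(test, i) for i in range(len(test))]]))

The predicted entity spans could then be linked to equivalent resources in DBpedia, Diseasome or LinkedCT, for instance by label lookup against those datasets.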


Author(s):  
W.J. de Ruijter
M.R. McCartney
David J. Smith
J.K. Weiss

Further advances in resolution enhancement of transmission electron microscopes can be expected from digital processing of image data recorded with slow-scan CCD cameras. Image recording with these new cameras is essential because of their high sensitivity, extreme linearity and negligible geometric distortion. Furthermore, digital image acquisition allows on-line processing, which yields virtually immediate reconstruction results. At present, the most promising techniques for exit-surface wave reconstruction are electron holography and the recently proposed focal variation method. The latter method is based on image processing applied to a series of images recorded at equally spaced defocus values. Exit-surface wave reconstruction using the focal variation method as proposed by Van Dyck and Op de Beeck proceeds in two stages. First, the complex image wave is retrieved by data extraction from a parabola situated in three-dimensional Fourier space. Then the objective-lens spherical aberration, astigmatism and defocus are corrected by simply dividing the image wave by the wave aberration function calculated with the appropriate objective-lens aberration coefficients, which yields the exit-surface wave.
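
As a worked illustration of the final correction step (dividing the retrieved image wave, in Fourier space, by the phase factor built from the wave aberration function), here is a NumPy sketch; the aberration function below includes only defocus and spherical aberration, and all numerical values are placeholders rather than parameters from the paper.

    import numpy as np

    n, px = 256, 0.05e-9            # image size (pixels) and pixel size (m), assumed
    wavelength = 2.51e-12           # electron wavelength at about 200 kV (m)
    defocus, cs = -50e-9, 0.5e-3    # defocus (m) and spherical aberration Cs (m)

    image_wave = np.ones((n, n), dtype=complex)   # stands in for the retrieved image wave

    qx = np.fft.fftfreq(n, d=px)                  # spatial frequencies (1/m)
    q2 = qx[None, :] ** 2 + qx[:, None] ** 2      # |q|^2 on the 2-D grid

    # wave aberration function: defocus + spherical aberration (astigmatism omitted)
    chi = np.pi * wavelength * defocus * q2 + 0.5 * np.pi * cs * wavelength ** 3 * q2 ** 2

    # divide by exp(-i*chi) in Fourier space to recover the exit-surface wave
    exit_wave = np.fft.ifft2(np.fft.fft2(image_wave) / np.exp(-1j * chi))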


Author(s):  
Mitsuji MUNEYASU
Nayuta JINDA
Yuuya MORITANI
Soh YOSHIDA

2013
Vol 7 (2)
pp. 574-579
Author(s):  
Dr Sunitha Abburu
G. Suresh Babu

Day by day, the volume of information available on the web is growing significantly. Information on the web exists in several forms: structured, semi-structured and unstructured. The majority of information on the web is presented in web pages, and the information presented in web pages is semi-structured. However, the information required for a given context is scattered across different web documents. It is difficult to analyse the large volumes of semi-structured information presented in web pages and to make decisions based on that analysis. The current research work proposes a framework for a system that extracts information from various sources and prepares reports based on the knowledge built from the analysis. This simplifies data extraction, data consolidation, data analysis and decision making based on the information presented in web pages. The proposed framework integrates web crawling, information extraction and data mining technologies for better information analysis that supports effective decision making. It enables people and organizations to extract information from various web sources and to perform an effective analysis of the extracted data. The proposed framework is applicable to any application domain; manufacturing, sales, tourism and e-learning are a few example applications. The framework has been implemented and tested for effectiveness, and the results are promising.
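
To make the crawl-extract-analyse integration concrete, here is a small Python sketch using requests and BeautifulSoup; the seed URL, the extracted fields and the word-frequency "analysis" step are placeholders for whatever a real application domain (manufacturing, sales, tourism, e-learning, ...) would require.

    from collections import Counter
    import requests
    from bs4 import BeautifulSoup

    def crawl(seed_urls, max_pages=10):
        """Fetch pages breadth-first, keeping each page's text and outgoing links."""
        seen, queue, records = set(), list(seed_urls), []
        while queue and len(records) < max_pages:
            url = queue.pop(0)
            if url in seen:
                continue
            seen.add(url)
            soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
            records.append({"url": url, "text": soup.get_text(" ", strip=True)})
            queue.extend(a["href"] for a in soup.find_all("a", href=True)
                         if a["href"].startswith("http"))
        return records

    def analyse(records, top=10):
        """Toy 'data mining' step: most frequent terms across the crawled pages."""
        words = Counter(w.lower() for r in records for w in r["text"].split())
        return words.most_common(top)

    if __name__ == "__main__":
        print(analyse(crawl(["https://example.com"])))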

