Integration of Biomolecular Interaction Data in a Genomic and Proteomic Data Warehouse to Support Biomedical Knowledge Discovery

Author(s):  
Arif Canakoglu ◽  
Giorgio Ghisalberti ◽  
Marco Masseroli
Author(s):  
Harkiran Kaur ◽  
Kawaljeet Singh ◽  
Tejinder Kaur

Background: Numerous E-Migrants databases assist migrants in locating their peers in various countries, thereby contributing largely to communication among migrants staying overseas. At present, these traditional E-Migrants databases face the issues of non-scalability, difficult search mechanisms, and burdensome information-update routines. Furthermore, analysis of migrants’ profiles in these databases has remained unaddressed to date, so they do not generate any knowledge. Objective: To design and develop an efficient and multidimensional knowledge discovery framework for E-Migrants databases. Method: In the proposed technique, the results of complex calculations related to the On-Line Analytical Processing (OLAP) operations most probably required by end users are stored in the form of Decision Trees at the pre-processing stage of data analysis. While browsing the cube, these pre-computed results are retrieved, thus offering a Dynamic Cubing feature to end users at runtime. This data-tuning step reduces query processing time and increases the efficiency of the required data warehouse operations. Results: Experiments conducted with a Data Warehouse of around 1000 migrants’ profiles confirm the knowledge discovery power of this proposal. Using the proposed methodology, the authors have designed a framework efficient enough to incorporate the amendments made in the E-Migrants Data Warehouse systems at regular intervals, a capability totally missing in traditional E-Migrants databases. Conclusion: The proposed methodology facilitates migrants in generating dynamic knowledge and visualizing it in the form of dynamic cubes. By applying Business Intelligence mechanisms and blending them with tuned OLAP operations, the authors have managed to transform traditional datasets into an intelligent migrants Data Warehouse.
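The pre-aggregation idea in the Method section can be illustrated with a short sketch: the roll-ups that end users are most likely to request are computed once at the data-tuning step and served from a cache while the cube is browsed. This is a minimal Python illustration of the general idea, not the authors' implementation; the fact table, dimensions, and cache structure are all hypothetical.

```python
# Minimal sketch of OLAP pre-aggregation: likely roll-ups are computed
# once at load time and served from a cache at browse time. All names
# (profiles table, dimensions) are hypothetical stand-ins.
import pandas as pd

# Hypothetical migrant-profile fact table.
profiles = pd.DataFrame({
    "host_country": ["UK", "UK", "USA", "USA", "Canada"],
    "profession":   ["nurse", "engineer", "nurse", "teacher", "engineer"],
    "year":         [2014, 2015, 2014, 2015, 2015],
    "count":        [120, 80, 200, 60, 90],
})

# Pre-compute the roll-ups end users are most likely to request.
precomputed = {
    ("host_country",):
        profiles.groupby(["host_country"])["count"].sum(),
    ("host_country", "profession"):
        profiles.groupby(["host_country", "profession"])["count"].sum(),
}

def browse_cube(dimensions):
    """Serve a cached aggregate if available, else compute on the fly."""
    key = tuple(dimensions)
    if key in precomputed:        # pre-computed at the data-tuning step
        return precomputed[key]
    return profiles.groupby(list(dimensions))["count"].sum()

print(browse_cube(["host_country"]))        # served from the cache
print(browse_cube(["profession", "year"]))  # computed at runtime
```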


2006 ◽  
Vol 45 (02) ◽  
pp. 153-157 ◽  
Author(s):  
F. Azuaje ◽  
N. Bolshakova

Summary Objectives: The main objective of this research is the application of clustering and cluster-validity methods to estimate the number of clusters in cancer tumour datasets. A weighted voting technique is used to improve the prediction of the number of clusters based on different data mining techniques. These tools may be used for the identification of new tumour classes using DNA microarray datasets, and this estimation approach may serve as a useful tool to support biological and biomedical knowledge discovery. Methods: Three clustering and two validation algorithms were applied to two cancer tumour datasets. Recent studies confirm that there is no universal pattern recognition and clustering model that predicts molecular profiles across different datasets. It is therefore useful not to rely on one single clustering or validation method, but to apply a variety of approaches; a combination of these methods may be successfully used to estimate the number of clusters. Results: The methods implemented in this research may contribute to the validation of clustering results and the estimation of the number of clusters. The results show that this estimation approach may represent an effective tool to support biomedical knowledge discovery and healthcare applications. Conclusion: The methods implemented in this research may be successfully used for estimating the number of clusters and validating clustering results, and these tools may serve in the identification of new tumour classes using gene expression profiles.
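The weighted voting step can be illustrated with a minimal sketch: several validity indices score each candidate number of clusters, and each index casts a weighted vote for the k it favours. The index choice (silhouette and Calinski-Harabasz), the weights, and the toy data below are illustrative assumptions, not the paper's exact algorithms.

```python
# Minimal sketch of estimating the number of clusters by combining
# several cluster-validity indices with a weighted vote. Indices,
# weights, and data are illustrative assumptions.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, calinski_harabasz_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)  # stand-in data
candidate_ks = range(2, 8)
weights = {"silhouette": 0.6, "calinski": 0.4}  # hypothetical index weights

# Score every candidate k with each validity index.
scores = {}
for k in candidate_ks:
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = {
        "silhouette": silhouette_score(X, labels),
        "calinski": calinski_harabasz_score(X, labels),
    }

# Each index casts a weighted vote for the k it scores highest.
votes = {k: 0.0 for k in candidate_ks}
for index, weight in weights.items():
    best_k = max(candidate_ks, key=lambda k: scores[k][index])
    votes[best_k] += weight

print("Estimated number of clusters:", max(votes, key=votes.get))
```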


2017 ◽  
Author(s):  
Andysah Putera Utama Siahaan

Knowledge discovery is the process of extracting knowledge from a large amount of data. The quality of the knowledge generated by the knowledge discovery process greatly affects the quality of the decisions based on it. Existing data must be qualified and tested to ensure that knowledge discovery processes can produce knowledge or information that is useful and feasible; this matters for an organization's strategic decision-making. The data warehouse is created by combining multiple operational databases and external data, a process that is very vulnerable to incomplete, inconsistent, and noisy data. Data mining provides mechanisms to clear these deficiencies before the data are finally stored in the data warehouse. This research presents techniques to improve the quality of information in the data warehouse.
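As a minimal illustration of the cleaning step described above, the sketch below rejects incomplete, inconsistent, and noisy records before they would be loaded into a warehouse. The column names and validation rules are hypothetical examples of such checks, not the specific techniques proposed in this research.

```python
# Minimal sketch of cleaning incomplete, inconsistent, and noisy records
# before warehouse loading. Column names and rules are illustrative.
import pandas as pd

raw = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "country": ["ID", "id", "ID", None, "SG"],
    "age": [34, 29, 29, 41, 420],  # 420 is a noisy outlier
})

clean = (
    raw.drop_duplicates(subset="customer_id")               # inconsistent duplicates
       .assign(country=lambda d: d["country"].str.upper())  # normalize codes
       .dropna(subset=["country"])                          # incomplete records
)
clean = clean[clean["age"].between(0, 120)]                 # reject noisy values

print(clean)  # only records that passed all checks would be loaded
```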


2017 ◽  
Vol 19 (1) ◽  
pp. 17-28 ◽  
Author(s):  
Siew-Phek T. Su ◽  
Ashwin Needamangala

Data warehousing technology has been defined by John Ladley as "a set of methods, techniques, and tools that are leveraged together and used to produce a vehicle that delivers data to end users on an integrated platform." (1) This concept has been applied increasingly by industries worldwide to develop data warehouses for decision support and knowledge discovery. In the academic sector, several universities have developed data warehouses containing the universities' financial, payroll, personnel, budget, and student data. (2) These data warehouses across all industries and academia have met with varying degrees of success. Data warehousing technology and its related issues have been widely discussed and published. (3) Little has been done, however, on the application of this cutting-edge technology in the library environment using library data.


2016 ◽  
Vol 25 (S 01) ◽  
pp. S117-S129 ◽  
Author(s):  
L. Sacchi ◽  
J. H. Holmes

Summary Objectives: We sought to explore, via a systematic review of the literature, the state of the art of knowledge discovery in biomedical databases as it existed in 1992 and as it exists now, 25 years later, focusing mainly on supervised learning. Methods: We performed a rigorous systematic search of PubMed and applied latent Dirichlet allocation to identify themes in the literature and trends in the science of knowledge discovery within and between the two time periods, and to compare these trends. We restricted each result set using a bracket of the five preceding years, such that the 1992 result set was restricted to articles published between 1987 and 1992, and the 2015 set to those published between 2011 and 2015. This was to reflect the literature available at the time to researchers and others at the target dates of 1992 and 2015. The search term was framed as: Knowledge Discovery OR Data Mining OR Pattern Discovery OR Pattern Recognition, Automated. Results: A total of 538 and 18,172 documents were retrieved for 1992 and 2015, respectively. The number and type of data sources increased dramatically over the observation period, primarily due to the advent of electronic clinical systems. The period 1992-2015 saw the emergence of new areas of research in knowledge discovery, and the refinement and application of machine learning approaches that were nascent or unknown in 1992. Conclusions: Over the 25 years of the observation period, we identified numerous developments that impacted the science of knowledge discovery, including the availability of new forms of data, new machine learning algorithms, and new application domains. Through a bibliometric analysis we examine the striking changes in the availability of highly heterogeneous data resources and the evolution of new algorithmic approaches to knowledge discovery, and we consider, from legal, social, and political perspectives, possible explanations for the growth of the field. Finally, we reflect on the achievements of the past 25 years to consider what the next 25 years will bring with regard to the availability of even more complex data and to the methods that could be, and are now being, developed for the discovery of new knowledge in biomedical data.
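The theme-identification step can be illustrated with a minimal sketch that fits latent Dirichlet allocation to a toy corpus of abstracts. The corpus, topic count, and library choice (scikit-learn) are illustrative assumptions, not the review's actual pipeline.

```python
# Minimal sketch of surfacing themes in retrieved abstracts with latent
# Dirichlet allocation. The toy corpus and topic count are assumptions.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

abstracts = [
    "pattern recognition in clinical decision support",
    "data mining of electronic health records",
    "machine learning for gene expression analysis",
    "knowledge discovery in biomedical databases",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(abstracts)   # document-term matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

# Print the top words of each inferred theme.
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-4:][::-1]]
    print(f"Topic {i}: {', '.join(top)}")
```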


2006 ◽  
Vol 3 (2) ◽  
pp. 162-176 ◽  
Author(s):  
Emilie Guérin ◽  
Gwenaëlle Marquet ◽  
Julie Chabalier ◽  
Marie-Bérengère Troadec ◽  
Christiane Guguen-Guillouzo ◽  
...  

Abstract In biomedical research, the interpretation of microarray data requires confronting data and knowledge from heterogeneous resources, in both the biomedical domain and genomics, as well as presentation and analysis methods adapted to huge amounts of data. We present a combined approach that relies on two components: BioMeKE annotates sets of genes using biomedical GO and UMLS concepts, and GEDAW, a Gene Expression Data Warehouse, uses BioMeKE to enrich experimental results with biomedical concepts, thus performing complex analyses of expression measurements through analysis workflows. The strength of our approach has been demonstrated within the framework of the analysis of data resulting from a liver transcriptome study; it allowed new genes potentially associated with liver diseases to be highlighted.
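The annotation step can be illustrated with a minimal sketch in which experimental gene identifiers are joined with controlled-vocabulary concepts so that expression results can be analyzed at the concept level. The gene-to-GO mappings below are toy placeholders, not BioMeKE output.

```python
# Minimal sketch of concept annotation: gene identifiers from an
# experiment are joined with controlled-vocabulary terms. The mappings
# are fabricated placeholders for illustration only.
annotations = {
    "HAMP": ["GO:0006879 iron ion homeostasis"],
    "TFRC": ["GO:0006826 iron ion transport"],
    "ALB":  ["GO:0006810 transport"],
}

differentially_expressed = ["HAMP", "TFRC", "UNKNOWN1"]

for gene in differentially_expressed:
    concepts = annotations.get(gene, ["<no annotation found>"])
    for concept in concepts:
        print(f"{gene}\t{concept}")   # concept-enriched result row
```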


Author(s):  
Nadeem N. Rather ◽  
Chintan O. Patel ◽  
Sharib A. Khan

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Zheng Gao ◽  
Gang Fu ◽  
Chunping Ouyang ◽  
Satoshi Tsutsui ◽  
Xiaozhong Liu ◽  
...  

2010 ◽  
Vol 7 (3) ◽  
Author(s):  
Giorgio Ghisalberti ◽  
Marco Masseroli ◽  
Luca Tettamanti

Summary Numerous biomolecular data are available, but they are scattered across many databases and only some of them are curated by experts. Most available data are computationally derived and include errors and inconsistencies. Effective use of the available data to derive new knowledge therefore requires data integration and quality improvement. Many approaches for data integration have been proposed; data warehousing seems to be the most adequate when comprehensive analysis of the integrated data is required, which also makes it the most suitable for implementing comprehensive quality controls on integrated data. We previously developed GFINDer (http://www.bioinformatics.polimi.it/GFINDer/), a web system that supports scientists in effectively using available information. It allows comprehensive statistical analysis and mining of functional and phenotypic annotations of gene lists, such as those identified by high-throughput biomolecular experiments. The GFINDer back end is composed of a multi-organism genomic and proteomic data warehouse (GPDW). Within the GPDW, several controlled terminologies and ontologies, which describe gene and gene-product related biomolecular processes, functions, and phenotypes, are imported and integrated, together with their associations with the genes and proteins of several organisms. To ease keeping the GPDW updated and to ensure the best possible quality of the data integrated in subsequent updates of the data warehouse, we developed several automatic procedures. Within them, we implemented numerous data quality control techniques to test the integrated data for a variety of possible errors and inconsistencies. Among other features, the implemented controls check data structure and completeness, ontological data consistency, ID format and evolution, unexpected data quantification values, and the consistency of data from single and multiple sources. We used the implemented controls to analyze the quality of the data available from several different biological databases and integrated in the GFINDer data warehouse. By doing so, we identified in these data a variety of different types of errors and inconsistencies, which enabled us to ensure good quality of the data in the GFINDer data warehouse. We reported all identified data errors and inconsistencies to the curators of the original databases from which the data were retrieved, who mostly corrected them in subsequent updates of the original databases. This contributed to improving the quality of the data available in the original databases to the whole scientific community.
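A few of the control categories listed above (ID format, completeness, cross-source consistency) can be illustrated with a minimal sketch. The record schema, identifier pattern, and rules below are hypothetical, not the GPDW implementation.

```python
# Minimal sketch of automated quality controls of the kinds the abstract
# lists: ID format, completeness, and cross-source consistency. The
# schema and rules are illustrative assumptions.
import re

records = [
    {"gene_id": "EntrezGene:348",  "symbol": "APOE", "tax_id": 9606},
    {"gene_id": "EntrezGene:abc",  "symbol": "TP53", "tax_id": 9606},  # bad ID
    {"gene_id": "EntrezGene:7157", "symbol": None,   "tax_id": 9606},  # missing
]
other_source_symbols = {"APOE", "TP53"}  # stand-in for a second database

ID_PATTERN = re.compile(r"^EntrezGene:\d+$")

for rec in records:
    errors = []
    if not ID_PATTERN.match(rec["gene_id"]):
        errors.append("malformed identifier")           # ID format control
    if rec["symbol"] is None:
        errors.append("missing symbol")                 # completeness control
    elif rec["symbol"] not in other_source_symbols:
        errors.append("symbol absent in other source")  # consistency control
    if errors:
        print(rec["gene_id"], "->", "; ".join(errors))  # report to curators
```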

