Visualising Inconsistency and Incompleteness in RDF Gene Expression Data using FCA

Author(s):  
Honour Chika Nwagwu

Integrating data from different sources can introduce inconsistent or incomplete data (IID). IID can undermine the validity of information retrieved from an integrated dataset, so these anomalies need to be identified. This work presents SPARQL queries that retrieve inconsistent or incomplete information from an EMAGE dataset. It also shows how Formal Concept Analysis (FCA) tools, notably FcaBedrock and Concept Explorer, can be applied to identify and visualise the IID in the retrieved information. Although instances of IID can exist in most data formats, the investigation focuses on RDF datasets.
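As a rough illustration of the kind of check described above, the following sketch flags inconsistent and incomplete records in plain Python. It does not use SPARQL or the EMAGE dataset. The record layout, the gene and tissue names, and the working definitions (a gene reported with conflicting levels in the same tissue is "inconsistent", a record with no level is "incomplete") are all assumptions made for the example, not the author's queries.

```python
from collections import defaultdict

# Hypothetical (gene, tissue, expression level) records standing in for
# integrated RDF statements; None marks a missing level.
records = [
    ("geneA", "heart", "detected"),
    ("geneA", "heart", "not detected"),  # conflicts with the record above
    ("geneB", "liver", "detected"),
    ("geneC", "brain", None),            # level is missing
]


def find_iid(records):
    """Return (inconsistent, incomplete) sets of (gene, tissue) pairs."""
    levels = defaultdict(set)
    incomplete = set()
    for gene, tissue, level in records:
        if level is None:
            incomplete.add((gene, tissue))
        else:
            levels[(gene, tissue)].add(level)
    # A pair is inconsistent when more than one distinct level was reported.
    inconsistent = {key for key, seen in levels.items() if len(seen) > 1}
    return inconsistent, incomplete


inconsistent, incomplete = find_iid(records)
print(sorted(inconsistent))
print(sorted(incomplete))
```

The flagged pairs could then be turned into a formal context (objects against attributes) for inspection in an FCA tool.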

2011 ◽  
Vol 181 (10) ◽  
pp. 1989-2001 ◽  
Author(s):  
Mehdi Kaytoue ◽  
Sergei O. Kuznetsov ◽  
Amedeo Napoli ◽  
Sébastien Duplessis

Author(s):  
Erliang Zeng ◽  
Chengyong Yang ◽  
Tao Li ◽  
Giri Narasimhan

Clustering of gene expression data is a standard exploratory technique used to identify closely related genes. Many other sources of data are also likely to be of great assistance in the analysis of gene expression data. Such data provide a means to begin elucidating the large-scale modular organization of the cell. The authors consider the challenging task of developing exploratory analytical techniques that deal with multiple complete and incomplete information sources. The Multi-Source Clustering (MSC) algorithm they developed performs clustering with multiple, but complete, sources of data. To deal with incomplete data sources, the authors adopted the MPCK-means clustering algorithm to perform exploratory analysis on one complete source together with other, potentially incomplete, sources provided in the form of constraints. The paper presents the new MSC clustering algorithm for exploratory analysis using two or more diverse but complete data sources, studies the effectiveness of constraint sets and the robustness of the constrained clustering algorithm using multiple sources of incomplete biological data, and incorporates such incomplete data into the constrained clustering algorithm in the form of constraint sets.



Biotechnology ◽  
2019 ◽  
pp. 265-304
Author(s):  
David Correa Martins Jr. ◽  
Fabricio Martins Lopes ◽  
Shubhra Sankar Ray

The inference of Gene Regulatory Networks (GRNs) is a very challenging problem that has attracted increasing attention since the development of high-throughput sequencing and gene expression measurement technologies. Many models and algorithms have been developed to identify GRNs, mainly using gene expression profiles as the data source. Because gene expression data usually have a limited number of samples and inherent noise, integrating gene expression with several other sources of information can be vital for accurately inferring GRNs. For instance, prior information about the overall topological structure of the GRN can guide inference techniques toward better results. In addition to gene expression data, biological information from heterogeneous data sources has recently been integrated by GRN inference methods as well. The objective of this chapter is to present an overview of GRN inference models and techniques, with a focus on the incorporation of prior information, such as global and local topological features, and on the integration of several heterogeneous data sources.



Author(s):  
José María González Calabozo ◽  
Carmen Peláez-Moreno ◽  
Francisco José Valverde-Albacete

Author(s):  
Hidenobu Hashikami ◽  
Takanari Tanabata ◽  
Fumiaki Hirose ◽  
Nur Hasanah ◽  
...  

A data-analytic system based on Formal Concept Analysis (FCA) is proposed for microarray gene expression data. The purpose of the system is to organize the data systematically and to build a complete lattice for analyzing complex relations among genes and giving a biological interpretation of microarray data. Because formal concept analysis operates on binary relations, the microarray data is binarized by setting a threshold. In a conventional algorithm, whenever the data changes, the formal concepts (the nodes of the lattice) are recalculated from scratch, which is inefficient. This paper proposes a new algorithm with two phases, matrix detection and concept updating, that efficiently updates only the altered concepts among those previously generated. Run-time experiments show that the algorithm takes an average of 0.94 seconds to process real microarray data containing 43,734 genes and 6 gene expression values.
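The binarization step and the notion of a formal concept can be illustrated with a minimal sketch. The expression values, gene and condition names, and the threshold of 0.5 are hypothetical. The enumeration below is the naive approach of closing every attribute subset; it recomputes everything from scratch and is not the incremental two-phase algorithm proposed in the paper.

```python
from itertools import combinations

# Hypothetical expression values: one row per gene, one column per condition.
expression = {
    "g1": [0.9, 0.8, 0.1],
    "g2": [0.7, 0.2, 0.1],
    "g3": [0.1, 0.9, 0.6],
}
conditions = ["c1", "c2", "c3"]


def binarize(expression, conditions, threshold):
    """Map each gene to the set of conditions whose value meets the threshold."""
    return {
        gene: {c for c, v in zip(conditions, values) if v >= threshold}
        for gene, values in expression.items()
    }


def extent(context, attrs):
    """Genes that have every attribute in attrs."""
    return frozenset(g for g, has in context.items() if attrs <= has)


def intent(context, genes, all_attrs):
    """Attributes shared by every gene in genes (all attributes if genes is empty)."""
    shared = set(all_attrs)
    for g in genes:
        shared &= context[g]
    return frozenset(shared)


def formal_concepts(context, all_attrs):
    """Naive enumeration: close every attribute subset (exponential in attributes)."""
    concepts = set()
    for r in range(len(all_attrs) + 1):
        for subset in combinations(all_attrs, r):
            ext = extent(context, set(subset))
            concepts.add((ext, intent(context, ext, all_attrs)))
    return concepts


context = binarize(expression, conditions, threshold=0.5)
concepts = formal_concepts(context, conditions)
for ext, itt in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(ext), sorted(itt))
```

Each printed pair is a maximal set of genes together with the conditions they all share; an incremental algorithm would aim to avoid rebuilding this whole set when the binarized matrix changes.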


Author(s):  
Takanari Tanabata ◽  
Fumiaki Hirose ◽  
Hidenobu Hashikami ◽  
Hajime Nobuhara ◽  
...  

DNA microarray analysis can explain gene functions by measuring tens of thousands of gene expressions at once and analyzing the resulting gene expression profiles. However, gene expression profiles contain such a vast amount of information that most analyses are done on data narrowed down by statistical methods, so there remains a possibility of missing genes that are factors in the phenomena under study. This study proposes a method based on formal concept analysis to visualize all gene expression profiles, together with characteristic information obtained from the annotation of each gene, so that the user can view them as a whole. In formal concept analysis, a lattice structure that allows genes to be hierarchically classified and viewed is built from the inclusion relations of attributes in a context table in which genes are the objects and the attributes are expression profiles and binarized characteristic information. With the proposed method, the user can change the overview by adjusting the expression ratio and the binary state of the characteristic information, understand the relational structure of gene expressions, and analyze gene functions. We developed software that implements the proposed method and asked a biologist to evaluate its effectiveness in a functional analysis of genes related to blue light signaling in rice seedlings.
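The hierarchical ordering of concepts by inclusion can be sketched as follows. The concepts are hard-coded (extent, intent) pairs for a hypothetical three-gene context, and the attribute names (`up_blue`, `kinase`, `membrane`) are invented for the example. The function computes the edges of the lattice diagram, linking each concept to the concepts immediately above it; it is a small illustration of the ordering, not the software described in the study.

```python
# Hypothetical concepts as (extent, intent) pairs, from a context in which
# g1 has {up_blue, kinase}, g2 has {up_blue}, g3 has {kinase},
# and no gene has the attribute "membrane".
concepts = [
    (frozenset({"g1", "g2", "g3"}), frozenset()),
    (frozenset({"g1", "g2"}), frozenset({"up_blue"})),
    (frozenset({"g1", "g3"}), frozenset({"kinase"})),
    (frozenset({"g1"}), frozenset({"up_blue", "kinase"})),
    (frozenset(), frozenset({"up_blue", "kinase", "membrane"})),
]


def hasse_edges(concepts):
    """Return (lower extent, upper extent) pairs for the covering relation.

    One concept lies below another when its extent is a proper subset of
    the other's; an edge is kept only if no third extent lies strictly
    between the two.
    """
    extents = [ext for ext, _ in concepts]
    edges = []
    for lower in extents:
        for upper in extents:
            if lower < upper and not any(lower < mid < upper for mid in extents):
                edges.append((lower, upper))
    return edges


edges = hasse_edges(concepts)
for lower, upper in edges:
    print(sorted(lower), "->", sorted(upper))
```

Changing the binarization (for instance, the expression ratio cut-off) changes the context table, and with it the set of concepts and the edges drawn between them.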

