The MGED Ontology: A Framework for Describing Functional Genomics Experiments

2003 ◽  
Vol 4 (1) ◽  
pp. 127-132 ◽  
Author(s):  
Christian J. Stoeckert ◽  
Helen Parkinson

The Microarray Gene Expression Data (MGED) society was formed with an initial focus on experiments involving microarray technology. Despite the diversity of applications, there are common concepts used and a common need to capture experimental information in a standardized manner. In building the MGED ontology, it was recognized that it would be impractical to cover all the different types of experiments on all the different types of organisms by listing and defining all the types of organisms and their properties. Our solution was to create a framework for describing microarray experiments with an initial focus on the biological sample and its manipulation. For concepts that are common for many species, we could provide a manageable listing of controlled terms. For concepts that are species-specific or whose values cannot be readily listed, we created an ‘OntologyEntry’ concept that referenced an external resource. The MGED ontology is a work in progress that needs additional instances and particularly needs constraints to be added. The ontology currently covers the experimental sample and design, and we have begun capturing aspects of the microarrays themselves as well. The primary application of the ontology will be to develop forms for entering information into databases, and consequently allowing queries, taking advantage of the structure provided by the ontology. The application of an ontology of experimental conditions extends beyond microarray experiments and, as the scope of MGED includes other aspects of functional genomics, so too will the MGED ontology.
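The split between listed controlled terms and terms that point to an external resource can be pictured with a small data structure. This is a minimal sketch only: the class and field names (`ExternalReference`, `category`, `value`, `reference`, `is_controlled_term`) are hypothetical illustrations, not the actual MGED ontology definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExternalReference:
    """Pointer to a term kept in an outside resource (hypothetical shape)."""
    database: str
    accession: str


@dataclass(frozen=True)
class OntologyEntry:
    """A described concept: either a locally listed term or an external one."""
    category: str
    value: str
    reference: Optional[ExternalReference] = None

    def is_controlled_term(self) -> bool:
        # No external reference means the value comes from a local listing.
        return self.reference is None


# A concept common to many species: a short controlled list is manageable.
sex = OntologyEntry(category="Sex", value="female")

# A species-specific concept: the value is resolved in an external resource.
organism = OntologyEntry(
    category="Organism",
    value="Mus musculus",
    reference=ExternalReference(database="NCBI Taxonomy", accession="10090"),
)
```

A data-entry form built on such a structure could offer a drop-down for controlled terms and a lookup field for externally referenced ones.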

Author(s):  
B.Hari Babu ◽  
N.Subash Chandra ◽  
T. Venu Gopal

Clustering is a prominent data mining technique that groups data into clusters based on distance measures. With the rapid growth of high-dimensional data such as microarray gene expression data, clustering runs into the problem that similarity between objects in the full-dimensional space is often invalid, because the space contains different types of data. Grouping high-dimensional data into clusters is therefore inaccurate, and may fall short of expectations, when the dimension of the dataset is high. The problem is now attracting considerable attention in research and development. To address the performance of clustering on high-dimensional data, issues such as dimensionality reduction, redundancy elimination, subspace clustering, co-clustering, and data labeling for clusters need to be analyzed and improved. In this paper, we present a brief comparison of existing algorithms that focus mainly on clustering high-dimensional data.


Author(s):  
Pyingkodi Maran ◽  
Shanthi S. ◽  
Thenmozhi K. ◽  
Hemalatha D. ◽  
Nanthini K.

Computational biology is the research area that contributes to the analysis of biological information. Selecting a subset of cancer-related genes is among the most promising clinical research applications of gene expression data. Since a gene can take part in various biological pathways that in turn can be active only under specific experimental conditions, the stacked denoising auto-encoder (SDAE) and the genetic algorithm were combined to perform biclustering of cancer genes from high-dimensional microarray gene expression data. The Genetic-SDAE proved superior to recently proposed biclustering methods and was better at determining the maximum similarity of a set of biclusters of gene expression data, with lower MSR and higher gene variance. This work also assesses the results with respect to the discovered genes and finds that the extracted biclusters are supported by biological evidence, such as enrichment of gene functions and biological processes.


Author(s):  
Georgia Tsiliki ◽  
Dimitrios Vlachakis ◽  
Sophia Kossida

With the extensive use of microarray technology as a potential prognostic and diagnostic tool, the comparison and reproducibility of results obtained from different platforms are of interest. Integrating those datasets can yield more informative results that span numerous datasets and microarray platforms. We developed a novel integration technique for microarray gene-expression data derived from different studies, for the purpose of a two-way Bayesian partition modelling which estimates co-expression profiles under subsets of genes and between biological samples or experimental conditions. The suggested methodology transforms disparate gene-expression data onto a common probability scale to obtain inter-study-validated gene signatures. We evaluated the performance of our model using artificial data. Finally, we applied our model to six publicly available cancer gene-expression datasets and compared our results with well-known integrative microarray data methods. Our study shows that the suggested framework can relieve the limited sample size problem while reporting high accuracies by integrating multi-experiment data.


2009 ◽  
Vol 07 (05) ◽  
pp. 853-868 ◽  
Author(s):  
ANIRBAN MUKHOPADHYAY ◽  
UJJWAL MAULIK ◽  
SANGHAMITRA BANDYOPADHYAY

Biclustering methods are used to identify a subset of genes that are co-regulated in a subset of experimental conditions in microarray gene expression data. Many biclustering algorithms rely on optimizing mean squared residue to discover biclusters from a gene expression dataset. Recently it has been proved that mean squared residue is good only at capturing constant and shifting biclusters; scaling biclusters cannot be detected using this metric. In this article, a new coherence measure called scaling mean squared residue (SMSR) is proposed. It has been proved theoretically that the proposed measure detects scaling patterns effectively and is invariant to local or global scaling of the input dataset. The effectiveness of the proposed coherence measure in detecting scaling patterns has been demonstrated experimentally on artificial and real-life benchmark gene expression datasets. Moreover, biological significance tests have been conducted to show that the biclusters identified using the proposed measure are composed of functionally enriched sets of genes.
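The limitation of mean squared residue described above can be checked on toy data. The sketch below uses the usual definition of the measure (the mean of squared residues a_ij - rowmean_i - colmean_j + overallmean) and does not implement SMSR, whose formula is not given in the abstract. The toy matrices and the function name are assumptions made for illustration.

```python
def mean_squared_residue(matrix):
    """Mean squared residue of a bicluster given as a list of equal-length rows."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [
        sum(matrix[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)
    ]
    overall_mean = sum(row_means) / n_rows

    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = matrix[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue * residue
    return total / (n_rows * n_cols)


# Toy expression profile shared by all genes in the bicluster (illustrative).
base = [1.0, 2.0, 4.0, 8.0]

# Shifting pattern: each gene is the base profile plus a constant offset.
shifting = [[v + offset for v in base] for offset in (0.0, 3.0, 5.0)]

# Scaling pattern: each gene is the base profile times a constant factor.
scaling = [[v * factor for v in base] for factor in (1.0, 2.0, 3.0)]

msr_shifting = mean_squared_residue(shifting)  # zero up to rounding error
msr_scaling = mean_squared_residue(scaling)    # clearly non-zero
```

For the scaling pattern each residue reduces to the product of a row deviation and a column deviation, which is why a perfectly coherent scaling bicluster still receives a non-zero score under this measure.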


2004 ◽  
Vol 02 (02) ◽  
pp. 273-288 ◽  
Author(s):  
HIDEO BANNAI ◽  
SHUNSUKE INENAGA ◽  
AYUMI SHINOHARA ◽  
MASAYUKI TAKEDA ◽  
SATORU MIYANO

We present an efficient algorithm for detecting putative regulatory elements in the upstream DNA sequences of genes, using gene expression information obtained from microarray experiments. Based on a generalized suffix tree, our algorithm looks for motif patterns whose appearance in the upstream region is most correlated with the expression levels of the genes. We are able to find the optimal pattern in time linear in the total length of the upstream sequences. We implement our algorithm and apply it to publicly available microarray gene expression data, and show that our method is able to discover biologically significant motifs, including various motifs which have been reported previously using the same data set. We further discuss applications for which the efficiency of the method is essential, as well as possible extensions to our algorithm.
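The idea of scoring a pattern by how well its presence tracks expression can be illustrated with a naive search. This sketch is not the suffix-tree algorithm of the abstract and does not run in linear time: it enumerates substrings by brute force. The score used here (absolute difference in mean expression between sequences with and without the pattern), the function name, and the toy data are all assumptions.

```python
def best_substring_pattern(sequences, expression, min_len=3, max_len=6):
    """Return the substring whose presence best separates expression levels."""
    # Collect every substring of the allowed lengths as a candidate pattern.
    candidates = set()
    for seq in sequences:
        for k in range(min_len, max_len + 1):
            for start in range(len(seq) - k + 1):
                candidates.add(seq[start:start + k])

    best_pattern, best_score = None, float("-inf")
    # Sorted order makes the choice among tied patterns deterministic.
    for pattern in sorted(candidates):
        present = [e for s, e in zip(sequences, expression) if pattern in s]
        absent = [e for s, e in zip(sequences, expression) if pattern not in s]
        if not present or not absent:
            continue  # the pattern does not split the genes into two groups
        score = abs(sum(present) / len(present) - sum(absent) / len(absent))
        if score > best_score:
            best_pattern, best_score = pattern, score
    return best_pattern, best_score


# Toy upstream regions and expression levels (illustrative values only).
upstream = ["ACGTATAAGC", "GGTATAACCT", "ACGCGCGTTA", "CCGGAAGGCT"]
levels = [5.0, 4.5, 0.5, 0.2]

pattern, score = best_substring_pattern(upstream, levels)
```

On this toy input the winning pattern occurs only in the two highly expressed sequences. A generalized suffix tree avoids re-scanning every sequence for every candidate, which is where the efficiency claimed in the abstract would come from.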


2019 ◽  
Vol 8 (3) ◽  
pp. 8844-8848

Clustering is a data mining technique that deals with huge amounts of data. It is intended to help a user discover and understand the natural structure in a data set and abstract the meaning of a large dataset. It is the task of partitioning the objects of a data set into distinct groups such that two objects from one cluster are similar to each other, whereas two objects from distinct clusters are dissimilar. Clustering is unsupervised learning: we are not given predefined classes into which the data objects can be placed. With the rapid growth of high-dimensional data such as microarray gene expression data, clustering runs into the problem that similarity between objects in the full-dimensional space is often invalid, because the space contains different types of data. Grouping high-dimensional data into clusters is therefore inaccurate, and may fall short of expectations, when the dimension of the dataset is high.


Author(s):  
Qiang Zhao ◽  
Jianguo Sun

Statistical analysis of microarray gene expression data has recently attracted a great deal of attention. One problem of interest is to relate genes to survival outcomes of patients with the purpose of building regression models for the prediction of future patients' survival based on their gene expression data. For this, several authors have discussed the use of the proportional hazards or Cox model after reducing the dimension of the gene expression data. This paper presents a new approach to conducting the Cox survival analysis of microarray gene expression data, with a focus on the models' predictive ability. The method modifies the correlation principal component regression (Sun, 1995) to handle the censoring problem of survival data. The results based on simulated data and a set of publicly available data on diffuse large B-cell lymphoma show that the proposed method works well in terms of the models' robustness and predictive ability in comparison with some existing partial least squares approaches. Also, the new approach is simpler and easier to implement.
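For reference, the proportional hazards (Cox) model mentioned above is usually written as follows. The notation is the standard textbook one and is an assumption here, since the abstract gives no formulas: x is the vector of (dimension-reduced) covariates, h_0 the unspecified baseline hazard, δ_i the event indicator that is 0 for censored patients, and R(t_i) the set of patients still at risk at time t_i. The partial likelihood shown assumes no tied event times.

```latex
h(t \mid x) = h_0(t)\, \exp\!\left(\beta^{\top} x\right),
\qquad
L(\beta) = \prod_{i \,:\, \delta_i = 1}
  \frac{\exp\!\left(\beta^{\top} x_i\right)}
       {\sum_{j \in R(t_i)} \exp\!\left(\beta^{\top} x_j\right)}
```

Censored patients contribute to the risk sets in the denominator but not to the product, which is how censoring enters the estimation.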

