Integration of mutual information and text mining methods for extracting gene-gene interactions from gene expression data

Author(s):  
David H. Millis ◽  
Jeffrey L. Solka ◽  
Lakshmi K. Matukumalli
2013 ◽  
Vol 2013 ◽  
pp. 1-5 ◽  
Author(s):  
Jeyakumar Natarajan

Current microarray data mining methods such as clustering, classification, and association analysis rely heavily on statistical and machine learning algorithms for analysis of large sets of gene expression data. In recent years, there has been a growing interest in methods that attempt to discover patterns based on multiple but related data sources. Gene expression data and the corresponding literature data are one such example. This paper suggests a new approach to microarray data mining as a combination of text mining (TM) and information extraction (IE). TM is concerned with identifying patterns in natural language text and IE is concerned with locating specific entities, relations, and facts in text. The present paper surveys the state of the art of data mining methods for microarray data analysis. We show the limitations of current microarray data mining methods and outline how text mining could address these limitations.


Author(s):  
Saman Farahmand ◽  
Corey O’Connor ◽  
Jill A Macoska ◽  
Kourosh Zarringhalam

Inference of active regulatory mechanisms underlying specific molecular and environmental perturbations is essential for understanding cellular response. The success of inference algorithms relies on the quality and coverage of the underlying network of regulator–gene interactions. Several commercial platforms provide large and manually curated regulatory networks and functionality to perform inference on these networks. Adaptation of such platforms for open-source academic applications has been hindered by the lack of availability of accurate, high-coverage networks of regulatory interactions and integration of efficient causal inference algorithms. In this work, we present CIE, an integrated platform for causal inference of active regulatory mechanisms from differential gene expression data. Using a regularized Gaussian Graphical Model (GGM), we construct a transcriptional regulatory network by integrating publicly available ChIP-seq experiments with gene-expression data from tissue-specific RNA-seq experiments. Our GGM approach identifies high-confidence transcription factor (TF)–gene interactions and annotates the interactions with information on mode of regulation (activation vs. repression). Benchmarks against manually curated databases of TF–gene interactions show that our method can accurately detect mode of regulation. We demonstrate the ability of our platform to identify active transcriptional regulators by using controlled in vitro overexpression and stem-cell differentiation studies and utilize our method to investigate transcriptional mechanisms of fibroblast phenotypic plasticity.
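The idea of reading signed TF–gene edges off a regularized Gaussian graphical model can be illustrated with a minimal sketch. This is not the CIE implementation: the ridge term on the correlation matrix, the `threshold` cutoff, the function names, and the reading of the partial-correlation sign as activation or repression are all assumptions made for illustration, and the ChIP-seq integration step is omitted entirely.

```python
import numpy as np

def partial_correlations(expr, ridge=0.1):
    """Partial correlations from a samples x genes expression matrix.

    Assumption: a ridge term on the correlation matrix stands in for the
    (unspecified) regularization used by the authors.
    """
    corr = np.corrcoef(expr, rowvar=False)
    # Regularized precision (inverse correlation) matrix.
    precision = np.linalg.inv(corr + ridge * np.eye(corr.shape[0]))
    scale = np.sqrt(np.diag(precision))
    # Partial correlation is the negated, rescaled precision entry.
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def signed_edges(pcor, names, threshold=0.3):
    """List gene pairs whose |partial correlation| reaches the threshold.

    Assumption: a positive sign is labeled 'activation' and a negative
    sign 'repression'; the cutoff value is hypothetical.
    """
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(pcor[i, j]) >= threshold:
                mode = "activation" if pcor[i, j] > 0 else "repression"
                edges.append((names[i], names[j], mode))
    return edges

# Synthetic example: one regulator, one induced gene, one repressed gene,
# and one unrelated gene.
rng = np.random.default_rng(0)
tf = rng.normal(size=500)
expr = np.column_stack([
    tf,
    tf + rng.normal(size=500),
    -tf + rng.normal(size=500),
    rng.normal(size=500),
])
print(signed_edges(partial_correlations(expr), ["TF", "G1", "G2", "G3"]))
```

Partial correlations are used rather than plain correlations because they discount indirect associations: here G1 and G2 are strongly anticorrelated only through their shared regulator, and conditioning on it largely removes that link.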


2008 ◽  
Vol 2 (1) ◽  
pp. 10 ◽  
Author(s):  
John Watkinson ◽  
Xiaodong Wang ◽  
Tian Zheng ◽  
Dimitris Anastassiou

2013 ◽  
Vol 6 (1) ◽  
Author(s):  
Kristina M Hettne ◽  
André Boorsma ◽  
Dorien A M van Dartel ◽  
Jelle J Goeman ◽  
Esther de Jong ◽  
...  

PLoS ONE ◽  
2014 ◽  
Vol 9 (10) ◽  
pp. e109569 ◽  
Author(s):  
Federico M. Giorgi ◽  
Gonzalo Lopez ◽  
Jung H. Woo ◽  
Brygida Bisikirska ◽  
Andrea Califano ◽  
...  

2021 ◽  
Author(s):  
Jingyi Zhang ◽  
Farhan Ibrahim ◽  
Doaa Altarawy ◽  
Lenwood S Heath ◽  
Sarah Tulin

Background: Gene regulatory network (GRN) inference can now take advantage of powerful machine learning algorithms to predict the entire landscape of gene-to-gene interactions with the potential to complement traditional experimental methods in building gene networks. However, the dynamical nature of embryonic development -- representing the time-dependent interactions between thousands of transcription factors, signaling molecules, and effector genes -- is one of the most challenging arenas for GRN prediction. Results: In this work, we show that successful GRN predictions for developmental systems from gene expression data alone can be obtained with the Priors Enriched Absent Knowledge (PEAK) network inference algorithm. PEAK is a noise-robust method that models gene expression dynamics via ordinary differential equations and selects the best network based on information-theoretic criteria coupled with the machine learning algorithm Elastic net. We test our GRN prediction methodology using two gene expression data sets for the purple sea urchin (S. purpuratus) and cross-check our results against existing GRN models that have been constructed and validated by over 30 years of experimental results. Our results found a remarkably high degree of sensitivity in identifying known gene interactions in the network (maximum 76.32%). We also generated 838 novel predictions for interactions that have not yet been described, which provide a resource for researchers to use to further complete the sea urchin GRN. Conclusions: GRN predictions that match known gene interactions can be produced using gene expression data alone from developmental time series experiments.
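The combination of an ODE model with elastic-net regression can be sketched as follows. This is an illustration of the general idea only, not the PEAK algorithm: it assumes a linear model dx/dt = W x, estimates derivatives by finite differences, and fits each row of W with a hand-written coordinate-descent elastic net. The function names, the penalty values, and the omission of the information-theoretic model selection step are all assumptions.

```python
import numpy as np

def soft_threshold(value, limit):
    """Shrink a scalar toward zero by `limit` (the L1 proximal step)."""
    return np.sign(value) * max(abs(value) - limit, 0.0)

def elastic_net(X, y, alpha=0.001, l1_ratio=0.5, n_iter=500):
    """Coordinate descent for
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1
                       + 0.5*alpha*(1 - l1_ratio)*||w||^2.
    No intercept is fitted (an assumption of this sketch).
    """
    n, p = X.shape
    w = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back.
            partial_residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ partial_residual / n
            w[j] = soft_threshold(rho, alpha * l1_ratio) / (
                col_scale[j] + alpha * (1.0 - l1_ratio)
            )
    return w

def infer_network(expr, times, alpha=0.001, l1_ratio=0.5):
    """Estimate W in dx/dt = W x from a time x genes matrix.

    W[i, j] is the inferred influence of gene j on gene i; its sign is
    read as activation (+) or repression (-) in this sketch.
    """
    derivatives = np.gradient(expr, times, axis=0)  # finite differences
    n_genes = expr.shape[1]
    W = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        W[i] = elastic_net(expr, derivatives[:, i], alpha, l1_ratio)
    return W

# Toy system: gene 0 decays; gene 1 is activated by gene 0 and decays.
times = np.linspace(0.0, 5.0, 101)
expr = np.column_stack([np.exp(-times), times * np.exp(-times)])
print(np.round(infer_network(expr, times), 2))
```

The L1 part of the penalty drives weak coefficients to exactly zero, which keeps the inferred network sparse, while the L2 part stabilizes the fit when expression profiles of different genes are correlated.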

