real gene Latest Research Papers

A Multi-Scale Approach to Modeling E. coli Chemotaxis

Entropy, 2020, Vol 22 (10), pp. 1101. DOI: 10.3390/e22101101

Author(s): Eran Agmon, Ryan K. Spangler

Keyword(s): Computational Biology, Gene Sequence, Sequence Data, Base Pairs, E Coli, Multi Scale, Cellular Life, Real Gene, Biophysical Processes, Gene Sequence Data

The degree to which we can understand the multi-scale organization of cellular life is tied to how well our models can represent this organization and the processes that drive its evolution. This paper uses Vivarium—an engine for composing heterogeneous computational biology models into integrated, multi-scale simulations. Vivarium’s approach is demonstrated by combining several sub-models of biophysical processes into a model of chemotactic E. coli that exchange molecules with their environment, express the genes required for chemotaxis, swim, grow, and divide. This model is developed incrementally, highlighting cross-compartment mechanisms that link E. coli to its environment, with models for: (1) metabolism and transport, with transport moving nutrients across the membrane boundary and metabolism converting them to useful metabolites, (2) transcription, translation, complexation, and degradation, with stochastic mechanisms that read real gene sequence data and consume base pairs and ATP to make proteins and complexes, and (3) the activity of flagella and chemoreceptors, which together support navigation in the environment.

RecBic: a fast and accurate algorithm recognizing trend-preserving biclusters

Bioinformatics, 2020, Vol 36 (20), pp. 5054-5060. DOI: 10.1093/bioinformatics/btaa630

Author(s): Xiangyu Liu, Di Li, Juntao Liu, Zhengchang Su, Guojun Li

Keyword(s): Gene Expression, Biological Data, Supplementary Information, Gene Expression Matrix, Real Gene, Powerful Approach, Number Of Genes, Functional Patterns, Robustness To Noise, Expression Matrix

Motivation: Biclustering has emerged as a powerful approach to identifying functional patterns in complex biological data. However, existing tools are limited in the accuracy and efficiency with which they recognize various kinds of complex biclusters submerged in ever larger datasets. We introduce RecBic, a novel, fast and highly accurate algorithm that identifies various forms of complex biclusters in gene expression datasets. Results: We designed RecBic to identify various trend-preserving biclusters, particularly those with narrow shapes, i.e. clusters where the number of genes is larger than the number of conditions/samples. Given a gene expression matrix, RecBic starts with a column seed and grows it into a full-sized bicluster simply by repeatedly comparing real numbers. When tested on simulated datasets in which the elements of implanted trend-preserving biclusters and those of the background matrix have the same distribution, RecBic identified the implanted biclusters almost perfectly, outperforming all the compared tools in accuracy and in robustness to noise and to overlaps between the clusters. RecBic was also superior in identifying functionally related genes in real gene expression datasets. Availability and implementation: Code, sample input data and usage instructions are available at the following websites. Code: https://github.com/holyzews/RecBic/tree/master/RecBic/. Data: http://doi.org/10.5281/zenodo.3842717. Supplementary information: Supplementary data are available at Bioinformatics online.
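To make the notion of a trend-preserving bicluster concrete, the sketch below checks one simple reading of it: every selected gene (row) ranks the selected conditions (columns) in the same order. This is an illustration of the pattern only, not the RecBic algorithm; the function names, the toy matrix, and the tie handling (ties broken by column index) are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def column_order(row: Sequence[float], cols: Sequence[int]) -> Tuple[int, ...]:
    """Return the selected columns sorted by this row's expression values."""
    # sorted() is stable, so tied values keep the order given in `cols`.
    return tuple(sorted(cols, key=lambda c: row[c]))


def is_trend_preserving(matrix: List[List[float]],
                        rows: Sequence[int],
                        cols: Sequence[int]) -> bool:
    """True if all selected rows rank the selected columns identically."""
    if not rows:
        return True
    reference = column_order(matrix[rows[0]], cols)
    # Only comparisons of real numbers are needed to test the pattern.
    return all(column_order(matrix[r], cols) == reference for r in rows[1:])


# Hypothetical 3-gene x 4-condition expression matrix.
expression = [
    [1.0, 5.0, 3.0, 9.0],     # gene 0
    [0.2, 0.9, 0.5, 0.1],     # gene 1
    [10.0, 40.0, 25.0, 2.0],  # gene 2
]

same_trend = is_trend_preserving(expression, [0, 1, 2], [0, 1, 2])
broken_trend = is_trend_preserving(expression, [0, 1, 2], [0, 1, 2, 3])
```

On the first three conditions all three genes rise in the same order despite very different scales, so the submatrix qualifies; adding the fourth condition breaks the shared ordering.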

Mining approximate frequent dense modules from multiple gene expression datasets

2020. DOI: 10.29007/d87q

Author(s): San Ha Seo, Saeed Salem

Keyword(s): Gene Expression, Gene Annotation, Expression Data, Frequent Subgraph Mining, Multiple Gene, Real Gene, Frequent Subgraph, Frequent Subgraphs, Coexpression Networks, Functional Gene Annotation

A large amount of gene expression data has been collected for various environmental and biological conditions. Extracting co-expression subnetworks that recur across multiple co-expression networks has shown promise in functional gene annotation and biomarker discovery. However, frequent subgraph mining reports a large number of subnetworks. In this work, we propose to mine approximate dense frequent subgraphs. Our proposed approach reports representative frequent subgraphs that are also dense. Our experiments on real gene coexpression networks show that frequent subgraphs are biologically interesting, as evidenced by the large percentage of biologically enriched frequent dense subgraphs.
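The two properties named in the abstract, density and frequency, can be illustrated with a minimal sketch. It assumes density means the fraction of possible gene pairs that are connected, and that a module counts as "approximately" present in a network when its density there reaches a threshold; the abstract does not give the authors' definitions, so both choices, and all names below, are hypothetical.

```python
from itertools import combinations


def make_edges(*pairs):
    """Build an undirected edge set from (gene, gene) pairs."""
    return {frozenset(pair) for pair in pairs}


def density(nodes, edges):
    """Fraction of all possible node pairs that are connected in `edges`."""
    k = len(nodes)
    if k < 2:
        return 0.0
    present = sum(1 for u, v in combinations(sorted(nodes), 2)
                  if frozenset((u, v)) in edges)
    return present / (k * (k - 1) / 2)


def support(nodes, networks, min_density):
    """Number of networks in which the module is at least `min_density` dense."""
    return sum(1 for edges in networks if density(nodes, edges) >= min_density)


# Three hypothetical co-expression networks over genes a..e.
networks = [
    make_edges(("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")),
    make_edges(("a", "b"), ("b", "c"), ("d", "e")),
    make_edges(("a", "d"), ("d", "e")),
]
module = {"a", "b", "c"}

exact_support = support(module, networks, min_density=1.0)
approx_support = support(module, networks, min_density=0.6)
```

Relaxing the density threshold lets the module count as recurring in the second network, where one of its three edges is missing.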

Improvements to Bayesian Gene Activity State Estimation from Genome-Wide Transcriptomics Data

2017. DOI: 10.1101/241000

Author(s): Craig Disselkoen, Nathan Hekman, Brian Gilbert, Sydney Benson, Matthew Anderson, ...

Keyword(s): Gene Expression, Bayesian Method, Gene Activity, Expression Data, Real Gene, Genome Wide, Transcriptomics Data, Activity State, Regulatory Models, Better Than

An important problem in many biological applications is to estimate or classify gene activity states (active or inactive) based on genome-wide transcriptomics data. Recently, we proposed a Bayesian method, titled MultiMM, which showed superior results compared to existing methods. In short, MultiMM performed better than existing methods on both simulated and real gene expression data, confirming well-known biological results and yielding better agreement with fluxomics data. Despite these promising results, MultiMM has numerous limitations. First, MultiMM leverages co-regulatory models to improve activity state estimates, but information about co-regulation is incorporated in a manner that assumes that networks are known with certainty. Second, MultiMM assumes that genes that change states in the dataset can be distinguished with certainty from those that remain in one state. Third, the model can be sensitive to extreme measures (outliers) of gene expression. In this manuscript, we propose a modified Bayesian approach that addresses these three limitations by improving outlier handling and by explicitly modeling network and other uncertainty, yielding improved gene activity state estimates when compared to MultiMM.

sgnesR: An R package for simulating gene expression data from an underlying real gene network structure considering delay parameters

BMC Bioinformatics, 2017, Vol 18 (1). DOI: 10.1186/s12859-017-1731-8. Cited By ~ 6

Author(s): Shailesh Tripathi, Jason Lloyd-Price, Andre Ribeiro, Olli Yli-Harja, Matthias Dehmer, ...

Keyword(s): Gene Expression, Gene Expression Data, Network Structure, Gene Network, R Package, Expression Data, Real Gene

Robust Significance Analysis of Microarrays by Minimum β-Divergence Method

BioMed Research International, 2017, Vol 2017, pp. 1-18. DOI: 10.1155/2017/5310198. Cited By ~ 5

Author(s): Md. Shahjaman, Nishith Kumar, Md. Manir Hossain Mollah, Md. Shakil Ahmed, Anjuman Ara Begum, ...

Keyword(s): Gene Expression, Maximum Likelihood, Statistical Approach, Maximum Likelihood Estimators, Small Sample, Gene Expressions, Large Sample, Significance Analysis, Real Gene, Better Than

Identification of differentially expressed (DE) genes across two or more conditions is an important task for the discovery of a few biomarker genes. Significance Analysis of Microarrays (SAM) is a popular statistical approach for identifying DE genes in both small- and large-sample cases. However, it is sensitive to outlying gene expressions and has low power in the presence of outliers. Therefore, in this paper, an attempt is made to robustify the SAM approach by using minimum β-divergence estimators instead of maximum likelihood estimators of the parameters. We demonstrate the performance of the proposed method in comparison with other popular statistical methods, namely ANOVA, SAM, LIMMA, KW, EBarrays, GaGa, and BRIDGE, using both simulated and real gene expression datasets. We observe that, in the absence of outliers, all methods show good and almost equal performance in the large-sample cases, while in the small-sample cases only three methods (SAM, LIMMA, and the proposed method) show almost equal performance that is better than the others with two or more conditions. However, in the presence of outliers, only the proposed method on average performs better than the others in both small- and large-sample cases with each condition.
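The intuition behind replacing a maximum likelihood mean with a β-divergence style estimate can be sketched for a single sample of expression values. The code below is not the paper's procedure: it assumes a Gaussian model, uses the commonly cited iteratively reweighted form in which each observation gets weight exp(-β(x - μ)² / (2σ²)), starts from the median and a scaled median absolute deviation, and picks β = 0.2 arbitrarily. All names are hypothetical.

```python
import math
import statistics


def beta_weighted_mean(values, beta=0.2, iterations=50):
    """Robust location estimate that down-weights points far from the bulk."""
    mu = statistics.median(values)
    mad = statistics.median([abs(x - mu) for x in values])
    variance = (1.4826 * mad) ** 2  # robust starting scale (assumption)
    if variance == 0:
        return mu
    for _ in range(iterations):
        # Outliers receive weights close to zero.
        weights = [math.exp(-beta * (x - mu) ** 2 / (2 * variance))
                   for x in values]
        total = sum(weights)
        if total == 0:
            break
        mu = sum(w * x for w, x in zip(weights, values)) / total
        # The (1 + beta) factor offsets the shrinkage caused by the weights.
        new_variance = (1 + beta) * sum(
            w * (x - mu) ** 2 for w, x in zip(weights, values)) / total
        if new_variance <= 0:
            break
        variance = new_variance
    return mu


# Hypothetical expression values for one gene, with one gross outlier.
samples = [5.1, 4.9, 5.0, 5.2, 4.8, 50.0]
ordinary_mean = statistics.mean(samples)
robust_mean = beta_weighted_mean(samples)
```

The ordinary mean is pulled far toward the outlier, while the weighted estimate stays with the five consistent measurements; a test statistic built on the latter is correspondingly less distorted.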

Evaluation of Plaid Models in Biclustering of Gene Expression Data

Scientifica, 2016, Vol 2016, pp. 1-8. DOI: 10.1155/2016/3059767. Cited By ~ 1

Author(s): Hamid Alavi Majd, Soodeh Shahsavari, Ahmad Reza Baghestani, Seyyed Mohammad Tabatabaei, Naghme Khadem Bashi, ...

Keyword(s): Gene Expression, Statistical Model, Gene Expression Data, High Dimensional, Expression Data, Simulation Data, Intrinsic Structure, Go Analysis, Real Gene

Background. Biclustering algorithms have been proposed for the analysis of high-dimensional gene expression data. Among them, the plaid model is arguably one of the most flexible biclustering models to date. Objective. The main goal of this study is to evaluate plaid models. To that end, we investigate this model on both simulated data and real gene expression datasets. Methods. Two simulated matrices with different degrees of overlap and noise are generated, and the intrinsic structure of these data is compared with the resulting biclusters. We also searched the discovered biclusters for biological significance using GO analysis. Results. When there is no noise, the algorithm discovers almost all of the biclusters; when there is moderate noise in the dataset, the algorithm does not perform well in finding overlapping biclusters; and when the noise is large, the biclustering result is not reliable. Conclusion. The plaid model needs to be modified because it cannot find good biclusters when the noise in the data is moderate or large. It is a statistical model and a quite flexible one. In summary, to reduce the errors, the model can be modified and the error distribution can be changed.
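For reference, the plaid model is usually written as a sum of overlapping layers, each contributing row and column effects to the cells it covers. The abstract does not state the formula, so the notation below follows the standard formulation and is an assumption about the variant evaluated here: Y_ij is the expression of gene i under condition j, layer k has overall effect μ_k, gene effect α_ik and condition effect β_jk, and ρ_ik, κ_jk are 0/1 membership indicators.

```latex
Y_{ij} = \mu_0 + \sum_{k=1}^{K} \left( \mu_k + \alpha_{ik} + \beta_{jk} \right) \rho_{ik}\, \kappa_{jk} + \varepsilon_{ij},
\qquad \rho_{ik},\ \kappa_{jk} \in \{0, 1\}
```

Written this way, the conclusion's suggestion of changing the error distribution corresponds to changing the assumptions on the noise term ε_ij, and overlapping biclusters correspond to cells where more than one product ρ_ik κ_jk equals 1.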

Inferring Phylogenetic Networks from Gene Order Data

BioMed Research International, 2013, Vol 2013, pp. 1-7. DOI: 10.1155/2013/503193. Cited By ~ 1

Author(s): Alexey Anatolievich Morozov, Yuri Pavlovich Galachyants, Yelena Valentinovna Likhoshway

Keyword(s): Case Studies, Gene Order, Distance Matrix, Phylogenetic Networks, Simulation Studies, Network Construction, Intermediate Data, Real Gene, Binary Encoding, Construction Algorithms

Existing algorithms allow us to infer phylogenetic networks from sequences (DNA, protein or binary), sets of trees, and distance matrices, but there are no methods to build them using gene order data as an input. Here we describe several methods to build split networks from gene order data, perform simulation studies, and use our methods to analyze and interpret different real gene order datasets. All proposed methods are based on intermediate data, which can be generated from the genome structures under study and used as an input for network construction algorithms. Three intermediates are used: a set of jackknife trees, a distance matrix, and a binary encoding. According to simulations and case studies, the best intermediates are jackknife trees and the distance matrix (when used with the Neighbor-Net algorithm). Binary encoding can also be useful, but only when the methods mentioned above cannot be used.

Adaptive L1/2 Shooting Regularization Method for Survival Analysis Using Gene Expression Data

The Scientific World Journal, 2013, Vol 2013, pp. 1-5. DOI: 10.1155/2013/475702. Cited By ~ 3

Author(s): Xiao-Ying Liu, Yong Liang, Zong-Ben Xu, Hai Zhang, Kwong-Sak Leung

Keyword(s): Gene Expression, Variable Selection, Regularization Method, Proportional Hazards, Adaptive Lasso, High Dimensional, Expression Data, Artificial Data, Real Gene, Shooting Algorithm

A new adaptive L1/2 shooting regularization method for variable selection based on Cox's proportional hazards model is proposed. This adaptive L1/2 shooting algorithm can be easily obtained by optimizing a reweighted iterative series of L1 penalties and a shooting strategy for the L1/2 penalty. Simulation results based on high-dimensional artificial data show that the adaptive L1/2 shooting regularization method can be more accurate for variable selection than the Lasso and adaptive Lasso methods. Results on a real gene expression dataset (DLBCL) also indicate that the L1/2 regularization method performs competitively.
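The relation between an L1/2 penalty and a reweighted series of L1 penalties can be written out as follows. This is a sketch in generic notation, not the paper's exact formulation: ℓ(β) denotes the Cox partial log-likelihood for n samples and p genes, λ > 0 is the tuning parameter, and the 1/n scaling and the weight form are assumptions based on the usual reweighting argument.

```latex
% L1/2-penalized Cox regression (assumed form)
\hat{\beta} = \arg\min_{\beta} \left\{ -\frac{1}{n}\,\ell(\beta) + \lambda \sum_{j=1}^{p} |\beta_j|^{1/2} \right\}

% One step of the reweighted L1 series, given the current iterate \beta^{(t)}
\beta^{(t+1)} = \arg\min_{\beta} \left\{ -\frac{1}{n}\,\ell(\beta) + \lambda \sum_{j=1}^{p} \frac{|\beta_j|}{\sqrt{|\beta_j^{(t)}|}} \right\}
```

Each step is a weighted Lasso problem, which is what makes a coordinate-wise ("shooting") solver applicable; coefficients that are already small receive large weights and are pushed toward zero more strongly than under a plain L1 penalty.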

Development and Validation of Predictive Indices for a Continuous Outcome Using Gene Expression Profiles

Cancer Informatics, 2010, Vol 9, pp. CIN.S3805. DOI: 10.4137/cin.s3805. Cited By ~ 6

Author(s): Yingdong Zhao, Richard Simon

Keyword(s): Gene Expression, Linear Regression, Gene Expression Data, Cross Validation, Expression Profiles, Linear Regression Method, Expression Data, Linear Regression Models, Continuous Response, Real Gene

There have been relatively few publications using linear regression models to predict a continuous response based on microarray expression profiles. Standard linear regression methods are problematic when the number of predictor variables exceeds the number of cases. We have evaluated three linear regression algorithms that can be used to predict a continuous response from high-dimensional gene expression data: least angle regression (LAR), the least absolute shrinkage and selection operator (LASSO), and the averaged linear regression method (ALM). All methods are tested with an unbiased complete cross-validation approach, using simulations based on a real gene expression dataset and analyses of two sets of real gene expression data. Our results show that the LASSO algorithm often provides a model with somewhat lower prediction error than the LAR method, but both of them perform more efficiently than the ALM predictor. We have developed a plug-in for BRB-ArrayTools that implements the LAR and LASSO algorithms with complete cross-validation.
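A minimal sketch of the two ideas in this abstract, a LASSO fit when predictors outnumber cases and a cross-validation loop in which every data-dependent step is repeated inside each fold, is given below. It is not the BRB-ArrayTools plug-in: it uses a plain coordinate descent solver for the objective (1/2n)·||y - Xβ||² + λ·||β||₁, leave-one-out folds, and centering as the only preprocessing step; the data and all names are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, returning 0 inside [-t, t]."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, sweeps=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


def loocv_error(X, y, lam):
    """Leave-one-out error with centering and fitting redone in every fold."""
    n, p = len(X), len(X[0])
    total = 0.0
    for held in range(n):
        train = [i for i in range(n) if i != held]
        x_mean = [sum(X[i][j] for i in train) / len(train) for j in range(p)]
        y_mean = sum(y[i] for i in train) / len(train)
        Xc = [[X[i][j] - x_mean[j] for j in range(p)] for i in train]
        yc = [y[i] - y_mean for i in train]
        beta = lasso_cd(Xc, yc, lam)
        pred = y_mean + sum(beta[j] * (X[held][j] - x_mean[j])
                            for j in range(p))
        total += (y[held] - pred) ** 2
    return total / n


# Hypothetical data: 4 cases, 6 predictors (more predictors than cases).
X = [
    [1.0, 0.0, 2.0, -1.0, 0.5, 3.0],
    [2.0, 1.0, 0.0, 0.0, -0.5, 1.0],
    [3.0, -1.0, 1.0, 2.0, 0.0, 0.0],
    [4.0, 0.5, -1.0, 1.0, 1.5, 2.0],
]
y = [2.0, 4.1, 5.9, 8.0]

# Smallest penalty at which every coefficient is exactly zero.
lam_max = max(abs(sum(X[i][j] * y[i] for i in range(len(X)))) / len(X)
              for j in range(len(X[0])))
null_model = lasso_cd(X, y, lam_max)
sparse_model = lasso_cd(X, y, 0.1 * lam_max)
cv_error = loocv_error(X, y, 0.5)
```

The held-out case never influences the means or the coefficients used to predict it, which is the property that keeps the cross-validated error estimate unbiased.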
