gene expression data
Recently Published Documents





2022 ◽  
Kay Spiess ◽  
Timothy Fulton ◽  
Seogwon Hwang ◽  
Kane Toh ◽  
Dillan Saunders ◽  

The study of pattern formation has benefited from reverse-engineering gene regulatory network (GRN) structure from spatio-temporal quantitative gene expression data. Traditional approaches omit tissue morphogenesis and hence focus on systems where the timescales of pattern formation and morphogenesis can be separated. In such systems, pattern forms as an emergent property of the underlying GRN. This is not the case in many animal patterning systems, where patterning and morphogenesis are simultaneous. To address pattern formation in these systems we need to adapt our methodologies to explicitly accommodate cell movements and tissue shape changes. In this work we present a novel framework to reverse-engineer GRNs underlying pattern formation in tissues experiencing morphogenetic changes and cell rearrangements. By combining quantitative data from live and fixed embryos we approximate gene expression trajectories (AGETs) in single cells and use a subset to reverse-engineer candidate GRNs using a Markov Chain Monte Carlo approach. GRN fit is assessed by simulating on cell tracks (live-modelling) and comparing the output to quantitative datasets. This framework outputs candidate GRNs that recapitulate pattern formation at the level of the tissue and the single cell. To our knowledge, this inference methodology is the first to integrate cell movements and gene expression data, making it possible to reverse-engineer GRNs patterning tissues undergoing morphogenetic changes.
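As a rough illustration of fitting a model parameter with Markov Chain Monte Carlo, the sketch below runs a Metropolis sampler on a toy one-parameter decay model. It is not the AGET framework from the abstract: the model, the noise level `sigma`, the flat prior, and all function names are assumptions chosen only to show the accept/reject step.

```python
import math
import random
from statistics import mean


def simulate(decay_rate, times):
    """Toy model (assumption): expression decays as exp(-decay_rate * t)."""
    return [math.exp(-decay_rate * t) for t in times]


def log_likelihood(decay_rate, times, observed, sigma=0.05):
    """Gaussian log-likelihood up to a constant; sigma is an assumed noise level."""
    predicted = simulate(decay_rate, times)
    sse = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return -sse / (2 * sigma ** 2)


def metropolis(times, observed, n_steps=4000, step_sd=0.1, start=1.5, seed=0):
    """Random-walk Metropolis sampler with a flat prior on positive rates."""
    rng = random.Random(seed)
    current = start
    current_ll = log_likelihood(current, times, observed)
    samples = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_sd)
        if proposal > 0:  # proposals outside the prior support are rejected
            proposal_ll = log_likelihood(proposal, times, observed)
            accept_prob = math.exp(min(0.0, proposal_ll - current_ll))
            if rng.random() < accept_prob:
                current, current_ll = proposal, proposal_ll
        samples.append(current)
    return samples


# Hypothetical data generated from a known rate, so the fit can be checked.
times = list(range(10))
observed = simulate(0.5, times)
samples = metropolis(times, observed)
estimate = mean(samples[1000:])  # discard the first 1000 steps as burn-in
```

In a real GRN inference the single rate would be replaced by the full parameter vector of the network, and the likelihood would compare simulated and measured expression along each cell track.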

2022 ◽  
Yongsheng Zhang ◽  
Yunlong Wang ◽  
Jichuang Wang ◽  
Kaixiang Zhang

Abstract: Bladder cancer (BLCA) is among the most frequent types of cancer. Patients with BLCA have a significant recurrence rate and a poor post-surgery survival rate. Recent research has found a link between tumor immune cell infiltration (ICI) and the prognosis of BLCA patients. However, the ICI picture of BLCA remains unclear. Common gene expression data was obtained by combining the Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) expression databases. Two computational algorithms were proposed to unravel the ICI landscape of BLCA patients. The R package "limma" was applied to find differentially expressed genes (DEGs). Principal-component analysis (PCA) was used to calculate the ICI score. A total of 569 common gene expression data were retrieved from TCGA and GEO cohorts. CD8+ T cells were found to have a substantial positive correlation with activated memory CD4+ T cells and immune score. In contrast, CD8+ T cells were found to have a substantial negative correlation with M0 macrophages. Thirty-eight DEGs were selected. Two ICI patterns were defined by an unsupervised clustering method, and patients with BLCA were separated into two groups. The high ICI score group had a better outcome than the low ICI score group (p < 0.001). Finally, the group with a high tumor mutation burden (TMB) as well as a high ICI score had the best outcome (p < 0.001). Combining TMB and ICI score resulted in a more accurate survival prediction, suggesting that ICI score could be used as a prognostic marker for BLCA patients.

BMC Genomics ◽  
2022 ◽  
Vol 23 (1) ◽  
Honglin Wang ◽  
Pujan Joshi ◽  
Seung-Hyun Hong ◽  
Peter F. Maye ◽  
David W. Rowe ◽  

Abstract: Background: Interferon regulatory factor-8 (IRF8) and nuclear factor of activated T cells c1 (NFATc1) are two transcription factors that have an important role in osteoclast differentiation. Thanks to ChIP-seq technology, scientists can now estimate potential genome-wide target genes of IRF8 and NFATc1. However, finding target genes that are consistently up-regulated or down-regulated across different studies is hard because it requires analysis of a large number of high-throughput expression studies from a comparable context. Method: We have developed a machine-learning-based method, called the Cohort-based TF target prediction system (cTAP), to overcome this problem. This method assumes that the pathway involving the transcription factors of interest features multiple “functional groups” of marker genes pertaining to the biological process concerned. It uses two notions, Gene-Present Sufficiently (GP) and Gene-Absent Insufficiently (GA), in addition to log2 fold changes of differentially expressed genes for the prediction. Target prediction is made by applying multiple machine-learning models, which learn the patterns of GP and GA from log2 fold changes and four types of Z scores from the normalized cohort’s gene expression data. The learned patterns are then associated with the putative transcription factor targets to identify genes that consistently exhibit Up/Down gene regulation patterns within the cohort. We applied this method to 11 publicly available GEO data sets related to osteoclastogenesis. Result: Our experiment identified a small number of Up/Down IRF8 and NFATc1 target genes as relevant to osteoclast differentiation. The machine learning models using GP and GA produced NFATc1 and IRF8 target genes different from those obtained using log2 fold change alone. Our literature survey revealed that all predicted target genes have known roles in bone remodeling, specifically related to the immune system and osteoclast formation and functions, supporting the validity of our method. Conclusion: cTAP was motivated by recognizing that biologists tend to use Z score values present in data sets for the analysis. However, using cTAP effectively presupposes assembling a sizable cohort of gene expression data sets within a comparable context. As public gene expression data repositories grow, cohort-based analysis methods like cTAP will become increasingly important.
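The two basic quantities this abstract builds on, log2 fold change and Z scores across a cohort, can be sketched as follows. This is not the cTAP implementation: the abstract does not define its four Z score types or the GP/GA rules, so the pseudocount, the cohort values, and the "up in more than one log2 unit" threshold are illustrative assumptions.

```python
import math
from statistics import mean, pstdev


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of treated to control expression; the pseudocount (an
    assumption) avoids division by zero for unexpressed genes."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Hypothetical expression of one gene in four studies: (treated, control).
cohort = [(15.0, 3.0), (31.0, 7.0), (7.0, 7.0), (63.0, 15.0)]

lfc = [log2_fold_change(t, c) for t, c in cohort]
z = z_scores(lfc)

# Count studies in which the gene is up-regulated by more than one log2 unit
# (an illustrative threshold, not one taken from the abstract).
consistent_up = sum(1 for x in lfc if x > 1.0)
```

A gene that is up in most studies of the cohort, as here, is the kind of candidate a cohort-based method would flag as consistently regulated.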

2022 ◽  
Vol 2022 ◽  
pp. 1-13
Hatim Z Almarzouki

The quantity of data required to give a valid analysis grows exponentially as the dimensionality of the data increases. In a single experiment, microarrays or gene expression profiling measure gene expression levels and patterns in various cell types or tissues. The advent of DNA microarray technology has enabled simultaneous monitoring of hundreds of gene expressions on a single chip, advancing cancer categorization. The most challenging aspect of categorization is working out many information points from many sources. The proposed approach uses microarray data to train deep learning algorithms on extracted features and then uses the Latent Feature Selection Technique to reduce classification time and increase accuracy. The feature-selection-based techniques pick the important genes before classifying microarray data for cancer prediction and diagnosis. These methods improve classification accuracy by removing duplicate and superfluous information. The Artificial Bee Colony (ABC) technique of feature selection was proposed in this research using bone marrow PC gene expression data. The ABC algorithm, based on swarm intelligence, has been proposed for gene identification. Here the ABC is used for feature selection: it generates subsets of features, and every subset produced by the onlooker bees is evaluated, making this a wrapper-based feature selection system. This method’s main goal is to choose the fewest genes that are critical to PC performance while also increasing prediction accuracy. Convolutional Neural Networks were used to classify tumors without labelling them. Lung, kidney, and brain cancer datasets were used in the procedure’s training and testing stages. Using k-fold cross-validation, the Convolutional Neural Network achieved an accuracy of 96.43%. The suggested research includes techniques for preprocessing and modifying gene expression data to enhance future cancer detection accuracy.
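The k-fold cross-validation mentioned above can be sketched as a plain index split. The abstract does not state the value of k or how samples were shuffled, so `k=5`, the seed, and the function name are assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Samples are shuffled once, dealt into k folds, and each fold is
    held out as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical example: 10 samples, 5 folds.
splits = list(k_fold_indices(n_samples=10, k=5))
```

The reported accuracy would then be the average over the k held-out folds, with the classifier retrained on each training split.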

2022 ◽  
Martin Treppner ◽  
Harald Binder ◽  
Moritz Hess

Abstract: Deep generative models can learn the underlying structure, such as pathways or gene programs, from omics data. We provide an introduction as well as an overview of such techniques, specifically illustrating their use with single-cell gene expression data. For example, the low-dimensional latent representations offered by various approaches, such as variational auto-encoders, are useful to get a better understanding of the relations between observed gene expressions and experimental factors or phenotypes. Furthermore, by providing a generative model for the latent and observed variables, deep generative models can generate synthetic observations, which allow us to assess the uncertainty in the learned representations. While deep generative models are useful to learn the structure of high-dimensional omics data by efficiently capturing non-linear dependencies between genes, they are sometimes difficult to interpret due to their neural network building blocks. More precisely, it is difficult to understand the relationship between learned latent variables and observed variables, e.g., gene transcript abundances and external phenotypes. Therefore, we also illustrate current approaches that allow us to infer the relationship between learned latent variables and observed variables as well as external phenotypes, thereby rendering deep learning approaches more interpretable. In an application with single-cell gene expression data, we demonstrate the utility of the discussed methods.

F1000Research ◽  
2022 ◽  
Vol 9 ◽  
pp. 1159
Qian (Vicky) Wu ◽  
Wei Sun ◽  
Li Hsu

Gene expression data have been used to infer gene-gene networks (GGN) where an edge between two genes implies the conditional dependence of these two genes given all the other genes. Such gene-gene networks are often referred to as gene regulatory networks since they may reveal expression regulation. Most existing methods for identifying GGN employ penalized regression with an L1 (lasso), L2 (ridge), or elastic net penalty, which spans the range from the L1 to the L2 penalty. However, for high-dimensional gene expression data, a penalty that spans the range between the L0 and L1 penalties, such as the log penalty, is often needed for variable selection consistency. Thus, we develop a novel method that employs the log penalty within the framework of an earlier network identification method, space (Sparse PArtial Correlation Estimation), and implement it in an R package, space-log. We show that space-log is computationally efficient (source code implemented in C) and performs well compared with other methods, particularly for networks with hubs. Space-log is open source and available on GitHub.
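To see why a log penalty sits between L0 and L1, the sketch below compares it with the lasso penalty for a single coefficient. The form λ·log(|β| + τ) is the usual log penalty; the value of `tau` and the shift that makes the penalty zero at β = 0 are assumptions made here for readability, and this is not code from the space-log package.

```python
import math


def l1_penalty(beta, lam):
    """Lasso penalty: grows linearly with the coefficient size."""
    return lam * abs(beta)


def log_penalty(beta, lam, tau=0.1):
    """Log penalty lam * log(|beta| + tau), shifted (assumption) so that
    the penalty is exactly zero at beta = 0. tau is an assumed tuning value."""
    return lam * (math.log(abs(beta) + tau) - math.log(tau))


# Extra cost of growing a coefficient by one unit, near zero and further out.
l1_small = l1_penalty(1.0, 1.0) - l1_penalty(0.0, 1.0)
l1_large = l1_penalty(2.0, 1.0) - l1_penalty(1.0, 1.0)
log_small = log_penalty(1.0, 1.0) - log_penalty(0.0, 1.0)
log_large = log_penalty(2.0, 1.0) - log_penalty(1.0, 1.0)
```

The L1 cost per unit is constant, while the log penalty charges much less for enlarging an already large coefficient. That keeps strong edges, such as those around hub genes, from being shrunk as heavily as under the lasso.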

2022 ◽  
Kimberly Badal ◽  
Jerome E. Foster ◽  
Rajini Haraksingh ◽  
Melford John

Abstract: Background: Radiation therapy (RT) is frequently recommended for post-surgery treatment of early-stage breast cancer (BC) patients, though not all benefit. Clinical factors currently guide RT treatment decisions. At present, models to predict RT-benefit predominantly use statistical methods with modest performance. In this paper we present a high-accuracy genomic Machine Learning (ML) model to predict RT-benefit in early-stage BC patients. We also present a novel method for selecting genomic features for training ML algorithms. Methods: Gene expression data from 463 early-stage BC patients treated with surgery and RT from the METABRIC cohort were obtained. The Wilcoxon Rank Sum (Wilcoxon RS) test and Cox Proportional Hazards (Cox PH) were used to reduce the number of genes used to train eight ML algorithms. ML algorithms were trained on 80% of the data using 10-fold cross-validation and tested on 20% of the data to assess performance in predicting relapse status. Results: Genome-wide gene expression data was reduced by 96% using Wilcoxon RS and Cox PH to a 1,596-gene set and a 977-gene set. These gene sets were used to train eight ML algorithms, resulting in models with accuracies ranging from 54.01% to 95.6%. The highest accuracies were obtained using Support Vector Machine (SVM977: 93.41%, SVM1596: 95.6%) and Neural Network algorithms (NN977: 92.31%, NN1596: 93.41%). In RT-untreated patients, accuracies of all models were 30% to 40% lower compared to RT-treated patients. SVM977 had the highest sensitivity of 91.09%. Members of the 977 set were enriched with genes involved in cell cycle and differentiation as well as genes associated with radiosensitivity and radioresistance. Conclusion: This study presents a novel genomic feature selection approach that used Wilcoxon RS followed by Cox PH to reduce the number of genes from genome-wide gene expression data used for training ML algorithms by 96%. This approach led to an SVM model that used the expression values of 977 genes to predict RT-benefit in early-stage BC patients with 93.41% accuracy. This work demonstrates that ML models can be clinically useful for predicting cancer patient outcomes.
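The first filtering step described above, a Wilcoxon rank-sum test per gene, can be sketched with the standard library alone. The sketch uses the normal approximation without a tie correction, a hypothetical 0.05 cutoff, and made-up expression values; the abstract does not give the authors' thresholds, and the Cox PH step is not shown.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(group_a, group_b):
    """Normal-approximation z statistic of the Wilcoxon rank-sum test
    (no tie correction of the variance, for brevity)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n_a])
    mean_w = n_a * (n_a + n_b + 1) / 2
    sd_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return (w - mean_w) / sd_w


def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical expression values per gene: (relapse group, no-relapse group).
genes = {
    "gene_a": ([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]),
    "gene_b": ([1, 3, 5, 7, 9], [2, 4, 6, 8, 10]),
}

# Keep genes whose expression differs between groups (assumed cutoff 0.05).
selected = [name for name, (a, b) in genes.items()
            if two_sided_p(rank_sum_z(a, b)) < 0.05]
```

Genes passing this filter would then go on to the second reduction step and to classifier training on the 80% training split.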

2022 ◽  
Vol 532 ◽  
pp. 110923
Jia-Xing Gao ◽  
Zhen-Yi Wang ◽  
Michael Q. Zhang ◽  
Min-Ping Qian ◽  
Da-Quan Jiang

2022 ◽  
Vol 70 (2) ◽  
pp. 4009-4025
Phimmarin Keerin ◽  
Tossapon Boongoen

Data in Brief ◽  
2022 ◽  
pp. 107787
Thaise Mayumi Taira ◽  
Vítor Luís Ribeiro ◽  
Yuri Jivago Silva Ribeiro ◽  
Raquel Assed Bezerra da Silva ◽  
Léa Assed Bezerra da Silva ◽  
