USING ASYMMETRIC DISTRIBUTIONS FOR MODELING GENE EXPRESSION DATA

2021 ◽  
Vol 39 (2) ◽  
pp. 266-278
Author(s):  
Walkiria Maria de Oliveira MACERAU ◽  
Luis Aparecido MILAN

We present a short review of the asymmetric distributions alpha-stable, skew normal, skew Student’s t and skew Laplace. Using AIC and BIC, we compare the performance of these distributions, which are generally used to model asymmetric data. These criteria were able to select the best model for each data set. We also apply these models to gene expression data and verify that these distributions are suitable for modeling these observations.
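The AIC/BIC comparison described above can be sketched as follows. To keep the example self-contained, this minimal sketch compares two symmetric models with closed-form maximum-likelihood fits (normal and Laplace) instead of the alpha-stable and skewed families reviewed in the paper, which require numerical optimization. The function names and the sample data are illustrative assumptions. Both criteria add a complexity penalty to -2 log L, and the model with the smallest value is preferred.

```python
import math
import statistics


def normal_loglik(xs):
    """Maximized log-likelihood of a normal model (MLE: mean and biased variance)."""
    n = len(xs)
    mu = statistics.fmean(xs)
    var = sum((x - mu) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def laplace_loglik(xs):
    """Maximized log-likelihood of a Laplace model (MLE: median and mean absolute deviation)."""
    n = len(xs)
    mu = statistics.median(xs)
    b = sum(abs(x - mu) for x in xs) / n  # must be > 0, i.e. not all values identical
    return -n * math.log(2 * b) - n


def aic(loglik, k):
    # Akaike information criterion: 2k - 2 log L
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    # Bayesian information criterion: k ln(n) - 2 log L
    return k * math.log(n) - 2 * loglik


def compare_models(xs):
    """Return (AIC, BIC) per model and the model name minimizing each criterion."""
    n = len(xs)
    fits = {"normal": (normal_loglik(xs), 2), "laplace": (laplace_loglik(xs), 2)}
    scores = {name: (aic(ll, k), bic(ll, k, n)) for name, (ll, k) in fits.items()}
    best_aic = min(scores, key=lambda m: scores[m][0])
    best_bic = min(scores, key=lambda m: scores[m][1])
    return scores, best_aic, best_bic


# A sharply peaked sample with two outliers (illustrative data)
scores, best_aic, best_bic = compare_models([0.0] * 8 + [5.0, -5.0])
print(best_aic, best_bic)  # prints: laplace laplace
```

With equal parameter counts, as here, AIC and BIC necessarily rank the models identically; they can disagree only when candidate models differ in k, because the BIC penalty k ln(n) grows with the sample size.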

2000 ◽  
Vol 3 (1) ◽  
pp. 9-15 ◽  
Author(s):  
PETER J. WOOLF ◽  
YIXIN WANG

Woolf, Peter J., and Yixin Wang. A fuzzy logic approach to analyzing gene expression data. Physiol Genomics 3: 9–15, 2000.—We have developed a novel algorithm for analyzing gene expression data. This algorithm uses fuzzy logic to transform expression values into qualitative descriptors that can be evaluated by using a set of heuristic rules. In our tests we designed a model to find triplets of activators, repressors, and targets in a yeast gene expression data set. For the conditions tested, the predictions made by the algorithm agree well with experimental data in the literature. The algorithm can also assist in determining the function of uncharacterized proteins and is able to detect a substantially larger number of transcription factors than could be found at random. This technology extends current techniques such as clustering in that it allows the user to generate a connected network of genes using only expression data.
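The fuzzification step and one heuristic rule can be illustrated as below. This is a hypothetical sketch, not the authors' rule set: it assumes expression values already scaled to [0, 1], simple triangular LOW/MED/HIGH membership functions, the minimum as the fuzzy AND, and a single rule "activator HIGH and repressor LOW implies target HIGH". A candidate activator/repressor/target triplet is scored by how far the rule's prediction lies from the target's observed HIGH membership, averaged over conditions.

```python
def memberships(x):
    """Fuzzify a normalized expression value into LOW/MED/HIGH degrees.

    Hypothetical triangular memberships peaking at 0, 0.5 and 1; they sum to 1.
    """
    x = min(max(x, 0.0), 1.0)  # clamp to [0, 1]
    low = max(0.0, 1.0 - 2.0 * x)
    high = max(0.0, 2.0 * x - 1.0)
    med = 1.0 - low - high
    return {"LOW": low, "MED": med, "HIGH": high}


def rule_target_high(activator, repressor):
    """Hypothetical rule: IF activator is HIGH AND repressor is LOW THEN target is HIGH."""
    a = memberships(activator)
    r = memberships(repressor)
    return min(a["HIGH"], r["LOW"])  # fuzzy AND taken as the minimum


def triplet_error(activators, repressors, targets):
    """Mean absolute gap between predicted and observed HIGH membership of the target."""
    gaps = [
        abs(rule_target_high(a, r) - memberships(t)["HIGH"])
        for a, r, t in zip(activators, repressors, targets)
    ]
    return sum(gaps) / len(gaps)


# Three conditions in which the target follows the rule exactly (illustrative data)
print(triplet_error([1.0, 0.0, 0.75], [0.0, 1.0, 0.25], [1.0, 0.0, 0.75]))  # prints 0.0
```

Ranking all candidate triplets by this error, lowest first, would give a list of plausible activator/repressor/target combinations under these assumptions.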


2007 ◽  
Vol 1 (S1) ◽  
Author(s):  
Alfonso Buil ◽  
Alexandre Perera-Lluna ◽  
Ramon Souto ◽  
Juan M Peralta ◽  
Laura Almasy ◽  
...  

Blood ◽  
2015 ◽  
Vol 126 (23) ◽  
pp. 2663-2663
Author(s):  
Matthew A Care ◽  
Stephen M Thirdborough ◽  
Andrew J Davies ◽  
Peter W.M. Johnson ◽  
Andrew Jack ◽  
...  

Purpose: To assess whether comparative gene network analysis can reveal characteristic immune response signatures that predict clinical response in diffuse large B-cell lymphoma (DLBCL).
Background: The wealth of available gene expression data sets for DLBCL and other cancer types provides a resource to define recurrent pathological processes at the level of gene expression and gene correlation neighbourhoods. This is of particular relevance in the context of cancer immune responses, where convergence onto common patterns may drive shared gene expression profiles. Where existing and novel immunotherapies harness the immune response for therapeutic benefit, such responses may provide predictive biomarkers.
Methods: We independently analysed publicly available DLBCL gene expression data sets and a wide compendium of gene expression data from diverse cancer types, and then asked whether common elements of the cancer host response could be identified from the resulting networks. Using 10 DLBCL gene expression data sets, encompassing 2030 cases, we established pairwise gene correlation matrices per data set, which were merged to generate median correlations of gene pairs across all data sets. Gene network analysis and unsupervised clustering were then applied to define global representations of DLBCL gene expression neighbourhoods. In parallel, a diverse range of solid and lymphoid malignancies, including breast, colorectal, oesophageal, head and neck, non-small cell lung, prostate and pancreatic cancer, Hodgkin lymphoma, follicular lymphoma and DLBCL, were independently analysed using an orthogonal weighted gene correlation network analysis (WGCNA) of gene expression data sets, from which correlated modules across diverse cancer types were identified. The biology of the resulting gene neighbourhoods was assessed by signature and ontology enrichment, and the overlap between gene correlation neighbourhoods and WGCNA-derived modules associated with immune/host responses was analysed.
Results: Amongst DLBCL data, we identified distinct gene correlation neighbourhoods associated with the immune response. These included elements of IFN-polarised responses, core T-cell and cytotoxic signatures, as well as distinct macrophage responses. Neighbourhoods linked to macrophages separated CD163 from CD68 and CD14. In the WGCNA analysis of diverse cancer types, clusters corresponding to these immune response neighbourhoods were independently identified, including a highly similar cluster related to CD163. The overlapping CD163 clusters in both analyses were linked to diverse Fc receptors, complement pathway components and patterns of scavenger receptors potentially linked to alternative macrophage activation. The relationship between the CD163 macrophage gene expression cluster and outcome was tested in DLBCL data sets, identifying a poor response in CD163-cluster-high patients, which reached statistical significance in one data set (GSE10846). Notably, the effect of the CD163-associated gene neighbourhood, which correlates with poor outcome after rituximab-containing immunochemotherapy, is distinct from the effect of IFNG-STAT1-IRF1 polarised cytotoxic responses. The latter represents the predominant immune response pattern separating cell-of-origin unclassifiable (Type-III) DLBCL from either the ABC or GCB DLBCL subsets, and is associated with a trend toward positive outcome.
Conclusion: Comparative gene expression network analysis identifies common immune response signatures shared between DLBCL and other cancer types. Gene expression clusters linked to CD163 macrophage responses and IFNG-STAT1-IRF1 polarised cytotoxic responses are common patterns with apparently divergent outcome associations.
Disclosures: Davies: CTI: Honoraria; Gilead: Consultancy, Honoraria, Research Funding; Mundipharma: Honoraria, Research Funding; Bayer: Research Funding; Takeda: Honoraria, Research Funding; Janssen: Honoraria, Research Funding; Roche: Honoraria, Research Funding; GSK: Research Funding; Pfizer: Honoraria; Celgene: Honoraria, Research Funding. Jack: Janssen: Research Funding.
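The Methods step of building one pairwise gene correlation matrix per data set and merging the matrices by the median correlation of each gene pair can be sketched in Python as below. This is a minimal illustration, not the authors' pipeline: it assumes Pearson correlation, expression profiles stored as plain dictionaries (gene name to values across one data set's samples), and hypothetical helper names and toy data.

```python
import statistics


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5  # undefined for constant profiles


def correlation_pairs(expr):
    """All pairwise gene correlations within one data set.

    expr maps gene name -> expression values across that data set's samples.
    """
    genes = sorted(expr)
    return {
        (g, h): pearson(expr[g], expr[h])
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
    }


def median_correlations(datasets):
    """Merge per-data-set correlations by taking, for each gene pair, the median
    over the data sets in which both genes were measured."""
    collected = {}
    for expr in datasets:
        for pair, r in correlation_pairs(expr).items():
            collected.setdefault(pair, []).append(r)
    return {pair: statistics.median(rs) for pair, rs in collected.items()}


# Three toy data sets with different sample counts are allowed (illustrative data)
ds1 = {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0, 6.0], "C": [3.0, 2.0, 1.0]}
ds2 = {"A": [1.0, 2.0, 3.0], "B": [3.0, 2.0, 1.0], "C": [1.0, 2.0, 3.0]}
ds3 = {"A": [1.0, 2.0, 3.0], "B": [1.0, 2.0, 3.0], "C": [1.0, 3.0, 2.0]}
merged = median_correlations([ds1, ds2, ds3])
print(merged)
```

The median makes the merged value robust to a single data set in which a gene pair behaves atypically, which is one plausible reason to prefer it over the mean when combining many cohorts.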


2015 ◽  
Vol 25 (6) ◽  
pp. 1000-1009 ◽  
Author(s):  
Reem Abdallah ◽  
Hye Sook Chon ◽  
Nadim Bou Zgheib ◽  
Douglas C. Marchion ◽  
Robert M. Wenham ◽  
...  

Objectives: Cytoreductive surgery is the cornerstone of ovarian cancer (OVCA) treatment. Detractors of initial maximal surgical effort argue that aggressive tumor biology, not surgical effort, will dictate survival. We investigated the role of biology in achieving optimal cytoreduction in serous OVCA using microarray gene expression analysis.
Methods: For the initial model, we used a gene expression signature from a microarray expression analysis of 124 women with serous OVCA, defining optimal cytoreduction as removal of all disease greater than 1 cm (64 women had optimal and 60 suboptimal cytoreduction). We then applied this model to 2 independent data sets: the Australian Ovarian Cancer Study (AOCS; 190 samples) and The Cancer Genome Atlas (TCGA; 468 samples). We performed a second analysis, defining optimal cytoreduction as removal of all disease to microscopic residual, using data from AOCS to create the gene signature and validating the results in the TCGA data set.
Results: Of the 12,718 genes included in the initial analysis, 58 predicted the outcome of cytoreductive surgery with 69% accuracy (P = 0.005). The performance of this classifier, measured by the area under the receiver operating characteristic curve, was 73%. When applied to TCGA and AOCS, accuracy was 56% (P = 0.16) and 62% (P = 0.01), respectively, with performance of 57% and 65%, respectively. In the second analysis, 220 genes predicted the outcome of cytoreductive surgery in the AOCS set with 74% accuracy and a performance of 73%. When these results were validated in the TCGA set, accuracy was 57% (P = 0.31) and performance was 62%.
Conclusion: Gene expression data, used as a proxy for tumor biology, predict neither accurately nor consistently the ability to perform optimal cytoreductive surgery. Other factors, including surgical effort, may also explain part of the model. Additional studies integrating more biological and clinical data may improve the prediction model.
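The two figures reported for each classifier, accuracy and the area under the receiver operating characteristic curve, can be computed as in the sketch below. It is a generic illustration, not the study's classifier: it assumes binary labels (1 = optimal, 0 = suboptimal cytoreduction, a coding chosen here for illustration), made-up scores, and a hypothetical 0.5 decision threshold. The AUC uses the rank identity that it equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half.

```python
def accuracy(predicted, actual):
    """Fraction of cases whose predicted class matches the observed class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney identity.

    labels: 1 for the positive class, 0 for the negative class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Illustrative classifier scores, not data from the study
scores = [0.9, 0.2, 0.5, 0.1]
labels = [1, 1, 0, 0]
print(roc_auc(scores, labels))                            # 0.75
print(accuracy([int(s >= 0.5) for s in scores], labels))  # 0.5
```

An AUC of 0.5 corresponds to chance-level ranking, which is the reference point against which validation values such as 57% and 62% are read.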


PLoS ONE ◽  
2010 ◽  
Vol 5 (4) ◽  
pp. e10162 ◽  
Author(s):  
Eric Bonnet ◽  
Marianthi Tatari ◽  
Anagha Joshi ◽  
Tom Michoel ◽  
Kathleen Marchal ◽  
...  

Clustering gene expression data is an important problem because it reveals functional relationships among genes in a biological process. Finding co-expressed groups of genes is challenging. To identify interesting patterns in a given gene expression data set, a Tanimoto Coefficient Similarity based Mean Shift Gentle Adaptive Boosted Clustering (TCS-MSGABC) model is proposed. The TCS-MSGABC model comprises two processes, namely feature selection and clustering. In the first process, Tanimoto Coefficient Similarity Measurement based Feature Selection (TCSM-FS) is introduced to identify relevant gene features based on their similarity values for performing genomic expression clustering. The Tanimoto coefficient similarity value ranges from ‘0’ to ‘1’, where ‘1’ is the highest similarity. Gene features with higher similarity values are selected to perform the clustering process. After feature selection, the Mean Shift Gentle Adaptive Boosted Clustering (MSGABC) algorithm is applied in the TCS-MSGABC model to cluster similar gene expression data based on the selected features. The MSGABC algorithm is a boosting method that combines many weak clustering results into one strong learner. In this way, similar gene expression data are clustered with higher accuracy in less time. The TCS-MSGABC model is evaluated experimentally in terms of clustering accuracy, clustering time and error rate with respect to the number of gene data. The experimental results show that the TCS-MSGABC model increases clustering accuracy and reduces clustering time for genomic predictive pattern analytics compared with state-of-the-art methods.
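For real-valued profiles, the Tanimoto coefficient is commonly defined as T(a, b) = a·b / (|a|² + |b|² - a·b), which equals 1 for identical nonzero vectors. The sketch below computes it and then performs a hypothetical feature-selection step that keeps the k genes most similar to a reference profile. The abstract does not specify the selection rule, so the reference profile, k, the function names, and the toy data are all assumptions; the mean-shift boosted clustering stage is not shown.

```python
def tanimoto(a, b):
    """Tanimoto coefficient a.b / (|a|^2 + |b|^2 - a.b) for real-valued vectors.

    Undefined (division by zero) when both vectors are all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sum(x * x for x in a) + sum(y * y for y in b) - dot)


def select_features(profiles, reference, k):
    """Hypothetical selection step: keep the k genes most similar to a reference profile."""
    ranked = sorted(profiles, key=lambda g: tanimoto(profiles[g], reference), reverse=True)
    return ranked[:k]


profiles = {"g1": [1.0, 0.0, 1.0], "g2": [0.0, 1.0, 0.0], "g3": [1.0, 1.0, 1.0]}
print(select_features(profiles, [1.0, 0.0, 1.0], k=2))  # ['g1', 'g3']
```

For nonnegative expression vectors the coefficient stays within [0, 1], matching the range stated above; for profiles with negative values (e.g. centred log ratios) it can fall below 0.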


2017 ◽  
Author(s):  
Ionas Erb ◽  
Thomas Quinn ◽  
David Lovell ◽  
Cedric Notredame

Gene expression data, such as those generated by next-generation sequencing technologies (RNA-seq), are inherently relative: the total number of sequenced reads has no biological meaning. This issue is most often addressed with various normalization techniques, all of which face the same problem: once information about the total mRNA content of the cells of origin is lost, it cannot be recovered by technical means alone. Additional knowledge, in the form of an unchanged reference, is necessary; however, this reference can usually only be estimated. Here we propose a novel method in which sample normalization is unnecessary but important insights can nevertheless be obtained. Instead of trying to recover absolute abundances, our method is based entirely on ratios, so normalization factors cancel by default. Although the differential expression of individual genes cannot be recovered this way, the ratios themselves can be differentially expressed (even when their constituents are not). Yet most current analyses are blind to these cases, while our approach reveals them directly. Specifically, we show how the differential expression of gene ratios can be formalized by decomposing log-ratio variance (LRV) and deriving intuitive statistics from it. Although small LRVs have been used to detect proportional genes in gene expression data before, we focus here on the change in proportionality factors between groups of samples (e.g. tissue-specific proportionality). For this, we propose a statistic that is equivalent to the squared t-statistic of one-way ANOVA, but for gene ratios. In doing so, we show how precision weights can be incorporated to account for the peculiarities of count data and, moreover, how a moderated statistic can be derived in the same way as the one following from a hierarchical model for individual genes. We also discuss approaches to dealing with zero counts, deriving an expression of our statistic that is able to incorporate them.
In providing a detailed analysis of the connections between the differential expression of genes and the differential proportionality of pairs, we facilitate a clear interpretation of new concepts. The proposed framework is applied to a data set from GTEx consisting of 98 samples from the cerebellum and cortex, with selected examples shown. A computationally efficient implementation of the approach in R has been released as an addendum to the propr package.
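Because each log-ratio log(x_s / y_s) is unchanged when a sample's counts are all multiplied by the same normalization factor, statistics built on log-ratio variance need no normalization. The sketch below computes, for one gene pair, the share of the total sum of squares of the log-ratios that lies within groups, the same decomposition that underlies one-way ANOVA; a small value means the ratio differs mainly between groups. It is a minimal illustration under stated assumptions (strictly positive values, no precision weights, no moderation, no zero handling, hypothetical function names and toy data), not the implementation in propr.

```python
import math
import statistics


def log_ratios(x, y):
    """log(x_s / y_s) for each sample s; values must be strictly positive."""
    return [math.log(a / b) for a, b in zip(x, y)]


def within_group_share(x, y, groups):
    """Within-group share of the total sum of squares of log(x / y).

    0 means the ratio is constant within every group (all variation is between groups);
    1 means the group means of the log-ratio coincide. Undefined if the log-ratio is constant.
    """
    lr = log_ratios(x, y)
    total_ss = (len(lr) - 1) * statistics.variance(lr)  # (n - 1) * total LRV
    within_ss = 0.0
    for g in set(groups):
        sub = [v for v, label in zip(lr, groups) if label == g]
        if len(sub) > 1:
            within_ss += (len(sub) - 1) * statistics.variance(sub)  # (n_g - 1) * LRV_g
    return within_ss / total_ss


x = [1.0, 2.0, 4.0, 8.0]
y = [1.0, 2.0, 1.0, 2.0]
groups = ["cerebellum", "cerebellum", "cortex", "cortex"]
print(within_group_share(x, y, groups))  # 0.0: the ratio changes only between groups

# Multiplying each sample by an arbitrary normalization factor leaves the result unchanged
factors = [3.0, 0.5, 10.0, 7.0]
scaled = within_group_share([a * f for a, f in zip(x, factors)],
                            [b * f for b, f in zip(y, factors)], groups)
```

Here neither x nor y alone is informative after arbitrary per-sample scaling, yet their ratio cleanly separates the two groups, which is the situation the abstract describes as ratios being differentially expressed even when their constituents are not.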


2007 ◽  
Vol 05 (02a) ◽  
pp. 251-279 ◽  
Author(s):  
WENYUAN LI ◽  
YANXIONG PENG ◽  
HUNG-CHUNG HUANG ◽  
YING LIU

Most real-world gene expression data sets contain multiple ordered sample classes, which are categorized as either normal or diseased. Traditional feature (attribute) selection methods treat multiple classes equally, without paying attention to up/down regulation across the normal and diseased types of classes, while specialized gene selection methods consider differential expression across the normal and diseased classes but ignore the existence of multiple classes. In this paper, to improve biomarker discovery, we propose to make the best use of both aspects: differential expression (which can be viewed as domain knowledge of gene expression data) and multiple classes (which can be viewed as a characteristic of the data set). We therefore take both aspects into account simultaneously by employing rank-one generalized matrix approximations (GMA). Our results show that GMA can not only improve the accuracy of classifying the samples but also provide a visualization method for effectively analyzing gene expression data in terms of both genes and samples. Based on the mechanism of matrix approximation, we further propose an algorithm, CBiomarker, to discover compact biomarkers by reducing redundancy.
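GMA itself is not specified in the abstract, so the sketch below uses the closely related standard rank-one approximation of a genes-by-samples matrix, A ≈ u vᵀ, obtained by power iteration (equivalent to keeping the leading singular triple of the SVD). It should not be read as the authors' GMA. Entries of u score genes and entries of v score samples on the dominant pattern; the iteration count, function name, and toy matrix are assumptions, and the starting vector of ones assumes the data are not orthogonal to the leading singular vector.

```python
import math


def rank_one(matrix, iters=100):
    """Rank-one approximation matrix ~ u v^T by power iteration.

    u is a unit vector of gene scores; v = A^T u holds the sample scores scaled by
    the leading singular value, so u[i] * v[j] approximates matrix[i][j].
    """
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0] * cols  # starting vector; assumed not orthogonal to the leading singular vector
    u = [0.0] * rows
    for _ in range(iters):
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm = math.sqrt(sum(x * x for x in u))  # zero only for an all-zero matrix
        u = [x / norm for x in u]
        v = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
    return u, v


# A genes-by-samples matrix that is exactly rank one (illustrative data)
A = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
u, v = rank_one(A)
approx = [[ui * vj for vj in v] for ui in u]
```

Genes with large |u[i]| load most strongly on the dominant expression pattern, which is one simple way such an approximation can be used to rank candidate biomarkers and to plot genes and samples on a common axis.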


Author(s):  
Guro Dørum ◽  
Lars Snipen ◽  
Margrete Solheim ◽  
Solve Saebo

Gene set analysis methods have become a widely used tool for including prior biological knowledge in the statistical analysis of gene expression data. Advantages of these methods include increased sensitivity, easier interpretation and greater consistency of results. However, gene set methods do not use all the available information about gene relations. Genes are arranged in complex networks where the network distances contain detailed information about inter-gene dependencies. We propose a method that uses gene networks to smooth gene expression data with the aim of reducing the number of false positives and identifying important subnetworks. Gene dependencies are extracted from the network topology and are used to smooth genewise test statistics. To find the optimal degree of smoothing, we propose a criterion that considers the correlation between the network and the data. Network smoothing is shown to improve the ability to identify important genes in simulated data. Applied to a real data set, the smoothing accentuates parts of the network with a high density of differentially expressed genes.
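A minimal version of smoothing genewise test statistics over a network is sketched below. It is not the authors' method: it uses only direct neighbours (network distance 1) and a single hypothetical blending weight alpha, whereas the abstract describes using network distances and choosing the degree of smoothing from the correlation between network and data. The gene names, edges, and statistics are illustrative. A gene whose neighbours all have small statistics is pulled toward them, which is how smoothing can suppress isolated false positives while a connected cluster of high statistics keeps its signal.

```python
def smooth_statistics(stats, edges, alpha=0.5):
    """Blend each gene's test statistic with the mean statistic of its direct neighbours.

    stats: gene -> test statistic; edges: undirected (gene, gene) pairs;
    alpha: hypothetical smoothing weight in [0, 1] (0 = no smoothing).
    """
    neighbours = {g: set() for g in stats}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    smoothed = {}
    for g, t in stats.items():
        nb = neighbours[g]
        if nb:
            smoothed[g] = (1 - alpha) * t + alpha * sum(stats[h] for h in nb) / len(nb)
        else:
            smoothed[g] = t  # isolated genes keep their original statistic
    return smoothed


# A-B form a connected high-signal pair; E is a high statistic surrounded by nulls
stats = {"A": 4.0, "B": 4.0, "E": 4.0, "F": 0.0, "G": 0.0}
edges = [("A", "B"), ("E", "F"), ("E", "G")]
print(smooth_statistics(stats, edges))
```

With alpha = 0.5, A and B stay at 4.0 while E drops to 2.0; choosing alpha from the data, as the abstract proposes, would replace the fixed weight used here.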

