Increased Cross-Platform Microarray Data Set Correlation via Substrate-Independent Nanofilms

2011 ◽  
Vol 83 (14) ◽  
pp. 5592-5597 ◽  
Author(s):  
Scott D. Spillman
2008 ◽  
Vol 06 (02) ◽  
pp. 261-282 ◽  
Author(s):  
AO YUAN ◽  
WENQING HE

Clustering is a major tool for microarray gene expression data analysis. Existing clustering methods fall mainly into two categories: parametric and nonparametric. Parametric methods generally assume a mixture of parametric subdistributions. They perform well when the mixture distribution approximately fits the true data-generating mechanism, but not when the deviation between the two is nonnegligible. Nonparametric methods, which usually make no distributional assumptions, are robust but pay for it with a loss of efficiency. To exploit the known mixture form for efficiency while dropping assumptions about the unknown subdistributions for robustness, we propose a semiparametric method for clustering. The proposed approach has the form of a parametric mixture but makes no assumptions about the subdistributions, which are estimated nonparametrically with constraints imposed only on the modes. An expectation-maximization (EM) algorithm with a classification step is used to cluster the data, and a modified Bayesian information criterion (BIC) guides the choice of the optimal number of clusters. Simulation studies are conducted to assess the performance and robustness of the proposed method. The results show that the proposed method yields a reasonable partition of the data. As an illustration, the proposed method is applied to a real microarray data set to cluster genes.
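
The EM-plus-BIC workflow mentioned in the abstract above can be illustrated with a much simpler stand-in. The sketch below is not the authors' semiparametric estimator and does not use their modified BIC: it assumes a fully parametric one-dimensional Gaussian mixture and the standard BIC, and the function names (`fit_gmm`, `bic`, `choose_k`) are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def fit_gmm(data, k, n_iter=100):
    """EM for a 1-D Gaussian mixture (parametric stand-in, an assumption).

    Returns (weights, means, sigmas, log_likelihood).
    """
    xs = sorted(data)
    n = len(xs)
    # Deterministic start: means at evenly spaced quantiles, shared spread.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    spread = (xs[-1] - xs[0]) / (2 * k) or 1.0
    sigmas = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, s)
                 for w, m, s in zip(weights, means, sigmas)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-3)  # floor avoids sigma == 0
    loglik = 0.0
    for x in xs:
        mix = sum(w * normal_pdf(x, m, s)
                  for w, m, s in zip(weights, means, sigmas))
        loglik += math.log(mix or 1e-300)
    return weights, means, sigmas, loglik


def bic(loglik, k, n):
    """Standard BIC: k means, k sigmas and k - 1 free weights."""
    return -2 * loglik + (3 * k - 1) * math.log(n)


def choose_k(data, max_k=3):
    """Pick the number of components with the smallest BIC."""
    scores = {k: bic(fit_gmm(data, k)[3], k, len(data))
              for k in range(1, max_k + 1)}
    return min(scores, key=scores.get)
```

Assigning each point to the component with the largest posterior probability after the final E-step would play the role of the classification step.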


2009 ◽  
Vol 3 (Suppl 4) ◽  
pp. S13 ◽  
Author(s):  
Christèle Robert-Granié ◽  
Kim-Anh Lê Cao ◽  
Magali SanCristobal

Author(s):  
I.Y. Boyko ◽  
D.S. Anisimov ◽  
L.L. Smolyakova ◽  
M.A. Ryazanov

Modern biomedical research aimed at early cancer diagnosis uses microarrays containing biological information about patients. Based on these data, patients are assigned to one of two classes, corresponding to the presence or absence of a diagnosis. One step with a decisive influence on classification quality is the selection of significant features. This paper proposes a criterion for selecting significant features based on the ledge-coefficient of correlation, which was previously used to estimate the degree of interrelation between numerical and binary features. For two microarray data sets, comparative examples of binary classification are presented using three feature selection algorithms, three dimensionality reduction methods, and six classification models. Feature selection with the ledge-criterion gave a classification quality comparable to that of common feature selection methods such as the t-test and the U-test. For the peptide microarray data set considered in the paper, the projection to latent structures method had previously been found effective. Combining this method with feature selection by the ledge-criterion gave a higher classification quality measure.
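
As a point of reference for the t-test baseline named in the abstract above, the following sketch ranks features by the absolute Welch t-statistic between the two classes. It does not implement the ledge-criterion, whose definition is not given here; the function names and the data layout (one list of feature values per sample, labels 0 or 1) are assumptions made for illustration.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic for two samples (each needs at least 2 values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_features(rows, labels, top_k):
    """Return indices of the top_k features by |t| between classes 1 and 0.

    rows:   list of samples, each a list of numeric feature values
    labels: list of 0/1 class labels, one per sample
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append((abs(welch_t(pos, neg)), j))
    scores.sort(reverse=True)  # largest |t| first
    return [j for _, j in scores[:top_k]]
```

The selected indices would then feed a dimensionality reduction step and a classifier, as in the comparison the abstract describes.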


2019 ◽  
Author(s):  
Shahan Mamoor

Sepsis, the body’s reaction to infection in what is normally a sterile bloodstream, is a major cause of mortality in the United States (1). I used a microarray data set from a cohort of thirty-one patients with septic shock or systemic inflammatory response syndrome (SIRS) (2) to determine the major transcriptional changes associated with each disease state. I found that globally, the granulocytes of patients with SIRS resembled those of patients with septic shock at the level of transcription. For many genes expressed in the granulocyte, SIRS represented an “intermediate” gene expression state between that of control patients and that of patients with septic shock. The identification of the most differentially expressed genes in the granulocytic immune cells of patients with septic shock can facilitate the development of novel therapeutics or diagnostics for a condition that, despite decades of research, has a 14.7% to 29.9% in-hospital mortality rate (1).


2017 ◽  
Author(s):  
Magdalena E Strauß ◽  
John E Reid ◽  
Lorenz Wernisch

Abstract Motivation: A number of pseudotime methods have provided point estimates of the ordering of cells for scRNA-seq data. A still limited number of methods also model the uncertainty of the pseudotime estimate. However, there is still a need for a method to sample from complicated and multi-modal distributions of orders, and to estimate changes in the amount of the uncertainty of the order during the course of a biological development, as this can support the selection of suitable cells for the clustering of genes or for network inference. Results: In an application to a microarray data set our proposed method, GPseudoRank, identifies two modes of the distribution, each of them corresponding to point estimates of orders obtained by a different established method. In an application to scRNA-seq data we demonstrate the potential of GPseudoRank to identify phases of lower and higher pseudotime uncertainty during a biological process. GPseudoRank also correctly identifies cells precocious in their antiviral response. Availability and implementation: Our method is available on GitHub: https://github.com/magStra/GPseudoRank. Contact: magdalena.strauss@mrc-bsu.cam.ac.uk Supplementary information: Supplementary materials are available.


BioTechniques ◽  
2012 ◽  
Vol 53 (1) ◽  
pp. 33-40 ◽  
Author(s):  
Dimos Kapetis ◽  
Ferdinando Clarelli ◽  
Federico Vitulli ◽  
Nicole Kerlero de Rosbo ◽  
Ottavio Beretta ◽  
...  

2005 ◽  
Vol 44 (03) ◽  
pp. 418-422 ◽  
Author(s):  
C. Ittrich

Summary Objectives: In two-channel microarray experiments the measured gene expression levels are affected by many sources of systematic variation. Normalization refers to the process of removing such systematic sources of variation, to make measured intensities within and between slides comparable. Some commonly used normalization methods removing intensity-dependent dye bias and adjusting differences in variability between slides will be reviewed with the main focus on intensity-dependent normalization methods. Methods: This article describes different intensity-dependent within-slide normalization methods for the log ratios of red and green channel intensities but also refers to single channel normalization methods incorporating all single channels of the slides at once. Results: The described procedures provide a useful approach to remove systematic sources of variation like intensity-dependent dye bias and variability between slides in cDNA microarray experiments. This is illustrated by an experimental data set. Conclusions: Several reasonable normalization procedures for two-channel microarray data have recently been proposed. Deciding on which method would perform well for a concrete experiment is difficult. Designed spike-in experiments or dilution series with known differences for some selected genes would be helpful to assess the different methods, but may be impractical for most laboratories due to the high costs.
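
To make the idea of intensity-dependent within-slide normalization concrete, the sketch below computes the usual log ratio M = log2(R/G) and mean log intensity A = 0.5 * log2(R * G) for each spot, then centers M within bins of A. The binned median is a deliberately crude stand-in for the smooth (for example loess) fits that intensity-dependent methods normally use; the function names and the bin count are assumptions, not taken from the article.

```python
import math
from statistics import median


def ma_values(red, green):
    """Return (M, A) lists for paired red/green intensities (all > 0)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def binned_median_normalize(m, a, n_bins=2):
    """Subtract from each M the median M of its intensity (A) bin.

    Spots are sorted by A and split into n_bins groups of roughly equal
    size; this approximates removing an intensity-dependent dye bias.
    """
    if not m:
        return []
    order = sorted(range(len(a)), key=lambda i: a[i])
    size = math.ceil(len(order) / n_bins)
    normalized = [0.0] * len(m)
    for start in range(0, len(order), size):
        idx = order[start:start + size]
        center = median(m[i] for i in idx)
        for i in idx:
            normalized[i] = m[i] - center
    return normalized
```

With `n_bins=1` this reduces to global median centering, which removes a constant dye bias but not one that varies with intensity.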


2019 ◽  
Vol 29 (1) ◽  
pp. 258-271
Author(s):  
Rosa Arboretti ◽  
Arne C Bathke ◽  
Eleonora Carrozzo ◽  
Fortunato Pesarin ◽  
Luigi Salmaso

Data collected in medical research are often characterized by censored observations and/or a point mass at zero. This happens, for example, when some measurements fall below the detection limit of the instrument used. Left-censored observations of this type are called “nondetects”. A data set with an excessive number of zeros is also referred to as zero-inflated. In the present work, we compare different multivariate permutation procedures for two-sample testing with data containing nondetects. The effect of censoring is investigated with regard to the different values that may be attributed to nondetected values, both under the null hypothesis and under the alternative. We motivate the problem using data from allergy research.
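
The role of the value attributed to nondetects can be seen in a small univariate example. The sketch below is not one of the multivariate procedures compared in the paper: it assumes a single variable, marks nondetects as `None`, replaces them with a chosen substitute, and runs a Monte Carlo permutation test on the absolute difference in means. All names and defaults are hypothetical.

```python
import random
from statistics import mean


def impute(values, substitute):
    """Replace nondetects (marked as None) with a fixed substitute value."""
    return [substitute if v is None else v for v in values]


def permutation_p_value(x, y, substitute=0.0, n_perm=2000, seed=0):
    """Two-sample permutation p-value for the absolute difference in means.

    Nondetects in x and y are first replaced by `substitute`; group labels
    are then shuffled n_perm times with a seeded generator.
    """
    rng = random.Random(seed)
    x, y = impute(x, substitute), impute(y, substitute)
    observed = abs(mean(x) - mean(y))
    pooled = x + y
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(x)]) - mean(pooled[len(x):]))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Rerunning the test with different substitutes (for example zero, half the detection limit, or the detection limit itself) shows how sensitive the conclusion is to that choice.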

