maqc project
Recently Published Documents


TOTAL DOCUMENTS: 9 (FIVE YEARS: 0)

H-INDEX: 3 (FIVE YEARS: 0)


2016 ◽  
Author(s):  
Florian Wagner

Robust multi-array average (RMA) is a highly successful method for processing raw data from Affymetrix expression microarrays. However, most of the work on microarray data processing predates the widespread use of Python in scientific computing. Here, I describe pyAffy, an efficient implementation of the RMA method in Python/Cython. Using data from the MAQC project, I show that this implementation produces virtually identical results compared to the RMA reference implementation in the affy R package, while running more than five times faster and consuming significantly less memory. I also show how individual steps of the RMA method affect the final expression estimates. The source code for pyAffy is available from PyPI and GitHub (https://github.com/flo-compbio/pyaffy) under an OSI-approved license. I intend to periodically revise this article to ensure that it accurately reflects the functionalities available in the pyAffy Python package.
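
The abstract above does not list the individual RMA steps. As commonly described, RMA consists of background correction, quantile normalization, and median-polish summarization. The following is a minimal NumPy sketch of the quantile normalization step only, for illustration. It is not taken from pyAffy and does not reflect its API; the function name `quantile_normalize` and the probes-by-arrays input layout are assumptions, and ties are broken arbitrarily rather than averaged.

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize the columns of a probes-by-arrays matrix.

    Hypothetical helper for illustration; not part of pyAffy.
    Each column (array) is forced to share the same empirical
    distribution: the mean of the sorted columns.
    """
    x = np.asarray(x, dtype=float)
    # Sort order of each column, and the sorted values themselves.
    order = np.argsort(x, axis=0)
    sorted_vals = np.take_along_axis(x, order, axis=0)
    # Reference distribution: mean across arrays at each rank.
    reference = sorted_vals.mean(axis=1)
    # Write the reference value for rank r back to the position
    # that held the r-th smallest value in each column.
    out = np.empty_like(x)
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out

# Toy intensities: 4 probes (rows) measured on 3 arrays (columns).
intensities = np.array([
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 3.0, 6.0],
    [4.0, 2.0, 8.0],
])
normalized = quantile_normalize(intensities)
```

After normalization, every column contains the same set of values, while the rank order of probes within each array is preserved.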


2015 ◽  
Vol 13 (06) ◽  
pp. 1542001 ◽  
Author(s):  
Zengmiao Wang ◽  
Jun Wang ◽  
Changjing Wu ◽  
Minghua Deng

Estimation of gene or isoform expression is a fundamental step in many transcriptome analysis tasks, such as differential expression analysis, eQTL (or sQTL) studies, and biological network construction. RNA-seq technology enables us to monitor expression on a genome-wide scale at single-base-pair resolution and offers the possibility of accurately measuring expression at the isoform level. However, challenges remain because of non-uniform read sampling and the presence of various biases in RNA-seq data. In this paper, we present a novel hierarchical Bayesian method to estimate isoform expression. While most existing methods treat gene expression as a by-product, we incorporate it into our model and explicitly describe its relationship with the corresponding isoform expression using a multinomial distribution. In this way, gene and isoform expression are included in a unified framework, which helps us achieve better performance than other state-of-the-art algorithms for isoform expression estimation. The effectiveness of the proposed method is demonstrated using both simulated data with known ground truth and two real RNA-seq datasets from the MAQC project. The code is available at http://www.math.pku.edu.cn/teachers/dengmh/GIExp/.


2013 ◽  
Vol 2013 ◽  
pp. 1-10 ◽  
Author(s):  
Sreevidya Sadananda Sadasiva Rao ◽  
Lori A. Shepherd ◽  
Andrew E. Bruno ◽  
Song Liu ◽  
Jeffrey C. Miecznikowski

Introduction. The microarray datasets from the MicroArray Quality Control (MAQC) project have enabled the assessment of the precision and comparability of microarrays, as well as of various microarray analysis methods. However, to our knowledge, no study to date has reported the performance of missing value imputation schemes on the MAQC datasets. In this study, we use the MAQC Affymetrix datasets to evaluate several imputation procedures for Affymetrix microarrays. Results. We evaluated several cutting-edge imputation procedures and compared them using different error measures. We randomly deleted 5% and 10% of the data and imputed the missing values using imputation tests. We performed 1000 simulations and averaged the results. The results for 5% and 10% deletion are similar. Among the imputation methods, we observe that the local least squares method with is most accurate under the error measures considered. The k-nearest neighbor method with has the highest error rate across imputation methods and error measures. Conclusions. We conclude that, for imputing missing values in Affymetrix microarray datasets preprocessed with the MAS 5.0 scheme, the local least squares method with has the best overall performance and the k-nearest neighbor method with has the worst overall performance. These results hold for both 5% and 10% missing values.
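
The evaluation protocol described in this abstract (delete a fraction of known values at random, impute them, score the imputed values against the truth, and average over repeated simulations) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code. Row-mean imputation stands in for the local least squares and k-nearest neighbor methods, RMSE stands in for the unspecified error measures, and the data are synthetic. All function names are hypothetical.

```python
import numpy as np

def mask_at_random(data, fraction, rng):
    """Return a copy with `fraction` of entries set to NaN, plus the mask."""
    n_missing = int(round(fraction * data.size))
    flat_mask = np.zeros(data.size, dtype=bool)
    flat_mask[rng.choice(data.size, size=n_missing, replace=False)] = True
    mask = flat_mask.reshape(data.shape)
    masked = data.copy()
    masked[mask] = np.nan
    return masked, mask

def row_mean_impute(masked):
    """Placeholder imputer (assumption): fill gaps with the row (gene) mean."""
    missing = np.isnan(masked)
    row_sums = np.where(missing, 0.0, masked).sum(axis=1)
    row_counts = (~missing).sum(axis=1)
    # Rows with no observed values fall back to the global mean.
    global_mean = masked[~missing].mean()
    row_means = np.where(row_counts > 0,
                         row_sums / np.maximum(row_counts, 1),
                         global_mean)
    filled = masked.copy()
    rows, _ = np.nonzero(missing)
    filled[missing] = row_means[rows]
    return filled

def rmse(truth, imputed, mask):
    """Root-mean-square error over the deleted entries only."""
    return float(np.sqrt(np.mean((truth[mask] - imputed[mask]) ** 2)))

def average_error(data, fraction, n_sim, seed=0):
    """Average imputation error over `n_sim` random deletion patterns."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_sim):
        masked, mask = mask_at_random(data, fraction, rng)
        errors.append(rmse(data, row_mean_impute(masked), mask))
    return float(np.mean(errors))

# Synthetic stand-in for a genes-by-samples expression matrix.
demo_rng = np.random.default_rng(42)
expression = demo_rng.normal(loc=8.0, scale=2.0, size=(200, 10))
error_5 = average_error(expression, fraction=0.05, n_sim=20)
error_10 = average_error(expression, fraction=0.10, n_sim=20)
```

Swapping `row_mean_impute` for another imputer with the same signature allows methods to be compared under identical deletion patterns, provided the same seed is used.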


Author(s):  
Zhining Wen ◽  
Zhenqiang Su ◽  
Jie Liu ◽  
Baitang Ning ◽  
Lei Guo ◽  
...  

2009 ◽  
Vol 07 (01) ◽  
pp. 157-173 ◽  
Author(s):  
SHIHONG MAO ◽  
CHARLES WANG ◽  
GUOZHU DONG

Microarray technology has great potential for improving our understanding of biological processes, medical conditions, and diseases. Often, microarray datasets are collected using different microarray platforms (provided by different companies) under different conditions in different laboratories. The cross-platform and cross-laboratory concordance of microarray technology needs to be evaluated before it can be successfully and reliably applied in biological/clinical practice. New measures and techniques are proposed for comparing and evaluating the quality of microarray datasets generated from different platforms/laboratories. These measures and techniques are based on the following philosophy: the practical usefulness of microarray technology may be confirmed if discriminating genes and classifiers, which are the focus of most, if not all, comparative investigations, discovered/trained from data collected in one lab/platform combination can be transferred to another lab/platform combination. The rationale is that nondiscriminating genes might not be as strongly regulated as discriminating genes by the biological process of the tissue cells under study, and hence they may behave more randomly than the discriminating genes. Our experimental results on microarray datasets generated from different platforms/laboratories using the reference mRNA samples in the Microarray Quality Control (MAQC) project showed that DNA microarrays can produce highly repeatable data in a cross-platform, cross-lab manner when one focuses on the discriminating genes and classifiers. In our comparative study, we compare samples of one type against samples of another type; the methodology can be applied to situations where one compares one arbitrary class of data against another. Other findings include: (1) using three discriminating-gene/classifier-based methods to test the concordance between microarray datasets gave consistent results; (2) when noisy (nondiscriminating) genes were removed, the microarray datasets from different laboratories using a common platform were found to be highly concordant, and the data generated using most of the commercial platforms studied here were also found to be concordant with each other; (3) several series of artificial datasets with known degrees of difference were created to establish a bridge between consistency rate and P-value, allowing us to estimate the P-value if the consistency rate between two datasets is known.


2006 ◽  
Vol 24 (9) ◽  
pp. 1140-1150 ◽  
Author(s):  
Tucker A Patterson ◽  
Edward K Lobenhofer ◽  
Stephanie B Fulmer-Smentek ◽  
Patrick J Collins ◽  
Tzu-Ming Chu ◽  
...  
