Improve consensus partitioning via a hierarchical procedure

Mapping Intimacies ◽

10.1101/2021.09.03.458844 ◽

2021 ◽

Author(s):

Zuguang Gu ◽

Daniel Huebschmann

Keyword(s):

Dna Methylation ◽

Data Analysis ◽

High Throughput ◽

R Package ◽

Complete Analysis ◽

High Throughput Data ◽

Unsupervised Method ◽

Large Numbers ◽

New Strategy ◽

High Throughput Data Analysis

Consensus partitioning is an unsupervised method widely used in high-throughput data analysis to reveal subgroups and to assign stability to the classification. However, standard consensus partitioning procedures are weak at identifying large numbers of stable subgroups. There are two main issues: (1) subgroups with small differences are difficult to separate if they are detected simultaneously with subgroups with large differences, and (2) the stability of classification generally decreases as the number of subgroups increases. In this work, we proposed a new strategy to solve these two issues by applying consensus partitioning in a hierarchical procedure. We demonstrated that hierarchical consensus partitioning can efficiently reveal more subgroups. We also tested its performance on revealing a large number of subgroups with a DNA methylation dataset. Hierarchical consensus partitioning is implemented in the R package cola, with comprehensive functionality for analysis and visualization. It can also automate the analysis with as few as two lines of code, generating a detailed HTML report containing the complete analysis.
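
The hierarchical idea in the abstract above can be illustrated with a minimal Python sketch. It is not the cola implementation: the helper names (`two_means`, `consensus_split`, `hierarchical_split`), the 2-means clusterer, the 80% sample subsampling, the stability score (mean of |2c - 1| over consensus values c), and the thresholds are all assumptions chosen for illustration.

```python
import numpy as np

def two_means(points, rng, n_iter=20):
    """Tiny 2-means clustering; points has shape (samples, features)."""
    centers = points[rng.choice(len(points), size=2, replace=False)]
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        if labels.min() == labels.max():  # one cluster emptied; stop early
            break
        centers = np.stack([points[labels == j].mean(axis=0) for j in (0, 1)])
    return labels

def consensus_split(points, rng, n_runs=30, frac=0.8):
    """Split samples in two by consensus over subsampled 2-means runs."""
    n = len(points)
    together = np.zeros((n, n))  # runs in which a pair shared a cluster
    sampled = np.zeros((n, n))   # runs in which a pair was drawn together
    for _ in range(n_runs):
        idx = rng.choice(n, size=max(2, int(frac * n)), replace=False)
        labels = two_means(points[idx], rng)
        sampled[np.ix_(idx, idx)] += 1
        together[np.ix_(idx, idx)] += labels[:, None] == labels[None, :]
    # Pairs never drawn together get the uninformative value 0.5.
    consensus = np.divide(together, sampled,
                          out=np.full((n, n), 0.5), where=sampled > 0)
    off_diag = ~np.eye(n, dtype=bool)
    # Assumed stability score: 1.0 when every pair is always/never together.
    stability = np.abs(2 * consensus[off_diag] - 1).mean()
    # Simplification: samples that mostly co-cluster with sample 0 form a group.
    group = consensus[:, 0] >= 0.5
    return group, stability

def hierarchical_split(points, rng, min_stability=0.9, min_size=6, prefix="0"):
    """Recursively apply the 2-group consensus split while it stays stable."""
    n = len(points)
    labels = np.full(n, prefix, dtype=object)
    if n < 2 * min_size:
        return labels
    group, stability = consensus_split(points, rng)
    if stability < min_stability or min(group.sum(), (~group).sum()) < min_size:
        return labels  # unstable or lopsided split: keep this node as a leaf
    labels[group] = hierarchical_split(points[group], rng, min_stability,
                                       min_size, prefix + "1")
    labels[~group] = hierarchical_split(points[~group], rng, min_stability,
                                        min_size, prefix + "0")
    return labels
```

Each leaf label is the path of splits taken from the root. A pair of subgroups with a small difference can therefore be separated at a deeper level, after the split with the large difference has been made. This is the motivation stated in the abstract.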

Unraveling molecular mechanism underlying biomaterial and stem cells interaction during cell fate commitment using high throughput data analysis

10.22541/au.158888182.24689205 ◽

2020 ◽

Author(s):

Erfan Sharifi ◽

Niusha Khazaei ◽

Nicholas Kieran ◽

Sahel Jahangiri Esfahani ◽

Abdulshakour Mohammadnia ◽

...

Keyword(s):

Stem Cells ◽

Data Analysis ◽

Molecular Mechanism ◽

Cell Fate ◽

High Throughput ◽

High Throughput Data ◽

High Throughput Data Analysis

High-Throughput Data Analysis of Proteomic Mass Spectra on the SwissBioGrid

Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine, and Healthcare ◽

10.4018/978-1-60566-374-6.ch012 ◽

2011 ◽

pp. 228-244

Author(s):

Andreas Quandt ◽

Sergio Maffioletti ◽

Cesare Pautasso ◽

Heinz Stockinger ◽

Frederique Lisacek

Keyword(s):

Mass Spectrometry ◽

Data Analysis ◽

High Throughput ◽

Protein Function ◽

Mass Spectra ◽

High Throughput Data ◽

Analysis Process ◽

Speed Up ◽

Grid Infrastructures ◽

High Throughput Data Analysis

Proteomics is currently one of the most promising fields in bioinformatics as it provides important insights into the protein function of organisms. Mass spectrometry is one of the techniques to study the proteome, and several software tools exist for this purpose. The authors provide an extendable software platform called swissPIT that combines different existing tools and exploits Grid infrastructures to speed up the data analysis process for the proteomics pipeline.

Sparse partial least-squares regression and its applications to high-throughput data analysis

Chemometrics and Intelligent Laboratory Systems ◽

10.1016/j.chemolab.2011.07.002 ◽

2011 ◽

Vol 109 (1) ◽

pp. 1-8 ◽

Cited By ~ 51

Author(s):

Donghwan Lee ◽

Woojoo Lee ◽

Youngjo Lee ◽

Yudi Pawitan

Keyword(s):

Data Analysis ◽

Least Squares ◽

Partial Least Squares ◽

High Throughput ◽

Partial Least Squares Regression ◽

Least Squares Regression ◽

High Throughput Data ◽

High Throughput Data Analysis

Unraveling molecular mechanism underlying biomaterial and stem cells interaction during cell fate commitment using high throughput data analysis

Gene ◽

10.1016/j.gene.2021.146111 ◽

2021 ◽

pp. 146111

Author(s):

Erfan Sharifi ◽

Niusha Khazaei ◽

Nicholas W. Kieran ◽

Sahel Jahangiri Esfahani ◽

Abdulshakour Mohammadnia ◽

...

Keyword(s):

Stem Cells ◽

Data Analysis ◽

Molecular Mechanism ◽

Cell Fate ◽

High Throughput ◽

High Throughput Data ◽

High Throughput Data Analysis

Clinical Value for Diagnosis and Prognosis of Signal Sequence Receptor 1 (SSR1) and Its Potential Mechanism in Hepatocellular Carcinoma: A Comprehensive Study Based on High-Throughput Data Analysis

International Journal of General Medicine ◽

10.2147/ijgm.s336725 ◽

2021 ◽

Vol 14 ◽

pp. 7435-7451

Author(s):

Liang Chen ◽

Yunhua Lin ◽

Guoqing Liu ◽

Rubin Xu ◽

Yiming Hu ◽

...

Keyword(s):

Hepatocellular Carcinoma ◽

Data Analysis ◽

High Throughput ◽

Signal Sequence ◽

Potential Mechanism ◽

Clinical Value ◽

High Throughput Data ◽

Diagnosis And Prognosis ◽

High Throughput Data Analysis ◽

Comprehensive Study

Graphical Models and Inference on Graphs in Genomics: Challenges of high-throughput data analysis

IEEE Signal Processing Magazine ◽

10.1109/msp.2011.943012 ◽

2012 ◽

Vol 29 (1) ◽

pp. 51-65 ◽

Cited By ~ 5

Author(s):

Manohar Shamaiah ◽

Sang Lee ◽

Haris Vikalo

Keyword(s):

Data Analysis ◽

Graphical Models ◽

High Throughput ◽

High Throughput Data ◽

High Throughput Data Analysis

Power and Stability Properties of Resampling-Based Multiple Testing Procedures with Applications to Gene Oncology Studies

Computational and Mathematical Methods in Medicine ◽

10.1155/2013/610297 ◽

2013 ◽

Vol 2013 ◽

pp. 1-11 ◽

Cited By ~ 6

Author(s):

Dongmei Li ◽

Timothy D. Dye

Keyword(s):

Data Analysis ◽

Sample Size ◽

High Throughput ◽

Rate Control ◽

Multiple Testing ◽

Testing Procedures ◽

High Throughput Data ◽

False Discovery ◽

Multiple Testing Procedures ◽

High Throughput Data Analysis

Resampling-based multiple testing procedures are widely used in genomic studies to identify differentially expressed genes and to conduct genome-wide association studies. However, the power and stability properties of these popular resampling-based multiple testing procedures have not been extensively evaluated. Our study focuses on investigating the power and stability of seven resampling-based multiple testing procedures frequently used in high-throughput data analysis for small sample size data through simulations and gene oncology examples. The bootstrap single-step minP procedure and the bootstrap step-down minP procedure perform the best among all tested procedures when sample size is as small as 3 in each group and either familywise error rate or false discovery rate control is desired. When sample size increases to 12 and false discovery rate control is desired, the permutation maxT procedure and the permutation minP procedure perform best. Our results provide guidance for high-throughput data analysis when sample size is small.
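
As an illustration of one procedure family named in the abstract above, here is a minimal Python sketch of a single-step permutation maxT adjustment for a two-group comparison. It is a sketch under stated assumptions, not the authors' code: the function name `maxt_adjusted_pvalues`, the Welch-type t-statistic, and the default number of permutations are all hypothetical choices.

```python
import numpy as np

def maxt_adjusted_pvalues(x, y, n_perm=2000, seed=0):
    """Single-step permutation maxT adjusted p-values.

    x, y: arrays of shape (features, samples), one per group.
    Returns one adjusted p-value per feature.
    """
    rng = np.random.default_rng(seed)
    data = np.hstack([x, y])
    n_x, n = x.shape[1], data.shape[1]

    def abs_t(idx_x, idx_y):
        # Absolute Welch-type t-statistic per feature (assumed test statistic).
        a, b = data[:, idx_x], data[:, idx_y]
        se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                     + b.var(axis=1, ddof=1) / b.shape[1])
        return np.abs(a.mean(axis=1) - b.mean(axis=1)) / se

    observed = abs_t(np.arange(n_x), np.arange(n_x, n))
    # Start at 1 so the observed labelling counts as one permutation.
    exceed = np.ones_like(observed)
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Compare every observed statistic with the permutation's maximum.
        max_stat = abs_t(perm[:n_x], perm[n_x:]).max()
        exceed += max_stat >= observed
    return exceed / (n_perm + 1)
```

Comparing each observed statistic with the maximum over all features in every permutation is what gives the single-step maxT adjustment its familywise error rate control. The step-down variants mentioned in the abstract refine this and are not shown here.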

Clipper: p-value-free FDR control on high-throughput data from two conditions

Genome Biology ◽

10.1186/s13059-021-02506-9 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Xinzhou Ge ◽

Yiling Elaine Chen ◽

Dongyuan Song ◽

MeiLu McDermott ◽

Kyla Woyshner ◽

...

Keyword(s):

Data Analysis ◽

High Throughput ◽

Biological Data ◽

P Value ◽

High Throughput Data ◽

Statistical Framework ◽

Large Numbers ◽

Biological Data Analysis ◽

General Statistical ◽

Genomic Regions

High-throughput biological data analysis commonly involves identifying features such as genes, genomic regions, and proteins, whose values differ between two conditions, from numerous features measured simultaneously. The most widely used criterion to ensure the analysis reliability is the false discovery rate (FDR), which is primarily controlled based on p-values. However, obtaining valid p-values relies on either reasonable assumptions of data distribution or large numbers of replicates under both conditions. Clipper is a general statistical framework for FDR control without relying on p-values or specific data distributions. Clipper outperforms existing methods for a broad range of applications in high-throughput data analysis.
