Clipper: p-value-free FDR control on high-throughput data from two conditions

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xinzhou Ge ◽  
Yiling Elaine Chen ◽  
Dongyuan Song ◽  
MeiLu McDermott ◽  
Kyla Woyshner ◽  
...  

Abstract: High-throughput biological data analysis commonly involves identifying features, such as genes, genomic regions, and proteins, whose values differ between two conditions, from numerous features measured simultaneously. The most widely used criterion for ensuring the reliability of the analysis is the false discovery rate (FDR), which is primarily controlled based on p-values. However, obtaining valid p-values relies on either reasonable assumptions about the data distribution or large numbers of replicates under both conditions. Clipper is a general statistical framework for FDR control that does not rely on p-values or specific data distributions. Clipper outperforms existing methods for a broad range of applications in high-throughput data analysis.

2021 ◽  
Author(s):  
Zuguang Gu ◽  
Daniel Huebschmann

Consensus partitioning is an unsupervised method widely used in high-throughput data analysis for revealing subgroups and assigning a stability to the classification. However, standard consensus partitioning procedures are poor at identifying large numbers of stable subgroups, for two main reasons: (1) subgroups with small differences are difficult to separate when they are detected simultaneously with subgroups with large differences, and (2) the stability of the classification generally decreases as the number of subgroups increases. In this work, we proposed a new strategy that addresses both issues by applying consensus partitioning in a hierarchical procedure. We demonstrated that hierarchical consensus partitioning can efficiently reveal more subgroups. We also tested its performance in revealing a large number of subgroups on a DNA methylation dataset. Hierarchical consensus partitioning is implemented in the R package cola, which provides comprehensive functionality for analysis and visualization. The analysis can also be automated with as few as two lines of code, which generate a detailed HTML report containing the complete analysis.


2020 ◽  
Author(s):  
Xinzhou Ge ◽  
Yiling Elaine Chen ◽  
Dongyuan Song ◽  
MeiLu McDermott ◽  
Kyla Woyshner ◽  
...  

Abstract: High-throughput biological data analysis commonly involves identifying “interesting” features (e.g., genes, genomic regions, and proteins), whose values differ between two conditions, from numerous features measured simultaneously. The most widely used criterion for ensuring the reliability of the analysis is the false discovery rate (FDR), the expected proportion of uninteresting features among the identified ones. Existing bioinformatics tools primarily control the FDR based on p-values. However, obtaining valid p-values relies on either reasonable assumptions about the data distribution or large numbers of replicates under both conditions, two requirements that are often unmet in biological studies. To address this issue, we propose Clipper, a general statistical framework for FDR control that does not rely on p-values or specific data distributions. Clipper is applicable to identifying both enriched and differential features from high-throughput biological data of diverse types. In comprehensive simulation and real-data benchmarking, Clipper outperforms existing generic FDR control methods and specific bioinformatics tools designed for various tasks, including peak calling from ChIP-seq data, differentially expressed gene identification from RNA-seq data, differentially interacting chromatin region identification from Hi-C data, and peptide identification from mass spectrometry data. Notably, our benchmarking results for peptide identification are based on the first mass spectrometry data standard with a realistic dynamic range. Our results demonstrate Clipper’s flexibility and reliability for FDR control, as well as its broad applications in high-throughput data analysis.
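
For context on the p-value-based FDR control that the abstract contrasts Clipper with, the following is a minimal sketch of one common procedure, the Benjamini-Hochberg step-up rule. The abstract does not name this procedure, and the sketch is not Clipper's method. The function name `benjamini_hochberg` and the example p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of features declared discoveries.

    Illustrative sketch of the Benjamini-Hochberg step-up rule (an
    assumption for context; not the Clipper procedure): find the
    largest rank k such that p_(k) <= alpha * k / m, then reject the
    k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    # Indices of the features, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    return sorted(order[:cutoff_rank])


# Hypothetical p-values for ten features (made up for illustration).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
discoveries = benjamini_hochberg(p, alpha=0.05)
print(discoveries)
```

The rule only makes sense when the input p-values are valid, which is the requirement the abstract describes as often unmet when replicates are few or distributional assumptions are doubtful.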


Author(s):  
Mária Ždímalová ◽  
Tomáš Bohumel ◽  
Katarína Plachá-Gregorovská ◽  
Peter Weismann ◽  
Hisham El Falougy

2020 ◽  
Author(s):  
Erfan Sharifi ◽  
Niusha Khazaei ◽  
Nicholas Kieran ◽  
Sahel Jahangiri Esfahani ◽  
Abdulshakour Mohammadnia ◽  
...  

Author(s):  
Calin Ciufudean

Modern medical devices involve information technology (IT) based on electronic structures for sensing and gathering, transmitting, and processing data and signals, in order to help medical staff diagnose and cure patients and monitor their evolution. Focusing on biological signal processing, we note that the numerical processing of information delivered by sensors is of significant importance for a fair and optimum design and manufacture of modern medical devices. We adopt fuzzy sets as a formalism for analyzing biological signal processing, and we propose to accomplish this goal by developing fuzzy operators for filtering noise in biological signal measurements. We exemplify this approach on neurological measurements performed with an electroencephalograph (EEG).


Author(s):  
Andreas Quandt ◽  
Sergio Maffioletti ◽  
Cesare Pautasso ◽  
Heinz Stockinger ◽  
Frederique Lisacek

Proteomics is currently one of the most promising fields in bioinformatics, as it provides important insights into the protein function of organisms. Mass spectrometry is one of the techniques used to study the proteome, and several software tools exist for this purpose. The authors provide an extendable software platform called swissPIT that combines different existing tools and exploits Grid infrastructures to speed up the data analysis process for the proteomics pipeline.

