LipidFinder 2.0: advanced informatics pipeline for lipidomics discovery applications

Bioinformatics ◽

10.1093/bioinformatics/btaa856 ◽

2020 ◽

Author(s):

Jorge Alvarez-Jarreta ◽

Patricia R S Rodrigues ◽

Eoin Fahy ◽

Anne O’Connor ◽

Anna Price ◽

...

Keyword(s):

Real Data ◽

Supplementary Information ◽

Supplementary Data ◽

Scatter Plot ◽

Lipid Profiling ◽

False Discovery ◽

False Discovery Rate Method ◽

Rate Method ◽

Assess Data Quality ◽

Lipid Structures

Abstract Summary We present LipidFinder 2.0, incorporating four new modules that apply artefact filters, remove lipid and contaminant stacks, in-source fragments and salt clusters, and a new isotope deletion method which is significantly more sensitive than available open-access alternatives. We also incorporate a novel false discovery rate method, utilizing a target–decoy strategy, which allows users to assess data quality. A renewed lipid profiling method is introduced which searches three different databases from LIPID MAPS and returns bulk lipid structures only, together with a lipid category scatter plot with a color-blind-friendly palette. An API interface with XCMS Online is made available on LipidFinder’s online version. We show using real data that LipidFinder 2.0 provides a significant improvement in non-lipid metabolite filtering and lipid profiling, compared to available tools. Availability and implementation LipidFinder 2.0 is freely available at https://github.com/ODonnell-Lipidomics/LipidFinder and http://lipidmaps.org/resources/tools/lipidfinder. Contact [email protected] or [email protected] Supplementary information Supplementary data are available at Bioinformatics online.
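The target–decoy idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how LipidFinder builds its decoys or scores matches, so the function name, the scores, and the threshold below are all hypothetical; the sketch only shows the generic estimate, in which the number of decoy matches passing a threshold stands in for the number of false target matches.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Generic target-decoy FDR estimate at a score threshold.

    FDR is approximated as (# decoy matches >= threshold) divided by
    (# target matches >= threshold). This is a textbook form, not
    LipidFinder's documented implementation.
    """
    targets = sum(1 for s in target_scores if s >= threshold)
    decoys = sum(1 for s in decoy_scores if s >= threshold)
    if targets == 0:
        return 0.0  # no accepted target matches, so nothing to be false
    return min(1.0, decoys / targets)


# Hypothetical match scores, for illustration only (not from the paper).
target_scores = [0.95, 0.90, 0.85, 0.80, 0.60, 0.40]
decoy_scores = [0.82, 0.50, 0.30, 0.20, 0.10, 0.05]

fdr = target_decoy_fdr(target_scores, decoy_scores, threshold=0.8)
print(f"Estimated FDR at threshold 0.8: {fdr:.2f}")
```

With these made-up scores, four target matches and one decoy match pass the threshold, so the estimate is 1/4. Raising the threshold trades fewer accepted matches for a lower estimated FDR.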

LipidFinder 2.0: advanced informatics pipeline for lipidomics discovery applications

10.1101/2020.08.16.250878 ◽

2020 ◽

Author(s):

Jorge Alvarez-Jarreta ◽

Patricia R.S. Rodrigues ◽

Eoin Fahy ◽

Anne O’Connor ◽

Anna Price ◽

...

Keyword(s):

Open Access ◽

Real Data ◽

Supplementary Information ◽

Supplementary Data ◽

Scatter Plot ◽

Lipid Profiling ◽

Link Type ◽

False Discovery ◽

Assess Data Quality ◽

Lipid Structures

Abstract We present LipidFinder 2.0, incorporating four new modules that apply artefact filters, remove lipid and contaminant stacks, in-source fragments and salt clusters, and a new isotope deletion method which is significantly more sensitive than available open-access alternatives. We also incorporate a novel false discovery rate (FDR) method, utilizing a target-decoy strategy, which allows users to assess data quality. A renewed lipid profiling method is introduced which searches three different databases from LIPID MAPS and returns bulk lipid structures only, together with a lipid category scatter plot with a color-blind-friendly palette. An API interface with XCMS Online is made available on LipidFinder’s online version. We show using real data that LipidFinder 2.0 provides a significant improvement in non-lipid metabolite filtering and lipid profiling, compared to available tools. Availability LipidFinder 2.0 is freely available at https://github.com/ODonnell-Lipidomics/LipidFinder and http://lipidmaps.org/resources/tools/[email protected] Supplementary information Supplementary data are available at Bioinformatics online.

Identifying signals of potentially harmful medications in pregnancy: use of the double false discovery rate method to adjust for multiple testing

British Journal of Clinical Pharmacology ◽

10.1111/bcp.13799 ◽

2018 ◽

Vol 85 (2) ◽

pp. 356-365 ◽

Cited By ~ 1

Author(s):

Alana Cavadino ◽

David Prieto‐Merino ◽

Joan K. Morris

Keyword(s):

False Discovery Rate ◽

Multiple Testing ◽

False Discovery ◽

False Discovery Rate Method ◽

Rate Method ◽

In Pregnancy

miqoGraph: fitting admixture graphs using mixed-integer quadratic optimization

Bioinformatics ◽

10.1093/bioinformatics/btaa988 ◽

2020 ◽

Author(s):

Julia Yan ◽

Nick Patterson ◽

Vagheesh M Narasimhan

Keyword(s):

Genetic Relationship ◽

Real Data ◽

Quadratic Optimization ◽

Supplementary Information ◽

Mixed Integer ◽

Supplementary Data ◽

Integer Optimization ◽

Speed Up

Abstract Summary Admixture graphs represent the genetic relationship between a set of populations through splits, drift and admixture. In this article, we present the Julia package miqoGraph, which uses mixed-integer quadratic optimization to fit topology, drift lengths and admixture proportions simultaneously. Through applications of miqoGraph to both simulated and real data, we show that integer optimization can greatly speed up and automate what is usually an arduous manual process. Availability and implementation https://github.com/juliayyan/PhylogeneticTrees.jl. Supplementary information Supplementary data are available at Bioinformatics online.

ipDMR: identification of differentially methylated regions with interval P-values

Bioinformatics ◽

10.1093/bioinformatics/btaa732 ◽

2020 ◽

Author(s):

Zongli Xu ◽

Changchun Xie ◽

Jack A Taylor ◽

Liang Niu

Keyword(s):

Software Tool ◽

Real Data ◽

Supplementary Information ◽

Sequencing Data ◽

Differentially Methylated Regions ◽

R Software ◽

False Discovery Rates ◽

P Values ◽

False Discovery ◽

Bisulfite Sequencing Data

Abstract Summary ipDMR is an R software tool for identification of differentially methylated regions (DMRs) using auto-correlated P-values for individual CpGs from epigenome-wide association analysis using array or bisulfite sequencing data. It summarizes P-values for adjacent CpGs, identifies association peaks and then extends peaks to find boundaries of DMRs. ipDMR uses BED format files as input and is easy to use. Simulations guided by real data found that ipDMR outperformed currently available methods, providing slightly higher true positive rates and much lower false discovery rates. Availability and implementation ipDMR is available at https://bioconductor.org/packages/release/bioc/html/ENmix.html. Supplementary information Supplementary data are available at Bioinformatics online.

Analysis of variance when both input and output sets are high-dimensional

10.1101/2020.02.15.950949 ◽

2020 ◽

Author(s):

Gustavo de los Campos ◽

Torsten Pook ◽

Agustin Gonzalez-Raymundez ◽

Henner Simianer ◽

George Mias ◽

...

Keyword(s):

Gene Expression ◽

Linear Span ◽

Copy Number Variants ◽

Real Data ◽

Supplementary Information ◽

High Dimensional ◽

Supplementary Data ◽

Random Effects Models ◽

Input And Output ◽

Data Layers

Abstract Motivation Modern genomic data sets often involve multiple data-layers (e.g., DNA-sequence, gene expression), each of which can itself be high-dimensional. The biological processes underlying these data-layers can lead to intricate multivariate association patterns. Results We propose and evaluate two methods for analysis of variance when both input and output sets are high-dimensional. Our approach uses random effects models to estimate the proportion of variance of vectors in the linear span of the output set that can be explained by regression on the input set. We consider a method based on an orthogonal basis (Eigen-ANOVA) and one that uses random vectors (Monte Carlo ANOVA, MC-ANOVA) in the linear span of the output set. We used simulations to assess the bias and variance of each method and to compare them with those of Partial Least Squares (PLS), an approach commonly used in multivariate high-dimensional regressions. The MC-ANOVA method gave nearly unbiased estimates in all the simulation scenarios considered. Estimates produced by Eigen-ANOVA and PLS had noticeable biases. Finally, we demonstrate the insight that can be obtained with MC-ANOVA and Eigen-ANOVA by applying these two methods to the study of multi-locus linkage disequilibrium in chicken genomes and to the assessment of inter-dependencies between gene expression, methylation and copy-number variants in data from breast cancer tumors. Availability The Supplementary data includes an R implementation of each of the proposed methods as well as the scripts used in simulations and in the real-data [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

HiChIP-Peaks: a HiChIP peak calling algorithm

Bioinformatics ◽

10.1093/bioinformatics/btaa202 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3625-3631

Author(s):

Chenfu Shi ◽

Magnus Rattray ◽

Gisela Orozco

Keyword(s):

Strong Dependence ◽

Recall Rate ◽

Quantitative Comparison ◽

Sequencing Depth ◽

Supplementary Information ◽

Supplementary Data ◽

Peak Calling ◽

False Discovery ◽

Calling Algorithm ◽

Very High

Abstract Motivation HiChIP is a powerful tool to interrogate 3D chromatin organization. Current tools to analyse chromatin looping mechanisms using HiChIP data require the identification of loop anchors to work properly. However, current approaches to discover these anchors from HiChIP data are not satisfactory, having either a very high false discovery rate or strong dependence on sequencing depth. Moreover, these tools do not allow quantitative comparison of peaks across different samples, failing to fully exploit the information available from HiChIP datasets. Results We develop a new tool based on a representation of HiChIP data centred on the re-ligation sites to identify peaks from HiChIP datasets, which can subsequently be used in other tools for loop discovery. This increases the reliability of these tools and improves recall rate as sequencing depth is reduced. We also provide a method to count reads mapping to peaks across samples, which can be used for differential peak analysis using HiChIP data. Availability and implementation HiChIP-Peaks is freely available at https://github.com/ChenfuShi/HiChIP_peaks. Supplementary information Supplementary data are available at Bioinformatics online.

Triplet-based similarity score for fully multilabeled trees with poly-occurring labels

Bioinformatics ◽

10.1093/bioinformatics/btaa676 ◽

2020 ◽

Author(s):

Simone Ciccolella ◽

Giulia Bernardini ◽

Luca Denti ◽

Paola Bonizzoni ◽

Marco Previtali ◽

...

Keyword(s):

Open Source ◽

Evolutionary History ◽

Similarity Measures ◽

Real Data ◽

Similarity Score ◽

Supplementary Information ◽

Supplementary Data ◽

Wide Range ◽

Golden Standard ◽

History Of

Abstract Motivation The latest advances in cancer sequencing, and the availability of a wide range of methods to infer the evolutionary history of tumors, have made it important to evaluate, reconcile and cluster different tumor phylogenies. Recently, several notions of distance or similarity have been proposed in the literature, but none of them has emerged as the gold standard. Moreover, none of the known similarity measures is able to manage mutations occurring multiple times in the tree, a circumstance often occurring in real cases. Results To overcome these limitations, in this article, we propose MP3, the first similarity measure for tumor phylogenies able to effectively manage cases where multiple mutations can occur at the same time and mutations can occur multiple times. Moreover, a comparison of MP3 with other measures shows that it is able to correctly classify similar and dissimilar trees, on both simulated and real data. Availability and implementation An open source implementation of MP3 is publicly available at https://github.com/AlgoLab/mp3treesim. Supplementary information Supplementary data are available at Bioinformatics online.

Discovery of tandem and interspersed segmental duplications using high-throughput sequencing

Bioinformatics ◽

10.1093/bioinformatics/btz237 ◽

2019 ◽

Vol 35 (20) ◽

pp. 3923-3930 ◽

Cited By ~ 9

Author(s):

Arda Soylev ◽

Thong Minh Le ◽

Hajar Amini ◽

Can Alkan ◽

Fereydoun Hormozdiari

Keyword(s):

False Discovery Rate ◽

High Throughput ◽

High Throughput Sequencing ◽

Real Data ◽

Read Depth ◽

Supplementary Information ◽

Segmental Duplications ◽

Structural Variations ◽

Multiple Sequence ◽

False Discovery

Abstract Motivation Several algorithms have been developed that use high-throughput sequencing technology to characterize structural variations (SVs). Most of the existing approaches focus on detecting relatively simple types of SVs such as insertions, deletions and short inversions. In fact, complex SVs are of crucial importance and several have been associated with genomic disorders. To better understand the contribution of complex SVs to human disease, we need new algorithms to accurately discover and genotype such variants. Additionally, due to similar sequencing signatures, inverted duplications or gene conversion events that include inverted segmental duplications are often characterized as simple inversions; likewise, duplications and gene conversions in direct orientation may be called as simple deletions. Therefore, there is still a need for accurate algorithms to fully characterize complex SVs and thus improve calling accuracy of simpler variants. Results We developed novel algorithms to accurately characterize tandem, direct and inverted interspersed segmental duplications using short read whole genome sequencing datasets. We integrated these methods into our TARDIS tool, which is now capable of detecting various types of SVs using multiple sequence signatures such as read pair, read depth and split read. We evaluated the prediction performance of our algorithms through several experiments using both simulated and real datasets. In the simulation experiments, at 30× coverage, TARDIS achieved 96% sensitivity with only a 4% false discovery rate. For experiments that involve real data, we used two haploid genomes (CHM1 and CHM13) and one human genome (NA12878) from the Illumina Platinum Genomes set. Comparison of our results with orthogonal PacBio call sets from the same genomes revealed higher accuracy for TARDIS than state-of-the-art methods. Furthermore, we showed a surprisingly low false discovery rate for our approach in predicting tandem, direct and inverted interspersed segmental duplications on CHM1 (<5% for the top 50 predictions). Availability and implementation TARDIS source code is available at https://github.com/BilkentCompGen/tardis, and a corresponding Docker image is available at https://hub.docker.com/r/alkanlab/tardis/. Supplementary information Supplementary data are available at Bioinformatics online.
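For readers less familiar with the two metrics reported above, the following sketch shows how sensitivity and false discovery rate are computed from call counts. The counts are hypothetical, chosen only so that the results match the reported 96% and 4%; they are not the paper's data, and the function names are illustrative.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real events that were detected: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def false_discovery_rate(true_pos, false_pos):
    """Fraction of reported calls that are wrong: FP / (TP + FP)."""
    return false_pos / (true_pos + false_pos)


# Hypothetical counts (assumption): 100 simulated events, 96 of them
# detected, plus 4 calls that match no simulated event.
tp, fn, fp = 96, 4, 4

sens = sensitivity(tp, fn)
fdr = false_discovery_rate(tp, fp)
print(f"sensitivity = {sens:.0%}, FDR = {fdr:.0%}")
```

The two metrics have different denominators: sensitivity is measured against the true events, while FDR is measured against the calls that were made, so one can be high while the other is low.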

The performance of a new local false discovery rate method on tests of association between coronary artery disease (CAD) and genome-wide genetic variants

PLoS ONE ◽

10.1371/journal.pone.0185174 ◽

2017 ◽

Vol 12 (9) ◽

pp. e0185174 ◽

Cited By ~ 2

Author(s):

Shuyan Mei ◽

Ali Karimnezhad ◽

Marie Forest ◽

David R. Bickel ◽

Celia M. T. Greenwood

Keyword(s):

Coronary Artery Disease ◽

Coronary Artery ◽

Genetic Variants ◽

Local False Discovery Rate ◽

False Discovery ◽

False Discovery Rate Method ◽

Genome Wide ◽

Tests Of Association ◽

Rate Method ◽

Artery Disease

A new statistic for efficient detection of repetitive sequences

Bioinformatics ◽

10.1093/bioinformatics/btz262 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4596-4606 ◽

Cited By ~ 1

Author(s):

Sijie Chen ◽

Yixin Chen ◽

Fengzhu Sun ◽

Michael S Waterman ◽

Xuegong Zhang

Keyword(s):

Linear Time ◽

Repetitive Sequences ◽

Real Data ◽

Space Complexity ◽

Supplementary Information ◽

Supplementary Data ◽

Efficient Detection ◽

Time And Space Complexity ◽

Multiple Scenarios ◽

Repeat Detection

Abstract Motivation Detecting sequences containing repetitive regions is a basic bioinformatics task with many applications. Several methods have been developed for various types of repeat detection tasks. An efficient generic method for detecting most types of repetitive sequences is still desirable. Inspired by the excellent properties and successful applications of the D2 family of statistics in comparative analyses of genomic sequences, we developed a new statistic, D2R, that can efficiently discriminate between sequences with and without repetitive regions. Results Using the statistic, we developed an algorithm of linear time and space complexity for detecting most types of repetitive sequences in multiple scenarios, including finding candidate clustered regularly interspaced short palindromic repeats regions from bacterial genomic or metagenomic sequences. Simulation and real data experiments show that the method works well on both assembled sequences and unassembled short reads. Availability and implementation The code is available at https://github.com/XuegongLab/D2R_codes under the GPL 3.0 license. Supplementary information Supplementary data are available at Bioinformatics online.
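As background for the D2 family mentioned in the abstract, here is a minimal sketch of the classic D2 statistic, which sums the products of k-mer counts shared by two sequences. The abstract does not define D2R, so this is not the authors' statistic; the function names and example sequences are illustrative assumptions.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (empty if len(seq) < k)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Classic D2: sum over k-mers w of count_a(w) * count_b(w).

    Background illustration only; D2R from the paper is a different
    statistic whose definition is not given in the abstract.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # A Counter returns 0 for absent k-mers, so unshared words add nothing.
    return sum(n * counts_b[w] for w, n in counts_a.items())


# Illustrative sequences (assumption): the first repeats the second.
score = d2("ACGTACGT", "ACGT", k=2)
print(score)
```

Counting k-mers takes one pass over each sequence, which is consistent with the linear time and space behaviour that statistics of this family are valued for.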
