Boost-HiC: computational enhancement of long-range contacts in chromosomal contact maps

L Carron; J B Morlot; V Matthys; A Lesne; J Mozziconacci

doi:10.1093/bioinformatics/bty1059

Bioinformatics ◽

10.1093/bioinformatics/bty1059 ◽

2019 ◽

Vol 35 (16) ◽

pp. 2724-2729 ◽

Cited By ~ 3

Author(s):

L Carron ◽

J B Morlot ◽

V Matthys ◽

A Lesne ◽

J Mozziconacci

Keyword(s):

Long Range ◽

Short Range ◽

Supplementary Information ◽

Supplementary Data ◽

Missing Information ◽

High Confidence ◽

Contact Maps ◽

Genome Wide ◽

Algorithmic Procedure

Abstract. Motivation: Genome-wide chromosomal contact maps are widely used to uncover the 3D organization of genomes. They rely on collecting millions of contacting pairs of genomic loci. Contacts at short range are usually well measured in experiments, whereas much of the information about long-range contacts is missing. Results: We propose to use the sparse information contained in raw contact maps to infer high-confidence contact counts between all pairs of loci. Our algorithmic procedure, Boost-HiC, enables the detection of Hi-C patterns such as chromosomal compartments at a resolution that would otherwise be attainable only by sequencing the experimental Hi-C library a hundred times deeper. Boost-HiC can also be used to compare contact maps at an improved resolution. Availability and implementation: Boost-HiC is available at https://github.com/LeopoldC/Boost-HiC. Supplementary information: Supplementary data are available at Bioinformatics online.

Boost-HiC : Computational enhancement of long-range contacts in chromosomal contact maps

10.1101/471607 ◽

2018 ◽

Author(s):

L. Carron ◽

J.B. Morlot ◽

V. Matthys ◽

A. Lesne ◽

J. Mozziconacci

Keyword(s):

Long Range ◽

Short Range ◽

Missing Information ◽

High Confidence ◽

Contact Maps ◽

Link Type ◽

Genome Wide ◽

Contact Frequency ◽

Algorithmic Procedure

Abstract. Genome-wide chromosomal contact maps are widely used to uncover the 3D organisation of genomes. They rely on the collection of millions of contacting pairs of genomic loci. Contact frequencies at short range are usually well measured in experiments, whereas much of the information about long-range contacts is missing. We propose to use the sparse information contained in raw contact maps to determine high-confidence contact frequencies between all pairs of loci. Our algorithmic procedure, Boost-HiC, enables the detection of Hi-C patterns such as chromosomal compartments at a resolution that would otherwise be attainable only by sequencing the experimental Hi-C library a hundred times deeper. Boost-HiC can also be used to compare contact maps at an improved resolution. Boost-HiC is available at https://github.com/LeopoldC/Boost-HiC

Serpentine: a flexible 2D binning method for differential Hi-C analysis

Bioinformatics ◽

10.1093/bioinformatics/btaa249 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3645-3651

Author(s):

Lyam Baudry ◽

Gaël A Millot ◽

Agnes Thierry ◽

Romain Koszul ◽

Vittore F Scolari

Keyword(s):

Deep Sequencing ◽

Low Noise ◽

Supplementary Information ◽

Supplementary Data ◽

Fractal Nature ◽

Contact Map ◽

Signal To Noise ◽

High Quality ◽

Contact Maps ◽

Contact Frequency

Abstract. Motivation: Hi-C contact maps reflect the relative contact frequencies between pairs of genomic loci, quantified through deep sequencing. Differential analyses of these maps enable downstream biological interpretations. However, the multi-fractal nature of the chromatin polymer inside the cellular envelope results in contact frequency values spanning several orders of magnitude: contacts between loci pairs separated by large genomic distances are much sparser than closer pairs. The same is true for poorly covered regions, such as repeated sequences. Both distant and poorly covered regions translate into low signal-to-noise ratios. There is no clear consensus on how to address this limitation. Results: We present Serpentine, a fast, flexible procedure operating on raw data, which considers the contacts in each region of a contact map. Binning is performed only when necessary on noisy regions, preserving informative ones. This results in high-quality, low-noise contact maps that can be conveniently visualized for rigorous comparative analyses. Availability and implementation: Serpentine is available on the PyPI repository and https://github.com/koszullab/serpentine; documentation and tutorials are provided at https://serpentine.readthedocs.io/en/latest/. Supplementary information: Supplementary data are available at Bioinformatics online.
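The sparsity of distant contacts described in this abstract can be illustrated with a short Python sketch. It is not part of Serpentine: the function name `mean_contacts_by_distance` and the toy matrix are assumptions made purely for illustration. The sketch averages a square contact matrix along each diagonal, that is, over all pairs of bins separated by the same genomic distance.

```python
def mean_contacts_by_distance(matrix):
    """Return the mean contact count for each bin separation d = 0..n-1.

    `matrix` is a square contact map given as a list of lists
    (hypothetical input format chosen for this illustration).
    """
    n = len(matrix)
    means = []
    for d in range(n):
        # All pairs (i, i + d) lie on the d-th upper diagonal.
        values = [matrix[i][i + d] for i in range(n - d)]
        means.append(sum(values) / len(values))
    return means


# Toy symmetric map (made-up numbers): counts fall off away from the diagonal.
toy = [
    [10, 4, 1, 0],
    [4, 10, 4, 1],
    [1, 4, 10, 4],
    [0, 1, 4, 10],
]
means = mean_contacts_by_distance(toy)
print(means)
```

On real data the same profile typically spans several orders of magnitude, which is why entries far from the diagonal carry so little signal per bin.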

HiLight-PTM: an online application to aid matching peptide pairs with isotopically labelled PTMs

Bioinformatics ◽

10.1093/bioinformatics/btz654 ◽

2019 ◽

Author(s):

Harry J Whitwell ◽

Peter DiMaggio

Keyword(s):

De Novo ◽

De Novo Sequencing ◽

Mass Shift ◽

Supplementary Information ◽

Database Searching ◽

Supplementary Data ◽

Exact Match ◽

High Confidence ◽

Online Application ◽

Internet Browser

Abstract. Motivation: Database searching of isotopically labelled PTMs can be problematic, and we frequently find that only one, or neither, of the peptides in a heavy/light pair is assigned. In such cases, having a pair of MS/MS spectra that differ due to an isotopic label can assist in identifying the relevant m/z values that support the correct peptide annotation or can be used for de novo sequencing. Results: We have developed an online application that identifies matching peaks and peaks differing by the appropriate mass shift (difference between heavy and light PTM) between two MS/MS spectra. Furthermore, the application predicts, from the exact-match peaks, the mass of their complementary ions and highlights these as high-confidence matches between the two spectra. The result is a tool to visually compare two spectra, and downloadable peak lists that can be used to support de novo sequencing. Availability and implementation: HiLight-PTM is released using shinyapps.io by RStudio, and can be accessed from any internet browser at https://harrywhitwell.shinyapps.io/hilight-ptm/. Supplementary information: Supplementary data are available at Bioinformatics online.
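The peak comparison described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the HiLight-PTM implementation (which is a Shiny application): the function name `match_peaks`, the absolute tolerance `tol`, the assumption of singly charged ions (so the mass shift equals the m/z shift), and the peak values are all hypothetical. Complementary-ion prediction is not covered.

```python
from bisect import bisect_left, bisect_right


def match_peaks(light, heavy, mass_shift, tol=0.01):
    """Split the light-spectrum peaks into exact and shifted matches.

    light, heavy: lists of m/z values (intensities are ignored here).
    mass_shift:   expected heavy-minus-light difference, assuming charge 1+.
    tol:          absolute m/z tolerance (assumed value).
    """
    heavy_sorted = sorted(heavy)

    def has_peak_near(target):
        # Is there any heavy peak within [target - tol, target + tol]?
        lo = bisect_left(heavy_sorted, target - tol)
        hi = bisect_right(heavy_sorted, target + tol)
        return hi > lo

    exact = [mz for mz in light if has_peak_near(mz)]
    shifted = [mz for mz in light if has_peak_near(mz + mass_shift)]
    return exact, shifted


# Made-up peak lists: two shared fragments and one fragment carrying the label.
light_peaks = [175.119, 300.200, 500.300]
heavy_peaks = [175.119, 303.219, 500.300]
exact, shifted = match_peaks(light_peaks, heavy_peaks, mass_shift=3.019)
print(exact, shifted)
```

Sorting the heavy spectrum once and using binary search keeps each lookup logarithmic in the number of peaks, which matters little for single spectra but keeps the sketch usable on longer peak lists.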

pyseer: a comprehensive tool for microbial pangenome-wide association studies

10.1101/266312 ◽

2018 ◽

Cited By ~ 1

Author(s):

John A Lees ◽

Marco Galardini ◽

Stephen D Bentley ◽

Jeffrey N Weiser ◽

Jukka Corander

Keyword(s):

Input Data ◽

Association Studies ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Supplementary Data ◽

New Methods ◽

Link Type ◽

Genome Wide

Abstract. Summary: Genome-wide association studies (GWAS) in microbes face different challenges from those in eukaryotes and have been addressed by a number of different methods. pyseer brings these techniques together in one package tailored to microbial GWAS, allows greater flexibility of the input data used, and adds new methods to interpret the association results. Availability and implementation: pyseer is written in Python and is freely available at https://github.com/mgalardini/pyseer, or can be installed through pip. Documentation and a tutorial are available at http://[email protected] and [email protected] Supplementary information: Supplementary data are available online.

icHET: interactive visualization of cytoplasmic heteroplasmy

Bioinformatics ◽

10.1093/bioinformatics/btz300 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4411-4412 ◽

Cited By ~ 2

Author(s):

Vinhthuy Phan ◽

Diem-Trang Pham ◽

Caroline Melton ◽

Adam J Ramsey ◽

Bernie J Daigle ◽

...

Keyword(s):

Reference Genome ◽

Interactive Visualization ◽

Supplementary Information ◽

Supplementary Data ◽

Short Reads ◽

Genome Wide ◽

Computational Workflow ◽

Multiple Samples

Abstract. Summary: Although heteroplasmy has been studied extensively in animal systems, there is a lack of tools for analyzing, exploring and visualizing heteroplasmy at the genome-wide level in other taxonomic systems. We introduce icHET, a computational workflow that produces an interactive visualization that facilitates the exploration, analysis and discovery of heteroplasmy across multiple genomic samples. icHET works on short reads from multiple samples from any organism with an organellar reference genome (mitochondrial or plastid) and a nuclear reference genome. Availability and implementation: The software is available at https://github.com/vtphan/HeteroplasmyWorkflow. Supplementary information: Supplementary data are available at Bioinformatics online.

Rapid epistatic mixed-model association studies by controlling multiple polygenic effects

Bioinformatics ◽

10.1093/bioinformatics/btaa610 ◽

2020 ◽

Vol 36 (19) ◽

pp. 4833-4837

Author(s):

Dan Wang ◽

Hui Tang ◽

Jian-Feng Liu ◽

Shizhong Xu ◽

Qin Zhang ◽

...

Keyword(s):

Mixed Model ◽

Association Studies ◽

Approximate Algorithm ◽

Supplementary Information ◽

Simulation Studies ◽

Supplementary Data ◽

Source Codes ◽

Pairwise Interactions ◽

Genome Wide ◽

Model Algorithm

Abstract. Summary: We have developed a rapid mixed model algorithm for exhaustive genome-wide epistatic association analysis by controlling multiple polygenic effects. Our model can simultaneously handle additive-by-additive, dominance-by-dominance and additive-by-dominance epistasis, and account for intrasubject fluctuations due to individuals with repeated records. Furthermore, we suggest a simple but efficient approximate algorithm, which allows all pairwise interactions to be examined remarkably fast, with a running time that is linear in population size. Simulation studies are performed to investigate the properties of REMMAX. Application to publicly available yeast and human data has shown that our mixed model-based method performs similarly to a simple linear model in terms of computational efficiency. It took less than 40 h for the pairwise analysis of 5000 individuals genotyped with roughly 350 000 SNPs with five threads on an Intel Xeon E5 2.6 GHz CPU. Availability and implementation: Source codes are freely available at https://github.com/chaoning/GMAT. Supplementary information: Supplementary data are available at Bioinformatics online.
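To put the reported runtime in perspective, the following Python sketch works out how many SNP pairs an exhaustive pairwise scan implies and the throughput needed to finish in 40 h. It assumes exactly 350 000 SNPs (the abstract says "roughly") and counts one test per unordered pair; the variable names are illustrative only. Because the abstract reports "less than 40 h", the resulting rate is a lower bound.

```python
n_snps = 350_000   # "roughly 350 000 SNPs" in the abstract; exact value assumed
hours = 40         # reported upper bound on wall-clock time

# Number of unordered SNP pairs: n * (n - 1) / 2.
n_pairs = n_snps * (n_snps - 1) // 2

# Minimum sustained throughput, in pairs per second, to finish within 40 h.
min_rate = n_pairs / (hours * 3600)

print(f"{n_pairs:,} pairs")
print(f"at least {min_rate:,.0f} pairs per second")
```

The pair count is about 6.1 × 10^10, so the run must have sustained on the order of 4 × 10^5 pairwise tests per second across the five threads.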

EpiGEN: an epistasis simulation pipeline

Bioinformatics ◽

10.1093/bioinformatics/btaa245 ◽

2020 ◽

Vol 36 (19) ◽

pp. 4957-4959

Author(s):

David B Blumenthal ◽

Lorenzo Viola ◽

Markus List ◽

Jan Baumbach ◽

Paolo Tieri ◽

...

Keyword(s):

Arbitrary Order ◽

Association Studies ◽

Simulated Data ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Supplementary Data ◽

Single Nucleotide ◽

Genome Wide

Abstract. Summary: Simulated data are crucial for evaluating epistasis detection tools in genome-wide association studies. Existing simulators are limited, as they do not account for linkage disequilibrium (LD), support limited interaction models of single nucleotide polymorphisms (SNPs) and only dichotomous phenotypes, or depend on proprietary software. In contrast, EpiGEN supports SNP interactions of arbitrary order, produces realistic LD patterns and generates both categorical and quantitative phenotypes. Availability and implementation: EpiGEN is implemented in Python 3 and is freely available at https://github.com/baumbachlab/epigen. Supplementary information: Supplementary data are available at Bioinformatics online.

CROSSalive: a web server for predicting the in vivo structure of RNA molecules

Bioinformatics ◽

10.1093/bioinformatics/btz666 ◽

2019 ◽

Author(s):

Riccardo Delli Ponti ◽

Alexandros Armaos ◽

Andrea Vandelli ◽

Gian Gaetano Tartaglia

Keyword(s):

Protein Interactions ◽

Rna Structure ◽

Cross Validation ◽

Supplementary Information ◽

Supplementary Data ◽

High Confidence ◽

Rna Molecules ◽

Non Coding Rna ◽

Long Non Coding Rna

Abstract. Motivation: RNA structure is difficult to predict in vivo due to interactions with enzymes and other molecules. Here we introduce CROSSalive, an algorithm to predict the single- and double-stranded regions of RNAs in vivo using predictions of protein interactions. Results: Trained on icSHAPE data in the presence (m6a+) and absence (m6a-) of N6-methyladenosine modification, CROSSalive achieves cross-validation accuracies between 0.70 and 0.88 in identifying high-confidence single- and double-stranded regions. The algorithm was applied to the long non-coding RNA Xist (17 900 nt, not present in the training set) and achieves an area under the ROC curve of 0.83 in predicting structured regions. Availability and implementation: The CROSSalive web server is freely accessible at http://service.tartaglialab.com/new_submission/crossalive Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

HiCEnterprise: identifying long range chromosomal contacts in Hi-C data

PeerJ ◽

10.7717/peerj.10558 ◽

2021 ◽

Vol 9 ◽

pp. e10558

Author(s):

Hanna Kranas ◽

Irina Tuszynska ◽

Bartek Wilczynski

Keyword(s):

Long Range ◽

Short Range ◽

Computational Analysis ◽

Statistical Tests ◽

Software Tool ◽

Research Community ◽

Supplementary Information ◽

Chromosome Domains ◽

Growing Body ◽

Contact Data

Motivation: Computational analysis of chromosomal contact data is currently gaining popularity with the rapid advance in experimental techniques providing access to a growing body of data. An important problem in this area is the identification of long range contacts between distinct chromatin regions. Such loops were shown to exist at different scales, either mediating relatively short range interactions between enhancers and promoters or providing interactions between much larger, distant chromosome domains. A proper statistical analysis as well as availability to a wide research community are crucial in a tool for this task. Results: We present HiCEnterprise, a first freely available software tool for identification of long range chromatin contacts not only between small regions, but also between chromosomal domains. It implements four different statistical tests for identification of significant contacts for user-defined regions or domains, as well as the necessary functions for input, output and visualization of chromosome contacts. Availability: The software and the corresponding documentation are available at github.com/regulomics/HiCEnterprise. Supplementary information: Supplemental data are available in the online version of the article and at the website regulomics.mimuw.edu.pl/wp/hicenterprise.

MutSpot: detection of non-coding mutation hotspots in cancer genomes

10.1101/740944 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yu Amanda Guo ◽

Mei Mei Chang ◽

Anders Jacobsen Skanderup

Keyword(s):

Somatic Mutations ◽

R Package ◽

Supplementary Information ◽

Patient Specific ◽

Supplementary Data ◽

Link Type ◽

Genome Wide ◽

Cancer Genomes ◽

User Friendly ◽

Regulatory Dna

Abstract. Summary: Recurrence and clustering of somatic mutations (hotspots) in cancer genomes may indicate positive selection and involvement in tumorigenesis. MutSpot performs genome-wide inference of mutation hotspots in non-coding and regulatory DNA of cancer genomes. MutSpot performs feature selection across hundreds of epigenetic and sequence features, followed by estimation of position- and patient-specific background somatic mutation probabilities. MutSpot is user-friendly, works on a standard workstation, and scales to thousands of cancer genomes. Availability and implementation: MutSpot is implemented as an R package and is available at https://github.com/skandlab/MutSpot/ Supplementary information: Supplementary data are available at https://github.com/skandlab/MutSpot/
