circHiC: circular visualization of Hi-C data and integration of genomic data

2020 ◽  
Author(s):  
Ivan Junier ◽  
Nelle Varoquaux

Summary: Genome-wide contact frequencies obtained using Hi-C-like experiments have raised novel challenges in terms of visualization and rationalization of chromosome structuring phenomena. In bacteria, display of Hi-C data should be congruent with the circularity of chromosomes. However, standard representations in the form of square matrices or horizontal bands are not adapted to periodic conditions such as those imposed by (most) bacterial chromosomes. Here, we fill this gap and propose a Python library, built upon the widely used Matplotlib library, to display Hi-C data in circular strips, together with the possibility to overlay genomic data. The proposed tools are light and fast, aiming to facilitate the exploration and understanding of bacterial chromosome structuring data. The library further includes the possibility to handle linear chromosomes, providing a fresh way to display and explore eukaryotic data. Availability and implementation: The package runs under Python 3 and is freely available at https://github.com/TrEE-TIMC/circHiC. The documentation can be found at https://tree-timc.github.io/circhic/; images obtained in different organisms are provided in the gallery section and are accompanied with [email protected], [email protected]
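The idea of a circular strip can be illustrated without any plotting code. The sketch below is not the circHiC API: `circular_distance` and `circular_strip` are hypothetical helper names, and the toy matrix is invented for illustration. It re-indexes a square contact matrix by (offset, position) with wrap-around, so that contacts across the origin of a circular chromosome stay adjacent.

```python
def circular_distance(i, j, n):
    """Shortest distance between bins i and j on a circle of n bins."""
    d = abs(i - j) % n
    return min(d, n - d)


def circular_strip(matrix, half_width):
    """Re-index a square contact matrix as a strip around the diagonal.

    Row r of the result holds the contacts at offset (r - half_width),
    i.e. matrix[i][(i + offset) % n] for every bin i, so bins near the
    end of the matrix wrap around to bins near the start.
    """
    n = len(matrix)
    strip = []
    for offset in range(-half_width, half_width + 1):
        strip.append([matrix[i][(i + offset) % n] for i in range(n)])
    return strip


# Toy 6-bin "chromosome" (hypothetical data): contacts decay with
# circular distance, so bins 0 and 5 are close neighbours.
n_bins = 6
contacts = [[10 - circular_distance(i, j, n_bins) for j in range(n_bins)]
            for i in range(n_bins)]
strip = circular_strip(contacts, half_width=1)
```

Each row of `strip` could then be drawn as one ring on a polar axis, with the bin index mapped to the angle.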

2017 ◽  
Author(s):  
Florian Privé ◽  
Hugues Aschard ◽  
Michael G.B. Blum

Abstract Motivation: Genome-wide datasets produced for association studies have dramatically increased in size over the past few years, with modern datasets commonly including millions of variants measured in tens of thousands of individuals. This increase in data size is a major challenge severely slowing down genomic analyses. Specialized software for every part of the analysis pipeline has been developed to handle large genomic data. However, combining all these software tools into a single data analysis pipeline might be technically difficult. Results: Here we present two R packages, bigstatsr and bigsnpr, allowing for management and analysis of large scale genomic data to be performed within a single comprehensive framework. To address large data size, the packages use memory-mapping for accessing data matrices stored on disk instead of in RAM. To perform data pre-processing and data analysis, the packages integrate most of the tools that are commonly used, either through transparent system calls to existing software, or through updated or improved implementation of existing methods. In particular, the packages implement a fast derivation of Principal Component Analysis, functions to remove SNPs in Linkage Disequilibrium, and algorithms to learn Polygenic Risk Scores on millions of SNPs. We illustrate applications of the two R packages by analysing a case-control genomic dataset for celiac disease, performing an association study and computing Polygenic Risk Scores. Finally, we demonstrate the scalability of the R packages by analyzing a simulated genome-wide dataset including 500,000 individuals and 1 million markers on a single desktop computer. Availability: https://privefl.github.io/bigstatsr/ & https://privefl.github.io/bigsnpr/ Contact: [email protected] & [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
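Memory-mapping, as mentioned in the abstract, can be sketched with the Python standard library alone. This is not how bigstatsr or bigsnpr are implemented (they are R packages); the file layout (row-major 8-byte little-endian doubles) and the helper names `write_matrix`, `column_mean` and `demo` are assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile


def write_matrix(path, rows):
    """Write rows of floats to disk, row-major, as little-endian doubles."""
    with open(path, "wb") as fh:
        for row in rows:
            fh.write(struct.pack(f"<{len(row)}d", *row))


def column_mean(path, n_rows, n_cols, col):
    """Mean of one column, read through a memory map.

    Only the 8 bytes of each requested cell are decoded; the operating
    system pages the file in on demand instead of loading it into RAM.
    """
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            total = 0.0
            for r in range(n_rows):
                offset = 8 * (r * n_cols + col)
                (value,) = struct.unpack_from("<d", mm, offset)
                total += value
            return total / n_rows


def demo():
    """Store a tiny 3 x 2 matrix on disk and average each column."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "matrix.bin")
        write_matrix(path, [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
        return column_mean(path, 3, 2, 0), column_mean(path, 3, 2, 1)
```

The same access pattern scales to matrices far larger than available memory, because the cost of each call depends on the cells touched and not on the file size.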


Author(s):  
Dominic A. Stoll ◽  
Nicolas Danylec ◽  
Christina Grimmler ◽  
Sabine E. Kulling ◽  
Melanie Huch

The strain Adlercreutzia caecicola DSM 22242T (=CCUG 57646T=NR06T) was taxonomically described in 2013 and named as Parvibacter caecicola Clavel et al. 2013. In 2018, the name of the strain DSM 22242T was changed to Adlercreutzia caecicola (Clavel et al. 2013) Nouioui et al. 2018 due to taxonomic investigations of the closely related genera Adlercreutzia, Asaccharobacter and Enterorhabdus within the phylum Actinobacteria. However, the first whole draft genome of strain DSM 22242T was published by our group in 2019. Therefore, the genome was not available within the study of Nouioui et al. (2018). The results of the polyphasic approach within this study, including phenotypic and biochemical analyses and genome-based taxonomic investigations [genome-wide average nucleotide identity (gANI), alignment fraction (AF), average amino acid identity (AAI), percentage of orthologous conserved proteins (POCP) and genome blast distance phylogeny (GBDP) tree], indicated that the proposed change of the name Parvibacter caecicola to Adlercreutzia caecicola was not correct. Therefore, it is proposed that the correct name of Adlercreutzia caecicola (Clavel et al. 2013) Nouioui et al. 2018 strain DSM 22242T is Parvibacter caecicola Clavel et al. 2013.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xiujin Li ◽  
Hailiang Song ◽  
Zhe Zhang ◽  
Yunmao Huang ◽  
Qin Zhang ◽  
...  

Abstract Background: With the emphasis on analysing genotype-by-environment interactions within the framework of genomic selection and genome-wide association analysis, there is an increasing demand for reliable tools that can be used to simulate large-scale genomic data in order to assess related approaches. Results: We proposed a theory to simulate large-scale genomic data on genotype-by-environment interactions and added this new function to our developed tool GPOPSIM. Additionally, a simulated threshold trait with large-scale genomic data was also added. The validation of the simulated data indicated that GPOPSIM2.0 is an efficient tool for mimicking the phenotypic data of quantitative traits, threshold traits, and genetically correlated traits with large-scale genomic data while taking genotype-by-environment interactions into account. Conclusions: This tool is useful for assessing methods for genotype-by-environment interactions and threshold traits.


2019 ◽  
Author(s):  
Ying Sheng ◽  
Chiung-Yu Huang ◽  
Siarhei Lobach ◽  
Lydia Zablotska ◽  
Iryna Lobach ◽  
...  

Abstract: Large-scale genome-wide analysis scans provide massive volumes of genetic variants on large numbers of cases and controls that can be used to estimate the genetic effects. Yet, the sets of non-genetic variables available in publicly available databases are often brief. It is known that omitting a continuous variable from a logistic regression model can result in biased estimates of odds ratios (OR) (e.g., Gail et al (1984), Neuhaus et al (1993), Hauck et al (1991), Zeger et al (1988)). We are interested in assessing what information is needed to recover the bias in the OR estimate of genotype due to omitting a continuous variable in settings when the actual values of the omitted variable are not available. We derive two estimating procedures that can recover the degree of bias based on a conditional density of the omitted variable or knowing the distribution of the omitted variable. Importantly, our derivations show that omitting a continuous variable can result in either under- or over-estimation of the genetic effects. We performed extensive simulation studies to examine bias, variability, false positive rate, and power in the model that omits a continuous variable. We show the application to two genome-wide studies of Alzheimer’s disease. Data Availability Statement: The data that support the findings of this study are openly available in the Database of Genotypes and Phenotypes at [https://www.ncbi.nlm.nih.gov/projects/gap/cgibin/study.cgi?study_id=phs000372.v1.p1], reference number [phs000372.v1.p1] and at the Alzheimer’s Disease Neuroimaging Initiative http://adni.loni.usc.edu/.
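The bias from omitting a continuous covariate can be seen in a small numerical sketch. It does not reproduce the authors' estimating procedures. It assumes a logistic model with a binary genotype G and a standard normal covariate X that is independent of G, with coefficient values chosen arbitrarily, and it computes the odds ratio for G that remains once X is integrated out. Under these particular assumptions the marginal odds ratio is pulled toward 1; the over-estimation case mentioned in the abstract is not covered here.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def marginal_risk(g, b0, b_g, b_x, steps=4000, limit=8.0):
    """P(D = 1 | G = g) after integrating X ~ N(0, 1) out of
    logit P(D = 1 | G, X) = b0 + b_g * G + b_x * X (midpoint rule)."""
    width = 2.0 * limit / steps
    total = 0.0
    for s in range(steps):
        x = -limit + (s + 0.5) * width
        total += expit(b0 + b_g * g + b_x * x) * normal_pdf(x) * width
    return total


def marginal_odds_ratio(b0, b_g, b_x):
    """Odds ratio for G in the model that omits X."""
    p1 = marginal_risk(1, b0, b_g, b_x)
    p0 = marginal_risk(0, b0, b_g, b_x)
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


# Conditional OR for G is exp(b_g) = 2 in both calls (illustrative values).
or_with_effect = marginal_odds_ratio(b0=-1.0, b_g=math.log(2.0), b_x=1.5)
or_without_effect = marginal_odds_ratio(b0=-1.0, b_g=math.log(2.0), b_x=0.0)
```

When `b_x` is zero the omitted variable carries no information and the marginal odds ratio matches the conditional one; with a non-zero `b_x` the two differ even though X and G are independent.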


Author(s):  
Jiao Huang ◽  
Ying Huang

A novel filamentous Actinobacterium, designated strain FXJ1.1311T, was isolated from soil collected in Ngari (Ali) Prefecture, Qinghai-Tibet Plateau, western PR China. The strain showed antimicrobial activity against Gram-positive bacteria and Fusarium oxysporum. Results of phylogenetic analysis based on 16S rRNA gene sequences indicated that strain FXJ1.1311T belonged to the genus Lentzea and showed the highest sequence similarity to Lentzea guizhouensis DHS C013T (98.04 %). Morphological and chemotaxonomic characteristics supported its assignment to the genus Lentzea. The genome-wide average nucleotide identity between strain FXJ1.1311T and L. guizhouensis DHS C013T as well as other Lentzea type strains was <82.2 %. Strain FXJ1.1311T also formed a monophyletic line distinct from the known Lentzea species in the phylogenomic tree. In addition, physiological and chemotaxonomic characteristics allowed phenotypic differentiation of the novel strain from L. guizhouensis. Based on the evidence presented here, strain FXJ1.1311T represents a novel species of the genus Lentzea, for which the name Lentzea tibetensis sp. nov. is proposed. The type strain is FXJ1.1311T (=CGMCC 4.7383T=DSM 104975T).


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Matthias Munz ◽  
Inken Wohlers ◽  
Eric Simon ◽  
Tobias Reinberger ◽  
Hauke Busch ◽  
...  

Abstract: Exploration of genetic variant-to-gene relationships by quantitative trait loci such as expression QTLs is a frequently used tool in genome-wide association studies. However, the wide range of public QTL databases and the lack of batch annotation features complicate a comprehensive annotation of GWAS results. In this work, we introduce the tool “Qtlizer” for annotating lists of variants in human with associated changes in gene expression and protein abundance using an integrated database of published QTLs. Features include incorporation of variants in linkage disequilibrium and reverse search by gene names. Analyzing the database for base pair distances between best significant eQTLs and their affected genes suggests that the commonly used cis-distance limit of 1,000,000 base pairs might be too restrictive, implying a substantial number of wrongly and yet undetected eQTLs. We also ranked genes with respect to the maximum number of tissue-specific eQTL studies in which a most significant eQTL signal was consistent. For the top 100 genes we observed the strongest enrichment with housekeeping genes (P = 2 × 10⁻⁶) and with the 10% highest expressed genes (P = 0.005) after grouping eQTLs by r² > 0.95, underlining the relevance of LD information in eQTL analyses. Qtlizer can be accessed via https://genehopper.de/qtlizer or by using the respective Bioconductor R-package (https://doi.org/10.18129/B9.bioc.Qtlizer).


2014 ◽  
Vol 13s2 ◽  
pp. CIN.S13786 ◽  
Author(s):  
Yang Ni ◽  
Francesco C. Stingo ◽  
Veerabhadran Baladandayuthapani

Rapid development of genome-wide profiling technologies has made it possible to conduct integrative analysis on genomic data from multiple platforms. In this study, we develop a novel integrative Bayesian network approach to investigate the relationships between genetic and epigenetic alterations as well as how these mutations affect a patient's clinical outcome. We take a Bayesian network approach that admits a convenient decomposition of the joint distribution into local distributions. Exploiting the prior biological knowledge about regulatory mechanisms, we model each local distribution as a linear regression. This allows us to analyze multi-platform genome-wide data in a computationally efficient manner. We illustrate the performance of our approach through simulation studies. Our methods are motivated by and applied to a multi-platform glioblastoma dataset, from which we reveal several biologically relevant relationships that have been validated in the literature as well as new genes that could potentially be novel biomarkers for cancer progression.
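The decomposition mentioned in the abstract can be written out in generic form. The notation below is an assumption and is not taken from the paper: $x_1, \dots, x_p$ are the network variables, $\mathrm{pa}(j)$ is the set of parents of node $j$ in the directed acyclic graph, and Gaussian errors are assumed for the linear regressions.

```latex
p(x_1, \dots, x_p) = \prod_{j=1}^{p} p\bigl(x_j \mid x_{\mathrm{pa}(j)}\bigr),
\qquad
x_j \mid x_{\mathrm{pa}(j)} \sim \mathcal{N}\Bigl(\beta_{j0} + \sum_{k \in \mathrm{pa}(j)} \beta_{jk}\, x_k,\; \sigma_j^2\Bigr)
```

Under this form each factor involves only one node and its parents, which is what would allow the local regressions to be fitted separately.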


2020 ◽  
Vol 3 (1) ◽  
Author(s):  
Iain R. Timmins ◽  
Francesco Zaccardi ◽  
Christopher P. Nelson ◽  
Paul W. Franks ◽  
Thomas Yates ◽  
...  

A Correction to this paper has been published: 10.1038/s42003-020-01447-6.


2020 ◽  
Vol 10 (6) ◽  
pp. 2057-2068 ◽  
Author(s):  
Jessica R. Eisenstatt ◽  
Lars Boeckmann ◽  
Wei-Chun Au ◽  
Valerie Garcia ◽  
Levi Bursch ◽  
...  

The evolutionarily conserved centromeric histone H3 variant (Cse4 in budding yeast, CENP-A in humans) is essential for faithful chromosome segregation. Mislocalization of CENP-A to non-centromeric chromatin contributes to chromosomal instability (CIN) in yeast, fly, and human cells and CENP-A is highly expressed and mislocalized in cancers. Defining mechanisms that prevent mislocalization of CENP-A is an area of active investigation. Ubiquitin-mediated proteolysis of overexpressed Cse4 (GALCSE4) by E3 ubiquitin ligases such as Psh1 prevents mislocalization of Cse4, and psh1Δ strains display synthetic dosage lethality (SDL) with GALCSE4. We previously performed a genome-wide screen and identified five alleles of CDC7 and DBF4 that encode the Dbf4-dependent kinase (DDK) complex, which regulates DNA replication initiation, among the top twelve hits that displayed SDL with GALCSE4. We determined that cdc7-7 strains exhibit defects in ubiquitin-mediated proteolysis of Cse4 and show mislocalization of Cse4. Mutation of MCM5 (mcm5-bob1) bypasses the requirement of Cdc7 for replication initiation and rescues replication defects in a cdc7-7 strain. We determined that mcm5-bob1 does not rescue the SDL and defects in proteolysis of GALCSE4 in a cdc7-7 strain, suggesting a DNA replication-independent role for Cdc7 in Cse4 proteolysis. The SDL phenotype, defects in ubiquitin-mediated proteolysis, and the mislocalization pattern of Cse4 in a cdc7-7 psh1Δ strain were similar to those of cdc7-7 and psh1Δ strains, suggesting that Cdc7 regulates Cse4 in a pathway that overlaps with Psh1. Our results define a DNA replication initiation-independent role of DDK as a regulator of Psh1-mediated proteolysis of Cse4 to prevent mislocalization of Cse4.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Will P. M. Rowe

Abstract: Considerable advances in genomics over the past decade have resulted in vast amounts of data being generated and deposited in global archives. The growth of these archives exceeds our ability to process their content, leading to significant analysis bottlenecks. Sketching algorithms produce small, approximate summaries of data and have shown great utility in tackling this flood of genomic data, while using minimal compute resources. This article reviews the current state of the field, focusing on how the algorithms work and how genomicists can utilize them effectively. References to interactive workbooks for explaining concepts and demonstrating workflows are included at https://github.com/will-rowe/genome-sketching.
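One widely used sketching technique is bottom-k MinHash, which summarizes the k-mer set of a sequence by its smallest hash values and estimates the Jaccard similarity of two sequences from their sketches. The code below is a minimal sketch of that general idea and is not taken from the review or its workbooks; the function names, the use of SHA-1 as the hash, and the default parameters are illustrative assumptions.

```python
import hashlib


def kmers(sequence, k):
    """Set of all substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (first 8 bytes of its SHA-1)."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def bottom_k_sketch(sequence, k=3, size=64):
    """Keep only the `size` smallest k-mer hashes as the summary."""
    return sorted({hash_kmer(x) for x in kmers(sequence, k)})[:size]


def jaccard_estimate(sketch_a, sketch_b, size=64):
    """Estimate Jaccard similarity from two bottom-k sketches.

    The `size` smallest hashes of the merged sketches act as a random
    sample of the union; the fraction present in both sketches
    estimates |A & B| / |A | B|.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for value in merged if value in shared) / len(merged)


a = bottom_k_sketch("ACGTACGTAC")   # 3-mers: ACG, CGT, GTA, TAC
b = bottom_k_sketch("ACGTTTTT")     # 3-mers: ACG, CGT, GTT, TTT
similarity = jaccard_estimate(a, b)
```

For these toy sequences the sketches hold every k-mer, so the estimate equals the exact Jaccard index; on real genomes the sketch is much smaller than the k-mer set and the value is an approximation.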

