ChromoMap: an R package for interactive visualization of multi-omics data and annotation of chromosomes

Abstract. Background: Recent advances in high-throughput sequencing have made annotated genomes and multi-omics data available for many living organisms. This has increased the need for graphic tools that allow the concurrent visualization of genomes and feature-associated multi-omics data on single publication-ready plots. Results: We present chromoMap, an R package developed for constructing interactive visualizations of chromosomes/chromosomal regions and for mapping any chromosomal feature with known coordinates (e.g., protein-coding genes, transposable elements, non-coding RNAs, microsatellites) and chromosomal regional characteristics (e.g., genomic feature density, gene expression, DNA methylation, chromatin modifications) of organisms with a genome assembly. chromoMap can also integrate multi-omics data (genomics, transcriptomics and epigenomics) in relation to their occurrence across chromosomes. chromoMap takes tab-delimited files (BED-like) or, alternatively, R objects to specify the genomic coordinates of the chromosomes and the elements to annotate. Rendered chromosomes are composed of continuous windows of a given range, which, on hover, display detailed information about the elements annotated within that range. By adjusting the parameters of a single function, users can generate a variety of plots that can be saved either as static images or as HTML documents. Conclusions: chromoMap's flexibility allows for concurrent visualization of genomic data in each strand of a given chromosome, or of more than one homologous chromosome, allowing the comparison of multi-omics data between genotypes (e.g., species or varieties) or between homologous chromosomes of phased diploid/polyploid genomes. chromoMap is an extensive tool that can potentially be used in various bioinformatics analysis pipelines for genomic visualization of multi-omics data.
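
The windowing idea described in this abstract can be illustrated with a minimal Python sketch. It is not chromoMap's code (chromoMap is an R package, and its internals are not described here). The sketch assumes BED-like records of the form `(chrom, start, end, name)` with 0-based, half-open coordinates; the helper name `bin_features` and the toy data are hypothetical.

```python
from collections import defaultdict


def bin_features(features, window_size):
    """Group BED-like features into consecutive fixed-width windows.

    `features` is an iterable of (chrom, start, end, name) tuples with
    0-based, half-open coordinates (an assumption of this sketch).
    Returns {chrom: {window_index: [names]}}; a feature that spans
    several windows is listed in each of them.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    windows = defaultdict(lambda: defaultdict(list))
    for chrom, start, end, name in features:
        if end <= start:
            raise ValueError(f"empty or inverted interval for {name!r}")
        first = start // window_size
        last = (end - 1) // window_size  # end is exclusive
        for index in range(first, last + 1):
            windows[chrom][index].append(name)
    # Convert to plain dicts so the result is easy to inspect or compare.
    return {chrom: dict(by_index) for chrom, by_index in windows.items()}


# Hypothetical toy annotation, not data from the paper.
toy = [
    ("chr1", 100, 900, "geneA"),
    ("chr1", 950, 1200, "geneB"),  # crosses the 1000 bp window boundary
    ("chr2", 2500, 2600, "geneC"),
]
binned = bin_features(toy, window_size=1000)
```

Each window's list of names is the kind of information a hover tooltip could display; the window width plays the role of the "given range" mentioned in the abstract.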

An R Package for Divergence Analysis of Omics Data

10.1101/720391 ◽

2019 ◽

Author(s):

Wikum Dinalankara ◽

Qian Ke ◽

Donald Geman ◽

Luigi Marchionni

Keyword(s):

High Throughput Sequencing ◽

R Package ◽

The Cancer Genome Atlas ◽

High Dimensional ◽

Omics Data ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Ternary Code ◽

Cancer Genome Atlas ◽

Level Analysis

Abstract: Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that differs significantly from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample-level analysis, and is applicable to many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with sample high-throughput sequencing data from The Cancer Genome Atlas.
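
As a rough illustration of the digitization described in this abstract, the Python sketch below assigns each feature of a profile a ternary code relative to a baseline interval. It is not the divergence package's implementation: the abstract does not say how the baseline range is estimated, so the per-feature quantile interval used here, along with the function names and toy values, is an assumption made for illustration only.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile of an ascending list, q in [0, 1]."""
    if not sorted_values:
        raise ValueError("need at least one value")
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def baseline_intervals(baseline, low_q=0.05, high_q=0.95):
    """Per-feature (low, high) interval estimated from baseline samples.

    `baseline` is a list of samples, each a list of feature values.
    Using quantiles here is an assumption of this sketch.
    """
    intervals = []
    for column in zip(*baseline):
        ordered = sorted(column)
        intervals.append((percentile(ordered, low_q), percentile(ordered, high_q)))
    return intervals


def ternary_code(profile, intervals):
    """Map each feature to -1 (below), 0 (inside) or +1 (above) its interval."""
    code = []
    for value, (low, high) in zip(profile, intervals):
        if value < low:
            code.append(-1)
        elif value > high:
            code.append(1)
        else:
            code.append(0)
    return code


# Hypothetical baseline of four samples with three features each.
baseline = [
    [1.0, 10.0, 5.0],
    [2.0, 12.0, 5.5],
    [3.0, 11.0, 4.5],
    [2.5, 10.5, 5.2],
]
# With q = 0 and q = 1 the interval is simply the observed baseline range.
intervals = baseline_intervals(baseline, low_q=0.0, high_q=1.0)
code = ternary_code([0.2, 11.0, 9.0], intervals)
```

A binary code follows by taking the absolute value of each entry, which records only whether a feature diverges from the baseline and not in which direction.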

Genome-wide identification of long non-coding RNAs in the gravid ectoparasite Varroa destructor

10.1101/2020.06.14.151340 ◽

2020 ◽

Cited By ~ 1

Author(s):

Zheguang Lin ◽

Yibing Liu ◽

Xiaomei Chen ◽

Cong Han ◽

Wei Wang ◽

...

Keyword(s):

Varroa Destructor ◽

High Throughput Sequencing ◽

Target Genes ◽

Protein Coding ◽

Genome Wide ◽

Genomic Studies ◽

Genetic Information Processing ◽

Non Coding RNAs ◽

Environmental Responses ◽

Living Organisms

Abstract: Long non-coding RNAs (lncRNAs) have emerged as critical regulators with various biological functions in living organisms. However, to date, no systematic characterization of lncRNAs has been performed in the ectoparasitic mite Varroa destructor, the most severe biotic threat to honey bees worldwide. Here, we performed an initial genome-wide identification of lncRNAs in V. destructor via high-throughput sequencing and report, for the first time, the transcriptomic landscape of lncRNAs in this devastating parasite. By means of a lncRNA identification pipeline, 6,645 novel lncRNA transcripts, encoded by 3,897 gene loci, were identified, including 2,066 sense lncRNAs, 2,772 lincRNAs, and 1,807 lncNATs. Compared with protein-coding mRNAs, V. destructor lncRNAs are shorter in both full length and ORF length, contain fewer exons, and are expressed at lower levels. GO term and KEGG pathway enrichment analyses of the lncRNA target genes indicated that these predicted lncRNAs are likely to play key roles in cellular processes, genetic information processing and environmental responses. To our knowledge, this is the first catalog of lncRNA profiles in a Parasitiformes species, providing a valuable resource for genetic and genomic studies. Understanding the characteristics and features of lncRNAs in V. destructor would promote sustainable pest control.

Flexible analysis of TSS mapping data and detection of TSS shifts with TSRexploreR

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab051 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Robert A Policastro ◽

Daniel J McDonald ◽

Volker P Brendel ◽

Gabriel E Zentner

Keyword(s):

Gene Expression ◽

Transcription Initiation ◽

High Throughput Sequencing ◽

R Package ◽

Mapping Data ◽

Transcription Start ◽

Transcript Stability ◽

Novel Approach ◽

A Genome ◽

And Shifts

Abstract: Heterogeneity in transcription initiation has important consequences for transcript stability and translation, and shifts in transcription start site (TSS) usage are prevalent in various developmental, metabolic, and disease contexts. Accordingly, numerous methods for global TSS profiling have been developed, including most recently Survey of TRanscription Initiation at Promoter Elements with high-throughput sequencing (STRIPE-seq), a method to profile TSSs on a genome-wide scale with significant cost and time savings compared to previous methods. In anticipation of more widespread adoption of STRIPE-seq and related methods for construction of promoter atlases and studies of differential gene expression, we built TSRexploreR, an R package for end-to-end analysis of TSS mapping data. TSRexploreR provides functions for TSS and transcription start region (TSR) detection, normalization, correlation, visualization, and differential TSS/TSR analyses. TSRexploreR is highly interoperable, accepting the data structures of TSS and TSR sets generated by several existing tools for processing and alignment of TSS mapping data, such as CAGEr for Cap Analysis of Gene Expression (CAGE) data. Lastly, TSRexploreR implements a novel approach for the detection of shifts in TSS distribution.
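
To make the relationship between TSSs and TSRs concrete, the Python sketch below groups nearby TSS positions into regions using a simple distance threshold. The abstract does not describe how TSRexploreR detects TSRs, so this is only one generic approach, not the package's method; the function name `cluster_tss`, its parameters, and the toy counts are hypothetical.

```python
def cluster_tss(positions, max_gap, min_count=1):
    """Merge TSS positions on one strand into regions by distance.

    `positions` maps a genomic coordinate to its read count. Positions
    with fewer than `min_count` reads are dropped; the rest are merged
    whenever neighbouring positions are at most `max_gap` bp apart.
    Returns a list of (start, end, total_count) tuples.
    """
    kept = sorted(pos for pos, count in positions.items() if count >= min_count)
    regions = []
    for pos in kept:
        if regions and pos - regions[-1][1] <= max_gap:
            # Extend the current region and add this position's reads.
            start, _, total = regions[-1]
            regions[-1] = (start, pos, total + positions[pos])
        else:
            # Start a new region at this position.
            regions.append((pos, pos, positions[pos]))
    return regions


# Hypothetical read counts per TSS position on a single strand.
toy_counts = {100: 5, 103: 12, 110: 1, 400: 7, 405: 3}
regions = cluster_tss(toy_counts, max_gap=10, min_count=2)
```

The count threshold removes weakly supported positions before merging, and the gap threshold controls how broad a region may grow; both values would need tuning for real data.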

Flexible analysis of TSS mapping data and detection of TSS shifts with TSRexploreR

10.1101/2021.02.14.431114 ◽

2021 ◽

Author(s):

Robert A. Policastro ◽

Daniel J. McDonald ◽

Volker P. Brendel ◽

Gabriel E. Zentner

Keyword(s):

Transcription Initiation ◽

High Throughput Sequencing ◽

R Package ◽

Mapping Data ◽

Transcription Start ◽

Transcript Stability ◽

Novel Approach ◽

A Genome ◽

And Shifts ◽

Cap Analysis

Abstract: Heterogeneity in transcription initiation has important consequences for transcript stability and translation, and shifts in transcription start site (TSS) usage are prevalent in various disease and developmental contexts. Accordingly, numerous methods for global TSS profiling have been developed, including our recently published Survey of TRanscription Initiation at Promoter Elements with high-throughput sequencing (STRIPE-seq), a method to profile TSSs on a genome-wide scale with minimal cost and time. In parallel to our development of STRIPE-seq, we built TSRexploreR, an R package for end-to-end analysis of TSS mapping data. TSRexploreR provides functions for TSS and transcription start region (TSR) detection, normalization, correlation, visualization, and differential TSS/TSR analysis. TSRexploreR is highly interoperable, accepting the data structures of TSS and TSR sets generated by several existing tools for processing and alignment of TSS mapping data, such as CAGEr for Cap Analysis of Gene Expression (CAGE) data. Lastly, TSRexploreR implements a novel approach for the detection of shifts in TSS distribution.

chromoMap: An R package for Interactive Visualization and Annotation of Chromosomes

10.1101/605600 ◽

2019 ◽

Cited By ~ 10

Author(s):

Lakshay Anand ◽

Carlos M. Rodriguez Lopez

Keyword(s):

Open Source ◽

Interactive Visualization ◽

Living Organism ◽

R Package ◽

Supplementary Information ◽

Supplementary Data ◽

Comprehensive Understanding ◽

Single Function ◽

Interactive Visualizations ◽

Supplementary Material

Abstract. Summary: chromoMap is an R package for constructing interactive visualizations of chromosomes/chromosomal regions of any living organism, and for mapping chromosomal elements (such as genes) onto them. The package takes separate tab-delimited files (BED-like) to specify the genomic coordinates of the chromosomes and the elements to annotate. Each rendered chromosome is composed of continuous loci of specific ranges, where each locus, on hover, displays detailed information about the elements annotated within that locus range. By adjusting the parameters of a single function, users can generate a variety of plots that can either be saved as static images or shared as HTML documents. Users can utilize the various prominent features of chromoMap including, but not limited to, visualizing polyploidy, creating chromosome heatmaps, mapping groups of elements, adding hyperlinks to elements, and multi-species chromosome visualization. Availability and implementation: The R package chromoMap is available under the GPL-3 Open Source license. It includes a vignette for comprehensive understanding of its various features, and is freely available from: https://CRAN.R-project.org/[email protected] Supplementary information: Supplementary data are available online.

Unraveling the Genome of a High Yielding Colombian Sugarcane Hybrid

Frontiers in Plant Science ◽

10.3389/fpls.2021.694859 ◽

2021 ◽

Vol 12 ◽

Author(s):

Jhon Henry Trujillo-Montenegro ◽

María Juliana Rodríguez Cubillos ◽

Cristian Darío Loaiza ◽

Manuel Quintero ◽

Héctor Fabio Espitia-Navarro ◽

...

Keyword(s):

Genome Assembly ◽

High Throughput Sequencing ◽

Saccharum Officinarum ◽

Entire Genome ◽

Protein Coding ◽

Sequencing Technologies ◽

Selection Processes ◽

A Genome ◽

Recent Developments ◽

Sugarcane Hybrids

Recent developments in high-throughput sequencing (HTS) technologies and bioinformatics, including improved read lengths and genome assemblers, allow the reconstruction of complex genomes with unprecedented quality and contiguity. Sugarcane has one of the most complicated genomes among grasses, with a haploid length of 1 Gbp and a ploidy between 8 and 12. In this work, we present a genome assembly of the Colombian sugarcane hybrid CC 01-1940. Three types of sequencing technologies were combined for this assembly: PacBio long reads, Illumina paired short reads, and Hi-C reads. We achieved a median contig length of 34.94 Mbp and a total genome assembly of 903.2 Mbp. We annotated a total of 63,724 protein-coding genes and performed a reconstruction and comparative analysis of the sucrose metabolism pathway. Nucleotide evolution measurements between orthologs of closely related species suggest that divergence between Saccharum officinarum and Saccharum spontaneum occurred <2 million years ago. Synteny analysis between CC 01-1940 and the S. spontaneum genome confirms the presence of translocation events between the species and a random contribution throughout the entire genome in current sugarcane hybrids. Analysis of RNA-Seq data from leaf and root tissue of contrasting sugarcane genotypes subjected to water stress treatments revealed 17,490 differentially expressed genes, of which 3,633 correspond to genes expressed exclusively in tolerant genotypes. We expect the resources presented here to serve as a source of information to improve the selection of new varieties in sugarcane breeding programs.

Diploid chromosome-scale assembly of the Muscadinia rotundifolia genome supports chromosome fusion and disease resistance gene expansion during Vitis and Muscadinia divergence

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab033 ◽

2021 ◽

Author(s):

Noé Cochetel ◽

Andrea Minio ◽

Mélanie Massonnet ◽

Amanda M Vondras ◽

Rosa Figueroa-Balderas ◽

...

Keyword(s):

Single Molecule ◽

Interleukin 1 ◽

Plasmopara Viticola ◽

Cabernet Sauvignon ◽

Disease Resistance Gene ◽

Chromosome Fusion ◽

Protein Coding ◽

Downy Mildews ◽

A Genome ◽

Muscadinia Rotundifolia

Abstract: Muscadinia rotundifolia, the muscadine grape, has been cultivated for centuries in the southeastern United States. M. rotundifolia is resistant to many of the pathogens that detrimentally affect Vitis vinifera, the grape species commonly used for winemaking. For this reason, M. rotundifolia is a valuable genetic resource for breeding. Single-molecule real-time reads were combined with optical maps to reconstruct the two haplotypes of each of the 20 M. rotundifolia cv. Trayshed chromosomes. The completeness and accuracy of the assembly were confirmed using a high-density linkage map of M. rotundifolia. Protein-coding genes were annotated using an integrated and comprehensive approach. This included using full-length cDNA sequencing (Iso-Seq) to improve gene structure and hypothetical spliced variant predictions. Our data strongly support that Muscadinia chromosomes 7 and 20 are fused in Vitis and pinpoint the location of the fusion in Cabernet Sauvignon and PN40024 chromosome 7. Disease-related gene numbers in Trayshed and Cabernet Sauvignon were similar, but their clustering locations were different. A dramatic expansion of the Toll/Interleukin-1 Receptor-like Nucleotide-Binding Site Leucine-Rich Repeat (TIR-NBS-LRR) class was detected on Trayshed chromosome 12 at the Resistance to Uncinula necator 1 (RUN1)/Resistance to Plasmopara viticola 1 (RPV1) locus, which confers strong dominant resistance to powdery and downy mildews. A genome browser for Trayshed, its annotation, and an associated BLAST tool are available at www.grapegenomics.com.

Ecofriendly Synthesis of Silver Nanoparticles by Terrabacter humi sp. nov. and Their Antibacterial Application against Antibiotic-Resistant Pathogens

International Journal of Molecular Sciences ◽

10.3390/ijms21249746 ◽

2020 ◽

Vol 21 (24) ◽

pp. 9746

Author(s):

Shahina Akter ◽

Sun-Young Lee ◽

Muhammad Zubair Siddiqi ◽

Sri Renukadevi Balusamy ◽

Md. Ashrafudoulla ◽

...

Keyword(s):

Silver Nanoparticles ◽

Spherical Shape ◽

Optimal Growth ◽

rRNA Genes ◽

rRNA Gene ◽

Strictly Aerobic ◽

Drug Resistant ◽

Protein Coding ◽

A Genome ◽

The Fourier Transform

It is essential to discover and develop alternative eco-friendly antibacterial agents because of the emergence of multi-drug-resistant microorganisms. In this study, we isolated and characterized a novel bacterium, Terrabacter humi MAHUQ-38T, which was used for the eco-friendly synthesis of silver nanoparticles (AgNPs); the synthesized AgNPs were then used to control multi-drug-resistant microorganisms. The novel strain was Gram-stain-positive, strictly aerobic, milky white, rod-shaped and non-motile. The optimal growth temperature, pH and NaCl concentration were 30 °C, 6.5 and 0%, respectively. Based on its 16S rRNA gene sequence, strain MAHUQ-38T belongs to the genus Terrabacter and is most closely related to several Terrabacter type strains (98.2%–98.8%). Terrabacter humi MAHUQ-38T had a genome 5,156,829 bp long (19 contigs) with 4,555 protein-coding genes, 48 tRNA genes and 5 rRNA genes. The culture supernatant of strain MAHUQ-38T was used for the eco-friendly and facile synthesis of AgNPs. Transmission electron microscopy (TEM) images showed that the AgNPs were spherical, with a size of 6 to 24 nm, and Fourier transform infrared (FTIR) analysis revealed the functional groups responsible for the synthesis of AgNPs. The synthesized AgNPs exhibited strong antibacterial activity against the multi-drug-resistant pathogens Escherichia coli and Pseudomonas aeruginosa. Minimal inhibitory/bactericidal concentrations against E. coli and P. aeruginosa were 6.25/50 and 12.5/50 μg/mL, respectively. The AgNPs altered the cell morphology and damaged the cell membrane of the pathogens. This study encourages the use of Terrabacter humi for the eco-friendly synthesis of AgNPs to control multi-drug-resistant microorganisms.

Increased peak detection accuracy in over-dispersed ChIP-seq data with supervised segmentation models

BMC Bioinformatics ◽

10.1186/s12859-021-04221-5 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Arnaud Liehrmann ◽

Guillem Rigaill ◽

Toby Dylan Hocking

Keyword(s):

Histone Modifications ◽

Count Data ◽

High Throughput Sequencing ◽

Genetic Regulation ◽

Regulation Of Gene Expression ◽

Basic Mechanism ◽

R Package ◽

Detection Accuracy ◽

Full Potential ◽

Segmentation Models

Abstract. Background: Histone modification constitutes a basic mechanism for the genetic regulation of gene expression. In the early 2000s, a powerful technique emerged that couples chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq). This technique provides a direct survey of the DNA regions associated with these modifications. In order to realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed or adapted to analyze the massive amount of data it generates. Many of these algorithms were built around natural assumptions such as the Poisson distribution to model the noise in the count data. In this work we start from these natural assumptions and show that it is possible to improve upon them. Results: Our comparisons on seven reference datasets of histone modifications (H3K36me3 & H3K4me3) suggest that natural assumptions are not always realistic under application conditions. We show that the unconstrained multiple changepoint detection model with alternative noise assumptions and supervised learning of the penalty parameter reduces the over-dispersion exhibited by count data. These models, implemented in the R package CROCS (https://github.com/aLiehrmann/CROCS), detect the peaks more accurately than algorithms that rely on natural assumptions. Conclusion: The segmentation models we propose can benefit researchers in the field of epigenetics by providing new high-quality peak prediction tracks for H3K36me3 and H3K4me3 histone modifications.
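
The notion of over-dispersion mentioned in this abstract can be illustrated with a small Python sketch. A Poisson distribution has variance equal to its mean, so a variance-to-mean ratio well above 1 in count data suggests over-dispersion. This is a generic diagnostic, not the procedure used in CROCS; the function name `dispersion_index` and the toy counts are assumptions for illustration.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return the sample variance divided by the sample mean.

    For Poisson-distributed counts this ratio is expected to be close
    to 1; values well above 1 indicate over-dispersion.
    """
    if len(counts) < 2:
        raise ValueError("need at least two counts")
    average = mean(counts)
    if average == 0:
        raise ValueError("mean count is zero; the index is undefined")
    return variance(counts) / average


# Hypothetical per-bin read counts: mostly background with a few tall peaks.
peaky = [0, 0, 1, 0, 12, 0, 0, 9, 1, 0]
# Hypothetical counts that barely vary around their mean.
flat = [2, 3, 2, 3, 2, 3, 2, 3]

peaky_index = dispersion_index(peaky)
flat_index = dispersion_index(flat)
```

In this toy example the peaky profile gives an index far above 1, while the flat profile gives one below 1, which is the kind of contrast that motivates alternative noise models for count data.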

Chloroplast genome variation and phylogenetic relationships of Atractylodes species

BMC Genomics ◽

10.1186/s12864-021-07394-8 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yiheng Wang ◽

Sheng Wang ◽

Yanlei Liu ◽

Qingjun Yuan ◽

Jiahui Sun ◽

...

Keyword(s):

Chloroplast Genome ◽

High Throughput Sequencing ◽

Phylogenetic Analyses ◽

Herbal Medicines ◽

Structural Genomic ◽

Protein Coding ◽

Chloroplast Genomes ◽

Original Plant ◽

Morphological Differences ◽

RNA Genes

Abstract. Background: Atractylodes DC is the basic original plant of the widely used herbal medicines "Baizhu" and "Cangzhu" and an endemic genus in East Asia. Species within the genus have minor morphological differences, and the universal DNA barcodes cannot clearly distinguish the systematic relationships or identify the species of the genus. In order to address these questions, we sequenced the chloroplast genomes of all species of Atractylodes using high-throughput sequencing. Results: The results indicate that the chloroplast genome of Atractylodes has a typical quadripartite structure and ranges from 152,294 bp (A. carlinoides) to 153,261 bp (A. macrocephala) in size. The genome of all species contains 113 genes, including 79 protein-coding genes, 30 transfer RNA genes and four ribosomal RNA genes. Four hotspots, rpl22-rps19-rpl2, psbM-trnD, trnR-trnT(GGU), and trnT(UGU)-trnL, and a total of 42–47 simple sequence repeats (SSRs) were identified as the most promising potentially variable markers for species delimitation and population genetic studies. Phylogenetic analyses of the whole chloroplast genomes indicate that Atractylodes is a clade within the tribe Cynareae; Atractylodes species form a monophyletic group that clearly reflects the relationships within the genus. Conclusions: Our study included investigations of the sequences and structural genomic variations, phylogenetics and mutation dynamics of Atractylodes chloroplast genomes and will facilitate future studies in population genetics, taxonomy and species identification.
