RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms

10.7287/peerj.preprints.27928v1 ◽

2019 ◽

Cited By ~ 1

Author(s):

Zhaodong Hao ◽

Dekang Lv ◽

Ying Ge ◽

Jisen Shi ◽

Dolf Weijers ◽

...

Keyword(s):

Gc Content ◽

R Package ◽

Whole Genome ◽

Data Mapping ◽

Model Species ◽

Chromosomal Distribution ◽

Whole Genome Analysis ◽

Sequencing Technologies ◽

Genome Wide ◽

Genome Wide Data

Background: Owing to the rapid advances in DNA sequencing technologies, whole genomes from more and more species are becoming available at an increasing pace. For whole-genome analysis, idiograms provide a very popular, intuitive and effective way to map and visualize genome-wide information, such as GC content, gene and repeat density, DNA methylation distribution, etc. However, most available software programs and web servers support only a few model species, such as human, mouse and fly. As boundaries between model and non-model species are shifting, tools that generate idiograms for a broad range of species are urgently needed to help better understand fundamental genome characteristics. Results: The R package RIdeogram allows users to build high-quality idiograms of any species of interest. It can map continuous and discrete genome-wide data on the idiograms and visualize them in a heat map and track labels, respectively. Conclusion: The visualization of genome-wide data mapping and comparison allows users to quickly establish a clear impression of the chromosomal distribution pattern, thus making RIdeogram a useful tool for any researcher working with omics.



A curated dataset of modern and ancient high-coverage shotgun human genomes

Scientific Data ◽

10.1038/s41597-021-00980-1 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Pierpaolo Maisano Delser ◽

Eppie R. Jones ◽

Anahit Hovhannisyan ◽

Lara Cassidy ◽

Ron Pinhasi ◽

...

Keyword(s):

Sequence Data ◽

Whole Genome ◽

Reference Dataset ◽

High Coverage ◽

Sample Distribution ◽

Human Samples ◽

Human Genomes ◽

Genome Wide ◽

Genome Wide Data ◽

Computationally Intensive

Abstract Over the last few years, genome-wide data for a large number of ancient human samples have been collected. Whilst datasets of captured SNPs have been collated, high coverage shotgun genomes (which are relatively few but allow certain types of analyses not possible with ascertained captured SNPs) have to be reprocessed by individual groups from raw reads. This task is computationally intensive. Here, we release a dataset including 35 whole-genome sequenced samples, previously published and distributed worldwide, together with the genetic pipeline used to process them. The dataset contains 72,041,355 sites called across 19 ancient and 16 modern individuals and includes sequence data from four previously published ancient samples which we sequenced to higher coverage (10–18x). Such a resource will allow researchers to analyse their new samples with the same genetic pipeline and directly compare them to the reference dataset without re-processing published samples. Moreover, this dataset can be easily expanded to increase the sample distribution both across time and space.


Data-adaptive multi-locus association testing in subjects with arbitrary genealogical relationships

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2018-0030 ◽

2019 ◽

Vol 18 (3) ◽

Cited By ~ 1

Author(s):

Gail Gong ◽

Wei Wang ◽

Chih-Lin Hsieh ◽

David J. Van Den Berg ◽

Christopher Haiman ◽

...

Keyword(s):

Prostate Cancer ◽

R Package ◽

Suppressor Gene ◽

Test Statistic ◽

Specific Data ◽

Association Tests ◽

Association Testing ◽

Genome Wide ◽

Genome Wide Data ◽

Data Adaptive

Abstract Genome-wide sequencing enables evaluation of associations between traits and combinations of variants in genes and pathways. But such evaluation requires multi-locus association tests with good power, regardless of the variant and trait characteristics. And since analyzing families may yield more power than analyzing unrelated individuals, we need multi-locus tests applicable to both related and unrelated individuals. Here we describe such tests, and we introduce SKAT-X, a new test statistic that uses genome-wide data obtained from related or unrelated subjects to optimize power for the specific data at hand. Simulations show that: a) SKAT-X performs well regardless of variant and trait characteristics; and b) for binary traits, analyzing affected relatives brings more power than analyzing unrelated individuals, consistent with previous findings for single-locus tests. We illustrate the methods by application to rare unclassified missense variants in the tumor suppressor gene BRCA2, as applied to combined data from prostate cancer families and unrelated prostate cancer cases and controls in the Multi-ethnic Cohort (MEC). The methods can be implemented using open-source code for public use as the R-package GATARS (Genetic Association Tests for Arbitrarily Related Subjects) <https://gailg.github.io/gatars/>.


Genome-wide microsatellites and species specific markers in genus Phytophthora revealed through whole genome analysis

3 Biotech ◽

10.1007/s13205-020-02430-y ◽

2020 ◽

Vol 10 (10) ◽

Author(s):

Deepu Mathew ◽

P. S. Anju ◽

Amala Tom ◽

Neethu Johnson ◽

M. Lidia George ◽

...

Keyword(s):

Genome Analysis ◽

Whole Genome ◽

Whole Genome Analysis ◽

Genome Wide ◽

Species Specific


Efficient and accurate determination of genome-wide DNA methylation patterns in Arabidopsis thaliana with enzymatic methyl sequencing

Epigenetics & Chromatin ◽

10.1186/s13072-020-00361-9 ◽

2020 ◽

Vol 13 (1) ◽

Cited By ~ 1

Author(s):

Suhua Feng ◽

Zhenhui Zhong ◽

Ming Wang ◽

Steven E. Jacobsen

Keyword(s):

Dna Methylation ◽

Bisulfite Sequencing ◽

Accurate Determination ◽

Gc Content ◽

Epigenetic Mark ◽

Whole Genome ◽

Whole Genome Bisulfite Sequencing ◽

Genome Wide ◽

A Genome ◽

Genome Bisulfite Sequencing

Abstract Background: 5′ methylation of cytosines in DNA molecules is an important epigenetic mark in eukaryotes. Bisulfite sequencing is the gold standard of DNA methylation detection, and whole-genome bisulfite sequencing (WGBS) has been widely used to detect methylation at single-nucleotide resolution on a genome-wide scale. However, sodium bisulfite is known to severely degrade DNA, which, in combination with biases introduced during PCR amplification, leads to unbalanced base representation in the final sequencing libraries. Enzymatic conversion of unmethylated cytosines to uracils can achieve the same end product for sequencing as does bisulfite treatment and does not affect the integrity of the DNA; enzymatic methylation sequencing may, thus, provide advantages over bisulfite sequencing. Results: Using an enzymatic methyl-seq (EM-seq) technique to selectively deaminate unmethylated cytosines to uracils, we generated and sequenced libraries based on different amounts of Arabidopsis input DNA and different numbers of PCR cycles, and compared these data to results from traditional whole-genome bisulfite sequencing. We found that EM-seq libraries were more consistent between replicates and had higher mapping and lower duplication rates, lower background noise, higher average coverage, and higher coverage of total cytosines. Differential methylation region (DMR) analysis showed that WGBS tended to over-estimate methylation levels especially in CHG and CHH contexts, whereas EM-seq detected higher CG methylation levels in certain highly methylated areas. These phenomena can be mostly explained by a correlation of WGBS methylation estimation with GC content and methylated cytosine density. We used EM-seq to compare methylation between leaves and flowers, and found that CHG methylation level is greatly elevated in flowers, especially in pericentromeric regions. Conclusion: We suggest that EM-seq is a more accurate and reliable approach than WGBS to detect methylation. Compared to WGBS, the results of EM-seq are less affected by differences in library preparation conditions or by the skewed base composition in the converted DNA. It may therefore be more desirable to use EM-seq in methylation studies.


Applications of Multifactor Dimensionality Reduction to Genome-Wide Data Using the R Package ‘MDR’

Methods in Molecular Biology - Genome-Wide Association Studies and Genomic Prediction ◽

10.1007/978-1-62703-447-0_23 ◽

2013 ◽

pp. 479-498 ◽

Cited By ~ 1

Author(s):

Stacey Winham

Keyword(s):

Dimensionality Reduction ◽

Multifactor Dimensionality Reduction ◽

R Package ◽

Genome Wide ◽

Genome Wide Data


Patents and Genome-Wide DNA Sequence Analysis: Is it Safe to Go into the Human Genome?

The Journal of Law Medicine & Ethics ◽

10.1111/jlme.12161 ◽

2014 ◽

Vol 42 (S1) ◽

pp. 42-50 ◽

Cited By ~ 6

Author(s):

Robert Cook-Deegan ◽

Subhashini Chandrasekharan

Keyword(s):

Clinical Applications ◽

Patent Infringement ◽

Whole Genome ◽

Whole Genome Analysis ◽

Business Decisions ◽

Gene Patents ◽

Clinical Laboratories ◽

Genome Wide ◽

Human Genes ◽

Genome Analyses

Whether, and to what degree, do patents granted on human genes cast a shadow of uncertainty over genomics and its applications? Will owners of patents on individual genes or clusters of genes sue those performing whole-genome analyses on human samples for patent infringement? These are related questions that have haunted molecular diagnostics companies and services, coloring scientific, clinical, and business decisions. Can the profusion of whole-genome analysis methods proceed without fear of patent infringement liability? Whole-genome sequencing (WGS) is proceeding apace. Academic centers have been performing whole-genome and -exome sequencing (WES) in research for at least five years, and academic clinical laboratories with national reach have been doing sequencing for clinical applications for almost as long. Companies have also been offering WGS and WES as a clinical service for a few years now. So far as we know, no one has been sued for infringement of “gene patents” for performing WGS.


Pan-cancer subtyping in a 2D-map shows substructures that are driven by specific combinations of molecular characteristics

Scientific Reports ◽

10.1038/srep24949 ◽

2016 ◽

Vol 6 (1) ◽

Cited By ~ 13

Author(s):

Erdogan Taskesen ◽

Sjoerd M. H. Huisman ◽

Ahmed Mahfouz ◽

Jesse H. Krijthe ◽

Jeroen de Ridder ◽

...

Keyword(s):

Therapy Response ◽

Visual Exploration ◽

Molecular Characteristics ◽

Cancer Type ◽

Breast Cancers ◽

Data Types ◽

Multiple Cancer ◽

Genome Wide ◽

Genome Wide Data ◽

Cancer Types

Abstract The use of genome-wide data in cancer research, for the identification of groups of patients with similar molecular characteristics, has become a standard approach for applications in therapy-response, prognosis-prediction, and drug-development. To progress in these applications, the trend is to move from single genome-wide measurements in a single cancer-type towards measuring several different molecular characteristics across multiple cancer-types. Although current approaches shed light on molecular characteristics of various cancer-types, detailed relationships between patients within cancer clusters are unclear. We propose a novel multi-omic integration approach that exploits the joint behavior of the different molecular characteristics, supports visual exploration of the data by a two-dimensional landscape, and inspection of the contribution of the different genome-wide data-types. We integrated 4,434 samples across 19 cancer-types, derived from TCGA, containing gene expression, DNA-methylation, copy-number variation and microRNA expression data. Cluster analysis revealed 18 clusters, where three clusters showed a complex collection of cancer-types, squamous-cell-carcinoma, colorectal cancers, and a novel grouping of kidney-cancers. Sixty-four samples were identified outside their tissue-of-origin cluster. Known and novel patient subgroups were detected for acute myeloid leukemias and breast cancers. Quantification of the contributions of the different molecular types showed that substructures are driven by specific (combinations of) molecular characteristics.


Whole-genome analysis of Malawian Plasmodium falciparum isolates identifies potential targets of allele-specific immunity to clinical malaria

10.1101/2020.09.16.20196253 ◽

2020 ◽

Author(s):

Zalak Shah ◽

Myo T Naung ◽

Kara A Moser ◽

Matthew Adams ◽

Andrea G Buchwald ◽

...

Keyword(s):

Plasmodium Falciparum ◽

Sequence Data ◽

Clinical Malaria ◽

Whole Genome Sequence ◽

Whole Genome ◽

Whole Genome Analysis ◽

Multiple Alleles ◽

Vaccine Candidates ◽

Genome Wide ◽

Allele Specific

Individuals acquire immunity to clinical malaria after repeated Plasmodium falciparum infections. This immunity to disease is thought to reflect the acquisition of a repertoire of responses to multiple alleles in diverse parasite antigens. In previous studies, we identified polymorphic sites within individual antigens that are associated with parasite immune evasion by examining antigen allele dynamics in individuals followed longitudinally. Here we expand this approach by analyzing genome-wide polymorphisms using whole genome sequence data from 140 parasite isolates representing malaria cases from a longitudinal study in Malawi and identify 25 genes that encode likely targets of naturally acquired immunity and that should be further characterized for their potential as vaccine candidates.
