StateHub-StatePaintR: rapid and reproducible chromatin state evaluation for custom genome annotation

F1000Research, 2020, Vol 7, pp. 214
Author(s): Simon G. Coetzee, Zachary Ramjan, Huy Q. Dinh, Benjamin P. Berman, Dennis J. Hazelett

Genome annotation is critical to understanding the function of disease variants, especially for clinical applications. To meet this need, public consortia provide segmentations that reflect various unsupervised approaches to functional annotation based on epigenetic data, but there remains a need for transparent, reproducible, and easily interpreted genomic maps of the functional biology of chromatin. We introduce a new methodological framework for defining a combinatorial epigenomic model of chromatin state on a web database, StateHub. In addition, we created an annotation tool for Bioconductor, StatePaintR, which accesses these models and uses them to rapidly (on the order of seconds) produce chromatin state segmentations in standard genome browser formats. Annotations are fully documented with change history and versioning, authorship information, and original source files. StatePaintR calculates ranks for each state from next-generation sequencing peak statistics, facilitating variant prioritization, enrichment testing, and other types of quantitative analysis. StateHub hosts annotation tracks for major public consortia as a resource and allows users to submit their own alternative models.
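The StateHub-StatePaintR abstract describes chromatin states defined by combinations of epigenomic marks and ranked using peak statistics. The Python sketch below illustrates that general idea only, under loudly stated assumptions: the state names, the mark rules in the hypothetical `STATE_RULES` table, and the use of summed peak signal as the ranking statistic are all invented for illustration, and `Region`, `assign_state` and `rank_states` are hypothetical helpers, not StatePaintR's API or StateHub's actual models.

```python
from dataclasses import dataclass

# HYPOTHETICAL state model: (state name, marks that must be present,
# marks that must be absent). Real StateHub models are defined differently.
STATE_RULES = [
    ("ActivePromoter", {"H3K4me3", "H3K27ac"}, set()),
    ("ActiveEnhancer", {"H3K27ac", "H3K4me1"}, {"H3K4me3"}),
    ("PoisedEnhancer", {"H3K4me1"}, {"H3K27ac", "H3K4me3"}),
    ("Repressed", {"H3K27me3"}, {"H3K27ac"}),
]


@dataclass
class Region:
    chrom: str
    start: int
    end: int
    marks: dict  # mark name -> peak signal value (hypothetical statistic)


def assign_state(region):
    """Return the first state whose required/forbidden marks match, else None."""
    present = set(region.marks)
    for name, required, forbidden in STATE_RULES:
        if required <= present and not (forbidden & present):
            return name
    return None


def rank_states(regions):
    """Group regions by state and rank each group by summed peak signal (1 = strongest)."""
    groups = {}
    for region in regions:
        state = assign_state(region)
        if state is not None:
            groups.setdefault(state, []).append(region)
    ranked = []
    for state, members in groups.items():
        members.sort(key=lambda r: sum(r.marks.values()), reverse=True)
        for rank, r in enumerate(members, start=1):
            ranked.append((r.chrom, r.start, r.end, state, rank))
    return ranked


if __name__ == "__main__":
    regions = [
        Region("chr1", 1000, 2000, {"H3K4me3": 50.0, "H3K27ac": 30.0}),
        Region("chr1", 5000, 6000, {"H3K27ac": 12.0, "H3K4me1": 8.0}),
        Region("chr1", 9000, 9500, {"H3K27ac": 40.0, "H3K4me1": 25.0}),
    ]
    for row in rank_states(regions):
        print(row)
```

Because the first matching rule wins, the order of `STATE_RULES` matters: a region carrying both H3K4me3 and H3K27ac is labeled as a promoter here rather than as an enhancer.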

F1000Research, 2018, Vol 7, pp. 214
Author(s): Simon G. Coetzee, Zachary Ramjan, Huy Q. Dinh, Benjamin P. Berman, Dennis J. Hazelett

Genome annotation is critical to understanding the function of disease variants, especially for clinical applications. To meet this need, public consortia provide segmentations that reflect various unsupervised approaches to functional annotation based on epigenetic data, but there remains a need for transparent, reproducible, and easily interpreted genomic maps of the functional biology of chromatin. We introduce a new methodological framework for defining a combinatorial epigenomic model of chromatin state on a web database, StateHub. In addition, we created an annotation tool for Bioconductor, StatePaintR, which accesses these models and uses them to rapidly (on the order of seconds) produce chromatin state segmentations in standard genome browser formats. Annotations are fully documented with change history and versioning, authorship information, and original source files. StatePaintR calculates ranks for each state from next-generation sequencing peak statistics, facilitating variant prioritization, enrichment testing, and other types of quantitative analysis. StateHub hosts annotation tracks for major public consortia as a resource and allows users to submit their own alternative models.


2017
Author(s): Simon G. Coetzee, Zachary Ramjan, Huy Q. Dinh, Benjamin P. Berman, Dennis J. Hazelett

Genome annotation is critical to understanding the function of disease variants, especially for clinical applications. To meet this need, public consortia provide segmentations that reflect various unsupervised approaches to functional annotation based on epigenetic data, but there remains a need for transparent, reproducible, and easily interpreted genomic maps of the functional biology of chromatin. We introduce a new methodological framework for defining a combinatorial epigenomic model of chromatin state on a web database, StateHub. In addition, we created an annotation tool for Bioconductor, StatePaintR, which accesses these models and uses them to rapidly (on the order of seconds) produce chromatin state segmentations in standard genome browser formats. Annotations are fully documented with change history and versioning, authorship information, and original source files. StatePaintR calculates ranks for each state from next-generation sequencing peak statistics, facilitating variant prioritization, enrichment testing, and other types of quantitative analysis. StateHub hosts annotation tracks for major public consortia as a resource and allows users to submit their own alternative models.


Author(s): Matteo Chiara, Pietro Mandreoli, Marco Antonio Tangaro, Anna Maria D’Erchia, Sandro Sorrentino, ...

Motivation: Clinical applications of genome re-sequencing technologies typically generate large amounts of data that must be carefully annotated and interpreted to identify genetic variants potentially associated with pathological conditions. In this context, accurate and reproducible methods for the functional annotation and prioritization of genetic variants are of fundamental importance.
Results: In this paper, we present VINYL, a flexible and fully automated system for the functional annotation and prioritization of genetic variants. Extensive analyses of both real and simulated datasets suggest that VINYL can identify clinically relevant genetic variants more accurately than equivalent state-of-the-art methods, allowing more rapid and effective prioritization of genetic variants in different experimental settings. As such, we believe that VINYL can establish itself as a valuable tool to assist healthcare operators and researchers in clinical genomics investigations.
Availability: VINYL is available at http://beaconlab.it/VINYL and https://github.com/matteo14c/VINYL.
Supplementary information: Supplementary data are available at Bioinformatics online.


2021, Vol 17 (10), pp. e1009423
Author(s): Maxwell W. Libbrecht, Rachel C. W. Chan, Michael M. Hoffman

Segmentation and genome annotation (SAGA) algorithms are widely used to understand genome activity and gene regulation. These algorithms take as input epigenomic datasets, such as chromatin immunoprecipitation-sequencing (ChIP-seq) measurements of histone modifications or transcription factor binding. They partition the genome and assign a label to each segment such that positions with the same label exhibit similar patterns of input data. SAGA algorithms discover categories of activity such as promoters, enhancers, or parts of genes without prior knowledge of known genomic elements. In this sense, they generally act in an unsupervised fashion like clustering algorithms, but with the additional simultaneous function of segmenting the genome. Here, we review the common methodological framework that underlies these methods, review variants of and improvements upon this basic framework, and discuss the outlook for future work. This review is intended for those interested in applying SAGA methods and for computational researchers interested in improving upon them.
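The review describes SAGA algorithms as grouping genomic positions with similar input signal while simultaneously segmenting the genome. The Python toy below shows only that basic idea, under stated assumptions: the genome is split into fixed 200 bp bins (a hypothetical bin size), each bin is a vector of signal values, bins are clustered with plain k-means, and runs of identical labels are merged into segments. Real SAGA methods such as ChromHMM and Segway instead model neighboring positions jointly (with hidden Markov models or dynamic Bayesian networks), which this sketch omits; `kmeans` and `segment` are hypothetical helpers.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two signal vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point farthest from every centroid chosen so far
        centroids.append(max(points, key=lambda p: min(sqdist(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        # Assignment step: each bin gets the label of its nearest centroid
        labels = [min(range(len(centroids)), key=lambda c: sqdist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its bins (keep it if empty)
        new = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append(tuple(sum(col) / len(members) for col in zip(*members))
                       if members else old)
        if new == centroids:
            break
        centroids = new
    return labels


def segment(bins, k, bin_size=200):
    """Label bins, then merge runs of identical labels into (start, end, label) segments."""
    labels = kmeans(bins, k)
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current segment at the end of the data or when the label changes
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start * bin_size, i * bin_size, labels[start]))
            start = i
    return segments


if __name__ == "__main__":
    # Hypothetical two-mark signal per bin, e.g. (active mark, repressive mark)
    bins = [(0, 5), (0, 6), (0, 5), (8, 0), (9, 1), (8, 0), (0, 5)]
    print(segment(bins, k=2))  # → [(0, 600, 0), (600, 1200, 1), (1200, 1400, 0)]
```

The integer labels carry no biological meaning by themselves; naming them (for example, promoter or enhancer) requires a separate interpretation step.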


2019
Author(s): Damien Farrell, Joseph Crispell, Stephen V. Gordon

Mycobacterium bovis AF2122/97 is the reference strain for the bovine tuberculosis bacillus. Here we report an update to the M. bovis AF2122/97 genome annotation that reflects 616 new protein identifications, which replace many of the old hypothetical coding sequences and proteins of unknown function in the genome. These changes integrate information from functional assignments of orthologous coding sequences in the Mycobacterium tuberculosis H37Rv genome. We have also added 69 new gene names.


2017, Vol 12 (12), pp. 2478-2492
Author(s): Jason Ernst, Manolis Kellis

2021
Author(s): Surya Saha, Amanda M Cooksey, Anna K Childers, Monica Poelchau, Fiona McCarthy

Sequencing of a diverse array of arthropod genomes is already underway, and these genomes will be used to study human health, agriculture, biodiversity and ecology. These new genomes are intended to serve as community resources and provide the foundational information required to apply omics technologies to a more diverse set of species. However, biologists require genome annotation to use these genomes and derive a better understanding of complex biological systems. Genome annotation incorporates two related but distinct processes: demarcating genes and other elements present in genome sequences (structural annotation), and associating function with genetic elements (functional annotation). While there are well-established and freely available workflows for structural annotation (gene identification) in newly assembled genomes, workflows that provide the functional annotation required to support functional genomics studies are less well understood. Genome-scale functional annotation is required for functional modeling (enrichment, networks, etc.), and a first-pass genome-wide functional annotation effort can rapidly identify under-represented gene sets for focused community annotation efforts. We present an open-source, open-access and containerized pipeline for genome-scale functional annotation of insect proteomes and apply it to a diverse range of arthropod species. We show that the performance of the predictions is consistent across a set of arthropod genomes with varying assembly and annotation quality. Complete instructions for running each component of the functional annotation pipeline on the command line, on a high-performance computing cluster, and in the CyVerse Discovery Environment can be found at the readthedocs site (https://agbase-docs.readthedocs.io/en/latest/agbase/workflow.html).


Molecules, 2021, Vol 26 (17), pp. 5362
Author(s): Anahí Martínez-Cárdenas, Yuridia Cruz-Zamora, Carlos A. Fajardo-Hernández, Rodrigo Villanueva-Silva, Felipe Cruz-García, ...

The marine-facultative Aspergillus sp. MEXU 27854, isolated from Caleta Bay in Acapulco, Guerrero, Mexico, has provided an interesting diversity of secondary metabolites, including a series of rare dioxomorpholines, peptides, and butyrolactones. Here, we report its genomic data, which consist of 11 contigs (N50 ~3.95 Mb) with a total assembly length of ~30.75 Mb. Genome annotation resulted in the prediction of 10,822 putative genes. Functional annotation was accomplished through BLAST searches of the predicted protein sequences against several public databases. Of the predicted genes, 75% were assigned Gene Ontology terms. Of the 67 biosynthetic gene clusters (BGCs) identified, ~60% belong to the NRPS and NRPS-like classes. Putative BGCs for the dioxomorpholines and other metabolites were predicted by extensive genome mining. In addition, metabolomic molecular networking analysis allowed the annotation of all isolated compounds and revealed the biosynthetic potential of this fungus. This work represents the first report of whole-genome sequencing and annotation of a marine-facultative fungal strain isolated from Mexico.
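The Aspergillus abstract summarizes assembly contiguity with an N50 of ~3.95 Mb. As a reminder of how that statistic is usually defined (the length L such that contigs of length L or longer together cover at least half of the total assembly length), here is a small Python sketch. The contig lengths in the example are made up, since the individual contig lengths of MEXU 27854 are not listed here, and `n50` is a hypothetical helper rather than part of any named assembly tool.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are taken longest first; N50 is the length of the contig at which
    the running total first reaches at least half of the assembly length.
    """
    if not lengths:
        raise ValueError("n50() requires at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid rounding issues with odd totals
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths in Mb (not the MEXU 27854 assembly)
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # → 8
```

With a total of 54 Mb in the example, the three longest contigs (10 + 9 + 8 = 27 Mb) reach half of the assembly, so N50 is the length of the third contig.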

