MADAP, a flexible clustering tool for the interpretation of one-dimensional genome annotation data

Abstract. Motivation: Potential transcription factor (TF) complexes may be identified by testing whether the binding sequences of individual TF proteins form clusters with each other. These clusters may also indicate TF inhibition due to competitive occupancy of enhancer regions. Genome annotation data containing the coordinates of enhancer sequences is highly accessible via position-weight matrix tools. Results: An algorithm called CCSeq (Clusters of Colocalized Sequences) was developed for identifying clusters of sequences along a one-dimensional line, such as a chromosome, given genome annotation files and a cut-off distance as inputs. The algorithm was applied to the binding sequences of the constituent proteins of two known transcription factor complexes, the HSF1 homotrimer and one form of the NF-κB complex, a dimer of NFKB2 and RELB. A total of 28 clusters of HSF1 trimer binding sequences were identified on chromosome Y, and 16 clusters of the NFKB2 and RELB dimer were identified on chromosome 17, compared to 0 clusters identified in any of the five simulated random distributions for each of the two sets of TF proteins. Additionally, structural patterns of these binding sequence clusters are described. Availability and Implementation: This algorithm is freely available as an R package on the open source R repository CRAN at the following link: https://cran.r-project.org/package=colocalized. Genome annotation files were obtained from the PWMScan tool at https://ccg.epfl.ch/pwmtools/pwmscan.php hosted by the Swiss Institute of Bioinformatics (2) (3).
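
The idea of clustering binding sites along a one-dimensional coordinate with a cut-off distance can be illustrated with a minimal Python sketch. This is not the CCSeq implementation from the R package; the function name `colocalized_clusters`, the input layout (a dict of site coordinates per protein) and the single-linkage chaining rule are all assumptions made for illustration, and the sketch only covers complexes of distinct proteins (such as NFKB2 and RELB), not homotrimers.

```python
from typing import Dict, List, Tuple

Site = Tuple[int, str]  # (coordinate, protein label)

def colocalized_clusters(sites: Dict[str, List[int]], cutoff: int) -> List[List[Site]]:
    """Chain neighbouring sites on a line and keep groups containing every protein.

    Assumption: two consecutive sites belong to the same group when the gap
    between their coordinates is at most `cutoff`.
    """
    # Merge all sites into one list sorted by coordinate.
    merged = sorted((pos, label) for label, positions in sites.items() for pos in positions)

    groups: List[List[Site]] = []
    current: List[Site] = []
    for site in merged:
        # A gap larger than the cutoff closes the current group.
        if current and site[0] - current[-1][0] > cutoff:
            groups.append(current)
            current = []
        current.append(site)
    if current:
        groups.append(current)

    # Keep only groups in which every input protein is represented.
    required = set(sites)
    return [g for g in groups if {label for _, label in g} == required]
```

For example, with hypothetical coordinates `{"NFKB2": [100, 5000], "RELB": [130, 9000]}` and a cut-off of 50, only the pair at 100 and 130 forms a cluster; the isolated sites at 5000 and 9000 are discarded.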

AnnoGen: annotating genome-wide pragmatic features

Bioinformatics ◽

10.1093/bioinformatics/btaa027 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2899-2901

Author(s):

Quanhu Sheng ◽

Hui Yu ◽

Olufunmilola Oyebamiji ◽

Jiandong Wang ◽

Danqian Chen ◽

...

Keyword(s):

Genome Annotation ◽

Reference Genome ◽

Bioinformatics Analysis ◽

Sequence Information ◽

Genomic Features ◽

Single Base ◽

Annotation Data ◽

Genome Wide ◽

Genomic Regions ◽

First Time

Abstract. Motivation: Genome annotation is an important step for all in-depth bioinformatics analysis. It is imperative to augment the quantity and diversity of genome-wide annotation data for the latest reference genome to promote its adoption by ongoing and future impactful studies. Results: We developed a Python toolkit, AnnoGen, which for the first time allows the annotation of three pragmatic genomic features for the GRCh38 genome in enormous base-wise quantities. The three features are chemical binding Energy, sequence information Entropy and Homology Score. The Homology Score is an exceptional feature that captures the genome-wide homology through single-base-offset tiling windows of 100 consecutive nucleotide bases. AnnoGen is capable of annotating the proprietary pragmatic features for variable user-interested genomic regions and optionally comparing two parallel sets of genomic regions. AnnoGen is characterized by simple utility modes and a succinct HTML report of informative statistical tables and plots. Availability and implementation: https://github.com/shengqh/annogen.
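
To make the notion of a base-wise feature computed over single-base-offset windows concrete, here is a small Python sketch. It is not AnnoGen code: the abstract does not define how its Entropy feature is calculated, so the use of Shannon entropy of base composition, the function names and the default window width of 100 are assumptions chosen purely for illustration.

```python
import math
from collections import Counter
from typing import List

def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the base composition of one window."""
    counts = Counter(window)
    n = len(window)
    # Sum of p * log2(1/p) over the observed bases, with p = c / n.
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def windowed_entropy(seq: str, width: int = 100) -> List[float]:
    """Entropy of every window of `width` bases, sliding one base at a time."""
    return [shannon_entropy(seq[i:i + width]) for i in range(len(seq) - width + 1)]
```

Under these assumptions a window made of a single repeated base scores 0 bits and a window with all four bases in equal proportion scores 2 bits, so low values flag low-complexity sequence.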

Relational database index choices for genome annotation data

2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW) ◽

10.1109/bibmw.2010.5703810 ◽

2010 ◽

Cited By ~ 3

Author(s):

Oleksiy Karpenko ◽

Yang Dai

Keyword(s):

Relational Database ◽

Genome Annotation ◽

Annotation Data ◽

Database Index

Using Chado to Store Genome Annotation Data

Current Protocols in Bioinformatics ◽

10.1002/0471250953.bi0906s12 ◽

2006 ◽

Cited By ~ 7

Author(s):

Pinglei Zhou ◽

David Emmert ◽

Peili Zhang

Keyword(s):

Genome Annotation ◽

Annotation Data

GFF3sort: a novel tool to sort GFF3 files for tabix indexing

10.1101/145938 ◽

2017 ◽

Author(s):

Tao Zhu ◽

Chengzhen Liang ◽

Zhigang Meng ◽

Sandui Guo ◽

Rui Zhang

Keyword(s):

Data Processing ◽

Genome Annotation ◽

Traditional Method ◽

Gene Annotation ◽

Conversion Process ◽

Start Position ◽

Annotation Data ◽

Novel Strategy ◽

Parent Child

Abstract. Background: The traditional method of visualizing gene annotation data in JBrowse is converting GFF3 files to JSON format, which is time-consuming. The latest version of JBrowse supports rendering sorted GFF3 files indexed by tabix, a novel strategy that is more convenient than the original conversion process. However, current tools available for GFF3 file sorting have some limitations and their sorting results would lead to erroneous rendering in JBrowse. Results: We developed GFF3sort, a script to sort GFF3 files for tabix indexing. Specifically designed for JBrowse rendering, GFF3sort can properly deal with the order of features that have the same chromosome and start position, either by remembering their original orders or by conducting parent-child topology sorting. Based on our test datasets from seven species, GFF3sort produced accurate sorting results with acceptable efficiency compared with currently available tools. Conclusions: GFF3sort is a novel tool to sort GFF3 files for tabix indexing. We anticipate that GFF3sort will be useful to help with genome annotation data processing and visualization.
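
The two tie-breaking strategies mentioned in the abstract (keeping the original order, or ordering parents before children) can be sketched in Python as follows. This is not the GFF3sort script itself; the function names are hypothetical, and the sketch assumes nine-column tab-separated GFF3 lines without trailing newlines, `ID`/`Parent` attributes in column 9, and no cyclic parent relationships.

```python
from typing import Dict, List

def parse_attrs(field: str) -> Dict[str, str]:
    """Parse a GFF3 column-9 string such as 'ID=m1;Parent=g1' into a dict."""
    return dict(item.split("=", 1) for item in field.split(";") if "=" in item)

def depth(attrs: Dict[str, str], by_id: Dict[str, Dict[str, str]]) -> int:
    """Number of ancestor levels a feature has within its tie group."""
    parents = [p for p in attrs.get("Parent", "").split(",") if p in by_id]
    return 0 if not parents else 1 + max(depth(by_id[p], by_id) for p in parents)

def sort_gff3(lines: List[str]) -> List[str]:
    headers = [ln for ln in lines if ln.startswith("#")]
    feats = [ln.split("\t") for ln in lines if ln.strip() and not ln.startswith("#")]
    # Python's sort is stable, so ties keep their original relative order.
    feats.sort(key=lambda f: (f[0], int(f[3])))

    out: List[List[str]] = []
    i = 0
    while i < len(feats):
        # Collect the run of features sharing chromosome and start position.
        j = i
        while j < len(feats) and (feats[j][0], int(feats[j][3])) == (feats[i][0], int(feats[i][3])):
            j += 1
        group = feats[i:j]
        attrs = [parse_attrs(f[8]) for f in group]
        by_id = {a["ID"]: a for a in attrs if "ID" in a}
        # Within the run, place parents before their children (stable by depth).
        order = sorted(range(len(group)), key=lambda k: depth(attrs[k], by_id))
        out.extend(group[k] for k in order)
        i = j
    return headers + ["\t".join(f) for f in out]
```

With this ordering, a gene, its mRNA and the first exon that all start at the same coordinate come out as gene, mRNA, exon regardless of how they were ordered in the input.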

MetAlgNet: Metabolic pathway network reconstruction from algae genome annotation data

10.3390/mol2net-1-f003 ◽

2015 ◽

Author(s):

Kirtan Dave ◽

Darshan Choksi ◽

Hetalkumar Panchal

Keyword(s):

Metabolic Pathway ◽

Genome Annotation ◽

Network Reconstruction ◽

Pathway Network ◽

Annotation Data ◽

Metabolic Pathway Network

A Statistical Framework to Predict Functional Non-Coding Regions in the Human Genome Through Integrated Analysis of Annotation Data

10.1101/018093 ◽

2015 ◽

Cited By ~ 1

Author(s):

Qiongshi Lu ◽

Yiming Hu ◽

Jiehuan Sun ◽

Yuwei Cheng ◽

Kei-Hoi Cheung ◽

...

Keyword(s):

Human Genome ◽

Genome Annotation ◽

Human Genetics ◽

Integrated Analysis ◽

Whole Genome ◽

Coding Regions ◽

Functional Regions ◽

Statistical Framework ◽

Annotation Data ◽

High Throughput Experiments

Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome either through computational predictions, such as genomic conservation, or high-throughput experiments, such as the ENCODE project. These efforts have resulted in a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations, thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. Its ability to predict functional regions, together with its generalizable statistical framework, makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at http://genocanyon.med.yale.edu

A one-dimensional self-gravitating stellar gas

Symposium - International Astronomical Union ◽

10.1017/s0074180900105261 ◽

1966 ◽

Vol 25 ◽

pp. 46-48 ◽

Cited By ~ 2

Author(s):

M. Lecar

Keyword(s):

Gravitational Field ◽

Phase Space ◽

Motion Picture ◽

Space Distribution ◽

Two Dimensional ◽

One Dimensional ◽

Phase Space Distribution ◽

Dimensional Phase Space ◽

The Mean

“Dynamical mixing”, i.e. relaxation of a stellar phase space distribution through interaction with the mean gravitational field, is numerically investigated for a one-dimensional self-gravitating stellar gas. Qualitative results are presented in the form of a motion picture of the flow of phase points (representing homogeneous slabs of stars) in two-dimensional phase space.

Electron-Mirror Microscopic Aspects of Ferroelectric Domains of BaTiO3 and Ca2Sr(C2H5CO2)6

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100069387 ◽

1970 ◽

Vol 28 ◽

pp. 478-479

Author(s):

Teruo Someya ◽

Jinzo Kobayashi

Keyword(s):

Surface Layer ◽

Electric Fields ◽

Quantitative Study ◽

Recent Progress ◽

Ferroelectric Domain ◽

Resolving Power ◽

Surface Pattern ◽

Crystal Surfaces ◽

One Dimensional ◽

Ferroelectric Domains

Recent progress in electron-mirror microscopy (EMM), e.g., an improvement of its resolving power together with an increase in magnification, makes it useful for investigating the physics of ferroelectric domains. English has recently observed the domain texture in the surface layer of BaTiO3. The present authors have developed a theory by which one can evaluate small one-dimensional electric fields and/or topographic step heights in the crystal surfaces from their EMM pictures. This theory was applied to a quantitative study of the surface pattern of BaTiO3.
