The Nuclear Receptor Superfamily Has Undergone Extensive Proliferation and Diversification in Nematodes

The nuclear receptor (NR) superfamily is the most abundant class of transcriptional regulators encoded in the Caenorhabditis elegans genome, with >200 predicted genes revealed by the screens and analysis of genomic sequence reported here. This is the largest number of NR genes yet described from a single species, although our analysis of available genomic sequence from the related nematode Caenorhabditis briggsae indicates that it also has a large number. Existing data demonstrate expression for 25% of the C. elegans NR sequences. Sequence conservation and statistical arguments suggest that the majority represent functional genes. An analysis of these genes based on the DNA-binding domain motif revealed that several NR classes conserved in both vertebrates and insects are also represented among the nematode genes, consistent with the existence of ancient NR classes shared among most, and perhaps all, metazoans. Most of the nematode NR sequences, however, are distinct from those currently known in other phyla, and reveal a previously unobserved diversity within the NR superfamily. In C. elegans, extensive proliferation and diversification of NR sequences have occurred on chromosome V, accounting for >50% of the predicted NR genes. [The sequence data described in this paper have been submitted to the GenBank data library under accession nos. AF083222–AF083225 and AF083251–AF083234.]

Gene Discovery Using Computational and Microarray Analysis of Transcription in the Drosophila melanogaster Testis

Genome Research ◽

10.1101/gr.159800 ◽

2000 ◽

Vol 10 (12) ◽

pp. 2030-2043

Author(s):

Justen Andrews ◽

Gerard G. Bouffard ◽

Chris Cheadle ◽

Jining Lü ◽

Kevin G. Becker ◽

...

Keyword(s):

Microarray Analysis ◽

Genomic Sequence ◽

Sequence Data ◽

Expression Profiles ◽

Transcript Abundance ◽

cDNA Libraries ◽

Link Type ◽

Microarray Expression ◽

Expressed Sequence ◽

Data Library

Identification and annotation of all the genes in the sequenced Drosophila genome is a work in progress. Wild-type testis function requires many genes and is thus of potentially high value for the identification of transcription units. We therefore undertook a survey of the repertoire of genes expressed in the Drosophila testis by computational and microarray analysis. We generated 3141 high-quality testis expressed sequence tags (ESTs). Testis ESTs computationally collapsed into a set of 1560 cDNAs used for further analysis. Of those, 11% correspond to named genes, and 33% provide biological evidence for a predicted gene. A surprising 47% fail to align with existing ESTs and 16% with predicted genes in the current genome release. EST frequency and microarray expression profiles indicate that the testis mRNA population is highly complex and shows an extended range of transcript abundance. Furthermore, >80% of the genes expressed in the testis showed onefold overexpression relative to ovaries, or gonadectomized flies. Additionally, >3% showed more than threefold overexpression at p < 0.05. Surprisingly, 22% of the genes most highly overexpressed in testis match Drosophila genomic sequence, but not predicted genes. These data strongly support the idea that sequencing additional cDNA libraries from defined tissues, such as testis, will be important tools for refined annotation of the Drosophila genome. Additionally, these data suggest that the number of genes in Drosophila will significantly exceed the conservative estimate of 13,601. [The sequence data described in this paper have been submitted to the dbEST data library under accession nos. AI944400–AI947263 and BE661985–BE662262.] [The microarray data described in this paper have been submitted to the GEO data library under accession nos. GPLS, GSM3–GSM10.]

Integrating genomics into the taxonomy and systematics of the Bacteria and Archaea

INTERNATIONAL JOURNAL OF SYSTEMATIC AND EVOLUTIONARY MICROBIOLOGY ◽

10.1099/ijs.0.054171-0 ◽

2014 ◽

Vol 64 (Pt_2) ◽

pp. 316-324 ◽

Cited By ~ 258

Author(s):

Jongsik Chun ◽

Fred A. Rainey

Keyword(s):

Genomic Sequence ◽

Sequence Data ◽

Original Research ◽

rRNA Gene ◽

New Taxon ◽

Genome Sequences ◽

Microbial World ◽

Content Type ◽

Link Type ◽

Type Strains

The polyphasic approach used today in the taxonomy and systematics of the Bacteria and Archaea includes the use of phenotypic, chemotaxonomic and genotypic data. The use of 16S rRNA gene sequence data has revolutionized our understanding of the microbial world and led to a rapid increase in the number of descriptions of novel taxa, especially at the species level. It has allowed in many cases for the demarcation of taxa into distinct species, but its limitations in a number of groups have resulted in the continued use of DNA–DNA hybridization. As technology has improved, next-generation sequencing (NGS) has provided a rapid and cost-effective approach to obtaining whole-genome sequences of microbial strains. Although some 12 000 bacterial or archaeal genome sequences are available for comparison, only 1725 of these are of actual type strains, limiting the use of genomic data in comparative taxonomic studies when there are nearly 11 000 type strains. Efforts to obtain complete genome sequences of all type strains are critical to the future of microbial systematics. The incorporation of genomics into the taxonomy and systematics of the Bacteria and Archaea coupled with computational advances will boost the credibility of taxonomy in the genomic era. This special issue of International Journal of Systematic and Evolutionary Microbiology contains both original research and review articles covering the use of genomic sequence data in microbial taxonomy and systematics. It includes contributions on specific taxa as well as outlines of approaches for incorporating genomics into new strain isolation to new taxon description workflows.

Isolation of Zebrafish gdf7 and Comparative Genetic Mapping of Genes Belonging to the Growth/Differentiation Factor 5, 6, 7 Subgroup of the TGF-β Superfamily

Genome Research ◽

10.1101/gr.9.2.121 ◽

1999 ◽

Vol 9 (2) ◽

pp. 121-129

Author(s):

Alan J. Davidson ◽

John H. Postlethwait ◽

Yi-Lin Yan ◽

David R. Beier ◽

Cherie van Doren ◽

...

Keyword(s):

Linkage Group ◽

Sequence Data ◽

Evolutionary Relationships ◽

Mapping Data ◽

Link Type ◽

Differentiation Factor ◽

Mammalian Genes ◽

Comparative Genetic Mapping ◽

Growth Differentiation Factor 5 ◽

Data Library

The Growth/differentiation factor (Gdf) 5, 6, 7 genes form a closely related subgroup belonging to the TGF-β superfamily. In zebrafish, there are three genes that belong to the Gdf5, 6, 7 subgroup that have been named radar, dynamo, and contact. The genes radar and dynamo both encode proteins most similar to mouse GDF6. The orthologous identity of these genes on the basis of amino acid similarities has not been clear. We have identified gdf7, a fourth zebrafish gene belonging to the Gdf5, 6, 7 subgroup. To assign correct orthologies and to investigate the evolutionary relationships of the human, mouse, and zebrafish Gdf5, 6, 7 subgroup, we have compared genetic map positions of the zebrafish and mammalian genes. We have mapped zebrafish gdf7 to linkage group (LG) 17, contact to LG9, GDF6 to human chromosome (Hsa) 8 and GDF7 to Hsa2p. The radar and dynamo genes have been localized previously to LG16 and LG19, respectively. A comparison of syntenies shared among human, mouse, and zebrafish genomes indicates that gdf7 is the ortholog of mammalian GDF7/Gdf7. LG16 shares syntenic relationships with mouse chromosome (Mmu) 4, including Gdf6. Portions of LG16 and LG19 appear to be duplicate chromosomes, thus suggesting that radar and dynamo are both orthologs of Gdf6. Finally, the mapping data is consistent with contact being the zebrafish ortholog of mammalian GDF5/Gdf5. [The sequence data described in this paper have been submitted to the GenBank data library under accession numbers AF113022 and AF113023.]

Comparative Sequence of Human and Mouse BAC Clones from the mnd2 Region of Chromosome 2p13

Genome Research ◽

10.1101/gr.9.1.53 ◽

1999 ◽

Vol 9 (1) ◽

pp. 53-61 ◽

Cited By ~ 9

Author(s):

Wonhee Jang ◽

Axin Hua ◽

Sandra V. Spilson ◽

Webb Miller ◽

Bruce A. Roe ◽

...

Keyword(s):

Genomic DNA ◽

Genomic Sequence ◽

Sequence Data ◽

Lysyl Oxidase ◽

Neuromuscular Disorder ◽

BAC Clone ◽

Link Type ◽

Sequence Elements ◽

Human And Mouse ◽

Mouse Genomic

The mnd2 mutation on mouse chromosome 6 produces a progressive neuromuscular disorder. To determine the gene content of the 400-kb mnd2 nonrecombinant region, we sequenced 108 kb of mouse genomic DNA and 92 kb of human genomic sequence from the corresponding region of chromosome 2p13.3. Three genes with the indicated sizes and intergenic distances were identified: D6Mm5e (⩾81 kb)–787 bp–DOK (2 kb)–845 bp–LOR2 (⩾6 kb). D6Mm5e is expressed in many tissues at very low abundance and the predicted 526-residue protein contains no known functional domains. DOK encodes the p62dok rasGAP binding protein involved in signal transduction. LOR2 encodes a novel lysyl oxidase-related protein of 757 amino acid residues. We describe a simple search protocol for identification of conserved internal exons in genomic sequence. Evolutionary conservation proved to be a useful criterion for distinguishing between authentic exons and artifactual products obtained by exon amplification, RT–PCR, and 5′ RACE. Conserved noncoding sequence elements longer than 80 bp with ⩾75% nucleotide sequence identity comprise ∼1% of the genomic sequence in this region. Comparative analysis of this human and mouse genomic DNA sequence was an efficient method for gene identification and is independent of developmental stage or quantitative level of gene expression. [The sequence data described in this paper have been submitted to the GenBank data library under the following accession numbers: AC003061, mouse BAC clone 245c12; AC003065, human BAC clone h173(E10); AF053368, mouse Lor2 cDNA; AF084363, 108-kb contig from mouse BAC 245c12; AF084364, mouse D6Mm5e cDNA.]
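
The conservation threshold quoted in this abstract (elements of at least 80 bp with at least 75% identity) can be illustrated with a short sliding-window scan. This is a minimal sketch and not the search protocol used in the paper: it assumes the human and mouse sequences have already been aligned into two equal-length strings with "-" for gaps, and the function name `conserved_windows` and its parameters are hypothetical.

```python
def conserved_windows(aln_a, aln_b, window=80, min_identity=0.75):
    """Return merged (start, end) column ranges of a pairwise alignment in
    which every `window`-column stretch has identity >= `min_identity`.

    Assumption: aln_a and aln_b are pre-aligned, equal-length strings that
    use '-' for gaps. Gap columns are counted as mismatches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")

    n = len(aln_a)
    # 1 where both rows carry the same base, 0 otherwise (including gaps).
    matches = [
        1 if a == b and a != "-" else 0
        for a, b in zip(aln_a.upper(), aln_b.upper())
    ]

    regions = []
    if n < window:
        return regions

    running = sum(matches[:window])  # matches inside the current window
    start = 0
    while True:
        if running / window >= min_identity:
            end = start + window
            # Merge with the previous region when the windows overlap or touch.
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
        if start + window >= n:
            break
        # Slide the window one column to the right.
        running += matches[start + window] - matches[start]
        start += 1
    return regions
```

Ranges are half-open alignment columns. Restricting the hits to noncoding sequence, as the abstract does, would need a separate annotation step that is not shown here.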

The Complex Repeats of Dictyostelium discoideum

Genome Research ◽

10.1101/gr.162201 ◽

2001 ◽

Vol 11 (4) ◽

pp. 585-594

Author(s):

Gernot Glöckner ◽

Karol Szafranski ◽

Thomas Winckler ◽

Theodor Dingermann ◽

Michael A. Quail ◽

...

Keyword(s):

Transposable Elements ◽

Dictyostelium Discoideum ◽

Copy Number ◽

Sequence Data ◽

Repetitive Elements ◽

Data Resource ◽

Valuable Data ◽

Link Type ◽

Small Complex ◽

Data Library

In the course of determining the sequence of the Dictyostelium discoideum genome we have characterized in detail the quantity and nature of interspersed repetitive elements present in this species. Several of the most abundant small complex repeats and transposons (DIRS-1; TRE3-A,B; TRE5-A; skipper; Tdd-4; H3R) have been described previously. In our analysis we have identified additional elements. Thus, we can now present a complete list of complex repetitive elements in D. discoideum. All elements add up to 10% of the genome. Some of the newly described elements belong to established classes (TRE3-C,D; TRE5-B,C; DGLT-A,P; Tdd-5). However, we have also defined two new classes of DNA transposable elements (DDT and thug) that have not been described thus far. Based on the nucleotide amount, we calculated the least copy number in each family. These vary from <10 to >200 copies. Unique sequences adjacent to the element ends and truncation points in elements gave a measure for the fragmentation of the elements. Furthermore, we describe the diversity of single elements with regard to polymorphisms and conserved structures. All elements show insertion preference into loci in which other elements of the same family reside. The analysis of the complex repeats is a valuable data resource for the ongoing assembly of whole D. discoideum chromosomes. [The sequence data described in this paper have been submitted to the GenBank data library under accession nos. AF135841, AF298201, AF298202, AF298203, AF298204, AF298205, AF298206, AF298207, AF298208, AF298209, AF298210 and AF298624.]
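
One plausible reading of the "least copy number" estimate mentioned in this abstract is a lower bound: the fewest full-length elements that could account for all nucleotides assigned to a family. The sketch below illustrates that reading only. The abstract does not give the authors' formula, so the function name, its parameters, and the rounding-up rule are assumptions.

```python
def least_copy_number(total_family_bases: int, full_length_bp: int) -> int:
    """Lower bound on the copy number of a repeat family.

    Assumption: every copy contributes at most `full_length_bp` bases, so
    the total bases divided by the full element length, rounded up, is the
    smallest number of copies that can explain the observed bases.
    """
    if total_family_bases < 0 or full_length_bp <= 0:
        raise ValueError("base counts must be non-negative and length positive")
    # Ceiling division using integers only.
    return -(-total_family_bases // full_length_bp)
```

Fragmented or truncated copies would push the true count above this bound, which is consistent with the abstract treating fragmentation as a separate measurement.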

Enabling Precision Medicine via standard communication of HTS provenance, analysis, and results

10.1101/191783 ◽

2017 ◽

Author(s):

Gil Alterovitz ◽

Dennis Dean ◽

Carole Goble ◽

Michael R. Crusoe ◽

Stian Soiland-Reyes ◽

...

Keyword(s):

Precision Medicine ◽

High Throughput Sequencing ◽

Genomic Sequence ◽

Sequence Data ◽

Data Provenance ◽

Provenance Analysis ◽

Link Type ◽

Sequencing Studies ◽

Standardized Reporting ◽

Personalized Approach

A personalized approach based on a patient’s or pathogen’s unique genomic sequence is the foundation of precision medicine. Genomic findings must be robust and reproducible, and experimental data capture should adhere to FAIR guiding principles. Moreover, effective precision medicine requires standardized reporting that extends beyond wet lab procedures to computational methods. The BioCompute framework (https://osf.io/zm97b/) enables standardized reporting of genomic sequence data provenance, including provenance domain, usability domain, execution domain, verification kit, and error domain. This framework facilitates communication and promotes interoperability. Bioinformatics computation instances that employ the BioCompute framework are easily relayed, repeated if needed and compared by scientists, regulators, test developers, and clinicians. Easing the burden of performing the aforementioned tasks greatly extends the range of practical application. Large clinical trials, precision medicine, and regulatory submissions require a set of agreed upon standards that ensures efficient communication and documentation of genomic analyses. The BioCompute paradigm and the resulting BioCompute Objects (BCO) offer that standard, and are freely accessible as a GitHub organization (https://github.com/biocompute-objects) following the “Open-Stand.org principles for collaborative open standards development”. By communication of high-throughput sequencing studies using a BCO, regulatory agencies (e.g., FDA), diagnostic test developers, researchers, and clinicians can expand collaboration to drive innovation in precision medicine, potentially decreasing the time and cost associated with next generation sequencing workflow exchange, reporting, and regulatory reviews.
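
The five reporting domains named in this abstract can be pictured as fields of a single record. The sketch below is illustrative only and is not the BioCompute Object schema: the class name `ProvenanceRecord`, the field types, and the helper methods are hypothetical, and only the five domain names come from the abstract.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ProvenanceRecord:
    """Hypothetical container with one field per domain named in the abstract."""
    provenance_domain: dict = field(default_factory=dict)
    usability_domain: list = field(default_factory=list)
    execution_domain: dict = field(default_factory=dict)
    verification_kit: dict = field(default_factory=dict)
    error_domain: dict = field(default_factory=dict)

    def missing_domains(self):
        """Names of domains that are still empty, in declaration order."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self):
        """Serialize the record so it can be exchanged as plain text."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Example values below are made up for illustration.
record = ProvenanceRecord(
    provenance_domain={"name": "example variant-calling run"},
    usability_domain=["Illustrative record for a sequencing analysis"],
)
print(record.missing_domains())
```

A completeness check of this kind is one way a standardized report could be validated before exchange. The real specification at the URLs cited in the abstract should be consulted for the actual field definitions.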

Genomes of the class Erysipelotrichia clarify the firmicute origin of the class Mollicutes

INTERNATIONAL JOURNAL OF SYSTEMATIC AND EVOLUTIONARY MICROBIOLOGY ◽

10.1099/ijs.0.048983-0 ◽

2013 ◽

Vol 63 (Pt_7) ◽

pp. 2727-2741 ◽

Cited By ~ 28

Author(s):

James J. Davis ◽

Fangfang Xia ◽

Ross A. Overbeek ◽

Gary J. Olsen

Keyword(s):

Ribosomal Proteins ◽

Genomic Sequence ◽

Sequence Data ◽

Taxonomic Revision ◽

tRNA Synthetase ◽

Microbial Evolution ◽

23S rRNA ◽

Content Type ◽

Link Type ◽

Metabolic Functions

The tree of life is paramount for achieving an integrated understanding of microbial evolution and the relationships between physiology, genealogy and genomics. It provides the framework for interpreting environmental sequence data, whether applied to microbial ecology or to human health. However, there remain many instances where there is ambiguity in our understanding of the phylogeny of major lineages, and/or confounding nomenclature. Here we apply recent genomic sequence data to examine the evolutionary history of members of the classes Mollicutes (phylum Tenericutes) and Erysipelotrichia (phylum Firmicutes). Consistent with previous analyses, we find evidence of a specific relationship between them in molecular phylogenies and signatures of the 16S rRNA, 23S rRNA, ribosomal proteins and aminoacyl-tRNA synthetase proteins. Furthermore, by mapping functions over the phylogenetic tree we find that the erysipelotrichia lineages are involved in various stages of genomic reduction, having lost (often repeatedly) a variety of metabolic functions and the ability to form endospores. Although molecular phylogeny has driven numerous taxonomic revisions, we find it puzzling that the most recent taxonomic revision of the phyla Firmicutes and Tenericutes has further separated them into distinct phyla, rather than reflecting their common roots.

rMAP: the Rapid Microbial Analysis Pipeline for ESKAPE bacterial group whole-genome sequence data

Microbial Genomics ◽

10.1099/mgen.0.000583 ◽

2021 ◽

Vol 7 (6) ◽

Author(s):

Ivan Sserwadda ◽

Gerald Mboowa

Keyword(s):

Low Income ◽

Type Species ◽

Genomic Sequence ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Analysis Pipeline ◽

Content Type ◽

Link Type ◽

Microbial Analysis

The recent re-emergence of multidrug-resistant pathogens has exacerbated their threat to worldwide public health. The evolution of the genomics era has led to the generation of huge volumes of sequencing data at an unprecedented rate due to the ever-reducing costs of whole-genome sequencing (WGS). We have developed the Rapid Microbial Analysis Pipeline (rMAP), a user-friendly pipeline capable of profiling the resistomes of ESKAPE pathogens (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa and Enterobacter species) using WGS data generated from Illumina’s sequencing platforms. rMAP is designed for individuals with little bioinformatics expertise, and automates the steps required for WGS analysis directly from the raw genomic sequence data, including adapter and low-quality sequence read trimming, de novo genome assembly, genome annotation, single-nucleotide polymorphism (SNP) variant calling, phylogenetic inference by maximum likelihood, antimicrobial resistance (AMR) profiling, plasmid profiling, virulence factor determination, multi-locus sequence typing (MLST), pangenome analysis and insertion sequence (IS) characterization. Once the analysis is finished, rMAP generates an interactive web-like HTML report. rMAP installation is very simple, and it can be run using very simple commands. It represents a rapid and easy way to perform comprehensive bacterial WGS analysis using a personal laptop in low-income settings where high-performance computing infrastructure is limited.

Comparative Sequence Analysis of Human Minisatellites Showing Meiotic Repeat Instability

Genome Research ◽

10.1101/gr.9.2.130 ◽

1999 ◽

Vol 9 (2) ◽

pp. 130-136 ◽

Cited By ~ 4

Author(s):

John Murray ◽

Jérôme Buard ◽

David L. Neil ◽

Edouard Yeramian ◽

Keiji Tamaki ◽

...

Keyword(s):

Sequence Analysis ◽

Sequence Data ◽

Comparative Sequence Analysis ◽

Gene Promoters ◽

Repeat Instability ◽

Repeat Array ◽

Comparative Sequence ◽

Link Type ◽

Genomic Environment ◽

Data Library

The highly variable human minisatellites MS32 (D1S8), MS31A (D7S21), and CEB1 (D2S90) all show recombination-based repeat instability restricted to the germline. Mutation usually results in polar interallelic conversion or occasionally in crossovers, which, at MS32 at least, extend into DNA flanking the repeat array, defining a localized recombination hotspot and suggesting that cis-acting elements in flanking DNA can influence repeat instability. Therefore, comparative sequence analysis was performed to search for common flanking elements associated with these unstable loci. All three minisatellites are located in GC-rich DNA abundant in dispersed and tandem repetitive elements. There were no significant sequence similarities between different loci upstream of the unstable end of the repeat array. Only one of the three loci showed clear evidence for putative coding sequences near the minisatellite. No consistent patterns of thermal stability or DNA secondary structure were shared by DNA flanking these loci. This work extends previous data on the genomic environment of minisatellites. In addition, this work suggests that recombinational activity is not controlled by primary or secondary characteristics of the DNA sequence flanking the repeat array and is not obviously associated with gene promoters as seen in yeast. [The sequence data described in this paper have been submitted to the GenBank data library under accession nos. AF048727 (CEB1), AF048728 (MS31A), and AF048729 (MS32).]

Finding New Human Minisatellite Sequences in the Vicinity of Long CA-Rich Sequences

Genome Research ◽

10.1101/gr.9.7.647 ◽

1999 ◽

Vol 9 (7) ◽

pp. 647-653 ◽

Cited By ~ 1

Author(s):

Fabienne Giraudeau ◽

Elisabeth Petit ◽

Hervé Avet-Loiseau ◽

Yolande Hauck ◽

Gilles Vergnaud ◽

...

Keyword(s):

Human Genome ◽

Sequence Data ◽

Chromosome 1 ◽

Chromosomal Distribution ◽

Repeat Sequences ◽

Link Type ◽

Tandem Repeat Sequences ◽

Sequences Analysis ◽

Data Library ◽

Chromosomal Bands

Microsatellites and minisatellites are two classes of tandem repeat sequences differing in their size, mutation processes, and chromosomal distribution. The boundary between the two classes is not defined. We have developed a convenient, hybridization-based human library screening procedure able to detect long CA-rich sequences. Analysis of cosmid clones derived from a chromosome 1 library shows that cross-hybridizing sequences tested are imperfect CA-rich sequences, some of them showing a minisatellite organization. All but one of the 13 positive chromosome 1 clones studied are localized in chromosomal bands to which minisatellites have previously been assigned, such as the 1pter cluster. To test the applicability of the procedure to minisatellite detection on a larger scale, we then used a large-insert whole-genome PAC library. Altogether, 22 new minisatellites have been identified in positive PAC and cosmid clones and 20 of them are telomeric. Among the 42 positive PAC clones localized within the human genome by FISH and/or linkage analysis, 25 (60%) are assigned to a terminal band of the karyotype, 4 (9%) are juxtacentromeric, and 13 (31%) are interstitial. The localization of at least two of the interstitial PAC clones corresponds to previously characterized minisatellite-containing regions and/or ancestrally telomeric bands, in agreement with this minisatellite-like distribution. The data obtained are in close agreement with the parallel investigation of human genome sequence data and suggest that long human (CA)s are imperfect CA repeats belonging to the minisatellite class of sequences. This approach provides a new tool to efficiently target genomic clones originating from subtelomeric domains, from which minisatellite sequences can readily be obtained. [The sequence data described in this paper have been submitted to the EMBL data library under accession nos. AJ000377–AJ000383.]
