Comparative Sequence Analysis of the X-Inactivation Center Region in Mouse, Human, and Bovine

We have sequenced, to a high level of accuracy, 714-kb and 233-kb regions of the mouse and bovine X-inactivation centers (Xic), respectively, centered on the Xist gene. This has provided the basis for a fully annotated comparative analysis of the mouse Xic with the 2.3-Mb orthologous region in human and has allowed a three-way species comparison of the core central region, including the Xist gene. These comparisons have revealed conserved genes, both coding and noncoding, conserved CpG islands and, more surprisingly, conserved pseudogenes. The distribution of repeated elements, especially LINE repeats, in the mouse Xic region compared with the rest of the genome does not support the hypothesis that these repeat elements play a role in the spreading of X inactivation. Interestingly, an asymmetric distribution of LINE elements on the two DNA strands was observed in all three species, not only within introns but also in intergenic regions. This feature suggests significant transcriptional activity within these intergenic regions. In silico prediction followed by experimental analysis identified four new genes (Cnbp2, Ftx, Jpx, and Ppnx) and characterized novel, widespread, complex, and apparently noncoding transcriptional activity in a region 5′ of Xist that was recently shown to attract histone modification early after the onset of X inactivation.[The sequence data described in this paper have been submitted to the EMBL data library under accession nos. AJ421478, AJ421479, AJ421480, and AJ421481. Online supplemental data are available at http://pbil.univ-lyon1.fr/datasets/Xic2002/data.html and www.genome.org.]
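The abstract reports an asymmetric distribution of LINE elements between the two DNA strands but does not say which statistic was used. A minimal sketch of one way to test such an asymmetry is an exact two-sided binomial test against a 50:50 expectation, assuming each element has already been assigned a strand label ('+' or '-') relative to some reference orientation (for example, the transcriptional direction of the enclosing gene). The function name `strand_asymmetry_pvalue` and its input format are hypothetical, not taken from the paper.

```python
from math import comb


def strand_asymmetry_pvalue(strands):
    """Exact two-sided binomial test of '+' vs '-' counts against p = 0.5.

    `strands` is any iterable of '+'/'-' labels, one per repeat element
    (hypothetical input format; the paper does not describe its method).
    Labels other than '+' or '-' are ignored.
    """
    plus = sum(1 for s in strands if s == "+")
    minus = sum(1 for s in strands if s == "-")
    n = plus + minus
    if n == 0:
        raise ValueError("no strand-annotated elements")
    # Under the null hypothesis each element is equally likely to lie on
    # either strand, so counts follow Binomial(n, 0.5): P(k) = C(n, k) / 2**n.
    k = max(plus, minus)
    # Upper-tail probability of a split at least this lopsided; the null
    # distribution is symmetric, so doubling gives the two-sided p-value.
    tail = sum(comb(n, i) for i in range(k, n + 1))
    return min(1.0, 2 * tail / 2 ** n)
```

For example, `strand_asymmetry_pvalue("+" * 9 + "-")` evaluates 10 hypothetical elements with a 9:1 split and returns 22/1024 ≈ 0.021. An exact test is used here rather than a normal approximation because element counts within a single intron or intergenic interval can be small; comparing against a genome-wide background strand ratio instead of 0.5 would require a different null proportion.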


Retroposed New Genes Out of the X in Drosophila

Genome Research, 2002, Vol 12 (12), pp. 1854-1859. doi:10.1101/gr.604902

Author(s): Esther Betrán, Kevin Thornton, Manyuan Long

Keyword(s): Population Genetics, Molecular Mechanisms, Sequence Data, Evolutionary Process, Significant Excess, New Genes, Asymmetric Pattern, Unpublished Information

New genes that originated by various molecular mechanisms are an essential component in understanding the evolution of genetic systems. We investigated the pattern of origin of genes created by retroposition in Drosophila. We surveyed the whole Drosophila melanogaster genome for such new retrogenes and experimentally analyzed their functionality and evolutionary process. These retrogenes, shown to be functional by analyses of expression, substitution, and population genetics, show a surprisingly asymmetric pattern of origin. There is a significant excess of retrogenes that originate from the X chromosome and retropose to autosomes; new genes retroposed from autosomes are scarce. Further, we found that most of these X-derived autosomal retrogenes had evolved a testis expression pattern. These observations may be explained by natural selection favoring new retrogenes that moved to autosomes and thereby avoided X inactivation during spermatogenesis, and they suggest that genomic position plays an important role in the origin of new genes.[The sequence data from this study have been submitted to GenBank under accession nos. AY150701–AY150797. The following individuals kindly provided reagents, samples, or unpublished information as indicated in the paper: M.-L. Wu, F. Lemeunier, and P. Gibert.]


A Random Sequencing Approach for the Analysis of the Trypanosoma cruzi Genome: General Structure, Large Gene and Repetitive DNA Families, and Gene Discovery

Genome Research, 2000, Vol 10 (12), pp. 1996-2005. doi:10.1101/gr.146300

Author(s): Fernán Agüero, Ramiro E. Verdún, Alberto Carlos C. Frasch, Daniel O. Sánchez

Keyword(s): Trypanosoma cruzi, Repetitive DNA, Sequence Data, Random Sequence, General Structure, GC Content, Haploid Genome, Additional Information, New Genes

A random sequence survey of the genome of Trypanosoma cruzi, the agent of Chagas disease, was performed, and 11,459 genomic sequences were obtained, yielding ∼4.3 Mb of readable sequence, or ∼10% of the parasite haploid genome. The estimated total GC content was 50.9%, with a high representation of A and T di- and trinucleotide repeats. Of the estimated 5000 parasite genes, 947 putative new genes were identified. Another 1723 sequences corresponded to genes detected previously in T. cruzi through expressed sequence tag analysis. A further 7735 sequences had no matches in the database, but the presence of open reading frames that passed Fickett's test suggests that some of them might contain coding DNA. The survey was highly redundant, with ∼35% of the sequences included in a few large sequence families. Some of these families code for protein families present in dozens of copies, including proteins essential for parasite survival and retrotransposons. Other sequence families include repetitive DNA present in thousands of copies per haploid genome. Some families in the latter group are new, parasite-specific repetitive DNAs. These results suggest that T. cruzi could constitute an interesting model for analyzing gene and genome evolution because of its plasticity in terms of sequence amplification and divergence. Additional information can be found at http://www.iib.unsam.edu.ar/tcruzi.gss.html.[The sequence data described in this paper have been submitted to the dbGSS database under the following GenBank accession nos.: AQ443439–AQ443513, AQ443743–AQ445667, AQ902981–AQ911366, AZ049857–AZ051184, and AZ302116–AZ302563.]
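The abstract reports a total GC content of 50.9% over the survey reads without describing how it was computed. A minimal sketch of a pooled GC calculation over a set of reads is shown below, assuming that ambiguous base calls such as 'N' are excluded from both numerator and denominator; that convention and the function name `gc_content` are assumptions for illustration only.

```python
def gc_content(reads):
    """Pooled GC fraction over all reads, counting only unambiguous bases.

    Ambiguous calls (e.g. 'N') are excluded from both the numerator and the
    denominator; this is an assumed convention, not the paper's stated one.
    """
    gc = 0
    acgt = 0
    for read in reads:
        s = read.upper()
        g_c = s.count("G") + s.count("C")
        gc += g_c
        acgt += g_c + s.count("A") + s.count("T")
    if acgt == 0:
        raise ValueError("no unambiguous bases")
    return gc / acgt
```

For example, `gc_content(["ACGTNN", "GGCC"])` returns 6/8 = 0.75. Pooling base counts across reads, rather than averaging per-read fractions, weights each read by its length, which matches a "total" GC content over ∼4.3 Mb of readable sequence more closely than a per-read mean would.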


Comparative Sequence Analysis of Human Minisatellites Showing Meiotic Repeat Instability

Genome Research, 1999, Vol 9 (2), pp. 130-136. doi:10.1101/gr.9.2.130

Cited by ~4

Author(s): John Murray, Jérôme Buard, David L. Neil, Edouard Yeramian, Keiji Tamaki, ...

Keyword(s): Sequence Analysis, Sequence Data, Comparative Sequence Analysis, Gene Promoters, Repeat Instability, Repeat Array, Comparative Sequence, Genomic Environment, Data Library

The highly variable human minisatellites MS32 (D1S8), MS31A (D7S21), and CEB1 (D2S90) all show recombination-based repeat instability restricted to the germline. Mutation usually results in polar interallelic conversion or, occasionally, in crossovers, which, at least at MS32, extend into DNA flanking the repeat array, defining a localized recombination hotspot and suggesting that cis-acting elements in flanking DNA can influence repeat instability. We therefore performed comparative sequence analysis to search for common flanking elements associated with these unstable loci. All three minisatellites are located in GC-rich DNA abundant in dispersed and tandem repetitive elements. There were no significant sequence similarities between different loci upstream of the unstable end of the repeat array. Only one of the three loci showed clear evidence of putative coding sequences near the minisatellite. No consistent patterns of thermal stability or DNA secondary structure were shared by the DNA flanking these loci. This work extends previous data on the genomic environment of minisatellites. In addition, it suggests that recombinational activity is not controlled by primary or secondary characteristics of the DNA sequence flanking the repeat array and is not obviously associated with gene promoters, as seen in yeast.[The sequence data described in this paper have been submitted to the GenBank data library under accession nos. AF048727 (CEB1), AF048728 (MS31A), and AF048729 (MS32).]


Integrating genomics into the taxonomy and systematics of the Bacteria and Archaea

International Journal of Systematic and Evolutionary Microbiology, 2014, Vol 64 (Pt 2), pp. 316-324. doi:10.1099/ijs.0.054171-0

Cited by ~258

Author(s): Jongsik Chun, Fred A. Rainey

Keyword(s): Genomic Sequence, Sequence Data, Original Research, rRNA Gene, New Taxon, Genome Sequences, Microbial World, Content Type, Type Strains

The polyphasic approach used today in the taxonomy and systematics of the Bacteria and Archaea includes the use of phenotypic, chemotaxonomic and genotypic data. The use of 16S rRNA gene sequence data has revolutionized our understanding of the microbial world and led to a rapid increase in the number of descriptions of novel taxa, especially at the species level. In many cases it has allowed taxa to be demarcated into distinct species, but its limitations in a number of groups have led to the continued use of DNA–DNA hybridization. As technology has improved, next-generation sequencing (NGS) has provided a rapid and cost-effective approach to obtaining whole-genome sequences of microbial strains. Although some 12 000 bacterial or archaeal genome sequences are available for comparison, only 1725 of these are of actual type strains, limiting the use of genomic data in comparative taxonomic studies when there are nearly 11 000 type strains. Efforts to obtain complete genome sequences of all type strains are critical to the future of microbial systematics. The incorporation of genomics into the taxonomy and systematics of the Bacteria and Archaea, coupled with computational advances, will boost the credibility of taxonomy in the genomic era. This special issue of International Journal of Systematic and Evolutionary Microbiology contains both original research and review articles covering the use of genomic sequence data in microbial taxonomy and systematics. It includes contributions on specific taxa as well as outlines of approaches for incorporating genomics into workflows that run from new strain isolation to new taxon description.


Machine learning can differentiate venom toxins from other proteins having non-toxic physiological functions

PeerJ Computer Science, 2016, Vol 2, pp. e90. doi:10.7717/peerj-cs.90

Cited by ~24

Author(s): Ranko Gacesa, David J. Barlow, Paul F. Long

Keyword(s): Machine Learning, Sequence Data, Biological Data, Biological Databases, Web Based, Physiological Functions, Venom Toxins, Venomous Animals, Toxin Protein

Ascribing function to sequence in the absence of biological data is an ongoing challenge in bioinformatics. Differentiating the toxins of venomous animals from homologues having other physiological functions is particularly problematic, as there are no universally accepted methods for attributing toxin function from sequence data alone. The bioinformatics tools that do exist are difficult to use for researchers with little bioinformatics training. Here we announce a machine learning tool called ‘ToxClassifier’ that enables simple and consistent discrimination of toxins from non-toxin sequences with >99% accuracy, and we compare it with commonly used toxin annotation methods. ‘ToxClassifier’ also reports the best-hit annotation, allowing a toxin to be placed into the most appropriate toxin protein family or related to the non-toxic protein with the closest homology, which improves curation of existing biological databases and supports new venomics projects. ‘ToxClassifier’ is available free of charge, either to download (https://github.com/rgacesa/ToxClassifier) or to use on a web-based server (http://bioserv7.bioinfo.pbf.hr/ToxClassifier/).


MHC-Linked Olfactory Receptor Loci Exhibit Polymorphism and Contribute to Extended HLA/OR-Haplotypes

Genome Research, 2000, Vol 10 (12), pp. 1968-1978. doi:10.1101/gr.120400

Cited by ~1

Author(s): Anke Ehlers, Stephan Beck, Simon A. Forbes, John Trowsdale, Armin Volz, ...

Keyword(s): Olfactory Receptor, Sequence Data, Allelic Variation, Mate Preferences, Coding Region, OR Gene, HLA Haplotypes, A Minor, OR Genes

Clusters of olfactory receptor (OR) genes are found on most human chromosomes, and OR genes form one of the largest mammalian multigene families. Here, we report a systematic study of polymorphism of OR genes belonging to the largest fully sequenced OR cluster. The cluster contains 36 OR genes, of which two belong to the vomeronasal 1 (V1-OR) family. The cluster is divided into a major and a minor region at the telomeric end of the HLA complex on chromosome 6. These OR genes could be involved in MHC-related mate preferences. The polymorphism screen was carried out with 13 genes from the HLA-linked OR cluster and three genes from chromosomes 7, 17, and 19 as controls. Ten human cell lines, representing 18 different chromosome 6s, were analyzed; they were of various ethnic origins and carried different HLA haplotypes. All OR genes tested, including those not linked to the HLA complex, were polymorphic. These polymorphisms were dispersed along the coding region and resulted in up to seven alleles for a given OR gene. Three polymorphisms resulted either in stop codons (genes hs6M1-4P and hs6M1-17) or in a 16-bp deletion (gene hs6M1-19P), possibly leading to a lack of ligand recognition by the respective receptors in the cell line donors. In total, 13 HLA-linked OR haplotypes could be defined. Therefore, allelic variation appears to be a general feature of human OR genes.[The sequence data reported in this paper have been submitted to EMBL under accession nos. AC006137, AC004178, AJ132194, AL022727, AL031983, AL035402, AL035542, Z98744, CAB55431, AL050339, AL035402, AL096770, AL133267, AL121944, Z98745, AL021808, and AL021807.]


Identification and Characterization of the Potential Promoter Regions of 1031 Kinds of Human Genes

Genome Research, 2001, Vol 11 (5), pp. 677-684. doi:10.1101/gr.164001

Author(s): Yutaka Suzuki, Tatsuhiko Tsunoda, Jun Sese, Hirotoshi Taira, Junko Mizushima-Sugano, ...

Keyword(s): Large Scale, Expression Patterns, CpG Islands, Genomic Sequences, cDNA Libraries, Promoter Regions, Specific Expression, Potential Promoter, E Boxes

To understand the mechanism of transcriptional regulation, it is essential to identify and characterize the promoter, which is located proximal to the mRNA start site. To identify promoters within large volumes of genomic sequence, we used mRNA start sites determined by large-scale sequencing of cDNA libraries constructed by the “oligo-capping” method. We aligned the mRNA start sites with the genomic sequences and retrieved the adjacent sequences as potential promoter regions (PPRs) for 1031 genes. The PPR sequences were searched to determine the frequencies of major promoter elements. Among the 1031 PPRs, 329 (32%) contained TATA boxes, 872 (85%) contained initiators, 999 (97%) contained GC boxes, and 663 (64%) contained CAAT boxes. Furthermore, 493 (48%) PPRs were located in CpG islands. This frequency of CpG islands was reduced in TATA+/Inr+ PPRs and in the PPRs of ubiquitously expressed genes. In the PPRs of the CGM2 gene, the DRA gene, and the TM30pl genes, which showed highly colon-specific expression patterns, consensus sequences of E boxes were commonly observed. The PPRs were also useful for exploring promoter SNPs.[The nucleotide sequences described in this paper have been deposited in the DDBJ, EMBL, and GenBank data libraries under accession nos. AU098358–AU100608.]
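The percentages quoted for each promoter element follow directly from the stated counts over 1031 PPRs. A short Python check reproduces the rounding using only numbers given in the abstract; the dictionary keys and the helper name `percent_table` are illustrative labels, not names from the paper.

```python
TOTAL_PPRS = 1031

# Counts of PPRs containing each element (or, for CpG islands, located in one),
# as stated in the abstract.
counts = {
    "TATA box": 329,
    "initiator (Inr)": 872,
    "GC box": 999,
    "CAAT box": 663,
    "CpG island": 493,
}


def percent_table(counts, total):
    """Return {element: percentage of total, rounded to the nearest integer}."""
    return {name: round(100 * n / total) for name, n in counts.items()}


for name, pct in percent_table(counts, TOTAL_PPRS).items():
    print(f"{name}: {counts[name]}/{TOTAL_PPRS} = {pct}%")
```

The unrounded values are about 31.9%, 84.6%, 96.9%, 64.3%, and 47.8%, so the integer percentages in the abstract are consistent with the counts.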


Investigating the Role of Methylation in Silencing of VDR Gene Expression in Normal Cells during Hematopoiesis and in Their Leukemic Counterparts

Cells, 2020, Vol 9 (9), pp. 1991. doi:10.3390/cells9091991

Author(s): Urszula Nowak, Sylwia Janik, Aleksandra Marchwicka, Agnieszka Łaszkiewicz, Agnieszka Jakuszak, ...

Keyword(s): Gene Expression, Blood Cells, Transcriptional Activity, CpG Islands, Human Umbilical Cord Blood, Human Umbilical Cord, Promoter Regions, VDR Gene, Hematopoietic Stem, Major Mechanism

(1) Background: The vitamin D receptor (VDR) is present in multiple types of blood cells, and its ligand, 1,25-dihydroxyvitamin D (1,25D), is important for the proper functioning of the immune system. VDR activity is higher in hematopoietic stem and progenitor cells than in fully differentiated blood cells of mice and humans. In some human acute myeloid leukemia (AML) blasts, expression of the VDR gene is also high. This work addresses the mechanism that silences VDR gene expression during blood cell differentiation. (2) Methods: Cells were obtained by fluorescence-activated cell sorting from murine tissues and from human umbilical cord blood (UCB). Expression of the VDR gene and transcriptional activity of the VDR protein were then tested by real-time polymerase chain reaction (PCR). Finally, methylation of the VDR promoter regions was tested by bisulfite sequencing. (3) Results: The CpG islands in VDR promoters were not methylated in any of the cells studied, in either mice or humans. Hypomethylating agents had no effect on expression of human VDR transcripts, but they increased expression of the VDR target gene CYP24A1. (4) Conclusions: Expression of the VDR gene and transcriptional activity of the VDR protein vary at successive stages of hematopoietic differentiation in humans and mice, and in blasts from AML patients. The experiments presented here indicate that methylation of the VDR promoter region is not the major mechanism responsible for these differences.


Transcriptome-wide Identification and Expression Analysis of Brachypodium distachyon Transposons in Response to Viral Infection

Turkish Journal of Agriculture - Food Science and Technology, 2017, Vol 5 (10), pp. 1156. doi:10.24925/turjaf.v5i10.1156-1160.1260

Author(s): Tuğba Gürkök

Keyword(s): Abiotic Stress, Viral Infection, Stress Resistance, Transcriptional Activity, Brachypodium distachyon, Breeding Strategies, RNA-Seq, Transcription Activity, Transcriptomic Data, Intergenic Regions

Transposable elements (TEs) are the most abundant group of genomic elements in plants and can be found in genic or intergenic regions of their host genomes. Stimuli such as biotic or abiotic stress can activate their transcription or transposition. Here, the effect of infection by Panicum mosaic virus (PMV) and its satellite virus (SPMV) on transposon transcription in the model plant Brachypodium distachyon was investigated. To evaluate the transcriptional activity of TEs, transcriptomic data from mock- and virus-inoculated plants were compared. Our results indicate that retroelements are the major TE component in all RNA-seq libraries. The number of transcribed TEs detected in mock-inoculated plants was higher than in virus-inoculated plants. Compared with mock-inoculated plants, 13% of TEs showed at least a two-fold change in expression upon PMV infection and 21% upon PMV+SPMV infection. Compared with inoculation with PMV alone, co-inoculation with PMV+SPMV increased the expression of various TE-encoding transcripts. The MuDR-N78C_OS-encoding transcript was strongly up-regulated in response to both PMV and PMV+SPMV infection. The synergism between PMV and SPMV enhanced TE transcript expression more than PMV alone did. Viral infection induced the transcriptional activity of several transposons. The results suggest that increased TE expression might have a role in the response to biotic stress in B. distachyon. Identifying TEs that take part in the stress response can provide useful information for functional genomics and for designing novel breeding strategies to develop stress-resistant crops.
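The 13% and 21% figures count TEs whose expression changed at least two-fold relative to mock inoculation. The abstract does not state the normalization, pseudocount, or any significance filter used, so the following is only a minimal sketch of a two-fold criterion, assuming expression values are already normalized (for example, TPM) and adding a hypothetical pseudocount of 1 so that zero values do not cause division by zero. All function names and the input format are hypothetical.

```python
import math


def log2_fold_change(mock, infected, pseudocount=1.0):
    """log2((infected + pc) / (mock + pc)); the pseudocount is an assumption."""
    return math.log2((infected + pseudocount) / (mock + pseudocount))


def changed_at_least_twofold(mock, infected, pseudocount=1.0):
    # |log2 FC| >= 1 is equivalent to a change of at least two-fold in either
    # direction, so both up- and down-regulated TEs are counted.
    return abs(log2_fold_change(mock, infected, pseudocount)) >= 1.0


def fraction_changed(expression, pseudocount=1.0):
    """Fraction of TEs changed >= 2-fold.

    `expression` maps a TE identifier to a (mock, infected) pair of
    normalized expression values (hypothetical input format).
    """
    if not expression:
        raise ValueError("no TEs supplied")
    hits = sum(
        changed_at_least_twofold(m, i, pseudocount)
        for m, i in expression.values()
    )
    return hits / len(expression)
```

With hypothetical values, `fraction_changed({"te_a": (9.0, 19.0), "te_b": (10.0, 12.0)})` returns 0.5: `te_a` doubles after the pseudocount is added, while `te_b` does not. Counting changes in both directions matches the abstract's wording of "alteration" rather than up-regulation only; a full differential-expression analysis would normally also apply a statistical test across replicates.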


Evolutionary characteristics of intergenic transcribed regions indicate widespread noisy transcription in the Poaceae

2018. doi:10.1101/440933

Author(s): John P. Lloyd, Megan J. Bowman, Christina B. Azodi, Rosalie P. Sowers, Gaurav D. Moghe, ...

Keyword(s): Transcriptional Activity, Prediction Models, High Accuracy, Model Systems, Novel Genes, Intergenic Regions, Species Specific, Biochemical Features, Computational Predictions, Functional Phenotype

Extensive transcriptional activity occurring in unannotated, intergenic regions of genomes has raised the question of whether intergenic transcription represents the activity of novel genes or noisy expression. To address this, we evaluated cross-species and post-duplication sequence and expression conservation of intergenic transcribed regions (ITRs) in four Poaceae species. Most ITR sequences are species-specific. Those found across species tend to be more divergent in expression and to have more recent duplicates compared with annotated genes. To assess whether ITRs are functional (under selection), machine learning models were established in Oryza sativa (rice) that could distinguish benchmark functional (phenotype genes) from nonfunctional (pseudogene) sequences with high accuracy based on 44 evolutionary and biochemical features. Based on these prediction models, 584 rice ITRs (8%) are classified as likely functional; these tend to have conserved expression and ancient retained duplicates. However, most ITRs do not exhibit sequence or expression conservation across species or following duplication, consistent with computational predictions suggesting that 61% of ITRs are not under selection. We outline key evolutionary characteristics that are tightly associated with likely functional ITRs and provide a framework for identifying novel genes, improving genome annotation, and moving toward connecting genotype to phenotype in crop and model systems.
