intergenic distance
Recently Published Documents


2021 ◽  
Author(s):  
Lotte J U Pronk ◽  
Marnix H Medema

Metagenomics has become a prominent technology to study the functional potential of all organisms in a microbial community. Most studies focus on the bacterial content of these communities, while ignoring eukaryotic microbes. Indeed, many metagenomics analysis pipelines silently assume that all contigs in a metagenome are prokaryotic. However, because of marked differences in gene structure, prokaryotic gene prediction tools fail to accurately predict eukaryotic genes. Here, we developed a classifier that distinguishes eukaryotic from prokaryotic contigs based on foundational differences between these taxa in gene structure. We first developed a random forest classifier that uses intergenic distance, gene density and gene length as the most important features. We show that, with an estimated accuracy of 97%, this classifier with principled features grounded in biology can perform almost as well as the classifiers EukRep and Tiara, which use k-mer frequencies as features. By re-training our classifier with Tiara predictions as an additional feature, the weaknesses of both types of classifiers are compensated; the result is an enhanced classifier that outperforms all individual classifiers, with an F1-score of 1.00 on precision, recall and accuracy for both eukaryotes and prokaryotes, while still being fast. In a reanalysis of metagenome data from a disease-suppressive plant endosphere microbial community, we show how using Whokaryote to select contigs for eukaryotic gene prediction facilitates the discovery of several biosynthetic gene clusters that were missed in the original study. Our enhanced classifier, which we call 'Whokaryote', is wrapped in an easily installable package and is freely available from https://git.wageningenur.nl/lotte.pronk/whokaryote.
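As a rough illustration of the three gene-structure features named in the abstract above, the following minimal sketch computes mean intergenic distance, gene density and mean gene length for a single contig. The function name `contig_features`, the input layout and the exact feature definitions are assumptions made for this example; they are not taken from the Whokaryote code base, and the sketch stops short of training a random forest.

```python
from statistics import mean


def contig_features(contig_length, genes):
    """Summarise gene structure on one contig (hypothetical helper).

    contig_length: contig size in bp.
    genes: list of (start, end) tuples, 1-based inclusive coordinates.
    """
    genes = sorted(genes)
    lengths = [end - start + 1 for start, end in genes]
    # Gap between the end of one gene and the start of the next one.
    # Overlapping neighbours are counted as a gap of 0 (simplification;
    # nested genes are not handled specially).
    gaps = [
        max(0, next_start - prev_end - 1)
        for (_, prev_end), (next_start, _) in zip(genes, genes[1:])
    ]
    return {
        "mean_intergenic_distance": mean(gaps) if gaps else 0.0,
        "gene_density_per_kb": len(genes) / contig_length * 1000,
        "mean_gene_length": mean(lengths) if lengths else 0.0,
    }


# Made-up coordinates for a 10 kb contig carrying three predicted genes.
example = contig_features(10_000, [(100, 1000), (1101, 2000), (2501, 4000)])
print(example)
```

Feature vectors of this kind, one per contig, are what a tree-based classifier would take as input; prokaryotic contigs would be expected to show shorter gaps and higher gene density than eukaryotic ones.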


2016 ◽  
Author(s):  
Juan F. Ortiz ◽  
Antonis Rokas

Highly diverse phenotypic traits are often encoded by clusters of gene paralogs that are physically linked on chromosomes. Examples include olfactory receptor gene clusters involved in the recognition of diverse odors, defensin and phospholipase gene clusters involved in snake venoms, and Hox gene clusters involved in morphological diversity. Historically, gene clusters have been identified subjectively as genomic neighborhoods containing several paralogs; however, their genomic arrangements are often highly variable with respect to gene number, intergenic distance, and synteny. For example, the prolactin gene cluster shows variation in paralogous gene number, order and intergenic distance across mammals, whereas animal Hox gene clusters are often broken into sub-clusters of different sizes. The lack of a formal definition for clusters of gene paralogs hampers not only the study of their evolutionary dynamics, but also the discovery of novel ones in the exponentially growing body of genomic data. To address this gap, we developed a novel homology-based algorithm, CGPFinder, which formalizes and automates the identification of clusters of gene paralogs (CGPs) by examining the physical distribution of individual gene members of families of paralogous genes across chromosomes. Application of CGPFinder to diverse mammalian genomes accurately identified CGPs for many well-known gene clusters in the human and mouse genomes (e.g., Hox, protocadherin, Siglec, and beta-globin gene clusters) as well as for 20 other mammalian genomes. Differences were due to the exclusion of non-homologous genes that have historically been considered parts of specific gene clusters, the inclusion or absence of one or more genes between the CGPs and their corresponding gene clusters, and the splitting of certain gene clusters into distinct CGPs.
Finally, examination of human genes showing tissue-specific enhancement of their expression by CGPFinder identified members of several well-known gene clusters (e.g., cytochrome P450, aquaporins, and olfactory receptors) and revealed that they were unequally distributed across tissues. By formalizing and automating the identification of CGPs and of genes that are members of CGPs, CGPFinder will facilitate furthering our understanding of the evolutionary dynamics of genomic neighborhoods containing CGPs, their functional implications, and how they are associated with phenotypic diversity.
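The abstract does not give CGPFinder's actual clustering criteria, so the following is only a minimal sketch of the general idea of grouping paralogs by their physical distribution: members of the same gene family on the same chromosome are chained into a cluster while the distance to the previous member stays under a threshold. The function name `find_paralog_clusters`, the `max_gap` and `min_members` parameters and their default values, and the gene records are all hypothetical.

```python
from collections import defaultdict


def find_paralog_clusters(genes, max_gap=50_000, min_members=2):
    """Group paralogs that lie close together on a chromosome.

    genes: iterable of (gene_id, family, chromosome, start, end).
    max_gap: largest allowed distance in bp between consecutive
             family members (hypothetical threshold).
    """
    by_group = defaultdict(list)
    for gene_id, family, chrom, start, end in genes:
        by_group[(family, chrom)].append((start, end, gene_id))

    clusters = []
    for (family, chrom), members in sorted(by_group.items()):
        members.sort()  # order family members by start coordinate
        current = [members[0]]
        for member in members[1:]:
            if member[0] - current[-1][1] <= max_gap:
                current.append(member)
            else:
                if len(current) >= min_members:
                    clusters.append((family, chrom, [m[2] for m in current]))
                current = [member]
        if len(current) >= min_members:
            clusters.append((family, chrom, [m[2] for m in current]))
    return clusters


# Made-up records: three members of family "famA" and one of "famB".
records = [
    ("geneA1", "famA", "chr7", 1_000, 2_000),
    ("geneA2", "famA", "chr7", 5_000, 6_000),
    ("geneA3", "famA", "chr7", 900_000, 901_000),
    ("geneB1", "famB", "chr11", 100, 1_100),
]
print(find_paralog_clusters(records))
```

A fixed distance cutoff is the simplest possible choice here; because intergenic distance varies widely between clusters and species, as the abstract notes, a real tool would need a more principled criterion.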


Blood ◽  
1999 ◽  
Vol 94 (6) ◽  
pp. 2039-2047 ◽  
Author(s):  
M.A. Thornton ◽  
M. Poncz ◽  
M. Korostishevsky ◽  
E. Yakobson ◽  
S. Usher ◽  
...  

Abstract IIbb3 integrin is a heterodimeric receptor facilitating platelet aggregation. Both genes are on chromosome 17q21.32. Intergenic distance between them has been reported to be 125 to 260 kilobasepairs (kb) by pulsed-field gel electrophoresis (PFGE) genomic analysis, suggesting that they may be regulated coordinately during megakaryopoiesis. In contrast, other studies suggest these genes are greater than 2.0 megabasepairs (mb) apart. Because of the potential biological implications of having these two megakaryocytic-specific genes contiguous, we attempted to resolve this discrepancy. Taking advantage of large kindreds with mutations in either IIb or β3, we have developed a genetic linkage map between the thyroid receptor hormone-1 gene (THRA1) and β3 as follows: cen-THRA1-BRCA1-D17S579/IIb-β3-qter, with a distance of 1.3 centiMorgans (cM) between IIb and β3 and the two genes being oriented in the same direction. PFGE genomic and YAC clone analysis showed that the β3 gene is distal and ≥365 kb upstream of IIb. Additional restriction mapping shows IIb is linked to the erythrocyte band 3 (EPB3) gene, and β3 to the homeobox HOX2b gene. Analysis of IIb+-BAC and P1 clones confirm that the EPB3 gene is ∼110 kb downstream of the IIb gene. Sequencing the region surrounding the human IIb locus showed the Granulin gene ∼18 kb downstream to IIb, and the KIAA0553 gene ∼5.7 kb upstream. This organization is conserved in the murine sequence. These studies show that IIb and β3 are not closely linked, with IIb flanked by nonmegakaryocytic genes, and imply that they are unlikely to share common regulatory domains during megakaryopoiesis.


1999 ◽  
Vol 9 (3) ◽  
pp. 251-258 ◽  
Author(s):  
Klaus Gellner ◽  
Sydney Brenner

The analysis of the sequence of ∼150 kb of a genomic region corresponding to the wnt1 gene of the Japanese pufferfish Fugu rubripes confirms the compact structure of the genome. Fifteen genes were found in this region, and 26.6% of the analyzed sequence is coding sequence. With an average intergenic distance of <5 kb, this gene density is comparable to that of Caenorhabditis elegans. The compactness of this region corresponds to the reduction of the overall size of the genome, consistent with the conclusion that the gene number in Fugu and human genomes is approximately the same. Eight of the genes have been mapped in the human genome and all of them are found in the chromosomal band 12q13, indicating a high degree of synteny in both species, Fugu and human. Comparative sequence analysis allows us to identify potential regulatory elements for wnt1 and ARF3, which are common to fish and mammals. [The sequence data described in this paper have been submitted to GenBank under accession no. AF056116.]
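The figures quoted in this abstract can be checked with a little arithmetic. The sketch below takes only the three numbers stated in the text (about 150 kb, 15 genes, 26.6% coding) and derives per-gene averages; the function name `region_summary` is hypothetical, and the derived values are rough averages rather than results reported by the authors.

```python
def region_summary(region_kb, n_genes, coding_fraction):
    """Derive simple per-gene averages for a sequenced region."""
    coding_kb = region_kb * coding_fraction
    return {
        # Average stretch of sequence available per gene.
        "kb_per_gene": region_kb / n_genes,
        # Total and per-gene coding sequence.
        "coding_kb": coding_kb,
        "coding_kb_per_gene": coding_kb / n_genes,
        # Everything that is not coding: introns, UTRs and intergenic DNA.
        "noncoding_kb_per_gene": (region_kb - coding_kb) / n_genes,
    }


summary = region_summary(region_kb=150, n_genes=15, coding_fraction=0.266)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With one gene per 10 kb on average and roughly 7.3 kb of non-coding sequence per gene (introns included), an average intergenic distance below 5 kb is consistent with the stated totals.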

