Characterizing gene tree conflict in plastome-inferred phylogenies

2019 ◽  
Author(s):  
Joseph F. Walker ◽  
Gregory W. Stull ◽  
Nathanael Walker-Hale ◽  
Oscar M. Vargas ◽  
Drew A. Larson

Abstract. Premise of the study. Evolutionary relationships among plants have been inferred primarily using chloroplast data. To date, no study has comprehensively examined the plastome for gene tree conflict. Methods. Using a broad sampling of angiosperm plastomes, we characterize gene tree conflict among plastid genes at various time scales and explore correlates to conflict (e.g., evolutionary rate, gene length, molecule type). Key results. We uncover notable gene tree conflict against a backdrop of largely uninformative genes. We find gene length is the strongest correlate to concordance, and that nucleotides outperform amino acids. Of the most commonly used markers, matK greatly outperforms rbcL; however, the rarely used gene rpoC2 is the top-performing gene in every analysis. We find that rpoC2 reconstructs angiosperm phylogeny as well as the entire concatenated set of protein-coding chloroplast genes. Conclusions. Our results suggest that longer genes are superior for phylogeny reconstruction. The alleviation of some conflict through the use of nucleotides suggests that systematic error is likely the root of most of the observed conflict, but further research on biological conflict within the plastome is warranted given the documented cases of heteroplasmic recombination. We suggest rpoC2 as a useful marker for reconstructing angiosperm phylogeny, reducing the effort and expense of assembling and analyzing entire plastomes.

PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e7747 ◽  
Author(s):  
Joseph F. Walker ◽  
Nathanael Walker-Hale ◽  
Oscar M. Vargas ◽  
Drew A. Larson ◽  
Gregory W. Stull

Evolutionary relationships among plants have been inferred primarily using chloroplast data. To date, no study has comprehensively examined the plastome for gene tree conflict. Using a broad sampling of angiosperm plastomes, we characterize gene tree conflict among plastid genes at various time scales and explore correlates to conflict (e.g., evolutionary rate, gene length, molecule type). We uncover notable gene tree conflict against a backdrop of largely uninformative genes. We find alignment length and tree length are strong predictors of concordance, and that nucleotides outperform amino acids. Of the most commonly used markers, matK greatly outperforms rbcL; however, the rarely used gene rpoC2 is the top-performing gene in every analysis. We find that rpoC2 reconstructs angiosperm phylogeny as well as the entire concatenated set of protein-coding chloroplast genes. Our results suggest that longer genes are superior for phylogeny reconstruction. The alleviation of some conflict through the use of nucleotides suggests that stochastic and systematic error are likely the root of most of the observed conflict, but further research on biological conflict within the plastome is warranted given documented cases of heteroplasmic recombination. We suggest that researchers should filter genes for topological concordance when performing downstream comparative analyses on phylogenetic data, even when using chloroplast genomes.
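One common way to quantify the kind of gene tree concordance discussed in this abstract is to compare the bipartitions (splits) of a gene tree with those of a reference tree. The following is only a minimal sketch of that idea and not the authors' pipeline: trees are written as nested Python tuples instead of Newick files, both trees are assumed to share the same taxon set, and the function names `leaves`, `splits` and `concordance` are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of the tree, read as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    anchor taxon, so the same split is always encoded the same way.
    """
    all_taxa = leaves(tree)
    anchor = min(all_taxa)  # arbitrary but deterministic anchor taxon
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaves(node)
        if anchor in side:
            side = all_taxa - side
        # Skip trivial splits (empty side, or one taxon versus the rest).
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


def concordance(gene_tree, reference_tree):
    """Fraction of the gene tree's splits that also occur in the reference.

    Returns None for a gene tree with no informative splits (for example a
    star tree), since such a tree can neither agree nor conflict.
    """
    gene = splits(gene_tree)
    if not gene:
        return None
    return len(gene & splits(reference_tree)) / len(gene)


# Toy five-taxon example (made-up topologies, not data from the study).
reference = ((("A", "B"), "C"), ("D", "E"))
gene = ((("A", "C"), "B"), ("D", "E"))
score = concordance(gene, reference)
```

Here the gene tree shares the split separating D and E from the rest but groups A with C instead of A with B, so one of its two informative splits is concordant.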


1993 ◽  
Vol 342 (1300) ◽  
pp. 101-119 ◽  

The serpins are a widely distributed group of serine proteinase inhibitors found in plants, birds, mammals and viruses. Despite the great evolutionary divergence of these organisms, their serpins are highly conserved, both in sequence and in structure. Amino acid sequences were aligned by a combination of automatic algorithms and by consideration of conserved structural elements in those serpins for which crystal structures exist. The program HOMED was used, which allowed the amino acid alignment to be simultaneously converted into the equivalently aligned nucleotide sequences. The aligned amino acids were used as the basis for superposition of the four known three-dimensional structures for which coordinates are available and compared with an optimal three-dimensional superposition in order to estimate the reliability of the sequence alignment. Phylogenetic relationships implied by these nucleotide sequence alignments were determined by the method of maximum parsimony. The proposed gene tree suggested that as much diversity existed between the plant serpin and mammalian serpins as was present among mammalian serpins and provided further evidence that the architecture of serpin molecules is highly constrained.


2018 ◽  
Author(s):  
M Arabfard ◽  
K Kavousi ◽  
A Delbari ◽  
M Ohadi

Abstract. Recent work in yeast and humans suggests that evolutionary divergence in cis-regulatory sequences impacts translation initiation sites (TISs). Cis-elements can also affect the efficacy and amount of protein synthesis. Despite their vast biological implications, the landscape and relevance of short tandem repeats (STRs)/microsatellites to the human protein-coding gene TISs remain largely unknown. Here we characterized the STR distribution at the 120 bp cDNA sequence upstream of all annotated human protein-coding gene TISs based on the Ensembl database. Furthermore, we performed a comparative genomics study of all annotated orthologous TIS-flanking sequences across 47 vertebrate species (755,956 transcripts), aimed at identifying human-specific STRs in this interval. We also hypothesized that STRs may be used as genetic codes for the initiation of translation. The initial five amino acid sequences (excluding the initial methionine) that were flanked by STRs in human were BLASTed against the initial orthologous five amino acids in other vertebrate species (2,025,817 pair-wise TIS comparisons) in order to compare the number of events in which human-specific and non-specific STRs occurred with homologous and non-homologous TISs (i.e. ≥50% and <50% similarity of the five amino acids). We characterized human-specific STRs and a bias of this compartment in comparison to the overall (human-specific and non-specific) distribution of STRs (Mann-Whitney p = 1.4 × 10⁻¹¹). We also found significant enrichment of non-homologous TISs flanked by human-specific STRs (p < 0.00001). In conclusion, our data indicate a link between STRs and TIS selection, which is supported by differential evolution of the human-specific STRs in the TIS upstream flanking sequence.

Abbreviations: cDNA, complementary DNA; CDS, coding DNA sequence; STR, short tandem repeat; TIS, translation initiation site; TSS, transcription start site.
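As an illustration of what scanning an upstream interval for STRs can look like, the sketch below finds tandem repeats with a regular expression. It is not the method used in the study: the thresholds (motifs of 1 to 6 bp, at least three tandem copies), the name `find_strs` and the toy sequence are all assumptions made for the example, since the abstract does not state how STRs were defined.

```python
import re

# Assumed definition, not taken from the abstract: a motif of 1-6 bases
# followed by at least two further copies of itself (three copies in total).
# The lazy quantifier makes the regex prefer the shortest repeating motif.
STR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1{2,}")


def find_strs(sequence):
    """Return (start, motif, copies) for each tandem repeat, left to right."""
    hits = []
    for match in STR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


# Toy sequence (made up): three copies of CAG flanked by TT on each side.
hits = find_strs("TTCAGCAGCAGTT")
```

In a real analysis the input would be the 120 bp interval upstream of each annotated TIS, and the repeat definition would need to match the one used by the authors.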


2017 ◽  
Author(s):  
Jacoline Gerritsen ◽  
Bastian Hornung ◽  
Bernadette Renckens ◽  
Sacha A.F.T. van Hijum ◽  
Vitor A.P. Martins dos Santos ◽  
...  

Background. The microbiota in the small intestine relies on its capacity to rapidly import and ferment available carbohydrates to survive in a complex and highly competitive ecosystem. Understanding how these communities function requires elucidating the role of their key players, the interactions among them and with their environment/host. Methods. The genome of the gut bacterium Romboutsia ilealis CRIBT was sequenced with multiple technologies (Illumina paired end, mate pair and PacBio). The transcriptome was sequenced (Illumina HiSeq) while growing on three different carbohydrate sources and short chain fatty acids were measured via HPLC. Results. We present the complete genome of Romboutsia ilealis CRIBT, a natural inhabitant and key player of the small intestine of rats. R. ilealis CRIBT possesses a circular chromosome of 2,581,778 bp and a plasmid of 6,145 bp, carrying 2,351 and eight predicted protein coding sequences, respectively. Analysis of the genome revealed limited capacity to synthesize amino acids and vitamins, whereas multiple and partially redundant pathways for the utilization of different relatively simple carbohydrates are present. Transcriptome analysis allowed pinpointing the key components in the degradation of glucose, L-fucose and fructo-oligosaccharides. Discussion. This revealed that R. ilealis CRIBT is adapted to a nutrient-rich environment where carbohydrates, amino acids and vitamins are abundantly available and uncovered potential mechanisms for competition with mucus-degrading microbes.


1999 ◽  
Vol 344 (3) ◽  
pp. 667-675 ◽  
Author(s):  
Shin-ichiro SANO ◽  
Hiroshi OHNISHI ◽  
Misae KUBOTA

BIT/SHPS-1/SIRPα/P84 is a unique molecule with a high degree of homology with immune antigen recognition molecules (immunoglobulin, T-cell receptor and MHC), and is highly expressed in the brain. The extracellular region contains three immunoglobulin-like domains (V-type, C1-type and C1-type), and the intracellular region contains two signalling motifs that interact with SHP-2 protein tyrosine phosphatase. BIT-coated plates support cell-substrate adhesion and neurite extension of neurons, and BIT participates in neuronal signal transduction. Diversity of the V-type domain sequences of human BIT has been reported. In the present study we analysed the structure of the mouse BIT gene (Bit). The protein coding region consists of eight exons corresponding to a signal peptide, a V-type domain, a C1-type domain, a C1-type domain, a transmembrane region and three parts of one cytoplasmic region. The two signalling motifs are encoded in one exon. Four splicing forms of mouse BIT were revealed. We also found sequence diversity among three mouse strains, namely BALB/c, 129/Sv and C57BL/6. The substitution patterns of amino acids and nucleotides indicate positive pressure to alter the amino acids in the V-type domain during evolution. Immunoblot analyses showed that mouse BIT and human BITα are predominantly expressed in the brain. On the basis of these findings we discuss the possibility that BIT contributes to the genetic individuality and diversity of the brain.


2020 ◽  
Vol 2020 ◽  
pp. 1-6
Author(s):  
Zhaoqing Han ◽  
Kun Li ◽  
Houqiang Luo ◽  
Muhammad Shahzad ◽  
Khalid Mehmood

A study was conducted to characterize the complete mitochondrial genome of Fischoederius elongatus derived from cows in Shanghai, China. Results indicated that the complete mt genome of F. elongatus was 14,288 bp and contained 12 protein-coding genes (cox1-3, nad1-6, nad4L, atp6, and cytb), 22 transfer RNA genes, and two ribosomal RNA genes (l-rRNA and s-rRNA). The overall A + T content of the mt genome was 63.83%, and the nucleotide composition was A (19.83%), C (9.75%), G (26.43%), and T (44.00%). A total of 3284 amino acids were encoded by the mt genome of the current F. elongatus isolate. TTT (Phe) (9.84%) and TTG (Leu) (7.73%) were the most frequent codons, whereas ACC (Thr) (0.06%), GCC (Ala) (0.09%), CTC (Leu) (0.09%), and AAC (Asn) (0.09%) were the least frequent. At the third codon position of F. elongatus mt protein genes, T (50.82%) was observed most frequently and C (5.85%) least frequently. The current results can contribute to epidemiological diagnosis, molecular identification, taxonomy, genetics, and drug development research on this parasite species in cattle.
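The reported A + T content is simply the sum of the A and T percentages (19.83% + 44.00% = 63.83%). As a hedged illustration of how such a composition is typically computed from a sequence, the sketch below uses hypothetical function names (`base_composition`, `at_content`) and a made-up toy sequence rather than the F. elongatus genome.

```python
from collections import Counter


def base_composition(sequence):
    """Return the percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(base for base in sequence.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(sequence):
    """Return the combined A + T percentage of the sequence."""
    composition = base_composition(sequence)
    return composition["A"] + composition["T"]


# Toy sequence (made up): 2 A, 4 T, 2 G, 2 C out of 10 bases.
toy_at = at_content("AATTTTGGCC")
```

Ambiguous symbols such as N are ignored here; whether to exclude them from the denominator is a design choice that the abstract does not address.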


2011 ◽  
Vol 09 (06) ◽  
pp. 729-747 ◽  
Author(s):  
MD. SHAIK SADI ◽  
FEI-CHING KUO ◽  
JOSHUA W. K. HO ◽  
MICHAEL A. CHARLESTON ◽  
T. Y. CHEN

Many phylogenetic inference programs are available to infer evolutionary relationships among taxa using aligned sequences of characters, typically DNA or amino acids. These programs are often used to infer the evolutionary history of species. However, in most cases it is impossible to systematically verify the correctness of the tree returned by these programs, as the correct evolutionary history is generally unknown and unknowable. In addition, it is nearly impossible to verify whether any non-trivial tree is correct in accordance with the specification of the often complicated search and scoring algorithms. This difficulty is known as the oracle problem of software testing: there is no oracle that we can use to verify the correctness of the returned tree. This makes it very challenging to test the correctness of any phylogenetic inference program. Here, we demonstrate how to apply a simple software testing technique, called Metamorphic Testing, to alleviate the oracle problem in testing phylogenetic inference programs. We have used both real and randomly generated test inputs to evaluate the effectiveness of metamorphic testing, and found that metamorphic testing can detect failures effectively in faulty phylogenetic inference programs with both types of test inputs.
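Metamorphic testing checks relations between the outputs of related runs instead of checking one output against a known answer. The sketch below illustrates the idea on a toy program; it does not reproduce the programs or the metamorphic relations used in the paper, which the abstract does not list. The stand-in program under test is a small UPGMA clustering on Hamming distances (`upgma_clades`, a hypothetical name), and the assumed relation is that reordering the input taxa must not change the set of inferred clades.

```python
import itertools
import random


def hamming(a, b):
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def upgma_clades(alignment):
    """Toy UPGMA on Hamming distances; returns the set of inferred clades.

    `alignment` maps taxon name -> aligned sequence. This stands in for the
    program under test and is not any published implementation.
    """
    clusters = [frozenset([name]) for name in alignment]
    clades = set()

    def distance(c1, c2):
        # Average pairwise distance between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(hamming(alignment[a], alignment[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        # Ties in distance are broken by sorted taxon names, so the result
        # should not depend on the order in which taxa were supplied.
        c1, c2 = min(
            itertools.combinations(clusters, 2),
            key=lambda pair: (distance(*pair), sorted(map(sorted, pair))),
        )
        merged = c1 | c2
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        clades.add(merged)
    return clades


def check_permutation_relation(alignment, trials=20, seed=0):
    """Metamorphic relation: shuffling the taxon order must not change the clades."""
    rng = random.Random(seed)
    expected = upgma_clades(alignment)
    names = list(alignment)
    for _ in range(trials):
        rng.shuffle(names)
        shuffled = {name: alignment[name] for name in names}
        if upgma_clades(shuffled) != expected:
            return False  # relation violated: evidence of a fault
    return True


# Toy alignment (made up) with a deliberate distance tie between A-B and C-D.
toy = {"A": "AAAAAAAA", "B": "AAAAAAAT", "C": "AAAACCCC", "D": "AAATCCCC"}
relation_holds = check_permutation_relation(toy)
```

No correct tree is needed to run this check: a violation of the relation reveals a fault even though the true evolutionary history is unknown. A faulty variant that broke ties by input position, for instance, could be caught by this relation on inputs containing tied distances.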


1988 ◽  
Vol 67 (3) ◽  
pp. 543-547 ◽  
Author(s):  
R.R.B. Russell ◽  
T. Shiroza ◽  
H.K. Kuramitsu ◽  
J.J. Ferreti

The sequences of glucosyltransferase genes from Streptococcus sobrinus (gtfI) and Streptococcus mutans (gtfB) were compared and showed a high degree of homology. There is a 57.7% homology of nucleotides in the genes and a 56.7% homology of amino acids in the deduced protein sequences. The G + C content for the protein-coding region is 43.6% for S. sobrinus and 41.2% for S. mutans. Internal repeating sequences present in both proteins exhibit some difference in sequence pattern.


1985 ◽  
Vol 5 (6) ◽  
pp. 1408-1414 ◽  
Author(s):  
S O Meakin ◽  
M L Breitman ◽  
L C Tsui

We have characterized five human gamma-crystallin genes isolated from a genomic phage library. DNA sequencing of four of the genes revealed that two of them predict polypeptides of 174 residues showing 71% homology in their amino acid sequence; the other two correspond to closely related pseudogenes which contain the same in-frame termination codon at identical positions in the coding sequence. Two of the genes and one of the pseudogenes are oriented in a head-to-tail fashion clustered within 22.5 kilobases. All three contain a TATA box 60 to 80 base pairs upstream of the initiation codon and a highly conserved segment of 44 base pairs in length immediately preceding the TATA box. The two genes and the two pseudogenes are similar in structure: each contains a small 5' exon encoding three amino acids followed by two larger exons that correspond exactly to the two similar structural domains of the polypeptide. The first intron varies from 100 to 110 base pairs, and the second intron ranges from 1 to several kilobases, rendering an overall gene size of 1.7 to 4.5 kilobases. At least one of the two pseudogenes appears to have been functional before inactivation, suggesting that their identical mutation was generated by gene conversion.

