Broadly sampled orthologous groups of eukaryotic proteins for the phylogenetic study of plastid-bearing lineages

2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Mick Van Vlierberghe ◽  
Hervé Philippe ◽  
Denis Baurain

Abstract. Objectives: Identifying orthology relationships among sequences is essential to understand evolution, diversity of life and ancestry among organisms. To build alignments of orthologous sequences, phylogenomic pipelines often start with all-vs-all similarity searches, followed by a clustering step. For the protein clusters (orthogroups) to be as accurate as possible, proteomes of good quality are needed. Here, our objective is to assemble a data set especially suited for the phylogenomic study of algae and formerly photosynthetic eukaryotes, which implies the proper integration of organellar data, to enable distinguishing between several copies of one gene (paralogs), taking into account their cellular compartment, if necessary. Data description: We submitted 73 top-quality and taxonomically diverse proteomes to OrthoFinder. We obtained 47,266 orthogroups and identified 11,775 orthogroups with at least two algae. Whenever possible, sequences were functionally annotated with eggNOG and tagged according to their genomic and target compartment(s). Then we aligned the orthogroups and computed phylogenetic trees for them with IQ-TREE. Finally, these trees were further processed by identifying and pruning the subtrees exclusively composed of plastid-bearing organisms to yield a set of 31,784 clans suitable for studying photosynthetic organism genome evolution.
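The filtering step mentioned in the data description (keeping orthogroups with at least two algae) can be illustrated with a minimal sketch. It is not the authors' pipeline: the input layout (a dictionary mapping an orthogroup ID to `(organism, sequence_id)` pairs), the function name `orthogroups_with_min_algae`, and the reading of "two algae" as two distinct algal organisms are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple


def orthogroups_with_min_algae(
    orthogroups: Dict[str, Iterable[Tuple[str, str]]],
    algae: Set[str],
    min_algae: int = 2,
) -> List[str]:
    """Return IDs of orthogroups containing at least `min_algae` distinct algal organisms.

    `orthogroups` maps an orthogroup ID to (organism, sequence_id) pairs.
    This layout is hypothetical; real OrthoFinder output would need parsing first.
    """
    kept = []
    for og_id, members in orthogroups.items():
        # Count distinct algal organisms, not sequences, so paralogs
        # from a single alga do not satisfy the threshold on their own.
        algal_organisms = {organism for organism, _ in members if organism in algae}
        if len(algal_organisms) >= min_algae:
            kept.append(og_id)
    return kept


# Toy data with made-up organism and sequence names.
toy_orthogroups = {
    "OG1": [("alga_A", "a1"), ("alga_B", "b1"), ("animal_X", "x1")],
    "OG2": [("alga_A", "a2"), ("alga_A", "a3"), ("animal_X", "x2")],
    "OG3": [("animal_X", "x3"), ("fungus_Y", "y1")],
}
toy_algae = {"alga_A", "alga_B"}
selected = orthogroups_with_min_algae(toy_orthogroups, toy_algae)
```

In this toy example only `OG1` is kept, because `OG2` contains two sequences from the same alga and `OG3` contains none.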

2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Nikolaus U. Szucsich ◽  
Daniela Bartel ◽  
Alexander Blanke ◽  
Alexander Böhm ◽  
Alexander Donath ◽  
...  

Abstract. Background: Phylogenetic relationships among the myriapod subgroups Chilopoda, Diplopoda, Symphyla and Pauropoda are still not robustly resolved. The first phylogenomic study covering all subgroups resolved phylogenetic relationships congruently with morphological evidence but is in conflict with most previously published phylogenetic trees based on diverse molecular data. Outgroup choice and long-branch attraction effects were stated as possible explanations for these incongruencies. In this study, we addressed these issues by extending the myriapod and outgroup taxon sampling using transcriptome data. Results: We generated new transcriptome data of 42 panarthropod species, including all four myriapod subgroups and additional outgroup taxa. Our taxon sampling was complemented by published transcriptome and genome data resulting in a supermatrix covering 59 species. We compiled two data sets, the first with a full coverage of genes per species (292 single-copy protein-coding genes), the second with a less stringent coverage (988 genes). We inferred phylogenetic relationships among myriapods using different data types, tree inference, and quartet computation approaches. Our results unambiguously support monophyletic Mandibulata and Myriapoda. Our analyses clearly showed that there is strong signal for a single unrooted topology, but also that the position of the internal root is sensitive to the choice of outgroups. However, we observe strong evidence for a clade Pauropoda+Symphyla, as well as for a clade Chilopoda+Diplopoda. Conclusions: Our best quartet topology is incongruent with current morphological phylogenies which were supported in another phylogenomic study. AU tests and quartet mapping reject the quartet topology congruent with trees inferred from morphological characters. Moreover, quartet mapping shows that confounding signal present in the data set is sufficient to explain the weak signal for the quartet topology derived from morphological characters. 
Although outgroup choice affects results, our study could narrow possible trees to derivatives of a single quartet topology. For highly disputed relationships, we propose applying a series of tests (AU and quartet mapping), since the results of such tests make it possible to narrow down possible relationships and to rule out confounding signal.


Zootaxa ◽  
2013 ◽  
Vol 3626 (1) ◽  
pp. 77-93 ◽  
Author(s):  
DAVID W. WEISROCK ◽  
J. ROBERT MACEY ◽  
MASAFUMI MATSUI ◽  
DANIEL G. MULCAHY

The salamander family Hynobiidae contains over 50 species and has been the subject of a number of molecular phylogenetic investigations aimed at reconstructing branches across the entire family. In general, studies using the greatest amount of sequence data have used reduced taxon sampling, while the study with the greatest taxon sampling has used a limited sequence data set. Here, we provide insights into the phylogenetic history of the Hynobiidae using both dense taxon sampling and a large mitochondrial DNA sequence data set. We report exclusive new mitochondrial DNA data of 2566 aligned bases (with 151 excluded sites; of the included sites, 1157 are variable and 957 are parsimony informative). This is sampled from two genic regions: a 12S–16S region (the 3’ end of 12S rRNA, tRNAVal, and the 5’ end of 16S rRNA), and a ND2–COI region (ND2, tRNATrp, tRNAAla, tRNAAsn, the origin for light strand replication (OL), tRNACys, tRNATyr, and the 5’ end of COI). Analyses using parsimony, Bayesian, and maximum likelihood optimality criteria produce similar phylogenetic trees, with discordant branches generally receiving low levels of branch support. Monophyly of the Hynobiidae is strongly supported across all analyses, as is the sister relationship and deep divergence between the genus Onychodactylus and all remaining hynobiids. Within this latter grouping our phylogenetic results identify six clades that are relatively divergent from one another, but for which there is minimal support for their phylogenetic placement. This includes the genus Batrachuperus, the genus Hynobius, the genus Pachyhynobius, the genus Salamandrella, a clade containing the genera Ranodon and Paradactylodon, and a clade containing the genera Liua and Pseudohynobius. This latter clade receives low bootstrap support in the parsimony analysis, but is consistent across all three analytical methods. 
Our results also clarify a number of well-supported relationships within the larger Batrachuperus and Hynobius clades. While the relationships identified in this study do much to clarify the phylogenetic history of the Hynobiidae, the poor resolution among major hynobiid clades, and the contrast of mtDNA-derived relationships with recent phylogenetic results from a small number of nuclear genes, highlight the need for continued phylogenetic study with larger numbers of nuclear loci.


Development ◽  
1994 ◽  
Vol 1994 (Supplement) ◽  
pp. 15-25
Author(s):  
Hervé Philippe ◽  
Anne Chenuil ◽  
André Adoutte

Most of the major invertebrate phyla appear in the fossil record during a relatively short time interval, not exceeding 20 million years (Myr), 540-520 Myr ago. This rapid diversification is known as the 'Cambrian explosion'. In the present paper, we ask whether molecular phylogenetic reconstruction provides confirmation for such an evolutionary burst. The expectation is that the molecular phylogenetic trees should take the form of a large unresolved multifurcation of the various animal lineages. Complete 18S rRNA sequences of 69 extant representatives of 15 animal phyla were obtained from data banks. After eliminating a major source of artefact leading to lack of resolution in phylogenetic trees (mutational saturation of sequences), we indeed observe that the major lines of triploblast coelomates (arthropods, molluscs, echinoderms, chordates...) are very poorly resolved, i.e. the nodes defining the various clades are not supported by high bootstrap values. Using a previously developed procedure consisting of calculating bootstrap proportions of each node of the tree as a function of an increasing number of nucleotides (Lecointre, G., Philippe, H., Le, H. L. V. and Le Guyader, H. (1994) Mol. Phyl. Evol., in press), we obtain a more informative indication of the robustness of each node. In addition, this procedure allows us to estimate the number of additional nucleotides that would be required to resolve confidently the currently uncertain nodes; this number turns out to be extremely high and experimentally unfeasible. We then take this approach one step further: using parameters derived from the above analysis, assuming a molecular clock and using palaeontological dates for calibration, we establish a relationship between the number of sites contained in a given data set and the time interval that this data set can confidently resolve (with 95% bootstrap support). 
Under these assumptions, the presently available 18S rRNA database cannot confidently resolve cladogenetic events separated by less than about 40 Myr. Thus, at the present time, the potential resolution by the palaeontological approach is higher than that by the molecular one.


Paleobiology ◽  
1997 ◽  
Vol 23 (1) ◽  
pp. 1-19 ◽  
Author(s):  
William C. Clyde ◽  
Daniel C. Fisher

Stratigraphic data are compared to morphologic data in terms of their fit to phylogenetic hypotheses for 29 data sets taken from the literature. Stratigraphic fit is measured using MacClade's stratigraphic character, which tracks the number of independent discrepancies between observed order and the order of occurrence that would be expected on the basis of a given phylogenetic hypothesis. Acceptance of a phylogenetic hypothesis despite such discrepancies requires ad hoc hypotheses concerning differential probabilities of preservation and recovery. These stratigraphic ad hoc hypotheses are treated as logically equivalent to morphologic ad hoc hypotheses of homoplasy. The retention index is used to compare the number of stratigraphic and morphologic ad hoc hypotheses required by given phylogenetic hypotheses. Each data set is subjected to five analyses, varying in the constraints imposed on the structure of the phylogenetic tree against which fit is measured. Analyses 1–4 compare the stratigraphic and morphologic retention indices using phylogenetic trees consistent with the morphologically most-parsimonious cladogram reported in the original study. Analysis 5 compares retention indices using the overall (stratigraphically and morphologically) most-parsimonious phylogenetic tree, which may be, but is not necessarily, consistent with the reported cladogram. Proceeding from Analysis 1 to Analysis 5, stratigraphic data are allowed greater influence in determining the structure of phylogenetic trees, with the trees in Analysis 1 derived without reference to the stratigraphic character and the trees in Analysis 5 derived from full interaction of stratigraphic and morphologic characters. Morphologic and stratigraphic retention indices for these 29 studies cannot be statistically distinguished in comparisons 3–5, suggesting very similar degrees of fit. 
The values of these retention indices are high, indicating a generally high level of congruence under these phylogenetic hypotheses. Significant gains (49%) in stratigraphic fit can be realized without significant loss (4%) in morphologic fit as the stratigraphic and morphologic evidence are both allowed to participate in constraining the structure of phylogenetic hypotheses. These results suggest that arguments based on alleged “noisiness” of stratigraphic data offer inadequate grounds for ignoring stratigraphic order in phylogenetic analysis. In terms of congruence, stratigraphic and morphologic data perform about equally well.
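The retention index used in this comparison has a simple closed form. The sketch below assumes the standard definition, RI = (g - s) / (g - m), where m is the minimum number of steps (ad hoc hypotheses) a character could require on any tree, g is the maximum, and s is the number required on the tree being evaluated. The function name and argument names are illustrative only, and the abstract does not spell out how the study computed these quantities for the stratigraphic character.

```python
def retention_index(min_steps: float, observed_steps: float, max_steps: float) -> float:
    """Retention index RI = (g - s) / (g - m).

    m = min_steps      (fewest steps possible on any tree)
    s = observed_steps (steps required on the tree being evaluated)
    g = max_steps      (most steps possible on any tree)
    RI is 1.0 for a perfect fit and 0.0 for the worst possible fit.
    """
    if max_steps == min_steps:
        # The character cannot discriminate between trees, so RI is undefined.
        raise ValueError("retention index is undefined when max_steps == min_steps")
    if not (min_steps <= observed_steps <= max_steps):
        raise ValueError("observed_steps must lie between min_steps and max_steps")
    return (max_steps - observed_steps) / (max_steps - min_steps)


# Toy numbers, not taken from the study: 2 extra steps out of a possible 10.
example_ri = retention_index(min_steps=10, observed_steps=12, max_steps=20)
```

Because the same formula applies to any character with a defined minimum and maximum step count, it gives a common scale on which stratigraphic and morphologic fit can be compared.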


PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e1915 ◽  
Author(s):  
Eric R. Larson ◽  
Magalie Castelin ◽  
Bronwyn W. Williams ◽  
Julian D. Olden ◽  
Cathryn L. Abbott

Molecular genetic approaches are playing an increasing role in conservation science by identifying biodiversity that may not be evident by morphology-based taxonomy and systematics. So-called cryptic species are particularly prevalent in freshwater environments, where isolation of dispersal-limited species, such as crayfishes, within dendritic river networks often gives rise to high intra- and inter-specific genetic divergence. We apply here a multi-gene molecular approach to investigate relationships among extant species of the crayfish genus Pacifastacus, representing the first comprehensive phylogenetic study of this taxonomic group. Importantly, Pacifastacus includes both the widely invasive signal crayfish Pacifastacus leniusculus, as well as several species of conservation concern like the Shasta crayfish Pacifastacus fortis. Our analysis used 83 individuals sampled across the four extant Pacifastacus species (omitting the extinct Pacifastacus nigrescens), representing the known taxonomic diversity and geographic distributions within this genus as comprehensively as possible. We reconstructed phylogenetic trees from mitochondrial (16S, COI) and nuclear genes (GAPDH), both separately and using a combined or concatenated dataset, and performed several species delimitation analyses (PTP, ABGD, GMYC) on the COI phylogeny to propose Primary Species Hypotheses (PSHs) within the genus. All phylogenies recovered the genus Pacifastacus as monophyletic, within which we identified a range of six to 21 PSHs; more abundant PSH delimitations from GMYC and ABGD were always nested within PSHs delimited by the more conservative PTP method. Pacifastacus leniusculus included the majority of PSHs and was not monophyletic relative to the other Pacifastacus species considered. Several of these highly distinct P. leniusculus PSHs likely require urgent conservation attention. 
Our results identify research needs and conservation priorities for Pacifastacus crayfishes in western North America, and may inform better understanding and management of P. leniusculus in regions where it is invasive, such as Europe and Japan.


1983 ◽  
Vol 38 (1-2) ◽  
pp. 156-158 ◽  
Author(s):  
Geert De Soete

An iterative algorithm for constructing the optimal phylogenetic tree from a given set of dissimilarity data is described. The procedure is applied for illustrative purposes on a data set compiled by Fitch and Margoliash.


Botany ◽  
2012 ◽  
Vol 90 (8) ◽  
pp. 770-779 ◽  
Author(s):  
Annie Archambault ◽  
Martina V. Strömvik

Species of the genus Oxytropis are distributed in the northern hemisphere, especially in alpine and arctic areas. Although comprehensive taxonomic treatments exist for local floras, an understanding of the evolutionary relationships is lacking for the genus as a whole. To determine if different ancestral Oxytropis species colonized the North American Arctic separately, as suggested by taxonomy, we sequenced the nuclear ribosomal internal transcribed spacer (ITS) region from 16 Oxytropis specimens, including four species that were used in a previous transcriptome study. In addition, 81 other Oxytropis ITS sequences were retrieved from public sequence databases and included in the analysis. The whole data set was analyzed using phylogenetic trees and statistical parsimony networks. Results show that all Oxytropis ITS sequences are very similar. Furthermore, at least six lineages evolved from different temperate ancestors to colonize the North American Arctic. This pattern is believed to be typical of the arctic flora. Additionally, the sequence relationship analyses confirm that the subgenus Phacoxytropis may be ancestral in Oxytropis.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Daniel Lichtblau

Abstract. Background: Alignment-free methods of genomic comparison offer the possibility of scaling to large data sets of nucleotide sequences comprising several thousand or more base pairs. Such methods can be used for purposes of deducing “nearby” species in a reference data set, or for constructing phylogenetic trees. Results: We describe one such method that gives quite strong results. We use the Frequency Chaos Game Representation (FCGR) to create images from such sequences. We then reduce dimension, first using a Fourier trig transform, followed by a Singular Values Decomposition (SVD). This gives vectors of modest length. These in turn are used for fast sequence lookup, construction of phylogenetic trees, and classification of virus genomic data. We illustrate the accuracy and scalability of this approach on several benchmark test sets. Conclusions: The tandem of FCGR and dimension reductions using Fourier-type transforms and SVD provides a powerful approach for alignment-free genomic comparison. Results compare favorably with, and often surpass, the best results reported in prior literature. Good scalability is also observed.
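The first stage of this method, turning a sequence into an FCGR image, can be sketched in a few lines. This is a minimal illustration rather than the author's implementation: the corner assignment for A, C, G and T, the choice to skip k-mers containing ambiguous bases, and the function name `fcgr` are assumptions, and the subsequent Fourier-type transform and SVD are not shown.

```python
from typing import List

# Assumed corner layout; published CGR conventions differ in which base goes where.
CORNER = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence: str, k: int = 3) -> List[List[float]]:
    """Return a 2^k x 2^k matrix of k-mer frequencies for `sequence`.

    Each k-mer is mapped to one cell by reading its bases as successive
    binary digits of the column (x) and row (y) indices.
    """
    size = 2 ** k
    counts = [[0] * size for _ in range(size)]
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNER for base in kmer):
            continue  # skip k-mers containing N or other ambiguity codes
        x = y = 0
        for base in kmer:
            bx, by = CORNER[base]
            x = 2 * x + bx
            y = 2 * y + by
        counts[y][x] += 1
        total += 1
    if total == 0:
        return [[0.0] * size for _ in range(size)]
    # Normalise counts to frequencies so sequences of different length are comparable.
    return [[c / total for c in row] for row in counts]


image = fcgr("ACGTACGTTTGACA", k=2)  # a 4 x 4 frequency matrix
```

The resulting matrix has a fixed size regardless of sequence length, which is what makes a later dimension-reduction step applicable to a whole collection of genomes.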


2021 ◽  
Vol 9 (1) ◽  
pp. 191
Author(s):  
Iliana Guardiola-Avila ◽  
Leonor Sánchez-Busó ◽  
Evelia Acedo-Félix ◽  
Bruno Gomez-Gil ◽  
Manuel Zúñiga-Cabrera ◽  
...  

Vibrio mimicus is an emerging pathogen, mainly associated with contaminated seafood consumption. However, little is known about its evolution, biodiversity, and pathogenic potential. This study analyzes the pan-, core, and accessory genomes of nine V. mimicus strains. The core genome yielded 2424 genes in chromosome I (ChI) and 822 genes in chromosome II (ChII), with an accessory genome comprising an average of 10.9% of the whole genome for ChI and 29% for ChII. Core genome phylogenetic trees were obtained, and V. mimicus ATCC-33654 strain was the closest to the outgroup in both chromosomes. Additionally, a phylogenetic study of eight conserved genes (ftsZ, gapA, gyrB, topA, rpoA, recA, mreB, and pyrH), including Vibrio cholerae, Vibrio parilis, Vibrio metoecus, and Vibrio caribbenthicus, clearly showed clade differentiation. The main virulence genes found in ChI corresponded with type I secretion proteins, extracellular components, flagellar proteins, and potential regulators, while, in ChII, the main categories were type-I secretion proteins, chemotaxis proteins, and antibiotic resistance proteins. The accessory genome was characterized by the presence of mobile elements and toxin encoding genes in both chromosomes. Based on the genome atlas, it was possible to characterize differential regions between strains. The pan-genome of V. mimicus encompassed 3539 genes for ChI and 2355 genes for ChII. These results give us an insight into the virulence and gene content of V. mimicus, as well as constitute the first approach to its diversity.
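The pan-, core and accessory genome partition described here reduces to set operations once each strain's genes have been assigned to shared gene families. The sketch below assumes that assignment has already been made; the function name, the strain and gene labels, and the definition of the accessory fraction as the share of a strain's genes lying outside the core are illustrative assumptions, not the study's stated procedure.

```python
from typing import Dict, Set, Tuple


def pan_core_accessory(
    strain_genes: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Dict[str, float]]:
    """Return (pan genome, core genome, accessory fraction per strain).

    pan       = genes present in at least one strain (union)
    core      = genes present in every strain (intersection)
    accessory = for each strain, the fraction of its genes not in the core
    """
    if not strain_genes:
        return set(), set(), {}
    gene_sets = list(strain_genes.values())
    pan = set().union(*gene_sets)
    core = set(gene_sets[0]).intersection(*gene_sets[1:])
    accessory_fraction = {
        strain: (len(genes - core) / len(genes)) if genes else 0.0
        for strain, genes in strain_genes.items()
    }
    return pan, core, accessory_fraction


# Toy example with made-up strain names and gene family labels.
toy_strains = {
    "strain_1": {"geneA", "geneB", "geneC"},
    "strain_2": {"geneA", "geneB", "geneD"},
    "strain_3": {"geneA", "geneB"},
}
pan, core, accessory = pan_core_accessory(toy_strains)
```

In a two-chromosome organism such as this one, the same function could be applied separately to the genes of each chromosome.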


2020 ◽  
Vol 45 (2) ◽  
pp. 403-408 ◽  
Author(s):  
David M. Spooner ◽  
Holly Ruess ◽  
Philipp Simon ◽  
Douglas Senalik

Abstract: We explored the phylogenetic utility of mitochondrial DNA sequences in Daucus and compared the results with prior phylogenetic results using the same 36 accessions of Daucus (and two additional outgroups) with plastid DNA sequences and with other nuclear results. As in the plastid study, we used an Illumina HiSeq sequencer to obtain resequencing data of the same accessions of Daucus and outgroups, and analyzed the data with maximum parsimony and maximum likelihood. We obtained data from 47 of 71 total mitochondrial genes, but only 17 of these 47 genes recovered major clades that were common in prior plastid and nuclear studies. Our phylogenetic trees of the concatenated data set of 47 genes were moderately resolved, with 100% bootstrap support for most of the external and many of the internal clades, except for the clade of D. carota and its most closely related species D. syrticus. There are areas of hard incongruence with phylogenies using plastid and nuclear data. In agreement with other studies, we conclude that mitochondrial sequences are generally poor phylogenetic markers, at least at the genus level, despite their utility in some other studies.

