Minimal clustering and species delimitation based on multi-locus alignments vs SNPs: the case of the Seriphium plumosum L. complex (Gnaphalieae: Asteraceae)

2021
Author(s): Zaynab Shaik, Nicola Georgina Bergh, Bengt Oxelman, Anthony George Verboom

We applied species delimitation methods based on the Multi-Species Coalescent (MSC) model to 500+ loci derived from genotyping-by-sequencing of the South African Seriphium plumosum (Asteraceae) species complex. The loci were represented either as multiple sequence alignments or as single nucleotide polymorphisms (SNPs), and analysed with the STACEY and Bayes Factor Delimitation (BFD)/SNAPP methods, respectively. Both methods supported species taxonomies in which virtually all of the 32 sampled individuals, each representing its own geographical population, were identified as separate species. The computational effort required to achieve adequate mixing of MCMC chains was considerable, and the species/minimal cluster trees identified similar strongly supported clades in replicate runs. The resolution was, however, higher in the STACEY trees than in the SNAPP trees, which is consistent with the higher information content of full sequences. Computational efficiency, measured as effective sample sizes of likelihood and posterior estimates per unit time, was consistently higher for STACEY. A random subset of 56 alignments had similar resolution to the 524-locus SNP data set. The STRUCTURE-like sparse Non-negative Matrix Factorisation (sNMF) method was applied to 28,023 SNPs from six individuals from each of 48 geographical populations. This analysis identified significantly fewer clusters (13) as optimal than the MSC methods did. The sNMF clusters correspond closely to clades consistently supported by the MSC methods and show evidence of admixture, especially in the western Cape Floristic Region. We discuss the significance of these findings and conclude that, when using genome-scale data, it is important to consider a priori the kind of species one wants to identify, the assumptions behind the parametric models applied, and the potential consequences that violations of those models may have.
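The efficiency metric above is effective sample size (ESS) per unit time. Below is a minimal Python sketch of that metric. It assumes a 1-D trace of logged values (for example the likelihood or posterior column of a log file) and a known wall-clock runtime. The estimator sums autocorrelations up to the first non-positive lag, which is only one simple choice. Tools such as Tracer use their own estimators, so their numbers will differ somewhat. The function names are hypothetical.

```python
import numpy as np


def effective_sample_size(trace):
    """Estimate the ESS of a 1-D MCMC trace from its autocorrelation.

    Autocorrelations are summed until the first non-positive lag, a simple
    truncation rule; the chain must not be constant.
    """
    x = np.asarray(trace, dtype=float)
    n = x.size
    x = x - x.mean()
    var = np.dot(x, x) / n
    if var == 0.0:
        raise ValueError("constant trace: ESS is undefined")
    rho_sum = 0.0
    for lag in range(1, n):
        # lag-k autocorrelation, normalised by the lag-0 variance
        rho = np.dot(x[:-lag], x[lag:]) / (n * var)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def ess_per_hour(trace, runtime_seconds):
    """Computational efficiency: ESS gained per hour of wall-clock time."""
    return effective_sample_size(trace) / (runtime_seconds / 3600.0)
```

You could apply `ess_per_hour` to the likelihood and posterior traces of two runs with different methods. That would give a comparison of the same kind as the STACEY-versus-SNAPP efficiency result, though it is not the authors' exact procedure.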

2016
Author(s): Sereina Rutschmann, Harald Detering, Sabrina Simon, Jakob Fredslund, Michael T. Monaghan

High-throughput sequencing has laid the foundation for fast and cost-effective development of phylogenetic markers. Here we present the program DISCOMARK, which streamlines the development of nuclear DNA (nDNA) markers from whole-genome (or whole-transcriptome) sequencing data. Starting from input orthologous sequences, it combines local alignment, alignment trimming, reference mapping and primer design based on multiple sequence alignments to design primer pairs. To demonstrate the suitability of DISCOMARK, we designed markers for two groups: one of closely related species and one of distantly related species. For the closely related members of the Cloeon dipterum s.l. species complex (Insecta, Ephemeroptera), the program discovered a total of 78 markers, of which we selected eight for amplification and Sanger sequencing. The exon sequence alignments (2,526 base pairs (bp)) were used to reconstruct a well-supported phylogeny and to infer clearly structured haplotype networks. For the distantly related species, we designed primers for several families in the insect order Ephemeroptera, using available genomic data from four sequenced species, and developed primer pairs for 23 markers designed to amplify across several families. DISCOMARK will enhance the development of new nDNA markers by providing a streamlined, automated approach to genome-scale scans for phylogenetic markers. The program is written in Python, released under a public license (GNU GPL v2), and available together with a manual and an example data set at: https://github.com/hdetering/discomark.
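The primer-design step works on multiple sequence alignments. Below is a simplified Python sketch of that idea. It assumes that good primer sites are gap-free, fully conserved windows flanking variable columns. This is not DISCOMARK's implementation, and the function names and size limits are hypothetical. Real primer design also checks melting temperature, GC content and secondary structure. The reverse primer would be the reverse complement of the downstream window.

```python
def column_states(alignment):
    """For each alignment column, the set of characters observed."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    return [set(column) for column in zip(*alignment)]


def conserved_windows(alignment, primer_len):
    """Start positions of windows whose columns are all identical and gap-free."""
    states = column_states(alignment)
    conserved = [len(s) == 1 and "-" not in s for s in states]
    return [
        start
        for start in range(len(states) - primer_len + 1)
        if all(conserved[start:start + primer_len])
    ]


def candidate_amplicons(alignment, primer_len=20, min_size=300, max_size=800):
    """(start, end) spans bounded by two non-overlapping conserved windows,
    within the size range, with at least one variable column between them."""
    states = column_states(alignment)
    variable = [len(s) > 1 for s in states]
    starts = conserved_windows(alignment, primer_len)
    pairs = []
    for fwd in starts:
        for rev in starts:
            end = rev + primer_len  # exclusive end of the amplicon
            if rev < fwd + primer_len or not (min_size <= end - fwd <= max_size):
                continue
            # keep only amplicons that carry phylogenetic signal
            if any(variable[fwd + primer_len:rev]):
                pairs.append((fwd, end))
    return pairs
```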


Bothalia, 2007, Vol 37 (1), pp. 89-102
Author(s): M. M. Zietsman, G. J. Bredenkamp

The vegetation of the inland plains and hills of the Andrew’s Field and Tsaba-Tsaba Nature Reserve, Bredasdorp District, Western Cape, was classified using TWINSPAN and Braun-Blanquet procedures. The resulting four plant communities and nine subcommunities were described and interpreted ecologically. The vegetation was sampled using 97 stratified random plots. The floristic composition, the Braun-Blanquet cover-abundance of each species, and various environmental variables were recorded in each sample plot. The relationship between the vegetation units and the associated environmental gradients was confirmed by ordination of the floristic data set with the DECORANA computer program. The conservation priority of each vegetation unit was determined by taking into consideration the occurrence of Red Data List species, limestone endemic species and Cape Floristic Region endemic species. The distribution of the plant communities can mainly be ascribed to differences in the clay/sand content of the soil and to the degree of exposure of the vegetation to the prevailing winds (southeastern and northwestern) of the area.
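DECORANA performs detrended correspondence analysis (DCA). As a hedged illustration of the underlying ordination only, below is plain correspondence analysis of a plots-by-species table, computed with a singular value decomposition in NumPy. The detrending and rescaling steps that DECORANA adds are not shown. The sketch assumes that cover-abundance values have already been converted to non-negative numbers, and the function name is hypothetical.

```python
import numpy as np


def correspondence_analysis(table):
    """Plain correspondence analysis of a plots x species abundance table.

    Returns (inertia per axis, plot scores, species scores) in principal
    coordinates. Every plot and species must have a non-zero total.
    """
    X = np.asarray(table, dtype=float)
    P = X / X.sum()
    r = P.sum(axis=1)  # row (plot) masses
    c = P.sum(axis=0)  # column (species) masses
    # standardized residuals from the independence model r_i * c_j
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sv ** 2  # eigenvalue (principal inertia) of each axis
    plot_scores = (U * sv) / np.sqrt(r)[:, None]
    species_scores = (Vt.T * sv) / np.sqrt(c)[:, None]
    return inertia, plot_scores, species_scores
```

The total inertia equals the chi-square statistic of the table divided by its grand total. The first axes summarize the main floristic gradient, which can then be compared with environmental variables such as the soil clay/sand content.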


2021
Author(s): Bui Quang Minh, Cuong Cao Dang, Le Sy Vinh, Robert Lanfear

Amino acid substitution models play a crucial role in phylogenetic analyses. Maximum likelihood (ML) methods have been proposed to estimate amino acid substitution models; however, they are typically complicated and slow. In this article, we propose QMaker, a new ML method to estimate a general time-reversible Q matrix from a large protein data set consisting of multiple sequence alignments. QMaker combines an efficient ML tree search algorithm, model selection to handle model heterogeneity among alignments, and rate mixture models across sites. We provide QMaker as a user-friendly function in the IQ-TREE software package (http://www.iqtree.org) that supports multiple CPU cores, so that biologists can easily estimate amino acid substitution models from their own protein alignments. We used QMaker to estimate new empirical general amino acid substitution models from the current Pfam database, as well as five clade-specific models for mammals, birds, insects, yeasts, and plants. Our results show that the new models considerably improve the fit between model and data and in some cases influence the inference of phylogenetic tree topologies. [Amino acid replacement matrices; amino acid substitution models; maximum likelihood estimation; phylogenetic inferences.]
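In a general time-reversible (GTR) model, the off-diagonal rates factor as Q_ij = s_ij π_j. Here s_ij are symmetric exchangeabilities and π are the stationary frequencies. Below is a minimal NumPy/SciPy sketch of that parameterization. It normalizes the matrix to one expected substitution per unit time and obtains transition probabilities as P(t) = exp(Qt). It shows only how such a matrix is assembled, not QMaker's ML estimation of s and π, and the function name is hypothetical.

```python
import numpy as np
from scipy.linalg import expm


def gtr_rate_matrix(exchangeabilities, freqs):
    """Build a normalized time-reversible rate matrix Q from a symmetric
    exchangeability matrix S and stationary frequencies pi."""
    S = np.asarray(exchangeabilities, dtype=float)
    pi = np.asarray(freqs, dtype=float)
    if not np.allclose(S, S.T):
        raise ValueError("exchangeabilities must be symmetric")
    Q = S * pi[None, :]                   # Q_ij = s_ij * pi_j for i != j
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))   # each row sums to zero
    rate = -np.dot(pi, np.diag(Q))        # mean substitution rate at equilibrium
    return Q / rate
```

For amino acids, S is 20 x 20 with 190 free exchangeabilities and π has 20 entries. `expm(Q * t)` then gives the substitution probabilities along a branch of length t.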


2021, Vol 42 (5), pp. 1320-1329
Author(s): S. Chakravarty, K.G. Padwal, C.P. Srivastava, ...

Aim: The present study was undertaken to explore the genetic diversity among Helicoverpa armigera populations from different geographic regions of India using mitochondrial cytochrome c oxidase I (COI) gene fragments. Methodology: Larval specimens of H. armigera collected from 20 locations were subjected to DNA extraction, PCR amplification of the target gene, sequencing and multiple sequence alignment. Results: Based on COI sequence data, high levels of genetic differentiation were detected among some H. armigera populations, but the divergence was not high enough to delineate them as separate species. The Indian population as a whole showed similarity with the global genetic assemblage. Significantly negative neutrality test indices and a unimodal mismatch distribution further supported a past demographic expansion of this insect. The phylogenetic tree and median-joining haplotype network indicated that genetic similarity was not related to the geographic proximity of populations. Interpretation: The genetic analyses indicate considerable subspecific-level variation among H. armigera populations of India. However, there are no unidentified cryptic species of H. armigera in the country.
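Neutrality indices and mismatch distributions like those above can be computed directly from aligned sequences. Below is a minimal pure-Python sketch of Tajima's D, one widely used neutrality statistic (the abstract does not say which indices were computed), together with the mismatch distribution. It assumes gap-free aligned sequences. Significance testing, for example by coalescent simulation, is not shown, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D and the mismatch distribution for aligned, gap-free sequences.

    Returns (D, mismatch), where mismatch maps a number of pairwise
    differences to how many sequence pairs show it.
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences (variance terms vanish below that)")
    # pairwise differences: pi is their mean, their histogram is the mismatch distribution
    diffs = [sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)]
    pi = sum(diffs) / len(diffs)
    # number of segregating (polymorphic) sites
    seg = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg == 0:
        raise ValueError("no segregating sites: D is undefined")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    d = (pi - seg / a1) / sqrt(e1 * seg + e2 * seg * (seg - 1))
    return d, Counter(diffs)
```

A negative D reflects an excess of rare variants relative to the neutral expectation. This is consistent with, though not proof of, population expansion.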


2017 ◽  
Vol 3 (1) ◽  
pp. 206-211
Author(s):  
Melody N. Hemmati-Sholeh ◽  
Larry A. Sholeh ◽  
David A. Lightfoot

Identification of single nucleotide polymorphisms (SNPs) and insertion-deletion mutations (indels) is important for discovering the connection between genetic mutations and complex diseases. The objective of this study was to develop a sensitive and accurate computational method for SNP detection among multiple sequence alignments (MSAs) that runs on Microsoft Office Suite™ and Windows™. The SNP-Evaluator (SNP-E) was designed to simulate the process of visual change identification by the human eye. Analysis of three 82-kbp genomic loci derived from Sanger sequencing, and of the corresponding SNPs from 31 soybean (Glycine max L. Merr.) genomes from Illumina™ sequencing, demonstrated that SNP-E was an effective method for medium-scale genomic research.
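Detecting SNPs and indels in an MSA amounts to comparing the characters in each alignment column. Below is a minimal Python sketch of that column-wise comparison. It is not SNP-E's implementation, which runs within Microsoft Office. It assumes "-" marks gaps and "N" marks missing data, and the function name is hypothetical.

```python
def classify_columns(alignment, gap="-", missing="N"):
    """Return (snp_columns, indel_columns) as 0-based column indices.

    A column is an SNP if it holds two or more different bases and an indel
    if it mixes gaps with bases; a column can be both. Missing-data symbols
    are ignored, and comparison is case-insensitive.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    snps, indels = [], []
    for i, column in enumerate(zip(*alignment)):
        chars = [ch.upper() for ch in column]
        bases = {ch for ch in chars if ch not in (gap, missing)}
        if gap in chars and bases:
            indels.append(i)
        if len(bases) > 1:
            snps.append(i)
    return snps, indels
```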


PeerJ, 2017, Vol 5, pp. e3724
Author(s): Ana C. Afonso Silva, Natali Santos, Huw A. Ogilvie, Craig Moritz

While methods for genetic species delimitation have noticeably improved in the last decade, this remains a work in progress. Ideally, model-based approaches should be applied and considered jointly with other lines of evidence, primarily morphology and geography, in an integrative taxonomy framework. Deep phylogeographic divergences have been reported for several species of Carlia skinks, but species boundaries have been formally tested only for some eastern taxa. The present study does this and revises the taxonomy of two species from northern Australia, Carlia johnstonei and C. triacantha. We introduce an approach based on the recently published method StarBEAST2, which uses multilocus data to explore the support for alternative species delimitation hypotheses using Bayes factors (BFD). We apply this method, jointly with two other multispecies coalescent methods, to an extensive data set (from 2,163 exons) along with measurements of 11 morphological characters. We use this integrated approach to evaluate two new candidate species previously revealed in phylogeographic analyses of rainbow skinks (genus Carlia) in Western Australia. The results of BFD StarBEAST2, BFD* SNAPP and BPP genetic delimitation, together with morphology, support each of the four recently identified Carlia lineages as separate species. The BFD StarBEAST2 approach yielded results highly congruent with those from BFD* SNAPP and BPP. This supports the use of the robust multilocus multispecies coalescent method StarBEAST2 for species delimitation, which does not require a priori resolved species or gene trees. Compared with C. triacantha, morphological divergence was greater between the two lineages within the Kimberley endemic C. johnstonei, which also have more deeply divergent histories. This congruence supports recognition of two species within C. johnstonei. Nevertheless, the combined evidence also supports recognition of two taxa within the more widespread C. triacantha.
With this work, we describe two new species, Carlia insularis sp. nov. and Carlia isostriacantha sp. nov., in the northwest of Australia. This adds to the growing recognition that this region of tropical Australia has a rich and unique fauna.
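Bayes factor delimitation (BFD) compares competing species hypotheses by their marginal likelihoods. In BEAST 2 these are typically estimated by path sampling or stepping-stone sampling. Below is a minimal sketch of the final comparison step, assuming the log marginal likelihoods have already been estimated. The model names and values are hypothetical, not results from the study. The verbal scale is the commonly used one of Kass and Raftery (1995).

```python
def rank_by_bayes_factor(log_marginal_likelihoods):
    """Rank models by log marginal likelihood and return, for each model,
    2 ln BF of the best-supported model against it (0 for the best model)."""
    ranked = sorted(log_marginal_likelihoods.items(), key=lambda kv: kv[1], reverse=True)
    best = ranked[0][1]
    return [(name, 2.0 * (best - lml)) for name, lml in ranked]


def kass_raftery_label(two_ln_bf):
    """Verbal evidence scale of Kass and Raftery (1995) for 2 ln BF."""
    if two_ln_bf < 2:
        return "not worth more than a bare mention"
    if two_ln_bf < 6:
        return "positive"
    if two_ln_bf < 10:
        return "strong"
    return "very strong"


# Hypothetical log marginal likelihoods for three delimitation models
models = {"lump_all": -10520.4, "split_johnstonei": -10498.1, "split_both": -10490.7}
for name, bf in rank_by_bayes_factor(models)[1:]:
    print(f"best vs {name}: 2lnBF = {bf:.1f} ({kass_raftery_label(bf)})")
```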


2021, Vol 21 (1)
Author(s): Elena N. Judd, Alison R. Gilchrist, Nicholas R. Meyerson, Sara L. Sawyer

Background: The Type I interferon response is an important first-line defense against viruses. In turn, viruses antagonize (e.g., degrade or mis-localize) many proteins in interferon pathways. Thus, hosts and viruses are locked in an evolutionary arms race for dominance of the Type I interferon pathway. As a result, many genes in interferon pathways have experienced positive natural selection in favor of new allelic forms that can better recognize viruses or escape viral antagonists. Here, we performed a holistic analysis of selective pressures acting on genes in the Type I interferon family. We initially hypothesized that the genes responsible for inducing the production of interferon would be antagonized more heavily by viruses than genes that are turned on as a result of interferon. Our logic was that viruses would have a greater effect if they worked upstream of the production of interferon molecules because, once interferon is produced, hundreds of interferon-stimulated proteins would be activated and the virus would need to counteract them one by one.
Results: We curated multiple sequence alignments of primate orthologs for 131 genes active in interferon production and signaling (herein, “induction” genes), 100 interferon-stimulated genes, and 100 randomly chosen genes. We analyzed each multiple sequence alignment for signatures of recurrent positive selection. Contrary to our hypothesis, we found that interferon-stimulated genes, and not interferon induction genes, are evolving significantly more rapidly than a random set of genes. Interferon induction genes evolve in a way that is indistinguishable from a matched set of random genes (22% and 18% of genes bear signatures of positive selection, respectively). In contrast, interferon-stimulated genes evolve differently, with 33% of genes evolving under positive selection and containing a significantly higher fraction of codons that have experienced selection for recurrent replacement of the encoded amino acid.
Conclusion: Viruses may antagonize individual products of the interferon response more often than they try to neutralize the system altogether.
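The abstract does not say which test was used to detect recurrent positive selection. One common approach, assumed here, compares codon site models in PAML's codeml: M7 (no positively selected class) against M8 (with a class allowing dN/dS > 1). The comparison uses a likelihood-ratio test with two degrees of freedom. Below is a minimal sketch of that test. The log-likelihoods are hypothetical, and the function name is illustrative.

```python
from math import exp


def lrt_two_df(lnl_null, lnl_alt):
    """Likelihood-ratio test for nested models differing by two parameters.

    Returns (statistic, p_value). For a chi-square distribution with two
    degrees of freedom the upper-tail probability is exactly exp(-x / 2).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))  # clip tiny negative optimizer noise
    return stat, exp(-stat / 2.0)


# Hypothetical codeml-style log-likelihoods for one gene: M7 (null) vs M8 (alternative)
stat, p = lrt_two_df(lnl_null=-5234.8, lnl_alt=-5228.1)
print(f"2dlnL = {stat:.1f}, p = {p:.4f}")
```

Applying such a test gene by gene and counting the significant genes in each set gives percentages of the kind reported in the Results.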


2021, Vol 11 (1)
Author(s): Juan C. Muñoz-Escalante, Andreu Comas-García, Sofía Bernal-Silva, Daniel E. Noyola

Respiratory syncytial virus (RSV) is a major cause of respiratory infections and is classified into two main groups, RSV-A and RSV-B, each containing multiple genotypes. For RSV-B, more than 30 genotypes have been described, without consensus on their definition. The lack of genotype assignment criteria directly affects our understanding of viral evolution, the development of viral detection methods, and vaccine design. Here we analyzed all complete RSV-B G gene ectodomain sequences published in GenBank up to September 2018 (n = 2190), including 478 complete genome sequences. We used maximum likelihood and Bayesian phylogenetic analyses, as well as intergenotypic and intragenotypic distance matrices, to generate a systematic genotype assignment. Individual RSV-B genes were also assessed using maximum likelihood phylogenetic analyses, and multiple sequence alignments were used to identify molecular markers associated with specific genotypes. Analyses of the complete G gene ectodomain region, sequence clustering patterns, and the presence of molecular markers in each individual gene indicate that the 37 previously described genotypes can be classified into fifteen distinct genotypes: BA, BA-C, BA-CC, CB1-THB, GB1-GB4, GB6, JAB1-NZB2, SAB1, SAB2, SAB4, URU2 and a novel early-circulating genotype characterized in the present study and designated GB0.
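The genotype assignment above relies partly on intragenotypic and intergenotypic distance matrices. Below is a minimal sketch of that distance step. It assumes uncorrected p-distances on aligned sequences, with gap columns skipped pairwise (the abstract does not specify the distance model). The toy sequences and genotype labels are hypothetical. The study's criteria also used phylogenetic clustering and molecular markers, which are not shown.

```python
from itertools import combinations
from statistics import mean


def p_distance(a, b, gap="-"):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    sites = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def intra_inter_distances(seqs, labels):
    """Mean pairwise p-distance within and between genotype labels."""
    intra, inter = [], []
    for i, j in combinations(range(len(seqs)), 2):
        d = p_distance(seqs[i], seqs[j])
        (intra if labels[i] == labels[j] else inter).append(d)
    return mean(intra), mean(inter)
```

A genotype definition is better supported when its mean intragenotypic distance is clearly smaller than the intergenotypic distance to other genotypes.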

