scholarly journals Population-level genome-wide STR discovery and validation for population structure and genetic diversity assessment of Plasmodium species

PLoS Genetics ◽  
2022 ◽  
Vol 18 (1) ◽  
pp. e1009604
Author(s):  
Jiru Han ◽  
Jacob E. Munro ◽  
Anthony Kocoski ◽  
Alyssa E. Barry ◽  
Melanie Bahlo

Short tandem repeats (STRs) are highly informative genetic markers that have been used extensively in population genetics analysis. They are an important source of genetic diversity and can also have functional impact. Despite the availability of bioinformatic methods that permit large-scale genome-wide genotyping of STRs from whole genome sequencing data, they have not previously been applied to sequencing data from large collections of malaria parasite field samples. Here, we have genotyped STRs using HipSTR in more than 3,000 Plasmodium falciparum and 174 Plasmodium vivax published whole-genome sequence data from samples collected across the globe. High levels of noise and variability in the resultant callset necessitated the development of a novel method for quality control of STR genotype calls. A set of high-quality STR loci (6,768 from P. falciparum and 3,496 from P. vivax) were used to study Plasmodium genetic diversity, population structures and genomic signatures of selection and these were compared to genome-wide single nucleotide polymorphism (SNP) genotyping data. In addition, the genome-wide information about genetic variation and other characteristics of STRs in P. falciparum and P. vivax have been available in an interactive web-based R Shiny application PlasmoSTR (https://github.com/bahlolab/PlasmoSTR).

2021 ◽  
Author(s):  
Jiru Han ◽  
Jacob E Munro ◽  
Anthony Kocoski ◽  
Alyssa E Barry ◽  
Melanie Bahlo

Short tandem repeats (STRs) are highly informative genetic markers that have been used extensively in population genetics analysis. They are an important source of genetic diversity and can also have functional impact. Despite the availability of bioinformatic methods that permit large-scale genome-wide genotyping of STRs from whole genome sequencing data, they have not previously been applied to sequencing data from large collections of malaria parasite field samples. Here, we have genotyped STRs using HipSTR in more than 3,000 Plasmodium falciparum and 174 Plasmodium vivax published whole-genome sequence data from samples collected across the globe. High levels of noise and variability in the resultant callset necessitated the development of a novel method for quality control of STR genotype calls. A set of high-quality STR loci (6,768 from P. falciparum and 3,496 from P. vivax) were used to study Plasmodium genetic diversity, population structures and genomic signatures of selection and these were compared to genome-wide single nucleotide polymorphism (SNP) genotyping data. In addition, the genome-wide information about genetic variation and other characteristics of STRs in P. falciparum and P. vivax have been made available in an interactive web-based R Shiny application PlasmoSTR (https://github.com/bahlolab/PlasmoSTR).


Heredity ◽  
2021 ◽  
Author(s):  
Axel Jensen ◽  
Mette Lillie ◽  
Kristofer Bergström ◽  
Per Larsson ◽  
Jacob Höglund

AbstractThe use of genetic markers in the context of conservation is largely being outcompeted by whole-genome data. Comparative studies between the two are sparse, and the knowledge about potential effects of this methodology shift is limited. Here, we used whole-genome sequencing data to assess the genetic status of peripheral populations of the wels catfish (Silurus glanis), and discuss the results in light of a recent microsatellite study of the same populations. The Swedish populations of the wels catfish have suffered from severe declines during the last centuries and persists in only a few isolated water systems. Fragmented populations generally are at greater risk of extinction, for example due to loss of genetic diversity, and may thus require conservation actions. We sequenced individuals from the three remaining native populations (Båven, Emån, and Möckeln) and one reintroduced population of admixed origin (Helge å), and found that genetic diversity was highest in Emån but low overall, with strong differentiation among the populations. No signature of recent inbreeding was found, but a considerable number of short runs of homozygosity were present in all populations, likely linked to historically small population sizes and bottleneck events. Genetic substructure within any of the native populations was at best weak. Individuals from the admixed population Helge å shared most genetic ancestry with the Båven population (72%). Our results are largely in agreement with the microsatellite study, and stresses the need to protect these isolated populations at the northern edge of the distribution of the species.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Guilherme B. Neumann ◽  
Paula Korkuć ◽  
Danny Arends ◽  
Manuel J. Wolf ◽  
Katharina May ◽  
...  

Abstract Background German Black Pied cattle (DSN) are an endangered dual-purpose breed which was largely replaced by Holstein cattle due to their lower milk yield. DSN cattle are kept as a genetic reserve with a current herd size of around 2500 animals. The ability to track sequence variants specific to DSN could help to support the conservation of DSN’s genetic diversity and to provide avenues for genetic improvement. Results Whole-genome sequencing data of 304 DSN cattle were used to design a customized DSN200k SNP chip harboring 182,154 variants (173,569 SNPs and 8585 indels) based on ten selection categories. We included variants of interest to DSN such as DSN unique variants and variants from previous association studies in DSN, but also variants of general interest such as variants with predicted consequences of high, moderate, or low impact on the transcripts and SNPs from the Illumina BovineSNP50 BeadChip. Further, the selection of variants based on haplotype blocks ensured that the whole-genome was uniformly covered with an average variant distance of 14.4 kb on autosomes. Using 300 DSN and 162 animals from other cattle breeds including Holstein, endangered local cattle populations, and also a Bos indicus breed, performance of the SNP chip was evaluated. Altogether, 171,978 (94.31%) of the variants were successfully called in at least one of the analyzed breeds. In DSN, the number of successfully called variants was 166,563 (91.44%) while 156,684 (86.02%) were segregating at a minor allele frequency > 1%. The concordance rate between technical replicates was 99.83 ± 0.19%. Conclusion The DSN200k SNP chip was proved useful for DSN and other Bos taurus as well as one Bos indicus breed. It is suitable for genetic diversity management and marker-assisted selection of DSN animals. Moreover, variants that were segregating in other breeds can be used for the design of breed-specific customized SNP chips. This will be of great value in the application of conservation programs for endangered local populations in the future.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5895 ◽  
Author(s):  
Thomas Andreas Kohl ◽  
Christian Utpatel ◽  
Viola Schleusener ◽  
Maria Rosaria De Filippo ◽  
Patrick Beckert ◽  
...  

Analyzing whole-genome sequencing data of Mycobacterium tuberculosis complex (MTBC) isolates in a standardized workflow enables both comprehensive antibiotic resistance profiling and outbreak surveillance with highest resolution up to the identification of recent transmission chains. Here, we present MTBseq, a bioinformatics pipeline for next-generation genome sequence data analysis of MTBC isolates. Employing a reference mapping based workflow, MTBseq reports detected variant positions annotated with known association to antibiotic resistance and performs a lineage classification based on phylogenetic single nucleotide polymorphisms (SNPs). When comparing multiple datasets, MTBseq provides a joint list of variants and a FASTA alignment of SNP positions for use in phylogenomic analysis, and identifies groups of related isolates. The pipeline is customizable, expandable and can be used on a desktop computer or laptop without any internet connection, ensuring mobile usage and data security. MTBseq and accompanying documentation is available from https://github.com/ngs-fzb/MTBseq_source.


2021 ◽  
Vol 118 (20) ◽  
pp. e2101056118
Author(s):  
Danang Crysnanto ◽  
Alexander S. Leonard ◽  
Zih-Hua Fang ◽  
Hubert Pausch

Many genomic analyses start by aligning sequencing reads to a linear reference genome. However, linear reference genomes are imperfect, lacking millions of bases of unknown relevance and are unable to reflect the genetic diversity of populations. This makes reference-guided methods susceptible to reference-allele bias. To overcome such limitations, we build a pangenome from six reference-quality assemblies from taurine and indicine cattle as well as yak. The pangenome contains an additional 70,329,827 bases compared to the Bos taurus reference genome. Our multiassembly approach reveals 30 and 10.1 million bases private to yak and indicine cattle, respectively, and between 3.3 and 4.4 million bases unique to each taurine assembly. Utilizing transcriptomes from 56 cattle, we show that these nonreference sequences encode transcripts that hitherto remained undetected from the B. taurus reference genome. We uncover genes, primarily encoding proteins contributing to immune response and pathogen-mediated immunomodulation, differentially expressed between Mycobacterium bovis–infected and noninfected cattle that are also undetectable in the B. taurus reference genome. Using whole-genome sequencing data of cattle from five breeds, we show that reads which were previously misaligned against the Bos taurus reference genome now align accurately to the pangenome sequences. This enables us to discover 83,250 polymorphic sites that segregate within and between breeds of cattle and capture genetic differentiation across breeds. Our work makes a so-far unused source of variation amenable to genetic investigations and provides methods and a framework for establishing and exploiting a more diverse reference genome.


2021 ◽  
Vol 7 (3) ◽  
pp. 214
Author(s):  
Rory M. Welsh ◽  
Elizabeth Misas ◽  
Kaitlin Forsberg ◽  
Meghan Lyman ◽  
Nancy A. Chow

Candida auris is a multidrug-resistant pathogen that represents a serious public health threat due to its rapid global emergence, increasing incidence of healthcare-associated outbreaks, and high rates of antifungal resistance. Whole-genome sequencing and genomic surveillance have the potential to bolster C. auris surveillance networks moving forward. Laboratories conducting genomic surveillance need to be able to compare analyses from various national and international surveillance partners to ensure that results are mutually trusted and understood. Therefore, we established an empirical outbreak benchmark dataset consisting of 23 C. auris genomes to help validate comparisons of genomic analyses and facilitate communication among surveillance networks. Our outbreak benchmark dataset represents a polyclonal phylogeny with three subclades. The genomes in this dataset are from well-vetted studies that are supported by multiple lines of evidence, which demonstrate that the whole-genome sequencing data, phylogenetic tree, and epidemiological data are all in agreement. This C. auris benchmark set allows for standardized comparisons of phylogenomic pipelines, ultimately promoting effective C. auris collaborations.


2015 ◽  
Author(s):  
David J Winter ◽  
Maria A Pacheco ◽  
Andres Felipe A Vallejo ◽  
Rachel S. Schwartz ◽  
Myriam Arevalo-Herrera ◽  
...  

Plasmodium vivax is the most prevalent malarial species in South America and exerts a substantial burden on the populations is affects. Its control and eventual elimination are a global health priority. Genomic research contributes to this objective by improving our understanding of the biology of P. vivax and through the development of new genetic markers that can be used to monitor efforts to reduce malaria transmission. Here we analyze whole genome data from eight field samples from a region in Cord ́ oba, Colombia where malaria is endemic. We find considerable genetic diversity within this population, a result that contrasts with earlier studies suggesting that P. vivax had limited diversity in the Americas. We also identify a selective sweep around a substitution known to confer resistance to sulphadoxine-pyrimethamine (SP). This is the first observation of a selective sweep for SP resistance in this parasite. These results indicate that P. vivax has been exposed to SP pressure even when the drug is not in use as a first line treatment for patients afflicted by this parasite. We identify multiple non-synonymous substitutions in three other genes known to be involved with drug resistance in Plasmodium species. Finally, we found extensive microsatellite polymorphisms. Using this information we developed 18 microsatellite loci that are polymorphic and easy to score and can thus be used in epidemiological investigations in South America.


2016 ◽  
Author(s):  
Thomas Willems ◽  
Dina Zielinski ◽  
Assaf Gordon ◽  
Melissa Gymrek ◽  
Yaniv Erlich

AbstractShort tandem repeats (STRs) are highly variable elements that play a pivotal role in multiple genetic diseases, population genetics applications, and forensic casework. However, STRs have proven problematic to genotype from high-throughput sequencing data. Here, we describe HipSTR, a novel haplotype-based method for robustly genotyping, haplotyping, and phasing STRs from whole genome sequencing data and report a genome-wide analysis and validation of de novo STR mutations.


Sign in / Sign up

Export Citation Format

Share Document