SPEARS: Standard Performance Evaluation of Ancestral Reconstruction through Simulation

2020 ◽  
Author(s):  
H. Manching ◽  
R. J. Wisser

Motivation Ancestral haplotype maps provide useful information about genomic variation and biological processes. Reconstructing the descendent haplotype structure of homologous chromosomes, particularly for large numbers of individuals, can help with characterizing the recombination landscape, elucidating genotype-to-phenotype relationships, improving genomic predictions and more. Inferring haplotype maps from sparse genotype data is an efficient approach to whole-genome haplotyping, but this is a non-trivial problem. A standardized approach is needed to validate whether haplotype reconstruction software, conceived population designs and existing data for a given population provide accurate haplotype information for further inference. Results We introduce SPEARS, a pipeline for simulation-based appraisal of genome-wide ancestral haplotype inference. The pipeline generates virtual genotypes (truth data) with real-world missing data structure. It then proceeds to mimic analysis in practice, capturing sources of error due to imputation and reconstruction of ancestral haplotypes. Standard metrics allow researchers to assess which features of haplotype structure or regions of the genome are sufficiently accurate for analysis and reporting. Haplotype maps for 1,000 outcross progeny from a multi-parent population of maize are used to demonstrate SPEARS. Availability https://github.com/maizeatlas/spears

Author(s):  
Heather Manching ◽  
Randall J Wisser

Abstract Motivation Ancestral haplotype maps provide useful information about genomic variation and insights into biological processes. Reconstructing the descendent haplotype structure of homologous chromosomes, particularly for large numbers of individuals, can help with characterizing the recombination landscape, elucidating genotype-to-phenotype relationships, improving genomic predictions and more. Inferring haplotype maps from sparse genotype data is an efficient approach to whole-genome haplotyping, but this is a non-trivial problem. A standardized approach is needed to validate whether haplotype reconstruction software, conceived population designs and existing data for a given population provide accurate haplotype information for further inference. Results We introduce SPEARS, a pipeline for the simulation-based appraisal of genome-wide haplotype maps constructed from sparse genotype data. Using a specified pedigree, the pipeline generates virtual genotypes (known data) with genotyping errors and missing data structure. It then proceeds to mimic analysis in practice, capturing sources of error due to genotyping, imputation and haplotype inference. Standard metrics allow researchers to assess different population designs and which features of haplotype structure or regions of the genome are sufficiently accurate for analysis. Haplotype maps for 1000 outcross progeny from a multi-parent population of maize are used to demonstrate SPEARS. Availability and implementation SPEARS, the protocol and suite of scripts, are publicly available under an MIT license at GitHub (https://github.com/maizeatlas/spears). Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Francisco J. Esteban ◽  
Peter J. Tonellato ◽  
Dennis P. Wall

Abstract The genetic heterogeneity of autism has stymied the search for causes and cures. Even whole-genomic studies on large numbers of families have yielded results of relatively little impact. In the present work, we analyze two genomic databases using a novel strategy that takes prior knowledge of genetic relationships into account and that was designed to boost signal important to our understanding of the molecular basis of autism. Our strategy was designed to identify significant genomic variation within a priori defined biological concepts and improves signal detection while lessening the severity of multiple test correction seen in standard analysis of genome-wide association data. Upon application of our approach using 3,244 biological concepts, we detected genomic variation in 68 biological concepts with significant association to autism in comparison to family-based controls. These concepts clustered naturally into a total of 19 classes, principally including cell adhesion, cancer, and immune response. The top-ranking concepts contained high percentages of genes already suspected to play roles in autism or in a related neurological disorder. In addition, many of the sets associated with autism at the DNA level also proved to be predictive of changes in gene expression within a separate population of autistic cases, suggesting that the signature of genomic variation may also be detectable in blood-based transcriptional profiles. This robust cross-validation with gene expression data from individuals with autism coupled with the enrichment within autism-related neurological disorders supported the possibility that the mutations play important roles in the onset of autism and should be given priority for further study. In sum, our work provides new leads into the genetic underpinnings of autism and highlights the importance of reanalysis of genomic studies of complex disease using prior knowledge of genetic organization.
Author Summary The genetic heterogeneity of autism has stymied the search for causes and cures. Even whole-genomic studies on large numbers of families have yielded results of relatively little impact. In the present work, we reanalyze two of the most influential whole-genomic studies using a novel strategy that takes prior knowledge of genetic relationships into account in an effort to boost signal important to our understanding of the molecular structure of autism. Our approach demonstrates that these genome-wide association studies contain more information relevant to autism than previously realized. We detected 68 highly significant collections of mutations that map to genes with measurable and significant changes in gene expression in autistic individuals, and that have been implicated in other neurological disorders believed to be closely related, and genetically linked, to autism. Our work provides leads into the genetic underpinnings of autism and highlights the importance of reanalysis of genomic studies of disease using prior knowledge of genetic organization.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Kelly B. Klingler ◽  
Joshua P. Jahner ◽  
Thomas L. Parchman ◽  
Chris Ray ◽  
Mary M. Peacock

Abstract Background Distributional responses by alpine taxa to repeated glacial-interglacial cycles throughout the last two million years have significantly influenced the spatial genetic structure of populations. These effects have been exacerbated for the American pika (Ochotona princeps), a small alpine lagomorph constrained by thermal sensitivity and a limited dispersal capacity. For this species of conservation concern, long-term lack of gene flow has important consequences for landscape genetic structure and levels of diversity within populations. Here, we use reduced representation sequencing (ddRADseq) to provide a genome-wide perspective on patterns of genetic variation across pika populations representing distinct subspecies. To investigate how landscape and environmental features shape genetic variation, we collected genetic samples from distinct geographic regions as well as across finer spatial scales in two geographically proximate mountain ranges of eastern Nevada. Results Our genome-wide analyses corroborate range-wide, mitochondrial subspecific designations and reveal pronounced fine-scale population structure between the Ruby Mountains and East Humboldt Range of eastern Nevada. Populations in Nevada were characterized by low genetic diversity (π = 0.0006–0.0009; θW = 0.0005–0.0007) relative to populations in California (π = 0.0014–0.0019; θW = 0.0011–0.0017) and the Rocky Mountains (π = 0.0025–0.0027; θW = 0.0021–0.0024), indicating substantial genetic drift in these isolated populations. Tajima’s D was positive for all sites (D = 0.240–0.811), consistent with recent contraction in population sizes range-wide. Conclusions Substantial influences of geography, elevation and climate variables on genetic differentiation were also detected and may interact with the regional effects of anthropogenic climate change to force the loss of unique genetic lineages through continued population extirpations in the Great Basin and Sierra Nevada.


GigaScience ◽  
2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Taras K Oleksyk ◽  
Walter W Wolfsberger ◽  
Alexandra M Weber ◽  
Khrystyna Shchubelka ◽  
Olga T Oleksyk ◽  
...  

Abstract Background The main goal of this collaborative effort is to provide genome-wide data for the previously underrepresented population in Eastern Europe, and to provide cross-validation of the data from genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from individuals representing major regions of Ukraine who consented to public data release. BGISEQ-500 sequence data and genotypes by an Illumina GWAS chip were cross-validated on multiple samples and additionally referenced to 1 sample that has been resequenced by Illumina NovaSeq6000 S4 at high coverage. Results The genome data have been searched for genomic variation represented in this population, and a number of variants have been reported: large structural variants, indels, copy number variations, single-nucleotide polymorphisms, and microsatellites. To our knowledge, this study provides the largest survey to date of genetic variation in Ukraine, creating a public reference resource aiming to provide data for medical research in a large understudied population. Conclusions Our results indicate that the genetic diversity of the Ukrainian population is uniquely shaped by evolutionary and demographic forces and cannot be ignored in future genetic and biomedical studies. These data will contribute a wealth of new information, bringing forth novel, endemic and medically related alleles.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Cooper J. Park ◽  
Nicole A. Caimi ◽  
Debbie C. Buecher ◽  
Ernest W. Valdez ◽  
Diana E. Northup ◽  
...  

Abstract Background Antibiotic-producing Streptomyces bacteria are ubiquitous in nature, yet most studies of their diversity have focused on free-living strains inhabiting diverse soil environments and those in symbiotic relationships with invertebrates. Results We studied the draft genomes of 73 Streptomyces isolates sampled from the skin (wing and tail membranes) and fur surfaces of bats collected in Arizona and New Mexico. We uncovered large genomic variation and biosynthetic potential, even among closely related strains. The isolates, which were initially identified as three distinct species based on sequence variation in the 16S rRNA locus, could be distinguished as 41 different species based on genome-wide average nucleotide identity. Of the 32 biosynthetic gene cluster (BGC) classes detected, non-ribosomal peptide synthetases, siderophores, and terpenes were present in all genomes. On average, Streptomyces genomes carried 14 distinct classes of BGCs (range = 9–20). Results also revealed large inter- and intra-species variation in gene content (single nucleotide polymorphisms, accessory genes and singletons) and BGCs, further contributing to the overall genetic diversity present in bat-associated Streptomyces. Finally, we show that genome-wide recombination has partly contributed to the large genomic variation among strains of the same species. Conclusions Our study provides an initial genomic assessment of bat-associated Streptomyces that will be critical to prioritizing those strains with the greatest ability to produce novel antibiotics. It also highlights the need to recognize within-species variation as an important factor in genetic manipulation studies, diversity estimates and drug discovery efforts in Streptomyces.


Plants ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 518
Author(s):  
Siriporn Korinsak ◽  
Clive T. Darwell ◽  
Samart Wanchana ◽  
Lawan Praphaisal ◽  
Siripar Korinsak ◽  
...  

Bacterial leaf blight (BLB), caused by Xanthomonas oryzae pv. oryzae (Xoo), is a serious disease affecting global rice agriculture. Most resistant rice lines are dependent on single genes that are vulnerable to resistance breakdown caused by pathogen mutation. Here we describe a genome-wide association study of 222 predominantly Thai rice accessions assayed by phenotypic screening against 20 Xoo isolates. Loci corresponding to BLB resistance were detected using >142,000 SNPs. We identified 147 genes according to employed significance thresholds across chromosomes 1–6, 8, 9 and 11. Moreover, 127 of the identified genes are located in chromosomal regions outside the estimated linkage disequilibrium influence of known resistance genes, potentially indicating novel BLB resistance markers. However, significantly associated SNPs only occurred across a maximum of six Xoo isolates, indicating that the development of varieties with broad-spectrum resistance to Xoo strains may prove challenging. Analyses indicated a range of gene functions likely underpinning BLB resistance. In accordance with previous studies of accession panels focusing on indica varieties, our germplasm displays large numbers of SNPs associated with resistance. Despite encouraging data suggesting that many loci contribute to resistance, our findings corroborate previous inferences that multi-strain resistant varieties may not be easily realised in breeding programs without resorting to multi-locus strategies.


2021 ◽  
Vol 118 (48) ◽  
pp. e2104642118
Author(s):  
Marty Kardos ◽  
Ellie E. Armstrong ◽  
Sarah W. Fitzpatrick ◽  
Samantha Hauser ◽  
Philip W. Hedrick ◽  
...  

The unprecedented rate of extinction calls for efficient use of genetics to help conserve biodiversity. Several recent genomic and simulation-based studies have argued that the field of conservation biology has placed too much focus on conserving genome-wide genetic variation, and that the field should instead focus on managing the subset of functional genetic variation that is thought to affect fitness. Here, we critically evaluate the feasibility and likely benefits of this approach in conservation. We find that population genetics theory and empirical results show that conserving genome-wide genetic variation is generally the best approach to prevent inbreeding depression and loss of adaptive potential from driving populations toward extinction. Focusing conservation efforts on presumably functional genetic variation will be feasible only occasionally, will often be misleading, and will be counterproductive when prioritized over genome-wide genetic variation. Given the increasing rate of habitat loss and other environmental changes, failure to recognize the detrimental effects of lost genome-wide genetic variation on long-term population viability will only worsen the biodiversity crisis.


2019 ◽  
Author(s):  
Susanne U. Franssen ◽  
Caroline Durrant ◽  
Olivia Stark ◽  
Bettina Moser ◽  
Tim Downing ◽  
...  

Abstract Protozoan parasites of the Leishmania donovani complex – L. donovani and L. infantum – cause the fatal disease visceral leishmaniasis. We present the first comprehensive genome-wide global study, with 151 cultured field isolates representing most of the geographical distribution. L. donovani isolates separated into five groups that largely coincide with geographical origin but vary greatly in diversity. In contrast, the majority of L. infantum samples fell into one globally-distributed group with little diversity. This picture is complicated by several hybrid lineages. Identified genetic groups vary in heterozygosity and levels of linkage, suggesting different recombination histories. We characterise chromosome-specific patterns of aneuploidy and identify extensive structural variation, including known and suspected drug resistance loci. This study reveals greater genetic diversity than suggested by geographically-focused studies, provides a resource of genomic variation for future work and sets the scene for a new understanding of the evolution and genetics of the Leishmania donovani complex.


2019 ◽  
Author(s):  
Lewis G. Spurgin ◽  
Mirte Bosse ◽  
Frank Adriaensen ◽  
Tamer Albayrak ◽  
Christos Barboutis ◽  
...  

Abstract A major aim of evolutionary biology is to understand why patterns of genomic diversity vary among populations and species. Large-scale genomic studies of widespread species are useful for studying how the environment and demographic history shape patterns of genomic divergence, and with the continually decreasing cost of sequencing and genotyping, such studies are now becoming feasible. Here, we carry out one of the most geographically comprehensive surveys of genomic variation in a wild vertebrate to date: the great tit (Parus major) HapMap project. We screened ca 500,000 SNP markers across 647 individuals from 29 populations, spanning almost the entire geographic range of the European great tit subspecies. We found that genome-wide variation was consistent with a recent colonisation across Europe from a single refugium in South-East Europe, with bottlenecks and reduced genetic diversity in island populations. Differentiation across the genome was highly heterogeneous, with clear “islands of differentiation” even among populations with very low levels of genome-wide differentiation. Low local recombination rate in the genome was a strong predictor of high local genomic differentiation (FST), especially in island and peripheral mainland populations, suggesting that the interplay between genetic drift and recombination is a key driver of highly heterogeneous differentiation landscapes. We also detected genomic outlier regions that were confined to one or more peripheral great tit populations, most likely as a result of recent directional selection at the range edges of this species. Haplotype-based measures of selection were also related to recombination rate, albeit less strongly, and highlighted population-specific sweeps that likely resulted from positive selection. These regions under positive selection contained candidate genes associated with morphology, thermal adaptation and colouration, providing promising avenues for future investigation. Our study highlights how comprehensive screens of genomic variation in wild organisms can provide unique insights into evolution.

