On the impact of contaminants on the accuracy of genome skimming and the effectiveness of exclusion read filters

2019
Author(s): Eleonora Rachtman, Metin Balaban, Vineet Bafna, Siavash Mirarab

Abstract The ability to detect the identity of a sample obtained from its environment is a cornerstone of molecular ecological research. Thanks to the falling price of shotgun sequencing, genome skimming, the acquisition of short reads spread across the genome at low coverage, is emerging as an alternative to traditional barcoding. By obtaining far more data across the whole genome, skimming promises to increase the precision of sample identification beyond traditional barcoding while keeping costs manageable. While methods for assembly-free sample identification based on genome skims are now available, little is known about how these methods react to the presence of DNA from organisms other than the target species. In this paper, we show that the accuracy of distances computed between a pair of genome skims based on k-mer similarity can degrade dramatically if the skims include contaminant reads, i.e., any reads originating from other organisms. We establish a theoretical model of the impact of contamination. We then suggest and evaluate a solution to the contamination problem: query reads in a genome skim against an extensive database of possible contaminants (e.g., all microbial organisms) and filter out any read that matches. We evaluate the effectiveness of this strategy, implemented using Kraken-II, in detailed analyses. Our results show substantial improvements in accuracy as a result of filtering but also point to limitations, including a need for relatively close matches in the contaminant database.
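The effect described in this abstract can be illustrated with a toy example. The sketch below is not the authors' method and does not use Kraken-II: it assumes plain Jaccard similarity over exact k-mer sets as the similarity measure, and a naive filter that drops any read sharing a k-mer with a contaminant set. All sequences, function names, and the choice of k = 5 are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def skim_kmers(reads: Iterable[str], k: int) -> Set[str]:
    """Union of the k-mers of every read in a (toy) genome skim."""
    out: Set[str] = set()
    for read in reads:
        out |= kmer_set(read, k)
    return out


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def exclusion_filter(reads: Iterable[str], contaminant_kmers: Set[str], k: int) -> List[str]:
    """Keep only reads that share no k-mer with the contaminant set.

    A naive stand-in for a database query, not how Kraken-II classifies reads.
    """
    return [r for r in reads if kmer_set(r, k).isdisjoint(contaminant_kmers)]


K = 5  # hypothetical k-mer length, far shorter than real tools use

# Hypothetical target genome and contaminant genome.
genome = "ACGTTGCAAGGCTTAACCGGATCGTACGATCG"
contaminant = "TTTTTGGGGGCCCCCAAAAATTTTT"

# Overlapping 12 bp "reads" taken every 4 bp.
reads_a = [genome[i:i + 12] for i in range(0, len(genome) - 11, 4)]
contaminant_reads = [contaminant[i:i + 12] for i in range(0, len(contaminant) - 11, 4)]
reads_b = reads_a + contaminant_reads  # second skim of the same genome, contaminated

# Similarity drops once one skim carries contaminant reads.
j_contaminated = jaccard(skim_kmers(reads_a, K), skim_kmers(reads_b, K))

# Filtering against a contaminant k-mer set restores it.
filtered_b = exclusion_filter(reads_b, kmer_set(contaminant, K), K)
j_filtered = jaccard(skim_kmers(reads_a, K), skim_kmers(filtered_b, K))

print(f"contaminated: {j_contaminated:.3f}, after filtering: {j_filtered:.3f}")
```

In this toy setting the contaminant set contains the exact contaminating sequence, so filtering is perfect. The limitation noted in the abstract, that the database must contain relatively close matches, corresponds to the case where the contaminant k-mers differ from those stored in the set.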

2019
Author(s): Chiheb Boudhrioua, Maxime Bastien, Davoud Torkamaneh, François Belzile

Abstract Sclerotinia stem rot (SSR), caused by Sclerotinia sclerotiorum (Lib.) de Bary, is an important cause of yield loss in soybean. Although many papers have reported different loci contributing to partial resistance, few of these have been shown to reproduce the same phenotypic impact in different populations. In this study, we identified a major quantitative trait locus (QTL) associated with resistance to SSR progression on the main stem by using genome-wide association mapping (GWAM). A population of 127 soybean accessions was genotyped with 1.5M SNPs derived from genotyping-by-sequencing (GBS) and whole-genome sequencing (WGS), ensuring extensive genome coverage, and phenotyped for SSR resistance. SNP-trait association led to the discovery of a new QTL on chromosome 1 (Chr01) where lesions on the stem were 29 mm shorter in resistant lines. A single gene (Glyma.01g048000) resided in the same LD block as the peak SNP, but it is of unknown function. The impact of this QTL was even more significant in the descendants of a cross between two lines carrying contrasting alleles for Chr01. Individuals carrying the resistance allele developed lesions almost 50% shorter than those bearing the sensitivity allele. These results suggest that this region harbors a promising resistance QTL to SSR that can be used in soybean breeding programs.


2021, Vol 8 (1)
Author(s): Shijing Feng, Zhenshan Liu, Jian Cheng, Zihe Li, Lu Tian, ...

Abstract Zanthoxylum bungeanum is an important spice and medicinal plant that is unique for its accumulation of abundant secondary metabolites, which create a characteristic aroma and tingling sensation in the mouth. Owing to the high proportion of repetitive sequences, high heterozygosity, and increased chromosome number of Z. bungeanum, the assembly of its chromosomal pseudomolecules is extremely challenging. Here, we present a genome sequence for Z. bungeanum, with a dramatically expanded size of 4.23 Gb, assembled into 68 chromosomes. This genome is approximately tenfold larger than that of its close relative Citrus sinensis. After the divergence of Zanthoxylum and Citrus, the lineage-specific whole-genome duplication event η-WGD approximately 26.8 million years ago (MYA) and the recent transposable element (TE) burst ~6.41 MYA account for the substantial genome expansion in Z. bungeanum. The independent Zanthoxylum-specific WGD event was followed by numerous fusion/fission events that shaped the genomic architecture. Integrative genomic and transcriptomic analyses suggested that prominent species-specific gene family expansions and changes in gene expression have shaped the biosynthesis of sanshools, terpenoids, and anthocyanins, which contribute to the special flavor and appearance of Z. bungeanum. In summary, the reference genome provides a valuable model for studying the impact of WGDs with recent TE activity on gene gain and loss and genome reconstruction and provides resources to accelerate Zanthoxylum improvement.


2020
Author(s): Chiheb Boudhrioua, Maxime Bastien, Davoud Torkamaneh, François Belzile

Abstract Background: Sclerotinia stem rot (SSR), caused by Sclerotinia sclerotiorum (Lib.) de Bary, is an important cause of yield loss in soybean. Although many papers have reported different loci contributing to partial resistance, few of these have been shown to reproduce the same phenotypic impact in different populations. Results: In this study, we identified a major quantitative trait locus (QTL) associated with resistance to SSR progression on the main stem by using genome-wide association mapping (GWAM). A population of 127 soybean accessions was genotyped with 1.5M SNPs derived from genotyping-by-sequencing (GBS) and whole-genome sequencing (WGS), ensuring extensive genome coverage, and phenotyped for SSR resistance. SNP-trait association led to the discovery of a new QTL on chromosome 1 (Chr01) where lesions on the stem were 29 mm shorter in resistant lines. A single gene (Glyma.01g048000) resided in the same LD block as the peak SNP, but it is of unknown function. The impact of this QTL was even more significant in the descendants of a cross between two lines carrying contrasting alleles for Chr01. Individuals carrying the resistance allele developed lesions almost 50% shorter than those bearing the sensitivity allele. Conclusion: These results suggest that the new region on chromosome 1 harbors a promising resistance QTL to SSR that can be used in soybean breeding programs.


2022, Vol 12
Author(s): Tianyu Deng, Pengfei Zhang, Dorian Garrick, Huijiang Gao, Lixian Wang, ...

Genotype imputation is the process of inferring unobserved genotypes in a sample of individuals. It is a key step prior to a genome-wide association study (GWAS) or genomic prediction. Imputation accuracy directly influences the results of subsequent analyses. In this simulation-based study, we investigate the accuracy of genotype imputation in relation to several factors characterizing SNP chip or low-coverage whole-genome sequencing (LCWGS) data. The factors included the imputation reference population size, the proportion of target markers/SNP density, the genetic relationship (distance) between the target population and the reference population, and the imputation method. Simulations of genotypes were based on coalescence theory accounting for the demographic history of pigs. A population of simulated founders diverged to produce four separate but related populations of descendants. The genomic data of 20,000 individuals were simulated for a 10-Mb chromosome fragment. Our results showed that the proportion of target markers or SNP density was the most critical factor affecting imputation accuracy under all imputation situations. Compared with Minimac4, Beagle5.1 produced more accurate imputed data in most cases, more notably when imputing from the LCWGS data. Compared with SNP chip data, LCWGS provided more accurate genotype imputation. Our findings provide a relatively comprehensive insight into the accuracy of genotype imputation in a realistic population of domestic animals.


Author(s): Rute da Fonseca, Paula Campos, Alba Rey de la Iglesia, Gustavo Barroso, Lucie Bergeron, ...

Whole genome sequence data is an ideal tool for characterizing processes in ecology and evolution. Despite falling sequencing costs, it can be challenging to produce a genome and high-coverage resequencing data for a non-model species. New population genomics data analysis pipelines based on genotype likelihoods allow for a significant reduction in cost by efficiently extracting information from low-coverage sequence data. We demonstrate the robustness of such approaches with a genomic data set consisting of two draft genomes of the European sardine (Sardina pilchardus, Walbaum 1792), and resequencing data (~1.5X depth) for 78 individuals from 12 sampling locations across the 5,000 km of the species’ distribution range (from the Eastern Mediterranean to the archipelagos of Madeira and Azores). Our results clearly show at least three genetic clusters. One includes individuals from Azores and Madeira (two archipelagos in the Atlantic), the second corresponds to Iberia (the center of the sampling distribution), and the third gathers the Mediterranean samples and those from the Canary Islands. This suggests at least two important barriers to gene flow, even though these do not seem complete, with individuals from Iberia showing some degree of admixture. These results, together with the genetic resources generated for this commercially important taxon, provide a baseline for further studies aiming to identify the nature of these barriers between sardine populations, and information for transnational stock management of this highly exploited species towards sustainable fisheries.


2019
Author(s): Chiheb Boudhrioua, Maxime Bastien, Davoud Torkamaneh, François Belzile

Abstract Sclerotinia stem rot (SSR), caused by Sclerotinia sclerotiorum (Lib.) de Bary, is an important cause of yield loss in soybean. Although many papers have reported different loci contributing to partial resistance, few of these have been shown to reproduce the same phenotypic impact in different populations. In this study, we identified a major quantitative trait locus (QTL) associated with resistance to SSR progression on the main stem by using genome-wide association mapping (GWAM). A population of 127 soybean accessions was genotyped with 1.5M SNPs derived from genotyping-by-sequencing (GBS) and whole-genome sequencing (WGS), ensuring extensive genome coverage, and phenotyped for SSR resistance. SNP-trait association led to the discovery of a new QTL on chromosome 1 (Gm01) where lesions on the stem were 29 mm shorter in resistant lines. The impact of this QTL was even more significant in the descendants of a cross between two lines carrying contrasting alleles for Gm01. Individuals carrying the resistance allele developed lesions almost 50% shorter than those bearing the sensitivity allele. These results suggest that this region harbors a promising resistance QTL to SSR that can be used in soybean breeding programs.


Author(s): Daniella F Lato, G Brian Golding

Abstract Increasing evidence supports the notion that different regions of a genome have unique rates of molecular change. This variation is particularly evident in bacterial genomes, where previous studies have reported that gene expression and essentiality tend to decrease, while substitution rates usually increase, with increasing distance from the origin of replication. Genomic reorganizations such as rearrangements occur frequently in bacteria and allow for the introduction and restructuring of genetic content, creating gradients of molecular traits along genomes. Here, we explore the interplay of these phenomena by mapping substitutions to the genomes of Escherichia coli, Bacillus subtilis, Streptomyces, and Sinorhizobium meliloti, quantifying how many substitutions have occurred at each position in the genome. Preceding work indicates that substitution rate significantly increases with distance from the origin. Using a larger sample size and accounting for genome rearrangements through ancestral reconstruction, our analysis demonstrates that the correlation between the number of substitutions and distance from the origin of replication is often significant but small and inconsistent in direction. Some replicons had a significantly decreasing trend (E. coli and the chromosome of S. meliloti), while others showed the opposite significant trend (B. subtilis, Streptomyces, pSymA and pSymB in S. meliloti). dN, dS and ω were examined across all genes and there was no significant correlation between those values and distance from the origin. This study highlights the impact that genomic rearrangements and location have on molecular trends in some bacteria, illustrating the importance of considering spatial trends in molecular evolutionary analysis. Assuming that molecular trends are exclusively in one direction can be problematic.


2020, Vol 11 (1)
Author(s): Peter Higgins, Cooper A Grace, Soon A Lee, Matthew R Goddard

Abstract Saccharomyces cerevisiae is extensively utilized for commercial fermentation, and is also an important biological model; however, its ecology has only recently begun to be understood. Through the use of whole-genome sequencing, the species has been characterized into a number of distinct subpopulations, defined by geographical ranges and industrial uses. Here, the whole-genome sequences of 104 New Zealand (NZ) S. cerevisiae strains, including 52 novel genomes, are analyzed alongside 450 published sequences derived from various global locations. The impact of S. cerevisiae novel range expansion into NZ was investigated and these analyses reveal the positioning of NZ strains as a subgroup to the predominantly European/wine clade. A number of genomic differences with the European group correlate with range expansion into NZ, including 18 highly enriched single-nucleotide polymorphisms (SNPs) and novel Ty1/2 insertions. While it is not possible to categorically determine if any genetic differences are due to stochastic processes or the operation of natural selection, we suggest that the observation of NZ-specific copy number increases of four sugar transporter genes in the HXT family may reasonably represent an adaptation in the NZ S. cerevisiae subpopulation, and this correlates with the observations of copy number changes during adaptation in small-scale experimental evolution studies.

