Genotyping by sequencing can reveal the complex mosaic genomes in gene pools resulting from reticulate evolution: a case study in diploid and polyploid citrus

2019, Vol 123 (7), pp. 1231-1251
Author(s): Dalel Ahmed, Aurore Comte, Franck Curk, Gilles Costantino, François Luro, ...

Abstract
Background and Aims: Reticulate evolution, coupled with reproductive features that limit further interspecific recombination, results in admixed mosaics of large genomic fragments from the ancestral taxa. Whole-genome sequencing (WGS) data are powerful tools for deciphering such complex genomes but are still too costly to use for large populations. The aim of this work was to develop an approach to infer phylogenomic structures in diploid, triploid and tetraploid individuals from sequencing data of reduced-complexity genome libraries. The approach was applied to the cultivated Citrus gene pool, which resulted from reticulate evolution involving four ancestral taxa: C. maxima, C. medica, C. micrantha and C. reticulata.
Methods: A genotyping-by-sequencing library was established with the restriction enzyme ApeKI, applying a one-base (A) selection. Diagnostic single nucleotide polymorphisms (DSNPs) for the four ancestral taxa were mined in 29 representative varieties. A generic pipeline based on maximum likelihood analysis of read counts was established to infer ancestral contributions along the genome of diploid, triploid and tetraploid individuals. The pipeline was applied to 48 diploid, four triploid and one tetraploid citrus accessions.
Key Results: Among 43 598 mined SNPs, we identified a set of 15 946 DSNPs covering the whole genome with a distribution similar to that of gene sequences. The set efficiently inferred the phylogenomic karyotypes of the 53 analysed accessions, providing patterns for common accessions very close to those previously established using WGS data. The complex phylogenomic karyotypes of 21 cultivated citrus, including bergamot and triploid and tetraploid limes, were revealed for the first time.
Conclusions: The pipeline, available online, efficiently inferred the phylogenomic structures of diploid, triploid and tetraploid citrus. It will be useful for any species whose reproductive behaviour has resulted in an interspecific mosaic of large genomic fragments. It can also be used for the first generations of interspecific breeding schemes.
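The read-count likelihood idea in the Methods can be illustrated with a minimal sketch. It is not the authors' published pipeline: it assumes a window of DSNPs for a single ancestral taxon, a known ploidy and a hypothetical per-read error rate, and asks how many of the `ploidy` haplotypes in that window carry the taxon's diagnostic alleles. Each candidate dosage k predicts a diagnostic-read fraction of k/ploidy, and the dosage with the highest summed binomial log-likelihood wins. The function name and toy counts are illustrative only.

```python
import math

def infer_ancestral_dosage(read_counts, ploidy, error=0.01):
    """Return (best_dosage, log_likelihood) for one taxon in one genomic window.

    read_counts: list of (diagnostic_reads, total_reads) tuples, one per DSNP.
    ploidy: 2, 3 or 4 for diploid, triploid or tetraploid accessions.
    error: assumed per-read error rate (hypothetical default).
    """
    best_k, best_ll = None, -math.inf
    for k in range(ploidy + 1):
        # Expected fraction of reads showing the diagnostic allele,
        # pulled away from exactly 0 or 1 by the error rate.
        p = k / ploidy
        p = p * (1 - error) + (1 - p) * error
        # Binomial log-likelihood summed over DSNPs; the binomial
        # coefficient is identical for every k, so it is omitted.
        ll = sum(d * math.log(p) + (n - d) * math.log(1 - p) for d, n in read_counts)
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k, best_ll

# Toy triploid window: roughly one third of reads carry the diagnostic allele.
window = [(3, 10), (5, 14), (4, 11)]
print(infer_ancestral_dosage(window, ploidy=3)[0])  # 1
```

Repeating this for each of the four taxa in windows along each chromosome would give a crude ancestry profile; for real data the published pipeline should be preferred.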

Genome, 2021
Author(s): Guoliang Li, Lixin Yue, Xu Cai, Fei Li, Hui Zhang, ...

This study evaluated a genotyping-by-sequencing (GBS) protocol for fingerprinting Brassica rapa, and the data obtained were more reliable than B. rapa re-sequencing data. Of the 10 enzyme solutions used to analyze the numbers of genotypes and single nucleotide polymorphisms (SNPs) in B. rapa, five performed better than the others: A (HaeIII, 450–500 bp), E (RsaI+HaeIII, 500–550 bp), F (RsaI+HaeIII, 500–600 bp), G (RsaI+HaeIII, ‘All’ fragment) and J (RsaI+EcoRV-HF®, ‘All’ fragment). With all five solutions, individuals from different samples shared less than 40% similarity, whereas two individuals from the same sample shared 90% similarity. Solution E was the most suitable for fingerprinting B. rapa, revealing SNPs well distributed across the whole genome. Of the 82 highly inbred lines and 18 F1 lines of B. rapa sequenced by GBS with solution E, the known parents of 10 F1 lines were verified, and male parents were identified for 8 F1 lines for which only the female parent was known. This study provides a valuable method for screening the parents of B. rapa F1 lines in applied breeding, through efficient evaluation of GBS with varied library construction strategies.
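The abstract does not say how parents were verified. One simple way to express the idea is a Mendelian mismatch rate, sketched below under stated assumptions: genotypes are coded as alternative-allele dosages 0/1/2, missing calls are `None`, and parents are highly inbred (homozygous), so an F1 should carry one allele from each. Candidate males are ranked by how often the F1 disagrees with female × candidate. The function names and toy genotypes are hypothetical.

```python
import math

def f1_mismatch_rate(f1, female, male):
    """Fraction of comparable SNPs at which an F1 is inconsistent with female x male.

    Genotypes are alt-allele dosages (0, 1, 2); None marks a missing call.
    Parents are assumed homozygous inbreds, so heterozygous parental calls
    are treated as uninformative and skipped.
    """
    compared = mismatches = 0
    for g, m, p in zip(f1, female, male):
        if None in (g, m, p) or m == 1 or p == 1:
            continue  # skip missing calls and non-homozygous parental calls
        compared += 1
        expected = (m + p) // 2  # each homozygous parent transmits one allele
        if g != expected:
            mismatches += 1
    return mismatches / compared if compared else float("nan")

def rank_candidate_males(f1, female, candidates):
    """Sort candidate male lines (name -> genotype list) by mismatch rate."""
    rates = {name: f1_mismatch_rate(f1, female, geno) for name, geno in candidates.items()}
    # Lowest mismatch rate first; NaN (no comparable SNPs) sorts last.
    return sorted(rates.items(), key=lambda item: (math.isnan(item[1]), item[1]))

female = [0, 0, 2, 2, 0]
f1 = [1, 0, 2, 1, 1]
candidates = {"line_A": [2, 0, 2, 0, 2], "line_B": [0, 2, 2, 2, 0]}
print(rank_candidate_males(f1, female, candidates))  # [('line_A', 0.0), ('line_B', 0.8)]
```

In practice a small non-zero mismatch rate would be tolerated to absorb genotyping errors.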


Genes, 2021, Vol 12 (7), pp. 1042
Author(s): Zhuoying Weng, Yang Yang, Xi Wang, Lina Wu, Sijie Hua, ...

Pedigree information is necessary for maintaining the diversity of wild and captive populations. Accurate pedigrees are determined by molecular marker-based parentage analysis, which may be influenced by the polymorphism and number of markers, the integrity of samples, the relatedness of parents, and the analysis program used. Here, we describe the first development of 208 single nucleotide polymorphisms (SNPs) and 11 microsatellites for giant grouper (Epinephelus lanceolatus) using genotyping-by-sequencing (GBS), and compare the power of SNPs and microsatellites for parentage and relatedness analysis in a mixed family composed of 4 candidate females, 4 candidate males and 289 offspring. CERVUS, PAPA and COLONY were used for mutual verification. We found that SNPs had better potential than microsatellites for relatedness estimation, exclusion of non-parents and individual identification, and that > 98% accuracy of parentage assignment could be achieved with 100 polymorphic SNPs (MAF cut-off < 0.4) or 10 polymorphic microsatellites (mean Ho = 0.821, mean PIC = 0.651). This study provides a reference for developing molecular markers for parentage analysis with next-generation sequencing, and contributes to molecular breeding, fishery management and population conservation.
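One plausible intuition for why about 100 SNPs matched the power of 10 microsatellites is the per-locus polymorphism information content (PIC). The sketch below uses the standard formula of Botstein et al. (1980), PIC = 1 − Σpᵢ² − Σᵢ<ⱼ 2pᵢ²pⱼ², plus expected heterozygosity. A biallelic SNP can never exceed PIC = 0.375, whereas the microsatellites here averaged 0.651. The function names are illustrative, not taken from CERVUS, PAPA or COLONY.

```python
from itertools import combinations

def expected_heterozygosity(freqs):
    """Expected heterozygosity under Hardy-Weinberg: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)

def pic(freqs):
    """Polymorphism information content (Botstein et al. 1980).

    freqs: allele frequencies at one locus, summing to 1.
    """
    homozygosity = sum(p * p for p in freqs)
    pair_term = sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - pair_term

print(pic([0.5, 0.5]))            # 0.375, the maximum for a biallelic SNP
print(round(pic([0.25] * 4), 3))  # 0.703 for four equally frequent alleles
```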


2021, Vol 11 (1)
Author(s): Ho-Yon Hwang, Jiou Wang

Abstract
Genetic mapping is used in forward genetics to narrow the list of candidate mutations and genes corresponding to a mutant phenotype of interest. Even with modern advances such as efficient identification of candidate mutations by whole-genome sequencing, mapping remains critical for pinpointing the responsible mutation. Here we describe a simple, fast and affordable mapping toolkit that is particularly suitable for Caenorhabditis elegans. Instead of single nucleotide polymorphisms, the method uses easily detected insertion-deletion polymorphisms (indels) in the commonly used Hawaiian CB4856 mapping strain. The materials and methods were optimized so that mapping can be performed with a tiny amount of genetic material, without growing many large populations of mutants for DNA purification. We mapped previously known and unknown mutations to show the strengths and weaknesses of the method and to present examples of completed mapping. For situations where Hawaiian CB4856 is unsuitable, we provide an annotated list of indels as a basis for fast and easy mapping with other wild isolates. Finally, we provide a rationale for using this method over alternatives as part of a comprehensive strategy that also involves whole-genome sequencing and other methods.
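The logic of indel-based mapping can be sketched as a recombination-fraction estimate. This assumes a classical design that the abstract does not spell out: the mutation arose in the N2 background, the mutant was crossed to CB4856, and homozygous-mutant F2 animals were genotyped at each indel. Because each CB4856 allele near the mutation then reflects a crossover, the fraction of CB4856 alleles estimates the recombination fraction r, and for small r the map distance is roughly 100·r cM. Marker names and counts are hypothetical.

```python
import math

def recombination_fraction(n2_n2, n2_cb, cb_cb):
    """Estimate (r, standard_error) between a mutation and one indel marker.

    Counts are indel genotypes of homozygous-mutant F2 animals from a
    mutant (N2 background) x CB4856 cross; every CB4856 allele at the
    marker is assumed to come from a crossover.
    """
    chromosomes = 2 * (n2_n2 + n2_cb + cb_cb)
    if chromosomes == 0:
        raise ValueError("no genotyped animals")
    r = (n2_cb + 2 * cb_cb) / chromosomes
    se = math.sqrt(r * (1 - r) / chromosomes)  # binomial standard error
    return r, se

def closest_marker(marker_counts):
    """Return the marker with the lowest estimated recombination fraction."""
    return min(marker_counts, key=lambda m: recombination_fraction(*marker_counts[m])[0])

markers = {"indel_X1": (30, 16, 4), "indel_X2": (47, 3, 0)}
print(closest_marker(markers))  # indel_X2 (r = 0.03, i.e. about 3 cM)
```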


Genes, 2021, Vol 12 (2), pp. 258
Author(s): Karim Karimi, Duy Ngoc Do, Mehdi Sargolzaei, Younes Miar

Characterizing genetic structure and population history can facilitate the development of genomic breeding strategies for the American mink. In this study, we used the whole-genome sequences of 100 mink from the Canadian Centre for Fur Animal Research (CCFAR) at the Dalhousie Faculty of Agriculture (Truro, NS, Canada) and Millbank Fur Farm (Rockwood, ON, Canada) to investigate their population structure, genetic diversity and linkage disequilibrium (LD) patterns. Analysis of molecular variance (AMOVA) indicated that variation among color types was significant (p < 0.001) and accounted for 18% of the total variation. Admixture analysis revealed that assuming three ancestral populations (K = 3) gave the lowest cross-validation error (0.49). The effective population size (Ne) five generations ago was estimated at 99 for CCFAR and 50 for Millbank Fur Farm. The LD patterns showed that the average r² fell below 0.2 at genomic distances greater than 20 kb in CCFAR and greater than 100 kb in Millbank Fur Farm, suggesting that densities of 120,000 and 24,000 single nucleotide polymorphisms (SNPs), respectively, would provide adequate accuracy of genomic evaluation in these populations. These results indicate that accounting for admixture is critical when designing SNP panels for genotype-phenotype association studies of American mink.
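The LD measure and the panel-density arithmetic can be made concrete. The sketch below computes the textbook r² = D²/(p_A(1−p_A)p_B(1−p_B)) from phased two-locus haplotypes (the study may have used a genotype-based estimator instead). It then applies a one-SNP-per-LD-distance rule of thumb. The genome length of about 2.4 Gb is not stated in the abstract; it is only what the two reported densities imply (120,000 × 20 kb = 24,000 × 100 kb = 2.4 Gb).

```python
def r_squared(haplotypes):
    """LD r^2 between two biallelic loci.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2), alleles coded 0/1.
    Returns NaN if either locus is monomorphic.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")

def snps_needed(genome_length_bp, ld_distance_bp):
    """Rough panel size: one SNP per distance at which r^2 decays below the threshold."""
    return genome_length_bp // ld_distance_bp

print(r_squared([(0, 0), (1, 1), (0, 0), (1, 1)]))  # 1.0 (complete LD)
print(r_squared([(0, 0), (0, 1), (1, 0), (1, 1)]))  # 0.0 (no LD)
print(snps_needed(2_400_000_000, 20_000), snps_needed(2_400_000_000, 100_000))  # 120000 24000
```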


2020
Author(s): Kyle Fletcher, Lin Zhang, Juliana Gil, Rongkui Han, Keri Cavanaugh, ...

Abstract
Background: Genetic maps are an important resource for validating genome assemblies, trait discovery and breeding. Next-generation sequencing has enabled the production of high-density genetic maps constructed with tens of thousands of markers. Most current approaches require a genome assembly to identify markers. Our Assembly Free Linkage Analysis Pipeline (AFLAP) removes this requirement by using uniquely segregating k-mers as markers to rapidly construct a genotype table and perform subsequent linkage analysis. This avoids potential biases, including those introduced by preferential read alignment and variant calling.
Results: The performance of AFLAP was determined in simulations and contrasted with a conventional workflow. We tested AFLAP using 100 F2 individuals of Arabidopsis thaliana sequenced to low coverage. Genetic maps generated using k-mers contained over 130,000 markers that were concordant with the genome assembly. The utility of AFLAP was then demonstrated by generating an accurate genetic map from genotyping-by-sequencing data of 235 recombinant inbred lines of Lactuca spp. AFLAP was then applied to 83 F1 individuals of the oomycete Bremia lactucae sequenced to >5x coverage. The genetic map contained over 90,000 markers ordered in 19 large linkage groups. This map was used to fragment, order, orient and scaffold the genome, resulting in a much-improved reference assembly.
Conclusions: Whenever a mapping population is available, AFLAP can use whole-genome sequencing or genotyping-by-sequencing data to generate high-density linkage maps and improve the genome assembly of any organism. Genetic maps produced for B. lactucae were accurately aligned to the genome and guided significant improvements of the reference assembly.
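The core of an assembly-free marker table can be sketched in a few lines. This toy version is not AFLAP's implementation. It collects canonical k-mers (the smaller of a k-mer and its reverse complement) from each parent's reads, and keeps those present in one parent but absent from the other as candidate markers. It then scores each progeny as 1/0 for presence or absence. Real pipelines also apply k-mer count thresholds to discard sequencing errors and repeats, which is omitted here. All names and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)

def kmer_set(reads, k):
    """All canonical k-mers occurring in a collection of reads."""
    return {canonical(r[i:i + k]) for r in reads for i in range(len(r) - k + 1)}

def parent_specific_kmers(parent_a_reads, parent_b_reads, k):
    """Candidate markers: k-mers seen in parent A but never in parent B."""
    return kmer_set(parent_a_reads, k) - kmer_set(parent_b_reads, k)

def genotype_table(markers, progeny_reads, k):
    """Presence (1) / absence (0) of each marker k-mer in each progeny."""
    table = {}
    for name, reads in progeny_reads.items():
        present = kmer_set(reads, k)
        table[name] = {m: int(m in present) for m in sorted(markers)}
    return table

markers = parent_specific_kmers(["ACGTAC"], ["ACGTTT"], k=4)
print(sorted(markers))  # ['CGTA', 'GTAC']
print(genotype_table(markers, {"p1": ["TTACGTAC"], "p2": ["ACGTTT"]}, k=4))
```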


2021
Author(s): Scott T O’Donnell, Sorel T Fitz-Gibbon, Victoria L Sork

Abstract
Ancient introgression can be an important source of genetic variation that shapes the evolution and diversification of many taxa. Here, we estimate the timing, direction and extent of gene flow between two distantly related oak species in the same section (Quercus sect. Quercus). We estimated these demographic events using genotyping-by-sequencing (GBS) data, which yielded 25,702 single nucleotide polymorphisms (SNPs) for 24 individuals of California scrub oak (Quercus berberidifolia) and 23 individuals of Engelmann oak (Q. engelmannii). We tested several scenarios of gene flow between these species using the diffusion approximation-based population genetic inference framework and model-testing approach of the Python package DaDi. The most likely demographic scenario includes a bottleneck in Q. engelmannii that coincides with asymmetric gene flow from Q. berberidifolia into Q. engelmannii. Because the timing of this gene flow coincides with the advent of a Mediterranean-type climate in the California Floristic Province, we propose that changing precipitation patterns and seasonality may have favored the introgression of climate-associated genes from the endemic into the non-endemic California oak.
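Diffusion-based inference of this kind fits demographic models to a joint site frequency spectrum (SFS). The sketch below builds one in plain Python, without using DaDi's own API. For diploids with complete data, the sample sizes here would be 48 and 46 chromosomes. It assumes every SNP has been polarized into ancestral and derived alleles, and it does not handle missing genotypes (which DaDi addresses by projection). The function name and toy counts are hypothetical.

```python
def joint_sfs(derived_counts, n1, n2):
    """Build an unfolded two-population site frequency spectrum.

    derived_counts: iterable of (count_in_pop1, count_in_pop2), the number of
    derived alleles at each SNP among n1 and n2 sampled chromosomes.
    Returns an (n1 + 1) x (n2 + 1) matrix; entry [i][j] counts SNPs with
    i derived copies in population 1 and j in population 2.
    """
    sfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for c1, c2 in derived_counts:
        if not (0 <= c1 <= n1 and 0 <= c2 <= n2):
            raise ValueError(f"count out of range: {(c1, c2)}")
        sfs[c1][c2] += 1
    return sfs

print(joint_sfs([(1, 0), (1, 0), (0, 2)], n1=2, n2=2))  # [[0, 0, 1], [2, 0, 0], [0, 0, 0]]
```

Shared intermediate-frequency cells of such a matrix are what carry the signal of gene flow between the two species in model comparisons.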


2016, Vol 7 (1)
Author(s): Michael Baym, Lev Shaket, Isao A. Anzai, Oluwakemi Adesina, Buz Barstow

Abstract
Whole-genome knockout collections are invaluable for connecting gene sequence to function, yet their construction has traditionally required an extraordinary technical effort. Here we report a method, termed Knockout Sudoku, for constructing and purifying a curated whole-genome collection of single-gene transposon disruption mutants. Using simple combinatorial pooling, a highly oversampled collection of mutants is condensed into a next-generation sequencing library in a single day, a 30- to 100-fold improvement over prior methods. The identities of the mutants in the collection are then solved by a probabilistic algorithm that uses internal self-consistency within the sequencing data set, followed by rapid, algorithmically guided condensation to a minimal representative set of mutants, validation and curation. Starting from a progenitor collection of 39,918 mutants, we compiled a quality-controlled knockout collection of the electroactive microbe Shewanella oneidensis MR-1 containing representatives for 3,667 genes, which was functionally validated by high-throughput kinetic measurements of quinone reduction.
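The decoding step behind combinatorial pooling can be illustrated with its simplest, deterministic form. The published method uses a probabilistic algorithm based on internal self-consistency; the sketch below only shows naive intersection. It assumes a hypothetical pooling scheme in which every progenitor well is pooled by plate, by row and by column. An insertion site seen in exactly one plate, one row and one column pool resolves to a single well, while sites seen in several pools of one axis stay ambiguous.

```python
from itertools import product

def decode_locations(pool_hits):
    """Map each insertion site to the progenitor wells consistent with its pool hits.

    pool_hits: dict insertion_site -> {"plate": set, "row": set, "col": set},
    the plate, row and column pools in which reads for that site were seen.
    Every combination of one plate, one row and one column is a candidate well.
    """
    return {site: sorted(product(h["plate"], h["row"], h["col"]))
            for site, h in pool_hits.items()}

def uniquely_resolved(decoded):
    """Keep only sites that map to exactly one well."""
    return {site: wells[0] for site, wells in decoded.items() if len(wells) == 1}

hits = {
    "site_1042": {"plate": {3}, "row": {"B"}, "col": {7}},
    "site_2210": {"plate": {1}, "row": {"A", "C"}, "col": {5}},  # ambiguous row
}
decoded = decode_locations(hits)
print(uniquely_resolved(decoded))  # {'site_1042': (3, 'B', 7)}
print(decoded["site_2210"])        # [(1, 'A', 5), (1, 'C', 5)]
```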


2015
Author(s): Ken G Dodds, John C McEwan, Rudiger Brauning, Rayna M Anderson, Tracey C van Stijn, ...

Background: Genotyping-by-sequencing (GBS) is becoming an attractive alternative to array-based methods for genotyping individuals for a large number of single nucleotide polymorphisms (SNPs). Costs can be lowered by reducing the mean sequencing depth, but this results in lower-quality genotype calls. A common analysis strategy is to filter SNPs to just those with sufficient depth, which greatly reduces the number of SNPs available. We investigate methods for estimating relatedness from GBS data, including low-depth data, using theoretical calculation, simulation and application to a real data set.
Results: We show that unbiased estimates of relatedness can be obtained by using only those SNPs with genotype calls in both individuals. The expected value of this estimator is independent of the SNP depth in each individual, under a model of genotype calling that includes the special case of the two alleles being read at random. In contrast, the estimator of self-relatedness does depend on SNP depth, and we provide a modification that gives unbiased estimates of self-relatedness. We refer to these methods of estimation as kinship using GBS with depth adjustment (KGD). The estimators can be calculated with matrix methods, which allow efficient computation. Simulation results were consistent with the methods being unbiased, and suggest that the optimal sequencing depth is around 2-4 for relatedness between individuals and 5-10 for self-relatedness. Application to a real data set revealed that some SNP filtering may still be necessary to exclude SNPs that did not behave in a Mendelian fashion. A simple graphical method (a ‘fin plot’) is given to illustrate this issue and to guide filtering parameters.
Conclusion: We provide a method that gives unbiased estimates of relatedness, based on SNPs assayed by GBS, and accounts for the depth (including zero depth) of the genotype calls. This allows GBS to be applied at read depths chosen to optimise the information obtained. SNPs with excess heterozygosity, often due to (partial) polyploidy or other duplications, can be filtered using a simple graphical method.
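The between-individual estimator can be sketched as a ratio of sums over SNPs called in both individuals, G_jk = Σ(x_j − 2p)(x_k − 2p) / Σ2p(1 − p). Here x is an alternative-allele dosage and p the alternative-allele frequency; this is a minimal sketch, not the KGD software. The reason depth drops out is that a heterozygote read at depth d looks homozygous for either allele with probability (1/2)^d each, so its expected call is still 1. The depth-adjusted self-relatedness formula is deliberately not reproduced, and the function names are hypothetical.

```python
def call_genotype(ref_reads, alt_reads):
    """Naive GBS call: None at zero depth, else 0/1/2 from the alleles observed."""
    if ref_reads + alt_reads == 0:
        return None
    if alt_reads == 0:
        return 0
    if ref_reads == 0:
        return 2
    return 1

def relatedness(ind_j, ind_k, allele_freqs):
    """Relatedness between two individuals from GBS read counts.

    ind_j, ind_k: lists of (ref_reads, alt_reads), one tuple per SNP.
    allele_freqs: alt-allele frequency p at each SNP.
    Only SNPs with a call in both individuals contribute.
    """
    num = den = 0.0
    for (rj, aj), (rk, ak), p in zip(ind_j, ind_k, allele_freqs):
        gj, gk = call_genotype(rj, aj), call_genotype(rk, ak)
        if gj is None or gk is None:
            continue  # zero depth in either individual: skip this SNP
        num += (gj - 2 * p) * (gk - 2 * p)
        den += 2 * p * (1 - p)
    return num / den if den > 0 else float("nan")

j = [(10, 0), (0, 0)]  # second SNP has zero depth in j
k = [(10, 0), (5, 5)]
print(relatedness(j, k, [0.5, 0.5]))  # 2.0, using the first SNP only
```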


2021
Author(s): Laxman Adhikari, Sandesh Shrestha, Shuanyge Wu, Jared Crain, Liangliang Gao, ...

Abstract
The development of next-generation sequencing (NGS) enabled a shift from array-based genotyping to high-throughput genotyping by directly sequencing genomic libraries. Although whole-genome sequencing was initially too costly for routine analysis in large populations, such as those used for breeding or genetic studies, continued advances in genome sequencing and bioinformatics have made it possible to use whole-genome information. Because new sequencing platforms can routinely provide high-quality data with sufficient genome coverage, the limiting factor becomes the time and cost of library construction when multiplexing large numbers of samples. Here we describe a high-throughput whole-genome skim-sequencing (skim-seq) approach that can be used for a broad range of genotyping and genomic characterization. Using optimized low-volume Illumina Nextera chemistry, we developed a skim-seq method and combined up to 960 samples in one multiplex library using dual-index barcoding. With dual-index barcoding, the number of multiplexed samples can be adjusted to the amount of data required and extended to 3,072 samples or more. Panels of doubled haploid wheat lines (Triticum aestivum, CDC Stanley x CDC Landmark), wheat-barley (T. aestivum x Hordeum vulgare) and wheat-wheatgrass (Triticum durum x Thinopyrum intermedium) introgression lines, as well as known monosomic wheat stocks, were genotyped using the skim-seq approach. Bioinformatics pipelines were developed for various applications with sequencing coverage ranging from 1x down to 0.01x per sample. Using reference genomes, we detected chromosome dosage, identified aneuploidy and karyotyped introgression lines from the low-coverage skim-seq data.
Leveraging recent advances in genome sequencing, skim-seq provides an effective, low-cost tool for routine genotyping and genetic analysis that can track and identify introgressions and genomic regions of interest in genetics research and applied breeding programs.
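Chromosome dosage from very low coverage can be sketched by comparing per-chromosome read shares with a euploid reference sample. This is an illustration under stated assumptions, not the authors' pipeline. Each wheat chromosome (e.g. 2A) is present as a homologous pair, so a monosomic line should show about half the normal read share for the affected chromosome. Most chromosomes in the sample are assumed euploid, which lets the ratios be re-centred on their median. Chromosome names and counts are hypothetical.

```python
import statistics

def chromosome_copy_numbers(sample_counts, reference_counts, base_copies=2):
    """Estimate copies of each chromosome from low-coverage read counts.

    sample_counts / reference_counts: dict chromosome -> number of mapped reads.
    The reference is assumed euploid with `base_copies` copies of every
    chromosome, and most sample chromosomes are assumed euploid as well.
    """
    s_total = sum(sample_counts.values())
    r_total = sum(reference_counts.values())
    ratios = {
        chrom: (sample_counts.get(chrom, 0) / s_total) / (reference_counts[chrom] / r_total)
        for chrom in reference_counts
    }
    # Re-centre on the median so one missing chromosome does not inflate the others.
    centre = statistics.median(ratios.values())
    return {chrom: round(base_copies * ratio / centre) for chrom, ratio in ratios.items()}

reference = {"1A": 100, "2A": 100, "3A": 100}
monosomic_2a = {"1A": 100, "2A": 50, "3A": 100}
print(chromosome_copy_numbers(monosomic_2a, reference))  # {'1A': 2, '2A': 1, '3A': 2}
```

At 0.01x coverage, counts would have to be pooled over whole chromosomes or large bins for the ratios to be stable.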

