A High-Throughput Skim-sequencing Approach for Genotyping, Dosage Estimation and Identifying Translocations
Abstract The development of next generation sequencing (NGS) enabled a shift from array-based genotyping to high-throughput genotyping by directly sequencing genomic libraries. Even though whole genome sequencing was initially too costly for routine analysis in large populations, such as those utilized for breeding or genetic studies, continued advancements in genome sequencing and bioinformatics have provided the opportunity to utilize whole-genome information. As new sequencing platforms can routinely provide high-quality sequencing data for sufficient genome coverage, a limitation comes in the time and high cost of library construction when multiplexing a large number of samples. Here we describe a high-throughput whole-genome skim-sequencing (skim-seq) approach that can be utilized for a broad range of genotyping and genomic characterization. Using optimized low-volume Illumina Nextera chemistry, we developed a skim-seq method and combined up to 960 samples in one multiplex library using dual index barcoding. With the dual-index barcoding, the number of samples for multiplexing can be adjusted depending on amount of data required and extended to 3,072 samples or more. Panels of double haploid wheat lines (Triticum aestivum, CDC Stanley x CDC Landmark), wheat-barley (T. aestivum x Hordeum vulgare) and wheat-wheatgrass (Triticum durum x Thinopyrum intermedium) introgression lines as well as known monosomic wheat stocks were genotyped using the skim-seq approach. Bioinformatics pipelines were developed for various applications where sequencing coverage ranged from 1x down to 0.01x per sample. Using reference genomes, we detected chromosome dosage, identified aneuploidy, and karyotyped introgression lines from the low coverage skim-seq data. Leveraging the recent advancements in genome sequencing, skim-seq provides an effective and low-cost tool for routine genotyping and genetic analysis, which can track and identify introgressions and genomic regions of interest in genetics research and applied breeding programs.