Towards population genomics in non-model species with large genomes: a case study of the marine zooplankton Calanus finmarchicus

2019 ◽  
Vol 6 (2) ◽  
pp. 180608 ◽  
Author(s):  
Marvin Choquet ◽  
Irina Smolina ◽  
Anusha K. S. Dhanasiri ◽  
Leocadio Blanco-Bercial ◽  
Martina Kopp ◽  
...  

Advances in next-generation sequencing technologies and the development of genome-reduced representation protocols have opened the way to genome-wide population studies in non-model species. However, species with large genomes remain challenging, hampering the development of genomic resources for a number of taxa including marine arthropods. Here, we developed a genome-reduced representation method for the ecologically important marine copepod Calanus finmarchicus (haploid genome size of 6.34 Gbp). We optimized a capture enrichment-based protocol based on 2656 single-copy genes, yielding a total of 154 087 high-quality SNPs in C. finmarchicus including 62 372 in common among the three locations tested. The set of capture probes was also successfully applied to the congeneric C. glacialis. Preliminary analyses of these markers revealed similar levels of genetic diversity between the two Calanus species, while populations of C. glacialis showed stronger genetic structure compared to C. finmarchicus. Using this powerful set of markers, we did not detect any evidence of hybridization between C. finmarchicus and C. glacialis. Finally, we propose a shortened version of our protocol, offering a promising solution for population genomics studies in non-model species with large genomes.

Author(s):  
Shivangi Nath ◽  
Daniel E. Shaw ◽  
Michael A. White

Abstract While the cost and time for assembling a genome have fallen drastically, assembling a highly contiguous genome remains a challenge. These challenges are rapidly being overcome by the integration of long-read sequencing technologies. Here, we use long sequencing reads to improve the contiguity of the threespine stickleback fish (Gasterosteus aculeatus) genome, a prominent genetic model species. Using Pacific Biosciences sequencing, we were able to fill over 76% of the gaps in the genome, improving contiguity over five-fold. Our approach was highly accurate, validated by 10X Genomics long-distance linked-reads. In addition to closing a majority of gaps, we were able to assemble segments of telomeres and centromeres throughout the genome. This highlights the power of using long sequencing reads to assemble highly repetitive and difficult-to-assemble regions of genomes. This latest genome build has been released through a newly designed community genome browser that aims to consolidate the growing number of genomics datasets available for the threespine stickleback fish.


2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Chih-Ming Hung ◽  
Ai-Yun Yu ◽  
Yu-Ting Lai ◽  
Pei-Jen L. Shaner

Abstract Microsatellites have a wide range of applications, from behavioral biology and evolution to agriculture-based breeding programs. The recent progress in next-generation sequencing technologies and the rapidly increasing number of published genomes may greatly enhance the current applications of microsatellites by turning them from anonymous to informative markers. Here we developed an approach to anchor microsatellite markers of any target species in a genome of a related model species, through which the genomic locations of the markers, along with any functional genes potentially linked to them, can be revealed. We mapped the shotgun sequence reads of a non-model rodent species, Apodemus semotus, against the genome of a model species, Mus musculus, and presented 24 polymorphic microsatellite markers with detailed background information for A. semotus in this study. The developed markers can be used in other rodent species, especially those that are closely related to A. semotus or M. musculus. Compared to the traditional approaches based on DNA cloning, our approach is likely to yield more loci for the same cost. This study is a timely demonstration of how a research team can efficiently generate informative (neutral or function-associated) microsatellite markers for their study species and unique biological questions.


Forests ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 222
Author(s):  
Bartosz Ulaszewski ◽  
Joanna Meger ◽  
Jaroslaw Burczyk

Next-generation sequencing of reduced representation genomic libraries (RRL) can provide large numbers of genetic markers for population genetic studies at relatively low cost. However, one major concern with these markers is genotyping precision, which is related to the common problem of missing data; this problem appears to be particularly important in association and genomic selection studies. We evaluated three RRL approaches (GBS, RADseq, ddRAD) and different SNP identification methods (de novo or based on a reference genome) to find the best solutions for future population genomics studies in two economically and ecologically important broadleaved tree species, namely Fagus sylvatica and Quercus robur. We found that the ddRAD method coupled with SNP calling based on reference genomes provided the largest numbers of markers (28 k and 36 k for beech and oak, respectively), given standard filtering criteria. Using technical replicates of samples, we demonstrated that more than 80% of SNP loci should be considered reliable markers in GBS and ddRAD, but not in RADseq data. According to the reference genomes’ annotations, more than 30% of the identified ddRAD loci appeared to be related to genes. Our findings provide solid support for using ddRAD-based SNPs for future population genomics studies in beech and oak.


2020 ◽  
Vol 36 (12) ◽  
pp. 3669-3679 ◽  
Author(s):  
Can Firtina ◽  
Jeremie S Kim ◽  
Mohammed Alser ◽  
Damla Senol Cali ◽  
A Ercument Cicek ◽  
...  

Abstract Motivation Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject’s genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can only polish an assembly using reads from either a certain sequencing technology or a small assembly. Such technology-dependency and assembly-size dependency require researchers to (i) run multiple polishing algorithms and (ii) use small chunks of a large genome to use all available readsets and polish large genomes, respectively. Results We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward–Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. 
Our experiments with real readsets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts. Availability and implementation Source code is available at https://github.com/CMU-SAFARI/Apollo. Supplementary information Supplementary data are available at Bioinformatics online.
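
As a rough illustration of the decoding step mentioned in the abstract above, the following minimal sketch runs the Viterbi algorithm on a toy two-state hidden Markov model. It is not Apollo's implementation and not a profile HMM: the state names, observation symbols, and probabilities are all hypothetical, chosen only to show how the most likely state path is recovered from a trained model.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state path for the observations (log-space Viterbi)."""
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    # back[t][s]: predecessor of state s on that best path
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state to the first step
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {
        k: to_log(v) if isinstance(v, dict) else math.log(v)
        for k, v in table.items()
    }


# Hypothetical toy model: two hidden states, three observable symbols
STATES = ["S1", "S2"]
LOG_START = to_log({"S1": 0.6, "S2": 0.4})
LOG_TRANS = to_log({"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}})
LOG_EMIT = to_log({
    "S1": {"x": 0.5, "y": 0.4, "z": 0.1},
    "S2": {"x": 0.1, "y": 0.3, "z": 0.6},
})

DECODED = viterbi(["x", "y", "z"], STATES, LOG_START, LOG_TRANS, LOG_EMIT)
# DECODED is ["S1", "S1", "S2"] for this toy model
```

Working in log-space avoids numerical underflow on long sequences, which matters far more for genome-scale inputs than for this three-symbol example.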


Genome ◽  
2009 ◽  
Vol 52 (6) ◽  
pp. 566-575 ◽  
Author(s):  
Harpinder S. Randhawa ◽  
Jaswinder Singh ◽  
Peggy G. Lemaux ◽  
Kulvinder S. Gill

Gene distribution is highly uneven in the large genomes of barley and wheat; however, location, order, and gene density of gene-containing regions are very similar between the two genomes. Flanking sequences from 35 unique, single-copy, barley Ds insertion events were physically mapped using wheat nullisomic-tetrasomic, ditelosomic, and deletion lines. Of the 35 sequences, 23 (66%) detected 34 loci mapping on all 7 homoeologous wheat groups. Seven sequences were not mapped owing to lack of polymorphism and the remaining 5 (14%) were barley-specific. All 34 loci physically mapped to the previously identified gene-rich regions (GRRs) of wheat, making the contained genes candidates for targeted mutagenesis by remobilization. Transpositions occurred preferentially into GRRs with higher recombination rates. The GRRs containing 17 of the 23 Ds insertions accounted for 60%–89% of the respective arm’s recombination. The remaining 6 (17%) insertions mapped to GRRs with <15% of the arm’s recombination. Overall, kb/cM estimates for the Ds-containing GRRs were twofold higher than those for regions without insertions. These results suggest that all genes may be targeted by transposon-based gene cloning, although the transposition frequency for genes present in recombination-poor regions is significantly less than that present in highly recombinogenic regions.
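
The percentages quoted in the abstract above appear to be taken relative to all 35 flanking sequences. The short check below assumes that reading (the denominator is an assumption, not something the abstract states for every figure) and reproduces the rounded values.

```python
# Counts quoted in the abstract; using 35 as the denominator is an assumption
TOTAL_SEQUENCES = 35
MAPPED = 23             # sequences that detected loci in wheat
NOT_MAPPED = 7          # not mapped owing to lack of polymorphism
BARLEY_SPECIFIC = 5     # barley-specific sequences
LOW_RECOMBINATION = 6   # insertions in GRRs with <15% of the arm's recombination


def percent_of_total(count, total=TOTAL_SEQUENCES):
    """Return count as a percentage of total, rounded to the nearest integer."""
    return round(100 * count / total)


PERCENTAGES = {
    "mapped": percent_of_total(MAPPED),
    "barley_specific": percent_of_total(BARLEY_SPECIFIC),
    "low_recombination": percent_of_total(LOW_RECOMBINATION),
}
```

Under this assumption the three categories (23 + 7 + 5) account for all 35 sequences, and the 6 low-recombination insertions are the remainder of the 23 mapped sequences after the 17 in high-recombination regions.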


Genetics ◽  
1992 ◽  
Vol 132 (1) ◽  
pp. 125-133 ◽  
Author(s):  
N L Glass ◽  
L Lee

Abstract In the filamentous fungus Neurospora crassa, mating type is regulated by a single locus with alternate alleles, termed A and a. The mating type alleles control entry into the sexual cycle, but during vegetative growth they function to elicit heterokaryon incompatibility, such that fusion of A and a hyphae results in death of cells along the fusion point. Previous studies have shown that the A allele consists of 5301 bp and has no similarity to the a allele; it is found as a single copy and only within the A genome. The a allele is 3235 bp in length and it, too, is found as a single copy within the a genome. Within the A sequence, a single open reading frame (ORF) of 288 amino acids (mt A-1) is thought to confer fertility and heterokaryon incompatibility. In this study, we have used repeat induced point (RIP) mutation to identify functional regions of the A idiomorph. RIP mutations in mt A-1 resulted in the isolation of sterile, heterokaryon-compatible mutants, while RIP mutations generated in a region outside of mt A-1 resulted in the isolation of mutants capable of mating, but deficient in ascospore formation.


2021 ◽  
Vol 111 (1) ◽  
pp. 8-11
Author(s):  
Remco Stam ◽  
Pierre Gladieux ◽  
Boris A. Vinatzer ◽  
Erica M. Goss ◽  
Neha Potnis ◽  
...  

Population genetics has been a key discipline in phytopathology for many years. The recent rise in cost-effective, high-throughput DNA sequencing technologies allows sequencing of dozens, if not hundreds, of specimens, turning population genetics into population genomics and opening up new, exciting opportunities as described in this Focus Issue. Without the limitations of genetic markers, and with the availability of whole or near whole-genome data, population genomics can give new insights into the biology, evolution and adaptation, and dissemination patterns of plant-associated microbes.


2021 ◽  
Author(s):  
Romain Feron ◽  
Robert Michael Waterhouse

Ambitious initiatives to coordinate genome sequencing of Earth's biodiversity mean that the accumulation of genomic data is growing rapidly. In addition to cataloguing biodiversity, these data provide the basis for understanding biological function and evolution. Accurate and complete genome assemblies offer a comprehensive and reliable foundation upon which to advance our understanding of organismal biology at genetic, species, and ecosystem levels. However, ever-changing sequencing technologies and analysis methods mean that available data are often heterogeneous in quality. In order to guide forthcoming genome generation efforts and promote efficient prioritisation of resources, it is thus essential to define and monitor taxonomic coverage and quality of the data. Here we present an automated analysis workflow that surveys genome assemblies from the United States National Center for Biotechnology Information (NCBI), assesses their completeness using the relevant Benchmarking Universal Single-Copy Orthologue (BUSCO) datasets, and collates the results into an interactively browsable resource. We apply our workflow to produce a community resource of available assemblies from the phylum Arthropoda, the Arthropoda Assembly Assessment Catalogue. Using this resource, we survey current taxonomic coverage and assembly quality at the NCBI, we examine how key assembly metrics relate to gene content completeness, and we compare results from using different BUSCO lineage datasets. These results demonstrate how the workflow can be used to build a community resource that enables large-scale assessments to survey species coverage and data quality of available genome assemblies, and to guide prioritisations for ongoing and future sampling, sequencing, and genome generation initiatives.


2021 ◽  
Author(s):  
Cushla J Metcalfe ◽  
Jingchuan Li ◽  
Bangyou Zheng ◽  
Jiri Stiller ◽  
Adam Healey ◽  
...  

Abstract The large complex genomes of many crops constrain the use of new technologies for genome-assisted selection and genetic improvement. One method to simplify a genome is to break it into individual chromosomes by flow cytometry; however, in many crop species most chromosomes cannot be isolated individually. Flow sorting of a single copy of a chromosome has been developed in wheat and here we demonstrate its use to identify markers of interest in an Erianthus/Saccharum hybrid. Erianthus/Saccharum hybrids are of interest because Erianthus is known to be highly resistant to soil-borne diseases which cause extensive sugarcane yield losses in Australia. Sugarcane (Saccharum) cultivars are autopolyploids with a highly complex genome and over 100 chromosomes. Flow cytometry for sugarcane, as in most crops, does not resolve individual chromosomes to a karyotype peak for sorting. To isolate a single chromosome, we used genomic in situ hybridisation (GISH) to identify the flow karyotype region containing the Erianthus chromosomes, flow sorted single chromosomes from this region, PCR screened for the Erianthus chromosomes and sequenced them. One Erianthus chromosome amplified and sequenced well, and from these data we could identify 57 resistant type genes and SNPs in nearly half of these genes. We developed KASP SNP assays and demonstrated that the identified SNP markers segregated as expected in a small introgression population. The pipeline we developed here to flow sort and sequence single chromosomes could be used in any crop with a large complex genome to rapidly discover and develop markers to important loci.

