Testing pipelines for genome-wide SNP calling from Genotyping-By-Sequencing (GBS) data for Pinus ponderosa

Author(s):  
Mengjun Shu ◽  
Emily V. Moran

Abstract
Background: Single Nucleotide Polymorphism (SNP) markers have rapidly gained popularity because they are abundant in most genomes and amenable to high-throughput genotyping techniques. Reduced-representation, restriction-enzyme-based sequencing methods (GBS or RADseq) have proven to be robust and cost-effective genotyping methods. Although previous studies have shown that aligning short-read fragments to a genome sequence yields better SNP calling than de novo approaches, only a few tree species, and few conifers in particular, have an annotated genome sequence. These sequences could be used to align fragments from related species, but sequence divergence might cause SNPs to be missed if they lie in fragments that do not align properly. Producing a new annotated genome sequence for every conifer species before conducting SNP analyses remains prohibitive, because many conifer genomes are huge (> 19 Gb) and contain a large proportion of repeat sequences, which makes assembly difficult. Here we compare four bioinformatics pipelines: two that require a reference genome (TASSEL-GBS V2 and Stacks) and two de novo pipelines (UNEAK and Stacks). We used Illumina sequence data from 94 ponderosa pines, with loblolly pine as the reference genome.
Results: The number of SNPs called was much lower without a reference genome (62–196 thousand vs. 2.1–2.7 million SNPs). UNEAK was the fastest overall and identified more SNPs than Stacks de novo. Stacks with a reference genome produced the highest number of SNPs with the lowest proportion of paralogs, whereas SNPs identified by TASSEL-GBS V2 exhibited the highest heterozygosity, minor allele frequency, and proportion of paralogs. Stacks identified more unique SNPs than TASSEL, although overlap between the two methods was high.
Conclusion: This case study provides a comprehensive comparison of four commonly used SNP calling pipelines and identifies the reference-based Stacks approach as the best overall for conifers (or other species with large, repetitive genomes) that lack a published reference genome for the same species. However, all four pipelines had distinct benefits and limitations; Stacks, for instance, is less user-friendly than some of the other pipelines. In addition, researchers studying other conifer species with similar approaches should be prepared to analyze very large numbers of SNPs.
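The pipelines above are compared on per-SNP minor allele frequency, heterozygosity, and overlap between call sets. The sketch below shows one common way to compute these quantities. It assumes diploid genotypes coded as alternate-allele dosage (0/1/2, None for missing) and SNPs keyed by (reference sequence, position), which only makes sense for reference-based calls. The function names and toy data are hypothetical, and this is not the authors' code or the definition used inside any particular pipeline.

```python
from __future__ import annotations

from typing import Optional, Sequence


def snp_stats(genotypes: Sequence[Optional[int]]) -> tuple[float, float]:
    """Return (minor allele frequency, observed heterozygosity) for one biallelic SNP.

    Assumed coding: alternate-allele dosage per individual, 0 = homozygous reference,
    1 = heterozygous, 2 = homozygous alternate, None = missing (ignored).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("SNP has no called genotypes")
    alt_freq = sum(called) / (2 * len(called))  # diploid: two alleles per individual
    maf = min(alt_freq, 1.0 - alt_freq)
    het = sum(1 for g in called if g == 1) / len(called)
    return maf, het


def overlap(a: set[tuple[str, int]], b: set[tuple[str, int]]) -> dict[str, int]:
    """Count SNP positions shared by two call sets and unique to each."""
    return {"shared": len(a & b), "only_a": len(a - b), "only_b": len(b - a)}


# Hypothetical call sets keyed by (reference sequence name, position).
stacks_ref = {("chr1", 100), ("chr1", 250), ("chr2", 40)}
tassel = {("chr1", 100), ("chr2", 40), ("chr3", 7)}
print(snp_stats([0, 1, 1, 2, None, 0]))  # (0.4, 0.4)
print(overlap(stacks_ref, tassel))  # {'shared': 2, 'only_a': 1, 'only_b': 1}
```

Missing calls are excluded from both statistics here. Real pipelines may instead apply missingness filters before computing them.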

Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 246
Author(s):  
Xiaomeng Chen ◽  
Rui Li ◽  
Yonglin Wang ◽  
Aining Li

An emerging poplar canker caused by the Gram-negative bacterium Lonsdalea populi has led to high mortality of hybrid poplars (Populus × euramericana) in China and Europe. The molecular bases of the pathogenicity and bark adaptation of L. populi have become a focus of recent research. This study determined the whole-genome sequence of L. populi and identified its putative virulence factors. A high-quality L. populi genome sequence was assembled de novo, with a genome size of 3,859,707 bp, containing approximately 3434 genes and 107 RNAs (75 tRNAs, 22 rRNAs, and 10 ncRNAs). The L. populi genome contained 380 virulence-associated genes, mainly encoding adhesion factors, extracellular enzymes, secretion systems, and two-component signal transduction systems. The genome also contained 110 carbohydrate-active enzyme (CAZy)-coding genes, as well as genes encoding putative secreted proteins. Annotation against an antibiotic-resistance database predicted that L. populi is resistant to penicillin, fluoroquinolones, and kasugamycin. Comparative genomic analysis found that L. populi shares the highest homology with the L. britannica genome, and that L. populi encompasses 1905 specific genes, 1769 dispensable genes, and 1381 conserved genes, suggesting high evolutionary diversity and genomic plasticity. Moreover, pan-genome analysis revealed that the N-5-1 genome is an open genome. These findings provide important resources for understanding the molecular basis of the pathogenicity and biology of L. populi and of the poplar–bacterium interaction.
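The split into specific, dispensable, and conserved (core) genes follows the usual pan-genome convention. Core families are present in every genome, specific families in exactly one, and dispensable families in more than one but not all. The sketch below applies that convention to a presence/absence table. It assumes gene families have already been clustered by an upstream orthology step. The names and toy data are hypothetical, and the study's exact thresholds may differ.

```python
from __future__ import annotations

from collections import Counter


def classify_pan_genome(presence: dict[str, set[str]]) -> dict[str, set[str]]:
    """Split gene families into core, dispensable and genome-specific sets.

    `presence` maps a genome name to the set of gene-family IDs found in it.
    Core: present in every genome. Specific: present in exactly one genome.
    Dispensable: present in more than one genome but not in all of them.
    """
    n = len(presence)
    if n < 2:
        raise ValueError("need at least two genomes")
    counts: Counter[str] = Counter()
    for families in presence.values():
        counts.update(families)  # a set, so each genome adds at most 1 per family
    core = {f for f, c in counts.items() if c == n}
    specific = {f for f, c in counts.items() if c == 1}
    dispensable = set(counts) - core - specific
    return {"core": core, "dispensable": dispensable, "specific": specific}


# Hypothetical presence/absence data for three genomes.
example = {
    "A": {"f1", "f2", "f3"},
    "B": {"f1", "f2", "f4"},
    "C": {"f1", "f5"},
}
result = classify_pan_genome(example)
print({k: sorted(v) for k, v in result.items()})
# {'core': ['f1'], 'dispensable': ['f2'], 'specific': ['f3', 'f4', 'f5']}
```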


2021 ◽  
Author(s):  
Xinxin Yi ◽  
Jing Liu ◽  
Shengcai Chen ◽  
Hao Wu ◽  
Min Liu ◽  
...  

Cultivated soybean (Glycine max) is an important source of protein and oil. Many elite cultivars with different traits have been developed for different growing conditions. Each soybean strain has its own genetic diversity, and the availability of more high-quality soybean genomes can enhance comparative genomic analyses aimed at identifying the genetic underpinnings of unique traits. In this study, we constructed a high-quality de novo assembly of the elite soybean cultivar Jidou 17 (JD17) with chromosome-level contiguity and high accuracy. We annotated 52,840 gene models and reconstructed 74,054 high-quality full-length transcripts. We performed a genome-wide comparative analysis between the JD17 reference genome and three published soybean genomes (WM82, ZH13 and W05). This analysis identified five large inversions and two large translocations specific to JD17, 20,984–46,912 presence/absence variations (PAVs) spanning 13.1–46.9 Mb, and 5–53 large PAV clusters longer than 500 kb. Between JD17 and the other three genomes, 1,695,741–3,664,629 SNPs and 446,689–800,489 indels were identified and annotated. Symbiotic nitrogen fixation (SNF) genes were identified, and the effects of these variants on them were further evaluated. The coding sequences of nine nitrogen fixation-related genes were found to be strongly affected. The high-quality genome assembly of JD17 can serve as a valuable reference for soybean functional genomics research.
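The JD17 comparison reports SNPs and indels separately. With VCF-style REF/ALT alleles, a common rule labels a variant a SNP when both alleles are single bases and an indel when their lengths differ. The sketch below assumes one ALT allele per record and normalized allele representation. The "MNP" label for equal-length multi-base substitutions, the function name, and the example pairs are illustrative assumptions, not necessarily how the authors' tools categorize variants.

```python
from collections import Counter


def classify_variant(ref: str, alt: str) -> str:
    """Classify a single REF/ALT allele pair as SNP, indel or MNP (VCF-style alleles)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"  # length change: insertion or deletion
    return "MNP"  # equal-length multi-base substitution


# Hypothetical REF/ALT pairs from a comparison against another genome.
pairs = [("A", "G"), ("AT", "A"), ("C", "CTT"), ("AC", "GT"), ("G", "T")]
print(Counter(classify_variant(r, a) for r, a in pairs))
```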


2019 ◽  
Author(s):  
Kenta Shirasawa ◽  
Akifumi Azuma ◽  
Fumiya Taniguchi ◽  
Toshiya Yamamoto ◽  
Akihiko Sato ◽  
...  

Abstract This study presents the first genome sequence of an interspecific grape hybrid, ‘Shine Muscat’ (Vitis labruscana × V. vinifera), an elite table grape cultivar bred in Japan. The complex genome structure arising from interspecific hybridization necessitated a sophisticated assembly pipeline for short-read genome sequence data. The resulting genome assemblies consisted of two types of sequences: a haplotype-phased sequence of the highly heterozygous genomes and an unphased sequence representing a “haploid” genome. The unphased sequences spanned 490.1 Mb (99.4% of the estimated genome size) across 8,696 scaffolds with an N50 length of 13.2 Mb. The phased sequences comprised 15,650 scaffolds spanning 1.0 Gb with an N50 of 4.2 Mb. The two sequences contained 94.7% and 96.3% of the core eukaryotic genes, indicating that the entire ‘Shine Muscat’ genome was represented. Examination of genome structures revealed possible rearrangements between the ‘Shine Muscat’ genomes and that of a V. vinifera line. Furthermore, full-length transcriptome sequencing revealed 13,947 gene loci in the ‘Shine Muscat’ genome, from which 26,199 transcript isoforms were transcribed. These genome resources provide new insights that could inform cultivation and breeding strategies for producing more high-quality table grapes such as ‘Shine Muscat’.
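Scaffold N50, reported above for both assemblies, is the length L such that scaffolds of length ≥ L together cover at least half of the assembly. The sketch below computes it from a list of lengths. It also computes the NG50 variant, which uses an estimated genome size instead of the assembly length. The function name and scaffold lengths are hypothetical, and tools differ slightly in how they handle the exact halfway boundary.

```python
from typing import Optional, Sequence


def n50(lengths: Sequence[int], genome_size: Optional[int] = None) -> Optional[int]:
    """Return N50 of scaffold lengths, or NG50 when an estimated genome size is given.

    N50 is the length L such that sequences of length >= L cover at least half of
    the assembly (or, for NG50, half of the estimated genome size).
    Returns None if the sequences never reach half of genome_size.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    target = sum(lengths) if genome_size is None else genome_size
    running = 0
    for length in sorted(lengths, reverse=True):  # longest scaffolds first
        running += length
        if 2 * running >= target:
            return length
    return None


scaffolds = [100, 80, 60, 40, 20]  # hypothetical scaffold lengths
print(n50(scaffolds))                   # 80
print(n50(scaffolds, genome_size=500))  # 40
```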


2019 ◽  
Vol 11 (7) ◽  
pp. 1965-1970 ◽  
Author(s):  
Nikola Palevich ◽  
Paul H Maclean ◽  
Abdul Baten ◽  
Richard W Scott ◽  
David M Leathwick

Abstract Internal parasitic nematodes are a global animal health issue, causing drastic losses in livestock. Here, we report a representative draft genome of Haemonchus contortus to serve as a genetic resource for the scientific community and to support future experimental research on molecular mechanisms in related parasites. A de novo hybrid assembly was generated from PCR-free whole-genome sequence data, resulting in a 465 Mb chromosome-level assembly encoding 22,341 genes. The genome sequence presented here is consistent with the genome architecture of existing Haemonchus species and is a valuable resource for future studies of the population genetic structure of parasitic nematodes. Additionally, comparative pan-genomics with other economically important parasitic nematode species has revealed highly open genomes and strong collinearity within the phylum Nematoda.


2020 ◽  
Vol 9 (21) ◽  
Author(s):  
Matías Poblete-Morales ◽  
Claudia Rabert ◽  
Andrés F. Olea ◽  
Héctor Carrasco ◽  
Raúl Calderón ◽  
...  

ABSTRACT Here, we announce the draft genome sequence of Pseudomonas sp. strain AN3A02, isolated from the rhizosphere of Deschampsia antarctica Desv., one of only two vascular plant species native to the Antarctic continent. This isolate, which inhibited the mycelial growth of Botrytis cinerea in dual culture, has a 6,778,644-bp genome with a G+C content of 60.4%. These draft genome data provide insight into the genetic basis of this strain's antifungal activity.
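Draft genome announcements like this one report total length and G+C content. The sketch below derives both from a multi-FASTA assembly. It assumes that G+C is computed over unambiguous bases only, so Ns count toward length but not toward the percentage; conventions vary between tools. The function names and toy contigs are hypothetical.

```python
def read_fasta_sequence(lines):
    """Concatenate all sequence lines of a (multi-)FASTA file, skipping '>' headers."""
    return "".join(line.strip() for line in lines if not line.startswith(">"))


def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); Ns and other codes are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


fasta = [">contig_1", "ATGCGC", "NN", ">contig_2", "GGAT"]  # hypothetical contigs
seq = read_fasta_sequence(fasta)
print(len(seq), round(gc_content(seq), 1))  # 12 60.0
```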


2020 ◽  
Vol 9 (37) ◽  
Author(s):  
Samuel O’Donnell ◽  
Frederic Chaux ◽  
Gilles Fischer

ABSTRACT The current Chlamydomonas reinhardtii reference genome remains fragmented because of gaps stemming from large repetitive regions. To close the vast majority of these gaps, publicly available Oxford Nanopore Technologies data were used to create a new reference-quality de novo genome assembly with a genome size of 111 Mb, containing only 21 contigs and 30 of the 34 telomeric ends.
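A figure such as "30 of 34 telomeric ends" comes from checking each contig end for a telomeric repeat array. The sketch below shows one simple way to do that. It assumes TTTTAGGG as the C. reinhardtii telomere repeat, which should be verified against the literature. It also assumes the G-rich repeat reads toward the right end of a contig, so the left end carries the reverse complement. The window size and minimum copy number are arbitrary illustrative thresholds, not the authors' criteria.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N, upper case)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def telomeric_ends(contig: str, repeat: str = "TTTTAGGG",
                   window: int = 1000, min_copies: int = 5) -> int:
    """Count how many ends (0, 1 or 2) of a contig carry a telomeric repeat array.

    Assumes the G-rich repeat reads 5'->3' toward the right end of the contig,
    so the left end is expected to carry its reverse complement.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    contig, repeat = contig.upper(), repeat.upper()
    left = contig[:window].count(revcomp(repeat)) >= min_copies
    right = contig[-window:].count(repeat) >= min_copies
    return int(left) + int(right)


# Hypothetical contigs: one with telomeres at both ends, one with a single telomere.
full = "CCCTAAAA" * 6 + "ACGT" * 300 + "TTTTAGGG" * 6
half = "ACGT" * 300 + "TTTTAGGG" * 6
print(sum(telomeric_ends(c) for c in (full, half)))  # 3
```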


2015 ◽  
Vol 3 (6) ◽  
Author(s):  
F. Wu ◽  
X. Deng ◽  
G. Liang ◽  
C. Wallis ◽  
J. T. Trumble ◽  
...  

The draft genome sequence of “Candidatus Liberibacter solanacearum” strain RSTM from a potato psyllid (Bactericera cockerelli) in California is reported here. The RSTM strain has a genome size of 1,286,787 bp, a G+C content of 35.1%, 1,211 predicted open reading frames (ORFs), and 43 RNA genes.
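Open reading frames like the 1,211 reported above are normally predicted with dedicated gene-finding software. To show what an ORF is, the sketch below does a deliberately naive six-frame scan. It reports each stretch from an ATG start codon to the first in-frame stop codon that meets a minimum length. It ignores the alternative bacterial start codons (such as GTG and TTG) and nested starts, so its counts would not match a real annotation. The names and thresholds are hypothetical.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N, upper case)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int, int]]:
    """Naive six-frame ORF scan: ATG ... first in-frame stop codon.

    Returns (strand, frame, start, end) tuples; on the '-' strand the coordinates
    refer to the reverse-complemented sequence. min_codons includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ATG in this frame
    return orfs


toy = "ATG" + "AAA" * 3 + "TAA"  # hypothetical 5-codon ORF
print(find_orfs(toy, min_codons=5))  # [('+', 0, 0, 15)]
```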


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 297 ◽  
Author(s):  
Jason R. Miller ◽  
Sergey Koren ◽  
Kari A. Dilley ◽  
Derek M. Harkins ◽  
Timothy B. Stockwell ◽  
...  

Background: The tick cell line ISE6, derived from Ixodes scapularis, is commonly used for amplification and detection of arboviruses in environmental or clinical samples.
Methods: To assist with sequence-based assays, we sequenced the ISE6 genome with single-molecule, long-read technology.
Results: The draft assembly appears near complete based on gene content analysis, though it appears to lack some instances of repeats in this highly repetitive genome. The assembly appears to have separated the haplotypes at many loci. DNA short read pairs, used for validation only, mapped to the cell line assembly at a higher rate than they mapped to the Ixodes scapularis reference genome sequence.
Conclusions: The assembly could be useful for filtering host genome sequence from sequence data obtained from cells infected with pathogens.
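The validation above compares short-read mapping rates between two assemblies, and the suggested application is filtering host reads. Both can be sketched from SAM alignment records using the FLAG field: bit 0x4 marks an unmapped segment, and bits 0x100 and 0x800 mark secondary and supplementary alignments. The sketch below counts each mate separately, which is an assumption; a per-pair rate would need mates grouped by name. Function names and the truncated records are hypothetical.

```python
def mapping_rate(sam_lines) -> float:
    """Fraction of primary read records that are mapped, from SAM text lines.

    Header lines ('@') are skipped, as are secondary (0x100) and supplementary
    (0x800) alignments, so each read (mate) is counted once.
    """
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if flag & 0x900:
            continue
        total += 1
        if not (flag & 0x4):  # 0x4 = segment unmapped
            mapped += 1
    return mapped / total if total else 0.0


def unmapped_read_names(sam_lines) -> set:
    """Names of primary records that did not map, e.g. candidate non-host reads."""
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.split("\t")
        flag = int(fields[1])
        if not (flag & 0x900) and flag & 0x4:
            names.add(fields[0])
    return names


sam = [  # hypothetical, truncated SAM records (only QNAME and FLAG matter here)
    "@HD\tVN:1.6",
    "r1\t99\tchr1", "r1\t147\tchr1",
    "r2\t77\t*", "r2\t141\t*",
    "r3\t256\tchr2",
]
print(mapping_rate(sam), unmapped_read_names(sam))  # 0.5 {'r2'}
```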


2019 ◽  
Author(s):  
Antonis Kioukis ◽  
Vassiliki A. Michalopoulou ◽  
Laura Briers ◽  
Stergios Pirintsos ◽  
David J. Studholme ◽  
...  

Abstract Crop wild relatives contain high levels of genetic diversity, representing an invaluable resource for crop improvement. Many of their traits have the potential to help crops become more resistant and resilient and to adapt to the new conditions they will experience due to climate change. A substantial global effort is under way to conserve various crop wild relatives and to facilitate their use in crop breeding for food security.

The genus Brassica is listed in Annex I of the International Treaty on Plant Genetic Resources for Food and Agriculture. Brassica oleracea (wild cabbage) is a species native to coastal southern and western Europe that has become established as an important human food crop because of the large reserves stored over winter in its leaves.

Brassica cretica Lam. is a crop wild relative in the brassica group, and B. cretica subsp. nivea has been suggested as a separate subspecies. B. cretica has been proposed as a potential gene donor to a number of crops in the brassica group, including broccoli, Brussels sprout, cabbage, cauliflower, kale, swede, turnip and oilseed rape.

Here, we present draft de novo genome assemblies of four B. cretica individuals: two B. cretica subsp. nivea and two B. cretica. De novo assembly of Illumina MiSeq genomic shotgun sequencing data yielded 243,461 contigs totalling 412.5 Mb, corresponding to 122% of the estimated genome size of B. cretica (339 Mb). Based on synteny mapping and phylogenetic analysis of conserved genes, our sequence data indicate that the B. cretica genome encodes approximately 30,360 proteins.

Furthermore, our demographic analysis based on whole-genome data suggests that distinct populations of B. cretica are not isolated. Our findings suggest that the classification of B. cretica into distinct subspecies is not supported by the genome sequence data we analyzed.


2021 ◽  
Vol 10 (28) ◽  
Author(s):  
Ryosuke Nakai ◽  
Hiroyuki Kusada ◽  
Fumihiro Sassa ◽  
Susumu Morigasaki ◽  
Hisayoshi Hayashi ◽  
...  

We report the draft genome sequence of a novel Rhodospirillales bacterium strain, TMPK1, isolated from a micropore-filtered soil suspension. This strain has a 4,249,070-bp genome containing 4,151 protein-coding sequences. The genome sequence data further suggest that strain TMPK1 is an alphaproteobacterium capable of carotenoid production.

