Proteomics as a metrological tool to evaluate genome annotation accuracy following de novo genome assembly: a case study using the Atlantic bottlenose dolphin (Tursiops truncatus)

2018 ◽  
Author(s):  
Benjamin A. Neely ◽  
Debra L. Ellisor ◽  
W. Clay Davis

Abstract. Background: The last decade has witnessed dramatic improvements in whole-genome sequencing capabilities coupled to drastically decreased costs, leading to an inundation of high-quality de novo genomes. For this reason, continued development of genome quality metrics is imperative. The current study utilized the recently updated Atlantic bottlenose dolphin (Tursiops truncatus) genome and annotation to evaluate a proteomics-based metric of genome accuracy. Results: Proteomic analysis of six tissues provided experimental confirmation of 10 402 proteins from 4 711 protein groups, almost 1/3 of the possible predicted proteins in the genome. There was an increased median molecular weight and number of identified peptides per protein using the current T. truncatus annotation versus the previous annotation. Identification of larger proteins with more identified peptides implied reduced database fragmentation and improved gene annotation accuracy. A metric is proposed, NP10, that attempts to capture this quality improvement. When using the new T. truncatus genome there was a 21 % improvement in NP10. This metric was further demonstrated by using a publicly available proteomic data set to compare human genome annotations from 2004, 2013 and 2016, which had a 33 % improvement in NP10. Conclusions: These results demonstrate that new whole-genome sequencing techniques can rapidly generate high-quality de novo genome assemblies and emphasize the speed of advancing bioanalytical measurements in a non-model organism. Moreover, proteomics may be a useful metrological tool to benchmark genome accuracy, though there is a need for reference proteomic datasets to facilitate this utility in new de novo and existing genomes.

2017 ◽  
Author(s):  
Hilary C. Martin ◽  
Elizabeth M. Batty ◽  
Julie Hussin ◽  
Portia Westall ◽  
Tasman Daish ◽  
...  

Abstract. The platypus is an egg-laying mammal which, alongside the echidna, occupies a unique place in the mammalian phylogenetic tree. Despite widespread interest in its unusual biology, little is known about its population structure or recent evolutionary history. To provide new insights into the dispersal and demographic history of this iconic species, we sequenced the genomes of 57 platypuses from across the whole species range in eastern mainland Australia and Tasmania. Using a highly-improved reference genome, we called over 6.7M SNPs, providing an informative genetic data set for population analyses. Our results show very strong population structure in the platypus, with our sampling locations corresponding to discrete groupings between which there is no evidence for recent gene flow. Genome-wide data allowed us to establish that 28 of the 57 sampled individuals had at least a third-degree relative amongst other samples from the same river, often taken at different times. Taking advantage of a sampled family quartet, we estimated the de novo mutation rate in the platypus at 7.0×10^−9/bp/generation (95% CI 4.1×10^−9 to 1.2×10^−8/bp/generation). We estimated effective population sizes of ancestral populations and haplotype sharing between current groupings, and found evidence for bottlenecks and long-term population decline in multiple regions, and early divergence between populations in different regions. This study demonstrates the power of whole-genome sequencing for studying natural populations of an evolutionarily important species.


2020 ◽  
Vol 18 (2) ◽  
pp. 197-208
Author(s):  
Le Tung Lam ◽  
Nguyen Trung Hieu ◽  
Nguyen Hong Trang ◽  
Ho Thi Thuong ◽  
Tran Huyen Linh ◽  
...  

The COVID-19 pandemic, caused by the virus SARS-CoV-2, has devastated countries worldwide, infecting more than 4.5 million people and leading to more than 300,000 deaths as of May 16th, 2020. Whole-genome sequencing (WGS) is an effective tool to monitor emerging strains and provide information for intervention, thus helping to inform outbreak control decisions. Here, we report the first effort to sequence and de novo assemble the whole genome of SARS-CoV-2 using PacBio’s SMRT sequencing technology in Vietnam. We also present the annotation results and a brief analysis of the variants found in our SARS-CoV-2 strain, which was isolated from a Vietnamese patient. Sequencing and de novo assembly were successfully completed in less than 30 hours, resulting in one contig with no gap and a length of 29,766 bp. All variants detected relative to the NCBI reference were confirmed by Sanger sequencing. The results show the potential of long-read sequencing to provide high-quality WGS data to support public health responses and advance understanding of this and future pandemics.


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Brent S. Pedersen ◽  
Joe M. Brown ◽  
Harriet Dashnow ◽  
Amelia D. Wallace ◽  
Matt Velinder ◽  
...  

Abstract. In studies of families with rare disease, it is common to screen for de novo mutations, as well as recessive or dominant variants that explain the phenotype. However, the filtering strategies and software used to prioritize high-confidence variants vary from study to study. In an effort to establish recommendations for rare disease research, we explore effective guidelines for variant (SNP and INDEL) filtering and report the expected number of candidates for de novo dominant, recessive, and autosomal dominant modes of inheritance. We derived these guidelines using two large family-based cohorts that underwent whole-genome sequencing, as well as two family cohorts with whole-exome sequencing. The filters are applied to common attributes, including genotype-quality, sequencing depth, allele balance, and population allele frequency. The resulting guidelines yield ~10 candidate SNP and INDEL variants per exome, and 18 per genome for recessive and de novo dominant modes of inheritance, with substantially more candidates for autosomal dominant inheritance. For family-based, whole-genome sequencing studies, this number includes an average of three de novo, ten compound heterozygous, one autosomal recessive, four X-linked variants, and roughly 100 candidate variants following autosomal dominant inheritance. The slivar software we developed to establish and rapidly apply these filters to VCF files is available at https://github.com/brentp/slivar under an MIT license, and includes documentation and recommendations for best practices for rare disease analysis.
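The kind of trio filter described in this abstract (genotype quality, sequencing depth, allele balance, and population allele frequency) can be illustrated with a minimal Python sketch. This is not slivar's code or expression syntax. The `Sample` class, the function name, and every threshold below are hypothetical values chosen only for illustration, not the cutoffs recommended by the authors.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Per-sample genotype fields for one variant (hypothetical layout)."""
    alts: int   # number of alternate alleles: 0 hom-ref, 1 het, 2 hom-alt
    gq: int     # genotype quality
    dp: int     # sequencing depth at the site
    ab: float   # allele balance: alt reads / total reads


# Illustrative thresholds only; these are assumptions, not published cutoffs.
MIN_GQ = 20
MIN_DP = 10
HET_AB_RANGE = (0.2, 0.8)
MAX_PARENT_AB = 0.02
MAX_POP_AF = 0.001


def is_candidate_de_novo(kid: Sample, mom: Sample, dad: Sample, pop_af: float) -> bool:
    """Return True if the variant passes a simple de novo filter in a trio."""
    # Every member of the trio needs a confident, well-covered genotype call.
    if any(s.gq < MIN_GQ or s.dp < MIN_DP for s in (kid, mom, dad)):
        return False
    # The child must be heterozygous with a balanced fraction of alt reads.
    if kid.alts != 1 or not (HET_AB_RANGE[0] <= kid.ab <= HET_AB_RANGE[1]):
        return False
    # Both parents must be homozygous reference with almost no alt reads.
    if any(p.alts != 0 or p.ab > MAX_PARENT_AB for p in (mom, dad)):
        return False
    # The variant must be rare in the reference population.
    return pop_af < MAX_POP_AF
```

A call such as `is_candidate_de_novo(Sample(1, 40, 30, 0.45), Sample(0, 50, 28, 0.0), Sample(0, 45, 31, 0.0), pop_af=0.0)` passes all four checks, whereas the same genotypes with a common population allele frequency, a poorly covered parent, or a heterozygous parent would be rejected.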


PLoS ONE ◽  
2021 ◽  
Vol 16 (6) ◽  
pp. e0253440
Author(s):  
Samantha Gunasekera ◽  
Sam Abraham ◽  
Marc Stegger ◽  
Stanley Pang ◽  
Penghao Wang ◽  
...  

Whole-genome sequencing is essential to many facets of infectious disease research. However, technical limitations such as bias in coverage and tagmentation, and difficulties characterising genomic regions with extreme GC content have created significant obstacles in its use. Illumina has claimed that the recently released DNA Prep library preparation kit, formerly known as Nextera Flex, overcomes some of these limitations. This study aimed to assess bias in coverage, tagmentation, GC content, average fragment size distribution, and de novo assembly quality using both the Nextera XT and DNA Prep kits from Illumina. When performing whole-genome sequencing on Escherichia coli and where coverage bias is the main concern, the DNA Prep kit may provide higher quality results; though de novo assembly quality, tagmentation bias and GC content related bias are unlikely to improve. Based on these results, laboratories with existing workflows based on Nextera XT would see minor benefits in transitioning to the DNA Prep kit if they were primarily studying organisms with neutral GC content.


2020 ◽  
Vol 29 (1) ◽  
pp. 184-193 ◽  
Author(s):  
Jonas Carlsson Almlöf ◽  
Sara Nystedt ◽  
Aikaterini Mechtidou ◽  
Dag Leonard ◽  
Maija-Leena Eloranta ◽  
...  

Abstract. By performing whole-genome sequencing in a Swedish cohort of 71 parent-offspring trios, in which the child in each family is affected by systemic lupus erythematosus (SLE, OMIM 152700), we investigated the contribution of de novo variants to risk of SLE. We found de novo single nucleotide variants (SNVs) to be significantly enriched in gene promoters in SLE patients compared with healthy controls at a level corresponding to 26 de novo promoter SNVs more in each patient than expected. We identified 12 de novo SNVs in promoter regions of genes that have been previously implicated in SLE, or that have functions that could be of relevance to SLE. Furthermore, we detected three missense de novo SNVs, five de novo insertion-deletions, and three de novo structural variants with potential to affect the expression of genes that are relevant for SLE. Based on enrichment analysis, disease-affecting de novo SNVs are expected to occur in one-third of SLE patients. This study shows that de novo variants in promoters commonly contribute to the genetic risk of SLE. The fact that de novo SNVs in SLE were enriched in promoter regions highlights the importance of using whole-genome sequencing for identification of de novo variants.


2016 ◽  
Vol 4 (6) ◽  
Author(s):  
Claudia Carolina Carbonari ◽  
Nahuel Fittipaldi ◽  
Sarah Teatero ◽  
Taryn B. T. Athey ◽  
Luis Pianciola ◽  
...  

Shiga toxin-producing Escherichia coli strains are associated worldwide with sporadic human infections and outbreaks. In this work, we report the availability of high-quality draft whole-genome sequences for 19 O157:H7 strains isolated in Argentina.


BMC Genomics ◽  
2011 ◽  
Vol 12 (1) ◽  
Author(s):  
Yanliang Jiang ◽  
Jianguo Lu ◽  
Eric Peatman ◽  
Huseyin Kucuktas ◽  
Shikai Liu ◽  
...  

2015 ◽  
Vol 25 (3) ◽  
pp. 426-434 ◽  
Author(s):  
Brock A. Peters ◽  
Bahram G. Kermani ◽  
Oleg Alferov ◽  
Misha R. Agarwal ◽  
Mark A. McElwain ◽  
...  

2015 ◽  
Vol 68 (10) ◽  
pp. 835-838 ◽  
Author(s):  
Björn A Espedido ◽  
Borce Dimitrijovski ◽  
Sebastiaan J van Hal ◽  
Slade O Jensen

Aims: To characterise the resistome of a multi-drug resistant Klebsiella pneumoniae (Kp0003) isolated from an Australian traveller who was repatriated to a Sydney Metropolitan Hospital from Myanmar with possible prosthetic aortic valve infective endocarditis. Methods: Kp0003 was recovered from a blood culture of the patient and whole genome sequencing was performed. Read mapping and de novo assembly of reads facilitated in silico multi-locus sequence and plasmid replicon typing as well as the characterisation of antibiotic resistance genes and their genetic context. Conjugation experiments were also performed to assess the plasmid (and resistance gene) transferability and the effect on the antibiotic resistance phenotype. Results: Importantly, and of particular concern, the carbapenem-hydrolysing β-lactamase gene blaNDM-4 was identified on a conjugative IncX3 plasmid (pJEG027). In this respect, the blaNDM-4 genetic context is similar (at least to some extent) to what has previously been identified for blaNDM-1 and blaNDM-4-like variants. Conclusions: This study highlights the potential role that IncX3 plasmids have played in the emergence and dissemination of blaNDM-4-like variants worldwide and emphasises the importance of resistance gene surveillance.

