MACSNVdb: a high-quality SNV database for interspecies genetic divergence investigation among macaques

Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Lianming Du ◽  
Tao Guo ◽  
Qin Liu ◽  
Jing Li ◽  
Xiuyue Zhang ◽  
...  

Abstract Macaques are the most widely used non-human primates in biomedical research. The genetic divergence between these animal models is responsible for their phenotypic differences in response to certain diseases. However, existing macaque single nucleotide polymorphism resources focus mainly on the rhesus macaque (Macaca mulatta), which hinders broader research on and biomedical application of other macaques. To overcome these limitations, we constructed a database named MACSNVdb that focuses on the interspecies genetic diversity among macaque genomes. MACSNVdb is a web-enabled database comprising ~74.51 million high-quality non-redundant single nucleotide variants (SNVs) identified among 20 macaque individuals from six species groups (mulatta, fascicularis, sinica, arctoides, silenus, sylvanus). In addition to individual SNVs, MACSNVdb also allows users to browse and retrieve groups of user-defined SNVs. In particular, users can retrieve non-synonymous SNVs that may have deleterious effects on protein structure or function within macaque orthologs of human disease and drug-target genes. Besides position, alleles and flanking sequences, MACSNVdb integrates additional genomic information including SNV annotations and gene functional annotations. MACSNVdb will help biomedical researchers discover the molecular mechanisms of diverse responses to diseases and help primatologists perform population genetic studies. We will continue updating MACSNVdb with newly available sequencing data and annotation to keep the resource up to date. Database URL: http://big.cdu.edu.cn/macsnvdb/

Author(s):  
Renata Parissi Buainain ◽  
Matheus Negri Boschiero ◽  
Bruno Camporeze ◽  
Paulo Henrique Pires de Aguiar ◽  
Fernando Augusto Lima Marson ◽  
...  

2020 ◽  
Author(s):  
Daniel Shriner ◽  
Adebowale Adeyemo ◽  
Charles Rotimi

In clinical genomics, variant calling from short-read sequencing data typically relies on a pan-genomic, universal human reference sequence. A major limitation of this approach is that the number of reads that incorrectly map or fail to map increases as the reads diverge from the reference sequence. In the context of genome sequencing of genetically diverse Africans, we investigate the advantages and disadvantages of using a de novo assembly of the read data as the reference sequence in single sample calling. Conditional on sufficient read depth, the alignment-based and assembly-based approaches yielded comparable sensitivity and false discovery rates for single nucleotide variants when benchmarked against a gold standard call set. The alignment-based approach yielded coverage of an additional 270.8 Mb over which sensitivity was lower and the false discovery rate was higher. Although both approaches detected and missed clinically relevant variants, the assembly-based approach identified more such variants than the alignment-based approach. Of particular relevance to individuals of African descent, the assembly-based approach identified four heterozygous genotypes containing the sickle allele whereas the alignment-based approach identified no occurrences of the sickle allele. Variant annotation using dbSNP and gnomAD identified systematic biases in these databases due to underrepresentation of Africans. Using the counts of homozygous alternate genotypes from the alignment-based approach as a measure of genetic distance to the reference sequence GRCh38.p12, we found that the numbers of misassemblies, total variant sites, potentially novel single nucleotide variants (SNVs), and certain variant classes (e.g., splice acceptor variants, stop loss variants, missense variants, synonymous variants, and variants absent from gnomAD) were significantly correlated with genetic distance. 
In contrast, genomic coverage and other variant classes (e.g., ClinVar pathogenic or likely pathogenic variants, start loss variants, stop gain variants, splice donor variants, incomplete terminal codons, variants with CADD score ≥20) were not correlated with genetic distance. With improvement in coverage, the assembly-based approach can offer a viable alternative to the alignment-based approach, with the advantage that it can obviate the need to generate diverse human reference sequences or collections of alternate scaffolds.


2019 ◽  
Vol 4 (1) ◽  
Author(s):  
Andrew Currin ◽  
Neil Swainston ◽  
Mark S Dunstan ◽  
Adrian J Jervis ◽  
Paul Mulherin ◽  
...  

Abstract Synthetic biology utilizes the Design–Build–Test–Learn pipeline for the engineering of biological systems. Typically, this requires the construction of specifically designed, large and complex DNA assemblies. The availability of cheap DNA synthesis and automation enables high-throughput assembly approaches, which generates a heavy demand for DNA sequencing to verify correctly assembled constructs. Next-generation sequencing is ideally positioned to perform this task; however, given expensive hardware and bespoke data analysis requirements, few laboratories utilize this technology in-house. Here a workflow for highly multiplexed sequencing is presented, capable of fast and accurate sequence verification of DNA assemblies using nanopore technology. A novel sample barcoding system using polymerase chain reaction is introduced, and sequencing data are analyzed through a bespoke analysis algorithm. Crucially, this algorithm overcomes the problem of high-error rate nanopore data (which typically prevents identification of single nucleotide variants) through statistical analysis of strand bias, permitting accurate sequence analysis with single-base resolution. As an example, 576 constructs (6 × 96 well plates) were processed in a single workflow in 72 h (from Escherichia coli colonies to analyzed data). Given our procedure’s low hardware costs and highly multiplexed capability, this provides cost-effective access to powerful DNA sequencing for any laboratory, with applications beyond synthetic biology including directed evolution, single nucleotide polymorphism analysis and gene synthesis.


2019 ◽  
Vol 36 (3) ◽  
pp. 713-720 ◽  
Author(s):  
Mary A Wood ◽  
Austin Nguyen ◽  
Adam J Struck ◽  
Kyle Ellrott ◽  
Abhinav Nellore ◽  
...  

Abstract Motivation The vast majority of tools for neoepitope prediction from DNA sequencing of complementary tumor and normal patient samples do not consider germline context or the potential for the co-occurrence of two or more somatic variants on the same mRNA transcript. Without consideration of these phenomena, existing approaches are likely to produce both false-positive and false-negative results, resulting in an inaccurate and incomplete picture of the cancer neoepitope landscape. We developed neoepiscope chiefly to address this issue for single nucleotide variants (SNVs) and insertions/deletions (indels). Results Herein, we illustrate how germline and somatic variant phasing affects neoepitope prediction across multiple datasets. We estimate that up to ∼5% of neoepitopes arising from SNVs and indels may require variant phasing for their accurate assessment. neoepiscope is performant, flexible and supports several major histocompatibility complex binding affinity prediction tools. Availability and implementation neoepiscope is available on GitHub at https://github.com/pdxgx/neoepiscope under the MIT license. Scripts for reproducing results described in the text are available at https://github.com/pdxgx/neoepiscope-paper under the MIT license. Additional data from this study, including summaries of variant phasing incidence and benchmarking wallclock times, are available in Supplementary Files 1, 2 and 3. Supplementary File 1 contains Supplementary Table 1, Supplementary Figures 1 and 2, and descriptions of Supplementary Tables 2–8. Supplementary File 2 contains Supplementary Tables 2–6 and 8. Supplementary File 3 contains Supplementary Table 7. 
Raw sequencing data used for the analyses in this manuscript are available from the Sequence Read Archive under accessions PRJNA278450, PRJNA312948, PRJNA307199, PRJNA343789, PRJNA357321, PRJNA293912, PRJNA369259, PRJNA305077, PRJNA306070, PRJNA82745 and PRJNA324705; from the European Genome-phenome Archive under accessions EGAD00001004352 and EGAD00001002731; and by direct request to the authors. Supplementary information Supplementary data are available at Bioinformatics online.


F1000Research ◽  
2014 ◽  
Vol 2 ◽  
pp. 217 ◽  
Author(s):  
Guillermo Barturen ◽  
Antonio Rueda ◽  
José L. Oliver ◽  
Michael Hackenberg

Whole genome methylation profiling at a single cytosine resolution is now feasible due to the advent of high-throughput sequencing techniques together with bisulfite treatment of the DNA. To obtain the methylation value of each individual cytosine, the bisulfite-treated sequence reads are first aligned to a reference genome, and then the profiling of the methylation levels is done from the alignments. A huge effort has been made to quickly and correctly align the reads and many different algorithms and programs to do this have been created. However, the second step is just as crucial and non-trivial, but much less attention has been paid to the final inference of the methylation states. Important error sources do exist, such as sequencing errors, bisulfite failure, clonal reads, and single nucleotide variants. We developed MethylExtract, a user-friendly tool to: i) generate high quality, whole genome methylation maps and ii) detect sequence variation within the same sample preparation. The program is implemented as a single script and takes into account all major error sources. MethylExtract detects variation (SNVs – Single Nucleotide Variants) in a similar way to VarScan, a very sensitive method extensively used in SNV and genotype calling based on non-bisulfite-treated reads. The usefulness of MethylExtract is shown by means of extensive benchmarking based on artificial bisulfite-treated reads and a comparison to a recently published method, called Bis-SNP. MethylExtract is able to detect SNVs within High-Throughput Sequencing experiments of bisulfite treated DNA at the same time as it generates high quality methylation maps. This simultaneous detection of DNA methylation and sequence variation is crucial for many downstream analyses, for example when deciphering the impact of SNVs on differential methylation. An exclusive feature of MethylExtract, in comparison with existing software, is the possibility to assess the bisulfite failure in a statistical way. 
The source code, tutorial and artificial bisulfite datasets are available at http://bioinfo2.ugr.es/MethylExtract/ and http://sourceforge.net/projects/methylextract/, and also permanently accessible from 10.5281/zenodo.7144.
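
The per-cytosine methylation value mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common convention that, after bisulfite treatment, unmethylated cytosines are read as T and methylated cytosines are still read as C, so the methylation level at a position is the fraction of C reads among C and T reads. The function name and the `min_depth` threshold are hypothetical and are not taken from MethylExtract.

```python
def methylation_level(c_reads, t_reads, min_depth=5):
    """Fraction of reads supporting methylation at one cytosine.

    c_reads: reads showing C (unconverted, interpreted as methylated)
    t_reads: reads showing T (converted, interpreted as unmethylated)
    min_depth: hypothetical coverage threshold; below it no value is reported
    """
    depth = c_reads + t_reads
    if depth < min_depth:
        return None  # too few reads for a meaningful estimate
    return c_reads / depth


# Example: 18 unconverted and 2 converted reads at one position.
level = methylation_level(18, 2)
```

A real C>T variant at the same position would also appear as T reads, which is one reason the abstract lists single nucleotide variants among the error sources.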


2019 ◽  
Vol 4 ◽  
pp. 145
Author(s):  
Matthew N. Wakeling ◽  
Thomas W. Laver ◽  
Kevin Colclough ◽  
Andrew Parish ◽  
Sian Ellard ◽  
...  

Multiple Nucleotide Variants (MNVs) are miscalled by the most widely utilised next generation sequencing (NGS) analysis pipelines, presenting the potential for missing diagnoses that would previously have been made by standard Sanger (dideoxy) sequencing. These variants, which should be treated as a single insertion-deletion mutation event, are commonly called as separate single nucleotide variants. This can result in misannotation, incorrect amino acid predictions and potentially false positive and false negative diagnostic results. This risk will increase as confirmatory Sanger sequencing of Single Nucleotide Variants (SNVs) ceases to be standard practice. Using simulated data and re-analysis of sequencing data from a diagnostic targeted gene panel, we demonstrate that the widely adopted pipeline, GATK best practices, results in miscalling of MNVs and that alternative tools can call these variants correctly. The adoption of calling methods that annotate MNVs correctly would present a solution for individual laboratories; however, GATK best practices are the basis for important public resources such as the gnomAD database. We suggest that integrating a solution into these guidelines would be the optimal approach.
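
The incorrect amino acid prediction described in this abstract can be illustrated with a small worked example. The sketch below is not the authors' method. It takes one reference codon with two adjacent substitutions and compares annotating each substitution separately against annotating them jointly. The codon, the two substitutions and the helper name are hypothetical, and the codon table holds only the four standard-code entries the example needs.

```python
# Subset of the standard genetic code, limited to the codons used below.
CODON_TABLE = {"GAG": "E", "AAG": "K", "GCG": "A", "ACG": "T"}


def apply_variants(codon, variants):
    """Return the codon after applying (offset, alt_base) substitutions."""
    bases = list(codon)
    for offset, alt in variants:
        bases[offset] = alt
    return "".join(bases)


ref_codon = "GAG"   # glutamate (E)
snv1 = (0, "A")     # G>A at codon position 1
snv2 = (1, "C")     # A>C at codon position 2

# Annotating each substitution on its own gives two different predictions.
separate = [CODON_TABLE[apply_variants(ref_codon, [v])] for v in (snv1, snv2)]

# Annotating both together gives the residue actually encoded on that allele.
joint = CODON_TABLE[apply_variants(ref_codon, [snv1, snv2])]

print(separate, joint)
```

Called separately, the two changes are annotated as E>K and E>A. Considered together on the same allele, the codon becomes ACG and encodes threonine, a residue that neither separate call predicts.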


2018 ◽  
Author(s):  
Dimitrios Kleftogiannis ◽  
Marco Punta ◽  
Anuradha Jayaram ◽  
Shahneen Sandhu ◽  
Stephen Q. Wong ◽  
...  

Abstract Background Targeted deep sequencing is a highly effective technology to identify known and novel single nucleotide variants (SNVs) with many applications in translational medicine, disease monitoring and cancer profiling. However, identification of SNVs using deep sequencing data is a challenging computational problem as different sequencing artifacts limit the analytical sensitivity of SNV detection, especially at low variant allele frequencies (VAFs). Methods To address the problem of relatively high noise levels in amplicon-based deep sequencing data (e.g. with the Ion AmpliSeq technology) in the context of SNV calling, we have developed a new bioinformatics tool called AmpliSolve. AmpliSolve uses a set of normal samples to model position-specific, strand-specific and nucleotide-specific background artifacts (noise), and deploys a Poisson model-based statistical framework for SNV detection. Results Our tests on both synthetic and real data indicate that AmpliSolve achieves a good trade-off between precision and sensitivity, even at VAF below 5% and as low as 1%. We further validate AmpliSolve by applying it to the detection of SNVs in 96 circulating tumor DNA samples at three clinically relevant genomic positions and compare the results to digital droplet PCR experiments. Conclusions AmpliSolve is a new tool for in-silico estimation of background noise and for detection of low frequency SNVs in targeted deep sequencing data. Although AmpliSolve has been specifically designed for and tested on amplicon-based libraries sequenced with the Ion Torrent platform it can, in principle, be applied to other sequencing platforms as well. AmpliSolve is freely available at https://github.com/dkleftogi/AmpliSolve.
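
The general idea of testing variant reads against a Poisson background can be sketched as follows. This is an illustration under stated assumptions, not AmpliSolve's actual implementation. It assumes a per-position background error rate has already been estimated from normal samples, and it asks how likely the observed number of variant reads would be under that rate alone. The function names, the example rate and the significance threshold `alpha` are all hypothetical.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)   # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i     # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_snv(alt_reads, depth, background_rate, alpha=1e-3):
    """Flag a position when alt_reads is unlikely under background noise alone.

    background_rate: assumed per-read error rate at this position, strand and
    nucleotide, estimated beforehand from normal samples (hypothetical input).
    alpha: hypothetical significance threshold.
    """
    expected_noise = depth * background_rate
    p_value = poisson_sf(alt_reads, expected_noise)
    return p_value < alpha, p_value


# Depth 2000 with an assumed background rate of 0.2% gives 4 expected noise reads.
is_variant, p = call_snv(alt_reads=20, depth=2000, background_rate=0.002)
is_noise_like, p_noise = call_snv(alt_reads=5, depth=2000, background_rate=0.002)
```

In this example, 20 variant reads (a VAF of 1%) lie far above the 4 reads expected from noise and are flagged, whereas 5 variant reads are consistent with background and are not.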


2018 ◽  
Author(s):  
Shengcai Liu ◽  
Liyun Peng ◽  
Junfei Pan ◽  
Xiao Wang ◽  
Chunli Zhao ◽  
...  

Betalains are abundant in amaranth plants. Additionally, the betalain molecular structure and metabolic pathway differ from those of betanin in beet plants. To date, only a few studies have examined the regulatory roles of miRNAs in betalain biosynthesis in plants. Thus, we constructed small RNA libraries for the red and green sectors of amaranth leaves to identify miRNAs associated with betalain biosynthesis. We identified 198 known and 41 novel miRNAs. Moreover, 216 miRNAs were distributed in 44 miRNA families, including miR156, miR159, miR160, miR166, miR172, miR319, miR167, miR396, and miR398. An analysis of all unigene sequences in an amaranth transcriptome database resulted in the detection of 493 target genes for the 239 screened miRNAs. The targets included SPL2, ARF18, ARF6, and NAC. A quantitative real-time polymerase chain reaction validation of 20 miRNAs and nine target genes revealed expression-level differences between the red and green sectors of amaranth leaves. This study involved the application of an Illumina sequencing platform to identify miRNAs regulating betalain metabolism in amaranth plants. The data presented herein may provide insights into the molecular mechanisms underlying the regulation of betalain biosynthesis in amaranth and other plant species.


2020 ◽  
Author(s):  
Junhe Hu ◽  
Jinyi Dong ◽  
Zhi Zeng ◽  
Juan Wu ◽  
Xiansheng Tan ◽  
...  

Abstract Follicular development is crucial to normal oocyte maturation, with follicular size closely related to oocyte maturation. To better understand the molecular mechanisms behind porcine oocyte maturation, we obtained exosomal miRNA from porcine follicular fluid (PFF). These miRNA samples were then sequenced and analyzed regarding their different follicular sizes, as described in the methods section. First, these results showed that this process successfully isolated PFF exosomes. Nearly all valid reads from the PFF exosomal sequencing data were successfully mapped to the porcine genome database. Second, we used hierarchical clustering methods to determine that significantly expressed miRNAs were clustered into A, B, C, and D groups in our heatmap according to different follicle sizes. These results allowed for the targeting of potential mRNA genes related to porcine oocyte development. Third, we chose ten significantly expressed miRNAs and predicted their target genes for further GO analysis. These results showed that the expression levels of neurotransmitter secretion genes were greatly changed, as were many target genes involved in the regulation of FSH secretion. Notably, these are genes that are very closely related to oocyte maturation in growing follicles. We then used pathway analysis for these targeted genes based on the originally selected ten miRNAs. Results indicated that the pathways were mainly related to the biosynthesis of TGF-beta and its signaling pathway, which are very closely related to reproductive system functions. Finally, these exosomal miRNAs obtained from PFF may provide a valuable addition to our understanding of the mechanism of porcine oocyte maturation. It is also likely that these exosomal miRNAs could function as molecular biomarkers to choose high-quality oocytes and allow for in vitro porcine embryo production.

