BnaGVD: A genomic variation database of rapeseed (Brassica napus)

Author(s):  
Tao Yan ◽  
Yao Yao ◽  
Dezhi Wu ◽  
Lixi Jiang

Abstract Rapeseed (Brassica napus L.) is a typical polyploid crop and one of the most important oilseed crops worldwide. With rapid progress in high-throughput sequencing technologies and falling sequencing costs, large-scale genomic data for specific crops have become available. However, raw sequence data are mostly deposited in the Sequence Read Archive of the National Center for Biotechnology Information (NCBI) and in the European Nucleotide Archive (ENA), both of which are freely accessible to all researchers. Practical tools are needed to use these large raw datasets efficiently. Here, we report a web-based rapeseed genomic variation database (BnaGVD, http://rapeseed.biocloud.net/home) in which genomic variations, such as single nucleotide polymorphisms (SNPs) and insertions/deletions (InDels), across a worldwide collection of rapeseed accessions can be looked up. The current release of BnaGVD contains 34,591,899 high-quality SNPs and 12,281,923 high-quality InDels and provides search tools to retrieve genomic variations and gene annotations across 1,007 accessions of worldwide rapeseed germplasm. We implement a variety of built-in tools (e.g., BnaGWAS, BnaPCA, and BnaStructure) to help users perform in-depth analyses. We recommend this web resource for accelerating studies on functional genomics and the screening of molecular markers for rapeseed breeding.

GigaScience ◽  
2020 ◽  
Vol 9 (6) ◽  
Author(s):  
Wenxi Wang ◽  
Zihao Wang ◽  
Xintong Li ◽  
Zhongfu Ni ◽  
Zhaorong Hu ◽  
...  

Abstract Background The cost of high-throughput sequencing is rapidly decreasing, allowing researchers to investigate genomic variations across hundreds or even thousands of samples in the post-genomic era. The management and exploration of these large-scale genomic variation data require programming skills. The public genotype querying databases of many species are usually centralized and implemented independently, making them difficult to update with new data over time. Currently, there is a lack of a widely used framework for setting up user-friendly web servers to explore new genomic variation data in diverse species. Results Here, we present SnpHub, a Shiny/R-based server framework for retrieving, analysing, and visualizing large-scale genomic variation data that can be easily set up on any Linux server. After a pre-building process based on the provided VCF files and genome annotation files, the local server allows users to interactively access single-nucleotide polymorphisms and small insertions/deletions with annotation information by locus or gene and to define sample sets through a web page. Users can freely analyse and visualize genomic variations in heatmaps, phylogenetic trees, haplotype networks, or geographical maps. Sample-specific sequences can be accessed as replaced by detected sequence variations. Conclusions SnpHub can be applied to any species, and we build up a SnpHub portal website for wheat and its progenitors based on published data in recent studies. SnpHub and its tutorial are available at http://guoweilong.github.io/SnpHub/. The wheat-SnpHub-portal website can be accessed at http://wheat.cau.edu.cn/Wheat_SnpHub_Portal/.
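
The locus-based access to SNPs and small InDels described above can be illustrated with a minimal sketch. This is not SnpHub code (SnpHub is Shiny/R-based). It assumes plain tab-separated VCF text lines, and the names `Variant`, `classify`, `query_region`, and `TOY_VCF` are hypothetical, introduced only for this example.

```python
from typing import Iterable, Iterator, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position, as in VCF
    ref: str
    alt: str
    kind: str  # "SNP", "InDel", or "other"


def classify(ref: str, alt: str) -> str:
    """Label one REF/ALT pair by comparing allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "InDel"
    return "other"  # e.g. multi-nucleotide substitutions


def query_region(vcf_lines: Iterable[str], chrom: str,
                 start: int, end: int) -> Iterator[Variant]:
    """Yield variants on `chrom` with start <= POS <= end (inclusive)."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        c, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        if c != chrom or not (start <= pos <= end):
            continue
        for alt in alts.split(","):  # one result per ALT allele
            yield Variant(c, pos, ref, alt, classify(ref, alt))


# Hypothetical toy records, not real data.
TOY_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t250\t.\tAT\tA\t50\tPASS\t.",
    "chr2\t120\t.\tC\tT\t50\tPASS\t.",
]

hits = list(query_region(TOY_VCF, "chr1", 1, 200))
```

A linear scan like this only suits small files; a server handling thousands of samples would presumably rely on an indexed, pre-built representation, which is what the pre-building step in the abstract suggests.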


2019 ◽  
Author(s):  
Wenxi Wang ◽  
Zihao Wang ◽  
Xintong Li ◽  
Zhongfu Ni ◽  
Zhaorong Hu ◽  
...  

Abstract Background The cost of high-throughput sequencing is rapidly decreasing, allowing researchers to investigate genomic variations across hundreds or even thousands of samples in the post-genomic era. The management and exploration of these large-scale genomic variation data require programming skills. The public genotype querying databases of many species are usually centralized and implemented independently, making them difficult to update with new data over time. Currently, there is a lack of a widely used framework for setting up user-friendly web servers for exploring new genomic variation data in diverse species. Results Here, we present SnpHub, a Shiny/R-based server framework for retrieving, analysing and visualizing large-scale genomic variation data that can be easily set up on any Linux server. After a pre-building process based on the provided VCF files and genome annotation files, the local server allows users to interactively access SNPs/INDELs and annotation information by locus or gene, and for user-defined sample sets, through a webpage. Users can freely analyse and visualize genomic variations in heatmaps, phylogenetic trees, haplotype networks, or geographical maps. Sample-specific sequences can be accessed with SNPs/INDELs substituted in. Conclusions SnpHub can be applied to any species, and we build up a SnpHub portal website for wheat and its progenitors based on published data in recent studies. SnpHub and its tutorial are available at http://guoweilong.github.io/SnpHub/.


Agronomy ◽  
2020 ◽  
Vol 10 (12) ◽  
pp. 2006
Author(s):  
David P. Horvath ◽  
Michael Stamm ◽  
Zahirul I. Talukder ◽  
Jason Fiedler ◽  
Aidan P. Horvath ◽  
...  

A diverse population (429 members) of canola (Brassica napus L.) consisting primarily of winter biotypes was assembled and used in genome-wide association studies. Genotyping-by-sequencing analysis of the population identified and mapped 290,972 high-quality markers, with missing markers per line ranging from 18.5 to 82.4% and averaging 36.8%. After interpolation, 251,575 high-quality markers remained. After filtering for markers with low minor allele counts (count > 5), we were left with 190,375 markers. The average distance between these markers is 4463 bases with a median of 69 and a range from 1 to 281,248 bases. The heterozygosity among the imputed population ranges from 0.9 to 11.0% with an average of 5.4%. The filtered and imputed dataset was used to determine population structure and kinship, which indicated that the population had minimal structure with the best K value of 2–3. These results also indicated that the majority of the population has substantial sequence from a single population with sub-clusters of, and admixtures with, a very small number of other populations. Chromosomal linkage disequilibrium decay ranged from ~7 kb for chromosome A01 to ~68 kb for chromosome C01. Local linkage decay rates determined for all 500 kb windows with a 10 kb sliding step showed a wide range of linkage disequilibrium decay rates, indicating numerous crossover hotspots within this population, and provide a resource for determining the likely limits of linkage disequilibrium from any given marker in which to identify candidate genes. This population and the resources provided here should serve as helpful tools for investigating genetics in winter canola.
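
The marker statistics in this abstract (minor allele counts, spacing between adjacent markers, and 500 kb windows advanced in 10 kb steps) can be sketched as follows. This is an illustrative sketch, not the authors' pipeline. It assumes diploid genotypes coded as 0/1/2 copies of the alternate allele with `None` for missing calls, and reads the "count > 5" filter as keeping markers whose minor allele count exceeds 5. All function names are hypothetical.

```python
from statistics import mean, median


def minor_allele_count(genotypes):
    """Count minor-allele copies; genotypes are 0/1/2 ALT dosages, None = missing."""
    called = [g for g in genotypes if g is not None]
    alt = sum(called)
    ref = 2 * len(called) - alt  # diploid: two alleles per called sample
    return min(alt, ref)


def marker_spacing(positions):
    """Return (mean, median) distance between adjacent markers on one chromosome."""
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return mean(gaps), median(gaps)


def sliding_windows(chrom_length, size=500_000, step=10_000):
    """Yield half-open (start, end) windows of `size` bp advanced by `step` bp."""
    start = 0
    while start < chrom_length:
        yield start, min(start + size, chrom_length)
        if start + size >= chrom_length:
            break  # the last window already reaches the chromosome end
        start += step
```

Under the stated reading of the filter, a marker would be kept when `minor_allele_count(genotypes) > 5`. A large gap between mean and median spacing, as reported in the abstract (4463 versus 69 bases), is what one would expect when markers cluster tightly with occasional long gaps.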


2005 ◽  
Vol 818 (1) ◽  
pp. 35-42 ◽  
Author(s):  
S. Bérot ◽  
J.P. Compoint ◽  
C. Larré ◽  
C. Malabat ◽  
J. Guéguen

2009 ◽  
Vol 191 (10) ◽  
pp. 3203-3211 ◽  
Author(s):  
Karla D. Passalacqua ◽  
Anjana Varadarajan ◽  
Brian D. Ondov ◽  
David T. Okou ◽  
Michael E. Zwick ◽  
...  

ABSTRACT Although gene expression has been studied in bacteria for decades, many aspects of the bacterial transcriptome remain poorly understood. Transcript structure, operon linkages, and information on absolute abundance all provide valuable insights into gene function and regulation, but none has ever been determined on a genome-wide scale for any bacterium. Indeed, these aspects of the prokaryotic transcriptome have been explored on a large scale in only a few instances, and consequently little is known about the absolute composition of the mRNA population within a bacterial cell. Here we report the use of a high-throughput sequencing-based approach in assembling the first comprehensive, single-nucleotide resolution view of a bacterial transcriptome. We sampled the Bacillus anthracis transcriptome under a variety of growth conditions and showed that the data provide an accurate and high-resolution map of transcript start sites and operon structure throughout the genome. Further, the sequence data identified previously nonannotated regions with significant transcriptional activity and enhanced the accuracy of existing genome annotations. Finally, our data provide estimates of absolute transcript abundance and suggest that there is significant transcriptional heterogeneity within a clonal, synchronized bacterial population. Overall, our results offer an unprecedented view of gene expression and regulation in a bacterial cell.


2020 ◽  
Author(s):  
Yang Young Lu ◽  
Jiaxing Bai ◽  
Yiwen Wang ◽  
Ying Wang ◽  
Fengzhu Sun

Abstract Motivation Rapid developments in sequencing technologies have boosted the generation of high volumes of sequence data. To archive and analyze those data, one primary step is sequence comparison. Alignment-free sequence comparison based on k-mer frequencies offers a computationally efficient solution, yet in practice, the k-mer frequency vectors for large k of practical interest lead to excessive memory and storage consumption. Results We report CRAFT, a general genomic/metagenomic search engine to learn compact representations of sequences and perform fast comparison between DNA sequences. Specifically, given genome or high throughput sequencing (HTS) data as input, CRAFT maps the data into a much smaller embedding space and locates the best matching genome in the archived massive sequence repositories. With 10^2–10^4-fold reduction of storage space, CRAFT performs fast query for gigabytes of data within seconds or minutes, achieving performance comparable to six state-of-the-art alignment-free measures. Availability CRAFT offers a user-friendly graphical user interface with one-click installation on Windows and Linux operating systems, freely available at https://github.com/jiaxingbai/CRAFT. Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 50 (1) ◽  
pp. 1-6
Author(s):  
Muhammad Sajjad Iqbal ◽  
Muhammad Akbar

Thirteen elite lines of Brassica napus L. were tested. Nine phenotypic traits, viz., days to 50% flowering, days to maturity, plant height, branches, pods, pod length, pod width, pod weight and seed yield, were studied. ANOVA revealed significant results for all the traits, while summary statistics exhibited a high level of genetic variability in days to 50% flowering, days to maturity, plant height and number of pods. On the other hand, number of branches, pod length, pod width, pod weight and seed yield need more attention for improvement. Correlation coefficients revealed significant associations among various traits which could be utilized directly. Cluster analysis based on linkage distances grouped the lines into three clusters according to their dissimilarities. Elite line 24866, of Pakistani origin, was placed prominently in a separate cluster owing to its best performance for most traits, and is hence recommended for large-scale cultivation in farmers' fields.


2018 ◽  
Author(s):  
Joshua B Singer ◽  
Emma C Thomson ◽  
John McLauchlan ◽  
Joseph Hughes ◽  
Robert J Gifford

Abstract Background Virus genome sequences, generated in ever-higher volumes, can provide new scientific insights and inform our responses to epidemics and outbreaks. To facilitate interpretation, such data must be organised and processed within scalable computing resources that encapsulate virology expertise. GLUE (Genes Linked by Underlying Evolution) is a data-centric bioinformatics environment for building such resources. The GLUE core data schema organises sequence data along evolutionary lines, capturing not only nucleotide data but associated items such as alignments, genotype definitions, genome annotations and motifs. Its flexible design emphasises applicability to different viruses and to diverse needs within research, clinical or public health contexts. Results HCV-GLUE is a case study GLUE resource for hepatitis C virus (HCV). It includes an interactive public web application providing sequence analysis in the form of a maximum-likelihood-based genotyping method, antiviral resistance detection and graphical sequence visualisation. HCV sequence data from GenBank is categorised and stored in a large-scale sequence alignment which is accessible via web-based queries. Whereas this web resource provides a range of basic functionality, the underlying GLUE project can also be downloaded and extended by bioinformaticians addressing more advanced questions. Conclusion GLUE can be used to rapidly develop virus sequence data resources with public health, research and clinical applications. This streamlined approach, with its focus on reuse, will help realise the full value of virus sequence data.


Author(s):  
Yang Young Lu ◽  
Jiaxing Bai ◽  
Yiwen Wang ◽  
Ying Wang ◽  
Fengzhu Sun

Abstract Motivation Rapid developments in sequencing technologies have boosted generating high volumes of sequence data. To archive and analyze those data, one primary step is sequence comparison. Alignment-free sequence comparison based on k-mer frequencies offers a computationally efficient solution, yet in practice, the k-mer frequency vectors for large k of practical interest lead to excessive memory and storage consumption. Results We report CRAFT, a general genomic/metagenomic search engine to learn compact representations of sequences and perform fast comparison between DNA sequences. Specifically, given genome or high throughput sequencing data as input, CRAFT maps the data into a much smaller embedding space and locates the best matching genome in the archived massive sequence repositories. With 10^2–10^4-fold reduction of storage space, CRAFT performs fast query for gigabytes of data within seconds or minutes, achieving comparable performance as six state-of-the-art alignment-free measures. Availability and implementation CRAFT offers a user-friendly graphical user interface with one-click installation on Windows and Linux operating systems, freely available at https://github.com/jiaxingbai/CRAFT. Supplementary information Supplementary data are available at Bioinformatics online.
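
The k-mer frequency comparison that this abstract takes as its starting point can be sketched in a few lines. This is not CRAFT's learned embedding. It shows only the baseline alignment-free comparison, and why dense frequency vectors grow quickly with k. The function names are hypothetical, and cosine dissimilarity is used as one simple example measure, not necessarily one of the six the authors benchmark against.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def cosine_dissimilarity(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity of two sparse k-mer count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("sequence shorter than k or without valid k-mers")
    return 1.0 - dot / (norm_a * norm_b)


def dense_vector_length(k: int) -> int:
    """Number of possible DNA k-mers, i.e. the dense frequency-vector dimension."""
    return 4 ** k
```

Because the dense vector has 4^k entries, each increase of k by one quadruples its size, which is the memory and storage pressure the abstract refers to. Sparse counters like the ones above avoid storing zeros but still grow with sequence length.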


Author(s):  
Taylor Reiter ◽  
Phillip T. Brooks ◽  
Luiz Irber ◽  
Shannon E.K. Joslin ◽  
Charles M. Reid ◽  
...  

Abstract As the scale of biological data generation has increased, the bottleneck of research has shifted from data generation to analysis. Researchers commonly need to build computational workflows that include multiple analytic tools and require incremental development as experimental insights demand tool and parameter modifications. These workflows can produce hundreds to thousands of intermediate files and results that must be integrated for biological insight. Data-centric workflow systems that internally manage computational resources, software, and conditional execution of analysis steps are reshaping the landscape of biological data analysis, and empowering researchers to conduct reproducible analyses at scale. Adoption of these tools can facilitate and expedite robust data analysis, but knowledge of these techniques is still lacking. Here, we provide a series of practices and strategies for leveraging workflow systems with structured project, data, and resource management to streamline large-scale biological analysis. We present these strategies in the context of high-throughput sequencing data analysis, but the principles are broadly applicable to biologists working beyond this field. Author Summary We present a guide for workflow-enabled biological sequence data analysis, developed through our own teaching, training and analysis projects. We recognize that this is based on our own use cases and experiences, but we hope that our guide will contribute to a larger discussion within the open source and open science communities and lead to more comprehensive resources. Our main goal is to accelerate the research of scientists conducting sequence analyses by introducing them to organized workflow practices that not only benefit their own research but also facilitate open and reproducible science.

