Faculty Opinions recommendation of Estimation of nucleotide diversity, disequilibrium coefficients, and mutation rates from high-coverage genome-sequencing projects.

Abstract Barley (Hordeum vulgare L.) is one of the first domesticated grain crops and the fourth most important cereal source for human and animal consumption. BarleyVarDB is a database of barley genomic variation, publicly accessible at http://146.118.64.11/BarleyVar. The database provides three main sets of information. First, it contains 57 754 224 single nucleotide polymorphisms (SNPs) and 3 600 663 insertions or deletions (InDels), identified from high-coverage whole genome sequencing of 21 barley germplasm accessions: 8 wild barley accessions from 3 centers of barley evolutionary origin and 13 barley landraces from different continents. Second, it uses the latest publicly accessible barley reference genome and its annotation, produced by the International Barley Genome Sequencing Consortium (IBSC). Third, it includes 522 212 genome-wide microsatellites/simple sequence repeats (SSRs), identified in the reference barley pseudomolecule genome sequence. Additionally, several useful web-based applications are provided, including JBrowse, BLAST and Primer3. Users can design PCR primers to assess polymorphic variants deposited in the database and can access the barley reference genome through a user-friendly interface. We envisage that BarleyVarDB will benefit the barley genetic research community by providing access to all publicly available barley genomic variation information and the barley reference genome, together with an ultra-high density of SNP and InDel markers for molecular breeding and for identifying functional genes underlying important agronomic traits in barley. Database URL: http://146.118.64.11/BarleyVar

Integrated Addressable Dynamic Droplet Array (aDDA) as Sub‐Nanoliter Reactors for High‐Coverage Genome Sequencing of Single Yeast Cells

Small ◽

10.1002/smll.202100325 ◽

2021 ◽

pp. 2100325

Author(s):

Chunyu Li ◽

Yanhai Gong ◽

Xixian Wang ◽

Jian Xu ◽

Bo Ma

Keyword(s):

Genome Sequencing ◽

Yeast Cells ◽

High Coverage ◽

Droplet Array

Progress in plant genome sequencing: research directions

Vavilov Journal of Genetics and Breeding ◽

10.18699/vj19.459 ◽

2019 ◽

Vol 23 (1) ◽

pp. 38-48 ◽

Cited By ~ 1

Author(s):

M. K. Bragina ◽

D. A. Afonnikov ◽

E. A. Salina

Keyword(s):

Genome Sequencing ◽

Plant Traits ◽

Plant Genome ◽

Targeted Sequencing ◽

Genome Sequences ◽

Crop Species ◽

High Coverage ◽

Protein Coding ◽

Sequencing Technologies ◽

A Genome

Since the first plant genome, that of Arabidopsis thaliana, was sequenced and published, genome sequencing technologies have undergone significant changes. New algorithms, sequencing technologies and bioinformatic approaches have been adopted to obtain genome, transcriptome and exome sequences for model and crop species, which has permitted deep inferences into plant biology. As a result of improved genome assembly and analysis methods, genome sequencing costs have plummeted and the number of high-quality plant genome sequences is constantly growing. Consequently, more than 300 plant genome sequences have been published over the past twenty years. Although many of the published genomes are considered incomplete, they have proved to be a valuable tool for identifying genes involved in the formation of economically valuable plant traits, for marker-assisted and genomic selection, and for comparative analysis of plant genomes to determine the basic patterns of origin of various plant species. Since high coverage and resolution of a genome sequence are not enough to detect all changes in complex samples, targeted sequencing, which consists of isolating and sequencing a specific region of the genome, has begun to develop. Targeted sequencing has a higher detection power (the ability to identify new differences/variants) and resolution (down to a single base). In addition, exome sequencing (sequencing only the protein-coding regions of genes) is being actively developed; it allows the sequencing of non-expressed alleles and genes that cannot be found with RNA-seq. This review analyzes the development of sequencing technologies and the construction of "reference" plant genomes, and compares targeted sequencing methods that rely on a reference DNA sequence.

Comprehensive population-based genome sequencing provides insight into hematopoietic regulatory mechanisms

10.1101/067934 ◽

2016 ◽

Author(s):

Michael Guo ◽

Satish K. Nandakumar ◽

Jacob C. Ulirsch ◽

Seyedeh Maryam Zekavat ◽

Jason D. Buenrostro ◽

...

Keyword(s):

Blood Cell ◽

Genome Sequencing ◽

Association Studies ◽

Population Based ◽

Regulatory Mechanisms ◽

Data Sets ◽

Lineage Specification ◽

High Coverage ◽

Hematopoietic Stem ◽

Stem And Progenitor Cells

Abstract Genetic variants affecting hematopoiesis can influence commonly measured blood cell traits. To identify factors that affect hematopoiesis, we performed association studies for blood cell traits in the population-based Estonian Biobank using high coverage whole genome sequencing (WGS) in 2,284 samples and SNP genotyping in an additional ~17,000 samples. Our analyses identified 17 associations across 14 blood cell traits. Integration of WGS-based fine-mapping and complementary epigenomic data sets provided evidence for causal mechanisms at several loci, including at a novel basophil count-associated locus near the master hematopoietic transcription factor CEBPA. The fine-mapped variant at this basophil count association near CEBPA overlapped an enhancer active in common myeloid progenitors and influenced its activity. In situ perturbation of this enhancer by CRISPR/Cas9 mutagenesis in hematopoietic stem and progenitor cells demonstrated that it is necessary for and specifically regulates CEBPA expression during basophil differentiation. We additionally identified basophil count-associated variation at another more pleiotropic myeloid enhancer near GATA2, highlighting regulatory mechanisms for ordered expression of master hematopoietic regulators during lineage specification. Our study illustrates how population-based genetic studies can provide key insights into poorly understood cell differentiation processes of considerable physiologic relevance.

High coverage whole genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios

10.1101/2021.02.06.430068 ◽

2021 ◽

Cited By ~ 4

Author(s):

Marta Byrska-Bishop ◽

Uday S. Evani ◽

Xuefang Zhao ◽

Anna O. Basile ◽

Haley J. Abel ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Sequence Data ◽

Whole Genome ◽

1000 Genomes Project ◽

Phase 3 ◽

High Coverage ◽

Entire Cohort ◽

1000 Genomes ◽

Low Coverage

Abstract The 1000 Genomes Project (1kGP), launched in 2008, is the largest fully open resource of whole genome sequencing (WGS) data consented for public distribution of raw sequence data without access or use restrictions. The final (phase 3) 2015 release of 1kGP included 2,504 unrelated samples from 26 populations, representing five continental regions of the world and was based on a combination of technologies including low coverage WGS (mean depth 7.4X), high coverage whole exome sequencing (mean depth 65.7X), and microarray genotyping. Here, we present a new, high coverage WGS resource encompassing the original 2,504 1kGP samples, as well as an additional 698 related samples that result in 602 complete trios in the 1kGP cohort. We sequenced this expanded 1kGP cohort of 3,202 samples to a targeted depth of 30X using Illumina NovaSeq 6000 instruments. We performed SNV/INDEL calling against the GRCh38 reference using GATK's HaplotypeCaller, and generated a comprehensive set of SVs by integrating multiple analytic methods through a sophisticated machine learning model, upgrading the 1kGP dataset to current state-of-the-art standards. Using this strategy, we defined over 111 million SNVs, 14 million INDELs, and ∼170 thousand SVs across the entire cohort of 3,202 samples with estimated false discovery rate (FDR) of 0.3%, 1.0%, and 1.8%, respectively. By comparison to the low-coverage phase 3 callset, we observed substantial improvements in variant discovery and estimated FDR that were facilitated by high coverage re-sequencing and expansion of the cohort. Specifically, we called 7% more SNVs, 59% more INDELs, and 170% more SVs per genome than the phase 3 callset. Moreover, we leveraged the presence of families in the cohort to achieve superior haplotype phasing accuracy and we demonstrate improvements that the high coverage panel brings especially for INDEL imputation. We make all the data generated as part of this project publicly available and we envision this updated version of the 1kGP callset to become the new de facto public resource for the worldwide scientific community working on genomics and genetics.
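
The cohort figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and the numbers are copied from the abstract rather than taken from the released data.

```python
# Figures copied from the abstract above; all names here are illustrative.
ORIGINAL_SAMPLES = 2504   # unrelated phase 3 samples
ADDED_SAMPLES = 698       # related samples added in the expansion
REPORTED_TOTAL = 3202     # cohort size stated in the abstract


def expanded_cohort_size(original: int, added: int) -> int:
    """Return the total number of samples after the expansion."""
    return original + added


def added_fraction(original: int, added: int) -> float:
    """Return the share of the expanded cohort made up of added samples."""
    return added / expanded_cohort_size(original, added)


if __name__ == "__main__":
    total = expanded_cohort_size(ORIGINAL_SAMPLES, ADDED_SAMPLES)
    # The sum should match the cohort size the abstract reports.
    print(total == REPORTED_TOTAL)
    print(f"{added_fraction(ORIGINAL_SAMPLES, ADDED_SAMPLES):.1%}")
```

Under these figures, the added related samples make up roughly a fifth of the expanded cohort.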

GESS: a database of global evaluation of SARS-CoV-2/hCoV-19 sequences

Nucleic Acids Research ◽

10.1093/nar/gkaa808 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D706-D714 ◽

Cited By ~ 2

Author(s):

Shuyi Fang ◽

Kailing Li ◽

Jikui Shen ◽

Sheng Liu ◽

Juli Liu ◽

...

Keyword(s):

Genomic Region ◽

Comprehensive Analysis ◽

Mutation Rates ◽

Global Evaluation ◽

Single Nucleotide Variants ◽

High Coverage ◽

Single Nucleotide ◽

Genomic Variations ◽

Area Of Interest ◽

The World

Abstract The COVID-19 outbreak has become a global emergency since December 2019. Analysis of SARS-CoV-2 sequences can uncover single nucleotide variants (SNVs) and corresponding evolution patterns. The Global Evaluation of SARS-CoV-2/hCoV-19 Sequences (GESS, https://wan-bioinfo.shinyapps.io/GESS/) is a resource that provides comprehensive analysis results based on tens of thousands of high-coverage and high-quality SARS-CoV-2 complete genomes. The database allows users to browse, search and download SNVs at any individual or multiple SARS-CoV-2 genomic positions, within a chosen genomic region or protein, or in a certain country/area of interest. GESS reveals geographical distributions of SNVs around the world and across the states of the USA, while exhibiting time-dependent patterns of SNV occurrence that reflect the development of SARS-CoV-2 genomes. For each month, the top 100 SNVs first identified worldwide can be retrieved. GESS also explores SNVs that co-occur with specific SNVs of interest to the user. Furthermore, the database can be of great help in calibrating mutation rates and identifying conserved genome regions. Taken together, GESS is a powerful resource and tool to monitor SARS-CoV-2 migration and evolution according to featured genomic variations. It provides potentially directive information for prevalence prediction, related public health policy making, and vaccine design.

Direct estimation of de novo mutation rates in a chimpanzee parent-offspring trio by ultra-deep whole genome sequencing

Scientific Reports ◽

10.1038/s41598-017-13919-7 ◽

2017 ◽

Vol 7 (1) ◽

Cited By ~ 16

Author(s):

Shoji Tatsumoto ◽

Yasuhiro Go ◽

Kentaro Fukuta ◽

Hideki Noguchi ◽

Takashi Hayakawa ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

De Novo ◽

De Novo Mutation ◽

Mutation Rates ◽

Whole Genome ◽

Direct Estimation

Deep Sequencing of 10,000 Human Genomes

10.1101/061663 ◽

2016 ◽

Cited By ~ 5

Author(s):

Amalio Telenti ◽

Levi C.T. Pierce ◽

William H. Biggs ◽

Julia di Iulio ◽

Emily H.M. Wong ◽

...

Keyword(s):

Genome Sequencing ◽

Large Scale ◽

Confidence Region ◽

Clinical Use ◽

Single Nucleotide Variants ◽

High Coverage ◽

Human Genomes ◽

Exon Sequence ◽

Novel Variants ◽

Novel Variant

Abstract We report on the sequencing of 10,545 human genomes at 30-40x coverage with an emphasis on quality metrics and novel variant and sequence discovery. We find that 84% of an individual human genome can be sequenced confidently. This high confidence region includes 91.5% of exon sequence and 95.2% of known pathogenic variant positions. We present the distribution of over 150 million single nucleotide variants in the coding and non-coding genome. Each newly sequenced genome contributes an average of 8,579 novel variants. In addition, each genome carries on average 0.7 Mb of sequence that is not found in the main build of the hg38 reference genome. The density of this catalog of variation allowed us to construct high-resolution profiles that define genomic sites that are highly intolerant of genetic variation. These results indicate that the data generated by deep genome sequencing is of the quality necessary for clinical use.

Significance statement Declining sequencing costs and new large-scale initiatives towards personalized medicine are driving a massive expansion in the number of human genomes being sequenced. Therefore, there is an urgent need to define quality standards for clinical use. This includes deep coverage and sequencing accuracy of an individual's genome, rather than aggregated coverage of data across a cohort or population. Our work represents the largest effort to date in sequencing human genomes at deep coverage with these new standards. This study identifies over 150 million human variants, a majority of them rare and unknown. Moreover, these data identify sites in the genome that are highly intolerant to variation, possibly essential for life or health. We conclude that high coverage genome sequencing provides accurate detail on human variation for discovery and for clinical applications.
