PopAmaranth: A population genetic genome browser for grain amaranths and their wild relatives

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab103 ◽

2021 ◽

Author(s):

José Gonçalves-Dias ◽

Markus G Stetter

Keyword(s):

Interdisciplinary Research ◽

Population Genetic ◽

Genetic Research ◽

Wild Relatives ◽

Genome Browser ◽

Reference Sequence ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

Grain Amaranth ◽

Amaranth Species

Abstract The combination of genomic, physiological, and population genetic research has accelerated the understanding and improvement of numerous crops. For non-model crops the lack of interdisciplinary research hinders their improvement. Grain amaranth is an ancient nutritious pseudocereal that has been domesticated three times in different regions of the Americas. We present and employ PopAmaranth, a population genetic genome browser, which provides an accessible representation of the genetic variation of the three grain amaranth species (A. hypochondriacus, A. cruentus, and A. caudatus) and two wild relatives (A. hybridus and A. quitensis) along the A. hypochondriacus reference sequence. We performed population-scale diversity and selection analysis from whole-genome sequencing data of 88 curated genetically and taxonomically unambiguously classified accessions. We employ the platform to show that genetic diversity in the water stress-related MIF1 gene declined during amaranth domestication and provide evidence for convergent saponin reduction between amaranth and quinoa. PopAmaranth is available through amaranthGDB at amaranthgdb.org/popamaranth.html.
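
The abstract does not say which diversity statistics the browser reports. As an illustration of what a population-scale diversity analysis involves, the following minimal sketch computes nucleotide diversity (π), one widely used summary statistic, from allele counts at biallelic sites. The function names and the toy data are hypothetical and are not taken from PopAmaranth.

```python
from typing import Sequence, Tuple


def site_pi(alt_count: int, n_chrom: int) -> float:
    """Unbiased nucleotide diversity at one biallelic site.

    alt_count: number of sampled chromosomes carrying the alternate allele.
    n_chrom:   total number of sampled chromosomes at the site.
    """
    if n_chrom < 2:
        raise ValueError("need at least two sampled chromosomes")
    return 2.0 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def window_pi(sites: Sequence[Tuple[int, int]], window_length: int) -> float:
    """Per-base-pair diversity in a window.

    `sites` holds (alt_count, n_chrom) pairs for the variant sites only;
    invariant positions contribute zero, so the sum is divided by the
    full window length in base pairs.
    """
    return sum(site_pi(a, n) for a, n in sites) / window_length


# Hypothetical toy data: three sites in a 1,000 bp window, each typed in
# 4 chromosomes (for example, 2 diploid accessions).
toy_sites = [(2, 4), (1, 4), (0, 4)]
print(window_pi(toy_sites, 1000))
```

Comparing such a windowed value between a wild and a domesticated population is one simple way to look for a decline in diversity around a candidate gene.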

PopAmaranth: A population genetic genome browser for grain amaranths and their wild relatives

10.1101/2020.12.09.415331 ◽

2020 ◽

Author(s):

José Gonçalves-Dias ◽

Markus G Stetter

Keyword(s):

Genetic Diversity ◽

Genome Sequencing ◽

Interdisciplinary Research ◽

Genetic Information ◽

Population Genetic ◽

Wild Relatives ◽

Genome Browser ◽

Reference Sequence ◽

Grain Amaranth ◽

Amaranth Species

The last decades of genomic, physiological, and population genetic research have accelerated the understanding and improvement of numerous crops. The transfer of methods to minor crops could accelerate their improvement if knowledge is effectively shared between disciplines. Grain amaranth is an ancient nutritious pseudocereal from the Americas that is regaining importance due to its high protein content and favorable amino acid and micronutrient composition. To effectively combine genomic and population genetic information with molecular genetics and plant physiology, and to use it for interdisciplinary research and crop improvement, an intuitive interaction for scientists across disciplines is essential. Here, we present PopAmaranth, a population genetic genome browser, which provides an accessible representation of the genetic variation of the three grain amaranth species (A. hypochondriacus, A. cruentus, and A. caudatus) and two wild relatives (A. hybridus and A. quitensis) along the A. hypochondriacus reference sequence. We performed population-scale diversity and selection analysis from whole-genome sequencing data of 88 curated genetically and taxonomically unambiguously classified accessions. We incorporate the domestication history of the three grain amaranths to make an evolutionary perspective on candidate genes and regions available. We employ the platform to show that genetic diversity in the water stress-related MIF1 gene declined during amaranth domestication and provide evidence for convergent saponin reduction between amaranth and quinoa. These examples show that our tool enables the detailed study of individual genes, provides target regions for breeding efforts, and can enhance the interdisciplinary integration of population genomic findings across species.
PopAmaranth is available through amaranthGDB at amaranthgdb.org/popamaranth.html.

Significance Sharing population genetic results between disciplines can facilitate interdisciplinary research and accelerate the improvement of crops. Since the onset of genome sequencing, online genome browser platforms have provided access to features of an organism's genetic information. This has rarely been extended to population-wide summary statistics for evolutionary hypothesis testing. We implemented a population genetic genome browser, PopAmaranth, for three grain amaranth species and their two wild relatives. The intuitive and user-friendly interface of PopAmaranth makes the genetic diversity of the species complex available to a broad audience of biologists across disciplines. We show how our tool can be used to study convergence across distant genera and find signals of past selection in domestication and stress-related genes. Community platforms and genome browsers are an integrative element of numerous study systems. PopAmaranth can serve as a template for other research communities to integrate and share their results.

Harnessing the 100,000 Genomes Project whole genome sequencing data - an unbiased systematic tool to filter by biologically validated regions of functionality

10.1101/2020.03.30.20047209 ◽

2020 ◽

Author(s):

Sihao Xiao ◽

Zhentian Kai ◽

David Brown ◽

Claire L Shovlin ◽

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Clinical Diagnostics ◽

Reference Sequence ◽

Whole Genome Sequencing Data ◽

Disease Genes ◽

Full Potential ◽

Hereditary Haemorrhagic Telangiectasia ◽

Whole Genome ◽

Sequencing Data

SUMMARY Whole genome sequencing (WGS) is championed by the UK National Health Service (NHS) to identify genetic variants that cause particular diseases. The full potential of WGS has yet to be realised as early data analytic steps prioritise protein-coding genes, and effectively ignore the less well annotated non-coding genome which is rich in transcribed and critical regulatory regions. To address this, we developed a filter, which we call GROFFFY, and validated it in WGS data from hereditary haemorrhagic telangiectasia patients within the 100,000 Genomes Project. Before filter application, the mean number of DNA variants compared to human reference sequence GRCh38 was 4,867,167 (range 4,786,039-5,070,340), and one-third lay within intergenic areas. GROFFFY removed a mean of 2,812,015 variants per DNA. In combination with allele frequency and other filters, GROFFFY enabled a 99.56% reduction in variant number. The proportion of intergenic variants was maintained, and no pathogenic variants in disease genes were lost. We conclude that the filter applied to NHS diagnostic samples in the 100,000 Genomes pipeline offers an efficient method to prioritise intergenic, intronic and coding gDNA variants. Reducing the overwhelming number of variants while retaining functional genome variation of importance to patients enhances the near-term value of WGS in clinical diagnostics.
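
To put the figures quoted in the summary above on a common scale, the short calculation below works out how many variants remain per sample. It is only a back-of-the-envelope sketch: it applies the reported means and the reported overall percentage to the mean starting count, which assumes the reduction behaves the same way for an average sample. The variable names are illustrative.

```python
# Figures quoted in the summary.
MEAN_VARIANTS_BEFORE = 4_867_167    # mean variants per sample vs GRCh38
MEAN_REMOVED_BY_FILTER = 2_812_015  # mean variants removed by the filter alone
OVERALL_REDUCTION = 0.9956          # with allele frequency and other filters

# Variants left after the region filter alone.
remaining_after_filter = MEAN_VARIANTS_BEFORE - MEAN_REMOVED_BY_FILTER

# Share of the starting variants that the region filter removes.
fraction_removed = MEAN_REMOVED_BY_FILTER / MEAN_VARIANTS_BEFORE

# Approximate count left after all filters (an estimate, not a reported value).
approx_final = MEAN_VARIANTS_BEFORE * (1 - OVERALL_REDUCTION)

print(f"left after region filter: {remaining_after_filter:,}")
print(f"removed by region filter: {fraction_removed:.1%}")
print(f"approx. left after all filters: {approx_final:,.0f}")
```

On these numbers the region filter alone removes a little under 58% of variants, and the combined filters leave roughly 21,000 variants per sample.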

rdmc: An Open Source R Package Implementing Convergent Adaptation Models of Lee and Coop (2017)

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401527 ◽

2020 ◽

Vol 10 (9) ◽

pp. 3041-3046

Author(s):

Silas Tittes

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Population Genetic ◽

R Package ◽

Ease Of Use ◽

Software Implementation ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Comprehensive Overview

Abstract The availability of whole genome sequencing data from multiple related populations creates opportunities to test sophisticated population genetic models of convergent adaptation. Recent work by Lee and Coop (2017) developed models to infer modes of convergent adaptation at local genomic scales, providing a rich framework for assessing how selection has acted across multiple populations at the tested locus. Here I present rdmc, an R package that builds on the existing software implementation of Lee and Coop (2017) and prioritizes ease of use, portability, and scalability. I demonstrate installation and provide a comprehensive overview of the package’s current utilities.

Evolutionary Genomics of High Fecundity

Annual Review of Genetics ◽

10.1146/annurev-genet-021920-095932 ◽

2020 ◽

Vol 54 (1) ◽

pp. 213-236

Author(s):

Bjarki Eldon

Keyword(s):

Population Genetics ◽

Population Genetic ◽

Reference Model ◽

Evolutionary Genomics ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

High Fecundity ◽

Large Numbers ◽

Fisher Model ◽

Evolutionary Studies

Natural highly fecund populations abound. These range from viruses to gadids. Many highly fecund populations are economically important. Highly fecund populations provide an important contrast to the low-fecundity organisms that have traditionally been applied in evolutionary studies. A key question regarding high fecundity is whether large numbers of offspring are produced on a regular basis, by few individuals each time, in a sweepstakes mode of reproduction. Such reproduction characteristics are not incorporated into the classical Wright–Fisher model, the standard reference model of population genetics, or similar types of models, in which each individual can produce only small numbers of offspring relative to the population size. The expected genomic footprints of population genetic models of sweepstakes reproduction are very different from those of the Wright–Fisher model. A key, immediate issue involves identifying the footprints of sweepstakes reproduction in genomic data. Whole-genome sequencing data can be used to distinguish the patterns made by sweepstakes reproduction from the patterns made by population growth in a population evolving according to the Wright–Fisher model (or similar models). If the hypothesis of sweepstakes reproduction cannot be rejected, then models of sweepstakes reproduction and associated multiple-merger coalescents will become at least as relevant as the Wright–Fisher model (or similar models) and the Kingman coalescent, the cornerstones of mathematical population genetics, in further discussions of evolutionary genomics of highly fecund populations.
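
The contrast drawn above rests on how offspring numbers are distributed under the Wright–Fisher model. A minimal sketch of one Wright–Fisher generation, assuming a haploid population of constant size in which every offspring picks its parent uniformly at random, shows that no individual leaves more than a handful of offspring. The function name and the seed are illustrative choices.

```python
import random
from collections import Counter
from typing import List


def wright_fisher_offspring(pop_size: int, rng: random.Random) -> List[int]:
    """Offspring numbers for one Wright-Fisher generation.

    Each of the pop_size offspring chooses its parent uniformly at random,
    so the returned counts always sum to pop_size.
    """
    parents = Counter(rng.randrange(pop_size) for _ in range(pop_size))
    return [parents.get(i, 0) for i in range(pop_size)]


rng = random.Random(1)  # fixed seed, chosen arbitrarily for repeatability
counts = wright_fisher_offspring(1000, rng)
print("largest family:", max(counts), "of", sum(counts), "offspring")
```

A sweepstakes model would instead allow a single parent to contribute a substantial fraction of the next generation, which this sampling scheme makes vanishingly unlikely.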

rdmc: an open source R package implementing convergent adaptation models of Lee and Coop (2017)

10.1101/2020.04.22.056150 ◽

2020 ◽

Author(s):

Silas Tittes

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Population Genetic ◽

R Package ◽

Ease Of Use ◽

Software Implementation ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Comprehensive Overview

ABSTRACT The availability of whole genome sequencing data from multiple related populations creates opportunities to test sophisticated population genetic models of convergent adaptation. Recent work by Lee and Coop (2017) developed models to infer modes of convergent adaptation at local genomic scales, providing a rich framework for assessing how selection has acted across multiple populations at the tested locus. Here I present rdmc, an R package that builds on the existing software implementation of Lee and Coop (2017) and prioritizes ease of use, portability, and scalability. I demonstrate installation and provide a comprehensive overview of the package’s current utilities.

From whole genome sequencing data toward a simple genotyping tool: application to the animal pathogen Mycobacterium bovis

10.26226/morressier.56d5ba2ad462b80296c965c0 ◽

2016 ◽

Author(s):

Lorraine Michelet

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Mycobacterium Bovis ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data

Plasmids or no plasmids? A comparison between the Agilent TapeStation and whole-genome sequencing data in a large-scale bacterial sequencing project

10.26226/morressier.56d5ba27d462b80296c95fe7 ◽

2016 ◽

Author(s):

Sarah Alexander

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Sequencing Project

Comparison of long-read sequencing technologies in interrogating bacteria and fly genomes

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab083 ◽

2021 ◽

Author(s):

Eric S Tvedte ◽

Mark Gasser ◽

Benjamin C Sparklin ◽

Jane Michalski ◽

Carl E Hjelmen ◽

...

Keyword(s):

Bacterial Genome ◽

Hybrid Approach ◽

Cost Effective ◽

Fruit Fly ◽

Drosophila Ananassae ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

E Coli ◽

Hybrid Approaches ◽

Long Read

Abstract The newest generation of DNA sequencing technology is highlighted by the ability to generate sequence reads hundreds of kilobases in length. Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) have pioneered competitive long read platforms, with more recent work focused on improving sequencing throughput and per-base accuracy. We used whole-genome sequencing data produced by three PacBio protocols (Sequel II CLR, Sequel II HiFi, RS II) and two ONT protocols (Rapid Sequencing and Ligation Sequencing) to compare assemblies of the bacterium Escherichia coli and the fruit fly Drosophila ananassae. In both organisms tested, Sequel II assemblies had the highest consensus accuracy, even after accounting for differences in sequencing throughput. ONT and PacBio CLR had the longest reads sequenced compared to PacBio RS II and HiFi, and genome contiguity was highest when assembling these datasets. ONT Rapid Sequencing libraries had the fewest chimeric reads in addition to superior quantification of E. coli plasmids versus ligation-based libraries. The quality of assemblies can be enhanced by adopting hybrid approaches using Illumina libraries for bacterial genome assembly or polishing eukaryotic genome assemblies, and an ONT-Illumina hybrid approach would be more cost-effective for many users. Genome-wide DNA methylation could be detected using both technologies; however, ONT libraries enabled the identification of a broader range of known E. coli methyltransferase recognition motifs in addition to undocumented D. ananassae motifs. The ideal choice of long read technology may depend on several factors including the question or hypothesis under examination. No single technology outperformed others in all metrics examined.

A 127 kb truncating deletion of PGRMC1 is a novel cause of X-linked isolated paediatric cataract

European Journal of Human Genetics ◽

10.1038/s41431-021-00889-8 ◽

2021 ◽

Author(s):

Johanna L. Jones ◽

Mark A. Corbett ◽

Elise Yeaman ◽

Duran Zhao ◽

Jozef Gecz ◽

...

Keyword(s):

Intellectual Disability ◽

Protein Interactions ◽

Cholesterol Biosynthesis ◽

Crystalline Lens ◽

Whole Genome Sequencing Data ◽

Mendelian Disease ◽

Cataract Formation ◽

Sequencing Data ◽

Zebrafish Model ◽

Paediatric Cataract

Abstract Inherited paediatric cataract is a rare Mendelian disease that results in visual impairment or blindness due to a clouding of the eye’s crystalline lens. Here we report an Australian family with isolated paediatric cataract, which we had previously mapped to Xq24. Linkage at Xq24–25 (LOD = 2.53) was confirmed, and the region refined with a denser marker map. In addition, two autosomal regions with suggestive evidence of linkage were observed. A segregating 127 kb deletion (chrX:g.118373226_118500408del) in the Xq24–25 linkage region was identified from whole-genome sequencing data. This deletion completely removed a commonly deleted long non-coding RNA gene LOC101928336 and truncated the protein coding progesterone receptor membrane component 1 (PGRMC1) gene following exon 1. A literature search revealed a report of two unrelated males with non-syndromic intellectual disability, as well as congenital cataract, who had contiguous gene deletions that accounted for their intellectual disability but also disrupted the PGRMC1 gene. A morpholino-induced pgrmc1 knockdown in a zebrafish model produced significant cataract formation, supporting a role for PGRMC1 in lens development and cataract formation. We hypothesise that the loss of PGRMC1 causes cataract through disrupted PGRMC1-CYP51A1 protein–protein interactions and altered cholesterol biosynthesis. The cause of paediatric cataract in this family is the truncating deletion of PGRMC1, which we report as a novel cataract gene.
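
The quoted deletion size can be checked against the coordinates given in the abstract. The sketch below assumes the usual HGVS convention that both positions in a `del` range are included in the deleted interval. The regular expression is a simplified, illustrative pattern that only handles this one style of genomic deletion.

```python
import re

# Simplified pattern for strings such as "chrX:g.118373226_118500408del".
HGVS_DELETION = re.compile(
    r"^(?P<chrom>chr\w+):g\.(?P<start>\d+)_(?P<end>\d+)del$"
)


def deletion_length(hgvs: str) -> int:
    """Return the number of deleted bases, counting both end positions."""
    match = HGVS_DELETION.match(hgvs)
    if match is None:
        raise ValueError(f"unsupported variant description: {hgvs!r}")
    start, end = int(match["start"]), int(match["end"])
    if end < start:
        raise ValueError("end position precedes start position")
    return end - start + 1


length_bp = deletion_length("chrX:g.118373226_118500408del")
print(f"{length_bp:,} bp, about {length_bp / 1000:.0f} kb")
```

The interval spans 127,183 bp, which agrees with the 127 kb stated in the title.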

NGSremix: A software tool for estimating pairwise relatedness between admixed individuals from next-generation sequencing data

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab174 ◽

2021 ◽

Author(s):

Anne Krogh Nøhr ◽

Kristian Hanghøj ◽

Genis Garcia Erill ◽

Zilong Li ◽

Ida Moltke ◽

...

Keyword(s):

Next Generation Sequencing ◽

Genetic Research ◽

Likelihood Estimation ◽

Software Tool ◽

Estimation Methods ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Ngs Data ◽

Generation Sequencing

Abstract Estimation of relatedness between pairs of individuals is important in many genetic research areas. When estimating relatedness, it is important to account for admixture if this is present. However, the methods that can account for admixture are all based on genotype data as input, which is a problem for low-depth next-generation sequencing (NGS) data from which genotypes are called with high uncertainty. Here we present a software tool, NGSremix, for maximum likelihood estimation of relatedness between pairs of admixed individuals from low-depth NGS data, which takes the uncertainty of the genotypes into account via genotype likelihoods. Using both simulated and real NGS data for admixed individuals with an average depth of 4x or below, we show that our method works well and clearly outperforms the commonly used state-of-the-art relatedness estimation methods PLINK, KING, relateAdmix, and ngsRelate, all of which perform quite poorly. Hence, NGSremix is a useful new tool for estimating relatedness in admixed populations from low-depth NGS data. NGSremix is implemented in C/C++ as multi-threaded software and is freely available on GitHub at https://github.com/KHanghoj/NGSremix.
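
To illustrate why called genotypes are unreliable at low depth, the following sketch computes normalised genotype likelihoods from read counts under a simple textbook model: reads are independent and each read reports the wrong allele with a fixed error rate. This is an assumption made for illustration and is not a description of how NGSremix computes or consumes genotype likelihoods. The function name and the default error rate are hypothetical.

```python
from typing import List


def normalised_genotype_likelihoods(
    ref_reads: int, alt_reads: int, error_rate: float = 0.01
) -> List[float]:
    """Likelihoods of carrying 0, 1 or 2 alternate alleles, scaled to sum to 1.

    Assumes independent reads and a symmetric per-read error rate.
    """
    likelihoods = []
    for alt_alleles in (0, 1, 2):
        frac = alt_alleles / 2
        # Probability that a single read shows the alternate allele.
        p_alt = frac * (1 - error_rate) + (1 - frac) * error_rate
        likelihoods.append(p_alt ** alt_reads * (1 - p_alt) ** ref_reads)
    total = sum(likelihoods)
    return [value / total for value in likelihoods]


# Two reads, both alternate: a heterozygote remains quite plausible.
print(normalised_genotype_likelihoods(0, 2))
# Twenty reads, all alternate: the homozygote is essentially certain.
print(normalised_genotype_likelihoods(0, 20))
```

Carrying all three values forward, rather than committing to the most likely genotype, is the general idea behind likelihood-based methods for low-depth data.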
