PopAmaranth: A population genetic genome browser for grain amaranths and their wild relatives

Author(s): José Gonçalves-Dias, Markus G Stetter

Abstract The combination of genomic, physiological, and population genetic research has accelerated the understanding and improvement of numerous crops. For non-model crops, the lack of interdisciplinary research hinders their improvement. Grain amaranth is an ancient, nutritious pseudocereal that has been domesticated three times in different regions of the Americas. We present and employ PopAmaranth, a population genetic genome browser that provides an accessible representation of the genetic variation of the three grain amaranth species (A. hypochondriacus, A. cruentus, and A. caudatus) and two wild relatives (A. hybridus and A. quitensis) along the A. hypochondriacus reference sequence. We performed population-scale diversity and selection analyses on whole-genome sequencing data from 88 curated accessions that were unambiguously classified both genetically and taxonomically. We use the platform to show that genetic diversity in the water stress-related MIF1 gene declined during amaranth domestication and provide evidence for convergent saponin reduction between amaranth and quinoa. PopAmaranth is available through amaranthGDB at amaranthgdb.org/popamaranth.html.
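To make the diversity comparison concrete: nucleotide diversity (π) is a standard population-scale diversity statistic, and lower π in domesticated than in wild samples is one common signature of diversity lost during domestication. The sketch below assumes that per-site alternative-allele counts and sample sizes (numbers of sampled chromosomes) are already available for a window; the function names and toy counts are hypothetical, and this is not PopAmaranth's actual pipeline.

```python
from typing import Sequence


def site_pi(alt_count: int, n: int) -> float:
    """Per-site diversity: probability that two distinct sampled chromosomes differ."""
    if n < 2:
        return 0.0
    return 2 * alt_count * (n - alt_count) / (n * (n - 1))


def window_pi(alt_counts: Sequence[int], sample_sizes: Sequence[int], window_length: int) -> float:
    """Sum per-site diversity over the listed sites and divide by the window length in bp.

    Sites not listed are assumed to be monomorphic, so they contribute 0 to the sum.
    """
    total = sum(site_pi(k, n) for k, n in zip(alt_counts, sample_sizes))
    return total / window_length


if __name__ == "__main__":
    # Hypothetical counts at three variable sites in a 1 kb window, 20 chromosomes each.
    wild = window_pi([4, 6, 3], [20, 20, 20], 1000)
    domesticated = window_pi([1, 0, 0], [20, 20, 20], 1000)
    print(f"pi_wild = {wild:.5f}, pi_domesticated = {domesticated:.5f}")
    print(f"pi_wild / pi_domesticated = {wild / domesticated:.2f}")
```

A ratio above 1 means the domesticated sample retains less diversity than the wild one in that window. Real analyses also have to handle missing data and decide which sites count as callable when choosing the denominator.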

2020
Author(s): José Gonçalves-Dias, Markus G Stetter

The last decades of genomic, physiological, and population genetic research have accelerated the understanding and improvement of numerous crops. Transferring these methods to minor crops could accelerate their improvement if knowledge is effectively shared between disciplines. Grain amaranth is an ancient, nutritious pseudocereal from the Americas that is regaining importance due to its high protein content and favorable amino acid and micronutrient composition. To effectively combine genomic and population genetic information with molecular genetics and plant physiology, and to use it for interdisciplinary research and crop improvement, an intuitive interface for scientists across disciplines is essential. Here, we present PopAmaranth, a population genetic genome browser that provides an accessible representation of the genetic variation of the three grain amaranth species (A. hypochondriacus, A. cruentus, and A. caudatus) and two wild relatives (A. hybridus and A. quitensis) along the A. hypochondriacus reference sequence. We performed population-scale diversity and selection analyses on whole-genome sequencing data from 88 curated accessions that were unambiguously classified both genetically and taxonomically. We incorporate the domestication history of the three grain amaranths to make an evolutionary perspective on candidate genes and regions available. We use the platform to show that genetic diversity in the water stress-related MIF1 gene declined during amaranth domestication and provide evidence for convergent saponin reduction between amaranth and quinoa. These examples show that our tool enables the detailed study of individual genes, provides target regions for breeding efforts, and can enhance the interdisciplinary integration of population genomic findings across species.
PopAmaranth is available through amaranthGDB at amaranthgdb.org/popamaranth.html.

Significance
Sharing population genetic results between disciplines can facilitate interdisciplinary research and accelerate the improvement of crops. Since the onset of genome sequencing, online genome browser platforms have provided access to features of an organism's genetic information, but this has rarely been extended to population-wide summary statistics for evolutionary hypothesis testing. We implemented PopAmaranth, a population genetic genome browser for three grain amaranth species and their two wild relatives. The intuitive and user-friendly interface of PopAmaranth makes the genetic diversity of the species complex available to a broad audience of biologists across disciplines. We show how our tool can be used to study convergence across distant genera and to find signals of past selection in domestication- and stress-related genes. Community platforms and genome browsers are an integrative element of numerous study systems. PopAmaranth can serve as a template for other research communities to integrate and share their results.
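Selection scans of the kind described here rely on summary statistics computed along the genome. One widely used example is Tajima's D, which contrasts π with Watterson's estimator based on the number of segregating sites S; negative values indicate an excess of rare variants, as expected, for instance, after a selective sweep or a population expansion. Which statistics PopAmaranth displays is not detailed in this abstract, so the following is a generic sketch of the standard Tajima's D formula, assuming the same number of sampled chromosomes n at every site and no missing data; the function names and counts are hypothetical.

```python
import math
from typing import Sequence, Tuple


def diversity_summary(alt_counts: Sequence[int], n: int) -> Tuple[int, float]:
    """Return (S, pi) for a region: segregating sites and summed per-site diversity."""
    seg_sites = sum(1 for k in alt_counts if 0 < k < n)
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in alt_counts)
    return seg_sites, pi


def tajimas_d(n: int, seg_sites: int, pi: float) -> float:
    """Tajima's D for n sampled chromosomes, S segregating sites and total pi in a region."""
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sampled chromosomes")
    if seg_sites == 0:
        return math.nan  # undefined without segregating sites
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    theta_w = seg_sites / a1  # Watterson's estimator for the whole region
    return (pi - theta_w) / math.sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))


if __name__ == "__main__":
    # Hypothetical alternative-allele counts at six sites, 10 sampled chromosomes.
    s, p = diversity_summary([1, 1, 1, 2, 5, 0], 10)
    print(f"S = {s}, pi = {p:.3f}, D = {tajimas_d(10, s, p):.3f}")
```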


2020
Author(s): Sihao Xiao, Zhentian Kai, David Brown, Claire L Shovlin

SUMMARY Whole genome sequencing (WGS) is championed by the UK National Health Service (NHS) to identify genetic variants that cause particular diseases. The full potential of WGS has yet to be realised because early data-analytic steps prioritise protein-coding genes and effectively ignore the less well-annotated non-coding genome, which is rich in transcribed and critical regulatory regions. To address this, we developed a filter, which we call GROFFFY, and validated it in WGS data from hereditary haemorrhagic telangiectasia patients within the 100,000 Genomes Project. Before filter application, the mean number of DNA variants relative to the human reference sequence GRCh38 was 4,867,167 (range 4,786,039–5,070,340), and one-third lay within intergenic areas. GROFFFY removed a mean of 2,812,015 variants per DNA sample. In combination with allele frequency and other filters, GROFFFY enabled a 99.56% reduction in variant number. The proportion of intergenic variants was maintained, and no pathogenic variants in disease genes were lost. We conclude that the filter, applied to NHS diagnostic samples in the 100,000 Genomes pipeline, offers an efficient method to prioritise intergenic, intronic and coding gDNA variants. Reducing the overwhelming number of variants while retaining functional genome variation of importance to patients enhances the near-term value of WGS in clinical diagnostics.
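The abstract does not specify which genomic regions GROFFFY retains, so the following is only a generic sketch of a region-based variant filter of the kind such tools apply: variants are kept if they fall inside a set of retained intervals. The interval set, function names and toy variants are hypothetical, and the code only checks each variant's first base (SNV-style). It also checks the reported arithmetic: removing a mean of 2,812,015 out of 4,867,167 variants corresponds to about 57.8% (a ratio of the two reported means, not an average of per-sample fractions).

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Tuple

Region = Tuple[str, int, int]        # (chrom, start, end), 0-based half-open as in BED
Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt), 1-based as in VCF
Index = Dict[str, Tuple[List[int], List[int]]]


def build_index(regions: Iterable[Region]) -> Index:
    """Group regions by chromosome, sort them and merge overlapping or touching intervals."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    index: Index = {}
    for chrom, intervals in by_chrom.items():
        merged: List[Tuple[int, int]] = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index: Index, chrom: str, pos0: int) -> bool:
    """True if the 0-based position lies inside one of the merged intervals."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos0) - 1  # last interval starting at or before pos0
    return i >= 0 and pos0 < ends[i]


def filter_variants(variants: Iterable[Variant], index: Index) -> List[Variant]:
    """Keep only variants whose 1-based position falls in a retained region."""
    return [v for v in variants if overlaps(index, v[0], v[1] - 1)]


if __name__ == "__main__":
    mean_before, mean_removed = 4_867_167, 2_812_015
    print(f"mean fraction removed by GROFFFY (ratio of means): {mean_removed / mean_before:.1%}")
```

Merging the intervals once and using binary search keeps each lookup logarithmic in the number of regions, which matters when millions of variants per sample are screened.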


2020, Vol 10 (9), pp. 3041-3046
Author(s): Silas Tittes

Abstract The availability of whole genome sequencing data from multiple related populations creates opportunities to test sophisticated population genetic models of convergent adaptation. Recent work by Lee and Coop (2017) developed models to infer modes of convergent adaptation at local genomic scales, providing a rich framework for assessing how selection has acted across multiple populations at the tested locus. Here I present rdmc, an R package that builds on the existing software implementation of Lee and Coop (2017) and prioritizes ease of use, portability, and scalability. I demonstrate its installation and give a comprehensive overview of the package's current utilities.


2020, Vol 54 (1), pp. 213-236
Author(s): Bjarki Eldon

Natural highly fecund populations abound. These range from viruses to gadids. Many highly fecund populations are economically important. Highly fecund populations provide an important contrast to the low-fecundity organisms that have traditionally been used in evolutionary studies. A key question regarding high fecundity is whether large numbers of offspring are produced on a regular basis, by few individuals each time, in a sweepstakes mode of reproduction. Such reproduction characteristics are not incorporated into the classical Wright–Fisher model, the standard reference model of population genetics, or similar types of models, in which each individual can produce only small numbers of offspring relative to the population size. The expected genomic footprints of population genetic models of sweepstakes reproduction are very different from those of the Wright–Fisher model. A key, immediate issue involves identifying the footprints of sweepstakes reproduction in genomic data. Whole-genome sequencing data can be used to distinguish the patterns made by sweepstakes reproduction from the patterns made by population growth in a population evolving according to the Wright–Fisher model (or similar models). If the hypothesis of sweepstakes reproduction cannot be rejected, then models of sweepstakes reproduction and associated multiple-merger coalescents will become at least as relevant as the Wright–Fisher model (or similar models) and the Kingman coalescent, the cornerstones of mathematical population genetics, in further discussions of evolutionary genomics of highly fecund populations.
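The contrast between Wright–Fisher and sweepstakes reproduction can be made concrete with a toy simulation of a single generation. The sweepstakes rule used here (one randomly chosen parent produces a fixed fraction psi of the next generation, and the remaining offspring pick parents Wright–Fisher style) is a hypothetical simplification chosen for illustration, not a specific published model. The quantity compared is the probability that two offspring sampled without replacement share a parent, i.e. the per-generation pairwise coalescence probability: about 1/N under Wright–Fisher, but roughly psi² in a sweepstakes generation, which is why many lineages can merge at once.

```python
import random
from collections import Counter


def wright_fisher_generation(n: int, rng: random.Random) -> Counter:
    """Each of the n offspring independently picks a parent uniformly at random."""
    return Counter(rng.randrange(n) for _ in range(n))


def sweepstakes_generation(n: int, psi: float, rng: random.Random) -> Counter:
    """Hypothetical toy rule: one random parent 'wins' and produces round(psi * n)
    offspring; the remaining offspring pick parents uniformly, as in Wright-Fisher."""
    winner = rng.randrange(n)
    big_family = round(psi * n)
    counts = Counter({winner: big_family})
    for _ in range(n - big_family):
        counts[rng.randrange(n)] += 1
    return counts


def pair_coalescence(counts: Counter, n: int) -> float:
    """Probability that two offspring sampled without replacement share a parent."""
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


if __name__ == "__main__":
    rng = random.Random(42)
    n = 1000
    wf = wright_fisher_generation(n, rng)
    sw = sweepstakes_generation(n, psi=0.3, rng=rng)
    print("largest family, Wright-Fisher:", max(wf.values()))
    print("largest family, sweepstakes:  ", max(sw.values()))
    print(f"pairwise coalescence, Wright-Fisher: {pair_coalescence(wf, n):.4f} (about 1/n = {1 / n:.4f})")
    print(f"pairwise coalescence, sweepstakes:   {pair_coalescence(sw, n):.4f} (roughly psi^2 = {0.3 ** 2:.4f})")
```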


Author(s): Eric S Tvedte, Mark Gasser, Benjamin C Sparklin, Jane Michalski, Carl E Hjelmen, ...

Abstract The newest generation of DNA sequencing technology is highlighted by the ability to generate sequence reads hundreds of kilobases in length. Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) have pioneered competitive long-read platforms, with more recent work focused on improving sequencing throughput and per-base accuracy. We used whole-genome sequencing data produced by three PacBio protocols (Sequel II CLR, Sequel II HiFi, RS II) and two ONT protocols (Rapid Sequencing and Ligation Sequencing) to compare assemblies of the bacterium Escherichia coli and the fruit fly Drosophila ananassae. In both organisms, Sequel II assemblies had the highest consensus accuracy, even after accounting for differences in sequencing throughput. ONT and PacBio CLR produced longer reads than PacBio RS II and HiFi, and genome contiguity was highest when assembling these datasets. ONT Rapid Sequencing libraries had the fewest chimeric reads and quantified E. coli plasmids better than ligation-based libraries did. Assembly quality can be enhanced by hybrid approaches that use Illumina libraries for bacterial genome assembly or for polishing eukaryotic genome assemblies, and an ONT-Illumina hybrid approach would be more cost-effective for many users. Genome-wide DNA methylation could be detected with both technologies; however, ONT libraries enabled the identification of a broader range of known E. coli methyltransferase recognition motifs, as well as previously undocumented D. ananassae motifs. The ideal choice of long-read technology may depend on several factors, including the question or hypothesis under examination. No single technology outperformed the others in all metrics examined.
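Contiguity and consensus accuracy, two of the metrics compared above, are commonly summarized as contig N50 and as a Phred-scaled quality value (QV). The sketch below implements these standard definitions; the abstract does not state which exact metrics or tools the study used, and the contig lengths and error counts are made up for illustration.

```python
import math
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L contain at least half of the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def phred_qv(errors: int, aligned_bases: int) -> float:
    """Consensus quality value: QV = -10 * log10(error rate)."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    if errors == 0:
        return math.inf  # no observed errors, so QV is unbounded
    return -10 * math.log10(errors / aligned_bases)


if __name__ == "__main__":
    print(n50([4_600_000, 120_000, 80_000]))  # hypothetical bacterial assembly
    print(round(phred_qv(23, 4_600_000), 1))  # hypothetical consensus error count
```

Each 10-point increase in QV corresponds to a tenfold drop in the consensus error rate (QV 30 is one error per 1,000 bases, QV 40 one per 10,000).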


Author(s): Johanna L. Jones, Mark A. Corbett, Elise Yeaman, Duran Zhao, Jozef Gecz, ...

Abstract Inherited paediatric cataract is a rare Mendelian disease that results in visual impairment or blindness due to a clouding of the eye's crystalline lens. Here we report an Australian family with isolated paediatric cataract, which we had previously mapped to Xq24. Linkage at Xq24–25 (LOD = 2.53) was confirmed, and the region was refined with a denser marker map. In addition, two autosomal regions with suggestive evidence of linkage were observed. A segregating 127 kb deletion (chrX:g.118373226_118500408del) in the Xq24–25 linkage region was identified from whole-genome sequencing data. This deletion completely removed LOC101928336, a commonly deleted long non-coding RNA gene, and truncated the protein-coding progesterone receptor membrane component 1 (PGRMC1) gene after exon 1. A literature search revealed a report of two unrelated males with non-syndromic intellectual disability and congenital cataract, whose contiguous gene deletions accounted for their intellectual disability but also disrupted the PGRMC1 gene. A morpholino-induced pgrmc1 knockdown in a zebrafish model produced significant cataract formation, supporting a role for PGRMC1 in lens development and cataract formation. We hypothesise that the loss of PGRMC1 causes cataract through disrupted PGRMC1-CYP51A1 protein–protein interactions and altered cholesterol biosynthesis. The cause of paediatric cataract in this family is the truncating deletion of PGRMC1, which we report as a novel cataract gene.
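Two numbers in this abstract can be checked with simple arithmetic. The deletion chrX:g.118373226_118500408del includes both endpoints, so it spans 118,500,408 − 118,373,226 + 1 = 127,183 bp, consistent with the reported 127 kb. A LOD score compares the likelihood of linkage at recombination fraction θ with that of no linkage (θ = 0.5): LOD(θ) = log10[θ^R (1 − θ)^(N − R) / 0.5^N] for R recombinants among N meioses. This two-point formula assumes phase-known, fully informative meioses, a textbook simplification that does not reproduce the family's actual linkage analysis; the regular expression is a hypothetical helper that only handles this simple deletion notation.

```python
import math
import re
from typing import Tuple

# Simple genomic deletion notation as used above, e.g. "chrX:g.118373226_118500408del"
DELETION = re.compile(r"^(chr[0-9XYM]+):g\.(\d+)_(\d+)del$")


def deletion_length(notation: str) -> Tuple[str, int]:
    """Return (chromosome, deleted bases); the range includes both endpoints."""
    match = DELETION.match(notation)
    if match is None:
        raise ValueError(f"unsupported notation: {notation}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    return chrom, end - start + 1


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """log10 likelihood ratio of linkage at theta versus no linkage (theta = 0.5)."""
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    nonrecombinants = meioses - recombinants
    if theta == 0:
        # At theta = 0 any recombinant is impossible.
        return -math.inf if recombinants > 0 else meioses * math.log10(2)
    return (recombinants * math.log10(theta)
            + nonrecombinants * math.log10(1 - theta)
            + meioses * math.log10(2))  # equals -log10(0.5 ** meioses)


if __name__ == "__main__":
    print(deletion_length("chrX:g.118373226_118500408del"))  # ('chrX', 127183)
    print(round(two_point_lod(0, 10, 0.0), 2))  # 3.01
```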


Author(s): Anne Krogh Nøhr, Kristian Hanghøj, Genis Garcia Erill, Zilong Li, Ida Moltke, ...

Abstract Estimation of relatedness between pairs of individuals is important in many genetic research areas. When estimating relatedness, it is important to account for admixture if it is present. However, the methods that can account for admixture all require genotype data as input, which is a problem for low-depth next-generation sequencing (NGS) data, from which genotypes are called with high uncertainty. Here we present a software tool, NGSremix, for maximum likelihood estimation of relatedness between pairs of admixed individuals from low-depth NGS data, which takes the uncertainty of the genotypes into account via genotype likelihoods. Using both simulated and real NGS data for admixed individuals with an average depth of 4x or below, we show that our method works well and clearly outperforms the commonly used state-of-the-art relatedness estimation methods PLINK, KING, relateAdmix, and ngsRelate, which all perform quite poorly. Hence, NGSremix is a useful new tool for estimating relatedness in admixed populations from low-depth NGS data. NGSremix is implemented as multi-threaded software in C/C++ and is freely available on GitHub at https://github.com/KHanghoj/NGSremix.
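Genotype likelihoods let a method carry genotype uncertainty forward instead of committing to a hard call. The abstract only says that NGSremix works via genotype likelihoods, not how they are obtained, so the function below is a hypothetical illustration rather than part of NGSremix. It assumes a simple error model in which each read base is wrong with probability `error` and errors are spread evenly over the three other bases. With two reads both showing the alternative allele and a 1% error rate, the homozygous-alternative genotype comes out only about four times as likely as the heterozygote, which is the kind of uncertainty at depth 4x or below that a hard genotype call discards.

```python
import math
from typing import List, Sequence


def genotype_log_likelihoods(bases: Sequence[str], ref: str, alt: str, error: float) -> List[float]:
    """Natural-log likelihoods of the observed bases for genotypes with 0, 1 and 2 alt copies."""
    log_liks = []
    for alt_copies in (0, 1, 2):
        alt_fraction = alt_copies / 2  # chance that a read comes from an alt chromosome
        ll = 0.0
        for base in bases:
            if base == alt:
                p = alt_fraction * (1 - error) + (1 - alt_fraction) * error / 3
            elif base == ref:
                p = (1 - alt_fraction) * (1 - error) + alt_fraction * error / 3
            else:
                p = error / 3  # base matches neither allele, so it must be an error
            ll += math.log(p)
        log_liks.append(ll)
    return log_liks


if __name__ == "__main__":
    lls = genotype_log_likelihoods(["A", "A"], ref="G", alt="A", error=0.01)
    ratio = math.exp(lls[2] - lls[1])
    print(f"P(reads | A/A) / P(reads | G/A) = {ratio:.2f}")
```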

