Genome sequencing and population genomics in non-model organisms

2014 ◽  
Vol 29 (1) ◽  
pp. 51-63 ◽  
Author(s):  
Hans Ellegren
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Omar Abou Saada ◽  
Andreas Tsouris ◽  
Chris Eberlein ◽  
Anne Friedrich ◽  
Joseph Schacherer

Abstract: While genome sequencing and assembly are now routine, we do not have a full, precise picture of polyploid genomes. No existing polyploid phasing method provides accurate and contiguous haplotype predictions. We developed nPhase, a ploidy-agnostic tool that leverages long reads and accurate short reads to solve alignment-based phasing for samples of unspecified ploidy (https://github.com/OmarOakheart/nPhase). nPhase is validated by tests on simulated and real polyploids. nPhase obtains on average over 95% accuracy and a contiguous 1.25 haplotigs per haplotype to cover more than 90% of each chromosome (heterozygosity rate ≥ 0.5%). nPhase allows population genomics and hybrid studies of polyploids.


2018 ◽  
Author(s):  
Valerie Wood ◽  
Antonia Lock ◽  
Midori A. Harris ◽  
Kim Rutherford ◽  
Jürg Bähler ◽  
...  

Abstract: The first decade of genome sequencing stimulated an explosion in the characterization of unknown proteins. More recently, the pace of functional discovery has slowed, leaving around 20% of the proteins even in well-studied model organisms without informative descriptions of their biological roles. Remarkably, many uncharacterized proteins are conserved from yeasts to human, suggesting that they contribute to fundamental biological processes. To fully understand biological systems in health and disease, we need to account for every part of the system. Unstudied proteins thus represent a collective blind spot that limits the progress of both basic and applied biosciences. We use a simple yet powerful metric based on Gene Ontology (GO) biological process terms to define characterized and uncharacterized proteins for human, budding yeast, and fission yeast. We then identify a set of conserved but unstudied proteins in S. pombe, and classify them based on a combination of orthogonal attributes determined by large-scale experimental and comparative methods. Finally, we explore possible reasons why these proteins remain neglected, and propose courses of action to raise their profile and thereby reap the benefits of completing the catalog of proteins’ biological roles.
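As a rough illustration of what a GO-based characterization metric could look like, the sketch below labels a protein "characterized" when it carries at least one GO biological process term other than the root term. This rule is an assumption made for illustration, not the metric defined by the authors; the function names and the toy protein identifiers are hypothetical.

```python
# Minimal sketch, assuming a protein counts as "characterized" when it has
# at least one informative GO biological process term. The rule and all
# names here are illustrative assumptions, not the authors' definition.

BP_ROOT = "GO:0008150"  # root term of the biological_process ontology


def classify_proteins(annotations, uninformative=frozenset({BP_ROOT})):
    """Map each protein id to 'characterized' or 'uncharacterized'.

    annotations: dict of protein id -> iterable of GO biological process ids.
    uninformative: GO ids that do not count as a description of a role.
    """
    status = {}
    for protein, terms in annotations.items():
        informative = set(terms) - uninformative
        status[protein] = "characterized" if informative else "uncharacterized"
    return status


def uncharacterized_fraction(status):
    """Fraction of proteins labelled 'uncharacterized' (0.0 if empty)."""
    if not status:
        return 0.0
    unknown = sum(1 for label in status.values() if label == "uncharacterized")
    return unknown / len(status)


# Toy input with hypothetical protein ids.
toy_annotations = {
    "protA": ["GO:0006412"],          # translation
    "protB": [BP_ROOT],               # annotated to the root only
    "protC": [],                      # no annotation at all
    "protD": ["GO:0006260", BP_ROOT], # DNA replication plus root
}
toy_status = classify_proteins(toy_annotations)
toy_fraction = uncharacterized_fraction(toy_status)
```

In practice the set of uninformative terms would need curation, since very broad terms below the root can be just as uninformative as the root itself.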


2014 ◽  
Author(s):  
Jonathan Puritz ◽  
Christopher M. Hollenbeck ◽  
John R. Gold

Restriction-site associated DNA sequencing (RADseq) has become a powerful and useful approach for population genomics. Currently, no software exists that utilizes both paired-end reads from RADseq data to efficiently produce population-informative variant calls, especially for organisms with large effective population sizes and high levels of genetic polymorphism but for which no genomic resources exist. dDocent is an analysis pipeline with a user-friendly, command-line interface designed to process individually barcoded RADseq data (with double cut sites) into informative SNPs/Indels for population-level analyses. The pipeline, written in BASH, uses data reduction techniques and other stand-alone software packages to perform quality trimming and adapter removal, de novo assembly of RAD loci, read mapping, SNP and Indel calling, and baseline data filtering. Double-digest RAD data from population pairings of three different marine fishes were used to compare dDocent with Stacks, the first generally available, widely used pipeline for analysis of RADseq data. dDocent consistently identified more SNPs shared across greater numbers of individuals and with higher levels of coverage. This is most likely due to the fact that dDocent quality trims instead of filtering and incorporates both forward and reverse reads in assembly, mapping, and SNP calling, thus enabling use of reads with Indel polymorphisms. The pipeline and a comprehensive user guide can be found at (http://dDocent.wordpress.com).


BMC Genomics ◽  
2017 ◽  
Vol 18 (1) ◽  
Author(s):  
Navin Rustagi ◽  
Anbo Zhou ◽  
W. Scott Watkins ◽  
Erika Gedvilaite ◽  
Shuoguo Wang ◽  
...  

2017 ◽  
Author(s):  
Audrey Rohfritsch ◽  
Maxime Galan ◽  
Mathieu Gautier ◽  
Karim Gharbi ◽  
Gert Olsson ◽  
...  

Abstract: Infectious pathogens are major selective forces acting on individuals. The recent advent of high-throughput sequencing technologies now makes it possible to investigate the genetic bases of resistance/susceptibility to infections in non-model organisms. From an evolutionary perspective, the analysis of the genetic diversity observed at these genes in natural populations provides insight into the mechanisms maintaining polymorphism and their epidemiological consequences. We explored these questions in the context of the interactions between Puumala hantavirus (PUUV) and its reservoir host, the bank vole Myodes glareolus. Despite the continuous spatial distribution of M. glareolus in Europe, PUUV distribution is strongly heterogeneous. Different defence strategies might have evolved in bank voles as a result of co-adaptation with PUUV, which may in turn reinforce spatial heterogeneity in PUUV distribution. We performed a genome scan study of six bank vole populations sampled along a North/South transect in Sweden, including PUUV endemic and non-endemic areas. We combined candidate gene analyses (Tlr4, Tlr7, Mx2 genes) and high-throughput sequencing of RAD (Restriction-site Associated DNA) markers. We found evidence for outlier loci showing high levels of genetic differentiation. Ten outliers among the 52 that matched to mouse protein-coding genes corresponded to immune-related genes and were detected using ecological associations with variations in PUUV prevalence. One third of the enriched pathways concerned immune processes, including platelet activation and TLR pathway. In the future, functional experiments should make it possible to confirm the role of these immune-related genes with regard to the interactions between M. glareolus and PUUV.


2016 ◽  
Vol 82 (10) ◽  
pp. 3070-3081 ◽  
Author(s):  
Changyi Zhang ◽  
Qunxin She ◽  
Hongkai Bi ◽  
Rachel J. Whitaker

Abstract: Sulfolobus islandicus serves as a model for studying archaeal biology as well as linking novel biology to evolutionary ecology using functional population genomics. In the present study, we developed a new counterselectable genetic marker in S. islandicus to expand the genetic toolbox for this species. We show that resistance to the purine analog 6-methylpurine (6-MP) in S. islandicus M.16.4 is due to the inactivation of a putative adenine phosphoribosyltransferase encoded by M164_0158 (apt). The application of the apt gene as a novel counterselectable marker was first illustrated by constructing an unmarked α-amylase deletion mutant. Furthermore, the 6-MP counterselection feature was employed in a forward (loss-of-function) mutation assay to reveal the profile of spontaneous mutations in S. islandicus M.16.4 at the apt locus. Moreover, the general conservation of apt genes in the crenarchaea suggests that the same strategy can be broadly applied to other crenarchaeal model organisms. These results demonstrate that the apt locus represents a new tool for genetic manipulation and sequence analysis of the hyperthermophilic crenarchaeon S. islandicus.

Importance: Currently, the pyrEF/5-fluoroorotic acid (5-FOA) counterselection system remains the sole counterselection marker in crenarchaeal genetics. Since most Sulfolobus mutants constructed by the research community were derived from genetic hosts lacking the pyrEF genes, the pyrEF/5-FOA system is no longer available for use in forward mutation assays. Demonstration of the apt/6-MP counterselection system for the Sulfolobus model renders it possible to again study the mutation profiles in mutants that have already been constructed by the use of strains with a pyrEF-deficient background. Furthermore, additional counterselectable markers will allow us to conduct more sophisticated genetic studies, i.e., investigate mechanisms of chromosomal DNA transfer and quantify recombination frequencies among S. islandicus strains.


2019 ◽  
Vol 10 (1) ◽  
pp. 417-430 ◽  
Author(s):  
Elizabeth A. Morton ◽  
Ashley N. Hall ◽  
Elizabeth Kwan ◽  
Calvin Mok ◽  
Konstantin Queitsch ◽  
...  

Individuals within a species can exhibit vast variation in copy number of repetitive DNA elements. This variation may contribute to complex traits such as lifespan and disease, yet it is only infrequently considered in genotype-phenotype associations. Although the possible importance of copy number variation is widely recognized, accurate copy number quantification remains challenging. Here, we assess the technical reproducibility of several major methods for copy number estimation as they apply to the large repetitive ribosomal DNA array (rDNA). rDNA encodes the ribosomal RNAs and exists as a tandem gene array in all eukaryotes. Repeat units of rDNA are kilobases in size, often with several hundred units comprising the array, making rDNA particularly intractable to common quantification techniques. We evaluate pulsed-field gel electrophoresis, droplet digital PCR, and Nextera-based whole genome sequencing as approaches to copy number estimation, comparing techniques across model organisms and spanning wide ranges of copy numbers. Nextera-based whole genome sequencing, though commonly used in recent literature, produced high error. We explore possible causes for this error and provide recommendations for best practices in rDNA copy number estimation. We present a resource of high-confidence rDNA copy number estimates for a set of S. cerevisiae and C. elegans strains for future use. We furthermore explore the possibility for FISH-based copy number estimation, an alternative that could potentially characterize copy number on a cellular level.
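For readers unfamiliar with sequencing-based copy number estimation, a common depth-ratio approach can be sketched as follows: divide the mean read depth over one rDNA repeat unit by the typical depth over single-copy regions. This is a generic illustration under that assumption, not the procedure evaluated in the study, and the function and parameter names are hypothetical.

```python
# Minimal sketch of a depth-ratio copy number estimate. The method and
# names are illustrative assumptions, not the study's own pipeline.
from statistics import mean, median


def estimate_copy_number(repeat_depths, single_copy_depths):
    """Estimate repeat copies per haploid genome from per-base read depths.

    repeat_depths: depths at positions across one rDNA repeat unit, with all
        reads from the array mapped onto that single reference unit.
    single_copy_depths: depths at positions in single-copy regions.
    """
    if not repeat_depths or not single_copy_depths:
        raise ValueError("both depth lists must be non-empty")
    # The median is less sensitive than the mean to coverage spikes
    # in the single-copy background.
    background = median(single_copy_depths)
    if background <= 0:
        raise ValueError("single-copy background depth must be positive")
    return mean(repeat_depths) / background


# Toy depths: the repeat is covered 150x deeper than the background.
toy_estimate = estimate_copy_number([1500, 1500, 1500], [10, 10, 10, 10])
```

Any bias that affects the repeat and the single-copy background differently, such as sequence-dependent coverage from library preparation, carries straight through this ratio, which is one reason such estimates can be unreliable.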


2016 ◽  
Author(s):  
Cassandra Kontur ◽  
Santosh Kumar ◽  
Xun Lan ◽  
Jonathan K Pritchard ◽  
Aaron P Turkewitz

Unbiased genetic approaches have a unique ability to identify novel genes associated with specific biological pathways. Thanks to next generation sequencing, forward genetic strategies can be expanded into a wider range of model organisms. The formation of secretory granules, called mucocysts, in the ciliate Tetrahymena thermophila relies in part on ancestral lysosomal sorting machinery but is also likely to involve novel factors. In prior work, multiple strains with defects in mucocyst biogenesis were generated by nitrosoguanidine mutagenesis, and characterized using genetic and cell biological approaches, but the genetic lesions themselves were unknown. Here, we show that analyzing one such mutant by whole genome sequencing reveals a novel factor in mucocyst formation. Strain UC620 has both morphological and biochemical defects in mucocyst maturation, a process analogous to dense core granule maturation in animals. Illumina sequencing of a pool of UC620 F2 clones identified a missense mutation in a novel gene called MMA1 (Mucocyst maturation). The defects in UC620 were rescued by expression of a wildtype copy of MMA1, and disruption of MMA1 in an otherwise wildtype strain generated a phenocopy of UC620. The product of MMA1, characterized as a CFP-tagged copy, is a large soluble cytosolic protein. A small fraction of Mma1p-CFP is pelletable, which may reflect association with endosomes. The gene has no identifiable homologs except in other Tetrahymena species, and therefore represents an evolutionarily recent innovation that is required for granule maturation.

