DiscoSnp-RAD: de novo detection of small variants for population genomics

2017 ◽  
Author(s):  
Jérémy Gauthier ◽  
Charlotte Mouden ◽  
Tomasz Suchan ◽  
Nadir Alvarez ◽  
Nils Arrigo ◽  
...  

Abstract We present an original method for de novo variant calling from Restriction site Associated DNA Sequencing (RAD-Seq) data. RAD-Seq is a technique characterized by the sequencing of specific loci along the genome. It is widely employed in evolutionary biology because it allows variant information (mainly SNPs) to be exploited from entire populations at a reduced cost. Common RAD-dedicated tools, such as STACKS or IPyRAD, are based on all-versus-all read comparisons, which require considerable time and computing resources. Based on the variant caller DiscoSnp, initially designed for shotgun sequencing, DiscoSnp-RAD avoids this pitfall because variants are detected by exploring the de Bruijn graph built from all the read datasets. We tested the implementation on RAD data from 259 specimens of Chiastocheta flies, morphologically assigned to 7 species. All individuals were successfully assigned to their species using both STRUCTURE and Maximum Likelihood phylogenetic reconstruction. Moreover, the identified variants revealed a within-species structure and the existence of two populations linked to their geographic distributions. Furthermore, our results show that DiscoSnp-RAD is at least one order of magnitude faster than state-of-the-art tools. Overall, the results show that DiscoSnp-RAD is suitable for identifying variants from RAD data and stands out from other tools because of its completely different principle, which makes it significantly faster, in particular on large datasets. License: GNU Affero General Public License. Availability: https://github.com/GATB/[email protected]
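The idea of detecting variants by exploring a de Bruijn graph can be illustrated with a minimal sketch. It is not the DiscoSnp-RAD implementation: it assumes error-free, single-stranded reads, ignores coverage and reverse complements, and only looks for the simplest pattern, an isolated SNP that forms a "bubble" of two parallel paths between a branching k-mer and a merging k-mer. All function names are hypothetical.

```python
def kmers(seq, k):
    """Return all k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_nodes(reads, k):
    """Node set of the de Bruijn graph: every distinct k-mer seen in the reads."""
    nodes = set()
    for read in reads:
        nodes.update(kmers(read, k))
    return nodes

def successors(node, nodes):
    """k-mers in the graph that overlap node by k-1 characters."""
    return [node[1:] + b for b in "ACGT" if node[1:] + b in nodes]

def walk(start, nodes, steps):
    """Follow unique successors for a fixed number of steps, or return None."""
    path = [start]
    for _ in range(steps):
        nxt = successors(path[-1], nodes)
        if len(nxt) != 1:
            return None
        path.append(nxt[0])
    return path

def find_snp_bubbles(nodes, k):
    """Report pairs of allele sequences for isolated-SNP bubbles."""
    bubbles = []
    for node in sorted(nodes):
        branches = successors(node, nodes)
        if len(branches) != 2:
            continue
        # Each branch holds the k k-mers covering the SNP; step k reaches the merge node.
        paths = [walk(b, nodes, k) for b in branches]
        if None in paths or paths[0][-1] != paths[1][-1]:
            continue
        alleles = tuple(node + "".join(n[-1] for n in p) for p in paths)
        bubbles.append(alleles)
    return bubbles

# Toy reads drawn from two haplotypes that differ at a single position (A vs G).
reads = ["ACGGTCATGC", "GTCATGCAAT", "ACGGTCGTGC", "TCGTGCAAT"]
nodes = build_nodes(reads, k=4)
bubbles = find_snp_bubbles(nodes, k=4)
for allele_a, allele_b in bubbles:
    print(allele_a, allele_b)
```

No read is compared with any other read here: the graph is built once from all k-mers, and variants are found by local traversal, which is the property the abstract contrasts with all-versus-all comparison.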

PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e9291
Author(s):  
Jérémy Gauthier ◽  
Charlotte Mouden ◽  
Tomasz Suchan ◽  
Nadir Alvarez ◽  
Nils Arrigo ◽  
...  

Restriction site Associated DNA Sequencing (RAD-Seq) is a technique characterized by the sequencing of specific loci along the genome. It is widely employed in evolutionary biology because it allows variant information (mainly single nucleotide polymorphisms, SNPs) to be exploited from entire populations at a reduced cost. Common RAD-dedicated tools, such as STACKS or IPyRAD, are based on all-vs-all read alignments, which require considerable time and computing resources. We present an original method, DiscoSnp-RAD, that avoids this pitfall: variants are detected by exploiting specific parts of the assembly graph built from the reads, so no all-vs-all read alignment is needed. We tested the implementation on simulated datasets of increasing size, up to 1,000 samples, and on real RAD-Seq data from 259 specimens of Chiastocheta flies, morphologically assigned to seven species. All individuals were successfully assigned to their species using both STRUCTURE and Maximum Likelihood phylogenetic reconstruction. Moreover, the identified variants revealed a within-species genetic structure linked to the geographic distribution. Furthermore, our results show that DiscoSnp-RAD is significantly faster than state-of-the-art tools. Overall, the results show that DiscoSnp-RAD is suitable for identifying variants from RAD-Seq data, that it does not require time-consuming parameterization steps, and that it stands out from other tools because of its completely different principle, which makes it substantially faster, in particular on large datasets.


2019 ◽  
Author(s):  
Emeline Deleury ◽  
Thomas Guillemaud ◽  
Aurélie Blin ◽  
Eric Lombaert

Abstract Exon capture coupled to high-throughput sequencing constitutes a cost-effective technical solution for addressing specific questions in evolutionary biology by focusing on expressed regions of the genome preferentially targeted by selection. Transcriptome-based capture, a process that can be used to capture the exons of non-model species, is used in phylogenomics. However, its use in population genomics remains rare due to the high costs of sequencing large numbers of indexed individuals across multiple populations. We evaluated the feasibility of combining transcriptome-based capture and the pooling of tissues from numerous individuals for DNA extraction as a cost-effective, generic and robust approach to estimating the variant allele frequencies of any species at the population level. We designed capture probes for ∼5 Mb of chosen de novo transcripts from the Asian ladybird Harmonia axyridis (5,717 transcripts). We called ∼300,000 bi-allelic SNPs for a pool of 36 non-indexed individuals. Capture efficiency was high, and pool-seq was as effective and accurate as individual-seq for detecting variants and estimating allele frequencies. Finally, we also evaluated an approach for simplifying bioinformatic analyses by mapping genomic reads directly to targeted transcript sequences to obtain coding variants. This approach is effective and does not affect the estimation of SNP allele frequencies, except for a small bias close to some exon ends. We demonstrate that this approach can also be used to predict the intron-exon boundaries of targeted de novo transcripts, making it possible to abolish genotyping biases near exon ends.


Author(s):  
Amatur Rahman ◽  
Paul Medvedev

Abstract Given the popularity and elegance of k-mer based tools, finding a space-efficient way to represent a set of k-mers is important for improving the scalability of bioinformatics analyses. One popular approach is to convert the set of k-mers into the more compact set of unitigs. We generalize this approach and formulate it as the problem of finding a smallest spectrum-preserving string set (SPSS) representation. We show that this problem is equivalent to finding a smallest path cover in a compacted de Bruijn graph. Using this reduction, we prove a lower bound on the size of the optimal SPSS and propose a greedy method called UST that results in a smaller representation than unitigs and is nearly optimal with respect to our lower bound. We demonstrate the usefulness of the SPSS formulation with two applications of UST. The first one is a compression algorithm, UST-Compress, which we show can store a set of k-mers using an order-of-magnitude less disk space than other lossless compression tools. The second one is an exact static k-mer membership index, UST-FM, which we show improves index size by 10-44% compared to other state-of-the-art low memory indices. Our tool is publicly available at: https://github.com/medvedevgroup/UST/.
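To make the starting point of this abstract concrete, the sketch below compacts a k-mer set into unitigs and compares the number of characters stored. It illustrates only the unitig baseline, not the UST path-cover method, and it assumes a simplified single-stranded, node-centric de Bruijn graph with no reverse complements. Function names are hypothetical.

```python
def successors(kmer, kmer_set):
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in kmer_set]

def predecessors(kmer, kmer_set):
    return [b + kmer[:-1] for b in "ACGT" if b + kmer[:-1] in kmer_set]

def unitigs(kmer_set):
    """Compact maximal non-branching paths of the k-mer graph into strings."""
    def is_start(kmer):
        preds = predecessors(kmer, kmer_set)
        # A unitig starts where the path cannot be extended to the left.
        return len(preds) != 1 or len(successors(preds[0], kmer_set)) != 1

    visited, result = set(), []
    starts = sorted(km for km in kmer_set if is_start(km))
    # The second pass over all k-mers picks up isolated cycles, which have no start.
    for start in starts + sorted(kmer_set):
        if start in visited:
            continue
        visited.add(start)
        seq, cur = start, start
        while True:
            succ = successors(cur, kmer_set)
            if len(succ) != 1:
                break
            nxt = succ[0]
            if nxt in visited or len(predecessors(nxt, kmer_set)) != 1:
                break
            visited.add(nxt)
            seq += nxt[-1]
            cur = nxt
        result.append(seq)
    return result

def spectrum(strings, k):
    """Set of k-mers contained in a collection of strings."""
    return {s[i:i + k] for s in strings for i in range(len(s) - k + 1)}

k = 4
kmer_set = spectrum(["ACGGTCATGCAAT", "ACGGTCGTGCAAT"], k)
utigs = unitigs(kmer_set)
raw_chars = k * len(kmer_set)              # storing every k-mer separately
unitig_chars = sum(len(u) for u in utigs)  # storing the unitigs instead
print(utigs, raw_chars, unitig_chars)
```

The unitigs contain exactly the same k-mers as the input set, which is what "spectrum-preserving" refers to. A smaller representation, as the abstract describes, would come from gluing unitigs along a path cover rather than stopping at every branch.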


2017 ◽  
Author(s):  
Roye Rozov ◽  
Gil Goldshlager ◽  
Eran Halperin ◽  
Ron Shamir

Abstract Motivation: We present Faucet, a 2-pass streaming algorithm for assembly graph construction. Faucet builds an assembly graph incrementally as each read is processed. Thus, reads need not be stored locally, as they can be processed while downloading data and then discarded. We demonstrate this functionality by performing streaming graph assembly of publicly available data, and observe that the ratio of disk use to raw data size decreases as coverage is increased. Results: Faucet pairs the de Bruijn graph obtained from the reads with additional metadata derived from them. We show these metadata (coverage counts collected at junction k-mers and connections bridging between junction pairs) contain the most salient information needed for assembly, and demonstrate they enable cleaning of metagenome assembly graphs, greatly improving contiguity while maintaining accuracy. We compared Faucet’s resource use and assembly quality to state-of-the-art metagenome assemblers, as well as leading resource-efficient genome assemblers. Faucet used orders of magnitude less time and disk space than the specialized metagenome assemblers MetaSPAdes and Megahit, while also improving on their memory use; this broadly matched the performance of other assemblers optimizing resource efficiency, namely Minia and LightAssembler. However, on the metagenomes tested, Faucet’s outputs had 14-110% higher mean NGA50 lengths compared to Minia, and 2-11-fold higher mean NGA50 lengths compared to LightAssembler, the only other streaming assembler available. Availability: Faucet is available at https://github.com/Shamir-Lab/[email protected],[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


2017 ◽  
Author(s):  
Prashant Pandey ◽  
Michael A. Bender ◽  
Rob Johnson ◽  
Rob Patro

Abstract Motivation: k-mer-based algorithms have become increasingly popular in the processing of high-throughput sequencing (HTS) data. These algorithms span the gamut of the analysis pipeline from k-mer counting (e.g., for estimating assembly parameters), to error correction, genome and transcriptome assembly, and even transcript quantification. Yet, these tasks often use very different k-mer representations and data structures. In this paper, we set forth the fundamental operations for maintaining multisets of k-mers and classify existing systems from a data-structural perspective. We then show how to build a k-mer-counting and multiset-representation system using the counting quotient filter (CQF), a feature-rich approximate membership query (AMQ) data structure. We introduce the k-mer-counting/querying system Squeakr (Simple Quotient filter-based Exact and Approximate Kmer Representation), which is based on the CQF. This off-the-shelf data structure turns out to be an efficient (approximate or exact) representation for sets or multisets of k-mers. Results: Squeakr takes 2×–4.3× less time than the state-of-the-art to count and perform a random-point-query workload. Squeakr is memory-efficient, consuming 1.5×–4.3× less memory than the state-of-the-art. It offers competitive counting performance, and answers point queries (i.e. queries for the abundance of a particular k-mer) over an order-of-magnitude faster than other systems. The Squeakr representation of the k-mer multiset turns out to be immediately useful for downstream processing (e.g., de Bruijn graph traversal) because it supports fast queries and dynamic k-mer insertion, deletion, and modification. Availability: https://github.com/splatlab/[email protected]
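The operations this abstract lists for a k-mer multiset (insertion, deletion, and point queries for abundance) can be sketched with an exact in-memory counter. This is only an illustration of the interface: it uses Python's `collections.Counter` as a stand-in and is not the counting quotient filter or Squeakr's API. Folding each k-mer with its reverse complement into one canonical key is a common convention assumed here, not something stated in the abstract, and the class and method names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement (assumed convention)."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)

class KmerMultiset:
    """Exact k-mer multiset; a stand-in for a compact structure such as the CQF."""

    def __init__(self, k):
        self.k = k
        self.counts = Counter()

    def insert(self, kmer):
        self.counts[canonical(kmer)] += 1

    def delete(self, kmer):
        key = canonical(kmer)
        if self.counts[key] > 1:
            self.counts[key] -= 1
        else:
            self.counts.pop(key, None)  # drop the key once its count reaches zero

    def abundance(self, kmer):
        """Point query: how many times has this k-mer been inserted?"""
        return self.counts[canonical(kmer)]

    def add_read(self, read):
        for i in range(len(read) - self.k + 1):
            self.insert(read[i:i + self.k])

multiset = KmerMultiset(k=3)
for read in ["ACGTAC", "GTACGT"]:
    multiset.add_read(read)
print(multiset.abundance("CGT"), multiset.abundance("AAA"))
```

A hash-based counter like this stores every key in full; the abstract's point is that a quotient-filter representation can offer the same operations in less memory.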


2021 ◽  
Author(s):  
Zhilin Yuan ◽  
Irina S. Druzhinina ◽  
John G. Gibbons ◽  
Zhenhui Zhong ◽  
Yves Van de Peer ◽  
...  

Abstract Understanding how organisms adapt to extreme living conditions is central to evolutionary biology. Dark septate endophytes (DSEs) constitute an important component of the root mycobiome and they are often able to alleviate host abiotic stresses. Here, we investigated the molecular mechanisms underlying the beneficial association between the DSE Laburnicola rhizohalophila and its host, the native halophyte Suaeda salsa, using population genomics. Based on genome-wide Fst (pairwise fixation index) and Vst analyses, which compared the variance in allele frequencies of single-nucleotide polymorphisms (SNPs) and copy number variants (CNVs), respectively, we found a high level of genetic differentiation between two populations. CNV patterns revealed population-specific expansions and contractions. Interestingly, we identified a ~20 kbp genomic island of high divergence with a strong sign of positive selection. This region contains a melanin-biosynthetic polyketide synthase gene cluster linked to six additional genes likely involved in biosynthesis, membrane trafficking, regulation, and localization of melanin. Differences in growth yield and melanin biosynthesis between the two populations grown under 2% NaCl stress suggested that this genomic island contributes to the observed differences in melanin accumulation. Our findings provide a better understanding of the genetic and evolutionary mechanisms underlying the adaptation to saline conditions of the L. rhizohalophila–S. salsa symbiosis.
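As a rough illustration of what a pairwise fixation index measures, the sketch below computes the textbook heterozygosity-based form, Fst = (H_T - H_S) / H_T, for one biallelic SNP in two equally weighted populations. The abstract does not say which estimator the study used, so this should be read as an assumed, simplified definition rather than the authors' method. The function name is hypothetical.

```python
import math

def fst_biallelic(p1, p2):
    """Fst for one biallelic SNP from allele frequencies in two populations.

    Assumes equal population weights and uses expected heterozygosity 2p(1-p).
    """
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # value for the pooled population
    if h_t == 0:
        return 0.0  # the site is fixed for the same allele in both populations
    return (h_t - h_s) / h_t

for freqs in [(0.3, 0.3), (0.2, 0.8), (0.0, 1.0)]:
    print(freqs, round(fst_biallelic(*freqs), 3))
```

Identical frequencies give 0 and fixed alternative alleles give 1. A genome scan typically summarizes values like this over windows of SNPs, which is how a region of unusually high divergence can stand out.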


Forests ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 222
Author(s):  
Bartosz Ulaszewski ◽  
Joanna Meger ◽  
Jaroslaw Burczyk

Next-generation sequencing of reduced representation genomic libraries (RRL) is capable of providing large numbers of genetic markers for population genetic studies at relatively low cost. However, one major concern with these types of markers is the precision of genotyping, which is related to the common problem of missing data and appears to be particularly important in association and genomic selection studies. We evaluated three RRL approaches (GBS, RADseq, ddRAD) and different SNP identification methods (de novo or based on a reference genome) to find the best solutions for future population genomics studies in two economically and ecologically important broadleaved tree species, namely Fagus sylvatica and Quercus robur. We found that the ddRAD method coupled with SNP calling based on reference genomes provided the largest numbers of markers (28 k and 36 k for beech and oak, respectively), given standard filtering criteria. Using technical replicates of samples, we demonstrated that more than 80% of SNP loci should be considered reliable markers in GBS and ddRAD, but not in RADseq data. According to the reference genomes’ annotations, more than 30% of the identified ddRAD loci appeared to be related to genes. Our findings provide solid support for using ddRAD-based SNPs in future population genomics studies in beech and oak.


2019 ◽  
Vol 28 (15) ◽  
pp. 2501-2513 ◽  
Author(s):  
Jacqueline A C Goos ◽  
Walter K Vogel ◽  
Hana Mlcochova ◽  
Christopher J Millard ◽  
Elahe Esfandiari ◽  
...  

Abstract Craniosynostosis, the premature ossification of cranial sutures, is a developmental disorder of the skull vault, occurring in approximately 1 in 2250 births. The causes are heterogeneous, with a monogenic basis identified in ~25% of patients. Using whole-genome sequencing, we identified a novel, de novo variant in BCL11B, c.7C>A, encoding an R3S substitution (p.R3S), in a male patient with coronal suture synostosis. BCL11B is a transcription factor that interacts directly with the nucleosome remodelling and deacetylation complex (NuRD) and polycomb-related complex 2 (PRC2) through the invariant proteins RBBP4 and RBBP7. The p.R3S substitution occurs within a conserved amino-terminal motif (RRKQxxP) of BCL11B and reduces interaction with both transcriptional complexes. Equilibrium binding studies and molecular dynamics simulations show that the p.R3S substitution disrupts ionic coordination between BCL11B and the RBBP4–MTA1 complex, a subassembly of the NuRD complex, and increases the conformational flexibility of Arg-4, Lys-5 and Gln-6 of BCL11B. These alterations collectively reduce the affinity of BCL11B p.R3S for the RBBP4–MTA1 complex by nearly an order of magnitude. We generated a mouse model of the BCL11B p.R3S substitution using a CRISPR-Cas9-based approach, and we report herein that these mice exhibit craniosynostosis of the coronal suture, as well as other cranial sutures. This finding provides strong evidence that the BCL11B p.R3S substitution is causally associated with craniosynostosis and confirms an important role for BCL11B in the maintenance of cranial suture patency.


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Vittorino Lanzio ◽  
Gregory Telian ◽  
Alexander Koshelev ◽  
Paolo Micheletti ◽  
Gianni Presti ◽  
...  

Abstract The combination of electrophysiology and optogenetics enables the exploration of how the brain operates down to a single neuron and its network activity. Neural probes are in vivo invasive devices that integrate sensors and stimulation sites to record and manipulate neuronal activity with high spatiotemporal resolution. State-of-the-art probes are limited by tradeoffs involving their lateral dimension, number of sensors, and ability to access independent stimulation sites. Here, we realize a highly scalable probe that features three-dimensional integration of small-footprint arrays of sensors and nanophotonic circuits to scale the density of sensors per cross-section by one order of magnitude with respect to state-of-the-art devices. For the first time, we overcome the spatial limit of the nanophotonic circuit by coupling only one waveguide to numerous optical ring resonators as passive nanophotonic switches. With this strategy, we achieve accurate on-demand light localization while avoiding spatially demanding bundles of waveguides and demonstrate the feasibility with a proof-of-concept device and its scalability towards high-resolution and low-damage neural optoelectrodes.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Ryo Matsuzaki ◽  
Shigekatsu Suzuki ◽  
Haruyo Yamaguchi ◽  
Masanobu Kawachi ◽  
Yu Kanesaki ◽  
...  

Abstract Background: Pyrenoids are protein microcompartments composed mainly of Rubisco that are localized in the chloroplasts of many photosynthetic organisms. Pyrenoids contribute to the CO2-concentrating mechanism. This organelle has been lost many times during algal/plant evolution, including with the origin of land plants. The molecular basis of the evolutionary loss of pyrenoids is a major topic in evolutionary biology. Recently, it was hypothesized that pyrenoid formation is controlled by the hydrophobicity of the two helices on the surface of the Rubisco small subunit (RBCS), but the relationship between hydrophobicity and pyrenoid loss during the evolution of closely related algal/plant lineages has not been examined. Here, we focused on the Reticulata group of the unicellular green algal genus Chloromonas, within which pyrenoids are present in some species but absent in closely related species. Results: Based on de novo transcriptome analysis and Sanger sequencing of cloned reverse transcription-polymerase chain reaction products, rbcS sequences were determined from 11 strains of two pyrenoid-lacking and three pyrenoid-containing species of the Reticulata group. We found that the hydrophobicity of the RBCS helices was roughly correlated with the presence or absence of pyrenoids within the Reticulata group and that a decrease in the hydrophobicity of the RBCS helices may have primarily caused pyrenoid loss during the evolution of this group. Conclusions: Although we suggest that the observed correlation may only exist for the Reticulata group, this is still an interesting study that provides novel insight into a potential mechanism determining initial evolutionary steps of gain and loss of the pyrenoid.

