DiscoSnp-RAD: de novo detection of small variants for RAD-Seq population genomics

PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e9291
Author(s):  
Jérémy Gauthier ◽  
Charlotte Mouden ◽  
Tomasz Suchan ◽  
Nadir Alvarez ◽  
Nils Arrigo ◽  
...  

Restriction site Associated DNA Sequencing (RAD-Seq) is a technique that sequences specific loci along the genome. It is widely employed in evolutionary biology because it makes it possible to exploit variant information (mainly single nucleotide polymorphisms, SNPs) from entire populations at a reduced cost. Common RAD-dedicated tools, such as STACKS or IPyRAD, are based on all-vs-all read alignments, which require considerable time and computing resources. We present an original method, DiscoSnp-RAD, that avoids this pitfall: variants are detected by exploiting specific parts of the assembly graph built from the reads, which avoids all-vs-all read alignments. We tested the implementation on simulated datasets of increasing size, up to 1,000 samples, and on real RAD-Seq data from 259 specimens of Chiastocheta flies, morphologically assigned to seven species. All individuals were successfully assigned to their species using both STRUCTURE and Maximum Likelihood phylogenetic reconstruction. Moreover, the identified variants revealed a within-species genetic structure linked to the geographic distribution. Furthermore, our results show that DiscoSnp-RAD is significantly faster than state-of-the-art tools. Overall, the results show that DiscoSnp-RAD is suitable for identifying variants from RAD-Seq data, that it does not require time-consuming parameterization steps, and that it stands out from other tools because of its completely different principle, which makes it substantially faster, in particular on large datasets.

2017 ◽  
Author(s):  
Jérémy Gauthier ◽  
Charlotte Mouden ◽  
Tomasz Suchan ◽  
Nadir Alvarez ◽  
Nils Arrigo ◽  
...  

Abstract We present an original method to call variants de novo for Restriction site Associated DNA Sequencing (RAD-Seq). RAD-Seq is a technique that sequences specific loci along the genome and is widely employed in evolutionary biology because it makes it possible to exploit variant information (mainly SNPs) from entire populations at a reduced cost. Common RAD-dedicated tools, such as STACKS or IPyRAD, are based on all-versus-all read comparisons, which require considerable time and computing resources. Based on the variant caller DiscoSnp, initially designed for shotgun sequencing, DiscoSnp-RAD avoids this pitfall because variants are detected by exploring the de Bruijn graph built from all the read datasets. We tested the implementation on RAD data from 259 specimens of Chiastocheta flies, morphologically assigned to 7 species. All individuals were successfully assigned to their species using both STRUCTURE and Maximum Likelihood phylogenetic reconstruction. Moreover, the identified variants revealed a within-species structure and the existence of two populations linked to their geographic distributions. Furthermore, our results show that DiscoSnp-RAD is at least one order of magnitude faster than state-of-the-art tools. Overall, the results show that DiscoSnp-RAD is suitable for identifying variants from RAD data and stands out from other tools because of its completely different principle, which makes it significantly faster, in particular on large datasets. License: GNU Affero General Public License. Availability: https://github.com/GATB/[email protected]
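To illustrate the general idea of finding variants in a de Bruijn graph rather than by comparing reads to each other, the following is a minimal Python sketch. It is not the DiscoSnp-RAD implementation: the function names are hypothetical, and it deliberately ignores reverse complements, sequencing errors, k-mer coverage, indels and close SNPs. It only shows that an isolated SNP appears as a "bubble": a k-mer with two successors whose two branches each run through k distinct k-mers and then converge on the same k-mer.

```python
def build_kmer_set(reads, k):
    """Collect every k-mer seen in the reads (these are the graph nodes)."""
    kmer_set = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer_set.add(read[i:i + k])
    return kmer_set


def successors(kmer, kmer_set):
    """Nodes reachable from `kmer` by a (k-1)-overlap, in A, C, G, T order."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in kmer_set]


def find_snp_bubbles(reads, k):
    """Return (allele_a, allele_b) sequence pairs for isolated-SNP bubbles.

    Illustrative assumption: a bubble starts at a node with exactly two
    successors; each branch is followed through non-branching nodes and
    must reach the same closing node after k steps.
    """
    kmer_set = build_kmer_set(reads, k)
    bubbles = []
    for start in sorted(kmer_set):
        branches = successors(start, kmer_set)
        if len(branches) != 2:
            continue
        paths = []
        for node in branches:
            path = [node]
            for _ in range(k):
                nxt = successors(path[-1], kmer_set)
                if len(nxt) != 1:
                    break  # dead end or further branching: not a simple bubble
                path.append(nxt[0])
            paths.append(path)
        path_a, path_b = paths
        if len(path_a) != k + 1 or len(path_b) != k + 1:
            continue
        if path_a[-1] != path_b[-1]:
            continue  # the two branches never converge
        # Spell each allele: the start k-mer plus the last base of each node.
        allele_a = start + "".join(n[-1] for n in path_a)
        allele_b = start + "".join(n[-1] for n in path_b)
        diffs = [i for i, (x, y) in enumerate(zip(allele_a, allele_b)) if x != y]
        if diffs == [k]:  # exactly one mismatch, right after the start k-mer
            bubbles.append((allele_a, allele_b))
    return bubbles


if __name__ == "__main__":
    # Two toy "reads" that differ at a single position.
    toy_reads = ["GATTACACGGCTAAG", "GATTACATGGCTAAG"]
    for a, b in find_snp_bubbles(toy_reads, k=4):
        print(a, b)
```

Because the graph is built once from all reads, the cost of this kind of search grows with the number of distinct k-mers rather than with the number of read pairs, which is the intuition behind avoiding all-versus-all read comparisons.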


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Mochamad Syaifudin ◽  
Michaël Bekaert ◽  
John B. Taggart ◽  
Kerry L. Bartie ◽  
Stefanie Wehner ◽  
...  

Abstract Tilapias (family Cichlidae) are of importance in aquaculture and fisheries. Hybridisation and introgression are common within tilapia genera but are difficult to analyse due to limited numbers of species-specific genetic markers. We tested the potential of double digested restriction-site associated DNA (ddRAD) sequencing for discovering single nucleotide polymorphism (SNP) markers to distinguish between 10 tilapia species. Analysis of ddRAD data revealed 1,371 shared SNPs in the de novo-based analysis and 1,204 SNPs in the reference-based analysis. Phylogenetic trees based on these two analyses were very similar. A total of 57 species-specific SNP markers were found among the samples analysed of the 10 tilapia species. Another set of 62 species-specific SNP markers was identified from a subset of four species which have often been involved in hybridisation in aquaculture: 13 for Oreochromis niloticus, 23 for O. aureus, 12 for O. mossambicus and 14 for O. u. hornorum. A panel of 24 SNPs was selected to distinguish among these four species and validated using 91 individuals. Larger numbers of SNP markers were found that could distinguish between the pairs of species within this subset. This technique offers potential for the investigation of hybridisation and introgression among tilapia species in aquaculture and in wild populations.


Genes ◽  
2018 ◽  
Vol 9 (10) ◽  
pp. 481
Author(s):  
Max Robinson ◽  
Gustavo Glusman

Genetic testing has expanded out of the research laboratory into medical practice and the direct-to-consumer market. Rapid analysis of the resulting genotype data now has a significant impact. We present a method for summarizing personal genotypes as ‘genotype fingerprints’ that meets these needs. Genotype fingerprints can be derived from any single nucleotide polymorphism-based assay, and remain comparable as chip designs evolve to higher marker densities. We demonstrate that, through extremely rapid comparisons, these fingerprints support distinguishing types of relationships among closely related individuals, distinguishing closely related individuals from other individuals in the same background population, high-throughput identification of identical genotypes, identification of individuals in known background populations, and de novo separation of subpopulations within a large cohort. Although fingerprints do not preserve anonymity, they provide a useful degree of privacy by summarizing a genotype while preventing reconstruction of individual marker states. Genotype fingerprints are therefore well-suited as a format for public aggregation of genetic information to support ancestry and relatedness determination without revealing personal health risk status.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Md Asaduzzaman ◽  
Md A. Wahab ◽  
Md J. Rahman ◽  
Md Nahiduzzaman ◽  
Malcolm W. Dickson ◽  
...  

Abstract The anadromous Hilsa shad (Tenualosa ilisha) lives in the Bay of Bengal and migrates to estuaries and freshwater rivers for spawning and nursing of the juveniles. This has led to two pertinent questions: (i) do all Hilsa shad that migrate from marine to freshwater rivers come from the same population? and (ii) is there any relationship between adults and juveniles of a particular habitat? To address these questions, NextRAD sequencing was applied to genotype 31,276 single nucleotide polymorphism (SNP) loci for 180 individuals collected from six strategic locations of riverine, estuarine and marine habitats. The FST-based OutFLANK approach identified 14,815 SNP loci as putatively neutral and 79 SNP loci as putatively adaptive. We observed that divergent local adaptations in differing environmental habitats have divided Hilsa shad into three genetically structured ecotypes: turbid freshwater (Western Riverine), clear freshwater (Eastern Riverine) and brackish-saline (Southern Estuarine-Marine). Our results also revealed that genes involved in neuronal activity may have helped juvenile Hilsa shad return to their respective natal rivers for spawning. This study emphasizes the application of fundamental population genomics information in planning the conservation and management of anadromous fish such as Hilsa shad that intersect diverse ecotypes during their life-history stages.


2019 ◽  
Author(s):  
Emeline Deleury ◽  
Thomas Guillemaud ◽  
Aurélie Blin ◽  
Eric Lombaert

Abstract Exon capture coupled to high-throughput sequencing constitutes a cost-effective technical solution for addressing specific questions in evolutionary biology by focusing on expressed regions of the genome preferentially targeted by selection. Transcriptome-based capture, a process that can be used to capture the exons of non-model species, is used in phylogenomics. However, its use in population genomics remains rare due to the high costs of sequencing large numbers of indexed individuals across multiple populations. We evaluated the feasibility of combining transcriptome-based capture and the pooling of tissues from numerous individuals for DNA extraction as a cost-effective, generic and robust approach to estimating the variant allele frequencies of any species at the population level. We designed capture probes for ∼5 Mb of chosen de novo transcripts from the Asian ladybird Harmonia axyridis (5,717 transcripts). We called ∼300,000 bi-allelic SNPs for a pool of 36 non-indexed individuals. Capture efficiency was high, and pool-seq was as effective and accurate as individual-seq for detecting variants and estimating allele frequencies. Finally, we also evaluated an approach for simplifying bioinformatic analyses by mapping genomic reads directly to targeted transcript sequences to obtain coding variants. This approach is effective and does not affect the estimation of SNP allele frequencies, except for a small bias close to some exon ends. We demonstrate that this approach can also be used to predict the intron-exon boundaries of targeted de novo transcripts, making it possible to eliminate genotyping biases near exon ends.


2021 ◽  
Vol 12 ◽  
Author(s):  
Meiying Cai ◽  
Xianguo Fu ◽  
Liangpu Xu ◽  
Na Lin ◽  
Hailong Huang

Smith-Magenis syndrome and Potocki-Lupski syndrome are rare autosomal dominant diseases. Although clinical phenotypes of adults and children have been reported, fetal ultrasonic phenotypes are rarely reported. A retrospective analysis of 6,200 pregnant women who received invasive prenatal diagnosis at Fujian Provincial Maternal and Child Health Hospital between October 2016 and January 2021 was performed. Amniotic fluid or umbilical cord blood was extracted for karyotyping and single nucleotide polymorphism array analysis. Single nucleotide polymorphism array analysis revealed six fetuses with copy number variant changes in the 17p11.2 region. Among them, one had a copy number variant microdeletion in the 17p11.2 region, which was pathogenically analyzed and diagnosed as Smith-Magenis syndrome. Five fetuses had copy number variant microduplications in the 17p11.2 region, which were pathogenically analyzed and diagnosed as Potocki-Lupski syndrome. The prenatal ultrasound phenotypes of the six fetuses were varied. The parents of two fetuses with Potocki-Lupski syndrome refused verification. Smith-Magenis syndrome in one fetus and Potocki-Lupski in another were confirmed as de novo. Potocki-Lupski syndrome in two fetuses was confirmed to be from maternal inheritance. The prenatal ultrasound phenotypes of Smith-Magenis syndrome and Potocki-Lupski syndrome in fetuses vary; single nucleotide polymorphism array analysis is a powerful diagnostic tool for these diseases. The ultrasonic phenotypes of these cases may enrich the clinical database.


Blood ◽  
2011 ◽  
Vol 118 (21) ◽  
pp. 3550-3550
Author(s):  
Sanidad A Marc ◽  
Marilyn L Slovak ◽  
Philip N Mowry ◽  
Joey C Kelly ◽  
Daniel M Jones

Abstract 3550 Introduction: The genetic loci altered in many de novo leukemia cases are relatively well-understood and can be accurately assessed by current cytogenetic techniques including multi-probe fluorescence in situ hybridization (FISH). However, identifying the cancer genes involved in complex leukemia karyotypes remains problematic due to the presence of multiple secondary structural rearrangements observed in subclonal populations. These alterations often affect both chromosome (chr) homologues and predominantly involve chr 1, 3, 5, 7, 12 and 17. Such clonal diversity within a tumor reflects the underlying biologically-selected sequential and multiple rearrangements and can, if carefully mapped, highlight the locations of tumor suppressor genes and modifiers involved in disease progression. Previous generations of DNA microarrays have proven useful in dissecting genomic changes in the predominant tumor clone, including copy-neutral loss of heterozygosity (CN-LOH) when single nucleotide polymorphism (SNP) arrays are used. However, a well-known shortcoming of DNA microarrays to date has been their limited sensitivity for accurately detecting low level mosaicism (<20%) and subclonal changes that are common in complex karyotypes. Methods: Using leukemia cases that showed complex karyotypes with up to 4 subclones, we compared the ability of standard (SNP 6.0, Affymetrix) and next-generation (Cytoscan HD, Affymetrix) SNP/copy number oligonucleotide arrays to accurately detect the observed karyotypic subclones and more precisely delineate areas of complex chromosomal alterations. Genomic DNA extracted from fresh material or 24–48 hour short-term cultures from 8 patients with either de novo or previously treated chronic lymphocytic leukemia (CLL) was assessed on the SNP 6.0 and Cytoscan HD platforms and then compared with their karyotype and/or supporting FISH studies. Copy number alterations and CN-LOH calls were made using ChAS software (Affymetrix), with the degree of clonal mosaicism analyzed for segmental increments of each chromosome by averaging the smooth signal data. Results and Conclusion: For all 53 CN-LOH and copy number calls, the two arrays gave identical detection rates and similar alteration boundaries in 34 instances (64.1% concordance). The genetic alterations that differed among the cytogenetically-related clones (subclones) were subclonal, in all but 3 instances, and most frequently involved chr 1 and 5. In general, the Cytoscan HD arrays were able to accurately detect karyotypically-confirmed subclones down to the 20% level (as well as distinguishing 90% vs. 100% calls), as opposed to the 30–50% level seen with the SNP 6.0 arrays. Improved detection of the discrete subclones or lower level clonality was attributed to more precise allele peak heights that did not require smoothing. Next-generation SNP/copy number oligonucleotide arrays show great promise in providing additive value to leukemic genomic profiling by clear visual separation of multiple genomic alterations within clonally diverse samples with the potential of identifying novel genetic alterations that may be important in disease progression. Disclosures: No relevant conflicts of interest to declare.

