SRG extractor: a skinny reference genome approach for reduced-representation sequencing

Davoud Torkamaneh; Jérôme Laroche; Istvan Rajcan; François Belzile

doi:10.1093/bioinformatics/btz043

Bioinformatics ◽

10.1093/bioinformatics/btz043 ◽

2019 ◽

Vol 35 (17) ◽

pp. 3160-3162

Author(s):

Davoud Torkamaneh ◽

Jérôme Laroche ◽

Istvan Rajcan ◽

François Belzile

Keyword(s):

Reference Genome ◽

Computing Time ◽

Supplementary Information ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Scanning Method ◽

Reduced Representation ◽

A Genome ◽

Wide Range ◽

Reduced Representation Sequencing

Abstract. Motivation: Reduced-representation sequencing is a genome-wide scanning method for simultaneous discovery and genotyping of thousands to millions of single nucleotide polymorphisms that is used across a wide range of species. However, in this method a reproducible but very small fraction of the genome is captured for sequencing, while the resulting reads are typically aligned against the entire reference genome. Results: Here we present a skinny reference genome approach in which a simplified reference genome is used to decrease computing time for data processing and to increase single nucleotide polymorphism counts and accuracy. A skinny reference genome can be integrated into any reduced-representation sequencing analytical pipeline. Availability and implementation: https://bitbucket.org/jerlar73/SRG-Extractor. Supplementary information: Supplementary data are available at Bioinformatics online.
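The abstract does not describe how the simplified reference is built, so the following is only a minimal sketch of one plausible reading: keep the reference regions that reads actually cover, plus a small flank, and discard the rest. The function names, the 0-based half-open coordinates, and the `flank` parameter are illustrative assumptions, not taken from SRG-Extractor.

```python
def merge_intervals(intervals, flank, chrom_len):
    """Pad each (start, end) interval by `flank` and merge overlaps.

    Coordinates are 0-based and half-open (an assumption for this sketch).
    """
    padded = sorted(
        (max(0, start - flank), min(chrom_len, end + flank))
        for start, end in intervals
    )
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def skinny_reference(reference, covered, flank=0):
    """Build a reduced reference from covered regions only.

    reference: dict mapping chromosome name -> sequence string
    covered:   dict mapping chromosome name -> list of (start, end) intervals
    Returns a dict mapping "chrom:start-end" -> extracted subsequence.
    """
    records = {}
    for chrom, seq in reference.items():
        regions = merge_intervals(covered.get(chrom, []), flank, len(seq))
        for start, end in regions:
            records[f"{chrom}:{start}-{end}"] = seq[start:end]
    return records
```

Naming each record by its original coordinates keeps it possible to translate variant positions back to the full genome afterwards.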


DepthFinder: a tool to determine the optimal read depth for reduced-representation sequencing

Bioinformatics ◽

10.1093/bioinformatics/btz473 ◽

2019 ◽

Vol 36 (1) ◽

pp. 26-32

Author(s):

Davoud Torkamaneh ◽

Jérôme Laroche ◽

Brian Boyle ◽

François Belzile

Keyword(s):

Cost Effective ◽

Read Depth ◽

Supplementary Information ◽

Nucleotide Polymorphisms ◽

Reduced Representation ◽

Sequencing Platform ◽

Genome Complexity ◽

Wide Range ◽

Broad Array ◽

Reduced Representation Sequencing

Abstract. Motivation: Identification of DNA sequence variations such as single nucleotide polymorphisms (SNPs) is a fundamental step in genetic studies. Reduced-representation sequencing methods have been developed as alternatives to whole genome sequencing to reduce costs and enable the analysis of many more individuals. Among these methods, restriction site associated sequencing (RSAS) methodologies have been widely used for rapid and cost-effective discovery of SNPs and for high-throughput genotyping in a wide range of species. Despite the extensive improvements of the RSAS methods in the last decade, the estimation of the number of reads (i.e. read depth) required per sample for efficient and effective genotyping remains mostly based on trial and error. Results: Herein we describe a bioinformatics tool, DepthFinder, designed to estimate the required read counts for RSAS methods. To illustrate its performance, we estimated required read counts in six different species (human, cattle, spruce budworm, salmon, barley and soybean) that cover a range of different biological (genome size, level of genome complexity, level of DNA methylation and ploidy) and technical (library preparation protocol and sequencing platform) factors. To assess the prediction accuracy of DepthFinder, we compared DepthFinder-derived results with independent datasets obtained from an RSAS experiment. This analysis yielded estimated accuracies of nearly 94%. Moreover, we present DepthFinder as a powerful tool to predict the most effective size selection interval in RSAS work. We conclude that DepthFinder constitutes an efficient, reliable and useful tool for a broad array of users in different research communities. Availability and implementation: https://bitbucket.org/jerlar73/DepthFinder. Supplementary information: Supplementary data are available at Bioinformatics online.
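The kind of estimate described here can be illustrated with a back-of-the-envelope calculation: digest a sequence in silico, count the fragments that fall inside a size-selection window, and multiply by a target depth. This is a simplified sketch under stated assumptions (a single enzyme, cuts placed at the start of each recognition site, one read per fragment per unit of depth); it is not DepthFinder's actual model, and all names are hypothetical.

```python
import re


def digest(sequence, site):
    """Return fragment lengths after cutting at the start of each `site`.

    A lookahead is used so that overlapping recognition sites are all found.
    """
    cuts = [m.start() for m in re.finditer(f"(?={re.escape(site)})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    # Drop zero-length fragments (e.g. when the sequence starts with a site).
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def required_reads(sequence, site, min_len, max_len, depth):
    """Estimate reads per sample: size-selected fragments times target depth."""
    selected = [n for n in digest(sequence, site) if min_len <= n <= max_len]
    return len(selected) * depth
```

Varying `min_len` and `max_len` in such a calculation shows how the size-selection interval changes the number of loci, and therefore the reads needed.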


simuG: a general-purpose genome simulator

Bioinformatics ◽

10.1093/bioinformatics/btz424 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4442-4444 ◽

Cited By ~ 10

Author(s):

Jia-Xing Yue ◽

Gianni Liti

Keyword(s):

Copy Number Variants ◽

General Purpose ◽

Supplementary Information ◽

Nucleotide Polymorphisms ◽

Bioinformatics Analyses ◽

Full Spectrum ◽

Single Nucleotide ◽

Genomic Variants ◽

Wide Range ◽

Simulation Based

Abstract. Summary: Simulated genomes with pre-defined and random genomic variants can be very useful for benchmarking genomic and bioinformatics analyses. Here we introduce simuG, a lightweight tool for simulating the full spectrum of genomic variants (single nucleotide polymorphisms, insertions/deletions, copy number variants, inversions and translocations) for any organism (including human). The simplicity and versatility of simuG make it a unique general-purpose genome simulator for a wide range of simulation-based applications. Availability and implementation: Code in Perl along with a user manual and testing data is available at https://github.com/yjx1217/simuG. This software is free for use under the MIT license. Supplementary information: Supplementary data are available at Bioinformatics online.


Signatures of selection in a recent invasion reveals adaptive divergence in a highly vagile invasive species

10.1101/643569 ◽

2019 ◽

Cited By ~ 1

Author(s):

Adam P. A. Cardilini ◽

Katarina C. Stuart ◽

Phillip Cassey ◽

Mark F. Richardson ◽

William Sherwin ◽

...

Keyword(s):

Isolation By Distance ◽

Adaptive Divergence ◽

Candidate Snps ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Reduced Representation ◽

Introduction History ◽

Wide Range ◽

Critical Interpretation ◽

Genetic Substructure

Abstract. A detailed understanding of population genetics in invasive populations helps us to identify drivers of successful introductions. Here, we investigate putative signals of selection in Australian populations of invasive common starlings, Sturnus vulgaris, and seek to understand how these have been influenced by introduction history. We use reduced representation sequencing to determine population structure, and identify single nucleotide polymorphisms (SNPs) that are putatively under selection. We found that since their introduction into Australia, starling populations have become genetically differentiated despite the potential for high levels of dispersal, and that selection has facilitated their adaptation to the wide range of environmental conditions across their geographic range. Isolation by distance appears to have played a strong role in determining genetic substructure across the starling’s Australian range. Analyses of candidate SNPs that are putatively under selection indicate that aridity, precipitation, and temperature may be important factors driving adaptive variation across the starling’s invasive range in Australia. However, we also note that the historic introduction regime may leave footprints on sites flagged as being under adaptive selection, and encourage critical interpretation of selection analyses.


Dementia key gene identification with multi-layered SNP-gene-disease network

Bioinformatics ◽

10.1093/bioinformatics/btaa814 ◽

2020 ◽

Vol 36 (Supplement_2) ◽

pp. i831-i839

Author(s):

Dong-gi Lee ◽

Myungjun Kim ◽

Sang Joon Son ◽

Chang Hyung Hong ◽

Hyunjung Shin

Keyword(s):

Candidate Genes ◽

Learning Algorithm ◽

Search Space ◽

Supplementary Information ◽

Gene Identification ◽

Nucleotide Polymorphisms ◽

Disease Network ◽

Single Nucleotide ◽

Key Genes ◽

Significant Attention

Abstract. Motivation: Recently, various approaches for diagnosing and treating dementia have received significant attention, especially in identifying key genes that are crucial for dementia. If the mutations of such key genes could be tracked, it would be possible to predict the time of onset of dementia and significantly aid in developing drugs to treat dementia. However, gene finding involves tremendous cost, time and effort. To alleviate these problems, research on using computational biology to decrease the search space of candidate genes is being actively conducted. In this study, we propose a framework in which diseases, genes and single-nucleotide polymorphisms are represented by a layered network, and key genes are predicted by a machine learning algorithm. The algorithm utilizes a network-based semi-supervised learning model that can be applied to layered data structures. Results: The proposed method was applied to a dataset extracted from public databases related to diseases and genes with data collected from 186 patients. A portion of the key genes obtained using the proposed method was verified in silico through PubMed literature, and the remaining genes were left as possible candidate genes. Availability and implementation: The code for the framework will be available at http://www.alphaminers.net/. Supplementary information: Supplementary data are available at Bioinformatics online.
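To make "network-based semi-supervised learning" concrete, here is a minimal sketch of generic label propagation on a small graph, where scores spread from known nodes to their neighbours. It is a textbook-style illustration, not the authors' layered model; the function name, the `alpha` damping parameter, and the fixed iteration count are assumptions.

```python
def propagate(adjacency, labels, alpha=0.8, iterations=50):
    """Spread label scores over a graph by repeated neighbour averaging.

    adjacency: square list of lists with non-negative edge weights
    labels:    initial scores (1.0 for known key nodes, 0.0 otherwise)
    alpha:     weight on neighbour scores versus the initial labels
    """
    n = len(adjacency)
    # Row-normalise so that each node averages over its neighbours.
    weights = []
    for row in adjacency:
        total = sum(row)
        weights.append([w / total if total else 0.0 for w in row])
    scores = list(labels)
    for _ in range(iterations):
        scores = [
            alpha * sum(weights[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * labels[i]
            for i in range(n)
        ]
    return scores
```

In a layered setting, the nodes could be diseases, genes and SNPs in one combined graph, with unlabeled genes ranked by their final score.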


Bisphosphonate-related osteonecrosis of the jaw is associated with polymorphisms of the cytochrome P450 CYP2C8 in multiple myeloma: a genome-wide single nucleotide polymorphism analysis

Blood ◽

10.1182/blood-2008-04-147884 ◽

2008 ◽

Vol 112 (7) ◽

pp. 2709-2712 ◽

Cited By ~ 161

Author(s):

Maria E. Sarasquete ◽

Ramon García-Sanz ◽

Luis Marín ◽

Miguel Alcoceba ◽

Maria C. Chillón ◽

...

Keyword(s):

Multiple Myeloma ◽

Cytochrome P450 ◽

Genome Wide Association Study ◽

Osteonecrosis Of The Jaw ◽

Bisphosphonate Therapy ◽

Polymorphism Analysis ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Genome Wide ◽

A Genome

Abstract. We have explored the potential role of genetics in the development of osteonecrosis of the jaw (ONJ) in multiple myeloma (MM) patients under bisphosphonate therapy. A genome-wide association study was performed using 500 568 single nucleotide polymorphisms (SNPs) in 2 series of homogeneously treated MM patients, one with ONJ (22 MM cases) and another without ONJ (65 matched MM controls). Four SNPs (rs1934951, rs1934980, rs1341162, and rs17110453) mapped within the cytochrome P450-2C gene (CYP2C8) showed a different distribution between cases and controls with statistically significant differences (P = 1.07 × 10^-6, P = 4.231 × 10^-6, P = 6.22 × 10^-6, and P = 2.15 × 10^-6, respectively). SNP rs1934951 was significantly associated with a higher risk of ONJ development even after Bonferroni correction (P corrected value = .02). Genotyping results displayed an overrepresentation of the T allele in cases compared with controls (48% vs 12%). Thus, individuals homozygous for the T allele had an increased likelihood of developing ONJ (odds ratio 12.75, 95% confidence interval 3.7-43.5).


Haplotypes within tandemly duplicated candidate genes at BnaA9.MRP5 modulate phytate concentration in canola (Brassica napus L.)

10.22541/au.161165160.08519096/v1 ◽

2021 ◽

Author(s):

Haijiang Liu ◽

Xiaojuan Li ◽

Qianwen Zhang ◽

Pan Yuan ◽

Lei Liu ◽

...

Keyword(s):

Candidate Genes ◽

Oilseed Rape ◽

Genome Wide Association Study ◽

Mineral Elements ◽

Nucleotide Polymorphisms ◽

Brassica Napus L ◽

Single Nucleotide ◽

Genome Wide ◽

A Genome ◽

Phytate Content

Phytate is the storage form of phosphorus in angiosperm seeds and plays vitally important roles during seed development. In crop plants, however, phytate decreases the bioavailability of seed-sourced mineral elements for humans, livestock and poultry, and contributes to phosphate-related water pollution. Little is known about this trait in oilseed rape (B. napus). Here, a panel of 505 diverse B. napus accessions was screened in a genome-wide association study (GWAS) using 3.28 × 10^6 single nucleotide polymorphisms (SNPs). This screen identified 119 SNPs significantly associated with phytate concentration (PA_Conc) and phytate content (PA_Cont), as well as six candidate genes. Of these, BnaA9.MRP5 represented the candidate gene for the significant SNP chrA09_5198034 (27 kb) for both PA_Cont and PA_Conc. Transcription of BnaA9.MRP5 in a low-phytate variety (LPA20) was significantly elevated compared with a high-phytate variety (HPA972). Association and haplotype analysis indicated that inbred lines carrying specific SNP haplotypes within BnaA9.MRP5 were associated with high- and low-phytate phenotypes. No significant differences in seed germination and seed yield were detected between the low- and high-phytate cultivars examined. Candidate genes, favorable haplotypes and the low-phytate varieties identified in this study will be useful for low-phytate breeding of B. napus.


Three New Single Nucleotide Polymorphisms Identified by a Genome-Wide Association Study in Korean Patients with Vitiligo

Journal of Korean Medical Science ◽

10.3346/jkms.2013.28.5.775 ◽

2013 ◽

Vol 28 (5) ◽

pp. 775 ◽

Cited By ~ 11

Author(s):

Kyung Ah Cheong ◽

Nan-Hyung Kim ◽

Minsoo Noh ◽

Ai-Young Lee

Keyword(s):

Single Nucleotide Polymorphisms ◽

Association Study ◽

Genome Wide Association Study ◽

Genome Wide Association ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Korean Patients ◽

Genome Wide ◽

A Genome


Lifestyle factors and genetic variants on two biological age measures: evidence from 94,443 Taiwan Biobank participants

The Journals of Gerontology Series A ◽

10.1093/gerona/glab251 ◽

2021 ◽

Author(s):

Wan-Yu Lin

Keyword(s):

Genetic Variants ◽

Genome Wide Association Study ◽

Lifestyle Factors ◽

Biological Age ◽

Biological Aging ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Genome Wide ◽

A Genome ◽

The Uk

Abstract. Background: Biological age (BA) can be estimated by phenotypes and is useful for predicting lifespan and healthspan. Levine et al. proposed a PhenoAge and a BioAge to measure BA. Although studies have investigated the genetic predisposition to BA acceleration in Europeans, little is known regarding this topic in Asians. Methods: Here I estimated PhenoAgeAccel (age-adjusted PhenoAge) and BioAgeAccel (age-adjusted BioAge) of 94,443 Taiwan Biobank (TWB) participants, wherein 25,460 TWB1 subjects formed a discovery cohort and 68,983 TWB2 individuals formed a replication cohort. Lifestyle factors and genetic variants associated with PhenoAgeAccel and BioAgeAccel were investigated through regression analysis and a genome-wide association study (GWAS). Results: A one-unit (kg/m^2) increase in BMI was associated with a 0.177-year PhenoAgeAccel (95% C.I. = 0.163~0.191, p = 6.0×) and 0.171-year BioAgeAccel (95% C.I. = 0.165~0.177, p = 0). Smokers on average had a 1.134-year PhenoAgeAccel (95% C.I. = 0.966~1.303, p = 1.3×) compared with non-smokers. Drinkers on average had a 0.640-year PhenoAgeAccel (95% C.I. = 0.433~0.847, p = 1.3×) and 0.193-year BioAgeAccel (95% C.I. = 0.107~0.279, p = 1.1×) relative to non-drinkers. A total of 11 and 4 single-nucleotide polymorphisms (SNPs) were associated with PhenoAgeAccel and BioAgeAccel (p<5× in both TWB1 and TWB2), respectively. Conclusions: A PhenoAgeAccel-associated SNP (rs1260326 in GCKR) and two BioAgeAccel-associated SNPs (rs7412 in APOE; rs16998073 near FGF5) were consistent with the finding from the UK Biobank. The lifestyle analysis shows that prevention of obesity, cigarette smoking, and alcohol consumption is associated with a slower rate of biological aging.


DNA steganography: hiding undetectable secret messages within the single nucleotide polymorphisms of a genome and detecting mutation-induced errors

Microbial Cell Factories ◽

10.1186/s12934-020-01387-0 ◽

2020 ◽

Vol 19 (1) ◽

Cited By ~ 1

Author(s):

Dokyun Na

Keyword(s):

Single Nucleotide Polymorphisms ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

A Genome


Progressive effects of single-nucleotide polymorphisms on 16 phenotypic traits based on longitudinal data

Genes & Genomics ◽

10.1007/s13258-019-00902-x ◽

2020 ◽

Vol 42 (4) ◽

pp. 393-403

Author(s):

Donghe Li ◽

Hahn Kang ◽

Sanghun Lee ◽

Sungho Won

Keyword(s):

Single Nucleotide Polymorphisms ◽

Longitudinal Data ◽

Density Lipoprotein ◽

Phenotypic Traits ◽

Chromosome 2 ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Longitudinal Changes ◽

Genome Wide ◽

A Genome

Abstract. Background: Many research studies have estimated the heritability of phenotypic traits, but few have considered longitudinal changes in several phenotypic traits together. Objective: To evaluate the progressive effect of single nucleotide polymorphisms (SNPs) on prominent health-related phenotypic traits by determining SNP-based heritability ($h_{snp}^{2}$) using longitudinal data. Methods: Sixteen phenotypic traits associated with major health indices were observed biennially for 6843 individuals with 10-year follow-up in a Korean community-based cohort. Average SNP heritability and longitudinal changes in the total period were estimated using a two-stage model. Average and periodic differences for each subject were considered responses to estimate SNP heritability. Furthermore, a genome-wide association study (GWAS) was performed for significant SNPs. Results: The SNP heritability for the phenotypic mean of each of the sixteen traits through 6 periods (baseline and five follow-ups) was significant. Gradually, the forced vital capacity in one second (FEV1) reflected the only significant SNP heritability among longitudinal changes at a false discovery rate (FDR)-adjusted 0.05 significance level ($h_{snp}^{2} = 0.171$, FDR = 0.0012). On estimating chromosomal heritability, chromosome 2 displayed the highest heritability upon periodic changes in FEV1. SNPs including rs2272402 and rs7209788 displayed a genome-wide significant association with longitudinal changes in FEV1 (P = 1.22 × 10^-8 for rs2272402 and P = 3.36 × 10^-7 for rs7209788). De novo variants including rs4922117 (near LPL, P = 2.13 × 10^-15) of log-transformed high-density lipoprotein (HDL) ratios and rs2335418 (near HMGCR, P = 3.2 × 10^-9) of low-density lipoprotein were detected on GWAS. Conclusion: There are significant genetic effects on longitudinal changes in FEV1 among the middle-aged general population, and chromosome 2 accounts for most of the genetic variance.
