De novo identification, differential analysis and functional annotation of SNPs from RNA-seq data in non-model species

2015 ◽  
Author(s):  
Helene Lopez Maestre ◽  
Lilia Brinza ◽  
Camille Marchet ◽  
Janice Kielbassa ◽  
Sylvere Bastien ◽  
...  

SNPs (Single Nucleotide Polymorphisms) are genetic markers whose precise identification is a prerequisite for association studies. Methods to identify them are currently well developed for model species, but rely on the availability of a (good) reference genome, and therefore cannot be applied to non-model species. They are also mostly tailored for whole genome (re-)sequencing experiments, whereas in many cases, transcriptome sequencing can be used as a cheaper alternative which already enables the identification of SNPs located in transcribed regions. In this paper, we propose a method that identifies, quantifies and annotates SNPs without any reference genome, using RNA-seq data only. Individuals can be pooled prior to sequencing if not enough material is available from one individual. Using human RNA-seq data, we first compared the performance of our method with GATK, a well-established method that requires a reference genome. We showed that both methods predict SNPs with similar accuracy. We then experimentally validated the predictions of our method using RNA-seq data from two non-model species. The method can be used for any species to annotate SNPs and predict their impact on proteins. We further enable testing for the association of the identified SNPs with a phenotype of interest.

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
M. Joseph Tomlinson ◽  
Shawn W. Polson ◽  
Jing Qiu ◽  
Juniper A. Lake ◽  
William Lee ◽  
...  

Abstract Differential abundance of allelic transcripts in a diploid organism, commonly referred to as allele specific expression (ASE), is a biologically significant phenomenon and can be examined using single nucleotide polymorphisms (SNPs) from RNA-seq. Quantifying ASE aids in our ability to identify and understand cis-regulatory mechanisms that influence gene expression, and thereby assists in identifying causal mutations. This study examines ASE in breast muscle, abdominal fat, and liver of commercial broiler chickens using variants called from a large subset of the samples (n = 68). ASE analysis was performed using custom software called VCF ASE Detection Tool (VADT), which detects ASE of biallelic SNPs using a binomial test. On average ~174,000 SNPs in each tissue passed our filtering criteria and were considered informative, of which ~24,000 (~14%) showed ASE. Of all ASE SNPs, only 3.7% exhibited ASE in all three tissues, with ~83% showing ASE specific to a single tissue. When ASE genes (genes containing ASE SNPs) were compared between tissues, the overlap among all three tissues increased to 20.1%. Our results indicate that ASE genes show tissue-specific enrichment patterns, but all three tissues showed enrichment for pathways involved in translation.
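The binomial test mentioned in this abstract can be illustrated with a minimal sketch. This is not VADT's code: the abstract does not describe its thresholds, filtering, or multiple-testing handling, so the function name `binomial_two_sided_p` and the null proportion of 0.5 (equal expression of both alleles) are assumptions made here for illustration only.

```python
from math import comb


def binomial_two_sided_p(ref_count: int, alt_count: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for an observed allelic read split.

    Hypothetical helper: under the null hypothesis, each read carries the
    reference allele with probability p (0.5 means balanced expression).
    Intended for modest read depths; at very deep coverage the direct
    p**k terms underflow and a log-space or library routine is preferable.
    """
    n = ref_count + alt_count
    if n == 0:
        return 1.0  # no informative reads, so no evidence of imbalance
    # Probability of every possible reference-read count from 0 to n.
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_count]
    # Sum all outcomes that are at least as unlikely as the observed one.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # A heterozygous SNP covered by 30 reference reads and 10 alternate reads.
    print(binomial_two_sided_p(30, 10))
```

A small p-value suggests that the two alleles are not expressed equally at that SNP; in a genome-wide scan the per-SNP p-values would still need a multiple-testing correction.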


2016 ◽  
Vol 283 (1835) ◽  
pp. 20160569 ◽  
Author(s):  
M. E. Goddard ◽  
K. E. Kemper ◽  
I. M. MacLeod ◽  
A. J. Chamberlain ◽  
B. J. Hayes

Complex or quantitative traits are important in medicine, agriculture and evolution, yet, until recently, few of the polymorphisms that cause variation in these traits were known. Genome-wide association studies (GWAS), based on the ability to assay thousands of single nucleotide polymorphisms (SNPs), have revolutionized our understanding of the genetics of complex traits. We advocate the analysis of GWAS data by a statistical method that fits all SNP effects simultaneously, assuming that these effects are drawn from a prior distribution. We illustrate how this method can be used to predict future phenotypes, to map and identify the causal mutations, and to study the genetic architecture of complex traits. The genetic architecture of complex traits is even more complex than previously thought: in almost every trait studied there are thousands of polymorphisms that explain genetic variation. Methods of predicting future phenotypes, collectively known as genomic selection or genomic prediction, have been widely adopted in livestock and crop breeding, leading to increased rates of genetic improvement.
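As a rough illustration of fitting all SNP effects simultaneously under a prior, the sketch below assumes the simplest case: a normal prior with a common variance, for which the posterior mean of the effects is the ridge-regression solution of (X'X + λI)b = X'y, with λ the ratio of residual variance to SNP-effect variance. The abstract does not state which prior the authors use, so this choice, the function names, and the toy data are assumptions. Genotypes and phenotypes are taken to be already centered, so no intercept is fitted.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def snp_effects_normal_prior(genotypes, phenotypes, shrinkage):
    """Posterior-mean SNP effects under a common normal prior (ridge solution).

    genotypes: one row per individual, one column per SNP (allele counts).
    shrinkage: assumed ratio of residual variance to SNP-effect variance.
    """
    n_snps = len(genotypes[0])
    xtx = [
        [
            sum(row[i] * row[j] for row in genotypes) + (shrinkage if i == j else 0.0)
            for j in range(n_snps)
        ]
        for i in range(n_snps)
    ]
    xty = [
        sum(row[i] * y for row, y in zip(genotypes, phenotypes))
        for i in range(n_snps)
    ]
    return solve_linear(xtx, xty)


def predict(genotype_row, effects):
    """Predicted phenotype: sum of allele counts times estimated effects."""
    return sum(g * b for g, b in zip(genotype_row, effects))


if __name__ == "__main__":
    toy_genotypes = [[1, 0], [0, 1], [1, 1], [2, 0]]  # made-up example data
    toy_phenotypes = [1.0, -0.5, 0.5, 2.0]
    effects = snp_effects_normal_prior(toy_genotypes, toy_phenotypes, shrinkage=10.0)
    print(effects, predict([1, 1], effects))
```

Because every SNP is fitted jointly, the shrinkage term keeps the system solvable even when there are more SNPs than individuals, which is the usual situation in genomic prediction.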


2010 ◽  
Vol 30 (6) ◽  
pp. 1411-1420 ◽  
Author(s):  
Jason B. Wright ◽  
Seth J. Brown ◽  
Michael D. Cole

ABSTRACT Genome-wide association studies have mapped many single-nucleotide polymorphisms (SNPs) that are linked to cancer risk, but the mechanism by which most SNPs promote cancer remains undefined. The rs6983267 SNP at 8q24 has been associated with many cancers, yet the SNP falls 335 kb from the nearest gene, c-MYC. We show that the beta-catenin-TCF4 transcription factor complex binds preferentially to the cancer risk-associated rs6983267(G) allele in colon cancer cells. We also show that the rs6983267 SNP has enhancer-related histone marks and can form a 335-kb chromatin loop to interact with the c-MYC promoter. Finally, we show that the SNP has no effect on the efficiency of chromatin looping to the c-MYC promoter but that the cancer risk-associated SNP enhances the expression of the linked c-MYC allele. Thus, cancer risk is a direct consequence of elevated c-MYC expression from increased distal enhancer activity and not from reorganization/creation of the large chromatin loop. The findings of these studies support a mechanism for intergenic SNPs that can promote cancer through the regulation of distal genes by utilizing preexisting large chromatin loops.


2020 ◽  
Author(s):  
Celine Charon ◽  
Rodrigue Allodji ◽  
Vincent Meyer ◽  
Jean-François Deleuze

Abstract Quality control methods for genome-wide association studies and fine mapping are commonly used for imputation; however, they result in the loss of many single nucleotide polymorphisms (SNPs). To investigate the consequences of filtration on imputation, we studied the direct effects on the number of markers, their allele frequencies, imputation quality scores and post-filtration events. We pre-phased 1,031 genotyped individuals from diverse ethnicities and compared the imputed variants to 1,089 NCBI recorded individuals for additional validation. Without variant pre-filtration based on quality control (QC), we observed no impairment in the imputation of SNPs that failed QC, whereas with pre-filtration there was an overall loss of information. Significant differences between frequencies with and without pre-filtration were found only in the range of very rare (5E-04-1E-03) and rare variants (1E-03-5E-03) (p < 1E-04). Increasing the post-filtration imputation quality score from 0.3 to 0.8 reduced the number of single nucleotide variants (SNVs) <0.001 2.5-fold with or without QC pre-filtration and halved the number of very rare variants (5E-04). As a result, to maintain confidence and enough SNVs, we propose here a 2-step post-filtration approach to increase the number of very rare and rare variants compared to conservative post-filtration methods.


2021 ◽  
Author(s):  
Myung-Shin Kim ◽  
Taeyoung Lee ◽  
Jeonghun Baek ◽  
Ji Hong Kim ◽  
Changhoon Kim ◽  
...  

Abstract Massive resequencing efforts have been undertaken to catalog allelic variants in major crop species including soybean, but the scope of the information for genetic variation often depends on short sequence reads mapped to the extant reference genome. Additional de novo assembled genome sequences provide a unique opportunity to explore a dispensable genome fraction in the pan-genome of a species. Here, we report the de novo assembly and annotation of Hwangkeum, a popular soybean cultivar in Korea. The assembly was constructed using PromethION nanopore sequencing data and two genetic maps, and was then error-corrected using Illumina short reads and PacBio SMRT reads. The 933.12 Mb assembly was annotated with 79,870 transcripts for 58,550 genes using RNA-Seq data and the public soybean annotation set. Comparison of the Hwangkeum assembly with the Williams 82 soybean reference genome sequence revealed 1.8 million single-nucleotide polymorphisms, 0.5 million indels, and 25 thousand putative structural variants. However, there was no natural megabase-scale chromosomal rearrangement. Incidentally, by adding two novel groups, we found that soybean contains four clearly separated groups of centromeric satellite repeats. Analyses of satellite repeats and gene content suggested that the Hwangkeum assembly is a high-quality assembly. This was further supported by comparison of the marker arrangement of anthocyanin biosynthesis genes and of gene arrangement at the Rsv3 locus. Therefore, the results indicate that the de novo assembly of Hwangkeum is a valuable additional reference genome resource for characterizing traits for the improvement of this important crop species.


2017 ◽  
Vol 2017 ◽  
pp. 1-5 ◽  
Author(s):  
Lijun Wu ◽  
Liwang Gao ◽  
Xiaoyuan Zhao ◽  
Meixian Zhang ◽  
Jianxin Wu ◽  
...  

Purpose. Genome-wide association studies have found two obesity-related single-nucleotide polymorphisms (SNPs), rs17782313 near the melanocortin-4 receptor (MC4R) gene and rs6265 near the brain-derived neurotrophic factor (BDNF) gene, but the associations of both SNPs with other obesity-related traits are not fully described, especially in children. The aim of the present study is to investigate the associations between the SNPs and adiponectin, which has a regulatory role in glucose and lipid metabolism. Methods. We examined the associations of the SNPs with adiponectin in the Beijing Child and Adolescent Metabolic Syndrome (BCAMS) study. A total of 3503 children participated in the study. Results. The SNP rs6265 was significantly associated with adiponectin under an additive model (P=0.02 and 0.024, respectively) after adjustment for age, gender, and BMI or obesity status. The SNP rs17782313 was significantly associated with low adiponectin under a recessive model. No statistical significance was found between the two SNPs and low adiponectin after correction for multiple testing. Conclusion. We demonstrate for the first time that the SNP rs17782313 near MC4R and the SNP rs6265 near BDNF are associated with adiponectin in Chinese children. These novel findings provide important evidence that adiponectin possibly mediates MC4R and BDNF involved in obesity.
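The additive and recessive models named in this abstract differ only in how a genotype is turned into a predictor before regression. The sketch below shows the conventional coding; the function name `encode_genotype` is hypothetical, and nothing here reproduces the study's own covariate adjustment or software.

```python
def encode_genotype(risk_allele_count: int, model: str) -> int:
    """Convert a count of risk alleles (0, 1 or 2) into a model predictor.

    Hypothetical helper illustrating standard genetic-model coding:
    - "additive": each copy of the risk allele adds one unit (0, 1, 2).
    - "recessive": only carriers of two copies are coded 1 (0, 0, 1).
    """
    if risk_allele_count not in (0, 1, 2):
        raise ValueError("a diploid genotype carries 0, 1 or 2 risk alleles")
    if model == "additive":
        return risk_allele_count
    if model == "recessive":
        return 1 if risk_allele_count == 2 else 0
    raise ValueError(f"unknown genetic model: {model}")


if __name__ == "__main__":
    for count in (0, 1, 2):
        print(count, encode_genotype(count, "additive"), encode_genotype(count, "recessive"))
```

The encoded value then enters a regression alongside the adjustment covariates, so a SNP can reach significance under one coding and not under the other.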


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Qing-Lan Li ◽  
Xiang Lin ◽  
Ya-Li Yu ◽  
Lin Chen ◽  
Qi-Xin Hu ◽  
...  

Abstract Colorectal cancer is one of the most common cancers in the world. Although genomic mutations and single nucleotide polymorphisms have been extensively studied, the epigenomic status in colorectal cancer patient tissues remains elusive. Here, together with genomic and transcriptomic analysis, we use ChIP-Seq to profile active enhancers at the genome wide level in colorectal cancer paired patient tissues (tumor and adjacent tissues from the same patients). In total, we sequence 73 pairs of colorectal cancer tissues and generate 147 H3K27ac ChIP-Seq, 144 RNA-Seq, 147 whole genome sequencing and 86 H3K4me3 ChIP-Seq samples. Our analysis identifies 5590 gain and 1100 lost variant enhancer loci in colorectal cancer, and 334 gain and 121 lost variant super enhancer loci. Multiple key transcription factors in colorectal cancer are predicted with motif analysis and core regulatory circuitry analysis. Further experiments verify the function of the super enhancers governing PHF19 and TBC1D16 in regulating colorectal cancer tumorigenesis, and KLF3 is identified as an oncogenic transcription factor in colorectal cancer. Taken together, our work provides an important epigenomic resource and functional factors for epigenetic studies in colorectal cancer.


Animals ◽  
2020 ◽  
Vol 10 (12) ◽  
pp. 2211
Author(s):  
Shan Lin ◽  
Zihui Wan ◽  
Junnan Zhang ◽  
Lingna Xu ◽  
Bo Han ◽  
...  

Albumin can be of particular benefit in fighting infections for newborn calves due to its anti-inflammatory and anti-oxidative stress properties. To identify the candidate genes related to the concentration of albumin in colostrum and serum, we collected colostrum and blood samples from 572 Chinese Holstein cows within 24 h after calving and measured the concentration of albumin in the colostrum and serum using ELISA. The cows were genotyped with GeneSeek 150 K chips (containing 140,668 single nucleotide polymorphisms; SNPs). After quality control, we performed GWASs via GCTA software with 91,620 SNPs and 563 cows. Consequently, 9 and 7 genome-wide significant SNPs (false discovery rate (FDR) at 1%) were identified. Correspondingly, 42 and 206 functional genes that contained or were near (±1 Mbp) the significant SNPs were acquired. Integrating the biological process of these genes and the reported QTLs for immune and inflammation traits in cattle, 3 and 12 genes were identified as candidates for the concentration of colostrum and serum albumin, respectively; these are RUNX1, CBR1, OTULIN, CDK6, SHARPIN, CYC1, EXOSC4, PARP10, NRBP2, GFUS, PYCR3, EEF1D, GSDMD, PYCR2 and CXCL12. Our findings provide important information for revealing the genetic mechanism behind albumin concentration and for molecular breeding of disease-resistance traits in dairy cattle.
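The 1% FDR threshold can be illustrated with the Benjamini-Hochberg step-up procedure. The abstract does not say which FDR method was applied, so the choice of Benjamini-Hochberg, the function name `benjamini_hochberg`, and the example p-values are assumptions for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return the indices of tests declared significant at the given FDR.

    Benjamini-Hochberg step-up rule: sort the p-values, find the largest
    rank k such that p_(k) <= (k / m) * fdr, and reject the k smallest.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank  # keep the largest rank that passes
    return sorted(order[:cutoff_rank])


if __name__ == "__main__":
    toy_p_values = [0.0001, 0.5, 0.004, 0.9, 0.00001]  # made-up SNP p-values
    print(benjamini_hochberg(toy_p_values, fdr=0.01))
```

With tens of thousands of SNPs tested, this rule is less strict than a Bonferroni cutoff while still bounding the expected share of false positives among the SNPs reported as significant.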

