Whole Genome Variation of Transposable Element Insertions in a Maize Diversity Panel

Author(s):  
Yinjie Qiu ◽  
Christine H O’Connor ◽  
Rafael Della Coletta ◽  
Jonathan S Renk ◽  
Patrick J Monnahan ◽  
...  

Abstract Intact transposable elements (TEs) account for 65% of the maize genome and can impact gene function and regulation. Although TEs comprise the majority of the maize genome and affect important phenotypes, genome-wide patterns of TE polymorphisms in maize have only been studied in a handful of maize genotypes, due to the challenging nature of assessing highly repetitive sequences. We implemented a method to use short read sequencing data from 509 diverse inbred lines to classify the presence/absence of 445,418 non-redundant TEs that were previously annotated in four genome assemblies including B73, Mo17, PH207, and W22. Different orders of TEs (i.e. LTRs, Helitrons, TIRs) had different frequency distributions within the population. LTRs with lower LTR similarity were generally more frequent in the population than LTRs with higher LTR similarity, though high frequency insertions with very high LTR similarity were observed. LTR similarity and frequency estimates of nested elements and the outer elements in which they insert revealed that most nesting events occurred very near the timing of the outer element insertion. TEs within genes were at higher frequency than those that were outside of genes, and this is particularly true for those not inserted into introns. Many TE insertional polymorphisms observed in this population were tagged by SNP markers. However, 19.9% of the TE polymorphisms were not well tagged by SNPs (R² < 0.5) and potentially represent information that has not been well captured in previous SNP-based marker-trait association studies. This study provides a population-scale, genome-wide assessment of TE variation in maize, and provides valuable insight into variation in TEs in maize and the factors that contribute to this variation.
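The "well tagged" criterion in the abstract (R² < 0.5 meaning not well tagged) can be illustrated with a minimal sketch. It assumes presence/absence calls and SNP genotypes are both coded 0/1 across the same inbred lines, and it takes the best-tagging SNP to be the one with the highest squared Pearson correlation. The function names, the toy data, and the choice of Pearson correlation are assumptions for illustration, not the authors' pipeline.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length 0/1 vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic site: correlation undefined, treated as untagged
    return cov * cov / (var_x * var_y)


def best_tag(te_calls, snp_genotypes):
    """Return (snp_id, r2) for the SNP most correlated with the TE calls."""
    scored = ((snp_id, r_squared(te_calls, g)) for snp_id, g in snp_genotypes.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical toy data: one TE and two nearby SNPs scored in six inbred lines.
te = [1, 1, 0, 0, 1, 0]
snps = {"snp_a": [1, 1, 0, 0, 1, 0], "snp_b": [1, 0, 0, 1, 1, 0]}

snp_id, r2 = best_tag(te, snps)
well_tagged = r2 >= 0.5  # threshold taken from the abstract
```

In practice the candidate SNPs would be restricted to a window around each TE; the window size is not stated in the abstract, so it is left out here.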

2020 ◽  
Author(s):  
Christine H. O’Connor ◽  
Yinjie Qiu ◽  
Rafael Della Coletta ◽  
Jonathan S. Renk ◽  
Patrick J. Monnahan ◽  
...  

ABSTRACT Intact transposable elements (TEs) account for 65% of the maize genome and can impact gene function and regulation. Although TEs comprise the majority of the maize genome and affect important phenotypes, genome-wide patterns of TE polymorphisms in maize have only been studied in a handful of maize genotypes, due to the challenging nature of assessing highly repetitive sequences. We implemented a method to use short read sequencing data from 509 diverse inbred lines to classify the presence/absence of 494,564 non-redundant TEs that were previously annotated in four genome assemblies including B73, Mo17, PH207, and W22. Different orders of TEs (i.e. LTRs, Helitrons, TIRs) had different frequency distributions within the population. Older LTRs were generally more frequent in the population than younger LTRs, though high frequency very young TEs were observed. Age and frequency estimates of nested elements and the outer elements in which they insert revealed that most nesting events occurred very near the timing of the outer element insertion. TEs within genes were at higher frequency than those that were outside of genes, and this is particularly true for those not inserted into introns. Many TE insertional polymorphisms observed in this population were not tagged by SNP markers and therefore not captured in previous SNP-based marker-trait association studies. This study provides a population-scale, genome-wide assessment of TE variation in maize, and provides valuable insight into variation in TEs in maize and the factors that contribute to this variation.


2019 ◽  
Vol 97 (Supplement_3) ◽  
pp. 267-267
Author(s):  
Karim Karimi ◽  
A Hossain Farid ◽  
Mehdi Sargolzaei ◽  
Sean Myles ◽  
Younes Miar

Abstract Linkage disequilibrium (LD) has been defined as the correlation between alleles at different loci in the genome. LD levels can be influenced by evolutionary processes and historical events in populations. The main objective of this study was to estimate LD levels at different distances in the American mink genome using genotyping-by-sequencing (GBS) data. A total of 285 American mink (Neovison vison) were sequenced based on GBS libraries prepared by digesting the genomic DNA with the restriction enzyme ApeKI. After quality control, 13,321 single nucleotide polymorphism (SNP) markers located on 46 scaffolds were used to determine the extent of LD in the genome. The average r2 was computed for all syntenic SNP pairs at inter-marker distances from 0 up to 1 Mb. The average r2 between adjacent SNPs was 0.29 and ranged from 0.18 to 0.53 across all scaffolds. In addition, the average distance between adjacent markers was 51 kb. An average r2 above 0.3 was observed at distances of less than 1 kb and declined as the distance between markers increased. The average r2 was estimated to be less than 0.2 for markers more than 10 kb apart. Furthermore, the average LD level decreased to 0.08 for inter-marker distances between 0.9 and 1 Mb. The results of this study can be used to determine the optimum marker density required for obtaining enough accuracy and power in both genomic selection and genome-wide association studies.
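The distance-binned averaging described above can be sketched as follows. The sketch assumes pairwise r2 values have already been computed for SNP pairs on the same scaffold and are supplied as (distance in bp, r2) tuples. The function name, the bin size, and the toy values are illustrative assumptions; the abstract does not state which software or bin widths the authors used.

```python
from collections import defaultdict


def ld_decay(pairs, bin_size_bp=100_000, max_dist_bp=1_000_000):
    """Average r2 per distance bin; keys are the lower edge of each bin in bp."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    last_bin = max_dist_bp // bin_size_bp - 1
    for dist, r2 in pairs:
        if dist < 0 or dist > max_dist_bp:
            continue  # ignore pairs beyond the 0 to 1 Mb range
        b = min(dist // bin_size_bp, last_bin)  # a pair at exactly max_dist joins the last bin
        sums[b] += r2
        counts[b] += 1
    return {b * bin_size_bp: sums[b] / counts[b] for b in sorted(counts)}


# Hypothetical (distance_bp, r2) pairs, not data from the study.
toy_pairs = [(500, 0.4), (900, 0.2), (150_000, 0.1), (950_000, 0.08), (2_000_000, 0.9)]
decay = ld_decay(toy_pairs)
```

Narrower bins at short range (for example 1 kb) would be needed to resolve the rapid initial decline the abstract reports; a fixed bin width is used here only to keep the sketch short.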


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Javed Akhatar ◽  
Anna Goyal ◽  
Navneet Kaur ◽  
Chhaya Atri ◽  
Meenakshi Mittal ◽  
...  

Abstract Timely transition to flowering, maturity and plant height are important for agronomic adaptation and productivity of Indian mustard (B. juncea), which is a major edible oilseed crop of low-input ecologies in the Indian subcontinent. Breeding manipulation for these traits is difficult because of the involvement of multiple interacting genetic and environmental factors. Here, we report a genetic analysis of these traits using a population comprising 92 diverse genotypes of mustard. These genotypes were evaluated under deficient (N75), normal (N100) or excess (N125) conditions of nitrogen (N) application. Lower N availability induced early flowering and maturity in most genotypes, while high N conditions delayed both. A genotyping-by-sequencing approach helped to identify 406,888 SNP markers and undertake genome-wide association studies (GWAS). A total of 282 significant marker-trait associations (MTAs) were identified. We detected strong interactions between GWAS loci and nitrogen levels. Though some trait-associated SNPs were detected repeatedly across fertility gradients, the majority were identified under deficient or normal levels of N application. Annotation of the genomic region(s) within ± 50 kb of the peak SNPs facilitated prediction of 30 candidate genes belonging to light perception, circadian, floral meristem identity, flowering regulation, gibberellic acid pathways and plant development. These included over one copy each of AGL24, AP1, FVE, FRI, GID1A and GNC. FLC and CO were predicted on chromosomes A02 and B08, respectively. CDF1, CO, FLC, AGL24, GNC and FAF2 appeared to influence the variation for plant height. Our findings may help in improving phenotypic plasticity of mustard across fertility gradients through marker-assisted breeding strategies.
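The ± 50 kb annotation step can be illustrated with a minimal sketch that lists annotated genes overlapping a window around a peak SNP. The gene records, coordinates, and function name below are hypothetical placeholders, not values from the study, and a gene is counted here if any part of it overlaps the window, which is an assumption about how the window was applied.

```python
def genes_near_snp(snp_chrom, snp_pos, genes, window_bp=50_000):
    """Return names of genes overlapping [snp_pos - window_bp, snp_pos + window_bp]."""
    lo, hi = snp_pos - window_bp, snp_pos + window_bp
    return [
        name
        for name, chrom, start, end in genes
        if chrom == snp_chrom and start <= hi and end >= lo
    ]


# Hypothetical annotation records: (gene name, chromosome, start bp, end bp).
annotation = [
    ("gene_1", "A02", 100_000, 104_000),  # 46 kb upstream of the SNP: inside the window
    ("gene_2", "A02", 210_000, 215_000),  # 60 kb downstream: outside the window
    ("gene_3", "B08", 140_000, 151_000),  # right position, wrong chromosome
    ("gene_4", "A02", 195_000, 205_000),  # straddles the window edge: counted
]

candidates = genes_near_snp("A02", 150_000, annotation)
```

Candidate genes found this way are positional hypotheses only; the abstract's functional groupings (light perception, circadian, and so on) come from annotation of the genes, which this sketch does not attempt.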


2018 ◽  
Vol 2018 ◽  
pp. 1-22 ◽  
Author(s):  
Cheng-Wei Li ◽  
Yu-Kai Chiu ◽  
Bor-Sen Chen

The prevalence of hepatocellular carcinoma (HCC) is still high worldwide because liver diseases can develop into HCC. Recent reports indicate that nonalcoholic fatty liver disease and nonalcoholic steatohepatitis (NAFLD&NASH) and primary biliary cirrhosis and primary sclerosing cholangitis (PBC&PSC) are significant risk factors for HCC. Therefore, understanding the cellular mechanisms of the pathogenesis and hepatocarcinogenesis from normal liver cells to HCC through NAFLD&NASH or PBC&PSC is a priority to prevent the progression of liver damage and reduce the risk of further complications. Using genetic and epigenetic data mining and system identification on next-generation sequencing data and the corresponding DNA methylation profiles of liver cells in normal, NAFLD&NASH, PBC&PSC, and HCC patients, we identified the genome-wide real genetic and epigenetic networks (GENs) of normal, NAFLD&NASH, PBC&PSC, and HCC patients. In order to get valuable insight into these identified genome-wide GENs, we then applied a principal network projection method to extract the corresponding core GENs for normal liver cells, NAFLD&NASH, PBC&PSC, and HCC. By comparing the signal transduction pathways involved in the identified core GENs, we found that hepatocarcinogenesis through NAFLD&NASH was induced through DNA methylation of HIST2H2BE, HSPB1, RPL30, and ALDOB and the regulation of miR-21 and miR-122, and that hepatocarcinogenesis through PBC&PSC was induced through DNA methylation of RPL23A, HIST2H2BE, TIMP1, IGF2, RPL30, and ALDOB and the regulation of miR-29a, miR-21, and miR-122. The genetic and epigenetic changes in the pathogenesis and hepatocarcinogenesis could serve as diagnostic biomarkers and/or therapeutic targets.


Author(s):  
Marianne L. Slaten ◽  
Yen On Chan ◽  
Vivek Shrestha ◽  
Alexander E. Lipka ◽  
Ruthie Angelovici

Abstract Motivation: Advanced publicly available sequencing data from large populations have enabled informative genome-wide association studies (GWAS) that associate SNPs with phenotypic traits of interest. Many publicly available tools able to perform GWAS have been developed in response to increased demand. However, these tools lack a comprehensive pipeline that includes pre-GWAS analysis such as outlier removal, data transformation, and calculation of Best Linear Unbiased Predictions (BLUPs) or Best Linear Unbiased Estimates (BLUEs). In addition, post-GWAS analyses such as haploblock analysis and candidate gene identification are lacking. Results: Here, we present HAPPI GWAS, an open-source GWAS tool able to perform pre-GWAS, GWAS, and post-GWAS analysis in an automated pipeline using the command-line interface. Availability: HAPPI GWAS is written in R for any Unix-like operating system and is available on GitHub (https://github.com/Angelovici-Lab/HAPPI.GWAS.git).


2021 ◽  
Author(s):  
Dev Paudel ◽  
Rocheteau Dareus ◽  
Julia Rosenwald ◽  
Maria Munoz-Amatriain ◽  
Esteban Rios

Cowpea (Vigna unguiculata [L.] Walp., diploid, 2n = 22) is a major crop used as a protein source for human consumption as well as a quality feed for livestock. It is drought and heat tolerant and has been bred to develop varieties that are resilient to changing climates. Plant adaptation to new climates and their yield are strongly affected by flowering time. Therefore, understanding the genetic basis of flowering time is critical to advance cowpea breeding. The aim of this study was to perform genome-wide association studies (GWAS) to identify marker trait associations for flowering time in cowpea using single nucleotide polymorphism (SNP) markers. A total of 367 accessions from a cowpea mini-core collection were evaluated in Ft. Collins, CO in 2019 and 2020, and 292 accessions were evaluated in Citra, FL in 2018. These accessions were genotyped using the Cowpea iSelect Consortium Array that contained 51,128 SNPs. GWAS revealed seven reliable SNPs for flowering time that explained 8-12% of the phenotypic variance. Candidate genes including FT, GI, CRY2, LSH3, UGT87A2, LIF2, and HTA9 that are associated with flowering time were identified for the significant SNP markers. Further efforts to validate these loci will help to understand their role in flowering time in cowpea, and it could facilitate the transfer of some of this knowledge to other closely related legume species.


2019 ◽  
Vol 20 (17) ◽  
pp. 1189-1197 ◽  
Author(s):  
Vincent Gagné ◽  
Anne Aubry-Morin ◽  
Maria Plesa ◽  
Rachid Abaji ◽  
Kateryna Petrykey ◽  
...  

Aim: To evaluate top-ranking genes identified through genome-wide association studies for an association with corticosteroid-related osteonecrosis in children with acute lymphoblastic leukemia (ALL) who received Dana–Farber Cancer Institute treatment protocols. Patients & methods: Lead SNPs from these studies, as well as other variants in the same genes, pooled from whole exome sequencing data, were analyzed for an association with osteonecrosis in childhood ALL patients from the Quebec cohort. Top-ranking variants were verified in the replication patient group. Results: The analyses of variants in the ACP1-SH3YL1 locus derived from whole exome sequencing data showed an association of several correlated SNPs (rs11553746, rs2290911, rs7595075, rs2306060 and rs79716074). The rs79716074 variant defines the *B haplotype of the ACP1 gene, which is well known for its functional role. Conclusion: This study confirms the implication of the ACP1 gene in treatment-related osteonecrosis in childhood ALL and identifies a novel, potentially causal variant for this complication.


2015 ◽  
Vol 9S4 ◽  
pp. BBI.S29334 ◽  
Author(s):  
Jessica P. Hekman ◽  
Jennifer L Johnson ◽  
Anna V. Kukekova

Domesticated species occupy a special place in the human world due to their economic and cultural value. In the era of genomic research, domesticated species provide unique advantages for investigation of diseases and complex phenotypes. RNA sequencing, or RNA-seq, has recently emerged as a new approach for studying transcriptional activity of the whole genome, changing the focus from individual genes to gene networks. RNA-seq analysis in domesticated species may complement genome-wide association studies of complex traits with economic importance or direct relevance to biomedical research. However, RNA-seq studies are more challenging in domesticated species than in model organisms. These challenges are at least in part associated with the lack of quality genome assemblies for some domesticated species and the absence of genome assemblies for others. In this review, we discuss strategies for analyzing RNA-seq data, focusing particularly on questions and examples relevant to domesticated species.


2019 ◽  
Author(s):  
Abdulwahab Saliu Shaibu ◽  
Clay Sneller ◽  
Babu N. Motagi ◽  
Jackline Chepkoech ◽  
Mercy Chepngetich ◽  
...  

Abstract Background: In order to integrate genomics into the breeding and development of drought-tolerant groundnut genotypes, identification of genomic regions/genetic markers for drought surrogate traits is essential. We used SNP markers for a genetic analysis of the ICRISAT groundnut minicore collection for genome-wide marker-trait association for some physiological traits and to determine the magnitude of linkage disequilibrium (LD) present in the genetic resources. Results: The LD analysis showed that about 36% of loci pairs were in significant LD (P < 0.05 and r2 > 0.2) and 3.14% of the pairs were in complete LD. There was a rapid decline in LD with distance, and the LD was < 0.2 at a distance of 41,635 bp. The marker-trait association (MTA) studies revealed 20 significant MTAs (P < 0.001) with 11 markers for leaf area index (4), canopy temperature (13), chlorophyll content (1) and NDVI (2). The markers explained 2 to 21% of the phenotypic variation observed. Most of the MTAs identified on the A subgenome were also identified on the respective homeologous chromosome on the B subgenome. The duplications of effect observed could be due to the common ancestor of the A and B genomes, which would explain the linkage detected between markers lying on different chromosomes in the current study. Conclusions: The present study identified a total of 20 highly significant marker-trait associations with 11 markers for four physiological traits of importance in groundnut: LAI, CT, SCMR and NDVI. After further validation, the markers identified in this study can serve as useful genomic resources to initiate marker-assisted selection and trait introgression of groundnut for drought tolerance.

