Eagle: multi-locus association mapping on a genome-wide scale made routine

2019 ◽  
Vol 36 (5) ◽  
pp. 1509-1516
Author(s):  
Andrew W George ◽  
Arunas Verbyla ◽  
Joshua Bowden

Abstract Motivation We present Eagle, a new method for multi-locus association mapping. The motivation for developing Eagle was to make multi-locus association mapping ‘easy’ and the method-of-choice. Eagle’s strengths are that it (i) is considerably more powerful than single-locus association mapping, (ii) does not suffer from multiple testing issues, (iii) gives results that are immediately interpretable and (iv) has a computational footprint comparable to single-locus association mapping. Results By conducting a large simulation study, we will show that Eagle finds true and avoids false single-nucleotide polymorphism trait associations better than competing single- and multi-locus methods. We also analyze data from a published mouse study. Eagle found over 50% more validated findings than the state-of-the-art single-locus method. Availability and implementation Eagle has been implemented as an R package, with a browser-based Graphical User Interface for users less familiar with R. It is freely available via the CRAN website at https://cran.r-project.org. Videos, Quick Start guides, FAQs and Demos are available via the Eagle website http://eagle.r-forge.r-project.org. Supplementary information Supplementary data are available at Bioinformatics online.

Author(s):  
Andrew W George ◽  
Arunas Verbyla ◽  
Joshua Bowden

Abstract Eagle is an R package for multi-locus association mapping on a genome-wide scale. It is unlike other multi-locus packages in that it is easy to use for R users and non-users alike. It has two modes of use, command line and GUI. Eagle is fully documented and has its own supporting website, http://eagle.r-forge.r-project.org/index.html. Eagle is a significant improvement over the method-of-choice, single-locus association mapping. It has greater power to detect SNP-trait associations. It is based on model selection, linear mixed models, and a clever idea on how random effects can be used to identify SNP-trait associations. Through an example with real mouse data, we demonstrate Eagle’s ability to bring clarity and increased insight to single-locus findings. Initially, we see Eagle complementing single-locus analyses. However, over time, we hope the community will increasingly make multi-locus association mapping its method-of-choice for the analysis of genome-wide association study data.


2020 ◽  
Author(s):  
R. Moore ◽  
L. Georgatou-Politou ◽  
J. Liley ◽  
O. Stegle ◽  
I. Barroso

Abstract Genotype-environment interaction (G×E) studies typically focus on variants with previously known marginal associations. While such two-step filtering greatly reduces the multiple testing burden, it can miss loci with pronounced G×E effects, which tend to have weaker marginal associations. To test for G×E effects on a genome-wide scale whilst leveraging information from marginal associations in a flexible manner, we combine the conditional false discovery rate with interaction test results obtained from StructLMM. After validating our approach, we applied this strategy to UK Biobank (UKBB) data to probe for G×E effects on BMI. Using 126,077 UKBB individuals for discovery, we identified known (FTO, MC4R, SEC16B) and novel G×E signals, many of which replicated (FAM150B/ALKAL2, TMEM18, EFR3B, ZNF596-FAM87A, LIN7C-BDNF, FAIM2, UNC79, LAT) in an independent subset of UKBB (n=126,076). Finally, when analysing the full UKBB cohort, we identified 140 candidate loci with G×E effects, highlighting the advantages of our approach.


Author(s):  
Alexis Hardy ◽  
Mélody Matelot ◽  
Amandine Touzeau ◽  
Christophe Klopp ◽  
Céline Lopez-Roques ◽  
...  

Abstract Motivation Long-read sequencing technologies can be employed to detect and map DNA modifications at nucleotide resolution on a genome-wide scale. However, published software packages neglect the integration of genomic annotation and comprehensive filtering when analyzing patterns of modified bases detected using Pacific Biosciences (PacBio) or Oxford Nanopore Technologies (ONT) data. Here, we present DNAModAnnot, an R package designed for the global analysis of DNA modification patterns using adapted filtering and visualization tools. Results We tested our package using PacBio sequencing data to analyze patterns of 6-methyladenine (6mA) in the ciliate Paramecium tetraurelia, in which high 6mA amounts were previously reported. We found the genome-wide distribution of 6mA in Paramecium tetraurelia to be similar to that of other ciliates. We also performed 5-methylcytosine (5mC) analysis in human lymphoblastoid cells using ONT data and confirmed previously known patterns of 5mC. DNAModAnnot provides a toolbox for the genome-wide analysis of different DNA modifications using PacBio and ONT long-read sequencing data. Availability DNAModAnnot is distributed as an R package available via GitHub (https://github.com/AlexisHardy/DNAModAnnot). Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Robin H van der Weide ◽  
Teun van den Brand ◽  
Judith H I Haarhuis ◽  
Hans Teunissen ◽  
Benjamin D Rowland ◽  
...  

Abstract Conformation capture-approaches like Hi-C can elucidate chromosome structure at a genome-wide scale. Hi-C datasets are large and require specialised software. Here, we present GENOVA: a user-friendly software package to analyse and visualise chromosome conformation capture (3C) data. GENOVA is an R-package that includes the most common Hi-C analyses, such as compartment and insulation score analysis. It can create annotated heatmaps to visualise the contact frequency at a specific locus and aggregate Hi-C signal over user-specified genomic regions such as ChIP-seq data. Finally, our package supports output from the major mapping-pipelines. We demonstrate the capabilities of GENOVA by analysing Hi-C data from HAP1 cell lines in which the cohesin-subunits SA1 and SA2 were knocked out. We find that ΔSA1 cells gain intra-TAD interactions and increase compartmentalisation. ΔSA2 cells have longer loops and a less compartmentalised genome. These results suggest that cohesinSA1 forms longer loops, while cohesinSA2 plays a role in forming and maintaining intra-TAD interactions. Our data supports the model that the genome is provided structure in 3D by the counter-balancing of loop formation on one hand, and compartmentalization on the other hand. By differentially controlling loops, cohesinSA1 and cohesinSA2 therefore also affect nuclear compartmentalization. We show that GENOVA is an easy to use R-package, that allows researchers to explore Hi-C data in great detail.


2018 ◽  
Vol 35 (15) ◽  
pp. 2680-2682 ◽  
Author(s):  
Felipe Llinares-López ◽  
Laetitia Papaxanthos ◽  
Damian Roqueiro ◽  
Dean Bodenham ◽  
Karsten Borgwardt

Abstract Summary Combinatorial association mapping aims to assess the statistical association of higher-order interactions of genetic markers with a phenotype of interest. This article presents combinatorial association mapping (CASMAP), a software package that leverages recent advances in significant pattern mining to overcome the statistical and computational challenges that have hindered combinatorial association mapping. CASMAP can be used to perform region-based association studies and to detect higher-order epistatic interactions of genetic variants. Most importantly, unlike other existing significant pattern mining-based tools, CASMAP allows for the correction of categorical covariates such as age or gender, making it suitable for genome-wide association studies. Availability and implementation The R and Python packages can be downloaded from our GitHub repository http://github.com/BorgwardtLab/CASMAP. The R package is also available on CRAN. Supplementary information Supplementary data are available at Bioinformatics online.


2017 ◽  
Author(s):  
Răzvan V. Chereji

Abstract Summary Micrococcal nuclease digestion followed by deep sequencing (MNase-seq) is the most used method to investigate nucleosome organization on a genome-wide scale. We present plot2DO, a software package for creating 2D occupancy plots, which allows biologists to evaluate the quality of MNase-seq data and to visualize the distribution of nucleosomes near the functional regions of the genome (e.g. gene promoters, origins of replication). Availability and implementation The plot2DO open source package is freely available on GitHub at https://github.com/rchereji/plot2DO under the MIT license. Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Author(s):  
Robin H. van der Weide ◽  
Teun van den Brand ◽  
Judith H.I. Haarhuis ◽  
Hans Teunissen ◽  
Benjamin D. Rowland ◽  
...  

Abstract Conformation capture-approaches like Hi-C can elucidate chromosome structure at a genome-wide scale. Hi-C datasets are large and require specialised software. Here, we present GENOVA: a user-friendly software package to analyse and visualise conformation capture data. GENOVA is an R-package that includes the most common Hi-C analyses, such as compartment and insulation score analysis. It can create annotated heatmaps to visualise the contact frequency at a specific locus and aggregate Hi-C signal over user-specified genomic regions such as ChIP-seq data. Finally, our package supports output from the major mapping-pipelines. We demonstrate the capabilities of GENOVA by analysing Hi-C data from HAP1 cell lines in which the cohesin-subunits SA1 and SA2 were knocked out. We find that ΔSA1 cells gain intra-TAD interactions and increase compartmentalisation. ΔSA2 cells have longer loops and a less compartmentalised genome. These results suggest that cohesinSA1 forms longer loops, while cohesinSA2 plays a role in forming and maintaining intra-TAD interactions. Our data supports the model that the genome is provided structure in 3D by the counter-balancing of loop formation on one hand, and compartmentalization on the other hand. By differentially controlling loops, cohesinSA1 and cohesinSA2 therefore also affect nuclear compartmentalization. We show that GENOVA is an easy to use R-package, that allows researchers to explore Hi-C data in great detail.


Genes ◽  
2020 ◽  
Vol 11 (10) ◽  
pp. 1154
Author(s):  
Min Jeong Hong ◽  
Jin-Baek Kim ◽  
Yong Weon Seo ◽  
Dae Yeon Kim

Genes of the F-box family play specific roles in protein degradation by post-translational modification in several biological processes, including flowering, the regulation of circadian rhythms, photomorphogenesis, seed development, leaf senescence, and hormone signaling. F-box genes have not been previously investigated on a genome-wide scale; however, the establishment of the wheat (Triticum aestivum L.) reference genome sequence enabled a genome-based examination of the F-box genes to be conducted in the present study. In total, 1796 F-box genes were detected in the wheat genome and classified into various subgroups based on their functional C-terminal domain. The F-box genes were distributed among 21 chromosomes and most showed high sequence homology with F-box genes located on the homoeologous chromosomes because of allohexaploidy in the wheat genome. Additionally, a synteny analysis of wheat F-box genes was conducted in rice and Brachypodium distachyon. Transcriptome analysis during various wheat developmental stages and expression analysis by quantitative real-time PCR revealed that some F-box genes were specifically expressed in the vegetative and/or seed developmental stages. A genome-based examination and classification of F-box genes provide an opportunity to elucidate the biological functions of F-box genes in wheat.


2014 ◽  
Vol 42 (15) ◽  
pp. 9838-9853 ◽  
Author(s):  
Saeed Kaboli ◽  
Takuya Yamakawa ◽  
Keisuke Sunada ◽  
Tao Takagaki ◽  
Yu Sasano ◽  
...  

Abstract Despite systematic approaches to mapping networks of genetic interactions in Saccharomyces cerevisiae, exploration of genetic interactions on a genome-wide scale has been limited. The S. cerevisiae haploid genome has 110 regions that are longer than 10 kb but harbor only non-essential genes. Here, we attempted to delete these regions by PCR-mediated chromosomal deletion technology (PCD), which enables chromosomal segments to be deleted by a one-step transformation. Thirty-three of the 110 regions could be deleted, but the remaining 77 regions could not. To determine whether the 77 undeletable regions are essential, we successfully converted 67 of them to mini-chromosomes marked with URA3 using PCR-mediated chromosome splitting technology and conducted a mitotic loss assay of the mini-chromosomes. Fifty-six of the 67 regions were found to be essential for cell growth, and 49 of these carried co-lethal gene pair(s) that had not previously been detected by synthetic genetic array analysis. This result implies that regions harboring only non-essential genes contain unidentified synthetic lethal combinations at an unexpectedly high frequency, revealing a novel landscape of genetic interactions in the S. cerevisiae genome. Furthermore, this study indicates that segmental deletion might be exploited not only for revealing genome function but also for breeding stress-tolerant strains.


2016 ◽  
Author(s):  
Bethany Signal ◽  
Brian S Gloss ◽  
Marcel E Dinger ◽  
Timothy R Mercer

Abstract Background The branchpoint element is required for the first lariat-forming reaction in splicing. However, due to the difficulty of experimental mapping at a genome-wide scale, current catalogues are incomplete. Results We have developed a machine-learning algorithm trained with empirical human branchpoint annotations to identify branchpoint elements from primary genome sequence alone. Using this approach, we can accurately locate branchpoint elements in 85% of introns in current gene annotations. Consistent with branchpoints as basal genetic elements, we find our annotation is unbiased towards gene type and expression levels. A major fraction of introns was found to encode multiple branchpoints, raising the prospect that mutational redundancy is encoded in key genes. We also confirmed all deleterious branchpoint mutations annotated in clinical variant databases, and further identified thousands of clinical and common genetic variants with similar predicted effects. Conclusions We propose that the broad annotation of branchpoints constitutes a valuable resource for further investigations into the genetic encoding of splicing patterns, and for interpreting the impact of common and disease-causing human genetic variation on gene splicing.

