Bayesian inference of fine-scale recombination rates using population genomic data

2008 ◽  
Vol 363 (1512) ◽  
pp. 3921-3930 ◽  
Author(s):  
Ying Wang ◽  
Bruce Rannala

Recently, several statistical methods for estimating fine-scale recombination rates using population samples have been developed. However, currently available methods that can be applied to large-scale data are limited to approximated likelihoods. Here, we developed a full-likelihood Markov chain Monte Carlo method for estimating recombination rate under a Bayesian framework. Genealogies underlying a sampling of chromosomes are effectively modelled by using marginal individual single nucleotide polymorphism genealogies related through an ancestral recombination graph. The method is compared with two existing composite-likelihood methods using simulated data. Simulation studies show that our method performs well for different simulation scenarios. The method is applied to two human population genetic variation datasets that have been studied by sperm typing. Our results are consistent with the estimates from sperm crossover analysis.

2019 ◽  
Author(s):  
Benoit Morel ◽  
Alexey M. Kozlov ◽  
Alexandros Stamatakis ◽  
Gergely J. Szöllősi

Abstract Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges species tree-aware methods also leverage information from a putative species tree. However, only few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data pre-processing (e.g., computing bootstrap trees), and rely on approximations and heuristics that limit the degree of tree space exploration. Here we present GeneRax, the first maximum likelihood species tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene level events, such as duplication, transfer, and loss relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared to competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson-Foulds distance. On empirical datasets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1099 Cyanobacteria families in eight minutes on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax.


2020 ◽  
Vol 37 (9) ◽  
pp. 2763-2774 ◽  
Author(s):  
Benoit Morel ◽  
Alexey M Kozlov ◽  
Alexandros Stamatakis ◽  
Gergely J Szöllősi

Abstract Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species-tree-aware methods also leverage information from a putative species tree. However, only few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data preprocessing (e.g., computing bootstrap trees) and rely on approximations and heuristics that limit the degree of tree space exploration. Here, we present GeneRax, the first maximum likelihood species-tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene level events, such as duplication, transfer, and loss relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared with competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson–Foulds distance. On empirical data sets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1,099 Cyanobacteria families in 8 min on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax (last accessed June 17, 2020).  


Genetics ◽  
1996 ◽  
Vol 142 (2) ◽  
pp. 537-548 ◽  
Author(s):  
Michael W Nachman ◽  
Gary A Churchill

Abstract If loci are randomly distributed on a physical map, the density of markers on a genetic map will be inversely proportional to recombination rate. First proposed by MARY LYON, we have used this idea to estimate recombination rates from the Drosophila melanogaster linkage map. These results were compared with results of two other studies that estimated regional recombination rates in D. melanogaster using both physical and genetic maps. The three methods were largely concordant in identifying large-scale genomic patterns of recombination. The marker density method was then applied to the Mus musculus microsatellite linkage map. The distribution of microsatellites provided evidence for heterogeneity in recombination rates. Centromeric regions for several mouse chromosomes had significantly greater numbers of markers than expected, suggesting that recombination rates were lower in these regions. In contrast, most telomeric regions contained significantly fewer markers than expected. This indicates that recombination rates are elevated at the telomeres of many mouse chromosomes and is consistent with a comparison of the genetic and cytogenetic maps in these regions. The density of markers on a genetic map may provide a generally useful way to estimate regional recombination rates in species for which genetic, but not physical, maps are available.
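The marker-density idea in the abstract above can be illustrated with a minimal sketch. It assumes markers are uniformly random on the physical map, so that a window of the genetic map holding many markers spans a lot of DNA per centimorgan and therefore has a low recombination rate. The function name, the fixed-width windowing, and the normalisation by the mean count are illustrative assumptions, not the authors' procedure or their significance test.

```python
from collections import Counter


def relative_recombination_rates(positions_cm, window_cm):
    """Relative recombination rate per genetic-map window (hypothetical helper).

    positions_cm: marker positions on the genetic map, in centimorgans.
    window_cm:    width of each window, in centimorgans.

    Returns one value per window: mean marker count divided by the count
    in that window. Values above 1 suggest a higher-than-average rate,
    values below 1 a lower one. Windows with no markers return None,
    because the estimate is undefined there.
    """
    counts = Counter(int(p // window_cm) for p in positions_cm)
    n_windows = int(max(positions_cm) // window_cm) + 1
    mean_count = len(positions_cm) / n_windows
    rates = []
    for w in range(n_windows):
        c = counts.get(w, 0)
        rates.append(mean_count / c if c else None)
    return rates


# Toy map: four markers crowd the first 10 cM, one marker in each later window.
toy_positions = [0.5, 1.0, 1.5, 2.0, 12.0, 25.0]
toy_rates = relative_recombination_rates(toy_positions, window_cm=10)
```

In this toy map the crowded first window receives a relative rate below 1, matching the abstract's reading of marker-rich centromeric regions as regions of low recombination.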


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Mulalo M. Muluvhahothe ◽  
Grant S. Joseph ◽  
Colleen L. Seymour ◽  
Thinandavha C. Munyai ◽  
Stefan H. Foord

Abstract High-altitude-adapted ectotherms can escape competition from dominant species by tolerating low temperatures at cooler elevations, but climate change is eroding such advantages. Studies evaluating broad-scale impacts of global change for high-altitude organisms often overlook the mitigating role of biotic factors. Yet, at fine spatial-scales, vegetation-associated microclimates provide refuges from climatic extremes. Using one of the largest standardised data sets collected to date, we tested how ant species composition and functional diversity (i.e., the range and value of species traits found within assemblages) respond to large-scale abiotic factors (altitude, aspect), and fine-scale factors (vegetation, soil structure) along an elevational gradient in tropical Africa. Altitude emerged as the principal factor explaining species composition. Analysis of nestedness and turnover components of beta diversity indicated that ant assemblages are specific to each elevation, so species are not filtered out but replaced with new species as elevation increases. Similarity of assemblages over time (assessed using beta decay) did not change significantly at low and mid elevations but declined at the highest elevations. Assemblages also differed between northern and southern mountain aspects, although at highest elevations, composition was restricted to a set of species found on both aspects. Functional diversity was not explained by large scale variables like elevation, but by factors associated with elevation that operate at fine scales (i.e., temperature and habitat structure). Our findings highlight the significance of fine-scale variables in predicting organisms’ responses to changing temperature, offering management possibilities that might dilute climate change impacts, and caution when predicting assemblage responses using climate models, alone.


Genetics ◽  
2003 ◽  
Vol 165 (4) ◽  
pp. 2269-2282
Author(s):  
D Mester ◽  
Y Ronin ◽  
D Minkov ◽  
E Nevo ◽  
A Korol

Abstract This article is devoted to the problem of ordering in linkage groups with many dozens or even hundreds of markers. The ordering problem belongs to the field of discrete optimization on a set of all possible orders, amounting to n!/2 for n loci; hence it is considered an NP-hard problem. Several authors attempted to employ the methods developed in the well-known traveling salesman problem (TSP) for multilocus ordering, using the assumption that for a set of linked loci the true order will be the one that minimizes the total length of the linkage group. A novel, fast, and reliable algorithm developed for the TSP and based on evolution-strategy discrete optimization was applied in this study for multilocus ordering on the basis of pairwise recombination frequencies. The quality of derived maps under various complications (dominant vs. codominant markers, marker misclassification, negative and positive interference, and missing data) was analyzed using simulated data with ∼50-400 markers. High performance of the employed algorithm allows systematic treatment of the problem of verification of the obtained multilocus orders on the basis of computing-intensive bootstrap and/or jackknife approaches for detecting and removing questionable marker scores, thereby stabilizing the resulting maps. Parallel calculation technology can easily be adopted for further acceleration of the proposed algorithm. Real data analysis (on maize chromosome 1 with 230 markers) is provided to illustrate the proposed methodology.
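The size of the search space and the minimum-length criterion described above can be sketched for a handful of loci. This is an exhaustive search written only to illustrate the objective; it is not the evolution-strategy algorithm of the article, and the function names and the toy recombination-frequency matrix are assumptions made for the example.

```python
from itertools import permutations
from math import factorial


def count_orders(n):
    """Number of distinct marker orders for n >= 2 loci: n!/2,
    because an order and its mirror image describe the same map."""
    return factorial(n) // 2


def best_order(rf):
    """Exhaustively find the order minimising total map length.

    rf is a symmetric matrix of pairwise recombination frequencies.
    Total length is the sum of rf between adjacent loci in the order.
    Only feasible for small n, since the search visits n!/2 orders.
    """
    n = len(rf)
    best = None
    for perm in permutations(range(n)):
        if perm[0] > perm[-1]:
            continue  # skip the mirror image of an order already visited
        length = sum(rf[a][b] for a, b in zip(perm, perm[1:]))
        if best is None or length < best[0]:
            best = (length, perm)
    return best


# Toy data: four loci lying on a line, listed in shuffled index order.
toy_pos = [0.2, 0.0, 0.3, 0.1]
toy_rf = [[abs(a - b) for b in toy_pos] for a in toy_pos]
toy_length, toy_order = best_order(toy_rf)
```

Already at 10 loci there are 1,814,400 candidate orders, which is why the abstract turns to heuristics borrowed from the traveling salesman problem for maps with hundreds of markers.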


Genetics ◽  
2003 ◽  
Vol 165 (4) ◽  
pp. 2213-2233 ◽  
Author(s):  
Na Li ◽  
Matthew Stephens

Abstract We introduce a new statistical model for patterns of linkage disequilibrium (LD) among multiple SNPs in a population sample. The model overcomes limitations of existing approaches to understanding, summarizing, and interpreting LD by (i) relating patterns of LD directly to the underlying recombination process; (ii) considering all loci simultaneously, rather than pairwise; (iii) avoiding the assumption that LD necessarily has a “block-like” structure; and (iv) being computationally tractable for huge genomic regions (up to complete chromosomes). We examine in detail one natural application of the model: estimation of underlying recombination rates from population data. Using simulation, we show that in the case where recombination is assumed constant across the region of interest, recombination rate estimates based on our model are competitive with the very best of current available methods. More importantly, we demonstrate, on real and simulated data, the potential of the model to help identify and quantify fine-scale variation in recombination rate from population data. We also outline how the model could be useful in other contexts, such as in the development of more efficient haplotype-based methods for LD mapping.


Land ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1197
Author(s):  
Yuyang Zhang ◽  
Qilin Wu ◽  
Lei Wu ◽  
Yan Li

Green space exposure is beneficial to the physical and mental health of community residents, but the spatial distribution of green space is inequitable. Due to data availability, green equality or justice studies typically use administrative units as contextual areas to evaluate green spaces exposure, which is macro-scale and may lead to biased estimates as it ignores fine-scale green spaces (e.g. community gardens, lawns), that community residents are more frequently exposed to. In this study, we used the community as the unit of analysis, considered the green exposure of community residents in their daily social and physical activities, obtained data on three types of green spaces including fine-scale green spaces in the communities, surrounding large-scale parks and streetscape images. We propose a series of metrics for assessing community green equity, including a total of 11 metrics in three major categories of morphology, visibility and accessibility and applied them to 4,544 communities in Beijing urban area. Through spatial visualization, spatial clustering, radar plots, and correlation analysis, we comprehensively analyzed the equity of green space at the community scale, identified the cold and hot spots of homogeneity, and then analyzed the equity of green space among regions under the urbanization process. The measurement results of these metrics showed that there are large differences and complementarities between different categories of metrics, but similarities exist between metrics of the same category. The proposed methodology represents the development of a green space evaluation system that can be used by decision makers and urban green designers to create and maintain more equitable community green spaces. 
In addition, the large-scale, comprehensive and fine-scale green space measurement of this study can be combined with other studies such as public health and environmental pollution in the future to obtain more comprehensive conclusions and better guide the construction and regeneration of green spaces.

