Using Parsimony-Guided Tree Proposals to Accelerate Convergence in Bayesian Phylogenetic Inference

Abstract Sampling across tree space is one of the major challenges in Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) algorithms. Standard MCMC tree moves consider small random perturbations of the topology, and select from candidate trees at random or based on the distance between the old and new topologies. MCMC algorithms using such moves tend to get trapped in tree space, making them slow in finding the globally most probable trees (known as “convergence”) and in estimating the correct proportions of the different types of them (known as “mixing”). Here, we introduce a new class of moves, which propose trees based on their parsimony scores. The proposal distribution derived from the parsimony scores is a quickly computable albeit rough approximation of the conditional posterior distribution over candidate trees. We demonstrate with simulations that parsimony-guided moves correctly sample the uniform distribution of topologies from the prior. We then evaluate their performance against standard moves using six challenging empirical data sets, for which we were able to obtain accurate reference estimates of the posterior using long MCMC runs, a mix of topology proposals, and Metropolis coupling. On these data sets, ranging in size from 357 to 934 taxa and from 1740 to 5681 sites, we find that single chains using parsimony-guided moves usually converge an order of magnitude faster than chains using standard moves. They also exhibit better mixing, that is, they cover the most probable trees more quickly. Our results show that tree moves based on quick and dirty estimates of the posterior probability can significantly outperform standard moves. Future research will have to show to what extent the performance of such moves can be improved further by finding better ways of approximating the posterior probability, taking the trade-off between accuracy and speed into account. [Bayesian phylogenetic inference; MCMC; parsimony; tree proposal.]

Download Full-text

Using Parsimony-Guided Tree Proposals to Accelerate Convergence in Bayesian Phylogenetic Inference

10.1101/778571 ◽

2019 ◽

Cited By ~ 1

Author(s):

Chi Zhang ◽

John P. Huelsenbeck ◽

Fredrik Ronquist

Keyword(s):

Posterior Probability ◽

Phylogenetic Inference ◽

Future Research ◽

Proposal Distribution ◽

Mcmc Algorithms ◽

Rough Approximation ◽

Order Of Magnitude ◽

Accelerate Convergence ◽

Tree Space ◽

Bayesian Phylogenetic Inference

AbstractSampling across tree space is one of the major challenges in Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) algorithms. Standard MCMC tree moves consider small random perturbations of the topology, and select from candidate trees at random or based on the distance between the old and new topologies. MCMC algorithms using such moves tend to get trapped in tree space, making them slow in finding the globally most probable trees (known as ‘convergence’) and in estimating the correct proportions of the different types of them (known as ‘mixing’). Here, we introduce a new class of moves, which propose trees based on their parsimony scores. The proposal distribution derived from the parsimony scores is a quickly computable albeit rough approximation of the conditional posterior distribution over candidate trees. We demonstrate with simulations that parsimony-guided moves correctly sample the uniform distribution of topologies from the prior. We then evaluate their performance against standard moves using six challenging empirical datasets, for which we were able to obtain accurate reference estimates of the posterior using long MCMC runs, a mix of topology proposals, and Metropolis coupling. On these datasets, ranging in size from 357 to 934 taxa and from 1,740 to 5,681 sites, we find that single chains using parsimony-guided moves usually converge an order of magnitude faster than chains using standard moves. They also exhibit better mixing, that is, they cover the most probable trees more quickly. Our results show that tree moves based on quick and dirty estimates of the posterior probability can significantly outperform standard moves. Future research will have to show to what extent the performance of such moves can be improved further by finding better ways of approximating the posterior probability, taking the trade-off between accuracy and speed into account.

Download Full-text

Two C++ Libraries for Counting Trees on a Phylogenetic Terrace

10.1101/211276 ◽

2017 ◽

Cited By ~ 2

Author(s):

R. Biczok ◽

P. Bozsoky ◽

P. Eisenmann ◽

J. Ernst ◽

T. Ribizel ◽

...

Keyword(s):

Maximum Likelihood ◽

Phylogenetic Tree ◽

Phylogenetic Inference ◽

Source Codes ◽

Likelihood Score ◽

Order Of Magnitude ◽

Tree Topologies ◽

Tree Space ◽

Bayesian Phylogenetic Inference ◽

Counting Trees

AbstractMotivationThe presence of terraces in phylogenetic tree space, that is, a potentially large number of distinct tree topologies that have exactly the same analytical likelihood score, was first described by Sanderson et al, (2011). However, popular software tools for maximum likelihood and Bayesian phylogenetic inference do not yet routinely report, if inferred phylogenies reside on a terrace, or not. We believe, this is due to the unavailability of an efficient library implementation to (i) determine if a tree resides on a terrace, (ii) calculate how many trees reside on a terrace, and (iii) enumerate all trees on a terrace.ResultsIn our bioinformatics programming practical we developed two efficient and independent C++ implementations of the SUPERB algorithm by Constantinescu and Sankoff (1995) for counting and enumerating the trees on a terrace. Both implementations yield exactly the same results and are more than one order of magnitude faster and require one order of magnitude less memory than a previous 3rd party python implementation.AvailabilityThe source codes are available under GNU GPL at https://github.com/[email protected]

Download Full-text

Bayesian Phylogenetic Inference in the LIBI Grid Platform: A Tool to Explore Large Data Sets

2008 IEEE International Symposium on Parallel and Distributed Processing with Applications ◽

10.1109/ispa.2008.137 ◽

2008 ◽

Cited By ~ 3

Author(s):

Maria Mirto ◽

Saverio Vicario ◽

Daniele Tartarini ◽

Italo Epicoco ◽

Cecilia Saccone ◽

...

Keyword(s):

Large Data ◽

Phylogenetic Inference ◽

Large Data Sets ◽

Data Sets ◽

Bayesian Phylogenetic Inference ◽

Grid Platform

Download Full-text

Economic value of protected areas via visitor mental health

Nature Communications ◽

10.1038/s41467-019-12631-6 ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 10

Author(s):

Ralf Buckley ◽

Paula Brough ◽

Leah Hague ◽

Alienor Chauvenet ◽

Chris Fleming ◽

...

Keyword(s):

Mental Health ◽

Protected Areas ◽

Protected Area ◽

Economic Value ◽

Future Research ◽

Global Estimate ◽

Conservation Policies ◽

Life Years ◽

Order Of Magnitude ◽

Individual Scale

Abstract We evaluate methods to calculate the economic value of protected areas derived from the improved mental health of visitors. A conservative global estimate using quality-adjusted life years, a standard measure in health economics, is US$6 trillion p.a. This is an order of magnitude greater than the global value of protected area tourism, and two to three orders greater than global aggregate protected area management agency budgets. Future research should: refine this estimate using more precise methods; consider interactions between health and conservation policies and budgets at national scales; and examine links between personalities and protected area experiences at individual scale.

Download Full-text

Social Norms Sustaining Intimate Partner Violence: A Systematic Review of Methodologies for Proxy Measures

Trauma Violence & Abuse ◽

10.1177/15248380211013141 ◽

2021 ◽

pp. 152483802110131

Author(s):

Ilana Seff

Keyword(s):

Systematic Review ◽

Intimate Partner Violence ◽

Social Norms ◽

Partner Violence ◽

Evidence Base ◽

Intimate Partner ◽

Future Research ◽

Data Sets ◽

Boolean Search ◽

The Many

In light of the many robust quantitative data sets that include information on attitudes and behaviors related to intimate partner violence (IPV), and in an effort to expand the evidence base around social norms and IPV, many researchers construct proxy measures of norms within and across groups embedded in the data. While this strategy has become increasingly popular, there is no standardized approach for assessing and constructing these norm proxies, and no review of these approaches has been undertaken to date. This study presents the results of a systematic review of methods used to construct quantitative proxy measures for social norms related to IPV. PubMed, Embase, Popline, and Scopus, and PsycINFO were searched using Boolean search techniques. Inclusion criteria comprised studies published since 2000 in English that either (i) examined a norm proxy related to gender or IPV or (ii) analyzed the relationship between a norm proxy and perpetration of, experiences of, or attitudes toward IPV. Studies that employed qualitative methods or that elicited direct measures of descriptive or injunctive norms were not included. Twenty-six studies were eligible for review. Evidence from this review highlights inconsistencies in how proxies are constructed, how they are assessed to ensure valid representation of norms, and how researchers acknowledge their respective method’s limitations. Key processes and reflections employed by some of the studies are identified and recommended for future research inquiries.

Download Full-text

The global impacts of COVID-19 lockdowns on urban air pollution

Elem Sci Anth ◽

10.1525/elementa.2021.00176 ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Georgios I. Gkatzelis ◽

Jessica B. Gilman ◽

Steven S. Brown ◽

Henk Eskes ◽

A. Rita Gomes ◽

...

Keyword(s):

Atmospheric Chemistry ◽

Urban Air Pollution ◽

Future Research ◽

Specific Method ◽

Data Sets ◽

Atmospheric Pollutant ◽

Government Interventions ◽

Pollutant Concentrations ◽

Global Impacts ◽

The Government

The coronavirus-19 (COVID-19) pandemic led to government interventions to limit the spread of the disease which are unprecedented in recent history; for example, stay at home orders led to sudden decreases in atmospheric emissions from the transportation sector. In this review article, the current understanding of the influence of emission reductions on atmospheric pollutant concentrations and air quality is summarized for nitrogen dioxide (NO2), particulate matter (PM2.5), ozone (O3), ammonia, sulfur dioxide, black carbon, volatile organic compounds, and carbon monoxide (CO). In the first 7 months following the onset of the pandemic, more than 200 papers were accepted by peer-reviewed journals utilizing observations from ground-based and satellite instruments. Only about one-third of this literature incorporates a specific method for meteorological correction or normalization for comparing data from the lockdown period with prior reference observations despite the importance of doing so on the interpretation of results. We use the government stringency index (SI) as an indicator for the severity of lockdown measures and show how key air pollutants change as the SI increases. The observed decrease of NO2 with increasing SI is in general agreement with emission inventories that account for the lockdown. Other compounds such as O3, PM2.5, and CO are also broadly covered. Due to the importance of atmospheric chemistry on O3 and PM2.5 concentrations, their responses may not be linear with respect to primary pollutants. At most sites, we found O3 increased, whereas PM2.5 decreased slightly, with increasing SI. Changes of other compounds are found to be understudied. We highlight future research needs for utilizing the emerging data sets as a preview of a future state of the atmosphere in a world with targeted permanent reductions of emissions. Finally, we emphasize the need to account for the effects of meteorology, emission trends, and atmospheric chemistry when determining the lockdown effects on pollutant concentrations.

Download Full-text

An Examination of the Monophyly of Morning Glory Taxa Using Bayesian Phylogenetic Inference

Systematic Biology ◽

10.1080/10635150290102401 ◽

2002 ◽

Vol 51 (5) ◽

pp. 740-753 ◽

Cited By ~ 51

Author(s):

Richard E. Miller ◽

Thomas R. Buckley ◽

Paul S. Manos

Keyword(s):

Phylogenetic Inference ◽

Morning Glory ◽

Bayesian Phylogenetic Inference

Download Full-text

Semantic Technologies and Big Data Analytics for Cyber Defence

Web Services ◽

10.4018/978-1-5225-7501-6.ch074 ◽

2019 ◽

pp. 1430-1443

Author(s):

Louise Leenen ◽

Thomas Meyer

Keyword(s):

Decision Making ◽

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Future Research ◽

Data Sets ◽

Semantic Technologies ◽

Data Types ◽

Intelligent Decision Making ◽

Big Data Technologies

The Governments, military forces and other organisations responsible for cybersecurity deal with vast amounts of data that has to be understood in order to lead to intelligent decision making. Due to the vast amounts of information pertinent to cybersecurity, automation is required for processing and decision making, specifically to present advance warning of possible threats. The ability to detect patterns in vast data sets, and being able to understanding the significance of detected patterns are essential in the cyber defence domain. Big data technologies supported by semantic technologies can improve cybersecurity, and thus cyber defence by providing support for the processing and understanding of the huge amounts of information in the cyber environment. The term big data analytics refers to advanced analytic techniques such as machine learning, predictive analysis, and other intelligent processing techniques applied to large data sets that contain different data types. The purpose is to detect patterns, correlations, trends and other useful information. Semantic technologies is a knowledge representation paradigm where the meaning of data is encoded separately from the data itself. The use of semantic technologies such as logic-based systems to support decision making is becoming increasingly popular. However, most automated systems are currently based on syntactic rules. These rules are generally not sophisticated enough to deal with the complexity of decisions required to be made. The incorporation of semantic information allows for increased understanding and sophistication in cyber defence systems. This paper argues that both big data analytics and semantic technologies are necessary to provide counter measures against cyber threats. An overview of the use of semantic technologies and big data technologies in cyber defence is provided, and important areas for future research in the combined domains are discussed.

Download Full-text

Electronic Records Management - An Old Solution to a New Problem

Big Data ◽

10.4018/978-1-4666-9840-6.ch102 ◽

2016 ◽

pp. 2249-2274

Author(s):

Chinh Nguyen ◽

Rosemary Stockdale ◽

Helana Scheepers ◽

Jason Sargent

Keyword(s):

Big Data ◽

Rapid Development ◽

Large Data ◽

Large Data Sets ◽

Electronic Records ◽

Future Research ◽

Records Management ◽

Data Sets ◽

Interactive Nature ◽

Electronic Records Management

The rapid development of technology and interactive nature of Government 2.0 (Gov 2.0) is generating large data sets for Government, resulting in a struggle to control, manage, and extract the right information. Therefore, research into these large data sets (termed Big Data) has become necessary. Governments are now spending significant finances on storing and processing vast amounts of information because of the huge proliferation and complexity of Big Data and a lack of effective records management. On the other hand, there is a method called Electronic Records Management (ERM), for controlling and governing the important data of an organisation. This paper investigates the challenges identified from reviewing the literature for Gov 2.0, Big Data, and ERM in order to develop a better understanding of the application of ERM to Big Data to extract useable information in the context of Gov 2.0. The paper suggests that a key building block in providing useable information to stakeholders could potentially be ERM with its well established governance policies. A framework is constructed to illustrate how ERM can play a role in the context of Gov 2.0. Future research is necessary to address the specific constraints and expectations placed on governments in terms of data retention and use.

Download Full-text

Limitations of information extraction methods and techniques for heterogeneous unstructured big data

International Journal of Engineering Business Management ◽

10.1177/1847979019890771 ◽

2019 ◽

Vol 11 ◽

pp. 184797901989077 ◽

Cited By ~ 1

Author(s):

Kiran Adnan ◽

Rehan Akbar

Keyword(s):

Big Data ◽

Information Extraction ◽

Extraction Methods ◽

Unstructured Data ◽

Future Research ◽

Data Sets ◽

Data Types ◽

Efficiency And Effectiveness ◽

Single Data ◽

The Impact

During the recent era of big data, a huge volume of unstructured data are being produced in various forms of audio, video, images, text, and animation. Effective use of these unstructured big data is a laborious and tedious task. Information extraction (IE) systems help to extract useful information from this large variety of unstructured data. Several techniques and methods have been presented for IE from unstructured data. However, numerous studies conducted on IE from a variety of unstructured data are limited to single data types such as text, image, audio, or video. This article reviews the existing IE techniques along with its subtasks, limitations, and challenges for the variety of unstructured data highlighting the impact of unstructured big data on IE techniques. To the best of our knowledge, there is no comprehensive study conducted to investigate the limitations of existing IE techniques for the variety of unstructured big data. The objective of the structured review presented in this article is twofold. First, it presents the overview of IE techniques from a variety of unstructured data such as text, image, audio, and video at one platform. Second, it investigates the limitations of these existing IE techniques due to the heterogeneity, dimensionality, and volume of unstructured big data. The review finds that advanced techniques for IE, particularly for multifaceted unstructured big data sets, are the utmost requirement of the organizations to manage big data and derive strategic information. Further, potential solutions are also presented to improve the unstructured big data IE systems for future research. These solutions will help to increase the efficiency and effectiveness of the data analytics process in terms of context-aware analytics systems, data-driven decision-making, and knowledge management.

Download Full-text