Model checking software for phylogenetic trees using distribution and database methods

Summary: Model checking, a generic and formal paradigm from computer science based on temporal logics, has been proposed for studying biological properties that emerge from the labeling of the states defined over a phylogenetic tree. This strategy allows the use of generic software tools already available in industry. However, the performance of traditional model checking degrades when scaling to large phylogenies. Two strategies are presented here to address this. The first partitions the phylogenetic tree into a set of subgraphs, each representing a subproblem to be verified, so as to reduce computation time and distribute memory consumption. The second uncouples the information associated with each state of the phylogenetic tree (mainly the DNA sequence) and exports it to an external tool for managing large information systems. The integration of these approaches outperforms monolithic model checking and makes it possible to verify properties on a real phylogenetic tree.
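
The two strategies in this summary can be illustrated with a minimal Python sketch. It is not the authors' tool: the toy tree, the `SEQUENCES` dictionary standing in for an external database, and the function names are all hypothetical. Each child subtree of the root is checked as an independent subproblem for an "always"-style property, and sequences are looked up by node id rather than stored in the tree.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy tree: node id -> list of child ids.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"],
        "a1": [], "a2": [], "b1": []}
# Sequences kept outside the tree, standing in for an external database.
SEQUENCES = {"root": "ACGT", "a": "ACGA", "b": "ACGT",
             "a1": "ACGA", "a2": "ACTA", "b1": "ACGT"}


def subtree_nodes(tree, start):
    """Return every node reachable from start (depth-first)."""
    stack, seen = [start], []
    while stack:
        node = stack.pop()
        seen.append(node)
        stack.extend(tree[node])
    return seen


def holds_globally(tree, start, predicate, lookup):
    """'Always' property: predicate holds in every state reachable from start."""
    return all(predicate(lookup(n)) for n in subtree_nodes(tree, start))


def check_partitioned(tree, root, predicate, lookup):
    """Check the root state, then each child subtree as a separate subproblem."""
    parts = tree[root]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(
            lambda p: holds_globally(tree, p, predicate, lookup), parts))
    root_ok = predicate(lookup(root))
    return root_ok and all(results), dict(zip(parts, results))
```

Calling `check_partitioned(TREE, "root", lambda s: s.startswith("ACG"), SEQUENCES.get)` reports which partition violates the property. A real deployment would presumably replace the thread pool with separate machines and the dictionary with a database query.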

Improvements to the parallel solution based on collective operations for the bootstrap of phylogenetic tree reconstruction in PhyML 3.0

Colloquium Exactarum ◽

10.5747/ce.2020.v12.n3.e328 ◽

2021 ◽

Vol 12 (3) ◽

pp. 39-52

Author(s):

Martha Ximena Torres Delgado

Keyword(s):

Phylogenetic Tree ◽

Statistical Method ◽

Phylogenetic Trees ◽

Evolutionary Relationships ◽

Performance Tests ◽

Memory Consumption ◽

Data Set ◽

Parallel Implementations ◽

Point To Point

Phylogenetics determines the evolutionary relationships between groups of species by means of a phylogenetic tree. PhyML is among the main programs for the reconstruction of phylogenetic trees. Bootstrap is a statistical method used to measure the confidence of a given data set, and it is usually applied in the analysis of inferred phylogenetic trees. In PhyML this method has two MPI parallel implementations: one with point-to-point operations and one with collective operations. The second version is more efficient than the first; however, it limits the number of bootstrap replicates that can be used because of the increase in memory consumption. To solve this problem, three proposals were developed. The objectives of this work were to validate these versions and to run performance tests on them. The validation showed that the proposed solutions produce results equivalent to the point-to-point version. In the performance simulations, two solutions were superior to the point-to-point version, with the best one achieving gains of 28.46% and 39.64% for 32 and 64 processes, respectively. The enhancements therefore provide alternatives to the point-to-point version without limiting memory.
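
As a rough illustration of the bootstrap step described above, the following Python sketch resamples alignment columns with replacement, which is the standard phylogenetic bootstrap, and divides replicates among workers. It does not reproduce PhyML's MPI code; the function names and the even-split policy are assumptions made for illustration.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement.

    alignment maps taxon name -> aligned sequence (all of equal length).
    """
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[c] for c in cols)
            for taxon, seq in alignment.items()}


def split_replicates(n_replicates, n_workers):
    """Number of replicates per worker, divided as evenly as possible."""
    base, extra = divmod(n_replicates, n_workers)
    return [base + (1 if w < extra else 0) for w in range(n_workers)]


toy = {"x": "ACGT", "y": "TGCA"}          # hypothetical two-taxon alignment
replicate = bootstrap_replicate(toy, random.Random(0))
shares = split_replicates(100, 32)        # e.g. 100 replicates on 32 processes
```

Each worker would then infer a tree from its own replicates. How the resulting trees are gathered, with point-to-point or collective operations, is the design choice the abstract compares.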

Computing nearest neighbour interchange distances between ranked phylogenetic trees

Journal of Mathematical Biology ◽

10.1007/s00285-021-01567-5 ◽

2021 ◽

Vol 82 (1-2) ◽

Author(s):

Lena Collienne ◽

Alex Gavryushkin

Keyword(s):

Cancer Research ◽

Computational Complexity ◽

Phylogenetic Tree ◽

Shortest Path ◽

Phylogenetic Trees ◽

Shortest Paths ◽

Nearest Neighbour ◽

Tree Inference ◽

Subtree Prune And Regraft ◽

Comparison Algorithms

Abstract: Many popular algorithms for searching the space of leaf-labelled (phylogenetic) trees are based on tree rearrangement operations. Under any such operation, the problem is reduced to searching a graph where vertices are trees and (undirected) edges are given by pairs of trees connected by one rearrangement operation (sometimes called a move). Most popular are the classical nearest neighbour interchange, subtree prune and regraft, and tree bisection and reconnection moves. The problem of computing distances, however, is $\mathbf{NP}$-hard in each of these graphs, making tree inference and comparison algorithms challenging to design in practice. Although ranked phylogenetic trees are one of the central objects of interest in applications such as cancer research, immunology, and epidemiology, the computational complexity of the shortest path problem for these trees remained unsolved for decades. In this paper, we settle this problem for the ranked nearest neighbour interchange operation by establishing that the complexity depends on the weight difference between the two types of tree rearrangements (rank moves and edge moves), and varies from quadratic, which is the lowest possible complexity for this problem, to $\mathbf{NP}$-hard, which is the highest. In particular, our result provides the first example of a phylogenetic tree rearrangement operation for which shortest paths, and hence the distance, can be computed efficiently. Specifically, our algorithm scales to trees with tens of thousands of leaves (and likely hundreds of thousands if implemented efficiently).

Techniques for the verification of minimal phylogenetic trees illustrated with ten mammalian haemoglobin sequences

Biochemical Journal ◽

10.1042/bj1870065 ◽

1980 ◽

Vol 187 (1) ◽

pp. 65-74 ◽

Cited By ~ 12

Author(s):

D Penny ◽

M D Hendy ◽

L R Foulds

Keyword(s):

Amino Acid ◽

Phylogenetic Tree ◽

Protein Sequence ◽

Phylogenetic Trees ◽

Sequence Data ◽

Protein Sequences ◽

Nucleotide Sequences ◽

Amino Acid Sequences ◽

Minimal Tree ◽

Protein Sequence Data

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented here progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.
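
To make the notion of tree length concrete, the sketch below counts the minimum number of state changes a fixed binary tree needs to explain a set of aligned sequences. It assumes that "shortest" means fewest substitutions (parsimony) and uses the well-known Fitch counting procedure, which is swapped in for illustration and is not the method of this paper. The tree encoding (nested 2-tuples) and the toy sequences are hypothetical.

```python
def fitch_length(tree, leaf_states):
    """Minimum number of changes for one site on a fixed binary tree.

    tree: nested 2-tuples whose leaves are taxon names (strings).
    leaf_states: taxon name -> character state at this site.
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            return {leaf_states[node]}
        left, right = visit(node[0]), visit(node[1])
        common = left & right
        if common:
            return common
        changes += 1          # disjoint child sets force one change
        return left | right

    visit(tree)
    return changes


def tree_length(tree, sequences):
    """Sum the per-site Fitch lengths over all alignment columns."""
    n_sites = len(next(iter(sequences.values())))
    return sum(
        fitch_length(tree, {name: seq[i] for name, seq in sequences.items()})
        for i in range(n_sites)
    )


toy = {"a": "AA", "b": "AC", "c": "CC", "d": "CA"}   # hypothetical data
length_1 = tree_length((("a", "b"), ("c", "d")), toy)
length_2 = tree_length((("a", "c"), ("b", "d")), toy)
```

Comparing `length_1` and `length_2` shows how two topologies over the same taxa can differ in length. A minimal tree is one for which no other topology is shorter.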

Parallel Algorithms for Spatial Rainfall Distribution

Jurnal INKOM ◽

10.14203/j.inkom.383 ◽

2014 ◽

Vol 8 (1) ◽

pp. 29 ◽

Cited By ~ 1

Author(s):

Arnida Lailatul Latifah ◽

Adi Nurhadiyatna

Keyword(s):

Parallel Algorithms ◽

Computation Time ◽

Computational Time ◽

Rainfall Distribution ◽

Flood Modelling ◽

Computation Efficiency ◽

Distance Weighting ◽

Speed Up ◽

Important Input ◽

Serial Algorithms

This paper proposes parallel algorithms for precipitation in flood modelling, applied in particular to spatial rainfall distribution. As an important input in flood modelling, the spatial distribution of rainfall is always needed as a pre-conditioned model. Two interpolation methods are discussed: inverse distance weighting (IDW) and ordinary kriging (OK). Both are developed as parallel algorithms in order to reduce the computational time. To measure computation efficiency, the performance of the parallel algorithms is compared to that of the serial algorithms for both methods. Findings indicate that: (1) the computation time of the OK algorithm is up to 23% longer than that of IDW; (2) the computation time of the OK and IDW algorithms increases linearly with the number of cells/points; (3) the computation time of the parallel algorithms for both methods decays exponentially with the number of processors, with a decay factor of 0.52 for IDW and 0.53 for OK; (4) the parallel algorithms achieve near-ideal speed-up.
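
For readers unfamiliar with IDW, a minimal serial sketch in Python follows. It uses the standard formula, a weighted mean with weights 1/d^p. The station layout, the default power of 2, and the function name are assumptions and are not taken from the paper.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse distance weighting estimate at point (x, y).

    stations: iterable of (sx, sy, value) rain-gauge readings.
    """
    numerator = denominator = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value              # query point coincides with a station
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


gauges = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]    # hypothetical readings
midpoint_estimate = idw(gauges, 1.0, 0.0)
```

Because each grid cell is estimated independently of the others, the cells can be divided among processors without communication during interpolation, which is consistent with the near-ideal speed-up reported above.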

Hyper-optimized tensor network contraction

Quantum ◽

10.22331/q-2021-03-15-410 ◽

2021 ◽

Vol 5 ◽

pp. 410

Author(s):

Johnnie Gray ◽

Stefanos Kourtis

Keyword(s):

Computation Time ◽

Quantum Circuit ◽

Optimization Approach ◽

Many Body ◽

Tensor Networks ◽

Randomized Protocols ◽

Classical Simulation ◽

Tensor Network ◽

Speed Up ◽

Many Body Systems

Tensor networks represent the state of the art in computational methods across many disciplines, including the classical simulation of quantum many-body systems and quantum circuits. Several applications of current interest give rise to tensor networks with irregular geometries. Finding the best possible contraction path for such networks is a central problem, with an exponential effect on computation time and memory footprint. In this work, we implement new randomized protocols that find very high quality contraction paths for arbitrary and large tensor networks. We test our methods on a variety of benchmarks, including the random quantum circuit instances recently implemented on Google quantum chips. We find that the paths obtained can be very close to optimal, and often many orders of magnitude better than the most established approaches. As different underlying geometries suit different methods, we also introduce a hyper-optimization approach, where both the method applied and its algorithmic parameters are tuned during the path finding. The increase in quality of contraction schemes found has significant practical implications for the simulation of quantum many-body systems and particularly for the benchmarking of new quantum chips. Concretely, we estimate a speed-up of over 10,000× compared to the original expectation for the classical simulation of the Sycamore 'supremacy' circuits.
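
Why the contraction path matters can be seen in the simplest tensor network, a chain of matrices. The Python sketch below uses the classic exhaustive dynamic programme over a chain, not the randomized protocols of this paper, and the dimensions are hypothetical. It compares the cost of two contraction orders in scalar multiplications.

```python
from functools import lru_cache


def pair_cost(m, k, n):
    """Scalar multiplications needed for an (m x k) by (k x n) product."""
    return m * k * n


def best_chain_cost(dims):
    """Minimal cost to contract a matrix chain; matrix i is dims[i] x dims[i+1]."""
    @lru_cache(maxsize=None)
    def best(i, j):                       # matrices i..j inclusive
        if i == j:
            return 0
        return min(
            best(i, k) + best(k + 1, j)
            + pair_cost(dims[i], dims[k + 1], dims[j + 1])
            for k in range(i, j)
        )
    return best(0, len(dims) - 2)


dims = (10, 1000, 10, 1000)               # A: 10x1000, B: 1000x10, C: 10x1000
left_first = pair_cost(10, 1000, 10) + pair_cost(10, 10, 1000)       # (AB)C
right_first = pair_cost(1000, 10, 1000) + pair_cost(10, 1000, 1000)  # A(BC)
optimal = best_chain_cost(dims)
```

Even for three matrices the two orders differ by a factor of 100 here. For irregular networks with many tensors the search space is far too large for exhaustive enumeration, which is the setting where heuristic path finders become relevant.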

Analysis of SARS-CoV-2 nucleocapsid protein sequence variations in ASEAN countries

Medical Journal of Indonesia ◽

10.13181/mji.oa.215304 ◽

2021 ◽

Author(s):

Mochammad Rajasa Mukti Negara ◽

Ita Krissanti ◽

Gita Widya Pradini

Keyword(s):

Phylogenetic Tree ◽

Phylogenetic Trees ◽

Protein Sequences ◽

Reference Sequence ◽

N Protein ◽

Asean Country ◽

Sequence Variations ◽

Complete Sequences ◽

Asean Countries ◽

Global Initiative

BACKGROUND: Nucleocapsid (N) protein is one of four structural proteins of SARS-CoV-2; it is known to be more conserved than spike protein and is highly immunogenic. This study aimed to analyze the variation of the SARS-CoV-2 N protein sequences in ASEAN countries, including Indonesia. METHODS: Complete sequences of SARS-CoV-2 N protein from each ASEAN country were obtained from the Global Initiative on Sharing All Influenza Data (GISAID), while the reference sequence was obtained from GenBank. All sequences collected from December 2019 to March 2021 were grouped into clades according to GISAID, and two representative isolates were chosen from each clade for the analysis. The sequences were aligned by MUSCLE, and phylogenetic trees were built using MEGA-X software based on the nucleotide and translated amino acid (AA) sequences. RESULTS: 98 isolates of complete N protein genes from ASEAN countries were analyzed. The nucleotides of all isolates were 97.5% conserved. Of 31 nucleotide changes, 22 led to AA substitutions; thus, the AA sequences were 94.5% conserved. The phylogenetic trees of the nucleotide and AA sequences show similar branches. Nucleotide variations in clade O (C28311T), clade GR (28881–28883 GGG>AAC), and clade GRY (28881–28883 GGG>AAC and C28977T) lead to specific branches corresponding to the clade within both trees. CONCLUSIONS: The N protein sequences of SARS-CoV-2 across ASEAN countries are highly conserved. Most isolates were closely related to the reference sequence originating from China, except the isolates representing clades O, GR, and GRY, which formed specific branches in the phylogenetic tree.

Helminthosporium velutinum and H. aquaticum sp. nov. from aquatic habitats in Yunnan Province, China

Phytotaxa ◽

10.11646/phytotaxa.253.3.1 ◽

2016 ◽

Vol 253 (3) ◽

pp. 179 ◽

Cited By ~ 6

Author(s):

DAN ZHU ◽

ZONG-LONG LUO ◽

DARBHE JAYARAMA BAHT ◽

ERIC.H.C. MCKENZIE ◽

ALI H. BAHKALI ◽

...

Keyword(s):

New Species ◽

Phylogenetic Tree ◽

Dna Sequence ◽

Yunnan Province ◽

Sequence Data ◽

Its Sequence ◽

Aquatic Habitats ◽

Submerged Wood ◽

Dna Sequence Data ◽

A New Species

Helminthosporium species from submerged wood in streams in Yunnan Province, China were studied based on morphology and DNA sequence data. Descriptions and illustrations of Helminthosporium velutinum and a new species, H. aquaticum, are provided. A combined phylogenetic tree, based on SSU, ITS and LSU sequence data, places the species in Massarinaceae, Pleosporales. The polyphyletic nature of Helminthosporium species within Massarinaceae is shown based on ITS sequence data available in GenBank.

Interpolation-Based Learning as a Mean to Speed-Up Bounded Model Checking (Short Paper)

Software Engineering and Formal Methods - Lecture Notes in Computer Science ◽

10.1007/978-3-319-66197-1_25 ◽

2017 ◽

pp. 382-387 ◽

Cited By ~ 1

Author(s):

Gianpiero Cabodi ◽

Paolo Camurati ◽

Marco Palena ◽

Paolo Pasini ◽

Danilo Vendraminetto

Keyword(s):

Model Checking ◽

Bounded Model Checking ◽

Short Paper ◽

Speed Up

Model Checking Concurrent Recursive Programs Using Temporal Logics

Mathematical Foundations of Computer Science 2014 - Lecture Notes in Computer Science ◽

10.1007/978-3-662-44522-8_37 ◽

2014 ◽

pp. 438-450 ◽

Cited By ~ 2

Author(s):

Roy Mennicke

Keyword(s):

Model Checking ◽

Temporal Logics ◽

Recursive Programs

Simulation-Based Scheduling of Waterway Projects Using a Parallel Genetic Algorithm

Transportation Systems and Engineering ◽

10.4018/978-1-4666-8473-7.ch016 ◽

2015 ◽

pp. 334-347 ◽

Cited By ~ 2

Author(s):

Ning Yang ◽

Shiaaulir Wang ◽

Paul Schonfeld

Keyword(s):

Genetic Algorithm ◽

Parallel Computing ◽

Message Passing ◽

Message Passing Interface ◽

Computation Time ◽

Parallel Genetic Algorithm ◽

Simulation Based ◽

Multiple Processors ◽

Simulation Based Optimization ◽

Speed Up

A Parallel Genetic Algorithm (PGA) is used for a simulation-based optimization of waterway project schedules. This PGA is designed to distribute a Genetic Algorithm application over multiple processors in order to speed up the solution search procedure for a very large combinational problem. The proposed PGA is based on a global parallel model, which is also called a master-slave model. A Message-Passing Interface (MPI) is used in developing the parallel computing program. A case study is presented, whose results show how the adaptation of a simulation-based optimization algorithm to parallel computing can greatly reduce computation time. Additional techniques that are found to further improve the PGA performance include: (1) choosing an appropriate task distribution method, (2) distributing simulation replications instead of different solutions, (3) avoiding the simulation of duplicate solutions, (4) avoiding running multiple simulations simultaneously in shared-memory processors, and (5) avoiding the use of multiple processors that belong to different clusters (physical sub-networks).
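
Technique (3), avoiding the simulation of duplicate solutions, can be sketched as a simple cache keyed by the candidate schedule. This is a hypothetical Python illustration, not the chapter's MPI program: `toy_simulation` stands in for an expensive simulation run, and schedules are assumed to be sequences of project ids.

```python
def evaluate_population(population, simulate, cache=None):
    """Score each schedule, running the simulation once per distinct schedule."""
    cache = {} if cache is None else cache
    scores = []
    for schedule in population:
        key = tuple(schedule)             # lists are not hashable; tuples are
        if key not in cache:
            cache[key] = simulate(schedule)
        scores.append(cache[key])
    return scores


calls = []                                # records every real simulation run


def toy_simulation(schedule):
    """Hypothetical stand-in for a costly simulation: position-weighted sum."""
    calls.append(tuple(schedule))
    return sum(position * project
               for position, project in enumerate(schedule, start=1))


population = [[1, 2, 3], [3, 2, 1], [1, 2, 3]]
scores = evaluate_population(population, toy_simulation)
```

In a master-slave arrangement the cache would naturally live on the master, so that only schedules not yet seen are sent to workers for simulation.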
