Inferring demographic parameters in bacterial genomic data using Bayesian and hybrid phylogenetic methods

2017 ◽  
Author(s):  
Sebastian Duchene ◽  
David Duchene ◽  
Jemma Geoghegan ◽  
Zoe Anne Dyson ◽  
Jane Hawkey ◽  
...  

Background: Recent developments in sequencing technologies make it possible to obtain genome sequences from a large number of isolates in a very short time. Bayesian phylogenetic approaches can take advantage of these data by simultaneously inferring the phylogenetic tree, evolutionary timescale, and demographic parameters (such as population growth rates), while naturally integrating uncertainty in all parameters. Despite their desirable properties, Bayesian approaches can be computationally intensive, hindering their use for outbreak investigations involving genome data for a large number of pathogen isolates. An alternative to full Bayesian inference is a hybrid approach, where the phylogenetic tree and evolutionary timescale are estimated first using maximum likelihood. Under this hybrid approach, demographic parameters are inferred from the estimated trees instead of the sequence data, using maximum likelihood, Bayesian inference, or approximate Bayesian computation. This can vastly reduce the computational burden, but has the disadvantage of ignoring the uncertainty in the phylogenetic tree and evolutionary timescale. Results: We compared the performance of a fully Bayesian and a hybrid method by analysing six whole-genome SNP data sets from a range of bacteria, together with simulations. The estimates from the two methods were very similar, suggesting that the hybrid method is a valid alternative for very large data sets. However, we also found that congruence between these methods is contingent on the presence of strong temporal structure in the data (i.e. clocklike behaviour), which is typically verified using a date-randomisation test in a Bayesian framework. To reduce the computational burden of this Bayesian test, we implemented a date-randomisation test using a rapid maximum likelihood method, which has similar performance to its Bayesian counterpart.
Conclusions: Hybrid approaches can produce reliable inferences of evolutionary timescales and phylodynamic parameters in a fraction of the time required for fully Bayesian analyses. As such, they are a valuable alternative in outbreak studies involving a large number of isolates.
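
The idea behind a date-randomisation test can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps their rapid maximum likelihood rate estimate for a simple root-to-tip regression slope, and the function names, the number of replicates, and the pass criterion (the observed rate must exceed every rate obtained with shuffled dates) are all assumptions made for illustration.

```python
import random


def regression_slope(dates, distances):
    """Least squares slope of root-to-tip distance against sampling date.

    Used here as a crude stand-in for a substitution rate estimate.
    """
    n = len(dates)
    mean_date = sum(dates) / n
    mean_dist = sum(distances) / n
    sxx = sum((d - mean_date) ** 2 for d in dates)
    sxy = sum((d - mean_date) * (r - mean_dist)
              for d, r in zip(dates, distances))
    return sxy / sxx


def date_randomisation_test(dates, distances, replicates=100, seed=0):
    """Compare the observed rate with rates obtained after shuffling dates.

    Returns the observed rate, the list of randomised rates, and a flag
    that is True when the observed rate exceeds every randomised rate
    (an assumed, simplified criterion for temporal structure).
    """
    rng = random.Random(seed)
    observed = regression_slope(dates, distances)
    null_rates = []
    for _ in range(replicates):
        shuffled = list(dates)
        rng.shuffle(shuffled)  # break the link between tips and dates
        null_rates.append(regression_slope(shuffled, distances))
    return observed, null_rates, observed > max(null_rates)
```

With perfectly clocklike toy data (distances that grow linearly with sampling date) the observed slope is larger than any slope from shuffled dates, so the flag is True; with no temporal structure the observed slope typically falls inside the randomised range.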

2016 ◽  
Vol 1 (1) ◽  
pp. 1-12 ◽  
Author(s):  
Basant K. Tiwary

Background/Aims: A recent duplication of the gene encoding SLIT-ROBO Rho GTPase-activating protein 2 (SRGAP2) in the primate lineage has been proposed to be associated with the human-specific extraordinary development of intelligence. There is no report regarding the role of the SRGAP2 gene in the expression of neural traits indicating intelligence in mammals. Methods: A phylogenetic tree of the SRGAP2 gene from 11 mammals was reconstructed using MrBayes. The evolution of neural traits along the branches of the phylogenetic tree was modeled in BayesTraits, and the dN/dS ratio (i.e. the ratio between the number of nonsynonymous substitutions per nonsynonymous site and the number of synonymous substitutions per synonymous site) was estimated using the codon-based maximum likelihood method (CODEML) in PAML (phylogenetic analysis by maximum likelihood). Results: Two neural traits, namely brain mass and the number of cortical neurons, showed statistical dependency on the underlying evolutionary history of the SRGAP2 gene in mammals. A significant positive correlation between the increase in cortical neurons and the rate of nucleotide substitutions in the SRGAP2 gene was observed, together with a significant negative correlation between the increase in cortical neurons and the rate of nonsynonymous substitutions in the gene. The SRGAP2 gene appears to be under intense purifying selection in all mammalian lineages, reflecting stringent functional constraint. Conclusion: This work indicates a key role of the SRGAP2 gene in the rapid expansion of neurons in the brain cortex, thereby facilitating the evolution of remarkable intelligence in mammals.
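
The dN/dS definition given in the Methods can be made concrete with a small sketch. It only forms the ratio from raw counts; it is not what CODEML does, since CODEML estimates the ratio by maximum likelihood under a codon model and accounts for multiple substitutions at a site. The function names and the example counts are hypothetical.

```python
def dn_ds_ratio(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Ratio of nonsynonymous to synonymous substitutions per site.

    Uses raw proportions with no correction for multiple hits, so it is
    only a rough illustration of the quantity, not a CODEML estimate.
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    dn = nonsyn_subs / nonsyn_sites
    ds = syn_subs / syn_sites
    if ds == 0:
        raise ValueError("dS is zero, so the ratio is undefined")
    return dn / ds


def selection_regime(omega, tolerance=1e-9):
    """Conventional reading of the ratio: below 1, equal to 1, above 1."""
    if omega < 1 - tolerance:
        return "purifying"
    if omega > 1 + tolerance:
        return "positive"
    return "neutral"
```

For example, 5 nonsynonymous changes over 500 nonsynonymous sites and 20 synonymous changes over 200 synonymous sites give a ratio of about 0.1, which falls in the purifying range that the abstract reports for SRGAP2.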


Author(s):  
Sergei L Kosakovsky Pond ◽  
Sadie R Wisotsky ◽  
Ananias Escalante ◽  
Brittany Rife Magalis ◽  
Steven Weaver

Abstract A number of evolutionary hypotheses can be tested by comparing selective pressures among sets of branches in a phylogenetic tree. When the question of interest is to identify specific sites within genes that may be evolving differently, a common approach is to perform separate analyses on subsets of sequences and compare parameter estimates in a post hoc fashion. This approach is statistically suboptimal and not always applicable. Here, we develop a simple extension of a popular fixed effects likelihood method in the context of codon-based evolutionary phylogenetic maximum likelihood testing, Contrast-FEL. It is suitable for identifying individual alignment sites where any among the K≥2 sets of branches in a phylogenetic tree have detectably different ω ratios, indicative of different selective regimes. Using extensive simulations, we show that Contrast-FEL delivers good power, exceeding 90% for sufficiently large differences, while maintaining tight control over false positive rates, when the model is correctly specified. We conclude by applying Contrast-FEL to data from five previously published studies spanning a diverse range of organisms and focusing on different evolutionary questions.
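
A site-level test of this kind can be sketched as a likelihood ratio test. The sketch assumes the simplest case, K = 2 branch sets, where the null model shares one ω across both sets and the alternative lets ω differ, and it assumes a chi-squared reference distribution with one degree of freedom. The log-likelihood values are taken as given inputs; fitting the codon models is outside the scope of the sketch, and the function name is hypothetical.

```python
import math


def lrt_p_value_one_df(loglik_null, loglik_alt):
    """P-value of a likelihood ratio test with one degree of freedom.

    loglik_null: site log-likelihood with a shared omega (assumed input).
    loglik_alt:  site log-likelihood with separate omegas (assumed input).
    """
    statistic = 2.0 * (loglik_alt - loglik_null)
    if statistic <= 0:
        return 1.0  # the alternative does not fit better than the null
    # Survival function of chi-squared with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))
```

A site would be flagged when the p-value falls below the chosen threshold; with more than two branch sets the degrees of freedom would change and a general chi-squared survival function would be needed instead of the one-degree-of-freedom shortcut used here.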


Author(s):  
Sergei L. Kosakovsky Pond ◽  
Sadie R Wisotsky ◽  
Ananias Escalante ◽  
Brittany Rife Magalis ◽  
Steven Weaver

Abstract A number of evolutionary hypotheses can be tested by comparing selective pressures among sets of branches in a phylogenetic tree. When the question of interest is to identify specific sites within genes that may be evolving differently, a common approach is to perform separate analyses on subsets of sequences, and compare parameter estimates in a post hoc fashion. This approach is statistically suboptimal, and not always applicable. Here, we develop a simple extension of a popular fixed effects likelihood method in the context of codon-based evolutionary phylogenetic maximum likelihood testing, Contrast-FEL. It is suitable for identifying individual alignment sites where any among the K ≥ 2 sets of branches in a phylogenetic tree have detectably different dN/dS ratios, indicative of different selective regimes. Using extensive simulations, we show that Contrast-FEL delivers good power, exceeding 90% for sufficiently large differences, while maintaining tight control over false positive rates. We conclude by applying Contrast-FEL to data from five previously published studies spanning a diverse range of organisms and focusing on different evolutionary questions.


2013 ◽  
Vol 1 (5) ◽  
pp. 6001-6024 ◽  
Author(s):  
K. Kochanek ◽  
W. G. Strupczewski ◽  
E. Bogdanowicz ◽  
W. Feluch ◽  
I. Markiewicz

Abstract. The alleged changes in river flow regimes have led to a surge in methods of non-stationary flood frequency analysis (NFFA). The maximum likelihood method is said to produce large systematic errors in moments and quantiles, resulting mainly from an incorrect model assumption (model error), unless the model is the normal distribution. Since estimators obtained by the method of linear moments (L-moments) yield much lower model errors than those obtained by maximum likelihood, a new two-stage NFFA methodology based on the concept of L-moments was developed to improve the accuracy of the parameters and quantiles in the non-stationary case. Besides taking advantage of the positive characteristics of L-moments, the new technique also keeps the calculations "distribution independent" for as long as possible. The two stages consist of (1) least squares estimation of trends in the mean value and/or the standard deviation and "de-trendisation" of the time series, and (2) estimation of parameters and quantiles from the stationary sample by the L-moments method and "re-trendisation" of the quantiles. As a result, time-dependent quantiles for a given time and return period can be calculated. The comparative results of Monte Carlo simulations confirmed the superiority of the two-stage NFFA methodology over the classical maximum likelihood one. Further analysis of trends in GEV-parent-distributed generic time series by means of both NFFA methods revealed large differences between the classical and two-stage estimators of trends obtained for the same data with the same model (GEV or Gumbel). Additionally, it turned out that the quantiles estimated by traditional stationary flood frequency analysis equal the non-stationary quantiles only at the exact middle of the time series. This shows that using traditional stationary methods under a variable regime is too great a simplification and leads to erroneous results.
Therefore, when the phenomenon is non-stationary, so should be the methods used for its interpretation.
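
The two stages described above can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes a linear trend in the mean only (no trend in the standard deviation), fits a Gumbel distribution to the de-trended sample using the first two sample L-moments, and uses hypothetical function names throughout.

```python
import math

EULER_GAMMA = 0.5772156649015329


def linear_trend(values):
    """Least squares fit of values[t] = intercept + slope * t, t = 0..n-1."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    slope = sxy / sxx
    return v_mean - slope * t_mean, slope


def sample_l_moments(values):
    """First two sample L-moments from unbiased probability-weighted moments."""
    ordered = sorted(values)
    n = len(ordered)
    b0 = sum(ordered) / n
    b1 = sum(i * x for i, x in enumerate(ordered)) / (n * (n - 1))
    return b0, 2 * b1 - b0


def two_stage_quantile(values, t, return_period):
    """Time-dependent Gumbel quantile for time index t and a return period."""
    intercept, slope = linear_trend(values)
    # Stage 1: remove the fitted trend in the mean ("de-trendisation").
    residuals = [v - (intercept + slope * i) for i, v in enumerate(values)]
    # Stage 2: Gumbel parameters from L-moments of the stationary sample.
    l1, l2 = sample_l_moments(residuals)
    scale = l2 / math.log(2)
    location = l1 - EULER_GAMMA * scale
    non_exceedance = 1 - 1 / return_period
    stationary_q = location - scale * math.log(-math.log(non_exceedance))
    # Put the trend back at the requested time ("re-trendisation").
    return intercept + slope * t + stationary_q
```

Because only the mean is allowed to change in this sketch, the quantile for a fixed return period shifts over time by exactly the fitted slope; handling a trend in the standard deviation would additionally require rescaling the residuals before stage 2.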


2021 ◽  
Vol 46 (2) ◽  
pp. 259
Author(s):  
Rara Erlina Oktafia ◽  
Badruzsaufari Badruzsaufari

The genus Garcinia has a complicated taxonomy because of the high morphological similarity among its members. This phylogenetic analysis of Garcinia species, based on rRNA gene sequences, was intended to determine the evolutionary relationships among the species. The research used 20 rRNA gene sequences of Garcinia species selected from GenBank at the National Center for Biotechnology Information (NCBI). The sequences were aligned using ClustalW in the MEGA X application. A phylogenetic tree was constructed using the Maximum Likelihood method and the Kimura 2-parameter model. The results of the analysis showed that the cladogram was monophyletic and was divided into 3 clades. Clade I consisted of G. celebica, G. hombroniana, G. opaca, G. mangostana, G. malaccensis, G. penangiana, G. scortechinii, G. hanburyi, and G. urophylla. Clade II included G. atroviridis, G. bancana, G. forbesii, G. griffithii, G. cowa, G. nigrolineata, G. globulosa, and G. parvifolia. Clade III was composed of G. rostrata, G. nervosa, and G. prainiana. The Garcinia species considered the most primitive and the closest to the ancestor is G. nervosa.
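
The Kimura 2-parameter model treats transitions and transversions separately. The sketch below computes the standard K2P pairwise distance between two aligned sequences; it does not reproduce the maximum likelihood tree search performed in MEGA X, and the function name and the choice to skip gaps and ambiguous bases are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Columns containing anything other than A, C, G or T are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in BASES or b not in BASES:
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

For very divergent sequences the arguments of the logarithms can become non-positive, in which case `math.log` raises a `ValueError` and the distance is undefined under this model.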


Author(s):  
Anggis Sagitarisman ◽  
Aceng Komarudin Mutaqin

Abstract Car manufacturers in Indonesia need to determine reasonable warranty costs that burden neither the company nor consumers. Several statistical approaches have been developed to analyze warranty costs. One of them is the Gertsbakh-Kordonsky method, which reduces the two-dimensional warranty problem to a one-dimensional one. In this research, we apply the Gertsbakh-Kordonsky method to estimate the warranty cost for car type A at company XYZ. The distribution of the reduced one-dimensional data is identified with the Kolmogorov-Smirnov goodness-of-fit test, and the distribution parameters are estimated by the maximum likelihood method. Three approaches are used to estimate the distribution parameters; they differ in how mileage is calculated for units with no claim within the warranty period. In the application, we use claim data for car type A. Data exploration indicates that failures of car type A are mostly due to the age of the vehicle. The Kolmogorov-Smirnov test shows that the most appropriate distribution for the reduced data is the three-parameter Weibull. The estimate obtained with the Gertsbakh-Kordonsky method shows that the warranty cost for car type A is around 3.54% of the selling price of the unit without warranty, i.e. around Rp. 4,248,000 per unit.
Keywords: warranty costs; the Gertsbakh-Kordonsky method; maximum likelihood estimation; Kolmogorov-Smirnov test.
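
The goodness-of-fit step can be illustrated with a minimal sketch of the one-sample Kolmogorov-Smirnov statistic against a three-parameter Weibull distribution. The parameter names (shape, scale, location) and function names are assumptions, the parameters are taken as already estimated, and the sketch does not cover the Gertsbakh-Kordonsky reduction or the maximum likelihood fit.

```python
import math


def weibull3_cdf(x, shape, scale, location):
    """CDF of the three-parameter Weibull distribution."""
    if x <= location:
        return 0.0
    return 1.0 - math.exp(-(((x - location) / scale) ** shape))


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of sample and a model CDF."""
    ordered = sorted(sample)
    n = len(ordered)
    largest_gap = 0.0
    for i, x in enumerate(ordered, start=1):
        model = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        largest_gap = max(largest_gap, i / n - model, model - (i - 1) / n)
    return largest_gap
```

A typical call would be `ks_statistic(data, lambda x: weibull3_cdf(x, shape, scale, location))`. When the parameters are estimated from the same data, the usual tabulated critical values for the statistic are only approximate.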

