Towards Consensus Gene Ages

2016 ◽  
Author(s):  
Benjamin J. Liebeskind ◽  
Claire D. McWhite ◽  
Edward M. Marcotte

Correctly estimating the age of a gene or gene family is important for a variety of fields, including molecular evolution, comparative genomics, and phylogenetics, and increasingly for systems biology and disease genetics. However, most studies use only a point estimate of a gene's age, neglecting the substantial uncertainty involved in this estimation. Here, we characterize this uncertainty by investigating the effect of algorithm choice on gene-age inference and calculate consensus gene ages with attendant error distributions for a variety of model eukaryotes. We use thirteen orthology inference algorithms to create gene-age datasets and then characterize the error around each age-call on a per-gene and per-algorithm basis. Systematic error was found to be a large factor in estimating gene age, suggesting that simple consensus algorithms are not enough to give a reliable point estimate. We also found that different sources of error can affect downstream analyses, such as gene ontology enrichment. Our consensus gene-age datasets, with associated error terms, are made fully available at https://github.com/marcottelab/Gene-Ages so that researchers can propagate this uncertainty through their analyses.
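As a sketch of how per-algorithm age calls can be reduced to a consensus with an explicit disagreement term, consider the following pandas example. The table layout and the mode-plus-range summary are illustrative assumptions, not the published datasets' format (which is defined in the linked repository), and the three algorithm columns merely stand in for the thirteen used in the paper.

import pandas as pd

# Hypothetical layout: one row per gene, one column per orthology
# algorithm; entries are ordered age ranks (0 = oldest stratum).
calls = pd.DataFrame(
    {"OrthoMCL": [0, 3, 1], "PANTHER": [0, 4, 1], "InParanoid": [1, 3, 2]},
    index=["geneA", "geneB", "geneC"],
)

consensus = calls.mode(axis=1)[0]                # modal age call per gene
spread = calls.max(axis=1) - calls.min(axis=1)   # disagreement across algorithms
print(pd.DataFrame({"consensus": consensus, "spread": spread}))

A spread of zero means the algorithms agree; large spreads flag exactly the per-gene systematic error that a bare point estimate hides.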

Author(s):  
Nino Antulov-Fantulin ◽  
Tian Guo ◽  
Fabrizio Lillo

We study the problem of intraday short-term volume forecasting in cryptocurrency multi-markets. The predictions are built using transaction and order book data from the different markets where the exchange takes place. Methodologically, we propose a temporal mixture ensemble, capable of adaptively exploiting different sources of data for the forecast and of providing a volume point estimate as well as its uncertainty. We provide evidence that our model clearly outperforms econometric models. Moreover, our model performs slightly better than a gradient boosting machine while offering much clearer interpretability of the results. Finally, we show that these results remain robust when the prediction analysis is restricted to each volume quartile.
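To make the "point estimate as well as its uncertainty" output concrete, here is a minimal NumPy sketch of a gated mixture combiner. The component forecasts, standard deviations, and gate weights are made-up inputs, and the paper's temporal gating model is not reproduced; only the standard mixture mean/variance algebra is shown.

import numpy as np

def mixture_forecast(preds, sigmas, weights):
    """Combine per-source volume forecasts into a mixture point
    estimate and its uncertainty. preds/sigmas are per-source means
    and standard deviations; weights are gating probabilities
    (summing to 1), e.g. produced by a softmax gate."""
    preds, sigmas, weights = map(np.asarray, (preds, sigmas, weights))
    mean = np.sum(weights * preds)
    # law of total variance for a mixture distribution
    var = np.sum(weights * (sigmas**2 + preds**2)) - mean**2
    return mean, np.sqrt(var)

# e.g. one order-book-based and two transaction-based source models
m, s = mixture_forecast([12.4, 10.9, 11.8], [1.0, 2.5, 1.4], [0.5, 0.2, 0.3])
print(f"volume forecast: {m:.2f} +/- {s:.2f}")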


2014 ◽  
Vol 32 (1) ◽  
pp. 258-267 ◽  
Author(s):  
Bryan A. Moyers ◽  
Jianzhi Zhang

Phylostratigraphy is a method for dating the evolutionary emergence of a gene or gene family by identifying its homologs across the tree of life, typically using BLAST searches. Applying this method to all genes in a species, or genomic phylostratigraphy, allows investigation of genome-wide patterns of new gene origination at different evolutionary times and has therefore been used extensively. However, gene age estimation depends on the challenging task of detecting distant homologs via sequence similarity, which is expected to have different accuracies for different genes. Here, we evaluate the accuracy of phylostratigraphy by realistic computer simulation with parameters estimated from genomic data, and investigate the impact of its error on findings of genome evolution. We show that 1) phylostratigraphy substantially underestimates gene age for a considerable fraction of genes, 2) the error is especially serious when the protein evolves rapidly, is short, and/or its most conserved block of sites is small, and 3) these errors create spurious nonuniform distributions of various gene properties among age groups, many of which cannot be predicted a priori. Given the high likelihood that conclusions about gene age are faulty, we advocate the use of realistic simulation to determine whether observations from phylostratigraphy are explainable, at least qualitatively, by a null model of biased measurement, and, in all cases, critical evaluation of the results.
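A toy null simulation in this spirit, with an invented exponential detectability model standing in for the realistic BLAST simulations of the paper, already reproduces the rate-dependent underestimation described in points 1 and 2:

import numpy as np

rng = np.random.default_rng(1)

# Divergence times (arbitrary units) of successively older phylostrata;
# every simulated gene is truly as old as the oldest stratum.
strata_times = np.array([50.0, 150.0, 400.0, 800.0, 1500.0])
rates = rng.lognormal(mean=0.0, sigma=1.0, size=5000)  # per-gene evolutionary rates

def inferred_age(rate):
    # a homolog in each stratum is detected with probability that decays
    # with rate * divergence time (toy stand-in for BLAST sensitivity)
    detected = rng.random(strata_times.size) < np.exp(-0.002 * rate * strata_times)
    return strata_times[detected].max() if detected.any() else 0.0

ages = np.array([inferred_age(r) for r in rates])
slow = ages[rates <= np.median(rates)].mean()
fast = ages[rates > np.median(rates)].mean()
print(f"mean inferred age: slow-evolving {slow:.0f}, fast-evolving {fast:.0f} "
      f"(true age {strata_times[-1]:.0f} for every gene)")

Faster-evolving genes lose detectable homologs in old strata first, so they are systematically called young even though every simulated gene is equally ancient.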


Author(s):  
Kamran Malik

This research presents a new, efficient centroidal mean derivative-based numerical cubature scheme for the accurate evaluation of double integrals over a finite range. The proposed modification is based on trapezoidal-type quadrature and cubature rules. The scheme is important for applications involving complex double integrals whose exact values are unavailable and for which only approximate values can be obtained. The proposed scheme achieves higher precision and a higher order of accuracy. It is presented in basic and composite forms, with local and global error terms and the necessary supporting arguments, and its performance is evaluated against the conventional trapezoidal rule through numerical experiments. The observed error distributions of the proposed scheme are lower than those of the conventional trapezoidal cubature scheme in composite form.
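To convey the flavour of a derivative-corrected trapezoidal cubature, here is a NumPy sketch. It uses the classical Euler-Maclaurin endpoint correction, not the centroidal mean correction of the proposed scheme, whose exact weights are not reproduced here.

import numpy as np

def trap_weights(n, h):
    """Composite trapezoidal weights on n panels of width h."""
    w = np.full(n + 1, h)
    w[0] = w[-1] = h / 2.0
    return w

def corrected_trapezoid_2d(f, fx, fy, a, b, c, d, nx, ny):
    """2-D composite trapezoid plus Euler-Maclaurin end corrections
    in each coordinate (raises the order from O(h^2) to O(h^4))."""
    x, hx = np.linspace(a, b, nx + 1), (b - a) / nx
    y, hy = np.linspace(c, d, ny + 1), (d - c) / ny
    wx, wy = trap_weights(nx, hx), trap_weights(ny, hy)
    plain = wx @ f(x[:, None], y[None, :]) @ wy
    corr = (-hx**2 / 12.0 * ((fx(b, y) - fx(a, y)) @ wy)
            - hy**2 / 12.0 * ((fy(x, d) - fy(x, c)) @ wx))
    return plain, plain + corr

f = lambda x, y: np.exp(x + y)
fx = fy = f                    # both partial derivatives of exp(x+y) equal f
exact = (np.e - 1.0) ** 2      # integral of exp(x+y) over [0,1]^2
for n in (8, 16, 32):
    plain, corrected = corrected_trapezoid_2d(f, fx, fy, 0, 1, 0, 1, n, n)
    print(f"n={n:2d}  plain err={abs(plain - exact):.2e}  "
          f"corrected err={abs(corrected - exact):.2e}")

Doubling n should cut the plain error by roughly a factor of 4 and the corrected error by roughly 16, illustrating the jump in order of accuracy that derivative-based corrections provide.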


2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Shivaji Shripati Desai ◽  
D. N. Kashid

Support vector machine (SVM) regression is used to estimate the regression parameters in a modified sum of cross products (Sp) criterion. The modified criterion works well for some non-normal error distributions. The performance of existing robust methods and of the modified Sp is evaluated on simulated and real data; the results show that the modified Sp performs well.
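As an illustration of the SVM-regression building block (not the modified Sp criterion itself, and with made-up heavy-tailed data), the following sketch compares an SVM slope estimate with least squares under a non-normal error distribution:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n = 300
X = rng.uniform(-2, 2, size=(n, 1))
# heavy-tailed errors (t with 2 df): a non-normal setting in which the
# epsilon-insensitive SVM loss is less outlier-driven than squared loss
y = 1.5 * X[:, 0] + rng.standard_t(df=2, size=n)

svr = SVR(kernel="linear", C=1.0, epsilon=0.1).fit(X, y)
ols = LinearRegression().fit(X, y)
print(f"true slope 1.50 | SVR: {svr.coef_.ravel()[0]:.3f} "
      f"| OLS: {ols.coef_[0]:.3f}")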


Author(s):  
Kamran Malik

This study focuses on a Heronian mean derivative-based numerical cubature scheme for the improved evaluation of double integrals with finite limits. The proposed modification relies on trapezoidal-type quadrature and cubature schemes. The scheme is important for the numerical evaluation of complex double integrals whose exact values are unavailable and for which only approximate values can be obtained. The proposed Heronian derivative-based double-integral scheme provides efficient results with higher precision and a higher order of accuracy. It is presented in basic and composite forms, with local and global error terms and the necessary proofs, and its performance is evaluated against the conventional trapezoidal rule through numerical experiments. The observed error distributions of the scheme are lower than those of the conventional trapezoidal cubature scheme in composite form.
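For reference, the Heronian mean used here and the centroidal mean of the companion scheme above are, for $0 < a \le b$,

\[
H(a,b) = \frac{a + \sqrt{ab} + b}{3},
\qquad
C(a,b) = \frac{2\,(a^{2} + ab + b^{2})}{3\,(a + b)},
\qquad
G \le H \le A \le C,
\]

where $G$ and $A$ denote the geometric and arithmetic means of $a$ and $b$. In derivative-based trapezoidal-type rules of this family, the derivative in the correction term is evaluated at such a mean of the interval endpoints; the precise weights and error constants are given in the papers themselves.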


2021 ◽  
Author(s):  
Natalia Piotrowska ◽  
Jarosław Sikorski

¹⁴C and ²¹⁰Pb methods are regularly used to determine the ages and accumulation rates of peat, fen and lake sediments. The overall aim is to estimate the age of the discrete layers that were analysed for environmental proxies. Ideally, the age-depth models should fit the investigated proxy in terms of resolution and give precise results. Nevertheless, the differences in the nature of the dating methods and in the statistical treatment of the data need to be considered.

Both ¹⁴C and ²¹⁰Pb signals are integrated over a considerable period. Moreover, they originate from different sources: ²¹⁰Pb is bound to aerosols and trapped by peat, while ¹⁴C is bound from atmospheric CO₂ by photosynthesis. Hence, ²¹⁰Pb gives the time span during which the aerosol has been buried, whereas a ¹⁴C date gives the time of death of a plant.

After the analysis, the results are usually combined into an age-depth model. This process involves statistical treatment of the data, during which specific assumptions and simplifications are made. Depending on the algorithm, these lead to alterations in the modelled ages compared to the unmodelled data. In principle this is a desired result, increasing the robustness and decreasing the uncertainty of the age-depth model. In worse cases, however, models alter the modelled ages to an unacceptable extent, which may be overlooked if the results are treated automatically.

We test the performance of various age-depth modelling algorithms (OxCal P_Sequence, Bacon, clam, MOD-AGE) on a selected real dataset where ¹⁴C and ²¹⁰Pb data overlap and are used simultaneously. Afterwards, a point estimate is selected and used for proxy analysis on a time scale and for calculation of the accumulation rates. We also check the influence of the ²¹⁰Pb calculation method (CRS, ModAge, extrapolation technique) on the derived age-depth models.

Together with the thickness of the analysed samples, the age model provides information about the time resolution of the proxy analysis. While the age-depth curves give relatively similar answers within 95% uncertainty ranges, except in exceptional circumstances, differences are observed in the point estimates and accumulation rates, and these may be relevant for palaeoenvironmental studies. With this exercise we attempt to assess the uncertainty beyond the simple age errors reported from the measurements and from age-depth modelling.
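As a concrete example of one ingredient, here is a minimal Python sketch of the standard CRS (constant rate of supply) ²¹⁰Pb age calculation; the slice inventories are invented numbers, and the measurement uncertainties that the abstract is concerned with are omitted.

import numpy as np

LAMBDA = np.log(2) / 22.3  # 210Pb decay constant, 1/yr (22.3 yr half-life)

# Hypothetical unsupported 210Pb inventory per slice (Bq/m^2), top first
slice_inventory = np.array([95.0, 80.0, 61.0, 40.0, 22.0, 9.0, 3.0])

total = slice_inventory.sum()               # A(0): whole-core inventory
below = total - np.cumsum(slice_inventory)  # A(z): inventory below each slice base
ages = np.log(total / below[:-1]) / LAMBDA  # CRS: t(z) = ln(A(0)/A(z)) / lambda
for i, t in enumerate(ages, start=1):
    print(f"base of slice {i}: {t:6.1f} yr before coring")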


Biometrika ◽  
2019 ◽  
Author(s):  
Y Samuel Wang ◽  
Mathias Drton

We consider graphical models based on a recursive system of linear structural equations. This implies that there is an ordering, $\sigma$, of the variables such that each observed variable $Y_v$ is a linear function of a variable-specific error term and the other observed variables $Y_u$ with $\sigma(u) < \sigma (v)$. The causal relationships, i.e., which other variables the linear functions depend on, can be described using a directed graph. It has previously been shown that when the variable-specific error terms are non-Gaussian, the exact causal graph, as opposed to a Markov equivalence class, can be consistently estimated from observational data. We propose an algorithm that yields consistent estimates of the graph also in high-dimensional settings in which the number of variables may grow at a faster rate than the number of observations, but in which the underlying causal structure features suitable sparsity; specifically, the maximum in-degree of the graph is controlled. Our theoretical analysis is couched in the setting of log-concave error distributions.
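The identifiability claim (exact graph versus Markov equivalence class) can be seen in a toy two-variable sketch; the higher-order moment used as an independence check below is illustrative and is not the authors' estimator.

import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def direction_score(pred, resp):
    """Fit resp ~ pred by least squares and return E[pred * resid^3],
    a higher-order moment that vanishes when the residual is truly
    independent of the predictor (i.e., in the causal direction)."""
    beta = np.cov(pred, resp, bias=True)[0, 1] / np.var(pred)
    resid = resp - beta * pred
    return np.mean(pred * resid**3)

for noise in ("uniform", "gaussian"):
    if noise == "uniform":
        x, e = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
    else:  # Gaussian errors with matched variance 1/3
        x, e = rng.normal(0, 3**-0.5, n), rng.normal(0, 3**-0.5, n)
    y = 0.8 * x + e  # ground truth: x -> y
    print(f"{noise:8s} x->y: {direction_score(x, y):+.4f}  "
          f"y->x: {direction_score(y, x):+.4f}")

With uniform (non-Gaussian) errors the anti-causal fit leaves a detectable dependence, so the direction is identifiable; with Gaussian errors both scores vanish and only the Markov equivalence class is recoverable, as the abstract states.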


1994 ◽  
Vol 39 (9) ◽  
pp. 878-879 ◽  
Author(s):  
David C. Rowe