A rainfall-runoff probabilistic simulation program: 2. Synthetic data analysis

1996 ◽  
Vol 11 (4) ◽  
pp. 243-249 ◽  
Author(s):  
T.V. Hromadka


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Prathitha Kar ◽  
Sriram Tiruvadi-Krishnan ◽  
Jaana Männik ◽  
Jaan Männik ◽  
Ariel Amir

Collection of high-throughput data has become prevalent in biology. Large datasets allow the use of statistical constructs such as binning and linear regression to quantify relationships between variables and to hypothesize underlying biological mechanisms based on them. We discuss several such examples in relation to single-cell data and cellular growth. In particular, we show instances where what appears to be ordinary use of these statistical methods leads to incorrect conclusions, such as growth being non-exponential as opposed to exponential and vice versa. We propose that the data analysis and its interpretation should be done in the context of a generative model, if possible. In this way, the statistical methods can be validated either analytically or against synthetic data generated via the use of the model, leading to a consistent method for inferring biological mechanisms from data. On applying the validated methods of data analysis to infer cellular growth on our experimental data, we find the growth of length in E. coli to be non-exponential. Our analysis shows that in the later stages of the cell cycle the growth rate is faster than exponential.
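
The validation idea in this abstract (check a statistical method against synthetic data from a generative model with known parameters) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the exponential-growth model, the parameter values, the noise level, and the function names `simulate_exponential_growth` and `estimate_rate` are all hypothetical.

```python
import numpy as np


def simulate_exponential_growth(n_cells=500, n_points=20, rate=0.03,
                                noise_sd=0.02, seed=0):
    """Generate noisy log-length trajectories from a known exponential model.

    Hypothetical generative model: ln L(t) = ln L_birth + rate * t + noise.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 30.0, n_points)            # time since birth (assumed minutes)
    birth_len = rng.normal(2.0, 0.2, size=n_cells)  # assumed birth lengths
    log_len = np.log(birth_len)[:, None] + rate * t[None, :]
    log_len += rng.normal(0.0, noise_sd, size=log_len.shape)  # measurement noise
    return t, log_len


def estimate_rate(t, log_len):
    """Fit a straight line to each cell's log-length and average the slopes."""
    # np.polyfit accepts a 2-D y with one column per data set.
    slopes = np.polyfit(t, log_len.T, 1)[0]
    return slopes.mean()


true_rate = 0.03
t, log_len = simulate_exponential_growth(rate=true_rate)
est = estimate_rate(t, log_len)
print(f"true rate = {true_rate}, estimated rate = {est:.4f}")
```

If the estimate recovers the rate that was put into the simulation, the analysis step is at least consistent with this particular model; a mismatch would indicate that the method, not the biology, produces the apparent effect.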


1971 ◽  
Vol 8 (1) ◽  
pp. 71-77 ◽  
Author(s):  
Paul E. Green ◽  
Vithala R. Rao

This article compares, via synthetic data analysis, the performance of five different methods for scaling averaged dissimilarities data under conditions involving individual differences in “perception.” All methods perform well when no “degradation” of the (simulated) ratings is entailed. When the data are transformed to zero-one values—a procedure sometimes followed in applied studies—all procedures perform poorly compared to the no-degradation case. Implications of these results for scaling applications involving group solutions are discussed.


2013 ◽  
Vol 380-384 ◽  
pp. 2876-2879
Author(s):  
Ming Li Song ◽  
Shu Juan Wang

Spatiotemporal data are widely visible in everyday life. This paper proposes an algorithm to represent them in a granular way, as information granules. Information granules can be regarded as a collection of conceptual landmarks through which people can view the data and describe them in a semantic way. The key objective of this paper is to introduce a new granular way of analyzing data through their granulation. Several experiments are done with synthetic data, and the results clearly show how our algorithm performs.


2020 ◽  
Author(s):  
Peiyuan Zhou ◽  
Andrew K.C. Wong

Background: Statistical data analysis, especially advanced machine learning (ML) methods, has attracted considerable interest and application in clinical practice. First, the interpretability of the diagnostic/prognostic results will bring confidence to doctors, patients and their relatives in therapeutics and clinical practice. Furthermore, from the clinical aspect, when the datasets are imbalanced in diagnostic categories, the ordinary ML methods might produce results overwhelmed by the majority classes, diminishing prediction accuracy. Hence, it is desirable to have a method that could produce explicit, transparent and interpretable results in decision-making, even for data with imbalanced groups.
Methods: In order to interpret the clinical patterns and conduct diagnostic prediction of patients, we present our new method, Pattern Discovery and Disentanglement for Clinical Data Analysis (cPDD), which is able to discover patterns (correlated traits/indicants) and use them to classify clinical data even if the class distribution is imbalanced. In the most general setting, a relational dataset is a large table such that each column represents an attribute (trait/indicant) and each row contains a set of attribute values (AVs) of an entity (patient). Compared to the existing pattern discovery approaches, cPDD can discover a small and succinct set of statistically significant high-order patterns from clinical data for interpreting and predicting the disease class of the patients, even for small and rare groups.
Results: Experiments on synthetic and thoracic clinical datasets showed that cPDD can 1) discover fewer patterns compared to other existing pattern discovery methods; 2) allow the users to interpret succinct sets of patterns coming from uncorrelated sources, even when the groups are rare/small; and 3) obtain better performance in prediction compared to other interpretable classification approaches.
Conclusions: cPDD discovers fewer patterns with greater comprehensive coverage to improve the interpretability of the patterns discovered. Experimental results on synthetic data validated that cPDD discovers all patterns implanted in the data and displays them precisely and succinctly with statistical support for interpretation and prediction, a capability which the traditional ML methods lack. The success of cPDD as a novel explainable method in solving the imbalanced class problem shows its great potential for clinical data analysis for years to come.


Organizacija ◽  
2013 ◽  
Vol 46 (3) ◽  
pp. 87-97
Author(s):  
Mahdi Salehi ◽  
Ali Mohammadi ◽  
Parisa Taherzadeh Esfahani

The main objective of the current study is to examine the effect of the audit report on the cash flow-investment sensitivity of 123 companies listed on the Tehran Stock Exchange (TSE) during 2006-2010. Regression analysis and synthetic data were used for data analysis. The results showed that receiving a modified report has a significant negative effect on cash flow-investment sensitivity. The findings also suggest a significant effect of receiving a qualified report and an unqualified report with explanatory paragraphs on cash flow-investment sensitivity.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Aditya M. Limaye ◽  
Joy S. Zeng ◽  
Adam P. Willard ◽  
Karthish Manthiram

The Tafel slope is a key parameter often quoted to characterize the efficacy of an electrochemical catalyst. In this paper, we develop a Bayesian data analysis approach to estimate the Tafel slope from experimentally-measured current-voltage data. Our approach obviates the human intervention required by current literature practice for Tafel estimation, and provides robust, distributional uncertainty estimates. Using synthetic data, we illustrate how data insufficiency can unknowingly influence current fitting approaches, and how our approach allays these concerns. We apply our approach to conduct a comprehensive re-analysis of data from the CO2 reduction literature. This analysis reveals no systematic preference for Tafel slopes to cluster around certain "cardinal values" (e.g. 60 or 120 mV/decade). We hypothesize several plausible physical explanations for this observation, and discuss the implications of our finding for mechanistic analysis in electrochemical kinetic investigations.
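
To make the idea of a distributional slope estimate concrete, here is a minimal sketch that is not the authors' model. It assumes a purely linear Tafel relation (overpotential = intercept + slope × log10 of current density), Gaussian noise of known size, and flat priors evaluated on a grid. The function name `tafel_slope_posterior`, the synthetic values, and the grid ranges are hypothetical choices made for illustration.

```python
import numpy as np


def tafel_slope_posterior(log10_j, eta_mV, sigma_mV, slopes, intercepts):
    """Grid posterior over the slope, marginalized over the intercept.

    Assumes eta = intercept + slope * log10(j) + Gaussian noise (known sigma)
    and flat priors over the supplied grids.
    """
    # Predicted overpotential for every (slope, intercept) pair: shape (S, I, N).
    pred = (slopes[:, None, None] * log10_j[None, None, :]
            + intercepts[None, :, None])
    log_like = -0.5 * np.sum(((eta_mV - pred) / sigma_mV) ** 2, axis=-1)
    post = np.exp(log_like - log_like.max())  # subtract max for numerical stability
    post /= post.sum()
    return post.sum(axis=1)                   # marginal over the intercept axis


# Synthetic data with a known slope of 120 mV/decade (assumed values).
rng = np.random.default_rng(1)
true_slope, true_intercept, sigma = 120.0, 400.0, 5.0
log10_j = np.linspace(-3.0, -1.0, 15)
eta = true_intercept + true_slope * log10_j + rng.normal(0.0, sigma, log10_j.size)

slopes = np.linspace(90.0, 150.0, 301)
intercepts = np.linspace(300.0, 500.0, 401)
marginal = tafel_slope_posterior(log10_j, eta, sigma, slopes, intercepts)

post_mean = np.sum(slopes * marginal)
post_sd = np.sqrt(np.sum((slopes - post_mean) ** 2 * marginal))
print(f"posterior slope: {post_mean:.1f} +/- {post_sd:.1f} mV/decade")
```

The output is a distribution over the slope rather than a single fitted number, so narrowing the range of `log10_j` or reducing the number of points visibly widens `post_sd`, which is one simple way to see the effect of data insufficiency.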

