Empirical evaluation of directional-dependence tests

2015 ◽  
Vol 39 (6) ◽  
pp. 560-569 ◽  
Author(s):  
Felix Thoemmes

Testing of directional dependence is a method to infer causal direction that has recently attracted some attention. Previous examples, e.g. by von Eye and DeShon (2012a), and extensive simulation studies by Pornprasertmanit and Little (2012) have demonstrated that under specific assumptions, directional-dependence tests can recover the true causal direction between two variables. Simulation results are important in the evaluation of any statistical method, but they are necessarily less complex than real data, which come with potential irregularities (e.g. departures from linearity, presence of confounders). In this article, we evaluate the performance of directional-dependence tests using benchmark data consisting of 65 variable pairs with known causal order. We find that between 21% and 43% of all cases are correctly classified by directional-dependence tests that rely on differences in skew, kurtosis, or a combined measure. We then examine some of the assumptions of the directional-dependence test and find that for virtually all variable pairs, some assumptions are violated. When only pairs that fulfill the assumptions are selected, the performance of all directional-dependence tests improves. We probe whether particular features of the variable pairs influence whether a test yields a correct or incorrect result, but find no strong predictors. Our findings provide a complementary picture to previously conducted simulation studies and highlight the fact that directional-dependence tests are prone to causal classification errors when key assumptions are violated. Such violations are potentially common in real data.
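
To make the skew-based idea concrete, here is a minimal sketch in Python of one such heuristic, under the assumption (common in this literature) that a linear model with normal error pulls the response toward normality, so the more skewed of the two variables is taken as the cause. The function name and the decision rule are illustrative, not the authors' exact tests.

```python
# A minimal sketch of a skewness-based directional-dependence heuristic.
# Under a linear model with a non-normal cause and normal error, the response
# is closer to normal than the cause, so the more skewed variable is taken
# as the cause. The decision rule here is illustrative only.
import numpy as np
from scipy import stats

def direction_by_skew(x, y):
    """Return 'x->y' if x is more skewed than y, else 'y->x'."""
    sx, sy = abs(stats.skew(x)), abs(stats.skew(y))
    return "x->y" if sx > sy else "y->x"

rng = np.random.default_rng(0)
x = rng.exponential(size=2000)          # skewed cause
y = 0.8 * x + rng.normal(size=2000)     # y inherits attenuated skew
print(direction_by_skew(x, y))          # expected: 'x->y'
```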

2016 ◽  
Vol 40 (1) ◽  
pp. 318-330 ◽  
Author(s):  
Amirhossein Amiri ◽  
Reza Ghashghaei ◽  
Mohammad Reza Maleki

In this paper, we investigate the misleading effects of measurement errors on the simultaneous monitoring of the multivariate process mean and variability. For this purpose, we incorporate measurement errors into a hybrid method based on the generalized likelihood ratio (GLR) and exponentially weighted moving average (EWMA) control charts. We then propose four remedial methods to reduce the effects of measurement errors on the performance of the monitoring procedure. The performance of the monitoring procedure, as well as that of the proposed remedial methods, is investigated through extensive simulation studies and a real data example.
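
The GLR/EWMA hybrid itself is not reproduced here, but the following sketch illustrates the core ingredients: a multivariate EWMA (MEWMA) statistic computed on error-contaminated observations, together with the classical "multiple measurements" remedy, in which averaging k repeated readings shrinks the error covariance by a factor of k. All names and parameter values are ours.

```python
# A minimal sketch of an MEWMA chart statistic under additive measurement
# error, with the "multiple measurements" remedy. This only illustrates the
# effect of the error; it is not the authors' GLR/EWMA hybrid.
import numpy as np

def mewma_stats(obs, sigma, lam=0.2):
    """Return the T^2 sequence of an MEWMA chart for observations `obs`,
    using the asymptotic covariance of the EWMA vector."""
    p = obs.shape[1]
    sigma_z = (lam / (2 - lam)) * sigma
    inv = np.linalg.inv(sigma_z)
    z = np.zeros(p)
    t2 = []
    for w in obs:
        z = lam * w + (1 - lam) * z
        t2.append(z @ inv @ z)
    return np.array(t2)

rng = np.random.default_rng(1)
p, n, k = 2, 200, 5
sigma_x = np.eye(p)                   # true process covariance
sigma_e = 0.5 * np.eye(p)             # measurement-error covariance
x = rng.multivariate_normal(np.zeros(p), sigma_x, size=n)
x[100:] += 1.0                        # mean shift at t = 100
# k repeated readings per item, averaged: error covariance becomes sigma_e / k
w = x + rng.multivariate_normal(np.zeros(p), sigma_e / k, size=n)
t2 = mewma_stats(w, sigma_x + sigma_e / k)
print(t2[:100].mean(), t2[100:].mean())   # the shift inflates the statistic
```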


2021 ◽  
Author(s):  
Jakob Raymaekers ◽  
Peter J. Rousseeuw

Abstract Many real data sets contain numerical features (variables) whose distribution is far from normal (Gaussian). Instead, their distribution is often skewed. In order to handle such data it is customary to preprocess the variables to make them more normal. The Box–Cox and Yeo–Johnson transformations are well-known tools for this. However, the standard maximum likelihood estimator of their transformation parameter is highly sensitive to outliers, and will often try to move outliers inward at the expense of the normality of the central part of the data. We propose a modification of these transformations as well as an estimator of the transformation parameter that is robust to outliers, so the transformed data can be approximately normal in the center and a few outliers may deviate from it. It compares favorably to existing techniques in an extensive simulation study and on real data.
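
The sensitivity that motivates the robust estimator is easy to demonstrate. The sketch below uses scipy's maximum-likelihood Yeo–Johnson fit and shows how a handful of outliers shifts the estimated transformation parameter; the robust estimator proposed in the paper is not reproduced.

```python
# A minimal sketch showing how a few outliers pull the maximum-likelihood
# Yeo-Johnson parameter away from the value chosen for the clean data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
clean = rng.lognormal(mean=0.0, sigma=0.6, size=1000)      # skewed data
_, lam_clean = stats.yeojohnson(clean)                     # ML fit

contaminated = np.concatenate([clean, np.full(10, 50.0)])  # a few outliers
_, lam_contam = stats.yeojohnson(contaminated)

print(f"lambda (clean):        {lam_clean:.3f}")
print(f"lambda (contaminated): {lam_contam:.3f}")          # noticeably shifted
```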


Author(s):  
Guanghao Qi ◽  
Nilanjan Chatterjee

Abstract
Background: Previous studies have often evaluated methods for Mendelian randomization (MR) analysis based on simulations that do not adequately reflect the data-generating mechanisms in genome-wide association studies (GWAS), and there are often discrepancies in the performance of MR methods between simulations and real data sets.
Methods: We use a simulation framework that generates data on full GWAS for two traits under a realistic model for effect-size distribution, coherent with the heritability, co-heritability and polygenicity typically observed for complex traits. We further use recent data generated from GWAS of 38 biomarkers in the UK Biobank and perform down-sampling to investigate trends in estimates of the causal effects of these biomarkers on the risk of type 2 diabetes (T2D).
Results: Simulation studies show that weighted mode and MRMix are the only two methods that maintain the correct type I error rate in a diverse set of scenarios. Between the two methods, MRMix tends to be more powerful for larger GWAS, whereas the opposite is true for smaller sample sizes. Among the other methods, random-effect IVW (inverse-variance weighted method), MR-Robust and MR-RAPS (robust adjusted profile score) tend to perform best in maintaining a low mean-squared error when the InSIDE assumption is satisfied, but can produce large bias when InSIDE is violated. In real-data analysis, some biomarkers showed major heterogeneity in estimates of their causal effects on the risk of T2D across the different methods, and estimates from many methods trended in one direction with increasing sample size, with patterns similar to those observed in simulation studies.
Conclusion: The relative performance of different MR methods depends heavily on the sample sizes of the underlying GWAS, the proportion of valid instruments and the validity of the InSIDE assumption. Down-sampling analysis can be used in large GWAS for the possible detection of bias in the MR methods.
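
For reference, the random-effect IVW estimator mentioned above can be written in a few lines from GWAS summary statistics. The sketch below uses simulated per-SNP effects and a multiplicative random-effect variance inflation; it is a textbook version, not the exact implementation benchmarked in the paper.

```python
# A minimal sketch of the random-effect inverse-variance weighted (IVW)
# Mendelian-randomization estimator. bx, by are per-SNP effects on exposure
# and outcome; sy are the standard errors of by. Inputs are simulated.
import numpy as np

def ivw(bx, by, sy):
    """IVW causal-effect estimate and standard error (multiplicative
    random-effect version: SE inflated when heterogeneity exceeds 1)."""
    w = bx**2 / sy**2                          # weights of the ratio estimates
    beta = np.sum(w * by / bx) / np.sum(w)
    resid = (by / bx - beta) * np.sqrt(w)      # standardized residuals
    phi = max(1.0, np.sum(resid**2) / (len(bx) - 1))
    se = np.sqrt(phi / np.sum(w))
    return beta, se

rng = np.random.default_rng(3)
m, true_beta = 50, 0.3
bx = rng.normal(0.1, 0.02, m)                  # instrument-exposure effects
sy = np.full(m, 0.01)
by = true_beta * bx + rng.normal(0, sy)        # all instruments valid here
print(ivw(bx, by, sy))                         # ~ (0.3, small SE)
```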


2021 ◽  
Vol 11 (8) ◽  
pp. 3484
Author(s):  
Martin Tabakov ◽  
Adrian Chlopowiec ◽  
Adam Chlopowiec ◽  
Adam Dlubak

In this research, we introduce a classification procedure based on rule induction and fuzzy reasoning. The classifier generalizes attribute information to handle the uncertainty that often occurs in real data. To induce fuzzy rules, we define the corresponding fuzzy information system. A transformation of the derived rules into interval type-2 fuzzy rules is provided as well. The applied fuzzification is optimized with respect to the footprint of uncertainty of the corresponding type-2 fuzzy sets. The classification process is based on Mamdani-type fuzzy inference. The proposed method was evaluated with the F-score measure on benchmark data.
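
The interval type-2 machinery is not reproduced here, but a minimal type-1 sketch of Mamdani-style inference (triangular memberships, min implication, max aggregation, centroid defuzzification) conveys the reasoning step; the rule base and membership parameters are invented for illustration.

```python
# A minimal sketch of Mamdani-style inference with triangular type-1
# memberships. The paper's interval type-2 extension is not shown.
import numpy as np

def tri(x, a, b, c):
    """Triangular membership with support [a, c] and peak at b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

u = np.linspace(0.0, 10.0, 501)          # discretized output universe

def infer(x):
    """Two rules: IF x is LOW THEN y is LOW; IF x is HIGH THEN y is HIGH."""
    fire_low = tri(x, -0.1, 0.0, 5.0)    # firing strengths of the rules
    fire_high = tri(x, 5.0, 10.0, 10.1)
    # Mamdani: clip each consequent at its firing strength, aggregate by max
    agg = np.maximum(np.minimum(fire_low, tri(u, 0, 2, 4)),
                     np.minimum(fire_high, tri(u, 6, 8, 10)))
    return np.sum(u * agg) / np.sum(agg)  # centroid defuzzification

print(infer(2.0), infer(8.0))            # low input -> ~2, high input -> ~8
```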


2021 ◽  
Vol 15 (4) ◽  
pp. 1-20
Author(s):  
Georg Steinbuss ◽  
Klemens Böhm

Benchmarking unsupervised outlier detection is difficult. Outliers are rare, and existing benchmark data contain outliers with varied and unknown characteristics. Fully synthetic data usually consist of outliers and regular instances with clear characteristics and thus, in principle, allow for a more meaningful evaluation of detection methods. Nonetheless, there have been only a few attempts to include synthetic data in benchmarks for outlier detection. This might be due to the imprecise notion of outliers or to the difficulty of achieving good coverage of different domains with synthetic data. In this work, we propose a generic process for generating datasets for such benchmarking. The core idea is to reconstruct regular instances from existing real-world benchmark data while generating outliers so that they exhibit insightful characteristics. We describe three instantiations of this generic process that generate outliers with specific characteristics, such as local outliers. To validate the process, we perform a benchmark with state-of-the-art detection methods and carry out experiments to study the quality of data reconstructed in this way. Next to showcasing the workflow, this confirms the usefulness of the proposed process. In particular, the process yields regular instances close to the ones from real data. Summing up, we propose and validate a new and practical process for the benchmarking of unsupervised outlier detection.
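
A concept-level sketch of the core idea, under simplifying assumptions: fit a density model (here a Gaussian mixture) to real data to reconstruct regular instances, then plant "local" outliers by inflating each sampled point's deviation from its component mean. The model choice and the factor 3.0 are ours, not the paper's exact instantiations.

```python
# A minimal sketch: reconstruct regular instances from a fitted density
# model, then generate outliers with a chosen characteristic (local).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
real = np.vstack([rng.normal(0, 1, (300, 2)), rng.normal(6, 1, (300, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(real)
regular, _ = gmm.sample(600)                  # reconstructed regular instances

pts, comp = gmm.sample(20)                    # seeds for outliers
outliers = gmm.means_[comp] + 3.0 * (pts - gmm.means_[comp])  # push outward

data = np.vstack([regular, outliers])         # benchmark dataset with labels
labels = np.r_[np.zeros(len(regular)), np.ones(len(outliers))]
```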


Author(s):  
Mohadese Jahanian ◽  
Amin Ramezani ◽  
Ali Moarefianpour ◽  
Mahdi Aliari Shouredeli

One of the most significant systems that can be described by partial differential equations (PDEs) is the transmission pipeline system. To avoid accidents originating from oil and gas pipeline leakage, the exact location and quantity of a leak must be identified. The goal is leakage diagnosis based on the system model and on real data provided by transmission-line systems. The nonlinear equations of the system are derived from the continuity and momentum equations. In this paper, the extended Kalman filter (EKF) is used to detect and locate leakage and to attenuate the negative effects of measurement and process noise. In addition, a robust extended Kalman filter (REKF) is applied to compensate for the effect of parameter uncertainty. The quantity and location of the leak are estimated along the pipeline. Simulation results show that the REKF estimates the leak and its location better than the EKF. This filter is robust against process noise, measurement noise, and parameter uncertainties, and also guarantees an upper bound on the covariance of the state-estimation error. Notably, the simulation results are validated with the OLGA software.
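
The pipeline model discretized from the continuity and momentum PDEs is not reproduced here; the sketch below only shows the generic EKF predict/update cycle that such a leak estimator is built on, with the leak size and location assumed to be appended to the state vector (an augmented-state formulation). All symbols are placeholders.

```python
# A minimal sketch of one extended Kalman filter (EKF) cycle for a generic
# nonlinear system x' = f(x) + w, z = h(x) + v. The functions f, h and their
# Jacobians F, H stand in for the discretized pipeline model.
import numpy as np

def ekf_step(x, P, z, f, F, h, H, Q, R):
    """One EKF predict/update cycle: returns updated state and covariance."""
    # predict
    x_pred = f(x)
    F_k = F(x)
    P_pred = F_k @ P @ F_k.T + Q
    # update with measurement z
    H_k = H(x_pred)
    S = H_k @ P_pred @ H_k.T + R
    K = P_pred @ H_k.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H_k) @ P_pred
    return x_new, P_new
```

The REKF used in the paper replaces this gain computation with one that accounts for bounded model uncertainty, which is what yields the guaranteed covariance bound.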


Author(s):  
R. W. Toogood

Abstract A number of programs have been developed for the automatic symbolic generation of efficient computer code for the dynamic analysis of serial rigid- and flexible-link manipulators. Code for both the inverse and the direct dynamics computations can be generated. The symbolic generators allow the robot base to be given an arbitrary linear acceleration and/or angular velocity and acceleration. The efficiency of the generated code is an important consideration for simulation studies and/or implementation in control systems. This paper briefly describes the symbolic generation and simplification techniques. The added computational load due to including the base motion is discussed. Some dynamics simulation results are presented for a 3R rigid-link manipulator mounted on an oscillating base, which graphically illustrate the effect of the base movement on the dynamics.
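
As a toy version of the idea, the sketch below symbolically derives the inverse dynamics of a single rigid link whose base undergoes a constant vertical acceleration (far simpler than the paper's 3R case and its oscillating base), simplifies the expression, and emits a numeric routine, which is the essence of symbolic code generation.

```python
# A minimal sketch of symbolic generation of manipulator dynamics code:
# single link on a base with constant vertical acceleration a_b.
import sympy as sp

t = sp.symbols('t')
m, l, g, ab = sp.symbols('m l g a_b')   # mass, length, gravity, base accel.
th = sp.Function('theta')(t)

# point mass at the link tip; the base motion adds a_b*t**2/2 of height
x = l * sp.sin(th)
y = -l * sp.cos(th) + ab * t**2 / 2

T = m * (sp.diff(x, t)**2 + sp.diff(y, t)**2) / 2   # kinetic energy
V = m * g * y                                        # potential energy
L = T - V

# Euler-Lagrange equation gives the inverse dynamics (joint torque)
tau = sp.simplify(sp.diff(sp.diff(L, sp.diff(th, t)), t) - sp.diff(L, th))
print(tau)   # base acceleration enters exactly like an extra gravity term

# emit an efficient numeric routine from the simplified expression
q, qd, qdd = sp.symbols('q qd qdd')
tau_s = (tau.subs(sp.diff(th, t, 2), qdd)
            .subs(sp.diff(th, t), qd)
            .subs(th, q))
tau_fn = sp.lambdify((q, qd, qdd, m, l, g, ab), tau_s)
```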


Signals ◽  
2022 ◽  
Vol 3 (1) ◽  
pp. 1-10
Author(s):  
Md. Noor-A-Rahim ◽  
M. Omar Khyam ◽  
Apel Mahmud ◽  
Xinde Li ◽  
Dirk Pesch ◽  
...  

Long-range (LoRa) communication has attracted much attention recently due to its utility for many Internet of Things applications. However, one of the key problems of LoRa technology is that it is vulnerable to noise/interference due to the use of only up-chirp signals during modulation. In this paper, to solve this problem, we propose a modulation scheme for LoRa communication based on joint up- and down-chirps, unlike the conventional LoRa modulation scheme. A fast Fourier transform (FFT)-based demodulation scheme is devised to detect the modulated symbols. To further improve the demodulation performance, a hybrid demodulation scheme comprising FFT- and correlation-based demodulation is also proposed. The performance of the proposed scheme is evaluated through extensive simulation results. We show that, compared to the conventional LoRa modulation scheme, the proposed scheme exhibits a performance gain of over 3 dB at a bit error rate of 10⁻⁴.
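
For context, the conventional baseline the paper improves on can be sketched compactly: LoRa-style chirp modulation encodes a symbol as the starting frequency of an up-chirp, and FFT demodulation dechirps and picks the peak bin. The proposed joint up-/down-chirp scheme and hybrid demodulator build on, but are not shown in, this sketch.

```python
# A minimal sketch of conventional LoRa-style chirp modulation and
# FFT-based demodulation (dechirp, then take the FFT-bin argmax).
import numpy as np

SF = 7
N = 2**SF                        # samples per chirp; symbols take values 0..N-1
n = np.arange(N)

def modulate(sym):
    """Up-chirp whose starting frequency encodes the symbol value."""
    return np.exp(2j * np.pi * n * (sym / N + n / (2 * N)))

def demodulate(rx):
    """Multiply by the conjugate base chirp, then locate the FFT peak."""
    dechirped = rx * np.exp(-2j * np.pi * n * n / (2 * N))
    return int(np.argmax(np.abs(np.fft.fft(dechirped))))

rng = np.random.default_rng(5)
sym = 42
rx = modulate(sym) + 0.5 * (rng.normal(size=N) + 1j * rng.normal(size=N))
print(demodulate(rx))            # recovers 42 with high probability
```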


Symmetry ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 2164
Author(s):  
Héctor J. Gómez ◽  
Diego I. Gallardo ◽  
Karol I. Santoro

In this paper, we present an extension of the truncated positive normal (TPN) distribution to model positive data with high kurtosis. The new model is defined as the quotient of two random variables: a TPN-distributed variable (numerator) and a power of a standard uniform variable (denominator). The resulting model has greater kurtosis than the TPN distribution. We study some properties of the distribution, such as moments, asymmetry, and kurtosis. Parameters are estimated by the method of moments and by maximum likelihood, the latter using the expectation-maximization algorithm. We performed simulation studies to assess parameter recovery and illustrate the model with a real data application related to body weight. The computational implementation of this work is included in the tpn package for R.
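
A minimal sketch of sampling from this quotient construction, assuming the representation X = Z / U^(1/q) with Z a positive-truncated normal and U standard uniform; the exact parameterization of the paper and its tpn package may differ, and q here is chosen large enough that the kurtosis is finite.

```python
# A minimal sketch of the quotient construction: TPN numerator divided by a
# power of a standard uniform, which thickens the right tail.
import numpy as np
from scipy import stats

def rtpn_quotient(n, mu, sigma, q, rng):
    """Draw n values of X = Z / U**(1/q), with Z ~ N(mu, sigma) truncated
    to (0, inf) and U ~ Uniform(0, 1)."""
    a = (0.0 - mu) / sigma                 # lower truncation in standard units
    z = stats.truncnorm.rvs(a, np.inf, loc=mu, scale=sigma,
                            size=n, random_state=rng)
    u = rng.uniform(size=n)
    return z / u**(1.0 / q)

rng = np.random.default_rng(6)
# q > 4 keeps the fourth moment (and hence the kurtosis) finite
x = rtpn_quotient(100_000, mu=1.0, sigma=1.0, q=8.0, rng=rng)
print(stats.kurtosis(x))                   # heavier tail than the plain TPN
```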


2018 ◽  
Author(s):  
Adrian Fritz ◽  
Peter Hofmann ◽  
Stephan Majda ◽  
Eik Dahms ◽  
Johannes Dröge ◽  
...  

Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets are required. Here, we describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies, includes real and simulated strain-level diversity, and generates second- and third-generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes, we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, using several thousand small data sets generated with CAMISIM. CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with truth standards for method evaluation. All data sets and the software are freely available at: https://github.com/CAMI-challenge/CAMISIM
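
CAMISIM's own interface is not shown here; the following concept-level sketch only illustrates the core simulation loop that such tools share: draw a community abundance profile, allocate reads to genomes accordingly, and record the taxonomic gold standard alongside the reads. Genome sequences and names are placeholders, and the real pipeline (strain simulation, error models, gold-standard files) is far richer.

```python
# A concept-level sketch of metagenome read simulation: log-normal community
# abundances, multinomial read allocation, error-free substring "reads".
import numpy as np

rng = np.random.default_rng(7)
genomes = {f"genome_{i}": "".join(rng.choice(list("ACGT"), size=5000))
           for i in range(3)}                     # placeholder genomes

abund = rng.lognormal(mean=0.0, sigma=1.0, size=len(genomes))
abund /= abund.sum()                              # relative abundance profile

read_len, n_reads = 100, 1000
counts = rng.multinomial(n_reads, abund)          # reads per genome
reads, truth = [], []                             # truth = taxonomic gold standard
for (name, seq), c in zip(genomes.items(), counts):
    starts = rng.integers(0, len(seq) - read_len, size=c)
    reads += [seq[s:s + read_len] for s in starts]
    truth += [name] * c
```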

