SEED-G: Simulated EEG Data Generator for Testing Connectivity Algorithms

Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3632
Author(s):  
Alessandra Anzolin ◽  
Jlenia Toppi ◽  
Manuela Petti ◽  
Febo Cincotti ◽  
Laura Astolfi

EEG signals are widely used to estimate brain circuits associated with specific tasks and cognitive processes. The testing of connectivity estimators is still an open issue because of the lack of a ground truth in real data. Existing solutions, such as the generation of simulated data based on a manually imposed connectivity pattern or on mass oscillators, can model only a few real cases, with a limited number of signals and spectral properties that do not reflect those of real brain activity. Furthermore, the generation of time series reproducing non-ideal and non-stationary ground-truth models is still missing. In this work, we present the SEED-G toolbox for the generation of pseudo-EEG data with imposed connectivity patterns, overcoming the existing limitations and enabling control of several parameters for data simulation according to the user's needs. We first describe the toolbox, including guidelines for its correct use, and then test its performance, showing how, in a wide range of conditions, datasets composed of up to 60 time series were successfully generated in less than 5 s and with spectral features similar to real data. Finally, SEED-G is employed to study the effect of inter-trial variability on Partial Directed Coherence (PDC) estimates, confirming its robustness.
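SEED-G itself is a MATLAB toolbox; its core idea, generating multichannel pseudo-signals from an imposed multivariate autoregressive (MVAR) connectivity pattern that serves as the known ground truth, can be illustrated with a minimal Python sketch. The function name and parameters below are hypothetical, not the toolbox API:

```python
import numpy as np

def simulate_mvar(A, n_samples, noise_std=1.0, seed=0):
    """Generate multichannel pseudo-signals from an order-1 MVAR model.

    A[i, j] != 0 (i != j) imposes a directed influence from channel j
    to channel i, i.e. the known ground-truth connectivity pattern.
    """
    rng = np.random.default_rng(seed)
    n_ch = A.shape[0]
    x = np.zeros((n_samples, n_ch))
    for t in range(1, n_samples):
        x[t] = A @ x[t - 1] + noise_std * rng.standard_normal(n_ch)
    return x

# Imposed ground truth: channel 0 drives channel 1 (coefficient 0.4).
A = np.array([[0.5, 0.0],
              [0.4, 0.5]])
data = simulate_mvar(A, n_samples=1000)
```

A connectivity estimator such as PDC can then be run on `data` and scored against the known pattern in `A`.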

2018 ◽  
Author(s):  
Laurens R. Krol ◽  
Juliane Pawlitzki ◽  
Fabien Lotte ◽  
Klaus Gramann ◽  
Thorsten O. Zander

Abstract Electroencephalography (EEG) is a popular method to monitor brain activity, but it can be difficult to evaluate EEG-based analysis methods because no ground-truth brain activity is available for comparison. Therefore, in order to test and evaluate such methods, researchers often use simulated EEG data instead of actual EEG recordings, ensuring that it is known beforehand which effects are present in the data. As such, simulated data can be used, among other things, to assess or compare signal processing and machine learning algorithms, to model EEG variabilities, and to design source reconstruction methods. In this paper, we present SEREEGA, short for Simulating Event-Related EEG Activity. SEREEGA is a MATLAB-based open-source toolbox dedicated to the generation of simulated epochs of EEG data. It is modular and extensible, at initial release supporting five different publicly available head models and capable of simulating multiple different types of signals mimicking brain activity. This paper presents the architecture and general workflow of this toolbox, as well as a simulated data set demonstrating some of its functions.
Highlights
- Simulated EEG data has a known ground truth, which can be used to validate methods.
- We present a general-purpose open-source toolbox to simulate EEG data.
- It provides a single framework to simulate many different types of EEG recordings.
- It is modular, extensible, and already includes a number of head models and signals.
- It supports noise, oscillations, event-related potentials, connectivity, and more.
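SEREEGA is MATLAB-based; as a language-neutral illustration of the underlying principle (epochs built from a known event-related ground-truth component plus noise), here is a hypothetical Python sketch, not the toolbox's API:

```python
import numpy as np

def simulate_erp_epochs(n_epochs=50, n_samples=200, peak_latency=100,
                        peak_width=15.0, amplitude=1.0, noise_std=0.5,
                        seed=0):
    """Single-channel epochs: a Gaussian-shaped ERP component (the
    exactly known ground truth) plus white sensor noise, one epoch
    per row."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples)
    erp = amplitude * np.exp(-0.5 * ((t - peak_latency) / peak_width) ** 2)
    noise = noise_std * rng.standard_normal((n_epochs, n_samples))
    return erp + noise

epochs = simulate_erp_epochs()
```

Because the ERP waveform is known exactly, any analysis method applied to `epochs` (e.g. averaging, single-trial detection) can be scored against it.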


Entropy ◽  
2019 ◽  
Vol 21 (6) ◽  
pp. 613 ◽  
Author(s):  
Christoph Bandt

The study of order patterns of three equally-spaced values x_t, x_{t+d}, x_{t+2d} in a time series is a powerful tool. The lag d is changed over a wide range so that the differences of the frequencies of order patterns become autocorrelation functions. Similar to a spectrogram in speech analysis, four ordinal autocorrelation functions are used to visualize big data series, such as heart and brain activity recorded over many hours. The method applies to real data without preprocessing; outliers and missing data do not matter. On the theoretical side, we study the properties of order correlation functions and show that the four autocorrelation functions are orthogonal in a certain sense. An analysis of variance of a modified permutation entropy can be performed with four variance components associated with the functions.
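Counting the six order patterns of (x_t, x_{t+d}, x_{t+2d}) for a given lag d can be sketched as follows; this is an illustrative implementation assuming a tie-free series, not the paper's code:

```python
import itertools
import numpy as np

def order_pattern_freqs(x, d):
    """Frequencies of the six order patterns of (x[t], x[t+d], x[t+2d]).

    Each pattern is the permutation giving the rank of the value at
    each of the three positions. Assumes no ties in the triples.
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - 2 * d
    triples = np.stack([x[:n], x[d:n + d], x[2 * d:n + 2 * d]], axis=1)
    # Double argsort turns each triple into its rank pattern.
    ranks = np.argsort(np.argsort(triples, axis=1), axis=1)
    codes = ranks[:, 0] * 9 + ranks[:, 1] * 3 + ranks[:, 2]
    return {p: float(np.mean(codes == p[0] * 9 + p[1] * 3 + p[2]))
            for p in itertools.permutations(range(3))}
```

Evaluating these frequencies over a range of lags d, and taking differences between pattern frequencies, yields the ordinal autocorrelation functions described above.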


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
João Lobo ◽  
Rui Henriques ◽  
Sara C. Madeira

Abstract Background Three-way data have started to gain popularity due to their increasing capacity to describe inherently multivariate and temporal events, such as biological responses, social interactions along time, urban dynamics, or complex geophysical phenomena. Triclustering, subspace clustering of three-way data, enables the discovery of patterns corresponding to data subspaces (triclusters) with values correlated across the three dimensions (observations × features × contexts). With an increasing number of algorithms being proposed, effectively comparing them with state-of-the-art algorithms is paramount. These comparisons are usually performed using real data, without a known ground truth, thus limiting the assessments. In this context, we propose a synthetic data generator, G-Tric, allowing the creation of synthetic datasets with configurable properties and the possibility to plant triclusters. The generator is prepared to create datasets resembling real three-way data from biomedical and social data domains, with the additional advantage of further providing the ground truth (the triclustering solution) as output. Results G-Tric can replicate real-world datasets and create new ones that match researchers' needs across several properties, including data type (numeric or symbolic), dimensions, and background distribution. Users can tune the patterns and structure that characterize the planted triclusters (subspaces) and how they interact (overlapping). Data quality can also be controlled by defining the amount of missing values, noise, or errors. Furthermore, a benchmark of datasets resembling real data is made available, together with the corresponding triclustering solutions (planted triclusters) and generating parameters. Conclusions Triclustering evaluation using G-Tric makes it possible to combine both intrinsic and extrinsic metrics, producing more reliable comparisons of solutions. A set of predefined datasets, mimicking widely used three-way data and exploring crucial properties, was generated and made available, highlighting G-Tric's potential to advance the triclustering state-of-the-art by easing the process of evaluating the quality of new triclustering approaches.
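The idea of planting a ground-truth tricluster in noisy three-way data can be sketched as below. This is only an illustration of the concept; G-Tric's actual interface is far more configurable (pattern types, overlapping, background distributions):

```python
import numpy as np

def plant_tricluster(shape, rows, cols, ctxs, value=5.0, seed=0):
    """Background tensor of standard-normal noise with one constant
    tricluster planted on rows x cols x ctxs. Returns the data and
    the ground-truth triclustering solution."""
    rng = np.random.default_rng(seed)
    data = rng.standard_normal(shape)       # background distribution
    data[np.ix_(rows, cols, ctxs)] = value  # planted correlated subspace
    return data, (rows, cols, ctxs)

data, truth = plant_tricluster((50, 20, 10),
                               rows=[1, 2, 3], cols=[0, 4], ctxs=[2, 5])
```

A triclustering algorithm run on `data` can then be scored extrinsically by comparing its output against `truth`.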


2020 ◽  
Author(s):  
Yoonjee Kang ◽  
Denis Thieffry ◽  
Laura Cantini

Abstract Networks are powerful tools to represent and investigate biological systems. The development of algorithms inferring regulatory interactions from functional genomics data has been an active area of research. With the advent of single-cell RNA-seq (scRNA-seq) data, numerous methods specifically designed to take advantage of single-cell datasets have been proposed. However, published benchmarks on single-cell network inference are mostly based on simulated data. Once applied to real data, these benchmarks take into account only a small set of genes and only compare the inferred networks with an imposed ground truth. Here, we benchmark four single-cell network inference methods based on their reproducibility, i.e. their ability to infer similar networks when applied to two independent datasets for the same biological condition. We tested each of these methods on real data from three biological conditions: human retina, T-cells in colorectal cancer, and human hematopoiesis. GENIE3 proves to be the most reproducible algorithm, independently of the single-cell sequencing platform, the cell type annotation system, the number of cells constituting the dataset, or the thresholding applied to the links of the inferred networks. In order to ensure reproducibility and ease extension of this benchmark study, we implemented all the analyses in scNET, a Jupyter notebook available at https://github.com/ComputationalSystemsBiology/scNET.
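One simple way to quantify how similar two inferred networks are is the Jaccard index of their edge sets. This is an illustrative overlap measure, not necessarily the paper's exact reproducibility procedure, and the regulator/gene names below are hypothetical:

```python
def edge_jaccard(net_a, net_b):
    """Overlap of two inferred networks as the Jaccard index of their
    edge sets, each edge being a (regulator, target) pair."""
    a, b = set(net_a), set(net_b)
    if not a and not b:
        return 1.0  # two empty networks agree trivially
    return len(a & b) / len(a | b)

# Hypothetical networks inferred from two independent datasets
# of the same biological condition.
net1 = {("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneC")}
net2 = {("TF1", "geneA"), ("TF2", "geneC"), ("TF3", "geneD")}
score = edge_jaccard(net1, net2)
```

A method whose networks from independent datasets share more edges would score closer to 1, indicating higher reproducibility.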


Author(s):  
Maysoon M. Aziz et al.

In this paper, we use the differential equations of the SIR model as a non-linear system, applying the Runge-Kutta numerical method to calculate simulated values for known epidemiological diseases as time series, including the epidemic disease COVID-19. We obtain hypothetical results, compare them with the daily real statistics of the disease for countries of the world, and study the behavior of the disease through mathematical applications, in terms of stability as well as chaos, using several applied methods. The simulated data were obtained using Matlab programs, and the comparison between real and simulated data showed good compatibility and a high degree of closeness; we took the data for Italy as an application. The results show that this disease is unstable, dissipative, and chaotic, with a correlation dimension (Kcorr) equal to 0.9621; the power spectrum was also used as an indicator to clarify the chaotic character of the disease. These results confirm that it is a spreading, outbreak-prone, chaotic epidemic disease.
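The numerical scheme described, the classical fourth-order Runge-Kutta method applied to the SIR equations, can be sketched as follows. The parameter values here are illustrative, not fitted to the Italian data:

```python
import numpy as np

def sir_rk4(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the SIR model
        ds/dt = -beta*s*i,  di/dt = beta*s*i - gamma*i,  dr/dt = gamma*i
    with the classical 4th-order Runge-Kutta method."""
    def f(y):
        s, i, r = y
        return np.array([-beta * s * i, beta * s * i - gamma * i, gamma * i])

    y = np.array([s0, i0, r0], dtype=float)
    out = [y.copy()]
    for _ in range(int(round(days / dt))):
        k1 = f(y)
        k2 = f(y + dt / 2 * k1)
        k3 = f(y + dt / 2 * k2)
        k4 = f(y + dt * k3)
        y = y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        out.append(y.copy())
    return np.array(out)

# Illustrative parameters; compartments are fractions of the population.
traj = sir_rk4(beta=0.3, gamma=0.1, s0=0.99, i0=0.01, r0=0.0, days=160)
```

Note that S + I + R stays constant under this scheme, since the three derivatives sum to zero at every evaluation point.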


2021 ◽  
Author(s):  
Mikhail Kanevski

Nowadays, a wide range of methods and tools to study and forecast time series is available. An important problem in forecasting concerns the embedding of time series, i.e. the construction of a high-dimensional space in which the forecasting problem is considered as a regression task. There are several basic linear and nonlinear approaches to constructing such a space by defining an optimal delay vector using different theoretical concepts. Another way is to consider this space as an input feature space (IFS) and to apply machine learning feature selection (FS) algorithms to optimize the IFS according to the problem under study (analysis, modelling or forecasting). Such an approach is an empirical one: it is based on data and depends on the FS algorithms applied. In machine learning, features are generally classified as relevant, redundant and irrelevant. This gives a rich possibility to perform advanced multivariate time series exploration and to develop interpretable predictive models.

Therefore, in the present research, different FS algorithms are used to analyze fundamental properties of time series from an empirical point of view. Linear and nonlinear simulated time series are studied in detail to understand the advantages and drawbacks of the proposed approach. Real data case studies deal with air pollution and wind speed time series. Preliminary results are quite promising and more research is in progress.
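Building such an input feature space from a scalar series is a plain delay embedding: each row collects lagged values, and feature selection can then operate on the columns. A minimal sketch with illustrative names:

```python
import numpy as np

def delay_embed(x, dim, lag=1):
    """Turn a scalar series into an input feature space for forecasting:
    row t holds (x[t], x[t+lag], ..., x[t+(dim-1)*lag]) and the target
    is the value immediately after the last feature."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * lag - 1
    X = np.column_stack([x[i * lag:i * lag + n] for i in range(dim)])
    y = x[(dim - 1) * lag + 1:(dim - 1) * lag + 1 + n]
    return X, y

X, y = delay_embed(np.arange(10), dim=3, lag=1)
```

An FS algorithm applied to the columns of `X` then classifies each lag as relevant, redundant, or irrelevant for predicting `y`.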


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Francesca Pizzorni Ferrarese ◽  
Flavio Simonetti ◽  
Roberto Israel Foroni ◽  
Gloria Menegaz

Validation and accuracy assessment are the main bottlenecks preventing the adoption of image processing algorithms in clinical practice. In the classical approach, a posteriori analysis is performed through objective metrics. In this work, a different approach based on Petri nets is proposed. The basic idea consists in predicting the accuracy of a given pipeline based on the identification and characterization of the sources of inaccuracy. The concept is demonstrated on a case study: intrasubject rigid and affine registration of magnetic resonance images. Both synthetic and real data are considered. While synthetic data allow benchmarking of the performance with respect to the ground truth, real data enable assessment of the robustness of the methodology in real contexts, as well as determination of the suitability of using synthetic data in the training phase. Results revealed a higher correlation and a lower dispersion among the metrics for simulated data, while the opposite trend was observed for pathologic data. Results show that the proposed model not only provides good prediction performance but also leads to the optimization of the end-to-end chain in terms of accuracy and robustness, setting the ground for its generalization to different and more complex scenarios.


2020 ◽  
Author(s):  
Edlin J. Guerra-Castro ◽  
Juan Carlos Cajas ◽  
Nuno Simões ◽  
Juan J Cruz-Motta ◽  
Maite Mascaró

Abstract SSP (simulation-based sampling protocol) is an R package that uses simulation of ecological data and the dissimilarity-based multivariate standard error (MultSE) as an estimator of precision to evaluate the adequacy of different sampling efforts for studies that will test hypotheses using permutational multivariate analysis of variance. The procedure consists of simulating several extensive data matrices that mimic some of the relevant ecological features of the community of interest using a pilot data set. For each simulated data set, several sampling efforts are repeatedly executed and MultSE is calculated. The mean value and the 0.025 and 0.975 quantiles of MultSE for each sampling effort across all simulated data sets are then estimated and standardized with respect to the lowest sampling effort. The optimal sampling effort is identified as that at which an increase in sampling effort does not improve the precision beyond a threshold value (e.g. 2.5%). The performance of SSP was validated using real data, and in all examples the simulated data mimicked the real data well, allowing evaluation of the relationship between MultSE and n beyond the sample size of the pilot studies. SSP can be used to estimate sample size in a wide range of situations, ranging from simple (e.g. single site) to more complex (e.g. several sites for different habitats) experimental designs. The latter constitutes an important advantage, since it offers new possibilities for complex sampling designs, as has been advised for multi-scale studies in ecology.
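Assuming the usual pseudo-variance definition of the dissimilarity-based multivariate standard error (V as the average of squared pairwise dissimilarities over the n(n-1) ordered pairs, and MultSE = sqrt(V/n)), a minimal sketch is:

```python
import numpy as np

def mult_se(D):
    """Multivariate standard error from an n x n dissimilarity matrix D,
    assuming the pseudo-variance definition:
        V = (sum of squared dissimilarities over unordered pairs) / (n*(n-1)/2) / 2
    i.e. V = sum_{i<j} d_ij^2 / (n*(n-1)), and MultSE = sqrt(V / n)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    V = (D ** 2).sum() / 2.0 / (n * (n - 1))  # /2: full matrix counts pairs twice
    return float(np.sqrt(V / n))
```

Repeating this computation over many simulated samples of increasing size n traces out the MultSE versus n curve used to pick the optimal sampling effort.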


2021 ◽  
Author(s):  
Teppei Matsui ◽  
Trung Quang Pham ◽  
Koji Jimura ◽  
Junichi Chikazoe

Abstract The non-stationarity of resting-state brain activity has received increasing attention in recent years. Functional connectivity (FC) analysis with short sliding windows and coactivation pattern (CAP) analysis are two widely used methods for assessing the non-stationary characteristics of brain activity observed with functional magnetic resonance imaging (fMRI). However, whether these techniques adequately capture non-stationarity needs to be verified. In this study, we found that the results of CAP analysis were similar for real fMRI data and simulated stationary data with matching covariance structures and spectral contents. We also found that, for both the real and simulated data, CAPs were clustered into spatially heterogeneous modules. Moreover, for each of the modules in the real data, a spatially similar module was found in the simulated data. The present results suggest that care needs to be taken when interpreting observations drawn from CAP analysis as it does not necessarily reflect non-stationarity or a mixture of states in resting brain activity.
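A stationary null model with a matching covariance structure can be sketched as below. Note that this matches covariance only; the study's null data also match spectral content, which would additionally require e.g. phase-randomized surrogates:

```python
import numpy as np

def stationary_surrogate(X, seed=0):
    """Stationary null data for X (time points x regions): i.i.d. draws
    from a multivariate normal with X's mean and covariance. Any state
    structure found in such data cannot reflect true non-stationarity."""
    rng = np.random.default_rng(seed)
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    return rng.multivariate_normal(mu, cov, size=X.shape[0])

# Toy 'real' data: three correlated regions sharing a common signal.
rng = np.random.default_rng(1)
base = rng.standard_normal((5000, 1))
X = np.hstack([base + 0.5 * rng.standard_normal((5000, 1)) for _ in range(3)])
null = stationary_surrogate(X)
```

Running the same CAP pipeline (frame selection plus clustering) on `X` and on `null` and comparing the resulting patterns is the kind of control the study performs.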


2001 ◽  
Vol 11 (07) ◽  
pp. 1881-1896 ◽  
Author(s):  
D. KUGIUMTZIS

In the analysis of real world data, the surrogate data test is often performed in order to investigate nonlinearity in the data. The null hypothesis of the test is that the original time series is generated from a linear stochastic process, possibly undergoing a nonlinear static transform. We argue against reported rejections of the null hypothesis and claims of evidence of nonlinearity based on a single nonlinear statistic. In particular, two schemes for the generation of surrogate data are examined, the amplitude adjusted Fourier transform (AAFT) and the iterated AAFT (IAAFT), and many nonlinear discriminating statistics are used for testing, namely the fit with the Volterra series of polynomials, the fit with local average mappings, the mutual information, the correlation dimension, the false nearest neighbors, the largest Lyapunov exponent and simple nonlinear averages (the three-point autocorrelation and the time reversal asymmetry). The results on simulated data and real data (EEG and exchange rates) suggest that the test depends on the method and its parameters, the algorithm generating the surrogate data and the observational data of the examined process.
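The AAFT scheme can be sketched as follows; this is the standard textbook construction (Gaussianize by rank, phase-randomize, map back to the original amplitudes), not the paper's exact implementation:

```python
import numpy as np

def aaft_surrogate(x, seed=0):
    """Amplitude-adjusted Fourier transform (AAFT) surrogate: preserves
    the amplitude distribution of x and approximately its power
    spectrum, while destroying any nonlinear dynamical structure."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    # 1. Reorder a Gaussian sample to follow the rank order of x.
    gauss = np.sort(rng.standard_normal(n))
    y = gauss[np.argsort(np.argsort(x))]
    # 2. Phase-randomize the Gaussianized series (keeps its spectrum).
    spec = np.fft.rfft(y)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(spec))
    phases[0] = 0.0  # leave the mean component's phase unchanged
    y_rand = np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n)
    # 3. Map back to the original amplitude distribution of x.
    return np.sort(x)[np.argsort(np.argsort(y_rand))]

rng_demo = np.random.default_rng(3)
x = np.cumsum(rng_demo.standard_normal(256))
surrogate = aaft_surrogate(x)
```

A nonlinear statistic is then computed on `x` and on an ensemble of such surrogates; the null hypothesis is rejected if the original value lies outside the surrogate distribution.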

