Artificial Datasets
Recently Published Documents

TOTAL DOCUMENTS: 46 (five years: 17)
H-INDEX: 4 (five years: 1)

2021, Vol 1
Author(s): Lucie Tamisier, Annelies Haegeman, Yoika Foucart, Nicolas Fouillien, Maher Al Rwahnih, ...

2021
Author(s): Robson T. Paula, Décio G. Aguiar Neto, Davi Romero, Paulo T. Guerra

A chatbot is an artificial-intelligence-based system designed to converse with users, commonly deployed as a virtual assistant that helps people or answers questions. Intent classification is an essential task for chatbots: it aims to identify what the user wants at a given point in a dialogue. However, for many domains, little data is available to properly train such systems. In this work, we evaluate the performance of two methods for generating synthetic chatbot data, one based on template questions and the other on neural text generation. We build four datasets that are used to train chatbot components for the intent classification task. We intend to simulate the task of migrating a search-based portal to an interactive, dialogue-based information service by using artificial datasets for initial model training. Our results show that template-based datasets are slightly superior to neural-generated ones in our application domain; however, the neural-generated datasets also yield good results and are a viable option when access to domain experts for hand-coding text templates is limited.
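The template-based generation the abstract describes can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: the intent names, slot names, and filler values are all invented for the example.

```python
import itertools
import random

# Hypothetical templates and slot fillers (illustrative only): each intent
# owns a few question templates with {slot} placeholders.
TEMPLATES = {
    "opening_hours": [
        "What time does the {office} open?",
        "When can I visit the {office}?",
    ],
    "document_request": [
        "How do I request a {document}?",
        "Where can I get a copy of my {document}?",
    ],
}
SLOTS = {
    "office": ["registry office", "tax office", "city library"],
    "document": ["birth certificate", "tax statement", "ID card"],
}

def generate(templates, slots, n_per_intent=50, seed=0):
    """Expand every template with every combination of slot fillers,
    then sample up to n_per_intent labelled utterances per intent."""
    rng = random.Random(seed)
    data = []
    for intent, patterns in templates.items():
        expanded = []
        for pattern in patterns:
            names = [n for n in slots if "{" + n + "}" in pattern]
            if not names:
                expanded.append(pattern)
                continue
            for combo in itertools.product(*(slots[n] for n in names)):
                expanded.append(pattern.format(**dict(zip(names, combo))))
        rng.shuffle(expanded)
        data += [(text, intent) for text in expanded[:n_per_intent]]
    return data

pairs = generate(TEMPLATES, SLOTS)
print(len(pairs), pairs[0])
```

The resulting (utterance, intent) pairs can then train any intent classifier; the neural alternative would replace `generate` with samples drawn from a text-generation model conditioned on the intent.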


Author(s): Péter Szabó, Péter Barthó

Abstract. Recent advancements in multielectrode methods and spike-sorting algorithms enable the in vivo recording of the activities of many neurons at a high temporal resolution. These datasets offer new opportunities in the investigation of the biological neural code, including the direct testing of specific coding hypotheses, but they also reveal the limitations of present decoder algorithms. Classical methods rely on a manual feature extraction step, resulting in a feature vector, like the firing rates of an ensemble of neurons. In this paper, we present a recurrent neural-network-based decoder and evaluate its performance on experimental and artificial datasets. The experimental datasets were obtained by recording the auditory cortical responses of rats exposed to sound stimuli, while the artificial datasets represent preset encoding schemes. The task of the decoder was to classify the action potential timeseries according to the corresponding sound stimuli. It is illustrated that, depending on the coding scheme, the performance of the recurrent-network-based decoder can exceed the performance of the classical methods. We also show how randomized copies of the training datasets can be used to reveal the role of candidate spike-train features. We conclude that artificial neural network decoders can be a useful alternative to classical population-vector-based techniques in studies of the biological neural code.
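The shape of such a recurrent decoder can be sketched as below. This is a minimal Elman-style forward pass with randomly initialised weights, not the authors' trained model; the neuron count, bin count, stimulus count, and Poisson spike statistics are all illustrative assumptions.

```python
import numpy as np

# Illustrative sizes: 32 neurons, 100 time bins per trial, 8 sound stimuli.
rng = np.random.default_rng(0)
n_neurons, n_bins, n_stimuli, hidden = 32, 100, 8, 16

# One trial: spike counts per neuron per time bin. Poisson noise stands in
# for recorded or artificially generated spike trains.
trial = rng.poisson(lam=0.3, size=(n_bins, n_neurons))

# Randomly initialised weights; a real decoder would learn these, e.g. by
# backpropagation through time on labelled trials.
W_in = rng.normal(0, 0.1, (hidden, n_neurons))
W_h = rng.normal(0, 0.1, (hidden, hidden))
W_out = rng.normal(0, 0.1, (n_stimuli, hidden))

def decode(x):
    """Run the recurrence over time bins, classify from the final state."""
    h = np.zeros(hidden)
    for x_t in x:                       # one recurrent step per time bin
        h = np.tanh(W_in @ x_t + W_h @ h)
    logits = W_out @ h
    p = np.exp(logits - logits.max())   # numerically stable softmax
    return p / p.sum()

probs = decode(trial)
print(probs.argmax(), probs.sum())
```

The key contrast with the classical pipeline is that no firing-rate feature vector is extracted by hand: the recurrence sees the binned spike counts directly and can, in principle, exploit temporal structure that a rate summary discards.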


Author(s): Kabir Bindawa Abdullahi

The statistical properties of a good estimator include robustness, unbiasedness, efficiency, and consistency. However, the commonly used estimators of dispersion lack, or are weak in, one or more of these properties. In this paper, I propose statistical mirroring as an alternative estimator of dispersion around defined location estimates or points. In the main part of the paper, attention is restricted to the Gaussian distribution, and only estimators of dispersion around the mean that operate on all the observations of a dataset are considered at this time. The existing estimators were compared with the proposed estimators in terms of alternativeness, robustness to scale and sample size, outlier bias, and efficiency. Monte Carlo simulation was used to generate artificial datasets for the application. The proposed estimators (of statistical meanic mirroring) turn out to be suitable alternative estimators of dispersion: they are less biased by (more resistant to) contamination, robust to scale and sample size, and more efficient on a random distribution of a variable than the standard deviation, variance, and coefficient of variation. However, statistical meanic mirroring is not suitable with a mean (of a normal distribution) close to zero, or on scales below the ratio level.


Author(s): Yoshinao Ishii, Satoshi Koide, Keiichiro Hayakawa

Abstract. Unsupervised outlier detection without the need for clean data has attracted great attention because its low data collection costs make it suitable for real-world problems. Reconstruction-based methods are popular approaches for unsupervised outlier detection. These methods decompose a data matrix into low-dimensional manifolds and an error matrix; samples with a large error are then detected as outliers. To achieve high outlier detection accuracy when data are corrupted by large noise, the detection method should have two properties: (1) it should be able to decompose the data under an L0-norm constraint on the error matrix, and (2) it should be able to reflect the nonlinear features of the data in the manifolds. Despite significant efforts, no method with both of these properties exists. To address this issue, we propose a novel reconstruction-based method: "L0-norm constrained autoencoders (L0-AE)." L0-AE uses autoencoders to learn low-dimensional manifolds that capture the nonlinear features of the data and uses a novel optimization algorithm that can decompose the data under the L0-norm constraint on the error matrix. This algorithm provably guarantees the convergence of the optimization if the autoencoder is trained appropriately. The experimental results show that L0-AE is more robust, accurate, and stable than other unsupervised outlier detection methods, not only for artificial datasets with corrupted samples but also for artificial datasets with well-known outlier distributions and for real datasets. Additionally, the results show that the accuracy of L0-AE is moderately stable to changes in the parameter of the constraint term, and for real datasets, L0-AE achieves higher accuracy than the baseline non-robustified method for most parameter values.
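The L0-norm constraint step can be illustrated in isolation. In the sketch below (an assumption-laden toy, not the paper's algorithm), the autoencoder is replaced by a trivial mean reconstruction; the point is only the constraint itself: at most k rows of the error matrix may be non-zero, so the k samples with the largest residuals are moved into the error matrix and flagged as outliers.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
X[:5] += 8.0                      # five artificially corrupted samples

# Stand-in for an autoencoder reconstruction (the real method learns a
# nonlinear manifold; the column mean is used here purely for illustration).
X_hat = X.mean(axis=0)
residual = np.linalg.norm(X - X_hat, axis=1)

k = 5                             # L0 budget: at most k non-zero error rows
outlier_idx = np.argsort(residual)[-k:]
E = np.zeros_like(X)
E[outlier_idx] = X[outlier_idx] - X_hat   # error matrix with <= k rows set

print(sorted(int(i) for i in outlier_idx))
```

In L0-AE this hard-thresholding step alternates with autoencoder training, so the manifold is fitted only to the samples currently assigned a zero error row, which is what makes the decomposition robust to the corrupted samples.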


Author(s): Yuto Kingetsu, Yukihiro Hamasuna

Several conventional clustering methods use the squared L2-norm as the dissimilarity. The squared L2-norm is calculated from the object coordinates alone and yields linear cluster boundaries. To extract meaningful cluster partitions from a set of massive objects, it is necessary to obtain partitions with complex cluster boundaries. In this study, a JS-divergence-based k-medoids method (JSKMdd) is proposed. In the proposed method, the JS-divergence, calculated from the object distribution, is used as the dissimilarity. The object distribution is estimated by kernel density estimation, so the dissimilarity is based on both the object coordinates and their neighbors. Numerical experiments were conducted on five artificial datasets to verify the effectiveness of the proposed method, comparing it with k-means clustering, k-medoids clustering, and spectral clustering. The results show that the proposed method yields better clustering performance than the other conventional methods.


2021, Vol 25 (1), pp. 321-331
Author(s): Wei Hu, Bing Si

Abstract. Bivariate wavelet coherency is a measure of correlation between two variables in the location–scale (spatial data) or time–frequency (time series) domain. It is particularly suited to geoscience, where relationships between multiple variables differ with locations (times) and/or scales (frequencies) because of the various processes involved. However, it is well known that bivariate relationships can be misleading when both variables are dependent on other variables. Partial wavelet coherency (PWC) has been proposed to detect scale-specific and localized bivariate relationships by excluding the effects of other variables, but it is limited to one excluding variable and provides no phase information. We aim to develop a new PWC method that can deal with multiple excluding variables and provide phase information. Both stationary and non-stationary artificial datasets, with the response variable being the sum of five cosine waves at 256 locations, are used to test the method. The new method was also applied to a free water evaporation dataset. Our results verified the advantages of the new method in capturing phase information and dealing with multiple excluding variables. Where there is one excluding variable, the new PWC implementation produces higher and more accurate PWC values than the previously published PWC implementation, which mistakenly used the bivariate real coherence rather than the bivariate complex coherence. We suggest that the PWC method be used to untangle scale-specific and localized bivariate relationships in the geosciences after removing the effects of other variables. The PWC implementations were coded in Matlab and are freely accessible (https://figshare.com/s/bc97956f43fe5734c784, last access: 14 January 2021).
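The correction the abstract highlights, using complex rather than real bivariate coherence, can be illustrated with the standard single-excluding-variable partial-coherence formula (a sketch of one step only; the full PWC implementation, with wavelet transforms, smoothing, and multiple excluding variables, is in the authors' Matlab code). The toy coherence values below are invented for the example.

```python
import numpy as np

def partial_coherence(g_yx, g_yz, g_xz):
    """Partial complex coherence of y and x after excluding z, from the
    three bivariate complex coherences. Keeping the values complex is what
    preserves the phase information the abstract refers to."""
    num = g_yx - g_yz * np.conj(g_xz)
    den = np.sqrt((1 - abs(g_yz) ** 2) * (1 - abs(g_xz) ** 2))
    return num / den

# Toy complex coherences at one (location, scale) point (illustrative).
g_yx = 0.8 * np.exp(1j * 0.6)   # y-x: strong, with a phase lead
g_yz = 0.5 * np.exp(1j * 0.2)
g_xz = 0.5 * np.exp(1j * 0.1)

pwc = partial_coherence(g_yx, g_yz, g_xz)
print(abs(pwc), np.angle(pwc))  # magnitude and phase after excluding z
```

Taking only the real parts of `g_yz` and `g_xz` in the numerator (the earlier implementation's mistake, as the abstract describes it) would distort both the magnitude and the recovered phase.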


2020
Author(s): Péter Szabó, Péter Barthó

Recent advancements in multielectrode methods and spike-sorting algorithms enable the in vivo recording of the activities of many neurons at a high temporal resolution. These datasets offer new opportunities in the investigation of the biological neural code, including the direct testing of specific coding hypotheses, but they also reveal the limitations of present decoder algorithms. Classical methods rely on a manual feature extraction step, resulting in a feature vector, like the firing rates of an ensemble of neurons. In this paper, we present a recurrent neural-network-based decoder and evaluate its performance on experimental and artificial datasets. The experimental datasets were obtained by recording the auditory cortical responses of rats exposed to sound stimuli, while the artificial datasets represent preset encoding schemes. We illustrate that, depending on the coding scheme, the performance of the recurrent-network-based decoder can exceed the performance of the classical methods. We also show how randomized copies of the training datasets can be used to reveal the role of candidate spike-train features.


2020
Author(s): Yassmine Soussi, Nizar Rokbani, Ali Wali, Adel Alimi

This paper defines a new Moth-Flame Optimization variant with quantum-behaved moths, QMFO. The multi-objective version of QMFO (MOQMFO) is then applied to clustering problems. MOQMFO uses three cluster validity criteria as objective functions (the I-index, Con-index, and Sym-index) to set up the multi-objective clustering optimization. This paper details the proposal and the preliminary results obtained for clustering real-life datasets (including Iris, Cancer, Newthyroid, Wine, LiverDisorder, and Glass) and artificial datasets (including Sph_5_2, Sph_4_3, Sph_6_2, Sph_10_2, Sph_9_2, Pat 1, Pat 2, Long 1, Sizes 5, Spiral, Square 1, Square 4, Twenty, and Fourty). Compared with key multi-objective clustering techniques, the proposal showed interesting results, essentially for Iris, Newthyroid, Wine, LiverDisorder, Sph_4_3, Sph_6_2, Long 1, Sizes 5, Twenty, and Fourty, and was able to provide the exact number of clusters for all datasets.
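One of the three validity criteria named above, the I-index (due to Maulik and Bandyopadhyay), can be sketched as follows; the Con- and Sym-index follow the same pattern with different dispersion and separation terms. This is an illustrative reimplementation under the usual definition of the index (with exponent p = 2), not the authors' code, and the two-blob toy data merely stands in for the Sph_* artificial datasets.

```python
import numpy as np

def i_index(X, labels, centers, p=2):
    """I-index of a partition: larger values indicate a better clustering."""
    # Total dispersion with a single cluster (distances to the global mean).
    e1 = np.linalg.norm(X - X.mean(axis=0), axis=1).sum()
    # Within-cluster dispersion of the candidate partition.
    ek = sum(
        np.linalg.norm(X[labels == k] - c, axis=1).sum()
        for k, c in enumerate(centers)
    )
    # Maximum separation between any two cluster centers.
    dk = max(
        np.linalg.norm(ci - cj)
        for i, ci in enumerate(centers)
        for cj in centers[i + 1:]
    )
    K = len(centers)
    return ((1.0 / K) * (e1 / ek) * dk) ** p

rng = np.random.default_rng(2)
# Two well-separated spherical blobs with the correct 2-cluster labelling.
X = np.vstack([rng.normal(-3, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
labels = np.repeat([0, 1], 50)
centers = np.array([X[:50].mean(axis=0), X[50:].mean(axis=0)])
print(i_index(X, labels, centers))
```

In a multi-objective optimizer such as MOQMFO, several indices like this one are evaluated per candidate partition, and the Pareto front over the objectives is searched rather than a single weighted score.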

