How to interpret attentional blink findings? A practical MCMC tool to assess the attentional blink with the eSTST model

2021 ◽  
Author(s):  
Shekoofeh Hedayati ◽  
Brad Wyble

Previous research has shown that attentional blink (AB) data can differ between tasks or subjects, and it can be challenging to interpret these differences. In this paper, we provide a ready-to-use tool that allows researchers to map their data onto the episodic Simultaneous Type, Serial Token (eSTST) model. This tool uses a Markov Chain Monte Carlo (MCMC) algorithm to find the best set of three model parameters to simulate a given AB pattern. These three parameters have cognitive interpretations, so differences in them between paradigms can be used to draw inferences about the timing of attentional deployment or the encoding of memory. Additionally, our tool allows for a combination of quantitative fitting against the overall pattern of data points and qualitative fitting of theoretically important features. We demonstrate the algorithm on several data sets, showing that it finds cognitively interpretable parameter sets for some of them but fails to find a good fit for one data set, indicating an explanatory boundary of the eSTST model. Finally, we provide a feature to avoid overfitting individual data points with high uncertainty, such as individual participant data.
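A minimal sketch of this style of MCMC fit is given below, assuming a hypothetical simulate_ab() stand-in for the eSTST simulator, a Gaussian error model, and three illustratively named parameters; the actual tool's simulator, priors, and qualitative-fit features are the authors' own and are not reproduced here.

```python
import numpy as np

def simulate_ab(params, lags):
    """Hypothetical stand-in for the eSTST simulator: returns predicted
    T2|T1 accuracy at each lag for a 3-parameter vector."""
    attn_delay, attn_strength, encoding_rate = params
    # Placeholder curve: a dip whose position, depth, and recovery depend on the parameters.
    return 1.0 - attn_strength * np.exp(-((lags - attn_delay) / encoding_rate) ** 2)

def log_posterior(params, lags, observed, sigma=0.05):
    if np.any(params <= 0):          # crude positivity prior
        return -np.inf
    pred = simulate_ab(params, lags)
    return -0.5 * np.sum(((observed - pred) / sigma) ** 2)

def metropolis(observed, lags, n_iter=5000, step=0.05):
    current = np.array([2.0, 0.5, 1.5])          # initial guess for the 3 parameters
    current_lp = log_posterior(current, lags, observed)
    samples = []
    for _ in range(n_iter):
        proposal = current + step * np.random.randn(3)
        lp = log_posterior(proposal, lags, observed)
        if np.log(np.random.rand()) < lp - current_lp:   # Metropolis accept/reject
            current, current_lp = proposal, lp
        samples.append(current.copy())
    return np.array(samples)

lags = np.arange(1, 9)
observed = np.array([0.85, 0.55, 0.45, 0.55, 0.70, 0.80, 0.85, 0.88])  # toy AB curve
samples = metropolis(observed, lags)
print("posterior mean parameters:", samples[len(samples) // 2:].mean(axis=0))
```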

2020 ◽  
Vol 70 (1) ◽  
pp. 145-161 ◽  
Author(s):  
Marnus Stoltz ◽  
Boris Baeumer ◽  
Remco Bouckaert ◽  
Colin Fox ◽  
Gordon Hiscott ◽  
...  

Abstract We describe a new and computationally efficient Bayesian methodology for inferring species trees and demographics from unlinked binary markers. Likelihood calculations are carried out using diffusion models of allele frequency dynamics combined with novel numerical algorithms. The diffusion approach allows for analysis of data sets containing hundreds or thousands of individuals. The method, which we call Snapper, has been implemented as part of the BEAST2 package. We conducted simulation experiments to assess numerical error, computational requirements, and accuracy in recovering known model parameters. A reanalysis of soybean SNP data demonstrates that the models implemented in Snapp and Snapper can be difficult to distinguish in practice, a characteristic which we tested with further simulations. We demonstrate the scale of analysis possible using a SNP data set sampled from 399 freshwater turtles in 41 populations. [Bayesian inference; diffusion models; multi-species coalescent; SNP data; species trees; spectral methods.]


2018 ◽  
Vol 11 (2) ◽  
pp. 53-67
Author(s):  
Ajay Kumar ◽  
Shishir Kumar

Several initial center selection algorithms have been proposed in the literature for numerical data, but the values of categorical data are unordered, so these methods are not applicable to categorical data sets. This article investigates the initial center selection process for categorical data and then presents a new support-based initial center selection algorithm. The proposed algorithm measures the weight of the unique values of each attribute with the help of their support and then integrates these weights along the rows to obtain the support of every row. Further, the data object with the largest support is chosen as the first initial center, and the remaining centers are selected to be at the greatest distance from the previously selected centers. The quality of the proposed algorithm is compared with the random initial center selection method, Cao's method, Wu's method, and the method introduced by Khan and Ahmad. Experimental analysis on real data sets shows the effectiveness of the proposed algorithm.
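A minimal sketch of one plausible reading of this selection rule follows, assuming that "support" is the within-attribute frequency of each categorical value and that the distance between rows is a simple attribute-mismatch (Hamming) count; function and variable names are illustrative, not the authors'.

```python
import numpy as np

def initial_centers(data, k):
    """Pick k initial centers from a categorical array of shape (n_rows, n_attrs):
    first the row with the largest summed value-frequency ('support'),
    then rows farthest (by mismatch distance) from the centers chosen so far."""
    n, m = data.shape
    # Support of each value = its frequency within its attribute column.
    row_support = np.zeros(n)
    for j in range(m):
        values, counts = np.unique(data[:, j], return_counts=True)
        freq = dict(zip(values, counts))
        row_support += np.array([freq[v] for v in data[:, j]])

    centers = [int(np.argmax(row_support))]            # most-supported row first
    while len(centers) < k:
        # Distance of each row to its nearest chosen center (attribute mismatches).
        dists = np.array([
            min(np.sum(data[i] != data[c]) for c in centers) for i in range(n)
        ])
        centers.append(int(np.argmax(dists)))           # farthest row becomes the next center
    return data[centers]

toy = np.array([["a", "x"], ["a", "y"], ["b", "x"], ["c", "z"]])
print(initial_centers(toy, 2))
```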


2021 ◽  
Vol 87 (6) ◽  
pp. 445-455
Author(s):  
Yi Ma ◽  
Zezhong Zheng ◽  
Yutang Ma ◽  
Mingcang Zhu ◽  
Ran Huang ◽  
...  

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix of size N×N, where N is the number of data points, so the memory complexity of the analysis is no less than O(N²). We present in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral image data set is determined from the training data set. A local curvature variation algorithm is utilized to sample a subset of data points as landmarks, and a manifold skeleton is then identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms when paired with a k-nearest-neighbor classifier and achieving the second-best performance with a support vector machine.
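The paper's local curvature variation sampler is not specified in the abstract, so the sketch below swaps in generic farthest-point sampling purely to illustrate the landmark idea of building a manifold skeleton from a small, well-spread subset of points.

```python
import numpy as np

def farthest_point_landmarks(X, n_landmarks):
    """Generic stand-in for landmark sampling: greedily pick points that are
    farthest from the landmarks chosen so far, so the subset spans the data."""
    n = X.shape[0]
    landmarks = [np.random.randint(n)]
    d_to_set = np.linalg.norm(X - X[landmarks[0]], axis=1)
    while len(landmarks) < n_landmarks:
        nxt = int(np.argmax(d_to_set))                    # farthest remaining point
        landmarks.append(nxt)
        d_to_set = np.minimum(d_to_set, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(landmarks)

X = np.random.rand(5000, 200)             # toy "hyperspectral" pixels x bands
idx = farthest_point_landmarks(X, 50)     # a manifold skeleton would be built from these rows
print(idx[:10])
```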


Fractals ◽  
2001 ◽  
Vol 09 (01) ◽  
pp. 105-128 ◽  
Author(s):  
TAYFUN BABADAGLI ◽  
KAYHAN DEVELI

This paper presents an evaluation of the methods applied to calculate the fractal dimension of fracture surfaces. Variogram (applicable to 1D self-affine sets) and power spectral density analyses (applicable to 2D self-affine sets) are selected to calculate the fractal dimension of synthetic 2D data sets generated using fractional Brownian motion (fBm). The calculated values are then compared with the actual fractal dimensions assigned in the generation of the synthetic surfaces. The main factor considered is the size of the 2D data set (number of data points). The critical sample size that yields the best agreement between the calculated and actual values is defined for each method. Limitations and the proper use of each method are clarified after an extensive analysis. The two methods are also applied to synthetically and naturally developed fracture surfaces of different types of rocks. The methods yield inconsistent fractal dimensions for natural fracture surfaces, and the reasons for this are discussed. The anisotropic character of the fractal dimension, which may lead to a correlation between fracturing mechanism and the multifractality of the fracture surfaces, is also addressed.
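As a worked illustration of the variogram approach on a 1D self-affine profile: for a trace with Hurst exponent H, the variogram scales as γ(h) ∝ h^(2H), and the fractal dimension follows as D = 2 − H, so D can be read off a log-log slope. The sketch below applies this to a synthetic Brownian trace (H ≈ 0.5, so D ≈ 1.5), not to the paper's fBm surfaces.

```python
import numpy as np

def variogram_dimension(profile, max_lag=100):
    """Estimate the fractal dimension of a 1D self-affine profile from the
    variogram scaling gamma(h) ~ h^(2H), using D = 2 - H."""
    lags = np.arange(1, max_lag)
    gamma = np.array([np.mean((profile[h:] - profile[:-h]) ** 2) for h in lags])
    # Slope of log(gamma) versus log(h) equals 2H.
    slope, _ = np.polyfit(np.log(lags), np.log(gamma), 1)
    H = slope / 2.0
    return 2.0 - H

profile = np.cumsum(np.random.randn(10000))   # Brownian trace: H ~ 0.5, so D ~ 1.5
print(round(variogram_dimension(profile), 2))
```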


Author(s):  
UREERAT WATTANACHON ◽  
CHIDCHANOK LURSINSAP

Existing clustering algorithms, such as single-link clustering, k-means, CURE, and CSM, are designed to find clusters based on predefined parameters specified by users. These algorithms may be unsuccessful if the choice of parameters is inappropriate for the data set being clustered, and most of them work well mainly for compact and hyper-spherical clusters. In this paper, a new hybrid clustering algorithm called Self-Partition and Self-Merging (SPSM) is proposed. The SPSM algorithm partitions the input data set into several subclusters in the first phase and then removes the noisy data in the second phase. In the third phase, the remaining subclusters are progressively merged into larger clusters based on inter-cluster and intra-cluster distance criteria. The experimental results show that the SPSM algorithm efficiently handles noisy data sets and clusters data sets of arbitrary shape and varying density. Several color-image examples demonstrate the versatility of the proposed method, and the results are compared with those reported in the literature for the same images. The computational complexity of the SPSM algorithm is O(N²), where N is the number of data points.
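The abstract does not give the exact merging criteria, so the following sketch uses a generic stand-in: two subclusters are merged when the distance between their centroids is small relative to their average internal spread.

```python
import numpy as np

def merge_subclusters(subclusters, ratio=1.5):
    """Generic merging pass: join two subclusters when the gap between their
    centroids is small relative to their internal spread (inter- vs intra-cluster distance)."""
    merged = True
    while merged and len(subclusters) > 1:
        merged = False
        for i in range(len(subclusters)):
            for j in range(i + 1, len(subclusters)):
                a, b = subclusters[i], subclusters[j]
                inter = np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
                intra = np.mean([np.linalg.norm(c - c.mean(axis=0), axis=1).mean()
                                 for c in (a, b)])
                if inter <= ratio * intra:               # close enough: merge the pair
                    subclusters[i] = np.vstack([a, b])
                    subclusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return subclusters

parts = [np.random.randn(30, 2), np.random.randn(30, 2) + 0.5, np.random.randn(30, 2) + 10]
print(len(merge_subclusters(parts)), "clusters after merging")
```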


Author(s):  
Md. Zakir Hossain ◽  
Md.Nasim Akhtar ◽  
R.B. Ahmad ◽  
Mostafijur Rahman

Data mining is the process of finding structure in large data sets. With this process, decision makers can make particular decisions for the further development of real-world problems. Several data clustering techniques are used in data mining for finding specific patterns in data. The K-means method is one of the most familiar clustering techniques for clustering large data sets. The K-means clustering method partitions the data set based on the assumption that the number of clusters is fixed. The main problem of this method is that if the number of clusters is chosen to be small, there is a higher probability of adding dissimilar items to the same group. On the other hand, if the number of clusters is chosen to be large, there is a higher chance of adding similar items to different groups. In this paper, we address this issue by proposing a new K-means clustering algorithm that performs data clustering dynamically. The proposed method initially calculates a threshold value as a centroid of K-means, and based on this value the number of clusters is formed. At each iteration of K-means, if the Euclidean distance between two points is less than or equal to the threshold value, these two data points are placed in the same group; otherwise, the proposed method creates a new cluster with the dissimilar data point. The results show that the proposed method outperforms the original K-means method.
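A minimal sketch of the clustering rule described above, under the assumption that the threshold is compared against the distance to existing cluster centroids; the paper's exact threshold calculation is not given here, so the mean distance to the overall centroid is used as a placeholder.

```python
import numpy as np

def dynamic_threshold_clustering(X, threshold=None):
    """Assign each point to the nearest existing centroid if it lies within the
    threshold; otherwise start a new cluster around that point."""
    if threshold is None:
        # Placeholder threshold: mean distance of points to the overall centroid.
        threshold = np.mean(np.linalg.norm(X - X.mean(axis=0), axis=1))
    centroids, members = [X[0].copy()], [[0]]
    for i, x in enumerate(X[1:], start=1):
        dists = np.linalg.norm(np.array(centroids) - x, axis=1)
        j = int(np.argmin(dists))
        if dists[j] <= threshold:
            members[j].append(i)
            centroids[j] = X[members[j]].mean(axis=0)    # update the running centroid
        else:
            centroids.append(x.copy())                   # dissimilar point seeds a new cluster
            members.append([i])
    return centroids, members

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 8])
centroids, members = dynamic_threshold_clustering(X)
print(len(centroids), "clusters found")
```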


2020 ◽  
Vol 498 (3) ◽  
pp. 3440-3451
Author(s):  
Alan F Heavens ◽  
Elena Sellentin ◽  
Andrew H Jaffe

ABSTRACT Bringing a high-dimensional data set into science-ready shape is a formidable challenge that often necessitates data compression. Compression has accordingly become a key consideration for contemporary cosmology, affecting public data releases and reanalyses searching for new physics. However, data compression optimized for a particular model can suppress signs of new physics, or even remove them altogether. We therefore provide a solution for exploring new physics during data compression. In particular, we store additional agnostic compressed data points, selected to enable precise constraints on non-standard physics at a later date. Our procedure is based on the maximal compression of the MOPED algorithm, which optimally filters the data with respect to a baseline model. We select additional filters, based on a generalized principal component analysis, which are carefully constructed to scout for new physics at high precision and speed. We refer to the augmented set of filters as MOPED-PC. They enable an analytic computation of the Bayesian evidence that may indicate the presence of new physics, and fast analytic estimates of best-fitting parameters when adopting a specific non-standard theory, without further expensive MCMC analysis. As there may be large numbers of non-standard theories, the speed of the method becomes essential. Should no new physics be found, our approach preserves the precision of the standard parameters. As a result, we achieve very rapid and maximally precise constraints on standard and non-standard physics, with a technique that scales well to high-dimensional data sets.
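For context, the MOPED step that the augmented filters build on compresses a data vector x to one number per parameter using weighting vectors b ∝ C⁻¹ ∂μ/∂θ (orthogonalized for later parameters); a minimal sketch for a single parameter follows, using a toy mean model and covariance rather than any cosmological data.

```python
import numpy as np

# Toy setup: the data mean depends linearly on one parameter theta.
n = 20
C = np.eye(n) * 0.1                      # data covariance (toy)
dmu_dtheta = np.linspace(0.5, 1.5, n)    # derivative of the mean w.r.t. theta (toy)

# MOPED weighting vector for the first parameter:
# b = C^{-1} mu_,theta / sqrt(mu_,theta^T C^{-1} mu_,theta)
Cinv_dmu = np.linalg.solve(C, dmu_dtheta)
b = Cinv_dmu / np.sqrt(dmu_dtheta @ Cinv_dmu)

# Compress a simulated data vector to a single number that, under the baseline
# model, carries essentially all the information about theta.
x = 1.2 * dmu_dtheta + np.random.multivariate_normal(np.zeros(n), C)
y = b @ x
print("compressed statistic:", y)
```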


2017 ◽  
Vol 5 (4) ◽  
pp. 1
Author(s):  
I. E. Okorie ◽  
A. C. Akpanta ◽  
J. Ohakwe ◽  
D. C. Chikezie ◽  
C. U. Onyemachi ◽  
...  

This paper introduces a new generator of probability distributions, the adjusted log-logistic generalized (ALLoG) distribution, and a new extension of the standard one-parameter exponential distribution called the adjusted log-logistic generalized exponential (ALLoGExp) distribution. The ALLoGExp distribution is a special case of the ALLoG distribution, and we provide some of its statistical and reliability properties. Notably, the failure rate can be monotonically decreasing, increasing, or upside-down bathtub shaped depending on the values of the parameters $\delta$ and $\theta$. The method of maximum likelihood estimation was proposed to estimate the model parameters. The importance and flexibility of the ALLoGExp distribution were demonstrated with a real and uncensored lifetime data set, and its fit was compared with five other exponential-related distributions. The results obtained from the model fittings show that the ALLoGExp distribution provides a reasonably better fit than the other fitted distributions. The ALLoGExp distribution is therefore recommended for effective modelling of lifetime data sets.


Author(s):  
Tushar ◽  
Shibendu Shekhar Roy ◽  
Dilip Kumar Pratihar

Clustering is a potent tool of data mining. A clustering method analyzes the pattern of a data set and groups the data into several clusters based on the similarity among the data points. Clusters may be either crisp or fuzzy in nature. The present chapter deals with clustering of some data sets using the Fuzzy C-Means (FCM) algorithm and the Entropy-based Fuzzy Clustering (EFC) algorithm. In the FCM algorithm, the nature and quality of the clusters depend on the pre-defined number of clusters, the level of cluster fuzziness, and a threshold value utilized for obtaining the number of outliers (if any). On the other hand, the quality of clusters obtained by the EFC algorithm depends on a constant used to establish the relationship between the distance and similarity of two data points, a threshold value of similarity, and another threshold value used for determining the number of outliers. The clusters should ideally be distinct and at the same time compact in nature, and the number of outliers should be as small as possible. Thus, the above problem may be posed as an optimization problem, which will be solved using a Genetic Algorithm (GA). The best set of multi-dimensional clusters will be mapped into 2-D for visualization using a Self-Organizing Map (SOM).
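A minimal sketch of the standard FCM iteration mentioned above, with the fuzziness level m and the number of clusters as the pre-defined parameters; the chapter's outlier threshold and GA-based tuning are omitted.

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, eps=1e-5):
    """Standard FCM: alternate membership and centroid updates until convergence."""
    n = X.shape[0]
    U = np.random.dirichlet(np.ones(c), size=n)           # fuzzy memberships, rows sum to 1
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]    # membership-weighted centroids
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U_new = 1.0 / (d ** (2.0 / (m - 1.0)))            # u_ik proportional to d_ik^(-2/(m-1))
        U_new /= U_new.sum(axis=1, keepdims=True)         # normalize memberships per point
        if np.abs(U_new - U).max() < eps:
            U = U_new
            break
        U = U_new
    return centers, U

X = np.vstack([np.random.randn(60, 2), np.random.randn(60, 2) + 5])
centers, U = fuzzy_c_means(X, c=2)
print(centers)
```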


2021 ◽  
Vol 37 (3) ◽  
pp. 481-490
Author(s):  
Chenyong Song ◽  
Dongwei Wang ◽  
Haoran Bai ◽  
Weihao Sun

Highlights: The proposed data enhancement method can be used for small-scale data sets with rich sample image features. The accuracy of the new model reaches 98.5%, which is better than the traditional CNN method.
Abstract: GoogLeNet offers far better performance in identifying apple disease compared to traditional methods. However, the complexity of GoogLeNet is relatively high, and for small volumes of data it does not achieve the same performance as it does with large-scale data. We propose a new apple disease identification model using GoogLeNet's inception module. The model adopts a variety of methods to optimize its generalization ability. First, geometric-transformation and image-modification data enhancement methods (including rotation, scaling, noise interference, random elimination, and color space enhancement), combined at random with appropriate probabilities, are used to amplify the data set. Second, we employ a deep convolutional generative adversarial network (DCGAN) to enhance the richness of the generated images by increasing the diversity of the generator's noise distribution. Finally, we optimize the GoogLeNet model structure to reduce model complexity and the number of model parameters, making it more suitable for identifying apple tree diseases. The experimental results show that our approach quickly detects and classifies apple diseases, including rust, spotted leaf disease, and anthrax. It outperforms the original GoogLeNet in recognition accuracy and model size, with identification accuracy reaching 98.5%, making it a feasible method for apple disease classification.
Keywords: Apple disease identification, Data enhancement, DCGAN, GoogLeNet
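A minimal sketch of the kind of augmentation pipeline described above, written against torchvision as an assumed framework; the paper's exact transformations, parameters, and probabilities are not given here, so these values are illustrative.

```python
import torch
from torchvision import transforms

# Each transform is applied with a probability, mirroring the random-combination idea.
augment = transforms.Compose([
    transforms.RandomApply([transforms.RandomRotation(30)], p=0.5),          # rotation
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),                     # scaling
    transforms.RandomApply([transforms.ColorJitter(0.3, 0.3, 0.3)], p=0.5),  # color space enhancement
    transforms.ToTensor(),
    transforms.RandomApply(                                                   # noise interference
        [transforms.Lambda(lambda t: (t + 0.05 * torch.randn_like(t)).clamp(0, 1))], p=0.3),
    transforms.RandomErasing(p=0.3),                                          # random elimination
])

# Usage: wrap an image-folder data set so each sample is augmented on the fly.
# from torchvision.datasets import ImageFolder
# train_set = ImageFolder("apple_leaves/train", transform=augment)
```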

