Functional brain segmentation using inter-subject correlation in fMRI

2016 ◽  
Author(s):  
Jukka-Pekka Kauppi ◽  
Juha Pajula ◽  
Jari Niemi ◽  
Riitta Hari ◽  
Jussi Tohka

The human brain continuously processes massive amounts of rich sensory information. To better understand such highly complex brain processes, modern neuroimaging studies are increasingly utilizing experimental setups that better mimic daily-life situations. We propose a new exploratory data-analysis approach, functional segmentation intersubject correlation analysis (FuSeISC), to facilitate the analysis of functional magnetic resonance imaging (fMRI) data sets collected in these experiments. The method provides a new type of functional segmentation of brain areas, characterizing not only areas that display similar processing across subjects but also areas in which processing across subjects is highly variable.

We tested FuSeISC using fMRI data sets collected during traditional block-design stimuli (37 subjects) as well as naturalistic auditory narratives (19 subjects). The method identified spatially local and/or bilaterally symmetric clusters in several cortical areas, many of which are known to process the types of stimuli used in the experiments. The method is promising not only for spatial exploration of large fMRI data sets obtained with naturalistic stimuli, but also for other applications such as the generation of functional brain atlases that include both lower- and higher-order processing areas.

Finally, as part of FuSeISC, we propose a criterion-based sparsification of the shared nearest-neighbor graph for detecting clusters in noisy data. In our tests with synthetic data, this technique was superior to well-known clustering methods such as Ward's method, affinity propagation, and K-means++.
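
The shared nearest-neighbor graph that FuSeISC sparsifies can be illustrated with a minimal sketch. The abstract does not give the authors' sparsification criterion, so the mutual-neighbor requirement and the `min_shared` threshold below are assumptions chosen for illustration only:

```python
import numpy as np

def snn_graph(X, k=5, min_shared=2):
    """Build a sparsified shared nearest-neighbor (SNN) graph: an edge
    (i, j) is kept only if i and j appear in each other's k-nearest-neighbor
    lists AND share at least `min_shared` neighbors (illustrative criterion)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances; exclude self-matches.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    # Boolean k-nearest-neighbor membership sets for each point.
    knn = np.argsort(d2, axis=1)[:, :k]
    member = np.zeros((n, n), dtype=bool)
    for i in range(n):
        member[i, knn[i]] = True
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            if member[i, j] and member[j, i]:  # mutual neighbors
                shared = np.count_nonzero(member[i] & member[j])
                if shared >= min_shared:
                    adj[i, j] = adj[j, i] = True
    return adj
```

Points in different well-separated groups never become mutual neighbors, so the sparsified graph naturally splits into cluster-sized components.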

2013 ◽  
Vol 748 ◽  
pp. 590-594
Author(s):  
Li Liao ◽  
Yong Gang Lu ◽  
Xu Rong Chen

We propose a novel density-estimation method that uses both the k-nearest neighbor (KNN) graph and the potential field of the data points to capture local and global data-distribution information, respectively. Clustering is performed on the computed density values: a forest of trees is built with each data point as a tree node, and clusters are formed according to the trees in the forest. The new clustering method is evaluated against three popular clustering methods: K-means++, Mean Shift, and DBSCAN. Experiments on two synthetic data sets and one real data set show that our approach effectively improves the clustering results.
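
The two ingredients of such a density estimate can be sketched as follows. The mixing rule, the Gaussian potential kernel, and the weight `alpha` are assumptions for illustration, not the authors' exact formulation:

```python
import numpy as np

def density_estimate(X, k=5, sigma=1.0, alpha=0.5):
    """Illustrative density score mixing a local term (inverse mean distance
    to the k nearest neighbors) with a global term (a Gaussian potential
    field summed over all points). `alpha` weights the two terms."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)                 # ignore self-distances
    knn_d = np.sort(d, axis=1)[:, :k].mean(axis=1)
    local = 1.0 / (knn_d + 1e-12)               # local: KNN-based density
    potential = np.exp(-(d ** 2) / (2.0 * sigma ** 2)).sum(axis=1)  # global
    def norm(v):                                # rescale each term to [0, 1]
        return (v - v.min()) / (v.max() - v.min() + 1e-12)
    return alpha * norm(local) + (1.0 - alpha) * norm(potential)
```

A point deep inside a dense region scores high on both terms, while an isolated outlier scores near zero on both.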


2019 ◽  
Vol 28 (06) ◽  
pp. 1960002 ◽  
Author(s):  
Brankica Bratić ◽  
Michael E. Houle ◽  
Vladimir Kurbalija ◽  
Vincent Oria ◽  
Miloš Radovanović

The K-nearest neighbor graph (K-NNG) is a data structure used by many machine-learning algorithms. Naive computation of the K-NNG has quadratic time complexity, which in many cases is not efficient enough, creating the need for fast and accurate approximation algorithms. NN-Descent is one such algorithm that is highly efficient, but it has a major drawback: its K-NNG approximations are accurate only on data of low intrinsic dimensionality. This paper presents an experimental analysis of this behavior and investigates possible solutions. Experimental results show that there is a link between the performance of NN-Descent and the phenomenon of hubness, defined as the tendency of intrinsically high-dimensional data to contain hubs – points with large in-degrees in the K-NNG. First, we explain how the presence of the hubness phenomenon causes poor NN-Descent performance. In light of that, we propose four NN-Descent variants to alleviate the observed negative influence of hubs. By evaluating the proposed approaches on several real and synthetic data sets, we conclude that they are more accurate, but often at the cost of higher scan rates.
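
Hubness as described here can be quantified directly from the in-degree distribution of the directed K-NNG; skewness of that distribution is a commonly used hubness score. This sketch uses brute-force K-NNG construction (the quadratic baseline the paper contrasts with NN-Descent):

```python
import numpy as np

def knng_in_degrees(X, k=5):
    """In-degree of each point in the directed k-NN graph. Hubness shows up
    as a heavy right tail in this distribution for high-dimensional data."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    knn = np.argsort(d, axis=1)[:, :k]
    deg = np.zeros(X.shape[0], dtype=int)
    for row in knn:                      # count how often each point is chosen
        deg[row] += 1
    return deg

def hubness_skew(deg):
    """Skewness of the in-degree distribution (positive = hubs present)."""
    mu, sd = deg.mean(), deg.std()
    return ((deg - mu) ** 3).mean() / (sd ** 3 + 1e-12)
```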


Author(s):  
Amit Saxena ◽  
John Wang

This paper presents a two-phase scheme that selects a reduced number of features from a dataset using a Genetic Algorithm (GA) and tests the classification accuracy (CA) of the dataset with the reduced feature set. In the first phase, an unsupervised approach to selecting a subset of features is applied: the GA stochastically selects a reduced number of features with Sammon Error as the fitness function, yielding different subsets of features. In the second phase, each reduced feature set is used to test the CA of the dataset, which is validated using the supervised k-nearest neighbor (k-NN) algorithm. The novelty of the proposed scheme is that each reduced feature set obtained in the first phase is investigated for CA using k-NN classification with different Minkowski metrics, i.e., non-Euclidean norms, instead of the conventional Euclidean norm (L2). Final results are presented with extensive simulations on seven real data sets and one synthetic data set. The investigation reveals that using different norms produces better CA and hence offers scope for better feature-subset selection.
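
The second-phase classifier is standard k-NN, parameterized by the Minkowski order p. A minimal sketch (majority vote, ties broken by `np.unique` order; the GA phase is not shown):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3, p=2.0):
    """k-NN majority-vote classifier under the Minkowski L_p metric.
    p=2 is the usual Euclidean norm; p=1 (Manhattan) or fractional p
    give the non-Euclidean norms explored in the paper."""
    preds = []
    for x in X_test:
        d = (np.abs(X_train - x) ** p).sum(axis=1) ** (1.0 / p)
        nn = np.argsort(d)[:k]                      # k nearest training points
        vals, counts = np.unique(y_train[nn], return_counts=True)
        preds.append(vals[np.argmax(counts)])       # majority vote
    return np.array(preds)
```

Sweeping `p` over, say, {0.5, 1, 2, 4} and comparing held-out accuracy reproduces the kind of norm comparison the paper performs.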


2019 ◽  
Vol 11 (3) ◽  
pp. 350 ◽  
Author(s):  
Qiang Li ◽  
Qi Wang ◽  
Xuelong Li

A hyperspectral image (HSI) has many bands, which leads to high correlation between adjacent bands, so it is necessary to find representative subsets before further analysis. To address this issue, band selection is considered an effective approach for removing redundant bands from an HSI. Recently, many band selection methods have been proposed, but the majority of them have extremely poor accuracy when the number of bands is small and require multiple iterations, which defeats the purpose of band selection. Therefore, we propose an efficient clustering method based on shared nearest neighbors (SNNC) for hyperspectral optimal band selection, with the following contributions: (1) the local density of each band is obtained from shared nearest neighbors, which more accurately reflects the local distribution characteristics; (2) to acquire a band subset containing a large amount of information, information entropy is taken as one of the weight factors; (3) a method for automatically selecting the optimal band subset is designed based on the slope change. The experimental results reveal that, compared with other methods, the proposed method has competitive computational time and its selected bands achieve higher overall classification accuracy on different data sets, especially when the number of bands is small.
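
The information-entropy weight factor in contribution (2) is the Shannon entropy of a band's intensity histogram. A minimal sketch (the bin count of 64 is an assumption, not the paper's setting):

```python
import numpy as np

def band_entropy(band, bins=64):
    """Shannon entropy (in bits) of a band's intensity histogram; bands
    with flat, spread-out histograms carry more information and score higher."""
    hist, _ = np.histogram(band, bins=bins)
    p = hist / hist.sum()        # empirical probability per bin
    p = p[p > 0]                 # 0 * log(0) is defined as 0
    return float(-(p * np.log2(p)).sum())
```

A nearly constant band scores close to 0 bits, while a band with well-spread intensities approaches log2(bins).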


2007 ◽  
Vol 17 (01) ◽  
pp. 71-103 ◽  
Author(s):  
NARGESS MEMARSADEGHI ◽  
DAVID M. MOUNT ◽  
NATHAN S. NETANYAHU ◽  
JACQUELINE LE MOIGNE

Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
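
The step the kd-tree accelerates is the nearest-cluster-center search inside each ISODATA iteration. The sketch below shows one brute-force assignment/update step of the kind ISODATA iterates; the paper's contribution replaces this O(n·k) search with a kd-tree over the points (ISODATA's split/merge logic is omitted here):

```python
import numpy as np

def assign_and_update(X, centers):
    """One assignment/update step: label each point with its nearest center
    (brute force here; a kd-tree makes this search sublinear in practice),
    then move each center to the mean of its assigned points."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(axis=1)
    new_centers = np.array([
        X[labels == j].mean(axis=0) if (labels == j).any() else centers[j]
        for j in range(len(centers))
    ])
    return labels, new_centers
```

The approximate variant described in the abstract relaxes exactly this step, tolerating a slightly wrong nearest center in exchange for speed.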


Entropy ◽  
2018 ◽  
Vol 20 (11) ◽  
pp. 830 ◽  
Author(s):  
Xulun Ye ◽  
Jieyu Zhao ◽  
Yu Chen

Multi-manifold clustering is among the most fundamental tasks in signal processing and machine learning. Although existing multi-manifold clustering methods are quite powerful, learning the cluster number automatically from data is still a challenge. In this paper, a novel unsupervised generative clustering approach within the Bayesian nonparametric framework is proposed. Specifically, our manifold method automatically selects the cluster number with a Dirichlet Process (DP) prior. Then, a DP-based mixture model with a constrained Mixture of Gaussians (MoG) is constructed to handle the manifold data. Finally, we integrate our model with the k-nearest neighbor graph to capture the manifold geometric information. An efficient optimization algorithm has also been derived to perform model inference and optimization. Experimental results on synthetic datasets and real-world benchmark datasets exhibit the effectiveness of this new DP-based manifold method.
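
The mechanism by which a DP prior "selects the cluster number" is easiest to see through the Chinese restaurant process, the sequential sampling scheme equivalent to a DP prior over partitions: a new cluster is opened with probability proportional to the concentration `alpha`, so the number of clusters grows with the data rather than being fixed. A minimal sketch (not the paper's inference algorithm):

```python
import numpy as np

def crp_assignments(n, alpha, rng):
    """Draw a partition of n items from the Chinese restaurant process.
    Item i joins existing cluster c with prob. count[c] / (i + alpha),
    or opens a new cluster with prob. alpha / (i + alpha)."""
    counts = []          # cluster sizes so far
    z = []               # cluster index of each item
    for _ in range(n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        c = rng.choice(len(probs), p=probs)
        if c == len(counts):
            counts.append(1)     # a new cluster is born
        else:
            counts[c] += 1
        z.append(c)
    return np.array(z), counts
```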


2017 ◽  
Vol 29 (7) ◽  
pp. 1902-1918 ◽  
Author(s):  
De Cheng ◽  
Feiping Nie ◽  
Jiande Sun ◽  
Yihong Gong

Graph-based clustering methods perform clustering on a fixed input data graph. Thus, such clustering results are sensitive to the particular graph construction: if the initial construction is of low quality, the resulting clustering may also be of low quality. We address this drawback by allowing the data graph itself to be adaptively adjusted during the clustering procedure. In particular, our proposed weight adaptive Laplacian (WAL) method learns a new data similarity matrix that can adaptively adjust the initial graph according to the similarity weights in the input data graph. We develop three versions of the method, based on the L2-norm, a fuzzy entropy regularizer, and an exponential-based weight strategy, yielding three new graph-based clustering objectives, and derive optimization algorithms to solve them. Experimental results on synthetic data sets and real-world benchmark data sets exhibit the effectiveness of these new graph-based clustering methods.
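
The object all such methods operate on is the graph Laplacian of the similarity matrix W; WAL's contribution is to re-learn W rather than fix it. A minimal sketch of the standard symmetric normalized Laplacian, whose zero eigenvalues count the graph's connected components (the property spectral clustering exploits):

```python
import numpy as np

def normalized_laplacian(W):
    """Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2} of a
    similarity matrix W with non-negative weights and zero diagonal."""
    d = W.sum(axis=1)                                  # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))   # guard isolated nodes
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
```

If the (learned or fixed) W splits into c disconnected blocks, L has exactly c zero eigenvalues, which is why the quality of W so directly determines the quality of the clustering.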


2011 ◽  
Vol 148-149 ◽  
pp. 258-261
Author(s):  
Zhi Kai Zhao ◽  
Jian Sheng Qian

This paper considers a special kind of data called multimodal data, which has the property that samples in a class come from several separate clusters. Locality Preserving Projection (LPP) works well with multimodal data due to its locality-preserving property; however, because LPP is unsupervised, label information is not used to improve learning performance. In this paper, we propose a method called Locality Sensitive Semi-Supervised Dimensionality Reduction (semi-LSDR), which takes both discriminant information and geometric structure into account. Specifically, we construct a between-class graph on labeled samples and a nearest neighbor graph, both from the perspective of locality. A direct mapping can be obtained by solving a generalized eigenvalue problem. The effectiveness of the proposed method is shown through simulations on benchmark data sets.
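
The generalized eigenvalue problem A v = λ B v that such graph-based projections reduce to can be solved by whitening with B^{-1/2}. A minimal sketch, assuming B is symmetric positive definite (the specific A and B of semi-LSDR are not given in the abstract):

```python
import numpy as np

def generalized_eig(A, B):
    """Solve A v = lambda B v for symmetric A and symmetric PD B by
    whitening: with M = B^{-1/2} A B^{-1/2}, eigenpairs of M map back
    via v = B^{-1/2} y. Returns eigenvalues ascending and eigenvectors
    as columns."""
    w, U = np.linalg.eigh(B)
    B_inv_sqrt = U @ np.diag(1.0 / np.sqrt(w)) @ U.T
    M = B_inv_sqrt @ A @ B_inv_sqrt
    lam, Y = np.linalg.eigh(M)
    return lam, B_inv_sqrt @ Y
```

The projection directions of an LPP-style method are the eigenvectors attached to the smallest (or largest, depending on formulation) of these eigenvalues.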


2015 ◽  
Vol 11 (3) ◽  
pp. 26-48 ◽  
Author(s):  
Guilherme Moreira ◽  
Maribel Yasmina Santos ◽  
João Moura Pires ◽  
João Galvão

Huge amounts of data are available for analysis in today's organizations, which face several challenges when trying to analyze the generated data with the aim of extracting useful information. This analytical capability needs to be enhanced with tools capable of dealing with big data sets without making the analytical process an arduous task. Clustering is often used in data analysis, as this technique does not require any prior knowledge about the data. However, clustering algorithms usually require one or more input parameters that influence the clustering process and the results that can be obtained. This work analyzes the relation between the three input parameters of the SNN (Shared Nearest Neighbor) clustering algorithm, providing a comprehensive understanding of the relationships identified between k, Eps, and MinPts. Moreover, it also proposes specific guidelines for defining appropriate input parameters, optimizing processing time, as the number of trials needed to achieve appropriate results can be substantially reduced.
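
The three parameters interact as follows in SNN clustering: k sets the neighborhood size, Eps thresholds the shared-neighbor similarity, and MinPts decides which points are dense enough to be core points. A minimal sketch of that chain (using a Jarvis-Patrick-style mutual-neighbor requirement, which is one common variant, not necessarily the exact one studied):

```python
import numpy as np

def snn_core_points(X, k=7, eps=2, min_pts=3):
    """SNN density sketch: similarity of two mutual k-NN points = number
    of shared k-nearest neighbors (0 if not mutual); a point is a core
    point if at least `min_pts` points have SNN similarity >= `eps` with it."""
    n = len(X)
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)
    knn = np.argsort(d, axis=1)[:, :k]
    member = np.zeros((n, n), dtype=bool)
    for i in range(n):
        member[i, knn[i]] = True
    mutual = member & member.T                                  # k depends here
    shared = (member[:, None, :] & member[None, :, :]).sum(-1)  # shared-NN counts
    sim = np.where(mutual, shared, 0)
    density = (sim >= eps).sum(axis=1)                          # Eps enters here
    return density >= min_pts                                   # MinPts enters here
```

This dependency chain is why the paper can relate the three parameters: a change in k shifts the whole similarity distribution, which shifts the Eps and MinPts values that produce sensible clusters.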

