A Novel Strategy for Retrieving Large Scale Scene Images Based on Emotional Feature Clustering

Because of its complex data structure, an image can convey rich information, so images are widely used in many fields. Although images offer considerable convenience, processing such data consumes considerable time and space. This disadvantage is especially evident when users need to retrieve images from large-scale image datasets. To retrieve large-scale image data effectively, a scene image retrieval strategy based on the MapReduce parallel programming model is proposed. The strategy first investigates how to store large-scale scene images effectively on a Hadoop cluster parallel processing architecture. Second, a distributed MeanShift feature clustering algorithm is introduced to cluster the emotional features of scene images. Finally, several experiments verify the effectiveness and efficiency of the proposed strategy in terms of retrieval accuracy, speedup ratio, efficiency, and data scalability.
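The MeanShift clustering step can be illustrated with a minimal single-machine sketch. It assumes a flat (uniform) kernel and Euclidean feature vectors; the toy 2-D points in the usage below merely stand in for the emotional feature vectors of scene images, and the name `mean_shift`, the `bandwidth` parameter, and the rule of merging modes closer than half a bandwidth are illustrative choices, not the authors' distributed implementation. In a MapReduce setting, one plausible division of work is for each mapper to shift the samples of its own data block and for a reducer to merge nearby modes.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift over feature vectors (tuples of floats).

    Every sample is repeatedly moved to the mean of all samples within
    `bandwidth` of its current position until the shift is below `tol`;
    converged positions (modes) closer than bandwidth / 2 are merged.
    Returns (cluster_centers, label_per_sample).
    """
    modes = []
    for p in points:
        x = list(p)
        for _ in range(max_iter):
            # Samples inside the kernel window around the current estimate.
            window = [q for q in points if math.dist(x, q) <= bandwidth]
            if not window:  # defensive guard; should not occur with a flat kernel
                break
            new_x = [sum(coord) / len(window) for coord in zip(*window)]
            shift = math.dist(x, new_x)
            x = new_x
            if shift < tol:
                break
        modes.append(x)

    # Merge modes that converged to (almost) the same place into one center.
    centers, labels = [], []
    for mode in modes:
        for i, c in enumerate(centers):
            if math.dist(mode, c) < bandwidth / 2:
                labels.append(i)
                break
        else:  # no existing center nearby: this mode starts a new cluster
            centers.append(mode)
            labels.append(len(centers) - 1)
    return centers, labels
```

For example, `mean_shift([(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)], bandwidth=1.0)` yields two centers, one per group, with labels `[0, 0, 0, 1, 1, 1]`. Unlike k-means, the number of clusters is not fixed in advance; it follows from the bandwidth.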


Parallel Implementation of Improved K-Means Based on a Cloud Platform

Information Technology and Control, 2019, Vol 48 (4), pp. 673-681. DOI: 10.5755/j01.itc.48.4.23881

Author(s): Shufen Zhang, Zhiyu Liu, Xuebin Chen, Changyin Luo

Keyword(s): Large Scale, Clustering Algorithm, Programming Model, Parallel Implementation, Clustering Algorithms, Data Set, Large Scale Data, Sample Density, Scale Data, Selection Of

To address the difficulties that the traditional K-Means clustering algorithm faces with large-scale data sets, a Hadoop K-Means (HKM) clustering algorithm is proposed. First, the algorithm uses sample density to eliminate the effect of noise points in the data set. Second, it optimizes the selection of the initial center points using the max-min distance idea. Finally, it uses the MapReduce programming model to parallelize the algorithm. Experimental results show that the proposed algorithm not only produces accurate and stable clustering results but also addresses the scalability problems that traditional clustering algorithms encounter with large-scale data.
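The two preprocessing ideas in HKM, density-based noise removal and max-min distance seeding, can be sketched as follows. This is a hedged illustration: the neighborhood `radius`, the `min_neighbors` threshold, and starting the seeding from the densest sample are assumptions made here, since the abstract does not give the exact density definition or seeding rule.

```python
import math


def local_density(points, radius):
    """Number of other samples within `radius` of each sample."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def remove_noise(points, radius, min_neighbors):
    """Drop low-density samples, treating them as noise points."""
    density = local_density(points, radius)
    return [p for p, d in zip(points, density) if d >= min_neighbors]


def max_min_init(points, k, radius):
    """Max-min distance seeding (assumed variant): take the densest sample as
    the first center, then repeatedly add the sample whose distance to its
    nearest already-chosen center is largest."""
    density = local_density(points, radius)
    centers = [points[density.index(max(density))]]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(farthest)
    return centers
```

Filtering first matters because max-min seeding deliberately favors far-away samples: on three compact groups plus one isolated point, seeding the unfiltered data picks the isolated point as the second center, whereas after `remove_noise` the three seeds land in the three groups.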


Fuzzy Rough C-Mean Based Unsupervised CNN Clustering for Large-Scale Image Data

Applied Sciences, 2018, Vol 8 (10), pp. 1869. DOI: 10.3390/app8101869. Cited by ~3

Author(s): Saman Riaz, Ali Arshad, Licheng Jiao

Keyword(s): Deep Learning, Large Scale, Clustering Algorithm, Clustering Algorithms, Main Idea, Image Data, Training Image, Stochastic Gradient Descent, Cluster Center, Clustering Method

Deep learning has been well known for several years, and it shows great promise for unsupervised representation learning combined with clustering algorithms. Convolutional Neural Networks (CNNs) are now state-of-the-art for many recognition and clustering tasks. However, with the continual growth of digital images, there are more and more redundant, irrelevant, and noisy samples, which gradually degrade CNN performance and reduce clustering accuracy at the same time. To overcome these issues, we propose an effective clustering method for large-scale image datasets that combines a CNN with the Fuzzy-Rough C-Means (FRCM) clustering algorithm. The main idea is that a high-level representation, learned by a multi-layer CNN with one clustering layer, first produces the initial cluster centers; then, during training, the image clusters and the representations are updated jointly. FRCM updates the cluster centers in the forward pass, while the parameters of the proposed CNN are updated in the backward pass using Stochastic Gradient Descent (SGD). The rough-set concepts of lower and boundary approximations deal with uncertainty, vagueness, and incompleteness in cluster definitions, and fuzzy sets enable efficient handling of overlapping partitions in noisy environments. Experimental results show that the proposed FRCM-based unsupervised CNN clustering method outperforms standard K-Means, Fuzzy C-Means, FRCM, and other deep-learning-based clustering algorithms on large-scale image data.
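The rough-fuzzy center update that FRCM performs in the forward pass can be sketched on fixed feature vectors. The sketch follows one common rough-fuzzy c-means formulation: a sample joins the lower approximation of its best cluster when its top membership exceeds the runner-up by more than a threshold, and otherwise joins the boundary region of both clusters; each center then blends fuzzy-weighted means of the two regions. The threshold `delta`, the weight `w_low`, and the fuzzifier `m` are hypothetical defaults, weighting details differ between published variants, and the CNN and SGD parts of the method are not modeled. At least two clusters are required.

```python
import math


def fcm_memberships(points, centers, m=2.0, eps=1e-12):
    """Fuzzy c-means memberships: U[j][i] is how strongly point j belongs to cluster i."""
    U = []
    for x in points:
        d = [max(math.dist(x, v), eps) for v in centers]
        U.append([
            1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(len(centers)))
            for i in range(len(centers))
        ])
    return U


def weighted_mean(points, U, idx, i, m):
    """Mean of points[idx] weighted by their fuzzified memberships U[j][i] ** m."""
    w = [U[j][i] ** m for j in idx]
    total = sum(w)
    return [sum(wj * points[j][d] for wj, j in zip(w, idx)) / total
            for d in range(len(points[0]))]


def rough_fuzzy_update(points, centers, m=2.0, delta=0.3, w_low=0.7):
    """One rough-fuzzy center update; returns (new_centers, lower, boundary)."""
    U = fcm_memberships(points, centers, m)
    c = len(centers)
    lower = [[] for _ in range(c)]
    boundary = [[] for _ in range(c)]
    for j, row in enumerate(U):
        best, second = sorted(range(c), key=lambda i: row[i], reverse=True)[:2]
        if row[best] - row[second] > delta:
            lower[best].append(j)          # confidently in one cluster
        else:
            boundary[best].append(j)       # ambiguous: boundary of both clusters
            boundary[second].append(j)

    new_centers = []
    for i in range(c):
        if lower[i] and boundary[i]:
            lo = weighted_mean(points, U, lower[i], i, m)
            bd = weighted_mean(points, U, boundary[i], i, m)
            new_centers.append([w_low * a + (1 - w_low) * b for a, b in zip(lo, bd)])
        elif lower[i]:
            new_centers.append(weighted_mean(points, U, lower[i], i, m))
        elif boundary[i]:
            new_centers.append(weighted_mean(points, U, boundary[i], i, m))
        else:
            new_centers.append(list(centers[i]))  # empty cluster keeps its old center
    return new_centers, lower, boundary
```

A sample halfway between two clusters lands in both boundary regions and pulls each center toward it in proportion to `1 - w_low`, while unambiguous samples affect only their own center.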


A New Approach for Large-Scale Scene Image Retrieval Based on Improved Parallel k-Means Algorithm in MapReduce Environment

Mathematical Problems in Engineering, 2016, Vol 2016, pp. 1-17. DOI: 10.1155/2016/3593975. Cited by ~4

Author(s): Jianfang Cao, Min Wang, Hao Shi, Guohua Hu, Yun Tian

Keyword(s): Image Retrieval, Large Scale, Clustering Algorithm, Image Data, Data Retrieval, Single Node, New Approach, Scene Image, Computational Performance, Traditional Image

The rapid growth of digital images has confronted traditional image retrieval technology with new challenges. In this paper we introduce a new approach for large-scale scene image retrieval that solves the problems traditional image retrieval methods have with massive image processing. First, we improved the traditional k-Means clustering algorithm by optimizing the selection of the initial cluster centers and the iteration procedure. Second, we presented a parallel design and implementation of the improved k-Means algorithm and applied it to feature clustering of scene images. Finally, we put forward a storage and retrieval scheme for large-scale scene images that uses the large storage capacity and powerful parallel computing ability of the Hadoop distributed platform. The experimental results demonstrated that the proposed method achieved good performance. Compared with traditional algorithms on a single-node architecture and with the parallel k-Means algorithm, the proposed method has clear advantages for large-scale scene image retrieval in terms of retrieval accuracy, retrieval time overhead, and computational performance (speedup, efficiency, sizeup, and scaleup). This represents a significant improvement gained by applying parallel processing to intelligent algorithms on large-scale datasets.
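A MapReduce-style k-means iteration can be simulated in plain Python to show where the map, combine, and reduce steps fall. This is a sketch of standard parallel k-means, not of the authors' improved variant: the initial-center optimization is omitted, and `splits` (standing in for HDFS input blocks), `map_split`, and `reduce_centers` are hypothetical names.

```python
import math


def map_split(split, centers):
    """Mapper + combiner for one input split: assign each point to its nearest
    center and emit one (center_id, (vector_sum, count)) pair per center."""
    local = {}
    for p in split:
        cid = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        s, n = local.get(cid, ([0.0] * len(p), 0))
        local[cid] = ([a + b for a, b in zip(s, p)], n + 1)
    return list(local.items())


def reduce_centers(emitted, old_centers):
    """Reducer: merge partial sums per center id and compute the new means.
    A center that received no points keeps its previous position."""
    sums = {}
    for cid, (s, n) in emitted:
        if cid in sums:
            prev_s, prev_n = sums[cid]
            sums[cid] = ([a + b for a, b in zip(prev_s, s)], prev_n + n)
        else:
            sums[cid] = (s, n)
    return [
        [x / sums[i][1] for x in sums[i][0]] if i in sums else list(c)
        for i, c in enumerate(old_centers)
    ]


def mapreduce_kmeans(splits, centers, max_iter=20, tol=1e-9):
    """Driver: one map/shuffle/reduce round per iteration until centers stop moving."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        emitted = [pair for split in splits for pair in map_split(split, centers)]
        new_centers = reduce_centers(emitted, centers)
        moved = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if moved < tol:
            break
    return centers
```

Pre-aggregating inside each split (the combiner role) means only one (sum, count) pair per center per split has to be shuffled, rather than one record per image feature vector.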


Neuronal classification from network connectivity via adjacency spectral embedding

Network Neuroscience, 2021, pp. 1-35. DOI: 10.1162/netn_a_00195

Author(s): Ketan Mehta, Rebecca F. Goldin, David Marchette, Joshua T. Vogelstein, Carey E. Priebe, ...

Keyword(s): Large Scale, Clustering Algorithm, Probability Distributions, Network Connectivity, Gaussian Mixture, Model Parameters, Agglomerative Clustering, Spectral Embedding, Hierarchical Agglomerative Clustering, Novel Strategy

This work presents a novel strategy for classifying neurons, represented by nodes of a directed graph, based on their circuitry (edge connectivity). We assume a stochastic block model (SBM) in which neurons belong together if they connect to neurons of other groups according to the same probability distributions. Following adjacency spectral embedding of the SBM graph, we derive the number of classes and assign each neuron to a class with a Gaussian mixture model-based expectation-maximization (EM) clustering algorithm. To improve accuracy, we introduce a simple variation that uses random hierarchical agglomerative clustering to initialize the EM algorithm and picks the best solution over multiple EM restarts. We test this procedure on a large (≈2¹²–2¹⁵ neurons), sparse, biologically inspired connectome with eight neuron classes. The simulation results demonstrate that the proposed approach is broadly stable to the choice of embedding dimension and scales extremely well as the number of neurons in the network increases. Clustering accuracy is robust to variations in model parameters and highly tolerant to simulated experimental noise, achieving perfect classification with up to 40% of edges swapped. Thus, this approach may be useful for analyzing and interpreting large-scale brain connectomics data in terms of underlying cellular components.
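The embedding step can be sketched in pure Python. Two simplifications are assumed here: the graph is undirected (for a directed graph such as the one in this work, adjacency spectral embedding is usually based on a singular value decomposition instead), and the toy input is the expected adjacency (probability) matrix of a two-block SBM rather than a sampled graph, so the latent positions are recovered exactly. The top-`d` eigenpairs are found by a toy power iteration with deflation, and the final grouping uses the sign of the second coordinate as a simple stand-in for the Gaussian-mixture EM clustering described above.

```python
import math


def matvec(M, v):
    """Dense matrix-vector product for lists of lists."""
    return [sum(m_rc * v_c for m_rc, v_c in zip(row, v)) for row in M]


def power_iteration(M, iters=1000):
    """Eigenpair of largest |eigenvalue| of a symmetric matrix M (toy solver)."""
    n = len(M)
    v = [float(i + 1) for i in range(n)]  # fixed start vector for reproducibility
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = matvec(M, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, matvec(M, v)))  # Rayleigh quotient
    return lam, v


def adjacency_spectral_embedding(A, d):
    """n x d embedding: top-d eigenvectors of A (by |eigenvalue|), each scaled
    by sqrt(|eigenvalue|). Assumes A is symmetric (undirected graph)."""
    n = len(A)
    M = [[float(a) for a in row] for row in A]
    columns = []
    for _ in range(d):
        lam, v = power_iteration(M)
        columns.append([math.sqrt(abs(lam)) * x for x in v])
        # Deflation: subtract the rank-one component lam * v v^T already found.
        M = [[M[r][c] - lam * v[r] * v[c] for c in range(n)] for r in range(n)]
    return [[columns[k][i] for k in range(d)] for i in range(n)]
```

With `P = [[0.9 if (r < 3) == (c < 3) else 0.1 for c in range(6)] for r in range(6)]`, the rows of `adjacency_spectral_embedding(P, d=2)` are identical within each block, and `[int(row[1] > 0) for row in X]` separates the two blocks.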


Neuronal Classification from Network Connectivity via Adjacency Spectral Embedding

2020. DOI: 10.1101/2020.06.18.160259

Author(s): Ketan Mehta, Rebecca F. Goldin, David Marchette, Joshua T. Vogelstein, Carey E. Priebe, ...

Keyword(s): Large Scale, Clustering Algorithm, Probability Distributions, Network Connectivity, Gaussian Mixture, Model Parameters, Agglomerative Clustering, Spectral Embedding, Hierarchical Agglomerative Clustering, Novel Strategy

This work presents a novel strategy for classifying neurons, represented by nodes of a directed graph, based on their circuitry (edge connectivity). We assume a stochastic block model (SBM) in which neurons belong together if they connect to neurons of other groups according to the same probability distributions. Following adjacency spectral embedding (ASE) of the SBM graph, we derive the number of classes and assign each neuron to a class with a Gaussian mixture model-based expectation-maximization (EM) clustering algorithm. To improve accuracy, we introduce a simple variation that uses random hierarchical agglomerative clustering to initialize the EM algorithm and picks the best solution over multiple EM restarts. We test this procedure on a large (n ≈ 2¹²–2¹⁵ neurons), sparse, biologically inspired connectome with eight neuron classes. The simulation results demonstrate that the proposed approach is broadly stable to the choice of embedding dimension and scales extremely well as the number of neurons in the network increases. Clustering accuracy is robust to variations in model parameters and highly tolerant to simulated experimental noise, achieving perfect classification with up to 40% of edges swapped. Thus, this approach may be useful for analyzing and interpreting large-scale brain connectomics data in terms of underlying cellular components.


A DISTRIBUTED POLYGON RETRIEVAL ALGORITHM USING MAPREDUCE

ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, 2015, Vol II-4/W2, pp. 51-53. DOI: 10.5194/isprsannals-ii-4-w2-51-2015

Author(s): Q. Guo, B. Palanisamy, H. A. Karimi

Keyword(s): Data Processing, Spatial Data, Large Scale, Programming Model, Processing Technique, Spatial Data Analysis, Retrieval Algorithm, Quad Tree, Parallel Data, Hadoop Cluster

The explosion of large-scale spatial terrain data, driven by the proliferation of data acquisition devices such as 3D laser scanners, poses challenges to spatial data analysis and computation. Among the many spatial analyses and computations, polygon retrieval is a fundamental operation that is often performed under real-time constraints. However, existing sequential algorithms fail to meet this demand for larger terrain datasets. Motivated by the MapReduce programming model, a widely adopted large-scale parallel data processing technique, we present a MapReduce-based polygon retrieval algorithm designed to reduce the IO and CPU loads of spatial data processing. By indexing the data with a quad-tree approach, a significant amount of unneeded data is filtered out in the filtering stage, which reduces the IO overhead. The indexed data also allows the relationship between the terrain data and the query area to be determined in less time. The results of experiments performed on our Hadoop cluster demonstrate that our algorithm performs significantly better than existing distributed algorithms.
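The filter-and-refine idea can be sketched on a single machine. The fixed-depth quad-tree cell keys, the unit-square domain, and the names `quadkey` and `polygon_retrieval` are assumptions for illustration, not the authors' index layout. Grouping points by cell key mirrors what a map phase would emit; only points in cells that overlap the query polygon's bounding box reach the exact (and more expensive) point-in-polygon test.

```python
from collections import defaultdict


def quadkey(x, y, level, bounds):
    """Fixed-depth quad-tree cell key of (x, y): one digit 0-3 per level."""
    x0, y0, x1, y1 = bounds
    key = ""
    for _ in range(level):
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        digit = 0
        if x >= mx:
            digit, x0 = digit + 1, mx
        else:
            x1 = mx
        if y >= my:
            digit, y0 = digit + 2, my
        else:
            y1 = my
        key += str(digit)
    return key


def cell_bounds(key, bounds):
    """Bounding box (x0, y0, x1, y1) of the quad-tree cell named by `key`."""
    x0, y0, x1, y1 = bounds
    for ch in key:
        d = int(ch)
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        x0, x1 = (mx, x1) if d & 1 else (x0, mx)
        y0, y1 = (my, y1) if d & 2 else (y0, my)
    return x0, y0, x1, y1


def point_in_polygon(x, y, poly):
    """Ray casting: count crossings of a horizontal ray going right from (x, y)."""
    inside = False
    for (xa, ya), (xb, yb) in zip(poly, poly[1:] + poly[:1]):
        if (ya > y) != (yb > y):
            x_cross = xa + (y - ya) * (xb - xa) / (yb - ya)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_retrieval(points, poly, level=3, bounds=(0.0, 0.0, 1.0, 1.0)):
    """Return (points inside poly, number of candidates that passed the filter)."""
    cells = defaultdict(list)  # "map": group points by quad-tree cell key
    for x, y in points:
        cells[quadkey(x, y, level, bounds)].append((x, y))
    qx0, qx1 = min(p[0] for p in poly), max(p[0] for p in poly)
    qy0, qy1 = min(p[1] for p in poly), max(p[1] for p in poly)
    candidates = []  # filter: cells overlapping the polygon's bounding box
    for key, pts in cells.items():
        cx0, cy0, cx1, cy1 = cell_bounds(key, bounds)
        if cx0 <= qx1 and qx0 <= cx1 and cy0 <= qy1 and qy0 <= cy1:
            candidates.extend(pts)
    hits = [p for p in candidates if point_in_polygon(p[0], p[1], poly)]  # refine
    return hits, len(candidates)
```

On a 10 × 10 grid of points with coordinates `i / 10 + 0.05` and the triangle `[(0.1, 0.1), (0.42, 0.1), (0.1, 0.42)]`, only 25 of the 100 points pass the cell filter, and 6 of those lie inside the triangle.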


Large-Scale Biomolecular Dynamics Using SMP Clusters

12th International Conference on Nuclear Engineering, Volume 1, 2004. DOI: 10.1115/icone12-49573

Author(s): Masaaki Suzuki, Hiroshi Okuda, Genki Yagawa

Keyword(s): Parallel Programming, Message Passing, Large Scale, Message Passing Interface, MD Simulation, Programming Model, Fast Multipole Method, Long Distance, Parallel Efficiency, Parallel Programming Model

The authors have applied the Message Passing Interface (MPI)/OpenMP hybrid parallel programming model to the molecular dynamics (MD) method for simulating a protein structure on a symmetric multiprocessor (SMP) cluster architecture. On such an architecture, the hybrid parallel programming model, which uses a message-passing library such as MPI for communication between SMP nodes and loop directives such as OpenMP for parallelization within each SMP node, can be expected to be the most effective. In this study, the parallel performance of the hybrid style is compared with that of the conventional flat parallel programming style, which uses only MPI, both with and without the fast multipole method (FMM) for computing long-distance interactions. The computing environment is the Hitachi SR8000/MPP at the University of Tokyo. The results are as follows.
Without FMM, for an MD simulation with 33,402 atoms, the parallel efficiency on 16 SMP nodes (128 PEs) is:
- 90% with the hybrid style
- 75% with the flat-MPI style
With FMM, for an MD simulation with 117,649 atoms, the parallel efficiency on 16 SMP nodes (128 PEs) is:
- 60% with the hybrid style
- 48% with the flat-MPI style
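Parallel efficiency relates directly to speedup. Assuming the usual definition with the single-PE run time T_1 as the baseline (the abstract does not state which baseline was used), the reported efficiencies on p = 128 PEs translate into the following speedups:

```latex
E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}
\quad\Longrightarrow\quad
S_p = p\,E_p

\begin{aligned}
\text{without FMM, hybrid:}   &\quad S_{128} = 128 \times 0.90 = 115.2\\
\text{without FMM, flat MPI:} &\quad S_{128} = 128 \times 0.75 = 96\\
\text{with FMM, hybrid:}      &\quad S_{128} = 128 \times 0.60 = 76.8\\
\text{with FMM, flat MPI:}    &\quad S_{128} = 128 \times 0.48 = 61.44
\end{aligned}
```

Under this assumption, the hybrid style gains 19.2 units of speedup over flat MPI without FMM and 15.36 with FMM.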


WAPM: A parallel programming model in large scale Internet distributed computing environments

Journal of Computer Applications, 2009, Vol 29 (8), pp. 2161-2166. DOI: 10.3724/sp.j.1087.2009.02161. Cited by ~2

Author(s): Chong-guo FU, Sheng-chao XU

Keyword(s): Distributed Computing, Parallel Programming, Large Scale, Programming Model, Parallel Programming Model, Computing Environments


A Fast Clustering Algorithm for Large-scale and High Dimensional Data

Acta Automatica Sinica, 2009, Vol 35 (7), pp. 859-866. DOI: 10.3724/sp.j.1004.2009.00859

Author(s): Ming LIU, Xiao-Long WANG, Yuan-Chao LIU

Keyword(s): Large Scale, Clustering Algorithm, High Dimensional Data, High Dimensional


A Novel Unsupervised Classification Method for Sandy Land Using Fully Polarimetric SAR Data

Remote Sensing, 2021, Vol 13 (3), pp. 355. DOI: 10.3390/rs13030355

Author(s): Weixian Tan, Borong Sun, Chenyu Xiao, Pingping Huang, Wei Xu, ...

Keyword(s): Spectral Clustering, Large Scale, Clustering Algorithm, Feature Vector, Unsupervised Classification, Classification Method, Sandy Land, Classification Methods, The Many, Representative Points

Classification based on polarimetric synthetic aperture radar (PolSAR) images is an emerging technology, and recent years have seen the introduction of various classification methods that have proven effective at identifying typical features of many terrain types. Among the many study regions, the Hunshandake Sandy Land in Inner Mongolia, China, stands out for its vast area of sandy land, variety of ground objects, and intricate structure, with more irregular characteristics than conventional land cover. Accounting for the particular surface features of the Hunshandake Sandy Land, this study proposes an unsupervised classification method based on a new decomposition and large-scale spectral clustering with superpixels (ND-LSC). First, the polarization scattering parameters are extracted through a new decomposition, rather than other decomposition approaches, which yields a more accurate feature vector estimate. Second, large-scale spectral clustering is applied to cope with the vast land area and complex terrain. More specifically, this involves an initial sub-step of superpixel generation via the Adaptive Simple Linear Iterative Clustering (ASLIC) algorithm, with the feature vector combined with spatial coordinate information as input, followed by a sub-step of representative point selection and bipartite graph formation, and finally the spectral clustering algorithm, which completes the classification task. Finally, testing and analysis are conducted on the RADARSAT-2 fully polarimetric SAR dataset acquired over the Hunshandake Sandy Land in 2016. Qualitative and quantitative experiments comparing several classification methods show that the proposed method can significantly improve classification performance.
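The representative-point step of large-scale spectral clustering can be sketched as the construction of a sparse bipartite affinity matrix between samples (for example, superpixel feature vectors) and representative points. The Gaussian kernel width `sigma`, the number `r` of nearest representatives per sample, and the function name `bipartite_affinity` are assumptions, and the representatives are taken as given (in landmark-based formulations they are often chosen by random sampling or k-means). The point of the bipartite graph is that spectral clustering then only needs eigenvectors of a small matrix derived from this n × p matrix rather than of a full n × n affinity matrix.

```python
import math


def bipartite_affinity(features, reps, r=2, sigma=1.0):
    """Sparse n x p affinity between samples and p representative points.

    Each sample is linked only to its r nearest representatives, with
    Gaussian weights exp(-d^2 / (2 sigma^2)); every row is normalized to sum
    to 1, and all other entries are 0.
    """
    Z = []
    for x in features:
        d = [math.dist(x, c) for c in reps]
        nearest = sorted(range(len(reps)), key=d.__getitem__)[:r]
        w = {j: math.exp(-d[j] ** 2 / (2 * sigma ** 2)) for j in nearest}
        total = sum(w.values())
        Z.append([w.get(j, 0.0) / total for j in range(len(reps))])
    return Z
```

Each row of the result has exactly `r` nonzero entries, so storage and the subsequent eigen-computation grow with n × r and p rather than with n².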
