GENERAL IMAGE CLASSIFICATION USING ADAPTIVE CELLULAR COLOR DECOMPOSITION

Author(s):  
CHIH-HSIEN LO ◽  
SHU-YUAN CHEN

In this paper, a coarse-to-fine hierarchical classification method based on features derived from adaptive cellular color decomposition is proposed. The proposed method is general and can be applied to any color image database as long as a sample set of images has been classified. In addition, the number of classes can vary as required. The method consists of two phases: color quantization and classification. In the color quantization phase, cellular decomposition is used to adaptively quantize color images in the HSV color space, since the H and S components form a hexagonal structure that matches the cellular pattern. In the classification phase, a coarse-to-fine strategy is employed. In the coarse stage, five image-based features extracted directly from the quantization results of the query images are used to prune irrelevant database images. In the fine stage, two cluster-based features are extracted from a small set of candidate images using closest-cluster matching. Based on feature evaluation, one image-based and two cluster-based features are selected to derive an individual-based similarity measure, which in turn is used to measure image-to-image similarity. In addition, a class-based similarity measure using class characteristics is proposed to evaluate image-to-class similarity. Candidate images are then sorted according to a similarity measure that combines the individual-based and class-based measures. Finally, the k-NN rule is used to assign the query image to a single class according to the sorting results. The effectiveness and practicability of the proposed method have been demonstrated by various experimental results.
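The coarse-to-fine flow described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the feature vectors, the L1 distance, and the `keep` and `k` parameters are assumptions chosen for illustration, and the class-based similarity term is omitted.

```python
from collections import Counter

def l1(a, b):
    """L1 distance between two equal-length feature vectors (an assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def classify(query, database, keep=5, k=3):
    """Assign a class label to `query` using a coarse prune, a fine rerank, and k-NN.

    query:    (coarse_vec, fine_vec)
    database: list of (coarse_vec, fine_vec, label) for already classified images
    """
    q_coarse, q_fine = query
    # Coarse stage: cheap image-level features prune the database to `keep` candidates.
    candidates = sorted(database, key=lambda rec: l1(q_coarse, rec[0]))[:keep]
    # Fine stage: costlier features rerank only the surviving candidates.
    ranked = sorted(candidates, key=lambda rec: l1(q_fine, rec[1]))
    # k-NN rule: majority vote among the k best-ranked candidates.
    votes = Counter(label for _, _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

The point of the two stages is that the fine features are computed and compared only for the small candidate set, so the expensive matching cost does not grow with the full database size.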

2011 ◽  
Vol 2 (1) ◽  
Author(s):  
Vina Chovan Epifania ◽  
Eko Sediyono

Abstract. Image File Searching Based on Color Domination. One characteristic of an image that can be used in image searching is its color composition. Color is a trait that people see easily in a picture. Using color as a search parameter makes it easier to find images stored in computer memory. Color images have RGB values that can be computed and converted into the HSL color space model. The HSL model is easy to use because its values can be expressed as percentages, so each pixel of the image can be grouped and named, which yields the dominant color values of an image. Once these values are obtained, an image file retrieval system can search quickly using them alone. This article discusses the use of the HSL color space model to facilitate searching for a digital image in a digital image data warehouse. Tests of the resulting application show that searching is faster when using colors specified by the user. One remaining obstacle is that, with the 15 basic colors available and a color dominance limit of 33%, the image sought was not found. This is because the dominant color in most images has a dominance value below 33%. Keywords: RGB, HSL, image searching
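As a rough illustration of the dominant-color idea in this abstract, the sketch below converts RGB pixels to HSL with Python's standard `colorsys` module, names each pixel, and checks a color's share against a dominance threshold. The hue bands, the lightness and saturation cut-offs, and the color names are assumptions; the abstract does not list the 15 basic colors the authors used.

```python
import colorsys
from collections import Counter

# Hypothetical hue bands (upper bound in degrees, name); not the paper's 15 colors.
HUE_BANDS = [(30, "red"), (90, "yellow"), (150, "green"),
             (210, "cyan"), (270, "blue"), (330, "magenta"), (360, "red")]

def name_pixel(r, g, b):
    """Map an 8-bit RGB pixel to a coarse color name via HSL."""
    # colorsys returns (hue, lightness, saturation), each in [0, 1].
    h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    if l < 0.15:
        return "black"
    if l > 0.85:
        return "white"
    if s < 0.15:
        return "gray"
    deg = h * 360.0
    for upper, name in HUE_BANDS:
        if deg < upper:
            return name
    return "red"

def color_shares(pixels):
    """Return the fraction of pixels carrying each color name."""
    counts = Counter(name_pixel(*p) for p in pixels)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

def matches(pixels, wanted, threshold=0.33):
    """True if `wanted` covers at least `threshold` of the image's pixels."""
    return color_shares(pixels).get(wanted, 0.0) >= threshold
```

With a fixed threshold such as 33%, an image whose largest color share is smaller than the threshold matches no color at all, which is the obstacle the abstract reports.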


Author(s):  
B. Mathura Bai ◽  
N. Mangathayaru ◽  
B. Padmaja Rani ◽  
Shadi Aljawarneh

Missing attribute values are one of the most common problems faced when mining medical datasets. Estimating missing values is a major challenge in the pre-processing of datasets. A wrong estimate of a missing attribute value can lead to inefficient and improper classification, resulting in lower classifier accuracy. Similarity measures play a key role in the imputation process, and an appropriate similarity measure can help to achieve better imputation and improved classification accuracy. This paper proposes a novel imputation measure for finding the similarity between missing and non-missing instances in medical datasets. Experiments are carried out by applying both the proposed imputation technique and popular existing benchmark imputation techniques. Classification is carried out using the KNN, J48, SMO and RBFN classifiers. The experimental analysis showed that, after imputation of medical records using the proposed technique, the classification accuracies reported by the KNN, J48 and SMO classifiers improved compared to the existing benchmark imputation techniques.


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Ali A. Amer ◽  
Hassan I. Abdalla

Similarity measures have long been used in the information retrieval and machine learning domains for multiple purposes, including text retrieval, text clustering, text summarization, plagiarism detection, and several other text-processing applications. However, the problem with these measures is that, until recently, no single measure has been reported to be highly effective and efficient at the same time. Thus, the quest for an efficient and effective similarity measure is still an open challenge. This study therefore introduces a new, highly effective and time-efficient similarity measure for text clustering and classification. Furthermore, the study aims to provide a comprehensive examination of seven of the most widely used similarity measures, mainly concerning their effectiveness and efficiency. Using the K-nearest neighbor algorithm (KNN) for classification, the K-means algorithm for clustering, and the bag-of-words (BoW) model for feature selection, all similarity measures are carefully examined in detail. The experimental evaluation was carried out on two of the most popular datasets, namely Reuters-21 and Web-KB. The results confirm that the proposed set theory-based similarity measure (STB-SM) significantly outperforms all state-of-the-art measures with regard to both effectiveness and efficiency.


Author(s):  
HUA YANG ◽  
MASAAKI KASHIMURA ◽  
NORIKADU ONDA ◽  
SHINJI OZAWA

This paper describes a new system for extracting and classifying bibliography regions from the color image of a book cover. The system consists of three major components: preprocessing, color space segmentation, and text region extraction and classification. Preprocessing extracts the edge lines of the book, geometrically corrects the input image, and segments it into the front cover, spine and back cover. As in all color image processing research, the segmentation of color space is an essential step here. Instead of the RGB color space, the HSI color space is used in this system. The color space is first segmented into achromatic and chromatic regions, and both regions are then segmented further to complete the color space segmentation. Text region extraction and classification follow. After detecting fundamental features (stroke width and local label width), text regions are determined. By comparing the text regions on the front cover with those on the spine, all extracted text regions are classified into suitable bibliography categories (author, title, publisher and other information) without applying OCR.


2021 ◽  
Vol 10 (2) ◽  
pp. 90
Author(s):  
Jin Zhu ◽  
Dayu Cheng ◽  
Weiwei Zhang ◽  
Ci Song ◽  
Jie Chen ◽  
...  

People spend more than 80% of their time in indoor spaces, such as shopping malls and office buildings. Indoor trajectories collected by indoor positioning devices, such as WiFi and Bluetooth devices, can reflect human movement behaviors in indoor spaces. Insightful indoor movement patterns can be discovered from indoor trajectories using various clustering methods. These methods are based on a measure that reflects the degree of similarity between indoor trajectories. Researchers have proposed many trajectory similarity measures. However, existing trajectory similarity measures ignore the indoor movement constraints imposed by the indoor space and the characteristics of indoor positioning sensors, which leads to an inaccurate measure of indoor trajectory similarity. Additionally, most of these works focus on the spatial and temporal dimensions of trajectories and pay less attention to indoor semantic information. Integrating indoor semantic information such as the indoor point of interest into the indoor trajectory similarity measurement is beneficial to discovering pedestrians having similar intentions. In this paper, we propose an accurate and reasonable indoor trajectory similarity measure called the indoor semantic trajectory similarity measure (ISTSM), which considers the features of indoor trajectories and indoor semantic information simultaneously. The ISTSM is modified from the edit distance that is a measure of the distance between string sequences. The key component of the ISTSM is an indoor navigation graph that is transformed from an indoor floor plan representing the indoor space for computing accurate indoor walking distances. The indoor walking distances and indoor semantic information are fused into the edit distance seamlessly. The ISTSM is evaluated using a synthetic dataset and real dataset for a shopping mall. 
The experiment with the synthetic dataset reveals that the ISTSM is more accurate and reasonable than three other popular trajectory similarity measures, namely the longest common subsequence (LCSS), edit distance on real sequence (EDR), and the multidimensional similarity measure (MSM). The case study of a shopping mall shows that the ISTSM effectively reveals the movement patterns of indoor customers.
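To make the idea of fusing walking distance and semantics into an edit distance concrete, here is a minimal Python sketch. It is not the ISTSM itself: the substitution cost, the weight `w`, the normalization by `max_walk`, and the precomputed `walk` lookup (standing in for distances from an indoor navigation graph) are all assumptions made for illustration.

```python
def edit_distance(a, b, sub_cost, indel_cost=1.0):
    """Edit distance between sequences a and b with a pluggable substitution cost."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * indel_cost
    for j in range(1, n + 1):
        d[0][j] = j * indel_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,                       # delete a[i-1]
                d[i][j - 1] + indel_cost,                       # insert b[j-1]
                d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1])  # substitute
            )
    return d[m][n]

def make_cost(walk, max_walk, w=0.5):
    """Build a substitution cost mixing walking distance and semantic mismatch.

    walk: dict mapping frozenset({loc1, loc2}) to an indoor walking distance
          (hypothetical; assumed to be precomputed from a navigation graph).
    """
    def cost(p, q):
        (loc_p, sem_p), (loc_q, sem_q) = p, q
        if loc_p == loc_q:
            spatial = 0.0
        else:
            spatial = min(walk[frozenset((loc_p, loc_q))] / max_walk, 1.0)
        semantic = 0.0 if sem_p == sem_q else 1.0
        return w * spatial + (1 - w) * semantic
    return cost
```

Each trajectory point here is a `(location, semantic_label)` pair, so two visits to different cafes that are a short walk apart cost less to substitute than a cafe and a distant shop.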


2021 ◽  
Vol 13 (1) ◽  
pp. 1-25
Author(s):  
Michael Loster ◽  
Ioannis Koumarelas ◽  
Felix Naumann

The integration of multiple data sources is a common problem in a large variety of applications. Traditionally, handcrafted similarity measures are used to discover, merge, and integrate multiple representations of the same entity (duplicates) into a large homogeneous collection of data. Often, these similarity measures do not cope well with the heterogeneity of the underlying dataset. In addition, domain experts are needed to manually design and configure such measures, which is time-consuming and requires extensive domain expertise. We propose a deep Siamese neural network capable of learning a similarity measure that is tailored to the characteristics of a particular dataset. With the properties of deep learning methods, we are able to eliminate the manual feature engineering process and thus considerably reduce the effort required for model construction. In addition, we show that it is possible to transfer knowledge acquired during the deduplication of one dataset to another, and thus significantly reduce the amount of data required to train a similarity measure. We evaluated our method on multiple datasets and compared our approach to state-of-the-art deduplication methods. Our approach outperforms competitors by up to +26 percent F-measure, depending on task and dataset. In addition, we show that knowledge transfer is not only feasible, but in our experiments led to an improvement in F-measure of up to +4.7 percent.


2022 ◽  
Vol 23 (1) ◽  
pp. 116-128
Author(s):  
Baydaa Khaleel

An image retrieval system retrieves similar images by searching and browsing a large database. Such a system can be a reliable tool for people to make the best use of accumulated images, so finding efficient methods to retrieve images is very important. Recent decades have seen increased research interest in the field of image retrieval. To retrieve images, an important set of features is used. In this work, a combination of methods was used to examine all the images and detect images in a database according to a query image. Linear Discriminant Analysis (LDA) was used to extract features from the images in the dataset. The images in the database were processed by extracting their important and robust features and storing them in the feature store. Likewise, strong features were extracted for specific query images. Similarity was evaluated using metaheuristic algorithms such as Cuckoo Search (CS) and Ant Colony Optimization (ACO), and an artificial neural network, the single-layer Perceptron Neural Network (PNN). Two new methods were also proposed by hybridizing PNN and CS with fuzzy logic, called the Fuzzy Single Layer Perceptron Neural Network (FPNN) and Fuzzy Cuckoo Search (FCS), to examine the similarity between the features of query images and the features of images in the database. The efficiency of the methods was evaluated by calculating the precision and recall of the results. The proposed FCS method outperformed the other methods (PNN, ACO, CS, and FPNN) in terms of precision and recall.


In data mining, many techniques use distance-based measures for data clustering. Improving clustering performance is the fundamental goal in clustering-related tasks. Many techniques are available for clustering numerical data as well as categorical data. Clustering is an unsupervised learning technique in which objects are grouped based on the similarity among them. A new cluster similarity measure, the cosine-like cluster similarity measure (CLCSM), is proposed in this paper. The proposed cluster similarity measure is used for data classification. Extensive experiments are conducted on UCI machine learning datasets. The experimental results show that the proposed cosine-like cluster similarity measure is superior to many of the existing cluster similarity measures for data classification.
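The abstract does not define the CLCSM, so the sketch below uses plain cosine similarity against cluster centroids as a stand-in. It only illustrates how a cluster similarity measure can drive classification; the centroid representation and the function names are assumptions.

```python
import math

def cosine(u, v):
    """Standard cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def classify(x, clusters):
    """Return the label of the cluster whose centroid is most similar to x.

    clusters: dict mapping label -> list of member vectors.
    """
    return max(clusters, key=lambda label: cosine(x, centroid(clusters[label])))
```

Swapping `cosine` for another similarity function is the only change needed to compare measures under the same classification rule.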


2018 ◽  
Author(s):  
Solly Aryza

It is very challenging to recognize a face in an image due to the wide variety of faces and the uncertainty of face position. Research on detecting human faces in color images and video sequences has attracted more and more researchers. In this paper, we propose a novel face detection method that achieves better detection rates. The new face detection algorithm is based on a skin color model in the YCgCr chrominance space. First, we build a Gaussian skin model in the Cg-Cr color space. Second, a correlation coefficient is calculated between the given template and the candidates. Experimental results demonstrate that our system achieves high detection rates and low false positives over a wide range of facial variations in color, position and lighting conditions.

