Robotic hand grasping of objects classified by using support vector machine and bag of visual words

Author(s):  
Mehmet Celalettin Ergene ◽  
Akif Durdu


JURTEKSI ◽
2019 ◽  
Vol 5 (2) ◽  
pp. 153-160
Author(s):  
Mahardika Abdi Prawira Tanjung

Abstract: The human eye can distinguish objects in digital images, but computers lack the ability to do so directly. The bag of visual words method was created to address this. Bag of visual words is a method for representing digital images based on local features; it describes how an image's characteristics can be extracted so that a computer can distinguish objects in digital images. The test results show that bag of visual words is still not optimal for classifying digital image categories, especially the chair category, for which the best accuracy achieved is only 75%. To improve the classification performance of bag of visual words, especially for the chair category, an approach for determining a good number of clusters K when clustering the visual word patterns could be added.

Keywords: Bag of Visual Words, Classification, Digital Image, Speeded-Up Robust Features, Support Vector Machine
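The pipeline this abstract describes — cluster local descriptors into K visual words, then encode each image as a histogram over those words — can be sketched in plain Python. The 2-D "descriptors" below are toy stand-ins for real 64-D SURF vectors, and the whole sketch is illustrative, not the paper's implementation:

```python
import random
from math import dist

# Toy stand-ins for local feature descriptors (real SURF descriptors
# would be 64-D; 2-D points keep the sketch readable).
random.seed(0)
descriptors = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(30)] + \
              [(random.gauss(5, 1), random.gauss(5, 1)) for _ in range(30)]

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means: the centroid list is the 'visual vocabulary'."""
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist(p, centroids[i]))].append(p)
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids

def bovw_histogram(image_descriptors, vocabulary):
    """Quantise each descriptor to its nearest visual word and count."""
    hist = [0] * len(vocabulary)
    for d in image_descriptors:
        hist[min(range(len(vocabulary)), key=lambda i: dist(d, vocabulary[i]))] += 1
    # L1-normalise so images with different descriptor counts are comparable
    total = sum(hist) or 1
    return [h / total for h in hist]

vocab = kmeans(descriptors, k=2)
hist = bovw_histogram(descriptors[:30], vocab)  # descriptors of one "image"
print(hist)
```

The normalised histograms would then be fed to an SVM; the abstract's suggestion of tuning K corresponds to the `k` parameter of `kmeans` above.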


2017 ◽  
Vol 31 (2) ◽  
pp. 310-319 ◽  
Author(s):  
Anton Ustyuzhanin ◽  
Karl-Heinz Dammer ◽  
Antje Giebel ◽  
Cornelia Weltzien ◽  
Michael Schirrmann

Common ragweed is a plant species causing allergic and asthmatic symptoms in humans. To control its propagation, an early identification system is needed. However, due to its similar appearance to mugwort, proper differentiation between these two weed species is important. Therefore, we propose a method to discriminate common ragweed and mugwort leaves in digital images using bag of visual words (BoVW). BoVW is an object-based image classification approach that has gained acceptance in many areas of science. We compared speeded-up robust features (SURF) and grid sampling for keypoint selection. The image vocabulary was built using K-means clustering. The image classifier was trained using support vector machines. To check the robustness of the classifier, specific model runs were conducted with and without damaged leaves in the training dataset. The results showed that the BoVW model allows the discrimination between common ragweed and mugwort leaves with high accuracy. Based on SURF keypoints, with 50% of the 788 images in total as training data, we achieved 100% correct recognition of the two plant species. Grid sampling resulted in slightly lower recognition accuracy (98 to 99%). In addition, classification based on SURF was up to 31 times faster.
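The two keypoint-selection strategies the study compares can be illustrated with minimal stand-ins. Grid sampling simply places keypoints on a regular lattice, while a SURF-style detector keeps only locations where a response measure is locally maximal; the response map and threshold below are toy assumptions, not the actual SURF detector:

```python
def grid_keypoints(width, height, step):
    """Dense grid sampling: one keypoint every `step` pixels, no detector."""
    return [(x, y) for y in range(step // 2, height, step)
                   for x in range(step // 2, width, step)]

def local_maxima(response, threshold):
    """Interest-point style selection: keep pixels whose response exceeds
    the threshold and dominates their 8-neighbourhood (a stand-in for the
    Hessian-based response SURF uses)."""
    h, w = len(response), len(response[0])
    points = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = response[y][x]
            if v > threshold and all(
                v >= response[y + dy][x + dx]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)
            ):
                points.append((x, y))
    return points

print(len(grid_keypoints(64, 48, 8)))  # 48 grid keypoints (8 cols x 6 rows)
```

Grid sampling guarantees coverage even on texture-poor leaf regions, while detector-based selection concentrates descriptors on distinctive structure, which is consistent with the speed and accuracy trade-off the study reports.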


Technologies ◽  
2019 ◽  
Vol 7 (1) ◽  
pp. 20 ◽  
Author(s):  
Evaggelos Spyrou ◽  
Rozalia Nikopoulou ◽  
Ioannis Vernikos ◽  
Phivos Mylonas

It is noteworthy that monitoring and understanding a human’s emotional state plays a key role in current and forthcoming computational technologies. At the same time, this monitoring and analysis should be as unobtrusive as possible, since the digital world has been smoothly adopted into everyday life activities. In this framework, and within the domain of assessing humans’ affective state during their educational training, the most popular approach is to use sensory equipment that allows observation without any kind of direct contact. Thus, in this work, we focus on human emotion recognition from audio stimuli (i.e., human speech) using a novel approach based on a computer vision inspired methodology, namely the bag-of-visual-words method, applied to spectrograms of audio segments. The spectrogram is treated as a visual representation of the audio segment and may be analyzed by exploiting well-known traditional computer vision techniques, such as construction of a visual vocabulary, extraction of speeded-up robust features (SURF), quantization into a set of visual words, and image histogram construction. As a last step, support vector machine (SVM) classifiers are trained on this information. Finally, to further generalize the proposed approach, we utilize publicly available datasets in several human languages to perform cross-language experiments, on both acted and real-life recordings.
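The first step of this approach — turning an audio segment into a spectrogram "image" that the BoVW pipeline can consume — can be sketched with a naive stdlib DFT. The window length, hop size, and the synthetic test tone are illustrative choices, not the paper's parameters:

```python
import math, cmath

def spectrogram(signal, win=64, hop=32):
    """Magnitude spectrogram: DFT of overlapping Hann-windowed frames.
    The resulting 2-D array (time x frequency) is what gets treated as
    an image for visual feature extraction."""
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        frame = [signal[start + n] * (0.5 - 0.5 * math.cos(2 * math.pi * n / win))
                 for n in range(win)]
        # Naive DFT, keeping the non-redundant half of the spectrum
        spectrum = [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / win)
                            for n in range(win)))
                    for k in range(win // 2)]
        frames.append(spectrum)
    return frames

# A 440 Hz test tone sampled at 8 kHz stands in for a speech segment
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(512)]
spec = spectrogram(tone)
print(len(spec), len(spec[0]))
```

From here the pipeline proceeds exactly as for ordinary photographs: SURF descriptors are extracted from the spectrogram, quantized against a visual vocabulary, and histogrammed for the SVM.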


2018 ◽  
Vol 10 (10) ◽  
pp. 1530 ◽  
Author(s):  
Michael Pflanz ◽  
Henning Nordmeyer ◽  
Michael Schirrmann

Weed detection from aerial images is a great challenge for generating field maps for site-specific plant protection. The requirements might be met with low-altitude flights of unmanned aerial vehicles (UAV), which provide ground resolutions adequate for differentiating even single weeds accurately. The following study proposed and tested an image classifier based on a Bag of Visual Words (BoVW) framework for mapping weed species, using a small unmanned aircraft system (UAS) with a commercial camera on board at low flying altitudes. The image classifier was trained with support vector machines after building a visual dictionary of local features from many collected UAS images. A window-based processing of the models was used for mapping the weed occurrences in the UAS imagery. The UAS flight campaign was carried out over a weed-infested wheat field, and images were acquired at flight altitudes between 1 and 6 m. From the UAS images, 25,452 weed plants were annotated at species level, along with wheat and soil as background classes, for training and validation of the models. The results showed that the BoVW model allowed the discrimination of single plants with high accuracy for Matricaria recutita L. (88.60%), Papaver rhoeas L. (89.08%), Viola arvensis M. (87.93%), and winter wheat (94.09%) within the generated maps. Regarding site-specific weed control, the classified UAS images would enable the selection of the right herbicide based on the distribution of the predicted weed species.
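The window-based mapping step — sliding a classification window across each UAS image and recording a prediction per position — can be sketched as follows. The tiny scene and the majority-vote stand-in for the trained BoVW + SVM classifier are purely illustrative:

```python
def classify_windows(image, win, stride, classify):
    """Slide a window over the image and record the predicted class per
    position, producing a coarse label map of the scene."""
    h, w = len(image), len(image[0])
    label_map = []
    for y in range(0, h - win + 1, stride):
        row = []
        for x in range(0, w - win + 1, stride):
            patch = [r[x:x + win] for r in image[y:y + win]]
            row.append(classify(patch))
        label_map.append(row)
    return label_map

# Toy scene: 0 = soil, 1 = "weed" pixels in the lower-right quadrant
scene = [[1 if (x >= 4 and y >= 4) else 0 for x in range(8)] for y in range(8)]

# Hypothetical classifier: a majority vote over patch pixels stands in
# for the real BoVW histogram + SVM prediction on each window
majority = lambda patch: int(sum(map(sum, patch)) > sum(map(len, patch)) / 2)

print(classify_windows(scene, 4, 4, majority))  # [[0, 0], [0, 1]]
```

In the study, each window's prediction is a weed species (or wheat/soil background), so the label map directly becomes the species distribution map used for herbicide selection.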


Author(s):  
Yuanyuan Zuo ◽  
Bo Zhang

The sparse representation based classification algorithm has been used to solve the problem of human face recognition, but the image databases used were restricted to frontal faces with only slight illumination and expression changes. This paper applies the sparse representation based algorithm to the problem of generic image classification, with a certain degree of intra-class variation and background clutter. Experiments are conducted with the sparse representation based algorithm and Support Vector Machine (SVM) classifiers on 25 object categories selected from the Caltech101 dataset. Experimental results show that, without time-consuming parameter optimization, the sparse representation based algorithm achieves performance comparable to SVM. The experiments also demonstrate that the algorithm is robust to a certain degree of background clutter and intra-class variation with bag-of-visual-words representations. The sparse representation based algorithm can thus be applied to generic image classification tasks when an appropriate image feature is used.
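The core idea of sparse representation based classification — explain the test sample as a sparse combination of training samples, then pick the class that contributes most — can be sketched with a greedy matching-pursuit stand-in for the usual l1 solver. The 3-D "features" and class names below are toy assumptions, not the paper's data:

```python
def norm(v): return sum(x * x for x in v) ** 0.5
def dot(a, b): return sum(x * y for x, y in zip(a, b))

def src_classify(test, train, labels, n_atoms=2):
    """Greedy matching pursuit as a stand-in for l1-minimisation: repeatedly
    pick the training sample (atom) that best explains the residual, then
    classify by which class accumulated the most coefficient weight."""
    atoms = [[x / (norm(v) or 1) for x in v] for v in train]
    residual = list(test)
    weight = {}
    for _ in range(n_atoms):
        i = max(range(len(atoms)), key=lambda j: abs(dot(residual, atoms[j])))
        c = dot(residual, atoms[i])
        residual = [r - c * a for r, a in zip(residual, atoms[i])]
        weight[labels[i]] = weight.get(labels[i], 0.0) + abs(c)
    return max(weight, key=weight.get)

# Hypothetical BoVW-style feature vectors for two object categories
train = [(1, 0, 0), (0.9, 0.1, 0), (0, 1, 0), (0, 0.9, 0.2)]
labels = ["face", "face", "car", "car"]
print(src_classify((0.95, 0.05, 0.0), train, labels))
```

Because the dictionary is just the training set itself, there is no per-class model to tune, which is the "no parameter optimization" property the abstract contrasts with SVM training.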


Sensors ◽  
2019 ◽  
Vol 19 (12) ◽  
pp. 2790 ◽  
Author(s):  
Saima Nazir ◽  
Muhammad Haroon Yousaf ◽  
Jean-Christophe Nebel ◽  
Sergio A. Velastin

Human action recognition (HAR) has emerged as a core research domain for video understanding and analysis, thus attracting many researchers. Although significant results have been achieved in simple scenarios, HAR is still a challenging task due to issues associated with view independence, occlusion and inter-class variation observed in realistic scenarios. In previous research efforts, the classical bag of visual words approach along with its variations has been widely used. In this paper, we propose a Dynamic Spatio-Temporal Bag of Expressions (D-STBoE) model for human action recognition without compromising the strengths of the classical bag of visual words approach. Expressions are formed based on the density of a spatio-temporal cube of a visual word. To handle inter-class variation, we use class-specific visual word representation for visual expression generation. In contrast to the Bag of Expressions (BoE) model, the formation of visual expressions is based on the density of spatio-temporal cubes built around each visual word, as constructing neighborhoods with a fixed number of neighbors could include non-relevant information, making a visual expression less discriminative in scenarios with occlusion and changing viewpoints. Thus, the proposed approach makes the model more robust to the occlusion and changing-viewpoint challenges present in realistic scenarios. Furthermore, we train a multi-class Support Vector Machine (SVM) for classifying bags of expressions into action classes. Comprehensive experiments on four publicly available datasets: KTH, UCF Sports, UCF11 and UCF50 show that the proposed model outperforms existing state-of-the-art human action recognition methods in terms of accuracy, achieving 99.21%, 98.60%, 96.94%, and 94.10%, respectively.
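The density idea behind expressions — counting a visual word's occurrences inside a spatio-temporal cube rather than taking a fixed number of nearest neighbours — reduces to a simple neighbourhood count. The hit coordinates and radius below are illustrative, not the paper's settings:

```python
def cube_density(occurrences, center, radius):
    """Density of a visual word inside a spatio-temporal cube: the number
    of occurrences (x, y, t) within `radius` of `center` on every axis."""
    cx, cy, ct = center
    return sum(1 for x, y, t in occurrences
               if abs(x - cx) <= radius
               and abs(y - cy) <= radius
               and abs(t - ct) <= radius)

# Hypothetical detected (x, y, frame) positions of one class-specific
# visual word across a video clip
word_hits = [(10, 10, 1), (11, 12, 2), (30, 5, 2), (12, 9, 3), (50, 50, 9)]

# A fixed-k neighbourhood around (10, 10, 1) would be forced to pull in
# distant, possibly irrelevant hits; the cube keeps only nearby ones
print(cube_density(word_hits, center=(10, 10, 1), radius=3))
```

Unlike a k-nearest-neighbour construction, a sparse region simply yields a low count instead of dragging in far-away, non-relevant occurrences, which is the robustness argument the abstract makes for occlusion and viewpoint change.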

