Learning to Recognize 3D Objects using Sparse Depth and Intensity Information

Author(s):  
Brendan McCane ◽  
Terry Caelli ◽  
Olivier de Vel

In this paper we further explore the use of machine learning (ML) for the recognition of 3D objects in isolation or embedded in scenes. Of particular interest is the use of a recent ML technique, Conditional Rule Generation (CRG), which generates descriptions of objects in terms of object parts and part-relational attribute bounds. We show how this technique can be combined with intensity-based model and scene views to locate objects and their pose. The major contributions of this paper are: the extension of the CRG classifier to incorporate fuzzy decisions (FCRG), the application of the FCRG classifier to the problem of learning 3D objects from 2D intensity images, the study of the usefulness of sparse depth data with regard to recognition performance, and the implementation of a complete object recognition system that does not rely on perfect or synthetic data. We report a recognition rate of 80% for unseen single-object scenes in a database of 18 non-trivial objects.

2014 ◽  
Vol 2 (2) ◽  
pp. 43-53 ◽  
Author(s):  
S. Rojathai ◽  
M. Venkatesulu

In speech word recognition systems, feature extraction and recognition play the most significant roles. Many feature extraction and recognition methods are available in existing speech word recognition systems. A recent Tamil speech word recognition system based on PAC-ANFIS achieved higher word recognition performance than earlier Tamil speech word recognition systems. Investigating speech word recognition with other recognition methods is therefore needed to establish their performance. This paper presents such an investigation using two well-known artificial intelligence methods, the Feed Forward Back Propagation Neural Network (FFBNN) and the Adaptive Neuro Fuzzy Inference System (ANFIS). The performance of the Tamil speech word recognition system with PAC-FFBNN is analyzed in terms of statistical measures and Word Recognition Rate (WRR) and compared with PAC-ANFIS and other existing Tamil speech word recognition systems.


2020 ◽  
Vol 39 (4) ◽  
pp. 5749-5760
Author(s):  
Yanfei Hai

The purpose of this paper is to use English-specific syllables and prosodic features in spoken speech data to carry out spoken English recognition, and to explore effective methods for the design and application of English speech detection and automatic recognition systems. The method proposed in this study combines an SVM_FF-based classifier, an SVM_IER-based classifier, and a syllable classifier. Compared with methods based on combinations of other phonological characteristics, such as phonological rate, intensity, formant and energy statistics, and pronunciation rate, and with the syllable-based classifier trained on specific syllables, it obtains a better recognition rate. In addition, this study conducts simulation experiments on the proposed English recognition and identification method based on specific syllables and prosodic features and analyzes the experimental results. The results show that the recognition performance of the spoken English recognition system constructed in this study is significantly better than that of the traditional model.


Author(s):  
Y. S. Huang ◽  
K. Liu ◽  
C. Y. Suen ◽  
Y. Y. Tang

This paper proposes a novel method which enables a Chinese character recognition system to obtain reliable recognition. In this method, two thresholds, i.e. the class region threshold Rk and the disambiguity threshold Ak, are used by each Chinese character k when the classifier is designed based on the nearest neighbor rule, where Rk defines the pattern distribution region of character k, and Ak prevents samples not belonging to character k from being ambiguously recognized as character k. A novel algorithm to derive the appropriate thresholds Ak and Rk is developed so that better recognition reliability can be obtained through iterative learning. Experiments performed on the ITRI printed Chinese character database have achieved highly reliable recognition performance (such as 0.999 reliability with a 95.14% recognition rate), which shows the feasibility and effectiveness of the proposed method.
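
The role of the two per-class thresholds in the abstract above can be illustrated with a minimal sketch of nearest-neighbor classification with rejection. The abstract does not give the decision rule, so the rule used here is an assumption: the nearest class k is accepted only if its distance is within Rk and the gap to the nearest competing class is at least Ak. The function name, the prototype layout, and the Euclidean distance are all hypothetical choices, and the threshold-learning algorithm itself is not shown.

```python
import math


def classify_with_reject(x, prototypes, R, A):
    """Nearest-neighbor decision with two per-class rejection thresholds.

    prototypes: list of (label, vector) pairs.
    R: dict label -> class region threshold (assumed: max accepted distance).
    A: dict label -> disambiguity threshold (assumed: min gap to runner-up class).
    Returns the accepted label, or None when the sample is rejected.
    """
    # Distance from x to the closest prototype of each class.
    best = {}
    for label, vec in prototypes:
        d = math.dist(x, vec)
        if label not in best or d < best[label]:
            best[label] = d

    ranked = sorted(best.items(), key=lambda kv: kv[1])
    k, d1 = ranked[0]
    d2 = ranked[1][1] if len(ranked) > 1 else math.inf

    if d1 > R[k]:
        return None  # outside the region assumed for class k
    if d2 - d1 < A[k]:
        return None  # too close to a competing class to be unambiguous
    return k
```

Rejecting instead of guessing trades recognition rate for reliability, which matches the abstract's reporting of both figures side by side.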


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5185
Author(s):  
Yu Zhai ◽  
Jieyu Lei ◽  
Wenze Xia ◽  
Shaokun Han ◽  
Fei Liu ◽  
...  

This work introduces a super-resolution (SR) algorithm for range images on the basis of self-guided joint filtering (SGJF), adding the range information of the range image as a coefficient of the filter to reduce the influence of the intensity image texture on the super-resolved image. A range image SR recognition system is constructed to study the effect of four SR algorithms including the SGJF algorithm on the recognition of the laser radar (ladar) range image. The effects of different model library sizes, SR algorithms, SR factors and noise conditions on the recognition are tested via experiments. Results demonstrate that all tested SR algorithms can improve the recognition rate of low-resolution (low-res) range images to varying degrees and the proposed SGJF algorithm has a very good comprehensive recognition performance. Finally, suggestions for the use of SR algorithms in actual scene recognition are proposed on the basis of the experimental results.


Author(s):  
PEI-YIH TING ◽  
CHIU-YU TSENG ◽  
LIN-SHAN LEE

In a long-term research project, the recognition of Mandarin speech for very large vocabulary and unlimited text is considered. Its first stage goal is to recognize the Mandarin syllables. In a previous paper, an initial/final two-phase recognition approach to recognize these very confusing syllables was proposed, in which each syllable is divided into initial and final parts and recognized separately, and efficient recognition techniques for the finals were proposed and discussed. This paper serves as a continuation and proposes an efficient system to recognize the Mandarin initials. In this system, a classification procedure is first used to categorize the unknown initials into two groups C1 and C2; different approaches are then separately applied and independently optimized to recognize C1 and C2. It is found that Finite State Vector Quantization (FSVQ) is very useful, whose two modified versions, Modified FSVQ (MFSVQ) and the Second Order FSVQ (SOFSVQ), can provide the best recognition performance for C1 and C2 by carefully adjusting a design parameter called characteristic interval. Experimental results show that a recognition rate of 94.1% to 94.7% can be achieved using this system. Such a design is accomplished by carefully considering the special characteristics of Mandarin syllables and initials.


2016 ◽  
Vol 2016 ◽  
pp. 1-15 ◽  
Author(s):  
Yibing Li ◽  
Jie Chen ◽  
Fang Ye ◽  
Dandan Liu

Automatic target recognition (ATR) systems have broad application prospects in the military field, especially in modern defense technology. When paradoxes arise in an ATR system because of an adverse battlefield environment, fusion cannot be carried out effectively and reliably by traditional DS evidence theory alone. In this paper, a modified DS evidence theory is presented and applied in an IR/MMW target recognition system. DS evidence theory is improved in three parts: the introduction of sensor priority and evidence credibility to discount the evidence, the modification of the DS combination rule to enhance the accuracy of the synthesis results, and a compound decision-making rule. The modified algorithm is applied in the IR/MMW system to deal with paradoxes, improve the target recognition rate, and ensure the reliability of the target recognition system. Experiments illustrate that, with the modified DS evidence theory, the IR/MMW system achieves more satisfactory target recognition performance through multisensor information fusion than any single-mode system.
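
For context, the classical building blocks that the abstract above modifies can be sketched as follows. This shows the standard Dempster combination rule and standard Shafer discounting only; it is not the paper's modified rule, and the paper's sensor-priority and credibility weights are not reproduced. Mass functions are represented as dictionaries from frozensets of hypotheses to masses, which is a representation chosen here for illustration.

```python
from itertools import product


def discount(m, alpha, frame):
    """Classical Shafer discounting with reliability alpha in [0, 1].

    Every mass is scaled by alpha and the remainder is moved to the
    full frame of discernment (total ignorance).
    """
    out = {A: alpha * v for A, v in m.items() if A != frame}
    out[frame] = 1.0 - alpha + alpha * m.get(frame, 0.0)
    return out


def dempster_combine(m1, m2):
    """Classical Dempster rule: multiply masses, intersect focal sets,
    discard the conflicting mass K and renormalize by 1 - K."""
    combined = {}
    conflict = 0.0
    for (B, v1), (C, v2) in product(m1.items(), m2.items()):
        inter = B & C
        if inter:
            combined[inter] = combined.get(inter, 0.0) + v1 * v2
        else:
            conflict += v1 * v2
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}
```

With two highly conflicting sources, for example one assigning 0.99 to hypothesis a and 0.01 to b and the other assigning 0.99 to c and 0.01 to b, the classical rule gives b a combined mass of about 1. This is the kind of paradox that motivates discounting unreliable evidence before combining it.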


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Su Yang ◽  
Jose Miguel Sanchez Bornot ◽  
Ricardo Bruña Fernandez ◽  
Farzin Deravi ◽  
KongFatt Wong-Lin ◽  
...  

Magnetoencephalography (MEG) has been combined with machine learning techniques to recognize Alzheimer’s disease (AD), one of the most common forms of dementia. However, most previous studies are limited to binary classification and do not fully utilize the two available MEG modalities (extracted using magnetometer and gradiometer sensors). AD consists of several stages of progression. This study addresses these limitations by using both magnetometer and gradiometer data to discriminate between participants with AD, AD-related mild cognitive impairment (MCI), and healthy control (HC) participants in the form of a three-class classification problem. A series of wavelet-based biomarkers are developed and evaluated, which concurrently leverage the spatial, frequency and time domain characteristics of the signal. A bimodal recognition system based on an improved score-level fusion approach is proposed to reinforce interpretation of the brain activity captured by magnetometers and gradiometers. In this preliminary study, it was found that the markers derived from gradiometers tend to outperform the magnetometer-based markers. Interestingly, out of the total 10 regions of interest, the left frontal lobe demonstrates about 8% higher mean recognition rate than the second-best performing region (left temporal lobe) for AD/MCI/HC classification. Among the four types of markers proposed in this work, the spatial marker developed using wavelet coefficients provided the best recognition performance for the three-way classification. Overall, the proposed approach provides promising results for the potential of AD/MCI/HC three-way classification utilizing the bimodal MEG data.


Author(s):  
Soo‐Young Suk ◽  
Hyun‐Yeol Chung

Purpose: The purpose of this paper is to describe a speech and character combined recognition engine (SCCRE) developed for personal digital assistants (PDAs) and mobile devices. The architecture of a distributed recognition system for providing a more convenient user interface is also discussed. Design/methodology/approach: In SCCRE, feature extraction for speech and for characters is carried out separately, but recognition is performed in a single engine. The client recognition engine employs a continuous hidden Markov model (CHMM) structure with a variable parameter topology in order to minimize the number of model parameters and to reduce recognition time. This model also adopts the proposed successive state and mixture splitting (SSMS) method for generating context-independent models. SSMS optimizes the number of mixtures through splitting in the mixture domain and the number of states through splitting in the time domain. Findings: The recognition results show that the developed engine can reduce the total number of Gaussians by up to 40 per cent compared with fixed parameter models at the same recognition performance when applied to speech recognition for mobile devices. SSMS can reduce the memory size for models to 65 per cent and that for processing to 82 per cent. Moreover, the recognition time decreases by 17 per cent with the SSMS model while maintaining the recognition rate. Originality/value: The proposed system will be very useful for many on‐line multimodal interfaces such as PDAs and mobile applications.


Recent research in texture-based ear and palm print recognition indicates that ear identification and palm print identification are robust against signal degradation and encoding artifacts. Based on these findings, we further compare texture descriptors for ear and palm print recognition and investigate whether texture descriptors can be supplemented with depth data. The proposed multimodal ear and palm print biometric recognition work is based on feature-level fusion. From ear and palm print images captured in visible light as well as depth data, we extract texture and surface descriptors from complete profile images. In this paper, we compare the recognition performance of selected methods for describing the texture structure, namely Local Binary Pattern (LBP), Weber Local Descriptor (WLD), Histogram of Oriented Gradients (HOG), and Binarised Statistical Image Features (BSIF). Extensive experiments on the IIT Delhi-2 ear and IIT Delhi palm print databases confirm that the proposed multimodal biometric system can increase recognition rates compared with those produced by single-modal (unimodal) biometrics. The proposed method, Histogram of Oriented Gradients (HOG), achieves a recognition rate of 124%.
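
As an illustration of one descriptor named in the abstract above, the following is a minimal sketch of the basic 3x3 Local Binary Pattern (LBP) operator and its normalized histogram. It is the textbook operator, not necessarily the variant or parameters used in the paper. The image is assumed to be a grayscale image given as a list of equal-length rows, and the function names and the neighbor ordering are arbitrary choices made here.

```python
def lbp_codes(img):
    """Basic 3x3 LBP: compare the 8 neighbors of each interior pixel
    with the center and pack the comparison results into an 8-bit code."""
    h, w = len(img), len(img[0])
    # Neighbor offsets in clockwise order starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            center = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if img[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(codes):
    """Normalized 256-bin histogram of LBP codes, usable as a feature vector."""
    hist = [0] * 256
    for row in codes:
        for code in row:
            hist[code] += 1
    total = sum(hist)
    return [v / total for v in hist] if total else hist
```

In a feature-level fusion setup such as the one the abstract describes, histograms like this from each modality would typically be concatenated into a single feature vector before classification. That step is an assumption about the pipeline and is not shown here.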


2019 ◽  
Vol 19 (2) ◽  
pp. 28-37
Author(s):  
Hawraa H. Abbas ◽  
Bilal Z. Ahmed ◽  
Ahmed Kamil Abbas

The face is the preferred biometric for person recognition or identification applications because identifying a person by face is an innate human habit. In contrast to 2D face recognition, 3D face recognition is practically robust to illumination variance, facial cosmetics, and face pose changes. Traditional 3D face recognition methods describe shape variation across the whole face using holistic features. However, taking into account facial regions that remain unchanged across expressions can yield a high-performance 3D face recognition system. In this research, the recognition analysis is based on defining a set of coherent parts. Those parts can be considered as latent factors in the face shape space. The Non-negative Matrix Factorisation technique is used to segment the 3D faces into coherent regions. The best recognition performance is achieved when the vertices of 20 face regions are utilised as a feature vector for the recognition task. The region-based 3D face recognition approach provides a 96.4% recognition rate on the FRGCv2 dataset.

