recognition performance
Recently Published Documents

Karanrat Thammarak ◽  
Prateep Kongkla ◽  
Yaowarat Sirisathitkul ◽  
Sarun Intakosum

Optical character recognition (OCR) is a technology for converting paper-based documents into digital form. This research studies the extraction of characters from a Thai vehicle registration certificate using the Google Cloud Vision API and Tesseract OCR, and examines the recognition performance of both OCR APIs. The 84 color image files comprised three image sizes/resolutions and five image characteristics. To compare image types, greyscale and binary images were derived from the color images. In addition, three pre-processing techniques (sharpening, contrast adjustment, and brightness adjustment) were applied to enhance image quality before running the two OCR APIs. Recognition performance was evaluated in terms of accuracy and readability. The results showed that the Google Cloud Vision API works well for the Thai vehicle registration certificate, with an accuracy of 84.43%, whereas Tesseract OCR showed an accuracy of 47.02%. The highest accuracy came from color images at 1024×768 px and 300 dpi, with sharpening and brightness adjustment as pre-processing techniques. In terms of readability, the Google Cloud Vision API produced more readable output than Tesseract. The proposed conditions support the implementation of a recognition system for Thai vehicle registration certificates.
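The image-type conversions and one of the pre-processing steps mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how the authors implemented them, so everything here is an assumption: the luminance weights (the common ITU-R BT.601 values), the fixed threshold of 128, and the function names are chosen only for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def to_greyscale(image: List[List[Pixel]]) -> List[List[int]]:
    """Convert an RGB image to greyscale using BT.601 luminance weights (assumed)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in image
    ]


def to_binary(grey: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Binarize a greyscale image with a fixed global threshold (assumed value)."""
    return [[255 if value >= threshold else 0 for value in row] for row in grey]


def adjust_brightness(grey: List[List[int]], offset: int) -> List[List[int]]:
    """Add a constant offset to every pixel and clamp the result to 0..255."""
    return [[min(255, max(0, value + offset)) for value in row] for row in grey]


if __name__ == "__main__":
    # A tiny 2x2 test image: white, black, pure red, pure blue.
    image = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 0, 255)]]
    grey = to_greyscale(image)
    print(grey)
    print(to_binary(grey))
    print(adjust_brightness(grey, 200))
```

In practice an imaging library would handle these steps, and a study like this one would more likely use an adaptive threshold than a fixed one; the sketch only shows what each conversion does to pixel values.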

2022 ◽  
Vol 11 (1) ◽  
pp. 1-50
Bahar Irfan ◽  
Michael Garcia Ortiz ◽  
Natalia Lyubova ◽  
Tony Belpaeme

User identification is an essential step in creating a personalised long-term interaction with robots. This requires learning the users continuously and incrementally, possibly starting from a state without any known user. In this article, we describe a multi-modal incremental Bayesian network with online learning, which is the first method that can be applied in such scenarios. Face recognition is used as the primary biometric, and it is combined with ancillary information, such as gender, age, height, and time of interaction to improve the recognition. The Multi-modal Long-term User Recognition Dataset is generated to simulate various human-robot interaction (HRI) scenarios and evaluate our approach in comparison to face recognition, soft biometrics, and a state-of-the-art open world recognition method (Extreme Value Machine). The results show that the proposed methods significantly outperform the baselines, with an increase in the identification rate up to 47.9% in open-set and closed-set scenarios, and a significant decrease in long-term recognition performance loss. The proposed models generalise well to new users, provide stability, improve over time, and decrease the bias of face recognition. The models were applied in HRI studies for user recognition, personalised rehabilitation, and customer-oriented service, which showed that they are suitable for long-term HRI in the real world.

2022 ◽  
Vol 2022 ◽  
pp. 1-8
Xin Liu ◽  
Hua Pan

This work aims to provide a more reliable human-computer interaction (HCI) guarantee for animation works under virtual reality (VR) technology. Inspired by artificial intelligence (AI) technology and based on a convolutional neural network combined with a support vector machine (CNN-SVM), the differences between animation works under VR technology and traditional animation works are analyzed through a comprehensive analysis of VR technology. A CNN-SVM gesture recognition algorithm using an error correction strategy is designed based on HCI recognition. To achieve better recognition performance, the advantages of depth images and color images are combined, the collected information is preprocessed, and the relation between the number of training iterations and the accuracy of different methods on the test set is examined. In the experiments, the maximum accuracy on the preprocessed images reaches 0.86, showing the necessity of image preprocessing. The recognition accuracy of the optimized CNN-SVM is compared with other algorithm models. Experiments show that the accuracy of the optimized CNN-SVM improves on the previous CNN-SVM and reaches 0.97. This indicates that the designed algorithm can provide good technical support for VR animation, so that VR animation works can interact well with the audience. It is of great significance for the development of VR animation and the improvement of people’s artistic life quality.

2022 ◽  
pp. 1-18
Binghua Shi ◽  
Yixin Su ◽  
Cheng Lian ◽  
Chang Xiong ◽  
Yang Long ◽  

Recognition of obstacle type based on visual sensors is important for navigation by unmanned surface vehicles (USV), including path planning, obstacle avoidance, and reactive control. Conventional detection techniques may fail to distinguish obstacles that are similar in visual appearance in a cluttered environment. This work proposes a novel obstacle type recognition approach that combines a dilated operator with the deep-level feature map of ResNet50 for autonomous navigation. First, visual images are collected and annotated from a variety of scenarios for USV test navigation. Second, the deep learning model, based on a dilated convolutional neural network, is set up and trained. Dilated convolution allows the whole network to learn deep features with an increased receptive field and further improves the performance of obstacle type recognition. Third, a series of evaluation parameters, such as the mean average precision (mAP), missing rate, and detection speed, are used to evaluate the obtained model. Finally, experiments are designed to verify the accuracy of the proposed approach using visual images in a cluttered environment. Experimental results demonstrate that the dilated convolutional neural network obtains better recognition performance than the other methods, with an mAP of 88%.
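The claim that dilation increases the receptive field can be checked with a short calculation. The sketch below uses the standard formula for stride-1 convolution stacks. The layer configuration is hypothetical: it is not the architecture from the paper, and ResNet50's strides and pooling are deliberately left out.

```python
from typing import Sequence, Tuple


def effective_kernel_size(kernel: int, dilation: int) -> int:
    """Span covered by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


def receptive_field(layers: Sequence[Tuple[int, int]]) -> int:
    """Receptive field of a stack of stride-1 convolutions.

    Each layer is a (kernel_size, dilation) pair. With stride 1 everywhere,
    every layer widens the field by (kernel_size - 1) * dilation.
    """
    field = 1
    for kernel, dilation in layers:
        field += (kernel - 1) * dilation
    return field


if __name__ == "__main__":
    # Hypothetical stacks of three 3x3 layers, for illustration only.
    plain = [(3, 1), (3, 1), (3, 1)]
    dilated = [(3, 1), (3, 2), (3, 4)]
    print(receptive_field(plain))    # 7
    print(receptive_field(dilated))  # 15
```

With the same number of weights per layer, the dilated stack covers a 15-pixel span instead of 7, which is the effect the abstract refers to.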

Siqing Qin ◽  
Longbiao Wang ◽  
Sheng Li ◽  
Jianwu Dang ◽  
Lixin Pan

Conventional automatic speech recognition (ASR) and emerging end-to-end (E2E) speech recognition have achieved promising results when provided with sufficient resources. However, for low-resource languages, ASR remains challenging. The Lhasa dialect is the most widespread Tibetan dialect and has a wealth of speakers and transcriptions. Hence, it is meaningful to apply ASR techniques to the Lhasa dialect for historical heritage protection and cultural exchange. Previous work on Tibetan speech recognition focused on selecting phone-level acoustic modeling units and incorporating tonal information but underestimated the influence of limited data. The purpose of this paper is to improve the speech recognition performance of the low-resource Lhasa dialect by adopting multilingual speech recognition technology on the E2E structure based on the transfer learning framework. Using transfer learning, we first establish a monolingual E2E ASR system for the Lhasa dialect, initializing the ASR model from different source languages to compare their positive effects on the Tibetan ASR model. We further propose a multilingual E2E ASR system that uses initialization strategies with different source languages and multilevel units, which is proposed for the first time. Our experiments show that the performance of the ASR system based on the proposed method exceeds that of the E2E baseline ASR system. Our proposed method effectively models the low-resource Lhasa dialect and achieves a relative 14.2% performance improvement in character error rate (CER) compared to DNN-HMM systems. Moreover, from the best monolingual E2E model to the best multilingual E2E model of the Lhasa dialect, the system’s performance increased by 8.4% in CER.
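For readers unfamiliar with the metric, character error rate is conventionally the character-level edit distance between hypothesis and reference, divided by the reference length. The sketch below assumes that definition, since the abstract does not spell out the scoring script. The `relative_reduction` helper and the numbers in the demo are illustrative only and are not results from the paper.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) over characters."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def relative_reduction(baseline_cer: float, new_cer: float) -> float:
    """Relative CER reduction of a new system over a baseline."""
    return (baseline_cer - new_cer) / baseline_cer


if __name__ == "__main__":
    print(cer("kitten", "sitting"))        # 3 edits over 6 reference characters
    print(relative_reduction(0.30, 0.27))  # made-up CER values, not from the paper
```

A "relative" improvement such as the ones reported above is a ratio of this kind, so it should not be read as an absolute drop in CER points.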

Sensors ◽  
2022 ◽  
Vol 22 (1) ◽  
pp. 402
Zhanjun Hao ◽  
Juan Niu ◽  
Xiaochao Dang ◽  
Zhiqiang Qiao

Motion recognition has a wide range of applications at present. Recently, motion recognition by analyzing the channel state information (CSI) in Wi-Fi packets has been favored by more and more scholars. Because CSI collected in the wireless signal environment of human activity usually carries a large amount of human-related information, a motion-recognition model trained for a specific person usually does not work well in predicting another person’s motion. To deal with this difference, we propose a personnel-independent action-recognition model called WiPg, which is built from a convolutional neural network (CNN) and a generative adversarial network (GAN). Model training and testing were carried out on CSI data of 14 yoga movements performed by 10 experimenters with different body types, and recognition results independent of body type were obtained. The experimental results show that the average correct rate of WiPg can reach 92.7% for recognition of the 14 yoga poses, and WiPg realizes “cross-personnel” movement recognition with excellent recognition performance.

2022 ◽  
Vol 19 (3) ◽  
pp. 2206-2218
Chaofan Li ◽  
Kai Ma

Named entities are the main carriers of relevant medical knowledge in Electronic Medical Records (EMR). Clinical electronic medical records lead to problems such as word segmentation ambiguity and polysemy due to the specificity of Chinese language structure, so a Clinical Named Entity Recognition (CNER) model based on multi-head self-attention combined with a BiLSTM neural network and Conditional Random Fields (CRF) is proposed. First, the pre-trained language model combines character vectors and word vectors for the text sequences of the original dataset. The sequences are then fed in parallel into the multi-head self-attention module and the BiLSTM neural network module. The outputs of the two modules are spliced to obtain multi-level information such as contextual information and feature association weights. Finally, entity annotation is performed by the CRF. The results of multiple comparison experiments show that the structure of the proposed model is reasonable and robust, and that it can effectively improve Chinese CNER. The model can extract multi-level and more comprehensive text features and compensate for the loss of long-distance dependencies, with better applicability and recognition performance.

2021 ◽  
Julia Bahnmueller ◽  
Roberta Barrocas ◽  
Korbinian Moeller ◽  
Stephanie Roesch

Through repeated use of fingers for counting and representing numerical magnitudes in early childhood, specific finger patterns become associated with mental representations of specific quantities. Although children as young as three years of age already use their fingers for representing numerical quantities, evidence on advantageous recognition of such canonical compared to non-canonical finger patterns, as well as its association with numerical skills in young children, is scarce. In this study, we investigated the performance of N=101 children aged around four years in canonical vs. non-canonical finger pattern recognition and its concurrent association with skills tapping into children’s knowledge about quantity-number linkage. Extending previous findings observed for older children, the present results indicated that, despite considerable variability on the individual level, performance in canonical finger pattern recognition was better than in non-canonical finger pattern recognition on the group level. Moreover, both canonical and non-canonical finger pattern recognition were positively correlated with tasks tapping into quantity-number linkage. However, when controlling for verbal counting skills, correlations remained significant only for canonical, but not non-canonical, finger pattern recognition performance. Overall, these results provide insights into the early onset and significance of the effect of canonicity in finger pattern recognition during early numerical development.

2021 ◽  
Vol 14 (4) ◽  
pp. 1-33
Saad Hassan ◽  
Oliver Alonzo ◽  
Abraham Glasser ◽  
Matt Huenerfauth

Advances in sign-language recognition technology have enabled researchers to investigate various methods that can assist users in searching for an unfamiliar sign in ASL using sign-recognition technology. Users can generate a query by submitting a video of themselves performing the sign they believe they encountered somewhere and obtain a list of possible matches. However, there is disagreement among developers of such technology on how to report the performance of their systems, and prior research has not examined the relationship between the performance of search technology and users’ subjective judgements for this task. We conducted three studies using a Wizard-of-Oz prototype of a webcam-based ASL dictionary search system to investigate the relationship between the performance of such a system and user judgements. We found that, in addition to the position of the desired word in a list of results, the placement of the desired word above or below the fold and the similarity of the other words in the results list affected users’ judgements of the system. We also found that metrics that incorporate the precision of the overall list correlated better with users’ judgements than did metrics currently reported in prior ASL dictionary research.

2021 ◽  
Vol 2021 ◽  
pp. 1-8
Khalid Twarish Alhamazani ◽  
Jalawi Alshudukhi ◽  
Saud Aljaloud ◽  
Solomon Abebaw

Chronic kidney disease (CKD) is a global health issue with a high rate of morbidity and mortality and a high rate of disease progression. Because there are no visible symptoms in the early stages of CKD, the disease frequently goes unnoticed. Early detection of CKD allows patients to receive timely treatment, slowing the disease’s progression. Owing to their rapid recognition performance and accuracy, machine learning models can effectively assist physicians in achieving this goal. We propose a machine learning methodology for CKD diagnosis in this paper. The data were completely anonymized. The CRISP-DM® model (Cross-Industry Standard Process for Data Mining) was used as a reference. The data were processed entirely in the cloud on the Azure platform, where the sample data were found to be unbalanced. Exploration and analysis were then carried out, and on that basis the data were balanced using the SMOTE technique. After the data were balanced, four algorithms were applied: logistic regression, decision forest, neural network, and decision jungle. The decision forest outperformed the other machine learning models with a score of 92%, indicating that the approach used in this study provides a good baseline for production solutions.
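The core idea of SMOTE is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified illustration of that idea only, not the Azure component the authors used. The function name, the default `k`, and the toy data are assumptions.

```python
import math
import random
from typing import List, Sequence


def smote_sketch(
    minority: Sequence[Sequence[float]],
    n_new: int,
    k: int = 2,
    seed: int = 0,
) -> List[List[float]]:
    """Generate n_new synthetic points from minority-class samples.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_index = rng.randrange(len(minority))
        base = minority[base_index]
        # Sort the other minority samples by Euclidean distance to the base.
        others = [p for i, p in enumerate(minority) if i != base_index]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy minority class with three 2-D samples (illustrative values only).
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    for point in smote_sketch(minority, n_new=3):
        print(point)
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.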
