recognition system
Recently Published Documents





Deepang Raval, Vyom Pathak, Muktan Patel, Brijesh Bhatt

We present a novel approach for improving the performance of an end-to-end speech recognition system for the Gujarati language. We follow a deep learning-based approach that uses Convolutional Neural Network layers, Bidirectional Long Short-Term Memory layers, Dense layers, and Connectionist Temporal Classification (CTC) as the loss function. To improve the performance of the system with a dataset of limited size, we present a prefix decoding technique based on a combined language model (a word-level language model and a character-level language model) and a post-processing technique based on Bidirectional Encoder Representations from Transformers (BERT). To gain key insights from our Automatic Speech Recognition (ASR) system, we used its inferences and proposed different analysis methods. These insights help us understand and improve the ASR system and also provide intuition about the language used for the ASR system. We trained the model on the Microsoft Speech Corpus and observed a 5.87% decrease in Word Error Rate (WER) with respect to the base-model WER.
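The abstract names CTC as the loss, an LM-based prefix decoder for inference, and WER as the metric. A minimal sketch of the two simplest pieces follows, assuming the acoustic model already outputs per-frame class probabilities: greedy (best-path) CTC decoding, which is a simpler stand-in for the paper's combined-LM prefix decoding rather than a reproduction of it, and the standard WER computation. The blank index 0 and the alphabet layout are assumptions.

```python
from typing import List, Sequence

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding: take the argmax class per frame,
    collapse consecutive repeats, then drop blanks.
    Class i (i >= 1) maps to alphabet[i - 1]."""
    best = [max(range(len(p)), key=lambda k: p[k]) for p in frame_probs]
    out: List[str] = []
    prev = None
    for k in best:
        if k != prev and k != BLANK:
            out.append(alphabet[k - 1])
        prev = k
    return "".join(out)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance (substitutions + deletions + insertions)
    divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    row = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i in range(1, len(ref) + 1):
        diag, row[0] = row[0], i
        for j in range(1, len(hyp) + 1):
            above = row[j]
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            row[j] = min(above + 1,        # deletion of ref[i - 1]
                         row[j - 1] + 1,   # insertion of hyp[j - 1]
                         diag + cost)      # substitution or match
            diag = above
    return row[-1] / max(len(ref), 1)
```

Greedy decoding ignores the language models; a prefix beam search would instead keep several candidate prefixes per frame and rescore them with word- and character-level LM scores.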

Karanrat Thammarak, Prateep Kongkla, Yaowarat Sirisathitkul, Sarun Intakosum

Optical character recognition (OCR) is a technology for converting paper-based documents into digital form. This research studies the extraction of characters from a Thai vehicle registration certificate using the Google Cloud Vision API and Tesseract OCR, and examines the recognition performance of both OCR APIs. The 84 color image files covered three image sizes/resolutions and five image characteristics. To compare image types, greyscale and binary images were derived from the color images. In addition, three pre-processing techniques (sharpening, contrast adjustment, and brightness adjustment) were applied to enhance image quality before applying the two OCR APIs. Recognition performance was evaluated in terms of accuracy and readability. The results showed that the Google Cloud Vision API works well for the Thai vehicle registration certificate, with an accuracy of 84.43%, whereas Tesseract OCR achieved an accuracy of 47.02%. The highest accuracy came from the color image at 1024×768 px and 300 dpi, using sharpening and brightness adjustment as pre-processing techniques. In terms of readability, the Google Cloud Vision API outperformed Tesseract. These conditions make it feasible to implement a Thai vehicle registration certificate recognition system.
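The pre-processing steps named in the abstract (greyscale and binary conversion, sharpening, contrast and brightness adjustment) can be sketched in dependency-free Python on a nested-list image. The luma weights, the binarization threshold of 128, the 3x3 sharpening kernel, and the mid-grey contrast pivot are illustrative assumptions, not the parameters used in the study; in practice a library such as Pillow or OpenCV would perform these steps before the image is sent to an OCR engine.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Grey = List[List[int]]


def clamp(v: float) -> int:
    """Round and clip a value to the 8-bit range 0..255."""
    return max(0, min(255, int(round(v))))


def to_greyscale(img: List[List[RGB]]) -> Grey:
    # ITU-R BT.601 luma weights (assumed; other weightings exist)
    return [[clamp(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in img]


def binarize(grey: Grey, threshold: int = 128) -> Grey:
    # Global fixed threshold (assumed value)
    return [[255 if v >= threshold else 0 for v in row] for row in grey]


def adjust_brightness(grey: Grey, offset: int) -> Grey:
    return [[clamp(v + offset) for v in row] for row in grey]


def adjust_contrast(grey: Grey, factor: float) -> Grey:
    # Push values away from (factor > 1) or toward (factor < 1) mid-grey 128
    return [[clamp(128 + factor * (v - 128)) for v in row] for row in grey]


def sharpen(grey: Grey) -> Grey:
    """Apply the kernel [[0,-1,0],[-1,5,-1],[0,-1,0]]; border pixels are copied unchanged."""
    h, w = len(grey), len(grey[0])
    out = [row[:] for row in grey]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = (5 * grey[y][x] - grey[y - 1][x] - grey[y + 1][x]
                 - grey[y][x - 1] - grey[y][x + 1])
            out[y][x] = clamp(v)
    return out
```

The steps compose, for example `adjust_brightness(sharpen(to_greyscale(img)), 20)` mirrors the sharpening-plus-brightness combination the abstract found most accurate, with an assumed offset of 20.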

Atallah Mahmoud Al-Shatnawi, Faisal Al-Saqqar, Alireza Souri

This paper aims to improve the performance of the word recognition system (WRS) for handwritten Arabic text by applying machine learning to features extracted in the frequency domain with the Stationary Wavelet Transform (SWT), a wavelet transform approach designed to compensate for the lack of translation invariance in the Discrete Wavelet Transform (DWT). The proposed SWT-WRS for Arabic handwritten text consists of three main processes: word normalization, SWT-based feature extraction, and recognition. The proposed SWT-WRS is evaluated on the IFN/ENIT database using Gaussian, linear, and polynomial support vector machines, k-nearest neighbors, and ANN classifiers. ANN performance was assessed with the Bayesian Regularization (BR) and Levenberg-Marquardt (LM) training methods. Numerous wavelet transform (WT) families are applied, and the results show that level 19 of the Daubechies family is the best WT family for the proposed SWT-WRS. The results also confirm the effectiveness of the proposed SWT-WRS in improving handwritten Arabic word recognition with machine learning. The suggested SWT-WRS thus overcomes the lack of translation invariance in the DWT by eliminating the up- and down-samplers in the proposed method.
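To illustrate the translation-invariance point, here is a one-level sketch using the Haar wavelet (Daubechies order 1) with periodic boundaries, chosen for brevity instead of the higher-order Daubechies wavelet the paper reports. The DWT filters and then keeps every second sample; the SWT applies the same filters without decimation, so circularly shifting the input simply shifts the coefficients. The function names are hypothetical and the coefficient alignment is not claimed to match any particular library.

```python
import math
from typing import List, Tuple

S = 1 / math.sqrt(2)  # Haar filter normalization


def haar_dwt(x: List[float]) -> Tuple[List[float], List[float]]:
    """One-level Haar DWT: filter, then keep every second sample (decimation)."""
    approx = [S * (x[2 * k] + x[2 * k + 1]) for k in range(len(x) // 2)]
    detail = [S * (x[2 * k] - x[2 * k + 1]) for k in range(len(x) // 2)]
    return approx, detail


def haar_swt(x: List[float]) -> Tuple[List[float], List[float]]:
    """One-level Haar SWT: same filters, no decimation, periodic boundary."""
    n = len(x)
    approx = [S * (x[i] + x[(i + 1) % n]) for i in range(n)]
    detail = [S * (x[i] - x[(i + 1) % n]) for i in range(n)]
    return approx, detail


def shift(x: List[float], s: int = 1) -> List[float]:
    """Circularly shift a sequence right by s samples."""
    return x[-s:] + x[:-s]
```

With `x = [4.0, 2.0, 5.0, 1.0, 3.0, 0.0, 6.0, 2.0]`, `haar_swt(shift(x))` equals the shifted `haar_swt(x)`, while the DWT detail coefficients of the shifted signal are not a shift of the original ones. The price is redundancy: the SWT keeps `len(x)` coefficients per band instead of `len(x) // 2`.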

Sangamesh Hosgurmath, Viswanatha Vanjre Mallappa, Nagaraj B. Patil, Vishwanath Petli

Face recognition is an important area of biometric authentication research, used for security purposes in many fields such as pattern recognition and image processing. However, face recognition remains a major challenge for machine learning and deep learning techniques, because input images vary with pose, lighting and illumination conditions, expression, and age, which lowers recognition accuracy. In the present research, the max pooling layer of a convolutional neural network (CNN) reduces the resolution of the image patches and makes the model more robust than the traditional feature extraction technique called local multiple pattern (LMP). The extracted features are fed into linear collaborative discriminant regression classification (LCDRC) for final face recognition. Optimizing LCDRC with CNN features maximizes the between-class distance ratio and reduces the within-class feature distance. The results show that CNN-LCDRC achieved mean recognition accuracies of 93.10% and 87.60% on the ORL and YALE databases, respectively, whereas traditional LCDRC achieved 83.35% and 77.70%, for a training number of 8 (i.e., 80% training and 20% testing data).
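Max pooling, which the abstract credits with reducing patch resolution and improving robustness, can be sketched as follows on a plain 2D list. The 2x2 window and stride of 2 are common defaults assumed here; the LCDRC classifier that consumes the pooled features is not shown.

```python
from typing import List


def max_pool2d(fmap: List[List[float]], size: int = 2, stride: int = 2) -> List[List[float]]:
    """Max pooling over size x size windows.
    Trailing rows/columns that do not fill a complete window are dropped."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[y + dy][x + dx] for dy in range(size) for dx in range(size))
         for x in range(0, w - size + 1, stride)]
        for y in range(0, h - size + 1, stride)
    ]
```

Because only the window maximum survives, a strong response that moves by one pixel inside a window produces the same output: `max_pool2d([[9, 0], [0, 0]])` and `max_pool2d([[0, 0], [0, 9]])` both give `[[9]]`. This is the small-shift robustness the abstract relies on.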

Music is a widely used data format amid the explosion of information on the Internet. Automatically identifying the style of online music is an important and active topic in music information retrieval and music production, and automatic music style recognition has recently been used in many real-life scenarios. The emergence of machine learning provides a good foundation for automatic music style recognition. This paper adopts machine learning technology to build an automatic music style recognition system. First, the online music is processed by waveform analysis to remove noise. Second, the denoised music signals are represented as sample entropy features using empirical mode decomposition. Finally, the extracted features are used to train a relative margin support vector machine model that predicts the style of new music. The experimental results demonstrate the effectiveness of the proposed framework.
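Sample entropy, the feature the abstract extracts, is SampEn = -ln(A/B), where B counts pairs of length-m templates that match within a tolerance r (Chebyshev distance) and A counts pairs that still match when extended to length m + 1. A minimal sketch follows, assuming the input is one already-denoised component (for example an intrinsic mode function from empirical mode decomposition, which is not shown). Here r is an absolute tolerance; it is often set to about 0.2 times the signal's standard deviation.

```python
import math
from typing import Sequence


def sample_entropy(x: Sequence[float], m: int = 2, r: float = 0.2) -> float:
    """SampEn(m, r) = -ln(A / B); returns inf when no length-(m + 1) matches exist."""
    n = len(x)

    def count_matches(length: int) -> int:
        # Use the same n - m starting points for both lengths so A and B are comparable;
        # only pairs i < j are counted, so self-matches are excluded.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")
    return -math.log(a / b)
```

A perfectly regular signal gives 0 (every pair that matches at length m also matches at m + 1), and less predictable signals give larger values; per-component values like this would form the feature vector passed to the classifier.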

Marwa Fadhel Jassim, Wafaa mohammed Saeed Hamzah, Abeer Fadhil Shimal

Biometric techniques uniquely identify a person based on physical or behavioural characteristics, and they are mainly used for authentication. Storing the template directly in a database is not safe, because it can be stolen or tampered with, so the template needs to be protected. To address this issue, the proposed system stores the iris template securely by combining secret image sharing with image hiding: the template is decomposed into two independent host (public) iris images, which enhances security and protects privacy. The original template can be reconstructed only when both host images are available, and neither host image alone exposes the identity of the original biometric image. Storing the data as shadows in separate places, instead of storing the whole template in one place, increases the security and privacy of a biometrics-based authentication system. The proposed biometric recognition system includes iris segmentation algorithms, feature extraction algorithms, and (2, 2) secret sharing and hiding. Experiments were conducted on the standard colour UBIRIS v1 dataset. The results indicate that biometric template protection methods can address the vulnerabilities that threaten biometric templates.
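The (2, 2) property described above (both shares needed, either share alone revealing nothing) can be illustrated with XOR-based secret sharing. This is a standard textbook scheme used here as a stand-in; the paper's construction additionally hides the shares inside host iris images, which this sketch does not attempt, and the template bytes are hypothetical.

```python
import secrets
from typing import Tuple


def make_shares(template: bytes) -> Tuple[bytes, bytes]:
    """(2, 2) XOR secret sharing: share1 is uniformly random, share2 = template XOR share1.
    Each share on its own is statistically independent of the template."""
    share1 = secrets.token_bytes(len(template))
    share2 = bytes(t ^ s for t, s in zip(template, share1))
    return share1, share2


def reconstruct(share1: bytes, share2: bytes) -> bytes:
    """Both shares are required: XOR-ing them recovers the original template."""
    return bytes(a ^ b for a, b in zip(share1, share2))
```

Storing `share1` and `share2` in separate places mirrors the "shadows at separated places" idea: an attacker who steals one store obtains only random-looking bytes.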

Fadwa Abakarim, Abdenbi Abenaou

In this research, we present an automatic speaker recognition system based on adaptive orthogonal transformations. To obtain informative features of minimum dimension from the input signals, we created an adaptive operator that helps identify the speaker’s voice quickly and efficiently. We test the efficiency and performance of our method by comparing it with another approach, mel-frequency cepstral coefficients (MFCCs), which many researchers use as their feature extraction method. The experimental results show the importance of the adaptive operator, which adds value to the proposed approach. The system achieved 96.8% accuracy using the Fourier transform as the compression method and 98.1% using correlation as the compression method.
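The abstract does not describe how the adaptive operator is constructed, so the sketch below only illustrates the general idea with a fixed orthogonal transform: compute the DFT magnitudes of a frame, keep the first `keep` coefficients as a compressed feature vector, and identify the speaker as the nearest enrolled template. The naive DFT, the truncation rule, the Euclidean nearest-template matching, and the names `compressed_features` and `identify` are all assumptions for illustration.

```python
import cmath
import math
from typing import Dict, List


def dft_magnitudes(x: List[float]) -> List[float]:
    """Naive O(N^2) discrete Fourier transform; returns |X[k]| for k = 0..N-1."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def compressed_features(frame: List[float], keep: int) -> List[float]:
    """Compression by truncation: keep only the first `keep` DFT magnitudes (hypothetical rule)."""
    return dft_magnitudes(frame)[:keep]


def identify(features: List[float], enrolled: Dict[str, List[float]]) -> str:
    """Return the enrolled speaker whose template is closest in Euclidean distance."""
    return min(enrolled, key=lambda name: math.dist(features, enrolled[name]))
```

For a 16-sample cosine at frequency bin 3, the first 8 magnitudes peak at index 3 with value 8 (N/2), so most of the frame's energy survives the truncation. An adaptive operator, as the abstract describes, would instead tailor the basis to the data rather than use a fixed DFT.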

Ida Syafiza Binti Md Isa, Choy Ja Yeong, Nur Latif Azyze bin Mohd Shaari Azyze

Nowadays, the number of road accidents in Malaysia is increasing rapidly. One way to reduce road accidents is for engineers to develop advanced driver assistance systems (ADAS). Several ADAS have been proposed that take into account the delay tolerance and the accuracy of the system. In this work, a traffic sign recognition system is developed and installed inside the car to raise driver awareness and increase the safety of road users. TensorFlow is used for machine-learning-based object recognition because of its high accuracy. The model is embedded in a Raspberry Pi 3, which processes real-time video from a Raspberry Pi NoIR camera to detect traffic signs. This work studies the accuracy, delay, and reliability of the developed system on the Raspberry Pi 3 processor under several scenarios related to the state of the environment and the condition of the traffic signs. A real-time testbed implementation with twenty different traffic signs shows that the system achieves more than 90% accuracy and is reliable with an acceptable delay.
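Measuring the accuracy and per-frame delay studied in this work can be sketched with a small harness. `detect` is a hypothetical stand-in for the TensorFlow model running on the Raspberry Pi; the frames and labels are placeholders, and timing uses `time.perf_counter`.

```python
import time
from statistics import mean
from typing import Any, Callable, Dict, Iterable, Tuple


def evaluate(detect: Callable[[Any], str],
             samples: Iterable[Tuple[Any, str]]) -> Dict[str, float]:
    """Run `detect` (hypothetical model wrapper) on (frame, true_label) pairs;
    return accuracy plus mean and worst-case latency in milliseconds."""
    correct = 0
    latencies = []
    for frame, label in samples:
        start = time.perf_counter()
        predicted = detect(frame)
        latencies.append((time.perf_counter() - start) * 1000.0)
        correct += int(predicted == label)
    if not latencies:
        raise ValueError("no samples to evaluate")
    return {"accuracy": correct / len(latencies),
            "mean_latency_ms": mean(latencies),
            "max_latency_ms": max(latencies)}
```

Running the same harness separately for each environment or sign-condition scenario would give per-scenario accuracy and delay figures; reporting the worst-case latency alongside the mean matters for a driver-warning system, where an occasional slow frame is what the driver notices.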

Chunling Tu, Shengzhi Du

Vehicle and license plate detection have achieved remarkable results in recent years and are widely used in real traffic scenarios such as intelligent traffic monitoring systems, automatic parking systems, and vehicle services. Benefiting from image processing and machine learning technologies, computer vision has attracted much attention in vehicle and license plate detection. However, existing methods still have issues with vehicle and license plate recognition, especially in complex environments. In this paper, we propose a multi-vehicle detection and license plate recognition system based on a hierarchical region convolutional neural network (RCNN). First, a higher-level RCNN extracts vehicles from the original images or video frames. Second, the regions of the detected vehicles are input to a lower-level (smaller) RCNN to detect the license plate. Third, the detected license plate is split into single numbers. Finally, the individual numbers are recognized by an even smaller RCNN. Experiments on a real traffic database validated the proposed method. Compared with the commonly used all-in-one deep learning structure, the proposed hierarchical method handles the license plate recognition task as multiple levels of sub-tasks, which allows the network size and structure to be adapted to the complexity of each sub-task and therefore reduces the computation load.
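The hierarchical flow (vehicle detector on the full frame, plate detector on each vehicle crop, then splitting and per-symbol recognition) can be sketched as plain control flow. The three detector and recognizer callables are hypothetical placeholders for the RCNNs, and the vertical-projection split of a binarized plate at empty columns is an assumed technique, since the abstract does not say how the plate is split.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive
Image = List[List[int]]


def crop(img: Image, box: Box) -> Image:
    top, left, bottom, right = box
    return [row[left:right] for row in img[top:bottom]]


def split_columns(plate: Image, background: int = 0) -> List[Image]:
    """Assumed splitting rule: cut a binarized plate at all-background columns."""
    if not plate or not plate[0]:
        return []
    ink = [any(row[c] != background for row in plate) for c in range(len(plate[0]))]
    pieces, start = [], None
    for c, has_ink in enumerate(ink + [False]):  # sentinel closes a trailing symbol
        if has_ink and start is None:
            start = c
        elif not has_ink and start is not None:
            pieces.append([row[start:c] for row in plate])
            start = None
    return pieces


def recognize_plates(frame: Image,
                     detect_vehicles: Callable[[Image], List[Box]],
                     detect_plate: Callable[[Image], Optional[Box]],
                     recognize_symbol: Callable[[Image], str]) -> List[str]:
    results = []
    for vbox in detect_vehicles(frame):        # level 1: large model, full frame
        vehicle = crop(frame, vbox)
        pbox = detect_plate(vehicle)           # level 2: smaller model, vehicle crop only
        if pbox is None:
            continue
        plate = crop(vehicle, pbox)
        symbols = split_columns(plate)         # split plate into single symbols
        results.append("".join(recognize_symbol(s) for s in symbols))  # level 3
    return results
```

Because each stage is a separate callable that only sees the region produced by the previous stage, each model can be sized to its own sub-task, which is the design argument the abstract makes for reducing computation.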

Prof. Kalpana Malpe

Abstract: In recent years, safety has become one of the most important aspects of human life, and cost is the biggest issue. This technique is very helpful for reducing the cost of monitoring movement from outside. In this paper, a real-time recognition system is proposed that can process images very quickly. The main objective of this paper is to safeguard homes and workplaces by recognizing individuals. The face is the most distinctive part of the human body, and it can reflect many emotions through expression. A few years ago, people used non-living things such as smart cards, plastic cards, PINs, tokens, and keys for authentication and to gain access to restricted areas such as ISRO, the National Aeronautics and Space Administration, and DRDO. The most important features of a face image are the eyes, nose, and mouth. A face detection and recognition system is simpler, cheaper, and more accurate. The system has two parts: face detection and face recognition. In this paper, the Raspberry Pi single-board computer is the heart of the embedded face recognition system.
Keywords: Raspberry Pi, Face recognition system
