scholarly journals A Prototypical Network-Based Approach for Low-Resource Font Typeface Feature Extraction and Utilization

Data ◽  
2021 ◽  
Vol 6 (12) ◽  
pp. 134
Author(s):  
Kangying Li ◽  
Biligsaikhan Batjargal ◽  
Akira Maeda

This paper introduces a framework for retrieving low-resource font typeface databases by handwritten input. A new deep learning model structure based on metric learning is proposed to extract the features of a character typeface and predict the category of handwrittten input queries. Rather than using sufficient training data, we aim to utilize ancient character font typefaces with only one sample per category. Our research aims to achieve decent retrieval performances over more than 600 categories of handwritten characters automatically. We consider utilizing generic handcrafted features to train a model to help the voting classifier make the final prediction. The proposed method is implemented on the ‘Shirakawa font oracle bone script’ dataset as an isolated ancient-character-recognition system based on free ordering and connective strokes. We evaluate the proposed model on several standard character and symbol datasets. The experimental results showed that the proposed method provides good performance in extracting the features of symbols or characters’ font images necessary to perform further retrieval tasks. The demo system has been released, and it requires only one sample for each character to predict the user input. The extracted features have a better effect in finding the highest-ranked relevant item in retrieval tasks and can also be utilized in various technical frameworks for ancient character recognition and can be applied to educational application development.

2021 ◽  
pp. 1-12
Author(s):  
Irfan Javid ◽  
Ahmed Khalaf Zager Alsaedi ◽  
Rozaida Binti Ghazali ◽  
Yana Mazwin ◽  
Muhammad Zulqarnain

In previous studies, various machine-driven decision support systems based on recurrent neural networks (RNN) were ordinarily projected for the detection of cardiovascular disease. However, the majority of these approaches are restricted to feature preprocessing. In this paper, we concentrate on both, including, feature refinement and the removal of the predictive model’s problems, e.g., underfitting and overfitting. By evading overfitting and underfitting, the model will demonstrate good enactment on equally the training and testing datasets. Overfitting the training data is often triggered by inadequate network configuration and inappropriate features. We advocate using Chi2 statistical model to remove irrelevant features when searching for the best-configured gated recurrent unit (GRU) using an exhaustive search strategy. The suggested hybrid technique, called Chi2 GRU, is tested against traditional ANN and GRU models, as well as different progressive machine learning models and antecedently revealed strategies for cardiopathy prediction. The prediction accuracy of proposed model is 92.17% . In contrast to formerly stated approaches, the obtained outcomes are promising. The study’s results indicate that medical practitioner will use the proposed diagnostic method to reliably predict heart disease.


2020 ◽  
pp. 016555152096278
Author(s):  
Rouzbeh Ghasemi ◽  
Seyed Arad Ashrafi Asli ◽  
Saeedeh Momtazi

With the advent of deep neural models in natural language processing tasks, having a large amount of training data plays an essential role in achieving accurate models. Creating valid training data, however, is a challenging issue in many low-resource languages. This problem results in a significant difference between the accuracy of available natural language processing tools for low-resource languages compared with rich languages. To address this problem in the sentiment analysis task in the Persian language, we propose a cross-lingual deep learning framework to benefit from available training data of English. We deployed cross-lingual embedding to model sentiment analysis as a transfer learning model which transfers a model from a rich-resource language to low-resource ones. Our model is flexible to use any cross-lingual word embedding model and any deep architecture for text classification. Our experiments on English Amazon dataset and Persian Digikala dataset using two different embedding models and four different classification networks show the superiority of the proposed model compared with the state-of-the-art monolingual techniques. Based on our experiment, the performance of Persian sentiment analysis improves 22% in static embedding and 9% in dynamic embedding. Our proposed model is general and language-independent; that is, it can be used for any low-resource language, once a cross-lingual embedding is available for the source–target language pair. Moreover, by benefitting from word-aligned cross-lingual embedding, the only required data for a reliable cross-lingual embedding is a bilingual dictionary that is available between almost all languages and the English language, as a potential source language.


Author(s):  
Satyanand Singh

This paper proposes state-of the-art Automatic Speaker Recognition System (ASR) based on Bayesian Distance Learning Metric as a feature extractor. In this modeling, I explored the constraints of the distance between modified and simplified i-vector pairs by the same speaker and different speakers. An approximation of the distance metric is used as a weighted covariance matrix from the higher eigenvectors of the covariance matrix, which is used to estimate the posterior distribution of the metric distance. Given a speaker tag, I select the data pair of the different speakers with the highest cosine score to form a set of speaker constraints. This collection captures the most discriminating variability between the speakers in the training data. This Bayesian distance learning approach achieves better performance than the most advanced methods. Furthermore, this method is insensitive to normalization compared to cosine scores. This method is very effective in the case of limited training data. The modified supervised i-vector based ASR system is evaluated on the NIST SRE 2008 database. The best performance of the combined cosine score EER 1.767% obtained using LDA200 + NCA200 + LDA200, and the best performance of Bayes_dml EER 1.775% obtained using LDA200 + NCA200 + LDA100. Bayesian_dml overcomes the combined norm of cosine scores and is the best result of the short2-short3 condition report for NIST SRE 2008 data.


Author(s):  
PREMKUMAR NATARAJAN ◽  
ZHIDONG LU ◽  
RICHARD SCHWARTZ ◽  
ISSAM BAZZI ◽  
JOHN MAKHOUL

This paper presents a script-independent methodology for optical character recognition (OCR) based on the use of hidden Markov models (HMM). The feature extraction, training and recognition components of the system are all designed to be script independent. The training and recognition components were taken without modification from a continuous speech recognition system; the only component that is specific to OCR is the feature extraction component. To port the system to a new language, all that is needed is text image training data from the new language, along with ground truth which gives the identity of the sequences of characters along each line of each text image, without specifying the location of the characters on the image. The parameters of the character HMMs are estimated automatically from the training data, without the need for laborious handwritten rules. The system does not require presegmentation of the data, neither at the word level nor at the character level. Thus, the system is able to handle languages with connected characters in a straightforward manner. The script independence of the system is demonstrated in three languages with different types of script: Arabic, English, and Chinese. The robustness of the system is further demonstrated by testing the system on fax data. An unsupervised adaptation method is then described to improve performance under degraded conditions.


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Raghad Tariq Al-Hassani ◽  
Dogu Cagdas Atilla ◽  
Çağatay Aydin

Speech signal is enriched with plenty of features used for biometrical recognition and other applications like gender and emotional recognition. Channel conditions manifested by background noise and reverberation are the main challenges causing feature shifts in the test and training data. In this paper, a hybrid speaker identification model for consistent speech features and high recognition accuracy is made. Features using Mel frequency spectrum coefficients (MFCC) have been improved by incorporating a pitch frequency coefficient from speech time domain analysis. In order to enhance noise immunity, we proposed a single hidden layer feed-forward neural network (FFNN) tuned by an optimized particle swarm optimization (OPSO) algorithm. The proposed model is tested using 10-fold cross-validation over different levels of Adaptive White Gaussian Noise (AWGN) (0-50 dB). A recognition accuracy of 97.83% was obtained from the proposed model in clean voice environments. However, a noisy channel is realized with lesser impact on the proposed model as compared with other baseline classifiers such as plain-FFNN, random forest (RF), K -nearest neighbour (KNN), and support vector machine (SVM).


2020 ◽  
Vol 34 (05) ◽  
pp. 8344-8351 ◽  
Author(s):  
KyungTae Lim ◽  
Jay Yoon Lee ◽  
Jaime Carbonell ◽  
Thierry Poibeau

Multi-view learning makes use of diverse models arising from multiple sources of input or different feature subsets for the same task. For example, a given natural language processing task can combine evidence from models arising from character, morpheme, lexical, or phrasal views. The most common strategy with multi-view learning, especially popular in the neural network community, is to unify multiple representations into one unified vector through concatenation, averaging, or pooling, and then build a single-view model on top of the unified representation. As an alternative, we examine whether building one model per view and then unifying the different models can lead to improvements, especially in low-resource scenarios. More specifically, taking inspiration from co-training methods, we propose a semi-supervised learning approach based on multi-view models through consensus promotion, and investigate whether this improves overall performance. To test the multi-view hypothesis, we use moderately low-resource scenarios for nine languages and test the performance of the joint model for part-of-speech tagging and dependency parsing. The proposed model shows significant improvements across the test cases, with average gains of -0.9 ∼ +9.3 labeled attachment score (LAS) points. We also investigate the effect of unlabeled data on the proposed model by varying the amount of training data and by using different domains of unlabeled data.


Author(s):  
Abhisek Sethy ◽  
Prashanta Kumar Patra

Offline handwritten recognition system for Odia characters has received attention in the last few years. Although the recent research showed that there has been lots of work reported in different language, there is limited research carried out in Odia character recognition. Most of Odia characters are round in nature, similar in orientation and size also, which increases the ambiguity among characters. This chapter has harnessed the rectangle histogram-oriented gradient (R-HOG) for feature extraction method along with the principal component analysis. This gradient-based approach has been able to produce relevant features of individual ones in to the proposed model and helps to achieve high recognition rate. After certain simulations, the respective analysis of classifier shows that SVM performed better than quadratic. Among them, SVM produces with 98.8% and QC produces 96.8%, respectively, as recognition rate. In addition to it, the authors have also performed the 10-fold cross-validation to make the system more robust.


2021 ◽  
Vol 11 (11) ◽  
pp. 661
Author(s):  
Marah Alhalabi ◽  
Mohammed Ghazal ◽  
Fasila Haneefa ◽  
Jawad Yousaf ◽  
Ayman El-Baz

Resolving circuit diagrams is a regular part of learning for school and university students from engineering backgrounds. Simulating circuits is usually done manually by creating circuit diagrams on circuit tools, which is a time-consuming and tedious process. We propose an innovative method of simulating circuits from hand-drawn diagrams using smartphones through an image recognition system. This method allows students to use their smartphones to capture images instead of creating circuit diagrams before simulation. Our contribution lies in building a circuit recognition system using a deep learning capsule networks algorithm. The developed system receives an image captured by a smartphone that undergoes preprocessing, region proposal, classification, and node detection to get a Netlist and exports it to a circuit simulator program for simulation. We aim to improve engineering education using smartphones by (1) achieving higher accuracy using less training data with capsule networks and (2) developing a comprehensive system that captures hand-drawn circuit diagrams and produces circuit simulation results. We use 400 samples per class and report an accuracy of 96% for stratified 5-fold cross-validation. Through testing, we identify the optimum distance for taking circuit images to be 10 to 20 cm. Our proposed model can identify components of different scales and rotations.


Author(s):  
Manish M. Kayasth ◽  
Bharat C. Patel

The entire character recognition system is logically characterized into different sections like Scanning, Pre-processing, Classification, Processing, and Post-processing. In the targeted system, the scanned image is first passed through pre-processing modules then feature extraction, classification in order to achieve a high recognition rate. This paper describes mainly on Feature extraction and Classification technique. These are the methodologies which play an important role to identify offline handwritten characters specifically in Gujarati language. Feature extraction provides methods with the help of which characters can identify uniquely and with high degree of accuracy. Feature extraction helps to find the shape contained in the pattern. Several techniques are available for feature extraction and classification, however the selection of an appropriate technique based on its input decides the degree of accuracy of recognition. 


Sign in / Sign up

Export Citation Format

Share Document