Optical character recognition system for Baybayin scripts using support vector machine

2021 ◽  
Vol 7 ◽  
pp. e360
Author(s):  
Rodney Pino ◽  
Renier Mendoza ◽  
Rachelle Sambayan

In 2018, the Philippine Congress signed House Bill 1022 declaring the Baybayin script as the Philippines’ national writing system. In this regard, it is highly probable that the Baybayin and Latin scripts will appear in a single document. In this work, we propose a system that discriminates the characters of both scripts. The proposed system normalizes each individual character, identifies whether it belongs to the Baybayin or the Latin script, and then classifies which unit it represents. This gives us four classification problems, namely: (1) Baybayin and Latin script recognition, (2) Baybayin character classification, (3) Latin character classification, and (4) Baybayin diacritical marks classification. To the best of our knowledge, this is the first study that makes use of a Support Vector Machine (SVM) for Baybayin script recognition. This work also provides a new dataset for Baybayin, its diacritics, and Latin characters. Classification problems (1) and (4) use binary SVM, while (2) and (3) apply multiclass SVM classification. On average, our numerical experiments yield satisfactory results: (1) has 98.5% accuracy, 98.5% precision, 98.49% recall, and 98.5% F1 Score; (2) has 96.51% accuracy, 95.62% precision, 95.61% recall, and 95.62% F1 Score; (3) has 95.8% accuracy, 95.85% precision, 95.8% recall, and 95.83% F1 Score; and (4) has 100% accuracy, 100% precision, 100% recall, and 100% F1 Score.
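The four metrics reported above can be computed from true and predicted labels. The following is a minimal sketch using only the Python standard library. It assumes macro-averaging over classes, which the abstract does not specify, and the labels are hypothetical toy values rather than samples from the paper's dataset.

```python
from collections import Counter

def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumes non-empty label lists of equal length. Macro-averaging is an
    assumption of this sketch, not something stated in the abstract.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was missed

    def safe_div(a, b):
        return a / b if b else 0.0

    precisions = [safe_div(tp[c], tp[c] + fp[c]) for c in labels]
    recalls = [safe_div(tp[c], tp[c] + fn[c]) for c in labels]
    f1s = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]
    n = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Hypothetical toy labels for three character classes.
y_true = ["ba", "ba", "ka", "ka", "la", "la"]
y_pred = ["ba", "ka", "ka", "ka", "la", "la"]
scores = macro_metrics(y_true, y_pred)
```

With one "ba" sample misread as "ka", the sketch penalizes the recall of "ba" and the precision of "ka", which is why the averaged precision and recall can differ slightly, as they do in the reported results.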

2010 ◽  
Vol 20 (1) ◽  
pp. 17-25 ◽  
Author(s):  
Alireza Behrad ◽  
Malike Khoddami ◽  
Mehdi Salehpour

Optical character recognition is an important task for converting handwritten and printed documents to digital format. In multilingual systems, script identification is a necessary step before the OCR algorithm is applied. In this paper, novel methods for script language identification and the recognition of Farsi handwritten digits are proposed. Our method for script identification is based on curvature scale space features. The proposed features are rotation and scale invariant and can be used to identify scripts with different fonts. We assumed that bilingual scripts may contain Farsi and English words and characters together; therefore, the algorithm is designed to recognize scripts at the connected-component level. The output of the recognition is then generalized to the word, line, and page levels. We used a cluster-based weighted support vector machine for the classification and recognition of Farsi handwritten digits that is reasonably robust against rotation and scaling. The algorithm extracts the required features using principal component analysis (PCA) and linear discriminant analysis (LDA). The extracted features are then classified using a new classification algorithm called cluster-based weighted SVM (CBWSVM). The experimental results showed the promise of the algorithms.
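The abstract does not say how component-level decisions are generalized to higher levels. One plausible reading is a majority vote at each level, sketched below. The voting rule, the tie-breaking rule, and the function names `majority_label` and `aggregate` are all assumptions of this sketch and are not taken from the paper.

```python
from collections import Counter

def majority_label(labels):
    """Return the most frequent label; ties go to the label seen first."""
    counts = Counter(labels)
    # max() returns the first maximal key in insertion order.
    return max(counts, key=counts.get)

def aggregate(page):
    """Aggregate script labels upward by majority vote (assumed rule).

    page: list of lines; each line is a list of words; each word is a
    list of per-connected-component script labels.
    """
    word_labels = [[majority_label(word) for word in line] for line in page]
    line_labels = [majority_label(words) for words in word_labels]
    page_label = majority_label(line_labels)
    return word_labels, line_labels, page_label

# Hypothetical component-level predictions for a three-line page.
page = [
    [["farsi", "farsi", "english"], ["farsi"]],
    [["english", "english"], ["english", "farsi", "english"], ["farsi"]],
    [["farsi", "farsi"]],
]
words, lines, page_label = aggregate(page)
```

Voting at each level lets isolated component errors be absorbed by their neighbours, at the cost of hiding genuinely mixed-script words.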


Author(s):  
Eko Sanjaya ◽  
Agi Prasetiadi ◽  
Wahyu Andi Saputra

Memes spread information in the form of images. According to the data obtained, meme production began to increase ahead of the 2019 election. The information conveyed by political memes varies: some memes express support for a party or political figure, while others criticize or insult a political party or figure. A system that can classify memes by class is therefore needed. This study aims to build a system that classifies political memes by class. The classification algorithm is a Support Vector Machine (SVM) with TF-IDF feature extraction, and the library used for optical character recognition (OCR) is Tesseract. Test results show that the linear SVM achieves better accuracy than the non-linear SVM. The best accuracy of the linear SVM combined with TF-IDF is 75.71%.
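To illustrate the TF-IDF step applied to text extracted by OCR, here is a minimal sketch in standard-library Python. It uses the plain variant, term frequency times log(N / document frequency). The abstract does not state which TF-IDF variant the study used, and libraries often apply smoothed versions, so the weights here are illustrative only. The token lists are hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute plain TF-IDF weights.

    docs: list of token lists. Returns one {term: weight} dict per document,
    where weight = (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

# Hypothetical tokenized meme texts (not from the study's data).
docs = [
    ["support", "party", "a"],
    ["reject", "party", "a"],
    ["support", "party", "b"],
]
vectors = tfidf(docs)
```

A term that appears in every document, such as "party" here, receives a weight of zero, while a term unique to one document, such as "reject", receives the largest weight. The resulting vectors are what a classifier such as a linear SVM would take as input.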


2017 ◽  
Vol 116 ◽  
pp. 351-357 ◽  
Author(s):  
Michael Reynaldo Phangtriastu ◽  
Jeklin Harefa ◽  
Dian Felita Tanoto

Author(s):  
Fardilla Zardi Putri ◽  
Budhi Irawan ◽  
Umar Ali Ahmad

In this global era, mastering a language other than Indonesian is an important need for everyone. Many people visit other countries for activities such as work, study, and even vacation. One frequently visited country is Japan. Japanese uses characters that differ from the Latin alphabet, so learning the language requires an understanding of its characters. As technology develops, character recognition, often called Optical Character Recognition (OCR), has become one application of pattern recognition and artificial intelligence as a machine reader. In this study, an Android-based Japanese word translator application is designed that applies the basic principles of OCR using the Directional Feature Extraction method and a Support Vector Machine. The best accuracy achieved in testing with Directional Feature Extraction and Support Vector Machine is 85.71%. The study used 104 training samples. Beta testing on four points (application appearance, system response time, translation accuracy, and usefulness of the application) shows that the application can be rated as good.


Author(s):  
Ritam Guha ◽  
Manosij Ghosh ◽  
Pawan Kumar Singh ◽  
Ram Sarkar ◽  
Mita Nasipuri

In any multi-script environment, handwritten script classification is an unavoidable prerequisite before the document images are fed to their respective Optical Character Recognition (OCR) engines. Over the years, researchers have addressed this complex pattern classification problem by proposing various feature vectors, mostly of large dimension, thereby increasing the computational complexity of the whole classification model. Feature Selection (FS) can serve as an intermediate step to reduce the size of the feature vectors by restricting them to the essential and relevant features. In the present work, we have addressed this issue by introducing a new FS algorithm, called Hybrid Swarm and Gravitation-based FS (HSGFS). This algorithm has been applied to three feature vectors introduced in the literature recently: Distance-Hough Transform (DHT), Histogram of Oriented Gradients (HOG), and Modified log-Gabor (MLG) filter Transform. Three state-of-the-art classifiers, namely, Multi-Layer Perceptron (MLP), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), are used to evaluate the optimal subset of features generated by the proposed FS model. Handwritten datasets at block, text line, and word level, consisting of 12 officially recognized Indic scripts, are prepared for experimentation. An average improvement in the range of 2–5% is achieved in the classification accuracy by utilizing only about 75–80% of the original feature vectors on all three datasets. The proposed method also shows better performance when compared to some popularly used FS models. The codes used for implementing HSGFS can be found in the following Github link: https://github.com/Ritam-Guha/HSGFS.
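The following sketch shows how a candidate feature subset can be scored with a classifier, which is the evaluation step any wrapper-style FS method relies on. It does not reproduce HSGFS itself. The binary mask representation, the leave-one-out 1-nearest-neighbour scoring, and the toy data are assumptions chosen for illustration.

```python
def apply_mask(sample, mask):
    """Keep only the features whose mask entry is truthy."""
    return [x for x, keep in zip(sample, mask) if keep]

def loo_1nn_accuracy(samples, labels, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    using only the features selected by `mask` (needs >= 2 samples)."""
    reduced = [apply_mask(s, mask) for s in samples]
    correct = 0
    for i, query in enumerate(reduced):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(reduced):
            if i == j:
                continue  # leave the query sample out
            d = sum((a - b) ** 2 for a, b in zip(query, other))
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(samples)

# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
samples = [[0.0, 5.0], [0.1, -4.0], [1.0, 4.9], [1.1, -4.1]]
labels = ["A", "A", "B", "B"]

acc_all = loo_1nn_accuracy(samples, labels, [1, 1])
acc_subset = loo_1nn_accuracy(samples, labels, [1, 0])
```

On this toy data the noisy second feature dominates the distance, so dropping it raises the accuracy. This mirrors, in miniature, the abstract's observation that a reduced feature vector can improve classification.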


Author(s):  
Htwe Pa Pa Win ◽  
Phyo Thu Thu Khine ◽  
Khin Nwe Ni Tun

This paper proposes a new feature extraction method for off-line recognition of Myanmar printed documents. One of the most important factors in achieving high recognition performance in an Optical Character Recognition (OCR) system is the selection of the feature extraction method. Existing OCR systems use various feature extraction methods because of the diversity of the scripts’ natures. One major contribution of this work is the design of logically rigorous coding-based features. To show the effectiveness of the proposed method, this paper assumes the documents have been successfully segmented into characters and extracts features from these isolated Myanmar characters. These features are extracted using structural analysis of the Myanmar script. The experiments were carried out using a Support Vector Machine (SVM) classifier, and the results are compared with the previously proposed feature extraction method.

