Studi Perbandingan Algoritma Pencarian String dalam Metode Approximate String Matching untuk Identifikasi Kesalahan Pengetikan Teks

Yeny Rochmawati; Retno Kusumaningrum

doi:10.24002/jbi.v7i2.491

Jurnal Buana Informatika ◽

10.24002/jbi.v7i2.491 ◽

2016 ◽

Vol 7 (2) ◽

Cited By ~ 1

Author(s):

Yeny Rochmawati ◽

Retno Kusumaningrum

Keyword(s):

Hamming Distance ◽

String Matching ◽

Mean Average Precision ◽

Levenshtein Distance ◽

Approximate String Matching ◽

Average Precision ◽

Relevance Judgments ◽

Typing Error ◽

The Mean ◽

Distance Hamming

Abstract. Typing errors that turn standard words into non-standard words are often caused by misspelling. This problem can be addressed by developing a system that identifies typing errors. Approximate string matching is a widely used method for this task, and it can rely on several string search algorithms: Levenshtein Distance, Hamming Distance, Damerau Levenshtein Distance, and Jaro Winkler Distance. However, no study has compared the performance of these four algorithms for the Indonesian language. This research therefore compares the four algorithms to determine which is the most accurate and precise in string search across various kinds of typing errors. Evaluation uses users' relevance judgments, which yield a mean average precision (MAP) value for determining the best algorithm. In a test on 50 incorrect words, the Jaro Winkler Distance algorithm performed best at word checking, with a MAP of 0.87.

Keywords: Typing errors, Levenshtein, Hamming, Damerau Levenshtein, Jaro Winkler
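To make the four measures named in this abstract concrete, the following is a minimal Python sketch of each, not the authors' implementation. The function names are illustrative, the Damerau Levenshtein variant shown is the restricted "optimal string alignment" form, and the Jaro Winkler prefix weight of 0.1 with a prefix cap of 4 characters is the commonly used default rather than a value taken from the paper.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions that differ; defined only for equal lengths."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def damerau_osa(a: str, b: str) -> int:
    """Levenshtein plus adjacent transposition (optimal string alignment)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def jaro(a: str, b: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, p: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return base + prefix * p * (1 - base)


# "mkaan" is "makan" with two adjacent letters swapped.
print(hamming("makan", "mkaan"))      # 2
print(levenshtein("makan", "mkaan"))  # 2
print(damerau_osa("makan", "mkaan"))  # 1
```

Note that the first three functions return distances (lower is closer), while Jaro Winkler returns a similarity (higher is closer), so any ranking that mixes them has to account for the direction.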

Designing a word recommendation application using the Levenshtein Distance algorithm

Matrix Jurnal Manajemen Teknologi dan Informatika ◽

10.31940/matrix.v11i2.2419 ◽

2021 ◽

Vol 11 (2) ◽

pp. 63-70

Author(s):

Nadhia Nurin Syarafina ◽

Jozua Ferjanus Palandi

Keyword(s):

String Matching ◽

Levenshtein Distance ◽

Approximate String Matching ◽

Test Results ◽

Average Accuracy ◽

Typing Error ◽

Word Spelling ◽

Written Word ◽

Word Writing ◽

High Level

Good scriptwriting or reporting requires a high level of accuracy, but not all authors are equally accurate. Low accuracy leads to mistyped words in a sentence. Typing errors turn a word into a non-standard form or, worse, make it meaningless. A recommendation application addresses this by suggesting the correct spelling when a typing error occurs, which reduces the writer's error rate. One method for correcting word spelling is Approximate String Matching, which searches for strings that approximately match a given string. The Levenshtein Distance algorithm belongs to this family of methods. In the application, text first passes through a preprocessing stage, after which incorrectly written words are corrected using the Levenshtein Distance algorithm. The application was tested on ten texts of 100 words, ten texts of 100 to 250 words, and ten texts of 250 to 500 words. The average accuracy for these three groups was 95%, 94%, and 90%, respectively.
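The recommendation step described in this abstract can be sketched roughly as follows. This is an assumed design, not the application's actual code: the `recommend` function, its `max_distance` and `limit` parameters, and the tiny vocabulary are all hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def recommend(word, vocabulary, max_distance=2, limit=3):
    """Return up to `limit` vocabulary words within `max_distance` edits.

    A word already in the vocabulary is returned unchanged. The threshold
    and the limit are illustrative defaults, not values from the paper.
    """
    if word in vocabulary:
        return [word]
    scored = sorted((levenshtein(word, entry), entry) for entry in vocabulary)
    return [entry for distance, entry in scored if distance <= max_distance][:limit]


vocabulary = {"makan", "minum", "malam"}  # hypothetical word list
print(recommend("mkan", vocabulary))      # ['makan']
```

Scanning the whole vocabulary for each unknown word is the simplest approach; a real application with a large dictionary would likely need an index or a length filter to stay responsive.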

Pencarian Question-Answer Menggunakan Convolutional Neural Network Pada Topik Agama Berbahasa Indonesia

Jurnal ULTIMATICS ◽

10.31937/ti.v10i1.842 ◽

2018 ◽

Vol 10 (1) ◽

pp. 57-64 ◽

Cited By ~ 1

Author(s):

Rizqa Raaiqa Bintana ◽

Chastine Fatichah ◽

Diana Purwitasari

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Question Answering ◽

Mean Average Precision ◽

Average Precision ◽

Community Based ◽

Network Methods ◽

The Mean ◽

Index Terms ◽

Search Information

Community-based question answering (CQA) helps people find the information they need through a community. When people cannot find that information in a CQA service, they post a new question, and the archive grows with duplicated questions. It is therefore important to find questions in the CQA archive that are semantically similar to a new question. In this study, we use convolutional neural network methods for semantic modeling of sentences to obtain the words that represent the content of the documents and of the new question. Retrieving questions semantically similar to a new question (query) from the question-answer document archive with the convolutional neural network method yielded a mean average precision of 0.422. For comparison, the vector space model yielded a mean average precision of 0.282. Index Terms: community-based question answering, convolutional neural network, question retrieval
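For readers unfamiliar with the metric reported here, mean average precision can be computed along the following lines. This is a generic sketch: it assumes binary relevance labels in ranked order and normalizes average precision by the number of relevant items that were retrieved, which may differ from the exact convention used in the study.

```python
def average_precision(ranked_relevance):
    """Average of precision@k over the ranks k where a relevant item appears.

    `ranked_relevance` is a list of 0/1 (or False/True) labels, best-ranked
    result first. Returns 0.0 when nothing relevant was retrieved.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs):
    """Mean of average precision over all queries (one ranked list each)."""
    if not runs:
        return 0.0
    return sum(average_precision(run) for run in runs) / len(runs)


# Two hypothetical queries: relevant results at ranks 1 and 3, then at rank 2.
print(average_precision([1, 0, 1]))               # (1/1 + 2/3) / 2
print(mean_average_precision([[1, 0, 1], [0, 1]]))
```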

Position-restricted approximate string matching with metric Hamming distance

2017 IEEE International Conference on Big Data and Smart Computing (BigComp) ◽

10.1109/bigcomp.2017.7881724 ◽

2017 ◽

Author(s):

Sung-Hwan Kim ◽

Hwan-Gue Cho

Keyword(s):

Hamming Distance ◽

String Matching ◽

Approximate String Matching

Correction to: New algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance

The Journal of Supercomputing ◽

10.1007/s11227-018-2324-7 ◽

2018 ◽

Vol 74 (5) ◽

pp. 1835-1835

Author(s):

ThienLuan Ho ◽

Seung-Rohk Oh ◽

HyunJin Kim

Keyword(s):

Hamming Distance ◽

String Matching ◽

Approximate String Matching ◽

Fixed Length ◽

New Algorithms

Generalised Implementation for Fixed-Length Approximate String Matching under Hamming Distance and Applications

2015 IEEE International Parallel and Distributed Processing Symposium Workshop ◽

10.1109/ipdpsw.2015.106 ◽

2015 ◽

Cited By ~ 4

Author(s):

Solon Pissis ◽

Ahmad Retha

Keyword(s):

Hamming Distance ◽

String Matching ◽

Approximate String Matching ◽

Fixed Length

A parallel approximate string matching under Levenshtein distance on graphics processing units using warp-shuffle operations

PLoS ONE ◽

10.1371/journal.pone.0186251 ◽

2017 ◽

Vol 12 (10) ◽

pp. e0186251 ◽

Cited By ~ 11

Author(s):

ThienLuan Ho ◽

Seung-Rohk Oh ◽

HyunJin Kim

Keyword(s):

Graphics Processing Units ◽

String Matching ◽

Levenshtein Distance ◽

Approximate String Matching ◽

Graphics Processing

Mice Tracking Using The YOLO Algorithm

10.5753/semish.2020.11326 ◽

2020 ◽

Author(s):

Richardson Santiago Teles De Menezes ◽

John Victor Alves Luiz ◽

Aron Miranda Henrique-Alves ◽

Rossana Moreno Santa Cruz ◽

Helton Maia

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Behavioral Neuroscience ◽

High Accuracy ◽

Mean Average Precision ◽

Computational Tool ◽

Training Set ◽

Average Precision ◽

The Mean

The computational tool developed in this study is based on convolutional neural networks and the You Only Look Once (YOLO) algorithm for detecting and tracking mice in videos recorded during behavioral neuroscience experiments. We analyzed a dataset of 13,622 images drawn from the behavioral videos of three important studies in this area. The images were split 50% for training, 25% for validation, and 25% for testing. The results show that the mean Average Precision (mAP) reached by the developed system was 90.79% and 90.75% for the Full and Tiny versions of YOLO, respectively. Given the high accuracy of the results, the developed tool allows experimentalists to perform mice tracking in a reliable and non-invasive way.
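The 50/25/25 partition mentioned in this abstract could be produced along these lines. The abstract does not say how images were assigned to each subset, so the seeded random shuffle and the `split_indices` helper below are assumptions for illustration only.

```python
import random


def split_indices(n_items, train_frac=0.5, val_frac=0.25, seed=0):
    """Shuffle item indices and cut them into train/validation/test parts.

    The test part takes whatever remains after the first two cuts, so the
    three parts always cover every index exactly once.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = int(n_items * train_frac)
    n_val = int(n_items * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train, val, test = split_indices(13622)
print(len(train), len(val), len(test))  # 6811 3405 3406
```

Because 13,622 is not divisible by four, the validation and test subsets differ by one image under this scheme.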

Deep learning-based detection of dental prostheses and restorations

Scientific Reports ◽

10.1038/s41598-021-81202-x ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Toshihito Takahashi ◽

Kazunori Nozaki ◽

Tomoya Gonda ◽

Tomoaki Mameno ◽

Kazunori Ikebe

Keyword(s):

Deep Learning ◽

High Accuracy ◽

Mean Average Precision ◽

Learning Performance ◽

Average Precision ◽

Tooth Color ◽

Photographic Images ◽

Dental Prostheses ◽

Learning Procedure ◽

The Mean

The purpose of this study is to develop a method for recognizing dental prostheses and restorations of teeth using deep learning. A dataset of 1904 oral photographic images of dental arches (maxilla: 1084 images; mandible: 820 images) was used in the study. A deep-learning method to recognize 11 types of dental prostheses and restorations was developed using the TensorFlow and Keras deep learning libraries. After completion of the learning procedure, the average precision of each prosthesis, the mean average precision, and the mean intersection over union were used to evaluate learning performance. The average precision of each prosthesis varied from 0.59 to 0.93. The mean average precision and mean intersection over union of this system were 0.80 and 0.76, respectively. More than 80% of metallic dental prostheses were detected correctly, but only 60% of tooth-colored prostheses were detected. The results of this study suggest that dental prostheses and restorations that are metallic in color can be recognized and predicted with high accuracy using deep learning; however, those with tooth color are recognized with moderate accuracy.
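Intersection over union, one of the evaluation measures named in this abstract, can be illustrated for a pair of axis-aligned bounding boxes as below. The abstract does not state the region representation used, so the `(x_min, y_min, x_max, y_max)` box format and the `iou` and `mean_iou` helpers are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def mean_iou(pairs):
    """Mean IoU over (predicted_box, reference_box) pairs."""
    if not pairs:
        return 0.0
    return sum(iou(pred, ref) for pred, ref in pairs) / len(pairs)


# Two 2x2 boxes overlapping in a 1x1 square: 1 / (4 + 4 - 1)
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```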

New algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance

The Journal of Supercomputing ◽

10.1007/s11227-017-2192-6 ◽

2017 ◽

Vol 74 (5) ◽

pp. 1815-1834 ◽

Cited By ~ 1

Author(s):

ThienLuan Ho ◽

Seung-Rohk Oh ◽

HyunJin Kim

Keyword(s):

Hamming Distance ◽

String Matching ◽

Approximate String Matching ◽

Fixed Length ◽

New Algorithms

BDD-BASED ANALYSIS OF GAPPED q-GRAM FILTERS

International Journal of Foundations of Computer Science ◽

10.1142/s0129054105003698 ◽

2005 ◽

Vol 16 (06) ◽

pp. 1121-1134 ◽

Cited By ~ 2

Author(s):

MARC FONTAINE ◽

STEFAN BURKHARDT ◽

JUHA KÄRKKÄINEN

Keyword(s):

Hamming Distance ◽

String Matching ◽

Combinatorial Problem ◽

Good Choice ◽

Design Parameters ◽

Second Step ◽

Approximate String Matching ◽

Binary Decision ◽

New Class ◽

Important Design

Recently, there has been a surge of interest in gapped q-gram filters for approximate string matching. Important design parameters for such filters include the value of q, the filter threshold, and in particular the shape (also known as the seed) of the filter. A good choice of parameters can improve the performance of a q-gram filter by orders of magnitude, and optimizing these parameters is a nontrivial combinatorial problem. We describe a new method for analyzing gapped q-gram filters. This method is simple and generic. It applies to a variety of filters, overcomes many restrictions that are present in existing algorithms, and can easily be extended to new filter variants. To implement our approach, we use an extended version of BDDs (Binary Decision Diagrams), a data structure that efficiently represents sets of bit-strings. In a second step, we define a new class of multi-shape filters and analyze these filters with the BDD-based approach. Experiments show that multi-shape filters can outperform the best single-shape filters currently in use in many respects. The BDD-based algorithm is crucial for the design and analysis of these new and better multi-shape filters. Our results apply to the k-mismatches problem, i.e. approximate string matching with Hamming distance.
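To show what a gapped q-gram is, the sketch below extracts q-grams according to a shape and counts how many two strings share. It illustrates only the basic notion; the BDD-based analysis and the multi-shape filters from the paper are not reproduced. The shape notation (`#` for a sampled position, `-` for a gap) and the function names are assumptions made for this example.

```python
from collections import Counter


def gapped_qgrams(text, shape):
    """Extract gapped q-grams of `text` for a shape such as "##-#".

    '#' marks a position that is sampled and '-' a position that is skipped,
    so q is the number of '#' characters and the span is len(shape).
    """
    keep = [offset for offset, mark in enumerate(shape) if mark == "#"]
    last_start = len(text) - len(shape)
    return ["".join(text[start + offset] for offset in keep)
            for start in range(last_start + 1)]


def shared_qgrams(a, b, shape):
    """Count gapped q-grams common to both strings (multiset intersection)."""
    count_a = Counter(gapped_qgrams(a, shape))
    count_b = Counter(gapped_qgrams(b, shape))
    return sum((count_a & count_b).values())


print(gapped_qgrams("ACGTAC", "##-#"))            # ['ACT', 'CGA', 'GTC']
print(shared_qgrams("ACGTAC", "ACTTAC", "##-#"))  # 1
```

A filter built on this idea would compare the shared count against a threshold to decide whether a candidate region deserves full verification; choosing that threshold and the shape is the design problem the abstract describes.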
