Designing a word recommendation application using the Levenshtein Distance algorithm

Good scriptwriting and reporting require a high level of accuracy, but authors are not equally accurate. Low accuracy leads to mistyped words, which turn standard words into non-standard or even meaningless ones. A recommendation application addresses this by suggesting the correct spelling when a typing error occurs, which reduces the writer's error rate. One method for correcting spelling is Approximate String Matching, which searches for strings that approximately match a given string; the Levenshtein Distance algorithm belongs to this family. In the proposed application, the text first goes through a preprocessing stage, after which incorrectly written words are corrected using the Levenshtein Distance algorithm. The application was tested on ten texts of 100 words, ten texts of 100 to 250 words, and ten texts of 250 to 500 words. The average accuracy for these three groups was 95%, 94%, and 90%, respectively.
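The correction step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `levenshtein` and `recommend`, the tiny `VOCAB` list, and the `max_distance` and `limit` parameters are all assumptions made for illustration, and the preprocessing stage is omitted.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def recommend(word, vocabulary, max_distance=2, limit=3):
    """Return up to `limit` vocabulary words within `max_distance` edits, closest first."""
    scored = sorted((levenshtein(word, v), v) for v in vocabulary)
    return [v for d, v in scored if d <= max_distance][:limit]


# Hypothetical vocabulary; a real application would load a full dictionary.
VOCAB = ["accuracy", "algorithm", "application", "recommendation", "string"]
print(recommend("algoritm", VOCAB))  # ['algorithm']
```

The distance threshold trades recall for noise: a larger `max_distance` catches heavier typos but also returns more unrelated words.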

Studi Perbandingan Algoritma Pencarian String dalam Metode Approximate String Matching untuk Identifikasi Kesalahan Pengetikan Teks

Jurnal Buana Informatika ◽

10.24002/jbi.v7i2.491 ◽

2016 ◽

Vol 7 (2) ◽

Cited By ~ 1

Author(s):

Yeny Rochmawati ◽

Retno Kusumaningrum

Keyword(s):

Hamming Distance ◽

String Matching ◽

Mean Average Precision ◽

Levenshtein Distance ◽

Approximate String Matching ◽

Average Precision ◽

Relevance Judgments ◽

Typing Error ◽

The Mean ◽

Distance Hamming

Abstract. Typing errors that turn standard words into non-standard words are often caused by misspelling. This can be addressed by developing a system that identifies typing errors. Approximate string matching is a widely used method for identifying typing errors, and it can rely on several string search algorithms, namely Levenshtein Distance, Hamming Distance, Damerau Levenshtein Distance, and Jaro Winkler Distance. However, no study has compared the performance of these four algorithms for Indonesian (Bahasa Indonesia). This research therefore compares the four algorithms to identify which is the most accurate and precise in string search across various kinds of typing errors. Evaluation uses users' relevance judgments, which yield the mean average precision (MAP) used to determine the best algorithm. The results show that the Jaro Winkler Distance algorithm performs best in word checking, with a MAP value of 0.87 when identifying the typing errors in 50 incorrect words.

Keywords: Typing errors, Levenshtein, Hamming, Damerau Levenshtein, Jaro Winkler
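The MAP metric used in this evaluation can be sketched as follows. This uses the common definition in which average precision is divided by the total number of relevant items; the abstract does not say which variant the authors used. The function names and the example suggestion lists are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average precision for one query.

    `ranked` is the suggestion list in rank order; `relevant` is the set of
    suggestions judged relevant by users.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k  # precision at rank k
    return total / len(relevant)


def mean_average_precision(runs):
    """Mean of the per-query average precision over all (ranked, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two invented misspelled-word queries with invented relevance judgments.
runs = [
    (["the", "ten", "tea"], {"the"}),             # AP = 1.0
    (["from", "form", "forum"], {"form", "forum"}),  # AP = (1/2 + 2/3) / 2
]
print(round(mean_average_precision(runs), 4))  # 0.7917
```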

Kombinasi Damerau Levenshtein dan Jaro-Winkler Distance Untuk Koreksi Kata Bahasa Inggris

Jurnal Teknik Informatika dan Sistem Informasi ◽

10.28932/jutisi.v6i2.2493 ◽

2020 ◽

Vol 6 (2) ◽

Author(s):

Bonifacius Vicky Indriyono

Keyword(s):

String Matching ◽

Levenshtein Distance ◽

Test Results ◽

Spelling Errors ◽

Word Spelling ◽

English Spelling

Writing is one of the ways a writer expresses ideas to others. When writing, however, spelling errors are common, especially in English, and they can cause readers to misunderstand the meaning of the text. To overcome this problem, a system that can detect misspelled words is needed. The Damerau Levenshtein and Jaro-Winkler Distance algorithms can be used to detect English typing errors. The test results show that Damerau Levenshtein and Jaro-Winkler Distance are able to detect word mismatches and find similar words among those compared. Damerau Levenshtein Distance works by finding the smallest distance value, while Jaro-Winkler Distance works by finding the greatest similarity value between the strings being compared. Using these algorithms, spelling errors can be minimized. Keywords: Algorithm; Damerau Levenshtein; Jaro Winkler; Spelling Checker; String Matching.
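The contrast between "smallest distance" and "greatest similarity" can be sketched as below. This is an illustration under assumptions, not the paper's code: `osa_distance` implements the restricted Damerau-Levenshtein variant (optimal string alignment, adjacent transpositions only), `jaro_winkler` uses the conventional prefix scale `p = 0.1` with a prefix cap of 4, and the candidate list is invented.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # An adjacent transposition counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def jaro(a: str, b: str) -> float:
    """Jaro similarity in [0, 1]."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_flags, b_flags = [False] * len(a), [False] * len(b)
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_flags[j] and b[j] == ca:
                a_flags[i] = b_flags[j] = True
                break
    m = sum(a_flags)
    if m == 0:
        return 0.0
    a_matched = [c for c, f in zip(a, a_flags) if f]
    b_matched = [c for c, f in zip(b, b_flags) if f]
    t = sum(x != y for x, y in zip(a_matched, b_matched)) // 2  # transpositions
    return (m / len(a) + m / len(b) + (m - t) / m) / 3


def jaro_winkler(a: str, b: str, p: float = 0.1) -> float:
    """Jaro similarity boosted by the length of the common prefix (at most 4)."""
    sim = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


candidates = ["receive", "recipe", "reserve"]  # hypothetical dictionary entries
typo = "recieve"
print(min(candidates, key=lambda w: osa_distance(typo, w)))  # smallest distance
print(max(candidates, key=lambda w: jaro_winkler(typo, w)))  # greatest similarity
```

Both selections pick `receive` here; on other inputs the two measures can disagree, which is one reason to combine them.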

A parallel approximate string matching under Levenshtein distance on graphics processing units using warp-shuffle operations

PLoS ONE ◽

10.1371/journal.pone.0186251 ◽

2017 ◽

Vol 12 (10) ◽

pp. e0186251 ◽

Cited By ~ 11

Author(s):

ThienLuan Ho ◽

Seung-Rohk Oh ◽

HyunJin Kim

Keyword(s):

Graphics Processing Units ◽

String Matching ◽

Levenshtein Distance ◽

Approximate String Matching ◽

Graphics Processing

LEAP: A Generalization of the Landau-Vishkin Algorithm with Custom Gap Penalties

10.1101/133157 ◽

2017 ◽

Author(s):

Hongyi Xin ◽

Jeremie Kim ◽

Sunny Nahar ◽

Can Alkan ◽

Onur Mutlu

Keyword(s):

State Of The Art ◽

String Matching ◽

The State ◽

Levenshtein Distance ◽

Approximate String Matching ◽

Matching Problem ◽

De Bruijn Sequence ◽

Scoring Schemes ◽

Bit Vector ◽

Selection Of

Motivation: Approximate string matching is a pivotal problem in computer science. It is an integral component of many string algorithms, most notably DNA read mapping and alignment. The improved Landau-Vishkin (LV) algorithm offers a better dynamic programming strategy than the banded Smith-Waterman algorithm but supports only a limited selection of scoring schemes. In this paper, we propose the Leaping Toad problem, a generalization of the approximate string matching problem, as well as LEAP, a generalization of the Landau-Vishkin algorithm that solves the Leaping Toad problem under a broader selection of scoring schemes.

Results: We benchmarked LEAP against three state-of-the-art approximate string matching implementations. We show that, when using a bit-vectorized de Bruijn sequence based optimization, LEAP is up to 7.4x faster than the state-of-the-art bit-vector Levenshtein distance implementation and up to 32x faster than the state-of-the-art affine-gap-penalty parallel Needleman-Wunsch implementation.

Availability: We provide an implementation of LEAP in C++ at github.com/CMU-SAFARI/
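For readers unfamiliar with the banded dynamic programming that this abstract takes as its baseline, the idea can be sketched for plain unit-cost edit distance. This is neither LEAP nor the Landau-Vishkin algorithm; it is a simple banded cutoff that only evaluates cells within `k` of the main diagonal. The function name and the threshold parameter `k` are assumptions for illustration.

```python
def banded_edit_distance(a: str, b: str, k: int):
    """Unit-cost edit distance if it is at most k, otherwise None.

    Only cells with |i - j| <= k are evaluated, because any alignment that
    leaves this band already costs more than k edits.
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None
    inf = k + 1  # any value above k means "over the threshold"
    prev = [j if j <= k else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        lo, hi = max(0, i - k), min(m, i + k)
        if lo == 0:
            cur[0] = i
        for j in range(max(1, lo), hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            best = min(prev[j - 1] + cost,  # match or substitution
                       prev[j] + 1,         # deletion from a
                       cur[j - 1] + 1)      # insertion into a
            cur[j] = min(best, inf)
        prev = cur
    return prev[m] if prev[m] <= k else None


print(banded_edit_distance("kitten", "sitting", 3))  # 3
print(banded_edit_distance("kitten", "sitting", 2))  # None
```

For simplicity the sketch still allocates a full row per iteration; a tighter version would store only the `2k + 1` band cells.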

THE INFLUENCE OF INQUIRY AND DISCOVERY LEARNING METHOD AND CREATIVITY LEVEL IN WRITING SKILLS DESCRIPTION IN NAUTIC TARUNA AND TECHNIQUE IN SURABAYA SHIPPING POLYTECHNIC

SASTRANESIA Jurnal Program Studi Pendidikan Bahasa dan Sastra Indonesia ◽

10.32682/sastranesia.v8i1.1372 ◽

2020 ◽

Vol 8 (1) ◽

Author(s):

Ni Putu Dian Permata Prasetyaningrum

Keyword(s):

Quantitative Methods ◽

Writing Skills ◽

Discovery Learning ◽

Learning Method ◽

Test Results ◽

Learning Methods ◽

Is Research ◽

Investigation Methods ◽

High Level ◽

Discovery Method

Surabaya Shipping Polytechnic emphasizes certain areas of expertise that its cadets (taruna) must possess, so that they graduate with the required expertise and skills. The purpose of this study was to examine the effect of inquiry learning, discovery learning, and creativity level on the ability of nautical and technical cadets at Surabaya Shipping Polytechnic to write descriptive essays. The research uses quantitative, experimental methods and was conducted at Surabaya Shipping Polytechnic. The subjects were the cadets of the Nautika A, Nautika B, Teknika A, and Teknika B classes. Based on the results and discussion, three conclusions are drawn. First, descriptive-essay writing ability differs between the two learning methods, and the test results favor the inquiry method over discovery learning. Second, descriptive-essay writing ability differs between cadets with a high level of creativity and cadets with a low level of creativity, and the test results favor cadets with a high level of creativity. Third, learning method and creativity level are jointly related to descriptive-essay writing ability.

Children’s Astronomy. Development of the Shape of the Earth Concept in Polish Children between 5 and 10 Years of Age

Education Sciences ◽

10.3390/educsci11020075 ◽

2021 ◽

Vol 11 (2) ◽

pp. 75

Author(s):

Jan Amos Jelinek

Keyword(s):

Mental Model ◽

Cultural Influences ◽

Test Results ◽

The Earth ◽

Spherical Earth ◽

The Individual ◽

High Level ◽

Cognitive Problems ◽

Teaching Situations ◽

Model Approach

The Earth’s shape concept develops as consecutive cognitive problems (e.g., the location of people and trees on the spherical Earth) are gradually resolved. Establishing the order of problem solving may be important for the organisation of teaching situations. This study attempted to determine the sequence of problems to be resolved based on tasks included in the EARTH2 test. The study covered a group of 444 children between 5 and 10 years of age. It captured the order in which children solve cognitive problems on the way to constructing a science-like concept. The test results were compared with previous studies. The importance of cultural influences connected to significant differences (24%) in test results was emphasised. Attention was drawn to the problem of the consistency of the mental model approach highlighted in the literature. The analysis of the individual sets of answers provided a high level of consistency of indications referring to the same model (36%), emphasising the importance of the concept of mental models.

Hybrid-Data Approach for Estimating Trip Purposes

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/03611981211018474 ◽

2021 ◽

pp. 036119812110184

Author(s):

Xiaoling Luo ◽

Adrian Cottam ◽

Yao-Jan Wu ◽

Yangsheng Jiang

Keyword(s):

Transportation Systems ◽

Data Driven ◽

Trip Purpose ◽

Additional Information ◽

Point Of Interest ◽

Average Accuracy ◽

Hybrid Data ◽

Added Benefit ◽

Data Source ◽

High Level

Trip purpose information plays a significant role in transportation systems. Existing trip purpose information is traditionally collected through human observation. This manual process requires many personnel and a large amount of resources. Because of this high cost, automated trip purpose estimation is more attractive from a data-driven perspective, as it could improve the efficiency of processes and save time. Therefore, a hybrid-data approach using taxi operations data and point-of-interest (POI) data to estimate trip purposes was developed in this research. POI data, an emerging data source, was incorporated because it provides a wealth of additional information for trip purpose estimation. POI data, an open dataset, has the added benefit of being readily accessible from online platforms. Several techniques were developed and compared to incorporate this POI data into the hybrid-data approach to achieve a high level of accuracy. To evaluate the performance of the approach, data from Chengdu, China, were used. The results show that the incorporation of POI information increases the average accuracy of trip purpose estimation by 28% compared with trip purpose estimation not using the POI data. These results indicate that the additional trip attributes provided by POI data can increase the accuracy of trip purpose estimation.

Generation of an EDS Key Based on a Graphic Image of a Subject’s Face Using the RC4 Algorithm

Information ◽

10.3390/info12010019 ◽

2021 ◽

Vol 12 (1) ◽

pp. 19

Author(s):

Alexey Semenkov ◽

Dmitry Bragin ◽

Yakov Usoltsev ◽

Anton Konev ◽

Evgeny Kostuchenko

Keyword(s):

Image Recognition ◽

Random Number ◽

Statistical Test ◽

The Other ◽

Test Results ◽

Random Generation ◽

Facial Image ◽

Random Number Generators ◽

Graphic Image ◽

High Level

Modern facial recognition algorithms make it possible to identify system users by their appearance with a high level of accuracy. In such cases, an image of the user’s face is converted to parameters that later are used in a recognition process. On the other hand, the obtained parameters can be used as data for pseudo-random number generators. However, the closeness of the sequence generated by such a generator to a truly random one is questionable. This paper proposes a system which is able to authenticate users by their face, and generate pseudo-random values based on the facial image that will later serve to generate an encryption key. The generator of a random value was tested with the NIST Statistical Test Suite. The subsystem of image recognition was also tested under various conditions of taking the image. The test results of the random value generator show a satisfactory level of randomness, i.e., an average of 0.47 random generation (NIST test), with 95% accuracy of the system as a whole.
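The idea of deriving pseudo-random key material from facial parameters with RC4 can be illustrated with a rough sketch. The abstract does not describe the actual pipeline, so everything around the RC4 core is an assumption: the `features` vector is hypothetical, hashing it with SHA-256 to obtain the RC4 key is a choice made here, and a real system would first need to quantize the parameters so that repeated captures of the same face give identical input.

```python
import hashlib
import struct


def rc4_keystream(key: bytes, n: int) -> bytes:
    """Return the first n bytes of the RC4 keystream for `key`."""
    if not 1 <= len(key) <= 256:
        raise ValueError("RC4 key must be 1 to 256 bytes long")
    # Key-scheduling algorithm (KSA).
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA).
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


def key_from_features(features, n_bytes: int = 16) -> bytes:
    """Hypothetical helper: hash a feature vector and expand it with RC4."""
    packed = struct.pack(f"<{len(features)}d", *features)  # little-endian doubles
    seed = hashlib.sha256(packed).digest()                 # assumed seeding step
    return rc4_keystream(seed, n_bytes)


# Invented feature values standing in for real facial parameters.
print(key_from_features([0.12, 0.50, 0.33]).hex())
```

The output is deterministic for a given feature vector, which is what allows the same face to reproduce the same key; its statistical quality would still have to be checked with a suite such as the NIST tests mentioned in the abstract.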

Incremental maintenance of length normalized indexes for approximate string matching

Proceedings of the 35th SIGMOD international conference on Management of data - SIGMOD '09 ◽

10.1145/1559845.1559891 ◽

2009 ◽

Cited By ~ 23

Author(s):

Marios Hadjieleftheriou ◽

Nick Koudas ◽

Divesh Srivastava

Keyword(s):

String Matching ◽

Approximate String Matching ◽

Incremental Maintenance

Bearing Capacity Characteristic of Unsaturated Granular Soils

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.261-263.989 ◽

2011 ◽

Vol 261-263 ◽

pp. 989-993 ◽

Cited By ~ 4

Author(s):

Anuchit Uchaipichat ◽

Ekachai Man Koksung

Keyword(s):

Bearing Capacity ◽

Matric Suction ◽

Ultimate Bearing Capacity ◽

Experimental Program ◽

Granular Soils ◽

Test Results ◽

High Level ◽

Unsaturated Granular Soils ◽

Capacity Characteristic

An experimental program of laboratory bearing tests was performed to characterize the bearing capacity of foundations on unsaturated granular soils. All tests were performed by pushing a circular rod into the surface of compacted sand specimens with different values of matric suction until failure. The test results show that the ultimate bearing capacity increases with matric suction at low suction values but decreases at high suction levels. The test results are compared with simulations using the expressions proposed in this paper, and the comparisons are discussed. Good agreement is achieved for all tested values of suction.
