Relating the Average-Case Costs of the Brute-Force and Knuth-Morris-Pratt String Matching Algorithm

Author(s):  
Gerhard Barth
2021 ◽  
Vol 1 (2) ◽  
pp. 54-60
Author(s):  
Candra Irawan ◽  
Mudafiq Riyan Pratama

String matching, also known as text search, is the task of finding one text (the pattern) within another. Several algorithms can be used for string matching, including the Boyer-Moore algorithm and the Brute Force algorithm. The Boyer-Moore algorithm is a string matching algorithm published by Robert S. Boyer and J. Strother Moore in 1977, and it is considered the most efficient algorithm in general applications. It compares characters starting from the right end of the pattern. The Brute Force algorithm, by contrast, tries the pattern at every text position from 0 to n-m, where n is the text length and m is the pattern length, to determine whether the pattern occurs in the text. The two algorithms therefore follow different search procedures. In this article, the performance of the Boyer-Moore and Brute Force algorithms is compared in a case study of search in an Android-based Big Indonesian Dictionary (KBBI) application. Searches are performed on both words and word descriptions. The results indicate that, in terms of running time, the Brute Force algorithm is faster than the Boyer-Moore algorithm: its total running time was 168.3 ms for words and 6994.16 ms for word descriptions, whereas the Boyer-Moore algorithm took 304.7 ms for words and 8654.77 ms for word descriptions. On the related-keywords criterion, the two algorithms display the same list of related keywords.
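The two search procedures described in this abstract can be sketched as follows. This is a minimal illustration and not the code used in the study: the function names are hypothetical, and the second function applies only the bad-character rule, which is a simplification of the full Boyer-Moore algorithm.

```python
def brute_force_search(text, pattern):
    """Try the pattern at every shift 0..n-m, comparing left to right."""
    n, m = len(text), len(pattern)
    matches = []
    for s in range(n - m + 1):
        j = 0
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:  # all m characters matched at shift s
            matches.append(s)
    return matches


def bad_character_search(text, pattern):
    """Simplified Boyer-Moore: right-to-left comparison, bad-character rule only."""
    n, m = len(text), len(pattern)
    if m == 0:
        return list(range(n + 1))  # same convention as brute_force_search
    # Index of the rightmost occurrence of each character in the pattern.
    last = {ch: i for i, ch in enumerate(pattern)}
    matches = []
    s = 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and text[s + j] == pattern[j]:
            j -= 1
        if j < 0:
            matches.append(s)
            s += 1  # conservative shift after a full match
        else:
            # Align the mismatched text character with its rightmost
            # occurrence in the pattern, moving at least one position.
            s += max(1, j - last.get(text[s + j], -1))
    return matches


if __name__ == "__main__":
    print(brute_force_search("abracadabra", "abra"))
    print(bad_character_search("abracadabra", "abra"))
```

Both functions return the same list of match positions; they differ only in how many character comparisons and shifts they need, which is what running-time comparisons such as the one above measure.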


2019 ◽  
Vol 5 (2) ◽  
pp. 140
Author(s):  
Rachmad Fitriyanto ◽  
Anton Yudhana ◽  
Sunardi Sunardi

Management of jpeg/exif file fingerprint with Brute Force string matching algorithm and Hash Function SHA256

Current methods of securing jpeg/exif image files cover only the prevention aspect, not the detection of data integrity. The Digital Signature Algorithm (DSA) is a cryptographic method used to verify data integrity using a hash value. SHA256 is a hash function that produces a 256-bit hash value, which serves as a file fingerprint. This study aimed to compile a file fingerprint from jpeg/exif files using SHA256 and the Brute Force string matching algorithm in order to verify the integrity of jpeg/exif files. The research was conducted in five steps. The first step was identifying the jpeg/exif file structure. The second step was acquiring the segment content. The third step was calculating the hash values. The fourth step was the jpeg/exif file modification experiment. The fifth step was selecting the elements and compiling the file fingerprint. The results showed that a jpeg/exif file fingerprint is composed of three hash values. The hash value of the SOI (Start of Image) segment is used to detect file modification in the form of a change of file type or the addition of objects to the image content. The hash value of the APP1 segment is used to detect modification of the file metadata. The hash value of the SOF0 segment is used to detect images modified by recoloring, resizing, and cropping techniques.
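The idea of locating a segment by brute-force matching and then hashing it can be sketched as below. This is an illustration under stated assumptions, not the authors' procedure: the helper names are hypothetical, only the length-prefixed APP1 and SOF0 segments are handled, and the search simply takes the first occurrence of the marker bytes. How the study delimits the SOI segment content is not stated in the abstract, so it is left out here.

```python
import hashlib

# Standard JPEG marker codes for the two length-prefixed segments used here.
MARKERS = {"APP1": b"\xff\xe1", "SOF0": b"\xff\xc0"}


def find_bytes(data, pattern):
    """Brute-force search: return the first index of pattern in data, or -1."""
    n, m = len(data), len(pattern)
    for s in range(n - m + 1):
        if all(data[s + j] == pattern[j] for j in range(m)):
            return s
    return -1


def segment_hash(data, name):
    """SHA-256 hex digest of a segment (marker + length + payload), or None."""
    pos = find_bytes(data, MARKERS[name])
    if pos < 0 or pos + 4 > len(data):
        return None
    # The two-byte big-endian length counts itself but not the marker.
    length = int.from_bytes(data[pos + 2:pos + 4], "big")
    segment = data[pos:pos + 2 + length]
    return hashlib.sha256(segment).hexdigest()


def fingerprint(data):
    """Hypothetical fingerprint: one hash per segment name."""
    return {name: segment_hash(data, name) for name in MARKERS}


if __name__ == "__main__":
    # Synthetic bytes laid out like a tiny JPEG; not a real image.
    sample = (b"\xff\xd8"
              + b"\xff\xe1" + (6).to_bytes(2, "big") + b"Exif"
              + b"\xff\xc0" + (5).to_bytes(2, "big") + b"\x08\x00\x10"
              + b"\xff\xd9")
    print(fingerprint(sample))
```

Comparing the fingerprint of a file before and after editing shows which segment changed: in this sketch, altering the metadata bytes changes only the APP1 hash. A production implementation would walk the segment chain from the start of the file instead of searching for marker bytes, since the same byte pair can also occur inside a payload.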


2020 ◽  
pp. 298-324
Author(s):  
Abdulrakeeb M. Al-Ssulami ◽  
Hassan I. Mathkour ◽  
Mohammed Amer Arafah

Exact string matching is essential in application areas such as bioinformatics and intrusion detection systems. Speeding up the string matching algorithm therefore accelerates searching in DNA and binary data. Two types of fast algorithms already exist: bit-parallel algorithms and hashing algorithms. Bit-parallel algorithms are efficient on short patterns (length less than 64) but slow on long patterns. Hashing algorithms, on the other hand, have optimal sublinear average-case behavior on large alphabets and long patterns, but are less efficient on small alphabets such as DNA and binary texts. In this paper, the authors present a hybrid algorithm that overcomes the shortcomings of these earlier algorithms. The proposed algorithm is based on q-gram hashing and guarantees the maximal shift in advance. Experimental results on random data and the complete human genome confirm that the proposed algorithm is efficient across various pattern lengths and on small alphabets.
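To illustrate what a q-gram shift table looks like, here is a minimal sketch of a generic Horspool-style search keyed on q-grams. It is not the hybrid algorithm proposed by the authors, whose details are not given in the abstract; the function name is hypothetical, and a Python dictionary keyed by the q-gram itself stands in for a hash table.

```python
def qgram_search(text, pattern, q=3):
    """Horspool-style search that shifts on the last q characters of the window."""
    n, m = len(text), len(pattern)
    if q < 1 or m < q:
        raise ValueError("pattern must contain at least q characters")

    # shift[g] = distance from the rightmost occurrence of q-gram g in the
    # pattern to the pattern's end; later occurrences overwrite earlier ones.
    shift = {}
    for end in range(q - 1, m):
        shift[pattern[end - q + 1:end + 1]] = m - 1 - end
    default = m - q + 1  # safe shift when the q-gram is absent from the pattern

    matches = []
    j = m - 1  # text index aligned with the last pattern character
    while j < n:
        s = shift.get(text[j - q + 1:j + 1], default)
        if s == 0:
            # The window ends with the pattern's final q-gram: verify fully.
            if text[j - m + 1:j + 1] == pattern:
                matches.append(j - m + 1)
            j += 1  # conservative shift after a candidate
        else:
            j += s
    return matches


if __name__ == "__main__":
    print(qgram_search("ACGTACGTTACGT", "ACGT", q=2))
```

On a four-letter alphabet such as DNA, single characters almost always occur somewhere in the pattern, so shifts stay short; reading q characters at a time makes absent q-grams more likely and the average shift longer.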


2017 ◽  
Vol 13 (4) ◽  
pp. 198-220
Author(s):  
Abdulrakeeb M. Al-Ssulami ◽  
Hassan Mathkour ◽  
Mohammed Amer Arafah

Exact string matching is essential in application areas such as bioinformatics and intrusion detection systems. Speeding up the string matching algorithm therefore accelerates searching in DNA and binary data. Two types of fast algorithms already exist: bit-parallel algorithms and hashing algorithms. Bit-parallel algorithms are efficient on short patterns (length less than 64) but slow on long patterns. Hashing algorithms, on the other hand, have optimal sublinear average-case behavior on large alphabets and long patterns, but are less efficient on small alphabets such as DNA and binary texts. In this paper, the authors present a hybrid algorithm that overcomes the shortcomings of these earlier algorithms. The proposed algorithm is based on q-gram hashing and guarantees the maximal shift in advance. Experimental results on random data and the complete human genome confirm that the proposed algorithm is efficient across various pattern lengths and on small alphabets.


2015 ◽  
Vol 27 (2) ◽  
pp. 143-156 ◽  
Author(s):  
TANVER ATHAR ◽  
CARL BARTON ◽  
WIDMER BLAND ◽  
JIA GAO ◽  
COSTAS S. ILIOPOULOS ◽  
...  

Circular string matching is a problem which naturally arises in many contexts. It consists in finding all occurrences of the rotations of a pattern of length m in a text of length n. There exist optimal worst- and average-case algorithms for circular string matching. Here, we present a suboptimal average-case algorithm for circular string matching requiring time $\mathcal{O}(n)$ and space $\mathcal{O}(m)$. The importance of our contribution is underlined by the fact that the proposed algorithm can be easily adapted to deal with circular dictionary matching. In particular, we show how the circular dictionary-matching problem can be solved in average-case time $\mathcal{O}(n + M)$ and space $\mathcal{O}(M)$, where M is the total length of the dictionary patterns, assuming that the shortest pattern is sufficiently long. Moreover, the presented average-case algorithms and other worst-case approaches were also implemented. Experimental results, using real and synthetic data, demonstrate that the implementation of the presented algorithms can accelerate the computations by more than a factor of two compared to the corresponding implementation of other approaches.
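The problem statement can be made concrete with a naive baseline. The sketch below is not the average-case algorithm of the paper and makes no attempt at its time bound; it relies only on the fact that a string of length m is a rotation of the pattern exactly when it occurs in the pattern concatenated with itself. The function name is hypothetical.

```python
def circular_matches(text, pattern):
    """Return every start index where some rotation of pattern occurs in text."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    doubled = pattern + pattern  # contains every rotation as a length-m substring
    return [i for i in range(len(text) - m + 1) if text[i:i + m] in doubled]


if __name__ == "__main__":
    # The rotations of "abc" are "abc", "bca", and "cab".
    print(circular_matches("xbcay", "abc"))
```

Each window costs a substring search in a string of length 2m, so this baseline is far from linear in n; it is useful mainly as a reference for checking faster implementations.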


Author(s):  
Fince Tinus Waruwu ◽  
Putri Ramadhani

A translator is a tool that assists in translating from one language to another. A translator is very useful for tourists traveling to countries or regions whose language they do not understand. It is also very useful for students who want to learn foreign languages they do not yet understand. Making a translator available at a university improves services for students and others who need it, since not all students understand foreign languages. String matching, often called string search, is a technique for searching within a text. It is often used in text search, location search, dictionaries, and so on. There are several string matching algorithms, including brute force, Boyer-Moore, Knuth-Morris-Pratt, and many others. The Indonesian-to-English translator application uses the brute force string matching algorithm, which matches each character from left to right.

Keywords: Translator, String Matching, Algorithm, Brute Force


2012 ◽  
Vol 2012 ◽  
pp. 1-8 ◽  
Author(s):  
Anis Zouaghi ◽  
Mounir Zrigui ◽  
Georges Antoniadis ◽  
Laroussi Merhbene

We propose a new approach for determining the appropriate sense of Arabic words. To that end, we propose an algorithm based on information retrieval measures that identifies the context of use closest to the sentence containing the word to be disambiguated. The contexts of use are sets of sentences, each indicating a particular sense of the ambiguous word. These contexts are generated using the words that define the senses of the ambiguous words, the exact string-matching algorithm, and the corpus. We use measures employed in the domain of information retrieval (Harman, Croft, and Okapi), combined with the Lesk algorithm, to select the correct sense from among those proposed.
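The selection step can be pictured with a simplified Lesk-style overlap count. This sketch is only an illustration of the general idea: it scores each sense by plain word overlap, not by the Harman, Croft, or Okapi measures used in the paper, and the function name, glosses, and English example are hypothetical.

```python
def simplified_lesk(context_words, sense_glosses):
    """Pick the sense whose gloss shares the most words with the context."""
    context = {w.lower() for w in context_words}
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:  # ties keep the earlier sense
            best_sense, best_overlap = sense, overlap
    return best_sense


if __name__ == "__main__":
    glosses = {
        "finance": "an institution that accepts money deposits",
        "river": "sloping land beside a river",
    }
    sentence = "he deposited money at the bank".split()
    print(simplified_lesk(sentence, glosses))
```

Replacing the raw overlap count with a weighted retrieval score is the kind of refinement the abstract describes, since it keeps frequent, uninformative words from dominating the comparison.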

