string matching
Recently Published Documents


TOTAL DOCUMENTS

1272
(FIVE YEARS 159)

H-INDEX

51
(FIVE YEARS 3)

Author(s):  
Mohd Kamir Yusof ◽  
Wan Mohd Amir Fazamin Wan Hamzah ◽  
Nur Shuhada Md Rusli

The coronavirus COVID-19 is affecting 196 countries and territories around the world, and the number of deaths keeps increasing each day. According to the World Health Organization (WHO), the number of COVID-19 infections is rising daily and has now reached 570,000. WHO prefers to conduct COVID-19 screening tests via online systems. A suitable approach, especially string matching based on symptoms, is required to produce fast and accurate results during the retrieval process. Four recent approaches have been applied to string matching: character-based algorithms, hashing algorithms, suffix automaton algorithms and hybrid algorithms. Meanwhile, extensible markup language (XML), JavaScript object notation (JSON), asynchronous JavaScript and XML (AJAX) and jQuery technologies are widely used for data transmission, data storage and data retrieval. This paper proposes combining the hybrid algorithm with JSON and jQuery in order to produce fast and accurate results during the COVID-19 screening process. Several experiments were conducted to compare performance in terms of execution time and memory usage across five different collections of datasets. Based on the experiments, the results show that the hybrid approach performs better than JSON and jQuery alone. Online COVID-19 screening will hopefully reduce the number of infections and deaths caused by COVID-19.
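The abstract's core idea of screening by matching patient symptoms against stored records can be sketched as follows. This is a minimal illustration, not the paper's implementation; the record data, field names, and matching rule are all hypothetical, and the stored records use JSON as the abstract proposes.

```python
import json

# Hypothetical symptom records, stored as JSON as in the proposed system.
records_json = json.dumps([
    {"id": 1, "symptoms": ["fever", "dry cough", "fatigue"]},
    {"id": 2, "symptoms": ["headache", "sore throat"]},
])

def screen(records_json, query_symptoms):
    """Return the ids of records sharing at least one symptom with the query."""
    records = json.loads(records_json)
    query = {s.lower() for s in query_symptoms}
    return [r["id"] for r in records
            if query & {s.lower() for s in r["symptoms"]}]

matches = screen(records_json, ["Fever", "chills"])
```

A real system would combine this retrieval step with the fuzzy matching algorithms the paper compares, so that misspelled symptom names still match.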


2021 ◽  
Author(s):  
◽  
David X. Wang

<p>In this thesis, we will tackle the problem of how keyphrase extraction systems can be evaluated to reveal their true efficacy. The aim is to develop a new semantically-oriented approximate string matching criterion, one that is comparable to human judgements but without the cost and energy associated with manual evaluation. This matching criterion can also be adapted for any information retrieval (IR) system where the evaluation process involves comparing candidate strings (produced by the IR system) to a gold standard (created by humans). Our contributions are threefold. First, we define a new semantic relationship called substitutability – how suitable a phrase is when used in place of another – and then design a generic system which measures this relationship by exploiting the interlinking structure of external knowledge sources. Second, we develop two concrete substitutability systems based on our generic design: WordSub, which is backed by WordNet; and WikiSub, which is backed by Wikipedia. Third, we construct a dataset, with the help of human volunteers, that isolates the task of measuring substitutability. This dataset is then used to evaluate the performance of our substitutability systems, along with existing approximate string matching techniques, by comparing them using a set of agreement metrics. Our results clearly demonstrate that WordSub and WikiSub comfortably outperform current approaches to approximate string matching, including both lexically based methods, such as R-precision, and semantically-oriented techniques, such as METEOR. In fact, WikiSub's performance comes reasonably close to that of an average human volunteer when compared against the optimistic (best-case) inter-human agreement.</p>
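The lexical baselines that WordSub and WikiSub are compared against reduce to surface-string similarity between candidate and gold keyphrases. A minimal sketch of such a baseline, using Python's standard-library `difflib` (the phrases and threshold are illustrative, not from the thesis):

```python
from difflib import SequenceMatcher

# Hypothetical candidate and gold keyphrases; the thesis's WordSub/WikiSub
# instead consult external knowledge sources (WordNet, Wikipedia).
gold = ["information retrieval", "string matching"]
candidates = ["information retreival", "graph theory"]

def lexical_match(candidate, gold_phrases, threshold=0.9):
    """Approximate lexical matching: true if any gold phrase is close enough."""
    return any(SequenceMatcher(None, candidate, g).ratio() >= threshold
               for g in gold_phrases)

hits = [c for c in candidates if lexical_match(c, gold)]
```

A purely lexical criterion like this catches the misspelling but would miss a semantically equivalent phrase such as "document search", which is exactly the gap substitutability is meant to close.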


2021 ◽  
Vol 14 (11) ◽  
pp. 6711-6740
Author(s):  
Ranee Joshi ◽  
Kavitha Madaiah ◽  
Mark Jessell ◽  
Mark Lindsay ◽  
Guillaume Pirot

Abstract. A huge amount of legacy drilling data is available in geological surveys but cannot be used directly, as it is compiled and recorded in unstructured textual form and in different formats depending on the database structure, company, logging geologist, investigation method, investigated materials and/or drilling campaign. These data are subjective and plagued by uncertainty, as they are likely to have been logged by tens to hundreds of geologists, all of whom have their own personal biases. dh2loop (https://github.com/Loop3D/dh2loop, last access: 30 September 2021) is an open-source Python library for extracting and standardizing geologic drill hole data and exporting them into readily importable interval tables (collar, survey, lithology). In this contribution, we extract, process and classify lithological logs from the Geological Survey of Western Australia (GSWA) Mineral Exploration Reports (WAMEX) database in the Yalgoo–Singleton greenstone belt (YSGB) region. The contribution also addresses the subjective nature and variability of the nomenclature of lithological descriptions within and across different drilling campaigns by using thesauri and fuzzy string matching. For this case study, 86 % of the extracted lithology data are successfully matched to lithologies in the thesauri. Since this process can be tedious, we also tested string matching on the free-text comments, which resulted in a matching rate of 16 % (7870 successfully matched records out of 47 823 records). The standardized lithological data are then classified into multi-level groupings that can be used to systematically upscale and downscale drill hole data inputs for multiscale 3D geological modelling. dh2loop thus formats legacy data, bridging the gap between the utilization of legacy drill hole data and the drill hole analysis functionalities available in existing Python libraries (lasio, welly, striplog).
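The thesaurus-plus-fuzzy-matching step described above can be illustrated with a short sketch. The thesaurus entries, raw log term, and cutoff here are hypothetical; dh2loop's actual thesauri and matcher are far more elaborate.

```python
from difflib import get_close_matches

# Hypothetical slice of a lithology thesaurus.
thesaurus = ["granite", "basalt", "dolerite", "sandstone"]

def standardize(raw_term, vocabulary, cutoff=0.8):
    """Map a raw lithology term to its closest thesaurus entry, if any."""
    hits = get_close_matches(raw_term.lower(), vocabulary, n=1, cutoff=cutoff)
    return hits[0] if hits else None

result = standardize("Granitte", thesaurus)  # misspelled field-log entry
```

Terms that fall below the cutoff return no match and would be routed to manual curation, which mirrors how unmatched records are handled in a standardization pipeline.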


2021 ◽  
Vol 1 (2) ◽  
pp. 87-95
Author(s):  
Nur Aini Rakhmawati ◽  
Miftahul Jannah

Open Food Facts provides a database of food products, including product names, compositions, and additives, where anyone can contribute new data or reuse existing data. The Open Food Facts data are noisy and need to be processed before being stored in our system. To reduce redundancy in the food ingredients data, we measure the similarity of ingredients using two measures: conceptual similarity and textual similarity. Conceptual similarity measures the similarity between two entries by word meaning (synonymy), while textual similarity is based on fuzzy string matching, namely Levenshtein distance, Jaro-Winkler distance, and Jaccard distance. Based on our evaluation, the combination of textual similarity and WordNet (conceptual) similarity was the most effective similarity method for food ingredients.
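Two of the three textual measures the abstract names can be written out compactly; Jaro-Winkler is omitted here for brevity. This sketch uses standard definitions (Levenshtein via dynamic programming, Jaccard distance over character sets); the test strings are illustrative, not from the dataset.

```python
def levenshtein(a, b):
    """Classic edit distance via a rolling-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def jaccard(a, b):
    """Jaccard distance over character sets: 1 - |A∩B| / |A∪B|."""
    sa, sb = set(a), set(b)
    return 1 - len(sa & sb) / len(sa | sb)

d1 = levenshtein("sugar", "suggar")
d2 = jaccard("salt", "salt")
```

Combining such textual scores with a WordNet-based conceptual score, as the paper reports, lets near-duplicate spellings and true synonyms both be collapsed.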


2021 ◽  
Author(s):  
Timothée Poisot ◽  
Rory Gibb ◽  
Sadie Jane Ryan ◽  
Colin Carlson

NCBITaxonomy.jl is a package designed to facilitate the reconciliation and cleaning of taxonomic names using a local copy of the NCBI taxonomic backbone (Federhen 2012, Schoch et al. 2020). The basic search functions are coupled with quality-of-life functions, including case-insensitive search and custom fuzzy string matching, to maximize the amount of information that can be extracted automatically while allowing efficient manual curation and inspection of results. NCBITaxonomy.jl works with version 1.6 of the Julia programming language (Bezanson et al. 2017) and relies on the Apache Arrow format to store a local copy of the NCBI raw taxonomy files. The design of NCBITaxonomy.jl has been inspired by similar efforts, like the R package taxadb (Norman et al. 2020), which provides an offline alternative to packages like taxize (Chamberlain and Szöcs 2013).
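The kind of case-insensitive, fuzzy name reconciliation the package provides can be sketched language-agnostically. This Python sketch is not NCBITaxonomy.jl's API; the name table, function name, and cutoff are all hypothetical stand-ins for a lookup against the full NCBI backbone.

```python
from difflib import get_close_matches

# Hypothetical two-row slice of a taxonomy name table (name -> taxid).
names = {"Homo sapiens": 9606, "Mus musculus": 10090}

def taxid(query, table, cutoff=0.85):
    """Case-insensitive, fuzzy lookup of a taxon name to its identifier."""
    lowered = {n.lower(): n for n in table}
    hit = get_close_matches(query.lower(), lowered, n=1, cutoff=cutoff)
    return table[lowered[hit[0]]] if hit else None

tid = taxid("homo sapien", names)  # truncated, lowercased spelling
```

Returning nothing below the cutoff, rather than the nearest name regardless of distance, is what makes such a tool safe to combine with manual inspection of the unmatched remainder.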


2021 ◽  
Vol 9 (2) ◽  
pp. 168-175
Author(s):  
Sebastianus A S Mola ◽  
Meiton Boru ◽  
Emerensye Sofia Yublina Pandie

Written communication on social media, which prioritizes the speed of information dissemination, frequently exhibits non-standard language use at the sentence, clause, phrase and word levels. As a data source, social media affected by this phenomenon poses a challenge for information extraction. Normalizing non-standard language into standard language begins with word normalization, in which a non-standard word (NSW) is normalized into its standard word (SW) form. Normalization using edit distance is limited by its static weighting of the mismatch, match, and gap values. When computing the mismatch value, static weighting cannot assign different weights to errors caused by pressing the wrong key on a keyboard, especially adjacent keys. Because of this limitation of edit distance weighting, this study proposes a dynamic weighting method for the mismatch weight. The result of this study is a new dynamic weighting method, based on keyboard key positions, that can be used to normalize NSWs using approximate string matching.
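The keyboard-position idea can be sketched as an edit distance whose substitution cost drops for adjacent keys. This is an illustration of the general technique, not the paper's method: the neighbour table covers only a few QWERTY keys and the 0.5 discount is an invented, illustrative weight.

```python
# Hypothetical QWERTY neighbourhoods: a mismatch between adjacent keys
# costs less than one between distant keys (weights are illustrative).
NEIGHBOURS = {
    "a": "qwsz", "s": "awedxz", "d": "serfcx", "e": "wsdr",
    "r": "edft", "t": "rfgy", "o": "iklp", "i": "ujko",
}

def sub_cost(a, b):
    """Cheaper substitution when the two keys sit next to each other."""
    if a == b:
        return 0.0
    return 0.5 if b in NEIGHBOURS.get(a, "") else 1.0

def weighted_edit_distance(a, b):
    """Edit distance with the dynamic mismatch weight above; gaps cost 1."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [float(i)]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1.0,                      # deletion
                           cur[j - 1] + 1.0,                   # insertion
                           prev[j - 1] + sub_cost(ca, cb)))    # substitution
        prev = cur
    return prev[-1]

d = weighted_edit_distance("tidak", "todak")  # 'i'/'o' are adjacent keys
```

Under this weighting, a typo on a neighbouring key pulls the misspelled form closer to its standard word than an arbitrary substitution would, which is the behaviour the proposed dynamic mismatch weight is designed to produce.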

