Similarity-Based Unsupervised Spelling Correction Using BioWordVec: Development and Usability Study of Bacterial Culture and Antimicrobial Susceptibility Reports

10.2196/25530 ◽  
2021 ◽  
Vol 9 (2) ◽  
pp. e25530
Author(s):  
Taehyeong Kim ◽  
Sung Won Han ◽  
Minji Kang ◽  
Se Ha Lee ◽  
Jong-Ho Kim ◽  
...  

Background: Existing bacterial culture test results for infectious diseases are written in unrefined text, resulting in many problems, including typographical errors and stop words. Effective spelling correction processes are needed to ensure the accuracy and reliability of data for the study of infectious diseases, including medical terminology extraction. If a dictionary is established, spelling algorithms using edit distance are efficient. However, in the absence of a dictionary, traditional spelling correction algorithms that rely only on edit distance have limitations. Objective: In this research, we propose a similarity-based spelling correction algorithm using pretrained word embedding with the BioWordVec technique. This method uses a character-level N-gram-based distributed representation learned without supervision rather than the existing rule-based method. In other words, we propose a framework that detects and corrects typographical errors when a dictionary is not in place. Methods: For detected typographical errors not mapped to Systematized Nomenclature of Medicine (SNOMED) clinical terms, a group of correction candidates with high similarity, also taking edit distance into account, was generated using pretrained word embedding from the clinical database. From the embedding matrix, in which the vocabulary is arranged in descending order of frequency, a grid search was used to find candidate groups of similar words. The correction candidates were then ranked by word frequency, and the typographical errors were corrected according to that ranking. Results: Bacterial identification words were extracted from 27,544 bacterial culture and antimicrobial susceptibility reports, and 16 types of spelling errors and 914 misspelled words were found. The proposed similarity-based spelling correction algorithm using BioWordVec corrected 12 types of typographical errors and achieved an F1 score of 97.48% across all spelling errors. Conclusions: This tool corrected spelling errors effectively in the absence of a dictionary, based on bacterial identification words in bacterial culture and antimicrobial susceptibility reports. This method will help build a high-quality refined database from the vast text data in electronic health records.
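The pipeline described in the Methods (similar words from an embedding, filtered by edit distance, then ranked by frequency) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the toy vectors, the frequency table, and the thresholds `min_sim` and `max_dist` are all hypothetical, and the sketch assumes the misspelled word already has a vector (which a subword-based embedding could provide).

```python
import math
from typing import Dict, List, Optional


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def correct(word: str,
            vectors: Dict[str, List[float]],
            freq: Dict[str, int],
            min_sim: float = 0.8,   # hypothetical threshold
            max_dist: int = 2       # hypothetical threshold
            ) -> Optional[str]:
    """Return the most frequent candidate that is both similar and close."""
    target = vectors[word]
    candidates = [
        w for w in vectors
        if w != word
        and cosine(target, vectors[w]) >= min_sim
        and edit_distance(word, w) <= max_dist
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda w: freq.get(w, 0))


# Toy data, invented for illustration only.
VECTORS = {
    "escherichia": [1.0, 0.1],
    "eschericha": [0.9, 0.2],   # misspelling
    "klebsiella": [0.1, 1.0],
}
FREQ = {"escherichia": 500, "eschericha": 2, "klebsiella": 300}

print(correct("eschericha", VECTORS, FREQ))
```

In the abstract the thresholds are chosen by grid search; here they are fixed by hand only to keep the example short.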

2020 ◽  
Author(s):  
Tae Hyeong Kim ◽  
Min Ji Kang ◽  
Se Ha Lee ◽  
Jong-Ho Kim ◽  
Hyung Joon Joo ◽  
...  

BACKGROUND Existing bacterial culture test results for infectious diseases are written in unrefined text, resulting in many problems, including typographical errors and stop words. Effective spelling correction processes are needed to ensure the accuracy and reliability of data for the study of infectious diseases, including medical terminology extraction. If a dictionary is established, spelling algorithms using edit distance are efficient. However, in the absence of a dictionary, traditional spelling correction algorithms that rely only on edit distance have limitations. OBJECTIVE In this research, we propose a similarity-based spelling correction algorithm using pre-trained word embedding with the BioWordVec technique. This method uses a character-level N-gram-based distributed representation learned without supervision rather than the existing rule-based method. In other words, we propose a framework that detects and corrects typographical errors when a dictionary is not in place. METHODS For detected typographical errors not mapped to SNOMED clinical terms, a group of correction candidates with high similarity, also taking edit distance into account, was generated using pre-trained word embedding from the clinical database. From the embedding matrix, in which the vocabulary is arranged in descending order of frequency, a grid search was used to find candidate groups of similar words. The correction candidates were then ranked by word frequency, and the typographical errors were corrected according to that ranking. RESULTS Bacterial identification words were extracted from 27,544 bacterial culture reports, and 914 spelling errors of 16 types were found. The proposed similarity-based spelling correction algorithm using BioWordVec corrected 12 types of typographical errors and showed very high performance, correcting 99.45% of all spelling errors. CONCLUSIONS This tool corrected spelling errors effectively in the absence of a dictionary, based on bacterial identification words in the bacterial culture reports. This method will help build a high-quality refined database from the vast text data in electronic health records.
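The "character-level N-gram-based distributed representation" mentioned in the Objective can be illustrated with a small sketch of subword extraction. It assumes a fastText-style scheme (word wrapped in `<` and `>`, n-gram lengths 3 to 6); the abstract does not state these settings, so they are assumptions, and the Jaccard overlap used here is only a stand-in to show why a misspelling shares most of its subwords with the intended word.

```python
from typing import Set


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> Set[str]:
    """Character n-grams of a word wrapped in boundary markers."""
    wrapped = f"<{word}>"
    return {
        wrapped[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(wrapped) - n + 1)
    }


def ngram_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the two words' n-gram sets (0.0 if both are empty)."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


# A misspelling shares many subwords with the intended word,
# while an unrelated word shares none.
print(ngram_overlap("aureus", "aureuss"))
print(ngram_overlap("aureus", "faecium"))
```

Because an embedding built from such subwords sums or averages their vectors, a word that was never seen in training can still receive a vector close to that of its correct spelling.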


The Analyst ◽  
2021 ◽  
Author(s):  
Pengfei Zhang ◽  
Aniruddha Kaushik ◽  
Kathleen E Mach ◽  
Kuangwen Hsieh ◽  
Joseph C. Liao ◽  
...  

The development of accelerated methods for pathogen identification (ID) and antimicrobial susceptibility testing (AST) for infectious diseases is necessary to facilitate evidence-based antibiotic therapy and reduce clinical overreliance on broad-spectrum...


1999 ◽  
Vol 37 (5) ◽  
pp. 1415-1418 ◽  
Author(s):  
Joan Barenfanger ◽  
Cheryl Drake ◽  
Gail Kacich

To assess the expected clinical and financial benefits of rapid reporting of microbiology results, we compared patients whose cultured samples were processed in the normal manner to patients whose samples were processed more rapidly due to a minor change in work flow. For the samples tested in the rapid-reporting time period, the vast majority of bacterial identification and antimicrobial susceptibility testing (AST) results were verified with the Vitek system on the same day that they were available. This time period was called rapid AST (RAST). For RAST, a technologist on the evening shift verified the data that became available during that shift. For the control time period, cultures were processed in the normal manner (normal AST [NAST]), which did not include evening-shift verification. For NAST, the results for approximately half of the cultures were verified on the first day that the result was available. The average turnaround time for the reporting of AST results was 39.2 h for RAST and 44.4 h for NAST (5.2 h faster for RAST [P = 0.001]). Subsequently, physicians were able to initiate appropriate antimicrobial therapy sooner for patients whose samples were tested as part of RAST (P = 0.006). The mortality rates were 7.9 and 9.6% for patients whose samples were tested as part of RAST and NAST, respectively (P = 0.45). The average length of stay was 10.7 days per patient for RAST and 12.6 days for NAST, a difference of 2.0 days less for RAST (P = 0.006). The average variable cost was $4,927 per patient for RAST and $6,677 for NAST, a difference of $1,750 less per patient for RAST (P = 0.001). This results in over $4 million in savings in variable costs per year in our hospital.
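The differences quoted in this abstract can be recomputed from the reported means. The short check below uses only the figures stated above; the dictionary layout and function name are illustrative. Note that the rounded length-of-stay means give 1.9 days, whereas the abstract reports 2.0 days, which would be consistent with the difference having been computed from unrounded means (an assumption, since the abstract does not say).

```python
from typing import Dict

# Means as reported in the abstract (RAST = rapid AST, NAST = normal AST).
REPORTED: Dict[str, Dict[str, float]] = {
    "turnaround_h": {"rast": 39.2, "nast": 44.4},
    "length_of_stay_days": {"rast": 10.7, "nast": 12.6},
    "variable_cost_usd": {"rast": 4927, "nast": 6677},
}


def nast_minus_rast(data: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Difference NAST - RAST for each measure, rounded to one decimal."""
    return {name: round(v["nast"] - v["rast"], 1) for name, v in data.items()}


for name, diff in nast_minus_rast(REPORTED).items():
    print(f"{name}: {diff}")
```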


2019 ◽  
Vol 4 (4) ◽  
pp. 144 ◽  
Author(s):  
Olga Perovic ◽  
Ali A. Yahaya ◽  
Crystal Viljoen ◽  
Jean-Bosco Ndihokubwayo ◽  
Marshagne Smith ◽  
...  

Background: In 2002, the World Health Organization (WHO) launched a regional microbiology external quality assessment (EQA) programme for national public health laboratories in the African region, initially targeting priority epidemic-prone bacterial diseases, and later including other common bacterial pathogens. Objectives: The aim of this study was to analyse the efficacy of an EQA programme as a laboratory quality system evaluation tool. Methods: We analysed the proficiency of laboratories’ performance of bacterial identification and antimicrobial susceptibility testing (AST) for the period 2011–2016. The National Institute for Communicable Diseases of South Africa provided technical coordination following an agreement with WHO, and supplied EQA samples of selected bacterial organisms for microscopy (Gram stain), identification, and AST. National public health laboratories, as well as laboratories involved in the Invasive Bacterial Diseases Surveillance Network, were enrolled by the WHO Regional Office for Africa to participate in the EQA programme. We analysed participants’ results of 41 surveys, which included the following organisms sent as challenges: Streptococcus pneumoniae, Haemophilus influenzae, Neisseria meningitidis, Salmonella Typhi, Salmonella Enteritidis, Shigella flexneri, Staphylococcus aureus, Streptococcus agalactiae, Streptococcus anginosus, Enterococcus faecium, Serratia marcescens, Acinetobacter baumannii, and Enterobacter cloacae. Results: Eighty-one laboratories from 45 countries participated. Overall, 76% of participants obtained acceptable scores for identification, but a substantial proportion of AST scores were not in the acceptable range. Of 663 assessed AST responses, only 42% had acceptable scores. Conclusion: In the African Region, implementation of diagnostic stewardship in clinical bacteriology is generally suboptimal. This report illustrates that AST is poorly done compared to microscopy and identification. It is critically important to make the case for implementation of quality assurance in AST, as it is the cornerstone of antimicrobial resistance surveillance reporting and implementation of the Global Antimicrobial Resistance Surveillance System.


2018 ◽  
Vol 15 (2) ◽  
pp. 92
Author(s):  
Umi Chuzaimah Chuzaimah Zulkifli

Social media data are now widely used for analysis, including sentiment analysis and other related analyses. In practice, data obtained from social media generally contain errors that affect the results of the analysis. These errors take the form of nonstandard words and misspellings. The proposed solution consists of word formalization and spell checking. To address this problem, a preprocessing module is built to handle these two kinds of error. The formalization method converts words to their formal form based on the KBBI, while spell checking uses spelling correction. Three spelling correction methods are considered: edit distance, bigram, and edit distance + rule. In addition to applying both methods, this study compares the results of the three spelling correction methods. The analysis shows that the edit distance + rule method achieves the highest accuracy, 83.39%, compared with the other two methods, edit distance and bigram. The edit distance + rule method is also the fastest of the three. Overall, converting words to their formal form based on the KBBI, together with spelling correction, was able to address the two problems above and thereby improve the accuracy of the analysis results.
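Of the three methods compared, plain edit distance against a dictionary is the simplest to sketch. The example below is a minimal illustration under stated assumptions: the three-word dictionary stands in for a KBBI-derived word list, `max_dist` is a hypothetical cutoff, and the additional "rule" component of the best-performing method is not reproduced because the abstract does not describe it.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def correct_with_dictionary(word: str,
                            dictionary: Iterable[str],
                            max_dist: int = 2) -> str:
    """Return the closest dictionary word, or the input if none is close enough."""
    words = set(dictionary)
    if word in words:
        return word
    # Ties on distance are broken alphabetically to keep the result deterministic.
    best = min(words, key=lambda w: (edit_distance(word, w), w), default=None)
    if best is None or edit_distance(word, best) > max_dist:
        return word
    return best


# Hypothetical mini-dictionary of Indonesian words, for illustration only.
DICTIONARY = ["tidak", "makan", "sudah"]

print(correct_with_dictionary("tdak", DICTIONARY))
```

Returning the input unchanged when no candidate is within `max_dist` avoids forcing unrelated tokens, such as names or slang, onto a dictionary word.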

