IMPLEMENTATION OF THE JACCARD INDEX AND N-GRAMS IN ENGINEERING AN INDONESIAN-LANGUAGE WORD CORRECTION APPLICATION

Sebatik, 2018, Vol 22 (2), pp. 95-101
Author(s): Aida Indriani, Muhammad Muhammad, Suprianto Suprianto, Hadriansa Hadriansa

The abundance of information across many media requires users to be careful when searching for correct information. Information counts as correct not only when it comes from a trusted source, but also when it is written without spelling errors (typos), which can cause readers to misunderstand its meaning. Minimizing spelling errors requires an editor to check the words one by one. The aim of this study is to build an automatic word correction application using a text mining technique, namely a set-based similarity measure. The technique used is the Jaccard index, aided by three N-gram features: bigram, trigram, and quadgram. The study also aims to determine which N-gram feature is most suitable for word correction. The application is expected to help editorial teams check words before publication. The analysis found that the bigram feature is the most suitable N-gram feature for word correction.
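The correction idea above can be sketched in Python: rank dictionary words by the Jaccard index between their character N-gram sets and the set of the misspelled word. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names `char_ngrams`, `jaccard`, and `suggest` are hypothetical, as are the lowercasing, the handling of words shorter than n, and the four-word dictionary.

```python
def char_ngrams(word: str, n: int) -> set[str]:
    """Return the set of contiguous character n-grams of `word` (lowercased).

    Assumption: a word shorter than n is kept whole as its only gram,
    so it still has a non-empty feature set.
    """
    word = word.lower()
    if len(word) < n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|, taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest(word: str, dictionary: list[str], n: int = 2,
            top_k: int = 3) -> list[tuple[str, float]]:
    """Rank dictionary entries by the Jaccard index of their n-gram sets."""
    grams = char_ngrams(word, n)
    scored = [(entry, jaccard(grams, char_ngrams(entry, n)))
              for entry in dictionary]
    # Highest similarity first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical mini-dictionary; a real tool would load a full Indonesian word list.
    kamus = ["informasi", "informatika", "formasi", "inspirasi"]
    for n in (2, 3, 4):  # bigram, trigram, quadgram
        print(n, suggest("infromasi", kamus, n=n, top_k=1))
```

With this toy list, bigrams and trigrams both rank "informasi" first for the typo "infromasi". Quadgrams rank "formasi" first, because the transposed letters leave only one shared four-character gram with the intended word. This only shows why shorter grams can tolerate transpositions better. It does not reproduce the study's evaluation.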

Author(s): Matthias Schonlau, Nick Guenther, Ilia Sucholutsky

2019, Vol 8 (4), pp. 1232-1238
Author(s): Daud Mohamad, Noorlisa Sara Adlene Ramlan, Sharifah Aniza Sayed Ahmad

A similarity measure between two fuzzy sets is an important tool for comparing various characteristics of the fuzzy sets. It is preferred over distance methods because the defuzzification process used to obtain the distance between fuzzy sets incurs a loss of information. Many similarity measures have been introduced, but most of them cannot discriminate certain types of fuzzy numbers. This paper proposes an improvised similarity measure for generalized fuzzy numbers that incorporates several essential features: geometric mean averaging, Hausdorff distance, distance between elements, distance between centers of gravity, and the Jaccard index. The new similarity measure is validated on several benchmark sample sets. It is consistent with other existing methods and can also solve some discrimination problems that other methods cannot. The paper presents and discusses an analysis of these advantages. The proposed similarity measure can be incorporated into decision-making procedures in a fuzzy environment for ranking purposes.
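Two of the listed features, the center of gravity and the distance between centers of gravity, can be illustrated numerically. The sketch below makes several assumptions:

- The input is a generalized trapezoidal fuzzy number (a1, a2, a3, a4; w) with a piecewise-linear membership function.
- The center of gravity is the geometric centroid of the region under that function, approximated with the midpoint rule.

Papers in this literature often use closed-form centroid formulas that may differ from this one. The sketch therefore illustrates the concept, not the measure proposed in the paper. The function names are hypothetical.

```python
import math


def membership(x: float, a1: float, a2: float, a3: float, a4: float,
               w: float = 1.0) -> float:
    """Membership grade of x in the generalized trapezoidal fuzzy number
    (a1, a2, a3, a4; w): rises linearly on [a1, a2], equals w on [a2, a3],
    and falls linearly on [a3, a4]."""
    if x < a1 or x > a4:
        return 0.0
    if x < a2:  # only reachable when a2 > a1, so no division by zero
        return w * (x - a1) / (a2 - a1)
    if x <= a3:
        return w
    return w * (a4 - x) / (a4 - a3)  # here a3 < x <= a4, so a4 > a3


def center_of_gravity(a1, a2, a3, a4, w=1.0, samples=10_000):
    """Approximate the centroid (x*, y*) of the region under the membership
    function with the midpoint rule on the support [a1, a4]."""
    if a4 <= a1 or w <= 0:
        raise ValueError("need a1 < a4 and w > 0")
    h = (a4 - a1) / samples
    area = moment_x = moment_y = 0.0
    for i in range(samples):
        x = a1 + (i + 0.5) * h
        mu = membership(x, a1, a2, a3, a4, w)
        area += mu * h                  # approximates ∫ μ(x) dx
        moment_x += x * mu * h          # approximates ∫ x μ(x) dx
        moment_y += 0.5 * mu * mu * h   # approximates ∫ μ(x)² / 2 dx
    return moment_x / area, moment_y / area


def cog_distance(A, B):
    """Euclidean distance between the centers of gravity of two GTFNs,
    each given as a tuple (a1, a2, a3, a4, w)."""
    (xa, ya), (xb, yb) = center_of_gravity(*A), center_of_gravity(*B)
    return math.hypot(xa - xb, ya - yb)


if __name__ == "__main__":
    A = (0.1, 0.2, 0.3, 0.4, 1.0)
    B = (0.1, 0.25, 0.3, 0.4, 0.8)
    print(center_of_gravity(*A), cog_distance(A, B))
```

A numerical centroid keeps the sketch short and handles degenerate shapes such as triangles (a2 = a3) without special cases.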



2018, Vol 12 (3), pp. 1
Author(s): DAHIWALE PRASHANT, MATE SANJAY, M.M RAGHUWANSHI, ...

2019, pp. 1-9
Author(s): Maryam Rahimian, Jeremy L. Warner, Sandeep K. Jain, Roger B. Davis, Jessica A. Zerillo, ...

PURPOSE: OpenNotes is a national movement established in 2010 that gives patients access to their visit notes through online patient portals, and its goal is to improve transparency and communication. To determine whether granting patients access to their medical notes will have a measurable effect on provider behavior, we developed novel methods to quantify changes in the length and frequency of use of n-grams (sets of words used in exact sequence) in the notes.

METHODS: We analyzed 102,135 notes of 36 hematology/oncology clinicians before and after the OpenNotes debut at Beth Israel Deaconess Medical Center. We applied methods to quantify changes in the length and frequency of use of sequential co-occurrence of words (n-grams) in the unstructured content of the notes by unsupervised hierarchical clustering and proportional analysis of n-grams.

RESULTS: The number of significant n-grams averaged over all providers did not change, but for individual providers, there were significant changes. That is, all significant observed changes were provider specific. We identified eight providers who were late note signers. This group significantly reduced its late signing behavior after OpenNotes implementation.

CONCLUSION: Although the number of significant n-grams averaged over all providers did not change, our text-mining method detected major content changes in specific providers’ documentation at the n-gram level. The method successfully identified a group of providers who decreased their late note signing behavior.
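Counting word n-grams, the raw material for the proportional analysis described above, takes only a few lines. The sketch assumes lowercasing and whitespace tokenization. `proportion_shift` is a hypothetical helper that reports how each n-gram's share of all n-grams changes between two groups of notes. It does not reproduce the study's hierarchical clustering or significance testing.

```python
from collections import Counter


def word_ngrams(text: str, n: int) -> list[tuple[str, ...]]:
    """Every run of n consecutive tokens in `text`, in order.
    Assumption: lowercase + whitespace tokenization (punctuation is kept)."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_frequencies(notes: list[str], n: int) -> Counter:
    """Total count of each word n-gram across a collection of notes."""
    counts: Counter = Counter()
    for note in notes:
        counts.update(word_ngrams(note, n))
    return counts


def proportion_shift(before_notes: list[str], after_notes: list[str],
                     n: int) -> dict[tuple[str, ...], float]:
    """Hypothetical helper: change in each n-gram's share of all n-grams
    (after minus before). Positive values mean the n-gram became more common."""
    before = ngram_frequencies(before_notes, n)
    after = ngram_frequencies(after_notes, n)
    total_before = sum(before.values()) or 1  # avoid dividing by zero
    total_after = sum(after.values()) or 1
    return {g: after[g] / total_after - before[g] / total_before
            for g in before.keys() | after.keys()}
```

Working with shares rather than raw counts keeps the comparison meaningful when one provider simply writes more notes in one period than in the other.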


Author(s): Chao-Ming Hwang, Miin-Shen Yang

Similarity measures between generalized trapezoidal fuzzy numbers (GTFNs) are employed to indicate the degrees of similarity between GTFNs. Although several similarity measures of GTFNs have been proposed in the literature, none has considered using the Jaccard index. In general, the Jaccard index is a statistic used for comparing the similarity and diversity of sample sets. This paper presents a new similarity measure between GTFNs, which involves the Jaccard index. The proposed similarity measure is found to have better properties. Several examples are employed to compare the proposed measure with some existing methods. An experiment is performed using 15 sets of GTFNs to compare the proposed similarity measure with existing ones. Numerical results show that the proposed measure is more reasonable than those existing methods.
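For reference, the first formula below is the Jaccard index of two finite sets A and B. The second is the min/max generalization commonly used for fuzzy sets with membership functions μ_A and μ_B over a universe X. The fuzzy form is the usual textbook generalization and is an assumption here. The abstract does not state exactly how the proposed GTFN measure incorporates the index.

```latex
J(A,B) = \frac{|A \cap B|}{|A \cup B|}, \qquad 0 \le J(A,B) \le 1

J_f(A,B) = \frac{\int_X \min\bigl(\mu_A(x), \mu_B(x)\bigr)\,dx}{\int_X \max\bigl(\mu_A(x), \mu_B(x)\bigr)\,dx}
```

When the membership functions take only the values 0 and 1, the fuzzy form reduces to the crisp one, with the integrals acting as set sizes.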


Author(s): PRADNYA S. RANDIVE, NITIN N. PISE

In text mining, most techniques depend on statistical analysis of terms. Statistical analysis identifies important terms only within a single document. In contrast, this concept-based mining model analyzes terms at the sentence, document, and corpus levels. The model consists of sentence-based, document-based, and corpus-based concept analysis and a concept-based similarity measure. Experimental results show that text clustering quality improves when sentence, document, and corpus concept analysis are used, both individually and in combination.


2020, Vol 18 (2), pp. 31
Author(s): Anni Karimatul Fauziyyah

The impact of the novel coronavirus (COVID-19) is widespread and will likely shape community behavior for months to come. The humanitarian and safety-related aspects of this outbreak are top of mind globally. Even so, social distancing, quarantining, and staying home will unquestionably have a significant effect on media consumption, which could rise by up to 60%, according to recent research from Nielsen’s U.S. media team. Social media is now part of everyday life for most consumers who engage with the world digitally, and it became the primary source of buzz about all things COVID-19 as worries and news intensified. This study applies sentiment analysis to the opinions, feelings, and interests of individuals regarding COVID-19. Its purpose is to classify individual feelings in facing COVID-19, such as sadness, happiness, or panic, into negative, positive, or neutral sentiment levels. This paper presents an open-source approach in which tweets are collected from the Twitter API and then preprocessed, analyzed, and visualized using Python. The streamed Twitter data are processed and cleaned so that each tweet can be classified by opinion with a text mining algorithm using the TextBlob Python library. Features describing the relationships between words are extracted with the bigram and N-gram methods.
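The cleaning, bigram, and labeling steps can be sketched as below. In the pipeline the abstract describes, the polarity score would come from TextBlob, for example `TextBlob(text).sentiment.polarity`, which ranges from -1.0 to 1.0. To keep the sketch dependency-free, it takes that score as an input. The regular expressions, the zero thresholds in `label`, and the helper names are assumptions, not details from the paper.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# After lowercasing, anything that is not an ASCII letter, digit, or whitespace
# (including '#', so hashtag words are kept) becomes a space.
OTHER_RE = re.compile(r"[^a-z0-9\s]")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, @mentions, and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = OTHER_RE.sub(" ", text)
    return " ".join(text.split())


def bigrams(tokens: list[str]) -> list[tuple[str, str]]:
    """Adjacent word pairs, e.g. ["stay", "home", "now"] gives two bigrams."""
    return list(zip(tokens, tokens[1:]))


def label(polarity: float) -> str:
    """Map a polarity score in [-1.0, 1.0] (such as TextBlob's) to a level.
    Assumption: above 0 is positive, below 0 is negative, exactly 0 is neutral."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    tweet = "Stay home! #COVID19 @who https://t.co/abc"
    tokens = clean_tweet(tweet).split()
    print(tokens, bigrams(tokens))
```

URLs are removed before punctuation so that the whole link is dropped rather than left behind as fragments like "t co abc".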

