Features of Distributional Method for Indonesian Word Clustering
2019 ◽ Vol 5 (2) ◽ pp. 164
We describe the results of a study to determine the best features for the EWSB (Extended Word Similarity Based) algorithm. EWSB is a word clustering algorithm that can be used for any language with a common feature. We propose four alternative features for word similarity computation and run experiments on Indonesian to determine the best feature format for the language. We find that the best feature for EWSB on Indonesian is the t w w' format (3-gram) with 0 (zero) word relation. Moreover, 3-grams perform better than 4-grams for all the proposed features: the average recall is 83.50% for 3-grams and 57.25% for 4-grams.
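To make the idea of n-gram features for word similarity concrete, here is a minimal Python sketch. It is not the EWSB algorithm and does not reproduce the paper's t w w' format, which the abstract does not define. It assumes, purely for illustration, that a word is described by the words that follow it inside an n-gram window, and that two words are compared by the cosine of their context counts. The function names `context_features` and `cosine` and the toy sentence are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_features(tokens, n=3):
    """Map each word to counts of the (n-1)-word contexts that follow it.

    Assumption for illustration only: the feature of a word is the tuple of
    the n-1 words to its right. The paper's own feature formats may differ.
    """
    features = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        word = tokens[i]
        context = tuple(tokens[i + 1:i + n])
        features[word][context] += 1
    return features


def cosine(a, b):
    """Cosine similarity between two Counters of context counts."""
    dot = sum(count * b[ctx] for ctx, count in a.items() if ctx in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy Indonesian token stream (hypothetical data, not from the study).
tokens = "saya makan nasi kamu makan nasi saya minum air".split()
features = context_features(tokens, n=3)

# "saya" and "kamu" are both followed by "makan nasi", so they share a context.
print(round(cosine(features["saya"], features["kamu"]), 3))   # 0.707
# "saya" and "makan" share no following context.
print(cosine(features["saya"], features["makan"]))            # 0.0
```

Passing `n=4` builds the corresponding 4-gram contexts. Longer contexts match less often on a small corpus, which is one plausible reason, though not one stated in the abstract, for a lower recall with 4-grams.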