Large Scale Text Classification Using Map Reduce and Naive Bayes Algorithm for Domain Specified Ontology Building

Author(s):  
Joan Santoso ◽  
Eko Mulyanto Yuniarno ◽  
Mochamad Hariadi
2019 ◽  
Vol 26 (1) ◽  
pp. 1-12 ◽  
Author(s):  
Peng Liu ◽  
Hui-han Zhao ◽  
Jia-yu Teng ◽  
Yan-yan Yang ◽  
Ya-feng Liu ◽  
...  

Author(s):  
Jonathan Radot Fernando ◽  
Raymond Budiraharjo ◽  
Emeraldi Haganusa

Text classification is used in many areas of technology, such as spam classification, news categorization, and auto-correct in texting. One of the most popular algorithms for text classification today is Multinomial Naïve Bayes. This paper explains how the Naïve Bayes assumption is applied to classify YouTube comments on the 2019 Indonesian election. The algorithm predicts whether a comment is spam or not spam. Spam messages are defined as racist comments, advertising comments, and unsolicited comments. The text is represented with the bag-of-words method, which defines a text as the multiset of its words. The algorithm then calculates the probability of a word given the class (spam or not spam). The main difference between the standard Naïve Bayes algorithm and Multinomial Naïve Bayes is the way the algorithm treats the data: Multinomial Naïve Bayes treats the data as frequency counts, which makes it suitable for text classification tasks.
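The bag-of-words representation and the per-class word probabilities described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy comments, the function names `train_multinomial_nb` and `predict`, and the use of add-one (Laplace) smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Collect class counts and per-class word frequencies from tokenized docs."""
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for tokens, label in zip(docs, labels):
        # Bag-of-words: each document is treated as a multiset of its tokens.
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return {"class_counts": class_counts, "word_counts": word_counts,
            "vocab": vocab, "alpha": alpha}

def predict(model, tokens):
    """Return the class with the highest log posterior for the given tokens."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior P(class)
        for word in tokens:
            if word not in model["vocab"]:
                continue  # words never seen in training are ignored
            # Smoothed P(word | class); alpha is an assumed smoothing constant.
            score += math.log((counts[word] + alpha) /
                              (total_words + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data, not taken from the paper's dataset.
docs = [["buy", "cheap", "followers", "now"],
        ["cheap", "promo", "click", "link"],
        ["great", "debate", "tonight"],
        ["the", "candidate", "spoke", "well"]]
labels = ["spam", "spam", "not_spam", "not_spam"]
model = train_multinomial_nb(docs, labels)
print(predict(model, ["cheap", "promo", "now"]))
print(predict(model, ["debate", "tonight"]))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing term keeps a word that is absent from one class from forcing that class's probability to zero.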


2020 ◽  
Vol 5 (3) ◽  
pp. 302
Author(s):  
Imam Cholissodin ◽  
Diajeng Sekar Seruni ◽  
Junda Alfiah Zulqornain ◽  
Audi Nuermey Hanafi ◽  
Afwan Ghofur ◽  
...  

Big Data App is a framework that we developed based on our previous research project and published on GitHub. It builds on EdUBig, a lightweight, serverless open-source Hadoop distribution that runs on both Windows and Linux. This study focuses on the difficulty of building frontend and backend models for a Big Data application, which by default only runs scripts through a terminal console. This is a considerable obstacle for end users once a Big Data application has been released to the general public, and it also raises the question of how end users can test the performance of the Map Reduce Naive Bayes algorithm on several datasets. To address these problems, we created the Big Data App framework to make it easier for end users, especially developers, to build a Big Data application. The frontend integrates a web app built with the Django framework and a native mobile app. The backend uses the Django framework, which can communicate directly with Hadoop batch, streaming processing, or Spark streaming scripts, as well as scripts for Pig, Hive, WebHDFS, Sqoop, Oozie, and others, all of which can be set up quickly with reliable results. The test results show that the framework makes data computation markedly easier for end users, and the highest classification accuracy obtained was 88.3576%.
Keywords: big data, map reduce of naive bayes, serverless, web and mobile app, restful api, django framework
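To make the idea of a Map Reduce Naive Bayes job concrete, the following sketch simulates the map, shuffle, and reduce phases in a single Python process. It is an illustration only and does not reproduce the Big Data App or EdUBig code: the names `mapper`, `shuffle`, `reducer`, and `run_job`, the key layout, and the toy records are all assumptions. On a real Hadoop cluster the mapper and reducer would run as separate distributed tasks.

```python
from collections import defaultdict

def mapper(record):
    """Emit count pairs for one labeled document (label, text)."""
    label, text = record
    yield (("doc", label), 1)               # one document seen for this class
    for word in text.lower().split():
        yield (("word", label, word), 1)    # one occurrence of word in class

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    """Sum the partial counts for a single key."""
    return key, sum(values)

def run_job(records):
    """Run map, shuffle, and reduce over all records and return the counts."""
    pairs = (pair for record in records for pair in mapper(record))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())

# Hypothetical toy records, not one of the datasets used in the study.
records = [("sports", "goal goal match"), ("tech", "chip match")]
counts = run_job(records)
print(counts)
```

The resulting document and word counts are the sufficient statistics a Naive Bayes classifier needs, which is why the training step maps naturally onto a counting job of this kind.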


2021 ◽  
Vol 5 (1) ◽  
pp. 157
Author(s):  
Samsir Samsir ◽  
Ambiyar Ambiyar ◽  
Unung Verawardina ◽  
Firman Edi ◽  
Ronal Watrianthos

The WHO announced that more than 52 million people had tested positive for Covid-19 and 1.2 million had died as of the second week of November 2020. Meanwhile, Indonesia recorded 463 thousand confirmed positive cases and 15,148 deaths. Socialization has been incorporated into the strategy against the pandemic. However, online learning, adopted as one such technique, became controversial because of the short adaptation period. The sudden, large-scale transition from face-to-face learning to online learning has produced a wide range of social reactions. This research focuses on public opinion on online learning during the COVID-19 pandemic in Indonesia in early November 2020. The analysis was carried out by mining document-based text from Twitter and classifying it with the Naïve Bayes algorithm. The results show that sentiment toward online learning over the period was 30 percent positive, 69 percent negative, and 1 percent neutral. The large share of negative sentiment reflects community dissatisfaction with online learning. Some tweets indicate disappointment, with the words 'stress' and 'lazy' among the high-frequency words in the conversation.


2021 ◽  
Vol 4 (1) ◽  
pp. 47-52
Author(s):  
Saptari Wijaya Mulia ◽  
Sujiharno Sujiharno ◽  
Arief Wibowo

The amount of cash each ATM needs usually differs, which is one of the problems in managing ATM cash allocation. Seasonal factors such as holidays, as well as the implementation of transitional large-scale social restrictions related to the Covid-19 pandemic, can affect fluctuations in cash transactions. This paper aims to determine the frequency of cash withdrawals at ATMs since the enactment of transitional large-scale social restrictions in Jakarta using the Naive Bayes algorithm, so that ATMs requiring a larger cash allocation can be identified. Providing the right cash allocation can improve the quality of service to customers and minimize unused cash in ATMs. The analysis using the Naive Bayes algorithm predicts cash withdrawal frequencies at ATMs with an accuracy of up to 81%.


2020 ◽  
Vol 4 (2) ◽  
pp. 362-369
Author(s):  
Sharazita Dyah Anggita ◽  
Ikmah

Community demand for freight forwarding services is increasing with the growth of online marketplaces. Users express their opinions about freight forwarding services through many channels, one of which is the social media platform Twitter. Sentiment analysis makes it possible to see whether an opinion tends to be positive or negative. Methods that can be applied to sentiment analysis include the Naive Bayes algorithm and the Support Vector Machine (SVM). This research implements both algorithms, optimized using the particle swarm optimization (PSO) algorithm, for sentiment analysis. Testing is done by setting the PSO parameters for each classifier. The results show an accuracy increase of 15.11% for the PSO-based Naive Bayes algorithm and an accuracy improvement of 1.74% for the PSO-based SVM algorithm with the sigmoid kernel.


2020 ◽  
Vol 4 (3) ◽  
pp. 504-512
Author(s):  
Faried Zamachsari ◽  
Gabriel Vangeran Saragih ◽  
Susafa'ati ◽  
Windu Gata

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. The still-high poverty rate and the country's difficult finances are factors in the disapproval of the relocation of the national capital. Twitter, as one of the popular social media platforms, is used by the public to express these opinions. The goal of this study is to determine the tendency of community responses to the move of the national capital, and to determine how to perform public opinion sentiment analysis on the move using the Naive Bayes algorithm and Support Vector Machine with feature selection so as to obtain the highest accuracy. The sentiment analysis data are Indonesian-language public opinions crawled from tweets on Twitter. The search terms used are #IbuKotaBaru and #PindahIbuKota. The research stages consist of collecting data from Twitter, polarity labeling, and preprocessing, which comprises case transformation, cleansing, tokenizing, filtering, and stemming. Feature selection is then applied to increase accuracy, and the data are divided into training and testing sets according to predetermined ratios. The next step is a comparison between the Support Vector Machine and Naive Bayes methods to determine which is more accurate. In the data period studied, 24.26% of the sentiment related to the move to a new capital city was positive and 75.74% was negative. Using RapidMiner software, the best accuracy for Naive Bayes with feature selection was 88.24% at a ratio of 9:1, while the best accuracy for Support Vector Machine with feature selection was 78.77% at a ratio of 5:5.
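The preprocessing stages and the training-to-testing ratios mentioned in this abstract can be sketched roughly as follows. The study itself used RapidMiner, so this Python version is only an illustrative approximation: the `STOPWORDS` list, the regular expressions, the example tweet, and the helper names `preprocess` and `split_by_ratio` are assumptions. Stemming is omitted because it would require an Indonesian stemmer.

```python
import random
import re

# Hypothetical, deliberately tiny Indonesian stopword list for illustration.
STOPWORDS = {"yang", "dan", "di", "ke", "ini"}

def preprocess(tweet):
    """Apply case transformation, cleansing, tokenizing, and filtering."""
    text = tweet.lower()                        # transform case
    text = re.sub(r"https?://\S+", " ", text)   # cleansing: remove URLs
    text = re.sub(r"[@#]\w+", " ", text)        # cleansing: mentions, hashtags
    text = re.sub(r"[^a-z\s]", " ", text)       # cleansing: punctuation, digits
    tokens = text.split()                       # tokenizing on whitespace
    return [t for t in tokens if t not in STOPWORDS]  # filtering stopwords

def split_by_ratio(items, train_parts, test_parts, seed=0):
    """Shuffle items and split them, e.g. 9:1 or 5:5, into train and test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)       # fixed seed for repeatability
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical example tweet, not taken from the study's data.
print(preprocess("Setuju pindah ke Kalimantan! #PindahIbuKota https://t.co/x"))
train, test = split_by_ratio(range(10), 9, 1)
print(len(train), len(test))
```

Removing URLs before stripping punctuation keeps fragments of a link from surviving as stray tokens, so the order of the cleansing steps matters.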

