Development of Big Data App for Classification based on Map Reduce of Naive Bayes with or without Web and Mobile Interface by RESTful API Using Hadoop and Spark

2020 ◽  
Vol 5 (3) ◽  
pp. 302
Author(s):  
Imam Cholissodin ◽  
Diajeng Sekar Seruni ◽  
Junda Alfiah Zulqornain ◽  
Audi Nuermey Hanafi ◽  
Afwan Ghofur ◽  
...  

Big Data App is a framework that we developed from our previous research project and uploaded to GitHub. That project develops a lightweight serverless environment for both Windows and Linux under the name EdUBig, an open source Hadoop distribution. This study addresses the difficulty of building frontend and backend models for a Big Data application, which by default only runs scripts through a console in the terminal. That limitation becomes a serious obstacle once a Big Data application is released to general end users, who also need a way to test the performance of the Map Reduce Naive Bayes algorithm on several datasets. To address these problems, we created the Big Data App framework to make it easier for users, especially developers, to build a Big Data application. The frontend integrates a Web App built with the Django framework and a native Mobile App. The backend uses the Django framework, which communicates directly with Hadoop batch, stream processing, or Spark streaming scripts, and can also run scripts for Pig, Hive, WebHDFS, Sqoop, Oozie, and others; building such an application this way is fast and gives reliable results. The test results show that end users could carry out data computation considerably more easily, and the highest classification accuracy obtained was 88.3576%.

Keywords: big data, map reduce of naive bayes, serverless, web and mobile app, restful api, django framework
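The abstract does not show how Naive Bayes is expressed as a Map Reduce job, so the following is only a minimal single-process sketch of the usual idea: the map step emits one count per class and per (class, feature, value) combination, and the reduce step sums them. The function names, the toy records, and the Laplace smoothing in `predict` are assumptions for illustration, not the authors' implementation, and nothing here calls Hadoop or Spark.

```python
import math
from itertools import groupby

def mapper(record):
    """Map step: emit (key, 1) pairs for one training record (features, label)."""
    features, label = record
    yield ("class", label), 1
    for index, value in enumerate(features):
        yield ("feature", label, index, value), 1

def shuffle(pairs):
    """Group emitted pairs by key, standing in for the framework's shuffle phase."""
    ordered = sorted(pairs, key=lambda pair: pair[0])
    for key, group in groupby(ordered, key=lambda pair: pair[0]):
        yield key, [count for _, count in group]

def reducer(key, counts):
    """Reduce step: sum the counts collected for one key."""
    return key, sum(counts)

def train(records):
    """Run map, shuffle, and reduce in one process and return the count table."""
    pairs = [pair for record in records for pair in mapper(record)]
    return dict(reducer(key, counts) for key, counts in shuffle(pairs))

def predict(counts, features, labels, n_values):
    """Pick the label with the highest log-posterior, using Laplace smoothing.

    n_values[i] is the number of distinct values feature i can take.
    """
    total = sum(counts[("class", label)] for label in labels)
    scores = {}
    for label in labels:
        class_count = counts[("class", label)]
        score = math.log(class_count / total)
        for index, value in enumerate(features):
            seen = counts.get(("feature", label, index, value), 0)
            score += math.log((seen + 1) / (class_count + n_values[index]))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy data: (outlook, temperature) -> play
RECORDS = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "cool"), "yes"),
    (("sunny", "cool"), "yes"),
]
COUNTS = train(RECORDS)
```

Because the reducer only adds integers, the same mapper and reducer logic could in principle be split across many workers; a call such as `predict(COUNTS, ("rainy", "cool"), ["yes", "no"], [2, 3])` then classifies a new record from the merged count table.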

2021 ◽  
Vol 5 (1) ◽  
pp. 32
Author(s):  
Hartatik Hartatik

<p>Abstract:</p><p><em>Predicting students' graduation status is a persistent problem in higher education. In the era of Big Data, it is important for higher education institutions to predict the academic behavior of active students so that they can estimate whether students are likely to complete their studies on time and identify preventive steps when planning programs. One approach is data mining with the Naive Bayes algorithm, one of the methods used to predict student graduation. In this study, the researcher applied Naive Bayes using the cumulative grade point average (GPA) as the parameter and compared it with Naive Bayes prediction based on GPA together with social parameters, namely gender and residence status. The study thus starts from an academic parameter and optimizes the model with social parameters attached to each student. In the evaluation, the Naive Bayes method using GPA alone reached an accuracy of 75%, while the prediction model with social parameters reached 85%, a difference of 10 percentage points.</em></p>
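The two models being compared can be written side by side. This is the generic Naive Bayes factorization, not a formula taken from the paper; the symbols are assumptions introduced here: y is the graduation status, g the GPA category, s the gender, and r the residence status.

```latex
\begin{align*}
\text{GPA only:}\quad
  & P(y \mid g) \propto P(y)\,P(g \mid y) \\
\text{GPA and social parameters:}\quad
  & P(y \mid g, s, r) \propto P(y)\,P(g \mid y)\,P(s \mid y)\,P(r \mid y) \\
\text{Prediction:}\quad
  & \hat{y} = \arg\max_{y}\; P(y)\prod_{j} P(x_j \mid y)
\end{align*}
```

The second model only adds factors to the first, so any accuracy gain comes from whatever information gender and residence status carry about the outcome beyond GPA, under the assumption that the features are conditionally independent given y.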


Repositor ◽  
2020 ◽  
Vol 2 (2) ◽  
pp. 193
Author(s):  
Khoirir Rosikin ◽  
Setio Basuki ◽  
Yufis Azhar

Abstract: Health is a primary human need. Indonesia faces a growing burden of both infectious and non-communicable diseases, which calls for preventive measures. One way to prevent disease is to know about it, including its causes and effects. Such information can be obtained in many ways, one of which is social media, especially Twitter. Twitter is used because the large volume of tweets it generates gives rise to a big data phenomenon. This research therefore applies information extraction, a text mining technique within data mining for obtaining information from large collections of data. The information targeted here is diseases, their effects, and their causes. The research takes a classification-based approach to information extraction, using 7 feature sets and a single classification algorithm, Naive Bayes. Feature extraction produced an imbalanced dataset, so the data were resampled with a filter. Testing used two methods: model testing with 10-fold cross-validation, and classification testing on 100 test items. Model testing yielded an accuracy of 77.27%, and classification testing yielded 74.07%.
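The evaluation setup described here (rebalancing an imbalanced dataset, then 10-fold cross-validation) can be sketched with the standard library alone. The abstract does not say which resampling filter was used, so the random oversampling below is a stand-in, and both function names are hypothetical.

```python
import random

def oversample(labels, seed=0):
    """Return row indices in which minority classes are repeated at random
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    target = max(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        balanced.extend(rows)
        balanced.extend(rng.choice(rows) for _ in range(target - len(rows)))
    return balanced

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Indices are shuffled once, dealt into k folds, and each fold serves
    as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold
```

In practice the model is trained on each `train` list and scored on the matching test fold, and the k accuracies are averaged. Oversampling is normally applied inside each training fold rather than before splitting, so that repeated rows do not leak into the test fold; whether the study did this is not stated.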


2020 ◽  
Vol 4 (2) ◽  
pp. 362-369
Author(s):  
Sharazita Dyah Anggita ◽  
Ikmah

Community demand for freight forwarding services has been growing with the rise of online marketplaces. Users express opinions about these services through many channels, one of which is the social media platform Twitter. Sentiment analysis shows whether an opinion tends to be positive or negative. Methods applicable to sentiment analysis include the Naive Bayes algorithm and the Support Vector Machine (SVM). This research implements both algorithms, optimized with the Particle Swarm Optimization (PSO) algorithm, for sentiment analysis. Testing is done by setting the PSO parameters for each classifier. The results show an accuracy increase of 15.11% for the PSO-based Naive Bayes algorithm and an increase of 1.74% for the PSO-based SVM algorithm with the sigmoid kernel.
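For readers unfamiliar with PSO, a minimal sketch of the basic algorithm follows. It minimizes an arbitrary objective over box bounds; in a setting like the one described, the objective would be something like a classifier's validation error as a function of its parameters or feature weights. The abstract gives no PSO settings, so the inertia and acceleration coefficients, the function names, and the toy objective are all assumptions.

```python
import random

def pso(objective, bounds, n_particles=20, n_iterations=100,
        inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Minimize objective(position) with a basic global-best particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [objective(p) for p in positions]
    best_index = min(range(n_particles), key=personal_score.__getitem__)
    global_best = personal_best[best_index][:]
    global_score = personal_score[best_index]

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull toward the particle's own
                # best position, and pull toward the swarm's best position.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                # Move, then clamp the coordinate to its bounds.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            score = objective(positions[i])
            if score < personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score < global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score

def toy_error(params):
    """Hypothetical stand-in for a validation error with its minimum at (2.0, 0.5)."""
    c, gamma = params
    return (c - 2.0) ** 2 + (gamma - 0.5) ** 2
```

Calling `pso(toy_error, [(0.0, 10.0), (0.0, 1.0)])` returns the best position found and its score. With a real classifier, each objective evaluation means training and validating a model, so the swarm size and iteration count directly set the computational cost.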


2020 ◽  
Vol 4 (3) ◽  
pp. 504-512
Author(s):  
Faried Zamachsari ◽  
Gabriel Vangeran Saragih ◽  
Susafa'ati ◽  
Windu Gata

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. A still-high poverty rate and strained state finances are among the reasons for disapproval of the relocation. Twitter, one of the popular social media platforms, is used by the public to express these opinions. This study aims to determine the tendency of public responses to the relocation of the national capital, and to perform sentiment analysis of public opinion using the Naive Bayes algorithm and the Support Vector Machine with feature selection in order to obtain the highest accuracy. The data are Indonesian-language tweets collected from Twitter by crawling, using the search terms #IbuKotaBaru and #PindahIbuKota. The research stages consist of data collection from Twitter, polarity labeling, and preprocessing comprising case transformation, cleansing, tokenizing, filtering, and stemming. Feature selection is applied to increase accuracy, and the data are then divided into training and testing sets at predetermined ratios. The Support Vector Machine and Naive Bayes methods are then compared to determine which is more accurate. In the period covered by the data, 24.26% of sentiment about the relocation was positive and 75.74% was negative. Using RapidMiner, the best accuracy for Naive Bayes with feature selection was 88.24% at a 9:1 ratio, while the best accuracy for the Support Vector Machine with feature selection was 78.77% at a 5:5 ratio.
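The preprocessing stages and the ratio-based split can be illustrated with a short standard-library sketch. The study used RapidMiner operators, so this is not its pipeline: the cleansing rules, the four-word stopword list, and the function names are assumptions, and stemming is left out because it needs an Indonesian stemmer that the standard library does not provide.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"yang", "di", "ke", "dan"}

def preprocess(tweet, stopwords=STOPWORDS):
    """Apply transform case, cleansing, tokenizing, and filtering to one tweet."""
    text = tweet.lower()                                # transform case
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)   # cleansing: URLs, mentions, hashtags
    text = re.sub(r"[^a-z\s]", " ", text)               # cleansing: digits and punctuation
    tokens = text.split()                               # tokenizing
    return [t for t in tokens if t not in stopwords]    # filtering (stopword removal)

def split_by_ratio(items, train_parts, test_parts):
    """Split items into training and testing portions, e.g. 9:1 or 5:5."""
    cut = len(items) * train_parts // (train_parts + test_parts)
    return items[:cut], items[cut:]
```

For example, `split_by_ratio(tweets, 9, 1)` keeps the first nine tenths for training. The items should be shuffled beforehand if their order carries any pattern, such as collection date.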

