Netizen Tweet Prediction Using Random Forest, Decision Tree, Naïve Bayes, and Ensemble Algorithm

Author(s):  
Vivi Nadenia Harahap ◽  
Deci Irmayani ◽  
Syaiful Zuhri Harahap

The current Governor of DKI Jakarta, although elected in 2017, remains a constant subject of discussion and commentary. Comments come either directly from the media or through social media. Twitter is one of the social media platforms most often used to comment on the elected governor, and such comments can even become a trending topic on Twitter. The netizens who comment vary: some consistently tweet criticism, some comment positively, and some only retweet. This study predicts whether active netizens tend to post positive or negative comments. The algorithms used are Decision Tree, Naïve Bayes, Random Forest, and Ensemble. The Twitter data must be preprocessed before being processed in RapidMiner. Four experiments were run in RapidMiner, each dividing the data into testing and training sets, with testing:training ratios of 10%:90%, 20%:80%, 30%:70%, and 35%:65%. The average accuracy was 93.15% for Decision Tree, 91.55% for Naïve Bayes, 93.41% for Random Forest, and 93.42% for Ensemble.
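The abstract says only that the tweets were preprocessed before classification in RapidMiner. It does not list the preprocessing steps. The following is a minimal sketch of one common pipeline, under stated assumptions:

- Lowercase the tweet.
- Strip URLs, @mentions, the RT marker and non-letters.
- Classify with multinomial Naïve Bayes using Laplace (add-one) smoothing.

The function and class names and the toy tweets are hypothetical. This is not the authors' RapidMiner process.

```python
import math
import re
from collections import Counter

def preprocess(tweet):
    """Lowercase a tweet, strip URLs, @mentions, the RT marker and non-letters, then tokenize."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # links
    text = re.sub(r"@\w+", " ", text)          # @mentions
    text = re.sub(r"\brt\b", " ", text)        # retweet marker
    text = re.sub(r"[^a-z\s]", " ", text)      # punctuation, digits, '#'
    return text.split()

class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)

        def log_score(c):
            s = math.log(self.class_counts[c] / n_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # words never seen in training are ignored
                    s += math.log((self.word_counts[c][t] + 1) / (self.totals[c] + v))
            return s

        return max(self.classes, key=log_score)

# Hypothetical labelled tweets, for illustration only.
train = [
    ("Great job, governor! https://t.co/a", "positive"),
    ("Good policy, great work", "positive"),
    ("Bad flood policy @someone", "negative"),
    ("RT terrible traffic, bad job", "negative"),
]
model = MultinomialNB().fit([preprocess(t) for t, _ in train], [y for _, y in train])
print(model.predict(preprocess("great work")), model.predict(preprocess("bad traffic")))
```

Working in log space avoids underflow when many word probabilities are multiplied. Ignoring unseen words keeps them from shifting the scores.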

SinkrOn ◽  
2020 ◽  
Vol 5 (1) ◽  
pp. 9-20
Author(s):  
Antonius Yadi Kuntoro

Abstract — The current Governor of DKI Jakarta, although elected in 2017, remains a constant subject of discussion and commentary. Comments come either directly from the media or through social media. Twitter is one of the social media platforms most often used to comment on the elected governor, and such comments can even become a trending topic on Twitter. The netizens who comment vary: some consistently tweet criticism, some comment positively, and some only retweet. This study predicts whether active netizens tend to post positive or negative comments. The algorithms used are Decision Tree, Naïve Bayes, Random Forest, and Ensemble. The Twitter data must be preprocessed before being processed in RapidMiner. Four experiments were run in RapidMiner, each dividing the data into testing and training sets, with testing:training ratios of 10%:90%, 20%:80%, 30%:70%, and 35%:65%. The average accuracy was 93.15% for Decision Tree, 91.55% for Naïve Bayes, 93.41% for Random Forest, and 93.42% for Ensemble. Keywords — Decision Tree, Naïve Bayes, Random Forest, Ensemble, Twitter.
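The four testing:training ratios describe a repeated holdout evaluation. Below is a minimal sketch of that protocol. It assumes a random shuffle before each split and a hard majority-vote ensemble. The abstract does not say how the Ensemble model combines its members, so the voting rule, the `train_fn` interface, the toy learner and all function names here are illustrative assumptions.

```python
import random
from collections import Counter
from statistics import mean

# Test fractions of the four trials (10:90, 20:80, 30:70, 35:65 testing:training).
TEST_FRACTIONS = [0.10, 0.20, 0.30, 0.35]

def holdout_split(rows, test_frac, seed=0):
    """Shuffle a copy of the rows and return (train, test) with round(n * test_frac) test rows."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_frac)
    return rows[n_test:], rows[:n_test]

def majority_vote(*predictors):
    """Hard-voting ensemble: return the label predicted by most members.
    Ties go to the tied label that was voted first (Counter.most_common order)."""
    def predict(x):
        return Counter(p(x) for p in predictors).most_common(1)[0][0]
    return predict

def accuracy(predict, rows):
    """Fraction of (x, y) rows where predict(x) == y."""
    return sum(predict(x) == y for x, y in rows) / len(rows)

def evaluate(train_fn, rows, fractions=TEST_FRACTIONS, seed=0):
    """Train on each split's training part, score on its test part, and average."""
    scores = []
    for frac in fractions:
        train, test = holdout_split(rows, frac, seed)
        scores.append(accuracy(train_fn(train), test))
    return scores, mean(scores)

if __name__ == "__main__":
    # Hypothetical toy data and learner: the label is "pos" when x >= 50.
    data = [(x, "pos" if x >= 50 else "neg") for x in range(100)]

    def fit_threshold(train):
        pos = [x for x, y in train if y == "pos"]
        neg = [x for x, y in train if y == "neg"]
        cut = (mean(pos) + mean(neg)) / 2
        return lambda x: "pos" if x >= cut else "neg"

    def fit_ensemble(train):
        # Three members, each fitted on a different random subsample of about half the rows.
        members = [fit_threshold(holdout_split(train, 0.5, seed=s)[0]) for s in range(3)]
        return majority_vote(*members)

    per_split, avg = evaluate(fit_ensemble, data)
    print(per_split, avg)
```

Averaging over several split ratios gives a less split-dependent estimate than a single holdout. With only four splits, however, the spread between them is as informative as the mean.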


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 268-269
Author(s):  
Jaime Speiser ◽  
Kathryn Callahan ◽  
Jason Fanning ◽  
Thomas Gill ◽  
Anne Newman ◽  
...  

Abstract Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in the diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains because of a lack of reproducibility and the difficulty of understanding the complex algorithms behind the models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. A decision tree is a machine learning method that produces a model resembling a flow chart. A random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. In the LIFE study data, prediction models for serious fall injury performed moderately at best (area under the receiver operating characteristic curve of 0.54 for decision tree and 0.66 for random forest). Machine learning methods may offer improved performance compared to traditional models for modeling outcomes in aging, but their use should be justified and their output carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
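The 0.54 and 0.66 figures are areas under the ROC curve. This area equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counting one half. A value of 0.5 corresponds to chance-level ranking. The minimal sketch below computes it that way. The function name is illustrative, and the pairwise loop is meant for clarity rather than for large datasets.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney statistic).

    scores: predicted risk for each case; labels: 1 = event (e.g. serious fall
    injury), 0 = no event.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # a tie counts as half
    return wins / (len(pos) * len(neg))

print(roc_auc([0.8, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

Read this way, an AUC of 0.54 means the decision tree ranks a random injured participant above a random uninjured one only slightly more often than a coin flip would.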


2021 ◽  
Vol 11 (4) ◽  
pp. 1378
Author(s):  
Seung Hyun Lee ◽  
Jaeho Son

It has been pointed out that carrying a heavy object above a certain weight is a major source of physical strain on a construction worker's musculoskeletal system. However, because many workers work simultaneously in irregular spaces on a construction site, it is difficult to determine in real time the weight of the object each worker carries, or to keep track of workers who carry excess weight. This paper proposes a prototype system that tracks the weight of heavy objects carried by construction workers using smart safety shoes fitted with FSR (Force Sensitive Resistor) sensors. The system consists of three parts: smart safety shoes with attached sensors, a mobile device that collects the initial sensing data, and a web-based server computer that stores, preprocesses, and analyzes the data. The effectiveness and accuracy of the weight tracking system were verified through experiments in which each experimenter lifted loads from +0 kg to +20 kg in 5 kg increments. The experimental results were analyzed with a newly developed machine-learning-based model that adopts classification algorithms including decision tree, random forest, gradient boosting machine (GBM), and LightGBM. The algorithms achieved similarly high classification accuracies, in the following order: random forest (90.9%), LightGBM (90.5%), decision tree (90.3%), and GBM (89%). Overall, the proposed weight tracking system achieves an average accuracy of 90.2% in classifying how much weight each experimenter carries.
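The abstract does not state how the overall 90.2% figure was computed. Assuming it is the unweighted mean of the four per-algorithm accuracies, the reported numbers are consistent:

```latex
\bar{a} = \frac{90.9 + 90.5 + 90.3 + 89.0}{4} = \frac{360.7}{4} = 90.175 \approx 90.2\%
```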


2021 ◽  
Vol 15 (1) ◽  
Author(s):  
Moaz Hiba ◽  
Ahmed Farid Ibrahim ◽  
Salaheldin Elkatatny ◽  
Abdulwahab Ali

2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Faizan Ullah ◽  
Qaisar Javaid ◽  
Abdu Salam ◽  
Masood Ahmad ◽  
Nadeem Sarwar ◽  
...  

Ransomware (RW) is a distinctive type of malware that encrypts files or locks the user's system, holding the files hostage and causing users large financial losses. In this article, we propose a new model that extracts novel features from an RW dataset and classifies files as RW or benign. The proposed model can detect a large number of RW samples from various families at runtime, monitoring network activity, registry activity, and the file system throughout execution. API-call sequences are reused to represent the behavior-based features of RW. The technique extracts a fourteen-feature vector at runtime and analyzes it with online machine learning algorithms to predict RW. To validate effectiveness and scalability, we test 78,550 recent malicious and benign samples and compare against random forest and AdaBoost; the testing accuracy reaches 99.56%.
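The abstract says the fourteen-feature vectors are analyzed with online machine learning algorithms, but it does not name them. As an illustration only, the sketch below uses online logistic regression. This is one standard online learner, trained by stochastic gradient descent one sample at a time. The class name, the learning rate, and the meaning assigned to the features in the usage example are hypothetical.

```python
import math

class OnlineLogisticRegression:
    """Binary classifier updated one sample at a time (SGD on the log-loss)."""

    def __init__(self, n_features=14, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        """Probability that x is ransomware (label 1)."""
        if len(x) != len(self.w):
            raise ValueError(f"expected {len(self.w)} features, got {len(x)}")
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        # Numerically stable logistic function.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def partial_fit(self, x, y):
        """One SGD step on a single labelled sample (y in {0, 1})."""
        err = self.predict_proba(x) - y  # derivative of the log-loss w.r.t. z
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err

    def predict(self, x, threshold=0.5):
        return int(self.predict_proba(x) >= threshold)

# Hypothetical stream: feature 0 stands in for a behaviour such as a burst of
# file-encryption API calls; the other 13 features are left at zero.
model = OnlineLogisticRegression()
ransom, benign = [1.0] + [0.0] * 13, [0.0] * 14
for _ in range(200):
    model.partial_fit(ransom, 1)
    model.partial_fit(benign, 0)
print(model.predict(ransom), model.predict(benign))
```

Because each update touches only the current sample, the model can keep learning from new executions without retraining on the full history. That property is what distinguishes online learners from batch methods such as random forest.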


2017 ◽  
Vol 6 (1) ◽  
Author(s):  
Rendra Dwi Lingga P. ◽  
Chastine Fatichah ◽  
Diana Purwitasari

Lubricant condition monitoring (LCM), a condition monitoring technique within Condition Based Maintenance, monitors the condition and state of the lubricant, which in turn reveal the condition and state of the equipment. LCM has proven to be a key driver of maintenance decision-making. It involves testing a sizeable number of parameters (variables) that must be classified and interpreted according to the lubricant's condition. Reducing these variables to a manageable, admissible set and using them for prediction is key to optimizing equipment performance and lubricant condition. This study advances a methodology for feature selection and predictive modelling of in-service oil analysis data to support maintenance decision-making for critical equipment. The proposed methodology includes data pre-processing involving cleaning, expert assessment, and standardization, since the variables are measured on different scales. Analysts use limits provided by the Original Equipment Manufacturers (OEM) to manually classify the samples and flag those with significant lubricant deterioration. In the last part of the methodology, Random Forest (RF) is used as a feature selection tool, and a Decision Tree (DT) model classifies the in-service oil samples. The framework is applied to a case study of a thermal power plant. Selecting admissible variables with Random Forest exposes the critical used oil analysis (UOA) variables indicative of lubricant/machine degradation. Besides predicting the classification of samples, the DT model offers visual interpretability of each parameter's impact on the classification outcome. The model evaluation returned acceptable predictive performance, and the framework delivers fast classification with insights for maintenance decision-making, enabling timely interventions.
Moreover, the framework highlights the critical oil analysis parameters that indicate lubricant degradation; by addressing these parameters, organizations can enhance the reliability of their critical operating equipment.
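Two pre-processing steps in this methodology are mechanical enough to sketch:

- z-score standardization, which puts variables measured on different scales on a common footing;
- flagging a sample when a reading falls outside its limits.

The sketch below assumes limits are given as (low, high) ranges. The variable names and limit values are hypothetical placeholders, not actual OEM limits. The Random Forest feature selection and Decision Tree classification steps are not reproduced here.

```python
from statistics import mean, pstdev

def standardize(columns):
    """Z-score each variable: (value - mean) / population standard deviation.
    A constant column (zero spread) is mapped to all zeros."""
    out = {}
    for name, values in columns.items():
        mu, sigma = mean(values), pstdev(values)
        out[name] = [(v - mu) / sigma if sigma else 0.0 for v in values]
    return out

def out_of_limit(sample, limits):
    """Names of variables whose reading lies outside its (low, high) limit."""
    return [name for name, (low, high) in limits.items()
            if name in sample and not (low <= sample[name] <= high)]

def label_sample(sample, limits):
    """'deteriorated' if any monitored variable breaches its limit, else 'normal'."""
    return "deteriorated" if out_of_limit(sample, limits) else "normal"

# Hypothetical limits (not real OEM values) and one hypothetical oil sample.
LIMITS = {"iron_ppm": (0, 50), "water_ppm": (0, 500), "viscosity_40c_cst": (41.4, 50.6)}
sample = {"iron_ppm": 80, "water_ppm": 120, "viscosity_40c_cst": 45.0}
print(out_of_limit(sample, LIMITS), label_sample(sample, LIMITS))
print(standardize({"iron_ppm": [10, 20, 30], "water_ppm": [100, 300, 500]}))
```

Limit-based labels like these are the kind of target a Decision Tree can then learn to reproduce from the standardized variables.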

