Analisis Sentimen Terhadap Review Film Menggunakan Metode Modified Balanced Random Forest dan Mutual Information (Sentiment Analysis of Film Reviews Using the Modified Balanced Random Forest Method and Mutual Information)

2021 ◽  
Vol 5 (2) ◽  
pp. 415
Author(s):  
Firdausi Nuzula Zamzami ◽  
Adiwijaya Adiwijaya ◽  
Mahendra Dwifebri P

Much of today's information exchange takes place on the internet. It takes many forms, including opinions posted on social media, and film reviews are one example. When people review a film, they draw on their emotions to express how they feel, and the result can be positive or negative. The rapid growth of the internet has made this information more diverse, plentiful, and unstructured. Sentiment analysis can handle this, because it is a classification process, carried out automatically by a computer system, for understanding the opinions, interactions, and emotions in a document or text. One suitable machine learning method is the Modified Balanced Random Forest. To deal with the variety in the data, Mutual Information is used for feature selection. With these two methods, the system achieves an accuracy of 79% and an F1-score of 75%.
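
As a rough illustration of the feature-selection step described above, the sketch below scores each term by its mutual information with the sentiment label and keeps the highest-scoring terms. It is not the authors' implementation: the function names, the binary term-presence representation, and the toy reviews are all assumptions made for this example, and the Modified Balanced Random Forest classifier itself is not shown.

```python
import math
from collections import Counter

def mutual_information(docs, labels, term):
    """Mutual information (in bits) between presence of `term` and the label.

    `docs` is a list of token sets; `labels` is a parallel list of class names.
    """
    n = len(docs)
    joint = Counter((term in doc, label) for doc, label in zip(docs, labels))
    term_counts = Counter(term in doc for doc in docs)
    label_counts = Counter(labels)
    mi = 0.0
    for (present, label), count in joint.items():
        p_xy = count / n
        p_x = term_counts[present] / n
        p_y = label_counts[label] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi

def select_top_terms(docs, labels, k):
    """Return the k terms with the highest mutual information (ties by name)."""
    vocab = sorted(set().union(*docs))
    scored = [(mutual_information(docs, labels, t), t) for t in vocab]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [t for _, t in scored[:k]]

# Hypothetical toy reviews, not data from the paper.
reviews = [
    ({"great", "film"}, "pos"),
    ({"great", "acting"}, "pos"),
    ({"boring", "film"}, "neg"),
    ({"boring", "plot"}, "neg"),
]
docs = [tokens for tokens, _ in reviews]
labels = [label for _, label in reviews]
top_terms = select_top_terms(docs, labels, 2)
```

In this toy data, "film" appears equally often in both classes and so carries no information about the label, whereas "great" and "boring" separate the classes perfectly. The selected terms would then become the input features of the classifier.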

Author(s):  
Agus Sasmito Aribowo ◽  
Halizah Basiron ◽  
Noor Fazilla Abd Yusof ◽  
Siti Khomsah

This study examines cross-domain sentiment analysis (CDSA) in the Indonesian language using tree-based ensemble machine learning. CDSA is useful for supporting the labeling of cross-domain sentiment and for reducing dependence on experts; however, how stop words, language expressions, and Indonesian slang words affect unstructured opinion text has not yet been identified. The study aimed to obtain the best CDSA model for Indonesian-language opinions, which are commonly full of stop words and slang words in Indonesian dialects. It set out to observe the benefits of stop word cleaning and slang word conversion in Indonesian-language CDSA, and to find out which machine learning method suits this model. The study started by crawling five datasets of YouTube comments from five different domains. The datasets were copied into two groups: one without stop word cleaning and slang word conversion, and one with both applied. A CDSA model was built for each dataset group and tested using two tree-based ensemble classifiers, Random Forest (RF) and Extra Tree (ET), with three non-ensemble methods, Naïve Bayes (NB), SVM, and Decision Tree (DT), as the comparison. The results suggest that the accuracy of CDSA in the Indonesian language increased when stop words were removed and slang words were converted. The best classifier was built using tree-based ensemble machine learning, particularly ET, which achieved the highest accuracy in this study at 91.19%. This model is expected to serve as an alternative CDSA technique for the Indonesian language.
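
The two preprocessing steps compared in this study, slang word conversion and stop word cleaning, can be sketched roughly as follows. The lexicons here are tiny hypothetical samples chosen for illustration, not the resources used by the authors, and the function name `preprocess` is likewise an assumption.

```python
import re

# Hypothetical mini-lexicons; a real system would load much larger lists.
SLANG = {"gak": "tidak", "yg": "yang", "bgt": "banget"}
STOP_WORDS = {"yang", "dan", "di", "ini", "itu"}

def preprocess(comment):
    """Lowercase, tokenize, convert slang words, then drop stop words."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    # Convert slang first so that a slang spelling of a stop word
    # (e.g. "yg" for "yang") is also removed by the next step.
    tokens = [SLANG.get(token, token) for token in tokens]
    return [token for token in tokens if token not in STOP_WORDS]

cleaned = preprocess("Film yg ini gak bagus bgt")
```

Running slang conversion before stop word removal is a design choice of this sketch; the abstract does not state the order the authors used.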


Author(s):  
Qiaoman Yang ◽  
Chunyu Liu

Classification modeling is one of the key issues in sentiment analysis. The support vector machine (SVM) has been widely used as an effective machine learning method for classification. Generally, a common SVM is designed only for decision-making, at the expense of the distribution of the data. In practice, sentiment data are large and complex, so a common SVM falls short in accuracy and stability. This study investigates sentiment analysis using twin objective function SVMs, including the nonparallel SVM (NPSVM) and the twin SVM (TWSVM). From the experiments, we conclude that twin objective function SVMs are superior to NB and the single objective function SVM in accuracy and stability.


2021 ◽  
Author(s):  
Md Anawar Hossen Wadud ◽  
Md Ashraf Uddin

The popularity of social media has exploded worldwide over the last few decades, and it has become the most preferred mode of social interaction. The internet also provides a new platform through which adolescents are being bullied. Appropriate means of cyberbullying detection are still partial and in some cases very limited. Moreover, research on cyberbullying focuses extensively on surveys and on its psychological impact on victims, while prevention has not been widely addressed. To bridge the gap, this paper aims to detect cyberbullying efficiently. It employs a standard machine learning method and a natural language processing technique as part of the detection process in a decentralized, blockchain-leveraged architecture. We provide a fog-based architecture for cyberbullying detection that aims to relieve the server's load by placing the detection and prevention processes at the fog layer. The proposal might offer a probable solution to protect users, particularly adolescents, from the severe consequences of cyberbullying.


The system identifies duplicate records in a database using a machine learning method. The input is unstructured data, which is prepared using a natural language processing technique such as text similarity. The prepared data is then fed into a machine learning method, Random Forest. After data collection, input and output files are produced, and the target file is compared with the source file. This is repeated until satisfactory accuracy is obtained.
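
The text-similarity preparation step could look something like the sketch below, which turns a pair of records into numeric similarity features. The choice of features (token-level Jaccard overlap and a character-level ratio from the standard library's `difflib.SequenceMatcher`), the function names, and the example records are all assumptions; the abstract does not say which similarity measures were used. The resulting feature values are the kind of input a Random Forest classifier could be trained on, which is not shown here.

```python
from difflib import SequenceMatcher

def jaccard(a, b):
    """Token-level Jaccard similarity between two strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty records are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def pair_features(a, b):
    """Similarity features for one candidate pair of records."""
    return {
        "jaccard": jaccard(a, b),
        "char_ratio": SequenceMatcher(None, a.lower(), b.lower()).ratio(),
    }

# Hypothetical records, for illustration only.
features = pair_features("John Smith, 12 Oak Street", "john smith 12 oak st")
```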


2020 ◽  
Author(s):  
Eric S. Pahl ◽  
W. Nick Street ◽  
Hans J. Johnson ◽  
Alan I. Reed

Kidney transplantation is the best treatment for patients with end-stage renal failure. The predominant method used for kidney quality assessment is the Cox regression-based kidney donor risk index. A machine learning method may provide improved prediction of transplant outcomes and help decision-making. A popular tree-based machine learning method, random forest, was trained and evaluated with the same data originally used to develop the risk index (70,242 observations from 1995-2005). The random forest successfully predicted 2,148 more transplants than the risk index at an equal type II error rate of 10%. Predicted results were analyzed against follow-up survival outcomes up to 240 months after transplant using Kaplan-Meier analysis, which confirmed that the random forest performed significantly better than the risk index (p<0.05). The random forest predicted significantly more successful and longer-surviving transplants than the risk index. Random forests and other machine learning models may improve transplant decisions.
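
For readers unfamiliar with Kaplan-Meier analysis, the following is a minimal sketch of the product-limit estimator, which produces the survival curve for one group of patients. It is a generic textbook version, not the authors' analysis code, and the follow-up times and outcomes are invented for illustration.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    durations: follow-up time for each subject (e.g. months after transplant)
    events:    True if failure was observed, False if the subject was censored
    Returns a list of (time, survival_probability) at each observed failure time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        removed = 0
        # Group all subjects that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            failures += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if failures:
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        # Failed and censored subjects both leave the risk set.
        at_risk -= removed
    return curve

# Hypothetical follow-up data in months; not taken from the study.
durations = [12, 24, 24, 60, 120]
events = [True, False, True, True, False]
curve = kaplan_meier(durations, events)
# Survival drops to about 0.8, 0.6, and 0.3 at months 12, 24, and 60.
```

Comparing two such curves, for example one per prediction model, is typically done with a log-rank test, which is beyond this sketch.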


Atmosphere ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 173
Author(s):  
Gaoyun Wang ◽  
Hongqing Wang ◽  
Yizhou Zhuang ◽  
Qiong Wu ◽  
Siyue Chen ◽  
...  

Tropical overshooting convection has a strong impact on both the heat budget and the moisture distribution in the upper troposphere and lower stratosphere, and it can pose a great risk to aviation safety. Cloud-top height is one of the essential concerns of overshooting convection for both the climate system and aviation weather forecasting. The main purpose of our work is to verify the application of machine learning, taking the random forest (RF) model as an example, to overshooting cloud-top height retrieval from Himawari-8 data. Using collocated CloudSat observations as a reference, we use several Himawari-8 infrared indicators that are commonly recognized to relate to cloud-top height, along with some temporal and geographical parameters (latitude, month, satellite zenith angle, etc.), as predictors to construct and validate the model. Analysis of variable importance shows that the brightness temperature at 6.2 μm is the dominant predictor, followed by satellite zenith angle, brightness temperature at 13.3 μm, latitude, and month. In the comparison between the RF model and the traditional single-channel interpolation method, retrievals from the RF model agree well with observations, with a high correlation coefficient (0.92), small RMSE (222 m), and small MAE (164 m), while the traditional single-channel interpolation method shows lower skill on the same metrics (0.70, 1305 m, and 1179 m). This work presents a new perspective on overshooting cloud-top height retrieval based on machine learning.
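
The three evaluation metrics quoted above (correlation coefficient, RMSE, and MAE) can be computed as in the sketch below. The height values are made up for illustration and do not come from the paper; only the metric definitions are standard.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / n

def pearson_r(observed, predicted):
    """Pearson correlation coefficient."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

# Hypothetical cloud-top heights in metres (reference vs. retrieval).
reference = [15000, 16000, 17000, 18000]
retrieved = [15200, 15900, 17100, 17800]

scores = {
    "r": pearson_r(reference, retrieved),
    "rmse_m": rmse(reference, retrieved),
    "mae_m": mae(reference, retrieved),
}
```

RMSE penalizes large errors more heavily than MAE, so it is never smaller than MAE. The two are equal only when every error has the same magnitude, and a wide gap between them points to a few large misses.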

