Improved Random Forest Algorithm Performance For Big Data

2021, Vol 1897 (1), pp. 012071
Author(s): Yousif Abdulsattar Saadoon, Riam Hossam Abdulamir

2017, Vol 28 (4), pp. 919-933
Author(s): Jianguo Chen, Kenli Li, Zhuo Tang, Kashif Bilal, Shui Yu, ...

2016, Vol 15 (3), pp. 6563-6569
Author(s): S.J.SATHISH AARON JOSEPH, R. BALASUBRAMANIAN

Intrusion detection is one of the major necessities of the current networked environment, where all information is available in digital form. This paper presents an enhanced tree-based approach that performs intrusion detection faster and with better accuracy. The training data is passed to the random forest algorithm, which combines tree predictors such that each tree depends on a randomly generated vector. A Spark-based implementation of the Random Forest algorithm is run in a Hadoop cluster on datasets with varied class imbalance to obtain the results. The classifier was observed to provide results in real time with an accuracy above 90%, which makes it well suited to online intrusion detection.
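The idea of combining tree predictors that each depend on their own random draw can be sketched as follows. This is a minimal single-machine illustration in plain Python, not the Spark implementation the paper describes: each "tree" is reduced to a one-split stump, and the function names (`fit_stump`, `fit_forest`, `predict`) and the defaults are assumptions made for this example.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, rng, n_features):
    """Fit a one-split tree using a random subset of the features."""
    features = rng.sample(range(len(X[0])), n_features)
    best = None  # (weighted impurity, feature, threshold, left label, right label)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:
        # No valid split (all rows identical): always predict the majority class.
        m = majority(y)
        return (0, float("inf"), m, m)
    return best[1:]

def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    n_features = max(1, int(len(X[0]) ** 0.5))  # common sqrt(d) heuristic
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], rng, n_features))
    return forest

def predict(forest, row):
    """Majority vote over all trees in the forest."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]
```

The randomness enters in two places: the bootstrap sample drawn for each tree and the feature subset considered for its split. A distributed implementation would train the trees in parallel across the cluster; that part is not shown here.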


2020, Vol 15 (4), pp. 1238-1247
Author(s): Weiwei Li, Chunqing Li, Tao Wang

Membrane bioreactors (MBRs) are a sewage treatment process that combines membrane separation with bioreactor technology and offers great advantages in sewage treatment. Membrane fouling, however, hinders MBR process development. Studies have shown that the degree of membrane fouling can be judged from the membrane flux rate. In this study, principal component analysis was used to extract the main factors affecting membrane fouling, and the random forest algorithm on the Hadoop big data platform was then used to establish and test an MBR membrane flux prediction model. To verify the model's effectiveness, BP neural network and support vector machine (SVM) models were established using the same experimental data. A comparison of the experimental results showed that the random forest algorithm gave the best MBR membrane flux predictions.
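The principal component analysis step can be illustrated with a short sketch. The abstract does not list the measured variables or the implementation, so everything below is an assumption for illustration: the helper names are hypothetical, only the first principal component is extracted, and it is found by power iteration on the sample covariance matrix.

```python
def mean_center(X):
    """Subtract the column mean from every feature."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]

def covariance(Xc):
    """Sample covariance matrix of mean-centered data."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(C, iters=200):
    """Dominant eigenvector of C by power iteration.

    Assumes the starting vector is not orthogonal to the dominant
    eigenvector; a production routine would use a proper eigensolver.
    """
    d = len(C)
    v = [1.0 / d ** 0.5] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return v
        v = [x / norm for x in w]
    return v

def project(Xc, v):
    """Score of each mean-centered row along direction v."""
    return [sum(a * b for a, b in zip(row, v)) for row in Xc]
```

In a pipeline like the one described, the projected scores (for several components, not just one) would replace the raw variables as inputs to the random forest regressor.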


Author(s): Xinye Liu, Xiaotong Zhang, Tao Wang, Kun Cheng, Shangbing Jiao, ...

This chapter analyzes the social value of the TV drama Entrepreneurial Age by mining audience comments, in order to give TV drama producers a reference for topic selection, casting, and script design. Design/methodology/approach: The research follows a three-step approach consisting of data crawling, two-dimensional data tagging, and random forest algorithm design. Findings: Three factors are related to the demand for TV dramas: 1) the appearance and acting skill of the actors; 2) how closely the drama reflects real life; 3) whether the drama's topic attracts high attention. Value: Based on the big data of audience comments, this chapter explores the factors that influence the number of TV plays. It provides an important reference for TV drama producers on how to design the plot, how to choose actors, and how to create topics.


2019, Vol 25 (6), pp. 55-61
Author(s): Yujun Liu, Yi Hong, Cheng Hu

Thousands of electric vehicles (EVs), which are flexible in their use of electricity, will be connected to the power system in the near future, bringing more uncertainty to the power system. It is therefore necessary to study the general characteristics of EV charging behaviour. The charging process generates big data on EV charging behaviour. This paper proposes a big data mining technique based on Random Forest and Principal Component Analysis to identify and analyse clusters with different charging characteristics in these data. Experiments on Dundee's January 2018 EV charging data yield charging behaviour clusters for the workdays, weekends, and holidays of that month. Compared with the Euclidean distance method, the random forest algorithm proves superior for the EV clustering problem: the clusters it produces have clearer characteristics, including the user's charging method and travel behaviour. The results show that EV charging behaviour has a certain regularity and that the charging load has an obvious peak-to-valley difference that needs to be regulated.

