Improving the Performance of Neural Networks with Random Forest in Detecting Network Intrusions

Author(s):  
Wenjuan Li ◽  
Yuxin Meng
2020 ◽  
Vol 2 (1) ◽  
pp. 23-36
Author(s):  
Syed Aamir Ali Shah ◽  
Muhammad Asif Manzoor ◽  
Abdul Bais

Forest structure estimation is very important in geological, ecological and environmental studies. It provides the basis for the carbon stock estimation and effective means of sequestration of carbon sources and sinks. Multiple parameters are used to estimate the forest structure like above ground biomass, leaf area index and diameter at breast height. Among all these parameters, vegetation height has unique standing. In addition to forest structure estimation it provides the insight into long term historical changes and the estimates of stand age of the forests as well. There are multiple techniques available to estimate the canopy height. Light detection and ranging (LiDAR) based methods, being the accurate and useful ones, are very expensive to obtain and have no global coverage. There is a need to establish a mechanism to estimate the canopy height using freely available satellite imagery like Landsat images. Multiple studies are available which contribute in this area. The majority use Landsat images with random forest models. Although random forest based models are widely used in remote sensing applications, they lack the ability to utilize the spatial association of neighboring pixels in modeling process. In this research work, we define Convolutional Neural Network based model and analyze that model for three test configurations. We replicate the random forest based setup of Grant et al., which is a similar state-of-the-art study, and compare our results and show that the convolutional neural networks (CNN) based models not only capture the spatial association of neighboring pixels but also outperform the state-of-the-art.


2019 ◽  
Vol 24 (12) ◽  
pp. 9243-9256
Author(s):  
Jordan J. Bird ◽  
Anikó Ekárt ◽  
Diego R. Faria

Abstract In this work, we argue that the implications of pseudorandom and quantum-random number generators (PRNG and QRNG) inexplicably affect the performances and behaviours of various machine learning models that require a random input. These implications are yet to be explored in soft computing until this work. We use a CPU and a QPU to generate random numbers for multiple machine learning techniques. Random numbers are employed in the random initial weight distributions of dense and convolutional neural networks, in which results show a profound difference in learning patterns for the two. In 50 dense neural networks (25 PRNG/25 QRNG), QRNG increases over PRNG for accent classification at + 0.1%, and QRNG exceeded PRNG for mental state EEG classification by + 2.82%. In 50 convolutional neural networks (25 PRNG/25 QRNG), the MNIST and CIFAR-10 problems are benchmarked, and in MNIST the QRNG experiences a higher starting accuracy than the PRNG but ultimately only exceeds it by 0.02%. In CIFAR-10, the QRNG outperforms PRNG by + 0.92%. The n-random split of a Random Tree is enhanced towards and new Quantum Random Tree (QRT) model, which has differing classification abilities to its classical counterpart, 200 trees are trained and compared (100 PRNG/100 QRNG). Using the accent and EEG classification data sets, a QRT seemed inferior to a RT as it performed on average worse by − 0.12%. This pattern is also seen in the EEG classification problem, where a QRT performs worse than a RT by − 0.28%. Finally, the QRT is ensembled into a Quantum Random Forest (QRF), which also has a noticeable effect when compared to the standard Random Forest (RF). Ten to 100 ensembles of trees are benchmarked for the accent and EEG classification problems. In accent classification, the best RF (100 RT) outperforms the best QRF (100 QRF) by 0.14% accuracy. In EEG classification, the best RF (100 RT) outperforms the best QRF (100 QRT) by 0.08% but is extremely more complex, requiring twice the amount of trees in committee. All differences are observed to be situationally positive or negative and thus are likely data dependent in their observed functional behaviour.


2020 ◽  
Vol 131 ◽  
pp. 205-212
Author(s):  
Bikash Santra ◽  
Angshuman Paul ◽  
Dipti Prasad Mukherjee

Sebatik ◽  
2020 ◽  
Vol 24 (2) ◽  
Author(s):  
Anifuddin Azis

Indonesia merupakan negara dengan keanekaragaman hayati terbesar kedua di dunia setelah Brazil. Indonesia memiliki sekitar 25.000 spesies tumbuhan dan 400.000 jenis hewan dan ikan. Diperkirakan 8.500 spesies ikan hidup di perairan Indonesia atau merupakan 45% dari jumlah spesies yang ada di dunia, dengan sekitar 7.000an adalah spesies ikan laut. Untuk menentukan berapa jumlah spesies tersebut dibutuhkan suatu keahlian di bidang taksonomi. Dalam pelaksanaannya mengidentifikasi suatu jenis ikan bukanlah hal yang mudah karena memerlukan suatu metode dan peralatan tertentu, juga pustaka mengenai taksonomi. Pemrosesan video atau citra pada data ekosistem perairan yang dilakukan secara otomatis mulai dikembangkan. Dalam pengembangannya, proses deteksi dan identifikasi spesies ikan menjadi suatu tantangan dibandingkan dengan deteksi dan identifikasi pada objek yang lain. Metode deep learning yang berhasil dalam melakukan klasifikasi objek pada citra mampu untuk menganalisa data secara langsung tanpa adanya ekstraksi fitur pada data secara khusus. Sistem tersebut memiliki parameter atau bobot yang berfungsi sebagai ektraksi fitur maupun sebagai pengklasifikasi. Data yang diproses menghasilkan output yang diharapkan semirip mungkin dengan data output yang sesungguhnya.  CNN merupakan arsitektur deep learning yang mampu mereduksi dimensi pada data tanpa menghilangkan ciri atau fitur pada data tersebut. Pada penelitian ini akan dikembangkan model hybrid CNN (Convolutional Neural Networks) untuk mengekstraksi fitur dan beberapa algoritma klasifikasi untuk mengidentifikasi spesies ikan. Algoritma klasifikasi yang digunakan pada penelitian ini adalah : Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree, K-Nearest Neighbor (KNN),  Random Forest, Backpropagation.


10.2196/23938 ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. e23938
Author(s):  
Ruairi O'Driscoll ◽  
Jake Turicchi ◽  
Mark Hopkins ◽  
Cristiana Duarte ◽  
Graham W Horgan ◽  
...  

Background Accurate solutions for the estimation of physical activity and energy expenditure at scale are needed for a range of medical and health research fields. Machine learning techniques show promise in research-grade accelerometers, and some evidence indicates that these techniques can be applied to more scalable commercial devices. Objective This study aims to test the validity and out-of-sample generalizability of algorithms for the prediction of energy expenditure in several wearables (ie, Fitbit Charge 2, ActiGraph GT3-x, SenseWear Armband Mini, and Polar H7) using two laboratory data sets comprising different activities. Methods Two laboratory studies (study 1: n=59, age 44.4 years, weight 75.7 kg; study 2: n=30, age=31.9 years, weight=70.6 kg), in which adult participants performed a sequential lab-based activity protocol consisting of resting, household, ambulatory, and nonambulatory tasks, were combined in this study. In both studies, accelerometer and physiological data were collected from the wearables alongside energy expenditure using indirect calorimetry. Three regression algorithms were used to predict metabolic equivalents (METs; ie, random forest, gradient boosting, and neural networks), and five classification algorithms (ie, k-nearest neighbor, support vector machine, random forest, gradient boosting, and neural networks) were used for physical activity intensity classification as sedentary, light, or moderate to vigorous. Algorithms were evaluated using leave-one-subject-out cross-validations and out-of-sample validations. Results The root mean square error (RMSE) was lowest for gradient boosting applied to SenseWear and Polar H7 data (0.91 METs), and in the classification task, gradient boost applied to SenseWear and Polar H7 was the most accurate (85.5%). Fitbit models achieved an RMSE of 1.36 METs and 78.2% accuracy for classification. Errors tended to increase in out-of-sample validations with the SenseWear neural network achieving RMSE values of 1.22 METs in the regression tasks and the SenseWear gradient boost and random forest achieving an accuracy of 80% in classification tasks. Conclusions Algorithms trained on combined data sets demonstrated high predictive accuracy, with a tendency for superior performance of random forests and gradient boosting for most but not all wearable devices. Predictions were poorer in the between-study validations, which creates uncertainty regarding the generalizability of the tested algorithms.


2016 ◽  
Author(s):  
Mathias Seibert ◽  
Bruno Merz ◽  
Heiko Apel

Abstract. The Limpopo basin in southern Africa is prone to droughts, which affect the livelihoods of millions of people in South Africa, Botswana, Zimbabwe, and Mozambique. Seasonal drought early warning is thus vital for the whole region. In this study, the predictability of hydrological droughts during the main runoff period from December to May is assessed with statistical approaches. Three methods (Multiple Linear Models, Artifical Neural Networks, Random Forest Regression Trees) are compared in terms of their ability to forecast streamflow with up to 12 months lead time. The following four main findings result from the study. 1) There are stations in the basin at which standardised streamflow is predictable with lead times up to 12 months. The results show high interstation differences of forecast skill but reach a coefficient of determination as high as 0.73 (cross validated). 2) A large range of potential predictors is considered in this study, comprising well established climate indices, customised teleconnection indices derived from sea surface temperatures, and antecedent streamflow as proxy of catchment conditions. El-Niño and customised indices, representing sea surface temperature in the Atlantic and Indian Ocean, prove to be important teleconnection predictors for the region. Antecedent streamflow is a strong predictor in small catchments (with median 42 % explained variance), whereas teleconnections exert a stronger influence in large catchments. 3) Multiple linear models show the best forecast skill in this study and the greatest robustness compared to artificial neural networks and Random Forest regression trees, despite their capabilities to represent non-linear relationships. 4) Employed in early warning the models can be used to forecast a specific drought level. Even if the coefficient of determination is low, the forecast models have a skill better than a climatological forecast, which is shown by analysis of receiver operating characteristics (ROC). Seasonal statistical forecasts in the Limpopo show promising results, and thus it is recommended to employ them complementary to existing forecasts in order to strengthen preparedness for droughts.


Sign in / Sign up

Export Citation Format

Share Document