A Comparison of Apache Spark Supervised Machine Learning Algorithms for DNA Splicing Site Prediction

Author(s):  
Valerio Morfino ◽  
Salvatore Rampone ◽  
Emanuel Weitschek
Electronics ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 444 ◽  
Author(s):  
Valerio Morfino ◽  
Salvatore Rampone

In Internet of Things (IoT) infrastructures, attack and anomaly detection is a rising concern. As IoT infrastructure spreads into every domain, threats and attacks against it grow proportionally. This paper compares the performance of several machine learning algorithms in identifying cyber-attacks (namely SYN-DOS attacks) on IoT systems, both in terms of application performance and in terms of training/application times. We use supervised machine learning algorithms included in the MLlib library of Apache Spark, a fast and general engine for big data processing. We show the implementation details and the performance of those algorithms on public datasets, using a training set of up to 2 million instances. We adopt a Cloud environment, emphasizing the importance of scalability and elasticity of use. Results show that all the Spark algorithms used achieve very good identification accuracy (>99%), and one of them, Random Forest, achieves an accuracy of 1. We also report a very short training time (23.22 sec for Decision Tree with 2 million rows). The experiments also show a very low application time (0.13 sec for more than 600,000 instances for Random Forest) using Apache Spark in the Cloud. Furthermore, the explicit model generated by Random Forest is very easy to implement in high- or low-level programming languages. In light of the results obtained, both in terms of computation times and identification performance, a hybrid approach for the detection of SYN-DOS cyber-attacks on IoT devices is proposed: an explicit Random Forest model implemented directly on the IoT device, along with a second-level analysis (training) performed in the Cloud.
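To illustrate what an "explicit" Random Forest model could look like once it is exported from training and hard-coded on a device, the following is a minimal sketch: each tree becomes a nested if/else function and the forest prediction is a majority vote. The feature names (`syn_rate`, `ack_ratio`, `distinct_src_ports`), the thresholds, and the tree shapes are hypothetical and chosen only for illustration; they are not taken from the paper's trained model or dataset.

```python
from typing import Callable, Dict, List

# A sample is a mapping from feature name to value.
# All feature names and thresholds below are hypothetical.
Sample = Dict[str, float]


def tree_1(s: Sample) -> int:
    """Return 1 for 'attack', 0 for 'normal'."""
    if s["syn_rate"] > 100.0:
        return 1
    return 1 if s["ack_ratio"] < 0.1 else 0


def tree_2(s: Sample) -> int:
    if s["ack_ratio"] < 0.2:
        return 1 if s["syn_rate"] > 50.0 else 0
    return 0


def tree_3(s: Sample) -> int:
    if s["distinct_src_ports"] > 500 and s["syn_rate"] > 80.0:
        return 1
    return 0


# The forest is just the list of explicit trees.
FOREST: List[Callable[[Sample], int]] = [tree_1, tree_2, tree_3]


def predict(s: Sample) -> int:
    """Majority vote over the trees (an odd tree count avoids ties)."""
    votes = sum(tree(s) for tree in FOREST)
    return 1 if votes * 2 > len(FOREST) else 0


if __name__ == "__main__":
    flood = {"syn_rate": 300.0, "ack_ratio": 0.01, "distinct_src_ports": 900.0}
    normal = {"syn_rate": 5.0, "ack_ratio": 0.9, "distinct_src_ports": 10.0}
    print(predict(flood), predict(normal))
```

Because the model reduces to comparisons and a vote count, the same structure ports directly to C or another low-level language on a constrained device, while retraining can remain in the Cloud as the hybrid approach suggests.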


2021 ◽  
Vol 1916 (1) ◽  
pp. 012042
Author(s):  
Ranjani Dhanapal ◽  
A AjanRaj ◽  
S Balavinayagapragathish ◽  
J Balaji

2021 ◽  
Vol 11 (15) ◽  
pp. 6728
Author(s):  
Muhammad Asfand Hafeez ◽  
Muhammad Rashid ◽  
Hassan Tariq ◽  
Zain Ul Abideen ◽  
Saud S. Alotaibi ◽  
...  

Classification and regression are the major applications of machine learning algorithms, which are widely used to solve problems in numerous domains of engineering and computer science. Different classifiers based on the optimization of the decision tree have been proposed; however, the approach is still evolving. This paper presents a novel and robust classifier based on decision tree and tabu search algorithms. To improve performance, our proposed algorithm constructs multiple decision trees while employing a tabu search algorithm to consistently monitor the leaf and decision nodes in the corresponding decision trees. Additionally, the tabu search algorithm is responsible for balancing the entropy of the corresponding decision trees. For training the model, we used the clinical data of COVID-19 patients to predict whether a patient is suffering. The experimental results were obtained using our proposed classifier, built on the scikit-learn library in Python. An extensive performance comparison against conventional supervised machine learning algorithms is presented using Big O and statistical analysis. Moreover, a performance comparison with optimized state-of-the-art classifiers is also presented. The achieved accuracy of 98%, the required execution time of 55.6 ms, and the area under the receiver operating characteristic (AUROC) of 0.95 for the proposed method indicate that the proposed classifier is suitable for large datasets.
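The entropy mentioned above is, in standard decision-tree learning, the Shannon entropy of the class labels at a node, and a split is scored by the information gain it produces. The sketch below shows only these textbook quantities; the abstract does not specify how the tabu search uses them, so nothing here should be read as the authors' procedure. The function names are illustrative.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the class labels at a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(
    parent: Sequence[Hashable],
    left: Sequence[Hashable],
    right: Sequence[Hashable],
) -> float:
    """Entropy reduction from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


if __name__ == "__main__":
    parent = [0, 0, 1, 1]
    # A perfectly separating split removes all uncertainty at the node.
    print(entropy(parent), information_gain(parent, [0, 0], [1, 1]))
```

A node with a 50/50 class mix has the maximum entropy of 1 bit for a binary problem, and a pure node has entropy 0, which is why splits that produce purer children score a higher gain.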


Author(s):  
Charalambos Kyriakou ◽  
Symeon E. Christodoulou ◽  
Loukas Dimitriou

The paper presents a data-driven framework and related field studies on the use of supervised machine learning and smartphone technology for the spatial condition-assessment mapping of roadway pavement surface anomalies. The study explores the use of data, collected by sensors from a smartphone and a vehicle’s onboard diagnostic device while the vehicle is in motion, for the detection of roadway anomalies. The research proposes a low-cost and automated method to obtain up-to-date information on roadway pavement surface anomalies with the use of smartphone technology, artificial neural networks, robust regression analysis, and supervised machine learning algorithms for multiclass problems. The technology for the suggested system is readily available and accurate and can be utilized in pavement monitoring systems and geographical information system applications. Further, the proposed methodology has been field-tested, exhibiting accuracy levels higher than 90%, and it is currently being expanded to include larger datasets and a larger number of common roadway pavement surface defect types. The proposed system is of practical importance since it provides continuous information on roadway pavement surface conditions, which can be valuable for pavement engineers and public safety.

