Machine Learning Methods for Detecting Internet-of-Things (IoT) Malware

Abstract

Background: Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially during the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination.

Methods: Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups.

Results: The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared. For the ≥ 65 age group, XGB (84.7%) and RF (84.7%) have the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM has the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%).

Conclusions: The machine learning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.

Using Machine Learning to Predict the pKa of C–H Bonds. Relevance to Catalytic Methane Functionalization

2020. DOI: 10.26434/chemrxiv.12646772

Author(s): Christopher Zhou, William Grumbles, Thomas Cundari

Keyword(s): Machine Learning, Support Vector, Learning Models, K Nearest Neighbors, Network Support, Highest Occupied Molecular Orbital, Conjugate Acid, Organometallic Catalyst, Conjugate Base, Machine Learning Models

Six machine learning models (random forest, neural network, support vector machine, k-nearest neighbors, Bayesian ridge regression, least squares linear regression) were trained on a dataset of 3d transition metal-methyl and -methane complexes to predict pKa(C–H), a property demonstrated to be important in catalytic activity and selectivity. Results illustrate that the machine learning models are quite promising, with RMSE metrics ranging from 4.6 to 8.8 pKa units, despite the relatively modest amount of data available to train on. Importantly, the machine learning models agreed that (a) conjugate base properties were more impactful than those of the corresponding conjugate acid, and (b) the energy of the highest occupied molecular orbital of the conjugate base was the most significant input feature in the prediction of pKa(C–H). Furthermore, results from additional testing conducted using an external dataset of Sc-methyl complexes demonstrated the robustness of all models, with RMSE metrics ranging from 1.5 to 6.6 pKa units. In all, this research demonstrates the potential of machine learning models in organometallic catalyst development.
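The RMSE figures quoted in this abstract follow the usual root-mean-square error definition. The sketch below shows how such a value could be computed for any regression model; the pKa numbers are invented for illustration and are not taken from the study, and the function name `rmse` is a hypothetical helper.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical pKa values, made up for this example only.
reference_pka = [30.0, 42.0, 25.0, 51.0]
predicted_pka = [33.0, 38.0, 25.0, 56.0]

# Errors are 3, -4, 0 and 5, so the mean squared error is 12.5.
print(round(rmse(predicted_pka, reference_pka), 2))  # 3.54
```

Because RMSE is expressed in the same units as the target, a value of 4.6 reads directly as an average deviation of roughly 4.6 pKa units, weighted toward the larger errors.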

Analysis of Machine Learning Techniques Applied to Sensory Detection of Vehicles in Intelligent Crosswalks

Sensors, 2020, Vol 20 (21), pp. 6019. DOI: 10.3390/s20216019

Author(s): José Manuel Lozano Domínguez, Faroq Al-Tam, Tomás de J. Mateo Sanguino, Noélia Correia

Keyword(s): Machine Learning, Smart Cities, Machine Learning Techniques, Support Vector, Learning Models, Fuzzy Classifier, Logistic Regression Models, The Road, Learning Agent, Machine Learning Models

Improving road safety through artificial intelligence-based systems is now crucial to turning smart cities into a reality. Within this broad and highly relevant area, an approach is proposed to improve vehicle detection in smart crosswalks using machine learning models. Unlike classic fuzzy classifiers, machine learning models do not require the readjustment of labels that depend on the location of the system and the road conditions. Several machine learning models were trained and tested using real traffic data taken from urban scenarios in both Portugal and Spain. These include random forest, time-series forecasting, multi-layer perceptron, support vector machine, and logistic regression models. A deep reinforcement learning agent, based on a state-of-the-art double-deep recurrent Q-network, is also designed and compared with the machine learning models just mentioned. Results show that the machine learning models can efficiently replace the classic fuzzy classifier.
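The abstract does not describe the agent's internals, so the following is only a sketch of the standard double Q-learning target that "double-deep" Q-networks are generally built on. The online network picks the next action and a separate target network evaluates it. The recurrent part is omitted, plain lists stand in for network outputs, and the function name and the default `gamma` are assumptions.

```python
def double_q_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Bootstrapped target for one transition under double Q-learning.

    next_q_online: Q-values of the next state from the online network.
    next_q_target: Q-values of the next state from the target network.
    """
    if done:
        # Terminal transition: nothing to bootstrap from.
        return reward
    # Action selection uses the online estimates...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ...while action evaluation uses the target estimates.
    return reward + gamma * next_q_target[best_action]


# Hypothetical values: the online network prefers action 1,
# so the target network's estimate for action 1 (0.3) is used.
print(double_q_target(1.0, [0.2, 0.9], [0.5, 0.3], gamma=0.5))
```

Decoupling selection from evaluation is what distinguishes this target from the plain Q-learning target, which would take the maximum of a single set of estimates.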

CPT Data Interpretation Employing Different Machine Learning Techniques

Geosciences, 2021, Vol 11 (7), pp. 265. DOI: 10.3390/geosciences11070265

Author(s): Stefan Rauter, Franz Tschuchnigg

Keyword(s): Machine Learning, Grain Size, Random Forest, Classification Model, Machine Learning Techniques, Support Vector, Learning Models, Cone Penetration, Tip Resistance, Machine Learning Models

The classification of soils into categories with a similar range of properties is a fundamental geotechnical engineering procedure. At present, this classification is based on various types of cost- and time-intensive laboratory and/or in situ tests. These soil investigations are essential for each individual construction site and have to be performed prior to the design of a project. Since Machine Learning could play a key role in reducing the costs and time needed for a suitable site investigation program, the basic ability of Machine Learning models to classify soils from Cone Penetration Tests (CPT) is evaluated. To find an appropriate classification model, 24 different Machine Learning models, based on three different algorithms, are built and trained on a dataset consisting of 1339 CPTs. The applied algorithms are a Support Vector Machine, an Artificial Neural Network and a Random Forest. As input features, different combinations of direct cone penetration test data (tip resistance qc, sleeve friction fs, friction ratio Rf, depth d) are used, combined with "defined" data that are not directly measured (total vertical stresses σv, effective vertical stresses σ'v and hydrostatic pore pressure u0). Standard soil classes based on grain size distributions and soil classes based on soil behavior types according to Robertson are applied as targets. The different models are compared with respect to their prediction performance and the required learning time. The best results for all targets were obtained with models using a Random Forest classifier. For the soil classes based on grain size distribution, an accuracy of about 75% was reached, and for soil classes according to Robertson, an accuracy of about 97–99%.
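The input features named above can be derived from raw CPT readings with a few standard relations: Rf = fs / qc × 100 %, u0 = γw × (depth below the water table), and σ'v = σv − u0. The sketch below is a simplified illustration and not the authors' preprocessing. It assumes a homogeneous soil column with a single unit weight, and the function name, default unit weight and water table depth are all hypothetical.

```python
GAMMA_WATER = 9.81  # unit weight of water in kN/m^3


def derived_cpt_features(qc_mpa, fs_kpa, depth_m,
                         soil_unit_weight=19.0, water_table_m=0.0):
    """Compute Rf, sigma_v, u0 and sigma'_v for one CPT reading.

    Assumes one constant soil unit weight (kN/m^3) over the whole depth,
    which real layered profiles will not satisfy.
    """
    # Friction ratio in percent; fs is converted from kPa to MPa first.
    friction_ratio = 100.0 * (fs_kpa / 1000.0) / qc_mpa
    # Total vertical stress in kPa.
    sigma_v = soil_unit_weight * depth_m
    # Hydrostatic pore pressure in kPa, zero above the water table.
    u0 = GAMMA_WATER * max(depth_m - water_table_m, 0.0)
    # Effective vertical stress in kPa.
    sigma_v_eff = sigma_v - u0
    return {"Rf": friction_ratio, "sigma_v": sigma_v,
            "u0": u0, "sigma_v_eff": sigma_v_eff}


# Hypothetical reading: qc = 5 MPa, fs = 50 kPa at 10 m depth,
# with the water table 2 m below ground level.
features = derived_cpt_features(5.0, 50.0, 10.0, water_table_m=2.0)
print(features)
```

Each reading then becomes one feature row (qc, fs, Rf, d, σv, σ'v, u0) that a classifier such as a random forest could be trained on.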

Diagnosis of Problems in Truck Ore Transport Operations in Underground Mines Using Various Machine Learning Models and Data Collected by Internet of Things Systems

Minerals, 2021, Vol 11 (10), pp. 1128. DOI: 10.3390/min11101128

Author(s): Sebeom Park, Dahee Jung, Hoang Nguyen, Yosoon Choi

Keyword(s): Machine Learning, Internet Of Things, Production Management, Classification And Regression Tree, Underground Mines, Validation Dataset, Support Vector, Learning Models, K Nearest Neighbor, Machine Learning Models

This study proposes a method for diagnosing problems in truck ore transport operations in underground mines using four machine learning models (i.e., Gaussian naïve Bayes (GNB), k-nearest neighbor (kNN), support vector machine (SVM), and classification and regression tree (CART)) and data collected by an Internet of Things system. A limestone underground mine with an applied mine production management system (using a tablet computer and Bluetooth beacon) is selected as the research area, and log data related to the truck travel time are collected. The machine learning models are trained and verified using the collected data, and grid search through 5-fold cross-validation is performed to improve the prediction accuracy of the models. The accuracy of CART is highest when the parameters leaf and split are set to 1 and 4, respectively (94.1%). In the validation of the machine learning models performed using the validation dataset (1500), the accuracy of the CART was 94.6%, and the precision and recall were 93.5% and 95.7%, respectively. In addition, it is confirmed that the F1 score reaches values as high as 94.6%. Through field application and analysis, it is confirmed that the proposed CART model can be utilized as a tool for monitoring and diagnosing the status of truck ore transport operations.
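The reported scores are related by the usual definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. The sketch below checks that the quoted precision and recall are consistent with the quoted F1 score. The confusion-matrix counts in the second part are invented for illustration, and the function names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1_score(precision, recall)}


# Precision and recall as reported in the abstract.
print(round(100 * f1_score(0.935, 0.957), 1))  # 94.6

# Hypothetical confusion counts, not taken from the study.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
print(metrics)
```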

Rice Crop Detection Using LSTM, Bi-LSTM, and Machine Learning Models from Sentinel-1 Time Series

Remote Sensing, 2020, Vol 12 (16), pp. 2655. DOI: 10.3390/rs12162655. Cited By ~ 4

Author(s): Hugo Crisóstomo de Castro Filho, Osmar Abílio de Carvalho Júnior, Osmar Luiz Ferreira de Carvalho, Pablo Pozzobon de Bem, Rebeca dos Santos de Moura, ...

Keyword(s): Machine Learning, Time Series, Rio Grande, Machine Learning Techniques, Support Vector, High Temporal Resolution, Rio Grande Do Sul, Learning Models, Free Data, Machine Learning Models

The Synthetic Aperture Radar (SAR) time series allows describing the rice phenological cycle by the backscattering time signature. Therefore, the advent of the Copernicus Sentinel-1 program expands studies of radar data (C-band) for rice monitoring at regional scales, due to the high temporal resolution and free data distribution. The Recurrent Neural Network (RNN) model has reached state-of-the-art performance in the pattern recognition of time-sequenced data, obtaining a significant advantage in crop classification on remote sensing images. One of the most used approaches in the RNN model is the Long Short-Term Memory (LSTM) model and its improvements, such as Bidirectional LSTM (Bi-LSTM). Bi-LSTM models are more effective as their output depends on the previous and the next segment, in contrast to the unidirectional LSTM models. The present research aims to map rice crops from Sentinel-1 time series (C-band) using LSTM and Bi-LSTM models in West Rio Grande do Sul (Brazil). We compared the results with traditional Machine Learning techniques: Support Vector Machines (SVM), Random Forest (RF), k-Nearest Neighbors (k-NN), and Normal Bayes (NB). The developed methodology can be subdivided into the following steps: (a) acquisition of the Sentinel time series over two years; (b) data pre-processing, with noise minimization using 3D spatial-temporal filters and smoothing with a Savitzky-Golay filter; (c) time series classification procedures; (d) accuracy analysis and comparison among the methods. The results show high overall accuracy and Kappa (>97% for all methods and metrics). Bi-LSTM was the best model, showing statistically significant differences in the McNemar test at a significance level of 0.05. However, LSTM and traditional Machine Learning models also achieved high accuracy values. The study establishes an adequate methodology for mapping the rice crops in West Rio Grande do Sul.
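The McNemar test compares two classifiers on the same samples using only the cases where they disagree: b samples that the first model gets right and the second gets wrong, and c samples for the reverse. The sketch below uses the chi-square form with a continuity correction, which is one common variant. The abstract does not say which variant was applied, and the disagreement counts here are invented for illustration.

```python
# Critical value of the chi-square distribution with 1 degree of freedom
# at a significance level of 0.05.
CHI2_CRITICAL_005_DF1 = 3.841


def mcnemar_statistic(b, c, continuity=True):
    """McNemar chi-square statistic from the two disagreement counts."""
    if b + c == 0:
        # The classifiers never disagree, so there is nothing to test.
        return 0.0
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)
    return diff ** 2 / (b + c)


# Hypothetical counts: model A right / model B wrong on 30 samples,
# model A wrong / model B right on 12 samples.
stat = mcnemar_statistic(30, 12)
print(round(stat, 3), stat > CHI2_CRITICAL_005_DF1)  # 6.881 True
```

A statistic above the critical value means the difference between the two classifiers' error patterns is unlikely to be due to chance at the 0.05 level.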

IPAssess: A Protocol-Based Fingerprinting Model for Device Identification in the IoT

2021. DOI: 10.36227/techrxiv.16815232.v1

Author(s): Siddhartha Bhattacharyya, Parth Ganeriwala, Shreya Nandanwar, Raja Muthalagu, Anubhav Gupta

Keyword(s): Machine Learning, Experimental Study, Internet Of Things, Classification Accuracy, Learning Models, Device Identification, Full Protocol, Iot Devices, Monitoring Technologies, Machine Learning Models

Internet of Things (IoT) devices are the most commonly used devices today, providing services that have become widely prevalent. With their success and growing need, the number of threats and attacks against IoT devices and services has been increasing exponentially. With the increase in knowledge of IoT-related threats and adequate monitoring technologies, the potential to detect these threats is becoming a reality. There have been various studies of fingerprinting-based approaches to device identification, but none has taken into account the full protocol spectrum. IPAssess is a novel fingerprinting-based model which takes a feature set based on the correlation between the device characteristics and the protocols and then applies various machine learning models to perform device identification and classification. We have also used aggregation and augmentation to enhance the algorithm. In our experimental study, IPAssess performs IoT device identification with a 99.6% classification accuracy.

IPAssess: A Protocol-Based Fingerprinting Model for Device Identification in the IoT

2021. DOI: 10.36227/techrxiv.16815232

Author(s): Siddhartha Bhattacharyya, Parth Ganeriwala, Shreya Nandanwar, Raja Muthalagu, Anubhav Gupta

Keyword(s): Machine Learning, Experimental Study, Internet Of Things, Classification Accuracy, Learning Models, Device Identification, Full Protocol, Iot Devices, Monitoring Technologies, Machine Learning Models

On the Performance of Machine Learning Models for Anomaly-Based Intelligent Intrusion Detection Systems for the Internet of Things

IEEE Internet of Things Journal, 2021, pp. 1-1. DOI: 10.1109/jiot.2021.3103829

Author(s): Ghada Abdelmoumin, Danda B. Rawat, Abdul Rahman

Keyword(s): Machine Learning, Internet Of Things, Intrusion Detection, Intrusion Detection Systems, The Internet, Learning Models, Detection Systems, The Internet Of Things, Machine Learning Models

Using Machine Learning to Predict the pKa of C–H Bonds. Relevance to Catalytic Methane Functionalization

2020. DOI: 10.26434/chemrxiv.12646772.v1. Cited By ~ 1

Author(s): Christopher Zhou, William Grumbles, Thomas Cundari

Keyword(s): Machine Learning, Support Vector, Learning Models, K Nearest Neighbors, Network Support, Highest Occupied Molecular Orbital, Conjugate Acid, Organometallic Catalyst, Conjugate Base, Machine Learning Models
