VariantSpark, A Random Forest Machine Learning Implementation for Ultra High Dimensional Data

Mapping Intimacies ◽

10.1101/702902 ◽

2019 ◽

Cited By ~ 1

Author(s):

Arash Bayat ◽

Piotr Szul ◽

Aidan R. O’Brien ◽

Robert Dunne ◽

Oscar J. Luo ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Internet Of Things ◽

Random Forests ◽

Life Sciences ◽

High Dimensional ◽

The Internet ◽

Machine Learning Methods ◽

High Dimensional Datasets ◽

The Internet Of Things

Abstract: The demands on machine learning methods to handle ultra-high-dimensional datasets, that is, datasets with millions of features, have been increasing in domains such as the life sciences and the Internet of Things (IoT). While Random Forests are suitable for “wide” datasets, current implementations such as Google’s PLANET cannot scale to such dimensions. Recent improvements by Yggdrasil begin to address these limitations but do not extend to Random Forests. This paper introduces CursedForest, a novel Random Forest implementation built on Apache Spark as part of the VariantSpark platform, which parallelises the processing of all nodes across the entire forest. CursedForest is 9 and up to 89 times faster than Google’s PLANET and Yggdrasil, respectively, and is the first method capable of scaling to millions of features.
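The abstract does not describe CursedForest's internals, so the following is only a minimal sketch of the per-node step any Random Forest must scale for wide data: scoring candidate splits on many features by Gini impurity. The 0/1/2 genotype-style encoding, the feature-major (one list per feature) layout, and all function names are assumptions for illustration; this is not VariantSpark code.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_for_feature(column: List[int], labels: List[int]) -> Tuple[float, Optional[int]]:
    """Best threshold t for the split `value <= t` vs `value > t` on one feature.

    Returns (weighted Gini impurity, threshold); the threshold is None when no
    split improves on the parent node's impurity.
    """
    n = len(labels)
    best: Tuple[float, Optional[int]] = (gini(labels), None)
    for t in sorted(set(column))[:-1]:  # the largest value cannot split anything off
        left = [y for x, y in zip(column, labels) if x <= t]
        right = [y for x, y in zip(column, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[0]:
            best = (score, t)
    return best


def best_split(columns: List[List[int]], labels: List[int],
               candidate_features: List[int]) -> Tuple[int, Tuple[float, Optional[int]]]:
    """Score each candidate feature independently, then keep the lowest impurity.

    Because every feature is scored on its own, this loop is the part a
    distributed implementation could spread across workers (hypothetical design).
    """
    results: Dict[int, Tuple[float, Optional[int]]] = {
        f: best_split_for_feature(columns[f], labels) for f in candidate_features
    }
    feature = min(results, key=lambda f: results[f][0])
    return feature, results[feature]
```

In a Random Forest, `candidate_features` would be a random subset of feature indices drawn afresh at each node; with millions of features, how that scoring work is partitioned across nodes and trees is what determines scalability.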

Social Media Analytics of the Internet of Things

10.21203/rs.3.rs-647683/v1 ◽

2021 ◽

Author(s):

Jim Scheibmeir ◽

Yashwant K. Malaiya

Keyword(s):

Machine Learning ◽

Social Media ◽

Internet Of Things ◽

Smart Cities ◽

The Internet ◽

Social Media Analytics ◽

Learning Methods ◽

Machine Learning Methods ◽

Internet Of Things Technology ◽

The Internet Of Things

Abstract: The Internet of Things technology offers convenience and innovation in areas such as smart homes and smart cities. Internet of Things solutions require careful device management and mitigation of potential vulnerabilities within cyber-physical systems. The Internet of Things concept, its implementations, and its applications are frequently discussed on social media platforms. This article illuminates the public view of the Internet of Things through a content-based analysis of contemporary conversations on the Twitter platform. Tweets can be analyzed with machine learning methods to condense the volume and variety of conversations into predictive and descriptive models. We reviewed 684,503 tweets collected over a two-week period. Using supervised and unsupervised machine learning methods, we identified interconnecting relationships between trending themes and the most-mentioned industries. We identified characteristics of language sentiment that can help predict popularity within IoT conversations. We found healthcare to be the leading use-case industry for IoT implementations, which is not surprising given that the current Covid-19 pandemic is driving significant social media discussion. There was an alarming dearth of conversations about cybersecurity: only 12% of the tweets relating to the Internet of Things mentioned topics such as encryption, vulnerabilities, or risk, among other cybersecurity-related terms.

Usages of Spark Framework with Different Machine Learning Algorithms

Computational Intelligence and Neuroscience ◽

10.1155/2021/1896953 ◽

2021 ◽

Vol 2021 ◽

pp. 1-7

Author(s):

Mohamed Ali Mohamed ◽

Ibrahim Mahmoud El-henawy ◽

Ahmad Salah

Keyword(s):

Machine Learning ◽

Big Data ◽

Internet Of Things ◽

Large Data ◽

Machine Learning Algorithms ◽

Smart Devices ◽

The Internet ◽

Learning Methods ◽

Machine Learning Methods ◽

The Internet Of Things

Sensors, satellites, mobile devices, social media, e-commerce, and the Internet, among others, saturate us with data. The Internet of Things, in particular, enables massive amounts of data to be generated more quickly. The Internet of Things is a term that describes the process of connecting computers, smart devices, and other data-generating equipment to a network and transmitting data. As a result, data is produced and updated regularly to reflect changes in all areas and activities. This exponential growth of data has given rise to a new term and idea known as big data. Big data analysis is required to illuminate the relationships between things, forecast future trends, and provide more information to decision-makers. The major problem at present, however, is how to effectively collect and evaluate massive amounts of diverse and complicated data. In some sectors and applications, machine learning models are the most frequently used methods for interpreting and analyzing data and extracting important information. On their own, however, traditional machine learning methods cannot handle big data problems effectively. This article introduces the Spark architecture as a platform that machine learning methods can use to address the design and execution challenges of big data systems. It focuses on three types of machine learning task, namely regression, classification, and clustering, and how they can be applied on top of the Spark platform.
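Spark's MLlib provides its own regression, classification, and clustering estimators; the sketch below does not use Spark at all. It only illustrates, in plain Python, the pattern that lets a platform like Spark fit a model over partitioned data: each partition is summarised independently (a "map"), the additive summaries are combined (a "reduce"), and the model is solved from the totals. Simple one-feature least-squares regression is chosen as the example; the function names and the list-of-lists stand-in for partitions are assumptions.

```python
from functools import reduce
from typing import Iterable, List, Tuple

# Sufficient statistics for simple linear regression: (n, sum_x, sum_y, sum_xx, sum_xy)
Stats = Tuple[float, float, float, float, float]


def partition_stats(rows: Iterable[Tuple[float, float]]) -> Stats:
    """'Map' step: summarise one partition of (x, y) pairs."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def merge(a: Stats, b: Stats) -> Stats:
    """'Reduce' step: summaries are additive, so partitions combine in any order."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3], a[4] + b[4])


def fit_simple_regression(partitions: List[List[Tuple[float, float]]]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n, sx, sy, sxx, sxy = reduce(merge, map(partition_stats, partitions))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept
```

Only the small per-partition summaries need to move between machines, which is roughly why this shape of computation maps well onto a cluster.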

Social media analytics of the Internet of Things

Discover Internet of Things ◽

10.1007/s43926-021-00016-5 ◽

2021 ◽

Vol 1 (1) ◽

Author(s):

Jim A. Scheibmeir ◽

Yashwant K. Malaiya

Keyword(s):

Machine Learning ◽

Social Media ◽

Internet Of Things ◽

Smart Cities ◽

The Internet ◽

Social Media Analytics ◽

Learning Methods ◽

Machine Learning Methods ◽

Internet Of Things Technology ◽

The Internet Of Things

Abstract: The Internet of Things technology offers convenience and innovation in areas such as smart homes and smart cities. Internet of Things solutions require careful device management and mitigation of potential vulnerabilities within cyber-physical systems. The Internet of Things concept, its implementations, and its applications are frequently discussed on social media platforms. This research illuminates the public view of the Internet of Things through a content-based and network analysis of contemporary conversations on the Twitter platform. Tweets can be analyzed with machine learning methods to condense the volume and variety of conversations into predictive and descriptive models. We reviewed 684,503 tweets collected over a 2-week period. Using supervised and unsupervised machine learning methods, we identified trends within IoT and the interconnecting relationships among the most-mentioned industries. We identified characteristics of language sentiment that can help predict the popularity of IoT conversation topics. We found healthcare to be the leading use-case industry for IoT implementations, which is not surprising given that the current COVID-19 pandemic is driving significant social media discussion. There was an alarming dearth of conversations about cybersecurity. Recent breaches and ransomware events indicate that organizations should spend more time communicating about risks and mitigations. Only 12% of the tweets relating to the Internet of Things mentioned topics such as encryption, vulnerabilities, or risk, among other cybersecurity-related terms. We propose an IoT Cybersecurity Communication Scorecard to help organizations benchmark the density and sentiment of their corporate communications about security against their specific industry.
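The 12% figure is a share of tweets containing at least one cybersecurity-related term, and a density measure of this kind underlies the proposed scorecard. A minimal sketch of that counting step, assuming a hypothetical term list (the abstract names only "encryption, vulnerabilities, or risk" and does not publish the full lexicon), could be:

```python
import re
from typing import Iterable

# Hypothetical lexicon for illustration; the study's actual term list is not given.
SECURITY_TERMS = ["encryption", "vulnerability", "vulnerabilities", "risk",
                  "cybersecurity", "ransomware", "breach"]

# Whole-word, case-insensitive match against any term.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, SECURITY_TERMS)) + r")\b",
                      re.IGNORECASE)


def mentions_security(text: str) -> bool:
    """True if the text contains at least one security term as a whole word."""
    return _PATTERN.search(text) is not None


def security_share(tweets: Iterable[str]) -> float:
    """Fraction of tweets mentioning any security term (0.0 for no tweets)."""
    tweets = list(tweets)
    if not tweets:
        return 0.0
    return sum(mentions_security(t) for t in tweets) / len(tweets)
```

Whole-word matching means "risky" is not counted as "risk"; a real analysis would need to decide on stemming and hashtag handling, which this sketch leaves out.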

Substantiation of the relevance of ensuring the information security of the Internet of things networks through the development of machine learning methods

Information Security Questions ◽

10.52190/2073-2600_2021_3_34 ◽

2021 ◽

pp. 34-39

Author(s):

D. S. Karnuta

Keyword(s):

Machine Learning ◽

Information Security ◽

Internet Of Things ◽

The Internet ◽

Learning Methods ◽

Machine Learning Methods ◽

The Internet Of Things

Design of Machine Learning Prediction System Based on the Internet of Things Framework for Monitoring Fine PM Concentrations

Environments ◽

10.3390/environments8100099 ◽

2021 ◽

Vol 8 (10) ◽

pp. 99

Author(s):

Shun-Yuan Wang ◽

Wen-Bin Lin ◽

Yu-Chieh Shu

Keyword(s):

Machine Learning ◽

Air Pollution ◽

Particulate Matter ◽

Random Forest ◽

Internet Of Things ◽

Random Forest Model ◽

The Internet ◽

Learning Models ◽

Forest Model ◽

The Internet Of Things

In this study, a mobile air pollution sensing unit based on the Internet of Things framework was designed to monitor the concentration of fine particulate matter in three urban areas. The unit was developed using the NodeMCU-32S microcontroller, the PMS5003-G5 particulate matter sensing module, and the Ublox NEO-6M V2 GPS positioning module. It transmits particulate matter concentrations and the coordinates of polluted locations to a backend server over 3G and 4G telecommunication networks for data collection. This system will complement the government’s PM2.5 data acquisition system. Mobile monitoring stations meet the air pollution monitoring needs of areas that require special observation. For example, an AIoT development system will be installed at intersections with heavy traffic, where it can serve as a reference for government transportation or environmental inspection departments in monitoring environmental quality or dispersing traffic flow. Furthermore, the particulate matter distributions in three areas, namely the Xinzhuang, Sanchong, and Luzhou Districts of New Taipei City, Taiwan, were estimated using machine learning models, data from stationary monitoring stations, and measurements from the mobile sensing system proposed in this study. Four types of learning models were trained (decision tree, random forest, multilayer perceptron, and radial basis function neural network), and their predictions were evaluated. With root mean square error (RMSE) as the performance indicator, the random forest model outperformed the other models on both the training and testing sets. To examine the generalizability of the learning models, they were verified against data measured on three days: 15 February, 28 February, and 1 March 2019.
A comparison of the predicted and measured data indicates that the random forest model provides the most stable and accurate predictions and clearly shows the distribution of highly polluted areas. The model results are visualized as maps in a web application, allowing users to understand the distribution of polluted areas intuitively.
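The models are ranked by root mean square error, $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}$. A minimal sketch of that comparison follows; the model names and numbers in the usage test are made up for illustration and are not results from the study.

```python
import math
from typing import Dict, Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error between two equal-length, non-empty sequences."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def best_model(predictions: Dict[str, Sequence[float]], observed: Sequence[float]) -> str:
    """Name of the model whose predictions have the lowest RMSE."""
    return min(predictions, key=lambda name: rmse(predictions[name], observed))
```

Because errors are squared before averaging, RMSE penalises a few large misses more than many small ones, which matters when the goal is to locate highly polluted spots.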

Time-Aware Detection Systems

Proceedings ◽

10.3390/proceedings2019021039 ◽

2019 ◽

Vol 21 (1) ◽

pp. 39

Author(s):

Manuel López-Vizcaíno ◽

Laura Vigoya ◽

Fidel Cacheda ◽

Francisco J. Novoa

Keyword(s):

Machine Learning ◽

Communication Network ◽

Internet Of Things ◽

The Internet ◽

Network Data ◽

Detection Systems ◽

Machine Learning Methods ◽

Public Datasets ◽

The Internet Of Things ◽

Time Aware

Communication network data has grown over the last decades, and the spread of the Internet of Things (IoT) has accelerated this growth. The number of attacks on this kind of infrastructure has also increased as its relevance grows. As a result, it is vital to guarantee an adequate level of security and to detect threats as early as possible. Classical methods emphasise detection but do not take into account the number of records needed to identify an attack successfully. To address this, time-aware techniques can be used for both detection and measurement. In this work, well-known machine learning methods will be explored to detect attacks using public datasets. To assess performance, classic metrics will be used, and the number of elements processed will also be taken into account to determine a time-aware performance of each method.
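The abstract does not define its time-aware metric, so the following is only a hypothetical sketch of the idea it describes: alongside classic metrics, count how many records a detector must process after an attack begins before it first flags one. The function name and the 0/1 flag encoding are assumptions.

```python
from typing import Optional, Sequence


def records_until_detection(flags: Sequence[int], attack_start: int) -> Optional[int]:
    """Count records processed from `attack_start` up to and including the first
    record the detector flags as an attack (flag == 1).

    Returns None if the attack is never flagged.
    """
    for count, flag in enumerate(flags[attack_start:], start=1):
        if flag == 1:
            return count
    return None
```

Two detectors with identical accuracy can differ sharply on this count, which is the gap that classic metrics alone do not capture.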

Detecting Abnormal Behavior of an IoT Device in the Network Based on a Traffic Model

Telecom IT ◽

10.31854/2307-1303-2019-7-3-50-55 ◽

2019 ◽

Vol 7 (3) ◽

pp. 50-55

Author(s):

D. Saharov ◽

D. Kozlov

Keyword(s):

Machine Learning ◽

Internet Of Things ◽

Mobile Networks ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Abnormal Behavior ◽

Traffic Model ◽

The Internet ◽

Wide Spread ◽

The Internet Of Things

The article deals with the CoAP protocol, which regulates the transmission and reception of information traffic by terminal devices in IoT networks. It describes a model for detecting abnormal traffic in 5G/IoT networks using machine learning algorithms, as well as the main methods for solving this problem. The article is relevant because of the widespread adoption of the Internet of Things and the upcoming transition of mobile networks to 5G.
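The abstract does not specify the traffic model, so the following substitutes a deliberately simple one to illustrate the general idea of traffic-model-based anomaly detection: learn the mean and standard deviation of per-window CoAP message counts from normal traffic, then flag windows that deviate by more than `k` standard deviations. The windowing, the threshold `k = 3`, and the function names are assumptions, not the article's method.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_baseline(counts: Sequence[float]) -> Tuple[float, float]:
    """Learn mean and population standard deviation of per-window message counts
    observed during normal operation."""
    return statistics.mean(counts), statistics.pstdev(counts)


def flag_anomalies(counts: Sequence[float], mean: float, std: float,
                   k: float = 3.0) -> List[int]:
    """Indices of windows whose count lies more than k standard deviations from the mean."""
    if std == 0:
        # Perfectly constant baseline: any deviation at all is unusual.
        return [i for i, c in enumerate(counts) if c != mean]
    return [i for i, c in enumerate(counts) if abs(c - mean) / std > k]
```

A z-score baseline like this only catches volume anomalies; a machine learning model can also use other traffic features such as message types, sizes, and timing.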

Understanding and personalising smart city services using machine learning, The Internet-of-Things and Big Data

2017 IEEE 26th International Symposium on Industrial Electronics (ISIE) ◽

10.1109/isie.2017.8001570 ◽

2017 ◽

Cited By ~ 17

Author(s):

Jeannette Chin ◽

Vic Callaghan ◽

Ivan Lam

Keyword(s):

Machine Learning ◽

Big Data ◽

Internet Of Things ◽

Smart City ◽

The Internet ◽

The Internet Of Things

Early Botnet Detection for the Internet and the Internet of Things by Autonomous Machine Learning

2020 16th International Conference on Mobility, Sensing and Networking (MSN) ◽

10.1109/msn50589.2020.00087 ◽

2020 ◽

Author(s):

Anderson Bergamini de Neira ◽

Alex Medeiros Araujo ◽

Michele Nogueira

Keyword(s):

Machine Learning ◽

Internet Of Things ◽

The Internet ◽

Botnet Detection ◽

Autonomous Machine ◽

The Internet Of Things

Improving Internet of Things Parking Systems

Facta Universitatis Series Automatic Control and Robotics ◽

10.22190/fuacr2003163b ◽

2021 ◽

Vol 19 (3) ◽

pp. 163

Author(s):

Dušan Bogićević

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Cloud Computing ◽

Internet Of Things ◽

Information Needs ◽

Vehicle Classification ◽

The Internet ◽

Parking Space ◽

Automated Systems ◽

The Internet Of Things

Edge data processing represents the new evolution of the Internet and cloud computing. Its application to the Internet of Things (IoT) is a step toward faster processing of sensor information and better performance. In automated systems, a large number of sensors produce information that must be processed and acted upon in the shortest possible time. The paper describes how Artificial Intelligence can be applied on edge devices, using the example of finding a parking space for a vehicle and directing the vehicle based on the segment it belongs to. A machine learning algorithm classifies vehicles based on their dimensions.
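The abstract does not name the classification algorithm, so the following uses a nearest-centroid classifier as a stand-in that is cheap enough to run on an edge device. It assigns a vehicle to a segment from its (length, width) in metres. The segment names and centroid values are hypothetical placeholders, not values from the paper.

```python
import math
from typing import Dict, Tuple

# Hypothetical segment centroids as (length_m, width_m); not taken from the paper.
CENTROIDS: Dict[str, Tuple[float, float]] = {
    "small": (3.7, 1.65),
    "medium": (4.5, 1.8),
    "large": (5.2, 1.95),
}


def classify_vehicle(length_m: float, width_m: float,
                     centroids: Dict[str, Tuple[float, float]] = CENTROIDS) -> str:
    """Return the segment whose centroid is closest (Euclidean distance) to the vehicle."""
    return min(centroids, key=lambda seg: math.dist((length_m, width_m), centroids[seg]))
```

The predicted segment could then select which parking area the vehicle is directed to. Because length varies more than width, a real system would likely scale the features before computing distances.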
