VariantSpark, A Random Forest Machine Learning Implementation for Ultra High Dimensional Data

2019 ◽  
Author(s):  
Arash Bayat ◽  
Piotr Szul ◽  
Aidan R. O’Brien ◽  
Robert Dunne ◽  
Oscar J. Luo ◽  
...  

Abstract: The demands on machine learning methods to cater for ultra high dimensional datasets (datasets with millions of features) have been increasing in domains like life sciences and the Internet of Things (IoT). While Random Forests are suitable for “wide” datasets, current implementations such as Google’s PLANET lack the ability to scale to such dimensions. Recent improvements by Yggdrasil begin to address these limitations but do not extend to Random Forest. This paper introduces CursedForest, a novel Random Forest implementation on top of Apache Spark and part of the VariantSpark platform, which parallelises processing of all nodes over the entire forest. CursedForest is 9 and up to 89 times faster than Google’s PLANET and Yggdrasil, respectively, and is the first method capable of scaling to millions of features.

2021 ◽  
Author(s):  
Jim Scheibmeir ◽  
Yashwant K. Malaiya

Abstract: The Internet of Things technology offers convenience and innovation in areas such as smart homes and smart cities. Internet of Things solutions require careful management of devices and the risk mitigation of potential vulnerabilities within cyber-physical systems. The Internet of Things concept, its implementations, and applications are frequently discussed on social media platforms. This article illuminates the public view of the Internet of Things through a content-based analysis of contemporary conversations occurring on the Twitter platform. Tweets can be analyzed with machine learning methods to condense the volume and variety of conversations into predictive and descriptive models. We have reviewed 684,503 tweets collected in a two-week period. Using supervised and unsupervised machine learning methods, we have identified interconnecting relationships between trending themes and the most mentioned industries. We have identified characteristics of language sentiment which can help to predict popularity within the realm of IoT conversation. We found healthcare to be the leading use-case industry for IoT implementations. This is not surprising as the current Covid-19 pandemic is driving significant social media discussions. There was an alarming dearth of conversations about cybersecurity. Only 12% of the tweets relating to the Internet of Things contained any mention of topics such as encryption, vulnerabilities, or risk, among other cybersecurity-related terms.


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Mohamed Ali Mohamed ◽  
Ibrahim Mahmoud El-henawy ◽  
Ahmad Salah

Sensors, satellites, mobile devices, social media, e-commerce, and the Internet, among others, saturate us with data. The Internet of Things, in particular, enables massive amounts of data to be generated more quickly. The Internet of Things is a term that describes the process of connecting computers, smart devices, and other data-generating equipment to a network and transmitting data. As a result, data is produced and updated on a regular basis to reflect changes in all areas and activities. As a consequence of this exponential growth of data, a new term and idea known as big data has been coined. Big data is required to illuminate the relationships between things, forecast future trends, and provide more information to decision-makers. The major problem at present, however, is how to effectively collect and evaluate massive amounts of diverse and complicated data. In some sectors or applications, machine learning models are the most frequently utilized methods for interpreting and analyzing data and obtaining important information. On their own, traditional machine learning methods are unable to successfully handle big data problems. This article gives an introduction to Spark architecture as a platform that machine learning methods may utilize to address issues regarding the design and execution of big data systems. This article focuses on three types of machine learning task (regression, classification, and clustering) and how they can be applied on top of the Spark platform.


2021 ◽  
Vol 1 (1) ◽  
Author(s):  
Jim A. Scheibmeir ◽  
Yashwant K. Malaiya

Abstract: The Internet of Things technology offers convenience and innovation in areas such as smart homes and smart cities. Internet of Things solutions require careful management of devices and the risk mitigation of potential vulnerabilities within cyber-physical systems. The Internet of Things concept, its implementations, and applications are frequently discussed on social media platforms. This research illuminates the public view of the Internet of Things through a content-based and network analysis of contemporary conversations occurring on the Twitter platform. Tweets can be analyzed with machine learning methods to condense the volume and variety of conversations into predictive and descriptive models. We have reviewed 684,503 tweets collected in a 2-week period. Using supervised and unsupervised machine learning methods, we have identified trends within the realm of IoT and their interconnecting relationships with the most mentioned industries. We have identified characteristics of language sentiment which can help to predict the popularity of IoT conversation topics. We found healthcare to be the leading use-case industry for IoT implementations. This is not surprising as the current COVID-19 pandemic is driving significant social media discussions. There was an alarming dearth of conversations about cybersecurity. Recent breaches and ransomware events indicate that organizations should spend more time communicating about risks and mitigations. Only 12% of the tweets relating to the Internet of Things contained any mention of topics such as encryption, vulnerabilities, or risk, among other cybersecurity-related terms. We propose an IoT Cybersecurity Communication Scorecard to help organizations benchmark the density and sentiment of their corporate communications regarding security against their specific industry.


Environments ◽  
2021 ◽  
Vol 8 (10) ◽  
pp. 99
Author(s):  
Shun-Yuan Wang ◽  
Wen-Bin Lin ◽  
Yu-Chieh Shu

In this study, a mobile air pollution sensing unit based on the Internet of Things framework was designed for monitoring the concentration of fine particulate matter in three urban areas. This unit was developed using the NodeMCU-32S microcontroller, PMS5003-G5 (particulate matter sensing module), and Ublox NEO-6M V2 (GPS positioning module). The sensing unit transmits the particulate matter concentration and the coordinates of a polluted location to the backend server through 3G and 4G telecommunication networks for data collection. This system will complement the government’s PM2.5 data acquisition system. Mobile monitoring stations meet the air pollution monitoring needs of some areas that require special observation. For example, an AIoT development system will be installed at intersections with intensive traffic, where it can be used as a reference for government transportation departments or environmental inspection departments for environmental quality monitoring or evacuation of traffic flow. Furthermore, the particulate matter distributions in three areas, namely Xinzhuang, Sanchong, and Luzhou Districts, which are all in New Taipei City of Taiwan, were estimated using machine learning models, the data of stationary monitoring stations, and the measurements of the mobile sensing system proposed in this study. Four types of learning models were trained, namely the decision tree, random forest, multilayer perceptron, and radial basis function neural network, and their prediction results were evaluated. The root mean square error was used as the performance indicator, and the learning results indicate that the random forest model outperforms the other models for both the training and testing sets. To examine the generalizability of the learning models, the models were verified against data measured on three days: 15 February, 28 February, and 1 March 2019. A comparison between the model-predicted and measured data indicates that the random forest model provides the most stable and accurate prediction values and could clearly present the distribution of highly polluted areas. The results of these models are visualized in the form of maps by using a web application. The maps allow users to understand the distribution of polluted areas intuitively.
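
The root mean square error used as the performance indicator above can be illustrated with a minimal sketch. The readings and the two model outputs below are invented for illustration and are not data from the study; the names `rmse`, `model_a`, and `model_b` are likewise assumptions.

```python
import math


def rmse(predicted, measured):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical PM2.5 readings, invented for illustration only.
measured = [10.0, 20.0, 30.0, 40.0]
model_a = [12.0, 18.0, 33.0, 41.0]  # hypothetical output of one model
model_b = [15.0, 25.0, 25.0, 45.0]  # hypothetical output of another model

# Score every model against the same measurements; lower RMSE is better.
scores = {
    "model_a": rmse(model_a, measured),
    "model_b": rmse(model_b, measured),
}
best = min(scores, key=scores.get)
```

Ranking models by the lowest score on held-out measurements mirrors the kind of comparison the abstract describes, although the study's actual evaluation procedure is not detailed here.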


Proceedings ◽  
2019 ◽  
Vol 21 (1) ◽  
pp. 39
Author(s):  
Manuel López-Vizcaíno ◽  
Laura Vigoya ◽  
Fidel Cacheda ◽  
Francisco J. Novoa

Communication network data has been growing over the last decades, and with the generalisation of the Internet of Things (IoT) its growth has accelerated. The number of attacks on this kind of infrastructure has also increased due to the relevance it is gaining. As a result, it is vital to guarantee an adequate level of security and to detect threats as soon as possible. Classical methods emphasise detection but do not take into account the number of records needed to successfully identify an attack. To address this, time-aware techniques for both detection and measurement may be used. In this work, well-known machine learning methods will be explored to detect attacks based on public datasets. To assess performance, classic metrics will be used, but the number of elements processed will also be taken into account in order to determine a time-aware performance of the method.


Telecom IT ◽  
2019 ◽  
Vol 7 (3) ◽  
pp. 50-55
Author(s):  
D. Saharov ◽  
D. Kozlov

The article deals with the CoAP protocol that regulates the transmission and reception of information traffic by terminal devices in IoT networks. The article describes a model for detecting abnormal traffic in 5G/IoT networks using machine learning algorithms, as well as the main methods for solving this problem. The relevance of the article is due to the wide spread of the Internet of Things and the upcoming update of mobile networks to the 5G generation.


2021 ◽  
Vol 19 (3) ◽  
pp. 163
Author(s):  
Dušan Bogićević

Edge data processing represents the new evolution of the Internet and Cloud computing. Its application to the Internet of Things (IoT) is a step towards faster processing of information from sensors for better performance. Automated systems contain a large number of sensors, whose information needs to be processed and acted upon in the shortest possible time. The paper describes the possibility of applying Artificial Intelligence on Edge devices using the example of finding a parking space for a vehicle and directing it based on the segment the vehicle belongs to. A machine learning algorithm is used to classify vehicles based on their dimensions.
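
Classifying a vehicle into a segment from its dimensions can be sketched as follows. The abstract does not name the algorithm used, so a nearest-centroid rule is swapped in here purely for illustration; the segment names and centroid values in `SEGMENTS` are hypothetical.

```python
import math

# Hypothetical segment centroids as (length_m, width_m, height_m).
# These values are invented for illustration, not taken from the paper.
SEGMENTS = {
    "small": (3.6, 1.6, 1.5),
    "medium": (4.5, 1.8, 1.5),
    "large": (5.2, 2.0, 1.9),
}


def classify(dimensions):
    """Return the segment whose centroid is nearest in Euclidean distance."""
    return min(SEGMENTS, key=lambda name: math.dist(SEGMENTS[name], dimensions))


# A vehicle measured by the sensors, again with made-up numbers.
segment = classify((5.0, 1.95, 1.8))
```

A rule this small is cheap enough to evaluate on an edge device, which is the setting the abstract describes; a trained model would replace the fixed centroids in practice.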

