On-the-Go Network Establishment of IoT Devices to Meet the Need of Processing Big Data Using Machine Learning Algorithms

Recent technological advancements have led to the generation of huge volumes of data from distinct domains (scientific sensors, health care, user-generated data, financial companies, the internet, and supply chain systems) over the past decade. The term big data was coined to capture the meaning of this emerging trend. In addition to its huge volume, big data also exhibits several unique characteristics compared with traditional data. For instance, big data is generally unstructured and requires more real-time analysis. This development calls for new system platforms for data acquisition, storage, and transmission, and for large-scale data processing mechanisms. In recent years, the analytics industry's interest has expanded toward big data analytics to uncover the potential concealed in big data, such as hidden patterns or unknown correlations. The main goal of this chapter is to explore the importance of machine learning algorithms and of the computational environment, including the hardware and software, required to perform analytics on big data.

What is Machine Learning? A Primer for the Epidemiologist

American Journal of Epidemiology ◽ 10.1093/aje/kwz189 ◽ 2019 ◽ Cited By ~ 6

Author(s): Qifang Bi ◽ Katherine E Goodman ◽ Joshua Kaminsky ◽ Justin Lessler

Keyword(s): Machine Learning ◽ Big Data ◽ Computer Science ◽ Research Methods ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Machine Learning Techniques ◽ Epidemiologic Research ◽ Learning Techniques ◽ Applications Of Machine Learning

Abstract: Machine learning is a branch of computer science that has the potential to transform epidemiologic sciences. Amid a growing focus on “Big Data,” it offers epidemiologists new tools to tackle problems for which classical methods are not well-suited. In order to critically evaluate the value of integrating machine learning algorithms and existing methods, however, it is essential to address language and technical barriers between the two fields that can make it difficult for epidemiologists to read and assess machine learning studies. Here, we provide an overview of the concepts and terminology used in machine learning literature, which encompasses a diverse set of tools with goals ranging from prediction to classification to clustering. We provide a brief introduction to 5 common machine learning algorithms and 4 ensemble-based approaches. We then summarize epidemiologic applications of machine learning techniques in the published literature. We recommend approaches to incorporate machine learning in epidemiologic research and discuss opportunities and challenges for integrating machine learning and existing epidemiologic research methods.

Towards Near-Real-Time Intrusion Detection for IoT Devices using Supervised Learning and Apache Spark

Electronics ◽ 10.3390/electronics9030444 ◽ 2020 ◽ Vol 9 (3) ◽ pp. 444 ◽ Cited By ~ 1

Author(s): Valerio Morfino ◽ Salvatore Rampone

Keyword(s): Machine Learning ◽ Random Forest ◽ Learning Algorithms ◽ Hybrid Approach ◽ Cyber Attacks ◽ Machine Learning Algorithms ◽ Apache Spark ◽ Identification Accuracy ◽ Supervised Machine Learning ◽ Iot Devices

In the field of Internet of Things (IoT) infrastructure, attack and anomaly detection are rising concerns. With the increased use of IoT infrastructure in every domain, threats and attacks on these infrastructures are growing proportionally. In this paper, we compare the performance of several machine learning algorithms in identifying cyber-attacks (namely SYN-DOS attacks) on IoT systems, in terms of both application performance and training/application times. We use supervised machine learning algorithms included in the MLlib library of Apache Spark, a fast and general engine for big data processing. We show the implementation details and the performance of those algorithms on public datasets using a training set of up to 2 million instances. We adopt a Cloud environment, emphasizing the importance of scalability and elasticity of use. Results show that all the Spark algorithms used achieve very good identification accuracy (>99%). One of them, Random Forest, achieves an accuracy of 1. We also report a very short training time (23.22 sec for Decision Tree with 2 million rows). The experiments also show a very low application time (0.13 sec for more than 600,000 instances for Random Forest) using Apache Spark in the Cloud. Furthermore, the explicit model generated by Random Forest is very easy to implement using high- or low-level programming languages. In light of the results obtained, in terms of both computation times and identification performance, a hybrid approach for the detection of SYN-DOS cyber-attacks on IoT devices is proposed: the application of an explicit Random Forest model, implemented directly on the IoT device, along with a second-level analysis (training) performed in the Cloud.
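The idea of an explicit Random Forest model running on the device can be illustrated with a minimal Python sketch. It is not the authors' model: the feature names (`syn_ratio`, `ack_ratio`, `packets_per_sec`), the three trees, and every threshold are hypothetical and chosen only for illustration. The sketch assumes that a forest trained elsewhere (for example in the Cloud) has been exported as plain if/else rules, so the device needs only comparisons and a majority vote.

```python
from typing import Callable, Dict, List

# A flow is a dict of numeric features. The feature names and all thresholds
# below are illustrative assumptions, not values taken from the paper.
Flow = Dict[str, float]

ATTACK, BENIGN = 1, 0


def tree_a(flow: Flow) -> int:
    # Hypothetical exported tree: splits on SYN ratio, then on packet rate.
    if flow["syn_ratio"] > 0.8:
        return ATTACK if flow["packets_per_sec"] > 500 else BENIGN
    return BENIGN


def tree_b(flow: Flow) -> int:
    # Hypothetical exported tree: splits on packet rate, then on ACK/SYN ratios.
    if flow["packets_per_sec"] > 1000:
        return ATTACK
    if flow["ack_ratio"] < 0.1 and flow["syn_ratio"] > 0.5:
        return ATTACK
    return BENIGN


def tree_c(flow: Flow) -> int:
    # Hypothetical exported tree: splits on ACK ratio, then on SYN ratio.
    if flow["ack_ratio"] < 0.05:
        return ATTACK
    return ATTACK if flow["syn_ratio"] > 0.9 else BENIGN


# An odd number of trees avoids ties in a two-class majority vote.
FOREST: List[Callable[[Flow], int]] = [tree_a, tree_b, tree_c]


def classify(flow: Flow) -> int:
    """Return ATTACK if more than half of the trees vote ATTACK."""
    attack_votes = sum(tree(flow) for tree in FOREST)
    return ATTACK if 2 * attack_votes > len(FOREST) else BENIGN


if __name__ == "__main__":
    suspicious = {"syn_ratio": 0.95, "ack_ratio": 0.01, "packets_per_sec": 2000.0}
    normal = {"syn_ratio": 0.10, "ack_ratio": 0.60, "packets_per_sec": 40.0}
    print(classify(suspicious), classify(normal))
```

Because each tree is a handful of comparisons, the same rules could be transcribed into C or another low-level language for a constrained device, while retraining and regenerating the rules would remain a Cloud-side task under the hybrid approach described above.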

Machine Learning Methods in Precision Medicine Targeting Epigenetic Diseases

Current Pharmaceutical Design ◽ 10.2174/1381612824666181112114228 ◽ 2019 ◽ Vol 24 (34) ◽ pp. 3998-4006

Author(s): Shijie Fan ◽ Yu Chen ◽ Cheng Luo ◽ Fanwang Meng

Keyword(s): Machine Learning ◽ Big Data ◽ Precision Medicine ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Learning Methods ◽ Advantages And Disadvantages ◽ Machine Learning Methods ◽ Accelerated Studies ◽ Applications Of Machine Learning

Background: On a tide of big data, machine learning is coming into its own. Given the huge amounts of epigenetic data coming from biological experiments and the clinic, machine learning can help in detecting epigenetic features in the genome, finding correlations between phenotypes and modifications in histones or genes, accelerating the screening of lead compounds targeting epigenetic diseases, and many other aspects of the study of epigenetics, which in turn advances the hope of precision medicine. Methods: In this minireview, we focus on the fundamentals and applications of machine learning methods that are regularly used in the epigenetics field and explain their features. Their advantages and disadvantages are also discussed. Results: Machine learning algorithms have accelerated studies in precision medicine targeting epigenetic diseases. Conclusion: To make full use of machine learning algorithms, one should become familiar with their pros and cons, so as to benefit from big data by choosing the most suitable method(s).
