Big data Predictive Analytics for Apache Spark using Machine Learning

This chapter gives an overview of the wide range of Big Data approaches and technologies in use today. It examines the data features of Volume, Velocity, and Variety against new database technologies. It explores the complexity of data types; methodologies of storage, access, and computation; current and emerging trends in data analysis; and methods of extracting value from data. It aims to address the need for clarity regarding the future of RDBMS and the newer systems. It also highlights ways in which actionable insights can be built into public sector domains, through Machine Learning, Data Mining, Predictive Analytics, and other methods.

Big Data Analytics and Machine Learning Paradigm: Predictive Analytics in the Healthcare Sector

Intelligent System Algorithms and Applications in Science and Technology ◽

10.1201/9781003187059-2 ◽

2021 ◽

pp. 3-17

Author(s):

Pratiyush Guleria

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Analytics ◽

Predictive Analytics ◽

Big Data Analytics ◽

Healthcare Sector ◽

Learning Paradigm

DATA SCIENCE, BIG DATA, AND PREDICTIVE ANALYTICS: A PLATFORM FOR CYBERSPACE SECURITY INTELLIGENCE

Jurnal Pertahanan & Bela Negara ◽

10.33172/jpbh.v7i2.192 ◽

2017 ◽

Vol 7 (2) ◽

Author(s):

Dicky R. M. Nainggolan

Keyword(s):

Machine Learning ◽

Big Data ◽

Cyber Security ◽

High Speed ◽

Data Science ◽

Predictive Analytics ◽

Cyber Attacks ◽

Artificial Intelligence Technology ◽

Science Methodology ◽

Cyberspace Security

Abstract – Data are prominent elements of scientific research and approaches. Data science methodology is used to select and prepare large amounts of data for further processing and analysis. Big data technology collects vast amounts of data from many sources in order to exploit the information, visualise trends, or discover phenomena in the past, present, or future with high-speed processing capability. Predictive analytics provides in-depth analytical insights, and the emergence of machine learning brings data analytics to a higher level by processing raw data with artificial intelligence technology. Predictive analytics and machine learning produce visual reports for decision makers and stakeholders. Regarding cyberspace security, big data promises opportunities to prevent and detect advanced cyber-attacks by using internal and external security data. Keywords: Big Data, Cyber Security, Data Science, Intelligence, Predictive Analytics

SAINS DATA, BIG DATA, DAN ANALISIS PREDIKTIF: SEBUAH LANDASAN UNTUK KECERDASAN KEAMANAN SIBER

Jurnal Pertahanan & Bela Negara ◽

10.33172/jpbh.v7i2.187 ◽

2017 ◽

Vol 7 (2) ◽

Author(s):

Dicky R. M. Nainggolan

Keyword(s):

Machine Learning ◽

Big Data ◽

Cyber Security ◽

High Speed ◽

Data Science ◽

Predictive Analytics ◽

Cyber Attacks ◽

Artificial Intelligence Technology ◽

Science Methodology ◽

Many Sources

Abstract – Data are prominent elements of scientific research and approaches. Data science methodology is used to select and prepare large amounts of data for further processing and analysis. Big data technology collects vast amounts of data from many sources in order to exploit the information, visualise trends, or discover phenomena in the past, present, or future with high-speed processing capability. Predictive analytics provides in-depth analytical insights, and the emergence of machine learning brings data analytics to a higher level by processing raw data with artificial intelligence technology. Predictive analytics and machine learning produce visual reports for decision makers and stakeholders. Regarding cyberspace security, big data promises opportunities to prevent and detect advanced cyber-attacks by using internal and external security data. Keywords: big data, cyber security, data science, intelligence, predictive analytics

Influence of AI and Machine Learning in Insurance Sector

10.31234/osf.io/un2bc ◽

2022 ◽

Author(s):

Nitin Prajapati

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Processing ◽

Quantitative Research ◽

Predictive Analytics ◽

Big Data Analytics ◽

Insurance Companies ◽

Big Data Processing ◽

Insurance Sector ◽

Sector Analysis

The aim of this research is to identify the influence, usage, and benefits of AI (Artificial Intelligence) and ML (Machine Learning) using big data analytics in the insurance sector. The insurance sector is the most volatile industry because of multiple influences such as Brexit, the COVID-19 pandemic, climate change, and volcanic disruptions. This research paper explores the potential scope and use cases for AI, ML, and big data processing in the insurance sector for automated claim processing, fraud prevention, predictive analytics, and trend analysis of possible causes of business losses or benefits. An empirical quantitative research method is used to verify the model with a sample from a UK insurance sector analysis. The research concludes with practical insights for insurance companies using AI, ML, big data processing, and cloud computing to improve client satisfaction, predictive analysis, and trend analysis.

On Scalability of Distributed Machine Learning with Big Data on Apache Spark

Big Data – BigData 2018 - Lecture Notes in Computer Science ◽

10.1007/978-3-319-94301-5_16 ◽

2018 ◽

pp. 209-219

Author(s):

Ameen Abdel Hai ◽

Babak Forouraghi

Keyword(s):

Machine Learning ◽

Big Data ◽

Apache Spark ◽

Distributed Machine Learning

Development of Big Data Predictive Analytics Model for Disease Prediction using Machine learning Technique

Journal of Medical Systems ◽

10.1007/s10916-019-1398-y ◽

2019 ◽

Vol 43 (8) ◽

Cited By ~ 5

Author(s):

R. Venkatesh ◽

C. Balasubramanian ◽

M. Kaliappan

Keyword(s):

Machine Learning ◽

Big Data ◽

Predictive Analytics ◽

Disease Prediction ◽

Machine Learning Technique ◽

Learning Technique

Impact of Interventional Policies Including Vaccine on COVID-19 Propagation and Socio-Economic Factors: Predictive Model Enabling Simulations Using Machine Learning and Big Data (Preprint)

10.2196/preprints.25972 ◽

2020 ◽

Author(s):

Haonan Wu ◽

Rajarshi Banerjee ◽

Indhumathi V ◽

Daniel Percy-Hughes ◽

Praveen Chougale

Keyword(s):

Machine Learning ◽

Big Data ◽

Economic Impact ◽

Predictive Model ◽

Predictive Analytics ◽

Scenario Planning ◽

Economic Factors ◽

Targeted Interventions ◽

Controllable Factors ◽

Forward Looking

BACKGROUND: A novel coronavirus disease (later named COVID-19) has emerged and caused the world to enter a new reality, with many direct and indirect factors influencing it. Some are human-controllable (e.g., interventional policies, mobility, and the vaccine); some are not (e.g., the weather). We sought to test how a change in these human-controllable factors might influence two measures: the number of daily cases and the economic impact. If applied at the right level and with up-to-date data, policymakers would be able to make targeted interventions and measure their cost. OBJECTIVE: The study aimed to provide a predictive analytics framework to model, predict, and simulate COVID-19 propagation and the socio-economic impact of interventions intended to reduce the spread of the disease, such as policy and/or a vaccine. It allows policymakers, government representatives, and business leaders to make better-informed decisions about the potential effect of various interventions, with forward-looking views via scenario planning. METHODS: We leveraged a recently launched open-source COVID-19 big data platform and used published research to find potentially relevant variables (features), completing feature selection and engineering via in-depth data quality checks and analytics. We developed an advanced machine learning pipeline that contains ensemble models, automatic and semi-automatic hyperparameter tuning, and customized interpretability functions. It is self-evolving, as it always learns from the most recent data. The output predicts daily cases and economic factors (e.g., small business revenue) to allow simulation of interventions, including a vaccine (proxied by an influenza vaccination efficacy model). The framework is built on an open-source technology stack, and we make the source code publicly available as well. RESULTS: The model is self-evolving and deployed on a modern machine learning architecture. It has high accuracy for trend prediction (back-tested with R-squared). We bring simulation and interpretability into the framework. It models not just daily cases but also socio-economic demographics. CONCLUSIONS: Human behaviour and extreme natural disasters are hard to measure with data points. No model can provide an answer that is correct 100% of the time; however, with a high-quality model and big data, a forward-looking view can be inferred or at least noted. This predictive model can help policymakers test scenarios, plan proactive actions, optimize logistics, measure cost, and create an open dialogue with the general public.
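The abstract above reports that trend predictions were back-tested with R-squared but gives no detail on the procedure. The following is only a minimal illustrative sketch of what such a back-test could look like, not the authors' pipeline: the names `r_squared`, `backtest`, and `last_value_forecast` are hypothetical, and the naive last-value forecaster is a stand-in for any real model.

```python
from typing import Callable, List, Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when actual values are constant")
    return 1.0 - ss_res / ss_tot


def backtest(
    series: Sequence[float],
    forecast: Callable[[Sequence[float]], float],
    min_train: int,
) -> float:
    """Walk forward through the series: at each step, forecast the next value
    using only earlier observations, then score all forecasts with R-squared."""
    actual: List[float] = []
    predicted: List[float] = []
    for t in range(min_train, len(series)):
        predicted.append(forecast(series[:t]))  # the model sees only the past
        actual.append(series[t])
    return r_squared(actual, predicted)


def last_value_forecast(history: Sequence[float]) -> float:
    """Hypothetical placeholder model: predict that tomorrow equals today."""
    return history[-1]


if __name__ == "__main__":
    # Made-up illustrative numbers, not data from the study.
    daily_cases = [100.0, 120.0, 150.0, 170.0, 160.0, 180.0, 210.0]
    print(backtest(daily_cases, last_value_forecast, min_train=3))
```

Because each forecast uses only observations before the target day, the score reflects out-of-sample behaviour. R-squared can be negative when a model does worse than predicting the mean of the held-out values.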

The Bigger Picture: Combining Econometrics with Analytics Improves Forecasts of Movie Success

Management Science ◽

10.1287/mnsc.2020.3911 ◽

2021 ◽

Author(s):

Steven F. Lehrer ◽

Tian Xie

Keyword(s):

Machine Learning ◽

Social Media ◽

Big Data ◽

Predictive Analytics ◽

Big Data Analytics ◽

Forecast Accuracy ◽

Support Vector ◽

Significant Heterogeneity ◽

Social Media Data ◽

Media Data

There exists significant hype regarding how much machine learning and incorporating social media data can improve forecast accuracy in commercial applications. To assess if the hype is warranted, we use data from the film industry in simulation experiments that contrast econometric approaches with tools from the predictive analytics literature. Further, we propose new strategies that combine elements from each literature in a bid to capture richer patterns of heterogeneity in the underlying relationship governing revenue. Our results demonstrate the importance of social media data and value from hybrid strategies that combine econometrics and machine learning when conducting forecasts with new big data sources. Specifically, although both least squares support vector regression and recursive partitioning strategies greatly outperform dimension reduction strategies and traditional econometrics approaches in forecast accuracy, there are further significant gains from using hybrid approaches. Further, Monte Carlo experiments demonstrate that these benefits arise from the significant heterogeneity in how social media measures and other film characteristics influence box office outcomes. This paper was accepted by J. George Shanthikumar, big data analytics.

Big data machine learning using apache spark MLlib

2017 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata.2017.8258338 ◽

2017 ◽

Cited By ~ 20

Author(s):

Mehdi Assefi ◽

Ehsun Behravesh ◽

Guangchi Liu ◽

Ahmad P. Tafti

Keyword(s):

Machine Learning ◽

Big Data ◽

Apache Spark
