Big data Predictive Analytics for Apache Spark using Machine Learning

Author(s):  
Muhammad Junaid ◽  
Shiraz Ali Wagan ◽  
Nawab Muhammad Faseeh Qureshi ◽  
Choon Sung Nam ◽  
Dong Ryeol Shin
Web Services ◽  
2019 ◽  
pp. 105-126
Author(s):  
N. Nawin Sona

This chapter aims to give an overview of the wide range of Big Data approaches and technologies today. The data features of Volume, Velocity, and Variety are examined against new database technologies. It explores the complexity of data types, methodologies of storage, access and computation, current and emerging trends of data analysis, and methods of extracting value from data. It aims to address the need for clarity regarding the future of RDBMS and the newer systems. And it highlights the methods in which Actionable Insights can be built into public sector domains, such as Machine Learning, Data Mining, Predictive Analytics and others.


2017 ◽  
Vol 7 (2) ◽  
Author(s):  
Dicky R. M. Nainggolan

<p><em><strong>Abstract</strong> – Data are the prominent elements in scientific researches and approaches. Data Science methodology is used to select and to prepare enormous numbers of data for further processing and analysing. Big Data technology collects vast amount of data from many sources in order to exploit the information and to visualise trend or to discover a certain phenomenon in the past, present, or in the future at high speed processing capability. Predictive analytics provides in-depth analytical insights and the emerging of machine learning brings the data analytics to a higher level by processing raw data with artificial intelligence technology. Predictive analytics and machine learning produce visual reports for decision makers and stake-holders. Regarding cyberspace security, big data promises the opportunities in order to prevent and to detect any advanced cyber-attacks by using internal and external security data.</em></p><p><br /><em><strong>Keywords</strong>: Big Data, Cyber Security, Data Science, Intelligence, Predictive Analytics</em></p><p><br /><em><strong>Abstrak</strong> – Data merupakan unsur terpenting dalam setiap penelitian dan pendekatan ilmiah. Metodologi sains data digunakan untuk memilah, memilih dan mempersiapkan sejumlah data untuk diproses dan dianalisis. Teknologi big data mampu mengumpulkan data dengan sangat banyak dari berbagai sumber dengan tujuan untuk mendapatkan informasi dengan visualisasi tren atau menyingkapkan pengetahuan dari suatu peristiwa yang terjadi baik dimasa lalu, sekarang, maupun akan datang dengan kecepatan pemrosesan data sangat tinggi. Analisis prediktif memberikan wawasan analisis lebih dalam dan kemunculan machine learning membawa analisis data ke tingkat yang lebih tinggi dengan bantuan teknologi kecerdasan buatan dalam tahap pemrosesan data mentah. Analisis prediktif dan machine learning menghasilkan laporan berbentuk visual untuk pengambil keputusan dan pemangku kepentingan. Berkenaan dengan keamanan siber, big data menjanjikan kesempatan dalam rangka untuk mencegah dan mendeteksi setiap serangan canggih siber dengan memanfaatkan data keamanan internal dan eksternal.</em></p><p><br /><strong>Kata Kunci</strong>: Analisis Prediktif, Big Data, Intelijen, Keamanan Siber, Sains Data</p>


2017 ◽  
Vol 7 (2) ◽  
Author(s):  
Dicky R. M. Nainggolan

<p><strong>Abstrak</strong> – Data merupakan unsur terpenting dalam setiap penelitian dan pendekatan ilmiah. Metodologi sains data digunakan untuk memilah, memilih dan mempersiapkan sejumlah data untuk diproses dan dianalisis. Teknologi big data mampu mengumpulkan data dengan sangat banyak dari berbagai sumber dengan tujuan untuk mendapatkan informasi dengan visualisasi tren atau menyingkapkan pengetahuan dari suatu peristiwa yang terjadi baik dimasa lalu, sekarang, maupun akan datang dengan kecepatan pemrosesan data sangat tinggi. Analisis prediktif memberikan wawasan analisis lebih dalam dan kemunculan machine learning membawa analisis data ke tingkat yang lebih tinggi dengan bantuan teknologi kecerdasan buatan dalam tahap pemrosesan data mentah. Analisis prediktif dan machine learning menghasilkan laporan berbentuk visual untuk pengambil keputusan dan pemangku kepentingan. Berkenaan dengan keamanan siber, big data menjanjikan kesempatan dalam rangka untuk mencegah dan mendeteksi setiap serangan canggih siber dengan memanfaatkan data keamanan internal dan eksternal.</p><p><br /><strong>Kata Kunci</strong>: analisis prediktif, big data, intelijen, keamanan siber, sains data</p><p><strong><em>Abstract</em> </strong>– Data are the prominent elements in scientific researches and approaches. Data Science methodology is used to select and to prepare enormous numbers of data for further processing and analysing. Big Data technology collects vast amount of data from many sources in order to exploit the information and to visualise trend or to discover a certain phenomenon in the past, present, or in the future at high speed processing capability. Predictive analytics provides in-depth analytical insights and the emerging of machine learning brings the data analytics to a higher level by processing raw data with artificial intelligence technology. Predictive analytics and machine learning produce visual reports for decision makers and stake-holders. Regarding cyberspace security, big data promises the opportunities in order to prevent and to detect any advanced cyber-attacks by using internal and external security data.</p><p><br /><strong><em>Keywords</em></strong>: big data, cyber security, data science, intelligence, predictive analytics</p>


2022 ◽  
Author(s):  
Nitin Prajapati

The Aim of this research is to identify influence, usage, and the benefits of AI (Artificial Intelligence) and ML (Machine learning) using big data analytics in Insurance sector. Insurance sector is the most volatile industry since multiple natural influences like Brexit, pandemic, covid 19, Climate changes, Volcano interruptions. This research paper will be used to explore potential scope and use cases for AI, ML and Big data processing in Insurance sector for Automate claim processing, fraud prevention, predictive analytics, and trend analysis towards possible cause for business losses or benefits. Empirical quantitative research method is used to verify the model with the sample of UK insurance sector analysis. This research will conclude some practical insights for Insurance companies using AI, ML, Big data processing and Cloud computing for the better client satisfaction, predictive analysis, and trending.


2020 ◽  
Author(s):  
Haonan Wu ◽  
Rajarshi Banerjee ◽  
Indhumathi V ◽  
Daniel Percy-Hughes ◽  
Praveen Chougale

BACKGROUND A novel coronavirus disease has emerged (later named COVID-19) and caused the world to enter a new reality, with many direct and indirect factors influencing it. Some are human-controllable (e.g. interventional policies, mobility and the vaccine); some are not (e.g. the weather). We have sought to test how a change in these human-controllable factors might influence two measures: the number of daily cases against economic impact. If applied at the right level and with up-to-date data to measure, policymakers would be able to make targeted interventions and measure their cost. OBJECTIVE The study aimed to provide a predictive analytics framework to model, predict and simulate COVID-19 propagation and the socio-economic impact of interventions intended to reduce the spread of the disease such as policy and/or vaccine. It allows policymakers, government representatives and business leaders to make better-informed decisions about the potential effect of various interventions with forward-looking views via scenario planning. METHODS We leveraged a recently launched opensource COVID-19 big data platform and used published research to find potentially relevant variables (features), completing feature selection and engineering via in-depth data quality checks and analytics. An advanced machine learning pipeline has been developed. It contains the ensemble models, auto/semi-auto hyperparameter tuning and customized interpretability functions. And It is self-evolving as always learned from the most recent data. The output predicts daily cases and economic factors (e.g. small business revenue) to allow simulation of interventions including a vaccine (proxied by an influenza vaccination efficacy model). This framework is built using an open-source technology stack and we make the source code being publicly available as well. RESULTS This model is self-evolving and deployed on modern machine learning architecture. It has high accuracy for trend prediction (back-tested with r-squared). We bring simulation and interpretability in the framework. It models not just daily-cases, but also socio-economic demographics. CONCLUSIONS Human behaviour and extreme natural disasters are hard to measure with data points. No model can provide an answer that is correct 100% of the time; however, with high-quality model and big data, a forward-looking view can be inferred or at least noted. This predictive model can help the policymakers to test scenarios, plan proactive actions, optimize logistics, measure the cost and create an open dialogue with the general public.


2021 ◽  
Author(s):  
Steven F. Lehrer ◽  
Tian Xie

There exists significant hype regarding how much machine learning and incorporating social media data can improve forecast accuracy in commercial applications. To assess if the hype is warranted, we use data from the film industry in simulation experiments that contrast econometric approaches with tools from the predictive analytics literature. Further, we propose new strategies that combine elements from each literature in a bid to capture richer patterns of heterogeneity in the underlying relationship governing revenue. Our results demonstrate the importance of social media data and value from hybrid strategies that combine econometrics and machine learning when conducting forecasts with new big data sources. Specifically, although both least squares support vector regression and recursive partitioning strategies greatly outperform dimension reduction strategies and traditional econometrics approaches in forecast accuracy, there are further significant gains from using hybrid approaches. Further, Monte Carlo experiments demonstrate that these benefits arise from the significant heterogeneity in how social media measures and other film characteristics influence box office outcomes. This paper was accepted by J. George Shanthikumar, big data analytics.


Author(s):  
Mehdi Assefi ◽  
Ehsun Behravesh ◽  
Guangchi Liu ◽  
Ahmad P. Tafti

Sign in / Sign up

Export Citation Format

Share Document