scholarly journals Predictors of outpatients’ no-show: big data analytics using apache spark

2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Tahani Daghistani ◽  
Huda AlGhamdi ◽  
Riyad Alshammari ◽  
Raed H. AlHazme

AbstractOutpatients who fail to attend their appointments have a negative impact on the healthcare outcome. Thus, healthcare organizations facing new opportunities, one of them is to improve the quality of healthcare. The main challenges is predictive analysis using techniques capable of handle the huge data generated. We propose a big data framework for identifying subject outpatients’ no-show via feature engineering and machine learning (MLlib) in the Spark platform. This study evaluates the performance of five machine learning techniques, using the (2,011,813‬) outpatients’ visits data. Conducting several experiments and using different validation methods, the Gradient Boosting (GB) performed best, resulting in an increase of accuracy and ROC to 79% and 81%, respectively. In addition, we showed that exploring and evaluating the performance of the machine learning models using various evaluation methods is critical as the accuracy of prediction can significantly differ. The aim of this paper is exploring factors that affect no-show rate and can be used to formulate predictions using big data machine learning techniques.

2020 ◽  
Author(s):  
Tahani Daghistani ◽  
Huda AlGhamdi ◽  
Riyad Alshammari ◽  
Raed H. AlHazme

Abstract Outpatients who fail to attend their appointments have a negative impact on the healthcare outcome. Thus, healthcare organizations facing new opportunities, one of them is to improve the quality of healthcare. The main challenges is predictive analysis using techniques capable of handle the huge data generated. We propose a big data framework for identifying subject outpatients’ no-show via feature engineering and machine learning (MLlib) in the Spark platform. This study evaluates the performance of five machine learning techniques, using the (2,011,813) outpatients’ visits data. Conducting several experiments and using different validation methods, the Gradient Boosting (GB) performed best, resulting in an increase of accuracy and ROC to 79% and 81%, respectively. In addition, we showed that exploring and evaluating the performance of the machine learning models using various evaluation methods is critical as the accuracy of prediction can significantly differ. The aim of this paper is exploring factors that affect no-show rate and can be used to formulate predictions using big data machine learning techniques.


2020 ◽  
Author(s):  
Tahani Daghistani ◽  
Huda AlGhamdi ◽  
Riyad Alshammari ◽  
Raed H. AlHazme

Abstract Outpatients who fail to attend their appointments have a negative impact on the healthcare outcome. Thus, healthcare organizations facing new opportunities, one of them is to improve the quality of healthcare. The main challenges is predictive analysis using techniques capable of handle the huge data generated. We propose a big data framework for identifying subject outpatients’ no-show via feature engineering and machine learning (MLlib) in the Spark platform. This study evaluates the performance of five machine learning techniques, using the (2,011,813, 19) outpatients’ visits data. Conducting several experiments and using different validation methods, the Gradient Boosting (GB) performed best, resulting in an increase of accuracy and ROC to 79% and 81%, respectively. In addition, we showed that exploring and evaluating the performance of the machine learning models using various evaluation methods is critical as the accuracy of prediction can significantly differ. The aim of this paper is exploring factors that affect no-show rate and can be used to formulate predictions using big data machine learning techniques


2020 ◽  
Author(s):  
Tahani Daghistani ◽  
Huda AlGhamdi ◽  
Riyad Alshammari ◽  
Raed H. AlHazme

Abstract Outpatients who fail to attend their appointments have a negative impact on the healthcare outcome. Thus, healthcare organizations facing new opportunities, one of them is to improve the quality of healthcare. The main challenges is predictive analysis using techniques capable of handle the huge data generated. We propose a big data framework for identifying subject outpatients’ no-show via feature engineering and machine learning (MLlib) in the Spark platform. This study evaluates the performance of five machine learning techniques, using the (2,011,813) outpatients’ visits data. Conducting several experiments and using different validation methods, the Gradient Boosting (GB) performed best, resulting in an increase of accuracy and ROC to 79% and 81%, respectively. In addition, we showed that exploring and evaluating the performance of the machine learning models using various evaluation methods is critical as the accuracy of prediction can significantly differ. The aim of this paper is exploring factors that affect no-show rate and can be used to formulate predictions using big data machine learning techniques.


2020 ◽  
Author(s):  
Tahani Daghistani ◽  
Huda AlGhamdi ◽  
Riyad Alshammari ◽  
Raed H. AlHazme

Abstract Outpatients who fail to attend their appointments have a negative impact on the healthcare outcome. Thus, healthcare organizations facing new opportunities, one of them is to improve the quality of healthcare. The main challenges is predictive analysis using techniques capable of handle the huge data generated. We propose a big data framework for identifying subject outpatients’ no-show via feature engineering and machine learning (MLlib) in Spark platform. The aim of this paper is exploring factors that affect no-sow rate then can be used to formulate predictions using big data machine learning techniques.


2020 ◽  
Vol 17 (1) ◽  
pp. 92-100
Author(s):  
Balanand Jha ◽  
Kumar Abhishek ◽  
Akshay Deepak ◽  
Prakhar Shrivastav ◽  
Suraj Thakre ◽  
...  

In the age of start-ups and technical research, the demand for high-end computing power and loads of space is ever increasing. Machine learning techniques have become an inseparable part of the big data analytics. Setting up one’s own infrastructure to deal with all this vastness is usually not feasible due to high expenses and lack of desired expertise. As a solution to this problem, this paper proposes a system for Big-Data Analytics and Machine Learning based on Hadoop and Spark frameworks that also supports Operating System (OS) Rental Services. Machine Learning (ML) services provide option to use both existing inbuilt popular models or create one’s own model. OS Rental services provide users with high end infrastructure on their low-end devices on rent. The entire implementation has been made open source for ease of access and facilitating extensibility.


2019 ◽  
Vol 2019 (2) ◽  
pp. 103-112
Author(s):  
Dr. Pasumpon pandian

The recent technological growth at a rapid pace has paved way for the big data that denotes to the exponential growth of the information’s. The big data analytics are the trending concepts that have emerged as the promising technology that offers more enhanced perceptions from the huge set of the data that have been produced from the diverse areas. The review in the paper proceeds with the methods of the big-data-analytics and the machine-learning in handling, the huge set of data flow. The overview of the utilization of the machine-learning algorithms in the analytics of high voluminous data would provide with the deeper and the richer analysis of the huge set of information gathered to extract the valuable and turn it into actionable information’s. The paper is to review the part of machine-learning algorithms in the analytics of high voluminous data


Author(s):  
Mark Wallis ◽  
Kuldeep Kumar ◽  
Adrian Gepp

Credit ratings are an important metric for business managers and a contributor to economic growth. Forecasting such ratings might be a suitable application of big data analytics. As machine learning is one of the foundations of intelligent big data analytics, this chapter presents a comparative analysis of traditional statistical models and popular machine learning models for the prediction of Moody's long-term corporate debt ratings. Machine learning techniques such as artificial neural networks, support vector machines, and random forests generally outperformed their traditional counterparts in terms of both overall accuracy and the Kappa statistic. The parametric models may be hindered by missing variables and restrictive assumptions about the underlying distributions in the data. This chapter reveals the relative effectiveness of non-parametric big data analytics to model a complex process that frequently arises in business, specifically determining credit ratings.


Sign in / Sign up

Export Citation Format

Share Document