Machine Learning Model for Anomaly Detection in Big Data for Health Care Applications

Author(s):  
M. G. Sharavana Kumar ◽  
V. R. Sarma Dhulipala
IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Milos Kotlar ◽  
Marija Punt ◽  
Zaharije Radivojevic ◽  
Milos Cvetanovic ◽  
Veljko Milutinovic

Machine learning is a prominent tool for extracting knowledge from large amounts of data. While much machine learning research has focused on improving the accuracy and efficiency of training and inference algorithms, less attention has been paid to the equally important problem of monitoring the quality of the data fed into the machine learning model. The quality of big data is far from perfect. Recent studies have shown that poor data quality can introduce serious errors into the results of big data analysis and can prevent more precise results from being obtained from the data. The advantages of data preprocessing in the context of ML are early detection of errors, improved model quality through the use of better data, and savings in the engineering hours spent debugging issues.
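The kind of data-quality monitoring described above can be illustrated with a minimal sketch. This is not the authors' method: the field names, value ranges, and the `find_quality_issues` helper are hypothetical, chosen only to show how missing, mistyped, and out-of-range values might be caught before records reach a model.

```python
# Hypothetical schema: field name -> (expected type, minimum, maximum).
# These fields and ranges are illustrative assumptions, not taken from the paper.
SCHEMA = {
    "age": (int, 0, 120),
    "heart_rate": (int, 20, 250),
}


def find_quality_issues(records, schema=SCHEMA):
    """Return (record index, field, problem) tuples for records that violate the schema."""
    issues = []
    for index, record in enumerate(records):
        for field, (expected_type, low, high) in schema.items():
            value = record.get(field)
            if value is None:
                issues.append((index, field, "missing"))
            elif not isinstance(value, expected_type):
                issues.append((index, field, "wrong type"))
            elif not low <= value <= high:
                issues.append((index, field, "out of range"))
    return issues


# Made-up example records, one clean and three with a single defect each.
records = [
    {"age": 42, "heart_rate": 71},
    {"age": -3, "heart_rate": 80},
    {"age": 57},
    {"age": "sixty", "heart_rate": 64},
]

issues = find_quality_issues(records)
for index, field, problem in issues:
    print(f"record {index}: {field} is {problem}")
```

Running such a check ahead of training is one way to obtain the early error detection the abstract mentions; a production system would likely also track statistical drift, which this sketch does not attempt.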


2020 ◽  
Vol 3 (12) ◽  
pp. e2029230
Author(s):  
Fernando A. Wilson ◽  
Leah Zallman ◽  
José A. Pagán ◽  
Alexander N. Ortega ◽  
Yang Wang ◽  
...  

2020 ◽  
Author(s):  
Gang Luo ◽  
Claudia L Nau ◽  
William W Crawford ◽  
Michael Schatz ◽  
Robert S Zeiger ◽  
...  

BACKGROUND: Asthma causes numerous hospital encounters annually, including emergency department visits and hospitalizations. To improve patient outcomes and reduce the number of these encounters, predictive models are widely used to prospectively pinpoint high-risk patients with asthma for preventive care via care management. However, previous models do not have adequate accuracy to achieve this goal well. Adopting the modeling guideline for checking extensive candidate features, we recently constructed a machine learning model on Intermountain Healthcare data to predict asthma-related hospital encounters in patients with asthma. Although this model is more accurate than the previous models, whether our modeling guideline is generalizable to other health care systems remains unknown.
OBJECTIVE: This study aims to assess the generalizability of our modeling guideline to Kaiser Permanente Southern California (KPSC).
METHODS: The patient cohort included a random sample of 70.00% (397,858/568,369) of patients with asthma who were enrolled in a KPSC health plan for any duration between 2015 and 2018. We produced a machine learning model via a secondary analysis of 987,506 KPSC data instances from 2012 to 2017 and by checking 337 candidate features to project asthma-related hospital encounters in the following 12-month period in patients with asthma.
RESULTS: Our model reached an area under the receiver operating characteristic curve of 0.820. When the cutoff point for binary classification was placed at the top 10.00% (20,474/204,744) of patients with asthma having the largest predicted risk, our model achieved an accuracy of 90.08% (184,435/204,744), a sensitivity of 51.90% (2259/4353), and a specificity of 90.91% (182,176/200,391).
CONCLUSIONS: Our modeling guideline exhibited acceptable generalizability to KPSC and resulted in a model that is more accurate than those formerly built by others. After further enhancement, our model could be used to guide asthma care management.
INTERNATIONAL REGISTERED REPORT: RR2-10.2196/resprot.5039
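The reported accuracy, sensitivity, and specificity can be recomputed from the fractions given in the results. The sketch below uses only the counts stated in the abstract; the variable names are illustrative, and the false-positive count is derived here rather than reported by the authors.

```python
# Counts taken from the fractions reported in the abstract.
true_positives = 2259        # high-risk flags that were followed by an encounter
actual_positives = 4353      # patients who had an asthma-related hospital encounter
true_negatives = 182_176     # unflagged patients who had no encounter
actual_negatives = 200_391   # patients who had no encounter

# Derived quantities (not stated directly in the abstract).
total = actual_positives + actual_negatives
false_positives = actual_negatives - true_negatives
flagged = true_positives + false_positives  # patients above the top-10% cutoff

accuracy = (true_positives + true_negatives) / total
sensitivity = true_positives / actual_positives
specificity = true_negatives / actual_negatives

print(f"total patients: {total:,}")
print(f"flagged as high risk: {flagged:,}")
print(f"accuracy: {accuracy:.2%}")
print(f"sensitivity: {sensitivity:.2%}")
print(f"specificity: {specificity:.2%}")
```

The derived number of flagged patients matches the 20,474 patients reported for the top 10.00% cutoff, so the reported figures are internally consistent.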


10.2196/22689 ◽  
2020 ◽  
Vol 8 (11) ◽  
pp. e22689
Author(s):  
Gang Luo ◽  
Claudia L Nau ◽  
William W Crawford ◽  
Michael Schatz ◽  
Robert S Zeiger ◽  
...  


