Early Prediction of Movie Success using Machine Learning Models

2021 ◽  
Vol 183 (44) ◽  
pp. 14-21
Author(s):  
D.M.L. Dissanayake ◽  
V.G.T.N. Vidanagama
2021 ◽  
Author(s):  
Celia ALVAREZ-ROMERO ◽  
Alicia MARTÍNEZ-GARCÍA ◽  
Jara Eloisa TERNERO-VEGA ◽  
Pablo DÍAZ-JIMÉNEZ ◽  
Carlos JIMÉNEZ-DE-JUAN ◽  
...  

BACKGROUND Because of the nature of health data, its sharing and reuse for research are limited by legal, technical, and ethical constraints. To address that challenge and to promote the discovery of scientific knowledge, the FAIR (Findable, Accessible, Interoperable, and Reusable) principles help organizations share research data in a secure, appropriate, and useful way for other researchers. OBJECTIVE The objective of this study was to FAIRify existing health research datasets and to apply a federated machine learning architecture on top of the FAIRified datasets of different health research performing organizations. The whole FAIR4Health solution was validated by assessing the generated model for real-time prediction of 30-day readmission risk in patients with Chronic Obstructive Pulmonary Disease (COPD). METHODS Applying the FAIR principles to health research datasets in three different health care settings enabled a retrospective multicenter study for the generation of federated machine learning models, with the aim of developing an early prediction model for 30-day readmission risk in COPD patients. This prediction model was implemented on the FAIR4Health platform and, finally, an observational prospective study with 30-day follow-up was carried out in two health care centers from different countries. The same inclusion and exclusion criteria were used in both the retrospective and prospective parts of the study. RESULTS The prediction model for 30-day hospital readmission risk was trained using the retrospective data of 4,944 COPD patients. The prediction model was assessed using the data of 100 recruited patients (22 from Spain and 78 from Serbia) out of 2070 observed patients (records viewed) in total for the observational prospective study, which ran from April 2021 to September 2021. The prediction model generated on the FAIR4Health platform achieved an accuracy of 0.98 and a precision of 0.25, and the generated prediction of 30-day readmission risk was confirmed in 87% of the cases. CONCLUSIONS A clinical validation was demonstrated through the implementation of federated machine learning models on top of the FAIRified datasets from different health research performing organizations, providing an assessment for predicting 30-day readmission risk in COPD patients. This demonstration supports the relevance of, and the need for, a FAIR data policy to facilitate data sharing and reuse in health research.
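The abstract above reports a high accuracy together with a low precision. As a minimal sketch of how that combination can arise when positive cases are rare, the following Python snippet computes both metrics from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not taken from the study, and the function names are assumptions.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Fraction of positive predictions that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical confusion-matrix counts, NOT taken from the study:
# only 10 of 1000 cases are truly positive (tp + fn).
tp, fp, fn, tn = 5, 15, 5, 975

acc = accuracy(tp, fp, fn, tn)
prec = precision(tp, fp)
print(f"accuracy={acc:.2f} precision={prec:.2f}")
```

Because true negatives dominate the total, accuracy stays high even though most positive predictions in this toy example are wrong.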


2021 ◽  
Vol 8 ◽  
Author(s):  
Longxiang Su ◽  
Zheng Xu ◽  
Fengxiang Chang ◽  
Yingying Ma ◽  
Shengjun Liu ◽  
...  

Background: Early prediction of the clinical outcome of patients with sepsis is of great significance and can guide treatment and reduce patient mortality. However, such prediction is difficult for clinicians in practice. Methods: A total of 2,224 patients with sepsis were included over a 3-year period (2016–2018) in the intensive care unit (ICU) of Peking Union Medical College Hospital. Using all the key medical data from the first 6 h in the ICU, three machine learning models (logistic regression, random forest, and XGBoost) were used to predict mortality, severity (sepsis/septic shock), and length of ICU stay (LOS) (>6 days, ≤6 days). Missing data imputation and oversampling were performed on the dataset before it was fed into the models. Results: Compared to the mortality and LOS predictions, the severity prediction achieved the best classification results, based on the area under the receiver operating characteristic curve (AUC), with the random forest classifier (sensitivity = 0.65, specificity = 0.73, F1 score = 0.72, AUC = 0.79). The random forest model also showed the best overall performance (mortality prediction: sensitivity = 0.50, specificity = 0.84, F1 score = 0.66, AUC = 0.74; LOS prediction: sensitivity = 0.79, specificity = 0.66, F1 score = 0.69, AUC = 0.76) among the three models. The predictive ability of the SOFA score itself was inferior to that of the above three models. Conclusions: Using the random forest classifier in the first 6 h of ICU admission can provide a comprehensive early warning of sepsis, which will contribute to the formulation and management of clinical decisions and the allocation and management of resources.
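The abstract above compares models by sensitivity, specificity, and AUC. The sketch below shows how these quantities can be computed for a binary classifier in plain Python. The labels, scores, and the 0.5 threshold are toy values assumed for illustration, not data from the study, and the function names are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and model scores, assumed for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={area:.2f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason it is often used to compare models.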


2020 ◽  
Author(s):  
Xi Yang ◽  
Qian Li ◽  
Yonghui Wu ◽  
Jiang Bian ◽  
Tianchen Lyu ◽  
...  

Alzheimer’s disease (AD) and AD-related dementias (ADRD) are a class of neurodegenerative diseases affecting about 5.7 million Americans. There is no cure for AD/ADRD. Current interventions have modest effects and focus on attenuating cognitive impairment. Detecting patients at high risk of AD/ADRD is crucial for timely interventions that modify risk factors and primarily prevent cognitive decline and dementia, and thus enhance quality of life and reduce health care costs. This study investigates both knowledge-driven (where domain experts identify useful features) and data-driven (where machine learning models select useful features among all available data elements) approaches for AD/ADRD early prediction using real-world electronic health records (EHR) data from the University of Florida (UF) Health system. We identified a cohort of 59,799 patients and examined four widely used machine learning algorithms following a standard case-control design. We also examined the early prediction of AD/ADRD using patient information from 0, 1, 3, and 5 years before the disease onset date. The experimental results showed that models based on Gradient Boosting Trees (GBT) achieved the best performance for the data-driven approach and Random Forests (RF) achieved the best performance for the knowledge-driven approach. Among all models, GBT using a data-driven approach achieved the best area under the curve (AUC) scores of 0.7976, 0.7192, 0.6985, and 0.6798 for 0-, 1-, 3-, and 5-year prediction, respectively. We also examined the top features identified by the machine learning models and compared them with the knowledge-driven features identified by domain experts. Our study demonstrated the feasibility of using electronic health records for the early prediction of AD/ADRD and identified potential challenges for future investigations.
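The abstract above evaluates prediction using patient information from 0, 1, 3, and 5 years before disease onset. One way such prediction windows might be built is sketched below: only records dated on or before the cutoff (onset date minus the window) are kept. The record layout, the inclusive cutoff, and the helper names are assumptions for illustration, not the authors' pipeline.

```python
from datetime import date


def shift_years(d, years):
    """Return the date `years` years before `d`."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year - years, day=28)


def records_before_window(records, onset, years_before):
    """Keep (date, code) records dated on or before the window cutoff."""
    cutoff = shift_years(onset, years_before)
    return [r for r in records if r[0] <= cutoff]


# Hypothetical patient history: (record date, placeholder code).
onset = date(2018, 6, 1)
records = [
    (date(2012, 3, 10), "A"),
    (date(2014, 9, 1), "B"),
    (date(2016, 1, 15), "C"),
    (date(2018, 5, 20), "D"),
]

counts = {w: len(records_before_window(records, onset, w)) for w in (0, 1, 3, 5)}
print(counts)
```

Longer windows leave less information available per patient, which is consistent with the reported AUC decreasing as the prediction horizon grows.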


2020 ◽  
Vol 2 (1) ◽  
pp. 3-6
Author(s):  
Eric Holloway

Imagination Sampling is the use of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system for using Imagination Sampling to obtain multibox models. Here, the possibility of importing such models as the starting point for further automatic enhancement is explored.


2021 ◽  
Author(s):  
Norberto Sánchez-Cruz ◽  
Jose L. Medina-Franco

Epigenetic targets are a significant focus for drug discovery research, as demonstrated by the eight approved epigenetic drugs for the treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. These data represent a large number of structure-activity relationships that have not been exploited thus far for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26,318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different design, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated, showing mean precisions up to 0.952 for the epigenetic target prediction task. Our results indicate that the models reported here have considerable potential to identify small molecules with epigenetic activity. Therefore, the models were implemented as a freely accessible and easy-to-use web application.

