Introduction to the Data Science Framework

2020 ◽ pp. 1-7
Author(s): Juan J. Cuadrado-Gallego, Yuri Demchenko

2019 ◽ Vol 8 (10) ◽ pp. 1709
Author(s): Tsung-Lun Tsai, Min-Hsin Huang, Chia-Yen Lee, Wu-Wei Lai

In addition to traditional indices such as biochemistry, arterial blood gas, the rapid shallow breathing index (RSBI), and the Acute Physiology and Chronic Health Evaluation (APACHE) II score, this study proposes a data science framework for extubation prediction in the surgical intensive care unit (SICU) and investigates the value of the information our prediction model provides. The framework comprises variable selection (e.g., multivariate adaptive regression splines, stepwise logistic regression, and random forest), prediction models (e.g., support vector machine, boosting logistic regression, and backpropagation neural network (BPN)), and decision analysis (e.g., a Bayesian method) to identify the important variables and support the extubation decision. An empirical study at a leading hospital in Taiwan in 2015–2016 was conducted to validate the proposed framework. The results show that APACHE II and white blood cells (WBC) are the two most critical variables, followed in priority by eye opening, heart rate, glucose, sodium, and hematocrit. The BPN with the selected variables shows better prediction performance (sensitivity: 0.830; specificity: 0.890; accuracy: 0.860) than models using APACHE II or RSBI alone. An analysis of the value of information further shows that, relative to current clinical experience, the expected value of experimentation (EVE) saves 0.652 days of ICU stay per patient. Furthermore, the maximal value of information occurs at a failure rate of around 7.1%, which reveals the "best applicable condition" of the proposed prediction model. These results validate the decision quality and the useful information provided by our prediction model.
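The two-stage structure the abstract describes (variable selection, then a prediction model) can be sketched on synthetic data. The random-forest selector and MLP classifier below are illustrative stand-ins for the methods named in the abstract, not the authors' actual implementation; all data, feature counts, and hyperparameters are invented for the sketch.

```python
# Illustrative two-stage pipeline on synthetic data:
# (1) variable selection via random forest importance,
# (2) a backpropagation neural network (MLP) on the selected variables.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stage 1: rank variables by random forest importance, keep the top 7
# (mirroring the seven priority variables reported in the abstract)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
top = np.argsort(rf.feature_importances_)[::-1][:7]

# Stage 2: train a backpropagation neural network on the selected variables
bpn = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
bpn.fit(X_tr[:, top], y_tr)
accuracy = bpn.score(X_te[:, top], y_te)
```

On real ICU data the selected variables and performance would of course differ; the point is only the selection-then-prediction flow.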


2021 ◽ Vol 85 ◽ pp. 101539
Author(s): Alessia Calafiore, Gregory Palmer, Sam Comber, Daniel Arribas-Bel, Alex Singleton

2018 ◽ Vol 115 (50) ◽ pp. 12638-12645
Author(s): Sallie Keller, Gizem Korkmaz, Carol Robbins, Stephanie Shipp

Measuring the value of intangibles is not easy because they are critical but usually invisible components of the innovation process. Today, access to nonsurvey data sources, such as administrative data and repositories captured on web pages, opens opportunities to create intangibles based on new sources of information and to capture intangible innovations in new ways. Intangibles include ownership of innovative property and human resources that make a company unique but are currently unmeasured. For example, intangibles represent the value of a company's databases and software, the tacit knowledge of its workers, and its investments in research and development (R&D) and design. Through two case studies, the challenges and processes involved in both creating and measuring intangibles are presented using a data science framework that outlines processes to discover, acquire, profile, clean, link, explore the fitness-for-use of, and statistically analyze the data. The first case study shows that creating organizational innovation is possible by linking administrative data across business processes in a Fortune 500 company. The motivation for this research is to develop company processes capable of synchronizing the supply chain end to end while capturing dynamics that can alter the inventory, profits, and service balance. The second example shows the feasibility of measuring innovation related to the characteristics of open source software through data scraped from software repositories that provide this information. The ultimate goal is to develop accurate and repeatable measures to estimate the value of nonbusiness-sector open source software to the economy. This early work shows the feasibility of these approaches.
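The clean/link/analyze stages of the framework the abstract outlines can be illustrated on toy records. The two data sources, field names, and linkage rule below are invented purely for the sketch and are not drawn from either case study.

```python
# Minimal sketch of three framework stages on invented records:
# clean (normalize a linking key), link (join two administrative
# sources), analyze (summarize the linked data).
import statistics

company_hr = [{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}]
repo_activity = [{"dev": "ADA", "commits": 120}, {"dev": "bob ", "commits": 40}]

# Clean: normalize the linking key on both sides
def clean(name):
    return name.strip().lower()

# Link: join the two sources on the cleaned name
linked = [
    {"id": person["id"], "commits": rec["commits"]}
    for person in company_hr
    for rec in repo_activity
    if clean(person["name"]) == clean(rec["dev"])
]

# Analyze: a simple summary statistic over the linked records
mean_commits = statistics.mean(rec["commits"] for rec in linked)
```

Real record linkage across business processes would involve probabilistic matching and fitness-for-use checks; the sketch only shows why cleaning must precede linking.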


2019 ◽ Vol 19 (01) ◽ pp. 1940001
Author(s): D. Frank Hsu, Bruce S. Kristal, Yuhan Hao, Christina Schweikert

In the context of computing and informatics, Cognitive Diversity (CD) has been proposed to characterize the degree of dissimilarity between multiple scoring systems (MSS). As such, CD serves a role in informatics analogous to that of Pearson’s Correlation in classical statistics. Here we review MSS and explore CD’s utility in relation to the notions of correlation and distance in machine learning, ensemble methods, rank aggregation, and combinatorial fusion in both parametric score space and non-parametric rank space. Finally, we survey applications of CD in combining MSS in a variety of domains in science, technology, society, business, and management. Our study provides a new data science framework for discovery in data-rich environments.
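One way cognitive diversity between two scoring systems is commonly formulated uses their rank-score characteristic functions (each system's scores sorted by rank and normalized), with CD taken as a distance between those functions. The RMS-difference version below is a hedged sketch of that idea; the normalization details are assumptions, not the paper's exact definition.

```python
# Sketch: cognitive diversity between two scoring systems as the RMS
# difference between their normalized rank-score characteristic functions.
import numpy as np

def rank_score_function(scores):
    """Scores sorted in decreasing rank order, min-max normalized to [0, 1]."""
    s = np.sort(np.asarray(scores, dtype=float))[::-1]
    lo, hi = s.min(), s.max()
    return (s - lo) / (hi - lo) if hi > lo else np.zeros_like(s)

def cognitive_diversity(scores_a, scores_b):
    fa = rank_score_function(scores_a)
    fb = rank_score_function(scores_b)
    return float(np.sqrt(np.mean((fa - fb) ** 2)))

# Two systems with the same scoring *behavior* (one is a rescaling of the
# other) have zero diversity, even though their raw scores differ.
cognitive_diversity([3, 1, 2], [30, 10, 20])  # → 0.0
```

Low CD suggests the systems contribute redundant evidence to a fusion, while high CD suggests combining them (by score or rank aggregation) may improve over either alone.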


2020 ◽ Vol 115 ◽ pp. 102640
Author(s): Luis E. Olmos, Maria Sol Tadeo, Dimitris Vlachogiannis, Fahad Alhasoun, Xavier Espinet Alegre, ...
