Task-specific information outperforms surveillance-style big data in predictive analytics

2021 ◽  
Vol 118 (14) ◽  
pp. e2020258118
Author(s):  
Andreas Bjerre-Nielsen ◽  
Valentin Kassarnig ◽  
David Dreyer Lassen ◽  
Sune Lehmann

Increasingly, human behavior can be monitored through the collection of data from digital devices revealing information on behaviors and locations. In the context of higher education, a growing number of schools and universities collect data on their students with the purpose of assessing or predicting behaviors and academic performance, and the COVID-19–induced move to online education dramatically increases what can be accumulated in this way, raising concerns about students’ privacy. We focus on academic performance and ask whether predictive performance for a given dataset can be achieved with less privacy-invasive, but more task-specific, data. We draw on a unique dataset on a large student population containing both highly detailed measures of behavior and personality and high-quality third-party reported individual-level administrative data. We find that models estimated using the big behavioral data are indeed able to accurately predict academic performance out of sample. However, models using only low-dimensional and arguably less privacy-invasive administrative data perform considerably better and, importantly, do not improve when we add the high-resolution, privacy-invasive behavioral data. We argue that combining big behavioral data with “ground truth” administrative registry data can ideally allow the identification of privacy-preserving task-specific features that can be employed instead of current indiscriminate troves of behavioral data, resulting in both better privacy and better prediction.

2020 ◽  
Vol 53 (4) ◽  
pp. 710-711
Author(s):  
Margaret Levi ◽  
Betsy Rajala

This article responds to King and Persily’s (2019) proposal for a new model of industry–academic partnership using an independent third party to mediate between firms and academics. We believe this is a reasonable proposal for highly sensitive individual-level data, but it may not be appropriate for all types of data. We explore alternatives to their proposal, including Administrative Data Research Facilities, Data Collaboratives at GovLab, and the Tech Data for Social Good Initiative at the Center for Advanced Study in the Behavioral Sciences. We believe social scientists should continue to explore, evaluate, and scale a variety of industry–academic data-sharing models.


2018 ◽  
Vol 32 (4) ◽  
pp. 288-299 ◽  
Author(s):  
Philipp Brandt ◽  
Andrew Schrank ◽  
Josh Whitford

There is more agreement on the need for advisory services to help small and midsized manufacturers keep up with the latest managerial techniques and technologies than there is on the optimal design of those services. This study reconfigures and reanalyzes administrative data from the American Manufacturing Extension Partnership, and draws on extensive interviews with “street-level bureaucrats” at Manufacturing Extension Partnership centers, to identify and compare variation in centers’ approaches to service delivery. Centers and clients who rely on third-party providers tend to have more rather than less enduring ties, suggesting that it is direct delivery, rather than brokerage, that is associated with one-shot deals. There is also evidence that projects generate the most impact when they help “get the relationships right” and mitigate network failures.


2020 ◽  
Vol 24 (2) ◽  
pp. 65-72
Author(s):  
N. V. Komleva ◽  
D. A. Vilyavin

The purpose of the research is to develop a digital platform for creating personalized adaptive online courses that can be integrated into the university’s e-learning environment. The Digital Tutor platform is designed to give the online learning process tools that adapt the content of an electronic course to each student’s individual level of competency through adaptive testing, so that students reach the level of competency established by educational and professional standards. Materials and research methods. The methodological basis of the research consists of methods and technologies of system analysis and knowledge management. The conclusions and provisions of the work are based on an analysis of domestic and foreign literature on the use of digital technologies in education. The article draws on materials obtained by the authors during the scientific and practical development of the prototype Digital Tutor platform for creating personalized adaptive online courses at Plekhanov Russian University of Economics. Results. The digital platform, which hosts both the repository of educational objects and the online courses themselves, is available on the University’s information resources and can be integrated into the University’s electronic educational environment.
Implementing this project will make it possible to: let students and other learners use educational content prepared from relevant educational material and take part in its creation and discussion; develop more dynamic, higher-quality training courses that help students and other learners form the required competencies; significantly reduce the burden on lecturers working with remote students, freeing more time for updating training material and preparing practical and design tasks; implement personalized training, that is, educational material created for a particular student; support the creation and updating of their own MOOCs; adapt the system of continuing education to the requirements and needs of business; and anticipate society’s need for qualified personnel for the digital economy. Conclusion. A new model of online education has been proposed and tested, in which online courses are built automatically from the educational objects in the repository according to monitoring of the learner’s activity and a personal trajectory toward the required learning outcomes. The proposed transformation of online education rests on creating modern education built on advanced digital and intelligent technologies. Compared with existing analogues, the project has competitive advantages for implementing a new business model of education, because it provides a mechanism for automatically updating educational content and assembling courses from a repository of educational objects that develop the competencies required by the Federal State Educational Standard and approved professional standards.


Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Brian Ayers ◽  
Toumas Sandhold ◽  
Igor Gosev ◽  
Sunil Prasad ◽  
Arman Kilic

Introduction: Prior risk models for predicting survival after orthotopic heart transplantation (OHT) have displayed only modest discriminatory capability. With increasing interest in the application of machine learning (ML) to predictive analytics in clinical medicine, this study aimed to evaluate whether modern ML techniques could improve risk prediction in OHT. Methods: Data from the United Network for Organ Sharing registry were collected for all adult patients who underwent OHT from 2000 through 2019. The primary outcome was one-year post-transplant mortality. Dimensionality reduction and data re-sampling were employed during training. The final ensemble model was created from 100 different models of each algorithm: deep neural network, logistic regression, AdaBoost, and random forest. Discriminatory capability was assessed using the area under the receiver operating characteristic curve (AUROC), net reclassification index (NRI), and decision curve analysis (DCA). Results: Of the 33,657 study patients, 26,926 (80%) were randomly selected for the training set and 6,731 (20%) for a separate testing set. One-year mortality was balanced between cohorts (11.0% vs 11.3%). The final ensemble ML model performed best, with an AUROC of 0.764 (95% CI, 0.745-0.782) in the testing set, higher than the other models (Figure). Additionally, the final model demonstrated an improvement of 72.9% ±3.8% (p<0.001) in predictive performance as assessed by NRI compared to logistic regression. The DCA showed that the final ensemble method improved risk prediction across the entire spectrum of predicted risk compared to all other models (p<0.001). Conclusions: An ensemble ML model achieved greater predictive performance than individual ML models and logistic regression for predicting survival after OHT. This analysis demonstrates the promise of ML techniques in risk prediction in OHT.
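The ensembling and AUROC evaluation described above can be illustrated with a small Python sketch. It assumes soft voting (averaging each patient's predicted risk across models) as the combination rule, which the abstract does not specify, and computes AUROC in its rank form: the probability that a randomly chosen death is scored above a randomly chosen survivor. The function names and the toy labels and scores are hypothetical.

```python
from itertools import product


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case is scored
    higher than a randomly chosen negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def soft_vote(prob_lists):
    """Ensemble by averaging each patient's predicted risk across all models."""
    n_models = len(prob_lists)
    return [sum(probs) / n_models for probs in zip(*prob_lists)]


# Toy data: 1 = died within one year of transplant, 0 = survived (invented values).
labels = [1, 1, 0, 0]
model_a = [0.9, 0.3, 0.4, 0.2]  # ranks the second death below a survivor
model_b = [0.4, 0.8, 0.5, 0.1]  # ranks the first death below a survivor
ensemble = soft_vote([model_a, model_b])

print(auroc(labels, model_a), auroc(labels, model_b), auroc(labels, ensemble))
# prints 0.75 0.75 1.0
```

In this toy case the average outranks both inputs because the two models make their mistakes on different patients; that is an illustration of why ensembling can help, not a guarantee that it will.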


PEDIATRICS ◽  
1975 ◽  
Vol 56 (2) ◽  
pp. 329-329 ◽  
Author(s):  
Hugh C. Thompson ◽  
Stanton J. Barron ◽  
John P. Connelly ◽  
Andrew Margileth ◽  
Richard Olmsted ◽  
...  

Historically, medical records have been maintained by individual physicians to record specific information concerning patients. This information was often understandable only to the writer. The data covered outstanding events. This was thought to be sufficient documentation for patient care. Records are now read by people other than the individual physician. Groups of physicians working together often share the same patients and their records. Patients may have multiple sources of care. Our population has become more mobile, which makes it necessary to transfer vast amounts of medical information. The medical record is often the one instrument that gives a complete and continuous documentation of the patient's medical history. Third-party payers are requesting access to medical records to document services provided. Chart audit is being tested as a mechanism for evaluating physician performance. Records must reflect what the physician does in order to be useful in such an appraisal. Much clinical research on the delivery of health care depends on accurately kept records that are easily interpreted. A chart is also a legal document for the protection of the physician as well as the patient. Thus, records will be used in nontraditional ways. Proper confidentiality must be maintained when such uses are necessary. Physicians generally agree on the essential content of a medical record. However, there is little unanimity about the structure of the chart. No one system of keeping records is now appropriate for all situations. The maintenance of adequate charts requires additional cost in both time and money.


Data Mining ◽  
2013 ◽  
pp. 1794-1818
Author(s):  
William H. Horsthemke ◽  
Daniela S. Raicu ◽  
Jacob D. Furst ◽  
Samuel G. Armato

Evaluating the success of computer-aided decision support systems depends upon a reliable reference standard, a ground truth. The ideal gold standard is expected to result from the marking, labeling, and rating of the image of interest by domain experts. However, experts often disagree, and this lack of agreement challenges the development and evaluation of image-based feature prediction of expert-defined “truth.” The following discussion addresses the successes and limitations of developing computer-aided models to characterize suspicious pulmonary nodules based upon ratings provided by multiple expert radiologists. These prediction models attempt to bridge the semantic gap between images and medically meaningful, descriptive opinions about visual characteristics of nodules. The resulting computer-aided diagnostic characterizations (CADc) are directly usable for indexing and retrieval in content-based medical image retrieval and for supporting computer-aided diagnosis. The predictive performance of CADc models is directly related to the extent of agreement between radiologists: the models better predict radiologists’ opinions when radiologists agree more with each other about the characteristics of nodules.
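One way to quantify the radiologist agreement that the passage links to CADc performance is to score each nodule by how widely its ratings spread. The Python sketch below rests on assumptions not stated in the text: ratings are on an ordinal numeric scale, the consensus label is the median rating, disagreement is the mean absolute difference over all rater pairs, and the agreement threshold of 1.0 is arbitrary. Function names and nodule data are hypothetical.

```python
from itertools import combinations
from statistics import median


def pairwise_disagreement(ratings):
    """Mean absolute difference over all pairs of raters (0.0 = perfect agreement)."""
    pairs = list(combinations(ratings, 2))
    if not pairs:
        return 0.0
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def label_with_spread(nodules):
    """Map nodule id -> (median rating as a consensus label, disagreement score)."""
    return {nid: (median(r), pairwise_disagreement(r)) for nid, r in nodules.items()}


# Hypothetical ratings of one characteristic on a 1-5 scale by four radiologists.
nodules = {"n1": [4, 4, 4, 4], "n2": [1, 3, 5, 2]}
labels = label_with_spread(nodules)

# Keep only nodules whose raters broadly agree (hypothetical threshold of 1.0).
high_agreement = [nid for nid, (_, spread) in labels.items() if spread <= 1.0]
print(labels, high_agreement)
```

Stratifying nodules by such a score is one way to check, on one's own data, whether a model's accuracy against the consensus label rises as rater agreement rises.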


Author(s):  
Joslyn Barnhart

This chapter focuses on national humiliation and the triggering in the 1880s of the Scramble for Africa, an unprecedented land grab by European great powers. It demonstrates that individual-level support for aggressive policies, both vengeful in nature and directed at third-party states, increases within states confronted with potentially humiliating international events. The chapter reviews two international events that played an essential role in generating the competitive dynamics of the Scramble for Africa during the 1880s. The first event involved an instance of unexpected national failure, while the second involved the denial of great power privileges by a higher-status state. It also describes the acts of territorial conquest in Africa by France and Germany that generated status and security concerns within Italy and Britain, which led both states to adopt expansionary policies they likely would not have pursued otherwise.


2019 ◽  
Vol 116 (6) ◽  
pp. 2033-2038 ◽  
Author(s):  
Yang Yang ◽  
Nitesh V. Chawla ◽  
Brian Uzzi

Many leaders today do not rise through the ranks but are recruited directly out of graduate programs into leadership positions. We use a quasi-experiment and instrumental-variable regression to understand the link between students’ graduate school social networks and placement into leadership positions of varying levels of authority. Our data measure students’ personal characteristics and academic performance, as well as their social network information drawn from 4.5 million email correspondences among hundreds of students who were placed directly into leadership positions. After controlling for students’ personal characteristics, work experience, and academic performance, we find that students’ social networks strongly predict placement into leadership positions. For men, the higher a student’s centrality in the school-wide network, the higher his leadership-job placement. Men with network centrality in the top quartile have an expected job placement level that is 1.5 times greater than that of men in the bottom quartile of centrality. While centrality also predicts women’s placement, high-placing women students have one thing more: an inner circle of predominantly female contacts who are connected to many nonoverlapping third-party contacts. Women with a network centrality in the top quartile and a female-dominated inner circle have an expected job placement level that is 2.5 times greater than that of women with low centrality and a male-dominated inner circle. Women whose networks resemble those of high-placing men are low-placing, despite having leadership qualifications comparable to high-placing women.
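The quartile comparison above starts from a centrality score per student computed on the email network. The abstract does not name the centrality measure, so the Python sketch below assumes normalized degree centrality, treats any email exchange as an undirected tie, and uses invented data. It reproduces only a descriptive top-versus-bottom-quartile ratio of mean placement, not the paper's quasi-experimental, instrumental-variable estimates with controls.

```python
from collections import defaultdict
from statistics import mean


def degree_centrality(edges):
    """Normalized degree centrality of an undirected network given as (a, b) pairs.

    Repeated emails between the same two students collapse into a single tie.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


def top_bottom_quartile_ratio(centrality, placement):
    """Mean placement level of the top centrality quartile over that of the bottom quartile."""
    ranked = sorted(centrality, key=centrality.get)  # ascending; ties keep input order
    k = max(1, len(ranked) // 4)
    bottom, top = ranked[:k], ranked[-k:]
    return mean(placement[s] for s in top) / mean(placement[s] for s in bottom)


# Hypothetical email log (sender, recipient) and placement levels (higher = more authority).
emails = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("a", "b")]
placement = {"a": 3, "b": 2, "c": 2, "d": 2}

centrality = degree_centrality(emails)
print(centrality["a"], top_bottom_quartile_ratio(centrality, placement))
# prints 1.0 1.5
```

With only four students each quartile holds one person; on real data, ties at the quartile boundaries would need an explicit rule rather than input order.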


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Shifeng Niu ◽  
Guiqiang Li

Approaches to monitoring fatigue driving are studied because traffic accidents caused by fatigue driving often have fatal consequences. This paper proposes a new approach to predicting driving fatigue using location data from commercial dangerous goods trucks (CDT) and drivers’ yawn data. The location data come from an existing dataset of a transportation company, collected from 166 vehicles and drivers in an actual driving environment. Six categories of predictors are considered as fatigue-related indexes: travel time, day of week, road type, continuous driving time, average velocity, and overall mileage. The drivers’ yawn data are used as a proxy for ground truth for the classification algorithms. From these six categories, we obtain a set of 17 predictor variables to train logistic regression, neural network, and random forest classifiers. We then evaluate the predictive performance of the classifiers on three indexes: accuracy, F1-measure, and area under the ROC curve (AUROC). The results show that the random forest is the most suitable for predicting fatigue driving from location data, given its best accuracy (74.18%), F1-measure (62.02%), and AUROC (0.8059). Finally, we analyze the relationship between fatigue driving and the driving environment according to the variable importance reported by the random forest. In summary, our results demonstrate the potential of location data for reducing the accident rate caused by fatigue driving in practice.
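Accuracy and the F1-measure both derive from the confusion matrix of predicted versus observed labels, here fatigue predictions scored against the yawn-based proxy. A minimal Python sketch, assuming binary labels with 1 = fatigued; the function name `classification_metrics` and the toy labels are hypothetical, not taken from the paper.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = fatigued)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall; true negatives do not enter it.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy example: yawn-based proxy labels vs. a classifier's predictions (invented).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```

Because F1 ignores true negatives, it can sit well below accuracy when most trips are non-fatigued and correctly predicted as such.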


2020 ◽  
Vol 7 (1) ◽  
pp. 205395172093514 ◽  
Author(s):  
Laurence Barry ◽  
Arthur Charpentier

The aim of this article is to assess the impact of Big Data technologies on insurance ratemaking, with a special focus on motor products. The first part shows how statistics and insurance mechanisms adopted the same aggregate viewpoint. This viewpoint made visible regularities that were invisible at the individual level, further supporting the classificatory approach of insurance and the assumption that all members of a class are identical risks. The second part focuses on the reversal of perspective currently occurring in data analysis with predictive analytics, and on how this conceptually contradicts the collective basis of insurance. The tremendous volume of data and the promise of personalization through accurate individual prediction indeed deeply shake the homogeneity hypothesis behind pooling. The third part attempts to assess the extent of this shift in motor insurance. Onboard devices that continuously collect driving behavioural data could import this new paradigm into these products. However, an examination of the current state of research on models using telematics data shows that the epistemological leap has not, for now, happened.
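The contrast between pooling and individualized prediction can be made concrete with a toy calculation. The Python sketch below assumes each policyholder's expected loss is already known (in practice it would itself be a model prediction, for example from telematics data); the function names, rating classes, and amounts are invented. Both schemes collect the same total, but class-based pricing makes low-risk members of a class subsidize high-risk ones, while fully personalized pricing removes that transfer.

```python
from statistics import mean


def pooled_premiums(expected_loss, rating_class):
    """Class-based pricing: everyone in a rating class pays the class's mean expected loss."""
    members = {}
    for pid, cls in rating_class.items():
        members.setdefault(cls, []).append(expected_loss[pid])
    class_mean = {cls: mean(losses) for cls, losses in members.items()}
    return {pid: class_mean[cls] for pid, cls in rating_class.items()}


def individual_premiums(expected_loss):
    """Fully personalized pricing: everyone pays their own predicted expected loss."""
    return dict(expected_loss)


# Hypothetical annual expected losses (currency units) and rating classes.
expected_loss = {"p1": 200, "p2": 600, "p3": 300, "p4": 300}
rating_class = {"p1": "urban", "p2": "urban", "p3": "rural", "p4": "rural"}

pooled = pooled_premiums(expected_loss, rating_class)
personal = individual_premiums(expected_loss)
# Cross-subsidy inside the pool: positive means the policyholder pays more than their own risk.
subsidy = {pid: pooled[pid] - personal[pid] for pid in pooled}
print(pooled, subsidy)
```

Here p1 pays 200 more than its own expected loss and p2 pays 200 less, while the rural class, whose members are genuinely homogeneous, shows no transfer at all.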

