Big Data Predictive Modeling and Analytics

2017 ◽  
pp. 117-150
Author(s):  
Mydhili K. Nair ◽  
Arjun Rao ◽  
Mipsa Patel
Keyword(s):  
Big Data

2019 ◽  
pp. 089443931988845 ◽  
Author(s):  
Alexander Christ ◽  
Marcus Penthin ◽  
Stephan Kröner

Systematic reviews are the method of choice for synthesizing research evidence. To identify the main topics (so-called hot spots) relevant to large corpora of original publications in need of a synthesis, one must address the “three Vs” of big data (volume, velocity, and variety), especially in loosely defined or fragmented disciplines. Text mining and predictive modeling are very helpful for this purpose. Thus, we applied these methods to a compilation of documents related to digitalization in aesthetic, arts, and cultural education, a prototypical loosely defined, fragmented discipline, and particularly to quantitative research within it (QRD-ACE). By broadly querying the abstract and citation database Scopus with terms indicative of QRD-ACE, we identified a corpus of N = 55,553 publications for the years 2013–2017. As the result of an iterative approach combining text mining, priority screening, and predictive modeling, we identified n = 8,304 potentially relevant publications, of which n = 1,666 were included after priority screening. Analysis of the subject distribution of the included publications revealed video games as a first hot spot of QRD-ACE. Topic modeling revealed aesthetics and cultural activities on social media as a second hot spot, related to 4 of the k = 8 identified topics. In this way, we were able to identify current hot spots of QRD-ACE by screening less than 15% of the corpus. We discuss implications for harnessing text mining, predictive modeling, and priority screening in future research syntheses, as well as avenues for future original research on QRD-ACE.
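For readers curious what such a pipeline looks like in code, the following is a minimal sketch, assuming scikit-learn: topic modeling with k = 8 topics (mirroring the number reported) plus priority screening, i.e. ranking unscreened abstracts by a classifier's predicted relevance. The corpus, labels, and model choices below are placeholders, not the authors' actual setup.

```python
# Placeholder mini-corpus: abstracts already screened by hand.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

screened = [
    "video games and learning in arts education",
    "social media aesthetics and cultural participation",
    "unrelated clinical trial of a new drug",
    "museum apps and digital cultural activities",
    "protein folding simulation on supercomputers",
    "music streaming and adolescent cultural practice",
]
labels = [1, 1, 0, 1, 0, 1]  # 1 = relevant to the synthesis, 0 = not
unscreened = [
    "digital storytelling in theater education",
    "compiler optimizations for embedded systems",
]

# Topic modeling over term counts; k = 8 mirrors the abstract's k.
counts = CountVectorizer(stop_words="english")
doc_topics = LatentDirichletAllocation(
    n_components=8, random_state=0
).fit_transform(counts.fit_transform(screened))

# Priority screening: fit a relevance classifier on the screened items,
# then put the highest-scoring unscreened items first in the queue.
tfidf = TfidfVectorizer(stop_words="english")
clf = LogisticRegression(max_iter=1000).fit(tfidf.fit_transform(screened), labels)
scores = clf.predict_proba(tfidf.transform(unscreened))[:, 1]
queue = [doc for _, doc in sorted(zip(scores, unscreened), reverse=True)]
```

In practice the classifier is refit as each newly screened batch adds labels, which is how the study could stop after screening less than 15% of the corpus.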


2017 ◽  
Vol 47 (3) ◽  
pp. 943-961 ◽  
Author(s):  
Yanwei Zhang

While Bayesian methods have attracted considerable interest in actuarial science, they have yet to be embraced in large-scale insurance predictive modeling applications, owing to the inefficiency of Bayesian estimation procedures. The paper presents an efficient method that parallelizes Bayesian computation using distributed computing on Apache Spark across a cluster of computers. The distributed algorithm dramatically boosts the speed of Bayesian computation and expands the scope of applicability of Bayesian methods in insurance modeling. The empirical analysis applies a Bayesian hierarchical Tweedie model to a big data set of 13 million insurance claim records. The distributed algorithm achieves as much as a 65-fold performance gain over the non-parallel method in this application. The analysis demonstrates that Bayesian methods can be of great value to large-scale insurance predictive modeling.
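As an illustration of the data-parallel pattern the abstract describes, here is a minimal PySpark sketch. The paper fits a hierarchical Tweedie model; to keep this self-contained and runnable, the sketch swaps in a toy Gaussian-mean model and consensus Monte Carlo (each partition samples its sub-posterior with the prior tempered by 1/S, and per-draw averages combine the shards). Everything below is an illustrative assumption, not the paper's algorithm.

```python
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("consensus-mc-sketch").getOrCreate()
sc = spark.sparkContext

S, DRAWS, SIGMA = 8, 1000, 1.0   # shards, draws per shard, known noise sd
MU0, TAU0 = 0.0, 10.0            # N(MU0, TAU0^2) prior on the mean

data = np.random.normal(3.0, SIGMA, 100_000)  # placeholder claim data
rdd = sc.parallelize(data.tolist(), numSlices=S)

def shard_draws(values):
    x = np.fromiter(values, dtype=float)
    # Gaussian mean, known variance, prior tempered by 1/S: the
    # sub-posterior is conjugate, so we can sample it directly.
    prior_prec = 1.0 / (S * TAU0 ** 2)
    like_prec = x.size / SIGMA ** 2
    var = 1.0 / (prior_prec + like_prec)
    mean = var * (prior_prec * MU0 + like_prec * x.mean())
    yield np.random.normal(mean, np.sqrt(var), DRAWS)

sub_draws = rdd.mapPartitions(shard_draws).collect()  # one array per shard
# Consensus step: with balanced shards and equal sub-posterior
# variances, the precision-weighted average reduces to a plain mean.
consensus = np.vstack(sub_draws).mean(axis=0)
print("posterior mean estimate:", consensus.mean())
spark.stop()
```

The speedup comes from each shard touching only its own partition of the data, so per-iteration cost scales with the largest shard rather than the full record count.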


2017 ◽  
Vol 23 (3) ◽  
pp. 1585-1588 ◽  
Author(s):  
Jung-Hyok Kwon ◽  
Hwi-Ho Lee ◽  
Eui-Jik Kim
