Principled Statistical Inference in Data Science

2018 ◽  
pp. 21-36
Author(s):  
Todd A. Kuffner ◽  
G. Alastair Young
Author(s):  
Marina Dobrota

Book review of: Computer Age Statistical Inference: Algorithms, Evidence, and Data Science by Bradley Efron and Trevor Hastie, Cambridge University Press, 2016, Series: Institute of Mathematical Statistics Monographs (5), 495pp., ISBN13: 9781107149892, ISBN10: 1107149894, Online ISBN: 9781316576533, DOI:10.1017/CBO9781316576533


Author(s):  
Ahmed Al-Imam ◽  
Usama Khalid ◽  
Shahad Al-Qaisi ◽  
Nawfal Al-Hadithi ◽  
Dawoude Kaouche

BACKGROUND: Epidemiological sciences have been evolving at an exponential rate, paralleled only by the comparable growth of data science. Digital epidemiological studies have played a vital role in medical science analytics for the past few decades. To date, there are no published attempts at deploying real-time analytics in connection with the disciplines of Dentistry or Medicine.
AIMS AND OBJECTIVES: We deployed a real-time statistical analysis of topics in Dental Anatomy and Dental Pathology, represented by the maxillary sinus, posterior maxillary teeth, and related oral pathology. The purpose is to infer the digital epidemiology from a continuous stream of raw data retrieved from the Google Trends database.
MATERIALS AND METHODS: Statistical analysis was carried out in Microsoft Excel 2016 and SPSS version 24. The Google Trends database was used to retrieve data for digital epidemiology. Real-time analysis and the statistical inference were implemented as a script written in the Python high-level programming language. A systematic review of the literature was carried out via the PubMed-NCBI, Cochrane Library, and Elsevier databases.
RESULTS: The comprehensive review of the literature databases, based on a specific keyword search, yielded 491813 published studies. These were distributed as 488884 (PubMed-NCBI), 1611 (the Cochrane Library), and 1318 (Elsevier). However, not a single study attempted real-time analytics. Nevertheless, we succeeded in achieving an automated real-time stream of data accompanied by a statistical inference based on data extrapolated from Google Trends.
CONCLUSION: Real-time analytics can have considerable impact when implemented in the biological and life sciences, as they will greatly reduce the resources required for research. Predictive analytics, based on artificial neural networks and machine learning algorithms, can be the next step deployed as a continuation of the real-time systems, to forecast changes in the temporal trends and the digital epidemiology of phenomena of interest.
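
The abstract does not reproduce the authors' Python script, so the following is only a minimal sketch of what statistical inference on a continuous stream can look like. It assumes the stream is a sequence of numeric search-interest values and uses Welford's online algorithm to keep a running mean and variance without storing past observations. The class name `RunningStats` and the `z_score` helper are hypothetical and are not taken from the paper.

```python
import math


class RunningStats:
    """Running mean and sample variance via Welford's online algorithm.

    Illustrative only: this is not the script used by the authors.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance (n - 1 denominator); NaN until two points arrive."""
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")

    def z_score(self, x):
        """Standardised distance of x from the stream seen so far."""
        var = self.variance
        if not var > 0:  # covers NaN and zero variance
            return float("nan")
        return (x - self.mean) / math.sqrt(var)


# Hypothetical weekly interest values standing in for a live data feed.
stream = [2, 4, 4, 4, 5, 5, 7, 9]
stats = RunningStats()
for value in stream:
    stats.update(value)
```

Each update costs constant time and memory, which is the property that makes this kind of summary practical for a feed that never ends.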


Author(s):  
Ahmed Al-Imam ◽  
Usama Khalid ◽  
Nawfal Al-Hadithi ◽  
Dawoude Kaouche

BACKGROUND: Epidemiological sciences have been evolving at an exponential rate, paralleled only by the comparable growth of data science. Digital epidemiological studies have played a vital role in medical science analytics for the past few decades. To date, there are no published attempts at deploying real-time analytics in connection with the disciplines of Dentistry or Medicine.
AIMS AND OBJECTIVES: We deployed a real-time statistical analysis of topics in Dental Anatomy and Dental Pathology, represented by the maxillary sinus, posterior maxillary teeth, and related oral pathology. The purpose is to infer the digital epidemiology from a continuous stream of raw data retrieved from the Google Trends database.
MATERIALS AND METHODS: Statistical analysis was carried out in Microsoft Excel 2016 and SPSS version 24. The Google Trends database was used to retrieve data for digital epidemiology. Real-time analytics and the statistical inference were implemented as a script written in the Python high-level programming language. A systematic review of the literature was carried out via the PubMed-NCBI, Cochrane Library, and Elsevier databases.
RESULTS: The comprehensive review of the literature databases, based on a specific keyword search, yielded 491813 published studies. These were distributed as 488884 (PubMed-NCBI), 1611 (the Cochrane Library), and 1318 (Elsevier). However, not a single study attempted real-time analytics. Nevertheless, we succeeded in achieving an automated real-time stream of data accompanied by a statistical inference based on data extrapolated from Google Trends.
CONCLUSION: Real-time analytics can have considerable impact when implemented in the biological and life sciences, as they will greatly reduce the resources required for research. Predictive analytics, based on artificial neural networks and machine learning algorithms, can be the next step deployed as a continuation of the real-time systems, to forecast changes in the temporal trends and the digital epidemiology of phenomena of interest.


2019 ◽  
Vol 1 (3) ◽  
pp. 945-961 ◽  
Author(s):  
Frank Emmert-Streib ◽  
Matthias Dehmer

Statistical hypothesis testing is among the most misunderstood quantitative analysis methods in data science. Despite its seeming simplicity, it has complex interdependencies between its procedural components. In this paper, we discuss the underlying logic behind statistical hypothesis testing, the formal meaning of its components, and their connections. Our presentation applies to all statistical hypothesis tests as a generic backbone and is hence useful across all application domains in data science and artificial intelligence.
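
To make the procedural components concrete, here is a minimal sketch of one particular test, a two-sample permutation test on the difference of means. It is an illustration chosen for this listing, not an example from the paper. The function name `permutation_test`, the default of 10,000 resamples, and the sample data are all assumptions. The components appear in order: a null hypothesis (group labels are exchangeable), a test statistic, a sampling distribution under the null built by relabelling, and a p-value compared against a significance level.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: the group labels are exchangeable, i.e. both
    samples come from the same distribution.
    Returns the observed statistic and a Monte Carlo p-value.
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)          # test statistic on the real labels
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)               # one random relabelling under the null
        stat = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Small tolerance guards against floating-point ties.
        if abs(stat) >= abs(observed) - 1e-12:
            extreme += 1
    # The +1 terms count the observed labelling itself, so p is never 0.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical data: two clearly separated groups.
alpha = 0.05                              # significance level, fixed in advance
obs, p = permutation_test([1, 2, 3, 4, 5], [101, 102, 103, 104, 105],
                          n_resamples=2000, seed=1)
reject_null = p < alpha
```

The significance level is fixed before the data are examined, and the decision depends only on whether the p-value falls below it. A p-value above the level means the null hypothesis is not rejected, which is different from showing it to be true.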


2020 ◽  
Vol 8 (3) ◽  
Author(s):  
Philip S Chodrow

Many empirical networks are intrinsically polyadic, with interactions occurring within groups of agents of arbitrary size. There are, however, few flexible null models that can support statistical inference in polyadic networks. We define a class of null random hypergraphs that hold constant both the node degree and edge dimension sequences, thereby generalizing the classical dyadic configuration model. We provide a Markov Chain Monte Carlo scheme for sampling from these models and discuss connections and distinctions between our proposed models and previous approaches. We then illustrate the application of these models through a triplet of data-analytic vignettes. We start with two classical topics in network science: triadic clustering and degree-assortativity. In each, we emphasize the importance of randomizing over hypergraph space rather than projected graph space, showing that this choice can dramatically alter both the quantitative and qualitative outcomes of statistical inference. We then define and study the edge intersection profile of a hypergraph as a measure of higher-order correlation between edges, and derive asymptotic approximations for this profile under the stub-labeled null. We close with suggestions for multiple avenues of future work. Taken as a whole, our experiments emphasize the ability of explicit, statistically grounded polyadic modelling to significantly enhance the toolbox of network data science.
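
The abstract does not spell out the sampling scheme, so the sketch below is a simplified illustration of the general idea and should not be read as the paper's algorithm. It randomizes a hypergraph by repeatedly picking two edges and redistributing their non-shared nodes, which keeps every node's degree and every edge's dimension fixed. The function names `pairwise_reshuffle` and `degree_sequence` are hypothetical. The sketch makes no claim about the stationary distribution, which a proper null model would need to establish.

```python
import random
from collections import Counter


def degree_sequence(edges):
    """Number of edges each node belongs to."""
    return Counter(v for e in edges for v in e)


def pairwise_reshuffle(edges, n_steps, seed=0):
    """Randomize a hypergraph while preserving degrees and edge sizes.

    Simplified sketch (needs at least two edges): at each step two
    edges are chosen at random. Nodes shared by both stay in both, so
    no edge ever contains a node twice; the remaining nodes are
    shuffled and dealt back so that each edge keeps its original size.
    """
    rng = random.Random(seed)
    edges = [set(e) for e in edges]
    for _ in range(n_steps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i], edges[j]
        shared = a & b
        pool = sorted((a | b) - shared)   # nodes in exactly one of the two edges
        rng.shuffle(pool)
        k = len(a) - len(shared)          # free slots in edge i
        edges[i] = shared | set(pool[:k])
        edges[j] = shared | set(pool[k:])
    return edges


# Hypothetical toy hypergraph with edges of dimension 3, 3, 2 and 4.
original = [(1, 2, 3), (2, 3, 4), (4, 5), (1, 5, 6, 7)]
shuffled = pairwise_reshuffle(original, n_steps=500, seed=1)
```

Comparing a statistic on the observed hypergraph with its distribution over many such randomizations is what turns the null model into an inferential tool. The randomization acts on the hyperedges directly instead of on a projected pairwise graph, which is the distinction the abstract emphasizes.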


2021 ◽  
Author(s):  
Bradley Efron ◽  
Trevor Hastie

The twenty-first century has seen a breathtaking expansion of statistical methodology, both in scope and influence. 'Data science' and 'machine learning' have become familiar terms in the news, as statistical methods are brought to bear upon the enormous data sets of modern science and commerce. How did we get here? And where are we going? How does it all fit together? Now in paperback and fortified with exercises, this book delivers a concentrated course in modern statistical thinking. Beginning with classical inferential theories - Bayesian, frequentist, Fisherian - individual chapters take up a series of influential topics: survival analysis, logistic regression, empirical Bayes, the jackknife and bootstrap, random forests, neural networks, Markov Chain Monte Carlo, inference after model selection, and dozens more. The distinctly modern approach integrates methodology and algorithms with statistical inference. Each chapter ends with class-tested exercises, and the book concludes with speculation on the future direction of statistics and data science.

