Explanation, prediction, and causality: Three sides of the same coin?

Author(s):  
Duncan J Watts ◽  
Emorie D Beck ◽  
Elisa Jayne Bienenstock ◽  
Jake Bowers ◽  
Aaron Frank ◽  
...  

In this essay we make four interrelated points. First, we reiterate previous arguments (Kleinberg et al., 2015) that forecasting problems are more common in social science than is often appreciated. From this observation it follows that social scientists should care about predictive accuracy in addition to unbiased or consistent estimation of causal relationships. Second, we argue that social scientists should be interested in prediction even if they have no interest in forecasting per se. Causal claims necessarily make predictions, whether their authors state them explicitly or not; thus it is both fair and arguably useful to hold them accountable for the accuracy of the predictions they make. Third, we argue that prediction, used in either of the above two senses, is a useful metric for quantifying progress. Important differences between social science explanations and machine learning algorithms notwithstanding, social scientists can still learn from approaches like the Common Task Framework (CTF), which has successfully driven progress in certain fields of AI over the past 30 years (Donoho, 2015). Finally, we anticipate that as the predictive performance of forecasting models and explanations alike receives more attention, it will become clear that it is subject to some upper limit that lies well below deterministic accuracy for many applications of interest (Martin et al., 2016). Characterizing the properties of complex social systems that lead to higher or lower predictive limits therefore poses an interesting challenge for computational social science.

2019 ◽  
Author(s):  
Donald Salami ◽  
Carla Alexandra Sousa ◽  
Maria do Rosário Oliveira Martins ◽  
César Capinha

Abstract: The geographical spread of dengue is a global public health concern. This spread is largely mediated by the importation of dengue from endemic to non-endemic areas via the increasing connectivity of the global air transport network. The dynamic nature and intrinsic heterogeneity of the air transport network make it challenging to predict dengue importation. Here, we explore the capabilities of state-of-the-art machine learning algorithms to predict dengue importation. We trained four machine learning classification algorithms using six years of historical dengue importation data for 21 countries in Europe, together with connectivity indices mediating importation and air transport network centrality measures. Predictive performance of the classifiers was evaluated using the area under the receiver operating characteristic curve, sensitivity, and specificity. Finally, we applied practical model-agnostic methods to provide an in-depth explanation of our optimal model’s predictions on a global and local scale. Our best-performing model achieved high predictive accuracy, with an area under the receiver operating characteristic curve of 0.94 and a maximized sensitivity score of 0.88. The predictor variables identified as most important were the source country’s dengue incidence rate, population size, and volume of air passengers. Network centrality measures, describing the positioning of European countries within the air travel network, were also influential in the predictions. We demonstrated the high predictive performance of a machine learning model in predicting dengue importation and the utility of model-agnostic methods for offering a comprehensive understanding of the reasons behind the predictions. Similar approaches can be utilized in the development of an operational early warning surveillance system for dengue importation.
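The evaluation metrics the abstract reports (ROC AUC, sensitivity, specificity) can be sketched in a few lines of plain Python. This is a minimal illustration of the metrics themselves, not the study's actual pipeline; the toy labels and scores are invented.

```python
# Hedged sketch of binary-classifier evaluation: sensitivity/specificity
# from a confusion matrix, and rank-based ROC AUC from scores.

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(y_true, y_score):
    """AUC as the probability a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1 for p in pos for n in neg if p > n)
    ties = sum(1 for p in pos for n in neg if p == n)
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

A model that ranks every importation case above every non-case would score an AUC of 1.0; the 0.94 reported above sits close to that ceiling.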


2020 ◽  
pp. 004912412091493 ◽  
Author(s):  
Alex Koch ◽  
Felix Speckmann ◽  
Christian Unkelbach

Measuring the similarity of stimuli is of great interest to a variety of social scientists. Spatial arrangement by dragging and dropping “more similar” targets closer together on the computer screen is a precise and efficient method to measure stimulus similarity. We present Qualtrics-spatial arrangement method (Q-SpAM), a feature-rich and user-friendly online version of spatial arrangement. Combined with crowdsourcing platforms, Q-SpAM provides fast and affordable access to similarity data even for large stimulus sets. Participants may spatially arrange up to 100 words or images, randomly selected targets, self-selected targets, self-generated targets, and targets self-marked in different colors. These and other Q-SpAM features can be combined. We exemplify how to collect, process, and visualize similarity data with Q-SpAM and provide R and Excel scripts to do so. We then illustrate Q-SpAM’s versatility for social science, concluding that Q-SpAM is a reliable and valid method to measure the similarity of lots of stimuli with little effort.
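Under the hood, a spatial arrangement yields similarity data by reading off inter-item distances on screen: targets dropped closer together are treated as more similar. A minimal sketch of that step (the coordinate format and names are illustrative assumptions, not Q-SpAM's actual data layout):

```python
import math

def pairwise_distances(positions):
    """positions: dict mapping stimulus name -> (x, y) screen coordinates.

    Returns a dict of Euclidean distances for each unordered pair;
    smaller distance is read as higher similarity.
    """
    names = sorted(positions)
    return {(a, b): math.dist(positions[a], positions[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
```

For example, `pairwise_distances({"cat": (0, 0), "dog": (3, 4)})` yields a distance of 5.0 for the pair, which can then feed standard similarity analyses such as multidimensional scaling.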


2019 ◽  
Vol 5 ◽  
pp. 237802311881177 ◽  
Author(s):  
Stephen McKay

Computer science has devised leading methods for predicting variables; can social science compete? The author sets out a social scientific approach to the Fragile Families Challenge. Key insights included new variables constructed according to theory (e.g., a measure of shame relating to hardship), lagged values of the target variables, using predicted values of certain outcomes to inform others, and validated scales rather than individual variables. The models were competitive: a four-variable logistic regression model was placed second for predicting layoffs, narrowly beaten by a model using all the available variables (>10,000) and an ensemble of algorithms. Similarly, a relatively small random forest model (25 variables) was ranked seventh in predicting material hardship. However, a similar approach overfitted the prediction of grit. Machine learning approaches proved superior to linear regression for modeling the continuous outcomes. Overall, social scientists can contribute to predictive performance while benefiting from learning more about data science methods.


2021 ◽  
Author(s):  
Nate Breznau

Machine learning and other computer-driven prediction models are one of the fastest growing trends in computational social science. These methods and approaches were developed in computer science with different goals and epistemologies than those in social science, the most obvious difference being a focus on prediction rather than explanation. Predictive modeling offers great potential for improving research and theory development, but its adoption poses some challenges and creates new problems. For this reason, Hofman et al. (2021) published recommendations for more effective integration of predictive modeling into social science. In this communication I review their recommendations and expand on some additional concerns related to current practices and whether prediction can effectively serve the goals of most social scientists. Overall, I argue that they provide a sound set of guidelines and a classification scheme that will serve those of us working in computational social science.


2019 ◽  
pp. 089443931984837
Author(s):  
Nandana Sengupta ◽  
Nati Srebro ◽  
James Evans

In the last decade, the use of simple rating and comparison surveys has proliferated on social and digital media platforms to fuel recommendations. These simple surveys and their extrapolation with machine learning algorithms such as matrix factorization shed light on user preferences over large and growing pools of items such as movies, songs, and ads. Social scientists also have a long history of measuring perceptions, preferences, and opinions, typically over smaller, discrete item sets with exhaustive rating or ranking surveys. This article introduces simple surveys for social science application. We ran experiments to compare the predictive accuracy of both individual and aggregate comparative assessments using four types of simple surveys—pairwise comparisons (PCs) and ratings on 2-point, 5-point, and continuous scales—in three contexts: perceived safety of Google Street View images, likability of artwork, and hilarity of animal GIFs. Across contexts, we find that continuous scale ratings best predict individual assessments but consume the most time and cognitive effort. Binary choice surveys are quick and best predict aggregate assessments, useful for collective decision tasks, but poorly predict personalized preferences, for which they are currently used by Netflix to recommend movies. PCs, by contrast, successfully predict personal assessments but poorly predict aggregate assessments despite being widely used to crowdsource ideas and collective preferences. We also demonstrate how findings from these surveys can be visualized in a low-dimensional space to reveal distinct respondent interpretations of questions asked in each context. We conclude by reflecting on differences between sparse, incomplete “simple surveys” and their traditional survey counterparts in terms of efficiency, information elicited, and settings in which knowing less about more may be critical for social science.
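As a toy illustration of how pairwise-comparison (PC) responses can be aggregated into per-item scores, consider a simple win-rate rule; this is one common aggregation among many and is not necessarily the scoring the article itself used.

```python
from collections import defaultdict

def win_rates(comparisons):
    """comparisons: iterable of (winner, loser) pairs from PC trials.

    Returns each item's fraction of comparisons won, a crude proxy
    for its aggregate rank among respondents.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return {item: wins[item] / appearances[item] for item in appearances}
```

With sparse data like this, items appearing in few comparisons get noisy scores, which is one reason PCs can predict personal assessments well while aggregating poorly.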


What has social science learned about the common good? Would humanists even want to alter their definitions of the common good based on what social scientists say? In this volume, six social scientists—from economics, political science, sociology, and policy analysis—speak about what their disciplines have to contribute to discussions within Catholic social thought about the common good. None of those disciplines talks directly about “the common good”; but nearly all social scientists believe that their scientific work can help make the world a better place, and each social science does operate with some notion of human flourishing. Two theologians examine the insights of social science, including challenging assertions that theology is overly irenic, that it does not appreciate unplanned order, and that it does not grasp how, in some situations, contention among self-interested nations and persons can be an effective path to the common good. In response, one theologian explicitly includes contention along with cooperation in his (altered) definition of the common good.



Author(s):  
Klaus Solberg Söilen

The problem we want to solve is to find out what is new in the collective intelligence literature and how it is to be understood alongside other social science disciplines. This is important because collective intelligence and problems of collaboration seem familiar in the social sciences but do not necessarily fit into any of the established disciplines. Also, collective intelligence is often associated with the notion of the wisdom of crowds, which demands scrutiny. We found that the collective intelligence field is valuable, truly interdisciplinary, and part of a paradigm shift in the social sciences. However, the content is not new, as the comparison with social intelligence suggests; the literature is often uncritical and lacking in the data it shows, and the notion of the wisdom of crowds is misleading (RQ1). The study of social systems is still highly relevant for social scientists and scholars of collective intelligence as an alternative methodology to more traditional social science paradigms found, for example, in the study of business or management (RQ2).


2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 15-16
Author(s):  
Pablo A S Fonseca ◽  
Massimo Tornatore ◽  
Angela Cánovas

Abstract: Reduced fertility is one of the main causes of economic losses in dairy farms. The cost of a stillbirth is estimated at US$938 per case in Holstein herds. Machine learning (ML) is gaining popularity in the livestock sector as a means to identify hidden patterns and for its potential to address dimensionality problems. Here we investigate the application of ML algorithms for the prediction of cows with higher stillbirth susceptibility in two scenarios: cows with >25% and >33.33% stillbirths among their birth records. These thresholds correspond to the 75th (still_75) and 90th (still_90) percentiles, respectively. A total of 10,570 cows and 50,541 birth records were collected to perform a haplotype-based genome-wide association study. Five hundred significant pseudo single nucleotide polymorphisms (pseudo-SNPs) (false discovery rate < 0.05) were used as input features for ML-based predictions of whether a cow falls in the still_75 or still_90 group. Table 1 shows the classification performance of the investigated ML and linear models. The ML models outperformed the linear models for both thresholds. In general, still_75 showed higher F1 values than still_90, suggesting a lower misclassification ratio when a less stringent threshold is used. The accuracy of the models in our study is higher than ML-based prediction accuracies reported in other breeds, e.g., the accuracies of 0.46 and 0.67 achieved using SNPs for body weight in Brahman and fertility traits in Nellore, respectively. The XGBoost algorithm shows the highest balanced accuracy (BA; 0.625), F1-score (0.588), and area under the curve (AUC; 0.688), suggesting that XGBoost achieves the highest predictive performance and the lowest difference in misclassification ratio between classes.
ML applied over haplotype libraries is an interesting approach for detecting animals with higher susceptibility to stillbirths, owing to its higher predictive accuracy and relatively lower misclassification ratio.
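The metrics reported above (F1 and balanced accuracy) follow directly from confusion-matrix counts. A hedged sketch of the definitions, using made-up counts rather than the study's Table 1 values:

```python
def f1_and_balanced_accuracy(tp, fp, tn, fn):
    """F1 and balanced accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    balanced_accuracy = (recall + specificity) / 2
    return f1, balanced_accuracy
```

Balanced accuracy averages sensitivity and specificity, which is why it is the natural headline metric here: the stillbirth classes are imbalanced by construction (top 25% or top 10% of cows), so plain accuracy would reward always predicting the majority class.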


Horizons ◽  
1980 ◽  
Vol 7 (2) ◽  
pp. 219-230
Author(s):  
Anthony Battaglia

Abstract: An interest in the validity and importance of religion has led some social scientists to try to change some of the common assumptions of their discipline. In doing so, they have made statements, such as “Religion is true,” which invite analysis and response from the context of the traditional Ontological Argument. Recasting the argument in the vocabulary of the social sciences also sheds new light on traditional discussions of it.

