scholarly journals Explaining and predicting human behavior and social dynamics in simulated virtual worlds: reproducibility, generalizability, and robustness of causal discovery methods

Author(s):  
Svitlana Volkova ◽  
Dustin Arendt ◽  
Emily Saldanha ◽  
Maria Glenski ◽  
Ellyn Ayton ◽  
...  

AbstractGround Truth program was designed to evaluate social science modeling approaches using simulation test beds with ground truth intentionally and systematically embedded to understand and model complex Human Domain systems and their dynamics Lazer et al. (Science 369:1060–1062, 2020). Our multidisciplinary team of data scientists, statisticians, experts in Artificial Intelligence (AI) and visual analytics had a unique role on the program to investigate accuracy, reproducibility, generalizability, and robustness of the state-of-the-art (SOTA) causal structure learning approaches applied to fully observed and sampled simulated data across virtual worlds. In addition, we analyzed the feasibility of using machine learning models to predict future social behavior with and without causal knowledge explicitly embedded. In this paper, we first present our causal modeling approach to discover the causal structure of four virtual worlds produced by the simulation teams—Urban Life, Financial Governance, Disaster and Geopolitical Conflict. Our approach adapts the state-of-the-art causal discovery (including ensemble models), machine learning, data analytics, and visualization techniques to allow a human-machine team to reverse-engineer the true causal relations from sampled and fully observed data. We next present our reproducibility analysis of two research methods team’s performance using a range of causal discovery models applied to both sampled and fully observed data, and analyze their effectiveness and limitations. We further investigate the generalizability and robustness to sampling of the SOTA causal discovery approaches on additional simulated datasets with known ground truth. Our results reveal the limitations of existing causal modeling approaches when applied to large-scale, noisy, high-dimensional data with unobserved variables and unknown relationships between them. We show that the SOTA causal models explored in our experiments are not designed to take advantage from vasts amounts of data and have difficulty recovering ground truth when latent confounders are present; they do not generalize well across simulation scenarios and are not robust to sampling; they are vulnerable to data and modeling assumptions, and therefore, the results are hard to reproduce. Finally, when we outline lessons learned and provide recommendations to improve models for causal discovery and prediction of human social behavior from observational data, we highlight the importance of learning data to knowledge representations or transformations to improve causal discovery and describe the benefit of causal feature selection for predictive and prescriptive modeling.

2021 ◽  
Author(s):  
Kai Guo ◽  
Zhenze Yang ◽  
Chi-Hua Yu ◽  
Markus J. Buehler

This review revisits the state of the art of research efforts on the design of mechanical materials using machine learning.


2020 ◽  
Author(s):  
Fei Qi ◽  
Zhaohui Xia ◽  
Gaoyang Tang ◽  
Hang Yang ◽  
Yu Song ◽  
...  

As an emerging field, Automated Machine Learning (AutoML) aims to reduce or eliminate manual operations that require expertise in machine learning. In this paper, a graph-based architecture is employed to represent flexible combinations of ML models, which provides a large searching space compared to tree-based and stacking-based architectures. Based on this, an evolutionary algorithm is proposed to search for the best architecture, where the mutation and heredity operators are the key for architecture evolution. With Bayesian hyper-parameter optimization, the proposed approach can automate the workflow of machine learning. On the PMLB dataset, the proposed approach shows the state-of-the-art performance compared with TPOT, Autostacker, and auto-sklearn. Some of the optimized models are with complex structures which are difficult to obtain in manual design.


2021 ◽  
Vol 11 (17) ◽  
pp. 8074
Author(s):  
Tierui Zou ◽  
Nader Aljohani ◽  
Keerthiraj Nagaraj ◽  
Sheng Zou ◽  
Cody Ruben ◽  
...  

Concerning power systems, real-time monitoring of cyber–physical security, false data injection attacks on wide-area measurements are of major concern. However, the database of the network parameters is just as crucial to the state estimation process. Maintaining the accuracy of the system model is the other part of the equation, since almost all applications in power systems heavily depend on the state estimator outputs. While much effort has been given to measurements of false data injection attacks, seldom reported work is found on the broad theme of false data injection on the database of network parameters. State-of-the-art physics-based model solutions correct false data injection on network parameter database considering only available wide-area measurements. In addition, deterministic models are used for correction. In this paper, an overdetermined physics-based parameter false data injection correction model is presented. The overdetermined model uses a parameter database correction Jacobian matrix and a Taylor series expansion approximation. The method further applies the concept of synthetic measurements, which refers to measurements that do not exist in the real-life system. A machine learning linear regression-based model for measurement prediction is integrated in the framework through deriving weights for synthetic measurements creation. Validation of the presented model is performed on the IEEE 118-bus system. Numerical results show that the approximation error is lower than the state-of-the-art, while providing robustness to the correction process. Easy-to-implement model on the classical weighted-least-squares solution, highlights real-life implementation potential aspects.


2017 ◽  
Vol 108 (1) ◽  
pp. 307-318 ◽  
Author(s):  
Eleftherios Avramidis

AbstractA deeper analysis on Comparative Quality Estimation is presented by extending the state-of-the-art methods with adequacy and grammatical features from other Quality Estimation tasks. The previously used linear method, unable to cope with the augmented features, is replaced with a boosting classifier assisted by feature selection. The methods indicated show improved performance for 6 language pairs, when applied on the output from MT systems developed over 7 years. The improved models compete better with reference-aware metrics.Notable conclusions are reached through the examination of the contribution of the features in the models, whereas it is possible to identify common MT errors that are captured by the features. Many grammatical/fluency features have a good contribution, few adequacy features have some contribution, whereas source complexity features are of no use. The importance of many fluency and adequacy features is language-specific.


Author(s):  
Anass Nouri ◽  
Christophe Charrier ◽  
Olivier Lezoray

This chapter concerns the visual saliency and the perceptual quality assessment of 3D meshes. Firstly, the chapter proposes a definition of visual saliency and describes the state-of-the-art methods for its detection on 3D mesh surfaces. A focus is made on a recent model of visual saliency detection for 3D colored and non-colored meshes whose results are compared with a ground-truth saliency as well as with the literature's methods. Since this model is able to estimate the visual saliency on 3D colored meshes, named colorimetric saliency, a description of the construction of a 3D colored mesh database that was used to assess its relevance is presented. The authors also describe three applications of the detailed model that respond to the problems of viewpoint selection, adaptive simplification and adaptive smoothing. Secondly, two perceptual quality assessment metrics for 3D non-colored meshes are described, analyzed, and compared with the state-of-the-art approaches.


2020 ◽  
Vol 2020 ◽  
pp. 1-7
Author(s):  
Nalindren Naicker ◽  
Timothy Adeliyi ◽  
Jeanette Wing

Educational Data Mining (EDM) is a rich research field in computer science. Tools and techniques in EDM are useful to predict student performance which gives practitioners useful insights to develop appropriate intervention strategies to improve pass rates and increase retention. The performance of the state-of-the-art machine learning classifiers is very much dependent on the task at hand. Investigating support vector machines has been used extensively in classification problems; however, the extant of literature shows a gap in the application of linear support vector machines as a predictor of student performance. The aim of this study was to compare the performance of linear support vector machines with the performance of the state-of-the-art classical machine learning algorithms in order to determine the algorithm that would improve prediction of student performance. In this quantitative study, an experimental research design was used. Experiments were set up using feature selection on a publicly available dataset of 1000 alpha-numeric student records. Linear support vector machines benchmarked with ten categorical machine learning algorithms showed superior performance in predicting student performance. The results of this research showed that features like race, gender, and lunch influence performance in mathematics whilst access to lunch was the primary factor which influences reading and writing performance.


Author(s):  
Prayag Tiwari ◽  
Massimo Melucci

Machine Learning (ML) helps us to recognize patterns from raw data. ML is used in numerous domains i.e. biomedical, agricultural, food technology, etc. Despite recent technological advancements, there is still room for substantial improvement in prediction. Current ML models are based on classical theories of probability and statistics, which can now be replaced by Quantum Theory (QT) with the aim of improving the effectiveness of ML. In this paper, we propose the Binary Classifier Inspired by Quantum Theory (BCIQT) model, which outperforms the state of the art classification in terms of recall for every category.


Sign in / Sign up

Export Citation Format

Share Document