A comparison of distributed machine learning methods for the support of "Many Labs" collaborations in computational modelling of decision making (Preprint)

2021 ◽  
Author(s):  
Lili Zhang ◽  
Himanshu Vashisht ◽  
Andrey Totev ◽  
Nam Trinh ◽  
Tomas Ward

Deep learning models, especially RNN models, are potentially powerful tools for representing the complex learning processes and decision-making strategies used by humans. Such neural network models make fewer assumptions about the underlying mechanisms, thus providing experimental flexibility in terms of applicability. However, this comes at the cost of a larger number of tunable parameters, which in turn require significantly more training data, representative of the problem, for effective learning. This presents practical challenges given that most computational modelling experiments involve relatively small numbers of subjects, which, while adequate for conventional modelling using low-dimensional parameter spaces, leads to sub-optimal model training when adopting deeper neural network approaches. Laboratory collaboration is a natural way of increasing data availability; however, data-sharing barriers among laboratories, as necessitated by data protection regulations, encourage us to seek alternative methods for collaborative data science. Distributed learning, especially federated learning, which supports the preservation of data privacy, is a promising method for addressing this issue. To verify the reliability and feasibility of applying federated learning to train the neural network models used in the characterisation of human decision making, we conducted experiments on a real-world, many-labs data pool including experimentally significant data-sets from ten independent studies. The performance of models trained on single laboratory data-sets was poor, especially for those with small numbers of subjects. This unsurprising finding supports the need for larger and more diverse data-sets to train more generalised and reliable models. To that end, we evaluated four collaborative approaches. The first, conventional centralized data sharing (CL-based), is the optimal approach but requires complete sharing of data, which we wish to avoid. Its results, however, establish a benchmark for the other three distributed approaches: federated learning (FL-based), incremental learning (IL-based), and cyclic incremental learning (CIL-based). We evaluate these approaches in terms of prediction accuracy and capacity to characterise human decision-making strategies in the context of the computational modelling experiments considered here. The results demonstrate that the FL-based model achieves performance closest to that of the centralized data sharing approach. This demonstrates that federated learning has value in scaling data science methods to data collected in computational modelling contexts in circumstances where data sharing is not convenient, practical or permissible.
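
To make the aggregation step concrete, the sketch below shows one communication round of federated averaging (FedAvg) in Python. It is illustrative only: it assumes model weights stored as NumPy arrays and a hypothetical `local_train` function supplied by each laboratory, and it is not the authors' implementation.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Aggregate per-layer weights across clients, weighted by local data-set size."""
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    return [
        sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in range(n_layers)
    ]

def run_round(global_weights, lab_datasets, local_train):
    """One FedAvg round: each lab trains locally on its own data, and only
    the resulting weights (never the raw data) are shared and aggregated."""
    updates, sizes = [], []
    for data in lab_datasets:
        w, n = local_train(global_weights, data)  # hypothetical local trainer
        updates.append(w)
        sizes.append(n)
    return federated_average(updates, sizes)

# Toy demo with two "labs", each holding a single weight matrix.
if __name__ == "__main__":
    w_a, w_b = [np.ones((2, 2))], [np.zeros((2, 2))]
    print(federated_average([w_a, w_b], [30, 10]))  # -> 0.75 everywhere
```

The key privacy property is visible in `run_round`: laboratories exchange only parameter updates, so raw subject-level data never leaves its site.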


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0242923
Author(s):  
P. J. Stephenson ◽  
Carrie Stengel

Many conservation managers, policy makers, businesses and local communities cannot access the biodiversity data they need for informed decision-making on natural resource management. A handful of databases are used to monitor indicators against global biodiversity goals but there is no openly available consolidated list of global data sets to help managers, especially those in high-biodiversity countries. We therefore conducted an inventory of global databases of potential use in monitoring biodiversity states, pressures and conservation responses at multiple levels. We uncovered 145 global data sources, as well as a selection of global data reports, links to which we will make available on an open-access website. We describe trends in data availability and actions needed to improve data sharing. If the conservation and science community made a greater effort to publicise data sources, and make the data openly and freely available for the people who most need it, we might be able to mainstream biodiversity data into decision-making and help stop biodiversity loss.


Author(s):  
N. Bessis ◽  
T. French ◽  
M. Burakova-Lorgnier ◽  
W. Huang

This chapter is about conceptualizing the applicability of grid-related technologies for supporting intelligence in decision-making. It aims to discuss how the Open Grid Services Architecture Data Access and Integration (OGSA-DAI) middleware can facilitate the discovery of, and controlled access to, vast data-sets to assist intelligence in decision-making. Trust is also identified as one of the main challenges for intelligence in decision-making. On this basis, the implications and challenges of using grid technologies to serve this purpose are also discussed. To further explain the concepts and practices associated with the process of intelligence in decision-making using grid technologies, a minicase incorporating a scenario is employed: "Synergy Financial Solutions Ltd" provides the reader with a central and continuous point of reference.


2020 ◽  
Author(s):  
Jordi Bolibar ◽  
Antoine Rabatel ◽  
Isabelle Gouttevin ◽  
Clovis Galiez ◽  
Thomas Condom ◽  
...  

Glacier surface mass balance (SMB) and glacier evolution modelling have traditionally been tackled with physical/empirical methods, and despite some statistical studies very few efforts have been made towards machine learning approaches. Over the past decade, we have witnessed an impressive increase in the available amount of data, mostly coming from remote sensing products and reanalyses, as well as an extensive list of open-source tools and libraries for data science. Here we introduce a first effort to use deep learning (i.e. a deep artificial neural network) to simulate glacier-wide surface mass balance at a regional scale, based on direct and remote sensing SMB data, climate reanalysis and multitemporal glacier inventories. Coupled with a parameterized glacier-specific ice dynamics function, this allows us to simulate the evolution of glaciers for a whole region. This has been developed as the ALpine Parameterized Glacier Model (ALPGM), an open-source Python glacier evolution model. To illustrate this data science approach, we present the results of a glacier-wide surface mass balance reconstruction of all the glaciers in the French Alps from 1967 to 2015. These results were analysed and compared with all the available observations in the region as well as with another physical/empirical SMB reconstruction study. We observe some interesting differences between the two SMB reconstructions, which further highlight the interest of using alternative methods in glacier modelling. Due to (relatively) recent advances in data availability and open tools (e.g. Tensorflow, Keras, Pangeo), this research field is ripe for progress, with many interesting challenges and opportunities lying ahead. To conclude, some perspectives on data science glacier modelling are discussed, based on the limitations of our current approach and on upcoming tools and methods, such as convolutional and physics-informed neural networks.
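
As a rough illustration of the kind of model the abstract describes, the Keras sketch below fits a small feed-forward regression network to tabular predictors (e.g. climate reanalysis and topographic variables per glacier-year) with annual glacier-wide SMB as the target. The shapes, layer sizes and placeholder data are assumptions for illustration, not the actual ALPGM architecture.

```python
import numpy as np
from tensorflow import keras

# Placeholder data: 500 glacier-year samples, 8 predictors each
# (e.g. seasonal temperature, precipitation, elevation, slope).
X = np.random.rand(500, 8).astype("float32")
y = np.random.randn(500, 1).astype("float32")  # glacier-wide SMB (m w.e.)

model = keras.Sequential([
    keras.layers.Dense(40, activation="relu", input_shape=(X.shape[1],)),
    keras.layers.Dense(20, activation="relu"),
    keras.layers.Dense(1),  # regression output: annual glacier-wide SMB
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=50, batch_size=32, validation_split=0.2, verbose=0)
```

In practice, the predictors would come from the climate reanalyses and multitemporal glacier inventories mentioned above rather than random placeholders.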


Computer ◽  
1996 ◽  
Vol 29 (3) ◽  
pp. 64-70 ◽  
Author(s):  
Chew Lim Tan ◽  
Tong Seng Quah ◽  
Hoon Heng Teh

2020 ◽  
Vol 67 (1) ◽  
pp. 83-100 ◽  
Author(s):  
Renze Zhou ◽  
Zhiguo Xing ◽  
Haidou Wang ◽  
Zhongyu Piao ◽  
Yanfei Huang ◽  
...  

Purpose: With the development of deep learning-based analytical techniques, increasing research attention has focused on fatigue data analysis methods based on deep learning, which are gaining in popularity. However, the application of deep neural networks in the material science domain is mainly inhibited by limited data availability. This paper aims to overcome the difficulty of multifactor fatigue life prediction with small data sets.

Design/methodology/approach: A multiple neural network ensemble (MNNE) with a general and flexible explicit function is developed to accurately quantify the complicated relationships hidden in multivariable data sets. Moreover, a variational autoencoder-based data generator is trained with small sample sets to expand the size of the training data set. In addition, a filtering rule based on the R2 score is proposed and applied in the training process of the MNNE, which has a beneficial effect on prediction accuracy and generalization ability.

Findings: A comparative study involving the proposed method and traditional models was performed. The experiments confirm that the use of hybrid data can improve the accuracy and generalization ability of the deep neural network, and that the MNNE outperforms support vector machine, multilayer perceptron and deep neural network models in goodness of fit and robustness in the small-sample case.

Practical implications: The experimental results imply that the proposed algorithm is a sophisticated and promising multivariate method for predicting the contact fatigue life of a coating when data availability is limited.

Originality/value: A variational autoencoder-based generative model is used to compensate for the lack of data, and an MNNE method is proposed for fatigue life prediction in the small-data case.
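
A minimal sketch of what an R2-based filtering rule for ensemble members could look like is given below, assuming scikit-learn style regressors and a held-out validation set. The threshold value and interfaces are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np
from sklearn.metrics import r2_score

def filter_ensemble(members, X_val, y_val, r2_threshold=0.5):
    """Keep only fitted ensemble members whose validation R2 score clears
    a threshold; fall back to the full ensemble if none qualify."""
    kept = [m for m in members if r2_score(y_val, m.predict(X_val)) >= r2_threshold]
    return kept or members

def ensemble_predict(members, X):
    """Average the predictions of the retained members."""
    preds = np.stack([m.predict(X) for m in members])
    return preds.mean(axis=0)
```

Filtering out poorly fitting members before averaging is one plausible way such a rule could improve both accuracy and generalization, as the abstract reports.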


2021 ◽  
Author(s):  
Frederic Jumelle ◽  
Kelvin So ◽  
Didan Deng

In this paper, we introduce a novel model of artificial intelligence, the functional neural network, for the modeling of human decision-making processes. This neural network is composed of multiple artificial neurons racing in the network. Each of these neurons has a similar structure, programmed independently by the users, and is composed of an intention wheel, a motor core and a sensory core representing the user itself and racing at a specific velocity. We discuss the mathematics of the neuron's formulation and the racing mechanism of multiple nodes in the network, and develop the group decision process with fuzzy logic and the transformation of these conceptual methods into practical methods for simulation and operations. Finally, we describe some possible future research directions in the fields of finance, education and medicine, including the opportunity to design an intelligent learning agent with applications in business operations supervision. We believe that this functional neural network has promising potential to transform the way we compute decision-making and to lead to a new generation of neuromorphic chips for seamless human-machine interactions.


Author(s):  
Chandimal Jayawardena ◽  
Keigo Watanabe ◽  
Kiyotaka Izumi

Natural language commands are information-rich and conscious because they are generated by intelligent human beings. Therefore, if it is possible to learn from such commands and reuse that knowledge, it will be very effective and useful. In this chapter, learning from information-rich voice commands for controlling a robot is discussed. First, the new concepts of a fuzzy coach-player system and a sub-coach for robot control with natural language commands are proposed. Then, the characteristics of the subjective human decision-making process and learning from such decisions are discussed. Finally, an experiment conducted with a PA-10 redundant manipulator in order to establish the proposed concept is described. In the experiment, a Probabilistic Neural Network (PNN) is used for learning.
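
A Probabilistic Neural Network is essentially a Parzen-window classifier with one Gaussian kernel per stored training pattern. The sketch below is a generic minimal PNN assuming numeric feature vectors (e.g. encoded from voice commands) and NumPy only; the feature encoding and smoothing parameter are illustrative, not the chapter's setup.

```python
import numpy as np

class PNN:
    """Minimal Probabilistic Neural Network: a Gaussian kernel density
    estimate per class; predicts the class with the highest density."""
    def __init__(self, sigma=0.5):
        self.sigma = sigma  # kernel smoothing parameter

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.patterns_ = {c: X[y == c] for c in self.classes_}
        return self

    def _class_density(self, x, patterns):
        # Sum of Gaussian kernels centred on each stored training pattern.
        d2 = np.sum((patterns - x) ** 2, axis=1)
        return np.mean(np.exp(-d2 / (2 * self.sigma ** 2)))

    def predict(self, X):
        return np.array([
            max(self.classes_, key=lambda c: self._class_density(x, self.patterns_[c]))
            for x in X
        ])
```

Because a PNN stores its training patterns directly, it can learn from a small number of demonstrated commands without iterative weight training, which makes it a natural fit for learning from human instruction.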


2017 ◽  
Author(s):  
Ben Marwick ◽  
Suzanne E Pilaar Birch

How do archaeologists share their research data, if at all? We review what data are, according to current influential definitions, and previous work on the benefits, costs and norms of data sharing in the sciences broadly. To understand data sharing in archaeology, we present the results of three pilot studies: requests for data by email; a review of data availability in published articles; and an analysis of archaeological datasets deposited in repositories. We find that archaeologists are often willing to share, but discipline-wide sharing is patchy and ad hoc. Legislation and mandates are effective at increasing data sharing, but editorial policies at journals lack adequate enforcement. Although most of the data available in repositories are licensed to enable flexible reuse, only a small proportion of the data are stored in structured formats for easy reuse. We present some suggestions for improving the state of data sharing in archaeology, among them a standard for citing data sets to ensure that researchers making their data publicly available receive appropriate credit.


Author(s):  
Brad Morantz ◽  
Thomas Whalen ◽  
G. Peter Zhang

In building a decision support system (DSS), an important component is the modeling of each potential alternative action to predict its consequence. Decision makers and automated decision systems (i.e., model-based DSSs) depend upon quality forecasts to assist in the decision process. The more accurate the forecast, the better the DSS is at helping the decision maker to select the best solution. Forecasting is an important contributor to quality decision making, both in the business world and for engineering problems. Retail stores and wholesale distributors must predict sales in order to know how much inventory to have on hand. Too little can cause lost sales and customer dissatisfaction; too much can cause other inventory problems (e.g., cash flow, ad valorem tax). If the goods are perishable, the excess is almost certainly a financial loss. Values that occur over time, such as the number of cars sold per day, the position of an airplane, or the price of a certain stock, are called "time series." When these values are forecast, the accuracy can vary, depending on the data set and the method. This subject has been greatly discussed in the literature and many methods have been presented. Artificial neural networks (ANN) have been shown to be very effective at prediction. Time series forecasting is based upon the assumption that the underlying causal factors are reflected in the lagged data values. Often, a complete set of the causal factors either is not known or is not available. Predictions are made based upon the theory that whatever has occurred in the near past will continue into the near future; time series forecasting uses past values to predict the future. A slight modification to this concept is the application of recency: what happened more recently is closer to the current situation than the more distant past. The older data still contain knowledge; they are just not as important (or as correct) as the newest information. Things change, life is dynamic, and what used to be may be no more, or may hold to a different extent. Modifying the training algorithm of a neural network forecaster to consider recency has been shown on real economic data sets to reduce residual error by as much as 50%, thereby creating a more accurate model that allows for better decision making.
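
One simple way to realize the recency idea described above is to weight training samples so that older observations contribute less to the loss. The sketch below uses an exponential-decay weighting with a Keras regression model on lagged values; the decay rate, lag count and placeholder data are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np
from tensorflow import keras

def recency_weights(n_samples, decay=0.98):
    """Geometrically decaying sample weights: the newest sample gets 1.0."""
    ages = np.arange(n_samples)[::-1]  # age 0 = most recent observation
    return decay ** ages

# Illustrative lagged time-series data: predict the next value from 5 lags.
X = np.random.rand(300, 5).astype("float32")
y = np.random.rand(300, 1).astype("float32")

model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(5,)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
# sample_weight biases training towards recent observations.
model.fit(X, y, sample_weight=recency_weights(len(X)), epochs=30, verbose=0)
```

Under this scheme the older data still contribute knowledge, as the chapter argues, but the optimizer treats the newest observations as the most informative.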

