Insider Collusion Attack on Distributed Machine Learning System and its Solutions - A Case of SVM

Author(s):  
Peter Shaojui Wang
2021 ◽  
Vol 14 (8) ◽  
pp. 1338-1350
Author(s):  
Binhang Yuan ◽  
Dimitrije Jankov ◽  
Jia Zou ◽  
Yuxin Tang ◽  
Daniel Bourgeois ◽  
...  

We consider the question: what is the abstraction that should be implemented by the computational engine of a machine learning system? Current machine learning systems typically push whole tensors through a series of compute kernels such as matrix multiplications or activation functions, where each kernel runs on an AI accelerator (ASIC) such as a GPU. This implementation abstraction provides little built-in support for ML systems to scale past a single machine, or for handling large models with matrices or tensors that do not easily fit into the RAM of an ASIC. In this paper, we present an alternative implementation abstraction called the tensor relational algebra (TRA). The TRA is a set-based algebra based on the relational algebra. Expressions in the TRA operate over binary tensor relations, where keys are multi-dimensional arrays and values are tensors. The TRA is easily executed with high efficiency in a parallel or distributed environment, and amenable to automatic optimization. Our empirical study shows that the optimized TRA-based back-end can significantly outperform alternatives for running ML workflows in distributed clusters.
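The idea of a tensor relation can be illustrated with a small sketch. It is not the paper's TRA implementation: it assumes a relation is a Python dict mapping a block-index tuple to a small dense block, and the names `matmul`, `add_blocks`, and `blocked_matmul` are hypothetical. It shows a blocked matrix product expressed as a join on the shared block index followed by a per-key aggregation.

```python
from collections import defaultdict

def matmul(a, b):
    """Dense product of two small matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_blocks(a, b):
    """Element-wise sum of two equally shaped blocks."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def blocked_matmul(rel_a, rel_b):
    """Join on the inner block index, multiply blocks, then sum per output key."""
    partial = defaultdict(list)
    for (i, k), block_a in rel_a.items():
        for (k2, j), block_b in rel_b.items():
            if k == k2:  # join predicate: inner block indices must match
                partial[(i, j)].append(matmul(block_a, block_b))
    result = {}
    for key, blocks in partial.items():
        total = blocks[0]
        for blk in blocks[1:]:  # aggregation: sum partial products per key
            total = add_blocks(total, blk)
        result[key] = total
    return result

# Toy input (assumption): two 2x2 matrices, each split into four 1x1 blocks.
rel_a = {(0, 0): [[1]], (0, 1): [[2]], (1, 0): [[3]], (1, 1): [[4]]}
rel_b = {(0, 0): [[5]], (0, 1): [[6]], (1, 0): [[7]], (1, 1): [[8]]}
product = blocked_matmul(rel_a, rel_b)
```

Because every joined pair of blocks can be multiplied independently, a layout like this suggests why such expressions lend themselves to parallel or distributed execution.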


2021 ◽  
Vol 51 (4) ◽  
pp. 75-81
Author(s):  
Ahad Mirza Baig ◽  
Alkida Balliu ◽  
Peter Davies ◽  
Michal Dory

Rachid Guerraoui was the first keynote speaker, and he got things off to a great start by discussing the broad relevance of the research done in our community relative to both industry and academia. He first argued that, in some sense, the fact that distributed computing is so pervasive nowadays could end up stifling progress in our community by inducing people to work on marginal problems, and becoming isolated. His first suggestion was to try to understand and incorporate new ideas coming from applied fields into our research, and he argued that this has been historically very successful. He illustrated this point via the distributed payment problem, which appears in the context of blockchains, in particular Bitcoin, but then turned out to be very theoretically interesting; furthermore, the theoretical understanding of the problem inspired new practical protocols. He then went further to discuss new directions in distributed computing, such as the COVID tracing problem, and new challenges in Byzantine-resilient distributed machine learning. Another source of innovation Rachid suggested was hardware innovations, which he illustrated with work studying the impact of RDMA-based primitives on fundamental problems in distributed computing. The talk concluded with a very lively discussion.


2021 ◽  
Author(s):  
Junjie Shi ◽  
Jiang Bian ◽  
Jakob Richter ◽  
Kuan-Hsun Chen ◽  
Jörg Rahnenführer ◽  
...  

Abstract The predictive performance of a machine learning model highly depends on the corresponding hyper-parameter setting. Hence, hyper-parameter tuning is often indispensable. Normally such tuning requires the dedicated machine learning model to be trained and evaluated on centralized data to obtain a performance estimate. However, in a distributed machine learning scenario, it is not always possible to collect all the data from all nodes due to privacy concerns or storage limitations. Moreover, if data has to be transferred through low-bandwidth connections, it reduces the time available for tuning. Model-Based Optimization (MBO) is one state-of-the-art method for tuning hyper-parameters, but its application to distributed machine learning models or federated learning has received little research attention. This work proposes a framework, MODES, that allows MBO to be deployed on resource-constrained distributed embedded systems. Each node trains an individual model based on its local data. The goal is to optimize the combined prediction accuracy. The presented framework offers two optimization modes: (1) MODES-B considers the whole ensemble as a single black box and optimizes the hyper-parameters of each individual model jointly, and (2) MODES-I considers all models as clones of the same black box, which allows it to efficiently parallelize the optimization in a distributed setting. We evaluate MODES by conducting experiments on the optimization of the hyper-parameters of a random forest and a multi-layer perceptron. The experimental results demonstrate that, with an improvement in terms of mean accuracy (MODES-B), run-time efficiency (MODES-I), and statistical stability for both modes, MODES outperforms the baseline, i.e., carrying out tuning with MBO on each node individually with its local sub-data set.


2021 ◽  
pp. 1098612X2110012
Author(s):  
Jade Renard ◽  
Mathieu R Faucher ◽  
Anaïs Combes ◽  
Didier Concordet ◽  
Brice S Reynolds

Objectives The aim of this study was to develop an algorithm capable of predicting short- and medium-term survival in cases of intrinsic acute-on-chronic kidney disease (ACKD) in cats. Methods The medical record database was searched to identify cats hospitalised for acute clinical signs and azotaemia of at least 48 h duration and diagnosed to have underlying chronic kidney disease based on ultrasonographic renal abnormalities or previously documented azotaemia. Cases with postrenal azotaemia, exposure to nephrotoxicants, feline infectious peritonitis or neoplasia were excluded. Clinical variables were combined in a clinical severity score (CSS). Clinicopathological and ultrasonographic variables were also collected. The following variables were tested as inputs in a machine learning system: age, body weight (BW), CSS, identification of small kidneys or nephroliths by ultrasonography, serum creatinine at 48 h (Crea48), spontaneous feeding at 48 h (SpF48) and aetiology. Outputs were outcomes at 7, 30, 90 and 180 days. The machine-learning system was trained to develop decision tree algorithms capable of predicting outputs from inputs. Finally, the diagnostic performance of the algorithms was calculated. Results Crea48 was the best predictor of survival at 7 days (threshold 1043 µmol/l, sensitivity 0.96, specificity 0.53), 30 days (threshold 566 µmol/l, sensitivity 0.70, specificity 0.89) and 90 days (threshold 566 µmol/l, sensitivity 0.76, specificity 0.80), with fewer cats still alive when their Crea48 was above these thresholds. A short decision tree, including age and Crea48, predicted the 180-day outcome best. When Crea48 was excluded from the analysis, the generated decision trees included CSS, age, BW, SpF48 and identification of small kidneys with an overall diagnostic performance similar to that using Crea48. Conclusions and relevance Crea48 helps predict short- and medium-term survival in cats with ACKD. 
Secondary variables that helped predict outcomes were age, CSS, BW, SpF48 and identification of small kidneys.
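The reported sensitivity and specificity at a creatinine threshold follow from a simple confusion-matrix count, which can be sketched as below. This is an illustration only: the function name `sensitivity_specificity` is hypothetical, the six data points are invented toy values rather than study data, and treating non-survival as the positive class is an assumption. Only the 566 µmol/l threshold comes from the abstract.

```python
def sensitivity_specificity(values, died, threshold):
    """Score the rule 'predict non-survival when value > threshold'.

    Assumption: non-survival is the positive class.
    """
    tp = fn = tn = fp = 0
    for value, outcome_died in zip(values, died):
        predicted_died = value > threshold
        if outcome_died and predicted_died:
            tp += 1
        elif outcome_died and not predicted_died:
            fn += 1
        elif not outcome_died and predicted_died:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)  # share of non-survivors flagged by the rule
    specificity = tn / (tn + fp)  # share of survivors not flagged by the rule
    return sensitivity, specificity

# Invented toy values (not from the study): Crea48 in µmol/l and outcomes.
crea48 = [300, 450, 600, 700, 900, 1200]
died = [False, True, True, False, True, True]
sens, spec = sensitivity_specificity(crea48, died, threshold=566)
```

Raising the threshold flags fewer cats, which generally trades sensitivity for specificity.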


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Qingsong Xi ◽  
Qiyu Yang ◽  
Meng Wang ◽  
Bo Huang ◽  
Bo Zhang ◽  
...  

Abstract Background Significant efforts have been made to minimize the rate of in vitro fertilization (IVF)-associated multiple-embryo gestation. Previous studies related to machine learning in IVF mainly focused on selecting the top-quality embryos to improve outcomes; however, in patients with sub-optimal prognosis or with medium- or inferior-quality embryos, the selection between single embryo transfer (SET) and double embryo transfer (DET) could be perplexing. Methods This was an application study including 9211 patients with 10,076 embryos treated from 2016 to 2018 in Tongji Hospital, Wuhan, China. A hierarchical model was established using the machine learning system XGBoost to learn embryo implantation potential and the impact of DET simultaneously. The performance of the model was evaluated with the AUC of the ROC curve. Multiple regression analyses were also conducted on the 19 selected features to demonstrate the differences between feature importance for prediction and statistical relationship with outcomes. Results For SET pregnancy, the following variables remained significant: age, attempts at IVF, estradiol level on hCG day, and endometrial thickness. For DET pregnancy, age, attempts at IVF, endometrial thickness, and the newly added P1 + P2 remained significant. For DET twin risk, age, attempts at IVF, 2PN/MII, and P1 × P2 remained significant. The algorithm was repeated 30 times, and average AUCs of 0.7945, 0.8385, and 0.7229 were achieved for SET pregnancy, DET pregnancy, and DET twin risk, respectively. The trends of predicted and observed rates for both pregnancy and twin risk were essentially identical. XGBoost outperformed the other two algorithms: logistic regression and classification and regression tree.
Conclusion Artificial intelligence based on determinant-weighting analysis could offer an individualized embryo selection strategy for any given patient, and predict clinical pregnancy rate and twin risk, therefore optimizing clinical outcomes.
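For readers unfamiliar with the metric, the AUC of the ROC curve and its average over repeated runs can be sketched as follows. This is a minimal illustration, not the study's evaluation code: `roc_auc` and `mean_auc` are hypothetical names, the two runs use invented toy labels and scores, and the AUC is computed through the pairwise-ranking formulation (the probability that a positive case scores higher than a negative one, with ties counted as one half).

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

def mean_auc(runs):
    """Average AUC over repeated runs, each given as (labels, scores)."""
    return sum(roc_auc(labels, scores) for labels, scores in runs) / len(runs)

# Invented toy runs (not study data).
runs = [
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.6]),
]
average = mean_auc(runs)
```

Averaging over repetitions, as the abstract describes for its 30 runs, reduces the influence of any single train/test split on the reported figure.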

