Coded Computing: Mitigating Fundamental Bottlenecks in Large-Scale Distributed Computing and Machine Learning

Author(s):  
Songze Li ◽  
Salman Avestimehr

2017 ◽  
Vol 4 (4) ◽  
pp. 627-651 ◽  
Author(s):  
Jun Zhu ◽  
Jianfei Chen ◽  
Wenbo Hu ◽  
Bo Zhang

Abstract: The explosive growth in data volume and the availability of cheap computing resources have sparked increasing interest in Big learning, an emerging subfield that studies scalable machine learning algorithms, systems and applications with Big Data. Bayesian methods represent one important class of statistical methods for machine learning, with substantial recent developments on adaptive, flexible and scalable Bayesian learning. This article provides a survey of the recent advances in Big learning with Bayesian methods, termed Big Bayesian Learning, including non-parametric Bayesian methods for adaptively inferring model complexity, regularized Bayesian inference for improving the flexibility via posterior regularization, and scalable algorithms and systems based on stochastic subsampling and distributed computing for dealing with large-scale applications. We also provide various new perspectives on the large-scale Bayesian modeling and inference.
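To make the stochastic-subsampling idea concrete, here is a minimal sketch (not taken from the article) of stochastic gradient Langevin dynamics for a Bayesian logistic-regression posterior: each update uses only a mini-batch of the data, rescaled into an unbiased full-data gradient estimate, plus injected Gaussian noise. The synthetic data, step size, batch size, and burn-in length are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch of stochastic gradient Langevin dynamics (SGLD):
# approximate posterior samples for Bayesian logistic regression using
# mini-batch (subsampled) gradients. All data and settings are synthetic.

rng = np.random.default_rng(0)
N, d = 10_000, 5                        # dataset size and feature dimension
X = rng.normal(size=(N, d))
w_true = rng.normal(size=d)
y = (rng.random(N) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

def grad_log_posterior(w, xb, yb, scale):
    """Stochastic gradient of the log posterior: N(0, I) prior plus a rescaled mini-batch likelihood."""
    p = 1.0 / (1.0 + np.exp(-xb @ w))
    return -w + scale * (xb.T @ (yb - p))

w = np.zeros(d)
batch_size, step, samples = 100, 1e-4, []
for t in range(5_000):
    idx = rng.choice(N, size=batch_size, replace=False)          # subsample the data
    g = grad_log_posterior(w, X[idx], y[idx], N / batch_size)    # unbiased full-data gradient estimate
    w = w + 0.5 * step * g + np.sqrt(step) * rng.normal(size=d)  # Langevin update with injected noise
    if t >= 2_000:                                               # keep samples after burn-in
        samples.append(w.copy())

print("posterior mean estimate:", np.round(np.mean(samples, axis=0), 3))
print("true weights:           ", np.round(w_true, 3))
```

Because each step touches only a mini-batch, the per-iteration cost is independent of the full dataset size, which is the property that makes this family of samplers attractive for the large-scale settings the survey discusses.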


2019 ◽  
Vol 214 ◽  
pp. 00001
Author(s):  
Alessandra Forti ◽  
Latchezar Betev ◽  
Maarten Litmaath ◽  
Oxana Smirnova ◽  
Petya Vasileva ◽  
...  

The 23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP) took place in the National Palace of Culture, Sofia, Bulgaria, from 9 to 13 July 2018. A total of 575 participants joined the plenary and the eight parallel sessions dedicated to: online computing; offline computing; distributed computing; data handling; software development; machine learning and physics analysis; clouds, virtualisation and containers; networks and facilities. The conference hosted 35 plenary presentations, 323 parallel presentations and 188 posters.


2020 ◽  
Vol 3 (1) ◽  
Author(s):  
Ji Hwan Park ◽  
Han Eol Cho ◽  
Jong Hun Kim ◽  
Melanie M. Wall ◽  
Yaakov Stern ◽  
...  

2017 ◽  
Vol 3 (1) ◽  
Author(s):  
Giorgos Borboudakis ◽  
Taxiarchis Stergiannakos ◽  
Maria Frysali ◽  
Emmanuel Klontzas ◽  
Ioannis Tsamardinos ◽  
...  

2020 ◽  
Vol 15 (8) ◽  
pp. 084051 ◽  
Author(s):  
Puyu Feng ◽  
Bin Wang ◽  
De Li Liu ◽  
Fei Ji ◽  
Xiaoli Niu ◽  
...  

2020 ◽  
Vol 142 (8) ◽  
pp. 3814-3822 ◽  
Author(s):  
George S. Fanourgakis ◽  
Konstantinos Gkagkas ◽  
Emmanuel Tylianakis ◽  
George E. Froudakis

2019 ◽  
Vol 3 (s1) ◽  
pp. 2-2
Author(s):  
Megan C Hollister ◽  
Jeffrey D. Blume

OBJECTIVES/SPECIFIC AIMS: To examine and compare the claims in Bzdok, Altman, and Krzywinski under a broader set of conditions by using unbiased methods of comparison. To explore how to accurately use various machine learning and traditional statistical methods in large-scale translational research by estimating their accuracy statistics, and then to identify the methods with the best performance characteristics.

METHODS/STUDY POPULATION: We conducted a simulation study with microarray gene expression data, maintaining the original structure proposed by Bzdok, Altman, and Krzywinski. The simulated data comprise a total of 40 genes from 20 people, of whom 10 are phenotype positive and 10 are phenotype negative. To create a detectable statistical difference, 25% of the genes were set to be dysregulated across phenotype; this dysregulation forced the positive and negative phenotypes to have different mean population expressions. Additional variance was included to simulate genetic variation across the population. We also allowed for within-person correlation across genes, which was not done in the original simulations. The following methods were used to determine the number of dysregulated genes in the simulated data set: unadjusted p-values, Benjamini-Hochberg adjusted p-values, Bonferroni adjusted p-values, random forest importance levels, neural net prediction weights, and second-generation p-values.

RESULTS/ANTICIPATED RESULTS: Results vary depending on whether a pre-specified significance level is used or the top 10 ranked values are taken. When all methods are given the same prior information of 10 dysregulated genes, the Benjamini-Hochberg adjusted p-values and the second-generation p-values generally outperform all other methods. We were not able to reproduce or validate the finding that random forest importance levels via a machine learning algorithm outperform classical methods. Almost uniformly, the machine learning methods did not yield improved accuracy statistics, and they depended heavily on the a priori chosen number of dysregulated genes.

DISCUSSION/SIGNIFICANCE OF IMPACT: In this context, machine learning methods do not outperform standard methods. Because of this and their additional complexity, machine learning approaches would not be preferable. Of all the approaches, the second-generation p-value appears to offer a significant benefit at the cost of defining, a priori, a region of trivially null effect sizes. The choice of an analysis method for large-scale translational data is critical to the success of any statistical investigation, and our simulations clearly highlight the various tradeoffs among the available methods.
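As a rough, hypothetical sketch of the kind of comparison described above (the dimensions follow the abstract: 20 subjects, 10 per phenotype, 40 genes, 10 dysregulated; the effect size, t-test screen, and scikit-learn random forest settings are assumptions, and the second-generation p-value and neural-net routes are omitted), one simulated replicate might look like this:

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestClassifier

# One simulated replicate of the comparison sketched in the abstract:
# 20 subjects (10 per phenotype), 40 genes, the first 10 dysregulated.
# Effect size and model settings here are illustrative assumptions.
rng = np.random.default_rng(1)
n_per_group, n_genes, n_dys = 10, 40, 10
effect = 1.5                                    # assumed mean shift for dysregulated genes

neg = rng.normal(0.0, 1.0, size=(n_per_group, n_genes))
pos = rng.normal(0.0, 1.0, size=(n_per_group, n_genes))
pos[:, :n_dys] += effect                        # dysregulation across phenotype
X = np.vstack([neg, pos])
y = np.array([0] * n_per_group + [1] * n_per_group)

# Classical route: per-gene t-tests with Benjamini-Hochberg adjustment at alpha = 0.05.
_, pvals = stats.ttest_ind(X[y == 1], X[y == 0], axis=0)
order = np.argsort(pvals)
bh_cutoffs = 0.05 * np.arange(1, n_genes + 1) / n_genes
n_sig = np.max(np.where(pvals[order] <= bh_cutoffs)[0] + 1, initial=0)
bh_hits = set(order[:n_sig])

# Machine-learning route: random forest importances, taking the 10 top-ranked genes.
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
rf_hits = set(np.argsort(rf.feature_importances_)[::-1][:n_dys])

truth = set(range(n_dys))
print("BH-adjusted t-tests found   ", len(bh_hits & truth), "of", n_dys, "dysregulated genes")
print("RF importance (top 10) found", len(rf_hits & truth), "of", n_dys, "dysregulated genes")
```

In a full study of the kind described, this replicate would be repeated over many simulated data sets, the within-person correlation mentioned in the abstract would be added to the simulated covariance, and accuracy statistics would be aggregated across replicates.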


2019 ◽  
Vol 10 (2) ◽  
pp. 226-250 ◽  
Author(s):  
Tony Cox

Abstract: Managing large-scale, geographically distributed, and long-term risks arising from diverse underlying causes – ranging from poverty to underinvestment in protecting against natural hazards or failures of sociotechnical, economic, and financial systems – poses formidable challenges for any theory of effective social decision-making. Participants may have different and rapidly evolving local information and goals, perceive different opportunities and urgencies for actions, and be differently aware of how their actions affect each other through side effects and externalities. Six decades ago, political economist Charles Lindblom viewed “rational-comprehensive decision-making” as utterly impracticable for such realistically complex situations. Instead, he advocated incremental learning and improvement, or “muddling through,” as both a positive and a normative theory of bureaucratic decision-making when costs and benefits are highly uncertain. But sparse, delayed, uncertain, and incomplete feedback undermines the effectiveness of collective learning while muddling through, even if all participant incentives are aligned; it is no panacea. We consider how recent insights from machine learning – especially deep multiagent reinforcement learning – formalize aspects of muddling through and suggest principles for improving human organizational decision-making. Deep learning principles adapted for human use can not only help participants in different levels of government or control hierarchies manage some large-scale distributed risks, but also show how rational-comprehensive decision analysis and incremental learning and improvement can be reconciled and synthesized.
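As a toy illustration of how multiagent reinforcement learning formalizes incremental "muddling through" (this sketch is not from the article; the coordination game, epsilon-greedy rule, and learning rate are assumptions), the two independent learners below improve a shared outcome purely from sparse, local reward feedback:

```python
import numpy as np

# Toy illustration (not from the article): two independent Q-learning agents
# "muddle through" a repeated coordination game, each incrementally updating
# its own action values from sparse, local reward feedback.
rng = np.random.default_rng(2)
n_actions, epsilon, alpha = 2, 0.1, 0.1
Q = [np.zeros(n_actions), np.zeros(n_actions)]    # one value table per agent

def reward(a0, a1):
    """Both agents receive +1 only when they coordinate on action 1."""
    return 1.0 if (a0 == 1 and a1 == 1) else 0.0

for step in range(5_000):
    actions = []
    for q in Q:                                   # epsilon-greedy choice from local values only
        if rng.random() < epsilon:
            actions.append(int(rng.integers(n_actions)))
        else:
            actions.append(int(np.argmax(q)))
    r = reward(*actions)
    for q, a in zip(Q, actions):                  # incremental (stateless) Q-learning update
        q[a] += alpha * (r - q[a])

print("agent 0 action values:", np.round(Q[0], 3))
print("agent 1 action values:", np.round(Q[1], 3))
```

Neither agent sees the other's values or a global model of the game; coordination emerges, when it does, from repeated small adjustments to local estimates, which is the incrementalist dynamic the article connects to Lindblom's account.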

