scholarly journals Artemis: Automatic Runtime Tuning of Parallel Execution Parameters Using Machine Learning

Author(s):  
Chad Wood ◽  
Giorgis Georgakoudis ◽  
David Beckingsale ◽  
David Poliakoff ◽  
Alfredo Gimenez ◽  
...  
F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 1239
Author(s):  
Shraddha Pai ◽  
Philipp Weber ◽  
Ruth Isserlin ◽  
Hussam Kaka ◽  
Shirley Hui ◽  
...  

Patient classification based on clinical and genomic data will further the goal of precision medicine. Interpretability is of particular relevance for models based on genomic data, where sample sizes are relatively small (in the hundreds), increasing overfitting risk netDx is a machine learning method to integrate multi-modal patient data and build a patient classifier. Patient data are converted into networks of patient similarity, which is intuitive to clinicians who also use patient similarity for medical diagnosis. Features passing selection are integrated, and new patients are assigned to the class with the greatest profile similarity. netDx has excellent performance, outperforming most machine-learning methods in binary cancer survival prediction. It handles missing data – a common problem in real-world data – without requiring imputation. netDx also has excellent interpretability, with native support to group genes into pathways for mechanistic insight into predictive features. The netDx Bioconductor package provides multiple workflows for users to build custom patient classifiers. It provides turnkey functions for one-step predictor generation from multi-modal data, including feature selection over multiple train/test data splits. Workflows offer versatility with custom feature design, choice of similarity metric; speed is improved by parallel execution. Built-in functions and examples allow users to compute model performance metrics such as AUROC, AUPR, and accuracy. netDx uses RCy3 to visualize top-scoring pathways and the final integrated patient network in Cytoscape. Advanced users can build more complex predictor designs with functional building blocks used in the default design. Finally, the netDx Bioconductor package provides a novel workflow for pathway-based patient classification from sparse genetic data.


2021 ◽  
Vol 14 (11) ◽  
pp. 2327-2340
Author(s):  
Side Li ◽  
Arun Kumar

Many applications that use large-scale machine learning (ML) increasingly prefer different models for subgroups (e.g., countries) to improve accuracy, fairness, or other desiderata. We call this emerging popular practice learning over groups , analogizing to GROUP BY in SQL, albeit for ML training instead of SQL aggregates. From the systems standpoint, this practice compounds the already data-intensive workload of ML model selection (e.g., hyperparameter tuning). Often, thousands of models may need to be trained, necessitating high-throughput parallel execution. Alas, most ML systems today focus on training one model at a time or at best, parallelizing hyperparameter tuning. This status quo leads to resource wastage, low throughput, and high runtimes. In this work, we take the first step towards enabling and optimizing learning over groups from the data systems standpoint for three popular classes of ML: linear models, neural networks, and gradient-boosted decision trees. Analytically and empirically, we compare standard approaches to execute this workload today: task-parallelism and data-parallelism. We find neither is universally dominant. We put forth a novel hybrid approach we call grouped learning that avoids redundancy in communications and I/O using a novel form of parallel gradient descent we call Gradient Accumulation Parallelism (GAP). We prototype our ideas into a system we call Kingpin built on top of existing ML tools and the flexible massively-parallel runtime Ray. An extensive empirical evaluation on large ML benchmark datasets shows that Kingpin matches or is 4x to 14x faster than state-of-the-art ML systems, including Ray's native execution and PyTorch DDP.


F1000Research ◽  
2021 ◽  
Vol 9 ◽  
pp. 1239
Author(s):  
Shraddha Pai ◽  
Philipp Weber ◽  
Ruth Isserlin ◽  
Hussam Kaka ◽  
Shirley Hui ◽  
...  

Patient classification based on clinical and genomic data will further the goal of precision medicine. Interpretability is of particular relevance for models based on genomic data, where sample sizes are relatively small (in the hundreds), increasing overfitting risk netDx is a machine learning method to integrate multi-modal patient data and build a patient classifier. Patient data are converted into networks of patient similarity, which is intuitive to clinicians who also use patient similarity for medical diagnosis. Features passing selection are integrated, and new patients are assigned to the class with the greatest profile similarity. netDx has excellent performance, outperforming most machine-learning methods in binary cancer survival prediction. It handles missing data – a common problem in real-world data – without requiring imputation. netDx also has excellent interpretability, with native support to group genes into pathways for mechanistic insight into predictive features. The netDx Bioconductor package provides multiple workflows for users to build custom patient classifiers. It provides turnkey functions for one-step predictor generation from multi-modal data, including feature selection over multiple train/test data splits. Workflows offer versatility with custom feature design, choice of similarity metric; speed is improved by parallel execution. Built-in functions and examples allow users to compute model performance metrics such as AUROC, AUPR, and accuracy. netDx uses RCy3 to visualize top-scoring pathways and the final integrated patient network in Cytoscape. Advanced users can build more complex predictor designs with functional building blocks used in the default design. Finally, the netDx Bioconductor package provides a novel workflow for pathway-based patient classification from sparse genetic data.


Author(s):  
Yurdaer N. Doganata ◽  
Geetika T. Lakshmanan ◽  
Merve Unuvar

Underlying business processes in service management are people intensive and collaborative by nature. We are observing an emerging trend in the service management applications, moving away from rigid process orchestration to leveraging collaboration. Such solutions allow staffers to define their own customized, ad-hoc step flow consisting of the sequence of the activities necessary to handle a service component. These ad-hoc steps introduce uncertainty to the successful completion of a service request. When there is uncertainty, predictive guidance about future outcomes could provide value to the workers handling a time-sensitive service delivery component. Predicting the future outcomes using machine-learning techniques requires effective representation of the process execution traces. This is challenging when process model includes parallel execution flows or repeated executions of some activities. In this chapter, we describe algorithms for training machine learning models when the execution paths include parallel flows and when some activities are repeatedly executed.


2020 ◽  
Vol 43 ◽  
Author(s):  
Myrthe Faber

Abstract Gilead et al. state that abstraction supports mental travel, and that mental travel critically relies on abstraction. I propose an important addition to this theoretical framework, namely that mental travel might also support abstraction. Specifically, I argue that spontaneous mental travel (mind wandering), much like data augmentation in machine learning, provides variability in mental content and context necessary for abstraction.


2020 ◽  
Author(s):  
Mohammed J. Zaki ◽  
Wagner Meira, Jr
Keyword(s):  

2020 ◽  
Author(s):  
Marc Peter Deisenroth ◽  
A. Aldo Faisal ◽  
Cheng Soon Ong
Keyword(s):  

Author(s):  
Lorenza Saitta ◽  
Attilio Giordana ◽  
Antoine Cornuejols

Author(s):  
Shai Shalev-Shwartz ◽  
Shai Ben-David
Keyword(s):  

Sign in / Sign up

Export Citation Format

Share Document