Artemis: Automatic Runtime Tuning of Parallel Execution Parameters Using Machine Learning



Towards an optimized GROUP by abstraction for large-scale machine learning

Proceedings of the VLDB Endowment, Vol 14 (11), pp. 2327-2340, 2021. DOI: 10.14778/3476249.3476284

Author(s): Side Li, Arun Kumar

Keyword(s): Machine Learning, Large Scale, Linear Models, Hybrid Approach, Empirical Evaluation, Parallel Execution, Task Parallelism, Data Systems, Benchmark Datasets, Boosted Decision Trees

Many applications that use large-scale machine learning (ML) increasingly prefer different models for subgroups (e.g., countries) to improve accuracy, fairness, or other desiderata. We call this emerging popular practice learning over groups, analogizing to GROUP BY in SQL, albeit for ML training instead of SQL aggregates. From the systems standpoint, this practice compounds the already data-intensive workload of ML model selection (e.g., hyperparameter tuning). Often, thousands of models may need to be trained, necessitating high-throughput parallel execution. Alas, most ML systems today focus on training one model at a time or, at best, on parallelizing hyperparameter tuning. This status quo leads to resource wastage, low throughput, and high runtimes. In this work, we take the first step towards enabling and optimizing learning over groups from the data systems standpoint for three popular classes of ML: linear models, neural networks, and gradient-boosted decision trees. Analytically and empirically, we compare the two standard approaches to executing this workload today: task parallelism and data parallelism. We find that neither is universally dominant. We put forth a novel hybrid approach we call grouped learning, which avoids redundancy in communication and I/O using a new form of parallel gradient descent we call Gradient Accumulation Parallelism (GAP). We prototype our ideas in a system we call Kingpin, built on top of existing ML tools and the flexible, massively parallel runtime Ray. An extensive empirical evaluation on large ML benchmark datasets shows that Kingpin is as fast as or 4x to 14x faster than state-of-the-art ML systems, including Ray's native execution and PyTorch DDP.
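
To make the task-parallel baseline concrete, here is a minimal Python sketch of learning over groups under stated assumptions: each group gets its own one-feature linear model fit by batch gradient descent, and every group is submitted as an independent task to an executor pool. It does not implement Kingpin, grouped learning, or GAP, whose internals the abstract does not describe; the names `group_by`, `train_linear`, and `learn_over_groups` and the toy data are hypothetical.

```python
"""Hypothetical sketch of task-parallel 'learning over groups'.

Assumptions: one linear model y ~ w*x + b per group, fit independently by
batch gradient descent; each group is one task in an executor pool. This is
the task-parallel baseline, not Kingpin's grouped learning or GAP.
"""
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by(rows):
    """Partition (group_key, x, y) rows by key, analogous to SQL GROUP BY."""
    groups = defaultdict(list)
    for key, x, y in rows:
        groups[key].append((x, y))
    return dict(groups)


def train_linear(item, lr=0.5, epochs=2000):
    """Fit one group's model; `item` is a (group_key, [(x, y), ...]) pair."""
    key, data = item
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error (up to a constant factor of 2).
        gw = sum((w * x + b - y) * x for x, y in data) / n
        gb = sum(w * x + b - y for x, y in data) / n
        w -= lr * gw
        b -= lr * gb
    return key, (w, b)


def learn_over_groups(rows, workers=4, executor_cls=ThreadPoolExecutor):
    """Task parallelism: each group is trained as an independent task."""
    groups = group_by(rows)
    with executor_cls(max_workers=workers) as pool:
        return dict(pool.map(train_linear, groups.items()))


if __name__ == "__main__":
    # Hypothetical toy data: two "countries" with different linear trends.
    rows = [("CA", i / 10, 2 * i / 10 + 1) for i in range(10)]
    rows += [("US", i / 10, -i / 10 + 3) for i in range(10)]
    for key, (w, b) in sorted(learn_over_groups(rows).items()):
        print(key, round(w, 2), round(b, 2))
```

The default `ThreadPoolExecutor` only shows the task structure, since CPython threads do not run this pure-Python loop in parallel; passing `executor_cls=ProcessPoolExecutor` from `concurrent.futures` would give multi-core execution, because the worker function is defined at module level. Since each group is one task, this design cannot spread a single very large group across workers, which is one plausible reason neither task nor data parallelism dominates in every setting.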


netDx: Software for building interpretable patient classifiers by multi-'omic data integration using patient similarity networks

F1000Research, Vol 9, pp. 1239, 2021. DOI: 10.12688/f1000research.26429.2

Author(s): Shraddha Pai, Philipp Weber, Ruth Isserlin, Hussam Kaka, Shirley Hui, ...

Keyword(s): Machine Learning, Performance Metrics, Genomic Data, Building Blocks, Patient Data, Parallel Execution, Survival Prediction, Patient Classification, Bioconductor Package, Real World Data

Patient classification based on clinical and genomic data will further the goal of precision medicine. Interpretability is of particular relevance for models based on genomic data, where sample sizes are relatively small (in the hundreds), increasing overfitting risk. netDx is a machine learning method that integrates multi-modal patient data to build a patient classifier. Patient data are converted into networks of patient similarity, a representation that is intuitive to clinicians, who also use patient similarity for medical diagnosis. Features passing selection are integrated, and new patients are assigned to the class with the greatest profile similarity. netDx has excellent performance, outperforming most machine-learning methods in binary cancer survival prediction. It handles missing data, a common problem in real-world data, without requiring imputation. netDx also has excellent interpretability, with native support for grouping genes into pathways for mechanistic insight into predictive features. The netDx Bioconductor package provides multiple workflows for users to build custom patient classifiers. It provides turnkey functions for one-step predictor generation from multi-modal data, including feature selection over multiple train/test data splits. Workflows offer versatility through custom feature design and choice of similarity metric, and speed is improved by parallel execution. Built-in functions and examples allow users to compute model performance metrics such as AUROC, AUPR, and accuracy. netDx uses RCy3 to visualize top-scoring pathways and the final integrated patient network in Cytoscape. Advanced users can build more complex predictor designs with the functional building blocks used in the default design. Finally, the netDx Bioconductor package provides a novel workflow for pathway-based patient classification from sparse genetic data.
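
The classification rule described above (assign a new patient to the class with the greatest profile similarity, without imputing missing values) can be illustrated with a small Python sketch. It is not netDx's implementation, which is an R/Bioconductor package: the similarity function, the simple averaging used to integrate networks, the feature groups, and all names and data below are illustrative assumptions.

```python
"""Hypothetical sketch: classify a patient by similarity to each class.

Not netDx's implementation. Assumptions: each feature group (e.g. a clinical
block or the genes of one pathway) defines one patient similarity network;
networks are combined by simple averaging; None marks a missing value.
"""
from statistics import mean


def group_similarity(a, b, cols):
    """Similarity of two patients on one feature group, skipping missing values."""
    shared = [(a[i], b[i]) for i in cols if a[i] is not None and b[i] is not None]
    if not shared:
        return None  # no overlapping measurements: no edge in this network
    msd = sum((x - y) ** 2 for x, y in shared) / len(shared)
    return 1.0 / (1.0 + msd)  # 1.0 for identical profiles, toward 0 as they diverge


def integrated_similarity(a, b, groups):
    """Average similarity over the networks in which both patients have data."""
    sims = [group_similarity(a, b, cols) for cols in groups.values()]
    sims = [s for s in sims if s is not None]
    return mean(sims) if sims else 0.0


def classify(new, train, labels, groups):
    """Assign `new` to the class with the greatest mean integrated similarity."""
    scores = {}
    for profile, label in zip(train, labels):
        scores.setdefault(label, []).append(integrated_similarity(new, profile, groups))
    return max(scores, key=lambda label: mean(scores[label]))


if __name__ == "__main__":
    groups = {"clinical": [0, 1], "pathway_A": [2, 3]}  # hypothetical feature groups
    train = [[1.0, 0.0, 5.0, 5.0], [1.1, None, 5.2, 4.9],
             [3.0, 2.0, 0.0, 0.5], [2.9, 2.1, None, 0.4]]
    labels = ["good", "good", "poor", "poor"]
    print(classify([1.05, 0.1, None, 5.1], train, labels, groups))  # prints: good
```

Missing values simply drop out of the similarity computation, so the new patient with no value in column 2 is still scored. In netDx itself, feature selection decides which networks are integrated; this sketch integrates all of them.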


Predictive Analytics for Business Processes in Service Management

Maximizing Management Performance and Quality with Service Analytics - Advances in Logistics, Operations, and Management Science, pp. 366-403, 2015. DOI: 10.4018/978-1-4666-8496-6.ch013

Author(s): Yurdaer N. Doganata, Geetika T. Lakshmanan, Merve Unuvar

Keyword(s): Machine Learning, Process Model, Service Management, Business Processes, Ad Hoc, Predictive Analytics, Parallel Execution, Machine Learning Techniques, Parallel Flows, Future Outcomes

Underlying business processes in service management are people-intensive and collaborative by nature. We are observing an emerging trend in service management applications: a move away from rigid process orchestration toward leveraging collaboration. Such solutions allow staff to define their own customized, ad hoc flow of steps, that is, the sequence of activities necessary to handle a service component. These ad hoc steps introduce uncertainty into the successful completion of a service request. When there is uncertainty, predictive guidance about future outcomes could provide value to the workers handling a time-sensitive service delivery component. Predicting future outcomes using machine-learning techniques requires an effective representation of process execution traces. This is challenging when the process model includes parallel execution flows or repeated executions of some activities. In this chapter, we describe algorithms for training machine learning models when the execution paths include parallel flows and when some activities are repeatedly executed.
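
One simple way to represent traces that contain parallel flows and repeated activities is sketched below in Python. It is an assumption for illustration, not the algorithm described in the chapter: each trace becomes a vector of per-activity execution counts, so any interleaving of parallel branches yields the same vector, while a repeated activity raises its count. The activity names and the helpers `encode_trace` and `encode_with_repeats` are hypothetical.

```python
"""Hypothetical sketch: order-insensitive encoding of process execution traces.

Assumption (not the chapter's algorithm): a trace is a list of activity names,
encoded as per-activity execution counts over a fixed vocabulary. Different
interleavings of parallel branches give the same vector; repeated executions
of an activity raise its count.
"""
from collections import Counter


def encode_trace(trace, vocabulary):
    """Map a trace to [count(activity) for activity in vocabulary]."""
    counts = Counter(trace)
    unknown = set(counts) - set(vocabulary)
    if unknown:
        raise ValueError(f"activities not in vocabulary: {sorted(unknown)}")
    return [counts[a] for a in vocabulary]


def encode_with_repeats(trace, vocabulary):
    """Counts plus one flag: 1 if any activity was executed more than once."""
    vec = encode_trace(trace, vocabulary)
    return vec + [int(any(c > 1 for c in vec))]


if __name__ == "__main__":
    vocab = ["open", "assess", "notify", "fix", "close"]
    # 'assess' and 'notify' run in parallel, so either order may appear in a log.
    t1 = ["open", "assess", "notify", "fix", "close"]
    t2 = ["open", "notify", "assess", "fix", "close"]
    t3 = ["open", "assess", "notify", "fix", "fix", "close"]  # 'fix' repeated
    print(encode_with_repeats(t1, vocab))  # [1, 1, 1, 1, 1, 0]
    print(encode_with_repeats(t2, vocab))  # [1, 1, 1, 1, 1, 0]
    print(encode_with_repeats(t3, vocab))  # [1, 1, 1, 2, 1, 1]
```

The trade-off is that counts also discard the order of genuinely sequential steps; a richer encoding might keep ordered activity pairs for the sequential parts of the model while treating parallel branches as unordered.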


Mind wandering as data augmentation: How mental travel supports abstraction

Behavioral and Brain Sciences, Vol 43, 2020. DOI: 10.1017/s0140525x1900311x

Author(s): Myrthe Faber

Keyword(s): Machine Learning, Data Augmentation, Mental Content, Mind Wandering, Theoretical Framework, Important Addition

Gilead et al. state that abstraction supports mental travel, and that mental travel critically relies on abstraction. I propose an important addition to this theoretical framework, namely, that mental travel might also support abstraction. Specifically, I argue that spontaneous mental travel (mind wandering), much like data augmentation in machine learning, provides the variability in mental content and context that is necessary for abstraction.
