Data-Driven Energy Use Estimation in Large Scale Transportation Networks

Author(s):  
Bin Wang ◽  
Cy Chan ◽  
Divya Somasi ◽  
Jane Macfarlane ◽  
Eric Rask


Author(s):  
Mark Endrei ◽  
Chao Jin ◽  
Minh Ngoc Dinh ◽  
David Abramson ◽  
Heidi Poxon ◽  
...  

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload, and their effect on performance and energy efficiency, are typically difficult for application users to assess and control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that require only a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH). We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
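
As a rough illustration of the approach, the sketch below fits simple quadratic regression models for runtime and energy from twelve measured runs over two user-controllable parameters, then extracts the Pareto front of the predictions. The parameter names, the synthetic cost surfaces, and the quadratic model are assumptions for illustration, not the paper's actual models.

```python
# Sketch: predict runtime/energy over a parameter grid from 12 runs,
# then extract the Pareto front. Illustrative assumptions throughout.
import numpy as np
from itertools import product
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

def measure(freq, threads):
    # Stand-in for a real benchmark run (hypothetical cost surfaces).
    runtime = 100.0 / (freq * threads**0.8) * (1 + rng.normal(0, 0.02))
    power = 20.0 + 5.0 * threads * freq**2
    return runtime, runtime * power  # (seconds, joules)

# Twelve training runs over user-controllable knobs: clock frequency, threads.
train = list(product([1.2, 1.8, 2.4], [4, 8, 16, 32]))  # 3 x 4 = 12 runs
obs = np.array([measure(f, t) for f, t in train])
X = np.array(train)

model_t = make_pipeline(PolynomialFeatures(2), LinearRegression()).fit(X, obs[:, 0])
model_e = make_pipeline(PolynomialFeatures(2), LinearRegression()).fit(X, obs[:, 1])

# Predict both objectives on a dense grid; keep the non-dominated points.
grid = np.array(list(product(np.linspace(1.2, 2.4, 13), range(4, 33, 2))))
pred = np.column_stack([model_t.predict(grid), model_e.predict(grid)])
pareto = [i for i, p in enumerate(pred)
          if not np.any(np.all(pred <= p, axis=1) & np.any(pred < p, axis=1))]
for f, t in grid[pareto][:5]:
    print(f"freq={f:.1f} GHz, threads={int(t)}")
```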


Algorithms ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 154
Author(s):  
Marcus Walldén ◽  
Masao Okita ◽  
Fumihiko Ino ◽  
Dimitris Drikakis ◽  
Ioannis Kokkinakis

The increasing processing capabilities of supercomputers, combined with their input/output constraints, have increased the use of co-processing approaches, i.e., visualizing and analyzing simulation data sets on the fly. We present a method that evaluates the importance of different regions of simulation data and a data-driven approach that uses the proposed method to accelerate in-transit co-processing of large-scale simulations. We use the importance metrics to simultaneously employ multiple compression methods on different data regions to accelerate the in-transit co-processing. Our approach strives to adaptively compress data on the fly and uses load balancing to counteract memory imbalances. We demonstrate the method’s efficiency through a fluid mechanics application, a Richtmyer–Meshkov instability simulation, showing how to accelerate the in-transit co-processing of simulations. The results show that the proposed method can quickly identify regions of interest, even when using multiple metrics. Our approach achieved a speedup of 1.29× in a lossless scenario. The data decompression time was sped up by 2× compared to using a single compression method uniformly.
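
A minimal sketch of the core idea follows: score fixed-size blocks of a field with an importance metric, then compress each block with a method chosen by its score. The gradient-magnitude metric, the threshold, and the use of zlib (with float16 quantization as a lossy stand-in for unimportant blocks) are illustrative assumptions, not the paper's compressors.

```python
# Sketch: per-region importance scoring driving adaptive compression.
import numpy as np
import zlib

rng = np.random.default_rng(1)
field = 0.1 * rng.standard_normal((64, 64, 64)).astype(np.float32)
field[:, :, 40:] += 1.0  # sharp interface, e.g., a shock-like front

def block_importance(block):
    # One possible metric: mean gradient magnitude, so regions with
    # sharp features (shock fronts, interfaces) score high.
    g = np.gradient(block)
    return float(np.mean(np.sqrt(sum(gi**2 for gi in g))))

B = 16  # block edge length
packets = []
for i, j, k in np.ndindex(*(s // B for s in field.shape)):
    blk = field[i*B:(i+1)*B, j*B:(j+1)*B, k*B:(k+1)*B]
    if block_importance(blk) > 0.14:   # important: keep lossless
        payload = zlib.compress(blk.tobytes())
    else:                              # unimportant: lossy stand-in
        payload = zlib.compress(blk.astype(np.float16).tobytes())
    packets.append(((i, j, k), payload))

ratio = sum(len(p) for _, p in packets) / field.nbytes
print(f"{len(packets)} blocks compressed to {ratio:.0%} of raw size")
```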


Author(s):  
Ekaterina Kochmar ◽  
Dung Do Vu ◽  
Robert Belfer ◽  
Varun Gupta ◽  
Iulian Vlad Serban ◽  
...  

Intelligent tutoring systems (ITS) have been shown to be highly effective at promoting learning compared to other computer-based instructional approaches. However, many ITS rely heavily on expert design and hand-crafted rules. This makes them difficult to build and transfer across domains and limits their potential efficacy. In this paper, we investigate how feedback in a large-scale ITS can be automatically generated in a data-driven way, and more specifically how personalization of feedback can lead to improvements in student performance outcomes. First, we propose a machine learning approach to generate personalized feedback in an automated way, which takes the individual needs of students into account while alleviating the need for expert intervention and hand-crafted rules. We leverage state-of-the-art machine learning and natural language processing techniques to provide students with personalized feedback using hints and Wikipedia-based explanations. Second, we demonstrate that personalized feedback leads to improved success rates at solving exercises in practice: our personalized feedback model is deployed in a large-scale dialogue-based ITS, launched in 2019, with around 20,000 students. We present the results of experiments with students and show that the automated, data-driven, personalized feedback leads to a significant overall improvement of 22.95% in student performance outcomes and substantial improvements in the subjective evaluation of the feedback.
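
To make the personalization step concrete, here is a deliberately simplified sketch: a model trained on past interactions scores candidate hints for a given student, and the highest-scoring hint is shown. The feature names and the logistic-regression ranker are assumptions for illustration; the paper's actual models are richer NLP systems.

```python
# Sketch: rank candidate hints with a learned success model.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Historical interactions: (student_skill, hint_specificity, hint_length)
# -> whether the student solved the exercise after the hint (synthetic).
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(500, 3))
y = (0.3 * X[:, 0] + 0.5 * X[:, 1] - 0.2 * X[:, 2]
     + rng.normal(0, 0.1, 500)) > 0.35
ranker = LogisticRegression().fit(X, y)

def personalize(student_skill, candidates):
    # Score each candidate hint for this student; return the best one.
    feats = np.array([[student_skill, spec, ln] for _, spec, ln in candidates])
    best = int(np.argmax(ranker.predict_proba(feats)[:, 1]))
    return candidates[best][0]

hints = [("Re-read the definition of gradient descent.", 0.2, 0.4),
         ("Try computing the derivative of the loss first.", 0.6, 0.5),
         ("Step by step: differentiate, set to zero, solve.", 0.9, 0.8)]
print(personalize(student_skill=0.3, candidates=hints))
```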


2021 ◽  
Vol 10 (1) ◽  
pp. e001087
Author(s):  
Tarek F Radwan ◽  
Yvette Agyako ◽  
Alireza Ettefaghian ◽  
Tahira Kamran ◽  
Omar Din ◽  
...  

A quality improvement (QI) scheme was launched in 2017, covering a large group of 25 general practices working with a deprived registered population. The aim was to improve the measurable quality of care in a population where type 2 diabetes (T2D) care had previously proved challenging. A complex set of QI interventions was co-designed by a team of primary care clinicians, educationalists, and managers. These interventions included organisation-wide goal setting, using a data-driven approach, ensuring staff engagement, implementing an educational programme for pharmacists, facilitating web-based QI learning at scale, and using methods that ensured sustainability. This programme was used to optimise the management of T2D by improving the eight care processes and three treatment targets that form part of the annual national diabetes audit for patients with T2D. With the implemented interventions, there was significant improvement in all care processes and all treatment targets for patients with diabetes. Achievement of all eight care processes improved by 46.0% (p<0.001), while achievement of all three treatment targets improved by 13.5% (p<0.001). The QI programme provides an example of a data-driven, large-scale, multicomponent intervention delivered in primary care in ethnically diverse and socially deprived areas.


2018 ◽  
Vol 20 (10) ◽  
pp. 2774-2787 ◽  
Author(s):  
Feng Gao ◽  
Xinfeng Zhang ◽  
Yicheng Huang ◽  
Yong Luo ◽  
Xiaoming Li ◽  
...  

2011 ◽  
Author(s):  
D. Suendermann ◽  
J. Liscombe ◽  
J. Bloom ◽  
G. Li ◽  
Roberto Pieraccini

2018 ◽  
Vol 22 (6) ◽  
pp. 1255-1265 ◽  
Author(s):  
Yongle Li ◽  
Chuanjin Yu ◽  
Xingyu Chen ◽  
Xinyu Xu ◽  
Koffi Togbenou ◽  
...  

A growing number of long-span bridges are under construction across straits or through valleys, where the wind characteristics are complex and inhomogeneous. Simulating inhomogeneous random wind velocity fields on such long-span bridges with the spectral representation method requires significant computational resources, owing to the cost of the Cholesky decomposition of the power spectral density matrices. To improve the efficiency of the decomposition, a novel and efficient formulation of the Cholesky decomposition, called “Band-Limited Cholesky decomposition,” is proposed, and corresponding simulation schemes are suggested. The key idea is to convert the coherence matrices into band matrices, whose decomposition requires less computation and storage. Each decomposed coherence matrix is then also a highly sparse band matrix. As the zero-valued elements contribute nothing to the simulation, the proposed method is further expedited by limiting the calculation to the non-zero elements. The proposed methods are data-driven and broadly applicable to simulating complicated large-scale random wind velocity fields, especially inhomogeneous ones. Using the data-driven strategies presented in the study, a numerical example involving inhomogeneous random wind velocity field simulation on a long-span bridge is performed. Compared to the traditional spectral representation method, the simulation results are highly accurate, and the entire procedure is about 2.5 times faster with the proposed method when simulating one hundred wind velocity processes.
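
The band-limiting idea lends itself to a compact sketch: an exponential, Davenport-type coherence matrix is truncated beyond the separation at which coherence becomes negligible, packed into banded storage, and factorized in O(nb²) rather than O(n³). The coherence model, truncation threshold, and bridge geometry below are illustrative assumptions, not the paper's setup.

```python
# Sketch: band-limit a coherence matrix, then use a banded Cholesky.
import numpy as np
from scipy.linalg import cholesky_banded

n = 100                     # simulation points along the bridge deck
x = np.linspace(0, 990, n)  # point coordinates in metres
dist = np.abs(x[:, None] - x[None, :])

f, U, C = 0.5, 30.0, 10.0   # frequency (Hz), mean wind speed, decay coef.
coh = np.exp(-C * f * dist / U)   # Davenport-type coherence matrix

# Band-limit: beyond bandwidth b the coherence is negligible; zero it out.
dx = x[1] - x[0]
b = int(np.argmax(np.exp(-C * f * dx * np.arange(n) / U) < 1e-4))
coh_banded = np.where(dist <= b * dx, coh, 0.0)

# Pack the lower band into the (b+1, n) banded storage scipy expects,
# then factorize: O(n b^2) work instead of O(n^3) for dense Cholesky.
ab = np.zeros((b + 1, n))
for d in range(b + 1):
    ab[d, :n - d] = np.diagonal(coh_banded, -d)
L = cholesky_banded(ab, lower=True)
print(f"bandwidth {b} of {n}; banded factor shape {L.shape}")
```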


Author(s):  
Raquel Barata ◽  
Raquel Prado ◽  
Bruno Sansó

We present a data-driven approach to assess and compare the behavior of large-scale spatial averages of surface temperature in climate model simulations and in observational products. We rely on univariate and multivariate dynamic linear model (DLM) techniques to estimate both long-term and seasonal changes in temperature. The residuals from the DLM analyses capture the internal variability of the climate system and exhibit complex temporal autocorrelation structure. To characterize this internal variability, we explore the structure of these residuals using univariate and multivariate autoregressive (AR) models. As a proof of concept that can easily be extended to other climate models, we apply our approach to one particular climate model (MIROC5). Our results illustrate model versus data differences in both long-term and seasonal changes in temperature. Despite differences in the underlying factors contributing to variability, the different types of simulation yield very similar spectral estimates of internal temperature variability. In general, we find that there is no evidence that the MIROC5 model systematically underestimates the amplitude of observed surface temperature variability on multi-decadal timescales – a finding that has considerable relevance regarding efforts to identify anthropogenic “fingerprints” in observational surface temperature data. Our methodology and results present a novel approach to obtaining data-driven estimates of climate variability for purposes of model evaluation.
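
As a schematic of the pipeline (a univariate stand-in for the paper's univariate and multivariate DLM analyses), the sketch below decomposes a synthetic monthly series into a local linear trend plus a 12-month seasonal component with statsmodels, then fits an autoregressive model to the residuals as an estimate of internal variability. The synthetic series and the AR order are assumptions.

```python
# Sketch: DLM trend/seasonal decomposition, then AR on the residuals.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
t = np.arange(600)  # 50 years of monthly temperature anomalies
series = (0.001 * t                              # slow warming trend
          + 0.5 * np.sin(2 * np.pi * t / 12)     # seasonal cycle
          + np.convolve(rng.normal(0, 0.1, 650), [1, 0.6, 0.3], "same")[:600])

# DLM: local linear trend plus a 12-month seasonal component.
dlm = sm.tsa.UnobservedComponents(series, level="local linear trend",
                                  seasonal=12).fit(disp=False)
resid = dlm.resid[12:]  # drop burn-in affected by diffuse initialization

# Internal variability: autoregressive model on the DLM residuals.
ar = sm.tsa.AutoReg(resid, lags=3).fit()
print("AR coefficients:", np.round(ar.params[1:], 3))
```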


PLoS Genetics ◽  
2021 ◽  
Vol 17 (1) ◽  
pp. e1009315
Author(s):  
Ardalan Naseri ◽  
Junjie Shi ◽  
Xihong Lin ◽  
Shaojie Zhang ◽  
Degui Zhi

Inference of relationships from whole-genome genetic data of a cohort is a crucial prerequisite for genome-wide association studies. Typically, relationships are inferred by computing the kinship coefficients (ϕ) and the genome-wide probability of zero IBD sharing (π0) among all pairs of individuals. Current leading methods are based on pairwise comparisons, which may not scale up to very large cohorts (e.g., sample size >1 million). Here, we propose an efficient relationship inference method, RAFFI. RAFFI leverages the efficient RaPID method to call IBD segments first, then estimates ϕ and π0 from the detected segments. This inference is achieved by a data-driven approach that adjusts the estimation based on phasing quality and genotyping quality. Using simulations, we show that RAFFI is robust against phasing/genotyping errors, admixture events, and varying marker densities, and achieves higher accuracy than KING, the current leading method, especially for more distant relatives. When applied to the phased UK Biobank data with ~500K individuals, RAFFI is approximately 18 times faster than KING. We expect RAFFI will offer fast and accurate relatedness inference for even larger cohorts.
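
The estimation step from detected segments admits a small sketch: merge the IBD1 and IBD2 intervals for a pair, convert the covered lengths to genome fractions k1 and k2, and apply ϕ = k1/4 + k2/2 and π0 = 1 − k1 − k2. The interval format and genome length are illustrative, and RAFFI's phasing- and genotyping-quality adjustments are omitted.

```python
# Sketch: kinship (phi) and zero-IBD probability (pi0) from IBD segments.
GENOME_CM = 3500.0  # approximate total autosomal genetic length (cM)

def merged_length(segments):
    # Total length covered by the union of (start, end) intervals.
    total, cur_s, cur_e = 0.0, None, None
    for s, e in sorted(segments):
        if cur_e is None or s > cur_e:
            if cur_e is not None:
                total += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            cur_e = max(cur_e, e)
    return total + (cur_e - cur_s if cur_e is not None else 0.0)

def kinship(ibd1_segments, ibd2_segments, genome_cm=GENOME_CM):
    k1 = merged_length(ibd1_segments) / genome_cm  # one shared haplotype
    k2 = merged_length(ibd2_segments) / genome_cm  # both haplotypes shared
    return 0.25 * k1 + 0.5 * k2, 1.0 - k1 - k2     # (phi, pi0)

# Toy pair sharing roughly half the genome IBD1 plus a short IBD2 stretch.
phi, pi0 = kinship([(0.0, 900.0), (950.0, 1700.0)], [(1750.0, 1800.0)])
print(f"phi = {phi:.3f}, pi0 = {pi0:.3f}")
```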

