Parallel data-driven decomposition algorithm for large-scale datasets: with application to transitional boundary layers

Increasing processing capabilities and input/output constraints of supercomputers have increased the use of co-processing approaches, i.e., visualizing and analyzing data sets of simulations on the fly. We present a method that evaluates the importance of different regions of simulation data and a data-driven approach that uses the proposed method to accelerate in-transit co-processing of large-scale simulations. We use the importance metrics to simultaneously employ multiple compression methods on different data regions to accelerate the in-transit co-processing. Our approach strives to adaptively compress data on the fly and uses load balancing to counteract memory imbalances. We demonstrate the method’s efficiency through a fluid mechanics application, a Richtmyer–Meshkov instability simulation, showing how to accelerate the in-transit co-processing of simulations. The results show that the proposed method can expeditiously identify regions of interest, even when using multiple metrics. Our approach achieved a speedup of 1.29× in a lossless scenario. The data decompression time was sped up by 2× compared to using a single compression method uniformly.

Automated Data-Driven Generation of Personalized Pedagogical Interventions in Intelligent Tutoring Systems

International Journal of Artificial Intelligence in Education ◽

10.1007/s40593-021-00267-x ◽

2021 ◽

Author(s):

Ekaterina Kochmar ◽

Dung Do Vu ◽

Robert Belfer ◽

Varun Gupta ◽

Iulian Vlad Serban ◽

...

Keyword(s):

Machine Learning ◽

Student Performance ◽

Language Processing ◽

Intelligent Tutoring Systems ◽

Large Scale ◽

Intelligent Tutoring ◽

Performance Outcomes ◽

Data Driven ◽

Personalized Feedback ◽

Tutoring Systems

Intelligent tutoring systems (ITS) have been shown to be highly effective at promoting learning as compared to other computer-based instructional approaches. However, many ITS rely heavily on expert design and hand-crafted rules. This makes them difficult to build and transfer across domains and limits their potential efficacy. In this paper, we investigate how feedback in a large-scale ITS can be automatically generated in a data-driven way, and more specifically how personalization of feedback can lead to improvements in student performance outcomes. First, we propose a machine learning approach to generate personalized feedback in an automated way, which takes individual needs of students into account, while alleviating the need for expert intervention and design of hand-crafted rules. We leverage state-of-the-art machine learning and natural language processing techniques to provide students with personalized feedback using hints and Wikipedia-based explanations. Second, we demonstrate that personalized feedback leads to improved success rates at solving exercises in practice: our personalized feedback model is used in , a large-scale dialogue-based ITS with around 20,000 students launched in 2019. We present the results of experiments with students and show that the automated, data-driven, personalized feedback leads to a significant overall improvement of 22.95% in student performance outcomes and substantial improvements in the subjective evaluation of the feedback.

Data-Driven Energy Use Estimation in Large Scale Transportation Networks

Proceedings of the 2nd ACM/EIGSCC Symposium on Smart Cities and Communities - SCC '19 ◽

10.1145/3357492.3358632 ◽

2019 ◽

Author(s):

Bin Wang ◽

Cy Chan ◽

Divya Somasi ◽

Jane Macfarlane ◽

Eric Rask

Keyword(s):

Large Scale ◽

Energy Use ◽

Transportation Networks ◽

Data Driven

Improving the management of type 2 diabetes through large-scale general practice: the role of a data-driven and technology-enabled education programme

BMJ Open Quality ◽

10.1136/bmjoq-2020-001087 ◽

2021 ◽

Vol 10 (1) ◽

pp. e001087

Author(s):

Tarek F Radwan ◽

Yvette Agyako ◽

Alireza Ettefaghian ◽

Tahira Kamran ◽

Omar Din ◽

...

Keyword(s):

Type 2 Diabetes ◽

Primary Care ◽

Large Scale ◽

Education Programme ◽

Educational Programme ◽

Data Driven ◽

Treatment Targets ◽

Care Processes ◽

Data Driven Approach

A quality improvement (QI) scheme was launched in 2017, covering a large group of 25 general practices working with a deprived registered population. The aim was to improve the measurable quality of care in a population where type 2 diabetes (T2D) care had previously proved challenging. A complex set of QI interventions was co-designed by a team of primary care clinicians, educationalists and managers. These interventions included organisation-wide goal setting, using a data-driven approach, ensuring staff engagement, implementing an educational programme for pharmacists, facilitating web-based QI learning at scale and using methods which ensured sustainability. This programme was used to optimise the management of T2D through improving the eight care processes and three treatment targets which form part of the annual national diabetes audit for patients with T2D. With the implemented improvement interventions, there was significant improvement in all care processes and all treatment targets for patients with diabetes. Achievement of all the eight care processes improved by 46.0% (p<0.001) while achievement of all three treatment targets improved by 13.5% (p<0.001). The QI programme provides an example of a data-driven large-scale multicomponent intervention delivered in primary care in ethnically diverse and socially deprived areas.

A Lagrange Relaxation Based Decomposition Algorithm for Large-Scale Offshore Oil Production Planning Optimization

Processes ◽

10.3390/pr9071257 ◽

2021 ◽

Vol 9 (7) ◽

pp. 1257

Author(s):

Xiaoyong Gao ◽

Yue Zhao ◽

Yuhong Wang ◽

Xin Zuo ◽

Tao Chen

Keyword(s):

Production Planning ◽

Large Scale ◽

Oil Production ◽

Original Model ◽

Decomposition Algorithm ◽

Scale Model ◽

Lagrange Relaxation ◽

Planning Optimization ◽

Offshore Oil Production ◽

Offshore Oil

In this paper, a new Lagrange relaxation based decomposition algorithm for integrated offshore oil production planning optimization is presented. In our previous study (Gao et al. Computers and Chemical Engineering, 2020, 133, 106674), a multiperiod mixed-integer nonlinear programming (MINLP) model considering both well operation and flow assurance simultaneously was proposed. However, due to the large-scale nature of the problem, i.e., too many oil wells and a long planning time cycle, it is difficult to obtain a satisfactory solution in a reasonable time. As an effective method, Lagrange relaxation based decomposition algorithms can provide more compact bounds and thus result in a smaller duality gap. Specifically, Lagrange multipliers are introduced to relax the coupling constraints of multi-batch units, which yields several moderate-scale subproblems. Moreover, a dual problem is constructed for iteration. As a result, the original integrated large-scale model is decomposed into several single-batch subproblems that are solved simultaneously by commercial solvers. Computational results show that the proposed method can reduce the solving time by up to 43% or even more. Meanwhile, the planning results are close to those obtained by the original model. Moreover, the larger the problem size, the greater the advantage of the proposed LR algorithm over the original model.
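
The relax-then-iterate idea in this abstract can be illustrated on a toy problem. The sketch below is not the authors' MINLP model: the quadratic profit function, the single shared capacity constraint, the diminishing step size and all names (`solve_subproblem`, `lagrangian_relaxation`, `brute_force_optimum`) are assumptions chosen for illustration. It prices the coupling constraint with one multiplier, solves each unit's subproblem independently, and updates the multiplier with a subgradient step.

```python
from itertools import product

def profit(p, q, x):
    """Hypothetical concave profit of one unit producing x (an assumption)."""
    return p * x - q * x * x

def solve_subproblem(p, q, upper, lam):
    """Best integer x in [0, upper] once shared capacity is priced at lam."""
    return max(range(upper + 1), key=lambda x: profit(p, q, x) - lam * x)

def lagrangian_relaxation(units, capacity, iterations=50, step0=1.0):
    """Subgradient method on the dual of: max sum(profit) s.t. sum(x) <= capacity."""
    lam = 0.0
    best_bound = float("inf")            # smallest dual (upper) bound seen
    best_profit, best_x = float("-inf"), None
    for t in range(1, iterations + 1):
        # With the coupling constraint relaxed, the units decouple.
        xs = [solve_subproblem(p, q, u, lam) for p, q, u in units]
        total = sum(profit(p, q, x) for (p, q, _), x in zip(units, xs))
        used = sum(xs)
        # Dual function value; an upper bound on the optimum for any lam >= 0.
        best_bound = min(best_bound, total + lam * (capacity - used))
        if used <= capacity and total > best_profit:
            best_profit, best_x = total, xs
        # Raise the price when capacity is exceeded, lower it otherwise.
        lam = max(0.0, lam + (step0 / t) * (used - capacity))
    return best_bound, best_profit, best_x

def brute_force_optimum(units, capacity):
    """Exhaustive reference solution, only feasible for tiny instances."""
    ranges = [range(u + 1) for _, _, u in units]
    return max(
        sum(profit(p, q, x) for (p, q, _), x in zip(units, xs))
        for xs in product(*ranges) if sum(xs) <= capacity
    )

units = [(10, 1, 5), (8, 1, 5), (6, 1, 5)]   # (p, q, upper bound) per unit
bound, feasible_profit, plan = lagrangian_relaxation(units, capacity=6)
```

The gap between `bound` and `feasible_profit` plays the role of the duality gap mentioned in the abstract. In a real setting each subproblem would be handed to a solver rather than enumerated.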

Data-Driven Lightweight Interest Point Selection for Large-Scale Visual Search

IEEE Transactions on Multimedia ◽

10.1109/tmm.2018.2818012 ◽

2018 ◽

Vol 20 (10) ◽

pp. 2774-2787 ◽

Cited By ~ 2

Author(s):

Feng Gao ◽

Xinfeng Zhang ◽

Yicheng Huang ◽

Yong Luo ◽

Xiaoming Li ◽

...

Keyword(s):

Visual Search ◽

Large Scale ◽

Data Driven ◽

Interest Point ◽

Point Selection ◽

Selection For

Data-driven worldwide quantification of large-scale hydroclimatic co-variation patterns and comparison with reanalysis and Earth System modeling

10.1002/essoar.10505258.1 ◽

2020 ◽

Author(s):

Navid Ghajarnia ◽

Zahra Kalantari ◽

Georgia Destouni

Keyword(s):

Large Scale ◽

System Modeling ◽

Data Driven ◽

Earth System ◽

Earth System Modeling ◽

Variation Patterns

Large-scale experiments on data-driven design of commercial spoken dialog systems

10.21437/interspeech.2011-296 ◽

2011 ◽

Author(s):

D. Suendermann ◽

J. Liscombe ◽

J. Bloom ◽

G. Li ◽

Roberto Pieraccini

Keyword(s):

Large Scale ◽

Data Driven ◽

Spoken Dialog Systems ◽

Dialog Systems

The Star Degree Centrality Problem: A Decomposition Approach

INFORMS Journal on Computing ◽

10.1287/ijoc.2021.1074 ◽

2021 ◽

Author(s):

Mustafa C. Camur ◽

Thomas Sharkey ◽

Chrysafis Vogiatzis

Keyword(s):

Integer Programming ◽

Large Scale ◽

Benders Decomposition ◽

Open Neighborhood ◽

Decomposition Algorithm ◽

Solution Time ◽

Set Cover ◽

Degree Centrality ◽

Decomposition Approach ◽

Acceleration Techniques

We consider the problem of identifying the induced star with the largest cardinality open neighborhood in a graph. This problem, also known as the star degree centrality (SDC) problem, is shown to be NP-complete. In this work, we first propose a new integer programming (IP) formulation, which has a smaller number of constraints and nonzero coefficients in them than the existing formulation in the literature. We present classes of networks in which the problem is solvable in polynomial time and offer a new proof of NP-completeness that shows the problem remains NP-complete for both bipartite and split graphs. In addition, we propose a decomposition framework that is suitable for both the existing and our formulations. We implement several acceleration techniques in this framework, motivated by techniques used in Benders decomposition. We test our approaches on networks generated based on the Barabási–Albert, Erdős–Rényi, and Watts–Strogatz models. Our decomposition approach outperforms solving the IP formulations in most of the instances in terms of both solution time and quality; this is especially true for larger and denser graphs. We then test the decomposition algorithm on large-scale protein–protein interaction networks, for which SDC is shown to be an important centrality metric. Summary of Contribution: In this study, we first introduce a new integer programming (NIP) formulation for the star degree centrality (SDC) problem in which the goal is to identify the induced star with the largest open neighborhood. We then show that, although the SDC can be efficiently solved in tree graphs, it remains NP-complete in both split and bipartite graphs via a reduction performed from the set cover problem. In addition, we implement a decomposition algorithm motivated by Benders decomposition together with several acceleration techniques to both the NIP formulation and the existing formulation in the literature. Our experimental results indicate that the decomposition implementation on the NIP is the best solution method in terms of both solution time and quality.
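
To make the objective concrete, the following sketch evaluates the star degree centrality of one vertex by exhaustive enumeration. It is not the IP formulation or the decomposition algorithm from the paper, and it is exponential in the degree of the center, so it only suits tiny graphs. The function names and the example graph are hypothetical, and treating a star with no leaves as admissible is an assumption.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def open_neighborhood(adj, nodes):
    """Vertices adjacent to some member of `nodes`, excluding `nodes` themselves."""
    nodes = set(nodes)
    reached = set()
    for u in nodes:
        reached |= adj[u]
    return reached - nodes

def leaves_are_independent(adj, leaves):
    """An induced star requires its leaves to be pairwise non-adjacent."""
    return all(b not in adj[a] for a, b in combinations(leaves, 2))

def star_degree_centrality(adj, center):
    """Largest open neighborhood over all induced stars centered at `center`."""
    # Assumption: the star consisting of the center alone is allowed.
    best_size, best_leaves = len(adj[center]), ()
    neighbors = sorted(adj[center])
    for r in range(1, len(neighbors) + 1):
        for leaves in combinations(neighbors, r):
            if not leaves_are_independent(adj, leaves):
                continue
            size = len(open_neighborhood(adj, (center, *leaves)))
            if size > best_size:
                best_size, best_leaves = size, leaves
    return best_size, best_leaves

# Hypothetical example graph; vertices 1 and 2 are adjacent, so they
# cannot both be leaves of a star centered at 0.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 4), (2, 5), (2, 6), (3, 7)]
adj = build_adjacency(edges)
size, leaves = star_degree_centrality(adj, center=0)
```

The independence check is what separates an induced star from an arbitrary subset of neighbors, and it is the source of the combinatorial difficulty that the abstract addresses with integer programming.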

An efficient Cholesky decomposition and applications for the simulation of large-scale random wind velocity fields

Advances in Structural Engineering ◽

10.1177/1369433218810642 ◽

2018 ◽

Vol 22 (6) ◽

pp. 1255-1265 ◽

Cited By ~ 1

Author(s):

Yongle Li ◽

Chuanjin Yu ◽

Xingyu Chen ◽

Xinyu Xu ◽

Koffi Togbenou ◽

...

Keyword(s):

Wind Velocity ◽

Large Scale ◽

Spectral Representation ◽

Cholesky Decomposition ◽

Velocity Fields ◽

Data Driven ◽

Long Span Bridges ◽

Long Span ◽

Spectral Representation Method ◽

Representation Method

A growing number of long-span bridges are under construction across straits or through valleys, where the wind characteristics are complex and inhomogeneous. Simulating inhomogeneous random wind velocity fields on such long-span bridges with the spectral representation method requires significant computational resources because the Cholesky decomposition of the power spectral density matrices is time-consuming. To improve the efficiency of the decomposition, a novel and efficient formulation of the Cholesky decomposition, called “Band-Limited Cholesky decomposition,” is proposed and corresponding simulation schemes are suggested. The key idea is to convert the coherence matrices into band matrices whose decomposition requires less computational cost and storage. Subsequently, each decomposed coherence matrix is also a band matrix with high sparsity. As the zero-valued elements make no contribution to the simulation calculation, the proposed method is further expedited by limiting the calculation to the non-zero elements only. The proposed methods are data-driven and broadly applicable to simulating many complicated large-scale random wind velocity fields, especially inhomogeneous ones. Through the data-driven strategies presented in the study, a numerical example involving inhomogeneous random wind velocity field simulation on a long-span bridge is performed. Compared to the traditional spectral representation method, the simulation results are highly accurate and the entire simulation procedure is about 2.5 times faster with the proposed method for the simulation of one hundred wind velocity processes.
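
The computational saving from band structure can be seen in a small sketch. This is a generic banded Cholesky factorization, not the paper's “Band-Limited Cholesky decomposition”: the exponential coherence model, the hard truncation beyond a fixed bandwidth, the decay value and both function names are assumptions made for illustration. The decay is chosen so that the truncated matrix stays diagonally dominant and therefore positive definite.

```python
import math

def truncated_coherence(n, decay, bandwidth):
    """Hypothetical coherence matrix exp(-decay*|i-j|), zeroed outside the band."""
    return [
        [math.exp(-decay * abs(i - j)) if abs(i - j) <= bandwidth else 0.0
         for j in range(n)]
        for i in range(n)
    ]

def banded_cholesky(a, bandwidth):
    """Lower-triangular factor of a symmetric positive definite band matrix.

    Only entries within `bandwidth` of the diagonal are visited, so the cost
    grows like n * bandwidth**2 instead of n**3.
    """
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j, min(n, j + bandwidth + 1)):
            # Earlier columns contribute only where both rows are inside the band.
            start = max(0, i - bandwidth)
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(start, j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

n, bandwidth = 8, 2
coherence = truncated_coherence(n, decay=1.2, bandwidth=bandwidth)
factor = banded_cholesky(coherence, bandwidth)
```

The factor has the same lower bandwidth as the input, which matches the abstract's remark that each decomposed coherence matrix is again a sparse band matrix.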
