Speeding up reactive transport simulations: statistical surrogates and caching of simulation results in lookup tables

Author(s):  
Marco De Lucia ◽  
Robert Engelmann ◽  
Michael Kühn ◽  
Alexander Lindemann ◽  
Max Lübke ◽  
...  

A successful strategy for speeding up coupled reactive transport simulations, at the price of an acceptable loss of accuracy, is to compute geochemistry, which represents the bottleneck of these simulations, through data-driven surrogates instead of 'full physics' equation-based models [1]. A surrogate is a multivariate regressor trained on a set of pre-calculated geochemical simulations, or potentially even at runtime during the coupled simulations. Many algorithms and implementations are available from the thriving machine learning community: tree-based regressors such as random forests or XGBoost, artificial neural networks, Gaussian processes, and support vector machines, to name a few. Given the 'black-box' nature of surrogates, however, they generally disregard physical constraints such as mass and charge balance, which are of paramount importance for coupled transport simulations. A runtime check of the balance errors in the surrogate outcomes is therefore necessary: predictions exceeding a given tolerance must be rejected and the full physics chemical simulations run instead. The practical speedup of this strategy is thus a tradeoff between careful training of the surrogate and runtime efficiency.

In this contribution we demonstrate that the use of surrogates can lead to a dramatic decrease in required computing time, with speedup factors on the order of 10, or even 100 in the most favorable cases. Large-scale simulations with some 10^6 grid elements thus become feasible on common workstations, without requiring computation on HPC clusters [2].

Furthermore, we showcase our implementation of Distributed Hash Tables (DHTs) that cache geochemical simulation results for reuse in subsequent time steps. The computational advantage here stems from the fact that query and retrieval from lookup tables is much faster than both full physics geochemical simulations and surrogate predictions. Another advantage of this algorithm is that virtually no loss of accuracy is introduced into the simulations. Enabling the caching of geochemical simulations through DHTs speeds up large-scale reactive transport simulations by up to a factor of four, even when computing on several hundred cores.

These algorithmic developments are demonstrated in comparison with published reactive transport benchmarks and on a real-life scenario of CO₂ storage.

[1] Jatnieks, J., De Lucia, M., Dransch, D., Sips, M. (2016): Data-driven surrogate model approach for improving the performance of reactive transport simulations. Energy Procedia 97, pp. 447-453. DOI: 10.1016/j.egypro.2016.10.047

[2] De Lucia, M., Kempka, T., Jatnieks, J., Kühn, M. (2017): Integrating surrogate models into subsurface simulation framework allows computation of complex reactive transport scenarios. Energy Procedia 125, pp. 580-587. DOI: 10.1016/j.egypro.2017.08.200
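As a concrete illustration of the surrogate-plus-cache scheme described in the abstract above, here is a minimal, hedged sketch in Python: a random-forest surrogate with a runtime mass-balance check that falls back to the full-physics solver, plus a plain dictionary standing in for the Distributed Hash Table. The solver stub, tolerance, and quantization scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def full_physics_chemistry(x):
    """Dummy stand-in for an equation-based geochemical solver:
    redistributes mass among species while conserving the total."""
    w = np.exp(-x)
    return x.sum() * w / w.sum()

# Surrogate trained on pre-calculated geochemical simulations.
X_train = np.random.rand(1000, 4)        # e.g. concentrations of 4 species
y_train = np.array([full_physics_chemistry(x) for x in X_train])
surrogate = RandomForestRegressor(n_estimators=100).fit(X_train, y_train)

cache = {}   # stands in for the Distributed Hash Table

def react(x, mass_tol=1e-3):
    key = tuple(np.round(x, 6))              # quantize to make cache hits likely
    if key in cache:                         # lookup table: the fastest path
        return cache[key]
    y = surrogate.predict(x.reshape(1, -1))[0]
    if abs(y.sum() - x.sum()) > mass_tol:    # runtime mass-balance check
        y = full_physics_chemistry(x)        # reject prediction, fall back
    cache[key] = y
    return y

print(react(np.array([0.2, 0.4, 0.1, 0.3])))
```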

2021 ◽  
Vol 8 ◽  
Author(s):  
Radu Mariescu-Istodor ◽  
Pasi Fränti

The scalability of traveling salesperson problem (TSP) algorithms for handling large-scale problem instances has long been an open problem. We arranged a so-called Santa Claus challenge and invited people to submit their algorithms to solve a TSP instance of more than one million nodes given only one hour of computing time. In this article, we analyze the results and show which design choices are decisive in providing the best solution to the problem under the given constraints. There were three valid submissions, all based on local search, including k-opt up to k = 5. The most important design choice turned out to be the localization of the operator using a neighborhood graph. The divide-and-merge strategy suffers a 2% loss of quality. However, via parallelization, the result can be obtained in less than 2 min, which can make a key difference in real-life applications.
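The decisive design choice named above, localizing the improvement operator with a neighborhood graph, can be sketched as follows for plain 2-opt (the submissions used k-opt up to k = 5); the k-d tree, candidate count, and tour handling are illustrative assumptions.

```python
# 2-opt restricted to a k-nearest-neighbour candidate graph (sketch).
import numpy as np
from scipy.spatial import cKDTree

def local_two_opt(points, tour, k=8):
    tree = cKDTree(points)                    # source of the neighborhood graph
    pos = np.empty(len(tour), dtype=int)      # city -> position in tour
    pos[tour] = np.arange(len(tour))
    improved = True
    while improved:
        improved = False
        for i, a in enumerate(tour):
            b = tour[(i + 1) % len(tour)]
            # candidate edges only among the k nearest neighbours of a
            for c in tree.query(points[a], k=k + 1)[1][1:]:
                j = pos[c]
                d = tour[(j + 1) % len(tour)]
                if len({int(a), int(b), int(c), int(d)}) < 4:
                    continue
                old = np.linalg.norm(points[a] - points[b]) + np.linalg.norm(points[c] - points[d])
                new = np.linalg.norm(points[a] - points[c]) + np.linalg.norm(points[b] - points[d])
                if new < old - 1e-9:          # improving move: reverse the segment
                    lo, hi = (i, j) if i < j else (j, i)
                    tour[lo + 1:hi + 1] = tour[lo + 1:hi + 1][::-1]
                    pos[tour] = np.arange(len(tour))
                    improved = True
                    break
    return tour

pts = np.random.rand(500, 2)
print(local_two_opt(pts, np.arange(500))[:10])
```

Restricting candidates to the neighborhood graph turns each scan from O(n) to O(k) per city, which is what makes million-node instances tractable within an hour.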


2015 ◽  
Vol 11 (1) ◽  
pp. 66-83 ◽  
Author(s):  
Yong Hu ◽  
Xiangzhou Zhang ◽  
Bin Feng ◽  
Kang Xie ◽  
Mei Liu

Among all investors in the Chinese stock market, more than 95% are non-professional individuals. These individual investors are in great need of mobile apps that can provide professional, handy trading analysis and decision support anywhere. However, financial data is challenging to analyze because of its large-scale, non-linear, and noisy characteristics in a varying stock environment. This paper develops a Mobile Data-Driven Stock Trading System (iTrade), a mobile app system based on a client-server architecture and various data mining techniques. iTrade is characterized by 1) a data-driven intelligent learning model, which can provide deeper insight than empirical technical analysis, 2) a concept drift adaptation process, which facilitates model adaptation to market structure changes, and 3) a rigorous benchmark analysis, including the buy-and-hold strategy and the strategies of three world-famous master investors (e.g., Warren E. Buffett). Technologies used in iTrade include the Least Absolute Shrinkage and Selection Operator (Lasso) algorithm, Support Vector Machine (SVM), and risk-adjusted portfolio optimization. An application case of iTrade is presented, based on a seven-year (2005-2011) back-test. Evaluation results indicate that iTrade can achieve a much higher cumulative return than the benchmark (Shanghai Composite Index). To the best of our knowledge, this is the first study and mobile app system that emphasizes and investigates the concept drift phenomenon in the stock market, as well as the performance comparison between a data-driven intelligent model and the strategies of master investors.
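A minimal sketch of the modelling core named above, Lasso-based feature selection feeding an SVM, using scikit-learn; the synthetic indicator data and all parameter values are assumptions for illustration only.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))        # e.g. 40 technical indicators per day
y = (X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = make_pipeline(
    StandardScaler(),
    SelectFromModel(Lasso(alpha=0.05)),   # Lasso shrinks noisy indicators away
    SVC(kernel="rbf", C=1.0),             # SVM issues the up/down signal
)
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```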


Author(s):  
Yunsheng Song ◽  
Fangyi Li ◽  
Jianyu Liu ◽  
Juao Zhang

Support vector regression is an important algorithm in machine learning, widely used in real life for its good performance in tasks such as house price forecasting, disease prediction, and weather forecasting. However, it cannot efficiently process large-scale data, because its training process has high time complexity. Data partitioning, an important approach to the large-scale learning problem, has mainly focused on classification tasks: classifiers are trained over the subsets produced by the partition and then combined into a final classifier. Meanwhile, most existing methods rarely study the influence of the data partition on regressor performance, so it is difficult to preserve generalization ability. To address this problem, we derive an estimate of the difference in the objective function before and after the data partition. Mini-Batch K-Means clustering is adopted to largely reduce this difference, and an improved algorithm is proposed. The proposed algorithm consists of a training stage and a prediction stage. In the training stage, it uses Mini-Batch K-Means clustering to divide the input space into disjoint sub-regions of equal sample size, then trains a regressor on each sub-region using the support vector regression algorithm. In the prediction stage, only the regressor of the sub-region containing an unlabeled instance provides its predicted label. Experimental results on real datasets illustrate that the proposed algorithm attains generalization ability similar to the original algorithm while requiring less execution time than other acceleration algorithms.
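The training and prediction stages described above can be sketched directly in scikit-learn; note that plain Mini-Batch K-Means does not enforce the equal-sample-size sub-regions of the paper, a simplification of this sketch.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import SVR

class ClusteredSVR:
    """One SVR per sub-region found by Mini-Batch K-Means (sketch)."""
    def __init__(self, n_clusters=8):
        self.km = MiniBatchKMeans(n_clusters=n_clusters)
        self.models = {}

    def fit(self, X, y):
        labels = self.km.fit_predict(X)        # partition the input space
        for c in np.unique(labels):
            m = labels == c
            self.models[c] = SVR(kernel="rbf").fit(X[m], y[m])
        return self

    def predict(self, X):
        labels = self.km.predict(X)            # route each point to its sub-region
        out = np.empty(len(X))
        for c, model in self.models.items():
            m = labels == c
            if m.any():
                out[m] = model.predict(X[m])
        return out

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(2000, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=2000)
print(ClusteredSVR().fit(X, y).predict(X[:5]))
```

Because SVR training is super-linear in the number of samples, fitting eight regressors on n/8 samples each costs substantially less than fitting one regressor on all n samples.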


Author(s):  
Dian Puspita Hapsari ◽  
Imam Utoyo ◽  
Santi Wulan Purnami

Data classification faces several problems, one of which is that large amounts of data increase computing time. SVM is a reliable classifier for linear or non-linear data, but for large-scale data it runs into computational time constraints. The fractional gradient descent method is an unconstrained optimization algorithm for training support vector machine classifiers, whose underlying problem is convex. Compared to the classic integer-order model, a model built with fractional calculus has a significant advantage in accelerating computing time. This research investigates the current state of this new optimization method based on fractional derivatives and how it can be implemented in the classifier algorithm. The SVM classifier with fractional gradient descent optimization reaches its convergence point approximately 50 iterations sooner than SVM-SGD. The model update steps are smaller in the fractional case because the multiplier value is less than 1, i.e., a fraction. The SVM-Fractional SGD algorithm proves to be an effective method for rainfall forecast decisions.
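The abstract does not give the exact fractional formulation, so the following is only a hedged sketch: stochastic sub-gradient training of a linear soft-margin SVM in which each update is damped by a Caputo-style factor |w|^(1-α)/Γ(2-α), reducing to ordinary SGD at α = 1. All names and parameter values are assumptions.

```python
import numpy as np
from math import gamma

def svm_fractional_sgd(X, y, alpha=0.8, lr=0.01, lam=0.01, epochs=50):
    """Linear soft-margin SVM via fractional-order SGD; y must be in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w)
            grad = lam * w - (y[i] * X[i] if margin < 1 else 0.0)
            # fractional multiplier is below 1 for small |w|, damping the update
            frac = (np.abs(w) + 1e-8) ** (1.0 - alpha) / gamma(2.0 - alpha)
            w -= lr * frac * grad
    return w

X = np.random.randn(300, 5)
y = np.sign(X[:, 0] + X[:, 1])
print(svm_fractional_sgd(X, y))
```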


Author(s):  
Sawan Kumar ◽  
Varsha Sreenivasan ◽  
Partha Talukdar ◽  
Franco Pestilli ◽  
Devarajan Sridharan

Diffusion imaging and tractography enable mapping structural connections in the human brain in vivo. Linear Fascicle Evaluation (LiFE) is a state-of-the-art approach for pruning spurious connections in the estimated structural connectome by optimizing its fit to the measured diffusion data. Yet LiFE imposes heavy demands on computing time, precluding its use in analyses of large connectome databases. Here, we introduce a GPU-based implementation of LiFE that achieves 50-100x speedups over conventional CPU-based implementations for connectome sizes of up to several million fibers. Briefly, the algorithm accelerates generalized matrix multiplications on a compressed tensor through efficient GPU kernels, while ensuring favorable memory access patterns. Leveraging these speedups, we advance LiFE's algorithm by imposing a regularization constraint on the estimated fiber weights during connectome pruning. Our regularized, accelerated LiFE algorithm (“ReAl-LiFE”) estimates sparser connectomes that also fit the underlying diffusion signal more accurately. We demonstrate the utility of our approach by classifying pathological signatures of structural connectivity in patients with Alzheimer's disease (AD). We estimated million-fiber whole-brain connectomes, followed by pruning with ReAl-LiFE, for 90 individuals (45 AD patients and 45 healthy controls). Linear classifiers based on support vector machines achieved over 80% accuracy in distinguishing AD patients from healthy controls based on their ReAl-LiFE-pruned structural connectomes alone. Moreover, classification based on the ReAl-LiFE-pruned connectome outperformed both the unpruned and the LiFE-pruned connectome in terms of accuracy. We propose our GPU-accelerated approach as a widely relevant tool for non-negative least-squares optimization across many domains.
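At its core, the pruning step solves a regularized non-negative least-squares problem; below is a minimal CPU sketch by projected gradient descent. The L1 penalty and step-size rule are illustrative assumptions, and the actual implementation runs compressed-tensor operations in GPU kernels.

```python
import numpy as np

def reg_nnls(A, b, lam=0.1, iters=500):
    """min_{w >= 0}  0.5 * ||A @ w - b||^2 + lam * sum(w)   (L1 on w >= 0)."""
    w = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / Lipschitz bound of gradient
    for _ in range(iters):
        grad = A.T @ (A @ w - b) + lam          # gradient of the penalized loss
        w = np.maximum(0.0, w - step * grad)    # project onto the feasible set
    return w

A = np.random.rand(200, 50)                     # stands in for the LiFE tensor
b = np.random.rand(200)                         # measured diffusion signal
w = reg_nnls(A, b)
print("surviving fiber weights:", np.count_nonzero(w))
```

The penalty drives many fiber weights exactly to zero at the non-negativity boundary, which is what yields the sparser pruned connectomes.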


2002 ◽  
Vol 14 (5) ◽  
pp. 1105-1114 ◽  
Author(s):  
Ronan Collobert ◽  
Samy Bengio ◽  
Yoshua Bengio

Support vector machines (SVMs) are the state-of-the-art models for many classification problems, but they suffer from the complexity of their training algorithm, which is at least quadratic with respect to the number of examples. Hence, it is hopeless to try to solve real-life problems having more than a few hundred thousand examples with SVMs. This article proposes a new mixture of SVMs that can be easily implemented in parallel and where each SVM is trained on a small subset of the whole data set. Experiments on a large benchmark data set (Forest) yielded significant time improvement (time complexity appears empirically to locally grow linearly with the number of examples). In addition, and surprisingly, a significant improvement in generalization was observed.
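A hedged sketch of the parallel mixture idea: each expert SVM is trained on a small random subset, and the gater network of the article is replaced here by simple probability averaging, a deliberate simplification; subset sizes and counts are assumptions.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.svm import SVC

def fit_expert(X, y, idx):
    return SVC(kernel="rbf", probability=True).fit(X[idx], y[idx])

def mixture_of_svms(X, y, n_experts=10, subset=500, n_jobs=-1):
    rng = np.random.default_rng(0)
    subsets = [rng.choice(len(X), size=subset, replace=False)
               for _ in range(n_experts)]
    # each expert sees only a small subset, so every training job stays cheap
    return Parallel(n_jobs=n_jobs)(
        delayed(fit_expert)(X, y, idx) for idx in subsets)

def mixture_predict(experts, X):
    # combine experts by averaging class probabilities (no gater here)
    return np.mean([e.predict_proba(X) for e in experts], axis=0).argmax(axis=1)

X = np.random.randn(5000, 10)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
experts = mixture_of_svms(X, y)
print("accuracy:", (mixture_predict(experts, X) == y).mean())
```

Since SVM training is at least quadratic in the number of examples, ten SVMs on 500 examples each cost far less than one SVM on all 5,000.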


Energies ◽  
2018 ◽  
Vol 12 (1) ◽  
pp. 109 ◽  
Author(s):  
Jingjing Tu ◽  
Yonghai Xu ◽  
Zhongdong Yin

For the integration of distributed generation such as large-scale wind and photovoltaic power, the characteristics of the distribution network are fundamentally changed. The intermittence, variability, and uncertainty of wind and photovoltaic generation make the adjustment of the network peak load and the smooth control of power the key issues for a distribution network accepting various types of distributed power. This paper uses data-driven thinking to describe the uncertainty of wind and photovoltaic output and introduces it into the power flow calculation of a distribution network with multiple classes of distributed generation (DG), improving the handling of the data so as to better predict DG output. To address the network stability and operational control complexity caused by DG access, the kernel extreme learning machine (KELM) algorithm is used to simplify the model and improve speed and accuracy. By training and testing the KELM model, various DG configuration schemes that satisfy the minimum network loss and the constraints are derived, and a voltage stability evaluation index is introduced to assess the results. A general recommendation for DG configuration is obtained: DG is best connected at points of lower network voltage or at the end of the network. With appropriately configured capacity, it can reduce network losses and improve network voltage stability and power supply quality. Finally, the IEEE 33- and 69-bus radial distribution systems are used for simulation, and the results are compared with existing particle swarm optimization (PSO), genetic algorithm (GA), and support vector machine (SVM) approaches. The feasibility and effectiveness of the proposed model and method are verified.
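The KELM model used above has a closed-form solution for its output weights, β = (K + I/C)^(-1) T, so training reduces to a single linear solve; the RBF kernel and hyperparameter values in this sketch are illustrative assumptions.

```python
import numpy as np

class KELM:
    """Kernel Extreme Learning Machine with an RBF kernel (sketch)."""
    def __init__(self, C=100.0, gamma=0.5):
        self.C, self.gamma = C, gamma

    def _kernel(self, A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-self.gamma * d2)

    def fit(self, X, T):
        self.X = X
        K = self._kernel(X, X)
        # closed form: beta = (K + I/C)^(-1) T  -- one linear solve, no iteration
        self.beta = np.linalg.solve(K + np.eye(len(X)) / self.C, T)
        return self

    def predict(self, Xnew):
        return self._kernel(Xnew, self.X) @ self.beta

X = np.random.rand(200, 3)
T = np.sin(3 * X[:, 0]) + 0.1 * np.random.randn(200)
print(KELM().fit(X, T).predict(X[:5]))
```

The absence of iterative weight tuning is what gives KELM its speed advantage over PSO- or GA-based alternatives in this setting.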


2021 ◽  
Author(s):  
Sapna Yadav ◽  
Satish Chand

The rapid growth of deep learning has made convolutional neural networks deeper and more complex in pursuit of higher accuracy. But many day-to-day recognition tasks must be performed on platforms with limited computation. One such application is food image recognition, which is very helpful for individual health monitoring, dietary assessment, nutrition analysis, etc. This task needs a small convolutional-neural-network-based engine to compute fast and accurately. MobileNetV2, being simple and small, can be incorporated easily into small end devices. In this paper, MobileNetV2 and a support vector machine are used to classify food images. Simulation results show that features extracted from the Conv_1, out_relu, and Conv_1_bn layers of MobileNetV2 and classified using a support vector machine achieve classification accuracies of 84.0%, 87.27%, and 83.60%, respectively. Because of its fewer parameters, smaller size, and shorter training time, MobileNetV2 is an excellent choice for real-life recognition tasks.
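A minimal sketch of this pipeline: features taken from one of the named layers of a pre-trained Keras MobileNetV2 and fed to an SVM. The layer names Conv_1, Conv_1_bn, and out_relu do exist in the Keras implementation; the pooling step and the omitted data loading are assumptions.

```python
import numpy as np
import tensorflow as tf
from sklearn.svm import SVC

base = tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False)
extractor = tf.keras.Model(
    inputs=base.input,
    outputs=tf.keras.layers.GlobalAveragePooling2D()(
        base.get_layer("out_relu").output))   # try "Conv_1" or "Conv_1_bn" too

def features(images):
    """images: float array of shape (n, 224, 224, 3) with values in [0, 255]."""
    x = tf.keras.applications.mobilenet_v2.preprocess_input(images)
    return extractor.predict(x, verbose=0)

# With food images X_img / X_test and labels y / y_test (loading omitted):
# clf = SVC(kernel="linear").fit(features(X_img), y)
# print(clf.score(features(X_test), y_test))
```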


2020 ◽  
Vol 2 ◽  
Author(s):  
Carlos André Muñoz López ◽  
Satyajeet Bhonsale ◽  
Kristin Peeters ◽  
Jan F. M. Van Impe

Processing data that originates from uneven, multi-phase batches is a challenge in data-driven modeling. Training predictive and monitoring models requires the data to be in the right shape to be informative; only then can a model learn meaningful features that describe the deterministic variability of the process. The presence of multiple phases in the data, which display different correlation patterns and have an uneven duration from batch to batch, reduces the performance of data-driven modeling methods significantly. Phase identification and alignment is therefore a critical step and can lead to an unsuccessful modeling exercise if not applied correctly. In this paper, a novel approach is proposed to perform unsupervised phase identification and alignment based on the correlation patterns found in the data. Phase identification is performed via manifold learning using t-Distributed Stochastic Neighbor Embedding (t-SNE), a state-of-the-art machine learning algorithm for non-linear dimensionality reduction. Applying t-SNE to a reduced cross-correlation matrix of every batch with respect to a reference batch results in data clustering in the embedded space. Models based on support vector machines (SVMs) are trained to 1) reproduce the manifold learned via t-SNE, and 2) determine the membership of data points to a process phase. Compared to previously proposed clustering approaches for phase identification, this is an unsupervised, non-linear method. The perplexity parameter of the t-SNE algorithm can be interpreted as the estimated duration of the shortest phase in the process. The advantages of the proposed method are demonstrated through its application to an in-silico benchmark case study and to real industrial data from two unit operations in the large-scale production of an active pharmaceutical ingredient (API). The efficacy and robustness of the method are evidenced by the successful phase identification and alignment obtained for these three distinct processes, displaying smooth, sudden, and repetitive phase changes. Additionally, the low complexity of the method makes its online implementation feasible.
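A hedged, much-simplified sketch of the two-step scheme: windowed cross-correlation features per time point, a t-SNE embedding in which phases separate, cluster labels taken as phases, and an SVM fitted to reproduce the membership for new batches. The feature construction, window length, and all parameter values are assumptions, not the authors' exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# toy batch record: 300 time points x 6 sensors, correlation shifts mid-batch
batch = np.concatenate([rng.normal(size=(150, 6)),
                        rng.normal(size=(150, 6)) @ np.diag([3, 1, 1, 1, 1, 1])])

win = 30   # window length; plays a role akin to the perplexity guideline above
feats = np.array([np.corrcoef(batch[t:t + win].T)[np.triu_indices(6, 1)]
                  for t in range(len(batch) - win)])   # correlations per window

emb = TSNE(n_components=2, perplexity=win).fit_transform(feats)
phase = KMeans(n_clusters=2, n_init=10).fit_predict(emb)  # clusters = phases
svm = SVC().fit(feats, phase)   # reusable phase-membership model for new data
print(np.bincount(phase))
```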

