Key Feature Selecting in the Clean Oil Refinery Process Based on a Two-Stage Data Mining Framework

2021 ◽  
Author(s):  
X Lin ◽  
J Y Ma ◽  
F X Lin ◽  
D Q Wang ◽  
X S Xiao

Maintaining the octane number while reducing the proportion of harmful substances in the heavy oil fluid catalytic cracking process yields both environmental and economic benefits. Although digital hardware collects enormous amounts of processing data, gasoline refiners still struggle to analyze these data for production process control because of the large number of ambiguous intermediate operating variables. This paper proposes a two-stage data mining framework that integrates the strengths of Ridge regression and Pearson correlation analysis to extract a limited set of key features. Unlike traditional recursive feature elimination methods, we pay particular attention to the correlation between every pair of features in the result. Two stopping criteria guarantee that refining standards are fulfilled and that the computation terminates in a finite number of steps. A real-world case study containing 325 samples, 13 quality indicators and 354 operating variables testifies to the validity and practicality of our algorithm. The results show that only 13 features (operating variables) are significant for rational process design and improved process control.
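A minimal sketch of the kind of two-stage screening the abstract describes: Ridge regression ranks operating variables, then a Pearson correlation check prunes highly correlated pairs until a fixed feature budget is met. The thresholds, variable names, and synthetic data are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

def select_key_features(X, y, n_keep=13, corr_limit=0.8):
    """Return indices of a small, weakly correlated set of influential features."""
    Xs = StandardScaler().fit_transform(X)
    ridge = Ridge(alpha=1.0).fit(Xs, y)
    ranked = np.argsort(-np.abs(ridge.coef_))       # stage 1: importance ranking

    selected = []
    for idx in ranked:                               # stage 2: pairwise Pearson check
        if all(abs(np.corrcoef(Xs[:, idx], Xs[:, j])[0, 1]) < corr_limit
               for j in selected):
            selected.append(idx)
        if len(selected) == n_keep:                  # stop criterion: fixed feature budget
            break
    return selected

# Synthetic data shaped like the case study (325 samples, 354 operating variables).
rng = np.random.default_rng(0)
X = rng.normal(size=(325, 354))
y = X[:, :5] @ rng.normal(size=5) + 0.1 * rng.normal(size=325)
print(select_key_features(X, y))
```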

1997 ◽  
Vol 77 (03) ◽  
pp. 436-439 ◽  
Author(s):  
Armando Tripodi ◽  
Barbara Negri ◽  
Rogier M Bertina ◽  
Pier Mannuccio Mannucci

Summary: The factor V (FV) mutation Q506 that causes resistance to activated protein C (APC) is the genetic defect most frequently associated with venous thrombosis. The laboratory diagnosis can be made by DNA analysis or by clotting tests that measure the degree of prolongation of plasma clotting time upon addition of APC. Home-made and commercial methods are available, but no comparative evaluation of their diagnostic efficacy has so far been reported. Eighty frozen coded plasma samples from carriers and non-carriers of the FV: Q506 mutation, diagnosed by DNA analysis, were sent to 8 experienced laboratories that were asked to analyze these samples blindly with their own APC resistance tests. The APTT methods were highly variable in their capacity to discriminate between carriers and non-carriers, but this capacity increased dramatically when samples were diluted with FV-deficient plasma before analysis, bringing the sensitivity and specificity of these tests to 100%. The best discrimination was obtained with methods in which fibrin formation is triggered by the addition of activated factor X or Russell viper venom. In conclusion, this study provides evidence that some coagulation tests can distinguish carriers of the FV: Q506 mutation from non-carriers as well as the DNA test. They are inexpensive and easy to perform. Their use in large-scale clinical trials should help determine the medical and economic benefits of screening healthy individuals for the mutation before they are exposed to such risk factors for venous thrombosis as surgery, pregnancy and oral contraceptives.


2020 ◽  
Vol 39 (6) ◽  
pp. 8823-8830
Author(s):  
Jiafeng Li ◽  
Hui Hu ◽  
Xiang Li ◽  
Qian Jin ◽  
Tianhao Huang

Under the influence of COVID-19, the economic benefits of shale gas development are greatly affected. With the large-scale development and utilization of shale gas in China, it is increasingly important to assess the economic impact of shale gas development. Therefore, this paper proposes a method for predicting the production of shale gas reservoirs, using a back-propagation (BP) neural network to nonlinearly fit reservoir reconstruction data and obtain shale gas well production forecasting models. Experiments show that, compared with the traditional BP neural network, the proposed method effectively improves the accuracy and stability of the prediction. There is a nonlinear correlation between reservoir reconstruction data and gas well production, which traditional linear prediction methods cannot capture.
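A hedged sketch of the core idea: fit a BP-style neural network to reservoir reconstruction features to predict well production, here using scikit-learn's MLPRegressor as a stand-in for the paper's improved BP model. The feature set, network size, and synthetic data are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(42)
# Synthetic stand-in for reservoir reconstruction data (e.g. fracture length,
# proppant volume, fluid volume, stage count) and observed production.
X = rng.normal(size=(200, 4))
y = np.exp(0.5 * X[:, 0] + 0.3 * X[:, 1]) + 0.1 * rng.normal(size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0),
)
model.fit(X_tr, y_tr)
print("R^2 on held-out wells:", round(model.score(X_te, y_te), 3))
```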


2019 ◽  
Author(s):  
Mohammad Atif Faiz Afzal ◽  
Mojtaba Haghighatlari ◽  
Sai Prasad Ganesh ◽  
Chong Cheng ◽  
Johannes Hachmann

We present a high-throughput computational study to identify novel polyimides (PIs) with exceptional refractive index (RI) values for use as optical or optoelectronic materials. Our study utilizes an RI prediction protocol based on a combination of first-principles and data modeling developed in previous work, which we employ on a large-scale PI candidate library generated with the ChemLG code. We deploy the virtual screening software ChemHTPS to automate the assessment of this extensive pool of PI structures in order to determine the performance potential of each candidate. This rapid and efficient approach yields a number of highly promising lead compounds. Using the data mining and machine learning program package ChemML, we analyze the top candidates with respect to prevalent structural features and feature combinations that distinguish them from less promising ones. In particular, we explore the utility of various strategies that introduce highly polarizable moieties into the PI backbone to increase its RI. The derived insights provide a foundation for rational and targeted design that goes beyond traditional trial-and-error searches.
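A schematic, library-agnostic sketch of the screening logic described above: score every candidate with an RI prediction model and keep the top leads for structural analysis. The candidate list and predict_refractive_index() are hypothetical placeholders, not the ChemLG/ChemHTPS/ChemML APIs.

```python
from typing import Callable, List, Tuple

def screen_candidates(smiles_list: List[str],
                      predict_refractive_index: Callable[[str], float],
                      top_k: int = 100) -> List[Tuple[str, float]]:
    """Rank candidate polyimides by predicted RI and return the best leads."""
    scored = [(smi, predict_refractive_index(smi)) for smi in smiles_list]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Usage with a toy surrogate model (longer backbones score higher here purely
# for demonstration purposes).
toy_library = ["c1ccccc1", "c1ccc2ccccc2c1", "c1ccc2cc3ccccc3cc2c1"]
leads = screen_candidates(toy_library, predict_refractive_index=lambda s: len(s) / 20.0)
print(leads)
```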


Author(s):  
Krzysztof Jurczuk ◽  
Marcin Czajkowski ◽  
Marek Kretowski

Abstract: This paper concerns the evolutionary induction of decision trees (DT) for large-scale data. Such a global approach is one of the alternatives to top-down inducers. It searches for the tree structure and the tests simultaneously and thus, in many situations, improves the prediction and size of the resulting classifiers. However, this population-based, iterative approach can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with greedy systems on large-scale data. For this purpose, we propose a novel multi-GPU approach. It combines knowledge of global DT induction and evolutionary algorithm parallelization with efficient utilization of GPU memory and computing resources. The searches for the tree structure and tests are performed on a CPU, while the fitness calculations are delegated to GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets. In both cases, the obtained acceleration is very satisfactory. The solution is able to process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on the convergence and speedup of the evolutionary search is also shown. When the number of GPUs grows, nearly linear scalability is observed, which suggests that data size boundaries for evolutionary DT mining are fading.
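A hedged sketch of the data-parallel fitness idea: the dataset is split into chunks, each chunk's classification errors are counted independently (on separate GPUs in the paper; a process pool stands in here), and the partial counts are reduced into one fitness value for a candidate tree. The tiny decision-stump "tree" is an illustrative assumption only.

```python
import numpy as np
from multiprocessing import Pool

def chunk_errors(args):
    """Count misclassifications of a decision stump on one data chunk."""
    X_chunk, y_chunk, feature, threshold, left_label, right_label = args
    pred = np.where(X_chunk[:, feature] <= threshold, left_label, right_label)
    return int(np.sum(pred != y_chunk))

def parallel_fitness(X, y, stump, n_workers=4):
    chunks = np.array_split(np.arange(len(y)), n_workers)
    jobs = [(X[idx], y[idx], *stump) for idx in chunks]
    with Pool(n_workers) as pool:
        partial = pool.map(chunk_errors, jobs)     # map: per-chunk evaluation
    return sum(partial) / len(y)                   # reduce: global error rate

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100_000, 10))
    y = (X[:, 3] > 0.2).astype(int)
    stump = (3, 0.2, 0, 1)                         # (feature, threshold, left, right)
    print("error rate:", parallel_fitness(X, y, stump))
```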


2021 ◽  
Vol 13 (3) ◽  
pp. 1274
Author(s):  
Loau Al-Bahrani ◽  
Mehdi Seyedmahmoudian ◽  
Ben Horan ◽  
Alex Stojcevski

Few non-traditional optimization techniques have been applied to the dynamic economic dispatch (DED) of large-scale thermal power units (TPUs), e.g., 1000 TPUs, while considering the effects of valve-point loading with ramp-rate limitations. This is a complicated, multi-modal problem. In this investigation, a novel optimization technique, namely a multi-gradient particle swarm optimization (MG-PSO) algorithm with two stages for exploring and exploiting the search space, is employed as the optimization tool. The M particles (explorers) in the first stage explore new neighborhoods, whereas the M particles (exploiters) in the second stage exploit the best neighborhood. The negative gradient variation of the M particles in both stages balances the global and local search capabilities. The algorithm is validated on five medium-scale to very large-scale power systems. The MG-PSO algorithm effectively reduces the difficulty of handling the large-scale DED problem, and simulation results confirm its suitability for such a complicated multi-objective problem in terms of fitness performance and consistency. The algorithm is also applied to estimate the generation required over 24 h to meet load demand changes. This investigation provides useful technical references for economic dispatch operators to update their power system programs in order to achieve economic benefits.
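A simplified sketch of the two-stage explore/exploit idea behind MG-PSO, applied to a toy cost function rather than a full DED model with valve-point and ramp-rate constraints. The swarm sizes, bounds, and shrinking search radius are illustrative assumptions.

```python
import numpy as np

def cost(p):
    # Stand-in for a non-smooth fuel-cost surface (not an actual DED objective).
    return np.sum(p ** 2 + 10 * np.abs(np.sin(p)), axis=-1)

def mg_pso(dim=30, m=40, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = -50.0, 50.0

    # Stage 1 (explorers): wide random search over the whole space.
    explorers = rng.uniform(lo, hi, size=(m, dim))
    best = explorers[np.argmin(cost(explorers))]

    # Stage 2 (exploiters): particles concentrated around the best neighborhood,
    # with a search radius that decays over iterations.
    radius = (hi - lo) / 4
    for _ in range(iters):
        exploiters = np.clip(best + rng.uniform(-radius, radius, size=(m, dim)), lo, hi)
        candidate = exploiters[np.argmin(cost(exploiters))]
        if cost(candidate[None])[0] < cost(best[None])[0]:
            best = candidate
        radius *= 0.98                              # gradually tighten the search
    return best, cost(best[None])[0]

best, val = mg_pso()
print("best cost:", round(val, 3))
```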


Author(s):  
Lu Chen ◽  
Handing Wang ◽  
Wenping Ma

Abstract: Real-world optimization applications in complex systems always contain multiple factors to be optimized, which can be formulated as multi-objective optimization problems. These problems have been solved by many evolutionary algorithms such as MOEA/D, NSGA-III, and KnEA. However, when the numbers of decision variables and objectives increase, the computation costs of these algorithms become unaffordable. To reduce the high computation cost on large-scale many-objective optimization problems, we propose a two-stage framework. The first stage of the proposed algorithm combines a multi-tasking optimization strategy with a bi-directional search strategy, where the original problem is reformulated as a multi-tasking optimization problem in the decision space to enhance convergence. To improve diversity, in the second stage the proposed algorithm applies multi-tasking optimization to a number of sub-problems based on reference points in the objective space. To show the effectiveness of the proposed algorithm, we test it on the DTLZ and LSMOP problems and compare it with existing algorithms; it outperforms the compared algorithms in most cases and shows its advantage in both convergence and diversity.
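A minimal sketch of the second-stage idea: uniformly spread reference points in the objective space define sub-problems, and each candidate solution is assigned to the reference direction it is closest to (by perpendicular distance), which maintains diversity across sub-problems. The 3-objective setup and point counts are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def das_dennis_points(n_obj=3, divisions=6):
    """Generate simplex-lattice (Das-Dennis) reference points."""
    pts = []
    for cuts in combinations(range(divisions + n_obj - 1), n_obj - 1):
        prev, coords = -1, []
        for c in cuts:
            coords.append(c - prev - 1)
            prev = c
        coords.append(divisions + n_obj - 2 - prev)
        pts.append(np.array(coords) / divisions)
    return np.array(pts)

def assign_to_subproblems(objectives, ref_points):
    """Attach each solution to its nearest reference direction."""
    norms = ref_points / np.linalg.norm(ref_points, axis=1, keepdims=True)
    proj = objectives @ norms.T                    # scalar projections onto each direction
    perp = np.linalg.norm(objectives[:, None, :] - proj[:, :, None] * norms[None], axis=2)
    return np.argmin(perp, axis=1)

refs = das_dennis_points()
objs = np.random.default_rng(0).random((10, 3))
print(assign_to_subproblems(objs, refs))
```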


Energies ◽  
2021 ◽  
Vol 14 (3) ◽  
pp. 625
Author(s):  
Xinyu Wu ◽  
Rui Guo ◽  
Xilong Cheng ◽  
Chuntian Cheng

Simulation-optimization methods are often used to derive operation rules for large-scale hydropower reservoir systems. Solving simulation-optimization models is complex and time-consuming, because many interconnected variables must be optimized and the objective functions must be computed through simulation over many periods. Since global solutions are seldom obtained, the initial solutions are important to solution quality. In this paper, a two-stage method is proposed to derive operation rules for large-scale hydropower systems. In the first stage, the optimal operation model is simplified and solved using sampling stochastic dynamic programming (SSDP). In the second stage, the optimal operation model is solved by a genetic algorithm, taking the SSDP solution as an individual in the initial population. The proposed method is applied to a hydropower system in Southwest China, composed of the cascaded reservoir systems of the Hongshui River, Lancang River, and Wu River. The numerical results show that the two-stage method can significantly improve the solution within an acceptable solution time.
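A sketch of the seeding idea in the second stage: a genetic algorithm whose initial population contains the SSDP solution as one individual, so the search starts from a good operating rule rather than a fully random one. The decision vector (storage targets per period) and the toy objective are illustrative assumptions, not the paper's simulation model.

```python
import numpy as np

def negative_energy(rule):
    # Stand-in for simulated hydropower benefit (to be maximized, so negated).
    return -np.sum(np.sqrt(np.clip(rule, 0, None)))

def seeded_ga(ssdp_rule, pop_size=50, generations=200, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    dim = len(ssdp_rule)
    pop = rng.random((pop_size, dim))
    pop[0] = ssdp_rule                                      # seed the SSDP result
    for _ in range(generations):
        fitness = np.array([negative_energy(ind) for ind in pop])
        parents = pop[np.argsort(fitness)[: pop_size // 2]]            # truncation selection
        children = parents + sigma * rng.normal(size=parents.shape)    # Gaussian mutation
        pop = np.vstack([parents, np.clip(children, 0, 1)])
    return pop[np.argmin([negative_energy(ind) for ind in pop])]

ssdp_solution = np.full(12, 0.6)                            # e.g. monthly storage targets
best_rule = seeded_ga(ssdp_solution)
print(best_rule.round(2))
```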


Energies ◽  
2019 ◽  
Vol 12 (18) ◽  
pp. 3586 ◽  
Author(s):  
Sizhou Sun ◽  
Jingqi Fu ◽  
Ang Li

With the large-scale exploitation and utilization of wind power, the problems caused by the highly stochastic and random character of wind speed drive researchers to develop more reliable and precise wind power forecasting (WPF) models. To obtain better prediction accuracy, this study proposes a novel compound WPF strategy based on the optimal integration of four base forecasting engines. In the forecasting process, density-based spatial clustering of applications with noise (DBSCAN) is first employed to identify meaningful information and discard abnormal wind power data. To eliminate the adverse influence of missing data on the forecasting accuracy, the Lagrange interpolation method is applied to obtain corrected values for the missing points. Then, a two-stage decomposition (TSD) method comprising ensemble empirical mode decomposition (EEMD) and the wavelet transform (WT) is used to preprocess the wind power data. In the decomposition process, the empirical wind power data are disassembled into different intrinsic mode functions (IMFs) and one residual (Res) by EEMD, and the highest-frequency component IMF1 is further broken into different components by WT. After determination of the input matrix by a partial autocorrelation function (PACF) and normalization into [0, 1], these decomposed components are used as the input variables of all the base forecasting engines, namely least squares support vector machine (LSSVM), wavelet neural networks (WNN), extreme learning machine (ELM) and autoregressive integrated moving average (ARIMA), to perform multistep WPF. To avoid local optima and improve the forecasting performance, the parameters of LSSVM, ELM, and WNN are tuned by the backtracking search algorithm (BSA). On this basis, the BSA is also employed to optimize the weighting coefficients of the individual forecasts produced by the four base forecasting engines, generating an ensemble forecast. Finally, case studies for a wind farm in China are carried out to assess the proposed forecasting strategy.
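A sketch of the final ensemble step: the forecasts of the four base engines are combined with non-negative weights that sum to one, tuned to minimize validation error. A simple random search stands in here for the backtracking search algorithm (BSA) used in the paper; the synthetic forecasts are illustrative assumptions.

```python
import numpy as np

def ensemble_weights(base_forecasts, target, n_trials=5000, seed=0):
    """base_forecasts: (n_models, n_samples) array of individual predictions."""
    rng = np.random.default_rng(seed)
    best_w, best_err = None, np.inf
    for _ in range(n_trials):
        w = rng.random(base_forecasts.shape[0])
        w /= w.sum()                                   # keep weights on the simplex
        err = np.sqrt(np.mean((w @ base_forecasts - target) ** 2))   # RMSE
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err

rng = np.random.default_rng(1)
truth = np.sin(np.linspace(0, 6, 200))
# Four noisy stand-ins for the LSSVM, WNN, ELM and ARIMA forecasts.
forecasts = np.stack([truth + 0.1 * rng.normal(size=200) for _ in range(4)])
weights, rmse = ensemble_weights(forecasts, truth)
print(weights.round(3), round(rmse, 4))
```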

