Key Feature Selecting in the Clean Oil Refinery Process Based on a Two-Stage Data Mining Framework
Maintaining the octane number while reducing the proportion of harmful substances in the heavy-oil fluid catalytic cracking process brings both environmental and economic benefits. Although digital hardware now collects large volumes of processing data, gasoline refiners still struggle to analyze it for production process control because of the large number of ambiguous intermediate operating variables. This paper proposes a two-stage data mining framework that integrates the strengths of Ridge regression and Pearson correlation analysis to extract a size-limited group of key features. Unlike traditional recursive feature elimination methods, we pay more attention to the pairwise correlations among the features in the result. Two stopping criteria ensure that refining standards are met and that the computation terminates in a finite number of steps. A real-world case study with 325 samples, 13 quality indicators, and 354 operating variables demonstrates the validity and practicality of our algorithm. The results show that only 13 features (operating variables) are significant to the rationality of process design and the improvement of process control.
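As a rough illustration of how a Ridge-then-Pearson selection could be organized, the following minimal sketch ranks features by the magnitude of their Ridge coefficients and then keeps a feature only if it is not strongly correlated with one already kept. It is not the algorithm of this paper: the function names, the penalty `alpha`, the cap `max_features`, the threshold `corr_limit`, and the synthetic data are all assumptions made for the example, and the two stopping rules shown (a feature cap and the end of the ranked list) are stand-ins for the paper's criteria, which the abstract does not specify.

```python
import math
import random

def standardize(col):
    # Scale a column to mean 0 and population standard deviation 1.
    m = sum(col) / len(col)
    s = math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
    return [(v - m) / s for v in col]

def pearson(a, b):
    # For standardized columns, the mean of products is the Pearson correlation.
    return sum(x * y for x, y in zip(a, b)) / len(a)

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ridge_weights(cols, y, alpha):
    # Closed-form Ridge solution: (X^T X + alpha * I) w = X^T y.
    p = len(cols)
    A = [[sum(a * b for a, b in zip(cols[i], cols[j])) + (alpha if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    rhs = [sum(a * b for a, b in zip(cols[i], y)) for i in range(p)]
    return solve(A, rhs)

def select_features(cols, y, alpha=1.0, max_features=2, corr_limit=0.9):
    # Stage 1 (assumed): rank features by absolute Ridge coefficient.
    zcols = [standardize(c) for c in cols]
    w = ridge_weights(zcols, standardize(y), alpha)
    ranked = sorted(range(len(cols)), key=lambda j: -abs(w[j]))
    # Stage 2 (assumed): skip features that correlate strongly with a kept one.
    kept = []
    for j in ranked:
        if len(kept) >= max_features:  # hypothetical stopping rule: feature cap
            break
        if all(abs(pearson(zcols[j], zcols[k])) < corr_limit for k in kept):
            kept.append(j)
    return kept

# Synthetic data (hypothetical): x2 nearly duplicates x0, x3 is pure noise.
random.seed(0)
n = 200
x0 = [random.gauss(0, 1) for _ in range(n)]
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [v + 0.01 * random.gauss(0, 1) for v in x0]
x3 = [random.gauss(0, 1) for _ in range(n)]
y = [3 * a + 2 * b + 0.1 * random.gauss(0, 1) for a, b in zip(x0, x1)]

kept = select_features([x0, x1, x2, x3], y)
print("kept feature indices:", kept)
```

In this toy setup the near-duplicate pair `x0` and `x2` shares the Ridge weight almost evenly, so the pairwise correlation check is what prevents both from entering the final set. This is the kind of redundancy that a ranking on coefficients alone would miss.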