Ladle furnace temperature prediction model based on large-scale data with random forest

2017 ◽  
Vol 4 (4) ◽  
pp. 770-774 ◽  
Author(s):  
Xiaojun Wang
Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
E. Zhu ◽  
D. Pi

To address the problems raised by the large-scale data collected at photovoltaic (PV) stations, namely high-dimensional prediction model inputs, noise, slow model convergence, and poor precision, this study proposes a power prediction model that combines Candid Covariance-free Incremental Principal Component Analysis (CCIPCA) with a Long Short-Term Memory (LSTM) network. The model uses correlation coefficients to evaluate the factors that affect PV generation and to identify the most critical one. It then uses CCIPCA to reduce the dimension of the very large-scale PV data to the factor dimension. This avoids the costly covariance-matrix computation required by algorithms such as Principal Component Analysis (PCA) and, to some extent, removes the noise introduced by PV data acquisition and transmission equipment such as sensors. The dimension-reduced data improve the training and convergence speed of the LSTM. PV generation data for one power station over a period of time were collected from SolarGIS as sample data. The model is compared with a Markov chain power generation prediction model and a GA-BP power generation prediction model. The experimental results indicate that the generation prediction error of the model is less than 3%.
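The dimension-reduction step described in this abstract can be illustrated with a minimal sketch of the standard CCIPCA update, which refines each principal direction one sample at a time without ever forming a covariance matrix. This is not the authors' code: the function name `ccipca`, the `amnesic` parameter default, and the synthetic data in the usage note are assumptions made for illustration, and the LSTM stage is not shown.

```python
import numpy as np

def ccipca(samples, k, amnesic=0.0):
    """Estimate the top-k principal directions incrementally (illustrative sketch).

    samples: array of shape (n_samples, dim), processed one row at a time.
    amnesic: assumed forgetting parameter; 0 weights all samples equally.
    """
    samples = np.asarray(samples, dtype=float)
    n_samples, dim = samples.shape
    v = np.zeros((k, dim))   # unnormalised direction estimates
    mean = np.zeros(dim)     # running mean used for centring
    eps = 1e-12

    for n in range(1, n_samples + 1):
        x = samples[n - 1]
        mean += (x - mean) / n
        u = x - mean                      # centred sample
        for i in range(min(k, n)):
            norm = np.linalg.norm(v[i])
            if i == n - 1 or norm < eps:
                v[i] = u                  # initialise with the current residual
            else:
                w_old = (n - 1 - amnesic) / n
                w_new = (1 + amnesic) / n
                # Covariance-free update: no covariance matrix is ever built.
                v[i] = w_old * v[i] + w_new * u * (u @ v[i]) / norm
            norm = np.linalg.norm(v[i])
            if norm < eps:
                break
            direction = v[i] / norm
            # Remove the part of u explained by this direction
            # before estimating the next one.
            u = u - (u @ direction) * direction

    norms = np.linalg.norm(v, axis=1, keepdims=True)
    return v / np.where(norms > 0, norms, 1.0)
```

As a usage example on synthetic data (not the SolarGIS data), `ccipca(np.random.default_rng(0).normal(size=(2000, 3)) * [5.0, 1.0, 0.2], k=2)` returns two unit-length directions, the first of which should lie close to the high-variance first axis. Projecting centred data onto these directions gives the reduced inputs that a downstream sequence model would consume.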


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Xi Shi ◽  
Gorana Nikolic ◽  
Gorka Epelde ◽  
Mónica Arrúe ◽  
Joseba Bidaurrazaga Van-Dierdonck ◽  
...  

Abstract
Background: The increasing prevalence of childhood obesity makes it essential to study its risk factors using a population-representative sample that covers a broader range of health topics, in order to support better preventive policies and interventions. This study aims to develop an ensemble feature selection framework for large-scale data that identifies risk factors of childhood obesity with good interpretability and clinical relevance.
Methods: We analyzed data collected from 426,813 children under 18 during 2000–2019. A BMI above the 90th percentile for children of the same age and gender was defined as overweight. An ensemble feature selection framework, the Bagging-based Feature Selection framework integrating MapReduce (BFSMR), was proposed to identify risk factors. The framework comprises 5 models (filter with mutual information, SVM-RFE, Lasso, Ridge, and Random Forest) drawn from filter, wrapper, and embedded feature selection methods. Each feature selection model identified 10 variables based on variable importance. Based on accuracy, F-score, and model characteristics, the models were classified into 3 levels with different weights: Lasso/Ridge, Filter/SVM-RFE, and Random Forest. A voting strategy was applied to aggregate the selected features, taking both feature weights and model weights into consideration. We compared our voting strategy with two others for selecting top-ranked features in terms of 6 dimensions of interpretability.
Results: Our method performed best at selecting features with good interpretability and clinical relevance. The top 10 features selected by BFSMR are age, sex, birth year, breastfeeding type, smoking habit and diet-related knowledge of both children and mothers, exercise, and mother's systolic blood pressure.
Conclusion: Our framework provides a solution for identifying a diverse and interpretable feature set without model bias from large-scale data, which can help identify risk factors of childhood obesity, and potentially of some other diseases, for future interventions or policies.
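The voting step in the Methods can be sketched as follows. The abstract does not give the actual weights or scoring formula, so everything here is an assumption for illustration: the function name `weighted_vote`, the rank-based feature weight, and the example model weights are hypothetical and are not taken from BFSMR.

```python
from collections import defaultdict

def weighted_vote(rankings, model_weights, top_n):
    """Aggregate per-model feature rankings into one list (illustrative sketch).

    rankings: dict mapping model name -> list of features, most important first.
    model_weights: dict mapping model name -> weight (hypothetical values).
    """
    scores = defaultdict(float)
    for model, ranked in rankings.items():
        m = len(ranked)
        for pos, feature in enumerate(ranked):
            # Assumed feature weight: 1.0 for the top rank, decreasing linearly.
            feature_weight = (m - pos) / m
            scores[feature] += model_weights[model] * feature_weight
    # Highest score first; ties are broken alphabetically for determinism.
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return ordered[:top_n]

# Hypothetical example: two models, with the first weighted more heavily.
rankings = {
    "lasso": ["age", "sex", "diet"],
    "random_forest": ["sex", "exercise", "age"],
}
model_weights = {"lasso": 3.0, "random_forest": 1.0}
selected = weighted_vote(rankings, model_weights, top_n=3)
```

A feature chosen by several models, or ranked highly by a heavily weighted model, accumulates a larger score, which is the general idea behind combining feature weights with model weights.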


2009 ◽  
Vol 28 (11) ◽  
pp. 2737-2740
Author(s):  
Xiao ZHANG ◽  
Shan WANG ◽  
Na LIAN

2016 ◽  
Author(s):  
John W. Williams ◽  
Simon Goring ◽  
Eric Grimm ◽  
Jason McLachlan

2008 ◽  
Vol 9 (10) ◽  
pp. 1373-1381 ◽  
Author(s):  
Ding-yin Xia ◽  
Fei Wu ◽  
Xu-qing Zhang ◽  
Yue-ting Zhuang
