An interpretable prediction method for university student academic crisis warning

Author(s):  
Zhai Mingyu ◽  
Wang Sutong ◽  
Wang Yanzhang ◽  
Wang Dujuan

Data-driven techniques can comprehensively improve the quality of talent training at universities by discovering potential academic problems and proposing solutions. We propose an interpretable prediction method for university student academic crisis warning, which consists of K-prototype-based student portrait construction and Catboost–SHAP-based academic achievement prediction. The academic crisis warning experiment is carried out on desensitized multi-source student data from a university. The experimental results show that the proposed method has significant advantages over common machine learning algorithms. In terms of achievement prediction, the mean square error (MSE) reaches 24.976, the mean absolute error (MAE) reaches 3.551, and the coefficient of determination ($R^{2}$) reaches 80.3%. The student portrait and the Catboost–SHAP method are used for visual analysis of the factors behind academic achievement, providing intuitive decision support and guidance for education administrators.

Materials ◽  
2020 ◽  
Vol 13 (5) ◽  
pp. 1072 ◽  
Author(s):  
Dong Van Dao ◽  
Hai-Bang Ly ◽  
Huong-Lan Thi Vu ◽  
Tien-Thinh Le ◽  
Binh Thai Pham

The development of Foamed Concrete (FC) and continual advances in fabrication technology have paved the way for many promising civil engineering applications. Nevertheless, the design of FC requires a large number of experiments to determine the appropriate Compressive Strength (CS). Machine learning algorithms have been employed to take advantage of existing experimental databases, but model performance can still be improved. In this study, the performance of an Artificial Neural Network (ANN) was thoroughly analyzed for predicting the 28-day CS of FC. Monte Carlo simulations (MCS) were used to statistically analyze the convergence of the modeled results under the effect of random sampling strategies and the network structures selected. Various statistical measures such as the Coefficient of Determination (R²), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) were used to validate model performance. The results show that the ANN is a highly efficient predictor of the CS of FC, achieving a maximum R² of 0.976 on the training part and 0.972 on the testing part with the optimized C-ANN-[3–4–5–1] structure, a result comparable with previously published studies. In addition, a sensitivity analysis using Partial Dependence Plots (PDP) over 1000 MCS was performed to interpret the relationship between the input parameters and the 28-day CS of FC. Dry density was found to be the variable with the highest impact on the predicted CS of FC. The results presented could facilitate and enhance the use of C-ANN in other civil engineering-related problems.


Energies ◽  
2020 ◽  
Vol 13 (20) ◽  
pp. 5420
Author(s):  
Alexandre Lucas ◽  
Konstantinos Pegios ◽  
Evangelos Kotsakis ◽  
Dan Clarke

Price forecasting has gained attention over the last few years with the growth of aggregators and the general opening of the European electricity markets. Market participants manage a tradeoff between bidding in a lower-price market (day-ahead) with typically higher volume and aiming for a lower-volume market with potentially higher returns (balance energy market). Companies try to forecast the extremes of revenues or prices in order to manage risk and opportunity, assigning their assets in an optimal way. It is thought that, in general, electricity markets follow quasi-deterministic principles rather than being based on speculation, hence the desire to forecast the price from variables that can describe the outcome of the market. Many studies address this problem with a statistical approach or with multiple-variable regressions, but they very often focus only on time series analysis. In 2019, the Loss of Load Probability (LOLP) was made available in the UK for the first time. Taking this opportunity, this study focuses on five LOLP variables (with different time-ahead estimations) and other quasi-deterministic variables to explain the price behavior with a multi-variable regression model. These include base production, system load, solar and wind generation, seasonality, day-ahead price and imbalance volume contributions. Three machine-learning algorithms were tested for performance: Gradient Boosting (GB), Random Forest (RF) and XGBoost. XGBoost showed the highest performance and so was chosen for the implementation of the real-time forecast step. The model returns a Mean Absolute Error (MAE) of 7.89 £/MWh, a coefficient of determination (R² score) of 76.8% and a Mean Squared Error (MSE) of 124.74. The variables that contribute the most to the model are the Net Imbalance Volume, the LOLP (aggregated), the month and the De-rated margins (aggregated), with 28.6%, 27.5%, 14.0%, and 8.9% of the feature importance weight respectively.


2018 ◽  
Vol 20 (suppl_2) ◽  
pp. i167-i167
Author(s):  
Hannah McAtee ◽  
Sheila Barron ◽  
Natalie Denburg ◽  
Amanda Grafft ◽  
Timothy Ginader ◽  
...  

2020 ◽  
Vol 5 (2) ◽  
pp. 183-186
Author(s):  
Ledisi Giok Kabari ◽  
Marcus B. Chigoziri ◽  
Joseph Eneotu

In this study, we discuss various machine learning algorithms and architectures suitable for the Nigerian Naira exchange rate forecast. Our analyses were focused on the exchange rates of the British Pounds, US Dollars and the Euro against the Naira. The exchange rate data was sourced from the Central Bank of Nigeria. The performances of the algorithms were evaluated using Mean Squared Error, Root Mean Squared Error, Mean Absolute Error and the coefficient of determination (R-Squared score). Finally, we compared the performances of these algorithms in forecasting the exchange rates.
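The four evaluation measures named in this abstract follow standard definitions, and a minimal sketch of them may help when comparing the figures reported across these entries. The function name `regression_metrics` and the sample values are illustrative assumptions and are not taken from any of the papers.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R-squared for two equal-length sequences.

    Hypothetical helper for illustration; standard textbook definitions.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),                  # same units as the target
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Made-up values, only to show the call.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

Because RMSE is the square root of MSE, it is expressed in the units of the predicted quantity, which makes it directly comparable with MAE.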


2021 ◽  
Vol 40 (1) ◽  
pp. 947-972
Author(s):  
Samih M. Mostafa

Data preprocessing is a core step in data mining. Preprocessing involves handling missing values, outlier and noise removal, data normalization, etc. The problem with existing methods for handling missing values is that they operate on the whole dataset, ignoring the characteristics of the data (e.g., similarities and differences between cases). This paper focuses on handling missing values using machine learning methods that take the characteristics of the data into account. The proposed preprocessing method clusters the data, then imputes the missing values in each cluster using the data belonging to that cluster rather than the whole dataset. The author performed a comparative study of the proposed method and ten popular imputation methods, namely mean, median, mode, KNN, IterativeImputer, IterativeSVD, Softimpute, Mice, Forimp, and Missforest. The experiments were done on four datasets with different numbers of clusters, sizes, and shapes. The empirical study showed better effectiveness in terms of imputation time, Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and coefficient of determination (R² score) (i.e., the similarity of the original removed value to the imputed one).
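The general idea of imputing within clusters rather than over the whole dataset can be sketched as follows. This is not the author's method: it assumes the cluster labels have already been produced by some clustering step (not shown), and it fills each gap with the column mean of the row's own cluster, which is only one of many possible per-cluster imputers. The name `impute_by_cluster` and the sample data are hypothetical.

```python
from collections import defaultdict


def impute_by_cluster(rows, labels):
    """Fill None entries with the column mean of the row's own cluster.

    rows:   list of equal-length lists of floats, with None marking a gap
    labels: one cluster label per row (assumed to be given)
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    # First pass: accumulate per-(cluster, column) sums over observed values.
    for row, label in zip(rows, labels):
        for col, value in enumerate(row):
            if value is not None:
                sums[(label, col)] += value
                counts[(label, col)] += 1
    # Second pass: replace each gap with the mean of its cluster and column.
    imputed = []
    for row, label in zip(rows, labels):
        new_row = []
        for col, value in enumerate(row):
            if value is None:
                if counts[(label, col)] == 0:
                    raise ValueError(
                        f"no observed values for cluster {label}, column {col}"
                    )
                value = sums[(label, col)] / counts[(label, col)]
            new_row.append(value)
        imputed.append(new_row)
    return imputed


# Made-up data: two clusters with very different scales.
data = [[1.0, None], [3.0, 4.0], [10.0, 20.0], [None, 30.0]]
filled = impute_by_cluster(data, [0, 0, 1, 1])
```

In this toy data a whole-dataset mean would pull the gap in the first row toward the values of the other cluster, whereas the per-cluster mean uses only values from similar rows.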


2021 ◽  
pp. 1-15
Author(s):  
O. Basturk ◽  
C. Cetek

In this study, prediction of aircraft Estimated Time of Arrival (ETA) is proposed using machine learning algorithms. Accurate prediction of ETA is important for management of delay and air traffic flow, runway assignment, gate assignment, collaborative decision making (CDM), coordination of ground personnel and equipment, and optimisation of the arrival sequence. Machine learning is able to learn from experience and make predictions with weak assumptions or no assumptions at all. In the proposed approach, general flight information, trajectory data and weather data were obtained from different sources in various formats. Raw data were converted to tidy data and inserted into a relational database. To obtain the features for training the machine learning models, the data were explored, cleaned and transformed into convenient features. New features were also derived from the available data. Random forests and deep neural networks were used to train the machine learning models. Both models can predict the ETA with a mean absolute error (MAE) of less than 6 min after departure, and less than 3 min after terminal manoeuvring area (TMA) entrance. Additionally, a web application was developed to dynamically predict the ETA using the proposed models.

