Prediction Poverty Levels of College Students Using a Machine Learning Model

Author(s):  
Wang Sheng ◽  
Shi Yumei

Abstract Nowadays, poverty-stricken college students have become a special group among college students and account for a relatively high proportion of them. Accurately identifying the poverty levels of college students and providing funding accordingly is a new problem for universities. In this manuscript, a novel model that combines Random Forest with Principal Component Analysis (RF-PCA) is proposed to predict the poverty levels of college students. To build the model, data were first collected to establish a dataset covering 4 classes of poverty level and 21 features of poverty-stricken college students. Feature dimension reduction then proceeded in two steps: first, the top 16 features were selected by feature ranking, according to the Gini importance and SHapley Additive exPlanations (SHAP) values computed from a Random Forest (RF); second, feature extraction through Principal Component Analysis (PCA) reduced these to 11 dimensions. Finally, confusion matrices and receiver operating characteristic (ROC) curves were used to evaluate the performance of the proposed model, which achieved an accuracy of 78.61%. Compared with seven other classification algorithms, the model has higher prediction accuracy, and the results suggest that it has great potential for identifying the poverty levels of college students.
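
As a rough illustration of the evaluation step described in the abstract above, the following sketch builds a multi-class confusion matrix and derives overall accuracy from it. The labels and predictions are made-up toy values for four hypothetical poverty levels, not the study's data, and the function names are assumptions introduced here.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

# Toy data: four hypothetical poverty levels 0..3 (not from the paper).
labels = [0, 1, 2, 3]
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 3, 3, 3]

cm = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(cm)
print(cm)
print(acc)
```

Off-diagonal cells show which levels are confused with each other, which matters when neighbouring poverty levels are funded differently.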

2020 ◽  
Vol 11 (2) ◽  
Author(s):  
Osval Antonio Montesinos-López ◽  
Abelardo Montesinos-López ◽  
Brandon A Mosqueda-Gonzalez ◽  
José Cricelio Montesinos-López ◽  
José Crossa ◽  
...  

Abstract In genomic selection, choosing the statistical machine learning model is of paramount importance. In this paper, we present an application of a zero-altered random forest model with two versions (ZAP_RF and ZAPC_RF) to deal with excess zeros in count response variables. The proposed model was compared with the conventional random forest (RF) model and with the conventional Generalized Poisson Ridge regression (GPR) using two real datasets, and we found that, in terms of prediction performance, the proposed zero-inflated random forest model outperformed the conventional RF and GPR models.
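
The abstract above does not spell out the count distribution. As a reminder only, the standard zero-altered (hurdle) Poisson form that such models are usually built on is sketched below. The symbols are assumptions introduced here: \pi is the probability of a zero and \lambda is the rate of the zero-truncated count part. How the paper's ZAP_RF and ZAPC_RF variants estimate these quantities is not stated in the abstract.

```latex
\[
P(Y = y) =
\begin{cases}
\pi, & y = 0, \\[4pt]
(1 - \pi)\,\dfrac{e^{-\lambda}\lambda^{y}}{y!\,\bigl(1 - e^{-\lambda}\bigr)}, & y = 1, 2, \dots
\end{cases}
\qquad
\mathbb{E}[Y] = (1 - \pi)\,\frac{\lambda}{1 - e^{-\lambda}}
\]
```

The zero part and the positive-count part are modelled separately, which is what lets the model absorb more zeros than a plain Poisson would predict.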


Author(s):  
Dhilsath Fathima.M ◽  
S. Justin Samuel ◽  
R. Hari Haran

Aim: This work aims to develop an improved and robust machine learning model for predicting myocardial infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine learning based computer-aided analysis system for early and accurate prediction of MI, using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model is intended to help medical professionals predict myocardial infarction proficiently. Methods: The proposed model uses mean imputation to fill in missing values in the dataset, then applies principal component analysis (PCA) to extract the optimal features and enhance the performance of the classifiers. After PCA, the reduced features are partitioned into a training set and a test set: 70% of the data is used to train four widely used classifiers (support vector machine, k-nearest neighbor, logistic regression, and decision tree), and the remaining 30% is used to evaluate the output of the machine learning model using a confusion matrix, classifier accuracy, precision, sensitivity, F1-score, and the AUC-ROC curve. Results: The classifier outputs were evaluated using these performance measures. Logistic regression provided higher accuracy than the k-NN, SVM, and decision tree classifiers, and PCA performed well as a feature extraction method for enhancing the performance of the proposed model. From these analyses, we conclude that logistic regression has a good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves of the classifiers (Figures 4 and 5) show that logistic regression achieves a good AUC-ROC score of around 70%, compared with the k-NN and decision tree algorithms.
Conclusion: From the result analysis, we infer that the proposed machine learning model can act as an optimal decision-making system for predicting acute myocardial infarction at an earlier stage than existing machine learning based prediction models. It is capable of predicting the presence of acute myocardial infarction in humans from heart disease risk factors, in order to decide when to start lifestyle modification and medical treatment to prevent heart disease.
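
The preprocessing described in the Methods above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: missing values are represented as `None`, the PCA step is omitted, the function names are hypothetical, and the toy rows are invented.

```python
import random

def mean_impute(rows):
    """Replace None in each column with the mean of that column's observed values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it at train_fraction."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Invented toy rows with two features; None marks a missing value.
data = [[1.0, None], [3.0, 4.0], [None, 8.0], [5.0, 6.0], [7.0, 2.0],
        [9.0, None], [2.0, 1.0], [4.0, 3.0], [6.0, 5.0], [8.0, 7.0]]

filled = mean_impute(data)
train, test = train_test_split(filled)
print(len(train), len(test))
```

The sketch imputes before splitting, mirroring the order given in the abstract. A common alternative is to compute the imputation means on the training portion only, so that no test-set statistics leak into training.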


2021 ◽  
Vol 15 (1) ◽  
pp. 151-160
Author(s):  
Hemant P. Kasturiwale ◽  
Sujata N. Kale

The Autonomic Nervous System (ANS) controls the nervous system, and Heart Rate Variability (HRV) can be used as a diagnostic tool for heart defects. HRV indices can be classified as linear or nonlinear, and they are mostly used to measure the efficiency of the model. For predicting cardiac diseases, feature selection and extraction in the machine learning model are effective. Models available to date rely on HRV indices to predict cardiac diseases accurately, but they shed little light on the specifics of the indices, the selection process, or the stability of the model. The proposed model is developed considering all facets: electrocardiogram (ECG) amplitude, frequency components, sampling frequency, extraction methods, and acquisition techniques. The machine learning based model and its performance are tested using the standard BioSignal method, both on available data and on data obtained by the author. This unique model was developed by considering a vast number of mixture sets and more than four complex cardiac classes. The statistical analysis is performed on a variety of databases, such as MIT/BIH Normal Sinus Rhythm (NSR), MIT/BIH Arrhythmia (AR), MIT/BIH Atrial Fibrillation (AF), and the Peripheral Pulse Analyser, using feature compatibility techniques. The classifiers are trained for prediction with approximately 40000 sets of parameters. The proposed model reaches an average accuracy of 97.87 percent and is sensitive and precise. The best features are chosen from the different HRV features for use in classification. The model was checked under all possible subject scenarios, such as the raw database and non-ECG signals. In this sense, robustness is defined not only by the specificity parameter but also by other output measures.
Support Vector Machine (SVM), K-nearest Neighbour (KNN), and Ensemble AdaBoost (EAB) with Random Forest (RF) are tested in a 5% higher precision band and a lower band configuration. Random Forest produced better results, and its robustness has been established.
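
The abstract above does not list which HRV indices were used. As an illustration of what linear (time-domain) HRV indices look like, the sketch below computes two standard ones, SDNN and RMSSD, from a short series of RR intervals. The interval values are invented, and the choice of these two indices is an assumption made here.

```python
import math

def sdnn(rr_ms):
    """Sample standard deviation of RR intervals, in milliseconds."""
    n = len(rr_ms)
    mean = sum(rr_ms) / n
    return math.sqrt(sum((x - mean) ** 2 for x in rr_ms) / (n - 1))

def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences, in milliseconds."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Invented RR intervals in milliseconds (not from any of the cited databases).
rr = [800, 810, 790, 805, 795]

print(sdnn(rr))
print(rmssd(rr))
```

Indices like these would form part of the feature vector handed to classifiers such as SVM, KNN, or Random Forest.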


2021 ◽  
pp. 1-18
Author(s):  
Zhang Zixian ◽  
Liu Xuning ◽  
Li Zhixiang ◽  
Hu Hongqiang

The factors influencing coal and gas outburst are complex, and the accuracy and efficiency of current outburst prediction are not high. To obtain effective features from the influencing factors and achieve accurate, fast, dynamic prediction of coal and gas outburst, this article proposes an outburst prediction model that couples feature selection with an intelligently optimized classifier. First, given the redundancy and irrelevance among the influencing factors, the Boruta feature selection method is used to obtain the optimal feature subset. Second, the Apriori association rule mining method is used to mine the internal associations between the influencing factors, and to extract the strong association rules among the influencing factors and samples that affect the classification of coal and gas outburst. Finally, an SVM classifies coal and gas outbursts based on the optimal feature subset and sample data obtained above, with a Bayesian optimization algorithm tuning the SVM kernel parameters. The resulting coal and gas outburst pattern recognition prediction model is compared with existing prediction models in the literature. Compared with feature selection or association rule mining alone, the proposed model achieves the highest prediction accuracy, 93%, at a feature dimension of 3: classification accuracy improves significantly while the feature dimension decreases significantly. The results show that the proposed model outperforms other prediction models, which further verifies the accuracy and applicability of the coupled prediction model, and that it has high stability and robustness.
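
Apriori ranks candidate rules by support and confidence. The sketch below computes these two quantities for a single rule on a toy set of records. The item names (`high_gas_pressure`, `soft_coal`, `outburst`) are hypothetical labels invented for illustration, and the sketch covers only the rule-scoring step, not the full candidate-generation loop of Apriori.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(transactions, itemset):
    """Fraction of transactions that contain the itemset."""
    return count(transactions, itemset) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    n_antecedent = count(transactions, antecedent)
    if n_antecedent == 0:
        return 0.0
    return count(transactions, set(antecedent) | set(consequent)) / n_antecedent

# Hypothetical discretized records; each set is one sample.
records = [
    {"high_gas_pressure", "soft_coal", "outburst"},
    {"high_gas_pressure", "outburst"},
    {"soft_coal"},
    {"high_gas_pressure", "soft_coal", "outburst"},
    {"high_gas_pressure"},
]

rule_support = support(records, {"high_gas_pressure", "outburst"})
rule_confidence = confidence(records, {"high_gas_pressure"}, {"outburst"})
print(rule_support, rule_confidence)
```

A rule is usually called strong when both values exceed user-chosen thresholds. The thresholds used in the article are not given in the abstract.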


2012 ◽  
Vol 2012 ◽  
pp. 1-12 ◽  
Author(s):  
Junfei Chen ◽  
Ming Li ◽  
Weiguang Wang

Drought is part of natural climate variability and ranks first among natural disasters worldwide. Drought forecasting plays an important role in mitigating impacts on agriculture and water resources. In this study, a drought forecast model based on the random forest method is proposed to predict the time series of the monthly standardized precipitation index (SPI). We demonstrate the model at four stations in the Haihe river basin, China. The random-forest-based (RF-based) forecast model consistently shows better predictive skill than the ARIMA model for both long and short drought forecasting. The confidence intervals derived from the proposed model generally have good coverage but still tend to be conservative when predicting some extreme drought events.
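
A regression model such as a random forest needs fixed-length inputs, so a time series is commonly recast as lagged feature vectors paired with a future target. The sketch below shows that recasting under assumptions made here: the number of lags, the forecast horizon, and the function name are illustrative, and the SPI values are invented. The abstract does not say which predictors the authors actually used.

```python
def make_lagged_samples(series, n_lags, horizon=1):
    """Pair each window of n_lags values with the value horizon steps after it."""
    samples = []
    last_start = len(series) - n_lags - horizon + 1
    for start in range(max(last_start, 0)):
        features = series[start:start + n_lags]
        target = series[start + n_lags + horizon - 1]
        samples.append((features, target))
    return samples

# Invented monthly SPI values, for illustration only.
spi = [0.1, -0.3, -1.2, -0.8, 0.4, 1.1]

one_step = make_lagged_samples(spi, n_lags=3, horizon=1)
two_step = make_lagged_samples(spi, n_lags=3, horizon=2)
print(one_step)
print(two_step)
```

Each `(features, target)` pair can then be fed to any regressor. A longer horizon leaves fewer usable samples, which is one reason long-lead forecasts are harder to train.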


2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Faizan Ullah ◽  
Qaisar Javaid ◽  
Abdu Salam ◽  
Masood Ahmad ◽  
Nadeem Sarwar ◽  
...  

Ransomware (RW) is a distinctive variety of malware that encrypts files or locks the user’s system, holding the files hostage and causing huge financial losses to users. In this article, we propose a new model that extracts novel features from the RW dataset and classifies RW and benign files. The proposed model can detect a large number of RW samples from various families at runtime and scans the network, registry activities, and file system throughout execution. API-call series were reused to represent the behavior-based features of RW. The technique extracts a fourteen-feature vector at runtime and analyzes it with online machine learning algorithms to predict RW. To validate effectiveness and scalability, we test 78550 recent malicious and benign samples and compare with random forest and AdaBoost; the testing accuracy reaches 99.56%.


2021 ◽  
Vol 13 (18) ◽  
pp. 3573
Author(s):  
Chunfang Kong ◽  
Yiping Tian ◽  
Xiaogang Ma ◽  
Zhengping Weng ◽  
Zhiting Zhang ◽  
...  

In response to the increasingly frequent occurrence of serious landslide disasters in eastern Guangxi, this study adopted support vector machines (SVM), particle swarm optimization support vector machines (PSO-SVM), random forest (RF), and particle swarm optimization random forest (PSO-RF) methods to assess landslide susceptibility in Zhaoping County. To this end, 10 landslide-related variables were provided, including digital elevation model (DEM)-derived, meteorology-derived, Landsat8-derived, geology-derived, and human activity factors. Of the 345 landslide locations found, 70% were used to train the models and the rest were used for model verification. The four models were run, and landslide susceptibility evaluation maps were produced. Then, receiver operating characteristic (ROC) curves, statistical analysis, and field investigation were used to test and verify the efficiency of these models. Analysis and comparison of the results indicated that all four models performed well for landslide susceptibility evaluation, with area under the curve (AUC) values of the ROC curves ranging from 0.863 to 0.934. Among them, the PSO-RF model had the highest accuracy, followed by the PSO-SVM model, the RF model, and the SVM model. The results also showed that the PSO algorithm has a good effect on the SVM and RF models. Furthermore, the landslide models developed in the present study are promising methods that could be transferred to other regions for landslide susceptibility evaluation. In addition, the evaluation results can provide suggestions for disaster reduction and prevention in Zhaoping County of eastern Guangxi.
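
The AUC values quoted above have a simple interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes AUC directly from that definition. The labels and scores are invented, and the function name is an assumption; it is an illustration of the metric, not the study's evaluation code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Invented example: 1 = landslide location, 0 = non-landslide location.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = roc_auc(labels, scores)
print(auc)
```

The pairwise loop is quadratic in the number of samples, which is fine for a few hundred locations. Rank-based formulas give the same value more efficiently on large datasets.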


Author(s):  
Kotchapong Sumanonta ◽  
Pasist Suwanapingkarl ◽  
Pisit Liutanakul

This article presents a novel model for the equivalent circuit of a photovoltaic (PV) module. The circuit consists of the following important parameters: a single diode, a series resistance (Rs), and a parallel resistance (Rp), which can be adjusted directly according to the ambient temperature and the irradiance. The single diode is directly related to the ideality factor (m), which reflects the materials and significant structures of the PV module, such as monocrystalline, multicrystalline, and thin-film technology. In particular, the article presents a simplified model that can calculate I-V curves faster and more accurately than previous models, which suggests that the proposed model is more suitable for practical application. In addition, the results of the proposed model are validated against the datasheet, practical laboratory data (indoor test), and on-site data (outdoor test). The absolute errors of the model are less than 0.1%, which is considered acceptable.
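
For orientation, the textbook single-diode equation that relates the parameters named above is shown below. This is the standard form, not necessarily the simplified model proposed in the article. R_s, R_p, and m follow the abstract; I_{ph} (photocurrent), I_0 (diode saturation current), N_s (number of cells in series), and V_T (thermal voltage) are symbols assumed here.

```latex
\[
I = I_{ph} - I_0\left[\exp\!\left(\frac{V + I R_s}{m\,N_s\,V_T}\right) - 1\right] - \frac{V + I R_s}{R_p},
\qquad V_T = \frac{k\,T}{q}
\]
```

Because I appears on both sides, the equation is implicit and is normally solved numerically at each voltage point, which is why a faster simplified formulation is of practical interest.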


2020 ◽  
Vol 17 (3) ◽  
pp. 849-865
Author(s):  
Zhongqin Bi ◽  
Shuming Dou ◽  
Zhe Liu ◽  
Yongbin Li

Neural network methods have been trained to satisfactorily learn user/product representations from textual reviews. A representation can be considered as a multiaspect attention weight vector. However, in several existing methods, it is assumed that the user representation remains unchanged even when the user interacts with products having diverse characteristics, which leads to inaccurate recommendations. To overcome this limitation, this paper proposes a novel model to capture the varying attention of a user for different products by using a multilayer attention framework. First, two individual hierarchical attention networks are used to encode the users and products to learn the user preferences and product characteristics from review texts. Then, we design an attention network to reflect the adaptive change in the user preferences for each aspect of the targeted product in terms of the rating and review. The results of experiments performed on three public datasets demonstrate that the proposed model notably outperforms the other state-of-the-art baselines, thereby validating the effectiveness of the proposed approach.
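
To make the idea of an attention weight vector concrete, the sketch below implements generic dot-product attention with a softmax. It is not the paper's architecture: the query, keys, and values are invented two-dimensional vectors standing in for a user preference, product aspects, and aspect representations, and the function names are assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Return attention weights over keys and the weighted sum of values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context

# Invented vectors: one user query attending over three product aspects.
query = [1.0, 0.0]
keys = [[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

weights, context = attend(query, keys, values)
print(weights)
print(context)
```

Changing the keys changes the weights even when the query stays fixed, which is the behaviour the abstract describes: the same user attends differently depending on the product.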

