Classification of Categorical Outcome Variable Based on Logistic Regression and Tree Algorithm

2020 ◽  
Vol 8 (5) ◽  
pp. 4685-4690

Logistic regression is one of the most popular techniques in traditional statistics. It is usually applicable when the dependent variable is categorical and binary in nature. In statistics and machine learning, classification is critical for determining to which of a set of categories a new observation belongs, on the basis of a training set of data containing observations whose group membership is known. In this paper, we focus on the concepts of logistic regression and classification trees. A large data set taken from the UCI Machine Learning Repository was used for this research work. The aim of the study is to compare the results obtained from logistic regression and a decision tree. In the end, the decision tree gives better results than logistic regression.
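For reference, the standard binary logistic regression model that such a comparison relies on can be written as follows. The symbols (the binary outcome Y, predictor vector x, intercept \beta_0 and coefficient vector \beta) are notation assumed here for illustration; the abstract itself does not define them.

```latex
\[
P(Y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta^{\top} x)}},
\qquad
\log \frac{P(Y = 1 \mid x)}{1 - P(Y = 1 \mid x)} = \beta_0 + \beta^{\top} x
\]
```

The second form shows why the model suits a binary dependent variable: the log-odds of the positive class are linear in the predictors, while the predicted probability stays between 0 and 1.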

SinkrOn ◽  
2022 ◽  
Vol 7 (1) ◽  
pp. 59-65
Author(s):  
Artika Arista

Many people today are unsure whether they have COVID-19. Frequent fever, dry cough, and sore throat are all signs and symptoms of COVID-19. A person who has signs or symptoms of coronavirus disease 2019 (COVID-19) should see a doctor or go to a clinic as soon as possible. It is therefore vital to learn and understand the fundamental differences. COVID-19 can cause a wide range of symptoms. The experiments were carried out using two machine learning classification algorithms, Decision Tree (DT) and Logistic Regression (LR). Both algorithms were implemented and analyzed in Python using Jupyter Notebook 6.4.5. On the COVID-19 symptoms dataset, the DT model obtained the best average cross-validation and testing performance, with an accuracy of 98.0% in cross-validation and 98.0% in testing. The LR model obtained the second-best result, with an accuracy of 96.0% in cross-validation and 97.0% in testing. Consequently, the DT outperforms the LR on the COVID-19 symptoms dataset in both cross-validation and testing results.
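The cross-validation accuracy reported above can be illustrated with a minimal k-fold sketch. This is not the author's code: the abstract does not name the library or the number of folds, so the example uses only the Python standard library, a depth-1 decision tree (a "stump") as a hypothetical stand-in for the full DT model, and made-up binary symptom data.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_stump(X, y):
    """Fit a depth-1 tree: choose the binary feature (optionally flipped)
    that best reproduces the labels on the training data."""
    best = None
    for j in range(len(X[0])):
        for flip in (0, 1):  # flip=1 predicts the opposite of the feature
            preds = [row[j] ^ flip for row in X]
            acc = mean(int(p == t) for p, t in zip(preds, y))
            if best is None or acc > best[0]:
                best = (acc, j, flip)
    return best[1], best[2]


def predict_stump(model, X):
    j, flip = model
    return [row[j] ^ flip for row in X]


def cross_val_accuracy(X, y, k=5):
    """Average held-out accuracy over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit_stump([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict_stump(model, [X[i] for i in test_idx])
        scores.append(mean(int(p == y[i]) for p, i in zip(preds, test_idx)))
    return mean(scores)


# Made-up data: two binary symptom flags per person; the label simply
# copies the first flag, so the stump can learn it perfectly.
X = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0],
     [0, 1], [1, 1], [0, 0], [1, 0], [0, 1]]
y = [row[0] for row in X]
accuracy = cross_val_accuracy(X, y, k=5)
```

Each fold is held out once while the model is fitted on the remaining folds, so every observation contributes to the test score exactly once. The same loop would apply to any classifier that exposes a fit step and a predict step.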


Author(s):  
M. Nirmala

Abstract: Data mining in educational systems has grown tremendously in the past and is still growing in the present era. This study focuses on the academic standpoint: student performance is evaluated using various parameters, namely scholastic, demographic, and emotional features. Various machine learning methodologies are adopted to extract hidden knowledge from the educational data set provided, which helps identify the features that have the greatest impact on student academic performance; knowing these features in turn helps yield deeper insights into student performance in academics. The full machine learning workflow, from problem definition to model prediction, is carried out in this study. A supervised learning methodology is adopted, and various feature engineering methods are applied to make the ML model appropriate for training and evaluation. This is a prediction problem, and various classification algorithms, namely Logistic Regression, Random Forest, SVM, KNN, XGBoost, and Decision Tree, are used to fit the student data. Keywords: Scholastic, Demographic, Emotional, Logistic Regression, Random Forest, SVM, KNN, XGBoost, Decision Tree.


Sensors ◽  
2021 ◽  
Vol 21 (21) ◽  
pp. 6967
Author(s):  
Kang Peng ◽  
Zheng Tang ◽  
Longjun Dong ◽  
Daoyuan Sun

A microseismic monitoring system is one of the effective means of monitoring ground stress in deep mines. The accuracy and speed of microseismic signal identification directly affect the stability analysis in rock engineering. At present, manual identification, which relies heavily on operator experience, is widely used to classify microseismic events and blasts in mines. To realize intelligent and accurate identification of microseismic events and blasts, a microseismic signal identification system based on machine learning was established in this work. The discrimination of microseismic events and blasts was built on the machine learning framework. The microseismic monitoring data were used to optimize the parameters and validate the classification methods. Subsequently, ten machine learning algorithms were used as the preliminary algorithms of the learning layer: Decision Tree, Random Forest, Logistic Regression, SVM, KNN, GBDT, Naive Bayes, Bagging, AdaBoost, and MLP. Then, a training set and a test set, each accounting for 50% of each data set, were prospectively examined, and the ACC, PPV, SEN, NPV, SPE, FAR, and ROC curves were used as evaluation indexes. Finally, the performances of these machine learning algorithms in microseismic signal identification were evaluated with cross-validation methods. The results showed that the Logistic Regression classifier had the best performance in parameter identification, with a cross-validation accuracy of more than 0.95. Random Forest, Decision Tree, and Naive Bayes also performed well on this data set. There were some differences in the accuracy of different classifiers on the training set, the test set, and all data sets. To improve the accuracy of signal identification, the database of microseismic events and blasts should be expanded, to avoid the inaccurate data distribution caused by a small training set.
Artificial intelligence identification methods, including Random Forest, Logistic Regression, Decision Tree, Naive Bayes, and AdaBoost algorithms, were applied to signal identification of the microseismic monitoring system in mines, and the identification results were consistent with the actual situation. In this way, the confusion caused by manual classification between microseismic events and blasts based on the characteristics of waveform signals is solved, and the required source parameters are easily obtained, which can ensure the accuracy and timeliness of microseismic events and blasts identification.
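The evaluation indexes named above (ACC, PPV, SEN, NPV, SPE, FAR) are all derived from the four cells of a binary confusion matrix. The sketch below shows one way to compute them with the Python standard library. The labels are made up, treating class 1 as the positive class is an arbitrary choice, and FAR is assumed here to mean the false alarm rate FP / (FP + TN); the abstract does not state which definition the authors used.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the index is undefined."""
    return numerator / denominator if denominator else None


def evaluation_indexes(tp, fp, tn, fn):
    return {
        "ACC": ratio(tp + tn, tp + fp + tn + fn),  # accuracy
        "PPV": ratio(tp, tp + fp),                 # positive predictive value
        "SEN": ratio(tp, tp + fn),                 # sensitivity (recall)
        "NPV": ratio(tn, tn + fn),                 # negative predictive value
        "SPE": ratio(tn, tn + fp),                 # specificity
        "FAR": ratio(fp, fp + tn),                 # ASSUMED definition: 1 - SPE
    }


# Made-up labels: 1 = one class of signal, 0 = the other.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

counts = confusion_counts(y_true, y_pred)
indexes = evaluation_indexes(*counts)
```

Returning None for an undefined ratio (for example PPV when nothing is predicted positive) keeps the function from raising on degenerate folds, which matters when the training set is small.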


Malware damages computers without the user's consent and causes various threats without the user's knowledge, so detecting it is crucial. In this study, we propose to detect the presence of malware using machine learning classification. Classification in machine learning requires the output variable to be categorical; it attempts to draw a conclusion from the observed values. In short, classification constructs a model based on the training set and its values and uses it to predict categorical class labels. In our work, we propose to classify the presence of malware using two principal classification algorithms, Support Vector Machine and Logistic Regression. The data set initially used was not satisfactory. Consequently, we explored a data set that met our requirements and applied Logistic Regression to it; moreover, we plotted a scattergram for visualization and incorporated XGBoost for performance enhancement. This study assists in analyzing the presence of malware by adopting a proper dataset and ascertaining the pivotal attributes leading to this classification.


Author(s):  
Dhilsath Fathima.M ◽  
S. Justin Samuel ◽  
R. Hari Haran

Aim: The proposed work develops an improved and robust machine learning model for predicting myocardial infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine learning based computer-aided analysis system for early and accurate prediction of myocardial infarction (MI), using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model will support medical professionals in predicting myocardial infarction proficiently. Methods: The proposed model uses mean imputation to fill in the missing values in the data set, then applies principal component analysis (PCA) to extract the optimal features from the data set and enhance the performance of the classifiers. After PCA, the reduced features are partitioned into a training dataset and a testing dataset: 70% of the data is given as input to four popular classifiers (support vector machine, k-nearest neighbor, logistic regression, and decision tree) to train them, and the remaining 30% is used to evaluate the output of the machine learning model using the following performance metrics: confusion matrix, classifier accuracy, precision, sensitivity, F1-score, and AUC-ROC curve. Results: The outputs of the classifiers are evaluated using these performance measures. We observed that logistic regression provides higher accuracy than the k-NN, SVM, and decision tree classifiers, and that PCA performs well as a feature extraction method to enhance the performance of the proposed model. From these analyses, we conclude that logistic regression has a good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves of the classifiers, analyzed in Figures 4 and 5, show that logistic regression exhibits a good AUC-ROC score, around 70%, compared to the k-NN and decision tree algorithms.
Conclusion: From the result analysis, we infer that the proposed machine learning model will act as an optimal decision-making system to predict acute myocardial infarction at an earlier stage than existing machine learning based prediction models, and that it is capable of predicting the presence of acute myocardial infarction in humans using heart disease risk factors, in order to decide when to start lifestyle modification and medical treatment to prevent heart disease.
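Two of the preprocessing steps named in the Methods, mean imputation and the 70/30 train/test partition, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: it uses only the Python standard library, omits the PCA step that the paper applies between the two, marks missing values with None, and runs on made-up two-column records (the function names and the seed are hypothetical).

```python
import random
from statistics import mean


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in the
    same column. Raises StatisticsError if a column has no observed value."""
    n_cols = len(rows[0])
    col_means = [
        mean(row[j] for row in rows if row[j] is not None)
        for j in range(n_cols)
    ]
    return [
        [col_means[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in rows
    ]


def split_train_test(rows, labels, train_fraction=0.7, seed=0):
    """Shuffle the row indices reproducibly, then cut them into
    training and testing portions."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = round(train_fraction * len(rows))
    train, test = indices[:cut], indices[cut:]
    return (
        [rows[i] for i in train], [labels[i] for i in train],
        [rows[i] for i in test], [labels[i] for i in test],
    )


# Made-up records (two numeric risk factors per person); None marks a
# missing measurement. These are NOT values from the Framingham data.
records = [
    [39.0, 195.0], [46.0, None], [48.0, 245.0], [None, 225.0], [61.0, 285.0],
    [43.0, 228.0], [63.0, 205.0], [45.0, 313.0], [52.0, 260.0], [43.0, 225.0],
]
outcomes = [0, 0, 0, 1, 1, 0, 1, 0, 0, 0]

imputed = impute_column_means(records)
X_train, y_train, X_test, y_test = split_train_test(imputed, outcomes)
```

Note that this sketch computes the column means over all records before splitting, which matches the order of steps in the abstract. Computing the means on the training portion only is a common alternative that avoids leaking test-set information into the imputed values.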


2020 ◽  
Vol 13 (5) ◽  
pp. 508-523 ◽  
Author(s):  
Guan‐Hua Huang ◽  
Chih‐Hsuan Lin ◽  
Yu‐Ren Cai ◽  
Tai‐Been Chen ◽  
Shih‐Yen Hsu ◽  
...  

2021 ◽  
Vol 79 ◽  
pp. 52-58
Author(s):  
Arnaldo Stanzione ◽  
Renato Cuocolo ◽  
Francesco Verde ◽  
Roberta Galatola ◽  
Valeria Romeo ◽  
...  

Heliyon ◽  
2021 ◽  
Vol 7 (2) ◽  
pp. e06257
Author(s):  
Ennio Idrobo-Ávila ◽  
Humberto Loaiza-Correa ◽  
Rubiel Vargas-Cañas ◽  
Flavio Muñoz-Bolaños ◽  
Leon van Noorden
