Machine learning-based FEMA Transitional Shelter Assistance (TSA) eligibility prediction models

Around 90 percent of the natural disasters in the United States involve floods. As a result of these floods, a massive number of houses become uninhabitable, leaving their residents in immediate need of lodging and shelter. The Federal Emergency Management Agency (FEMA) lodges people in noncongregated shelters such as hotels and motels for a short period (up to 45 days) through the Transitional Shelter Assistance (TSA) program. The Government Accountability Office estimated that between 600 million and 1.4 billion dollars had been improperly spent. However, the process by which an applicant becomes eligible for TSA currently lacks a robust model and framework; the selection of TSA recipients is based mainly on expert opinion and tacit knowledge. The objectives of this paper are (1) to investigate how classification techniques can be used to help FEMA decision-makers during a disaster and (2) to build supervised machine learning decision-making models based on logistic regression, decision tree, and K-nearest neighbor classification techniques using Python. The dataset of 4.8 million application registrations used for this paper was extracted from the National Emergency Management Information System. This research will help FEMA decision-makers predict TSA eligibility.
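As a rough illustration of one of the classification techniques named above, the following is a minimal K-nearest neighbor sketch in plain Python. It is not the paper's model: the two numeric features, the toy records, and the label names are hypothetical and stand in for whatever fields the real application data contains.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, paired with its label
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Sort by distance only, so labels are never compared with each other
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k neighbors is the prediction
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two features already scaled to the range 0..1
train_X = [(0.9, 0.8), (0.8, 0.9), (0.85, 0.7), (0.1, 0.2), (0.2, 0.1), (0.15, 0.3)]
train_y = ["eligible", "eligible", "eligible", "ineligible", "ineligible", "ineligible"]

prediction = knn_predict(train_X, train_y, (0.7, 0.75), k=3)
print(prediction)
```

Because K-nearest neighbor relies on raw distances, features on very different scales would normally need to be normalized first; the toy data above assumes that has already been done.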

Evaluating Annotated Dataset of Customer Reviews for Aspect Based Sentiment Analysis

Journal of Web Engineering ◽

10.13052/jwe1540-9589.2122 ◽

2021 ◽

Author(s):

Dimple Chehal ◽

Parul Gupta ◽

Payal Gulati

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Nearest Neighbor ◽

Supervised Machine Learning ◽

Support Vector ◽

Product Reviews ◽

K Nearest Neighbor ◽

Customer Reviews ◽

Percent Accuracy

Sentiment analysis of product reviews on e-commerce platforms aids in determining the preferences of customers. Aspect-based sentiment analysis (ABSA) assists in identifying the contributing aspects and their corresponding polarity, thereby allowing for a more detailed analysis of the customer’s inclination toward product aspects. This analysis helps in the transition from the traditional rating-based recommendation process to an improved aspect-based process. To automate ABSA, a labelled dataset is required to train a supervised machine learning model. As the availability of such datasets is limited because of the human effort involved, an annotated dataset has been provided here for performing ABSA on customer reviews of mobile phones. The dataset, comprising product reviews of the Apple iPhone 11, has been manually annotated with predefined aspect categories and aspect sentiments. The dataset’s accuracy has been validated using state-of-the-art machine learning techniques such as Naïve Bayes, Support Vector Machine, Logistic Regression, Random Forest, K-Nearest Neighbor, and Multi Layer Perceptron (MLP), a sequential model built with the Keras API. The MLP model built through the Keras Sequential API for classifying review text into aspect categories produced the most accurate result, with 67.45 percent accuracy. K-nearest neighbor performed the worst, with only 49.92 percent accuracy. The Support Vector Machine had the highest accuracy for classifying review text into aspect sentiments, at 79.46 percent. The model built with the Keras API had the lowest accuracy, at 76.30 percent. The contribution is beneficial as a benchmark dataset for ABSA of mobile phone reviews.

A Systematic Methodology to Evaluate Prediction Models for Driving Style Classification

Sensors ◽

10.3390/s20061692 ◽

2020 ◽

Vol 20 (6) ◽

pp. 1692 ◽

Cited By ~ 6

Author(s):

Iván Silva ◽

José Eugenio Naranjo

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Performance Metrics ◽

Prediction Models ◽

Statistical Tests ◽

Area Under The Curve ◽

The Other ◽

Support Vector ◽

Classification Models ◽

K Nearest Neighbor

Identifying driving styles using classification models with in-vehicle data can provide automated feedback to drivers on their driving behavior, particularly if they are driving safely. Although several classification models have been developed for this purpose, there is no consensus on which classifier performs better at identifying driving styles. Therefore, more research is needed to evaluate classification models by comparing performance metrics. In this paper, a data-driven machine-learning methodology for classifying driving styles is introduced. This methodology is grounded in well-established machine-learning (ML) methods and literature related to driving-styles research. The methodology is illustrated through a study involving data collected from 50 drivers from two different cities in a naturalistic setting. Five features were extracted from the raw data. Fifteen experts were involved in the data labeling to derive the ground truth of the dataset. The dataset fed five different models (Support Vector Machines (SVM), Artificial Neural Networks (ANN), fuzzy logic, k-Nearest Neighbor (kNN), and Random Forests (RF)). These models were evaluated in terms of a set of performance metrics and statistical tests. The experimental results from performance metrics showed that SVM outperformed the other four models, achieving an average accuracy of 0.96, F1-Score of 0.9595, Area Under the Curve (AUC) of 0.9730, and Kappa of 0.9375. In addition, Wilcoxon tests indicated that ANN predicts differently to the other four models. These promising results demonstrate that the proposed methodology may support researchers in making informed decisions about which ML model performs better for driving-styles classification.
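Two of the metrics reported above, accuracy and Cohen's Kappa, can be computed directly from a list of expert labels and a list of model predictions. The sketch below shows the standard definitions in plain Python; the label names and the eight toy samples are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples where the prediction matches the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement between two label lists, corrected for chance agreement."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: sum over classes of P(true = c) * P(pred = c)
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # Assumes p_expected < 1, i.e. more than one class appears in the data
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy labels for eight trips
y_true = ["calm", "calm", "calm", "calm",
          "aggressive", "aggressive", "aggressive", "aggressive"]
y_pred = ["calm", "calm", "calm", "aggressive",
          "aggressive", "aggressive", "aggressive", "calm"]

acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
print(acc, kappa)
```

Kappa is lower than accuracy here because half of the agreement would be expected by chance when both classes are equally frequent.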

Predicting Learning Outcomes with MOOC Clickstreams

Education Sciences ◽

10.3390/educsci9020104 ◽

2019 ◽

Vol 9 (2) ◽

pp. 104 ◽

Cited By ~ 5

Author(s):

Chen-Hsiang Yu ◽

Jungpin Wu ◽

An-Chi Liu

Keyword(s):

Machine Learning ◽

Learning Outcomes ◽

Prediction Accuracy ◽

Nearest Neighbor ◽

Prediction Models ◽

Video Data ◽

Support Vector ◽

Completion Rates ◽

K Nearest Neighbor ◽

Learning Behaviors

Massive Open Online Courses (MOOCs) have gradually become a dominant trend in education. Since 2014, the Ministry of Education in Taiwan has been promoting MOOC programs, with successful results. The ability of students to work at their own pace, however, is associated with low MOOC completion rates and has recently become a focus. The development of a mechanism to effectively improve course completion rates continues to be of great interest to both teachers and researchers. This study established a series of learning behaviors using the video clickstream records of students, through a MOOC platform, to identify seven types of cognitive participation models of learners. We subsequently built practical machine learning models by using K-nearest neighbor (KNN), support vector machines (SVM), and artificial neural network (ANN) algorithms to predict students’ learning outcomes via their learning behaviors. The ANN machine learning method had the highest prediction accuracy. Based on the prediction results, we saw a correlation between video viewing behavior and learning outcomes. This could allow teachers to help students needing extra support successfully pass the course. To further improve our method, we classified the course videos based on their content. There were three video categories: theoretical, experimental, and analytic. Different prediction models were built for each of these three video types and their combinations. We performed the accuracy verification; our experimental results showed that we could use only theoretical and experimental video data, instead of all three types of data, to generate prediction models without significant differences in prediction accuracy. In addition to data reduction in model generation, this could help teachers evaluate the effectiveness of course videos.

Data Enrichment and Developing Reliable Prediction Models for Identifying Mode of Delivery in Healthcare Practice Using Machine Learning Methods (Preprint)

10.2196/preprints.28856 ◽

2021 ◽

Author(s):

Zahid Ullah ◽

Farrukh Saleem ◽

Mona Jamjoom

Keyword(s):

Machine Learning ◽

Health Care ◽

Random Forest ◽

Decision Tree ◽

Maternity Care ◽

Nearest Neighbor ◽

Prediction Models ◽

Mode Of Delivery ◽

K Nearest Neighbor ◽

Reliable Prediction

BACKGROUND The use of artificial intelligence (AI) has revolutionized every area of life such as business and trade, social and electronic media, education and learning, manufacturing industries, medicine and sciences, and every other sector. The new reforms and advanced technologies of AI have enabled data analysts to transmute raw data generated by these sectors into meaningful insights for an effective decision-making process. Health care is one of the integral sectors where a large amount of data is generated daily, and making effective decisions based on this data is therefore a challenge. In this study, cases related to childbirth either by the traditional method of vaginal delivery or by cesarean delivery have been investigated. Cesarean delivery is performed to save the lives of both mother and fetus when complications arise related to vaginal birth. OBJECTIVE To develop reliable prediction models for a maternity care decision support system to predict mode of delivery before birth. METHODS This study is conducted in two parts for identifying the mode of delivery: firstly, to enrich the existing dataset; secondly, to investigate previous medical records about the mode of delivery using machine learning algorithms and extract meaningful insight into the unseen cases. To achieve this objective, several prediction models were trained, such as Decision Tree (DT), Random Forest (RF), AdaBoostM1 (AB), Bagging, and k-Nearest Neighbor (k-NN), based on original and enriched datasets. RESULTS The prediction models based on enriched data performed well in terms of accuracy, sensitivity, specificity, F-measure, and ROC. Specifically, k-NN performed best with an accuracy of 84.38%, followed by Bagging (83.75%), RF (83.13%), DT (81.25%), and AB (80.63%). CONCLUSIONS Enriching the dataset improves the accuracy of the prediction process, which supports maternity care practitioners in making decisions for critical cases. The enriched dataset in its current stage used in this study yields better results, but this could be even better if its records were increased with real clinical data.

Leveraging Machine Learning Algorithms For Zero-Day Ransomware Attack

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f8694.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 4104-4107

Keyword(s):

Machine Learning ◽

Random Forest ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

K Nearest Neighbor ◽

Supervised Learning Algorithms ◽

Microsoft Windows

Large-scale global cyberattacks caused by encryption ransomware have affected countries and businesses worldwide, with millions of dollars lost in ransom payments. This type of malware encrypts user files, extracts user files, and demands high ransoms for the decryption keys. An attacker can use different ransomware approaches to steal a victim's files. Examples of ransomware attacks include scareware, mobile ransomware, WannaCry, CryptoLocker, and zero-day ransomware attacks. A zero-day vulnerability is a software security flaw that is known to the software vendor but does not yet have a patch in place to fix it. Machine learning algorithms are already used to detect encryption ransomware. This work is based on the analysis of a large number of PE file samples (benign software and ransomware) and uses supervised machine learning algorithms to detect zero-day attacks. The work was carried out and evaluated on the Microsoft Windows operating system, the OS most attacked by encryption ransomware. We used four supervised learning algorithms: Random Forest Classifier, K-Nearest Neighbor, Support Vector Machine, and Logistic Regression. Tests using these machine learning algorithms show almost no false positives and an accuracy of 99.5% with the random forest algorithm.

An Analytical Study on Machine Learning Techniques

Multidisciplinary Functions of Blockchain Technology in AI and IoT Applications - Advances in Data Mining and Database Management ◽

10.4018/978-1-7998-5876-8.ch007 ◽

2021 ◽

pp. 137-157

Author(s):

Law Kumar Singh ◽

Pooja ◽

Hitendra Garg ◽

Munish Khanna ◽

Robin Singh Bhadoria

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Ethical Issues ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Behavior Prediction ◽

K Nearest Neighbor ◽

Learning Techniques ◽

Learning Machine ◽

And Behavior

The last few months have produced a remarkable expansion in research and deep study in the field of machine learning. Machine learning is a technique in which a set of methods is used by computers to make predictions, improve predictions, and predict behavior based on a dataset. Learning techniques can be classified as supervised and unsupervised learning. The focus here is on supervised machine learning, which covers all prediction problems for which we have a dataset in which the outcome is already known. Algorithms such as naive Bayes, linear regression, SVM, k-nearest neighbor, and especially neural networks have gained ground in this area. Machine learning classifiers are not constrained by statistical assumptions, and for that reason they adapt well to complex data. The authors demonstrate the application of machine learning techniques and discuss the associated ethical issues.

Comparison of Machine Learning Techniques for Software Quality Prediction

International Journal of Knowledge and Systems Science ◽

10.4018/ijkss.2020040102 ◽

2020 ◽

Vol 11 (2) ◽

pp. 20-40

Author(s):

Somya Goyal ◽

Pradeep Kumar Bhatia

Keyword(s):

Machine Learning ◽

Software Quality ◽

Software Metrics ◽

Nearest Neighbor ◽

Prediction Models ◽

Machine Learning Techniques ◽

Support Vector ◽

Quality Prediction ◽

K Nearest Neighbor ◽

Software Quality Prediction

Software quality prediction is one of the most challenging tasks in the development and maintenance of software. Machine learning (ML) is widely being incorporated for the prediction of the quality of a final product in the early development stages of the software development life cycle (SDLC). An ML prediction model uses software metrics and fault data from previous projects to detect high-risk modules for future projects, so that the testing efforts can be targeted to those specific 'risky' modules. Hence, ML-based predictors contribute to the detection of development anomalies early and inexpensively and ensure the timely delivery of a successful, failure-free, and supreme quality software product within budget. This article compares 30 software quality prediction models (5 techniques × 6 datasets) built on five ML techniques: artificial neural networks (ANNs), support vector machines (SVMs), decision trees (DTs), k-nearest neighbor (KNN), and Naïve Bayes classifiers (NBCs), using six datasets: CM1, KC1, KC2, PC1, JM1, and a combined one. These models exploit the predictive power of static code metrics, namely McCabe complexity metrics, for quality prediction. All thirty predictors are compared using the receiver operating characteristic (ROC) curve, area under the curve (AUC), and accuracy as performance evaluation criteria. The results show that the ANN technique for software quality prediction is promising for accurate quality prediction irrespective of the dataset used.
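The AUC criterion mentioned above has a simple probabilistic reading: it is the chance that a randomly chosen faulty module receives a higher score than a randomly chosen fault-free one. The sketch below computes AUC from that definition in plain Python; the labels and scores are hypothetical and do not come from the CM1, KC1, KC2, PC1, or JM1 datasets.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions: label 1 = faulty module, 0 = fault-free module
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

auc = roc_auc(labels, scores)
print(auc)
```

This pairwise form is quadratic in the number of samples, which is fine for illustration; rank-based formulations give the same value more efficiently on large datasets.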

Real Time Smartphone Data for Prediction of Nomophobia Severity using Supervised Machine Learning

10.21467/proceedings.114.11 ◽

2021 ◽

Author(s):

Anshika Arora ◽

Pinaki Chakraborty ◽

M.P.S. Bhatia

Keyword(s):

Machine Learning ◽

Real Time ◽

Undergraduate Students ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Supervised Machine Learning ◽

Support Vector ◽

K Nearest Neighbor ◽

F Measure

Excessive use of smartphones throughout the day, with dependency on them for social interaction, entertainment, and information retrieval, may lead users to develop nomophobia, which makes them feel anxious when their smartphones are unavailable. This study describes the usefulness of real-time smartphone usage data for predicting nomophobia severity using machine learning. Data were collected from 141 undergraduate students, covering their perception of their smartphone use, measured with the Nomophobia Questionnaire (NMP-Q), and their real-time smartphone usage patterns, recorded with a purpose-built Android application. Supervised machine learning models including Random Forest, Decision Tree, Support Vector Machines, Naïve Bayes, and K-Nearest Neighbor are trained using two feature sets: the first comprises only the NMP-Q features, and the second comprises real-time smartphone usage features along with the NMP-Q features. The performance of these models is evaluated using F-measure and area under the ROC curve, and it is observed that all the models perform better when provided with smartphone usage features along with the NMP-Q features. Naïve Bayes outperforms the other models in predicting nomophobia, achieving an F-measure of 0.891 and an ROC area of 0.933.

Reliable Prediction Models Based on Enriched Data for Identifying the Mode of Childbirth by Using Machine Learning Methods: Development Study

Journal of Medical Internet Research ◽

10.2196/28856 ◽

2021 ◽

Vol 23 (6) ◽

pp. e28856

Author(s):

Zahid Ullah ◽

Farrukh Saleem ◽

Mona Jamjoom ◽

Bahjat Fakieh

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Random Forest ◽

Maternity Care ◽

Nearest Neighbor ◽

Prediction Models ◽

Mode Of Delivery ◽

K Nearest Neighbor ◽

Reliable Prediction ◽

Data Set

Background The use of artificial intelligence has revolutionized every area of life such as business and trade, social and electronic media, education and learning, manufacturing industries, medicine and sciences, and every other sector. The new reforms and advanced technologies of artificial intelligence have enabled data analysts to transmute raw data generated by these sectors into meaningful insights for an effective decision-making process. Health care is one of the integral sectors where a large amount of data is generated daily, and making effective decisions based on these data is therefore a challenge. In this study, cases related to childbirth either by the traditional method of vaginal delivery or cesarean delivery were investigated. Cesarean delivery is performed to save both the mother and the fetus when complications related to vaginal birth arise. Objective The aim of this study was to develop reliable prediction models for a maternity care decision support system to predict the mode of delivery before childbirth. Methods This study was conducted in 2 parts for identifying the mode of childbirth: first, the existing data set was enriched and second, previous medical records about the mode of delivery were investigated using machine learning algorithms and by extracting meaningful insights from unseen cases. Several prediction models were trained to achieve this objective, such as decision tree, random forest, AdaBoostM1, bagging, and k-nearest neighbor, based on original and enriched data sets. Results The prediction models based on enriched data performed well in terms of accuracy, sensitivity, specificity, F-measure, and receiver operating characteristic curves in the outcomes. Specifically, the accuracy of k-nearest neighbor was 84.38%, that of bagging was 83.75%, that of random forest was 83.13%, that of decision tree was 81.25%, and that of AdaBoostM1 was 80.63%. 
Enrichment of the data set had a good impact on improving the accuracy of the prediction process, which supports maternity care practitioners in making decisions in critical cases. Conclusions Our study shows that enriching the data set improves the accuracy of the prediction process, thereby supporting maternity care practitioners in making informed decisions in critical cases. The enriched data set used in this study yields good results, but this data set can become even better if the records are increased with real clinical data.
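The evaluation measures listed in the results (accuracy, sensitivity, specificity, and F-measure) all derive from the four cells of a binary confusion matrix. The sketch below shows those standard formulas in plain Python, treating cesarean delivery as the positive class; the ten toy records are hypothetical and unrelated to the study's data set.

```python
def binary_metrics(y_true, y_pred, positive):
    """Return accuracy, sensitivity, specificity, and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    # Assumes every denominator below is nonzero (both classes present and predicted)
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Hypothetical toy labels for ten deliveries
y_true = ["cesarean"] * 4 + ["vaginal"] * 6
y_pred = ["cesarean", "cesarean", "cesarean", "vaginal"] + ["vaginal"] * 5 + ["cesarean"]

metrics = binary_metrics(y_true, y_pred, positive="cesarean")
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when the two delivery modes are unevenly represented, since accuracy alone can hide poor performance on the rarer class.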

A REVIEW ON MACHINE LEARNING TECHNIQUES FOR ADVANCED HEALTH CARE SYSTEMS

June-2020 - International Journal of Engineering Sciences & Research Technology ◽

10.29121/ijesrt.v9.i11.2020.1 ◽

2020 ◽

Vol 9 (11) ◽

pp. 1-7

Keyword(s):

Machine Learning ◽

Health Care ◽

Logistic Regression ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor

Artificial intelligence is the technology that lets a machine mimic the thinking ability of a human being. Machine learning is the subset of AI that makes a machine exhibit human behavior by letting it learn from known data, without the need to program it explicitly. The health care sector has adopted this technology for developing medical procedures, maintaining large volumes of patient records, assisting physicians in the prediction, detection, and treatment of diseases, and much more. In this paper, a comparative study of six supervised machine learning algorithms, namely Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), k-Nearest Neighbor (k-NN), and Naive Bayes (NB), is made for the classification and prediction of diseases. The results show that, of the supervised learning algorithms compared here, logistic regression performs best with an accuracy of 81.4%, and k-NN performs worst with an accuracy of just 69.01% in the classification and prediction of diseases.
