APTITUDE Framework for Learning Data Classification Based on Machine Learning

Learning analytics refers to the use of machine learning to provide predictions of learner success and prescriptions for learners and teachers. The main goal of this paper is to propose the APTITUDE framework for learning data classification, in order to achieve adaptation and recommendations of course content or of the flow of course activities. The framework applies a model for student learning prediction based on machine learning. Five machine learning algorithms are used for learning data classification: random forest, Naïve Bayes, k-nearest neighbors, logistic regression, and support vector machines.

2020 ◽  
Vol 9 (9) ◽  
pp. 533 ◽  
Author(s):  
Ricardo Afonso ◽  
André Neves ◽  
Carlos Viegas Damásio ◽  
João Moura Pires ◽  
Fernando Birra ◽  
...  

Every year, wildfires strike the Portuguese territory and are a concern for public entities and the population. To prevent wildfire progression and minimize its impact, Fuel Management Zones (FMZs) have been stipulated by law around buildings and settlements, along national roads, and around other infrastructure. FMZs require monitoring of the vegetation condition so that maintenance and cleaning of these zones can proceed promptly. To improve FMZ monitoring, this paper proposes using satellite images, such as Sentinel-1 and Sentinel-2, along with vegetation indices and extracted temporal characteristics (max, min, mean and standard deviation) of the vegetation within and outside the FMZs, to determine whether the FMZs were treated. These characteristics feed machine-learning algorithms such as XGBoost, Support Vector Machines, K-nearest neighbors and Random Forest. The results show that it is possible to detect an intervention in an FMZ with high accuracy, namely with an F1-score ranging from 90% up to 94% and a Kappa ranging from 0.80 up to 0.89.
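
A minimal sketch of the feature-extraction step, assuming each zone is summarized by a single vegetation-index time series (for example NDVI). The function name `temporal_features` and the sample values are hypothetical; the paper's exact indices, sensors, and compositing are not reproduced here, and the use of the sample standard deviation is an assumption.

```python
import statistics


def temporal_features(series):
    """Summarize one vegetation-index time series with four statistics.

    Hypothetical helper: `series` is a time-ordered list of index values
    (e.g. NDVI) for one zone. Uses the sample standard deviation (n - 1),
    which is an assumption; the paper does not specify which is used.
    """
    return {
        "max": max(series),
        "min": min(series),
        "mean": statistics.mean(series),
        "std": statistics.stdev(series),
    }


# Hypothetical NDVI values; a sharp drop inside the FMZ could indicate clearing.
inside_fmz = [0.62, 0.60, 0.31, 0.28, 0.30]
outside_fmz = [0.65, 0.63, 0.64, 0.62, 0.66]

# One feature vector per zone pair: inside and outside statistics side by side,
# ready to be fed to a classifier such as Random Forest or SVM.
features = {f"in_{k}": v for k, v in temporal_features(inside_fmz).items()}
features.update({f"out_{k}": v for k, v in temporal_features(outside_fmz).items()})
print(features)
```

Pairing inside and outside statistics is one plausible way to let a classifier distinguish clearing from seasonal changes that affect both areas alike.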


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e8764 ◽  
Author(s):  
Siroj Bakoev ◽  
Lyubov Getmantseva ◽  
Maria Kolosova ◽  
Olga Kostyunina ◽  
Duane R. Chartier ◽  
...  

Industrial pig farming is associated with negative technological pressure on the bodies of pigs. Leg weakness and lameness are sources of significant economic loss in raising pigs. Therefore, it is important to identify the predictors of limb condition. This work presents assessments of the state of limbs using indicators of growth and meat characteristics of pigs based on machine learning algorithms. We have evaluated and compared the accuracy of prediction for nine ML classification algorithms (Random Forest, K-Nearest Neighbors, Artificial Neural Networks, C50Tree, Support Vector Machines, Naive Bayes, Generalized Linear Models, Boost, and Linear Discriminant Analysis) and have identified Random Forest and K-Nearest Neighbors as the best-performing algorithms for predicting pig leg weakness using a small set of simple measurements that can be taken at an early stage of animal development. Measurements of Muscle Thickness, Back Fat amount, and Average Daily Gain were found to be significant predictors of the conformation of pig limbs. Our work demonstrates the utility and relative ease of using machine learning algorithms to assess the state of limbs in pigs based on growth rate and meat characteristics.


2021 ◽  
pp. 1-29
Author(s):  
Ahmed Alsaihati ◽  
Mahmoud Abughaban ◽  
Salaheldin Elkatatny ◽  
Abdulazeez Abdulraheem

Fluid loss into formations is a common operational issue that is frequently encountered when drilling across naturally fractured or induced fractured formations. It can pose significant operational risks, such as well-control problems, stuck pipe, and wellbore instability, which in turn increase well time and cost. This research aims to use and evaluate different machine learning techniques, namely support vector machines, random forests, and K-nearest neighbors, for detecting loss circulation occurrences while drilling using only drilling surface parameters. Actual field data from seven wells that had suffered partial or severe loss circulation were used to build predictive models, while Well-8 was used to compare the performance of the developed models. Recall, precision, and F1-score were used to evaluate the ability of the developed models to detect loss circulation occurrences. The results showed that the K-nearest neighbors classifier achieved a high F1-score of 0.912 in detecting loss circulation occurrence in the testing set, while the random forest classifier was second-best with almost the same F1-score of 0.910. The support vector machines achieved an F1-score of 0.83 in predicting loss circulation occurrence in the testing set. K-nearest neighbors outperformed the other models in detecting loss circulation occurrences in Well-8, with an F1-score of 0.80. The main contribution of this research compared with previous studies is that it identifies loss events based on real-time measurements of the active pit volume.
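
Because the comparison above rests on precision, recall, and F1-score, a small sketch of how these are computed for a binary detector may help. It assumes labels of 1 for a loss-circulation event and 0 otherwise; the function name and the sample labels are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # events caught
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # events missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical per-sample labels (1 = loss circulation) over a short interval.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

F1 is high only when a detector both catches events (recall) and avoids false alarms (precision), which is why it is preferred over plain accuracy when loss events are rare.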


2021 ◽  
Vol 12 (3) ◽  
pp. 31-38
Author(s):  
Michelle Tais Garcia Furuya ◽  
Danielle Elis Garcia Furuya

The e-mail service is one of the main tools used today and an example of how technology facilitates the exchange of information. On the other hand, one of the biggest obstacles faced by e-mail services is spam, the name given to unsolicited messages received by a user. The application of machine learning has been gaining prominence in recent years as an efficient way to identify spam. In this area, different algorithms can be evaluated to identify which one performs best. The aim of the study is to assess the ability of machine learning algorithms to correctly classify e-mails and to identify which algorithm achieves the highest accuracy. The database used was taken from the Kaggle platform, and the data were processed with the Orange software using four algorithms: Random Forest (RF), K-Nearest Neighbors (KNN), Support Vector Machine (SVM) and Naive Bayes (NB). The data were split into 80% for training and 20% for testing. The results show that Random Forest was the best-performing algorithm, with 99% accuracy.
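
The 80/20 split described above was done in Orange; as a tool-independent illustration, the plain-Python sketch below shuffles labelled examples with a fixed seed, holds out 20% for testing, and scores predictions by accuracy. The helper names and the toy data are hypothetical, and this split is not stratified by class.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and return (train, test) lists."""
    shuffled = list(rows)                  # copy so the caller's list is unchanged
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labelled e-mails: (text, label) with label "spam" or "ham".
emails = [(f"message {i}", "spam" if i % 3 == 0 else "ham") for i in range(10)]
train, test = split_train_test(emails)
print(len(train), len(test))  # 8 2
```

Shuffling before splitting keeps the test set from inheriting any ordering in the source file, such as all spam messages appearing first.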


Author(s):  
Shler Farhad Khorshid ◽  
Adnan Mohsin Abdulazeez ◽  
Amira Bibo Sallow

Breast cancer is one of the most common diseases among women, accounting for many deaths each year. Even though cancer can be treated and cured in its early stages, many patients are diagnosed at a late stage. Data mining is the process of finding or extracting information from massive databases or datasets, and it is a field of computer science with a great deal of potential. It covers a wide range of areas, one of which is classification, which can be accomplished using a variety of methods or algorithms. With the aid of MATLAB, five classification algorithms were compared. This paper presents a performance comparison among the following classifiers: Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbors (K-NN), Weighted K-Nearest Neighbors (Weighted K-NN), and Gaussian Naïve Bayes (Gaussian NB). The data set was taken from the UCI Machine Learning Repository. The main objective of this study is to classify breast cancer in women using machine learning algorithms and to compare the algorithms by accuracy. The results revealed that Weighted K-NN has the highest accuracy (96.7%) among all the classifiers.
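
The difference between K-NN and Weighted K-NN lies in how neighbors vote. The sketch below, assuming Euclidean distance and inverse-distance weights, shows a case where the two disagree. It is not MATLAB's implementation (whose default weighting scheme and neighbor count may differ), and the points and labels are hypothetical.

```python
import math
from collections import defaultdict


def knn_predict(train_X, train_y, x, k=3, weighted=False):
    """Predict the label of `x` from its k nearest training points.

    Plain k-NN gives every neighbor one vote; weighted k-NN gives each
    neighbor a vote of 1 / distance, so closer points count more.
    """
    # (distance, label) pairs sorted from nearest to farthest.
    neighbors = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))[:k]
    votes = defaultdict(float)
    for d, label in neighbors:
        # A tiny epsilon avoids division by zero when x matches a training point.
        votes[label] += 1.0 / (d + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


# Hypothetical 2-D feature vectors.
train_X = [(0.1, 0.0), (1.0, 0.0), (0.0, 1.1), (5.0, 5.0)]
train_y = ["malignant", "benign", "benign", "benign"]
query = (0.0, 0.0)
print(knn_predict(train_X, train_y, query))                 # benign
print(knn_predict(train_X, train_y, query, weighted=True))  # malignant
```

With k = 3, two moderately distant "benign" points outvote one very close "malignant" point under plain voting, while inverse-distance weighting lets the closest point dominate.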


2021 ◽  
Vol 4 (2) ◽  
pp. p10
Author(s):  
Yanmeng Liu

The success of health education resources largely depends on their readability, as health information can only be understood and accepted by target readers when it is presented at an appropriate reading difficulty. Unlike other populations, children have limited knowledge and underdeveloped reading comprehension, which poses additional challenges for readability research on health education resources. This research aims to explore readability prediction of health education resources for children by using semantic features to develop machine learning algorithms. A data-driven method was applied: 1000 health education articles were collected from international health organization websites and grouped into resources for kids and resources for non-kids according to their sources. Next, 73 semantic features were used to train five machine learning algorithms (decision tree, support vector machine, k-nearest neighbors algorithm, ensemble classifier, and logistic regression). The results showed that the k-nearest neighbors algorithm and the ensemble classifier outperformed the others in terms of area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy, and achieved good performance in predicting whether the readability of health education resources is suitable for children.
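
Of the metrics listed, AUC is the least self-explanatory. One standard way to compute it is the rank (Mann-Whitney) formulation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below assumes label 1 means "written for children"; the scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores (O(P*N))."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for six articles (1 = written for children).
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.5]
print(roc_auc(y_true, scores))  # 7 of the 9 positive/negative pairs are ranked correctly
```

Unlike accuracy, AUC does not depend on a single decision threshold, which makes it useful for comparing classifiers that output scores on different scales.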


2020 ◽  
pp. 1028-1041
Author(s):  
Junjie Bai ◽  
Kan Luo ◽  
Jun Peng ◽  
Jinliang Shi ◽  
Ying Wu ◽  
...  

Music emotion recognition (MER) is a challenging field of study addressed in multiple disciplines such as musicology, cognitive science, physiology, psychology, arts and affective computing. In this article, music emotions are classified into four types: pleasing, angry, sad and relaxing. MER is formulated as a classification problem in cognitive computing, where 548 dimensions of music features are extracted and modeled. A set of classification and machine learning algorithms is explored and comparatively studied for MER, including Support Vector Machine (SVM), k-Nearest Neighbors (KNN), Neuro-Fuzzy Networks Classification (NFNC), Fuzzy KNN (FKNN), Bayes classifier and Linear Discriminant Analysis (LDA). Experimental results show that the SVM, FKNN and LDA algorithms are the most effective, each obtaining more than 80% accuracy for MER.


2020 ◽  
Author(s):  
Stephanie Turgeon ◽  
Marc Lanovaz

Machine learning algorithms hold promise for revolutionizing how educators and clinicians make decisions. However, researchers in behavior analysis have been slow to adopt this methodology to further develop their understanding of human behavior and improve the application of the science to problems of applied significance. One potential explanation for the scarcity of research is that machine learning is not typically taught as part of training programs in behavior analysis. This tutorial aims to address this barrier by promoting increased research using machine learning in behavior analysis. We present how to apply the random forest, support vector machine, stochastic gradient descent, and k-nearest neighbors algorithms on a small dataset to better identify parents who would benefit from a behavior analytic interactive web training. These step-by-step applications should allow researchers to implement machine learning algorithms with novel research questions and datasets.


2020 ◽  
Vol 7 ◽  
Author(s):  
Holmes Yesid Ayala-Yaguara ◽  
Gina Maribel Valenzuela-Sabogal ◽  
Alexander Espinosa-García

This article describes the development of a data mining model applied to the problem of university dropout in the Systems Engineering program of the Universidad de Cundinamarca, Facatativá campus. The model was structured using the KDD (knowledge discovery in databases) data mining methodology, with the Python programming language, the Pandas data processing library, and the Sklearn machine learning library. The process also took into account problems beyond the mining itself, such as high dimensionality, for which the following variable selection methods were applied: univariate statistical selection, feature importance, and SelectFromModel (Sklearn). Five data mining techniques were selected for evaluation: k-nearest neighbors (KNN), decision trees (DT), random forests (RF), logistic regression (LR), and support vector machines (SVM). To select the final model, the results of each model were evaluated using precision, the confusion matrix, and additional metrics derived from the confusion matrix. Finally, the parameters of the selected model were tuned, and the model's generalization was evaluated by plotting its learning curve.
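
The model-selection step above leans on the confusion matrix. A minimal sketch of how one is built for a binary dropout problem follows; rows are true classes and columns are predicted classes, which matches the convention of scikit-learn's `sklearn.metrics.confusion_matrix`. The labels and predictions are hypothetical, and this is not the authors' code.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count how often each true label (row) was predicted as each label (column)."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Correct predictions sit on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical outcomes for five students.
labels = ["stays", "drops out"]
y_true = ["stays", "drops out", "stays", "drops out", "stays"]
y_pred = ["stays", "stays", "stays", "drops out", "drops out"]
cm = confusion_matrix(y_true, y_pred, labels)
print(cm)                        # [[2, 1], [1, 1]]
print(accuracy_from_matrix(cm))  # 0.6
```

Per-class metrics such as precision (read down a column) and recall (read across a row) can be derived from the same matrix.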


Author(s):  
Hedieh Sajedi ◽  
Mehran Bahador

In this paper, a new approach for segmentation and recognition of Persian handwritten numbers is presented. The method combines the framing feature technique with an outer profile feature, a combination we call the adapted framing feature. In the proposed approach, segmentation of numbers into digits is carried out automatically. In the classification stage, Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN) are used. Experiments are conducted on the IFHCDB database, consisting of 17,740 numeral images, and the HODA database, consisting of 102,352 numeral images. At the isolated-digit level, a recognition rate of 99.27% is achieved on IFHCDB and 99.07% on HODA, both using SVM with a polynomial kernel. The experiments show that the proposed method achieves higher accuracy than previous research.
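
Both reported results use an SVM with a polynomial kernel. The sketch below shows what that kernel computes, assuming the common form K(x, z) = (x · z + c)^d; the degree and constant used in the paper are not given here, so d = 2 and c = 1 are hypothetical. For 2-D inputs it checks that the kernel equals an inner product in an explicit six-dimensional feature space, the usual explanation of how an SVM learns curved decision boundaries without computing that mapping.

```python
import math


def poly_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel (x . z + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, z)) + coef0) ** degree


def phi(v):
    """Explicit feature map matching poly_kernel(degree=2, coef0=1) for 2-D input."""
    x1, x2 = v
    r2 = math.sqrt(2)
    return [x1 * x1, x2 * x2, r2 * x1 * x2, r2 * x1, r2 * x2, 1.0]


x, z = (1.0, 2.0), (3.0, 4.0)
implicit = poly_kernel(x, z)                           # (11 + 1) ** 2, no mapping needed
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))  # same value via the mapping
print(implicit, explicit)
```

The two values agree up to floating-point rounding; for higher degrees or longer inputs the explicit mapping grows quickly, while the kernel stays a single dot product.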

