A Machine Learning Approach for One-Stop Learning

Author(s):  
Marco A. Alvarez ◽  
SeungJin Lim

Current search engines impose an overhead on motivated students and Internet users who employ the Web as a valuable resource for education. The user, searching for good educational materials on a technical subject, often spends extra time filtering out irrelevant pages or ends up with commercial advertisements. It would be ideal if, given a technical subject by a user who is educationally motivated, suitable materials with respect to the given subject were automatically identified by affordable machine processing of the recommendation set returned by a search engine for the subject. In this scenario, the user can save a significant amount of time in filtering out less useful Web pages, and subsequently the user’s learning goal on the subject can be achieved more efficiently without clicking through numerous pages. This type of convenient learning is called One-Stop Learning (OSL). In this paper, the contributions made by Lim and Ko (Lim and Ko, 2006) for OSL are redefined and modeled using machine learning algorithms. Four selected supervised learning algorithms, Support Vector Machine (SVM), AdaBoost, Naive Bayes and Neural Networks, are evaluated using the same data used in (Lim and Ko, 2006). The results presented in this paper are promising: the highest precision (98.9%) and overall accuracy (96.7%), obtained using SVM, are superior to the results presented by Lim and Ko. Furthermore, the machine learning approach presented here demonstrates that the small set of features used to represent each Web page yields a good solution for the OSL problem.

2018 ◽  
Vol 7 (2.8) ◽  
pp. 684 ◽  
Author(s):  
V V. Ramalingam ◽  
Ayantan Dandapath ◽  
M Karthik Raja

Heart-related diseases, or Cardiovascular Diseases (CVDs), have been the main cause of a huge number of deaths worldwide over the last few decades and have emerged as the most life-threatening diseases, not only in India but in the whole world. There is therefore a need for a reliable, accurate and feasible system to diagnose such diseases in time for proper treatment. Machine Learning algorithms and techniques have been applied to various medical datasets to automate the analysis of large and complex data. Many researchers, in recent times, have been using several machine learning techniques to help the health care industry and its professionals in the diagnosis of heart-related diseases. This paper presents a survey of various models based on such algorithms and techniques and analyzes their performance. Models based on supervised learning algorithms such as Support Vector Machines (SVM), K-Nearest Neighbour (KNN), Naïve Bayes, Decision Trees (DT), Random Forest (RF) and ensemble models are found to be very popular among researchers.


PLoS ONE ◽  
2020 ◽  
Vol 15 (11) ◽  
pp. e0241239
Author(s):  
Kai On Wong ◽  
Osmar R. Zaïane ◽  
Faith G. Davis ◽  
Yutaka Yasui

Background Canada is an ethnically diverse country, yet the lack of ethnicity information in many large databases impedes effective population research and interventions. Automated ethnicity classification using machine learning has shown potential to address this data gap, but its performance in Canada is largely unknown. This study developed a large-scale machine learning framework to predict ethnicity using a novel set of name and census location features. Methods Using the 1901 census, multiclass and binary classification machine learning pipelines were developed. The 13 ethnic categories examined were Aboriginal (First Nations, Métis, Inuit, and all-combined), Chinese, English, French, Irish, Italian, Japanese, Russian, Scottish, and others. Machine learning algorithms included regularized logistic regression, C-support vector, and naïve Bayes classifiers. Name features consisted of the entire name string, substrings, double-metaphones, and various name-entity patterns, while location features consisted of the entire location string and substrings of province, district, and subdistrict. Predictive performance metrics included sensitivity, specificity, positive predictive value, negative predictive value, F1, Area Under the Curve for the Receiver Operating Characteristic curve, and accuracy. Results The census had 4,812,958 unique individuals. For multiclass classification, the highest performance achieved was 76% F1 and 91% accuracy. For binary classifications for Chinese, French, Italian, Japanese, Russian, and others, the F1 ranged from 68% to 95% (median 87%). The lower performance for English, Irish, and Scottish (F1 ranged from 63% to 67%) was likely due to their shared cultural and linguistic heritage. Adding census location features to the name-based models strongly improved the prediction in Aboriginal classification (F1 increased from 50% to 84%).
Conclusions The automated machine learning approach using only name and census location features can predict the ethnicity of Canadians with varying performance by specific ethnic categories.
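
The threshold-based metrics named in the abstract above (sensitivity, specificity, positive and negative predictive value, F1, and accuracy) all follow from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' pipeline; the function name `binary_metrics` and the counts in the usage lines are hypothetical, and the area under the ROC curve is left out because it needs ranked scores rather than counts.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    sensitivity: float  # recall / true positive rate
    specificity: float  # true negative rate
    ppv: float          # positive predictive value (precision)
    npv: float          # negative predictive value
    f1: float           # harmonic mean of PPV and sensitivity
    accuracy: float


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 for an empty denominator instead of raising ZeroDivisionError.
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    """Compute standard metrics from confusion-matrix counts (hypothetical helper)."""
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    ppv = _ratio(tp, tp + fp)
    npv = _ratio(tn, tn + fn)
    f1 = _ratio(2 * ppv * sensitivity, ppv + sensitivity)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    return BinaryMetrics(sensitivity, specificity, ppv, npv, f1, accuracy)


# Made-up counts for a minority class: 100 positives among 1000 records.
m = binary_metrics(tp=80, fp=20, tn=880, fn=20)
print(m)
```

With these made-up counts the accuracy is higher than the F1 because the many true negatives count toward accuracy but not toward F1. That is one general reason the two figures can diverge on imbalanced classes; whether it explains the gap reported in the abstract is not something this sketch can establish.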


2021 ◽  
Vol 11 (21) ◽  
pp. 9927
Author(s):  
Qiuying Chen ◽  
SangJoon Lee

Health authorities have recommended the use of digital tools for home workouts to stay active and healthy during the COVID-19 pandemic. In this paper, a machine learning approach is proposed to assess the activity of users on a home workout platform. Keep is a home workout application dedicated to providing one-stop exercise solutions such as fitness teaching, cycling, running, yoga, and fitness diet guidance. We used a data crawler to collect the training set data of 7734 Keep users and compared four supervised learning algorithms: support vector machine, k-nearest neighbor, random forest, and logistic regression. The receiver operating characteristic (ROC) curve analysis indicated that the overall discriminative power of random forest was better than that of the other three models. The random forest model was used to classify 850 test samples, and an accuracy of 88% was obtained. This approach can predict the continuous usage of users after installing the home workout application. We considered 18 variables on Keep that were expected to affect the determination of continuous participation. Keep certification is the most important variable that affected the results of this study. Keep certification refers to someone who has verified their identity information and can, therefore, obtain the Keep certification logo. The results show that the platform still needs to be improved in terms of real identity privacy information and other aspects.


2021 ◽  
Vol 2115 (1) ◽  
pp. 012042
Author(s):  
S Premanand ◽  
Sathiya Narayanan

Abstract The primary objective of this paper is to classify health-related data with Machine Learning without feature extraction, which can hinder performance and reliability. Our working question is whether Tree-based Machine Learning algorithms can achieve better results on health-related data without extracting features, as is the case in Deep Learning. This study achieves better classification of health-related medical data with a Tree-based Machine Learning approach. After pre-processing, and without feature extraction (i.e., working from the raw data signal), Machine Learning algorithms yield better results. The presented approach gives better results even when compared to some advanced Deep Learning architectures. The results demonstrate that the overall classification accuracy of the Tree-based Machine Learning algorithms Random Forest, XGBoost, LightGBM and CatBoost for normal and abnormal conditions in the datasets was 97.88%, 98.23%, 98.03% and 95.57%, respectively.


Author(s):  
Erick Omuya ◽  
George Okeyo ◽  
Michael Kimwele

Social media has been embraced by different people as a convenient and official medium of communication. People write messages and attach images and videos, which they share on Twitter, Facebook and other social media. Social media therefore generates a lot of data that is rich in sentiments from these updates. Sentiment analysis has been used to determine the opinions of clients, for instance, relating to a particular product or company. Knowledge-based and machine learning approaches are among the strategies that have been used to analyze these sentiments. The performance of sentiment analysis is, however, distorted by noise, the curse of dimensionality, the data domains and the size of the data used for training and testing. This research aims at developing a model for sentiment analysis in which dimensionality reduction and the use of different parts of speech improve sentiment analysis performance. It uses natural language processing for filtering, storing and performing sentiment analysis on the data from social media. The model is tested using Naïve Bayes, Support Vector Machines and K-Nearest Neighbor machine learning algorithms, and its performance is compared with that of two other sentiment analysis models. Experimental results show that the model improves sentiment analysis performance using machine learning techniques.


Large-scale cyber security attacks by encryption ransomware currently affect countries and businesses worldwide, with millions of dollars lost in ransom payments. This type of malware encrypts user files, extracts user files, and demands high ransoms for the decryption keys. An attacker can use different types of ransomware to steal a victim's files. Examples of ransomware attacks include Scareware, Mobile ransomware, WannaCry, CryptoLocker, and Zero-Day ransomware attacks. A zero-day vulnerability is a software security flaw that is known to the software vendor but does not yet have a patch in place to fix it. Machine learning algorithms are already used to detect encryption ransomware. This work, based on the analysis of a large number of PE file samples (benign software and ransomware), uses supervised machine learning algorithms to detect zero-day attacks. The work was carried out and evaluated on the Microsoft Windows operating system (the OS most attacked by encryption ransomware). We used four supervised learning algorithms: Random Forest Classifier, K-Nearest Neighbor, Support Vector Machine and Logistic Regression. Tests using these machine learning algorithms show almost no false positives, with 99.5% accuracy for the random forest algorithm.


2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Absalom E. Ezugwu ◽  
Ibrahim Abaker Targio Hashem ◽  
Olaide N. Oyelade ◽  
Mubarak Almutari ◽  
Mohammed A. Al-Garadi ◽  
...  

The spread of COVID-19 worldwide continues despite multidimensional efforts to curtail its spread and provide treatment. Efforts to contain the COVID-19 pandemic have triggered partial or full lockdowns across the globe. This paper presents a novel framework that intelligently combines machine learning models and the Internet of Things (IoT) technology specifically to combat COVID-19 in smart cities. The purpose of the study is to promote the interoperability of machine learning algorithms with IoT technology by interacting with a population and its environment to curtail the COVID-19 pandemic. Furthermore, the study investigates and discusses some solution frameworks, which can generate, capture, store, and analyze data using machine learning algorithms. These algorithms can detect, prevent, and trace the spread of COVID-19 and provide a better understanding of the disease in smart cities. Similarly, the study outlines case studies on the application of machine learning to help fight against COVID-19 in hospitals worldwide. The framework proposed in the study is a comprehensive presentation of the major components needed to integrate the machine learning approach with other AI-based solutions. Finally, the machine learning framework presented in this study has the potential to help national healthcare systems in curtailing the COVID-19 pandemic in smart cities. In addition, the proposed framework is positioned as a pointer for generating research interest that would yield outcomes capable of being integrated to form an improved framework.


Risks ◽  
2021 ◽  
Vol 9 (3) ◽  
pp. 50
Author(s):  
Apostolos Ampountolas ◽  
Titus Nyarko Nde ◽  
Paresh Date ◽  
Corina Constantinescu

In micro-lending markets, lack of recorded credit history is a significant impediment to assessing individual borrowers’ creditworthiness and therefore deciding fair interest rates. This research compares various machine learning algorithms on real micro-lending data to test their efficacy at classifying borrowers into various credit categories. We demonstrate that off-the-shelf multi-class classifiers such as random forest algorithms can perform this task very well, using readily available data about customers (such as age, occupation, and location). This offers micro-lending institutions around the developing world an inexpensive and reliable means of assessing creditworthiness in the absence of credit history or central credit databases.


2020 ◽  
Vol 10 (6) ◽  
pp. 6589-6596
Author(s):  
H. Al-Dossari ◽  
F. A. Nughaymish ◽  
Z. Al-Qahtani ◽  
M. Alkahlifah ◽  
A. Alqahtani

Enterprises rely more and more on well-qualified and highly specialized IT professionals. Although the increasing availability of IT jobs is a good indicator for IT graduates, they nonetheless may find themselves confused about the most appropriate career for their future. In this paper, a recommendation system called CareerRec is proposed, which uses machine learning algorithms to help IT graduates select a career path based on their skills. CareerRec was trained and tested using a dataset of 2255 employees in the IT sector in Saudi Arabia. We conducted a performance comparison between five machine learning algorithms to assess their accuracy for predicting the best-suited career path among 3 classes. Our experiments demonstrate that the XGBoost algorithm outperforms other models and gives the highest accuracy (70.47%).


Author(s):  
Ong Vienna Lee ◽  
Ahmad Heryanto ◽  
Mohd Faizal Ab Razak ◽  
Anis Farihan Mat Raffei ◽  
Danakorn Nincarean Eh Phon ◽  
...  

The openness of the World Wide Web (Web) has left it more exposed to cyber-attacks. Attackers carry out cyber-attacks on the Web using malicious Uniform Resource Locators (URLs), since URLs are widely used by internet users. Therefore, an effective approach is required to detect malicious URLs and identify the nature of their attacks. This study aims to assess the efficiency of the machine learning approach in detecting and identifying malicious URLs. In this study, we applied feature optimization using a bio-inspired algorithm to select significant URL features that are able to detect malicious URLs. A machine learning approach with a static analysis technique is used for detecting malicious URLs. Based on this combination and the significant features, this paper shows promising results with higher detection accuracy. The bio-inspired algorithm, particle swarm optimization (PSO), is used to optimize the URL features. In detecting malicious URLs, the results show that naïve Bayes and support vector machine (SVM) are able to achieve a high detection accuracy of 99%, using the URL as a feature.
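
Static analysis of a URL means deriving features from the URL string itself, without fetching the page. The sketch below illustrates what such lexical features might look like using only the Python standard library. The feature names and the example URL are assumptions made for illustration and are not the feature set used in the study; the PSO-based feature selection and the classifiers are not shown.

```python
import ipaddress
from urllib.parse import urlsplit


def _is_ip_literal(host: str) -> bool:
    # True when the host is a raw IPv4/IPv6 address rather than a domain name.
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> dict:
    """Extract hypothetical lexical features from a URL string (no network access)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "length": len(url),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parts.scheme == "https",
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "host_is_ip": _is_ip_literal(host),
    }


# Made-up example URL; the resulting dict could be fed to any classifier.
print(url_features("http://192.168.0.1/login/update.php?id=1"))
```

A feature-selection step, such as the PSO the abstract mentions, would then choose which of these columns to keep before training a classifier.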

