Dynamic Nearest Neighbor: An Improved Machine Learning Classifier and Its Application in Finances

2021
Vol 11 (19)
pp. 8884
Author(s):
Oscar Camacho-Urriolagoitia
Itzamá López-Yáñez
Yenny Villuendas-Rey
Oscar Camacho-Nieto
Cornelio Yáñez-Márquez

Machine learning, data mining, and related disciplines are increasingly present in everyday environments. Applying learning techniques to economic risk assessment, among other financial topics of interest, is therefore highly relevant. This paper proposes a new supervised learning algorithm, D1-NN (Dynamic 1-Nearest Neighbor), and applies it to real-world datasets related to finance. The D1-NN performance is competitive against the main state-of-the-art algorithms in solving finance-related problems. The effectiveness of the new D1-NN classifier was compared against five supervised classifiers from the most important approaches (Bayes, nearest neighbors, support vector machines, classifier ensembles, and neural networks), with superior results overall.
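The classical 1-NN rule that D1-NN builds on can be sketched in plain Python. This is a minimal illustration with made-up data and labels; the "dynamic" mechanism is the paper's contribution and is not reproduced here.

```python
import math

def one_nn_predict(train_X, train_y, x):
    """Classify x by the label of its single nearest training point
    (Euclidean distance) -- the plain 1-NN rule."""
    best_i = min(range(len(train_X)),
                 key=lambda i: math.dist(train_X[i], x))
    return train_y[best_i]

# Hypothetical data: two risk classes separable in feature space.
train_X = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
train_y = ["low_risk", "low_risk", "high_risk", "high_risk"]

print(one_nn_predict(train_X, train_y, (0.1, 0.0)))   # low_risk
print(one_nn_predict(train_X, train_y, (0.95, 1.0)))  # high_risk
```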

2021
pp. 1-17
Author(s):
Ahmed Al-Tarawneh
Ja’afer Al-Saraireh

Twitter is one of the most popular platforms used to share and post ideas. Hackers and anonymous attackers use these platforms maliciously, and their behavior can be used to predict the risk of future attacks by gathering and classifying hackers’ tweets using machine-learning techniques. Previous approaches for detecting infected tweets rely on human effort or text analysis and are thus limited in capturing the hidden meaning between tweet lines. The main aim of this research paper is to enhance the efficiency of hacker detection on the Twitter platform using complex network techniques with adapted machine learning algorithms. This work presents a methodology that collects a list of users, together with their followers, who share posts with similar interests within a hackers’ community on Twitter. The list is built from a set of suggested keywords that are the terms commonly used by hackers in their tweets. A complex network is then generated for all users to find relations among them in terms of network centrality, closeness, and betweenness. After extracting these values, a dataset of the most influential users in the hacker community is assembled. Subsequently, tweets belonging to users in the extracted dataset are gathered and classified into positive and negative classes. The output of this process is used in a machine learning pipeline applying different algorithms. This research builds and investigates an accurate dataset containing real users who belong to a hackers’ community. Correctly classified instances were measured for accuracy using the average values of the k-nearest neighbor, Naive Bayes, Random Tree, and support vector machine techniques, demonstrating about 90% and 88% accuracy for cross-validation and percentage split, respectively. Consequently, the proposed network cyber Twitter model is able to detect hackers and determine whether tweets pose a risk to institutions and individuals, providing early warning of possible future attacks.
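The centrality step can be illustrated with a minimal closeness-centrality computation on an unweighted graph. The follower graph, usernames, and edges below are hypothetical; the paper's full pipeline also computes betweenness.

```python
from collections import deque

def closeness(graph, node):
    """Closeness centrality: (number of reachable nodes) divided by the
    sum of shortest-path distances to them, computed by BFS."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) < 2:
        return 0.0
    return (len(dist) - 1) / sum(d for d in dist.values() if d > 0)

# Hypothetical follower graph: user "a" is connected to everyone else.
graph = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a"], "d": ["a"]}
scores = {u: closeness(graph, u) for u in graph}
print(max(scores, key=scores.get))  # "a" -- the most central user
```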


2021
Author(s):
Praveeen Anandhanathan
Priyanka Gopalan

Abstract Coronavirus disease (COVID-19) is spreading across the world. Since it first appeared in Wuhan, China, in December 2019, it has become a serious issue across the globe. There are no accurate resources to predict and detect the disease, so knowledge of past patients’ records could guide clinicians in fighting the pandemic. Machine learning techniques can therefore be implemented to predict health status from symptoms. Here we analyse only the symptoms that occur in every patient. These predictions can help clinicians cure patients more easily. Techniques such as SVM (Support Vector Machine), Fuzzy k-Means Clustering, the Decision Tree algorithm, the Random Forest method, ANN (Artificial Neural Network), KNN (k-Nearest Neighbour), Naïve Bayes, and the Linear Regression model are already used for the prediction of many diseases. As we have not faced this disease before, we cannot say which technique will give the maximum accuracy. So, we provide an efficient result by comparing all such algorithms in RStudio.


2021
Vol 2021
pp. 1-10
Author(s):
Choudhary Sobhan Shakeel
Saad Jawaid Khan
Beenish Chaudhry
Syeda Fatima Aijaz
Umer Hassan

Alopecia areata is defined as an autoimmune disorder that results in hair loss. The latest worldwide statistics have exhibited that alopecia areata has a prevalence of 1 in 1000 and an incidence of 2%. Machine learning techniques have demonstrated potential in different areas of dermatology and may play a significant role in classifying alopecia areata for better prediction and diagnosis. We propose a framework pertaining to the classification of healthy hairs and alopecia areata. We used 200 images of healthy hairs from the Figaro1k dataset and 68 hair images of alopecia areata from the Dermnet dataset to undergo image preprocessing including enhancement and segmentation. This was followed by feature extraction including texture, shape, and color. Two classification techniques, i.e., support vector machine (SVM) and k-nearest neighbor (KNN), were then applied to train a machine learning model with 70% of the images. The remaining image set was used for the testing phase. With 10-fold cross-validation, the reported accuracies of SVM and KNN are 91.4% and 88.9%, respectively. A paired-sample t-test showed a significant difference between the two accuracies, with p < 0.001; SVM thus generated higher accuracy (91.4%) than KNN (88.9%). The findings of our study demonstrate potential for better prediction in the field of dermatology.
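The classification stage can be sketched as a toy KNN majority vote with a hold-out split. The feature values and the split below are made up for illustration; the paper's features come from texture, shape, and color extraction over the real image sets.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Hypothetical (contrast, mean_hue) features for two classes.
X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.8, 0.9), (0.9, 0.8), (0.85, 0.95)]
y = ["healthy", "healthy", "healthy", "alopecia", "alopecia", "alopecia"]

# Simple hold-out split standing in for the paper's 70/30 protocol.
train_X, test_X = X[:2] + X[3:5], [X[2], X[5]]
train_y, test_y = y[:2] + y[3:5], [y[2], y[5]]
acc = sum(knn_predict(train_X, train_y, x) == t
          for x, t in zip(test_X, test_y)) / len(test_y)
print(acc)  # 1.0 on this toy split
```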


Author(s):
Prince Golden
Kasturi Mojesh
Lakshmi Madhavi Devarapalli
Pabbidi Naga Suba Reddy
Srigiri Rajesh
...

In this era of Cloud Computing and Machine Learning, where almost every kind of work is automated through machine learning techniques running on cloud servers to complete it more efficiently and quickly, what needs to be addressed is how we are changing our education systems and minimizing their problems with all the advancements in technology. One of the prominent issues facing students has always been graduate admissions and the colleges they should apply to. Deciding which universities or colleges to apply to based on undergraduate marks has always been difficult, as applying to a number of universities at once is not only tedious and time-consuming but also expensive. Thus, many machine learning solutions have emerged in recent years to tackle this problem and provide predictions, estimates, and consultancy so that students can easily decide to apply to the universities offering them a higher chance of admission. In this paper, we review the machine learning techniques that are prevalent and provide accurate predictions regarding university admissions. We compare different regression models and machine learning methodologies used by other authors in their works, such as Random Forest, Linear Regression, Stacked Ensemble Learning, Support Vector Regression, Decision Trees, and KNN (K-Nearest Neighbor), and try to reach a conclusion as to which technique provides better accuracy.
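As a minimal example of the regression models compared, ordinary least squares with a single predictor fits in a few lines of plain Python. The score/chance pairs below are hypothetical; real admission models use many predictors.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: chance = a + b * score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b  # intercept, slope

# Hypothetical (test score, admission chance) pairs.
scores = [300, 310, 320, 330, 340]
chance = [0.45, 0.55, 0.65, 0.75, 0.85]
a, b = fit_line(scores, chance)
print(round(a + b * 325, 2))  # ~0.70 predicted chance for a score of 325
```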


2021
Vol 2021
pp. 1-8
Author(s):
Shaker El-Sappagh
Tamer Abuhmed
Bader Alouffi
Radhya Sahal
Naglaa Abdelhade
...

Early detection of Alzheimer’s disease (AD) progression is crucial for proper disease management. Most studies concentrate on neuroimaging data from baseline visits only, ignoring the fact that AD is a chronic disease and patients’ data are naturally longitudinal. In addition, no studies examine the effect of dementia medicines on the behavior of the disease. In this paper, we propose a machine learning-based architecture for early detection of AD progression based on multimodal data combining AD drug information and cognitive scores. We compare the performance of five popular machine learning techniques, including support vector machine, random forest, logistic regression, decision tree, and K-nearest neighbor, to predict AD progression after 2.5 years. Extensive experiments were performed using an ADNI dataset of 1036 subjects. The cross-validation performance of most algorithms improved when the drug and cognitive-score data were fused. The results indicate the important role of the drugs patients take in the progression of AD.
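The cross-validation protocol used to compare the five classifiers can be sketched as a plain k-fold index generator. This is a generic illustration, not the authors' code; their experiments use the 1036 ADNI subjects.

```python
def kfold_indices(n, k=5):
    """Yield (train, test) index lists for k-fold cross-validation,
    assigning samples to folds round-robin."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for t in range(k):
        test = folds[t]
        train = [i for f in range(k) if f != t for i in folds[f]]
        yield train, test

# Each sample lands in exactly one test fold across the 5 splits.
splits = list(kfold_indices(10, k=5))
print(len(splits))           # 5
print(sorted(splits[0][1]))  # [0, 5]
```

A model would be trained on each `train` list and scored on the matching `test` list, then the k scores averaged.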


2020
pp. 1577-1597
Author(s):
Kusuma Mohanchandra
Snehanshu Saha

Machine learning techniques are a crucial tool for building analytical models in EEG data analysis. These models are an excellent choice for analyzing the high variability of EEG signals. The advancement of EEG-based Brain-Computer Interfaces (BCI) demands advanced processing tools and algorithms for the exploration of EEG signals. In the context of EEG-based BCI for speech communication, a few classification and clustering techniques are presented in this book chapter. A broad perspective on the techniques and implementation of the weighted k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF) is given, and their usage in EEG signal analysis is described. We suggest that these machine learning techniques provide not only a potentially valuable control mechanism for BCI but also a deeper understanding of the neuropathological mechanisms underlying the brain in ways that are not possible with conventional linear analysis.


Author(s):
Muzaffer Kanaan
Rüştü Akay
Canset Koçer Baykara

The use of technology to improve crop yields, the quality and quantity of the harvest, and the protection of crops against adverse environmental elements (such as rodent or insect infestation and microbial disease agents) is becoming more critical for farming practice worldwide. One of the most promising technology areas here is artificial intelligence, or more specifically, machine learning techniques. This chapter aims to give the reader an overview of how machine learning techniques can help solve the problem of monitoring crop quality and identifying disease. The fundamental principles are illustrated through two different case studies, one involving the use of artificial neural networks for harvested grain condition monitoring and the other concerning crop disease identification using support vector machines and the k-nearest neighbor algorithm.


Author(s):
Wonju Seo
You-Bin Lee
Seunghyun Lee
Sang-Man Jin
Sung-Min Park

Abstract Background For an effective artificial pancreas (AP) system and improved therapeutic intervention with continuous glucose monitoring (CGM), accurately predicting the occurrence of hypoglycemia is very important. While many studies have reported successful algorithms for predicting nocturnal hypoglycemia, predicting postprandial hypoglycemia remains a challenge due to the extreme glucose fluctuations that occur around mealtimes. The goal of this study is to evaluate the feasibility of easy-to-use, computationally efficient machine-learning algorithms for predicting postprandial hypoglycemia with a unique feature set. Methods We used retrospective CGM datasets of 104 people who had experienced at least one hypoglycemia alert value during a three-day CGM session. The algorithms were developed based on four machine learning models with a unique data-driven feature set: a random forest (RF), a support vector machine using a linear or radial basis function, a K-nearest neighbor, and a logistic regression. With 5-fold cross-subject validation, the average performance of each model was calculated for comparison. The area under the receiver operating characteristic curve (AUC) and the F1 score were used as the main criteria for evaluating performance. Results In predicting a hypoglycemia alert value with a 30-min prediction horizon, the RF model showed the best performance, with an average AUC of 0.966, an average sensitivity of 89.6%, an average specificity of 91.3%, and an average F1 score of 0.543. In addition, the RF showed better predictive performance for postprandial hypoglycemic events than the other models.
Conclusion We showed that machine-learning algorithms have potential for predicting postprandial hypoglycemia, and that the RF model is a promising candidate for further development of postprandial hypoglycemia prediction algorithms to advance CGM and AP technology.
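The AUC criterion can be computed directly from labels and classifier scores via the rank (Mann-Whitney) formulation. This is a generic sketch with made-up risk scores, not the study's data.

```python
def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative one
    (ties count as half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical hypoglycemia-risk scores from a classifier.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.7, 0.3, 0.2]
print(auc(labels, scores))  # 5 of 6 positive-negative pairs ranked correctly
```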


2006
Vol 15 (05)
pp. 823-838
Author(s):
EFSTATHIOS STAMATATOS

Authorship attribution can assist criminal investigation as well as cybercrime analysis. The task can be viewed as a single-label multi-class text categorization problem. Given that the style of a text can be represented as mere word frequencies selected by a language-independent method, suitable machine learning techniques able to deal with high-dimensional feature spaces and sparse data can be directly applied to solve this problem. This paper focuses on classifier ensembles based on feature set subspacing. It is shown that an effective ensemble can be constructed using exhaustive disjoint subspacing, a simple method producing many poor but diverse base classifiers. The simple model can be enhanced by a variation of the technique of cross-validated committees applied to the feature set. Experiments on two benchmark text corpora demonstrate the effectiveness of the presented method, improving previously reported results, and compare it to support vector machines, an alternative machine learning approach well suited to authorship attribution.
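The idea of disjoint feature-set subspacing can be sketched as follows: split the feature set into disjoint blocks, train one weak base classifier per block, and combine votes by simple majority. This toy uses hypothetical word-frequency profiles and 1-NN base classifiers; the paper's base learners and feature selection differ.

```python
import math
from collections import Counter

def subspace_ensemble(train_X, train_y, x, n_subspaces=3):
    """Disjoint subspacing sketch: one 1-NN base classifier per disjoint
    feature block, combined by majority vote."""
    d = len(x)
    votes = []
    for s in range(n_subspaces):
        idx = range(s, d, n_subspaces)             # disjoint feature block
        proj = lambda v: [v[i] for i in idx]
        best = min(range(len(train_X)),
                   key=lambda i: math.dist(proj(train_X[i]), proj(x)))
        votes.append(train_y[best])
    return Counter(votes).most_common(1)[0][0]

# Hypothetical 6-dimensional word-frequency profiles for two authors.
X = [[0.9, 0.1, 0.8, 0.2, 0.7, 0.1], [0.1, 0.9, 0.2, 0.8, 0.1, 0.9]]
y = ["author_A", "author_B"]
print(subspace_ensemble(X, y, [0.8, 0.2, 0.7, 0.1, 0.8, 0.2]))  # author_A
```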


2021
Vol 11 (5)
pp. 343
Author(s):
Fabiana Tezza
Giulia Lorenzoni
Danila Azzolina
Sofia Barbar
Lucia Anna Carmela Leone
...

The present work aims to identify the predictors of COVID-19 in-hospital mortality by testing a set of Machine Learning Techniques (MLTs) and comparing their ability to predict the outcome of interest. The model with the best performance was used to identify in-hospital mortality predictors and to build an in-hospital mortality prediction tool. The study involved patients with COVID-19, confirmed by PCR test, admitted to the “Ospedali Riuniti Padova Sud” COVID-19 referral center in the Veneto region, Italy. The algorithms considered were the Recursive Partition Tree (RPART), the Support Vector Machine (SVM), the Gradient Boosting Machine (GBM), and Random Forest. The resampled performances were reported for each MLT, considering sensitivity, specificity, and Receiver Operating Characteristic (ROC) curve measures. The study enrolled 341 patients. The median age was 74 years, and males were the most prevalent. The Random Forest algorithm outperformed the other MLTs in predicting in-hospital mortality, with a ROC of 0.84 (95% C.I. 0.78–0.90). Age, together with vital signs (oxygen saturation and the quick SOFA) and lab parameters (creatinine, AST, lymphocytes, platelets, and hemoglobin), were found to be the strongest predictors of in-hospital mortality. The present work provides insights into the prediction of in-hospital mortality of COVID-19 patients using a machine-learning algorithm.
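The sensitivity and specificity measures reported above come straight from a binary confusion matrix, as in this minimal sketch (the labels and predictions are made up for illustration, not the study's data).

```python
def sens_spec(labels, preds):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP)
    for binary labels (1 = died in hospital, 0 = survived)."""
    tp = sum(l == 1 and p == 1 for l, p in zip(labels, preds))
    tn = sum(l == 0 and p == 0 for l, p in zip(labels, preds))
    fn = sum(l == 1 and p == 0 for l, p in zip(labels, preds))
    fp = sum(l == 0 and p == 1 for l, p in zip(labels, preds))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical outcomes vs. model predictions.
labels = [1, 1, 1, 0, 0, 0, 0]
preds  = [1, 1, 0, 0, 0, 0, 1]
print(sens_spec(labels, preds))  # (0.666..., 0.75)
```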

