Dissimilarity space reinforced with manifold learning and latent space modeling for improved pattern classification

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Azadeh Rezazadeh Hamedani ◽  
Mohammad Hossein Moattar ◽  
Yahya Forghani

Abstract
Dissimilarity representation plays an important role in pattern recognition due to its ability to capture structural and relational information between samples. Dissimilarity space embedding is an approach in which each sample is represented as a vector of its dissimilarities to a set of other samples called prototypes. However, the lack of neighborhood preservation and the fixed, usually large prototype set shared by all training samples cause low classification accuracy and high computational complexity. To address these challenges, the proposed method creates the dissimilarity space from the neighbors of each data point on the manifold. For this purpose, Locally Linear Embedding (LLE) is used as an unsupervised manifold learning algorithm. The only goal of this step is to learn the global structure and the neighborhood of the data on the manifold; no mapping or dimension reduction is performed. To create the dissimilarity space, each sample is compared only with its prototype set, namely its k-nearest neighbors on the manifold, using the geodesic distance metric. The geodesic distance preserves structure and is computed on the weighted LLE neighborhood graph. Finally, a Latent Space Model (LSM) is applied to reduce the dimensionality of the Euclidean latent space, resolving the second challenge. To evaluate the resulting dissimilarity-space representation, two common classifiers, K-Nearest Neighbor (KNN) and Support Vector Machine (SVM), are applied. Experiments on datasets covering both Euclidean and non-Euclidean spaces demonstrate that, with the proposed approach, the classifiers outperform the other baseline dissimilarity spaces in both accuracy and runtime.
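The core construction, dissimilarity vectors built from each sample's nearest neighbors under a graph-based geodesic distance, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the swiss-roll data, the value k = 8, and the use of Dijkstra shortest paths over a plain kNN distance graph (rather than the weighted LLE graph) are all assumptions for the sketch.

```python
# Sketch: dissimilarity vectors from k-nearest manifold neighbors, with
# geodesic distances approximated by shortest paths on a kNN graph.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

X, _ = make_swiss_roll(n_samples=200, random_state=0)
k = 8  # illustrative neighborhood size

# Weighted kNN graph: edge weights are Euclidean distances between neighbors.
graph = kneighbors_graph(X, n_neighbors=k, mode="distance")

# Geodesic distance ~ shortest path along the neighborhood graph.
geo = shortest_path(graph, method="D", directed=False)

# Each sample's dissimilarity vector: geodesic distances to its k nearest
# points on the manifold (column 0 of the argsort is the sample itself).
neighbor_idx = np.argsort(geo, axis=1)[:, 1:k + 1]
dissim = np.take_along_axis(geo, neighbor_idx, axis=1)
print(dissim.shape)  # (200, 8)
```

Each row of `dissim` is one sample's representation in the (per-sample) dissimilarity space, which a KNN or SVM classifier could then consume.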

2021 ◽  
pp. 1-17
Author(s):  
Ahmed Al-Tarawneh ◽  
Ja’afer Al-Saraireh

Twitter is one of the most popular platforms used to share and post ideas. Hackers and anonymous attackers use these platforms maliciously, and their behavior can be used to predict the risk of future attacks by gathering and classifying hackers' tweets with machine-learning techniques. Previous approaches for detecting infected tweets are based on human effort or text analysis and are thus limited in capturing the hidden meaning between tweet lines. The main aim of this research paper is to enhance the efficiency of hacker detection on the Twitter platform using complex-network techniques with adapted machine learning algorithms. This work presents a methodology that collects a list of users, together with their followers, who share posts with similar interests from a hackers' community on Twitter. The list is built from a set of suggested keywords that are terms commonly used by hackers in their tweets. A complex network is then generated for all users to find relations among them in terms of network centrality, closeness, and betweenness. After extracting these values, a dataset of the most influential users in the hacker community is assembled. Subsequently, tweets belonging to users in the extracted dataset are gathered and classified into positive and negative classes. The output of this process is fed into a machine learning process that applies different algorithms. This research builds and investigates an accurate dataset containing real users who belong to a hackers' community. Correctly classified instances were measured for accuracy using the average values of the K-nearest neighbor, Naive Bayes, Random Tree, and support vector machine techniques, demonstrating about 90% and 88% accuracy for cross-validation and percentage split, respectively. Consequently, the proposed network cyber Twitter model is able to detect hackers and determine whether tweets pose a risk to institutions and individuals, providing early warning of possible attacks.
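The centrality-extraction step can be sketched in plain Python; the follower edge list below is invented for illustration and is not from the paper's data.

```python
# Sketch: degree and closeness centrality for a small undirected
# follower graph, computed from scratch with breadth-first search.
from collections import deque

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def bfs_dists(src):
    """Hop distances from src to every reachable node."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

n = len(adj)
degree = {u: len(adj[u]) / (n - 1) for u in adj}                 # normalized degree
closeness = {u: (n - 1) / sum(bfs_dists(u).values()) for u in adj}

# "c" bridges the two sides of this toy graph, so it ranks highest on both
# measures; such high-centrality users would feed the influential-user dataset.
print(max(degree, key=degree.get), max(closeness, key=closeness.get))
```

Betweenness centrality (the third measure the paper uses) follows the same graph representation but additionally counts how many shortest paths pass through each node.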


Stock trading has been one of the most important parts of the financial world for decades. People investing in the share market analyze the financial history of a corporation and the news related to it, studying huge amounts of data to predict its stock price trend. The right investment, i.e., buying and selling a company stock at the right time, leads to monetary benefits and can make one a millionaire overnight. The stock market is an extremely fluctuating platform in which data is produced in humongous quantities and is influenced by numerous disparate factors such as socio-political issues, financial activities like splits and dividends, and news as well as rumors. This work proposes a novel system, "IntelliFin", to predict the share market trend. The system uses various stock market technical indicators along with the company's historical market data to predict share prices. It employs sentiment determination of a company's financial and socio-political news for a more accurate prediction. The system is implemented using two models. The first is a hybrid LSTM model optimized with the ADAM optimizer. The other is a hybrid ML model that integrates a Support Vector Regressor, a K-Nearest Neighbor classifier, a Random Forest classifier, and a Linear Regressor using a majority-voting algorithm. Both models employ an NLP-powered sentiment analyzer to account for the news impacting stock prices. The models are trained continuously using reinforcement learning, implemented with the Q-learning algorithm, to increase consistency and accuracy. The project aims to support inexperienced investors who don't have enough experience in investing in the stock market, helping them maximize their profit and minimize or eliminate losses. The developed system will also serve as a tool to aid professional investors in their decision making.
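The majority-voting step that combines the four learners' trend calls can be sketched as below; the per-model predictions are made-up placeholders, since the abstract does not publish the models' outputs.

```python
# Sketch of the majority-voting combiner for per-model trend predictions.
from collections import Counter

def majority_vote(predictions):
    """Return the trend label predicted by most models (ties -> first seen)."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical next-day trend calls from the SVR, KNN, RF and
# linear-regression components:
model_calls = ["up", "up", "down", "up"]
print(majority_vote(model_calls))  # up
```

The same combiner generalizes to any odd or even number of base models; with an even count, a tie-breaking rule (here, first label seen) must be chosen explicitly.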


2021 ◽  
Vol 87 (6) ◽  
pp. 445-455
Author(s):  
Yi Ma ◽  
Zezhong Zheng ◽  
Yutang Ma ◽  
Mingcang Zhu ◽  
Ran Huang ◽  
...  

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix of size N×N, where N is the number of data points; the memory complexity of the analysis is thus no less than O(N²). We present in this article an incremental manifold learning approach to handle large hyperspectral data sets for land-use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained from the training data set. A local curvature variation algorithm is utilized to sample a subset of data points as landmarks, and a manifold skeleton is then identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second-best performance with a support vector machine.
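The landmark idea can be illustrated with a small sketch. The paper selects landmarks by local curvature variation; farthest-point sampling is used here as a simpler, well-known stand-in, and the data and sizes are arbitrary.

```python
# Sketch: sampling m landmarks from a large point set so the manifold
# skeleton is built on m << N points instead of all N.
import numpy as np

def farthest_point_sample(X, m, seed=0):
    """Greedy farthest-point sampling: each new landmark maximizes its
    distance to the landmarks already chosen."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(X)))]
    # distance from every point to its nearest chosen landmark
    d = np.linalg.norm(X - X[idx[0]], axis=1)
    for _ in range(m - 1):
        nxt = int(np.argmax(d))
        idx.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return np.asarray(idx)

X = np.random.default_rng(1).normal(size=(1000, 10))
landmarks = farthest_point_sample(X, 50)
print(landmarks.shape)  # (50,)
```

Whatever the sampling criterion, downstream eigen-analysis then runs on a 50×50 similarity matrix rather than 1000×1000, which is the source of the memory savings.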


Author(s):  
Rajeev Rajan ◽  
B. S. Shajee Mohan

Automatic music genre classification based on distance metric learning (DML) is proposed in this paper. Three types of timbral descriptors, namely mel-frequency cepstral coefficient (MFCC) features, modified group delay features (MODGDF), and low-level timbral feature sets, are combined at the feature level. We experimented with k-nearest neighbor (kNN) and support vector machine (SVM)-based classifiers using standard and DML kernels (DMLK) on the GTZAN and Folk music datasets. With the best-performing RBF kernel, standard-kernel kNN and SVM-based classifiers report classification accuracies of 79.03% and 90.16%, respectively, on the GTZAN dataset and 86.60% and 92.26%, respectively, on the Folk music dataset. A further improvement was observed when DML kernels were used in place of standard kernels, with accuracies of 84.46% and 92.74% (GTZAN) and 90.00% and 96.23% (Folk music) for DMLK-kNN and DMLK-SVM, respectively. The results demonstrate the potential of DML kernels in the music genre classification task.
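The effect of learning a metric before kNN can be sketched as follows. This is not the paper's DML kernel: Neighborhood Components Analysis from scikit-learn is used as a readily available metric-learning stand-in, and the wine dataset stands in for the timbral features.

```python
# Sketch: kNN with a standard (Euclidean) metric vs. kNN after distance
# metric learning (NCA), on toy tabular data.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

plain = make_pipeline(StandardScaler(), KNeighborsClassifier(5)).fit(Xtr, ytr)
dml = make_pipeline(StandardScaler(),
                    NeighborhoodComponentsAnalysis(random_state=0),
                    KNeighborsClassifier(5)).fit(Xtr, ytr)

print(round(plain.score(Xte, yte), 3), round(dml.score(Xte, yte), 3))
```

The learned linear transform warps the space so that same-class samples move closer together before the neighbor search, which is the same intuition behind replacing a standard kernel with a DML kernel.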


Author(s):  
Wonju Seo ◽  
You-Bin Lee ◽  
Seunghyun Lee ◽  
Sang-Man Jin ◽  
Sung-Min Park

Abstract
Background: For an effective artificial pancreas (AP) system and improved therapeutic intervention with continuous glucose monitoring (CGM), accurately predicting the occurrence of hypoglycemia is very important. While many studies have reported successful algorithms for predicting nocturnal hypoglycemia, predicting postprandial hypoglycemia remains a challenge due to the extreme glucose fluctuations that occur around mealtimes. The goal of this study is to evaluate the feasibility of easy-to-use, computationally efficient machine-learning algorithms for predicting postprandial hypoglycemia with a unique feature set.
Methods: We used retrospective CGM datasets of 104 people who had experienced at least one hypoglycemia alert value during a three-day CGM session. The algorithms were developed based on four machine learning models with a unique data-driven feature set: a random forest (RF), a support vector machine using a linear or radial basis function kernel, a K-nearest neighbor, and a logistic regression. With 5-fold cross-subject validation, the average performance of each model was calculated for comparison. The area under the receiver operating characteristic curve (AUC) and the F1 score were used as the main criteria for evaluating performance.
Results: In predicting a hypoglycemia alert value with a 30-min prediction horizon, the RF model showed the best performance, with an average AUC of 0.966, an average sensitivity of 89.6%, an average specificity of 91.3%, and an average F1 score of 0.543. In addition, the RF showed better predictive performance for postprandial hypoglycemic events than the other models.
Conclusion: We showed that machine-learning algorithms have potential in predicting postprandial hypoglycemia and that the RF model is a promising candidate for the further development of a postprandial hypoglycemia prediction algorithm to advance CGM and AP technology.
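The evaluation protocol described above can be sketched as follows. The synthetic imbalanced data merely stands in for the CGM feature set, which is not public, and plain 5-fold cross-validation is used rather than the paper's cross-subject splits.

```python
# Sketch: a random forest scored with AUC and F1 under 5-fold
# cross-validation on synthetic, imbalanced (~90/10) binary data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=600, n_features=10, weights=[0.9],
                           random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_validate(rf, X, y, cv=5, scoring=["roc_auc", "f1"])
print(round(scores["test_roc_auc"].mean(), 3),
      round(scores["test_f1"].mean(), 3))
```

Reporting both metrics matters for a rare-event task like hypoglycemia alerts: with a ~10% positive rate, AUC can look strong while the F1 score (as in the paper's 0.543) reveals how hard the positive class is to pin down.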


Computers ◽  
2019 ◽  
Vol 8 (4) ◽  
pp. 77 ◽  
Author(s):  
Muhammad Azfar Firdaus Azlah ◽  
Lee Suan Chua ◽  
Fakhrul Razan Rahmad ◽  
Farah Izana Abdullah ◽  
Sharifah Rafidah Wan Alwi

Plant species can be classified and recognized based on their reproductive system (flowers) and leaf morphology. Neural networks are among the most popular machine learning approaches for plant leaf classification. The commonly used techniques are the artificial neural network (ANN), probabilistic neural network (PNN), convolutional neural network (CNN), k-nearest neighbor (KNN), and support vector machine (SVM); some studies even combined techniques for accuracy improvement. The use of various preprocessing techniques and characteristic parameters in feature extraction appears to improve the performance of plant leaf classification. The findings of previous studies are critically compared in terms of accuracy based on the applied techniques. This paper aims to review and analyze the implementation and performance of various methodologies for plant classification. Each technique has its advantages and limitations in leaf pattern recognition. The quality of the leaf images plays an important role; therefore, a reliable leaf database must be used to train the machine learning algorithm prior to leaf recognition and validation.


Author(s):  
Law Kumar Singh ◽  
Pooja ◽  
Hitendra Garg ◽  
Munish Khanna

Glaucoma is a progressive, chronic eye disease that leads to loss of peripheral vision and, ultimately, to irreversible loss of vision. Detection and identification of glaucoma are essential for earlier treatment and reduced vision loss. This motivates us to present a study of an intelligent diagnosis system based on machine learning algorithms for glaucoma identification using three-dimensional optical coherence tomography (OCT) data. This experimental work uses 70 glaucomatous and 70 healthy eyes from a combination of a public (Mendeley) dataset and a private dataset. Forty-five vital features were extracted from the OCT images using two approaches. K-nearest neighbor (KNN), linear discriminant analysis (LDA), decision tree, random forest, and support vector machine (SVM) classifiers were applied to categorize the OCT images into the glaucomatous and non-glaucomatous classes. The largest AUC, 0.97, is achieved by KNN. Accuracy was obtained with fivefold cross-validation. This study will help reach high standards in glaucoma diagnosis.
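The AUC-based comparison can be sketched with a minimal example; the breast-cancer dataset stands in for the 45 OCT-derived features, which are not public, and only the KNN entry of the paper's model comparison is shown.

```python
# Sketch: a KNN classifier scored by cross-validated AUC, the criterion
# on which KNN led the paper's comparison (0.97).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
# Scaling first matters for KNN, since AUC here depends on distance ranks.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
auc = cross_val_score(knn, X, y, cv=5, scoring="roc_auc").mean()
print(round(auc, 3))
```

Swapping `knn` for LDA, decision-tree, random-forest, or SVM estimators reproduces the shape of the paper's five-way comparison under the same fivefold protocol.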


2017 ◽  
Vol 15 (1) ◽  
pp. 52-68
Author(s):  
Ming Liu ◽  
Yuqi Wang ◽  
Weiwei Xu ◽  
Li Liu

The number of Chinese engineering students has increased greatly since 1999. Rating the quality of these students' English essays has thus become time-consuming and challenging. This paper presents a novel automatic essay scoring algorithm called PSO-SVR, based on a machine learning algorithm, Support Vector Machine for Regression (SVR), and a computational intelligence algorithm, Particle Swarm Optimization (PSO), which optimizes the parameters of the SVR kernel functions. Three groups of essays, written by chemical, electrical, and computer science engineering majors respectively, were used for evaluation. The results show that PSO-SVR outperforms traditional essay scoring algorithms such as multiple linear regression, support vector machine for regression, and the K-nearest neighbor algorithm. This indicates that PSO-SVR is more robust in predicting irregular datasets, in which, for example, the repeated use of simple content words may result in a low essay score even though the system detects high cohesion and no spelling errors.
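The PSO-SVR idea, a particle swarm searching SVR's kernel parameters against a cross-validated score, can be sketched as below. The swarm size, iteration count, inertia/attraction constants, search bounds, and regression data are all illustrative choices, not the paper's settings.

```python
# Sketch of PSO-SVR: a tiny particle swarm tunes SVR's C and gamma
# (in log10 space) to maximize 3-fold cross-validated R^2.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

X, y = make_regression(n_samples=150, n_features=5, noise=5, random_state=0)

def fitness(p):                       # p = (log10 C, log10 gamma)
    model = SVR(C=10 ** p[0], gamma=10 ** p[1])
    return cross_val_score(model, X, y, cv=3, scoring="r2").mean()

rng = np.random.default_rng(0)
pos = rng.uniform(-2, 2, size=(8, 2))          # 8 particles in the box [-2, 2]^2
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmax()]

for _ in range(10):                            # 10 PSO iterations
    r1, r2 = rng.random((2, 8, 1))
    # velocity = inertia + pull toward personal best + pull toward global best
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, -2, 2)
    vals = np.array([fitness(p) for p in pos])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmax()]

print("best log10(C), log10(gamma):", gbest, "R2:", round(pbest_val.max(), 3))
```

The same loop works for any kernel hyperparameters; only the `fitness` function and the box bounds change.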


2021 ◽  
Vol 10 (5) ◽  
pp. 2530-2538
Author(s):  
Pulung Nurtantio Andono ◽  
Eko Hari Rachmawanto ◽  
Nanna Suryana Herman ◽  
Kunio Kondo

The orchid is an ornamental plant with many types, each with its own characteristics in the form of different shapes and colors. Here, we chose the support vector machine (SVM), Naïve Bayes, and k-nearest neighbor (KNN) algorithms, which generate text output. This system aims to assist the community in recognizing orchid plants by type. We used more than 2250 training images and 1500 testing images covering 15 types. The test results analyze the impact of comparing the three supervised algorithms with and without feature extraction and with several distance measures. We used SVM with linear, polynomial, and Gaussian kernels, while k-nearest neighbor operated with K ranging from 1 to 11. The experimental results show the linear kernel to be the best classifier, and the feature-extraction process increased accuracy. Compared with Naïve Bayes at 66% and the best KNN result of 98% (K=1, d=1), SVM had better accuracy. SVM-GLCM-HSV performed better than SVM-HSV alone, achieving 98.13% and 93.06% respectively, both with the linear kernel. On the other hand, a combined SVM-KNN approach yielded the highest accuracy among the algorithms considered here.
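The HSV part of the feature pipeline can be sketched with a toy example: images are moved into HSV space and summarized as hue histograms before classification. The synthetic "orchid-colored" images, the 8-bin histogram, and the noise level are all invented for illustration; the paper additionally combines these with GLCM texture features.

```python
# Sketch of the HSV color-feature step: flatten each RGB image
# (values in [0, 1]) into a normalized hue histogram.
import colorsys
import numpy as np

def hue_histogram(rgb_img, bins=8):
    """Hue histogram of an (H, W, 3) RGB image with values in [0, 1]."""
    hues = [colorsys.rgb_to_hsv(*px)[0] for px in rgb_img.reshape(-1, 3)]
    hist, _ = np.histogram(hues, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()

rng = np.random.default_rng(0)
# Two synthetic 16x16 color patches standing in for orchid photos.
purple = np.clip(rng.normal([0.6, 0.2, 0.7], 0.03, (16, 16, 3)), 0, 1)
green = np.clip(rng.normal([0.2, 0.8, 0.4], 0.03, (16, 16, 3)), 0, 1)

f1, f2 = hue_histogram(purple), hue_histogram(green)
# The two colors concentrate in different hue bins, so the feature
# vectors separate the classes before any classifier is applied.
print(f1.argmax(), f2.argmax())
```

These histogram vectors are the kind of input the compared SVM, Naïve Bayes, and KNN classifiers would consume, with GLCM texture statistics concatenated alongside in the SVM-GLCM-HSV variant.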


Author(s):  
Bhavesh Shah ◽  
Tushar Nimse ◽  
Vikas Choudhary ◽  
Vijendra Jadhav

In the current scenario, it is difficult to predict a student's future results based on his or her current performance. With such a prediction, the teacher can advise the student on how to overcome a poor result and can coach the student by identifying the dependencies for the final examinations. The system suggests subject/course selections for the upcoming semester to students, acting in the role of an adviser or teacher. Due to improper advice and monitoring, many students' futures are left in the dark, and it is difficult for a teacher to analyze and monitor the performance of each and every student. The system can give teachers feedback on how to improve student performance. This paper carries out a literature review covering the years 2003 to 2021. The system predicts a student's future results at an early stage by applying machine learning algorithms such as k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), and Naive Bayes.

