Dissimilarity space reinforced with manifold learning and latent space modeling for improved pattern classification

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Azadeh Rezazadeh Hamedani ◽  
Mohammad Hossein Moattar ◽  
Yahya Forghani

Abstract
Dissimilarity representation plays an important role in pattern recognition due to its ability to capture structural and relational information between samples. Dissimilarity space embedding is an approach in which each sample is represented as a vector of its dissimilarities to a set of other samples called prototypes. However, the lack of neighborhood preservation and the fixed, usually large prototype set shared by all training samples cause low classification accuracy and high computational complexity. To address these challenges, the proposed method creates the dissimilarity space from the neighbors of each data point on the manifold. For this purpose, Locally Linear Embedding (LLE) is used as an unsupervised manifold learning algorithm. The only goal of this step is to learn the global structure and the neighborhood of the data on the manifold; no mapping or dimension reduction is performed. To create the dissimilarity space, each sample is compared only with its prototype set, namely its k-nearest neighbors on the manifold, using the geodesic distance metric. The geodesic distance preserves structure and is computed on the weighted LLE neighborhood graph. Finally, a Latent Space Model (LSM) is applied to reduce the dimensionality of the Euclidean latent space, resolving the second challenge. To evaluate the resulting dissimilarity-space representation, two common classifiers, K-Nearest Neighbor (KNN) and Support Vector Machine (SVM), are applied. Experiments on datasets covering both Euclidean and non-Euclidean spaces demonstrate that, with the proposed approach, the classifiers outperform the other baseline dissimilarity spaces in both accuracy and runtime.
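The core construction, dissimilarity vectors built from each sample's nearest neighbors under a graph-based geodesic distance, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the swiss-roll data, the value k = 8, and the use of Dijkstra shortest paths over a plain kNN distance graph (rather than the weighted LLE graph) are all assumptions for the sketch.

```python
# Sketch: dissimilarity vectors from k-nearest manifold neighbors, with
# geodesic distances approximated by shortest paths on a kNN graph.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

X, _ = make_swiss_roll(n_samples=200, random_state=0)
k = 8  # illustrative neighborhood size

# Weighted kNN graph: edge weights are Euclidean distances between neighbors.
graph = kneighbors_graph(X, n_neighbors=k, mode="distance")

# Geodesic distance ~ shortest path along the neighborhood graph.
geo = shortest_path(graph, method="D", directed=False)

# Each sample's dissimilarity vector: geodesic distances to its k nearest
# points on the manifold (column 0 of the argsort is the sample itself).
neighbor_idx = np.argsort(geo, axis=1)[:, 1:k + 1]
dissim = np.take_along_axis(geo, neighbor_idx, axis=1)
print(dissim.shape)  # (200, 8)
```

Each row of `dissim` is one sample's representation in the (per-sample) dissimilarity space, which a KNN or SVM classifier could then consume.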

2021 ◽  
pp. 1-17
Author(s):  
Ahmed Al-Tarawneh ◽  
Ja’afer Al-Saraireh

Twitter is one of the most popular platforms used to share and post ideas. Hackers and anonymous attackers use these platforms maliciously, and their behavior can be used to predict the risk of future attacks by gathering and classifying hackers' tweets with machine-learning techniques. Previous approaches for detecting infected tweets are based on human effort or text analysis and are thus limited in capturing the hidden meaning between tweet lines. The main aim of this research paper is to enhance the efficiency of hacker detection on the Twitter platform using complex-network techniques with adapted machine learning algorithms. This work presents a methodology that collects a list of users, together with their followers, who share posts with similar interests from a hackers' community on Twitter. The list is built from a set of suggested keywords that are terms commonly used by hackers in their tweets. A complex network is then generated for all users to find relations among them in terms of network centrality, closeness, and betweenness. After extracting these values, a dataset of the most influential users in the hacker community is assembled. Subsequently, tweets belonging to users in the extracted dataset are gathered and classified into positive and negative classes. The output of this process is fed into a machine learning process that applies different algorithms. This research builds and investigates an accurate dataset containing real users who belong to a hackers' community. Correctly classified instances were measured for accuracy using the average values of the K-nearest neighbor, Naive Bayes, Random Tree, and support vector machine techniques, demonstrating about 90% and 88% accuracy for cross-validation and percentage split, respectively. Consequently, the proposed network cyber Twitter model is able to detect hackers and determine whether tweets pose a risk to institutions and individuals, providing early warning of possible attacks.
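The centrality-extraction step can be sketched in plain Python; the follower edge list below is invented for illustration and is not from the paper's data.

```python
# Sketch: degree and closeness centrality for a small undirected
# follower graph, computed from scratch with breadth-first search.
from collections import deque

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def bfs_dists(src):
    """Hop distances from src to every reachable node."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

n = len(adj)
degree = {u: len(adj[u]) / (n - 1) for u in adj}                 # normalized degree
closeness = {u: (n - 1) / sum(bfs_dists(u).values()) for u in adj}

# "c" bridges the two sides of this toy graph, so it ranks highest on both
# measures; such high-centrality users would feed the influential-user dataset.
print(max(degree, key=degree.get), max(closeness, key=closeness.get))
```

Betweenness centrality (the third measure the paper uses) follows the same graph representation but additionally counts how many shortest paths pass through each node.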


Stock trading has been one of the most important parts of the financial world for decades. People investing in the share market analyze the financial history of a corporation and the news related to it, studying huge amounts of data to predict its stock price trend. The right investment, i.e., buying and selling a company stock at the right time, leads to monetary benefits and can make one a millionaire overnight. The stock market is an extremely fluctuating platform in which data is produced in humongous quantities and is influenced by numerous disparate factors such as socio-political issues, financial activities like splits and dividends, and news as well as rumors. This work proposes a novel system, "IntelliFin", to predict the share market trend. The system uses various stock market technical indicators along with the company's historical market data to predict share prices. It employs sentiment determination of a company's financial and socio-political news for a more accurate prediction. The system is implemented using two models. The first is a hybrid LSTM model optimized with the ADAM optimizer. The other is a hybrid ML model that integrates a Support Vector Regressor, a K-Nearest Neighbor classifier, a Random Forest classifier, and a Linear Regressor using a majority-voting algorithm. Both models employ an NLP-powered sentiment analyzer to account for the news impacting stock prices. The models are trained continuously using reinforcement learning, implemented with the Q-learning algorithm, to increase consistency and accuracy. The project aims to support inexperienced investors who don't have enough experience in investing in the stock market, helping them maximize their profit and minimize or eliminate losses. The developed system will also serve as a tool to aid professional investors in their decision making.
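The majority-voting step that combines the four learners' trend calls can be sketched as below; the per-model predictions are made-up placeholders, since the abstract does not publish the models' outputs.

```python
# Sketch of the majority-voting combiner for per-model trend predictions.
from collections import Counter

def majority_vote(predictions):
    """Return the trend label predicted by most models (ties -> first seen)."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical next-day trend calls from the SVR, KNN, RF and
# linear-regression components:
model_calls = ["up", "up", "down", "up"]
print(majority_vote(model_calls))  # up
```

The same combiner generalizes to any odd or even number of base models; with an even count, a tie-breaking rule (here, first label seen) must be chosen explicitly.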


2021 ◽  
Vol 87 (6) ◽  
pp. 445-455
Author(s):  
Yi Ma ◽  
Zezhong Zheng ◽  
Yutang Ma ◽  
Mingcang Zhu ◽  
Ran Huang ◽  
...  

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix of size N×N, where N is the number of data points; the memory complexity of the analysis is thus no less than O(N²). We present in this article an incremental manifold learning approach to handle large hyperspectral data sets for land-use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained from the training data set. A local curvature variation algorithm is utilized to sample a subset of data points as landmarks, and a manifold skeleton is then identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second-best performance with a support vector machine.
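The landmark idea can be illustrated with a small sketch. The paper selects landmarks by local curvature variation; farthest-point sampling is used here as a simpler, well-known stand-in, and the data and sizes are arbitrary.

```python
# Sketch: sampling m landmarks from a large point set so the manifold
# skeleton is built on m << N points instead of all N.
import numpy as np

def farthest_point_sample(X, m, seed=0):
    """Greedy farthest-point sampling: each new landmark maximizes its
    distance to the landmarks already chosen."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(X)))]
    # distance from every point to its nearest chosen landmark
    d = np.linalg.norm(X - X[idx[0]], axis=1)
    for _ in range(m - 1):
        nxt = int(np.argmax(d))
        idx.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return np.asarray(idx)

X = np.random.default_rng(1).normal(size=(1000, 10))
landmarks = farthest_point_sample(X, 50)
print(landmarks.shape)  # (50,)
```

Whatever the sampling criterion, downstream eigen-analysis then runs on a 50×50 similarity matrix rather than 1000×1000, which is the source of the memory savings.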


Author(s):  
Rajeev Rajan ◽  
B. S. Shajee Mohan

Automatic music genre classification based on distance metric learning (DML) is proposed in this paper. Three types of timbral descriptors, namely mel-frequency cepstral coefficient (MFCC) features, modified group delay features (MODGDF), and low-level timbral feature sets, are combined at the feature level. We experimented with k-nearest neighbor (kNN) and support vector machine (SVM)-based classifiers using standard and DML kernels (DMLK) on the GTZAN and Folk music datasets. With the best-performing RBF kernel, standard-kernel kNN and SVM-based classifiers report classification accuracies of 79.03% and 90.16%, respectively, on the GTZAN dataset and 86.60% and 92.26%, respectively, on the Folk music dataset. A further improvement was observed when DML kernels were used in place of standard kernels, with accuracies of 84.46% and 92.74% (GTZAN) and 90.00% and 96.23% (Folk music) for DMLK-kNN and DMLK-SVM, respectively. The results demonstrate the potential of DML kernels in the music genre classification task.
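The effect of learning a metric before kNN can be sketched as follows. This is not the paper's DML kernel: Neighborhood Components Analysis from scikit-learn is used as a readily available metric-learning stand-in, and the wine dataset stands in for the timbral features.

```python
# Sketch: kNN with a standard (Euclidean) metric vs. kNN after distance
# metric learning (NCA), on toy tabular data.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

plain = make_pipeline(StandardScaler(), KNeighborsClassifier(5)).fit(Xtr, ytr)
dml = make_pipeline(StandardScaler(),
                    NeighborhoodComponentsAnalysis(random_state=0),
                    KNeighborsClassifier(5)).fit(Xtr, ytr)

print(round(plain.score(Xte, yte), 3), round(dml.score(Xte, yte), 3))
```

The learned linear transform warps the space so that same-class samples move closer together before the neighbor search, which is the same intuition behind replacing a standard kernel with a DML kernel.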


Author(s):  
Wonju Seo ◽  
You-Bin Lee ◽  
Seunghyun Lee ◽  
Sang-Man Jin ◽  
Sung-Min Park

Abstract
Background: For an effective artificial pancreas (AP) system and improved therapeutic intervention with continuous glucose monitoring (CGM), accurately predicting the occurrence of hypoglycemia is very important. While many studies have reported successful algorithms for predicting nocturnal hypoglycemia, predicting postprandial hypoglycemia remains a challenge due to the extreme glucose fluctuations that occur around mealtimes. The goal of this study is to evaluate the feasibility of easy-to-use, computationally efficient machine-learning algorithms for predicting postprandial hypoglycemia with a unique feature set.
Methods: We used retrospective CGM datasets of 104 people who had experienced at least one hypoglycemia alert value during a three-day CGM session. The algorithms were developed based on four machine learning models with a unique data-driven feature set: a random forest (RF), a support vector machine using a linear or radial basis function kernel, a K-nearest neighbor, and a logistic regression. With 5-fold cross-subject validation, the average performance of each model was calculated for comparison. The area under the receiver operating characteristic curve (AUC) and the F1 score were used as the main criteria for evaluating performance.
Results: In predicting a hypoglycemia alert value with a 30-min prediction horizon, the RF model showed the best performance, with an average AUC of 0.966, an average sensitivity of 89.6%, an average specificity of 91.3%, and an average F1 score of 0.543. In addition, the RF showed better predictive performance for postprandial hypoglycemic events than the other models.
Conclusion: We showed that machine-learning algorithms have potential in predicting postprandial hypoglycemia and that the RF model is a promising candidate for the further development of a postprandial hypoglycemia prediction algorithm to advance CGM and AP technology.
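The evaluation protocol described above can be sketched as follows. The synthetic imbalanced data merely stands in for the CGM feature set, which is not public, and plain 5-fold cross-validation is used rather than the paper's cross-subject splits.

```python
# Sketch: a random forest scored with AUC and F1 under 5-fold
# cross-validation on synthetic, imbalanced (~90/10) binary data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=600, n_features=10, weights=[0.9],
                           random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_validate(rf, X, y, cv=5, scoring=["roc_auc", "f1"])
print(round(scores["test_roc_auc"].mean(), 3),
      round(scores["test_f1"].mean(), 3))
```

Reporting both metrics matters for a rare-event task like hypoglycemia alerts: with a ~10% positive rate, AUC can look strong while the F1 score (as in the paper's 0.543) reveals how hard the positive class is to pin down.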


Computers ◽  
2019 ◽  
Vol 8 (4) ◽  
pp. 77 ◽  
Author(s):  
Muhammad Azfar Firdaus Azlah ◽  
Lee Suan Chua ◽  
Fakhrul Razan Rahmad ◽  
Farah Izana Abdullah ◽  
Sharifah Rafidah Wan Alwi

Plant species can be classified and recognized based on their reproductive system (flowers) and leaf morphology. Neural networks are among the most popular machine learning approaches for plant leaf classification. The commonly used techniques are the artificial neural network (ANN), probabilistic neural network (PNN), convolutional neural network (CNN), k-nearest neighbor (KNN), and support vector machine (SVM); some studies even combined techniques for accuracy improvement. The use of various preprocessing techniques and characteristic parameters in feature extraction appears to improve the performance of plant leaf classification. The findings of previous studies are critically compared in terms of accuracy based on the applied techniques. This paper aims to review and analyze the implementation and performance of various methodologies for plant classification. Each technique has its advantages and limitations in leaf pattern recognition. The quality of the leaf images plays an important role; therefore, a reliable leaf database must be used to train the machine learning algorithm prior to leaf recognition and validation.


Author(s):  
Law Kumar Singh ◽  
Pooja ◽  
Hitendra Garg ◽  
Munish Khanna

Glaucoma is a progressive, chronic eye disease that leads to loss of peripheral vision and, ultimately, to irreversible loss of vision. Detection and identification of glaucoma are essential for earlier treatment and reduced vision loss. This motivates us to present a study of an intelligent diagnosis system based on machine learning algorithms for glaucoma identification using three-dimensional optical coherence tomography (OCT) data. This experimental work uses 70 glaucomatous and 70 healthy eyes from a combination of a public (Mendeley) dataset and a private dataset. Forty-five vital features were extracted from the OCT images using two approaches. K-nearest neighbor (KNN), linear discriminant analysis (LDA), decision tree, random forest, and support vector machine (SVM) classifiers were applied to categorize the OCT images into the glaucomatous and non-glaucomatous classes. The largest AUC, 0.97, is achieved by KNN. Accuracy was obtained with fivefold cross-validation. This study will help reach high standards in glaucoma diagnosis.
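The AUC-based comparison can be sketched with a minimal example; the breast-cancer dataset stands in for the 45 OCT-derived features, which are not public, and only the KNN entry of the paper's model comparison is shown.

```python
# Sketch: a KNN classifier scored by cross-validated AUC, the criterion
# on which KNN led the paper's comparison (0.97).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
# Scaling first matters for KNN, since AUC here depends on distance ranks.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
auc = cross_val_score(knn, X, y, cv=5, scoring="roc_auc").mean()
print(round(auc, 3))
```

Swapping `knn` for LDA, decision-tree, random-forest, or SVM estimators reproduces the shape of the paper's five-way comparison under the same fivefold protocol.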


2017 ◽  
Vol 15 (1) ◽  
pp. 52-68
Author(s):  
Ming Liu ◽  
Yuqi Wang ◽  
Weiwei Xu ◽  
Li Liu

The number of Chinese engineering students has increased greatly since 1999. Rating the quality of these students' English essays has thus become time-consuming and challenging. This paper presents a novel automatic essay scoring algorithm called PSO-SVR, based on a machine learning algorithm, Support Vector Machine for Regression (SVR), and a computational intelligence algorithm, Particle Swarm Optimization (PSO), which optimizes the parameters of the SVR kernel functions. Three groups of essays, written by chemical, electrical, and computer science engineering majors respectively, were used for evaluation. The results show that PSO-SVR outperforms traditional essay scoring algorithms such as multiple linear regression, support vector machine for regression, and the K-nearest neighbor algorithm. This indicates that PSO-SVR is more robust in predicting irregular datasets, in which, for example, the repeated use of simple content words may result in a low essay score even though the system detects high cohesion and no spelling errors.
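The PSO-SVR idea, a particle swarm searching SVR's kernel parameters against a cross-validated score, can be sketched as below. The swarm size, iteration count, inertia/attraction constants, search bounds, and regression data are all illustrative choices, not the paper's settings.

```python
# Sketch of PSO-SVR: a tiny particle swarm tunes SVR's C and gamma
# (in log10 space) to maximize 3-fold cross-validated R^2.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

X, y = make_regression(n_samples=150, n_features=5, noise=5, random_state=0)

def fitness(p):                       # p = (log10 C, log10 gamma)
    model = SVR(C=10 ** p[0], gamma=10 ** p[1])
    return cross_val_score(model, X, y, cv=3, scoring="r2").mean()

rng = np.random.default_rng(0)
pos = rng.uniform(-2, 2, size=(8, 2))          # 8 particles in the box [-2, 2]^2
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmax()]

for _ in range(10):                            # 10 PSO iterations
    r1, r2 = rng.random((2, 8, 1))
    # velocity = inertia + pull toward personal best + pull toward global best
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, -2, 2)
    vals = np.array([fitness(p) for p in pos])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmax()]

print("best log10(C), log10(gamma):", gbest, "R2:", round(pbest_val.max(), 3))
```

The same loop works for any kernel hyperparameters; only the `fitness` function and the box bounds change.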


2021 ◽  
Vol 10 (5) ◽  
pp. 2530-2538
Author(s):  
Pulung Nurtantio Andono ◽  
Eko Hari Rachmawanto ◽  
Nanna Suryana Herman ◽  
Kunio Kondo

The orchid is an ornamental plant with many types, each with its own characteristics in the form of different shapes and colors. Here, we chose the support vector machine (SVM), Naïve Bayes, and k-nearest neighbor (KNN) algorithms, which generate text output. This system aims to assist the community in recognizing orchid plants by type. We used more than 2250 training images and 1500 testing images covering 15 types. The test results analyze the impact of comparing the three supervised algorithms with and without feature extraction and with several distance measures. We used SVM with linear, polynomial, and Gaussian kernels, while k-nearest neighbor operated with K ranging from 1 to 11. The experimental results show the linear kernel to be the best classifier, and the feature-extraction process increased accuracy. Compared with Naïve Bayes at 66% and the best KNN result of 98% (K=1, d=1), SVM had better accuracy. SVM-GLCM-HSV performed better than SVM-HSV alone, achieving 98.13% and 93.06% respectively, both with the linear kernel. On the other hand, a combined SVM-KNN approach yielded the highest accuracy among the algorithms considered here.
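The HSV part of the feature pipeline can be sketched with a toy example: images are moved into HSV space and summarized as hue histograms before classification. The synthetic "orchid-colored" images, the 8-bin histogram, and the noise level are all invented for illustration; the paper additionally combines these with GLCM texture features.

```python
# Sketch of the HSV color-feature step: flatten each RGB image
# (values in [0, 1]) into a normalized hue histogram.
import colorsys
import numpy as np

def hue_histogram(rgb_img, bins=8):
    """Hue histogram of an (H, W, 3) RGB image with values in [0, 1]."""
    hues = [colorsys.rgb_to_hsv(*px)[0] for px in rgb_img.reshape(-1, 3)]
    hist, _ = np.histogram(hues, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()

rng = np.random.default_rng(0)
# Two synthetic 16x16 color patches standing in for orchid photos.
purple = np.clip(rng.normal([0.6, 0.2, 0.7], 0.03, (16, 16, 3)), 0, 1)
green = np.clip(rng.normal([0.2, 0.8, 0.4], 0.03, (16, 16, 3)), 0, 1)

f1, f2 = hue_histogram(purple), hue_histogram(green)
# The two colors concentrate in different hue bins, so the feature
# vectors separate the classes before any classifier is applied.
print(f1.argmax(), f2.argmax())
```

These histogram vectors are the kind of input the compared SVM, Naïve Bayes, and KNN classifiers would consume, with GLCM texture statistics concatenated alongside in the SVM-GLCM-HSV variant.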


Author(s):  
Bhavesh Shah ◽  
Tushar Nimse ◽  
Vikas Choudhary ◽  
Vijendra Jadhav

In the current scenario, it is difficult to predict a student's future results based on his or her current performance. With such a prediction, the teacher can advise the student on how to overcome a poor result and can coach the student by identifying the dependencies for the final examinations. The system suggests subject/course selections for the upcoming semester to students, acting in the role of an adviser or teacher. Due to improper advice and monitoring, many students' futures are left in the dark, and it is difficult for a teacher to analyze and monitor the performance of each and every student. The system can give teachers feedback on how to improve student performance. This paper carries out a literature review covering the years 2003 to 2021. The system predicts a student's future results at an early stage by applying machine learning algorithms such as k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), and Naive Bayes.

