Indonesian Online News Topics Classification using Word2Vec and K-Nearest Neighbor

2021 ◽  
Vol 5 (6) ◽  
pp. 1083-1089
Author(s):  
Nur Ghaniaviyanto Ramadhan

News is information disseminated by newspapers, radio, television, the internet, and other media. According to survey results, news titles on many different topics are spread across the internet, which makes it difficult for readers to find news on the topic they want to read. This problem can be solved by grouping, also called classification, carried out by computer. This study aims to classify several news topics in the Indonesian language using the K-nearest neighbor (KNN) classification model together with word2vec, a word embedding-based model that converts words into vectors to facilitate the classification process. The study also determines the optimal K value for KNN. The word2vec and KNN models achieve an accuracy of 89.2% with K=7, and they also outperform the support vector machine, logistic regression, and random forest classification models.
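
The pipeline described above (word vectors, then KNN with a tuned K) can be illustrated with a minimal sketch. It is not the paper's implementation: the two-dimensional `EMBEDDINGS` table is a hypothetical stand-in for trained word2vec vectors, titles are represented by averaging their word vectors (one common choice, assumed here), and the English toy titles and labels are invented for illustration only.

```python
import math
from collections import Counter

# Hypothetical 2-D vectors standing in for trained word2vec embeddings.
EMBEDDINGS = {
    "goal": (0.9, 0.1), "match": (0.8, 0.2), "league": (0.85, 0.15),
    "stock": (0.1, 0.9), "market": (0.2, 0.8), "rupiah": (0.15, 0.85),
}

def title_vector(title):
    """Average the vectors of the known words in a title."""
    vectors = [EMBEDDINGS[w] for w in title.lower().split() if w in EMBEDDINGS]
    if not vectors:
        raise ValueError("no known words in title")
    dims = len(vectors[0])
    return tuple(sum(v[d] for v in vectors) / len(vectors) for d in range(dims))

def knn_predict(train, query, k):
    """Majority vote among the k training items nearest to query (Euclidean)."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def best_k(train, validation, candidates):
    """Return the candidate K with the highest validation accuracy.

    Ties are resolved in favour of the smaller K.
    """
    def accuracy(k):
        hits = sum(knn_predict(train, vec, k) == label for vec, label in validation)
        return hits / len(validation)
    return max(sorted(candidates), key=accuracy)

# Invented toy training set: (title, topic) pairs turned into (vector, topic).
TRAIN = [(title_vector(t), y) for t, y in [
    ("goal match", "sport"), ("league match", "sport"), ("league goal", "sport"),
    ("stock market", "economy"), ("rupiah market", "economy"), ("stock rupiah", "economy"),
]]

print(knn_predict(TRAIN, title_vector("league goal match"), k=3))
```

In practice the candidate values of K would be scored on held-out data, as `best_k` does; the abstract reports K=7 as the best value for the authors' data.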

Author(s):  
Vinothini Selvaraju ◽  
P.A. Karthick ◽  
Ramakrishnan Swaminathan

In this work, an attempt has been made to analyze the influence of the frequency bands in uterine electromyography (uEMG) signals on the detection of preterm birth. The signals recorded from women’s abdomens during pregnancy are considered in this study. The signals are preprocessed using a digital bandpass Butterworth filter and decomposed into three frequency bands, namely 0.3-1.0 Hz (F1), 1.0-2.0 Hz (F2), and 2.0-3.0 Hz (F3). Four spectral features, namely peak magnitude, peak frequency, mean frequency, and median frequency, are extracted from the power spectrum. Three classification models, namely k-nearest neighbor, support vector machine, and random forest, are employed to distinguish the term and preterm conditions. The results show that the features extracted from these frequency bands are able to differentiate term and preterm conditions. In particular, the frequency band F3 performs better than the other frequency bands. The features associated with these frequencies, combined with the random forest classification model, achieve a maximum accuracy of 75.2%. Thus, these measures could be used to detect preterm birth well in advance.
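
The four spectral features named above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the power spectrum comes from a naive DFT with an arbitrary normalization, the Butterworth filtering step is omitted, band edges are treated as inclusive, and the function names and the synthetic test signal are hypothetical.

```python
import cmath
import math

def power_spectrum(signal, fs):
    """Return (frequencies, power) of the one-sided spectrum via a naive DFT."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n)  # normalization chosen for illustration
    return freqs, power

def spectral_features(freqs, power, band):
    """Peak magnitude, peak, mean and median frequency inside band=(lo, hi)."""
    lo, hi = band
    pairs = [(f, p) for f, p in zip(freqs, power) if lo <= f <= hi]
    total = sum(p for _, p in pairs)
    peak_f, peak_p = max(pairs, key=lambda fp: fp[1])
    # Mean frequency: power-weighted average of the frequencies in the band.
    mean_f = sum(f * p for f, p in pairs) / total
    # Median frequency: first frequency at which half the band power is reached.
    running, median_f = 0.0, pairs[-1][0]
    for f, p in pairs:
        running += p
        if running >= total / 2:
            median_f = f
            break
    return {"peak_magnitude": peak_p, "peak_frequency": peak_f,
            "mean_frequency": mean_f, "median_frequency": median_f}

# Synthetic 2.5 Hz sinusoid sampled at 20 Hz (assumed values, not uEMG data).
fs = 20.0
signal = [math.sin(2 * math.pi * 2.5 * i / fs) for i in range(80)]
freqs, power = power_spectrum(signal, fs)
features_f3 = spectral_features(freqs, power, band=(2.0, 3.0))
print(features_f3)
```

Each band (F1, F2, F3) would yield one such feature dictionary, and the resulting values would form the input vector for the classifiers.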


Author(s):  
Swati Pandey ◽  
Shruti Sharma ◽  
Shubham Kumar ◽  
Kanchan Bhatt ◽  
Dr. Rakesh Kumar Arora

Weather forecasting is the attempt to predict weather conditions based on parameters such as temperature, wind, humidity, and rainfall. These parameters are considered in the experimental analysis to give the desired results. The data used in this project were collected from the sites of various government institutions. The algorithms used to predict the weather include Neural Networks (NN), Random Forest, Classification and Regression Tree (C&RT), Support Vector Machine, and K-nearest neighbor. Correlation analysis of the parameters helps in predicting future values. The web-based application will have its own chatbot, through which users can ask their weather forecast queries directly and experience two-way communication.


Sensors ◽  
2020 ◽  
Vol 20 (21) ◽  
pp. 6365
Author(s):  
Jung Hwan Kim ◽  
Chul Min Kim ◽  
Man-Sung Yim

This study proposes a scheme to identify insider threats in nuclear facilities through the detection of malicious intentions of potential insiders using subject-wise classification. Based on electroencephalography (EEG) signals, a classification model was developed to identify whether a subject has a malicious intention under scenarios of being forced to become an insider threat. The model also distinguishes insider threat scenarios from everyday conflict scenarios. To support model development, 21-channel EEG signals were measured on 25 healthy subjects, and sets of features were extracted from the time, time–frequency, frequency and nonlinear domains. To select the best use of the available features, automatic selection was performed by random-forest-based algorithms. The k-nearest neighbor, support vector machine with radial kernel, naïve Bayes, and multilayer perceptron algorithms were applied for the classification. By using EEG signals obtained while contemplating becoming an insider threat, the subject-wise model identified malicious intentions with 78.57% accuracy. The model also distinguished insider threat scenarios from everyday conflict scenarios with 93.47% accuracy. These findings could be utilized to support the development of insider threat mitigation systems along with existing trustworthiness assessments in the nuclear industry.


2021 ◽  
Author(s):  
Monika Jyotiyana ◽  
Nishtha Kesswani ◽  
Munish Kumar

Deep learning techniques play an important role in the classification and prediction of diseases. Deep learning undoubtedly has a promising future in the health sector, especially in medical imaging. Deep learning approaches are popular because they can handle large amounts of patient data accurately and reliably in a short span of time, whereas practitioners may take time to analyze data and generate reports. In this paper, we propose a Deep Neural Network-based classification model for Parkinson’s disease. The proposed method gives faster and more accurate results for the classification of Parkinson’s disease patients, with an accuracy of 94.87%. Based on the attributes in the patient dataset, the model can be used for the identification of Parkinsonism. We also compare the results with other existing approaches: Linear Discriminant Analysis, Support Vector Machine, K-Nearest Neighbor, Decision Tree, Classification and Regression Trees, Random Forest, Linear Regression, Logistic Regression, Multi-Layer Perceptron, and Naive Bayes.


2021 ◽  
Vol 7 ◽  
pp. e437
Author(s):  
Arushi Agarwal ◽  
Purushottam Sharma ◽  
Mohammed Alshehri ◽  
Ahmed A. Mohamed ◽  
Osama Alfarraj

In today’s cyber world, the demand for the internet is increasing day by day, raising concerns about network security. The aim of an Intrusion Detection System (IDS) is to defend against many fast-growing network attacks (e.g., DDoS, ransomware, and botnet attacks) by blocking harmful activities occurring in the network system. In this work, three classification machine learning algorithms, Naïve Bayes (NB), Support Vector Machine (SVM), and K-nearest neighbor (KNN), were used to measure accuracy and reduce processing time on the UNSW-NB15 dataset and to find the best-suited algorithm that can efficiently learn the pattern of suspicious network activities. The data gathered from the feature set comparison were then fed to the IDS to train the system for future intrusion behavior prediction and analysis, using the best-fit algorithm chosen from the three based on the performance metrics found. In addition, classification reports (precision, recall, and F1-score) and confusion matrices were generated and compared to finalize the support-validation status found throughout the testing phase of the model used in this approach.
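
For reference, the reported metrics (precision, recall, F1-score, and the confusion matrix) follow standard definitions, sketched below for the binary case. The labels `"attack"` and `"normal"` and the sample predictions are invented for illustration and do not come from the UNSW-NB15 experiments.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for one positive class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts

def classification_report(y_true, y_pred, positive):
    """Precision, recall, F1-score and accuracy from the confusion counts."""
    c = confusion_counts(y_true, y_pred, positive)
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    precision = c["tp"] / predicted_pos if predicted_pos else 0.0
    recall = c["tp"] / actual_pos if actual_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (c["tp"] + c["tn"]) / len(y_true)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Invented example predictions for a two-class intrusion detector.
y_true = ["attack", "attack", "attack", "normal", "normal", "normal", "normal", "attack"]
y_pred = ["attack", "attack", "normal", "normal", "attack", "normal", "normal", "attack"]
report = classification_report(y_true, y_pred, positive="attack")
print(report)
```

Comparing such reports across NB, SVM, and KNN, together with measured processing time, is one way the best-fit algorithm could be chosen.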


2021 ◽  
Vol 11 (5) ◽  
pp. 2005
Author(s):  
Toan Huy Bui ◽  
Kazuhiko Hamamoto ◽  
May Phu Paing

Caries is a very well-known disease that affects the oral health of billions of people around the world. Despite the importance and necessity of a well-designed detection method, studies on caries detection are still limited and show restricted performance. In this paper, we propose a computer-aided diagnosis (CAD) method to detect caries among normal patients using dental radiographs. The proposed method consists mainly of two processes: feature extraction and classification. In the feature extraction phase, the chosen 2D tooth image is used to extract deep activated features with a deep pre-trained model and geometric features with mathematical formulas. Both feature sets are then combined into what we call a fusion feature, so that each compensates for the other's defects. The optimal fusion feature set is then fed into well-known classification models, namely support vector machine (SVM), k-nearest neighbor (KNN), decision tree (DT), Naïve Bayes (NB), and random forest (RF), to determine which classification model best fits the fusion feature set and yields the best result. The results show 91.70%, 90.43%, and 92.67% for accuracy, sensitivity, and specificity, respectively. The proposed method outperforms the previous state of the art, and none of the measured factors is below 90%; therefore, the method is promising for dentists and suitable for wide-scale implementation of caries detection in hospitals.


Author(s):  
Rajni Bhalla ◽  
Jyoti

To construct a new text message classifier, this paper combines the K-nearest neighbor (KNN) classification approach with the support vector machine (SVM) training algorithm. The hybrid classification system built by combining KNN and SVM is abbreviated as K-VM. Owing to its flexibility and reliability in handling different forms of classification tasks, KNN has been described as one of the most frequently used classification approaches. KNN faces a significant challenge in determining an acceptable value for the parameter K, because the value of K has a significant effect on the KNN classifier's accuracy. In addition to the problem of choosing the optimal K, KNN is a lazy learning method that holds all the training examples until classification time. As a result, as the value of K increases, KNN's computation becomes more intensive. This paper proposes the K-VM hybrid classification system to reduce the impact of parameters on classification accuracy. The Euclidean distance function is used to measure the average distance between the testing data point and the support vectors (SVs) of each category. Experiments on a variety of benchmark datasets show that the K-VM approach outperforms the conventional KNN classification model in classification accuracy.
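
One possible reading of the distance step described above is sketched here: a test point is assigned to the category whose support vectors are, on average, nearest to it. This is an interpretation of the abstract, not the authors' algorithm. The support vectors are assumed to come from an already trained SVM, and the function names, labels, and coordinates below are hypothetical.

```python
import math

def average_distance(point, vectors):
    """Mean Euclidean distance from point to a set of vectors."""
    return sum(math.dist(point, v) for v in vectors) / len(vectors)

def kvm_predict(point, support_vectors):
    """Pick the category whose support vectors are nearest on average.

    support_vectors maps each category label to its list of support vectors.
    """
    return min(support_vectors,
               key=lambda label: average_distance(point, support_vectors[label]))

# Hypothetical support vectors per category (assumed output of SVM training).
SUPPORT_VECTORS = {
    "spam": [(1.0, 1.0), (1.0, 2.0)],
    "ham": [(4.0, 4.0), (5.0, 4.0)],
}

print(kvm_predict((1.5, 1.5), SUPPORT_VECTORS))
```

Because only the support vectors are compared, no value of K has to be chosen and the full training set does not have to be kept at classification time, which matches the motivation given in the abstract.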


2021 ◽  
Vol 12 (11) ◽  
pp. 1886-1891
Author(s):  
Sarthika Dutt et al.

Dysgraphia is a disorder that affects writing skills. Identifying dysgraphia at an early age of a child's development is a difficult task. It can be identified from the problematic skills associated with the disorder. In this study, motor ability, space knowledge, copying skill, and visual spatial response are some of the features included for dysgraphia identification. The features that affect dysgraphia are analyzed using the Elastic Net (EN) feature selection technique. The significant features are then classified using machine learning techniques. The classification models compared on the dysgraphia dataset are KNN (K-Nearest Neighbors), Naïve Bayes, Decision Tree, Random Forest, and SVM (Support Vector Machine). The results indicate that the Random Forest classification model achieves the highest performance for dysgraphia identification.


Machines ◽  
2021 ◽  
Vol 9 (3) ◽  
pp. 56
Author(s):  
Pringgo Widyo Laksono ◽  
Takahide Kitamura ◽  
Joseph Muguro ◽  
Kojiro Matsushita ◽  
Minoru Sasaki ◽  
...  

This research focuses on the minimum process for classifying three human upper arm movements (elbow extension, shoulder extension, and combined shoulder and elbow extension) from three electromyography (EMG) signals, in order to control a robotic arm with 2 degrees of freedom (DoF). The proposed minimum process consists of four parts: time division of the data; the Teager–Kaiser energy operator (TKEO); conventional EMG feature extraction (i.e., the mean absolute value (MAV), zero crossings (ZC), slope-sign changes (SSC), and waveform length (WL)); and eight major machine learning models (i.e., decision tree (medium and fine), k-nearest neighbor (KNN) (weighted and fine KNN), support vector machine (SVM) (cubic and fine Gaussian SVM), and ensemble (bagged trees and subspace KNN)). We then compare and investigate 48 classification models (47 proposed models and 1 conventional model) based on five healthy subjects. The results show that all the classification models achieved accuracies between 74% and 98%, and that the processing time was below 40 ms, indicating an acceptable controller delay for robotic arm control. Moreover, we confirmed that the classification model with no time division, with TKEO, and with the ensemble (subspace KNN) had the best performance, with accuracy rates at 96.67, recall rates at 99.66, and precision rates at 96.99. In short, the combination of the proposed TKEO and the ensemble (subspace KNN) plays an important role in achieving EMG classification.
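
The TKEO and the four conventional EMG features listed above have standard textbook definitions, sketched below. This is not the authors' implementation: the threshold parameters, the decision to compute the features on the TKEO output, and the short sample window are assumptions made for illustration.

```python
def tkeo(x):
    """Teager-Kaiser energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def mean_absolute_value(x):
    """MAV: average of the absolute sample values."""
    return sum(abs(v) for v in x) / len(x)

def waveform_length(x):
    """WL: cumulative absolute difference between consecutive samples."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))

def zero_crossings(x, threshold=0.0):
    """ZC: number of sign changes whose amplitude step exceeds threshold."""
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0 and abs(a - b) > threshold)

def slope_sign_changes(x, threshold=0.0):
    """SSC: number of samples that are local extrema beyond threshold."""
    return sum(1 for a, b, c in zip(x, x[1:], x[2:]) if (b - a) * (b - c) > threshold)

def extract_features(window, use_tkeo=True):
    """Feature vector [MAV, ZC, SSC, WL] for one EMG window."""
    data = tkeo(window) if use_tkeo else list(window)
    return [mean_absolute_value(data), zero_crossings(data),
            slope_sign_changes(data), waveform_length(data)]

# Short invented window (not real EMG data).
window = [1, -2, 3, -1, 2]
print(tkeo(window))
print(extract_features(window, use_tkeo=False))
```

With three EMG channels, one such feature vector per channel would be concatenated and passed to the classifier.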


2021 ◽  
Vol 5 (6) ◽  
pp. 1120-1126
Author(s):  
Jessica Widyadhana Iskandar ◽  
Yessica Nataliani

The Samsung Galaxy Z Flip 3 is a gadget that is currently popular with the public because of its unique shape and features. YouTube is a social media platform that the public can access and enjoy, and its content includes gadget reviews such as those on the GadgetIn channel. YouTube comments can indicate whether or not people accept or are interested in this new gadget. This study aims to determine public sentiment toward the gadget. Based on the analysis and testing carried out on a total of 9,597 YouTube comments about the Samsung Galaxy Z Flip 3, more users gave positive opinions on the design aspect and negative opinions on the price, specification, and brand image aspects. Using the CRISP-DM model and comparing the Naïve Bayes (NB), Support Vector Machine (SVM), and k-Nearest Neighbor (k-NN) classification methods, the study finds that the SVM classification model gives the best results. The average accuracy of SVM is 96.43% across four aspects: 94.40% for design, 97.44% for price, 96.22% for specification, and 97.63% for brand image.

