Expert Hypertension Detection System Featuring Pulse Plethysmograph Signals and Hybrid Feature Selection and Reduction Scheme

Hypertension is an antecedent of cardiac disorders. According to the World Health Organization (WHO), the number of people affected by hypertension will reach around 1.56 billion by 2025. Early detection of hypertension is imperative to prevent the complications caused by cardiac abnormalities. Hypertension usually has no apparent, detectable symptoms; hence, its control rate is significantly low. Computer-aided diagnosis based on machine learning and signal analysis has recently been applied to identify biomarkers for the accurate prediction of hypertension. This research proposes a new expert hypertension detection system (EHDS) that uses pulse plethysmograph (PuPG) signals to classify subjects as normal or hypertensive. The PuPG signal data set, which carries rich information about cardiac activity, was acquired from healthy and hypertensive subjects. The raw PuPG signals were preprocessed with empirical mode decomposition (EMD), which decomposes a signal into its constituent components. A combination of multi-domain features was extracted from the preprocessed PuPG signals. The most discriminative features were selected and reduced through a proposed hybrid feature selection and reduction (HFSR) scheme. The selected features were fed to several classification methods for comparison. The best performance (99.4% accuracy, 99.6% sensitivity, and 99.2% specificity) was achieved with weighted k-nearest neighbor (KNN-W). The performance of the proposed EHDS was assessed with tenfold cross-validation. It achieved better detection performance than other electrocardiogram (ECG)- and photoplethysmograph (PPG)-based methods.
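
To illustrate the final classification step, the sketch below implements a distance-weighted k-nearest-neighbor classifier. It is evaluated with tenfold cross-validation and reports accuracy, sensitivity and specificity, the three figures quoted above. This is a minimal sketch under stated assumptions, not the EHDS implementation:

- The abstract specifies neither the weighting scheme nor k, so inverse-squared-distance weights and `k=10` are assumptions.
- The input vectors stand in for the HFSR-selected features.
- The helper names `weighted_knn_predict` and `cross_validate` are hypothetical.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_X, train_y, x, k=10):
    """Distance-weighted KNN: each of the k nearest neighbors votes with 1/d^2.

    The inverse-squared-distance weighting is an assumption; the source does
    not specify how KNN-W weights its neighbors.
    """
    nearest = sorted(
        ((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for d, label in nearest:
        if d == 0.0:
            return label  # exact match: avoid division by zero
        votes[label] += 1.0 / (d * d)
    return max(votes, key=votes.get)


def cross_validate(X, y, positive, folds=10, k=10):
    """Interleaved k-fold cross-validation returning (accuracy, sensitivity, specificity).

    Sample i goes to fold i % folds (no shuffling, so results are reproducible).
    Assumes a binary problem in which `positive` is the hypertension label.
    """
    tp = tn = fp = fn = 0
    n = len(X)
    for f in range(folds):
        test_idx = set(range(f, n, folds))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        for i in sorted(test_idx):
            pred = weighted_knn_predict(train_X, train_y, X[i], k)
            if y[i] == positive:
                tp += pred == positive
                fn += pred != positive
            else:
                fp += pred == positive
                tn += pred != positive
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, sensitivity, specificity
```

Weighting matters near class boundaries. Consider a query at distance 1 from a single "b" point and at distances 3 and 4 from two "a" points. With `k=3` it is labeled "b", even though "a" holds the plain majority.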

A novel ensemble modeling for intrusion detection system

International Journal of Electrical and Computer Engineering (IJECE)

10.11591/ijece.v10i2.pp1963-1971

2020

Vol 10 (2)

pp. 1963

Author(s): Pullagura Indira Priyadarsini, G. Anuradha

Keyword(s): Feature Selection, Intrusion Detection, Intrusion Detection System, Nearest Neighbor, Detection System, Distance Functions, Classification Model, Support Vector, K Nearest Neighbor, Data Set

The vast increase in data from internet services has made computer systems more vulnerable and harder to protect from malicious attacks, so intrusion detection systems (IDSs) must become more effective at monitoring intrusions. This work therefore builds an effective IDS architecture that uses a simple classification model and achieves a low false alarm rate and high accuracy. IDSs must handle enormous amounts of data traffic containing redundant and irrelevant features, which degrade their performance. Good feature selection approaches remove unrelated and redundant features and yield better classification accuracy. This paper proposes a novel ensemble model for IDS based on two algorithms: Fuzzy Ensemble Feature Selection (FEFS) and Fusion of Multiple Classifier (FMC). FEFS unifies five feature scores, obtained from feature-class distance functions, by aggregating them with a fuzzy union operation. FMC fuses three classifiers using an ensemble decisive function. Experiments on the KDD Cup 99 data set show that the proposed system outperforms well-known methods such as Support Vector Machines (SVMs), K-Nearest Neighbor (KNN), and Artificial Neural Networks (ANNs). These experiments clearly demonstrate the value of an ensemble methodology for modeling IDSs and indicate that the system is robust and efficient.
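
The two fusion steps described above can be sketched as follows. This is a minimal sketch under stated assumptions:

- The five feature-class distance scores are taken as given inputs, because the abstract does not define them.
- Each score list is min-max normalized to [0, 1] so that it can be read as a fuzzy membership degree.
- The fuzzy union is the standard max operator.
- FMC's ensemble decisive function is approximated by simple majority voting.

All function names are hypothetical.

```python
from collections import Counter


def min_max_normalize(scores):
    """Map raw scores to [0, 1] so they can act as fuzzy membership degrees."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)  # an uninformative scorer contributes nothing
    return [(s - lo) / (hi - lo) for s in scores]


def fuzzy_union(score_lists):
    """Fuse per-feature scores from several scorers with max (the standard fuzzy union)."""
    memberships = [min_max_normalize(scores) for scores in score_lists]
    return [max(per_feature) for per_feature in zip(*memberships)]


def select_top_features(fused, m):
    """Return the indices of the m features with the highest fused membership."""
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)[:m]


def majority_vote(predictions):
    """Fuse the labels predicted by several classifiers by simple majority (assumed rule)."""
    return Counter(predictions).most_common(1)[0][0]
```

Other t-conorms could replace max, for example the probabilistic sum a + b − ab. They would change how strongly a single high score dominates the fused ranking.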

Feature Selection Algorithm for Hyperlipidemia Classification

Applied Mechanics and Materials

10.4028/www.scientific.net/amm.701-702.110

2014

Vol 701-702

pp. 110-113

Author(s): Qi Rui Zhang, He Xian Wang, Jiang Wei Qin

Keyword(s): Feature Selection, Nearest Neighbor, Information Gain, Classification Systems, Support Vector, K Nearest Neighbor, Data Set, Document Frequency, Selection Algorithms, Term Weights

This paper reports a comparative study of feature selection algorithms on a hyperlipidemia data set. Three feature selection methods were evaluated: document frequency (DF), information gain (IG), and the χ² statistic (CHI). The classification systems represent each document as a vector and compute term weights with tfidfie (term frequency, inverse document frequency, and inverse entropy). To compare the effectiveness of feature selection, three classification methods were used: Naïve Bayes (NB), k-nearest neighbor (kNN), and Support Vector Machines (SVM). The experimental results show that IG and CHI significantly outperform DF, and that SVM and NB are more effective than kNN when the macro-averaged F1 measure is used. DF is suitable for large-scale text classification tasks.
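
To make the DF and CHI criteria concrete, the sketch below computes both for each term of a toy labeled corpus in which every document is a set of terms. It uses the usual 2×2 contingency form of the statistic:

χ²(t, c) = N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D))

The counts are defined as follows:

- A: class-c documents containing t.
- B: other documents containing t.
- C: class-c documents without t.
- D: the remaining documents.

The corpus, the terms and the function names are hypothetical. The paper's preprocessing and tfidfie weighting are not reproduced.

```python
def document_frequency(docs, term):
    """DF: number of documents that contain the term."""
    return sum(1 for doc in docs if term in doc)


def chi_square(docs, labels, term, cls):
    """Chi-square statistic between a term and a class from a 2x2 contingency table."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has_term, in_class = term in doc, label == cls
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def rank_terms_by_chi(docs, labels):
    """Rank all terms by their maximum chi-square value over the classes."""
    vocab = sorted(set().union(*docs))
    classes = set(labels)
    score = {t: max(chi_square(docs, labels, t, c) for c in classes) for t in vocab}
    return sorted(vocab, key=lambda t: score[t], reverse=True)
```

Taking the maximum over classes is one common way to turn the per-class statistic into a single term score. A frequency-weighted average is another.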

Feature Selection and K-nearest Neighbor for Diagnosis Cow Disease

International journal of science, engineering, and information technology

10.21107/ijseit.v5i02.10218

2021

Vol 5 (02)

pp. 249-253

Author(s): Yeni Kustiyahningsih

Keyword(s): Feature Selection, Nearest Neighbor, Disease Classification, Training Data, Test Results, K Nearest Neighbor, Data Set, Cattle Disease, Cattle Diseases, Cattle Breeders

A large cattle population increases the potential for cattle disease. Lack of knowledge about the various kinds of cattle diseases and how to handle them is one of the causes of decreasing cattle productivity. The aim of this research is to classify cattle diseases quickly and accurately, helping cattle breeders detect and handle disease sooner. This study uses the K-Nearest Neighbor (KNN) classification method with F-Score feature selection. KNN classifies diseases based on the distance between training and test data, while F-Score feature selection reduces the attribute dimension to retain the relevant attributes. The data set consists of 350 records of cattle diseases in Madura, with 21 features and 7 classes. The data were split using K-fold cross-validation with k = 5. The best accuracy was obtained with 18 features and KNN (k = 3), giving an accuracy of 94.28571%, a recall of 0.942857, and a precision of 0.942857.
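
The F-Score filter step can be sketched as below. The abstract does not define its F-score, so this sketch assumes a straightforward multi-class extension of the widely used F-score of Chen and Lin. The numerator is the spread of the class means around the overall mean, and the denominator is the sum of the within-class sample variances. Features are ranked by this score and the top m are kept (m = 18 in the reported best run) before KNN classification. The function names are hypothetical.

```python
from statistics import mean


def f_score(column, labels):
    """Multi-class F-score of one feature (assumed definition, see lead-in).

    Numerator: sum over classes of (class mean - overall mean)^2.
    Denominator: sum over classes of the sample variance within that class.
    """
    overall = mean(column)
    between = within = 0.0
    for cls in set(labels):
        values = [v for v, y in zip(column, labels) if y == cls]
        class_mean = mean(values)
        between += (class_mean - overall) ** 2
        if len(values) > 1:
            within += sum((v - class_mean) ** 2 for v in values) / (len(values) - 1)
    if within == 0.0:
        return float("inf") if between > 0 else 0.0
    return between / within


def select_features(X, y, m):
    """Return the column indices of the m features with the highest F-score."""
    scores = [f_score([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:m]
```

Only the selected columns would then be passed to the KNN classifier. Because the score is computed per feature, it ignores interactions between features, which is the usual trade-off of filter methods.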

Classification of COVID-19 Spread-Rate Levels for Transmission Mitigation Using the Modified K-Nearest Neighbor (MKNN) Method

Jurnal Teknologi Informasi dan Ilmu Komputer

10.25126/jtiik.2021834400

2021

Vol 8 (3)

pp. 595

Author(s): Imam Cholissodin, Felicia Marvela Evanita, Jeffrey Junior Tedjasulaksana, Kukuh Wicaksono Wahyuditomo

Keyword(s): World Health Organization, Nearest Neighbor, Assessment System, World Health, K Nearest Neighbor, Spread Rate, Social Distancing, The World, Rate Class, Health Organization

COVID-19 (Coronavirus Disease 2019) is a disease caused by a virus that is transmitted through the respiratory tract in animals or humans. It has killed thousands of people almost everywhere in the world and has been declared a pandemic in many countries, including Indonesia. The first COVID-19 case in Indonesia was found on March 2, 2020. To handle the pandemic, the government imposed social distancing of more than 1 meter between people, together with health protocols for activities outside the home, as recommended by the World Health Organization (WHO). Low public awareness in Indonesia of social distancing and the health protocols has significantly increased the number of positive COVID-19 cases and caused many deaths. In this study, we therefore built a system that classifies the spread-rate level of COVID-19 data for transmission mitigation across all provinces of Indonesia, using the Modified K-Nearest Neighbor (MKNN) method. The output is a spread-rate class with three levels, explained further in the research methodology section. A low spread rate means that mitigation is high, a medium spread rate means that mitigation is moderate, and a high spread rate means that mitigation is low. The system's output aims to raise awareness among Indonesians of COVID-19 prevention by showing the spread-rate class of each province. MKNN was chosen because it is a reasonably good classification method that performs validation and weighting. The weight is determined by computing the fraction of neighbors that share the same label, relative to the total number of neighbors.
The parameters used in the classification are the number of positive cases, the number of recovered patients, and the number of deaths from COVID-19. The data come from the official website of the Ministry of Health of the Republic of Indonesia, https://infeksiemerging.kemkes.go.id/. They comprise 374 training records from May 12 to May 22, 2020 and 136 test records from May 23 to May 26, 2020. The resulting accuracy was 97.79% with K = 3.
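
The validation-and-weighting idea behind MKNN can be sketched as follows. This minimal sketch assumes the commonly cited MKNN formulation:

- The validity of a training sample is the fraction of its H nearest training neighbors that share its label.
- Each of the K nearest neighbors of a query votes with weight validity × 1 / (d + α).
- α = 0.5 is used as a smoothing constant.

The three features mirror the abstract's inputs (positive cases, recoveries, deaths), but the numbers are invented. The function names, H and α are assumptions.

```python
import math
from collections import defaultdict


def validity(train_X, train_y, i, h):
    """Fraction of the h nearest other training samples that share sample i's label."""
    neighbours = sorted(
        ((math.dist(train_X[i], train_X[j]), train_y[j])
         for j in range(len(train_X)) if j != i),
        key=lambda pair: pair[0],
    )[:h]
    return sum(label == train_y[i] for _, label in neighbours) / len(neighbours)


def mknn_predict(train_X, train_y, x, k=3, h=None, alpha=0.5):
    """Modified KNN: each of the k nearest neighbors votes with validity / (distance + alpha)."""
    h = k if h is None else h
    valid = [validity(train_X, train_y, i, h) for i in range(len(train_X))]
    nearest = sorted(
        ((math.dist(xi, x), i) for i, xi in enumerate(train_X)),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for d, i in nearest:
        votes[train_y[i]] += valid[i] / (d + alpha)
    return max(votes, key=votes.get)
```

Case, recovery and death counts differ by orders of magnitude, so in practice they would usually be normalized before computing distances. The sketch omits that step.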

Influence of pre-processing on anomaly-based intrusion detection

Vojnotehnicki glasnik

10.5937/vojtehg68-27319

2020

Vol 68 (3)

pp. 598-611

Cited By ~ 1

Author(s): Danijela Protić

Keyword(s): Feature Selection, Intrusion Detection, Nearest Neighbor, Computer Network, Reference Model, Detection System, False Negative, K Nearest Neighbor, Dataset Size, Binary Classifiers

Introduction/purpose: An anomaly-based intrusion detection system detects intrusions using a reference model that identifies the normal behavior of a computer network and flags anomalies. Machine-learning models classify network records as either normal or anomalous. In complex computer networks the number of training records is large, which makes evaluating the classifiers computationally expensive. Methods: This paper presents a feature selection algorithm that reduces the dataset size. Results: Experiments were conducted on the Kyoto 2006+ dataset with four classifier models: feedforward neural network, k-nearest neighbor, weighted k-nearest neighbor, and medium decision tree. The results show high accuracy for the models, as well as low false positive and false negative rates. Conclusion: The three-step pre-processing algorithm for feature selection and instance normalization improved the performance of all four binary classifiers and decreased processing time.

Anomaly-Based Intrusion Detection: Feature Selection and Normalization Influence to the Machine Learning Models Accuracy

European Journal of Engineering and Formal Sciences

10.26417/ejef.v2i3.p101-106

2018

Vol 2 (3)

pp. 101

Author(s): Danijela Protić, Miomir Stanković

Keyword(s): Feature Selection, Intrusion Detection, Decision Tree, Network Traffic, Intrusion Detection System, Nearest Neighbor, Reference Model, Detection System, Support Vector, K Nearest Neighbor

An anomaly-based intrusion detection system detects intrusions into a computer network using a reference model that must be able to identify the network's normal behavior and flag what is not normal. In this process, network traffic is classified into two groups by assigning different labels to normal and malicious behavior. The main disadvantage of anomaly-based intrusion detection is the need to learn the difference between normal and abnormal behavior. Another disadvantage is the complexity of datasets that simulate realistic network traffic. Feature selection and normalization can reduce data complexity and processing runtime by selecting a better feature space. This paper presents the results of testing the influence of feature selection and instance normalization on the classification performance of k-nearest neighbor, weighted k-nearest neighbor, support vector machine, and decision tree models. The tests use 10 days of records from the Kyoto 2006+ dataset. The data were pre-processed to remove all categorical features, leaving a subset of 17 features. Features containing instances that could not be normalized into the range [-1, 1] were also removed, leaving nine features. The feature 'Label' divides network traffic into two classes: normal (1) and malicious (0). Accuracy was used as the performance metric. The proposed method yielded very high accuracy. The decision tree gave the highest values on non-normalized data, and k-nearest neighbor gave the highest values on normalized data.

Keywords: feature selection, normalization, k-NN, weighted k-NN, SVM, decision tree, Kyoto 2006+
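
The pre-processing steps described above can be sketched as three operations:

1. Drop the categorical columns.
2. Drop the columns that cannot be mapped onto [-1, 1].
3. Min-max scale the remaining columns to [-1, 1] with x' = 2(x − min)/(max − min) − 1.

The abstract does not say why some features could not be normalized. This sketch assumes the cause is a zero range (a constant column), where the formula divides by zero. Column indices, data and function names are hypothetical.

```python
def scale_to_symmetric_range(column):
    """Min-max scale a numeric column to [-1, 1]; return None if its range is zero."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return None  # constant column: 2(x - min)/(max - min) - 1 is undefined
    return [2 * (v - lo) / (hi - lo) - 1 for v in column]


def preprocess(rows, categorical):
    """Drop categorical and unscalable columns, then scale the rest to [-1, 1].

    rows: list of records (lists); categorical: set of column indices to drop.
    Returns (kept column indices, scaled rows).
    """
    kept, scaled_columns = [], []
    for j in range(len(rows[0])):
        if j in categorical:
            continue
        scaled = scale_to_symmetric_range([row[j] for row in rows])
        if scaled is None:
            continue
        kept.append(j)
        scaled_columns.append(scaled)
    return kept, [list(values) for values in zip(*scaled_columns)]
```

In a train/test setting, the minimum and maximum would normally be computed on the training split only and then reused for the test split, so that no test information leaks into training.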

Two-stage feature selection for bearing fault diagnosis based on dual-tree complex wavelet transform and empirical mode decomposition

Proceedings of the Institution of Mechanical Engineers Part C Journal of Mechanical Engineering Science

10.1177/0954406215573976

2015

Vol 230 (2)

pp. 291-302

Cited By ~ 19

Author(s): Mien Van, Hee-Jun Kang

Keyword(s): Feature Selection, Wavelet Transform, Empirical Mode Decomposition, Nearest Neighbor, K Nearest Neighbor, Complex Wavelet Transform, Two Stage, Mode Decomposition, Complex Wavelet, Neighbor Classifier

This paper presents automatic fault diagnosis of different rolling element bearing faults using a dual-tree complex wavelet transform, empirical mode decomposition, and a novel two-stage feature selection technique. In this method, the dual-tree complex wavelet transform and empirical mode decomposition preprocess the original vibration signal to obtain more accurate fault characteristic information. Time-domain features were then extracted from three sources: the original signals, the dual-tree complex wavelet transform coefficients, and selected intrinsic mode functions. Together they form a rich combined feature set. Next, a two-stage feature selection algorithm was proposed to find the smallest set of features that yields superior classification accuracy. In the first stage, the candidate feature set is found using the distance evaluation technique and a k-nearest neighbor classifier. In the second stage, a genetic algorithm-based k-nearest neighbor classifier selects the best combination of features from the candidate set with respect to classification accuracy and the number of feature inputs. Finally, the selected features are fed to a k-nearest neighbor classifier to evaluate diagnosis performance. Experimental results on real bearing vibration signals demonstrate that combining the dual-tree complex wavelet transform, empirical mode decomposition, and the two-stage feature selection technique is effective for both feature extraction and feature selection, and increases classification accuracy.
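
Stage one of the two-stage selection ranks features by how well they separate the fault classes. The sketch below uses a simplified distance-evaluation criterion: the average distance between class centers divided by the average within-class pairwise distance. Features whose ratio exceeds a threshold are kept. This is an assumption-laden sketch. The paper's exact distance evaluation technique, which may include additional compensation factors, and its GA-based second stage are not reproduced. The threshold, data and function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def distance_ratio(column, labels):
    """Between-class / within-class average distance for one feature.

    Assumes at least two classes and at least one class with two or more samples.
    """
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(label, []).append(value)
    within = mean(
        mean(abs(a - b) for a, b in combinations(values, 2))
        for values in groups.values()
        if len(values) > 1
    )
    centres = [mean(values) for values in groups.values()]
    between = mean(abs(a - b) for a, b in combinations(centres, 2))
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def candidate_features(X, y, threshold):
    """Stage 1: indices of features whose distance ratio exceeds the threshold."""
    return [
        j for j in range(len(X[0]))
        if distance_ratio([row[j] for row in X], y) > threshold
    ]
```

A larger ratio means the feature's class centers lie far apart relative to the scatter inside each class. Such features are the natural candidates to hand to a wrapper search in the second stage.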

ANALYSIS OF 4 ALGORITHMS FOR LIVER CLASSIFICATION USING RAPIDMINER

Jurnal Informatika Polinema

10.33795/jip.v6i2.274

2020

Vol 6 (2)

pp. 1-9

Author(s): Annisa Putri Ayudhitama, Utomo Pujianto

Keyword(s): Neural Network, Machine Learning, Data Mining, Decision Tree, Nearest Neighbor, Naive Bayes, Naïve Bayes, World Health, K Nearest Neighbor, Health Organization

The liver is one of the important organs of the human body. It detoxifies, or neutralizes toxins from, everything that enters the body, keeping the body healthy. The liver can be attacked by diseases that disrupt its function, and once liver disease strikes, toxins spread throughout the body and make it unhealthy. Liver disease is caused by viruses, alcohol, lifestyle, and other factors. According to WHO (World Health Organization) data, almost 1.2 million people per year, particularly in Southeast Asia and Africa, die of liver disease. People are often unaware of their liver disease, or discover it too late, so that by the time they are examined the disease is already severe. It is better to treat it early by recognizing the symptoms. Data mining can make liver disease easier to diagnose, in particular by helping doctors decide whether a patient whose symptoms resemble liver disease actually has it. Diagnosis is performed as a classification task whose output is whether or not the patient has liver disease. This study uses four data mining algorithms: Naïve Bayes, K-Nearest Neighbor (KNN), Decision Tree, and Neural Network. The data set is the Indian Liver Patient Dataset (ILPD) from the UCI Machine Learning Repository. The four algorithms are compared to determine which is more accurate for diagnosing liver disease. The accuracies were 55.75% for Naïve Bayes, 66.36% for K-Nearest Neighbor, 67.04% for Decision Tree, and 70.50% for Neural Network.
These accuracies are low because the classes (liver patients and non-liver patients) are imbalanced. There are more liver patients than non-liver patients, so many records are classified as liver patients.

Keywords: Data Mining, Decision Tree, Classification, KNN, Liver, Naïve Bayes, Neural Network
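
The imbalance explanation above can be made concrete with two quantities. The first is the accuracy of always predicting the majority class. The second is balanced accuracy, the mean per-class recall, which the majority class cannot inflate. The sketch below uses invented labels, not the ILPD data, and the function names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most frequent class."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls; a majority-only predictor scores 1 / n_classes."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)
```

For example, take seven "liver" and three "non-liver" labels. A classifier that always answers "liver" has an accuracy of 0.7, which equals the majority baseline, but a balanced accuracy of only 0.5.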

Intrusion Detection System Based on Feature Selection with a Combination of Information Gain Ratio and Correlation Filters

Jurnal Teknologi Informasi dan Ilmu Komputer

10.25126/jtiik.0813154

2021

Vol 8 (3)

pp. 457

Author(s): Nitami Lestari Putri, Radityo Adi Nugroho, Rudy Herteno

Keyword(s): Feature Selection, Intrusion Detection, Intrusion Detection System, Nearest Neighbor, Information Gain, Detection System, Feature Ranking, K Nearest Neighbor, Gain Ratio, Information Gain Ratio

An Intrusion Detection System (IDS) is a system developed to monitor and filter network activity by identifying attacks. The amount of data an IDS must examine is very large, and many extraneous features make it difficult for the analysis to detect suspicious patterns of behavior. An IDS therefore needs to reduce the amount of data to be processed by reducing the number of features, which can be done through feature selection. This study combines two feature-ranking methods, Information Gain Ratio and Correlation, and classifies the data using the K-Nearest Neighbor algorithm. The rankings produced by the two methods are divided into two groups: the median value is computed for the first group, and the second group is removed. K-Nearest Neighbor classification is then performed with 10-fold cross-validation and k = 5.
The proposed model achieves a highest accuracy of 99.61%, while the highest accuracy without feature selection is 99.59%.
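
One of the two ranking criteria, the information gain ratio, can be computed for a discrete feature as IG / SplitInfo:

- IG is the reduction in class entropy after splitting on the feature.
- SplitInfo is the entropy of the feature's own value distribution.

The sketch below shows only this criterion. The correlation filter and the paper's median-based combination of the two rankings are not reproduced, and the function names and toy data are assumptions.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of the labels given the feature, divided by the split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


def rank_by_gain_ratio(columns, labels):
    """Return column indices sorted from highest to lowest gain ratio."""
    scores = [gain_ratio(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
```

Dividing by SplitInfo penalizes features with many distinct values, which plain information gain tends to favor. Continuous traffic features would first need to be discretized.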

Tuberculosis detection in chest X-ray using Mayfly-algorithm optimized dual-deep-learning features

Journal of X-Ray Science and Technology

10.3233/xst-210976

2021

pp. 1-14

Author(s): M.P. Rajakumar, R. Sonia, B. Uma Maheswari, S.P. Karuppiah

Keyword(s): Deep Learning, Classification Accuracy, Nearest Neighbor, Binary Classification, Primary Objective, World Health, Experimental Investigations, K Nearest Neighbor, X Ray, Health Organization

The World Health Organization (WHO) lists tuberculosis (TB) among the top 10 causes of death, and early diagnosis helps cure the patient through suitable treatment. TB usually affects the lungs, so an accurate bio-imaging scheme is well suited to diagnosing the infection. This research implements an automated scheme to detect TB infection in chest radiographs (X-ray) using a deep-learning (DL) approach. The primary objective of the proposed scheme is to achieve better classification accuracy when detecting TB in X-ray images. The scheme consists of the following phases: (1) image collection and pre-processing, (2) feature extraction with pre-trained VGG16 and VGG19, (3) Mayfly-algorithm (MA)-based optimal feature selection, (4) serial feature concatenation, and (5) binary classification with 5-fold cross-validation. The performance of the proposed DL scheme is validated separately for (1) VGG16 with conventional features, (2) VGG19 with conventional features, (3) VGG16 with optimal features, (4) VGG19 with optimal features, and (5) concatenated dual-deep features (DDF). All experiments were conducted in MATLAB®. The results confirm that the proposed system with DDF achieves a classification accuracy of 97.8% using a k-nearest neighbor (KNN) classifier.
