New Attributes Extraction System for Arabic Autograph as Genuine and Forged through a Classification Techniques

2021 ◽  
Author(s):  
Anwar Yahya Ebrahim ◽  
Hoshang Kolivand

Handwritten signature (autograph) authentication is widely used throughout the world, and a thorough check of the autograph is important before reaching a decision about the signer. Arabic autographs have unique characteristics, including lines and overlapping strokes, which make high accuracy more difficult to achieve. This project addresses that difficulty by selecting the best characteristics for Arabic autograph authentication, characterized by the number of attributes representing each autograph. The objective is to determine whether a given autograph is genuine or a forgery. The proposed method uses the Discrete Cosine Transform (DCT) to extract features, then Sparse Principal Component Analysis (SPCA) to select significant attributes for handwritten Arabic autograph recognition in support of the authentication step. Finally, a decision tree classifier is used for signature authentication. The suggested DCT-with-SPCA method achieves good results on an Arabic autograph dataset when evaluated against various techniques.
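A minimal sketch of DCT-based feature extraction follows. It assumes the signature has already been reduced to a 1-D numeric profile (for example, one row of pixel intensities); the abstract does not state the actual input, whether a 2-D DCT is used, or how many coefficients are kept, so `dct_ii`, `dct_features`, and `k` are hypothetical. The SPCA selection and decision tree steps are not shown.

```python
import math


def dct_ii(signal):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n_samples = len(signal)
    return [
        sum(x * math.cos(math.pi / n_samples * (n + 0.5) * k)
            for n, x in enumerate(signal))
        for k in range(n_samples)
    ]


def dct_features(signal, k):
    """Hypothetical helper: keep the first k (low-frequency) DCT coefficients.

    Low-frequency coefficients summarize the coarse shape of the profile,
    which is an assumed (common) way to get a compact feature vector.
    """
    return dct_ii(signal)[:k]


# Toy 1-D profile standing in for one signature row (made-up values).
profile = [0.0, 1.0, 3.0, 1.0, 0.0, 2.0, 4.0, 2.0]
print(dct_features(profile, 3))
```

The first coefficient is simply the sum of the profile; higher coefficients measure progressively finer oscillations, so truncating at `k` trades detail for a shorter feature vector.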

2009 ◽  
Vol 147-149 ◽  
pp. 588-593 ◽  
Author(s):  
Marcin Derlatka ◽  
Jolanta Pauk

This paper proposes a procedure for processing biomechanical data. It consists of selecting suitable noise-free data, preprocessing the data by means of model identification and Kernel Principal Component Analysis, and then classifying with a decision tree. The results of classification into groups (normal gait and two selected gait pathologies: Spina Bifida and Cerebral Palsy) were very good.
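The Kernel PCA step can be sketched in pure Python as below. The abstract does not name the kernel, so the Gaussian (RBF) kernel and `gamma` are assumptions, as are the helper names; only the first component is extracted, by power iteration on the double-centered kernel matrix. Model identification and the decision tree are not shown, and in practice a library routine such as scikit-learn's `KernelPCA` would normally be used instead.

```python
import math
import random


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def center_kernel(K):
    """Double-center K, i.e. compute H K H with H = I - (1/n) * ones."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    total_mean = sum(row_means) / n
    # K is symmetric, so each column mean equals the matching row mean.
    return [[K[i][j] - row_means[i] - row_means[j] + total_mean
             for j in range(n)] for i in range(n)]


def top_eigenpair(M, iters=500, seed=0):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix."""
    rng = random.Random(seed)
    n = len(M)
    v = [rng.random() for _ in range(n)]
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged vector.
    lam = sum(v[i] * sum(M[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def kernel_pca_first_component(X, gamma=1.0):
    """Project the training points onto the first kernel principal component."""
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    lam, v = top_eigenpair(center_kernel(K))
    # With alpha = v / sqrt(lam), point i projects to
    # sum_j Kc[i][j] * alpha[j] = sqrt(lam) * v[i].
    return [math.sqrt(max(lam, 0.0)) * vi for vi in v]


# Two made-up clusters of 2-D points.
X = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]
print(kernel_pca_first_component(X))
```

On this toy input the first component gives the two points of each cluster the same sign and the two clusters opposite signs, which is the kind of separation a downstream decision tree can exploit.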


2019 ◽  
Vol 8 (5) ◽  
pp. 136
Author(s):  
John Rennie Short ◽  
Justin Vélez-Hagan ◽  
Leah Dubots

There is now a wide variety of global indicators that measure different economic, political and social attributes of countries in the world. This paper seeks to answer two questions. First, what is the degree of overlap between these different measures? Are they, in fact, measuring the same underlying dimension? To answer this question, we apply principal component analysis (PCA) to 15 indices across 145 countries. The results demonstrate that there is one underlying dimension that combines economic development and social progress with state stability. Second, how do countries score on this dimension? The results of the PCA allow us to produce categorical divisions of the world. The threefold division identifies a world composed of what we describe and map as rich, poor and middle countries. A five-group classification provides a more nuanced categorization: the very rich, free and stable; affluent and free; upper middle; lower middle; poor and not free.
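The idea of "one underlying dimension" can be made concrete with a small sketch: standardize each indicator, take the leading eigenvector of the correlation matrix by power iteration, report the share of variance it explains, and cut countries into equal-size groups by score. The equal-size rank split and all function names are assumptions for illustration; the abstract does not say how the paper drew its group boundaries.

```python
import math
import statistics


def first_component_scores(columns, iters=1000):
    """PCA on standardized indicators; returns (variance share, loadings, scores).

    `columns` holds one list per indicator, each with one value per country.
    """
    n = len(columns[0])
    # Standardize each indicator so PCA works on the correlation matrix.
    z = []
    for col in columns:
        mu, sd = statistics.fmean(col), statistics.pstdev(col)
        z.append([(x - mu) / sd for x in col])
    p = len(z)
    R = [[sum(z[a][i] * z[b][i] for i in range(n)) / n for b in range(p)]
         for a in range(p)]
    # Power iteration for the leading eigenvector (first-component loadings).
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(R[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[a] * sum(R[a][b] * v[b] for b in range(p)) for a in range(p))
    if sum(v) < 0:  # eigenvector sign is arbitrary; orient it consistently
        v = [-x for x in v]
    scores = [sum(v[a] * z[a][i] for a in range(p)) for i in range(n)]
    return lam / p, v, scores  # the trace of a correlation matrix is p


def rank_groups(scores, k=3):
    """Assumed grouping rule: k equal-size groups by score rank (0 = lowest)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, i in enumerate(order):
        groups[i] = rank * k // len(scores)
    return groups


# Made-up indicators for six countries; the third runs in the opposite direction.
cols = [[1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 12], [6, 5, 4, 3, 2, 1]]
share, loadings, scores = first_component_scores(cols)
print(share, rank_groups(scores))
```

Because the toy indicators are perfectly (positively or negatively) correlated, the first component captures essentially all the variance; with real indices the share would be lower, and a high share is what supports reading the first component as a single dimension.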


2021 ◽  
Vol 4 (4) ◽  
pp. 354-365
Author(s):  
Vitaliy S. Yakovyna ◽  
◽  
Ivan I. Symets

This article focuses on improving static models of software reliability by using machine learning methods to select the software code metrics that most strongly affect reliability. The study used a merged dataset from the PROMISE Software Engineering repository, which contained data on testing the software modules of five programs and twenty-one code metrics. For the prepared sample, the most important features affecting code quality were selected using the following feature selection methods: Boruta, Stepwise selection, Exhaustive Feature Selection, Random Forest Importance, LightGBM Importance, Genetic Algorithms, Principal Component Analysis, and Xverse (Python). Based on a vote over the results of these feature selection methods, a static (deterministic) model of software reliability was built, which establishes the relationship between the probability of a defect in a software module and the metrics of its code. This model was shown to include code metrics such as the branch count of a program, McCabe's lines of code and cyclomatic complexity, and Halstead's total number of operators and operands, intelligence, volume, and effort value. The effectiveness of the different feature selection methods was also compared, in particular their effect on classification accuracy with the following classifiers: Random Forest, Support Vector Machine, k-Nearest Neighbors, Decision Tree classifier, AdaBoost classifier, and Gradient Boosting for classification. Using any feature selection method was shown to increase classification accuracy by at least ten percent compared to the original dataset, which confirms the importance of this procedure for predicting software defects from metric datasets that contain a significant number of highly correlated software code metrics.
It was found that, for most classifiers, the best forecast accuracy was reached using the set of features obtained from the proposed static model of software reliability. It was also shown that individual methods, such as Autoencoder, Exhaustive Feature Selection and Principal Component Analysis, can be used with an insignificant loss of classification and prediction accuracy.
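The voting step can be sketched as counting how many selection methods chose each metric and keeping those above a threshold. The majority threshold, the function name, and the per-method selections below are hypothetical; the metric names only imitate PROMISE-style column names and are not the paper's results.

```python
from collections import Counter


def vote_features(selections, min_votes):
    """Return features picked by at least `min_votes` selection methods.

    `selections` maps a method name to the list of features it selected;
    duplicates within one method's list count as a single vote.
    """
    counts = Counter(f for chosen in selections.values() for f in set(chosen))
    return sorted(f for f, c in counts.items() if c >= min_votes)


# Made-up selections for illustration only.
selections = {
    "Boruta": ["branchCount", "loc", "v(g)", "total_Op"],
    "Stepwise": ["loc", "v(g)", "i"],
    "RandomForest": ["branchCount", "loc", "e", "v(g)"],
}
majority = len(selections) // 2 + 1
print(vote_features(selections, majority))  # ['branchCount', 'loc', 'v(g)']
```

Raising `min_votes` makes the final feature set smaller and more conservative; with all three methods required, only `loc` and `v(g)` survive here.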


2020 ◽  
Vol 13 (2) ◽  
pp. 11
Author(s):  
Bekti Endar Susilowati ◽  
Pardomuan Robinson Sihombing

Principal Component Analysis (PCA) is a multivariate analysis that replaces the original variables with a small number of principal components while losing relatively little information. In other words, it is used to explain the underlying variance-covariance structure of a large set of variables through a few linear combinations of these variables. PCA is strongly affected by outliers because it is based on the covariance matrix, which is sensitive to outliers. This analysis therefore uses a PCA that is robust to outliers, namely ROBPCA (Hubert's PCA). The resulting principal components are then used as input for cluster analysis with the CLARA (Clustering LARge Applications) method. CLARA is a k-medoids method that is robust to outliers and well suited to large datasets. In a case study on the variables that make up the happiness index in The World Happiness Report 2018, CLARA with Manhattan distance gave the best overall average silhouette width at 5 clusters.
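The model-selection criterion mentioned above, the overall average silhouette width under Manhattan distance, can be sketched as follows. This only scores a given clustering; CLARA's sampling and k-medoids search are not shown, and the function names and toy points are hypothetical. The number of clusters whose labeling gives the highest average is the one preferred.

```python
def manhattan(a, b):
    """L1 (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def average_silhouette(points, labels):
    """Mean silhouette width s(i) = (b - a) / max(a, b) with Manhattan distance.

    Requires at least two clusters; singletons get s(i) = 0 by convention.
    """
    clusters = set(labels)
    widths = []
    for i, p in enumerate(points):
        own = [manhattan(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(manhattan(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(widths) / len(widths)


# Made-up 2-D points (e.g. two principal-component scores per country).
points = [[0, 0], [0, 1], [10, 10], [10, 11]]
print(average_silhouette(points, [0, 0, 1, 1]))  # about 0.95
```

Values near 1 indicate compact, well-separated clusters; a poor labeling such as `[0, 1, 0, 1]` on the same points yields a negative average.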


Jurnal INFORM ◽  
2016 ◽  
Vol 1 (2) ◽  
Author(s):  
Slamet Kacung

Abstract - Heart attack is the deadliest disease in the world, including in Indonesia. According to a report by the Indonesian Heart Foundation, more than 27 out of every 100 deaths are due to heart disease. Early detection is badly needed, since many people with heart disease are, on average, already at an advanced stage. An intelligent early-detection system is a way to identify symptoms that need immediate attention so that heart disease can be recognized as early as possible. This study uses a Decision Tree Classifier; the dataset is taken from the UCI Machine Learning Repository and consists of 270 instances with thirteen input attributes and 1 target attribute. The research produces a decision tree that can help the public and can serve as a reference for doctors in diagnosing heart disease early. Second, the model can predict whether a person is likely to be diagnosed with heart disease, given a few established symptoms as input. The results cannot replace a proper cardiac examination, but they can help both the general public and doctors.
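The core step of growing a decision tree is choosing the attribute threshold that best separates the classes. The sketch below assumes a CART-style Gini impurity criterion (the abstract does not say which split criterion was used) and shows the search for a single numeric attribute; a full tree repeats this over all thirteen attributes and recurses on each side. The attribute values and labels are made up.

```python
def gini(labels):
    """Gini impurity 1 - sum_c p_c^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Return (threshold, weighted Gini) of the best split `value <= threshold`.

    Returns (None, parent impurity) if no split improves on the parent node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (thr, score)
    return best


# Hypothetical attribute values (e.g. age) with 1 = heart disease, 0 = none.
ages = [45, 50, 55, 60, 65, 70]
disease = [0, 0, 0, 1, 1, 1]
print(best_threshold(ages, disease))  # (57.5, 0.0)
```

A weighted impurity of 0.0 means the split produces two pure child nodes, so in this toy case the tree would stop after one question.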


2020 ◽  
Vol 27 (4) ◽  
pp. 1-16
Author(s):  
Meenal Jain ◽  
Gagandeep Kaur

Due to the launch of new applications, the behavior of internet traffic is changing. Hackers are always looking for sophisticated tools to launch attacks and damage services. Researchers have been working on intrusion detection techniques involving machine learning algorithms for supervised and unsupervised detection of these attacks. However, with newly found attacks, these techniques need to be refined. Handling data with a large number of attributes adds to the problem, so dimensionality-based feature reduction of the data is required. In this work, three reduction techniques, namely Principal Component Analysis (PCA), Artificial Neural Network (ANN), and Nonlinear Principal Component Analysis (NLPCA), have been studied and analyzed. Second, the performance of four classifiers, namely Decision Tree (DT), Support Vector Machine (SVM), K Nearest Neighbor (KNN) and Naïve Bayes (NB), has been studied on the actual and reduced datasets. In addition, novel performance measurement metrics, Classification Difference Measure (CDM), Specificity Difference Measure (SPDM), Sensitivity Difference Measure (SNDM), and F1 Difference Measure (F1DM), have been defined and used to compare the outcomes on the actual and reduced datasets. Comparisons were made using the new Coburg Intrusion Detection Data Set (CIDDS-2017) as well as the widely used NSL-KDD dataset. The best results were achieved by the Decision Tree, with 99.0 percent and 99.8 percent accuracy on the CIDDS and NSL-KDD datasets, respectively.
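The abstract names CDM, SPDM, SNDM and F1DM but does not define them. The sketch below computes the standard underlying metrics from confusion-matrix counts and then, purely as an assumption for illustration, takes each difference measure to be the metric on the reduced dataset minus the metric on the actual dataset (accuracy for CDM, specificity for SPDM, sensitivity for SNDM, F1 for F1DM). The paper's actual definitions may differ, and the counts are hypothetical.

```python
def metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall / true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def difference_measures(actual_counts, reduced_counts):
    """ASSUMED definition: metric on reduced data minus metric on actual data."""
    actual = metrics(*actual_counts)
    reduced = metrics(*reduced_counts)
    return {name: reduced[name] - actual[name] for name in actual}


# Hypothetical (tp, fp, tn, fn) counts, not taken from the paper.
print(difference_measures((90, 5, 95, 10), (88, 6, 94, 12)))
```

Under this assumed definition, values close to zero mean the reduced feature set preserves the classifier's behavior, and negative values indicate a loss caused by the reduction.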


Author(s):  
Putri Kurnia Handayani

Data mining is a field of study that is useful for recognizing patterns and knowledge stored in databases. Classification is one of the tasks within data mining. As a form of supervised learning, classification is used to predict the class or label of objects that do not yet have one. The decision tree algorithm is used to mine the iris flower dataset because the knowledge it produces is easy to represent. In addition, the decision tree is an eager learner, so the knowledge it produces is more accurate. Principal component analysis (PCA) is used to optimize the decision tree algorithm during dataset preprocessing. PCA serves to reduce dimensionality while retaining the information in mutually correlated features. The public iris flower dataset is taken from the UCI Repository. Based on the calculations, the accuracy of the decision tree algorithm on the iris dataset after optimization with PCA is 95.33%.

