Hierarchy-Based File Fragment Classification

File fragment classification is an essential problem in digital forensics. Although several attempts have been made to solve this challenging problem, a general solution has not been found. In this work, we propose a hierarchical machine-learning-based approach with optimized support vector machines (SVMs) as the base classifiers for file fragment classification. The approach places more general classifiers at the top level and more specialized, fine-grained classifiers at the lower levels of the hierarchy. We also propose a primitive taxonomy of file types that can be used to perform hierarchical classification. We evaluate our model on a dataset of 14 file types, with 1000 fragments of 512 bytes from each file type, derived from a subset of the publicly available Digital Corpora, the govdocs1 corpus. Our experiment shows results comparable to the present literature, with an average accuracy of 67.78% and an F1-measure of 65% using 10-fold cross-validation. We then refine the hierarchy and obtain better results, with an increase of 1% in the F1-measure. Finally, we present our assessment and observations, and conclude the paper by discussing the scope of future research.
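
The two-level routing described above can be sketched as follows. This is only an illustration under stated assumptions: the taxonomy, the two-dimensional toy features, and the class names are hypothetical, and a nearest-centroid rule stands in for the optimized SVM base classifiers, so it is not the authors' implementation.

```python
from collections import defaultdict
from math import dist

# Hypothetical two-level taxonomy (coarse group -> file types); not the paper's taxonomy.
TAXONOMY = {"text": ["html", "csv"], "image": ["jpg", "png"]}


class NearestCentroid:
    """Stand-in for a tuned SVM: predicts the label whose class mean is closest."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for features, label in zip(X, y):
            groups[label].append(features)
        # One centroid per label: the column-wise mean of that label's rows.
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, features):
        return min(self.centroids, key=lambda label: dist(features, self.centroids[label]))


class HierarchicalClassifier:
    """General classifier at the top, one specialized classifier per coarse group below."""

    def __init__(self, taxonomy):
        self.parent = {leaf: group for group, leaves in taxonomy.items() for leaf in leaves}

    def fit(self, X, y):
        coarse = [self.parent[label] for label in y]
        self.top = NearestCentroid().fit(X, coarse)
        self.leaf_models = {}
        for group in set(coarse):
            rows = [(x, label) for x, label in zip(X, y) if self.parent[label] == group]
            self.leaf_models[group] = NearestCentroid().fit(*zip(*rows))
        return self

    def predict_one(self, features):
        group = self.top.predict_one(features)                 # coarse decision first
        return self.leaf_models[group].predict_one(features)   # then the fine-grained one


# Toy 2-D features (hypothetical; real fragments would need byte-level features).
X_train = [[0, 0], [1, 0], [0, 5], [1, 5], [10, 0], [11, 0], [10, 5], [11, 5]]
y_train = ["html", "html", "csv", "csv", "jpg", "jpg", "png", "png"]
model = HierarchicalClassifier(TAXONOMY).fit(X_train, y_train)
print(model.predict_one([0.2, 4.6]))
```

One consequence of this design is that an error at the top level cannot be corrected further down, which is one reason the choice of taxonomy matters.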

Multiclass Kernel Function Evaluation

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.542-543.1438 ◽

2012 ◽

Vol 542-543 ◽

pp. 1438-1442

Author(s):

Ting Hua Wang ◽

Cai Yun Cai ◽

Yan Liao

Keyword(s):

Cross Validation ◽

Selection Criterion ◽

Feature Space ◽

Function Evaluation ◽

Support Vector ◽

Computationally Efficient ◽

Computational Overhead ◽

Vector Machines ◽

Validation Technique ◽

Fold Cross Validation

The kernel is a key component of support vector machines (SVMs) and other kernel methods. Based on the data distributions of classes in the feature space, this paper proposes a model selection criterion to evaluate the goodness of a kernel in multiclass classification scenarios. The criterion is computationally efficient and differentiable with respect to the kernel parameters. Compared with the k-fold cross-validation technique, which is often regarded as a benchmark, the criterion is found to yield about the same performance with much less computational overhead.
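
The abstract does not state the criterion itself, so the sketch below only illustrates the kind of quantity such a criterion can be built from: class statistics in the kernel-induced feature space, computed from kernel evaluations alone. The RBF kernel, the function names, and the toy data are assumptions made for illustration.

```python
from math import exp


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel; gamma is the kernel parameter being evaluated."""
    return exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mean_kernel(A, B, gamma):
    """Average kernel value over all pairs (a, b) with a in A and b in B."""
    return sum(rbf_kernel(a, b, gamma) for a in A for b in B) / (len(A) * len(B))


def centroid_distance_sq(A, B, gamma):
    """Squared distance between the two class means in feature space.

    ||m_A - m_B||^2 = mean K(A, A) - 2 * mean K(A, B) + mean K(B, B)
    """
    return mean_kernel(A, A, gamma) - 2 * mean_kernel(A, B, gamma) + mean_kernel(B, B, gamma)


def within_scatter(A, gamma):
    """Mean squared distance of the samples of one class to their feature-space mean."""
    mean_self = sum(rbf_kernel(a, a, gamma) for a in A) / len(A)
    return mean_self - mean_kernel(A, A, gamma)


# Hypothetical two-class toy data.
class_a = [[0.0, 0.0], [0.0, 1.0]]
class_b = [[5.0, 0.0], [5.0, 1.0]]
for gamma in (0.01, 0.1, 1.0):
    print(gamma, centroid_distance_sq(class_a, class_b, gamma), within_scatter(class_a, gamma))
```

Both quantities are smooth functions of gamma and need no classifier training, which is consistent with the computational advantage the abstract reports over repeated k-fold runs.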

Rice Yield Forecasting using Support Vector Machine

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d7236.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 2588-2593

Keyword(s):

Cross Validation ◽

Rice Yield ◽

Polynomial Kernel ◽

Support Vector ◽

Classification Models ◽

Average Increase ◽

Vector Machines ◽

Computing Support ◽

Multi Classification ◽

Fold Cross Validation

In the domain of soft computing, Support Vector Machines (SVMs) have acquired considerable significance. They are widely used for prediction, owing to their ability to generalize. This paper describes the development of SVM-based classification models for the prediction of rice yield in India. Experiments were conducted using the one-against-one multi-classification method, k-fold cross-validation, and a polynomial kernel function for SVM training. Rice production data for India were sourced from the Directorate of Economics and Statistics, Ministry of Agriculture, Government of India. The best prediction accuracy for the 4-year relative average increase was 75.06%, obtained using 4-fold cross-validation. MATLAB was used for the experiments in this work.
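
The one-against-one scheme with a polynomial kernel can be sketched as below. The yield classes, the one-dimensional toy features, and the kernel settings are hypothetical, and a kernel nearest-centroid rule stands in for each trained pairwise SVM; the study itself used MATLAB.

```python
from collections import Counter
from itertools import combinations


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel K(x, y) = (x . y + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def mean_similarity(x, samples):
    return sum(polynomial_kernel(x, s) for s in samples) / len(samples)


def self_similarity(samples):
    total = sum(polynomial_kernel(a, b) for a in samples for b in samples)
    return total / len(samples) ** 2


def pairwise_winner(x, label_a, samples_a, label_b, samples_b):
    """Binary decision between two classes (stand-in for one pairwise SVM).

    Compares feature-space distances to the two class means; the K(x, x)
    term is common to both distances and is therefore dropped.
    """
    score_a = self_similarity(samples_a) - 2 * mean_similarity(x, samples_a)
    score_b = self_similarity(samples_b) - 2 * mean_similarity(x, samples_b)
    return label_a if score_a <= score_b else label_b


def predict_one_vs_one(x, samples_by_class):
    """Run one binary decision per pair of classes and return the majority vote."""
    votes = Counter(
        pairwise_winner(x, a, samples_by_class[a], b, samples_by_class[b])
        for a, b in combinations(sorted(samples_by_class), 2)
    )
    return votes.most_common(1)[0][0]


# Hypothetical training samples for three yield classes.
training = {
    "low": [[1.0], [1.5]],
    "medium": [[5.0], [5.5]],
    "high": [[9.0], [9.5]],
}
print(predict_one_vs_one([5.2], training))
```

With k classes the scheme needs k(k-1)/2 pairwise classifiers, so three classes require three binary decisions per prediction.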

Music Performers Classification by Using Multifractal Features: A Case Study

Archives of Acoustics ◽

10.1515/aoa-2017-0025 ◽

2017 ◽

Vol 42 (2) ◽

pp. 223-233 ◽

Cited By ~ 1

Author(s):

Natasa Reljin ◽

David Pokrajac

Keyword(s):

Cross Validation ◽

Classification Performance ◽

Support Vector ◽

Mel Frequency Cepstral Coefficients ◽

Vector Machines ◽

Characteristic Points ◽

Fold Cross Validation ◽

F Measure ◽

Better Than

In this paper, we investigate whether different performers can be distinguished when they play the same melodies in the same manner, producing recordings that are subjectively very similar and difficult to tell apart even for musically skilled listeners. To address this problem, we propose the use of multifractal (MF) analysis, which has proven to be an efficient method for describing and quantifying complex natural structures, phenomena, and signals. We found experimentally that parameters associated with certain characteristic points of the MF spectrum can be used as music descriptors, permitting accurate discrimination of music performers. Our approach is tested on a dataset containing the same songs performed by the music group ABBA and by the actors in the movie Mamma Mia. Support vector machines were used as the classifier, and classification performance was evaluated using four-fold cross-validation. The results of the proposed method were compared with those obtained using mel-frequency cepstral coefficients (MFCCs) as descriptors. For the two-class problem considered, the MF descriptors yield an overall accuracy and F-measure higher than 98%, considerably better than the MFCC descriptors, whose best results were below 77%.

iATC-NRAKEL: an efficient multi-label classifier for recognizing anatomical therapeutic chemical classes of drugs

Bioinformatics ◽

10.1093/bioinformatics/btz757 ◽

2019 ◽

Cited By ~ 15

Author(s):

Jian-Peng Zhou ◽

Lei Chen ◽

Zi-Han Guo

Keyword(s):

Cross Validation ◽

Drug Repositioning ◽

Anatomical Therapeutic Chemical ◽

Support Vector ◽

Correct Identification ◽

Network Embedding ◽

Satisfactory Performance ◽

Comparison Results ◽

Essential Problem ◽

Fold Cross Validation

Motivation: The anatomical therapeutic chemical (ATC) classification system plays an increasingly important role in drug repositioning and discovery. Correctly identifying the classes, at each level of the system, to which a given drug may belong is an essential problem. Several multi-label classifiers have been proposed for this purpose. Although they provided satisfactory performance, their feature extraction procedures were still rough, and more refined features may further improve prediction quality. Results: In this article, we provide a novel multi-label classifier, called iATC-NRAKEL, to predict the first-level ATC classes of drugs. To obtain more informative drug features, we employed the drug association information in STITCH and KEGG, organized as seven drug networks. The network embedding algorithm Mashup was adopted to extract informative drug features. The obtained features were fed into the RAndom k-labELsets (RAKEL) algorithm, with a support vector machine as the base classification algorithm, to construct the classifier. Ten-fold cross-validation on the benchmark dataset of 3883 drugs showed that the accuracy and absolute true were 76.56% and 74.51%, respectively. The comparison results indicated that iATC-NRAKEL was much superior to all previously reported classifiers. Finally, the contribution of each network was analyzed. Availability and implementation: The code of iATC-NRAKEL is available at https://github.com/zhou256/iATC-NRAKEL.
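
The two reported measures can be illustrated with a short sketch. It assumes the definitions commonly used for multi-label evaluation in this area: accuracy is the average ratio of shared labels to all labels in the union of the true and predicted sets, and absolute true is the fraction of samples whose predicted set matches the true set exactly. The abstract does not spell these definitions out, and the label sets below are invented.

```python
def multilabel_accuracy(true_sets, predicted_sets):
    """Average of |true & predicted| / |true | predicted| over all samples."""
    ratios = []
    for true, predicted in zip(true_sets, predicted_sets):
        union = true | predicted
        # Two empty label sets agree completely, so count them as a perfect match.
        ratios.append(len(true & predicted) / len(union) if union else 1.0)
    return sum(ratios) / len(ratios)


def absolute_true(true_sets, predicted_sets):
    """Fraction of samples whose predicted label set equals the true label set."""
    exact = sum(1 for true, predicted in zip(true_sets, predicted_sets) if true == predicted)
    return exact / len(true_sets)


# Hypothetical first-level ATC labels for three drugs.
true_labels = [{"A"}, {"A", "B"}, {"C"}]
predicted_labels = [{"A"}, {"A"}, {"D"}]
print(multilabel_accuracy(true_labels, predicted_labels))
print(absolute_true(true_labels, predicted_labels))
```

Absolute true is the stricter measure: the second drug above contributes 0.5 to accuracy but nothing to absolute true, because one of its two labels was missed.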

Klasifikasi Citra Daun dengan Metode Gabor Co-Occurence

Jurnal ULTIMA Computing ◽

10.31937/sk.v7i2.231 ◽

2016 ◽

Vol 7 (2) ◽

pp. 39-47

Author(s):

Mutmainnah Muchtar ◽

Laili Cahyani

Keyword(s):

Cross Validation ◽

Unique Feature ◽

Gabor Filter ◽

Processing Technique ◽

Image Processing Technique ◽

Support Vector ◽

Filter Methods ◽

Average Accuracy ◽

Crucial Part ◽

Fold Cross Validation

Plants play a crucial part in human existence. The development of digital image processing techniques has made plant classification much easier. The leaf is a part of the plant that can be used for classification, and leaf texture is a commonly used feature. Texture offers a distinctive feature and remains usable even when the leaf is damaged or very large, conditions that can make image acquisition more difficult. This study combines Gabor filter methods with co-occurrence matrices to produce the most representative features for leaf classification. Classification using an SVM with 5-fold cross-validation shows that the proposed Gabor Co-Occurence method reaches an average accuracy of up to 89.83%. Index terms: Leaf, Gabor Co-occurence, Support Vector Machine, Texture
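
The co-occurrence step can be sketched on its own. The snippet below builds a gray-level co-occurrence matrix for one pixel offset and derives the contrast feature from it; the tiny three-level image is hypothetical, and the Gabor filtering that the study applies beforehand is not reproduced here.

```python
def cooccurrence_matrix(image, levels, dx=1, dy=0):
    """Count how often gray level i has gray level j at offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    for row in range(height):
        for col in range(width):
            r, c = row + dy, col + dx
            if 0 <= r < height and 0 <= c < width:
                matrix[image[row][col]][image[r][c]] += 1
    return matrix


def contrast(matrix):
    """Contrast feature: sum of (i - j)^2 weighted by the pair probability p(i, j)."""
    total = sum(sum(row) for row in matrix)
    return sum(
        (i - j) ** 2 * count / total
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )


# Hypothetical image already quantized to three gray levels (0, 1, 2).
image = [
    [0, 0, 1],
    [1, 2, 2],
    [0, 1, 2],
]
glcm = cooccurrence_matrix(image, levels=3)  # horizontal neighbor, one pixel to the right
print(glcm, contrast(glcm))
```

In a texture pipeline, features such as contrast are typically computed for several offsets and concatenated into the vector passed to the classifier.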

Special Issue on Using Machine Learning Algorithms in the Prediction of Kyphosis Disease: A Comparative Study

Applied Sciences ◽

10.3390/app9163322 ◽

2019 ◽

Vol 9 (16) ◽

pp. 3322 ◽

Cited By ~ 2

Author(s):

Stephen Dankwa ◽

Wenfeng Zheng

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Cross Validation ◽

Machine Learning Algorithms ◽

Support Vector ◽

Grid Search ◽

Baseline Model ◽

Vector Machines ◽

Ann Models ◽

Fold Cross Validation

Machine learning (ML) is the technology that allows a computer system to learn from its environment through iterative processes and to improve itself from experience. Recently, machine learning has gained massive attention across numerous fields, and it makes it possible to model data well without relying on strong assumptions about the modeled system. The rise of machine learning has been shown to describe data better by providing both engineering solutions and an important benchmark. In this work, we applied three machine learning algorithms, Random Forest (RF), Support Vector Machines (SVM), and Artificial Neural Network (ANN), to predict kyphosis disease from biomedical data. In the initial stage of the experiments, we performed 5- and 10-fold cross-validation using Logistic Regression as a baseline model, for comparison with our ML models without grid search. We then evaluated the models and compared their performance using 5- and 10-fold cross-validation after running grid search on the ML models. For the Support Vector Machines, we experimented with three kernels: linear, radial basis function (RBF), and polynomial. After running grid search, we observed overall model accuracies of 79%–85% with 5-fold and 77%–86% with 10-fold cross-validation. Under both 5- and 10-fold cross-validation, the RF, SVM-RBF, and ANN models achieved accuracies above 80%. The RF, SVM-RBF, and ANN models outperformed the baseline model under 10-fold cross-validation with grid search. Overall, in terms of accuracy, the ANN model outperformed all the other ML models, achieving 85.19% and 86.42% under 5- and 10-fold cross-validation, respectively. We propose that the RF, SVM-RBF, and ANN models be used to detect and predict kyphosis disease after a patient has undergone surgery. We suggest that machine learning be adopted as an essential tool across the widest possible range of biomedical questions.
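
The combination of k-fold cross-validation and grid search can be sketched as follows. To keep the example self-contained, a k-nearest-neighbour classifier stands in for the study's RF, SVM, and ANN models, the folds are contiguous rather than shuffled, and the data and parameter grid are hypothetical.

```python
from collections import Counter
from math import dist


def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0) for i in range(n_folds)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def knn_predict(train_X, train_y, x, k):
    """Majority label among the k training points closest to x."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cv_accuracy(X, y, n_folds, k):
    """Mean accuracy over the held-out folds for one parameter value."""
    scores = []
    for train_idx, test_idx in kfold_indices(len(X), n_folds):
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def grid_search(X, y, n_folds, k_values):
    """Score every candidate with cross-validation and return the best one."""
    results = {k: cv_accuracy(X, y, n_folds, k) for k in k_values}
    return max(results, key=results.get), results


# Hypothetical, interleaved two-class data so every contiguous fold sees both classes.
X = [[0.0], [10.0], [0.5], [10.5], [1.0], [11.0], [1.5], [11.5], [0.2], [10.2]]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
best_k, results = grid_search(X, y, n_folds=5, k_values=[1, 3])
print(best_k, results)
```

Each candidate is scored only on samples it was not trained on, which is why the abstract can compare models fairly before and after tuning.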

KOMPARASI ALGORITMA NAIVE BAYES DAN SUPPORT VECTOR MACHINE UNTUK ANALISA SENTIMEN REVIEW FILM

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v14i2.918 ◽

2018 ◽

Vol 14 (2) ◽

pp. 175

Author(s):

Elly Indrayuni

Keyword(s):

Support Vector Machine ◽

Support Vector Machines ◽

Cross Validation ◽

Opinion Mining ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Support Vector ◽

Vector Machines ◽

Fold Cross Validation

Film is a subject of interest to a large number of people in social network communities, whose opinions or sentiments differ significantly. Sentiment analysis, or opinion mining, is one solution to the problem of automatically grouping opinions or reviews into positive or negative. The techniques used in this study are Naive Bayes and Support Vector Machines (SVM). Naive Bayes has the advantages of being simple, fast, and highly accurate, while SVM is able to identify a separating hyperplane that maximizes the margin between two different classes. The sentiment classification results in this study consist of two class labels, positive and negative. The resulting accuracy values serve as the benchmark for finding the best model for sentiment classification. Evaluation was performed using 10-fold cross-validation. Accuracy was measured with a confusion matrix and the ROC curve. The results show an accuracy of 84.50% for the Naive Bayes algorithm, while the accuracy of the Support Vector Machine (SVM) algorithm is higher, at 90.00%.
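
For a two-class problem such as this one, accuracy and the related measures follow directly from the four cells of the confusion matrix. The sketch below uses invented counts, not the study's results.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from a 2x2 confusion matrix.

    tp/fn: positive reviews classified as positive/negative.
    fp/tn: negative reviews classified as positive/negative.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for 100 reviews.
metrics = confusion_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```

Accuracy alone can hide an imbalance between the two error types, which is why precision, recall, or an ROC curve are often reported alongside it.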

DECODING VISUAL COVERT SELECTIVE SPATIAL ATTENTION BASED ON MAGNETOENCEPHALOGRAPHY SIGNALS

Biomedical Engineering Applications Basis and Communications ◽

10.4015/s1016237219500030 ◽

2019 ◽

Vol 31 (01) ◽

pp. 1950003

Author(s):

Seyyed Abed Hosseini

Keyword(s):

Spatial Attention ◽

Cognitive Process ◽

Hybrid Approach ◽

Support Vector ◽

Surface Laplacian ◽

Average Accuracy ◽

Combined Use ◽

Vector Machines ◽

Comparison Of The Results ◽

Fold Cross Validation

This paper proposes a hybrid approach for inferring the target of visual covert selective spatial attention (VCSSA) from magnetoencephalography (MEG) signals. The MEG signal offers higher spatial resolution and lower distortion than competing brain signals such as electroencephalography. The proposed approach consists of removing global redundant patterns of the MEG channels with the surface Laplacian; feature extraction using the Hurst exponent (H), 6th-order Morlet coefficients (MCs), and the Petrosian fractal dimension (PFD); standardization; feature ranking by statistical analysis; and classification by support vector machines (SVM). The results indicate that the combined use of these elements can effectively decipher the cognitive process of VCSSA. In particular, using four-fold cross-validation, the proposed approach robustly predicts the location of the attended stimulus, with an accuracy of up to 92.41% for distinguishing left from right. The results show that the fusion of wavelet coefficients and non-linear features is more robust than the approaches of previous studies. The results also indicate that VCSSA involves widespread functional brain activity, affecting more regions than the temporal and parietal circuits. Finally, a comparison with six competing strategies on the same data indicates that the proposed approach obtains a slightly higher average accuracy.
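
Of the features listed, the Petrosian fractal dimension is simple enough to sketch directly. The snippet assumes the usual definition, PFD = log10(N) / (log10(N) + log10(N / (N + 0.4 * N_delta))), where N is the signal length and N_delta is the number of sign changes in the first difference of the signal; the abstract does not state which variant was used, and the test signals are synthetic.

```python
from math import log10


def petrosian_fd(signal):
    """Petrosian fractal dimension of a 1-D sequence of samples."""
    n = len(signal)
    if n < 3:
        raise ValueError("need at least three samples")
    # First difference, then count how often its sign flips.
    diffs = [b - a for a, b in zip(signal, signal[1:])]
    sign_changes = sum(1 for d0, d1 in zip(diffs, diffs[1:]) if d0 * d1 < 0)
    return log10(n) / (log10(n) + log10(n / (n + 0.4 * sign_changes)))


ramp = list(range(10))        # monotonic, so no sign changes
zigzag = [0, 1] * 5           # the difference changes sign at every step
print(petrosian_fd(ramp), petrosian_fd(zigzag))
```

A smooth signal gives a value of exactly 1, and the value grows as the signal oscillates more, which is what makes it usable as a cheap complexity feature per channel.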

Combination of Support Vector Machine and K-Fold cross-validation for prediction of long-term degradation of the compressive strength of marine concrete

International Journal of Computational Physics Series ◽

10.29167/a1i1p120-130 ◽

2018 ◽

Vol 1 (1) ◽

pp. 120-130 ◽

Cited By ~ 1

Author(s):

Chunxiang Qian ◽

Wence Kang ◽

Hao Ling ◽

Hua Dong ◽

Chengyao Liang ◽

...

Keyword(s):

Support Vector Machine ◽

Environmental Factors ◽

Cross Validation ◽

Concrete Strength ◽

Simulation Method ◽

Support Vector ◽

Svm Model ◽

Artificial Neural Network Ann ◽

Influence Degree ◽

Fold Cross Validation

A Support Vector Machine (SVM) model optimized by K-fold cross-validation was built to predict and evaluate the degradation of concrete strength in a complicated marine environment. Several other models, such as an Artificial Neural Network (ANN) and a Decision Tree (DT), were also built and compared with the SVM to determine which could make the most accurate predictions. Both the material factors and the environmental factors that influence the results were considered. The material factors mainly comprised the original concrete strength and the amount of cement replaced by fly ash and slag. The environmental factors consisted of the concentrations of Mg²⁺, SO₄²⁻, and Cl⁻, the temperature, and the exposure time. The prediction results showed that the optimized SVM model performed better than the other models in predicting concrete strength. Based on the SVM model, a simulation method that varies one factor at a time was used to determine the sensitivity of the various factors and their degree of influence on the degradation of concrete strength.

A Novel Method for Gender and Age Detection Based on EEG Brain Signals

The International Arab Journal of Information Technology ◽

10.34028/iajit/18/5/10 ◽

2021 ◽

Vol 18 (5) ◽

Author(s):

Haitham Issa ◽

Sali Issa ◽

Wahab Shah

Keyword(s):

Cross Validation ◽

Image Feature ◽

Emotional States ◽

Time Frequency ◽

Brain Signals ◽

Average Accuracy ◽

Gender And Age ◽

Novel Method ◽

Fold Cross Validation ◽

Validation Strategy

This paper presents a new gender and age classification system based on electroencephalography (EEG) brain signals. First, the Continuous Wavelet Transform (CWT) is used to obtain the time-frequency information of a single EEG electrode for eight distinct emotional states, instead of the usual neutral or relaxed states. Then, a sequence of steps is applied to extract an improved grayscale image feature. For system evaluation, a three-fold cross-validation strategy is applied to construct four different classifiers. The experimental tests show that the proposed feature combined with a Convolutional Neural Network (CNN) classifier improves the performance of both gender and age classification, achieving average accuracies of 96.3% and 89%, respectively. Moreover, the results demonstrate in practice that gender and age can be predicted across different emotional states.