scholarly journals Hierarchy-Based File Fragment Classification

2020 ◽  
Vol 2 (3) ◽  
pp. 216-232
Author(s):  
Manish Bhatt ◽  
Avdesh Mishra ◽  
Md Wasi Ul Kabir ◽  
S. E. Blake-Gatto ◽  
Rishav Rajendra ◽  
...  

File fragment classification is an essential problem in digital forensics. Although several attempts had been made to solve this challenging problem, a general solution has not been found. In this work, we propose a hierarchical machine-learning-based approach with optimized support vector machines (SVM) as the base classifiers for file fragment classification. This approach consists of more general classifiers at the top level and more specialized fine-grain classifiers at the lower levels of the hierarchy. We also propose a primitive taxonomy for file types that can be used to perform hierarchical classification. We evaluate our model with a dataset of 14 file types, with 1000 fragments measuring 512 bytes from each file type derived from a subset of the publicly available Digital Corpora, the govdocs1 corpus. Our experiment shows comparable results to the present literature, with an average accuracy of 67.78% and an F1-measure of 65% using 10-fold cross-validation. We then improve on the hierarchy and find better results, with an increase in the F1-measure of 1%. Finally, we make our assessment and observations, then conclude the paper by discussing the scope of future research.

2012 ◽  
Vol 542-543 ◽  
pp. 1438-1442
Author(s):  
Ting Hua Wang ◽  
Cai Yun Cai ◽  
Yan Liao

Kernel is a key component of the support vector machines (SVMs) and other kernel methods. Based on the data distributions of classes in the feature space, this paper proposed a model selection criterion to evaluate the goodness of a kernel in multiclass classification scenario. This criterion is computationally efficient and is differentiable with respect to the kernel parameters. Compared with the k-fold cross validation technique which is often regarded as a benchmark, this criterion is found to yield about the same performance with much less computational overhead.


2019 ◽  
Vol 8 (4) ◽  
pp. 2588-2593

In the domain of Soft Computing, Support Vector Machines (SVMs) have acquired considerable significance. These are widely used in making predictions, owing to their ability of generalization. This paper is about the development of SVM based classification models for the prediction of rice yield in India. Experiments have been conducted involving oneagainst-one multi classification method, k-fold cross validation and polynomial kernel function for SVM training. Rice production data of India has been sourced from Directorate of Economics and Statistics, Ministry of Agriculture, Government of India, for this work. The best prediction accuracy for the 4- year relative average increase has been achieved as 75.06% using 4-fold cross validation method. MATLAB software has been used for experimentation in this work.


2017 ◽  
Vol 42 (2) ◽  
pp. 223-233 ◽  
Author(s):  
Natasa Reljin ◽  
David Pokrajac

Abstract In this paper, we investigated the possibility to classify different performers playing the same melodies at the same manner being subjectively quite similar and very difficult to distinguish even for musically skilled persons. For resolving this problem we propose the use of multifractal (MF) analysis, which is proven as an efficient method for describing and quantifying complex natural structures, phenomena or signals. We found experimentally that parameters associated to some characteristic points within the MF spectrum can be used as music descriptors, thus permitting accurate discrimination of music performers. Our approach is tested on the dataset containing the same songs performed by music group ABBA and by actors in the movie Mamma Mia. As a classifier we used the support vector machines and the classification performance was evaluated by using the four-fold cross-validation. The results of proposed method were compared with those obtained using mel-frequency cepstral coefficients (MFCCs) as descriptors. For the considered two-class problem, the overall accuracy and F-measure higher than 98% are obtained with the MF descriptors, which was considerably better than by using the MFCC descriptors when the best results were less than 77%.


Author(s):  
Jian-Peng Zhou ◽  
Lei Chen ◽  
Zi-Han Guo

Abstract Motivation The anatomical therapeutic chemical (ATC) classification system plays an increasingly important role in drug repositioning and discovery. The correct identification of classes in each level of such system that a given drug may belong to is an essential problem. Several multi-label classifiers have been proposed in this regard. Although they provided satisfactory performance, the feature extraction procedures were still rough. More refined features may further improve the predicted quality. Results In this article, we provide a novel multi-label classifier, called iATC-NRAKEL, to predict drug ATC classes in the first level. To obtain more informative drug features, we employed the drug association information in STITCH and KEGG, which was organized by seven drug networks. The powerful network embedding algorithm, Mashup, was adopted to extract informative drug features. The obtained features were fed into the RAndom k-labELsets (RAKEL) algorithm with support vector machine as the basic classification algorithm to construct the classifier. The 10-fold cross-validation of the benchmark dataset with 3883 drugs showed that the accuracy and absolute true were 76.56 and 74.51%, respectively. The comparison results indicated that iATC-NRAKEL was much superior to all previous reported classifiers. Finally, the contribution of each network was analyzed. Availability and implementation The codes of iATC-NRAKEL are available at https://github.com/zhou256/iATC-NRAKEL.


2016 ◽  
Vol 7 (2) ◽  
pp. 39-47
Author(s):  
Mutmainnah Muchtar ◽  
Laili Cahyani

Plant takes a crucial part in mankind existences. The development of digital image processing technique made the plant classification task become a lot of easier. Leaf is a part of plant that can be used for plant classification where texture of the leaf is a common feature that been used for classification process. Texture offers a unique feature and able to work even when the leaf is damaged or overly big in size which sometimes made the acquisition process become more difficult. This study offers a combination of Gabor filter methods and co-occurrence matrices to produce the most representative features for leaf classification. Classification using SVM with 5-fold cross validation system shows that the proposed Gabor Co-Occurence methods was able to reach average accuracy up to 89.83%. Terms: Leaf, Gabor Co-occurence, Support Vector Machine, Texture


2019 ◽  
Vol 9 (16) ◽  
pp. 3322 ◽  
Author(s):  
Stephen Dankwa ◽  
Wenfeng Zheng

Machine learning (ML) is the technology that allows a computer system to learn from the environment, through re-iterative processes, and improve itself from experience. Recently, machine learning has gained massive attention across numerous fields, and is making it easy to model data extremely well, without the importance of using strong assumptions about the modeled system. The rise of machine learning has proven to better describe data as a result of providing both engineering solutions and an important benchmark. Therefore, in this current research work, we applied three different machine learning algorithms, which were, the Random Forest (RF), Support Vector Machines (SVM), and Artificial Neural Network (ANN) to predict kyphosis disease based on a biomedical data. At the initial stage of the experiments, we performed 5- and 10-Fold Cross-Validation using Logistic Regression as a baseline model to compare with our ML models without performing grid search. We then evaluated the models and compared their performances based on 5- and 10-Fold Cross-Validation after running grid search algorithms on the ML models. Among the Support Vector Machines, we experimented with the three kernels (Linear, Radial Basis Function (RBF), Polynomial). We observed overall accuracies of the models between 79%–85%, and 77%–86% based on the 5- and 10-Fold Cross-Validation, after running grid search respectively. Based on the 5- and 10-Fold Cross-Validation as evaluation metrics, the RF, SVM-RBF, and ANN models achieved accuracies more than 80%. The RF, SVM-RBF and ANN models outperformed the baseline model based on the 10-Fold Cross-Validation with grid search. Overall, in terms of accuracies, the ANN model outperformed all the other ML models, achieving 85.19% and 86.42% based on the 5- and 10-Fold Cross-Validation. We proposed that RF, SVM-RBF and ANN models should be used to detect and predict kyphosis disease after a patient had undergone surgery or operation. We suggest that machine learning should be adopted and used as an essential and critical tool across the maximum spectrum of answering biomedical questions.


2018 ◽  
Vol 14 (2) ◽  
pp. 175
Author(s):  
Elly Indrayuni

Film merupakan subjek yang diminati oleh sejumlah besar orang diantara komunitas jaringan sosial yang memiliki perbedaan signifikan dalam pendapat atau sentimen mereka. Analisa sentimen atau opinion mining merupakan salah satu solusi mengatasi masalah untuk mengelompokan opini atau review menjadi opini positif atau negatif secara otomatis. Teknik yang digunakan dalam penelitian ini adalah Naive Bayes dan Support Vector Machines (SVM). Naive Bayes memiliki kelebihan yaitu sederhana, cepat dan memiliki akurasi yang tinggi. Sedangkan SVM  mampu mengidentifikasi hyperplane terpisah yang memaksimalkan margin antara dua kelas yang berbeda. Hasil klasifikasi sentimen pada penelitian ini terdiri dari dua label class, yaitu positif dan negatif. Nilai akurasi yang dihasilkan akan menjadi tolak  ukur untuk mencari model pengujian terbaik untuk kasus klasifikasi sentimen. Evaluasi dilakukan menggunakan 10 fold cross validation. Pengukuran akurasi diukur dengan confusion matrix dan kurva ROC. Hasil penelitian menunjukkan nilai akurasi untuk algoritma Naive Bayes sebesar 84.50%. Sedangkan nilai akurasi algoritma Support Vector Machine (SVM) lebih besar dari Naive Bayes yaitu sebesar 90.00%.


2019 ◽  
Vol 31 (01) ◽  
pp. 1950003
Author(s):  
Seyyed Abed Hosseini

This paper proposes a hybrid approach for inferring the target of visual covert selective spatial attention (VCSSA) from magnetoencephalography (MEG) signals. The MEG signal offers a higher spatial resolution and a lower distortion as compared with their competing brain signaling techniques, such as the electroencephalography signal. The proposed approach consists of removing global redundant patterns of MEG channels by surface Laplacian, feature extraction by Hurst exponent (H), 6th order Morlet coefficients (MCs), and Petrosian fractal dimension (PFD), standardization, feature ranking by statistical analysis, and classification by support vector machines (SVM). The results indicate that the combined use of the above elements can effectively decipher the cognitive process of VCSSA. In particular, using four-fold cross-validation, the proposed approach robustly predicts the location of the attended stimulus with an accuracy of up to 92.41% for distinguishing left from right. The results show that the fusion among wavelet coefficients and non-linear features is more robust than in other previous studies. The results also indicate that the VCSSA involves widespread functional brain activities, affecting more regions than temporal and parietal circuits. Finally, the comparison of the results with six other competing strategies indicates that a slightly higher average accuracy is obtained by the proposed approach on the same data.


2018 ◽  
Vol 1 (1) ◽  
pp. 120-130 ◽  
Author(s):  
Chunxiang Qian ◽  
Wence Kang ◽  
Hao Ling ◽  
Hua Dong ◽  
Chengyao Liang ◽  
...  

Support Vector Machine (SVM) model optimized by K-Fold cross-validation was built to predict and evaluate the degradation of concrete strength in a complicated marine environment. Meanwhile, several mathematical models, such as Artificial Neural Network (ANN) and Decision Tree (DT), were also built and compared with SVM to determine which one could make the most accurate predictions. The material factors and environmental factors that influence the results were considered. The materials factors mainly involved the original concrete strength, the amount of cement replaced by fly ash and slag. The environmental factors consisted of the concentration of Mg2+, SO42-, Cl-, temperature and exposing time. It was concluded from the prediction results that the optimized SVM model appeared to perform better than other models in predicting the concrete strength. Based on SVM model, a simulation method of variables limitation was used to determine the sensitivity of various factors and the influence degree of these factors on the degradation of concrete strength.


Author(s):  
Haitham Issa ◽  
Sali Issa ◽  
Wahab Shah

This paper presents a new gender and age classification system based on Electroencephalography (EEG) brain signals. First, Continuous Wavelet Transform (CWT) technique is used to get the time-frequency information of only one EEG electrode for eight distinct emotional states instead of the ordinary neutral or relax states. Then, sequential steps are implemented to extract the improved grayscale image feature. For system evaluation, a three-fold-cross validation strategy is applied to construct four different classifiers. The experimental test shows that the proposed extracted feature with Convolutional Neural Network (CNN) classifier improves the performance of both gender and age classification, and achieves an average accuracy of 96.3% and 89% for gender and age classification, respectively. Moreover, the ability to predict human gender and age during the mood of different emotional states is practically approved.


Sign in / Sign up

Export Citation Format

Share Document