Commercial Video Evaluation via Low-Level Feature Extraction and Selection

To discover the influence of the commercial videos’ low-level features on the popularity of the videos, the feature selection method should be used to get the video features influencing the videos’ evaluation mostly after analyzing the source data and the audiences’ evaluations of the videos. After extracting the low-level features of the videos, this paper improved the Correlation-Based Feature Selection (CFS) method which is widely used and proposed an algorithm named CFS-Spearmen which combined the Spearmen correlation coefficient and the classical CFS to select features. The 4 datasets in UCI machine learning database were employed as the experiment data. The experiment results were compared with the results using traditional CFS, Minimum Redundancy and Maximum Relevance (mRMR). The SVM was used to test the method in this paper. Finally, the proposed method was used in commercial videos’ feature selection and the most influential feature set was obtained.

Download Full-text

Implementasi teknik seleksi fitur pada klasifikasi malware Android menggunakan support vector machine (SVM)

Repositor ◽

10.22219/repositor.v1i1.1 ◽

2019 ◽

Vol 1 (1) ◽

pp. 1

Author(s):

Hendra Saputra ◽

Setio Basuki ◽

Mahar Faiqurahman

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Chi Square ◽

Android Malware ◽

Correlation Based Feature Selection ◽

Selection Of

AbstrakPertumbuhan Malware Android telah meningkat secara signifikan seiring dengan majunya jaman dan meninggkatnya keragaman teknik dalam pengembangan Android. Teknik Machine Learning adalah metode yang saat ini bisa kita gunakan dalam memodelkan pola fitur statis dan dinamis dari Malware Android. Dalam tingkat keakurasian dari klasifikasi jenis Malware peneliti menghubungkan antara fitur aplikasi dengan fitur yang dibutuhkan dari setiap jenis kategori Malware. Kategori jenis Malware yang digunakan merupakan jenis Malware yang banyak beredar saat ini. Untuk mengklasifikasi jenis Malware pada penelitian ini digunakan Support Vector Machine (SVM). Jenis SVM yang akan digunakan adalah class SVM one against one menggunakan Kernel RBF. Fitur yang akan dipakai dalam klasifikasi ini adalah Permission dan Broadcast Receiver. Untuk meningkatkan akurasi dari hasil klasifikasi pada penelitian ini digunakan metode Seleksi Fitur. Seleksi Fitur yang digunakan ialah Correlation-based Feature Selection (CSF), Gain Ratio (GR) dan Chi-Square (CHI). Hasil dari Seleksi Fitur akan di evaluasi bersama dengan hasil yang tidak menggunakan Seleksi Fitur. Akurasi klasifikasi Seleksi Fitur CFS menghasilkan akurasi sebesar 90.83% , GR dan CHI sebesar 91.25% dan data yang tidak menggunakan Seleksi Fitur sebesar 91.67%. Hasil dari pengujian menunjukan bahwa Permission dan Broadcast Receiver bisa digunakan dalam mengklasifikasi jenis Malware, akan tetapi metode Seleksi Fitur yang digunakan mempunyai akurasi yang berada sedikit dibawah data yang tidak menggunakan Seleksi Fitur. Kata kunci: klasifikasi malware android, seleksi fitur, SVM dan multi class SVM one agains one Abstract Android Malware has growth significantly along with the advance of the times and the increasing variety of technique in the development of Android. Machine Learning technique is a method that now we can use in the modeling the pattern of a static and dynamic feature of Android Malware. In the level of accuracy of the Malware type classification, the researcher connect between the application feature with the feature required by each types of Malware category. The category of malware used is a type of Malware that many circulating today, to classify the type of Malware in this study used Support Vector Machine (SVM). The SVM type wiil be used is class SVM one against one using the RBF Kernel. The feature will be used in this classification are the Permission and Broadcast Receiver. To improve the accuracy of the classification result in this study used Feature Selection method. Selection of feature used are Correlation-based Feature Selection (CFS), Gain Ratio (GR) and Chi-Square (CHI). Result from Feature Selection will be evaluated together with result that not use Feature Selection. Accuracy Classification Feature Selection CFS result accuracy of 90.83%, GR and CHI of 91.25% and data that not use Feature Selection of 91.67%. The result of testing indicate that permission and broadcast receiver can be used in classyfing type of Malware, but the Feature Selection method that used have accuracy is a little below the data that are not using Feature Selection. Keywords: Classification Android Malware, Feature Selection, SVM and Multi Class SVM one against one

Download Full-text

Motor Imagery EEG Decoding Based on New Spatial-Frequency Feature and Hybrid Feature Selection Method

Mathematical Problems in Engineering ◽

10.1155/2022/2856818 ◽

2022 ◽

Vol 2022 ◽

pp. 1-12

Author(s):

Yuan Tang ◽

Zining Zhao ◽

Shaorong Zhang ◽

Zhi Li ◽

Yun Mo ◽

...

Keyword(s):

Feature Extraction ◽

Feature Selection ◽

Spatial Frequency ◽

Motor Imagery ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Fisher Score ◽

Feature Extraction And Selection ◽

Frequency Feature

Feature extraction and selection are important parts of motor imagery electroencephalogram (EEG) decoding and have always been the focus and difficulty of brain-computer interface (BCI) system research. In order to improve the accuracy of EEG decoding and reduce model training time, new feature extraction and selection methods are proposed in this paper. First, a new spatial-frequency feature extraction method is proposed. The original EEG signal is preprocessed, and then the common spatial pattern (CSP) is used for spatial filtering and dimensionality reduction. Finally, the filter bank method is used to decompose the spatially filtered signals into multiple frequency subbands, and the logarithmic band power feature of each frequency subband is extracted. Second, to select the subject-specific spatial-frequency features, a hybrid feature selection method based on the Fisher score and support vector machine (SVM) is proposed. The Fisher score of each feature is calculated, then a series of threshold parameters are set to generate different feature subsets, and finally, SVM and cross-validation are used to select the optimal feature subset. The effectiveness of the proposed method is validated using two sets of publicly available BCI competition data and a set of self-collected data. The total average accuracy of the three data sets achieved by the proposed method is 82.39%, which is 2.99% higher than the CSP method. The experimental results show that the proposed method has a better classification effect than the existing methods, and at the same time, feature extraction and feature selection time also have greater advantages.

Download Full-text

An Improved Machine Learning-Based Employees Attrition Prediction Framework with Emphasis on Feature Selection

Mathematics ◽

10.3390/math9111226 ◽

2021 ◽

Vol 9 (11) ◽

pp. 1226

Author(s):

Saeed Najafi-Zangeneh ◽

Naser Shams-Gharneh ◽

Ali Arjomandi-Nezhad ◽

Sarfaraz Hashemkhani Zolfani

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Standard Deviation ◽

Analytical Formula ◽

Feature Selection Method ◽

Selection Method ◽

Performance Measure ◽

Learning Approaches ◽

Training Costs ◽

Professional Employees

Companies always seek ways to make their professional employees stay with them to reduce extra recruiting and training costs. Predicting whether a particular employee may leave or not will help the company to make preventive decisions. Unlike physical systems, human resource problems cannot be described by a scientific-analytical formula. Therefore, machine learning approaches are the best tools for this aim. This paper presents a three-stage (pre-processing, processing, post-processing) framework for attrition prediction. An IBM HR dataset is chosen as the case study. Since there are several features in the dataset, the “max-out” feature selection method is proposed for dimension reduction in the pre-processing stage. This method is implemented for the IBM HR dataset. The coefficient of each feature in the logistic regression model shows the importance of the feature in attrition prediction. The results show improvement in the F1-score performance measure due to the “max-out” feature selection method. Finally, the validity of parameters is checked by training the model for multiple bootstrap datasets. Then, the average and standard deviation of parameters are analyzed to check the confidence value of the model’s parameters and their stability. The small standard deviation of parameters indicates that the model is stable and is more likely to generalize well.

Download Full-text

NIMG-46. RADIOGENOMIC FEATURES PREDICT CLINICALLY RELEVANT GENOME-WIDE ALTERATION SIGNATURES IN GLIOBLASTOMA

Neuro-Oncology ◽

10.1093/neuonc/noaa215.659 ◽

2020 ◽

Vol 22 (Supplement_2) ◽

pp. ii158-ii158

Author(s):

Nicholas Nuechterlein ◽

Beibin Li ◽

James Fink ◽

David Haynor ◽

Eric Holland ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Molecular Subtypes ◽

Feature Selection Method ◽

Selection Method ◽

Versus Group ◽

Mri Features ◽

Genome Wide ◽

Group 2 ◽

Group 1

Abstract BACKGROUND Previously, we have shown that combined whole-exome sequencing (WES) and genome-wide somatic copy number alteration (SCNA) information can separate IDH1/2-wildtype glioblastoma into two prognostic molecular subtypes (Group 1 and Group 2) and that these subtypes cannot be distinguished by epigenetic or clinical features. However, the potential for radiographic features to discriminate between these molecular subtypes has not been established. METHODS Radiogenomic features (n=35,400) were extracted from 46 multiparametric, pre-operative magnetic resonance imaging (MRI) of IDH1/2-wildtype glioblastoma patients from The Cancer Imaging Archive, all of whom have corresponding WES and SCNA data in The Cancer Genome Atlas. We developed a novel feature selection method that leverages the structure of extracted radiogenomic MRI features to mitigate the dimensionality challenge posed by the disparity between the number of features and patients in our cohort. Seven traditional machine learning classifiers were trained to distinguish Group 1 versus Group 2 using our feature selection method. Our feature selection was compared to lasso feature selection, recursive feature elimination, and variance thresholding. RESULTS We are able to classify Group 1 versus Group 2 glioblastomas with a cross-validated area under the curve (AUC) score of 0.82 using ridge logistic regression and our proposed feature selection method, which reduces the size of our feature set from 35,400 to 288. An interrogation of the selected features suggests that features describing contours in the T2 abnormality region on the FLAIR MRI modality may best distinguish these two groups from one another. CONCLUSIONS We successfully trained a machine learning model that allows for relevant targeted feature extraction from standard MRI to accurately predict molecularly-defined risk-stratifying IDH1/2-wildtype glioblastoma patient groups. This algorithm may be applied to future prospective studies to assess the utility of MRI as a surrogate for costly prognostic genomic studies.

Download Full-text

Radiogenomic modeling predicts survival-associated prognostic groups in glioblastoma

Neuro-Oncology Advances ◽

10.1093/noajnl/vdab004 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Nicholas Nuechterlein ◽

Beibin Li ◽

Abdullah Feroze ◽

Eric C Holland ◽

Linda Shapiro ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Molecular Subtypes ◽

Feature Selection Method ◽

Area Under The Curve ◽

Selection Method ◽

Recursive Feature Elimination ◽

Signal Abnormality ◽

Mri Features ◽

Mri Scans

Abstract Background Combined whole-exome sequencing (WES) and somatic copy number alteration (SCNA) information can separate isocitrate dehydrogenase (IDH)1/2-wildtype glioblastoma into two prognostic molecular subtypes, which cannot be distinguished by epigenetic or clinical features. The potential for radiographic features to discriminate between these molecular subtypes has yet to be established. Methods Radiologic features (n = 35 340) were extracted from 46 multisequence, pre-operative magnetic resonance imaging (MRI) scans of IDH1/2-wildtype glioblastoma patients from The Cancer Imaging Archive (TCIA), all of whom have corresponding WES/SCNA data. We developed a novel feature selection method that leverages the structure of extracted MRI features to mitigate the dimensionality challenge posed by the disparity between a large number of features and the limited patients in our cohort. Six traditional machine learning classifiers were trained to distinguish molecular subtypes using our feature selection method, which was compared to least absolute shrinkage and selection operator (LASSO) feature selection, recursive feature elimination, and variance thresholding. Results We were able to classify glioblastomas into two prognostic subgroups with a cross-validated area under the curve score of 0.80 (±0.03) using ridge logistic regression on the 15-dimensional principle component analysis (PCA) embedding of the features selected by our novel feature selection method. An interrogation of the selected features suggested that features describing contours in the T2 signal abnormality region on the T2-weighted fluid-attenuated inversion recovery (FLAIR) MRI sequence may best distinguish these two groups from one another. Conclusions We successfully trained a machine learning model that allows for relevant targeted feature extraction from standard MRI to accurately predict molecularly-defined risk-stratifying IDH1/2-wildtype glioblastoma patient groups.

Download Full-text