Molecular Diagnosis

Summary

Objectives: We discuss supervised classification techniques applied to medical diagnosis based on gene expression profiles. Our focus lies on strategies of adaptive model selection to avoid overfitting in high-dimensional spaces.

Methods: We introduce likelihood-based methods, classification trees, support vector machines and regularized binary regression. For regularization by dimension reduction, we describe feature selection methods: feature filtering, feature shrinkage and wrapper approaches. In small sample-size situations efficient methods of data re-use are needed to assess the predictive power of a model. We discuss two issues in using cross-validation: the difference between in-loop and out-of-loop feature selection, and estimating model parameters in nested-loop cross-validation.

Results: Gene selection does not reduce the dimensionality of the model. Tuning parameters enable adaptive model selection. The feature selection bias is a common pitfall in performance evaluation. Model selection and performance evaluation can be combined by nested-loop cross-validation.

Conclusions: Classification of microarrays is prone to overfitting. A rigorous and unbiased assessment of the predictive power of the model is a must.
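
The summary above contrasts in-loop with out-of-loop feature selection and recommends nested-loop cross-validation. The following is a minimal sketch of that idea, not the authors' implementation. The synthetic data, the mean-difference filter, the nearest-centroid classifier and all function names are assumptions made for illustration. The point it shows is that features are re-selected inside every training fold, and that the tuning parameter (here the number of kept features, k) is chosen by an inner loop that never sees the outer test fold.

```python
import random

def fold_indices(n, n_folds):
    """Split positions 0..n-1 into n_folds interleaved test folds."""
    return [list(range(i, n, n_folds)) for i in range(n_folds)]

def class_means(X, y, rows):
    """Per-class feature means computed on the given rows only."""
    means = {}
    for label in (0, 1):
        members = [X[i] for i in rows if y[i] == label]
        means[label] = [sum(col) / len(members) for col in zip(*members)]
    return means

def fit(X, y, rows, k):
    """Filter the k features with the largest class-mean gap, keep the centroids."""
    means = class_means(X, y, rows)
    scores = [abs(a - b) for a, b in zip(means[0], means[1])]
    keep = sorted(range(len(scores)), key=lambda f: -scores[f])[:k]
    return keep, means

def predict(model, x):
    """Assign x to the nearest class centroid, using the kept features only."""
    keep, means = model
    dist = {c: sum((x[f] - means[c][f]) ** 2 for f in keep) for c in (0, 1)}
    return min(dist, key=dist.get)

def cv_error(X, y, rows, k, n_folds):
    """Cross-validated error on `rows`; features are re-selected in every fold."""
    errors = 0
    for fold in fold_indices(len(rows), n_folds):
        test = [rows[j] for j in fold]
        train = [r for r in rows if r not in test]
        model = fit(X, y, train, k)
        errors += sum(predict(model, X[i]) != y[i] for i in test)
    return errors / len(rows)

def nested_cv(X, y, k_grid, n_outer=5, n_inner=4):
    """Outer loop estimates error; inner loop picks k from training data only."""
    rows = list(range(len(X)))
    errors, chosen = 0, []
    for fold in fold_indices(len(rows), n_outer):
        train = [r for r in rows if r not in fold]
        best_k = min(k_grid, key=lambda k: cv_error(X, y, train, k, n_inner))
        chosen.append(best_k)
        model = fit(X, y, train, best_k)
        errors += sum(predict(model, X[i]) != y[i] for i in fold)
    return errors / len(rows), chosen

# Synthetic stand-in for expression profiles: 40 samples, 200 features,
# of which only the first 3 carry class signal (an assumption for this demo).
random.seed(0)
n, p = 40, 200
y = [i % 2 for i in range(n)]
X = [[random.gauss(2.0 * y[i] if f < 3 else 0.0, 1.0) for f in range(p)]
     for i in range(n)]
error, chosen_k = nested_cv(X, y, k_grid=[1, 3, 10, 50])
print("nested-CV error:", error, "k chosen per outer fold:", chosen_k)
```

Selecting the features once on all 40 samples and only then cross-validating would correspond to the out-of-loop variant, which is the source of the feature selection bias mentioned in the results.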

A Personalized Predictive Framework for Multivariate Clinical Time Series via Adaptive Model Selection

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management - CIKM '17 ◽

10.1145/3132847.3132859 ◽

2017 ◽

Cited By ~ 3

Author(s):

Zitao Liu ◽

Milos Hauskrecht

Keyword(s):

Time Series ◽

Model Selection ◽

Adaptive Model ◽

Adaptive Model Selection

Federated learning using sparse-adaptive model selection for embedded edge computing

IEEE Access ◽

10.1109/access.2021.3137189 ◽

2021 ◽

pp. 1-1

Author(s):

Shan Ullah ◽

Deok-Hwan Kim

Keyword(s):

Model Selection ◽

Edge Computing ◽

Adaptive Model ◽

Adaptive Model Selection ◽

Selection For

Scrutinizing Attacks and Evaluating Performance Appraisal Parameters via Feature Selection in Intrusion Detection System

10.21203/rs.3.rs-748765/v1 ◽

2021 ◽

Author(s):

Navroop Kaur ◽

Meenakshi Bansal ◽

Sukhwinder Singh S

Keyword(s):

Feature Selection ◽

Performance Evaluation ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Denial Of Service ◽

Cyber Attacks ◽

Support Vector ◽

K Nearest Neighbor ◽

Evaluation Parameters

Abstract: Firewalls and antivirus packages alone are no longer sufficient to protect an organization from numerous cyber attacks. An intrusion detection system (IDS) is a crucial contributor to the success of an organization. An IDS is a software application responsible for scanning an organization's networks for suspicious activity and policy violations, and it ensures the secure and reliable functioning of the network within the organization. IDSs have undergone huge transformations since their origin in order to cope with advancing computer crime. The primary aim of an IDS has been to improve attack detection without degrading the performance of the network. The paper describes the different types of IDS and the functions they perform. The NSL-KDD dataset was used for training and testing. Seven prominent classifiers, LR (Logistic Regression), NB (Naïve Bayes), DT (Decision Tree), AB (AdaBoost), RF (Random Forest), kNN (k Nearest Neighbor), and SVM (Support Vector Machine), were studied along with their pros and cons, and feature selection was applied to improve the performance evaluation parameters (Accuracy, Precision, Recall, and F1 score). The paper presents a detailed flowchart and algorithm for performing feature selection using XGB (Extreme Gradient Boosting) for four categories of attack: DoS (Denial of Service), Probe, R2L (Remote to Local Attack), and U2R (User to Root Attack). The selected features were ranked by how often they occurred. The experiments were conducted at five different ratios: 60-40%, 70-30%, 90-10%, 50-50%, and 80-20%. Different classifiers scored best for different performance evaluation parameters at different ratios. NB achieved the best Accuracy and Recall values. DT and RF consistently performed with high accuracy. NB, SVM, and kNN achieved good F1 scores.
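
The four performance evaluation parameters named in this abstract follow directly from the confusion-matrix counts. The sketch below shows the standard definitions. The function name `binary_metrics` and the toy labels (1 for attack, 0 for normal traffic) are assumptions for illustration and are not data from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels: 4 attacks and 6 normal connections.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because accuracy counts true negatives while precision, recall and F1 do not, different classifiers can rank differently on each measure, which is consistent with the abstract's observation that no single classifier was best on all four.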

A new hybrid approach for feature selection and support vector machine model selection based on self-adaptive cohort intelligence

Expert Systems with Applications ◽

10.1016/j.eswa.2017.06.030 ◽

2017 ◽

Vol 88 ◽

pp. 118-131 ◽

Cited By ~ 23

Author(s):

Mohammed Aladeemy ◽

Salih Tutun ◽

Mohammad T. Khasawneh

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Model Selection ◽

Support Vector Machine Model ◽

Hybrid Approach ◽

Support Vector ◽

Machine Model ◽

Self Adaptive

Predictor Selection for Bacterial Vaginosis Diagnosis Using Decision Tree and Relief Algorithms

Applied Sciences ◽

10.3390/app10093291 ◽

2020 ◽

Vol 10 (9) ◽

pp. 3291

Author(s):

Jesús F. Pérez-Gómez ◽

Juana Canul-Reich ◽

José Hernández-Torruco ◽

Betania Hernández-Ocaña

Keyword(s):

Feature Selection ◽

Decision Tree ◽

Bacterial Vaginosis ◽

Cross Validation ◽

Performance Comparison ◽

Support Vector ◽

Ongoing Research ◽

Selection For ◽

Comparison Of The Results ◽

Fold Cross Validation

Requiring only a few relevant characteristics from patients when diagnosing bacterial vaginosis is highly useful for physicians, as it makes collecting the data less time consuming. The result would be a dataset of patients who can be diagnosed more accurately using only a subset of informative or relevant features than using the entire set of features. As such, this is a feature selection (FS) problem. In this work, decision tree and Relief algorithms were used as feature selectors. Experiments were conducted on a real dataset for bacterial vaginosis with 396 instances and 252 features/attributes. The dataset was obtained from universities located in Baltimore and Atlanta. The FS algorithms produced feature rankings, from which the top fifteen features formed a new dataset that was used as input for both support vector machine (SVM) and logistic regression (LR) algorithms for classification. For performance evaluation, averages of 30 runs of 10-fold cross-validation were reported, with balanced accuracy, sensitivity, and specificity as performance measures. Results obtained with the full feature set were compared against those obtained with the top fifteen features. The top-ranked attributes were similar to those reported in the literature. This study is part of ongoing research that is investigating a range of feature selection and classification methods.
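
The ranking step described in this abstract can be illustrated with a basic Relief sketch for two classes and numeric features. This is a simplified variant (absolute, range-normalised differences and a single nearest hit and miss), not the exact procedure used in the study. The function name `relief_weights` and the toy data are assumptions, and the toy keeps the top 2 of 5 features rather than the top fifteen of 252.

```python
import random

def relief_weights(X, y, n_samples, seed=0):
    """Basic Relief: reward features that differ on the nearest miss
    and penalise features that differ on the nearest hit."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    # Feature ranges normalise every per-feature difference to [0, 1].
    span = [max(r[f] for r in X) - min(r[f] for r in X) or 1.0
            for f in range(p)]

    def diff(f, a, b):
        return abs(a[f] - b[f]) / span[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(p))

    w = [0.0] * p
    for _ in range(n_samples):
        i = rng.randrange(n)
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(p):
            w[f] += (diff(f, X[i], X[miss]) - diff(f, X[i], X[hit])) / n_samples
    return w

# Toy data: feature 0 tracks the class label, features 1-4 are pure noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[y[i] + data_rng.gauss(0, 0.1)] + [data_rng.random() for _ in range(4)]
     for i in range(40)]

w = relief_weights(X, y, n_samples=40)
top2 = sorted(range(len(w)), key=lambda f: -w[f])[:2]
print("weights:", [round(v, 3) for v in w], "top features:", top2)
```

The reduced dataset would then contain only the columns listed in `top2`, which is the role the top fifteen features play in the abstract.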

FEATURE SELECTION FOR SUPPORT VECTOR MACHINES USING GENETIC ALGORITHMS

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213004001818 ◽

2004 ◽

Vol 13 (04) ◽

pp. 791-800 ◽

Cited By ~ 26

Author(s):

HOLGER FRÖHLICH ◽

OLIVIER CHAPELLE ◽

BERNHARD SCHÖLKOPF

Keyword(s):

Genetic Algorithms ◽

Feature Selection ◽

Support Vector Machines ◽

Cross Validation ◽

Support Vector ◽

Generalization Error ◽

New Approach ◽

Vector Machines ◽

Selection For ◽

Natural Way

Feature selection is a difficult combinatorial task in machine learning and is of high practical relevance, e.g. in bioinformatics. Genetic Algorithms (GAs) offer a natural way to solve this problem. In this paper we present a specialized Genetic Algorithm that explicitly takes into account the existing bounds on the generalization error for Support Vector Machines (SVMs). This new approach is compared to the traditional method of performing cross-validation and to other existing algorithms for feature selection.
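
To make the combinatorial framing concrete, here is a generic GA sketch in which each chromosome is a bit mask over the features. It is not the algorithm from the paper. In particular, the fitness used here is a hypothetical toy function that rewards a hidden set of relevant features and penalises subset size, whereas the paper scores subsets with bounds on the SVM generalization error. The names `ga_select`, `toy_fitness` and `RELEVANT` are all assumptions.

```python
import random

def ga_select(n_features, fitness, pop_size=30, generations=40,
              p_mut=0.05, seed=0):
    """Evolve feature masks with tournament selection, one-point
    crossover, bit-flip mutation and single-individual elitism."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        best = max(pop, key=fitness)
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            mum, dad = tournament(), tournament()
            cut = rng.randrange(1, n_features)
            child = mum[:cut] + dad[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# Hypothetical stand-in for a real subset score such as a
# cross-validation error or an SVM error bound.
RELEVANT = {0, 3, 7}

def toy_fitness(mask):
    gain = sum(1 for f in RELEVANT if mask[f])
    return gain - 0.2 * sum(mask)  # penalise large subsets

best_mask = ga_select(12, toy_fitness)
selected = [f for f, on in enumerate(best_mask) if on]
print("selected features:", selected, "fitness:", toy_fitness(best_mask))
```

In a real setting, `toy_fitness` would be replaced by whatever estimate of generalization performance is available, and that choice dominates the running time because the fitness is evaluated for every individual in every generation.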

AN EFFICIENT FEATURE SELECTION ALGORITHM FOR COMPUTER-AIDED POLYP DETECTION

International Journal of Artificial Intelligence Tools ◽

10.1142/s021821300600303x ◽

2006 ◽

Vol 15 (06) ◽

pp. 893-915 ◽

Cited By ~ 9

Author(s):

JIANG LI ◽

JIANHUA YAO ◽

RONALD M. SUMMERS ◽

NICHOLAS PETRICK ◽

MICHAEL T. MANRY ◽

...

Keyword(s):

Feature Selection ◽

Cross Validation ◽

Search Algorithm ◽

Piecewise Linear ◽

Least Square ◽

Support Vector ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Polyp Detection ◽

Computer Aided

We present an efficient feature selection algorithm for computer-aided detection (CAD) in computed tomographic (CT) colonography. The algorithm (1) determines an appropriate piecewise linear network (PLN) model by cross validation, (2) applies the orthonormal least square (OLS) procedure to the PLN model utilizing a Modified Schmidt procedure, and (3) uses a floating search algorithm to select features that minimize the output variance. The floating search approach prevents the undesirable "nesting effect", and the piecewise linear OLS procedure makes the algorithm very computationally efficient because the Modified Schmidt procedure requires only one data pass during the whole search. The selected features are compared to those obtained by other methods through cross validation with support vector machines (SVMs).

Adaptive Model Selection and Assessment for Exponential Family Distributions

Technometrics ◽

10.1198/004017004000000338 ◽

2004 ◽

Vol 46 (3) ◽

pp. 306-317 ◽

Cited By ~ 18

Author(s):

Xiaotong Shen ◽

Hsin-Cheng Huang ◽

Jimmy Ye

Keyword(s):

Model Selection ◽

Exponential Family ◽

Adaptive Model ◽

Adaptive Model Selection ◽

Selection And Assessment

Marketplace Sentiment Analysis Using Naive Bayes And Support Vector Machine

PIKSEL : Penelitian Ilmu Komputer Sistem Embedded and Logic ◽

10.33558/piksel.v8i2.2272 ◽

2020 ◽

Vol 8 (2) ◽

pp. 91-100

Author(s):

Muhamad Azhar ◽

Noor Hafidz ◽

Biktra Rudianto ◽

Windu Gata

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Particle Swarm Optimization ◽

Sentiment Analysis ◽

Cross Validation ◽

Naive Bayes ◽

Particle Swarm ◽

Naïve Bayes ◽

Support Vector ◽

Swarm Optimization

Abstract: The adoption of technology in online marketplaces has drawn researchers' attention to analyzing customer reviews. The Klik Indomaret application page on Google Play is one source from which review data can be collected. However, extracting consumers' opinions from reviews is not an easy task and requires a specific method for categorizing the reviews into groups, i.e. positive or negative reviews. Sentiment analysis studies of application reviews on Google Play are still rare. Therefore, this paper analyzes customer sentiment toward the Klik Indomaret app using the Naive Bayes (NB) classifier, compares it with the Support Vector Machine (SVM), and optimizes Feature Selection (FS) using the Particle Swarm Optimization method. Without FS optimization, NB achieved 69.74% accuracy and an Area Under Curve (AUC) of 0.518, and SVM achieved 81.21% accuracy and an AUC of 0.896. With FS, cross-validated NB achieved 75.21% accuracy and an AUC of 0.598, and cross-validated SVM achieved 81.84% accuracy and an AUC of 0.898. Both models therefore improved when Particle Swarm Optimization feature selection was used, and SVM scored higher than NB on the dataset used in this study. Keywords: Naive Bayes, Particle Swarm Optimization, Support Vector Machine, Feature Selection, Consumer Review.

An Efficient Greedy EM Algorithm for Gaussian Mixture for Adaptive Model Selection Using the Kurtosis and Skewness Criterion

Advanced Materials Research ◽

10.4028/scientific5/amr.452-453.1501 ◽

2012 ◽

Vol 452-453 ◽

pp. 1501-1506

Author(s):

Lin Wang ◽

Jin Wen Ma

Keyword(s):

Model Selection ◽

Em Algorithm ◽

Gaussian Mixture ◽

Adaptive Model ◽

Adaptive Model Selection
