Sparse Contribution Feature Selection and Classifiers Optimized by Concave-Convex Variation for HCC Image Recognition

2017, Vol 2017, pp. 1-14
Author(s): Wenbo Pang, Huiyan Jiang, Siqi Li

Accurate classification of hepatocellular carcinoma (HCC) images is of great importance in pathology diagnosis and treatment. This paper proposes a concave-convex variation (CCV) method to optimize three classifiers (random forest, support vector machine, and extreme learning machine) for more accurate HCC image classification. First, in the preprocessing stage, hematoxylin-eosin (H&E) pathological images are enhanced using a bilateral filter, and each HCC image patch is obtained under the guidance of pathologists. Then, after extracting the complete features of each patch, a new sparse contribution (SC) feature selection model is established to select the features beneficial to each classifier. Finally, a concave-convex variation method is developed to improve the performance of the classifiers. Experiments on 1260 HCC image patches demonstrate that the proposed CCV classifiers improve greatly on each original classifier, and that CCV-random forest (CCV-RF) performs best for HCC image recognition.
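The bilateral filter used in the preprocessing stage is a standard edge-preserving smoother: each pixel is replaced by a neighborhood average weighted by both spatial distance and intensity difference, so tissue boundaries survive while noise is suppressed. A minimal grayscale sketch (window radius and sigma values are illustrative, not the paper's settings):

```python
import math

def bilateral_filter(img, radius=1, sigma_s=1.0, sigma_r=25.0):
    """Edge-preserving smoothing: each output pixel is a weighted mean of its
    neighbours, weighted by spatial distance AND intensity difference."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc, norm = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        # spatial closeness weight
                        ws = math.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
                        # intensity similarity weight (this term preserves edges)
                        diff = img[ny][nx] - img[y][x]
                        wr = math.exp(-(diff * diff) / (2 * sigma_r ** 2))
                        acc += ws * wr * img[ny][nx]
                        norm += ws * wr
            out[y][x] = acc / norm
    return out

# A sharp edge between the 0s and the 200s survives filtering,
# while the small 10/190 perturbations are smoothed toward their side.
img = [[0, 0, 200, 200],
       [0, 10, 200, 190],
       [0, 0, 200, 200]]
smoothed = bilateral_filter(img)
```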

2015, Vol 15 (02), pp. 1540025
Author(s): Imane Nedjar, Mostafa El Habib Daho, Nesma Settouti, Saïd Mahmoudi, Mohamed Amine Chikh

Automated classification of medical images is an increasingly important tool for physicians in their daily activities. However, due to its computational complexity, this task is one of the major current challenges in the field of content-based image retrieval (CBIR). In this paper, a medical image classification approach composed of two main phases is proposed. The first phase consists of pre-processing, where a texture- and shape-based feature vector is extracted, followed by feature selection using a Genetic Algorithm (GA). The proposed GA uses a kNN-based classification error as its fitness function, which enables it to find a combination of features yielding optimal accuracy. In the second phase, classification of X-ray images is performed using a Random Forest classifier and a supervised multi-class classifier based on the support vector machine (SVM).
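The GA-with-kNN-fitness loop described above can be sketched in miniature: each chromosome is a binary feature mask, and its fitness is the leave-one-out kNN error on the selected features. This is a toy illustration with assumed GA settings (elitist survival, one-point crossover, bit-flip mutation), not the authors' exact configuration:

```python
import random

def knn_loo_error(X, y, mask, k=1):
    """Leave-one-out k-NN classification error using only features where mask[j] == 1."""
    feats = [j for j, b in enumerate(mask) if b]
    if not feats:
        return 1.0  # empty feature set: worst possible fitness
    errors = 0
    for i in range(len(X)):
        # distance from sample i to every other sample on the selected features
        dists = sorted(
            (sum((X[i][j] - X[m][j]) ** 2 for j in feats), y[m])
            for m in range(len(X)) if m != i
        )
        votes = [label for _, label in dists[:k]]
        pred = max(set(votes), key=votes.count)
        errors += pred != y[i]
    return errors / len(X)

def ga_select(X, y, n_gen=30, pop_size=20, p_mut=0.1, seed=0):
    """Genetic algorithm over feature masks; fitness = low kNN LOO error."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(n_gen):
        scored = sorted(pop, key=lambda m: knn_loo_error(X, y, m))
        pop = scored[: pop_size // 2]          # elitist survival
        while len(pop) < pop_size:
            a, b = rng.sample(scored[: pop_size // 2], 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # mutation
            pop.append(child)
    return min(pop, key=lambda m: knn_loo_error(X, y, m))

# Toy data: feature 0 carries the class, feature 1 is pure noise.
rng = random.Random(1)
X = [[cls + rng.gauss(0, 0.2), rng.gauss(0, 1)] for cls in (0, 1) for _ in range(15)]
y = [0] * 15 + [1] * 15
best = ga_select(X, y)  # the informative feature 0 ends up selected
```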


Author(s): Vladimir Nikulin, Tian-Hsiang Huang, Geoffrey J. McLachlan

The method presented in this paper is novel as a natural combination of two mutually dependent steps. Feature selection is the key first step in our classification system, which was employed during the 2010 international RSCTC data mining (bioinformatics) Challenge. The second step may be implemented using any suitable classifier, such as linear regression, a support vector machine, or a neural network. We conducted leave-one-out (LOO) experiments with several feature selection techniques and classifiers. Based on the LOO evaluations, we decided to use feature selection with the separation-type Wilcoxon-based criterion for all final submissions. The method was tested successfully during the RSCTC data mining Challenge, where we achieved the top score in the Basic track.
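A separation criterion based on the Wilcoxon rank-sum statistic scores each feature by how well it separates the two classes, and the top-ranked features are kept. A minimal two-class sketch (the exact criterion used in the challenge submissions may differ in normalization and tie handling):

```python
def wilcoxon_score(feature_a, feature_b):
    """Separation score for one feature: how far the rank-sum of class A
    lies from its null expectation (larger = better class separation).
    Tied values receive the average of their ranks."""
    pooled = [(v, 0) for v in feature_a] + [(v, 1) for v in feature_b]
    pooled.sort()
    n = len(pooled)
    rank_of = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg = (i + j + 1) / 2  # average of 1-based ranks i+1 .. j
        for t in range(i, j):
            rank_of[t] = avg
        i = j
    w = sum(rank_of[t] for t in range(n) if pooled[t][1] == 0)
    na, nb = len(feature_a), len(feature_b)
    expected = na * (na + nb + 1) / 2
    return abs(w - expected)

def select_top(X, y, k):
    """Rank features by Wilcoxon separation between class 0 and class 1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append((wilcoxon_score(a, b), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]

# Feature 0 separates the classes perfectly, feature 1 is constant.
X = [[1, 5], [2, 5], [3, 5], [10, 5], [11, 5], [12, 5]]
y = [0, 0, 0, 1, 1, 1]
top = select_top(X, y, 1)  # → [0]
```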


2016, Vol 51 (20), pp. 2853-2862
Author(s): Serkan Ballı

The aim of this study is to diagnose and classify the failure modes of two serially fastened sandwich composite plates using data mining techniques. The composite material used in the study was manufactured from a glass-fiber-reinforced layer and aluminum sheets. Results obtained in a previous experimental study of sandwich composite plates mechanically fastened with two serial pins or bolts were used for classification of the failure modes. The experimental data from that study cover different geometrical parameters and applied preload moments of 0 (pinned) and 2, 3, 4, and 5 Nm (bolted). In this study, data mining methods were applied to these geometrical parameters and pinned/bolted joint configurations: three geometrical parameters and 100 test data points were used for classification with the support vector machine, Naive Bayes, K-Nearest Neighbors, Logistic Regression, and Random Forest methods. According to the experiments, the Random Forest method achieved better results than the others and is appropriate for diagnosing and classifying the failure modes. The performance of all data mining methods used is discussed in terms of accuracy and error ratios.


Sensors, 2021, Vol 21 (21), pp. 7417
Author(s): Alex J. Hope, Utkarsh Vashisth, Matthew J. Parker, Andreas B. Ralston, Joshua M. Roper, ...

Concussion injuries remain a significant public health challenge, and an unmet clinical need remains for tools that allow related physiological impairments and longer-term health risks to be identified earlier, better quantified, and more easily monitored over time. We address this challenge by combining a head-mounted wearable inertial motion unit (IMU)-based physiological vibration acceleration (“phybrata”) sensor and several candidate machine learning (ML) models. The performance of this solution is assessed both for binary classification of concussion patients and for multiclass prediction of specific concussion-related neurophysiological impairments, and the results are compared with previously reported approaches to ML-based concussion diagnostics. Using phybrata data from a previously reported concussion study population, four different machine learning models (Support Vector Machine, Random Forest Classifier, Extreme Gradient Boost, and Convolutional Neural Network) are first investigated for binary classification of the test population as healthy vs. concussion (Use Case 1). Results are compared for two different data preprocessing pipelines, Time-Series Averaging (TSA) and Non-Time-Series Feature Extraction (NTS). Next, the three best-performing NTS models are compared in terms of their multiclass prediction performance for specific concussion-related impairments: vestibular, neurological, or both (Use Case 2). For Use Case 1, the NTS approach outperformed the TSA approach, with the two best algorithms achieving an F1 score of 0.94. For Use Case 2, the NTS Random Forest model achieved the best performance in the testing set, with an F1 score of 0.90, and identified a wider range of relevant phybrata signal features contributing to impairment classification than manual feature inspection and statistical data analysis.
The overall classification performance achieved in the present work exceeds previously reported approaches to ML-based concussion diagnostics using other data sources and ML models. This study also demonstrates the first combination of a wearable IMU-based sensor and ML model that enables both binary classification of concussion patients and multiclass predictions of specific concussion-related neurophysiological impairments.
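The NTS pipeline replaces each raw time series with per-signal summary features before classification. The specific phybrata feature set is not listed here; the following sketch uses common illustrative statistics (mean, standard deviation, RMS, peak) as stand-ins:

```python
import math

def nts_features(signal):
    """Summary statistics for one sensor time series, in the spirit of
    non-time-series (feature-based) preprocessing. The features actually
    used in the phybrata study may differ; these are illustrative choices."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((v - mean) ** 2 for v in signal) / n
    rms = math.sqrt(sum(v * v for v in signal) / n)
    peak = max(abs(v) for v in signal)
    return {
        "mean": mean,
        "std": math.sqrt(var),
        "rms": rms,
        "peak": peak,
        "peak_to_rms": peak / rms if rms else 0.0,  # crest factor
    }

feats = nts_features([0.0, 1.0, -1.0, 2.0, -2.0, 1.0, -1.0])
```

A classifier such as a random forest would then be trained on one such feature dictionary (flattened to a vector) per sensor channel and patient.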


Complexity, 2022, Vol 2022, pp. 1-11
Author(s): Marium Mehmood, Nasser Alshammari, Saad Awadh Alanazi, Fahad Ahmad

The liver is a vital organ of the human body, but detecting liver disease at an early stage is very difficult because its symptoms remain hidden. Liver disease may cause loss of energy or weakness once irregularities in liver function become visible. Cancer, the uncontrolled growth of harmful cells inside the liver, is one of the most common liver diseases and also the most fatal; if diagnosed late, it may cause death. Treating liver disease at an early stage is therefore an important issue, as is designing a model for early diagnosis. First, the features that play the most significant part in detecting liver cancer at an early stage should be identified, so it is essential to extract a few essential features from thousands of unwanted ones. These features are mined using data mining and soft computing techniques, which give optimized results that are helpful for disease diagnosis at an early stage. We use feature selection methods to reduce the dataset's features, including Filter, Wrapper, and Embedded methods. Different regression algorithms are then applied to each of these methods to evaluate the results: Linear Regression, Ridge Regression, LASSO Regression, Support Vector Regression, Decision Tree Regression, Multilayer Perceptron Regression, and Random Forest Regression. We evaluated our results based on the accuracy and error rates generated by these regression algorithms. The results show that Random Forest Regression with the Wrapper method is the best of all the deployed techniques, giving the highest R2-score of 0.8923 and the lowest MSE of 0.0618.
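Wrapper methods score candidate feature subsets by training the downstream model itself. A minimal greedy forward-wrapper sketch, using ordinary least squares as the inner regressor to keep the example self-contained (the study pairs wrappers with several regressors, including Random Forest Regression):

```python
def ols_mse(X, y, cols):
    """Fit ordinary least squares on the given feature columns (plus an
    intercept) via the normal equations and return the training MSE."""
    A = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(A[0])
    # normal equations: (A^T A) beta = A^T y
    M = [[sum(A[i][p] * A[i][q] for i in range(len(A))) for q in range(k)]
         for p in range(k)]
    v = [sum(A[i][p] * y[i] for i in range(len(A))) for p in range(k)]
    # Gaussian elimination with partial pivoting
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(M[r][p]))
        M[p], M[piv] = M[piv], M[p]
        v[p], v[piv] = v[piv], v[p]
        for r in range(p + 1, k):
            f = M[r][p] / M[p][p]
            for c in range(p, k):
                M[r][c] -= f * M[p][c]
            v[r] -= f * v[p]
    beta = [0.0] * k
    for p in range(k - 1, -1, -1):
        beta[p] = (v[p] - sum(M[p][c] * beta[c] for c in range(p + 1, k))) / M[p][p]
    preds = [sum(a * b for a, b in zip(row, beta)) for row in A]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

def forward_wrapper(X, y, n_keep):
    """Greedy wrapper: repeatedly add the feature that lowers model MSE most."""
    chosen, remaining = [], list(range(len(X[0])))
    for _ in range(n_keep):
        best = min(remaining, key=lambda j: ols_mse(X, y, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# y depends on feature 0 only; features 1 and 2 are distractors.
X = [[i, (i * 7) % 5, (i * 3) % 4] for i in range(12)]
y = [2.0 * row[0] + 1.0 for row in X]
picked = forward_wrapper(X, y, 1)  # → [0]
```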


Lubricant condition monitoring (LCM), one of the condition monitoring techniques under Condition-Based Maintenance, monitors the condition and state of the lubricant, which in turn reveal the condition and state of the equipment. LCM has proved to be a key concept driving maintenance decision making, involving a sizeable number of parameter (variable) tests that require classification and interpretation based on the lubricant's condition. Reducing the variables to a manageable and admissible level and utilizing them for prediction is key to ensuring optimization of equipment performance and lubricant condition. This study advances a methodology for feature selection and predictive modelling of in-service oil analysis data to assist in maintenance decision making for critical equipment. The proposed methodology includes data pre-processing involving cleaning, expert assessment, and standardization due to the different measurement scales. Limits provided by the Original Equipment Manufacturer (OEM) are used by the analysts to manually classify and flag samples with significant lubricant deterioration. In the last part of the methodology, Random Forest (RF) is used as a feature selection tool, followed by a Decision Tree-based (DT) classification of the in-service oil samples. A case study of a thermal power plant, to which the framework is applied, is presented. The selection of admissible variables using Random Forest exposes critical used oil analysis (UOA) variables indicative of lubricant/machine degradation, while the DT model, besides predicting the classification of samples, offers visual interpretability of each parameter's impact on the classification outcome. The model evaluation returned acceptable predictive performance, and the framework renders speedy classification with insights for maintenance decision making, thus ensuring timely interventions.
Moreover, the framework highlights critical and relevant oil analysis parameters that are indicative of lubricant degradation; hence, by addressing such critical parameters, organizations can better enhance the reliability of their critical operable equipment.
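The manual labelling step compares each oil parameter against its OEM limit and flags a sample as significantly deteriorated if any limit is exceeded. A sketch of that rule, where the parameter names and limit values are illustrative placeholders, not a real OEM table:

```python
# Hypothetical OEM condemnation limits for in-service oil parameters.
OEM_LIMITS = {
    "viscosity_40c_change_pct": 10.0,  # % change from fresh oil
    "water_ppm": 500.0,
    "iron_ppm": 100.0,
    "tan_mg_koh_g": 2.0,               # total acid number
}

def classify_sample(sample):
    """Label a sample 'deteriorated' if any monitored parameter exceeds
    its OEM limit, else 'normal'; also report which limits were exceeded."""
    exceeded = [p for p, limit in OEM_LIMITS.items()
                if sample.get(p, 0.0) > limit]
    return ("deteriorated" if exceeded else "normal", exceeded)

label, offenders = classify_sample(
    {"viscosity_40c_change_pct": 4.2, "water_ppm": 620.0,
     "iron_ppm": 35.0, "tan_mg_koh_g": 1.1})
# → ("deteriorated", ["water_ppm"])
```

These expert labels then serve as the training targets for the RF feature selection and DT classification stages.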


F1000Research, 2016, Vol 5, pp. 2673
Author(s): Daniel Kristiyanto, Kevin E. Anderson, Ling-Hong Hung, Ka Yee Yeung

Prostate cancer is the most common cancer among men in developed countries. Androgen deprivation therapy (ADT) is the standard treatment for prostate cancer; however, approximately one third of all patients with metastatic disease treated with ADT develop resistance to it. This condition is called metastatic castrate-resistant prostate cancer (mCRPC). Patients who do not respond to hormone therapy are often treated with a chemotherapy drug called docetaxel. Sub-challenge 2 of the Prostate Cancer DREAM Challenge aims to improve the prediction of whether a patient with mCRPC would discontinue docetaxel treatment due to adverse effects. Specifically, a dataset containing three distinct clinical studies of patients with mCRPC treated with docetaxel was provided. We applied the k-nearest neighbor method for missing data imputation, the hill climbing algorithm and random forest importance for feature selection, and the random forest algorithm for classification. We also empirically studied the performance of many classification algorithms, including support vector machines and neural networks. Additionally, we found that using random forest importance for feature selection provided slightly better results than the more computationally expensive hill climbing method.
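The k-nearest-neighbor imputation step can be sketched as follows: each missing entry is filled with the mean of that column among the k complete rows closest to the sample on its observed features. This is a minimal version; the challenge entry likely used a library implementation with distance weighting:

```python
def knn_impute(X, k=2):
    """Fill missing entries (None) with the mean of that column among the
    k nearest complete rows, with distance measured on observed features."""
    filled = [row[:] for row in X]
    for i, row in enumerate(X):
        miss = [j for j, v in enumerate(row) if v is None]
        if not miss:
            continue
        # candidate donors: rows with no missing values at all
        donors = [r for r in X if all(v is not None for v in r)]

        def dist(r):
            # squared distance over the features this sample actually has
            return sum((row[j] - r[j]) ** 2
                       for j in range(len(row)) if row[j] is not None)

        nearest = sorted(donors, key=dist)[:k]
        for j in miss:
            filled[i][j] = sum(r[j] for r in nearest) / len(nearest)
    return filled

X = [[1.0, 2.0, 3.0],
     [1.1, 2.1, 2.9],
     [9.0, 8.0, 7.0],
     [1.05, None, 3.0]]
imputed = knn_impute(X, k=2)
# the missing value is filled from the two similar rows: (2.0 + 2.1) / 2 = 2.05
```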


Author(s): Harsha A K

Abstract: Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional approaches to detecting malware, like packet content analysis, are inefficient in dealing with encrypted data. In the absence of actual packet contents, we can make use of other features, like packet size, arrival time, source and destination addresses, and other such metadata, to detect malware. Such information can be used to train machine learning classifiers to distinguish malicious from benign packets. In this paper, we offer an efficient malware detection approach using machine learning classification algorithms such as the support vector machine, random forest, and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset, which is then split into training and testing sets. The machine learning algorithms are trained on the training set, and the resulting models are evaluated against the testing set to assess their respective performance. We further tune the hyperparameters of the algorithms to achieve better results. The random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, resulting in area-under-the-curve values of 0.9928 and 0.9998, respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms, and it also shows the importance of dimensionality reduction in such classification problems.
Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.
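The reported area-under-the-curve values can be computed directly from classifier scores via the rank (Mann-Whitney) formulation of ROC AUC, shown here on toy scores from a hypothetical malware classifier:

```python
def roc_auc(labels, scores):
    """ROC AUC via the rank (Mann-Whitney) formulation: the probability
    that a random positive (malicious) sample scores higher than a random
    negative (benign) one, counting ties as half a win."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: label 1 = malicious, scores are model confidences.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)  # → 8/9 ≈ 0.889
```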


Author(s): Beaulah Jeyavathana Rajendran, Kanimozhi K. V.

Tuberculosis is a hazardous infectious disease characterized by the formation of tubercles in the tissues. It mainly affects the lungs but can also affect other parts of the body, and it can be diagnosed by radiologists. The main objective of this chapter is to select the best solution, obtained by means of modified particle swarm optimization, as the optimal feature descriptor. Five stages of medical image processing are used to detect tuberculosis: pre-processing the image, segmenting the lungs, feature extraction, feature selection, and classification. In feature extraction, the GLCM approach is used to extract features, and from the extracted feature sets the optimal features are selected by random forest. Finally, a support vector machine classifier is used for image classification. Experiments were conducted and intermediate results obtained; the proposed system's classification accuracy is better than that of the existing method.
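The GLCM approach mentioned above builds a gray-level co-occurrence matrix of pixel pairs at a fixed offset and derives texture statistics from it. A minimal sketch on tiny quantized patches, computing two classic Haralick-style features (the chapter likely uses more offsets and a larger feature set):

```python
def glcm(img, dx=1, dy=0, levels=4):
    """Gray-level co-occurrence matrix: counts of gray-level pairs (a, b)
    at the given pixel offset, normalized to probabilities."""
    h, w = len(img), len(img[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                counts[img[y][x]][img[ny][nx]] += 1
                total += 1
    return [[c / total for c in row] for row in counts]

def glcm_features(P):
    """Two classic texture features derived from a normalized GLCM."""
    n = len(P)
    contrast = sum(P[a][b] * (a - b) ** 2 for a in range(n) for b in range(n))
    homogeneity = sum(P[a][b] / (1 + abs(a - b)) for a in range(n) for b in range(n))
    return {"contrast": contrast, "homogeneity": homogeneity}

# A uniform patch has zero contrast; a striped patch has high contrast.
flat = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
stripes = [[0, 3, 0], [0, 3, 0], [0, 3, 0]]
f_flat = glcm_features(glcm(flat))
f_stripes = glcm_features(glcm(stripes))
```

Feature vectors like these, computed over the segmented lung region, would then feed the random forest selection and SVM classification stages.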

