incremental feature selection
Recently Published Documents


TOTAL DOCUMENTS: 36 (FIVE YEARS 16)

H-INDEX: 8 (FIVE YEARS 3)

2021
Author(s): Yanyan Yang, Degang Chen, Xiao Zhang, Zhenyan Ji

Covering rough sets conceptualize different types of features through their respective generated coverings. By integrating these coverings into a single covering, covering rough set based feature selection finds valuable features in a mixed decision system with symbolic, real-valued, missing-valued, and set-valued features. Existing approaches to covering rough set based feature selection, however, become intractable on large mixed data. Therefore, an efficient incremental feature selection strategy is proposed in which a mixed data set is presented as a sequence of sample subsets. Each time a new sample subset arrives, the relative discernible relation of each feature is updated to yield an incremental feature selection scheme that decides how to add informative features and remove redundant ones. The incremental scheme is used to build two incremental feature selection algorithms for large or dynamic mixed datasets. The first algorithm updates the feature subset as each sample subset arrives and returns the reduct when no further sample subsets are obtained. The second merely updates the relative discernible relations and computes the reduct only when no further sample subsets are obtained. Extensive experiments demonstrate that the two proposed incremental algorithms, especially the second one, speed up covering rough set based feature selection without sacrificing too much classification performance.
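The "update the relations, compute the reduct at the end" idea can be illustrated with a much simpler stand-in. The sketch below is not the covering-based relation from the paper: it assumes purely symbolic features, treats two samples as discerned by a feature when their values differ, and uses a greedy cover heuristic in place of the authors' reduct computation. The class and method names are hypothetical.

```python
class IncrementalDiscernibility:
    """Per feature, keeps the pairs of differently labelled samples it tells apart.

    Simplifying assumption: features are symbolic and a feature discerns two
    samples when their values differ (not the paper's covering-based relation).
    """

    def __init__(self, n_features):
        self.n_features = n_features
        self.samples = []  # list of (values, label) seen so far
        self.discerned = [set() for _ in range(n_features)]

    def add_batch(self, batch):
        """Update the relations with a new sample subset; no reduct is computed."""
        for values, label in batch:
            new_idx = len(self.samples)
            for old_idx, (old_values, old_label) in enumerate(self.samples):
                if old_label == label:
                    continue  # only pairs with different decisions matter
                for f in range(self.n_features):
                    if old_values[f] != values[f]:
                        self.discerned[f].add((old_idx, new_idx))
            self.samples.append((values, label))

    def greedy_reduct(self):
        """Greedy heuristic: add the feature covering the most uncovered pairs."""
        target = set().union(*self.discerned)
        covered, chosen = set(), []
        while covered != target:
            best = max(range(self.n_features),
                       key=lambda f: len(self.discerned[f] - covered))
            chosen.append(best)
            covered |= self.discerned[best]
        return sorted(chosen)


inc = IncrementalDiscernibility(n_features=3)
inc.add_batch([((0, 0, 1), "a"), ((0, 1, 1), "b")])  # first sample subset
inc.add_batch([((1, 0, 0), "b"), ((1, 1, 0), "a")])  # second sample subset
print(inc.greedy_reduct())
```

Because only pairs involving a newly arrived sample are examined, each batch costs work proportional to the new samples times the samples already seen, rather than a full recomputation over all pairs.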


Author(s): Lei Chen, Xianchao Zhou, Tao Zeng, Xiaoyong Pan, Yu-Hang Zhang, ...

Cancer has been generally defined as a cluster of systematic malignant pathogenesis involving abnormal cell growth. Genetic mutations derived from environmental factors and inherited genetics trigger the initiation and progression of cancers. Although several well-known factors affect cancer, the mutation features and rules that affect cancers remain relatively unknown because related studies are limited. In this study, a computational investigation of the mutation profiles of cancer samples from 27 cancer types was conducted. These profiles were first analyzed by the Monte Carlo Feature Selection (MCFS) method, which produced a ranked feature list. The incremental feature selection (IFS) method then used this list to extract essential mutation features related to the 27 cancer types, derive 207 mutation rules, and construct efficient classifiers. The top 37 mutation features corresponding to different cancer types were discussed. All the qualitatively analyzed gene mutation features contribute to distinguishing different cancer types, and most of the mutation rules are supported by recent literature. Therefore, our computational investigation could identify potential biomarkers and prediction rules for cancers at the mutation signature level.
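As a rough illustration of the IFS step described above, the sketch below takes an already ranked feature list, evaluates a classifier on each top-k prefix, and keeps the best-scoring prefix. The ranking is assumed to be given (MCFS itself is not implemented here), and the leave-one-out nearest-centroid scorer, the function names, and the toy data are all hypothetical choices, not those of the study.

```python
def nearest_centroid_loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`."""
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue  # hold out sample i
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            # squared Euclidean distance from sample i to the class centroid
            return sum((X[i][f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(features))

        correct += min(sums, key=dist) == y[i]
    return correct / len(X)


def incremental_feature_selection(X, y, ranked_features,
                                  score=nearest_centroid_loo_accuracy):
    """Score every top-k prefix of the ranking; return the best prefix and the curve."""
    best_k, best_score, curve = 0, -1.0, []
    for k in range(1, len(ranked_features) + 1):
        s = score(X, y, ranked_features[:k])
        curve.append(s)
        if s > best_score:  # strict '>' keeps the smallest k on ties
            best_k, best_score = k, s
    return ranked_features[:best_k], curve


# Toy data: column 2 separates the classes cleanly, column 0 is large-scale noise.
X = [[5.0, 0.1, 0.0], [1.0, 0.2, 0.1], [4.0, 0.0, 0.2],
     [1.5, 0.9, 1.0], [4.5, 1.0, 1.1], [0.5, 0.8, 0.9]]
y = [0, 0, 0, 1, 1, 1]
selected, curve = incremental_feature_selection(X, y, ranked_features=[2, 1, 0])
print(selected, curve)
```

Any classifier and any evaluation measure can be passed as `score`; the scan over prefixes is the only part that is specific to IFS.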


Author(s): Yu-Hang Zhang, Hao Li, Tao Zeng, Lei Chen, Zhandong Li, ...

The worldwide Coronavirus Disease 2019 (COVID-19) pandemic was triggered by the wide spread of a new coronavirus strain named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Multiple studies on the pathogenesis of SARS-CoV-2 were conducted immediately after the disease began to spread. However, the molecular pathogenesis of the virus and related diseases has still not been fully revealed. In this study, we attempted to identify new transcriptomic signatures as candidate diagnostic models for clinical testing or as therapeutic targets for vaccine design. Using recently reported transcriptomics data from upper airway tissue in acute respiratory illnesses, we integrated multiple machine learning methods to identify effective qualitative biomarkers and quantitative rules for distinguishing SARS-CoV-2 infection from other infectious diseases. The transcriptomics data were first analyzed by Boruta to select important features, which were then evaluated by the minimum redundancy maximum relevance method to produce a feature list. This list was fed into incremental feature selection, combined with several classification algorithms, to extract qualitative biomarker genes and construct quantitative rules. An efficient classifier was also built to identify patients infected with SARS-CoV-2. The findings reported in this study may help reveal the potential pathogenic mechanisms of COVID-19 and find new targets for vaccine design.
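The minimum redundancy maximum relevance ranking mentioned above can be sketched in a few lines. This is a minimal version that assumes discrete feature values and uses the "relevance minus mean redundancy" form with plain mutual information; the study's actual implementation and the preceding Boruta filter are not reproduced, and the gene names and data are made up for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(a, b):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * log2((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in pab.items())


def mrmr_rank(columns, labels):
    """Rank features greedily by relevance to labels minus mean redundancy.

    `columns` maps a feature name to its list of discrete values.
    """
    relevance = {name: mutual_information(col, labels)
                 for name, col in columns.items()}
    ranked, remaining = [], list(columns)
    while remaining:
        def score(name):
            if not ranked:
                return relevance[name]
            redundancy = sum(mutual_information(columns[name], columns[r])
                             for r in ranked) / len(ranked)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "gene_a": [0, 0, 0, 1, 1, 1, 1, 1],  # informative
    "gene_b": [0, 0, 0, 1, 1, 1, 1, 1],  # exact duplicate of gene_a
    "gene_c": [0, 0, 1, 0, 1, 1, 0, 1],  # weaker, but adds new information
}
print(mrmr_rank(columns, labels))
```

In this toy case the duplicate `gene_b` is pushed to the end of the ranking even though its relevance equals that of `gene_a`, which is the behaviour that makes the resulting list a useful input for incremental feature selection.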


2020, Vol 2020, pp. 1-15
Author(s): Lu Zhang, Min Liu, Xinyi Qin, Guangzhong Liu

Succinylation is an important posttranslational modification of proteins that plays a key role in regulating protein conformation and controlling cellular function. Many studies have shown that succinylation of protein lysine residues is closely related to the occurrence of many diseases. To understand the mechanism of succinylation in depth, it is necessary to identify succinylation sites in proteins accurately. In this study, we develop a new model, IFS-LightGBM (BO), which combines the incremental feature selection (IFS) method, the LightGBM feature selection method, the Bayesian optimization algorithm, and the LightGBM classifier to predict succinylation sites in proteins. Specifically, pseudo amino acid composition (PseAAC), position-specific scoring matrix (PSSM), disorder status, and Composition of k-spaced Amino Acid Pairs (CKSAAP) are first employed to extract feature information. Then, the combination of the LightGBM feature selection method and the IFS method selects the optimal feature subset for the LightGBM classifier. Finally, to increase prediction accuracy and reduce the computation load, the Bayesian optimization algorithm is used to optimize the parameters of the LightGBM classifier. The results reveal that the IFS-LightGBM (BO)-based prediction model performs better when it is evaluated by common metrics such as accuracy, recall, precision, Matthews Correlation Coefficient (MCC), and F-measure.
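The abstract names its evaluation metrics without defining them. For reference, the standard confusion-matrix definitions for a binary task can be sketched as below; the function name is hypothetical, label 1 is assumed to mark a succinylation site, and returning 0.0 when a denominator is zero is an assumed convention rather than something stated in the source.

```python
from math import sqrt


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F-measure and MCC for 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

    def safe_div(num, den):
        # assumed convention: an undefined ratio is reported as 0.0
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f_measure": safe_div(2 * precision * recall, precision + recall),
        "mcc": safe_div(tp * tn - fp * fn,
                        sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


print(binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1]))
```

MCC uses all four confusion-matrix cells, so it stays informative when positive sites are much rarer than negative ones, a situation in which accuracy alone can look deceptively high.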


2020, Vol 536, pp. 185-204
Author(s): Peng Ni, Suyun Zhao, Xizhao Wang, Hong Chen, Cuiping Li, ...

2020, Vol 28 (5), pp. 901-915
Author(s): Xiao Zhang, Changlin Mei, Degang Chen, Yanyan Yang, Jinhai Li
