feature ranking
Recently Published Documents


TOTAL DOCUMENTS: 360 (FIVE YEARS: 113)
H-INDEX: 24 (FIVE YEARS: 5)

2022 ◽  
Vol 71 (2) ◽  
pp. 2249-2269
Author(s):  
Noha E. El-Attar ◽  
Sahar F. Sabbeh ◽  
Heba Fasihuddin ◽  
Wael A. Awad

2021 ◽  
Author(s):  
Kourosh Neshatian

Feature manipulation is the process of altering the input space of a machine learning task to improve learning quality and performance. Its three major aspects are feature construction, feature ranking, and feature selection. This thesis proposes a new filter-based methodology for feature manipulation in classification problems using genetic programming (GP). The goal is to modify the input representation of classification problems so as to improve classification performance and reduce the complexity of classification models. The thesis treats a classification problem as a collection of variables, comprising conditional variables (input features) and decision variables (target class labels), and uses GP to discover the relationships between them. The types of relationship, and the ways in which they are discovered, vary across the three aspects of feature manipulation.

In feature construction, the thesis proposes a GP-based method that constructs high-level features as functions of the original input features. The functions are evolved by GP using an entropy-based fitness function that maximises the purity of class intervals. Unlike existing algorithms, the proposed method constructs multiple features and can effectively perform transformational dimensionality reduction, using only a small number of GP-constructed features while preserving good classification performance.

In feature ranking, the thesis proposes two GP-based methods: one for ranking single features and one for ranking subsets of features. In single-feature ranking, the proposed method uses GP to evolve a collection of weak classification models and then measures how much each input feature contributes to the good models, as an indicator of that feature's influence on classification performance. In ranking subsets of features, a virtual structure for GP trees and a new binary relevance function are proposed to measure the relationship between a subset of features and the target class labels. The proposed method is observed to discover complex relationships, such as multi-modal class distributions and multivariate correlations, that traditional methods cannot detect.

In feature selection, the thesis provides a novel multi-objective GP-based approach to measuring the goodness of feature subsets. Subsets are evaluated on their cardinality and their relationship to the target class labels, and selection is performed by choosing a subset from a GP-discovered Pareto front of suboptimal solutions (subsets). The thesis also proposes a novel method for measuring redundancy between input features, which is used to select a subset of relevant features that are not redundant with respect to each other.

In all three aspects of feature manipulation, the proposed GP-based methodology is found to be effective in discovering relationships between the features of a classification task. In feature construction, the proposed methods evolve functions of conditional variables that can significantly improve classification performance and reduce the complexity of the learned classifiers. In feature ranking, the proposed methods can find complex relationships between conditional and decision variables, and the resulting ranking shows a strong linear correlation with actual classification performance. In feature selection, the proposed method can find a set of sub-optimal feature subsets that provides a trade-off between the number of features and their relevance to the classification task, and the proposed redundancy-removal method can remove redundant features from a feature set. Both proposed feature selection methods can find an optimal subset of features that yields significantly better classification performance, with far fewer features, than conventional classification methods.
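The multi-objective selection step above keeps only feature subsets that lie on a Pareto front of cardinality versus relevance. The Python sketch below shows just that non-dominated filtering, assuming each candidate subset already carries a relevance score; the feature names and scores are hypothetical, and the sketch does not reproduce the GP search or the binary relevance function described in the thesis.

```python
from typing import FrozenSet, List, Tuple

# A candidate is (feature subset, relevance score); higher relevance is better.
Candidate = Tuple[FrozenSet[str], float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no larger and no less relevant than `b`, and strictly better on one objective."""
    size_a, rel_a = len(a[0]), a[1]
    size_b, rel_b = len(b[0]), b[1]
    no_worse = size_a <= size_b and rel_a >= rel_b
    strictly_better = size_a < size_b or rel_a > rel_b
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates that no other candidate dominates (minimise size, maximise relevance)."""
    return [c for c in candidates if not any(dominates(other, c) for other in candidates)]


# Hypothetical subsets with made-up relevance scores.
candidates = [
    (frozenset({"f1"}), 0.60),
    (frozenset({"f1", "f2"}), 0.80),
    (frozenset({"f2", "f3"}), 0.70),        # dominated: same size as {f1, f2}, less relevant
    (frozenset({"f1", "f2", "f3"}), 0.78),  # dominated: larger than {f1, f2}, less relevant
]
front = pareto_front(candidates)
for subset, relevance in front:
    print(sorted(subset), relevance)
```

Here the front keeps {f1} and {f1, f2}; choosing between them is then a trade-off between how many features one can afford and how much relevance one needs.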


2021 ◽  
Author(s):  
Soha Ahmed

Mass spectrometry (MS) is currently the most commonly used technology for proteomic analysis in biochemical research. The primary goal of proteomic profiling using MS is to classify samples from different experimental states, which requires identifying the proteins or peptides that are differentially expressed between the classes (biomarker detection). However, the high dimensionality of the data and the small number of samples make classification of MS data extremely challenging. Another important aspect of biomarker detection is verifying the detected biomarkers, an intermediate step before they are passed to the experimental validation stage. Biomarker detection aims to alter the input space of the learning algorithm to improve classification of proteomic or metabolomic data; this is done through feature manipulation, which has three aspects: feature ranking, feature selection, and feature construction. Genetic programming (GP) is an evolutionary computation algorithm with an intrinsic capability for all three aspects of feature manipulation, but its use for feature manipulation in proteomic biomarker discovery has not been fully investigated. This thesis therefore proposes an embedded GP methodology for these three aspects of feature manipulation in high-dimensional MS data, together with a GP method for biomarker verification. It investigates GP for both single-objective and multi-objective feature selection and construction.

In feature ranking, the thesis proposes a GP-based method that ranks subsets of features by using GP as an ensemble approach. The algorithm uses GP to combine the advantages of different feature ranking metrics and evolve a new ranking scheme for a subset of features selected from the top-ranked features. The method also investigates the capability of GP as a classifier. The results show that GP can select fewer features and rank the selected features better, which can improve the classification performance of five classifiers.

In feature construction, the thesis proposes a novel multiple-feature construction method that uses a single GP tree to generate a new set of high-level features from the originally selected features. The results show that the proposed algorithm outperforms two feature selection algorithms.

In feature selection, the thesis introduces the first multi-objective GP method for biomarker detection, which simultaneously increases classification accuracy and reduces the number of detected features. The proposed multi-objective method obtains better feature subsets than the single-objective algorithm and two traditional multi-objective feature selection approaches. The thesis also develops the first multi-objective multiple-feature construction algorithm for MS data, which aims both to maximise classification performance and to minimise the cardinality of the newly constructed high-level features. The results show that GP can discover complex relationships between the features and can significantly improve classification performance while reducing cardinality.

For biomarker verification, the thesis proposes the first GP biomarker verification method based on measuring peptide detectability. The method handles the class imbalance in the data, improves on the benchmark algorithms, and outperforms a well-known peptide detection method. Finally, the thesis introduces a new GP method for aligning MS data as a preprocessing stage, which can further help improve the biomarker detection process.
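The ranking method above uses GP to combine several feature ranking metrics into a new ranking scheme. As a much simpler stand-in (not the thesis's GP approach), the Python sketch below combines metrics by averaging each feature's rank across them. The metric names, peak names, and scores are hypothetical, and every metric is assumed to score the same features, with higher scores meaning more relevant.

```python
from typing import Dict, List


def to_ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each feature to its rank (1 = highest score); ties are broken alphabetically."""
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {feature: position + 1 for position, feature in enumerate(ordered)}


def mean_rank_aggregation(metric_scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Order features by their average rank over all metrics (lower average rank first)."""
    rank_tables = [to_ranks(scores) for scores in metric_scores.values()]
    features = sorted(rank_tables[0])  # assumes every metric scores the same features
    mean_rank = {
        f: sum(table[f] for table in rank_tables) / len(rank_tables) for f in features
    }
    return sorted(features, key=lambda f: (mean_rank[f], f))


# Hypothetical scores for three MS peaks under two ranking metrics.
metric_scores = {
    "t_statistic": {"peak_1": 1.0, "peak_2": 3.0, "peak_3": 2.5},
    "information_gain": {"peak_1": 0.10, "peak_2": 0.20, "peak_3": 0.30},
}
print(mean_rank_aggregation(metric_scores))
```

The two metrics disagree about the top peak, but both place peak_1 last, so it ends last in the combined order (`['peak_2', 'peak_3', 'peak_1']`). A learned combination such as the GP scheme described above could weight the metrics unequally instead of averaging them.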


Author(s):  
Manish Kumar Pandey ◽  
Mamta Mittal ◽  
Karthikeyan Subbiah

2021 ◽  
Vol 11 (2) ◽  
pp. 25-34
Author(s):  
Oyinkansola Oluwapelumi Kemi Afolabi-B ◽  
Maheyzah MD Siraj

Security and protection of information is an ever-evolving process in the field of information security, and one of its major tools is the intrusion detection system (IDS). IDSs have been developed for computer networks for many years and are widely used to detect a range of network attacks, but a major drawback is that, as time and technology evolve, attackers make it increasingly hard for IDSs to cope. Intrusion alert analysis, a sub-branch of IDS research, was introduced to address these problems and to support IDSs by analysing the alerts they trigger. Although intrusion alert analysis has supported IDSs well for many years, it has its own shortcoming: the sheer volume of alerts that IDSs produce. Years of research have shown that most of these alerts are undesirable, such as duplicates and false alerts, and their volume leads to alert flooding. This research proposes reducing alerts by targeting these undesirable alerts through the integration of supervised and unsupervised algorithms. First, significant features are selected by comparing two feature ranking techniques; this targets duplicate, low-priority, and irrelevant alerts. To achieve further reduction, supervised and unsupervised algorithms are integrated to filter out false alerts. Experiments were conducted on the ISCX 2012 dataset, and the model with the highest reduction rate was chosen; this model achieved an alert reduction rate of 94.02%. The model was evaluated against other experimental results and benchmarked against a related work, on which it also improved.
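The reduction rate reported above is, assuming the usual definition, the percentage of the original alerts that the pipeline removes. The Python sketch below shows only the simplest stage, dropping exact duplicates on a chosen set of alert fields, and then computes that rate. The field names and alerts are hypothetical; the study's feature ranking and its supervised/unsupervised false-alert filtering are not reproduced here.

```python
from typing import Dict, List, Tuple

Alert = Dict[str, str]


def remove_duplicates(alerts: List[Alert], key_fields: Tuple[str, ...]) -> List[Alert]:
    """Keep the first alert for each distinct combination of `key_fields`."""
    seen = set()
    kept: List[Alert] = []
    for alert in alerts:
        key = tuple(alert[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(alert)
    return kept


def reduction_rate(n_before: int, n_after: int) -> float:
    """Percentage of alerts removed: 100 * (before - after) / before."""
    return 100.0 * (n_before - n_after) / n_before


# Hypothetical alerts: three identical scan alerts and two identical brute-force alerts.
scan = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.9", "signature": "PORT_SCAN"}
brute = {"src_ip": "10.0.0.2", "dst_ip": "10.0.0.9", "signature": "SSH_BRUTE_FORCE"}
alerts = [dict(scan), dict(scan), dict(brute), dict(scan), dict(brute)]

kept = remove_duplicates(alerts, ("src_ip", "dst_ip", "signature"))
print(len(kept), reduction_rate(len(alerts), len(kept)))
```

Five alerts collapse to two, giving a 60.0% reduction. Which fields count as the duplicate key is a design choice: including a timestamp, for example, would keep repeated alerts that the key above merges.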


2021 ◽  
pp. 1-18
Author(s):  
Mehdi Shojaie ◽  
Solale Tabarestani ◽  
Mercedes Cabrerizo ◽  
Steven T. DeKosky ◽  
David E. Vaillancourt ◽  
...  

Background: Machine learning is a promising tool for biomarker-based diagnosis of Alzheimer’s disease (AD). Performing multimodal feature selection and studying the interaction between biological and clinical AD can help improve the performance of diagnostic models. Objective: This study aims to formulate a feature ranking metric based on the mutual information index to assess the relevance and redundancy of regional biomarkers and improve AD classification accuracy. Methods: From the Alzheimer’s Disease Neuroimaging Initiative (ADNI), 722 participants with three modalities (florbetapir-PET, flortaucipir-PET, and MRI) were studied. The multivariate mutual information metric was used to capture the redundancy and complementarity of the predictors and to develop a feature ranking approach. The ability of single-modal and multimodal biomarkers to predict cognitive stage was then evaluated. Results: Although amyloid-β deposition is an earlier event in the disease trajectory, tau PET with feature selection yielded a higher early-stage classification F1-score (65.4%) than amyloid-β PET (63.3%) and MRI (63.2%). The SVC multimodal scenario with feature selection improved the F1-score to 70.0% and 71.8% for early- and late-stage classification, respectively. When age and risk factors were included, the scores improved by 2 to 4%. The Amyloid-Tau-Neurodegeneration [AT(N)] framework helped interpret the classification results for different biomarker categories. Conclusion: The results underscore the utility of a novel feature selection approach for reducing the dimensionality of multimodal datasets and enhancing model performance. The AT(N) biomarker framework can help explore misclassified cases by revealing the relationship between neuropathological biomarkers and cognition.
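The study ranks features with a multivariate mutual information metric that weighs relevance against redundancy. The Python sketch below illustrates the same trade-off with a simpler, swapped-in criterion: a greedy mRMR-style ranking that uses pairwise mutual information on discrete features (relevance to the label minus mean mutual information with the features already ranked). It is not the study's metric, and the toy features `a`, `b`, and `c` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Mutual information (in nats) between two discrete sequences, from empirical counts."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr_score(f: str, ranked: List[str], features: Dict[str, Sequence[int]],
               labels: Sequence[int]) -> float:
    """Relevance to the labels minus mean redundancy with the features ranked so far."""
    relevance = mutual_information(features[f], labels)
    if not ranked:
        return relevance
    redundancy = sum(mutual_information(features[f], features[s]) for s in ranked) / len(ranked)
    return relevance - redundancy


def mrmr_rank(features: Dict[str, Sequence[int]], labels: Sequence[int]) -> List[str]:
    """Greedily append the remaining feature with the highest mRMR score."""
    remaining = sorted(features)
    ranked: List[str] = []
    while remaining:
        best = max(remaining, key=lambda f: mrmr_score(f, ranked, features, labels))
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Toy data: the label is (a OR c); b is an exact copy of a, so it adds no new information.
a = [0, 0, 1, 1, 0, 0, 1, 1]
c = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [int(x or z) for x, z in zip(a, c)]
features = {"a": a, "b": list(a), "c": c}
print(mrmr_rank(features, labels))
```

Feature `b` is exactly as relevant to the label as `a`, but once `a` is ranked its redundancy penalty (ln 2, about 0.69 nats) outweighs that relevance, so `b` is always ranked last, after `a` and `c`.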

