FRANKSUM: NEW FEATURE SELECTION METHOD FOR PROTEIN FUNCTION PREDICTION

2005 ◽  
Vol 15 (04) ◽  
pp. 259-275 ◽  
Author(s):  
ALI AL-SHAHIB ◽  
RAINER BREITLING ◽  
DAVID GILBERT

In the study of in silico functional genomics, improving the performance of protein function prediction is the ultimate goal for identifying proteins associated with defined cellular functions. The classical prediction approach is to employ pairwise sequence alignments. However, this method often faces difficulties when no statistically significant homologous sequences are identified. An alternative is to predict protein function from sequence-derived features using machine learning. In this case, the choice of features that can be derived from the sequence is of vital importance to ensure adequate discrimination to predict function. In this paper we have successfully selected biologically significant features for protein function prediction. This was performed using a new feature selection method (FrankSum) that avoids data distribution assumptions, uses a data-independent measurement (p-value) within the feature, identifies redundancy between features and uses an appropriate ranking criterion for feature selection. We have shown that classifiers generated from features selected by FrankSum outperform classifiers generated from full feature sets, randomly selected features and features selected by the Wrapper method. We have also shown that the features are concordant across all species and that top-ranking features are biologically informative. We conclude that feature selection is vital for successful protein function prediction and that FrankSum is one of the feature selection methods that can be applied successfully to such a domain.
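
The abstract does not say how FrankSum computes its p-values, detects redundancy, or ranks features, so the following is not the FrankSum algorithm. It is only a minimal sketch of the general idea of ranking features by a p-value that makes no distributional assumptions, here obtained from a permutation test on the difference in class means. The function names, the choice of test statistic, and the toy data are all hypothetical.

```python
import random


def permutation_p_value(pos, neg, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in class means.

    Assumption: the test statistic (absolute mean difference) is chosen
    for illustration only; the source does not specify one.
    """
    rng = random.Random(seed)
    observed = abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    pooled = list(pos) + list(neg)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:len(pos)], pooled[len(pos):]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def rank_features_by_p_value(features, labels):
    """Return (name, p_value) pairs sorted from most to least significant.

    features: dict mapping a feature name to a list of values
    labels:   list of 0/1 class labels, aligned with the values
    """
    p_values = {}
    for name, values in features.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        p_values[name] = permutation_p_value(pos, neg)
    return sorted(p_values.items(), key=lambda item: item[1])


# Hypothetical toy data: one feature separates the classes, one does not.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
features = {
    "informative": [5.1, 4.9, 5.3, 5.0, 5.2, 1.0, 1.2, 0.9, 1.1, 1.3],
    "noise": [0.3, 0.9, 0.1, 0.7, 0.5, 0.4, 0.8, 0.2, 0.6, 0.5],
}
ranking = rank_features_by_p_value(features, labels)
for name, p in ranking:
    print(f"{name}: p = {p:.4f}")
```

Because a p-value is on the same scale for every feature, features measured in different units can be compared in a single ranking, which is presumably what the abstract means by a data-independent measurement.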

Author(s):  
Nguyen Thi Anh Dao ◽  
Le Trung Thanh ◽  
Viet-Dung Nguyen ◽  
Nguyen Linh-Trung ◽  
Ha Vu Le

Epilepsy is one of the most common and severe brain disorders. Electroencephalography (EEG) is widely used in epilepsy diagnosis and treatment because it allows epileptic spikes to be observed. Tensor decomposition-based feature extraction has been proposed to facilitate automatic detection of EEG epileptic spikes. However, tensor decomposition may still produce a large number of features, many of which contribute negligibly to the expected output performance. We propose a new feature selection method that combines the Fisher score and p-value feature selection methods to rank the features by using the longest common sequences (LCS) to separate epileptic and non-epileptic spikes. The proposed method significantly outperformed several state-of-the-art feature selection methods.
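
For readers unfamiliar with the Fisher score mentioned above, the sketch below computes it in its common textbook form (between-class scatter divided by within-class scatter, per feature). It does not reproduce the authors' combination of Fisher-score and p-value rankings via LCS, which the abstract does not detail. The function name, feature names, and toy values are hypothetical.

```python
from statistics import fmean, pvariance


def fisher_score(values, labels):
    """Fisher score of one feature: between-class over within-class scatter.

    score = sum_k n_k * (mean_k - mean)^2 / sum_k n_k * var_k
    """
    overall_mean = fmean(values)
    between, within = 0.0, 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        between += len(group) * (fmean(group) - overall_mean) ** 2
        within += len(group) * pvariance(group)
    # A feature that is constant within every class has zero within-class
    # scatter; treat it as infinitely discriminative (an assumption here).
    return between / within if within > 0 else float("inf")


# Hypothetical toy data: 1 = epileptic spike, 0 = non-epileptic spike.
spike_labels = [1, 1, 1, 1, 0, 0, 0, 0]
spike_features = {
    "amplitude": [8.0, 7.5, 8.2, 7.9, 2.1, 2.4, 1.9, 2.2],
    "baseline": [0.5, 0.7, 0.6, 0.4, 0.6, 0.5, 0.4, 0.7],
}
scores = {
    name: fisher_score(vals, spike_labels)
    for name, vals in spike_features.items()
}
# Higher score means better class separation.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

A higher score means the class means are far apart relative to the spread inside each class, so sorting in descending order puts the most discriminative features first.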


2021 ◽  
Vol 25 (1) ◽  
pp. 21-34
Author(s):  
Rafael B. Pereira ◽  
Alexandre Plastino ◽  
Bianca Zadrozny ◽  
Luiz H.C. Merschmann

In many important application domains, such as text categorization, biomolecular analysis, scene or video classification and medical diagnosis, instances are naturally associated with more than one class label, giving rise to multi-label classification problems. This has led, in recent years, to a substantial amount of research in multi-label classification. More specifically, feature selection methods have been developed to identify relevant and informative features for multi-label classification. This work presents a new feature selection method based on the lazy feature selection paradigm and specific to the multi-label context. Experimental results show that the proposed technique is competitive with multi-label feature selection techniques currently used in the literature, and is clearly more scalable as the amount of data increases.


2021 ◽  
pp. 535-542
Author(s):  
Zaifei Luo ◽  
Yun Zheng ◽  
Yuliang Ma ◽  
Qingshan She ◽  
Mingxu Sun ◽  
...  
