On some aspects of minimum redundancy maximum relevance feature selection

2019, Vol 63 (1)
Author(s): Peter Bugata, Peter Drotar

2017, Vol 2017, pp. 1-15
Author(s): Fei Yuan, Yu-Hang Zhang, Xiang-Yin Kong, Yu-Dong Cai

Identifying disease genes is an active research topic in biomedicine and genomics, but the complexity of diseases makes it a challenging problem. Inflammatory bowel disease (IBD) is an idiopathic disease caused by a dysregulated immune response to host intestinal microflora, and it has been shown to be associated with the development of intestinal malignancies. Although the specific pathological characteristics and genetic background of IBD have been partially revealed, the disease is still incompletely characterized and the map of its genetic variants remains incomplete. In this study, a novel computational method was built to identify genes related to IBD. Samples from two subtypes of IBD (ulcerative colitis and Crohn’s disease) and normal samples were employed. By analyzing the gene expression profiles of these samples using minimum redundancy maximum relevance and incremental feature selection, 21 genes were obtained that could effectively distinguish samples from the two subtypes of IBD and the normal samples. Then, starting from these 21 genes, the shortest-path approach was used to search for an additional 20 genes in a large network constructed from protein-protein interactions. Analyses of the resulting 41 genes indicate that they are closely associated with the disease.
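The shortest-path step can be illustrated with a minimal sketch. The abstract does not give the details, so everything below is an assumption: the network is treated as an undirected weighted graph stored as a nested dictionary, Dijkstra's algorithm finds one shortest path between each pair of seed genes, and non-seed genes on those paths are counted as candidates. The gene names (`S1`, `X`, and so on), the edge weights, and the function names are hypothetical.

```python
import heapq
from itertools import combinations


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the node list of one shortest path, or None."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    if target not in dist:
        return None
    # Walk the predecessor links back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def candidate_genes(graph, seeds):
    """Count how often each non-seed node lies on a shortest path between two seeds."""
    counts = {}
    for a, b in combinations(sorted(seeds), 2):
        path = shortest_path(graph, a, b)
        if path is None:
            continue
        for node in path[1:-1]:
            if node not in seeds:
                counts[node] = counts.get(node, 0) + 1
    return counts


# Hypothetical toy network: S1..S3 are seed genes, X, Y, Z are other proteins.
edges = [("S1", "X", 1.0), ("X", "S2", 1.0), ("S2", "Y", 1.0),
         ("Y", "S3", 1.0), ("S1", "Z", 5.0), ("Z", "S3", 5.0)]
graph = {}
for u, v, w in edges:
    graph.setdefault(u, {})[v] = w
    graph.setdefault(v, {})[u] = w

candidates = candidate_genes(graph, {"S1", "S2", "S3"})
print(candidates)
```

In this toy network, `Z` is never reported because the detour through it is more expensive than the route through `X` and `Y`. A real analysis would also need a rule for filtering or ranking the candidates, which this sketch leaves out.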


2015, Vol 2015, pp. 1-10
Author(s): Xin Ma, Jing Guo, Xiao Sun

The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, the accuracy of prediction is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features play important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved the best performance (86.62% accuracy and a Matthews correlation coefficient of 0.737). The high prediction accuracy suggests that our method can be a useful approach for identifying RNA-binding proteins from sequence information.
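The mRMR-IFS pipeline can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the authors' implementation: features are discrete, mRMR uses the "relevance minus mean redundancy" criterion with mutual information, and a leave-one-out 1-nearest-neighbour classifier stands in for the random forest so that the example needs only the standard library. The feature names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(columns, labels):
    """Greedy mRMR ranking: relevance to the labels minus mean redundancy."""
    relevance = {f: mutual_information(v, labels) for f, v in columns.items()}
    selected, remaining = [], set(columns)
    while remaining:
        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(
                mutual_information(columns[f], columns[s]) for s in selected
            ) / len(selected)
            return relevance[f] - redundancy
        # sorted() keeps tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def loo_1nn_accuracy(columns, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour (Hamming) classifier."""
    n = len(labels)
    rows = [tuple(columns[f][i] for f in features) for i in range(n)]
    correct = 0
    for i in range(n):
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: sum(a != b for a, b in zip(rows[i], rows[j])),
        )
        correct += labels[nearest] == labels[i]
    return correct / n


def incremental_feature_selection(columns, labels):
    """Score the top-k mRMR features for k = 1..N and keep the best k."""
    ranking = mrmr_rank(columns, labels)
    scores = [loo_1nn_accuracy(columns, labels, ranking[:k])
              for k in range(1, len(ranking) + 1)]
    best_k = scores.index(max(scores)) + 1  # smallest k reaching the top score
    return ranking, ranking[:best_k], scores


# Hypothetical toy data: three classes, two complementary features, one near-duplicate.
labels = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
columns = {
    "f_a":     [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],  # separates class 0 from the rest
    "f_a_dup": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # noisy copy of f_a
    "f_b":     [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],  # separates class 2 from the rest
}
ranking, selected, scores = incremental_feature_selection(columns, labels)
print(ranking, selected, scores)
```

On this toy data the near-duplicate `f_a_dup` is ranked last because its relevance is cancelled by its redundancy with `f_a`, and IFS stops at the two complementary features. Swapping in a different classifier only requires replacing `loo_1nn_accuracy` with another scoring function of the same shape.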

