Analyzing high dimensional correlated data using feature ranking and classifiers

2019 ◽  
Vol 7 (1) ◽  
pp. 98-120
Author(s):  
Abhijeet R Patil ◽  
Jongwha Chang ◽  
Ming-Ying Leung ◽  
Sangjin Kim

Abstract: The Illumina Infinium HumanMethylation27 (Illumina 27K) BeadChip assay is a relatively recent high-throughput technology that allows over 27,000 CpGs to be assayed. Illumina 27K methylation data are less commonly used in bioinformatics than gene expression data, which creates a critical need to find the optimal feature ranking (FR) method for handling such high-dimensional data. The optimal FR method for a given classifier is not well known, and choosing the best-performing FR method becomes more challenging in the high-dimensional setting. Identifying the statistical methods that boost inference is therefore of crucial importance in this context. This paper describes in detail the performance of FR methods such as Fisher score, information gain, chi-square, and minimum redundancy maximum relevance on classification methods such as AdaBoost, Random Forest, Naive Bayes, and Support Vector Machines. Through a simulation study and real data applications, we show that the Fisher score, when applied as the FR method with all of these classifiers, achieved the best prediction accuracy with a significantly small number of ranked features.
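As a rough illustration of the Fisher score ranking mentioned in this abstract, the following minimal sketch uses one common definition (between-class scatter divided by within-class scatter, computed per feature). The function names, the small stabilizing constant, and the toy data are assumptions made for illustration; the paper's exact formulation and data are not reproduced here.

```python
import numpy as np

def fisher_score(X, y):
    """Per-feature Fisher score: between-class scatter / within-class scatter.

    This is one common definition; it is an assumption that it matches
    the variant used in the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]                      # samples of one class
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)          # population variance per feature
    return between / (within + 1e-12)           # small constant avoids division by zero

def rank_features(X, y):
    """Feature indices ordered from highest to lowest Fisher score."""
    return np.argsort(fisher_score(X, y))[::-1]

# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X_toy = np.array([[0.0, 5.0], [0.2, 1.0], [1.0, 4.0], [1.2, 2.0]])
y_toy = np.array([0, 0, 1, 1])
scores = fisher_score(X_toy, y_toy)
ranking = rank_features(X_toy, y_toy)
```

In a workflow like the one described, the top-ranked indices would then be used to subset the data before training each classifier.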

2021 ◽  
Author(s):  
Jermaine Ramdass

A technique is proposed that can be used to predict the cup-to-disc ratio from a single optic fundus image and to determine which image features contribute most to a specific ophthalmologist's measured cup-to-disc ratio. The procedure starts with image pre-processing. The main step of the procedure is feature extraction, in which image features related to pixel intensities are found. These features are used to train three different classifiers: neural networks, support vector machines, and sparse representation classifiers. The classifiers are tested and evaluated to see how accurately they can predict the cup-to-disc ratio. The best results obtained are in the 70-75% success range. Finally, feature ranking is performed with the chi-square and information gain methods on a combined feature vector, using the cup-to-disc ratios measured by each ophthalmologist, to determine the importance and contribution of each feature to that ophthalmologist.


2008 ◽  
Vol 49 ◽  
pp. 107-113 ◽  
Author(s):  
A. Pozdnoukhov ◽  
R.S. Purves ◽  
M. Kanevski

Abstract: Avalanche forecasting is a complex process involving the assimilation of multiple data sources to make predictions over varying spatial and temporal resolutions. Numerically assisted forecasting often uses nearest-neighbour methods (NN), which are known to have limitations when dealing with high-dimensional data. We apply support vector machines (SVMs) to a dataset from Lochaber, Scotland, UK, to assess their applicability in avalanche forecasting. SVMs belong to a family of theoretically based techniques from machine learning and are designed to deal with high-dimensional data. Initial experiments showed that SVMs gave results that were comparable with NN for categorical and probabilistic forecasts. Experiments utilizing the ability of SVMs to deal with high dimensionality in producing a spatial forecast show promise, but require further work.


2011 ◽  
Vol 2011 ◽  
pp. 1-28 ◽  
Author(s):  
Zhongqiang Chen ◽  
Zhanyan Liang ◽  
Yuan Zhang ◽  
Zhongrong Chen

Grayware encyclopedias collect known species to provide information for incident analysis; however, their lack of categorization and generalization capability renders them ineffective in the development of defense strategies against clustered strains. A grayware categorization framework is therefore proposed here, not only to classify grayware according to diverse taxonomic features but also to facilitate evaluation of the risk grayware poses to cyberspace. Armed with Support Vector Machines, the framework builds learning models based on training data extracted automatically from grayware encyclopedias and visualizes categorization results with Self-Organizing Maps. The features used in the learning models are selected with information gain, and the high dimensionality of the feature space is reduced by word stemming and stopword removal. The grayware categorizations on diversified features reveal that grayware typically attempts to improve its penetration rate by resorting to multiple installation mechanisms and reduced code footprints. The framework also shows that grayware evades detection by attacking victims' security applications and resists removal by enhancing its clotting capability with infected hosts. Our analysis further points out that species in the categories Spyware and Adware continue to dominate the grayware landscape and pose extremely critical threats to the Internet ecosystem.
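The information gain criterion mentioned above can be sketched as the reduction in class entropy obtained by conditioning on a feature. The example below is a minimal illustration for discrete features such as term presence; the function names and the four-document toy corpus are hypothetical and are not taken from the paper.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for a discrete feature."""
    feature = np.asarray(feature)
    labels = np.asarray(labels)
    conditional = 0.0
    for value in np.unique(feature):
        mask = feature == value
        # Weight each subset's entropy by the fraction of samples it holds.
        conditional += mask.mean() * entropy(labels[mask])
    return entropy(labels) - conditional

# Hypothetical toy corpus: 1 means the term occurs in the document.
labels = np.array(["spyware", "spyware", "adware", "adware"])
term_a = np.array([1, 1, 0, 0])   # occurs only in spyware documents
term_b = np.array([1, 0, 1, 0])   # occurs equally often in both classes

ig_a = information_gain(term_a, labels)
ig_b = information_gain(term_b, labels)
```

Terms would then be kept in decreasing order of information gain, so a term like `term_a` is retained and a term like `term_b` is discarded.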


2013 ◽  
Vol 2013 ◽  
pp. 1-6 ◽  
Author(s):  
Ersen Yılmaz

A two-stage expert system is proposed for cardiac arrhythmia diagnosis. In the first stage, the Fisher score is used for feature selection to reduce the dimension of the feature space of a data set. The second stage is a classification stage, in which a least squares support vector machine classifier is trained on the feature subset selected in the first stage to diagnose cardiac arrhythmia. The performance of the proposed expert system is evaluated on an arrhythmia data set taken from the UCI Machine Learning Repository.
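The two stages described above can be sketched as follows. This is only an illustration under stated assumptions: it uses a linear kernel, the standard least squares SVM linear system for binary labels in {-1, +1}, a hypothetical regularization value `gamma=10.0`, and synthetic data rather than the UCI arrhythmia set. The paper's kernel, parameters, and multi-class handling are not known from the abstract.

```python
import numpy as np

def fisher_score(X, y):
    """Stage 1 criterion: between-class over within-class scatter per feature."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)

def lssvm_fit(X, y, gamma=10.0):
    """Stage 2: solve the LS-SVM linear system with a linear kernel.

    [[0, y^T], [y, Omega + I/gamma]] [b, alpha]^T = [0, 1]^T,
    where Omega[i, j] = y_i * y_j * <x_i, x_j>.
    """
    n = X.shape[0]
    omega = np.outer(y, y) * (X @ X.T)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]            # bias b, multipliers alpha

def lssvm_predict(X_train, y_train, b, alpha, X_new):
    """Sign of the decision function sum_i alpha_i * y_i * <x, x_i> + b."""
    return np.sign((X_new @ X_train.T) @ (alpha * y_train) + b)

# Synthetic data (an assumption): only feature 3 carries class information.
rng = np.random.default_rng(0)
y_demo = np.repeat([-1.0, 1.0], 20)
X_demo = rng.normal(size=(40, 10))
X_demo[:, 3] += 4.0 * y_demo

# Stage 1: keep the two highest-scoring features.
selected = np.argsort(fisher_score(X_demo, y_demo))[::-1][:2]
# Stage 2: train and evaluate on the reduced feature set (training data only).
b, alpha = lssvm_fit(X_demo[:, selected], y_demo)
pred = lssvm_predict(X_demo[:, selected], y_demo, b, alpha, X_demo[:, selected])
train_accuracy = float((pred == y_demo).mean())
```

The accuracy here is measured on the training data only; a real evaluation would use held-out data or cross-validation.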

