Booster in High Dimensional Data Classification using CNN and Decision Tree Algorithm

Classification problems involving high-dimensional data with a small number of observations are becoming more common, especially for microarray data. Prediction accuracy is essential when handling sensitive data, particularly in the medical field, and the stability of the selected features must also be evaluated. This paper therefore proposes a new evaluation measure that incorporates both the stability of the selected feature subsets and the accuracy of the prediction. A Booster applied to the feature selection algorithm helps to achieve this. The proposed work handles both structured and unstructured data, using convolutional neural network based multimodal disease prediction and a decision tree algorithm, respectively. The algorithm is tested on a heart disease dataset retrieved from the UCI repository, and the analysis shows improved prediction accuracy.
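
The abstract does not define its stability measure, so the following is only a minimal sketch of one common way to quantify the stability of selected feature subsets: repeat the selection on bootstrap resamples and average the pairwise Jaccard similarity of the resulting subsets. The function names, the mean-difference ranking filter, and the choice of Jaccard similarity are assumptions made for illustration, not the paper's method.

```python
import random
from itertools import combinations


def select_top_k(rows, labels, k):
    """Hypothetical filter: rank features by |mean(class 1) - mean(class 0)|."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        if not pos or not neg:
            # A resample may contain a single class; no feature is informative then.
            scores.append(0.0)
            continue
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return frozenset(ranked[:k])


def jaccard(a, b):
    """Jaccard similarity of two non-empty feature subsets."""
    return len(a & b) / len(a | b)


def selection_stability(rows, labels, k, n_resamples=20, seed=0):
    """Average pairwise Jaccard similarity of subsets chosen on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(rows)
    subsets = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        subsets.append(
            select_top_k([rows[i] for i in idx], [labels[i] for i in idx], k)
        )
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A value near 1 means the same features are selected regardless of which observations are sampled; a value near 0 means the selection depends heavily on the sample. A combined measure could then weigh this score against prediction accuracy.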

Author(s): Giuseppe Nuti, Lluís Antoni Jiménez Rugama, Andreea-Ingrid Cross

Bayesian Decision Trees provide a probabilistic framework that reduces the instability of Decision Trees while maintaining their explainability. While Markov Chain Monte Carlo methods are typically used to construct Bayesian Decision Trees, here we provide a deterministic Bayesian Decision Tree algorithm that eliminates the sampling and does not require a pruning step. This algorithm generates the greedy-modal tree (GMT), which is applicable to both regression and classification problems. We tested the algorithm on various benchmark classification data sets and obtained accuracies similar to those of other known techniques. Furthermore, we show that we can statistically analyze how the GMT was derived from the data, and we demonstrate this analysis with a financial example. Notably, the GMT allows for a technique that provides explainable, simpler models, which is often a prerequisite for applications in finance or the medical industry.
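
To illustrate how a deterministic Bayesian split decision can avoid both sampling and pruning, here is a minimal sketch for a single feature with binary labels. It scores each candidate threshold by the Beta-Binomial marginal likelihood of the two resulting leaves and keeps the node unsplit when no threshold beats the unsplit evidence. The Beta(1, 1) prior, the function names, and the omission of a prior over partitions are assumptions of this sketch; it is not the authors' GMT algorithm.

```python
from math import lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal(n_pos, n_neg, alpha=1.0, beta=1.0):
    """Log marginal likelihood of binary labels under a Beta(alpha, beta) prior."""
    return log_beta(alpha + n_pos, beta + n_neg) - log_beta(alpha, beta)


def best_split(xs, ys):
    """Return (threshold, log_evidence); threshold is None if not splitting wins.

    xs are feature values, ys are labels in {0, 1}.
    """
    n_pos = sum(ys)
    n_neg = len(ys) - n_pos
    # Baseline: evidence for treating the whole node as one leaf.
    best = (None, log_marginal(n_pos, n_neg))
    pairs = sorted(zip(xs, ys))
    left_pos = left_neg = 0
    for i in range(len(pairs) - 1):
        x, y = pairs[i]
        left_pos += y
        left_neg += 1 - y
        if pairs[i + 1][0] == x:
            continue  # cannot place a threshold between equal values
        score = log_marginal(left_pos, left_neg) + log_marginal(
            n_pos - left_pos, n_neg - left_neg
        )
        if score > best[1]:
            best = ((x + pairs[i + 1][0]) / 2, score)
    return best
```

Applying `best_split` recursively to each side until it returns `None` would grow a tree greedily, with the stopping rule built into the evidence comparison rather than added as a separate pruning pass.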


Phishing, the crime of using technical means to steal the sensitive data of Internet users, is now a serious threat facing the Internet, and losses due to phishing are growing steadily. Detecting phishing scams is a very challenging problem because phishing is predominantly a semantics-based attack that exploits human vulnerabilities rather than network or system vulnerabilities. Phishing costs. For software-based detection, two main approaches are generally used: blacklists/whitelists and machine learning. Each phishing technique has different parameters and a different type of attack. Using a decision tree algorithm, we determine whether a suspected case is legitimate or a scam. We do this by grouping cases according to diverse parameters and features, thereby helping the machine learning algorithm to learn.
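
The abstract does not list the features it uses, so the sketch below only illustrates the core step of a decision tree on this kind of task: choosing the boolean feature whose split gives the lowest weighted Gini impurity. The feature names (`has_ip_in_url`, `long_url`) are hypothetical, and Gini impurity is an assumed splitting criterion.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (1 = phishing, 0 = legitimate)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_feature(samples, labels):
    """Pick the boolean feature whose split minimizes weighted Gini impurity.

    samples is a list of dicts mapping feature name -> bool.
    Returns (feature_name, impurity); feature_name is None if no split helps.
    """
    n = len(labels)
    best_name, best_impurity = None, gini(labels)
    for name in samples[0]:
        yes = [y for s, y in zip(samples, labels) if s[name]]
        no = [y for s, y in zip(samples, labels) if not s[name]]
        weighted = (len(yes) * gini(yes) + len(no) * gini(no)) / n
        if weighted < best_impurity:
            best_name, best_impurity = name, weighted
    return best_name, best_impurity
```

A full tree would apply `best_feature` recursively to the "yes" and "no" groups until the groups are pure or no feature reduces impurity.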

