Comparison of data mining algorithms in remote sensing using Lidar data fusion and feature selection

Author(s):  
Papia Rozario ◽  
Rahul Gomes
Author(s):  
Barak Chizi ◽  
Lior Rokach ◽  
Oded Maimon

Dimensionality (i.e., the number of data set attributes or groups of attributes) constitutes a serious obstacle to the efficiency of most data mining algorithms (Maimon and Last, 2000). The main reason for this is that data mining algorithms are computationally intensive. This obstacle is sometimes known as the “curse of dimensionality” (Bellman, 1961). The objective of feature selection is to identify the important features in the data set and to discard all others as irrelevant or redundant. Since feature selection reduces the dimensionality of the data, data mining algorithms can operate faster and more effectively. In some cases the performance of the data mining method is also improved, mainly because the target concept gains a more compact, easily interpreted representation. There are three main approaches to feature selection: wrapper, filter and embedded. The wrapper approach (Kohavi, 1995; Kohavi and John, 1996) uses an inducer as a black box along with a statistical re-sampling technique, such as cross-validation, to select the best feature subset according to some predictive measure. The filter approach (Kohavi, 1995; Kohavi and John, 1996) operates independently of the data mining method employed subsequently: undesirable features are filtered out of the data before learning begins. These algorithms use heuristics based on general characteristics of the data to evaluate the merit of feature subsets. A sub-category of filter methods, which we will refer to as rankers, comprises methods that employ some criterion to score each feature and provide a ranking; from this ordering, several feature subsets can be chosen by manually setting a cut-off point. The embedded approach (see, for instance, Guyon and Elisseeff, 2003) is similar to the wrapper approach in that the features are selected for a specific inducer, but it selects them during the learning process itself.
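As an illustration of the ranker sub-category, the sketch below scores each feature with a simple univariate criterion and orders the features accordingly; a subset is then obtained by cutting the ranking off at some point. The criterion used here (absolute Pearson correlation with the target) and the toy data are assumptions chosen only to make the idea concrete, not a criterion prescribed by the text.

```python
import numpy as np

def rank_features(X, y):
    """Score each feature by the absolute Pearson correlation with the
    target, then return feature indices ordered from most to least
    relevant, together with the scores."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    order = np.argsort(scores)[::-1]
    return order, scores

# Toy data: feature 0 tracks the target closely, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = rng.normal(size=100)
X = np.column_stack([y + 0.1 * rng.normal(size=100),
                     rng.normal(size=100)])

order, scores = rank_features(X, y)
print(order)  # feature 0 should rank first
```

A cut-off is then applied manually, e.g. keeping only `order[:k]` for some chosen `k`, which is exactly the thresholding step the ranker description refers to.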


2019 ◽  
Vol 8 (2) ◽  
pp. 2623-2630 ◽  

Anemia is a global hematological disorder that occurs in pregnancy. Data mining techniques make it possible to select features expressing previously unknown logical knowledge from large datasets. This paper evaluates anemia feature classes (Non-anemic, Mild, and Severe or Moderate) in a real-time, large-dimensional dataset. In previous work, anemia was classified by a selection of approaches based on Artificial Neural Networks (ANN), Gausnominal classification, and VectNeighbour classification. These earlier studies attain proper feature selection with good classification accuracy, but the feature selection takes a long time. To overcome this computational cost, the current paper presents an improved Median Vector Feature Selection (IMVFS) algorithm and a new RandomPrediction (RP) classification algorithm to predict the anemia disease classes (Mild, Non-anemic, and Severe or Moderate) based on data mining algorithms. The results show that the performance of the novel method is effective compared with the previous ANN, Gausnominal, and VectNeighbour classification algorithms. The experimental results show that the proposed RandomPrediction (RP) classification with IMVFS feature selection clearly outperforms the previous methods.


Author(s):  
Tinuke O. Oladele ◽  
Roseline Oluwaseun Ogundokun ◽  
Aderonke Anthonia Kayode ◽  
Adekanmi Adeyinka Adegun ◽  
Marion Oluwabunmi Adebiyi

Author(s):  
Kyriacos Chrysostomou

It is well known that the performance of most data mining algorithms can be degraded by features that do not add any value to learning tasks. Feature selection can be used to limit the effects of such features by seeking only the relevant subset of the original features (de Souza et al., 2006). This subset of relevant features is discovered by removing those that are considered irrelevant or redundant. By reducing the number of features in this way, the time taken to perform classification is significantly reduced; the reduced dataset is easier to handle, as fewer training instances are needed (because fewer features are present), subsequently resulting in simpler classifiers which are often more accurate. Due to these benefits, feature selection has been widely applied to reduce the number of features in many data mining applications where data have hundreds or even thousands of features. A large number of approaches exist for performing feature selection, including filters (Kira & Rendell, 1992), wrappers (Kohavi & John, 1997), and embedded methods (Quinlan, 1993). Among these, the wrapper appears to be the most widely used approach. Wrappers have proven popular in many research areas, including bioinformatics (Ni & Liu, 2004), image classification (Puig & Garcia, 2006) and web page classification (Piramuthu, 2003). One of the reasons for the popularity of wrappers is that they make use of a classifier to help in the selection of the most relevant feature subset (John et al., 1994). The remaining methods, especially filters, instead evaluate the merit of a feature subset based on characteristics of the data and statistical measures, e.g., chi-square, rather than on the classifier intended for use (Huang et al., 2007). Discarding the classifier when performing feature selection can subsequently result in poor classification performance, because the relevant feature subset will not reflect the classifier’s specific characteristics. The resulting subset may therefore not contain the features most relevant to the classifier and the learning task. The wrapper is thus superior to other feature selection methods like filters, since it finds feature subsets that are better suited to the data mining problem.
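A minimal sketch of the wrapper idea is greedy forward selection: the classifier is treated as a black box, and at each step the feature whose inclusion most improves cross-validated accuracy is added. The nearest-centroid classifier and the synthetic data below are assumptions made purely for illustration; any inducer could be plugged in.

```python
import numpy as np

def cv_accuracy(X, y, folds=5):
    """Cross-validated accuracy of a tiny nearest-centroid classifier,
    used here as the 'black box' inducer the wrapper consults."""
    n = len(y)
    idx = np.arange(n)
    correct = 0
    for f in range(folds):
        test = idx[f::folds]
        train = np.setdiff1d(idx, test)
        c0 = X[train][y[train] == 0].mean(axis=0)  # class-0 centroid
        c1 = X[train][y[train] == 1].mean(axis=0)  # class-1 centroid
        d0 = np.linalg.norm(X[test] - c0, axis=1)
        d1 = np.linalg.norm(X[test] - c1, axis=1)
        pred = (d1 < d0).astype(int)               # nearer centroid wins
        correct += (pred == y[test]).sum()
    return correct / n

def wrapper_forward_select(X, y, k):
    """Greedy forward selection: repeatedly add the feature whose
    inclusion most improves cross-validated accuracy."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining,
                   key=lambda j: cv_accuracy(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: only feature 2 separates the two classes.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 4))
X[:, 2] += 3 * y

sel = wrapper_forward_select(X, y, 2)
print(sel)  # feature 2 should be chosen first
```

Because every candidate subset is evaluated by retraining the classifier, the selected subset reflects that classifier's specific characteristics, which is precisely the advantage (and the computational cost) of the wrapper over filters.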


2019 ◽  
Vol 14 (1) ◽  
pp. 21-26 ◽  
Author(s):  
Viswam Subeesh ◽  
Eswaran Maheswari ◽  
Hemendra Singh ◽  
Thomas Elsa Beulah ◽  
Ann Mary Swaroop

Background: A signal is defined as “reported information on a possible causal relationship between an adverse event and a drug, of which the relationship is unknown or incompletely documented previously”. Objective: To detect novel adverse events of iloperidone by disproportionality analysis in the FDA Adverse Event Reporting System (FAERS) database using Data Mining Algorithms (DMAs). Methodology: The US FAERS database contains 1028 iloperidone-associated Drug Event Combinations (DECs) reported from 2010 Q1 to 2016 Q3. DECs were considered for disproportionality analysis only if a minimum of ten reports were present in the database for the given adverse event and the event had not been detected earlier (in clinical trials). Two data mining algorithms, the Reporting Odds Ratio (ROR) and the Information Component (IC), were applied retrospectively over the aforementioned period. Values of ROR-1.96SE > 1 and IC-2SD > 0 were considered the thresholds for a positive signal. Results: The mean age of patients with iloperidone-associated events was 44 years [95% CI: 36-51]; however, age was not mentioned in twenty-one reports. The data mining algorithms exhibited positive signals for akathisia (ROR-1.96SE = 43.15, IC-2SD = 2.99), dyskinesia (21.24, 3.06), peripheral oedema (6.67, 1.08), priapism (425.7, 9.09) and sexual dysfunction (26.6, 1.5), as these were well above the pre-set thresholds. Conclusion: Five potential signals associated with iloperidone were generated by data mining in the FDA AERS database. The result requires integration with further clinical surveillance for the quantification and validation of the possible risks of the adverse events reported for iloperidone.
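The two disproportionality measures can be sketched from a standard 2x2 contingency table of report counts. The counts below are hypothetical, not taken from the study, and this sketch uses the common frequentist formulas: the ROR signal criterion is expressed as the lower bound of the 95% confidence interval exceeding 1, and only a point estimate of the IC is computed (the 2SD credibility bound used in the paper comes from a Bayesian model and is omitted here).

```python
import math

def ror_signal(a, b, c, d):
    """Reporting Odds Ratio from a 2x2 contingency table:
         a = reports with the drug AND the event
         b = reports with the drug, other events
         c = reports with other drugs AND the event
         d = reports with other drugs, other events
    Returns (ROR, lower bound of the 95% CI of the ROR); a signal is
    flagged when the lower bound exceeds 1."""
    ror = (a * d) / (b * c)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)  # SE of ln(ROR)
    lower95 = math.exp(math.log(ror) - 1.96 * se)
    return ror, lower95

def information_component(a, b, c, d):
    """Point estimate of the Information Component:
    IC = log2(observed / expected), where the expected count assumes
    the drug and the event are reported independently."""
    n = a + b + c + d
    expected = (a + b) * (a + c) / n
    return math.log2(a / expected)

# Hypothetical counts for one drug-event combination.
ror, lower95 = ror_signal(20, 1008, 500, 100000)
ic = information_component(20, 1008, 500, 100000)
print(ror, lower95, ic)
```

With these hypothetical counts the lower CI bound stays above 1 and the IC is positive, so the combination would be flagged for further clinical review, mirroring the screening logic described in the abstract.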

