A Cardiotocographic Classification Using Feature Selection: A Comparative Study

Author(s):  
Septian Eko Prasetyo ◽  
Pulung Hendro Prastyo ◽  
Shindy Arti

Cardiotocography is a series of inspections used to determine the health of the fetus during pregnancy. The inspection records the baby's heart rate to establish whether the fetus is healthy or not, and uterine contractions are also used to assess its condition. Fetal health is classified into three conditions: normal, suspect, and pathological. This paper compares classification algorithms for diagnosing the results of a cardiotocographic inspection. Experiments were run both with and without feature selection, where CFS Subset Evaluation, Information Gain, and Chi-Square were used to select the features most strongly correlated with the class. The data set was obtained from the freely available UCI Machine Learning Repository. To measure the performance of the classification algorithms, this study uses the evaluation metrics Precision, Recall, F-Measure, MCC, ROC area, PRC area, and Accuracy. The results show that all algorithms provide fairly good classification; however, the combination of the Random Forest algorithm and Information Gain feature selection gives the best results, with an accuracy of 93.74%.
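As a rough illustration of the winning pipeline, the sketch below pairs information-gain-style feature selection (scikit-learn's mutual information estimator) with a Random Forest. The file name ctg.csv, the NSP label column, and k=10 are assumptions made for the sketch, not details taken from the paper.

```python
# Minimal sketch: information gain (mutual information) filter + Random Forest,
# assuming the UCI CTG data has been exported to "ctg.csv" (hypothetical path)
# with numeric feature columns and an "NSP" class column (1/2/3).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

df = pd.read_csv("ctg.csv")
X, y = df.drop(columns=["NSP"]), df["NSP"]

model = make_pipeline(
    SelectKBest(mutual_info_classif, k=10),  # keep the 10 most informative features
    RandomForestClassifier(n_estimators=100, random_state=0),
)
print(cross_val_score(model, X, y, cv=10, scoring="accuracy").mean())
```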

This paper analyzes feature selection processes based on different statistical methods, viz. correlation, gain ratio, information gain, OneR, Chi-square, and a Fisher's exact test MapReduce model, for agricultural data. In the recent past, Fisher's exact test was commonly used for feature selection, but it supports only small data sets. To handle large data sets, the Chi-square test, one of the most popular statistical methods, is used instead; however, it also retains irrelevant attributes, so the resulting accuracy is not as expected. As a novelty, Fisher's exact test is combined with the MapReduce model to handle large data sets. The simulation outcome shows that the proposed Fisher's exact test finds the significant attributes more accurately and with reduced time complexity when compared to other existing methods.
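As a minimal sketch of the idea (not the paper's implementation), the code below runs Fisher's exact test per feature and distributes the independent tests map-style across a process pool, which stands in for the MapReduce cluster; binary features, a binary class, and the alpha=0.05 cutoff are assumptions for the sketch.

```python
# Sketch: per-feature Fisher's exact test as a relevance filter, with the
# map step (one 2x2 test per feature) spread over worker processes.
from multiprocessing import Pool

import numpy as np
from scipy.stats import fisher_exact

def feature_p_value(args):
    """Map step: build the 2x2 contingency table of one binary feature
    against the binary class and return the exact-test p-value."""
    x, y = args
    table = [[np.sum((x == 1) & (y == 1)), np.sum((x == 1) & (y == 0))],
             [np.sum((x == 0) & (y == 1)), np.sum((x == 0) & (y == 0))]]
    return fisher_exact(table)[1]

def select_features(X, y, alpha=0.05):
    with Pool() as pool:  # local stand-in for the MapReduce workers
        p_values = pool.map(feature_p_value, [(X[:, j], y) for j in range(X.shape[1])])
    # Reduce step: keep the features whose association is significant.
    return [j for j, p in enumerate(p_values) if p < alpha]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(500, 20))
    y = X[:, 3]  # feature 3 is relevant by construction
    print(select_features(X, y))  # expected to contain 3
```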


Text classification is a branch of text mining through which the sentiment of movie data can be analyzed. In this research paper, we applied different preprocessing techniques to reduce the number of features in the Cornell movie data set. We also applied correlation-based feature subset selection and the chi-square feature selection technique to gather the most valuable words of each category in the text mining process, forming a new Cornell movie data set after the preprocessing steps and feature selection techniques. We classified the Cornell movie reviews as positive or negative using several classifiers: Support Vector Machine (SVM), Multilayer Perceptron (MLP), Naive Bayes (NB), Bayes Net (BN), and Random Forest (RF). We also compared classification accuracy across the classifiers and achieved the best accuracy, 87%, with the SVM classifier and a reduced number of features. The suggested classifier can be useful for movie review opinion mining and for the analysis of blogs, documents, etc.
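A minimal sketch of a comparable pipeline is shown below, using NLTK's movie_reviews corpus (which was derived from the Cornell polarity data) together with chi-square feature selection and a linear SVM; the TF-IDF weighting and k=2000 are choices made for the sketch, not taken from the paper.

```python
# Sketch: chi-square word selection feeding a linear SVM for positive/negative
# movie review classification. Requires nltk.download("movie_reviews") once.
from nltk.corpus import movie_reviews
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = [movie_reviews.raw(f) for f in movie_reviews.fileids()]
labels = [movie_reviews.categories(f)[0] for f in movie_reviews.fileids()]

model = make_pipeline(
    TfidfVectorizer(stop_words="english"),  # tokenization + weighting
    SelectKBest(chi2, k=2000),              # keep the 2000 highest-scoring words
    LinearSVC(),
)
print(cross_val_score(model, docs, labels, cv=5).mean())
```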


Author(s):  
Esraa H. Abd Al-Ameer ◽  
Ahmed H. Aliwy

Document classification is one of the most important fields in natural language processing and text mining, and many algorithms can be used for the task. This paper focuses on improving text classification through feature selection, i.e., keeping only some of the original features without affecting the accuracy of the work. We suggest a new feature selection method that can be seen as a general formulation and mathematical model of Recursive Feature Elimination (RFE). The method was compared with two other well-known feature selection methods, Chi-square and threshold, and proved comparable with them: the best results were 83% accuracy when 60% of the features were used, 82% when 40% were used, and 82% when 20% were used. The tests were done with the Naïve Bayes (NB) and decision tree (DT) classification algorithms on the well-known English "20 newsgroups" data set, which consists of approximately 18,846 files.
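For orientation, the sketch below runs scikit-learn's standard RFE (the baseline the proposed method generalizes) on 20 newsgroups, keeping roughly 20% of a capped vocabulary; the linear SVM that supplies the elimination weights and the vocabulary cap are assumptions made for the sketch.

```python
# Sketch: standard RFE on 20 newsgroups, then a Naive Bayes classifier on
# the surviving features. The vocabulary is capped so RFE stays tractable.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import RFE
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")

vec = TfidfVectorizer(max_features=5000)
Xtr, Xte = vec.fit_transform(train.data), vec.transform(test.data)

rfe = RFE(LinearSVC(), n_features_to_select=1000, step=0.2)  # keep 20%
Xtr_sel = rfe.fit_transform(Xtr, train.target)

clf = MultinomialNB().fit(Xtr_sel, train.target)
print(clf.score(rfe.transform(Xte), test.target))
```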


Author(s):  
Manpreet Kaur ◽  
Chamkaur Singh

Educational Data Mining (EDM) is an emerging research area that helps educational institutions improve the performance of their students. Feature Selection (FS) algorithms remove irrelevant data from the educational dataset and hence increase the performance of the classifiers used in EDM techniques. This paper presents an analysis of the performance of feature selection algorithms on a student data set, along with several open problems defined in the problem formulation, all of which are to be resolved in future work. Furthermore, the paper is an attempt to play a positive role in the improvement of education quality, as well as to guide new researchers in making academic interventions.


2018 ◽  
Vol 7 (1) ◽  
pp. 57-72
Author(s):  
H.P. Vinutha ◽  
Poornima Basavaraju

Day by day, network security is becoming a more challenging task. Intrusion detection systems (IDSs) are one of the methods used to monitor network activity, and data mining algorithms play a major role in the field of IDS. The NSL-KDD'99 dataset is used to study network traffic patterns, which helps us identify possible attacks taking place on the network. The dataset contains 41 attributes and one class attribute categorized as normal, DoS, Probe, R2L, and U2R. Since using all 41 attributes in the detection stage is not good practice, the proposed methodology reduces the false positive rate and improves the detection rate by reducing the dimensionality of the dataset. Four feature selection methods, Chi-Square, Symmetrical Uncertainty (SU), Gain Ratio, and Information Gain, are used to evaluate the attributes, and unimportant features are removed. Ensemble classification techniques, Boosting, Bagging, Stacking, and Voting, are then used to observe the detection rate separately with three base algorithms: Decision Stump, J48, and Random Forest.
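As a minimal sketch of one branch of this setup, the code below applies an information-gain filter and then bags a decision tree (a stand-in for J48); the file name nsl_kdd.csv, the label column name, and k=15 are assumptions about a local export of the dataset.

```python
# Sketch: information gain filter + Bagging over a decision tree on an
# NSL-KDD export. One-hot encoding covers the three nominal attributes.
import pandas as pd
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("nsl_kdd.csv")
df = pd.get_dummies(df, columns=["protocol_type", "service", "flag"])
X, y = df.drop(columns=["label"]), df["label"]

model = make_pipeline(
    SelectKBest(mutual_info_classif, k=15),  # drop low-information attributes
    BaggingClassifier(DecisionTreeClassifier(), n_estimators=25),
)
print(cross_val_score(model, X, y, cv=5, scoring="accuracy").mean())
```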


2021 ◽  
pp. 1063293X2110160
Author(s):  
Dinesh Morkonda Gunasekaran ◽  
Prabha Dhandayudam

Nowadays, women are commonly diagnosed with breast cancer, and feature selection plays an important role when constructing a classification framework. We propose a Multi Filter Union (MFU) feature selection method for breast cancer data sets, in which a union model based on the Random Forest algorithm and the Logistic Regression (LR) algorithm selects the important features in the dataset. The performance of the data analysis is evaluated using the optimal feature subset selected from the dataset. The experiments are run first on the Wisconsin Diagnostic Breast Cancer data set and then on a real data set from a women's health care center. The results show that the proposed approach is efficient and performs well compared with existing feature selection algorithms.
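A minimal sketch of a union-of-filters selector in the spirit of MFU is given below, using the scikit-learn copy of the Wisconsin data; ranking by Random Forest importances and by the magnitude of Logistic Regression coefficients, and the cutoff k=10, are assumptions for the sketch rather than the paper's exact procedure.

```python
# Sketch: take the top-k features by Random Forest importance and by
# |Logistic Regression coefficient|, then keep the union of the two sets.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
Xs = StandardScaler().fit_transform(X)  # LR coefficients need comparable scales

rf = RandomForestClassifier(random_state=0).fit(X, y)
lr = LogisticRegression(max_iter=5000).fit(Xs, y)

k = 10
rf_top = set(np.argsort(rf.feature_importances_)[-k:])
lr_top = set(np.argsort(np.abs(lr.coef_[0]))[-k:])
print(sorted(rf_top | lr_top))  # indices of the selected feature union
```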


Author(s):  
Shwet Ketu ◽  
Pramod Kumar Mishra

Abstract In the last decade, we have seen drastic changes in the air pollution level, which has become a critical environmental issue that must be handled carefully when devising solutions for proficient healthcare. Reducing the impact of air pollution on human health is possible only if the data is correctly classified, and in numerous classification problems we face the class imbalance issue. Learning from imbalanced data is always a challenging task for researchers, and possible solutions have been developed from time to time. In this paper, we focus on dealing with imbalanced class distributions in a way that does not compromise the performance of the classification algorithm. The proposed algorithm is based on the adjusting kernel scaling (AKS) method for multi-class imbalanced datasets, and the selection of the kernel function is evaluated with the help of weighting criteria and the chi-square test. All experimental evaluation was performed on the sensor-based Indian Central Pollution Control Board (CPCB) dataset. The proposed algorithm, with the highest accuracy of 99.66%, wins the race among all the classification algorithms, i.e. AdaBoost (59.72%), Multi-Layer Perceptron (95.71%), Gaussian NB (80.87%), and SVM (96.92%). Its results are also better than existing methods from the literature, making clear that the proposed algorithm handles class imbalance problems efficiently while delivering enhanced performance. Thus, accurate classification of air quality through the proposed algorithm will be useful for improving existing preventive policies and will also help enhance the capability for an effective emergency response in the worst pollution situations.
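The AKS method itself is not reproduced here; as a point of reference only, the sketch below shows the standard class-weighted RBF SVM baseline for multi-class imbalance, on synthetic data standing in for the CPCB sensor readings.

```python
# Sketch (baseline, NOT the paper's AKS method): an RBF SVM with class
# weights set inversely proportional to class frequency, so that the
# minority air-quality classes are penalized more heavily when missed.
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic imbalanced stand-in: 80% / 15% / 5% class split.
X, y = make_classification(n_samples=3000, n_classes=3, n_informative=6,
                           weights=[0.80, 0.15, 0.05], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

clf = SVC(kernel="rbf", class_weight="balanced").fit(Xtr, ytr)
print(classification_report(yte, clf.predict(Xte)))
```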


2007 ◽  
Vol 56 (6) ◽  
pp. 75-83 ◽  
Author(s):  
X. Flores ◽  
J. Comas ◽  
I.R. Roda ◽  
L. Jiménez ◽  
K.V. Gernaey

The main objective of this paper is to present the application of selected multivariable statistical techniques to the analysis of plant-wide wastewater treatment plant (WWTP) control strategies. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant analysis (DA) are applied to the evaluation matrix data set obtained by simulating several control strategies on the plant-wide IWA Benchmark Simulation Model No. 2 (BSM2). These techniques make it possible (i) to determine natural groups or clusters of control strategies with similar behaviour, (ii) to find and interpret hidden, complex and causal relationships in the data set, and (iii) to identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both analysis and interpretation of complex multicriteria data sets, and allows an improved use of the information for effective evaluation of control strategies.
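A minimal sketch of the clustering and projection steps is shown below on a random stand-in for the evaluation matrix (rows = control strategies, columns = evaluation criteria); the BSM2 simulation output would replace it, and KMeans is used here only as a simple proxy for the paper's cluster analysis.

```python
# Sketch: standardize the evaluation matrix, cluster the control strategies,
# and project them onto the first two principal components for inspection.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
evaluation_matrix = rng.normal(size=(30, 12))  # 30 strategies x 12 criteria

Z = StandardScaler().fit_transform(evaluation_matrix)
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z)
scores = PCA(n_components=2).fit_transform(Z)

for strategy, (g, (pc1, pc2)) in enumerate(zip(groups, scores)):
    print(f"strategy {strategy:2d}  cluster {g}  PC1={pc1:+.2f}  PC2={pc2:+.2f}")
```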


2021 ◽  
Author(s):  
Andrea C. Hupman

Classification algorithms predict the class membership of an unknown record. Methods such as logistic regression or the naïve Bayes algorithm produce a score related to the likelihood that a record belongs to a particular class, and a cutoff threshold is then defined to delineate the prediction of one class over another. This paper derives analytic results for the selection of an optimal cutoff threshold for a classification algorithm that is used to inform a two-action decision, in the cases of risk aversion and risk neutrality. The results provide insight into how the optimal cutoff thresholds relate to the associated costs and to the sensitivity and specificity of the algorithm for both risk neutral and risk averse decision makers. The optimal risk averse threshold is not reliably above or below the optimal risk neutral threshold; the relation depends on the parameters of the particular application. The results further show that the risk averse optimal threshold is insensitive to the size of the data set or the magnitude of the costs, but is instead sensitive to the proportion of positive records in the data and to the ratio of costs. Numeric examples and sensitivity analysis provide further insight, showing that the percent value gap from a misspecified risk attitude increases as the specificity of the classification algorithm decreases.
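For the risk neutral case, the familiar textbook cutoff for a well-calibrated score follows from equating the expected costs of the two actions; the sketch below shows that baseline only, since the paper's risk averse thresholds depend on its utility model and are not reproduced here.

```python
# Sketch: the textbook risk-neutral cutoff for a calibrated score p.
# Predicting positive costs (1 - p) * cost_fp in expectation; predicting
# negative costs p * cost_fn. Equating the two gives the threshold.
def risk_neutral_threshold(cost_fp: float, cost_fn: float) -> float:
    """p* such that (1 - p*) * cost_fp == p* * cost_fn."""
    return cost_fp / (cost_fp + cost_fn)

# A false negative four times as costly as a false positive lowers the
# cutoff, so more records are flagged positive.
print(risk_neutral_threshold(cost_fp=1.0, cost_fn=4.0))  # 0.2
```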


2017 ◽  
Vol 20 (1) ◽  
pp. 138-148 ◽  
Author(s):  
Natália Araujo de Almeida ◽  
Annelita Almeida Oliveira Reiners ◽  
Rosemeiry Capriata de Souza Azevedo ◽  
Ageo Mário Cândido da Silva ◽  
Joana Darc Chaves Cardoso ◽  
...  

Abstract Objective: to verify the prevalence of, and factors associated with, polypharmacy among elderly residents of the city of Cuiabá, in the state of Mato Grosso. Method: a cross-sectional study of 573 people aged 60 and over was performed, with polypharmacy defined as the use of five or more medications. To investigate the association between polypharmacy and sociodemographic variables, health and access to medication, the Mantel-Haenszel chi-square test was used in the bivariate analysis and Poisson regression in the multivariate analysis, with a significance level of 5%. Result: the prevalence of polypharmacy was 10.30%. Statistically significant associations were found between polypharmacy and living with others, reporting circulatory, endocrine, nutritional and digestive tract diseases, and reporting financial difficulties in purchasing medicines. Conclusion: some social and health condition factors play an important role in the use of multiple medications among the elderly.
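The multivariate step can be sketched in Python with statsmodels as below; the file name and the covariate names are invented placeholders, and robust errors are used, as is common when Poisson regression is applied to a binary outcome to obtain prevalence ratios.

```python
# Sketch: Poisson regression for prevalence ratios of polypharmacy
# (1 = five or more medications). Column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("polypharmacy.csv")  # hypothetical export of the survey data

fit = smf.glm(
    "polypharmacy ~ lives_with_others + circulatory_disease + financial_difficulty",
    data=df,
    family=sm.families.Poisson(),
).fit(cov_type="HC0")  # robust (sandwich) standard errors

print(np.exp(fit.params))  # exponentiated coefficients ~ prevalence ratios
```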

