An Effective Fuzzy Feature Selection and Prediction Method for Modeling Tidal Current: A Case of Persian Gulf

2017 ◽  
Vol 55 (9) ◽  
pp. 4956-4961 ◽  
Author(s):  
Behnaz Papari ◽  
Chris S. Edrington ◽  
Farzaneh Kavousi-Fard
2021 ◽  
Vol 2129 (1) ◽  
pp. 012022
Author(s):  
Mohamad Faiz Dzulkalnine ◽  
Roselina Sallehuddin ◽  
Yusliza Yussof ◽  
Nor Haizan Mohd Radzi ◽  
Noorfa Haszlinna Binti Mustaffa ◽  
...  

Abstract In Malaysia, colorectal cancer (CRC) is one of the most common cancers in both men and women. Early detection is crucial: it can significantly increase patients' survival rate, whereas untreated CRC can lead to death. Because high-quality CRC data are scarce, expert systems and machine learning analyses are burdened by irrelevant features, outliers, and noise, which can reduce classification accuracy. It is therefore essential to find a reliable feature selection method that can identify and remove irrelevant features while remaining resistant to noise and outliers. In this paper, Fuzzy Principal Component Analysis (FPCA) was tested for the classification of a Malaysian CRC dataset. The experimental results showed that, by using fuzzy membership, FPCA produces higher accuracy than PCA and SVM, by almost 2% and 5% respectively. The empirical results showed that FPCA is a reliable feature selection method that can find the most informative features in the CRC dataset and could assist medical practitioners in making informed decisions.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Wenjing Ma ◽  
Kenong Su ◽  
Hao Wu

Abstract Background: Cell type identification is one of the most important questions in single-cell RNA sequencing (scRNA-seq) data analysis. With the accumulation of public scRNA-seq data, supervised cell type identification methods have gained increasing popularity due to their better accuracy, robustness, and computational performance. Despite these advantages, the performance of supervised methods relies heavily on several key factors: feature selection, prediction method, and, most importantly, choice of the reference dataset. Results: In this work, we perform extensive real data analyses to systematically evaluate these strategies in supervised cell identification. We first benchmark nine classifiers along with six feature selection strategies and investigate the impact of reference data size and number of cell types on cell type prediction. Next, we focus on how discrepancies between reference and target datasets, and data preprocessing steps such as imputation and batch effect correction, affect prediction performance. We also investigate strategies for pooling and purifying reference data. Conclusions: Based on our analysis results, we provide guidelines for using supervised cell typing methods. We suggest combining all individuals from available datasets to construct the reference dataset and using a multi-layer perceptron (MLP) as the classifier, along with the F-test as the feature selection method. All the code used for our analysis is available on GitHub (https://github.com/marvinquiet/RefConstruction_supervisedCelltyping).
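The F-test feature selection recommended above can be illustrated with a minimal sketch. This is not the authors' code (their pipeline is in the linked repository). It assumes a one-way ANOVA F statistic computed per gene across cell type labels, and the function names `anova_f` and `select_top_features` and the toy expression values are hypothetical.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic for one feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand = mean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Variation of the samples around their own class mean.
    ss_within = 0.0
    for g in groups.values():
        m = mean(g)
        ss_within += sum((v - m) ** 2 for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_features(X, y, top_k):
    """Return (indices of the top_k highest-scoring columns, all scores)."""
    n_features = len(X[0])
    scores = [anova_f([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:top_k], scores


# Toy matrix (made-up numbers): rows are cells, columns are genes.
X = [
    [5.0, 1.0, 0.2],
    [5.2, 0.9, 0.8],
    [4.8, 1.1, 0.5],
    [1.0, 1.0, 0.3],
    [1.1, 0.8, 0.9],
    [0.9, 1.2, 0.4],
]
y = ["T cell", "T cell", "T cell", "B cell", "B cell", "B cell"]
selected, scores = select_top_features(X, y, top_k=1)
```

Only the first gene differs strongly between the two toy cell types, so it receives the highest score. In practice, the selected columns would then be passed to a classifier such as an MLP.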


2018 ◽  
Vol 34 (1) ◽  
pp. 33-48 ◽  
Author(s):  
HUNG MINH LE ◽  
TOAN DINH TRAN ◽  
LANG VAN TRAN

This paper presents an automatic Heart Disease (HD) prediction method based on feature selection and data mining techniques, using the symptoms and clinical information provided in the patient’s dataset. Data mining, which allows the extraction of hidden knowledge from data and explores the relationships between attributes, is a promising technique for HD prediction. HD symptoms can be effectively learned by the computer to classify HD into different classes. However, the information provided may include redundant and interrelated symptoms, and using such information may degrade classification performance. Feature selection is an effective way to remove such noisy information while improving learning accuracy and making the learned model easier to understand. In our method, HD attributes are re-selected based on the ranks and weights assigned by the Infinite Latent Feature Selection (ILFS) method. The Support Vector Machine (SVM) algorithm is applied to classify a subset of the selected attributes into different HD classes. The Synthetic Minority Over-sampling Technique (SMOTE) is adopted to generate a larger and more varied set of data. The experiment is performed on the public Heart Disease dataset from the UCI Machine Learning Repository. Experimental results demonstrated that, using only a subset of 24 selected attributes out of a total of 46, our method achieved an accuracy of 97.87% for distinguishing ‘no presence’ of HD from ‘presence’ of HD and an accuracy of 93.92% for distinguishing 5 different classes of HD.
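The over-sampling step mentioned above can be sketched as follows. This is a simplified illustration of the general SMOTE idea (interpolating between a minority-class sample and one of its nearest minority-class neighbours), not the authors' implementation. The function name `smote`, the parameter defaults, and the toy data are assumptions.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


# Toy minority-class samples (made-up numbers, two attributes each).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=5)
```

Each synthetic point lies on a line segment between two existing minority samples, so the new data stay inside the region the minority class already occupies. A production workflow would more likely use a maintained library implementation and apply over-sampling to the training split only.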

