Data Mining and Principal Component Analysis on Coimbra Breast Cancer Dataset

Author(s):  
Anupam Sen

Machine learning (ML) techniques play an important role in the medical field, where early diagnosis is required to improve the treatment of carcinoma. In this analysis, the Breast Cancer Coimbra Dataset (BCCD), with ten predictors, is analyzed to classify carcinoma. A feature selection method and machine learning algorithms are applied to the dataset, which is taken from the UCI repository. The WEKA (“Waikato Environment for Knowledge Analysis”) tool is used to run the machine learning techniques, and Principal Component Analysis (PCA) is used for feature extraction. Several classification algorithms are applied through WEKA, including Glmnet, Gbm, ada Boosting, Adabag Boosting, C50, Cforest, DcSVM, fnn, Ksvm, and Node Harvest. The classifiers are compared by accuracy and by values such as the Kappa statistic, Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). The 10-fold cross-validation method is used for training, testing, and validation.
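The comparison values named in this abstract can be illustrated with a short sketch. It assumes a binary problem in which MAE and RMSE are measured between the 0/1 label and the predicted probability of the positive class; WEKA's own computation may differ in detail, and the toy labels and probabilities are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Undefined (division by zero) when chance agreement equals 1,
    e.g. when both label lists contain a single class only.
    """
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal class frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)


def mean_absolute_error(y_true, y_prob):
    """Mean absolute difference between labels and predicted probabilities."""
    return sum(abs(t - p) for t, p in zip(y_true, y_prob)) / len(y_true)


def root_mean_square_error(y_true, y_prob):
    """Square root of the mean squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true))


# Hypothetical toy data: true labels and predicted positive-class probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.8, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # threshold at 0.5 (assumption)

kappa = cohen_kappa(y_true, y_pred)
mae = mean_absolute_error(y_true, y_prob)
rmse = root_mean_square_error(y_true, y_prob)
print(kappa, mae, rmse)
```

Kappa is reported alongside accuracy because it discounts the agreement a classifier would reach by chance alone, which matters when the classes are imbalanced.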

Author(s):  
Ade Jamal ◽  
Annisa Handayani ◽  
Ali Akbar Septiandri ◽  
Endang Ripmiatin ◽  
Yunus Effendi

Breast cancer is a leading cause of death among women. Predicting breast cancer at an early stage gives a greater chance of a cure. This requires a prediction tool that can classify a breast tumor as either a harmful malignant tumor or a harmless benign tumor. In this paper, two machine learning algorithms, Support Vector Machine and Extreme Gradient Boosting, are compared for classification. Prior to classification, the number of data attributes is reduced by extracting features from the raw data using Principal Component Analysis. A clustering method, K-Means, is also used for dimensionality reduction alongside Principal Component Analysis. The paper presents a comparison of four models, formed by combining the two dimensionality reduction methods with the two classifiers, applied to the Wisconsin Breast Cancer Dataset. The comparison uses accuracy, sensitivity, and specificity, computed from the confusion matrices. The experimental results indicate that the K-Means method, which is not usually used for dimensionality reduction, can perform well compared with the popular Principal Component Analysis.
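The three metrics named here follow directly from the four cells of a binary confusion matrix. The sketch below shows one way to compute them, assuming that "malignant" is the positive class (encoded as 1); the labels and predictions are hypothetical toy values, not results from the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # malignant correctly flagged
            else:
                fp += 1  # benign wrongly flagged
        else:
            if t == positive:
                fn += 1  # malignant missed
            else:
                tn += 1  # benign correctly cleared
    return tp, fp, tn, fn


def summarize(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical toy labels: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0]

counts = confusion_counts(y_true, y_pred)
metrics = summarize(*counts)
print(counts, metrics)
```

In a screening setting, sensitivity is usually the figure to watch, since a false negative means a missed malignant tumor.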


2021
Author(s):  
Richard Rios ◽  
Elkin A. Noguera-Urbano ◽  
Jairo Espinosa ◽  
Jose Manuel Ochoa

Bioclimatic classifications seek to divide a study region into geographic areas with similar bioclimatic characteristics. In this study we propose two bioclimatic classifications for Colombia using machine learning techniques. We first characterized the precipitation space of Colombia using principal component analysis. Based on the Lang classification, we then projected all background sites into the precipitation space with their corresponding categories. We sequentially fit logistic regression models to reclassify all background sites in the precipitation space into six redefined Lang categories. The new categories were then used to define new modified Lang and Caldas-Lang classifications.
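The principal component analysis step can be illustrated with a minimal pure-Python sketch. It finds only the first principal component, using power iteration on the sample covariance matrix, and projects each site onto it. This is not the authors' pipeline: the two-variable "precipitation" rows are hypothetical, and a real analysis would use a numerical library and keep several components.

```python
import math


def center(rows):
    """Subtract the column means from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (divides by n - 1) of centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov, iterations=200):
    """Leading eigenvector of cov by power iteration.

    Assumes cov is non-zero and the start vector is not orthogonal
    to the leading eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, v):
    """Score of each centered row along direction v."""
    return [sum(a * b for a, b in zip(r, v)) for r in centered]


# Hypothetical sites described by two precipitation variables.
sites = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]

centered = center(sites)
pc1 = first_component(covariance(centered))
scores = project(centered, pc1)
print(pc1, scores)
```

The resulting scores give each site a coordinate in the reduced space, which is the kind of input a downstream classifier such as logistic regression would then work with.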

