RETAIL MARKET SEGMENTATION AND FORECASTING USING XGBOOST AND PRINCIPAL COMPONENT ANALYSIS

The growth of the online retail market in Indonesia presents a strong business opportunity, and this growth is predicted to continue as internet penetration increases. With greater exposure to brands, products and offerings, consumers become smarter and wiser in their purchasing decisions, so offering goods and services that match consumer tastes and behavior is essential to business continuity. Existing forecasting models fall into two broad groups: time series approaches and machine learning approaches. In this study, segmentation and forecasting of online retail sales were carried out using extreme gradient boosting (XGBoost). The data are an online retail dataset obtained from the UCI repository. K-means clustering (KMC) is applied to determine the target class of each record, and principal component analysis (PCA) is applied to reduce the data dimensions by eliminating irrelevant features. Model evaluation is based on a confusion matrix and the macro-average ROC curve. The results show that XGBoost classifies the retail data well, as indicated by the confusion matrix metrics and the ROC curves.
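
As a rough illustration of how k-means can turn unlabeled retail records into class targets for a classifier such as XGBoost, the sketch below runs a plain Lloyd's-algorithm k-means on a few customers. The two features (orders per year, average basket value) and their values are hypothetical, and seeding the centroids with the first k points is a simplification; the abstract does not say which features or initialization the authors used.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical customers: (orders per year, average basket value).
customers = [(2, 15.0), (40, 220.0), (3, 18.0), (38, 200.0), (1, 12.0), (42, 240.0)]
segment, centers = kmeans(customers, k=2)
print(segment)  # cluster index per customer, usable as a class label
```

The resulting cluster indices play the role of the "target class": a supervised model can then be trained to predict the segment of a new record.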

Dimensionality Reduction using PCA and K-Means Clustering for Breast Cancer Prediction

Lontar Komputer Jurnal Ilmiah Teknologi Informasi ◽

10.24843/lkjiti.2018.v09.i03.p08 ◽

2018 ◽

pp. 192 ◽

Cited By ~ 2

Author(s):

Ade Jamal ◽

Annisa Handayani ◽

Ali Akbar Septiandri ◽

Endang Ripmiatin ◽

Yunus Effendi

Keyword(s):

Breast Cancer ◽

Principal Component Analysis ◽

Dimensionality Reduction ◽

Principal Component ◽

Component Analysis ◽

Gradient Boosting ◽

Support Vector ◽

Breast Cancer Dataset ◽

Cancer Prediction ◽

Extreme Gradient Boosting

Breast cancer is a leading cause of death among women. Predicting breast cancer at an early stage increases the chance of a cure, which calls for a prediction tool that can classify a breast tumor as either a harmful malignant tumor or a harmless benign tumor. In this paper, two machine learning algorithms, Support Vector Machine and Extreme Gradient Boosting, are compared for this classification task. Prior to classification, the number of data attributes is reduced by extracting features from the raw data using Principal Component Analysis. A clustering method, K-Means, is also used for dimensionality reduction alongside Principal Component Analysis. The paper compares four models, formed by combining the two dimensionality reduction methods with the two classifiers, applied to the Wisconsin Breast Cancer Dataset. The comparison uses accuracy, sensitivity and specificity computed from the confusion matrices. The experimental results indicate that K-Means, which is not usually used for dimensionality reduction, performs well compared with the popular Principal Component Analysis.
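
The three metrics named above follow directly from the four cells of a binary confusion matrix. The sketch below computes them for made-up labels, with 1 standing for malignant and 0 for benign; the label vectors are hypothetical and are not results from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of malignant cases detected
        "specificity": tn / (tn + fp),  # share of benign cases cleared
    }

# Hypothetical ground truth and predictions (1 = malignant, 0 = benign).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Sensitivity matters most when a missed malignant tumor is the costly error; specificity tracks how often benign tumors are wrongly flagged.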

Prediction of protein-protein interaction sites through eXtreme gradient boosting with kernel principal component analysis

Computers in Biology and Medicine ◽

10.1016/j.compbiomed.2021.104516 ◽

2021 ◽

pp. 104516

Author(s):

Xue Wang ◽

Yaqun Zhang ◽

Bin Yu ◽

Adil Salhi ◽

Ruixin Chen ◽

...

Keyword(s):

Principal Component Analysis ◽

Protein Interaction ◽

Principal Component ◽

Component Analysis ◽

Kernel Principal Component Analysis ◽

Gradient Boosting ◽

Protein Protein Interaction ◽

Interaction Sites ◽

Extreme Gradient Boosting ◽

Protein Interaction Sites

Structural Damage Classification in a Jacket-Type Wind-Turbine Foundation Using Principal Component Analysis and Extreme Gradient Boosting

Sensors ◽

10.3390/s21082748 ◽

2021 ◽

Vol 21 (8) ◽

pp. 2748

Author(s):

Jersson X. Leon-Medina ◽

Maribel Anaya ◽

Núria Parés ◽

Diego A. Tibaduiza ◽

Francesc Pozo

Keyword(s):

Principal Component Analysis ◽

Feature Extraction ◽

Wind Turbine ◽

Structural Damage ◽

Principal Component ◽

Gradient Boosting ◽

Damage Classification ◽

Linear Feature ◽

Linear Feature Extraction ◽

Extreme Gradient Boosting

Damage classification is an important topic in the development of structural health monitoring systems. When applied to wind-turbine foundations, it provides information about the state of the structure, helps in maintenance, and prevents catastrophic failures. A data-driven pattern-recognition methodology for structural damage classification was developed in this study. The proposed methodology involves several stages: (1) data acquisition, (2) data arrangement, (3) data normalization through the mean-centered unitary group-scaling method, (4) linear feature extraction, (5) classification using the extreme gradient boosting machine learning classifier, and (6) validation applying a 5-fold cross-validation technique. The linear feature extraction capabilities of principal component analysis are employed; the original data of 58,008 features is reduced to only 21 features. The methodology is validated with an experimental test performed in a small-scale wind-turbine foundation structure that simulates the perturbation effects caused by wind and marine waves by applying an unknown white noise signal excitation to the structure. A vibration-response methodology is selected for collecting accelerometer data from both the healthy structure and the structure subjected to four different damage scenarios. The datasets are satisfactorily classified, with performance measures over 99.9% after using the proposed damage classification methodology.
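
Stage (6), validation by 5-fold cross-validation, can be sketched as an index-splitting routine: every sample lands in exactly one test fold and in the training set of the other four. The function name, the shuffling seed and the sample count of 23 are illustrative assumptions; the classifier fit and scoring that would sit inside the loop are omitted.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k near-equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical dataset of 23 samples split into 5 folds.
splits = list(k_fold_indices(23, k=5))
for train, test in splits:
    # A classifier would be fitted on `train` and scored on `test` here.
    print(len(train), len(test))
```

Averaging the per-fold scores gives the cross-validated performance measure.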

LPG-model: A novel model for throughput prediction in stream processing, using a light gradient boosting machine, incremental principal component analysis, and deep gated recurrent unit network

Information Sciences ◽

10.1016/j.ins.2020.05.042 ◽

2020 ◽

Vol 535 ◽

pp. 107-129

Author(s):

Zheng Chu ◽

Jiong Yu ◽

Askar Hamdulla

Keyword(s):

Principal Component Analysis ◽

Stream Processing ◽

Principal Component ◽

Component Analysis ◽

Gradient Boosting ◽

Light Gradient ◽

Gradient Boosting Machine ◽

Gated Recurrent Unit ◽

Novel Model ◽

Unit Network

Comparative analysis and prediction of nucleosome positioning using integrative feature representation and machine learning algorithms

BMC Bioinformatics ◽

10.1186/s12859-021-04006-w ◽

2021 ◽

Vol 22 (S6) ◽

Author(s):

Guo-Sheng Han ◽

Qi Li ◽

Ying Li

Keyword(s):

Principal Component Analysis ◽

Comparative Analysis ◽

DNA Sequence ◽

Principal Component ◽

Nucleosome Positioning ◽

Component Analysis ◽

Feature Representation ◽

Support Vector ◽

Prediction Quality ◽

Extreme Gradient Boosting

Background: Nucleosomes play an important role in genome expression, DNA replication, DNA repair and transcription, so research on nucleosome positioning has consistently received extensive attention. Considering the diversity of DNA sequence representation methods, we integrated multiple features to analyze their effect on nucleosome positioning analysis. This process can also deepen our understanding of the theoretical analysis of nucleosome positioning. Results: We used frequency chaos game representation (FCGR) to construct DNA sequence features, integrated it with other features, and applied the principal component analysis (PCA) algorithm. Support vector machine (SVM), extreme learning machine (ELM), extreme gradient boosting (XGBoost), multilayer perceptron (MLP) and convolutional neural networks (CNN) were each used as predictors of nucleosome positioning. The prediction quality of the integrated feature vector is significantly superior to that of a single feature. After PCA was used to reduce the feature dimension, the prediction quality on the H. sapiens dataset improved significantly. Conclusions: Comparative analysis and prediction on the H. sapiens, C. elegans, D. melanogaster and S. cerevisiae datasets demonstrate that applying FCGR to nucleosome positioning is feasible and that integrative feature representation performs better.
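
To make the PCA step concrete, the sketch below extracts the first principal component of a small feature matrix by power iteration on the sample covariance matrix, then projects the rows onto it. The toy two-column data are hypothetical and are not FCGR features; in practice a library routine would usually replace this hand-rolled version.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, explained variance ratio, scores) for the top component."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication converges to the top eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    explained = eigenvalue / sum(cov[i][i] for i in range(d))
    scores = [sum(x * a for x, a in zip(row, v)) for row in centered]
    return v, explained, scores

# Hypothetical features: the second column is exactly twice the first.
features = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, explained, scores = first_principal_component(features)
print(direction, explained)
```

Because the two toy columns are perfectly correlated, a single component carries all the variance, which is the situation in which dropping the remaining dimensions loses nothing.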

A Casing Damage Prediction Method Based on Principal Component Analysis and Gradient Boosting Decision Tree Algorithm

10.2118/194956-ms ◽

2019 ◽

Cited By ~ 3

Author(s):

Mengxin Song ◽

Xiangguang Zhou

Keyword(s):

Principal Component Analysis ◽

Decision Tree ◽

Prediction Method ◽

Principal Component ◽

Component Analysis ◽

Gradient Boosting ◽

Decision Tree Algorithm ◽

Tree Algorithm ◽

Damage Prediction ◽

Casing Damage

Principal component analysis of financial statements. A compositional approach

Revista de Métodos Cuantitativos para la Economía y la Empresa ◽

10.46661/revmetodoscuanteconempresa.3580 ◽

2020 ◽

Vol 29 ◽

pp. 18-37

Author(s):

Miquel Carreras Simó ◽

Germà Coenders

Keyword(s):

Principal Component Analysis ◽

Compositional Data ◽

Principal Component ◽

Financial Statements ◽

Component Analysis ◽

Financial Ratios ◽

Compositional Data Analysis ◽

Retail Sector ◽

Financial Ratio ◽

Compositional Approach

Financial ratios are often used in principal component analysis and related techniques for the purposes of data reduction and visualization. Besides the dependence of results on ratio choice, ratios themselves pose a number of problems when subjected to a principal component analysis, such as skewed distributions. In this work, we put forward an alternative method drawn from compositional data analysis (CoDa), a standard statistical toolbox for use when data convey information about relative magnitudes, as financial ratios do. The method, referred to as the CoDa biplot, does not rely on any particular choice of financial ratio but allows researchers to visually order firms along the pairwise financial ratios for any two accounts. Non-financial magnitudes and time evolution can be added to the visualization as desired. We show an example of its application to the top chains in the Spanish grocery retail sector and show how the technique can be used to depict strategic management differences in financial structure or performance, and their evolution over time.
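
CoDa biplots are commonly built by applying principal component analysis to centred log-ratio (clr) coordinates; the abstract does not name the transform, so its use here is an assumption. The sketch shows the clr transform on hypothetical account values for two firms and the two properties that make it suited to relative magnitudes.

```python
import math

def clr(parts):
    """Centred log-ratio: log of each part minus the mean log of all parts."""
    logs = [math.log(x) for x in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical accounts for two firms; firm_b is firm_a scaled tenfold.
firm_a = [50.0, 30.0, 20.0]
firm_b = [500.0, 300.0, 200.0]

coords_a = clr(firm_a)
coords_b = clr(firm_b)
print(coords_a)
print(coords_b)
```

The coordinates of each firm sum to zero, the two firms receive the same coordinates because only relative magnitudes matter, and the difference between two coordinates equals the log of the corresponding pairwise ratio, which is why no particular financial ratio has to be chosen in advance.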

A German version of the Intermittent Claudication Questionnaire (ICQ): cultural adaptation and validation

VASA ◽

10.1024/0301-1526/a000218 ◽

2012 ◽

Vol 41 (5) ◽

pp. 333-342 ◽

Cited By ~ 3

Author(s):

Kirchberger ◽

Finger ◽

Müller-Bühl

Keyword(s):

Principal Component Analysis ◽

Intermittent Claudication ◽

Completion Time ◽

Short Form ◽

Principal Component ◽

Component Analysis ◽

German Version ◽

Average Completion Time ◽

SF-36 ◽

Related Quality

Background: The Intermittent Claudication Questionnaire (ICQ) is a short questionnaire for the assessment of health-related quality of life (HRQOL) in patients with intermittent claudication (IC). The objective of this study was to translate the ICQ into German and to investigate the psychometric properties of the German ICQ version in patients with IC. Patients and methods: The original English version was translated using a forward-backward method. The resulting German version was reviewed by the author of the original version and an experienced clinician. Finally, it was tested for clarity with 5 German patients with IC. The German ICQ was administered to a sample of 81 patients. The sample consisted of 58.0 % male patients with a median age of 71 years and a median IC duration of 36 months. Feasibility testing covered completeness of questionnaires, completion time, and ratings of clarity, length and relevance. Reliability was assessed through a retest in 13 patients at 14 days and through Cronbach's alpha for internal consistency. Construct validity was investigated using principal component analysis. Concurrent validity was assessed by correlating the ICQ scores with the Short Form 36 Health Survey (SF-36) as well as clinical measures. Results: The ICQ was completely filled in by 73 subjects (90.1 %) with an average completion time of 6.3 minutes. Cronbach's alpha coefficient reached 0.75. Intra-class correlation for test-retest reliability was r = 0.88. Principal component analysis resulted in a 3-factor solution. The first factor explained 51.5 % of the total variation and all items had loadings of at least 0.65 on it. The ICQ was significantly associated with the SF-36 and treadmill-walking distances, whereas no association was found for resting ABPI. Conclusions: The German version of the ICQ demonstrated good feasibility, satisfactory reliability and good validity. Responsiveness should be investigated in further validation studies.
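
For readers unfamiliar with the internal-consistency statistic reported above, Cronbach's alpha for k items is k/(k-1) times one minus the ratio of summed item variances to the variance of the total score. The sketch below applies that formula to a hypothetical 3-item questionnaire answered by five respondents; the scores are invented and unrelated to the ICQ data.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses: one list of item scores per respondent."""
    k = len(responses[0])  # number of items
    item_vars = [variance(item) for item in zip(*responses)]  # per-item sample variance
    total_var = variance([sum(row) for row in responses])  # variance of total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical answers: rows are respondents, columns are items.
answers = [
    [1, 2, 2],
    [2, 3, 3],
    [3, 3, 4],
    [4, 5, 5],
    [5, 5, 4],
]
alpha = cronbach_alpha(answers)
print(round(alpha, 3))
```

Items that rise and fall together inflate the variance of the total score relative to the individual item variances, pushing alpha toward 1.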
