LPG-model: A novel model for throughput prediction in stream processing, using a light gradient boosting machine, incremental principal component analysis, and deep gated recurrent unit network

2020 ◽  
Vol 535 ◽  
pp. 107-129
Author(s):  
Zheng Chu ◽  
Jiong Yu ◽  
Askar Hamdulla
2021 ◽  
Vol 12 ◽  
Author(s):  
Cheng Zhang ◽  
Xiujuan Lei ◽  
Lian Liu

A large number of biological experiments have shown that metabolites are closely related to the occurrence and development of many complex human diseases; investigating the mechanisms behind these correlations is thus an important topic that attracts many researchers. In this work, we propose a computational method named LGBMMDA, which is based on the Light Gradient Boosting Machine (LightGBM), to predict potential metabolite–disease associations. The method extracts features from statistical measures, graph-theoretical measures, and matrix factorization results, and uses principal component analysis (PCA) to remove noise and redundancy. We compared our method with other existing methods, and LGBMMDA achieved higher areas under the curve (AUCs). Additionally, three case studies further confirmed that LGBMMDA has a clear advantage in predicting metabolite–disease pairs and represents a powerful bioinformatics tool.


Author(s):  
Ade Jamal ◽  
Annisa Handayani ◽  
Ali Akbar Septiandri ◽  
Endang Ripmiatin ◽  
Yunus Effendi

Breast cancer is the most important cause of death among women. Predicting breast cancer at an early stage increases the chance of a cure. This requires a prediction tool that can classify a breast tumor as either a harmful malignant tumor or a harmless benign tumor. In this paper, two machine learning algorithms, namely Support Vector Machine and Extreme Gradient Boosting, are compared for this classification task. Prior to classification, the number of data attributes is reduced by extracting features from the raw data using Principal Component Analysis. A clustering method, namely K-Means, is also used for dimensionality reduction alongside Principal Component Analysis. This paper presents a comparison among four models, each combining one of the two dimensionality reduction methods with one of the two classifiers, applied to the Wisconsin Breast Cancer Dataset. The comparison is measured using accuracy, sensitivity, and specificity metrics computed from the confusion matrices. The experimental results indicate that the K-Means method, which is not usually used for dimensionality reduction, can perform well compared with the popular Principal Component Analysis.
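
The three evaluation metrics named in the abstract can be derived from a binary confusion matrix as in the minimal sketch below. The function name `binary_metrics`, the label encoding (1 = malignant, 0 = benign), and the sample labels are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity for binary labels.

    Assumed encoding (illustrative only): 1 = malignant, 0 = benign.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    # Count the four cells of the confusion matrix.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: share of malignant cases that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of benign cases that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Made-up labels, purely for demonstration.
example = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 0, 1, 0, 1])
```

Sensitivity and specificity are reported separately because a missed malignant tumor and a false alarm on a benign one carry different costs, which accuracy alone hides.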


2020 ◽  
Vol 2020 ◽  
pp. 1-17
Author(s):  
E. Zhu ◽  
M. Xu ◽  
D. Pi

In low-rank matrix recovery, noise may exhibit low rank or lack sparsity, and the nuclear norm is not an accurate approximation of the rank of a low-rank matrix. To address this problem, the present study proposes a novel nonconvex approximation function for the rank of a low-rank matrix. Based on this function, a novel robust principal component analysis model is proposed. The model is solved with the alternating direction method, and its convergence is verified theoretically. Background separation experiments were then performed on the Wallflower and SBMnet datasets, and the effectiveness of the model was further verified by numerical experiments.
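
For context, the standard convex formulation of robust principal component analysis, in which the nuclear norm stands in for the rank, can be written as below. This is the textbook baseline, not the authors' nonconvex model, whose approximation function the abstract does not specify. The symbols $M$ (observed matrix), $L$ (low-rank part), $S$ (sparse part), and $\lambda$ (trade-off weight) are notation assumed here.

```latex
\min_{L,\,S} \; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad M = L + S,
\qquad \text{where } \|L\|_{*} = \sum_{i} \sigma_{i}(L)
```

Because the nuclear norm sums all singular values $\sigma_i(L)$ with equal weight, it penalizes large singular values more than the rank function would, which is one common motivation for nonconvex surrogates.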


Author(s):  
Rimbun Siringoringo ◽  
Resianta Perangin-angin ◽  
Mufria J. Purba

The growth of the online retail market in Indonesia is an excellent business opportunity. This growth is predicted to continue as internet penetration increases. With greater exposure to brands, products, and offerings, consumers become smarter and wiser in their purchasing decisions. Offering goods and services that match consumers' tastes and behavior is very important for maintaining business continuity. The models developed so far fall into two major categories, namely time series approaches and machine learning. In this study, segmentation and forecasting of online retail sales were carried out using extreme gradient boosting (XGBoost). The data used in this study is an online retail dataset obtained from the UCI repository. The k-means clustering (KMC) method is applied to determine the target, or data class. Principal component analysis (PCA) is applied to reduce the data dimensions by eliminating irrelevant features. Model evaluation is based on a confusion matrix and the macro-average ROC curve. The results show that XGBoost classifies the retail data well, as seen in the confusion matrix metrics and ROC curves.


VASA ◽  
2012 ◽  
Vol 41 (5) ◽  
pp. 333-342 ◽  
Author(s):  
Kirchberger ◽  
Finger ◽  
Müller-Bühl

Background: The Intermittent Claudication Questionnaire (ICQ) is a short questionnaire for the assessment of health-related quality of life (HRQOL) in patients with intermittent claudication (IC). The objective of this study was to translate the ICQ into German and to investigate the psychometric properties of the German ICQ version in patients with IC. Patients and methods: The original English version was translated using a forward-backward method. The resulting German version was reviewed by the author of the original version and an experienced clinician. Finally, it was tested for clarity with 5 German patients with IC. The German ICQ was administered to a sample of 81 patients. The sample consisted of 58.0 % male patients with a median age of 71 years and a median IC duration of 36 months. Tests of feasibility included completeness of questionnaires, completion time, and ratings of clarity, length, and relevance. Reliability was assessed through a retest in 13 patients at 14 days and through analysis of Cronbach’s alpha for internal consistency. Construct validity was investigated using principal component analysis. Concurrent validity was assessed by correlating the ICQ scores with the Short Form 36 Health Survey (SF-36) as well as clinical measures. Results: The ICQ was completely filled in by 73 subjects (90.1 %) with an average completion time of 6.3 minutes. Cronbach’s alpha coefficient reached 0.75. Intra-class correlation for test-retest reliability was r = 0.88. Principal component analysis resulted in a 3-factor solution. The first factor explained 51.5 % of the total variation, and all items had loadings of at least 0.65 on it. The ICQ was significantly associated with the SF-36 and treadmill-walking distances, whereas no association was found for resting ABPI. Conclusions: The German version of the ICQ demonstrated good feasibility, satisfactory reliability, and good validity. Responsiveness should be investigated in further validation studies.
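
The internal-consistency statistic reported above, Cronbach's alpha, can be computed from a respondents-by-items score table as in the following minimal sketch. It uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the sample scores are assumptions for illustration and are unrelated to the study's data.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a table with one row per respondent, one column per item."""
    n_respondents = len(scores)
    if n_respondents < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least two items and equally long rows")

    # Sample variance of each item (column) across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up scores: three respondents answering two items.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```

Alpha approaches 1 when items vary together across respondents, which is why it is read as a measure of internal consistency.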

