Selection of microbial biomarkers with genetic algorithm and principal component analysis

2019 ◽  
Vol 20 (S6) ◽  
Author(s):  
Ping Zhang ◽  
Nicholas P. West ◽  
Pin-Yen Chen ◽  
Mike W. C. Thang ◽  
Gareth Price ◽  
...  

Abstract

Background: Principal components analysis (PCA) is often used to find characteristic patterns associated with certain diseases by reducing the number of variables before a predictive model is built, particularly when some variables are correlated. Usually, the first two or three components from PCA are used to determine whether individuals can be clustered into two classification groups based on pre-determined criteria: a control group and a disease group. However, a combination of other components may exist that better distinguishes diseased individuals from healthy controls. Genetic algorithms (GAs) can be useful and efficient for searching for the best combination of variables to build a prediction model. This study aimed to develop a prediction model that combines PCA and a genetic algorithm (GA) for identifying sets of bacterial species associated with obesity and metabolic syndrome (Mets).

Results: The prediction models built using the combination of principal components (PCs) selected by GA were compared to the models built using the top PCs that explained the most variance in the sample and to models built with selected original variables. The advantages of combining PCA with GA were demonstrated.

Conclusions: The proposed algorithm overcomes the limitation of PCA for data analysis. It offers a new way to build prediction models that may improve the prediction accuracy. The variables included in the PCs that were selected by GA can be combined with flexibility for potential clinical applications. The algorithm can be useful for many biological studies where high dimensional data are collected with highly correlated variables.
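The idea of letting a GA choose which principal components enter the model, rather than always taking the top few, can be sketched roughly as follows. This is not the authors' implementation: the function names, the GA settings (truncation selection, one-point crossover, bit-flip mutation, elitism) and the toy fitness values are all assumptions made for illustration. In a real pipeline the fitness would more plausibly be the cross-validated accuracy of a classifier trained on the selected PCs.

```python
import random

# Made-up per-component "usefulness" values, for illustration only.
TOY_GAIN = [0.05, 0.01, 0.30, 0.02, 0.25, 0.01]

def toy_fitness(mask):
    """Hypothetical fitness: total gain of selected PCs minus a size penalty."""
    return sum(g * m for g, m in zip(TOY_GAIN, mask)) - 0.03 * sum(mask)

def ga_select(n_components, fitness, pop_size=20, generations=30,
              mutation_rate=0.1, seed=0):
    """Search for a 0/1 mask over components that maximises `fitness`.

    Requires n_components >= 2 and pop_size >= 4.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_components)]
           for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[:pop_size // 2]      # truncation selection
        children = [pop[0][:]]             # elitism: carry the best mask over
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_components - 1)   # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: toggle each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

best_mask, best_history = ga_select(len(TOY_GAIN), toy_fitness)
```

Because the best mask is always copied into the next generation, the recorded best fitness never decreases. The mask that comes back is not guaranteed to be the global optimum.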

Author(s):  
Pengpeng Cheng ◽  
Daoling Chen ◽  
Jianping Wang

Abstract To improve the efficiency and accuracy of thermal and moisture comfort prediction for underwear, a new prediction model is designed that uses principal component analysis (PCA) to reduce the dimension of the related variables and eliminate multicollinearity between them, and then feeds the converted variables into a genetic algorithm (GA) and a BP neural network. To avoid the slow convergence of the Back Propagation (BP) neural network and its tendency to fall into local minima, this paper adopts GA to optimize the weights and thresholds of the BP neural network, implements the method in MATLAB, and establishes prediction models based on the BP neural network and the GA–BP neural network. To verify the superiority of the model, the predicted results of GA–BP, PCA–BP and BP are compared with those of the PCA–GA–BP neural network. The results show that PCA could improve the accuracy and adaptability of the GA–BP neural network for thermal and moisture comfort prediction. The PCA–GA–BP model is clearly superior to the GA–BP, PCA–BP, BP, SVM and K-means prediction models and could accurately predict the thermal and moisture comfort of underwear. The model has better prediction accuracy and a simpler structure.


2006 ◽  
Vol 1 (1) ◽  
Author(s):  
K. Katayama ◽  
K. Kimijima ◽  
O. Yamanaka ◽  
A. Nagaiwa ◽  
Y. Ono

This paper proposes a method of stormwater inflow prediction that uses radar rainfall data as the input to a prediction model constructed by system identification. The aim of the proposal is to construct a compact system by reducing the dimension of the input data. In this paper, Principal Component Analysis (PCA), which is widely used as a statistical method for data analysis and compression, is applied to pre-process the radar rainfall data. We then evaluate the proposed method using radar rainfall data and inflow data acquired in a combined sewer system. This study reveals that a few principal components of the radar rainfall data can be appropriate as the input variables of the stormwater inflow prediction model. Consequently, we have established a procedure for stormwater prediction that uses a few principal components of radar rainfall data.
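As a rough illustration of compressing a multi-cell input into a few principal components, the sketch below extracts only the leading component with power iteration on the sample covariance matrix. The function name and the choice of power iteration are assumptions for illustration; the paper does not say how its PCA was computed, and a numerical library would normally be used instead.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (direction, scores) for the leading principal component.

    `rows` is a list of observations, each a list of d numbers.
    Assumes the data are not constant (the covariance matrix is non-zero).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # starting vector for power iteration
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered observation onto the leading direction.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores
```

Each observation is then represented by a single score instead of d raw values. Further components could be obtained by deflating the covariance matrix and repeating the iteration.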


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 285
Author(s):  
Kwok Tai Chui ◽  
Brij B. Gupta ◽  
Pandian Vasant

Understanding the remaining useful life (RUL) of equipment is crucial for optimal predictive maintenance (PdM). This addresses the issues of equipment downtime and unnecessary maintenance checks in run-to-failure maintenance and preventive maintenance. Both the feature extraction and the prediction algorithm play crucial roles in the performance of RUL prediction models. A benchmark dataset, namely the Turbofan Engine Degradation Simulation Dataset, was selected for performance analysis and evaluation. The proposed combination of complete ensemble empirical mode decomposition and wavelet packet transform for feature extraction could reduce the average root-mean-square error (RMSE) by 5.14–27.15% compared with six approaches. When it comes to the prediction algorithm, the result of the RUL prediction model could be that the equipment needs to be repaired or replaced within a shorter or a longer period of time. Incorporating this characteristic could enhance the performance of the RUL prediction model. In this paper, we have proposed an RUL prediction algorithm that combines a recurrent neural network (RNN) and long short-term memory (LSTM). The former has the advantage in short-term prediction, whereas the latter performs better in long-term prediction. The weights used to combine the RNN and LSTM were designed by the non-dominated sorting genetic algorithm II (NSGA-II). The algorithm achieved an average RMSE of 17.2. It improved the RMSE by 6.07–14.72% compared with the baseline models, stand-alone RNN and stand-alone LSTM. Compared with existing works, the RMSE improvement of the proposed work is 12.95–39.32%.
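The weighted combination of two predictors scored by RMSE can be illustrated with a small sketch. Note that a plain grid search over a single weight is swapped in here for NSGA-II, purely to keep the example short; the function names and the one-weight formulation are assumptions, not the paper's design.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))

def blend(pred_a, pred_b, w):
    """Convex combination: w * pred_a + (1 - w) * pred_b, element-wise."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]

def best_weight(y_true, pred_a, pred_b, steps=100):
    """Grid search (a stand-in for NSGA-II) for the RMSE-minimising weight."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda w: rmse(y_true, blend(pred_a, pred_b, w)))
```

In practice the weight would be chosen on a validation split rather than on the data used for the final evaluation.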


2015 ◽  
Vol 137 (9) ◽  
Author(s):  
Taeyong Sim ◽  
Hyunbin Kwon ◽  
Seung Eel Oh ◽  
Su-Bin Joo ◽  
Ahnryul Choi ◽  
...  

In general, the three-dimensional ground reaction forces (GRFs) and ground reaction moments (GRMs) that occur during human gait are measured using force plates, which are expensive and have spatial limitations. Therefore, we proposed a prediction model for GRFs and GRMs that uses only plantar pressure information measured from insole pressure sensors, together with a wavelet neural network (WNN) and principal component analysis-mutual information (PCA-MI). The prediction model estimated GRFs and GRMs for three different gait speeds (slow, normal, and fast groups) and for healthy and pathological gait patterns (healthy and adolescent idiopathic scoliosis (AIS) groups). Model performance was validated using correlation coefficients (r) and the normalized root mean square error (NRMSE%) and was compared to the prediction accuracy of previous methods using the same dataset. The performance of the GRF and GRM prediction model proposed in this study (slow group: r = 0.840–0.989 and NRMSE% = 10.693–15.894%; normal group: r = 0.847–0.988 and NRMSE% = 10.920–19.216%; fast group: r = 0.823–0.953 and NRMSE% = 12.009–20.182%; healthy group: r = 0.836–0.976 and NRMSE% = 12.920–18.088%; and AIS group: r = 0.917–0.993 and NRMSE% = 7.914–15.671%) was better than that of the prediction models suggested in previous studies for every group and component (p < 0.05 or 0.01). The results indicated that the proposed model has improved performance compared to previous prediction models.
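The two validation measures named above can be computed as in the following sketch. The abstract does not state how the RMSE was normalized; dividing by the range of the measured signal is an assumption made here, and the function names are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def nrmse_percent(measured, predicted):
    """RMSE as a percentage of the measured range (assumed normalization)."""
    rmse = math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted))
                     / len(measured))
    return 100.0 * rmse / (max(measured) - min(measured))
```

A prediction with a constant offset is a useful check: it correlates perfectly with the measurement (r = 1) yet still has a non-zero NRMSE%, which is why both measures are reported together.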


Filomat ◽  
2018 ◽  
Vol 32 (5) ◽  
pp. 1499-1506 ◽  
Author(s):  
Yangwu Zhang ◽  
Guohe Li ◽  
Heng Zong

Dimensionality reduction, including feature extraction and selection, is one of the key points for text classification. In this paper, we propose a mixed method of dimensionality reduction that combines principal components analysis (PCA) with the selection of components. Principal components analysis is a method of feature extraction. Not all of the components in principal components analysis contribute to classification, because the objective of PCA is not a form of discriminant analysis (see, e.g. Jolliffe, 2002). In this context, we present a component selection function, which returns the components that are useful for classification according to performance indicators measured on different subsets of the components. Compared to traditional methods of feature selection, SVM classifiers trained on the selected components show improved classification performance and a reduction in computational overhead.


Chronic renal syndrome is defined as a progressive loss of renal function over time. Researchers have made efforts to identify the risk factors that may affect the progression of chronic renal syndrome. The aim of this project is to develop a prediction model for stage 4 CKD patients that detects whether their estimated Glomerular Filtration Rate (eGFR) will fall below 15 ml/min/1.73 m², that is, end-stage renal disease, six months after their final lab test observation, by assessing time-related aspects. Data mining algorithms are combined with Temporal Abstraction (TA) to strengthen the prediction models of CKD progression. In this work, a total of 112 chronic renal disease patients are included, with data from April 1952 to September 2011 extracted from the patients’ Electronic Medical Records (EMR). The information on chronic renal patients is collected in a large spatial info-graphic dataset. To analyse these data, it is important to detect the factors affecting CKD deterioration, which is a challenging task. To overcome this challenge, time series graphs have been generated in this project based on creatinine and albumin lab test values and reports over the time period. The CKD diagnostic codes are transformed into the default seven-digit format of the International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM). Feature selection is performed with a wrapper method using a genetic algorithm, which helps to find the most relevant variables for a predictive model. High Utility Sequential Rule Miner (HUSRM) is used to discover CKD sequential rules based on sequence patterns. Temporal Abstraction (TA) techniques, namely basic TA and complex TA, are used to analyse the status of chronic renal syndrome patients. Classification and Regression Tree (CART), along with Adaptive Boosting (AdaBoost) and Support Vector Machine Boosting (SVMBoost), are applied to develop the CKD progression prediction models and to find which model gives the most accurate prediction. The results showed that incorporating temporal observations into the prognostic instances increased their efficacy. Finally, evaluation metrics, namely accuracy, sensitivity, specificity, positive likelihood, negative likelihood and Area Under the Curve (AUC), are used to evaluate the performance of the prediction models designed and implemented in this project.
Key Words: CKD, progression, time series data, genetic algorithm, sequential rules, TA classification and prediction model.
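Most of the evaluation metrics listed above follow directly from a binary confusion matrix, as in the sketch below. It assumes that "positive likelihood" and "negative likelihood" mean the positive and negative likelihood ratios; the function name and argument names are illustrative. AUC is omitted because it needs ranked scores rather than counts.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute common diagnostic metrics from confusion-matrix counts.

    Assumes both classes are present and that specificity is strictly
    between 0 and 1, so that neither likelihood ratio divides by zero.
    """
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }
```

A positive likelihood ratio well above 1 and a negative likelihood ratio well below 1 both indicate a test that shifts the odds of progression meaningfully.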


2020 ◽  
Vol 10 (4) ◽  
pp. 280-292
Author(s):  
Allemar Jhone P. Delima

The k-nearest neighbor (KNN) algorithm is vulnerable to noise in the dataset, which negatively affects its accuracy. Hence, various researchers apply variable minimization techniques before KNN prediction to improve its predictive capability. The genetic algorithm (GA) is the most widely used metaheuristic for this purpose; however, the GA has the limitation that its mating scheme is bounded by its crossover operator. Thus, the novel inversed bi-segmented average crossover (IBAX) is used. In the present work, the crossover improved genetic algorithm (CIGAL) is instrumental in enhancing KNN’s prediction accuracy. The unmodified genetic algorithm removed 13 variables, while the CIGAL went further and removed 20 of the 30 total variables in the faculty evaluation dataset. Consequently, integrating the CIGAL with the KNN (CIGAL-KNN) improves the KNN prediction accuracy to 95.53%. In contrast, the model with the unmodified genetic algorithm (GA-KNN) and the KNN algorithm alone achieve prediction accuracies of only 89.94% and 87.15%, respectively. To validate the accuracy of the models, 10-fold cross-validation was used, which gave prediction accuracies of 93.13%, 89.27%, and 87.77% for the CIGAL-KNN, GA-KNN, and KNN prediction models, respectively. As a result, the CIGAL optimized GA performance and increased the accuracy of the KNN algorithm as a prediction model.
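A wrapper-style evaluation of a variable subset, of the kind a GA would call as its fitness function, might look like the sketch below: KNN is scored by k-fold cross-validation using only the variables kept by a 0/1 mask. The function names, the squared Euclidean distance, and the interleaved fold assignment are assumptions for illustration. The IBAX crossover itself is not shown because the abstract does not describe it.

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training rows closest to x."""
    dists = sorted(
        ((sum((a - b) ** 2 for a, b in zip(row, x)), label)
         for row, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def kfold_accuracy(X, y, mask, n_folds=10, k=3):
    """Cross-validated KNN accuracy using only variables where mask is 1."""
    Xs = [[v for v, keep in zip(row, mask) if keep] for row in X]
    correct = 0
    for fold in range(n_folds):
        # Interleaved folds: every n_folds-th row goes to the test split.
        test_idx = set(range(fold, len(Xs), n_folds))
        train_X = [Xs[i] for i in range(len(Xs)) if i not in test_idx]
        train_y = [y[i] for i in range(len(Xs)) if i not in test_idx]
        correct += sum(knn_predict(train_X, train_y, Xs[i], k) == y[i]
                       for i in test_idx)
    return correct / len(Xs)
```

A GA would then search over masks to maximise `kfold_accuracy`, possibly with a penalty on the number of retained variables.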

