A treelet transform analysis to relate nutrient patterns to the risk of hormonal receptor-defined breast cancer in the European Prospective Investigation into Cancer and Nutrition (EPIC)

2015 ◽  
Vol 19 (2) ◽  
pp. 242-254 ◽  
Author(s):  
Nada Assi ◽  
Aurelie Moskal ◽  
Nadia Slimani ◽  
Vivian Viallon ◽  
Veronique Chajes ◽  
...  

Objective: Pattern analysis has emerged as a tool to depict the role of multiple nutrients/foods in relation to health outcomes. The present study aimed to extract nutrient patterns with respect to breast cancer (BC) aetiology. Design: Nutrient patterns were derived with treelet transform (TT) and related to BC risk. TT was applied to twenty-three log-transformed nutrient densities from dietary questionnaires. Hazard ratios (HR) and 95 % confidence intervals computed using Cox proportional hazards models quantified the association between quintiles of nutrient pattern scores and risk of overall BC, and by hormonal receptor and menopausal status. Principal component analysis was applied for comparison. Setting: The European Prospective Investigation into Cancer and Nutrition (EPIC). Subjects: Women (n 334 850) from the EPIC study. Results: The first TT component (TC1) highlighted a pattern rich in nutrients found in animal foods loading on cholesterol, protein, retinol, vitamins B12 and D, while the second TT component (TC2) reflected a diet rich in β-carotene, riboflavin, thiamin, vitamins C and B6, fibre, Fe, Ca, K, Mg, P and folate. While TC1 was not associated with BC risk, TC2 was inversely associated with BC risk overall (HR for Q5 v. Q1 = 0·89, 95 % CI 0·83, 0·95, P-trend < 0·01) and showed a significantly lower risk in oestrogen receptor-positive (HR for Q5 v. Q1 = 0·89, 95 % CI 0·81, 0·98, P-trend = 0·02) and progesterone receptor-positive tumours (HR for Q5 v. Q1 = 0·87, 95 % CI 0·77, 0·98, P-trend < 0·01). Conclusions: TT produces readily interpretable sparse components explaining similar amounts of variation as principal component analysis. Our results suggest that participants with a nutrient pattern high in micronutrients found in vegetables, fruits and cereals had a lower risk of BC.
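As a rough illustration of the scoring step described above, the sketch below assigns rank-based quintiles to a list of pattern scores. The function name `quintile_labels`, the sample scores and the tie-handling rule are assumptions made for illustration; the abstract does not say how quintile cut-points were defined in EPIC.

```python
def quintile_labels(scores):
    """Assign each score a quintile label 1..5 by rank.

    Assumption: ties are broken by input order (Python's sort is stable);
    the study may have used different cut-point rules.
    """
    n = len(scores)
    # Indices of the scores, ordered from lowest to highest score.
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        # Map rank 0..n-1 onto quintile 1..5.
        labels[i] = rank * 5 // n + 1
    return labels


# Hypothetical pattern scores for ten participants.
scores = [0.3, -1.2, 2.5, 0.0, 1.1, -0.4, 0.8, -2.0, 1.9, 0.5]
labels = quintile_labels(scores)
```

In an analysis like the one described, such labels would then enter a Cox model as a categorical exposure, with the lowest quintile (Q1) as the reference.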

Author(s):  
Ade Jamal ◽  
Annisa Handayani ◽  
Ali Akbar Septiandri ◽  
Endang Ripmiatin ◽  
Yunus Effendi

Breast cancer is a leading cause of death among women. Predicting breast cancer at an early stage increases the chance of a cure. This requires a prediction tool that can classify a breast tumor as either malignant (harmful) or benign (harmless). In this paper, two machine learning algorithms, Support Vector Machine and Extreme Gradient Boosting, are compared for this classification task. Prior to classification, the number of data attributes is reduced by extracting features from the raw data using Principal Component Analysis. A clustering method, K-Means, is also used for dimensionality reduction alongside Principal Component Analysis. The paper compares four models, each combining one of the two dimensionality reduction methods with one of the two classifiers, applied to the Wisconsin Breast Cancer Dataset. The comparison uses accuracy, sensitivity and specificity computed from the confusion matrices. The experimental results indicate that K-Means, which is not usually used for dimensionality reduction, can perform well compared with the popular Principal Component Analysis.
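The three metrics named above can be read directly off a binary confusion matrix. The following is a minimal sketch, assuming malignant is coded as the positive class (1) and benign as 0; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def summarize(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy labels for illustration only: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = summarize(*confusion_counts(y_true, y_pred))
```

Sensitivity matters most when a missed malignant tumor is the costly error, which is why it is usually reported alongside accuracy in this setting.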


Author(s):  
Zuhaira Muhammad Zain ◽  
Mona Alshenaifi ◽  
Abeer Aljaloud ◽  
Tamadhur Albednah ◽  
Reham Alghanim ◽  
...  

Breast cancer recurrence is among the most noteworthy fears faced by women. Nevertheless, with modern innovations in data mining technology, early recurrence prediction can help relieve these fears. Although medical information is typically complicated, and simplifying searches to the most relevant input is challenging, new sophisticated data mining techniques promise accurate predictions from high-dimensional data. In this study, the performances of three established data mining algorithms: Naïve Bayes (NB), k-nearest neighbor (KNN), and fast decision tree (REPTree), adopting the feature extraction algorithm, principal component analysis (PCA), for predicting breast cancer recurrence were contrasted. The comparison was conducted between models built in the absence and presence of PCA. The results showed that KNN produced better prediction without PCA (F-measure = 72.1%), whereas the other two techniques: NB and REPTree, improved when used with PCA (F-measure = 76.1% and 72.8%, respectively). This study can benefit the healthcare industry in assisting physicians in predicting breast cancer recurrence precisely.
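For readers unfamiliar with the F-measure used to compare the models above, here is a small sketch of its usual definition as the harmonic mean of precision and recall. The abstract does not state whether the reported values are per-class or averaged across classes, so this single-class version with hypothetical counts is an assumption.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from raw counts; beta=1 gives the standard F1 score."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 30 recurrences found, 10 false alarms, 20 missed.
score = f_measure(tp=30, fp=10, fn=20)
```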


2018 ◽  
Vol 32 (10) ◽  
pp. e3053
Author(s):  
Yocanxóchitl Perfecto-Avalos ◽  
Raquel Cuevas-Díaz Durán ◽  
Luis Villela ◽  
Alejandro Garcia-Gonzalez ◽  
Ricardo Javier Díaz-Domínguez ◽  
...  

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Heping Li ◽  
Yu Ren ◽  
Fan Yu ◽  
Dongliang Song ◽  
Lizhe Zhu ◽  
...  

To improve the reliability of Raman-based tumor detection and analytical methodologies, an ex vivo Raman spectral investigation was conducted to identify distinct compositional information of healthy (H), ductal carcinoma in situ (DCIS), and invasive ductal carcinoma (IDC) tissue. Then, principal component analysis-linear discriminant analysis (PCA-LDA) and principal component analysis-support vector machine (PCA-SVM) models were constructed for distinguishing spectral features among the different tissue groups. Spectral analysis highlighted differences in levels of unsaturated and saturated lipids, carotenoids, protein, and nucleic acid between healthy and cancerous tissue, and variations in the levels of nucleic acid, protein, and phenylalanine between DCIS and IDC. Both classification models were found to be extremely efficient at discriminating tissue pathological types, with 99% accuracy for PCA-LDA and 100%, 100%, and 96.7% for PCA-SVM analysis based on linear kernel, polynomial kernel, and radial basis function (RBF), respectively, while the PCA-SVM algorithm greatly reduced the complexity of calculation without sacrificing performance. The present study demonstrates that Raman spectroscopy combined with multivariate analysis technology has considerable potential for improving the efficiency and performance of breast cancer diagnosis.


Author(s):  
Anupam Sen

Machine Learning (ML) techniques play an important role in the medical field. Early diagnosis is required to improve the treatment of carcinoma. In this analysis, the Breast Cancer Coimbra dataset (BCCD), with ten predictors, is analyzed to classify carcinoma. Feature selection methods and machine learning algorithms are applied to the dataset, obtained from the UCI repository. The WEKA (“Waikato Environment for Knowledge Analysis”) tool is used for the machine learning techniques, and Principal Component Analysis (PCA) is used for feature extraction. Different machine learning classification algorithms are applied through WEKA, such as Glmnet, Gbm, ada Boosting, Adabag Boosting, C50, Cforest, DcSVM, fnn, Ksvm and Node Harvest, and they are compared on accuracy and on values such as the Kappa statistic, Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). The 10-fold cross-validation method is used for training, testing and validation.
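The 10-fold cross-validation mentioned above can be sketched as a simple index partition. This is not WEKA's implementation: the helper name `k_fold_indices`, the seed and the sample count of 116 are illustrative assumptions, and the sketch does not stratify folds by class.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Every sample appears in exactly one test fold. Folds are not
    stratified by class in this simplified sketch.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Illustrative sample count, not a figure taken from the abstract.
splits = list(k_fold_indices(116, k=10))
```

Each model would be fitted on `train` and scored on `test` ten times, with the reported accuracy, Kappa, MAE and RMSE averaged over the folds.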


2021 ◽  
Vol 41 (3) ◽  
pp. 1229-1238
Author(s):  
Minghao Kou ◽  
Ning Ding ◽  
Shoshana H. Ballew ◽  
Maya J. Salameh ◽  
Seth S. Martin ◽  
...  

Objective: The aim of this study was to comprehensively assess the association of multiple lipid measures with incident peripheral artery disease (PAD). Approach and Results: We used Cox proportional hazards models to characterize the associations of each of the fasting lipid measures (total cholesterol, LDL-C [low-density lipoprotein cholesterol], HDL-C [high-density lipoprotein cholesterol], triglycerides, RLP-C [remnant lipoprotein cholesterol], LDL-TG [LDL-triglycerides], sdLDL-C [small dense LDL-C], and Apo-E-HDL [Apo-E-containing HDL-C]) with incident PAD identified by pertinent International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) hospital discharge codes (eg, 440.2) among 8330 Black and White ARIC (Atherosclerosis Risk in Communities) participants (mean age 62.8 [SD 5.6] years) free of PAD at baseline (1996–1998) through 2015. Since lipid traits are biologically correlated with each other, we also conducted principal component analysis to identify underlying components for PAD risk. There were 246 incident PAD cases with a median follow-up of 17 years. After accounting for potential confounders, the following lipid measures were significantly associated with PAD (hazard ratio per 1-SD increment [decrement for HDL-C and Apo-E-HDL]): triglycerides, 1.21 (95% CI, 1.08–1.36); RLP-C, 1.18 (1.08–1.29); LDL-TG, 1.18 (1.05–1.33); HDL-C, 1.39 (1.16–1.67); and Apo-E-HDL, 1.27 (1.07–1.51). The principal component analysis identified 3 components (1: mainly loaded by triglycerides, RLP-C, LDL-TG, and sdLDL-C; 2: by HDL-C and Apo-E-HDL; and 3: by LDL-C and RLP-C). Components 1 and 2 showed independent associations with incident PAD. Conclusions: Triglyceride-related and HDL-related lipids were independently associated with incident PAD, which has implications for preventive strategies for PAD. However, none of the novel lipid measures outperformed conventional ones. Graphic Abstract: A graphic abstract is available for this article.
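A reported hazard ratio and its 95% CI can be converted back to an approximate standard error on the log scale, which is a handy consistency check when reading results like those above. The sketch assumes the interval is a symmetric Wald interval on log(HR), which the abstract does not state; the function names are hypothetical.

```python
import math


def log_se_from_ci(lower, upper, z_crit=1.96):
    """Approximate SE of log(HR) from a 95% CI, assuming a Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


def wald_z(hr, lower, upper, z_crit=1.96):
    """Approximate Wald z statistic for the null hypothesis HR = 1."""
    return math.log(hr) / log_se_from_ci(lower, upper, z_crit)


# Triglycerides, as reported in the abstract: HR 1.21 (95% CI, 1.08-1.36).
se = log_se_from_ci(1.08, 1.36)
z = wald_z(1.21, 1.08, 1.36)
```

Because published estimates are rounded to two decimals, values recovered this way are approximate and should be treated only as a plausibility check.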

