Penalized logistic regression for classification and feature selection with its application to detection of two official species of Ganoderma

Abstract: Despite numerous efforts to predict suicide risk in children, the ability to reliably identify youth who will engage in suicidal thoughts or behaviors (STB) remains remarkably limited. To further knowledge in this area, we apply a novel machine learning approach and examine whether children with STB could be differentiated from children without STB based on a combination of sociodemographic, physical health, social environmental, clinical psychiatric, cognitive, biological and genetic characteristics. The study sample included 5,885 unrelated children (50% female, 67% white) between 9 and 11 years old from the Adolescent Brain Cognitive Development (ABCD) study. Both parents and youth reported on children’s STB and, based on these reports, we divided children into three subgroups: (1) children with current or past STB, (2) children with a psychiatric disorder but no STB (clinical controls) and (3) healthy control children. We performed binomial penalized logistic regression analysis to distinguish between groups. The analyses were performed separately for child-reported STB and parent-reported STB. Results showed that we were able to distinguish the STB group from healthy controls and clinical controls (area under the receiver operating characteristic curve (AUROC) range: 0.79-0.81 and 0.70-0.78, respectively). However, we could not distinguish children with suicidal ideation from those who attempted suicide (AUROC range: 0.49-0.59). Factors that differentiated the STB group from the clinical control group included family conflict, prodromal psychosis symptoms, impulsivity, depression severity and a history of mental health treatment. Future research is needed to determine whether these variables prospectively predict subsequent suicidal behavior.
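The AUROC values quoted in this abstract can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen control, so a range of 0.49-0.59 is close to chance (0.5). The following is a minimal sketch of that pairwise definition, not the study's own code; the function name `auroc` and the toy labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive
    example has the higher score; tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = case, 0 = control.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.8, 0.1]
toy_auc = auroc(toy_labels, toy_scores)  # 3 of 4 pairs are ranked correctly
```

The quadratic pairwise loop is chosen for readability; for large samples a rank-based computation gives the same value more efficiently.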


Sublinear Algorithms for Penalized Logistic Regression in Massive Datasets

Machine Learning and Knowledge Discovery in Databases - Lecture Notes in Computer Science ◽

10.1007/978-3-642-33460-3_41 ◽

2012 ◽

pp. 553-568 ◽

Cited By ~ 3

Author(s):

Haoruo Peng ◽

Zhengyu Wang ◽

Edward Y. Chang ◽

Shuchang Zhou ◽

Zhihua Zhang

Keyword(s):

Logistic Regression ◽

Sublinear Algorithms ◽

Massive Datasets ◽

Penalized Logistic Regression


Predicting the Level of Tumor-Infiltrating Lymphocytes in Patients With Breast Cancer: Usefulness of Mammographic Radiomics Features

Frontiers in Oncology ◽

10.3389/fonc.2021.628577 ◽

2021 ◽

Vol 11 ◽

Author(s):

Hongwei Yu ◽

Xianqi Meng ◽

Huang Chen ◽

Jian Liu ◽

Wenwen Gao ◽

...

Keyword(s):

Breast Cancer ◽

Logistic Regression ◽

Feature Selection ◽

Regression Analysis ◽

Cancer Patients ◽

Training Dataset ◽

Validation Dataset ◽

Gray Level ◽

Breast Cancer Patients ◽

Short Run

Objectives: This study aimed to investigate whether radiomics classifiers from mammography can help predict tumor-infiltrating lymphocyte (TIL) levels in breast cancer. Methods: Data from 121 consecutive patients with pathologically proven breast cancer who underwent preoperative mammography from February 2018 to May 2019 were retrospectively analyzed. Patients were randomly divided into a training dataset (n = 85) and a validation dataset (n = 36). A total of 612 quantitative radiomics features were extracted from mammograms using the Pyradiomics software. Radiomics features were selected and the radiomics classifier was built using recursive feature elimination and a logistic regression model. The relationship between radiomics features and TIL levels in breast cancer patients was explored. The predictive capacity of the radiomics classifiers for the TIL levels was investigated through receiver operating characteristic curves in the training and validation groups. A radiomics score (Rad score) was computed for the training and validation datasets using logistic regression analysis, and the Mann–Whitney U test was used to compare the low and high TIL groups. Results: Among the 121 patients, 32 (26.44%) exhibited high TIL levels, and 89 (73.56%) showed low TIL levels. The ER negativity (p = 0.01) and the Ki-67 negative threshold level (p = 0.03) in the low TIL group were higher than those in the high TIL group. Through the radiomics feature selection, six top-class features [Wavelet GLDM low gray-level emphasis (mediolateral oblique, MLO), GLRLM short-run low gray-level emphasis (craniocaudal, CC), LBP2D GLRLM short-run high gray-level emphasis (CC), LBP2D GLDM dependence entropy (MLO), wavelet interquartile range (MLO), and LBP2D median (MLO)] were selected to constitute the radiomics classifiers. The radiomics classifier had an excellent predictive performance for TIL levels both in the training and validation sets [area under the curve (AUC): 0.83, 95% confidence interval (CI), 0.738–0.917, with positive predictive value (PPV) of 0.913; AUC: 0.79, 95% CI, 0.615–0.964, with PPV of 0.889, respectively]. Moreover, the Rad score in the training dataset was higher than that in the validation dataset (p = 0.007 and p = 0.001, respectively). Conclusion: Radiomics from digital mammograms not only predicts the TIL levels in breast cancer patients, but can also serve as non-invasive biomarkers in precision medicine, allowing for the development of treatment plans.


Identification of Bio-Markers for Breast Cancer Detection through Data Mining Methods

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1141.0782s319 ◽

2019 ◽

Vol 8 (2S3) ◽

pp. 763-769

Keyword(s):

Breast Cancer ◽

Support Vector Machine ◽

Logistic Regression ◽

Feature Selection ◽

Discriminant Analysis ◽

Classification Tree ◽

Partial Least Square ◽

Diagnostic Methods ◽

Support Vector ◽

Breast Cancer Dataset

Worldwide, breast cancer is the leading type of cancer in women, accounting for 25% of all cases. Survival rates in developed countries are higher than those in developing countries. This has increased the importance of computer-aided diagnostic methods for early detection of breast cancer, which in turn reduces the death rate. This paper examines which biomarkers can be used to predict breast cancer from anthropometric data. This experimental study computes and compares various classification models (Binary Logistic Regression, Ball Vector Machine (BVM), C4.5, Partial Least Square (PLS) for Classification, Classification Tree, Cost sensitive Classification Tree, Cost sensitive Decision Tree, Support Vector Machine for Classification, Core Vector Machine, ID3, K-Nearest Neighbor, Linear Discriminant Analysis (LDA), Log-Reg TRIRLS, Multi Layer Perceptron (MLP), Multinomial Logistic Regression (MLR), Naïve Bayes (NB), PLS for Discriminant Analysis, PLS for LDA, Random Tree (RT), Support Vector Machine (SVM)) on the UCI Coimbra breast cancer dataset. Feature selection algorithms (Backward Logit, Fisher Filtering, Forward Logit, ReliefF, Stepdisc) are applied to find the smallest set of attributes that achieves better accuracy. To verify the accuracy results, jackknife cross-validation is performed for each algorithm. The Core Vector Machine classification algorithm outperforms the other nineteen algorithms with an accuracy of 82.76%, sensitivity of 76.92% and specificity of 87.50% using the three attributes (Age, Glucose and Resistin) selected by the ReliefF feature selection algorithm.
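The evaluation described in this abstract (jackknife cross-validation reported as accuracy, sensitivity and specificity) can be sketched as follows, assuming the jackknife procedure corresponds to leave-one-out cross-validation. The nearest-centroid rule is only a stand-in for the classifiers compared in the paper, and the function names and single-feature toy data are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the class (0 or 1) whose training mean is closest to x."""
    means = {}
    for label in (0, 1):
        values = [v for v, y in zip(train_x, train_y) if y == label]
        means[label] = sum(values) / len(values)
    return min(means, key=lambda label: abs(x - means[label]))


def leave_one_out(xs, ys):
    """Hold out each sample once, train on the rest, and tally the
    confusion matrix of the held-out predictions."""
    tp = tn = fp = fn = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        pred = nearest_centroid_predict(train_x, train_y, xs[i])
        if ys[i] == 1:
            if pred == 1:
                tp += 1
            else:
                fn += 1
        else:
            if pred == 0:
                tn += 1
            else:
                fp += 1
    return {
        "accuracy": (tp + tn) / len(xs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical toy data: one feature, 1 = patient, 0 = control.
toy_x = [1.0, 1.5, 2.0, 3.2, 2.4, 3.0, 3.5, 4.0]
toy_y = [0, 0, 0, 0, 1, 1, 1, 1]
metrics = leave_one_out(toy_x, toy_y)
```

On this toy data one control and one patient are misclassified when held out, so all three metrics come out at 0.75.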


Investigating Unique Genes of Five Molecular Subtypes of Breast Cancer Using Penalized Logistic Regression

10.21203/rs.3.rs-249085/v1 ◽

2021 ◽

Author(s):

Sadegh Raoufi ◽

Saeideh Jafarinejad Farsangi ◽

Tania Dehesh ◽

Morteza Hadizadeh

Keyword(s):

Breast Cancer ◽

Logistic Regression ◽

Web Application ◽

Biological Process ◽

Molecular Subtypes ◽

Negative Regulation ◽

Adaptive Lasso ◽

Cell Processes ◽

Penalized Logistic Regression ◽

Unique Genes

Abstract. Background: Breast cancer is the most common cancer and the fifth leading cause of death in women around the world. Exploring genes unique to specific cancers has attracted growing interest. The aim of this study was to explore unique genes of five molecular subtypes of breast cancer in women using penalized logistic regression models. Methods: In this study, microarray data from five independent GEO datasets were combined. The combined data include genetic information from 324 women with breast cancer and 12 healthy women. Lasso logistic regression and adaptive lasso logistic regression were used to extract unique genes. The biological processes of the extracted genes were evaluated in the open-source GOnet web application. R software version 3.6.0 with the glmnet package was used for fitting the models. Results: In total, 119 genes were extracted across fifteen pairwise comparisons. Seventeen genes (14%) overlapped between comparison groups. Among 27 genes involved in positive regulation of cell processes, one gene belonged exclusively to this biological process. Among 46 genes involved in negative regulation of cell processes, 6 genes belonged exclusively to it. Among 50 genes that were significant in regulation of metabolism, 4 genes belonged exclusively to it. Among 32 genes related to response to stress, 4 genes belonged exclusively to it. Conclusions: Most of the genes selected by lasso logistic regression and adaptive lasso logistic regression were involved in negative regulation of cell processes.
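To illustrate how a lasso penalty performs the kind of feature selection described in this abstract, here is a minimal sketch of L1-penalized logistic regression. It uses proximal gradient descent with soft-thresholding, which is a swapped-in technique chosen for brevity and not the algorithm used by glmnet; the adaptive lasso variant is not shown. All names, the penalty values and the toy data are hypothetical.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lasso_logistic(X, y, lam, step=0.1, n_iter=500):
    """Minimize mean logistic loss + lam * sum(|w_j|); intercept unpenalized."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # Gradient step followed by the L1 proximal (soft-threshold) step.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


# Hypothetical toy data: feature 0 separates the classes, while feature 1
# is constructed to carry no class information.
X = [[-2.0, 1.0], [-1.0, -1.0], [-1.5, 1.0], [-0.5, -1.0],
     [2.0, 1.0], [1.0, -1.0], [1.5, 1.0], [0.5, -1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_lasso_logistic(X, y, lam=0.05)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
```

Features whose coefficients remain exactly zero are dropped from the model, which is the sense in which the lasso "selects" genes; raising `lam` shrinks more coefficients to zero.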
