Group Penalized Logistic Regressions Predict Ovarian Cancer

Objectives: Ovarian cancer ranks first among gynecological cancers in mortality rate, so accurately distinguishing benign from malignant ovarian tumors is of great importance. The goal of this paper is to combine group LASSO/SCAD/MCP penalized logistic regression with a machine learning procedure to further improve prediction accuracy for benign versus malignant ovarian tumors. Methods: We combine the group LASSO/SCAD/MCP penalty with logistic regression and propose group LASSO/SCAD/MCP penalized logistic regression to predict whether ovarian tumors are benign or malignant. First, we select data from 349 ovarian cancer patients and divide them into two sets, a training set for learning and a testing set for checking; we then choose 46 explanatory variables and divide them into 11 groups. Second, we use the training set and a group coordinate descent algorithm to obtain the group LASSO/SCAD/MCP estimators, and use the testing set to compute the confusion matrix, accuracy, sensitivity and specificity. Finally, we compare the prediction performance of group LASSO/SCAD/MCP penalized logistic regression with that of an artificial neural network (ANN) and a support vector machine (SVM). Results: Group LASSO/SCAD/MCP penalized logistic regression selects 6/4/1 groups, respectively. The prediction accuracy and AUC of group MCP/SCAD/LASSO penalized logistic regression/SVM/ANN are 93.33%/85.71%/82.26%/74.29%/72.38% and 0.892/0.852/0.823/0.639/0.789, respectively. Conclusions: Group MCP/SCAD/LASSO penalized logistic regression performs better than SVM and ANN in terms of prediction accuracy and AUC. In particular, group MCP penalized logistic regression predicts best. We therefore suggest group MCP penalized logistic regression for predicting ovarian tumors.
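To make the group penalty concrete, here is a minimal NumPy sketch of group LASSO penalized logistic regression. It is not the paper's method: it swaps in a plain proximal-gradient solver (block soft-thresholding of each group) instead of group coordinate descent, and the synthetic data, group layout, sqrt(group size) weights, step size and lambda are all hypothetical. Group SCAD and MCP would replace the block soft-thresholding step with their own group-wise thresholding rules. The confusion-matrix helper shows how accuracy, sensitivity and specificity are read off the testing set.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrink the whole group, or zero it if ||v|| <= t."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def fit_group_lasso_logistic(X, y, groups, lam, step=0.1, n_iter=2000):
    """Proximal gradient for (1/n) * log-loss + lam * sum_g sqrt(p_g) * ||beta_g||_2.
    The intercept b0 is not penalized. Hypothetical sketch, not group coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    b0 = 0.0
    for _ in range(n_iter):
        resid = sigmoid(b0 + X @ beta) - y      # d(avg log-loss)/d(linear predictor)
        beta = beta - step * (X.T @ resid / n)  # gradient step on the coefficients
        b0 = b0 - step * resid.mean()           # gradient step on the intercept
        for g in np.unique(groups):
            idx = groups == g
            weight = np.sqrt(idx.sum())         # assumed sqrt(group size) weight
            beta[idx] = group_soft_threshold(beta[idx], step * lam * weight)
    return b0, beta

def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical synthetic data: 3 groups of 2 features; only group 0 carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
groups = np.array([0, 0, 1, 1, 2, 2])
true_beta = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0])
y = (rng.random(300) < sigmoid(X @ true_beta)).astype(int)

# Training set for fitting, testing set for evaluation.
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]
b0, beta = fit_group_lasso_logistic(X_train, y_train, groups, lam=0.1)
y_hat = (sigmoid(b0 + X_test @ beta) >= 0.5).astype(int)
print("selected groups:", sorted({int(g) for g in groups[beta != 0]}))
print(confusion_metrics(y_test, y_hat))
```

Because the penalty acts on whole groups, a group is either kept or dropped as a unit, which is what makes statements such as "selects 6/4/1 groups" meaningful.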


DOI: 10.21203/rs.3.rs-1223870/v1; 2022

Author(s): Ying Xie

Keyword(s): Ovarian Cancer, Logistic Regression, Prediction Accuracy, Malignant Tumors, Group Lasso, Benign Tumors, Training Set, Combine Group, Penalized Logistic Regression, Testing Set


A novel decision tree model based on chromosome imbalances in cell-free DNA and CA-125 in the differential diagnosis of ovarian cancer

The International Journal of Biological Markers; DOI: 10.1177/1724600821992356; 2021; pp. 172460082199235

Author(s): Weina Zhang, Yu-min Zhang, Yuan Gao, Shengmiao Zhang, Weixin Chu, ...

Keyword(s): Ovarian Cancer, Differential Diagnosis, Decision Tree, Copy Number, Malignant Tumors, Copy Number Variations, CA-125, Benign Tumors, Cell-Free DNA, Free DNA

Objective: CA-125 is widely used as a biomarker of ovarian cancer, but its accuracy is low. We developed a hybrid analytical model, the Ovarian Cancer Decision Tree (OCDT), a two-layer decision tree that combines genetic alteration information from cell-free DNA (cfDNA) with the CA-125 value to distinguish malignant from benign tumors. Methods: Major copy number alterations at the whole-chromosome and chromosome-arm level were used as the main feature of the detection model. Fifty-eight patients diagnosed with malignant tumors, 66 with borderline tumors, and 10 with benign tumors were enrolled. Results: Genetic analysis revealed significant arm-level imbalances in most malignant tumors, especially in high-grade serous cancers, in which 12 chromosome arms with significant aneuploidy (P < 0.01) were identified: 7 with significant gains and 5 with significant losses. The area under the receiver operating characteristic curve (AUC) was 0.8985 for copy number variation analysis, compared with 0.8751 for CA-125. The OCDT used a cancerous score (CScore) threshold of 5.18 at the first level and a CA-125 value of 103.1 at the second level. The best-performing OCDT model achieved an AUC of 0.975. Conclusions: Genetic variations extracted from cfDNA can be combined with CA-125 to improve the differential diagnosis of malignant versus benign ovarian tumors. The model could aid the pre-operative assessment of women with adnexal masses. Future clinical trials are needed to further evaluate the value of the CScore in clinical settings and to find the optimal threshold for malignancy detection.
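The two-level structure of the OCDT can be illustrated with a small rule function. This is a hypothetical sketch: the abstract reports only the two cut-offs (CScore 5.18, CA-125 103.1) and does not state the branch directions or CA-125 units, so the assumption here is that a CScore at or above 5.18 is called malignant at the first level, and otherwise a CA-125 value at or above 103.1 is called malignant at the second level. Borderline tumors, which the study enrolled, are not modeled.

```python
from dataclasses import dataclass

CSCORE_CUTOFF = 5.18   # first-level threshold reported in the abstract
CA125_CUTOFF = 103.1   # second-level threshold reported in the abstract (units not stated)

@dataclass
class Patient:
    cscore: float  # cancerous score from cfDNA copy-number analysis
    ca125: float   # serum CA-125 value

def ocdt_predict(p: Patient) -> str:
    """Hypothetical two-level rule; the branch directions are assumptions."""
    # Level 1: a high copy-number score alone flags malignancy.
    if p.cscore >= CSCORE_CUTOFF:
        return "malignant"
    # Level 2: otherwise fall back to CA-125.
    if p.ca125 >= CA125_CUTOFF:
        return "malignant"
    return "benign"

# Made-up example patients, not study data.
for patient in [Patient(6.0, 20.0), Patient(3.2, 250.0), Patient(2.1, 35.0)]:
    print(patient, "->", ocdt_predict(patient))
```

The design idea is that the genetic score decides first and CA-125 is consulted only for cases the first level does not flag.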


Surface epithelial tumors of ovary - an analysis in a tertiary referral hospital

Journal of Pathology of Nepal; DOI: 10.3126/jpn.v3i5.7868; 2013; Vol 3 (5); pp. 397-402; Cited by ~1

Author(s): D Ghartimagar, A Ghosh, G KC, S Ranabhat, OP Talwar

Keyword(s): Ovarian Cancer, Malignant Tumors, Data Bank, Serous Cystadenoma, Ovarian Tumors, Benign Tumors, Tertiary Referral, Tertiary Referral Hospital, Tertiary Referral Centre, Epithelial Tumors

Background: Ovarian cancer accounts for 3% of all cancers in females. About 80% of ovarian tumors are benign, and these occur mostly in young women between 20 and 45 years of age. Borderline tumors occur at slightly older ages, while the incidence of malignant tumors increases with age, with most occurring in perimenopausal and postmenopausal women. About 190,000 new cases of ovarian cancer and 114,000 deaths from the disease are estimated to occur annually worldwide. The aim of the study was to find the incidence of surface epithelial ovarian tumors in a tertiary referral centre. Materials and methods: This retrospective study was carried out in the Department of Pathology, Manipal Teaching Hospital, from January 2001 to December 2012. Specimens were received from the same and other hospitals. Records were retrieved from the departmental data bank and analyzed. Results: A total of 310 cases of ovarian tumors were reported in this period. Of these, 180 cases were of surface epithelial origin, of which 24 had bilateral tumors. Benign tumors comprised 148 cases, 6 were borderline and 44 were malignant. The most common tumor was serous cystadenoma (98 cases) and the least common was malignant Brenner tumor (2 cases). Combined or mixed tumors were seen in 9 cases. Conclusion: In our study, surface epithelial tumors comprised 58% of all ovarian tumors. In both benign and malignant cases, serous tumors were the most common, followed by mucinous tumors.


Significance of serum colony-stimulating factor-1 as a breast cancer marker

Journal of Clinical Oncology; DOI: 10.1200/jco.2009.27.15_suppl.11071; 2009; Vol 27 (15_suppl); pp. 11071-11071

Author(s): S. Aharinejad, A. Thomas, C. Singer, E. Kubista, P. Paulus, ...

Keyword(s): Breast Cancer, Logistic Regression, Tumor Size, Malignant Disease, Operating Characteristic, Malignant Tumors, Benign Tumors, Colony Stimulating Factor, Cancer Marker, Stimulating Factor

Background: A specific and sensitive biomarker that indicates the presence of breast cancer is highly desirable, yet available markers are of limited value. Colony-stimulating factor-1 (CSF-1) is involved in mammary gland development and mediates breast cancer progression. Earlier work indicated a correlation between serum CSF-1 and breast cancer stage, and a recent report suggests that CSF-1 is a potential breast cancer marker; however, the data reported so far await validation. Methods: In a prospective study of 799 women with no history of malignant disease undergoing surgery, serum CSF-1 levels were measured with a commercially available ELISA. In this cohort, 312 patients had breast cancer and 487 age-matched women had benign tumors. Tumor size, nodal and metastasis status, histological tumor type, hormone receptor and human epidermal growth factor receptor 2 (HER2) status, and menopausal status were evaluated. Mean serum CSF-1 concentrations were compared between patient groups with the non-parametric Wilcoxon two-sample and Kruskal-Wallis tests. The area under the receiver operating characteristic curve was calculated by logistic regression. Results: Mean serum CSF-1 concentrations were significantly higher in patients with malignant tumors (502±429 pg/mL) than in those with benign tumors (382±344 pg/mL) (p<0.0001, Wilcoxon). Increased CSF-1 concentrations were significantly related to malignant versus non-malignant disease in logistic regression and receiver operating characteristic analysis (p<0.0001, AUC=0.6). Increased CSF-1 levels in patients with malignant tumors were associated with postmenopausal (p=0.0038) but not premenopausal (p=0.94) status (Wilcoxon). Serum CSF-1 concentrations did not correlate significantly with tumor size, nodal and metastasis status, or hormone receptor and HER2 status (Kruskal-Wallis). Conclusions: Our data suggest that serum CSF-1 could serve as a breast cancer marker in postmenopausal women. Although its serum levels are not related to breast cancer stage at diagnosis, they might be useful for breast cancer screening in postmenopausal women.
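A logistic regression with a single predictor is a monotone transform of that predictor (assuming a positive coefficient), so its AUC equals the AUC of the raw CSF-1 values, which is the Mann-Whitney statistic divided by the number of malignant-benign pairs. The sketch below computes it directly over all pairs; the concentrations are made-up illustrations, not study data.

```python
def auc_mann_whitney(pos, neg):
    """AUC = P(X_pos > X_neg) + 0.5 * P(X_pos == X_neg), counted over all pairs."""
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))

# Hypothetical serum CSF-1 values (pg/mL); not taken from the study.
malignant = [510, 430, 620, 380, 700]
benign = [350, 400, 290, 380, 450]
print(auc_mann_whitney(malignant, benign))
```

An AUC of 0.6, as reported, means a randomly chosen malignant case has a higher CSF-1 value than a randomly chosen benign case only about 60% of the time, which is modest discrimination.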


Malignant Versus Benign Tumors of the Sinonasal Cavity: A Case-Control Study on Occupational Etiology

International Journal of Environmental Research and Public Health; DOI: 10.3390/ijerph15122887; 2018; Vol 15 (12); pp. 2887; Cited by ~1

Author(s): Enzo Emanuelli, Vera Comiati, Diego Cazzador, Gloria Schiavo, Enrico Alexandre, ...

Keyword(s): Logistic Regression, Clinical Setting, Textile Industry, Malignant Tumors, Case Control, Benign Tumors, Control Group, Case Control Studies, Occupational Risk Factors, Increased Risk

Case-control studies on malignant sinonasal tumors and occupational risk factors are generally weakened by non-occupational confounders and by the difficulty of selecting suitable controls. This study aimed to confirm the association between sinonasal malignant tumors and patients' occupations, using patients with sinonasal inverted papillomas (SNIPs) as a control group. Thirty-two patients with adenocarcinoma (ADC) and 21 with non-adenocarcinoma epithelial tumors (NAETs) were compared with 65 patients diagnosed with SNIPs. All patients were recruited in the same clinical setting between 2004 and 2016. A questionnaire was used to collect information on non-occupational factors (age, sex, smoking, allergies, and chronic sinusitis) and occupations (wood- and leather-related occupations, textile industry, metal working). Odds ratios (OR) with 95% confidence intervals (CI) for selected occupations were obtained by multinomial and exact logistic regression. Among the three groups, SNIP patients were significantly younger than ADC patients (p = 0.026). The risk of NAET was increased in woodworkers (OR = 9.42; CI = 1.94–45.6) and metal workers (OR = 5.65; CI = 1.12–28.6). The risk of ADC was increased in wood workers (OR = 86.3; CI = 15.2–488) and leather workers (OR = 119.4; CI = 11.3–1258). In the exact logistic regression, the OR associated with the textile industry was 9.32 (95% CI = 1.10–Inf) for ADC and 7.21 (95% CI = 0.55–Inf) for NAET. Comparing sinonasal malignant tumors with controls recruited from the same clinical setting demonstrated an increased risk associated with multiple occupations. Well-matched samples of cases and controls reduced confounding bias and increased the strength of the association.
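For orientation, an unadjusted odds ratio and its Wald (Woolf) 95% confidence interval can be computed from a 2×2 exposure-by-outcome table as sketched below. This is a simpler technique than the multinomial and exact logistic regression the study used, and the counts are hypothetical. The Wald interval is undefined when a cell is zero, which is one reason exact methods, and open upper limits such as "Inf", appear in small case-control samples.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """a: exposed cases, b: unexposed cases, c: exposed controls, d: unexposed controls.
    Returns (OR, lower, upper) using the Woolf standard error of log(OR)."""
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: Wald interval undefined; use an exact method")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper

# Hypothetical counts: 10 exposed / 22 unexposed cases, 3 exposed / 62 unexposed controls.
print(odds_ratio_ci(10, 22, 3, 62))
```

With few exposed controls the interval becomes very wide, mirroring the wide intervals (for example 11.3–1258) reported above.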


An Investigation of the Cognitive and Linguistic Factors Influencing Chinese Readers’ Perception of Sentence Boundaries in Mandarin

DOI: 10.31234/osf.io/hkv5f; 2020

Author(s): Kun Sun

Keyword(s): Logistic Regression, Mandarin Chinese, Training Set, Response Variable, Syntactic Information, Chinese Texts, Essential Prerequisite, Different Levels, Insight Into, Testing Set

It is well known that sentence boundaries in Mandarin Chinese texts are to some extent flexible and subjective. Sentence boundaries are used parsimoniously in Mandarin Chinese; they often do not occur until the end of a block of clauses, thereby expressing the completeness of a meaning or an idea. Nonetheless, our hypothesis is that native Chinese readers rely on unspoken principles to judge the degree of completeness of meaning within a block of clauses. The question of how Chinese readers perceive sentence boundaries is similar to that of how they segment word boundaries, which has been investigated extensively; the former, however, has seldom been examined. Our investigation concerns the kinds of unspoken principles or factors that play a role in detecting sentence boundaries (the use of periods). Tackling these problems is an essential prerequisite to understanding the nature of this fundamental unit in Chinese and to gaining further insight into how Chinese readers perceive the completeness of meaning in this language. To clarify these issues, we conducted re-punctuation experiments with two separate groups (the training set vs. the testing set) that used different stimuli. Annotation data were also collected by annotating the stimulus texts with linguistic information at different levels. Logistic regression and Bayesian statistical methods were then applied to test the effects of the independent variables on the response variable. Syntactic information turned out not to affect the response variable, and the logistic regression model predicted the testing-set data well. We propose that sentence boundaries in Chinese are most probably semantically and textually conceptualized rather than syntactically defined.


Which Parameters could be Useful for Predicting Malignancy in Solid Adnexal Masses?

Donald School Journal of Ultrasound in Obstetrics & Gynecology; DOI: 10.5005/jp-journals-10009-1001; 2009; Vol 3 (1); pp. 1-5

Author(s): Juan Luis Alcázar, Pedro Royo, Laura Pineda

Keyword(s): Blood Flow, Logistic Regression, Regression Analysis, Logistic Regression Analysis, Malignant Tumors, Univariate Analysis, Menopausal Status, CA-125, Benign Tumors, Adnexal Masses

Objective: To determine which clinical, biochemical and other sonographic parameters could be useful for predicting malignancy in sonographically solid adnexal masses. Methods: Clinical (age, menopausal status, complaints and physical examination), biochemical (serum CA-125 levels) and other sonographic features (tumor volume, ascites, bilaterality, blood flow location and velocimetric pattern) of 163 women diagnosed with a solid adnexal mass on B-mode gray-scale ultrasound were reviewed in this retrospective study. All patients had undergone surgery and mass removal, and a definitive histologic diagnosis was available in all cases. All parameters were compared with the final histological diagnosis (benign or malignant) in univariate statistical analysis. A stepwise forward logistic regression analysis was then performed to identify the features that independently predict malignancy. Results: A total of 173 masses were analyzed. Patients' mean age was 52.4 years (range: 15 to 84 years). Of the masses, 117 were malignant and 56 were benign. In univariate analysis, all parameters differed significantly between benign and malignant tumors. In logistic regression analysis, only central blood flow (odds ratio: 64.2, 95% CI: 17.07 to 242.03) and presence of ascites (odds ratio: 32.77, 95% CI: 5.38 to 199.72) were identified as independent predictors of malignancy. The presence of either of these two features was associated with malignancy in 98.6% of cases, and the absence of both was found in 82.1% of benign tumors. Conclusions: The presence or absence of ascites or central blood flow may help discriminate benign from malignant solid adnexal masses.


Profiling the baseline performance and limits of machine learning models for adaptive immune receptor repertoire classification

DOI: 10.1101/2021.05.23.445346; 2021

Author(s): Chakravarthi Kanduri, Milena Pavlović, Lonneke Scheffer, Keshav Motwani, Maria Chernigovskaya, ...

Keyword(s): Machine Learning, Logistic Regression, Prediction Accuracy, Immune Receptor, Adaptive Immune, High Prediction Accuracy, Wide Range, Benchmark Datasets, High Prediction, Penalized Logistic Regression

Background: Machine learning (ML) methodology development for classifying immune states in adaptive immune receptor repertoires (AIRR) has recently seen a surge of interest. However, there is so far no systematic evaluation of the scenarios in which classical ML methods (such as penalized logistic regression) already perform adequately for AIRR classification. This makes it hard to redirect research toward the scenarios where more sophisticated ML approaches may be required. Results: To identify the scenarios in which a baseline method performs well for AIRR classification, we generated a collection of synthetic benchmark datasets encompassing a wide range of dataset architecture-associated and immune state-associated sequence pattern (signal) complexity. We trained ≈1300 ML models with varying assumptions regarding the immune signal on ≈850 datasets with a total of ≈210000 repertoires containing ≈42 billion TCRβ CDR3 amino acid sequences, surpassing the sample sizes of current state-of-the-art AIRR ML setups by two orders of magnitude. We found that L1-penalized logistic regression achieved high prediction accuracy even when the immune signal occurs in only 1 out of 50000 AIR sequences. Conclusions: We provide a reference benchmark to guide new AIRR ML classification methodology by (i) identifying the scenarios, characterised by immune signal and dataset complexity, in which baseline methods already achieve high prediction accuracy, and (ii) facilitating realistic expectations of the performance of AIRR ML models given training dataset properties and assumptions. Our study serves as a template for defining specialized AIRR benchmark datasets for comprehensive benchmarking of AIRR ML methods.


Cytokines and Prognostic Factors in Epithelial Ovarian Cancer

Clinical Medicine Insights Oncology; DOI: 10.4137/cmo.s38333; 2016; Vol 10; pp. CMO.S38333; Cited by ~11

Author(s): Millena Prata Jammal, Agrimaldo Martins-Filho, Thales Parenti Silveira, Eddie Fernando Candido Murta, Rosekeila Simões Nomelini

Keyword(s): Ovarian Cancer, Prognostic Factors, Malignant Tumors, Histological Grade, Tumor Development, Delayed Diagnosis, Disease Free Survival, Benign Tumors, Malignant Neoplasms, Benign Neoplasms

Introduction: Ovarian cancer is characterized by high mortality and delayed diagnosis. Inflammation is a risk factor for ovarian cancer, and the inflammatory response is involved in almost all stages of tumor development. Immunohistochemical staining for a panel of cytokines was evaluated in the stroma and epithelium of benign and malignant ovarian neoplasms, and immunostaining was related to prognostic factors in the malignant tumors. Method: The study group comprised 28 benign and 28 malignant ovarian neoplasms. A panel of cytokines (Th1: IL-2 and IL-8; Th2: IL-5, IL-6, and IL-10; and TNFR1) was evaluated by immunohistochemistry. The chi-square test with Yates' correction was used, with P < 0.05 considered significant. Results: TNFR1, IL-5, and IL-10 more frequently showed immunostaining 2/3 in benign neoplasms than in malignant tumors. Malignant tumors more frequently showed immunostaining 2/3 for IL-2 than benign tumors. Immunostaining 0/1 for IL-8 was more frequent in the stroma of benign neoplasms than in malignant neoplasms. In the ovarian cancer stroma, histological grade 3 was significantly correlated with staining 2/3 for IL-2 (P = 0.004). Women with disease-free survival of less than 2.5 years more frequently had TNFR1 stromal staining 2/3 (P = 0.03). Conclusion: IL-2 and TNFR1 stromal immunostaining are related to prognostic factors in ovarian cancer and could be targets for new therapeutic strategies.
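The chi-square test with Yates' continuity correction on a 2×2 table (for example, neoplasm type by immunostaining 2/3 vs 0/1) needs only the standard library, because for one degree of freedom the upper-tail probability is erfc(sqrt(χ²/2)). This is a generic sketch of the test, with hypothetical counts rather than the study's data.

```python
import math

def yates_chi2(a, b, c, d):
    """2x2 table [[a, b], [c, d]]; returns (chi2, p) with Yates' continuity correction, df = 1."""
    n = a + b + c + d
    # Corrected numerator; clamped at zero so the correction never overshoots.
    num = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = num / den
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p

# Hypothetical counts: rows = benign / malignant (28 each), columns = staining 2/3 vs 0/1.
chi2, p = yates_chi2(18, 10, 8, 20)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

The correction subtracts n/2 from |ad - bc| before squaring, which makes the test more conservative for small samples such as 28 versus 28 neoplasms.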


4D Doppler Ultrasound in High Grade Serous Ovarian Cancer Vascularity Evaluation—Preliminary Study

Diagnostics; DOI: 10.3390/diagnostics11040582; 2021; Vol 11 (4); pp. 582

Author(s): Marek Jerzy Kudla, Michal Zikan, Daniela Fischerova, Mateusz Stolecki, Juan Luis Alcazar

Keyword(s): Ovarian Cancer, Malignant Tumors, Menopausal Status, Power Doppler, Serous Carcinoma, Benign Tumors, Control Group, High Grade, Tissue Samples, 4D Ultrasound

The aim of the study was to evaluate the usefulness of 4D power Doppler tissue evaluation in discriminating between normal ovaries and ovarian cancer. This was a prospective observational study. Twenty-three cases of surgically confirmed ovarian high-grade serous carcinoma (HGSC) were analyzed. The control group consisted of 23 healthy patients, each matched to a study-group counterpart by age (±3 years) and menopausal status. Transvaginal 4D Doppler ultrasound scans were performed in every patient and analyzed with 3D/4D software. Two 4D indices, the volumetric systolic/diastolic index (vS/D) and the volumetric pulsatility index (vPI), were calculated. For standardization, and because of technical limitations, virtual 1 cc spherical tissue samples taken from the most vascularized region, as detected by bi-directional power Doppler, were analyzed in both groups. Volumetric S/D and PI values were significantly lower in malignant ovarian tumors than in normal ovaries: 1.096 vs. 1.794 and 0.092 vs. 0.558, respectively (p < 0.001). The 4D bi-directional power Doppler vascular indices thus differed significantly between malignant tumors and normal ovaries. These findings could support future studies assessing this technology for discriminating between malignant and benign tumors.
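The volumetric indices are 4D analogues of the classic spectral Doppler indices. As a reference point, and assuming the conventional one-dimensional definitions (S/D = peak systolic velocity / end-diastolic velocity; Gosling's PI = (peak systolic - end-diastolic velocity) / time-averaged maximum velocity), they can be computed from one sampled cardiac cycle as sketched below. The abstract does not describe how the 3D/4D software derives vS/D and vPI from power Doppler volumes, so this is not that software's algorithm, and the waveform is synthetic.

```python
def doppler_indices(v):
    """v: maximum-velocity envelope (e.g. cm/s) sampled uniformly over one cardiac cycle,
    ending at end-diastole. Returns (S/D ratio, pulsatility index)."""
    psv = max(v)            # peak systolic velocity
    edv = v[-1]             # end-diastolic velocity: last sample of the cycle
    if edv <= 0:
        raise ValueError("absent or reversed diastolic flow: S/D is undefined")
    tamv = sum(v) / len(v)  # time-averaged maximum velocity (uniform sampling)
    return psv / edv, (psv - edv) / tamv

waveform = [20, 60, 45, 35, 30, 27, 25, 23, 22, 21, 20]  # synthetic, cm/s
sd, pi = doppler_indices(waveform)
print(f"S/D = {sd:.2f}, PI = {pi:.3f}")
```

Lower values of both indices indicate lower vascular resistance, which is the direction of the difference reported between malignant tumors and normal ovaries.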
