Comparisons of ADABOOST, KNN, SVM and Logistic Regression in Classification of Imbalanced Dataset

3044 Background: Epigenomic assays have recently become popular tools for identifying molecular biomarkers in both tissue and plasma. In particular, 5-hydroxymethyl-cytosine (5hmC) profiling has been shown to capture the epigenomic regulation of gene expression and subsequent gene activity, with different patterns across several tumor and normal tissue types. In this study we show that 5hmC profiles enable discrete classification of tumor and normal tissue for breast, colorectal, lung, ovary and pancreas. Such classification was also recapitulated in cfDNA from patients with breast, colorectal, lung, ovarian and pancreatic cancers. Methods: DNA was isolated from 176 fresh frozen tissues from breast, colorectal, lung, ovary and pancreas (44 per tumor per tissue type and up to 11 tumor tissues for each stage (I-IV)) and up to 10 normal tissues per tissue type. cfDNA was isolated from plasma from 783 non-cancer individuals and 569 cancer patients. Plasma-isolated cfDNA and tumor genomic DNA were enriched for the 5hmC fraction using chemical labelling, sequenced, and aligned to a reference genome to construct feature sets of 5hmC patterns. Results: Multinomial logistic regression on 5hmC profiles was applied across tumor and normal tissues and identified a set of specific and discrete tumor and normal tissue gene-based features. This indicates that we can classify samples regardless of source, with a high degree of accuracy, based on tissue of origin and also distinguish between normal and tumor status. Next, we applied a stacked ensemble machine learning algorithm, combining multiple logistic regression models across diverse feature sets, to the cfDNA dataset composed of 783 non-cancers and 569 cancers comprising 67 breast, 118 colorectal, 210 lung, 71 ovarian and 100 pancreatic cancers. We identified a genomic signature that enables the classification of non-cancer versus cancer with an outer-fold cross-validation sensitivity of 49% (CI 45%-53%) at 99% specificity. Further, individual cancer outer-fold cross-validation sensitivity at 99% specificity was measured as follows: breast 30% (CI 19%-42%); colorectal 41% (CI 32%-50%); lung 49% (CI 42%-56%); ovarian 72% (CI 60%-82%); pancreatic 56% (CI 46%-66%). Conclusions: This study demonstrates that 5hmC profiles can distinguish cancer and normal tissues based on their origin. Further, 5hmC changes in cfDNA enable detection of several cancer types: breast, colorectal, lung, ovarian and pancreatic cancers. Our technology provides a non-invasive tool for cancer detection with low-risk sample collection, enabling better compliance than current screening methods. Among other utilities, we believe our technology could be applied to asymptomatic high-risk individuals, thus enabling enrichment for those subjects who most need a diagnostic imaging follow-up.
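
The "sensitivity at 99% specificity" figures quoted above can be read as follows: pick the score threshold that keeps the target fraction of non-cancer samples below it, then count the fraction of cancer samples above it. The snippet below is a minimal sketch of that calculation only. The function name, the convention that higher scores mean "more cancer-like", and the label coding (1 = cancer, 0 = non-cancer) are assumptions for illustration; it does not reproduce the abstract's stacked ensemble or its outer-fold cross-validation.

```python
import numpy as np

def sensitivity_at_specificity(scores, labels, specificity=0.99):
    """Return (sensitivity, threshold) at a target specificity.

    Assumptions (not from the abstract): higher score = more cancer-like,
    labels are 1 for cancer and 0 for non-cancer.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    neg = np.sort(scores[labels == 0])   # non-cancer scores, ascending
    pos = scores[labels == 1]            # cancer scores
    # Number of non-cancer samples that must sit at or below the threshold.
    # The small epsilon guards against floating-point error in the product.
    k = int(np.ceil(specificity * neg.size - 1e-9))
    threshold = neg[k - 1]
    # A sample is called positive only if it scores strictly above the threshold.
    sensitivity = float(np.mean(pos > threshold))
    return sensitivity, float(threshold)
```

In a cross-validated setting, the threshold would typically be chosen on training folds and the sensitivity measured on held-out samples; the sketch applies both steps to the same data for brevity.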

Classification of heart sound signals with BP neural network and logistic regression

2017 Chinese Automation Congress (CAC) ◽ 10.1109/cac.2017.8244111 ◽ 2017

Author(s): Lina Li ◽ Xinpei Wang ◽ Xiaping Du ◽ Yuanyuan Liu ◽ Changchun Liu ◽ ...

Keyword(s): Neural Network ◽ Logistic Regression ◽ Bp Neural Network ◽ Heart Sound

Gene Selection in Cancer Classification Using Sparse Logistic Regression with L1/2 Regularization

Applied Sciences ◽ 10.3390/app8091569 ◽ 2018 ◽ Vol 8 (9) ◽ pp. 1569 ◽ Cited By ~ 3

Author(s): Shengbing Wu ◽ Hongkun Jiang ◽ Haiwei Shen ◽ Ziyi Yang

Keyword(s): Logistic Regression ◽ Gene Selection ◽ Classification Performance ◽ Cancer Classification ◽ Sparse Logistic Regression ◽ The Subject ◽ Selection For ◽ Microarray Datasets ◽ Sparse Methods

In recent years, gene selection for cancer classification based on the expression of a small number of gene biomarkers has been the subject of much research in genetics and molecular biology. The successful identification of gene biomarkers will help in the classification of different types of cancer and improve the prediction accuracy. Recently, logistic regression with L1 regularization has been successfully applied in high-dimensional cancer classification to estimate gene coefficients and perform gene selection simultaneously. However, L1 regularization yields biased gene selection and does not have the oracle property. To address these problems, we investigate L1/2 regularized logistic regression for gene selection in cancer classification. Experimental results on three DNA microarray datasets demonstrate that our proposed method outperforms other commonly used sparse methods (L1 and L_EN) in terms of classification performance.
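
For orientation, the difference between the L1 and L1/2 penalties can be written out explicitly. The block below is a sketch in assumed notation: n samples, d genes, expression vectors x_i, binary labels y_i, and tuning parameter λ are symbols chosen here and do not appear in the abstract.

```latex
% Penalized logistic regression, assumed notation:
% \pi_i is the modelled probability that sample i belongs to class 1.
\pi_i = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + x_i^{\top}\beta)\bigr)}, \qquad i = 1, \dots, n

% L1 (lasso) penalty:
\min_{\beta_0,\,\beta}\; -\frac{1}{n}\sum_{i=1}^{n}\bigl[\, y_i \log \pi_i + (1 - y_i)\log(1 - \pi_i) \bigr]
  + \lambda \sum_{j=1}^{d} \lvert \beta_j \rvert

% L1/2 penalty: same loss, with the absolute value replaced by its square root:
\min_{\beta_0,\,\beta}\; -\frac{1}{n}\sum_{i=1}^{n}\bigl[\, y_i \log \pi_i + (1 - y_i)\log(1 - \pi_i) \bigr]
  + \lambda \sum_{j=1}^{d} \lvert \beta_j \rvert^{1/2}
```

The L1/2 penalty is non-convex, so it penalizes large coefficients less heavily than L1 while still driving small coefficients to exactly zero, which is the usual motivation for expecting sparser and less biased gene selection.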

Prediction of Not Operational Transaction Using Logistic Regression at Bank XYZ in Kupang City

AITI ◽ 10.24246/aiti.v17i1.42-55 ◽ 2020 ◽ Vol 17 (1) ◽ pp. 42-55

Author(s): Radius Tanone ◽ Arnold B Emmanuel

Keyword(s): Machine Learning ◽ Logistic Regression ◽ Regression Method ◽ Learning Approach ◽ Know How ◽ Independent Variables ◽ Machine Learning Approach ◽ Logistic Regression Method ◽ Python Programming

Bank XYZ is one of the banks in Kupang City, East Nusa Tenggara Province, and operates several ATMs placed at merchant locations. Both customers and non-customers use these ATMs to conduct transactions. Because of where they are placed, some machines are not used optimally, which wastes machine resources and leads to a condition called Not Operational Transaction (NOP). Given data consisting of several numeric independent variables, the task is to classify the dependent variable, NOP. A machine learning approach using the Logistic Regression method is used for this classification. The research steps are collecting data, analyzing it with machine learning in Python, and writing the report. The result obtained with this machine learning approach is a prediction value of 0.507 for the classification. This means that in the future Bank XYZ can classify NOP conditions based on the behavior of customers and non-customers making transactions at Bank XYZ ATMs.

Machine state classification of electric track circuit by means of logistic regression

Omsk Scientific Bulletin ◽ 10.25206/1813-8225-2018-160-67-72 ◽ 2018 ◽ pp. 67-72 ◽ Cited By ~ 1

Author(s): D. V. Borisenko ◽ I. V. Prisukhina ◽ S. A. Lunev ◽ ...

Keyword(s): Logistic Regression ◽ State Classification ◽ Track Circuit

Selected Robust Logistic Regression Specification for Classification of Multi‑dimensional Functional Data in Presence of Outlier

Acta Universitatis Lodziensis Folia oeconomica ◽ 10.18778/0208-6018.334.04 ◽ 2018 ◽ Vol 2 (334)

Author(s): Mirosław Krzyśko ◽ Łukasz Smaga

Keyword(s): Logistic Regression ◽ Regression Model ◽ Functional Data ◽ Logistic Regression Model ◽ Binary Classification ◽ Classification Problem ◽ Classification Rule ◽ Unknown Parameters ◽ Explanatory Variables

In this paper, the binary classification problem for multi‑dimensional functional data is considered. To solve this problem, a regression technique based on the functional logistic regression model is used. This model is re‑expressed as a particular logistic regression model by using basis expansions of the functional coefficients and explanatory variables. Based on the re‑expressed model, a classification rule is proposed. To handle outlying observations, robust methods for estimating the unknown parameters are also considered. Numerical experiments suggest that the proposed methods may behave satisfactorily in practice.
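
The re‑expression step mentioned above can be illustrated with a generic derivation. This is a sketch in assumed notation, not the paper's own formulation: the interval I, the basis vectors φ_k, and the coefficient vectors c_k and b_k are symbols introduced here.

```latex
% Functional logistic model for p-dimensional functional data X = (X_1, ..., X_p):
\log\frac{P(Y = 1 \mid X)}{1 - P(Y = 1 \mid X)}
  = \beta_0 + \sum_{k=1}^{p} \int_{I} \beta_k(t)\, X_k(t)\, dt

% Basis expansions of each explanatory variable and each functional coefficient:
X_k(t) = c_k^{\top} \varphi_k(t), \qquad \beta_k(t) = b_k^{\top} \varphi_k(t)

% Substituting gives an ordinary logistic regression in the coefficient vectors:
\int_{I} \beta_k(t)\, X_k(t)\, dt = c_k^{\top} J_k\, b_k,
\qquad J_k = \int_{I} \varphi_k(t)\, \varphi_k(t)^{\top} dt

\log\frac{P(Y = 1 \mid X)}{1 - P(Y = 1 \mid X)}
  = \beta_0 + \sum_{k=1}^{p} c_k^{\top} J_k\, b_k
```

Here the vectors J_k^T c_k play the role of ordinary covariates and the b_k are the unknown parameters, so standard or robust logistic regression estimators can be applied; for an orthonormal basis, J_k reduces to the identity matrix.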

Association Between Diagnosis Code Expansion and Changes in 30‐Day Risk‐Adjusted Outcomes for Cardiovascular Diseases

Journal of the American Heart Association ◽ 10.1161/jaha.120.020668 ◽ 2021

Author(s): Lauren Gilstrap ◽ Rishi K. Wadhera ◽ Andrea M. Austin ◽ Stephen Kearing ◽ Karen E. Joynt Maddox ◽ ...

Keyword(s): Logistic Regression ◽ International Classification Of Diseases ◽ Fee For Service ◽ Readmission Rates ◽ Difference In Differences ◽ Diagnosis Codes ◽ Classification Of Diseases ◽ Changes In Risk ◽ The Impact

BACKGROUND In January 2011, Centers for Medicare and Medicaid Services expanded the number of inpatient diagnosis codes from 9 to 25, which may influence comorbidity counts and risk‐adjusted outcome rates for studies spanning January 2011. This study examines the association between (1) limiting versus not limiting diagnosis codes after 2011, (2) using inpatient‐only versus inpatient and outpatient data, and (3) using logistic regression versus the Centers for Medicare and Medicaid Services risk‐standardized methodology and changes in risk‐adjusted outcomes. METHODS AND RESULTS Using 100% Medicare inpatient and outpatient files between January 2009 and December 2013, we created 2 cohorts of fee‐for‐service beneficiaries aged ≥65 years. The acute myocardial infarction cohort and the heart failure cohort had 578 728 and 1 595 069 hospitalizations, respectively. We calculated comorbidities using (1) inpatient‐only limited diagnoses, (2) inpatient‐only unlimited diagnoses, (3) inpatient and outpatient limited diagnoses, and (4) inpatient and outpatient unlimited diagnoses. Across both cohorts, International Classification of Diseases, Ninth Revision (ICD‐9) diagnoses and hierarchical condition categories increased after 2011. When outpatient data were included, there were no significant differences in risk‐adjusted readmission rates using logistic regression or the Centers for Medicare and Medicaid Services risk standardization. A difference‐in‐differences analysis of risk‐adjusted readmission trends before versus after 2011 found no significant differences between limited and unlimited models for either cohort. CONCLUSIONS For studies that span 2011, researchers should consider limiting the number of inpatient diagnosis codes to 9 and/or including outpatient data to minimize the impact of the code expansion on comorbidity counts. However, the 2011 code expansion does not appear to significantly affect risk‐adjusted readmission rate estimates using either logistic or risk‐standardization models or when using or excluding outpatient data.
