Machine Learning Prediction of Economic Effects of Busan’s Strategic Industry through Ridge Regression and Lasso Regression

Analyzing online or digital data to detect epidemics is an active area of research, and it has become more relevant during the present outbreak of COVID-19. There are several types of influenza virus, and they evolve constantly, much as the COVID-19 virus has done. As a result, they are harder to analyze, and it is harder to predict when, where, and with what severity an outbreak will occur during the flu season across the world. Greater surveillance of both seasonal and pandemic influenza is needed to protect human health and safety. The objective of this work is to apply machine learning algorithms to build predictive models of the occurrence, peak, and severity of influenza in each season. We used a freely available dataset from Ireland covering 2005 to 2016. Specifically, we tested three ML algorithms: Linear Regression, Support Vector Regression, and Random Forests. Random Forests gave better predictive results. We also ran experiments in the Weka tool, testing the ZeroR, Linear Regression, Lazy KStar, Random Forest, REP Tree, and Multilayer Perceptron models; Random Forest again performed better than all the other models. We further evaluated other regression models, including Ridge Regression, modified Ridge Regression, Lasso Regression, and K-Neighbors Regression, by their mean absolute errors, and found that modified Ridge Regression produced the minimum error. The proposed work aims to identify a suitable ML algorithm for this influenza prediction problem.
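
The abstract above compares Ridge and Lasso regression by mean absolute error but does not state the objectives involved. For reference, the standard textbook forms are given below. The symbols (n samples, p predictors, penalty weight \lambda) are assumptions for illustration and are not taken from the paper; the "modified Ridge regression" is not defined in the abstract and is not represented here.

```latex
\begin{align}
\hat{\beta}^{\text{ridge}} &= \arg\min_{\beta}\; \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^{2} + \lambda \sum_{j=1}^{p} \beta_j^{2}, \\
\hat{\beta}^{\text{lasso}} &= \arg\min_{\beta}\; \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^{2} + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert, \\
\text{MAE} &= \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert .
\end{align}
```

The only difference between the two objectives is the penalty: the squared penalty shrinks coefficients smoothly, while the absolute-value penalty can set some coefficients exactly to zero.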


Value of radiomics in differential diagnosis of chromophobe renal cell carcinoma and renal oncocytoma

Abdominal Radiology ◽

10.1007/s00261-019-02269-9 ◽

2019 ◽

Vol 45 (10) ◽

pp. 3193-3201 ◽

Cited By ~ 3

Author(s):

Yajuan Li ◽

Xialing Huang ◽

Yuwei Xia ◽

Liling Long

Keyword(s):

Machine Learning ◽

Differential Diagnosis ◽

Cell Carcinoma ◽

Area Under The Curve ◽

Image Features ◽

Renal Tumors ◽

Support Vector ◽

Svm Classifier ◽

Renal Oncocytoma ◽

Lasso Regression

Abstract Purpose: To explore the value of CT-enhanced quantitative features combined with machine learning for differential diagnosis of renal chromophobe cell carcinoma (chRCC) and renal oncocytoma (RO). Methods: Sixty-one cases of renal tumors (chRCC = 44; RO = 17) that were pathologically confirmed at our hospital between 2008 and 2018 were retrospectively analyzed. All patients had undergone preoperative enhanced CT scans including the corticomedullary (CMP), nephrographic (NP), and excretory phases (EP) of contrast enhancement. Volumes of interest (VOIs), including lesions on the images, were manually delineated using the RadCloud platform. A LASSO regression algorithm was used to screen the image features extracted from all VOIs. Five machine learning classifiers were trained to distinguish chRCC from RO using a fivefold cross-validation strategy. The performance of the classifiers was mainly evaluated by the area under the receiver operating characteristic (ROC) curve and accuracy. Results: In total, 1029 features were extracted from CMP, NP, and EP. The LASSO regression algorithm was used to screen out the four, four, and six best features, respectively, and eight features were selected when CMP and NP were combined. All five classifiers had good diagnostic performance, with area under the curve (AUC) values greater than 0.850, and the support vector machine (SVM) classifier showed the best performance, with a diagnostic accuracy of 0.945 (AUC 0.964 ± 0.054; sensitivity 0.999; specificity 0.800). Conclusions: Accurate preoperative differential diagnosis of chRCC and RO can be facilitated by a combination of CT-enhanced quantitative features and machine learning.
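
LASSO feature screening, as described in the abstract above, keeps the features whose coefficients remain nonzero after L1-penalised fitting. The sketch below illustrates the idea with cyclic coordinate descent in plain Python. It is not the RadCloud implementation used in the study; the function names, the toy data, and the penalty value `alpha = 0.5` are all assumptions chosen for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1 / 2n) * ||y - X b||^2 + alpha * ||b||_1 by cyclic coordinate descent.

    X is a list of rows with centred columns; y is a centred list of targets.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta, and beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant-zero column carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


def selected_features(beta, tol=1e-8):
    """Indices of features whose coefficient survived the L1 penalty."""
    return [j for j, b in enumerate(beta) if abs(b) > tol]


# Toy data (hypothetical): only the first column drives the target.
X = [[1.0, 1.0, 1.0], [-1.0, 1.0, -1.0], [1.0, -1.0, -1.0], [-1.0, -1.0, 1.0]]
y = [2.0, -2.0, 2.0, -2.0]
beta = lasso_coordinate_descent(X, y, alpha=0.5)
selected = selected_features(beta)
print(selected, beta)
```

On this toy input the two uninformative columns are dropped and the surviving coefficient is shrunk below its unpenalised value of 2, which is the behaviour that makes LASSO usable as a screening step before a classifier is trained.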


A machine learning-based model for predicting intensive care unit length of stay in patients with cardiac surgery (Preprint)

10.2196/preprints.32887 ◽

2021 ◽

Author(s):

Nianyue Wu ◽

Siru Liu ◽

Haotian Zhang ◽

Xiaomin Hou ◽

Ping Zhang ◽

...

Keyword(s):

Machine Learning ◽

Intensive Care ◽

Length Of Stay ◽

Confidence Interval ◽

Medical Information ◽

Lasso Regression ◽

Icu Length Of Stay ◽

Data Source ◽

The Mean ◽

Selection Operator

BACKGROUND The intensive care unit (ICU) length of stay is important for evaluating the effect of cardiac surgical treatment in inpatients. OBJECTIVE This research aims to accurately predict the ICU length of stay in patients with cardiac surgery. METHODS We used machine learning methods to construct the model, and the medical information mart for intensive care (MIMIC IV) database was used as the data source. A total of 7,567 patients were enrolled and the mean length of stay in the ICU was 3.12 days. A total of 126 predictors were included, and 44 important predictors were screened by least absolute shrinkage and selection operator (Lasso) regression. RESULTS The mean accuracies were 0.603 (95% confidence interval (CI): [0.602-0.604]), 0.687 (95% CI: [0.687-0.688]) and 0.688 (95% CI: [0.687-0.689]) for the logistic regression (LR) with all variables, the gradient boosted decision tree (GBDT) with important variables and the GBDT with all variables, respectively. CONCLUSIONS The GBDT model with important predictors partly overestimated the length of stay of patients who stayed less than 3 days and underestimated that of patients who stayed longer than 3 days. Nevertheless, the better prediction performance of GBDT facilitates early intervention for ICU patients with a long period of hospitalization.


Self-Supervised Machine Learning Approach for Identifying Biochemical Influences on Protein-Ligand Binding Affinity

10.21203/rs.3.rs-1091733/v1 ◽

2021 ◽

Author(s):

Arjun Singh

Keyword(s):

Machine Learning ◽

Binding Affinity ◽

3D Models ◽

Target Protein ◽

Supervised Machine Learning ◽

Significant Feature ◽

Computational Techniques ◽

Lasso Regression ◽

Generative Adversarial Network ◽

Adversarial Network

Abstract Drug discovery is incredibly time-consuming and expensive, averaging over 10 years and $985 million per drug. Calculating the binding affinity between a target protein and a ligand is critical for discovering viable drugs. Although supervised machine learning (ML) models can predict binding affinity accurately, they suffer from lack of interpretability and inaccurate feature selection caused by multicollinear data. This study used self-supervised ML to reveal underlying protein-ligand characteristics that strongly influence binding affinity. Protein-ligand 3D models were collected from the PDBBind database and vectorized into 2422 features per complex. LASSO Regression and hierarchical clustering were utilized to minimize multicollinearity between features. Correlation analyses and Autoencoder-based latent space representations were generated to identify features significantly influencing binding affinity. A Generative Adversarial Network was used to simulate ligands with certain counts of a significant feature, and thereby determine the effect of a feature on improving binding affinity with a given target protein. It was found that the CC and CCCN fragment counts in the ligand notably influence binding affinity. Re-pairing proteins with simulated ligands that had higher CC and CCCN fragment counts could increase binding affinity by 34.99-37.62% and 36.83%-36.94%, respectively. This discovery contributes to a more accurate representation of ligand chemistry that can increase the accuracy, explainability, and generalizability of ML models so that they can more reliably identify novel drug candidates. Directions for future work include integrating knowledge on ligand fragments into supervised ML models, examining the effect of CC and CCCN fragments on fragment-based drug design, and employing computational techniques to elucidate the chemical activity of these fragments.


Clinical and genomic predictors of brain metastases (BM) in non-small cell lung cancer (NSCLC): An AACR Project GENIE analysis.

Journal of Clinical Oncology ◽

10.1200/jco.2021.39.15_suppl.2032 ◽

2021 ◽

Vol 39 (15_suppl) ◽

pp. 2032-2032

Author(s):

Protiva Rahman ◽

Michele LeNoue-Newton ◽

Sandip Chaugai ◽

Marilyn Holt ◽

Neha M Jain ◽

...

Keyword(s):

Ridge Regression ◽

Model Performance ◽

Parameter Tuning ◽

Ensemble Classifier ◽

Receiver Operating Curve ◽

Support Vector ◽

Lasso Regression ◽

Test Set ◽

Genetic Features ◽

Nsclc Patients

2032 Background: 30-50% of patients with non-early NSCLC will eventually develop BM, with a median survival of less than one year from BM diagnosis. There are no widely accepted clinical risk models for development of BM in patients without them at baseline. We predicted the binary risk of BM using clinical and genetic factors from a large multi-institutional cohort. Methods: Stage II-IV NSCLC patients from the AACR Project GENIE Biopharma Consortium dataset were eligible. This consisted of 4 academic institutions who curated clinical data of patients who had somatic next-generation tumor sequencing (NGS) between 2015-2017. We excluded patients who had BM at baseline, died within 30 days of NSCLC diagnosis, or did not undergo brain imaging. Covariates included demographics, anticancer therapies (received up to 90 days prior to BM development and within 5 years from NSCLC diagnosis), and NGS data; radiotherapy (RT) data were not available. NGS features included mutations and copy number alterations. These features were restricted to those classified as oncogenic by OncoKB. Univariate feature selection with Fisher’s test (p<.1) was performed on medication and genetic features. We compared 5 different machine learning models for prediction: random forest (RF), support vector machine (SVM), lasso regression, ridge regression, and an ensemble classifier. We split our data into training and test sets. 10-fold cross-validation was done on the training set for parameter tuning. The area under the receiver-operating curve (AUC) is reported on the test set. Results: 956 patients were included, 192 (20%) in the test set. Univariate features associated with BM were treatment with etoposide, Asian race, presence of bone metastases at NSCLC diagnosis, mutations in TP53 and EGFR, amplifications of ERBB2 and EGFR, and deletions of RB1, CDKN2A and CDKN2B. 
Univariate features inversely associated with BM were older age, treatment with nivolumab, vinorelbine, alectinib, pembrolizumab, atezolizumab, and gemcitabine, as well as mutations in NOTCH1 and KRAS. Ridge regression had the best AUC, 0.73 (Table). Conclusions: We achieved reasonable prediction performance using commonly obtained clinical and genomic information in non-early NSCLC. The biologic role of the associated alterations deserves further scrutiny; this study replicates similar findings for EGFR and KRAS in a much smaller cohort. Certain subsets of NSCLC patients may benefit from increased surveillance for BM and transition to drug therapies known to effectively cross the blood-brain barrier, e.g., nivolumab and alectinib. Inclusion of additional covariates, e.g., brain RT, may further improve model performance. [Table: see text]
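
The evaluation protocol in the abstract above (k-fold cross-validation on the training set, AUC reported on a held-out test set) can be sketched in plain Python. The abstract does not say how folds were formed or how AUC was computed, so the contiguous, unshuffled folds and the pairwise AUC definition below are assumptions, and both function names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds, without shuffling."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size


# Toy check (hypothetical labels and scores): three of four positive/negative pairs are ordered correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
folds = list(k_fold_indices(10, 3))
print(auc, [len(valid) for _, valid in folds])
```

In practice the rows would be shuffled (and often stratified by outcome) before folds are cut; the contiguous version is kept here only to make the splitting logic easy to follow.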


Modelling House Price Using Ridge Regression and Lasso Regression

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.30.22378 ◽

2018 ◽

Vol 7 (4.30) ◽

pp. 498 ◽

Cited By ~ 2

Author(s):

Seng Jia Xin ◽

Kamil Khalid

Keyword(s):

Multivariate Analysis ◽

Regression Model ◽

Ridge Regression ◽

House Price ◽

Lasso Regression ◽

Price Prediction ◽

Real Estate Sector ◽

Finance Company ◽

The Government ◽

House Condition

House price prediction is important for the government, finance companies, the real estate sector, and house owners. House price data from Ames, Iowa, in the United States, covering 2006 to 2010, are used for multivariate analysis. However, multicollinearity commonly occurs in multivariate analysis and can seriously affect the model. Therefore, this study investigates the performance of the Ridge regression model and the Lasso regression model, as both can deal with multicollinearity. Ridge and Lasso regression models are constructed and compared. The root mean square error (RMSE) and adjusted R-squared are used to evaluate the performance of the models. This comparative study found that the Lasso regression model performs better than the Ridge regression model. Based on this analysis, the selected variables include house size, age of the house, condition of the house, and location of the house.
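
The two evaluation metrics named in the abstract above, RMSE and adjusted R-squared, can be computed as in the sketch below. The function names and the toy numbers are assumptions for illustration; no values from the Ames dataset are reproduced.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def adjusted_r2(y_true, y_pred, n_predictors):
    """Adjusted R-squared: R-squared penalised for the number of predictors used."""
    n = len(y_true)
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Toy values (hypothetical): one prediction is off by 2, the rest are exact.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.0, 2.0, 3.0, 4.0, 7.0]
print(rmse(y_true, y_pred), adjusted_r2(y_true, y_pred, n_predictors=1))
```

Adjusted R-squared matters when comparing Ridge with Lasso because Lasso may retain fewer predictors; the adjustment keeps a model from looking better simply because it uses more variables.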


Prediction of Fatal and Major Injury of Drivers, Cyclists, and Pedestrians in Collisions

PROMET - Traffic&Transportation ◽

10.7307/ptt.v32i1.3134 ◽

2020 ◽

Vol 32 (1) ◽

pp. 39-53

Author(s):

Dalia Shanshal ◽

Ceni Babaoglu ◽

Ayşe Başar

Keyword(s):

Machine Learning ◽

Random Forest ◽

Injury Severity ◽

Predictive Analytics ◽

Machine Learning Techniques ◽

Lasso Regression ◽

Severe Injuries ◽

Factors Affecting ◽

Spatio Temporal ◽

Using Data

Traffic-related deaths and severe injuries may affect every person on the roads, whether driving, cycling or walking. Toronto, the largest city in Canada and the fourth largest in North America, aims to eliminate traffic-related fatalities and serious injuries on city streets. The aim of this study is to build a prediction model using data analytics and machine learning techniques that learn from past patterns, providing additional data-driven decision support for strategic planning. A detailed exploratory analysis is presented, investigating the relationship between the variables and factors affecting collisions in Toronto. A learning-based model is proposed to predict the fatalities and severe injuries in traffic collisions through a comparison of two predictive models: Lasso Regression and Random Forest. Exploratory data analysis results reveal both spatio-temporal and behavioural patterns such as the prevalence of collisions in intersections, in the spring and summer and aggressive driving and inattentive behaviours in drivers. The prediction results show that the best predictor of injury severity for drivers, cyclists and pedestrians is Random Forest with an accuracy of 0.80, 0.89, and 0.80, respectively. The proposed methods demonstrate the effectiveness of machine learning application to traffic and collision data, both for exploratory and predictive analytics.


Machine learning RF shimming: Prediction by iteratively projected ridge regression

Magnetic Resonance in Medicine ◽

10.1002/mrm.27192 ◽

2018 ◽

Vol 80 (5) ◽

pp. 1871-1881 ◽

Cited By ~ 6

Author(s):

Julianna D. Ianni ◽

Zhipeng Cao ◽

William A. Grissom

Keyword(s):

Machine Learning ◽

Ridge Regression ◽

Rf Shimming


The Usage of Lasso, Ridge, and Linear Regression to Explore the Most Influential Metabolic Variables that Affect Fasting Blood Sugar in Type 2 Diabetes Patients

Romanian Journal of Diabetes Nutrition and Metabolic Diseases ◽

10.2478/rjdnmd-2019-0040 ◽

2019 ◽

Vol 26 (4) ◽

pp. 371-379

Author(s):

Arash Farbahari ◽

Tania Dehesh ◽

Mohammad Hossien Gozashti

Keyword(s):

Type 2 Diabetes ◽

Linear Regression ◽

Blood Sugar ◽

Ridge Regression ◽

Mean Squared Error ◽

Smoking Status ◽

Fasting Blood Sugar ◽

Lasso Regression ◽

Regression Methods

Abstract Background and aims: To explore the most influential variables of fasting blood sugar (FBS) with three regression methods, to estimate the chance that type 2 diabetes is present based on influential variables with logistic regression (LR), and to compare the three regression methods according to Mean Squared Error (MSE) value. Material and Methods: In this cross-sectional study, 270 patients who had had type 2 diabetes for at least 6 months and 380 healthy people participated. Linear regression, Ridge regression, and Least Absolute Shrinkage and Selection Operator (Lasso) regression were used to find influential variables for FBS. Results: Among 15 variables (8 metabolic, 7 characteristic), Lasso regression selected HbA1c, urea, age, BMI, heredity, and gender; Ridge regression selected HbA1c, heredity, gender, smoking status, and drug use; and Linear regression selected HbA1c as the most effective predictors for FBS. Conclusion: HbA1c is the most influential predictor of FBS among 15 variables according to the results of the three regression methods. Controlling the variation of HbA1c leads to a more stable FBS. Besides FBS, which should be checked before breakfast, HbA1c may be helpful in the diagnosis of type 2 diabetes.
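
To make the comparison of Linear and Ridge regression by MSE concrete, the sketch below fits both with the closed-form solution beta = (X^T X + lam * I)^(-1) X^T y, where lam = 0 gives ordinary least squares. It is a plain-Python illustration under stated assumptions: no intercept, made-up toy data, hypothetical function names, and an arbitrary penalty lam = 1. It does not reproduce the study's analysis.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients without an intercept; lam = 0 is ordinary least squares."""
    n, p = len(X), len(X[0])
    gram = [[sum(X[i][a] * X[i][b] for i in range(n)) + (lam if a == b else 0.0)
             for b in range(p)] for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve_linear_system(gram, xty)


def mse(X, y, beta):
    """Mean squared error of the linear predictions X @ beta."""
    preds = [sum(x_j * b_j for x_j, b_j in zip(row, beta)) for row in X]
    return sum((t - p) ** 2 for t, p in zip(y, preds)) / len(y)


# Toy data (hypothetical): two predictors, three observations.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
beta_ols = ridge_fit(X, y, lam=0.0)
beta_ridge = ridge_fit(X, y, lam=1.0)
print(beta_ols, mse(X, y, beta_ols))
print(beta_ridge, mse(X, y, beta_ridge))
```

On the data used for fitting, ordinary least squares always has the lowest MSE by construction, so a fair comparison of the three methods relies on held-out or cross-validated MSE.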


Landscape of Immune Microenvironment in Epithelial Ovarian Cancer and Establishing Risk Model by Machine Learning

Journal of Oncology ◽

10.1155/2021/5523749 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Shi-yi Liu ◽

Rong-hui Zhu ◽

Zi-tao Wang ◽

Wei Tan ◽

Li Zhang ◽

...

Keyword(s):

Machine Learning ◽

Ovarian Cancer ◽

Epithelial Ovarian Cancer ◽

Immune Cells ◽

Risk Model ◽

Dominant Role ◽

Immune Checkpoints ◽

Support Vector ◽

Lasso Regression ◽

High Infiltration

Background. Epithelial ovarian cancer (EOC) is an extremely lethal gynecological malignancy and has the potential to benefit from immune checkpoint blockade (ICB) therapy, whose efficacy highly depends on the complex tumor microenvironment (TME). Method and Result. We comprehensively analyzed the landscape of the TME and its prognostic value through immune infiltration analysis, somatic mutation analysis, and survival analysis. The results showed that high infiltration of immune cells predicts favorable clinical outcomes in EOC. The detailed TME landscape of EOC was then investigated using the "xCell" algorithm, gene set variation analysis (GSVA), cytokine expression analysis, and correlation analysis. EOC patients with high immune cell infiltration were observed to have an antitumor phenotype that is highly correlated with immune checkpoints. We further found that dendritic cells (DCs) may play a dominant role in promoting the infiltration of immune cells into the TME and forming an antitumor immune phenotype. Finally, we applied Lasso regression, support vector machines (SVMs), and random forest, identifying six DC-related prognostic genes (CXCL9, VSIG4, ALOX5AP, TGFBI, UBD, and CXCL11). A DC-related risk stratification model was then established and validated. Conclusion. High infiltration of immune cells predicted a better outcome and an antitumor phenotype in EOC, and DCs might play a dominant role in the initiation of antitumor immune cells. The established risk model can be used for prognostic prediction in EOC.
