Multivariate Regression and Machine Learning with Sums of Separable Functions

2009 ◽  
Vol 31 (3) ◽  
pp. 1840-1857 ◽  
Author(s):  
Gregory Beylkin ◽  
Jochen Garcke ◽  
Martin J. Mohlenkamp


2020 ◽  
Vol 22 (Supplement_2) ◽  
pp. ii135-ii136
Author(s):  
John Lin ◽  
Michelle Mai ◽  
Saba Paracha

Abstract Glioblastoma multiforme (GBM), the most common form of glioma, is a malignant tumor with a high risk of mortality. Because they can provide accurate survival estimates, prognostic models have been identified as promising tools for clinical decision support. In this study, we produced and validated two machine learning-based models to predict survival time for GBM patients. Publicly available clinical and genomic data from The Cancer Genome Atlas (TCGA) and the Broad Institute GDAC Firehose were obtained through cBioPortal. Random forest and multivariate regression models were created to predict survival. Predictive accuracy was assessed and compared using mean absolute error (MAE) and root mean square error (RMSE). A total of 619 GBM patients were included in the dataset. There were 381 (62.9%) cases of recurrence/progression and 53 (8.7%) cases of disease-free survival. The MAE and RMSE were 0.553 and 0.887 years, respectively, for the random forest regression model, and 1.756 and 2.451 years, respectively, for the multivariate regression model. Both models accurately predicted overall survival. Comparison of the models by MAE, RMSE, and visual analysis showed higher accuracy for random forest than for multivariate linear regression. Further investigation of feature selection and model optimization may improve predictive power. These findings suggest that using machine learning in GBM prognostic modeling will improve clinical decision support. *Co-first authors.
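
The two error measures named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the survival times are hypothetical, and only the standard definitions of MAE and RMSE are assumed.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared difference."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

# Hypothetical survival times in years (not data from the study).
observed = [1.0, 2.0, 0.5, 3.0]
predicted = [1.5, 1.0, 0.5, 2.5]

mae = mean_absolute_error(observed, predicted)
rmse = root_mean_square_error(observed, predicted)
print(f"MAE = {mae:.3f} years, RMSE = {rmse:.3f} years")
```

RMSE is never smaller than MAE on the same predictions, and the gap widens when a few errors are large, which is one reason the two are usually reported together.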


2021 ◽  
Author(s):  
Bohan Zheng

With the Internet of Things (IoT) being widely adopted in recent years, traditional machine learning and data mining methods can hardly cope with complex big data problems when applied alone. However, hybridizing methods that have complementary advantages can yield better practical solutions. This work discusses how to solve multivariate regression problems and extract intrinsic knowledge by hybridizing Self-Organizing Maps (SOM) and regression trees. A dual-layer SOM is developed in which the first layer performs unsupervised learning and a regression tree layer then performs supervised learning in the second layer to obtain predictions and extract knowledge. In this framework, SOM neurons serve as kernels to which similar training samples are mapped, so that the regression tree can perform regression locally. In this way, the difficulties of applying and visualizing local regression on high-dimensional data are overcome. Further, we provide an automated growing mechanism based on a few stopping criteria, without adding new parameters. A case study on the Electric Vehicle (EV) range anxiety problem is presented, and it demonstrates that the proposed hybrid model is quantitatively precise and interpretable. Keywords: Multivariate Regression, Big Data, Machine Learning, Data Mining, Self-Organizing Maps (SOM), Regression Tree, Electric Vehicle (EV), Range Estimation, Internet of Things (IoT)
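
The two-layer idea (unsupervised mapping first, local supervised regression second) can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the author's model: the SOM is a small one-dimensional chain with deterministic initialization, the local regressor is just the mean target per neuron (a single-leaf stand-in for a regression tree), the automated growing mechanism is omitted, and all names and data are hypothetical.

```python
import math

def best_matching_unit(weights, sample):
    """Index of the neuron whose weight vector is closest to the sample."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - x) ** 2 for w, x in zip(weights[i], sample)),
    )

def train_som_1d(samples, n_neurons=4, epochs=20, lr0=0.5, sigma0=1.0):
    """Layer 1 (unsupervised): train a 1-D chain of SOM neurons."""
    dim = len(samples[0])
    lo = [min(s[d] for s in samples) for d in range(dim)]
    hi = [max(s[d] for s in samples) for d in range(dim)]
    # Deterministic start: neurons spread evenly across the data range.
    weights = [
        [lo[d] + (hi[d] - lo[d]) * (i + 0.5) / n_neurons for d in range(dim)]
        for i in range(n_neurons)
    ]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # linear decay schedule
        lr = lr0 * frac
        sigma = max(sigma0 * frac, 1e-3)
        for s in samples:
            bmu = best_matching_unit(weights, s)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood on the neuron chain.
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (s[d] - w[d])
    return weights

def fit_local_means(weights, samples, targets):
    """Layer 2 (supervised): one local model per neuron, here the mean target."""
    sums, counts = {}, {}
    for s, t in zip(samples, targets):
        i = best_matching_unit(weights, s)
        sums[i] = sums.get(i, 0.0) + t
        counts[i] = counts.get(i, 0) + 1
    return {i: sums[i] / counts[i] for i in sums}

def predict(weights, local_means, fallback, sample):
    """Route the sample to its neuron, then use that neuron's local model."""
    return local_means.get(best_matching_unit(weights, sample), fallback)

# Hypothetical toy data: one feature, step-shaped target.
samples = [[i / 19] for i in range(20)]
targets = [0.0 if s[0] < 0.5 else 10.0 for s in samples]

weights = train_som_1d(samples)
local_means = fit_local_means(weights, samples, targets)
global_mean = sum(targets) / len(targets)

low = predict(weights, local_means, global_mean, [0.0])
high = predict(weights, local_means, global_mean, [1.0])
print(low, high)
```

Replacing `fit_local_means` with a real tree learner fitted on each neuron's samples would bring the sketch closer to the hybrid described in the abstract.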


Artificial intelligence (AI) can be implemented using machine learning, which allows a computer to learn and improve automatically from past experience without being explicitly programmed. Programs developed with machine learning can access and use data on their own. This paper focuses on applying machine learning to sports, specifically to predicting the winning team of an IPL match. Cricket is a popular and uncertain sport, particularly in the T20 format, where a single over can change the course of the whole game. Millions of spectators watch the Indian Premier League (IPL) every year, so devising a technique that forecasts match outcomes is a real-world problem. Many factors and features determine the result of a T20 cricket match, each with a weighted impact on the outcome. This paper describes all of those features in detail. A multivariate regression-based approach is proposed to measure each team's points in the league. The past performance of every team determines its probability of winning a match against a particular opponent. Finally, a set of seven factors or attributes is identified that can be used to predict the IPL match winner. Various machine learning models were trained and used to predict the winner within the interval between the toss and the start of the match. The performance of the developed models is evaluated across several classification techniques, of which Random Forest and Decision Tree gave good results.


2021 ◽  
pp. 000348942110412
Author(s):  
Marco A. Mascarella ◽  
Nikesh Muthukrishnan ◽  
Farhad Maleki ◽  
Marie-Jeanne Kergoat ◽  
Keith Richardson ◽  
...  

Objective: Major postoperative adverse events (MPAEs) following head and neck surgery are not infrequent and lead to significant morbidity. The objective of this study was to ascertain which factors are most predictive of MPAEs in patients undergoing head and neck surgery. Methods: A cohort study was carried out based on data from patients registered in the National Surgical Quality Improvement Program (NSQIP) from 2006 to 2018. All patients undergoing non-ambulatory head and neck surgery based on Current Procedural Terminology codes were included. Perioperative factors were evaluated to predict MPAEs within 30 days of surgery. Age was classified as both a continuous and a categorical variable. Retained factors were classified by attributable fraction and C-statistic. Multivariate regression and supervised machine learning models were used to quantify the contribution of age as a predictor of MPAEs. Results: A total of 43 701 operations were analyzed, with 5106 (11.7%) MPAEs. Supervised machine learning identified prolonged surgeries, anemia, free tissue transfer, weight loss, wound classification, hypoalbuminemia, wound infection, tracheotomy (concurrent with index head and neck surgery), American Society of Anesthesiologists (ASA) class, and sex as most predictive of MPAEs. On multivariate regression, ASA class (21.3%), hypertension on medication (15.8%), prolonged operative time (15.3%), sex (13.1%), preoperative anemia (12.8%), and free tissue transfer (9%) had the largest attributable fractions associated with MPAEs. Age was independently associated with MPAEs, with an attributable fraction ranging from 0.6% to 4.3% and poor predictive ability (C-statistic 0.60). Conclusion: Surgical, comorbid, and frailty-related factors were most predictive of short-term MPAEs following head and neck surgery. Age alone contributed a small attributable fraction and predicted MPAEs poorly. Level of evidence: 3
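
For readers unfamiliar with the C-statistic quoted above, the following sketch computes it for a binary outcome as the fraction of (event, non-event) pairs in which the event case received the higher predicted risk, with ties counted as one half. The function name and the scores are hypothetical and are not taken from the study.

```python
def c_statistic(scores, outcomes):
    """Concordance between predicted risk scores and binary outcomes (1 = event)."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0   # event case ranked higher: concordant pair
            elif e == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))

# Hypothetical predicted risks and observed outcomes.
risk = [0.9, 0.8, 0.3, 0.2]
event = [1, 0, 1, 0]
c = c_statistic(risk, event)
print(f"C-statistic = {c:.2f}")
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect discrimination, which is why a C-statistic of 0.60 is described as poor predictive ability.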


2018 ◽  
Vol 28 (4) ◽  
pp. 340-348 ◽  
Author(s):  
Manuela Nagel ◽  
Katharina Holstein ◽  
Evelin Willner ◽  
Andreas Börner

Abstract Seed longevity is influenced by many factors, one widely discussed of which is seed lipid content and fatty acid composition. Here, linear and non-linear regressions based on machine learning were applied to analyse germinability and seed composition in a set of 42 oilseed rape (Brassica napus L.) accessions grown in the same single environment at the same time, following a period of up to 31 years of storage at 7°C. Mean viability was halved after 27.0 years of storage, but this figure concealed a major influence of genotype. There was also wide variation in fatty acid composition, particularly in oleic, α-linolenic, eicosenoic and erucic acid. Linear regression (rL) revealed significant correlation coefficients between normal seedling appearance and the content of α-linolenic acid (+0.52) and total oil (+0.59). Multivariate regression using artificial neural networks, including a radial basis function (RBF), a multilayer perceptron (MLP) and partial least squares (PLS), recognized underlying structures and revealed highly significant correlation coefficients (rM) for oil content (+0.87), eicosenoic acid (+0.75), stearic acid (+0.73) and lignoceric acid (+0.97). Oil content alone, or a combination of oleic, α-linolenic, arachidic, eicosenoic and eicosadienoic acids and glucosinolates, gave the highest model fitting parameters, with R² of 0.90 and 0.88, respectively. In addition, the content of glucosinolates, which occur predominantly in the Brassicaceae family and here ranged from 4.6 to 79.5 µM, was negatively correlated with viability (rL = ‒0.43). In summary, oil content, some fatty acids and glucosinolates contribute to variation in the average half-life (15.2 to 50.7 years) of oilseed rape seeds. In contrast to linear regression, multivariate regression using artificial neural networks revealed strong associations for combinations of parameters, including underestimated minor fatty acids such as arachidic, stearic and eicosadienoic acids. This indicates that genetic and seed composition factors contribute to seed longevity. In addition, multivariate regression might be a successful approach for predicting seed viability based on fatty acids and seed oil content.


2020 ◽  
Vol 35 (8) ◽  
pp. 1641-1653 ◽  
Author(s):  
Weijie Xu ◽  
Chen Sun ◽  
Yongqi Tan ◽  
Liang Gao ◽  
Yuqing Zhang ◽  
...  

The matrix effects in LIBS analyses have been considered with univariate and machine learning based multivariate regression models for TAS classification of rocks.


2020 ◽  
Vol 4 (s1) ◽  
pp. 52-52
Author(s):  
Priscila Rodrigues Armijo ◽  
Sindhura Bonthu ◽  
Alicia Schiller ◽  
Qiuming Zhu ◽  
Tiffany Tanner

OBJECTIVES/GOALS: Multivariate regression is used for surgical outcomes analyses, but it does not allow for evaluation of all variables. Machine learning could be an alternative that addresses this issue. Our aim was to evaluate whether machine learning is a feasible alternative for evaluating surgical outcomes. METHODS/STUDY POPULATION: The H-CUP National Inpatient Sample database was queried for adult patients with colorectal cancer who underwent colorectal resection, while the NSQIP database was queried for adult patients with rectal cancer who underwent proctectomy. A multivariate regression analysis was performed to assess risk factors associated with 30-day complications following those procedures. Subsequently, the machine learning techniques of under-sampling and oversampling were applied to the same datasets to evaluate risk factors for the same outcome. These techniques were used to achieve a larger population sample size and to detect statistical significance. Results from the two methodologies were compared. RESULTS/ANTICIPATED RESULTS: Multivariate regression revealed that open approach, gender, race, geographic location, number of comorbidities, and type of insurance were associated with increased 30-day mortality in colorectal resection patients. By contrast, machine learning revealed that preoperative weight loss, preexisting congestive heart failure, renal failure or perivascular disease were strongly associated with 30-day mortality. For proctectomy patients, multivariate regression found no association between surgical approach and 30-day mortality. However, machine learning revealed gender, hypertension, and reoperation to be strongly associated with 30-day mortality. DISCUSSION/SIGNIFICANCE OF IMPACT: Machine learning made it possible to examine multiple combinations of variables that could not be examined in a conventional multivariate regression analysis. Compared with traditional multivariate regression, machine learning produced significantly different outcomes, highlighting the need for in-depth evaluation of these methodologies.
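
The abstract does not say which resampling variants were used. As a minimal sketch, assuming plain random oversampling (duplicating minority-class cases with replacement) and random under-sampling (discarding majority-class cases), the two techniques can look like this. The function names and the toy records are hypothetical.

```python
import random
from collections import Counter

def _group_by_label(rows, labels):
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels

def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels

# Hypothetical records: 8 patients without the outcome (0), 2 with it (1).
rows = [{"id": i} for i in range(10)]
labels = [0] * 8 + [1] * 2

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
print(Counter(over_labels), Counter(under_labels))
```

Oversampling repeats existing cases rather than adding new patients, so significance tests run on the resampled data should be interpreted with that in mind.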

