Development of machine-learning performance prediction models for asphalt mixtures

Hardware architectures are becoming increasingly complex as compute capabilities grow toward exascale. We present the Analytical Memory Model with Pipelines (AMMP) of the Performance Prediction Toolkit (PPT). PPT-AMMP takes high-level source code and hardware architecture parameters as input and predicts the runtime of that code on the target hardware platform defined by those parameters. PPT-AMMP transforms the code to an architecture-independent intermediate representation, then (i) analyzes the basic block structure of the code, (ii) processes architecture-independent virtual memory access patterns, which it uses to build memory reuse distance distribution models for each basic block, and (iii) runs detailed basic-block-level simulations to determine hardware pipeline usage. PPT-AMMP uses machine learning and regression techniques to build the prediction models from small instances of the input code, then integrates them into a higher-order discrete-event simulation model of PPT running on the Simian PDES engine. We validate PPT-AMMP on four standard computational physics benchmarks and present a use case of hardware parameter sensitivity analysis to identify bottleneck hardware resources for different code inputs. We further extend PPT-AMMP to predict the performance of a scientific application code, namely the radiation transport mini-app SNAP. To this end, we analyze multivariate regression models that accurately predict the reuse profiles and the basic block counts. We validate predicted SNAP runtimes against measured runtimes.
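The reuse distance idea in step (ii) can be illustrated with a minimal Python sketch. It is not PPT-AMMP's implementation: the function names and the toy trace are hypothetical, and it works on an explicit address trace rather than on an intermediate representation. It relies only on the standard property that, in a fully associative LRU cache holding C lines, an access hits exactly when its reuse distance is less than C.

```python
from collections import Counter


def reuse_distances(trace):
    """Return the reuse distance of every access in `trace`.

    The reuse distance is the number of distinct addresses touched since
    the previous access to the same address. First-time accesses get
    float("inf").
    """
    stack = []       # LRU stack: most recently used address is last
    distances = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            # Everything above `pos` was touched since the last use of addr.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(float("inf"))
        stack.append(addr)
    return distances


def reuse_profile(trace):
    """Return the empirical distribution {distance: probability}."""
    counts = Counter(reuse_distances(trace))
    return {d: c / len(trace) for d, c in counts.items()}


def lru_hit_rate(trace, capacity):
    """Hit rate of a fully associative LRU cache with `capacity` lines."""
    hits = sum(1 for d in reuse_distances(trace) if d < capacity)
    return hits / len(trace)


# Hypothetical toy trace of memory addresses.
toy_trace = ["a", "b", "c", "a", "b", "b"]
print(reuse_distances(toy_trace))   # [inf, inf, inf, 2, 2, 0]
print(lru_hit_rate(toy_trace, 3))   # 0.5
```

The quadratic list-based stack keeps the sketch short; a production tool would use a tree-based or sampled structure for long traces.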

Using Decision Trees and Random Forest Algorithms to Predict and Determine Factors Contributing to First-Year University Students’ Learning Performance

Algorithms ◽

10.3390/a14110318 ◽

2021 ◽

Vol 14 (11) ◽

pp. 318

Author(s):

Thao-Trang Huynh-Cam ◽

Long-Sheng Chen ◽

Huynh Le

Keyword(s):

Random Forest ◽

Performance Prediction ◽

Prediction Models ◽

Family Background ◽

Educational Practice ◽

Poor Performance ◽

Learning Performance ◽

First Year ◽

First Year Students ◽

Early Performance

First-year students’ learning performance has received much attention in educational practice and theory. Previous works built prediction models from variables that can only be obtained during the course or as the semester progresses, through questionnaire surveys and interviews. Such models cannot provide sufficiently timely support for students whose poor performance is caused by economic factors. Therefore, other variables are needed that allow prediction results to be reached earlier. This study uses family background variables, which can be obtained before the semester starts, to build learning performance prediction models for freshmen using random forest (RF), C5.0, CART, and multilayer perceptron (MLP) algorithms. A real sample of 2407 freshmen enrolled in 12 departments of a Taiwan vocational university was employed. The experimental results showed that CART outperformed the C5.0, RF, and MLP algorithms. The most important features were mother’s occupation, department, father’s occupation, main source of living expenses, and admission status. The extracted knowledge rules are expected to serve as indicators for early prediction of students’ performance, so that strategic intervention can be planned before students begin the semester.
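CART grows a classification tree by repeatedly choosing the split that most reduces impurity, commonly measured with the Gini index. The sketch below illustrates that single step in Python. The function names, the feature values, and the four toy records are invented for illustration and are not taken from the study's data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(records, labels, feature, value):
    """Impurity reduction from splitting on `feature == value`."""
    left = [y for row, y in zip(records, labels) if row[feature] == value]
    right = [y for row, y in zip(records, labels) if row[feature] != value]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


def best_split(records, labels):
    """Return the (feature, value) pair with the largest Gini gain."""
    # Sorting makes tie-breaking deterministic.
    candidates = sorted({(f, row[f]) for row in records for f in row})
    return max(candidates, key=lambda fv: gini_gain(records, labels, *fv))


# Invented toy records; real inputs would be the pre-semester variables.
toy_records = [
    {"admission": "exam", "expenses": "family"},
    {"admission": "exam", "expenses": "loan"},
    {"admission": "recommendation", "expenses": "family"},
    {"admission": "recommendation", "expenses": "loan"},
]
toy_labels = ["pass", "pass", "fail", "fail"]
print(best_split(toy_records, toy_labels))
```

A full CART implementation would apply this step recursively and prune the resulting tree; this sketch shows only how one split is scored.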

Simple Linear Cancer Risk Prediction Models with Novel Features Outperform Complex Approaches

10.1101/2021.01.11.21249290 ◽

2021 ◽

Author(s):

Scott Kulm ◽

Lior Kofman ◽

Jason Mezey ◽

Olivier Elemento

Keyword(s):

Machine Learning ◽

Cancer Survival ◽

Linear Models ◽

Prediction Models ◽

Learning Algorithm ◽

Learning Performance ◽

Health Study ◽

Learning Models ◽

The UK ◽

Machine Learning Models

A patient’s risk for cancer is usually estimated through simple linear models that sum effect sizes of proven risk factors. In theory, more advanced machine learning models can be used for the same task. Using data from the UK Biobank, a large prospective health study, we have developed linear and machine learning models for the prediction of 12 different cancer diagnoses within a 10-year time span. We find that the top machine learning algorithm, XGBoost (XGB), trained on 707 features, generated an average area under the receiver operating characteristic curve of 0.736 (with a range of 0.65-0.85). Linear models trained with only 10 features were statistically indistinguishable from the machine learning models in performance. The linear models were significantly more accurate than the prominent QCancer models (p = 0.0019), which are trained on 45 million patient records and available to over 4,000 United Kingdom general practices. The increase in accuracy may be caused by the consideration of often omitted feature types, including survey answers, census records, and genetic information. This approach led to the discovery of significant novel risk features, including self-reported happiness with own health (relevant to 12 cancers), measured testosterone (relevant to 8 cancers), and ICD codes for rehabilitation procedures (relevant to 3 cancers). These ten-feature models can be easily implemented within the clinic, allowing for personalized screening schedules that may increase cancer survival within a population.
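As a rough illustration of the two ingredients named in the abstract, the Python sketch below computes a linear risk score as a weighted sum of features and evaluates scores with the area under the ROC curve. The function names, labels, and scores are made-up toy values, not the study's features or results. The AUC is computed through its rank interpretation: the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as one half.

```python
def linear_risk_score(features, effect_sizes):
    """Sum of effect size times feature value over all named features."""
    return sum(effect_sizes[name] * value for name, value in features.items())


def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = case, 0 = non-case)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # case ranked above non-case
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Toy labels and risk scores, invented for illustration.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

The pairwise loop is quadratic in the number of patients; for cohort-sized data a sort-based rank computation would be the usual choice.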

New machine learning-based prediction models for fracture energy of asphalt mixtures

Measurement ◽

10.1016/j.measurement.2018.11.081 ◽

2019 ◽

Vol 135 ◽

pp. 438-451 ◽

Cited By ~ 21

Author(s):

Hamed Majidifard ◽

Behnam Jahangiri ◽

William G. Buttlar ◽

Amir H. Alavi

Keyword(s):

Machine Learning ◽

Fracture Energy ◽

Prediction Models ◽

Asphalt Mixtures ◽

New Machine

Comparison of Machine Learning Techniques for Developing Performance Prediction Models

Computing in Civil and Building Engineering (2014) ◽

10.1061/9780784413616.152 ◽

2014 ◽

Cited By ~ 1

Author(s):

Nima Kargah-Ostadi

Keyword(s):

Machine Learning ◽

Performance Prediction ◽

Prediction Models ◽

Machine Learning Techniques ◽

Learning Techniques

Predictors of remission from body dysmorphic disorder after internet-delivered cognitive behavior therapy: a machine learning approach

10.31234/osf.io/eqcdx ◽

2019 ◽

Author(s):

Oskar Flygare ◽

Jesper Enander ◽

Erik Andersson ◽

Brjánn Ljótsson ◽

Volen Z Ivanov ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forests ◽

Clinical Utility ◽

Body Dysmorphic Disorder ◽

Prediction Models ◽

Behavioral Therapy ◽

Learning Approach ◽

Learning Approaches ◽

Machine Learning Approach

**Background:** Previous attempts to identify predictors of treatment outcomes in body dysmorphic disorder (BDD) have yielded inconsistent findings. One way to increase precision and clinical utility could be to use machine learning methods, which can incorporate multiple non-linear associations in prediction models. **Methods:** This study used a random forests machine learning approach to test if it is possible to reliably predict remission from BDD in a sample of 88 individuals that had received internet-delivered cognitive behavioral therapy for BDD. The random forest models were compared to traditional logistic regression analyses. **Results:** Random forests correctly identified 78% of participants as remitters or non-remitters at post-treatment. The accuracy of prediction was lower in subsequent follow-ups (68%, 66% and 61% correctly classified at 3-, 12- and 24-month follow-ups, respectively). Depressive symptoms, treatment credibility, working alliance, and initial severity of BDD were among the most important predictors at the beginning of treatment. By contrast, the logistic regression models did not identify consistent and strong predictors of remission from BDD. **Conclusions:** The results provide initial support for the clinical utility of machine learning approaches in the prediction of outcomes of patients with BDD. **Trial registration:** ClinicalTrials.gov ID: NCT02010619.

COVID-19 Outbreak Prediction with Machine Learning

10.34055/osf.io/xr4js ◽

2020 ◽

Author(s):

Sina Faizollahzadeh Ardabili ◽

Amir Mosavi ◽

Pedram Ghamisi ◽

Filip Ferdinand ◽

Annamaria R. Varkonyi-Koczy ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Fuzzy Inference ◽

Control Measures ◽

Future Research ◽

Complex Nature ◽

Inference System ◽

Wide Range ◽

Standard Models ◽

High Level

Several outbreak prediction models for COVID-19 are being used by officials around the world to make informed decisions and enforce relevant control measures. Among the standard models for COVID-19 global pandemic prediction, simple epidemiological and statistical models have received more attention from authorities, and they are popular in the media. Due to a high level of uncertainty and a lack of essential data, standard models have shown low accuracy for long-term prediction. Although the literature includes several attempts to address this issue, the generalization and robustness of existing models need to be improved. This paper presents a comparative analysis of machine learning and soft computing models to predict the COVID-19 outbreak as an alternative to SIR and SEIR models. Among the wide range of machine learning models investigated, two showed promising results: the multi-layered perceptron (MLP) and the adaptive network-based fuzzy inference system (ANFIS). Based on the results reported here, and given the highly complex nature of the COVID-19 outbreak and the variation in its behavior from nation to nation, this study suggests machine learning as an effective tool to model the outbreak. This paper provides an initial benchmark to demonstrate the potential of machine learning for future research. The paper further suggests that real novelty in outbreak prediction can be realized by integrating machine learning and SEIR models.
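For context, the SIR model that the paper treats as a baseline splits a population of size N into susceptible (S), infected (I), and recovered (R) compartments, with dS/dt = -βSI/N, dI/dt = βSI/N - γI, and dR/dt = γI. A minimal forward-Euler sketch in Python follows. The function name and the parameter values are arbitrary illustrations and are not fitted to any COVID-19 data.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, steps_per_day=10):
    """Integrate the SIR equations with forward Euler.

    beta  : transmission rate per day
    gamma : recovery rate per day
    Returns a list of (S, I, R) tuples, one per day, including day 0.
    """
    n = s0 + i0 + r0
    dt = 1.0 / steps_per_day
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Arbitrary illustrative parameters (basic reproduction number beta/gamma = 3).
trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=990, i0=10, r0=0, days=60)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print("infections peak on day", peak_day)
```

An SEIR model adds an exposed compartment between S and I; the integration loop has the same shape with one extra state variable.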

Comparison of Machine Learning Performance for Earnings Forecasting

Journal of Taxation and Accounting ◽

10.35850/kjta.20.6.01 ◽

2019 ◽

Vol 20 (6) ◽

pp. 9-34

Author(s):

Woo June Jung

Keyword(s):

Machine Learning ◽

Learning Performance ◽

Earnings Forecasting

In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220130232 ◽

2019 ◽

Vol 21 (9) ◽

pp. 662-669 ◽

Cited By ~ 1

Author(s):

Junnan Zhao ◽

Lu Zhu ◽

Weineng Zhou ◽

Lingfeng Yin ◽

Yuchen Wang ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Regression Tree ◽

Large Data ◽

Thrombin Inhibitors ◽

Coagulation Cascade ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors from a large data set using machine learning methods. Because machine learning can find non-intuitive regularities in high-dimensional datasets, it can be used to build effective predictive models. A total of 6554 descriptors were collected for each compound, and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, namely multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT), and Support Vector Machine (SVM), were implemented to build prediction models with the selected descriptors. Results: The SVM model was the best among these methods, with R² = 0.84 and MSE = 0.55 for the training set and R² = 0.83 and MSE = 0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
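The two metrics reported above have simple definitions: MSE is the mean of the squared residuals, and R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the observed values. The Python sketch below computes both on toy numbers that are not from the paper; the function names are illustrative.

```python
def mean_squared_error(y_true, y_pred):
    """Mean of squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy observed and predicted values, invented for illustration.
observed = [3.0, -0.5, 2.0, 7.0]
predicted = [2.5, 0.0, 2.0, 8.0]
print(mean_squared_error(observed, predicted))   # 0.375
print(round(r_squared(observed, predicted), 3))  # 0.949
```

Comparing the training-set and test-set values of these metrics, as the abstract does, is a quick check for overfitting: close values suggest the model generalizes to unseen compounds.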

A Review of Statistical and Machine Learning Techniques for Microvascular Complications in Type 2 Diabetes

Current Diabetes Reviews ◽

10.2174/1573399816666200511003357 ◽

2020 ◽

Vol 16 ◽

Author(s):

Nitigya Sambyal ◽

Poonam Saini ◽

Rupali Syal

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Clinical Medicine ◽

Microvascular Complications ◽

Descriptive Analysis ◽

Machine Learning Techniques ◽

World Health ◽

Public Health Issue ◽

Learning Techniques ◽

Health Organization

Background and Introduction: Diabetes mellitus is a metabolic disorder that has emerged as a serious public health issue worldwide. According to the World Health Organization (WHO), without interventions, the number of diabetes cases is expected to be at least 629 million by 2045. Uncontrolled diabetes gradually leads to progressive damage to the eyes, heart, kidneys, blood vessels, and nerves. Method: The paper presents a critical review of existing statistical and artificial intelligence (AI)-based machine learning techniques with respect to diabetes mellitus (DM) complications, namely retinopathy, neuropathy, and nephropathy. The statistical and machine learning analytic techniques are used to structure the subsequent content review. Result: It has been inferred that statistical analysis can help only in inferential and descriptive analysis, whereas AI-based machine learning models can also provide actionable prediction models for faster and more accurate diagnosis of complications associated with DM. Conclusion: The integration of AI-based analytics techniques such as machine learning and deep learning in clinical medicine will result in improved disease management through faster disease detection and reduced treatment costs.
