Determining the extent and drivers of attrition losses from wind using long-term datasets and machine learning techniques

2019 ◽  
Vol 92 (4) ◽  
pp. 425-435 ◽  
Author(s):  
John Moore ◽  
Yue Lin

Abstract In addition to causing large-scale catastrophic damage to forests, wind can also damage individual trees or small groups of trees. Over time, the cumulative effect of this wind-induced attrition can result in a significant reduction in yield in managed forests. A better understanding of the extent of these losses and the factors associated with them can aid forest management. Information on wind damage attrition is often captured in long-term growth monitoring plots, but analysing these large datasets to identify factors associated with the damage can be problematic. Machine learning techniques offer the potential to overcome some of the challenges of analysing these datasets. In this study, we applied two commonly available machine learning algorithms (Random Forests and Gradient Boosting Trees) to a large, long-term dataset of tree growth for radiata pine (Pinus radiata D. Don) in New Zealand containing more than 157 000 observations. Both algorithms identified stand density and height-to-diameter ratio as the two most important variables associated with the proportion of basal area lost to wind. The algorithms differed in their ease of parameterization and processing time, as well as in their overall ability to predict wind damage loss. The Random Forest model predicted ~43 per cent of the variation in the proportion of basal area lost to wind damage in the training dataset (a random sample of 80 per cent of the original data) and 45 per cent in the validation dataset (the remaining 20 per cent of the data). Conversely, the Gradient Boosting Tree model predicted more than 99 per cent of the variation in wind damage loss in the training dataset, but only ~49 per cent of the variation in the validation dataset, which highlights the potential for overfitting models to specific datasets. When applying these techniques to long-term datasets, it is also important to be aware of potential issues with the underlying data, such as missing observations resulting from plots being abandoned without measurement when damage levels have been very high.
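The train/validation gap the abstract describes can be reproduced in miniature. The sketch below is not the study's pipeline: it uses synthetic regression data (the paper's stand density and height-to-diameter variables are not represented) and scikit-learn's implementations, but it shows how a heavily boosted model can fit the training split almost perfectly while generalising no better than a Random Forest.

```python
# Illustrative sketch only: synthetic data, not the radiata pine dataset.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=8, noise=20.0, random_state=0)
# 80/20 split, mirroring the paper's training/validation proportions
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
# deliberately over-parameterised boosting to provoke overfitting
gb = GradientBoostingRegressor(n_estimators=500, max_depth=4,
                               random_state=0).fit(X_tr, y_tr)

for name, model in [("Random Forest", rf), ("Gradient Boosting", gb)]:
    print(f"{name}: train R2 = {model.score(X_tr, y_tr):.2f}, "
          f"validation R2 = {model.score(X_va, y_va):.2f}")
```

With noisy targets, the boosted model's training R2 approaches 1 while its validation R2 stays close to the Random Forest's, which is the overfitting pattern the abstract reports.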

2021 ◽  
Author(s):  
Alireza Ebrahimvandi ◽  
Niyousha Hosseinichimeh ◽  
Zhenyu James Kong

Abstract Background and Purpose—Preterm birth (PTB) is the leading cause of infant mortality in the U.S. and globally. The goal of this study is to increase understanding of PTB risk factors that are present early in pregnancy by applying statistical and machine learning techniques to big data. Methods—The 2016 U.S. birth records were obtained and combined with two area-level datasets, the Area Health Resources File and the County Health Rankings. We then applied multiple machine learning techniques to a cohort of 3.6 million singleton deliveries to identify generalizable preterm risk factors. Results—The most important predictors of preterm birth are gestational and chronic hypertension, interval since last live birth, and history of a previous preterm birth, which respectively explain 14.91%, 6.92%, and 6.50% of the AUC. Parents' education is one of the most influential variables in the prediction of PTB, explaining 10.5% of the AUC. The relative importance of race declines when parents are more educated or have received adequate prenatal care. Gradient boosting machines outperformed the other machine learning techniques, with an AUC of 0.75 (recall: 0.64, specificity: 0.73) on the validation dataset. Conclusions—Application of ML techniques improved the performance measures in the prediction of preterm birth. The results emphasize the importance of socioeconomic factors, such as parental education, as among the most important indicators of a preterm birth. More research is needed on the mechanisms through which socioeconomic factors affect biological responses.
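The evaluation pattern above (AUC together with recall and specificity on a held-out validation set) can be sketched generically. This is not the study's model or data: the class balance and features below are invented stand-ins for a rare-outcome classification problem.

```python
# Hedged sketch: gradient boosting on synthetic imbalanced data, scored with
# the same metric family the abstract reports (AUC, recall, specificity).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# ~10% positive class, loosely standing in for a rare outcome such as PTB
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.9, 0.1], random_state=1)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=1)

gbm = GradientBoostingClassifier(random_state=1).fit(X_tr, y_tr)
proba = gbm.predict_proba(X_va)[:, 1]
pred = gbm.predict(X_va)

tn, fp, fn, tp = confusion_matrix(y_va, pred).ravel()
print(f"AUC = {roc_auc_score(y_va, proba):.2f}")
print(f"recall = {recall_score(y_va, pred):.2f}, "
      f"specificity = {tn / (tn + fp):.2f}")
```

Reporting recall and specificity alongside AUC matters for imbalanced outcomes, where plain accuracy is dominated by the majority class.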


2020 ◽  
Author(s):  
Mohammad Alarifi ◽  
Somaieh Goudarzvand ◽  
Abdulrahman Jabour ◽  
Doreen Foy ◽  
Maryam Zolnoori

BACKGROUND The rate of antidepressant prescriptions is increasing globally. A large portion of patients stop taking their medications, which can lead to many adverse outcomes, including relapse and anxiety. OBJECTIVE The aim of this study was to develop a drug-continuity prediction model and identify the factors associated with drug continuity using online patient forums. METHODS We retrieved 982 antidepressant drug reviews from the online patient forum AskaPatient.com. We followed the Analytical Framework Method to extract structured data from the unstructured reviews. Using the structured data, we examined the factors associated with antidepressant discontinuity and developed predictive models using multiple machine learning techniques. RESULTS The techniques we tested ranged in accuracy from 65% to 82%. The Random Forest algorithm performed best, with 82% accuracy, 78% precision, 88.03% recall, and an 84.2% F1-score. The factors most strongly associated with drug discontinuity were withdrawal symptoms, effectiveness or ineffectiveness, perceived distress from adverse drug reactions, rating, and perceived distress related to withdrawal symptoms. CONCLUSIONS Although the nature of the data available on online forums differs from data collected through surveys, we found that online patient forums can be a valuable source of data for drug-continuity prediction and for understanding patients' experience. The factors identified through our techniques were consistent with the findings of prior survey-based studies.
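Once free-text reviews have been coded into structured features, the modelling step is a standard supervised classification. The sketch below is hypothetical throughout: the feature construction, sample labels, and scores are invented, and it only illustrates scoring a Random Forest with the metrics the abstract reports.

```python
# Hypothetical sketch: Random Forest on invented "coded review" features,
# scored with accuracy / precision / recall / F1 as in the abstract.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 982  # same order as the number of reviews analysed
# columns stand in for coded factors (e.g. withdrawal symptoms, rating...)
X = rng.random((n, 5))
# invented label rule: "discontinued" driven by the first two factors
y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.3, n) > 1.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print(f"accuracy={accuracy_score(y_te, pred):.2f} "
      f"precision={precision_score(y_te, pred):.2f} "
      f"recall={recall_score(y_te, pred):.2f} "
      f"F1={f1_score(y_te, pred):.2f}")
```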


Materials ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 1089
Author(s):  
Sung-Hee Kim ◽  
Chanyoung Jeong

This study aims to demonstrate the feasibility of applying eight machine learning algorithms to predict the classification of the surface characteristics of titanium oxide (TiO2) nanostructures produced with different anodization processes. We produced a total of 100 samples and assessed changes in the thicknesses of the TiO2 nanostructures after anodization. We successfully grew TiO2 films with different thicknesses by one-step anodization in ethylene glycol containing NH4F and H2O, at applied voltages ranging from 10 V to 100 V and various anodization durations. We found that the thicknesses of the TiO2 nanostructures depend on both anodization voltage and duration. We therefore tested the feasibility of applying machine learning algorithms to predict the deformation of TiO2. As the characteristics of TiO2 changed with the experimental conditions, we classified its surface pore structure into two categories and four groups. For the classification based on granularity, we assessed layer creation, roughness, pore creation, and pore height. We applied eight machine learning techniques to both binary and multiclass classification. For binary classification, the random forest and gradient boosting algorithms had relatively high performance; moreover, all eight algorithms scored higher than 0.93, which signifies high predictive ability in estimating the presence of pores. In contrast, the decision tree and three ensemble methods had relatively higher performance for multiclass classification, with accuracy rates greater than 0.79. The weakest algorithm for both binary and multiclass classification was k-nearest neighbors. We believe these results show that machine learning techniques can be applied to predict surface quality, leading to smart manufacturing technology to better control color appearance, super-hydrophobicity, super-hydrophilicity or battery efficiency.
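The binary-versus-multiclass comparison described above follows a common benchmarking pattern. The sketch below uses four of the eight algorithm families named in the abstract on synthetic data (the anodization features and 100 real samples are not reproduced), simply to show how the same models are scored on a two-class and a four-class version of a problem.

```python
# Hedged sketch: cross-validated comparison of several classifiers on
# synthetic binary and four-class problems; all scores are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

models = {
    "random forest": RandomForestClassifier(random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "k-nearest neighbors": KNeighborsClassifier(),
}

results = {}
for n_classes, label in [(2, "binary"), (4, "multiclass")]:
    X, y = make_classification(n_samples=600, n_features=8, n_informative=6,
                               n_redundant=0, n_classes=n_classes,
                               random_state=0)
    for name, model in models.items():
        results[(label, name)] = cross_val_score(model, X, y, cv=5).mean()
        print(f"{label:10s} {name}: accuracy {results[(label, name)]:.2f}")
```

As in the paper, multiclass accuracy is typically lower than binary accuracy for the same feature set, since the decision boundary must separate more groups.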


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Tahani Daghistani ◽  
Huda AlGhamdi ◽  
Riyad Alshammari ◽  
Raed H. AlHazme

Abstract Outpatients who fail to attend their appointments have a negative impact on healthcare outcomes, and healthcare organizations therefore face new opportunities, one of which is to improve the quality of healthcare. A main challenge is predictive analysis using techniques capable of handling the huge volumes of data generated. We propose a big data framework for identifying outpatient no-shows via feature engineering and machine learning (MLlib) on the Spark platform. This study evaluates the performance of five machine learning techniques using data from 2,011,813 outpatient visits. Across several experiments using different validation methods, Gradient Boosting (GB) performed best, increasing accuracy and ROC to 79% and 81%, respectively. In addition, we showed that exploring and evaluating the performance of machine learning models using various evaluation methods is critical, as the accuracy of prediction can differ significantly. The aim of this paper is to explore the factors that affect the no-show rate and that can be used to formulate predictions using big data machine learning techniques.
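The study runs Gradient Boosting at scale in Spark's MLlib; the single-machine scikit-learn sketch below (synthetic data, invented sizes) only illustrates the abstract's methodological point that different validation methods can report noticeably different scores for the same model.

```python
# Hedged sketch: one hold-out split versus 5-fold cross-validation for the
# same gradient boosting model, on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=2)

# single 80/20 hold-out estimate
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=2)
holdout = GradientBoostingClassifier(random_state=2).fit(X_tr, y_tr).score(X_te, y_te)

# cross-validated estimate on the same data
cv_scores = cross_val_score(GradientBoostingClassifier(random_state=2), X, y, cv=5)

print(f"hold-out accuracy: {holdout:.3f}")
print(f"5-fold CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

The spread across folds is a useful sanity check: if the hold-out score falls well outside it, the single split is probably not representative.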


2021 ◽  
Author(s):  
Nikos Fazakis ◽  
Elias Dritsas ◽  
Otilia Kocsis ◽  
Nikos Fakotakis ◽  
Konstantinos Moustakas

2018 ◽  
Vol 27 (03) ◽  
pp. 1850011 ◽  
Author(s):  
Athanasios Tagaris ◽  
Dimitrios Kollias ◽  
Andreas Stafylopatis ◽  
Georgios Tagaris ◽  
Stefanos Kollias

Neurodegenerative disorders, such as Alzheimer’s and Parkinson’s, constitute a major factor in long-term disability and are becoming an increasingly serious concern in developed countries. As there are, at present, no effective therapies, early diagnosis, along with avoidance of misdiagnosis, seems critical to ensuring a good quality of life for patients. In this sense, the adoption of computer-aided diagnosis tools can offer significant assistance to clinicians. In the present paper, we first provide a comprehensive recording of the medical examinations relevant to those disorders. Then, a review is conducted concerning the use of machine learning techniques in supporting the diagnosis of neurodegenerative diseases, with reference to the medical datasets used in that work. Special attention has been given to the field of deep learning. In addition, we announce the launch of a newly created dataset for Parkinson’s disease, containing epidemiological, clinical and imaging data, which will be made publicly available to researchers for benchmarking purposes. To assess the potential of the new dataset, an experimental study of Parkinson’s diagnosis is carried out, based on state-of-the-art deep neural network architectures and yielding very promising accuracy results.


Webology ◽  
2021 ◽  
Vol 18 (Special Issue 01) ◽  
pp. 183-195
Author(s):  
Thingbaijam Lenin ◽  
N. Chandrasekaran

Students’ academic performance is one of the most important parameters for evaluating the standard of any institute. It has become of paramount importance for any institute to identify students at risk of underperforming, failing, or even dropping out of a course. Machine learning techniques may be used to develop a model for predicting a student’s performance as early as the time of admission. The task, however, is challenging, as the educational data available for modelling are usually imbalanced. We explore ensemble machine learning techniques, namely a bagging algorithm, random forest (rf), and boosting algorithms, adaptive boosting (adaboost), stochastic gradient boosting (gbm), and extreme gradient boosting (xgbTree), in an attempt to develop a model for predicting student performance at a private university in Meghalaya using three categories of data: demographic, prior academic record, and personality. The collected data are highly imbalanced and also contain missing values. We employ the k-nearest neighbour (knn) data imputation technique to handle the missing values. The models are developed on the imputed data with 10-fold cross-validation and are evaluated using precision, specificity, recall, and kappa metrics. As the data are imbalanced, we avoid using accuracy as the metric for evaluating the models and instead use balanced accuracy and F-score. We compare the ensemble techniques with the single classifier C4.5. The best results are provided by random forest and adaboost, with an F-score of 66.67%, balanced accuracy of 75%, and accuracy of 96.94%.
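Two steps in the pipeline above translate directly into code: k-NN imputation of missing values, then evaluating an ensemble with balanced accuracy and F-score rather than plain accuracy. The sketch below uses synthetic imbalanced data (the study's demographic and academic features are not reproduced) and scikit-learn's imputer in place of the R-based tooling the study appears to use.

```python
# Hedged sketch: k-NN imputation followed by a Random Forest, scored with
# imbalance-aware metrics. Data are synthetic and imbalanced by construction.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.metrics import balanced_accuracy_score, f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8,
                           weights=[0.9, 0.1], random_state=3)
# knock out ~10% of values to simulate missing records
rng = np.random.default_rng(3)
X[rng.random(X.shape) < 0.1] = np.nan

# fill each gap from its 5 nearest complete neighbours
X = KNNImputer(n_neighbors=5).fit_transform(X)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=3)
pred = RandomForestClassifier(random_state=3).fit(X_tr, y_tr).predict(X_te)

bal = balanced_accuracy_score(y_te, pred)
f1 = f1_score(y_te, pred)
print(f"balanced accuracy: {bal:.2f}, F-score: {f1:.2f}")
```

On a 90/10 class split, a majority-class guesser scores 90% accuracy but only 50% balanced accuracy, which is why the paper prefers the latter.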


2020 ◽  
Vol 9 (6) ◽  
pp. 379 ◽  
Author(s):  
Eleonora Grilli ◽  
Fabio Remondino

The use of machine learning techniques for point cloud classification has been investigated extensively in the last decade in the geospatial community, while in the cultural heritage field it has only recently started to be explored. The high complexity and heterogeneity of 3D heritage data, the diversity of possible scenarios, and the different classification purposes that each case study might present make it difficult to assemble a large training dataset for learning purposes. An important practical issue that has not yet been explored is the application of a single machine learning model across large and different architectural datasets. This paper tackles this issue by presenting a methodology able to successfully generalise a random forest model, trained on a specific dataset, to unseen scenarios. This is achieved by looking for the features best suited to identifying the classes of interest (e.g., wall, windows, roof and columns).
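The generalisation test described above (train on one scene, evaluate on an unseen one) can be sketched abstractly. The data below are synthetic stand-ins, not point cloud features; drawing the two "scenes" from different seeds simply shows why a model fitted to one dataset can drop toward chance on another, which is the problem the paper's feature selection addresses.

```python
# Hedged sketch of a cross-dataset evaluation protocol; synthetic data only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# four classes standing in for e.g. wall / window / roof / column
X_a, y_a = make_classification(n_samples=2000, n_features=6, n_informative=5,
                               n_redundant=0, n_classes=4, random_state=10)
# a second "scene" with a different underlying structure (different seed)
X_b, y_b = make_classification(n_samples=2000, n_features=6, n_informative=5,
                               n_redundant=0, n_classes=4, random_state=11)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_a, y_a)
print(f"accuracy on the training scenario: {rf.score(X_a, y_a):.2f}")
print(f"accuracy on the unseen scenario:   {rf.score(X_b, y_b):.2f}")
```

With unrelated feature distributions the second score collapses; the paper's contribution is choosing features stable enough across buildings that it does not.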


2020 ◽  
Vol 10 (7) ◽  
pp. 2406
Author(s):  
Valentín Moreno ◽  
Gonzalo Génova ◽  
Manuela Alejandres ◽  
Anabel Fraga

Our purpose in this research is to develop a method to automatically and efficiently classify web images as Unified Modeling Language (UML) static diagrams, and to produce a computer tool that implements this function. The tool receives a bitmap file (in various formats) as input and reports whether the image corresponds to a diagram. For pragmatic reasons, we restricted ourselves to the simplest kinds of diagrams that are most useful for automated software reuse: computer-edited 2D representations of static diagrams. The tool does not require the images to be explicitly or implicitly tagged as UML diagrams. The tool extracts graphical characteristics from each image (such as grayscale histogram, color histogram and elementary geometric forms) and uses a combination of rules to classify it. The rules are obtained with machine learning techniques (rule induction) from a sample of 19,000 web images manually classified by experts. In this work, we do not consider the textual contents of the images. Our tool reaches nearly 95% agreement with the manually classified instances, improving on the effectiveness of related research works. Moreover, despite using a training dataset 15 times bigger, the time required to process each image and extract its graphical features (0.680 s) is seven times lower.
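Rule induction produces human-readable if-then classifiers. The sketch below is not the paper's system: the feature names are invented and a shallow decision tree stands in for the rule learner, but it shows how threshold rules over extracted image features can be learned and printed.

```python
# Hedged sketch: learning readable threshold rules over invented image
# features with a shallow decision tree as the rule inducer.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(4)
n = 500
features = ["gray_peak", "color_entropy", "rect_count"]  # invented names
X = rng.random((n, 3))
# invented ground-truth rule: "is a UML diagram" when both thresholds hold
y = ((X[:, 2] > 0.5) & (X[:, 0] > 0.3)).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
# export_text renders the learned splits as nested if-then rules
print(export_text(tree, feature_names=features))
```

A depth-limited tree recovers the two thresholds almost exactly, and the exported text is the kind of rule set that can be audited by the human experts who labelled the sample.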

