Comparison of Machine Learning Methods to Up-Scale Gross Primary Production

Eddy covariance observation is a practical way to obtain accurate and continuous carbon flux measurements at flux tower sites, while remote sensing can effectively estimate carbon exchange and carbon storage at regional and global scales. However, it is still challenging to up-scale field-observed carbon flux to a regional scale, due to the heterogeneity and unstable air conditions at the land surface. In this paper, gross primary production (GPP) from ground eddy covariance systems was up-scaled to a regional scale using five machine learning methods (Cubist regression tree, random forest, support vector machine, artificial neural network, and deep belief network). The up-scaled GPP was then validated against GPP at flux tower sites, weighted GPP in the footprint, and MODIS GPP products. Finally, the sensitivity of the precision of up-scaled GPP to the input data (normalized difference vegetation index, fractional vegetation cover, shortwave radiation, relative humidity, and air temperature) was analyzed, and the uncertainty of the machine learning methods was discussed. The results indicate that machine learning methods have great potential for up-scaling GPP from flux tower sites. In the validation of the five methods, GPP up-scaled with random forest achieved the highest accuracy.
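
To make the "weighted GPP in the footprint" validation target more concrete, the following is a minimal sketch of a footprint-weighted average of pixel-level GPP. The function name, pixel values, and weights are hypothetical; the abstract does not say how the footprint weights were derived.

```python
def footprint_weighted_mean(gpp_pixels, weights):
    """Weighted mean of pixel GPP values; weights need not sum to 1."""
    if len(gpp_pixels) != len(weights):
        raise ValueError("gpp_pixels and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(g * w for g, w in zip(gpp_pixels, weights)) / total_weight


# Hypothetical up-scaled GPP for four pixels and their (assumed) footprint weights
gpp = [4.0, 6.0, 5.0, 3.0]
weights = [0.5, 0.25, 0.125, 0.125]
print(footprint_weighted_mean(gpp, weights))
```

Pixels that contribute more to the tower measurement pull the average toward their value, which is why a weighted mean, rather than a plain mean, is the natural comparison for a tower observation.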

Integrating Machine/Deep Learning Methods and Filtering Techniques for Reliable Mineral Phase Segmentation of 3D X-ray Computed Tomography Images

Energies ◽

10.3390/en14154595 ◽

2021 ◽

Vol 14 (15) ◽

pp. 4595

Author(s):

Parisa Asadi ◽

Lauren E. Beckingham

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Random Forest ◽

Ct Images ◽

Ct Imaging ◽

Learning Method ◽

Learning Methods ◽

X Ray ◽

Machine Learning Methods ◽

Filtering Techniques

X-ray CT imaging provides a 3D view of a sample and is a powerful tool for investigating the internal features of porous rock. Reliable phase segmentation in these images is essential but, as with any other digital rock imaging technique, is time-consuming, labor-intensive, and subjective. Combining 3D X-ray CT imaging with machine learning methods that can simultaneously consider several extracted features in addition to color attenuation is a promising and powerful approach to reliable phase segmentation. Machine learning-based phase segmentation of X-ray CT images enables faster data collection and interpretation than traditional methods. This study investigates the performance of several filtering techniques with three machine learning methods and a deep learning method to assess the potential for reliable feature extraction and pixel-level phase segmentation of X-ray CT images. Features were first extracted from images using well-known filters and from the second convolutional layer of the pre-trained VGG16 architecture. Then, K-means clustering, Random Forest, and Feed Forward Artificial Neural Network methods, as well as the modified U-Net model, were applied to the extracted input features. The models’ performances were then compared to determine the influence of the machine learning method and input features on reliable phase segmentation. The results showed that considering more feature dimensions is promising and that all classification algorithms achieve high accuracy, ranging from 0.87 to 0.94. Feature-based Random Forest demonstrated the best performance among the machine learning models, with an accuracy of 0.88 for Mancos and 0.94 for Marcellus. The U-Net model with the linear combination of focal and dice loss also performed well, with an accuracy of 0.91 and 0.93 for Mancos and Marcellus, respectively.
In general, considering more features provided promising and reliable segmentation results that are valuable for analyzing the composition of dense samples, such as shales, which are significant unconventional reservoirs in oil recovery.
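
The "linear combination of focal and dice loss" mentioned for the U-Net model can be sketched as follows. This is a simplified binary, per-pixel version in plain Python; the weights `w_focal` and `w_dice`, the focusing parameter `gamma`, and the function names are assumptions, since the abstract gives neither the coefficients nor the multi-class formulation actually used.

```python
import math


def dice_loss(probs, targets, eps=1e-7):
    """Soft Dice loss: 1 - 2 * overlap / (sum of predictions + sum of targets)."""
    overlap = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * overlap + eps) / (denom + eps)


def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Mean binary focal loss: -(1 - p_t)^gamma * log(p_t)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p_t = p if t == 1 else 1.0 - p          # probability assigned to the true class
        p_t = min(max(p_t, eps), 1.0 - eps)     # clip to avoid log(0)
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


def combined_loss(probs, targets, w_focal=0.5, w_dice=0.5):
    """Linear combination of the two losses (weights are assumed, not from the paper)."""
    return w_focal * focal_loss(probs, targets) + w_dice * dice_loss(probs, targets)


# Hypothetical predicted foreground probabilities for four pixels
targets = [1, 0, 1, 0]
print(combined_loss([0.9, 0.1, 0.8, 0.2], targets))  # mostly correct prediction
print(combined_loss([0.2, 0.8, 0.1, 0.9], targets))  # mostly wrong prediction
```

The focal term down-weights pixels that are already classified confidently, while the Dice term scores overlap across the whole mask. Combining them is a common choice when phases are imbalanced.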

Machine Learning in Aging: An Example of Developing Prediction Models for Serious Fall Injury in Older Adults

Innovation in Aging ◽

10.1093/geroni/igaa057.859 ◽

2020 ◽

Vol 4 (Supplement_1) ◽

pp. 268-269

Author(s):

Jaime Speiser ◽

Kathryn Callahan ◽

Jason Fanning ◽

Thomas Gill ◽

Anne Newman ◽

...

Keyword(s):

Machine Learning ◽

Older Adults ◽

Random Forest ◽

Decision Tree ◽

Prediction Models ◽

Receiver Operating Curve ◽

Learning Methods ◽

Life Study ◽

Fall Injury ◽

Machine Learning Methods

Abstract Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty understanding the complex algorithms behind models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated in data from the LIFE study, prediction models for serious fall injury were moderate at best (area under the receiver operating curve of 0.54 for decision tree and 0.66 for random forest). Machine learning methods may offer improved performance compared to traditional models for modeling outcomes in aging, but their use should be justified and output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
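
The statement that a random forest is "a collection of many decision trees whose results are aggregated" can be illustrated with a deliberately small sketch. It uses one-split trees (decision stumps) fitted on bootstrap samples and aggregated by majority vote. Real random forests grow deeper trees and also sample features at each split. All names and the toy data here are hypothetical and unrelated to the LIFE study.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Find the single (feature, threshold) split with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        for thr in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, thr, left_label, right_label)
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, thr, left_label, right_label = stump
    return left_label if row[f] <= thr else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Fit each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest


def predict_forest(forest, row):
    """Aggregate the individual stump predictions by majority vote."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]


# Hypothetical one-feature data: low values are class 0, high values are class 1
rows = [[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]]
labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(rows, labels)
print(predict_forest(forest, [0.5]), predict_forest(forest, [9.5]))
```

A single stump is the "flow chart" with one decision node. The forest keeps the same readable building block but averages away some of the instability of any one tree.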

Possibility of Autonomous Estimation of Shiba Goat’s Estrus and Non-Estrus Behavior by Machine Learning Methods

Animals ◽

10.3390/ani10050771 ◽

2020 ◽

Vol 10 (5) ◽

pp. 771

Author(s):

Toshiya Arakawa

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Markov Models ◽

Tracking System ◽

Video Tracking ◽

Training Data ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods

Mammalian behavior is typically monitored by observation. However, direct observation requires a substantial amount of effort and time if the number of mammals to be observed is large or if the observation is conducted over a prolonged period. In this study, machine learning methods such as hidden Markov models (HMMs), random forests, support vector machines (SVMs), and neural networks were applied to estimate whether a goat is in estrus based on its behavior, and the adequacy of each method was verified. Goat tracking data were obtained using a video tracking system and used to estimate whether goats in “estrus” or “non-estrus” were in either of two states: “approaching the male” or “standing near the male”. Overall, the percentage concordance (PC) of random forest appears to be the highest. However, its PC for goats other than those whose data were used for training is relatively low, which suggests that random forest tends to over-fit the training data. Apart from random forest, the PC of HMMs and SVMs is high. Considering the calculation time and the HMM’s advantage of being a time series model, the HMM is the better method. The PC of the neural network is low overall; however, if more goat data were acquired, the neural network would be an adequate method for estimation.
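
A minimal sketch of the over-fitting check described above, assuming that percentage concordance (PC) means the share of predictions that agree with the observed labels. The abstract does not define PC precisely, and the label sequences below are hypothetical.

```python
def percentage_concordance(predicted, observed):
    """Percentage of positions where the predicted label matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical labels for a goat used in training and for a goat held out
observed_train = ["estrus", "estrus", "non-estrus", "non-estrus"]
predicted_train = ["estrus", "estrus", "non-estrus", "non-estrus"]
observed_heldout = ["estrus", "non-estrus", "non-estrus", "estrus"]
predicted_heldout = ["estrus", "estrus", "estrus", "estrus"]

pc_train = percentage_concordance(predicted_train, observed_train)
pc_heldout = percentage_concordance(predicted_heldout, observed_heldout)
# A large gap between the two values is the usual symptom of over-fitting
print(pc_train, pc_heldout)
```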

Machine learning approaches to the determinants of women’s vasomotor symptoms using general hospital data

10.21203/rs.3.rs-74144/v1 ◽

2020 ◽

Author(s):

Ki-Jin Ryu ◽

Kyong Wook Yi ◽

Yong Jin Kim ◽

Jung Ho Shin ◽

Jun Young Hur ◽

...

Keyword(s):

Machine Learning ◽

Insulin Resistance ◽

Random Forest ◽

Variable Importance ◽

Vasomotor Symptoms ◽

Model Assessment ◽

Cancer Antigen ◽

Learning Methods ◽

Machine Learning Methods ◽

Glutamyl Transferase

Abstract Background To analyze the determinants of women’s vasomotor symptoms (VMS) using machine learning. Methods Data came from Korea University Anam Hospital in Seoul, Korea, with 3298 women, aged 40–80 years, who attended their general health check from January 2010 to December 2012. Five machine learning methods were applied and compared for the prediction of VMS, measured by a Menopause Rating Scale. Variable importance, the effect of a variable on model performance, was used for identifying major determinants of VMS. Results In terms of the mean squared error, the random forest (0.9326) was much better than linear regression (12.4856) and artificial neural networks with one, two and three hidden layers (1.5576, 1.5184 and 1.5833, respectively). Based on variable importance from the random forest, the most important determinants of VMS were age, menopause age, thyroid stimulating hormone, monocyte and triglyceride, as well as gamma glutamyl transferase, blood urea nitrogen, cancer antigen 19-9, C-reactive protein and low-density-lipoprotein cholesterol. In addition, the following determinants ranked within the top 20 in terms of variable importance: cancer antigen 125, total cholesterol, insulin, free thyroxine, forced vital capacity, alanine aminotransferase, forced expiratory volume in one second, height, homeostatic model assessment for insulin resistance and carcinoembryonic antigen. Conclusions Machine learning provides an invaluable decision support system for the prediction of VMS. To prevent VMS, preventive measures would be needed regarding thyroid function, the lipid profile, liver function, inflammation markers, insulin resistance, monocytes, cancer antigens and lung function.
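
One common way to measure "the effect of a variable on model performance" is permutation importance: shuffle one input column and record how much the mean squared error increases. The sketch below assumes that definition, which may differ from the importance measure the study's random forest actually reported. The stand-in model and data are hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, column, seed=0):
    """Increase in MSE after shuffling one column of the inputs."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(r) for r in rows])
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
    return mse(y, [predict(r) for r in permuted]) - baseline


# Hypothetical stand-in for a fitted model: it uses feature 0 and ignores feature 1
def toy_model(row):
    return 2.0 * row[0]


rows = [[float(i), float(i % 3)] for i in range(10)]
y = [2.0 * r[0] for r in rows]

print(permutation_importance(toy_model, rows, y, column=0))  # model depends on this column
print(permutation_importance(toy_model, rows, y, column=1))  # model ignores this column
```

A variable the model never uses gets an importance of zero, whereas shuffling a variable the model relies on degrades the error. Ranking variables by this increase gives the kind of ordering the abstract describes.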

Classification models using circulating neutrophil transcripts can detect unruptured intracranial aneurysm

Journal of Translational Medicine ◽

10.1186/s12967-020-02550-2 ◽

2020 ◽

Vol 18 (1) ◽

Author(s):

Kerry E. Poppenberg ◽

Vincent M. Tutino ◽

Lu Li ◽

Muhammad Waqas ◽

Armond June ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Prediction Models ◽

Model Performance ◽

Supervised Machine Learning ◽

Support Vector ◽

Learning Methods ◽

Training Cohort ◽

Network Analyses ◽

Machine Learning Methods

Abstract Background Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods. Methods Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly-selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n = 40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction. Results Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under ROC curve (AUC) = 0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. 
In our data, demographics and comorbidities did not affect model performance. Conclusions We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and further investigate effect of covariates.

Evaluating Different Machine Learning Methods for Upscaling Evapotranspiration from Flux Towers to the Regional Scale

Journal of Geophysical Research Atmospheres ◽

10.1029/2018jd028447 ◽

2018 ◽

Vol 123 (16) ◽

pp. 8674-8690 ◽

Cited By ~ 41

Author(s):

Tongren Xu ◽

Zhixia Guo ◽

Shaomin Liu ◽

Xinlei He ◽

Yangfanyu Meng ◽

...

Keyword(s):

Machine Learning ◽

Regional Scale ◽

Learning Methods ◽

Machine Learning Methods

Spatio-Temporal Prediction of the Epidemic Spread of Dangerous Pathogens Using Machine Learning Methods

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9010044 ◽

2020 ◽

Vol 9 (1) ◽

pp. 44 ◽

Cited By ~ 4

Author(s):

Wolfgang B. Hamer ◽

Tim Birr ◽

Joseph-Alexander Verreet ◽

Rainer Duttmann ◽

Holger Klink

Keyword(s):

Machine Learning ◽

Powdery Mildew ◽

Regional Scale ◽

R Package ◽

Good Time ◽

Climate Information ◽

Learning Methods ◽

Temporal Prediction ◽

Machine Learning Methods ◽

Weather And Climate

Real-time identification of the occurrence of dangerous pathogens is of crucial importance for the rapid execution of countermeasures. For this purpose, spatial and temporal predictions of the spread of such pathogens are indispensable. The R package papros, developed by the authors, offers an environment in which both spatial and temporal predictions can be made from local data using various deterministic, geostatistical regionalisation, and machine learning methods. The approach is presented using the example of crop infection by fungal pathogens, which can substantially reduce yield if not treated in good time. The behaviour of wind-dispersed pathogens, such as powdery mildew (Blumeria graminis f. sp. tritici), is particularly difficult to predict. To forecast pathogen development and spatial dispersal, a modelling process scheme was developed using the aforementioned R package, which combines regionalisation and machine learning techniques. It enables the prediction of the probability of yield-relevant infestation events for an entire federal state in northern Germany at a daily time scale. To run the models, weather and climate information is required, as is knowledge of the pathogen biology. Once the models are fitted to the pathogen, only weather and climate information is necessary to predict such events, with an overall accuracy of 68% in the case of powdery mildew at a regional scale. In this case, 91% of the observed powdery mildew events are predicted.
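
The two figures quoted above measure different things: overall accuracy counts all correct predictions, while the share of observed events that are predicted is the recall (sensitivity). The sketch below uses hypothetical confusion-matrix counts, not the study's data, to show how recall can be high while accuracy is noticeably lower.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all cases (events and non-events) that were predicted correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fn):
    """Share of observed events that were predicted as events."""
    return tp / (tp + fn)


# Hypothetical counts: tp = predicted and observed events, fp = false alarms,
# fn = missed events, tn = correctly predicted non-events
tp, fp, fn, tn = 9, 4, 1, 6

print(accuracy(tp, fp, fn, tn))  # lowered by the false alarms
print(recall(tp, fn))            # only one of ten observed events is missed
```

For a warning system, a model that rarely misses an event can be acceptable even if false alarms reduce its overall accuracy.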

Machine Learning in Aging: An Example of Developing Prediction Models for Serious Fall Injury in Older Adults

The Journals of Gerontology Series A ◽

10.1093/gerona/glaa138 ◽

2020 ◽

Author(s):

Jaime Lynn Speiser ◽

Kathryn E Callahan ◽

Denise K Houston ◽

Jason Fanning ◽

Thomas M Gill ◽

...

Keyword(s):

Machine Learning ◽

Older Adults ◽

Random Forest ◽

Decision Tree ◽

Prediction Models ◽

Learning Methods ◽

Life Study ◽

Fall Injury ◽

Machine Learning Methods ◽

Using Data

Abstract Background Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty in understanding the complex algorithms that underlie models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. Method We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. Results Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated using data from the LIFE study, prediction models for serious fall injury were moderate at best (area under the receiver operating curve of 0.54 for decision tree and 0.66 for random forest). Conclusions Machine learning methods offer an alternative to traditional approaches for modeling outcomes in aging, but their use should be justified and output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
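
The area under the receiver operating curve reported above can be read as the probability that a randomly chosen case with the outcome receives a higher predicted risk than a randomly chosen case without it. A value of 0.5 corresponds to chance. The sketch below computes it directly from that pairwise definition; the scores and labels are hypothetical and are not LIFE study data.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks; label 1 = serious fall injury, 0 = none
scores = [0.8, 0.4, 0.6, 0.2]
labels = [1, 1, 0, 0]
print(roc_auc(scores, labels))
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration. Library implementations typically use a sort-based computation instead.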

Predicting Fine Particulate Matter (PM2.5) in the Greater London Area: An Ensemble Approach using Machine Learning Methods

Remote Sensing ◽

10.3390/rs12060914 ◽

2020 ◽

Vol 12 (6) ◽

pp. 914 ◽

Cited By ~ 4

Author(s):

Mahdieh Danesh Yazdi ◽

Zheng Kuang ◽

Konstantina Dimakopoulou ◽

Benjamin Barratt ◽

Esra Suel ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Nearest Neighbor ◽

Meteorological Data ◽

Fine Particulate Matter ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

Learning Methods ◽

Machine Learning Methods ◽

Technological Advances

Estimating air pollution exposure has long been a challenge for environmental health researchers. Technological advances and novel machine learning methods have allowed us to increase the geographic range and accuracy of exposure models, making them a valuable tool in conducting health studies and identifying hotspots of pollution. Here, we have created a prediction model for daily PM2.5 levels in the Greater London area from 1st January 2005 to 31st December 2013 using an ensemble machine learning approach incorporating satellite aerosol optical depth (AOD), land use, and meteorological data. The predictions were made on a 1 km × 1 km scale over 3960 grid cells. The ensemble included predictions from three different machine learners: a random forest (RF), a gradient boosting machine (GBM), and a k-nearest neighbor (KNN) approach. Our ensemble model performed very well, with a ten-fold cross-validated R² of 0.828. Of the three machine learners, the random forest outperformed the GBM and KNN. Our model was particularly adept at predicting day-to-day changes in PM2.5 levels, with an out-of-sample temporal R² of 0.882. However, its ability to predict spatial variability was weaker, with an R² of 0.396. We believe this to be due to the smaller spatial variation in pollutant levels in this area.
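
As a rough illustration of combining several learners and scoring the result, the sketch below averages three prediction lists and computes R² as the coefficient of determination. Both choices are assumptions: the abstract does not say how the three learners were combined or which R² definition was used, and all numbers are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def average_ensemble(*prediction_lists):
    """Unweighted mean of the base learners' predictions (an assumed combination rule)."""
    return [sum(vals) / len(vals) for vals in zip(*prediction_lists)]


# Hypothetical daily PM2.5 observations and predictions from three base learners
observed = [10.0, 12.0, 15.0, 9.0]
rf_pred = [10.5, 11.5, 14.0, 9.5]
gbm_pred = [9.0, 13.0, 16.0, 8.0]
knn_pred = [11.0, 12.5, 13.5, 10.0]

ensemble = average_ensemble(rf_pred, gbm_pred, knn_pred)
print(r_squared(observed, ensemble))
```

Computing R² separately over days at one location and over locations on one day would give temporal and spatial scores analogous to those quoted above.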

Machine-Learning-Based Prediction of Land Prices in Seoul, South Korea

Sustainability ◽

10.3390/su132313088 ◽

2021 ◽

Vol 13 (23) ◽

pp. 13088

Author(s):

Jungsun Kim ◽

Jaewoong Won ◽

Hyeongsoon Kim ◽

Joonghyeok Heo

Keyword(s):

Machine Learning ◽

Random Forest ◽

Real Estate ◽

South Korea ◽

Price Volatility ◽

Real Estate Market ◽

Accurate Estimation ◽

Learning Methods ◽

Land Prices ◽

Machine Learning Methods

The accurate estimation of real estate value helps the development of real estate policies that can respond to the complexities and instability of the real estate market. Previously, statistical methods were used to estimate real estate value, but machine learning methods have gained popularity because their predictions are more accurate. In contrast to existing studies that use various machine learning methods to estimate the transactions or list prices of real estate properties without separating the building and land prices, this study estimates land price using a large amount of land-use information obtained from various land- and building-related datasets. The random forest and XGBoost methods were used to estimate 52,900 land prices in Seoul, South Korea, from January 2017 to December 2020. The models were also separately trained for different land uses and different time periods. Overall, the results revealed that XGBoost yields a higher prediction accuracy. Whereas the XGBoost models were more accurate on the 2020 data than on the 2017–2020 data when analyzing residential areas, the random forest models were more accurate on the 2017–2020 data than on the 2020 data. Further analysis will extend the prediction model to consider submarkets determined by price volatility and locality.
