Gamma/Hadron Separation for a Ground Based IACT in Experiment TAIGA Using Machine Learning Methods Random Forest

X-ray CT imaging provides a 3D view of a sample and is a powerful tool for investigating the internal features of porous rock. Reliable phase segmentation in these images is essential but, like any other digital rock imaging technique, is time-consuming, labor-intensive, and subjective. Combining 3D X-ray CT imaging with machine learning methods that can simultaneously consider several extracted features in addition to color attenuation is a promising and powerful method for reliable phase segmentation. Machine learning-based phase segmentation of X-ray CT images enables faster data collection and interpretation than traditional methods. This study investigates the performance of several filtering techniques with three machine learning methods and a deep learning method to assess the potential for reliable feature extraction and pixel-level phase segmentation of X-ray CT images. Features were first extracted from images using well-known filters and from the second convolutional layer of the pre-trained VGG16 architecture. Then, K-means clustering, Random Forest, and Feed Forward Artificial Neural Network methods, as well as the modified U-Net model, were applied to the extracted input features. The models’ performances were then compared to determine the influence of the machine learning method and input features on reliable phase segmentation. The results showed that considering more feature dimensions is promising and that all classification algorithms achieve high accuracy, ranging from 0.87 to 0.94. Feature-based Random Forest demonstrated the best performance among the machine learning models, with an accuracy of 0.88 for Mancos and 0.94 for Marcellus. The U-Net model with the linear combination of focal and dice loss also performed well, with accuracies of 0.91 and 0.93 for Mancos and Marcellus, respectively. In general, considering more features provided promising and reliable segmentation results that are valuable for analyzing the composition of dense samples, such as shales, which are significant unconventional reservoirs in oil recovery.
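
The abstract mentions a linear combination of focal and dice loss without giving its form. The following is a minimal sketch of one common binary formulation, not the authors' implementation; the mixing weight `alpha`, the focusing parameter `gamma`, and the smoothing constant `smooth` are assumptions chosen for illustration.

```python
import math


def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Mean binary focal loss; gamma (assumed value) down-weights easy pixels."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)      # clip to avoid log(0)
        pt = p if t == 1 else 1.0 - p        # probability given to the true class
        total += -((1.0 - pt) ** gamma) * math.log(pt)
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """One minus the soft Dice coefficient between predictions and the mask."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def combined_loss(probs, targets, alpha=0.5):
    """Linear combination of the two losses; alpha is a hypothetical weight."""
    return alpha * focal_loss(probs, targets) + (1.0 - alpha) * dice_loss(probs, targets)


if __name__ == "__main__":
    mask = [1, 0, 1]
    print(combined_loss([0.9, 0.1, 0.8], mask))  # close predictions, small loss
    print(combined_loss([0.1, 0.9, 0.2], mask))  # poor predictions, larger loss
```

The focal term emphasizes hard-to-classify pixels, while the dice term measures overlap of whole regions, which is one plausible reason to combine them for imbalanced phases.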

Machine Learning in Aging: An Example of Developing Prediction Models for Serious Fall Injury in Older Adults

Innovation in Aging ◽

10.1093/geroni/igaa057.859 ◽

2020 ◽

Vol 4 (Supplement_1) ◽

pp. 268-269

Author(s):

Jaime Speiser ◽

Kathryn Callahan ◽

Jason Fanning ◽

Thomas Gill ◽

Anne Newman ◽

...

Keyword(s):

Machine Learning ◽

Older Adults ◽

Random Forest ◽

Decision Tree ◽

Prediction Models ◽

Receiver Operating Curve ◽

Learning Methods ◽

Life Study ◽

Fall Injury ◽

Machine Learning Methods

Abstract Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty understanding the complex algorithms behind models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated in data from the LIFE study, prediction models for serious fall injury were moderate at best (area under the receiver operating curve of 0.54 for decision tree and 0.66 for random forest). Machine learning methods may offer improved performance compared to traditional models for modeling outcomes in aging, but their use should be justified and output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
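
The area under the receiver operating curve reported above can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. A minimal sketch of that pairwise definition follows; the scores and labels are made-up illustration values, not LIFE study data.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    Ties count as half a correct ranking. Labels are 1 (case) or 0 (non-case).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    correct = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                correct += 1.0
            elif c == n:
                correct += 0.5
    return correct / (len(cases) * len(controls))


if __name__ == "__main__":
    # Hypothetical predicted risks for four participants.
    print(roc_auc([0.8, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

Under this reading, a value of 0.5 corresponds to random ranking, which puts the reported 0.54 and 0.66 in context.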

Possibility of Autonomous Estimation of Shiba Goat’s Estrus and Non-Estrus Behavior by Machine Learning Methods

Animals ◽

10.3390/ani10050771 ◽

2020 ◽

Vol 10 (5) ◽

pp. 771

Author(s):

Toshiya Arakawa

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Markov Models ◽

Tracking System ◽

Video Tracking ◽

Training Data ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods

Mammalian behavior is typically monitored by observation. However, direct observation requires substantial effort and time when the number of animals is large or the observation period is long. In this study, machine learning methods, namely hidden Markov models (HMMs), random forests, support vector machines (SVMs), and neural networks, were applied to estimate whether a goat is in estrus based on its behavior, and the adequacy of each method was assessed. The goats’ tracking data were obtained using a video tracking system and used to estimate whether goats, in either “estrus” or “non-estrus”, were in one of two states: “approaching the male” or “standing near the male”. Overall, the percentage concordance (PC) of random forest appears to be the highest. However, its PC for goats whose data were not used in the training data sets is relatively low, which suggests that random forest tends to overfit the training data. Besides random forest, the PCs of HMMs and SVMs are high. Considering calculation time and the HMM’s advantage as a time-series model, the HMM is the better method. The PC of the neural network is low overall; however, if more goat data were acquired, a neural network could become an adequate method for estimation.
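
The abstract does not define percentage concordance (PC). A minimal sketch follows, assuming PC is the percentage of observations for which the estimated state matches the observed state; the label sequences are invented for illustration.

```python
def percentage_concordance(predicted, observed):
    """Percentage of positions where predicted and observed labels agree.

    This definition is an assumption; the paper may compute PC differently.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    observed = ["estrus", "non-estrus", "non-estrus", "estrus"]
    predicted = ["estrus", "non-estrus", "estrus", "estrus"]
    print(percentage_concordance(predicted, observed))  # 75.0
```

Comparing this value on goats used for training against goats held out is one way to expose the overfitting the abstract describes.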

Machine learning approaches to the determinants of women’s vasomotor symptoms using general hospital data

10.21203/rs.3.rs-74144/v1 ◽

2020 ◽

Author(s):

Ki-Jin Ryu ◽

Kyong Wook Yi ◽

Yong Jin Kim ◽

Jung Ho Shin ◽

Jun Young Hur ◽

...

Keyword(s):

Machine Learning ◽

Insulin Resistance ◽

Random Forest ◽

Variable Importance ◽

Vasomotor Symptoms ◽

Model Assessment ◽

Cancer Antigen ◽

Learning Methods ◽

Machine Learning Methods ◽

Glutamyl Transferase

Abstract Background To analyze the determinants of women’s vasomotor symptoms (VMS) using machine learning. Methods Data came from Korea University Anam Hospital in Seoul, Korea, with 3298 women, aged 40–80 years, who attended their general health check from January 2010 to December 2012. Five machine learning methods were applied and compared for the prediction of VMS, measured by a Menopause Rating Scale. Variable importance, the effect of a variable on model performance, was used for identifying major determinants of VMS. Results In terms of the mean squared error, the random forest (0.9326) was much better than linear regression (12.4856) and artificial neural networks with one, two and three hidden layers (1.5576, 1.5184 and 1.5833, respectively). Based on variable importance from the random forest, the most important determinants of VMS were age, menopause age, thyroid stimulating hormone, monocyte and triglyceride, as well as gamma glutamyl transferase, blood urea nitrogen, cancer antigen 19-9, C-reactive protein and low-density-lipoprotein cholesterol. Indeed, the following determinants ranked within the top 20 in terms of variable importance: cancer antigen 125, total cholesterol, insulin, free thyroxine, forced vital capacity, alanine aminotransferase, forced expired volume in one second, height, homeostatic model assessment for insulin resistance and carcinoembryonic antigen. Conclusions Machine learning provides an invaluable decision support system for the prediction of VMS. For preventing VMS, preventive measures would be needed regarding the thyroid function, the lipid profile, the liver function, inflammation markers, insulin resistance, the monocyte, cancer antigens and the lung function.
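
Variable importance is described above as the effect of a variable on model performance. One way to make that concrete is permutation importance: shuffle one input column and measure how much the mean squared error rises. The sketch below is illustrative only; `toy_model` and the data are hypothetical, and the paper's random forest may use a different importance measure.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by a shuffled value.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(y, [predict(r) for r in shuffled]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical fitted model that uses feature 0 and ignores feature 1."""
    return 2.0 * row[0]


rows = [[float(i), float((i * 7) % 5)] for i in range(8)]
y = [2.0 * r[0] for r in rows]
importances = permutation_importance(toy_model, rows, y)
print(importances)  # feature 0 has positive importance, feature 1 has none
```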

Classification models using circulating neutrophil transcripts can detect unruptured intracranial aneurysm

Journal of Translational Medicine ◽

10.1186/s12967-020-02550-2 ◽

2020 ◽

Vol 18 (1) ◽

Author(s):

Kerry E. Poppenberg ◽

Vincent M. Tutino ◽

Lu Li ◽

Muhammad Waqas ◽

Armond June ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Prediction Models ◽

Model Performance ◽

Supervised Machine Learning ◽

Support Vector ◽

Learning Methods ◽

Training Cohort ◽

Network Analyses ◽

Machine Learning Methods

Abstract Background Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods. Methods Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly-selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n = 40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction. Results Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under ROC curve (AUC) = 0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. In our data, demographics and comorbidities did not affect model performance. Conclusions We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and further investigate effect of covariates.
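
LASSO selects features because its L1 penalty drives small coefficients exactly to zero. A minimal sketch of that mechanism follows, under the simplifying assumption of an orthonormal design and the objective (1/2)||y - Xb||² + λ||b||₁, where the LASSO solution is the soft-thresholded least-squares coefficient. The coefficient values and `lam` are illustrative, not taken from the study.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning exactly 0.0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthonormal(ols_coefs, lam):
    """LASSO coefficients for an orthonormal design (simplifying assumption)."""
    return [soft_threshold(b, lam) for b in ols_coefs]


def selected_features(coefs):
    """Indices of features whose coefficient survived the penalty."""
    return [i for i, b in enumerate(coefs) if b != 0.0]


if __name__ == "__main__":
    shrunk = lasso_orthonormal([3.0, -0.5, 0.2, -2.0], lam=1.0)
    print(shrunk)                     # [2.0, 0.0, 0.0, -1.0]
    print(selected_features(shrunk))  # [0, 3]
```

In a pipeline like the one described, only the surviving features would be passed on to the downstream classifiers.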

Machine Learning in Aging: An Example of Developing Prediction Models for Serious Fall Injury in Older Adults

The Journals of Gerontology Series A ◽

10.1093/gerona/glaa138 ◽

2020 ◽

Author(s):

Jaime Lynn Speiser ◽

Kathryn E Callahan ◽

Denise K Houston ◽

Jason Fanning ◽

Thomas M Gill ◽

...

Keyword(s):

Machine Learning ◽

Older Adults ◽

Random Forest ◽

Decision Tree ◽

Prediction Models ◽

Learning Methods ◽

Life Study ◽

Fall Injury ◽

Machine Learning Methods ◽

Using Data

Abstract Background Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty in understanding the complex algorithms that underlie models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. Method We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. Results Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated using data from the LIFE study, prediction models for serious fall injury were moderate at best (area under the receiver operating curve of 0.54 for decision tree and 0.66 for random forest). Conclusions Machine learning methods offer an alternative to traditional approaches for modeling outcomes in aging, but their use should be justified and output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
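
The two descriptions above (a decision tree as a flow chart, a random forest as many trees whose results are aggregated) can be sketched in a few lines. The trees below are hand-written for illustration; the feature names and thresholds are hypothetical and are not taken from the LIFE study. A real random forest learns its trees from bootstrap samples with random feature subsets.

```python
from collections import Counter


def tree_a(person):
    """A flow chart: first ask about prior falls, then about gait speed."""
    if person["prior_falls"] >= 1:
        return 1 if person["gait_speed"] < 0.8 else 0
    return 0


def tree_b(person):
    """A second flow chart using age and gait speed."""
    return 1 if person["age"] >= 80 and person["gait_speed"] < 1.0 else 0


def tree_c(person):
    """A third flow chart with a single question."""
    return 1 if person["prior_falls"] >= 2 else 0


def forest_predict(trees, person):
    """Aggregate the trees by majority vote (1 = predicted serious fall injury)."""
    votes = Counter(tree(person) for tree in trees)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    forest = [tree_a, tree_b, tree_c]
    person = {"age": 82, "gait_speed": 0.7, "prior_falls": 1}
    print(forest_predict(forest, person))  # two of three trees vote 1
```

An odd number of trees is used here so that a binary vote cannot tie.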

Predicting Fine Particulate Matter (PM2.5) in the Greater London Area: An Ensemble Approach using Machine Learning Methods

Remote Sensing ◽

10.3390/rs12060914 ◽

2020 ◽

Vol 12 (6) ◽

pp. 914 ◽

Cited By ~ 4

Author(s):

Mahdieh Danesh Yazdi ◽

Zheng Kuang ◽

Konstantina Dimakopoulou ◽

Benjamin Barratt ◽

Esra Suel ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Nearest Neighbor ◽

Meteorological Data ◽

Fine Particulate Matter ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

Learning Methods ◽

Machine Learning Methods ◽

Technological Advances

Estimating air pollution exposure has long been a challenge for environmental health researchers. Technological advances and novel machine learning methods have allowed us to increase the geographic range and accuracy of exposure models, making them a valuable tool in conducting health studies and identifying hotspots of pollution. Here, we have created a prediction model for daily PM2.5 levels in the Greater London area from 1st January 2005 to 31st December 2013 using an ensemble machine learning approach incorporating satellite aerosol optical depth (AOD), land use, and meteorological data. The predictions were made on a 1 km × 1 km scale over 3960 grid cells. The ensemble included predictions from three different machine learners: a random forest (RF), a gradient boosting machine (GBM), and a k-nearest neighbor (KNN) approach. Our ensemble model performed very well, with a ten-fold cross-validated R² of 0.828. Of the three machine learners, the random forest outperformed the GBM and KNN. Our model was particularly adept at predicting day-to-day changes in PM2.5 levels, with an out-of-sample temporal R² of 0.882. However, its ability to predict spatial variability was weaker, with an R² of 0.396. We believe this to be due to the smaller spatial variation in pollutant levels in this area.
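
The abstract does not say how the three learners' predictions were combined. As a minimal sketch, assuming a simple (optionally weighted) average, the ensemble and the coefficient of determination used to score it could look like this; all numbers are invented.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def ensemble_average(predictions, weights=None):
    """Combine per-model prediction lists; equal weights are an assumption."""
    n_models = len(predictions)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_points = len(predictions[0])
    return [
        sum(w * model[i] for w, model in zip(weights, predictions))
        for i in range(n_points)
    ]


if __name__ == "__main__":
    observed = [11.0, 21.0]                      # hypothetical daily PM2.5 values
    rf, gbm, knn = [10.0, 20.0], [12.0, 18.0], [14.0, 22.0]
    combined = ensemble_average([rf, gbm, knn])
    print(combined, r_squared(observed, combined))
```

Computing `r_squared` separately over days at one site and over sites on one day is one way to obtain temporal and spatial scores like those reported.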

Machine-Learning-Based Prediction of Land Prices in Seoul, South Korea

Sustainability ◽

10.3390/su132313088 ◽

2021 ◽

Vol 13 (23) ◽

pp. 13088

Author(s):

Jungsun Kim ◽

Jaewoong Won ◽

Hyeongsoon Kim ◽

Joonghyeok Heo

Keyword(s):

Machine Learning ◽

Random Forest ◽

Real Estate ◽

South Korea ◽

Price Volatility ◽

Real Estate Market ◽

Accurate Estimation ◽

Learning Methods ◽

Land Prices ◽

Machine Learning Methods

The accurate estimation of real estate value helps the development of real estate policies that can respond to the complexities and instability of the real estate market. Previously, statistical methods were used to estimate real estate value, but machine learning methods have gained popularity because their predictions are more accurate. In contrast to existing studies that use various machine learning methods to estimate the transactions or list prices of real estate properties without separating the building and land prices, this study estimates land price using a large amount of land-use information obtained from various land- and building-related datasets. The random forest and XGBoost methods were used to estimate 52,900 land prices in Seoul, South Korea, from January 2017 to December 2020. The models were also separately trained for different land uses and different time periods. Overall, the results revealed that XGBoost yields a higher prediction accuracy. Whereas the XGBoost models were more accurate on the 2020 data than on the 2017–2020 data when analyzing residential areas, the random forest models were more accurate on the 2017–2020 data than on the 2020 data. Further analysis will extend the prediction model to consider submarkets determined by price volatility and locality.

Prediction of Liver Weight Recovery by an Integrated Metabolomics and Machine Learning Approach After 2/3 Partial Hepatectomy

Frontiers in Pharmacology ◽

10.3389/fphar.2021.760474 ◽

2021 ◽

Vol 12 ◽

Author(s):

Runbin Sun ◽

Haokai Zhao ◽

Shuzhen Huang ◽

Ran Zhang ◽

Zhenyao Lu ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Liver Regeneration ◽

Partial Hepatectomy ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods ◽

Liver Index ◽

Extreme Gradient Boosting

The mammalian liver can regenerate itself, but the mechanism has not been fully explained. Here we used a GC/MS-based metabolomic method to profile the dynamic endogenous metabolic change in the serum of C57BL/6J mice at different times after 2/3 partial hepatectomy (PHx), and nine machine learning methods, including Least Absolute Shrinkage and Selection Operator Regression (LASSO), Partial Least Squares Regression (PLS), Principal Components Regression (PCR), k-Nearest Neighbors (KNN), Support Vector Machines (SVM), Random Forest (RF), eXtreme Gradient Boosting (xgbDART), Neural Network (NNET) and Bayesian Regularized Neural Network (BRNN), were used for regression between the liver index and metabolomic data at different stages of liver regeneration. We found that the tree-based random forest method had the minimum average Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) and the maximum R squared (R²), and was time-saving. Furthermore, variable of importance in the project (VIP) analysis of the RF method was performed, and metabolites with VIP ranked in the top 20 were selected as the most critical metabolites contributing to the model. Ornithine, phenylalanine, 2-hydroxybutyric acid, lysine, etc. were chosen as the most important metabolites, which had strong correlations with the liver index. Further pathway analysis found that Arginine biosynthesis, Pantothenate and CoA biosynthesis, Galactose metabolism, and Valine, leucine and isoleucine degradation were the most influenced pathways. In summary, several amino acid metabolic pathways and the glucose metabolism pathway were dynamically changed during liver regeneration. The RF method showed advantages over the other machine learning methods used for predicting the liver index after PHx, and a metabolic clock containing four metabolites was established to predict the liver index during liver regeneration.
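
The model comparison above rests on MAE and RMSE. A minimal sketch of both metrics and of picking the model with the smallest error follows; the model names match the abstract, but the predictions are invented and the selection helper `best_model` is hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error; penalizes large misses more than MAE."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def best_model(y_true, preds_by_model, metric):
    """Name of the model whose predictions minimize the given metric."""
    return min(preds_by_model, key=lambda name: metric(y_true, preds_by_model[name]))


if __name__ == "__main__":
    liver_index = [1.0, 2.0, 3.0]                        # made-up targets
    preds = {"RF": [1.0, 2.0, 4.0], "KNN": [3.0, 2.0, 5.0]}
    for name, p in preds.items():
        print(name, mae(liver_index, p), rmse(liver_index, p))
    print(best_model(liver_index, preds, mae))
```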

Ensemble machine learning methods for spatio-temporal data analysis of plant and ratoon sugarcane

Intelligent Data Analysis ◽

10.3233/ida-205302 ◽

2021 ◽

Vol 25 (5) ◽

pp. 1291-1322

Author(s):

Sandeep Kumar Singla ◽

Rahul Dev Garg ◽

Om Prakash Dubey

Keyword(s):

Machine Learning ◽

Random Forest ◽

Binary Classification ◽

Temporal Variations ◽

Classification Model ◽

Gradient Boosting ◽

Remotely Sensed Data ◽

Learning Methods ◽

Machine Learning Methods ◽

Classification And Regression

Recent technological enhancements in the field of information technology and statistical techniques have enabled sophisticated and reliable analysis based on machine learning methods. A number of machine learning data analytical tools may be exploited for classification and regression problems. These tools and techniques can be effectively used for highly data-intensive operations such as agricultural and meteorological applications, bioinformatics and stock market analysis based on the daily prices of the market. Machine learning ensemble methods such as Decision Tree (C5.0), Classification and Regression Trees (CART), Gradient Boosting Machine (GBM) and Random Forest (RF) have been investigated in the proposed work. The proposed work demonstrates that temporal variations in the spectral data and the computational efficiency of machine learning methods may be effectively used for the discrimination of types of sugarcane. The discrimination has been considered as a binary classification problem to segregate ratoon from plantation sugarcane. Variable importance selection based on Mean Decrease in Accuracy (MDA) and Mean Decrease in Gini (MDG) has been used to create the appropriate dataset for the classification. The performance of the binary classification model based on RF is the best across all possible combinations of input images. Feature selection based on the MDA and MDG measures of RF is also important for dimensionality reduction. It has been observed that the RF model performed best, with 97% accuracy, whereas the performance of the GBM method is the lowest. Binary classification based on remotely sensed data can be effectively handled using the random forest method.
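
Mean Decrease in Gini builds on the drop in Gini impurity produced by a single split; a forest accumulates these drops for each variable across its trees. A minimal sketch of the per-split quantity follows, with made-up class labels.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


if __name__ == "__main__":
    parent = ["ratoon", "ratoon", "plant", "plant"]
    # A split that separates the two classes perfectly removes all impurity.
    print(gini_decrease(parent, ["ratoon", "ratoon"], ["plant", "plant"]))  # 0.5
```

A variable that repeatedly yields large decreases like this one would rank high under MDG.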
