Tuning machine learning dropout for subsurface uncertainty model accuracy

Abstract: Cho et al. report deep learning model accuracy for tilted myopic disc detection in a South Korean population. Here we explore the importance of generalisability of machine learning (ML) in healthcare, and we emphasise that recurrent underrepresentation of data-poor regions may inadvertently perpetuate global health inequity. Creating meaningful ML systems is contingent on understanding how, when, and why different ML models work in different settings. While we echo the need for the diversification of ML datasets, such a worthy effort would take time and does not obviate uses of presently available datasets if conclusions are validated and re-calibrated for different groups prior to implementation. The importance of external ML model validation on diverse populations should be highlighted where possible – especially for models built with single-centre data.

Machine Learning Algorithms To Improve Model Accuracy and Latency, and Human-Autonomy Teaming

2018 Modeling and Simulation Technologies Conference ◽

10.2514/6.2018-4063 ◽

2018 ◽

Cited By ~ 1

Author(s):

Vincent E. Houston ◽

Bryan Barrows ◽

Walter Manuel ◽

Lisa R. Le Vie

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Model Accuracy ◽

Improve Model ◽

Human Autonomy

Applying Machine-Learning to Human Gastrointestinal Microbial Species to Predict Dietary Intake (P20-040-19)

Current Developments in Nutrition ◽

10.1093/cdn/nzz040.p20-040-19 ◽

2019 ◽

Vol 3 (Supplement_1) ◽

Cited By ~ 1

Author(s):

Leila Shinn ◽

Yutong Li ◽

Ruoqing Zhu ◽

Aditya Mansharamani ◽

Loretta Auvil ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Dietary Intake ◽

Predictive Accuracy ◽

Bacterial Species ◽

Whole Grains ◽

Classification Error ◽

Whole Grain ◽

Model Accuracy ◽

Food And Agriculture

Abstract. Objectives: To better understand host-microbe interactions, a more computationally intensive, multivariate, machine learning approach must be utilized. Accordingly, we aimed to identify biomarkers with high predictive accuracy for dietary intake. Methods: Data were aggregated from five randomized, controlled, feeding studies in adults (n = 199) that provided avocados, almonds, broccoli, walnuts, or whole grain oats and whole grain barley. Fecal samples were collected during treatment and control periods for each study for DNA extraction. Subsequently, the 16S rRNA gene (V4 region) was amplified and sequenced. Sequence data were analyzed using DADA2 and QIIME2. Marginal screening using the Kruskal-Wallis test was performed on all species-level taxa to examine the differences between each of the 6 treatment groups and respective control groups. The top 20 species from each diet were selected and pooled together for multiclass classification using random forest. The resultant bacterial species were further decreased in a stepwise fashion and iteratively analyzed with the variable importance generated from random forest to determine a compact feature set with a minor loss of accuracy in the prediction of food consumed. Results: When all six foods were analyzed together using the top 20 species of each diet, oats and barley were frequently confused for each other, with 44% and 47% classification error, respectively, and the overall model accuracy was 66%. Collapsing oats and barley into one category, whole grains, reduced the classification error of the whole grain category to 6% and improved the overall model accuracy to 73%. Refitting the random forest with the top 30, 20, and 10 important species resulted in correct identification of the 5 foods (avocados, almonds, broccoli, walnuts, and whole grains) 75%, 74%, and 70% of the time, respectively.
Conclusions: These results reveal promise in accurately predicting foods consumed using bacterial species as biomarkers. Ongoing analyses include incorporation of metagenomic and metabolomic data into the models to improve predictive accuracy and utilize the multi-omics dataset to predict health status. Long-term, these approaches may inform diet-microbiota-tailored recommendations. Funding Sources: This research was funded by The Foundation for Food and Agriculture Research, USDA, Hass Avocado Board, and USDA National Institute of Food and Agriculture, Hatch project 1009249.
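
The per-class classification error and overall accuracy quoted in this abstract can be illustrated with a minimal sketch. The labels below are made up and are not the study's data. The helper names (`per_class_error`, `accuracy`, `collapse`) are hypothetical. The sketch merges two labels after prediction, whereas the study may have refit the random forest on the collapsed category, so it only shows why two classes that are confused mainly with each other inflate the error.

```python
from collections import Counter

def per_class_error(y_true, y_pred):
    """Fraction of samples of each true class that were misclassified."""
    totals = Counter(y_true)
    wrong = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    return {label: wrong[label] / totals[label] for label in totals}

def accuracy(y_true, y_pred):
    """Overall fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def collapse(labels, mapping):
    """Relabel classes using `mapping`; unmapped labels are kept unchanged."""
    return [mapping.get(label, label) for label in labels]

# Toy labels (illustrative only): oats and barley are confused with each other.
y_true = ["oats"] * 4 + ["barley"] * 4 + ["walnuts"] * 4
y_pred = ["oats", "oats", "barley", "barley",
          "barley", "barley", "oats", "oats",
          "walnuts", "walnuts", "walnuts", "oats"]

print(per_class_error(y_true, y_pred), accuracy(y_true, y_pred))

# Merge the two confusable classes and recompute the same metrics.
merged = {"oats": "whole grains", "barley": "whole grains"}
t2, p2 = collapse(y_true, merged), collapse(y_pred, merged)
print(per_class_error(t2, p2), accuracy(t2, p2))
```

In this toy case every oats/barley mistake disappears after merging, because each one was a confusion between the two merged classes.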

Privacy-Preserving Gradient Boosting Decision Trees

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5422 ◽

2020 ◽

Vol 34 (01) ◽

pp. 784-791 ◽

Cited By ~ 1

Author(s):

Qinbin Li ◽

Zhaomin Wu ◽

Zeyi Wen ◽

Bingsheng He

Keyword(s):

Machine Learning ◽

Differential Privacy ◽

Training Data ◽

Gradient Boosting ◽

Training Algorithm ◽

Model Accuracy ◽

Machine Learning Model ◽

Improve Model ◽

Privacy Budget ◽

Privacy Level

The Gradient Boosting Decision Tree (GBDT) has become a popular machine learning model for various tasks in recent years. In this paper, we study how to improve the model accuracy of GBDT while preserving the strong guarantee of differential privacy. Sensitivity and privacy budget are two key design aspects for the effectiveness of differentially private models. Existing solutions for GBDT with differential privacy suffer from significant accuracy loss due to overly loose sensitivity bounds and ineffective privacy budget allocations (especially across different trees in the GBDT model). Loose sensitivity bounds require more noise to reach a fixed privacy level. Ineffective privacy budget allocations worsen the accuracy loss, especially when the number of trees is large. Therefore, we propose a new GBDT training algorithm that achieves tighter sensitivity bounds and more effective noise allocations. Specifically, by investigating the property of gradients and the contribution of each tree in GBDTs, we propose to adaptively control the gradients of training data for each iteration and to apply leaf node clipping in order to tighten the sensitivity bounds. Furthermore, we design a novel boosting framework to allocate the privacy budget between trees so that the accuracy loss can be further reduced. Our experiments show that our approach can achieve much better model accuracy than other baselines.
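
The link between sensitivity bounds, privacy budget and noise can be sketched with the generic Laplace mechanism. This is not the algorithm proposed in the paper. It assumes neighbouring datasets differ by adding or removing one record, so a sum of gradients clipped to `[-bound, bound]` has sensitivity `bound`. The function names and the even per-tree split are hypothetical baselines chosen for illustration.

```python
import random

def clip(values, bound):
    """Clip each value to [-bound, bound] so one record's influence is bounded."""
    return [max(-bound, min(bound, v)) for v in values]

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_gradient_sum(gradients, bound, epsilon, rng):
    """Noisy sum of clipped gradients; returns (noisy_sum, noise_scale)."""
    sensitivity = bound              # assumption: add/remove-one-record neighbours
    scale = sensitivity / epsilon    # Laplace mechanism: scale = sensitivity / epsilon
    return sum(clip(gradients, bound)) + laplace_noise(scale, rng), scale

def even_budget_split(total_epsilon, n_trees):
    """Naive baseline: equal budget per tree under sequential composition."""
    return total_epsilon / n_trees

rng = random.Random(0)
grads = [0.2, -3.0, 5.0, 0.7]        # made-up gradients
_, tight_scale = private_gradient_sum(grads, bound=1.0, epsilon=0.5, rng=rng)
_, loose_scale = private_gradient_sum(grads, bound=5.0, epsilon=0.5, rng=rng)
print(tight_scale, loose_scale)      # a looser bound needs a larger noise scale
print(even_budget_split(1.0, 50))    # per-tree budget shrinks as trees are added
```

At a fixed epsilon the noise scale grows linearly with the clipping bound, and an even split leaves each tree a smaller budget as the ensemble grows. Those are the two effects the abstract attributes to loose bounds and ineffective allocation.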

Prediction and Forecasting of Air Quality Index in Chennai using Regression and ARIMA time series models

Journal of Engineering Research ◽

10.36909/jer.10253 ◽

2021 ◽

Vol 9 ◽

Author(s):

Geetha Mani ◽

Joshi Kumar Viswanadhapalli ◽

Albert Alexander Stonie ◽

...

Keyword(s):

Machine Learning ◽

Time Series ◽

Air Quality ◽

Linear Regression ◽

Quality Index ◽

Air Quality Index ◽

Model Parameters ◽

Sensor Output ◽

Model Accuracy ◽

Life On Earth

Air is one of the most fundamental constituents for the sustenance of life on earth. Meteorological and traffic factors, the consumption of non-renewable energy sources, and industrial parameters are steadily increasing air pollution. These factors affect the welfare and prosperity of life on earth; therefore, air quality in our environment needs to be monitored continuously. The Air Quality Index (AQI), which indicates air quality, is influenced by several individual factors such as the accumulation of NO2, CO, O3, PM2.5, SO2, and PM10. This research paper aims to predict and forecast the AQI with Machine Learning (ML) techniques, namely linear regression and time series analysis. First, a Multi Linear Regression (MLR) model, a supervised machine learning method, is developed to predict AQI. NO2, ozone (O3), PM2.5, and SO2 sensor outputs collected from the Central Pollution Control Board (CPCB) – Chennai region, India, serve as input features, and the optimized AQI calculated from the sensor outputs is set as the target to train the regression model. The obtained model parameters are validated with new and unseen sensor output. Key Performance Indices (KPIs) such as the coefficient of determination, root mean square error, and mean absolute error were calculated to validate the model accuracy. The K-fold cross-validation score of the MLR model on the testing data was around 92%. Second, the Auto-Regressive Integrated Moving Average (ARIMA) time series model is applied to forecast the AQI. The obtained model parameters were validated with unseen, timestamped data. The forecasted AQI values for the next 15 days lie within the 95% confidence interval. The model accuracy on test data was more than 80%.
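
The three KPIs named in this abstract (coefficient of determination, root mean square error and mean absolute error) can be computed as in the sketch below. It uses the standard textbook definitions. The AQI values are invented, and `regression_kpis` is a hypothetical helper, not code from the paper.

```python
import math

def regression_kpis(y_true, y_pred):
    """Return R^2, RMSE and MAE for paired observed/predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }

# Invented AQI readings and model predictions, for illustration only.
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
print(regression_kpis(observed, predicted))
```

RMSE and MAE are in the same units as the AQI itself, while R² is unitless. A percentage "accuracy" for a regression model therefore depends on which of these quantities it is derived from.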

Comparison of Machine Learning Regression Algorithms for Cotton Leaf Area Index Retrieval Using Sentinel-2 Spectral Bands

Applied Sciences ◽

10.3390/app9071459 ◽

2019 ◽

Vol 9 (7) ◽

pp. 1459 ◽

Cited By ~ 5

Author(s):

Huihui Mao ◽

Jihua Meng ◽

Fujiang Ji ◽

Qiankun Zhang ◽

Huiting Fang

Keyword(s):

Machine Learning ◽

Sample Size ◽

Computational Efficiency ◽

Comprehensive Evaluation ◽

Training Sample ◽

Model Accuracy ◽

Area Index ◽

Training Sample Size ◽

Spectral Bands ◽

Sentinel 2

Leaf area index (LAI) is a crucial crop biophysical parameter that has been widely used in a variety of fields. Five state-of-the-art machine learning regression algorithms (MLRAs), namely, artificial neural network (ANN), support vector regression (SVR), Gaussian process regression (GPR), random forest (RF) and gradient boosting regression tree (GBRT), have been used in the retrieval of cotton LAI with Sentinel-2 spectral bands. The performances of the five machine learning models are compared for better applications of MLRAs in remote sensing, since challenging problems remain in the selection of MLRAs for crop LAI retrieval, as well as in deciding the optimal training sample size and number of spectral bands for different MLRAs. A comprehensive evaluation was employed with respect to model accuracy, computational efficiency, sensitivity to training sample size and sensitivity to spectral bands. We conducted the comparison of five MLRAs in an agricultural area of Northwest China over three cotton seasons with the corresponding field campaigns for modeling and validation. Results show that the GBRT model outperforms the other models with respect to model accuracy on average ($\overline{R^2}$ = 0.854, $\overline{\mathrm{RMSE}}$ = 0.674 and $\overline{\mathrm{MAE}}$ = 0.456). SVR achieves the best performance in computational efficiency, which means it is fast to train and validate, and it has great potential to deliver near-real-time operational products for crop management. As for sensitivity to training sample size, GBRT behaves as the most robust model, and provides the best model accuracy on average across the variations of training sample size, compared with other models ($\overline{R^2}$ = 0.884, $\overline{\mathrm{RMSE}}$ = 0.615 and $\overline{\mathrm{MAE}}$ = 0.452). Spectral bands sensitivity analysis with dCor (distance correlation), combined with the backward elimination approach, indicates that SVR, GPR and RF provide relatively robust performance with respect to the spectral bands, while ANN outperforms the other models in terms of model accuracy on average across the reductions of spectral bands ($\overline{R^2}$ = 0.881, $\overline{\mathrm{RMSE}}$ = 0.625 and $\overline{\mathrm{MAE}}$ = 0.480). A comprehensive evaluation indicates that GBRT is an appealing alternative for cotton LAI retrieval, except for its computational efficiency. Despite the different performance of the ML models, all models exhibited considerable potential for cotton LAI retrieval, which could offer accurate and timely crop parameter information for crop field management and agricultural production decisions.

Improving Model Accuracy with Probability Scoring Machine Learning Models

10.1007/978-3-030-71704-9_34 ◽

2021 ◽

pp. 517-530

Author(s):

Juily Vasandani ◽

Saumya Bharti ◽

Deepankar Singh ◽

Shreeansh Priyadarshi

Keyword(s):

Machine Learning ◽

Learning Models ◽

Model Accuracy ◽

Machine Learning Models

Application of Raman spectroscopy and Machine Learning algorithms for fruit distillates discrimination

Scientific Reports ◽

10.1038/s41598-020-78159-8 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Camelia Berghian-Grosan ◽

Dana Alina Magdas

Keyword(s):

Machine Learning ◽

Raman Spectroscopy ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Model Accuracy ◽

Region I ◽

Geographical Discrimination ◽

First Time ◽

Region Ii ◽

Fruit Spirits

Abstract: In this pilot study, Raman spectroscopy was combined with Machine Learning algorithms for the first time to differentiate distillates with respect to trademark, geographical and botanical origin. Two Raman spectral ranges (region I: 200–600 cm⁻¹ and region II: 1200–1400 cm⁻¹) appeared to have the highest discrimination potential for the investigated distillates. The proposed approach proved very effective for trademark fingerprint differentiation, with a model accuracy of 95.5% (only one sample was misclassified). A comparable model accuracy (90.9%) was achieved for the geographical discrimination of the fruit spirits, which can be considered very good given that this classification was made within the Transylvania region, among neighbouring areas. Because the trademark fingerprint is the prevailing one, successful differentiation of distillate type, with respect to fruit variety, was possible only within each producing entity.

Breast Cancer Prediction based on Deep Neural Network Model Implemented AWS Machine Learning Platform

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b3944.079220 ◽

2020 ◽

Vol 9 (2) ◽

pp. 868-873

Keyword(s):

Breast Cancer ◽

Neural Network ◽

Machine Learning ◽

Deep Neural Network ◽

Breast Tissue ◽

Model Accuracy ◽

Cancer Prediction ◽

Learning Platform ◽

Public Dataset ◽

And Performance

Breast cancer, which develops in breast tissue, is one of the most dangerous cancers leading to death in women. In this work, a Deep Neural Network (DNN) model is implemented on the AWS machine learning platform and compared with other ML techniques, including XGBoost and Random Forest, on a public dataset. Breast cancer prediction based on the DNN model with hyperparameter tuning gives the best results, as shown by the plot of model accuracy for the training and validation sets and by the performance evaluation metrics used to test the model.

Effects of Class Imbalance Using Machine Learning Algorithms

International Journal of Applied Evolutionary Computation ◽

10.4018/ijaec.2021010101 ◽

2021 ◽

Vol 12 (1) ◽

pp. 1-17

Author(s):

Swati V. Narwane ◽

Sudhir D. Sawarkar

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Class Imbalance ◽

Imbalanced Data ◽

Machine Learning Algorithms ◽

Training Data ◽

Model Accuracy ◽

Data Set ◽

Class Distribution ◽

Imbalance Problem

Class imbalance is a major hurdle for machine learning-based systems. The data set is the backbone of machine learning and must be studied in order to handle class imbalance. The purpose of this paper is to investigate the effect of class imbalance on data sets. The proposed methodology determines the model accuracy for a given class distribution. To find possible solutions, the behaviour of an imbalanced data set was investigated. The study considers two case studies in which the data set is divided into class distributions ranging from balanced to unbalanced. Standard machine learning algorithms were evaluated on training and test data. Model accuracy for each class distribution was measured with the training data set. Further, the built model was tested on each individual binary class. Results show that, to improve system performance, it is essential to address class imbalance problems. The study concludes that the system produces biased results due to the majority class. In the future, the multiclass imbalance problem can be studied using advanced algorithms.
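
The majority-class bias described in this abstract can be seen in a small sketch that compares plain accuracy with per-class recall. The 95/5 split and the always-predict-majority "model" are invented for illustration and are not the paper's case studies. `recall_per_class` and `balanced_accuracy` are hypothetical helper names.

```python
from collections import Counter

def recall_per_class(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall, so each class counts equally."""
    recalls = recall_per_class(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)

# Invented imbalanced data: 95 majority (0) and 5 minority (1) samples.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100            # a degenerate model that always predicts the majority

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                      # high, despite missing every minority sample
print(recall_per_class(y_true, y_pred))
print(balanced_accuracy(y_true, y_pred))
```

Reporting per-class recall alongside overall accuracy makes this kind of bias visible, which is why accuracy measured on a skewed class distribution can be misleading.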
