Predicting lying, sitting, walking and running using Apple Watch and Fitbit data

Objectives: This study's objective was to examine whether commercial wearable devices could accurately predict lying, sitting and varying intensities of walking and running. Methods: We recruited a convenience sample of 49 participants (23 men and 26 women) to wear three devices: an Apple Watch Series 2, a Fitbit Charge HR2 and an iPhone 6S. Participants completed a 65 min protocol consisting of 40 min of total treadmill time and 25 min of sitting or lying time. The study's outcome variables were six movement types: lying, sitting, walking self-paced and walking/running at 3 metabolic equivalents of task (METs), 5 METs and 7 METs. All analyses were conducted at the minute level with heart rate, steps, distance and calories from Apple Watch and Fitbit. The analyses used three machine learning models: support vector machines, Random Forest and Rotation Forest. Results: Our dataset included 3656 and 2608 min of Apple Watch and Fitbit data, respectively. Rotation Forest models had the highest classification accuracy for Apple Watch at 82.6%, and Random Forest models had the highest accuracy for Fitbit at 90.8%. Classification accuracies for Apple Watch data ranged from 72.6% for sitting to 89.0% for 7 METs. For Fitbit, accuracies ranged from 86.2% for sitting to 92.6% for 7 METs. Conclusion: This preliminary study demonstrated that data from commercial wearable devices could predict movement types with reasonable accuracy. More research is needed, but these methods are a proof of concept for movement type classification at the population level using commercial wearable device data.

Using machine learning methods to predict physical activity types with Apple Watch and Fitbit data using indirect calorimetry as the criterion.

10.21203/rs.3.rs-17022/v1 ◽

2020 ◽

Author(s):

Daniel Fuller ◽

Javad Rahimipour Anaraki ◽

Bo Simango ◽

Faramarz Dorani ◽

Arastoo Bozorgi ◽

...

Keyword(s):

Physical Activity ◽

Machine Learning ◽

Indirect Calorimetry ◽

Population Level ◽

Wearable Devices ◽

Outcome Variable ◽

Support Vector ◽

Rotation Forest ◽

Forest Models ◽

Activity Types

Background: There is considerable promise for using commercial wearable devices for measuring physical activity at the population level. The objective of this study was to examine whether commercial wearable devices could accurately predict lying, sitting, and intensity level of other activities in a lab-based protocol. Methods: We recruited a convenience sample of 49 participants (23 men and 26 women) to wear three devices: an Apple Watch Series 2, a Fitbit Charge HR2, and an iPhone 6S. Participants completed a 65-minute protocol consisting of 40 minutes of total treadmill time and 25 minutes of sitting or lying time. Indirect calorimetry was used to measure energy expenditure. The outcome variable for the study was the activity class: lying, sitting, walking self-paced, and running at 3 METs, 5 METs, and 7 METs. Minute-by-minute heart rate, steps, distance, and calories from Apple Watch and Fitbit were included in four different machine learning models. Results: Our dataset included 3656 and 2608 minutes of Apple Watch and Fitbit data, respectively. We tested decision trees, support vector machines, random forest, and rotation forest models. Rotation forest models had the highest classification accuracies at 82.6% for Apple Watch and 89.3% for Fitbit. Classification accuracies for Apple Watch data ranged from 72.5% for sitting to 89.0% for 7 METs. For Fitbit, accuracies ranged from 86.2% for sitting to 92.6% for 7 METs. Conclusion: This study demonstrated that commercial wearable devices, Apple Watch and Fitbit, were able to predict physical activity types with reasonable accuracy. The results support the use of minute-by-minute data from Apple Watch and Fitbit combined with machine learning approaches for scalable physical activity type classification at the population level.

Application of Support Vector Machine, Random Forest, and Genetic Algorithm Optimized Random Forest Models in Groundwater Potential Mapping

Water Resources Management ◽

10.1007/s11269-017-1660-3 ◽

2017 ◽

Vol 31 (9) ◽

pp. 2761-2775 ◽

Cited By ~ 103

Author(s):

Seyed Amir Naghibi ◽

Kourosh Ahmadi ◽

Alireza Daneshi

Keyword(s):

Genetic Algorithm ◽

Support Vector Machine ◽

Random Forest ◽

Groundwater Potential ◽

Support Vector ◽

Forest Models ◽

Random Forest Models ◽

Potential Mapping ◽

Groundwater Potential Mapping

Fusion of Ultraviolet and Infrared Spectra Using Support Vector Machine and Random Forest Models for the Discrimination of Wild and Cultivated Mushrooms

Analytical Letters ◽

10.1080/00032719.2019.1692857 ◽

2019 ◽

Vol 53 (7) ◽

pp. 1019-1033

Author(s):

Sen Yao ◽

Jie-Qing Li ◽

Zhi-Li Duan ◽

Tao Li ◽

Yuan-Zhong Wang

Keyword(s):

Support Vector Machine ◽

Random Forest ◽

Infrared Spectra ◽

Support Vector ◽

Forest Models ◽

Random Forest Models ◽

Cultivated Mushrooms

Data-Driven Wildfire Risk Prediction in Northern California

Atmosphere ◽

10.3390/atmos12010109 ◽

2021 ◽

Vol 12 (1) ◽

pp. 109

Author(s):

Ashima Malik ◽

Megha Rajam Rao ◽

Nandini Puppala ◽

Prathusha Koouri ◽

Venkata Anil Kumar Thota ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Curves ◽

Data Driven ◽

Northern California ◽

Combined Model ◽

Wildfire Risk ◽

Study Results ◽

Forest Models ◽

Random Forest Models

Over the years, rampant wildfires have plagued the state of California, causing economic and environmental loss. In 2018, wildfires cost nearly 800 million dollars in economic loss and claimed more than 100 lives in California. Over 1.6 million acres of land burned, causing extensive environmental damage. Although researchers have recently introduced machine learning models and algorithms for predicting wildfire risk, that work focused on specific perspectives and was restricted to a limited number of data parameters. In this paper, we propose two data-driven machine learning approaches based on random forest models to predict wildfire risk in areas near Monticello and Winters, California. This study demonstrates how the models were developed and applied with comprehensive data parameters, such as powerlines, terrain, and vegetation, from different perspectives, which improved the spatial and temporal accuracy in predicting the risk of wildfire, including fire ignition. The combined model uses the spatial and temporal parameters as a single combined dataset to train and predict the fire risk, whereas the ensemble model was fed separate parameters that were later stacked to work as a single model. Our experiment shows that the combined model produced better accuracy than the ensemble of random forest models trained on separate spatial data. The models were validated with Receiver Operating Characteristic (ROC) curves, learning curves, and evaluation metrics such as accuracy, confusion matrices, and classification reports. The models achieved an accuracy of 92% in predicting wildfire risk, including ignition, by using regional spatial and temporal data along with standard data parameters in Northern California.

Incorporating space and time into random forest models for analyzing geospatial patterns of drug-related crime incidents in a major U.S. metropolitan area

Computers Environment and Urban Systems ◽

10.1016/j.compenvurbsys.2021.101599 ◽

2021 ◽

Vol 87 ◽

pp. 101599

Author(s):

Zhiyue Xia ◽

Kathleen Stewart ◽

Junchuan Fan

Keyword(s):

Random Forest ◽

Metropolitan Area ◽

Space And Time ◽

Forest Models ◽

Random Forest Models

Landslide susceptibility assessment for a transmission line in Gansu Province, China by using a hybrid approach of fractal theory, information value, and random forest models

Environmental Earth Sciences ◽

10.1007/s12665-021-09737-w ◽

2021 ◽

Vol 80 (12) ◽

Author(s):

Binbin Zhao ◽

Yunfeng Ge ◽

Hongzhi Chen

Keyword(s):

Random Forest ◽

Landslide Susceptibility ◽

Fractal Theory ◽

Hybrid Approach ◽

Gansu Province ◽

Information Value ◽

Susceptibility Assessment ◽

Landslide Susceptibility Assessment ◽

Forest Models ◽

Random Forest Models

Random forest models of 305-days milk yield for Holstein cows in Bulgaria

10.1063/5.0034778 ◽

2020 ◽

Author(s):

A. Yordanova ◽

H. Kulina

Keyword(s):

Random Forest ◽

Milk Yield ◽

Holstein Cows ◽

Forest Models ◽

Random Forest Models

Classifying Very High-Dimensional Data with Random Forests Built from Small Subspaces

International Journal of Data Warehousing and Mining ◽

10.4018/jdwm.2012040103 ◽

2012 ◽

Vol 8 (2) ◽

pp. 44-63 ◽

Cited By ~ 30

Author(s):

Baoxun Xu ◽

Joshua Zhexue Huang ◽

Graham Williams ◽

Qiang Wang ◽

Yunming Ye

Keyword(s):

Random Forest ◽

High Dimensional Data ◽

Real Life ◽

Classification Performance ◽

Feature Weighting ◽

Random Forest Model ◽

High Dimensional ◽

Forest Model ◽

Forest Models ◽

Random Forest Models

The selection of feature subspaces for growing decision trees is a key step in building random forest models. However, the common approach of randomly sampling a few features for the subspace is not suitable for high-dimensional data consisting of thousands of features, because such data often contain many features that are uninformative for classification, and random sampling often fails to include informative features in the selected subspaces. Consequently, the classification performance of the random forest model is significantly affected. In this paper, the authors propose an improved random forest method that uses a novel feature weighting method for subspace selection and therefore enhances classification performance on high-dimensional data. A series of experiments on 9 real-life high-dimensional datasets demonstrated that using a subspace size of features where M is the total number of features in the dataset, our random forest model significantly outperforms existing random forest models.
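The general idea of weighted subspace selection in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: the weighting rule (absolute difference between class means, for binary labels) and the names `feature_weights` and `sample_subspace` are assumptions chosen for illustration, and the abstract does not specify the actual weighting scheme or subspace size.

```python
import random


def feature_weights(X, y):
    """Hypothetical weighting: absolute difference between the class means
    of each feature, for binary labels 0/1. This stands in for whatever
    informativeness score a real method would use."""
    n_features = len(X[0])
    weights = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        diff = abs(sum(pos) / len(pos) - sum(neg) / len(neg))
        # Small constant keeps every feature selectable.
        weights.append(diff + 1e-9)
    return weights


def sample_subspace(weights, size, rng):
    """Draw `size` distinct feature indices, each pick made with
    probability proportional to the weights of the remaining features."""
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(size):
        w = [weights[i] for i in remaining]
        pick = rng.choices(remaining, weights=w, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data (made up): feature 0 tracks the label, the rest are noise.
    y = [rng.randint(0, 1) for _ in range(200)]
    if len(set(y)) < 2:  # guard so both classes are present
        y[0], y[1] = 0, 1
    X = [[label + rng.gauss(0, 0.1)] + [rng.gauss(0, 1) for _ in range(9)]
         for label in y]
    w = feature_weights(X, y)
    print(sample_subspace(w, 3, rng))
```

In a forest built this way, each tree would draw its own subspace from `sample_subspace`, so informative features appear in more trees than they would under uniform sampling.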

Gully erosion zonation mapping using integrated geographically weighted regression with certainty factor and random forest models in GIS

Journal of Environmental Management ◽

10.1016/j.jenvman.2018.11.110 ◽

2019 ◽

Vol 232 ◽

pp. 928-942 ◽

Cited By ~ 46

Author(s):

Alireza Arabameri ◽

Biswajeet Pradhan ◽

Khalil Rezaei

Keyword(s):

Random Forest ◽

Geographically Weighted Regression ◽

Gully Erosion ◽

Weighted Regression ◽

Certainty Factor ◽

Forest Models ◽

Random Forest Models

Data Mining Crystallization Kinetics

10.26434/chemrxiv.11708286 ◽

2020 ◽

Author(s):

Cameron Brown ◽

Diego Maldonado ◽

Antony Vassileiou ◽

Blair Johnston ◽

Alastair Florence

Keyword(s):

Random Forest ◽

Kinetic Parameters ◽

Crystallization Kinetics ◽

Balance Model ◽

Forest Models ◽

Vast Literature ◽

Random Forest Models ◽

Kinetic Expression ◽

Population Balances ◽

Different Sources

The population balance model is a valuable modelling tool that facilitates the optimization and understanding of crystallization processes. However, in order to use this tool, it is necessary to have previous knowledge of the crystallization kinetics, specifically crystal growth and nucleation. The majority of approaches to achieve proper estimations of kinetic parameters require experimental data. Over time, a vast literature on the estimation of kinetic parameters and population balances has been published. Considering the availability of data, this work built a database with information on solute, solvent, kinetic expression, parameters, crystallization method and seeding. Correlations were assessed and cluster structures identified by hierarchical clustering analysis. The final database contains 336 data of kinetic parameters from 185 different sources. The data were analysed using kinetic parameters of the most common expressions. Subsequently, clusters were identified for each kinetic model. With these clusters, classification random forest models were built using solute descriptors, seeding, solvent, and crystallization methods as classifiers. Random forest models had an overall classification accuracy higher than 70%, which makes them useful for providing rough estimates of kinetic parameters, although these methods have some limitations.
