Statistical and Machine Learning Models for Classification of Human Wear and Delivery Days in Accelerometry Data

Accelerometers are increasingly being used in biomedical research, but the analysis of accelerometry data is often complicated by both the massive size of the datasets and the collection of unwanted data from the process of delivery to study participants. Current methods for removing delivery data involve arduous manual review of dense datasets. We aimed to develop models for the classification of days in accelerometry data as activity from human wear or the delivery process. These models can be used to automate the cleaning of accelerometry datasets that are adulterated with activity from delivery. We developed statistical and machine learning models for the classification of accelerometry data in a supervised learning context using a large accelerometry dataset labeled as human activity or delivery. Model performances were assessed and compared using Monte Carlo cross-validation. We found that a hybrid convolutional recurrent neural network performed best in the classification task with an F1 score of 0.960, but simpler models such as logistic regression and random forest also had excellent performance, with F1 scores of 0.951 and 0.957, respectively. The best performing models and related data processing techniques are made publicly available in the R package PhysicalActivity.

Statistical and machine learning models for classification of human wear and delivery days in accelerometry data

10.1101/2020.12.31.424867 ◽

2021 ◽

Author(s):

Ryan Moore ◽

Kristin R. Archer ◽

Leena Choi

Keyword(s):

Neural Network ◽

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Human Activity ◽

Recurrent Neural Network ◽

Learning Models ◽

Learning Context ◽

Machine Learning Models

Abstract

Purpose: Accelerometers are increasingly utilized in healthcare research to assess human activity. Accelerometry data are often collected by mailing accelerometers to participants, who wear the accelerometers to collect data on their activity. The devices are then mailed back to the laboratory for analysis. We develop models to classify days in accelerometry data as activity from actual human wear or the delivery process. These models can be used to automate the cleaning of accelerometry datasets that are adulterated with activity from delivery.

Methods: For the classification of delivery days in accelerometry data, we developed statistical and machine learning models in a supervised learning context using a large accelerometry dataset labeled as human activity or delivery. We extracted several features, which were used to develop random forest, logistic regression, mixed effects regression, and multilayer perceptron models, while convolutional neural network, recurrent neural network, and hybrid convolutional recurrent neural network models were developed without feature extraction. Model performances were assessed using Monte Carlo cross-validation.

Results: We found that a hybrid convolutional recurrent neural network performed best in the classification task with an F1 score of 0.960, but simpler models such as logistic regression and random forest also had excellent performance, with F1 scores of 0.951 and 0.957, respectively.

Conclusion: The models developed in this study can be used to classify days in accelerometry data as either human or delivery activity. An analyst can weigh the larger computational cost and greater performance of the convolutional recurrent neural network against the faster but slightly less powerful random forest or logistic regression. The best performing models for classification of delivery data are publicly available in the open-source R package PhysicalActivity.
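
The evaluation scheme named in the abstract, Monte Carlo cross-validation scored by F1, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper or from the PhysicalActivity package: the toy data generator `make_toy_days`, the single hypothetical daily feature, the labeling (1 = human wear day, 0 = delivery day), and the midpoint-threshold classifier that stands in for the real models.

```python
import random


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if that denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def make_toy_days(n_per_class=100, seed=1):
    """Hypothetical data: one summary feature per day, higher on wear days."""
    rng = random.Random(seed)
    wear = [rng.gauss(6.0, 1.0) for _ in range(n_per_class)]      # label 1
    delivery = [rng.gauss(2.0, 1.0) for _ in range(n_per_class)]  # label 0
    return wear + delivery, [1] * n_per_class + [0] * n_per_class


def fit_threshold(xs, ys):
    """Stand-in classifier: midpoint between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def monte_carlo_cv(xs, ys, n_splits=50, test_frac=0.2, seed=0):
    """Average F1 over repeated random train/test splits.

    Unlike k-fold, each split is drawn independently, so test sets
    from different repetitions may overlap.
    """
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    n_test = max(1, int(len(idx) * test_frac))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        thr = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        preds = [1 if xs[i] > thr else 0 for i in test]
        scores.append(f1_score([ys[i] for i in test], preds))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    xs, ys = make_toy_days()
    print(f"mean F1 over splits: {monte_carlo_cv(xs, ys):.3f}")
```

Swapping `fit_threshold` for any other fit-and-predict pair leaves the resampling loop unchanged, which is what makes this scheme convenient for comparing several model families on the same splits.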

Federated Learning in a Medical Context: A Systematic Literature Review

ACM Transactions on Internet Technology ◽

10.1145/3412357 ◽

2021 ◽

Vol 21 (2) ◽

pp. 1-31

Author(s):

Bjarne Pfitzner ◽

Nico Steckhan ◽

Bert Arnrich

Keyword(s):

Machine Learning ◽

Literature Review ◽

Systematic Literature Review ◽

Data Privacy ◽

Research Area ◽

Learning Models ◽

Related Data ◽

Private Data ◽

Large Databases ◽

Machine Learning Models

Data privacy is a very important issue. Especially in fields like medicine, it is paramount to abide by the existing privacy regulations to preserve patients’ anonymity. However, data is required for research and training machine learning models that could help gain insight into complex correlations or personalised treatments that may otherwise stay undiscovered. Those models generally scale with the amount of data available, but the current situation often prohibits building large databases across sites. So it would be beneficial to be able to combine similar or related data from different sites all over the world while still preserving data privacy. Federated learning has been proposed as a solution for this, because it relies on the sharing of machine learning models, instead of the raw data itself. That means private data never leaves the site or device it was collected on. Federated learning is an emerging research area, and many domains have been identified for the application of those methods. This systematic literature review provides an extensive look at the concept of and research into federated learning and its applicability for confidential healthcare datasets.
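
The idea of sharing models rather than raw data can be made concrete with a small Python sketch of federated averaging, one common aggregation scheme. It is shown only as an illustration and is not claimed to be the method of any study covered by the review. The one-parameter linear model, the two toy sites, and the function names `local_update`, `federated_average`, and `train_federated` are all assumptions.

```python
def local_update(w, data, lr=0.01, epochs=5):
    """Gradient descent on one site's private (x, y) pairs for y ~ w * x.

    Only the updated weight is returned; the data never leave the site.
    """
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(site_weights, site_sizes):
    """Server step: average site weights, weighted by local sample count."""
    total = sum(site_sizes)
    return sum(w * n for w, n in zip(site_weights, site_sizes)) / total


def train_federated(sites, rounds=20, w0=0.0):
    """Alternate local training and server-side averaging for several rounds."""
    w = w0
    for _ in range(rounds):
        local = [local_update(w, data) for data in sites]
        w = federated_average(local, [len(data) for data in sites])
    return w


if __name__ == "__main__":
    # Two hypothetical sites whose private data both follow y = 3x.
    site_a = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0, 5.0)]
    site_b = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0)]
    print(f"global weight: {train_federated([site_a, site_b]):.4f}")
```

The server only ever sees one number per site per round. Real deployments exchange full parameter vectors and often add protections such as secure aggregation, which this sketch omits.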

Assessment of Machine Learning Models for Classification of Movement Patterns During a Weight-Shifting Exergame

IEEE Transactions on Human-Machine Systems ◽

10.1109/thms.2021.3059716 ◽

2021 ◽

pp. 1-11

Author(s):

Elise Klaebo Vonstad ◽

Beatrix Vereijken ◽

Kerstin Bach ◽

Xiaomeng Su ◽

Jan Harald Nilsen

Keyword(s):

Machine Learning ◽

Movement Patterns ◽

Learning Models ◽

Weight Shifting ◽

Machine Learning Models

A survey on various image processing techniques and machine learning models to detect, quantify and classify foliar plant disease

Proceedings of the Indian National Science Academy ◽

10.1007/s43538-021-00027-4 ◽

2021 ◽

Author(s):

Akruti Naik ◽

Hetal Thaker ◽

Dhaval Vyas

Keyword(s):

Machine Learning ◽

Image Processing ◽

Plant Disease ◽

Learning Models ◽

Image Processing Techniques ◽

Processing Techniques ◽

Machine Learning Models

Random forest and long short-term memory based machine learning models for classification of ion mobility spectrometry spectra

Chemical, Biological, Radiological, Nuclear, and Explosives (CBRNE) Sensing XXII ◽

10.1117/12.2585829 ◽

2021 ◽

Author(s):

Patrick C. Riley ◽

Samir V. Deshpande ◽

Brian S. Ince ◽

Brian C. Hauck ◽

Kyle P. O'Donnell ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Ion Mobility ◽

Short Term Memory ◽

Learning Models ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory ◽

Machine Learning Models

Multivariate Classification of Drugs using Parametric and Nonparametric Machine Learning Models

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8740.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 2021-2027

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Biological Activities ◽

Biological Effects ◽

Recursive Feature Elimination ◽

Drug Candidate ◽

Learning Models ◽

Machine Learning Models ◽

Non Parametric

In pharmaceutical research, the traditional drug discovery process is time-consuming and expensive, because many compounds are experimentally tested for their biological activities. A series of lab experiments is conducted to analyze a newly synthesized drug’s pharmaceutical activities and its biological effects on humans. With every new drug discovery, the required clinical properties can be determined using machine learning models, which greatly reduces the experimental cost. This paper explores parametric and non-parametric machine learning models to classify the administration properties of drugs and their toxicity. The multinomial classification of drugs was based on their physicochemical and ADMET properties. Balanced data samples were drawn from ChEMBL and pre-processed. Features were reduced using Recursive Feature Elimination, and the attributes were ranked by importance to remove highly correlated attributes. The performance of parametric and non-parametric machine learning models was analyzed on cheminformatics data that include physicochemical, biological, and pharmaceutical properties of the drug molecules. Selecting a potent drug candidate along with its administration properties greatly reduces wet-lab experimental time and cost. Multiclass classification can be performed efficiently using a non-parametric machine learning model. Optimal feature engineering, hyperparameter tuning, and the adoption of hybrid algorithms would yield more accurate predictions for cheminformatics data in the future.

The Classification of Skateboarding Tricks by Means of the Integration of Transfer Learning and Machine Learning Models

Embracing Industry 4.0 - Lecture Notes in Electrical Engineering ◽

10.1007/978-981-15-6025-5_20 ◽

2020 ◽

pp. 219-226

Author(s):

Muhammad Nur Aiman Shapiee ◽

Muhammad Ar Rahim Ibrahim ◽

Mohd Azraai Mohd Razman ◽

Muhammad Amirul Abdullah ◽

Rabiu Muazu Musa ◽

...

Keyword(s):

Machine Learning ◽

Transfer Learning ◽

Learning Models ◽

Machine Learning Models

chemmodlab: a cheminformatics modeling laboratory R package for fitting and assessing machine learning models

Journal of Cheminformatics ◽

10.1186/s13321-018-0309-4 ◽

2018 ◽

Vol 10 (1) ◽

Cited By ~ 1

Author(s):

Jeremy R. Ash ◽

Jacqueline M. Hughes-Oliver

Keyword(s):

Machine Learning ◽

R Package ◽

Learning Models ◽

Machine Learning Models

Benchmarking machine learning models for the analysis of genetic data using FRESA.CAD Binary Classification Benchmarking

10.1101/733675 ◽

2019 ◽

Author(s):

Javier de Velasco Oriol ◽

Antonio Martinez-Torteya ◽

Victor Trevino ◽

Israel Alanis ◽

Edgar E. Vallejo ◽

...

Keyword(s):

Machine Learning ◽

Model Selection ◽

Binary Classification ◽

Genetic Data ◽

R Package ◽

Learning Models ◽

Classification Problems ◽

Machine Learning Methods ◽

Computational Perspective ◽

Machine Learning Models

Abstract

Background: Machine learning models have proven to be useful tools for the analysis of genetic data. However, with the availability of a wide variety of such methods, model selection has become increasingly difficult, from both the human and the computational perspective.

Results: We present the R package FRESA.CAD Binary Classification Benchmarking, which performs systematic comparisons between a collection of representative machine learning methods for solving binary classification problems on genetic datasets.

Conclusions: FRESA.CAD Binary Benchmarking proves to be a useful tool across a variety of binary classification problems involving the analysis of genetic data, showing both quantitative and qualitative advantages over similar packages.

Glycowork: A Python package for glycan data science and machine learning

10.1101/2021.04.22.440981 ◽

2021 ◽

Author(s):

Luc Thomès ◽

Rebekka Burkholz ◽

Daniel Bojar

Keyword(s):

Machine Learning ◽

Open Source ◽

Data Science ◽

Biological Processes ◽

Biological Sequence ◽

Learning Models ◽

Related Data ◽

Strong Focus ◽

Python Package ◽

Machine Learning Models

Abstract

As a biological sequence, glycans occur in every domain of life and comprise monosaccharides that are chained together to form oligo- or polysaccharides. While glycans are crucial for most biological processes, existing analysis modalities make it difficult for researchers with limited computational background to include information from these diverse and nonlinear sequences into standard workflows. Here, we present glycowork, an open-source Python package that was designed for the processing and analysis of glycan data by end users, with a strong focus on glycan-related data science and machine learning. Glycowork includes numerous functions to, for instance, automatically annotate glycan motifs and analyze their distributions via heatmaps and statistical enrichment. We also provide visualization methods, routines to interact with stored databases, trained machine learning models, and learned glycan representations. We envision that glycowork can extract further insights from any glycan dataset and demonstrate this with several workflows that analyze glycan motifs in various biological contexts. Glycowork can be freely accessed at https://github.com/BojarLab/glycowork/.
