Statistical and machine learning models for classification of human wear and delivery days in accelerometry data

2021 ◽  
Author(s):  
Ryan Moore ◽  
Kristin R. Archer ◽  
Leena Choi

Abstract
Purpose: Accelerometers are increasingly utilized in healthcare research to assess human activity. Accelerometry data are often collected by mailing accelerometers to participants, who wear the devices to record their activity and then mail them back to the laboratory for analysis. We developed models to classify days in accelerometry data as activity from actual human wear or from the delivery process. These models can be used to automate the cleaning of accelerometry datasets that are adulterated with activity from delivery.
Methods: For the classification of delivery days in accelerometry data, we developed statistical and machine learning models in a supervised learning context using a large accelerometry dataset labeled as human activity or delivery. We extracted several features to develop random forest, logistic regression, mixed effects regression, and multilayer perceptron models, while convolutional neural network, recurrent neural network, and hybrid convolutional recurrent neural network models were developed without feature extraction. Model performances were assessed using Monte Carlo cross-validation.
Results: We found that a hybrid convolutional recurrent neural network performed best in the classification task, with an F1 score of 0.960, but simpler models such as logistic regression and random forest also had excellent performance, with F1 scores of 0.951 and 0.957, respectively.
Conclusion: The models developed in this study can be used to classify days in accelerometry data as either human or delivery activity. An analyst can weigh the larger computational cost and greater performance of the convolutional recurrent neural network against the faster but slightly less powerful random forest or logistic regression. The best performing models for classification of delivery data are publicly available in the open-source R package PhysicalActivity.
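
As a rough illustration of the feature-based modelling step described in this abstract, the sketch below runs Monte Carlo cross-validation (repeated random train/test splits) over a random forest and a logistic regression classifier. The day-level features, labels, and split settings are placeholders, not the ones used in the study.

```python
# Sketch: Monte Carlo cross-validation of feature-based classifiers
# (random forest vs. logistic regression) for wear/delivery day labels.
# X and y are hypothetical placeholders for extracted day-level features
# and binary labels (1 = delivery day, 0 = human wear day).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import f1_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)  # stand-in data

models = {
    "random_forest": RandomForestClassifier(n_estimators=500, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# Monte Carlo cross-validation: repeated random train/test splits.
splitter = StratifiedShuffleSplit(n_splits=20, test_size=0.25, random_state=0)
for name, model in models.items():
    scores = []
    for train_idx, test_idx in splitter.split(X, y):
        model.fit(X[train_idx], y[train_idx])
        scores.append(f1_score(y[test_idx], model.predict(X[test_idx])))
    print(f"{name}: mean F1 = {np.mean(scores):.3f}")
```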

PLoS ONE ◽  
2021 ◽  
Vol 16 (9) ◽  
pp. e0257069
Author(s):  
Jae-Geum Shim ◽  
Kyoung-Ho Ryu ◽  
Sung Hyun Lee ◽  
Eun-Ah Cho ◽  
Sungho Lee ◽  
...  

Objective: To construct a prediction model for optimal tracheal tube depth in pediatric patients using machine learning.
Methods: Pediatric patients aged <7 years who received post-operative ventilation after undergoing surgery between January 2015 and December 2018 were investigated in this retrospective study. The optimal location of the tracheal tube was defined as the median of the distance between the upper margin of the first thoracic (T1) vertebral body and the lower margin of the third thoracic (T3) vertebral body. We applied four machine learning models (random forest, elastic net, support vector machine, and artificial neural network) and compared their prediction accuracy with three formula-based methods based on age, height, and tracheal tube internal diameter (ID).
Results: For each method, the percentage of optimal tracheal tube depth predictions in the test set was as follows: 79.0 (95% confidence interval [CI], 73.5 to 83.6) for random forest, 77.4 (95% CI, 71.8 to 82.2; P = 0.719) for elastic net, 77.0 (95% CI, 71.4 to 81.8; P = 0.486) for support vector machine, 76.6 (95% CI, 71.0 to 81.5; P = 1.0) for artificial neural network, 66.9 (95% CI, 60.9 to 72.5; P < 0.001) for the age-based formula, 58.5 (95% CI, 52.3 to 64.4; P < 0.001) for the tube ID-based formula, and 44.4 (95% CI, 38.3 to 50.6; P < 0.001) for the height-based formula.
Conclusions: In this study, the machine learning models predicted the optimal tracheal tube tip location for pediatric patients more accurately than the formula-based methods. Machine learning models using biometric variables may help clinicians make decisions regarding optimal tracheal tube depth in pediatric patients.
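
A minimal sketch of the kind of comparison reported above: a machine learning regressor against an age-based rule for tube depth. The synthetic data, the tolerance for counting a prediction as "optimal", and the formula depth = age/2 + 12 cm are illustrative assumptions, not the study's definitions.

```python
# Sketch: compare a machine learning regressor with an age-based formula
# for predicting tracheal tube depth. All column names, the tolerance, and
# the formula depth = age/2 + 12 cm are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age_yr": rng.uniform(0, 7, n),
    "height_cm": rng.uniform(50, 125, n),
    "weight_kg": rng.uniform(3, 30, n),
})
# Synthetic "optimal depth" target with a plausible dependence on age and height.
df["optimal_depth_cm"] = 0.5 * df["age_yr"] + 0.05 * df["height_cm"] + 8 + rng.normal(0, 0.5, n)

train, test = train_test_split(df, test_size=0.2, random_state=0)
features = ["age_yr", "height_cm", "weight_kg"]

rf = RandomForestRegressor(n_estimators=300, random_state=0)
rf.fit(train[features], train["optimal_depth_cm"])

tolerance = 1.0  # cm; counts a prediction as "optimal" if within this margin
ml_pred = rf.predict(test[features])
formula_pred = test["age_yr"] / 2 + 12  # commonly cited age-based rule (assumption)

for name, pred in [("random forest", ml_pred), ("age formula", formula_pred)]:
    hit = np.abs(pred - test["optimal_depth_cm"]) <= tolerance
    print(f"{name}: {100 * hit.mean():.1f}% within tolerance")
```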


Author(s):  
Diana Gaifilina ◽  
Igor Kotenko

Introduction: The article discusses the problem of choosing deep learning models for detecting anomalies in Internet of Things (IoT) network traffic. This problem is associated with the need to analyze a large number of security events in order to identify the abnormal behavior of smart devices. A powerful technology for analyzing such data is machine learning and, in particular, deep learning.
Purpose: Development of recommendations for the selection of deep learning models for anomaly detection in IoT network traffic.
Results: The main results of the research are a comparative analysis of deep learning models and recommendations on the use of deep learning models for anomaly detection in IoT network traffic. Multilayer perceptron, convolutional neural network, recurrent neural network, long short-term memory, gated recurrent units, and a combined convolutional-recurrent neural network were considered as the basic deep learning models. Additionally, the authors analyzed the following traditional machine learning models: naive Bayesian classifier, support vector machines, logistic regression, k-nearest neighbors, boosting, and random forest. The following metrics were used as indicators of anomaly detection efficiency: accuracy, precision, recall, and F-measure, as well as the time spent on training the model. The constructed models demonstrated a higher accuracy rate for anomaly detection in large heterogeneous traffic typical of IoT, as compared to conventional machine learning methods. The authors found that as the number of neural network layers increases, the completeness (recall) of detecting anomalous connections rises. This has a positive effect on the recognition of unknown anomalies, but increases the number of false positives. In some cases, preparing traditional machine learning models takes less time, because deep learning methods require more resources and computing power.
Practical relevance: The results obtained can be used to build systems for network anomaly detection in Internet of Things traffic.
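
The evaluation protocol described above (accuracy, precision, recall, F-measure, plus training time) could be sketched as follows for one deep model and one traditional baseline; the dataset and layer sizes are placeholders rather than the IoT traffic features analyzed by the authors.

```python
# Sketch: metric- and time-based comparison of a deep model (MLP) against a
# traditional baseline (random forest). The dataset and layer sizes are
# illustrative placeholders, not the authors' IoT traffic features.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

X, y = make_classification(n_samples=5000, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "mlp (2 hidden layers)": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_tr, y_tr)
    train_time = time.perf_counter() - start
    pred = model.predict(X_te)
    print(f"{name}: acc={accuracy_score(y_te, pred):.3f} "
          f"prec={precision_score(y_te, pred):.3f} "
          f"rec={recall_score(y_te, pred):.3f} "
          f"F1={f1_score(y_te, pred):.3f} "
          f"train_time={train_time:.2f}s")
```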


2021 ◽  
Author(s):  
Nilesh AnanthaSubramanian ◽  
Ashok Palaniappan

Abstract
Metal-oxide nanoparticles find widespread applications in mundane life today, and cost-effective evaluation of their cytotoxicity and ecotoxicity is essential for sustainable progress. Machine learning models use existing experimental data and learn the relationship of various features to nanoparticle cytotoxicity to generate predictive models. In this work, we adopted a principled approach to this problem by formulating a feature space based on intrinsic and extrinsic physico-chemical properties, but exclusive of any in vitro characteristics such as cell line, cell type, and assay method. A minimal set of features was developed by applying variance inflation analysis to the correlation structure of the feature space. Using a balanced dataset, a mapping was then obtained from the normalized feature space to the toxicity class using various hyperparameter-tuned machine learning models. Evaluation on an unseen test set yielded > 96% balanced accuracy for both the random forest model and the neural network model with one hidden layer. The obtained cytotoxicity models are parsimonious, with intelligible inputs, and include an applicability check. Interpretability investigations of the models yielded the key predictor variables of metal-oxide nanoparticle cytotoxicity. Our models could be applied to new, untested oxides using a majority-voting ensemble classifier, NanoTox, that incorporates the neural network, random forest, support vector machine, and logistic regression models. NanoTox is the very first predictive nanotoxicology pipeline made freely available under the GNU General Public License (https://github.com/NanoTox).
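
The majority-voting ensemble named above (neural network, random forest, support vector machine, logistic regression) might look roughly like the sketch below; the features, hyperparameters, and balanced-accuracy check are placeholders, and the actual NanoTox pipeline is the one published on GitHub.

```python
# Sketch: a majority-voting ensemble over the four model families named in the
# abstract. Features and hyperparameters are placeholders, not NanoTox itself.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score

X, y = make_classification(n_samples=1000, n_features=12, random_state=0)  # stand-in features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("nn", make_pipeline(StandardScaler(),
                             MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0))),
        ("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
        ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ],
    voting="hard",  # majority vote over predicted class labels
)
ensemble.fit(X_tr, y_tr)
print("balanced accuracy:", balanced_accuracy_score(y_te, ensemble.predict(X_te)))
```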


Genus ◽  
2020 ◽  
Vol 76 (1) ◽  
Author(s):  
Fikrewold H. Bitew ◽  
Samuel H. Nyarko ◽  
Lloyd Potter ◽  
Corey S. Sparks

Abstract
There is a dearth of literature on the use of machine learning models to predict important under-five mortality risks in Ethiopia. In this study, we showed spatial variations of under-five mortality and used machine learning models to predict its important sociodemographic determinants in Ethiopia. The study data were drawn from the 2016 Ethiopian Demographic and Health Survey. We used three machine learning models (random forests, logistic regression, and K-nearest neighbors) as well as one traditional logistic regression model to predict under-five mortality determinants. For each model, measures of model accuracy and receiver operating characteristic curves were used to evaluate predictive power. The descriptive results show that there are considerable regional variations in under-five mortality rates in Ethiopia. The under-five mortality prediction ability was found to be between 46.3 and 67.2% for the models considered, with the random forest model (67.2%) showing the best performance. The best predictive model shows that household size, time to the source of water, breastfeeding status, number of births in the preceding 5 years, sex of the child, birth intervals, antenatal care, birth order, type of water source, and mother's body mass index play an important role in under-five mortality levels in Ethiopia. The random forest machine learning model produces better predictive power for estimating under-five mortality risk factors and may help to improve policy decision-making in this regard. Childhood survival chances can be improved considerably by using these important factors to inform relevant policies.
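
A schematic version of the model comparison described above, using accuracy and ROC AUC for random forest, logistic regression, and k-nearest neighbors; the class-imbalanced synthetic data stand in for the DHS survey variables.

```python
# Sketch: evaluating random forest, logistic regression, and k-nearest
# neighbours with accuracy and ROC AUC. The feature matrix is a synthetic
# placeholder for the DHS variables used in the study.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

X, y = make_classification(n_samples=2000, n_features=15, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=15),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    prob = model.predict_proba(X_te)[:, 1]
    pred = model.predict(X_te)
    print(f"{name}: accuracy={accuracy_score(y_te, pred):.3f}, AUC={roc_auc_score(y_te, prob):.3f}")
```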


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2726
Author(s):  
Ryan Moore ◽  
Kristin R. Archer ◽  
Leena Choi

Accelerometers are increasingly being used in biomedical research, but the analysis of accelerometry data is often complicated by both the massive size of the datasets and the collection of unwanted data from the process of delivery to study participants. Current methods for removing delivery data involve arduous manual review of dense datasets. We aimed to develop models for the classification of days in accelerometry data as activity from human wear or the delivery process. These models can be used to automate the cleaning of accelerometry datasets that are adulterated with activity from delivery. We developed statistical and machine learning models for the classification of accelerometry data in a supervised learning context using a large human activity and delivery labeled accelerometry dataset. Model performances were assessed and compared using Monte Carlo cross-validation. We found that a hybrid convolutional recurrent neural network performed best in the classification task, with an F1 score of 0.960, but simpler models such as logistic regression and random forest also had excellent performance, with F1 scores of 0.951 and 0.957, respectively. The best-performing models and related data processing techniques are made publicly available in the R package PhysicalActivity.
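
A hybrid convolutional recurrent architecture of the kind described above might be sketched as follows; the input length (1,440 minutes per day), filter sizes, and recurrent units are assumptions for illustration, not the architecture shipped in the package.

```python
# Sketch: a hybrid convolutional recurrent classifier that labels one day of
# minute-level accelerometry counts as wear vs. delivery. Input length, filter
# sizes, and units are assumptions, not the published architecture.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_minutes = 1440  # one day of minute-level activity counts

model = keras.Sequential([
    layers.Input(shape=(n_minutes, 1)),
    layers.Conv1D(32, kernel_size=9, activation="relu"),
    layers.MaxPooling1D(pool_size=4),
    layers.Conv1D(64, kernel_size=9, activation="relu"),
    layers.MaxPooling1D(pool_size=4),
    layers.LSTM(32),                        # recurrent layer over the conv features
    layers.Dense(1, activation="sigmoid"),  # P(delivery day)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Placeholder data: 64 synthetic days of counts with random labels.
X = np.random.rand(64, n_minutes, 1)
y = np.random.randint(0, 2, size=64)
model.fit(X, y, epochs=1, batch_size=16, verbose=0)
```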


2021 ◽  
Vol 19 (1) ◽  
pp. 953-971
Author(s):  
Songfeng Liu ◽  
Jinyan Wang ◽  
Wenliang Zhang ◽  

User data usually exist within organizations or on users' own local devices in the form of data islands. It is difficult to collect these data to train better machine learning models because of the General Data Protection Regulation (GDPR) and other laws. The emergence of federated learning enables users to jointly train machine learning models without exposing the original data. Due to the fast training speed and high accuracy of random forest, it has been applied to federated learning among several data institutions. However, in human activity recognition scenarios, a unified model cannot provide users with personalized services. In this paper, we propose a privacy-protected federated personalized random forest framework, which addresses the personalized application of federated random forest to the activity recognition task. Based on the characteristics of the activity recognition data, locality sensitive hashing is used to calculate the similarity between users. Users train only with similar users instead of all users, and the model is incrementally selected using the characteristics of ensemble learning, so as to train the model in a personalized way. At the same time, user privacy is protected through differential privacy during the training stage. We conduct experiments on commonly used human activity recognition datasets to analyze the effectiveness of our model.
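
The locality sensitive hashing step described above, which groups users with similar activity profiles so that each user trains only with its neighbors, could be sketched roughly as follows; the feature vectors and the number of hash bits are illustrative assumptions.

```python
# Sketch: locality sensitive hashing via random hyperplane projections, used
# to group users with similar activity feature profiles. Feature vectors and
# the number of hash bits are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_features, n_bits = 50, 30, 8

user_profiles = rng.normal(size=(n_users, n_features))  # per-user activity feature summaries
hyperplanes = rng.normal(size=(n_bits, n_features))     # random projection directions

# Each user gets an n_bits signature; users sharing a signature are "similar".
signatures = (user_profiles @ hyperplanes.T > 0).astype(int)
codes = [''.join(map(str, row)) for row in signatures]

buckets = {}
for user_id, code in enumerate(codes):
    buckets.setdefault(code, []).append(user_id)

# A user would then train its personalized forest only with peers in its bucket.
print({code: members for code, members in buckets.items() if len(members) > 1})
```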


2020 ◽  
Author(s):  
Qing Wu ◽  
Fatma Nasoz ◽  
Jongyun Jung ◽  
Bibek Bhattarai

Abstract
Bone mineral density (BMD) is a highly heritable trait, with heritability ranging from 50% to 80%. Numerous BMD-associated single nucleotide polymorphisms (SNPs) have been discovered by GWAS and GWAS meta-analyses. However, several studies found that combining these highly significant SNPs explained only a small percentage of BMD variance. This inconsistency may be caused by limitations of the linear regression approaches employed, because these traditional approaches lack the flexibility and adequacy to model complex gene interactions and regulation. Hence, we developed various machine learning models of genomic data and ran experiments to identify the best machine learning model for BMD prediction at three different sites. We used genomic data from the Osteoporotic Fractures in Men (MrOS) cohort study (N = 5,133) for analysis. Genotype imputation was conducted at the Sanger Imputation Server. A total of 1,103 BMD-associated SNPs were identified and corresponding weighted genetic risk scores were calculated. Genetic variants, as well as age and other traditional BMD predictors, were included for modeling. Data were normalized and split into a training set (80%) and a test set (20%). BMD prediction models were built separately with random forest, gradient boosting, and neural network algorithms. Linear regression was used as a reference model. We applied the non-parametric Wilcoxon signed-rank test to the MSE of each model for pair-wise model comparisons. We found that gradient boosting showed the lowest MSE at each BMD site and that prediction models built with machine learning achieved improved performance when a large number of SNPs were included. With phenotype covariates plus the 1,103 SNPs as predictors, all pair-wise model comparisons were statistically significant except neural network vs. random forest at femoral neck BMD and gradient boosting vs. random forest at total hip BMD.
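
The pair-wise Wilcoxon signed-rank comparison of model errors described above might be sketched as follows; the synthetic regression data stand in for the MrOS genetic risk scores and covariates, and the split scheme is a placeholder.

```python
# Sketch: pair-wise comparison of per-split MSEs with the Wilcoxon signed-rank
# test. Synthetic data stand in for the MrOS genetic and phenotype predictors.
import numpy as np
from scipy.stats import wilcoxon
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import ShuffleSplit
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=50, noise=10.0, random_state=0)
splitter = ShuffleSplit(n_splits=15, test_size=0.2, random_state=0)

mse_gb, mse_rf = [], []
for train_idx, test_idx in splitter.split(X):
    gb = GradientBoostingRegressor(random_state=0).fit(X[train_idx], y[train_idx])
    rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[train_idx], y[train_idx])
    mse_gb.append(mean_squared_error(y[test_idx], gb.predict(X[test_idx])))
    mse_rf.append(mean_squared_error(y[test_idx], rf.predict(X[test_idx])))

stat, p = wilcoxon(mse_gb, mse_rf)
print(f"median MSE gb={np.median(mse_gb):.1f}, rf={np.median(mse_rf):.1f}, Wilcoxon p={p:.4f}")
```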


The purpose of the research described in this article is a comparative analysis of the predictive qualities of several machine learning and regression models. The factors for the models are the consumer characteristics of a used car: brand, transmission type, drive type, engine type, mileage, body type, year of manufacture, seller's region in Ukraine, condition of the car, information about accidents, average price for an analogue in Ukraine, engine volume, number of doors, availability of extra equipment, number of passenger seats, first registration of the car, and whether the car was imported from abroad. Qualitative variables have been encoded as binary variables or by mean target encoding. Information about more than 200 thousand cars has been used for modeling. All models have been evaluated in Python using the Sklearn, Catboost, StatsModels, and Keras libraries. The following regression and machine learning models were considered in the course of the study: linear regression; polynomial regression; decision tree; neural network; models based on the "k-nearest neighbors", "random forest", and "gradient boosting" algorithms; and an ensemble of models. The article presents the best option from each class of models in terms of quality (according to the R2, MAE, MAD, and MAPE criteria). It has been found that the best way to predict the price of a passenger car is through non-linear models. The results of the modeling show that the dependence between the price of a car and its characteristics is best described by the ensemble of models, which includes a neural network and models using the "random forest" and "gradient boosting" algorithms. The ensemble of models showed an average relative approximation error of 11.2% and an average relative forecast error of 14.34%. In this research, all nonlinear models for car price had approximately the same predictive quality (differences in MAPE within 2%).
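
An averaging ensemble of random forest, gradient boosting, and a small neural network, evaluated with R2, MAE, and MAPE as in the article, could be sketched as follows; the synthetic features stand in for the encoded car attributes.

```python
# Sketch: an averaging ensemble of random forest, gradient boosting, and a
# small neural network, scored with R2, MAE, and MAPE. The feature matrix is
# a synthetic placeholder for the encoded car attributes.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, VotingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error, mean_absolute_percentage_error

X, y = make_regression(n_samples=3000, n_features=16, noise=15.0, random_state=0)
y = y - y.min() + 1000.0  # keep the synthetic "prices" positive so MAPE is meaningful
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

ensemble = VotingRegressor([
    ("rf", RandomForestRegressor(n_estimators=300, random_state=0)),
    ("gb", GradientBoostingRegressor(random_state=0)),
    ("nn", make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0))),
])
ensemble.fit(X_tr, y_tr)
pred = ensemble.predict(X_te)

print(f"R2={r2_score(y_te, pred):.3f}, MAE={mean_absolute_error(y_te, pred):.1f}, "
      f"MAPE={100 * mean_absolute_percentage_error(y_te, pred):.1f}%")
```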

