Cluster-Based Smartphone Predictive Analytics for Application Usage and Next Location Prediction

2018 ◽ Vol 9 (2) ◽ pp. 64-80
Author(s): Xiaoling Lu ◽ Bharatendra Rai ◽ Yan Zhong ◽ Yuzhu Li

Prediction of app usage and location of smartphone users is an interesting problem and an active area of research. Several smartphone sensors, such as GPS, accelerometer, gyroscope, microphone, camera, and Bluetooth, make it easier to capture user behavior data and use it for analysis. However, differences in user behavior and the increasing number of apps have made such prediction a challenging problem. In this article, a prediction approach that takes smartphone user behavior into consideration is proposed. The approach is illustrated using data on over 30,000 users provided by a leading IT company in China, by first converting the data into recency, frequency, and monetary variables and then performing cluster analysis to capture user behavior. Prediction models are then developed for each cluster using a training dataset and their performance is assessed using a test dataset. The study involves ten different categories of apps and four different regions in Beijing. The proposed approach provides encouraging results for both app usage and next location prediction.
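To make the pipeline concrete, the sketch below shows one way the steps could be wired together, assuming per-user app-usage logs in a pandas DataFrame with user_id, timestamp, and duration columns; the k-means clustering and logistic next-app classifier are illustrative stand-ins, not the authors' exact models.

```python
# Minimal sketch: RFM-style features per user, k-means clustering of users,
# then a separate classifier per cluster. All names are illustrative.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

def rfm_table(logs: pd.DataFrame, now: pd.Timestamp) -> pd.DataFrame:
    # Assumed columns: user_id, timestamp, duration (seconds of app use).
    g = logs.groupby("user_id")
    return pd.DataFrame({
        "recency": (now - g["timestamp"].max()).dt.days,  # days since last session
        "frequency": g.size(),                            # number of sessions
        "monetary": g["duration"].sum(),                  # total usage time as "value"
    })

def cluster_then_fit(rfm: pd.DataFrame, X: pd.DataFrame, y: pd.Series, k: int = 4):
    # Cluster users on standardized RFM features, then train one model per cluster.
    # X and y are assumed to be indexed by user_id, like rfm.
    clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(
        StandardScaler().fit_transform(rfm)
    )
    models = {}
    for c in range(k):
        idx = rfm.index[clusters == c]            # user_ids assigned to cluster c
        models[c] = LogisticRegression(max_iter=1000).fit(X.loc[idx], y.loc[idx])
    return clusters, models
```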

2021 ◽ Vol 13 (10) ◽ pp. 5690
Author(s): Chengyuan Mao ◽ Lewen Bao ◽ Shengde Yang ◽ Wenjiao Xu ◽ Qin Wang

Pedestrian violations endanger both the violators themselves and other road users. Most previous studies predict pedestrian violation behaviors based only on pedestrians' demographic characteristics. In practice, other factors may also affect violation behavior. Therefore, this study aims to predict pedestrian crossing violations based on pedestrian attributes, traffic conditions, road geometry, and environmental conditions. Data on pedestrian crossings, both compliant and in violation, were collected from 10 signalized intersections in the city of Jinhua, China. We propose an illegal pedestrian crossing behavior prediction approach that combines a logistic regression model and a Markov chain model: the former estimates the likelihood that the first pedestrian in each signal cycle crosses the intersection illegally, while the latter computes the probability that subsequent pedestrians follow the violation. The proposed approach was validated using data gathered from an additional signalized intersection in Jinhua. The results show that the approach predicts pedestrian violation behavior robustly. The findings can provide theoretical references for pedestrian signal timing, crossing facility optimization, and warning system design.
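The two-part structure can be sketched as follows; the feature set, the two-state chain, and all function names are illustrative assumptions rather than the authors' fitted model.

```python
# Sketch of the two-part approach: a logistic model for the probability that
# the first pedestrian in a signal cycle crosses illegally, and a two-state
# Markov chain for whether following pedestrians imitate the violation.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_first_violator_model(X_cycles, y_first_violation):
    # X_cycles: per-cycle features (pedestrian attributes, traffic conditions,
    # road geometry, environment); y_first_violation: 1 if the first pedestrian
    # who crossed in that cycle did so illegally.
    return LogisticRegression(max_iter=1000).fit(X_cycles, y_first_violation)

def estimate_follow_transitions(sequences):
    # sequences: per-cycle lists of 0/1 flags, 1 = that pedestrian violated.
    # Returns a row-stochastic 2x2 matrix P[previous state, next state].
    counts = np.zeros((2, 2))
    for seq in sequences:
        for prev, nxt in zip(seq[:-1], seq[1:]):
            counts[prev, nxt] += 1
    return counts / counts.sum(axis=1, keepdims=True)  # assumes both states observed

def prob_next_k_follow(transitions, k):
    # Probability that each of the next k pedestrians repeats an initial violation.
    return transitions[1, 1] ** k
```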


2021 ◽ pp. 0734242X2110179
Author(s): Mohammadali Faezirad ◽ Alireza Pooya ◽ Zahra Naji-Azimi ◽ Maryam Amir Haeri

Food waste planning at universities is often a complex matter due to the large volume of food and variety of services. A major portion of university food waste arises from dining systems, including meal booking and distribution. Although dining systems play a significant role in generating food waste, few studies have designed prediction models that could control such waste based on reservation data and student behavior at meal delivery times. To fill this gap, the present study analyzes university meal booking systems and proposes a new machine-learning-based model to reduce the food waste generated at major universities that provide food subsidies. Students' reservations and their presence or absence at the dining hall at mealtime (the show/no-show rate) were incorporated into the analysis. Given the complexity of the relationships between the attributes and the uncertainty observed in user behavior, the model separates the deterministic and random components of demand. An artificial neural network designed for demand prediction provides a two-step approach to dealing with uncertainty in actual demand. To estimate the lowest total cost, based on the cost of waste and the shortage penalty cost, an uncertainty-based analysis was conducted in the final step of the research. The resulting framework reduced the food waste volume by up to 79% and controlled the penalty and waste costs in the case study. A cost analysis of the model confirmed its effectiveness in reducing total cost.
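A minimal sketch of such a two-step scheme is shown below, assuming a table of per-meal reservation features and observed show-up rates; the network size, cost parameters, and normal residual assumption are placeholders, not the paper's actual model.

```python
# Sketch of a two-step approach: (1) predict the expected show-up rate from
# reservation features with a small neural network, (2) choose the number of
# meals to prepare by trading off waste cost against shortage penalty.
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_show_rate_model(X_train, show_rate_train):
    # X_train: per-meal features (reservations, weekday, menu, weather, ...).
    return MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000,
                        random_state=0).fit(X_train, show_rate_train)

def meals_to_prepare(reservations, show_rate_pred, residual_std,
                     waste_cost=1.0, shortage_penalty=3.0):
    # Evaluate expected cost over candidate preparation levels, assuming the
    # residual demand uncertainty is roughly normal around the point forecast.
    demand_mean = reservations * show_rate_pred
    samples = np.random.default_rng(0).normal(demand_mean, residual_std, size=5000)
    candidates = np.arange(0, reservations + 1)
    costs = [
        np.mean(waste_cost * np.clip(q - samples, 0, None)          # leftover meals
                + shortage_penalty * np.clip(samples - q, 0, None))  # unmet demand
        for q in candidates
    ]
    return int(candidates[int(np.argmin(costs))])
```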


Location-based user data is generated every day, whether as GPS traces from online cab services or weather time series, and is valuable to users in many ways; it has been applied to many real-life applications such as location-targeted advertising, recommendation systems, crime-rate detection, and home trajectory analysis. To analyze this data and put it to use, a wide variety of prediction models have been proposed and applied over the years. A next location prediction model uses this data and can be designed as a combination of two or more models and techniques, each with its own pros and cons. The aim of this document is to analyze and compare the various machine learning models and related experiments that can be applied to build better location prediction algorithms in the near future. The paper is organized to give readers insights and other noteworthy points and inferences from the papers surveyed. A summary table presents the methods in depth, our added inferences, and the datasets analyzed.
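As a concrete point of reference for the surveyed approaches, the sketch below implements the simplest next-location baseline, a first-order Markov model over discretized location IDs; the trajectories and IDs shown are illustrative.

```python
# Minimal first-order Markov baseline for next-location prediction: count
# observed transitions between location IDs and predict the most frequent
# successor of the current location. Most surveyed models improve on this.
from collections import Counter, defaultdict

def fit_transitions(trajectories):
    # trajectories: list of location-ID sequences, e.g. [["home", "cafe", "office"], ...]
    transitions = defaultdict(Counter)
    for traj in trajectories:
        for current, nxt in zip(traj[:-1], traj[1:]):
            transitions[current][nxt] += 1
    return transitions

def predict_next(transitions, current, default=None):
    followers = transitions.get(current)
    return followers.most_common(1)[0][0] if followers else default

# Example (illustrative IDs):
# model = fit_transitions([["home", "cafe", "office"], ["home", "office", "gym"]])
# predict_next(model, "office")  -> the most frequently observed successor
```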


2020
Author(s): Xiude Fan ◽ Bin Zhu ◽ Masoud Nouri-Vaskeh ◽ Chunguo Jiang ◽ Xiaokai Feng ◽ ...

Abstract Background. Risk scores are urgently needed to assist clinicians in predicting the risk of death in patients with severe SARS-CoV-2 infection, in the context of millions of people infected, rapid disease progression, and a shortage of medical resources. Method. A total of 139 severe patients with SARS-CoV-2 infection from China and Iran were included. Using data from China (training dataset, n = 96), prediction models were developed based on logistic regression models, with a nomogram and risk scoring systems for simplification. Leave-one-out cross validation was used for internal validation and data from Iran (test dataset, n = 43) for external validation. Results. The NSL model (area under the curve (AUC) 0.932) and the NL model (AUC 0.903) were developed based on neutrophil percentage (NE) and lactate dehydrogenase (LDH), with or without oxygen saturation (SaO2), using the training dataset. In the test dataset, the NSL model (AUC 0.910) and the NL model (AUC 0.871) performed similarly to the training dataset. The risk scoring systems corresponding to these two models were established for clinical application. The AUCs of the NSL and NL scores were 0.928 and 0.901 in the training dataset, respectively. At the optimal cut-off value of the NSL score, the sensitivity was 94% and the specificity was 82%. For the NL score, the sensitivity and specificity were 94% and 75%, respectively. Conclusion. The NSL and NL scores are straightforward means for clinicians to predict the risk of death in severe patients. The NL score could be used in regions where patients' SaO2 cannot be tested.
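The validation scheme described (logistic regression with leave-one-out cross-validation internally and an independent external cohort) can be sketched as below; the column names are assumptions and the code is not the study's implementation.

```python
# Sketch: fit a logistic regression on neutrophil percentage, LDH and SaO2,
# get leave-one-out cross-validated probabilities on the training data, and
# compute an external AUC on the independent test cohort.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_auc_score

FEATURES = ["neutrophil_pct", "ldh", "sao2"]   # NSL model; drop "sao2" for the NL model

def evaluate(train_df, test_df, outcome="death"):
    model = LogisticRegression(max_iter=1000)
    X_tr, y_tr = train_df[FEATURES], train_df[outcome]
    # Internal validation: leave-one-out cross-validated probabilities.
    loo_probs = cross_val_predict(model, X_tr, y_tr, cv=LeaveOneOut(),
                                  method="predict_proba")[:, 1]
    internal_auc = roc_auc_score(y_tr, loo_probs)
    # External validation on the independent test cohort.
    model.fit(X_tr, y_tr)
    external_auc = roc_auc_score(test_df[outcome],
                                 model.predict_proba(test_df[FEATURES])[:, 1])
    return internal_auc, external_auc
```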


2021 ◽ Vol 18 (1)
Author(s): Xiude Fan ◽ Bin Zhu ◽ Masoud Nouri-Vaskeh ◽ Chunguo Jiang ◽ Xiaokai Feng ◽ ...

Abstract Background Risk scores are needed to predict the risk of death in severe coronavirus disease 2019 (COVID-19) patients in the context of rapid disease progression. Methods Using data from China (training dataset, n = 96), prediction models were developed by logistic regression and then risk scores were established. Leave-one-out cross validation was used for internal validation and data from Iran (test dataset, n = 43) were used for external validation. Results An NSL model (area under the curve (AUC) 0.932) and an NL model (AUC 0.903) were developed based on neutrophil percentage and lactate dehydrogenase, with and without oxygen saturation (SaO2), using the training dataset. AUCs of the NSL and NL models in the test dataset were 0.910 and 0.871, respectively. The risk scoring systems corresponding to these two models were established. The AUCs of the NSL and NL scores in the training dataset were 0.928 and 0.901, respectively. At the optimal cut-off value of the NSL score, the sensitivity and specificity were 94% and 82%, respectively. The sensitivity and specificity of the NL score were 94% and 75%, respectively. Conclusions These scores may be used to predict the risk of death in severe COVID-19 patients, and the NL score could be used in regions where patients' SaO2 cannot be tested.
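For illustration only, one common way to pick such an "optimal" cut-off is the Youden index on the ROC curve; the study's actual cut-off criterion is not stated in this abstract, so the sketch below is an assumption.

```python
# Sketch: choose a score cut-off that maximizes Youden's J
# (sensitivity + specificity - 1) along the ROC curve.
import numpy as np
from sklearn.metrics import roc_curve

def optimal_cutoff(y_true, scores):
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    j = tpr - fpr                                  # Youden's J at each threshold
    best = int(np.argmax(j))
    return thresholds[best], tpr[best], 1 - fpr[best]   # cut-off, sensitivity, specificity
```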


JAMIA Open ◽ 2021 ◽ Vol 4 (2)
Author(s): Divya Joshi ◽ Ali Jalali ◽ Todd Whipple ◽ Mohamed Rehman ◽ Luis M Ahumada

Abstract Objective To develop a predictive analytics tool that would help evaluate different scenarios and multiple variables for clearance of the surgical patient backlog during the COVID-19 pandemic. Materials and Methods Using data from 27,866 cases (May 1, 2018–May 1, 2020) stored in the Johns Hopkins All Children's data warehouse and inputs from 30 operations-based variables, we built mathematical models for (1) time to clear the case backlog, (2) utilization of personal protective equipment (PPE), and (3) assessment of overtime needs. Results The tool enabled us to predict desired variables, including the number of days to clear the patient backlog, PPE needed, staff/overtime needed, and cost for different backlog reduction scenarios. Conclusions Predictive analytics, machine learning, and multiple variable inputs coupled with nimble scenario creation and a user-friendly visualization helped us determine the most effective deployment of operating room personnel. Operating rooms worldwide can use this tool to overcome patient backlog safely.
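The kind of scenario calculation such a tool supports can be illustrated with a toy function; the variables, per-case rates, and formula below are placeholders, not the Johns Hopkins tool's actual inputs or logic.

```python
# Illustrative scenario calculation: given a backlog size and weekly operating-
# room capacity under a scenario, estimate days to clear the backlog, PPE sets
# consumed, and overtime hours. All parameters are placeholders.
import math

def backlog_scenario(backlog_cases, cases_per_room_per_day, rooms, days_per_week=5,
                     new_cases_per_week=0, ppe_sets_per_case=6, overtime_hours_per_case=0.0):
    weekly_capacity = cases_per_room_per_day * rooms * days_per_week
    net_reduction_per_week = weekly_capacity - new_cases_per_week
    if net_reduction_per_week <= 0:
        raise ValueError("Capacity does not exceed incoming demand; backlog never clears.")
    weeks = backlog_cases / net_reduction_per_week
    return {
        "days_to_clear": math.ceil(weeks * days_per_week),
        "ppe_sets_needed": backlog_cases * ppe_sets_per_case,
        "overtime_hours": backlog_cases * overtime_hours_per_case,
    }

# Example scenario (hypothetical numbers):
# backlog_scenario(1200, 4, 6, new_cases_per_week=80)
```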


2021 ◽ Vol 11 (1)
Author(s): Mu Sook Lee ◽ Yong Soo Kim ◽ Minki Kim ◽ Muhammad Usman ◽ Shi Sub Byon ◽ ...

Abstract We examined the feasibility of explainable computer-aided detection of cardiomegaly in routine clinical practice using segmentation-based methods. Overall, 793 retrospectively acquired posterior–anterior (PA) chest X-ray images (CXRs) of 793 patients were used to train deep learning (DL) models for lung and heart segmentation. The training dataset included PA CXRs from two public datasets and in-house PA CXRs. Two fully automated segmentation-based methods using state-of-the-art DL models for lung and heart segmentation were developed. The diagnostic performance was assessed and the reliability of the automatic cardiothoracic ratio (CTR) calculation was determined using the mean absolute error and paired t-test. The effects of thoracic pathological conditions on performance were assessed using subgroup analysis. One thousand PA CXRs of 1000 patients (480 men, 520 women; mean age 63 ± 23 years) were included. The CTR values derived from the DL models and diagnostic performance exhibited excellent agreement with reference standards for the whole test dataset. Performance of segmentation-based methods differed based on thoracic conditions. When tested using CXRs with lesions obscuring heart borders, the performance was lower than that for other thoracic pathological findings. Thus, segmentation-based methods using DL could detect cardiomegaly; however, the feasibility of computer-aided detection of cardiomegaly without human intervention was limited.
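A minimal sketch of how a cardiothoracic ratio can be derived from binary heart and lung segmentation masks is given below; the study's exact post-processing is not reproduced, and the bounding-box approximation is an assumption.

```python
# Sketch: CTR = maximal horizontal cardiac width / maximal horizontal thoracic
# width, approximated by the horizontal bounding-box widths of the masks.
import numpy as np

def horizontal_extent(mask: np.ndarray) -> int:
    # Width of the mask's horizontal bounding box (columns containing any pixel).
    cols = np.where(mask.any(axis=0))[0]
    return int(cols.max() - cols.min() + 1) if cols.size else 0

def cardiothoracic_ratio(heart_mask: np.ndarray, lung_mask: np.ndarray) -> float:
    cardiac_width = horizontal_extent(heart_mask)
    thoracic_width = horizontal_extent(lung_mask)   # lungs approximate the inner thoracic cage
    return cardiac_width / thoracic_width

# A CTR above ~0.5 on a PA chest X-ray is the conventional cardiomegaly threshold.
```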


2021 ◽ pp. 159101992110009
Author(s): Xinke Liu ◽ Junqiang Feng ◽ Zhenzhou Wu ◽ Zhonghao Neo ◽ Chengcheng Zhu ◽ ...

Objective Accurate diagnosis and measurement of intracranial aneurysms are challenging. This study aimed to develop a 3D convolutional neural network (CNN) model to detect and segment intracranial aneurysms (IA) on 3D rotational DSA (3D-RA) images. Methods 3D-RA images were collected and annotated by 5 neuroradiologists. The annotated images were then divided into three datasets: training, validation, and test. A 3D Dense-UNet-like CNN (3D-Dense-UNet) segmentation algorithm was constructed and trained using the training dataset. Diagnostic performance in detecting aneurysms and segmentation accuracy were assessed for the final model on the test dataset using free-response receiver operating characteristic (FROC) analysis. Finally, the CNN-inferred maximum diameter was compared against expert measurements by Pearson's correlation and Bland-Altman limits of agreement (LOA). Results A total of 451 patients with 3D-RA images were split into n = 347/41/63 training/validation/test datasets, respectively. For aneurysm detection, FROC analysis showed that the model attained a sensitivity of 0.710 at 0.159 false positives (FP) per case and 0.986 at 1.49 FP per case. The proposed method had good agreement with reference manual aneurysm maximum diameter measurements (8.3 ± 4.3 mm vs. 7.8 ± 4.8 mm), with a correlation coefficient r = 0.77, a small bias of 0.24 mm, and LOA of -6.2 to 5.71 mm. 37.0% and 77% of diameter measurements were within ±1 mm and ±2.5 mm of the expert measurements, respectively. Conclusions A 3D-Dense-UNet model can detect and segment aneurysms with relatively high accuracy using 3D-RA images. The automatically measured maximum diameter has potential clinical application value.
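The agreement statistics reported above (Pearson correlation, Bland-Altman bias and 95% limits of agreement, and the share of measurements within fixed tolerances) can be computed as in the sketch below; array names are illustrative.

```python
# Sketch of the agreement statistics between model-inferred and expert-measured
# maximum aneurysm diameters (in mm).
import numpy as np
from scipy.stats import pearsonr

def agreement(model_mm: np.ndarray, expert_mm: np.ndarray):
    r, _ = pearsonr(model_mm, expert_mm)
    diff = model_mm - expert_mm
    bias = diff.mean()                              # mean difference (bias)
    half_width = 1.96 * diff.std(ddof=1)            # half-width of the 95% LOA
    return {
        "pearson_r": r,
        "bias_mm": bias,
        "loa_mm": (bias - half_width, bias + half_width),   # 95% limits of agreement
        "within_1mm": np.mean(np.abs(diff) <= 1.0),
        "within_2.5mm": np.mean(np.abs(diff) <= 2.5),
    }
```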


2021 ◽ Vol 5 (1)
Author(s): Kara-Louise Royle ◽ David A. Cairns

Abstract Background The United Kingdom Myeloma Research Alliance (UK-MRA) Myeloma Risk Profile is a prognostic model for overall survival. It was trained and tested on clinical trial data, aiming to improve the stratification of transplant ineligible (TNE) patients with newly diagnosed multiple myeloma. Missing data is a common problem that affects the development and validation of prognostic models, where decisions on how to address missingness have implications for the choice of methodology. Methods Model building The training and test datasets were the TNE pathways from two large randomised multicentre, phase III clinical trials. Potential prognostic factors were identified by expert opinion. Missing data in the training dataset was imputed using multiple imputation by chained equations. Univariate analysis fitted Cox proportional hazards models in each imputed dataset with the estimates combined by Rubin's rules. Multivariable analysis applied penalised Cox regression models, with a fixed penalty term across the imputed datasets. The estimates from each imputed dataset and bootstrap standard errors were combined by Rubin's rules to define the prognostic model. Model assessment Calibration was assessed by visualising the observed and predicted probabilities across the imputed datasets. Discrimination was assessed by combining the prognostic separation D-statistic from each imputed dataset by Rubin's rules. Model validation The D-statistic was applied in a bootstrap internal validation process in the training dataset and an external validation process in the test dataset, where acceptable performance was pre-specified. Development of risk groups Risk groups were defined using the tertiles of the combined prognostic index, obtained by combining the prognostic index from each imputed dataset by Rubin's rules. Results The training dataset included 1852 patients, 1268 (68.47%) with complete case data. Ten imputed datasets were generated. Five hundred twenty patients were included in the test dataset. The D-statistic for the prognostic model was 0.840 (95% CI 0.716–0.964) in the training dataset and 0.654 (95% CI 0.497–0.811) in the test dataset, and the corrected D-statistic was 0.801. Conclusion The decision to impute missing covariate data in the training dataset influenced the methods implemented to train and test the model. To extend current literature and aid future researchers, we have presented a detailed example of one approach. Whilst our example is not without limitations, a benefit is that all of the patient information available in the training dataset was utilised to develop the model. Trial registration Both trials were registered; Myeloma IX-ISRCTN68454111, registered 21 September 2000. Myeloma XI-ISRCTN49407852, registered 24 June 2009.
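Rubin's rules, used throughout the pipeline above to pool estimates across imputed datasets, reduce to a short calculation; the sketch below is a generic illustration, not the authors' code.

```python
# Sketch of Rubin's rules: pool a point estimate (e.g. a Cox log-hazard ratio
# or the D-statistic) and its standard error across m imputed datasets.
import numpy as np

def rubins_rules(estimates, standard_errors):
    estimates = np.asarray(estimates, dtype=float)
    ses = np.asarray(standard_errors, dtype=float)
    m = len(estimates)
    pooled = estimates.mean()                       # pooled point estimate
    within = np.mean(ses ** 2)                      # within-imputation variance
    between = estimates.var(ddof=1)                 # between-imputation variance
    total_var = within + (1 + 1 / m) * between      # total variance
    return pooled, np.sqrt(total_var)

# Example: pooling a D-statistic estimated in 10 imputed datasets
# d_pooled, d_se = rubins_rules(d_stats, d_ses)
```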

