Analyzing large-scale human mobility data: a survey of machine learning methods and applications

2018 ◽  
Vol 58 (3) ◽  
pp. 501-523 ◽  
Author(s):  
Eran Toch ◽  
Boaz Lerner ◽  
Eyal Ben-Zion ◽  
Irad Ben-Gal


Author(s):  
Jiayi Wang ◽  
Raymond K. W. Wong ◽  
Jun Mikyoung ◽  
Courtney Schumacher ◽  
Ramalingam Saravanan ◽  
...  

Abstract Predicting rain from large-scale environmental variables remains a challenging problem for climate models, and it is unclear how well numerical methods can predict the true characteristics of rainfall without smaller (storm) scale information. This study explores the ability of three statistical and machine learning methods to predict 3-hourly rain occurrence and intensity at 0.5° resolution over the tropical Pacific Ocean, using rain observations from the Global Precipitation Measurement (GPM) satellite radar and large-scale environmental profiles of temperature and moisture from the MERRA-2 reanalysis. We also separated the rain into different types (deep convective, stratiform, and shallow convective) because their varying kinematic and thermodynamic structures might respond to the large-scale environment in different ways. Our expectation was that the popular machine learning methods (i.e., the neural network and random forest) would outperform a standard statistical method (a generalized linear model) because of their more flexible structures, especially in predicting the highly skewed distribution of rain rates for each rain type. However, none of the methods clearly distinguishes itself from the others, and each method still has issues with predicting rain too often and with not fully capturing the high end of the rain rate distributions, both of which are common problems in climate models. One implication of this study is that machine learning tools must be carefully assessed and are not necessarily applicable to solving all big data problems. Another implication is that traditional climate model approaches are not sufficient to predict extreme rain events and that other avenues need to be pursued.
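As a rough illustration of the kind of comparison the abstract describes, the sketch below fits a GLM-style baseline (logistic regression for rain occurrence plus a Gamma GLM for intensity), a random forest, and a small neural network to synthetic, highly skewed "rain" data. The data-generating process, the eight stand-in predictors, and the metrics are illustrative assumptions, not the study's actual GPM/MERRA-2 pipeline.

```python
# Hedged sketch: GLM vs. random forest vs. neural network on synthetic,
# highly skewed "rain" data. Everything here (features, data generation,
# metrics) is an illustrative assumption, not the study's pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression, TweedieRegressor
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.neural_network import MLPClassifier, MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, mean_absolute_error

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 8))                       # stand-in for temperature/moisture predictors
logit = X[:, 0] + 0.5 * X[:, 1] - 1.0
rains = rng.random(n) < 1 / (1 + np.exp(-logit))  # rain occurrence
rate = np.where(rains, rng.gamma(shape=0.7, scale=np.exp(0.5 * X[:, 0] + 1.0)), 0.0)

X_tr, X_te, occ_tr, occ_te, rate_tr, rate_te = train_test_split(
    X, rains, rate, test_size=0.3, random_state=0)

# Occurrence: GLM (logistic regression) vs. random forest vs. neural network.
for name, clf in [("GLM", LogisticRegression(max_iter=1000)),
                  ("RF", RandomForestClassifier(n_estimators=200, random_state=0)),
                  ("NN", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0))]:
    clf.fit(X_tr, occ_tr)
    auc = roc_auc_score(occ_te, clf.predict_proba(X_te)[:, 1])
    print(f"{name} occurrence AUC: {auc:.3f}")

# Intensity (conditional on rain): Gamma GLM vs. random forest vs. neural network.
wet_tr, wet_te = rate_tr > 0, rate_te > 0
for name, reg in [("Gamma GLM", TweedieRegressor(power=2, link="log", max_iter=1000)),
                  ("RF", RandomForestRegressor(n_estimators=200, random_state=0)),
                  ("NN", MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000, random_state=0))]:
    reg.fit(X_tr[wet_tr], rate_tr[wet_tr])
    mae = mean_absolute_error(rate_te[wet_te], reg.predict(X_te[wet_te]))
    print(f"{name} intensity MAE: {mae:.2f}")
```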


Author(s):  
Hong Cui

Despite the sub-language nature of taxonomic descriptions of animals and plants, researchers have warned about the existence of large variations among different description collections in terms of information content and its representation. These variations pose a serious threat to the development of automatic tools for structuring large volumes of text-based descriptions. This paper presents a general approach to marking up different collections of taxonomic descriptions with XML, using two large-scale floras as examples. The markup system, MARTT, is based on machine learning methods and enhanced by machine-learned domain rules and conventions. Experiments show that our simple and efficient machine learning algorithms significantly outperform general-purpose algorithms, and that rules learned from one flora can be reused when marking up a second flora and help improve markup performance, especially for elements that have sparse training examples.
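A minimal sketch of the general idea of element-level markup via supervised text classification is shown below; MARTT's actual algorithms and learned domain rules are not reproduced, and the element names and tiny training set are invented for illustration.

```python
# Hedged sketch: tagging clauses of a taxonomic description with XML elements
# predicted by a generic text classifier. Element names and training examples
# are invented; this does not reproduce MARTT's algorithms or domain rules.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from xml.sax.saxutils import escape

# Invented training examples: clause text -> element label.
train = [
    ("Leaves alternate, petiolate", "leaves"),
    ("Leaf blades ovate to lanceolate", "leaves"),
    ("Flowers solitary, white", "flowers"),
    ("Petals 5, obovate", "flowers"),
    ("Fruits capsules, 3-valved", "fruits"),
    ("Seeds numerous, minute", "seeds"),
]
texts, labels = zip(*train)
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(texts, labels)

def mark_up(description: str) -> str:
    """Split a description into clauses and wrap each in its predicted element tag."""
    parts = []
    for clause in (c.strip() for c in description.split(";") if c.strip()):
        label = model.predict([clause])[0]
        parts.append(f"<{label}>{escape(clause)}</{label}>")
    return "<description>" + "".join(parts) + "</description>"

print(mark_up("Leaves opposite, sessile; Flowers in terminal cymes; Seeds winged"))
```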


Author(s):  
Georgia A. Papacharalampous ◽  
Hristos Tyralis ◽  
Demetris Koutsoyiannis

We perform an extensive comparison between 11 stochastic and 9 machine learning methods regarding their multi-step ahead forecasting properties by conducting 12 large-scale computational experiments. Each of these experiments uses 2,000 time series generated by linear stationary stochastic processes. We conduct each simulation experiment twice: the first time using time series of 110 values and the second time using time series of 310 values. Additionally, we conduct 92 real-world case studies using mean monthly streamflow time series and focus on one of them in particular to reinforce the findings and highlight important facts. We quantify the performance of the methods using 18 metrics. The results indicate that the machine learning methods do not differ dramatically from the stochastic methods, while none of the methods under comparison is uniformly better or worse than the rest. However, there are methods that are regularly better or worse than others according to specific metrics.
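The sketch below illustrates, under simplified assumptions, the kind of contrast studied: an AR(1) model fitted by least squares versus a random forest on lagged values, both producing multi-step ahead forecasts for a simulated linear stationary series of 110 values. The lag order, forecast horizon, and RMSE metric are illustrative choices rather than the paper's full design of 12 experiments and 18 metrics.

```python
# Hedged sketch: one stochastic method (AR(1) by least squares) vs. one
# machine learning method (random forest on lags) for multi-step forecasting.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

def simulate_ar1(n, phi=0.7, sigma=1.0):
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal(scale=sigma)
    return x

series = simulate_ar1(110)
train, test = series[:100], series[100:]          # forecast the last 10 values
h = len(test)

# Stochastic benchmark: AR(1) coefficient by least squares, iterated h steps ahead.
y, ylag = train[1:], train[:-1]
phi_hat = np.dot(ylag, y) / np.dot(ylag, ylag)
ar_forecast = phi_hat ** np.arange(1, h + 1) * train[-1]

# ML method: random forest on a lag-3 embedding, forecasting recursively.
p = 3
X = np.column_stack([train[i:len(train) - p + i] for i in range(p)])
target = train[p:]
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, target)
window = list(train[-p:])
rf_forecast = []
for _ in range(h):
    pred = rf.predict(np.array(window[-p:]).reshape(1, -1))[0]
    rf_forecast.append(pred)
    window.append(pred)

rmse = lambda f: float(np.sqrt(np.mean((np.asarray(f) - test) ** 2)))
print(f"AR(1) RMSE: {rmse(ar_forecast):.3f}   RF RMSE: {rmse(rf_forecast):.3f}")
```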


Author(s):  
Markey Olson ◽  
Thurmon Lockhart

Falls represent a major burden on elderly individuals and on society as a whole. Technologies able to detect individuals at risk of falling before a fall occurs could help reduce this burden by targeting those individuals for rehabilitation to reduce the risk of falls. Wearable technologies especially, which can continuously monitor aspects of gait, balance, vital signs, and other aspects of health known to be related to falls, may be useful and are in need of study. A systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2009 guidelines to identify articles related to the use of wearable sensors to predict fall risk. Fifty-four studies were analyzed. The majority of studies (98.0%) utilized inertial measurement units (IMUs) located at the lower back (58.0%), sternum (28.0%), and shins (28.0%). Most assessments were conducted in a structured setting (67.3%) rather than with free-living data. Fall risk was calculated based on retrospective falls history (48.9%), prospective falls reporting (36.2%), or clinical scales (19.1%). Measures of the duration spent walking and standing during free-living monitoring, linear measures such as gait speed and step length, and nonlinear measures such as entropy correlate with fall risk, and machine learning methods can distinguish between fallers and non-fallers. However, because many studies generating machine learning models did not list the exact factors being considered, it is difficult to compare these models directly. Few studies to date have used their results to give feedback about fall risk to the patient or to supply treatment or lifestyle suggestions for fall prevention, though end users consider these important. Wearable technology demonstrates considerable promise in detecting subtle changes in biomarkers of gait and balance related to an increase in fall risk. However, more large-scale studies measuring increasing fall risk before a first fall are needed, and the exact biomarkers and machine learning methods used need to be shared so that results can be compared and the most promising fall risk measurements pursued. There is also a great need for devices that measure fall risk to supply patients with information about their fall risk and with strategies and treatments for prevention.
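As a hedged illustration of how linear and nonlinear gait measures can feed a fall-risk classifier, the sketch below computes signal variability and sample entropy from simulated trunk-acceleration signals and cross-validates a logistic regression. The signal model, the two features, and the "at risk" labels are invented for illustration; real studies use IMU recordings and far richer feature sets.

```python
# Hedged sketch: one linear and one nonlinear feature from simulated
# acceleration signals, classifying a hypothetical "at risk" group.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)

def sample_entropy(x, m=2, r_frac=0.2):
    """Small O(n^2) sample entropy of a 1-D signal (simplified implementation)."""
    x = np.asarray(x, dtype=float)
    r = r_frac * np.std(x)
    def match_count(mm):
        t = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        d = np.max(np.abs(t[:, None, :] - t[None, :, :]), axis=2)
        return np.sum(d <= r) - len(t)            # exclude self-matches
    a, b = match_count(m + 1), match_count(m)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

def simulate_walk(irregularity, n=400, fs=100.0):
    t = np.arange(n) / fs
    return np.sin(2 * np.pi * 2.0 * t) + irregularity * rng.normal(size=n)

signals = [simulate_walk(0.2) for _ in range(30)] + [simulate_walk(0.8) for _ in range(30)]
labels = np.array([0] * 30 + [1] * 30)            # 1 = hypothetical "at risk" group

features = np.array([[np.std(s),                  # linear variability measure
                      sample_entropy(s)]          # nonlinear regularity measure
                     for s in signals])

clf = LogisticRegression(max_iter=1000)
acc = cross_val_score(clf, features, labels, cv=5).mean()
print(f"cross-validated accuracy: {acc:.2f}")
```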


PLoS ONE ◽  
2013 ◽  
Vol 8 (11) ◽  
pp. e77949 ◽  
Author(s):  
Ramon Casanova ◽  
Fang-Chi Hsu ◽  
Kaycee M. Sink ◽  
Stephen R. Rapp ◽  
Jeff D. Williamson ◽  
...  

2020 ◽  
Vol 66 (6) ◽  
pp. 2495-2522 ◽  
Author(s):  
Duncan Simester ◽  
Artem Timoshenko ◽  
Spyros I. Zoumpoulis

We investigate how firms can use the results of field experiments to optimize the targeting of promotions when prospecting for new customers. We evaluate seven widely used machine learning methods using two large-scale field experiments. The first field experiment generates a common pool of training data for each of the seven methods. We then validate the seven optimized policies, one from each method, together with uniform benchmark policies in a second field experiment. The findings not only compare the performance of the targeting methods but also demonstrate how well the methods address common data challenges. Our results reveal that when the training data are ideal, model-driven methods perform better than distance-driven methods and classification methods. However, the performance advantage vanishes in the presence of challenges that affect the quality of the training data, including the extent to which the training data capture details of the implementation setting. The challenges we study are covariate shift, concept shift, information loss through aggregation, and imbalanced data. Intuitively, the model-driven methods make better use of the information available in the training data, but their performance is more sensitive to deterioration in the quality of this information. The classification methods we tested performed relatively poorly. We explain the poor performance of the classification methods in our setting and describe how their performance could be improved. This paper was accepted by Matthew Shum, marketing.
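The sketch below is a generic, hedged illustration of the train-on-one-experiment, validate-on-another workflow: a simple two-model uplift estimate stands in for the model-driven methods and a purchase-probability ranking for the classification methods. The simulated customers, covariates, and incremental-purchase metric are assumptions, not the paper's data or its seven methods.

```python
# Hedged sketch: train targeting policies on one randomized experiment and
# validate them on a second one. Generic illustration only.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier

rng = np.random.default_rng(3)

def simulate_experiment(n):
    X = rng.normal(size=(n, 5))                   # customer covariates
    treated = rng.random(n) < 0.5                 # randomized promotion
    base = 1 / (1 + np.exp(-(X[:, 0] - 1.0)))     # baseline purchase propensity
    lift = 0.15 * (X[:, 1] > 0)                   # heterogeneous treatment effect
    bought = rng.random(n) < np.clip(base + treated * lift, 0, 1)
    return X, treated, bought

X1, t1, y1 = simulate_experiment(20000)           # training experiment
X2, t2, y2 = simulate_experiment(20000)           # validation experiment

# Model-driven policy: estimate uplift = E[y | treated] - E[y | control].
m_t = GradientBoostingRegressor().fit(X1[t1], y1[t1])
m_c = GradientBoostingRegressor().fit(X1[~t1], y1[~t1])
uplift = m_t.predict(X2) - m_c.predict(X2)

# Classification policy: rank customers by predicted purchase probability if treated.
clf = GradientBoostingClassifier().fit(X1[t1], y1[t1])
prob = clf.predict_proba(X2)[:, 1]

def incremental_rate(score, frac=0.3):
    """Treated-minus-control purchase rate among the top `frac` scored customers."""
    top = score >= np.quantile(score, 1 - frac)
    return y2[top & t2].mean() - y2[top & ~t2].mean()

print(f"uplift policy:         {incremental_rate(uplift):.3f}")
print(f"classification policy: {incremental_rate(prob):.3f}")
```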


2022 ◽  
pp. 285-305
Author(s):  
Siddharth Vinod Jain ◽  
Manoj Jayabalan

The credit card is one of the most successful and prevalent financial services, widely used across the globe. However, with the upsurge in credit card holders, banks face an equally increasing number of payment default cases that cause substantial financial damage. This makes sound and effective credit risk management essential in the banking and financial services industry, and machine learning models are being employed at a large scale to manage this credit risk. This chapter presents the application of various machine learning methods, including time series models and deep learning models, to predicting credit card payment defaults, along with the identification of significant features and the most effective evaluation criteria. The chapter also discusses the challenges and future considerations in predicting credit card payment defaults, as well as the importance of factoring in a cost function that accounts for misclassification by the models.
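As a hedged sketch of attaching an explicit cost to misclassification, the example below trains a gradient boosting classifier on synthetic, imbalanced "default" data and picks the decision threshold that minimizes total cost under an assumed 5:1 ratio between missed defaults and false alarms. The data and the cost ratio are illustrative assumptions, not from the chapter.

```python
# Hedged sketch: cost-sensitive threshold selection for default prediction.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data: roughly 10% of card holders default (label 1).
X, y = make_classification(n_samples=20000, n_features=12, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = GradientBoostingClassifier().fit(X_tr, y_tr)
p_default = model.predict_proba(X_te)[:, 1]

C_FN, C_FP = 5.0, 1.0    # assumed: missing a defaulter costs 5x a false alarm

def total_cost(threshold):
    pred = p_default >= threshold
    fn = np.sum((pred == 0) & (y_te == 1))        # missed defaulters
    fp = np.sum((pred == 1) & (y_te == 0))        # false alarms
    return C_FN * fn + C_FP * fp

thresholds = np.linspace(0.05, 0.95, 19)
best = thresholds[int(np.argmin([total_cost(t) for t in thresholds]))]
print(f"default 0.5 threshold cost: {total_cost(0.5):.0f}")
print(f"cost-minimizing threshold {best:.2f}, cost: {total_cost(best):.0f}")
```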

