Transport behavior-mining from smartphones: a review

2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Valentino Servizi ◽  
Francisco C. Pereira ◽  
Marie K. Anderson ◽  
Otto A. Nielsen

Abstract Background Although people and smartphones have become almost inseparable, especially during travel, smartphones still represent only a small fraction of the complex multi-sensor platform enabling the passive collection of users’ travel behavior. Smartphone-based travel survey data yields the richest perspective on the study of inter- and intra-user behavioral variations. Yet after more than a decade of research and field experimentation on such surveys, and despite a consensus in transportation research as to their potential, smartphone-based travel surveys are seldom used on a large scale. Purpose This literature review pinpoints and examines the problems limiting prior research and identifies the drivers for selecting and ranking the machine-learning algorithms used for data processing in smartphone-based surveys. Conclusion Our findings show the main physical limitations, from the device perspective; the methodological framework deployed for the automatic generation of travel diaries, from the application perspective; and the relationship among user interaction, methods, and data, from the ground-truth perspective.

Author(s):  
Ignacio Díaz ◽  
José M Enguita ◽  
Ana González ◽  
Diego García ◽  
Abel A Cuadrado ◽  
...  

Abstract Motivation Biomedical research entails analyzing high-dimensional records of biomedical features with hundreds or thousands of samples each. This often also involves complementary clinical metadata, as well as broad user domain knowledge. Common data analytics software makes use of machine learning algorithms or data visualization tools. However, these are frequently one-way analyses, providing little room for the user to reconfigure the steps in light of the observed results. In other cases, reconfiguration involves large latencies, requiring retraining of algorithms or a long pipeline of actions. The complex and multiway nature of the problem nonetheless suggests that user interaction feedback is a key element in boosting the cognitive process of analysis, and must be both broad and fluid. Results In this article, we present a technique for biomedical data analytics based on blending meaningful views in an efficient manner, providing a natural, smooth way to transition among different but complementary representations of data and knowledge. Our hypothesis is that the confluence of diverse complementary information from different domains on a highly interactive interface allows the user to discover relevant relationships or generate new hypotheses to be investigated by other means. We illustrate the potential of this approach with three case studies involving gene expression data and clinical metadata, as representative examples of high-dimensional, multidomain biomedical data. Availability and implementation Code and a demo app to reproduce the results are available at https://gitlab.com/idiazblanco/morphing-projections-demo-and-dataset-preparation. Supplementary information Supplementary data are available at Bioinformatics online.
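The view-blending idea can be illustrated with a minimal sketch, which is not the authors' implementation: it linearly interpolates sample positions between two precomputed 2D views of the same samples (a PCA projection of expression data and a layout driven by two hypothetical clinical variables). The data and the blending scheme are assumptions for illustration only; the actual technique and demo are in the linked repository.

```python
import numpy as np
from sklearn.decomposition import PCA

# Two complementary 2D views of the same samples (synthetic stand-ins):
# a PCA projection of gene-expression features and a layout from two clinical axes.
rng = np.random.default_rng(0)
expression = rng.normal(size=(300, 50))   # stand-in for gene expression
clinical = rng.normal(size=(300, 2))      # stand-in for two clinical metadata axes

view_a = PCA(n_components=2).fit_transform(expression)
view_b = clinical

def blend(alpha: float) -> np.ndarray:
    """Linearly morph sample positions between the two views (0 -> view_a, 1 -> view_b)."""
    return (1.0 - alpha) * view_a + alpha * view_b

frame = blend(0.25)  # an intermediate frame an interactive interface could render
```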


Author(s):  
A. V. Ponomarev

Introduction: Large-scale human-computer systems involving people of various skills and motivation in information processing are currently used in a wide spectrum of applications. An acute problem in such systems is assessing the expected quality of each contributor, for example, in order to penalize incompetent or inaccurate contributors and to promote diligent ones. Purpose: To develop a method of assessing a contributor’s expected quality in community tagging systems. This method should use only the generally unreliable and incomplete information provided by contributors (with ground-truth tags unknown). Results: A mathematical model is proposed for community image tagging (including a model of a contributor), along with a method of assessing a contributor’s expected quality. The method is based on comparing the tag sets provided by different contributors for the same images, and is a modification of the pairwise comparison method with the preference relation replaced by a special domination characteristic. Expected contributor quality is evaluated as a positive eigenvector of a pairwise domination characteristic matrix. Community tagging simulation has confirmed that the proposed method adequately estimates the expected quality of community tagging system contributors (provided that the contributors' behavior fits the proposed model). Practical relevance: The obtained results can be used in the development of systems based on the coordinated efforts of a community (primarily, community tagging systems).
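The eigenvector step of the method lends itself to a short sketch. The following is a minimal illustration, assuming a pairwise domination matrix whose entry D[i, j] quantifies how strongly contributor i dominates contributor j on the images they both tagged; the construction of that matrix follows the paper and is not reproduced here, and the toy values are placeholders.

```python
import numpy as np

def expected_quality(domination: np.ndarray) -> np.ndarray:
    """Estimate contributor quality as the positive (principal) eigenvector
    of a pairwise domination characteristic matrix."""
    eigvals, eigvecs = np.linalg.eig(domination)
    principal = eigvecs[:, np.argmax(eigvals.real)].real
    principal = np.abs(principal)        # the Perron vector can be taken non-negative
    return principal / principal.sum()   # normalize to a comparable scale

# Toy example: 3 contributors, hypothetical domination scores
D = np.array([[0.0, 2.0, 3.0],
              [1.0, 0.0, 2.0],
              [0.5, 1.0, 0.0]])
print(expected_quality(D))
```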


2014 ◽  
Vol 2014 (10) ◽  
pp. 538-545 ◽  
Author(s):  
Abdul Basit ◽  
Anca Daniela Hansen ◽  
Mufit Altin ◽  
Poul Sørensen ◽  
Mette Gamst

2021 ◽  
Vol 13 (11) ◽  
pp. 2074
Author(s):  
Ryan R. Reisinger ◽  
Ari S. Friedlaender ◽  
Alexandre N. Zerbini ◽  
Daniel M. Palacios ◽  
Virginia Andrews-Goff ◽  
...  

Machine learning algorithms are often used to model and predict animal habitat selection—the relationships between animal occurrences and habitat characteristics. For broadly distributed species, habitat selection often varies among populations and regions; thus, it would seem preferable to fit region- or population-specific models of habitat selection for more accurate inference and prediction, rather than fitting large-scale models using pooled data. However, where the aim is to make range-wide predictions, including areas for which there are no existing data or models of habitat selection, how can regional models best be combined? We propose that ensemble approaches commonly used to combine different algorithms for a single region can be reframed, treating regional habitat selection models as the candidate models. By doing so, we can incorporate regional variation when fitting predictive models of animal habitat selection across large ranges. We test this approach using satellite telemetry data from 168 humpback whales across five geographic regions in the Southern Ocean. Using random forests, we fitted a large-scale model relating humpback whale locations, versus background locations, to 10 environmental covariates, and made a circumpolar prediction of humpback whale habitat selection. We also fitted five regional models, the predictions of which we used as input features for four ensemble approaches: an unweighted ensemble, an ensemble weighted by environmental similarity in each cell, stacked generalization, and a hybrid approach wherein the environmental covariates and regional predictions were used as input features in a new model. We tested the predictive performance of these approaches on an independent validation dataset of humpback whale sightings and whaling catches. These multiregional ensemble approaches resulted in models with higher predictive performance than the circumpolar naive model. These approaches can be used to incorporate regional variation in animal habitat selection when fitting range-wide predictive models using machine learning algorithms. This can yield more accurate predictions across regions or populations of animals that may show variation in habitat selection.
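One of the four ensembles described, stacked generalization, can be sketched concretely: the outputs of the regional models become the input features of a meta-model. In the sketch below the data, the number of regions, and the choice of logistic regression as meta-learner are placeholder assumptions, not the authors' setup (the paper uses random forests and evaluates four different ensembles).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_cells = 1000
regional_preds = rng.uniform(size=(n_cells, 5))  # stand-in for 5 regional model outputs per cell
labels = rng.integers(0, 2, size=n_cells)        # stand-in for presence vs. background labels

# Meta-model learns how to weight and combine the regional predictions.
meta_model = LogisticRegression().fit(regional_preds, labels)
range_wide_pred = meta_model.predict_proba(regional_preds)[:, 1]
```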


2021 ◽  
Vol 28 (1) ◽  
pp. e100251
Author(s):  
Ian Scott ◽  
Stacey Carter ◽  
Enrico Coiera

Machine learning algorithms are being used to screen and diagnose disease, prognosticate and predict therapeutic responses. Hundreds of new algorithms are being developed, but whether they improve clinical decision making and patient outcomes remains uncertain. If clinicians are to use algorithms, they need to be reassured that key issues relating to their validity, utility, feasibility, safety and ethical use have been addressed. We propose a checklist of 10 questions that clinicians can ask of those advocating for the use of a particular algorithm, but which do not expect clinicians, as non-experts, to demonstrate mastery over what can be highly complex statistical and computational concepts. The questions are: (1) What is the purpose and context of the algorithm? (2) How good were the data used to train the algorithm? (3) Were there sufficient data to train the algorithm? (4) How well does the algorithm perform? (5) Is the algorithm transferable to new clinical settings? (6) Are the outputs of the algorithm clinically intelligible? (7) How will this algorithm fit into and complement current workflows? (8) Has use of the algorithm been shown to improve patient care and outcomes? (9) Could the algorithm cause patient harm? and (10) Does use of the algorithm raise ethical, legal or social concerns? We provide examples where an algorithm may raise concerns and apply the checklist to a recent review of diagnostic imaging applications. This checklist aims to assist clinicians in assessing algorithm readiness for routine care and to identify situations where further refinement and evaluation are required prior to large-scale use.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sakthi Kumar Arul Prakash ◽  
Conrad Tucker

Abstract This work investigates the ability to classify misinformation in online social media networks in a manner that avoids the need for ground-truth labels. Rather than approach the classification problem as a task for humans or machine learning algorithms, this work leverages user–user and user–media (i.e., media likes) interactions to infer the type of information (fake vs. authentic) being spread, without needing to know the actual details of the information itself. To study the inception and evolution of user–user and user–media interactions over time, we create an experimental platform that mimics the functionality of real-world social media networks. We develop a graphical model that considers the evolution of this network topology to model the uncertainty (entropy) propagation when fake and authentic media disseminate across the network. The creation of a real-world social media network enables a wide range of hypotheses to be tested pertaining to users, their interactions with other users, and their interactions with media content. The discovery that the entropy of user–user and user–media interactions approximates fake and authentic media likes enables us to classify fake media in an unsupervised manner.
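The entropy measure at the core of this classification can be sketched in a few lines. The following is a minimal, hedged illustration of a Shannon-entropy computation over interaction counts; the data and the mapping from interactions to counts are assumptions for illustration, not the authors' graphical model.

```python
import numpy as np

def interaction_entropy(counts: np.ndarray) -> float:
    """Shannon entropy of an interaction distribution, e.g. how a media item's
    likes are spread across users (counts are hypothetical placeholders)."""
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Example: likes concentrated on a few users vs. spread evenly across users
print(interaction_entropy(np.array([50, 2, 1, 1])))    # low entropy
print(interaction_entropy(np.array([10, 12, 11, 9])))  # high entropy
```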


2021 ◽  
Vol 13 (9) ◽  
pp. 1837
Author(s):  
Eve Laroche-Pinel ◽  
Sylvie Duthoit ◽  
Mohanad Albughdadi ◽  
Anne D. Costard ◽  
Jacques Rousseau ◽  
...  

Wine growing needs to adapt to confront climate change; indeed, the lack of water is becoming more and more critical in many regions. Vineyards have been located in dry areas for decades, so they need particularly resilient varieties and/or a sufficient water supply at key development stages in case of severe drought. With climate change and decreasing water availability, some vineyard regions face difficulties because of unsuitable varieties, inappropriate vine management, or limited water access. Decision support tools are therefore required to optimize water use or to adapt agronomic practices. This study aimed at monitoring vine water status at a large scale with Sentinel-2 images. The goal was to provide a solution giving spatialized, temporal information on vine water status throughout the season. For this purpose, thirty-six plots were monitored in total over three years (2018, 2019 and 2020). Vine water status was measured with stem water potential field measurements from the pea-size to the ripening stage. Simultaneously, Sentinel-2 images were downloaded and processed to extract band reflectance values and compute vegetation indices. In our study, we tested five supervised regression machine learning algorithms to find possible relationships between stem water potential and data acquired from Sentinel-2 images (band reflectance values and vegetation indices). A regression model using the Red, NIR, Red-Edge and SWIR bands gave promising results for predicting stem water potential (R² = 0.40, RMSE = 0.26).
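The band-based regression can be sketched as follows. This is a minimal illustration with synthetic data, using a random-forest regressor as a stand-in for the five supervised algorithms tested; the feature layout (Red, NIR, Red-Edge and SWIR reflectances) is assumed from the abstract, and the values are not the study's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

# Hypothetical per-plot Sentinel-2 reflectances (Red, NIR, Red-Edge, SWIR)
# and a synthetic stem water potential target.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 0.6, size=(200, 4))                               # band reflectances
y = -1.5 + 1.2 * X[:, 1] - 0.8 * X[:, 3] + rng.normal(0, 0.2, 200)     # synthetic SWP

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)
print("R2:", r2_score(y_test, pred), "RMSE:", np.sqrt(mean_squared_error(y_test, pred)))
```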


Author(s):  
Yanbing Bai ◽  
Lu Sun ◽  
Haoyu Liu ◽  
Chao Xie

Large-scale population movements can turn local diseases into widespread epidemics. Grasping the characteristics of population flow in the context of COVID-19 is of great significance for informing epidemiology and for formulating scientific and reasonable prevention and control policies. Especially in the post-COVID-19 phase, it is essential to maintain the achievements of the fight against the epidemic. Previous research has focused on flight and railway passenger travel behavior and patterns, but China also has numerous suburban residents at a lower economic level; investigating their travel behavior is significant for national stability. However, estimating the impact of COVID-19 on suburban residents’ travel behavior remains challenging because of a lack of suitable data. Here we present bus ticketing data comprising approximately 26,000,000 records from April 2020 to August 2020 across 2705 stations. Our results indicate that suburban residents in southern Chinese regions are more likely to travel by bus and travel more frequently. With respect to economic level, we find that residents in economically developed regions are more likely to travel or carry out various social activities. From the perspective of who is traveling, we find that men and young people are more likely to travel by bus; they are also precisely the main workforce. Our findings indicate that suburban residents’ travel behavior is profoundly affected by the economy and remains consistent with the behavior patterns established before the COVID-19 outbreak. We use typical regions for verification, and this is indeed the case.


2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A62-A62
Author(s):  
Dattatreya Mellacheruvu ◽  
Rachel Pyke ◽  
Charles Abbott ◽  
Nick Phillips ◽  
Sejal Desai ◽  
...  

Background Accurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms has strong performance. We have extended upon our work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions lead to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies. Methods In-house immunopeptidomic data was generated using stably transfected HLA-null K562 cell lines that express a single HLA allele of interest, followed by immunoprecipitation using the W6/32 antibody and LC-MS/MS. Public immunopeptidomics data was downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at a 1% false discovery rate. Other metrics (features) were either extracted from source data or generated internally by re-processing samples utilizing the ImmunoID NeXT Platform. Results We have generated large-scale and high-quality immunopeptidomics data using approximately 60 mono-allelic cell lines that unambiguously assign peptides to their presenting alleles to create our primary models. Briefly, our primary ‘binding’ algorithm models MHC-peptide binding using the peptide and binding pockets, while our primary ‘presentation’ model uses additional features to model antigen processing and presentation. Both primary models have significantly higher precision across all recall values in multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve the performance of our model, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data was integrated by resolving peptide-to-allele mappings using our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples. Conclusions Improving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance compared to a state-of-the-art public algorithm and furthers this objective.
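Model comparisons of the kind reported, precision across all recall values, can be illustrated with a short evaluation sketch; the labels and scores below are synthetic placeholders, not SHERPA outputs or benchmark data.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

# Hypothetical held-out evaluation: presented peptides (label 1) vs. decoys (label 0),
# each scored by a presentation model.
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=5000)
scores = labels * rng.uniform(0.3, 1.0, 5000) + (1 - labels) * rng.uniform(0.0, 0.7, 5000)

precision, recall, _ = precision_recall_curve(labels, scores)   # precision at every recall level
print("Average precision:", average_precision_score(labels, scores))
```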

