Transport behavior-mining from smartphones: a review

2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Valentino Servizi ◽  
Francisco C. Pereira ◽  
Marie K. Anderson ◽  
Otto A. Nielsen

Abstract Background Although people and smartphones have become almost inseparable, especially during travel, smartphones still represent only a small fraction of the complex multi-sensor platform enabling the passive collection of users’ travel behavior. Smartphone-based travel survey data yields the richest perspective on the study of inter- and intra-user behavioral variations. Yet after more than a decade of research and field experimentation on such surveys, and despite a consensus in transportation research as to their potential, smartphone-based travel surveys are seldom used on a large scale. Purpose This literature review pinpoints and examines the problems limiting prior research and identifies the drivers for selecting and ranking the machine-learning algorithms used for data processing in smartphone-based surveys. Conclusion Our findings show the main physical limitations, from the device perspective; the methodological framework deployed for the automatic generation of travel diaries, from the application perspective; and the relationship among user interaction, methods, and data, from the ground-truth perspective.

Author(s):  
Ignacio Díaz ◽  
José M Enguita ◽  
Ana González ◽  
Diego García ◽  
Abel A Cuadrado ◽  
...  

Abstract Motivation Biomedical research entails analyzing high-dimensional records of biomedical features with hundreds or thousands of samples each. This often also involves complementary clinical metadata, as well as broad user domain knowledge. Common data analytics software makes use of machine learning algorithms or data visualization tools. However, these are frequently one-way analyses, providing little room for the user to reconfigure the steps in light of the observed results. In other cases, reconfiguration involves large latencies, requiring retraining of algorithms or a long pipeline of actions. The complex and multiway nature of the problem nonetheless suggests that user interaction feedback is a key element in boosting the cognitive process of analysis, and must be both broad and fluid. Results In this article, we present a technique for biomedical data analytics based on blending meaningful views in an efficient manner, providing a natural, smooth way to transition among different but complementary representations of data and knowledge. Our hypothesis is that the confluence of diverse complementary information from different domains on a highly interactive interface allows the user to discover relevant relationships or generate new hypotheses to be investigated by other means. We illustrate the potential of this approach with three case studies involving gene expression data and clinical metadata, as representative examples of high-dimensional, multidomain biomedical data. Availability and implementation Code and a demo app to reproduce the results are available at https://gitlab.com/idiazblanco/morphing-projections-demo-and-dataset-preparation. Supplementary information Supplementary data are available at Bioinformatics online.
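The view-blending idea can be illustrated with a minimal sketch, which is not the authors' implementation: it linearly interpolates sample positions between two precomputed 2D views of the same samples (a PCA projection of expression data and a layout driven by two hypothetical clinical variables). The data and the blending scheme are assumptions for illustration only; the actual technique and demo are in the linked repository.

```python
import numpy as np
from sklearn.decomposition import PCA

# Two complementary 2D views of the same samples (synthetic stand-ins):
# a PCA projection of gene-expression features and a layout from two clinical axes.
rng = np.random.default_rng(0)
expression = rng.normal(size=(300, 50))   # stand-in for gene expression
clinical = rng.normal(size=(300, 2))      # stand-in for two clinical metadata axes

view_a = PCA(n_components=2).fit_transform(expression)
view_b = clinical

def blend(alpha: float) -> np.ndarray:
    """Linearly morph sample positions between the two views (0 -> view_a, 1 -> view_b)."""
    return (1.0 - alpha) * view_a + alpha * view_b

frame = blend(0.25)  # an intermediate frame an interactive interface could render
```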


Author(s):  
A. V. Ponomarev

Introduction: Large-scale human-computer systems involving people of various skills and motivation in information processing are currently used in a wide spectrum of applications. An acute problem in such systems is assessing the expected quality of each contributor, for example, in order to penalize incompetent or inaccurate contributors and to promote diligent ones. Purpose: To develop a method of assessing a contributor’s expected quality in community tagging systems. This method should use only the generally unreliable and incomplete information provided by contributors (with ground-truth tags unknown). Results: A mathematical model is proposed for community image tagging (including a model of a contributor), along with a method of assessing a contributor’s expected quality. The method is based on comparing the tag sets provided by different contributors for the same images, and is a modification of the pairwise comparison method with the preference relation replaced by a special domination characteristic. Expected contributor quality is evaluated as a positive eigenvector of a pairwise domination characteristic matrix. Community tagging simulation has confirmed that the proposed method adequately estimates the expected quality of community tagging system contributors (provided that the contributors' behavior fits the proposed model). Practical relevance: The obtained results can be used in the development of systems based on the coordinated efforts of a community (primarily, community tagging systems).
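The eigenvector step of the method lends itself to a short sketch. The following is a minimal illustration, assuming a pairwise domination matrix whose entry D[i, j] quantifies how strongly contributor i dominates contributor j on the images they both tagged; the construction of that matrix follows the paper and is not reproduced here, and the toy values are placeholders.

```python
import numpy as np

def expected_quality(domination: np.ndarray) -> np.ndarray:
    """Estimate contributor quality as the positive (principal) eigenvector
    of a pairwise domination characteristic matrix."""
    eigvals, eigvecs = np.linalg.eig(domination)
    principal = eigvecs[:, np.argmax(eigvals.real)].real
    principal = np.abs(principal)        # the Perron vector can be taken non-negative
    return principal / principal.sum()   # normalize to a comparable scale

# Toy example: 3 contributors, hypothetical domination scores
D = np.array([[0.0, 2.0, 3.0],
              [1.0, 0.0, 2.0],
              [0.5, 1.0, 0.0]])
print(expected_quality(D))
```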


2014 ◽  
Vol 2014 (10) ◽  
pp. 538-545 ◽  
Author(s):  
Abdul Basit ◽  
Anca Daniela Hansen ◽  
Mufit Altin ◽  
Poul Sørensen ◽  
Mette Gamst

2021 ◽  
Vol 13 (11) ◽  
pp. 2074
Author(s):  
Ryan R. Reisinger ◽  
Ari S. Friedlaender ◽  
Alexandre N. Zerbini ◽  
Daniel M. Palacios ◽  
Virginia Andrews-Goff ◽  
...  

Machine learning algorithms are often used to model and predict animal habitat selection—the relationships between animal occurrences and habitat characteristics. For broadly distributed species, habitat selection often varies among populations and regions; thus, it would seem preferable to fit region- or population-specific models of habitat selection for more accurate inference and prediction, rather than fitting large-scale models using pooled data. However, where the aim is to make range-wide predictions, including areas for which there are no existing data or models of habitat selection, how can regional models best be combined? We propose that ensemble approaches commonly used to combine different algorithms for a single region can be reframed, treating regional habitat selection models as the candidate models. By doing so, we can incorporate regional variation when fitting predictive models of animal habitat selection across large ranges. We test this approach using satellite telemetry data from 168 humpback whales across five geographic regions in the Southern Ocean. Using random forests, we fitted a large-scale model relating humpback whale locations, versus background locations, to 10 environmental covariates, and made a circumpolar prediction of humpback whale habitat selection. We also fitted five regional models, the predictions of which we used as input features for four ensemble approaches: an unweighted ensemble, an ensemble weighted by environmental similarity in each cell, stacked generalization, and a hybrid approach wherein the environmental covariates and regional predictions were used as input features in a new model. We tested the predictive performance of these approaches on an independent validation dataset of humpback whale sightings and whaling catches. These multiregional ensemble approaches resulted in models with higher predictive performance than the circumpolar naive model. These approaches can be used to incorporate regional variation in animal habitat selection when fitting range-wide predictive models using machine learning algorithms. This can yield more accurate predictions across regions or populations of animals that may show variation in habitat selection.
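One of the four ensembles described, stacked generalization, can be sketched concretely: the outputs of the regional models become the input features of a meta-model. In the sketch below the data, the number of regions, and the choice of logistic regression as meta-learner are placeholder assumptions, not the authors' setup (the paper uses random forests and evaluates four different ensembles).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_cells = 1000
regional_preds = rng.uniform(size=(n_cells, 5))  # stand-in for 5 regional model outputs per cell
labels = rng.integers(0, 2, size=n_cells)        # stand-in for presence vs. background labels

# Meta-model learns how to weight and combine the regional predictions.
meta_model = LogisticRegression().fit(regional_preds, labels)
range_wide_pred = meta_model.predict_proba(regional_preds)[:, 1]
```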


2021 ◽  
Vol 28 (1) ◽  
pp. e100251
Author(s):  
Ian Scott ◽  
Stacey Carter ◽  
Enrico Coiera

Machine learning algorithms are being used to screen and diagnose disease, prognosticate and predict therapeutic responses. Hundreds of new algorithms are being developed, but whether they improve clinical decision making and patient outcomes remains uncertain. If clinicians are to use algorithms, they need to be reassured that key issues relating to their validity, utility, feasibility, safety and ethical use have been addressed. We propose a checklist of 10 questions that clinicians can ask of those advocating for the use of a particular algorithm, but which do not expect clinicians, as non-experts, to demonstrate mastery over what can be highly complex statistical and computational concepts. The questions are: (1) What is the purpose and context of the algorithm? (2) How good were the data used to train the algorithm? (3) Were there sufficient data to train the algorithm? (4) How well does the algorithm perform? (5) Is the algorithm transferable to new clinical settings? (6) Are the outputs of the algorithm clinically intelligible? (7) How will this algorithm fit into and complement current workflows? (8) Has use of the algorithm been shown to improve patient care and outcomes? (9) Could the algorithm cause patient harm? and (10) Does use of the algorithm raise ethical, legal or social concerns? We provide examples where an algorithm may raise concerns and apply the checklist to a recent review of diagnostic imaging applications. This checklist aims to assist clinicians in assessing algorithm readiness for routine care and to identify situations where further refinement and evaluation are required prior to large-scale use.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sakthi Kumar Arul Prakash ◽  
Conrad Tucker

Abstract This work investigates the ability to classify misinformation in online social media networks in a manner that avoids the need for ground-truth labels. Rather than approach the classification problem as a task for humans or machine learning algorithms, this work leverages user–user and user–media (i.e., media likes) interactions to infer the type of information (fake vs. authentic) being spread, without needing to know the actual details of the information itself. To study the inception and evolution of user–user and user–media interactions over time, we create an experimental platform that mimics the functionality of real-world social media networks. We develop a graphical model that considers the evolution of this network topology to model the uncertainty (entropy) propagation when fake and authentic media disseminate across the network. The creation of a real-world social media network enables a wide range of hypotheses to be tested pertaining to users, their interactions with other users, and their interactions with media content. The discovery that the entropy of user–user and user–media interactions approximates fake and authentic media likes enables us to classify fake media in an unsupervised manner.
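The entropy measure at the core of this classification can be sketched in a few lines. The following is a minimal, hedged illustration of a Shannon-entropy computation over interaction counts; the data and the mapping from interactions to counts are assumptions for illustration, not the authors' graphical model.

```python
import numpy as np

def interaction_entropy(counts: np.ndarray) -> float:
    """Shannon entropy of an interaction distribution, e.g. how a media item's
    likes are spread across users (counts are hypothetical placeholders)."""
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Example: likes concentrated on a few users vs. spread evenly across users
print(interaction_entropy(np.array([50, 2, 1, 1])))    # low entropy
print(interaction_entropy(np.array([10, 12, 11, 9])))  # high entropy
```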


2021 ◽  
Vol 13 (9) ◽  
pp. 1837
Author(s):  
Eve Laroche-Pinel ◽  
Sylvie Duthoit ◽  
Mohanad Albughdadi ◽  
Anne D. Costard ◽  
Jacques Rousseau ◽  
...  

Wine growing needs to adapt to confront climate change; indeed, the lack of water is becoming more and more critical in many regions. Vineyards have been located in dry areas for decades, so they need particularly resilient varieties and/or a sufficient water supply at key development stages in case of severe drought. With climate change and decreasing water availability, some vineyard regions face difficulties because of unsuitable varieties, inappropriate vine management, or limited water access. Decision support tools are therefore required to optimize water use or to adapt agronomic practices. This study aimed at monitoring vine water status at a large scale with Sentinel-2 images. The goal was to provide a solution giving spatialized, temporal information on vine water status throughout the season. For this purpose, thirty-six plots were monitored in total over three years (2018, 2019 and 2020). Vine water status was measured with stem water potential field measurements from the pea-size to the ripening stage. Simultaneously, Sentinel-2 images were downloaded and processed to extract band reflectance values and compute vegetation indices. In our study, we tested five supervised regression machine learning algorithms to find possible relationships between stem water potential and data acquired from Sentinel-2 images (band reflectance values and vegetation indices). A regression model using the Red, NIR, Red-Edge and SWIR bands gave promising results for predicting stem water potential (R² = 0.40, RMSE = 0.26).
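The band-based regression can be sketched as follows. This is a minimal illustration with synthetic data, using a random-forest regressor as a stand-in for the five supervised algorithms tested; the feature layout (Red, NIR, Red-Edge and SWIR reflectances) is assumed from the abstract, and the values are not the study's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

# Hypothetical per-plot Sentinel-2 reflectances (Red, NIR, Red-Edge, SWIR)
# and a synthetic stem water potential target.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 0.6, size=(200, 4))                               # band reflectances
y = -1.5 + 1.2 * X[:, 1] - 0.8 * X[:, 3] + rng.normal(0, 0.2, 200)     # synthetic SWP

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)
print("R2:", r2_score(y_test, pred), "RMSE:", np.sqrt(mean_squared_error(y_test, pred)))
```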


Author(s):  
Yanbing Bai ◽  
Lu Sun ◽  
Haoyu Liu ◽  
Chao Xie

Large-scale population movements can turn local diseases into widespread epidemics. Grasping the characteristics of population flow in the context of COVID-19 is of great significance for informing epidemiology and for formulating scientific and reasonable prevention and control policies. Especially in the post-COVID-19 phase, it is essential to maintain the achievements of the fight against the epidemic. Previous research has focused on flight and railway passenger travel behavior and patterns, but China also has numerous suburban residents at a lower economic level; investigating their travel behavior is significant for national stability. However, estimating the impact of COVID-19 on suburban residents’ travel behavior remains challenging because of a lack of suitable data. Here we present bus ticketing data comprising approximately 26,000,000 records from April 2020 to August 2020 across 2705 stations. Our results indicate that suburban residents in southern Chinese regions are more likely to travel by bus and travel more frequently. With respect to economic level, we find that residents in economically developed regions are more likely to travel or carry out various social activities. From the perspective of who is traveling, we find that men and young people are more likely to travel by bus; they are also precisely the main workforce. Our findings indicate that suburban residents’ travel behavior is profoundly affected by the economy and remains consistent with the behavior patterns established before the COVID-19 outbreak. We use typical regions for verification, and this is indeed the case.


2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A62-A62
Author(s):  
Dattatreya Mellacheruvu ◽  
Rachel Pyke ◽  
Charles Abbott ◽  
Nick Phillips ◽  
Sejal Desai ◽  
...  

Background Accurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms has strong performance. We have extended upon our work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions lead to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies. Methods In-house immunopeptidomic data was generated using stably transfected HLA-null K562 cell lines that express a single HLA allele of interest, followed by immunoprecipitation using the W6/32 antibody and LC-MS/MS. Public immunopeptidomics data was downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at a 1% false discovery rate. Other metrics (features) were either extracted from source data or generated internally by re-processing samples utilizing the ImmunoID NeXT Platform. Results We have generated large-scale and high-quality immunopeptidomics data using approximately 60 mono-allelic cell lines that unambiguously assign peptides to their presenting alleles to create our primary models. Briefly, our primary ‘binding’ algorithm models MHC-peptide binding using the peptide and binding pockets, while our primary ‘presentation’ model uses additional features to model antigen processing and presentation. Both primary models have significantly higher precision across all recall values in multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve the performance of our model, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data was integrated by resolving peptide-to-allele mappings using our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples. Conclusions Improving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance compared to a state-of-the-art public algorithm and furthers this objective.
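Model comparisons of the kind reported, precision across all recall values, can be illustrated with a short evaluation sketch; the labels and scores below are synthetic placeholders, not SHERPA outputs or benchmark data.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

# Hypothetical held-out evaluation: presented peptides (label 1) vs. decoys (label 0),
# each scored by a presentation model.
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=5000)
scores = labels * rng.uniform(0.3, 1.0, 5000) + (1 - labels) * rng.uniform(0.0, 0.7, 5000)

precision, recall, _ = precision_recall_curve(labels, scores)   # precision at every recall level
print("Average precision:", average_precision_score(labels, scores))
```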

