On Car Sharing Usage Prediction with Open Socio-Demographic Data

Electronics ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 72 ◽  
Author(s):  
Michele Cocca ◽  
Douglas Teixeira ◽  
Luca Vassio ◽  
Marco Mellia ◽  
Jussara M. Almeida ◽  
...  

Free-Floating Car-Sharing (FFCS) services are a flexible alternative to car ownership. These transportation services show highly dynamic usage both over different hours of the day, and across different city areas. In this work, we study the problem of predicting FFCS demand patterns—a problem of great importance to the adequate provisioning of the service. We tackle both the prediction of the demand (i) over time and (ii) over space. We rely on months of real FFCS rides in Vancouver, which constitute our ground truth. We enrich this data with detailed socio-demographic information obtained from large open-data repositories to predict usage patterns. Our aim is to offer a thorough comparison of several machine-learning algorithms in terms of accuracy and ease of training, and to assess the effectiveness of current state-of-the-art approaches to address the prediction problem. Our results show that it is possible to predict the future usage with relative errors down to 10%, while the spatial prediction can be estimated with relative errors of about 40%. Our study also uncovers the socio-demographic features that most strongly correlate with FFCS usage, providing interesting insights for providers interested in offering services in new regions.
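
As a concrete illustration of the temporal-prediction task, here is a minimal sketch comparing a few off-the-shelf regressors on lagged hourly demand and reporting the relative error the abstract refers to. The synthetic data, lag features, and model choices are illustrative assumptions, not the authors' exact pipeline.

    # Hedged sketch: comparing regressors for hourly FFCS demand prediction.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import Ridge
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    hours = np.arange(24 * 90)                  # ~3 months of hourly slots
    demand = 50 + 30 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 5, hours.size)

    # Lag features: the previous 24 hours of demand predict the next hour.
    X = np.stack([demand[i:i + 24] for i in range(hours.size - 24)])
    y = demand[24:]
    split = int(0.8 * len(y))                   # chronological train/test split

    for model in (Ridge(), RandomForestRegressor(n_estimators=100),
                  MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000)):
        model.fit(X[:split], y[:split])
        pred = model.predict(X[split:])
        mape = np.mean(np.abs(pred - y[split:]) / np.abs(y[split:]))  # relative error
        print(type(model).__name__, f"relative error: {mape:.1%}")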


2013 ◽  
Vol 74 (2) ◽  
pp. 195-207 ◽  
Author(s):  
Jingfeng Xia ◽  
Ying Liu

This paper uses the Gene Expression Omnibus (GEO), a data repository in the biomedical sciences, to examine the usage patterns of open data repositories. It attempts to identify the degree to which the value of data reuse is recognised, and to understand how e-science has impacted large-scale scholarship. By analyzing a list of 1,211 publications that cite GEO data to support their independent studies, it finds that free data can support a wealth of high-quality investigations, that the rate of open data use has kept growing over the years, and that scholars in different countries comply with data-sharing policies at different rates.
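
The core of such a study is a per-year tally of citing publications. A minimal sketch of that counting step, with hypothetical records standing in for the 1,211 GEO-citing papers, might look like this:

    # Illustrative sketch (not the paper's code): tally citing publications
    # per year to check whether open-data reuse grows over time.
    from collections import Counter

    # Hypothetical records standing in for the GEO-citing publications.
    publications = [
        {"year": 2005, "country": "US"},
        {"year": 2006, "country": "CN"},
        {"year": 2006, "country": "US"},
        # ... one dict per citing publication
    ]

    per_year = Counter(p["year"] for p in publications)
    for year in sorted(per_year):
        print(year, per_year[year])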


2021 ◽  
Vol 13 (9) ◽  
pp. 4654
Author(s):  
Javier Orozco-Messana ◽  
Milagro Iborra-Lucas ◽  
Raimon Calabuig-Moreno

Climate change is becoming a dominant concern for advanced countries. The Paris Agreement sets out a global framework whose implementation touches all human activities and is commonly guided by the United Nations Sustainable Development Goals (UN SDGs), which set the scene for sustainable development performance and shape all climate-action-related policies. Rapid control of CO2 emissions necessarily involves cities, since they are responsible for 70 percent of greenhouse gas emissions; SDG 11 (Sustainable Cities and Communities) is thus clearly involved in the deployment of SDG 13 (Climate Action). European sustainability policies are financially guided by the European Green Deal towards a climate-neutral urban environment. In turn, a common framework for urban policy impact assessment must be based on architectural design tools, such as building certification, and on common data repositories for standard digital building models. Many Neighbourhood Sustainability Assessment (NSA) tools have been developed, but the growing availability of open data repositories for cities, together with big-data sources provided through Internet of Things platforms, now allows accurate neighbourhood simulations, in other words, digital twins of neighbourhoods. These digital twins are excellent tools for policy impact assessment. After a careful analysis of the current scientific literature, this paper provides a generic approach to a simple neighbourhood model built from physical building parameters that meets the relevant assessment requirements while being continuously updated (and tested) against real open data repositories, and shows how this assessment relates to building certification tools. The proposal is validated with real energy-consumption data through its application to the Benicalap neighbourhood in Valencia (Spain).
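
To make the modelling idea concrete, here is a minimal sketch, under assumed parameters, of the kind of building-physics neighbourhood model the paper describes: a heating-degree-day energy estimate per building that a digital twin could test against open consumption data. The parameter values and data format are assumptions for illustration.

    # Minimal sketch of a degree-day neighbourhood energy model.
    from dataclasses import dataclass

    @dataclass
    class Building:
        name: str
        envelope_area_m2: float   # heat-loss surface
        u_value: float            # mean envelope transmittance, W/(m2*K)

    def annual_heating_kwh(b: Building, degree_days: float) -> float:
        # Steady-state loss: U * A * degree-days, converted from W*day to kWh.
        return b.u_value * b.envelope_area_m2 * degree_days * 24 / 1000

    blocks = [Building("block-A", 2400, 1.1), Building("block-B", 1800, 0.6)]
    HDD = 1100  # heating degree-days for a mild climate such as Valencia (assumed)

    for b in blocks:
        print(b.name, f"{annual_heating_kwh(b, HDD):,.0f} kWh/year (modelled)")
    # A digital-twin loop would then compare these estimates with measured
    # consumption from the open data repository and recalibrate the U-values.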


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sakthi Kumar Arul Prakash ◽  
Conrad Tucker

Abstract This work investigates the ability to classify misinformation in online social media networks in a manner that avoids the need for ground truth labels. Rather than approach the classification problem as a task for humans or machine learning algorithms, this work leverages user–user and user–media (i.e., media likes) interactions to infer the type of information (fake vs. authentic) being spread, without needing to know the actual details of the information itself. To study the inception and evolution of user–user and user–media interactions over time, we create an experimental platform that mimics the functionality of real-world social media networks. We develop a graphical model that considers the evolution of this network topology to model the uncertainty (entropy) propagation as fake and authentic media disseminate across the network. The creation of a real-world social media network enables a wide range of hypotheses to be tested pertaining to users, their interactions with other users, and their interactions with media content. The discovery that the entropy of user–user and user–media interactions approximates fake and authentic media likes enables us to classify fake media in an unsupervised manner.
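
The entropy idea can be illustrated with a small sketch: compute the Shannon entropy of how likes for a media item are spread across users, without inspecting the content itself. The interaction data below is simulated, and the concentration assumption (fake media gathering likes in a small echo chamber) is ours, for illustration only.

    # Hedged sketch of entropy-based, label-free media classification.
    import numpy as np

    def shannon_entropy(counts):
        p = np.asarray(counts, dtype=float)
        p = p[p > 0] / p.sum()
        return float(-(p * np.log2(p)).sum())

    rng = np.random.default_rng(1)
    # Assumed for illustration: authentic media gathers likes broadly,
    # fake media's likes concentrate on a small group of users.
    authentic_likes = rng.integers(1, 10, size=200)           # spread widely
    fake_likes = np.concatenate([rng.integers(20, 50, size=10),
                                 np.zeros(190, dtype=int)])   # concentrated

    print("authentic entropy:", round(shannon_entropy(authentic_likes), 2))
    print("fake entropy:     ", round(shannon_entropy(fake_likes), 2))
    # An unsupervised classifier could threshold or cluster such entropies
    # rather than rely on ground-truth labels.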


2020 ◽  
Vol 3 (S1) ◽  
Author(s):  
Andreas Weigert ◽  
Konstantin Hopf ◽  
Nicolai Weinig ◽  
Thorsten Staake

Abstract Heat pumps embody solutions that heat or cool buildings effectively and sustainably, with zero emissions at the place of installation. As they place a significant load on the power grid, knowledge of their existence is crucial for grid operators, e.g., to forecast load and to plan grid operation. Further details, such as the thermal reservoir (ground or air source) or the age of a heat pump installation, make possible energy-related services that utility companies can offer in the future (e.g., detecting wrongly calibrated installations, household energy-efficiency checks). This study investigates the prediction of heat pump installations, their thermal reservoir, and their age. For this, we obtained a dataset of 397 households in Switzerland, all equipped with smart meters, collected ground truth data on installed heat pumps, and enriched this data with weather data and geographical information. Our investigation replicates the state of the art in heat pump detection and goes beyond it, yielding three major findings: First, machine learning can detect the existence of heat pumps with an AUC performance metric of 0.82, their heat reservoir with an AUC of 0.86, and their age with an AUC of 0.73. Second, heat pump existence is better detected using data from the heating period than from summer. Third, the number of training samples needed to detect the existence of heat pumps need not be large, in terms of either the number of training instances or the observation period.
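
A hedged sketch of this kind of detection task follows, with simulated smart-meter features standing in for the study's data; the AUC metric matches the paper, everything else (features, classifier, effect sizes) is an illustrative assumption.

    # Illustrative sketch (not the study's pipeline): detect heat pumps
    # from seasonal consumption features and report AUC.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(2)
    n = 397  # matches the study's household count; the data is synthetic
    has_hp = rng.integers(0, 2, size=n)

    # Assumption: heat pumps inflate winter load relative to summer.
    winter_kwh = 2000 + 1500 * has_hp + rng.normal(0, 400, n)
    summer_kwh = 1200 + rng.normal(0, 300, n)
    X = np.column_stack([winter_kwh, summer_kwh, winter_kwh / summer_kwh])

    X_tr, X_te, y_tr, y_te = train_test_split(X, has_hp, random_state=0)
    clf = GradientBoostingClassifier().fit(X_tr, y_tr)
    print("AUC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 2))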


2021 ◽  
Vol 8 (1) ◽  
pp. 205395172110135
Author(s):  
Florian Jaton

This theoretical paper considers the morality of machine learning algorithms and systems in the light of the biases that ground their correctness. It begins by presenting biases not as a priori negative entities but as contingent external referents—often gathered in benchmarked repositories called ground-truth datasets—that define what needs to be learned and allow for performance measures. I then argue that ground-truth datasets and their concomitant practices—that fundamentally involve establishing biases to enable learning procedures—can be described by their respective morality, here defined as the more or less accounted experience of hesitation when faced with what pragmatist philosopher William James called “genuine options”—that is, choices to be made in the heat of the moment that engage different possible futures. I then stress three constitutive dimensions of this pragmatist morality, as far as ground-truthing practices are concerned: (I) the definition of the problem to be solved (problematization), (II) the identification of the data to be collected and set up (databasing), and (III) the qualification of the targets to be learned (labeling). I finally suggest that this three-dimensional conceptual space can be used to map machine learning algorithmic projects in terms of the morality of their respective and constitutive ground-truthing practices. Such techno-moral graphs may, in turn, serve as equipment for greater governance of machine learning algorithms and systems.


2021 ◽  
Author(s):  
Nicolas Le Guillarme ◽  
Wilfried Thuiller

1. Given the biodiversity crisis, we need more than ever to access information on multiple taxa (e.g. distribution, traits, diet) in the scientific literature to understand, map, and predict all-inclusive biodiversity. Tools are needed to automatically extract useful information from the ever-growing corpus of ecological texts and feed this information to open data repositories. A prerequisite is the ability to recognise mentions of taxa in text, a special case of named entity recognition (NER). In recent years, deep learning-based NER systems have become ubiquitous, yielding state-of-the-art results in the general and biomedical domains. However, no such tool is available to ecologists wishing to extract information from the biodiversity literature. 2. We propose a new tool called TaxoNERD that provides two deep neural network (DNN) models to recognise taxon mentions in ecological documents. To achieve high performance, DNN-based NER models usually need to be trained on a large corpus of manually annotated text. Creating such a gold standard corpus (GSC) is a laborious and costly process, with the result that GSCs in the ecological domain tend to be too small to learn an accurate DNN model from scratch. To address this issue, we leverage existing DNN models pretrained on large biomedical corpora using transfer learning (see the sketch following this abstract). The performance of our models is evaluated on four GSCs and compared to the most popular taxonomic NER tools. 3. Our experiments suggest that existing taxonomic NER tools are not suited to the extraction of ecological information from text, as they performed poorly on ecologically oriented corpora, either because they do not account for the variability of taxon naming practices, or because they do not generalise well to the ecological domain. Conversely, a domain-specific DNN-based tool like TaxoNERD outperformed the other approaches on an ecological information extraction task. 4. Efforts are needed to raise ecological information extraction to the same level of performance as its biomedical counterpart. One promising direction is to leverage the huge corpus of unlabelled ecological texts to learn a language representation model that could benefit downstream tasks. These efforts could be highly beneficial to ecologists in the long term.
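
The transfer-learning recipe in point 2 can be sketched generically (this is not TaxoNERD's actual API): start from a token-classification model pretrained on biomedical text and fine-tune it on a small taxon-annotated corpus, so that only the classification head is learned from scratch. The checkpoint ID below is assumed to be one publicly available BioBERT build; the label scheme is an illustrative IOB convention.

    # Hedged sketch of biomedical-to-ecological transfer learning for NER.
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    checkpoint = "dmis-lab/biobert-v1.1"      # assumed public BioBERT build
    labels = ["O", "B-TAXON", "I-TAXON"]      # IOB tags for taxon mentions

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForTokenClassification.from_pretrained(
        checkpoint, num_labels=len(labels))

    # Fine-tuning would proceed on the gold-standard corpus; because the
    # encoder already knows biomedical language, a few thousand annotated
    # sentences can suffice to train the new classification head.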


Author(s):  
Saket Kunwar

On April 25, 2015, an earthquake of magnitude 7.8 occurred with its epicentre at Barpak (28°12'20''N, 84°44'19''E), Nepal. Landslides induced by the earthquake and its aftershocks added to the natural disaster, which claimed more than 9,000 lives. Landslides, represented as lines that extend from the head scarp to the toe of the deposit, were mapped by the staff of the British Geological Survey; the mapping is freely available under the Open Data Commons Open Database License (ODC-ODbL) at the Humanitarian Data Exchange. This collection of 5,578 landslides is used as preliminary ground truth in this study, with the aim of producing polygonal delineations of the landslides from the polylines via object-oriented segmentation. Texture measures from Sentinel-1A Ground Range Detected (GRD) amplitude data and an eigenvalue-decomposed Single Look Complex (SLC) polarimetry product are stacked for this purpose. This has also enabled the investigation of landslide properties in the H-Alpha plane, while developing a classification mechanism for identifying the occurrence of landslides.
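
The texture-stacking step can be sketched as follows, under assumed inputs: grey-level co-occurrence (GLCM) measures computed over a stand-in Sentinel-1 amplitude tile, producing bands ready for object-oriented segmentation. Window size and the chosen texture properties are illustrative.

    # Sketch of GLCM texture bands over a (simulated) SAR amplitude tile.
    import numpy as np
    from skimage.feature import graycomatrix, graycoprops

    rng = np.random.default_rng(3)
    amplitude = rng.integers(0, 64, size=(256, 256), dtype=np.uint8)  # stand-in

    def glcm_features(window):
        glcm = graycomatrix(window, distances=[1], angles=[0],
                            levels=64, symmetric=True, normed=True)
        return [graycoprops(glcm, prop)[0, 0]
                for prop in ("contrast", "homogeneity", "energy")]

    # Sliding-window texture bands (coarse 16-px steps to keep it quick).
    step = 16
    bands = np.array([[glcm_features(amplitude[r:r + step, c:c + step])
                       for c in range(0, 256, step)]
                      for r in range(0, 256, step)])
    print(bands.shape)  # (16, 16, 3): three texture bands ready to stack
    # The polarimetric H-Alpha features from the SLC product would be added
    # as further bands before object-oriented segmentation.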

