On Car Sharing Usage Prediction with Open Socio-Demographic Data

Electronics ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 72 ◽  
Author(s):  
Michele Cocca ◽  
Douglas Teixeira ◽  
Luca Vassio ◽  
Marco Mellia ◽  
Jussara M. Almeida ◽  
...  

Free-Floating Car-Sharing (FFCS) services are a flexible alternative to car ownership. These transportation services show highly dynamic usage both over different hours of the day, and across different city areas. In this work, we study the problem of predicting FFCS demand patterns—a problem of great importance to the adequate provisioning of the service. We tackle both the prediction of the demand (i) over time and (ii) over space. We rely on months of real FFCS rides in Vancouver, which constitute our ground truth. We enrich this data with detailed socio-demographic information obtained from large open-data repositories to predict usage patterns. Our aim is to offer a thorough comparison of several machine-learning algorithms in terms of accuracy and ease of training, and to assess the effectiveness of current state-of-the-art approaches to address the prediction problem. Our results show that it is possible to predict the future usage with relative errors down to 10%, while the spatial prediction can be estimated with relative errors of about 40%. Our study also uncovers the socio-demographic features that most strongly correlate with FFCS usage, providing interesting insights for providers interested in offering services in new regions.
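
As a concrete illustration of the temporal-prediction task, here is a minimal sketch comparing a few off-the-shelf regressors on lagged hourly demand and reporting the relative error the abstract refers to. The synthetic data, lag features, and model choices are illustrative assumptions, not the authors' exact pipeline.

    # Hedged sketch: comparing regressors for hourly FFCS demand prediction.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import Ridge
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    hours = np.arange(24 * 90)                  # ~3 months of hourly slots
    demand = 50 + 30 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 5, hours.size)

    # Lag features: the previous 24 hours of demand predict the next hour.
    X = np.stack([demand[i:i + 24] for i in range(hours.size - 24)])
    y = demand[24:]
    split = int(0.8 * len(y))                   # chronological train/test split

    for model in (Ridge(), RandomForestRegressor(n_estimators=100),
                  MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000)):
        model.fit(X[:split], y[:split])
        pred = model.predict(X[split:])
        mape = np.mean(np.abs(pred - y[split:]) / np.abs(y[split:]))  # relative error
        print(type(model).__name__, f"relative error: {mape:.1%}")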


2013 ◽  
Vol 74 (2) ◽  
pp. 195-207 ◽  
Author(s):  
Jingfeng Xia ◽  
Ying Liu

This paper uses the Gene Expression Omnibus (GEO), a data repository in the biomedical sciences, to examine the usage patterns of open data repositories. It attempts to identify the degree to which the value of data reuse is recognised, and to understand how e-science has impacted large-scale scholarship. By analyzing a list of 1,211 publications that cite GEO data to support their independent studies, it finds that free data can support a wealth of high-quality investigations, that the rate of open data use has kept growing over the years, and that scholars in different countries comply with data-sharing policies at different rates.
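
The core of such a study is a per-year tally of citing publications. A minimal sketch of that counting step, with hypothetical records standing in for the 1,211 GEO-citing papers, might look like this:

    # Illustrative sketch (not the paper's code): tally citing publications
    # per year to check whether open-data reuse grows over time.
    from collections import Counter

    # Hypothetical records standing in for the GEO-citing publications.
    publications = [
        {"year": 2005, "country": "US"},
        {"year": 2006, "country": "CN"},
        {"year": 2006, "country": "US"},
        # ... one dict per citing publication
    ]

    per_year = Counter(p["year"] for p in publications)
    for year in sorted(per_year):
        print(year, per_year[year])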


2021 ◽  
Vol 13 (9) ◽  
pp. 4654
Author(s):  
Javier Orozco-Messana ◽  
Milagro Iborra-Lucas ◽  
Raimon Calabuig-Moreno

Climate change is becoming a dominant concern for advanced countries. The Paris Agreement sets out a global framework whose implementation touches all human activities and is commonly guided by the United Nations Sustainable Development Goals (UN SDGs), which set the scene for sustainable development performance and shape all climate-action-related policies. Rapid control of CO2 emissions necessarily involves cities, since they are responsible for 70 percent of greenhouse gas emissions; SDG 11 (Sustainable Cities and Communities) is thus clearly involved in the deployment of SDG 13 (Climate Action). European sustainability policies are financially guided by the European Green Deal towards a climate-neutral urban environment. In turn, a common framework for urban policy impact assessment must be based on architectural design tools, such as building certification, and on common data repositories for standard digital building models. Many Neighbourhood Sustainability Assessment (NSA) tools have been developed, but the growing availability of open data repositories for cities, together with big-data sources provided through Internet of Things platforms, now allows accurate neighbourhood simulations, in other words, digital twins of neighbourhoods. These digital twins are excellent tools for policy impact assessment. After a careful analysis of the current scientific literature, this paper provides a generic approach to a simple neighbourhood model built from physical building parameters that meets the relevant assessment requirements while being continuously updated (and tested) against real open data repositories, and shows how this assessment relates to building certification tools. The proposal is validated with real energy-consumption data through its application to the Benicalap neighbourhood in Valencia (Spain).
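
To make the modelling idea concrete, here is a minimal sketch, under assumed parameters, of the kind of building-physics neighbourhood model the paper describes: a heating-degree-day energy estimate per building that a digital twin could test against open consumption data. The parameter values and data format are assumptions for illustration.

    # Minimal sketch of a degree-day neighbourhood energy model.
    from dataclasses import dataclass

    @dataclass
    class Building:
        name: str
        envelope_area_m2: float   # heat-loss surface
        u_value: float            # mean envelope transmittance, W/(m2*K)

    def annual_heating_kwh(b: Building, degree_days: float) -> float:
        # Steady-state loss: U * A * degree-days, converted from W*day to kWh.
        return b.u_value * b.envelope_area_m2 * degree_days * 24 / 1000

    blocks = [Building("block-A", 2400, 1.1), Building("block-B", 1800, 0.6)]
    HDD = 1100  # heating degree-days for a mild climate such as Valencia (assumed)

    for b in blocks:
        print(b.name, f"{annual_heating_kwh(b, HDD):,.0f} kWh/year (modelled)")
    # A digital-twin loop would then compare these estimates with measured
    # consumption from the open data repository and recalibrate the U-values.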


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sakthi Kumar Arul Prakash ◽  
Conrad Tucker

Abstract This work investigates the ability to classify misinformation in online social media networks in a manner that avoids the need for ground truth labels. Rather than approach the classification problem as a task for humans or machine learning algorithms, this work leverages user–user and user–media (i.e., media likes) interactions to infer the type of information (fake vs. authentic) being spread, without needing to know the actual details of the information itself. To study the inception and evolution of user–user and user–media interactions over time, we create an experimental platform that mimics the functionality of real-world social media networks. We develop a graphical model that considers the evolution of this network topology to model the uncertainty (entropy) propagation as fake and authentic media disseminate across the network. The creation of a real-world social media network enables a wide range of hypotheses to be tested pertaining to users, their interactions with other users, and their interactions with media content. The discovery that the entropy of user–user and user–media interactions approximates fake and authentic media likes enables us to classify fake media in an unsupervised manner.
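
The entropy idea can be illustrated with a small sketch: compute the Shannon entropy of how likes for a media item are spread across users, without inspecting the content itself. The interaction data below is simulated, and the concentration assumption (fake media gathering likes in a small echo chamber) is ours, for illustration only.

    # Hedged sketch of entropy-based, label-free media classification.
    import numpy as np

    def shannon_entropy(counts):
        p = np.asarray(counts, dtype=float)
        p = p[p > 0] / p.sum()
        return float(-(p * np.log2(p)).sum())

    rng = np.random.default_rng(1)
    # Assumed for illustration: authentic media gathers likes broadly,
    # fake media's likes concentrate on a small group of users.
    authentic_likes = rng.integers(1, 10, size=200)           # spread widely
    fake_likes = np.concatenate([rng.integers(20, 50, size=10),
                                 np.zeros(190, dtype=int)])   # concentrated

    print("authentic entropy:", round(shannon_entropy(authentic_likes), 2))
    print("fake entropy:     ", round(shannon_entropy(fake_likes), 2))
    # An unsupervised classifier could threshold or cluster such entropies
    # rather than rely on ground-truth labels.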


2020 ◽  
Vol 3 (S1) ◽  
Author(s):  
Andreas Weigert ◽  
Konstantin Hopf ◽  
Nicolai Weinig ◽  
Thorsten Staake

Abstract Heat pumps embody solutions that heat or cool buildings effectively and sustainably, with zero emissions at the place of installation. As they place a significant load on the power grid, knowledge of their existence is crucial for grid operators, e.g., to forecast load and to plan grid operation. Further details, such as the thermal reservoir (ground or air source) or the age of a heat pump installation, make possible energy-related services that utility companies can offer in the future (e.g., detecting wrongly calibrated installations, household energy-efficiency checks). This study investigates the prediction of heat pump installations, their thermal reservoir, and their age. For this, we obtained a dataset of 397 households in Switzerland, all equipped with smart meters, collected ground truth data on installed heat pumps, and enriched this data with weather data and geographical information. Our investigation replicates the state of the art in heat pump detection and goes beyond it, yielding three major findings: First, machine learning can detect the existence of heat pumps with an AUC performance metric of 0.82, their heat reservoir with an AUC of 0.86, and their age with an AUC of 0.73. Second, heat pump existence is better detected using data from the heating period than from summer. Third, the number of training samples needed to detect the existence of heat pumps need not be large, in terms of either the number of training instances or the observation period.
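
A hedged sketch of this kind of detection task follows, with simulated smart-meter features standing in for the study's data; the AUC metric matches the paper, everything else (features, classifier, effect sizes) is an illustrative assumption.

    # Illustrative sketch (not the study's pipeline): detect heat pumps
    # from seasonal consumption features and report AUC.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(2)
    n = 397  # matches the study's household count; the data is synthetic
    has_hp = rng.integers(0, 2, size=n)

    # Assumption: heat pumps inflate winter load relative to summer.
    winter_kwh = 2000 + 1500 * has_hp + rng.normal(0, 400, n)
    summer_kwh = 1200 + rng.normal(0, 300, n)
    X = np.column_stack([winter_kwh, summer_kwh, winter_kwh / summer_kwh])

    X_tr, X_te, y_tr, y_te = train_test_split(X, has_hp, random_state=0)
    clf = GradientBoostingClassifier().fit(X_tr, y_tr)
    print("AUC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 2))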


2021 ◽  
Vol 8 (1) ◽  
pp. 205395172110135
Author(s):  
Florian Jaton

This theoretical paper considers the morality of machine learning algorithms and systems in the light of the biases that ground their correctness. It begins by presenting biases not as a priori negative entities but as contingent external referents—often gathered in benchmarked repositories called ground-truth datasets—that define what needs to be learned and allow for performance measures. I then argue that ground-truth datasets and their concomitant practices—that fundamentally involve establishing biases to enable learning procedures—can be described by their respective morality, here defined as the more or less accounted experience of hesitation when faced with what pragmatist philosopher William James called “genuine options”—that is, choices to be made in the heat of the moment that engage different possible futures. I then stress three constitutive dimensions of this pragmatist morality, as far as ground-truthing practices are concerned: (I) the definition of the problem to be solved (problematization), (II) the identification of the data to be collected and set up (databasing), and (III) the qualification of the targets to be learned (labeling). I finally suggest that this three-dimensional conceptual space can be used to map machine learning algorithmic projects in terms of the morality of their respective and constitutive ground-truthing practices. Such techno-moral graphs may, in turn, serve as equipment for greater governance of machine learning algorithms and systems.


2021 ◽  
Author(s):  
Nicolas Le Guillarme ◽  
Wilfried Thuiller

1. Given the biodiversity crisis, we need more than ever to access information on multiple taxa (e.g. distribution, traits, diet) in the scientific literature to understand, map, and predict all-inclusive biodiversity. Tools are needed to automatically extract useful information from the ever-growing corpus of ecological texts and feed this information to open data repositories. A prerequisite is the ability to recognise mentions of taxa in text, a special case of named entity recognition (NER). In recent years, deep learning-based NER systems have become ubiquitous, yielding state-of-the-art results in the general and biomedical domains. However, no such tool is available to ecologists wishing to extract information from the biodiversity literature. 2. We propose a new tool called TaxoNERD that provides two deep neural network (DNN) models to recognise taxon mentions in ecological documents. To achieve high performance, DNN-based NER models usually need to be trained on a large corpus of manually annotated text. Creating such a gold standard corpus (GSC) is a laborious and costly process, with the result that GSCs in the ecological domain tend to be too small to learn an accurate DNN model from scratch. To address this issue, we leverage existing DNN models pretrained on large biomedical corpora using transfer learning (see the sketch following this abstract). The performance of our models is evaluated on four GSCs and compared to the most popular taxonomic NER tools. 3. Our experiments suggest that existing taxonomic NER tools are not suited to the extraction of ecological information from text, as they performed poorly on ecologically oriented corpora, either because they do not account for the variability of taxon naming practices, or because they do not generalise well to the ecological domain. Conversely, a domain-specific DNN-based tool like TaxoNERD outperformed the other approaches on an ecological information extraction task. 4. Efforts are needed to raise ecological information extraction to the same level of performance as its biomedical counterpart. One promising direction is to leverage the huge corpus of unlabelled ecological texts to learn a language representation model that could benefit downstream tasks. These efforts could be highly beneficial to ecologists in the long term.
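
The transfer-learning recipe in point 2 can be sketched generically (this is not TaxoNERD's actual API): start from a token-classification model pretrained on biomedical text and fine-tune it on a small taxon-annotated corpus, so that only the classification head is learned from scratch. The checkpoint ID below is assumed to be one publicly available BioBERT build; the label scheme is an illustrative IOB convention.

    # Hedged sketch of biomedical-to-ecological transfer learning for NER.
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    checkpoint = "dmis-lab/biobert-v1.1"      # assumed public BioBERT build
    labels = ["O", "B-TAXON", "I-TAXON"]      # IOB tags for taxon mentions

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForTokenClassification.from_pretrained(
        checkpoint, num_labels=len(labels))

    # Fine-tuning would proceed on the gold-standard corpus; because the
    # encoder already knows biomedical language, a few thousand annotated
    # sentences can suffice to train the new classification head.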


Author(s):  
Saket Kunwar

On April 25, 2015, an earthquake of magnitude 7.8 occurred with its epicentre at Barpak (28°12'20''N, 84°44'19''E), Nepal. Landslides induced by the earthquake and its aftershocks added to the natural disaster, which claimed more than 9,000 lives. Landslides, represented as lines that extend from the head scarp to the toe of the deposit, were mapped by the staff of the British Geological Survey; the mapping is freely available under the Open Data Commons Open Database License (ODC-ODbL) at the Humanitarian Data Exchange. This collection of 5,578 landslides is used as preliminary ground truth in this study, with the aim of producing polygonal delineations of the landslides from the polylines via object-oriented segmentation. Texture measures from Sentinel-1A Ground Range Detected (GRD) amplitude data and an eigenvalue-decomposed Single Look Complex (SLC) polarimetry product are stacked for this purpose. This has also enabled the investigation of landslide properties in the H-Alpha plane, while developing a classification mechanism for identifying the occurrence of landslides.
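
The texture-stacking step can be sketched as follows, under assumed inputs: grey-level co-occurrence (GLCM) measures computed over a stand-in Sentinel-1 amplitude tile, producing bands ready for object-oriented segmentation. Window size and the chosen texture properties are illustrative.

    # Sketch of GLCM texture bands over a (simulated) SAR amplitude tile.
    import numpy as np
    from skimage.feature import graycomatrix, graycoprops

    rng = np.random.default_rng(3)
    amplitude = rng.integers(0, 64, size=(256, 256), dtype=np.uint8)  # stand-in

    def glcm_features(window):
        glcm = graycomatrix(window, distances=[1], angles=[0],
                            levels=64, symmetric=True, normed=True)
        return [graycoprops(glcm, prop)[0, 0]
                for prop in ("contrast", "homogeneity", "energy")]

    # Sliding-window texture bands (coarse 16-px steps to keep it quick).
    step = 16
    bands = np.array([[glcm_features(amplitude[r:r + step, c:c + step])
                       for c in range(0, 256, step)]
                      for r in range(0, 256, step)])
    print(bands.shape)  # (16, 16, 3): three texture bands ready to stack
    # The polarimetric H-Alpha features from the SLC product would be added
    # as further bands before object-oriented segmentation.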

