Recovering the Association Between Unlinked Fare Machines and Stations Using Automated Fare Collection Data in Metro Systems

Author(s):  
Pengfei Zhang ◽  
Zhenliang Ma ◽  
Xiaoxiong Weng ◽  
Haris N. Koutsopoulos

Data quality is the foundation of data-driven applications in transportation. Data problems such as missing and invalid data can sharply degrade the performance of the methods used in these applications. Although many studies address data quality, they focus only on missing or invalid data caused by infrastructure failures (e.g., loop detector malfunctions); data quality issues arising from insufficient data management have received little attention. This paper proposes a tensor-decomposition-based framework to tackle a specific missing data problem that occurs when the machine-station dictionary of an automated fare collection (AFC) system database is incomplete. In such cases, a large amount of origin/destination information is lost: the affected machines are not linked to any station, so all their transactions may lack origin/destination information. The proposed framework recovers the dictionary by capturing features of the passenger flow passing through each unlinked fare machine. Evaluation results show that the proposed approach can recover the missing data with high accuracy even when several fare machines are not linked to a station. The framework could also support other useful applications.
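The abstract does not spell out the tensor-decomposition model, so the following is only a simplified stand-in to make the underlying intuition concrete, not the authors' method: if each station has a characteristic time-of-day passenger-flow profile, an unlinked fare machine can be tentatively assigned to the station whose profile its own tap-in profile most resembles, here measured by cosine similarity. The function names, the four time bins, and all counts below are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical simplification: a "profile" is a list of tap-in counts per time bin.
Profile = List[float]


def cosine_similarity(a: Profile, b: Profile) -> float:
    """Cosine of the angle between two equally long flow profiles (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of time bins")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_machine(machine: Profile, stations: Dict[str, Profile]) -> str:
    """Assign an unlinked machine to the station with the most similar flow profile."""
    if not stations:
        raise ValueError("no candidate stations")
    return max(stations, key=lambda name: cosine_similarity(machine, stations[name]))


# Hypothetical tap-in counts in four time bins (AM peak, midday, afternoon, PM peak).
station_profiles = {
    "Station A": [120, 40, 30, 20],  # AM-peak-heavy, e.g. a residential area
    "Station B": [20, 30, 45, 130],  # PM-peak-heavy, e.g. a business district
}
unlinked_machine = [15, 8, 5, 4]

print(link_machine(unlinked_machine, station_profiles))  # Station A
```

Cosine similarity compares the shape of the profiles rather than their volume, so a machine that handles only a small share of a station's passengers can still match that station. The framework described in the abstract works with tensor features of the flow instead, which this sketch does not attempt to reproduce.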

2020 ◽  
Vol 20 (23) ◽  
pp. 13984-13998
Author(s):  
Jinghan Du ◽  
Minghua Hu ◽  
Weining Zhang

Bad Science ◽  
2020 ◽  
pp. 135-138
Author(s):  
Florian Meinfelder ◽  
Rebekka Kluge

Author(s):  
Hatice Uenal ◽  
David Hampel

Registries are indispensable in medical studies and provide the basis for reliable answers to research questions. Depending on the intended use, high data quality is a prerequisite. However, costs rise as registry quality increases. Considering these time and cost factors, this work attempts to estimate the cost advantages of applying statistical tools, including quality evaluation, to existing registry data. The quality analysis showed clear savings of millions in study costs from shortening the time horizon, with an average saving of €523,126 for every year by which the study was shortened. In addition, imputing the missing values, which exceeded 25% in some variables, greatly improved data quality. In conclusion, our findings clearly show the importance of data quality and statistical input in avoiding biased conclusions due to incomplete data.
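As a rough illustration of how the reported per-year figure adds up to savings in the millions, assuming (a simplification not stated in the abstract) that savings scale linearly with the number of years removed, with Δt the reduction in study duration in years:

```latex
S(\Delta t) \approx \text{EUR}\,523{,}126 \times \Delta t,
\qquad S(2) \approx \text{EUR}\,1{,}046{,}252,
\qquad S(3) \approx \text{EUR}\,1{,}569{,}378
```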


2012 ◽  
Vol 40 (2) ◽  
pp. 282-303 ◽  
Author(s):  
Jieli Ding ◽  
Yanyan Liu ◽  
David B. Peden ◽  
Steven R. Kleeberger ◽  
Haibo Zhou
