A Feature-Based Approach to Large-Scale Freeway Congestion Detection Using Full Cellular Activity Data

Author(s): Shen Li, Yang Cheng, Peter Jin, Fan Ding, Qing Li, ...

Author(s): Bat-hen Nahmias-Biran, Yafei Han, Shlomo Bekhor, Fang Zhao, Christopher Zegras, ...

Smartphone-based travel surveys have attracted much attention recently for their potential to improve data quality and response rates. One of the first such survey systems, Future Mobility Sensing (FMS), leverages smartphone sensors and machine learning techniques to collect detailed personal travel data. The main purpose of this research is to compare data collected by FMS and traditional methods, and to study the implications of using FMS data for travel behavior modeling. Since its initial field test in Singapore, FMS has been used in several large-scale household travel surveys, including one in Tel Aviv, Israel. We present comparative analyses that make use of the rich datasets from Singapore and Tel Aviv, focusing on three main aspects: (1) richness of the activity behaviors observed, (2) completeness of travel and activity data, and (3) data accuracy. Results show that FMS has clear advantages over traditional travel surveys: it records times, locations, and paths with higher resolution and better accuracy; it represents out-of-work and leisure activities well; and it reveals large day-to-day variability in activity patterns, which the one-day snapshot of a typical traditional survey captures inadequately. FMS also captures travel and activities that tend to be under-reported in traditional surveys, such as multiple stops in a tour and work-based sub-tours. These richer, more complete, and more accurate data can improve future activity-based modeling.


2021, pp. 1-48
Author(s): Zuchao Li, Hai Zhao, Shexia He, Jiaxun Cai

Abstract Semantic role labeling (SRL) is dedicated to recognizing the semantic predicate-argument structure of a sentence. Previous studies with traditional models showed that syntactic information can make remarkable contributions to SRL performance. However, its necessity was challenged by a few recent neural SRL studies that demonstrated impressive performance without syntactic backbones, suggesting that syntactic information matters much less for neural SRL, especially when paired with deep neural networks and large-scale pre-trained language models. Despite this notion, the neural SRL field still lacks a systematic and full investigation of the relevance of syntactic information, covering both span and dependency formalisms and both monolingual and multilingual settings. This paper intends to quantify the importance of syntactic information for neural SRL in the deep learning framework. We introduce three typical SRL frameworks (baselines): sequence-based, tree-based, and graph-based. These are accompanied by two categories of methods for exploiting syntactic information: syntax pruning-based and syntax feature-based. Experiments are conducted on the CoNLL-2005, 2009, and 2012 benchmarks for all available languages, and the results show that neural SRL models can still benefit from syntactic information under certain conditions. Furthermore, we quantify the significance of syntax for neural SRL models together with a thorough empirical survey of existing models.
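
To make the syntax pruning-based category concrete, here is a minimal sketch of one common variant of the idea: restricting argument candidates to words within k hops of the predicate in the dependency tree. This illustrates the general technique only, not the paper's exact implementation; the function name and data layout are invented for the example.

```python
from collections import defaultdict, deque

def k_order_prune(heads, predicate, k=2):
    """Keep only argument candidates within k hops of the predicate in the
    (undirected) dependency tree. heads[i] is the head index of token i
    (-1 for the root). Illustrative of syntax pruning-based SRL."""
    # Build undirected adjacency from head pointers.
    adj = defaultdict(list)
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append(head)
            adj[head].append(child)
    # BFS outward from the predicate, stopping at depth k.
    dist = {predicate: 0}
    queue = deque([predicate])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return sorted(dist)

# Toy sentence: "She bought a car", heads pointing to "bought" (index 1).
print(k_order_prune(heads=[1, -1, 3, 1], predicate=1, k=1))  # [0, 1, 3]
```

With k=1 the determiner "a" (two hops away via "car") is pruned from the candidate set, which is exactly the kind of syntax-guided search-space reduction the pruning-based category exploits.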


Author(s): Benedict Irwin, Thomas Whitehead, Scott Rowland, Samar Mahmoud, Gareth Conduit, ...

More accurate predictions of the biological properties of chemical compounds would guide the selection and design of new compounds in drug discovery and help to address the enormous cost and low success rate of pharmaceutical R&D. However, this domain presents a significant challenge for AI methods due to the sparsity of compound data and the noise inherent in results from biological experiments. In this paper, we demonstrate how data imputation using deep learning provides substantial improvements over the quantitative structure-activity relationship (QSAR) machine learning models that are widely applied in drug discovery. We present the largest successful application to date of deep-learning imputation, on datasets comparable in size to the corporate data repository of a pharmaceutical company (678,994 compounds by 1166 endpoints). We demonstrate this improvement for three areas of practical application linked to distinct use cases: i) target activity data compiled from a range of drug discovery projects; ii) a high-value and heterogeneous dataset covering complex absorption, distribution, metabolism, and elimination properties; and iii) high-throughput screening data, testing the algorithm's limits on early-stage, noisy, and very sparse data. Achieving median coefficients of determination, R², of 0.69, 0.36, and 0.43, respectively, across these applications, the deep-learning imputation method offers an unambiguous improvement over random forest QSAR methods, which achieve median R² values of 0.28, 0.19, and 0.23, respectively. We also demonstrate that robust estimates of the uncertainties in the predicted values correlate strongly with prediction accuracy, enabling greater confidence in decision-making based on the imputed values.
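
The paper's deep-learning imputer is proprietary, but the masked-evaluation setup behind headline median-R² numbers like these can be sketched with an off-the-shelf imputer standing in. Below, scikit-learn's IterativeImputer plays the role of the imputation model on a toy sparse compound-by-endpoint matrix; all sizes, names, and the choice of imputer are illustrative assumptions, not the paper's method.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Toy sparse activity matrix: 200 "compounds" x 8 "endpoints", ~60% missing.
full = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 8))  # correlated endpoints
X = np.where(rng.random(full.shape) < 0.6, np.nan, full)

# Hold out 10% of the observed cells to score the imputation.
observed = np.argwhere(~np.isnan(X))
test = observed[rng.choice(len(observed), size=len(observed) // 10, replace=False)]
X_train = X.copy()
X_train[tuple(test.T)] = np.nan

imputed = IterativeImputer(max_iter=10, random_state=0).fit_transform(X_train)

# Median R^2 across endpoints, mirroring the paper's headline metric.
scores = []
for j in range(X.shape[1]):
    rows = test[test[:, 1] == j][:, 0]
    if len(rows) >= 2:  # R^2 needs at least two held-out values
        scores.append(r2_score(full[rows, j], imputed[rows, j]))
print(f"median R^2 across endpoints: {np.median(scores):.2f}")
```

The key point the sketch captures is that an imputation model sees the other observed endpoints for each compound, whereas a QSAR model predicts from chemical structure alone; that extra signal is where the reported gains come from.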


2017, Vol. 14 (4), pp. 172988141770907
Author(s): Hanbo Wu, Xin Ma, Zhimeng Zhang, Haibo Wang, Yibin Li

Human daily activity recognition has been a hot spot in the field of computer vision for decades. Despite best efforts, activity recognition in naturally uncontrolled settings remains a challenging problem. Recently, by perceiving depth and visual cues simultaneously, RGB-D cameras have greatly boosted the performance of activity recognition. However, due to practical difficulties, the publicly available RGB-D data sets are not sufficiently large for benchmarking when considering the diversity of their activities, subjects, and backgrounds. This severely limits the applicability of complicated learning-based recognition approaches. To address the issue, this article provides a large-scale RGB-D activity data set created by merging five public RGB-D data sets that differ from each other in many aspects, such as length of actions, nationality of subjects, and camera angles. This data set comprises 4528 samples depicting 7 action categories (up to 46 subcategories) performed by 74 subjects. To verify how challenging the data set is, three feature representation methods are evaluated: depth motion maps, the spatiotemporal depth cuboid similarity feature, and curvature scale space. Results show that the merged large-scale data set is more realistic and challenging, and therefore more suitable for benchmarking.
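
Of the three evaluated representations, depth motion maps are the simplest to illustrate: a common formulation accumulates thresholded absolute differences between consecutive projected depth frames. The sketch below assumes that standard formulation rather than the exact variant the article evaluates, and processes a single projection view only.

```python
import numpy as np

def depth_motion_map(depth_frames, threshold=10):
    """Accumulate thresholded absolute differences between consecutive depth
    frames for one projection view. The full method typically repeats this
    for front, side, and top projections of the depth sequence."""
    dmm = np.zeros_like(depth_frames[0], dtype=np.float64)
    for prev, curr in zip(depth_frames, depth_frames[1:]):
        diff = np.abs(curr.astype(np.float64) - prev.astype(np.float64))
        dmm += np.where(diff > threshold, diff, 0.0)
    return dmm

# Toy sequence: 30 frames of 64x64 "depth" with an object sliding right.
frames = []
for t in range(30):
    frame = np.zeros((64, 64), dtype=np.uint16)
    frame[20:30, t:t + 10] = 1000
    frames.append(frame)
print(depth_motion_map(frames).shape)  # (64, 64)
```

The resulting 2-D map summarizes where motion energy accumulated over the clip, which is what makes it a compact input for a downstream classifier.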


2021
Author(s): Davendu Y. Kulkarni, Gan Lu, Feng Wang, Luca di Mare

Abstract Gas turbine engine design involves multi-disciplinary, multi-fidelity, iterative design-analysis processes. These highly intertwined processes are nowadays incorporated into automated design frameworks to facilitate high-fidelity, fully coupled, large-scale simulations. The most tedious and time-consuming step in such simulations is the construction of a common geometry database that ensures geometry consistency at every step of the design iteration, is accessible to multi-disciplinary solvers, and allows system-level analysis. This paper presents a novel design-intent-driven geometry modelling environment based on a top-down, feature-based geometry model generation method. In the proposed object-oriented environment, each feature entity possesses a separate identity, denotes an abstract geometry, and carries a set of characteristics. These geometry features are organised in a turbomachinery feature taxonomy. The engine geometry is represented by a tree-like logical structure of geometry features, wherein abstract features outline the engine architecture while lower-level features define the detailed geometry. This top-down, flexible arrangement of the feature tree preserves the design intent throughout the design process, allows the design to be modified freely, and propagates design-intent variations through the geometry automatically. The application of the proposed feature-based geometry modelling environment is demonstrated by generating a whole-engine computational geometry. The environment provides an efficient means of rapidly populating complex turbomachinery assemblies. The generated engine geometry is fully scalable, easily modifiable, and re-usable for generating geometry models of new engines or their derivatives. This capability also enables fast multi-fidelity simulation and optimisation of various gas turbine systems.
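
The tree-like arrangement of features described above can be pictured with a small sketch: abstract features form interior nodes, detailed geometry parameters live in the leaves, and a design-intent change propagates down the tree. All class, feature, and parameter names here are invented for illustration; the paper's environment is far richer than this.

```python
from dataclasses import dataclass, field

@dataclass
class Feature:
    """A node in a feature tree: abstract features outline the architecture,
    leaf features carry detailed geometry parameters (names illustrative)."""
    name: str
    params: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child

    def update(self, name, **params):
        """Propagate a design-intent change to every feature called `name`."""
        if self.name == name:
            self.params.update(params)
        for child in self.children:
            child.update(name, **params)

# Toy engine skeleton: change a blade parameter once at the root and it
# propagates automatically, mirroring the design-intent propagation idea.
engine = Feature("engine")
fan = engine.add(Feature("fan", {"blade_count": 18}))
fan.add(Feature("blade", {"chord_mm": 120}))
engine.update("blade", chord_mm=125)
print(fan.children[0].params)  # {'chord_mm': 125}
```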


2020, pp. 1-28
Author(s): Tirthankar Ghosal, Vignesh Edithal, Asif Ekbal, Pushpak Bhattacharyya, Srinivasa Satya Sameer Kumar Chivukula, ...

Abstract Detecting whether a document contains sufficient new information to be deemed novel is of immense significance in this age of data duplication. Existing techniques for document-level novelty detection mostly operate at the lexical level and cannot address semantic-level redundancy. These techniques usually rely on handcrafted features extracted from the documents in a rule-based or traditional feature-based machine learning setup. Here, we present an effective approach based on a neural attention mechanism that detects document-level novelty without any manual feature engineering. We contend that simple alignment of texts between the source and target document(s) can identify the novelty of a target document. Our deep neural architecture elicits inference knowledge from a large-scale natural language inference dataset, which proves crucial to the novelty detection task. Our approach is effective and outperforms the standard baselines and recent work on document-level novelty detection by a margin of about 3% in accuracy.
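
The core intuition, that aligning target sentences against source sentences reveals how much of the target is new, can be sketched independently of the paper's attention-and-NLI architecture. The toy bag-of-words encoder below is a stand-in for any sentence embedding model; the scoring function is an illustrative assumption, not the paper's classifier.

```python
import numpy as np

def novelty_score(target_sents, source_sents, embed):
    """Score target-document novelty as 1 minus the mean best alignment
    between each target sentence and the source sentences. `embed` must
    map a sentence to a unit vector."""
    S = np.stack([embed(s) for s in source_sents])  # (m, d)
    T = np.stack([embed(s) for s in target_sents])  # (n, d)
    sims = T @ S.T  # cosine similarities, since vectors are unit-length
    return 1.0 - sims.max(axis=1).mean()

def embed(sent, vocab_size=1000):
    """Toy hashed bag-of-words encoder, just to make the sketch runnable."""
    vec = np.zeros(vocab_size)
    for word in sent.lower().split():
        vec[hash(word) % vocab_size] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

src = ["the market fell sharply on monday"]
tgt = ["the market fell sharply on monday", "a new chairman was appointed"]
print(round(novelty_score(tgt, src, embed), 2))  # ~0.5: half the target is new
```

Replacing the toy encoder with an inference-aware sentence model is, loosely, where the NLI transfer in the paper enters: entailed sentences should align strongly even without lexical overlap.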


2020
Author(s): Sinan Aral, Paramveer S. Dhillon

Most online content publishers have moved to subscription-based business models regulated by digital paywalls, but the managerial implications of such freemium content offerings are not well understood. We therefore utilized micro-level user activity data from the New York Times to conduct a large-scale study of the implications of digital paywall design for publishers. Specifically, we use a quasi-experiment that varied (1) the quantity (the number of free articles) and (2) the exclusivity (the number of available sections) of free content available through the paywall to investigate the effects of paywall design on content demand, subscriptions, and total revenue. The paywall policy changes we studied suppressed total content demand by about 9.9%, reducing total advertising revenue. However, this decrease was more than offset by increased subscription revenue, as the policy change led to a 31% increase in total subscriptions during our seven-month study, yielding net positive revenues of over $230,000. The results confirm an economically significant impact of the newspaper's paywall design on content demand, subscriptions, and net revenue. Our findings can help structure the scientific discussion about digital paywall design and help managers optimize digital paywalls to maximize readership, revenue, and profit. This paper was accepted by Chris Forman, information systems.


Author(s): Pooja Parameshwarappa, Zhiyuan Chen, Gunes Koru

Publishing physical activity data can facilitate reproducible health-care research in several areas, such as population health management, behavioral health research, and management of chronic health problems. However, publishing such data also carries high privacy risks related to re-identification, which makes anonymization necessary. One of the challenges in anonymizing periodically collected physical activity data is its sequential nature: existing anonymization techniques work well for cross-sectional data but incur high computational costs when applied directly to sequential data. This article presents an effective anonymization approach, multi-level clustering-based anonymization, for physical activity data. Compared with conventional methods, the proposed approach improves time complexity by drastically reducing clustering time, while preserving utility as well as the conventional approaches do.
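
As a rough single-level illustration of the clustering-based idea: group similar activity series and publish cluster centroids, so individual records are hidden inside groups. This sketch does not reproduce the paper's multi-level scheme, and agglomerative clustering controls only the number of clusters, not a hard minimum cluster size, which a real k-anonymizer must enforce; names and sizes are illustrative.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_anonymize(series, k=5):
    """Publish each activity series as its cluster centroid. Targets an
    *average* cluster size of k; a real k-anonymizer must additionally
    enforce a minimum cluster size before release."""
    n_clusters = max(1, len(series) // k)
    labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(series)
    centroids = np.stack([series[labels == c].mean(axis=0)
                          for c in range(n_clusters)])
    return centroids[labels]

# Toy data: 100 users x 24 hourly step counts.
rng = np.random.default_rng(1)
data = rng.poisson(lam=200.0, size=(100, 24)).astype(float)
print(cluster_anonymize(data, k=5).shape)  # (100, 24)
```

Clustering once over whole sequences is the expensive step the paper attacks; its multi-level variant reduces that cost while keeping the released centroids useful for analysis.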

