Understanding Smart City—A Data-Driven Literature Review

This paper systematically reviews the top 200 Google Scholar publications in the area of smart city with the aid of data-driven methods from the fields of natural language processing and time series forecasting. Specifically, our algorithm crawls the textual information of the considered articles and uses the resulting ad-hoc database to identify the most relevant streams: “smart infrastructure”, “smart economy & policy”, “smart technology”, “smart sustainability”, and “smart health”. Next, we automatically assign each manuscript to one of these subject areas using several interdisciplinary scientific methods. Each stream is evaluated in a deep-dive analysis by (i) creating a word cloud to find the most important keywords, (ii) examining the main contributions, and (iii) applying time series methodologies to determine the past and future relevance. Because the literature base is large, an in-depth evaluation of each stream is possible, which ultimately reveals strengths and weaknesses. We conclude that smart sustainability will come to the fore in the coming years; this confirms the current trend, as minimizing the required input of energy, water, and food and the output of waste, heat, and air pollution is becoming increasingly important.
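
The keyword counting behind a word cloud and the assignment of a manuscript to a stream can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the stopword list, the seed terms per stream, and the function names `top_keywords` and `assign_stream` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "in", "for", "to", "is", "on", "with"}

# Hypothetical seed terms per stream (the stream names come from the abstract,
# the terms themselves are assumptions made for this sketch).
STREAM_TERMS = {
    "smart infrastructure": {"transport", "grid", "building", "network"},
    "smart economy & policy": {"governance", "policy", "market", "business"},
    "smart technology": {"iot", "sensor", "cloud", "data"},
    "smart sustainability": {"energy", "water", "waste", "pollution"},
    "smart health": {"health", "patient", "hospital", "care"},
}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_keywords(texts, n=5):
    """Most frequent tokens across all texts: the raw input for a word cloud."""
    counts = Counter(w for t in texts for w in tokenize(t))
    return counts.most_common(n)


def assign_stream(text):
    """Pick the stream whose seed terms occur most often in the text."""
    counts = Counter(tokenize(text))
    scores = {s: sum(counts[w] for w in terms) for s, terms in STREAM_TERMS.items()}
    return max(scores, key=scores.get)


print(top_keywords(["energy grid", "energy policy"], n=1))
print(assign_stream("Reducing energy and water waste in the city"))
```

Ties between streams resolve to the first stream in `STREAM_TERMS`; a real assignment step would need a more principled rule.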

A Data-Driven Strategy to Combine Word Embeddings in Information Retrieval

10.5121/csit.2021.110107 ◽

2021 ◽

Author(s):

Alfredo Silva ◽

Marcelo Mendoza

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Language Processing ◽

Ad Hoc ◽

Data Driven ◽

Word Embeddings ◽

Continuous Vector ◽

Benchmark Data ◽

Promising Line ◽

Vector Representations

Word embeddings are vital descriptors of words in unigram representations of documents for many tasks in natural language processing and information retrieval. The representation of queries has been one of the most critical challenges in this area because a query consists of only a few terms and has little descriptive capacity. Strategies such as averaging word embeddings can enrich a query's descriptive capacity, since they favor the identification of related terms from the continuous vector representations that characterize these approaches. We propose a data-driven strategy to combine word embeddings. We use IDF combinations of embeddings to represent queries, showing that these representations outperform the average word embeddings recently proposed in the literature. Experimental results on benchmark data show that our proposal performs well, suggesting that data-driven combinations of word embeddings are a promising line of research in ad-hoc information retrieval.
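
To make the contrast between plain averaging and an IDF-weighted combination concrete, here is a minimal sketch. The abstract does not give the exact combination formula, so the smoothed IDF and the weighted average below are assumptions, and `idf_table`, `query_vector`, and the toy corpus and embeddings are hypothetical.

```python
import math

import numpy as np


def idf_table(docs):
    """Smoothed inverse document frequency per term of a tokenized corpus."""
    n = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def query_vector(terms, emb, idf=None):
    """Combine term embeddings into one query vector.

    idf=None gives the plain average; otherwise each term is weighted by
    its IDF (terms missing from the table get weight 1.0).
    """
    known = [t for t in terms if t in emb]
    if not known:
        raise ValueError("no query term has an embedding")
    weights = np.array([1.0 if idf is None else idf.get(t, 1.0) for t in known])
    vectors = np.stack([emb[t] for t in known])
    return weights @ vectors / weights.sum()


# Toy data: "the" occurs in every document, "sensor" in only one.
docs = [["the", "smart", "city"], ["the", "city", "grid"], ["the", "sensor", "grid"]]
emb = {"the": np.array([1.0, 0.0]), "sensor": np.array([0.0, 1.0])}

idf = idf_table(docs)
avg = query_vector(["the", "sensor"], emb)
weighted = query_vector(["the", "sensor"], emb, idf)
print(avg, weighted)
```

In the toy data the rare term "sensor" pulls the weighted query vector toward its own embedding, whereas the plain average treats it the same as the ubiquitous "the".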

Effective and practical neural ranking

ACM SIGIR Forum ◽

10.1145/3476415.3476432 ◽

2021 ◽

Vol 55 (1) ◽

pp. 1-2

Author(s):

Sean MacAvaney

Keyword(s):

Language Processing ◽

Large Scale ◽

Ad Hoc ◽

Computational Cost ◽

Query Time ◽

Language Models ◽

Supervised Machine Learning ◽

Deep Dive ◽

Ranking Models ◽

Ranking Tasks

Supervised machine learning methods that use neural networks ("deep learning") have yielded substantial improvements to a multitude of Natural Language Processing (NLP) tasks in the past decade. Improvements to Information Retrieval (IR) tasks, such as ad-hoc search, lagged behind those in similar NLP tasks, despite considerable community efforts. Although there are several contributing factors, I argue in this dissertation that early attempts were not more successful because they did not properly consider the unique characteristics of IR tasks when designing and training ranking models. I first demonstrate this by showing how large-scale datasets containing weak relevance labels can successfully replace training on in-domain collections. This technique improves the variety of queries encountered when training and helps mitigate concerns of over-fitting particular test collections. I then show that dataset statistics available in specific IR tasks can be easily incorporated into neural ranking models alongside the textual features, resulting in more effective ranking models. I also demonstrate that contextualized representations, particularly those from transformer-based language models, considerably improve neural ad-hoc ranking performance. I find that this approach is neither limited to the task of ad-hoc ranking (as demonstrated by ranking clinical reports) nor English content (as shown by training effective cross-lingual neural rankers). These efforts demonstrate that neural approaches can be effective for ranking tasks. However, I observe that these techniques are impractical due to their high query-time computational costs. To overcome this, I study approaches for offloading computational cost to index-time, substantially reducing query-time latency. These techniques make neural methods practical for ranking tasks. 
Finally, I take a deep dive into better understanding the linguistic biases of the methods I propose compared to contemporary and traditional approaches. The findings from this analysis highlight potential pitfalls of recent methods and provide a way to measure progress in this area going forward.
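
The general idea of moving work from query time to index time can be illustrated with a toy retrieval index. This is not the dissertation's method: a normalized bag-of-words vector stands in for an expensive neural encoder, and the class `PrecomputedIndex` and its methods are hypothetical names chosen for this sketch.

```python
import numpy as np


class PrecomputedIndex:
    """Encode every document once at index time; score cheaply at query time."""

    def __init__(self, docs):
        self.docs = list(docs)
        vocab = sorted({t for d in self.docs for t in d.lower().split()})
        self.vocab = {t: i for i, t in enumerate(vocab)}
        # Index time: the (stand-in) encoder runs once per document.
        self.matrix = np.stack([self._encode(d) for d in self.docs])

    def _encode(self, text):
        """Stand-in encoder: L2-normalized bag-of-words over the index vocabulary."""
        vec = np.zeros(len(self.vocab))
        for token in text.lower().split():
            if token in self.vocab:
                vec[self.vocab[token]] += 1.0
        norm = np.linalg.norm(vec)
        return vec / norm if norm else vec

    def search(self, query, k=1):
        """Query time: encode only the query, then one matrix-vector product."""
        scores = self.matrix @ self._encode(query)
        top = np.argsort(-scores)[:k]
        return [(self.docs[i], float(scores[i])) for i in top]


index = PrecomputedIndex(
    ["neural ranking models", "hydrology mean flow", "word embeddings for queries"]
)
print(index.search("ranking models"))
```

The cost of encoding documents is paid once when the index is built, so query latency depends only on encoding the short query and on the dot products.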

A Review and evaluation of Machine Translation methods for Lumasaaba

Journal of Digital Science ◽

10.33847/2686-8296.2.1_1 ◽

2020 ◽

pp. 3-17

Author(s):

Peter Nabende

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

Research Area ◽

Data Driven ◽

East African ◽

Data Set ◽

African Languages ◽

Translation Methods

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. To help close this knowledge gap, this paper evaluates the application of well-established machine translation methods to one heavily under-resourced indigenous East African language, Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. Then we apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.

Technical note: Inherent benchmark or not? Comparing Nash–Sutcliffe and Kling–Gupta efficiency scores

Hydrology and Earth System Sciences ◽

10.5194/hess-23-4323-2019 ◽

2019 ◽

Vol 23 (10) ◽

pp. 4323-4331 ◽

Cited By ~ 60

Author(s):

Wouter J. M. Knoben ◽

Jim E. Freer ◽

Ross A. Woods

Keyword(s):

Time Series ◽

Coefficient Of Variation ◽

Ad Hoc ◽

Model Performance ◽

Technical Note ◽

Mean Flow ◽

Model Adequacy ◽

Robust Model ◽

The Mean ◽

Adequacy Assessment

A traditional metric used in hydrology to summarize model performance is the Nash–Sutcliffe efficiency (NSE). Increasingly an alternative metric, the Kling–Gupta efficiency (KGE), is used instead. When NSE is used, NSE = 0 corresponds to using the mean flow as a benchmark predictor. The same reasoning is applied in various studies that use KGE as a metric: negative KGE values are viewed as bad model performance, and only positive values are seen as good model performance. Here we show that using the mean flow as a predictor does not result in KGE = 0, but instead KGE = 1 − √2 ≈ −0.41. Thus, KGE values greater than −0.41 indicate that a model improves upon the mean flow benchmark, even if the model's KGE value is negative. NSE and KGE values cannot be directly compared, because their relationship is non-unique and depends in part on the coefficient of variation of the observed time series. Therefore, modellers who use the KGE metric should not let their understanding of NSE values guide them in interpreting KGE values and instead develop new understanding based on the constitutive parts of the KGE metric and the explicit use of benchmark values to compare KGE scores against. More generally, a strong case can be made for moving away from ad hoc use of aggregated efficiency metrics and towards a framework based on purpose-dependent evaluation metrics and benchmarks that allows for more robust model adequacy assessment.
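
The mean-flow benchmark values quoted above can be checked numerically. The sketch below uses the standard definitions, NSE = 1 − Σ(sim − obs)² / Σ(obs − mean(obs))² and KGE = 1 − √((r − 1)² + (α − 1)² + (β − 1)²) with α the ratio of standard deviations and β the ratio of means. One assumption is labeled in the code: the correlation r is undefined for a constant simulation and is set to 0 here. The observed series is made up for illustration.

```python
import numpy as np


def nse(sim, obs):
    """Nash-Sutcliffe efficiency."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def kge(sim, obs):
    """Kling-Gupta efficiency from correlation, variability ratio and bias ratio."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    alpha = sim.std() / obs.std()    # variability ratio
    beta = sim.mean() / obs.mean()   # bias ratio
    # Assumption: Pearson r is undefined for a constant series; use r = 0.
    if sim.max() == sim.min() or obs.max() == obs.min():
        r = 0.0
    else:
        r = np.corrcoef(sim, obs)[0, 1]
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


obs = np.array([2.0, 5.0, 3.0, 8.0, 2.0])   # made-up observed flows, mean 4.0
mean_flow = np.full_like(obs, obs.mean())   # benchmark: always predict the mean

print(nse(mean_flow, obs))   # mean-flow benchmark under NSE
print(kge(mean_flow, obs))   # mean-flow benchmark under KGE
```

For the constant predictor, α = 0, β = 1 and r = 0, so the KGE expression reduces to 1 − √(1 + 1 + 0) = 1 − √2, while the same predictor scores NSE = 0.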

Simulation of sports movement training based on machine learning and brain-computer interface

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189481 ◽

2020 ◽

pp. 1-12

Author(s):

Linuo Wang

Keyword(s):

Machine Learning ◽

Time Series ◽

Joint Learning ◽

Scientific Methods ◽

Learning Framework ◽

Brain Functions ◽

Movement Training ◽

Practical Effect ◽

Machine Interface ◽

The Brain

Injuries and hidden dangers in training have a greater impact on athletes' careers. In particular, the brain function that controls the motor function area has a greater impact on the athlete's competitive ability. Based on this, it is necessary to adopt scientific methods to recognize brain functions. In this paper, we study the structure of motor brain-computer and improve it based on traditional methods. Moreover, supported by machine learning and SVM technology, this study uses a DSP filter to convert the preprocessed EEG signal X into a time series, and adjusts the distance between the time series to classify the data. In order to solve the inconsistency of DSP algorithms, a multi-layer joint learning framework based on logistic regression model is proposed, and a brain-machine interface system of sports based on machine learning and SVM is constructed. In addition, this study designed a control experiment to improve the performance of the method proposed by this study. The research results show that the method in this paper has a certain practical effect and can be applied to sports.

Automated Data-Driven Generation of Personalized Pedagogical Interventions in Intelligent Tutoring Systems

International Journal of Artificial Intelligence in Education ◽

10.1007/s40593-021-00267-x ◽

2021 ◽

Author(s):

Ekaterina Kochmar ◽

Dung Do Vu ◽

Robert Belfer ◽

Varun Gupta ◽

Iulian Vlad Serban ◽

...

Keyword(s):

Machine Learning ◽

Student Performance ◽

Language Processing ◽

Intelligent Tutoring Systems ◽

Large Scale ◽

Intelligent Tutoring ◽

Performance Outcomes ◽

Data Driven ◽

Personalized Feedback ◽

Tutoring Systems

Intelligent tutoring systems (ITS) have been shown to be highly effective at promoting learning as compared to other computer-based instructional approaches. However, many ITS rely heavily on expert design and hand-crafted rules. This makes them difficult to build and transfer across domains and limits their potential efficacy. In this paper, we investigate how feedback in a large-scale ITS can be automatically generated in a data-driven way, and more specifically how personalization of feedback can lead to improvements in student performance outcomes. First, in this paper we propose a machine learning approach to generate personalized feedback in an automated way, which takes individual needs of students into account, while alleviating the need of expert intervention and design of hand-crafted rules. We leverage state-of-the-art machine learning and natural language processing techniques to provide students with personalized feedback using hints and Wikipedia-based explanations. Second, we demonstrate that personalized feedback leads to improved success rates at solving exercises in practice: our personalized feedback model is used in , a large-scale dialogue-based ITS with around 20,000 students launched in 2019. We present the results of experiments with students and show that the automated, data-driven, personalized feedback leads to a significant overall improvement of 22.95% in student performance outcomes and substantial improvements in the subjective evaluation of the feedback.

Neural Data-Driven Captioning of Time-Series Line Charts

Proceedings of the International Conference on Advanced Visual Interfaces ◽

10.1145/3399715.3399829 ◽

2020 ◽

Author(s):

Andrea Spreafico ◽

Giuseppe Carenini

Keyword(s):

Time Series ◽

Data Driven ◽

Neural Data

A Bilingual Comparison of Sentiment and Topics for a Product Event on Twitter

Information Systems Frontiers ◽

10.1007/s10796-021-10169-x ◽

2021 ◽

Author(s):

Irina Wedel ◽

Michael Palk ◽

Stefan Voß

Keyword(s):

Social Media ◽

Language Processing ◽

Topic Modeling ◽

New Product ◽

Business Value ◽

Data Driven ◽

New Product Introduction ◽

Social Media Analytics ◽

Product Introduction ◽

Textual Data

Social media enable companies to assess consumers’ opinions, complaints and needs. The systematic and data-driven analysis of social media to generate business value is summarized under the term Social Media Analytics which includes statistical, network-based and language-based approaches. We focus on textual data and investigate which conversation topics arise during the time of a new product introduction on Twitter and how the overall sentiment is during and after the event. The analysis via Natural Language Processing tools is conducted in two languages and four different countries, such that cultural differences in the tonality and customer needs can be identified for the product. Different methods of sentiment analysis and topic modeling are compared to identify the usability in social media and in the respective languages English and German. Furthermore, we illustrate the importance of preprocessing steps when applying these methods and identify relevant product insights.

Sustainable Virtual Reality Patient Rehabilitation Systems with IoT Sensors Using Virtual Smart Cities

Sustainability ◽

10.3390/su13094716 ◽

2021 ◽

Vol 13 (9) ◽

pp. 4716

Author(s):

Moustafa M. Nasralla

Keyword(s):

Machine Learning ◽

Time Series ◽

Virtual Reality ◽

Time Series Analysis ◽

Smart City ◽

Smart Cities ◽

Rehabilitation Services ◽

Simulation Scenario ◽

Common Problems ◽

Iot Devices

To develop sustainable rehabilitation systems, these should consider common problems on IoT devices such as low battery, connection issues and hardware damages. These should be able to rapidly detect any kind of problem incorporating the capacity of warning users about failures without interrupting rehabilitation services. A novel methodology is presented to guide the design and development of sustainable rehabilitation systems focusing on communication and networking among IoT devices in rehabilitation systems with virtual smart cities by using time series analysis for identifying malfunctioning IoT devices. This work is illustrated in a realistic rehabilitation simulation scenario in a virtual smart city using machine learning on time series for identifying and anticipating failures for supporting sustainability.
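
One simple way to flag a malfunctioning device from its time series is a trailing-window z-score. The abstract does not say which time series technique the work uses, so this is a swapped-in illustration, not the presented methodology; the function `flag_anomalies`, its parameters, and the battery-voltage readings are hypothetical.

```python
import numpy as np


def flag_anomalies(readings, window=5, threshold=3.0, min_spread=0.01):
    """Return indices whose reading lies more than `threshold` standard
    deviations from the mean of the preceding `window` readings.

    `min_spread` is an assumed noise floor that keeps a very flat window
    from turning tiny fluctuations into huge z-scores.
    """
    x = np.asarray(readings, float)
    flagged = []
    for i in range(window, len(x)):
        past = x[i - window:i]
        spread = max(past.std(), min_spread)
        if abs(x[i] - past.mean()) / spread > threshold:
            flagged.append(i)
    return flagged


# Hypothetical battery voltages from one IoT device, with a sudden drop.
voltages = [3.70, 3.71, 3.69, 3.70, 3.71, 3.70, 3.10, 3.69]
print(flag_anomalies(voltages))
```

Because the window only looks backwards, a reading can be checked as soon as it arrives, which fits the goal of warning users without interrupting the service.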

Ad hoc and Ubiquitous Communication Environment supported by Data-Driven Networking Processor

TENCON 2006 - 2006 IEEE Region 10 Conference ◽

10.1109/tencon.2006.343950 ◽

2006 ◽

Author(s):

Hiroshi Ishii ◽

Chee Onn Chow ◽

Masahiro Yamamoto ◽

Hiroaki Nishikawa

Keyword(s):

Ad Hoc ◽

Data Driven ◽

Communication Environment
