Mining Twitter Data for Landslide Events Reported Worldwide
The explosion of user-generated content published on social media from mobile devices has given rise to the concept of “citizen sensing.” Although many have adopted English as a de facto international language, citizens frequently report events such as disasters in their local language as well as in English. Integrating citizen reports written in many languages is a significant challenge. This article describes tools that address this challenge and support citizen sensing of landslide events reported worldwide. Multilingual support is based on the first unified cross-lingual dataset of word vectors for representing texts in multiple languages. A classification model based on the proposed cross-lingual word vectors outperforms the “native” and “translated” approaches based on monolingual word vectors. Furthermore, it requires neither a separate training set in each local language nor translation to English.
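To illustrate why a shared vector space removes the need for per-language training data, the following is a minimal sketch and not the article's actual model. It assumes a tiny hand-made table of cross-lingual vectors (`SHARED_VECTORS`, with invented values), represents each text as the average of its word vectors, and uses a nearest-centroid classifier trained only on English examples. All names, vectors, and example texts are hypothetical.

```python
import math

# Hypothetical toy vectors: English and Spanish words placed in ONE shared
# space, so that translations lie close together. Values are invented.
SHARED_VECTORS = {
    "landslide": (0.90, 0.10, 0.00), "deslizamiento": (0.88, 0.12, 0.02),
    "road": (0.60, 0.20, 0.10),      "carretera": (0.58, 0.22, 0.10),
    "blocked": (0.70, 0.10, 0.20),   "bloqueada": (0.69, 0.12, 0.20),
    "election": (0.10, 0.90, 0.10),  "electoral": (0.10, 0.88, 0.12),
    "victory": (0.00, 0.80, 0.30),   "victoria": (0.02, 0.79, 0.30),
}
DIM = 3


def mean(vectors):
    """Component-wise average; a zero vector if the list is empty."""
    if not vectors:
        return (0.0,) * DIM
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(DIM))


def embed(text):
    """Represent a text as the average of its known word vectors."""
    words = text.lower().split()
    return mean([SHARED_VECTORS[w] for w in words if w in SHARED_VECTORS])


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def train(examples):
    """Compute one centroid per label from (text, label) pairs."""
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append(embed(text))
    return {label: mean(vecs) for label, vecs in by_label.items()}


def classify(text, centroids):
    """Return the label whose centroid is most similar to the text."""
    vec = embed(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# English-only training data (hypothetical examples).
ENGLISH_TRAINING = [
    ("landslide blocked road", "relevant"),
    ("election landslide victory", "irrelevant"),
]
model = train(ENGLISH_TRAINING)

# Spanish texts are classified with no Spanish training set and no translation.
print(classify("deslizamiento carretera bloqueada", model))
print(classify("victoria electoral", model))
```

Because both languages share one space, the Spanish texts land near the centroids learned from English examples. In practice the toy table would be replaced by the cross-lingual word vectors, and the nearest-centroid rule by whatever classifier is actually trained on them.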