Generating Cross-Domain Text Classification Corpora from Social Media Comments

Author(s): Benjamin Murauer, Günther Specht

Author(s): Aibo Guo, Xinyi Li, Ning Pang, Xiang Zhao

A community Q&A forum is a special type of social media that provides a platform for raising and answering questions (both done by forum participants), facilitating online information sharing. Community Q&A forums in professional domains currently attract large numbers of users by offering professional knowledge. To support information access and spare users the effort of posting new questions, these forums usually provide a question retrieval function, which retrieves existing questions (and their answers) similar to a user's query. However, it can be difficult for community Q&A forums to cover all domains, especially recently emerging domains that have little labeled data and differ greatly from existing domains. We refer to this scenario as cross-domain question retrieval. To handle its unique challenges, we design a model based on adversarial training, namely X-QR, which consists of two modules: a domain discriminator and a sentence matcher. The domain discriminator aims to align the source and target data distributions and unify the feature space through domain-adversarial training. With the assistance of the domain discriminator, the sentence matcher can learn domain-consistent knowledge for the final matching prediction. To the best of our knowledge, this work is among the first to investigate the domain adaptation problem of sentence matching for question retrieval in community Q&A forums. The experimental results suggest that the proposed X-QR model outperforms conventional sentence matching methods on cross-domain community Q&A tasks.
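One common way to implement domain-adversarial training, used in DANN-style models, is a gradient reversal layer (GRL). The sketch below is not the X-QR implementation. It is a scalar toy model built on three assumptions: a shared feature `h = w_feat * x`, a logistic sentence matcher trained only on labeled pairs, and a logistic domain discriminator. The names `GradientReversal`, `train_step`, `w_feat`, `w_match`, and `w_dom` are hypothetical. The GRL acts as the identity in the forward pass and multiplies the incoming gradient by `-lam` in the backward pass. As a result, the discriminator learns to separate the domains while the shared feature is pushed to confuse it.

```python
import math
from typing import Optional, Tuple


class GradientReversal:
    """Identity in the forward pass; scales the gradient by -lam in the backward pass."""

    def __init__(self, lam: float) -> None:
        self.lam = lam

    def forward(self, h: float) -> float:
        return h

    def backward(self, grad_out: float) -> float:
        return -self.lam * grad_out


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_step(w_feat: float, w_match: float, w_dom: float,
               x: float, y: Optional[float], d: float,
               lam: float = 1.0, lr: float = 0.1) -> Tuple[float, float, float]:
    """One SGD step of a hypothetical scalar toy model.

    h = w_feat * x                shared feature extractor
    p = sigmoid(w_match * h)      matcher: P(pair matches); y is None for unlabeled target data
    q = sigmoid(w_dom * GRL(h))   discriminator: P(example is from the source domain); d in {0, 1}
    """
    grl = GradientReversal(lam)
    h = w_feat * x
    p = sigmoid(w_match * h)
    q = sigmoid(w_dom * grl.forward(h))

    # Gradients of binary cross-entropy with respect to each logit.
    dlogit_match = 0.0 if y is None else p - y
    dlogit_dom = q - d

    # Matcher and discriminator each minimize their own loss.
    grad_w_match = dlogit_match * h
    grad_w_dom = dlogit_dom * h

    # Gradient reaching the shared feature: the matcher path is passed through unchanged,
    # the domain path is reversed by the GRL, so the feature learns to fool the discriminator.
    grad_h = dlogit_match * w_match + grl.backward(dlogit_dom * w_dom)
    grad_w_feat = grad_h * x

    return (w_feat - lr * grad_w_feat,
            w_match - lr * grad_w_match,
            w_dom - lr * grad_w_dom)
```

For an unlabeled target-domain example (`y=None`), only the domain path updates the shared feature. Setting `lam=0` switches off the adversarial signal entirely.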


2020
Author(s): Jiting Tang, Saini Yang, Weiping Wang

<p>In 2019, Typhoon Lekima hit China, bringing strong winds and heavy rainfall to nine provinces and municipalities on the northeastern coast of China. According to the Ministry of Emergency Management of the People’s Republic of China, Lekima caused 66 direct fatalities, affected 14 million people, and was responsible for direct economic losses exceeding 50 billion yuan. Current observation technologies, such as remote sensing and meteorological observation, have long data-collection cycles and little interaction with disaster victims. Social media big data is a new data source for natural disaster research and can provide technical reference for natural hazard analysis, risk assessment, and emergency rescue information management.</p>
<p>We propose a social-media-based framework for assessing typhoon-induced floods, which includes five parts: (1) <strong>Data acquisition.</strong> Obtain Sina Weibo posts and selected tag attributes based on keywords, time, and location. (2) <strong>Spatiotemporal quantitative analysis.</strong> Collect public concerns and trends from the perspectives of words, time, and space at different scales to judge the impact range of the typhoon-induced flood. (3) <strong>Text classification and multi-source heterogeneous data fusion analysis.</strong> Build hazard-intensity and disaster text classification models with a CNN (Convolutional Neural Network), then integrate multi-source data, including meteorological monitoring, population and economic data, and disaster reports, for secondary evaluation and correction. (4) <strong>Text clustering and sub-event mining.</strong> Extract sub-events with the BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) text clustering algorithm to automatically recognize emergencies. (5) <strong>Emotional analysis and crisis management.</strong> Use a time-space sequence model and a four-quadrant analysis method to track negative public emotions and identify potential crises for emergency management.</p>
<p>This framework is validated with a case study of Typhoon Lekima. The results show that social media big data fills the gaps in data efficiency and spatial coverage. Our framework can assess the coverage of the impact, hazard intensity, disaster information, and emergency needs, and it can trace the disaster propagation process backward based on the spatiotemporal sequence. After secondary correction with multi-source data, the assessment results can be used in practical systems.</p>
<p>The proposed framework can be applied over a wide spatial extent, up to full coverage. It is spatially efficient and can obtain feedback from affected areas and people almost as soon as a disaster occurs. Hence, it has promising potential for large-scale, real-time disaster assessment.</p>

