Social Media Data Integration: From Data Lake to NoSQL Data Warehouse

Author(s):  
Hichem Dabbèchi ◽  
Nahla Zaaboub Haddar ◽  
Haytham Elghazel ◽  
Kais Haddar
Author(s):  
Michael Yulianto ◽  
Abba Suganda Girsang ◽  
Reinert Yosua Rumagit

Electronic ticket (eticket) provider services are growing fast in Indonesia, making the competition between companies increasingly intense. Moreover, most of them offer the same services or features to their customers. To gather customer feedback, many companies use social media (Facebook and Twitter) for marketing or for communicating directly with their customers. Current technology allows companies to collect data from social media, and many of them do so for analysis. This study proposes developing a data warehouse to analyze social media data such as likes, comments, and sentiment. Since sentiment is not provided directly by social media data, this study uses lexicon-based classification to categorize the sentiment of users' comments. The data warehouse provides business intelligence for assessing a company's performance based on its social media data. It is built using data from three travel companies in Indonesia. As a result, the data warehouse enables a comparison of their performance based on social media data.
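
The lexicon-based classification mentioned in this abstract can be illustrated with a minimal sketch. The word lists, the function name classify_sentiment, and the simple count-based scoring rule are all assumptions made for illustration; the abstract does not say which lexicon or scoring scheme the study used.

```python
import re

# Hypothetical toy lexicons; the study's actual lexicon is not given in the abstract.
POSITIVE_WORDS = {"good", "great", "fast", "helpful", "love"}
NEGATIVE_WORDS = {"bad", "slow", "late", "terrible", "worst"}


def classify_sentiment(comment: str) -> str:
    """Label a comment by counting lexicon hits (assumed scoring rule)."""
    # Lowercase the comment and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", comment.lower())
    positive_hits = sum(token in POSITIVE_WORDS for token in tokens)
    negative_hits = sum(token in NEGATIVE_WORDS for token in tokens)
    score = positive_hits - negative_hits
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in ["Great service, fast booking!", "Terrible app, my ticket was late"]:
        print(text, "->", classify_sentiment(text))
```

In a warehouse setting, the resulting label would typically be stored as an attribute or measure alongside likes and comment counts, so that sentiment can be aggregated per company.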


2022 ◽  
Vol 18 (1) ◽  
pp. 0-0

Social media data have become an integral part of business data and should be integrated into the decision-making process, since they better reflect the true situation of a business in any field. However, social media data are unstructured and generated at a very high frequency, which exceeds the capacity of the data warehouse. In this work, we propose to extend the data warehousing process with a staging area whose core is a large-scale system implementing an information extraction process using the Storm and Hadoop frameworks to better manage their volume and frequency. To extract structured information, mainly events, we combine a set of techniques from NLP, linguistic rules, and machine learning. Finally, we propose a data warehouse conceptual model for event modeling and for integration with the enterprise data warehouse through an intermediate table called a Bridge table. For application and experiments, we focus on extracting drug abuse events from Twitter data and modeling them in the Event Data Warehouse.
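
The role of a bridge table can be sketched as follows. This is a generic illustration of the pattern, not the schema from the paper: the table names (fact_event, dim_product, bridge_event_product), the columns, and the sample rows are all hypothetical, and SQLite stands in for whatever store the authors used.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical event/bridge schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Enterprise-side dimension (assumed to exist already).
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
        -- Events extracted from social media (assumed structure).
        CREATE TABLE fact_event (
            event_id INTEGER PRIMARY KEY, event_type TEXT, event_date TEXT
        );
        -- Bridge table: resolves the many-to-many link between the two sides.
        CREATE TABLE bridge_event_product (
            event_id INTEGER REFERENCES fact_event(event_id),
            product_id INTEGER REFERENCES dim_product(product_id),
            PRIMARY KEY (event_id, product_id)
        );
        """
    )
    # Made-up sample rows, for illustration only.
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "Drug A"), (2, "Drug B")])
    conn.executemany("INSERT INTO fact_event VALUES (?, ?, ?)",
                     [(10, "abuse", "2020-01-01"), (11, "abuse", "2020-01-02")])
    conn.executemany("INSERT INTO bridge_event_product VALUES (?, ?)",
                     [(10, 1), (10, 2), (11, 1)])
    return conn


def events_per_product(conn: sqlite3.Connection) -> list:
    """Count extracted events per enterprise product via the bridge table."""
    query = """
        SELECT p.name, COUNT(*)
        FROM bridge_event_product AS b
        JOIN dim_product AS p ON p.product_id = b.product_id
        GROUP BY p.name
        ORDER BY p.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(events_per_product(build_demo()))
```

Keeping the link in a separate table means one extracted event can reference several enterprise entities, and the existing enterprise tables need no schema change.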


2012 ◽  
Vol 3 (2) ◽  
pp. 1-12 ◽  
Author(s):  
Debora S. Bartoo

This paper argues that organizations need to prepare for the integration of social media data into their data warehouses in order to fully understand their customers. Social media has quickly gained acceptance in its adoption and use, and firms are eager to get their hands on it to better understand customer sentiment. However, social media data is different from and more complex than traditional data, and most data warehouses are not structured in a way that lets BI applications easily make sense of it. As a result, it is becoming critical for business intelligence teams to begin to understand the challenges this data presents and to better plan for the integration of this information into corporate data warehouses.


2014 ◽  
Author(s):  
Kathleen M. Carley ◽  
L. R. Carley ◽  
Jonathan Storrick

2018 ◽  
Author(s):  
Anika Oellrich ◽  
George Gkotsis ◽  
Richard James Butler Dobson ◽  
Tim JP Hubbard ◽  
Rina Dutta

BACKGROUND Dementia is a growing public health concern with approximately 50 million people affected worldwide in 2017 and this number is expected to reach more than 131 million by 2050. The toll on caregivers and relatives cannot be underestimated as dementia changes family relationships, leaves people socially isolated, and affects the finances of all those involved.
OBJECTIVE The aim of this study was to explore using automated analysis (i) the age and gender of people who post to the social media forum Reddit about dementia diagnoses, (ii) the affected person and their diagnosis, (iii) relevant subreddits authors are posting to, (iv) the types of messages posted and (v) the content of these posts.
METHODS We analysed Reddit posts concerning dementia diagnoses. We used a previously developed text analysis pipeline to determine attributes of the posts as well as their authors to characterise online communications about dementia diagnoses. The posts were also examined by manual curation for the diagnosis provided and the person affected. Furthermore, we investigated the communities these people engage in and assessed the contents of the posts with an automated topic gathering technique.
RESULTS Our results indicate that the majority of posters in our data set are women, and it is mostly close relatives such as parents and grandparents that are mentioned. Both the communities frequented and topics gathered reflect not only the sufferer's diagnosis but also potential outcomes, e.g. hardships experienced by the caregiver. The trends observed from this dataset are consistent with findings based on qualitative review, validating the robustness of social media automated text processing.
CONCLUSIONS This work demonstrates the value of social media data sources as a resource for in-depth studies of those affected by a dementia diagnosis and the potential to develop novel support systems based on their real time processing in line with the increasing digitalisation of medical care.

