Social Media-based User Embedding: A Literature Review

Author(s):  
Shimei Pan ◽  
Tao Ding

Automated representation learning is behind many recent success stories in machine learning. It is often used to transfer knowledge learned from a large dataset (e.g., raw text) to tasks for which only a small number of training examples are available. In this paper, we review recent advances in learning to represent social media users as low-dimensional embeddings. This technology is critical for building high-performance social media-based models of human traits and behavior, since ground truth for assessing latent traits and behavior is often expensive to acquire at a large scale. We review typical methods for learning a unified user embedding from heterogeneous user data (e.g., combining social media text and images into a single user representation). Finally, we point out current issues and future directions.
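
As a rough sketch of the unified-embedding idea (not any specific method from the survey), the following Python snippet fuses stand-in text and image feature matrices by per-modality scaling, concatenation, and SVD compression; all names and dimensions are illustrative.

```python
# Minimal sketch: fusing per-modality user features into one unified
# embedding via concatenation + SVD. The feature matrices are stand-ins
# for real text/image encoders, which the survey covers in depth.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_users = 1000
text_feats = rng.normal(size=(n_users, 300))   # e.g., averaged word vectors
image_feats = rng.normal(size=(n_users, 512))  # e.g., pooled CNN features

# Scale each modality so neither dominates, then concatenate.
fused = np.hstack([
    StandardScaler().fit_transform(text_feats),
    StandardScaler().fit_transform(image_feats),
])

# Compress to a low-dimensional unified user embedding.
user_embedding = TruncatedSVD(n_components=64, random_state=0).fit_transform(fused)
print(user_embedding.shape)  # (1000, 64)
```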

2021 ◽  
Vol 21 (3) ◽  
pp. 1-17
Author(s):  
Feiran Huang ◽  
Chaozhuo Li ◽  
Boyu Gao ◽  
Yun Liu ◽  
Sattam Alotaibi ◽  
...  

The analysis of social networks, such as the socially connected Internet of Things, has shown the deep influence of intelligent information processing technology on industrial systems in Smart Cities. The goal of social media representation learning is to learn dense, low-dimensional, continuous representations of multimodal data in social networks, which facilitates many real-world applications. Since social media images are usually accompanied by rich metadata (e.g., textual descriptions, tags, groups, and uploading users), modeling the image alone cannot capture the comprehensive information a social media image carries. In this work, we treat an image and its textual description as multimodal content and transform the remaining meta-information into links between contents (e.g., two images marked with the same tag or submitted by the same user). Based on the multimodal content and social links, we propose a Deep Attentive Multimodal Graph Embedding model, named DAMGE, for more effective social image representation learning. We conduct extensive experiments on both small- and large-scale datasets, and the results confirm the superiority of the proposed model on social image classification and link prediction.
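
A minimal sketch of the metadata-to-link transformation described above, using toy post data: each image-caption post becomes a node, and two posts are linked when they share a tag or an uploading user. This illustrates the graph construction only, not DAMGE's attentive embedding.

```python
# Minimal sketch of turning social image metadata into content links:
# posts sharing a tag or an uploading user get an edge. Data is invented.
import itertools
import networkx as nx

posts = {
    "p1": {"tags": {"sunset", "beach"}, "user": "alice"},
    "p2": {"tags": {"beach"}, "user": "bob"},
    "p3": {"tags": {"city"}, "user": "alice"},
}

G = nx.Graph()
G.add_nodes_from(posts)
for a, b in itertools.combinations(posts, 2):
    shared_tag = posts[a]["tags"] & posts[b]["tags"]
    same_user = posts[a]["user"] == posts[b]["user"]
    if shared_tag or same_user:
        G.add_edge(a, b)

print(sorted(G.edges()))  # [('p1', 'p2'), ('p1', 'p3')]
```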


2019 ◽  
pp. 1049-1070
Author(s):  
Fabian Neuhaus

User data created in digital contexts is of growing interest for analysis, and for spatial analysis in particular. Large-scale user-facing systems such as digital ticketing and social networking platforms generate vast amounts of data, potentially containing information from millions of individuals. Such data has been termed big data. Its spatial, temporal, and social dimensions make it highly relevant to the analysis of cities and urban areas. This chapter discusses this potential through a selection of sample work and an in-depth case study, focusing mainly on insights gained from social media data, especially the Twitter platform, with regard to cities and urban environments. The first part of the chapter discusses a range of examples that use big data and the mapping of digital social network data. The second part discusses how the data is collected and processed. A dedicated section covers ethical considerations, and the chapter closes with a summary and an outlook.
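
As a hedged illustration of the basic processing step behind such urban mapping work (not code from the chapter), the sketch below bins invented geotagged posts into a coarse latitude-longitude grid with pandas.

```python
# Minimal sketch, tied to no specific platform API: aggregating geotagged
# social media posts into a coarse spatial grid, the elementary step
# behind density maps of urban activity. Coordinates are invented.
import pandas as pd

posts = pd.DataFrame({
    "lat": [51.507, 51.509, 51.501, 51.515],
    "lon": [-0.128, -0.130, -0.120, -0.141],
})

cell = 0.01  # grid cell size in degrees (~1 km at this latitude)
posts["cell_lat"] = (posts["lat"] // cell) * cell
posts["cell_lon"] = (posts["lon"] // cell) * cell

density = posts.groupby(["cell_lat", "cell_lon"]).size().rename("n_posts")
print(density)
```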


Author(s):  
Ronghui You ◽  
Yuxuan Liu ◽  
Hiroshi Mamitsuka ◽  
Shanfeng Zhu

Motivation: With the rapid increase in biomedical articles, large-scale automatic Medical Subject Headings (MeSH) indexing has become increasingly important. FullMeSH, the only method for large-scale MeSH indexing with full text, suffers from three major drawbacks: it (i) uses learning to rank, which is time-consuming, (ii) can capture only certain pre-defined sections in full text, and (iii) ignores the whole MEDLINE database. Results: We propose a computationally lighter, full-text, deep-learning-based MeSH indexing method, BERTMeSH, which is flexible with respect to section organization in full text. BERTMeSH combines two technologies: (i) a state-of-the-art pre-trained deep contextual representation, Bidirectional Encoder Representations from Transformers (BERT), which lets BERTMeSH capture the deep semantics of full text, and (ii) a transfer learning strategy that uses both full text in PubMed Central (PMC) and titles and abstracts (without full text) in MEDLINE, to take advantage of both. In our experiments, BERTMeSH was pre-trained with 3 million MEDLINE citations and trained on ∼1.5 million full texts in PMC, and it outperformed various cutting-edge baselines. For example, on 20 K test articles from PMC, BERTMeSH achieved a micro F-measure of 69.2%, 6.3% higher than FullMeSH, with the difference being statistically significant. Predicting the 20 K test articles took 5 min with BERTMeSH versus more than 10 h with FullMeSH, demonstrating its computational efficiency. Supplementary information: Supplementary data are available at Bioinformatics online.
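
The following sketch, which is not the BERTMeSH code, shows the two named ingredients in miniature: encoding an article with a pre-trained BERT and attaching a multi-label classification head. The model name, label count, and text are placeholders.

```python
# Minimal sketch (not BERTMeSH itself): a pre-trained BERT encoder plus a
# multi-label head, one sigmoid score per candidate MeSH term.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
n_mesh_terms = 29000  # rough order of magnitude of the MeSH vocabulary
head = torch.nn.Linear(encoder.config.hidden_size, n_mesh_terms)

text = "Title and abstract (or full-text section) of a biomedical article."
inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    cls = encoder(**inputs).last_hidden_state[:, 0]  # [CLS] representation
    scores = torch.sigmoid(head(cls))  # independent per-label probabilities

print(scores.shape)  # torch.Size([1, 29000])
```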


Author(s):  
Suppawong Tuarob ◽  
Conrad S. Tucker

The acquisition and mining of product feature data from online sources such as customer review websites and large-scale social media networks is an emerging area of research. Many existing design methodologies that acquire product feature preferences from online sources assume that the product features expressed by customers are explicitly stated and readily observable, ready to be mined with product feature extraction tools. In many scenarios, however, the product feature preferences customers express are implicit and do not map directly to engineering design targets. For example, a customer may implicitly state "wow I have to squint to read this on the screen", when the explicit product feature is a larger screen. The authors propose an inference model that automatically assigns the most probable explicit product feature desired by a customer, given an implicit preference expressed. The algorithm iteratively refines its inference model by presenting a hypothesis and using ground truth data to determine its statistical validity. A case study involving smartphone product features expressed through Twitter networks demonstrates the effectiveness of the proposed methodology.
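
A minimal sketch of the inference step under invented training pairs: a text classifier maps implicit customer statements to the most probable explicit product feature. The authors' iterative hypothesis-refinement procedure is not reproduced here.

```python
# Minimal sketch: assign the most probable explicit feature to an
# implicit preference via a bag-of-words Naive Bayes classifier.
# The labeled pairs are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

implicit = [
    "I have to squint to read this on the screen",
    "the text is tiny and hard to read",
    "the battery dies before lunch",
    "I charge it twice a day",
]
explicit = ["larger screen", "larger screen",
            "longer battery life", "longer battery life"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(implicit, explicit)
print(model.predict(["can barely see the icons"]))  # ['larger screen']
```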


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Hanjing Jiang ◽  
Yabing Huang

Background: Drug-disease associations (DDAs) provide important information for exploring the potential efficacy of drugs. However, few DDAs have so far been verified experimentally. Previous evidence indicates that combining multiple sources of information is conducive to discovering new DDAs. How to integrate different biological data sources and identify the most effective drugs for a given disease based on drug-disease coupled mechanisms remains a challenging problem. Results: In this paper, we propose a novel computational model for DDA prediction based on graph representation learning over a multi-biomolecular network (GRLMN). Specifically, we first construct a large-scale molecular association network (MAN) by integrating the associations among drugs, diseases, proteins, miRNAs, and lncRNAs. A graph embedding model is then used to learn vector representations for all drugs and diseases in the MAN. Finally, the combined features are fed to a random forest (RF) model to predict new DDAs. The proposed model was evaluated on the SCMFDD-S dataset using five-fold cross-validation. Experimental results show that GRLMN is highly accurate, with an area under the ROC curve (AUC) of 87.9%, outperforming all previous work on this benchmark in both accuracy and AUC. To further verify GRLMN's performance, we carried out case studies on two common diseases. In the resulting rankings of drugs predicted to be related to a given disease (kidney disease and fever), 15 of the top 20 drugs have been experimentally confirmed. Conclusions: The experimental results show that our model performs well at DDA prediction. GRLMN is an effective prioritization tool for screening reliable DDAs for follow-up studies on drug repositioning.
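
The sketch below mirrors the pipeline's shape, though not its exact components: embed the nodes of a toy heterogeneous association network, concatenate drug and disease vectors, and score candidate pairs with a random forest. Spectral embedding stands in for the paper's graph embedding model, and all data are invented.

```python
# Minimal sketch of a graph-embedding-plus-RF link predictor on a toy
# molecular association network. Nodes, edges, and labels are invented.
import numpy as np
import networkx as nx
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import SpectralEmbedding

G = nx.Graph([
    ("drug_A", "protein_1"), ("protein_1", "disease_X"),
    ("drug_B", "mirna_1"), ("mirna_1", "disease_Y"),
    ("drug_A", "disease_Y"), ("drug_B", "protein_1"),
])
nodes = list(G.nodes())
adj = nx.to_numpy_array(G, nodelist=nodes)
emb = SpectralEmbedding(n_components=2, affinity="precomputed").fit_transform(adj)
vec = dict(zip(nodes, emb))

# Known associations (1) and a sampled non-association (0).
pairs = [("drug_A", "disease_Y", 1), ("drug_B", "disease_Y", 1),
         ("drug_A", "disease_X", 0)]
X = np.array([np.hstack([vec[d], vec[s]]) for d, s, _ in pairs])
y = [label for _, _, label in pairs]

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(rf.predict_proba([np.hstack([vec["drug_B"], vec["disease_X"]])]))
```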


2020 ◽  
Vol 29 (03n04) ◽  
pp. 2060009
Author(s):  
Tao Ding ◽  
Fatema Hasan ◽  
Warren K. Bickel ◽  
Shimei Pan

Social media contain rich information that can help us understand the human mind and behavior. Social media data, however, are mostly unstructured (e.g., text and images), and a large number of features may be needed to represent them (e.g., millions of unigrams to represent social media text). Moreover, accurately assessing human behavior is often difficult (e.g., assessing addiction may require a medical diagnosis), so the ground truth data needed to train a supervised human behavior model are often hard to obtain at a large scale. To avoid overfitting, many state-of-the-art behavior models employ sophisticated unsupervised or self-supervised machine learning methods to leverage large amounts of unlabeled data for both feature learning and dimensionality reduction. Unfortunately, despite their high performance, these advanced models often rely on latent features that are hard to explain. Since understanding the knowledge captured in these models is important to behavior scientists and public health providers, we explore new methods for building machine learning models that are not only accurate but also interpretable. We evaluate the effectiveness of the proposed methods in predicting Substance Use Disorders (SUD). We believe the proposed methods are general and applicable to a wide range of data-driven human trait and behavior analysis applications.
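
As one hedged illustration of the interpretability idea (not the authors' exact method), the snippet below represents users by LDA topic proportions, which can be inspected as word lists, and fits a linear classifier whose weights are directly readable. The corpus and labels are stand-ins for real user data.

```python
# Minimal sketch: interpretable user features (LDA topics) plus a linear
# model whose per-topic weights can be read off. Data is illustrative.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

docs = [
    "party drinks bar night fun",
    "gym run morning training health",
    "bar shots drunk weekend party",
    "yoga diet sleep wellness run",
]
labels = [1, 0, 1, 0]  # e.g., 1 = positive screen (illustrative only)

counts = CountVectorizer().fit(docs)
X_counts = counts.transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X_counts)
topics = lda.transform(X_counts)          # users as topic mixtures
clf = LogisticRegression().fit(topics, labels)

# Each topic is interpretable as its top words; each weight is readable.
words = counts.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [words[i] for i in comp.argsort()[-3:][::-1]]
    print(f"topic {k}: {top}, weight {clf.coef_[0][k]:+.2f}")
```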


Author(s):  
Pankaj Lathar ◽  
K. G. Srinivasa ◽  
Abhishek Kumar ◽  
Nabeel Siddiqui

Advancements in web-based technology and the proliferation of sensors and mobile devices interacting with the internet have created immense data management requirements, spanning storage, processing, and the demand for high-performance read-write operations on big data. Large-scale, high-concurrency applications such as social networking services and search engines face challenges in using relational databases to store and query dynamic user data. NoSQL and cloud computing have emerged as paradigms that can meet these requirements. The diversity of existing NoSQL and cloud computing solutions makes it difficult to comprehend the domain and choose an appropriate solution for a specific business task. This chapter therefore reviews NoSQL and cloud-based solutions, with the goals of providing a perspective on the field of data storage technologies and algorithms, guiding researchers and practitioners in selecting the best-fit data store, and identifying challenges and opportunities of the paradigm.
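
A minimal sketch of the document-store pattern that such surveys contrast with fixed relational schemas, assuming a local MongoDB server and illustrative names: records with differing fields coexist in one collection without schema migrations.

```python
# Minimal sketch of schema-flexible storage for dynamic user data.
# Assumes a MongoDB server on the default localhost port; all names
# are illustrative.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["sns_demo"]
users = db["users"]

# Heterogeneous documents: no ALTER TABLE needed when fields differ.
users.insert_one({"name": "alice", "followers": 1200, "tags": ["travel"]})
users.insert_one({"name": "bob", "bio": "occasional poster"})

print(users.find_one({"name": "alice"})["followers"])  # 1200
```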


GigaScience ◽  
2020 ◽  
Vol 9 (11) ◽  
Author(s):  
Shaokun An ◽  
Jizu Huang ◽  
Lin Wan

Background: Dimensionality reduction and visualization play vital roles in single-cell RNA sequencing (scRNA-seq) data analysis. Although extensively studied, state-of-the-art dimensionality reduction algorithms are often unable to preserve the global structures underlying data. Elastic embedding (EE), a nonlinear dimensionality reduction method, has shown promise in revealing low-dimensional intrinsic local and global data structure, but its current implementation does not scale to large scRNA-seq datasets. Results: We present a distributed optimization implementation of the EE algorithm, termed distributed elastic embedding (D-EE). D-EE reveals the low-dimensional intrinsic structures of data as accurately as elastic embedding while scaling to large scRNA-seq datasets. It leverages distributed storage and distributed computation, achieving memory efficiency and high-performance computing simultaneously. In addition, an extended version, distributed time-series elastic embedding (D-TSEE), enables the user to visualize large-scale time-series scRNA-seq data by incorporating experimental temporal information; results on large-scale scRNA-seq data indicate that D-TSEE can uncover oscillatory gene expression patterns. Conclusions: D-EE is a distributed dimensionality reduction and visualization tool. Its distributed storage and computation allow efficient analysis of large-scale single-cell data with a consistent speedup. The source code, based on C and MPI and tailored to high-performance computing clusters, is available at https://github.com/ShaokunAn/D-EE.
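
For intuition, here is a single-process numpy sketch of the elastic embedding objective that D-EE distributes, E(X) = Σ w⁺_nm ‖x_n − x_m‖² + λ Σ w⁻_nm exp(−‖x_n − x_m‖²), minimized by plain gradient descent on toy data; the paper's MPI implementation is far more scalable and is not reproduced here.

```python
# Minimal single-process sketch of the elastic embedding objective:
# attractive weights pull neighbors together, repulsive weights push
# points apart through the exponential term. Toy data only.
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(30, 50))                    # toy "expression" matrix
D2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
Wp = np.exp(-D2 / D2.mean())                     # attractive (local) weights
np.fill_diagonal(Wp, 0.0)
Wn = D2 / D2.max()                               # repulsive weights
lam = 1.0

X = rng.normal(scale=1e-2, size=(30, 2))         # 2-D embedding
for _ in range(500):
    diff = X[:, None, :] - X[None, :, :]         # pairwise differences
    d2 = (diff ** 2).sum(-1)
    coef = Wp - lam * Wn * np.exp(-d2)           # per-pair gradient weight
    grad = 4.0 * (coef[:, :, None] * diff).sum(1)
    X -= 1e-3 * grad

obj = (Wp * d2).sum() + lam * (Wn * np.exp(-d2)).sum()
print(X.shape, round(float(obj), 3))
```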


Author(s):  
Grigoris Antoniou ◽  
Sotiris Batsakis ◽  
Raghava Mutharaju ◽  
Jeff Z. Pan ◽  
Guilin Qi ◽  
...  

As more and more data is generated by sensor networks, social media, and organizations, the Web interlinking this wealth of information becomes more complex. This is particularly true for the so-called Web of Data, in which data is semantically enriched and interlinked using ontologies. In this large and uncoordinated environment, reasoning can be used to check the consistency of the data and of associated ontologies, or to infer logical consequences that, in turn, can yield new insights from the data. However, reasoning approaches need to be scalable in order to enable reasoning over the entire Web of Data. To address this problem, several high-performance reasoning systems, mainly implementing distributed or parallel algorithms, have been proposed in the last few years. These systems differ significantly, for instance in reasoning expressivity, computational properties such as completeness, and reasoning objectives. To provide a first complete overview of the field, this paper reports a systematic review of such scalable reasoning approaches over various ontological languages, detailing the methods and the conducted experiments. We highlight the shortcomings of these approaches and discuss some of the open problems related to performing scalable reasoning.
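
As a hedged, toy-scale illustration of the materialization task these systems parallelize (not any surveyed system's code), the snippet below forward-chains two RDFS-style rules to a fixpoint over a handful of triples.

```python
# Minimal sketch: naive forward-chaining of rdfs:subClassOf transitivity
# and type inheritance over a toy triple set, run to a fixpoint.
SUB, TYPE = "rdfs:subClassOf", "rdf:type"

triples = {
    ("Dog", SUB, "Mammal"),
    ("Mammal", SUB, "Animal"),
    ("rex", TYPE, "Dog"),
}

changed = True
while changed:  # real reasoners distribute/parallelize this loop
    new = set()
    for s, p, o in triples:
        for s2, p2, o2 in triples:
            if p == SUB and p2 == SUB and o == s2:    # subClassOf transitivity
                new.add((s, SUB, o2))
            if p == TYPE and p2 == SUB and o == s2:   # type inheritance
                new.add((s, TYPE, o2))
    changed = not new <= triples
    triples |= new

print(sorted(t for t in triples if t[0] == "rex"))
# [('rex', 'rdf:type', 'Animal'), ('rex', 'rdf:type', 'Dog'),
#  ('rex', 'rdf:type', 'Mammal')]
```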

