vector representations
Recently Published Documents

2022, Vol 40 (2), pp. 1-38
Shangsong Liang, Yupeng Luo, Zaiqiao Meng

In this article, we study the task of user profiling in question answering communities (QACs). Previous user profiling algorithms suffer from several defects: they regard users and words as atomic units, leading to a mismatch between them; they are designed for other applications rather than for QACs; and some semantic profiling algorithms do not co-embed users and words, which makes it difficult to measure the affinity between them. To improve profiling performance, we propose a neural Flow-based Constrained Co-embedding Model, abbreviated as FCCM. FCCM jointly co-embeds the vector representations of both users and words in QACs so that the affinities between them can be measured semantically. Specifically, FCCM extends the standard variational auto-encoder model so that the inferred embeddings of users and words are subject to a voting constraint: given a question and the users who answer it in the community, the representations of users whose answers receive more votes are closer to the representations of the words associated with these answers than the representations of users whose answers receive fewer votes. In addition, FCCM integrates normalizing flows into the variational auto-encoder framework to avoid the assumption that the embeddings follow Gaussian distributions, so that the inferred embeddings fit the real distributions of the data better. Experimental results on a Chinese Zhihu question answering dataset demonstrate the effectiveness of the proposed FCCM model for user profiling in QACs.
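The voting constraint can be pictured as a pairwise margin penalty. The sketch below is an illustration only and is not FCCM's actual objective: the function name `voting_constraint_penalty`, the use of Euclidean distance, the `margin` parameter, and the simplification of one word vector per question are all assumptions made here.

```python
import math
from itertools import permutations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def voting_constraint_penalty(user_vecs, votes, word_vec, margin=0.1):
    """Sum of hinge penalties over ordered user pairs for one question.

    Hypothetical illustration: whenever user i's answer received more votes
    than user j's, user i should be closer to the word representation than
    user j by at least `margin`; violations add to the penalty.
    """
    penalty = 0.0
    for i, j in permutations(range(len(user_vecs)), 2):
        if votes[i] > votes[j]:
            d_i = euclidean(user_vecs[i], word_vec)
            d_j = euclidean(user_vecs[j], word_vec)
            penalty += max(0.0, d_i - d_j + margin)
    return penalty


# Toy data (made up): the first user sits on the word vector, the second is 1.0 away.
users = [[0.0, 0.0], [1.0, 0.0]]
word = [0.0, 0.0]
satisfied = voting_constraint_penalty(users, votes=[5, 1], word_vec=word)
violated = voting_constraint_penalty(users, votes=[1, 5], word_vec=word)
```

With these toy values the penalty is zero when the closer user also has more votes, and positive when the vote order is reversed.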

2022, Vol 2022, pp. 1-13
Jinbo Chao, Chunhui Zhao, Fuzhi Zhang

Information security is one of the key issues in research on e-commerce Internet of Things (IoT) platforms. Collusive spamming groups on e-commerce platforms can write a large number of fake reviews for the products being reviewed over a period of time, which seriously affects consumers' purchase decisions and undermines fair competition among merchants. To address this problem, we propose a network embedding based approach to detect collusive spamming groups. First, we use the idea of a meta-graph to construct a heterogeneous information network from the user review dataset. Second, we use a modified DeepWalk algorithm to learn low-dimensional vector representations of user nodes in the heterogeneous information network and apply clustering methods to obtain candidate spamming groups. Finally, we use an indicator weighting strategy to calculate the spamming score of each candidate group, and the top-k groups with the highest spamming scores are considered to be the collusive spamming groups. Experimental results on two real-world review datasets show that the overall detection performance of the proposed approach is much better than that of the baseline methods.
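As a rough picture of the walk-generation step that DeepWalk-style methods rely on, the sketch below samples uniform random walks over a small homogeneous graph. It is the plain, unmodified idea only: the paper's modified algorithm and its heterogeneous network are not reproduced, and the function name `random_walks`, its parameters, and the toy graph are assumptions for illustration.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks starting from every node.

    `adjacency` maps each node to a list of its neighbours.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adjacency)
        rng.shuffle(nodes)  # visit start nodes in a fresh order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Toy user graph (made-up data): u1, u2, u3 form a tight group; u4 hangs off u3.
graph = {
    "u1": ["u2", "u3"],
    "u2": ["u1", "u3"],
    "u3": ["u1", "u2", "u4"],
    "u4": ["u3"],
}
walks = random_walks(graph, walk_length=5, walks_per_node=2)
```

In DeepWalk the walks are then treated like sentences and fed to a skip-gram model to learn node vectors; that step and the subsequent clustering are omitted here.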

Electronics, 2022, Vol 11 (2), pp. 189
Álvaro de Pablo, Oscar Araque, Carlos A. Iglesias

The analysis of the content of posts written on social media has become an important line of research in recent years. The study of these texts, as well as their relationship with each other and their dependence on the platform on which they are written, enables the analysis of user behavior and opinions across different domains. In this work, a hybrid machine learning-based system has been developed to classify texts using topic modeling techniques and different word-vector representations, as well as traditional text representations. The system has been trained with ride-hailing posts extracted from Reddit, showing promising performance. The generated models have then been tested with data extracted from other sources such as Twitter and Google Play, classifying these texts without retraining any models and thus performing transfer learning. The results show that the proposed architecture is effective when transferring models learned from data-rich domains to other sources.

Chenglong Xie, Xu-Xu Zhuang, Zhangming Niu, Ruixue Ai, Sofie Lautrup, et al.

A reduced removal of dysfunctional mitochondria is common to aging and age-related neurodegenerative pathologies such as Alzheimer’s disease (AD). Strategies for treating such impaired mitophagy would benefit from the identification of mitophagy modulators. Here we report the combined use of unsupervised machine learning (involving vector representations of molecular structures, pharmacophore fingerprinting and conformer fingerprinting) and a cross-species approach for the screening and experimental validation of new mitophagy-inducing compounds. From a library of naturally occurring compounds, the workflow allowed us to identify 18 small molecules, and among them two potent mitophagy inducers (Kaempferol and Rhapontigenin). In nematode and rodent models of AD, we show that both mitophagy inducers increased the survival and functionality of glutamatergic and cholinergic neurons, abrogated amyloid-β and tau pathologies, and improved the animals’ memory. Our findings suggest the existence of a conserved mechanism of memory loss across the AD models, this mechanism being mediated by defective mitophagy. The computational–experimental screening and validation workflow might help uncover potent mitophagy modulators that stimulate neuronal health and brain homeostasis.

2022
Maria Sindeeva, Nikolay Chekanov, Manvel Avetisian, Nikita Baranov, Elian Malkin, et al.

Interpretation of non-coding genomic variants is one of the most important challenges in human genetics. Machine learning methods have recently emerged as a powerful tool for solving this problem. State-of-the-art approaches allow prediction of the transcriptional and epigenetic effects caused by non-coding mutations. However, these approaches require specific experimental data for training and cannot generalize to cell types for which the required features were not experimentally measured. We show here that the available epigenetic characteristics of human cell types are extremely sparse, limiting those approaches that rely on specific epigenetic input. We propose a new neural network architecture, DeepCT, which can learn complex interconnections of epigenetic features and infer unmeasured data from any available input. Furthermore, we show that DeepCT can learn cell type-specific properties, build biologically meaningful vector representations of cell types and use these representations to generate cell type-specific predictions of the effects of non-coding variations in the human genome.

2021, pp. 1-12
Melesio Crespo-Sanchez, Ivan Lopez-Arevalo, Edwin Aldana-Bobadilla, Alejandro Molina-Villegas

In the last few years, text analysis has become a keystone in several domains for solving many real-world problems, such as machine translation, spam detection, and question answering, to mention a few. Many of these tasks can be approached by means of machine learning algorithms. Most of these algorithms take as input a transformation of the text in the form of feature vectors containing an abstraction of the content. Most recent vector representations focus on the semantic component of text; however, we consider that also taking into account the lexical and syntactic components in the abstraction of content could be beneficial for learning tasks. In this work, we propose a content spectral-based text representation applicable to machine learning algorithms for text analysis. This representation integrates the spectra from the lexical, syntactic, and semantic components of text, producing an abstract image that can be processed by both text and image learning algorithms. These components come from feature vectors of text. To evaluate the proposal, we tested it on text classification and reading complexity score prediction tasks, obtaining promising results.

2021, Vol 15
Dongcheng He, Haluk Ogmen

Newborns demonstrate innate abilities in coordinating their sensory and motor systems through reflexes. One notable characteristic is circular reactions, which consist of self-generated motor actions that lead to correlated sensory and motor activities. This paper describes a model for goal-directed reaching based on circular reactions and exocentric reference frames. The model is built using physiologically plausible visual processing modules and arm-control neural networks. The model incorporates map representations with ego- and exo-centric reference frames for sensory inputs, vector representations for motor systems, as well as local associative learning that results from arm explorations. The integration of these modules is simulated and tested in a three-dimensional spatial environment using Unity3D. The results show that, through self-generated activities, the model self-organizes to generate accurate arm movements that are tolerant of various sources of noise.

2021
Bomin Wei, Xiang Gong

The substantial cost of new drug research and development has consistently posed a huge burden and tremendous challenge for both pharmaceutical companies and patients. To lower the expenditure and the development failure rate, repurposing existing approved drugs and identifying novel interactions between drug molecules and target proteins using computational methods have gained growing attention. Here, we propose DeepPLA, a novel deep learning-based model that combines a ResNet-based 1D CNN and a biLSTM to establish an end-to-end network for protein-ligand binding affinity prediction. We first apply pre-trained embedding methods to encode the raw drug molecular SMILES strings and target protein sequences into dense vector representations. The dense vector representations separately go through ResNet-based 1D CNN modules to derive features. The extracted feature vectors are concatenated and fed into the biLSTM network after an average pooling operation, followed by an MLP module that predicts the binding affinity. We used the BindingDB dataset for training and evaluating our DeepPLA model. The results show that DeepPLA performs well on protein-ligand binding affinity prediction, with R, RMSE, MAE, R² and MSE of 0.89, 0.68, 0.50, 0.79 and 0.46 on the training set and 0.84, 0.80, 0.60, 0.71 and 0.64 on the independent testing set, respectively. These results suggest that DeepPLA predicts accurately and generalizes well, demonstrating that it can be a potential upgrade for pinpointing new drug-target interactions and finding better destinations for proven drugs.
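For readers unfamiliar with the five reported metrics, the sketch below computes them for a toy set of predictions. The abstract does not define the metrics, so it is assumed here that R is the Pearson correlation and that the R2 metric is the coefficient of determination; the function name `regression_metrics` and the data are made up. Note that MSE is the square of RMSE, which matches the reported pairs (0.68² ≈ 0.46 and 0.80² = 0.64).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return Pearson R, RMSE, MAE, R2 (coefficient of determination) and MSE."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)   # total sum of squares
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    return {
        "R": cov / math.sqrt(ss_tot * ss_pred),
        "RMSE": math.sqrt(mse),
        "MAE": mae,
        "R2": 1.0 - (mse * n) / ss_tot,  # 1 - SS_res / SS_tot
        "MSE": mse,
    }


# Toy affinities (made-up numbers, not from BindingDB).
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```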

2021, Vol 7 (1), pp. 30-40
Changro Lee

Although clustering analysis is a popular tool in unsupervised learning, it is inefficient for datasets dominated by categorical variables, such as real estate datasets. To apply clustering analysis to real estate datasets, this study proposes an entity embedding approach that transforms categorical variables into vector representations. Three variants of a clustering algorithm, based respectively on the traditional Euclidean distance, the Gower distance, and the embedding vectors, are applied to land sales records to delineate the real estate market in Gwacheon-si, Gyeonggi province, South Korea. The relevance of the resulting submarkets is then evaluated using the root mean squared error (RMSE) obtained from a hedonic pricing model. The results show that the RMSE of the embedding vector-based algorithm decreases substantially, from 0.076-0.077 to 0.069. This study shows that the clustering algorithm empowered by embedding vectors outperforms the conventional algorithms, thereby enhancing the relevance of the delineated submarkets.
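To make the Gower distance baseline concrete, the sketch below computes it for two records with mixed categorical and numeric fields. It is the standard unweighted form, not code from the study; the function name `gower_distance`, the field names, and the parcel values are hypothetical.

```python
def gower_distance(a, b, numeric_ranges):
    """Unweighted Gower distance between two records given as dicts.

    `numeric_ranges` maps each numeric field to its (max - min) over the
    dataset; every field not listed there is treated as categorical.
    """
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            span = numeric_ranges[key]
            # Numeric field: absolute difference scaled to [0, 1] by the range.
            total += abs(a[key] - b[key]) / span if span else 0.0
        else:
            # Categorical field: 0 if the values match, 1 otherwise.
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)


# Hypothetical land parcels; the area range of 1000 m2 is an assumed dataset range.
parcel_a = {"zoning": "residential", "road_access": "wide", "area_m2": 300.0}
parcel_b = {"zoning": "commercial", "road_access": "wide", "area_m2": 500.0}
distance = gower_distance(parcel_a, parcel_b, {"area_m2": 1000.0})  # about 0.4
```

A categorical mismatch always contributes exactly 1, however similar the two categories are in practice, whereas learned embedding vectors can place related categories close together.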

2021, Vol 19 (3), pp. 61-69
N. I. Tikhonov

Visualizations are used to better understand collections of scientific publications. Various methods of analyzing text collections can be used to build these visualizations. This article discusses two methods, Paper2vec and Cite2vec, that obtain vector representations of documents using citation information. To demonstrate how these techniques work and to give an example of their application, visualizations were developed, which are described in this paper.
