A Deep-Learning-Inspired Person-Job Matching Model Based on Sentence Vectors and Subject-Term Graphs

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Xiaowei Wang ◽  
Zhenhong Jiang ◽  
Lingxi Peng

In this study, an end-to-end person-post matching model is constructed, and person-post matching experiments are conducted on real recruitment data. First, the representation of the constructed knowledge in a low-dimensional space is described. Then, the Bidirectional Encoder Representations from Transformers (BERT) pretrained language model is introduced as the encoding model for textual information. The structure of the person-post matching model is explained in terms of its attention mechanism and computational layers. Finally, the model is compared experimentally with a variety of person-post matching methods on a real recruitment dataset, and the experimental results are analyzed.
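The attention-based matching step described above can be pictured with a small sketch. It assumes each job-post sentence and each résumé sentence has already been encoded as a fixed-length vector (for example by BERT); every job sentence attends over the résumé sentences with scaled dot-product attention, and the match score is the mean cosine similarity between each job sentence and its attended résumé summary. The function names and the scoring rule are illustrative assumptions, not the authors' actual layers.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a: Vector, b: Vector) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Scaled dot-product attention; the keys double as values in this sketch."""
    d = len(query)
    weights = softmax([dot(query, k) / math.sqrt(d) for k in keys])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(d)]


def match_score(job_sents: List[Vector], resume_sents: List[Vector]) -> float:
    """Hypothetical score: mean cosine between each job sentence and its attended resume summary."""
    sims = [cosine(q, attend(q, resume_sents)) for q in job_sents]
    return sum(sims) / len(sims)
```

A résumé whose sentence vectors point in the same directions as the job requirements scores close to 1, while one pointing the opposite way scores negative; a trained model would add learned projections before the attention.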

2019 ◽  
Vol 15 (3) ◽  
pp. 346-358
Author(s):  
Luciano Barbosa

Purpose: Matching instances of the same entity, a task known as entity resolution, is a key step in data integration. This paper proposes a deep learning network that learns different representations of Web entities for entity resolution.
Design/methodology/approach: To match Web entities, the proposed network learns the following representations of entities: embeddings, which are vector representations of the entities' words in a low-dimensional space; convolutional vectors from a convolutional layer, which capture short-distance patterns in the entities' word sequences; and bag-of-words vectors, created by a BoW layer that learns weights for vocabulary words based on the task at hand. Given a pair of entities, the similarity between their learned representations is used as a feature for a binary classifier that identifies a possible match. In addition to these features, the classifier uses a modification of inverse document frequency for pairs, which identifies discriminative words in pairs of entities.
Findings: The proposed approach was evaluated on two commercial and two academic entity resolution benchmark data sets. The results show that the proposed strategy outperforms previous approaches on the commercial data sets, which are more challenging, and achieves results similar to its competitors on the academic data sets.
Originality/value: No previous work has used a single deep learning framework to learn different representations of Web entities for entity resolution.
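The "similarity as classifier feature" idea can be sketched without a neural network. The example below computes two pair features, a bag-of-words cosine similarity and an IDF-weighted token overlap, and feeds them to a logistic unit. It uses ordinary smoothed IDF over a corpus, not the paper's pair-modified IDF (whose formula is not given here), and the classifier weights are hypothetical placeholders rather than trained values.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def idf_table(corpus: List[str]) -> Dict[str, float]:
    """Standard smoothed IDF (not the paper's pair-level variant)."""
    n = len(corpus)
    df: Counter = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def bow_cosine(a: str, b: str) -> float:
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    num = sum(ca[t] * cb[t] for t in ca)
    den = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return num / den if den else 0.0


def idf_overlap(a: str, b: str, idf: Dict[str, float]) -> float:
    """Jaccard overlap where each token counts with its IDF weight (rare tokens matter more)."""
    ta, tb = set(tokenize(a)), set(tokenize(b))
    union = ta | tb
    if not union:
        return 0.0
    unseen = max(idf.values(), default=1.0)  # unseen tokens treated as maximally rare
    weight = lambda t: idf.get(t, unseen)
    return sum(weight(t) for t in ta & tb) / sum(weight(t) for t in union)


def pair_features(a: str, b: str, idf: Dict[str, float]) -> Tuple[float, float]:
    return bow_cosine(a, b), idf_overlap(a, b, idf)


def is_match(features: Sequence[float], weights=(4.0, 4.0), bias: float = -4.0) -> bool:
    """Logistic decision with hypothetical, untrained weights."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z)) >= 0.5
```

In the paper the similarity features come from learned embedding, convolutional and BoW layers; here plain token statistics stand in for them so that the pair-feature-plus-classifier structure stays visible.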


Author(s):  
Bolin Chen ◽  
Yourui Han ◽  
Xuequn Shang ◽  
Shenggui Zhang

The identification of disease-related genes plays an essential role in bioinformatics. To achieve this, many powerful machine learning methods have been proposed from various computational perspectives, such as biological network analysis, classification, regression, and deep learning. Among them, deep learning-based methods have achieved considerable success in identifying disease-related genes, with higher accuracy and efficiency. However, these methods rarely handle the following two issues well: (1) the multiple functions of many genes; and (2) the scale-free property of biological networks. To overcome these, we propose a novel network representation method that transforms individual vertices, together with their surrounding topological structures, into image-like datasets. It takes each node-induced sub-network as a representation candidate and adds its environmental characteristics to generate its representation in a low-dimensional space. These image-like datasets can be applied directly in a Convolutional Neural Network-based method for identifying cancer-related genes. The numerical experiments show that the proposed method achieves an AUC of 0.9256 on a single network and 0.9452 on multiple networks, outperforming many existing methods.
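To make "node-induced sub-network as an image-like dataset" concrete, here is a minimal sketch assuming the simplest possible encoding: a node and its 1-hop neighbours become a fixed-size 0/1 adjacency matrix, with neighbours ordered by descending degree and the matrix zero-padded. The ordering rule, the fixed size and the name `ego_image` are assumptions for illustration; the paper's added environmental characteristics are not reproduced.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # undirected graph as adjacency sets


def ego_image(graph: Graph, node: str, size: int) -> List[List[int]]:
    """Encode `node` and its 1-hop neighbourhood as a size x size 0/1 adjacency 'image'.

    Assumption: neighbours are ordered by descending degree (ties broken by name) so
    every node's image has a consistent layout; this is not the paper's exact scheme.
    """
    neighbours = sorted(graph[node], key=lambda u: (-len(graph[u]), u))
    members = [node] + neighbours[: size - 1]  # centre node always occupies row/column 0
    image = [[0] * size for _ in range(size)]  # zero padding for small neighbourhoods
    for i, u in enumerate(members):
        for j, v in enumerate(members):
            if v in graph[u]:
                image[i][j] = 1
    return image
```

Stacking one such matrix per gene yields a tensor that a standard 2-D CNN can consume, which is the property the abstract relies on; hub genes in a scale-free network are simply truncated to their `size - 1` highest-degree neighbours in this sketch.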


2018 ◽  
Vol 4 ◽  
pp. e154 ◽  
Author(s):  
Kelwin Fernandes ◽  
Davide Chicco ◽  
Jaime S. Cardoso ◽  
Jessica Fernandes

Cervical cancer remains a significant cause of mortality around the world, even though it can be prevented and cured by removing affected tissues in early stages. Providing universal and efficient access to cervical screening programs is a challenge that requires, among other steps, identifying vulnerable individuals in the population. In this work, we present a computationally automated strategy for predicting the outcome of a patient's biopsy, given risk patterns from individual medical records. We propose a machine learning technique that allows a joint and fully supervised optimization of dimensionality reduction and classification models. We also build a model able to highlight relevant properties in the low-dimensional space, to ease the classification of patients. We instantiated the proposed approach with deep learning architectures and achieved accurate prediction results (top area under the curve, AUC = 0.6875) that outperform previously developed methods, such as denoising autoencoders. Additionally, we explored some clinical findings from the embedding spaces and validated them through the medical literature, making them reliable for physicians and biomedical researchers.
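One common way to optimize dimensionality reduction and classification jointly is to minimize a weighted sum of a reconstruction loss and a classification loss. The sketch below assumes a linear encoder with tied decoder weights and a logistic classifier on the code; the weighting `lam` and this exact objective are assumptions, since the abstract does not state the authors' loss, and no training loop is shown.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # k x d encoder weights (k < d for reduction)


def encode(W: Matrix, x: Sequence[float]) -> List[float]:
    """Linear projection z = W x into the low-dimensional space."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def decode(W: Matrix, z: Sequence[float]) -> List[float]:
    """Tied-weight reconstruction x_hat = W^T z (an assumption of this sketch)."""
    d = len(W[0])
    return [sum(W[r][c] * z[r] for r in range(len(W))) for c in range(d)]


def joint_loss(W: Matrix, clf_w: Sequence[float], clf_b: float,
               x: Sequence[float], y: int, lam: float = 0.5) -> float:
    """lam * reconstruction MSE + (1 - lam) * binary cross-entropy on the code."""
    z = encode(W, x)
    x_hat = decode(W, z)
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    p = 1.0 / (1.0 + math.exp(-(clf_b + sum(w * zi for w, zi in zip(clf_w, z)))))
    eps = 1e-12  # guards log(0)
    bce = -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return lam * recon + (1 - lam) * bce
```

Because both terms depend on `W`, gradients from the biopsy label shape the embedding itself, which is what distinguishes a jointly supervised reduction from fitting an unsupervised autoencoder first and a classifier afterwards.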


2019 ◽  
Vol 20 (S18) ◽  
Author(s):  
Qianlong Liu ◽  
Kangenbei Liao ◽  
Kelvin Kam-fai Tsoi ◽  
Zhongyu Wei

Background: With the development of e-Health, predicting whether a patient will accept a doctor's answer in an online healthcare community has become increasingly important. Unlike previous work, which focuses mainly on numerical features, our framework combines numerical and textual information to predict the acceptance of answers. The textual information consists of questions posted by patients and answers posted by doctors. To extract textual features from them, we first trained a sentence encoder on a held-out dataset to encode each question-answer pair into a co-dependent representation. We then used it to predict whether doctors' answers are accepted.
Results: Our experimental results on a real-world dataset demonstrate that our model extracts additional features from text and makes predictions more accurate. That is, the model that takes both textual and numerical features as input performs significantly better than the model that takes numerical features only, on all four metrics (Accuracy, AUC, F1-score and Recall).
Conclusions: This work proposes a generic framework that combines numerical and textual features for acceptance prediction, in which textual features are first extracted from text with deep learning methods and then used to achieve better prediction results.
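The combination of numerical and textual features can be sketched as early fusion: concatenate the numerical features with the sentence-encoder output and score the result with a logistic model. The function names, the choice of logistic regression and any feature values are hypothetical; the abstract does not specify the final classifier.

```python
import math
from typing import List, Sequence


def combine_features(numerical: Sequence[float], text_embedding: Sequence[float]) -> List[float]:
    """Early fusion: concatenate numerical features with the question-answer encoding."""
    return list(numerical) + list(text_embedding)


def acceptance_probability(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic score for P(answer accepted); weights are assumed to be already trained."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

The numerical-only baseline in the abstract corresponds to calling `acceptance_probability` on the numerical part alone; the reported gain comes from the extra dimensions contributed by the text encoder.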


2021 ◽  
Vol 336 ◽  
pp. 06021
Author(s):  
Hongshuai Liu ◽  
Ge Jun ◽  
Yuanyuan Zheng

Nowadays, most deep learning models ignore Chinese language habits and global information when processing Chinese tasks. To solve this problem, we constructed the BERT-BiLSTM-Attention-CRF model. In this model, we embedded a BERT pretrained language model that adopts the Whole Word Masking strategy and added document-level attention. Experimental results show that our method achieves good results on the MSRA corpus, with an F1 score of 95.00%.
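The CRF layer at the top of such a tagger decodes the best tag sequence with the Viterbi algorithm. The sketch below is the textbook linear-chain Viterbi decoder over per-position emission scores (which, in the described model, would come from the BiLSTM-Attention layers) and a tag-to-tag transition matrix; the tag set and score values in the usage are made up.

```python
from typing import List, Tuple


def viterbi(emissions: List[List[float]], transitions: List[List[float]]) -> Tuple[List[int], float]:
    """Best tag path for a linear-chain CRF.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(emissions[0])
    score = list(emissions[0])  # best score of any path ending in each tag at t = 0
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            bp.append(best_i)
        score = new_score
        backpointers.append(bp)
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for bp in reversed(backpointers):  # follow back-pointers from the last position
        path.append(bp[path[-1]])
    path.reverse()
    return path, score[best_last]
```

For example, with tags O=0, B=1, I=2 and a transition penalty of -10 on O to I, `viterbi([[1, 0, 0], [0, 0, 2]], [[0, 0, -10], [0, 0, 0], [0, 0, 0]])` prefers the legal path B, I over the illegal O, I even though position 0 slightly favours O.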


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Zhe Yang ◽  
Shi Ying ◽  
Bingming Wang ◽  
Yiyao Li ◽  
Bo Dong ◽  
...  

Log analysis-based system fault diagnosis can help engineers analyze the fault events generated by a system. The K-means algorithm performs log analysis well and does not require much prior knowledge, but K-means-based system fault diagnosis needs to be improved in both efficiency and accuracy. To solve this problem, we propose a system fault diagnosis method based on a reclustering algorithm. First, we propose a log vectorization method based on the PV-DM language model to obtain low-dimensional log vectors that provide effective data support for subsequent fault diagnosis; then, we improve the K-means algorithm to make K-means-based log clustering more effective; finally, we propose a reclustering method based on keyword extraction to improve the accuracy of fault diagnosis. We use system log data generated by two supercomputers to verify our method. The experimental results show that, compared with the traditional K-means method, our method improves the accuracy of fault diagnosis while maintaining its efficiency.
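As a baseline for the clustering step, here is the traditional K-means (Lloyd's algorithm) that the method improves upon, applied to log vectors assumed to be already produced by a PV-DM style vectorizer. It uses a deterministic first-k initialization for reproducibility; it is the standard algorithm, not the authors' improved variant or their keyword-based reclustering.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Standard Lloyd's K-means over pre-computed log vectors."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]  # deterministic init: first k vectors
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each log vector joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members (empty clusters keep theirs).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

The sensitivity of this baseline to initialization and to the choice of `k` is one plausible reason a reclustering pass is useful, though the abstract does not detail which weaknesses the authors target.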


2021 ◽  
Author(s):  
Diego Kozlowski ◽  
Jennifer Dusdal ◽  
Jun Pang ◽  
Andreas Zilian

Over the last century, we have observed a steady and exponential growth of scientific publications globally. The overwhelming amount of available literature makes a holistic analysis of research within and between fields impossible by manual inspection. Automatic techniques to support the literature review process are required to find the epistemic and social patterns embedded in scientific publications. In computer science, new tools have been developed to deal with large volumes of data. In particular, deep learning techniques open the possibility of automated end-to-end models that project observations into a new, low-dimensional space where the most relevant information of each observation is highlighted. Using deep learning to build new representations of scientific publications is a growing but still emerging field of research. The aim of this paper is to discuss the potential and limits of deep learning for gathering insights about scientific research articles. We focus on document-level embeddings based on the semantic and relational aspects of articles, using Natural Language Processing (NLP) and Graph Neural Networks (GNNs). We explore the different outcomes generated by these techniques. Our results show that with NLP we can encode a semantic space of articles, while GNNs enable us to build a relational space in which the social practices of a research community are also encoded.
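The contrast between a semantic space and a relational space can be illustrated with two deliberately simplified operations: a document embedding formed by averaging word vectors, and one parameter-free message-passing step in which each article averages its vector with those of the articles it cites. A real GNN layer adds learned weights and a nonlinearity; the toy vectors and the function names are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def doc_embedding(tokens: List[str], word_vectors: Dict[str, Vector], dim: int) -> Vector:
    """Semantic space: mean of the known word vectors (zero vector if none are known)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def propagate(embeddings: Dict[str, Vector], citations: Dict[str, List[str]]) -> Dict[str, Vector]:
    """Relational space: one mean-aggregation step over the citation graph (no learned weights)."""
    out: Dict[str, Vector] = {}
    for node, vec in embeddings.items():
        group = [vec] + [embeddings[n] for n in citations.get(node, []) if n in embeddings]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out
```

After propagation, articles that share citations drift toward each other even when their texts differ, which is the kind of community structure the abstract attributes to the relational space.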


Author(s):  
Zhixian Liu ◽  
Qingfeng Chen ◽  
Wei Lan ◽  
Jiahai Liang ◽  
Yiping Pheobe Chen ◽  
...  

Traditional network-based computational methods have shown good results in drug analysis and prediction. However, these methods are time-consuming and lack universality, and they have difficulty exploiting the auxiliary information of nodes and edges. Network embedding offers a promising way to alleviate these problems by transforming a network into a low-dimensional space while preserving its structure and auxiliary information, which in turn facilitates subsequent processing with machine learning algorithms. Network embedding has been introduced into drug analysis and prediction in the last few years and has shown superior performance over traditional methods. However, there is no systematic review of this topic. This article offers a comprehensive survey of the primary network embedding methods and their applications in drug analysis and prediction. Network embedding techniques for homogeneous and heterogeneous networks are investigated and compared, including matrix decomposition, random walk, and deep learning approaches. In particular, graph neural network (GNN) methods within deep learning are highlighted. Further, applications of network embedding in drug similarity estimation, drug-target interaction prediction, adverse drug reaction prediction, and protein function and therapeutic peptide prediction are discussed. Several potential future research directions are also outlined.
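Random-walk embedding methods (DeepWalk being the classic example) start by turning the network into a corpus of truncated random walks, which are then fed to a skip-gram model as if they were sentences. The sketch below covers only the walk-generation step on a small drug-target graph; the node names and parameters are illustrative, and the skip-gram training is not shown.

```python
import random
from typing import Dict, List


def random_walks(graph: Dict[str, List[str]], walks_per_node: int, walk_length: int,
                 seed: int = 0) -> List[List[str]]:
    """Uniform truncated random walks, DeepWalk-style (walk corpus only, no embedding)."""
    rng = random.Random(seed)  # seeded for reproducible corpora
    nodes = sorted(graph)
    walks: List[List[str]] = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph[walk[-1]]
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks
```

For a heterogeneous drug-target network, metapath-guided variants would restrict which node type may follow which; this uniform version treats all nodes alike.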


NeuroImage ◽  
2021 ◽  
pp. 118200
Author(s):  
Sayan Ghosal ◽  
Qiang Chen ◽  
Giulio Pergola ◽  
Aaron L. Goldman ◽  
William Ulrich ◽  
...  
