A Deep-Learning-Inspired Person-Job Matching Model Based on Sentence Vectors and Subject-Term Graphs

This study constructs an end-to-end person-post matching model and runs matching experiments on actual recruitment data. First, the representation of the constructed knowledge in a low-dimensional space is described. Then, the Bidirectional Encoder Representations from Transformers (BERT) pretrained language model is introduced as the encoding model for textual information. The structure of the person-post matching model is explained in terms of the attention mechanism and its computational layers. Finally, the person-post matching model is compared with a variety of person-post matching methods on the actual recruitment dataset, and the experimental results are analyzed.

Learning representations of Web entities for entity resolution

International Journal of Web Information Systems ◽ 10.1108/ijwis-07-2018-0059 ◽ 2019 ◽ Vol 15 (3) ◽ pp. 346-358

Author(s): Luciano Barbosa

Keyword(s): Deep Learning ◽ Dimensional Space ◽ Entity Resolution ◽ Data Sets ◽ Content Type ◽ Learning Framework ◽ Document Frequency ◽ Deep Learning Network ◽ Low Dimensional ◽ Vector Representations

Purpose: Matching instances of the same entity, a task known as entity resolution, is a key step in the process of data integration. This paper aims to propose a deep learning network that learns different representations of Web entities for entity resolution.

Design/methodology/approach: To match Web entities, the proposed network learns the following representations of entities: embeddings, which are vector representations of the words in the entities in a low-dimensional space; convolutional vectors from a convolutional layer, which capture short-distance patterns in word sequences in the entities; and bag-of-words vectors, created by a bag-of-words (bow) layer that learns weights for words in the vocabulary based on the task at hand. Given a pair of entities, the similarity between their learned representations is used as a feature for a binary classifier that identifies a possible match. In addition to those features, the classifier also uses a modification of inverse document frequency for pairs, which identifies discriminative words in pairs of entities.

Findings: The proposed approach was evaluated on two commercial and two academic entity resolution benchmarking data sets. The results show that the proposed strategy outperforms previous approaches on the commercial data sets, which are more challenging, and achieves results similar to those of its competitors on the academic data sets.

Originality/value: No previous work has used a single deep learning framework to learn different representations of Web entities for entity resolution.
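As a rough illustration of how pairwise similarity features of this kind can be computed, the following sketch builds a two-element feature vector for a candidate pair of entity descriptions. It is not the paper's network: raw token counts stand in for the learned representations, a standard smoothed IDF stands in for the paper's unspecified IDF modification for pairs, and the names `cosine`, `idf_table`, and `pair_features` as well as the sample product strings are assumptions made for this example.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def idf_table(corpus: list[list[str]]) -> dict[str, float]:
    """Standard smoothed IDF over tokenized entity descriptions (an assumption)."""
    n = len(corpus)
    df = Counter(t for doc in corpus for t in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def pair_features(x: list[str], y: list[str], idf: dict[str, float]) -> list[float]:
    """Features for one candidate pair: [cosine similarity, IDF-weighted overlap]."""
    shared = set(x) & set(y)
    union = set(x) | set(y)
    total = sum(idf.get(t, 0.0) for t in union)
    overlap = sum(idf.get(t, 0.0) for t in shared) / total if total else 0.0
    return [cosine(Counter(x), Counter(y)), overlap]


# Hypothetical product descriptions; the first two describe the same entity.
corpus = [
    "apple iphone 11 64gb black".split(),
    "iphone 11 apple 64gb black".split(),
    "samsung galaxy s10 128gb black".split(),
]
idf = idf_table(corpus)
match = pair_features(corpus[0], corpus[1], idf)
non_match = pair_features(corpus[0], corpus[2], idf)
```

In a full pipeline, vectors such as `match` and `non_match` would be passed to a binary classifier; here they only show that the matching pair scores higher on both features.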

Identifying Disease Related Genes by Network Representation and Convolutional Neural Network

Frontiers in Cell and Developmental Biology ◽ 10.3389/fcell.2021.629876 ◽ 2021 ◽ Vol 9

Author(s): Bolin Chen ◽ Yourui Han ◽ Xuequn Shang ◽ Shenggui Zhang

Keyword(s): Neural Network ◽ Deep Learning ◽ Convolutional Neural Network ◽ Biological Networks ◽ Dimensional Space ◽ Scale Free ◽ Network Representation ◽ Disease Related Genes ◽ Representation Method ◽ Low Dimensional

The identification of disease-related genes plays an essential role in bioinformatics. To achieve this, many powerful machine learning methods have been proposed from various computational perspectives, such as biological network analysis, classification, regression, and deep learning. Among them, deep learning-based methods have been highly successful in identifying disease-related genes, achieving higher accuracy and efficiency. However, these methods rarely handle the following two issues well: (1) the multiple functions of many genes; and (2) the scale-free property of biological networks. To overcome these issues, we propose a novel network representation method that transforms individual vertices, together with their surrounding topological structures, into image-like datasets. It takes each node-induced sub-network as a represented candidate and adds its environmental characteristics to generate a low-dimensional space as its representation. These image-like datasets can be applied directly in a Convolutional Neural Network-based method for identifying cancer-related genes. The numerical experiments show that the proposed method achieves an AUC of 0.9256 in a single network and 0.9452 in multiple networks, which outperforms many existing methods.

Supervised deep learning embeddings for the prediction of cervical cancer diagnosis

PeerJ Computer Science ◽ 10.7717/peerj-cs.154 ◽ 2018 ◽ Vol 4 ◽ pp. e154 ◽ Cited By ~ 13

Author(s): Kelwin Fernandes ◽ Davide Chicco ◽ Jaime S. Cardoso ◽ Jessica Fernandes

Keyword(s): Cervical Cancer ◽ Deep Learning ◽ Dimensional Space ◽ Cervical Screening ◽ Area Under The Curve ◽ Clinical Findings ◽ Screening Programs ◽ Efficient Access ◽ Low Dimensional

Cervical cancer remains a significant cause of mortality all around the world, even if it can be prevented and cured by removing affected tissues in early stages. Providing universal and efficient access to cervical screening programs is a challenge that requires identifying vulnerable individuals in the population, among other steps. In this work, we present a computationally automated strategy for predicting the outcome of the patient biopsy, given risk patterns from individual medical records. We propose a machine learning technique that allows a joint and fully supervised optimization of dimensionality reduction and classification models. We also build a model able to highlight relevant properties in the low dimensional space, to ease the classification of patients. We instantiated the proposed approach with deep learning architectures, and achieved accurate prediction results (top area under the curve AUC = 0.6875) which outperform previously developed methods, such as denoising autoencoders. Additionally, we explored some clinical findings from the embedding spaces, and we validated them through the medical literature, making them reliable for physicians and biomedical researchers.

Acceptance Prediction for Answers on Online Health-care Community

BMC Bioinformatics ◽ 10.1186/s12859-019-3129-2 ◽ 2019 ◽ Vol 20 (S18)

Author(s): Qianlong Liu ◽ Kangenbei Liao ◽ Kelvin Kam-fai Tsoi ◽ Zhongyu Wei

Keyword(s): Health Care ◽ Deep Learning ◽ Real World ◽ Experimental Results ◽ Health It ◽ Textual Information ◽ Care Community ◽ Generic Framework ◽ Textual Features ◽ Better Than

Background: With the development of e-Health, predicting whether a doctor’s answer will be accepted by a patient in an online healthcare community plays an increasingly important role. Unlike previous work, which focuses mainly on numerical features, our framework combines numerical and textual information to predict the acceptance of answers. The textual information is composed of questions posted by patients and answers posted by doctors. To extract textual features from them, we first train a sentence encoder on a held-out dataset to encode a question-answer pair into a co-dependent representation. We then use it to predict the acceptance of doctors’ answers.

Results: Our experimental results on a real-world dataset demonstrate that our model extracts additional features from text and makes the prediction more accurate. That is, the model that takes both textual and numerical features as input performs significantly better than the model that takes only numerical features on all four metrics (Accuracy, AUC, F1-score and Recall).

Conclusions: This work proposes a generic framework combining numerical and textual features for acceptance prediction, in which textual features are first extracted from text using deep learning methods and then used to achieve better prediction results.

Chinese named entity recognition model based on BERT

MATEC Web of Conferences ◽ 10.1051/matecconf/202133606021 ◽ 2021 ◽ Vol 336 ◽ pp. 06021

Author(s): Hongshuai Liu ◽ Ge Jun ◽ Yuanyuan Zheng

Keyword(s): Deep Learning ◽ Language Model ◽ Named Entity Recognition ◽ Experimental Results ◽ Entity Recognition ◽ Global Information ◽ Learning Models ◽ Named Entity ◽ Whole Word ◽ Document Level

Nowadays, most deep learning models ignore Chinese habits and global information when processing Chinese tasks. To solve this problem, we constructed the BERT-BiLSTM-Attention-CRF model. In the model, we embedded the BERT pre-trained language model that adopts the Whole Word Mask strategy, and added document-level attention. Experimental results show that our method achieves good results on the MSRA corpus, with an F1 score of 95.00%.

A System Fault Diagnosis Method with a Reclustering Algorithm

Scientific Programming ◽ 10.1155/2021/6617882 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-8

Author(s): Zhe Yang ◽ Shi Ying ◽ Bingming Wang ◽ Yiyao Li ◽ Bo Dong ◽ ...

Keyword(s): Fault Diagnosis ◽ Prior Knowledge ◽ Language Model ◽ Experimental Results ◽ Log Analysis ◽ Log Data ◽ Data Support ◽ Diagnosis Method ◽ Low Dimensional

Log analysis-based system fault diagnosis can help engineers analyze the fault events generated by a system. The K-means algorithm performs log analysis well and does not require much prior knowledge, but K-means-based system fault diagnosis needs to be improved in both efficiency and accuracy. To solve this problem, we propose a system fault diagnosis method based on a reclustering algorithm. First, we propose a log vectorization method based on the PV-DM language model to obtain low-dimensional log vectors that provide effective data support for the subsequent fault diagnosis; then, we improve the K-means algorithm to enhance K-means-based log clustering; finally, we propose a reclustering method based on keyword extraction to improve the accuracy of fault diagnosis. We use system log data generated by two supercomputers to verify our method. The experimental results show that, compared with the traditional K-means method, our method improves the accuracy of fault diagnosis while maintaining its efficiency.
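For orientation, the traditional K-means baseline mentioned above can be sketched in a few lines. This is plain Lloyd's K-means, not the authors' improved algorithm or their reclustering step; the function name `kmeans` and the two-dimensional `log_vectors` are hypothetical stand-ins for real PV-DM log vectors.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain (Lloyd's) K-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Hypothetical 2-D "log vectors": two well-separated groups of log events.
log_vectors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, labels = kmeans(log_vectors, k=2)
```

On this toy input the first three vectors end up in one cluster and the last three in the other; which cluster receives index 0 depends on the random initialization.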

Semantic and relational spaces in science of science: deep learning models for article vectorisation

Scientometrics ◽ 10.1007/s11192-021-03984-1 ◽ 2021

Author(s): Diego Kozlowski ◽ Jennifer Dusdal ◽ Jun Pang ◽ Andreas Zilian

Keyword(s): Deep Learning ◽ Language Processing ◽ Dimensional Space ◽ Relevant Information ◽ Semantic Space ◽ Scientific Publications ◽ Social Patterns ◽ Scientific Research Articles ◽ Graph Neural Networks ◽ Low Dimensional

Over the last century, we have observed a steady and exponential growth of scientific publications globally. The overwhelming amount of available literature makes a holistic analysis of the research within a field and between fields based on manual inspection impossible. Automatic techniques to support the process of literature review are required to find the epistemic and social patterns that are embedded in scientific publications. In computer science, new tools have been developed to deal with large volumes of data. In particular, deep learning techniques open the possibility of automated end-to-end models to project observations to a new, low-dimensional space where the most relevant information of each observation is highlighted. Using deep learning to build new representations of scientific publications is a growing but still emerging field of research. The aim of this paper is to discuss the potential and limits of deep learning for gathering insights about scientific research articles. We focus on document-level embeddings based on the semantic and relational aspects of articles, using Natural Language Processing (NLP) and Graph Neural Networks (GNNs). We explore the different outcomes generated by those techniques. Our results show that using NLP we can encode a semantic space of articles, while GNNs enable us to build a relational space where the social practices of a research community are also encoded.

A Survey of Network Embedding for Drug Analysis and Prediction

Current Protein and Peptide Science ◽ 10.2174/1389203721666200702145701 ◽ 2020 ◽ Vol 21

Author(s): Zhixian Liu ◽ Qingfeng Chen ◽ Wei Lan ◽ Jiahai Liang ◽ Yiping Pheobe Chen ◽ ...

Keyword(s): Deep Learning ◽ Protein Function ◽ Dimensional Space ◽ Auxiliary Information ◽ Matrix Decomposition ◽ Drug Analysis ◽ Machine Learning Algorithms ◽ Superior Performance ◽ Network Embedding ◽ Similarity Estimation

Traditional network-based computational methods have shown good results in drug analysis and prediction. However, these methods are time-consuming and lack universality, and it is difficult for them to exploit the auxiliary information of nodes and edges. Network embedding provides a promising way of alleviating these problems by transforming a network into a low-dimensional space while preserving network structure and auxiliary information. This facilitates the application of machine learning algorithms for subsequent processing. Network embedding has been introduced into drug analysis and prediction in the last few years, and has shown superior performance over traditional methods. However, there is no systematic review of this topic. This article offers a comprehensive survey of the primary network embedding methods and their applications in drug analysis and prediction. The network embedding technologies applied to homogeneous and heterogeneous networks are investigated and compared, including matrix decomposition, random walk, and deep learning. In particular, graph neural network (GNN) methods in deep learning are highlighted. Further, the applications of network embedding in drug similarity estimation, drug-target interaction prediction, adverse drug reaction prediction, and protein function and therapeutic peptide prediction are discussed. Several potential future research directions are also discussed.
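To make the random-walk family of embedding methods more concrete, the sketch below generates uniform random walks over a small graph, in the style of DeepWalk. In such methods the resulting node sequences are typically fed to a skip-gram model to learn the low-dimensional vectors; that training step is omitted here. The function name `random_walks`, its parameters, and the toy drug-target graph are assumptions made for illustration and are not taken from the survey.

```python
import random


def random_walks(adjacency, walk_length=5, walks_per_node=2, seed=0):
    """Generate uniform random walks (DeepWalk-style) over an undirected graph."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adjacency:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical toy drug-target network (node names are illustrative only).
graph = {
    "drug_A": ["target_1", "target_2"],
    "drug_B": ["target_2"],
    "target_1": ["drug_A"],
    "target_2": ["drug_A", "drug_B"],
}
walks = random_walks(graph)
```

Each walk is a list of node names in which consecutive nodes share an edge, so nodes that co-occur in many walks are structurally close in the network.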
