scholarly journals Implicit Skills Extraction Using Document Embedding and Its Use in Job Recommendation

2020 ◽  
Vol 34 (08) ◽  
pp. 13286-13293
Author(s):  
Akshay Gugnani ◽  
Hemant Misra

This paper presents a job recommender system to match resumes to job descriptions (JD), both of which are non-standard and unstructured/semi-structured in form. First, the paper proposes a combination of natural language processing (NLP) techniques for the task of skill extraction. The performance of the combined techniques on an industrial scale dataset yielded a precision and recall of 0.78 and 0.88 respectively. The paper then introduces the concept of extracting implicit skills – the skills which are not explicitly mentioned in a JD but may be implicit in the context of geography, industry or role. To mine and infer implicit skills for a JD, we find the other JDs similar to this JD. This similarity match is done in the semantic space. A Doc2Vec model is trained on 1.1 Million JDs covering several domains crawled from the web, and all the JDs are projected onto this semantic space. The skills absent in the JD but present in similar JDs are obtained, and the obtained skills are weighted using several techniques to obtain the set of final implicit skills. Finally, several similarity measures are explored to match the skills extracted from a candidate's resume to explicit and implicit skills of JDs. Empirical results for matching resumes and JDs demonstrate that the proposed approach gives a mean reciprocal rank of 0.88, an improvement of 29.4% when compared to the performance of a baseline method that uses only explicit skills.

2020 ◽  
Vol 44 (2) ◽  
pp. 231-246
Author(s):  
Karlo Babić ◽  
Francesco Guerra ◽  
Sanda Martinčić-Ipšić ◽  
Ana Meštrović

Measuring the semantic similarity of texts has a vital role in various tasks from the field of natural language processing. In this paper, we describe a set of experiments we carried out to evaluate and compare the performance of different approaches for measuring the semantic similarity of short texts. We perform a comparison of four models based on word embeddings: two variants of Word2Vec (one based on Word2Vec trained on a specific dataset and the second extending it with embeddings of word senses), FastText, and TF-IDF. Since these models provide word vectors, we experiment with various methods that calculate the semantic similarity of short texts based on word vectors. More precisely, for each of these models, we test five methods for aggregating word embeddings into text embedding. We introduced three methods by making variations of two commonly used similarity measures. One method is an extension of the cosine similarity based on centroids, and the other two methods are variations of the Okapi BM25 function. We evaluate all approaches on the two publicly available datasets: SICK and Lee in terms of the Pearson and Spearman correlation. The results indicate that extended methods perform better from the original in most of the cases.


Designs ◽  
2021 ◽  
Vol 5 (3) ◽  
pp. 42
Author(s):  
Eric Lazarski ◽  
Mahmood Al-Khassaweneh ◽  
Cynthia Howard

In recent years, disinformation and “fake news” have been spreading throughout the internet at rates never seen before. This has created the need for fact-checking organizations, groups that seek out claims and comment on their veracity, to spawn worldwide to stem the tide of misinformation. However, even with the many human-powered fact-checking organizations that are currently in operation, disinformation continues to run rampant throughout the Web, and the existing organizations are unable to keep up. This paper discusses in detail recent advances in computer science to use natural language processing to automate fact checking. It follows the entire process of automated fact checking using natural language processing, from detecting claims to fact checking to outputting results. In summary, automated fact checking works well in some cases, though generalized fact checking still needs improvement prior to widespread use.


Author(s):  
Mohamed Biniz ◽  
Rachid El Ayachi ◽  
Mohamed Fakir

<p>Ontology matching is a discipline that means two things: first, the process of discovering correspondences between two different ontologies, and second is the result of this process, that is to say the expression of correspondences. This discipline is a crucial task to solve problems merging and evolving of heterogeneous ontologies in applications of the Semantic Web. This domain imposes several challenges, among them, the selection of appropriate similarity measures to discover the correspondences. In this article, we are interested to study algorithms that calculate the semantic similarity by using Adapted Lesk algorithm, Wu &amp; Palmer Algorithm, Resnik Algorithm, Leacock and Chodorow Algorithm, and similarity flooding between two ontologies and BabelNet as reference ontology, we implement them, and compared experimentally. Overall, the most effective methods are Wu &amp; Palmer and Adapted Lesk, which is widely used for Word Sense Disambiguation (WSD) in the field of Automatic Natural Language Processing (NLP).</p>


2013 ◽  
Vol 340 ◽  
pp. 126-130 ◽  
Author(s):  
Xiao Guang Yue ◽  
Guang Zhang ◽  
Qing Guo Ren ◽  
Wen Cheng Liao ◽  
Jing Xi Chen ◽  
...  

The concepts of Chinese information processing and natural language processing (NLP) and their development tendency are summarized. There are different comprehension of Chinese information processing and natural language processing in China and the other countries. But the work appears to emerge in the study of key point of languages processing. Mining engineering is very important for our country. Though the final task of languages processing is difficult, Chinese information processing has contributed substantially to our scientific research and social economy and it will play an important part for mining engineering in our future.


2020 ◽  
Author(s):  
Kyle Mahowald ◽  
George Kachergis ◽  
Michael C. Frank

Ambridge (2019) calls for exemplar-based accounts of language acquisition. Do modern neural networks such as transformers or word2vec – which have been extremely successful in modern natural language processing (NLP) applications – count? Although these models often have ample parametric complexity to store exemplars from their training data, they also go far beyond simple storage by processing and compressing the input via their architectural constraints. The resulting representations have been shown to encode emergent abstractions. If these models are exemplar-based then Ambridge’s theory only weakly constrains future work. On the other hand, if these systems are not exemplar models, why is it that true exemplar models are not contenders in modern NLP?


Author(s):  
Kiran Raj R

Today, everyone has a personal device to access the web. Every user tries to access the knowledge that they require through internet. Most of the knowledge is within the sort of a database. A user with limited knowledge of database will have difficulty in accessing the data in the database. Hence, there’s a requirement for a system that permits the users to access the knowledge within the database. The proposed method is to develop a system where the input be a natural language and receive an SQL query which is used to access the database and retrieve the information with ease. Tokenization, parts-of-speech tagging, lemmatization, parsing and mapping are the steps involved in the process. The project proposed would give a view of using of Natural Language Processing (NLP) and mapping the query in accordance with regular expression in English language to SQL.


2020 ◽  
Author(s):  
Yuqi Kong ◽  
Fanchao Meng ◽  
Ben Carterette

Comparing document semantics is one of the toughest tasks in both Natural Language Processing and Information Retrieval. To date, on one hand, the tools for this task are still rare. On the other hand, most relevant methods are devised from the statistic or the vector space model perspectives but nearly none from a topological perspective. In this paper, we hope to make a different sound. A novel algorithm based on topological persistence for comparing semantics similarity between two documents is proposed. Our experiments are conducted on a document dataset with human judges’ results. A collection of state-of-the-art methods are selected for comparison. The experimental results show that our algorithm can produce highly human-consistent results, and also beats most state-of-the-art methods though ties with NLTK.


2020 ◽  
Author(s):  
Masashi Sugiyama

Recently, word embeddings have been used in many natural language processing problems successfully and how to train a robust and accurate word embedding system efficiently is a popular research area. Since many, if not all, words have more than one sense, it is necessary to learn vectors for all senses of word separately. Therefore, in this project, we have explored two multi-sense word embedding models, including Multi-Sense Skip-gram (MSSG) model and Non-parametric Multi-sense Skip Gram model (NP-MSSG). Furthermore, we propose an extension of the Multi-Sense Skip-gram model called Incremental Multi-Sense Skip-gram (IMSSG) model which could learn the vectors of all senses per word incrementally. We evaluate all the systems on word similarity task and show that IMSSG is better than the other models.


Author(s):  
Davide Picca ◽  
Dominique Jaccard ◽  
Gérald Eberlé

In the last decades, Natural Language Processing (NLP) has obtained a high level of success. Interactions between NLP and Serious Games have started and some of them already include NLP techniques. The objectives of this paper are twofold: on the one hand, providing a simple framework to enable analysis of potential uses of NLP in Serious Games and, on the other hand, applying the NLP framework to existing Serious Games and giving an overview of the use of NLP in pedagogical Serious Games. In this paper we present 11 serious games exploiting NLP techniques. We present them systematically, according to the following structure:  first, we highlight possible uses of NLP techniques in Serious Games, second, we describe the type of NLP implemented in the each specific Serious Game and, third, we provide a link to possible purposes of use for the different actors interacting in the Serious Game.


2017 ◽  
Vol 1 (2) ◽  
pp. 89 ◽  
Author(s):  
Azam Orooji ◽  
Mostafa Langarizadeh

It is estimated that each year many people, most of whom are teenagers and young adults die by suicide worldwide. Suicide receives special attention with many countries developing national strategies for prevention. Since, more medical information is available in text, Preventing the growing trend of suicide in communities requires analyzing various textual resources, such as patient records, information on the web or questionnaires. For this purpose, this study systematically reviews recent studies related to the use of natural language processing techniques in the area of people’s health who have completed suicide or are at risk. After electronically searching for the PubMed and ScienceDirect databases and studying articles by two reviewers, 21 articles matched the inclusion criteria. This study revealed that, if a suitable data set is available, natural language processing techniques are well suited for various types of suicide related research.


Sign in / Sign up

Export Citation Format

Share Document