Implicit Skills Extraction Using Document Embedding and Its Use in Job Recommendation

This paper presents a job recommender system to match resumes to job descriptions (JDs), both of which are non-standard and unstructured or semi-structured in form. First, the paper proposes a combination of natural language processing (NLP) techniques for the task of skill extraction. On an industrial-scale dataset, the combined techniques yielded a precision and recall of 0.78 and 0.88, respectively. The paper then introduces the concept of extracting implicit skills, that is, skills which are not explicitly mentioned in a JD but may be implicit in the context of geography, industry or role. To mine and infer implicit skills for a JD, we find other JDs similar to it. This similarity match is done in the semantic space. A Doc2Vec model is trained on 1.1 million JDs covering several domains crawled from the web, and all the JDs are projected onto this semantic space. The skills absent from the JD but present in similar JDs are collected and weighted using several techniques to obtain the final set of implicit skills. Finally, several similarity measures are explored to match the skills extracted from a candidate's resume to the explicit and implicit skills of JDs. Empirical results for matching resumes and JDs demonstrate that the proposed approach gives a mean reciprocal rank of 0.88, an improvement of 29.4% over a baseline method that uses only explicit skills.
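The implicit-skill step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional vectors stand in for Doc2Vec embeddings, the JD identifiers and skill sets are invented, and summing neighbour similarities is only one hypothetical choice among the "several techniques" for weighting that the abstract mentions without detailing.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def implicit_skills(target_id, vectors, skills, k=2):
    """Rank skills that are absent from the target JD but present in its k nearest JDs.

    The score of a skill is the sum of the similarities of the neighbours
    that mention it (an assumed weighting, chosen for illustration).
    """
    target_vec = vectors[target_id]
    neighbours = sorted(
        ((cosine(target_vec, vec), jd_id)
         for jd_id, vec in vectors.items() if jd_id != target_id),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, jd_id in neighbours:
        for skill in skills[jd_id] - skills[target_id]:
            scores[skill] += sim
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy data: hypothetical JD embeddings and explicit skill sets.
vectors = {
    "jd_a": [1.0, 0.0],
    "jd_b": [0.9, 0.1],
    "jd_c": [0.8, 0.3],
    "jd_d": [0.0, 1.0],
}
skills = {
    "jd_a": {"python", "sql"},
    "jd_b": {"python", "sql", "spark"},
    "jd_c": {"python", "spark", "airflow"},
    "jd_d": {"photoshop"},
}

ranked = implicit_skills("jd_a", vectors, skills, k=2)
```

In this toy setup, skills shared by several close neighbours accumulate a higher score, while skills that appear only in a dissimilar JD never enter the candidate set.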


A Comparison of Approaches for Measuring the Semantic Similarity of Short Texts Based on Word Embeddings

Journal of information and organizational sciences ◽

10.31341/jios.44.2.2 ◽

2020 ◽

Vol 44 (2) ◽

pp. 231-246

Author(s):

Karlo Babić ◽

Francesco Guerra ◽

Sanda Martinčić-Ipšić ◽

Ana Meštrović

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Semantic Similarity ◽

Language Processing ◽

Similarity Measures ◽

Vital Role ◽

The Other ◽

Word Embeddings ◽

Spearman Correlation ◽

Word Senses

Measuring the semantic similarity of texts plays a vital role in various natural language processing tasks. In this paper, we describe a set of experiments carried out to evaluate and compare the performance of different approaches for measuring the semantic similarity of short texts. We compare four models based on word embeddings: two variants of Word2Vec (one based on Word2Vec trained on a specific dataset and the second extending it with embeddings of word senses), FastText, and TF-IDF. Since these models provide word vectors, we experiment with various methods that calculate the semantic similarity of short texts from word vectors. More precisely, for each of these models, we test five methods for aggregating word embeddings into a text embedding. We introduce three methods by making variations of two commonly used similarity measures: one is an extension of the cosine similarity based on centroids, and the other two are variations of the Okapi BM25 function. We evaluate all approaches on two publicly available datasets, SICK and Lee, in terms of the Pearson and Spearman correlation. The results indicate that the extended methods perform better than the original ones in most cases.
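As a rough illustration of aggregating word vectors into a text embedding, the sketch below computes the plain centroid-based cosine similarity, which is the baseline measure the abstract says was extended, not the extended variant itself. The three-dimensional toy embeddings and the whitespace tokenizer are assumptions made for the example.

```python
import math


def centroid(tokens, embeddings):
    """Average the vectors of the tokens that have an embedding."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def text_similarity(text_a, text_b, embeddings):
    """Cosine similarity of the two texts' centroid vectors."""
    centroid_a = centroid(text_a.lower().split(), embeddings)
    centroid_b = centroid(text_b.lower().split(), embeddings)
    if centroid_a is None or centroid_b is None:
        return 0.0
    return cosine(centroid_a, centroid_b)


# Hypothetical word vectors; a real system would load Word2Vec or FastText.
toy_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.0],
    "sleeps": [0.1, 0.9, 0.0],
    "naps": [0.2, 0.8, 0.0],
    "stock": [0.0, 0.1, 0.9],
    "falls": [0.1, 0.0, 0.9],
}

sim_close = text_similarity("cat sleeps", "kitten naps", toy_embeddings)
sim_far = text_similarity("cat sleeps", "stock falls", toy_embeddings)
```

Because the centroid treats every word equally, frequent but uninformative words can dominate it, which is one motivation for weighted variants such as those based on BM25.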


Using NLP for Fact Checking: A Survey

Designs ◽

10.3390/designs5030042 ◽

2021 ◽

Vol 5 (3) ◽

pp. 42

Author(s):

Eric Lazarski ◽

Mahmood Al-Khassaweneh ◽

Cynthia Howard

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Computer Science ◽

Language Processing ◽

The Internet ◽

Fake News ◽

Fact Checking ◽

The Many ◽

Human Powered ◽

The Web

In recent years, disinformation and “fake news” have been spreading throughout the internet at rates never seen before. This has created the need for fact-checking organizations, groups that seek out claims and comment on their veracity, to spawn worldwide to stem the tide of misinformation. However, even with the many human-powered fact-checking organizations that are currently in operation, disinformation continues to run rampant throughout the Web, and the existing organizations are unable to keep up. This paper discusses in detail recent advances in computer science to use natural language processing to automate fact checking. It follows the entire process of automated fact checking using natural language processing, from detecting claims to fact checking to outputting results. In summary, automated fact checking works well in some cases, though generalized fact checking still needs improvement prior to widespread use.


Ontology Matching using BabelNet Dictionary and Word Sense Disambiguation Algorithms

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v5.i1.pp196-205 ◽

2017 ◽

Vol 5 (1) ◽

pp. 196 ◽

Cited By ~ 5

Author(s):

Mohamed Biniz ◽

Rachid El Ayachi ◽

Mohamed Fakir

Keyword(s):

Natural Language Processing ◽

Language Processing ◽

Word Sense Disambiguation ◽

Similarity Measures ◽

Ontology Matching ◽

Word Sense ◽

Sense Disambiguation ◽

Lesk Algorithm ◽

Reference Ontology ◽

Selection Of

Ontology matching refers to two things: first, the process of discovering correspondences between two different ontologies, and second, the result of this process, that is, the expression of those correspondences. It is a crucial task for solving problems of merging and evolving heterogeneous ontologies in Semantic Web applications. The domain poses several challenges, among them the selection of appropriate similarity measures to discover the correspondences. In this article, we study algorithms that calculate semantic similarity between two ontologies, namely the Adapted Lesk algorithm, the Wu & Palmer algorithm, the Resnik algorithm, the Leacock and Chodorow algorithm, and similarity flooding, using BabelNet as the reference ontology; we implement them and compare them experimentally. Overall, the most effective methods are Wu & Palmer and Adapted Lesk, the latter being widely used for Word Sense Disambiguation (WSD) in the field of Natural Language Processing (NLP).
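For readers unfamiliar with the Wu & Palmer measure named above, the following sketch computes it on a tiny hand-made taxonomy. The taxonomy is hypothetical and stands in for BabelNet, and the root is assigned depth 1, which is one common convention; the paper's own implementation may differ.

```python
# Hypothetical is-a taxonomy: each concept maps to its parent (None for the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def path_to_root(node):
    """Return the list of concepts from node up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth of a concept, counting the root as depth 1."""
    return len(path_to_root(node))


def lowest_common_subsumer(a, b):
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_a:
            return node
    raise ValueError("concepts do not share a root")


def wu_palmer(a, b):
    """Wu & Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


sim_siblings = wu_palmer("dog", "cat")   # share the parent "animal"
sim_distant = wu_palmer("dog", "car")    # share only the root "entity"
```

The score rises as the shared ancestor sits deeper in the hierarchy, so sibling concepts score higher than concepts that meet only at the root.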


Research on Sustainable Mining Engineering

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.340.126 ◽

2013 ◽

Vol 340 ◽

pp. 126-130 ◽

Cited By ~ 2

Author(s):

Xiao Guang Yue ◽

Guang Zhang ◽

Qing Guo Ren ◽

Wen Cheng Liao ◽

Jing Xi Chen ◽

...

Keyword(s):

Information Processing ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Social Economy ◽

Scientific Research ◽

The Other ◽

Mining Engineering ◽

Chinese Information Processing ◽

Development Tendency

The concepts of Chinese information processing and natural language processing (NLP) and their development trends are summarized. Chinese information processing and natural language processing are understood differently in China and in other countries, but the work appears to center on the study of the key points of language processing. Mining engineering is very important for our country. Although the ultimate task of language processing is difficult, Chinese information processing has contributed substantially to our scientific research and social economy, and it will play an important part in mining engineering in the future.


What counts as an exemplar model, anyway? A commentary on Ambridge (2020)

10.31234/osf.io/ut86f ◽

2020 ◽

Author(s):

Kyle Mahowald ◽

George Kachergis ◽

Michael C. Frank

Keyword(s):

Neural Networks ◽

Natural Language Processing ◽

Language Processing ◽

The Other ◽

Training Data ◽

Exemplar Model ◽

Exemplar Models ◽

Modern Natural ◽

Architectural Constraints ◽

Future Work

Ambridge (2019) calls for exemplar-based accounts of language acquisition. Do modern neural networks such as transformers or word2vec – which have been extremely successful in modern natural language processing (NLP) applications – count? Although these models often have ample parametric complexity to store exemplars from their training data, they also go far beyond simple storage by processing and compressing the input via their architectural constraints. The resulting representations have been shown to encode emergent abstractions. If these models are exemplar-based then Ambridge’s theory only weakly constrains future work. On the other hand, if these systems are not exemplar models, why is it that true exemplar models are not contenders in modern NLP?


Natural Language to SQL query Generation

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.35804 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 5069-5072

Author(s):

Kiran Raj R

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

English Language ◽

Regular Expression ◽

Parts Of Speech ◽

Query Generation ◽

Sql Query ◽

Speech Tagging ◽

The Web

Today, everyone has a personal device to access the web, and every user tries to find the knowledge they require through the internet. Most of this knowledge is stored in the form of a database. A user with limited knowledge of databases will have difficulty accessing that data. Hence, there is a need for a system that lets such users access the knowledge within the database. The proposed method is to develop a system that takes natural language as input and produces an SQL query, which is then used to access the database and retrieve the information with ease. Tokenization, parts-of-speech tagging, lemmatization, parsing and mapping are the steps involved in the process. The proposed project gives a view of using Natural Language Processing (NLP) and of mapping an English-language query to SQL in accordance with regular expressions.
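The final mapping step, from an English request to SQL via a regular expression, might look roughly like the sketch below. It covers only a single hypothetical sentence pattern and skips the tokenization, tagging, lemmatization and parsing stages; the pattern, the operator table and the function name `to_sql` are all assumptions for illustration, not the paper's system.

```python
import re

# One assumed sentence shape: "<verb> <column> of|from <table> [where <field> <op> <value>]".
PATTERN = re.compile(
    r"^(?:show|list|get)\s+(?P<column>\w+)\s+(?:of|from)\s+(?P<table>\w+)"
    r"(?:\s+where\s+(?P<field>\w+)\s+"
    r"(?P<op>is greater than|is less than|is)\s+(?P<value>\w+))?$",
    re.IGNORECASE,
)

# English comparison phrases mapped to SQL operators.
OPERATORS = {"is greater than": ">", "is less than": "<", "is": "="}


def to_sql(question):
    """Translate a question matching PATTERN into a SQL string, else return None."""
    match = PATTERN.match(question.strip())
    if match is None:
        return None
    column = "*" if match["column"].lower() == "all" else match["column"]
    sql = f"SELECT {column} FROM {match['table']}"
    if match["field"]:
        value = match["value"]
        # Numbers are emitted bare; other words are quoted as string literals.
        # The pattern only admits word characters, but a real system should
        # still use parameterized queries rather than string building.
        literal = value if value.isdigit() else f"'{value}'"
        sql += f" WHERE {match['field']} {OPERATORS[match['op'].lower()]} {literal}"
    return sql + ";"


query_filtered = to_sql("show name of students where age is greater than 20")
query_all = to_sql("list all from courses")
```

A single pattern like this is brittle, which is why the pipeline described above normalizes the input linguistically before the mapping step.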


A Topological Method for Comparing Document Semantics

10.5121/csit.2020.101411 ◽

2020 ◽

Author(s):

Yuqi Kong ◽

Fanchao Meng ◽

Ben Carterette

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Language Processing ◽

State Of The Art ◽

Vector Space Model ◽

The Other ◽

Space Model ◽

Topological Persistence ◽

Art Methods ◽

Novel Algorithm

Comparing document semantics is one of the toughest tasks in both Natural Language Processing and Information Retrieval. To date, tools for this task are still rare. Moreover, most relevant methods are devised from a statistical or vector space model perspective, and almost none from a topological perspective. In this paper, we take a different approach. We propose a novel algorithm based on topological persistence for comparing the semantic similarity of two documents. Our experiments are conducted on a document dataset with human judges’ results. A collection of state-of-the-art methods is selected for comparison. The experimental results show that our algorithm produces highly human-consistent results and beats most state-of-the-art methods, though it ties with NLTK.


Multi-Sense Embeddings per Word

10.31219/osf.io/udfhn ◽

2020 ◽

Author(s):

Masashi Sugiyama

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Research Area ◽

Word Embedding ◽

The Other ◽

Word Embeddings ◽

Word Similarity ◽

Better Than ◽

Non Parametric

Recently, word embeddings have been used successfully in many natural language processing problems, and how to train a robust and accurate word embedding system efficiently is a popular research area. Since many, if not all, words have more than one sense, it is necessary to learn a separate vector for each sense of a word. Therefore, in this project, we explore two multi-sense word embedding models: the Multi-Sense Skip-gram (MSSG) model and the Non-parametric Multi-Sense Skip-gram (NP-MSSG) model. Furthermore, we propose an extension of the Multi-Sense Skip-gram model, called the Incremental Multi-Sense Skip-gram (IMSSG) model, which can learn the vectors of all senses of each word incrementally. We evaluate all the systems on a word similarity task and show that IMSSG performs better than the other models.


Natural Language Processing in Serious Games: A state of the art.

International Journal of Serious Games ◽

10.17083/ijsg.v2i3.87 ◽

2015 ◽

Vol 2 (3) ◽

Cited By ~ 5

Author(s):

Davide Picca ◽

Dominique Jaccard ◽

Gérald Eberlé

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Serious Games ◽

State Of The Art ◽

Serious Game ◽

The Other ◽

Other Hand ◽

The One ◽

High Level

In the last decades, Natural Language Processing (NLP) has achieved a high level of success. Interactions between NLP and Serious Games have begun, and some Serious Games already include NLP techniques. The objectives of this paper are twofold: on the one hand, providing a simple framework to enable analysis of potential uses of NLP in Serious Games and, on the other hand, applying the NLP framework to existing Serious Games and giving an overview of the use of NLP in pedagogical Serious Games. In this paper we present 11 serious games exploiting NLP techniques. We present them systematically, according to the following structure: first, we highlight possible uses of NLP techniques in Serious Games; second, we describe the type of NLP implemented in each specific Serious Game; and third, we provide a link to possible purposes of use for the different actors interacting in the Serious Game.


Using of Natural Language Processing Techniques in Suicide Research

Emerging Science Journal ◽

10.28991/esj-2017-01120 ◽

2017 ◽

Vol 1 (2) ◽

pp. 89 ◽

Cited By ~ 1

Author(s):

Azam Orooji ◽

Mostafa Langarizadeh

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Medical Information ◽

Inclusion Criteria ◽

Data Set ◽

Completed Suicide ◽

Teenagers And Young Adults ◽

Processing Techniques ◽

The Web

It is estimated that each year many people worldwide, most of whom are teenagers and young adults, die by suicide. Suicide receives special attention, with many countries developing national strategies for prevention. Since more medical information is available in text, preventing the growing trend of suicide in communities requires analyzing various textual resources, such as patient records, information on the web, or questionnaires. For this purpose, this study systematically reviews recent studies on the use of natural language processing techniques in relation to the health of people who have completed suicide or are at risk. After an electronic search of the PubMed and ScienceDirect databases and a review of the articles by two reviewers, 21 articles matched the inclusion criteria. This study revealed that, if a suitable data set is available, natural language processing techniques are well suited to various types of suicide-related research.
