Talent Acquisition Process Optimization Using Machine Learning in Resumes’ Ranking and Matching to Job Descriptions

2021 ◽  
Author(s):  
Mohammed Alghazal

Abstract Employers commonly use time-consuming screening tools or online matching engines, driven by manual rules and predefined keywords, to search for potential job applicants. Such traditional techniques have not kept pace with the new digital revolution in machine learning and big data analytics. This paper presents artificial intelligence solutions for ranking resumes and matching CVs to job descriptions. Open-source resumes and job description documents were used to construct and validate the machine learning models in this paper. Documents were converted to images and processed via Google Cloud using an Optical Character Recognition (OCR) algorithm to extract the text from all resume and job description documents, with more than 97% accuracy. Prior to modeling, the extracted text was processed via a series of Natural Language Processing (NLP) techniques: splitting/tokenizing common words, grouping together inflected forms of words (lemmatization), and removing stop words and punctuation marks. After text processing, resumes were trained using an unsupervised machine learning algorithm, Latent Dirichlet Allocation (LDA), for topic modeling and categorization. Given the type of resumes used, the algorithm was able to categorize them into four main job sectors: marketing and business, engineering, computer science/IT, and health. Each resume was assigned a score representing its maximum LDA probability, used for ranking. A more advanced deep learning algorithm, Doc2Vec, was also used to train and match potential resumes to relevant job descriptions. In this model, each resume is represented by a unique vector that can be used to group similar documents and to match and retrieve resumes related to a given job description document provided by HR. The similarity between each resume and the given job description file is measured to query the top job candidates. 
The model was tested against several job description files related to engineering, IT, and human resources, and was able to identify the top-ranking resumes from several hundred trained resumes. This paper presents an innovative method for processing, categorizing, and ranking resumes using advanced computational models empowered by the latest fourth industrial revolution technologies. This solution benefits both job seekers and employers, providing an efficient and unbiased data-driven method for finding the top applicants for a given job.
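The matching step above rests on comparing document vectors with a similarity measure. As a rough illustration of that idea only — the paper trains Doc2Vec embeddings, whereas this sketch substitutes simple term-frequency vectors and cosine similarity — resumes could be ranked against a job description like this:

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase, split on non-letters, and drop very short tokens."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if len(t) > 2]

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_resumes(job_description, resumes):
    """Return (resume_id, score) pairs sorted by similarity to the job description."""
    jd_vec = Counter(tokenize(job_description))
    scores = {rid: cosine(jd_vec, Counter(tokenize(text))) for rid, text in resumes.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Doc2Vec would replace the `Counter` vectors with learned dense embeddings, but the ranking-by-similarity logic is the same.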

Author(s):  
Yaseen Khather Yaseen ◽  
Alaa Khudhair Abbas ◽  
Ahmed M. Sana

Today, images are a part of communication between people. However, images are also used to share information by hiding and embedding messages within them, and images received through social media or email can contain harmful content that users cannot see and are therefore unaware of. This paper presents a model for detecting spam in images. The model combines optical character recognition, natural language processing, and a machine learning algorithm. Optical character recognition extracts the text from images, and natural language processing uses linguistic capabilities to detect and classify the language, distinguishing between normal text and slang. Features for the selected images are then extracted using the bag-of-words model, and the machine learning algorithm is run to detect any spam the image may contain. Finally, the model predicts whether or not the image contains harmful content. The results show that the proposed method, combining a machine learning algorithm with optical character recognition and natural language processing, provides higher detection accuracy than using machine learning alone.
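The bag-of-words step mentioned above maps the OCR-extracted text to a fixed-length feature vector that a classifier can consume. A minimal sketch, using a hypothetical spam-indicator vocabulary (the paper's actual feature set is not specified):

```python
from collections import Counter

# Hypothetical vocabulary for illustration only.
VOCAB = ["free", "winner", "click", "prize", "urgent", "meeting", "report"]

def bag_of_words(text, vocab=VOCAB):
    """Map extracted OCR text to a fixed-length vector of term counts,
    one position per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]
```

The resulting vectors would then be fed to any standard classifier to label the image as spam or not.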


2020 ◽  
Vol 7 (10) ◽  
pp. 380-389
Author(s):  
Asogwa D.C ◽  
Anigbogu S.O ◽  
Anigbogu G.N ◽  
Efozia F.N

Author age prediction is the task of determining an author's age by studying the texts they have written. Predicting an author's age can be enlightening about the trends, opinions, and social and political views of an age group. Marketers often use this to promote a product or service to an age group in line with its conveyed interests and opinions. Methodologies in natural language processing have made it possible to predict an author's age from text by examining the variation of linguistic characteristics, and many machine learning algorithms have been applied to the task. However, in social networks, computational linguists face numerous issues, just as performance-driven machine learning techniques have their own challenges in realistic scenarios. This work developed a model that predicts an author's age from text with a machine learning algorithm (Naïve Bayes) using three types of features: content-based, style-based, and topic-based. The trained model gave a prediction accuracy of 80%.
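Of the three feature types named above, style-based features are the easiest to illustrate. The sketch below computes a few stylometric measures commonly used in author profiling; the abstract does not list the exact features the authors chose, so these are representative examples only:

```python
import re

def style_features(text):
    """Representative style-based features for author profiling:
    average word length, average sentence length, and punctuation rate."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    punct = sum(1 for c in text if c in ".,;:!?")
    return {
        "avg_word_len": sum(map(len, words)) / len(words) if words else 0.0,
        "avg_sent_len": len(words) / len(sentences) if sentences else 0.0,
        "punct_rate": punct / len(text) if text else 0.0,
    }
```

Feature dictionaries of this kind, combined with content and topic features, would form the input to the Naïve Bayes classifier.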


2021 ◽  
Author(s):  
Mustapha Abba ◽  
Chidozie Nduka ◽  
Seun Anjorin ◽  
Shukri Mohamed ◽  
Emmanuel Agogo ◽  
...  

BACKGROUND Published hypertension research has grown over the last decade due to scientific and technical advancements in the field. Given the huge amount of scientific material published in this field, identifying the relevant information is difficult. We employed topic modelling, a powerful approach for extracting useful information from enormous amounts of unstructured text. OBJECTIVE To utilize a machine learning algorithm to uncover hidden topics and subtopics from 100 years of peer-reviewed hypertension publications and identify temporal trends. METHODS The titles and abstracts of hypertension papers indexed in PubMed were examined. We used the Latent Dirichlet Allocation (LDA) model to select 20 primary topics and then ran a trend analysis to see how their popularity changed over time. RESULTS We gathered 581,750 hypertension-related research articles from 1900 to 2018 and divided them into 20 categories. The publications were categorised into preclinical, risk factor, complication, and therapy studies. We identified themes that were becoming increasingly ‘hot’, themes that were going ‘cold’, and themes that were seldom published on. Topics concerning risk factors and major cardiovascular events displayed very dynamic patterns over time. The majority of the articles (71.2%) had a negative valency, followed by positive (20.6%) and neutral (8.2%) valencies. Between 1980 and 2000, negative sentiment articles declined somewhat, while positive and neutral sentiment articles climbed significantly. CONCLUSIONS This machine learning methodology provided valuable insights into current hypertension research trends. The method allows researchers to discover study subjects and shifts in study focus and, in the end, captures the broader picture of the primary concepts in current hypertension research articles. CLINICALTRIAL Not applicable
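The trend analysis described here amounts to tracking each topic's share of the literature year by year. A minimal sketch, assuming each article is assigned its single highest-probability LDA topic (the paper does not specify its exact trend computation):

```python
from collections import defaultdict

def topic_trends(assignments):
    """Given (year, topic) pairs, one per article, return
    {topic: {year: share}} where share is the topic's fraction
    of all articles published that year."""
    per_year = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for year, topic in assignments:
        per_year[topic][year] += 1
        totals[year] += 1
    return {t: {y: n / totals[y] for y, n in ys.items()} for t, ys in per_year.items()}
```

A rising share over successive years would mark a topic as ‘hot’, a falling share as going ‘cold’.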


Author(s):  
Subhadra Dutta ◽  
Eric M. O’Rourke

Natural language processing (NLP) is the field of decoding human written language. This chapter responds to the growing interest in using machine learning–based NLP approaches for analyzing open-ended employee survey responses. These techniques address scalability and the ability to provide real-time insights to make qualitative data collection equally or more desirable in organizations. The chapter walks through the evolution of text analytics in industrial–organizational psychology and discusses relevant supervised and unsupervised machine learning NLP methods for survey text data, such as latent Dirichlet allocation, latent semantic analysis, sentiment analysis, word relatedness methods, and so on. The chapter also lays out preprocessing techniques and the trade-offs of growing NLP capabilities internally versus externally, points the readers to available resources, and ends with discussing implications and future directions of these approaches.


Author(s):  
Karthikeyan P. ◽  
Karunakaran Velswamy ◽  
Pon Harshavardhanan ◽  
Rajagopal R. ◽  
JeyaKrishnan V. ◽  
...  

Machine learning is the part of artificial intelligence that enables machines to learn without being explicitly programmed. Machine learning applications underpin much of the modern world. Machine learning techniques fall into three main categories: supervised, unsupervised, and semi-supervised. Machine learning is an interdisciplinary field that can be applied in many areas, including science, business, and research. Supervised techniques are applied in agriculture, email spam and malware filtering, online fraud detection, optical character recognition, natural language processing, and face detection. Unsupervised techniques are applied in market segmentation, sentiment analysis, and anomaly detection. Deep learning is being utilized for sound, image, video, time series, and text. This chapter covers applications of various machine learning techniques in areas such as social media, agriculture, and task scheduling in distributed systems.


2020 ◽  
Vol 25 (4) ◽  
pp. 174-189 ◽  
Author(s):  
Guillaume  Palacios ◽  
Arnaud Noreña ◽  
Alain Londero

Introduction: Subjective tinnitus (ST) and hyperacusis (HA) are common auditory symptoms that may become incapacitating in a subgroup of patients who thereby seek medical advice. Both conditions can result from many different mechanisms, and as a consequence, patients may report a vast repertoire of associated symptoms and comorbidities that can dramatically reduce quality of life and even lead to suicide attempts in the most severe cases. The present exploratory study investigates patients’ symptoms and complaints using an in-depth statistical analysis of patients’ natural narratives in a real-life environment in which, thanks to the anonymization of contributions and the peer-to-peer interaction, the wording used is presumed to be free of any self-limitation and self-censorship. Methods: We applied a purely statistical, non-supervised machine learning approach to the analysis of patients’ verbatim posts exchanged on an Internet forum. After automated data extraction, the dataset was preprocessed to make it suitable for statistical analysis. We used a variant of the Latent Dirichlet Allocation (LDA) algorithm to reveal clusters of symptoms and complaints of HA patients (topics). The probability distribution of words within a topic uniquely characterizes it. Convergence of the log-likelihood of the LDA model was reached after 2,000 iterations. Several statistical parameters were tested for topic modeling and for the word relevance factor within each topic. Results: Despite a rather small dataset, this exploratory study demonstrates that patients’ free speech available on the Internet constitutes valuable material for machine learning and statistical analysis aimed at categorizing ST/HA complaints. The LDA model with K = 15 topics seems the most relevant in terms of relative weights and correlations, with the capability to individualize subgroups of patients displaying specific characteristics. 
The study of the relevance factor may be useful to unveil weak but important signals present in patients’ narratives. Discussion/Conclusion: We argue that the LDA non-supervised approach permits gaining knowledge of the patterns of ST- and HA-related complaints and of patient-centered domains of interest. The merits and limitations of the LDA algorithm are compared with other natural language processing methods and with more conventional methods of qualitative analysis of patients’ output. Future directions and research topics emerging from this algorithmic analysis are proposed.
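The "relevance factor" for ranking words within a topic is not defined in the abstract; a widely used definition is the one from Sievert and Shirley's LDAvis work, which interpolates between a word's in-topic probability and its lift over its corpus-wide probability. A minimal sketch under that assumption:

```python
import math

def relevance(p_w_given_t, p_w, lam=0.6):
    """Term relevance within a topic (Sievert & Shirley, 2014):
    lambda * log p(w|t) + (1 - lambda) * log(p(w|t) / p(w)).
    Lower lambda boosts words distinctive to the topic even if rare,
    which is one way to surface the 'weak but important signals'
    the authors mention."""
    return lam * math.log(p_w_given_t) + (1 - lam) * math.log(p_w_given_t / p_w)
```

Ranking each topic's vocabulary by this score, at several values of lambda, is one concrete way to carry out the relevance-factor analysis the abstract describes.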


E-commerce is evolving at such a rapid pace that new doors have opened for people to express their emotions towards products. The opinions of customers play an important role on e-commerce sites, and it is practically a tedious job to analyze those opinions and compile pros and cons for each product. This paper develops a solution through machine learning algorithms by pre-processing reviews based on features of mobile products. It focuses mainly on the aspect level of opinions, using SentiWordNet, Natural Language Processing, and aggregate scores to analyze text reviews. The experimental results, obtained with the Naive Bayes algorithm, provide a visual representation of products, including each product's strengths and weaknesses, which gives a better understanding of product reviews than reading through long textual reviews. These results also help e-commerce vendors to overcome the weaknesses of their products and meet customer expectations.
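Aspect-level scoring of the kind described above pairs a sentiment lexicon with aspect keywords and aggregates scores per aspect. The sketch below uses a tiny hypothetical lexicon in the style of SentiWordNet (the real resource scores WordNet synsets, not bare words) and hypothetical aspect keyword lists:

```python
# Hypothetical (positive, negative) scores; illustration only.
LEXICON = {"great": (0.75, 0.0), "poor": (0.0, 0.625), "fast": (0.5, 0.0), "slow": (0.0, 0.5)}

# Hypothetical aspects of a mobile product and their trigger keywords.
ASPECTS = {"battery": ["battery", "charge"], "camera": ["camera", "photo"]}

def aspect_scores(reviews):
    """Aggregate net sentiment (positive minus negative) per product aspect,
    counting a review toward an aspect if it mentions one of its keywords."""
    scores = {a: 0.0 for a in ASPECTS}
    for review in reviews:
        words = review.lower().split()
        for aspect, keywords in ASPECTS.items():
            if any(k in words for k in keywords):
                scores[aspect] += sum(
                    LEXICON.get(w, (0.0, 0.0))[0] - LEXICON.get(w, (0.0, 0.0))[1]
                    for w in words
                )
    return scores
```

The per-aspect totals are what a visual summary (e.g. a bar chart of strengths and weaknesses) would be built from.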


Author(s):  
Salam Ayad Hussein ◽  
Mohsin Raad Kareem

Sentiment analysis is studied within natural language processing. It is instrumental in finding the sentiment (feeling) or opinion (idea) hidden within a text. This research focuses on finding sentiments in a “text image” and then classifying them as desirable or undesirable. These phrases and words reflect people’s perspectives on services, products, governments, and social media events. In this study, an optical character recognition (OCR) algorithm was used, which is a classification procedure for visual patterns that appear in the form of a digital image. Moreover, the Naïve Bayes machine learning algorithm was employed to classify these texts. The two algorithms form a hybrid system that supports our needs, especially in this era of technological advances and the frequent use of websites and sharing of text images through the internet. Finally, the novelty of this work lies in dealing with Arabic-language texts that have been transformed into images, extracted from a URL address, and then classified into desirable and undesirable content.
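The classification half of such a hybrid system is a standard Naïve Bayes text classifier applied to the OCR output. A minimal multinomial Naïve Bayes sketch with add-one smoothing — not the authors' implementation, and operating on already-extracted text rather than images:

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over whitespace tokens."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        def log_prob(label):
            counts = self.word_counts[label]
            total = sum(counts.values())
            prior = math.log(self.class_counts[label] / sum(self.class_counts.values()))
            return prior + sum(
                math.log((counts[w] + 1) / (total + len(self.vocab)))
                for w in text.lower().split()
            )
        return max(self.class_counts, key=log_prob)
```

In the paper's pipeline, the training texts would be Arabic strings produced by the OCR stage, labeled desirable or undesirable.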


10.2196/23957 ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. e23957
Author(s):  
Chengda Zheng ◽  
Jia Xue ◽  
Yumin Sun ◽  
Tingshao Zhu

Background During the COVID-19 pandemic in Canada, Prime Minister Justin Trudeau provided updates on the novel coronavirus and the government’s responses to the pandemic in his daily briefings from March 13 to May 22, 2020, delivered on the official Canadian Broadcasting Corporation (CBC) YouTube channel. Objective The aim of this study was to examine comments on Canadian Prime Minister Trudeau’s COVID-19 daily briefings by YouTube users and track these comments to extract the changing dynamics of the opinions and concerns of the public over time. Methods We used machine learning techniques to longitudinally analyze a total of 46,732 English YouTube comments that were retrieved from 57 videos of Prime Minister Trudeau’s COVID-19 daily briefings from March 13 to May 22, 2020. A natural language processing model, latent Dirichlet allocation, was used to choose salient topics among the sampled comments for each of the 57 videos. Thematic analysis was used to classify and summarize these salient topics into different prominent themes. Results We found 11 prominent themes, including strict border measures, public responses to Prime Minister Trudeau’s policies, essential work and frontline workers, individuals’ financial challenges, rental and mortgage subsidies, quarantine, government financial aid for enterprises and individuals, personal protective equipment, Canada and China’s relationship, vaccines, and reopening. Conclusions This study is the first to longitudinally investigate public discourse and concerns related to Prime Minister Trudeau’s daily COVID-19 briefings in Canada. This study contributes to establishing a real-time feedback loop between the public and public health officials on social media. Hearing and reacting to real concerns from the public can enhance trust between the government and the public to prepare for future health emergencies.

