Smarter people analytics with organizational text data: Demonstrations using classic and advanced NLP models

Author(s):  
Feng Guo ◽  
Christopher M. Gallagher ◽  
Tianjun Sun ◽  
Saba Tavoosi ◽  
Hanyi Min
1976 ◽  
Vol 15 (01) ◽  
pp. 21-28 ◽  
Author(s):  
Carmen A. Scudiero ◽  
Ruth L. Wong

A free text data collection system has been developed at the University of Illinois that uses single-word, syntax-free dictionary lookup to process data for retrieval. The source document for the system is the Surgical Pathology Request and Report form. To date, 12,653 documents have been entered into the system. The free text data was used to create an IRS (Information Retrieval System) database. A program to interrogate this database has been developed to generate numerically coded operative procedure records; a total of 16,519 procedure records were generated. Of these procedures, 1.9% could not be fitted into any procedure category and 6.1% could not be specifically coded, while 92% were coded into specific categories. A system of PL/1 programs has been developed to facilitate manual editing of these records, which can be performed in a reasonable length of time (1 week). This manual check reveals that the 92% were coded with precision = 0.931 and recall = 0.924. Correction of the readily correctable errors could improve these figures to precision = 0.977 and recall = 0.987. Syntax errors were relatively unimportant in the overall coding process, but did introduce significant error in some categories, such as when a right-left-bilateral distinction was attempted. The coded file that has been constructed will be used as an input file to a gynecological disease/PAP smear correlation system. The outputs of this system will include retrospective information on the natural history of selected diseases and a patient log providing information to the clinician on patient follow-up. Thus a free text data collection system can be used to produce numerically coded files of reasonable accuracy, and these files can serve as a source of useful information both for the clinician and for the medical researcher.
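The two core operations described above, single-word dictionary lookup and the precision/recall evaluation, can be sketched briefly in Python. The code table below is hypothetical (the system's actual dictionary is not given in the abstract), and the precision/recall counts are illustrative values chosen only to exercise the formulas.

```python
def code_report(text, code_table):
    """Single-word, syntax-free dictionary lookup: return the sorted set of
    numeric procedure codes whose keyword appears anywhere in the text."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return sorted({code_table[w] for w in words if w in code_table})

def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive and
    false-negative counts, as used in the manual-check evaluation."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical code table for illustration only.
PROCEDURE_CODES = {"hysterectomy": 101, "biopsy": 102, "appendectomy": 103}

codes = code_report("Total abdominal hysterectomy with cervical biopsy.",
                    PROCEDURE_CODES)
```

Because the lookup ignores syntax entirely, word order and qualifiers carry no weight, which is consistent with the abstract's observation that right-left-bilateral distinctions were a source of error.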


Author(s):  
I. G. Zakharova ◽  
Yu. V. Boganyuk ◽  
M. S. Vorobyova ◽  
E. A. Pavlova

The goal of this article is to demonstrate an approach to diagnosing the level of IT graduates' professional competence, based on analysis of the student's digital footprint and the content of the corresponding educational program. We describe methods for extracting indicators of a student's professional level from digital footprint text data: course descriptions and graduation qualification works. We show how to compare these indicators with the formalized requirements of employers, as reflected in the texts of job vacancies in the field of information technology. The proposed approach was applied at the Institute of Mathematics and Computer Science of the University of Tyumen. We performed diagnostics using a data set that included the texts of course descriptions for IT areas of undergraduate study, 542 graduation qualification works in these areas, 879 descriptions of job requirements, and information on graduate employment. The presented approach allows us to evaluate both the relevance of the educational program as a whole and the level of professional competence of each student on the basis of objective data. The results were used to update the content of several major courses and to include new elective courses in the curriculum.
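The comparison of student-text indicators with vacancy requirements can be illustrated with a minimal bag-of-words cosine similarity; this is a generic sketch, not the authors' actual extraction pipeline, and the sample texts are invented.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def competence_match(student_text, vacancy_text):
    """Score how well a student's digital-footprint text matches a vacancy."""
    return cosine_similarity(Counter(student_text.lower().split()),
                             Counter(vacancy_text.lower().split()))

score = competence_match("python machine learning data analysis thesis",
                         "requires python data analysis sql")
```

Averaging such scores over all vacancies in a field would give one simple per-student indicator; weighting terms (e.g. by TF-IDF) is the obvious refinement.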


Author(s):  
Aleksey Klokov ◽  
Evgenii Slobodyuk ◽  
Michael Charnine

The object of this research is a corpus of text data, collected together with the scientific advisor, and the natural language processing algorithms used to analyze it. A stream of hypotheses was tested against computer science publications through a series of simulation experiments described in this dissertation. The subject of the research is these algorithms and their results, aimed at predicting promising topics and terms that emerge over time in the scientific literature. The result of this work is a set of machine learning models with which experiments were carried out to identify promising terms and semantic relationships in the text corpus. The resulting models can be used for semantic processing and analysis of other subject areas.
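One of the simplest signals for "promising terms" is frequency growth between an early and a late slice of the corpus. The sketch below uses that criterion with an arbitrary growth threshold; the dissertation's actual models are not described in enough detail here to reproduce, so this is only an illustrative stand-in, and the toy corpora are invented.

```python
from collections import Counter

def rising_terms(yearly_corpora, min_growth=2.0):
    """Return terms whose frequency in the latest year is at least min_growth
    times their frequency in the earliest year (a simple illustrative criterion)."""
    years = sorted(yearly_corpora)
    early = Counter(" ".join(yearly_corpora[years[0]]).lower().split())
    late = Counter(" ".join(yearly_corpora[years[-1]]).lower().split())
    return sorted(t for t in late if late[t] >= min_growth * max(early[t], 1))

corpora = {
    2018: ["neural networks for vision", "graph methods"],
    2021: ["transformer models", "transformer pretraining", "transformer scaling"],
}
promising = rising_terms(corpora)
```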


2020 ◽  
Author(s):  
Pathikkumar Patel ◽  
Bhargav Lad ◽  
Jinan Fiaidhi

During the last few years, RNN models have been used extensively, and they have proven well suited to sequence and text data. RNNs have achieved state-of-the-art performance in several applications, such as text classification, sequence-to-sequence modelling and time series forecasting. In this article we review different machine learning and deep learning based approaches for text data and examine the results obtained from these methods. This work also explores the use of transfer learning in NLP and how it affects the performance of models on a specific application: sentiment analysis.
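What makes RNNs suit sequence data is the recurrence itself: each input updates a hidden state that carries context forward. A deliberately tiny sketch, using a scalar hidden state and fixed weights rather than the learned vector-valued parameters of a real model:

```python
import math

def rnn_step(x, h, w_xh=0.5, w_hh=0.9, b=0.0):
    """One Elman RNN step with a scalar hidden state: h' = tanh(w_xh*x + w_hh*h + b)."""
    return math.tanh(w_xh * x + w_hh * h + b)

def rnn_last_state(sequence):
    """Fold a sequence through the recurrence, starting from h = 0."""
    h = 0.0
    for x in sequence:
        h = rnn_step(x, h)
    return h

state = rnn_last_state([1.0, -1.0, 0.5])
```

In a text classifier the inputs would be word embeddings and the final state would feed a softmax layer; gated variants (LSTM, GRU) replace the plain tanh update to ease long-range credit assignment.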


2020 ◽  
Vol 14 ◽  
Author(s):  
Khoirom Motilal Singh ◽  
Laiphrakpam Dolendro Singh ◽  
Themrichon Tuithung

Background: Data in the form of text, audio, image and video are used everywhere in our modern scientific world. These data are stored in physical storage, cloud storage and other storage devices. Some of these data are very sensitive and require efficient security while being stored as well as while being transmitted from the sender to the receiver. Objective: With the increase in data transfer operations, enough space is also required to store these data. Many researchers have been working to develop different encryption schemes, yet many limitations remain in their work. There is always a need for encryption schemes with smaller cipher data, faster execution time and lower computation cost. Methods: A text encryption based on Huffman coding and the ElGamal cryptosystem is proposed. Initially, the text data is converted to its corresponding binary bits using Huffman coding. Next, the binary bits are grouped and converted into large integer values, which are used as the input to the ElGamal cryptosystem. Results: Encryption and decryption are performed successfully, where the data size is reduced using Huffman coding and security with a smaller key size is provided by the ElGamal cryptosystem. Conclusion: Simulation results and performance analysis indicate that our encryption algorithm is better than the existing algorithms under consideration.
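The pipeline in the Methods section, Huffman-compress the text to bits, pack the bits into a large integer, encrypt that integer with ElGamal, can be sketched end-to-end. The key, nonce and prime below are toy values chosen for readability (a fixed nonce and these sizes are insecure), and the grouping into a single integer with a leading-1 sentinel is one simple choice, not necessarily the paper's exact packing.

```python
import heapq

def huffman_codes(text):
    """Build a prefix-free Huffman code table from symbol frequencies."""
    freq = {}
    for ch in text:
        freq[ch] = freq.get(ch, 0) + 1
    if len(freq) == 1:                       # degenerate single-symbol input
        return {next(iter(freq)): "0"}
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)                          # unique tiebreaker keeps tuples comparable
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        merged = {c: "0" + code for c, code in t1.items()}
        merged.update({c: "1" + code for c, code in t2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

def elgamal_encrypt(m, p, g, y, k):
    """ElGamal encryption of integer m < p under public key (p, g, y), nonce k."""
    return pow(g, k, p), (m * pow(y, k, p)) % p

def elgamal_decrypt(c1, c2, p, x):
    """ElGamal decryption with private key x: m = c2 * c1^(-x) mod p."""
    return (c2 * pow(c1, p - 1 - x, p)) % p

# Toy parameters: p is the Mersenne prime 2^127 - 1; far too small, and the
# nonce is fixed, so this only illustrates the data flow.
p = 2**127 - 1
g, x = 3, 123456789
y = pow(g, x, p)

text = "abracadabra"
codes = huffman_codes(text)
bits = "".join(codes[ch] for ch in text)
m = int("1" + bits, 2)                       # leading 1 preserves leading zero bits
c1, c2 = elgamal_encrypt(m, p, g, y, k=987654321)
recovered = elgamal_decrypt(c1, c2, p, x)

inverse = {code: ch for ch, code in codes.items()}
out, buf = [], ""
for bit in bin(recovered)[3:]:               # strip the '0b1' sentinel
    buf += bit
    if buf in inverse:
        out.append(inverse[buf])
        buf = ""
decoded = "".join(out)
```

The round trip recovers the original text, and the Huffman stage shrinks the plaintext below 8 bits per character before encryption, which is the size reduction the abstract claims.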


2019 ◽  
Vol 13 (1) ◽  
pp. 20-27 ◽  
Author(s):  
Srishty Jindal ◽  
Kamlesh Sharma

Background: With the tremendous increase in the use of social networking sites for sharing emotions, views, preferences, etc., a huge volume of text data is available on the internet, and there is a need to understand this text and analyse the data to determine the exact intent behind it. This process involves many analytical methods, several phases and multiple techniques, and their efficient use is important for an effective and relevant understanding of the text. Such analysis can in turn be very helpful in e-commerce for targeting audiences, in social media monitoring for anticipating harmful elements in society and taking proactive action against unethical and illegal activities, and in business analytics, market positioning, etc. Method: The goal is to understand the basic steps involved in analysing text data to determine the sentiments behind it. This review provides a detailed description of the steps involved in sentiment analysis together with recent research. Patents related to sentiment analysis and classification are also reviewed to shed light on work done in the field. Results: Sentiment analysis determines the polarity behind text data or reviews. This analysis helps in increasing business revenue, in e-health, and in determining the behaviour of a person. Conclusion: This study helps in understanding the basic steps involved in natural language understanding. At each step there are multiple techniques that can be applied to the data; different classifiers provide variable accuracy depending upon the data set and classification technique used.
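The simplest polarity-determination step the review describes is lexicon lookup: count positive and negative words and take the sign. The lexicon below is a tiny invented sample, real systems use large curated resources and trained classifiers, but it shows the mechanics.

```python
# Tiny hypothetical sentiment lexicon; real systems use much larger resources.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "sad"}

def polarity(text):
    """Return +1 (positive), -1 (negative) or 0 (neutral) by lexicon word counts."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)

label = polarity("The camera is great but the battery is terrible and bad")
```

The weaknesses of this baseline (negation, sarcasm, domain-specific words) are exactly what the preprocessing and classification steps surveyed in the review are meant to address.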


2020 ◽  
Author(s):  
Viknesh Sounderajah ◽  
Hutan Ashrafian ◽  
Sheraz Markar ◽  
Ara Darzi

If health systems are to employ social distancing measures effectively in response to further COVID-19 peaks, they must adopt new behavioural metrics that can supplement traditional downstream measures, such as incidence and mortality. Access to mobile digital innovations may dynamically quantify compliance with social distancing (e.g. web mapping software) as well as establish personalised real-time contact tracing of viral spread (e.g. mobile operating system infrastructure through the Google-Apple partnership). In particular, text data from social networking platforms can be mined for unique behavioural insights, such as symptom tracking and perception monitoring. Platforms such as Twitter have shown significant promise in tracking communicable pandemics. As such, it is critical that social networking companies collaborate with each other in order to (1) enrich the data available for analysis, (2) promote the creation of open access datasets for researchers and (3) cultivate relationships with governments in order to effect positive change.
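At its simplest, symptom tracking over social text is keyword surveillance: count symptom mentions across a stream of posts. The keyword list and posts below are illustrative only; production surveillance systems use trained NLP classifiers rather than exact matching.

```python
from collections import Counter

SYMPTOMS = {"cough", "fever", "fatigue", "anosmia"}   # illustrative keyword list

def symptom_counts(posts):
    """Count symptom-keyword mentions across a stream of short posts."""
    counts = Counter()
    for post in posts:
        for word in post.lower().split():
            if word in SYMPTOMS:
                counts[word] += 1
    return counts

counts = symptom_counts([
    "Day 3 of fever and a dry cough",
    "Lost my sense of smell, fever again",
])
```

Bucketing such counts by day and region yields the kind of upstream behavioural signal the passage argues should supplement incidence and mortality figures.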

