A Novel Feature Hashing With Efficient Collision Resolution for Bag-of-Words Representation of Text Data

Author(s):  
Bobby A. Eclarin ◽  
Arnel C. Fajardo ◽  
Ruji P. Medina
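No abstract accompanies this entry, so the sketch below is purely illustrative: it shows the standard hashing trick for producing fixed-size bag-of-words vectors, the baseline technique the title names. The paper's own collision-resolution scheme is not described in this listing, and nothing in the code is taken from it.

```python
# Standard hashing trick for bag-of-words: tokens are hashed into a
# fixed number of buckets, so distinct tokens may collide. This is
# the generic baseline, NOT the paper's collision-resolution method.
import hashlib

def hashed_bow(tokens, n_buckets=1024):
    """Map tokens into a fixed-size count vector via hashing."""
    vec = [0] * n_buckets
    for tok in tokens:
        # A stable hash keeps vectors reproducible across runs.
        h = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16)
        vec[h % n_buckets] += 1  # colliding tokens share a bucket
    return vec

doc = "the quick brown fox jumps over the lazy dog".split()
v = hashed_bow(doc)
print(sum(v))  # 9 tokens counted, possibly in fewer than 9 buckets
```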
2011 ◽  
pp. 944-963
Author(s):  
Ah Chung Tsoi ◽  
Phuong Kim To ◽  
Markus Hagenbuchner

This chapter describes the application of a number of text mining techniques to discover patterns in the health insurance schedule, with the aim of uncovering any inconsistency or ambiguity in the schedule. In particular, we first apply a simple “bag of words” technique to study the text data and to evaluate the hypothesis: is there any inconsistency in the text descriptions of the medical procedures used? The hypothesis is found not to hold, and the investigation therefore continues with the question of how best to cluster the text. This work is significant for health insurers: first, it assists them in differentiating descriptions of medical procedures; secondly, it assists them in describing medical procedures unambiguously.
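As a minimal, hypothetical sketch of the “bag of words” first step the chapter describes, the snippet below builds count vectors for two invented procedure descriptions; the texts and the vocabulary handling are illustrative assumptions, not the chapter's data or pipeline.

```python
# A toy bag-of-words representation of short procedure descriptions,
# in the spirit of the chapter's first step; the example texts are
# invented, not taken from any health insurance schedule.
from collections import Counter

descriptions = [
    "removal of skin lesion by excision",
    "excision of skin lesion with repair",
]

# Build a shared vocabulary, then one count vector per description.
vocab = sorted({w for d in descriptions for w in d.split()})
vectors = [[Counter(d.split())[w] for w in vocab] for d in descriptions]

for d, v in zip(descriptions, vectors):
    print(v, "<-", d)
# Similar descriptions yield similar count vectors, which is what
# makes inconsistency checks and clustering over the schedule possible.
```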


Author(s):  
Agnieszka Mykowiecka ◽  
Małgorzata Marciniak ◽  
Piotr Rychlik

Testing word embeddings for Polish

Distributional semantics postulates the representation of word meaning in the form of numeric vectors that capture, directly or indirectly, the contexts in which a word is used in a large body of text. This paper addresses the problem of constructing such models for the Polish language. The paper compares the effectiveness of models based on lemmas and on word forms, created with the Continuous Bag of Words (CBOW) and skip-gram approaches and trained on different Polish corpora. For the purposes of this comparison, the results of two typical tasks solved with the help of distributional semantics, synonymy recognition and analogy recognition, are compared. The results show that it is not possible to identify one universal approach to vector creation applicable to various tasks. The most important factor is the quality and size of the data, but different training-strategy choices can also lead to significantly different results.
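As a sketch of the two training regimes the paper compares, the snippet below uses gensim's Word2Vec, where sg=0 selects CBOW and sg=1 selects skip-gram. The tiny English corpus and the parameter values are illustrative stand-ins, not the Polish corpora or the settings actually used in the paper.

```python
# CBOW vs. skip-gram training with gensim, plus an analogy-style
# query of the kind used in the paper's evaluation. The corpus is
# a toy stand-in; reliable analogies require far more data.
from gensim.models import Word2Vec

corpus = [
    ["king", "rules", "the", "kingdom"],
    ["queen", "rules", "the", "kingdom"],
    ["man", "walks", "in", "the", "city"],
    ["woman", "walks", "in", "the", "city"],
]

cbow = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)
skipgram = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)

# Analogy recognition: king - man + woman should be close to queen.
print(cbow.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
print(skipgram.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```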


1976 ◽  
Vol 15 (01) ◽  
pp. 21-28 ◽  
Author(s):  
Carmen A. Scudiero ◽  
Ruth L. Wong

A free-text data collection system has been developed at the University of Illinois, using single-word, syntax-free dictionary lookup to process data for retrieval. The source document for the system is the Surgical Pathology Request and Report form. To date, 12,653 documents have been entered into the system.

The free-text data was used to create an IRS (Information Retrieval System) database. A program to interrogate this database has been developed to produce numerically coded operative procedures. A total of 16,519 procedure records were generated. Of these, 1.9% could not be fitted into any procedure category and 6.1% could not be specifically coded, while 92% were coded into specific categories. A system of PL/1 programs has been developed to facilitate manual editing of these records, which can be performed in a reasonable length of time (one week). This manual check reveals that the 92% were coded with precision = 0.931 and recall = 0.924. Correction of the readily correctable errors would improve these figures to precision = 0.977 and recall = 0.987. Syntax errors were relatively unimportant in the overall coding process, but did introduce significant error in some categories, such as when a right/left/bilateral distinction was attempted.

The coded file that has been constructed will be used as an input file to a gynecological disease/Pap smear correlation system. The outputs of this system will include retrospective information on the natural history of selected diseases and a patient log providing the clinician with information on patient follow-up.

Thus a free-text data collection system can be used to produce numerically coded files of reasonable accuracy. Further, these files can serve as a source of useful information both for the clinician and for the medical researcher.
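A minimal sketch of the single-word, syntax-free dictionary lookup the abstract describes is given below; the dictionary entries and numeric codes are invented for illustration, not taken from the Illinois system's tables.

```python
# Syntax-free, single-word dictionary lookup: each report word is
# looked up in isolation; word order and context play no role.
# The entries and codes below are hypothetical.
PROCEDURE_DICT = {
    "appendectomy": 470,
    "biopsy": 110,
    "hysterectomy": 683,
}

def code_report(text):
    """Return a numeric code for every dictionary word in the report."""
    return [PROCEDURE_DICT[w] for w in text.lower().split()
            if w in PROCEDURE_DICT]

print(code_report("Incidental appendectomy performed during biopsy"))
# -> [470, 110]. Note that a syntax-free lookup cannot attach
# qualifiers such as right/left/bilateral to the correct procedure,
# the error source the abstract singles out.
```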


Author(s):  
I. G. Zakharova ◽  
Yu. V. Boganyuk ◽  
M. S. Vorobyova ◽  
E. A. Pavlova

The goal of the article is to demonstrate the possibilities of an approach to diagnosing the level of IT graduates' professional competence, based on analysis of the student's digital footprint and the content of the corresponding educational program. We describe methods for extracting indicators of a student's professional level from digital-footprint text data: course descriptions and graduation qualification works. We show how to compare these indicators with the formalized requirements of employers, as reflected in the texts of job vacancies in the field of information technology. The proposed approach was applied at the Institute of Mathematics and Computer Science of the University of Tyumen. We performed diagnostics using a data set that included the texts of course descriptions for IT areas of undergraduate study, 542 graduation qualification works in these areas, 879 descriptions of job requirements, and information on graduate employment. The presented approach allows us to evaluate the relevance of the educational program as a whole and the level of professional competence of each student on the basis of objective data. The results were used to update the content of several major courses and to include new elective courses in the curriculum.
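The article's exact text-processing pipeline is not spelled out in this listing, so the snippet below is one plausible sketch, assuming a TF-IDF representation compared by cosine similarity, of matching a student's digital-footprint texts against vacancy requirements; all texts are invented.

```python
# Hypothetical comparison of student documents (course descriptions,
# thesis abstracts) with a job vacancy via TF-IDF cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

student_docs = [
    "thesis on neural network classification of medical images in python",
    "course project on relational databases and sql query optimization",
]
vacancy = ["machine learning engineer python neural networks required"]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(student_docs + vacancy)

# Similarity of each student document to the vacancy text;
# a higher score indicates a closer match to the requirements.
scores = cosine_similarity(matrix[:-1], matrix[-1])
print(scores.ravel())
```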


Author(s):  
Aleksey Klokov ◽  
Evgenii Slobodyuk ◽  
Michael Charnine

The object of the research is a corpus of text data, collected together with the scientific advisor, and the natural language processing algorithms used to analyse it. A series of hypotheses was tested against computer science publications through the simulation experiments described in this dissertation. The subject of the research is the algorithms, and their results, aimed at predicting promising topics and terms that emerge over time in the scientific environment. The result of this work is a set of machine learning models with whose help experiments were carried out to identify promising terms and semantic relationships in the text corpus. The resulting models can be used for semantic processing and analysis of other subject areas.
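As a toy illustration of what predicting promising terms might involve, the snippet below flags terms whose corpus frequency grows across publication years; the counts are invented, and this is just one plausible signal, not the dissertation's actual models.

```python
# Flag "promising" terms by relative frequency growth over years.
# The counts are fabricated for illustration only.
yearly_counts = {
    "transformer": {2016: 2, 2017: 9, 2018: 40},
    "perceptron":  {2016: 5, 2017: 5, 2018: 4},
}

def growth(counts):
    """Relative change from the first to the last observed year."""
    years = sorted(counts)
    first, last = counts[years[0]], counts[years[-1]]
    return (last - first) / max(first, 1)

for term, counts in yearly_counts.items():
    print(term, round(growth(counts), 2))
# transformer 19.0  -> strong growth, flagged as promising
# perceptron -0.2   -> flat or declining
```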

