Mining Meaning

Organizational Data Mining

10.4018/978-1-59140-134-6.ch009

2011

pp. 125-140

Author(s):

William L. Tullar

Keyword(s):

Data Mining, Natural Language Processing, Natural Language, Language Processing, Group Discussions, Text Data, Virtual Group, Extraction Step, Using Data, Lower Frequencies

This chapter focuses on the pattern detection and extraction step in processing text data, a step commonly called text data mining. I examine some of the literature on natural language processing and propose a method, based on methods derived from the communication field, for recovering value from the text of virtual group discussions. I then apply the method to a case using data from 216 groups in a virtual group experiment. The results show that higher performing groups were characterized by higher frequencies of acts of dominance and of terms concerning cognition, communication, and praise. Higher performing groups were also characterized by lower frequencies of acts of equivalence and of leveling and numerical terms. Ways to use this knowledge to improve the groups’ performance are discussed.
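The abstract does not give the chapter's category dictionaries or scoring procedure. As a minimal sketch, assuming each group's discussion is available as one transcript string and each term category is a plain word list, per-group category frequencies (as a share of all words) can be computed and then averaged over higher- and lower-performing groups. The `CATEGORIES` lexicon and the names `category_rates` and `mean_rates` are hypothetical. Acts of dominance and equivalence, which require coding speech acts rather than counting words, are not covered here.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only; the chapter's actual
# category dictionaries are not given in the abstract.
CATEGORIES = {
    "cognition": {"think", "know", "consider", "understand"},
    "communication": {"say", "tell", "discuss", "explain"},
    "praise": {"good", "great", "nice", "excellent"},
    "leveling": {"all", "every", "always", "never"},
}


def category_rates(transcript: str) -> dict[str, float]:
    """Share of all word tokens in one group's transcript that fall in each category."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    if not tokens:
        return {name: 0.0 for name in CATEGORIES}
    counts = Counter(tokens)
    return {
        name: sum(counts[word] for word in words) / len(tokens)
        for name, words in CATEGORIES.items()
    }


def mean_rates(transcripts: list[str]) -> dict[str, float]:
    """Average each category rate over a set of groups (e.g. the higher performers)."""
    per_group = [category_rates(t) for t in transcripts]
    return {
        name: sum(rates[name] for rates in per_group) / len(per_group)
        for name in CATEGORIES
    }
```

Comparing `mean_rates(higher_groups)` with `mean_rates(lower_groups)` shows which categories are relatively more frequent among higher performers; a real analysis would also test whether those differences are statistically reliable.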

Advising Projects to Students using Data Mining and Natural Language Processing

International Journal of Advanced Research in Computer Science and Software Engineering

10.23956/ijarcsse/sv7i5/0146

2017

Vol 7 (5)

pp. 260-262

Author(s):

Pratik Solim, Ankit Sagwekar, Rishikesh Patil, Swapnali Kurhade, ...

Keyword(s):

Data Mining, Natural Language Processing, Natural Language, Language Processing, Using Data

Learning Document Similarity Using Natural Language Processing

Linguistik Online

10.13092/lo.17.788

2003

Vol 17 (5)

Author(s):

Paola Merlo, James Henderson, Gerold Schneider, Eric Wehrli

Keyword(s):

Data Mining, Natural Language Processing, Natural Language, Language Processing, Large Scale, Great Efficiency, Self Organizing Maps, Text Data, Document Similarity, On Line

The considerable recent growth in the amount of easily available online text has brought to the foreground the need for large-scale natural language processing tools for text data mining. In this paper we address the problem of organizing documents into meaningful groups according to their content and of visualizing a text collection, providing an overview of the range of documents and of their relationships so that they can be browsed more easily. We use Self-Organizing Maps (SOMs) (Kohonen 1984). Creating these maps poses considerable efficiency challenges. We study linguistically motivated ways of reducing the representation of a document to increase efficiency, and ways to disambiguate the words in the documents.
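The abstract names SOMs (Kohonen 1984) but not the training schedule or the document representation. A minimal sketch, assuming each document has already been reduced to a fixed-length numeric vector (for example, term frequencies over a reduced vocabulary), is the textbook online SOM below, with a Gaussian neighborhood and a linearly decaying learning rate and radius. The function names and default parameters are illustrative choices, not the paper's.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest (squared Euclidean) to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(vectors, rows, cols, epochs=100, lr0=0.5, seed=0):
    """Train a rows x cols Self-Organizing Map on equal-length document vectors."""
    rng = random.Random(seed)
    radius0 = max(rows, cols) / 2
    # Each grid node starts as a copy of a randomly chosen document vector.
    weights = {(r, c): list(rng.choice(vectors)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(vectors)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1 - frac)                    # learning rate decays linearly
            radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks over time
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_dist2 / (2 * radius ** 2))  # Gaussian neighborhood
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])  # pull node toward the document
            step += 1
    return weights
```

After training, each document is placed on the map cell returned by `best_matching_unit`, so documents with similar vectors land in the same or adjacent cells. Every update touches every node and every vector dimension, so training cost grows with the length of the document vectors, which is the efficiency pressure behind reducing the document representation.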

How Language Shapes Prejudice Against Women: An Examination Across 45 World Languages

10.31234/osf.io/mrbcf

2020

Author(s):

David DeFranza, Himanshu Mishra, Arul Mishra

Keyword(s):

Natural Language Processing, Natural Language, Language Processing, Ongoing Debate, Text Data, Gender Prejudice, World Languages, The World, Present Context, The Common

Language provides an ever-present context for our cognitions and has the ability to shape them. Languages across the world can be gendered (languages in which the form of a noun, verb, or pronoun is marked as female or male) or genderless. In an ongoing debate, one stream of research suggests that gendered languages are more likely to display gender prejudice than genderless languages. However, another stream of research suggests that language does not have the ability to shape gender prejudice. In this research, we contribute to the debate by using a Natural Language Processing (NLP) method that captures the meaning of a word from the context in which it occurs. Using text data from Wikipedia and the Common Crawl project (which contains text from billions of publicly facing websites) across 45 world languages, covering the majority of the world’s population, we test for gender prejudice in gendered and genderless languages. We find that gender prejudice occurs more in gendered than in genderless languages. Moreover, using the same NLP method, we examine whether the genderedness of a language influences the stereotypic dimensions of warmth and competence.
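The abstract says only that the NLP method represents a word by the contexts in which it occurs, that is, word embeddings; it does not name the bias test. One widely used option is the Word Embedding Association Test (WEAT; Caliskan et al., 2017), sketched below as an assumption rather than as the authors' procedure. Target sets X and Y (e.g. female and male words) are compared against attribute sets A and B (e.g. pleasant and unpleasant words); a positive effect size means X sits closer to A, relative to B, than Y does. The function names are hypothetical, and the vectors would presumably come from embeddings trained on each language's Wikipedia and Common Crawl text.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def association(w, attrs_a, attrs_b):
    """How much closer word vector w is to attribute set A than to attribute set B."""
    return mean(cosine(w, a) for a in attrs_a) - mean(cosine(w, b) for b in attrs_b)


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """WEAT-style effect size: difference in mean association over the pooled std. dev."""
    sx = [association(x, attrs_a, attrs_b) for x in targets_x]
    sy = [association(y, attrs_a, attrs_b) for y in targets_y]
    return (mean(sx) - mean(sy)) / stdev(sx + sy)
```

Running the same test separately on each language's embeddings yields one effect size per language, which could then be compared between gendered and genderless languages.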

Sentiment of App with Word Vectors

International Journal of Engineering and Advanced Technology - Regular Issue

10.35940/ijeat.f1416.0986s319

2019

Vol 8 (6S3)

pp. 2156-2159

Keyword(s):

Natural Language Processing, Natural Language, Sentiment Analysis, Language Processing, Text Data, Vector Representations, Text Sentiment Analysis

Vector representations of language have been shown to be useful in a number of Natural Language Processing tasks. In this paper, we investigate the effectiveness of word vector representations for Sentiment Analysis. In particular, we target three sub-tasks: sentiment word extraction, sentiment word polarity detection, and text sentiment prediction. We investigate the effectiveness of vector representations over different text data and evaluate the quality of domain-dependent vectors. We use vector representations to compute various vector-based features and conduct systematic experiments to demonstrate their effectiveness. Simple vector-based features can achieve better results for sentiment analysis of app text.
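The abstract does not list the exact vector-based features. As one hedged illustration of the polarity-detection and text-prediction sub-tasks, a word's or text's polarity can be scored by averaging word vectors and comparing the result, by cosine similarity, with the centroids of small positive and negative seed lists. The embeddings in `EMB`, the seed words, and the function names are toy assumptions; the paper's domain-dependent vectors would take the place of `EMB`.

```python
import math

# Hypothetical 2-D toy embeddings; a real system would use vectors trained on app text.
EMB = {
    "love": [0.9, 0.1], "great": [0.8, 0.2], "smooth": [0.7, 0.3],
    "crash": [0.1, 0.9], "slow": [0.2, 0.8], "bug": [0.3, 0.7],
    "app": [0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_vector(tokens, emb):
    """Mean of the vectors of in-vocabulary tokens, or None if none are known."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return None
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def polarity(tokens, emb, pos_seeds, neg_seeds):
    """Positive if the text's mean vector is closer to the positive seed centroid."""
    doc = average_vector(tokens, emb)
    if doc is None:
        return 0.0  # no known words: treat as neutral
    pos = average_vector(pos_seeds, emb)
    neg = average_vector(neg_seeds, emb)
    return cosine(doc, pos) - cosine(doc, neg)
```

Passing a one-token list scores a single sentiment word, while passing a tokenized review scores the whole text, so both sub-tasks share one representation in this sketch.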

The Future of Instruction Modeling

Instruction Modeling

10.1093/oso/9780190910709.003.0010

2020

pp. 205-228

Author(s):

George A. Khachatryan

Keyword(s):

Data Mining, Natural Language Processing, Foreign Language, Natural Language, Blended Learning, Language Processing, Educational Data Mining, Government Policies, Free Form, Learning Programs

Instruction modeling is still in its early stages. This chapter discusses promising directions in which instruction modeling could develop in coming years. This includes increasing the richness of interfaces used in instruction modeling programs (e.g., by allowing students to enter responses in free form and have them graded via natural language processing); applying instruction modeling to subjects beyond mathematics, including English, foreign language, and science; using educational data mining to create automated “coaches” to help teachers better implement instruction modeling programs in their classrooms; creating approaches to instruction modeling that allow for rapid authorship of content; redesigning schools (in schedules as well as architecture) to optimize the use of instruction modeling; and putting in place government policies to encourage the use of comprehensive blended learning programs (such as those developed through instruction modeling).

How Artificial Intelligence Can Improve Our Understanding of the Genes Associated with Endometriosis: Natural Language Processing of the PubMed Database

BioMed Research International

10.1155/2018/6217812

2018

Vol 2018

pp. 1-7

Cited By ~ 7

Author(s):

J. Bouaziz, R. Mashiach, S. Cohen, A. Kedem, A. Baron, ...

Keyword(s):

Artificial Intelligence, Natural Language Processing, Text Mining, Natural Language, Language Processing, Data Extraction, Endometrial Tissue, Endometrial Cells, Pubmed Database, Using Data

Endometriosis is a disease characterized by the development of endometrial tissue outside the uterus, but its cause remains largely unknown. Numerous genes have been studied and proposed to help explain its pathogenesis. However, the large number of these candidate genes has made functional validation through experimental methodologies nearly impossible. Computational methods could provide a useful alternative for prioritizing those most likely to be susceptibility genes. Using artificial intelligence applied to text mining, this study analyzed the genes involved in the pathogenesis, development, and progression of endometriosis. The data extraction by text mining of the endometriosis-related genes in the PubMed database was based on natural language processing, and the data were filtered to remove false positives. Using data from the text mining and gene network information as input for the web-based tool, 15,207 endometriosis-related genes were ranked according to their score in the database. Characterization of the filtered gene set through gene ontology, pathway, and network analysis provided information about the numerous mechanisms hypothesized to be responsible for the establishment of ectopic endometrial tissue, as well as the migration, implantation, survival, and proliferation of ectopic endometrial cells. Finally, the human genome was scanned through various databases using filtered genes as a seed to determine novel genes that might also be involved in the pathogenesis of endometriosis but which have not yet been characterized. These genes could be promising candidates to serve as useful diagnostic biomarkers and therapeutic targets in the management of endometriosis.
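The abstract does not describe the NLP pipeline or the scoring used by the web-based tool. The simplified sketch below, an assumption rather than the study's method, counts how many abstracts mention both the disease and each candidate gene symbol and ranks genes by that count. Matching symbols case-sensitively on word boundaries is one crude way to cut false positives, such as the gene symbol "CAT" matching the ordinary word "cat". The function name, gene list, and sample sentences are made up for illustration.

```python
import re
from collections import Counter


def rank_genes(abstracts, gene_symbols, disease="endometriosis"):
    """Rank gene symbols by the number of abstracts that co-mention them with the disease."""
    # Case-sensitive, whole-word patterns so that e.g. the symbol "CAT" does not match "cat".
    patterns = {g: re.compile(rf"\b{re.escape(g)}\b") for g in gene_symbols}
    counts = Counter()
    for text in abstracts:
        if disease.lower() not in text.lower():
            continue  # only abstracts about the disease count as evidence
        for gene, pattern in patterns.items():
            if pattern.search(text):
                counts[gene] += 1
    return counts.most_common()
```

A real pipeline would add gene-name normalization (synonyms, aliases) and network information before ranking, as the study describes.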

Correlating mammographic and pathologic findings in clinical decision support using natural language processing and data mining methods

Cancer

10.1002/cncr.30245

2016

Vol 123 (1)

pp. 114-121

Cited By ~ 24

Author(s):

Tejal A. Patel, Mamta Puppala, Richard O. Ogunti, Joe E. Ensor, Tiancheng He, ...

Keyword(s):

Data Mining, Natural Language Processing, Decision Support, Natural Language, Clinical Decision Support, Language Processing, Clinical Decision, Pathologic Findings, Mining Methods

IDENTIFYING BEST PRACTICES FOR USE OF TEXT DATA IN HEALTH ECONOMICS AND OUTCOMES RESEARCH USING NATURAL LANGUAGE PROCESSING

Value in Health

10.1016/j.jval.2016.03.1776

2016

Vol 19 (3)

pp. A82

Author(s):

B.A. Feinberg, L. Lal, D.F. Garofalo, U. Mujumdar

Keyword(s):

Natural Language Processing, Health Economics, Best Practices, Natural Language, Language Processing, Outcomes Research, Text Data

Natural language processing based advanced method of unnecessary video detection

International Journal of Electrical and Computer Engineering (IJECE)

10.11591/ijece.v11i6.pp5411-5419

2021

Vol 11 (6)

pp. 5411-5419

Author(s):

Nazmun Nessa Moon, Imrus Salehin, Masuma Parvin, Md. Mehedi Hasan, Iftakhar Mohammad Talha, ...

Keyword(s):

Natural Language Processing, Natural Language, Language Processing, Detection System, Video Content, Text Data, Data Set, Plain Text, Video Detection, Library Function

In this study we describe the process of identifying unnecessary videos using an advanced method that combines natural language processing and machine learning. The system also includes a framework containing analytics databases, which helps measure statistical accuracy and can detect, accept, or reject unnecessary and unethical video content. In our video detection system, we extract text data from video content in two steps: first from video to MPEG-1 Audio Layer III (MP3), and then from MP3 to WAV format. We used the text-processing side of natural language processing to analyze and prepare the data set. We use both Naive Bayes and logistic regression classification algorithms in this detection system to determine which gives the best accuracy. In our research, the MP4 video data was converted to plain text using an advanced Python library function. This brief study discusses the identification of unauthorized, unsocial, unnecessary, unfinished, and malicious videos based on the spoken content of video recordings. By analyzing our data sets with this model, we can decide which videos should be accepted or rejected for further action.
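The abstract names Naive Bayes and logistic regression but not their configuration. Below is a minimal multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written from scratch as an illustration, assuming each video has already been transcribed to plain text and labeled "accept" or "reject". The class name, labels, and training sentences are hypothetical; a logistic regression model could be trained on the same word-count features and the two compared on held-out accuracy.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes over whitespace tokens, with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per label, for the priors
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        tokens = [t for t in doc.lower().split() if t in self.vocab]  # drop unseen words
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            denom = self.total_words[label] + len(self.vocab)
            score = math.log(n / n_docs)  # log prior
            score += sum(math.log((self.word_counts[label][t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Usage: `NaiveBayesText().fit(transcripts, labels).predict(new_transcript)` returns the label with the highest posterior log-probability. Working in log space avoids underflow when transcripts are long.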

Extracting Clinical Features From Dictated Ambulatory Consult Notes Using a Commercially Available Natural Language Processing Tool: Pilot, Retrospective, Cross-Sectional Validation Study (Preprint)

10.2196/preprints.12575

2018

Author(s):

Jeremy Petch, Jane Batt, Joshua Murray, Muhammad Mamdani

Keyword(s):

Natural Language Processing, Natural Language, Clinical Features, Language Processing, A Priori, Cross Sectional Study, Free Text, Cross Sectional, Text Data, Complex Features

BACKGROUND: The increasing adoption of electronic health records (EHRs) in clinical practice holds the promise of improving care and advancing research by serving as a rich source of data, but most EHRs allow clinicians to enter data in a text format without much structure. Natural language processing (NLP) may reduce reliance on manual abstraction of these text data by extracting clinical features directly from unstructured clinical digital text data and converting them into structured data.

OBJECTIVE: This study aimed to assess the performance of a commercially available NLP tool for extracting clinical features from free-text consult notes.

METHODS: We conducted a pilot, retrospective, cross-sectional study of the accuracy of NLP from dictated consult notes from our tuberculosis clinic with manual chart abstraction as the reference standard. Consult notes for 130 patients were extracted and processed using NLP. We extracted 15 clinical features from these consult notes and grouped them a priori into categories of simple, moderate, and complex for analysis.

RESULTS: For the primary outcome of overall accuracy, NLP performed best for features classified as simple, achieving an overall accuracy of 96% (95% CI 94.3-97.6). Performance was slightly lower for features of moderate clinical and linguistic complexity at 93% (95% CI 91.1-94.4), and lowest for complex features at 91% (95% CI 87.3-93.1).

CONCLUSIONS: The findings of this study support the use of NLP for extracting clinical features from dictated consult notes in the setting of a tuberculosis clinic. Further research is needed to fully establish the validity of NLP for this and other purposes.
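The abstract reports accuracy with 95% confidence intervals but does not say how the intervals were computed. As a hedged illustration, the Wilson score interval for a binomial proportion is one common choice; `accuracy_with_wilson_ci` is a hypothetical helper, and its inputs would be the number of NLP extractions that agreed with manual chart abstraction out of all extractions in a complexity category.

```python
import math


def accuracy_with_wilson_ci(n_correct: int, n_total: int, z: float = 1.96):
    """Return (accuracy, lower, upper) using the Wilson score interval (z=1.96 for 95%)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    p = n_correct / n_total
    denom = 1 + z * z / n_total
    center = (p + z * z / (2 * n_total)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n_total + z * z / (4 * n_total ** 2))
    return p, center - half, center + half
```

Unlike the simpler normal-approximation (Wald) interval, the Wilson interval stays within [0, 1] even when accuracy is close to 100%, which matters for features that NLP extracts almost perfectly.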
