Mining Meaning

2011 ◽  
pp. 125-140
Author(s):  
William L. Tullar

This chapter focuses on the pattern detection and extraction step in text data commonly called text data mining. I examine some of the literature on natural language processing and propose a method of recovering value from the text of virtual group discussions based on methods derived from the communication field. Then, I apply the method in a case using data from 216 different groups from a virtual group experiment. The results from the case show that higher performing groups are characterized by higher frequencies of acts of dominance and higher frequencies of terms concerning cognition, communication and praise. Higher performing groups were also characterized by lower frequencies of acts of equivalence and lower frequencies of leveling terms and numerical terms. Ways to use this knowledge to improve the groups’ performance are discussed.

Author(s):  
Pratik Solim ◽  
◽  
Ankit Sagwekar ◽  
Rishikesh Patil ◽  
Swapnali Kurhade ◽  
...  

2003 ◽  
Vol 17 (5) ◽  
Author(s):  
Paola Merlo ◽  
James Henderson ◽  
Gerold Schneider ◽  
Eric Wehrli

The recent considerable growth in the amount of easily available on-line text has brought to the foreground the need for large-scale natural language processing tools for text data mining. In this paper we address the problem of organizing documents into meaningful groups according to their content and to visualize a text collection, providing an overview of the range of documents and of their relationships, so that they can be browsed more easily. We use Self-Organizing Maps (SOMs) (Kohonen 1984). Great efficiency challenges arise in creating these maps. We study linguistically-motivated ways of reducing the representation of a document to increase efficiency and ways to disambiguate the words in the documents.


2020 ◽  
Author(s):  
David DeFranza ◽  
Himanshu Mishra ◽  
Arul Mishra

Language provides an ever-present context for our cognitions and has the ability to shape them. Languages across the world can be gendered (language in which the form of noun, verb, or pronoun is presented as female or male) versus genderless. In an ongoing debate, one stream of research suggests that gendered languages are more likely to display gender prejudice than genderless languages. However, another stream of research suggests that language does not have the ability to shape gender prejudice. In this research, we contribute to the debate by using a Natural Language Processing (NLP) method which captures the meaning of a word from the context in which it occurs. Using text data from Wikipedia and the Common Crawl project (which contains text from billions of publicly facing websites) across 45 world languages, covering the majority of the world’s population, we test for gender prejudice in gendered and genderless languages. We find that gender prejudice occurs more in gendered rather than genderless languages. Moreover, we examine whether genderedness of language influences the stereotypic dimensions of warmth and competence utilizing the same NLP method.


Vector representations for language have been shown to be useful in a number of Natural Language Processing tasks. In this paper, we aim to investigate the effectiveness of word vector representations for the problem of Sentiment Analysis. In particular, we target three sub-tasks namely sentiment words extraction, polarity of sentiment words detection, and text sentiment prediction. We investigate the effectiveness of vector representations over different text data and evaluate the quality of domain-dependent vectors. Vector representations has been used to compute various vector-based features and conduct systematically experiments to demonstrate their effectiveness. Using simple vector based features can achieve better results for text sentiment analysis of APP.


2020 ◽  
pp. 205-228
Author(s):  
George A. Khachatryan

Instruction modeling is still in its early stages. This chapter discusses promising directions in which instruction modeling could develop in coming years. This includes increasing the richness of interfaces used in instruction modeling programs (e.g., by allowing students to enter responses in free form and have them graded via natural language processing); applying instruction modeling to subjects beyond mathematics, including English, foreign language, and science; using educational data mining to create automated “coaches” to help teachers better implement instruction modeling programs in their classrooms; creating approaches to instruction modeling that allow for rapid authorship of content; redesigning schools (in schedules as well as architecture) to optimize the use of instruction modeling; and putting in place government policies to encourage the use of comprehensive blended learning programs (such as those developed through instruction modeling).


2018 ◽  
Vol 2018 ◽  
pp. 1-7 ◽  
Author(s):  
J. Bouaziz ◽  
R. Mashiach ◽  
S. Cohen ◽  
A. Kedem ◽  
A. Baron ◽  
...  

Endometriosis is a disease characterized by the development of endometrial tissue outside the uterus, but its cause remains largely unknown. Numerous genes have been studied and proposed to help explain its pathogenesis. However, the large number of these candidate genes has made functional validation through experimental methodologies nearly impossible. Computational methods could provide a useful alternative for prioritizing those most likely to be susceptibility genes. Using artificial intelligence applied to text mining, this study analyzed the genes involved in the pathogenesis, development, and progression of endometriosis. The data extraction by text mining of the endometriosis-related genes in the PubMed database was based on natural language processing, and the data were filtered to remove false positives. Using data from the text mining and gene network information as input for the web-based tool, 15,207 endometriosis-related genes were ranked according to their score in the database. Characterization of the filtered gene set through gene ontology, pathway, and network analysis provided information about the numerous mechanisms hypothesized to be responsible for the establishment of ectopic endometrial tissue, as well as the migration, implantation, survival, and proliferation of ectopic endometrial cells. Finally, the human genome was scanned through various databases using filtered genes as a seed to determine novel genes that might also be involved in the pathogenesis of endometriosis but which have not yet been characterized. These genes could be promising candidates to serve as useful diagnostic biomarkers and therapeutic targets in the management of endometriosis.


Cancer ◽  
2016 ◽  
Vol 123 (1) ◽  
pp. 114-121 ◽  
Author(s):  
Tejal A. Patel ◽  
Mamta Puppala ◽  
Richard O. Ogunti ◽  
Joe E. Ensor ◽  
Tiancheng He ◽  
...  

Author(s):  
Nazmun Nessa Moon ◽  
Imrus Salehin ◽  
Masuma Parvin ◽  
Md. Mehedi Hasan ◽  
Iftakhar Mohammad Talha ◽  
...  

<span>In this study we have described the process of identifying unnecessary video using an advanced combined method of natural language processing and machine learning. The system also includes a framework that contains analytics databases and which helps to find statistical accuracy and can detect, accept or reject unnecessary and unethical video content. In our video detection system, we extract text data from video content in two steps, first from video to MPEG-1 audio layer 3 (MP3) and then from MP3 to WAV format. We have used the text part of natural language processing to analyze and prepare the data set. We use both Naive Bayes and logistic regression classification algorithms in this detection system to determine the best accuracy for our system. In our research, our video MP4 data has converted to plain text data using the python advance library function. This brief study discusses the identification of unauthorized, unsocial, unnecessary, unfinished, and malicious videos when using oral video record data. By analyzing our data sets through this advanced model, we can decide which videos should be accepted or rejected for the further actions.</span>


2018 ◽  
Author(s):  
Jeremy Petch ◽  
Jane Batt ◽  
Joshua Murray ◽  
Muhammad Mamdani

BACKGROUND The increasing adoption of electronic health records (EHRs) in clinical practice holds the promise of improving care and advancing research by serving as a rich source of data, but most EHRs allow clinicians to enter data in a text format without much structure. Natural language processing (NLP) may reduce reliance on manual abstraction of these text data by extracting clinical features directly from unstructured clinical digital text data and converting them into structured data. OBJECTIVE This study aimed to assess the performance of a commercially available NLP tool for extracting clinical features from free-text consult notes. METHODS We conducted a pilot, retrospective, cross-sectional study of the accuracy of NLP from dictated consult notes from our tuberculosis clinic with manual chart abstraction as the reference standard. Consult notes for 130 patients were extracted and processed using NLP. We extracted 15 clinical features from these consult notes and grouped them a priori into categories of simple, moderate, and complex for analysis. RESULTS For the primary outcome of overall accuracy, NLP performed best for features classified as simple, achieving an overall accuracy of 96% (95% CI 94.3-97.6). Performance was slightly lower for features of moderate clinical and linguistic complexity at 93% (95% CI 91.1-94.4), and lowest for complex features at 91% (95% CI 87.3-93.1). CONCLUSIONS The findings of this study support the use of NLP for extracting clinical features from dictated consult notes in the setting of a tuberculosis clinic. Further research is needed to fully establish the validity of NLP for this and other purposes.


Sign in / Sign up

Export Citation Format

Share Document