Coreference Resolution for Anaphoric Pronouns in Texts on Medical Products

2018 ◽  
Vol 56 (1) ◽  
pp. 205-216
Author(s):  
Jerzy Krawczuk ◽  
Mariusz Ferenc

Abstract: Coreference resolution is the task of finding all expressions that refer to the same entity in a text. It is one of the higher-level NLP (Natural Language Processing) tasks. It makes it possible, for example, to extract more information about medical products from larger texts. A product such as ‘ambidextrous gloves’ may appear in a text in many different forms. For example, they could be referred to by the pronoun ‘they’, as in this sentence. The algorithm presented in this paper finds pronouns and, for each of them (except the pleonastic ‘it’), creates coreference candidates with entities that appeared earlier in the same sentence or in the previous sentence. Each candidate (a pair of mentions) is described by 48 binary features that represent its grammatical and location properties. In the training set, each pair is marked as coreferent or not, and a decision tree classifier is trained on these labels. A classifier with a high precision of 0.94 and a decent recall of 0.61 was obtained on the training set, and it retained a good out-of-sample precision of 0.64.
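
The candidate-generation step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the pronoun list, the `(token, is_entity)` input format and the function name `candidate_pairs` are assumptions, entity detection is taken as already done upstream, and the pleonastic ‘it’ filter is left out.

```python
# Assumed pronoun inventory; the abstract does not list the pronouns handled.
PRONOUNS = {"it", "they", "them", "its", "their"}


def candidate_pairs(sentences):
    """Pair each pronoun with entities seen earlier in the same sentence
    or anywhere in the previous sentence.

    `sentences` is a list of sentences; each sentence is a list of
    (token, is_entity) tuples (hypothetical input format).
    """
    pairs = []
    for s_idx, sentence in enumerate(sentences):
        previous = sentences[s_idx - 1] if s_idx > 0 else []
        prev_entities = [tok for tok, is_entity in previous if is_entity]
        for t_idx, (token, _) in enumerate(sentence):
            if token.lower() not in PRONOUNS:
                continue
            # Only entities that precede the pronoun in its own sentence count.
            earlier = [tok for tok, is_entity in sentence[:t_idx] if is_entity]
            for entity in prev_entities + earlier:
                pairs.append((token, entity))
    return pairs


text = [
    [("Ambidextrous gloves", True), ("are", False), ("durable", False)],
    [("They", False), ("fit", False), ("both", False), ("hands", True)],
]
print(candidate_pairs(text))
```

Each resulting pair would then be turned into a binary feature vector and passed to the classifier; the 48 features themselves are not specified in the abstract, so they are not reproduced here.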

Online discussion forums and blogs are vibrant platforms where cancer patients express their views in the form of stories. These stories sometimes become a source of inspiration for patients who are anxious to find similar cases. This paper proposes a method that uses natural language processing and machine learning to analyze unstructured texts collected from patients’ reviews and stories. The proposed methodology aims to identify the behavior, emotions, side effects, decisions and demographics associated with cancer victims. The pre-processing phase involves extracting web text, cleaning it by removing some special characters and symbols, and finally tagging the text with NLTK’s (Natural Language Toolkit) POS (part-of-speech) tagger. The post-processing phase trains seven machine learning classifiers (see Table 6). The Decision Tree classifier shows the highest precision (0.83) among the classifiers, while the area under the receiver operating characteristic curve (AUC) is highest for the Support Vector Machine (SVM) classifier (0.98).
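
The text-cleaning step mentioned above might look roughly like this. The abstract does not say which characters are dropped, so the rules below (strip HTML tags, keep letters, digits and basic punctuation) and the name `clean_text` are assumptions; POS tagging with NLTK would follow as a separate step and is not shown.

```python
import re


def clean_text(raw):
    """Remove markup and unusual symbols, then collapse whitespace.

    The set of characters kept here is an assumed choice for illustration.
    """
    no_tags = re.sub(r"<[^>]+>", " ", raw)                 # drop HTML tags
    kept = re.sub(r"[^A-Za-z0-9\s.,!?'-]", " ", no_tags)   # drop other symbols
    return re.sub(r"\s+", " ", kept).strip()               # normalise spacing


print(clean_text("<p>Chemo was hard, but I'm #hopeful!!</p>"))
```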


The main objective of this paper is to analyze social media big data reviews of e-commerce products. The analysis gives online shoppers useful information about product quality and gives businesses decision-making input about the products customers most like and buy. It covers all features or opinion words available in tweets, such as capitalized words, sequences of repeated letters, emoji, slang words, exclamatory words, intensifiers, modifiers, conjunctions and negation words. Existing work has considered only two or three features when performing sentiment analysis with machine learning and Natural Language Processing (NLP). In the proposed work, familiar machine learning classification models, namely Multinomial Naïve Bayes, Support Vector Machine, Decision Tree Classifier and Random Forest Classifier, are used for sentiment classification. The sentiment classification serves as a decision support system for both customers and businesses.
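
A few of the tweet-level features listed in this abstract can be counted with simple pattern matching, as in the sketch below. The word lists, the feature names and the function `tweet_features` are illustrative assumptions; the paper's actual lexicons and feature definitions are not given in the abstract.

```python
import re

# Assumed, deliberately tiny lexicons for illustration only.
NEGATIONS = {"not", "no", "never"}
INTENSIFIERS = {"very", "so", "really", "extremely"}


def tweet_features(tweet):
    """Count a handful of surface cues that may signal sentiment strength."""
    tokens = re.findall(r"[A-Za-z']+", tweet)
    lowered = [t.lower() for t in tokens]
    return {
        # Fully upper-case words of more than one letter, e.g. "GREAT".
        "capitalized_words": sum(1 for t in tokens if len(t) > 1 and t.isupper()),
        # Words with a letter repeated three or more times, e.g. "sooo".
        "elongated_words": sum(1 for t in tokens if re.search(r"(.)\1{2,}", t)),
        "exclamations": tweet.count("!"),
        "negations": sum(1 for t in lowered if t in NEGATIONS or t.endswith("n't")),
        "intensifiers": sum(1 for t in lowered if t in INTENSIFIERS),
    }


print(tweet_features("This phone is SOOOO good, not bad at all!!"))
```

The resulting counts could be appended to a bag-of-words vector before training any of the classifiers named above.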


2019 ◽  
Vol 8 (2S11) ◽  
pp. 2423-2426

Natural Language Processing is a vital field of research with applications in many subjects. Text classification is a part of NLP in which text is converted into a machine-readable form through various methods, such as tokenizing, part-of-speech tagging, stemming and chunking. Applying these methods to our data yields prepared data on which we train a model to detect spam and ham messages using Scikit-Learn classifiers. We propose a model that classifies messages as spam or ham, and we experiment with and analyze the relative strengths of several machine learning algorithms, namely K-Nearest Neighbors (KNN), Decision Tree Classifier, Random Forest Classifier, Logistic Regression, SGD Classifier, Multinomial Naive Bayes (NB) and Support Vector Machine (SVM), to obtain a logical comparison of their performance measures. The proposed approach achieved an average accuracy of 98.49% with the SVM model on the ‘SMS Spam Collection’ dataset.
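
To make the spam/ham setup concrete, here is a from-scratch multinomial Naive Bayes with add-one smoothing in plain Python. It is a stand-in for illustration, not the Scikit-Learn classifiers the paper used; the whitespace tokenizer, the function names and the four toy messages are all assumptions.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case and split on whitespace (assumed, minimal tokenizer)."""
    return text.lower().split()


def train_nb(messages):
    """messages: list of (text, label). Returns the counts used for prediction."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in messages:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log posterior under add-one smoothing."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


toy = [
    ("win a free prize now", "spam"),
    ("free cash claim now", "spam"),
    ("are we meeting for lunch", "ham"),
    ("see you at lunch tomorrow", "ham"),
]
model = train_nb(toy)
print(predict_nb(model, "free prize"), predict_nb(model, "lunch tomorrow"))
```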


Author(s):  
Zahra Mousavi ◽  
Heshaam Faili

Nowadays, wordnets are extensively used as a major resource in natural language processing and information retrieval tasks. Therefore, the accuracy of wordnets has a direct influence on the performance of the applications that use them. This paper presents a fully automated method for extending a previously developed Persian wordnet to cover more comprehensive and accurate verbal entries. First, using a bilingual dictionary, some Persian verbs are linked to Princeton WordNet (PWN) synsets. A feature set is proposed that captures the semantic behavior of compound verbs, which make up the majority of Persian verbs. This feature set is employed in a supervised classification system to select the proper links for inclusion in the wordnet. We also use a pre-existing Persian wordnet, FarsNet, and a similarity-based method to produce a training set. The result is the largest automatically developed Persian wordnet, with more than 27,000 words, 28,000 PWN synsets and 67,000 word-sense pairs; it substantially outperforms the previous Persian wordnet, which has about 16,000 words, 22,000 PWN synsets and 38,000 word-sense pairs.


Healthcare ◽  
2021 ◽  
Vol 9 (2) ◽  
pp. 169
Author(s):  
Sergi Gómez-Quintana ◽  
Christoph E. Schwarz ◽  
Ihor Shelevytsky ◽  
Victoriya Shelevytska ◽  
Oksana Semenova ◽  
...  

The current diagnosis of Congenital Heart Disease (CHD) in neonates relies on echocardiography. Its limited availability requires alternative screening procedures to prioritise newborns awaiting ultrasound. Routine screening for CHD is performed using a multidimensional clinical examination including (but not limited to) auscultation and pulse oximetry. While auscultation might be subjective, with some heart abnormalities not always audible, it increases the ability to detect heart defects. This work aims to develop an objective clinical decision support tool based on machine learning (ML) to facilitate differentiation of sounds with signatures of Patent Ductus Arteriosus (PDA)/CHDs in clinical settings. The heart sounds are pre-processed and segmented, followed by feature extraction. The features are fed into a boosted decision tree classifier to estimate the probability of PDA or CHDs. Several mechanisms to combine information from different auscultation points, as well as consecutive sound cycles, are presented. The system is evaluated on a large clinical dataset of heart sounds from 265 term and late-preterm newborns recorded within the first six days of life. The developed system reaches an area under the curve (AUC) of 78% for detecting CHD and 77% for detecting PDA. The obtained results for PDA detection compare favourably with the level of accuracy achieved by an experienced neonatologist when assessed on the same cohort.
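
The reported AUC can be read as the probability that a randomly chosen affected newborn receives a higher score than a randomly chosen healthy one. The sketch below computes it that way and also shows one possible way to merge scores from several auscultation points. Averaging is an assumption made for illustration; the abstract only says that several combination mechanisms are presented, and the function names are hypothetical.

```python
def combine_scores(per_point_scores):
    """Average the probabilities from each auscultation point (assumed rule)."""
    return sum(per_point_scores) / len(per_point_scores)


def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy scores and labels, invented for the example.
print(auc([0.9, 0.8, 0.3, 0.4], [1, 0, 1, 0]))
```

The pairwise form is quadratic in the number of recordings, which is acceptable for a cohort of a few hundred newborns.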


2019 ◽  
Vol 9 (22) ◽  
pp. 4833 ◽  
Author(s):  
Ardo Allik ◽  
Kristjan Pilt ◽  
Deniss Karai ◽  
Ivo Fridolin ◽  
Mairo Leier ◽  
...  

The aim of this study was to develop an optimized physical activity classifier for real-time wearable systems, with a focus on reducing the requirements on device power consumption and memory buffer. The classification parameters evaluated in this study were the sampling frequency of the acceleration signal, the window length of the classification fragment, and the number of classification features, found with different feature selection methods. For parameter evaluation, a decision tree classifier was created based on the acceleration signals recorded during tests in which 25 healthy test subjects performed various physical activities. The overall average F1-score achieved in this study was about 0.90. Similar F1-scores were achieved with the evaluated window lengths of 5 s (0.92 ± 0.02) and 3 s (0.91 ± 0.02), while classification performance with 1 s was lower (0.87 ± 0.02). The tested sampling frequencies of 50 Hz, 25 Hz, and 13 Hz gave similar results for most classified activity types, with the exception of outdoor cycling, where the differences were significant. Forward sequential feature selection made it possible to reduce the number of features from an initial 110 to about 12 without lowering the classification performance. The results of this study have been used for developing more efficient real-time physical activity classifiers.
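
The link between sampling frequency, window length and memory buffer can be made concrete with a short sketch: the number of samples a device must hold per classification window is simply frequency times window length. The function names and the choice of non-overlapping windows are assumptions; the abstract does not state whether windows overlap.

```python
def samples_per_window(sampling_hz, window_s):
    """Number of samples that one classification window must buffer."""
    return int(round(sampling_hz * window_s))


def split_windows(signal, sampling_hz, window_s):
    """Cut a 1-D signal into consecutive, non-overlapping full windows.

    A trailing partial window is dropped (assumed behaviour).
    """
    size = samples_per_window(sampling_hz, window_s)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


# Buffer sizes for the settings evaluated in the study.
for hz in (50, 25, 13):
    for seconds in (5, 3, 1):
        print(hz, "Hz,", seconds, "s:", samples_per_window(hz, seconds), "samples")
```

Under these assumptions, moving from 50 Hz with 5 s windows to 13 Hz with 3 s windows shrinks the per-axis buffer from 250 samples to 39.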

