Predicting word choice in affective text

2015 ◽  
Vol 22 (1) ◽  
pp. 97-134
Author(s):  
M. GARDINER ◽  
M. DRAS

Abstract
Choosing the best word or phrase for a given context from among candidate near-synonyms, such as slim and skinny, is a difficult language generation problem. In this paper, we describe approaches to solving an instance of this problem, the lexical gap problem, with a particular focus on affect and subjectivity; to do this, we draw upon techniques from the fields of sentiment and subjectivity analysis. We present a supervised approach to this problem, initially with a unigram model that solidly outperforms the baseline, with a 6.8% increase in accuracy. The results to some extent confirm those from related problems: feature presence outperforms feature frequency, and immediate context features generally outperform wider context features. Somewhat surprisingly, however, the latter is not always the case, and not necessarily where intuition might first suggest; an analysis of where document-level models perform better suggested that, in our corpus, broader features related to the ‘tone’ of the document could be useful, including document sentiment, document author, and a distance metric for weighting the wider lexical context of the gap itself. With these, our best model achieves a 10.1% increase in accuracy, corresponding to a 38% reduction in errors. Moreover, our models improve accuracy not only on affective word choice but on non-affective word choice as well.
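
The supervised unigram approach with presence features can be sketched as a simple bag-of-words scorer over contexts containing a lexical gap. The toy contexts, the slim/skinny candidate pair's training data, and the scoring rule below are illustrative stand-ins, not the paper's corpus or classifier.

```python
from collections import defaultdict

# Toy training contexts for the near-synonym pair; "__" marks the lexical gap.
train = {
    "slim": ["the new phone has a remarkably __ profile",
             "the elegant dress suited her __ figure"],
    "skinny": ["the stray dog looked __ and underfed",
               "he was teased for his __ arms and legs"],
}

# Count in how many training contexts each word *appears* for each candidate:
# set() gives feature presence rather than frequency, which the paper reports
# as the stronger choice.
presence = {w: defaultdict(int) for w in train}
for word, ctxs in train.items():
    for ctx in ctxs:
        for tok in set(ctx.split()):
            presence[word][tok] += 1

def choose(context):
    """Pick the candidate whose training contexts share the most word types."""
    toks = set(context.split())
    return max(train, key=lambda w: sum(presence[w][t] for t in toks))

best = choose("the dog seemed __ and underfed to me")
```

In a full system the scorer would be a trained classifier over a large corpus; the shared-word-type count here is only a stand-in for that learned model.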

Author(s):  
Nafiseh Zeinali ◽  
Karim Faez ◽  
Sahar Seifzadeh

Purpose: One of the persistent problems in deep-learning face recognition research is the use of small, self-made data sets, which forces researchers to work with duplicated, already-provided data. In this research, we try to resolve this problem and achieve high accuracy. Materials and Methods: The goal of the current study is to identify facial expressions in an image or a sequence of images, covering ten facial expressions. Given the increasing use of deep learning in recent years, we use convolutional networks and, most importantly, the concept of transfer learning, which led us to train our networks starting from pre-trained ones. Results: One way to improve accuracy when working with small data sets in deep learning is to use pre-trained networks. Due to the small size of the data set, we applied data augmentation techniques and eventually tripled the data size. These techniques include rotating 10 degrees to the left and right, and finally an elastic transformation. We also applied a deep ResNet to public facial expression data sets using this augmentation. Conclusion: We observed a seven percent increase in accuracy compared to the highest accuracy reported in previous work on the same data set.
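
The rotation part of the augmentation step (original plus ±10° rotations, tripling the data) can be sketched with SciPy's `rotate`; the elastic transformation and the ResNet training itself are omitted, and the random face crops stand in for real images.

```python
import numpy as np
from scipy.ndimage import rotate

def augment(images, angle=10.0):
    """Triple a small dataset: each image plus +/-angle-degree rotations.
    (The study also applies an elastic transformation, omitted here.)"""
    out = []
    for img in images:
        out.append(img)
        out.append(rotate(img, angle, reshape=False, mode="nearest"))
        out.append(rotate(img, -angle, reshape=False, mode="nearest"))
    return out

faces = [np.random.rand(48, 48) for _ in range(4)]  # stand-in face crops
augmented = augment(faces)
```

`reshape=False` keeps the rotated image at its original size so all augmented samples share one input shape for the network.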


10.2196/17638 ◽  
2020 ◽  
Vol 8 (7) ◽  
pp. e17638
Author(s):  
Jian Wang ◽  
Xiaoyu Chen ◽  
Yu Zhang ◽  
Yijia Zhang ◽  
Jiabin Wen ◽  
...  

Background: Automatically extracting relations between chemicals and diseases plays an important role in biomedical text mining. Chemical-disease relation (CDR) extraction aims at extracting complex semantic relationships between entities in documents, which include both intrasentence and intersentence relations. Most previous methods did not consider dependency syntactic information across sentences, which is very valuable for the relation extraction task, in particular for extracting intersentence relations accurately. Objective: In this paper, we propose a novel end-to-end neural network based on the graph convolutional network (GCN) and multihead attention, which makes use of dependency syntactic information across sentences to improve the CDR extraction task. Methods: To improve the performance of intersentence relation extraction, we constructed a document-level dependency graph to capture dependency syntactic information across sentences. A GCN is applied to capture the feature representation of the document-level dependency graph. The multihead attention mechanism is employed to learn the relatively important context features from different semantic subspaces. To enhance the input representation, a deep context representation is used in our model instead of traditional word embeddings. Results: We evaluate our method on the CDR corpus. The experimental results show that our method achieves an F-measure of 63.5%, which is superior to other state-of-the-art methods. At the intrasentence level, our method achieves a precision, recall, and F-measure of 59.1%, 81.5%, and 68.5%, respectively. At the intersentence level, our method achieves a precision, recall, and F-measure of 47.8%, 52.2%, and 49.9%, respectively. Conclusions: The GCN model can effectively exploit cross-sentence dependency information to improve the performance of intersentence CDR extraction. Both the deep context representation and multihead attention are helpful in the CDR extraction task.
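
The GCN propagation over a document-level dependency graph can be sketched in a few lines of NumPy. The tiny graph, feature dimensions, and random weights below are illustrative, not the paper's trained model.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # symmetric normalisation
    return np.maximum(0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# Tiny document-level dependency graph: 5 tokens with a few dependency
# edges, including one edge (0, 4) that crosses a sentence boundary.
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]:
    A[i, j] = A[j, i] = 1.0

rng = np.random.default_rng(0)
H = rng.standard_normal((5, 8))   # token features (e.g. deep context vectors)
W = rng.standard_normal((8, 4))   # learnable layer weights
H_out = gcn_layer(A, H, W)
```

Each token's output feature mixes its neighbours' features along dependency edges, which is how cross-sentence syntactic information reaches the relation classifier.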


2006 ◽  
Vol 32 (2) ◽  
pp. 223-262 ◽  
Author(s):  
Diana Inkpen ◽  
Graeme Hirst

Choosing the wrong word in a machine translation or natural language generation system can convey unwanted connotations, implications, or attitudes. The choice between near-synonyms such as error, mistake, slip, and blunder—words that share the same core meaning, but differ in their nuances—can be made only if knowledge about their differences is available. We present a method to automatically acquire a new type of lexical resource: a knowledge base of near-synonym differences. We develop an unsupervised decision-list algorithm that learns extraction patterns from a special dictionary of synonym differences. The patterns are then used to extract knowledge from the text of the dictionary. The initial knowledge base is later enriched with information from other machine-readable dictionaries. Information about the collocational behavior of the near-synonyms is acquired from free text. The knowledge base is used by Xenon, a natural language generation system that shows how the new lexical resource can be used to choose the best near-synonym in specific situations.
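
The first-match control flow of a decision list, the core mechanism of the algorithm described above, can be sketched as follows; the patterns and distinction labels are invented for illustration, not those actually learned from the dictionary of synonym differences.

```python
# An ordered list of (pattern, distinction-type) rules. Rules are tried
# in order and the first pattern found in a dictionary gloss wins, with
# a default class at the end.
rules = [
    ("implies a degree of", "connotation"),
    ("suggests", "suggestion"),
    ("usually", "frequency-of-use"),
]

def classify_gloss(gloss, default="denotation"):
    """Classify the kind of near-synonym difference a gloss expresses."""
    for pattern, distinction in rules:
        if pattern in gloss:
            return distinction
    return default

fact = classify_gloss("blunder implies a degree of carelessness")
```

In the actual method the rule list is learned, unsupervised, from the dictionary text; here it is hard-coded only to show the first-match application step.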


2019 ◽  
Vol 9 (17) ◽  
pp. 3571
Author(s):  
Li Wang ◽  
Qiao Guo

Language plays a prominent role in the activities of human beings and other intelligent creatures. One of the most important functions of languages is communication. Inspired by this, we attempt to develop a novel language for cooperation between artificial agents. The language generation problem has been studied earlier in the context of evolutionary games in computational linguistics. In this paper, we take a different approach by formulating it in the computational model of rationality in a multi-agent planning setting. This paper includes three main parts: First, we present a language generation problem that is connected to state abstraction and introduce several of the languages’ properties. Second, we give the sufficient and necessary conditions for a valid abstraction, with proofs, and develop an efficient algorithm to construct the languages, in which several words are generated naturally. Sentences composed of these words can be used by agents to regulate their behaviors during task planning. Finally, we conduct several experiments to evaluate the benefits of the languages in a variety of scenarios of a path-planning domain. The empirical results demonstrate that our languages lead to a reduction in communication cost and behavior restriction.
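
The link between state abstraction and "words" can be illustrated minimally: grid states are grouped into abstract regions, each region id acts as a word, and a concrete path compresses into a short sentence. The 2x2-block grid abstraction below is our own illustrative choice, not the paper's construction.

```python
def word_of(state, block=2):
    """Abstract a grid cell into a region id: one 'word' of the language."""
    x, y = state
    return f"r{x // block}{y // block}"

def sentence(path):
    """Compress a concrete path into a sequence of region words."""
    words = []
    for s in path:
        w = word_of(s)
        if not words or words[-1] != w:   # drop consecutive repeats
            words.append(w)
    return words

path = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2), (3, 3)]
msg = sentence(path)
```

The compressed message conveys the route through fewer symbols than the raw state sequence, which is the intuition behind the reduced communication cost reported above.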


2019 ◽  
Vol 97 (Supplement_3) ◽  
pp. 262-262
Author(s):  
Ling-Yun Chang ◽  
Sajjad Toghiani ◽  
E L Hamidi Hay ◽  
Samuel E Aggrey ◽  
Romdhane Rekaya

Abstract Using low- to moderate-density SNP marker panels, a substantial increase in accuracy was achieved. The dramatic increase in the number of identified variants due to advances in next-generation sequencing was expected to significantly increase the accuracy of genomic selection (GS). Unfortunately, little to no improvement was observed. For mixed-model-based approaches, using all SNPs in the panel to compute the observed relationship matrix (G) will not increase accuracy, as the additive relationships between individuals can be accurately estimated using a much smaller number of markers. Due to these limitations, variant prioritization has become a necessity to improve accuracy. Further, it has been shown that weighting SNPs when calculating G could be effective in improving the accuracy of GS. FST, as a measure of population differentiation, has been successfully used to identify genome segments under selection pressure. Consequently, FST could be used both to prioritize SNPs and to derive their relative weights in the calculation of the genomic relationship matrix. A population of 15,000 animals genotyped for 400K SNP markers uniformly distributed along 10 chromosomes was simulated. A trait with heritability 0.3, genetically controlled by 200 QTL, was generated. The top 20K SNPs based on their FST scores were used either alone or together with the remaining 380K SNPs to compute G, with or without weighting. When only the top 20K SNPs were used to compute G, two scenarios were considered: 1) equal weights for all SNPs, or 2) weights proportional to the SNPs' FST scores. When all 400K SNP markers were used, different weighting scenarios were evaluated. The results clearly showed that prioritizing SNP markers based on their FST scores and using the latter to compute relative weights increased the genetic similarity between training and validation animals and resulted in more than a 5% improvement in the accuracy of GS.
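
A VanRaden-style weighted genomic relationship matrix, with per-SNP weights in place of equal weights, can be sketched as follows. The simulated genotypes and stand-in FST scores below are illustrative, and the exact weighting formula used in the study may differ.

```python
import numpy as np

def weighted_G(M, weights):
    """Weighted genomic relationship matrix (VanRaden-style):
    G = Z diag(w) Z' / sum_i w_i * 2 p_i (1 - p_i),
    where Z is the genotype matrix centred by twice the allele frequency.
    Here the weights would be made proportional to each SNP's FST score."""
    p = M.mean(axis=0) / 2.0             # allele frequencies
    Z = M - 2.0 * p                       # centred allele counts
    denom = np.sum(weights * 2.0 * p * (1.0 - p))
    return (Z * weights) @ Z.T / denom

rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(6, 50)).astype(float)  # 6 animals, 50 SNPs (0/1/2)
fst = rng.random(50)                      # stand-in FST scores per SNP
G = weighted_G(M, fst / fst.sum())        # weights proportional to FST
```

Setting all weights equal recovers the unweighted G; up-weighting high-FST SNPs concentrates the relationship estimate on genome segments under selection pressure.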


Author(s):  
Anita Desiani ◽  
Sugandi Yahdin ◽  
Annisa Kartikasari ◽  
Irmeilyana Irmeilyana

Imbalanced data affect the accuracy of models, especially precision and sensitivity, making it difficult to extract information about the minority class. This problem was identified in the Universitas Sriwijaya tracer-study dataset, which contains 2,934 records. The label attribute is divided into several classes, namely not tight, somewhat-tight, tight, very tight, and tightest. The tightest and very tight classes amount to 27% and 38.6% of the majority classes, respectively. In this study, SMOTE is combined with elimination of missing values to handle the imbalanced data. The method was evaluated with the KNN, ANN, and C4.5 classification methods. The results show a significant increase in overall accuracy and a significant increase in the precision and sensitivity of the minority classes. The precision and sensitivity of the majority and minority classes are not too different, even though the minority classes are much smaller than the majority class; information on the minority classes can thus be obtained with quite high precision and sensitivity. In conclusion, the proposed method is able to improve accuracy and greatly increases sensitivity and precision.
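
The core SMOTE interpolation step can be sketched as: pick a minority sample, pick one of its k nearest minority neighbours, and place a synthetic point on the segment between them. This is a generic sketch of the technique, not the exact implementation or parameters used in the study.

```python
import numpy as np

def smote(X_min, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolating between
    each chosen minority point and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]          # k nearest, skipping the point itself
        j = rng.choice(nbrs)
        lam = rng.random()                      # interpolation factor in [0, 1)
        synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synth)

minority = np.random.default_rng(2).random((10, 4))  # 10 minority samples, 4 features
new_points = smote(minority, n_new=20)
```

Because each synthetic point lies between two existing minority points, the oversampled class stays inside the region the minority already occupies rather than duplicating exact records.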

