LexiPers: An ontology-based sentiment lexicon for Persian

10.29007/f4j4 ◽  
2018 ◽  
Author(s):  
Behnam Sabeti ◽  
Pedram Hosseini ◽  
Gholamreza Ghassem-Sani ◽  
Seyed Abolghasem Mirroshandel

Sentiment analysis refers to the use of natural language processing to identify and extract subjective information from textual resources. One approach for sentiment extraction is using a sentiment lexicon. A sentiment lexicon is a set of words associated with the sentiment orientation that they express. In this paper, we describe the process of generating a general purpose sentiment lexicon for Persian. A new graph-based method is introduced for seed selection and expansion based on an ontology. Sentiment lexicon generation is then mapped to a document classification problem. We used the K-nearest neighbors and nearest centroid methods for classification. These classifiers have been evaluated based on a set of hand labeled synsets. The final sentiment lexicon has been generated by the best classifier. The results show an acceptable performance in terms of accuracy and F-measure in the generated sentiment lexicon.
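The graph-based seed expansion described above can be pictured as label propagation over an ontology graph: a few hand-labeled seed synsets keep a fixed polarity, and scores spread to neighboring synsets until they stabilize. The graph, seed words, damping factor, and iteration count below are illustrative toy values, not the paper's actual data or method parameters.

```python
# Toy label propagation over a synset graph: sentiment spreads from a few
# seed nodes to their neighbours; seeds keep their fixed polarity.
graph = {                      # hypothetical synset adjacency (undirected)
    "good": ["excellent", "fine"],
    "excellent": ["good"],
    "fine": ["good", "okay"],
    "okay": ["fine"],
    "bad": ["terrible"],
    "terrible": ["bad"],
}
seeds = {"good": 1.0, "bad": -1.0}     # seed polarity scores

scores = {node: seeds.get(node, 0.0) for node in graph}
for _ in range(10):                    # propagate until roughly stable
    new_scores = {}
    for node, neighbours in graph.items():
        if node in seeds:
            new_scores[node] = seeds[node]
        else:
            new_scores[node] = 0.5 * sum(scores[n] for n in neighbours) / len(neighbours)
    scores = new_scores

# Threshold the propagated scores into lexicon polarity classes.
lexicon = {w: ("positive" if s > 0 else "negative" if s < 0 else "neutral")
           for w, s in scores.items()}
print(lexicon["excellent"], lexicon["terrible"])   # positive negative
```

In the paper the expanded candidates are then classified with K-nearest neighbors or nearest centroid; the propagation above only illustrates how polarity can flow from seeds through ontology links.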

Clinical parsing is useful in the medical domain. Clinical narratives are difficult to understand because they are in an unstructured format, and medical natural language processing systems are used to render them readable. A clinical parser combines natural language processing with a medical lexicon. In this paper, we discuss a constituency parser for clinical narratives, based on a phrase-structure grammar, which converts unstructured clinical sentences into structured reports. For each sentence, recall, precision, and bracketing F-measure are calculated.
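The per-sentence recall, precision, and bracketing F-measure mentioned above compare the constituent spans of a predicted parse tree against a gold-standard parse. The spans below are hypothetical, chosen only to show the computation.

```python
# Bracketing evaluation: compare labelled constituent spans of a predicted
# parse against the gold-standard spans; a span is (label, start, end).
def bracketing_scores(gold, predicted):
    gold, predicted = set(gold), set(predicted)
    matched = len(gold & predicted)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# Hypothetical spans for one clinical sentence of five tokens.
gold_spans = [("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5), ("S", 0, 5)]
pred_spans = [("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5), ("S", 0, 5)]
p, r, f = bracketing_scores(gold_spans, pred_spans)
print(round(p, 2), round(r, 2), round(f, 2))   # 0.75 0.75 0.75
```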


2019 ◽  
Vol 2 (1) ◽  
Author(s):  
Graham Neubig ◽  
Patrick Littell ◽  
Chian-Yu Chen ◽  
Jean Lee ◽  
Zirui Li ◽  
...  

Language documentation is inherently a time-intensive process; transcription, glossing, and corpus management consume a significant portion of documentary linguists’ work. Advances in natural language processing can help to accelerate this work, using the linguists’ past decisions as training material, but questions remain about how to prioritize human involvement. In this extended abstract, we describe the beginnings of a new project that will attempt to ease this language documentation process through the use of natural language processing (NLP) technology. It is based on (1) methods to adapt NLP tools to new languages, based on recent advances in massively multilingual neural networks, and (2) backend APIs and interfaces that allow linguists to upload their data (§2). We then describe our current progress on two fronts: automatic phoneme transcription, and glossing (§3). Finally, we briefly describe our future directions (§4).


2015 ◽  
Vol 7 (1) ◽  
Author(s):  
Paula Carvalho ◽  
Mário J. Silva

This paper describes the main characteristics of SentiLex-PT, a sentiment lexicon designed for the extraction of sentiment and opinion about human entities in Portuguese texts. The potential of this resource is illustrated through its application to two types of corpora: SentiCorpus-PT, a social media corpus consisting of user comments on news articles, and a literary piece of the early twentieth century, The Poor (Os Pobres), by Raul Brandão. The data were processed by UNITEX, a natural language processing system based on dictionaries and grammars.
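Lexicon-based sentiment extraction of the kind SentiLex-PT supports can be sketched as a polarity lookup over tokens. The entries below are illustrative Portuguese words with made-up polarities, not actual SentiLex-PT entries, and real use would also need lemmatization and the resource's entity-targeting information.

```python
# Minimal lexicon-based sentiment scoring: sum the polarity of each token
# found in the lexicon. Entries are illustrative, not SentiLex-PT's.
lexicon = {"bom": 1, "feliz": 1, "pobre": -1, "triste": -1}

def score(text):
    tokens = text.lower().split()
    return sum(lexicon.get(t, 0) for t in tokens)

print(score("o homem pobre e triste"))   # -2
print(score("um dia bom e feliz"))       # 2
```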


Author(s):  
Deniz Caliskan ◽  
Jakob Zierk ◽  
Detlef Kraska ◽  
Stefan Schulz ◽  
Philipp Daumke ◽  
...  

Introduction: The aim of this study is to evaluate the use of a natural language processing (NLP) software to extract medication statements from unstructured medical discharge letters. Methods: Ten randomly selected discharge letters were extracted from the data warehouse of the University Hospital Erlangen (UHE) and manually annotated to create a gold standard. The AHD NLP tool, provided by MIRACUM’s industry partner, was used to annotate these discharge letters. Annotations by the NLP tool were then compared to the gold standard on two levels: phrase precision (whether or not the whole medication statement has been identified correctly) and token precision (whether or not the medication name has been identified correctly within correctly discovered medication phrases). Results: The NLP tool detected medication related phrases with an overall F-measure of 0.852. The medication name has been identified correctly with an overall F-measure of 0.936. Discussion: This proof-of-concept study is a first step towards an automated scalable evaluation system for MIRACUM’s industry partner’s NLP tool by using a gold standard. Medication phrases and names have been correctly identified in most cases by the NLP system. Future effort needs to be put into extending and validating the gold standard.
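The two-level comparison described above can be sketched as follows: at the phrase level, a predicted medication statement counts when its span matches a gold span; at the token level, the medication name is checked only within correctly found phrases. The spans and drug names below are hypothetical, not taken from the study's letters.

```python
# Two-level evaluation against a gold standard: (1) phrase level - does a
# predicted medication-statement span match a gold span; (2) token level -
# within correctly found phrases, was the medication name identified?
def f_measure(tp, n_pred, n_gold):
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Hypothetical annotations for one letter: character span -> medication name.
gold = {(10, 42): "metoprolol", (60, 95): "ramipril", (120, 150): "aspirin"}
pred = {(10, 42): "metoprolol", (60, 95): "enalapril"}

matched_spans = gold.keys() & pred.keys()                 # 2 matching spans
phrase_tp = len(matched_spans)
name_tp = sum(1 for s in matched_spans if gold[s] == pred[s])

print(round(f_measure(phrase_tp, len(pred), len(gold)), 3))  # phrase level: 0.8
print(round(name_tp / phrase_tp, 3))                         # name precision: 0.5
```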


2019 ◽  
Vol 886 ◽  
pp. 221-226 ◽  
Author(s):  
Kesinee Boonchuay

Sentiment classification gains a lot of attention nowadays. For a university, the knowledge obtained from classifying sentiments of student learning in courses is highly valuable, and can be used to help teachers improve their teaching skills. In this research, sentiment classification based on text embedding is applied to enhance the performance of sentiment classification for Thai teaching evaluation. Text embedding techniques consider both syntactic and semantic elements of sentences, which can be used to improve the performance of the classification. This research uses two approaches to apply text embedding for classification. The first approach uses fastText classification. According to the results, fastText provides the best overall performance, with a highest F-measure of 0.8212. The second approach constructs text vectors for classification using traditional classifiers. This approach provides better performance than TF-IDF for k-nearest neighbors and naïve Bayes. For naïve Bayes, the second approach yields the best geometric mean of 0.8961. TF-IDF is better suited to the decision tree than the second approach. The benefit of this research is that it presents the workflow of using text embedding for Thai teaching evaluation to improve the performance of sentiment classification. By using embedding techniques, similarity and analogy tasks of texts are established along with the classification.
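The second approach above, constructing text vectors for a traditional classifier, can be sketched by averaging word embeddings into one vector per document and classifying with nearest-neighbor search. The 2-d vectors, vocabulary, and training labels below are toy values standing in for real pretrained embeddings and the Thai evaluation data.

```python
# Average word embeddings into a text vector, then classify with 1-NN.
word_vecs = {                          # hypothetical 2-d word embeddings
    "great": (0.9, 0.1), "clear": (0.8, 0.2),
    "boring": (0.1, 0.9), "confusing": (0.2, 0.8),
}

def text_vector(text):
    vecs = [word_vecs[w] for w in text.split() if w in word_vecs]
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))

train = [("great clear", "positive"), ("boring confusing", "negative")]

def knn_1(text):
    v = text_vector(text)
    def dist(u):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    # Label of the nearest training document (k = 1).
    return min(train, key=lambda tv: dist(text_vector(tv[0])))[1]

print(knn_1("clear great great"))   # positive
print(knn_1("boring"))              # negative
```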


2014 ◽  
Vol 22 (1) ◽  
pp. 132-142 ◽  
Author(s):  
Ching-Heng Lin ◽  
Nai-Yuan Wu ◽  
Wei-Shao Lai ◽  
Der-Ming Liou

Abstract Background and objective Electronic medical records with encoded entries should enhance the semantic interoperability of document exchange. However, it remains a challenge to encode the narrative concept and to transform the coded concepts into a standard entry-level document. This study aimed to use a novel approach for the generation of entry-level interoperable clinical documents. Methods Using HL7 clinical document architecture (CDA) as the example, we developed three pipelines to generate entry-level CDA documents. The first approach was a semi-automatic annotation pipeline (SAAP), the second was a natural language processing (NLP) pipeline, and the third merged the above two pipelines. We randomly selected 50 test documents from the i2b2 corpora to evaluate the performance of the three pipelines. Results The 50 randomly selected test documents contained 9365 words, including 588 Observation terms and 123 Procedure terms. For the Observation terms, the merged pipeline had a significantly higher F-measure than the NLP pipeline (0.89 vs 0.80, p<0.0001), but a similar F-measure to that of the SAAP (0.89 vs 0.87). For the Procedure terms, the F-measure was not significantly different among the three pipelines. Conclusions The combination of a semi-automatic annotation approach and the NLP application seems to be a solution for generating entry-level interoperable clinical documents.


2018 ◽  
Author(s):  
Shoko Wakamiya ◽  
Mizuki Morita ◽  
Yoshinobu Kano ◽  
Tomoko Ohkuma ◽  
Eiji Aramaki

BACKGROUND The amount of medical and clinical-related information on the Web is increasing. Among the different types of information available, social media–based data obtained directly from people are particularly valuable and are attracting significant attention. To encourage medical natural language processing (NLP) research exploiting social media data, the 13th NII Testbeds and Community for Information access Research (NTCIR-13) Medical natural language processing for Web document (MedWeb) provides pseudo-Twitter messages in a cross-language and multi-label corpus, covering 3 languages (Japanese, English, and Chinese) and annotated with 8 symptom labels (such as cold, fever, and flu). Participants classify each tweet into 1 of the 2 categories: those containing a patient’s symptom and those that do not. OBJECTIVE This study aimed to present the results of groups participating in a Japanese subtask, English subtask, and Chinese subtask along with discussions, to clarify the issues that need to be resolved in the field of medical NLP. METHODS In summary, 8 groups (19 systems) participated in the Japanese subtask, 4 groups (12 systems) participated in the English subtask, and 2 groups (6 systems) participated in the Chinese subtask. In addition, 2 baseline systems were constructed for each subtask. The performance of the participant and baseline systems was assessed using the exact match accuracy, F-measure based on precision and recall, and Hamming loss. RESULTS The best system achieved an exact match accuracy of 0.880, an F-measure of 0.920, and a Hamming loss of 0.019. The averages of exact match accuracy, F-measure, and Hamming loss for the Japanese subtask were 0.720, 0.820, and 0.051; those for the English subtask were 0.770, 0.850, and 0.037; and those for the Chinese subtask were 0.810, 0.880, and 0.032, respectively. CONCLUSIONS This paper presented and discussed the performance of systems participating in the NTCIR-13 MedWeb task.
As the MedWeb task settings can be formalized as the factualization of text, the achievement of this task could be directly applied to practical clinical applications.
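The three evaluation measures used above apply to multi-label prediction, where each tweet carries a binary vector of symptom labels. The label vectors below are toy data with 4 labels per tweet rather than MedWeb's 8, chosen only to show how the metrics are computed.

```python
# Multi-label evaluation: exact match accuracy, Hamming loss, and
# micro-averaged F-measure over per-tweet binary label vectors.
def exact_match(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def hamming_loss(gold, pred):
    n_labels = len(gold[0])
    wrong = sum(sum(gi != pi for gi, pi in zip(g, p))
                for g, p in zip(gold, pred))
    return wrong / (len(gold) * n_labels)

def micro_f1(gold, pred):
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        for gi, pi in zip(g, p):
            tp += gi and pi
            fp += (not gi) and pi
            fn += gi and (not pi)
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

# Hypothetical gold and predicted labels for 3 tweets, 4 labels each.
gold = [(1, 0, 0, 1), (0, 1, 0, 0), (1, 1, 0, 0)]
pred = [(1, 0, 0, 1), (0, 1, 1, 0), (1, 0, 0, 0)]

print(exact_match(gold, pred))    # 1 of 3 tweets matched exactly
print(hamming_loss(gold, pred))   # 2 wrong bits out of 12
print(micro_f1(gold, pred))       # precision = recall = 0.8
```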


2017 ◽  
Vol 3 (1) ◽  
pp. 25-32 ◽  
Author(s):  
Claudio Fresta Suharno ◽  
M. Ali Fauzi ◽  
Rizal Setya Perdana

K-Nearest Neighbors (K-NN) is a classification method that is easy to understand. However, it has some drawbacks, one of which is its large computational cost. Feature selection is therefore used as a way to reduce this computational load, by removing features that are irrelevant for text classification. The feature selection method used here is Chi-Square, which measures the degree of dependence of each feature. The process consists of collecting training and test documents, performing preprocessing and feature selection, then classification, and finally testing and analyzing the system's classification results in terms of precision, recall, and F-Measure. This study shows that feature selection can increase the F-Measure of Indonesian-language text classification on SAMBAT Online complaint documents using the K-Nearest Neighbors classification method.
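Chi-Square feature selection scores each term by how strongly its presence depends on the class label, using the 2x2 contingency table of (term present/absent) x (in class/other class); the highest-scoring terms are kept for K-NN. The complaint documents and categories below are made-up stand-ins for the SAMBAT Online data.

```python
# Chi-square statistic for term/class dependence in text classification:
# chi2 = N * (AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)) over the 2x2 table.
def chi_square(docs, term, target_class):
    a = b = c = d = 0
    for text, label in docs:
        present = term in text.split()
        in_class = label == target_class
        if present and in_class:
            a += 1
        elif present:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

# Hypothetical complaint documents (text, category).
docs = [("road broken hole", "infrastructure"),
        ("road light wrong", "infrastructure"),
        ("water bill wrong", "billing"),
        ("bill overcharge wrong", "billing")]

scores = {t: chi_square(docs, t, "infrastructure")
          for t in {"road", "wrong", "light"}}
print(max(scores, key=scores.get))   # "road" depends most on the class
```

Terms whose score falls below a chosen threshold (or outside the top-k) are dropped before building the K-NN document vectors, which is what reduces the computation.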

