What Can We Learn from Almost a Decade of Food Tweets

Author(s):  
Uga Sproģis ◽  
Matīss Rikters

We present the Latvian Twitter Eater Corpus - a set of tweets in the narrow domain of food, drinks, eating and drinking. The corpus has been collected over a time span of more than 8 years and includes over 2 million tweets accompanied by additional useful data. We also separate out two sub-corpora: question-and-answer tweets and sentiment-annotated tweets. We analyse the contents of the corpus and demonstrate use cases for the sub-corpora by training domain-specific question-answering and sentiment-analysis models on the corpus data.
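
As a sketch of the sentiment-analysis use case, the snippet below trains a simple classifier on a file of tweet/label pairs. The file name, column names and model choice (TF-IDF features with logistic regression via scikit-learn) are assumptions for illustration, not details taken from the abstract.

```python
# Minimal sketch: training a sentiment classifier on an annotated tweet corpus.
# Assumes a TSV file with "text" and "label" columns; the file name and
# column names are hypothetical, not from the paper.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("ltec_sentiment.tsv", sep="\t")  # hypothetical path
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42)

# Character n-grams are robust to the creative spelling common in tweets.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5)),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```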

2018 ◽  
Vol 2 (4) ◽  
pp. 140 ◽  
Author(s):  
Ramadhana Rosyadi ◽  
Said Al-Faraby ◽  
Adiwijaya Adiwijaya

Islam recognises 25 prophets as guides for human life, and documents record the stories of their lives. This study aims to build a more specific question-and-answer system that returns relevant answers rather than whole documents. A question-answering system overcomes a limitation of information-retrieval systems, which respond to a query with documents that may merely contain the answer instead of returning the answer itself. This study uses a pattern-based method that extracts, as answers, the sentence fragments matching hand-crafted patterns. The choice of dataset limits the questions that can be asked to the information stored in the data itself. In addition, questions are restricted to factoid question words: who, when, where, what and how. Using the pattern-based method, the question-answering system achieves an accuracy of 39.36%.
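
A minimal sketch of pattern-based factoid answer extraction is shown below, assuming hand-crafted regular expressions keyed on the question word; the patterns and the example passage are illustrative, not taken from the paper's dataset.

```python
# Pattern-based factoid QA: the first capture group of a matching
# pattern is returned as the answer. Patterns here are illustrative.
import re
from typing import Optional

# Each question word maps to a list of answer-extraction patterns.
PATTERNS = {
    "who":   [re.compile(r"Prophet (\w+)")],
    "where": [re.compile(r"in (?:the city of )?(\w+)")],
    "when":  [re.compile(r"(?:in|around) the year (\w+)")],
}

def answer(question: str, passage: str) -> Optional[str]:
    """Return the first pattern match for the question's question word."""
    qword = question.strip().lower().split()[0]
    for pattern in PATTERNS.get(qword, []):
        match = pattern.search(passage)
        if match:
            return match.group(1)
    return None

passage = "Prophet Yusuf was taken to Egypt, in the city of Memphis."
print(answer("Who was taken to Egypt?", passage))   # Yusuf
print(answer("Where was Yusuf taken?", passage))    # Memphis
```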


2021 ◽  
Vol 3 (2) ◽  
pp. 299-317
Author(s):  
Patrick Schrempf ◽  
Hannah Watson ◽  
Eunsoo Park ◽  
Maciej Pajak ◽  
Hamish MacKinnon ◽  
...  

Training medical image analysis models traditionally requires large amounts of expertly annotated imaging data, which is time-consuming and expensive to obtain. One solution is to automatically extract scan-level labels from radiology reports. Previously, we showed that, by extending BERT with a per-label attention mechanism, we can train a single model to perform automatic extraction of many labels in parallel. However, if we rely on pure data-driven learning, the model sometimes fails to learn critical features or learns the correct answer via simplistic heuristics (e.g., that “likely” indicates positivity), and thus fails to generalise to rarer cases which have not been learned or where the heuristics break down (e.g., “likely represents prominent VR space or lacunar infarct”, which indicates uncertainty over two differential diagnoses). In this work, we propose template creation for data synthesis, which enables us to inject expert knowledge about unseen entities from medical ontologies, and to teach the model rules on how to label difficult cases, by producing relevant training examples. Using this technique alongside domain-specific pre-training for our underlying BERT architecture (i.e., PubMedBERT), we improve F1 micro from 0.903 to 0.939 and F1 macro from 0.512 to 0.737 on an independent test set for 33 labels in head CT reports for stroke patients. Our methodology offers a practical way to combine domain knowledge with machine learning for text classification tasks.
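
As a sketch of the template idea, the snippet below generates synthetic (sentence, label) training pairs by filling report-style templates with entities that could be drawn from a medical ontology. The templates, entity list and label names are illustrative assumptions, not the paper's actual rules.

```python
# Template-based data synthesis: each template pairs a report-style
# sentence with the label rule it is meant to teach the model.
import random

TEMPLATES = [
    ("There is {entity}.", "positive"),
    ("Likely represents {entity} or {entity2}.", "uncertain"),  # differential
    ("No evidence of {entity}.", "negative"),
]

# Entities could come from a medical ontology, covering rare findings
# absent from the real reports; these examples are illustrative.
ENTITIES = ["a lacunar infarct", "a prominent VR space", "a subdural haematoma"]

def synthesise(n: int = 10, seed: int = 0) -> list[tuple[str, str]]:
    """Produce n synthetic (sentence, label) training examples."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        template, label = rng.choice(TEMPLATES)
        e1, e2 = rng.sample(ENTITIES, 2)
        # str.format ignores placeholders a template does not use.
        examples.append((template.format(entity=e1, entity2=e2), label))
    return examples

for sentence, label in synthesise(5):
    print(f"{label:9s} {sentence}")
```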


2021 ◽  
Author(s):  
Tiago de Melo

Online reviews are readily available on the Web and widely used for decision-making. However, only a few studies on Portuguese sentiment analysis have been reported, owing to a lack of resources such as domain-specific sentiment lexicons. In this paper, we present an effective methodology based on Bayes' theorem for building a set of lexicons, called SentiProdBR, covering 10 different product categories for the Portuguese language. Experimental results indicate that our methodology significantly outperforms several alternative approaches to building domain-specific sentiment lexicons.
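
A minimal sketch of the Bayes'-theorem scoring is given below, assuming reviews labelled positive or negative within a single product category; the toy Portuguese reviews and the add-one smoothing are illustrative assumptions, not the paper's exact formulation.

```python
# Score each word by P(pos | word) = P(word | pos) * P(pos) / P(word),
# estimated from labelled reviews with add-one smoothing.
from collections import Counter

reviews = [
    ("otimo produto, recomendo", "pos"),
    ("produto ruim, nao recomendo", "neg"),
    ("entrega rapida, otimo atendimento", "pos"),
]

word_counts = Counter()
word_pos_counts = Counter()
n_pos = sum(1 for _, label in reviews if label == "pos")
p_pos = n_pos / len(reviews)  # prior P(pos)

for text, label in reviews:
    for word in set(text.replace(",", "").split()):
        word_counts[word] += 1
        if label == "pos":
            word_pos_counts[word] += 1

lexicon = {}
for word, n_word in word_counts.items():
    p_word_given_pos = (word_pos_counts[word] + 1) / (n_pos + 2)
    p_word = (n_word + 1) / (len(reviews) + 2)
    lexicon[word] = p_word_given_pos * p_pos / p_word

# Higher scores mark words that lean positive in this product category.
for word, score in sorted(lexicon.items(), key=lambda kv: -kv[1]):
    print(f"{score:.2f}  {word}")
```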


Author(s):  
Emrah Inan ◽  
Burak Yonyul ◽  
Fatih Tekbacak

Most of the data on the web is unstructured and must be transformed into a machine-operable structure. It is therefore appropriate to convert unstructured data into structured form according to the requirements, and to store it in different data models depending on the use case. As requirements and their types multiply, no single approach performs well on all of them, so a single storage technology cannot satisfy every storage requirement. Managing stores with various types of schemas in a joint, integrated manner is referred to as 'multistore' or 'polystore' in the database literature. In this paper, the Entity Linking task is leveraged to transform texts into well-formed data, which is then managed in an integrated environment of different data models. Finally, the method is presented, and this integrated big data environment is queried and evaluated.
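
The sketch below illustrates the pipeline shape: link entity mentions in raw text to a knowledge base, then route the structured output to stores with different data models. The tiny dictionary "linker" and the in-memory stores stand in for a real entity-linking model and a polystore; all names here are illustrative assumptions.

```python
# Toy knowledge base: surface form -> canonical entity ID.
KB = {"berlin": "Q64", "germany": "Q183"}

def link_entities(text: str) -> list[dict]:
    """Return structured mention records for known surface forms."""
    records = []
    for token in text.lower().split():
        token = token.strip(".,!?")
        if token in KB:
            records.append({"mention": token, "entity_id": KB[token]})
    return records

# Stand-ins for heterogeneous stores in a polystore environment.
document_store = []  # e.g., a document database holding full records
graph_store = set()  # e.g., a graph database holding co-occurrence edges

text = "Berlin is the capital of Germany."
entities = link_entities(text)
document_store.append({"text": text, "entities": entities})
for a in entities:
    for b in entities:
        if a["entity_id"] < b["entity_id"]:
            graph_store.add((a["entity_id"], b["entity_id"]))

print(document_store)
print(graph_store)  # one co-occurrence edge between the two linked entities
```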

