Collection of Meta Information with User-Generated Question Answer Pairs and its Reflection for Improving Expressibility in Response Generation

2021 ◽  
Vol 28 (1) ◽  
pp. 136-159
Author(s):  
Takashi Kodama ◽  
Ryuichiro Higashinaka ◽  
Koh Mitsuda ◽  
Ryo Masumura ◽  
Yushi Aono ◽  
...  
2007 ◽  
Author(s):  
Jonathan Pfautz ◽  
Emilie Roth ◽  
Ann Bisantz ◽  
Cullen Jackson ◽  
Gina Thomas ◽  
...  

2014 ◽  
Vol 596 ◽  
pp. 292-296
Author(s):  
Xin Li Li

PageRank algorithms consider only hyperlink information and ignore other page information such as hit frequency, update time, and category. As a result, they rank many advertising pages and outdated pages highly and fail to meet users' needs. This paper studies page meta-information, namely category, hit frequency, and update time. A web page with a high hit frequency and a lower age should receive a high rank, although both factors depend to some extent on the page category. Experimental results show that the algorithm performs well.
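The abstract does not give the paper's formula, so the following is only a minimal sketch of the general idea: compute a plain link-based PageRank, then scale it by hit frequency and by an age decay whose speed depends on the page category. The multiplicative form, the function names, and the `half_life_days` parameter are assumptions made for illustration, not the authors' method.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a dict {page: [outlinks]}.

    Every link target is assumed to also appear as a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


def meta_adjusted_rank(rank, hits, age_days, category, half_life_days):
    """Scale link-based ranks by hit share and a category-dependent age decay.

    Assumed (hypothetical) weighting: more hits raise the score, and older
    pages decay with a half-life chosen per category.
    """
    total_hits = sum(hits.values()) or 1
    adjusted = {}
    for page, r in rank.items():
        hit_factor = 1.0 + hits[page] / total_hits
        half_life = half_life_days[category[page]]
        freshness = 0.5 ** (age_days[page] / half_life)
        adjusted[page] = r * hit_factor * freshness
    total = sum(adjusted.values())
    return {p: v / total for p, v in adjusted.items()}
```

With two pages that link only to each other, the link-based ranks are equal, so any difference in the adjusted ranks comes entirely from the meta-information.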


2017 ◽  
Vol 4 (1) ◽  
pp. 95-110 ◽  
Author(s):  
Deepika Punj ◽  
Ashutosh Dixit

Crawlers play a significant role in managing the vast amount of information available on the web. A crawler's operation should be optimized to retrieve as much unique information as possible from the World Wide Web. This paper proposes an architecture for a migrating crawler based on URL ordering, URL scheduling, and a document redundancy elimination mechanism. The proposed ordering technique is based on URL structure, which plays a crucial role in using the web efficiently. Scheduling ensures that each URL goes to the optimum agent for downloading; to this end, the characteristics of both agents and URLs are taken into consideration. Duplicate documents are also removed so that the database contains only unique documents. To reduce matching time, documents are matched on the basis of their meta information only. By ordering and scheduling URLs, the agents of the proposed migrating crawler work more efficiently than a traditional single crawler.
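Matching documents on meta information alone can be sketched as follows. The abstract does not say which meta fields are compared, so the fields used here (`title`, `description`, `content_length`) and both function names are hypothetical; this is an illustration of the idea, not the paper's mechanism.

```python
import hashlib


def meta_fingerprint(meta):
    """Hash a few normalized meta fields instead of the full document body.

    The chosen fields are assumptions; a real crawler would use whatever
    meta information it actually stores.
    """
    fields = ("title", "description", "content_length")
    normalized = "|".join(str(meta.get(f, "")).strip().lower() for f in fields)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def drop_duplicates(documents):
    """Keep the first document seen for each meta fingerprint."""
    seen = set()
    unique = []
    for doc in documents:
        fingerprint = meta_fingerprint(doc["meta"])
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(doc)
    return unique
```

Comparing short fingerprints avoids comparing document bodies, which is where the reduction in matching time would come from. The trade-off is that two different documents with identical meta fields would be treated as duplicates.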


2018 ◽  
Vol 95 ◽  
pp. 90-98 ◽  
Author(s):  
Vanderson Dill ◽  
Pedro Costa Klein ◽  
Alexandre Rosa Franco ◽  
Márcio Sarroglia Pinho

Author(s):  
Elvys Linhares Pontes ◽  
Luis Adrián Cabrera-Diego ◽  
Jose G. Moreno ◽  
Emanuela Boros ◽  
Ahmed Hamdi ◽  
...  

Abstract. Digital libraries have a key role in cultural heritage as they provide access to our culture and history by indexing books and historical documents (newspapers and letters). Digital libraries use natural language processing (NLP) tools to process these documents and enrich them with meta-information, such as named entities. Despite recent advances in these NLP models, most of them are built for specific languages and contemporary documents that are not optimized for handling historical material that may for instance contain language variations and optical character recognition (OCR) errors. In this work, we focused on the entity linking (EL) task that is fundamental to the indexation of documents in digital libraries. We developed a Multilingual Entity Linking architecture for HIstorical preSS Articles that is composed of multilingual analysis, OCR correction, and filter analysis to alleviate the impact of historical documents in the EL task. The source code is publicly available. Experimentation has been done over two historical documents covering five European languages (English, Finnish, French, German, and Swedish). Results have shown that our system improved the global performance for all languages and datasets by achieving an F-score@1 of up to 0.681 and an F-score@5 of up to 0.787.


2015 ◽  
Vol 20 (6) ◽  
pp. 848-861 ◽  
Author(s):  
Jongbin Park ◽  
Han-Duck Lee ◽  
Kyung-Won Kim ◽  
Jong-Jin Jung ◽  
Tae-Beom Lim

2015 ◽  
Author(s):  
Roberto Maffei ◽  
Livia S Convertini ◽  
Sabrina Quatraro ◽  
Stefania Ressa ◽  
Annalisa Velasco

Background. Interpretation is the process through which humans attribute meaning to every input they perceive from their natural or social environment. The formulation and exchange of meanings through natural language are basic aspects of human behaviour and important subjects in neuroscience, and they have long been the object of dedicated scientific research. Two main theoretical positions (cognitivism and embodied cognition) currently confront each other; however, the available data are not conclusive, and scientific knowledge of the interpretation process is still unsatisfactory. Our work proposes some contributions aimed at improving it. Methodology. Our field research involved a random sample of 102 adults. We presented them with a realistic case of written communication using unabridged message texts. We collected data (participants' written accounts of their interpretations) under controlled conditions through a specially designed questionnaire (closed and open-ended answers). Finally, we carried out qualitative and quantitative analyses using some fundamental statistics. Principal Findings. While readers are expected to concentrate on the text’s content, they instead report focusing on the most varied and unpredictable components: certain physical features of the message (e.g. the message’s period lengths) as well as meta-information such as the position of a statement or even the lack of some content. Only about 12% of the participants' indications point directly at the text's content. Our data converge on the hypothesis that the components of a message work at first like physical stimuli, causing automatic (body-level) reactions in readers that are independent of the conscious attribution of meaning. Interpretation would thus be a (learned) stimulus-reaction mechanism before switching to information processing, and the basis of meaning could be perceptual/analogical before propositional/digital.
We carried out a first check of our hypothesis: the case employed included an emerging conflict and two versions (“H” and “S”, same content, different forms) of a reply to be sent at a crucial point. We collected the participants’ (independent) interpretations of the two versions; then, we asked them to choose which one could solve the conflict; finally, we assessed the coherence between interpretations and choice on a 4-level scale. The analysis of the distribution of coherence levels showed that, relative to our expectations, incoherence levels are over-represented; this imbalance is entirely ascribable to “H” choosers. “H” and “S” choosers show significant differences (p<<0.01) in the distributions of coherence levels, which is inconsistent with the traditional hypothesis of linear information processing resulting in the final choice. Finally, with respect to the currently opposing theories, we found that our hypothesis has both important convergences and at least one critical divergence, together with the capacity to encompass them both.


Author(s):  
Włodzimierz Lewoniewski ◽  
Krzysztof Węcel ◽  
Witold Abramowicz

One of the most important factors affecting the quality of content in Wikipedia is the presence of credible sources. By following references, readers can verify facts or find more details about the topic described. A Wikipedia article can be edited independently in any of over 300 languages, even by anonymous users, so information about the same topic may be inconsistent. This also applies to the use of references in different language versions of a particular article, so the same statement can have different sources. In this paper we analyzed over 40 million articles from the 55 most developed language versions of Wikipedia to extract information about nearly 200 million references and find the most popular and reliable sources. We presented 10 models for assessing the popularity and reliability of sources, based on analysis of meta information about the references in Wikipedia articles, page views, and the authors of the articles. Using DBpedia and Wikidata, we automatically identified the alignment of the sources to a specific domain. Additionally, we analyzed changes in popularity and reliability over time and identified growth leaders in each considered month. The results can be used to improve the quality of content in different language versions of Wikipedia.
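The abstract does not describe the ten models, so the sketch below shows only the simplest conceivable popularity measure: counting how many references point to each source host. It is not claimed to be one of the authors' models, and the function name and the `www.` normalization are assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def source_popularity(reference_urls):
    """Count references per source host (a naive, hypothetical popularity score)."""
    counts = Counter()
    for url in reference_urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]  # Treat www.example.org and example.org as one source.
        if host:
            counts[host] += 1
    return counts
```

A frequency count like this ignores page views and article authorship, both of which the abstract names as inputs to the actual models.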

