News Across Languages - Cross-Lingual Document Similarity and Event Tracking (Extended Abstract)

Author(s):  
Jan Rupnik ◽  
Andrej Muhič ◽  
Gregor Leban ◽  
Blaž Fortuna ◽  
Marko Grobelnik

In today's world, we follow news which is distributed globally. Significant events are reported by different sources and in different languages. In this work, we address the problem of tracking events in a large multilingual stream. Within the recently developed Event Registry system, we examine two aspects of this problem: how to compare articles in different languages and how to link collections of articles in different languages which refer to the same event. Building on previous work, we show there are methods which scale well and can compute a meaningful similarity between articles from languages with little or no direct overlap in the training data. Using this capability, we then propose an approach to link clusters of articles across languages which represent the same event.


2016 ◽  
Vol 55 ◽  
pp. 283-316 ◽  
Author(s):  
Jan Rupnik ◽  
Andrej Muhic ◽  
Gregor Leban ◽  
Primoz Skraba ◽  
Blaz Fortuna ◽  
...  

In today's world, we follow news which is distributed globally. Significant events are reported by different sources and in different languages. In this work, we address the problem of tracking events in a large multilingual stream. Within the recently developed Event Registry system, we examine two aspects of this problem: how to compare articles in different languages and how to link collections of articles in different languages which refer to the same event. Taking a multilingual stream and clusters of articles from each language, we compare different cross-lingual document similarity measures based on Wikipedia. This allows us to compute the similarity of any two articles regardless of language. Building on previous work, we show there are methods which scale well and can compute a meaningful similarity between articles from languages with little or no direct overlap in the training data. Using this capability, we then propose an approach to link clusters of articles across languages which represent the same event. We provide an extensive evaluation of the system as a whole, as well as an evaluation of the quality and robustness of the similarity measure and the linking algorithm.
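As a rough illustration of what a Wikipedia-based cross-lingual similarity can look like, the sketch below maps documents from two languages into a shared concept space and compares them with cosine similarity. It is a toy, concept-mapping illustration under stated assumptions and not the measures evaluated in the article: the term-to-concept tables, the concept IDs, and the function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy tables: each term maps to weights over shared concept IDs.
# In a Wikipedia-based setup, a concept could be an article that exists in
# both language editions and is connected by interlanguage links.
EN_TERM_CONCEPTS = {
    "earthquake": {"C_quake": 1.0},
    "election": {"C_vote": 1.0},
    "rescue": {"C_quake": 0.5, "C_aid": 0.5},
}
DE_TERM_CONCEPTS = {
    "erdbeben": {"C_quake": 1.0},
    "wahl": {"C_vote": 1.0},
    "hilfe": {"C_aid": 1.0},
}


def concept_vector(text, term_concepts):
    """Project a document onto the shared concept space."""
    vec = Counter()
    for token in text.lower().split():
        for concept, weight in term_concepts.get(token, {}).items():
            vec[concept] += weight
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cross_lingual_similarity(text_a, map_a, text_b, map_b):
    """Compare two documents in different languages via shared concepts."""
    return cosine(concept_vector(text_a, map_a), concept_vector(text_b, map_b))


sim_related = cross_lingual_similarity(
    "earthquake rescue", EN_TERM_CONCEPTS, "erdbeben hilfe", DE_TERM_CONCEPTS
)
sim_unrelated = cross_lingual_similarity(
    "election", EN_TERM_CONCEPTS, "erdbeben", DE_TERM_CONCEPTS
)
```

Because both documents end up in the same concept space, no direct dictionary between the two languages is needed; only a mapping from each language to the shared concepts.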





2019 ◽  
Vol 70 (1) ◽  
pp. 58-104
Author(s):  
Philipp Dankel ◽  
Ignacio Satti

This article focuses on the practice of listing in Talk-in-Interaction. Lists are frequently used in spoken language as a discursive resource and can be considered as a universal, cross-lingual practice for structuring ideas. As such, they have been given attention in several fields of linguistics, mainly in intonation research, conversation analysis and interactional linguistics. However, the role of gestures and other physical forms of expression in listing has been mostly disregarded so far. For this reason, we attempt to cast light on the form and function of gestures and other bodily resources that are embedded in this practice. We argue that lists are multimodal and that bodily resources play a major role in establishing the format and in organizing the interaction. In order to do so, we use a broad collection of examples from different sources in French, Italian and Spanish.



2007 ◽  
Vol 30 (1) ◽  
pp. 135-162 ◽  
Author(s):  
Ralf Steinberger ◽  
Bruno Pouliquen

Named Entity Recognition and Classification (NERC) is a known and well-explored text analysis application that has been applied to various languages. We are presenting an automatic, highly multilingual news analysis system that fully integrates NERC for locations, persons and organisations with document clustering, multi-label categorisation, name attribute extraction, name variant merging and the calculation of social networks. The proposed application goes beyond the state-of-the-art by automatically merging the information found in news written in ten different languages, and by using the aggregated name information to automatically link related news documents across languages for all 45 language pair combinations. While state-of-the-art approaches for cross-lingual name variant merging and document similarity calculation require bilingual resources, the methods proposed here are mostly language-independent and require a minimal amount of monolingual language-specific effort. The development of resources for additional languages is therefore kept to a minimum and new languages can be plugged into the system effortlessly. The presented online news analysis application is fully functional and has, at the end of the year 2006, reached average usage statistics of 600,000 hits per day.



2021 ◽  
Author(s):  
Haoyang Wen ◽  
Ying Lin ◽  
Tuan Lai ◽  
Xiaoman Pan ◽  
Sha Li ◽  
...  


2016 ◽  
Vol 22 (4) ◽  
pp. 627-653 ◽  
Author(s):  
RAZIEH RAHIMI ◽  
AZADEH SHAKERY ◽  
JAVID DADASHKARIMI ◽  
MOZHDEH ARIANNEZHAD ◽  
MOSTAFA DEHGHANI ◽  
...  

Comparable corpora are key translation resources for both languages and domains with limited linguistic resources. The existing approaches for building comparable corpora are mostly based on ranking candidate documents in the target language for each source document using a cross-lingual retrieval model. These approaches also exploit other evidence of document similarity, such as proper names and publication dates, to build more reliable alignments. However, the importance of each piece of evidence in the scores of candidate target documents is determined heuristically. In this paper, we employ a learning to rank method for ranking candidate target documents with respect to each source document. The ranking model is constructed by defining each piece of evidence for the similarity of bilingual documents as a feature whose weight is learned automatically. Learning feature weights can significantly improve the quality of alignments, because the reliability of features depends on the characteristics of both the source and target languages of a comparable corpus. We also propose a method to generate appropriate training data for the task of building comparable corpora. We employed the proposed learning-based approach to build a multi-domain English–Persian comparable corpus which covers twelve different domains obtained from the Open Directory Project. Experimental results show that the created alignments have high degrees of comparability. Comparison with existing approaches for building comparable corpora shows that our learning-based approach improves both the quality and coverage of alignments.
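To make the idea of learned feature weights concrete, here is a minimal sketch of a pairwise ranker trained with a perceptron-style update. The abstract does not name the learning to rank method used, so this is a generic stand-in and not the authors' model; the three features (retrieval score, proper-name overlap, publication-date closeness), their values, and all function names are assumptions made for illustration.

```python
def score(weights, features):
    """Linear score of one candidate target document."""
    return sum(w * f for w, f in zip(weights, features))


def train_pairwise(pairs, n_features, epochs=20, lr=0.1):
    """Learn weights so that the better candidate of each pair scores higher.

    pairs: list of (better_features, worse_features) tuples.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            # Update only when the pair is ranked in the wrong order (or tied).
            if score(weights, better) <= score(weights, worse):
                weights = [
                    w + lr * (b - c) for w, b, c in zip(weights, better, worse)
                ]
    return weights


def rank(weights, candidates):
    """Return candidate names sorted from best to worst score."""
    return sorted(
        candidates, key=lambda name: score(weights, candidates[name]), reverse=True
    )


# Hypothetical features: [retrieval_score, name_overlap, date_closeness].
training_pairs = [
    ([0.4, 0.9, 0.8], [0.7, 0.1, 0.2]),
    ([0.5, 0.8, 0.9], [0.6, 0.2, 0.1]),
]
weights = train_pairwise(training_pairs, n_features=3)

candidates = {
    "doc_a": [0.9, 0.0, 0.1],
    "doc_b": [0.3, 0.9, 0.9],
}
ranking = rank(weights, candidates)
```

In this toy data the well-aligned candidates have high name overlap and close dates, so the learned weights favour those features over the raw retrieval score instead of relying on a hand-set combination.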



2021 ◽  
Vol XII (2) ◽  
pp. 267-279
Author(s):  
Jaume García Rosselló ◽  

This article considers the social and technological dynamics detected in the transition from hand-made pottery to wheel-thrown ware in a modern context. The many different sources, supplemented by fieldwork, provide a long-term perspective and a depiction of its present consequences. Specifically, it explains how an indigenous, hand-made, domestic and female pottery-production system has turned into an essentially male, wheel-thrown workshop activity. After a series of significant events, the Indian village of Pomaire gained a reputation as a potter’s village. The several changes undergone by its population with regard to pottery production make it an interesting example for analysing the origin and development of a process of technological change which ended with the displacement of women from pottery-making and the introduction of the means for mechanised production during the 1980s. Thus, the social and technical transformations which have taken place among the potters of Pomaire since colonial times (beginning of the 16th century) are explained, and their history is expanded upon in order to contribute to a general reflection.


