Making Two Vast Historical Manuscript Collections Searchable and Extracting Meaningful Textual Features Through Large-Scale Probabilistic Indexing

Author(s):  
Alejandro Hector Toselli ◽  
Veronica Romero ◽  
Joan Andreu Sanchez ◽  
Enrique Vidal
2016 ◽  
Vol 58 (2) ◽  
Author(s):  
Lambert Schomaker

Abstract: This article gives an overview of design considerations for "Monk", a handwriting search engine based on pattern recognition and high-performance computing. To satisfy multiple and often conflicting technological requirements, the architecture relies heavily on high-performance computing, interactivity, and a POSIX file-access model for the scientific programmers. The resulting system can handle billions of image files, on the order of petabytes of storage capacity, with a single mount point. Monk has been operational since 2009.


2013 ◽  
Vol 2013 ◽  
pp. 1-11 ◽  
Author(s):  
Fengcai Qiao ◽  
Cheng Wang ◽  
Xin Zhang ◽  
Hui Wang

Near-duplicate image retrieval is a classical research problem in computer vision with many applications, such as image annotation and content-based image retrieval. On the web, near-duplication is more prevalent in queries for celebrities and historical figures, which are of particular interest to end users. Existing methods such as bag-of-visual-words (BoVW) solve this problem mainly by exploiting purely visual features. To overcome this limitation, this paper proposes a novel text-based, data-driven reranking framework that utilizes textual features and is combined with state-of-the-art BoVW schemes. Under this framework, the input of the retrieval procedure is still only a query image. To verify the proposed approach, a dataset of 2 million images of 1089 different celebrities, together with their accompanying texts, is constructed. In addition, we comprehensively analyze the different categories of near-duplication observed in the constructed dataset. Experimental results on this dataset show that the proposed framework achieves a higher mean average precision (mAP), with an average improvement of 21% over approaches based only on visual features, without notably prolonging the retrieval time.


2010 ◽  
Vol 11 (1) ◽  
pp. 11-22 ◽  
Author(s):  
Oya Y. Rieger

As we explore the evolving information landscape and institutional context of rare and manuscript collections, one of the critical matters is to consider the implications of large-scale digitization initiatives (LSDIs) for our programs. Although most LSDI efforts thus far have focused on general collections, it is inevitable that the attention will soon be turned to special collections. With the current networked information environment and increasing reliance on digital content subscriptions, rare and manuscript collections increasingly define the uniqueness and character of individual research libraries. The goal of this article is to characterize current LSDIs and discuss the potential implications for . . .


Author(s):  
Alejandro H. Toselli ◽  
Verónica Romero ◽  
Enrique Vidal ◽  
Joan Andreu Sánchez ◽  
Louise Seaward ◽  
...  

2018 ◽  
Vol 10 (2) ◽  
pp. 158-182
Author(s):  
Zalán Bodó ◽  
Eszter Szilágyi

Abstract: Music information retrieval has lately become an important field of information retrieval, because profound analysis of music pieces can yield important information: genre labels, mood prediction, and artist identification, to name just a few. The lack of large-scale music datasets containing audio features and metadata has led to the construction and publication of the Million Song Dataset (MSD) and its satellite datasets. Nonetheless, mainly because of licensing limitations, no freely available lyrics datasets have been published for research. In this paper we describe the construction of an English lyrics dataset based on the Last.fm Dataset, connected to LyricWiki’s database and MusicBrainz’s encyclopedia. To avoid copyright issues, only the URLs to the lyrics are stored in the database. To demonstrate the suitability of the compiled dataset, in the second part of the paper we present genre classification experiments with lyrics-based features, including bag-of-n-grams, as well as higher-level features such as rhyme-based and statistical text features. We obtained results similar to the experimental outcomes presented in other works, showing that more sophisticated textual features can improve genre classification performance, and indicating the superiority of the binary weighting scheme compared to tf–idf.


1999 ◽  
Vol 173 ◽  
pp. 243-248
Author(s):  
D. Kubáček ◽  
A. Galád ◽  
A. Pravda

Abstract: The unusual short-period comet 29P/Schwassmann-Wachmann 1 has inspired many observers to explain its unpredictable outbursts. In this paper, large-scale structures and features from the inner part of the coma in time periods around outbursts are studied. CCD images were taken at Whipple Observatory, Mt. Hopkins, in 1989 and at Astronomical Observatory, Modra, from 1995 to 1998. Photographic plates of the comet were taken at Harvard College Observatory, Oak Ridge, from 1974 to 1982. The latter were first digitized so that the same image-processing techniques could be applied to optimize the visibility of features in the coma during outbursts. Outbursts and coma structures show various shapes.


1994 ◽  
Vol 144 ◽  
pp. 29-33
Author(s):  
P. Ambrož

Abstract: The large-scale coronal structures observed during the sporadically visible solar eclipses were compared with the numerically extrapolated field-line structures of the coronal magnetic field. A characteristic relationship between the observed structures of coronal plasma and the magnetic field-line configurations was determined. The long-term evolution of large-scale coronal structures inferred from photospheric magnetic observations in the course of the 11- and 22-year solar cycles is described. Some known parameters, such as the source surface radius and the coronal rotation rate, are discussed and interpreted. A relation between the large-scale photospheric magnetic field evolution and the coronal structure rearrangement is demonstrated.


2000 ◽  
Vol 179 ◽  
pp. 205-208
Author(s):  
Pavel Ambrož ◽  
Alfred Schroll

Abstract: Precise measurements of the heliographic positions of solar filaments were used to determine the proper motion of the filaments on a time scale of days. The filaments tend to show a shaking or waving of their external structure, together with a general movement of the whole filament body that coincides with the transport of magnetic flux in the photosphere. The velocity scatter of individual measured points is about one order of magnitude higher than the accuracy of the measurements.

