page segmentation
Recently Published Documents


TOTAL DOCUMENTS: 180 (FIVE YEARS 15)

H-INDEX: 22 (FIVE YEARS 2)

Mobile phones are now a routine part of daily life; we use them as cameras, radios, music players, and even web browsers. Since most web pages are designed for desktop computers, navigating them on a phone is highly fatiguing. Hence, there is great interest in computer science in adapting such content-rich pages to the small screens of mobile devices. Moreover, a web page consists of many different parts that are not equally important to the end user. Consequently, the authors propose a mechanism that identifies the part of a web page most useful to a user with respect to his or her search query while avoiding information loss. The challenge arises from the fact that long web content cannot easily be displayed in either the vertical or the horizontal direction.


Author(s):  
Johannes Kiesel ◽  
Lars Meyer ◽  
Florian Kneist ◽  
Benno Stein ◽  
Martin Potthast

2020 ◽  
Vol 9 (6) ◽  
pp. 2492-2498
Author(s):  
Wildan Budiawan Zulfikar ◽  
Mohamad Irfan ◽  
Muhammad Ghufron ◽  
Jumadi Jumadi ◽  
Esa Firmansyah

One success factor of an online affiliate is the quality of its content source. Affiliate marketplaces therefore need an objective assessment to retrieve the content data used to choose the right product in the appropriate product filter. Usually, the selection is not made with a sound, measurable system, so the choice of product content is subjective. Analysis with a sound, measurable system, by contrast, yields objective product content and can benefit users because the selection is based on factual data. The purpose of this research is to analyze the potential of the affiliate marketplace by combining cosine similarity with vision-based page segmentation. The authors present this as a new approach to optimizing the selection of the best content that meets the required criteria. The work produces a number of product recommendations that are suitable for publication and can then be compared against the required criteria. In a limited evaluation, the proposed model performed satisfactorily: all 5 tested queries returned the expected results.
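To make the cosine-similarity step in the abstract above concrete, here is a minimal sketch that scores text segments against a query. It assumes the page has already been split into text segments by some vision-based segmenter, and it uses plain term-frequency vectors; the function names `tokenize`, `cosine_similarity`, and `rank_segments` are illustrative and are not taken from the paper, whose actual weighting scheme is not described in the abstract.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-frequency vectors of two texts."""
    vec_a, vec_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    # Only terms present in both texts contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_segments(query, segments):
    """Return (score, segment) pairs, most similar segment first."""
    scored = [(cosine_similarity(query, seg), seg) for seg in segments]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical segments, standing in for the output of a page segmenter.
    segments = ["garden hose", "wireless bluetooth headphones", "contact us"]
    for score, seg in rank_segments("wireless headphones", segments):
        print(f"{score:.3f}  {seg}")
```

Raw term counts keep the sketch dependency-free; a production system would more likely use TF-IDF weights so that common words do not dominate the score.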


Author(s):  
Johannes Kiesel ◽  
Florian Kneist ◽  
Lars Meyer ◽  
Kristof Komlossy ◽  
Benno Stein ◽  
...  
Keyword(s):  
Web Page ◽  

2020 ◽  
Vol 10 (16) ◽  
pp. 5430 ◽  
Author(s):  
Yekta Said Can ◽  
M. Erdem Kabadayı

Historical manuscripts and archival documentation are handwritten texts that serve as backbone sources for historical inquiry. Recent developments in the digital humanities and the need to extract information from historical documents have accelerated digitization. Cutting-edge machine learning methods are applied to extract meaning from these documents: page segmentation (layout analysis); keyword, number, and symbol spotting; and handwritten text recognition algorithms have been tested on historical documents. For most languages, these techniques are widely studied and high-performance methods have been developed. However, the properties of Arabic scripts (i.e., diacritics, varying script styles, and ligatures) create additional problems for these algorithms, and the amount of research is therefore limited. In this research, we first automatically spotted and then recognized the Arabic numerals in the very first series of population registers of the Ottoman Empire, conducted in the mid-nineteenth century. These numerals are important because they hold information about the number of households, the registered individuals, and the ages of individuals. Taking advantage of the structure of the studied registers, in which numerals are written in red, we applied a red color filter to separate the numerals from the document. We first used a CNN-based segmentation method to spot these numerals. In the second part, we annotated a local Arabic handwritten digit dataset from the spotted numerals by selecting single-digit ones and tested deep transfer learning from large open Arabic handwritten digit datasets for digit recognition. We achieved promising results for recognizing digits in these historical documents.
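As an illustration of the red color filter mentioned in the abstract above, the following sketch marks pixels whose red channel clearly dominates the green and blue channels. The abstract does not state which color space or thresholds the authors used, so the RGB rule and the `min_red` and `margin` values here are assumptions chosen only for demonstration, and the names `is_red` and `red_mask` are hypothetical.

```python
def is_red(pixel, min_red=150, margin=50):
    """True if an (r, g, b) pixel is bright in red and red clearly dominates.

    min_red and margin are assumed thresholds, not values from the paper.
    """
    r, g, b = pixel
    return r >= min_red and r - g >= margin and r - b >= margin


def red_mask(image, min_red=150, margin=50):
    """Boolean mask with the same shape as image (rows of (r, g, b) tuples)."""
    return [[is_red(p, min_red, margin) for p in row] for row in image]


if __name__ == "__main__":
    # One-row toy image: red ink, black ink, yellowish paper.
    image = [[(200, 30, 30), (20, 20, 20), (230, 220, 200)]]
    print(red_mask(image))
```

A channel-difference rule is used instead of a plain red threshold because aged paper is often bright in all three channels; requiring red to exceed green and blue by a margin keeps the paper background out of the mask.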


2020 ◽  
Vol 6 (5) ◽  
pp. 32 ◽  
Author(s):  
Yekta Said Can ◽  
M. Erdem Kabadayı

Historical document analysis systems are gaining importance with the increasing efforts to digitize archives. Page segmentation and layout analysis are crucial steps for such systems. Errors in these steps affect the outcome of handwritten text recognition and Optical Character Recognition (OCR) methods, which increases the importance of page segmentation and layout analysis. Degradation of documents, digitization errors, and varying layout styles complicate the segmentation of historical documents. The properties of Arabic scripts, such as connected letters, ligatures, diacritics, and different writing styles, make it even more challenging to process historical documents written in Arabic script. In this study, we developed an automatic system for counting registered individuals and assigning them to populated places by using a CNN-based architecture. To evaluate the performance of our system, we created a labeled dataset of registers obtained from the first wave of population registers of the Ottoman Empire, held between the 1840s and 1860s. We achieved promising results for classifying different types of objects, counting the individuals, and assigning them to populated places.

