TRAVELOGUE ENRICHING AND SCENIC SPOT OVERVIEW BASED ON TEXTUAL AND VISUAL TOPIC MODELS

Author(s):  
YANWEI PANG ◽  
XIN LU ◽  
YUAN YUAN ◽  
XUELONG LI

We consider the problem of enriching a travelogue that is associated with a small number of images (even just one) with additional web images. Images associated with a travelogue are assumed to be consistent with the content and style of its textual information. Relying on this assumption, we present a framework for travelogue enriching that exploits both textual and visual information generated by different users. The framework aims to select the most relevant images from an automatically collected candidate image set to enrich the given travelogue and to form a comprehensive overview of the scenic spot. To do this, we propose building two-layer probabilistic models, i.e. a text-layer model and image-layer models, on travelogues and images collected offline. Each topic (e.g. Sea, Mountain, Historical Sites) in the text-layer model is followed by an image-layer model with learnt sub-topics (e.g. the topic Sea has sub-topics such as beach, tree, sunrise and sunset). Based on the model, we develop strategies to enrich travelogues in the following steps: (1) remove noisy names of scenic spots from travelogues; (2) generate queries to automatically gather a candidate image set; (3) select images to enrich the travelogue; and (4) choose images to portray the visual content of a scenic spot. Experimental results on Chinese travelogues demonstrate the potential of the proposed approach for travelogue enrichment and the corresponding scenic spot illustration.

2020 ◽  
Vol 37 (4) ◽  
pp. 619-626
Author(s):  
Shizhen Bai ◽  
Fuli Han

The monitoring of tourist behaviors, coupled with the recognition of scenic spots, greatly improves the quality and safety of travel. Visual information forms the underlying features of scenic spot images, but the semantics of this information have not been satisfactorily classified or described. Based on image processing technologies, this paper presents a novel method for scenic spot retrieval and tourist behavior recognition. Firstly, the framework of scenic spot image retrieval was constructed, followed by a detailed introduction to the extraction of scale invariant feature transform (SIFT) features. SIFT feature extraction includes five steps: scale space construction, local space extreme point detection, precise positioning of key points, determination of key point size and direction, and generation of the SIFT descriptor. Next, multiple correlated images were mined for the target scenic spot image, and the feature matching method between the target image and the set of scenic spot images was introduced in detail. On this basis, a tourist behavior recognition method was designed based on temporal and spatial consistency. The proposed method proved effective in experiments. The research results provide a theoretical reference for image retrieval and behavior recognition in many other fields.
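The abstract does not say how descriptors are matched between the target image and the scenic spot image set. As a minimal sketch, assuming the widely used nearest-neighbour search with a distance-ratio test (an assumption, not the authors' confirmed method), matching could look like the following. The function names, the `ratio` value and the toy 2-D descriptors are all hypothetical; real SIFT descriptors are 128-dimensional.

```python
import math


def l2(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(query, reference, ratio=0.75):
    """Return (query_index, reference_index) pairs that pass the ratio test.

    A query descriptor is matched to its nearest reference descriptor only
    when that nearest distance is clearly smaller than the distance to the
    second-nearest one; ambiguous descriptors are dropped.
    """
    matches = []
    for qi, q in enumerate(query):
        dists = sorted((l2(q, r), ri) for ri, r in enumerate(reference))
        if len(dists) < 2:
            continue  # the ratio test needs two neighbours
        (d1, best), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches


# Toy data: the third query descriptor lies halfway between two references.
reference = [[0.0, 0.1], [10.0, 10.0], [5.0, 5.2]]
query = [[0.0, 0.0], [5.0, 5.0], [7.5, 7.6]]
matches = match_descriptors(query, reference)
```

The first two query descriptors each have one clearly closest reference and are kept, while the third is roughly equidistant from two references and is rejected as ambiguous.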


2017 ◽  
Vol 16 (1) ◽  
pp. 7515-7523
Author(s):  
Meenu Meenu ◽  
Sonika Jindal

Content Based Image Retrieval (CBIR) techniques are becoming an essential requirement in multimedia systems, given the widespread use of the internet, the declining cost of storage devices and the exponential growth of un-annotated digital image information in recent years. Multi-query systems have therefore been used rather than a single query, in order to bridge the semantic gap and to understand the user’s requirements. Moreover, a query replacement algorithm has been used in previous work, in which the user provides multiple images to the query image set, referred to as representative images. Feature vectors are extracted for each image in the representative image set and for every image in the database. The centroid, Crep, of the representative images is obtained by computing the mean of their feature vectors. Then every image in the representative image set is replaced, one by one, with the same candidate image from the dataset, and a new centroid is calculated for every replacement. The distance between each of the centroids resulting from the replacement and the representative image centroid Crep is calculated using Euclidean distance. The cumulative sum of these distances determines the similarity of the candidate image to the representative image set and is used for ranking the images. The smaller the distance, the more similar the image is to the representative image set. However, this approach has shortcomings: extracting features from every image in the database and comparing the query images against all database images takes a lot of time, and both complexity and cost increase. In our proposed work, the KNN algorithm is therefore applied to classify the images in the database image set using the query images, and the candidate images are reduced to the images returned by the classification step, which decreases the execution time and reduces the number of iterations.
Hence, owing to the hybrid model of multi-query and KNN, the effectiveness of image retrieval in the CBIR system increases. The language used in this work is C/C++ with the OpenCV libraries, and the IDE is Visual Studio 2015. The experimental results show that our method improves the performance of image retrieval.
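The query replacement ranking described above can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' C/C++ implementation: the function names and the toy 2-D feature vectors are hypothetical, and the KNN pre-filtering step is omitted.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def replacement_score(representatives, candidate):
    """Cumulative distance between Crep and each replaced-set centroid."""
    c_rep = centroid(representatives)
    total = 0.0
    for i in range(len(representatives)):
        # Swap the i-th representative image for the candidate image.
        replaced = representatives[:i] + [candidate] + representatives[i + 1:]
        total += euclidean(centroid(replaced), c_rep)
    return total


def rank_candidates(representatives, candidates):
    """Candidate names ordered from most to least similar (smallest score first)."""
    scores = {name: replacement_score(representatives, vec)
              for name, vec in candidates.items()}
    return sorted(scores, key=scores.get)


# Toy feature vectors, invented purely for illustration.
representatives = [[0.0, 0.0], [2.0, 0.0]]
candidates = {"near": [1.0, 0.0], "far": [10.0, 10.0]}
ranking = rank_candidates(representatives, candidates)
```

Every candidate is scored against every representative image, which is the cost the abstract describes; restricting `candidates` to the images returned by a KNN classifier would shrink that loop.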


2019 ◽  
Vol 64 (3) ◽  
pp. 285-295
Author(s):  
T. Rajalakshmi ◽  
Shanthi Prince

The physiological modeling of retinal layers, which provides insight into how the incoming image is converted into an equivalent spike train that can be decoded by the human brain, is a key issue. Most retinal layer models concentrate mainly on image compression, edge detection and image reconstruction. A retinal layer model that generates a spike waveform corresponding to the visual information is not covered much in the literature. The aim of this study was to develop a mathematical model of the retinal layers, which have complex neural structures, that can detect the incoming signal and transform it into the equivalent spike train. The proposed retinal layer model includes a photoreceptor layer, an outer plexiform layer (OPL), an inner plexiform layer (IPL) and a ganglion cell layer, exhibiting the properties of compression, luminance and spatiotemporal filtering in the processing of visual information. The photoreceptor layer enhances contrast visibility in dark regions and maintains it in bright regions. The OPL is modeled to enhance the contours of the image. The finer detail of the image is extracted by mathematically modeling the IPL. The ganglion cell layer is modeled using the Hodgkin-Huxley model to generate the spike train for the incoming information. The spike train was generated for individuals with color vision deficiencies, namely protanopia, deuteranopia and tritanopia, and for individuals suffering from night blindness. Simulation results showed that a spike train was generated only above a certain threshold stimulus value. The differences in spike pattern between normal and visually impaired individuals were studied. This may lead to a methodology for earlier diagnosis.
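The threshold behaviour reported for the ganglion cell layer can be illustrated with a generic Hodgkin-Huxley neuron. The sketch below is not the authors' retinal model: it uses the standard textbook squid-axon parameters, forward-Euler integration and a constant input current, all of which are assumptions made here for illustration.

```python
import math

# Textbook Hodgkin-Huxley constants (assumed; not taken from the paper).
C_M = 1.0                                # membrane capacitance, uF/cm^2
G_NA, G_K, G_L = 120.0, 36.0, 0.3        # maximal conductances, mS/cm^2
E_NA, E_K, E_L = 50.0, -77.0, -54.387    # reversal potentials, mV


def _ratio(x, scale):
    """x / (1 - exp(-x / scale)), with the removable singularity at x = 0."""
    if abs(x) < 1e-7:
        return scale
    return x / (1.0 - math.exp(-x / scale))


def rates(v):
    """Opening (a) and closing (b) rates of the m, h and n gates at voltage v."""
    a_m = 0.1 * _ratio(v + 40.0, 10.0)
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    a_n = 0.01 * _ratio(v + 55.0, 10.0)
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    return a_m, b_m, a_h, b_h, a_n, b_n


def count_spikes(i_ext, t_max=50.0, dt=0.01):
    """Count upward crossings of 0 mV for a constant current i_ext (uA/cm^2)."""
    v = -65.0
    a_m, b_m, a_h, b_h, a_n, b_n = rates(v)
    # Start every gate at its steady-state value for the resting potential.
    m, h, n = a_m / (a_m + b_m), a_h / (a_h + b_h), a_n / (a_n + b_n)
    spikes, above = 0, False
    for _ in range(int(t_max / dt)):
        a_m, b_m, a_h, b_h, a_n, b_n = rates(v)
        i_ion = (G_NA * m ** 3 * h * (v - E_NA)
                 + G_K * n ** 4 * (v - E_K)
                 + G_L * (v - E_L))
        v_next = v + dt * (i_ext - i_ion) / C_M
        m += dt * (a_m * (1.0 - m) - b_m * m)
        h += dt * (a_h * (1.0 - h) - b_h * h)
        n += dt * (a_n * (1.0 - n) - b_n * n)
        v = v_next
        if v > 0.0 and not above:
            spikes += 1
        above = v > 0.0
    return spikes
```

With no input current the membrane stays near rest and `count_spikes(0.0)` reports no spikes, whereas a sufficiently strong current such as `count_spikes(10.0)` produces a spike train, mirroring the threshold effect the abstract describes.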


Methodology ◽  
2011 ◽  
Vol 7 (2) ◽  
pp. 63-67 ◽  
Author(s):  
Ali Ünlü

Schrepp (2005) points out and builds upon the connection between knowledge space theory (KST) and latent class analysis (LCA) to propose a method for constructing knowledge structures from data. Candidate knowledge structures are generated, treated as restricted latent class models and fitted to the data, and the BIC is used to choose among them. This article provides additional information about the relationship between KST and LCA. It gives a more comprehensive overview of the literature and the probabilistic models that are at the interface of KST and LCA. KST and LCA are also compared with regard to the parameter estimation and model testing methodologies applied in their fields. This article concludes with an overview of KST-related publications addressing the outlined connection and presents further remarks about possible future research arising from a connection of KST to other latent variable modeling approaches.
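The BIC-based selection step mentioned above can be sketched as follows, using the standard definition BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the sample size and L the maximized likelihood. The candidate names, log-likelihoods and parameter counts below are hypothetical placeholders, not results from Schrepp (2005) or from this article.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate a better trade-off."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_by_bic(candidates, n_obs):
    """Return the name of the candidate model with the smallest BIC.

    `candidates` maps a model name to (maximized log-likelihood, free parameters).
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_obs))


# Hypothetical fitted candidate knowledge structures (made-up numbers).
candidates = {
    "structure_A": (-520.0, 5),
    "structure_B": (-505.0, 9),
    "structure_C": (-503.0, 15),
}
best = select_by_bic(candidates, n_obs=200)
```

In this made-up example the richest structure fits only slightly better than the middle one, so the penalty term k ln(n) outweighs its gain in likelihood.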


2012 ◽  
Vol 3 (3) ◽  
pp. 26-41
Author(s):  
Tessai Hayama ◽  
Susumu Kunifuji

Including visual information in slides is one of the important factors in enabling the audience to easily understand the content of a slide presentation. However, it’s difficult for an inexperienced preparer to improve the presentation of slides without adequate experience and awareness of the requirements of a good slide. The authors developed Presentation Gadgets, which consists of modules that support the creation of slides that represent information in an audience-friendly manner. Presentation Gadgets provides information on web-based slides, web pages, web news, and web images, thus supporting the replacement of content on slides with visual representations and illustrations of key points, as well as the insertion of common topics; it also helps identify slides that need improvement. To construct these modules, the authors developed a novel slide search method and slide ranking method. To allow easy access to information, Presentation Gadgets has a user interface that automatically creates queries from the selected slide text or from speech text recognized by a speech recognition engine. By using Presentation Gadgets, an inexperienced slide preparer can effectively create slides that are easily understandable because he/she can interactively refer to various information resources relating to the slide content based on the situation.


2014 ◽  
Vol 926-930 ◽  
pp. 3350-3353
Author(s):  
Xin Ning ◽  
Wei Jun Li ◽  
Wen Jie Liu

This paper proposes a multi-feature text localization method based on a cascade classifier for a variety of web images. Specifically, the original image is first divided into sub-images at different scales, each of which is preprocessed to form more satisfactory edge image blocks. The classifier then determines whether each candidate image block contains a text area according to the edge connectivity, stroke density and text arrangement characteristics of text areas. Finally, the localization results of the sub-images at different scales are merged to obtain the final result. The experiments show that this localization method has relatively high precision and recall and strong robustness, which makes it suitable for a variety of web images.


2021 ◽  
Author(s):  
Yuanhong Ma ◽  
Shao-Jie Lou ◽  
Zhaomin Hou

This review article provides a comprehensive overview to recognise the current status of electron-deficient boron-based catalysis in C–H functionalisations.


Crisis ◽  
2011 ◽  
Vol 32 (4) ◽  
pp. 178-185 ◽  
Author(s):  
Maurizio Pompili ◽  
Marco Innamorati ◽  
Monica Vichi ◽  
Maria Masocco ◽  
Nicola Vanacore ◽  
...  

Background: Suicide is a major cause of premature death in Italy and occurs at different rates in the various regions. Aims: The aim of the present study was to provide a comprehensive overview of suicide in the Italian population aged 15 years and older for the years 1980–2006. Methods: Mortality data were extracted from the Italian Mortality Database. Results: Mortality rates for suicide in Italy reached a peak in 1985 and declined thereafter. The different patterns observed by age and sex indicated that the decrease in the suicide rate in Italy was initially the result of declining rates in those aged 45+ while, from 1997 on, the decrease was attributable principally to a reduction in suicide rates among the younger age groups. It was found that socioeconomic factors underlined major differences in the suicide rate across regions. Conclusions: The present study confirmed that suicide is a multifaceted phenomenon that may be determined by an array of factors. Suicide prevention should, therefore, be targeted to identifiable high-risk sociocultural groups in each country.


2009 ◽  
Vol 23 (2) ◽  
pp. 63-76 ◽  
Author(s):  
Silke Paulmann ◽  
Sarah Jessen ◽  
Sonja A. Kotz

The multimodal nature of human communication has been well established. Yet few empirical studies have systematically examined the widely held belief that this form of perception is facilitated in comparison to unimodal or bimodal perception. In the current experiment we first explored the processing of unimodally presented facial expressions. Furthermore, auditory (prosodic and/or lexical-semantic) information was presented together with the visual information to investigate the processing of bimodal (facial and prosodic cues) and multimodal (facial, lexic, and prosodic cues) human communication. Participants engaged in an identity identification task, while event-related potentials (ERPs) were being recorded to examine early processing mechanisms as reflected in the P200 and N300 component. While the former component has repeatedly been linked to physical property stimulus processing, the latter has been linked to more evaluative “meaning-related” processing. A direct relationship between P200 and N300 amplitude and the number of information channels present was found. The multimodal-channel condition elicited the smallest amplitude in the P200 and N300 components, followed by an increased amplitude in each component for the bimodal-channel condition. The largest amplitude was observed for the unimodal condition. These data suggest that multimodal information induces clear facilitation in comparison to unimodal or bimodal information. The advantage of multimodal perception as reflected in the P200 and N300 components may thus reflect one of the mechanisms allowing for fast and accurate information processing in human communication.

