Mining the Web for generating thematic metadata from textual data

Author(s):  
Chien-Chung Huang ◽  
Shui-Lung Chuang ◽  
Lee-Feng Chien
2021 ◽  
Vol 15 (3) ◽  
pp. 310-317
Author(s):  
Kristijan Lukaček ◽  
Matija Mikac ◽  
Miroslav Horvatić

This paper focuses on the use of location services in mobile applications developed for reporting different location-based events. The event, intended to be generic and universal, can, as in the examples used in this paper, be the reporting of some occurrence to a city's communal affairs office. Such a generic event can include multimedia and textual data in addition to location information obtained from the mobile device running the app. The software solution described in this paper consists of a mobile application developed for the Android operating system and a web application comprising a series of PHP scripts that run on a dedicated server. The web application consists of backend scripts that facilitate communication between a smartphone and the server, and frontend scripts used by users and administrators to access and check the data and process the reported events.
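
The paper describes the backend only at the architectural level (PHP scripts on a dedicated server). As a rough illustration of the report-submission flow, here is a minimal Python/Flask sketch of such an endpoint; the route, payload fields, and SQLite schema are hypothetical stand-ins, not the authors' PHP implementation.

```python
# Hypothetical sketch of a report-receiving endpoint: a JSON payload with a
# text description, coordinates, and an optional base64-encoded photo is
# stored in a local database. The paper's backend is PHP; this only
# illustrates the idea.
import base64
import sqlite3

from flask import Flask, jsonify, request

app = Flask(__name__)
DB = "reports.db"

def init_db() -> None:
    with sqlite3.connect(DB) as conn:
        conn.execute(
            """CREATE TABLE IF NOT EXISTS reports (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   description TEXT, lat REAL, lon REAL, photo BLOB)"""
        )

@app.route("/api/report", methods=["POST"])
def submit_report():
    payload = request.get_json(force=True)
    photo = base64.b64decode(payload["photo"]) if payload.get("photo") else None
    with sqlite3.connect(DB) as conn:
        cur = conn.execute(
            "INSERT INTO reports (description, lat, lon, photo) VALUES (?, ?, ?, ?)",
            (payload["description"], payload["lat"], payload["lon"], photo),
        )
        report_id = cur.lastrowid
    return jsonify({"status": "received", "id": report_id}), 201

if __name__ == "__main__":
    init_db()
    app.run()
```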


Author(s):  
ThippaReddy Gadekallu ◽  
Akshat Soni ◽  
Deeptanu Sarkar ◽  
Lakshmanna Kuruva

Sentiment analysis is a sub-domain of opinion mining in which the analysis focuses on extracting people's emotions and opinions towards a particular topic from structured, semi-structured, or unstructured textual data. In this chapter, the authors focus the task of sentiment analysis on the IMDB movie review database. The chapter presents experimental work on a new kind of domain-specific, feature-based heuristic for aspect-level sentiment analysis of movie reviews. The authors have devised an aspect-oriented scheme that analyzes the textual reviews of a movie and assigns a sentiment label to each aspect. They conclude that incorporating syntactic information in the models is vital to the sentiment analysis process, and that the proposed approach to sentiment classification supplements the existing movie rating systems used across the web and will serve as a base for future research in this domain.
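
The heuristic itself is only outlined above, but a toy version conveys the aspect-level idea: scan each sentence for aspect terms and tally the polarity words that co-occur with them. The aspect and polarity lexicons below are tiny illustrative stand-ins, not the authors' scheme.

```python
# Toy aspect-level sentiment labelling: each aspect gets a label based on
# polarity words in the sentences that mention it. Lexicons are illustrative.
import re

ASPECTS = {
    "acting": {"acting", "actor", "actress", "performance", "cast"},
    "plot": {"plot", "story", "script", "screenplay"},
    "music": {"music", "score", "soundtrack"},
}
POSITIVE = {"great", "brilliant", "excellent", "engaging", "memorable"}
NEGATIVE = {"weak", "boring", "poor", "predictable", "forgettable"}

def aspect_sentiment(review: str) -> dict:
    """Label each aspect positive/negative from the sentences mentioning it."""
    labels = {}
    sentences = re.split(r"[.!?]+", review.lower())
    for aspect, terms in ASPECTS.items():
        score = 0
        for sent in sentences:
            words = set(re.findall(r"[a-z']+", sent))
            if words & terms:  # the sentence talks about this aspect
                score += len(words & POSITIVE) - len(words & NEGATIVE)
        if score:
            labels[aspect] = "positive" if score > 0 else "negative"
    return labels

print(aspect_sentiment("The acting was brilliant, but the plot felt predictable and boring."))
# -> {'acting': 'positive', 'plot': 'negative'}
```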


2018 ◽  
Vol 9 (2) ◽  
pp. 111-120
Author(s):  
Argha Roy ◽  
Shyamali Guria ◽  
Suman Halder ◽  
Sayani Banerjee ◽  
Sourav Mandal

Recently, the web has been crowded with growing volumes of text on every aspect of human life. Raw textual data in the form of social media posts, blogs, feedback, reviews, etc. is difficult to access, analyze, and act on rapidly without efficient methods. This paper proposes an efficient method for summarizing tourists' reviews of a specific tourist spot and analyzing their sentiments towards the place. A classification technique automatically arranges documents into predefined categories, and a summarization algorithm produces a condensed version of the input that captures the most significant concepts of the source documents. Finally, sentiment analysis is performed on the summarized opinions using NLP and text analysis techniques to show the overall sentiment about the spot. Interested tourists planning to visit the place therefore need not go through all the reviews; instead, they can read the summarized documents along with the overall sentiment about the target place.
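
A rough sketch of this summarize-then-analyze pipeline is shown below, assuming a simple frequency-based sentence scorer and tiny illustrative polarity lexicons rather than the paper's actual classifier and summarizer.

```python
# Sketch: keep the highest-scoring sentences as the summary, then compute an
# overall polarity over the summary. Lexicons and scoring are illustrative.
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "is", "was", "and", "but", "to", "of", "in", "it"}
POSITIVE = {"beautiful", "clean", "friendly", "amazing", "peaceful"}
NEGATIVE = {"crowded", "dirty", "expensive", "noisy", "disappointing"}

def summarize(reviews: list[str], k: int = 3) -> list[str]:
    sentences = [s.strip() for text in reviews
                 for s in re.split(r"[.!?]+", text) if s.strip()]
    freq = Counter(w for s in sentences
                   for w in re.findall(r"[a-z']+", s.lower()) if w not in STOPWORDS)
    scored = sorted(sentences, reverse=True,
                    key=lambda s: sum(freq[w] for w in re.findall(r"[a-z']+", s.lower())))
    return scored[:k]

def overall_sentiment(sentences: list[str]) -> str:
    words = [w for s in sentences for w in re.findall(r"[a-z']+", s.lower())]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

reviews = ["The beach was beautiful and peaceful.",
           "Food stalls were expensive but the beach was clean."]
summary = summarize(reviews, k=2)
print(summary, overall_sentiment(summary))  # -> positive
```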


Author(s):  
Dorothea Tsatsou ◽  
Symeon Papadopoulos ◽  
Ioannis Kompatsiaris ◽  
Paul C. Davis

This chapter provides an overview of personalized advertisement delivery paradigms on the web, with a focus on the recommendation of advertisements expressed in or accompanied by text. Different methods of online targeted advertising are examined, while justifying the need for channeling the appropriate ads to the corresponding users. The aim of the work presented here is to illustrate how the semantic representation of ads and user preferences can achieve optimal and unobtrusive ad delivery. We propose a set of distributed technologies that efficiently handles the lack of textual data in ads by enriching ontological knowledge with statistical contextual data in order to classify ads and generic content under a uniform, machine-understandable vocabulary. This classification is used to construct lightweight semantic user profiles, matched with semantic ad descriptions via fuzzy semantic reasoning. A real-world user study, as well as an evaluative exploration of framework alternatives, validates the system's effectiveness in producing high-quality ad recommendations.
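
The matching step can be pictured with a small sketch: a profile maps ontology concepts to interest degrees in [0, 1], and an ad's relevance is a fuzzy overlap of its concept weights with the profile. The concept names and the min/max scoring rule below are illustrative assumptions, not the chapter's reasoner.

```python
# Sketch of fuzzy profile-to-ad matching: per shared concept, take the fuzzy
# conjunction (min) of interest degree and ad weight; aggregate with max.
user_profile = {"Sports": 0.9, "Running": 0.8, "Cooking": 0.2}

ads = {
    "trail-shoes": {"Sports": 1.0, "Running": 1.0},
    "pasta-maker": {"Cooking": 1.0, "Kitchen": 0.7},
}

def relevance(profile: dict, ad: dict) -> float:
    """Fuzzy conjunction per shared concept (min), aggregated by max."""
    degrees = [min(profile[c], w) for c, w in ad.items() if c in profile]
    return max(degrees, default=0.0)

ranked = sorted(ads, key=lambda name: relevance(user_profile, ads[name]), reverse=True)
print(ranked)  # -> ['trail-shoes', 'pasta-maker']
```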


2004 ◽  
Vol 13 (04) ◽  
pp. 829-849 ◽  
Author(s):  
Lars E. Holzman ◽  
Todd A. Fisher ◽  
Leon M. Galitsky ◽  
April Kontostathis ◽  
William M. Pottenger

Few tools exist that address the challenges facing researchers in the Textual Data Mining (TDM) field. Some are too specific to their application, or are prototypes not suitable for general use. More general tools often are not capable of processing large volumes of data. We have created a Textual Data Mining Infrastructure (TMI) that incorporates both existing and new capabilities in a reusable framework conducive to developing new tools and components. TMI adheres to strict guidelines that allow it to run in a wide range of processing environments – as a result, it accommodates the volume of computing and diversity of research occurring in TDM. A unique capability of TMI is support for optimization. This facilitates text mining research by automating the search for optimal parameters in text mining algorithms. In this article we describe a number of applications that use the TMI. A brief tutorial is provided on the use of TMI. We present several novel results that have not been published elsewhere. We also discuss how the TMI utilizes existing machine-learning libraries, thereby enabling researchers to continue and extend their endeavors with minimal effort. Towards that end, TMI is available on the web at .
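
The optimization support can be illustrated with a generic parameter sweep: evaluate each combination in a parameter grid and keep the best-scoring one. The grid and the evaluation function below are placeholders, not TMI's actual API.

```python
# Generic parameter search of the kind TMI automates: try each combination
# in a grid and keep the best. evaluate() stands in for running a text-mining
# algorithm and measuring a quality metric such as F1.
from itertools import product

def evaluate(params: dict) -> float:
    """Placeholder for running the algorithm and scoring its output."""
    return 1.0 - abs(params["threshold"] - 0.35) - 0.01 * (params["ngram"] - 2) ** 2

grid = {
    "threshold": [0.25, 0.35, 0.45],
    "ngram": [1, 2, 3],
}

best_score, best_params = float("-inf"), None
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    score = evaluate(params)
    if score > best_score:
        best_score, best_params = score, params

print(best_params, round(best_score, 3))  # -> {'threshold': 0.35, 'ngram': 2} 1.0
```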


Author(s):  
Alberto Trobia ◽  
Fabio M. Lo Verde

This chapter investigates how and why amateur musicians use social networking sites, employing a mixed-methods approach. Attention is focused on four big Italian Facebook communities of pop-rock musicians: drums, bass, guitar, and keyboard players (overall, 2,101 active users), analyzing the relational and textual data extracted from the web. The chapter analyzes the network structures emerging from the interactions among the users. It also identifies and maps the main areas of discussion (sound shaping, studio recording, marketplace, musical references, computer production, and relations) and the latent semantic dimension characterizing Facebook users’ activities, through social network analysis and lexical correspondence analysis. Meanings, values, aesthetics, and representations of amateur music making, emerging from the data, are framed within two orthogonal dimensions: theory versus praxis, and competence versus music production. The Italian singularity is then explained with respect to this space. Some theoretical conclusions are finally drawn.
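
The relational side of such a study can be sketched briefly: build a weighted graph of users who comment on one another's posts and compute a simple centrality. The interaction records below are invented for illustration; they are not the chapter's data.

```python
# Sketch of building an interaction network from comment threads and ranking
# users by degree centrality. Data is invented for illustration.
import networkx as nx

# (commenter, post_author) pairs extracted from thread data
interactions = [
    ("anna", "marco"), ("luca", "marco"), ("anna", "luca"),
    ("paolo", "anna"), ("luca", "anna"),
]

G = nx.Graph()
for commenter, author in interactions:
    if G.has_edge(commenter, author):
        G[commenter][author]["weight"] += 1
    else:
        G.add_edge(commenter, author, weight=1)

centrality = nx.degree_centrality(G)
for user, c in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{user}: {c:.2f}")
```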


2014 ◽  
Vol 10 (2) ◽  
pp. 20-36
Author(s):  
Andreas Schieber ◽  
Andreas Hilbert

This paper develops and evaluates a BPMN-based process model that identifies and extracts blog content from the web and stores its textual data in a data warehouse for further analysis. Depending on the characteristics of the technologies used to create the weblogs, the process has to perform specific tasks in order to extract blog content correctly. The paper describes three phases: extraction, transformation, and loading of data into a repository specifically adapted for blog content extraction. It highlights the objectives that must be achieved in these phases to ensure correct extraction. The authors integrate the described process into a previously developed framework for blog mining. Their process model closes the conceptual gap in this framework as well as the gap in current research on blog mining process models. Furthermore, it can easily be adapted to other web extraction proposals.
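
A compact sketch of the three phases might look as follows, with a stubbed fetch, hypothetical tag names, and a minimal SQLite schema standing in for the paper's BPMN tasks and warehouse design.

```python
# Extract raw HTML, transform it into a structured post record, load it into
# a repository. Markup handling is stubbed; a real version would use a parser
# and per-blog-engine rules, as the paper's process model requires.
import sqlite3
from dataclasses import dataclass

@dataclass
class Post:
    url: str
    title: str
    body: str

def extract(url: str) -> str:
    """Fetch raw page HTML (stubbed here to keep the sketch offline)."""
    return "<html><h1>My post</h1><div class='entry'>Hello blog</div></html>"

def transform(url: str, html: str) -> Post:
    """Pull title and body out of the markup (illustrative tag names)."""
    title = html.split("<h1>")[1].split("</h1>")[0]
    body = html.split("<div class='entry'>")[1].split("</div>")[0]
    return Post(url=url, title=title, body=body)

def load(post: Post, db: str = "blog_warehouse.db") -> None:
    with sqlite3.connect(db) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS posts (url TEXT, title TEXT, body TEXT)")
        conn.execute("INSERT INTO posts VALUES (?, ?, ?)", (post.url, post.title, post.body))

url = "https://example.org/my-post"
load(transform(url, extract(url)))
```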


2021 ◽  
Vol 2089 (1) ◽  
pp. 012048
Author(s):  
Kishor Kumar Reddy C ◽  
P R Anisha ◽  
Nhu Gia Nguyen ◽  
G Sreelatha

This research involves the use of machine learning and Natural Language Processing (NLP) along with the Natural Language Toolkit (NLTK) to develop a logical text summarization tool that uses the extractive approach to generate an accurate and fluent summary. The aim of the tool is to efficiently extract a concise and coherent version containing only the main outline points of a long input document, avoiding any repetition of text or information already mentioned earlier in the text. The text to be summarized can be obtained from the web through web scraping or entered manually into the tool. The summarization process can be quite beneficial for users, as shortening long texts helps them refer to the input quickly and understand points that might otherwise be out of their scope.
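
A minimal NLTK-based version of such a frequency-scored extractive approach is sketched below (it requires NLTK's "punkt" and "stopwords" data; newer NLTK versions may also need "punkt_tab"). The authors' exact scoring may differ; this is only a common baseline.

```python
# Frequency-based extractive summarization with NLTK: score each sentence by
# the corpus frequency of its content words, keep the top-n in original order.
from collections import Counter

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

def extractive_summary(text: str, n_sentences: int = 2) -> str:
    stops = set(stopwords.words("english"))
    words = [w.lower() for w in word_tokenize(text)
             if w.isalpha() and w.lower() not in stops]
    freq = Counter(words)
    sentences = sent_tokenize(text)
    scored = sorted(
        range(len(sentences)),
        key=lambda i: sum(freq[w.lower()] for w in word_tokenize(sentences[i]) if w.isalpha()),
        reverse=True,
    )
    keep = sorted(scored[:n_sentences])  # restore original order for fluency
    return " ".join(sentences[i] for i in keep)

text = ("Raw textual data keeps growing on the web. Summaries help readers. "
        "A good summary keeps only the most important sentences of the text.")
print(extractive_summary(text, n_sentences=1))
```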


Author(s):  
Mohammad Aman Ullah ◽  
Anika Tahrin ◽  
Sumaiya Marjan

The web is the largest worldwide communication system of computers, with local, academic, commercial, and government sites. As the number of types of websites increases, manual classification becomes costly and inaccurate and cannot satisfy the increasing demand for internet services, so automated classification has become important for better and more accurate search engine results. This research has therefore proposed an algorithm for automatically classifying different websites using randomly collected textual data from their webpages. The research also contributes ten dictionaries covering different domains, which are used as training data in the classification process. Finally, classification was carried out using both the proposed algorithm and Naïve Bayes, and the proposed algorithm outperformed Naïve Bayes in accuracy by 1.25%. This research suggests that the proposed algorithm could be applied to any number of domains if the related dictionaries are available.
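
The dictionary-based idea can be sketched as scoring a page's text against per-domain term lists and picking the best-matching domain. The two tiny dictionaries below are illustrations; the paper contributes ten full ones.

```python
# Sketch of dictionary-based website classification: count how often each
# domain dictionary's terms occur in the page text and pick the top domain.
import re
from collections import Counter

DICTIONARIES = {
    "academic": {"university", "research", "faculty", "course", "journal"},
    "commercial": {"price", "cart", "shipping", "discount", "checkout"},
}

def classify(page_text: str) -> str:
    tokens = Counter(re.findall(r"[a-z']+", page_text.lower()))
    scores = {
        domain: sum(tokens[t] for t in terms)
        for domain, terms in DICTIONARIES.items()
    }
    return max(scores, key=scores.get)

print(classify("Add to cart for free shipping and a 10% discount at checkout."))
# -> 'commercial'
```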

