Mining the Web for generating thematic metadata from textual data

This paper is focused on the usage of location services in mobile applications that were developed for the purpose of reporting different events that are based on their location. The event that is intended to be generic and universal can, as in examples used in this paper, be the reporting of some occurrence to a city’s communal affairs office. Such a generic event can include both multimedia and textual data, in addition to location information obtained using mobile device running the app. The software solution that is described in this paper consists of a mobile application that was developed for the Android operating system and a web application that includes a series of PHP scripts that run on a dedicated server. The web application consists of a backend scripts that facilitate the communication of a smart phone and the server and frontend related scripts used by users and administrators to access and check the data and process the reported events.

Download Full-text

Application of Sentiment Analysis in Movie reviews

Advances in Business Information Systems and Analytics - Sentiment Analysis and Knowledge Discovery in Contemporary Business ◽

10.4018/978-1-5225-4999-4.ch006 ◽

2019 ◽

pp. 77-90 ◽

Cited By ~ 5

Author(s):

ThippaReddy Gadekallu ◽

Akshat Soni ◽

Deeptanu Sarkar ◽

Lakshmanna Kuruva

Keyword(s):

Sentiment Analysis ◽

Experimental Work ◽

Opinion Mining ◽

Rating Systems ◽

Domain Specific ◽

Analysis Process ◽

The People ◽

Textual Data ◽

Feature Based ◽

The Web

Sentiment analysis is a sub-domain of opinion mining where the analysis is focused on the extraction of emotions and opinions of the people towards a particular topic from a structured, semi-structured, or unstructured textual data. In this chapter, the authors try to focus the task of sentiment analysis on IMDB movie review database. This chapter presents the experimental work on a new kind of domain-specific feature-based heuristic for aspect-level sentiment analysis of movie reviews. The authors have devised an aspect-oriented scheme that analyzes the textual reviews of a movie and assign it a sentiment label on each aspect. Finally, the authors conclude that incorporating syntactical information in the models is vital to the sentiment analysis process. The authors also conclude that the proposed approach to sentiment classification supplements the existing rating movie rating systems used across the web and will serve as base to future researches in this domain.

Download Full-text

Summarizing Opinions with Sentiment Analysis from Multiple Reviews on Travel Destinations

International Journal of Synthetic Emotions ◽

10.4018/ijse.2018070107 ◽

2018 ◽

Vol 9 (2) ◽

pp. 111-120

Author(s):

Argha Roy ◽

Shyamali Guria ◽

Suman Halder ◽

Sayani Banerjee ◽

Sourav Mandal

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Efficient Method ◽

Text Analysis ◽

Human Life ◽

Analysis Techniques ◽

Classification Technique ◽

Textual Data ◽

The Web

Recently, the web has been crowded with growing volumes of various texts on every aspect of human life. It is difficult to rapidly access, analyze, and compose important decisions using efficient methods for raw textual data in the form of social media, blogs, feedback, reviews, etc., which receive textual inputs directly. It proposes an efficient method for summarization of various reviews of tourists on a specific tourist spot towards analyzing their sentiments towards the place. A classification technique automatically arranges documents into predefined categories and a summarization algorithm produces the exact condensed input such that output is most significant concepts of source documents. Finally, sentiment analysis is done in summarized opinion using NLP and text analysis techniques to show overall sentiment about the spot. Therefore, interested tourists can plan to visit the place do not go through all the reviews, rather they go through summarized documents with the overall sentiment about target place.

Download Full-text

Distributed Technologies for Personalized Advertisement Delivery

Advances in Multimedia and Interactive Technologies - Online Multimedia Advertising ◽

10.4018/978-1-60960-189-8.ch013 ◽

2011 ◽

pp. 233-261 ◽

Cited By ~ 1

Author(s):

Dorothea Tsatsou ◽

Symeon Papadopoulos ◽

Ioannis Kompatsiaris ◽

Paul C. Davis

Keyword(s):

User Study ◽

Semantic Representation ◽

User Preferences ◽

User Profiles ◽

Contextual Data ◽

Uniform Machine ◽

Textual Data ◽

Personalized Advertisement ◽

Distributed Technologies ◽

The Web

This chapter provides an overview on personalized advertisement delivery paradigms on the web with a focus on the recommendation of advertisements expressed in or accompanied by text. Different methods of online targeted advertising will be examined, while justifying the need for channeling the appropriate ads to the corresponding users. The aim of the work presented here is to illustrate how the semantic representation of ads and user preferences can achieve optimal and unobtrusive ad delivery. We propose a set of distributed technologies that efficiently handles the lack of textual data in ads by enriching ontological knowledge with statistical contextual data in order to classify ads and generic content under a uniform, machine-understandable vocabulary. This classification is used to construct lightweight semantic user profiles, matched with semantic ad descriptions via fuzzy semantic reasoning. A real world user study, as well as an evaluative exploration of framework alternatives validate the system’s effectiveness to produce high quality ad recommendations.

Download Full-text

Summarizing Opinions with Sentiment Analysis from Multiple Reviews on Travel Destinations

Destination Management and Marketing ◽

10.4018/978-1-7998-2469-5.ch018 ◽

2020 ◽

pp. 310-320

Author(s):

Argha Roy ◽

Shyamali Guria ◽

Suman Halder ◽

Sayani Banerjee ◽

Sourav Mandal

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Efficient Method ◽

Text Analysis ◽

Human Life ◽

Analysis Techniques ◽

Classification Technique ◽

Textual Data ◽

The Web

Recently, the web has been crowded with growing volumes of various texts on every aspect of human life. It is difficult to rapidly access, analyze, and compose important decisions using efficient methods for raw textual data in the form of social media, blogs, feedback, reviews, etc., which receive textual inputs directly. It proposes an efficient method for summarization of various reviews of tourists on a specific tourist spot towards analyzing their sentiments towards the place. A classification technique automatically arranges documents into predefined categories and a summarization algorithm produces the exact condensed input such that output is most significant concepts of source documents. Finally, sentiment analysis is done in summarized opinion using NLP and text analysis techniques to show overall sentiment about the spot. Therefore, interested tourists can plan to visit the place do not go through all the reviews, rather they go through summarized documents with the overall sentiment about target place.

Download Full-text

A SOFTWARE INFRASTRUCTURE FOR RESEARCH IN TEXTUAL DATA MINING

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213004001843 ◽

2004 ◽

Vol 13 (04) ◽

pp. 829-849 ◽

Cited By ~ 7

Author(s):

LARS E. HOLZMAN ◽

TODD A. FISHER ◽

LEON M. GALITSKY ◽

APRIL KONTOSTATHIS ◽

WILLIAM M. POTTENGER

Keyword(s):

Data Mining ◽

Text Mining ◽

Optimal Parameters ◽

Software Infrastructure ◽

Wide Range ◽

Textual Data ◽

Mining Algorithms ◽

Minimal Effort ◽

The Web ◽

Unique Capability

Few tools exist that address the challenges facing researchers in the Textual Data Mining (TDM) field. Some are too specific to their application, or are prototypes not suitable for general use. More general tools often are not capable of processing large volumes of data. We have created a Textual Data Mining Infrastructure (TMI) that incorporates both existing and new capabilities in a reusable framework conducive to developing new tools and components. TMI adheres to strict guidelines that allow it to run in a wide range of processing environments – as a result, it accommodates the volume of computing and diversity of research occurring in TDM. A unique capability of TMI is support for optimization. This facilitates text mining research by automating the search for optimal parameters in text mining algorithms. In this article we describe a number of applications that use the TMI. A brief tutorial is provided on the use of TMI. We present several novel results that have not been published elsewhere. We also discuss how the TMI utilizes existing machine-learning libraries, thereby enabling researchers to continue and extend their endeavors with minimal effort. Towards that end, TMI is available on the web at .

Download Full-text

Italian Amateur Pop-Rock Musicians on Facebook

10.1093/oxfordhb/9780190244705.013.8 ◽

2017 ◽

Author(s):

Alberto Trobia ◽

Fabio M. Lo Verde

Keyword(s):

Social Networking ◽

Social Networking Sites ◽

Mixed Methods Approach ◽

Music Production ◽

Music Making ◽

Textual Data ◽

Theory Versus ◽

Rock Musicians ◽

Semantic Dimension ◽

The Web

This chapter investigates how and why amateur musicians use social networking sites, employing a mixed-methods approach. Attention is focused on four big Italian Facebook communities of pop-rock musicians: drums, bass, guitar, and keyboard players (overall, 2,101 active users), analyzing the relational and textual data extracted from the web. The chapter analyzes the network structures emerging from the interactions among the users. It also identifies and maps the main areas of discussion (sound shaping, studio recording, marketplace, musical references, computer production, and relations) and the latent semantic dimension characterizing Facebook users’ activities, through social network analysis and lexical correspondence analysis. Meanings, values, aesthetics, and representations of amateur music making, emerging from the data, are framed within two orthogonal dimensions: theory versus praxis, and competence versus music production. The Italian singularity is then explained with respect to this space. Some theoretical conclusions are finally drawn.

Download Full-text

Process Model for Content Extraction from Weblogs

International Journal of Intelligent Information Technologies ◽

10.4018/ijiit.2014040102 ◽

2014 ◽

Vol 10 (2) ◽

pp. 20-36

Author(s):

Andreas Schieber ◽

Andreas Hilbert

Keyword(s):

Data Warehouse ◽

Process Model ◽

Process Models ◽

Content Extraction ◽

Web Extraction ◽

Textual Data ◽

The Web ◽

Three Phases

This paper develops and evaluates a BPMN-based process model which identifies and extracts blog content from the web and stores its textual data in a data warehouse for further analyses. Depending on the characteristics of the technologies used to create the weblogs, the process has to perform specific tasks in order to extract blog content correctly. The paper describes three phases: extraction, transformation and loading of data in a repository specifically adapted for blog content extraction. It highlights the objectives in these phases which must be achieved to ensure the correct extraction. The authors integrate the described process in a previously developed framework for blog mining. The authors' process model closes the conceptual gap in this framework as well as the gap in current research of blog mining process models. Furthermore, it can easily be adapted for other web extraction proposals.

Download Full-text

A Text Mining using Web Scraping for Meaningful Insights

Journal of Physics Conference Series ◽

10.1088/1742-6596/2089/1/012048 ◽

2021 ◽

Vol 2089 (1) ◽

pp. 012048

Author(s):

Kishor Kumar Reddy C ◽

P R Anisha ◽

Nhu Gia Nguyen ◽

G Sreelatha

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Text Mining ◽

Natural Language ◽

Language Processing ◽

Text Summarization ◽

Learning Technology ◽

Web Scraping ◽

Textual Data ◽

The Web

Abstract This research involves the usage of Machine Learning technology and Natural Language Processing (NLP) along with the Natural Language Tool-Kit (NLTK). This helps develop a logical Text Summarization tool, which uses the Extractive approach to generate an accurate and a fluent summary. The aim of this tool is to efficiently extract a concise and a coherent version, having only the main needed outline points from the long text or the input document avoiding any type of repetitions of the same text or information that has already been mentioned earlier in the text. The text to be summarized can be inherited from the web using the process of web scraping or entering the textual data manually on the platform i.e., the tool. The summarization process can be quite beneficial for the users as these long texts, needs to be shortened to help them to refer to the input quickly and understand points that might be out of their scope to understand.

Download Full-text

An Algorithm for Multi-Domain Website Classification

International Journal of Web-Based Learning and Teaching Technologies ◽

10.4018/ijwltt.2020100104 ◽

2020 ◽

Vol 15 (4) ◽

pp. 57-65

Author(s):

Mohammad Aman Ullah ◽

Anika Tahrin ◽

Sumaiya Marjan

Keyword(s):

Search Engine ◽

Communication System ◽

World Wide ◽

Training Data ◽

Automated Classification ◽

Internet Service ◽

Textual Data ◽

The Cost ◽

Manual Classification ◽

The Web

The web is the largest world-wide communication system of computers. The web has local, academic, commercial and government sites. As the types of websites increases in numbers, the cost and accuracy of manual classification became cumbersome and cannot satisfy the increasing internet service demands, thereby automated classification became important for better and more accurate search engine results. Therefore, this research has proposed an algorithm for classifying different websites automatically by using randomly collected textual data from the webpages. This research also contributed ten dictionaries covering different domains and used as training data in the classification process. Finally, the classification was carried out using the proposed and Naïve Bayes algorithms and found the proposed algorithm outperformed on the scale of accuracy by 1.25%. This research suggests that the proposed algorithm could be applied to any number of domains if the related dictionaries are available.

Download Full-text