Information Retrieval Methods for Multidisciplinary Applications

MapReduce Based Information Retrieval Algorithms for Efficient Ranking of Webpages

Information Retrieval Methods for Multidisciplinary Applications

10.4018/978-1-4666-3898-3.ch015

2013

pp. 250-265

Author(s):

K.G. Srinivasa

Anil Kumar Muppalla

Varun A. Bharghava

M. Amulya

Keyword(s):

Search Engines

World Wide

Speed Of Convergence

User Choice

Link Structure

Ranking Algorithms

The World

Retrieval Algorithms

Web Graph

The Web

In this paper, the authors discuss MapReduce implementations of the crawler, indexer, and ranking algorithms that search engines use to retrieve results from the World Wide Web. Running the crawler and indexer in a MapReduce environment improves the speed of crawling and indexing. The proposed ranking algorithm is an iterative method that exploits the link structure of the Web; it is built on the MapReduce framework to speed up the convergence of the webpage rankings. Categorization is used to retrieve and order results according to the user's choice, personalizing the search. The paper also introduces a new score for each webpage, calculated from the user's query and the number of occurrences of the query terms in the document corpus. Experiments are conducted on Web graph datasets, and the results are compared with serial versions of the crawler, indexer, and ranking algorithms.
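An iterative, link-based ranking of this kind belongs to the PageRank family. As a minimal sketch, assuming the classic PageRank update with damping factor 0.85 (not necessarily the authors' exact formula), one MapReduce round can be simulated in memory: the map step emits each page's adjacency list plus an equal share of its rank to every out-link, a dictionary stands in for the shuffle, and the reduce step sums the shares. The names `map_phase`, `reduce_phase`, and `pagerank` are hypothetical, and the sketch assumes every page has at least one out-link.

```python
from collections import defaultdict

def map_phase(node, rank, out_links):
    """Emit the page's adjacency list and an equal share of its rank to each out-link."""
    yield node, ("links", out_links)
    for target in out_links:
        yield target, ("rank", rank / len(out_links))

def reduce_phase(node, values, num_nodes, damping=0.85):
    """Sum the incoming rank shares for one page and apply the damping factor."""
    out_links, incoming = [], 0.0
    for kind, payload in values:
        if kind == "links":
            out_links = payload
        else:
            incoming += payload
    new_rank = (1 - damping) / num_nodes + damping * incoming
    return node, new_rank, out_links

def pagerank(graph, iterations=50, damping=0.85):
    """Run map -> shuffle -> reduce rounds over {page: [linked pages]}."""
    n = len(graph)
    ranks = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        shuffled = defaultdict(list)  # stands in for the framework's shuffle/group-by-key
        for node, out_links in graph.items():
            for key, value in map_phase(node, ranks[node], out_links):
                shuffled[key].append(value)
        ranks = {}
        for node, values in shuffled.items():
            _, new_rank, _ = reduce_phase(node, values, n, damping)
            ranks[node] = new_rank
    return ranks
```

On a real cluster, `map_phase` and `reduce_phase` would run as distributed tasks with one job per iteration; re-emitting the adjacency list from the mapper is what lets the reducer carry the graph structure into the next round. For example, `pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})` ranks `C` highest, because it receives links from both other pages.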

A Roadmap to Integrate Document Clustering in Information Retrieval

Information Retrieval Methods for Multidisciplinary Applications

10.4018/978-1-4666-3898-3.ch003

2013

pp. 31-45

Author(s):

R. Subhashini

V.Jawahar Senthil Kumar

Keyword(s):

Information Retrieval

Search Engines

World Wide

Clustering Algorithm

Web Search

Full Potential

Digital Information

Search Results

The World

The Web

The World Wide Web is a large, distributed digital information space. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. Information Retrieval (IR) plays an important role in search engines. Today's most advanced engines use the keyword-based ("bag of words") paradigm, which has inherent disadvantages. Organizing web search results into clusters helps users browse them quickly. Traditional clustering techniques are inadequate because they do not generate clusters with highly readable names. This paper proposes an approach to clustering web search results based on a phrase-based clustering algorithm, as an alternative to the single ranked list returned by search engines. The approach presents the user with a list of clusters. Experimental results verify the method's feasibility and effectiveness.
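Phrase-based clustering of search results can be sketched in the spirit of Suffix Tree Clustering's base clusters: every phrase shared by enough result snippets forms a cluster, and the phrase itself serves as a readable cluster name. This is a simplification under stated assumptions, not the authors' algorithm: it uses fixed-length word n-grams (bigrams by default) instead of a suffix tree and does not merge overlapping clusters. `phrases`, `phrase_clusters`, and their parameters are hypothetical.

```python
from collections import defaultdict

def phrases(text, n=2):
    """Return the set of word n-grams (candidate phrases) in one snippet."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def phrase_clusters(snippets, n=2, min_size=2):
    """Group result indices by shared phrase; the phrase doubles as the cluster label."""
    index = defaultdict(set)
    for doc_id, text in enumerate(snippets):
        for phrase in phrases(text, n):
            index[phrase].add(doc_id)
    # Keep only phrases shared by at least min_size results.
    clusters = {p: docs for p, docs in index.items() if len(docs) >= min_size}
    # Larger clusters first; ties broken alphabetically for stable output.
    return sorted(clusters.items(), key=lambda item: (-len(item[1]), item[0]))
```

For the snippets `"apple pie recipe"`, `"easy apple pie"`, `"apple iphone review"`, and `"new apple iphone"`, this yields two clusters labelled `apple iphone` (results 2 and 3) and `apple pie` (results 0 and 1).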

Model of E-Reading Process for E-School Book in Libya

Information Retrieval Methods for Multidisciplinary Applications

10.4018/978-1-4666-3898-3.ch012

2013

pp. 188-206

Author(s):

Azza Abubaker

Joan Lu

Keyword(s):

Cognitive Processes

Reading Strategies

Primary Schools

Reading Process

The Internet

Electronic Books

School Book

Actual Reading

Significant Difference

School Books

Defining the stages a reader follows when reading e-resources is one of several factors that can provide significant insight into readers' actual reading behaviours and cognitive processes. Two different samples of students aged 9 to 12 from Libyan primary schools were selected to investigate how students use and interact with both print and digital school books, identify the e-reading process, outline their aims in using the internet and technology, and define what students like and dislike in each version. The students found the e-textbook more difficult to use than the paper book, and a significant difference was found between the reading processes for paper books and electronic books. In addition, students used two reading strategies to read the school book in both versions (electronic and paper): (1) view the text, then answer the questions; or (2) view the questions, then search for the correct answers.

XRecursive

Information Retrieval Methods for Multidisciplinary Applications

10.4018/978-1-4666-3898-3.ch017

2013

pp. 281-292

Author(s):

Mohammed Adam Ibrahim Fakharaldien

Jasni Mohamed Zain

Norrozila Sulaiman

Tutut Herawan

Keyword(s):

General Solution

Relational Database

Relational Databases

Xml Schema

Xml Data

Xml Documents

Extra Effort

Extensible Markup

Promising Solution

Reconstruction Time

Storing XML documents in a relational database is a promising solution because relational databases are mature and scale very well. In a relational database, XML data and structured data can also coexist, making it possible to build applications that involve both kinds of data with little extra effort. This paper proposes an alternative method, named XRecursive, for mapping XML (eXtensible Markup Language) documents to RDBs (relational databases). XRecursive needs neither a DTD (Document Type Definition) nor an XML schema, so it can be applied as a general solution for any XML data. The steps and algorithm of XRecursive are given in detail to describe how to use its storage structure to store and query XML documents in a relational database. The authors report experimental results on a real database showing that XRecursive achieves better storage size, insertion time, mapping time, and reconstruction time than the SUCXENT and XParent methods. Overall, XRecursive also delivers better query performance than both methods.
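A schema-independent mapping can be illustrated with a generic parent-pointer (edge) table: each element becomes a row holding its tag, its text, and the id of its parent, so no DTD or XML schema is needed and the tree can be rebuilt recursively. This is a minimal sketch using Python's standard `sqlite3` and `xml.etree.ElementTree` modules; the `node` table and the `store`/`rebuild` helpers are hypothetical and do not reproduce XRecursive's actual storage structure. Attributes and tail text are ignored for brevity.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical single-table layout: one row per element, linked to its parent.
SCHEMA = """
CREATE TABLE node (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),  -- NULL for the root element
    tag       TEXT NOT NULL,
    text      TEXT                          -- element text content, if any
)"""

def store(conn, element, parent_id=None):
    """Recursively insert an element and its children; return the new row id."""
    text = (element.text or "").strip() or None
    cur = conn.execute(
        "INSERT INTO node (parent_id, tag, text) VALUES (?, ?, ?)",
        (parent_id, element.tag, text),
    )
    node_id = cur.lastrowid
    for child in element:
        store(conn, child, node_id)
    return node_id

def rebuild(conn, node_id):
    """Recursively reconstruct the element tree rooted at node_id."""
    tag, text = conn.execute(
        "SELECT tag, text FROM node WHERE id = ?", (node_id,)
    ).fetchone()
    element = ET.Element(tag)
    element.text = text
    # Children come back in insertion (document) order because ids increase.
    child_rows = conn.execute(
        "SELECT id FROM node WHERE parent_id = ? ORDER BY id", (node_id,)
    ).fetchall()
    for (child_id,) in child_rows:
        element.append(rebuild(conn, child_id))
    return element
```

After `conn.execute(SCHEMA)` and `store(conn, ET.fromstring(doc))`, a query such as `SELECT text FROM node WHERE tag = 'title'` retrieves element content directly with SQL, and `rebuild` restores the original document.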

The Effect of Stemming on Arabic Text Classification

Information Retrieval Methods for Multidisciplinary Applications

10.4018/978-1-4666-3898-3.ch013

2013

pp. 207-225

Cited By ~ 3

Author(s):

Abdullah Wahbeh

Mohammed Al-Kabi

Qasem Al-Radaideh

Emad Al-Shawakfa

Izzat Alsmadi

Keyword(s):

Text Classification

Digital Libraries

Arabic Language

Support Vector

Svm Classifier

Arabic Text

Text Documents

Information Retrieval Systems

Arabic Text Classification

The Web

The information world is rich in documents in different formats and applications, such as databases, digital libraries, and the Web. Text classification aids the search functionality that search engines and information retrieval systems offer for dealing with the large number of documents on the Web. Much text classification research has been applied to English, Dutch, Chinese, and other languages, whereas less has been applied to Arabic. This paper addresses the automatic classification of Arabic text documents, using stemming as part of the preprocessing steps. Results show that, without stemming, the support vector machine (SVM) classifier achieved the highest classification accuracy in the two test modes, at 87.79% and 88.54%. Stemming, on the other hand, negatively affected accuracy: SVM accuracy in the two test modes dropped to 84.49% and 86.35%.
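To illustrate the kind of preprocessing involved, a minimal light stemmer strips common Arabic prefixes and suffixes instead of extracting full roots. The sketch below is loosely modelled on light stemming approaches such as light10; the affix lists and the `MIN_STEM` threshold are illustrative assumptions, `light_stem` is a hypothetical name, and the abstract does not say which stemmer the authors used.

```python
# Illustrative affix lists, loosely modelled on light stemming; not the authors' stemmer.
PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال", "و")  # checked longest first
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي")
MIN_STEM = 2  # never shorten a word below this many letters

def light_stem(word):
    """Strip at most one prefix, then each listed suffix at most once, in order."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            word = word[: -len(suffix)]
    return word
```

Because `المعلمون` and `معلم` both reduce to `معلم`, stemming merges several surface forms into a single classification feature; the abstract reports that, in this study, stemming lowered SVM accuracy.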

Virtual Community of Practice Ontocop

Information Retrieval Methods for Multidisciplinary Applications

10.4018/978-1-4666-3898-3.ch009

2013

pp. 132-155

Author(s):

Ahlam Sawsaa

Zhongyu (Joan) Lu

Keyword(s):

Information Science

Virtual Community

Library Science

Semantic Heterogeneity

Web Ontology Language

Clear Definition

Virtual Community Of Practice

Ontology Language

Computing Science

Knowledge Levels

Information Science (IS) is an ambiguous field: its boundaries overlap with other domains such as Archive Science, Library Science, and Computing Science, so it requires a clear definition. This study creates a systematic and comprehensive ontology aimed at exploring the boundaries and foundations of IS. The paper uses mereotopology theory to describe classes, instances, and their relations. The classes are created from a taxonomy of IS to build an asserted model of an Information Science Ontology (ISO) that can serve as a skeletal foundation for a knowledge base. The main classes are Actors, Method, Practice, Studies, Mediate, Kinds, Domains, Resources, Legislation, Philosophy & Theories, Societal, Time, and Space. The design follows Methontology to create ISO from scratch, and its framework facilitates building the ontology at the knowledge level. The IS boundaries are identified through an ontology implementation workflow in which ISO is encoded with Protégé and the Web Ontology Language (OWL) to formalize and represent it. ISO is an effective way to represent knowledge and overcome semantic heterogeneity, providing a semantic integration foundation that enables information interoperability within the domain.

On the Design and Implementation of Interactive XML Applications

Information Retrieval Methods for Multidisciplinary Applications

10.4018/978-1-4666-3898-3.ch002

2013

pp. 19-30

Author(s):

Jeff Brown

Rebecca Brown

Chris Velado

Ron Vetter

Keyword(s):

Web Applications

Information Service

Short Message

State Information

Client Server

Session Management

Design And Implementation

Xml Document

Message Service

Extensible Markup

This paper describes issues and challenges in the design and implementation of interactive client-server applications whose program logic is expressed as an Extensible Markup Language (XML) document. Although the technique was originally developed for creating interactive Short Message Service (SMS) applications, it has since been extended to developing interactive web applications. XML-Interactive (XML-I) defines the program states and their corresponding actions. Because many interactive applications require sustained communication between the client and the underlying information service, XML-I supports session management, which allows state information to be managed dynamically. The paper describes several applications implemented with XML-I and discusses design issues. The software framework has been implemented in a Java environment.
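To make the idea of XML-defined states and actions concrete, here is a minimal sketch of an interpreter that reads states, prompts, and input-triggered transitions from an XML document and tracks each client's current state in a session table. The XML vocabulary (`app`, `state`, `prompt`, `action`) and the `XmlStateMachine` class are hypothetical: the abstract does not give XML-I's actual element names, and the original framework is written in Java rather than Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical application definition: states, prompts, and input-driven transitions.
APP_XML = """
<app start="menu">
  <state name="menu">
    <prompt>Reply 1 for weather or 2 to quit</prompt>
    <action input="1" next="weather"/>
    <action input="2" next="bye"/>
  </state>
  <state name="weather">
    <prompt>Sunny today. Reply 0 for the menu</prompt>
    <action input="0" next="menu"/>
  </state>
  <state name="bye">
    <prompt>Goodbye</prompt>
  </state>
</app>
"""

class XmlStateMachine:
    """Interprets states/actions from XML and keeps per-client session state."""

    def __init__(self, xml_text):
        root = ET.fromstring(xml_text.strip())
        self.start = root.get("start")
        self.states = {}  # state name -> (prompt text, {input: next state})
        for state in root.findall("state"):
            transitions = {a.get("input"): a.get("next") for a in state.findall("action")}
            self.states[state.get("name")] = (state.findtext("prompt", ""), transitions)
        self.sessions = {}  # client id -> current state name

    def handle(self, client_id, message):
        """Advance this client's session on a message and return the reply prompt."""
        current = self.sessions.get(client_id)
        if current is None:
            current = self.start  # new session: begin at the start state
        else:
            _, transitions = self.states[current]
            # Unrecognized input leaves the client in its current state.
            current = transitions.get(message.strip(), current)
        self.sessions[client_id] = current
        return self.states[current][0]
```

Each incoming SMS or HTTP request would be routed to `handle` with a client identifier (for example, a phone number or session cookie). Keeping `sessions` in memory is a simplification; a production service would persist it and expire idle sessions.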

SAR

Information Retrieval Methods for Multidisciplinary Applications

10.4018/978-1-4666-3898-3.ch016

2013

pp. 266-280

Author(s):

Rabiei Mamat

Tutut Herawan

Mustafa Mat Deris

Keyword(s):

Decision Making

Information System

Set Theory

Rough Set

Soft Set

Decision Makers

Benchmark Datasets

Soft-set theory, proposed by Molodtsov, is a general mathematical tool for dealing with uncertainty. Several algorithms have recently been proposed for decision making using soft-set theory; however, they are still restricted to Boolean-valued information systems. This paper proposes Support Attribute Representative (SAR), a soft-set-based technique for decision making in categorical-valued information systems. The technique has been tested on three datasets to select the best partitioning attribute. In addition, two UCI benchmark datasets are used to evaluate its performance in terms of execution time. On these two datasets, SAR outperforms three rough-set-based techniques (TR, MMR, and MDA) by up to 95% and 50%, respectively. The results of this research will provide useful information for decision makers handling categorical datasets.
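In soft-set-based approaches to categorical data, a categorical-valued information system can be decomposed into a collection of soft sets, one per attribute, each mapping an attribute value to the set of objects that take that value. A minimal sketch of that representation follows; the `multi_soft_set` name is hypothetical, and SAR's own criterion for choosing the best partitioning attribute is not reproduced because the abstract does not define it.

```python
from collections import defaultdict

def multi_soft_set(table, attributes):
    """For each attribute, build a soft set: attribute value -> set of object indices."""
    soft_sets = {}
    for attr in attributes:
        approximations = defaultdict(set)
        for obj_id, row in enumerate(table):
            approximations[row[attr]].add(obj_id)
        soft_sets[attr] = dict(approximations)
    return soft_sets
```

Each attribute's soft set partitions the objects into disjoint groups, which is what makes every attribute a candidate partitioning attribute for the subsequent selection step.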

Improved Parameterless K-Means

Information Retrieval Methods for Multidisciplinary Applications

10.4018/978-1-4666-3898-3.ch010

2013

pp. 156-168

Author(s):

Wan Maseri Binti Wan Mohd

A.H. Beg

Tutut Herawan

A. Noraziah

K. F. Rabbi

Keyword(s):

Unsupervised Learning

Globular Clusters

Clustering Algorithm

Experimental Results

Maximum Distance

Initial Number

Number Of Clusters

New Approach

Data Points

Run Time

K-means is an unsupervised, partitioning clustering algorithm that is popular and widely used for its simplicity and speed. K-means clustering produces a number of separate, flat (non-hierarchical) clusters and is suitable for generating globular clusters. Its main drawback is that the user must specify the number of clusters in advance. This paper presents an improved version of the K-means algorithm that automatically generates the initial number of clusters (k) and uses a new approach to defining the initial centroids, for an effective and efficient clustering process. The underlying mechanism has been analyzed and tested experimentally. The experimental results show that the number of iterations is reduced by 50% and that the run time is lower and remains roughly constant, depending on the maximum distance between data points rather than on the number of data points.
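For reference, standard K-means (Lloyd's algorithm) alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The sketch below pairs it with farthest-point (maximin) initialization, one common distance-based way to spread out the initial centroids. This is a substitute technique, not the paper's: the abstract does not describe how the improved algorithm generates k or its initial centroids, so k is still supplied by the caller here, and `farthest_point_init` and `kmeans` are hypothetical names. It requires Python 3.8+ for `math.dist`.

```python
import math
import random

def farthest_point_init(points, k, seed=0):
    """Pick one random point, then repeatedly add the point farthest from its nearest centroid."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means; returns (centroids, labels, iterations used)."""
    centroids = farthest_point_init(points, k, seed)
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            return centroids, labels, iteration  # converged: assignments unchanged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, labels, max_iter
```

For two well-separated groups such as `[(0, 0), (0, 1), (1, 0)]` and `[(10, 10), (10, 11), (11, 10)]`, the maximin start places one centroid in each group, so `kmeans(points, 2)` separates them after very few iterations.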

Mining Product Reviews in Web Forums

Information Retrieval Methods for Multidisciplinary Applications

10.4018/978-1-4666-3898-3.ch006

2013

pp. 78-94

Cited By ~ 1

Author(s):

S. Hariharan

T. Ramkumar

Keyword(s):

Social Networking

Opinion Mining

General Information

Social Medium

Product Reviews

Accurate Representation

Web Forums

Very High

The Web

Novel Algorithm

The Internet has brought a major shift in the user community. Apart from its well-known uses, it also promotes social networking. Research on social networking has advanced significantly in recent years, strongly influenced by online social websites. People perceive the Web as a social medium that allows greater interaction among people and the sharing of knowledge and experiences. Internet and social web forums act as channels for sharing general information that benefits users. A user's product review is a more accurate representation of the product's real-world performance, and web forums are generally used to post such reviews. Although commercial review websites let users express their opinions however they like, the number of reviews a product receives can be very large. Opinion mining techniques can therefore be used to analyze user reviews, classify their content as positive or negative, and thereby determine how the product fares. This paper focuses on recommending products available on the Web by analyzing context to score the sentences of each review, identifying opinion and feature words with a novel algorithm.
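A common baseline for scoring review sentences is lexicon-based: sum the polarities of the opinion words in a sentence, flip a polarity that follows a negation word, and attribute the result to every product feature the sentence mentions. The sketch below illustrates only that baseline; the lexicons, the negation rule, and the `score_sentence`/`score_reviews` names are hypothetical, and the paper's novel algorithm is not reproduced.

```python
import re

# Hypothetical toy lexicons; a real system would use curated or learned lists.
OPINION_WORDS = {"good": 1, "great": 2, "excellent": 2, "bad": -1, "poor": -1, "terrible": -2}
FEATURE_WORDS = {"battery", "screen", "camera", "price"}
NEGATIONS = {"not", "never", "no"}

def score_sentence(sentence):
    """Return {feature: polarity} for each feature word mentioned in the sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    polarity = 0
    for i, token in enumerate(tokens):
        if token in OPINION_WORDS:
            value = OPINION_WORDS[token]
            if i > 0 and tokens[i - 1] in NEGATIONS:
                value = -value  # flip polarity right after a negation word
            polarity += value
    return {t: polarity for t in tokens if t in FEATURE_WORDS}

def score_reviews(reviews):
    """Aggregate per-feature scores over every sentence of every review."""
    totals = {}
    for review in reviews:
        for sentence in re.split(r"[.!?]+", review):
            for feature, value in score_sentence(sentence).items():
                totals[feature] = totals.get(feature, 0) + value
    return totals
```

For the reviews "The battery is great. The screen is not good!" and "Terrible camera but excellent battery.", the totals are `{"battery": 2, "screen": -1, "camera": 0}`. The second review shows the baseline's weakness: a single polarity is shared by every feature in a sentence, which is one motivation for context-aware scoring of opinion and feature words.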

Published By IGI Global