An Ontology-based Ranking Model in Search Engines

Yu Hou; Lixin Tao

doi:10.30564/jcsr.v1i2.972

Journal of Computer Science Research ◽ 10.30564/jcsr.v1i2.972 ◽ 2019 ◽ Vol 1 (2)

Author(s): Yu Hou ◽ Lixin Tao

Keyword(s): Search Engine ◽ Search Engines ◽ Semantic Search ◽ The Internet ◽ Web Pages ◽ Ranking Algorithm ◽ Intelligent Network ◽ Web Page ◽ Pagerank Algorithm ◽ Hits Algorithm

As the tsunami of data has emerged, search engines have become the most powerful tool for obtaining scattered information on the internet. Traditional search engines return organized results by using ranking algorithms such as term frequency and link analysis (the PageRank and HITS algorithms). However, these algorithms must be combined with keyword frequency to determine the relevance between a user’s query and the data in the computer system or on the internet. Moreover, we expect search engines to understand users’ searches by the meaning of their content rather than by literal strings. The Semantic Web is an intelligent network that could understand human language more semantically and make communication between humans and computers easier. However, current semantic search technology is hard to apply, because metadata must be annotated on each web page before the search engine can understand the user’s intent, and annotating every web page is very time-consuming and inefficient. This study therefore designed an ontology-based approach to improve traditional keyword-based search and emulate the effects of semantic search, so that the search engine can understand users more semantically once it has acquired the knowledge.
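
The abstract names PageRank as one of the link-analysis algorithms used by traditional search engines. As a rough illustration only, the following is a minimal sketch of the textbook power-iteration form of PageRank. The toy link graph `TOY_LINKS`, the damping factor of 0.85, and the fixed iteration count are assumptions made for this example and are not taken from the paper.

```python
# Hypothetical toy link graph: page -> pages it links to.
# Every link target must also appear as a key.
TOY_LINKS = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}


def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a dict of outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its damped rank evenly among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page without outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


ranks = pagerank(TOY_LINKS)
best_page = max(ranks, key=ranks.get)
```

In this toy graph, page `C` collects links from three pages and ends up with the highest score, while `D`, which nothing links to, keeps only the random-jump share. Note that the score depends on link structure alone, which is why the abstract says such algorithms still have to be combined with keyword frequency to judge relevance to a query.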

Critical Analysis of Major Search Engines

Journal of Computational and Theoretical Nanoscience ◽ 10.1166/jctn.2019.8239 ◽ 2019 ◽ Vol 16 (9) ◽ pp. 3712-3716

Author(s): Kailash Kumar ◽ Abdulaziz Al-Besher

Keyword(s): Search Engine ◽ Search Engines ◽ Critical Analysis ◽ Research Paper ◽ Web Pages ◽ Ranking Algorithm

Internet Search Engines

Encyclopedia of E-Commerce, E-Government, and Mobile Commerce ◽ 10.4018/978-1-59140-799-7.ch108 ◽ 2011 ◽ pp. 672-677

Author(s): Vijay Kasi ◽ Radhika Jain

Keyword(s): Search Engine ◽ Web Sites ◽ Search Engines ◽ World Wide ◽ Relevant Information ◽ The Internet ◽ Web Pages ◽ Web Page ◽ The World ◽ The Web

In the context of the Internet, a search engine can be defined as a software program designed to help one access information, documents, and other content on the World Wide Web. The adoption and growth of the Internet in the last decade has been unprecedented. The World Wide Web has always been applauded for its simplicity and ease of use. This is evident looking at the extent of the knowledge one requires to build a Web page.

The flexible nature of the Internet has enabled the rapid growth and adoption of it, making it hard to search for relevant information on the Web. The number of Web pages has been increasing at an astronomical pace, from around 2 million registered domains in 1995 to 233 million registered domains in 2004 (Consortium, 2004). The Internet, considered a distributed database of information, has the CRUD (create, retrieve, update, and delete) rule applied to it. While the Internet has been effective at creating, updating, and deleting content, it has considerably lacked in enabling the retrieval of relevant information. After all, there is no point in having a Web page that has little or no visibility on the Web.

Since the 1990s when the first search program was released, we have come a long way in terms of searching for information. Although we are currently witnessing a tremendous growth in search engine technology, the growth of the Internet has overtaken it, leading to a state in which the existing search engine technology is falling short. When we apply the metrics of relevance, rigor, efficiency, and effectiveness to the search domain, it becomes very clear that we have progressed on the rigor and efficiency metrics by utilizing abundant computing power to produce faster searches with a lot of information. Rigor and efficiency are evident in the large number of indexed pages by the leading search engines (Barroso, Dean, & Holzle, 2003). However, more research needs to be done to address the relevance and effectiveness metrics.

Users typically type in two to three keywords when searching, only to end up with a search result having thousands of Web pages! This has made it increasingly hard to effectively find any useful, relevant information. Search engines face a number of challenges today requiring them to perform rigorous searches with relevant results efficiently so that they are effective. These challenges include the following (“Search Engines,” 2004).
1. The Web is growing at a much faster rate than any present search engine technology can index.
2. Web pages are updated frequently, forcing search engines to revisit them periodically.
3. Dynamically generated Web sites may be slow or difficult to index, or may result in excessive results from a single Web site.
4. Many dynamically generated Web sites are not able to be indexed by search engines.
5. The commercial interests of a search engine can interfere with the order of relevant results the search engine shows.
6. Content that is behind a firewall or that is password protected is not accessible to search engines (such as those found in several digital libraries).
7. Some Web sites have started using tricks such as spamdexing and cloaking to manipulate search engines to display them as the top results for a set of keywords. This can make the search results polluted, with more relevant links being pushed down in the result list. This is a result of the popularity of Web searches and the business potential search engines can generate today.
8. Search engines index all the content of the Web without any bounds on the sensitivity of information. This has raised a few security and privacy flags.

With the above background and challenges in mind, we lay out the article as follows. In the next section, we begin with a discussion of search engine evolution. To facilitate the examination and discussion of the search engine development’s progress, we break down this discussion into the three generations of search engines. Figure 1 depicts this evolution pictorially and highlights the need for better search engine technologies. Next, we present a brief discussion on the contemporary state of search engine technology and various types of content searches available today. With this background, the next section documents various concerns about existing search engines setting the stage for better search engine technology. These concerns include information overload, relevance, representation, and categorization. Finally, we briefly address the research efforts under way to alleviate these concerns and then present our conclusion.

Machine Learning as a New Search Engine Interface: An Overview

Engineering International ◽ 10.18034/ei.v2i2.539 ◽ 2014 ◽ Vol 2 (2) ◽ pp. 103-112 ◽ Cited By ~ 1

Author(s): Taposh Kumar Neogy ◽ Harish Paruchuri

Keyword(s): Machine Learning ◽ Search Engine ◽ Search Engines ◽ New World ◽ Human Factor ◽ Experimental Results ◽ The Internet ◽ Web Pages ◽ Web Page ◽ Real People

The essence of a web page is an inherently predisposed issue, one that is built on behaviors, interests, and intelligence. There are many reasons web pages are critical to the new world, and the matter cannot be overemphasized. The meteoric growth of the internet is one of the most potent factors making it hard for search engines to provide actionable results. Search engines store web pages in classified directories. To store these pages, some of the engines rely on the expertise of real people. Most of them are enabled and classified using automated means, but the human factor is dominant in their success. From experimental results, we can deduce that the most effective and critical way to automate web pages for search engines is to integrate machine learning.

An Offline SEO (Search Engine Optimization) Based Algorithm to Calculate Web Page Rank According to Different Parameters

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽ 10.24297/ijct.v9i1.4161 ◽ 2013 ◽ Vol 9 (1) ◽ pp. 926-931 ◽ Cited By ~ 2

Author(s): Parveen Rani ◽ Er. Sukhpreet Singh

Keyword(s): Search Engine ◽ Search Engines ◽ Web Pages ◽ Web Page ◽ Page Rank ◽ Search Engine Optimization ◽ Hits Algorithm

SEO stands for Search Engine Optimization. It is a technique that searches various web pages for specified keywords and ranks these web pages according to some parameters. Such techniques are used to feed pages to search engines. The main importance of SEO is that it helps to find the relevant data and increase the rank of a web page in search engines’ results. In our paper, we develop a new algorithm, M-HITS (Modified HITS), to provide the page rank. The M-HITS algorithm is a new version of the HITS algorithm, developed by extending the properties of the HITS algorithm.
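
The abstract does not say how M-HITS modifies HITS, so the modification cannot be shown here. For orientation, the following is a minimal sketch of the standard HITS iteration that it is said to extend: a page's authority score is the sum of the hub scores of the pages linking to it, and its hub score is the sum of the authority scores of the pages it links to. The toy graph `TOY_LINKS` and the iteration count are assumptions for this example.

```python
import math

# Hypothetical toy link graph: page -> pages it links to.
TOY_LINKS = {
    "A": ["C", "D"],
    "B": ["C"],
    "C": [],
    "D": [],
}


def _normalize(scores):
    """Scale scores to unit Euclidean length (no-op if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Standard HITS: returns (hub_scores, authority_scores)."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of all pages linking in.
        auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        auth = _normalize(auth)
        # Hub update: sum the authority scores of all pages linked to.
        hub = {page: sum(auth[t] for t in links.get(page, [])) for page in pages}
        hub = _normalize(hub)
    return hub, auth


hub_scores, authority_scores = hits(TOY_LINKS)
```

Here `C` is the strongest authority because both `A` and `B` point to it, and `A` is the strongest hub because it points to both authorities.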

Similarity Web Pages Retrieval Technologies on the Internet

Encyclopedia of Information Science and Technology, First Edition ◽ 10.4018/978-1-59140-553-5.ch440 ◽ 2005 ◽ pp. 2486-2491

Author(s): Rung Ching Chen ◽ Ming Yung Tsai ◽ Chung Hsun Hsieh

Keyword(s): Information Retrieval ◽ Search Engine ◽ Search Engines ◽ Fast Growth ◽ The Other ◽ The Internet ◽ Web Pages ◽ Query Term ◽ Web Page ◽ Critical Problems

In recent years, due to the fast growth of the Internet, the services and information it provides have been constantly expanding. Madria and Bhowmick (1999) and Baeza-Yates (2003) indicated that most large search engines need to handle, on average, at least millions of hits daily in order to satisfy users’ needs for information. Each search engine has its own sorting policy and keyword format for the query term, but there are some critical problems: a search may return too much or too little information. In the former case, the user gets buried in information; requiring only a little of it, they typically select a few of the first items from the large amount returned. In the latter case, the user re-queries with another search keyword. The re-query operation also retrieves a great amount of information, much of it useless. This is a vicious cycle of information retrieval. Similarity Web page retrieval can help users avoid browsing useless information. It takes a designated Web page and compares it with the other Web pages in the search engines’ results. Similarity Web page retrieval allows users to save time by not browsing unrelated Web pages: it rejects non-similar Web pages, ranks Web pages in order of similarity, and clusters similar Web pages into the same classification.
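
The abstract does not state which similarity measure the authors use. As one common choice, the following minimal sketch compares a designated reference page with search results using cosine similarity over raw term counts, drops results with no overlap, and ranks the rest. The function names, the whitespace tokenizer, and the sample pages are all assumptions for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts (naive whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(reference_text, candidates, threshold=0.0):
    """Return (score, url) pairs above the threshold, most similar first."""
    reference = term_vector(reference_text)
    scored = [
        (cosine_similarity(reference, term_vector(text)), url)
        for url, text in candidates.items()
    ]
    return sorted((pair for pair in scored if pair[0] > threshold), reverse=True)


# Hypothetical search results: URL -> page text.
RESULTS = {
    "p1": "web page ranking algorithm",
    "p2": "cooking pasta recipes",
    "p3": "web page design",
}
ranked = rank_by_similarity("web page ranking", RESULTS)
```

With these sample pages, `p2` shares no terms with the reference page and is rejected, while `p1` ranks ahead of `p3`. Clustering similar pages into one classification, which the abstract also mentions, is not covered by this sketch.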

A Review on Semantic Text and Multimedia Retrieval and Recent Trends

International Journal of Multimedia Data Engineering and Management ◽ 10.4018/ijmdem.2015010104 ◽ 2015 ◽ Vol 6 (1) ◽ pp. 54-74

Author(s): Oğuzhan Menemencioğlu ◽ İlhami Muharrem Orak

Keyword(s): Semantic Web ◽ Search Engine ◽ Search Engines ◽ Semantic Search ◽ Multimedia Retrieval ◽ Web Pages ◽ The Face ◽ Recent Trends ◽ New Applications ◽ Machine Readable

The Semantic Web works on producing machine-readable data and aims to deal with large amounts of data. The most important tool for accessing the data on the web is the search engine. Traditional search engines are insufficient in the face of the amount of data in existing web pages. Semantic search engines are extensions of traditional engines and overcome the difficulties they face. This paper summarizes the Semantic Web, the concepts of traditional and semantic search engines, and their infrastructure. Semantic search approaches are also detailed. A summary of the literature is provided, touching on trends. In this respect, the types of applications and the areas worked on are considered. Based on data for two different years, trends on these points are analyzed and the impacts of changes are discussed. The analysis shows that evaluation of the Semantic Web continues and that new applications and areas are emerging. Multimedia retrieval is a new scope of semantic search. Hence, multimedia retrieval approaches are discussed. Text and multimedia retrieval are analyzed within semantic search.

WEB GRAPH BASED SEARCH BY USING DENSITY OF KEYWORD AND AGE FACTOR

International Journal of Computer Science and Informatics ◽ 10.47893/ijcsi.2013.1124 ◽ 2013 ◽ pp. 89-93

Author(s): GAURAV AGARWAL ◽ SACHI GUPTA ◽ SAURABH MUKHERJEE

Keyword(s): Search Engine ◽ Web Search ◽ Web Pages ◽ Main Role ◽ Ranking Algorithm ◽ Web Page ◽ Web Crawler ◽ User Requirement ◽ Priority Assignment ◽ The Web

Today, web servers are the key repositories of information, and the Internet is the means of getting this information. There is a mammoth amount of data on the Internet, and it is difficult to find the relevant data. A search engine plays a vital role in finding it. A search engine follows these steps: web crawling by a crawler, indexing by an indexer, and searching by a searcher. The web crawler retrieves information from web pages by following every link on the site; this information is stored by the search engine, and the content of each web page is then indexed by the indexer. The main role of the indexer is to make data quickly retrievable according to user requirements. When the client gives a query, the search engine searches for the results corresponding to this query to provide the best output. The aim here is to develop a search engine algorithm that returns the most desirable results for the user’s requirements. For this, the search engine uses a ranking method to rank the web pages. Various ranking approaches are discussed in the literature, but this paper proposes a ranking algorithm based on the parent-child relationship. The proposed ranking algorithm is based on the priority assignment phase of the Heterogeneous Earliest Finish Time (HEFT) algorithm, which is designed for multiprocessor task scheduling. The proposed algorithm works on three variables: the density of keywords, the number of successors of a node, and the age of the web page. Density is the occurrence of the keyword on a particular web page. The number of successors represents the outgoing links of a single web page. Age is the freshness value of the web page: the most recently modified page is the freshest and has the smallest age, or largest freshness value. The proposed technique requires that the priority of each page be set using downward rank values, and pages are arranged in ascending or descending order of their rank values. Experiments show that our algorithm is valuable. After comparison with Google, we find that our algorithm performs better: for 70% of problems, our algorithm works better than Google.
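
The abstract gives no formulas, so the following is only an illustrative sketch of how the pieces it names could fit together. It computes the three stated variables per page and then a HEFT-style downward rank, where a page's rank is the maximum over its parents of the parent's rank plus the parent's weight (entry pages have rank 0). How the three variables combine into a node weight (an unweighted sum here), the `1 / (1 + age)` freshness formula, the omission of HEFT's communication cost, the requirement that the link graph be acyclic, and the toy data `PAGES` are all assumptions of this example and not the authors' method.

```python
from functools import lru_cache

# Hypothetical acyclic toy web: page -> (text, outgoing links, age in days).
PAGES = {
    "home": ("search engine ranking search", ["a", "b"], 10),
    "a": ("ranking of web pages", ["c"], 2),
    "b": ("search search search tips", ["c"], 30),
    "c": ("unrelated cooking recipes", [], 1),
}


def keyword_density(text, keyword):
    """Fraction of the words on a page that equal the keyword."""
    words = text.lower().split()
    return words.count(keyword.lower()) / len(words) if words else 0.0


def freshness(age_days):
    """Assumed freshness value: larger for recently modified pages."""
    return 1.0 / (1.0 + age_days)


def node_weight(page, keyword):
    """Assumed weight: density + number of successors + freshness."""
    text, links, age = PAGES[page]
    return keyword_density(text, keyword) + len(links) + freshness(age)


def parents(page):
    """Pages that link to the given page."""
    return [p for p, (_, links, _) in PAGES.items() if page in links]


@lru_cache(maxsize=None)
def downward_rank(page, keyword):
    """HEFT-style downward rank with communication cost taken as zero."""
    preds = parents(page)
    if not preds:
        return 0.0
    return max(downward_rank(p, keyword) + node_weight(p, keyword) for p in preds)


def rank_pages(keyword, descending=True):
    """Order pages by downward rank value."""
    return sorted(PAGES, key=lambda p: downward_rank(p, keyword), reverse=descending)


ordering = rank_pages("search")
```

Because downward rank accumulates along parent-to-child paths, pages deeper in the graph inherit the weight of the pages that lead to them, which is one way to read the "parent-child relationship" the abstract mentions.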

Discovering How Students Search a Library Web Site: A Usability Case Study

College & Research Libraries ◽ 10.5860/crl.63.4.354 ◽ 2002 ◽ Vol 63 (4) ◽ pp. 354-365 ◽ Cited By ~ 43

Author(s): Susan Augustine ◽ Courtney Greene

Keyword(s): Search Engine ◽ Long Range ◽ Web Sites ◽ Search Engines ◽ Web Pages ◽ Usability Study ◽ Web Page ◽ The Past ◽ Library Resources

Have Internet search engines influenced the way students search library Web pages? The results of this usability study reveal that students consistently and frequently use the library Web site’s internal search engine to find information rather than navigating through pages. If students are searching rather than navigating, library Web page designers must make metadata and powerful search engines priorities. The study also shows that students have difficulty interpreting library terminology, experience confusion discerning difference amongst library resources, and prefer to seek human assistance when encountering problems online. These findings imply that library Web sites have not alleviated some of the basic and long-range problems that have challenged librarians in the past.

Web Crawler and Web Crawler Algorithms: A Perspective

International Journal of Engineering and Advanced Technology - Regular Issue ◽ 10.35940/ijeat.e9362.069520 ◽ 2020 ◽ Vol 9 (5) ◽ pp. 203-205

Keyword(s): Search Engine ◽ Search Engines ◽ The Internet ◽ Web Pages ◽ Web Crawler ◽ Day By Day ◽ The Web

A web crawler is also called a spider. It automatically traverses the WWW for the purpose of web indexing. As the Web grows day by day, the number of web pages worldwide has increased massively. To make searching friendly for users, search engines are essential; they are used to find particular data on the WWW. Without search engines, it would be almost impossible for a person to find anything on the web unless he or she knew a particular URL. Every search engine maintains a central repository of HTML documents in indexed form. Every time a user submits a query, the search is performed against this database of indexed web pages. The size of every search engine’s database depends on the pages existing on the internet. So, to increase the efficiency of search engines, only the most relevant and significant pages should be stored in the database.
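
The crawl, index, and query steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the "web" is a hypothetical in-memory dictionary `WEB` rather than real HTTP fetching, the crawl is breadth-first, and the index is a plain inverted index from words to URLs. None of these choices come from the paper.

```python
from collections import defaultdict, deque

# Hypothetical in-memory web: URL -> (page text, outgoing links).
WEB = {
    "http://example.com/": (
        "search engines index web pages",
        ["http://example.com/a", "http://example.com/b"],
    ),
    "http://example.com/a": (
        "a web crawler follows links",
        ["http://example.com/b", "http://example.com/"],
    ),
    "http://example.com/b": (
        "an index maps words to pages",
        ["http://example.com/missing"],
    ),
}


def crawl(seed, fetch):
    """Breadth-first crawl; fetch(url) returns (text, links) or None."""
    seen = {seed}
    queue = deque([seed])
    pages = []
    while queue:
        url = queue.popleft()
        page = fetch(url)
        if page is None:  # Unreachable page: skip it.
            continue
        text, links = page
        pages.append((url, text))
        for link in links:
            if link not in seen:  # Visit each URL at most once.
                seen.add(link)
                queue.append(link)
    return pages


def build_index(pages):
    """Inverted index: word -> set of URLs containing that word."""
    index = defaultdict(set)
    for url, text in pages:
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return URLs that contain every word of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


crawled = crawl("http://example.com/", WEB.get)
index = build_index(crawled)
hits_for_query = search(index, "web crawler")
```

Queries are answered from the index alone, without touching the pages again, which is the point of keeping the repository "in indexed form". Deciding which pages are relevant enough to store, as the abstract recommends, would need an additional filtering step that this sketch does not attempt.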

The Core Aspects of Search Engine Optimisation Necessary to Move up the Ranking

International Journal of Ambient Computing and Intelligence ◽ 10.4018/jaci.2011100105 ◽ 2011 ◽ Vol 3 (4) ◽ pp. 62-70 ◽ Cited By ~ 6

Author(s): Stephen O’Neill ◽ Kevin Curran

Keyword(s): Local Search ◽ Search Engine ◽ Search Engines ◽ Internet Marketing ◽ Web Pages ◽ Image Search ◽ Web Page ◽ Search Results ◽ The Core

Search engine optimization (SEO) is the process of improving the visibility, volume and quality of traffic to a website or a web page in search engines via the natural search results. SEO can also target other areas of a search, including image search and local search. SEO is one of many different strategies used for marketing a website, but SEO has been proven the most effective. An Internet marketing campaign may drive organic search results to websites or web pages but can also involve paid advertising on search engines. All search engines have a unique way of ranking the importance of a website. Some search engines focus on the content, while others review meta tags to identify who and what a website’s business is. Most engines use a combination of meta tags, content, link popularity, click popularity and longevity to determine a site’s ranking. To make it even more complicated, they change their ranking policies frequently. This paper provides an overview of search engine optimisation strategies and pitfalls.
