Web Page Ranking Algorithm Based on the Meta-Information

2014 ◽  
Vol 596 ◽  
pp. 292-296
Author(s):  
Xin Li Li

PageRank algorithms consider only hyperlink information and ignore other page information such as page hit frequency, page update time, and web page category. As a result, they rank many advertising pages and outdated pages highly and fail to meet users' needs. This paper studies page meta-information such as category, hit frequency, and update time. A web page with a high hit frequency and a smaller age should receive a high rank, while both of these factors depend to some extent on page category. Experimental results show that the algorithm performs well.
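The abstract does not publish the paper's exact weighting, but the idea of scaling a base PageRank value by hit frequency and freshness, with category-dependent weights, can be sketched as follows. The weight values and category names here are illustrative assumptions, not the paper's.

```python
def meta_rank(base_rank, hits, age_days, category,
              weights={"news": (0.6, 0.4), "reference": (0.2, 0.8)}):
    """Adjust a base PageRank value with page meta-information.

    hits      -- normalised hit frequency in [0, 1]
    age_days  -- page age; a smaller age should yield a higher rank
    weights   -- (hit_weight, freshness_weight) per category
                 (hypothetical values for illustration)
    """
    w_hits, w_fresh = weights.get(category, (0.5, 0.5))
    freshness = 1.0 / (1.0 + age_days)  # smaller age => larger freshness
    boost = w_hits * hits + w_fresh * freshness
    return base_rank * (1.0 + boost)

# A frequently visited, recently updated page outranks a stale,
# rarely visited page with the same base PageRank:
fresh = meta_rank(0.1, hits=0.9, age_days=1, category="news")
stale = meta_rank(0.1, hits=0.1, age_days=400, category="news")
```

Because the hit and freshness weights come from the category, the same raw statistics can matter more for a news page than for a reference page, matching the abstract's point that the two factors depend on category.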

2015 ◽  
Vol 5 (1) ◽  
pp. 41-55 ◽  
Author(s):  
Sutirtha Kumar Guha ◽  
Anirban Kundu ◽  
Rana Duttagupta

In this paper, the authors propose a new rank measurement technique that introduces a weightage factor based on the number of Web links available on a particular Web page. The available Web links are treated as an importance indicator. A distinct weightage factor is assigned to each Web page, calculated from its Web links. Because the weightage factor is independent of and unique to each page, different Web pages are evaluated more accurately and better Web page ranking is achieved. The impact of unwanted intruders is also minimized by the introduction of this weightage factor.
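One plausible reading of this scheme is a PageRank-style iteration in which each page's contribution is scaled by a weightage factor derived from its share of all links in the graph. The exact formula is not given in the abstract, so this is a minimal sketch under that assumption.

```python
def weightage_rank(graph, damping=0.85, iterations=50):
    """graph: {page: [pages it links to]} (no dangling pages).

    Each page q contributes rank to the pages it links to, scaled by a
    weightage factor: q's share of all links in the graph (assumed form).
    """
    pages = list(graph)
    total_links = sum(len(links) for links in graph.values())
    weight = {p: len(links) / total_links for p, links in graph.items()}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {}
        for p in pages:
            share = sum(rank[q] * weight[q] / len(graph[q])
                        for q in pages if p in graph[q])
            new[p] = (1 - damping) / len(pages) + damping * share
        rank = new
    return rank

# Toy graph: A links to B and C, B links to C, C links back to A.
ranks = weightage_rank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

In this toy graph, C receives contributions from both A and B while B receives one only from A, so C ends up ranked above B.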


Author(s):  
Jie Zhao ◽  
Jianfei Wang ◽  
Jia Yang ◽  
Peiquan Jin

A company acquisition relation reflects a company's development intent and competitive strategies and is an important type of enterprise competitive intelligence. In the traditional environment, the acquisition of competitive intelligence relies mainly on newspapers, internal reports, and so on, but the rapid development of the Web introduces a new way to extract company acquisition relations. In this paper, the authors study the problem of extracting company acquisition relations from huge numbers of Web pages and propose a novel algorithm for company acquisition relation extraction. The algorithm considers the tense of Web content and a classification technique based on semantic strength when extracting company acquisition relations from Web pages. It first determines the tense of each sentence in a Web page, which is then applied in sentence classification to evaluate the semantic strength of candidate sentences in describing a company acquisition relation. After that, the authors rank the candidate acquisition relations and return the top-k company acquisition relations. They run experiments on 6144 pages crawled through Google and measure the performance of their algorithm under different metrics. The experimental results show that the algorithm is effective in determining both the tense of sentences and the company acquisition relation.
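The pipeline shape — classify each sentence's tense, map tense to a semantic strength, then return the top-k candidates — can be sketched in a highly simplified form. The paper's classifiers are far richer than this; the keyword cue lists below are stand-in assumptions.

```python
# Hypothetical cue words standing in for the paper's tense classifier.
PAST_CUES = {"acquired", "bought", "purchased"}
FUTURE_CUES = {"will", "plans", "intends"}

def sentence_strength(sentence):
    """Map a sentence's (crudely detected) tense to a semantic strength."""
    words = set(sentence.lower().split())
    if words & PAST_CUES:
        return 1.0   # completed acquisition: strongest evidence
    if words & FUTURE_CUES:
        return 0.5   # announced intent: weaker evidence
    return 0.0       # no acquisition signal

def top_k_relations(sentences, k=2):
    """Rank candidate sentences by semantic strength and keep the top k."""
    return sorted(sentences, key=sentence_strength, reverse=True)[:k]

candidates = [
    "The weather in Mountain View is fine",
    "BigCo plans to buy SmallCo next year",
    "BigCo acquired SmallCo in 2006",
]
best = top_k_relations(candidates, k=2)
```

Here the past-tense sentence outranks the announced intent, which in turn outranks the non-acquisition sentence, mirroring the role tense plays in the authors' semantic-strength classification.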


This paper addresses supplementary factors to the general PageRank calculation that Google uses to rank sites in its search results. These additional factors incorporate a few ideas that explicitly increase the precision of the estimated PageRank value. The similarity between a web page's content and the text extracted from the top pages returned when searching with a few keywords of the page under consideration is judged using a similarity measure; this yields a value or percentage that represents the significance, or similarity factor. Further, if sentiment analysis is applied in the same way, the search results for the keywords can be analysed against the keywords of the page under consideration, yielding a sentiment-analysis factor. In this way, the page ranking procedure can be improved and executed with better accuracy. The Hadoop Distributed File System is used to compute the page rank of the input nodes, and Python is chosen for the parallel PageRank algorithm executed on Hadoop.
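The abstract does not specify which similarity measure is used, so the sketch below assumes a bag-of-words cosine similarity between the page's text and the texts of the top search results, averaged into a single similarity factor that supplements the base PageRank value.

```python
from collections import Counter
import math

def cosine_similarity(a, b):
    """Bag-of-words cosine similarity between two texts (assumed measure)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def adjusted_rank(base_rank, page_text, top_result_texts):
    """Scale the base PageRank by the average similarity to the top
    search results for the page's keywords (supplementary factor)."""
    sim = sum(cosine_similarity(page_text, t)
              for t in top_result_texts) / len(top_result_texts)
    return base_rank * (1.0 + sim)
```

A sentiment-analysis factor could be folded in the same way, as one more multiplicative term; on Hadoop, the per-page similarity computations are embarrassingly parallel map tasks.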


2019 ◽  
Vol 8 (2) ◽  
pp. 32-39
Author(s):  
T. Mylsami ◽  
B. L. Shivakumar

In general, the World Wide Web has become the most useful resource for information retrieval and knowledge discovery, but information on the Web continues to expand in size and density, so retrieving the required information efficiently and effectively is a challenge. The tremendous growth of the Web has created challenges for search engine technology. Web mining is an area that applies data mining techniques to address these requirements. Popular Web mining algorithms such as PageRank (PR), Weighted PageRank (WPR), and Hyperlink-Induced Topic Search (HITS) are commonly used to sort and rank search results. These page ranking algorithms use web structure mining and web content mining to estimate the relevancy of a web site, but they do not address the scalability problem or the visit counts of the inlinks and outlinks of pages. A fast and efficient page ranking algorithm for webpage retrieval therefore remains a challenge. This paper proposes an improved WPR algorithm, called PWPR, which uses a Principal Component Analysis technique based on the mean value of page ranks. The proposed PWPR algorithm takes into account the number of visits of both the inlinks and outlinks of the pages and distributes rank scores based on the popularity of the pages; the weight values of the pages are computed from the inlinks and outlinks with their mean values. However, because new data and updates constantly arrive, the results of data mining applications become stale and obsolete over time. To solve this problem, a MapReduce (MR) framework is a promising approach for refreshing mining results when mining big data. The proposed MR algorithm reduces the time complexity of the PWPR algorithm by reducing the number of iterations needed to reach convergence.
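PWPR builds on the standard Weighted PageRank (WPR) formulation, in which each link is weighted by the popularity of its target relative to the source's other targets, measured by in-degree and out-degree. The sketch below shows that WPR base case only; the paper's PCA and mean-value refinements and the MapReduce port are omitted.

```python
def weighted_pagerank(graph, damping=0.85, iterations=50):
    """graph: {page: [pages it links to]}.

    Standard WPR weights: for a link q -> p,
      W_in  = in_degree(p)  / sum of in-degrees  of q's targets
      W_out = out_degree(p) / sum of out-degrees of q's targets
    """
    in_links = {p: [q for q in graph if p in graph[q]] for p in graph}
    rank = {p: 1.0 for p in graph}
    for _ in range(iterations):
        new = {}
        for p in graph:
            total = 0.0
            for q in in_links[p]:
                targets = graph[q]
                w_in = len(in_links[p]) / max(
                    sum(len(in_links[t]) for t in targets), 1)
                w_out = len(graph[p]) / max(
                    sum(len(graph[t]) for t in targets), 1)
                total += rank[q] * w_in * w_out
            new[p] = (1 - damping) + damping * total
        rank = new
    return rank

# Toy graph: C has the most inlinks and so accumulates the most rank.
ranks = weighted_pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

Each iteration is a join of the rank vector against the link structure, which is exactly the shape that maps onto MapReduce rounds; reducing the number of rounds to convergence is where the paper's MR variant claims its speedup.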


Author(s):  
GAURAV AGARWAL ◽  
SACHI GUPTA ◽  
SAURABH MUKHERJEE

Today, web servers are the key repositories of information, and the Internet is the source for retrieving it. There is a mammoth amount of data on the Internet, and finding the relevant data is a difficult job; a search engine plays a vital role in this search. A search engine follows three steps: web crawling by the crawler, indexing by the indexer, and searching by the searcher. The web crawler retrieves information about web pages by following every link on a site; the search engine stores this information, and the indexer indexes the content of each page. The main role of the indexer is to make data retrievable quickly according to user requirements. When a client issues a query, the search engine searches for results corresponding to that query in order to provide good output. The ambition here is to develop an algorithm for a search engine that returns the most desirable results according to user requirements. A ranking method is used by the search engine to rank the web pages. Various ranking approaches are discussed in the literature, but this paper proposes a ranking algorithm based on the parent-child relationship. The proposed algorithm is based on the priority-assignment phase of the Heterogeneous Earliest Finish Time (HEFT) algorithm, which was designed for multiprocessor task scheduling. It works on three range variables: the density of keywords, the number of successors of a node, and the age of the web page. Density captures the occurrence of the keyword on a particular web page; the number of successors represents the outgoing links of a single web page; and age is the freshness value of the page, so a recently modified page is the freshest and has the smallest age (largest freshness value). The proposed technique requires that the priority of each page be set with the downward rank values and that pages be arranged in ascending or descending order of their rank values. Experiments show that the algorithm is valuable: in a comparison with Google, it performed better on 70% of the problems.
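A minimal sketch of the three-factor priority, combined in the spirit of HEFT's downward-rank computation: keyword density, successor count, and a freshness term derived from age. The combining weights below are illustrative assumptions; the paper's actual priority formula is not given in the abstract.

```python
def page_priority(keyword_density, num_successors, age_days):
    """Combine the three range variables into one priority score.

    keyword_density -- fraction of the page's words that match the keyword
    num_successors  -- number of outgoing links from the page
    age_days        -- days since last modification (smaller is fresher)
    """
    freshness = 1.0 / (1.0 + age_days)  # smallest age => largest freshness
    return keyword_density + 0.1 * num_successors + freshness

pages = {
    "a.html": page_priority(keyword_density=0.05, num_successors=12, age_days=2),
    "b.html": page_priority(keyword_density=0.02, num_successors=3, age_days=90),
}
# Arrange pages in descending order of their priority (rank) values.
ranking = sorted(pages, key=pages.get, reverse=True)
```

The dense, well-connected, recently modified page lands first in the ranking, matching the abstract's intuition about all three variables.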


2012 ◽  
Vol 601 ◽  
pp. 394-400
Author(s):  
Taeh Wan Kim ◽  
Ho Cheol Jeon ◽  
Joong Min Choi

Document similarity search retrieves a ranked list of similar documents, finding documents similar to a query document in a text corpus or a web page on the Web. Most previous research on searching for similar documents, however, has focused on classifying documents based on their contents. To address this problem, we propose a novel retrieval approach based on undirected graphs to represent each document in the corpus. In addition, this study considers a unified graph in conjunction with multiple graphs to improve the quality of similar-document search. Experimental results on the Reuters-21578 data demonstrate that the proposed system performs better than the traditional approach.
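The abstract does not fix one graph construction, so this sketch assumes a simple one: each document becomes an undirected word co-occurrence graph (edges between adjacent words), and similarity is the Jaccard overlap of the two edge sets rather than a raw term comparison.

```python
def doc_graph(text):
    """Undirected co-occurrence graph of a document, as a set of edges
    (frozensets) between adjacent words. Self-loops are dropped."""
    words = text.lower().split()
    return {frozenset(pair) for pair in zip(words, words[1:])
            if len(set(pair)) == 2}

def graph_similarity(a, b):
    """Jaccard overlap of the two documents' edge sets."""
    ga, gb = doc_graph(a), doc_graph(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0
```

A unified graph for the whole corpus would simply merge the per-document edge sets, which is one way to read the paper's combination of multiple graphs with a unified one.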


2018 ◽  
Vol 182 (29) ◽  
pp. 1-5
Author(s):  
Megha Bhawsar ◽  
Shraddha Kumar

2018 ◽  
Vol 8 (4) ◽  
pp. 1-13
Author(s):  
Rajnikant Bhagwan Wagh ◽  
Jayantrao Bhaurao Patil

Recommendation systems are growing very rapidly. While surfing, users frequently miss the goal of their search and get lost in the information-overload problem. To overcome this, the authors propose a novel web page recommendation system that saves users' surfing time. Users are analyzed as they surf a particular web site. The authors use a relationship matrix and a frequency matrix to effectively find the connectivity among the web pages of similar users. These web pages are divided into various clusters using an enhanced graph-based partitioning concept, and active users are then classified more accurately into the resulting clusters. Threshold values are used in both the clustering and classification stages for more appropriate results. Experimental results show around 61% accuracy, 37% coverage, and a 46% F1 measure, which helps improve the surfing experience of users.
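The two matrices can be built directly from user sessions. The definitions below are assumptions consistent with the abstract: the frequency matrix counts how often each page is visited, and the relationship matrix counts how often two pages are visited consecutively.

```python
from collections import defaultdict

def build_matrices(sessions):
    """sessions: list of page sequences, one per user visit.

    Returns (frequency, relationship), stored sparsely as dicts:
      frequency[page]      -- visit count of the page
      relationship[(a, b)] -- how often page b follows page a
    """
    freq = defaultdict(int)
    rel = defaultdict(int)
    for session in sessions:
        for page in session:
            freq[page] += 1
        for a, b in zip(session, session[1:]):
            rel[(a, b)] += 1
    return dict(freq), dict(rel)

sessions = [["home", "news", "sports"], ["home", "news"]]
freq, rel = build_matrices(sessions)
```

Edges with high relationship counts indicate strongly connected pages; thresholding those counts before graph partitioning is one natural reading of the clustering stage described above.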


2018 ◽  
Vol 7 (4.10) ◽  
pp. 566
Author(s):  
B Jaganathan ◽  
Kalyani Desikan

In today's era of computer technology, users want not only the most relevant data but also want it as quickly as possible, so ranking web pages becomes a crucial task. The purpose of this research is to find a centrality measure that can be used in place of the original PageRank. In this article, the concept of a Laplacian centrality measure for directed web graphs is introduced to identify web page ranks. A comparison between the original PageRank and the Laplacian-centrality-based PageRank is made, with Kendall's correlation coefficient used to measure the correlation between the two rankings.
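The comparison step can be sketched directly: Kendall's tau counts concordant versus discordant page pairs between two rankings of the same pages. This hand-rolled version assumes no tied scores; `scipy.stats.kendalltau` provides the same statistic with tie handling.

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two score dicts over the same pages.

    A pair of pages is concordant if both rankings order it the same way,
    discordant if they disagree; tau = (C - D) / (n choose 2).
    """
    pages = list(rank_a)
    concordant = discordant = 0
    for p, q in combinations(pages, 2):
        s = (rank_a[p] - rank_a[q]) * (rank_b[p] - rank_b[q])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(pages) * (len(pages) - 1) / 2
    return (concordant - discordant) / n_pairs

pagerank = {"a": 0.5, "b": 0.3, "c": 0.2}
laplacian = {"a": 4.1, "b": 2.7, "c": 1.9}  # hypothetical centrality scores
tau = kendall_tau(pagerank, laplacian)      # 1.0: identical orderings
```

A tau near 1 would indicate that the Laplacian centrality ordering agrees closely with the original PageRank, which is the criterion for treating it as a substitute.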

