AntWeb—Web Search Based on Ant Behavior

Author(s):  
Li Weigang ◽  
Wu Man Qi

This chapter presents a study applying Ant Colony Optimization (ACO) to the Interlegis Web portal, a Brazilian legislation Website. The AntWeb approach is inspired by the foraging behavior of ant colonies: it adaptively marks the most significant links, those that lie on the shortest routes to the target pages. The system treats the users of the Web portal as artificial ants and the links among the Web pages as the search network. To identify groups of visitors, Web mining is applied to extract knowledge from preprocessed Web log files. The chapter describes the theory, model, main utilities, and implementation of the AntWeb prototype in the Interlegis Web portal. The case study covers offline Web mining, simulations with and without AntWeb, and tests with modified parameters. The results demonstrate the sensitivity and accessibility of AntWeb and its benefits for Interlegis Web users.
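
As a rough illustration of the mechanism described above, the sketch below shows pheromone-style link weighting in Python. The constants, function names, and update rules are assumptions for illustration, not the AntWeb implementation.

```python
import random

# Pheromone level per link; keys are (from_page, to_page) pairs.
pheromone = {}

EVAPORATION = 0.1   # fraction of pheromone lost per decay cycle (assumed)
DEPOSIT = 1.0       # total pheromone laid along a successful path (assumed)

def reinforce(path, reached_target):
    """Deposit pheromone along a path; shorter successful routes get more."""
    if not reached_target:
        return
    amount = DEPOSIT / len(path)        # favor the shortest route
    for edge in zip(path, path[1:]):
        pheromone[edge] = pheromone.get(edge, 0.0) + amount

def evaporate():
    """Decay all pheromone periodically so stale links fade out."""
    for edge in pheromone:
        pheromone[edge] *= 1.0 - EVAPORATION

def pick_next(page, out_links):
    """Choose the next link with probability proportional to pheromone."""
    weights = [pheromone.get((page, l), 0.01) for l in out_links]  # small floor
    return random.choices(out_links, weights=weights, k=1)[0]

reinforce(["/home", "/laws", "/law-123"], reached_target=True)
evaporate()
print(pick_next("/home", ["/laws", "/contact"]))
```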

2020 ◽  
Vol 8 (6) ◽  
pp. 2619-2624

Nowadays, the Web is the primary source of information in every field, and it keeps expanding exponentially. Locating relevant information is time-consuming and far from easy. Most users turn to search engines to look up information, but search engines are often unable to return useful results because most Web documents are unstructured. Data mining is the extraction of information from large databases; in the biomedical domain it can support the diagnosis, treatment, and prevention of disease. There are huge numbers of Web documents about any specific biomedical term, so obtaining a relevant record is very difficult. The objective of this project is to apply text mining techniques to retrieve useful biomedical Web documents. A more efficient tool is proposed that uses an advanced SVM algorithm together with a clustering algorithm that groups similar documents in one place. The paper proposes web mining algorithms designed to extract textual information from web pages and to apply it in web applications, making the system helpful across biomedical sectors. Search engines can then be used to classify web pages into the biomedical structure, and this methodology helps the client find all the relevant biomedical information in one place. In a comparison of the proposed methodology, which combines the SVM algorithm with an improved k-means algorithm, against the original SVM baseline, the approach gives results of 99.72% on average.
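
A minimal sketch of the SVM-plus-clustering pipeline the abstract outlines, using scikit-learn; the toy corpus, labels, and cluster count are placeholders, and the paper's advanced SVM and improved k-means are approximated here by the stock LinearSVC and KMeans.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy corpus; 1 = biomedical, 0 = not (placeholder labels).
docs = ["insulin regulates blood glucose", "aspirin reduces fever",
        "stock markets fell sharply"]
labels = [1, 1, 0]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)

# Step 1: an SVM filters biomedical documents from the rest.
clf = LinearSVC().fit(X, labels)
biomedical = [d for d, y in zip(docs, clf.predict(X)) if y == 1]

# Step 2: clustering groups similar biomedical documents in one place.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(vec.transform(biomedical))
print(list(zip(biomedical, clusters)))
```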


Nowadays, the internet has become the easiest way to obtain information, and millions of users search it to find what they need. The continuous growth of web pages, and of users' interest in searching across various topics, increases the complexity of recommendation. Users' behavior is extracted from web server logs by means of web mining techniques. The main aim of this research is to identify users' navigation patterns from log files. The web mining process has three major steps: data pre-processing, pattern classification, and user discovery. Recently, researchers have classified web page articles before recommending a requested page to users; however, each category is very large, and classification tasks often require manual labor. Some existing clustering methods suffer from high time complexity, or their iterative computation depends on the initial parameters and leads to insufficient results. To address these issues, a web page recommendation method is developed that initializes the margin parameters of the classification technique, considering both effectiveness and efficiency. This work initializes the margin parameters of a Random Forest (RF) using the Firefly Algorithm (FFA) to reduce processing time and speed up the process. These margin parameters handle a large volume of user-interest data and provide better recommendations than existing techniques. Experimental results show that the RF-FFA method achieved 41.89% accuracy and recall values when compared with other heuristic algorithms.
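
A hedged sketch of the RF-FFA idea: a firefly search over two Random Forest hyperparameters, maximizing cross-validated accuracy. The dataset, parameter ranges, and firefly constants are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=100, n_features=10, random_state=0)

def fitness(pos):
    """Cross-validated accuracy of an RF with the encoded parameters."""
    n_est = int(np.clip(pos[0], 10, 100))   # number of trees
    depth = int(np.clip(pos[1], 2, 20))     # maximum tree depth
    clf = RandomForestClassifier(n_estimators=n_est, max_depth=depth,
                                 random_state=0)
    return cross_val_score(clf, X, y, cv=3).mean()

rng = np.random.default_rng(0)
flies = rng.uniform([10, 2], [100, 20], size=(4, 2))   # 4 fireflies
light = np.array([fitness(p) for p in flies])

for _ in range(5):                          # a few firefly iterations
    for i in range(len(flies)):
        for j in range(len(flies)):
            if light[j] > light[i]:         # move toward a brighter fly
                dist2 = np.sum((flies[i] - flies[j]) ** 2)
                beta = 0.9 * np.exp(-0.001 * dist2)    # attractiveness
                flies[i] += beta * (flies[j] - flies[i])
                flies[i] += 0.5 * rng.normal(size=2)   # random walk term
                light[i] = fitness(flies[i])

best = flies[np.argmax(light)]
print("best (n_estimators, max_depth):",
      int(np.clip(best[0], 10, 100)), int(np.clip(best[1], 2, 20)))
```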


Author(s):  
Ji-Rong Wen

The Web is an open and free environment in which people publish and obtain information. Everyone on the Web can be an author, a reader, or both. The language of the Web, HTML (Hypertext Markup Language), is mainly designed for information display, not for semantic representation. Therefore, current Web search engines usually treat Web pages as unstructured documents, and traditional information retrieval (IR) technologies are employed for Web page parsing, indexing, and searching. The unstructured nature of Web pages seriously hinders more accurate search and advanced applications on the Web. For example, many sites contain structured information about various products; extracting and integrating product information from multiple Web sites could enable powerful search functions, such as comparison shopping and business intelligence. However, these structured data are embedded in Web pages, and traditional methods cannot properly extract and integrate them. Another example is the link structure of the Web: used properly, the information hidden in links can be exploited to effectively improve search performance and take Web search beyond traditional information retrieval (Page, Brin, Motwani, & Winograd, 1998; Kleinberg, 1998).
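
Since the abstract points to link-analysis work such as PageRank (Page et al., 1998), a compact sketch of that algorithm is given below; the damping factor and toy graph are standard illustrative choices, not values from the chapter.

```python
def pagerank(links, d=0.85, iters=50):
    """links: dict mapping each page to its list of outgoing pages."""
    pages = set(links) | {q for outs in links.values() for q in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - d) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:                    # share rank over outgoing links
                for q in outs:
                    new[q] += d * rank[p] / len(outs)
            else:                       # dangling page: spread rank evenly
                for q in pages:
                    new[q] += d * rank[p] / n
        rank = new
    return rank

print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```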


Author(s):  
Ji-Rong Wen

A Web query log is a file that records the activities of the users of a search engine. Compared to the traditional information retrieval setting, in which documents are the only available information source, query logs are an additional information source in the Web search setting. Based on query logs, a set of Web mining techniques, such as log-based query clustering, log-based query expansion, collaborative filtering, and personalized search, can be employed to improve the performance of Web search.
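
A small sketch of one such technique, log-based query clustering: queries whose clicks land on the same URLs are grouped together. The log entries and similarity threshold are toy assumptions.

```python
from collections import defaultdict

# Toy (query, clicked URL) pairs standing in for a real query log.
log = [("cheap flights", "kayak.com"), ("airline tickets", "kayak.com"),
       ("python tutorial", "docs.python.org"),
       ("learn python", "docs.python.org")]

clicks = defaultdict(set)               # query -> set of clicked URLs
for query, url in log:
    clicks[query].add(url)

def similarity(q1, q2):
    """Jaccard overlap of the URLs clicked for two queries."""
    a, b = clicks[q1], clicks[q2]
    return len(a & b) / len(a | b)

queries = list(clicks)
pairs = [(q1, q2) for i, q1 in enumerate(queries)
         for q2 in queries[i + 1:] if similarity(q1, q2) > 0.5]
print(pairs)   # queries with shared clicks cluster together
```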


2007 ◽  
Vol 16 (05) ◽  
pp. 793-828 ◽  
Author(s):  
JUAN D. VELÁSQUEZ ◽  
VASILE PALADE

Understanding web users' browsing behaviour in order to adapt a web site to the needs of a particular user is a key issue for many commercial companies that do business over the Internet. This paper presents the implementation of a Knowledge Base (KB) for building web-based computerized recommender systems. The Knowledge Base consists of a Pattern Repository, containing patterns extracted from web logs and web pages by applying various web mining tools, and a Rule Repository, containing rules that describe how the discovered patterns are used to build navigation or web site modification recommendations. The paper also focuses on testing the effectiveness of the proposed online and offline recommendations. An extensive real-world experiment is carried out on the web site of a bank.
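
The paper's KB is not reproduced here; the sketch below only illustrates the two-repository idea, a pattern store plus rules that turn matched patterns into page recommendations. All names, pages, and thresholds are hypothetical.

```python
# Pattern Repository: navigation patterns mined from logs (toy content).
pattern_repository = {
    "p1": {"pages": ["/loans", "/loan-calculator"], "support": 0.12},
    "p2": {"pages": ["/cards", "/card-fees"], "support": 0.03},
}

def cross_visit_rule(session, pattern, min_support=0.05):
    """If the session touches a sufficiently frequent pattern,
    recommend the pattern's remaining pages."""
    if pattern["support"] >= min_support and set(session) & set(pattern["pages"]):
        return [p for p in pattern["pages"] if p not in session]
    return []

# Rule Repository: rules that turn matched patterns into recommendations.
rule_repository = [cross_visit_rule]

def recommend(session):
    recs = []
    for pattern in pattern_repository.values():
        for rule in rule_repository:
            recs.extend(rule(session, pattern))
    return recs

print(recommend(["/home", "/loans"]))   # -> ['/loan-calculator']
```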


2020 ◽  
Vol 9 (1) ◽  
pp. 1045-1050

Nowadays, the WWW has grown into a significant and vast data repository, and every user activity is recorded in a log file. The log file reflects the interest shown in a website, and with abundant use of the web, log file sizes are growing rapidly. Web mining is the application of data mining technologies to huge data repositories: the process of uncovering information from web data. Before web mining techniques can be applied, the data in the web log must be pre-processed, consolidated, and transformed. Web miners must use intelligent tools in order to find, extract, filter, and evaluate the desired information. The data preprocessing stage is the most significant stage in the web mining process and is both critical and complex for the successful extraction of useful data. Web logs are distributed in nature, and in raw form they are neither scalable nor practical to use; consequently, an extensive learning algorithm is required to obtain the desired information.
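
A minimal sketch of the preprocessing step described above: parsing Common Log Format entries, discarding malformed or static-resource requests, and grouping page views into per-IP sessions. The regular expression and filters are illustrative assumptions.

```python
import re
from collections import defaultdict

# Matches the core fields of a Common Log Format entry.
LOG_RE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d+)')

def preprocess(lines, skip=(".css", ".js", ".png", ".jpg", ".gif")):
    sessions = defaultdict(list)        # ip -> ordered list of pages
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue                    # discard malformed entries
        ip, timestamp, method, path, status = m.groups()
        if status != "200" or path.endswith(skip):
            continue                    # keep only successful page views
        sessions[ip].append(path)
    return sessions

sample = ['127.0.0.1 - - [10/Oct/2020:13:55:36 +0000] '
          '"GET /index.html HTTP/1.0" 200 2326']
print(preprocess(sample))
```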


Symmetry ◽  
2021 ◽  
Vol 13 (10) ◽  
pp. 1772
Author(s):  
Amit Kumar Nandanwar ◽  
Jaytrilok Choudhary

Internet technologies are evolving very fast, and the number of web pages is growing exponentially. Web page categorization is required for searching and exploring relevant web pages based on users' queries, and it is a tedious task. The majority of web page categorization techniques ignore the semantic features and contextual knowledge of the web page. This paper proposes a web page categorization method that categorizes web pages based on semantic features and contextual knowledge. Initially, the GloVe model is applied to capture the semantic features of the web pages. Thereafter, a stacked bidirectional long short-term memory (BiLSTM) network with a symmetric structure is applied to extract the contextual and latent symmetry information from the semantic features for web page categorization. The performance of the proposed model has been evaluated on the publicly available WebKB dataset, where it shows superiority over existing state-of-the-art machine learning and deep learning methods.
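
A minimal Keras sketch of the pipeline named in the abstract: GloVe embeddings feeding a two-layer stacked BiLSTM classifier. The vocabulary size, sequence length, layer widths, and the random stand-in for the GloVe matrix are assumptions; real GloVe vectors would be loaded from file.

```python
import numpy as np
from tensorflow.keras import layers, models

vocab_size, embed_dim, seq_len, n_classes = 10000, 100, 200, 4
glove_matrix = np.random.rand(vocab_size, embed_dim)    # stand-in for GloVe

inp = layers.Input(shape=(seq_len,))
emb = layers.Embedding(vocab_size, embed_dim, trainable=False,
                       name="glove")(inp)               # frozen embeddings
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(emb)  # BiLSTM 1
x = layers.Bidirectional(layers.LSTM(64))(x)                           # BiLSTM 2
out = layers.Dense(n_classes, activation="softmax")(x)  # e.g. WebKB classes

model = models.Model(inp, out)
model.get_layer("glove").set_weights([glove_matrix])    # load GloVe weights
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```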


Author(s):  
GAURAV AGARWAL ◽  
SACHI GUPTA ◽  
SAURABH MUKHERJEE

Today, web servers are the key repositories of information, and the internet is the means of accessing it. There is a mammoth amount of data on the Internet, and finding the relevant data is difficult. Search engines play a vital role in this task, following three steps: crawling by a crawler, indexing by an indexer, and searching by a searcher. The web crawler retrieves web page information by following every link on a site; the search engine stores this information, and the indexer indexes the page content. The main role of the indexer is to make data quickly retrievable according to user requirements. When a client issues a query, the search engine retrieves the results corresponding to that query to provide the best output. The ambition here is to develop a search engine algorithm that returns the most desirable results for the user's requirement. A ranking method is used by the search engine to rank web pages. Various ranking approaches are discussed in the literature, but this paper proposes a ranking algorithm based on parent-child relationships. The proposed algorithm builds on the priority-assignment phase of the Heterogeneous Earliest Finish Time (HEFT) algorithm, which was designed for multiprocessor task scheduling. The algorithm works on three variables: keyword density, number of successors of a node, and the age of the web page. Density is the frequency of the keyword on a particular web page; the number of successors is the number of outgoing links from a page; and age captures the freshness of the page, so the most recently modified page has the smallest age and the largest freshness value. The proposed technique sets the priority of each page using downward rank values, and pages are arranged in ascending or descending order of their rank. Experiments show the algorithm is valuable: in a comparison with Google, the proposed algorithm performed better on 70% of the test problems.
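
A hedged sketch of a rank score built from the three variables the paper names: keyword density, number of successors, and page freshness. The weights and the freshness formula are illustrative assumptions, not the paper's HEFT-based priority computation.

```python
import time

def rank_score(page, keyword, w_density=0.5, w_links=0.3, w_fresh=0.2):
    """Illustrative weighted score over the three variables; weights assumed."""
    words = page["text"].lower().split()
    density = words.count(keyword.lower()) / max(len(words), 1)
    successors = len(page["out_links"])             # outgoing links
    age_days = (time.time() - page["modified"]) / 86400
    freshness = 1.0 / (1.0 + age_days)              # newer page, larger value
    return w_density * density + w_links * successors + w_fresh * freshness

pages = [
    {"url": "/a", "text": "web mining and web search", "out_links": ["/b"],
     "modified": time.time() - 1 * 86400},          # fresh page
    {"url": "/b", "text": "unrelated cooking recipes", "out_links": [],
     "modified": time.time() - 30 * 86400},         # stale page
]
ranked = sorted(pages, key=lambda p: rank_score(p, "web"), reverse=True)
print([p["url"] for p in ranked])                   # -> ['/a', '/b']
```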


2019 ◽  
Vol 16 (2) ◽  
pp. 384-388 ◽  
Author(s):  
K. S. Ramanujam ◽  
K. David

Web page classification is one of the significant research areas in the web mining domain. The enormous quantity of data on the web demands the development of effective and robust techniques for web mining tasks, which involve categorizing web pages based on data labels, as well as web crawling, analysis of web links, and contextual advertising. Existing machine learning and data mining techniques are used efficiently for various web mining processes, including the classification of web pages. Multiple-classifier techniques are among the most promising research areas in machine learning; they merge several classifiers that differ in base classifier and/or dataset distribution, which yields highly robust classification models. In this review paper, FA, PSO, ACO, GA, and IWT are compared to evaluate the best-fitting algorithm for classifying web pages.
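
As a small illustration of the multiple-classifier idea the review discusses, the sketch below combines three base classifiers by soft voting in scikit-learn; the data and the choice of base classifiers are toy assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Toy stand-in for extracted web page features and category labels.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

ensemble = VotingClassifier([
    ("rf", RandomForestClassifier(random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("nb", GaussianNB()),
], voting="soft")                       # merge differing base classifiers

print(cross_val_score(ensemble, X, y, cv=5).mean())
```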


Author(s):  
Ben Choi

Web mining aims at searching, organizing, and extracting information on the Web; search engines focus on searching. The next stage of Web mining is the organization of Web contents, which will in turn facilitate the extraction of useful information from the Web. This chapter focuses on organizing Web contents. Since a majority of Web contents are stored in the form of Web pages, the chapter concentrates on techniques for automatically organizing Web pages into categories. Various artificial intelligence techniques have been used for this task; the most successful are classification and clustering, and this chapter focuses on clustering. Clustering is well suited for Web mining because it automatically organizes Web pages into categories, each of which contains Web pages with similar contents. One problem in clustering, however, is the lack of general methods for automatically determining the number of categories or clusters, and until now no such method has been suitable for Web page clustering. To address this problem, the chapter describes a method to discover a constant factor that characterizes the Web domain and proposes a new method for automatically determining the number of clusters in Web page datasets. The chapter also proposes a new bi-directional hierarchical clustering algorithm, which arranges individual Web pages into clusters, then arranges the clusters into larger clusters, and so on until the average inter-cluster similarity approaches the constant factor. With the constant factor and the algorithm together, the chapter provides a new clustering system suitable for mining the Web.
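
A minimal sketch of the stopping idea described above: bottom-up average-linkage merging that halts once the best inter-cluster similarity falls to a threshold playing the role of the chapter's constant factor. The similarity matrix and threshold value are illustrative.

```python
import numpy as np

def average_linkage_cluster(sim, stop_factor=0.3):
    """sim: symmetric page-similarity matrix. Merge the two most similar
    clusters until the best merge falls below stop_factor."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = np.mean([sim[a][b] for a in clusters[i]
                             for b in clusters[j]])    # average linkage
                if s > best:
                    best, pair = s, (i, j)
        if best < stop_factor:          # similarity reached the factor
            break
        i, j = pair
        clusters[i] += clusters.pop(j)
    return clusters

sim = np.array([[1.0, 0.9, 0.1],
                [0.9, 1.0, 0.2],
                [0.1, 0.2, 1.0]])
print(average_linkage_cluster(sim))     # -> [[0, 1], [2]]
```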

