Usage of co-event pattern mining with optimal fuzzy rule-based classifier for effective web page retrieval

2018 ◽  
Vol 7 (3.29) ◽  
pp. 275
Author(s):  
P Chandrashaker Reddy ◽  
A Suresh Babu

With the advent of the World Wide Web and the rise of e-commerce applications and social networks, organizations generate large volumes of data over the web on a daily basis. Retrieving exactly the information users expect from the web has become an increasingly complex and critical task. In recent times, the Web has grown in significance to the point of becoming the focal point of our digital lives. A search engine, as the tool for navigating the web, must return the desired results for any given query. Most search engines cannot fully satisfy users' requirements, and the results are often inaccurate and irrelevant. Existing techniques make little use of ontology or browsing history for personalization. To overcome these issues, data mining techniques must be applied to the web; one powerful and increasingly popular concept is web-page recommendation. In this paper, the design of a fuzzy logic classifier is formulated as a search problem in a solution space where every node represents a rule set, a membership function, and the corresponding system behaviour. A hybrid optimization algorithm is then applied to search this solution space for a location that represents a near-optimal rule set and membership function. In this article, we review techniques proposed by different researchers for web page personalization and propose a novel approach for finding optimal solutions when searching for relevant information.
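As an illustration of this formulation, the toy sketch below encodes a candidate solution as a rule set plus triangular membership-function parameters and scores candidates by classification accuracy; the random-search loop merely stands in for the paper's hybrid optimization algorithm, and the data and parameter choices are invented.

import random

# A candidate solution bundles membership-function parameters and a rule set.
# Triangular membership: peak at centre c, support width w (illustrative parametrisation).
def mu(x, c, w):
    return max(0.0, 1.0 - abs(x - c) / w) if w > 0 else 0.0

def classify(candidate, sample):
    # Each rule: ({feature_index: (centre, width)}, class_label).
    # Fire strength = min of memberships; predict the class of the strongest rule.
    best, label = 0.0, None
    for antecedent, cls in candidate["rules"]:
        strength = min(mu(sample[i], c, w) for i, (c, w) in antecedent.items())
        if strength > best:
            best, label = strength, cls
    return label

def fitness(candidate, data):
    return sum(classify(candidate, x) == y for x, y in data) / len(data)

def random_candidate(n_features, classes):
    rules = []
    for cls in classes:
        antecedent = {i: (random.uniform(0, 1), random.uniform(0.2, 0.8))
                      for i in range(n_features)}
        rules.append((antecedent, cls))
    return {"rules": rules}

# Toy relevance data: (feature vector, relevant/irrelevant) pairs, purely illustrative.
data = [((0.9, 0.8), 1), ((0.8, 0.7), 1), ((0.2, 0.1), 0), ((0.1, 0.3), 0)]

best = max((random_candidate(2, [0, 1]) for _ in range(200)),
           key=lambda c: fitness(c, data))
print("accuracy of best rule set:", fitness(best, data))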

Author(s):  
Giuliano Armano ◽  
Alessandro Giuliani ◽  
Eloisa Vargiu

Information Filtering deals with the problem of selecting relevant information for a given user, according to her/his preferences and interests. In this chapter, the authors consider two ways of performing information filtering: recommendation and contextual advertising. In particular, they study and analyze them according to a unified view. In fact, the task of suggesting an advertisement to a Web page can be viewed as the task of recommending an item (the advertisement) to a user (the Web page), and vice versa. Starting from this insight, the authors propose a content-based recommender system based on a generic solution for contextual advertising and a hybrid contextual advertising system based on a generic hybrid recommender system. Relevant case studies have been considered (i.e., a photo recommender and a Web advertiser) with the goal of highlighting how the proposed approach works in practice. In both cases, results confirm the effectiveness of the proposed solutions.
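A minimal sketch of the unified view described above: the web page plays the role of the "user" and each advertisement the role of an "item", scored here by cosine similarity of simple bag-of-words profiles. The texts and the similarity choice are illustrative, not the authors' system.

from collections import Counter
from math import sqrt

def profile(text):
    # Bag-of-words profile; a real system would use TF-IDF and richer content features.
    return Counter(text.lower().split())

def cosine(p, q):
    dot = sum(p[t] * q[t] for t in p)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

# The web page acts as the "user", each ad as an "item" to be recommended.
page = profile("budget travel guide cheap flights and hotels in europe")
ads = {"flight deals": profile("cheap flights discount airline tickets europe"),
       "gaming laptop": profile("high performance gaming laptop graphics card")}

ranked = sorted(ads, key=lambda a: cosine(page, ads[a]), reverse=True)
print(ranked)  # the travel-related ad should rank first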


Author(s):  
MENG HIOT LIM ◽  
WILLIE NG

We present a methodology for learning fuzzy rules using an iterative genetic algorithm (GA). The approach incorporates a scheme that partitions the entire solution space into individual subspaces and then employs a mechanism to progressively relax or tighten a constraint. Relaxing or tightening the constraint guides the GA to the subspace for further iteration. The system, referred to as the iterative GA learning module, is useful for learning an efficient fuzzy control algorithm based on a predefined set of linguistic terms. The overall approach was applied to learn a fuzzy algorithm for water bath temperature control. The simulation results demonstrate the effectiveness of the approach in automating an industrial process.
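The toy sketch below illustrates the iterative scheme only in spirit: a plain GA evolves a real-coded candidate, and a scalar constraint threshold is tightened when the current subspace yields a feasible solution and relaxed when it does not. The fitness function, encoding, and thresholds are invented, not the authors' water-bath controller.

import random

# Stand-in fitness for a fuzzy controller encoded as a real vector (illustrative only):
# higher is better; in the paper this would come from simulated control error.
def fitness(chrom):
    return -sum((g - 0.6) ** 2 for g in chrom)

def ga_step(pop, rate=0.1):
    # One generation: keep the better half, breed children by uniform crossover
    # and Gaussian mutation clipped to [0, 1].
    def breed(a, b):
        child = [x if random.random() < 0.5 else y for x, y in zip(a, b)]
        return [min(1.0, max(0.0, g + random.gauss(0, rate))) for g in child]
    pop = sorted(pop, key=fitness, reverse=True)
    elite = pop[: len(pop) // 2]
    return elite + [breed(*random.sample(elite, 2)) for _ in elite]

pop = [[random.random() for _ in range(4)] for _ in range(20)]
threshold = -1.0               # constraint defining the current subspace
for _ in range(30):
    pop = ga_step(pop)
    best = max(pop, key=fitness)
    if fitness(best) >= threshold:
        threshold += 0.05      # tighten: restrict search to a better subspace
    else:
        threshold -= 0.01      # relax: let the GA escape an infeasible subspace
print("best fitness:", round(fitness(max(pop, key=fitness)), 4))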


Author(s):  
V Aruna, et al.

In recent years, with the advancement of technology, a lot of information has become available in different formats, and extracting knowledge from that data has become a very difficult task. Due to the vast amount of information available on the web, users find it difficult to extract relevant information or create new knowledge from it. To address this problem, web mining techniques are used to discover interesting patterns in hidden data. Web Usage Mining (WUM), a subset of web mining, helps in extracting the hidden knowledge present in web log files, in recognizing the interests of web users, and in discovering customer behaviour. Web usage mining comprises several phases of data mining techniques: data pre-processing, pattern discovery, and pattern analysis. This paper presents an updated, focused survey of sequential pattern mining algorithms used in the pattern discovery phase of WUM, including apriori-based algorithms, breadth-first-search-based strategies, depth-first-search strategies, sequential closed-pattern algorithms, and incremental pattern mining algorithms. Finally, the algorithms are compared on their key features. This study gives a better understanding of the approaches to sequential pattern mining.
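For concreteness, here is a minimal apriori-style (GSP-like) sketch of the pattern discovery phase on hypothetical sessionized log data: frequent page sequences are grown one page at a time, and only candidates meeting a minimum support survive to the next level. Session contents and the support threshold are made up.

# Hypothetical sessionized web log data: each session is an ordered list of pages.
sessions = [["home", "products", "cart", "checkout"],
            ["home", "products", "reviews", "cart"],
            ["home", "search", "products", "cart"]]

def support(pattern, sessions):
    # A session supports a pattern if its pages appear in order (not necessarily adjacent).
    def contains(seq, pat):
        it = iter(seq)
        return all(p in it for p in pat)
    return sum(contains(s, pattern) for s in sessions)

min_sup = 2
items = sorted({p for s in sessions for p in s})

# Level 1: frequent single pages; then apriori-style extension by one page per level.
frequent = [(p,) for p in items if support((p,), sessions) >= min_sup]
k = 1
while frequent:
    print(f"frequent length-{k} patterns:", sorted(frequent))
    candidates = {pat + (p,) for pat in frequent for p in items}
    frequent = [c for c in candidates if support(c, sessions) >= min_sup]
    k += 1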


Author(s):  
H. Inbarani ◽  
K. Thangavel

Recommender systems represent a prominent class of personalized Web applications, which particularly focus on the user-dependent filtering and selection of relevant information. Recommender systems have been a subject of extensive research in Artificial Intelligence over the last decade, but with today's increasing number of e-commerce environments on the Web, the demand for new approaches to intelligent product recommendation is higher than ever. There are more online users, more online channels, more vendors, more products, and, most importantly, increasingly complex products and services. These recent developments in the area of recommender systems have generated new demands, in particular with respect to interactivity, adaptivity, and user preference elicitation. These challenges, however, are also the focus of general Web page recommendation research. The goal of this chapter is to develop robust techniques to model noisy data sets containing an unknown number of overlapping categories and to apply them to Web personalization and mining. In this chapter, rough set-based clustering approaches are used to discover Web user access patterns; these techniques compute the number of clusters automatically from the Web log data using statistical techniques. The suitability of rough clustering approaches for Web page recommendation is measured using predictive accuracy metrics.
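The core rough-clustering idea can be sketched as follows: a session is placed in a cluster's lower approximation when one centroid is clearly closest, and in the upper approximations of all near-tied clusters otherwise. The session vectors, centroids, and threshold below are illustrative and not the chapter's actual algorithm.

# Toy user sessions as binary page-visit vectors, and two fixed "centroids".
sessions = {"u1": (1, 1, 0, 0), "u2": (1, 1, 1, 0),
            "u3": (0, 0, 1, 1), "u4": (0, 1, 1, 1), "u5": (1, 0, 1, 1)}
centroids = {"news": (1, 1, 0, 0), "sports": (0, 0, 1, 1)}

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

lower = {c: set() for c in centroids}   # objects certainly in the cluster
upper = {c: set() for c in centroids}   # objects possibly in the cluster
epsilon = 1.0                           # "near tie" threshold (illustrative)

for user, vec in sessions.items():
    d = sorted(((dist(vec, cv), c) for c, cv in centroids.items()))
    (d1, c1), (d2, c2) = d[0], d[1]
    if d2 - d1 > epsilon:
        lower[c1].add(user); upper[c1].add(user)   # unambiguous assignment
    else:
        upper[c1].add(user); upper[c2].add(user)   # boundary region of both clusters

print("lower approximations:", lower)
print("upper approximations:", upper)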


Author(s):  
Vijay Kasi ◽  
Radhika Jain

In the context of the Internet, a search engine can be defined as a software program designed to help one access information, documents, and other content on the World Wide Web. The adoption and growth of the Internet in the last decade has been unprecedented. The World Wide Web has always been applauded for its simplicity and ease of use, which is evident in how little knowledge one requires to build a Web page. The flexible nature of the Internet has enabled its rapid growth and adoption, but it has also made it hard to search for relevant information on the Web. The number of Web pages has been increasing at an astronomical pace, from around 2 million registered domains in 1995 to 233 million registered domains in 2004 (Consortium, 2004). The Internet, considered a distributed database of information, has the CRUD (create, retrieve, update, and delete) rule applied to it. While the Internet has been effective at creating, updating, and deleting content, it has considerably lacked in enabling the retrieval of relevant information. After all, there is no point in having a Web page that has little or no visibility on the Web. Since the 1990s, when the first search program was released, we have come a long way in terms of searching for information. Although we are currently witnessing tremendous growth in search engine technology, the growth of the Internet has overtaken it, leaving existing search engine technology falling short.

When we apply the metrics of relevance, rigor, efficiency, and effectiveness to the search domain, it becomes clear that we have progressed on the rigor and efficiency metrics by utilizing abundant computing power to produce faster searches over large amounts of information. Rigor and efficiency are evident in the large number of pages indexed by the leading search engines (Barroso, Dean, & Holzle, 2003). However, more research needs to be done to address the relevance and effectiveness metrics. Users typically type in two to three keywords when searching, only to end up with a search result containing thousands of Web pages, which makes it increasingly hard to find useful, relevant information. Search engines thus face a number of challenges requiring them to perform rigorous searches with relevant results efficiently so that they are effective. These challenges include the following ("Search Engines," 2004):

1. The Web is growing at a much faster rate than any present search engine technology can index.
2. Web pages are updated frequently, forcing search engines to revisit them periodically.
3. Dynamically generated Web sites may be slow or difficult to index, or may produce excessive results from a single Web site.
4. Many dynamically generated Web sites cannot be indexed by search engines at all.
5. The commercial interests of a search engine can interfere with the order of relevant results it shows.
6. Content that is behind a firewall or password protected is not accessible to search engines (such as content found in several digital libraries).
7. Some Web sites have started using tricks such as spamdexing and cloaking to manipulate search engines into displaying them as the top results for a set of keywords. This pollutes the search results, pushing more relevant links down the result list, and is a consequence of the popularity of Web searches and the business potential search engines can generate today.
8. Search engines index all the content of the Web without any bounds on the sensitivity of the information, which has raised security and privacy concerns.

With the above background and challenges in mind, we lay out the article as follows. In the next section, we begin with a discussion of search engine evolution. To facilitate the examination and discussion of progress in search engine development, we break this discussion into the three generations of search engines. Figure 1 depicts this evolution pictorially and highlights the need for better search engine technologies. Next, we present a brief discussion of the contemporary state of search engine technology and the various types of content searches available today. With this background, the following section documents various concerns about existing search engines, setting the stage for better search engine technology. These concerns include information overload, relevance, representation, and categorization. Finally, we briefly address the research efforts under way to alleviate these concerns and then present our conclusion.


2018 ◽  
Vol 7 (3.27) ◽  
pp. 290
Author(s):  
Jyoti Narayan Jadhav ◽  
B Arunkumar

Web page recommenders predict and recommend web pages to users based on the behaviour of their search history. A web page recommender system analyzes the semantics of the user's navigation and predicts related web pages for the user. Various recommender systems have been developed in the literature for web page recommendation. In earlier work, a web page recommendation system was developed using weighted sequential pattern mining and the Wu and Li Index Fuzzy Clustering (WLI-FC) algorithm. In this work, the Chronological-based Dragonfly Algorithm (Chronological-DA) is proposed for recommending web pages to users. The proposed Chronological-DA incorporates chronological information, recommending pages based on the history of pages visited by the users, and the proposed recommendation system uses Laplacian correction to define the recommendation probability. The proposed system was simulated on the standard CTI and MSNBC databases, and the experimental results show that the proposed scheme achieves values of 1, 0.964, and 0.973 for precision, recall, and F-measure, respectively.
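The Laplace (Laplacian) correction mentioned above can be illustrated with a minimal example of smoothing next-page recommendation probabilities from visit counts; the counts and the exact probability form are assumptions for illustration, not the authors' formulation.

from collections import Counter

# Observed next-page transitions after visiting page "catalog" (hypothetical counts).
counts = Counter({"product": 7, "cart": 2, "help": 0, "home": 1})

def recommendation_prob(page, counts, alpha=1):
    # Laplace correction: add alpha to every count so that pages never observed
    # after "catalog" still keep a small, non-zero recommendation probability.
    total = sum(counts.values()) + alpha * len(counts)
    return (counts[page] + alpha) / total

for p in counts:
    print(p, round(recommendation_prob(p, counts), 3))
# "help" was never observed but still receives a non-zero probability.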


2020 ◽  
pp. 151-156
Author(s):  
A. P. Korablev ◽  
N. S. Liksakova ◽  
D. M. Mirin ◽  
D. G. Oreshkin ◽  
P. G. Efimov

A new species list of plants and lichens of Russia and neighboring countries has been developed for Turboveg for Windows, a program intended for the storage and management of phytosociological data (relevés) that is widely used around the world (Hennekens, Schaminée, 2001; Hennekens, 2015). The species list is built upon the database of the Russian website Plantarium (Plantarium…: [site]), which contains a species atlas and an illustrated online handbook of plants and lichens. The nomenclature used on Plantarium was originally based on the following sources: vascular plants — S. K. Cherepanov (1995) with additions; mosses — «Flora of mosses of Russia» (Proect...: [site]); liverworts and hornworts — A. D. Potemkin and E. V. Sofronova (2009); lichens — «Spisok…», G. P. Urbanavichyus ed. (2010); and other sources (Plantarium...: [site]). The new species list, currently the most comprehensive in Turboveg format for Russia, has 89 501 entries, including 4627 genus taxa, compared with the old list of 32 020 entries (taxa) and only 253 synonyms. There are 84 805 species and subspecies taxa in the list, 37 760 (44.7 %) of which are accepted names, while the others are synonyms. Their distribution by groups of organisms and divisions is shown in the Table. The large number of synonyms in the new list and its adaptation to work with the Russian literature will greatly facilitate the entry of old relevé data. The ways the new list was made, its structure, and the possibilities of checking taxonomic lists against Internet resources are considered. The files of the species list for Turboveg 2 and Turboveg 3, and the technique of associating existing databases with the new species list (in Russian), are available on the web page https://www.binran.ru/resursy/informatsionnyye-resursy/tekuschie-proekty/species_list_russia/.


2009 ◽  
Author(s):  
Mirko Luca Lobina ◽  
Davide Mula
Keyword(s):  
Web Page ◽  

2021 ◽  
Vol 13 (2) ◽  
pp. 50
Author(s):  
Hamed Z. Jahromi ◽  
Declan Delaney ◽  
Andrew Hines

Content is a key influencing factor in Web Quality of Experience (QoE) estimation. A web user's satisfaction can be influenced by how long it takes to render and visualize the visible parts of a web page in the browser, referred to as the Above-the-Fold (ATF) time. SpeedIndex (SI) has been widely used to estimate the perceived loading speed of ATF content and as a proxy metric for Web QoE estimation. Web application developers have been actively introducing innovative interactive features, such as animated and multimedia content, aiming to capture users' attention and improve the functionality and utility of web applications. However, the literature shows that, for websites with animated content, the ATF time estimated using state-of-the-art metrics may not accurately match the completed ATF time as perceived by users. This study introduces a new metric, Plausibly Complete Time (PCT), that estimates ATF time as perceived by users for websites with and without animations. PCT can be integrated with SI and web QoE models. The accuracy of the proposed metric is evaluated on two publicly available datasets. The proposed metric has a high positive Spearman's correlation (rs = 0.89) with the perceived ATF reported by users for websites with and without animated content. This study demonstrates that using PCT as a KPI in QoE estimation models can improve the robustness of QoE estimation compared with using the state-of-the-art ATF time metric. Furthermore, experimental results showed that estimating SI using PCT improves the robustness of SI for websites with animated content. The PCT estimation allows web application designers to identify where poor design has significantly increased ATF time and to refactor their implementation before it impacts the end-user experience.
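For reference, SpeedIndex integrates the fraction of above-the-fold content still unrendered over time; the short sketch below computes it from sampled visual-completeness values. The timestamps and progress values are made up, and this shows only the standard SI computation, not the PCT metric itself.

# SpeedIndex integrates how much above-the-fold content is still unrendered over time:
# SI = sum over sampling intervals of (1 - visual_completeness) * dt.
# Timestamps (ms) and visual-completeness samples below are invented.
samples = [(0, 0.0), (300, 0.2), (600, 0.55), (900, 0.8), (1200, 0.95), (1500, 1.0)]

def speed_index(samples):
    si = 0.0
    for (t0, vc0), (t1, _) in zip(samples, samples[1:]):
        si += (1.0 - vc0) * (t1 - t0)   # rectangle rule per sampling interval
    return si

print("SpeedIndex ~", speed_index(samples), "ms")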


2014 ◽  
Vol 2014 ◽  
pp. 1-8
Author(s):  
Noriyuki Matsuda ◽  
Haruhiko Takeuchi

Assuming that scenes are visually scanned by chunking information, we partitioned the fixation sequences of web page viewers into chunks using isolated gaze point(s) as the delimiter. Fixations were coded in terms of the segments of a 5×5 mesh imposed on the screen. The identified chunks were mostly short, consisting of one or two fixations. These were analyzed with respect to the within- and between-chunk distances in the overall records and the patterns (i.e., subsequences) frequently shared among the records. Although the two types of distances were both dominated by zero- and one-block shifts, the primacy of the modal shifts was less prominent between chunks than within them. The lower primacy was compensated by longer shifts. The patterns frequently extracted at three threshold levels were mostly simple, consisting of one or two chunks. The patterns revealed interesting properties as to segment differentiation and the directionality of the attentional shifts.
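A rough sketch of the two preprocessing steps described: each fixation is coded by its cell in a 5×5 mesh over the screen, and the sequence is split into chunks wherever an isolated gaze point occurs. The screen size, coordinates, and isolation flags below are invented for illustration.

# Code each fixation by its cell in a 5x5 mesh over a (hypothetical) 1000x1000 screen,
# then split the fixation sequence into chunks at isolated gaze points.
SCREEN = 1000

def cell(x, y, mesh=5):
    col = min(int(x / SCREEN * mesh), mesh - 1)
    row = min(int(y / SCREEN * mesh), mesh - 1)
    return row * mesh + col        # segments numbered 0..24

# Fixations as (x, y, is_isolated); isolated points act as chunk delimiters.
fixations = [(120, 80, False), (150, 95, False), (510, 500, True),
             (700, 640, False), (730, 660, False), (760, 700, False)]

chunks, current = [], []
for x, y, isolated in fixations:
    if isolated:
        if current:
            chunks.append(current)
        current = []               # the isolated point closes the current chunk
    else:
        current.append(cell(x, y))
if current:
    chunks.append(current)

print(chunks)   # e.g. [[0, 0], [18, 18, 18]] -- mostly short chunks of 1-2 fixations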

