Ranking Web Search Results Exploiting Wikipedia

It is widely known that search engines are the dominating tools for finding information on the web. In most of the cases, these engines return web page references on a global ranking taking in mind either the importance of the web site or the relevance of the web pages to the identified topic. In this paper, we focus on the problem of determining distinct thematic groups on web search engine results that other existing engines provide. We additionally address the problem of dynamically adapting their ranking according to user selections, incorporating user judgments as implicitly registered in their selection of relevant documents. Our system exploits a state of the art semantic web data mining technique that identifies semantic entities of Wikipedia for grouping the result set in different topic groups, according to the various meanings of the provided query. Moreover, we propose a novel probabilistic Network scheme that employs the aforementioned topic identification method, in order to modify ranking of results as the users select documents. We evaluated in practice our implemented prototype with extensive experiments with the ClueWeb09 dataset using the TREC’s 2009, 2010, 2011 and 2012 Web Tracks’ where we observed improved retrieval performance compared to current state of the art re-ranking methods.

Download Full-text

Soft Threshold Ternary Networks

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/318 ◽

2020 ◽

Author(s):

Weixiang Xu ◽

Xiangyu He ◽

Tianli Zhao ◽

Qinghao Hu ◽

Peisong Wang ◽

...

Keyword(s):

Neural Networks ◽

Mobile Devices ◽

State Of The Art ◽

Performance Gap ◽

Training Time ◽

Current State ◽

The Arts ◽

Soft Threshold ◽

And Storage ◽

Selection Of

Large neural networks are difficult to deploy on mobile devices because of intensive computation and storage. To alleviate it, we study ternarization, a balance between efficiency and accuracy that quantizes both weights and activations into ternary values. In previous ternarized neural networks, a hard threshold Δ is introduced to determine quantization intervals. Although the selection of Δ greatly affects the training results, previous works estimate Δ via an approximation or treat it as a hyper-parameter, which is suboptimal. In this paper, we present the Soft Threshold Ternary Networks (STTN), which enables the model to automatically determine quantization intervals instead of depending on a hard threshold. Concretely, we replace the original ternary kernel with the addition of two binary kernels at training time, where ternary values are determined by the combination of two corresponding binary values. At inference time, we add up the two binary kernels to obtain a single ternary kernel. Our method dramatically outperforms current state-of-the-arts, lowering the performance gap between full-precision networks and extreme low bit networks. Experiments on ImageNet with AlexNet (Top-1 55.6%), ResNet-18 (Top-1 66.2%) achieves new state-of-the-art.

Download Full-text

Personalized web search on e-commerce using ontology based association mining

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i1.1.9487 ◽

2017 ◽

Vol 7 (1.1) ◽

pp. 286

Author(s):

B. Sekhar Babu ◽

P. Lakshmi Prasanna ◽

P. Vidyullatha

Keyword(s):

Data Mining ◽

Web Search ◽

Large Data ◽

Association Mining ◽

Data Sets ◽

Data Mining Algorithm ◽

Web Data ◽

Data Mining Technique ◽

Web Data Mining ◽

The Web

In current days, World Wide Web has grown into a familiar medium to investigate the new information, Business trends, trading strategies so on. Several organizations and companies are also contracting the web in order to present their products or services across the world. E-commerce is a kind of business or saleable transaction that comprises the transfer of statistics across the web or internet. In this situation huge amount of data is obtained and dumped into the web services. This data overhead tends to arise difficulties in determining the accurate and valuable information, hence the web data mining is used as a tool to determine and mine the knowledge from the web. Web data mining technology can be applied by the E-commerce organizations to offer personalized E-commerce solutions and better meet the desires of customers. By using data mining algorithm such as ontology based association rule mining using apriori algorithms extracts the various useful information from the large data sets .We are implementing the above data mining technique in JAVA and data sets are dynamically generated while transaction is processing and extracting various patterns.

Download Full-text

AntWeb—Web Search Based on Ant Behavior

Emerging Technologies of Text Mining ◽

10.4018/978-1-59904-373-9.ch010 ◽

2008 ◽

pp. 208-222

Author(s):

Li Weigang ◽

Wu Man Qi

Keyword(s):

Web Mining ◽

Web Search ◽

Theory Model ◽

Web Pages ◽

Web Portal ◽

Knowledge Based ◽

Log Files ◽

Ant Behavior ◽

Shortest Route ◽

The Web

This chapter presents a study of Ant Colony Optimization (ACO) to Interlegis Web portal, Brazilian legislation Website. The approach of AntWeb is inspired by ant colonies foraging behavior to adaptively mark the most significant link by means of the shortest route to arrive the target pages. The system considers the users in the Web portal as artificial ants and the links among the pages of the Web pages as the researching network. To identify the group of the visitors, Web mining is applied to extract knowledge based on preprocessing Web log files. The chapter describes the theory, model, main utilities and implementation of AntWeb prototype in Interlegis Web portal. The case study shows Off-line Web mining; simulations without and with the use of AntWeb; testing by modification of the parameters. The result demonstrates the sensibility and accessibility of AntWeb and the benefits for the Interlegis Web users.

Download Full-text

Enhancing Web Search through Web Structure Mining

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch084 ◽

2011 ◽

pp. 443-447

Author(s):

Ji-Rong Wen

Keyword(s):

Information Retrieval ◽

Web Search ◽

Product Information ◽

Semantic Representation ◽

Web Pages ◽

Search Performance ◽

Information Display ◽

Web Structure Mining ◽

Free Environment ◽

The Web

The Web is an open and free environment for people to publish and get information. Everyone on the Web can be either an author, a reader, or both. The language of the Web, HTML (Hypertext Markup Language), is mainly designed for information display, not for semantic representation. Therefore, current Web search engines usually treat Web pages as unstructured documents, and traditional information retrieval (IR) technologies are employed for Web page parsing, indexing, and searching. The unstructured essence of Web pages seriously blocks more accurate search and advanced applications on the Web. For example, many sites contain structured information about various products. Extracting and integrating product information from multiple Web sites could lead to powerful search functions, such as comparison shopping and business intelligence. However, these structured data are embedded in Web pages, and there are no proper traditional methods to extract and integrate them. Another example is the link structure of the Web. If used properly, information hidden in the links could be taken advantage of to effectively improve search performance and make Web search go beyond traditional information retrieval (Page, Brin, Motwani, & Winograd, 1998, Kleinberg, 1998).

Download Full-text

A Signal-Representation-Based Parser to Extract Text-Based Information from the Web

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2010.p0531 ◽

2010 ◽

Vol 14 (5) ◽

pp. 531-539

Author(s):

Mu-Chun Su ◽

◽

Shao-Jui Wang ◽

Chen-Ko Huang ◽

Pa-ChunWang ◽

...

Keyword(s):

Web Services ◽

World Wide ◽

Information Sources ◽

State Of The Art ◽

Value Added ◽

Web Pages ◽

Web Page ◽

Web Information ◽

The World ◽

The Web

Most of the dramatically increased amount of information available on the World Wide Web is provided via HTML and formatted for human browsing rather than for software programs. This situation calls for a tool that automatically extracts information from semistructured Web information sources, increasing the usefulness of value-added Web services. We present a signal-representation-based parser (SIRAP) that breaks Web pages up into logically coherent groups - groups of information related to an entity, for example. Templates for records with different tag structures are generated incrementally by a Histogram-Based Correlation Coefficient (HBCC) algorithm, then records on a Web page are detected efficiently using templates generated by matching. Hundreds of Web pages from 17 state-of-the-art search engines were used to demonstrate the feasibility of our approach.

Download Full-text

OPWUM: Opportunistic MAC Protocol Leveraging Wake-Up Receivers in WSNs

Journal of Sensors ◽

10.1155/2016/6263719 ◽

2016 ◽

Vol 2016 ◽

pp. 1-9 ◽

Cited By ~ 20

Author(s):

Fayçal Ait Aoudia ◽

Matthieu Gautier ◽

Olivier Berder

Keyword(s):

Power Consumption ◽

Analytical Study ◽

State Of The Art ◽

Mac Protocol ◽

Promising Technique ◽

Improve Energy Efficiency ◽

Current State ◽

Unreliable Links ◽

Careful Design ◽

Selection Of

Opportunistic forwarding has emerged as a promising technique to address the problem of unreliable links typical in wireless sensor networks and improve energy efficiency by exploiting multiuser diversity. Timer-based solutions, such as timer-based contention, form promising schemes to allow opportunistic next hop relay selection. However, they can incur significant idle listening and thus reduce the lifetime of the network. To tackle this problem, we propose to exploit emerging wake-up receiver technologies that have the potential to considerably reduce the power consumption of wireless communications. A careful design of MAC protocols is required to efficiently employ these new devices. In this work, we propose Opportunistic Wake-Up MAC (OPWUM), a novel multihop MAC protocol using timer-based contention. It enables the opportunistic selection of the best receiver among its neighboring nodes according to a given metric (e.g., the remaining energy), without requiring any knowledge about them. Moreover, OPWUM exploits emerging wake-up receivers to drastically reduce nodes power consumption. Through analytical study and exhaustive networks simulations, we show the effectiveness of OPWUM compared to the current state-of-the-art protocols using timer-based contention.

Download Full-text

How to Model Tendon-Driven Continuum Robots and Benchmark Modelling Performance

Frontiers in Robotics and AI ◽

10.3389/frobt.2020.630245 ◽

2021 ◽

Vol 7 ◽

Author(s):

Priyanka Rao ◽

Quentin Peyron ◽

Sven Lilge ◽

Jessica Burgner-Kahrs

Keyword(s):

Case Studies ◽

State Of The Art ◽

Computation Time ◽

Comprehensive Overview ◽

Continuum Robots ◽

Current State ◽

Selection Of ◽

Modelling Approaches

Tendon actuation is one of the most prominent actuation principles for continuum robots. To date, a wide variety of modelling approaches has been derived to describe the deformations of tendon-driven continuum robots. Motivated by the need for a comprehensive overview of existing methodologies, this work summarizes and outlines state-of-the-art modelling approaches. In particular, the most relevant models are classified based on backbone representations and kinematic as well as static assumptions. Numerical case studies are conducted to compare the performance of representative modelling approaches from the current state-of-the-art, considering varying robot parameters and scenarios. The approaches show different performances in terms of accuracy and computation time. Guidelines for the selection of the most suitable approach for given designs of tendon-driven continuum robots and applications are deduced from these results.

Download Full-text

Joint epitope selection and spacer design for string-of-beads vaccines

Bioinformatics ◽

10.1093/bioinformatics/btaa790 ◽

2020 ◽

Vol 36 (Supplement_2) ◽

pp. i643-i650

Author(s):

Emilio Dorigatti ◽

Benjamin Schubert

Keyword(s):

Antigen Processing ◽

State Of The Art ◽

Supplementary Information ◽

Current State ◽

Fixed Set ◽

Epitope Selection ◽

High Chance ◽

Selection Of ◽

Design Steps

Abstract Motivation Conceptually, epitope-based vaccine design poses two distinct problems: (i) selecting the best epitopes to elicit the strongest possible immune response and (ii) arranging and linking them through short spacer sequences to string-of-beads vaccines, so that their recovery likelihood during antigen processing is maximized. Current state-of-the-art approaches solve this design problem sequentially. Consequently, such approaches are unable to capture the inter-dependencies between the two design steps, usually emphasizing theoretical immunogenicity over correct vaccine processing, thus resulting in vaccines with less effective immunogenicity in vivo. Results In this work, we present a computational approach based on linear programming, called JessEV, that solves both design steps simultaneously, allowing to weigh the selection of a set of epitopes that have great immunogenic potential against their assembly into a string-of-beads construct that provides a high chance of recovery. We conducted Monte Carlo cleavage simulations to show that a fixed set of epitopes often cannot be assembled adequately, whereas selecting epitopes to accommodate proper cleavage requirements substantially improves their recovery probability and thus the effective immunogenicity, pathogen and population coverage of the resulting vaccines by at least 2-fold. Availability and implementation The software and the data analyzed are available at https://github.com/SchubertLab/JessEV. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

WEB GRAPH BASED SEARCH BY USING DENSITY OF KEYWORD AND AGE FACTOR

International Journal of Computer Science and Informatics ◽

10.47893/ijcsi.2013.1124 ◽

2013 ◽

pp. 89-93

Author(s):

GAURAV AGARWAL ◽

SACHI GUPTA ◽

SAURABH MUKHERJEE

Keyword(s):

Search Engine ◽

Web Search ◽

Web Pages ◽

Main Role ◽

Ranking Algorithm ◽

Web Page ◽

Web Crawler ◽

User Requirement ◽

Priority Assignment ◽

The Web

Today, web servers, are the key repositories of the information & internet is the source of getting this information. There is a mammoth data on the Internet. It becomes a difficult job to search out the accordant data. Search Engine plays a vital role in searching the accordant data. A search engine follows these steps: Web crawling by crawler, Indexing by Indexer and Searching by Searcher. Web crawler retrieves information of the web pages by following every link on the site. Which is stored by web search engine then the content of the web page is indexed by the indexer. The main role of indexer is how data can be catch soon as per user requirements. As the client gives a query, Search Engine searches the results corresponding to this query to provide excellent output. Here ambition is to enroot an algorithm for search engine which may response most desirable result as per user requirement. In this a ranking method is used by the search engine to rank the web pages. Various ranking approaches are discussed in literature but in this paper, ranking algorithm is proposed which is based on parent-child relationship. Proposed ranking algorithm is based on priority assignment phase of Heterogeneous Earliest Finish Time (HEFT) Algorithm which is designed for multiprocessor task scheduling. Proposed algorithm works on three on range variable its means the density of keywords, number of successors to the nodes and the age of the web page. Density shows the occurrence of the keyword on the particular web page. Numbers of successors represent the outgoing link to a single web page. Age is the freshness value of the web page. The page which is modified recently is the freshest page and having the smallest age or largest freshness value. Proposed Technique requires that the priorities of each page to be set with the downward rank values & pages are arranged in ascending/ Descending order of their rank values. Experiments show that our algorithm is valuable. After the comparison with Google we find that our Algorithm is performing better. For 70% problems our algorithm is working better than Google.

Download Full-text

Joint epitope selection and spacer design for string-of-beads vaccines

10.1101/2020.04.25.060988 ◽

2020 ◽

Author(s):

Emilio Dorigatti ◽

Benjamin Schubert

Keyword(s):

Antigen Processing ◽

State Of The Art ◽

Computational Approach ◽

Population Coverage ◽

Current State ◽

Fixed Set ◽

Epitope Selection ◽

High Chance ◽

Selection Of ◽

Design Steps

AbstractMotivationConceptually, epitope-based vaccine design poses two distinct problems: (1) selecting the best epitopes eliciting the strongest possible immune response, and (2) arranging and linking the selected epitopes through short spacer sequences to string-of-beads vaccines so as to increase the recovery likelihood of each epitope during antigen processing. Current state-of-the-art approaches solve this design problem sequentially. Consequently, such approaches are unable to capture the inter-dependencies between the two design steps, usually emphasizing theoretical immunogenicity over correct vaccine processing and resulting in vaccines with less effective immunogencity.ResultsIn this work, we present a computational approach based on linear programming that solves both design steps simultaneously, allowing to weigh the selection of a set of epitopes that have great immunogenic potential against their assembly into a string-of-beads construct that provides a high chance of recovery. We conducted Monte-Carlo cleavage simulations to show that, indeed, a fixed set of epitopes often cannot be assembled adequately, whereas selecting epitopes to accommodate proper cleavage requirements substantially improves their recovery probability and thus the effective immunogenicity, pathogen, and population coverage of the resulting vaccines by at least two fold.AvailabilityThe software and the data analyzed are available at https://github.com/SchubertLab/JessEV

Download Full-text