Search Integration with WebSphere Portal

2010 ◽  
Vol 2 (3) ◽  
pp. 1-18
Author(s):  
Andreas Prokoph

Modern web applications and servers such as Portal require adequate support for integrating search services, driven by user-focused information delivery and interaction as well as by the new technologies used to render such information. This is exemplified by two fundamental problems that have long plagued web crawlers: dynamic content and JavaScript-generated content. Today, the common solution is simple: ignore such web pages. To enable “search” in Portals, a different “crawling” paradigm is required for search engines to gather and consume information. WebSphere Portal provides a framework that propagates content and information through “Seedlists” (comparable to HTML-based sitemaps but richer in features). This mandates that information- and content-delivering applications be “search engine aware”, requiring them to provide services and seedlists for fast, efficient, and complete delivery of content and information. This is the main integration point for search engines into the portal, and it is what the Portal site search services build on for a rich and user-focused search experience. This article discusses how such technologies allow more efficient crawling of public Portal sites by prominent Internet search engines, as well as myths surrounding search engine optimization.
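
The Seedlist format itself is not reproduced in the abstract, but the following minimal Python sketch illustrates the idea of such a content feed: a sitemap-like XML document enriched with per-item metadata. All element and field names here are hypothetical illustrations, not the actual WebSphere Portal Seedlist schema.

```python
# Minimal sketch: publishing a seedlist-style content feed.
# Element and attribute names are hypothetical illustrations,
# not the actual WebSphere Portal Seedlist schema.
import xml.etree.ElementTree as ET

def build_seedlist(entries):
    """Build a sitemap-like XML feed enriched with per-item metadata."""
    root = ET.Element("seedlist")
    for entry in entries:
        item = ET.SubElement(root, "item", {"action": entry.get("action", "add")})
        ET.SubElement(item, "url").text = entry["url"]
        ET.SubElement(item, "lastmodified").text = entry["lastmodified"]
        # Unlike plain HTML sitemaps, a seedlist can carry content metadata
        # (title, author, fields) so crawlers need not render the page itself.
        for name, value in entry.get("fields", {}).items():
            ET.SubElement(item, "field", {"name": name}).text = value
    return ET.tostring(root, encoding="unicode")

print(build_seedlist([{
    "url": "https://portal.example.com/wps/myportal/news/42",
    "lastmodified": "2010-06-01T12:00:00Z",
    "fields": {"title": "Quarterly results", "author": "jdoe"},
}]))
```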


Author(s):  
Oğuzhan Menemencioğlu ◽  
İlhami Muharrem Orak

The semantic web aims to produce machine-readable data and to deal with large amounts of data. The most important tool for accessing data on the web is the search engine. Traditional search engines are insufficient in the face of the amount of data contained in existing web pages. Semantic search engines are extensions of traditional engines and overcome the difficulties those engines face. This paper summarizes the semantic web, the concepts of traditional and semantic search engines, and their infrastructure, and details semantic search approaches. A summary of the literature is provided, touching on the trends; in this respect, the types of applications and the areas they address are considered. Based on data for two different years, the trends on these points are analyzed and the impacts of the changes are discussed. This shows that the semantic web continues to evolve and that new applications and areas are emerging. Multimedia retrieval is a new scope for semantics, so multimedia retrieval approaches are discussed; both text and multimedia retrieval are analyzed within semantic search.


2019 ◽  
Vol 16 (9) ◽  
pp. 3712-3716
Author(s):  
Kailash Kumar ◽  
Abdulaziz Al-Besher

This paper examines the overlap of results retrieved by three major search engines, namely Google, Yahoo, and Bing. A rigorous analysis of overlap among these search engines was conducted on 100 random queries. The first ten web page results, i.e., a hundred results from each search engine, and only non-sponsored results were taken into consideration. Search engines have their own update frequencies and rank results by their own notions of relevance; moreover, sponsored search advertisers differ between search engines, and no single search engine can index all web pages. The overlap analysis was carried out between October 1, 2018 and October 31, 2018 on these major search engines, namely Google, Yahoo, and Bing. A framework was built in Java to analyze the overlap among the search engines. This framework eliminates the common results and merges them into a unified list; it also uses a ranking algorithm to re-rank the search engine results and display them to the user.
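
As an illustration of the overlap and merge steps, here is a minimal Python sketch. The paper's framework is built in Java and its re-ranking algorithm is not detailed above, so Borda-count aggregation is used as one plausible stand-in, and the result lists are hypothetical.

```python
# Minimal sketch of overlap analysis and merged re-ranking for one query.
# Borda-count aggregation is an assumed substitute for the paper's
# unspecified ranking algorithm.
from collections import defaultdict

def overlap(a, b):
    """Jaccard overlap between two top-10 result URL lists."""
    return len(set(a) & set(b)) / len(set(a) | set(b))

def merge_and_rerank(result_lists, k=10):
    """Merge per-engine top-k lists, dropping duplicates, via Borda count."""
    scores = defaultdict(int)
    for results in result_lists:
        for rank, url in enumerate(results[:k]):
            scores[url] += k - rank  # higher placement earns more points
    return sorted(scores, key=scores.get, reverse=True)

google = ["u1", "u2", "u3"]; yahoo = ["u2", "u4", "u1"]; bing = ["u5", "u2"]
print(overlap(google, yahoo))                   # pairwise overlap: 0.5
print(merge_and_rerank([google, yahoo, bing]))  # unified, re-ranked list
```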


2021 ◽  
Author(s):  
Srihari Vemuru ◽  
Eric John ◽  
Shrisha Rao

Humans can easily parse and find answers to complex queries such as "What was the capital of the country of the discoverer of the element which has atomic number 1?" by breaking them up into small pieces, querying these appropriately, and assembling a final answer. However, contemporary search engines lack this capability and fail to handle even slightly complex queries. Search engines process queries by identifying keywords and searching for them in knowledge bases or indexed web pages. The results therefore depend on the keywords and on how well the search engine handles them. In our work, we propose a three-step approach called parsing, tree generation, and querying (PTGQ) for effective searching of larger and more expressive queries of potentially unbounded complexity. PTGQ parses a complex query and constructs a query tree in which each node represents a simple query. It then processes the complex query by recursively querying a back-end search engine, traversing the corresponding query tree in postorder. PTGQ thus ensures that the back-end search engine always handles a simple query containing very few keywords. Results demonstrate that PTGQ can handle queries of much higher complexity than standalone search engines.
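
A minimal sketch of the PTGQ idea follows: each node of the query tree holds a simple query template, and the tree is resolved in postorder, feeding children's answers into their parent's query. The `ask_backend` stub and its canned answers are hypothetical stand-ins for a real back-end search engine call.

```python
# Minimal sketch of the PTGQ idea: a query tree whose nodes hold simple
# queries with placeholders, resolved in postorder. `ask_backend` is a
# hypothetical stub standing in for a real search-engine call.
from dataclasses import dataclass, field

@dataclass
class QueryNode:
    template: str                       # simple query; {} marks a child's answer
    children: list["QueryNode"] = field(default_factory=list)

def ask_backend(simple_query: str) -> str:
    # Placeholder: a real system would query a search engine or
    # knowledge base and extract a short factual answer.
    canned = {
        "element with atomic number 1": "hydrogen",
        "discoverer of hydrogen": "Henry Cavendish",
        "country of Henry Cavendish": "England",
        "capital of England": "London",
    }
    return canned[simple_query]

def resolve(node: QueryNode) -> str:
    answers = [resolve(child) for child in node.children]  # postorder
    return ask_backend(node.template.format(*answers))

tree = QueryNode("capital of {}", [
    QueryNode("country of {}", [
        QueryNode("discoverer of {}", [
            QueryNode("element with atomic number 1")])])])
print(resolve(tree))  # -> London
```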


2017 ◽  
Author(s):  
Xi Zhu ◽  
Xiangmiao Qiu ◽  
Dingwang Wu ◽  
Shidong Chen ◽  
Jiwen Xiong ◽  
...  

BACKGROUND Electronic health practices such as apps and software all involve web search engines because of their convenience for retrieving information, and the success of electronic health is linked to the success of web search engines in the health field. Yet the reliability of information in search engine results remains to be evaluated; a detailed analysis can reveal shortcomings and provide inspiration. OBJECTIVE To assess the reliability of information related to women with epilepsy in the results of the main search engines in China. METHODS Six physicians conducted the searches every week. The search keywords were one of the antiepileptic drugs (valproic acid/oxcarbazepine/levetiracetam/lamotrigine) plus "huaiyun" or "renshen", both of which mean pregnancy in Chinese. The searches were conducted on different devices (computer/cellphone) and different engines (Baidu/Sogou/360). The top ten results of every search result page were included. Two physicians classified every result into nine categories according to its content and also evaluated its reliability. RESULTS A total of 16,411 search results were included. 85.1% of web pages carried advertisements, and 55% were categorized as questions and answers according to their contents. Only 9% of the search results were reliable, 50.7% were partly reliable, and 40.3% were unreliable. The higher a result ranked, the more advertisements appeared and the greater the proportion of unreliable results. All content from hospital websites was unreliable, while all content from academic publishing was reliable. CONCLUSIONS Several first principles must be emphasized to further the use of web search engines in healthcare. First, identifying registered physicians and developing an efficient system to guide patients to physicians would guarantee the quality of the information provided. Second, the relevant authorities should restrict excessive advertisement sales in the healthcare area through specific regulations to avoid negative impacts on patients. Third, information from hospital websites should be judged carefully before being embraced wholeheartedly.


2018 ◽  
Vol 7 (3) ◽  
pp. 1119
Author(s):  
Jyoti Mor ◽  
Dr Dinesh Rai ◽  
Dr Naresh Kumar

In a large collection of web pages, it is difficult for search engines to keep their online repository updated. Major search engines have hundreds of web crawlers that crawl the WWW day and night and send the downloaded web pages over a network to be stored in the search engine's database. This results in over-utilization of shared network resources such as bandwidth and CPU cycles. This paper proposes an architecture that tries to reduce the utilization of shared network resources with the help of an advanced XML-based approach. This focused-crawling-based architecture is trained to download only high-quality data from the internet, leaving behind web pages that are not relevant to the desired domain. A detailed layout of the proposed system is described, which is capable of reducing the load on the network and the problems arising from the residency of mobile agents at the remote server.
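
To make the idea concrete, here is a minimal sketch of a focused crawler's relevance gate; the keyword weights and threshold are hypothetical stand-ins for the trained, XML-based relevance model the paper describes.

```python
# Minimal sketch of a focused crawler's relevance gate: a candidate page is
# kept (and its links expanded) only when its text scores above a threshold
# for the target domain, pruning off-topic branches of the web. The keyword
# weights are an assumed stand-in for the paper's trained relevance model.
import urllib.request
from html.parser import HTMLParser

DOMAIN_TERMS = {"crawler": 2.0, "search": 1.5, "index": 1.0}  # assumed weights
THRESHOLD = 3.0                                               # assumed cutoff

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.words = []
    def handle_data(self, data):
        self.words.extend(data.lower().split())

def relevance(html: str) -> float:
    parser = TextExtractor()
    parser.feed(html)
    return sum(DOMAIN_TERMS.get(w, 0.0) for w in parser.words)

def fetch_if_relevant(url: str) -> str | None:
    html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    # Store the page and queue its links only if it looks on-topic.
    return html if relevance(html) >= THRESHOLD else None
```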


2002 ◽  
Vol 63 (4) ◽  
pp. 354-365 ◽  
Author(s):  
Susan Augustine ◽  
Courtney Greene

Have Internet search engines influenced the way students search library Web pages? The results of this usability study reveal that students consistently and frequently use the library Web site's internal search engine to find information rather than navigating through pages. If students are searching rather than navigating, library Web page designers must make metadata and powerful search engines priorities. The study also shows that students have difficulty interpreting library terminology, are confused by the differences among library resources, and prefer to seek human assistance when encountering problems online. These findings imply that library Web sites have not alleviated some of the basic and long-standing problems that have challenged librarians in the past.


2013 ◽  
Vol 303-306 ◽  
pp. 2311-2316
Author(s):  
Hong Shen Liu ◽  
Peng Fei Wang

The structure and content of research search engines are presented; their core technology is the analysis of web pages. The characteristics of analyzing web pages within a single website are studied: the relations between the web pages a crawler obtained at two different times can be established, and the information that changed between them is easily found. A new method of analyzing web pages within one website is introduced, which analyzes pages using this changed information. The results of applying the method show that it is effective for the analysis of web pages.
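
A minimal sketch of the underlying comparison, assuming two crawl snapshots of one website keyed by URL, might look as follows; the snapshot dictionaries and the hashing choice are illustrative, not the paper's actual method.

```python
# Minimal sketch: compare two crawl snapshots of one website (URL -> HTML)
# to find the pages added, removed, or changed between the two crawl times.
# The snapshot dictionaries are hypothetical inputs.
import hashlib

def digest(html: str) -> str:
    return hashlib.sha256(html.encode("utf-8")).hexdigest()

def diff_crawls(old: dict[str, str], new: dict[str, str]):
    added = set(new) - set(old)
    removed = set(old) - set(new)
    changed = {url for url in set(old) & set(new)
               if digest(old[url]) != digest(new[url])}
    return added, removed, changed

old = {"/a": "<p>v1</p>", "/b": "<p>same</p>"}
new = {"/a": "<p>v2</p>", "/b": "<p>same</p>", "/c": "<p>new</p>"}
print(diff_crawls(old, new))  # ({'/c'}, set(), {'/a'})
```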


2019 ◽  
Author(s):  
Erin Michelle Buchanan ◽  
Sarah E Crain ◽  
Ari L. Cunningham ◽  
Hannah Rose Johnson ◽  
Hannah Elyse Stash ◽  
...  

As researchers embrace open and transparent data sharing, they will need to provide information about their data that effectively helps others understand its contents. Without proper documentation, data stored in online repositories such as OSF will often be rendered unfindable and unreadable by other researchers and indexing search engines. Data dictionaries and codebooks provide a wealth of information about variables, data collection, and other important facets of a dataset. This information, called metadata, provides key insights into how the data might be further used in research and facilitates search engine indexing to reach a broader audience of interested parties. This tutorial first explains the terminology and standards surrounding data dictionaries and codebooks. We then present a guided workflow of the entire process from source data (e.g., survey answers on Qualtrics) to an openly shared dataset accompanied by a data dictionary or codebook that follows an agreed-upon standard. Finally, we explain how to use freely available web applications to assist this process of ensuring that psychology data are findable, accessible, interoperable, and reusable (FAIR; Wilkinson et al., 2016).
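
As a small illustration of what such metadata can look like, the following sketch derives a skeleton data dictionary from a tabular dataset with pandas; the tutorial's own freely available web applications do this interactively, and the column names here are hypothetical.

```python
# Minimal sketch: derive a skeleton data dictionary from a tabular dataset.
# pandas summarizes each variable; descriptions are left for the data's
# authors to fill in. The example survey columns are hypothetical.
import pandas as pd

def make_codebook(df: pd.DataFrame) -> pd.DataFrame:
    rows = []
    for col in df.columns:
        rows.append({
            "variable": col,
            "type": str(df[col].dtype),
            "n_missing": int(df[col].isna().sum()),
            "example": df[col].dropna().iloc[0] if df[col].notna().any() else None,
            "description": "",  # to be written by the data's authors
        })
    return pd.DataFrame(rows)

survey = pd.DataFrame({"age": [25, 31, None], "condition": ["A", "B", "A"]})
print(make_codebook(survey))
```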


Author(s):  
Le Khanh Trinh ◽  
Vo Dinh Hieu ◽  
Pham Ngoc Hung

Automated user interaction testing of Web applications has received great attention from the research community and industry. Several available tools partly deal with the problem; however, how to perform automated user interaction testing of whole Web applications effectively is still an open problem. This research proposes a method, and develops a supporting tool, for automated user interaction testing of whole Web applications. In this method, the model of each Web page of the Web application under test, which describes the user interaction (UI), is represented by a finite state automaton. The model describing the behavior of the whole Web application is then constructed by composing the models of all Web pages. After that, test paths are generated automatically from the compositional model so that they cover all possible user interactions of the application. A tool supporting the proposed method has been developed and applied to some simple Web applications. The experimental results show the potential of this tool for automated user interaction testing of Web applications in practice.
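
A minimal sketch of this modeling approach, assuming simple dictionary-encoded automata and hypothetical page and action names, might look as follows.

```python
# Minimal sketch: pages as finite state automata (state -> action -> state),
# composed into one whole-application model, with test paths enumerated by
# depth-first search until every transition is covered. Page and action
# names are hypothetical.
def compose(pages: list[dict]) -> dict:
    """Union the per-page automata into one transition map."""
    model = {}
    for page in pages:
        for state, actions in page.items():
            model.setdefault(state, {}).update(actions)
    return model

def test_paths(model: dict, start: str) -> list[list[str]]:
    """DFS from `start`, emitting a path each time a new transition is taken."""
    paths, seen = [], set()

    def dfs(state, path):
        for action, nxt in model.get(state, {}).items():
            edge = (state, action, nxt)
            if edge in seen:
                continue
            seen.add(edge)
            new_path = path + [f"{state} --{action}--> {nxt}"]
            paths.append(new_path)
            dfs(nxt, new_path)

    dfs(start, [])
    return paths

login = {"login": {"submit": "home", "reset": "login"}}
home = {"home": {"open_profile": "profile", "logout": "login"}}
profile = {"profile": {"back": "home"}}
for p in test_paths(compose([login, home, profile]), "login"):
    print(" / ".join(p))
```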

