Performance Analysis of Elastic Search Technique in Identification and Removal of Duplicate Data

Elasticsearch is a way to organize data and make it easily accessible. It is a server-based search engine built on Lucene: highly scalable, distributed, and full-text. Elasticsearch is developed in Java and published as open source under the terms of the Apache License. It is the most popular enterprise search engine and incorporates advances in speed, security, scalability, and hardware efficiency. Elasticsearch is a tool for querying written words; it can perform other smart tasks, but its principal function is returning text similar to a given query and performing statistical analyses over a quantity of text. Elasticsearch is a standalone database server, written in Java and using an HTTP/JSON protocol: it takes data, optimizes it for language-based searches, and stores it in a sophisticated format. It is very convenient, supporting clustering and leader election out of the box, whether the task is searching a database of trade products by description or finding similar text in a body of crawled web pages. In this manuscript, the performance of Elasticsearch techniques for identifying and removing duplicate data is analyzed.
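As a minimal sketch of the duplicate-identification idea the manuscript studies (not its actual experimental setup), the snippet below uses the official Python client and a terms aggregation over a hypothetical content_hash field to find documents indexed more than once, then removes all but one copy. The index name, field name, and connection URL are assumptions for illustration.

```python
# Sketch: finding and removing duplicate documents in Elasticsearch.
# Assumes each document carries a "content_hash" keyword field (e.g., an
# MD5 of its normalized text); index name and URL are hypothetical.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# A terms aggregation with min_doc_count=2 returns only hash values that
# occur in more than one document, i.e., duplicates.
resp = es.search(
    index="products",
    body={
        "size": 0,
        "aggs": {
            "dupes": {
                "terms": {"field": "content_hash", "min_doc_count": 2, "size": 1000}
            }
        },
    },
)

for bucket in resp["aggregations"]["dupes"]["buckets"]:
    # Fetch every document sharing this hash, keep the first, delete the rest.
    hits = es.search(
        index="products",
        body={"query": {"term": {"content_hash": bucket["key"]}}},
    )["hits"]["hits"]
    for hit in hits[1:]:
        es.delete(index="products", id=hit["_id"])
```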

2020 ◽  
pp. 302-321
Author(s):  
Giacomo Cabri ◽  
Riccardo Martoglia

This article describes how, in addition to general-purpose search engines, specialized search engines have appeared and gained their share of the market. An enterprise search engine enables search over enterprise information, mainly web pages but also other kinds of documents; the search is performed by people inside the enterprise or by customers. This article proposes an enterprise search engine called AMBIT-SE that relies on two enhancements: first, it is user-aware, in the sense that it takes into consideration the profile of the users who perform the query; second, it exploits semantic techniques to consider not only exact matches but also synonyms and related terms. It performs two main activities: (1) information processing, to analyse the documents and build the user profile, and (2) search and retrieval, to find information that matches the user's query and profile. An experimental evaluation of the proposed approach is performed on different real websites, showing its benefits over other well-established approaches.
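The semantic enhancement described here, matching synonyms and related terms rather than exact words only, can be illustrated with a small WordNet-based query expansion. This is a generic sketch of the technique, not AMBIT-SE's actual implementation; the function name and example terms are invented.

```python
# Illustrative semantic query expansion with WordNet synonyms; a generic
# stand-in for the synonym-matching idea, not AMBIT-SE's real code.
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def expand_query(terms):
    """Return the original terms plus their WordNet synonyms."""
    expanded = set(terms)
    for term in terms:
        for synset in wn.synsets(term):
            for lemma in synset.lemmas():
                expanded.add(lemma.name().replace("_", " ").lower())
    return expanded

# A document now matches if it contains any original term OR a synonym,
# so a query for "car dealer" can also retrieve pages about "automobiles".
print(expand_query(["car", "dealer"]))
```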




2013 ◽  
Vol 25 ◽  
pp. 189-203 ◽  
Author(s):  
Dominik Schlosser

This paper attempts to give an overview of the different representations of the pilgrimage to Mecca found in the ‘liminal space’ of the internet. For that purpose, it examines a handful of emblematic examples of how the hajj is being presented and discussed in cyberspace. Special attention is paid to the question of how far issues of religious authority are manifest on these websites: whether the content providers of web pages appoint themselves as authorities by scrutinizing established views of the fifth pillar of Islam, whether they upload already printed texts onto their sites in order to reiterate normative notions of the pilgrimage to Mecca, or whether they make use of search engine optimisation techniques, thus heightening the visibility of their online presence and increasing the possibility of becoming authoritative in shaping internet surfers’ perceptions of the hajj.


2016 ◽  
Author(s):  
Paolo Corti ◽  
Benjamin G Lewis ◽  
Tom Kralidis ◽  
Jude Mwenda

A Spatial Data Infrastructure (SDI) is a framework of geospatial data, metadata, users and tools intended to provide the most efficient and flexible way to use spatial information. One of the key software components of an SDI is the catalogue service, needed to discover, query and manage the metadata. Catalogue services in an SDI are typically based on the Open Geospatial Consortium (OGC) Catalogue Service for the Web (CSW) standard, which defines common interfaces for accessing the metadata. A search engine is a software system able to perform very fast and reliable search, with features such as full-text search, natural language processing, weighted results, fuzzy-tolerant matching, faceting, hit highlighting and many others. The Center for Geographic Analysis (CGA) at Harvard University is working to integrate the benefits of both worlds (OGC catalogues and search engines) within its public-domain SDI, named WorldMap. Harvard Hypermap (HHypermap) is a component that will be part of WorldMap, built entirely on an open-source stack, implementing an OGC catalogue based on pycsw to provide access to metadata in a standard way, and a search engine based on Solr/Lucene to provide the advanced search features typically found in search engines.
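A brief sketch of the kind of advanced full-text query a Solr-backed catalogue layer can serve, combining fuzzy matching, faceting, and hit highlighting in one request. The Solr URL, core name ("hhypermap"), and field names are assumptions for illustration, not the project's real endpoint or schema.

```python
# Sketch of a Solr query with fuzzy matching, faceting, and highlighting;
# URL, core, and fields are hypothetical.
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/hhypermap", timeout=10)

# "lake~1" tolerates one character edit (fuzzy search); we also facet on
# a layer_type field and highlight matches in the abstract field.
results = solr.search(
    "title:lake~1",
    **{
        "facet": "true",
        "facet.field": "layer_type",
        "hl": "true",
        "hl.fl": "abstract",
    },
)

for doc in results:
    print(doc.get("title"))
print("facets:", results.facets)
```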


2012 ◽  
Vol 02 (04) ◽  
pp. 106-109 ◽  
Author(s):  
Rujia Gao ◽  
Danying Li ◽  
Wanlong Li ◽  
Yaze Dong

Author(s):  
Rizwan Ur Rahman ◽  
Rishu Verma ◽  
Himani Bansal ◽  
Deepak Singh Tomar

With the explosive expansion of information on the world wide web, search engines are becoming more significant in the day-to-day lives of humans. Even though a search engine generally returns a huge number of results for a given query, the majority of search engine users simply view the first few web pages in the result list. Consequently, ranking position has become a prime concern of internet service providers. This article addresses the vulnerabilities, spamming attacks, and countermeasures in blogging sites. The first part explores the types of spamming and gives a detailed treatment of vulnerabilities. The next part presents an attack scenario of form spamming along with a defense approach. The aim of this article is thus to provide a review of the vulnerabilities and spamming threats associated with blogging websites, and of effective measures to counter them.
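As a sketch of a classic form-spamming defense of the kind this line of work discusses (a honeypot field plus a minimum fill time), the snippet below shows the server-side checks; it is a common generic countermeasure, not necessarily the article's exact approach. Flask, the field names, and the 3-second threshold are illustrative choices.

```python
# Sketch of a honeypot + timing defense against form spam; framework,
# field names, and threshold are illustrative assumptions.
import time
from flask import Flask, request, abort

app = Flask(__name__)

@app.route("/comment", methods=["POST"])
def comment():
    # The "website_url" field is hidden via CSS, so humans leave it empty;
    # naive bots fill every field and expose themselves.
    if request.form.get("website_url"):
        abort(400)
    # Bots also tend to submit forms faster than any human could type.
    rendered_at = float(request.form.get("rendered_at", 0))
    if time.time() - rendered_at < 3:
        abort(400)
    return "comment accepted", 200
```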


Author(s):  
Ravi P. Kumar ◽  
Ashutosh K. Singh ◽  
Anand Mohan

In this era of Web computing, cyber security is very important as more and more data moves onto the Web. Some of these data are confidential and important, and they face many threats. Some of the basic threats can be addressed by designing web sites properly using Search Engine Optimization techniques. One such threat is the hanging page, which gives room for link spamming. This chapter addresses the issues caused by hanging pages in Web computing and has four main objectives: 1) compare and review the different types of link-structure-based ranking algorithms for ranking web pages, with PageRank used as the base algorithm throughout the chapter; 2) study hanging pages, explore their effects on Web security, and compare the existing methods for handling them; 3) study link spam and explore the contribution of hanging pages to it; and 4) study Search Engine Optimization (SEO) / Web Site Optimization (WSO) and explore the effect of hanging pages on SEO.
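The sketch below illustrates the base algorithm the chapter builds on: power-iteration PageRank, with the standard fix for hanging (dangling) pages whose rank mass is redistributed uniformly. The tiny four-page graph is an invented example, not data from the chapter.

```python
# Power-iteration PageRank on a tiny invented graph; page "d" is a
# hanging (dangling) page with no out-links. Without the redistribution
# step below, its rank mass would leak out of the system at every step.
import numpy as np

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}  # "d" hangs
pages = sorted(links)
n = len(pages)
idx = {p: i for i, p in enumerate(pages)}

# Column-stochastic transition matrix; dangling columns stay all-zero.
M = np.zeros((n, n))
for src, outs in links.items():
    for dst in outs:
        M[idx[dst], idx[src]] = 1.0 / len(outs)

d = 0.85                      # damping factor
r = np.full(n, 1.0 / n)       # uniform initial rank
dangling = [i for i, p in enumerate(pages) if not links[p]]
for _ in range(100):
    # Redistribute dangling mass uniformly, then damp with teleportation.
    dangling_mass = r[dangling].sum()
    r = d * (M @ r + dangling_mass / n) + (1 - d) / n

print(dict(zip(pages, r.round(4))))
```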


Author(s):  
Oğuzhan Menemencioğlu ◽  
İlhami Muharrem Orak

The semantic web aims to produce machine-readable data and to deal with large amounts of data. The most important tool for accessing the data that exists on the web is the search engine. Traditional search engines are insufficient in the face of the amount of data contained in existing web pages. Semantic search engines are extensions of traditional engines and overcome the difficulties those engines face. This paper summarizes the semantic web, the concepts and infrastructure of traditional and semantic search engines, and details semantic search approaches. A summary of the literature is provided, touching on trends: the types of applications and the areas they address are considered. Based on data for two different years, the trends on these points are analyzed and the impacts of the changes are discussed. The analysis shows that the evolution of the semantic web continues and that new applications and areas keep emerging. Multimedia retrieval is a new scope of semantic search; hence, multimedia retrieval approaches are discussed, and text and multimedia retrieval are analyzed within semantic search.


2016 ◽  
Vol 6 (2) ◽  
pp. 41-65 ◽  
Author(s):  
Sheetal A. Takale ◽  
Prakash J. Kulkarni ◽  
Sahil K. Shah

Information available on the internet is huge, diverse and dynamic. Current search engines perform the task of intelligently helping internet users: for a query, they provide a listing of the best-matching or most relevant web pages. However, information for the query is often spread across multiple pages returned by the search engine, which degrades the quality of the search results. The search engines are drowning in information but starving for knowledge. Here, we present query-focused extractive summarization of search engine results. We propose a two-level summarization process: identification of relevant theme clusters, and selection of top-ranking sentences to form a summarized result for the user query. A new approach to semantic similarity computation using semantic roles and semantic meaning is proposed. Document clustering is effectively achieved by application of the MDL principle, and sentence clustering and ranking are done using SNMF. Experiments conducted demonstrate the effectiveness of the system in semantic text understanding, document clustering and summarization.
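The two-level idea, clustering sentences into themes and then picking a top sentence per theme, can be sketched as below. Plain NMF from scikit-learn is used as a simple stand-in for the paper's SNMF, and the sentences and cluster count are invented examples.

```python
# Illustrative sketch: cluster sentences into themes with NMF (a simple
# stand-in for SNMF) and extract the strongest sentence per theme.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

sentences = [
    "Elasticsearch is a distributed full-text search engine.",
    "Lucene provides the indexing core used by Elasticsearch.",
    "PageRank scores pages by the structure of incoming links.",
    "Link spam tries to inflate PageRank artificially.",
]

X = TfidfVectorizer(stop_words="english").fit_transform(sentences)
W = NMF(n_components=2, random_state=0).fit_transform(X)  # sentence-theme weights

# Assign each sentence to its strongest theme; take the highest-weighted
# sentence per theme as the extractive summary.
themes = W.argmax(axis=1)
for t in range(2):
    members = np.where(themes == t)[0]
    best = members[W[members, t].argmax()]
    print(f"theme {t}: {sentences[best]}")
```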


2019 ◽  
Vol 16 (9) ◽  
pp. 3712-3716
Author(s):  
Kailash Kumar ◽  
Abdulaziz Al-Besher

This paper examines the overlap of the results retrieved from three major search engines, namely Google, Yahoo and Bing. A rigorous analysis of overlap among these search engines was conducted on 100 random queries. The first ten web pages of results, i.e., a hundred results from each search engine, were considered, and only non-sponsored results were taken into account. Search engines have their own update frequencies and rank results by their own relevance measures; moreover, sponsored-search advertisers differ across search engines, and no single search engine can index all web pages. The overlap analysis of the results was carried out between October 1, 2018 and October 31, 2018 across Google, Yahoo and Bing. A framework was built in Java to analyze the overlap among these search engines. This framework eliminates the common results and merges them into a unified list. It also uses a ranking algorithm to re-rank the search engine results and display them back to the user.
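A compact sketch of the overlap-and-merge step described here: given top-ranked result lists from three engines, it counts URLs seen by more than one engine, collapses duplicates, and re-ranks by a simple Borda-style score. The paper's framework is in Java and its exact ranking algorithm is not specified; the scoring rule and URL lists below are illustrative assumptions.

```python
# Sketch: overlap measurement plus merge/re-rank of engine result lists;
# scoring scheme and URLs are invented for illustration.
from collections import defaultdict

results = {
    "google": ["u1", "u2", "u3", "u4"],
    "yahoo":  ["u2", "u5", "u1", "u6"],
    "bing":   ["u7", "u2", "u8", "u1"],
}

scores = defaultdict(float)
seen_in = defaultdict(int)
for engine, urls in results.items():
    for rank, url in enumerate(urls):
        scores[url] += len(urls) - rank   # better rank -> higher score
        seen_in[url] += 1

# URLs returned by more than one engine constitute the overlap.
overlap = [u for u, n in seen_in.items() if n > 1]
print("overlap across engines:", overlap)

# Unified list: duplicates collapsed, ordered by combined score.
unified = sorted(scores, key=scores.get, reverse=True)
print("re-ranked unified list:", unified)
```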

