Classification of Spamming Attacks to Blogging Websites and Their Security Techniques

Author(s):  
Rizwan Ur Rahman ◽  
Rishu Verma ◽  
Himani Bansal ◽  
Deepak Singh Tomar

With the explosive expansion of information on the World Wide Web, search engines are becoming more significant in people's day-to-day lives. Even though a search engine generally returns a huge number of results for a given query, the majority of users view only the first few web pages in the result list. Consequently, ranking position has become a major concern of internet service providers. This article addresses the vulnerabilities, spamming attacks, and countermeasures in blogging sites. The first part explores the types of spamming and gives a detailed account of vulnerabilities. The next part presents an attack scenario of form spamming together with a defense approach. The aim of this article is thus to review the vulnerabilities and spamming threats associated with blogging websites, and effective measures to counter them.

2018 ◽  
Vol 6 (3) ◽  
pp. 67-78
Author(s):  
Tian Nie ◽  
Yi Ding ◽  
Chen Zhao ◽  
Youchao Lin ◽  
Takehito Utsuro

This article addresses the problem of how to give an overview of the knowledge related to a given query keyword. In particular, the authors focus on the concerns of those who search for web pages with that keyword. The web search information needs for a given query keyword are collected through search engine suggests. Given a query keyword, the authors collect up to around 1,000 suggests, many of which are redundant. They classify the redundant search engine suggests using a topic model. However, one limitation of topic model based classification is that the granularity of the topics, i.e., the clusters of search engine suggests, is too coarse. To overcome this coarse-grained classification, the article further applies the word embedding technique to the web pages used during the training of the topic model, in addition to the text data of the whole Japanese version of Wikipedia. The authors then examine the word embedding based similarity between search engine suggests and further classify the suggests within a single topic into finer-grained subtopics based on the similarity of their word embeddings. Evaluation results show that the proposed approach performs well in the task of subtopic classification of search engine suggests.
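The subtopic step described above can be illustrated with a minimal sketch. It is not the authors' method: the greedy threshold grouping, the `threshold` value, the example suggests, and the three-dimensional toy vectors are all assumptions made for illustration. Real word embeddings would come from a trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def group_by_similarity(vectors, threshold=0.8):
    """Greedily group items: join the first cluster whose first member is
    similar enough, otherwise start a new cluster (an assumed, simple scheme)."""
    clusters = []
    for name, vec in vectors.items():
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

# Hypothetical suggests within one topic, with made-up embedding vectors.
SUGGESTS = {
    "job hunting interview": [0.9, 0.1, 0.0],
    "job hunting resume": [0.8, 0.2, 0.1],
    "job hunting suit": [0.1, 0.9, 0.2],
    "job hunting shoes": [0.0, 0.8, 0.3],
}

subtopics = group_by_similarity(SUGGESTS)
```

With these toy vectors, the interview and resume suggests fall into one subtopic and the suit and shoes suggests into another.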


Author(s):  
Leslie S. Hiraoka

Development of the search engine as a major information and marketing channel resulted from innovative technologies that made it capable of presenting rapid, relevant responses to queries. To do this, the search engine compiles an index of web pages of information stored on the World Wide Web, ranks each page according to its incoming links, matches keywords in the query to those in its index, and returns what it determines are the most relevant pages to the searcher. Innovative and cost-effective ad placement algorithms have attracted advertisers to search engine websites and intensified the competitive dynamics among industry leaders. Their interacting software also continues to draw advertisers from traditional, mass marketing channels like television and newspapers to the online medium to cater to customers who have expressed an interest in their products and services.
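The pipeline in this abstract (build an index, rank pages by incoming links, match query keywords against the index) can be sketched in a few lines. This is a toy illustration under stated assumptions: the `PAGES` corpus, the function names, and the use of a raw incoming-link count as the ranking signal are hypothetical simplifications, not a description of any production engine.

```python
from collections import defaultdict

# Hypothetical toy corpus: page id -> (text, outgoing links).
PAGES = {
    "a": ("search engine index ranking", ["b", "c"]),
    "b": ("search advertising online", ["c"]),
    "c": ("search engine advertising", []),
}

def build_index(pages):
    """Map each keyword to the set of pages containing it (an inverted index)."""
    index = defaultdict(set)
    for page_id, (text, _links) in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def count_incoming_links(pages):
    """Count how many other pages link to each page."""
    incoming = {page_id: 0 for page_id in pages}
    for page_id, (_text, links) in pages.items():
        for target in links:
            if target in incoming and target != page_id:
                incoming[target] += 1
    return incoming

def search(query, pages):
    """Return pages containing every query keyword, most-linked first."""
    index = build_index(pages)
    incoming = count_incoming_links(pages)
    words = query.lower().split()
    if not words:
        return []
    matches = set(pages)
    for word in words:
        matches &= index.get(word, set())
    # Ties on link count are broken alphabetically by page id.
    return sorted(matches, key=lambda p: (-incoming[p], p))

results = search("search engine", PAGES)
```

Here `results` is `["c", "a"]`: both pages contain both keywords, and page `c` has two incoming links while page `a` has none.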


2016 ◽  
Vol 6 (2) ◽  
pp. 41-65 ◽  
Author(s):  
Sheetal A. Takale ◽  
Prakash J. Kulkarni ◽  
Sahil K. Shah

The information available on the internet is huge, diverse, and dynamic. Current search engines act as intelligent assistants to internet users: for a query, they provide a listing of the best matching or most relevant web pages. However, the information for a query is often spread across multiple pages returned by the search engine, which degrades the quality of search results. Search engines are thus drowning in information but starving for knowledge. Here, we present a query-focused extractive summarization of search engine results. We propose a two-level summarization process: identification of relevant theme clusters, and selection of top-ranking sentences to form a summarized result for the user query. A new approach to semantic similarity computation using semantic roles and semantic meaning is proposed. Document clustering is achieved by applying the MDL principle, and sentence clustering and ranking are done using SNMF. The experiments conducted demonstrate the effectiveness of the system in semantic text understanding, document clustering, and summarization.


Author(s):  
Wen-Chen Hu ◽  
Hung-Jen Yang ◽  
Jyh-haw Yeh ◽  
Chung-wei Lee

The World Wide Web now holds more than six billion pages covering almost all daily issues. The Web’s fast growing size and lack of structural style present a new challenge for information retrieval (Lawrence & Giles, 1999a). Traditional search techniques are based on users typing in search keywords which the search services can then use to locate the desired Web pages. However, this approach normally retrieves too many documents, of which only a small fraction are relevant to the users’ needs. Furthermore, the most relevant documents do not necessarily appear at the top of the query output list. Numerous search technologies have been applied to Web search engines; however, the dominant search methods have yet to be identified. This article provides an overview of the existing technologies for Web search engines and classifies them into six categories: i) hyperlink exploration, ii) information retrieval, iii) metasearches, iv) SQL approaches, v) content-based multimedia searches, and vi) others. At the end of this article, a comparative study of major commercial and experimental search engines is presented, and some future research directions for Web search engines are suggested. Related Web search technology review can also be found in Arasu, Cho, Garcia-Molina, Paepcke, and Raghavan (2001) and Lawrence and Giles (1999b).


2002 ◽  
pp. 145-152 ◽  
Author(s):  
Fiona Fui-Hoon Nah

The explosive expansion of the World Wide Web (WWW) is the biggest event on the Internet. Since its public introduction in 1991, the WWW has become an important channel for electronic commerce, information access, and publication. However, the long waiting time for accessing web pages has become a critical issue, especially with the popularity of multimedia technology and the exponential increase in the number of Web users. Although various technologies and techniques have been implemented to alleviate the situation and to placate impatient users, there is still a need for fundamental research into what constitutes an acceptable waiting time for a typical WWW user. This research not only evaluates Nielsen’s hypothesis of 15 seconds as the maximum waiting time of WWW users, but also provides approximate distributions of the waiting time of WWW users.


2017 ◽  
Author(s):  
Xi Zhu ◽  
Xiangmiao Qiu ◽  
Dingwang Wu ◽  
Shidong Chen ◽  
Jiwen Xiong ◽  
...  

BACKGROUND Electronic health tools such as apps and software all rely on web search engines because of their convenience for obtaining information. The success of electronic health is linked to the success of web search engines in the health field. Yet the reliability of information in search engine results remains to be evaluated. A detailed analysis can identify shortcomings and suggest improvements. OBJECTIVE To determine the reliability of information related to epilepsy in women in the search results of the main search engines in China. METHODS Six physicians conducted the searches every week. The search keywords were one antiepileptic drug (AED: valproic acid, oxcarbazepine, levetiracetam, or lamotrigine) plus "huaiyun" or "renshen", both of which mean pregnancy in Chinese. The searches were conducted on different devices (computer and cellphone) and different engines (Baidu, Sogou, and 360). The top ten results of every search result page were included. Two physicians classified every result into one of 9 categories according to its content and also evaluated its reliability. RESULTS A total of 16411 search results were included. 85.1% of web pages carried advertisements. 55% were categorized as questions and answers according to their content. Only 9% of the search results were reliable, 50.7% were partly reliable, and 40.3% were unreliable. The higher the ranking of the search results, the more advertising appeared and the larger the proportion of unreliable results. None of the content from hospital websites was reliable, and all content from academic publishing was reliable. CONCLUSIONS Several first principles must be emphasized to further the use of web search engines in healthcare. First, identifying registered physicians and developing an efficient system to guide patients to physicians guarantee the quality of the information provided. Second, the relevant authorities should restrict excessive advertisement sales in the healthcare area through specific regulations to avoid a negative impact on patients. Third, information from hospital websites should be carefully judged before it is embraced wholeheartedly.


Author(s):  
GAURAV AGARWAL ◽  
SACHI GUPTA ◽  
SAURABH MUKHERJEE

Today, web servers are the key repositories of information, and the internet is the means of getting to it. There is a mammoth amount of data on the internet, and finding the relevant data is a difficult job. Search engines play a vital role in finding it. A search engine follows these steps: web crawling by the crawler, indexing by the indexer, and searching by the searcher. The web crawler retrieves the information on web pages by following every link on a site. This information is stored by the search engine, and the content of each web page is then indexed by the indexer. The main role of the indexer is to make data quickly retrievable according to user requirements. When the client gives a query, the search engine searches for the results corresponding to that query. The aim here is to develop a search engine algorithm that returns the most desirable results for the user's requirements. For this, the search engine uses a ranking method to rank the web pages. Various ranking approaches are discussed in the literature; this paper proposes a ranking algorithm based on the parent-child relationship. The proposed algorithm is based on the priority assignment phase of the Heterogeneous Earliest Finish Time (HEFT) algorithm, which was designed for multiprocessor task scheduling. It works on three variables: the density of keywords, the number of successors of a node, and the age of the web page. Density is the occurrence of the keyword on a particular web page. The number of successors represents the outgoing links of a single web page. Age is the freshness value of the web page: the most recently modified page is the freshest and has the smallest age, or largest freshness value. The proposed technique requires the priority of each page to be set from its downward rank value, and pages are arranged in ascending or descending order of their rank values. Experiments show that the algorithm is valuable. In a comparison with Google, the algorithm performed better on 70% of the problems.
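To make the three variables and the downward rank concrete, here is a rough sketch. The abstract does not say how density, successor count, and age are combined, so the unweighted sum in `page_weight`, the `freshness` formula, the zero communication cost, and the `PAGES` data are all assumptions. The sketch also assumes the link graph is acyclic, which real web graphs are not.

```python
from functools import lru_cache

# Hypothetical pages: text, outgoing links (successors), and age in days.
PAGES = {
    "home": {"text": "web ranking ranking guide", "links": ["docs", "blog"], "age_days": 1},
    "docs": {"text": "ranking reference", "links": ["blog"], "age_days": 9},
    "blog": {"text": "news and notes", "links": [], "age_days": 4},
}

def keyword_density(text, keyword):
    """Fraction of words on the page equal to the keyword."""
    words = text.lower().split()
    return words.count(keyword.lower()) / len(words) if words else 0.0

def freshness(age_days):
    """Assumed freshness value: larger for recently modified pages."""
    return 1.0 / (1.0 + age_days)

def page_weight(page, keyword):
    """Assumed combination: unweighted sum of the three variables."""
    return (keyword_density(page["text"], keyword)
            + len(page["links"])
            + freshness(page["age_days"]))

def downward_ranks(pages, keyword):
    """HEFT-style downward rank with zero communication cost:
    rank(n) = max over parents p of rank(p) + weight(p); entry pages get 0."""
    parents = {pid: [] for pid in pages}
    for pid, page in pages.items():
        for child in page["links"]:
            if child in parents:
                parents[child].append(pid)

    @lru_cache(maxsize=None)
    def rank(pid):
        return max((rank(p) + page_weight(pages[p], keyword) for p in parents[pid]),
                   default=0.0)

    return {pid: rank(pid) for pid in pages}

ranks = downward_ranks(PAGES, "ranking")
ordered = sorted(ranks, key=ranks.get)  # ascending order of rank value
```

With this data, `ordered` is `["home", "docs", "blog"]`. Whether ascending or descending order is the useful one depends on how the weights are defined, which the abstract leaves open.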


2013 ◽  
Vol 8 (3) ◽  
pp. 913-921 ◽  
Author(s):  
Noryusliza Abdullah ◽  
Rosziati Ibrahim

The Semantic Web approach, assisted by ontologies, is widely used to build more reliable applications for retrieving information and knowledge. It is capable of discovering World Wide Web (WWW) content that is presented in natural-language text. Previous research has shown that incorporating categorization with the ontology concept gives better results. Hybridizing the search engine with a further technique, user profiling, shows promising potential for enhancing the search process. The contributions of this research are better use of search time and more relevant results. The proposed hybrid technique integrates ontologies, categorization, and the user profiling concept. In user profiling, a similarity measure is adopted to compare two different ontologies. WordNet and UTHM Onto are the independent ontologies used in this process. The preliminary experimental results are interesting in terms of data arrangement and time usage.


2019 ◽  
Vol 12 (2) ◽  
pp. 110-119 ◽  
Author(s):  
Jayaraman Sethuraman ◽  
Jafar A. Alzubi ◽  
Ramachandran Manikandan ◽  
Mehdi Gheisari ◽  
Ambeshwar Kumar

Background: The World Wide Web houses an abundance of information that is used every day by billions of users across the world to find relevant data. Website owners employ webmasters to ensure their pages are ranked at the top of search engine result pages. However, understanding how a search engine ranks a website, which comprises numerous web pages, among the top ten or twenty websites is a major challenge. Although systems have been developed to understand the ranking process, a specialized tool-based approach has not been tried. Objective: This paper develops a new framework and system that processes website contents to determine search engine optimization factors. Methods: To analyze web pages dynamically by assessing website content based on specific keywords, an elimination method was used in an attempt to reveal various search engine optimization techniques. Conclusion: Our results lead us to conclude that the developed system is able to perform a deeper analysis and find the factors that play a role in bringing a site to the top of the list.



