Semi-Automatic Online Tagging with K-Medoid Clustering

Author(s):  
He Hu ◽  
Xiaoyong Du

Online tagging is crucial for the acquisition and organization of web knowledge. In this paper we present TYG (Tag-as-You-Go), a web browser extension for online tagging of personal knowledge on standard web pages. We investigate an approach that combines a K-Medoid-style clustering algorithm with user input to achieve semi-automatic web page annotation. The annotation process supports user-defined tagging schemas and includes an automatic mechanism, built on clustering techniques, that groups similar HTML DOM nodes into clusters corresponding to the user specification. TYG is a prototype system illustrating the proposed approach. Experiments with TYG show that our approach achieves both efficiency and effectiveness in real-world annotation scenarios.
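The abstract does not reproduce the clustering details; as a rough, hypothetical sketch (the DOM-node feature vectors and the distance function are illustrative, not taken from the paper), a K-Medoid-style grouping of nodes might look like:

```python
import random

def k_medoids(points, k, dist, iters=20, seed=0):
    """Plain K-Medoid (PAM-style) clustering: medoids are real data points."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest medoid's cluster.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: dist(p, points[m]))
            clusters[nearest].append(i)
        # Update step: each cluster's new medoid minimizes total in-cluster distance.
        new_medoids = [
            min(members, key=lambda c: sum(dist(points[c], points[j]) for j in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids, clusters

# Toy "DOM node" feature vectors: (depth in the tree, number of children).
nodes = [(1, 2), (1, 3), (5, 0), (6, 1), (5, 1)]
manhattan = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
medoids, clusters = k_medoids(nodes, 2, manhattan)
```

In a semi-automatic setting like TYG's, the user's tagged node would seed or correct such clusters rather than the random initialization used here.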

Mobile web sites differ drastically from their desktop equivalents in content, layout, and functionality. Consequently, existing techniques for detecting malicious web sites are unlikely to work for such pages. In this paper, we design and implement a mechanism that distinguishes between malicious and benign mobile web pages. The mechanism makes this determination based on static features of a page, ranging from the number of iframes to the presence of known fraudulent phone numbers. First, we experimentally demonstrate the need for mobile-specific techniques and then identify a range of new static features that correlate strongly with malicious mobile pages. We then apply the mechanism to a dataset of over 350,000 known benign and malicious mobile web pages and demonstrate 90% classification accuracy. In addition, we discover, identify, and report a number of web sites missed by Google Safe Browsing and VirusTotal but detected by our mechanism. Finally, we build a web browser extension using the mechanism to protect users from malicious mobile web sites in real time. In doing so, we provide the first real-time assessment technique for detecting malicious mobile web pages.
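None of the detection internals survive in this abstract; the toy sketch below (feature names, the threshold rule, and the blocklist entry are all hypothetical) only illustrates the flavor of static-feature checks such as iframe counts and known fraudulent phone numbers:

```python
FRAUD_NUMBERS = {"+1-900-555-0199"}  # illustrative blocklist entry, not real data

def extract_features(html):
    """Hypothetical static features in the spirit of the ones described above."""
    lowered = html.lower()
    return {
        "iframes": lowered.count("<iframe"),
        "scripts": lowered.count("<script"),
        "fraud_phone": any(n in html for n in FRAUD_NUMBERS),
    }

def is_suspicious(feats, iframe_limit=3):
    # A real system would train a classifier over many features;
    # this is only a toy threshold rule for illustration.
    return feats["fraud_phone"] or feats["iframes"] > iframe_limit

page = "<html><iframe src='a'></iframe>Call +1-900-555-0199 now!</html>"
print(is_suspicious(extract_features(page)))  # True: fraud number present
```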


2014 ◽  
Vol 519-520 ◽  
pp. 373-376
Author(s):  
Yi Tang ◽  
Zhao Kai Luo ◽  
Ji Zhang

A web page often contains objects that the hosting web server intends a browser to render. Rendering those objects can trigger network requests to foreign origins. Although the same-origin policy (SOP) limits access for foreign objects, web attackers can circumvent SOP controls by injecting unintended objects that smuggle sensitive data. In this paper, we propose UOFilter, a whitelist-based method to filter out unintended objects in web pages. We define a list-item structure that describes intended objects with optional integrity guarantees. UOFilter runs in the web browser, interprets the items, and blocks the network requests issued by unintended objects. We implement a proof-of-concept UOFilter prototype as a Chrome browser extension and validate it with experiments.
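The list-item structure with optional integrity guarantees resembles subresource-integrity pinning; a minimal Python sketch of the matching logic (URLs and digests are invented for illustration, and the real UOFilter item format is not specified here) could be:

```python
import hashlib

# Hypothetical whitelist: each item names an intended object by URL,
# optionally pinned to a SHA-256 digest of its body.
WHITELIST = [
    {"url": "https://cdn.example.com/app.js",
     "sha256": hashlib.sha256(b"console.log('hi')").hexdigest()},
    {"url": "https://img.example.com/logo.png", "sha256": None},
]

def allowed(url, body=None):
    """Return True only for requests matching a whitelist item."""
    for item in WHITELIST:
        if item["url"] == url:
            if item["sha256"] is None:
                return True  # intended object, no integrity pin
            # Integrity-pinned: the fetched body must match the digest.
            return body is not None and hashlib.sha256(body).hexdigest() == item["sha256"]
    return False  # unintended object: block the request

print(allowed("https://evil.example.net/x.js"))  # False: not whitelisted
```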


Nowadays the use of mobile phones is widespread in our daily lives; we use cell phones as a camera, a radio, a music player, and even a web browser. Since most web pages are created for desktop computers, navigating them on a phone is highly fatiguing. Hence, there is great interest in computer science in adapting such content-rich pages to the small screens of our mobile devices. On the other hand, every web page has many different parts that are not equally important to the end user. Consequently, the authors propose a mechanism to identify the part of a web page most useful to a user with respect to his or her search query, while avoiding information loss. The challenge here comes from the fact that long web contents cannot be easily displayed either vertically or horizontally.
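The authors' mechanism is not detailed in this abstract; as a hypothetical illustration of query-aware segment selection (the scoring function and the page segments are invented, not the authors' method), one could score page blocks by query-term overlap:

```python
def block_score(text, query):
    """Score a page segment by the fraction of query terms it contains."""
    words = set(text.lower().split())
    terms = query.lower().split()
    return sum(t in words for t in terms) / len(terms)

# Toy segmentation of a page into labeled blocks.
blocks = {
    "nav":  "home about contact sitemap",
    "body": "python tutorial for beginners with examples",
    "ads":  "buy cheap watches now",
}

# Pick the block most relevant to the user's search query.
best = max(blocks, key=lambda k: block_score(blocks[k], "python tutorial"))
print(best)  # "body"
```

A real system would also weigh structural cues (headings, DOM position, link density) rather than bag-of-words overlap alone.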


Author(s):  
Ben Choi

Web mining aims at searching, organizing, and extracting information on the Web; search engines focus on searching. The next stage of Web mining is the organization of Web contents, which will in turn facilitate the extraction of useful information from the Web. This chapter focuses on organizing Web contents. Since the majority of Web contents are stored in the form of Web pages, the chapter concentrates on techniques for automatically organizing Web pages into categories. Various artificial intelligence techniques have been used; the most successful are classification and clustering, and this chapter focuses on clustering. Clustering is well suited to Web mining because it automatically organizes Web pages into categories, each containing pages with similar contents. However, one problem in clustering is the lack of general methods for automatically determining the number of categories or clusters, and until now no such method suitable for Web page clustering has existed. To address this problem, this chapter describes a method to discover a constant factor that characterizes the Web domain and proposes a new method for automatically determining the number of clusters in Web page datasets. The chapter also proposes a new bi-directional hierarchical clustering algorithm, which arranges individual Web pages into clusters, then arranges the clusters into larger clusters, and so on until the average inter-cluster similarity approaches the constant factor. With the constant factor and the algorithm together, this chapter provides a new clustering system suitable for mining the Web.
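The chapter's bi-directional algorithm is not specified in this summary; as a hedged illustration, a plain agglomerative loop that stops when the best average inter-cluster similarity drops below a constant (standing in for the chapter's constant factor) might look like:

```python
def avg_sim(a, b, sim):
    """Average pairwise similarity between two clusters."""
    return sum(sim(x, y) for x in a for y in b) / (len(a) * len(b))

def agglomerate(items, sim, stop_sim):
    """Repeatedly merge the most similar cluster pair until the best
    average inter-cluster similarity falls below the stopping constant."""
    clusters = [[i] for i in items]
    while len(clusters) > 1:
        pairs = [(avg_sim(a, b, sim), i, j)
                 for i, a in enumerate(clusters)
                 for j, b in enumerate(clusters) if i < j]
        s, i, j = max(pairs)
        if s < stop_sim:
            break  # the constant factor decides the cluster count
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy 1-D "pages"; similarity decays with distance.
sim = lambda x, y: 1.0 / (1.0 + abs(x - y))
print(agglomerate([1, 2, 10, 11], sim, stop_sim=0.4))
```

The key idea mirrored here is that the stopping constant, not a user-chosen k, determines how many clusters emerge.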


2007 ◽  
Vol 2007 ◽  
pp. 1-5 ◽  
Author(s):  
Michael Bensch ◽  
Ahmed A. Karim ◽  
Jürgen Mellinger ◽  
Thilo Hinterberger ◽  
Michael Tangermann ◽  
...  

We have previously demonstrated that an EEG-controlled web browser based on self-regulation of slow cortical potentials (SCPs) enables severely paralyzed patients to browse the internet independently of any voluntary muscle control. However, this system had several shortcomings, among them that patients could only browse within a limited number of web pages and had to select links from an alphabetical list, causing problems if link names were identical or unknown to the user (as with graphical links). Here we describe a new EEG-controlled web browser, called Nessi, which overcomes these shortcomings. In Nessi, the open-source browser Mozilla was extended with graphical in-place markers, whereby different brain responses correspond to different frame colors placed around selectable items, enabling the user to select any link on a web page. Besides links, other interactive elements are accessible to the user, such as e-mail and virtual keyboards, opening up a wide range of hypertext-based applications.


2021 ◽  
Vol 1 (2) ◽  
pp. 319-339
Author(s):  
Jean Rosemond Dora ◽  
Karol Nemoga

In this work, we tackle a problem that frequently occurs in the cybersecurity field: the exploitation of websites by XSS attacks, which are nowadays considered a complicated class of attack. These attacks aim to execute malicious scripts in the client's web browser by injecting code into a legitimate web page. The matter becomes serious when a website accepts user input. Attackers can exploit a vulnerable web application and then steal sensitive data (session cookies, passwords, credit cards, etc.) from the server and/or the client, although the difficulty of exploitation varies from website to website. Our focus is on the usage of ontology in cybersecurity against XSS attacks, on the importance of ontology, and on its core meaning for cybersecurity. We explain how a vulnerable website can be exploited and how different JavaScript payloads can be used to detect vulnerabilities. We also enumerate some tools for efficient analysis. We present detailed reasoning on what can be done to improve the security of a website so that it resists attacks, and we provide supportive examples. Then, we apply an ontology model against XSS attacks to strengthen the protection of a web application. We note, however, that the existence of an ontology does not improve security by itself; it has to be properly used, and a maximum of security layers should be taken into account.
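As a supportive example of the defensive side discussed above (a generic illustration, not taken from the paper), context-aware output encoding neutralizes a script payload when user input is reflected into HTML body context:

```python
import html

def render_comment(user_input):
    """Encode user input for HTML body context, the standard first
    defense against reflected and stored XSS."""
    return "<p>" + html.escape(user_input, quote=True) + "</p>"

payload = "<script>alert(document.cookie)</script>"
print(render_comment(payload))
# <p>&lt;script&gt;alert(document.cookie)&lt;/script&gt;</p>
```

The browser now renders the payload as inert text; different contexts (attributes, URLs, JavaScript) require their own encoding rules.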


2020 ◽  
Author(s):  
Muralidhar Pantula ◽  
K S Kuppusamy

Evaluating the readability of web documents has gained attention due to several factors, such as improving the effectiveness of writing and reaching a wider audience. Current practices in this direction follow several statistical measures to evaluate the readability of a document. In this paper, we propose a machine learning-based model to compute the readability of web pages; the minimum educational standard (grade level) required to understand the contents of a web page is also computed. The proposed model classifies web pages as highly readable, readable, or less readable using a specified feature set. To classify a web page into these categories, we incorporate features such as sentence count, word count, syllable count, type-token ratio, and lexical ambiguity. To increase the usability of the proposed model, we have developed an accessible browser extension that assesses every web page loaded into the browser.
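Most of the listed features are straightforward to compute; the sketch below uses the standard Flesch-Kincaid grade formula as a stand-in for the paper's unspecified grade-level computation, and a crude vowel-group heuristic for syllables (real systems use pronunciation dictionaries):

```python
import re

def syllables(word):
    # Crude heuristic: count vowel groups, with a floor of one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def features(text):
    """Sentence, word, and syllable counts plus type-token ratio."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "sentences": len(sentences),
        "words": len(words),
        "syllables": sum(syllables(w) for w in words),
        "type_token_ratio": len({w.lower() for w in words}) / len(words),
    }

def flesch_kincaid_grade(f):
    # Standard FK grade-level formula over the counts above.
    return (0.39 * f["words"] / f["sentences"]
            + 11.8 * f["syllables"] / f["words"] - 15.59)

f = features("The cat sat. The cat ran away quickly.")
print(f, flesch_kincaid_grade(f))  # roughly grade 0.7 for this trivial text
```

In the paper's setting, such feature vectors would feed the classifier that labels pages highly readable, readable, or less readable.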


Author(s):  
Shashank Gupta ◽  
B. B. Gupta

Cross-Site Scripting (XSS) is a client-side browser vulnerability caused by improper sanitization of user input embedded in Web pages. Researchers have proposed various defensive strategies, vulnerability scanners, etc., but XSS flaws still remain in Web applications due to inadequate understanding and implementation of such tools and strategies. Therefore, in this chapter, the authors propose a security model called Browser-Dependent XSS Sanitizer (BDS), deployed on the client-side Web browser, for eliminating the effects of XSS vulnerabilities. Various earlier client-side solutions degrade performance on the browser side; in this chapter, by contrast, the authors use a three-step approach that thwarts XSS attacks without much degrading the user's Web browsing experience. In the authors' experiments, this approach proved capable of preventing XSS attacks on various modern Web browsers.
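BDS's three steps are not detailed in this summary; the toy filter below (the regex patterns are illustrative, and real sanitizers parse the DOM rather than pattern-match markup) only conveys the general idea of client-side sanitization before rendering:

```python
import re

# Illustrative patterns for three common XSS vectors:
# script elements, inline event handlers, and javascript: URIs.
SCRIPT_TAG = re.compile(r"<\s*/?\s*script[^>]*>", re.IGNORECASE)
EVENT_ATTR = re.compile(r"\son\w+\s*=\s*(\"[^\"]*\"|'[^']*'|\S+)", re.IGNORECASE)
JS_URI = re.compile(r"javascript\s*:", re.IGNORECASE)

def sanitize(fragment):
    """Strip the three vectors above from an HTML fragment (toy example)."""
    fragment = SCRIPT_TAG.sub("", fragment)
    fragment = EVENT_ATTR.sub("", fragment)
    return JS_URI.sub("blocked:", fragment)

print(sanitize('<img src=x onerror="alert(1)"><script>evil()</script>'))
```

Regex-based filtering is easy to bypass in practice, which is precisely why browser-integrated, parser-aware approaches such as the one the chapter proposes matter.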



Phishing attacks are used for identity theft with the help of social engineering and some sophisticated technical tricks: the user is lured into clicking a URL and trapped on a phishing Web page. Securing users' credentials is one of the most important concerns for organizations nowadays. It can be pursued in several ways, such as education and training, which raise the level of awareness and help mitigate phishing. This paper introduces an approach consisting of several precautionary steps a user should take while browsing in any Web browser. We found it possible to detect phishing Web pages without anti-phishing solutions. The steps examine whether a Web page is genuine or fake by checking whether known phishing features exist in that page. To evaluate our approach, we analyzed the PhishTank data set, which consists of phishing Web pages; the purpose of the evaluation was to verify the features discussed in our approach and so alert the user. The results show that a user can detect phishing without any anti-phishing solution, simply by taking some steps to check a Web page for certain features.
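The paper's checklist is not enumerated in this summary; the sketch below uses commonly cited phishing indicators (IP-address hosts, '@' tricks, deep subdomains, missing HTTPS) as hypothetical stand-ins for the features a user would check:

```python
import re
from urllib.parse import urlparse

def phishing_signals(url):
    """Illustrative URL-level checks; real page-level features would
    also inspect forms, favicons, and domain age."""
    p = urlparse(url)
    host = p.hostname or ""
    return {
        "ip_host": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
        "at_sign": "@" in url,                # user-info trick hides the real host
        "many_subdomains": host.count(".") > 3,
        "no_https": p.scheme != "https",
    }

def looks_phishy(url, limit=2):
    # Flag the page when enough independent signals fire.
    return sum(phishing_signals(url).values()) >= limit

print(looks_phishy("http://192.168.0.1/paypal-login"))  # True
```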

