An Empirical Study of Web Page Structural Properties

Journal of Web Engineering ◽

10.13052/jwe1540-9589.2044 ◽

2021 ◽

Author(s):

Xavier Chamberland-Thibeault ◽

Sylvain Hallé

Keyword(s):

Empirical Study ◽

Structural Properties ◽

Maximum Degree ◽

Large Scale ◽

Empirical Evaluation ◽

The Internet ◽

Web Pages ◽

Web Page ◽

Internet Archive ◽

Large Scale Survey

The paper reports results on an empirical study of the structural properties of HTML markup in websites. A first large-scale survey is made on 708 contemporary (2019–2020) websites, in order to measure various features related to their size and structure: DOM tree size, maximum degree, depth, diversity of element types and CSS classes, among others. The second part of the study leverages archived pages from the Internet Archive, in order to retrace the evolution of these features over a span of 25 years. The goal of this research is to serve as a reference point for studies that include an empirical evaluation on samples of web pages.

Download Full-text

A Simulation of the Structure of the World-Wide Web

Sociological Research Online ◽

10.5153/sro.684 ◽

2002 ◽

Vol 7 (1) ◽

pp. 9-25 ◽

Cited By ~ 2

Author(s):

Moses Boudourides ◽

Gerasimos Antypas

Keyword(s):

World Wide Web ◽

Power Law ◽

Web Sites ◽

World Wide ◽

The Internet ◽

Web Pages ◽

Small Worlds ◽

Web Page ◽

Simple Simulation ◽

The World

In this paper we are presenting a simple simulation of the Internet World-Wide Web, where one observes the appearance of web pages belonging to different web sites, covering a number of different thematic topics and possessing links to other web pages. The goal of our simulation is to reproduce the form of the observed World-Wide Web and of its growth, using a small number of simple assumptions. In our simulation, existing web pages may generate new ones as follows: First, each web page is equipped with a topic concerning its contents. Second, links between web pages are established according to common topics. Next, new web pages may be randomly generated and subsequently they might be equipped with a topic and be assigned to web sites. By repeated iterations of these rules, our simulation appears to exhibit the observed structure of the World-Wide Web and, in particular, a power law type of growth. In order to visualise the network of web pages, we have followed N. Gilbert's (1997) methodology of scientometric simulation, assuming that web pages can be represented by points in the plane. Furthermore, the simulated graph is found to possess the property of small worlds, as it is the case with a large number of other complex networks.

Download Full-text

Alaska's Embrace of Digital Opportunities

Strategies for Generating E-Business Returns on Investment ◽

10.4018/978-1-59140-417-0.ch008 ◽

2005 ◽

pp. 163-186

Author(s):

Ping Lan ◽

David C. Yen

Keyword(s):

Large Scale ◽

Government Expenditure ◽

The Internet ◽

Digital Revolution ◽

Potential Development ◽

Information And Communication ◽

Development Goals ◽

Large Scale Survey ◽

Shed Light ◽

The U.S

There have been a very limited number of systematic studies of how a region is turning digital opportunities into a development force. In theory, major advances in information and communication technology (ICT) have successfully transformed traditional businesses and markets, revolutionized learning and knowledge-sharing, generated global information flows, and empowered citizens and communities in new ways to redefine governance (Afuah, 2003; Mullaney et al., 2003). At a regional level, this “digital revolution” could offer enormous opportunities to support sustainable local prosperity, and thus help to achieve the broader development goals (DOT Force, 2001). Alaska is one state that can be positioned to take advantage of Internet and e-commerce technologies. Isolated from the U.S. main economic centers and heavily reliant on the export of commodities in its economy, e-commerce or business via the Internet is an ideal choice for Alaska. However, the available statistics do not support this claim. Most economic indicators show a downward trend in Alaska since 1995, in spite of the fact that the federal government expenditure has been increasing (ASTF, 2002). This chapter is dedicated to measuring the usage of the Internet in Alaska. It hypothesizes that geographical limitations help a region like Alaska embrace ICT and its applications without much hesitation, but also hinders the region to fully exploit the potential of ICT due to the limitations of resources. A large-scale survey was conducted to reveal the characteristics of Internet usage among individuals, government agencies, local communities, and private firms in Alaska. This research is of interest in two aspects: It could offer help for policymakers and enterprises within Alaska to realize the potential development brought about by the current digital revolution, and it could help enterprises outside Alaska to target this market more effectively. Theoretically, it could shed light on issues related to technology adoption and local innovation. Besides that, the platform-dependent approach used in this research can be applied in a broader context.

Download Full-text

Web Authoring

Web Portfolio Design and Applications ◽

10.4018/978-1-59140-854-3.ch007 ◽

2011 ◽

pp. 122-156

Author(s):

John DiMarco

Keyword(s):

Web Applications ◽

The Internet ◽

Web Pages ◽

Web Development ◽

Web Page ◽

Web Based ◽

Content Creation ◽

Microsoft Word ◽

Software Skills ◽

The Web

Web authoring is the process of developing Web pages. The Web development process requires you to use software to create functional pages that will work on the Internet. Adding Web functionality is creating specific components within a Web page that do something. Adding links, rollover graphics, and interactive multimedia items to a Web page creates are examples of enhanced functionality. This chapter demonstrates Web based authoring techniques using Macromedia Dreamweaver. The focus is on adding Web functions to pages generated from Macromedia Fireworks and to overview creating Web pages from scratch using Dreamweaver. Dreamweaver and Fireworks are professional Web applications. Using professional Web software will benefit you tremendously. There are other ways to create Web pages using applications not specifically made to create Web pages. These applications include Microsoft Word and Microsoft PowerPoint. The use of Microsoft applications for Web page development is not covered in this chapter. However, I do provide steps on how to use these applications for Web page authoring within the appendix of this text. If you feel that you are more comfortable using the Microsoft applications or the Macromedia applications simply aren’t available to you yet, follow the same process for Web page conceptualization and content creation and use the programs available to you. You should try to get Web page development skills using Macromedia Dreamweaver because it helps you expand your software skills outside of basic office applications. The ability to create a Web page using professional Web development software is important to building a high-end computer skills set. The main objectives of this chapter are to get you involved in some technical processes that you’ll need to create the Web portfolio. Focus will be on guiding you through opening your sliced pages, adding links, using tables, creating pop up windows for content and using layers and timelines for dynamic HTML. The coverage will not try to provide a complete tutorial set for Macromedia Dreamweaver, but will highlight essential techniques. Along the way you will get pieces of hand coded action scripts and JavaScripts. You can decide which pieces you want to use in your own Web portfolio pages. The techniques provided are a concentrated workflow for creating Web pages. Let us begin to explore Web page authoring.

Download Full-text

A Perspective of Evolution After Five Years: A Large-Scale Study of Web Spam Evolution

International Journal of Cooperative Information Systems ◽

10.1142/s0218843014410019 ◽

2014 ◽

Vol 23 (02) ◽

pp. 1441001 ◽

Cited By ~ 3

Author(s):

De Wang ◽

Danesh Irani ◽

Calton Pu

Keyword(s):

Social Networks ◽

Social Media ◽

Large Scale ◽

Web Pages ◽

Spam Filtering ◽

Web Page ◽

New Techniques ◽

Web Spam ◽

Classification Evaluation ◽

Over Time

Identifying and detecting web spam is an ongoing battle between spam-researchers and spammers which has been going on since search engines allowed searching of web pages to the modern sharing of web links via social networks. A common challenge faced by spam-researchers is the fact that new techniques depend on requiring a corpus of legitimate and spam web pages. Although large corpora of legitimate web pages are available to researchers, the same cannot be said about web spam or spam web pages. In this paper, we introduce the Webb Spam Corpus 2011 — a corpus of approximately 330,000 spam web pages — which we make available to researchers in the fight against spam. By having a standard corpus available, researchers can collaborate better on developing and reporting results of spam filtering techniques. The corpus contains web pages crawled from links found in over 6.3 million spam emails. We analyze multiple aspects of this corpus including redirection, HTTP headers, web page content, and classification evaluation. We also provide insights into changes in web spam since the last Webb Spam Corpus was released in 2006. These insights include: (1) spammers manipulate social media in spreading spam; (2) HTTP headers and content also change over time; (3) spammers have evolved and adopted new techniques to avoid the detection based on HTTP header information.

Download Full-text

Effective Genre Classification - Understanding Url And Webpage Attributes For Classification

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1191.0982s1119 ◽

2019 ◽

Vol 8 (2S11) ◽

pp. 2011-2016

Keyword(s):

The Internet ◽

Web Pages ◽

Web Page ◽

Exchange Method ◽

Genre Classification ◽

Internet Application ◽

Reliable Classification ◽

Time Eating ◽

The Web

With the boom in the number of internet pages, it is very hard to discover desired records effortlessly and fast out of heaps of web pages retrieved with the aid of a search engine. there may be a increasing requirement for automatic type strategies with more class accuracy. There are a few conditions these days in which it's far vital to have an green and reliable classification of a web-web page from the information contained within the URL (Uniform aid Locator) handiest, with out the want to go to the web page itself. We want to understand if the URL can be used by us while not having to look and visit the page due to numerous motives. Getting the web page content material and sorting them to discover the genre of the net web page is very time ingesting and calls for the consumer to recognize the shape of the web page which needs to be categorised. To avoid this time-eating technique we proposed an exchange method so one can help us get the genre of the entered URL based of the entered URL and the metadata i.e., description, keywords used in the website along side the title of the web site. This approach does not most effective rely upon URL however also content from the internet application. The proposed gadget can be evaluated using numerous available datasets.

Download Full-text

CRITICAL CARE & THE EARLY WEB: ETHICAL DIGITAL METHODS FOR ARCHIVED YOUTH DATA

AoIR Selected Papers of Internet Research ◽

10.5210/spir.v2021i0.11974 ◽

2021 ◽

Author(s):

Katherine Mackinnon

Keyword(s):

Online Community ◽

Methodological Approach ◽

The Internet ◽

Web Pages ◽

Feminist Ethics ◽

Structured Interviews ◽

Internet Archive ◽

Digital Methods ◽

The Right ◽

Simultaneous Discovery

This paper demonstrates an ethico-methodological approach to researching archived web pages created by young people throughout 1994-2005 that was collected and stored by the Internet Archive. Rather than deploying a range of computational tools available for collecting web data in the Internet Archive, my approach to this material has been to start with the person: I recruited participants through social media who remembered creating websites or participating in web communities when they were younger and were interested in attempting to relocate their digital traces. In a series of qualitative, online semi-structured interviews, I guided participants through the Wayback Machine’s interface and directed them towards where their materials might be stored. I adapted this approach from the walkthrough method, where I position the participant as co-investigator and analyst of web archival material, enabling simultaneous discovery, memory, interpretation and investigation. Together, we walk through the abandoned sites and ruins of a once-vibrant online community as they reflect and remember the early web. This approach responds to significant ethical gaps in web archival research and engages with feminist ethics of care (Luka & Millette, 2018) inspired by conceptual framing of data materials in research on the "right to be forgotten” (Crossen-White, 2015; GDPR, 2018; Tsesis, 2014), digital afterlives (Sutherland, 2020), indigenous data sovereignty and governance (Wemigwans, 2018), and the Feminist Data Manifest-No (Cifor et al, 2019). This method re-centers the human and moves towards a digital justice approach (Gieseking, 2020; Cowan & Rault, 2020) for engaging with historical youth data.

Download Full-text

Identification of Optimal Web Page Set based on Web Usage using Biclustering Optimization Techniques

Advances in Web Technologies and Engineering - Design Solutions for Improving Website Quality and Effectiveness ◽

10.4018/978-1-4666-9764-5.ch006 ◽

2016 ◽

pp. 141-161 ◽

Cited By ~ 1

Author(s):

R. Rathipriya

Keyword(s):

Empirical Study ◽

Firefly Algorithm ◽

Primary Objective ◽

Optimization Techniques ◽

Web Pages ◽

Web Page ◽

Web Usage ◽

The Given ◽

Usage Data

The primary objective of this chapter is to propose Biclustering Optimization Techniques (BOT) to identify the optimal web pages from web usage data. Bio-inspired optimization techniques like Firefly algorithm and its variant are used as optimization tool to generate optimal usage profile from the given web usage dataset. Finally, empirical study is conducted on the benchmark clickstream datasets like MSNBC, MSWEB and CTI and their results are analyzed to know the performance of the proposed biclustering optimization techniques with respect to optimization techniques available in the literature.

Download Full-text

UK Hotel Web Page Accessibility for Disabled and Challenged Users

Tourism and Hospitality Research ◽

10.1057/palgrave.thr.6040024 ◽

2005 ◽

Vol 5 (3) ◽

pp. 255-268 ◽

Cited By ~ 9

Author(s):

Russell Williams ◽

Rulzion Rattray

Keyword(s):

Hotel Industry ◽

The Internet ◽

Web Pages ◽

Web Page ◽

Legal Mandates ◽

The Disabled ◽

Low Levels ◽

Hotel Websites ◽

Effective Use ◽

The Web

Organisations increasingly use the internet and web to communicate with the marketplace. Indeed, the hotel industry seems particularly suited to the use of these technologies. Many sites are not accessible to large segments of the disabled community, however, or to individuals using particular hard and softwares. Identifying the competitive and legal mandates for website accessibility, the study looks at the accessibility of UK-based hotel websites. Utilising the accessibility software, Bobby, as well as making some additional manual accessibility checks, the study finds disappointingly low levels of website accessibility. If organisations want to make more effective use of the web then they need to ensure that their web pages are designed from the outside-in — from the user's perspective.

Download Full-text

Layout Transposition for Non-Visual Navigation of Web Pages by Tactile Feedback on Mobile Devices

Micromachines ◽

10.3390/mi11040376 ◽

2020 ◽

Vol 11 (4) ◽

pp. 376 ◽

Cited By ~ 1

Author(s):

Fabrice Maurel ◽

Gaël Dias ◽

Waseem Safi ◽

Jean-Marc Routoure ◽

Pierre Beust

Keyword(s):

Empirical Study ◽

Visual Navigation ◽

Tactile Feedback ◽

Web Pages ◽

Blind People ◽

Web Page ◽

Vibrotactile Feedback ◽

Qualitative Understanding ◽

Substitution System ◽

Touchscreen Device

In this paper, we present the results of an empirical study that aims to evaluate the performance of sighted and blind people to discriminate web page structures using vibrotactile feedback. The proposed visuo-tactile substitution system is based on a portable and economical solution that can be used in noisy and public environments. It converts the visual structures of web pages into tactile landscapes that can be explored on any mobile touchscreen device. The light contrasts overflown by the fingers are dynamically captured, sent to a micro-controller, translated into vibrating patterns that vary in intensity, frequency and temperature, and then reproduced by our actuators on the skin at the location defined by the user. The performance of the proposed system is measured in terms of perception of frequency and intensity thresholds and qualitative understanding of the shapes displayed.

Download Full-text

Understanding Offline Password-Cracking Methods: A Large-Scale Empirical Study

Security and Communication Networks ◽

10.1155/2021/5563884 ◽

2021 ◽

Vol 2021 ◽

pp. 1-16

Author(s):

Ruixin Shi ◽

Yongbin Zhou ◽

Yong Li ◽

Weili Han

Keyword(s):

Empirical Study ◽

Large Scale ◽

Ad Hoc ◽

Empirical Evaluation ◽

Comparative Investigation ◽

Academic Community ◽

Data Sets ◽

The Past ◽

Cracking Performance ◽

Password Cracking

Researchers proposed several data-driven methods to efficiently guess user-chosen passwords for password strength metering or password recovery in the past decades. However, these methods are usually evaluated under ad hoc scenarios with limited data sets. Thus, this motivates us to conduct a systematic and comparative investigation with a very large-scale data corpus for such state-of-the-art cracking methods. In this paper, we present the large-scale empirical study on password-cracking methods proposed by the academic community since 2005, leveraging about 220 million plaintext passwords leaked from 12 popular websites during the past decade. Specifically, we conduct our empirical evaluation in two cracking scenarios, i.e., cracking under extensive-knowledge and limited-knowledge. The evaluation concludes that no cracking method may outperform others from all aspects in these offline scenarios. The actual cracking performance is determined by multiple factors, including the underlying model principle along with dataset attributes such as length and structure characteristics. Then, we perform further evaluation by analyzing the set of cracked passwords in each targeting dataset. We get some interesting observations that make sense of many cracking behaviors and come up with some suggestions on how to choose a more effective password-cracking method under these two offline cracking scenarios.

Download Full-text