Large Scale Web Page Optimization of Virtual Labs

Identifying and detecting web spam is an ongoing battle between spam researchers and spammers, one that has continued from the time search engines first made web pages searchable to the modern sharing of web links via social networks. A common challenge faced by spam researchers is that new techniques require a corpus of legitimate and spam web pages. Although large corpora of legitimate web pages are available to researchers, the same cannot be said of spam web pages. In this paper, we introduce the Webb Spam Corpus 2011, a corpus of approximately 330,000 spam web pages, which we make available to researchers in the fight against spam. With a standard corpus available, researchers can collaborate better on developing spam filtering techniques and reporting their results. The corpus contains web pages crawled from links found in over 6.3 million spam emails. We analyze multiple aspects of this corpus, including redirection, HTTP headers, web page content, and classification evaluation. We also provide insights into changes in web spam since the last Webb Spam Corpus was released in 2006. These insights include: (1) spammers manipulate social media to spread spam; (2) HTTP headers and content also change over time; (3) spammers have evolved and adopted new techniques to avoid detection based on HTTP header information.


The Role of A/B Tests in the Study of Large-Scale Online Learning

10.31219/osf.io/83jsg ◽ 2017 ◽ Cited By ~ 2

Author(s): Alexander Olof Savi ◽ Joseph Jay Williams ◽ Gunter Maris ◽ Han van der Maas

Keyword(s): Online Learning ◽ Learning Community ◽ Large Scale ◽ Controlled Experiments ◽ Web Page ◽ Online Learning Community ◽ Randomized Controlled ◽ Improved Learning ◽ Use Of Knowledge

Although large-scale online learning increasingly succeeds in attracting learners worldwide, to date it fails to deliver on its promise. We first show the immense popularity of online learning and discuss its (unsatisfactory) effectiveness. We then discuss large-scale online randomized controlled experiments (A/B tests) as a powerful complementary means to enable the desired leap forward. Although these experiments are widely and intensively used for web page optimization, and are slowly being adopted by the online learning community, awareness of their use, benefits, and challenges has reached the larger learning community only to a limited extent. We summarize existing efforts in employing A/B tests in online learning, argue that such tests should take into account the typical nature of (online) learning, and encourage the use of knowledge from the various learning sciences to identify interventions that promise improved learning. We finally discuss both the limitations and promises of A/B tests, and show how such tests can ultimately contribute to learning that is tailored to each individual learner. The insights and priorities that arise from this overview and synthesis of A/B tests in online learning may help advance and direct the field.
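
As a rough illustration of what evaluating a single A/B test can involve, the sketch below compares the success rates of two randomly assigned groups with a pooled two-proportion z-test. This test is one common choice and is not taken from the paper; the function name and the counts in the example are hypothetical.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference in success rates B - A.

    Assumes independent random assignment and samples large enough for the
    normal approximation to hold.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled success rate under the null hypothesis of no difference.
    pooled = (success_a + success_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: learners who solved a follow-up item correctly
    # under the existing condition (A) and under a new intervention (B).
    z, p = two_proportion_z_test(success_a=480, n_a=1000, success_b=530, n_b=1000)
    print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

A single significance test like this says nothing about individual learners; tailoring, as discussed in the abstract, would need further analysis per learner or subgroup.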


A Tag-Based Improved LDA and Web Page Clustering Analysis

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.667.277 ◽ 2014 ◽ Vol 667 ◽ pp. 277-285 ◽ Cited By ~ 1

Author(s): Fang Chen ◽ Yan Hui Zhou

Keyword(s): Clustering Analysis ◽ Large Scale ◽ Clustering Algorithm ◽ Semantic Analysis ◽ Topic Model ◽ Rapid Development ◽ Expansion Method ◽ Web Page ◽ Web Page Clustering ◽ The Web

With the rapid development of the Internet, tag technology has been widely adopted across websites. Brief text labels on network resources make it much easier for people to access massive amounts of data. Social tagging allows users to tag network objects with any word and to share those tags; because it is simple and flexible to use, it has become a popular application. However, problems remain, such as noisy tags, a lack of usage criteria, and sparse tag distributions. Tag sparsity in particular seriously limits the use of tags in the semantic analysis of web pages. This paper uses a user-related tag expansion method to overcome this problem, and at the same time uses the topic model LDA to model the web tags, mine latent topics from large-scale web pages, and obtain the topic distribution of each text for text clustering analysis. The experimental results show that, compared with the traditional clustering algorithm, the LDA-based clustering method built on web tag analysis achieves a larger improvement.
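
The pipeline described above (model tags with LDA, then cluster pages by their topic distributions) might look roughly like the following minimal sketch. It is not the authors' implementation: it uses a plain collapsed Gibbs sampler, omits the tag expansion step, and groups pages simply by their dominant topic. All function names, hyperparameters, and the example tags are assumptions made for illustration.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic distributions."""
    rng = random.Random(seed)
    vocab = sorted({tag for doc in docs for tag in doc})
    tag_id = {tag: i for i, tag in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_tag = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Random initial topic assignment for every tag occurrence.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for tag in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_tag[k][tag_id[tag]] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, tag in enumerate(doc):
                t = tag_id[tag]
                k = assignments[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_tag[k][t] -= 1
                topic_total[k] -= 1
                # Conditional distribution over topics for this occurrence.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_tag[j][t] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_tag[k][t] += 1
                topic_total[k] += 1

    # Smoothed topic distribution for each document.
    return [
        [(doc_topic[d][j] + alpha) / (len(doc) + num_topics * alpha)
         for j in range(num_topics)]
        for d, doc in enumerate(docs)
    ]


def dominant_topic_clusters(theta):
    """Group document indices by their most probable topic."""
    clusters = defaultdict(list)
    for d, distribution in enumerate(theta):
        clusters[distribution.index(max(distribution))].append(d)
    return dict(clusters)


if __name__ == "__main__":
    # Hypothetical tagged pages: each page is represented by its list of tags.
    pages = [
        ["python", "code", "tutorial", "programming"],
        ["recipe", "cooking", "dinner", "food"],
        ["programming", "python", "library", "code"],
        ["food", "baking", "recipe", "dessert"],
    ]
    theta = lda_gibbs(pages, num_topics=2)
    print(dominant_topic_clusters(theta))
```

Clustering by dominant topic is the simplest option; the topic distributions could instead be passed as feature vectors to a standard clustering algorithm such as k-means.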
