An Improved Framework for Content- and Link-Based Web-Spam Detection: A Combined Approach

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-18
Author(s):  
Asim Shahzad ◽  
Nazri Mohd Nawi ◽  
Muhammad Zubair Rehman ◽  
Abdullah Khan

In this modern era, people utilise the web to share information and to deliver services and products. Information seekers use search engines (SEs) such as Google, Bing, and Yahoo as tools to search for products, services, and information. However, web spamming is one of the most significant issues encountered by SEs because it dramatically affects the quality of SE results. Web spamming’s economic impact is enormous because web spammers index massive free advertising data on SEs to increase the volume of web traffic on a targeted website. Spammers trick an SE into ranking irrelevant web pages higher than relevant web pages in the search engine results pages (SERPs) using different web-spamming techniques. Consequently, these high-ranked unrelated web pages contain insufficient or inappropriate information for the user. Researchers from industry and academia are working to detect spam web pages, but no efficient technique capable of catching all spam web pages on the World Wide Web (WWW) has been presented yet. This research proposes an improved framework for content- and link-based web-spam identification. The framework uses stopwords, keywords’ frequency, part of speech (POS) ratio, spam keywords database, and copied-content algorithms for content-based web-spam detection. For link-based web-spam detection, we first exposed the relationship network behind link-based web spamming and then used the paid-link database, neighbour pages, spam signals, and link-farm algorithms. Finally, we combined all the content- and link-based spam identification algorithms to identify both types of spam. To conduct experiments and to obtain threshold values, the WEBSPAM-UK2006 and WEBSPAM-UK2007 datasets were used. A promising F-measure of 79.6% with 81.2% precision shows the applicability and effectiveness of the proposed approach.
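As a rough illustration of the content-based signals mentioned in the abstract above (stopwords and keyword frequency), the following sketch computes two simple features and applies thresholds to them. It is not the authors' algorithm: the stopword list, the function names, and both threshold values are assumptions chosen only for illustration, whereas the paper derives its thresholds from the WEBSPAM-UK datasets.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration; not the list used in the paper.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def stopword_ratio(tokens):
    """Fraction of tokens that are stopwords (0.0 for empty input)."""
    if not tokens:
        return 0.0
    return sum(t in STOPWORDS for t in tokens) / len(tokens)


def top_keyword_share(tokens):
    """Share of non-stopword tokens taken up by the most frequent keyword."""
    keywords = [t for t in tokens if t not in STOPWORDS]
    if not keywords:
        return 0.0
    _, count = Counter(keywords).most_common(1)[0]
    return count / len(keywords)


def looks_like_content_spam(text, min_stopword_ratio=0.2, max_keyword_share=0.3):
    """Flag a page when stopwords are rare or one keyword dominates.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    tokens = tokenize(text)
    return (stopword_ratio(tokens) < min_stopword_ratio
            or top_keyword_share(tokens) > max_keyword_share)


stuffed = "cheap pills cheap pills buy cheap pills cheap cheap pills"
natural = ("the framework uses a list of stopwords and the frequency "
           "of keywords to score a page in the dataset")
print(looks_like_content_spam(stuffed), looks_like_content_spam(natural))
```

Keyword-stuffed text tends to contain few stopwords and a dominant keyword, which is why either condition alone triggers the flag in this sketch.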

Author(s):  
Lun-song Chen ◽  
Bi-Lin Sun

Based on the survey data of Lishui City, Zhejiang Province, this paper uses the Heckman two-stage model to construct a credit constraint function without selection bias, and explores the relationship between the scale and quality of the relationship network and the credit constraints of rural households. Research shows that the scale of the relationship network is affected adversely by urbanization and networking, having a weaker impact on the formal credit constraints of rural households. The quality of the relationship networks can improve farmers’ awareness of formal credit, reduce transaction exposure, regulate farmers’ behavior and act as a “guarantee”, thereby effectively alleviating farmers’ formal credit constraints. At the same time, the relationship network of farmers is gradually becoming more structured, where farmers' social interests are becoming more purposeful. Additionally, formal financial institutions have set a threshold for farmers’ credit, which requires a certain amount of securities for money.


2020 ◽  
Vol 17 (2) ◽  
pp. 1260-1265
Author(s):  
Mohd Sharul Hafiz Razak ◽  
Nor Azman Ismail ◽  
Alif Fikri Mohktar ◽  
Su Elya Namira ◽  
Nurina Izzati Ramzi

This paper investigates 18 web domains of computer science and information technology academic websites of Malaysian universities. We collected more than two million web pages. A webometric analysis was used to explore the number of web pages, inbound links, the web impact factor (WIF), and link relationships. The results show that Fakulti Teknologi dan Sains Maklumat (FTSM), Universiti Kebangsaan Malaysia (UKM) has the highest number of web pages, while Fakulti Teknologi Kreatif dan Warisan (FTKW), Universiti Malaysia Kelantan (UMK) has the largest WIF score. Pearson’s rank correlation coefficient was used to detect the relationship between the institutions’ subdomain age and WIF. The correlations indicate that there is scant relationship between subdomain age and WIF score across all 18 selected Malaysian schools [r = −.076, n = 18, p < .0005]. This is because WIF is highly dependent on the quality of the content to attract backlinks and on the Google crawler algorithm, which changes from time to time, for the number of web pages. Subdomain age is independent of the year of establishment of the schools. These findings can be used as a guide for implementing a university web content strategy.
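The two quantities in the abstract above can be sketched as follows. The abstract does not define WIF, so this sketch assumes the commonly used definition (inbound links divided by the number of pages on the site). The correlation function is the standard Pearson product-moment coefficient rather than a rank-based variant. The function names and the sample figures are hypothetical and are not data from the study.

```python
import math


def web_impact_factor(inbound_links, page_count):
    """WIF under the assumed definition: inbound links per page of the site."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    return inbound_links / page_count


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up figures for three imaginary subdomains (not data from the study).
subdomain_age_years = [5, 12, 20]
wif_scores = [
    web_impact_factor(300, 1000),
    web_impact_factor(150, 2500),
    web_impact_factor(900, 4000),
]
print(round(pearson_r(subdomain_age_years, wif_scores), 3))
```

A value of r close to zero, as reported in the abstract, would mean that older subdomains do not systematically have higher or lower WIF scores.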


2013 ◽  
Vol 2013 ◽  
pp. 1-13 ◽  
Author(s):  
Christian Schwartz ◽  
Tobias Hoßfeld ◽  
Frank Lehrieder ◽  
Phuoc Tran-Gia

The popularity of smartphones and mobile applications has grown considerably in recent years, and this growth is expected to continue in the future. Since smartphones have only very limited energy resources, battery efficiency is one of the determining factors for a good user experience. Therefore, some smartphones tear down connections to the mobile network soon after a completed data transmission to reduce the power consumption of their transmission unit. However, frequent connection reestablishments caused by apps which send or receive small amounts of data often lead to a heavy signalling load within the mobile network. One of the major contributions of this paper is the investigation of the resulting tradeoff between energy consumption at the smartphone and the generated signalling traffic in the mobile network. We explain that this tradeoff can be controlled by the connection release timeout and study the impact of this parameter for a number of popular apps that cover a wide range of traffic characteristics in terms of bandwidth requirements and resulting signalling traffic. Finally, we study the impact of the timer settings on Quality of Experience (QoE) for web traffic. This is an important aspect since connection establishments not only lead to signalling traffic but also increase the load time of web pages.
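The tradeoff described in the abstract above can be made concrete with a deliberately simplified model, which is an assumption of this sketch and not the model used in the paper. The connection stays up for `timeout` seconds after each packet, connected time stands in for energy consumption, and the number of connection setups stands in for signalling load. The function name and the traffic pattern are hypothetical.

```python
def simulate_timeout(packet_times, timeout):
    """Return (connection_setups, connected_seconds) for one inactivity timeout.

    packet_times: sorted timestamps in seconds at which an app transfers data.
    timeout: seconds of inactivity after which the connection is released.
    """
    if not packet_times:
        return 0, 0.0
    setups = 1          # the first packet always needs a connection
    connected = 0.0
    for prev, cur in zip(packet_times, packet_times[1:]):
        gap = cur - prev
        if gap > timeout:
            # The connection was released after `timeout` idle seconds
            # and had to be re-established for the next packet.
            connected += timeout
            setups += 1
        else:
            # The connection stayed up for the whole gap.
            connected += gap
    connected += timeout  # idle tail after the last packet
    return setups, connected


# Hypothetical app that transfers a small amount of data every 10 seconds.
traffic = [0, 10, 20, 30]
for t in (5, 15):
    print(t, simulate_timeout(traffic, t))
```

With the short timeout the sketch yields more connection setups but less connected time, and with the long timeout the reverse, which is the direction of the tradeoff the abstract describes.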


2016 ◽  
Vol 2016 ◽  
pp. 1-18 ◽  
Author(s):  
J. Fdez-Glez ◽  
D. Ruano-Ordás ◽  
R. Laza ◽  
J. R. Méndez ◽  
R. Pavón ◽  
...  

Over the last years, research on web spam filtering has gained interest from both academia and industry. In this context, although there are a good number of successful antispam techniques available (i.e., content-based, link-based, and hiding), an adequate combination of different algorithms supported by an advanced web spam filtering platform would offer more promising results. To this end, we propose the WSF2 framework, a new platform particularly suitable for filtering spam content on web pages. Currently, our framework allows the easy combination of different filtering techniques including, but not limited to, regular expressions and well-known classifiers (i.e., Naïve Bayes, Support Vector Machines, and C5.0). Applying our WSF2 framework over the publicly available WEBSPAM-UK2007 corpus, we have been able to demonstrate that a simple combination of different techniques is able to improve the accuracy of single classifiers on web spam detection. As a result, we conclude that the proposed filtering platform is a powerful tool for boosting applied research in this area.
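The idea of combining several filtering techniques can be sketched as a simple majority vote. This is not the WSF2 API, which the abstract does not show. The three filters below are hypothetical stand-ins (one regular-expression filter and two crude heuristics in place of trained classifiers), and majority voting is only one possible combination rule.

```python
import re


def regex_filter(text):
    """Hypothetical regular-expression filter for a few spam-like phrases."""
    pattern = r"\b(free money|cheap pills|click here)\b"
    return bool(re.search(pattern, text, re.IGNORECASE))


def link_density_filter(text):
    """Hypothetical stand-in for a classifier: flags pages that are mostly links."""
    words = text.split()
    if not words:
        return False
    links = sum(w.startswith(("http://", "https://")) for w in words)
    return links / len(words) > 0.5


def repetition_filter(text):
    """Hypothetical stand-in for a classifier: flags highly repetitive text."""
    words = text.lower().split()
    if not words:
        return False
    return len(set(words)) / len(words) < 0.5


def majority_vote(text, filters):
    """Label the text as spam when more than half of the filters flag it."""
    votes = sum(1 for f in filters if f(text))
    return votes * 2 > len(filters)


FILTERS = [regex_filter, link_density_filter, repetition_filter]
spam = "cheap pills cheap pills cheap pills cheap pills click here"
ham = "we survey web caching and identify research challenges"
print(majority_vote(spam, FILTERS), majority_vote(ham, FILTERS))
```

A combined decision of this kind can label a page correctly even when one individual filter misses it, which is the intuition behind combining techniques.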


2002 ◽  
pp. 234-248
Author(s):  
Antonis Danalis ◽  
Evangelos Markatos

World Wide Web traffic increases at exponential rates saturating network links and web servers. By replicating popular web pages in strategic places on the Internet, web caching reduces core network traffic, reduces web server load, and improves the end-users’ perceived quality of service. In this paper we survey the area of web caching. We identify major research challenges and their solutions, as well as several commercial products that are being widely used.


2021 ◽  
Vol 13 (5) ◽  
pp. 2822
Author(s):  
Beatriz Feijoo ◽  
Charo Sádaba

This article presents the results of a study that sought to analyze the relationship between minors and brands on social media. The frequency with which minors search for or share information or subscribe to brand web pages was measured, as well as their following of influencers, who commonly refer to consumer goods. The main purpose of this article is to contribute to learning about the commercial environment that surrounds children in their routines on social media, particularly because of their growing influence in home purchasing decisions. The results, obtained from a survey applied in 501 homes in the Metropolitan Area of Santiago de Chile to minors between 10 and 14 years old, show that the respondents do interact with brands through social media. Although it is not a widespread practice among 10- to 12-year-olds, it is becoming increasingly common among 13- to 14-year-olds. Children seem most interested in sportswear, fashion, and technology brands, areas in which children have significant influence in family purchasing decisions. Following influencers through social media is also a common activity among minors. In particular, the age groups studied here preferred to follow celebrities, particularly from the worlds of music, football, or YouTube, over specific brands.


Ta'dib ◽  
2019 ◽  
Vol 24 (2) ◽  
pp. 249-263
Author(s):  
TAUFIK ABDILLAH SYUKUR ◽  
AISHA TARA ATHIRA

In this modern era, owing to its rapid growth, the pesantren has played an important role in the development of Indonesian education. Daar El-Qolam is an institution based on a modern pesantren system whose principle is to create a good future generation. By implementing this principle for all santri, the institution expects to develop their leadership. To realise this purpose, the pesantren has created a good system, both academic and non-academic. Santri need a place and an opportunity to improve their abilities. One of the pesantren programs obliges all santri from class 5 to act as managers of daily activities in the pesantren. The purpose of this case study is to determine the relationship between self-efficacy and improvement in the quality of santri organization in Daar El-Qolam. The research used a quantitative design with a correlational method. The results show a significant relationship between self-efficacy and improvement in the quality of santri organization in Daar El-Qolam.


2016 ◽  
Vol 30 (2) ◽  
pp. 76-86 ◽  
Author(s):  
Judith Meessen ◽  
Verena Mainz ◽  
Siegfried Gauggel ◽  
Eftychia Volz-Sidiropoulou ◽  
Stefan Sütterlin ◽  
...  

Abstract. Recently, Garfinkel and Critchley (2013) proposed to distinguish between three facets of interoception: interoceptive sensibility, interoceptive accuracy, and interoceptive awareness. This pilot study investigated how these facets interrelate to each other and whether interoceptive awareness is related to the metacognitive awareness of memory performance. A sample of 24 healthy students completed a heartbeat perception task (HPT) and a memory task. Judgments of confidence were requested for each task. Participants filled in questionnaires assessing interoceptive sensibility, depression, anxiety, and socio-demographic characteristics. The three facets of interoception were found to be uncorrelated and interoceptive awareness was not related to metacognitive awareness of memory performance. Whereas memory performance was significantly related to metamemory awareness, interoceptive accuracy (HPT) and interoceptive awareness were not correlated. Results suggest that future research on interoception should assess all facets of interoception in order to capture the multifaceted quality of the construct.


2002 ◽  
Author(s):  
R. Arnold ◽  
A. V. Ranchor ◽  
N. H. T. ten Hacken ◽  
G. H. Koeter ◽  
V. Otten ◽  
...  

2020 ◽  
Vol 29 (12) ◽  
pp. 52-58
Author(s):  
E.P. Meleshkina ◽  
S.N. Kolomiets ◽  
A.S. Cheskidova ◽  
...  

Objectively and reliably determined indicators of the rheological properties of dough were identified using the alveograph, with the aim of creating a future classification system for wheat and the flour made from it according to intended purpose. The relationship between standardized quality indicators and newly developed indicators that differentiate the quality of wheat flour by intended purpose, i.e. by finished product, was analysed using methods of mathematical statistics.

