Detecting Web Spam Based on Novel Features from Web Page Source Code

2020 ◽  
Vol 2020 ◽  
pp. 1-14
Author(s):  
Jiayong Liu ◽  
Yu Su ◽  
Shun Lv ◽  
Cheng Huang

Search engines are critical in people’s daily lives because they determine the quality of the information people obtain through searching. Fierce competition for search engine rankings benefits neither users nor search engines. Existing research mainly studies the content and links of websites; however, none of these techniques focuses on semantic analysis of link and anchor text for detection. In this paper, we propose a web spam detection method that extracts novel feature sets from the homepage source code and uses a random forest (RF) as the classifier. The novel feature sets are extracted from the homepage’s links, hypertext markup language (HTML) structure, and semantic similarity of content. We conduct experiments on the WEBSPAM-UK2007 and UK-2011 datasets using five-fold cross-validation. In addition, we design three sets of experiments to evaluate the performance of the proposed method. Compared with other methods on several indicators, the proposed method with the novel feature sets performs better, with a precision of 0.929 and a recall of 0.930. Experimental results show that the proposed model can effectively detect web spam.
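As a rough illustration of the kind of link-based features and evaluation metrics the abstract above mentions, the following Python sketch pulls a few simple statistics from a homepage's anchor tags and computes precision and recall for a set of predictions. The feature names (`n_links`, `external_ratio`, `mean_anchor_words`) and the helper functions are hypothetical choices for this example, not the feature sets or classifier used in the paper.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def link_features(html, site_host):
    """Illustrative link features (assumed, not the paper's feature set)."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    total = len(parser.links)
    # A link counts as external when it names a host other than site_host.
    external = sum(
        1 for href, _ in parser.links
        if urlparse(href).netloc not in ("", site_host)
    )
    words = [len(text.split()) for _, text in parser.links]
    return {
        "n_links": total,
        "external_ratio": external / total if total else 0.0,
        "mean_anchor_words": sum(words) / total if total else 0.0,
    }


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the class labeled `positive` (e.g. spam)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

In a fuller pipeline, feature dictionaries like these would be turned into vectors and passed to a classifier such as a random forest, with precision and recall averaged over the cross-validation folds.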

2018 ◽  
Vol 13 (3) ◽  
pp. 408-428 ◽  
Author(s):  
Phu Vo Ngoc

We have surveyed many significant approaches over many years because sentiment classification makes crucial contributions that can be applied in everyday life, such as in political activities, commodity production, and commercial activities. We propose a novel model using Latent Semantic Analysis (LSA) and a Dennis Coefficient (DNC) for big data sentiment classification in English. Many LSA vectors (LSAVs) have been successfully reformed by using the DNC. We use the DNC and the LSAVs to classify the 11,000,000 documents of our testing data set using the 5,000,000 documents of our training data set in English. This novel model uses many sentiment lexicons of our basis English sentiment dictionary (bESD). We have tested the proposed model in both a sequential environment and a distributed network system. The results of the sequential system are not as good as those of the parallel environment. We achieved 88.76% accuracy on the testing data set, which is better than the accuracies of many previous semantic analysis models. We also compared the novel model with previous models, and the results of our proposed model are better than those of the previous models. The results of the novel model can be used widely in many fields, in commercial applications and in surveys of sentiment classification.


2014 ◽  
Vol 90 (3) ◽  
pp. 967-985 ◽  
Author(s):  
Carlos Corona ◽  
Lin Nan ◽  
Gaoqing Zhang

We study the interaction between interbank competition and accounting information quality and their effects on banks' risk-taking behavior. We identify an endogenous false-alarm cost that banks incur when forced to sell assets to meet capital requirements. We find that when interbank competition is less intense, an improvement in the quality of accounting information encourages banks to take more risk. Keeping the banks' investments in loans constant, the provision of high-quality accounting information reduces the false-alarm cost of asset sales and improves the discriminating efficiency of the capital requirement policy. When considering the banks' endogenous investment decisions, however, this improvement in discriminating efficiency causes excessive risk-taking, because banks respond by competing more aggressively in the deposit market, and the increase in deposit costs motivates banks to take more risk. Our paper shows that improving information quality increases risk-taking under mild competition, but has no effect under fierce competition.


Author(s):  
Gangavarapu Venkata Satya Kumar ◽  
Pillutla Gopala Krishna Mohan

In diverse computer applications, the analysis of image content plays a key role. This image content may be either textual (such as text appearing in the images) or visual (such as shape, color, and texture). These two kinds of content constitute an image’s basic features and are therefore a major advantage for any implementation. Many state-of-the-art Content-Based Image Retrieval (CBIR) models are based on visual search or annotated text. With growing demand for multitasking, a new method is needed that combines both textual and visual features. This paper develops an intelligent CBIR system for a collection of different benchmark texture datasets. Here, a new descriptor named Information Oriented Angle-based Local Tri-directional Weber Patterns (IOA-LTriWPs) is adopted. The pattern operates not only on the tri-direction and eight neighborhood pixels but also on four angles: [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text]. Once the patterns concerning tri-direction, eight neighborhood pixels, and four angles are obtained, the best patterns are selected based on maximum mutual information. The histogram of the patterns then provides the final feature vector, on which the new weighted feature extraction is performed. As a new contribution, the novel weight function is optimized by the Improved MVO on random basis (IMVO-RB) so that the precision and recall of the retrieved images are high. Further, the proposed model uses a logarithmic similarity measure, the Mean Square Logarithmic Error (MSLE), between the features of the query image and those of the trained images to retrieve the relevant images. Analyses on diverse texture image datasets validate the accuracy and efficiency of the developed pattern over existing ones.
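The MSLE-based retrieval step described above can be sketched in a few lines of Python. This is a minimal illustration that assumes non-negative feature vectors (such as histogram counts) and the common definition of MSLE as the mean of squared differences of log(1 + x); the function names `msle` and `retrieve` and the dictionary layout of the database are hypothetical, and the paper's weighting and optimization steps are not reproduced.

```python
import math


def msle(a, b):
    """Mean squared logarithmic error between two non-negative vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("feature vectors must be non-empty and equal in length")
    return sum((math.log1p(x) - math.log1p(y)) ** 2 for x, y in zip(a, b)) / len(a)


def retrieve(query, database, top_k=3):
    """Return the names of the top_k database entries with the smallest MSLE.

    `database` maps an image name to its feature vector (assumed layout).
    """
    scored = sorted((msle(query, feats), name) for name, feats in database.items())
    return [name for _, name in scored[:top_k]]
```

A lower MSLE means the stored image's features are closer to the query's, so the ranking is ascending. For example, `retrieve([1, 2, 3], {"a": [1, 2, 3], "b": [1, 2, 4], "c": [10, 10, 10]}, top_k=2)` ranks `"a"` first and `"b"` second.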


Kybernetes ◽  
2019 ◽  
Vol 48 (6) ◽  
pp. 1355-1372 ◽  
Author(s):  
Ying Huang ◽  
Nu-nu Wang ◽  
Hongyu Zhang ◽  
Jianqiang Wang

Purpose: The purpose of this paper is to propose a product recommendation model that improves the accuracy of recommendations based on the current search engines used in e-commerce platforms like Tmall.com. Design/methodology/approach: First, the proposed model comprehensively considers price, trust and online reviews, which all represent critical factors in consumers’ purchasing decisions. Second, the model introduces quantization methods for these criteria incorporating fuzzy theory. Third, the model uses a distance measure between two single valued neutrosophic sets based on the prioritized average operator to consolidate the influences of positive, neutral and negative comments. Finally, the model uses multi-criteria decision-making methods to integrate the influences of price, trust and online reviews on purchasing decisions to generate recommendations. Findings: To demonstrate the feasibility and efficiency of the proposed model, a case study is conducted based on Tmall.com. The results of the case study indicate that the recommendations of our model perform better than those of the current search engines of Tmall.com. The proposed model can significantly improve the accuracy of product recommendations based on search engines. Originality/value: The product recommendation method can meet the critical challenge posed by the search engines on e-commerce platforms. In addition, the proposed method could be used in practice to develop a new application for e-commerce platforms.


2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Faizan Ullah ◽  
Qaisar Javaid ◽  
Abdu Salam ◽  
Masood Ahmad ◽  
Nadeem Sarwar ◽  
...  

Ransomware (RW) is a distinctive variety of malware that encrypts files or locks the user’s system, holding the files hostage, which leads to huge financial losses for users. In this article, we propose a new model that extracts novel features from the RW dataset and classifies RW and benign files. The proposed model can detect a large number of RW samples from various families at runtime and scans the network, registry activities, and file system throughout the execution. API-call series are reused to represent the behavior-based features of RW. The technique extracts a fourteen-feature vector at runtime and analyzes it by applying online machine learning algorithms to predict RW. To validate the effectiveness and scalability, we test 78550 recent RW and benign samples and compare with the random forest and AdaBoost, and the testing accuracy reaches 99.56%.


2020 ◽  
Author(s):  
Shaily Meta ◽  
Daria Ghezzi ◽  
Alessia Catalani ◽  
Tania Vanzolini ◽  
Pietro Ghezzi

Countries have major differences in the acceptance of face mask use for the prevention of COVID-19. We analyzed 450 webpages returned by searching the string “are face masks dangerous” in Italy, the UK and the USA using three search engines (Bing, DuckDuckGo and Google). The majority (64-79%) were pages from news outlets, with few (2-6%) pages from government and public health agencies. Webpages with a positive stance on masks were more frequent in English (50%) than in Italian (36%), and those with a negative stance were more frequent in Italian (28% vs. 19% in English). Google returned the highest number of mask-positive pages and DuckDuckGo the lowest. Google also returned the lowest number of pages mentioning conspiracy theories and DuckDuckGo the highest. Webpages in Italian scored lower than those in English in transparency (reporting authors, their credentials and backing the information with references). When issues about the use of face masks were analyzed, mask effectiveness was the most discussed followed by hypercapnia (accumulation of carbon dioxide), contraindication in respiratory disease, and hypoxia, with issues related to their contraindications in mental health conditions and disability mentioned by very few pages. This study suggests that: 1) public health agencies should increase their web presence in providing correct information on face masks; 2) search engines should improve the information quality criteria in their ranking; 3) the public should be more informed on issues related to the use of masks and disabilities, mental health and stigma arising for those people who cannot wear masks.


2021 ◽  
Vol 7 (3) ◽  
pp. 4672-4699
Author(s):  
I. H. K. Premarathna ◽  
H. M. Srivastava ◽  
Z. A. M. S. Juman ◽  
Ali AlArjani ◽  
...  

The novel coronavirus (COVID-19) has badly affected many countries (more than 180 countries, including China) around the world. More than 90% of the global COVID-19 cases are currently outside China. The large, unanticipated number of COVID-19 cases has disrupted the healthcare systems of many countries and created shortages of bed space in hospitals. Consequently, better estimation of the number of people infected with COVID-19 in Sri Lanka is vital for the government to take suitable action. This paper investigates predictions of the number of COVID-19 cases in both the first and the second waves in Sri Lanka. First, to estimate the number of future first-wave COVID-19 cases, we develop a stochastic forecasting model and present a solution technique for the model. Then, another solution method is proposed for two existing models (the SIR model and the logistic growth model) to predict the second wave of COVID-19 cases. Finally, the proposed model and solution approaches are validated with secondary data obtained from the Epidemiology Unit, Ministry of Health, Sri Lanka. A comparative assessment against actual values of COVID-19 cases shows promising performance of our stochastic model and proposed solution techniques. Our findings should therefore benefit practitioners, academics and decision makers, especially the government of Sri Lanka, which deals with this type of decision making.
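For readers unfamiliar with the two existing models named above, the following Python sketch shows their textbook forms: a logistic growth curve for cumulative cases and a deterministic SIR model integrated with simple Euler steps. The parameter names (`K`, `r`, `t0`, `beta`, `gamma`, `dt`) and the integration scheme are assumptions made for illustration; the paper's stochastic model and its proposed solution methods are not given in the abstract and are not reproduced here.

```python
import math


def logistic_cases(t, K, r, t0):
    """Cumulative cases under logistic growth: K / (1 + exp(-r * (t - t0))).

    K is the final size, r the growth rate, t0 the inflection time.
    """
    return K / (1.0 + math.exp(-r * (t - t0)))


def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Euler integration of the standard SIR equations.

    dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I.
    Returns one (S, I, R) tuple per day, starting with day 0.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_inf = beta * s * i / n * dt   # susceptible -> infected
            new_rec = gamma * i * dt          # infected -> recovered
            s -= new_inf
            i += new_inf - new_rec
            r += new_rec
        history.append((s, i, r))
    return history
```

Fitting either model to reported case counts would require estimating the parameters from data, for example by least squares, which this sketch leaves out.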


Author(s):  
Pallavi Mirajkar ◽  
Rupali Dahake

The novel coronavirus disease 2019 (COVID-19) pandemic caused by SARS-CoV-2 continues to pose a serious threat to global health. The pandemic continues to challenge medical systems around the world in many respects, including sharp increases in demand for hospital beds and critical shortages of medical equipment, while many healthcare workers have themselves been infected. We propose an analytical model that predicts a positive SARS-CoV-2 infection by considering both common and severe symptoms in patients. The proposed model works on the response data of individuals reporting whether they are suffering from various symptoms of COVID-19. Consequently, the proposed model can be used for effective screening and prioritization of testing for the infection in the general population.


2018 ◽  
Vol 118 (3) ◽  
pp. 541-569 ◽  
Author(s):  
Hyun-Sun Ryu

Purpose: The purpose of this paper is to better understand why people are willing or hesitant to use financial technology (Fintech) as well as to determine whether the effect of perceived benefits and risks on continuance intention differs depending on user types. Design/methodology/approach: Original data were collected via a survey of 243 participants with Fintech usage experience. The partial least squares method was used to test the proposed model. Findings: The results reveal that legal risk had the most negative effect on the Fintech continuance intention, while convenience had the strongest positive effect. Differences in specific benefit and risk impacts are found between early and late adopters. Originality/value: This empirical study contributes to a novel understanding of the benefit and risk factors affecting the Fintech continuance intention.

