web crawlers
Recently Published Documents


TOTAL DOCUMENTS

127
(FIVE YEARS 33)

H-INDEX

12
(FIVE YEARS 1)

2021 ◽  
Vol 15 (3) ◽  
pp. 205-215
Author(s):  
Gurjot Singh Mahi ◽  
Amandeep Verma

  Web crawlers are as old as the Internet and are most commonly used by search engines to visit websites and index them into repositories. They are not limited to search engines but are also widely utilized to build corpora in different domains and languages. This study developed a focused set of web crawlers for three Punjabi news websites. The web crawlers were developed to extract quality text articles and add them to a local repository to be used in further research. The crawlers were implemented using the Python programming language and were utilized to construct a corpus of more than 134,000 news articles in nine different news genres. The crawler code and extracted corpora were made publicly available to the scientific community for research purposes.
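The abstract describes focused crawlers that follow only article links on three target sites and keep only quality text. A minimal sketch of the two filters such a focused crawler needs is below; the domain names and the article-path pattern are placeholders, since the abstract does not name the three Punjabi news websites or their URL structure.

```python
import re
from urllib.parse import urlparse

# Placeholder allow-list: the three Punjabi news sites are not named in the
# abstract, so these domains are illustrative only.
ALLOWED_DOMAINS = {"example-news-1.com", "example-news-2.com", "example-news-3.com"}
ARTICLE_PATH = re.compile(r"/(news|article)/")

def should_crawl(url):
    """Keep the crawl focused: follow only article links on allowed sites."""
    parts = urlparse(url)
    return parts.netloc in ALLOWED_DOMAINS and bool(ARTICLE_PATH.search(parts.path))

def is_quality_article(text, min_words=50):
    """Crude quality gate: discard pages whose extracted body is too short."""
    return len(text.split()) >= min_words
```

A full crawler would wrap these filters around a fetch loop (e.g., with `urllib.request`), extract the article body from each page, and append accepted articles to the local repository.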


2021 ◽  
Author(s):  
Graeme Edwards ◽  
Larissa Christensen

Cyber strategies play a role in combating child sexual abuse material (CSAM). These strategies aim to detect offenders and prevent them from accessing and producing CSAM, or to identify victims. This paper explores five cyber strategies: peer-to-peer network monitoring, automated multi-modal CSAM detection tools, using web crawlers to identify CSAM sites, pop-up warning messages, and facial recognition. This research synthesis captures the background of each strategy, how it works and the evaluative research, along with the benefits, limitations and implementation considerations, offering a practical overview for a broad audience.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Meng Mei ◽  
Hui Tan

With the advance of educational informatisation, the contradiction between the open sharing of educational resources and the protection of intellectual property is becoming more acute. Balancing the two is essential both for the effective information expression of educational resources and for creating a sound environment for intellectual property protection. Protecting intellectual property rights safeguards the rights and interests of knowledge owners, preserves the incentive for knowledge producers to create, and protects the sharing of educational resources. The machine-learning-based approach to information expression and intellectual property protection of educational resources is a protection tool that exploits machine learning's capacity for automation, real-time monitoring, and scaling. It can prevent web crawlers from harming e-commerce websites, keep them from stealing those sites' intellectual property, and analyse the crawlers that visit a website so that important site data is not stolen. From this standpoint, and based on the relationship between the information expression of educational resources and the protection of intellectual property rights, this paper advocates promoting both from multiple perspectives.
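The abstract mentions analysing crawlers that visit a website to protect its data, but gives no method. A minimal rule-based sketch of such crawler-traffic detection is shown below; the heuristics (bot-like User-Agent strings, a `robots.txt` fetch, high request volume) are illustrative assumptions, not taken from the paper, which instead describes a machine-learning approach.

```python
# Heuristic crawler detection over a client's access-log entries.
# Each entry is a dict with "user_agent" and "path" keys (assumed log schema).
BOT_AGENT_HINTS = ("bot", "crawler", "spider", "scrapy")

def looks_like_crawler(entries, rate_threshold=100):
    """Flag a client as a likely crawler from its access-log entries."""
    agents = " ".join(e.get("user_agent", "").lower() for e in entries)
    if any(hint in agents for hint in BOT_AGENT_HINTS):
        return True  # self-identified bot in the User-Agent string
    if any(e.get("path") == "/robots.txt" for e in entries):
        return True  # humans rarely request robots.txt directly
    return len(entries) > rate_threshold  # unusually high request volume
```

A learned model would replace these hand-written rules with features (inter-request timing, path entropy, header consistency) fed to a classifier.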


2021 ◽  
Author(s):  
Tianyi Yue ◽  
Yadong Zhou ◽  
Bowen Hu ◽  
Zhanbo Xu ◽  
Xiaohong Guan ◽  
...  

2021 ◽  
Author(s):  
Hammook Zahra

General and focused crawlers are the main types of web crawlers, used for different goals with different crawling techniques and architectures. Our crawler was written in Java using various software packages and libraries. To test the crawler, it was run on the academic social network Researchgate.net from 3rd April to 28th June 2014 and retrieved real data. The crawler consists of three main algorithms that crawl information such as researcher details, publication details, and question/answer activity details. The retrieved data were analysed to highlight the performance of Canadian researchers in the field of Computer Science on Researchgate.net. The analysis was conducted from the collaboration and (alt)metrics perspectives. Among other features, Researchgate.net provides the “Impact Points” and “RG Score” (alt)metrics; the former builds on the ISI Journal Impact Factor, which disregards the author’s contribution in its calculations. A new Contribution Determines Sequence (CDS) method was developed and tested, with all required scripts, and showed better performance than the other methods.
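The contrast drawn here is between Impact-Factor-style counting, which credits every co-author fully, and a sequence-aware split. The abstract does not give the CDS formula, so the sketch below is a hypothetical sequence-based weighting (harmonic shares by author position, normalized to sum to 1), shown only to illustrate the idea.

```python
def cds_weights(n_authors):
    """Hypothetical sequence-based split: raw weight 1/k for the k-th author,
    normalized so the shares sum to 1 (the actual CDS formula is not given
    in the abstract)."""
    raw = [1.0 / k for k in range(1, n_authors + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def positional_score(impact_points, position, n_authors):
    """Scale a paper's impact points by the author's positional share
    (position is 1-based), unlike Impact-Factor-style counting, which
    credits every author with the full amount."""
    return impact_points * cds_weights(n_authors)[position - 1]
```

Under this scheme, earlier authors of a multi-author paper receive strictly larger shares, and a sole author keeps the full impact points.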




2021 ◽  
Vol 1125 (1) ◽  
pp. 012045
Author(s):  
Ika Oktavia Suzanti ◽  
Fakhrur Razi ◽  
Husni ◽  
Eka Mala Sari Rochman ◽  
Nurhayati Fitriani

Author(s):  
Yanxi Huang ◽  
Fangzhou Zhu ◽  
Liang Liu ◽  
Wezhi Meng ◽  
Simin Hu ◽  
...  

The security of wireless routers has received much attention given the increasing security threats. In the era of the Internet of Things, many devices carry security vulnerabilities, and there is a significant need to analyze the current security status of devices. In this paper, we develop WNV-Detector, a universal and scalable framework for detecting wireless network vulnerabilities. Based on semantic analysis and named-entity recognition, we design rules for the automatic identification of wireless access points and routers. The rules are generated automatically from information extracted from the devices' admin webpages and can be updated with a semi-automated method. To detect the security status of a device, WNV-Detector extracts its critical identity information and retrieves known vulnerabilities. In the evaluation, we collect information through web crawlers and build a comprehensive vulnerability database. We also build a prototype system based on WNV-Detector and evaluate it with routers from various vendors on the market. Our results indicate the effectiveness of WNV-Detector: the success rate of vulnerability detection reaches 95.5%.
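The pipeline described (rules over admin-webpage text identify the device, then known vulnerabilities are looked up for that identity) can be sketched as follows. The rules, the vendor/model patterns, and the vulnerability database entries here are all made-up examples, not the ones generated by WNV-Detector.

```python
import re

# Illustrative identification rules: each rule maps a pattern found in a
# router's admin webpage to a vendor; the captured group is the model.
RULES = [
    (re.compile(r"TP-LINK.*?(WR\d+N?)", re.I), "TP-Link"),
    (re.compile(r"NETGEAR.*?(R\d+)", re.I), "Netgear"),
]

# Placeholder vulnerability database keyed by (vendor, model) identity.
VULN_DB = {
    ("TP-Link", "WR841N"): ["CVE-XXXX-YYYY (placeholder entry)"],
}

def identify_device(admin_html):
    """Match admin-page text against the rules; return (vendor, model) or None."""
    for pattern, vendor in RULES:
        match = pattern.search(admin_html)
        if match:
            return vendor, match.group(1)
    return None

def known_vulnerabilities(identity):
    """Look up the identified device in the local vulnerability database."""
    return VULN_DB.get(identity, [])
```

In the real system the rules are generated and updated semi-automatically rather than hand-written, and the database is populated by web crawlers.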

