dom tree
Recently Published Documents


TOTAL DOCUMENTS

44
(FIVE YEARS 3)

H-INDEX

6
(FIVE YEARS 0)

2021 ◽  
Vol 11 (2) ◽  
pp. 3-23
Author(s):  
Lehe Yu ◽  
Zhengxiu Gui

Abstract There are generally hundreds of millions of nodes in social media, and they are connected to a huge social network through attention and fan relationships. The news is spread through this huge social network. This paper studies the acquisition technology of social media topic data and enterprise data. The topic positioning technology based on Sina meta search and topic related keywords is introduced, and the crawling efficiency of topic crawlers is analyzed. Aiming at the factors of diverse and variable webpage structure on the Internet, this paper proposes a new Web information extraction algorithm by studying the general laws existing in the webpage structure, combining DOM (Document Object Model) tree and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm. Several links in the algorithm are introduced in detail, including Web page processing, DOM tree construction, segmented text content acquisition, and web content extraction based on the DBSCAN algorithm. The simulation results show that the intelligence culture, intelligence system, technology platform and intelligence organization ecological collaboration strategy under the extraction of DOM tree and DBSCAN information can improve the level of intelligence participation of all employees. There is a significant positive correlation between the level of participation and the level of the intelligence environment of all employees. According to the research results, the DOM tree and DBSCAN information proposed in this paper can extract the enterprise’s employee intelligence and the effective implementation of relevant collaborative strategies, which can provide guidance for the effective implementation of the employee intelligence.


2020 ◽  
Vol 28 (1) ◽  
pp. 19-31
Author(s):  
Jian Feng ◽  
Ying Zhang ◽  
Yuqiang Qiao

Detecting phishing web pages is a challenging task. The existing detection method for phishing web page based on DOM (Document Object Model) is mainly aiming at obtaining structural characteristics but ignores the overall representation of web pages and the semantic information that HTML tags may have. This paper regards DOMs as a natural language with Doc2Vec model and learns the structural semantics automatically to detect phishing web pages. Firstly, the DOM structure of the obtained web page is parsed to construct the DOM tree, then the Doc2Vec model is used to vectorize the DOM tree, and to measure the semantic similarity in web pages by the distance between different DOM vectors. Finally, the hierarchical clustering method is used to implement clustering of web pages. Experiments show that the method proposed in the paper achieves higher recall and precision for phishing classification, compared to DOM-based structural clustering method and TF-IDF-based semantic clustering method. The result shows that using Paragraph Vector is effective on DOM in a linguistic approach.


Author(s):  
B. B. Gupta ◽  
Shashank Gupta ◽  
Pooja Chaudhary

This article presents a cloud-based framework that thwarts the DOM-based XSS vulnerabilities caused due to the injection of advanced HTML5 attack vectors in the HTML5 web applications. Initially, the framework collects the key modules of web application, extracts the suspicious HTML5 strings from the latent injection points and performs the clustering on such strings based on their level of similarity. Further, it detects the injection of malicious HTML5 code in the script nodes of DOM tree by detecting the variation in the HTML5 code embedded in the HTTP response generated. Any variation observed will simply indicate the injection of suspicious script code. The prototype of our framework was developed in Java and installed in the virtual machines of cloud environment on the Google Chrome extension. The experimental evaluation of our framework was performed on the platform of real world HTML5 web applications deployed in the cloud platform.


2017 ◽  
Vol 7 (1) ◽  
pp. 1-31 ◽  
Author(s):  
B.B. Gupta ◽  
Shashank Gupta ◽  
Pooja Chaudhary

This article presents a cloud-based framework that thwarts the DOM-based XSS vulnerabilities caused due to the injection of advanced HTML5 attack vectors in the HTML5 web applications. Initially, the framework collects the key modules of web application, extracts the suspicious HTML5 strings from the latent injection points and performs the clustering on such strings based on their level of similarity. Further, it detects the injection of malicious HTML5 code in the script nodes of DOM tree by detecting the variation in the HTML5 code embedded in the HTTP response generated. Any variation observed will simply indicate the injection of suspicious script code. The prototype of our framework was developed in Java and installed in the virtual machines of cloud environment on the Google Chrome extension. The experimental evaluation of our framework was performed on the platform of real world HTML5 web applications deployed in the cloud platform.


Sign in / Sign up

Export Citation Format

Share Document