DOM Tree: Latest Research Papers

Analysis of Enterprise Social Media Intelligence Acquisition Based on Data Crawler Technology

Entrepreneurship Research Journal ◽ 10.1515/erj-2020-0267 ◽ 2021 ◽ Vol 11 (2) ◽ pp. 3-23

Author(s): Lehe Yu ◽ Zhengxiu Gui

Keyword(s): Social Media ◽ Social Network ◽ Spatial Clustering ◽ Web Content ◽ Enterprise Social Media ◽ Intelligence System ◽ Effective Implementation ◽ Tree Construction ◽ Dom Tree ◽ Text Content

Social media platforms generally contain hundreds of millions of nodes, connected into a huge social network through follow and follower relationships, and news spreads through this network. This paper studies technology for acquiring social media topic data and enterprise data. It introduces a topic positioning technique based on Sina meta search and topic-related keywords, and analyzes the crawling efficiency of topic crawlers. To cope with the diverse and variable structure of webpages on the Internet, the paper studies the general regularities in webpage structure and proposes a new Web information extraction algorithm that combines the DOM (Document Object Model) tree with the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm. Several steps of the algorithm are introduced in detail, including Web page processing, DOM tree construction, segmented text content acquisition, and web content extraction based on the DBSCAN algorithm. The simulation results show that the collaboration strategy spanning intelligence culture, intelligence system, technology platform and intelligence organization ecology, applied with DOM tree and DBSCAN information extraction, can improve the level of intelligence participation of all employees. There is a significant positive correlation between the level of participation and the level of the intelligence environment of all employees. According to the research results, the DOM tree and DBSCAN information extraction proposed in this paper can extract the enterprise's employee intelligence and support the effective implementation of the relevant collaborative strategies, which can provide guidance for the effective implementation of employee intelligence.
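The steps listed in this abstract (page processing, DOM tree construction, segmented text acquisition, DBSCAN-based extraction) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' algorithm: the feature choice (document order plus weighted DOM depth), the `depth_weight`, `eps` and `min_pts` values, and the rule of keeping the cluster with the most text are all assumptions made for the example.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style"}                            # text here is never content
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}   # tags without a closing tag


class TextSegmentParser(HTMLParser):
    """Walk the HTML and record each text segment with its DOM depth."""

    def __init__(self):
        super().__init__()
        self.stack = []      # open tags from the root down to the current node
        self.segments = []   # (sequence index, depth, text)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and not SKIP_TAGS.intersection(self.stack):
            self.segments.append((len(self.segments), len(self.stack), text))


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 meaning noise."""
    def neighbours(i):
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1               # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:     # only core points expand the cluster
                queue.extend(nbrs)
    return labels


def extract_main_text(html, eps=1.5, min_pts=3, depth_weight=2.0):
    """Cluster text segments and return those in the cluster with the most text."""
    parser = TextSegmentParser()
    parser.feed(html)
    parser.close()
    segs = parser.segments
    # Assumed features: position in document order and weighted DOM depth.
    points = [(float(i), depth_weight * depth) for i, depth, _ in segs]
    labels = dbscan(points, eps, min_pts)
    totals = {}
    for (_, _, text), label in zip(segs, labels):
        if label != -1:
            totals[label] = totals.get(label, 0) + len(text)
    if not totals:
        return []
    best = max(totals, key=totals.get)
    return [text for (_, _, text), label in zip(segs, labels) if label == best]
```

Calling `extract_main_text(page_html)` on a page with a short navigation bar, a run of sibling paragraphs and a footer would be expected to return the paragraph run, since adjacent segments at the same depth form the densest cluster; real pages would likely need richer features such as text or link density.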

A Webpage Segmentation Method Based on Node Information Entropy of DOM Tree

Journal of Physics Conference Series ◽ 10.1088/1742-6596/1624/3/032023 ◽ 2020 ◽ Vol 1624 ◽ pp. 032023

Author(s): Shengnan Zhang ◽ Jiawei Wu ◽ Kun Yang

Keyword(s): Information Entropy ◽ Segmentation Method ◽ Dom Tree

A Detection Method for Phishing Web Page Using DOM-Based Doc2Vec Model

Journal of Computing and Information Technology ◽ 10.20532/cit.2020.1004899 ◽ 2020 ◽ Vol 28 (1) ◽ pp. 19-31

Author(s): Jian Feng ◽ Ying Zhang ◽ Yuqiang Qiao

Keyword(s): Semantic Information ◽ Detection Method ◽ Structural Characteristics ◽ Web Pages ◽ Web Page ◽ Clustering Method ◽ Semantic Clustering ◽ Dom Tree ◽ Structural Semantics ◽ Linguistic Approach

Detecting phishing web pages is a challenging task. Existing DOM (Document Object Model)-based methods for detecting phishing web pages mainly aim to capture structural characteristics but ignore the overall representation of web pages and the semantic information that HTML tags may carry. This paper treats the DOM as a natural language, applying the Doc2Vec model to learn structural semantics automatically and detect phishing web pages. First, the DOM structure of the obtained web page is parsed to construct the DOM tree; then the Doc2Vec model is used to vectorize the DOM tree, and the semantic similarity between web pages is measured by the distance between their DOM vectors. Finally, a hierarchical clustering method is used to cluster the web pages. Experiments show that the proposed method achieves higher recall and precision for phishing classification than a DOM-based structural clustering method and a TF-IDF-based semantic clustering method. The results show that applying Paragraph Vector to the DOM in a linguistic approach is effective.
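The pipeline in this abstract (serialize the DOM, vectorize it, measure distance, cluster hierarchically) can be illustrated with a small standard-library sketch. It is not the paper's method: counts of tags and adjacent tag pairs stand in for Doc2Vec vectors, cosine distance and average linkage are assumed choices, and the names `dom_vector`, `hierarchical_clusters` and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
from math import sqrt


class TagSequence(HTMLParser):
    """Serialise a page into a 'sentence' of opening and closing tag tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(tag)

    def handle_endtag(self, tag):
        self.tokens.append("/" + tag)


def dom_vector(html):
    """Sparse vector for a page; a stand-in for a learned Doc2Vec embedding."""
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    toks = parser.tokens
    # Counts of single tags plus adjacent tag pairs (simple structural context).
    return Counter(toks) + Counter(zip(toks, toks[1:]))


def cosine_distance(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return 1.0 - dot / norm if norm else 1.0


def hierarchical_clusters(vectors, threshold):
    """Average-linkage agglomerative clustering; stops merging above `threshold`."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters (x < y keeps index x stable on pop).
        d, x, y = min((linkage(a, b), x, y)
                      for x, a in enumerate(clusters)
                      for y, b in enumerate(clusters) if x < y)
        if d > threshold:
            break
        merged = clusters.pop(y)
        clusters[x].extend(merged)
    return clusters
```

Given a list of page sources, `hierarchical_clusters([dom_vector(p) for p in pages], 0.3)` returns groups of page indices with similar tag structure. With a real Doc2Vec model, only `dom_vector` would change; the distance and clustering steps would stay the same in spirit.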

Research on WEB Information Extraction Based on DOM Tree Statistics Keyword Path

Computer Science and Application ◽ 10.12677/csa.2019.92022 ◽ 2019 ◽ Vol 09 (02) ◽ pp. 181-187

Author(s): 建视赵

Keyword(s): Information Extraction ◽ Web Information Extraction ◽ Web Information ◽ Dom Tree

Enhancing the Browser-Side Context-Aware Sanitization of Suspicious HTML5 Code for Halting the DOM-Based XSS Vulnerabilities in Cloud

Application Development and Design ◽ 10.4018/978-1-5225-3422-8.ch009 ◽ 2018 ◽ pp. 216-247

Author(s): B. B. Gupta ◽ Shashank Gupta ◽ Pooja Chaudhary

Keyword(s): Real World ◽ Experimental Evaluation ◽ Web Application ◽ Web Applications ◽ Virtual Machines ◽ Context Aware ◽ Cloud Environment ◽ Cloud Platform ◽ Dom Tree

This article presents a cloud-based framework that thwarts DOM-based XSS vulnerabilities caused by the injection of advanced HTML5 attack vectors into HTML5 web applications. Initially, the framework collects the key modules of the web application, extracts suspicious HTML5 strings from the latent injection points, and clusters these strings based on their level of similarity. It then detects the injection of malicious HTML5 code in the script nodes of the DOM tree by detecting variation in the HTML5 code embedded in the generated HTTP response. Any variation observed indicates the injection of suspicious script code. The prototype of the framework was developed in Java and installed in the virtual machines of a cloud environment on the Google Chrome extension. The experimental evaluation was performed on real-world HTML5 web applications deployed on the cloud platform.
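The variation check on script nodes described in this abstract can be illustrated with a short sketch. It covers only that one step, and it is written in Python with the standard library rather than the Java prototype the article mentions; the function names `script_nodes` and `injected_scripts`, and the idea of comparing against a known-good copy of the page, are assumptions made for the example.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the source text of every <script> node, in document order."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.scripts[-1] += data


def script_nodes(html):
    """Return whitespace-normalised script bodies found in the page."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return [" ".join(s.split()) for s in parser.scripts]


def injected_scripts(expected_html, response_html):
    """Return scripts present in the response but absent from the expected page."""
    known = set(script_nodes(expected_html))
    return [s for s in script_nodes(response_html) if s not in known]
```

A non-empty result from `injected_scripts(expected, response)` flags script code that the application did not intend to emit. Exact string comparison is deliberately strict here; the framework in the article additionally clusters suspicious strings by similarity, which this sketch does not attempt.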

Leveraging Analysis of User Behavior from Web Usage Extraction over DOM-tree Structure

Lecture Notes in Computer Science - Web Engineering ◽ 10.1007/978-3-319-91662-0_14 ◽ 2018 ◽ pp. 185-192 ◽ Cited By ~ 1

Author(s): Wesley G. Siqueira ◽ Laercio A. Baldochi

Keyword(s): User Behavior ◽ Tree Structure ◽ Web Usage ◽ Dom Tree

Web content information extraction based on DOM tree and statistical information

2017 IEEE 17th International Conference on Communication Technology (ICCT) ◽ 10.1109/icct.2017.8359846 ◽ 2017

Author(s): Xin Yu ◽ Zhengping Jin

Keyword(s): Information Extraction ◽ Statistical Information ◽ Web Content ◽ Dom Tree ◽ Content Information

SVM-based web content mining with leaf classification unit from DOM-tree

2017 9th International Conference on Knowledge and Smart Technology (KST) ◽ 10.1109/kst.2017.7886134 ◽ 2017 ◽ Cited By ~ 1

Author(s): Yeongsu Kim ◽ Seungwoo Lee

Keyword(s): Web Content ◽ Web Content Mining ◽ Content Mining ◽ Dom Tree

Enhancing the Browser-Side Context-Aware Sanitization of Suspicious HTML5 Code for Halting the DOM-Based XSS Vulnerabilities in Cloud

International Journal of Cloud Applications and Computing ◽ 10.4018/ijcac.2017010101 ◽ 2017 ◽ Vol 7 (1) ◽ pp. 1-31 ◽ Cited By ~ 26

Author(s): B. B. Gupta ◽ Shashank Gupta ◽ Pooja Chaudhary

Keyword(s): Real World ◽ Experimental Evaluation ◽ Web Application ◽ Web Applications ◽ Virtual Machines ◽ Context Aware ◽ Cloud Environment ◽ Cloud Platform ◽ Dom Tree

This article presents a cloud-based framework that thwarts DOM-based XSS vulnerabilities caused by the injection of advanced HTML5 attack vectors into HTML5 web applications. Initially, the framework collects the key modules of the web application, extracts suspicious HTML5 strings from the latent injection points, and clusters these strings based on their level of similarity. It then detects the injection of malicious HTML5 code in the script nodes of the DOM tree by detecting variation in the HTML5 code embedded in the generated HTTP response. Any variation observed indicates the injection of suspicious script code. The prototype of the framework was developed in Java and installed in the virtual machines of a cloud environment on the Google Chrome extension. The experimental evaluation was performed on real-world HTML5 web applications deployed on the cloud platform.

An Approach of Information Extraction Based on Dom Tree and Weight Value

International Journal of Grid and Distributed Computing ◽ 10.14257/ijgdc.2016.9.10.28 ◽ 2016 ◽ Vol 9 (10) ◽ pp. 311-320

Author(s): Haitao Wang ◽ Shufen Liu

Keyword(s): Information Extraction ◽ Weight Value ◽ Dom Tree
