Design and Construction of a Distributed JavaScript Parsing System
With the rapid development of Internet technology, JavaScript (JS), one of the most representative and powerful scripting languages, is becoming increasingly popular among developers and users. However, JS programming is more complex than conventional static web technology. In the fields of search engines and information acquisition, it is very difficult to obtain the information hidden in script code. In this paper, the authors design a distributed system for parsing the JS code embedded in HTML files and retrieving the underlying information. The authors describe how to extract JS code from HTML files and parse it. They also introduce a task scheduling algorithm for the JS parsing system that employs Hadoop distributed computing technology. The experimental results indicate that the proposed algorithm and system achieve reasonable task scheduling efficiency and parse JS code rapidly.