Noise Elimination from Web Page Based on Regular Expressions for Web Content Mining

Author(s): Amit Dutta, Sudipta Paria, Tanmoy Golui, Dipak Kumar Kole

Author(s): Rowena Chau, Chung-Hsing Yeh

This chapter presents a novel user-oriented, concept-based approach to multilingual web content mining using self-organizing maps. The linguistic knowledge this task requires is made available by encoding all multilingual concept-term relationships in a multilingual concept space. Using this knowledge base, a concept-based multilingual text classifier is developed. The classifier reveals the conceptual content of multilingual web documents and groups them into concept categories on a concept-based browsing interface. To personalize the mining, a concept-based user profile is generated from the user’s bookmark file and used to highlight the user’s topics of interest on the browsing interface. As a result, the approach supports both explorative browsing and user-oriented, concept-focused information filtering on the multilingual web.
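The abstract names self-organizing maps but gives no algorithmic detail, so the following is only a rough illustration under stated assumptions, not the chapter's classifier. It assumes documents are reduced to language-independent concept-frequency vectors through a tiny, hypothetical term-to-concept dictionary standing in for the multilingual concept space, and then uses a textbook SOM (Gaussian neighbourhood, linearly decaying learning rate and radius) to place those vectors on a 2 x 2 grid. `CONCEPT_SPACE`, `CONCEPTS`, `DOCS`, `concept_vector`, `train_som`, and `best_matching_unit` are illustrative names invented for this sketch.

```python
import math
import random

# HYPOTHETICAL multilingual concept space: each term (any language) maps to a
# language-independent concept. A real concept space would be far larger.
CONCEPT_SPACE = {
    "car": "vehicle", "truck": "vehicle", "voiture": "vehicle", "camion": "vehicle",
    "money": "finance", "bank": "finance", "banque": "finance", "geld": "finance",
}
CONCEPTS = ["vehicle", "finance"]  # fixed order of vector dimensions

def concept_vector(text, concept_space=CONCEPT_SPACE, concepts=CONCEPTS):
    """Map a document to normalized concept frequencies; unknown terms are ignored."""
    counts = [0.0] * len(concepts)
    for token in text.lower().split():
        concept = concept_space.get(token)
        if concept is not None:
            counts[concepts.index(concept)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def best_matching_unit(weights, v):
    """Return the (row, col) of the node whose prototype is closest to v."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, v))
            if d < best_d:
                best, best_d = (r, c), d
    return best

def train_som(vectors, rows, cols, epochs=100, lr0=0.5, seed=0):
    """Generic SOM training; returns a rows x cols grid of prototype vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    sigma0 = max(rows, cols) / 2.0
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1.0 - frac)                # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
        for v in vectors:
            br, bc = best_matching_unit(weights, v)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighbourhood on grid distance to the winner
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2.0 * sigma * sigma))
                    w = weights[r][c]
                    for i in range(dim):
                        w[i] += lr * h * (v[i] - w[i])
    return weights

# HYPOTHETICAL sample documents in English, French, and German
DOCS = {
    "en-1": "the car and the truck",
    "fr-1": "la voiture et le camion",
    "de-1": "Geld und Bank",
    "en-2": "money in the bank",
}

if __name__ == "__main__":
    vectors = {name: concept_vector(text) for name, text in DOCS.items()}
    som = train_som(list(vectors.values()), rows=2, cols=2)
    for name, vec in vectors.items():
        print(name, "-> map node", best_matching_unit(som, vec))
```

Because documents in different languages that express the same concepts receive identical concept vectors, they always land on the same map node, which is the intuition behind concept-based multilingual browsing. A realistic system would need a much larger concept space, term weighting, and a grid sized to the collection.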