Main Content Extraction from Web Pages

2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA) ◽

10.1109/icmla51294.2020.00162 ◽

2020 ◽

Author(s):

Stanislas Morbieu ◽

Guillaume Bruneval ◽

Mohamed Lacarne ◽

Mohamed Kone ◽

Francois-Xavier Bois

Keyword(s):

Web Pages ◽

Content Extraction

Main Content Extraction from Web Pages Based on Node Characteristics

Journal of Computing Science and Engineering ◽

10.5626/jcse.2017.11.2.39 ◽

2017 ◽

Vol 11 (2) ◽

pp. 39-48 ◽

Author(s):

Qingtang Liu ◽

Mingbo Shao ◽

Linjing Wu ◽

Gang Zhao ◽

Guilin Fan ◽

...

Keyword(s):

Web Pages ◽

Content Extraction

Opinion Content Extraction from Web Pages Using Embedded Semantic Term Tree Kernels

Proceedings of International Conference on Computational Intelligence and Data Engineering - Lecture Notes on Data Engineering and Communications Technologies ◽

10.1007/978-981-10-6319-0_29 ◽

2017 ◽

pp. 345-358

Author(s):

Veerappa B. Pagi ◽

Ramesh S. Wadawadagi

Keyword(s):

Web Pages ◽

Content Extraction

Content extraction from news web pages using tag tree

International Journal of Autonomic Computing ◽

10.1504/ijac.2018.10013755 ◽

2018 ◽

Vol 3 (1) ◽

pp. 34 ◽

Author(s):

Sanjay K. Dwivedi ◽

Chandrakala Arya

Keyword(s):

Web Pages ◽

Content Extraction

Content Extraction from Web Pages Based on Chinese Punctuation Number

2007 International Conference on Wireless Communications, Networking and Mobile Computing ◽

10.1109/wicom.2007.1365 ◽

2007 ◽

Author(s):

Mingqiu Song ◽

Xintao Wu

Keyword(s):

Web Pages ◽

Content Extraction

An Incremental Acquisition Method for Web Forensics

International Journal of Digital Crime and Forensics ◽

10.4018/ijdcf.2021110116 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1-13

Author(s):

Guangxuan Chen ◽

Guangxiao Chen ◽

Lei Zhang ◽

Qiang Liu

Keyword(s):

Real Time ◽

Digital Forensics ◽

Recall Rate ◽

Web Pages ◽

Web Page ◽

Content Extraction ◽

Data Redundancy ◽

Repeated Acquisition ◽

Low Efficiency ◽

Acquisition Method

To address repeated acquisition, data redundancy, and low efficiency in website forensics, this paper proposes an incremental acquisition method oriented to dynamic websites. The method performs incremental collection on dynamically updated websites through acquiring and parsing web pages, URL deduplication, web page denoising, web page content extraction, and hashing. Experiments show that the algorithm has relatively high acquisition precision and recall rate, and that it can be combined with other data to perform effective digital forensics on dynamically updated real-time websites.
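
The steps named in this abstract (parsing, URL deduplication, denoising, content extraction, hashing) can be illustrated with a minimal sketch. It is not the authors' implementation: every name below (`normalize_url`, `extract_content`, `IncrementalCollector`) is hypothetical, the denoising is a crude tag filter standing in for whatever the paper uses, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping tags that usually hold page noise."""

    # Assumed noise tags; a real denoiser would be far more selective.
    NOISE_TAGS = {"script", "style", "nav", "header", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.NOISE_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.NOISE_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every noise tag.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def normalize_url(url):
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def extract_content(html):
    """Denoise the page and return its remaining text as one string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def content_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class IncrementalCollector:
    """Remembers one content hash per normalized URL."""

    def __init__(self):
        self.seen = {}  # normalized URL -> hash of extracted content

    def acquire(self, url, html):
        key = normalize_url(url)
        digest = content_hash(extract_content(html))
        previous = self.seen.get(key)
        if previous == digest:
            return "unchanged"  # skip: nothing new to store
        self.seen[key] = digest
        return "new" if previous is None else "updated"
```

Hashing the extracted content rather than the raw HTML means that a change confined to navigation or scripts does not trigger a repeated acquisition, which is one plausible reading of how denoising and hashing combine to reduce redundancy.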

An Improvised Algorithm for Relevant Content Extraction from Web Pages

Journal of Emerging Technologies in Web Intelligence ◽

10.4304/jetwi.6.2.226-230 ◽

2014 ◽

Vol 6 (2) ◽

Author(s):

Aanshi Bhardwaj ◽

Veenu Mangat

Keyword(s):

Web Pages ◽

Content Extraction

An Intellect and Clustering Technology Using Noiseless Content Extraction in Web Pages

Journal of Advanced Research in Dynamical and Control Systems ◽

10.5373/jardcs/v11/20192667 ◽

2019 ◽

Vol 11 (0009-SPECIAL ISSUE) ◽

pp. 1024-1029

Author(s):

Florence Dayana M ◽

Chidambaram M.

Keyword(s):

Web Pages ◽

Content Extraction

Web Content Extraction by Integrating Textual and Visual Importance of Web Pages

International Journal of Computer Applications ◽

10.5120/15861-4785 ◽

2014 ◽

Vol 91 (3) ◽

pp. 20-24

Author(s):

K. Nethra ◽

J. Anitha

Keyword(s):

Web Pages ◽

Web Content ◽

Content Extraction ◽

Visual Importance

A novel approach for content extraction from web pages

2014 Recent Advances in Engineering and Computational Sciences (RAECS) ◽

10.1109/raecs.2014.6799616 ◽

2014 ◽

Author(s):

Aanshi Bhardwaj ◽

Veenu Mangat

Keyword(s):

Web Pages ◽

Content Extraction

Syntactic entropy for main content extraction from web pages

Proceedings of the 2nd international Conference on Big Data, Cloud and Applications ◽

10.1145/3090354.3090419 ◽

2017 ◽

Author(s):

Ismail Jellouli ◽

Badr Eddine El Mohajir ◽

Mohammed Al Achhab

Keyword(s):

Web Pages ◽

Content Extraction
