A very efficient approach to news title and content extraction on the web

Author(s):  
Hualiang Yan ◽  
Jianwu Yang
2016 ◽  
Vol 43 (1) ◽  
pp. 103-121
Author(s):  
MohammadSadegh Zahedi ◽  
Abolfazl Aleahmad ◽  
Maseud Rahgozar ◽  
Farhad Oroumchian ◽  
Arastoo Bozorgi

Blogs are one of the main forms of user-generated content on the web and are growing rapidly in number. The characteristics of blogs require specialized search methods that are tuned for the blogosphere. In this paper, we focus on blog retrieval, which aims at ranking blogs with respect to their recurrent relevance to a user’s topic. Although different blog retrieval algorithms have already been proposed, few of them have considered the temporal properties of the input queries. We therefore propose an efficient approach to improving relevant blog retrieval using the temporal properties of queries. First, the time sensitivity of each query is automatically computed for different time intervals based on an initially retrieved set of relevant posts. Then a temporal score is calculated for each blog, and finally all blogs are ranked based on their temporal and content relevance to the input query. Experimental analysis and comparison of the proposed method are carried out using a standard dataset with 45 diverse queries. Our experimental results demonstrate that, under different evaluation measures, the proposed method outperforms other blog retrieval methods.
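The three steps named in the abstract (per-interval time sensitivity, a temporal score per blog, a combined ranking) can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the post fields `blog`, `interval` and `score`, the use of relevance mass per interval as time sensitivity, and the linear mix controlled by the hypothetical parameter `alpha`. It is not the authors' scoring function.

```python
from collections import defaultdict


def query_time_sensitivity(retrieved_posts):
    """Share of the retrieved posts' relevance mass that falls in each interval.

    Assumption: time sensitivity is approximated by this normalized mass.
    """
    mass = defaultdict(float)
    for post in retrieved_posts:
        mass[post["interval"]] += post["score"]
    total = sum(mass.values())
    return {t: m / total for t, m in mass.items()} if total else {}


def rank_blogs(retrieved_posts, alpha=0.5):
    """Rank blogs by a linear mix of content and temporal scores.

    alpha is a hypothetical mixing weight: 1.0 means content only,
    0.0 means temporal only.
    """
    sensitivity = query_time_sensitivity(retrieved_posts)
    content = defaultdict(float)
    temporal = defaultdict(float)
    for post in retrieved_posts:
        blog = post["blog"]
        # Content score: summed relevance of the blog's retrieved posts.
        content[blog] += post["score"]
        # Temporal score: the same relevance, weighted by how time-sensitive
        # the query is in the interval where the post was published.
        temporal[blog] += sensitivity.get(post["interval"], 0.0) * post["score"]
    # Normalize both scores to [0, 1] so the mix is scale-independent.
    max_c = max(content.values(), default=1.0) or 1.0
    max_t = max(temporal.values(), default=1.0) or 1.0
    combined = {
        b: alpha * content[b] / max_c + (1 - alpha) * temporal[b] / max_t
        for b in content
    }
    return sorted(combined, key=combined.get, reverse=True)


posts = [
    {"blog": "A", "interval": 1, "score": 1.0},
    {"blog": "A", "interval": 1, "score": 1.0},
    {"blog": "B", "interval": 2, "score": 1.0},
    {"blog": "B", "interval": 3, "score": 1.0},
]
ranking = rank_blogs(posts)
```

In this toy input both blogs have the same content score, but blog A's posts are concentrated in the interval where the query's relevance mass peaks, so the temporal term places it first.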


2014 ◽  
Vol 10 (2) ◽  
pp. 20-36
Author(s):  
Andreas Schieber ◽  
Andreas Hilbert

This paper develops and evaluates a BPMN-based process model which identifies and extracts blog content from the web and stores its textual data in a data warehouse for further analyses. Depending on the characteristics of the technologies used to create the weblogs, the process has to perform specific tasks in order to extract blog content correctly. The paper describes three phases: extraction, transformation and loading of data into a repository specifically adapted for blog content extraction. It highlights the objectives that must be achieved in these phases to ensure correct extraction. The authors integrate the described process into a previously developed framework for blog mining. The authors' process model closes the conceptual gap in this framework as well as the gap in current research on blog mining process models. Furthermore, it can easily be adapted for other web extraction purposes.
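The extraction, transformation and loading phases described above can be illustrated with a minimal sketch. This is not the authors' BPMN process model. It assumes the page HTML is already available as a string, uses an in-memory SQLite database as a stand-in for the data warehouse, and the names `extract`, `transform`, `load` and the table `blog_post` are hypothetical.

```python
import sqlite3
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, skipping the contents of script and style tags."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def extract(html):
    """Extraction phase: pull raw text fragments out of the page markup."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return collector.parts


def transform(parts):
    """Transformation phase: join fragments and normalize whitespace."""
    return " ".join(" ".join(parts).split())


def load(conn, url, text):
    """Loading phase: store the cleaned text, keyed by the post URL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blog_post (url TEXT PRIMARY KEY, body TEXT)"
    )
    conn.execute("INSERT OR REPLACE INTO blog_post VALUES (?, ?)", (url, text))
    conn.commit()


conn = sqlite3.connect(":memory:")
page = (
    "<html><head><style>p{}</style></head><body>"
    "<h1>Hello</h1><p>blog   world</p><script>x()</script></body></html>"
)
load(conn, "http://example.org/post", transform(extract(page)))
stored = conn.execute("SELECT body FROM blog_post").fetchone()[0]
```

A real pipeline would need platform-specific extraction rules, which is the point the abstract makes about the technologies used to create the weblogs; the generic text collector here ignores that distinction.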


2008 ◽  
Vol 11 (2) ◽  
pp. 83-85
Author(s):  
Howard Wilson

2005 ◽  
Vol 8 (1) ◽  
pp. 16-18
Author(s):  
Howard F. Wilson

1999 ◽  
Vol 3 (2) ◽  
pp. 6-6
Author(s):  
Barbara Shadden

2008 ◽  
Vol 18 (1) ◽  
pp. 9-20
Author(s):  
Mark Kander ◽  
Steve White

This article explains the development and use of ICD-9-CM diagnosis codes, CPT procedure codes, and HCPCS supply/device codes. Examples of appropriate coding combinations and coding rules adopted by most third-party payers are given. Additionally, references for complete code lists on the Web and a list of voice-related CPT code edits are included. The reader is given adequate information to report an evaluation or treatment session with accurate diagnosis, procedure, and supply/device codes. Speech-language pathologists can accurately code services when given adequate resources and rules and are encouraged to insert relevant codes in the medical record rather than depend on billing personnel to accurately provide this information. Consultation is available from the Division 3 Reimbursement Committee members and from [email protected].

