A very efficient approach to news title and content extraction on the web

Blogs are one of the main forms of user-generated content on the web, and their number is growing rapidly. The characteristics of blogs call for specialized search methods tuned to the blogosphere. In this paper, we focus on blog retrieval, which aims to rank blogs by their recurrent relevance to a user’s topic. Although various blog retrieval algorithms have been proposed, few consider the temporal properties of the input queries. We therefore propose an efficient approach that improves relevant blog retrieval by using the temporal properties of queries. First, the time sensitivity of each query is computed automatically for different time intervals, based on an initially retrieved set of relevant posts. Then a temporal score is calculated for each blog, and finally all blogs are ranked by their temporal and content relevance to the input query. The proposed method is evaluated and compared on a standard dataset with 45 diverse queries. Our experimental results show that, under different evaluation measures, the proposed method outperforms other blog retrieval methods.
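
The three steps named in the abstract (per-interval time sensitivity, a temporal score per blog, and a combined ranking) can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: time sensitivity is taken as the share of initially retrieved posts in each interval, a blog's temporal score is the average weight of its posts' intervals, and the two scores are mixed linearly with a hypothetical parameter `lam`. The function names are also hypothetical.

```python
from collections import Counter


def interval_weights(post_intervals):
    """Assumed time sensitivity: share of initially retrieved posts per interval."""
    counts = Counter(post_intervals)
    total = sum(counts.values())
    return {interval: n / total for interval, n in counts.items()}


def temporal_score(blog_post_intervals, weights):
    """Assumed temporal score: mean interval weight over a blog's posts."""
    if not blog_post_intervals:
        return 0.0
    hits = sum(weights.get(i, 0.0) for i in blog_post_intervals)
    return hits / len(blog_post_intervals)


def rank_blogs(content_scores, blog_posts, weights, lam=0.5):
    """Rank blogs by a linear mix of content and temporal scores (assumed)."""
    combined = {
        blog: (1 - lam) * score
        + lam * temporal_score(blog_posts.get(blog, []), weights)
        for blog, score in content_scores.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


# Toy data: intervals are month labels of the initially retrieved posts.
weights = interval_weights(["2008-01", "2008-01", "2008-01", "2008-02"])
ranking = rank_blogs(
    {"A": 0.6, "B": 0.5},
    {"A": ["2008-02"], "B": ["2008-01", "2008-01"]},
    weights,
)
print(ranking)
```

In this toy data, blog B has the lower content score but posts in the interval where the query is most active, so it overtakes blog A once the temporal score is mixed in.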


Process Model for Content Extraction from Weblogs

International Journal of Intelligent Information Technologies ◽ 10.4018/ijiit.2014040102 ◽ 2014 ◽ Vol 10 (2) ◽ pp. 20-36

Author(s): Andreas Schieber ◽ Andreas Hilbert

Keyword(s): Data Warehouse ◽ Process Model ◽ Process Models ◽ Content Extraction ◽ Web Extraction ◽ Textual Data ◽ The Web ◽ Three Phases

This paper develops and evaluates a BPMN-based process model which identifies and extracts blog content from the web and stores its textual data in a data warehouse for further analyses. Depending on the characteristics of the technologies used to create the weblogs, the process has to perform specific tasks in order to extract blog content correctly. The paper describes three phases: extraction, transformation and loading of data in a repository specifically adapted for blog content extraction. It highlights the objectives in these phases which must be achieved to ensure the correct extraction. The authors integrate the described process in a previously developed framework for blog mining. The authors' process model closes the conceptual gap in this framework as well as the gap in current research of blog mining process models. Furthermore, it can easily be adapted for other web extraction proposals.
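
The extraction, transformation and loading phases described above can be sketched in a few lines. This is not the authors' BPMN process model: it assumes, purely for illustration, that post text sits inside `<article>` elements, that transformation is only whitespace normalization, and that an in-memory SQLite table stands in for the data warehouse. All class, function and table names are hypothetical.

```python
import sqlite3
from html.parser import HTMLParser


class ArticleText(HTMLParser):
    """Collects text inside <article> elements (an assumed blog layout)."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting level of open <article> tags
        self.chunks = []    # text fragments found inside <article>

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract(html):
    """Extraction phase: pull the raw post text out of the page."""
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def transform(raw_text):
    """Transformation phase: collapse runs of whitespace."""
    return " ".join(raw_text.split())


def load(conn, url, text):
    """Loading phase: store the text in a table standing in for the warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts (url TEXT PRIMARY KEY, body TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO posts (url, body) VALUES (?, ?)", (url, text)
    )
    conn.commit()


page = "<body><nav>Menu</nav><article><h1>Title</h1>\n<p>Hello   world</p></article></body>"
conn = sqlite3.connect(":memory:")
load(conn, "http://example.org/post", transform(extract(page)))
print(conn.execute("SELECT body FROM posts").fetchone()[0])
```

A real weblog would need technology-specific extraction rules, which is the point the abstract makes about the process having to perform specific tasks depending on how the weblog was created.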


Hybrid Method for Automated News Content Extraction from the Web

Web Information Systems – WISE 2006 - Lecture Notes in Computer Science ◽ 10.1007/11912873_34 ◽ 2006 ◽ pp. 327-338 ◽ Cited By ~ 3

Author(s): Yu Li ◽ Xiaofeng Meng ◽ Qing Li ◽ Liping Wang

Keyword(s): Hybrid Method ◽ Content Extraction ◽ News Content ◽ The Web


Gifts From the Web: Evidence-Based Practices in Speech-Language Pathology

Perspectives on Issues in Higher Education ◽ 10.1044/ihe7.2.14-a ◽ 2004 ◽ Vol 7 (2) ◽ pp. 14-16

Author(s): Howard Wilson

Keyword(s): Evidence Based ◽ Speech Language Pathology ◽ Evidence Based Practices ◽ Language Pathology ◽ The Web


Gifts From the Web: Mentoring

Perspectives on Issues in Higher Education ◽ 10.1044/ihe11.2.83 ◽ 2008 ◽ Vol 11 (2) ◽ pp. 83-85

Author(s): Howard Wilson

Keyword(s): The Web


Gifts From the Web: Key Links to Formative and Summative Assessment

Perspectives on Issues in Higher Education ◽ 10.1044/ihe6.1.6 ◽ 2003 ◽ Vol 6 (1) ◽ pp. 6-7

Author(s): Howard Wilson

Keyword(s): Summative Assessment ◽ The Web


Gifts from the Web: Service Learning

Perspectives on Issues in Higher Education ◽ 10.1044/ihe8.1.16 ◽ 2005 ◽ Vol 8 (1) ◽ pp. 16-18

Author(s): Howard F. Wilson

Keyword(s): Service Learning ◽ Web Service ◽ The Web


Gifts From the Web: Critical Thinking

Perspectives on Issues in Higher Education ◽ 10.1044/ihe3.2.6 ◽ 1999 ◽ Vol 3 (2) ◽ pp. 6-6

Author(s): Barbara Shadden

Keyword(s): Critical Thinking ◽ The Web


Gifts From the Web: Tips on Assessing Student Learning

Perspectives on Issues in Higher Education ◽ 10.1044/ihe3.1.3 ◽ 1999 ◽ Vol 3 (1) ◽ pp. 3-3

Author(s): Barbara B. Shadden

Keyword(s): Student Learning ◽ The Web


Coding for Evaluation and Treatment

Perspectives on Voice and Voice Disorders ◽ 10.1044/vvd18.1.9 ◽ 2008 ◽ Vol 18 (1) ◽ pp. 9-20 ◽ Cited By ~ 1

Author(s): Mark Kander ◽ Steve White

Keyword(s): Medical Record ◽ Treatment Session ◽ Accurate Diagnosis ◽ Third Party ◽ Speech Language Pathologists ◽ Adequate Information ◽ Diagnosis Codes ◽ Coding Rules ◽ Procedure Codes ◽ The Web

This article explains the development and use of ICD-9-CM diagnosis codes, CPT procedure codes, and HCPCS supply/device codes. It gives examples of appropriate coding combinations and the coding rules adopted by most third-party payers. It also includes references for complete code lists on the Web and a list of voice-related CPT code edits. The reader is given adequate information to report an evaluation or treatment session with accurate diagnosis, procedure, and supply/device codes. Speech-language pathologists can code services accurately when given adequate resources and rules, and are encouraged to insert relevant codes in the medical record rather than depend on billing personnel to provide this information accurately. Consultation is available from the Division 3 Reimbursement Committee members and from [email protected].
