Learning Information Extraction Rules for Web Data Mining

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch129 ◽

2011 ◽

pp. 678-683

Author(s):

Chia-Hui Chang ◽

Chun-Nan Hsu

Keyword(s):

Information Extraction ◽

Web Mining ◽

World Wide ◽

Information Sources ◽

Structured Data ◽

Comparison Shopping ◽

Data Formats ◽

The World ◽

Document Collection ◽

Keyword Searching

The explosive growth and popularity of the World Wide Web have resulted in a huge number of information sources on the Internet. However, because Web information sources are heterogeneous and lack structure, access to this huge collection of information has been limited to browsing and keyword searching. Sophisticated Web-mining applications, such as comparison shopping, incur high maintenance costs to deal with different data formats. The problem of translating the contents of input documents into structured data is called information extraction (IE). Unlike information retrieval (IR), which concerns how to identify relevant documents in a document collection, IE produces structured data ready for post-processing, which is crucial to many applications of Web mining and search tools.
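
As a rough illustration of what turning page contents into structured data can look like for a comparison-shopping page, the sketch below applies a single hand-written extraction rule to an HTML fragment. The markup, the field names, and the `extract` helper are all invented for this example; the chapter is about learning such rules, whereas here the rule is simply written by hand.

```python
import re

# Hypothetical product listing; the markup is an assumption made for illustration.
PAGE = """
<ul>
  <li><b>Blue Kettle</b> <span class="price">$24.99</span></li>
  <li><b>Red Toaster</b> <span class="price">$39.50</span></li>
</ul>
"""

# One hand-written extraction rule: fixed delimiters around each field.
RULE = re.compile(
    r'<b>(?P<name>[^<]+)</b>\s*<span class="price">\$(?P<price>[\d.]+)</span>'
)


def extract(page):
    """Return one structured record per span of the page that matches RULE."""
    return [
        {"name": m.group("name").strip(), "price": float(m.group("price"))}
        for m in RULE.finditer(page)
    ]


records = extract(PAGE)
for record in records:
    print(record)
```

A rule like this breaks as soon as a site changes its markup, which is one way to read the maintenance cost the abstract mentions.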

Patterns of Searching for Information on the World Wide Web: A Pilot Study

Psychological Reports ◽

10.2466/pr0.2003.92.3c.1091 ◽

2003 ◽

Vol 92 (3_suppl) ◽

pp. 1091-1096 ◽

Cited By ~ 2

Author(s):

Nobuhiko Fujihara ◽

Asako Miura

Keyword(s):

World Wide Web ◽

Undergraduate Students ◽

Search Engines ◽

World Wide ◽

Information Sources ◽

Task Type ◽

The Other ◽

The World ◽

Selection Of ◽

Search Domain

The influence of task type on searching the World Wide Web with search engines, without limitation of search domain, was investigated. Nine graduate and undergraduate students studying psychology (1 woman and 8 men, M age = 25.0 yr., SD = 2.1) participated. Their performance in operating the search engines on a closed task with only one answer was compared with their performance on an open task with several possible answers. Analysis showed that the number of actions was larger for the closed task (M = 91) than for the open task (M = 46.1). Behaviors such as selection of keywords (on average 7.9% of all actions for the closed task and 16.7% for the open task) and pressing the browser's back button (on average 40.3% of all actions for the closed task and 29.6% for the open task) also differed. On the other hand, behaviors such as selection of hyperlinks, pressing the home button, and number of browsed pages were similar for both tasks. Search behaviors were influenced by task type when the students searched for information without limitation placed on the information sources.

Trust in Virtual Communities

Virtual Communities ◽

10.4018/978-1-60960-100-3.ch115 ◽

2011 ◽

pp. 203-212

Author(s):

Luis V. Casaló ◽

Carlos Flavián ◽

Miguel Guinalíu

Keyword(s):

World Wide Web ◽

World Wide ◽

Virtual Communities ◽

Information Sources ◽

Social Groups ◽

Virtual Community ◽

Chat Rooms ◽

The World ◽

Community Concept ◽

E Mail

Individuals are increasingly turning to computer-mediated communication in order to get information on which to base their decisions. For instance, many consumers are using newsgroups, chat rooms, forums, e-mail list servers, and other online formats to share ideas, build communities and contact other consumers who are seen as more objective information sources (Kozinets, 2002). These social groups have traditionally been called virtual communities. The virtual community concept is almost as old as the concept of the Internet. However, the exponential development of these structures occurred during the nineties (Flavián & Guinalíu, 2004) due to the appearance of the World Wide Web and the spread of other Internet tools such as e-mail or chats. The justification for this expansion is found in the advantages that virtual communities generate for both their members and the organizations that create them.

Challenges and opportunities of the Internet for medical oncology.

Journal of Clinical Oncology ◽

10.1200/jco.1996.14.7.2181 ◽

1996 ◽

Vol 14 (7) ◽

pp. 2181-2186 ◽

Cited By ~ 13

Author(s):

L M Glodé

Keyword(s):

Cancer Biology ◽

World Wide ◽

Information Sources ◽

Medical Oncology ◽

The Internet ◽

Web Page ◽

Related Information ◽

The World ◽

Challenges And Opportunities ◽

American Society

PURPOSE: The Internet, and in particular the World Wide Web (WWW), has a rapidly increasing potential to provide information for oncologists and their patients about cancer biology and treatment. A brief overview of this environment is given, along with examples of how easily the information is accessed, as a means of introducing the web page of the American Society of Clinical Oncology (ASCO), ASCO OnLine. METHODS: Oncology information sources on the WWW were accessed from the author's home using a 14.4 kbs modem and Netscape browser (Netscape Communications Corp, Mountain View, CA), and the locations were recorded for tabulation and discussion. RESULTS: Overwhelming amounts of oncology-related information are now available via the Internet. CONCLUSION: Oncology as a subspecialty is ideally suited to apply the newest information technology to traditional needs in areas of education, research, and patient care. Oncologists will increasingly act as information guides rather than information resources for patients with cancer and their families.

A Survey of Web Ontology Languages and Semantic Web Services

Annals of the Alexandru Ioan Cuza University - Economics ◽

10.2478/aicue-2013-0005 ◽

2013 ◽

Vol 60 (1) ◽

pp. 42-53 ◽

Cited By ~ 4

Author(s):

Alexandru Napoleon Sireteanu

Keyword(s):

World Wide Web ◽

Semantic Web ◽

World Wide ◽

Information Sources ◽

Semantic Web Services ◽

Formal Ontology ◽

Web Technologies ◽

The World ◽

Ontology Languages ◽

In The Beginning

In the beginning, the World Wide Web was syntactic and the content itself was readable only by humans. The modern web combines existing web technologies with knowledge representation formalisms. In this sense, the Semantic Web proposes the mark-up of content on the web using formal ontologies that structure essential data for the purpose of comprehensive machine understanding. On the syntactical level, standardization is an important topic. Many standards that can be used to integrate different information sources have evolved. Besides classical database interfaces like ODBC, web-oriented standard languages like HTML, XML, RDF and OWL are increasing in importance. As the World Wide Web offers the greatest potential for sharing information, we will base our paper on these evolving standards.

A Signal-Representation-Based Parser to Extract Text-Based Information from the Web

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2010.p0531 ◽

2010 ◽

Vol 14 (5) ◽

pp. 531-539

Author(s):

Mu-Chun Su ◽

Shao-Jui Wang ◽

Chen-Ko Huang ◽

Pa-Chun Wang ◽

...

Keyword(s):

Web Services ◽

World Wide ◽

Information Sources ◽

State Of The Art ◽

Value Added ◽

Web Pages ◽

Web Page ◽

Web Information ◽

The World ◽

The Web

Most of the dramatically increased amount of information available on the World Wide Web is provided via HTML and formatted for human browsing rather than for software programs. This situation calls for a tool that automatically extracts information from semistructured Web information sources, increasing the usefulness of value-added Web services. We present a signal-representation-based parser (SIRAP) that breaks Web pages up into logically coherent groups - groups of information related to an entity, for example. Templates for records with different tag structures are generated incrementally by a Histogram-Based Correlation Coefficient (HBCC) algorithm, then records on a Web page are detected efficiently using templates generated by matching. Hundreds of Web pages from 17 state-of-the-art search engines were used to demonstrate the feasibility of our approach.
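
The abstract does not define the Histogram-Based Correlation Coefficient, so the following is only a loose sketch of the general idea of comparing record fragments by their tag structure. It assumes, purely for illustration, that each fragment is summarized as a histogram of opening-tag counts and that two histograms are compared with an ordinary Pearson correlation; the function names and sample fragments are hypothetical and are not taken from SIRAP.

```python
import re
from collections import Counter
from math import sqrt

# Matches opening tags only; closing tags such as </td> are skipped.
OPEN_TAG = re.compile(r"<\s*([a-zA-Z][a-zA-Z0-9]*)")


def tag_histogram(fragment):
    """Count how often each opening tag occurs in an HTML fragment."""
    return Counter(tag.lower() for tag in OPEN_TAG.findall(fragment))


def histogram_correlation(h1, h2):
    """Pearson correlation of two tag histograms over the union of their tags.

    Returns 0.0 when either histogram has no variance, where the
    coefficient is undefined.
    """
    tags = sorted(set(h1) | set(h2))
    if not tags:
        return 0.0
    x = [h1.get(t, 0) for t in tags]
    y = [h2.get(t, 0) for t in tags]
    mean_x = sum(x) / len(tags)
    mean_y = sum(y) / len(tags)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical fragments: two records sharing a tag structure, one unrelated block.
record_a = '<tr><td>Alpha</td><td>1</td><a href="#">more</a></tr>'
record_b = '<tr><td>Beta</td><td>2</td><a href="#">more</a></tr>'
banner = "<div><p>x</p><p>y</p><p>z</p><img></div>"

same = histogram_correlation(tag_histogram(record_a), tag_histogram(record_b))
different = histogram_correlation(tag_histogram(record_a), tag_histogram(banner))
print(same, different)
```

Under these assumptions, fragments that share a template score near 1 and unrelated blocks score low or negative, which is the kind of signal a template-matching step could threshold on.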

Information Sources and Searching on the World Wide Web. G.G. Chowdhury, Sudatta Chowdhury. London: Library Association Publishing 2001. 174 pages, ISBN: 1856043940 £29.95

New Library World ◽

10.1108/nlw.2002.103.4_5.184.1 ◽

2002 ◽

Vol 103 (4/5) ◽

pp. 184-184

Author(s):

Karyn Meaden

Keyword(s):

World Wide Web ◽

World Wide ◽

Information Sources ◽

The World

Information Sources and Searching on the World Wide Web. G.G. Chowdhury and Sudatta Chowdhury. London: Library Association Publishing 2001. 174 pp., ISBN: 1‐85604‐394‐0 £29.95

Program electronic library and information systems ◽

10.1108/prog.2002.36.3.206.1 ◽

2002 ◽

Vol 36 (3) ◽

pp. 206-206

Author(s):

Mark Kerr

Keyword(s):

World Wide Web ◽

World Wide ◽

Information Sources ◽

The World

Unsupervised information extraction from unstructured, ungrammatical data sources on the World Wide Web

International Journal on Document Analysis and Recognition (IJDAR) ◽

10.1007/s10032-007-0052-2 ◽

2007 ◽

Vol 10 (3-4) ◽

pp. 211-226 ◽

Cited By ~ 18

Author(s):

Matthew Michelson ◽

Craig A. Knoblock

Keyword(s):

World Wide Web ◽

Information Extraction ◽

World Wide ◽

Data Sources ◽

The World

Web mining: information and pattern discovery on the World Wide Web

Proceedings Ninth IEEE International Conference on Tools with Artificial Intelligence ◽

10.1109/tai.1997.632303 ◽

2002 ◽

Cited By ~ 388

Author(s):

R. Cooley ◽

B. Mobasher ◽

J. Srivastava

Keyword(s):

World Wide Web ◽

Web Mining ◽

World Wide ◽

Pattern Discovery ◽

The World

Information Sources Used by Garden Writers

HortTechnology ◽

10.21273/horttech.9.3.451 ◽

1999 ◽

Vol 9 (3) ◽

pp. 451-454

Author(s):

M.P. Garber ◽

K. Bondari

Keyword(s):

World Wide Web ◽

World Wide ◽

Information Sources ◽

Sources Of Information ◽

Source Information ◽

Public Gardens ◽

Botanical Gardens ◽

University Personnel ◽

Landscape Plants ◽

The World

Results of a national survey indicated that the top four sources of information used by garden writers for new or appropriate plants were nursery catalogs, botanical and public gardens, seed company catalogs, and gardening magazines. More than 50% of the participating garden writers reportedly used these four sources a lot. The most frequently used books and magazines were Horticulture Magazine (34.6%), Manual of Woody Landscape Plants (24.1%), and Fine Gardening (23.7%). About 29% of the garden writers used the World Wide Web to source information, and the two most widely used types of sites were universities and botanical gardens and arboreta. A high percentage of garden writers desire greater or more frequent communication with botanical gardens and arboreta (90.4%), university personnel (87.4%), and plant producers (86.3%).
