Web Mining to Create a Domain Specific Web Portal Database

2003 ◽  
pp. 36-53 ◽  
Author(s):  
Anthony Scime

The dynamic nature of the World Wide Web is causing an evolution of both information access and format. The use of a Web portal to access information about a domain relieves the searcher of the responsibility to know about, access and retrieve domain documents. In a properly constructed portal, a Web mining process has already sifted through pages found on the Web to select domain facts. This Web-generated knowledge is added to domain expert knowledge in an organized database. This chapter details the design and construction of a domain specific Web portal through the combination of domain expertise and Web-based domain facts.
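The sift-then-store process described above can be sketched as a minimal pipeline: filter crawled pages by domain relevance, then load extracted facts into a portal database. The URLs, keyword filter, sentence-level "facts", and table schema below are invented for illustration and stand in for the chapter's richer mining and expert-knowledge steps.

```python
import re
import sqlite3

# Hypothetical sample pages standing in for crawled Web documents.
PAGES = {
    "http://example.org/a": "The peregrine falcon reaches speeds of 240 mph in a dive.",
    "http://example.org/b": "Stock prices fell sharply on Monday.",
}

# A simple keyword filter stands in for the domain-relevance check.
DOMAIN_KEYWORDS = {"falcon", "hawk", "raptor"}

def is_in_domain(text: str) -> bool:
    """Keep a page only if it mentions a domain keyword."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & DOMAIN_KEYWORDS)

def build_portal_db(pages: dict) -> sqlite3.Connection:
    """Sift pages and store extracted facts (here: sentences) in a database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE facts (url TEXT, fact TEXT)")
    for url, text in pages.items():
        if is_in_domain(text):
            for sentence in re.split(r"(?<=[.!?])\s+", text):
                db.execute("INSERT INTO facts VALUES (?, ?)", (url, sentence))
    db.commit()
    return db

db = build_portal_db(PAGES)
rows = db.execute("SELECT url, fact FROM facts").fetchall()
```

A portal searcher then queries this database directly instead of retrieving and reading the source pages.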

Author(s):  
Vangelis Karkaletsis ◽  
Konstantinos Stamatakis ◽  
Pythagoras Karampiperis ◽  
...  

The World Wide Web is an important channel of information exchange in many domains, including the medical one. The ever-increasing amount of freely available healthcare-related information creates, on the one hand, excellent conditions for the self-education of patients as well as physicians, but on the other hand entails substantial risks if such information is trusted regardless of the low competence or even bad intentions of its authors. This is why medical Web site certification, also called quality labelling, by renowned authorities is of high importance. In this respect, it has become clear that the labelling process could benefit from the employment of Web mining and information extraction techniques, in combination with flexible methods of Web-based information management developed within the Semantic Web initiative. Achieving such synergy is the central issue of the MedIEQ project. The AQUA (Assisting Quality Assessment) system, developed within MedIEQ, aims to provide the infrastructure and the means to organize and support various aspects of the daily work of labelling experts.


Author(s):  
Anthony Scime

The volume of data available on the World Wide Web makes it difficult for a domain novice to find reliable, accurate information. Such a novice may call upon a domain expert for information and advice. On the Web, this expert advice can be organized as an expert database behind a Web portal for the domain. The creation of such a database requires an architecture that captures the expert’s domain knowledge and finds and evaluates applicable Web pages from which data is extracted. This chapter outlines the components of an expert database Web portal, its design, and population.


2019 ◽  
Vol 6 ◽  
pp. 12-41
Author(s):  
Chris Dijkshoorn ◽  
Victor De Boer ◽  
Lora Aroyo ◽  
Guus Schreiber

With the increase of cultural heritage data published online, the usefulness of data in this open context hinges on the quality and diversity of descriptions of collection objects. In many cases, existing descriptions are not sufficient for retrieval and research tasks, resulting in the need for more specific annotations. However, eliciting such annotations is a challenge since it often requires domain-specific knowledge. While crowdsourcing can successfully be used to execute simple annotation tasks, identifying people with the required expertise might prove troublesome for more complex and domain-specific tasks. Nichesourcing addresses this problem by tapping into the expert knowledge available in niche communities. This paper presents Accurator, a methodology for conducting nichesourcing campaigns for cultural heritage institutions, by addressing communities, organizing events and tailoring a web-based annotation tool to a domain of choice. The contribution of this paper is fourfold: 1) a nichesourcing methodology, 2) an annotation tool for experts, 3) validation of the methodology in three case studies and 4) a dataset including the obtained annotations. The three domains of the case studies are birds on art, Bible prints and fashion images. We compare the quality and quantity of obtained annotations in the three case studies, showing that the nichesourcing methodology in combination with the image annotation tool can be used to collect high-quality annotations in a variety of domains. A user evaluation indicates the tool is suited and usable for domain-specific annotation tasks.


2011 ◽  
pp. 1994-2014
Author(s):  
Vangelis Karkaletsis ◽  
Konstantinos Stamatakis ◽  
Pythagoras Karampiperis ◽  
Martin Labský

The World Wide Web is an important channel of information exchange in many domains, including the medical one. The ever-increasing amount of freely available healthcare-related information creates, on the one hand, excellent conditions for the self-education of patients as well as physicians, but on the other hand entails substantial risks if such information is trusted regardless of the low competence or even bad intentions of its authors. This is why medical Web site certification, also called quality labelling, by renowned authorities is of high importance. In this respect, it has become clear that the labelling process could benefit from the employment of Web mining and information extraction techniques, in combination with flexible methods of Web-based information management developed within the Semantic Web initiative. Achieving such synergy is the central issue of the MedIEQ project. The AQUA (Assisting Quality Assessment) system, developed within MedIEQ, aims to provide the infrastructure and the means to organize and support various aspects of the daily work of labelling experts.
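One way information extraction can assist labelling experts, as the abstract suggests, is by automatically checking machine-verifiable quality criteria on a page. The two criteria and regular expressions below are invented for illustration and are not the actual MedIEQ/AQUA label schema.

```python
import re

# Illustrative quality criteria (not the real MedIEQ criteria):
# a trustworthy medical page should disclose its author and a
# last-updated date.
CRITERIA = {
    "has_author": re.compile(r"\bauthor(ed)? by\b", re.IGNORECASE),
    "has_update_date": re.compile(r"\blast updated\b", re.IGNORECASE),
}

def assess(page_text: str) -> dict:
    """Report which criteria the page satisfies, as a labelling aid."""
    return {name: bool(rx.search(page_text)) for name, rx in CRITERIA.items()}

report = assess("Authored by Dr. Smith. Last updated 2010-05-01.")
```

Such automated checks do not replace the expert's judgement; they pre-screen pages so the expert reviews only the borderline cases.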


Humans use their domain expertise intelligently and skillfully to make decisions when solving problems. These decisions are based on knowledge acquired through experience and practice over time, knowledge that is lost once the expert is gone. Hence, this expert knowledge should be stored in a database so that a machine can be programmed to use it for decision making; such a program is known as an Expert System (ES). An ES emulates the decision-making skill of a domain expert by gathering the expert's knowledge, storing it in a knowledge base in rule format, and then applying those rules to analyze given data and provide solutions to problems. Expert Systems can, for example, be used to analyze system log files, find the issues recorded in those log statements, and suggest solutions to the errors found in those logs.
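The rule-based log analysis described above can be sketched as follows: the knowledge base holds expert rules as (pattern, diagnosis, remedy) triples, and the inference step matches them against log lines. The specific patterns and remedies are illustrative, not a real diagnostic rule set.

```python
import re

# Expert knowledge captured as rules: (pattern, diagnosis, remedy).
KNOWLEDGE_BASE = [
    (re.compile(r"OutOfMemoryError"), "heap exhausted",
     "increase the heap size or fix a memory leak"),
    (re.compile(r"Connection refused"), "service unreachable",
     "check that the target service is running and reachable"),
]

def analyse_log(lines):
    """Match every log line against every rule; collect findings."""
    findings = []
    for line in lines:
        for pattern, diagnosis, remedy in KNOWLEDGE_BASE:
            if pattern.search(line):
                findings.append((line, diagnosis, remedy))
    return findings

findings = analyse_log([
    "2021-03-01 12:00:01 ERROR java.lang.OutOfMemoryError: Java heap space",
    "2021-03-01 12:00:02 INFO request served",
])
```

Keeping the rules as data, separate from the matching loop, is what lets new expert knowledge be added without reprogramming the system.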


2021 ◽  
Vol 3 (2) ◽  
pp. 299-317
Author(s):  
Patrick Schrempf ◽  
Hannah Watson ◽  
Eunsoo Park ◽  
Maciej Pajak ◽  
Hamish MacKinnon ◽  
...  

Training medical image analysis models traditionally requires large amounts of expertly annotated imaging data, which is time-consuming and expensive to obtain. One solution is to automatically extract scan-level labels from radiology reports. Previously, we showed that, by extending BERT with a per-label attention mechanism, we can train a single model to perform automatic extraction of many labels in parallel. However, if we rely on pure data-driven learning, the model sometimes fails to learn critical features or learns the correct answer via simplistic heuristics (e.g., that “likely” indicates positivity), and thus fails to generalise to rarer cases which have not been learned or where the heuristics break down (e.g., “likely represents prominent VR space or lacunar infarct”, which indicates uncertainty over two differential diagnoses). In this work, we propose template creation for data synthesis, which enables us to inject expert knowledge about unseen entities from medical ontologies, and to teach the model rules on how to label difficult cases, by producing relevant training examples. Using this technique alongside domain-specific pre-training for our underlying BERT architecture, i.e., PubMedBERT, we improve F1 micro from 0.903 to 0.939 and F1 macro from 0.512 to 0.737 on an independent test set for 33 labels in head CT reports for stroke patients. Our methodology offers a practical way to combine domain knowledge with machine learning for text classification tasks.
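The template-creation idea can be sketched as filling entity slots from an ontology-like list into labelled report templates, producing training examples for rare phrasings such as the two-diagnosis uncertainty case quoted above. The templates, entities, and labels below are invented for illustration and are not the paper's actual synthesis rules.

```python
# Labelled report templates with entity slots; {a}/{b} are filled from
# a small ontology-like entity list (both invented for this sketch).
TEMPLATES = [
    ("likely represents {a} or {b}", "uncertain"),
    ("no evidence of {a}", "negative"),
]
ENTITIES = ["lacunar infarct", "prominent VR space"]

def synthesise():
    """Expand every template with every valid entity combination."""
    examples = []
    for template, label in TEMPLATES:
        if "{b}" in template:
            examples += [(template.format(a=a, b=b), label)
                         for a in ENTITIES for b in ENTITIES if a != b]
        else:
            examples += [(template.format(a=a), label) for a in ENTITIES]
    return examples

examples = synthesise()
```

Mixing such synthetic examples into the training data is how the expert rule ("two alternative diagnoses means uncertainty") reaches the model without any new annotated reports.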


2014 ◽  
Vol 15 (1) ◽  
pp. 68-74 ◽  
Author(s):  
Doug Reside

In the first section of the submission guidelines for this esteemed journal, would-be authors are informed, “RBM: A Journal of Rare Books, Manuscripts, and Cultural Heritage uses a web-based, automated, submission system to track and review manuscripts. Manuscripts should be sent to the editor, […], through the web portal[…]” The multivalent uses of the word “manuscript” in this sentence reveal a good deal about the state of our field. This journal is dedicated to the study of manuscripts, and it is understood by most readers that the manuscripts being studied are of the “one-of-a-kind” variety (even rarer than the “rare . . .


1998 ◽  
Vol 21 (3) ◽  
pp. 163-185 ◽  
Author(s):  
Johnny S.K. Wong ◽  
Rishi Nayar ◽  
Armin R. Mikler
