Using Grammatical Inference to Automate Information Extraction from the Web

Author(s):  
Theodore W. Hong ◽  
Keith L. Clark


2013 ◽  
Vol 7 (2) ◽  
pp. 574-579 ◽  
Author(s):  
Dr Sunitha Abburu ◽  
G. Suresh Babu

Day by day, the volume of information available on the Web is growing significantly. Web information comes in several forms: structured, semi-structured, and unstructured. The majority of it is presented in web pages, where it is semi-structured, and the information required for a given context is often scattered across different web documents. It is difficult to analyze large volumes of semi-structured information presented in web pages and to make decisions based on that analysis. The current research work proposes a framework for a system that extracts information from various sources and prepares reports based on the knowledge built from the analysis. This simplifies data extraction, data consolidation, data analysis, and decision making based on the information presented in web pages. The proposed framework integrates web crawling, information extraction, and data mining technologies for better information analysis that supports effective decision making. It enables people and organizations to extract information from various web sources and to perform effective analysis on the extracted data. The proposed framework is applicable to any application domain; manufacturing, sales, tourism, and e-learning are a few example applications. The framework has been implemented and tested for effectiveness, and the results are promising.
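
As a rough illustration of the crawl, extract, consolidate, and analyze pipeline the framework describes, the Python sketch below fetches pages, pulls table cells out of each, and builds a tiny summary report. The URL, the table-cell extraction rule, and the cell-count "analysis" are hypothetical placeholders, not the framework's actual components.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableCellExtractor(HTMLParser):
    """Collects the text of every <td> cell on a page (the extraction step)."""

    def __init__(self):
        super().__init__()
        self._in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and data.strip():
            self.cells.append(data.strip())


def crawl_and_extract(urls):
    """Crawl each URL and extract its table cells."""
    records = []
    for url in urls:
        html = urlopen(url).read().decode("utf-8", errors="ignore")
        parser = TableCellExtractor()
        parser.feed(html)
        records.append({"source": url, "cells": parser.cells})
    return records


def consolidate(records):
    """Toy 'analysis': count the extracted cells contributed by each source."""
    return {r["source"]: len(r["cells"]) for r in records}


# Local demo on an inline page; crawl_and_extract(["https://..."]) would do the
# same over live URLs (any URLs here would be placeholders).
sample = "<table><tr><td>Widget A</td><td>12.50</td></tr></table>"
parser = TableCellExtractor()
parser.feed(sample)
print(consolidate([{"source": "inline-sample", "cells": parser.cells}]))
```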


2018 ◽  
Vol 25 (2) ◽  
pp. 287-306 ◽  
Author(s):  
Cleiton Fernando Lima Sena ◽  
Daniela Barreiro Claro

Nowadays, there is an increasing amount of digital data. On the Web, a vast collection of heterogeneous data is generated daily, and a significant portion of it is available in natural language. Open Information Extraction (Open IE) enables the extraction of facts from large quantities of text written in natural language. In this work, we propose an Open IE method to extract facts from texts written in Portuguese. We developed two new rules that generalize inference by transitivity and by symmetry, which increases the number of implicit facts extracted from a sentence. Our novel symmetric inference approach is based on a list of symmetric features. Our results confirmed that our method outperforms closely related works in both precision and number of valid extractions. Considering the number of minimal facts, our approach is equivalent to the most relevant methods in the literature.
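
The two generalized inference rules can be sketched over (subject, relation, object) triples as below. The relation names and the symmetric/transitive lists are illustrative English stand-ins, not the authors' Portuguese feature lists.

```python
SYMMETRIC_RELATIONS = {"borders"}       # hypothetical symmetric features
TRANSITIVE_RELATIONS = {"located in"}   # hypothetical transitive relations


def infer(facts):
    """Derive implicit (subject, relation, object) facts from explicit ones."""
    derived = set(facts)
    # Symmetry: (a, r, b) implies (b, r, a) when r is known to be symmetric.
    for s, r, o in facts:
        if r in SYMMETRIC_RELATIONS:
            derived.add((o, r, s))
    # Transitivity: (a, r, b) and (b, r, c) imply (a, r, c) when r is transitive.
    changed = True
    while changed:
        changed = False
        new = {(s1, r1, o2)
               for (s1, r1, o1) in derived
               for (s2, r2, o2) in derived
               if r1 == r2 and r1 in TRANSITIVE_RELATIONS and o1 == s2}
        if not new <= derived:
            derived |= new
            changed = True
    return derived


facts = {("Lisbon", "located in", "Portugal"),
         ("Portugal", "located in", "Europe"),
         ("Spain", "borders", "Portugal")}
for fact in sorted(infer(facts) - facts):
    print("inferred:", fact)
```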


2021 ◽  
pp. 99-110
Author(s):  
Mohammad Ali Tofigh ◽  
Zhendong Mu

With the development of society, people pay more and more attention to food safety, and relevant laws and policies are gradually being introduced and improved. The research and development of agricultural product quality and safety systems has become a research hotspot, and how to obtain the Web information of such systems effectively and quickly is the focus of this research; it is therefore essential to carry out intelligent extraction of Web information for the agricultural product quality and safety system. The purpose of this paper is to solve the problem of how to efficiently extract the Web information of the agricultural product quality and safety system. By studying the Web information extraction methods of various systems, the paper presents a detailed analysis of how to realize efficient and intelligent extraction of this information. It analyzes in detail the template-based information extraction algorithms currently in use and systematically discusses a scheme that can automatically extract the Web information of the agricultural product quality and safety system according to a template. The research results show that the proposed scheme is a dynamically extensible information extraction system that can support dynamically configured templates for different requirements without changing the code. Compared with the general approach, the Web information extraction speed for the agricultural product quality and safety system is increased by 25%, the accuracy by 12%, and the recall by 30%.
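
A minimal sketch of the template-driven idea: the extraction rules live in a configuration object keyed by field name, so a new page layout is handled by editing the template rather than the code. The field names and patterns below are hypothetical and not taken from the paper.

```python
import re

TEMPLATE = {  # hypothetical template for a product-safety record page
    "product":  r'<span class="product-name">(.*?)</span>',
    "origin":   r'<span class="origin">(.*?)</span>',
    "batch_no": r'<span class="batch">(.*?)</span>',
}


def extract(html, template):
    """Fill one record from a page according to the configured template."""
    record = {}
    for field, pattern in template.items():
        match = re.search(pattern, html, re.S)
        record[field] = match.group(1).strip() if match else None
    return record


sample = ('<span class="product-name">Apples</span>'
          '<span class="origin">Shandong</span>'
          '<span class="batch">A-2021-17</span>')
print(extract(sample, TEMPLATE))
```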


Author(s):  
Junxia Guo ◽  
Prach Chaisatien ◽  
Hao Han ◽  
Tomoya Noro ◽  
Takehiro Tokuda

Author(s):  
Marlene Goncalves ◽  
Alberto Gobbi

Location-based Skyline queries select the nearest objects to a point that best meet the user's preferences. In particular, this chapter focuses on location-based Skyline queries over web-accessible data. Web-accessible data may have a geographical location and be geotagged with documents containing ratings by web users. Location-based Skyline queries may express preferences based on dynamic features such as distance and changeable ratings. In this context, distance must be recalculated whenever a user changes position, while ratings must be extracted from external data sources that are updated each time a user scores an item on the Web. This chapter describes and empirically studies four solutions capable of answering location-based Skyline queries, considering changes in the user's position and information extraction from the Web within a search area around the user. They are based on an M-Tree index and the Divide & Conquer principle.
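
For illustration, a skyline over (distance, rating) pairs might look like the sketch below, which recomputes distance from the user's current position and keeps every non-dominated object. This is a plain block-nested-loops pass under assumed fields, not the chapter's M-Tree-based solutions.

```python
from math import hypot


def dominates(a, b):
    """a dominates b if it is no worse in both dimensions and strictly better in one
    (smaller distance is better, larger rating is better)."""
    return (a["dist"] <= b["dist"] and a["rating"] >= b["rating"] and
            (a["dist"] < b["dist"] or a["rating"] > b["rating"]))


def location_skyline(user_pos, objects):
    # Distance is dynamic: recompute it from the user's current position.
    for obj in objects:
        obj["dist"] = hypot(obj["x"] - user_pos[0], obj["y"] - user_pos[1])
    # Keep every object that no other object dominates.
    return [o for o in objects
            if not any(dominates(other, o) for other in objects if other is not o)]


# Hypothetical geotagged objects; ratings stand in for scores extracted from the Web.
restaurants = [{"name": "A", "x": 1, "y": 1, "rating": 4.5},
               {"name": "B", "x": 5, "y": 5, "rating": 4.8},
               {"name": "C", "x": 2, "y": 2, "rating": 3.0}]
print([r["name"] for r in location_skyline((0, 0), restaurants)])
```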


2004 ◽  
pp. 227-267
Author(s):  
Wee Keong Ng ◽  
Zehua Liu ◽  
Zhao Li ◽  
Ee Peng Lim

With the explosion of information on the Web, traditional ways of browsing and keyword searching over web pages no longer satisfy the demanding needs of web surfers. Web information extraction has emerged as an important research area that aims to automatically extract information from target web pages and convert it into a structured format for further processing. The main issues involved in the extraction process include: (1) the definition of a suitable extraction language; (2) the definition of a data model representing the web information source; (3) the generation of the data model, given a target source; and (4) the extraction and presentation of information according to a given data model. In this chapter, we discuss the challenges of these issues and the approaches that current research activities have taken to resolve them. We propose several classification schemes to classify existing approaches to information extraction from different perspectives. Among the existing works, we focus on the Wiccap system, a software system that enables ordinary end-users to obtain information of interest in a simple and efficient manner by constructing personalized web views of information sources.
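
To make issues (2)-(4) concrete, the sketch below separates a declarative data model (a hypothetical "view" naming logical fields and their locators) from a generic extractor that fills the view for a target page. The view format and locators are illustrative assumptions, not Wiccap's actual extraction language.

```python
import re

# Hypothetical data model for a news source: logical field names mapped to
# locator patterns. An end-user would edit this view, not the extractor code.
NEWS_VIEW = {
    "source": "ExampleNews",
    "fields": {
        "headline": r"<h1>(.*?)</h1>",
        "date":     r'<time datetime="(.*?)"',
    },
}


def materialize(view, html):
    """Extract and present a page according to the given data model."""
    record = {"source": view["source"]}
    for name, pattern in view["fields"].items():
        match = re.search(pattern, html, re.S)
        record[name] = match.group(1) if match else None
    return record


page = '<h1>Web IE survey published</h1><time datetime="2004-06-01">'
print(materialize(NEWS_VIEW, page))
```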


2011 ◽  
pp. 2048-2081
Author(s):  
Gijs Geleijnse ◽  
Jan Korst

In this chapter, we discuss approaches to find, extract, and structure information from natural language texts on the Web. Such structured information can be expressed and shared using the standard Semantic Web languages and hence be machine interpreted. We focus on two tasks in Web information extraction. The first part focuses on mining facts from the Web, while in the second part, we present an approach to collect community-based meta-data. A search engine is used to retrieve potentially relevant texts. From these texts, instances and relations are extracted. The proposed approaches are illustrated using various case studies, showing that we can reliably extract information from the Web using simple techniques.
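
A minimal sketch of the fact-mining step: texts retrieved for a query are scanned with a surface pattern, and the matched pairs become relation instances. The pattern and snippets below are illustrative assumptions, not the chapter's actual queries or case studies.

```python
import re

# Hypothetical surface pattern for one relation.
CAPITAL_PATTERN = re.compile(r"([A-Z]\w+) (?:is|was) the capital of ([A-Z]\w+)")


def extract_facts(snippets):
    """Scan retrieved texts and keep matched pairs as relation instances."""
    facts = set()
    for text in snippets:  # texts would be retrieved via a search engine query
        for city, country in CAPITAL_PATTERN.findall(text):
            facts.add((city, "capital_of", country))
    return facts


snippets = ["Paris is the capital of France.",
            "Historically, Kyoto was the capital of Japan."]
print(extract_facts(snippets))
```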


2018 ◽  
Vol 7 (4.19) ◽  
pp. 1041
Author(s):  
Santosh V. Chobe ◽  
Dr. Shirish S. Sane

There is an explosive growth of information on the Internet, which makes extraction of relevant data from various sources a difficult task for users. Therefore, to transform Web pages into databases, Information Extraction (IE) systems are needed. Relevant information in Web documents can be extracted using information extraction techniques and presented in a structured format. By applying these techniques, information can be extracted from structured, semi-structured, and unstructured data. This paper presents some of the major information extraction tools; their advantages and limitations are discussed from a user's perspective.

