Flint: From Web Pages to Probabilistic Semantic Data

Author(s):  
Lorenzo Blanco ◽  
Mirco Bronzi ◽  
Valter Crescenzi ◽  
Paolo Merialdo ◽  
Paolo Papotti
Author(s):  
Ming-Cheng Tsou

The World Wide Web (WWW) offers an enormous wealth of information and data, and assembles a tremendous amount of knowledge. Much of this knowledge, however, consists of either unstructured or semi-structured data. To make use of these unexploited or underexploited resources more efficiently, the management of information and data gathering has become an essential task for research and development. In this paper, the author examines the task of researching a hostel or homestay using the Google search web service as a base search engine. From the search results, location and semantic data were mined, retrieved, and sorted by combining the Chinese Word Segmentation System with text mining technology to find geographic information in web pages. The results obtained with this search method brought users closer to the answers they sought and achieved greater accuracy, as the results included graphical and textual geographic information. In the future, this method may be applicable to various types of queries, analyses, and geographic data collection, and to managing spatial knowledge related to different keywords within a document.


2020 ◽  
Vol 23 (3) ◽  
pp. 494-513
Author(s):  
Evgeny L’vovich Kitaev ◽  
Rimma Yuryevna Skornyakova

The semantic markups of the World Wide Web have accumulated a large amount of data, and their number continues to grow. However, the potential of these data is, in our opinion, not fully utilized. The contents of semantic markups are widely used by search systems and, in part, by social networks, but application developers usually use these data by converting them to the RDF standard and executing SPARQL queries, which requires good knowledge of that language and programming skills. In this paper, we propose to leverage the semantic markups available on the Web to automatically incorporate their contents into the content of other web pages. We also present a software tool for implementing such incorporation that does not require a web page developer to know any programming languages other than HTML and CSS. The tool does not require installation; the work is performed by JavaScript plugins. Currently, the tool supports semantic data contained in the popular semantic markup types “microdata” and JSON-LD, in the tags of HTML documents, and in the properties of Word and PDF documents.
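
To make the JSON-LD case concrete, the following is a minimal sketch of how JSON-LD markup embedded in an HTML page can be read. It is not the authors' tool (which the abstract describes as JavaScript plugins); it is an illustrative Python example using only the standard library, and the names `JsonLdExtractor`, `extract_jsonld`, and `SAMPLE_HTML` are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects parsed <script type="application/ld+json"> blocks (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        # Start buffering when a JSON-LD script element opens.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed blocks rather than fail the whole page.


def extract_jsonld(html):
    """Return a list of parsed JSON-LD objects found in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Made-up sample page used only for illustration.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Example headline"}
</script>
</head><body><p>Body text</p></body></html>
"""

if __name__ == "__main__":
    for item in extract_jsonld(SAMPLE_HTML):
        print(item.get("@type"), item.get("headline"))
```

Microdata would need a different reader, since it is carried in `itemscope` and `itemprop` attributes on ordinary HTML tags rather than in a separate script block.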


Crisis ◽  
2018 ◽  
Vol 39 (3) ◽  
pp. 197-204 ◽  
Author(s):  
Hajime Sueki ◽  
Jiro Ito

Abstract. Background: Gatekeeper training is an effective suicide prevention strategy. However, the appropriate targets of online gatekeeping have not yet been clarified. Aim: We examined the association between the outcomes of online gatekeeping using the Internet and the characteristics of consultation service users. Method: An advertisement to encourage the use of e-mail-based psychological consultation services among viewers was placed on web pages that showed the results of searches using suicide-related keywords. All e-mails received between October 2014 and December 2015 were replied to as part of gatekeeping, and the obtained data (responses to an online questionnaire and the content of the received e-mails) were analyzed. Results: A total of 154 consultation service users were analyzed, 35.7% of whom were male. The median age range was 20–29 years. Online gatekeeping was significantly more likely to be successful when such users faced financial/daily life or workplace problems, or revealed their names (including online names). By contrast, the activity was more likely to be unsuccessful when it was impossible to assess the problems faced by consultation service users. Conclusion: It may be possible to increase the success rate of online gatekeeping by targeting individuals facing financial/daily life or workplace problems with marked tendencies for self-disclosure.


2012 ◽  
Vol 2 (9) ◽  
pp. 148-150 ◽  
Author(s):  
Marriboyina Rajendra ◽  
S. Suresh Babu

2013 ◽  
Vol 7 (2) ◽  
pp. 574-579 ◽  
Author(s):  
Dr Sunitha Abburu ◽  
G. Suresh Babu

Day by day the volume of information available on the web is growing significantly. Information on the web comes in several forms: structured, semi-structured, and unstructured. The majority of information on the web is presented in web pages, and the information presented in web pages is semi-structured. However, the information required for a given context is scattered across different web documents. It is difficult to analyze the large volumes of semi-structured information presented in web pages and to make decisions based on that analysis. The current research work proposes a framework for a system that extracts information from various sources and prepares reports based on the knowledge built from the analysis. This simplifies data extraction, data consolidation, data analysis, and decision making based on the information presented in web pages. The proposed framework integrates web crawling, information extraction, and data mining technologies for better information analysis that supports effective decision making. It enables people and organizations to extract information from various web sources and to analyze the extracted data effectively. The proposed framework is applicable to any application domain; manufacturing, sales, tourism, and e-learning are a few examples. The framework was implemented and tested for effectiveness, and the results are promising.

