web data
Recently Published Documents


Total documents: 1362 (five years: 196)

H-index: 37 (five years: 3)

2021 ◽  
Vol 10 (12) ◽  
pp. 832
Author(s):  
Xiangfu Meng ◽  
Lin Zhu ◽  
Qing Li ◽  
Xiaoyan Zhang

Resource Description Framework (RDF), a standard metadata description framework proposed by the World Wide Web Consortium (W3C), is suitable for modeling and querying Web data. With the growing importance of RDF data in Web data management, there is an increasing need for modeling and querying RDF data. Previous approaches mainly focus on querying RDF. However, a large amount of RDF data has spatial and temporal features, so it is important to study spatiotemporal RDF data query approaches. In this paper, we first formally define spatiotemporal RDF data and construct a spatiotemporal RDF model, st-RDF, that is used to represent and manipulate spatiotemporal RDF data. Secondly, we present a spatiotemporal RDF query algorithm, stQuery, based on subgraph matching. By adopting a preliminary query filtering mechanism in the query process, this algorithm can quickly determine that the query result is empty for queries whose temporal or spatial range exceeds a specific range. Thirdly, we propose a sorting strategy that calculates the matching order of query nodes to speed up the subgraph matching. Finally, we conduct experiments on effectiveness and query efficiency. The experimental results show the performance advantages of our approach.
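The preliminary filtering idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' stQuery implementation: the `Extent` class, the `may_have_results` function, and the sample values are hypothetical, and the sketch assumes the data set's overall time interval and spatial bounding box are known in advance. A query whose range does not overlap that extent can be answered as empty before any subgraph matching starts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical summary of a temporal interval plus a 2D bounding box."""
    t_min: float
    t_max: float
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def may_have_results(data: Extent, query: Extent) -> bool:
    """Return False when the query range cannot overlap the data at all.

    A True result only means that matching still has to run; it does not
    guarantee a non-empty answer.
    """
    time_overlaps = query.t_min <= data.t_max and query.t_max >= data.t_min
    space_overlaps = (
        query.x_min <= data.x_max and query.x_max >= data.x_min
        and query.y_min <= data.y_max and query.y_max >= data.y_min
    )
    return time_overlaps and space_overlaps


# Assumed extent of the whole data set (illustrative values only).
data_extent = Extent(2000, 2020, 0.0, 0.0, 10.0, 10.0)

inside = Extent(2010, 2015, 2.0, 2.0, 4.0, 4.0)      # overlaps in time and space
too_late = Extent(2021, 2025, 2.0, 2.0, 4.0, 4.0)    # time range beyond the data
far_away = Extent(2010, 2015, 20.0, 20.0, 30.0, 30.0)  # box outside the data
```

Only queries that pass this check would proceed to the more expensive matching step.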


2021 ◽  
pp. 599-608
Author(s):  
Priyanka C. Nair ◽  
Deepa Gupta ◽  
B. Indira Devi

2021 ◽  
Vol 22 (2) ◽  
Author(s):  
Rofikhotul Khoeriyah ◽  
Nia Kurniadin
Keyword(s): Web Data

Coffee shops are popular places among the people of Samarinda. Coffee shops differ from traditional coffee stalls (kedai kopi or warung kopi) in several respects, including concept, interior design, facilities and infrastructure, menu, and market segment. However, people have difficulty finding the locations of coffee shops and information about them. A publicly accessible information source is therefore needed, one option being a Web-based information map, namely a WebGIS. This study aims to provide the locations of, and other information about, the coffee shops in Samarinda, presented as a Web-based information map. The data collected consist of coordinate points obtained from field observation, together with information about each coffee shop taken from its social media accounts, which were then processed with Quantum GIS software into a Web-based information map. The results show that 49 coffee shops are spread across Samarinda, and the data are presented as a WebGIS accompanied by the information available for each coffee shop.


2021 ◽  
pp. 1-22
Author(s):  
Sudhir Kumar Patnaik ◽  
C. Narendra Babu

Web data extraction has seen significant development in the last decade since its inception in the early nineties. It has evolved from simple manual extraction of data from web pages and documents, to automated extraction, to intelligent extraction using machine learning algorithms, tools and techniques. Data extraction is one of the key components of the end-to-end life cycle of the web data extraction process, which includes navigation, extraction, data enrichment and visualization. This paper presents the journey of web data extraction over the years, highlighting the evolution of tools, techniques, frameworks and algorithms for building intelligent web data extraction systems. The paper also sheds light on challenges, opportunities for future research and emerging trends in web data extraction, with a specific focus on machine learning techniques. Both traditional and machine learning approaches to manual and automated web data extraction are evaluated, and results are reported for a few use cases that demonstrate the challenges web data extraction faces when a website's layout changes. This paper introduces novel ideas, such as a self-healing capability in web data extraction and proactive error detection when a website's layout changes, as areas of future research. This perspective will help readers gain deeper insight into the present and future of web data extraction.
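The abstract names self-healing extraction and proactive error detection only as directions for future research, so the following Python sketch is a hypothetical illustration rather than the authors' design. It assumes an extractor that tries an ordered list of patterns (the `PRICE_PATTERNS` list and the `extract_price` function are invented for this example) and reports when the primary pattern no longer matches, which is one simple signal that the layout may have changed.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns: the first targets the known layout, the rest are
# fallbacks for layouts the site might switch to.
PRICE_PATTERNS = [
    r'<span class="price">([^<]+)</span>',
    r'<div id="product-price">([^<]+)</div>',
]


def extract_price(html: str) -> Tuple[Optional[str], bool]:
    """Return (value, layout_changed).

    layout_changed is True whenever the primary pattern failed, so a
    monitoring job can flag the page even if a fallback still recovered
    the value.
    """
    for index, pattern in enumerate(PRICE_PATTERNS):
        match = re.search(pattern, html)
        if match:
            return match.group(1).strip(), index > 0
    return None, True


old_layout = '<p>Latte</p><span class="price"> $4.50 </span>'
new_layout = '<p>Latte</p><div id="product-price">$4.50</div>'
unknown_layout = '<p>Latte</p><b>$4.50</b>'
```

A production extractor would more likely work on a parsed DOM than on regular expressions; the regex form is used here only to keep the sketch dependency-free.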


2021 ◽  
pp. 46-70
Author(s):  
Stefan Bosse ◽  
Lena Dahlhaus ◽  
Uwe Engel

2021 ◽  
Vol 2066 (1) ◽  
pp. 012033
Author(s):  
Guilian Feng

With the arrival of the era of big data, people have gradually realized the importance of data. Data is not just a resource; it is an asset. This paper studies the realization of Web data mining technology based on Python. It analyzes the overall architecture design of a distributed web crawler system, and then analyzes in detail the principles of the crawler's URL function module, web crawl function module, web page parsing function module, data storage function module and so on. Each function module of the crawler system was tested on the experimental computer, and the data were summarized for comparative analysis. The main significance of this paper lies in the design and implementation of a distributed web crawler system which, to a certain extent, solves the problems of slow speed, low efficiency and poor scalability of a traditional single-machine web crawler, and improves the speed and efficiency with which the crawler retrieves information and web page data.
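The four modules listed in this abstract (URL management, page fetching, page parsing, data storage) can be sketched in a few lines of standard-library Python. This is a single-process illustration under stated assumptions, not the paper's distributed system: the `FAKE_WEB` dictionary stands in for real HTTP requests so the example runs offline, and the names `LinkParser`, `fetch`, and `crawl` are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Parsing module: collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


# Assumption: an in-memory stand-in for the network, keyed by URL.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a>',
}


def fetch(url):
    """Fetch module: returns page HTML, or an empty string if unknown."""
    return FAKE_WEB.get(url, "")


def crawl(seed):
    """Breadth-first crawl from seed; returns {url: html} as the store."""
    frontier = deque([seed])  # URL module: queue of pages still to visit
    seen = {seed}             # URL module: de-duplication set
    store = {}                # storage module: a plain dict in this sketch
    while frontier:
        url = frontier.popleft()
        html = fetch(url)
        store[url] = html
        parser = LinkParser(url)
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return store


pages = crawl("http://example.test/")
```

In a distributed design the frontier and the de-duplication set would typically live in a shared service so that several worker machines can pull URLs from the same queue; that part is not shown here.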

