Extracting schema from semistructured data

Author(s):  
Svetlozar Nestorov ◽  
Serge Abiteboul ◽  
Rajeev Motwani
2021 ◽  
Vol 1 (2) ◽  
pp. 65-77
Author(s):  
T. E. Vildanov ◽  
N. S. Ivanov

This article explores both popular and newly developed tools for extracting data from websites and converting it into a form suitable for analysis. The paper compares Python libraries, with performance as the key criterion. The results are grouped by site, tool used, and number of iterations, and then presented in graphical form. The scientific novelty of the research lies in the application domain of the data extraction tools: semistructured data is retrieved and transformed from the websites of bookmakers and betting exchanges. The article also describes new tools that are not yet widely used in parsing and web scraping. As a result of the study, quantitative metrics were obtained for all the tools used, and the libraries best suited to the rapid extraction and processing of large volumes of information were selected.


Author(s):  
Tetsuhiro Miyahara ◽  
Yusuke Suzuki ◽  
Takayoshi Shoudai ◽  
Tomoyuki Uchida ◽  
Sachio Hirokawa ◽  
...  

Author(s):  
Li-Cheng Wu ◽  
Jorng-Tzong Horng ◽  
Baw-Jhiune Liu ◽  
Chin-Yea Wang ◽  
Gwo-Dong Chen