Self-supervised Automated Wrapper Generation for Weblog Data Extraction

Author(s):  
George Gkotsis ◽  
Karen Stepanyan ◽  
Alexandra I. Cristea ◽  
Mike Joy
2013 ◽  
Vol 17 (4) ◽  
pp. 827-846

2021 ◽  
Author(s):  
Chia-Hui Chang

Web data extraction is a key component in many business intelligence tasks, such as data transformation, exchange, and analysis. Many approaches have been proposed, using either labeled training examples (supervised) or annotation-free training pages (unsupervised). However, most research focuses on extraction effectiveness, and little attention has been paid to extraction efficiency. In fact, most unsupervised web data extraction methods skip wrapper generation because they can work alone without any supervision. In this paper, we argue that wrapper generation for unsupervised web data extraction is as important as supervised wrapper induction, because the generated wrappers can work more efficiently without sophisticated analysis during testing. We consider two approaches for wrapper generation: schema-guided finite-state machine (FSM) approaches and data-driven machine learning (ML) approaches. We exploit unique mandatory templates to improve the FSM-based wrapper, and propose two convolutional neural network (CNN)-based models for sequence labeling. The experimental results show that the FSM wrapper performs well even with small training data, while the CNN-based models require more training pages to achieve the same effectiveness but are more efficient with GPU support. Furthermore, FSM wrappers can work as a filter to reduce the number of training pages and advance the learning curve for wrapper generation.


Author(s):  
W.J. de Ruijter ◽  
M.R. McCartney ◽  
David J. Smith ◽  
J.K. Weiss

Further advances in resolution enhancement of transmission electron microscopes can be expected from digital processing of image data recorded with slow-scan CCD cameras. Image recording with these new cameras is essential because of their high sensitivity, extreme linearity and negligible geometric distortion. Furthermore, digital image acquisition allows for on-line processing which yields virtually immediate reconstruction results. At present, the most promising techniques for exit-surface wave reconstruction are electron holography and the recently proposed focal variation method. The latter method is based on image processing applied to a series of images recorded at equally spaced defocus. Exit-surface wave reconstruction using the focal variation method as proposed by Van Dyck and Op de Beeck proceeds in two stages. First, the complex image wave is retrieved by data extraction from a parabola situated in three-dimensional Fourier space. Then the objective lens spherical aberration, astigmatism and defocus are corrected by simply dividing the image wave by the wave aberration function calculated with the appropriate objective lens aberration coefficients, which yields the exit-surface wave.


Author(s):  
Mitsuji MUNEYASU ◽  
Nayuta JINDA ◽  
Yuuya MORITANI ◽  
Soh YOSHIDA

2013 ◽  
Vol 7 (2) ◽  
pp. 574-579 ◽  
Author(s):  
Dr Sunitha Abburu ◽  
G. Suresh Babu

The volume of information available on the web is growing significantly day by day. Information on the web takes several forms: structured, semi-structured and unstructured. The majority of it is presented in web pages, where it is semi-structured. However, the information required for a given context is scattered across different web documents. It is difficult to analyze the large volumes of semi-structured information presented in web pages and to make decisions based on that analysis. The current research work proposes a framework for a system that extracts information from various sources and prepares reports based on the knowledge built from the analysis. This simplifies data extraction, data consolidation, data analysis and decision making based on the information presented in web pages. The proposed framework integrates web crawling, information extraction and data mining technologies for better information analysis that supports effective decision making. It enables people and organizations to extract information from various web sources and to analyze the extracted data effectively. The proposed framework is applicable to any application domain; manufacturing, sales, tourism and e-learning are a few examples. The framework has been implemented and tested for effectiveness, and the results are promising.


2019 ◽  
Vol 23 (4) ◽  
pp. 442-454 ◽  
Author(s):  
Rachel Mandela ◽  
Maggie Bellew ◽  
Paul Chumas ◽  
Hannah Nash

OBJECTIVE: There are currently no guidelines for the optimum age for surgical treatment of craniosynostosis. This systematic review summarizes and assesses evidence on whether there is an optimal age for surgery in terms of neurodevelopmental outcomes.
METHODS: The databases MEDLINE, PsycINFO, CINAHL, Embase + Embase Classic, and Web of Science were searched between October and November 2016 and searches were repeated in July 2017. According to PICO (participants, intervention, comparison, outcome) criteria, studies were included that focused on: children diagnosed with nonsyndromic craniosynostosis, aged ≤ 5 years at time of surgery; corrective surgery for nonsyndromic craniosynostosis; comparison of age-at-surgery groups; and tests of cognitive and neurodevelopmental postoperative outcomes. Studies that did not compare age-at-surgery groups (e.g., those employing a correlational design alone) were excluded. Data were double-extracted by 2 authors using a modified version of the Cochrane data extraction form.
RESULTS: Ten studies met the specified criteria; 5 found a beneficial effect of earlier surgery, and 5 did not. No study found a beneficial effect of later surgery. No study collected data on length of anesthetic exposure and only 1 study collected data on sociodemographic factors.
CONCLUSIONS: It was difficult to draw firm conclusions from the results due to multiple confounding factors. There is some inconclusive evidence that earlier surgery is beneficial for patients with sagittal synostosis. The picture is even more mixed for other subtypes. There is no evidence that later surgery is beneficial. The authors recommend that future research use agreed-upon parameters for: age-at-surgery cut-offs, follow-up times, and outcome measures.

