Automated Text and Tabular Data Extraction from Scanned Document Images

Author(s):  
Pushkar Kurhekar ◽  
Shivani Nigam ◽  
Shriram Pillai
Author(s):  
Manolis Vasileiadis ◽  
Nikolaos Kaklanis ◽  
Konstantinos Votis ◽  
Dimitrios Tzovaras

Author(s):  
MOHAMMAD SHAFKAT AMIN ◽  
HASAN JAMIL

In the last few years, several works in the literature have addressed the problem of data extraction from web pages. The problem matters because, once extracted, the data can be handled much like instances of a traditional database, which in turn facilitates web data integration and various other domain-specific applications. In this paper, we propose a novel table extraction technique that works on web pages generated dynamically from a back-end database. The proposed system automatically and efficiently discovers table structure by mining relevant patterns from web pages, and generates a regular expression for the extraction process. Moreover, the system assigns intuitive names to the columns of the extracted table by leveraging the Wikipedia knowledge base for table annotation. To improve the accuracy of the assignment, we exploit the structural homogeneity of the column values and their co-location information to weed out less likely candidates. This approach requires no human intervention, and experimental results have shown its accuracy to be promising. Moreover, the wrapper generation algorithm works in linear time.
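As a rough illustration of what a regular-expression wrapper for database-generated rows can look like, the sketch below generalises two example rows into one pattern: tokens that agree stay literal, and text tokens that differ become capture groups. It is not the authors' pattern-mining algorithm; the function name `induce_row_regex`, the tokenizer, and the sample page are all hypothetical, and the sketch assumes both samples share an identical tag structure.

```python
import re

# Split markup into tags and text runs (a deliberately naive tokenizer).
TOKEN = re.compile(r"<[^>]+>|[^<]+")


def tokenize(fragment):
    """Return tags and non-blank text runs, in document order."""
    return [t for t in TOKEN.findall(fragment) if t.strip()]


def induce_row_regex(sample_a, sample_b):
    """Generalise two example rows into one compiled regex.

    Tokens that agree are kept as literals; text tokens that differ
    become capture groups. Assumes both samples have the same tag layout.
    """
    tokens_a, tokens_b = tokenize(sample_a), tokenize(sample_b)
    if len(tokens_a) != len(tokens_b):
        raise ValueError("samples do not share the same tag structure")
    parts = []
    for a, b in zip(tokens_a, tokens_b):
        if a == b:
            parts.append(re.escape(a))
        elif a.startswith("<") or b.startswith("<"):
            raise ValueError("samples differ in markup, not only in data")
        else:
            parts.append(r"([^<]+)")
    # Allow arbitrary whitespace between tokens on the real page.
    return re.compile(r"\s*".join(parts))


# Hypothetical page fragment produced from a back-end database.
PAGE = """
<table>
<tr><td>Dune</td><td>1965</td></tr>
<tr><td>Neuromancer</td><td>1984</td></tr>
<tr><td>Snow Crash</td><td>1992</td></tr>
</table>
"""

row_regex = induce_row_regex(
    "<tr><td>Dune</td><td>1965</td></tr>",
    "<tr><td>Neuromancer</td><td>1984</td></tr>",
)
rows = row_regex.findall(PAGE)
```

Each pass over the two samples touches every token once, so inducing the pattern is linear in the sample length; the abstract's linear-time claim refers to the authors' own wrapper generation, which this sketch does not reproduce.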


Author(s):  
Borra Vineetha ◽  
D. N. D. Harini ◽  
Ravi Yelesvarupu ◽  
...  

With the widespread use of electronic devices to photograph and upload documents, the need to extract the information present in unstructured document images is becoming increasingly pressing. The major obstacle is that these images often contain information in tabular form, and extracting data from table images presents a series of challenges due to the varied layouts and encodings of tables. The task includes accurately detecting the table present in an image, then recognizing the internal structure of the table and extracting the information from it. Although some progress has been made in table detection, obtaining the table contents is still a challenge, since this involves more fine-grained recognition of table structure (rows and columns). The digitization of critical information has to be carried out automatically, since there are millions of documents. Motivated by the fact that AI-based solutions are automating many processes, this work comprises three stages: first, table detection using the Faster R-CNN algorithm; second, recognition of the table's internal structure using a morphology operation and a refine operation; and last, table data extraction using a contours algorithm. The dataset used in this work was taken from the UNLV dataset.
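The structure-recognition stage can be pictured with a small, dependency-free sketch. It assumes a binarised table crop (1 = ink) with one-pixel-thick ruling lines, and it stands in for the morphology step only: keeping runs of at least `min_len` pixels is equivalent to a morphological opening with a line-shaped structuring element. The function names and the toy grid are hypothetical, and the Faster R-CNN detection, refine, and contour stages are not reproduced here.

```python
def long_runs(line, min_len):
    """Keep only 1-pixels that belong to a run of at least min_len pixels.

    On a 1-D binary signal this equals a morphological opening with a
    flat structuring element of length min_len.
    """
    out = [0] * len(line)
    start = None
    for i, value in enumerate(list(line) + [0]):  # sentinel closes the last run
        if value and start is None:
            start = i
        elif not value and start is not None:
            if i - start >= min_len:
                out[start:i] = [1] * (i - start)
            start = None
    return out


def ruling_lines(image, min_len):
    """Return (rows holding horizontal rulings, columns holding vertical rulings)."""
    horizontal = [long_runs(row, min_len) for row in image]
    vertical = [long_runs(col, min_len) for col in zip(*image)]
    h_rows = [y for y, row in enumerate(horizontal) if any(row)]
    v_cols = [x for x, col in enumerate(vertical) if any(col)]
    return h_rows, v_cols


def cell_boxes(h_rows, v_cols):
    """Cell interiors as (top, bottom, left, right), inclusive, rulings excluded."""
    return [(top + 1, bottom - 1, left + 1, right - 1)
            for top, bottom in zip(h_rows, h_rows[1:])
            for left, right in zip(v_cols, v_cols[1:])]


def demo_grid():
    """A 7x9 toy table: rulings on rows 0, 3, 6 and columns 0, 4, 8, plus one ink pixel."""
    image = [[0] * 9 for _ in range(7)]
    for y in (0, 3, 6):
        image[y] = [1] * 9
    for x in (0, 4, 8):
        for y in range(7):
            image[y][x] = 1
    image[1][2] = 1  # a stray "text" pixel that must not count as a ruling
    return image


h_rows, v_cols = ruling_lines(demo_grid(), min_len=5)
boxes = cell_boxes(h_rows, v_cols)
```

Short ink runs such as text strokes are discarded because they are shorter than `min_len`, which is why the threshold should sit well above the expected character size.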


Author(s):  
W.J. de Ruijter ◽  
M.R. McCartney ◽  
David J. Smith ◽  
J.K. Weiss

Further advances in resolution enhancement of transmission electron microscopes can be expected from digital processing of image data recorded with slow-scan CCD cameras. Image recording with these new cameras is essential because of their high sensitivity, extreme linearity and negligible geometric distortion. Furthermore, digital image acquisition allows for on-line processing which yields virtually immediate reconstruction results. At present, the most promising techniques for exit-surface wave reconstruction are electron holography and the recently proposed focal variation method. The latter method is based on image processing applied to a series of images recorded at equally spaced defocus. Exit-surface wave reconstruction using the focal variation method as proposed by Van Dyck and Op de Beeck proceeds in two stages. First, the complex image wave is retrieved by data extraction from a parabola situated in three-dimensional Fourier space. Then the objective lens spherical aberration, astigmatism and defocus are corrected by simply dividing the image wave by the wave aberration function calculated with the appropriate objective lens aberration coefficients, which yields the exit-surface wave.
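The second stage can be sketched for a single spatial frequency. The sketch reads the "wave aberration function" as the phase factor exp(-iχ), uses the common rotationally symmetric form χ(g) = πλΔf g² + (π/2) C_s λ³ g⁴, and omits astigmatism; sign conventions differ between texts, and the function names and numerical values are illustrative assumptions rather than anything taken from the paper.

```python
import cmath
import math


def wave_aberration(g, wavelength, defocus, cs):
    """Phase distortion chi(g) in radians (astigmatism omitted; SI units assumed)."""
    return (math.pi * wavelength * defocus * g ** 2
            + 0.5 * math.pi * cs * wavelength ** 3 * g ** 4)


def transfer_factor(g, wavelength, defocus, cs):
    """Pure-phase lens factor exp(-i*chi) under the assumed sign convention."""
    return cmath.exp(-1j * wave_aberration(g, wavelength, defocus, cs))


def exit_wave_component(image_wave_g, g, wavelength, defocus, cs):
    """Correct one Fourier component by dividing out the lens phase factor."""
    return image_wave_g / transfer_factor(g, wavelength, defocus, cs)


# Illustrative values only: 2.5 pm wavelength, 1 mm C_s, -60 nm defocus, g = 4 nm^-1.
PARAMS = dict(g=4.0e9, wavelength=2.5e-12, defocus=-60e-9, cs=1.0e-3)

true_exit = 0.8 + 0.3j                             # hypothetical exit-wave component
image_wave = true_exit * transfer_factor(**PARAMS)  # what the lens would deliver
recovered = exit_wave_component(image_wave, **PARAMS)
```

Because the factor has unit modulus, the division is numerically benign here; a real reconstruction would apply it to every frequency of the retrieved image wave and would also have to handle damping envelopes, which this sketch ignores.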

