table recognition
Recently Published Documents


Total documents: 35 (five years: 12)
H-index: 8 (five years: 2)

2021 ◽  
Vol 7 (5) ◽  
pp. 1170-1188
Author(s):  
Lyu Zhigang ◽  
Wang Hongxi ◽  
Li Liangliang ◽  
Wang Peng ◽  
Li Xiaoyan

Objectives: A large number of printed report documents from tobacco packaging contain irregularities such as discontinuous vertical lines, misplaced frame lines, and tables that span multiple pages, so existing table recognition algorithms cannot digitize them. To solve this problem, this paper proposes a table image processing algorithm based on dual-coding difference-of-Gaussians iterative clustering. First, a local regional sub-block method is used with a skew correction threshold to correct the image. Second, the corrected images are coded by rows and columns, which transforms 2D image features into 1D image features. Third, a difference-of-Gaussians operation is applied to obtain characteristic matrices that are stable and easy to distinguish, and iterative clustering analysis is then performed to obtain the feature values of the effective frame lines. Fourth, once table positioning, inner structure reconstruction, and text recognition are complete, a binary judgment of whether a multi-page table is complete is made from local pixel features. Finally, the text information inside the local regions and the reconstructed regions are merged to reproduce the multi-page tables digitally. To validate the proposed algorithm, an experiment was carried out on a sample set of 12,840 table images with different resolutions. The average detection accuracies of table positioning, table cell reconstruction, and multi-page incompleteness are 98.95%, 99.80%, and 95.85%, respectively. The experimental results show that the proposed algorithm is simple and effective, and can digitally reproduce irregular tables.
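The row-and-column coding and difference-of-Gaussians steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binarized image given as a list of rows (1 = ink), the function names and the sigma values are hypothetical, and a simple threshold on local maxima stands in for the paper's iterative clustering.

```python
import math


def projection_profiles(image):
    """Collapse a binary image (list of rows, 1 = ink) into 1D ink counts."""
    row_profile = [sum(row) for row in image]
    col_profile = [sum(col) for col in zip(*image)]
    return row_profile, col_profile


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian weights truncated at about three sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Gaussian smoothing; indices outside the signal are clamped to the border."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def difference_of_gaussians(signal, sigma_narrow=1.0, sigma_wide=3.0):
    """Narrow blur minus wide blur: flat background cancels, sharp peaks remain."""
    narrow = smooth(signal, sigma_narrow)
    wide = smooth(signal, sigma_wide)
    return [a - b for a, b in zip(narrow, wide)]


def line_candidates(profile, min_response):
    """Indices where the DoG response is a local maximum above min_response."""
    dog = difference_of_gaussians(profile)
    peaks = []
    for i, value in enumerate(dog):
        left = dog[i - 1] if i > 0 else float("-inf")
        right = dog[i + 1] if i < len(dog) - 1 else float("-inf")
        if value >= min_response and value > left and value >= right:
            peaks.append(i)
    return peaks


# Synthetic 20x20 table image: frame lines on rows 5, 14 and columns 3, 16.
image = [[1 if (r in (5, 14) or c in (3, 16)) else 0 for c in range(20)]
         for r in range(20)]
rows, cols = projection_profiles(image)
horizontal_lines = line_candidates(rows, min_response=1.0)
vertical_lines = line_candidates(cols, min_response=1.0)
```

Subtracting the wide blur from the narrow one cancels the constant ink contributed by the perpendicular lines, so only positions with a sharp jump in ink count survive as frame-line candidates.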


Author(s):  
Guibin Wu ◽  
Junjie Zhou ◽  
Yongping Xiong ◽  
Chaoyi Zhou ◽  
Chong Li

Abstract: Using deep learning networks to recognize tables has attracted considerable attention. However, the lack of high-quality table datasets limits the performance of such networks. We therefore propose TableRobot, an automatic annotation method for heterogeneous tables. Specifically, a table annotation consists of the coordinates of each item block and the mapping between item blocks and table cells. To solve this task, we design an algorithm based on the greedy approach to find the optimum solution. To evaluate TableRobot, we check the annotation data of 3000 tables collected from the LaTeX documents on arXiv.com, and the results show that TableRobot generates table annotation datasets with an accuracy of 93.2%. In addition, when the table annotation data is fed into GraphTSR, a state-of-the-art graph neural network for table recognition, the network's F1 value increases by nearly 10% compared with its earlier result.
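The abstract does not spell out the greedy algorithm, so the following is only an illustrative sketch of one way a greedy approach could map item blocks to table cells. Everything in it is an assumption rather than TableRobot's actual method: the `Box` type, the overlap-area score, and the one-block-per-cell rule are all hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle; (x0, y0) is top-left, (x1, y1) is bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float


def overlap_area(a, b):
    """Area of the intersection of two boxes (0 if they do not overlap)."""
    width = min(a.x1, b.x1) - max(a.x0, b.x0)
    height = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(width, 0.0) * max(height, 0.0)


def greedy_assign(blocks, cells):
    """Map block index -> cell index, always taking the largest overlap first.

    Assumption for this sketch: each cell holds at most one item block.
    """
    candidates = []
    for bi, block in enumerate(blocks):
        for ci, cell in enumerate(cells):
            area = overlap_area(block, cell)
            if area > 0:
                candidates.append((area, bi, ci))
    # Largest overlap first; ties broken by index to keep the result stable.
    candidates.sort(key=lambda item: (-item[0], item[1], item[2]))

    mapping = {}
    used_cells = set()
    for _, bi, ci in candidates:
        if bi in mapping or ci in used_cells:
            continue
        mapping[bi] = ci
        used_cells.add(ci)
    return mapping


# A 2x2 grid of 10x10 cells and three text blocks, one straddling a border.
cells = [Box(0, 0, 10, 10), Box(10, 0, 20, 10),
         Box(0, 10, 10, 20), Box(10, 10, 20, 20)]
blocks = [Box(1, 1, 8, 8), Box(9, 2, 18, 8), Box(2, 12, 7, 18)]
mapping = greedy_assign(blocks, cells)
```

The second block overlaps two cells; the greedy pass assigns it to the cell it overlaps most, which is the kind of local decision a greedy strategy makes without backtracking.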


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Khurram Azeem Hashmi ◽  
Marcus Liwicki ◽  
Didier Stricker ◽  
Muhammad Adnan Afzal ◽  
Muhammad Ahtsham Afzal ◽  
...  

2020 ◽  
Vol 17 (4) ◽  
pp. 3203-3223
Author(s):  
Qiaokang Liang ◽  
Jianzhong Peng ◽  
Zhengwei Li ◽  
Daqi Xie ◽  
...  

2019 ◽  
Vol 9 (19) ◽  
pp. 4162 ◽  
Author(s):  
Jin Zhang ◽  
Yanmiao Xie ◽  
Weilai Liu ◽  
Xiaoli Gong

Internet of Things (IoT) technology allows us to measure, compute, and make decisions about the physical world in a quantitative and intelligent way, and it has made all kinds of intelligent IoT devices popular. We are continually perceived and recorded by these devices, especially vision devices such as cameras and mobile phones. However, a series of security issues have arisen in recent years, of which sensitive data leakage is the most typical and harmful. Whether someone browses files in view of a high-definition (HD) security camera without realizing it, or an insider uses a mobile phone to photograph secret files, sensitive data is captured by intelligent IoT vision devices and the damage is irreparable. Although optical character recognition (OCR)-based packet filtering can reduce the risk of sensitive data spreading, it is difficult to apply to sensitive data presented in table form, because table images captured by intelligent IoT vision devices suffer from perspective transformation and from interference by circular stamps and irregular handwritten signatures. Therefore, this paper proposes a table recognition algorithm based on a directional connected chain to identify sensitive table data captured by intelligent IoT vision devices. First, a Directional Connected Chain (DCC) search algorithm is proposed for line detection. Then, valid lines are merged and invalid lines are removed from the detected DCCs to find the table frame and filter out the irregular interference. Finally, an inverse perspective transformation algorithm is used to restore the table after perspective transformation. Experiments show that the proposed algorithm achieves an accuracy of at least 92% and filters out stamp interference completely.
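The final restoration step mentioned in this abstract, inverse perspective transformation, can be sketched as a four-point homography. This is a generic textbook formulation, not the paper's code: it assumes the four corners of the table frame have already been detected, and the function names, corner coordinates, and target rectangle are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """3x3 matrix (bottom-right entry fixed to 1) mapping each src point to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(h, point):
    """Map one (x, y) point through the homography h."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical detected corners of a skewed table frame (clockwise from
# top-left) and the upright 100x50 rectangle they should be restored to.
detected = [(12, 8), (95, 20), (88, 70), (5, 60)]
upright = [(0, 0), (100, 0), (100, 50), (0, 50)]
h = homography(detected, upright)
restored = [apply_homography(h, p) for p in detected]
```

In a full pipeline, every pixel of the output image would be sampled through the inverse of this mapping; the sketch only shows how the matrix is obtained and applied to points.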


Author(s):  
Elvis Koci ◽  
Maik Thiele ◽  
Josephine Rehak ◽  
Oscar Romero ◽  
Wolfgang Lehner