Ensemble Classification System for Scientific Chart Recognition from PDF Files

Author(s):  
S. Nagarajan ◽  
V. Karthikeyani

Portable Document Format (PDF) is the most frequently used universal document format on the Internet and in e-publishing. The wide usage of PDF files has increased the need for conversion tools that convert PDF file content to text or HTML formats. A PDF converter can be categorized into two domains, namely, text recognition and graphics recognition. This paper focuses on graphics recognition, especially chart type identification, which is concerned with developing algorithms that can determine the type of a given chart image from a PDF file. In the proposed system, an enhanced connected component and statistical feature based method is first used to separate the chart region from other regions. The chart region is then analyzed and grouped as either a 2-dimensional or a 3-dimensional chart. After separating the graphic component from the text components, feature extraction is performed. The features can be grouped as object features, texture features and shape features. The combined feature vector is then classified using an ensemble classification system. Experimental results show that the chart separation, feature extraction and ensemble classification models significantly improve the quality of chart identification.
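The last step of the pipeline described above, classifying a combined feature vector with an ensemble, can be illustrated with a small sketch. The abstract does not say which base classifiers or which combination rule the authors use, so the majority vote below and the three toy threshold rules (`clf_a`, `clf_b`, `clf_c`) are assumptions made purely for illustration, as are the feature values and chart labels.

```python
from collections import Counter
from typing import Callable, List, Sequence

FeatureVector = List[float]
Classifier = Callable[[FeatureVector], str]


def combine_features(object_f: Sequence[float],
                     texture_f: Sequence[float],
                     shape_f: Sequence[float]) -> FeatureVector:
    """Concatenate the three feature groups into one combined vector."""
    return [*object_f, *texture_f, *shape_f]


def majority_vote(classifiers: Sequence[Classifier], x: FeatureVector) -> str:
    """Return the label predicted by most base classifiers.

    Ties go to the label that was voted for first, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical base classifiers: simple threshold rules standing in for
# trained models. They are NOT the classifiers used in the paper.
def clf_a(x: FeatureVector) -> str:
    return "bar" if x[0] > 0.5 else "pie"


def clf_b(x: FeatureVector) -> str:
    return "bar" if x[2] > 0.5 else "line"


def clf_c(x: FeatureVector) -> str:
    return "pie" if x[4] > 0.5 else "bar"


# Made-up feature values: two object, two texture, and one shape feature.
x = combine_features([0.9, 0.1], [0.7, 0.2], [0.8])
label = majority_vote([clf_a, clf_b, clf_c], x)
print(label)
```

With these toy inputs, two of the three rules vote for the same label, so the ensemble returns that label even though one rule disagrees. Weighted voting or stacking would fit the same interface.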

PLoS ONE ◽  
2013 ◽  
Vol 8 (9) ◽  
pp. e69446 ◽  
Author(s):  
David G. Barnes ◽  
Michail Vidiassov ◽  
Bernhard Ruthensteiner ◽  
Christopher J. Fluke ◽  
Michelle R. Quayle ◽  
...  

2021 ◽  
Vol 7 (3) ◽  
pp. 51
Author(s):  
Emanuela Paladini ◽  
Edoardo Vantaggiato ◽  
Fares Bougourzi ◽  
Cosimo Distante ◽  
Abdenour Hadid ◽  
...  

In recent years, automatic tissue phenotyping has attracted increasing interest in the Digital Pathology (DP) field. For Colorectal Cancer (CRC), tissue phenotyping can diagnose the cancer and differentiate between different cancer grades. The development of Whole Slide Images (WSIs) has provided the required data for creating automatic tissue phenotyping systems. In this paper, we study different hand-crafted feature-based and deep learning methods using two popular multiclass CRC-tissue-type databases: Kather-CRC-2016 and CRC-TP. For the hand-crafted features, we use two texture descriptors (LPQ and BSIF) and their combination. In addition, two classifiers (SVM and NN) are used to classify the texture features into distinct CRC tissue types. For the deep learning methods, we evaluate four Convolutional Neural Network (CNN) architectures (ResNet-101, ResNeXt-50, Inception-v3, and DenseNet-161). Moreover, we propose two Ensemble CNN approaches: Mean-Ensemble-CNN and NN-Ensemble-CNN. The experimental results show that the proposed approaches outperform the hand-crafted feature-based methods, the individual CNN architectures, and the state-of-the-art methods on both databases.
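The abstract names a Mean-Ensemble-CNN but does not spell out how it combines the networks. A minimal sketch, assuming it averages the per-class probabilities produced by the individual CNNs and picks the class with the highest mean, might look like this. The function names and the probability values are hypothetical; no real network is run here.

```python
from typing import List, Sequence


def mean_ensemble(prob_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class probabilities across models (one row per model)."""
    if not prob_sets:
        raise ValueError("at least one model output is required")
    n_classes = len(prob_sets[0])
    if any(len(p) != n_classes for p in prob_sets):
        raise ValueError("all models must score the same number of classes")
    n_models = len(prob_sets)
    return [sum(p[c] for p in prob_sets) / n_models for c in range(n_classes)]


def predict(prob_sets: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest mean probability."""
    mean = mean_ensemble(prob_sets)
    return max(range(len(mean)), key=mean.__getitem__)


# Made-up softmax outputs of four models for one image and three classes.
outputs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
    [0.3, 0.3, 0.4],
]
print(mean_ensemble(outputs), predict(outputs))
```

In this toy case the second and fourth models each prefer a different class, but the averaged scores still favour class 0. The NN-Ensemble-CNN mentioned in the abstract presumably replaces the plain average with a learned combiner, which is not sketched here.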


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Chun-Cheng Lin ◽  
Chun-Min Yang

This study developed an automatic heartbeat classification system for identifying normal beats, supraventricular ectopic beats, and ventricular ectopic beats based on normalized RR intervals and morphological features. The proposed heartbeat classification system consists of signal preprocessing, feature extraction, and linear discriminant classification. First, the signal preprocessing removed the high-frequency noise and baseline drift of the original ECG signal. Then the feature extraction derived the normalized RR intervals and two types of morphological features using wavelet analysis and linear prediction modeling. Finally, the linear discriminant classifier combined the extracted features to classify heartbeats. A total of 99,827 heartbeats obtained from the MIT-BIH Arrhythmia Database were divided into three datasets for the training and testing of the optimized heartbeat classification system. The study results demonstrate that the use of the normalized RR interval features greatly improves the positive predictive accuracy of identifying the normal heartbeats and the sensitivity for identifying the supraventricular ectopic heartbeats in comparison with the use of the nonnormalized RR interval features. In addition, the combination of the wavelet and linear prediction morphological features achieves higher global performance than using only the wavelet features or only the linear prediction features.
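To make the idea of a normalized RR interval concrete, the sketch below computes RR intervals from R-peak times and divides each one by the mean RR interval of the record. The abstract does not state the normalization the authors apply, so dividing by the record mean is an assumption, and the peak times are made-up values rather than data from the MIT-BIH Arrhythmia Database.

```python
from typing import List, Sequence


def rr_intervals(r_peaks_s: Sequence[float]) -> List[float]:
    """Time differences (seconds) between consecutive R peaks."""
    return [b - a for a, b in zip(r_peaks_s, r_peaks_s[1:])]


def normalize_rr(rr: Sequence[float]) -> List[float]:
    """Divide each RR interval by the mean RR interval of the record.

    Assumed normalization: it removes the dependence on the subject's
    average heart rate, so 1.0 means "a typical beat for this record".
    """
    if not rr:
        raise ValueError("at least one RR interval is required")
    mean_rr = sum(rr) / len(rr)
    return [value / mean_rr for value in rr]


# Made-up R-peak times in seconds; the fourth beat arrives early.
peaks = [0.0, 0.8, 1.6, 2.0, 3.2]
normalized = normalize_rr(rr_intervals(peaks))
print([round(v, 3) for v in normalized])
```

After normalization the early beat stands out as an interval well below 1.0 regardless of whether the subject's resting rate is fast or slow, which is the kind of rate-independent feature a linear discriminant classifier can use directly.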

