FADIT: Fast Document Image Thresholding

Algorithms ◽  
2020 ◽  
Vol 13 (2) ◽  
pp. 46
Author(s):  
Yufang Min ◽  
Yaonan Zhang

We propose a fast document image thresholding method (FADIT) and evaluate it against two classic methods to demonstrate its effectiveness. We put forward two assumptions: (1) the probabilities of occurrence of text and background grayscales are ideally two constants, and (2) a pixel with a low grayscale has a high probability of being classified as text, while a pixel with a high grayscale has a high probability of being classified as background. Under these two assumptions, a new criterion function is applied to document image thresholding in the Bayesian framework. The effectiveness of the method is borne out by a quantitative metric as well as qualitative comparisons with state-of-the-art methods.
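
The abstract does not spell out the criterion function, so the following is only a minimal sketch of a Bayesian threshold search under the paper's two assumptions: constant class priors, and class likelihoods that fall (text) or rise (background) with gray level. The function name and the linear likelihood model are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def fadit_like_threshold(gray):
    """Pick the threshold t that maximizes a Bayesian-style criterion.

    Assumption (1): P(text) and P(background) are constants (0.5 here).
    Assumption (2): P(g | text) decreases with gray level g and
                    P(g | background) increases with it (a linear model,
                    an illustrative choice, not the paper's).
    """
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()                     # empirical P(g)
    g = np.arange(256) / 255.0
    post_text = 0.5 * (1.0 - g)               # high for dark pixels
    post_back = 0.5 * g                       # high for bright pixels
    best_t, best_score = 0, -np.inf
    for t in range(1, 255):
        # expected posterior mass of correct classification at threshold t
        score = (p[:t] * post_text[:t]).sum() + (p[t:] * post_back[t:]).sum()
        if score > best_score:
            best_t, best_score = t, score
    return best_t

if __name__ == "__main__":
    # synthetic document: dark text pixels on a bright background
    img = np.concatenate([np.random.normal(60, 10, 5000),
                          np.random.normal(200, 15, 15000)]).clip(0, 255)
    print("threshold:", fadit_like_threshold(img))
```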

2017 ◽  
Vol 2 (1) ◽  
pp. 299-316 ◽  
Author(s):  
Cristina Pérez-Benito ◽  
Samuel Morillas ◽  
Cristina Jordán ◽  
J. Alberto Conejero

It is still a challenge to improve the efficiency and effectiveness of image denoising and enhancement methods. There exist denoising and enhancement methods that are able to improve the visual quality of images, usually by removing noise while sharpening details and improving edge contrast. Smoothing refers to the case of denoising in which the noise follows a Gaussian distribution. The two operations, smoothing noise and sharpening, are opposite in nature, so few approaches respond to both goals simultaneously. We review these methods, and we also provide a detailed study of the state-of-the-art methods that attack each problem in colour images separately.
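
One generic way to reconcile the two opposing operations the survey discusses is to denoise first and then unsharp-mask the already-denoised image, so the removed noise is not re-amplified. The sketch below is a minimal illustration of that idea, not any specific method reviewed; all parameter values are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_then_sharpen(img, noise_sigma=2.0, detail_sigma=1.0, amount=1.2):
    """Gaussian smoothing removes (Gaussian) noise; unsharp masking on
    the denoised result restores edge contrast. For colour images this
    would be applied per channel (or in a luminance channel)."""
    x = img.astype(float)
    denoised = gaussian_filter(x, sigma=noise_sigma)
    blurred = gaussian_filter(denoised, sigma=detail_sigma)
    edges = denoised - blurred            # detail layer of the clean image
    return np.clip(denoised + amount * edges, 0, 255)

noisy = np.clip(100 + 30 * np.random.default_rng(0).standard_normal((64, 64)),
                0, 255)
out = smooth_then_sharpen(noisy)
print(out.shape, out.min(), out.max())
```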


2017 ◽  
Vol 108 (1) ◽  
pp. 307-318 ◽  
Author(s):  
Eleftherios Avramidis

A deeper analysis of Comparative Quality Estimation is presented by extending the state-of-the-art methods with adequacy and grammatical features from other Quality Estimation tasks. The previously used linear method, unable to cope with the augmented features, is replaced with a boosting classifier assisted by feature selection. The resulting methods show improved performance for six language pairs when applied to the output of MT systems developed over seven years, and the improved models compete better with reference-aware metrics. Notable conclusions are reached by examining the contribution of the features in the models, and it is possible to identify common MT errors that the features capture. Many grammatical/fluency features contribute substantially, a few adequacy features contribute somewhat, whereas source-complexity features are of no use. The importance of many fluency and adequacy features is language-specific.
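
A minimal sketch of the "boosting classifier assisted by feature selection" setup, using scikit-learn stand-ins (the paper does not name a specific implementation, and the toy data below is purely illustrative): pairwise feature vectors for two candidate translations, with the label indicating which one is better.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline

# Toy stand-in data: each row holds fluency/adequacy/source features for a
# pair of candidate translations; y = 1 if the first candidate is better.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 80))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = make_pipeline(
    SelectKBest(f_classif, k=20),        # keep the 20 most informative features
    GradientBoostingClassifier(n_estimators=200),
)
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```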


2022 ◽  
Vol 134 ◽  
pp. 103548
Author(s):  
Bianca Caiazzo ◽  
Mario Di Nardo ◽  
Teresa Murino ◽  
Alberto Petrillo ◽  
Gianluca Piccirillo ◽  
...  

Sensors ◽  
2020 ◽  
Vol 20 (12) ◽  
pp. 3603
Author(s):  
Dasol Jeong ◽  
Hasil Park ◽  
Joongchol Shin ◽  
Donggoo Kang ◽  
Joonki Paik

Person re-identification (Re-ID) suffers from problems that make learning difficult, such as misalignment and occlusion. To solve these problems, it is important to focus on features that are robust to intra-class variation. Existing attention-based Re-ID methods focus only on common features without considering distinctive features. In this paper, we present a novel attentive learning-based Siamese network for person Re-ID. Unlike existing methods, we design an attention module and an attention loss that use the properties of the Siamese network to concentrate attention on both common and distinctive features. The attention module consists of channel attention, which selects important channels, and encoder-decoder attention, which observes the whole body shape. We modify the triplet loss into an attention loss, called the uniformity loss, which generates a unique attention map focusing on both common and discriminative features. Extensive experiments show that the proposed network compares favorably to state-of-the-art methods on three large-scale benchmarks: the Market-1501, CUHK03, and DukeMTMC-ReID datasets.
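
The channel-attention branch can be sketched with a squeeze-and-excitation-style module: pool each channel globally, pass the pooled vector through a small bottleneck MLP, and rescale the channels. This is an illustrative stand-in, not the authors' exact architecture; the encoder-decoder attention and uniformity loss are omitted.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: globally pool each channel, weight
    channels through a bottleneck MLP, and rescale the feature map."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                 # squeeze: global average pool
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                           # excite: reweight channels

feat = torch.randn(4, 256, 24, 8)              # a Re-ID backbone feature map
print(ChannelAttention(256)(feat).shape)       # torch.Size([4, 256, 24, 8])
```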


Author(s):  
Jianwen Jiang ◽  
Di Bao ◽  
Ziqiang Chen ◽  
Xibin Zhao ◽  
Yue Gao

3D shape retrieval has attracted much attention and has recently become a hot topic in computer vision. With the development of deep learning, 3D shape retrieval has made great progress, and many view-based methods have been introduced in recent years. However, how best to represent 3D shapes is still a challenging problem, and the intrinsic hierarchical associations among views have not been well utilized. To tackle these problems, we propose a multi-loop-view convolutional neural network (MLVCNN) framework for 3D shape retrieval. In this method, multiple groups of views are first extracted from different loop directions. Given these loop views, the proposed MLVCNN framework introduces a hierarchical view-loop-shape architecture, i.e., the view level, the loop level, and the shape level, to represent 3D shapes at different scales. At the view level, a convolutional neural network is trained to extract view features. Then, the proposed Loop Normalization and an LSTM are applied to each loop of views to generate loop-level features, which take into account the intrinsic associations of the different views in the same loop. Finally, all the loop-level descriptors are combined into a shape-level descriptor for 3D shape representation, which is used for retrieval. The proposed method has been evaluated on the public 3D shape benchmark ModelNet40. Experiments and comparisons show that MLVCNN achieves significant performance improvements on 3D shape retrieval, outperforming the state-of-the-art methods by 4.84% in mAP. We have also evaluated the proposed method on the 3D shape classification task, where MLVCNN likewise achieves superior performance compared with recent methods.
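
The view-loop-shape hierarchy can be sketched as follows, with the CNN backbone and the paper's Loop Normalization stubbed out: per-view features are run through an LSTM per loop, and the loop descriptors are pooled into one shape descriptor. All dimensions and the max-pooling choice are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoopAggregator(nn.Module):
    """Sketch of MLVCNN's view -> loop -> shape hierarchy. Per-view
    features (from any CNN backbone, precomputed here) are sequenced
    through an LSTM per loop; loop descriptors are max-pooled into a
    single shape-level descriptor used for retrieval."""

    def __init__(self, view_dim=512, loop_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(view_dim, loop_dim, batch_first=True)

    def forward(self, views):                  # (B, n_loops, n_views, view_dim)
        b, nl, nv, d = views.shape
        seq = views.reshape(b * nl, nv, d)
        _, (h, _) = self.lstm(seq)             # last hidden state per loop
        loops = h[-1].reshape(b, nl, -1)       # loop-level descriptors
        return loops.max(dim=1).values         # shape-level descriptor

x = torch.randn(2, 3, 12, 512)                 # 3 loops of 12 views each
print(LoopAggregator()(x).shape)               # torch.Size([2, 256])
```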


2020 ◽  
Vol 10 (14) ◽  
pp. 4791 ◽  
Author(s):  
Pedro Narváez ◽  
Steven Gutierrez ◽  
Winston S. Percybrooks

A system for the automatic classification of cardiac sounds can be of great help for doctors in the diagnosis of cardiac diseases. Generally speaking, the main stages of such systems are (i) pre-processing of the heart sound signal, (ii) segmentation of the cardiac cycles, (iii) feature extraction and (iv) classification. In this paper, we propose methods for each of these stages. The modified empirical wavelet transform (EWT) and the normalized Shannon average energy are used in pre-processing and automatic segmentation to identify the systolic and diastolic intervals in a heart sound recording; then, six power features are extracted (three for the systole and three for the diastole). The motivation behind using power features is to achieve a low computational cost that facilitates eventual real-time implementations. Finally, different machine learning models (support vector machine (SVM), k-nearest neighbor (KNN), random forest and multilayer perceptron) are compared to determine the classifier with the best performance. The automatic segmentation method was tested with the heart sounds from the Pascal Challenge database. The results indicated an error (computed as the sum of the differences between the manual segmentation labels from the database and the segmentation labels obtained by the proposed algorithm) of 843,440.8 for dataset A and 17,074.1 for dataset B, which are better values than those reported for state-of-the-art methods. For automatic classification, 805 sample recordings from different databases were used. The best accuracy result was 99.26% using the KNN classifier, with a specificity of 100% and a sensitivity of 98.57%. These results compare favorably with similar works using state-of-the-art methods.
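
The normalized Shannon average energy is a standard envelope for locating S1/S2 boundaries; a minimal sketch of it, plus a KNN classifier over six power features, is shown below. The EWT pre-filtering and the actual power-feature extraction are omitted, and the stand-in data is random, so the numbers printed are meaningless beyond demonstrating the pipeline shape.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def shannon_average_energy(x, frame=400, eps=1e-12):
    """Normalized average Shannon energy envelope of a heart-sound
    signal, computed over non-overlapping frames. The paper's EWT
    pre-filtering step is omitted here."""
    x = x / (np.max(np.abs(x)) + eps)                 # amplitude-normalize
    n = len(x) // frame
    frames = x[: n * frame].reshape(n, frame)
    e = -np.mean(frames**2 * np.log(frames**2 + eps), axis=1)
    return (e - e.mean()) / (e.std() + eps)           # zero-mean, unit-var

# Six power features per recording (three systolic, three diastolic) would
# come from the segmented intervals; random stand-ins are used here.
rng = np.random.default_rng(1)
X = rng.normal(size=(805, 6))                         # 805 recordings
y = rng.integers(0, 2, size=805)                      # normal vs. pathological
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print("toy accuracy:", clf.score(X, y))
```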


2020 ◽  
Vol 34 (07) ◽  
pp. 11053-11060
Author(s):  
Linjiang Huang ◽  
Yan Huang ◽  
Wanli Ouyang ◽  
Liang Wang

In this paper, we propose a weakly supervised temporal action localization method for untrimmed videos based on prototypical networks. We observe two challenges posed by weak supervision, namely action-background separation and action-relation construction. Unlike previous methods, we propose to achieve action-background separation using only the original videos. To this end, a clustering loss is adopted to separate actions from backgrounds and to learn intra-compact features, which helps in detecting complete action instances; in addition, a similarity weighting module is devised to further separate actions from backgrounds. To identify actions effectively, we propose to construct relations among actions for prototype learning, introducing a GCN-based prototype embedding module to generate relational prototypes. Experiments on the THUMOS14 and ActivityNet1.2 datasets show that our method outperforms the state-of-the-art methods.
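
The core prototype idea can be sketched as scoring snippet features against per-class prototypes and pulling each feature toward its class prototype. The temperature-scaled cross-entropy below is a simplified stand-in for the paper's clustering loss, not its exact formulation; the GCN prototype embedding is omitted.

```python
import torch
import torch.nn.functional as F

def clustering_loss(features, prototypes, labels, temperature=0.1):
    """Pull each snippet feature toward the prototype of its class
    (including a background prototype) and away from the others.

    features:   (N, D) snippet features
    prototypes: (C, D) one prototype per action class + background
    labels:     (N,)   class index per snippet
    """
    f = F.normalize(features, dim=1)
    p = F.normalize(prototypes, dim=1)
    sim = f @ p.t()                                  # cosine similarities (N, C)
    return F.cross_entropy(sim / temperature, labels)

feats = torch.randn(32, 128)
protos = torch.randn(21, 128, requires_grad=True)    # 20 actions + background
labels = torch.randint(0, 21, (32,))
print(clustering_loss(feats, protos, labels).item())
```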


Electronics ◽  
2018 ◽  
Vol 7 (10) ◽  
pp. 258 ◽  
Author(s):  
Abdus Hassan ◽  
Umar Afzaal ◽  
Tooba Arifeen ◽  
Jeong Lee

Recently, concurrent error detection enabled through invariant relationships between different wires in a circuit has been proposed. Because there are many such implications in a circuit, selection strategies have been developed to select the most valuable implications for inclusion in the checker hardware, such that a sufficiently high probability of error detection ($P_{detection}$) is achieved. These algorithms, however, due to their heuristic nature, cannot guarantee a lossless $P_{detection}$. In this paper, we develop a new input-aware implication selection algorithm, aided by ATPG, which minimizes the loss in $P_{detection}$. In our algorithm, the detectability of errors for each candidate implication is carefully evaluated using error-prone vectors, and the evaluation results are then used to select the most efficient candidates for achieving optimal $P_{detection}$. Experimental results on 15 representative combinational benchmark circuits from the MCNC benchmark suite show that the implications selected by our algorithm achieve better $P_{detection}$ than the state of the art. The proposed method also performs better, by up to 41.10%, in terms of the proposed impact-level metric, which is the ratio of achieved $P_{detection}$ to the implication count.
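
To make the selection objective concrete, here is a greedy sketch in which each candidate implication is modeled as the set of error-prone vectors on which it detects an error, and we repeatedly add the implication covering the most still-undetected vectors. This is a simplified coverage model for illustration, not the paper's ATPG-based flow; the final lines compute the paper's impact-level ratio on toy data.

```python
def select_implications(candidates, budget):
    """Greedy input-aware selection over a toy coverage model.

    candidates: dict name -> set of detected error-vector ids
    budget:     maximum number of implications to include in the checker
    """
    detected, chosen = set(), []
    for _ in range(budget):
        name = max(candidates, key=lambda c: len(candidates[c] - detected))
        gain = candidates[name] - detected
        if not gain:                       # nothing new is detectable
            break
        chosen.append(name)
        detected |= gain
    return chosen, detected

cands = {"i1": {1, 2, 3}, "i2": {3, 4}, "i3": {5}, "i4": {1, 4, 5, 6}}
total_vectors = 8                          # error-prone vectors from ATPG
sel, det = select_implications(cands, budget=2)
p_detection = len(det) / total_vectors
print(sel, "P_detection=%.2f" % p_detection,
      "impact=%.2f" % (p_detection / len(sel)))   # P_detection per implication
```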


Author(s):  
Gaetano Rossiello ◽  
Alfio Gliozzo ◽  
Michael Glass

We propose a novel approach to learning representations of relations expressed by their textual mentions. Our assumption is that if two pairs of entities belong to the same relation, then those two pairs are analogous. We collect a large set of analogous pairs by matching triples in knowledge bases with web-scale corpora through distant supervision. This dataset is used to train a hierarchical siamese network that learns entity-entity embeddings encoding relational information across the different linguistic paraphrases expressing the same relation. The model can be used to generate pre-trained embeddings that provide a valuable signal when integrated into an existing neural-based model, outperforming the state-of-the-art methods on a relation extraction task.
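
The siamese training objective can be sketched as a contrastive loss over entity-pair embeddings: pairs labeled analogous by distant supervision are pulled together, others pushed apart by a margin. The flat MLP encoder below is a minimal stand-in for the paper's hierarchical siamese encoder; all dimensions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairEncoder(nn.Module):
    """Encode an (entity, entity) pair into one relational embedding.
    A minimal stand-in for the paper's hierarchical siamese encoder."""

    def __init__(self, ent_dim=100, out_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * ent_dim, 128),
                                 nn.ReLU(), nn.Linear(128, out_dim))

    def forward(self, e1, e2):
        return self.mlp(torch.cat([e1, e2], dim=-1))

def analogy_loss(enc, pair_a, pair_b, analogous, margin=1.0):
    """Contrastive loss: analogous pairs (same relation, found via
    distant supervision) are pulled together, others pushed apart."""
    za, zb = enc(*pair_a), enc(*pair_b)
    d = F.pairwise_distance(za, zb)
    return torch.where(analogous, d**2, F.relu(margin - d)**2).mean()

enc = PairEncoder()
e = lambda: torch.randn(8, 100)                # toy entity embeddings
loss = analogy_loss(enc, (e(), e()), (e(), e()),
                    analogous=torch.rand(8) > 0.5)
print(loss.item())
```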

