Digital document analytics using logistic regressive and deep transition-based dependency parsing

Author(s):  
D. Rekha ◽  
J. Sangeetha ◽  
V. Ramaswamy
Author(s):  
Qinyuan Xiang ◽  
Weijiang Li ◽  
Hui Deng ◽  
Feng Wang

Author(s):  
Elaine G. Toms ◽  
D. Grant Campbell

Documents have conventions which have evolved within discourse communities and which facilitate document use. These conventions are represented in a document by visual cues that define a shape and serve as an interface metaphor in a user's interaction with a digital document. In this paper we report on the results of two studies, one of which examined the impact of . . .


Author(s):  
Cunli Mao ◽  
Zhibo Man ◽  
Zhengtao Yu ◽  
Zhenhan Wang ◽  
Shengxiang Gao ◽  
...  

Author(s):  
Shumin Shi ◽  
Dan Luo ◽  
Xing Wu ◽  
Congjun Long ◽  
Heyan Huang

Dependency parsing is an important task in Natural Language Processing (NLP). However, a mature parser requires a large treebank for training, which is still extremely costly to create. Tibetan is an extremely low-resource language for NLP: no Tibetan dependency treebank is available, and such treebanks are currently obtained by manual annotation. Furthermore, there is little related research on treebank construction. We propose a novel method of multi-level chunk-based syntactic parsing to perform constituent-to-dependency treebank conversion for Tibetan under scarce-resource conditions. Our method mines more dependencies from Tibetan sentences, builds a high-quality Tibetan dependency tree corpus, and makes fuller use of the inherent regularities of the language itself. We train dependency parsing models on the dependency treebank obtained by the preliminary transformation. The model achieves 86.5% accuracy, 96% LAS, and 97.85% UAS, which exceeds the best results of existing conversion methods. The experimental results show that our method works in a low-resource setting: it not only addresses the scarcity of Tibetan dependency treebanks but also avoids needless manual annotation. The method embodies the regularity of strongly knowledge-guided linguistic analysis, which is of great significance for promoting research on Tibetan information processing.
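The abstract above reports LAS and UAS without defining them. As a minimal sketch, assuming the standard definitions (UAS is the share of tokens whose predicted head is correct; LAS additionally requires the dependency label to match), the two scores can be computed as below. The function name, the toy arcs, and the labels are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Tuple

# One arc per token: (index of the head token, dependency label).
# Head index 0 stands for the artificial root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for one sentence or a flattened corpus."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")
    # UAS counts tokens whose head is right, regardless of the label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens whose head and label are both right.
    labeled_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return head_hits / n, labeled_hits / n


# Hypothetical four-token sentence: token 3 gets the wrong head,
# token 4 gets the right head but the wrong label.
gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "det"), (2, "iobj")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```

Because every labeled hit is also a head hit, LAS can never exceed UAS, which is consistent with the 96% LAS and 97.85% UAS quoted in the abstract.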


2009 ◽  
Vol E92-D (10) ◽  
pp. 2122-2136 ◽  
Author(s):  
Sutee SUDPRASERT ◽  
Asanee KAWTRAKUL ◽  
Christian BOITET ◽  
Vincent BERMENT

2013 ◽  
Vol 321-324 ◽  
pp. 2609-2612
Author(s):  
Yan Liang ◽  
Gao Yan ◽  
Chun Xia Qi

Digital watermarking has been proposed as a solution to the problem of copyright protection of multimedia data in a networked environment. It makes it possible to tightly associate with a digital document a code that identifies the data creator, owner, authorized consumer, and so on. In this paper, a new DCT-domain watermarking algorithm for digital images is presented: the method, which operates in the frequency domain, embeds a pseudo-random sequence of a scrambled image in a selected set of DCT coefficients. After embedding, the watermark is adapted to the image by exploiting the masking characteristics of the human visual system, thus ensuring the invisibility of the watermark. By exploiting the statistical properties of the embedded sequence, the mark can be reliably extracted without resorting to the original uncorrupted image. Experimental results demonstrate that the watermark is robust to several signal processing operations, including JPEG compression, cropping, blurring, addition of noise, and sharpening.
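The general idea of embedding a pseudo-random sequence in selected DCT coefficients and detecting it without the original can be sketched as follows. This is not the paper's algorithm: it works on a one-dimensional signal instead of an image, omits the scrambling and visual-masking steps, and assumes a multiplicative embedding rule with a correlation detector. All names and parameter values (`make_mark`, `embed`, `detect`, `start`, `alpha`, the key) are hypothetical.

```python
import math
import random
from typing import List


def dct(x: List[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D signal."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out


def idct(c: List[float]) -> List[float]:
    """Inverse of the orthonormal DCT-II above."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def make_mark(key: int, length: int) -> List[float]:
    """Key-seeded pseudo-random sequence of +1/-1 values."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed(signal: List[float], mark: List[float], start: int, alpha: float) -> List[float]:
    """Add alpha * |c| * w to the coefficients start .. start + len(mark) - 1."""
    c = dct(signal)
    for j, w in enumerate(mark):
        c[start + j] += alpha * abs(c[start + j]) * w
    return idct(c)


def detect(signal: List[float], mark: List[float], start: int) -> float:
    """Blind detector: mean correlation of the selected coefficients with the mark."""
    c = dct(signal)
    return sum(c[start + j] * w for j, w in enumerate(mark)) / len(mark)


# Toy host signal standing in for one row of image data.
signal = [100 + 50 * math.sin(0.3 * i) + 10 * ((i * 7) % 5) for i in range(32)]
mark = make_mark(key=1234, length=8)
marked = embed(signal, mark, start=4, alpha=0.2)
print(detect(signal, mark, start=4), detect(marked, mark, start=4))
```

With this embedding rule, the detector response rises by exactly alpha times the mean magnitude of the selected coefficients, so a decision threshold can be set from the received signal alone. A real system would still have to choose that threshold and the coefficient range with the expected attacks in mind.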

