Zero Based Research on Text Watermark Algorithm

2013 ◽  
Vol 717 ◽  
pp. 844-848
Author(s):  
Qing Jun Wang ◽  
Shou Jin Wang

Earlier work offered few algorithms for hiding a watermark in the semantic meaning or the formatting of text. This paper presents a new zero-watermark algorithm based on English text; the algorithm is suitable for Chinese text as well. First, the algorithm determines the textual characteristics from the punctuation marks of the text. It then adds the watermark information to the host by XORing the cryptographic watermark with the textual characteristics. The algorithm offers good concealment and strong robustness.
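The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general zero-watermark idea, assuming the combining step is a bitwise XOR. The feature function (a SHA-256 hash of the punctuation sequence) and the names `punctuation_features`, `make_key` and `recover_watermark` are hypothetical stand-ins, not the authors' method. In zero-watermark schemes generally, the host text is left unmodified and the derived key is stored separately.

```python
import hashlib

PUNCTUATION = ".,;:!?"  # assumed set of marks; the paper's actual set is not given


def punctuation_features(text, n_bits):
    """Derive n_bits feature bits from the text's punctuation sequence.

    Hypothetical feature: hash the ordered punctuation marks and unpack bits.
    """
    marks = "".join(ch for ch in text if ch in PUNCTUATION)
    digest = hashlib.sha256(marks.encode("utf-8")).digest()
    bits = [(byte >> shift) & 1 for byte in digest for shift in range(8)]
    # Reuse the 256 hash bits cyclically if more bits are requested.
    return [bits[i % len(bits)] for i in range(n_bits)]


def make_key(text, watermark_bits):
    """XOR the watermark with the text features; the text itself is untouched."""
    features = punctuation_features(text, len(watermark_bits))
    return [f ^ w for f, w in zip(features, watermark_bits)]


def recover_watermark(text, key):
    """XOR the stored key with features re-derived from the (possibly edited) text."""
    features = punctuation_features(text, len(key))
    return [f ^ k for f, k in zip(features, key)]


if __name__ == "__main__":
    host = "Hello, world! Is this marked? Yes; it is."
    watermark = [1, 0, 1, 1, 0, 0, 1, 0]
    key = make_key(host, watermark)
    print(recover_watermark(host, key) == watermark)
```

In this sketch the watermark survives any edit that preserves the punctuation sequence, and is lost as soon as a punctuation mark is added, removed or reordered. A practical scheme would need a more tolerant feature.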

2013 ◽  
Vol 12 (11) ◽  
pp. 2130-2137
Author(s):  
Xingming Sun ◽  
Shufang Wang ◽  
Zhihua Xia ◽  
Xinhui Wang

2012 ◽  
Vol 98 (1) ◽  
pp. 75-85 ◽  
Author(s):  
Jiří Maršík ◽  
Ondřej Bojar

TrTok: A Fast and Trainable Tokenizer for Natural Languages

We present a universal data-driven tool for segmenting and tokenizing text. The presented tokenizer lets the user define where token and sentence boundaries should be considered. These instances are then judged by a classifier which is trained from provided tokenized data. The features passed to the classifier are also defined by the user, making, e.g., the inclusion of abbreviation lists trivial. This level of customizability makes the tokenizer a versatile tool which we show is capable of sentence detection in English text as well as word segmentation in Chinese text. In the case of English sentence detection, the system outperforms previous methods. The software is available as an open-source project on GitHub.
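The pipeline described above (user-defined candidate boundaries, user-defined features, a classifier that judges each candidate) can be illustrated with a small sketch. This is not TrTok's API: the function names, the abbreviation list and the hand-written rule in `is_boundary`, which stands in for a classifier trained on tokenized data, are all assumptions made for illustration.

```python
import re

# Hypothetical user-supplied abbreviation list, used as a classifier feature.
ABBREVIATIONS = {"dr.", "mr.", "e.g.", "etc."}


def features(text, pos):
    """Features for the candidate sentence boundary after text[pos]."""
    before = text[:pos + 1].split()
    prev_token = before[-1].lower() if before else ""
    rest = text[pos + 1:].lstrip()
    return {
        "prev_is_abbreviation": prev_token in ABBREVIATIONS,
        "next_is_upper": bool(rest) and rest[0].isupper(),
    }


def is_boundary(feats):
    """Stand-in for a trained classifier: a fixed rule over the features."""
    return feats["next_is_upper"] and not feats["prev_is_abbreviation"]


def split_sentences(text):
    """Split text at the candidate boundaries that the classifier accepts."""
    sentences, start = [], 0
    # Candidates: sentence-final punctuation followed by whitespace.
    for match in re.finditer(r"[.!?](?=\s)", text):
        pos = match.start()
        if is_boundary(features(text, pos)):
            sentences.append(text[start:pos + 1].strip())
            start = pos + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    print(split_sentences("Dr. Smith arrived. He sat down."))
    # prints ['Dr. Smith arrived.', 'He sat down.']
```

Replacing `is_boundary` with a model trained on labeled boundaries is the step that would make such a tokenizer data-driven.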


Interpreting ◽  
2021 ◽  
Author(s):  
Chao Han ◽  
Rui Xiao ◽  
Wei Su

The study reported on in this article pertains to rater-mediated assessment of English-to-Chinese consecutive interpreting, particularly informational correspondence between an originally intended message and an actually rendered message, also known as “fidelity” in Interpreting Studies. Previous literature has documented two main methods to assess fidelity: comparing actual renditions with the source text or with an exemplar rendition carefully prepared by experts (i.e., an ideal target text). However, little is known about the potential effects of these methods on fidelity assessment. We therefore conducted the study to explore the way in which these methods would affect rater reliability, fidelity ratings and rater perception. Our analysis of quantitative data shows that the raters tended to be less reliable, less self-consistent, less lenient and less comfortable when using the source English text (i.e., Condition A) than when using the target Chinese text (i.e., Condition B: the exemplar rendition). These findings were backed up and explained by emerging themes derived from the qualitative questionnaire data. The fidelity estimates in the two conditions were also found to be strongly correlated. We discuss these findings and entertain the possibility of recruiting untrained monolinguals or bilinguals to assess fidelity of interpreting.


2013 ◽  
Vol 444-445 ◽  
pp. 1713-1717
Author(s):  
Jing Jing Jiang ◽  
Xiao Yu Wang ◽  
Xiang Wei Mu ◽  
Jia Xing Hu ◽  
You Qin Zhu

Text mining is the task of automatically discovering new, previously unknown information from unstructured document collections. The vector space, or bag-of-words, representation is one of the mainstream descriptions of text: each document is a data point in a high-dimensional space, and the order of words is ignored. Generative models are probabilistic representations of data that can be regarded as generators of the observed data. Because they are probabilistic modelling approaches, a set of methods and criteria is available for the estimation, inference, comparison and selection of generative models. In this paper, we review several existing probabilistic models that are commonly applied to discrete exchangeable collections in English text. We hope this will shed some light on Chinese text modelling and mining tasks.
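As a small illustration of the bag-of-words representation mentioned above, the sketch below maps each document to a vector of word counts over a shared vocabulary. The whitespace tokenization and the function name `bag_of_words` are simplifying assumptions; Chinese text would first need word segmentation.

```python
from collections import Counter


def bag_of_words(documents):
    """Return (vocabulary, vectors): one count vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for doc in tokenized for token in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[token] for token in vocabulary])
    return vocabulary, vectors


if __name__ == "__main__":
    vocab, vecs = bag_of_words(["the cat sat", "sat the cat", "the dog"])
    print(vocab)  # prints ['cat', 'dog', 'sat', 'the']
    print(vecs)   # prints [[1, 0, 1, 1], [1, 0, 1, 1], [0, 1, 0, 1]]
```

The first two documents receive identical vectors, which is exactly the loss of word order that the representation implies.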


2013 ◽  
Vol 380-384 ◽  
pp. 2854-2857
Author(s):  
Hu Li ◽  
Peng Zou ◽  
Wei Hong Han

The information explosion poses many challenges for text classification. The resulting dimension disaster sharply increases computational complexity and lowers classification accuracy. It is therefore critical to apply feature selection techniques before the actual classification. Automatic classification of English text has been researched for many years, but Chinese text has received little attention. In this paper, several classic feature selection methods, namely term frequency (TF), information gain (IG) and the chi-square statistic (CHI), are compared on Chinese text classification, and imbalanced data is also taken into consideration. Experimental results show that CHI performs better than IG and TF when the dataset is imbalanced, but that there is no obvious difference on balanced data.
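To make the CHI criterion concrete, here is a minimal sketch of the standard chi-square score of a term for a class, computed from a 2x2 contingency table of document counts. The toy documents, labels and the function name `chi_square` are invented for illustration and are not the paper's data or code.

```python
def chi_square(docs, labels, term, target):
    """Chi-square score of `term` for class `target`.

    docs is a list of token sets; labels is the parallel list of class names.
    a: in class, has term      b: not in class, has term
    c: in class, lacks term    d: not in class, lacks term
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # term or class is absent or universal: no evidence either way
    return n * (a * d - c * b) ** 2 / denominator


if __name__ == "__main__":
    docs = [{"goal", "match"}, {"goal", "team"}, {"stock", "market"}, {"stock", "team"}]
    labels = ["sport", "sport", "finance", "finance"]
    print(chi_square(docs, labels, "goal", "sport"))  # prints 4.0
    print(chi_square(docs, labels, "team", "sport"))  # prints 0.0
```

Terms are then ranked by score and the top-scoring ones kept as features; "goal" separates the two toy classes perfectly, while "team" occurs equally in both and scores zero.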


2020 ◽  
pp. 1-12
Author(s):  
Li Dongmei

English text-to-speech conversion is a key topic in modern computer technology research. Its difficulty is that text-to-speech feature recognition introduces large errors during conversion, which makes it hard to apply English text-to-speech conversion algorithms in a working system. To improve the efficiency of English text-to-speech conversion, this article applies machine learning: after the original voice waveform is labeled with the pitch, the rhythm is modified through PSOLA, and the C4.5 algorithm is used to train a decision tree for judging the pronunciation of polyphones. To evaluate the performance of the pronunciation discrimination method based on part-of-speech rules and of HMM-based prosody hierarchy prediction in speech synthesis systems, this study constructs a system model. In addition, the waveform stitching method and PSOLA are used to synthesize the sound. For words whose main stress cannot be determined from morphological structure, labels can be learned by machine learning methods. Finally, this study evaluates and analyzes the performance of the algorithm through controlled experiments. The results show that the proposed algorithm performs well and has a certain practical effect.
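C4.5 chooses decision-tree splits by gain ratio, that is, information gain divided by the entropy of the split itself. The sketch below computes that criterion for a single categorical feature. The feature (a tag for the preceding word) and the toy labels are hypothetical and not taken from the article, and a full C4.5 implementation also handles continuous attributes, missing values and pruning.

```python
import math
from collections import Counter, defaultdict


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the categorical feature `values`."""
    total = len(labels)
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    # Expected label entropy remaining after the split.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(values)
    return 0.0 if split_info == 0 else gain / split_info


if __name__ == "__main__":
    # Toy data: pronunciation label of an ambiguous word in four contexts.
    pronunciation = ["noun_stress", "noun_stress", "verb_stress", "verb_stress"]
    previous_tag = ["det", "det", "to", "to"]      # predicts the label exactly
    sentence_half = ["first", "second", "first", "second"]  # uninformative
    print(gain_ratio(previous_tag, pronunciation))
    print(gain_ratio(sentence_half, pronunciation))
```

A tree builder would compute this ratio for every candidate feature at a node and split on the one with the highest value.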

