Zero Based Research on Text Watermark Algorithm

Advanced Materials Research, Vol 717, pp. 844-848, 2013

DOI: 10.4028/www.scientific.net/amr.717.844

Author(s): Qing Jun Wang, Shou Jin Wang

Keyword(s): Chinese Text, English Text, Semantic Meaning, Punctuation Mark, Strong Robustness, Watermark Algorithm

Previously, few algorithms hid a watermark in the semantic meaning or the formatting of a text. This paper presents a new zero-watermark algorithm for English text that is also applicable to Chinese text. First, the algorithm extracts textual features from the punctuation marks of the text. It then adds the watermark information to the host by XORing the encrypted watermark with these textual features. The algorithm offers good concealment and strong robustness.
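The abstract only outlines the scheme, so what follows is a minimal Python sketch of a punctuation-based zero-watermark under stated assumptions, not the authors' method. The feature rule (1 for a sentence-ending mark, 0 for any other punctuation mark), the SHA-256 counter-mode keystream used to encrypt the watermark, the combining step being a bitwise XOR, and all function names are hypothetical. The host text is never modified: the XOR of the encrypted watermark with the text's punctuation features is kept as a key (in zero-watermark schemes this key is typically registered with a trusted third party), and the watermark is recovered by XORing that key with the features of a suspect text.

```python
import hashlib

# Hypothetical feature rule: sentence-ending marks map to 1, other punctuation to 0.
SENTENCE_END = set(".!?。！？")
PUNCTUATION = SENTENCE_END | set(",;:，；：、")


def punctuation_features(text: str, n_bits: int) -> list[int]:
    """Bit sequence derived from the text's punctuation, repeated cyclically to n_bits."""
    marks = [1 if ch in SENTENCE_END else 0 for ch in text if ch in PUNCTUATION]
    if not marks:
        raise ValueError("text contains no punctuation marks")
    return [marks[i % len(marks)] for i in range(n_bits)]


def keystream(secret: bytes, n_bits: int) -> list[int]:
    """Pseudo-random bits from SHA-256 in counter mode (assumed encryption step)."""
    bits: list[int] = []
    counter = 0
    while len(bits) < n_bits:
        block = hashlib.sha256(secret + counter.to_bytes(4, "big")).digest()
        bits.extend((byte >> (7 - k)) & 1 for byte in block for k in range(8))
        counter += 1
    return bits[:n_bits]


def xor(a: list[int], b: list[int]) -> list[int]:
    return [x ^ y for x, y in zip(a, b)]


def make_zero_watermark(text: str, watermark: list[int], secret: bytes) -> list[int]:
    """Build the key to register; the host text itself is left unchanged."""
    encrypted = xor(watermark, keystream(secret, len(watermark)))
    return xor(encrypted, punctuation_features(text, len(watermark)))


def extract_watermark(text: str, key: list[int], secret: bytes) -> list[int]:
    """Recover the watermark from a suspect text and the registered key."""
    encrypted = xor(key, punctuation_features(text, len(key)))
    return xor(encrypted, keystream(secret, len(key)))
```

Because the features depend only on the sequence of punctuation marks, rewording that leaves the punctuation intact still yields the original watermark, while edits that alter the punctuation sequence will generally corrupt it.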


Skeleton-based Chinese Text Image Watermark Algorithm Robust to Printing and Scanning

Information Technology Journal, Vol 12 (11), pp. 2130-2137, 2013

DOI: 10.3923/itj.2013.2130.2137

Author(s): Xingming Sun, Shufang Wang, Zhihua Xia, Xinhui Wang

Keyword(s): Chinese Text, Image Watermark, Watermark Algorithm


A Chinese text watermark algorithm based on polyphone

Proceedings of 2011 Cross Strait Quad-Regional Radio Science and Wireless Technology Conference, 2011

DOI: 10.1109/csqrwc.2011.6037180

Author(s): Wenbin Fei, Xianghong Tang

Keyword(s): Chinese Text, Watermark Algorithm


A Chinese text classification model based on vector space and semantic meaning

Proceedings of 2004 International Conference on Machine Learning and Cybernetics (IEEE Cat. No.04EX826), 2005

DOI: 10.1109/icmlc.2004.1382361

Author(s): Bao-Yi Wang, Shao-Min Zhang

Keyword(s): Vector Space, Chinese Text, Text Classification, Classification Model, Semantic Meaning, Chinese Text Classification, Model Based


TrTok: A Fast and Trainable Tokenizer for Natural Languages

Prague Bulletin of Mathematical Linguistics, Vol 98 (1), pp. 75-85, 2012

DOI: 10.2478/v10108-012-0010-0

Cited by: ~3

Author(s): Jiří Maršík, Ondřej Bojar

Keyword(s): Open Source, Chinese Text, Word Segmentation, Data Driven, English Text, Natural Languages, English Sentence, Open Source Project, Versatile Tool

We present a universal data-driven tool for segmenting and tokenizing text. The tokenizer lets the user define where token and sentence boundaries should be considered. These candidate boundaries are then judged by a classifier trained on provided tokenized data. The features passed to the classifier are also user-defined, which makes, e.g., including abbreviation lists trivial. This level of customizability makes the tokenizer a versatile tool, which we show is capable of both sentence detection in English text and word segmentation in Chinese text. For English sentence detection, the system outperforms previous methods. The software is available as an open-source project on GitHub.
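To make the idea of user-defined candidate boundaries judged by a trained classifier concrete, here is a small Python sketch of sentence detection. It is not TrTok's implementation or configuration format: the candidate regex, the abbreviation list, the three features, and the perceptron classifier are all assumptions chosen for illustration.

```python
import re
from typing import Callable

# Assumed candidate rule: '.', '!' or '?' followed by whitespace or end of text.
CANDIDATE = re.compile(r"[.!?](?=\s|$)")
ABBREVIATIONS = {"Dr", "Mr", "e.g", "i.e", "etc"}  # hypothetical abbreviation list


def prev_word(text: str, i: int) -> str:
    m = re.search(r"(\S+)$", text[:i])
    return m.group(1) if m else ""


# Hypothetical user-defined features of a candidate at index i.
FEATURES: list[Callable[[str, int], bool]] = [
    lambda t, i: prev_word(t, i) in ABBREVIATIONS,  # preceded by an abbreviation
    lambda t, i: t[i + 1:].lstrip()[:1].isupper(),  # next word is capitalised
    lambda t, i: i + 1 >= len(t),                   # end of text
]


def featurize(text: str, i: int) -> list[float]:
    return [1.0] + [1.0 if f(text, i) else 0.0 for f in FEATURES]  # bias + features


def score(w: list[float], x: list[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def train(docs: list[list[str]], epochs: int = 10) -> list[float]:
    """Perceptron trained on documents given as lists of gold sentences."""
    w = [0.0] * (len(FEATURES) + 1)
    for _ in range(epochs):
        for sentences in docs:
            text = " ".join(sentences)
            gold, start = set(), 0
            for s in sentences:
                gold.add(start + len(s) - 1)  # index of the sentence's last character
                start += len(s) + 1
            for m in CANDIDATE.finditer(text):
                x = featurize(text, m.start())
                y = 1 if m.start() in gold else -1
                if (1 if score(w, x) > 0 else -1) != y:
                    w = [wi + y * xi for wi, xi in zip(w, x)]
    return w


def segment(text: str, w: list[float]) -> list[str]:
    """Split text at candidates the classifier accepts as sentence boundaries."""
    out, start = [], 0
    for m in CANDIDATE.finditer(text):
        if score(w, featurize(text, m.start())) > 0:
            out.append(text[start:m.end()].strip())
            start = m.end()
    if text[start:].strip():
        out.append(text[start:].strip())
    return out
```

Trained on a few gold sentences, the perceptron learns that a candidate preceded by a listed abbreviation is usually not a boundary, so "Mr." and "e.g." do not end sentences; this mirrors how the abstract describes abbreviation lists entering as ordinary features.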


Research of English Text Classification Methods Based on Semantic Meaning

2005 International Conference on Information and Communication Technology, 2006

DOI: 10.1109/itict.2005.1609660

Cited by: ~2

Author(s): Lin Lv, Yu-Shu Liu

Keyword(s): Text Classification, English Text, Classification Methods, Semantic Meaning


An Efficient Audio Watermark Algorithm with Strong Robustness

2008 International Conference on Computational Intelligence and Security, 2008

DOI: 10.1109/cis.2008.56

Cited by: ~2

Author(s): Tong Ming, Yan Tao, Hongbing Ji

Keyword(s): Strong Robustness, Audio Watermark, Watermark Algorithm


Assessing the fidelity of consecutive interpreting

Interpreting, 2021

DOI: 10.1075/intp.00058.han

Author(s): Chao Han, Rui Xiao, Wei Su

Keyword(s): Chinese Text, English Text, Strongly Correlated, Questionnaire Data, Source Text, Fidelity Assessment, Self Consistent, Emerging Themes, Consecutive Interpreting, Condition B

The study reported on in this article pertains to rater-mediated assessment of English-to-Chinese consecutive interpreting, particularly informational correspondence between an originally intended message and an actually rendered message, also known as “fidelity” in Interpreting Studies. Previous literature has documented two main methods to assess fidelity: comparing actual renditions with the source text or with an exemplar rendition carefully prepared by experts (i.e., an ideal target text). However, little is known about the potential effects of these methods on fidelity assessment. We therefore conducted the study to explore the way in which these methods would affect rater reliability, fidelity ratings and rater perception. Our analysis of quantitative data shows that the raters tended to be less reliable, less self-consistent, less lenient and less comfortable when using the source English text (i.e., Condition A) than when using the target Chinese text (i.e., Condition B: the exemplar rendition). These findings were backed up and explained by emerging themes derived from the qualitative questionnaire data. The fidelity estimates in the two conditions were also found to be strongly correlated. We discuss these findings and entertain the possibility of recruiting untrained monolinguals or bilinguals to assess fidelity of interpreting.


The Study of Generative Modeling of Text

Applied Mechanics and Materials, Vol 444-445, pp. 1713-1717, 2013

DOI: 10.4028/www.scientific.net/amm.444-445.1713

Author(s): Jing Jing Jiang, Xiao Yu Wang, Xiang Wei Mu, Jia Xing Hu, You Qin Zhu

Keyword(s): Chinese Text, Probabilistic Models, Dimensional Space, Generative Models, English Text, Probabilistic Modelling, Probabilistic Representation, Document Collections, Generative Modeling, Selection For

Text mining is the automatic discovery of new, previously unknown information from unstructured document collections. The vector space, or bag-of-words, representation is one of the mainstream descriptions of text: each document is a data point in a high-dimensional space, and word order is ignored. Generative models are probabilistic representations of data that can be regarded as the generators of the observed data. Because they are probabilistic models, a range of methods and criteria are available for the estimation, inference, comparison and selection of generative models. In this paper, we review several existing probabilistic models that are commonly applied to discrete exchangeable collections of English text, in the hope of shedding some light on Chinese text modelling and mining tasks.
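As a minimal illustration of a generative model over a bag-of-words representation, the sketch below fits a single multinomial (unigram) word distribution with add-alpha smoothing and scores documents by log-likelihood. It is the simplest such model, not one of the specific models the paper reviews; whitespace tokenization and the `<unk>` slot for unseen words are assumptions (Chinese text would first need word segmentation).

```python
import math
from collections import Counter


def bag_of_words(doc: str) -> Counter:
    """Word counts only; word order is discarded."""
    return Counter(doc.lower().split())


def fit_unigram(docs: list[str], alpha: float = 1.0) -> dict[str, float]:
    """Word probabilities estimated from counts with add-alpha smoothing."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(bag_of_words(doc))
    vocab = sorted(counts) + ["<unk>"]  # <unk> reserves probability mass for unseen words
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def log_likelihood(doc: str, theta: dict[str, float]) -> float:
    """Log-probability of the document's words under the unigram model."""
    return math.fsum(
        n * math.log(theta.get(w, theta["<unk>"])) for w, n in bag_of_words(doc).items()
    )
```

Since only word counts enter the likelihood, reordering a document's words leaves its score unchanged, which is the exchangeability assumption the abstract refers to.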


A Comparative Study on Feature Selection in Chinese Text Classification Problem

Applied Mechanics and Materials, Vol 380-384, pp. 2854-2857, 2013

DOI: 10.4028/www.scientific.net/amm.380-384.2854

Author(s): Hu Li, Peng Zou, Wei Hong Han

Keyword(s): Feature Selection, Chinese Text, Text Classification, Imbalanced Data, Classification Problem, English Text, Feature Selection Techniques, Actual Classification, Better Than

The information explosion brings many challenges to text classification. The curse of dimensionality sharply increases computational complexity and lowers classification accuracy, so it is critical to apply feature selection techniques before the actual classification. Automatic classification of English text has been researched for many years, but Chinese text has received little attention. In this paper, several classic feature selection methods, namely TF (term frequency), IG (information gain) and CHI (the chi-square statistic), are compared on classifying Chinese text, with imbalanced data also taken into consideration. Experimental results show that CHI performs better than IG and TF when the dataset is imbalanced, while there is no obvious difference on balanced data.
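For reference, CHI and IG can both be computed per term from document counts. The Python sketch below uses the common 2x2 contingency form of the chi-square statistic, chi2(t, c) = N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)), where A, B, C and D count documents by whether they contain term t and belong to category c, and the usual entropy-based information gain. The paper's exact variants (for example, how per-category CHI scores are combined) are not given in the abstract; documents are assumed to be pre-segmented into term sets, and the toy data are hypothetical.

```python
import math
from collections import Counter


def chi_square(docs: list[set[str]], labels: list[str], term: str, category: str) -> float:
    """CHI statistic from the 2x2 term/category document-count table."""
    a = b = c = d = 0
    for terms, label in zip(docs, labels):
        has_term, in_cat = term in terms, label == category
        if has_term and in_cat:
            a += 1  # A: in category, contains term
        elif has_term:
            b += 1  # B: outside category, contains term
        elif in_cat:
            c += 1  # C: in category, lacks term
        else:
            d += 1  # D: outside category, lacks term
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def entropy(labels: list[str]) -> float:
    n = len(labels)
    return -sum(k / n * math.log2(k / n) for k in Counter(labels).values()) if n else 0.0


def information_gain(docs: list[set[str]], labels: list[str], term: str) -> float:
    """IG = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], in bits."""
    with_t = [lab for terms, lab in zip(docs, labels) if term in terms]
    without_t = [lab for terms, lab in zip(docs, labels) if term not in terms]
    n = len(labels)
    cond = len(with_t) / n * entropy(with_t) + len(without_t) / n * entropy(without_t)
    return entropy(labels) - cond
```

On a toy corpus where "股票" (stock) occurs only in finance documents and "市场" (market) occurs equally in both classes, CHI gives 4.0 and 0.0 and IG gives 1 bit and 0 bits respectively, so both methods rank "股票" as the more informative feature.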


Design of English text-to-speech conversion algorithm based on machine learning

Journal of Intelligent & Fuzzy Systems, pp. 1-12, 2020

DOI: 10.3233/jifs-189238

Author(s): Li Dongmei

Keyword(s): Machine Learning, Speech Synthesis, Feature Recognition, Learning Algorithm, Morphological Structure, English Text, Text To Speech, Part Of Speech, Modern Computer, Conversion Algorithm

English text-to-speech conversion is a key topic in modern computer technology research. Its difficulty is that large errors arise in text-to-speech feature recognition during conversion, which makes English text-to-speech conversion algorithms hard to apply in real systems. To improve the efficiency of English text-to-speech conversion with machine learning, this article labels the original voice waveform with pitch marks, modifies the prosody (rhythm) through PSOLA, and uses the C4.5 algorithm to train a decision tree that determines the pronunciation of polyphones (words with more than one pronunciation). To evaluate the performance of a pronunciation discrimination method based on part-of-speech rules and of HMM-based prosodic hierarchy prediction in speech synthesis systems, this study constructs a system model. In addition, waveform concatenation and PSOLA are used to synthesize the sound. For words whose primary stress cannot be determined from their morphological structure, the stress label can be learned with machine learning methods. Finally, the study evaluates the algorithm's performance through controlled experiments. The results show that the proposed algorithm performs well and has some practical value.
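C4.5 chooses each split by gain ratio, that is, information gain divided by the split information of the attribute. The Python sketch below applies that criterion at a single node for a hypothetical polyphone task: predicting whether "read" is pronounced like "reed" or "red" from its part-of-speech tag and the preceding word. The features and data are invented for illustration, and pruning, continuous attributes and full tree construction are omitted, so this is not the paper's model.

```python
import math
from collections import Counter, defaultdict


def entropy(labels: list[str]) -> float:
    n = len(labels)
    return -sum(k / n * math.log2(k / n) for k in Counter(labels).values())


def gain_ratio(rows: list[dict[str, str]], labels: list[str], attr: str) -> float:
    """C4.5 split criterion: information gain divided by split information."""
    n = len(labels)
    groups: dict[str, list[str]] = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows: list[dict[str, str]], labels: list[str], attrs: list[str]) -> str:
    """Attribute that C4.5 would split on at this node."""
    return max(attrs, key=lambda a: gain_ratio(rows, labels, a))


# Hypothetical training data for the polyphone "read" (Penn Treebank POS tags).
ROWS = [
    {"pos": "VB", "prev": "to"}, {"pos": "VB", "prev": "will"}, {"pos": "VB", "prev": "can"},
    {"pos": "VBN", "prev": "had"}, {"pos": "VBN", "prev": "was"}, {"pos": "VBN", "prev": "has"},
]
LABELS = ["reed", "reed", "reed", "red", "red", "red"]
```

Both attributes separate this toy data perfectly, so plain information gain ties at 1 bit; gain ratio penalises the many-valued "prev" attribute (about 0.39) and selects "pos" (1.0), which is the bias correction C4.5 introduces over ID3.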
