Features of Using Invisible Signs in the Word Environment for Hiding Data

Digital steganography is a branch of classical steganography concerned with concealing or embedding additional information in digital objects, while causing some distortion of those objects. Images, audio, video, network packets, and other media can serve as objects or containers. Recently there have been many publications on hiding information in text containers. To embed a secret, steganographic methods rely on redundant information in the cover medium, or on properties that the human perceptual system cannot distinguish. Since text documents are widely used in organizations, a text document may be the preferred carrier in such an environment. On the other hand, a text document is the most difficult choice of carrier, since it contains less redundant information. In this article, we present text steganography using invisible characters in a word processor.
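The invisible-character idea described above can be sketched in a few lines. This is a minimal illustration, not the article's actual scheme: it encodes the secret's bits as zero-width Unicode characters appended to a cover text, and the two codepoints chosen here are assumptions for the example.

```python
# Illustrative invisible-character steganography: the secret is encoded as
# zero-width Unicode characters appended to the cover text. The codepoint
# choices are an assumption for this sketch, not taken from the article.
ZERO = "\u200b"  # ZERO WIDTH SPACE      -> bit 0
ONE = "\u200c"   # ZERO WIDTH NON-JOINER -> bit 1

def hide(cover: str, secret: str) -> str:
    bits = "".join(f"{byte:08b}" for byte in secret.encode("utf-8"))
    invisible = "".join(ONE if b == "1" else ZERO for b in bits)
    return cover + invisible  # stego text renders identically to the cover

def reveal(stego: str, cover: str) -> str:
    invisible = stego[len(cover):]
    bits = "".join("1" if ch == ONE else "0" for ch in invisible)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")
```

Because zero-width characters occupy no visual space, the stego text displays exactly like the cover, which is the property such methods exploit.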

Author(s):  
N.R. Zaynalov ◽  
U.Kh. Narzullaev ◽  
A.N. Muhamadiev ◽  
I.R. Rahmatullaev ◽  
R.K. Buranov

Steganography develops tools and methods for hiding the very fact that a message is being transmitted. The earliest traces of steganographic methods are lost in antiquity. In one well-known example, a slave's head was shaved, a message was written on the scalp, and once the hair had grown back the slave was sent to the addressee. Detective fiction popularized various methods of secret writing between the lines of ordinary text, from milk to complex chemical reagents with subsequent processing. Digital steganography is based on hiding or embedding additional information in digital objects while causing some distortion of those objects. Text, images, audio, video, network packets, and so on can be used as objects or containers. To embed a secret message, steganographic methods rely on redundant container information or on properties that the human perceptual system cannot distinguish. Recently there has been much research on hiding information in text containers, since many organizations rely heavily on text documents. Accordingly, the MS Word document is considered here as the information carrier. MS Word documents have many parameters, and by changing these parameters or properties, data embedding can be achieved. In this article, we present steganography using invisible Unicode characters of the Space type, but with different encodings.
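The space-substitution idea can be sketched as follows. This is a toy illustration under assumptions, not the paper's encoding: each inter-word space either stays an ordinary U+0020 (bit 0) or is replaced by a visually similar Unicode space (bit 1); the choice of NO-BREAK SPACE as the alternative codepoint is mine.

```python
# Sketch of space-type Unicode steganography: each inter-word separator
# either remains U+0020 (bit 0) or becomes a look-alike space (bit 1).
# The NO-BREAK SPACE alternative is an assumption for illustration.
PLAIN = "\u0020"
ALT = "\u00a0"  # NO-BREAK SPACE, renders like an ordinary space

def embed_bits(cover: str, bits: str) -> str:
    words = cover.split(PLAIN)
    if len(bits) > len(words) - 1:
        raise ValueError("cover has too few spaces for the payload")
    out = []
    for i, w in enumerate(words[:-1]):
        sep = ALT if i < len(bits) and bits[i] == "1" else PLAIN
        out.append(w)
        out.append(sep)
    out.append(words[-1])
    return "".join(out)

def extract_bits(stego: str) -> str:
    # Read each separator back as one payload bit.
    return "".join("1" if ch == ALT else "0"
                   for ch in stego if ch in (PLAIN, ALT))
```

The capacity of this scheme is bounded by the number of spaces in the cover, which is why the article's discussion of container capacity matters for text carriers.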


Author(s):  
E. A. Blinova ◽  
A. A. Sushchenia

The article describes a method and algorithm for embedding a hidden message or digital watermark into Microsoft Word electronic documents in .DOCX format, based on two steganographic methods, with the .DOCX document serving as the steganographic container. The first method exploits how a word processor displays a document: hidden characters such as spaces, tabs, and paragraph marks can be displaced relative to the line of text. The second method exploits the fact that a .DOCX document is an archive containing Open XML files and media files, so specialized steganographic methods for XML files can be used for embedding; here, the quotation-mark replacement method is applied. A message embedded by one method is used to check the integrity of the message embedded by the other. Depending on the capacity of the steganographic container, one method can be chosen to embed the message and the other to control its integrity. The inverse steganographic transformation for extracting a message and confirming the integrity of an electronic document is also considered. An application was developed that embeds a hidden message in an electronic text document depending on the capacity of the container. The possibility of combining several steganographic methods is analyzed with the aim of forming a multi-key steganographic system for digital watermarking of Microsoft Word .DOCX documents.


Author(s):  
Laith Mohammad Abualigah ◽  
Essam Said Hanandeh ◽  
Ahamad Tajudin Khader ◽  
Mohammed Abdallh Otair ◽  
Shishir Kumar Shandilya

Background: Given the increasing volume of text documents on Internet pages, dealing with such a tremendous amount of knowledge becomes complex due to its sheer size. Text clustering is a common optimization problem used to organize a large amount of text information into a set of comparable and coherent clusters. Aims: This paper presents a novel local clustering technique, namely β-hill climbing, for text document clustering, modeling the β-hill climbing technique to partition similar documents into the same cluster. Methods: The β parameter is the primary innovation in β-hill climbing: it was introduced to balance local and global search. Local search methods such as k-medoid and k-means have been successfully applied to text document clustering. Results: Experiments were conducted on eight standard benchmark text datasets with different characteristics, taken from the Laboratory of Computational Intelligence (LABIC). The results show that the proposed β-hill climbing outperforms the original hill climbing technique on the text clustering problem. Conclusion: Adding the β operator to hill climbing improves text clustering performance.
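The β operator can be sketched on a cluster-assignment vector: after the usual neighbourhood move, each document keeps its cluster with probability 1 − β and is reassigned uniformly at random with probability β. The objective function and toy data below are my assumptions for illustration, not the paper's benchmarks.

```python
# Minimal sketch of beta-hill climbing for clustering. A solution is a list
# assigning each document to one of k clusters; the beta operator injects
# random reassignments to balance local and global search. The cost
# function is supplied by the caller (toy example in the usage below).
import random

def beta_operator(assignment, k, beta, rng):
    return [rng.randrange(k) if rng.random() < beta else c for c in assignment]

def beta_hill_climb(cost, assignment, k, beta=0.05, iters=1000, seed=0):
    rng = random.Random(seed)
    best, best_cost = assignment[:], cost(assignment)
    for _ in range(iters):
        cand = best[:]
        cand[rng.randrange(len(cand))] = rng.randrange(k)  # neighbourhood move
        cand = beta_operator(cand, k, beta, rng)           # beta exploration
        c = cost(cand)
        if c <= best_cost:                                 # greedy acceptance
            best, best_cost = cand, c
    return best, best_cost
```

With β = 0 this reduces to plain hill climbing; increasing β trades exploitation for exploration, which is the balance the abstract credits for the improved results.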


Author(s):  
Gwendolyn Rehrig ◽  
Reese A. Cullimore ◽  
John M. Henderson ◽  
Fernanda Ferreira

Abstract According to the Gricean Maxim of Quantity, speakers provide the amount of information listeners require to correctly interpret an utterance, and no more (Grice in Logic and conversation, 1975). However, speakers do tend to violate the Maxim of Quantity often, especially when the redundant information improves reference precision (Degen et al. in Psychol Rev 127(4):591–621, 2020). Redundant (non-contrastive) information may facilitate real-world search if it narrows the spatial scope under consideration, or improves target template specificity. The current study investigated whether non-contrastive modifiers that improve reference precision facilitate visual search in real-world scenes. In two visual search experiments, we compared search performance when perceptually relevant, but non-contrastive modifiers were included in the search instruction. Participants (N = 48 in each experiment) searched for a unique target object following a search instruction that contained either no modifier, a location modifier (Experiment 1: on the top left, Experiment 2: on the shelf), or a color modifier (the black lamp). In Experiment 1 only, the target was located faster when the verbal instruction included either modifier, and there was an overall benefit of color modifiers in a combined analysis for scenes and conditions common to both experiments. The results suggest that violations of the Maxim of Quantity can facilitate search when the violations include task-relevant information that either augments the target template or constrains the search space, and when at least one modifier provides a highly reliable cue. Consistent with Degen et al. (2020), we conclude that listeners benefit from non-contrastive information that improves reference precision, and engage in rational reference comprehension.
Significance statement This study investigated whether providing more information than someone needs to find an object in a photograph helps them to find that object more easily, even though it means they need to interpret a more complicated sentence. Before searching a scene, participants were either given information about where the object would be located in the scene, what color the object was, or were only told what object to search for. The results showed that providing additional information helped participants locate an object in an image more easily only when at least one piece of information communicated what part of the scene the object was in, which suggests that more information can be beneficial as long as that information is specific and helps the recipient achieve a goal. We conclude that people will pay attention to redundant information when it supports their task. In practice, our results suggest that instructions in other contexts (e.g., real-world navigation, using a smartphone app, prescription instructions, etc.) can benefit from the inclusion of what appears to be redundant information.


Author(s):  
M A Mikheev ◽  
P Y Yakimov

The article addresses the problem of comparing document versions in electronic document management systems. Analogous systems were reviewed and the process of comparing text documents was studied. To recognize the text in a scanned image, optical character recognition was chosen, implemented with the Tesseract library. The Myers algorithm is applied to compare the recognized texts. The text document comparison module was implemented in software using the solutions described above.
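The comparison step of such a pipeline can be sketched with the standard library. Note the caveats: the paper uses the Myers algorithm, whereas Python's `difflib` implements Ratcliff-Obershelp matching, used here only as a stand-in for the diff step; the OCR stage (Tesseract, e.g. via a wrapper such as `pytesseract.image_to_string`) is assumed to have already produced the two text versions.

```python
# Word-level comparison of two recognized document versions. difflib is a
# stand-in for the Myers diff used in the article; both produce an edit
# script between the token sequences.
import difflib

def compare_versions(old_text: str, new_text: str):
    sm = difflib.SequenceMatcher(a=old_text.split(), b=new_text.split())
    ops = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":  # keep only insertions, deletions, replacements
            ops.append((tag, " ".join(sm.a[i1:i2]), " ".join(sm.b[j1:j2])))
    return ops
```

Running the diff on tokens rather than characters makes the output robust to the small spelling noise that OCR typically introduces.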


2020 ◽  
pp. 3397-3407
Author(s):  
Nur Syafiqah Mohd Nafis ◽  
Suryanti Awang

Text documents are unstructured and high-dimensional, so effective feature selection is required to pick the most important and significant features from the sparse feature space. This paper therefore proposes an embedded feature selection technique based on Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) for unstructured, high-dimensional text classification. The technique can measure a feature's importance in a high-dimensional text document and aims to increase the efficiency of feature selection, thereby obtaining promising text classification accuracy. In the first stage, TF-IDF acts as a filter that measures feature importance in the text documents. In the second stage, SVM-RFE uses a backward feature elimination scheme to recursively remove insignificant features from the filtered feature subsets. The experiments use a text dataset retrieved from a benchmark repository comprising a collection of Twitter posts. Pre-processing is applied to extract relevant features, after which the pre-processed features are divided into training and testing sets. Feature selection is then performed on the training set by computing the TF-IDF score of each feature, and SVM-RFE ranks the filtered features as the next selection step. Only the top-ranked features are used for text classification with the SVM classifier. The experiments show that the proposed technique achieves 98% accuracy, outperforming other existing techniques. In conclusion, the proposed technique is able to select the significant features in unstructured, high-dimensional text documents.
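The first-stage TF-IDF filter can be sketched in pure Python. The second stage (SVM-RFE) would normally come from a library such as scikit-learn and is omitted here; scoring a term by its maximum TF-IDF over the corpus is an assumption made for this illustration.

```python
# First-stage TF-IDF filter: score every term and keep only the top-k
# features before the (omitted) SVM-RFE ranking stage. Taking the maximum
# TF-IDF over documents as a term's score is an illustrative choice.
import math

def tfidf_scores(docs):
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = {}  # document frequency of each term
    for toks in tokenized:
        for t in set(toks):
            df[t] = df.get(t, 0) + 1
    scores = {}
    for toks in tokenized:
        for t in set(toks):
            tf = toks.count(t) / len(toks)
            idf = math.log(n / df[t])
            scores[t] = max(scores.get(t, 0.0), tf * idf)
    return scores

def top_k_features(docs, k):
    s = tfidf_scores(docs)
    return sorted(s, key=s.get, reverse=True)[:k]
```

Terms that appear in every document get an IDF of zero and are filtered out immediately, which is exactly the dimensionality reduction the filter stage is meant to provide.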


2020 ◽  
Vol 25 (6) ◽  
pp. 755-769
Author(s):  
Noorullah R. Mohammed ◽  
Moulana Mohammed

Text data clustering organizes a set of text documents into a desired number of coherent and meaningful sub-clusters, and modeling the documents in terms of derived topics is a vital task in this process. Each tweet is treated as a text document, and various topic models are used to model tweets. In existing topic models, the clustering tendency of tweets is initially assessed using Euclidean dissimilarity features, but the cosine metric is better suited to informative assessment, especially for text clustering. This paper therefore develops a novel cosine-based external and internal validity assessment of cluster tendency to improve the computational efficiency of tweet data clustering. In the experiments, tweet clustering results are evaluated using cluster validity index measures, and it is shown that the cosine-based internal and external validity metrics outperform the alternatives on benchmark and Twitter-based datasets.
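The case for cosine over Euclidean dissimilarity can be shown with a toy term-count example: duplicating a document scales its term vector but leaves its direction unchanged, so cosine distance stays at zero while Euclidean distance grows. The vectors below are illustrative, not the paper's data.

```python
# Why cosine suits text: it compares vector directions, so it is invariant
# to document length, unlike Euclidean distance. Vectors are toy term counts.
import math

def cosine_dist(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1 - dot / (nu * nv)

def euclidean_dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

A document and its doubled copy (e.g. counts `(1, 2, 0)` vs `(2, 4, 0)`) are identical under cosine but far apart under Euclidean distance, which is why Euclidean features can misjudge the clustering tendency of length-varying tweets.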


Author(s):  
Brinardi Leonardo ◽  
Seng Hansun

Plagiarism is an act that universities regard as fraud: taking someone's ideas or writings without citing the source and claiming them as one's own. A plagiarism detection system generally applies a string matching algorithm to a text document to find words that documents have in common. Among the algorithms used for string matching are Rabin-Karp and Jaro-Winkler Distance. The Rabin-Karp algorithm is well suited to the multiple-pattern string matching problem, while the Jaro-Winkler Distance algorithm has advantages in terms of running time. A plagiarism detection application was developed and tested on different document types, i.e. doc, docx, pdf, and txt. The experimental results show that both algorithms can be used to detect plagiarism in these documents, but in terms of effectiveness the Rabin-Karp algorithm is much more effective and faster when processing documents larger than 1000 KB.
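Rabin-Karp's suitability for multiple patterns comes from its rolling hash: every window of the text is hashed in O(1) amortized time and checked against the hashes of all patterns at once. The sketch below assumes equal-length patterns for simplicity; the base and modulus are conventional choices, not taken from the paper.

```python
# Rabin-Karp multi-pattern search: hash every text window with a rolling
# hash and compare it against the precomputed hashes of all patterns.
# Equal-length patterns are assumed to keep the sketch short.
def rabin_karp(text, patterns):
    if not patterns:
        return []
    m = len(patterns[0])
    assert all(len(p) == m for p in patterns), "equal-length patterns only"
    base, mod = 256, 1_000_000_007

    def h(s):
        v = 0
        for ch in s:
            v = (v * base + ord(ch)) % mod
        return v

    targets = {h(p): p for p in patterns}
    if len(text) < m:
        return []
    high = pow(base, m - 1, mod)
    hits, v = [], h(text[:m])
    for i in range(len(text) - m + 1):
        # Verify on hash match to rule out collisions.
        if v in targets and text[i:i + m] == targets[v]:
            hits.append((i, targets[v]))
        if i + m < len(text):
            v = ((v - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return hits
```

The explicit substring check after a hash hit guards against hash collisions, so the output contains only genuine matches.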


1987 ◽  
Vol 27 ◽  
pp. 77-87
Author(s):  
A. Hoyer

One of the marks of a good translation is the use of precise terminology, i.e. technical terms are translated correctly into the target language or, if there is no direct translation, are paraphrased. With the increasing specialisation of technical fields and the rapid growth in the number of new terms, the search for the correct words tends to be very time-consuming. As a result, the need is often felt to record equivalent terms for future translation work. A terminology data bank can be used for this, either decentrally on stand-alone equipment or centrally on a mainframe computer. Even if the translator continues for the time being to keep his own collection of terminology on file cards, he can still make use of a generally accessible terminology data bank such as EURODICAUTOM in his search for terms. Unlike a "normal" dictionary, the terminology bank provides additional information such as definitions, relationships between terms, and especially the sources of this information, which makes it easier to assess the reliability of the translation given. This article considers the possibilities a terminology bank offers the translator, as well as further developments such as the connection of a data bank to a word processor or a machine translation system. We are now observing the development of a new branch of technology: CAT, Computer Aided Translation.
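The structural difference from a plain dictionary that the article describes (entries carrying definitions, term relationships, and sources) can be sketched as a small record type. The field names are assumptions for illustration, not EURODICAUTOM's actual schema.

```python
# Minimal terminology-bank record: unlike a plain dictionary entry, it
# stores a definition, related terms, and the source of the information,
# so the translator can judge reliability. Field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class TermEntry:
    term: str
    target_term: str                      # equivalent in the target language
    definition: str = ""
    source: str = ""                      # provenance, for assessing reliability
    related: list = field(default_factory=list)

bank = {}

def add_entry(entry: TermEntry):
    bank[entry.term.lower()] = entry

def lookup(term: str):
    return bank.get(term.lower())
```

Keeping the source alongside each equivalent is the feature the article singles out: it lets the translator weigh how much to trust a proposed rendering.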

