A network-based CNN model to identify the hidden information in text data

Author(s):  
Yanyan Liu ◽  
Keping Li ◽  
Dongyang Yan ◽  
Shuang Gu

2021 ◽  
Vol 83 (1) ◽  
pp. 72-79
Author(s):  
O.A. Kan ◽  
N.A. Mazhenov ◽  
K.B. Kopbalina ◽  
G.B. Turebaeva ◽  
...  

The main problem: The article deals with hiding text information in a graphic file. A formula for hiding text information in image pixels is proposed, and a steganography scheme for embedding secret text in random image pixels has been developed. Random bytes are pre-embedded in each row of pixels of the source image; the result of these operations is a key image. The text codes are then embedded in random bytes of pixels of a given RGB channel, using the characters of the ASCII code table to form the secret message. Demonstration encryption and decryption programs have been developed in the Python 3.5.2 programming language; a graphic file serves as the decryption key. Purpose: To develop an algorithm for embedding text information in random pixels of an image. Methods: Among the methods of hiding information in graphic images, the LSB method is widely used: the lower bits of the image bytes responsible for color encoding are replaced with the bits of the secret message. Analysis of methods for hiding information in graphic files and modeling of the algorithms showed an increased level of protection of the hidden information against detection. Results and their significance: With the proposed steganography scheme and the algorithm for embedding the bytes of a secret message in a graphic file, protection against detection of the hidden information is significantly increased. The advantage of this scheme is that decryption uses a key image in which random bytes have been pre-embedded; in addition, all pixel bits of the container image remain available to display the color shades. The developed scheme thus allows not only transmitting secret information but also adding digital fingerprints or hidden tags to an image.
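As a rough illustration of the kind of scheme described, the following Python sketch embeds the bits of an ASCII message into the least significant bit of one RGB channel at pseudo-randomly chosen pixel positions. This is a minimal sketch, not the authors' program: a shared integer seed stands in for the key image, and the Pillow library is assumed for image I/O.

    # Minimal LSB-in-random-pixels sketch; the shared seed is an assumption
    # standing in for the key image described in the article.
    import random
    from PIL import Image

    def embed(src_path, dst_path, message, seed, channel=0):
        """Hide an ASCII message in the LSB of one RGB channel."""
        img = Image.open(src_path).convert("RGB")
        px, (w, h) = img.load(), img.size
        bits = [(byte >> i) & 1 for byte in message.encode("ascii")
                for i in range(7, -1, -1)]
        # Reproducible random pixel positions, one per message bit.
        positions = random.Random(seed).sample(
            [(x, y) for x in range(w) for y in range(h)], len(bits))
        for (x, y), bit in zip(positions, bits):
            p = list(px[x, y])
            p[channel] = (p[channel] & ~1) | bit  # replace the lowest bit
            px[x, y] = tuple(p)
        img.save(dst_path, "PNG")  # a lossless format preserves the LSBs

    def extract(stego_path, n_chars, seed, channel=0):
        """Recover n_chars ASCII characters using the same seed."""
        img = Image.open(stego_path).convert("RGB")
        px, (w, h) = img.load(), img.size
        positions = random.Random(seed).sample(
            [(x, y) for x in range(w) for y in range(h)], n_chars * 8)
        bits = [px[x, y][channel] & 1 for x, y in positions]
        return bytes(sum(b << (7 - i) for i, b in enumerate(bits[k:k + 8]))
                     for k in range(0, len(bits), 8)).decode("ascii")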


Author(s):  
Sobhan Sarkar ◽  
Sammangi Vinay ◽  
Chawki Djeddi ◽  
J. Maiti

Classifying or predicting occupational incidents using both structured and unstructured (text) data is a largely unexplored area of research. Unstructured texts, i.e., incident narratives, are often unutilized or underutilized. Beyond the explicit information, a dataset contains a large amount of hidden information that traditional machine learning (ML) algorithms cannot exploit. Few studies examine deep neural networks (DNNs) in the domain of incident prediction, or the parameter optimization needed to achieve better predictive power. To address these issues, key terms are first extracted from the unstructured texts using LDA-based topic modeling. These key terms are then combined with the predictor categories to form the feature vector, which is processed for noise reduction and fed to a DNN trained with adaptive moment estimation (i.e., ADNN), ADAM being superior to GD, SGD, and RMSProp. To evaluate the effectiveness of the proposed method, a comparative study against state-of-the-art methods was conducted on five benchmark datasets, and a case study of an integrated steel plant in India was carried out to validate the model. Experimental results show that ADNN outperforms the alternatives in terms of accuracy. The present study therefore offers a robust methodological guide for handling unstructured data and hidden information when developing a predictive model.
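The abstract fixes the optimizer (Adam) but not the libraries or architecture, so the following Python sketch only shows the shape of the pipeline: LDA topic weights extracted from the narratives, concatenated with structured predictors, and fed to a small feed-forward network trained with Adam. scikit-learn, Keras, the layer sizes, and the dropout rate are illustrative assumptions.

    # Sketch of the LDA-features-plus-Adam-trained-DNN pipeline; the
    # architecture below is an assumption, not the paper's exact ADNN.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from tensorflow import keras

    def lda_topic_features(narratives, n_topics=10):
        """Return per-document topic weights as key-term features."""
        counts = CountVectorizer(stop_words="english",
                                 max_features=2000).fit_transform(narratives)
        lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
        return lda.fit_transform(counts)  # shape: (n_docs, n_topics)

    def build_adnn(n_features, n_classes):
        """Small feed-forward classifier compiled with the Adam optimizer."""
        model = keras.Sequential([
            keras.layers.Input(shape=(n_features,)),
            keras.layers.Dense(64, activation="relu"),
            keras.layers.Dropout(0.3),  # a crude stand-in for noise reduction
            keras.layers.Dense(n_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    # Usage: X = np.hstack([structured, lda_topic_features(narratives)])
    #        build_adnn(X.shape[1], n_classes).fit(X, y, epochs=30)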


1976 ◽  
Vol 15 (01) ◽  
pp. 21-28 ◽  
Author(s):  
Carmen A. Scudiero ◽  
Ruth L. Wong

A free-text data collection system has been developed at the University of Illinois that uses single-word, syntax-free dictionary lookup to process data for retrieval. The source document for the system is the Surgical Pathology Request and Report form; to date, 12,653 documents have been entered into the system. The free-text data was used to create an IRS (Information Retrieval System) database, and a program to interrogate this database was developed to numerically code operative procedures. A total of 16,519 procedure records were generated: 1.9% of the procedures could not be fitted into any procedure category, 6.1% could not be specifically coded, and 92% were coded into specific categories. A system of PL/1 programs was developed to facilitate manual editing of these records, which can be performed in a reasonable length of time (one week). This manual check showed that the 92% were coded with precision = 0.931 and recall = 0.924; correcting the readily correctable errors would improve these figures to precision = 0.977 and recall = 0.987. Syntax errors were relatively unimportant in the overall coding process, but they did introduce significant error in some categories, for example when a right/left/bilateral distinction was attempted. The coded file will be used as an input file to a gynecological disease/PAP smear correlation system, whose outputs will include retrospective information on the natural history of selected diseases and a patient log giving the clinician information on patient follow-up. Thus a free-text data collection system can be used to produce numerically coded files of reasonable accuracy, and these files can serve as a source of useful information both for the clinician and for the medical researcher.
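The original system was written in PL/1; the following Python fragment is only a toy illustration of single-word, syntax-free dictionary lookup and of how the reported precision and recall are computed. The dictionary entries and codes are invented for the example.

    # Toy single-word, syntax-free dictionary lookup; entries are invented.
    PROCEDURE_DICT = {"appendectomy": 101, "biopsy": 202, "hysterectomy": 303}

    def code_report(text):
        """Assign numeric codes by looking up each word independently."""
        words = text.lower().replace(",", " ").replace(".", " ").split()
        return sorted({PROCEDURE_DICT[w] for w in words if w in PROCEDURE_DICT})

    def precision_recall(assigned, correct):
        """Compare machine-assigned codes with a manually edited key."""
        tp = len(set(assigned) & set(correct))
        return (tp / len(assigned) if assigned else 0.0,
                tp / len(correct) if correct else 0.0)

    print(code_report("Cervical biopsy, followed by appendectomy."))
    # -> [101, 202]; syntax is ignored, so "no biopsy" would also match,
    # which is exactly the kind of category-specific error noted above.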


Author(s):  
I. G. Zakharova ◽  
Yu. V. Boganyuk ◽  
M. S. Vorobyova ◽  
E. A. Pavlova

The goal of the article is to demonstrate an approach to diagnosing the level of IT graduates' professional competence based on the analysis of a student's digital footprint and the content of the corresponding educational program. We describe methods for extracting indicators of a student's professional level from digital-footprint text data: course descriptions and graduation qualification works. We show how to compare these indicators with the formalized requirements of employers, as reflected in the texts of job vacancies in the field of information technology. The proposed approach was applied at the Institute of Mathematics and Computer Science of the University of Tyumen. We performed diagnostics using a dataset that included the texts of course descriptions for IT areas of undergraduate study, 542 graduation qualification works in these areas, 879 descriptions of job requirements, and information on graduate employment. The presented approach allows us to evaluate both the relevance of the educational program as a whole and the level of professional competence of each student on the basis of objective data. The results were used to update the content of several major courses and to add new elective courses to the curriculum.
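The article's exact extraction and matching methods are not given in this summary; one plausible minimal sketch of the comparison step, assuming scikit-learn, is TF-IDF vectors over both text collections with cosine similarity between them.

    # Illustrative sketch only: TF-IDF cosine similarity between student
    # texts (digital footprint) and vacancy descriptions.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def match_students_to_vacancies(student_texts, vacancy_texts):
        """Score each student text against each vacancy description."""
        matrix = TfidfVectorizer(max_features=5000).fit_transform(
            student_texts + vacancy_texts)
        students = matrix[:len(student_texts)]
        vacancies = matrix[len(student_texts):]
        return cosine_similarity(students, vacancies)  # (n_students, n_vacancies)

    # scores[i, j] near 1.0 suggests student i's footprint covers the
    # requirements stated in vacancy j.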


2020 ◽  
Vol 59 (1-4) ◽  
pp. 611-621
Author(s):  
Sára Horváthy

Egeria, a fourth-century pious woman from the south of present-day Spain, retold her observations to her sisters after visiting Palestine with the Bible in hand. While the linguistic aspects of her letters are quite well known, much less is known about their stylistic value, inappropriately called "simple". What seems boringly the same again and again is in fact a constantly renewed and perfectly mastered "variation on a theme", just as in a well-composed piece of music. Her apparent objectivity is in truth a wish to focus on what she considers most important: to tell her community, as closely to reality as possible, what she observed during her pilgrimage. Egeria's Latin is also a testimony to the Christian lexicon then under construction and to the social changes in progress at the time. Linguistics and stylistics work together here: the choice of a word or a grammatical formula reveals hidden information about the personal style of an author who, despite her supposed objectivity, had real purposes of her own.


Author(s):  
Aleksey Klokov ◽  
Evgenii Slobodyuk ◽  
Michael Charnine

The object of the research is a corpus of text data, collected together with the scientific advisor, and natural-language-processing algorithms for its analysis. A set of hypotheses was tested against scientific publications in computer science through a series of simulation experiments described in this dissertation. The subject of the research is the algorithms, and the results of those algorithms, aimed at predicting promising topics and terms that emerge over time in the scientific environment. The result of this work is a set of machine learning models with whose help experiments were carried out to identify promising terms and semantic relationships in the text corpus. The resulting models can be applied to the semantic processing and analysis of other subject areas.
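The dissertation's models are not described in detail here, so the following is a baseline illustration only: one simple way to flag potentially promising terms is to track each term's relative frequency per year and rank terms by the slope of a fitted linear trend.

    # Baseline illustration, not the dissertation's method: rank terms by
    # the slope of their relative frequency across years.
    from collections import Counter
    import numpy as np

    def rising_terms(docs_by_year, top_k=10):
        """docs_by_year maps year -> list of token lists."""
        years = sorted(docs_by_year)
        freq = {}
        for year in years:
            counts = Counter(t for doc in docs_by_year[year] for t in doc)
            total = sum(counts.values()) or 1
            for term, c in counts.items():
                freq.setdefault(term, {})[year] = c / total
        slopes = {term: np.polyfit(years,
                                   [f.get(y, 0.0) for y in years], 1)[0]
                  for term, f in freq.items()}
        return sorted(slopes, key=slopes.get, reverse=True)[:top_k]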

