text extraction Latest Research Papers

CTI View: APT Threat Intelligence Analysis System

Security and Communication Networks ◽

10.1155/2022/9875199 ◽

2022 ◽

Vol 2022 ◽

pp. 1-15

Author(s):

Yinghai Zhou ◽

Yi Tang ◽

Ming Yi ◽

Chuanyu Xi ◽

Hai Lu

Keyword(s):

Language Processing ◽

Automation System ◽

Intelligence Analysis ◽

Entity Extraction ◽

Text Extraction ◽

Advanced Persistent Threat ◽

Active Defense ◽

Main Work ◽

Threat Intelligence ◽

Analysis System

With the development of advanced persistent threats (APTs) and an increasingly severe network security situation, a strategic defense approach built on “active defense, traceability, and countermeasures” has emerged, and cyberspace threat intelligence (CTI) has become increasingly valuable in strengthening resistance to cyber threats. Driven by the practical need to defend against APTs, we apply natural language processing to CTI and design a new automation system, CTI View, for extracting and analyzing text from the massive volume of unstructured CTI released by various security vendors. The main work of CTI View is as follows: (1) to handle heterogeneous CTI, a text extraction framework for threat intelligence is designed based on an automated test framework, text recognition technology, and text denoising technology, which effectively solves the poor adaptability of crawlers when crawling heterogeneous CTI; (2) regular expressions combined with a blacklist and whitelist mechanism are used to extract the IOC and TTP information described in CTI; (3) according to the actual requirements, a model based on bidirectional encoder representations from transformers (BERT) is designed to perform entity extraction for heterogeneous threat intelligence. In this paper, a GRU layer is added to the existing BERT-BiLSTM-CRF model, and evaluation on the labeled dataset shows better performance than current mainstream entity extraction models.
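
Point (2) of the abstract above, regular expressions combined with a blacklist and whitelist, can be illustrated with a minimal Python sketch. The patterns, the list contents, and the function name `extract_iocs` are all hypothetical and are not taken from CTI View. The sketch also assumes that the whitelist drops known-benign matches and the blacklist reports known-bad strings even when no pattern matches, which the abstract does not specify. TTP extraction is not covered.

```python
import re

# Hypothetical patterns; real CTI reports need far broader coverage
# (defanged indicators, IPv6, SHA hashes, more top-level domains).
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "domain": re.compile(
        r"\b(?:[a-z0-9-]+\.)+(?:com|net|org|info|ru|cn)\b", re.IGNORECASE
    ),
}

# Assumed semantics: whitelist entries are benign and are dropped.
WHITELIST = {"microsoft.com", "127.0.0.1"}
# Assumed semantics: blacklist entries are always reported when present.
BLACKLIST = {"evil-c2.example"}


def extract_iocs(text):
    """Return a dict mapping indicator type to a sorted list of values."""
    found = {}
    for kind, pattern in IOC_PATTERNS.items():
        hits = {m.group(0).lower() for m in pattern.finditer(text)}
        hits -= WHITELIST  # remove known-benign values
        if hits:
            found[kind] = sorted(hits)
    lowered = text.lower()
    flagged = sorted(b for b in BLACKLIST if b in lowered)
    if flagged:
        found["blacklisted"] = flagged
    return found


report = (
    "The sample 9e107d9d372bb6826bd81d3542a419d6 contacted 203.0.113.7 "
    "and update.badhost.net, then fetched a file from microsoft.com; "
    "evil-c2.example was also seen."
)
iocs = extract_iocs(report)
```

In this sketch the whitelist is applied after matching, so a benign domain such as `microsoft.com` is matched by the pattern and then discarded rather than being excluded by a more complicated regular expression.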

A Detailed Review on Text Extraction Using Optical Character Recognition

ICT Analysis and Applications - Lecture Notes in Networks and Systems ◽

10.1007/978-981-16-5655-2_69 ◽

2022 ◽

pp. 719-728

Author(s):

Chhanam Thorat ◽

Aishwarya Bhat ◽

Padmaja Sawant ◽

Isha Bartakke ◽

Swati Shirsath

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Detailed Review ◽

Text Extraction ◽

Optical Character

An Improved Text Extraction Approach with Auto Encoder for Creating Your Own Audiobook

International Journal of Information Retrieval Research ◽

10.4018/ijirr.289570 ◽

2022 ◽

Vol 12 (1) ◽

pp. 0-0

Keyword(s):

Initial Step ◽

Post Processing ◽

Text To Speech ◽

Comprehensive Review ◽

Text Extraction ◽

Learning Techniques ◽

Spell Check ◽

Result Analysis ◽

Intensive Work

As we all know, listening makes learning easier and more interesting than reading. An audiobook is software that converts text to speech. Though this sounds good, the audiobooks available in the market are neither free nor feasible for everyone. In addition, these audiobooks are meant only for fictional stories, novels, or comics. A comprehensive review of the available literature shows that very little intensive work has been done on image-to-speech conversion. In this paper, we employ various strategies for the entire process. As an initial step, deep learning techniques are constructed to denoise the images that are fed to the system. This is followed by text extraction with the help of OCR engines. Additional steps are taken to improve the quality of text extraction, and a post-processing spell check mechanism is incorporated for this purpose. Our result analysis demonstrates that with denoising and spell checking, our model achieves an accuracy of 98.11%, compared with 84.02% without any denoising or spell check mechanism.
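
The post-processing spell check step mentioned in the abstract above can be sketched in Python as follows. This is only an illustration: the abstract does not say which spell checker is used, so the sketch substitutes the standard library's `difflib.get_close_matches`, and the vocabulary, cutoff, and function names are assumptions.

```python
import difflib

# Hypothetical vocabulary; a real system would load a full dictionary.
VOCAB = ["listening", "makes", "learning", "easier", "than", "reading"]


def correct_word(word, vocab=VOCAB, cutoff=0.75):
    """Replace an OCR token with its closest vocabulary word, if any."""
    lowered = word.lower()
    if lowered in vocab:
        return word  # already a known word, keep as-is
    matches = difflib.get_close_matches(lowered, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else word  # leave unknown words untouched


def correct_text(text):
    """Apply word-level correction to whitespace-separated OCR output."""
    return " ".join(correct_word(w) for w in text.split())


# Typical OCR confusions: "i" read as "l", "rn" read as "m".
fixed = correct_text("listenlng makes leaming easier")
```

Tokens with no sufficiently close vocabulary entry are left unchanged, so names and numbers in the OCR output are not rewritten by mistake.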

Chinese Buddhist Literature Database and Image Text Extraction Algorithm

10.1007/978-981-16-5854-9_15 ◽

2021 ◽

pp. 122-127

Author(s):

Xuetao Liu

Keyword(s):

Buddhist Literature ◽

Text Extraction ◽

Extraction Algorithm ◽

Literature Database

Urdu text in natural scene images: a new dataset and preliminary text detection

PeerJ Computer Science ◽

10.7717/peerj-cs.717 ◽

2021 ◽

Vol 7 ◽

pp. e717

Author(s):

Hazrat Ali ◽

Khalid Iqbal ◽

Ghulam Mujtaba ◽

Ahmad Fayyaz ◽

Mohammad Farhad Bulbul ◽

...

Keyword(s):

Text Detection ◽

Support Vector ◽

Research Use ◽

Natural Scene ◽

Text Extraction ◽

Second Stage ◽

Maximally Stable Extremal Region ◽

Interesting Task ◽

Candidate Regions ◽

Natural Scene Images

Text detection in natural scene images for content analysis is an interesting task. The research community has seen some great developments in English and Mandarin text detection. However, Urdu text extraction in natural scene images is a task that has not been well addressed. In this work, first, a new dataset is introduced for Urdu text in natural scene images. The dataset comprises 500 standalone images acquired from real scenes. Second, the channel-enhanced Maximally Stable Extremal Region (MSER) method is applied to extract Urdu text regions as candidates in an image. A two-stage filtering mechanism is applied to eliminate non-candidate regions. In the first stage, text and noise are classified based on their geometric properties. In the second stage, a support vector machine classifier is trained to discard non-text candidate regions. After this, text candidate regions are linked using centroid-based vertical and horizontal distances. Text lines are further analyzed by a different classifier based on HOG features to remove non-text regions. Extensive experimentation is performed on the locally developed dataset to evaluate the performance. The experimental results show good performance on test set images. The dataset will be made available for research use. To the best of our knowledge, this work is the first of its kind for the Urdu language; it provides a dataset for free research use and serves as a baseline for the task of Urdu text extraction.
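
Two of the steps in the abstract above, the first-stage geometric filter and the centroid-based linking of candidate regions, can be sketched in Python. The thresholds, the `Region` type, and the function names are hypothetical, since the abstract gives no numeric values. The MSER candidate extraction and the SVM and HOG classifiers are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned bounding box of a candidate region (pixels)."""
    x: int
    y: int
    w: int
    h: int

    @property
    def centroid(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def geometric_filter(regions, min_area=30, max_aspect=5.0):
    """Stage 1 (assumed rules): drop tiny or extremely elongated boxes."""
    kept = []
    for r in regions:
        aspect = max(r.w, r.h) / max(1, min(r.w, r.h))
        if r.w * r.h >= min_area and aspect <= max_aspect:
            kept.append(r)
    return kept


def link_regions(regions, max_dx=40, max_dy=10):
    """Group regions into lines when centroids are close on both axes."""
    lines = []
    for r in sorted(regions, key=lambda reg: reg.centroid[0]):
        cx, cy = r.centroid
        for line in lines:
            lx, ly = line[-1].centroid  # compare with the line's last box
            if abs(cx - lx) <= max_dx and abs(cy - ly) <= max_dy:
                line.append(r)
                break
        else:
            lines.append([r])  # no nearby line, start a new one
    return lines


candidates = [
    Region(0, 0, 20, 20), Region(30, 2, 20, 20), Region(60, 1, 20, 20),
    Region(0, 100, 20, 20),   # text on a separate line
    Region(5, 5, 2, 2),       # speck of noise, too small
    Region(0, 50, 200, 3),    # long thin stroke, too elongated
]
text_lines = link_regions(geometric_filter(candidates))
```

With these assumed thresholds, the six candidates reduce to four boxes grouped into two lines.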

BGR to HSV based Text Extraction from Manuscripts Using Slidebars

10.1109/asiancon51346.2021.9544674 ◽

2021 ◽

Author(s):

Mayank Singh ◽

S. Indu

Keyword(s):

Text Extraction

ELEMENT: Text Extraction for the Dark Web

Advanced Computing and Intelligent Technologies - Lecture Notes in Networks and Systems ◽

10.1007/978-981-16-2164-2_43 ◽

2021 ◽

pp. 537-551

Author(s):

Ashwini Dalvi ◽

Irfan Siddavatam ◽

Apoorva Jain ◽

Smit Moradiya ◽

Faruk Kazi ◽

...

Keyword(s):

Text Extraction ◽

Dark Web

Improved Real time Signboard Detection and Translation Using FRCNN

International Journal of Computing Communications and Networking ◽

10.30534/ijccn/2021/011022021 ◽

2021 ◽

Vol 10 (2) ◽

pp. 1-5

Keyword(s):

Object Detection ◽

Real Time ◽

Mobile Application ◽

Major Portion ◽

Text Extraction

In India, the majority of people speak Hindi, but a major portion of signboards are in English. On a business or pleasure trip, travelers get confused by the various signboards written in English. As smartphones have become very popular in recent years, travelers can rely on them for help. This paper describes work to build a mobile application that recognizes the English content and signs on a signboard image, detects and translates the content and symbols from English to Hindi, and displays the translated Hindi text on the phone screen. The system uses a pre-trained faster regional convolutional neural network using a pre-trained CNN for object detection, Tesseract OCR for text extraction, and an English-to-Hindi dictionary for translation.

Answer Script Evaluation using Machine Learning

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.35070 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 849-852

Author(s):

Dr. K. Suresh

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Computational Methods ◽

Language Processing ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Text Extraction ◽

Processing Techniques

The current way of checking answer scripts is laborious for the college: examiners must manually check the answers and allocate marks to the students. Our proposed system uses Machine Learning and Natural Language Processing techniques to address this. Machine learning algorithms use computational methods to learn directly from data without relying on predetermined rules. NLP algorithms identify specific entities within the text, search for key elements in a document, run a contextual search for synonyms, detect misspelled words or similar entries, and more. Our algorithm performs similarity checking and also counts the words associated with the question that match exactly between two documents. It also checks whether grammar is used correctly in the student's answer. Our proposed system performs text extraction and evaluation of marks by applying Machine Learning and Natural Language Processing techniques.
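
The similarity check and exact keyword matching described in the abstract above can be sketched in Python. The abstract does not name a similarity measure, so this sketch assumes bag-of-words cosine similarity. The equal weighting, the mark scale, and all function names are hypothetical, and grammar checking is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def keyword_matches(answer, keywords):
    """Count the question keywords that appear exactly in the answer."""
    tokens = set(tokenize(answer))
    return sum(1 for k in keywords if k.lower() in tokens)


def score(answer, reference, keywords, max_marks=10):
    """Assumed scheme: average of similarity and keyword coverage."""
    sim = cosine_similarity(answer, reference)
    coverage = keyword_matches(answer, keywords) / len(keywords) if keywords else 0.0
    return round(max_marks * (0.5 * sim + 0.5 * coverage), 1)


reference = "Photosynthesis converts light into chemical energy using chlorophyll"
answer = "Photosynthesis converts light into chemical energy"
keywords = ["photosynthesis", "light", "chlorophyll"]
marks = score(answer, reference, keywords)
```

Because the bag-of-words representation ignores word order and synonyms, a production system would likely need the contextual synonym search that the abstract mentions.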

Extraction of Text From Image

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-1391 ◽

2021 ◽

pp. 314-317

Author(s):

Akanksha Mate ◽

Megha Gurav ◽

Kajal Babar ◽

Gauri Raskar ◽

Prof. Prakash Kshirsagar

Keyword(s):

Mathematical Morphology ◽

Digital Image ◽

Image Binarization ◽

Connected Component ◽

Text Extraction ◽

Component Method ◽

Matlab Code ◽

Message Structure ◽

Complex Foundation ◽

Edge Based

Image text is the textual information embedded or written in images of various forms. Image text can be found in captured images, scanned documents, magazines, newspapers, posters, and so on. Such image text is widely available today and is vital in representing, describing, and transferring information, which helps people in communication, problem solving, accessibility, the creation of new kinds of jobs, cost effectiveness, productivity, globalization, and bridging social gaps. The information in these image documents would offer higher efficiency and easier access if it were converted to text form. The process by which image text is converted into plain text is text extraction. Text extraction is helpful in information retrieval, searching, editing, documenting, archiving, or reporting of image text. However, variation in these texts due to differences in size, orientation, style, and alignment, together with text embedded in complex colored document images, degraded document images, low-quality images, low image contrast, and complex backgrounds, makes text extraction extremely difficult and challenging. Various methods such as the Connected Component Method, the Mathematical Morphology Method, the Edge Based Method, and the Texture Based Method have been used previously, but each has its own limitations when measured by parameters such as precision, recall, and F-score. In this paper, text extraction from image documents using a combination of two powerful methods, the Connected Component Method and the Edge Based Method, is discussed as a way to improve the performance and accuracy of text extraction. The implementation is done in MATLAB code integrated with the MATLAB/Simulink tool, and the proposed system is tested on the Digital Image Binarization Competition (DIBCO) 2017 dataset. Finally, the extracted and recognized text is converted to speech for use by visually impaired people.
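
The connected component method named in the abstract above can be illustrated with a minimal sketch. The paper's implementation is in MATLAB; this sketch uses plain Python instead. It assumes a fixed global threshold and 4-connectivity, neither of which is stated in the abstract. The edge-based half of the combined method is not shown, and the function names are hypothetical.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Mark dark pixels (below the assumed threshold) as foreground (1)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def connected_components(binary):
    """Return bounding boxes (min_x, min_y, max_x, max_y) of 4-connected blobs."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            min_x = max_x = x
            min_y = max_y = y
            while queue:
                cy, cx = queue.popleft()
                min_x, max_x = min(min_x, cx), max(max_x, cx)
                min_y, max_y = min(min_y, cy), max(max_y, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((min_x, min_y, max_x, max_y))
    return boxes


# Tiny grayscale image: 0 is ink, 255 is background.
gray = [
    [255, 0, 255, 255],
    [255, 0, 255, 0],
    [255, 255, 255, 0],
]
boxes = connected_components(binarize(gray))
```

Each bounding box is a candidate character or stroke; a later stage would filter and group these boxes before recognition.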
