A New Hybrid Method for Caption and Scene Text Classification in Action Video Images

Author(s):  
Lokesh Nandanwar ◽  
Palaiahnakote Shivakumara ◽  
Umapada Pal ◽  
Tong Lu ◽  
Michael Blumenstein

Achieving a better recognition rate for text in action video images is challenging due to multiple types of text with unpredictable actions in the background. In this paper, we propose a new method for the classification of caption text (which is edited text) and scene text (text that is a part of the video) in video images. This work considers five action classes, namely, Yoga, Concert, Teleshopping, Craft, and Recipes, where both types of text are expected to play a vital role in understanding the video content. The proposed method introduces a new fusion criterion based on Discrete Cosine Transform (DCT) and Fourier coefficients to obtain the reconstructed images for caption and scene text. The fusion criterion involves computing the variances for coefficients of corresponding pixels of the DCT and Fourier images, and the same variances are used as the respective weights. This step results in Reconstructed image-1. Inspired by the special property of Chebyshev-Harmonic-Fourier-Moments (CHFM), which can reconstruct a redundancy-free image, we explore CHFM for obtaining Reconstructed image-2. The reconstructed images, along with the input image, are passed to a Deep Convolutional Neural Network (DCNN) for classification of caption/scene text. Experimental results on five action classes and a comparative study with existing methods demonstrate that the proposed method is effective. In addition, recognition results obtained from different methods before and after classification show that recognition performance improves significantly after classification.
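The abstract does not spell out the fusion criterion in detail, so the following is only a minimal sketch of one possible reading: two equally sized coefficient maps are blended position by position, with weights derived from the local variance around each position. The window size, the helper names `local_variance` and `fuse`, and the toy numbers are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import pvariance

def local_variance(grid, r, c, radius=1):
    """Population variance of a (2*radius+1)^2 window, clipped at the borders."""
    rows, cols = len(grid), len(grid[0])
    window = [
        grid[i][j]
        for i in range(max(0, r - radius), min(rows, r + radius + 1))
        for j in range(max(0, c - radius), min(cols, c + radius + 1))
    ]
    return pvariance(window)

def fuse(dct_map, fourier_map, radius=1):
    """Blend two coefficient maps using variance-derived weights (assumed scheme)."""
    fused = []
    for r in range(len(dct_map)):
        row = []
        for c in range(len(dct_map[0])):
            v_d = local_variance(dct_map, r, c, radius)
            v_f = local_variance(fourier_map, r, c, radius)
            total = v_d + v_f
            # Fall back to equal weights when both windows are flat.
            w_d = v_d / total if total > 0 else 0.5
            row.append(w_d * dct_map[r][c] + (1 - w_d) * fourier_map[r][c])
        fused.append(row)
    return fused

# Toy maps with made-up values (not real transform output).
dct_map = [[9.0, 1.0, 0.0], [1.0, 8.0, 1.0], [0.0, 1.0, 7.0]]
fourier_map = [[2.0, 2.0, 2.0], [2.0, 2.0, 2.0], [2.0, 2.0, 2.0]]
fused = fuse(dct_map, fourier_map)
flat_fused = fuse([[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]])
```

In this toy case the second map is flat, so its variance is zero everywhere and the fused map simply follows the first map. When both maps are flat, the sketch averages them.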

2021 ◽  
Vol 3 (11) ◽  
Author(s):  
Abhra Chaudhuri ◽  
Palaiahnakote Shivakumara ◽  
Pinaki Nath Chowdhury ◽  
Umapada Pal ◽  
Tong Lu ◽  
...  

Abstract For video images with complex actions, achieving accurate text detection and recognition results is very challenging. This paper presents a hybrid model for classification of action-oriented video images which reduces the complexity of the problem to improve text detection and recognition performance. Here, we consider the following five genre categories: concert, cooking, craft, teleshopping and yoga. For classifying action-oriented video images, we explore ResNet50 for learning general pixel-distribution level information, one VGG16 network for learning the features of Maximally Stable Extremal Regions, and another VGG16 network for learning facial components obtained by a multitask cascaded convolutional network. The approach integrates the outputs of the three above-mentioned models using a fully connected neural network for classification of the five action-oriented image classes. We demonstrate the efficacy of the proposed method by testing on our dataset and two other standard datasets, namely, the Scene Text Dataset, which contains 10 classes of scene images with text information, and the Stanford 40 Actions dataset, which contains 40 action classes without text information. Our method outperforms the related existing work and significantly enhances the class-specific performance of text detection and recognition.
Article highlights
The method uses pixel, stable-region and face-component information in a novel way for solving complex classification problems.
The proposed work fuses different deep learning models for successful classification of action-oriented images.
Experiments on our own dataset as well as standard datasets show that the proposed model outperforms related state-of-the-art (SOTA) methods.
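The fusion step described in the abstract, in which the outputs of three branches feed a fully connected network, can be sketched roughly as below. This is an illustration under stated assumptions: the feature sizes, the single dense layer, the weights and the function name `fuse_and_classify` are hypothetical, and the real branches (ResNet50 and two VGG16 networks) are replaced by toy feature vectors.

```python
import math

CLASSES = ["concert", "cooking", "craft", "teleshopping", "yoga"]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_and_classify(pixel_feat, mser_feat, face_feat, weights, bias):
    """Concatenate three branch outputs, then apply one dense layer and softmax."""
    x = pixel_feat + mser_feat + face_feat  # list concatenation
    logits = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

# Hypothetical 2-dimensional outputs from each branch.
pixel_feat, mser_feat, face_feat = [0.2, 0.1], [0.4, 0.3], [0.5, 0.5]

# Hypothetical dense-layer parameters: only the "yoga" row is non-zero.
weights = [[0.0] * 6 for _ in CLASSES]
weights[4] = [1.0] * 6
bias = [0.0] * len(CLASSES)

probs = fuse_and_classify(pixel_feat, mser_feat, face_feat, weights, bias)
predicted = CLASSES[probs.index(max(probs))]
```

In a trained model the dense-layer parameters would be learned jointly or on top of frozen branches; the abstract does not say which.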


Author(s):  
M. Jeyanthi ◽  
C. Velayutham

In science and technology development, brain-computer interfaces (BCI) play a vital role in research. Classification is a data mining technique used to predict group membership for data instances. Analyses of BCI data are challenging because feature extraction and classification of these data are more difficult than for raw data. In this paper, we extract statistical Haralick features from the raw EEG data. The features are then normalized, and binning is used to improve the accuracy of the predictive models by reducing noise and eliminating some irrelevant attributes. Classification is then performed on the BCI dataset using different techniques, such as Naïve Bayes, k-nearest neighbor, and SVM classifiers. Finally, we propose the SVM classification algorithm for the BCI dataset.
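The normalization and binning steps mentioned above can be illustrated with a small sketch. The abstract does not state which normalization or binning scheme was used, so min-max scaling and equal-width bins are assumptions here, as are the function names and the sample values.

```python
def min_max_normalize(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def equal_width_bins(values, n_bins):
    """Map values already in [0, 1] to bin indices 0..n_bins-1."""
    return [min(int(v * n_bins), n_bins - 1) for v in values]

feature = [2.0, 4.0, 6.0, 10.0]            # made-up feature column
normalized = min_max_normalize(feature)    # [0.0, 0.25, 0.5, 1.0]
binned = equal_width_bins(normalized, 4)   # [0, 1, 2, 3]
```

Binning replaces each value with a coarse index, which is one way small fluctuations in a feature can be smoothed out before classification.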


2020 ◽  
Vol 65 (6) ◽  
pp. 759-773
Author(s):  
Segu Praveena ◽  
Sohan Pal Singh

Abstract Leukaemia detection and diagnosis in advance is a trending topic in medical applications for reducing the death toll of patients with acute lymphoblastic leukaemia (ALL). For the detection of ALL, it is essential to analyse the white blood cells (WBCs), for which blood smear images are employed. This paper proposes a new technique for the segmentation and classification of acute lymphoblastic leukaemia. The proposed method of automatic leukaemia detection is based on a Deep Convolutional Neural Network (Deep CNN) that is trained using an optimization algorithm named the Grey wolf-based Jaya Optimization Algorithm (GreyJOA), which is developed using the Grey Wolf Optimizer (GWO) and the Jaya Optimization Algorithm (JOA) and improves global convergence. Initially, the input image is pre-processed and segmentation is performed using the Sparse Fuzzy C-Means (Sparse FCM) clustering algorithm. Then, features such as Local Directional Patterns (LDP) and colour histogram-based features are extracted from the segments of the pre-processed input image. Finally, the extracted features are applied to the Deep CNN for classification. The experimental evaluation of the method using the images of the ALL IDB2 database reveals that the proposed method achieved a maximal accuracy, sensitivity, and specificity of 0.9350, 0.9528, and 0.9389, respectively.
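For reference, the three reported metrics follow from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the counts are made up for illustration and are not taken from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
acc, sens, spec = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```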


2021 ◽  
Vol 8 ◽  
pp. 237437352110180
Author(s):  
Robin E. McAtee ◽  
Laura Spradley ◽  
Leah Tobey ◽  
Whitney Thomasson ◽  
Gohar Azhar ◽  
...  

Millions of Americans live with dementia. Caregivers of this population provide countless hours of multifaceted, complex care that frequently causes unrelenting stress, which can result in immense burden. However, it is not fully understood what efforts can be made to reduce the stress among caregivers of persons with dementia (PWD). Therefore, the aim of this pretest–posttest designed study was to evaluate changes in caregiver burden after providing an educational intervention to those caring for PWD in Arkansas. Forty-one participants completed the Zarit Caregiver Burden Scale before and after attending a 4-hour dementia-focused caregiving workshop. The analysis of the means, standard deviations, and paired t tests showed that there was an increase in the confidence and competence in caring for PWD 30 to 45 days after attending the workshop. Health care providers need to understand both the vital role caregivers provide in managing a PWD and the importance of the caregiver receiving education about their role as a caregiver. Utilizing caregiver educational programs is a first step.
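A paired t test of the kind mentioned above compares each participant's score before and after the intervention. The following is a minimal sketch of the test statistic using the standard formula; the five score pairs are invented for illustration and have no connection to the study's data.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # stdev is the sample SD
    return mean(diffs) / standard_error, n - 1

before = [30, 28, 35, 40, 25]   # made-up pre-workshop scores
after = [26, 27, 30, 33, 24]    # made-up post-workshop scores
t_stat, dof = paired_t(before, after)
```

A negative statistic here means that scores decreased on average after the workshop; the p-value would then be read from a t distribution with `dof` degrees of freedom.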


2021 ◽  
Vol 503 (2) ◽  
pp. 1828-1846
Author(s):  
Burger Becker ◽  
Mattia Vaccari ◽  
Matthew Prescott ◽  
Trienko Grobler

ABSTRACT The morphological classification of radio sources is important to gain a full understanding of galaxy evolution processes and their relation with local environmental properties. Furthermore, the complex nature of the problem, its appeal for citizen scientists, and the large data rates generated by existing and upcoming radio telescopes combine to make the morphological classification of radio sources an ideal test case for the application of machine learning techniques. One approach that has shown great promise recently is convolutional neural networks (CNNs). Literature, however, lacks two major things when it comes to CNNs and radio galaxy morphological classification. First, a proper analysis of whether overfitting occurs when training CNNs to perform radio galaxy morphological classification using a small curated training set is needed. Secondly, a good comparative study regarding the practical applicability of the CNN architectures in literature is required. Both of these shortcomings are addressed in this paper. Multiple performance metrics are used for the latter comparative study, such as inference time, model complexity, computational complexity, and mean per class accuracy. As part of this study, we also investigate the effect that receptive field, stride length, and coverage have on recognition performance. For the sake of completeness, we also investigate the recognition performance gains that we can obtain by employing classification ensembles. A ranking system based upon recognition and computational performance is proposed. MCRGNet, Radio Galaxy Zoo, and ConvXpress (novel classifier) are the architectures that best balance computational requirements with recognition performance.
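Mean per class accuracy, one of the metrics listed above, averages the accuracy obtained within each class, so that rare classes count as much as common ones. A minimal sketch follows; the class labels "A" and "B" and the predictions are hypothetical.

```python
from collections import defaultdict

def mean_per_class_accuracy(y_true, y_pred):
    """Average of the per-class recall values over all classes in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)

y_true = ["A", "A", "A", "A", "B", "B"]
y_pred = ["A", "A", "A", "B", "B", "A"]
mpca = mean_per_class_accuracy(y_true, y_pred)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case class A scores 3/4 and class B scores 1/2, giving a mean per class accuracy of 0.625, which is lower than the overall accuracy of 4/6 because the smaller class is classified less well.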


2015 ◽  
Vol 15 (05) ◽  
pp. 1550085 ◽  
Author(s):  
MADHURI TASGAONKAR ◽  
MADHURI KHAMBETE

Diabetes affects the retinal structure of a diabetic patient by generating various lesions. Early detection of these lesions can prevent loss of vision. Automated detection can be made readily available to the masses through the use of fundus imaging. Detection of exudates is significant in diabetic retinopathy (DR), as they are early signs and can cause blindness. Finding the exact location as well as the correct number of exudates plays a vital role in the overall treatment of a patient. This paper presents an algorithm for automatic detection of exudates for DR. The algorithm combines the advantages of supervised and unsupervised techniques. It uses fuzzy C-means (FCM) segmentation at the coarse level and the Mahalanobis metric for finer classification of segmented pixels. The Mahalanobis criterion gives significance to the most relevant features and thus proves to be a better classifier. The results are validated using the DIARETDB0 and DIARETDB1 databases and the ground truth provided with them. This evaluation provided 95.77% detection accuracy.
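The Mahalanobis metric measures the distance of a feature vector from a class mean while accounting for the spread of each feature. The sketch below is a minimal illustration restricted to two features so that the covariance matrix can be inverted in closed form; the function name, the covariance values and the sample points are assumptions, not the paper's features.

```python
import math

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance for 2-D features with a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(q)

class_mean = (0.0, 0.0)
class_cov = ((4.0, 0.0), (0.0, 1.0))   # first feature varies more than the second

d_first = mahalanobis_2d((2.0, 0.0), class_mean, class_cov)
d_second = mahalanobis_2d((0.0, 2.0), class_mean, class_cov)
```

Both points lie at the same Euclidean distance from the mean, but the offset along the high-variance feature yields the smaller Mahalanobis distance, which is how the metric weights features by their spread.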


2013 ◽  
Vol 694-697 ◽  
pp. 2336-2340
Author(s):  
Yun Feng Yang ◽  
Feng Xian Tang

In order to construct a standard structured MRI (magnetic resonance imaging) image library by extracting and collating unstructured literature data, an identification method based on the fusion of image and text information is proposed. The method uses PHOW (Pyramid Histogram Of Words) to represent image features, combines them with the word-frequency characteristics of the embedded figure note (text), and then uses a posterior multiplication fusion method to classify and identify MRI images in online biological literature. The experimental results show that this method achieves a higher correct recognition rate and better recognition performance than identification based on PHOW or text features alone. The study can serve as a reference for constructing other structured professional databases from online literature.
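Posterior multiplication fusion is commonly implemented by multiplying the class posteriors produced by each modality and renormalizing. The sketch below shows that generic product rule; whether the paper applies exactly this form is an assumption, and the labels and probabilities are made up.

```python
def product_fusion(p_image, p_text):
    """Multiply per-class posteriors from two classifiers and renormalize."""
    joint = {label: p_image[label] * p_text[label] for label in p_image}
    total = sum(joint.values())
    return {label: value / total for label, value in joint.items()}

# Hypothetical posteriors from an image classifier and a text classifier.
p_image = {"mri": 0.6, "other": 0.4}
p_text = {"mri": 0.7, "other": 0.3}
fused = product_fusion(p_image, p_text)
```

When both modalities lean towards the same class, the fused posterior is more confident than either input, which is the intended effect of combining the two sources.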


2021 ◽  
Author(s):  
Yeremi Pérez ◽  
Roberto Borboa-Gastelum ◽  
Luz Maria Alonso-Valerdi ◽  
David I. Ibarra-Zarate ◽  
Eduardo A. Flores-Villalba ◽  
...  

Abstract Fatigue decreases performance in several professional activities. Fatigue can lead to technical mistakes whose consequences might be lethal, such as in the health area, where a surgical error due to lack of rest can cause the death of a patient. Therefore, this study aims to detect vigil and fatigue (due to lack of sleep) states in medical students through the classification of electroencephalographic (EEG) patterns. The EEG signals of 18 medical students were analyzed within the theta band (4 - 8 Hz) over front-central recording sites, and alpha band (8 - 13 Hz) rhythms over temporal and parieto-occipital recording sites, during the execution of laparoscopic tasks before and after their medical duties. The EEG signal processing pipeline consisted of pre-processing based on individual component analysis, absolute band power estimates, and Support Vector Machine classification. The F-score for distinguishing between vigil and fatigue states was 90.89%, where the first class was slightly more identifiable, reaching a sensitivity of 90.18%. Based on this outcome, the detection of fatigue in medical students during their laparoscopic training seems achievable and feasible, and could help diminish technical mistakes that could be lethal in the health area. For this purpose, EEG recordings are provided.
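An absolute band power estimate can be obtained by summing the squared magnitudes of the Fourier coefficients that fall inside a frequency band. The sketch below uses a naive discrete Fourier transform on a synthetic signal; the sampling rate, the scaling (power up to a constant factor), the half-open band edges and the function name `band_power` are assumptions for illustration, not the study's pipeline.

```python
import cmath
import math

def band_power(signal, fs, f_lo, f_hi):
    """Sum of |X_k|^2 / n^2 over DFT bins with f_lo <= frequency < f_hi."""
    n = len(signal)
    power = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq < f_hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(signal))
            power += abs(coeff) ** 2 / n ** 2
    return power

fs = 64  # assumed sampling rate in Hz; one second of data
t = [i / fs for i in range(fs)]
# Synthetic signal: a strong 6 Hz (theta) and a weaker 10 Hz (alpha) component.
signal = [2.0 * math.sin(2 * math.pi * 6 * s) + math.sin(2 * math.pi * 10 * s)
          for s in t]

theta = band_power(signal, fs, 4, 8)
alpha = band_power(signal, fs, 8, 13)
```

Values like `theta` and `alpha`, computed per recording site, are the kind of feature that could then be passed to a classifier such as a Support Vector Machine.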


2021 ◽  
Vol 881 ◽  
pp. 71-76
Author(s):  
Jian Yang ◽  
Hong Bin Li ◽  
Song Tao Ren ◽  
Peng Gang Jin ◽  
Zan Gao

To determine the influence of the spheroidization process on the hazard grade of ammonium dinitramide, the hazard division of ammonium dinitramide before and after spheroidization is studied using the standards "Hazard classification procedure for combustible and explosive substances and articles" (WJ20405) and "Hazard classification method and criterion for combustible and explosive substances and articles" (WJ20404). The results show that the spheroidization process can significantly improve the temperature stability of ammonium dinitramide and significantly reduce its friction sensitivity and impact sensitivity. The spheroidization process can therefore reduce the hazard of ammonium dinitramide and improve its safety.


Author(s):  
Arjon Turnip ◽  
Gilbert F. Y. Sihombing ◽  
Giraldo F. J. Sihombing ◽  
George Michael Tampubolon ◽  
Peri Turnip ◽  
...  