Next frame prediction using ConvLSTM

2022 ◽  
Vol 2161 (1) ◽  
pp. 012024
Author(s):  
Padmashree Desai ◽  
C Sujatha ◽  
Saumyajit Chakraborty ◽  
Saurav Ansuman ◽  
Sanika Bhandari ◽  
...  

Abstract Intelligent decision-making systems require the ability to forecast, foresee, and reason about future events. The problem of video frame prediction has attracted considerable attention because of its usefulness in many computer vision applications, such as autonomous vehicles and robots. Recent deep learning advances have significantly improved video prediction performance. Nevertheless, as top-performing systems attempt to foresee ever more future frames, their predictions become increasingly blurry. We developed a method for predicting a future frame from a series of prior frames using the Convolutional Long Short-Term Memory (ConvLSTM) model. The input video is segmented into frames, which are fed to the ConvLSTM model to extract features and forecast a future frame that can be beneficial in a variety of applications. We used two metrics to measure the quality of the predicted frame: the structural similarity index (SSIM) and the perceptual distance, which help quantify the difference between the actual frame and the predicted frame. The UCF101 dataset, a collection of realistic action videos taken from YouTube with 101 action categories for action detection, is used for training and testing. The ConvLSTM model is trained and tested on 24 categories from this dataset, and the predicted future frames yield satisfactory results. We obtained an SSIM of 0.95 and a perceptual distance of 24.28 for our system. The proposed method is also compared with state-of-the-art approaches and is shown to be superior.
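The SSIM score reported above can be computed directly from pixel statistics. Below is a minimal, single-window sketch of the SSIM formula (the stabilising constants follow the standard formulation; real evaluations typically average SSIM over local sliding windows rather than the whole frame):

```python
def ssim(x, y, data_range=1.0):
    """Global (single-window) structural similarity between two
    equal-sized grayscale frames given as flat lists of floats."""
    n = len(x)
    c1 = (0.01 * data_range) ** 2  # stabilising constants from the
    c2 = (0.03 * data_range) ** 2  # standard SSIM formulation
    mx = sum(x) / n                # means
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)          # variances
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))

frame = [0.1, 0.5, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6]
print(round(ssim(frame, frame), 4))  # identical frames -> 1.0
```

An SSIM of 1.0 means the predicted and actual frames are structurally identical; scores drop toward zero as the prediction blurs.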

2019 ◽  
Vol 28 (1) ◽  
pp. 25-34
Author(s):  
Grzegorz Wieczorek ◽  
Izabella Antoniuk ◽  
Michał Kruk ◽  
Jarosław Kurek ◽  
Arkadiusz Orłowski ◽  
...  

In this paper we present a new segmentation method for the boost area that remains after a tumour is removed using BCT (breast-conserving therapy). The selected area is the region on which radiation treatment will later be delivered. Consequently, an inaccurate designation of this region can result in treatment missing its target or irradiating healthy breast tissue that could otherwise be spared. Exact indication of the boost area is therefore an extremely important aspect of the entire medical procedure: a better definition can optimize coverage of the target volume and, as a result, spare normal breast tissue. Precise definition of this area has the potential both to improve local control of the disease and to ensure a better cosmetic outcome for the patient. In our approach we use a U-Net implemented with Keras and TensorFlow to tailor a precise solution for indicating the boost area. During training we used a set of CT images, each accompanied by a contour assigned by an expert, and aimed for segmentation results as close to the given contours as possible. Because the initial dataset was rather small, we used data augmentation techniques to increase the number of training examples, while the final outcomes were evaluated according to their similarity to those produced by the experts, by calculating the mean squared error and the structural similarity index (SSIM).
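For segmentation, augmentation must transform the CT slice and its expert contour mask identically, or the labels become misaligned. A minimal sketch of this idea, using simple geometric transforms as a hypothetical example (the paper does not specify which augmentations were applied):

```python
def hflip(grid):
    """Mirror a 2-D grid left-right."""
    return [row[::-1] for row in grid]

def vflip(grid):
    """Mirror a 2-D grid top-bottom."""
    return grid[::-1]

def rot90(grid):
    """Rotate a 2-D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def augment(image, mask):
    """Return geometric variants of an image together with its contour
    mask, applying the same transform to both so they stay aligned."""
    pairs = [(image, mask)]
    for op in (hflip, vflip, rot90):
        pairs.append((op(image), op(mask)))
    return pairs

image = [[1, 2], [3, 4]]
mask = [[0, 1], [1, 0]]
print(len(augment(image, mask)))  # 4 aligned image/mask pairs
```

In practice a framework utility (e.g. a Keras data generator) would do this on the fly during training; the point is only that image and mask share every transform.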


2021 ◽  
Vol 11 (8) ◽  
pp. 3460
Author(s):  
Hang Nguyen Thi Phuong ◽  
Choonsung Shin ◽  
Hieyong Jeong

Taste function and condition may serve as a tool for impressing upon a subject an objectively measured, rapidly appearing effect of smoking on his or her own body, because smokers exhibit significantly lower taste sensitivity than non-smokers. This study proposed a visual method for measuring the capillaries of taste buds with capillaroscopy and classified the difference between smokers and non-smokers using convolutional neural networks (CNNs). The dataset was collected directly from 26 human subjects through capillaroscopy at low and high magnification; 13 subjects were smokers and the other 13 were non-smokers. The acquired dataset consisted of 2600 images. The results of gradient-weighted class activation mapping (Grad-CAM) enabled us to understand the difference in the capillaries of taste buds between smokers and non-smokers. The CNNs achieved a good performance of 79% accuracy. By contrast, conventional methods such as the structural similarity index (SSIM) and the scale-invariant feature transform (SIFT) extracted too few features to classify the two groups.


A novel optimal multi-level thresholding method for grayscale images is proposed, based on Fractional-order Darwinian Particle Swarm Optimization (FDPSO) and the Tsallis entropy function. The maximization of Tsallis entropy is chosen as the Objective Function (OF), which guides FDPSO's exploration until the search converges to an optimal solution. The proposed method is tested on six standard test images and compared with heuristic methods such as the Bat Algorithm (BA) and the Firefly Algorithm (FA). The robustness of the proposed thresholding procedure was tested and validated on the considered image dataset with Poisson Noise (PN) and Gaussian Noise (GN). The results obtained in this study verify that FDPSO offers better image quality measures than the BA and FA algorithms. Wilcoxon's test was performed on the Mean Structural Similarity Index (MSSIM), and the results confirm the statistical significance of FDPSO with respect to BA and FA: the image segmentation remains clear even on the noisy datasets.
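The objective function that FDPSO maximises can be written down compactly. Below is a sketch of the Tsallis entropy and a bilevel (single-threshold) version of the pseudo-additive objective; the paper optimises multiple thresholds, and the entropic index q here is an illustrative choice, not the paper's value:

```python
def tsallis_entropy(p, q=0.8):
    """Tsallis entropy S_q = (1 - sum(p_i^q)) / (q - 1) of a
    normalised probability distribution."""
    return (1.0 - sum(pi ** q for pi in p)) / (q - 1.0)

def tsallis_objective(hist, t, q=0.8):
    """Bilevel Tsallis objective for threshold t: the pseudo-additive
    combination S_A + S_B + (1 - q) * S_A * S_B of the two class
    entropies, which the optimiser maximises over t."""
    total = float(sum(hist))
    pa = [h / total for h in hist[:t]]   # background bins
    pb = [h / total for h in hist[t:]]   # foreground bins
    wa, wb = sum(pa), sum(pb)
    if wa == 0 or wb == 0:
        return float("-inf")             # degenerate split
    sa = tsallis_entropy([p / wa for p in pa], q)
    sb = tsallis_entropy([p / wb for p in pb], q)
    return sa + sb + (1.0 - q) * sa * sb

hist = [5, 9, 2, 0, 1, 8, 7, 3]  # toy 8-bin grey-level histogram
best_t = max(range(1, 8), key=lambda t: tsallis_objective(hist, t))
print(best_t)
```

In the paper this exhaustive search over t is replaced by FDPSO, which matters once several thresholds make brute force infeasible.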


2013 ◽  
Vol 380-384 ◽  
pp. 3982-3985
Author(s):  
Gang Li ◽  
Ainiwaer Aizimaiti ◽  
Yan Liu

Video quality evaluation methods have been widely studied because of an increasing need in a variety of video processing applications, such as compression, analysis, communication, enhancement, and restoration. Quaternion models are also widely used to measure image or video quality. In this paper, we propose a new quaternion model that mainly describes the contour features, surface features, and temporal information of the video. Because each part of the quaternion uses a different metric, we use structure similarity comparison to normalize the four quaternion parts separately. Structure similarity comparison is also used to measure the difference between reference videos and distorted videos. Experimental results show that the new method correlates well with perceived video quality when tested on the Video Quality Experts Group (VQEG) Phase I FR-TV test dataset.
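The per-part normalization can be sketched with the SSIM structure comparison term applied to each quaternion component independently. The equal averaging of the four parts below is a hypothetical simplification; the paper's actual combination rule is not given in the abstract:

```python
import math

def structure_cmp(x, y, c=1e-4):
    """SSIM structure comparison s = (cov_xy + c) / (sd_x * sd_y + c)
    between two equal-length feature vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return (cov + c) / (sx * sy + c)

def quaternion_similarity(ref_parts, dist_parts):
    """Compare reference and distorted frames part by part and average
    (hypothetical equal weighting over the four quaternion parts)."""
    scores = [structure_cmp(r, d) for r, d in zip(ref_parts, dist_parts)]
    return sum(scores) / len(scores)

# four parts per frame, e.g. luminance, contour, surface, temporal
ref = [[0.1, 0.4, 0.8], [0.2, 0.5, 0.9], [0.3, 0.1, 0.7], [0.6, 0.2, 0.4]]
print(round(quaternion_similarity(ref, ref), 4))  # self-similarity -> 1.0
```

Normalizing each part with the same bounded comparison makes the four heterogeneous metrics commensurable before they are combined.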


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 371
Author(s):  
Shiwu Li ◽  
Mengyuan Huang ◽  
Mengzhu Guo ◽  
Miao Yu

Speed judgment is a vital component of autonomous driving perception systems. Human drivers can evaluate their speed on the basis of driving experience; driverless automobiles, however, cannot autonomously evaluate the suitability of their speed from external environmental factors such as the surrounding conditions and traffic flow. To solve this problem, this study introduced the parameter of overtaking frequency (OTF), based on the state of the traffic flow in the lanes on both sides, to reflect the difference between the speed of a driverless automobile and that of the surrounding traffic. In addition, a speed evaluation algorithm based on the long short-term memory (LSTM) model was proposed. To train the LSTM model, we extracted OTF as the first observation variable, while the characteristic parameters of the vehicle's longitudinal motion and the comparison parameters with the leading vehicle were used as the second set of observation variables. The algorithm judges velocity using a hierarchical method. We conducted a road test using real vehicles and verified the algorithm on the collected data, which showed that the accuracy rate of the model is 93%. The results indicate that introducing OTF as one of the observed variables supports the accuracy of the speed-judgment algorithm.
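The abstract defines OTF only informally, as a reflection of how often surrounding traffic passes the ego vehicle. A plausible sketch, counting overtake events in a sliding time window (the window length, the per-minute normalisation, and the event representation are all assumptions):

```python
def overtaking_frequency(overtake_times, t_now, window_s=60.0):
    """Number of times vehicles in the adjacent lanes overtook the ego
    vehicle during the last window_s seconds, normalised to events per
    minute. Hypothetical formulation: the paper's exact definition of
    OTF is not given in the abstract."""
    recent = [t for t in overtake_times if t_now - window_s < t <= t_now]
    return len(recent) * 60.0 / window_s

# timestamps (s) at which surrounding vehicles overtook the ego vehicle
times = [2.0, 15.5, 41.0, 70.0, 118.0]
print(overtaking_frequency(times, t_now=120.0))  # 2 events in last 60 s -> 2.0
```

A high OTF suggests the ego vehicle is travelling slower than the surrounding flow; fed into the LSTM alongside the longitudinal-motion features, it grounds the speed judgment in the traffic context.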


2020 ◽  
Vol 2 (2) ◽  
pp. 78-98 ◽  
Author(s):  
Sandra Aigner ◽  
Marco Körner

This paper analyzes in detail how different loss functions influence the generalization abilities of a deep learning-based next frame prediction model for traffic scenes. Our prediction model is a convolutional long-short term memory (ConvLSTM) network that generates the pixel values of the next frame after having observed the raw pixel values of a sequence of four past frames. We trained the model with 21 combinations of seven loss terms using the Cityscapes Sequences dataset and an identical hyper-parameter setting. The loss terms range from pixel-error based terms to adversarial terms. To assess the generalization abilities of the resulting models, we generated predictions up to 20 time-steps into the future for four datasets of increasing visual distance to the training dataset—KITTI Tracking, BDD100K, UA-DETRAC, and KIT AIS Vehicles. All predicted frames were evaluated quantitatively with both traditional pixel-based evaluation metrics, that is, mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM), and recent, more advanced, feature-based evaluation metrics, that is, Fréchet inception distance (FID), and learned perceptual image patch similarity (LPIPS). The results show that solely by choosing a different combination of losses, we can boost the prediction performance on new datasets by up to 55%, and by up to 50% for long-term predictions.
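Combining loss terms as studied above amounts to a weighted sum over per-term values. A minimal sketch with two illustrative pixel-error terms (the paper combines seven terms, including adversarial ones, which this stand-in does not cover):

```python
def combined_loss(pred, target, weights):
    """Weighted combination of loss terms for next-frame prediction.
    The two terms here (MSE and MAE over flattened pixels) are
    illustrative stand-ins for the paper's seven loss terms."""
    n = len(pred)
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    mae = sum(abs(p - t) for p, t in zip(pred, target)) / n
    terms = {"mse": mse, "mae": mae}
    # terms with no weight given simply contribute nothing
    return sum(weights.get(name, 0.0) * value for name, value in terms.items())

pred, target = [0.2, 0.8, 0.5], [0.0, 1.0, 0.5]
print(round(combined_loss(pred, target, {"mse": 1.0, "mae": 0.5}), 4))  # -> 0.0933
```

The paper's central finding is that which weights are non-zero, not the architecture, drives up to a 55% difference in generalization; the weight dictionary is the entire "hyper-parameter" being swept.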


Frame interpolation is one of the main stages in video processing. Video coding standards skip some in-between frames for efficient compression and coding; at the decoder, the common approach is to reconstruct the skipped frames using Motion Compensated Frame Interpolation (MCFI) methods. The computational complexity of MCFI is very high, as the block matching algorithm, motion vector (MV) calculation, motion estimation (ME), and the prediction logic for objects moving between frames all add to the cost. A more feasible approach with minimal computational complexity, using block-level correlation, is proposed in this paper. Erroneous MVs at the decoder result in holes, occlusions, blurring, and edge deformations in the interpolated frame; this proposal minimizes the aforementioned effects along with the complexity. The results are simulated in terms of peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM).
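The cost the paper aims to avoid comes from full-search block matching. A textbook sketch of that baseline, minimising the sum of absolute differences (SAD) over a small search range, makes the complexity concrete (this is the conventional method, not the paper's low-complexity block-level-correlation approach):

```python
def block_sad(a, b, ax, ay, bx, by, bs):
    """Sum of absolute differences between the bs x bs block at
    (ax, ay) in frame a and the block at (bx, by) in frame b."""
    return sum(abs(a[ay + j][ax + i] - b[by + j][bx + i])
               for j in range(bs) for i in range(bs))

def best_motion_vector(prev, curr, bx, by, bs=2, search=2):
    """Full-search block matching: find the MV (dx, dy) minimising the
    SAD between the current block and a shifted block in the previous
    frame. Cost grows with (2*search + 1)^2 SAD evaluations per block."""
    h, w = len(prev), len(prev[0])
    best_cost, best_mv = float("inf"), (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = bx + dx, by + dy
            if 0 <= px <= w - bs and 0 <= py <= h - bs:
                cost = block_sad(curr, prev, bx, by, px, py, bs)
                if cost < best_cost:
                    best_cost, best_mv = cost, (dx, dy)
    return best_mv

prev = [[0] * 6 for _ in range(6)]
prev[1][1] = prev[1][2] = prev[2][1] = prev[2][2] = 9   # object at (1, 1)
curr = [[0] * 6 for _ in range(6)]
curr[1][3] = curr[1][4] = curr[2][3] = curr[2][4] = 9   # object moved 2 right
print(best_motion_vector(prev, curr, bx=3, by=1))  # -> (-2, 0)
```

Every block requires up to 25 SAD evaluations even for this tiny search range, which is precisely the expense a correlation-based shortcut tries to eliminate.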


Author(s):  
Azka Maqsood ◽  
Imran Touqir ◽  
Adil Masood Siddiqui ◽  
Maham Haider

Wavelet-based image processing techniques do not strictly follow conventional probabilistic models, which are unrealistic for real-world images. However, the key features of the joint probability distributions of wavelet coefficients are well captured by the HMT (Hidden Markov Tree) model. This paper presents an HMT-based technique consisting of wavelet multiresolution analysis to enhance results in image processing applications such as compression, classification, and denoising. The proposed technique is applied to coloured video sequences by running the algorithm on each video frame independently. A 2D (two-dimensional) DWT (Discrete Wavelet Transform) is used within the popular HMT model, trained in the framework of the Expectation-Maximization algorithm. Unlike existing wavelet-based denoising techniques, which assume the wavelet coefficients to be jointly Gaussian or independent, the proposed technique properly exploits the temporal dependencies of wavelet coefficients and their non-Gaussian behaviour. Denoised frames are obtained by inverse-transforming the processed wavelet coefficients. A detailed comparison of the proposed method with existing techniques, based on CPSNR (Coloured Peak Signal-to-Noise Ratio), PCC (Pearson's Correlation Coefficient), and MSSIM (Mean Structural Similarity Index), has been carried out. The proposed denoising method yields improved results in both quantitative and qualitative analysis for additive and multiplicative noise, and retains nearly all the structural content of a video frame.
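The 2D DWT underlying this pipeline is separable: a 1D transform over rows, then over columns. A minimal one-level sketch using the Haar wavelet (a stand-in for whichever wavelet family the paper actually uses) shows the decomposition into an LL average band plus detail bands, and its exact inverse:

```python
def haar1d(v):
    """One level of the 1-D Haar transform: pairwise averages
    followed by pairwise half-differences."""
    half = len(v) // 2
    avg = [(v[2 * i] + v[2 * i + 1]) / 2 for i in range(half)]
    dif = [(v[2 * i] - v[2 * i + 1]) / 2 for i in range(half)]
    return avg + dif

def ihaar1d(v):
    """Exact inverse of haar1d."""
    half = len(v) // 2
    out = []
    for a, d in zip(v[:half], v[half:]):
        out += [a + d, a - d]
    return out

def haar2d(img):
    """One level of a separable 2-D Haar DWT: transform every row,
    then every column of the result."""
    rows = [haar1d(r) for r in img]
    cols = [haar1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
coeffs = haar2d(img)
print(coeffs[0][0])  # LL coefficient: average of the top-left 2x2 block -> 3.5
```

Denoising then shrinks the detail coefficients (guided, in the paper, by the HMT state probabilities) before applying the inverse transform to each frame.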


Author(s):  
Jules S. Jaffe ◽  
Robert M. Glaeser

Although difference Fourier techniques are standard in X-ray crystallography, it is only very recently that electron crystallographers have been able to take advantage of this method. We have combined a high-resolution data set for frozen, glucose-embedded Purple Membrane (PM) with a data set collected from PM prepared in the frozen hydrated state in order to visualize any differences in structure due to the different methods of preparation. The increased contrast of protein-ice versus protein-glucose may prove to be an advantage of the frozen hydrated technique for visualizing those parts of bacteriorhodopsin that are embedded in glucose. In addition, surface groups of the protein may be disordered in glucose and ordered in the frozen state. The sensitivity of the difference Fourier technique to small changes in structure provides an ideal method for testing this hypothesis.

