Next frame prediction using ConvLSTM

2022 ◽  
Vol 2161 (1) ◽  
pp. 012024
Author(s):  
Padmashree Desai ◽  
C Sujatha ◽  
Saumyajit Chakraborty ◽  
Saurav Ansuman ◽  
Sanika Bhandari ◽  
...  

Abstract Intelligent decision-making systems require the ability to forecast, foresee, and reason about future events. The problem of video frame prediction has attracted considerable attention because of its usefulness in many computer vision applications, such as autonomous vehicles and robots. Recent deep learning advances have significantly improved video prediction performance. Nevertheless, as top-performing systems attempt to foresee ever more future frames, their predictions become increasingly blurry. We developed a method for predicting a future frame from a series of prior frames using the Convolutional Long Short-Term Memory (ConvLSTM) model. The input video is segmented into frames, which are fed to the ConvLSTM model to extract features and forecast a future frame that can be beneficial in a variety of applications. We used two metrics to measure the quality of the predicted frame: the structural similarity index (SSIM) and the perceptual distance, which help quantify the difference between the actual frame and the predicted frame. The UCF101 dataset, a collection of realistic action videos taken from YouTube with 101 action categories for action detection, is used for training and testing. The ConvLSTM model is trained and tested on 24 categories from this dataset, and the predicted future frames yield satisfactory results. We obtained an SSIM of 0.95 and a perceptual distance of 24.28 for our system. The proposed method is also compared with state-of-the-art approaches and is shown to be superior.
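The SSIM score reported above can be computed directly from pixel statistics. Below is a minimal, single-window sketch of the SSIM formula (the stabilising constants follow the standard formulation; real evaluations typically average SSIM over local sliding windows rather than the whole frame):

```python
def ssim(x, y, data_range=1.0):
    """Global (single-window) structural similarity between two
    equal-sized grayscale frames given as flat lists of floats."""
    n = len(x)
    c1 = (0.01 * data_range) ** 2  # stabilising constants from the
    c2 = (0.03 * data_range) ** 2  # standard SSIM formulation
    mx = sum(x) / n                # means
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)          # variances
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))

frame = [0.1, 0.5, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6]
print(round(ssim(frame, frame), 4))  # identical frames -> 1.0
```

An SSIM of 1.0 means the predicted and actual frames are structurally identical; scores drop toward zero as the prediction blurs.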

2019 ◽  
Vol 28 (1) ◽  
pp. 25-34
Author(s):  
Grzegorz Wieczorek ◽  
Izabella Antoniuk ◽  
Michał Kruk ◽  
Jarosław Kurek ◽  
Arkadiusz Orłowski ◽  
...  

In this paper we present a new segmentation method for the boost area that remains after a tumour is removed using BCT (breast-conserving therapy). The selected area is the region on which radiation treatment will later be delivered. Consequently, an inaccurate designation of this region can result in treatment missing its target or irradiating healthy breast tissue that could otherwise be spared. Exact indication of the boost area is therefore an extremely important aspect of the entire medical procedure: a better definition can optimize coverage of the target volume and, as a result, spare normal breast tissue. Precise definition of this area has the potential both to improve local control of the disease and to ensure a better cosmetic outcome for the patient. In our approach we use a U-Net implemented with Keras and TensorFlow to tailor a precise solution for indicating the boost area. During training we used a set of CT images, each accompanied by a contour assigned by an expert, and aimed for segmentation results as close to the given contours as possible. Because the initial dataset was rather small, we used data augmentation techniques to increase the number of training examples, while the final outcomes were evaluated according to their similarity to those produced by the experts, by calculating the mean squared error and the structural similarity index (SSIM).
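For segmentation, augmentation must transform the CT slice and its expert contour mask identically, or the labels become misaligned. A minimal sketch of this idea, using simple geometric transforms as a hypothetical example (the paper does not specify which augmentations were applied):

```python
def hflip(grid):
    """Mirror a 2-D grid left-right."""
    return [row[::-1] for row in grid]

def vflip(grid):
    """Mirror a 2-D grid top-bottom."""
    return grid[::-1]

def rot90(grid):
    """Rotate a 2-D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def augment(image, mask):
    """Return geometric variants of an image together with its contour
    mask, applying the same transform to both so they stay aligned."""
    pairs = [(image, mask)]
    for op in (hflip, vflip, rot90):
        pairs.append((op(image), op(mask)))
    return pairs

image = [[1, 2], [3, 4]]
mask = [[0, 1], [1, 0]]
print(len(augment(image, mask)))  # 4 aligned image/mask pairs
```

In practice a framework utility (e.g. a Keras data generator) would do this on the fly during training; the point is only that image and mask share every transform.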


2021 ◽  
Vol 11 (8) ◽  
pp. 3460
Author(s):  
Hang Nguyen Thi Phuong ◽  
Choonsung Shin ◽  
Hieyong Jeong

Taste function and condition may serve as a tool for impressing upon a subject an objectively measured, rapidly appearing effect of smoking on his or her own body, because smokers exhibit significantly lower taste sensitivity than non-smokers. This study proposed a visual method for measuring the capillaries of taste buds with capillaroscopy and classified the difference between smokers and non-smokers using convolutional neural networks (CNNs). The dataset was collected directly from 26 human subjects through capillaroscopy at low and high magnification; 13 subjects were smokers and the other 13 were non-smokers. The acquired dataset consisted of 2600 images. The results of gradient-weighted class activation mapping (Grad-CAM) enabled us to understand the difference in the capillaries of taste buds between smokers and non-smokers. The CNNs achieved a good performance of 79% accuracy. By contrast, conventional methods such as the structural similarity index (SSIM) and the scale-invariant feature transform (SIFT) extracted too few features to classify the two groups.


A novel optimal multi-level thresholding method for grayscale images is proposed, based on Fractional-order Darwinian Particle Swarm Optimization (FDPSO) and the Tsallis entropy function. The maximization of Tsallis entropy is chosen as the Objective Function (OF), which guides FDPSO's exploration until the search converges to an optimal solution. The proposed method is tested on six standard test images and compared with heuristic methods such as the Bat Algorithm (BA) and the Firefly Algorithm (FA). The robustness of the proposed thresholding procedure was tested and validated on the considered image dataset with Poisson Noise (PN) and Gaussian Noise (GN). The results obtained in this study verify that FDPSO offers better image quality measures than the BA and FA algorithms. Wilcoxon's test was performed on the Mean Structural Similarity Index (MSSIM), and the results confirm the statistical significance of FDPSO with respect to BA and FA: the image segmentation remains clear even on the noisy datasets.
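The objective function that FDPSO maximises can be written down compactly. Below is a sketch of the Tsallis entropy and a bilevel (single-threshold) version of the pseudo-additive objective; the paper optimises multiple thresholds, and the entropic index q here is an illustrative choice, not the paper's value:

```python
def tsallis_entropy(p, q=0.8):
    """Tsallis entropy S_q = (1 - sum(p_i^q)) / (q - 1) of a
    normalised probability distribution."""
    return (1.0 - sum(pi ** q for pi in p)) / (q - 1.0)

def tsallis_objective(hist, t, q=0.8):
    """Bilevel Tsallis objective for threshold t: the pseudo-additive
    combination S_A + S_B + (1 - q) * S_A * S_B of the two class
    entropies, which the optimiser maximises over t."""
    total = float(sum(hist))
    pa = [h / total for h in hist[:t]]   # background bins
    pb = [h / total for h in hist[t:]]   # foreground bins
    wa, wb = sum(pa), sum(pb)
    if wa == 0 or wb == 0:
        return float("-inf")             # degenerate split
    sa = tsallis_entropy([p / wa for p in pa], q)
    sb = tsallis_entropy([p / wb for p in pb], q)
    return sa + sb + (1.0 - q) * sa * sb

hist = [5, 9, 2, 0, 1, 8, 7, 3]  # toy 8-bin grey-level histogram
best_t = max(range(1, 8), key=lambda t: tsallis_objective(hist, t))
print(best_t)
```

In the paper this exhaustive search over t is replaced by FDPSO, which matters once several thresholds make brute force infeasible.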


2013 ◽  
Vol 380-384 ◽  
pp. 3982-3985
Author(s):  
Gang Li ◽  
Ainiwaer Aizimaiti ◽  
Yan Liu

Video quality evaluation methods have been widely studied because of an increasing need in a variety of video processing applications, such as compression, analysis, communication, enhancement, and restoration. Quaternion models are also widely used to measure image or video quality. In this paper, we propose a new quaternion model that mainly describes the contour features, surface features, and temporal information of the video. Because each part of the quaternion uses a different metric, we use structure similarity comparison to normalize the four quaternion parts separately. Structure similarity comparison is also used to measure the difference between reference videos and distorted videos. Experimental results show that the new method correlates well with perceived video quality when tested on the Video Quality Experts Group (VQEG) Phase I FR-TV test dataset.
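The per-part normalization can be sketched with the SSIM structure comparison term applied to each quaternion component independently. The equal averaging of the four parts below is a hypothetical simplification; the paper's actual combination rule is not given in the abstract:

```python
import math

def structure_cmp(x, y, c=1e-4):
    """SSIM structure comparison s = (cov_xy + c) / (sd_x * sd_y + c)
    between two equal-length feature vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return (cov + c) / (sx * sy + c)

def quaternion_similarity(ref_parts, dist_parts):
    """Compare reference and distorted frames part by part and average
    (hypothetical equal weighting over the four quaternion parts)."""
    scores = [structure_cmp(r, d) for r, d in zip(ref_parts, dist_parts)]
    return sum(scores) / len(scores)

# four parts per frame, e.g. luminance, contour, surface, temporal
ref = [[0.1, 0.4, 0.8], [0.2, 0.5, 0.9], [0.3, 0.1, 0.7], [0.6, 0.2, 0.4]]
print(round(quaternion_similarity(ref, ref), 4))  # self-similarity -> 1.0
```

Normalizing each part with the same bounded comparison makes the four heterogeneous metrics commensurable before they are combined.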


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 371
Author(s):  
Shiwu Li ◽  
Mengyuan Huang ◽  
Mengzhu Guo ◽  
Miao Yu

Speed judgment is a vital component of autonomous driving perception systems. Human drivers can evaluate their speed on the basis of driving experience; driverless automobiles, however, cannot autonomously evaluate the suitability of their speed from external environmental factors such as the surrounding conditions and traffic flow. To solve this problem, this study introduced the parameter of overtaking frequency (OTF), based on the state of the traffic flow in the lanes on both sides, to reflect the difference between the speed of a driverless automobile and that of the surrounding traffic. In addition, a speed evaluation algorithm based on the long short-term memory (LSTM) model was proposed. To train the LSTM model, we extracted OTF as the first observation variable, while the characteristic parameters of the vehicle's longitudinal motion and the comparison parameters with the leading vehicle were used as the second set of observation variables. The algorithm judges velocity using a hierarchical method. We conducted a road test using real vehicles and verified the algorithm on the collected data, which showed that the accuracy rate of the model is 93%. The results indicate that introducing OTF as one of the observed variables supports the accuracy of the speed-judgment algorithm.
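The abstract defines OTF only informally, as a reflection of how often surrounding traffic passes the ego vehicle. A plausible sketch, counting overtake events in a sliding time window (the window length, the per-minute normalisation, and the event representation are all assumptions):

```python
def overtaking_frequency(overtake_times, t_now, window_s=60.0):
    """Number of times vehicles in the adjacent lanes overtook the ego
    vehicle during the last window_s seconds, normalised to events per
    minute. Hypothetical formulation: the paper's exact definition of
    OTF is not given in the abstract."""
    recent = [t for t in overtake_times if t_now - window_s < t <= t_now]
    return len(recent) * 60.0 / window_s

# timestamps (s) at which surrounding vehicles overtook the ego vehicle
times = [2.0, 15.5, 41.0, 70.0, 118.0]
print(overtaking_frequency(times, t_now=120.0))  # 2 events in last 60 s -> 2.0
```

A high OTF suggests the ego vehicle is travelling slower than the surrounding flow; fed into the LSTM alongside the longitudinal-motion features, it grounds the speed judgment in the traffic context.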


2020 ◽  
Vol 2 (2) ◽  
pp. 78-98 ◽  
Author(s):  
Sandra Aigner ◽  
Marco Körner

This paper analyzes in detail how different loss functions influence the generalization abilities of a deep learning-based next frame prediction model for traffic scenes. Our prediction model is a convolutional long-short term memory (ConvLSTM) network that generates the pixel values of the next frame after having observed the raw pixel values of a sequence of four past frames. We trained the model with 21 combinations of seven loss terms using the Cityscapes Sequences dataset and an identical hyper-parameter setting. The loss terms range from pixel-error based terms to adversarial terms. To assess the generalization abilities of the resulting models, we generated predictions up to 20 time-steps into the future for four datasets of increasing visual distance to the training dataset—KITTI Tracking, BDD100K, UA-DETRAC, and KIT AIS Vehicles. All predicted frames were evaluated quantitatively with both traditional pixel-based evaluation metrics, that is, mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM), and recent, more advanced, feature-based evaluation metrics, that is, Fréchet inception distance (FID), and learned perceptual image patch similarity (LPIPS). The results show that solely by choosing a different combination of losses, we can boost the prediction performance on new datasets by up to 55%, and by up to 50% for long-term predictions.
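Combining loss terms as studied above amounts to a weighted sum over per-term values. A minimal sketch with two illustrative pixel-error terms (the paper combines seven terms, including adversarial ones, which this stand-in does not cover):

```python
def combined_loss(pred, target, weights):
    """Weighted combination of loss terms for next-frame prediction.
    The two terms here (MSE and MAE over flattened pixels) are
    illustrative stand-ins for the paper's seven loss terms."""
    n = len(pred)
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    mae = sum(abs(p - t) for p, t in zip(pred, target)) / n
    terms = {"mse": mse, "mae": mae}
    # terms with no weight given simply contribute nothing
    return sum(weights.get(name, 0.0) * value for name, value in terms.items())

pred, target = [0.2, 0.8, 0.5], [0.0, 1.0, 0.5]
print(round(combined_loss(pred, target, {"mse": 1.0, "mae": 0.5}), 4))  # -> 0.0933
```

The paper's central finding is that which weights are non-zero, not the architecture, drives up to a 55% difference in generalization; the weight dictionary is the entire "hyper-parameter" being swept.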


Frame interpolation is one of the main stages in video processing. Video coding standards skip some in-between frames for efficient compression and coding; at the decoder, the common approach is to reconstruct the skipped frames using Motion Compensated Frame Interpolation (MCFI) methods. The computational complexity of MCFI is very high, as the block matching algorithm, motion vector (MV) calculation, motion estimation (ME), and the prediction logic for objects moving between frames all add to the cost. A more feasible approach with minimal computational complexity, using block-level correlation, is proposed in this paper. Erroneous MVs at the decoder result in holes, occlusions, blurring, and edge deformations in the interpolated frame; this proposal minimizes the aforementioned effects along with the complexity. The results are simulated in terms of peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM).
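The cost the paper aims to avoid comes from full-search block matching. A textbook sketch of that baseline, minimising the sum of absolute differences (SAD) over a small search range, makes the complexity concrete (this is the conventional method, not the paper's low-complexity block-level-correlation approach):

```python
def block_sad(a, b, ax, ay, bx, by, bs):
    """Sum of absolute differences between the bs x bs block at
    (ax, ay) in frame a and the block at (bx, by) in frame b."""
    return sum(abs(a[ay + j][ax + i] - b[by + j][bx + i])
               for j in range(bs) for i in range(bs))

def best_motion_vector(prev, curr, bx, by, bs=2, search=2):
    """Full-search block matching: find the MV (dx, dy) minimising the
    SAD between the current block and a shifted block in the previous
    frame. Cost grows with (2*search + 1)^2 SAD evaluations per block."""
    h, w = len(prev), len(prev[0])
    best_cost, best_mv = float("inf"), (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = bx + dx, by + dy
            if 0 <= px <= w - bs and 0 <= py <= h - bs:
                cost = block_sad(curr, prev, bx, by, px, py, bs)
                if cost < best_cost:
                    best_cost, best_mv = cost, (dx, dy)
    return best_mv

prev = [[0] * 6 for _ in range(6)]
prev[1][1] = prev[1][2] = prev[2][1] = prev[2][2] = 9   # object at (1, 1)
curr = [[0] * 6 for _ in range(6)]
curr[1][3] = curr[1][4] = curr[2][3] = curr[2][4] = 9   # object moved 2 right
print(best_motion_vector(prev, curr, bx=3, by=1))  # -> (-2, 0)
```

Every block requires up to 25 SAD evaluations even for this tiny search range, which is precisely the expense a correlation-based shortcut tries to eliminate.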


Author(s):  
Azka Maqsood ◽  
Imran Touqir ◽  
Adil Masood Siddiqui ◽  
Maham Haider

Wavelet-based image processing techniques do not strictly follow conventional probabilistic models, which are unrealistic for real-world images. However, the key features of the joint probability distributions of wavelet coefficients are well captured by the HMT (Hidden Markov Tree) model. This paper presents an HMT-based technique consisting of wavelet multiresolution analysis to enhance results in image processing applications such as compression, classification, and denoising. The proposed technique is applied to coloured video sequences by running the algorithm on each video frame independently. A 2D (two-dimensional) DWT (Discrete Wavelet Transform) is used within the popular HMT model, trained in the framework of the Expectation-Maximization algorithm. Unlike existing wavelet-based denoising techniques, which assume the wavelet coefficients to be jointly Gaussian or independent, the proposed technique properly exploits the temporal dependencies of wavelet coefficients and their non-Gaussian behaviour. Denoised frames are obtained by inverse-transforming the processed wavelet coefficients. A detailed comparison of the proposed method with existing techniques, based on CPSNR (Coloured Peak Signal-to-Noise Ratio), PCC (Pearson's Correlation Coefficient), and MSSIM (Mean Structural Similarity Index), has been carried out. The proposed denoising method yields improved results in both quantitative and qualitative analysis for additive and multiplicative noise, and retains nearly all the structural content of a video frame.
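The 2D DWT underlying this pipeline is separable: a 1D transform over rows, then over columns. A minimal one-level sketch using the Haar wavelet (a stand-in for whichever wavelet family the paper actually uses) shows the decomposition into an LL average band plus detail bands, and its exact inverse:

```python
def haar1d(v):
    """One level of the 1-D Haar transform: pairwise averages
    followed by pairwise half-differences."""
    half = len(v) // 2
    avg = [(v[2 * i] + v[2 * i + 1]) / 2 for i in range(half)]
    dif = [(v[2 * i] - v[2 * i + 1]) / 2 for i in range(half)]
    return avg + dif

def ihaar1d(v):
    """Exact inverse of haar1d."""
    half = len(v) // 2
    out = []
    for a, d in zip(v[:half], v[half:]):
        out += [a + d, a - d]
    return out

def haar2d(img):
    """One level of a separable 2-D Haar DWT: transform every row,
    then every column of the result."""
    rows = [haar1d(r) for r in img]
    cols = [haar1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
coeffs = haar2d(img)
print(coeffs[0][0])  # LL coefficient: average of the top-left 2x2 block -> 3.5
```

Denoising then shrinks the detail coefficients (guided, in the paper, by the HMT state probabilities) before applying the inverse transform to each frame.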


Author(s):  
Jules S. Jaffe ◽  
Robert M. Glaeser

Although difference Fourier techniques are standard in X-ray crystallography, it is only very recently that electron crystallographers have been able to take advantage of this method. We have combined a high-resolution data set for frozen, glucose-embedded Purple Membrane (PM) with a data set collected from PM prepared in the frozen hydrated state in order to visualize any differences in structure due to the different methods of preparation. The increased contrast of protein-ice versus protein-glucose may prove to be an advantage of the frozen hydrated technique for visualizing those parts of bacteriorhodopsin that are embedded in glucose. In addition, surface groups of the protein may be disordered in glucose and ordered in the frozen state. The sensitivity of the difference Fourier technique to small changes in structure provides an ideal method for testing this hypothesis.

