Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data

2020 ◽  
Vol 34 (04) ◽  
pp. 5315-5322
Author(s):  
Kyoung-Woon On ◽  
Eun-Sol Kim ◽  
Yu-Jung Heo ◽  
Byoung-Tak Zhang

Conventional sequential learning methods such as Recurrent Neural Networks (RNNs) focus on interactions between consecutive inputs, i.e., first-order Markovian dependency. However, most sequential data, such as videos, have complex dependency structures that imply variable-length semantic flows and their compositions, and these are hard for conventional methods to capture. Here, we propose Cut-Based Graph Learning Networks (CB-GLNs), which learn video data by discovering these complex structures. CB-GLNs represent video data as a graph whose nodes and edges correspond to the frames of the video and their dependencies, respectively. CB-GLNs find compositional dependencies of the data in multilevel graph forms via a parameterized kernel with graph-cut and a message passing framework. We evaluate the proposed method on two video understanding tasks: video theme classification (YouTube-8M dataset (Abu-El-Haija et al. 2016)) and video question answering (TVQA dataset (Lei et al. 2018)). The experimental results show that our model efficiently learns the semantic compositional structure of video data. Furthermore, our model achieves the highest performance among the compared baseline methods.
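The idea of treating frames as graph nodes and cutting the graph into segments can be illustrated with a minimal sketch. This is not the CB-GLN model: it assumes a fixed Gaussian kernel in place of the learned parameterized kernel, and it uses a standard spectral relaxation of the normalized cut for a single two-way split. The function names, the toy features, and the value of sigma are all hypothetical.

```python
import numpy as np

def frame_affinity(features, sigma=3.0):
    """Dense frame-to-frame affinity from a fixed Gaussian kernel.
    Assumption: the paper learns its kernel; this one is hand-set."""
    diffs = features[:, None, :] - features[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def spectral_bipartition(affinity):
    """Two-way split from the second-smallest eigenvector of the
    symmetric normalized Laplacian (relaxed normalized cut)."""
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    normalized = d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    laplacian = np.eye(len(affinity)) - normalized
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1] * d_inv_sqrt    # map back to the generalized eigenvector
    return fiedler > 0

# Toy "video": 4 frames from one scene followed by 4 frames from another.
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(0.0, 0.1, (4, 8)),
                    rng.normal(3.0, 0.1, (4, 8))])
labels = spectral_bipartition(frame_affinity(frames))
```

On this toy input the first four frames fall on one side of the cut and the last four on the other. Which side is labelled True is arbitrary, because the sign of an eigenvector is not fixed.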

Author(s):  
Jacek Grekow

The article presents experiments using recurrent neural networks for emotion detection in musical segments. Trained regression models were used to predict continuous emotion values on the axes of Russell's circumplex model. The article describes the process of extracting audio features and creating sequential data for training networks with long short-term memory (LSTM) units. Models were implemented using the WekaDeeplearning4j package, and a number of experiments were carried out on data with different feature sets and varying segmentation. The experiments demonstrate the usefulness of dividing the data into sequences and of using recurrent networks to recognize emotions in music; the recurrent models even outperformed the SVM algorithm for regression. The author analyzed the effect of the network structure and of the feature set on the results of regressors predicting values on two axes of the emotion model: arousal and valence. Finally, the use of a pretrained model for processing audio features and training a recurrent network with new sequences of features is presented.


2016 ◽  
Vol 25 (05) ◽  
pp. 1640001 ◽  
Author(s):  
Sotirios Chatzis ◽  
Dimitrios Kosmopoulos ◽  
George Papadourakis

Hidden Markov models (HMMs) are a popular approach for modeling sequential data, typically based on the assumption of a first-order Markov chain. In other words, only one-step-back dependencies are modeled, which is a rather unrealistic assumption in most applications. In this paper, we propose a method for postulating HMMs with approximately infinitely long time dependencies. Our approach considers the whole history of model states in the postulated dependencies by making use of a recently proposed nonparametric Bayesian method for modeling label sequences with infinitely long time dependencies, namely the sequence memoizer. By employing a mean-field-like approximation, we derive training and inference algorithms for our model with computational costs identical to those of simple first-order HMMs, despite the infinitely long time dependencies the model entails. The efficacy of our proposed model is experimentally demonstrated.


2016 ◽  
Vol 26 (03) ◽  
pp. 1650014 ◽  
Author(s):  
Markus Flatz ◽  
Marián Vajteršic

The goal of Nonnegative Matrix Factorization (NMF) is to represent a large nonnegative matrix in an approximate way as a product of two significantly smaller nonnegative matrices. This paper shows in detail how an NMF algorithm based on Newton iteration can be derived using the general Karush-Kuhn-Tucker (KKT) conditions for first-order optimality. This algorithm is suited for parallel execution on systems with shared memory and also with message passing. Both versions were implemented and tested, delivering satisfactory speedup results.
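To make the factorization problem concrete, here is a minimal sketch of NMF. It does not implement the paper's Newton iteration; it substitutes the widely used multiplicative update rules of Lee and Seung for the Frobenius-norm objective, purely for illustration. The function name, rank, iteration count, and test matrix are assumptions.

```python
import numpy as np

def nmf_multiplicative(A, rank, iters=200, seed=0, eps=1e-12):
    """Approximate a nonnegative matrix A (m x n) as W @ H, with
    W (m x rank) and H (rank x n) kept nonnegative throughout.
    Uses multiplicative updates, not the paper's Newton scheme."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, rank)) + 0.1
    H = rng.random((rank, n)) + 0.1
    for _ in range(iters):
        # Each update rescales entries by a nonnegative ratio, so signs never flip.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def relative_error(A, W, H):
    return np.linalg.norm(A - W @ H) / np.linalg.norm(A)

# Toy input: a nonnegative matrix that has an exact rank-3 factorization.
rng = np.random.default_rng(1)
A = rng.random((20, 3)) @ rng.random((3, 15))

W1, H1 = nmf_multiplicative(A, rank=3, iters=1)
W, H = nmf_multiplicative(A, rank=3, iters=200)
err_first, err_final = relative_error(A, W1, H1), relative_error(A, W, H)
```

The two factors hold 20*3 + 3*15 = 105 numbers instead of the 300 in A, which is the sense in which the factors are "significantly smaller". Comparing err_first with err_final shows the approximation error not increasing as the updates proceed.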


2020 ◽  
Vol 12 (24) ◽  
pp. 4025
Author(s):  
Rongshu Tao ◽  
Yuming Xiang ◽  
Hongjian You

As an essential step in 3D reconstruction, stereo matching still faces significant problems due to the high resolution and complex structures of remote sensing images. Precise disparity estimation is especially difficult, yet important, in areas occluded by tall buildings and in textureless areas such as water and woods. In this paper, we develop a novel edge-sense bidirectional pyramid stereo matching network to solve these problems. The cost volume is constructed from negative to positive disparities, since the disparity range in remote sensing images varies greatly and traditional deep learning networks only work well for positive disparities. Then, occlusion-aware maps based on the forward-backward consistency assumption are applied to reduce the influence of occluded areas. Moreover, we design an edge-sense smoothness loss to improve performance in textureless areas while maintaining the main structure. The proposed network is compared with two baselines, DenseMapNet and PSMNet. The experimental results show that our method outperforms both in terms of averaged endpoint error (EPE) and the fraction of erroneous pixels (D1), and that the improvements in occluded and textureless areas are significant.
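The two evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration under assumptions: EPE is taken as the mean absolute disparity error over valid pixels, and D1 as the fraction of valid pixels whose error exceeds a threshold. The 3-pixel threshold is a common convention and is assumed here, since the abstract does not state one. The function name and toy arrays are hypothetical.

```python
import numpy as np

def disparity_metrics(pred, truth, valid=None, threshold=3.0):
    """Return (EPE, D1) for a predicted disparity map.
    threshold=3.0 pixels is an assumed convention, not taken from the paper."""
    if valid is None:
        valid = np.isfinite(truth)      # treat NaN ground truth as "no label"
    err = np.abs(pred - truth)[valid]
    epe = float(err.mean())             # averaged endpoint error
    d1 = float((err > threshold).mean())  # fraction of erroneous pixels
    return epe, d1

# Toy 2x2 maps containing both negative and positive disparities.
truth = np.array([[-4.0, -1.0], [2.0, 10.0]])
pred = np.array([[-3.5, -1.0], [2.5, 5.0]])
epe, d1 = disparity_metrics(pred, truth)
# Absolute errors are 0.5, 0.0, 0.5, 5.0, so EPE = 1.5 and D1 = 0.25.
```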


2020 ◽  
Vol 7 (4) ◽  
pp. 745
Author(s):  
Rizka Indah Armianti ◽  
Achmad Fanany Onnilita Gaffar ◽  
Arief Bramanto Wicaksono Putra

An object is considered to be moving if its position changes from frame to frame. Movement causes the object's pattern to differ in each frame. The frame with the best pattern among all frames is called the dominant frame. This study aims to select the dominant frame from a sequence of frames by applying K-means clustering to obtain the dominant centroid (the centroid with the highest value), which serves as the basis for dominant frame selection. The selection has four main stages: data acquisition, determination of the object pattern, feature extraction, and selection. The data are video data, from which the object pattern is determined using digital image processing operations. The result is an RGB object pattern, from which NTSC-based features are extracted using a first-order statistic, the mean. Feature extraction yields 93 frames of data, which are grouped into 3 clusters using K-means. In the clustering result, the dominant centroid lies in cluster 3, with a centroid value of 0.0177 and 41 member frames. The distance of every member of cluster 3 to the centroid is then measured, and the frame closest to the centroid is the dominant frame. The selection result is expressed as the distance between the centroid and the cluster members: among the 41 frames, the three best distances are 0.0008, 0.0010, and 0.0010, belonging to the 59th, 36th, and 35th frames.
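The selection step described above (cluster the per-frame feature values, take the cluster with the highest centroid, then pick the member closest to that centroid) can be sketched as follows. This is a minimal illustration, assuming one scalar feature per frame and a plain Lloyd-style K-means with quantile initialization. The feature values are synthetic and the function names are hypothetical; they are not the study's data or code.

```python
import numpy as np

def kmeans_1d(values, k=3, iters=100):
    """Plain K-means on scalar features with deterministic quantile init."""
    centroids = np.quantile(values, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        labels = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        updated = np.array([
            values[labels == j].mean() if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Recompute labels so they match the final centroids.
    labels = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
    return labels, centroids

def dominant_frame(values, k=3):
    """Index of the frame nearest to the highest-valued centroid."""
    labels, centroids = kmeans_1d(values, k)
    top = int(centroids.argmax())
    members = np.flatnonzero(labels == top)
    distances = np.abs(values[members] - centroids[top])
    return int(members[distances.argmin()])

# Synthetic per-frame mean features (made up for illustration only).
features = np.array([0.001, 0.002, 0.0015, 0.008, 0.009,
                     0.0085, 0.017, 0.018, 0.0177, 0.0172])
best = dominant_frame(features)
```

With these synthetic values the highest centroid is the mean of the last four frames, and the frame at index 8 lies closest to it.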


Author(s):  
Stefan Vlaski ◽  
Hermina P. Maretic ◽  
Roula Nassif ◽  
Pascal Frossard ◽  
Ali H. Sayed

2008 ◽  
Vol 05 (03) ◽  
pp. 363-373
Author(s):  
M. KACHKACHI

It was shown in [1], only for scalar conformal fields, that the Moyal–Weyl star product can introduce the quantum effect as a phase factor on the ordinary product. In this paper we show that, even on the same complex structure, the Moyal–Weyl star product of two j-differentials (conformal fields of weights (j, 0)) does not vanish but generates the quantum effect at the first order of its perturbative series. More generally, we obtain the explicit expression of the Moyal–Weyl star product of j-differentials defined on any complex structure of a bi-dimensional Riemann surface Σ. We show that the star product of two j-differentials is not a j-differential and does not preserve the conformal covariance character. This can shed some light on the connection between the Moyal–Weyl deformation quantization procedure and the deformation of complex structures on a Riemann surface. Hence, the situation might relate the star products to the Moduli and Teichmüller spaces of Riemann surfaces.


Author(s):  
CLIFFORD B. MILLER ◽  
C. LEE GILES

There has been much interest in increasing the computational power of neural networks. In addition, there has been much interest in "designing" neural networks better suited to particular problems. Increasing the "order" of the connectivity of a neural network permits both. Though order has played a significant role in feedforward neural networks, its role in dynamically driven recurrent networks is still being understood. This work explores the effect of order in learning grammars. We present an experimental comparison of first order and second order recurrent neural networks, as applied to the task of grammatical inference. We show that for the small grammars studied these two neural net architectures have comparable learning and generalization power, and that both are reasonably capable of extracting the correct finite state automata for the language in question. However, for a larger randomly-generated ten-state grammar, second order networks significantly outperformed the first order networks, both in convergence time and generalization capability. We show that these networks learn faster the more neurons they have (our experiments used up to 10 hidden neurons), but that the solutions found by smaller networks are usually of better quality (in terms of generalization performance after training). Second order nets have the advantage that they converge more quickly to a solution and can find it more reliably than first order nets, but the second order solutions tend to be of poorer quality than those of the first order if both architectures are trained to the same error tolerance. Despite this, second order nets can more successfully extract finite state machines using heuristic clustering techniques applied to the internal state representations. We speculate that this may be due to restrictions on the ability of the first order architecture to make full use of its internal state representation power, and that this may have implications for the performance of the two architectures when scaled up to larger problems.
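The difference between the two architectures can be made concrete with a minimal sketch of a single state update. It assumes the conventional formulations: a first order network adds separate weighted contributions from the previous state and the current input, while a second order network weights every product of a state unit and an input unit. The abstract does not give the equations, so the exact forms, the sigmoid nonlinearity, the sizes, and the random untrained weights below are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def first_order_step(state, symbol, W_state, W_input):
    """s_i(t+1) = g( sum_j W_state[i,j] s_j(t) + sum_k W_input[i,k] x_k(t) )"""
    return sigmoid(W_state @ state + W_input @ symbol)

def second_order_step(state, symbol, W):
    """s_i(t+1) = g( sum_{j,k} W[i,j,k] s_j(t) x_k(t) )"""
    return sigmoid(np.einsum("ijk,j,k->i", W, state, symbol))

# Toy setup: 4 state units, binary alphabet encoded one-hot.
rng = np.random.default_rng(0)
W2 = rng.normal(size=(4, 4, 2))      # second order weights (untrained)
W_state = rng.normal(size=(4, 4))    # first order recurrent weights (untrained)
W_input = rng.normal(size=(4, 2))    # first order input weights (untrained)

state = np.full(4, 0.5)
symbol = np.array([0.0, 1.0])        # one-hot encoding of the symbol "1"

next_first = first_order_step(state, symbol, W_state, W_input)
next_second = second_order_step(state, symbol, W2)
```

With a one-hot input, the second order update reduces to applying a separate state transition matrix W[:, :, k] for each symbol k, which mirrors how a finite state automaton maps a (state, symbol) pair to a next state.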

