A MODEL AND TRAINING METHOD FOR CONTEXT CLASSIFICATION IN CCTV SEWER INSPECTION VIDEO FRAMES

Author(s):  
V. V. Moskalenko ◽  
M. O. Zaretsky ◽  
A. S. Moskalenko ◽  
A. O. Panych ◽  
V. V. Lysyuk

Context. A model and training method for observational context classification in CCTV sewer inspection video frames was developed and researched. The object of research is the process of detecting the temporal-spatial context during CCTV sewer inspections. The subjects of the research are a machine learning model and a training method for classification analysis of CCTV video sequences under the constraint of a limited and imbalanced training dataset. Objective. The stated research goal is to develop an efficient context classifier model and training algorithm for CCTV sewer inspection video frames under the constraint of a limited and imbalanced labeled training set. Methods. A four-stage training algorithm for the classifier is proposed. The first stage involves training with a soft triplet loss and a regularisation component which penalises the network’s binary output code rounding error. The next stage determines the binary code for each class according to the principles of error-correcting output codes, accounting for intra- and interclass relationships. The resulting reference vector for each class is then used as a sample label for further training with a joint binary cross-entropy loss. The last machine learning stage optimizes the decision rule parameters according to an information criterion to determine the boundaries of deviation of the binary representation of observations for each class from the corresponding reference vector. A 2D convolutional frame feature extractor combined with a temporal network for inter-frame dependency analysis is considered. Variants with a 1D dilated regular convolutional network, a 1D dilated causal convolutional network, an LSTM network, and a GRU network are compared. Model efficiency is compared on the basis of the micro-averaged F1 score calculated on the test dataset. Results.
Results obtained on the dataset provided by Ace Pipe Cleaning, Inc. confirm the suitability of the model and method for practical use; the resulting accuracy equals 92%. Comparison of the training outcome of the proposed method against conventional methods indicated a 4% advantage in micro-averaged F1 score. Further analysis of the confusion matrix showed that the most significant increase in accuracy over the conventional methods is achieved for complex classes which combine both camera orientation and sewer pipe construction features. Conclusions. The scientific novelty of the work lies in the new models and methods for classification analysis of the temporal-spatial context when automating CCTV sewer inspections under imbalanced and limited training dataset conditions. Training results obtained with the proposed method were compared with those obtained with the conventional method; the proposed method showed a 4% advantage in micro-averaged F1 score. It was empirically shown that the regular convolutional temporal network architecture is the most efficient at utilizing inter-frame dependencies. The resulting accuracy is suitable for practical use, as additional error correction can be performed using the odometer data.
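The first training stage described above combines a soft triplet loss with a penalty on the distance between the embedding and its rounded binary code. The sketch below is a hypothetical illustration of that idea, not the authors' implementation; the function name, the `reg` weight, and the exact soft formulation are assumptions.

```python
import numpy as np

def soft_triplet_loss_with_rounding(anchor, positive, negative, reg=0.1):
    """Soft triplet loss plus a regularizer that penalises the distance
    between the embedding and its rounded binary code (illustrative)."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    # soft (log-sum-exp) triplet term: pushes d_pos below d_neg smoothly
    triplet = np.log1p(np.exp(d_pos - d_neg))
    # rounding penalty: embeddings should sit near a {0, 1} binary code
    rounding = np.sum((anchor - np.round(anchor)) ** 2)
    return triplet + reg * rounding
```

An embedding that already sits on a binary code and close to its positive incurs a lower loss than one that is half-way between codes, which is the behaviour the regularisation component is meant to enforce.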

Author(s):  
В’ячеслав Васильович Москаленко ◽  
Микола Олександрович Зарецький ◽  
Артем Геннадійович Коробов ◽  
Ярослав Юрійович Ковальський ◽  
Артур Фанісович Шаєхов ◽  
...  

Models and training methods for water-level classification analysis on the footage of sewage pipe inspections have been developed and investigated. The object of the research is the process of water-level recognition, considering the spatial and temporal context, during the inspection of sewage pipes. The subject of the research is a model and machine learning method for water-level classification analysis on video sequences of pipe inspections under conditions of a limited-size and unbalanced set of training data. A four-stage algorithm for training the classifier is proposed. At the first stage, training occurs with a softmax triplet loss function and a regularizing component that penalizes the rounding error of the network output to a binary code. The next step is to define a binary code (reference vector) for each class according to the principles of error-correcting output codes, considering the intraclass and interclass relations. The computed reference vector of each class is used as the target label of the sample for further training using the joint cross-entropy loss function. The last stage of machine learning involves optimizing the parameters of the decision rules based on the information criterion to account for the boundaries of deviation of the binary representation of the observations of each class from the corresponding reference vectors. As a classifier model, a combination of a 2D convolutional feature extractor for each frame and a temporal network to analyze inter-frame dependencies is considered. Different variants of the temporal network are compared: a 1D regular convolutional network with dilated convolutions, a 1D causal convolutional network with dilated convolutions, a recurrent LSTM network, and a recurrent GRU network. The performance of the models is compared by the micro-averaged F1 metric computed on the test subset.
The results obtained on the dataset from Ace Pipe Cleaning (Kansas City, USA) confirm the suitability of the model and training method for practical use; the obtained F1 metric value is 0.88. The results of training by the proposed method were compared with the results obtained using the traditional method. It was shown that the proposed method provides a 9% increase in the micro-averaged F1 measure.
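Both of the abstracts above evaluate models with the micro-averaged F1 metric. A minimal sketch of how that pooling works (the function and the label values are illustrative, not taken from the papers):

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives and false negatives
    over all classes before computing precision, recall, and F1.
    For single-label multi-class data this equals overall accuracy."""
    classes = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for c in classes:
        for t, p in zip(y_true, y_pred):
            if p == c and t == c:
                tp += 1
            elif p == c and t != c:
                fp += 1
            elif p != c and t == c:
                fn += 1
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because counts are pooled before averaging, micro-F1 weights each sample equally, which is why it is a common choice for imbalanced class distributions such as the ones described here.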


Author(s):  
Xiaobin Zhu ◽  
Zhuangzi Li ◽  
Xiao-Yu Zhang ◽  
Changsheng Li ◽  
Yaqi Liu ◽  
...  

Video super-resolution is a challenging task which has attracted great attention in the research and industry communities. In this paper, we propose a novel end-to-end architecture, called the Residual Invertible Spatio-Temporal Network (RISTN), for video super-resolution. The RISTN sufficiently exploits spatial information in the mapping from low resolution to high resolution and effectively models the temporal consistency across consecutive video frames. Compared with existing recurrent convolutional network based approaches, RISTN is much deeper yet more efficient. It consists of three major components. In the spatial component, a lightweight residual invertible block is designed to reduce information loss during feature transformation and provide robust feature representations. In the temporal component, a novel recurrent convolutional model with residual dense connections is proposed to construct a deeper network and avoid feature degradation. In the reconstruction component, a new fusion method based on a sparse strategy is proposed to integrate the spatial and temporal features. Experiments on public benchmark datasets demonstrate that RISTN outperforms the state-of-the-art methods.
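The abstract does not specify the internals of the residual invertible block, but the "no information loss" property it claims can be illustrated with an additive coupling layer, the classic construction behind many invertible networks. This is a generic sketch of that idea, not the RISTN design:

```python
import numpy as np

def coupling_forward(x1, x2, f):
    # split the features in two halves; one half passes through unchanged,
    # the other is shifted by an arbitrary function of the unchanged half
    return x1, x2 + f(x1)

def coupling_inverse(y1, y2, f):
    # exact inverse: subtracting the same shift recovers the input,
    # so no information is lost regardless of what f computes
    return y1, y2 - f(y1)
```

Because the inverse is exact for any `f` (even a non-invertible one), such blocks can be stacked deeply without the feature degradation the paper is concerned with.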


2019 ◽  
Vol 8 (2) ◽  
pp. 435
Author(s):  
Lystia Nurhaliza Hasibuan ◽  
R. Triyanto ◽  
Raden Burhan ◽  
Mangatas Mangatas

This research was conducted at SMK Negeri 9 Medan as classroom action research. The subjects were the 36 students of class X DKV 1, consisting of 20 boys and 16 girls. The research aims to improve sketch learning outcomes through the demonstration and practice method. Preliminary observations by the researcher found that students' sketch learning outcomes were low. A pre-test was administered to determine the initial conditions before the cycles were carried out; it showed that learning completeness was still low, with only 7 of the 36 students (19.4%) passing the sketching material. In cycle I, 24 students (66.7%) passed and 12 students (33.3%) did not. The researcher therefore proceeded to cycle II, in which 35 students (97.2%) passed and 1 student (2.8%) did not. With an increase of 30.5% from cycle I to cycle II, the researcher did not proceed to a further cycle. It can thus be concluded that the demonstration and practice method can improve the sketch learning ability of class X DKV 1 students at SMK Negeri 9 Medan in the 2019/2020 academic year.
Keywords: learning outcomes, sketch, demonstration, practice.


Symmetry ◽  
2020 ◽  
Vol 13 (1) ◽  
pp. 38
Author(s):  
Dong Zhao ◽  
Baoqing Ding ◽  
Yulin Wu ◽  
Lei Chen ◽  
Hongchao Zhou

This paper proposes a method for discovering the primary objects in single images by learning from videos in a purely unsupervised manner—the learning process is based on videos, but the generated network is able to discover objects from a single input image. The rough idea is that an image typically consists of multiple object instances (like the foreground and background) that have spatial transformations across video frames and they can be sparsely represented. By exploring the sparsity representation of a video with a neural network, one may learn the features of each object instance without any labels, which can be used to discover, recognize, or distinguish object instances from a single image. In this paper, we consider a relatively simple scenario, where each image roughly consists of a foreground and a background. Our proposed method is based on encoder-decoder structures to sparsely represent the foreground, background, and segmentation mask, which further reconstruct the original images. We apply the feed-forward network trained from videos for object discovery in single images, which is different from the previous co-segmentation methods that require videos or collections of images as the input for inference. The experimental results on various object segmentation benchmarks demonstrate that the proposed method extracts primary objects accurately and robustly, which suggests that unsupervised image learning tasks can benefit from the sparsity of images and the inter-frame structure of videos.
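The sparsity idea underlying the method above can be illustrated with soft-thresholding, the proximal operator of the L1 penalty that is the standard way to induce sparse codes. This is a generic illustration of sparse representation, not the paper's encoder-decoder network:

```python
import numpy as np

def soft_threshold(z, lam):
    """Proximal operator of lam * ||z||_1: shrink every entry toward zero
    and zero out entries with magnitude below lam, yielding a sparse code."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
```

Entries below the threshold vanish entirely, so most coefficients of a represented image become exactly zero, which is the sparsity property the paper exploits to separate foreground from background.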


2020 ◽  
Vol 10 (6) ◽  
pp. 2104
Author(s):  
Michał Tomaszewski ◽  
Paweł Michalski ◽  
Jakub Osuchowski

This article presents an analysis of the effectiveness of object detection in digital images with a limited quantity of input data. The possibility of using a limited learning set was achieved by developing a detailed scenario of the task, which strictly defined the conditions of detector operation for the considered convolutional neural network case. The described solution utilizes known deep neural network architectures for learning and object detection. The article compares detection results from the most popular deep neural networks while maintaining a limited training set composed of a specific number of images selected from diagnostic video. The analyzed input material was recorded during an inspection flight conducted along high-voltage lines, and the object detector was built for a power insulator. The main contribution of the presented paper is the evidence that a limited training set (in our case, just 60 training frames) can be used for object detection, assuming an outdoor scenario with low variability of environmental conditions. Deciding which network will generate the best result for such a limited training set is not a trivial task. The conducted research suggests that deep neural networks achieve different levels of effectiveness depending on the amount of training data. The best results were obtained for two convolutional neural networks: the faster region-based convolutional neural network (Faster R-CNN) and the region-based fully convolutional network (R-FCN). Faster R-CNN reached the highest AP (average precision), at a level of 0.8 for 60 frames. The R-FCN model achieved a lower AP; however, its results are significantly less influenced by the number of input samples than those of the other CNN models, which, in the authors’ assessment, is a desirable feature for a limited training set.
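The AP (average precision) figure quoted above summarises the precision-recall behaviour of a detector. A minimal sketch of its computation from scored detections (the scores and labels below are hypothetical, and real detection benchmarks add IoU matching and interpolation on top of this):

```python
def average_precision(scores, labels):
    """AP as the mean of precision values at each true positive, after
    ranking detections by confidence. labels: 1 = correct detection."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(labels)
    tp = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            ap += tp / rank   # precision at this recall point
    return ap / total_pos
```

Ranking by confidence first is what makes AP sensitive to how well the detector orders its outputs, not just to the final hit count.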


1992 ◽  
Vol 03 (02) ◽  
pp. 157-165
Author(s):  
D. Saad ◽  
R. Sasson

Learning by Choice of Internal Representations (CHIR) is a training algorithm presented by Grossman et al. [1], based on modification of the internal representations (IR) alongside the direct weight matrix modification performed in conventional training methods. The algorithm was presented in several versions aimed at the various training problems of nets with continuous and binary weights, multilayer and multi-output-neuron nets, and training without storing the internal representations. This paper examines the capability of one of these versions, the CHIR2 algorithm, to tackle multilayer training tasks of nets with continuous input vectors. A comparison between the performance of this algorithm and the Backpropagation algorithm [2] is carried out via extensive computer simulations for the “two-spirals” problem, aimed at classifying two classes of dots forming two intertwined spirals. The CHIR2 algorithm [4] shows a rapid convergence rate for this problem, an order of magnitude faster than the results reported for the BP training algorithm (as well as those obtained by us) for the same training problem and network architecture [11]. Moreover, the CHIR2 algorithm finds solution nets for the above-mentioned problem with reduced architectures, reported as hard to solve by the BP training algorithm [11].
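The "two-spirals" benchmark mentioned above can be generated with a simple parameterization; the exact radii and number of points vary between papers, so the values below are one common choice rather than the dataset used in this study:

```python
import numpy as np

def two_spirals(n_per_class=97, turns=3.0):
    """Two intertwined spirals, the classic hard benchmark for BP:
    class 1 is class 0 rotated by 180 degrees about the origin."""
    t = np.linspace(0.0, turns * np.pi, n_per_class)
    r = t / (turns * np.pi)                 # radius grows with the angle
    x0 = np.stack([r * np.cos(t), r * np.sin(t)], axis=1)
    x1 = -x0                                # point reflection = 180-degree rotation
    X = np.vstack([x0, x1])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y
```

The interleaved arms make the decision boundary highly non-linear, which is why the problem is a standard stress test for training algorithms such as BP and CHIR2.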


2019 ◽  
Vol 16 (1) ◽  
pp. 0116
Author(s):  
Al-Saif et al.

In this paper, we focus on designing a feed-forward neural network (FFNN) for solving mixed Volterra–Fredholm integral equations (MVFIEs) of the second kind in two dimensions. In our method, we present a multi-layer model consisting of a hidden layer with five hidden units (neurons) and one linear output unit. The log-sigmoid transfer function is used as the activation of each hidden unit, and the Levenberg–Marquardt algorithm is used for training. A comparison between the numerical results and the analytic solutions of several examples is carried out to demonstrate the efficiency and accuracy of our method.
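The architecture described (one hidden layer of five log-sigmoid units feeding one linear output) can be sketched as a forward pass. The weights below are random placeholders for illustration, not values trained with Levenberg–Marquardt:

```python
import numpy as np

def logsig(z):
    # log-sigmoid transfer function used in the hidden layer
    return 1.0 / (1.0 + np.exp(-z))

def ffnn_forward(x, W1, b1, w2, b2):
    """One hidden layer of five log-sigmoid units and one linear output,
    taking a 2-D input point (the two dimensions of the MVFIE domain)."""
    h = logsig(W1 @ x + b1)   # hidden activations, shape (5,)
    return w2 @ h + b2        # scalar linear output

# placeholder parameters; a real run would fit these with Levenberg-Marquardt
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 2)), rng.normal(size=5)
w2, b2 = rng.normal(size=5), 0.0
```

Training then adjusts `W1, b1, w2, b2` so that this output approximates the unknown solution of the integral equation at collocation points.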


Stroke ◽  
2020 ◽  
Vol 51 (Suppl_1) ◽  
Author(s):  
Benjamin Zahneisen ◽  
Matus Straka ◽  
Shalini Bammer ◽  
Greg Albers ◽  
Roland Bammer

Introduction: Ruling out hemorrhage (stroke or traumatic) prior to administration of thrombolytics is critical for Code Strokes. A triage software that identifies hemorrhages on head CTs and alerts radiologists would help to streamline patient care and increase diagnostic confidence and patient safety. ML approach: We trained a deep convolutional network with a hybrid 3D/2D architecture on unenhanced head CTs of 805 patients. Our training dataset comprised 348 positive hemorrhage cases (IPH=245, SAH=67, sub/epidural=70, IVH=83) (128 female) and 457 normal controls (217 female). Lesion outlines were drawn by experts and stored as binary masks that were used as ground-truth data during the training phase (random 80/20 train/test split). Diagnostic sensitivity and specificity were defined on a per-patient study level, i.e., a single, binary decision for the presence or absence of a hemorrhage on a patient’s CT scan. Final validation was performed in 380 patients (167 positive). Tool: The hemorrhage detection module was prototyped in Python/Keras. It runs on a local Linux server (4 CPUs, no GPUs) and is embedded in a larger image-processing platform dedicated to stroke. Results: Processing time for a standard whole-brain CT study (3–5 mm slices) was around 2 min. Upon completion, an instant notification (by email and/or mobile app) was sent to users to alert them to the suspected presence of a hemorrhage. Relative to neuroradiologist gold-standard reads, the algorithm’s sensitivity and specificity are 90.4% and 92.5%, respectively (95% CI: 85–94% for both). Detection of acute intracranial hemorrhage can thus be automated by deploying deep learning; it yielded very high sensitivity and specificity when compared to gold-standard reads by a neuroradiologist. Volumes as small as 0.5 mL could be detected reliably in the test dataset. The software can be deployed in busy practices to prioritize worklists and alert health care professionals, speeding up therapeutic decision processes and interventions.
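The per-patient sensitivity and specificity reported above come from pooling binary study-level decisions. A minimal sketch of that computation (the labels below are hypothetical, not the study's data):

```python
def sensitivity_specificity(y_true, y_pred):
    """Per-study binary decisions (1 = hemorrhage present):
    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

Defining both at the study level (one decision per scan) rather than per slice or per lesion is what makes the figures directly comparable to a radiologist's read.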

