Hyper-rectangle based segmentation and clustering of large video data sets

2002 ◽  
Vol 141 (1-2) ◽  
pp. 139-168 ◽  
Author(s):  
Seok-Lyong Lee ◽  
Chin-Wan Chung
Keyword(s):  
2012 ◽  
Vol 20 (4-5) ◽  
pp. 495-514 ◽  
Author(s):  
Michael Dorr ◽  
Eleonora Vig ◽  
Erhardt Barth

2021 ◽  
Author(s):  
ElMehdi SAOUDI ◽  
Said Jai Andaloussi

Abstract With the rapid growth of the volume of video data and the development of multimedia technologies, it has become necessary to have the ability to accurately and quickly browse and search through information stored in large multimedia databases. For this purpose, content-based video retrieval ( CBVR ) has become an active area of research over the last decade. In this paper, We propose a content-based video retrieval system providing similar videos from a large multimedia data-set based on a query video. The approach uses vector motion-based signatures to describe the visual content and uses machine learning techniques to extract key-frames for rapid browsing and efficient video indexing. We have implemented the proposed approach on both, single machine and real-time distributed cluster to evaluate the real-time performance aspect, especially when the number and size of videos are large. Experiments are performed using various benchmark action and activity recognition data-sets and the results reveal the effectiveness of the proposed method in both accuracy and processing time compared to state-of-the-art methods.


Author(s):  
Dr. Manish L Jivtode

Web services are applications that allow for communication between devices over the internet and are independent of the technology. The devices are built and use standardized eXtensible Markup Language (XML) for information exchange. A client or user is able to invoke a web service by sending an XML message and then gets back and XML response message. There are a number of communication protocols for web services that use the XML format such as Web Services Flow Language (WSFL), Blocks Extensible Exchange Protocol(BEEP) etc. Simple Object Access Protocol (SOAP) and Representational State Transfer (REST) are used options for accessing web services. It is not directly comparable that SOAP is a communications protocol while REST is a set of architectural principles for data transmission. In this paper, the data size of 1KB, 2KB, 4KB, 8KB and 16KB were tested each for Audio, Video and result obtained for CRUD methods. The encryption and decryption timings in milliseconds/seconds were recorded by programming extensibility points of a WCF REST web service in the Azure cloud..


Author(s):  
Jung Hwan Oh ◽  
Jeong Kyu Lee ◽  
Sae Hwang

Data mining, which is defined as the process of extracting previously unknown knowledge and detecting interesting patterns from a massive set of data, has been an active research area. As a result, several commercial products and research prototypes are available nowadays. However, most of these studies have focused on corporate data — typically in an alpha-numeric database, and relatively less work has been pursued for the mining of multimedia data (Zaïane, Han, & Zhu, 2000). Digital multimedia differs from previous forms of combined media in that the bits representing texts, images, audios, and videos can be treated as data by computer programs (Simoff, Djeraba, & Zaïane, 2002). One facet of these diverse data in terms of underlying models and formats is that they are synchronized and integrated hence, can be treated as integrated data records. The collection of such integral data records constitutes a multimedia data set. The challenge of extracting meaningful patterns from such data sets has lead to research and development in the area of multimedia data mining. This is a challenging field due to the non-structured nature of multimedia data. Such ubiquitous data is required in many applications such as financial, medical, advertising and Command, Control, Communications and Intelligence (C3I) (Thuraisingham, Clifton, Maurer, & Ceruti, 2001). Multimedia databases are widespread and multimedia data sets are extremely large. There are tools for managing and searching within such collections, but the need for tools to extract hidden and useful knowledge embedded within multimedia data is becoming critical for many decision-making applications.


Author(s):  
Hong Lu ◽  
Xiangyang Xue

With the amount of video data increasing rapidly, automatic methods are needed to deal with large-scale video data sets in various applications. In content-based video analysis, a common and fundamental preprocess for these applications is video segmentation. Based on the segmentation results, video has a hierarchical representation structure of frames, shots, and scenes from the low level to high level. Due to the huge amount of video frames, it is not appropriate to represent video contents using frames. In the levels of video structure, shot is defined as an unbroken sequence of frames from one camera; however, the contents in shots are trivial and can hardly convey valuable semantic information. On the other hand, scene is a group of consecutive shots that focuses on an object or objects of interest. And a scene can represent a semantic unit for further processing such as story extraction, video summarization, etc. In this chapter, we will survey the methods on video scene segmentation. Specifically, there are two kinds of scenes. One kind of scene is to just consider the visual similarity of video shots and clustering methods are used for scene clustering. Another kind of scene is to consider both the visual similarity and temporal constraints of video shots, i.e., shots with similar contents and not lying too far in temporal order. Also, we will present our proposed methods on scene clustering and scene segmentation by using Gaussian mixture model, graph theory, sequential change detection, and spectral methods.


2021 ◽  
Vol 252 ◽  
pp. 01024
Author(s):  
Jiang Yan ◽  
Li Qiang ◽  
Wang Guanyao ◽  
Wang Ben ◽  
Deng Wei

With the rapid development of the national economy, the national power consumption level continues to increase, which puts forward higher requirements on the power supply guarantee capacity of the power grid system. The distribution range of the transmission line is wide and densely, most lines are exposed to the unguarded field without any shielding or protective measures, which are vulnerable to man-made destruction or natural disasters. Therefore, it is very important for the early monitoring and prevention of the external force breaking of the transmission lines. The method for preventing external breakage of transmission lines based on deep learning proposed in this paper utilizes the video data collected by the cameras erected on the transmission line roads to perform feature extraction and learning through 3D CNN and LSTM networks, and obtains a monitoring model for external breakage prevention of transmission lines. The model was tested on public data sets and verified that it has a good performance in the field of transmission lines against external damage. The method in this paper makes full use of the existing video acquisition equipment, and the process does not require human intervention, which greatly reduces the cost of line monitoring and the hidden dangers of accidents.


2019 ◽  
Author(s):  
Ellen M. Ditria ◽  
Sebastian Lopez-Marcano ◽  
Michael K. Sievers ◽  
Eric L. Jinks ◽  
Christopher J. Brown ◽  
...  

AbstractAquatic ecologists routinely count animals to provide critical information for conservation and management. Increased accessibility to underwater recording equipment such as cameras and unmanned underwater devices have allowed footage to be captured efficiently and safely. It has, however, led to immense volumes of data being collected that require manual processing, and thus significant time, labour and money. The use of deep learning to automate image processing has substantial benefits, but has rarely been adopted within the field of aquatic ecology. To test its efficacy and utility, we compared the accuracy and speed of deep learning techniques against human counterparts for quantifying fish abundance in underwater images and video footage. We collected footage of fish assemblages in seagrass meadows in Queensland, Australia. We produced three models using a MaskR-CNN object detection framework to detect the target species, an ecologically important fish, luderick (Girella tricuspidata). Our models were trained on three randomised 80:20 ratios of training:validation data-sets from a total of 6,080 annotations. The computer accurately determined abundance from videos with high performance using unseen footage from the same estuary as the training data (F1 = 92.4%, mAP50 = 92.5%), and from novel footage collected from a different estuary (F1 = 92.3%, mAP50 = 93.4%). The computer’s performance in determining MaxN was 7.1% better than human marine experts, and 13.4% better than citizen scientists in single image test data-sets, and 1.5% and 7.8% higher in video data-sets, respectively. We show that deep learning is a more accurate tool than humans at determining abundance, and that results are consistent and transferable across survey locations. Deep learning methods provide a faster, cheaper and more accurate alternative to manual data analysis methods currently used to monitor and assess animal abundance. Deep learning techniques have much to offer the field of aquatic ecology.


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Cheng-Jian Lin ◽  
Shiou-Yun Jeng ◽  
Hong-Wei Lioa

In recent years, vehicle detection and classification have become essential tasks of intelligent transportation systems, and real-time, accurate vehicle detection from image and video data for traffic monitoring remains challenging. The most noteworthy challenges are real-time system operation to accurately locate and classify vehicles in traffic flows and working around total occlusions that hinder vehicle tracking. For real-time traffic monitoring, we present a traffic monitoring approach that overcomes the abovementioned challenges by employing convolutional neural networks that utilize You Only Look Once (YOLO). A real-time traffic monitoring system has been developed, and it has attracted significant attention from traffic management departments. Digitally processing and analyzing these videos in real time is crucial for extracting reliable data on traffic flow. Therefore, this study presents a real-time traffic monitoring system based on a virtual detection zone, Gaussian mixture model (GMM), and YOLO to increase the vehicle counting and classification efficiency. GMM and a virtual detection zone are used for vehicle counting, and YOLO is used to classify vehicles. Moreover, the distance and time traveled by a vehicle are used to estimate the speed of the vehicle. In this study, the Montevideo Audio and Video Dataset (MAVD), the GARM Road-Traffic Monitoring data set (GRAM-RTM), and our collection data sets are used to verify the proposed method. Experimental results indicate that the proposed method with YOLOv4 achieved the highest classification accuracy of 98.91% and 99.5% in MAVD and GRAM-RTM data sets, respectively. Moreover, the proposed method with YOLOv4 also achieves the highest classification accuracy of 99.1%, 98.6%, and 98% in daytime, night time, and rainy day, respectively. In addition, the average absolute percentage error of vehicle speed estimation with the proposed method is about 7.6%.


Author(s):  
Christophe Marsala ◽  
Marcin Detyniecki

In this chapter, the authors focus on the use of forests of fuzzy decision trees (FFDT) in a video mining application. They discuss how to learn from a high scale video data sets and how to use the trained FFDTs to detect concepts in a high number of video shots. Moreover, the authors study the effect of the size of the forest on the performance; and of the use of fuzzy logic during the classification process. The experiments are performed on a well-know non-video dataset and on a real TV quality video benchmark.


2020 ◽  
Vol 2 (4) ◽  
pp. 581-595
Author(s):  
Martin Wutke ◽  
Armin Otto Schmitt ◽  
Imke Traulsen ◽  
Mehmet Gültas

The activity level of pigs is an important stress indicator which can be associated to tail-biting, a major issue for animal welfare of domestic pigs in conventional housing systems. Although the consideration of the animal activity could be essential to detect tail-biting before an outbreak occurs, it is often manually assessed and therefore labor intense, cost intensive and impracticable on a commercial scale. Recent advances of semi- and unsupervised convolutional neural networks (CNNs) have made them to the state of art technology for detecting anomalous behavior patterns in a variety of complex scene environments. In this study we apply such a CNN for anomaly detection to identify varying levels of activity in a multi-pen problem setup. By applying a two-stage approach we first trained the CNN to detect anomalies in the form of extreme activity behavior. Second, we trained a classifier to categorize the detected anomaly scores by learning the potential activity range of each pen. We evaluated our framework by analyzing 82 manually rated videos and achieved a success rate of 91%. Furthermore, we compared our model with a motion history image (MHI) approach and a binary image approach using two benchmark data sets, i.e., the well established pedestrian data sets published by the University of California, San Diego (UCSD) and our pig data set. The results show the effectiveness of our framework, which can be applied without the need of a labor intense manual annotation process and can be utilized for the assessment of the pig activity in a variety of applications like early warning systems to detect changes in the state of health.


Sign in / Sign up

Export Citation Format

Share Document