Eye movement prediction and variability on natural video data sets

2012 ◽  
Vol 20 (4-5) ◽  
pp. 495-514 ◽  
Author(s):  
Michael Dorr ◽  
Eleonora Vig ◽  
Erhardt Barth
2021 ◽  
Author(s):  
ElMehdi SAOUDI ◽  
Said Jai Andaloussi

Abstract With the rapid growth of the volume of video data and the development of multimedia technologies, it has become necessary to be able to browse and search quickly and accurately through information stored in large multimedia databases. For this purpose, content-based video retrieval (CBVR) has become an active area of research over the last decade. In this paper, we propose a content-based video retrieval system that returns videos similar to a query video from a large multimedia data-set. The approach uses vector motion-based signatures to describe the visual content and machine learning techniques to extract key-frames for rapid browsing and efficient video indexing. We have implemented the proposed approach on both a single machine and a real-time distributed cluster in order to evaluate real-time performance, especially when the number and size of videos are large. Experiments are performed using various benchmark action and activity recognition data-sets, and the results reveal the effectiveness of the proposed method in both accuracy and processing time compared to state-of-the-art methods.
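The key-frame extraction step relies on machine learning techniques; as an illustrative sketch only (the k-means clustering approach and the per-frame feature vectors here are assumptions, not the authors' exact method), representative key-frames can be picked by clustering frame features and keeping the frame nearest each cluster centre:

```python
import numpy as np

def extract_keyframes(frames, n_keyframes=3, n_iter=20, seed=0):
    """Pick representative key-frames by k-means clustering of
    per-frame feature vectors (e.g. colour histograms).
    Returns sorted indices of the frames nearest each cluster centre."""
    rng = np.random.default_rng(seed)
    feats = np.asarray(frames, dtype=float)
    # initialise centres with randomly chosen distinct frames
    centres = feats[rng.choice(len(feats), n_keyframes, replace=False)]
    for _ in range(n_iter):
        # assign each frame to its nearest centre
        d = np.linalg.norm(feats[:, None] - centres[None], axis=2)
        labels = d.argmin(axis=1)
        # move each centre to the mean of its cluster
        for k in range(n_keyframes):
            if np.any(labels == k):
                centres[k] = feats[labels == k].mean(axis=0)
    # key-frame = frame nearest to each final centre
    d = np.linalg.norm(feats[:, None] - centres[None], axis=2)
    return sorted(set(int(i) for i in d.argmin(axis=0)))
```

In a real CBVR pipeline the feature vectors would be motion-based signatures rather than raw values, but the selection logic is the same.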


Author(s):  
Dr. Manish L Jivtode

Web services are applications that allow communication between devices over the internet independently of the underlying technology. The devices use standardized eXtensible Markup Language (XML) for information exchange: a client or user invokes a web service by sending an XML message and receives an XML response message in return. A number of communication protocols for web services use the XML format, such as Web Services Flow Language (WSFL) and Blocks Extensible Exchange Protocol (BEEP). Simple Object Access Protocol (SOAP) and Representational State Transfer (REST) are widely used options for accessing web services. The two are not directly comparable: SOAP is a communications protocol, while REST is a set of architectural principles for data transmission. In this paper, data sizes of 1KB, 2KB, 4KB, 8KB and 16KB were each tested for audio and video, and results were obtained for the CRUD methods. The encryption and decryption timings in milliseconds/seconds were recorded by programming extensibility points of a WCF REST web service in the Azure cloud.
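The timing methodology can be sketched as follows. This is a minimal stand-in using a toy XOR cipher and Python's `time.perf_counter`, not the WCF extensibility points the paper actually instruments; the payload sizes match those tested in the paper:

```python
import time

def xor_cipher(data: bytes, key: int = 0x5A) -> bytes:
    # toy symmetric "cipher": XOR every byte with a one-byte key
    return bytes(b ^ key for b in data)

def time_roundtrip(payload: bytes) -> float:
    # time one encrypt + decrypt round trip, in milliseconds
    t0 = time.perf_counter()
    decrypted = xor_cipher(xor_cipher(payload))
    assert decrypted == payload
    return (time.perf_counter() - t0) * 1000.0

# measure each payload size tested in the paper (1KB .. 16KB)
timings = {f"{kb}KB": time_roundtrip(bytes(kb * 1024)) for kb in (1, 2, 4, 8, 16)}
```

A real measurement would hook the serialization/deserialization extensibility points of the service and use a genuine cipher such as AES.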


SLEEP ◽  
2021 ◽  
Author(s):  
Brian Geuther ◽  
Mandy Chen ◽  
Raymond J Galante ◽  
Owen Han ◽  
Jie Lian ◽  
...  

Abstract
Study Objectives: Sleep is an important biological process that is perturbed in numerous diseases, and assessment of its substages currently requires implantation of electrodes to carry out electroencephalogram/electromyogram (EEG/EMG) analysis. Although accurate, this method comes at the high cost of invasive surgery and of experts trained to score EEG/EMG data. Here, we leverage modern computer vision methods to classify sleep substages directly from video data. This bypasses the need for surgery and expert scoring, and provides a path to high-throughput studies of sleep in mice.
Methods: We collected synchronized high-resolution video and EEG/EMG data in 16 male C57BL/6J mice. We extracted time- and frequency-based features from the video and used the human expert-scored EEG/EMG data to train a visual classifier. We investigated several classifiers and data augmentation methods.
Results: Our visual sleep classifier proved to be highly accurate in classifying wake, non-rapid eye movement sleep (NREM), and rapid eye movement sleep (REM) states, achieving an overall accuracy of 0.92 +/- 0.05 (mean +/- SD). We discover and genetically validate video features that correlate with breathing rate, and show low and high variability in NREM and REM sleep, respectively. Finally, we apply our methods to non-invasively detect sleep stage disturbances induced by amphetamine administration.
Conclusions: We conclude that machine learning based visual classification of sleep is a viable alternative to EEG/EMG based scoring. Our results will enable non-invasive high-throughput sleep studies and will greatly reduce the barrier to screening mutant mice for abnormalities in sleep.
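A minimal sketch of time- and frequency-based features computed from a per-frame motion signal (the mean-pixel-difference signal and this exact feature set are assumptions for illustration; the study's feature set is richer). The dominant spectral peak is the kind of feature that can track breathing rate:

```python
import numpy as np

def video_signal_features(motion, fps=30.0):
    """Time- and frequency-domain features of a 1-D per-frame
    motion signal (e.g. mean absolute pixel difference per frame)."""
    motion = np.asarray(motion, dtype=float)
    # time-domain statistics
    feats = {"mean": float(motion.mean()), "std": float(motion.std())}
    # frequency domain: power spectrum of the mean-removed signal
    power = np.abs(np.fft.rfft(motion - motion.mean())) ** 2
    freqs = np.fft.rfftfreq(len(motion), d=1.0 / fps)
    # dominant frequency in Hz (a rough proxy for breathing rate)
    feats["peak_hz"] = float(freqs[power.argmax()])
    return feats
```

Feature vectors like this, computed over sliding windows, would then be fed to the classifier alongside the EEG/EMG-derived labels.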


Author(s):  
Jung Hwan Oh ◽  
Jeong Kyu Lee ◽  
Sae Hwang

Data mining, defined as the process of extracting previously unknown knowledge and detecting interesting patterns from a massive set of data, has been an active research area, and several commercial products and research prototypes are available nowadays. However, most of these studies have focused on corporate data, typically in an alpha-numeric database, and relatively little work has been pursued on the mining of multimedia data (Zaïane, Han, & Zhu, 2000). Digital multimedia differs from previous forms of combined media in that the bits representing texts, images, audio, and video can be treated as data by computer programs (Simoff, Djeraba, & Zaïane, 2002). One facet of these diverse data, in terms of underlying models and formats, is that they are synchronized and integrated and can hence be treated as integrated data records. The collection of such integral data records constitutes a multimedia data set. The challenge of extracting meaningful patterns from such data sets has led to research and development in the area of multimedia data mining. This is a challenging field due to the non-structured nature of multimedia data. Such ubiquitous data is required in many applications such as financial, medical, advertising, and Command, Control, Communications and Intelligence (C3I) (Thuraisingham, Clifton, Maurer, & Ceruti, 2001). Multimedia databases are widespread and multimedia data sets are extremely large. There are tools for managing and searching within such collections, but the need for tools to extract hidden and useful knowledge embedded within multimedia data is becoming critical for many decision-making applications.


Author(s):  
Y. Wang ◽  
H. Cheng ◽  
X. Zhou ◽  
W. Luo ◽  
H. Zhang

Abstract. With the rapid development of remote sensing technology, it has become possible to obtain continuous video data from outer space. Detecting moving objects in remote sensing image sequences and predicting their movements is of great significance in both military and civilian fields, and in recent years this issue has attracted more and more attention. However, research on moving object detection and movement prediction in high-resolution remote sensing videos is still in its infancy and worthy of further study. In this paper, we propose a ship detection and movement prediction method based on You-Only-Look-Once (YOLO) v3 and Simple Online and Realtime Tracking (SORT). The original YOLO v3 is improved by multi-frame training that fuses the information of continuous frames. The simple and practical multiple object tracking algorithm SORT is used to recognize the multiple targets detected by the multi-frame YOLO v3 model and obtain their coordinates. These coordinates are fitted by the least squares method to obtain the trajectories of the targets, and we take the derivative of each trajectory to obtain the real-time movement direction and velocity of the detected ships. Experiments are performed on multi-spectral remote sensing images selected from Google Earth, as well as on real multi-spectral remote sensing videos captured by the Jilin-1 satellite. Experimental results validate the effectiveness of our method for moving ship detection and movement prediction, and show a feasible way toward efficient interpretation and information extraction from new remote sensing video data.
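The trajectory-fitting and differentiation step described above can be sketched with NumPy's polynomial least-squares routines (the polynomial order is an assumption; the paper does not state it):

```python
import numpy as np

def fit_trajectory(t, xs, ys, order=2):
    """Least-squares polynomial fit of a tracked ship's coordinates,
    plus its derivative, giving position, speed, and heading at any time."""
    px = np.polyfit(t, xs, order)            # coefficients of x(t)
    py = np.polyfit(t, ys, order)            # coefficients of y(t)
    vx, vy = np.polyder(px), np.polyder(py)  # velocity polynomials

    def state(ti):
        # position, speed (units/s), and heading (radians) at time ti
        x, y = np.polyval(px, ti), np.polyval(py, ti)
        dx, dy = np.polyval(vx, ti), np.polyval(vy, ti)
        return (float(x), float(y)), float(np.hypot(dx, dy)), float(np.arctan2(dy, dx))

    return state
```

In the full pipeline, `t`, `xs`, `ys` would come from the SORT track of each target across frames, and `state` would be queried at the current frame time to report real-time direction and velocity.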


Author(s):  
Hong Lu ◽  
Xiangyang Xue

With the amount of video data increasing rapidly, automatic methods are needed to deal with large-scale video data sets in various applications. In content-based video analysis, a common and fundamental preprocessing step for these applications is video segmentation. Based on the segmentation results, video has a hierarchical representation structure of frames, shots, and scenes, from low level to high level. Because of the huge number of video frames, it is not appropriate to represent video content using frames alone. In this hierarchy, a shot is defined as an unbroken sequence of frames from one camera; however, the content of a shot is trivial and can hardly convey valuable semantic information. A scene, on the other hand, is a group of consecutive shots that focuses on an object or objects of interest, and can represent a semantic unit for further processing such as story extraction, video summarization, etc. In this chapter, we survey methods for video scene segmentation. Specifically, there are two kinds of scenes. The first considers only the visual similarity of video shots, and clustering methods are used for scene clustering. The second considers both the visual similarity and the temporal constraints of video shots, i.e., shots with similar content that do not lie too far apart in temporal order. We also present our proposed methods for scene clustering and scene segmentation using Gaussian mixture models, graph theory, sequential change detection, and spectral methods.
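The second kind of scene, visual similarity under a temporal constraint, can be sketched as a single grouping pass over per-shot feature vectors (the cosine similarity measure, the threshold, and the window size are all assumptions for illustration):

```python
import numpy as np

def group_scenes(shot_feats, sim_threshold=0.8, window=3):
    """Group consecutive shots into scenes: a shot joins the current
    scene if it is visually similar (cosine similarity) to any of the
    last `window` shots in that scene; otherwise a new scene starts."""
    if not shot_feats:
        return []

    def cos(a, b):
        a, b = np.asarray(a, float), np.asarray(b, float)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    scenes = [[0]]
    for i in range(1, len(shot_feats)):
        recent = scenes[-1][-window:]  # temporal constraint: look back only a few shots
        if any(cos(shot_feats[i], shot_feats[j]) >= sim_threshold for j in recent):
            scenes[-1].append(i)
        else:
            scenes.append([i])
    return scenes
```

The surveyed methods (Gaussian mixture models, graph partitioning, sequential change detection, spectral clustering) replace this greedy pass with more principled grouping criteria, but the similarity-plus-temporal-window idea is the same.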


2002 ◽  
Vol 141 (1-2) ◽  
pp. 139-168 ◽  
Author(s):  
Seok-Lyong Lee ◽  
Chin-Wan Chung

2021 ◽  
Vol 252 ◽  
pp. 01024
Author(s):  
Jiang Yan ◽  
Li Qiang ◽  
Wang Guanyao ◽  
Wang Ben ◽  
Deng Wei

With the rapid development of the national economy, national power consumption continues to increase, which places higher requirements on the power supply guarantee capacity of the grid system. Transmission lines are distributed widely and densely, and most run through unguarded open country without any shielding or protective measures, leaving them vulnerable to man-made destruction or natural disasters. Early monitoring and prevention of external force damage to transmission lines is therefore very important. The deep learning based method proposed in this paper for preventing external damage to transmission lines uses video data collected by cameras erected along transmission line corridors to perform feature extraction and learning through 3D CNN and LSTM networks, yielding a monitoring model for protecting transmission lines against external damage. The model was tested on public data sets and shown to perform well in this domain. The method makes full use of existing video acquisition equipment and requires no human intervention, which greatly reduces the cost of line monitoring and the hidden danger of accidents.
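The spatio-temporal feature extraction performed by a 3D CNN layer can be illustrated with a toy single-channel "valid" 3D convolution over a video clip (the real model uses many learned filters followed by an LSTM; this shows only the core operation, in plain NumPy):

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """'Valid'-mode 3D convolution of a single-channel video clip
    (frames x height x width) with one small 3-D kernel, the basic
    spatio-temporal feature operation of a 3D CNN layer."""
    clip, kernel = np.asarray(clip, float), np.asarray(kernel, float)
    T, H, W = clip.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                # dot product of the kernel with one spatio-temporal patch
                out[i, j, k] = np.sum(clip[i:i + t, j:j + h, k:k + w] * kernel)
    return out
```

Because the kernel spans several frames, each output value mixes motion across time as well as appearance within a frame, which is what lets 3D CNNs pick up events such as machinery moving near a line.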


2019 ◽  
Author(s):  
Ellen M. Ditria ◽  
Sebastian Lopez-Marcano ◽  
Michael K. Sievers ◽  
Eric L. Jinks ◽  
Christopher J. Brown ◽  
...  

Abstract Aquatic ecologists routinely count animals to provide critical information for conservation and management. Increased accessibility of underwater recording equipment such as cameras and unmanned underwater devices has allowed footage to be captured efficiently and safely. It has, however, led to immense volumes of data being collected that require manual processing, and thus significant time, labour and money. The use of deep learning to automate image processing has substantial benefits, but has rarely been adopted within the field of aquatic ecology. To test its efficacy and utility, we compared the accuracy and speed of deep learning techniques against human counterparts for quantifying fish abundance in underwater images and video footage. We collected footage of fish assemblages in seagrass meadows in Queensland, Australia. We produced three models using a Mask R-CNN object detection framework to detect the target species, an ecologically important fish, luderick (Girella tricuspidata). Our models were trained on three randomised 80:20 training:validation splits from a total of 6,080 annotations. The computer accurately determined abundance from videos, with high performance on unseen footage from the same estuary as the training data (F1 = 92.4%, mAP50 = 92.5%) and on novel footage collected from a different estuary (F1 = 92.3%, mAP50 = 93.4%). The computer's performance in determining MaxN was 7.1% better than human marine experts and 13.4% better than citizen scientists on single-image test data-sets, and 1.5% and 7.8% higher, respectively, on video data-sets. We show that deep learning is a more accurate tool than humans for determining abundance, and that results are consistent and transferable across survey locations. Deep learning methods provide a faster, cheaper and more accurate alternative to the manual data analysis methods currently used to monitor and assess animal abundance, and have much to offer the field of aquatic ecology.
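The evaluation metrics reported above can be sketched directly: F1 from detection counts, and MaxN from per-frame counts (the counts used below are illustrative, not the study's data):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall over detections."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def max_n(counts_per_frame) -> int:
    """MaxN: the maximum number of individuals visible in any single
    frame, a standard conservative abundance estimate in ecology."""
    return max(counts_per_frame, default=0)
```

Comparing the model's MaxN against human-derived MaxN on the same footage is what yields the percentage differences reported in the abstract.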

