A Distributed Content-Based Video Retrieval System for Large Data-sets

Author(s):  
ElMehdi SAOUDI ◽  
Said Jai Andaloussi

Abstract With the rapid growth of the volume of video data and the development of multimedia technologies, it has become necessary to browse and search accurately and quickly through information stored in large multimedia databases. For this purpose, content-based video retrieval (CBVR) has become an active area of research over the last decade. In this paper, we propose a content-based video retrieval system that returns similar videos from a large multimedia data-set based on a query video. The approach uses vector motion-based signatures to describe the visual content and uses machine learning techniques to extract key-frames for rapid browsing and efficient video indexing. We have implemented the proposed approach on both a single machine and a real-time distributed cluster to evaluate the real-time performance aspect, especially when the number and size of videos are large. Experiments are performed using various benchmark action and activity recognition data-sets, and the results reveal the effectiveness of the proposed method in both accuracy and processing time compared to state-of-the-art methods.
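The abstract does not say which machine learning technique is used to extract key-frames. As a minimal sketch of one common option, assuming each frame has already been reduced to a numeric feature vector, the frames can be clustered with k-means and the frame nearest each cluster centre kept as a key-frame. The function names, the deterministic initialisation, and the toy feature vectors below are illustrative assumptions, not the authors' method.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_key_frames(frame_features, k, iterations=10):
    """Cluster frames with k-means; return the index of the frame nearest each centroid."""
    n = len(frame_features)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of frames")
    # Assumption: deterministic start, using evenly spaced frames as initial centroids.
    centroids = [list(frame_features[(i * n) // k]) for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every frame to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for idx, feat in enumerate(frame_features):
            nearest = min(range(k), key=lambda c: squared_distance(feat, centroids[c]))
            clusters[nearest].append(idx)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(centroids[c])
                centroids[c] = [
                    sum(frame_features[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]
    key_frames = []
    for c, members in enumerate(clusters):
        if members:
            key_frames.append(
                min(members, key=lambda i: squared_distance(frame_features[i], centroids[c]))
            )
    return sorted(key_frames)


# Toy input: six frames forming two visually distinct groups (hypothetical features).
frames = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.2]]
key_frames = select_key_frames(frames, k=2)
```

Keeping one representative per cluster makes the index size depend on `k` rather than on video length, which is one reason clustering is a frequent choice for summarising videos before indexing.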

Author(s):  
Jung Hwan Oh ◽  
Jeong Kyu Lee ◽  
Sae Hwang

Data mining, which is defined as the process of extracting previously unknown knowledge and detecting interesting patterns from a massive set of data, has been an active research area. As a result, several commercial products and research prototypes are available nowadays. However, most of these studies have focused on corporate data, typically in an alphanumeric database, and relatively little work has been pursued on the mining of multimedia data (Zaïane, Han, & Zhu, 2000). Digital multimedia differs from previous forms of combined media in that the bits representing text, images, audio, and video can be treated as data by computer programs (Simoff, Djeraba, & Zaïane, 2002). One facet of these data, which are diverse in their underlying models and formats, is that they are synchronized and integrated and hence can be treated as integrated data records. The collection of such integrated data records constitutes a multimedia data set. The challenge of extracting meaningful patterns from such data sets has led to research and development in the area of multimedia data mining. This is a challenging field due to the unstructured nature of multimedia data. Such ubiquitous data is required in many applications, such as financial, medical, advertising, and Command, Control, Communications and Intelligence (C3I) applications (Thuraisingham, Clifton, Maurer, & Ceruti, 2001). Multimedia databases are widespread and multimedia data sets are extremely large. There are tools for managing and searching within such collections, but the need for tools to extract hidden and useful knowledge embedded within multimedia data is becoming critical for many decision-making applications.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
El Mehdi Saoudi ◽  
Said Jai-Andaloussi

Abstract With the rapid growth in the amount of video data, efficient video indexing and retrieval have become one of the most critical challenges in multimedia management. For this purpose, Content-Based Video Retrieval (CBVR) is nowadays an active area of research. In this article, a CBVR system that returns similar videos from a large multimedia dataset based on a query video is proposed. This approach uses vector motion-based signatures to describe the visual content and uses machine learning techniques to extract key frames for rapid browsing and efficient video indexing. The proposed method has been implemented on both a single machine and a real-time distributed cluster to evaluate the real-time performance aspect, especially when the number and size of videos are large. Experiments were performed using various benchmark action and activity recognition datasets, and the results reveal the effectiveness of the proposed method in both accuracy and processing time compared to previous studies.


2008 ◽  
pp. 1631-1637
Author(s):  
Jung Hwan Oh ◽  
Jeong Kyu Lee ◽  
Sae Hwang

Data mining, which is defined as the process of extracting previously unknown knowledge and detecting interesting patterns from a massive set of data, has been an active research area. As a result, several commercial products and research prototypes are available nowadays. However, most of these studies have focused on corporate data, typically in an alphanumeric database, and relatively little work has been pursued on the mining of multimedia data (Zaïane, Han, & Zhu, 2000). Digital multimedia differs from previous forms of combined media in that the bits representing text, images, audio, and video can be treated as data by computer programs (Simoff, Djeraba, & Zaïane, 2002). One facet of these data, which are diverse in their underlying models and formats, is that they are synchronized and integrated and hence can be treated as integrated data records. The collection of such integrated data records constitutes a multimedia data set. The challenge of extracting meaningful patterns from such data sets has led to research and development in the area of multimedia data mining. This is a challenging field due to the unstructured nature of multimedia data. Such ubiquitous data is required in many applications, such as financial, medical, advertising, and Command, Control, Communications and Intelligence (C3I) applications (Thuraisingham, Clifton, Maurer, & Ceruti, 2001). Multimedia databases are widespread and multimedia data sets are extremely large. There are tools for managing and searching within such collections, but the need for tools to extract hidden and useful knowledge embedded within multimedia data is becoming critical for many decision-making applications.


Author(s):  
Waleed E. Farag

Multimedia applications are spreading at an ever-increasing rate, presenting the research community with a number of challenging problems. The most significant and influential of these is effective access to stored data. Despite the popularity of keyword-based search in alphanumeric databases, it is inadequate for multimedia data because of their unstructured nature. On the other hand, a number of content-based and context-based video access techniques have been developed (Deb, 2005). The basic idea of content-based retrieval is to access multimedia data by their contents, for example, using one of the visual content features. Context-based techniques, in contrast, try to improve retrieval performance by using associated contextual information other than that derived from the media content (Hori & Aizawa, 2003). Most of the proposed video indexing and retrieval prototypes have two major phases: database population and retrieval. In the former, the video stream is partitioned into its constituent shots in a process known as shot boundary detection (Farag & Abdel-Wahab, 2001, 2002b). This step is followed by a process of selecting representative frames to summarize video shots (Farag & Abdel-Wahab, 2002a). Then, a number of low-level features (color, texture, object motion, etc.) are extracted to serve as indices to shots. The database population phase is performed off-line and outputs a set of metadata, with each element representing one of the clips in the video archive. In the retrieval phase, a query is presented to the system, which in turn performs similarity matching operations and returns similar data to the user. The basic objective of an automated video retrieval system (described above) is to provide the user with easy-to-use and effective mechanisms to access the required information.
For that reason, the success of a content-based video access system is mainly measured by the effectiveness of its retrieval phase. The general query model adopted by almost all multimedia retrieval systems is QBE (query by example; Marchionini, 2006). In this model, the user submits a query in the form of an image or a video clip (in the case of a video retrieval system) and asks the system to retrieve similar data. QBE is considered a promising technique because it gives the user an intuitive way to present a query. In addition, the form in which a query condition is expressed is close to that of the data to be evaluated. Upon receiving the submitted query, the retrieval stage analyzes it to extract a set of features and then performs similarity matching. In the latter task, the features extracted from the query are compared with the features stored in the metadata; matches are then sorted and displayed to the user based on how close each hit is to the input query. A central issue here is the assessment of video data similarity. Appropriately answering the following questions has a crucial impact on the effectiveness and applicability of the retrieval system. How are the similarity matching operations performed, and based on what criteria? Do the employed similarity matching models reflect the human perception of multimedia similarity? The main focus of this article is to shed light on possible answers to these questions.
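The retrieval phase described above (extract features from the query, compare them with stored metadata, sort the matches) can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity stands in for whatever similarity model a given system uses, and the clip names, three-bin feature vectors, and the `rank_by_example` helper are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector is treated as matching nothing
    return dot / (norm_a * norm_b)


def rank_by_example(query_features, metadata, top_k=3):
    """Return the top_k (clip_id, score) pairs, closest match first."""
    scored = [
        (clip_id, cosine_similarity(query_features, features))
        for clip_id, features in metadata.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical metadata produced off-line by the database population phase:
# one low-level feature vector (for example, a coarse color histogram) per clip.
metadata = {
    "clip_a": [0.9, 0.1, 0.0],
    "clip_b": [0.1, 0.8, 0.1],
    "clip_c": [0.5, 0.5, 0.0],
}
query = [1.0, 0.0, 0.0]  # features extracted from the user's example clip
ranking = rank_by_example(query, metadata, top_k=2)
```

Whether such a vector-space score reflects human perception of similarity is exactly the question the passage raises; swapping `cosine_similarity` for another measure leaves the ranking logic unchanged.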


2020 ◽  
Vol 10 (9) ◽  
pp. 3079 ◽  
Author(s):  
Yi-Qi Huang ◽  
Jia-Chun Zheng ◽  
Shi-Dan Sun ◽  
Cheng-Fu Yang ◽  
Jing Liu

In intelligent traffic systems, real-time and accurate detection of vehicles in images and video data is important and challenging work. Especially in situations with complex scenes, different models, and high density, it is difficult to accurately locate and classify vehicles in traffic flows. Therefore, we propose a single-stage deep neural network, YOLOv3-DL, which is based on the TensorFlow framework, to address this problem. The network structure is optimized by introducing the idea of spatial pyramid pooling; the loss function is then redefined, and a weight regularization method is introduced, so that real-time detection and statistics of traffic flows can be implemented effectively. The optimized algorithm is trained end to end on the DL-CAR data set and evaluated on data sets covering different scenarios and weather conditions. Analysis of the experimental data shows that the optimized algorithm improves vehicle detection accuracy on the test set by 3.86%. In experiments on test sets from different environments, detection accuracy improves by 4.53%, indicating that the algorithm is highly robust. At the same time, the detection accuracy and speed of the investigated algorithm are higher than those of other algorithms, indicating that the algorithm has higher detection performance.
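The abstract names spatial pyramid pooling without describing how YOLOv3-DL applies it. As a minimal sketch of the general idea only, the function below max-pools a single-channel feature map over progressively finer grids and concatenates the results, so inputs of different sizes yield a vector of the same length. The pure-Python representation, the pyramid levels, and the bin boundaries are assumptions chosen for readability.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2)):
    """Max-pool a 2D feature map over level x level grids and concatenate the results.

    The output length is sum(level * level for level in levels), whatever the input size.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for level in levels:
        for i in range(level):
            # Row span of bin i: floor at the start, ceiling at the end, so no row is skipped.
            r0 = (i * height) // level
            r1 = math.ceil((i + 1) * height / level)
            for j in range(level):
                c0 = (j * width) // level
                c1 = math.ceil((j + 1) * width / level)
                pooled.append(
                    max(feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1))
                )
    return pooled


# Two feature maps of different sizes produce vectors of identical length.
small = [[1, 2], [3, 4]]
large = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
small_vec = spatial_pyramid_pool(small)
large_vec = spatial_pyramid_pool(large)
```

With the default levels the output always has 1 + 4 = 5 entries; when a dimension is not divisible by the level, adjacent bins overlap by a row or column.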


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Jiawei Lian ◽  
Junhong He ◽  
Yun Niu ◽  
Tianze Wang

Purpose: Current popular image processing technologies based on convolutional neural networks are characterized by heavy computation, high storage cost and low accuracy for tiny defect detection, which conflicts with the real-time performance, accuracy, and limited computing and storage resources required by industrial applications. Therefore, an improved YOLOv4 named YOLOv4-Defect is proposed to solve the above problems. Design/methodology/approach: On the one hand, this study performs multi-dimensional compression on the feature extraction network of YOLOv4 to simplify the model, and improves the feature extraction ability of the model through knowledge distillation. On the other hand, a prediction scale with a more detailed receptive field is added to optimize the model structure, which can improve detection performance for tiny defects. Findings: The effectiveness of the method is verified on the public data sets NEU-CLS and DAGM 2007 and on a steel ingot data set collected in an actual industrial setting. The experimental results demonstrate that the proposed YOLOv4-Defect method can greatly improve recognition efficiency and accuracy and reduce the size and computational cost of the model. Originality/value: This paper proposes an improved YOLOv4 named YOLOv4-Defect for surface defect detection, which is suited to industrial scenarios with limited storage and computing resources and meets the requirements of real-time performance and high precision.
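The abstract does not state which distillation formulation YOLOv4-Defect uses. As a minimal sketch of the classic soft-target variant, assuming the compressed student and the original teacher both emit class logits, the loss below is the KL divergence between their temperature-softened distributions, scaled by the squared temperature. The function names, the temperature value, and the example logits are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature squared."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0.0)
    return temperature ** 2 * kl


teacher_logits = [4.0, 1.0, 0.5]  # hypothetical outputs of the full-size model
student_logits = [2.0, 1.5, 0.5]  # hypothetical outputs of the compressed model
loss = distillation_loss(student_logits, teacher_logits)
matched = distillation_loss(teacher_logits, teacher_logits)
```

A higher temperature flattens both distributions, so the student is also pushed to match the teacher's relative scores for the non-top classes; in practice this term is usually combined with the ordinary task loss.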

