MINIMIZING LATENCY AND JITTER FOR LARGE-SCALE MULTIMEDIA REPOSITORIES THROUGH PREFIX CACHING

2003 ◽  
Vol 03 (01) ◽  
pp. 95-117 ◽  
Author(s):  
SUNIL PRABHAKAR ◽  
RAHUL CHARI

Multimedia data poses challenges for efficient storage and retrieval due to its large size and playback timing requirements. For applications that store very large volumes of multimedia data, hierarchical storage offers a scalable and economical alternative to storing all data on magnetic disks. In a hierarchical storage architecture, data is stored on a tape- or optical-disk-based tertiary storage layer, with the secondary storage disks serving as a cache or buffer. Due to the need for swapping media on drives, retrieving multimedia data from tertiary storage can result in large delays both before playback begins (startup latency) and during playback (jitter). In this paper we address the important problem of reducing startup latency and jitter for very large multimedia repositories. We propose that secondary storage should not be used as a cache in the traditional manner; instead, most of it should be used to store partial objects permanently. Furthermore, replication is employed at the tertiary storage level to avoid expensive media switching. In particular, we show that by saving the initial segments of documents permanently on secondary storage, and replicating them on tertiary storage, startup latency can be significantly reduced. Since this effectively reduces the amount of secondary storage available for buffering data from tertiary storage, an increase in jitter might be expected; our results show, however, that the technique reduces jitter as well. The technique exploits the pattern of data access. Advance knowledge of the access pattern is helpful but not essential: a lack of this information, or changes in access patterns, is handled through adaptive techniques. Our study addresses both single- and multiple-user scenarios. Our results show that startup latency can be reduced by as much as 75% and jitter practically eliminated through the use of these techniques.
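A minimal Python sketch of the prefix-caching idea: the initial segment of each object is pinned on secondary storage so playback starts from disk while the suffix is staged from tertiary storage. All constants and names below are illustrative assumptions, not values from the paper.

```python
# Hypothetical prefix-caching sketch: pinned prefixes absorb both the
# startup latency and the media-switch delay of tertiary storage.

PREFIX_SECONDS = 30            # assumed prefix length pinned on disk
TERTIARY_STAGE_DELAY = 20.0    # assumed media-switch + load time (seconds)

class PrefixCache:
    def __init__(self):
        self.pinned = {}       # object id -> prefix bytes (always on disk)

    def pin_prefix(self, obj_id, prefix_data):
        self.pinned[obj_id] = prefix_data

    def startup_latency(self, obj_id):
        # If the prefix is pinned, playback starts from disk immediately;
        # otherwise the user waits for tertiary storage.
        return 0.0 if obj_id in self.pinned else TERTIARY_STAGE_DELAY

    def jitter_free(self, obj_id, playback_rate=1.0):
        # Playback is jitter-free when the pinned prefix plays for at least
        # as long as it takes to stage the suffix from tertiary storage.
        prefix_playtime = PREFIX_SECONDS / playback_rate
        return obj_id in self.pinned and prefix_playtime >= TERTIARY_STAGE_DELAY

cache = PrefixCache()
cache.pin_prefix("doc42", prefix_data=b"...")   # placeholder bytes
print(cache.startup_latency("doc42"))           # 0.0: served from disk
print(cache.jitter_free("doc42"))               # True under these numbers
```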

2014 ◽  
Vol 1030-1032 ◽  
pp. 1619-1622
Author(s):  
Bing Xin Zhu ◽  
Jing Tao Li

In large-scale storage systems, the computing, transfer, and storage devices differ physically both in performance and in characteristics such as reliability. The data-access load placed on storage devices is likewise non-uniform, varying widely across space and time. Storing all data on high-performance devices is therefore unrealistic and unwise. The concept of hierarchical storage effectively solves this problem: it monitors data-access loads and optimally configures storage resources according to the load and application requirements [1]. Traditional classification policies generally target file data, classifying files by access frequency or a file I/O heat index. Starting from the concept of website user value, and addressing the disadvantages of traditional data-classification strategies, this paper proposes a centralized data-classification strategy based on user value.
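A hypothetical Python sketch of a value-based tiering policy in this spirit; the scoring weights, thresholds, and tier names are illustrative assumptions, not the paper's strategy.

```python
# Hypothetical tiering by a combined I/O-heat and user-value score:
# higher-scoring files are placed on faster tiers.

from dataclasses import dataclass

@dataclass
class FileStats:
    access_freq: float   # accesses per day (I/O heat)
    user_value: float    # assumed aggregate value of accessing users, 0..1

def tier_for(stats: FileStats, heat_weight=0.5):
    # Weighted score; the weight and thresholds are illustrative.
    score = heat_weight * stats.access_freq + (1 - heat_weight) * 100 * stats.user_value
    if score > 50:
        return "ssd"       # hot tier
    elif score > 10:
        return "disk"      # warm tier
    return "tape"          # cold tier

print(tier_for(FileStats(access_freq=120, user_value=0.9)))  # ssd
print(tier_for(FileStats(access_freq=2, user_value=0.1)))    # tape
```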


2018 ◽  
Vol 12 (2) ◽  
pp. 157-176
Author(s):  
Claudia Yogeswaran ◽  
Kearsy Cormier

In this paper we provide a case study of the creation of the DCAL Research Data Archive at University College London. In doing so, we assess the various challenges associated with archiving large-scale legacy multimedia research data, given the lack of literature on archiving such datasets. We address issues such as the anonymisation of video research data, the ethical challenges of managing legacy data and historic consent, ownership considerations, and the handling of large multimedia files, as well as the complexity of multi-project data from a number of researchers and of legacy data spanning eleven years of research.


2013 ◽  
Vol 380-384 ◽  
pp. 1995-1998
Author(s):  
Shao Ming Pan ◽  
Hong Li ◽  
Ge Tang

The hierarchical storage strategy can be adjusted using the access patterns of spatial data, which significantly improves the performance of spatial data services. However, access and distribution patterns of spatial data derived from Hotmap and Zipf-like models cannot reflect global information. This paper proposes a P2P-based dynamic statistics algorithm for the distribution of spatial data. The algorithm calculates the service capability of each service node and preferentially chooses node agents with good service capability within each group, while keeping the group size under control. Experimental results show that the algorithm improves performance by about 28% compared with random node selection, and that it efficiently meets the need for dynamic statistics in large-scale distributed environments.
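A hypothetical Python sketch of capability-based agent selection; the capability formula and the group-size cap below are illustrative assumptions, not the paper's algorithm.

```python
# Hypothetical agent selection: rank nodes by an assumed capability score
# (bandwidth discounted by load) and cap the number of agents per group.

from dataclasses import dataclass

@dataclass
class Node:
    node_id: str
    bandwidth_mbps: float
    cpu_load: float      # 0..1, lower is better

def capability(n: Node) -> float:
    # Illustrative scoring: reward bandwidth, penalize load.
    return n.bandwidth_mbps * (1.0 - n.cpu_load)

def choose_agents(nodes, max_group_size=16):
    # Pick the highest-capability nodes, capped at the group size.
    ranked = sorted(nodes, key=capability, reverse=True)
    return ranked[:max_group_size]

nodes = [Node("a", 100, 0.2), Node("b", 40, 0.1), Node("c", 80, 0.9)]
print([n.node_id for n in choose_agents(nodes, max_group_size=2)])  # ['a', 'b']
```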


2020 ◽  
Vol 245 ◽  
pp. 03005
Author(s):  
Pascal Paschos ◽  
Benedikt Riedel ◽  
Mats Rynge ◽  
Lincoln Bryant ◽  
Judith Stephen ◽  
...  

In this paper we showcase the support in the Open Science Grid (OSG) for midscale collaborations, the region of computing and storage scale where multi-institutional researchers collaborate to execute their science workflows on the grid without dedicated technical support teams of their own. Collaboration Services enables such collaborations to take advantage of the distributed resources of the Open Science Grid by facilitating access to submission hosts, supporting the deployment of their applications, and meeting their data management requirements. Distributed computing software adopted from large-scale collaborations, such as CVMFS, Rucio, and xCache, lowers the barrier for intermediate-scale research to integrate with existing infrastructure.


2020 ◽  
Vol 14 (3) ◽  
pp. 320-328
Author(s):  
Long Guo ◽  
Lifeng Hua ◽  
Rongfei Jia ◽  
Fei Fang ◽  
Binqiang Zhao ◽  
...  

With the rapid growth of e-commerce in recent years, e-commerce platforms are becoming a primary place for people to find, compare, and ultimately purchase products. To improve the online shopping experience for consumers and increase sales for sellers, it is important to understand user intent accurately and to detect changes in it promptly; in this way, the right information can be offered to the right person at the right time. To achieve this goal, we propose a unified deep intent prediction network, named EdgeDIPN, which is deployed at the edge, i.e., on the mobile device, and is able to monitor multiple user intents at different granularities simultaneously in real time. We propose to train EdgeDIPN with multi-task learning, by which EdgeDIPN can share representations between different tasks for better performance while saving edge resources. In particular, we propose a novel task-specific attention mechanism which enables different tasks to pick out the most relevant features from different data sources. To extract the shared representations more effectively, we utilize two kinds of attention mechanisms: the multi-level attention mechanism identifies the important actions within each data source, and the inter-view attention mechanism learns the interactions between different data sources. In experiments conducted on a large-scale industrial dataset, EdgeDIPN significantly outperforms the baseline solutions. Moreover, EdgeDIPN has been deployed in the operational system of Alibaba. Online A/B testing results in several business scenarios reveal the potential of monitoring user intent in real time. To the best of our knowledge, EdgeDIPN is the first full-fledged real-time user intent understanding center deployed at the edge and serving hundreds of millions of users in a large-scale e-commerce platform.
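A minimal NumPy sketch of what a task-specific attention mechanism can look like; the query-vector formulation and all names here are illustrative assumptions, not the EdgeDIPN architecture.

```python
# Hypothetical task-specific attention: each task holds its own query vector
# and attends over a shared sequence of feature vectors, so different tasks
# pick out different features from the same data source.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def task_attention(features, task_query):
    # features: (seq_len, dim) shared representations from one data source
    # task_query: (dim,) per-task query vector (assumed learned in training)
    scores = features @ task_query            # (seq_len,) relevance scores
    weights = softmax(scores)                 # attention distribution
    return weights @ features                 # (dim,) task-specific summary

rng = np.random.default_rng(0)
features = rng.normal(size=(10, 16))          # 10 user actions, 16-dim each
purchase_query = rng.normal(size=16)          # query for a "purchase" task
click_query = rng.normal(size=16)             # query for a "click" task

print(task_attention(features, purchase_query).shape)  # (16,)
print(task_attention(features, click_query).shape)     # (16,)
```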


1995 ◽  
Vol 3 (5-6) ◽  
pp. 298-304 ◽  
Author(s):  
T. L. Kunii ◽  
Y. Shinagawa ◽  
R. M. Paul ◽  
M. F. Khan ◽  
A. A. Khokhar

2012 ◽  
Vol 20 (2) ◽  
pp. 89-114 ◽  
Author(s):  
H. Carter Edwards ◽  
Daniel Sunderland ◽  
Vicki Porter ◽  
Chris Amsler ◽  
Sam Mish

Large, complex scientific and engineering application codes have a significant investment in the computational kernels that implement their mathematical models. Porting these computational kernels to the collection of modern manycore accelerator devices is a major challenge, in that these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Kokkos Array programming model provides a library-based approach to implementing computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices. This programming model is based upon three fundamental concepts: (1) manycore compute devices, each with its own memory space, (2) data-parallel kernels, and (3) multidimensional arrays. Kernel execution performance, especially on NVIDIA® devices, is extremely dependent on data access patterns. The optimal data access pattern can differ between manycore devices, potentially leading to different implementations of computational kernels specialized for different devices. The Kokkos Array programming model supports performance-portable kernels by (1) separating data access patterns from computational kernels through a multidimensional array API and (2) introducing device-specific data access mappings when a kernel is compiled. An implementation of Kokkos Array is available through Trilinos [Trilinos website, http://trilinos.sandia.gov/, August 2011].
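Kokkos Array itself is a C++ library; the Python sketch below only illustrates the underlying idea of separating a kernel's logical indexing from a device-specific storage layout. The LayoutLeft/LayoutRight names mirror Kokkos terminology; everything else is an illustrative assumption.

```python
# Hypothetical layout-mapping sketch: the same kernel, written against
# logical indices (i, j), runs unchanged over either storage layout.

import numpy as np

def make_view(rows, cols, layout):
    # "LayoutRight" (row-major) suits CPU caches; "LayoutLeft"
    # (column-major) gives coalesced accesses for GPU thread blocks.
    order = "C" if layout == "LayoutRight" else "F"
    return np.zeros((rows, cols), order=order)

def fill_kernel(view, alpha):
    # The kernel never mentions the layout; the view maps (i, j) to the
    # device-appropriate memory location.
    rows, cols = view.shape
    for i in range(rows):
        for j in range(cols):
            view[i, j] = alpha * (i + j)

cpu_view = make_view(4, 4, "LayoutRight")   # row-major for the CPU
gpu_view = make_view(4, 4, "LayoutLeft")    # column-major, GPU-style
fill_kernel(cpu_view, 2.0)
fill_kernel(gpu_view, 2.0)
assert (cpu_view == gpu_view).all()          # same logical result either way
```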


2018 ◽  
Vol 2018 ◽  
pp. 1-16
Author(s):  
Jun Long ◽  
Lei Zhu ◽  
Zhan Yang ◽  
Chengyuan Zhang ◽  
Xinpan Yuan

Vast amounts of multimedia data contain massive and multifarious social information, which is used to construct large-scale social networks. In a complex social network, a character should ideally be denoted by one and only one vertex. However, it is pervasive that a character is denoted by two or more vertices with different names, and is thus usually treated as multiple distinct characters. This problem causes incorrect results in network analysis and mining. The core challenge is that character uniqueness is hard to confirm correctly due to many complicating factors, such as name changes and anonymization, which lead to character duplication. The limited early research in this area depended overly on supplementary attribute information from databases. In this paper, we propose a novel method to merge character vertices that refer to the same entity but are denoted by different names. With this method, we first build the relationship network among characters based on records of participation in social activities, extracted from multimedia sources. Then we define temporal activity paths (TAPs) for each character over time. After that, we measure the similarity of the TAPs for any two characters; if the similarity is high enough, the two vertices are considered to be the same character. Based on TAPs, we can thus determine whether to merge two character vertices. Our experiments show that this solution can accurately confirm character uniqueness in large-scale social networks.
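A hypothetical Python sketch of TAP comparison; the paper's similarity measure is not reproduced here, and Jaccard overlap over time-bucketed activities is a stand-in assumption, as are the threshold and all names.

```python
# Hypothetical TAP comparison: each character's TAP is a time-ordered list
# of (time bucket, activity) pairs; two vertices are merged when the
# Jaccard overlap of their TAPs exceeds a threshold.

def tap_similarity(tap_a, tap_b):
    a, b = set(tap_a), set(tap_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def should_merge(tap_a, tap_b, threshold=0.7):
    # Merge the two vertices when their TAPs are similar enough.
    return tap_similarity(tap_a, tap_b) >= threshold

# Two name variants of (possibly) the same character:
tap_smith = [(2015, "conf_A"), (2016, "conf_B"), (2017, "project_X")]
tap_j_smith = [(2015, "conf_A"), (2016, "conf_B"), (2017, "project_X"),
               (2018, "conf_C")]
print(tap_similarity(tap_smith, tap_j_smith))  # 0.75
print(should_merge(tap_smith, tap_j_smith))    # True: treat as one vertex
```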

