MINIMIZING LATENCY AND JITTER FOR LARGE-SCALE MULTIMEDIA REPOSITORIES THROUGH PREFIX CACHING

2003 ◽  
Vol 03 (01) ◽  
pp. 95-117 ◽  
Author(s):  
SUNIL PRABHAKAR ◽  
RAHUL CHARI

Multimedia data poses challenges for efficient storage and retrieval due to its large size and playback timing requirements. For applications that store very large volumes of multimedia data, hierarchical storage offers a scalable and economical alternative to storing all data on magnetic disks. In a hierarchical storage architecture, data is stored on a tape- or optical-disk-based tertiary storage layer, with the secondary storage disks serving as a cache or buffer. Due to the need for swapping media on drives, retrieving multimedia data from tertiary storage can result in large delays both before playback begins (startup latency) and during playback (jitter). In this paper we address the important problem of reducing startup latency and jitter for very large multimedia repositories. We propose that secondary storage should not be used as a cache in the traditional manner; instead, most of it should be used to store partial objects permanently. Furthermore, replication is employed at the tertiary storage level to avoid expensive media switching. In particular, we show that by saving the initial segments of documents permanently on secondary storage, and replicating them on tertiary storage, startup latency can be significantly reduced. Since this effectively reduces the amount of secondary storage available for buffering data from tertiary storage, an increase in jitter might be expected; our results show, however, that the technique reduces jitter as well. The technique exploits the pattern of data access. Advance knowledge of the access pattern is helpful but not essential: a lack of this information, or changes in access patterns, is handled through adaptive techniques. Our study addresses both single- and multiple-user scenarios. Our results show that startup latency can be reduced by as much as 75% and jitter practically eliminated through the use of these techniques.
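A minimal Python sketch of the prefix-caching idea: the initial segment of each object is pinned on secondary storage so playback starts from disk while the suffix is staged from tertiary storage. All constants and names below are illustrative assumptions, not values from the paper.

```python
# Hypothetical prefix-caching sketch: pinned prefixes absorb both the
# startup latency and the media-switch delay of tertiary storage.

PREFIX_SECONDS = 30            # assumed prefix length pinned on disk
TERTIARY_STAGE_DELAY = 20.0    # assumed media-switch + load time (seconds)

class PrefixCache:
    def __init__(self):
        self.pinned = {}       # object id -> prefix bytes (always on disk)

    def pin_prefix(self, obj_id, prefix_data):
        self.pinned[obj_id] = prefix_data

    def startup_latency(self, obj_id):
        # If the prefix is pinned, playback starts from disk immediately;
        # otherwise the user waits for tertiary storage.
        return 0.0 if obj_id in self.pinned else TERTIARY_STAGE_DELAY

    def jitter_free(self, obj_id, playback_rate=1.0):
        # Playback is jitter-free when the pinned prefix plays for at least
        # as long as it takes to stage the suffix from tertiary storage.
        prefix_playtime = PREFIX_SECONDS / playback_rate
        return obj_id in self.pinned and prefix_playtime >= TERTIARY_STAGE_DELAY

cache = PrefixCache()
cache.pin_prefix("doc42", prefix_data=b"...")   # placeholder bytes
print(cache.startup_latency("doc42"))           # 0.0: served from disk
print(cache.jitter_free("doc42"))               # True under these numbers
```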

2014 ◽  
Vol 1030-1032 ◽  
pp. 1619-1622
Author(s):  
Bing Xin Zhu ◽  
Jing Tao Li

In large-scale storage systems, the computing, transfer, and storage devices differ physically both in performance and in characteristics such as reliability. The data-access load placed on storage devices is likewise non-uniform, varying widely across space and time. Storing all data on high-performance devices is therefore unrealistic and unwise. The concept of hierarchical storage effectively solves this problem: it monitors data-access loads and optimally configures storage resources according to the load and application requirements [1]. Traditional classification policies generally target file data, classifying files by access frequency or a file I/O heat index. Starting from the concept of website user value, and addressing the disadvantages of traditional data-classification strategies, this paper proposes a centralized data-classification strategy based on user value.
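A hypothetical Python sketch of a value-based tiering policy in this spirit; the scoring weights, thresholds, and tier names are illustrative assumptions, not the paper's strategy.

```python
# Hypothetical tiering by a combined I/O-heat and user-value score:
# higher-scoring files are placed on faster tiers.

from dataclasses import dataclass

@dataclass
class FileStats:
    access_freq: float   # accesses per day (I/O heat)
    user_value: float    # assumed aggregate value of accessing users, 0..1

def tier_for(stats: FileStats, heat_weight=0.5):
    # Weighted score; the weight and thresholds are illustrative.
    score = heat_weight * stats.access_freq + (1 - heat_weight) * 100 * stats.user_value
    if score > 50:
        return "ssd"       # hot tier
    elif score > 10:
        return "disk"      # warm tier
    return "tape"          # cold tier

print(tier_for(FileStats(access_freq=120, user_value=0.9)))  # ssd
print(tier_for(FileStats(access_freq=2, user_value=0.1)))    # tape
```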


2018 ◽  
Vol 12 (2) ◽  
pp. 157-176
Author(s):  
Claudia Yogeswaran ◽  
Kearsy Cormier

In this paper we provide a case study of the creation of the DCAL Research Data Archive at University College London. In doing so, we assess the various challenges associated with archiving large-scale legacy multimedia research data, given the lack of literature on archiving such datasets. We address issues such as the anonymisation of video research data, the ethical challenges of managing legacy data and historic consent, ownership considerations, and the handling of large multimedia files, as well as the complexity of multi-project data from a number of researchers and of legacy data spanning eleven years of research.


2013 ◽  
Vol 380-384 ◽  
pp. 1995-1998
Author(s):  
Shao Ming Pan ◽  
Hong Li ◽  
Ge Tang

The hierarchical storage strategy can be adjusted using the access patterns of spatial data, which significantly improves the performance of spatial data services. However, access and distribution patterns of spatial data derived from Hotmap and Zipf-like models cannot reflect global information. This paper proposes a P2P-based dynamic statistics algorithm for the distribution of spatial data. The algorithm calculates the service capability of each service node and preferentially chooses node agents with good service capability within each group, while keeping the group size under control. Experimental results show that the algorithm improves performance by about 28% compared with random node selection, and that it efficiently meets the need for dynamic statistics in large-scale distributed environments.
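A hypothetical Python sketch of capability-based agent selection; the capability formula and the group-size cap below are illustrative assumptions, not the paper's algorithm.

```python
# Hypothetical agent selection: rank nodes by an assumed capability score
# (bandwidth discounted by load) and cap the number of agents per group.

from dataclasses import dataclass

@dataclass
class Node:
    node_id: str
    bandwidth_mbps: float
    cpu_load: float      # 0..1, lower is better

def capability(n: Node) -> float:
    # Illustrative scoring: reward bandwidth, penalize load.
    return n.bandwidth_mbps * (1.0 - n.cpu_load)

def choose_agents(nodes, max_group_size=16):
    # Pick the highest-capability nodes, capped at the group size.
    ranked = sorted(nodes, key=capability, reverse=True)
    return ranked[:max_group_size]

nodes = [Node("a", 100, 0.2), Node("b", 40, 0.1), Node("c", 80, 0.9)]
print([n.node_id for n in choose_agents(nodes, max_group_size=2)])  # ['a', 'b']
```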


2020 ◽  
Vol 245 ◽  
pp. 03005
Author(s):  
Pascal Paschos ◽  
Benedikt Riedel ◽  
Mats Rynge ◽  
Lincoln Bryant ◽  
Judith Stephen ◽  
...  

In this paper we showcase the support in the Open Science Grid (OSG) for midscale collaborations, the region of computing and storage scale where multi-institutional researchers collaborate to execute their science workflows on the grid without dedicated technical support teams of their own. Collaboration Services enables such collaborations to take advantage of the distributed resources of the Open Science Grid by facilitating access to submission hosts, supporting the deployment of their applications, and meeting their data management requirements. Distributed computing software adopted from large-scale collaborations, such as CVMFS, Rucio, and xCache, lowers the barrier for intermediate-scale research to integrate with existing infrastructure.


2020 ◽  
Vol 14 (3) ◽  
pp. 320-328
Author(s):  
Long Guo ◽  
Lifeng Hua ◽  
Rongfei Jia ◽  
Fei Fang ◽  
Binqiang Zhao ◽  
...  

With the rapid growth of e-commerce in recent years, e-commerce platforms are becoming a primary place for people to find, compare, and ultimately purchase products. To improve the online shopping experience for consumers and increase sales for sellers, it is important to understand user intent accurately and to detect changes in it promptly; in this way, the right information can be offered to the right person at the right time. To achieve this goal, we propose a unified deep intent prediction network, named EdgeDIPN, which is deployed at the edge, i.e., on the mobile device, and is able to monitor multiple user intents at different granularities simultaneously in real time. We propose to train EdgeDIPN with multi-task learning, by which EdgeDIPN can share representations between different tasks for better performance while saving edge resources. In particular, we propose a novel task-specific attention mechanism which enables different tasks to pick out the most relevant features from different data sources. To extract the shared representations more effectively, we utilize two kinds of attention mechanisms: the multi-level attention mechanism identifies the important actions within each data source, and the inter-view attention mechanism learns the interactions between different data sources. In experiments conducted on a large-scale industrial dataset, EdgeDIPN significantly outperforms the baseline solutions. Moreover, EdgeDIPN has been deployed in the operational system of Alibaba. Online A/B testing results in several business scenarios reveal the potential of monitoring user intent in real time. To the best of our knowledge, EdgeDIPN is the first full-fledged real-time user intent understanding center deployed at the edge and serving hundreds of millions of users in a large-scale e-commerce platform.
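A minimal NumPy sketch of what a task-specific attention mechanism can look like; the query-vector formulation and all names here are illustrative assumptions, not the EdgeDIPN architecture.

```python
# Hypothetical task-specific attention: each task holds its own query vector
# and attends over a shared sequence of feature vectors, so different tasks
# pick out different features from the same data source.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def task_attention(features, task_query):
    # features: (seq_len, dim) shared representations from one data source
    # task_query: (dim,) per-task query vector (assumed learned in training)
    scores = features @ task_query            # (seq_len,) relevance scores
    weights = softmax(scores)                 # attention distribution
    return weights @ features                 # (dim,) task-specific summary

rng = np.random.default_rng(0)
features = rng.normal(size=(10, 16))          # 10 user actions, 16-dim each
purchase_query = rng.normal(size=16)          # query for a "purchase" task
click_query = rng.normal(size=16)             # query for a "click" task

print(task_attention(features, purchase_query).shape)  # (16,)
print(task_attention(features, click_query).shape)     # (16,)
```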


1995 ◽  
Vol 3 (5-6) ◽  
pp. 298-304 ◽  
Author(s):  
T. L. Kunii ◽  
Y. Shinagawa ◽  
R. M. Paul ◽  
M. F. Khan ◽  
A. A. Khokhar

2012 ◽  
Vol 20 (2) ◽  
pp. 89-114 ◽  
Author(s):  
H. Carter Edwards ◽  
Daniel Sunderland ◽  
Vicki Porter ◽  
Chris Amsler ◽  
Sam Mish

Large, complex scientific and engineering application codes have a significant investment in the computational kernels that implement their mathematical models. Porting these computational kernels to the collection of modern manycore accelerator devices is a major challenge, in that these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Kokkos Array programming model provides a library-based approach to implementing computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices. This programming model is based upon three fundamental concepts: (1) manycore compute devices, each with its own memory space, (2) data-parallel kernels, and (3) multidimensional arrays. Kernel execution performance, especially on NVIDIA® devices, is extremely dependent on data access patterns. The optimal data access pattern can differ between manycore devices, potentially leading to different implementations of computational kernels specialized for different devices. The Kokkos Array programming model supports performance-portable kernels by (1) separating data access patterns from computational kernels through a multidimensional array API and (2) introducing device-specific data access mappings when a kernel is compiled. An implementation of Kokkos Array is available through Trilinos [Trilinos website, http://trilinos.sandia.gov/, August 2011].
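Kokkos Array itself is a C++ library; the Python sketch below only illustrates the underlying idea of separating a kernel's logical indexing from a device-specific storage layout. The LayoutLeft/LayoutRight names mirror Kokkos terminology; everything else is an illustrative assumption.

```python
# Hypothetical layout-mapping sketch: the same kernel, written against
# logical indices (i, j), runs unchanged over either storage layout.

import numpy as np

def make_view(rows, cols, layout):
    # "LayoutRight" (row-major) suits CPU caches; "LayoutLeft"
    # (column-major) gives coalesced accesses for GPU thread blocks.
    order = "C" if layout == "LayoutRight" else "F"
    return np.zeros((rows, cols), order=order)

def fill_kernel(view, alpha):
    # The kernel never mentions the layout; the view maps (i, j) to the
    # device-appropriate memory location.
    rows, cols = view.shape
    for i in range(rows):
        for j in range(cols):
            view[i, j] = alpha * (i + j)

cpu_view = make_view(4, 4, "LayoutRight")   # row-major for the CPU
gpu_view = make_view(4, 4, "LayoutLeft")    # column-major, GPU-style
fill_kernel(cpu_view, 2.0)
fill_kernel(gpu_view, 2.0)
assert (cpu_view == gpu_view).all()          # same logical result either way
```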


2018 ◽  
Vol 2018 ◽  
pp. 1-16
Author(s):  
Jun Long ◽  
Lei Zhu ◽  
Zhan Yang ◽  
Chengyuan Zhang ◽  
Xinpan Yuan

Vast amounts of multimedia data contain massive and multifarious social information, which is used to construct large-scale social networks. In a complex social network, a character should ideally be denoted by one and only one vertex. However, it is pervasive that a character is denoted by two or more vertices with different names, and is thus usually treated as multiple distinct characters. This problem causes incorrect results in network analysis and mining. The core challenge is that character uniqueness is hard to confirm correctly due to many complicating factors, such as name changes and anonymization, which lead to character duplication. The limited early research in this area depended overly on supplementary attribute information from databases. In this paper, we propose a novel method to merge character vertices that refer to the same entity but are denoted by different names. With this method, we first build the relationship network among characters based on records of participation in social activities, extracted from multimedia sources. Then we define temporal activity paths (TAPs) for each character over time. After that, we measure the similarity of the TAPs for any two characters; if the similarity is high enough, the two vertices are considered to be the same character. Based on TAPs, we can thus determine whether to merge two character vertices. Our experiments show that this solution can accurately confirm character uniqueness in large-scale social networks.
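A hypothetical Python sketch of TAP comparison; the paper's similarity measure is not reproduced here, and Jaccard overlap over time-bucketed activities is a stand-in assumption, as are the threshold and all names.

```python
# Hypothetical TAP comparison: each character's TAP is a time-ordered list
# of (time bucket, activity) pairs; two vertices are merged when the
# Jaccard overlap of their TAPs exceeds a threshold.

def tap_similarity(tap_a, tap_b):
    a, b = set(tap_a), set(tap_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def should_merge(tap_a, tap_b, threshold=0.7):
    # Merge the two vertices when their TAPs are similar enough.
    return tap_similarity(tap_a, tap_b) >= threshold

# Two name variants of (possibly) the same character:
tap_smith = [(2015, "conf_A"), (2016, "conf_B"), (2017, "project_X")]
tap_j_smith = [(2015, "conf_A"), (2016, "conf_B"), (2017, "project_X"),
               (2018, "conf_C")]
print(tap_similarity(tap_smith, tap_j_smith))  # 0.75
print(should_merge(tap_smith, tap_j_smith))    # True: treat as one vertex
```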

