data loading
Recently Published Documents


TOTAL DOCUMENTS: 80 (FIVE YEARS: 20)

H-INDEX: 5 (FIVE YEARS: 0)

Author(s):  
Guixiang Lv ◽  
Liudong Xing

During the coronavirus pandemic, telecommuting has been widely required, making remote data access grow significantly and demanding highly reliable data storage solutions. Storage area networks (SANs) are one such solution. To guarantee that SANs can deliver the desired quality of service, cascading failures, which occur when a single initial incident triggers a cascade of unexpected failures in other devices, must be prevented. One such incident is data loading/overloading, which can cause one device to malfunction and then trigger further cascading failures. Thus, it is crucial to address the influence of data loading in SAN reliability modeling and analysis. In this work, we model the effects of data loading on the reliability of an individual switch device in SANs through the proportional-hazards model and the accelerated failure-time model. The effects of loading on the reliability of the entire SAN are further investigated through dynamic fault tree and binary decision diagram-based analysis of a mesh SAN system.
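The two load models differ in how a covariate such as data load enters the reliability function: the proportional-hazards (PH) model scales the baseline hazard rate, while the accelerated failure-time (AFT) model rescales time itself. A minimal sketch, assuming a Weibull baseline; the parameter values (beta, scale, shape) are illustrative, not taken from the paper:

```python
import numpy as np

# Sketch (not the authors' code): reliability of a single SAN switch under a
# normalized data load z, with a Weibull baseline. All parameters are assumed.

def reliability_ph(t, load, beta=0.8, scale=5e4, shape=1.5):
    """Proportional hazards: load multiplies the baseline hazard rate.
    R(t | z) = exp(-H0(t) * exp(beta * z)), H0 = Weibull cumulative hazard."""
    h0 = (t / scale) ** shape                  # baseline cumulative hazard
    return np.exp(-h0 * np.exp(beta * load))

def reliability_aft(t, load, beta=0.8, scale=5e4, shape=1.5):
    """Accelerated failure time: load rescales time itself.
    R(t | z) = R0(t * exp(beta * z))."""
    t_eff = t * np.exp(beta * load)            # load accelerates ageing
    return np.exp(-(t_eff / scale) ** shape)

t = 1e4  # operating hours
for z in (0.0, 0.5, 1.0):                      # normalized load levels
    print(f"load={z}: PH={reliability_ph(t, z):.4f}  AFT={reliability_aft(t, z):.4f}")
```

Under PH the load multiplies the hazard at every instant, while under AFT the same load makes the device age faster; for non-exponential baselines the two models generally yield different reliability curves.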


2021 ◽  
pp. 1005-1099
Author(s):  
Darl Kuhn ◽  
Thomas Kyte

Author(s):  
Cong Ding ◽  
Dixin Tang ◽  
Xi Liang ◽  
Aaron J. Elmore ◽  
Sanjay Krishnan

2021 ◽  
Vol 14 (5) ◽  
pp. 771-784
Author(s):  
Jayashree Mohan ◽  
Amar Phanishayee ◽  
Ashish Raniwala ◽  
Vijay Chidambaram

Training Deep Neural Networks (DNNs) is resource-intensive and time-consuming. While prior research has explored many ways of reducing DNN training time, the impact of the input data pipeline, i.e., fetching raw data items from storage and performing data pre-processing in memory, has been relatively unexplored. This paper makes the following contributions: (1) We present the first comprehensive analysis of how the input data pipeline affects the training time of widely-used computer vision and audio DNNs, which typically involve complex data pre-processing. We analyze nine different models across three tasks and four datasets while varying factors such as the amount of memory, number of CPU threads, storage device, and GPU generation on servers that are part of a large production cluster at Microsoft. We find that in many cases, DNN training time is dominated by data stall time: time spent waiting for data to be fetched and pre-processed. (2) We build a tool, DS-Analyzer, to precisely measure data stalls using a differential technique and to perform predictive what-if analysis on data stalls. (3) Finally, based on the insights from our analysis, we design and implement three simple but effective techniques in a data-loading library, CoorDL, to mitigate data stalls. Our experiments on a range of DNN tasks, models, datasets, and hardware configurations show that when PyTorch uses CoorDL instead of the state-of-the-art DALI data-loading library, DNN training time is reduced significantly (by as much as 5X on a single server).
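Data stall time can be approximated even without a dedicated tool by timing how long the training loop blocks waiting on the data loader versus how long it spends in compute. A minimal PyTorch sketch, not DS-Analyzer itself; the dataset, model, and loader settings are placeholder assumptions:

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Sketch: split wall-clock time into "data stall" (blocked on the loader)
# and "compute" (forward/backward/step). Dataset and model are toy stand-ins.
dataset = TensorDataset(torch.randn(4096, 3, 32, 32), torch.randint(0, 10, (4096,)))
loader = DataLoader(dataset, batch_size=128, num_workers=2)
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

fetch_time = compute_time = 0.0
end = time.perf_counter()
for x, y in loader:                  # blocking here is the data stall
    fetch_time += time.perf_counter() - end
    start = time.perf_counter()
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
    compute_time += time.perf_counter() - start
    end = time.perf_counter()

print(f"stall: {fetch_time:.2f}s  compute: {compute_time:.2f}s")
```

If stall time dominates compute time, the run is input-bound, and remedies such as more pre-processing threads, caching, or a coordinated data-loading library like CoorDL are the appropriate lever.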


2021 ◽  
pp. 0724-0731
Author(s):  
Alexander Suleykin ◽  
Anna Bobkova ◽  
Peter Panfilov ◽  
Ilya Chumakov
Keyword(s):  

2020 ◽  
pp. 71-86
Author(s):  
Soledad Araya ◽  
Andrés Cruz
Keyword(s):  

10.2196/15918 ◽  
2020 ◽  
Vol 8 (7) ◽  
pp. e15918
Author(s):  
Helmut Spengler ◽  
Claudia Lang ◽  
Tanmaya Mahapatra ◽  
Ingrid Gatz ◽  
Klaus A Kuhn ◽  
...  

Background: Modern data-driven medical research provides new insights into the development and course of diseases and enables novel methods of clinical decision support. Clinical and translational data warehouses, such as Informatics for Integrating Biology and the Bedside (i2b2) and tranSMART, are important infrastructure components that provide users with unified access to the large, heterogeneous data sets needed to realize this, and support use cases such as cohort selection, hypothesis generation, and ad hoc data analysis.

Objective: Often, different warehousing platforms are needed to support different use cases and different types of data. Moreover, to achieve an optimal data representation within the target systems, specific domain knowledge is needed when designing data-loading processes. Consequently, informaticians need to work closely with clinicians and researchers in short iterations. This is a challenging task, as installing and maintaining warehousing platforms can be complex and time-consuming. Furthermore, data loading typically requires significant effort in terms of data preprocessing, cleansing, and restructuring. The platform described in this study aims to address these challenges.

Methods: We formulated system requirements to achieve agility in terms of platform management and data loading. The derived system architecture includes a cloud infrastructure with unified management interfaces for multiple warehouse platforms and a data-loading pipeline with a declarative configuration paradigm and a meta-loading approach. The latter compiles data and configuration files into the forms required by existing loading tools, thereby automating a wide range of data restructuring and cleansing tasks. We demonstrated the fulfillment of the requirements and the originality of our approach through an experimental evaluation and a comparison with previous work.

Results: The platform supports both i2b2 and tranSMART with built-in security. Our experiments showed that the loading pipeline accepts input data that cannot be loaded with existing tools without preprocessing. Moreover, it reduced effort significantly, shrinking the required configuration files by factors of up to 22 for tranSMART and 1135 for i2b2. The time required for the compilation process was roughly equivalent to the time required for the actual data loading. A comparison with other tools showed that our solution was the only one fulfilling all requirements.

Conclusions: Our platform significantly reduces the effort required for managing clinical and translational warehouses and for loading data in various formats and structures, such as the complex entity-attribute-value structures often found in laboratory data. Moreover, it facilitates the iterative refinement of data representations in the target platforms, as the required configuration files are very compact. The quantitative measurements presented are consistent with our experience of significantly reduced effort when building warehousing platforms in close cooperation with medical researchers. Both the cloud-based hosting infrastructure and the data-loading pipeline are available to the community as open-source software with comprehensive documentation.
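The meta-loading idea, compiling a compact declarative configuration plus raw input into the form an existing loading tool expects, can be illustrated with the entity-attribute-value (EAV) case mentioned above. A minimal sketch, not the platform's actual code; the config keys, column names, and data are illustrative assumptions:

```python
import csv
import io

# Sketch of a "meta-loading" compile step: a small declarative config plus
# EAV lab data is pivoted into the wide, per-patient table a loading tool
# might expect. Everything here (keys, columns, values) is made up.
config = {
    "id_column": "patient_id",
    "attributes": ["sodium", "potassium"],   # which EAV attributes to pivot
}

eav_input = """patient_id,attribute,value
p1,sodium,140
p1,potassium,4.1
p2,sodium,138
"""

rows = {}
for rec in csv.DictReader(io.StringIO(eav_input)):
    pid = rec[config["id_column"]]
    if rec["attribute"] in config["attributes"]:
        rows.setdefault(pid, {})[rec["attribute"]] = rec["value"]

# Emit one wide row per patient; missing values stay empty.
header = [config["id_column"]] + config["attributes"]
print(",".join(header))
for pid, vals in rows.items():
    print(",".join([pid] + [vals.get(a, "") for a in config["attributes"]]))
```

The design point is that the hand-written artifact stays small and declarative (which attributes to pivot), while the repetitive per-row restructuring and cleansing is generated automatically, which is what keeps the configuration files compact.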

