Towards a Holistic Approach to Fault Management

Author(s):  
Moises Goldszmidt ◽  
Miroslaw Malek ◽  
Simin Nadjm-Tehrani ◽  
Priya Narasimhan ◽  
Felix Salfner ◽  
...  

Systems with high dependability requirements increasingly rely on complex on-line fault management systems. Such fault management systems combine multiple steps (monitoring, data analysis, planning, and execution) that are typically developed and optimized independently. We argue that it is inefficient and ineffective to improve any particular fault management step without taking into account its interactions and dependencies with the rest of the steps. Through six real-life examples, we demonstrate this inefficiency and how it results in systems that either under-perform or are over budget. We propose a holistic approach to fault management that is aware of all relevant aspects and explicitly considers the couplings between the different fault management steps. We believe it will produce systems that better meet cost, performance, and dependability objectives.

2008 ◽  
Vol 16 (2-3) ◽  
pp. 205-216
Author(s):  
Bartosz Balis ◽  
Marian Bubak ◽  
Bartłomiej Łabno

Scientific workflows are a means of conducting in silico experiments in modern computing infrastructures for e-Science, often built on top of Grids. Monitoring of Grid scientific workflows is essential not only for performance analysis but also for collecting provenance data and gathering feedback useful in future decisions, e.g., those related to optimization of resource usage. In this paper, basic problems related to monitoring of Grid scientific workflows are discussed. Being highly distributed, loosely coupled in space and time, heterogeneous, and heavily reliant on legacy codes, workflows are exceptionally challenging from the monitoring point of view. We propose a Grid monitoring architecture for scientific workflows. The problem of correlating monitoring data is described, and an algorithm for on-line distributed collection of monitoring data is proposed. We demonstrate a prototype implementation of the proposed workflow monitoring architecture, the GEMINI monitoring system, and its use for monitoring a real-life scientific workflow.


Author(s):  
M.I. Martyshov ◽  
D.A. Nikitenko

HPC systems are architecturally complex and contain millions of components. To ensure reliable operation and efficient output, the functioning of most subsystems should be supervised. This is done on the basis of data collected from various logging and monitoring systems. Because different data sources are used, data analysis can run into multiple issues when processing this data. Some data subsets can be incorrect due to malfunctioning sensors, data aggregation errors in the monitoring system, etc. This is why it is crucial to preprocess such monitoring data before analyzing it, taking the analysis goals into consideration. Based on monitoring data from the MSU HPC Center, this paper proposes an approach to preprocessing data from HPC monitoring systems, gives some real-life examples of issues that may be faced, and offers recommendations for further analysis of similar datasets.


1991 ◽  
Vol 30 (01) ◽  
pp. 53-64 ◽  
Author(s):  
R. Schosser ◽  
C. Weiss ◽  
K. Messmer

This report focusses on the planning and realization of an interdisciplinary local area network (LAN) for medical research at the University of Heidelberg. After a detailed requirements analysis, several networks were evaluated by means of a test installation, and a cost-performance analysis was carried out. At present, the LAN connects 45 (IBM-compatible) PCs and several heterogeneous mainframes (IBM, DEC and Siemens), and provides access to the public X.25 network and to wide-area networks for research (EARN, BITNET). The network supports application software that is frequently needed in medical research (word processing, statistics, graphics, literature databases and services, etc.). Compliance with existing “official” (e.g., IEEE 802.3) and “de facto” standards (e.g., PostScript) was considered to be extremely important for the selection of both hardware and software. Customized programs were developed to improve access control, user interface and on-line help. Wide acceptance of the LAN was achieved through extensive education and maintenance facilities, e.g., teaching courses, customized manuals and a hotline service. Since the requirements of clinical routine differ substantially from medical research needs, two separate networks (with a gateway in between) are proposed as a solution to optimally satisfy the users’ demands.


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 405
Author(s):  
Marcos Lupión ◽  
Javier Medina-Quero ◽  
Juan F. Sanjuan ◽  
Pilar M. Ortigosa

Activity Recognition (AR) is an active research topic focused on detecting human actions and behaviours in smart environments. In this work, we present the on-line activity recognition platform DOLARS (Distributed On-line Activity Recognition System), where data from heterogeneous sensors, including binary, wearable and location sensors, are evaluated in real time. Different descriptors and metrics from the heterogeneous sensor data are integrated in a common feature vector, which is extracted using a sliding window approach under real-time conditions. DOLARS provides a distributed architecture where: (i) stages for processing data in AR are deployed in distributed nodes; (ii) temporal cache modules compute metrics which aggregate sensor data for computing feature vectors in an efficient way; (iii) publish-subscribe models are integrated both to spread data from sensors and to orchestrate the nodes (communication and replication) for computing AR; and (iv) machine learning algorithms are used to classify and recognize the activities. A successful case study of daily activity recognition developed in the Smart Lab of The University of Almería (UAL) is presented in this paper. Results show encouraging performance in the recognition of sequences of activities and show the need for distributed architectures to achieve real-time recognition.
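
As a rough illustration of the sliding window idea mentioned in this abstract, the following Python sketch keeps recent sensor events in a buffer and aggregates them into one feature vector per evaluation time. It is not the DOLARS implementation: the class name `SlidingWindowFeatures`, the sensor names, the choice of maximum and mean as metrics, and the assumption that events arrive in time order are all hypothetical.

```python
from collections import deque


class SlidingWindowFeatures:
    """Hypothetical helper: aggregates recent sensor events into a feature vector."""

    def __init__(self, sensors, window_seconds):
        self.sensors = list(sensors)          # fixed sensor order defines the vector layout
        self.window_seconds = window_seconds  # length of the sliding window
        self.events = deque()                 # (timestamp, sensor, value), assumed time-ordered

    def add(self, timestamp, sensor, value):
        """Store one sensor reading."""
        self.events.append((timestamp, sensor, value))

    def feature_vector(self, now):
        """Return [max, mean] per sensor over the window (now - window_seconds, now]."""
        # Evict events that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()

        features = []
        for sensor in self.sensors:
            values = [v for t, name, v in self.events if name == sensor and t <= now]
            # Sensors with no reading in the window contribute zeros (an assumption).
            features.append(max(values) if values else 0.0)
            features.append(sum(values) / len(values) if values else 0.0)
        return features
```

A caller would feed readings with `add(...)` as they arrive and call `feature_vector(now)` at each evaluation step; the resulting list could then be passed to any classifier.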


Author(s):  
Sebastiaan A. Pronk ◽  
Simone L. Gorter ◽  
Scheltus J. van Luijk ◽  
Pieter C. Barnhoorn ◽  
Beer Binkhorst ◽  
...  

Introduction: Behaviour is visible in real-life events, but also on social media. While some national medical organizations have published social media guidelines, the number of studies on professional social media use in medical education is limited. This study aims to explore social media use among medical students, residents and medical specialists. Methods: An anonymous, online survey was sent to 3844 medical students at two Dutch medical schools, 828 residents and 426 medical specialists. Quantitative, descriptive analysis of demographic data, yes/no questions and Likert scale questions was performed using SPSS. Qualitative data analysis was performed iteratively and independently by two researchers, applying the principles of constant comparison and open and axial coding until consensus was reached. Results: The overall response rate was 24.8%. Facebook was most popular among medical students and residents; LinkedIn was most popular among medical specialists. Personal pictures and/or information about themselves on social media that were perceived as unprofessional were reported by 31.3% of students, 19.7% of residents and 4.1% of medical specialists. Information and pictures related to alcohol abuse, partying, clinical work or of a sexually suggestive character were considered inappropriate. Whether colleagues were addressed about their unprofessional posts was perceived to depend mainly on the nature and hierarchy of the interprofessional relation. Discussion: Participants widely perceive unprofessional information on social media to be common among themselves and their colleagues. Medical educators should create awareness of the risks of unprofessional (online) behaviour among healthcare professionals, as well as of the necessity and ways of addressing colleagues in case of such lapses.


2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Ting Qian ◽  
Ling Wei

As an important tool for data analysis and knowledge processing, formal concept analysis (FCA) has been applied to many fields. In this paper, we introduce a new method for finding all formal concepts of a formal context. The method reduces the amount of computation needed to calculate intents, and a corresponding algorithm is proposed. The main theorems and the algorithm are each illustrated by examples. Finally, several real-life databases are analyzed to demonstrate the application of the proposed approach. Experimental results show that the proposed approach is simple and effective.
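
To make "finding all formal concepts" concrete, here is a minimal Python sketch of the textbook brute-force approach: close every attribute subset and collect the distinct (extent, intent) pairs. This is not the reduced-computation method the abstract proposes, which is not described here; the function names and the toy context are assumptions for illustration only.

```python
from itertools import combinations


def derive_objects(context, attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, has in context.items() if attrs <= has)


def derive_attrs(context, objs, all_attrs):
    """Attributes shared by every object in objs."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)


def formal_concepts(context):
    """Naive enumeration: exponential in the number of attributes."""
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for size in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), size):
            extent = derive_objects(context, frozenset(subset))
            intent = derive_attrs(context, extent, all_attrs)
            concepts.add((extent, intent))  # duplicates collapse in the set
    return concepts


# Toy formal context (hypothetical): object -> set of attributes it has.
example_context = {
    "duck": {"flies", "swims"},
    "eagle": {"flies"},
    "carp": {"swims"},
}
example_concepts = formal_concepts(example_context)
```

Each resulting pair is a formal concept because the intent is exactly the set of attributes common to the extent, and the extent is exactly the set of objects having all attributes of the intent. The brute-force loop is only practical for small contexts, which is the kind of cost that more refined algorithms aim to reduce.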

