A Survey on Automated Log Analysis for Reliability Engineering

2021 ◽  
Vol 54 (6) ◽  
pp. 1-37
Author(s):  
Shilin He ◽  
Pinjia He ◽  
Zhuangbin Chen ◽  
Tianyi Yang ◽  
Yuxin Su ◽  
...  

Logs are semi-structured text generated by logging statements in software source code. In recent decades, software logs have become imperative in the reliability assurance mechanism of many software systems, because they are often the only data available that record software runtime information. As modern software is evolving into a large scale, the volume of logs has increased rapidly. To enable effective and efficient usage of modern software logs in reliability engineering, a number of studies have been conducted on automated log analysis. This survey presents a detailed overview of automated log analysis research, including how to automate and assist the writing of logging statements, how to compress logs, how to parse logs into structured event templates, and how to employ logs to detect anomalies, predict failures, and facilitate diagnosis. Additionally, we survey work that releases open-source toolkits and datasets. Based on the discussion of the recent advances, we present several promising future directions toward real-world and next-generation automated log analysis.

Author(s):  
Jeff Elpern ◽  
Sergiu Dascalu

Traditional software engineering methodologies have mostly evolved from the environment of proprietary, large-scale software systems. Here, software design principles operate within a hierarchical decision-making context. Development of banking, enterprise resource and complex weapons systems all fit this paradigm. However, another paradigm for developing software-intensive systems has emerged, the paradigm of open source software. Although from a traditional perspective open source projects might look like chaos, their real-world results have been spectacular. This chapter presents open source software development as a fundamentally new paradigm driven by economics and facilitated by new processes. The new paradigm’s revolutionary aspects are explored, a framework for describing the massive impact brought about by the new paradigm is proposed, and directions of future research are outlined. The proposed framework’s goals are to help the understanding of the open source paradigm as a new economic revolution and stimulate research in designing open source software.


Author(s):  
Thijs Smit ◽  
Niels Aage ◽  
Stephen J. Ferguson ◽  
Benedikt Helgason

This paper presents a Python wrapper and extended functionality of the parallel topology optimization framework introduced by Aage et al. (Topology optimization using PETSc: an easy-to-use, fully parallel, open source topology optimization framework. Struct Multidiscip Optim 51(3):565–572, 2015). The Python interface, which simplifies the problem definition, is intended to expand the potential user base and to ease the use of large-scale topology optimization for educational purposes. The functionality of the topology optimization framework is extended to include passive domains and local volume constraints among others, which contributes to its usability to real-world design applications. The functionality is demonstrated via the cantilever beam, bracket and torsion ball examples. Several tests are provided which can be used to verify the proper installation and for evaluating the performance of the user’s system setup. The open-source code is available at https://github.com/thsmit/, repository TopOpt_in_PETSc_wrapped_in_Python.


Author(s):  
Sangeeta Lal ◽  
Neetu Sardana ◽  
Ashish Sureka

Log statements present in source code provide important information to the software developers because they are useful in various software development activities such as debugging, anomaly detection, and remote issue resolution. Most of the previous studies on logging analysis and prediction provide insights and results after analyzing only a few code constructs. In this chapter, the authors perform an in-depth, focused, and large-scale analysis of logging code constructs at two levels: the file level and catch-blocks level. They answer several research questions related to statistical and content analysis. Statistical and content analysis reveals the presence of differentiating properties among logged and nonlogged code constructs. Based on these findings, the authors propose a machine-learning-based model for catch-blocks logging prediction. The machine-learning-based model is found to be effective in catch-blocks logging prediction.


2016 ◽  
Vol 22 (3) ◽  
pp. 1146-1193 ◽  
Author(s):  
Pooyan Behnamghader ◽  
Duc Minh Le ◽  
Joshua Garcia ◽  
Daniel Link ◽  
Arman Shahbazian ◽  
...  

Author(s):  
Seyed Hossein HAERI ◽  
Peter Thompson ◽  
Neil Davies ◽  
Peter Van Roy ◽  
Kevin Hammond ◽  
...  

This paper directly addresses a critical issue that affects the development of many complex distributed software systems: how to establish quickly, cheaply and reliably whether they will deliver their intended performance before expending significant time, effort and money on detailed design and implementation. We describe ΔQSD, a novel metrics-based and quality-centric paradigm that uses formalised outcome diagrams to explore the performance consequences of design decisions, as a performance blueprint of the system. The ΔQSD paradigm is both effective and generic: it allows values from various sources to be combined in a rigorous way, so that approximate results can be obtained quickly and subsequently refined. ΔQSD has been successfully used by Predictable Network Solutions for consultancy on large-scale applications in a number of industries, including telecommunications, avionics, and space and defence, resulting in cumulative savings of $Bs. The paper outlines the ΔQSD paradigm, describes its formal underpinnings, and illustrates its use via a topical real-world example taken from the blockchain/cryptocurrency domain, where application of this approach enabled an advanced distributed proof-of-stake system to meet challenging throughput targets.


2010 ◽  
Vol 23 (2) ◽  
pp. 53-76 ◽  
Author(s):  
Sharif H. Melouk ◽  
Uzma Raja ◽  
Burcu B. Keskin

The authors use a simulation approach to determine effective management of resource allocation and task prioritization decisions for the development of open source enterprise solutions software in the context of a large scale collaborative development project (CDP). Unlike traditional software systems where users have limited access to the development team, in open source environments, the resolution of issues is a collaborative effort among users and the team. However, as the project grows in size, complexity, and usage, effective allocation of resources and prioritization of tasks become a necessity to improve the operational performance of the software system. In this paper, by mining an open source software repository, the authors analyze the effects of collaborative issue resolution in a CDP and its effects on resource allocation of the team developers. This article examines several scenarios to evaluate the effects of forum discussions, resource allocation, and task prioritization on operational performance of the software system.


2021 ◽  
Vol 14 (13) ◽  
pp. 3420-3420
Author(s):  
Matei Zaharia

Building production ML applications is difficult because of their resource cost and complex failure modes. I will discuss these challenges from two perspectives: the Stanford DAWN Lab and experience with large-scale commercial ML users at Databricks. I will then present two emerging ideas to help address these challenges. The first is "ML platforms", an emerging class of software systems that standardize the interfaces used in ML applications to make them easier to build and maintain. I will give a few examples, including the open-source MLflow system from Databricks [3]. The second idea is models that are more "production-friendly" by design. As a concrete example, I will discuss retrieval-based NLP models such as Stanford's ColBERT [1, 2] that query documents from an updateable corpus to perform tasks such as question-answering, which gives multiple practical advantages, including low computational cost, high interpretability, and very fast updates to the model's "knowledge". These models are an exciting alternative to large language models such as GPT-3.

