A Survey on Automated Log Analysis for Reliability Engineering

Traditional software engineering methodologies have mostly evolved from the environment of proprietary, large-scale software systems. Here, software design principles operate within a hierarchical decision-making context. Development of banking, enterprise resource and complex weapons systems all fit this paradigm. However, another paradigm for developing software-intensive systems has emerged, the paradigm of open source software. Although from a traditional perspective open source projects might look like chaos, their real-world results have been spectacular. This chapter presents open source software development as a fundamentally new paradigm driven by economics and facilitated by new processes. The new paradigm’s revolutionary aspects are explored, a framework for describing the massive impact brought about by the new paradigm is proposed, and directions of future research are outlined. The proposed framework’s goals are to help the understanding of the open source paradigm as a new economic revolution and stimulate research in designing open source software.

Topology optimization using PETSc: a Python wrapper and extended functionality

Structural and Multidisciplinary Optimization ◽

10.1007/s00158-021-03018-7 ◽

2021 ◽

Author(s):

Thijs Smit ◽

Niels Aage ◽

Stephen J. Ferguson ◽

Benedikt Helgason

Keyword(s):

Topology Optimization ◽

Open Source ◽

Cantilever Beam ◽

Real World ◽

Large Scale ◽

Source Code ◽

Problem Definition ◽

Potential User ◽

Local Volume ◽

Optimization Framework

This paper presents a Python wrapper and extended functionality of the parallel topology optimization framework introduced by Aage et al. (Topology optimization using PETSc: an easy-to-use, fully parallel, open source topology optimization framework. Struct Multidiscip Optim 51(3):565–572, 2015). The Python interface, which simplifies the problem definition, is intended to expand the potential user base and to ease the use of large-scale topology optimization for educational purposes. The functionality of the topology optimization framework is extended to include passive domains and local volume constraints, among others, which improves its usability for real-world design applications. The functionality is demonstrated via the cantilever beam, bracket and torsion ball examples. Several tests are provided which can be used to verify proper installation and to evaluate the performance of the user’s system setup. The open-source code is available at https://github.com/thsmit/, repository TopOpt_in_PETSc_wrapped_in_Python.

Logging Analysis and Prediction in Open Source Java Project

Research Anthology on Usage and Development of Open Source Software ◽

10.4018/978-1-7998-9158-1.ch038 ◽

2021 ◽

pp. 733-761

Author(s):

Sangeeta Lal ◽

Neetu Sardana ◽

Ashish Sureka

Keyword(s):

Machine Learning ◽

Content Analysis ◽

Software Development ◽

Anomaly Detection ◽

Open Source ◽

Large Scale ◽

Source Code ◽

Scale Analysis ◽

Large Scale Analysis ◽

Research Questions

Log statements present in source code provide important information to the software developers because they are useful in various software development activities such as debugging, anomaly detection, and remote issue resolution. Most of the previous studies on logging analysis and prediction provide insights and results after analyzing only a few code constructs. In this chapter, the authors perform an in-depth, focused, and large-scale analysis of logging code constructs at two levels: the file level and catch-blocks level. They answer several research questions related to statistical and content analysis. Statistical and content analysis reveals the presence of differentiating properties among logged and nonlogged code constructs. Based on these findings, the authors propose a machine-learning-based model for catch-blocks logging prediction. The machine-learning-based model is found to be effective in catch-blocks logging prediction.

Extracting and studying the Logging-Code-Issue-Introducing changes in Java-based large-scale open source software systems

Empirical Software Engineering ◽

10.1007/s10664-019-09690-0 ◽

2019 ◽

Vol 24 (4) ◽

pp. 2285-2322

Author(s):

Boyuan Chen ◽

Zhen Ming Jiang

Keyword(s):

Open Source ◽

Open Source Software ◽

Large Scale ◽

Software Systems

A large-scale study of architectural evolution in open-source software systems

Empirical Software Engineering ◽

10.1007/s10664-016-9466-0 ◽

2016 ◽

Vol 22 (3) ◽

pp. 1146-1193 ◽

Cited By ~ 13

Author(s):

Pooyan Behnamghader ◽

Duc Minh Le ◽

Joshua Garcia ◽

Daniel Link ◽

Arman Shahbazian ◽

...

Keyword(s):

Open Source ◽

Open Source Software ◽

Large Scale ◽

Software Systems ◽

Large Scale Study

DebCheck: Efficient Checking for Open Source Code Clones in Software Systems

2011 IEEE 19th International Conference on Program Comprehension ◽

10.1109/icpc.2011.27 ◽

2011 ◽

Cited By ~ 7

Author(s):

James R. Cordy ◽

Chanchal K. Roy

Keyword(s):

Open Source ◽

Source Code ◽

Software Systems ◽

Code Clones ◽

Open Source Code

Mind Your Outcomes: Quality-Centric Systems Development

10.20944/preprints202112.0132.v1 ◽

2021 ◽

Author(s):

Seyed Hossein HAERI ◽

Peter Thompson ◽

Neil Davies ◽

Peter Van Roy ◽

Kevin Hammond ◽

...

Keyword(s):

Real World ◽

Large Scale ◽

Critical Issue ◽

Systems Development ◽

Software Systems ◽

Design Decisions ◽

Detailed Design ◽

Distributed Software ◽

Significant Time ◽

Design And Implementation

This paper directly addresses a critical issue that affects the development of many complex distributed software systems: how to establish quickly, cheaply and reliably whether they will deliver their intended performance before expending significant time, effort and money on detailed design and implementation. We describe ΔQSD, a novel metrics-based and quality-centric paradigm that uses formalised outcome diagrams to explore the performance consequences of design decisions, as a performance blueprint of the system. The ΔQSD paradigm is both effective and generic: it allows values from various sources to be combined in a rigorous way, so that approximate results can be obtained quickly and subsequently refined. ΔQSD has been successfully used by Predictable Network Solutions for consultancy on large-scale applications in a number of industries, including telecommunications, avionics, and space and defence, resulting in cumulative savings of $Bs. The paper outlines the ΔQSD paradigm, describes its formal underpinnings, and illustrates its use via a topical real-world example taken from the blockchain/cryptocurrency domain, where application of this approach enabled an advanced distributed proof-of-stake system to meet challenging throughput targets.

Logging Analysis and Prediction in Open Source Java Project

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Optimizing Contemporary Application and Processes in Open Source Software ◽

10.4018/978-1-5225-5314-4.ch003 ◽

2018 ◽

pp. 57-85

Author(s):

Sangeeta Lal ◽

Neetu Sardana ◽

Ashish Sureka

Keyword(s):

Machine Learning ◽

Content Analysis ◽

Software Development ◽

Anomaly Detection ◽

Open Source ◽

Large Scale ◽

Source Code ◽

Scale Analysis ◽

Large Scale Analysis ◽

Research Questions

Log statements present in source code provide important information to the software developers because they are useful in various software development activities such as debugging, anomaly detection, and remote issue resolution. Most of the previous studies on logging analysis and prediction provide insights and results after analyzing only a few code constructs. In this chapter, the authors perform an in-depth, focused, and large-scale analysis of logging code constructs at two levels: the file level and catch-blocks level. They answer several research questions related to statistical and content analysis. Statistical and content analysis reveals the presence of differentiating properties among logged and nonlogged code constructs. Based on these findings, the authors propose a machine-learning-based model for catch-blocks logging prediction. The machine-learning-based model is found to be effective in catch-blocks logging prediction.

Managing Resource Allocation and Task Prioritization Decisions in Large Scale Virtual Collaborative Development Projects

Information Resources Management Journal ◽

10.4018/irmj.2010102604 ◽

2010 ◽

Vol 23 (2) ◽

pp. 53-76 ◽

Cited By ~ 4

Author(s):

Sharif H. Melouk ◽

Uzma Raja ◽

Burcu B. Keskin

Keyword(s):

Resource Allocation ◽

Open Source ◽

Large Scale ◽

Development Project ◽

Operational Performance ◽

Software Systems ◽

Software System ◽

Collaborative Development ◽

Task Prioritization ◽

And Task

The authors use a simulation approach to determine effective management of resource allocation and task prioritization decisions for the development of open source enterprise solutions software in the context of a large scale collaborative development project (CDP). Unlike traditional software systems where users have limited access to the development team, in open source environments, the resolution of issues is a collaborative effort among users and the team. However, as the project grows in size, complexity, and usage, effective allocation of resources and prioritization of tasks become a necessity to improve the operational performance of the software system. In this paper, by mining an open source software repository, the authors analyze collaborative issue resolution in a CDP and its effects on the allocation of the team's developers. The paper examines several scenarios to evaluate the effects of forum discussions, resource allocation, and task prioritization on operational performance of the software system.

Designing production-friendly machine learning

Proceedings of the VLDB Endowment ◽

10.14778/3484224.3484241 ◽

2021 ◽

Vol 14 (13) ◽

pp. 3420-3420

Author(s):

Matei Zaharia

Keyword(s):

Machine Learning ◽

Open Source ◽

Large Scale ◽

Question Answering ◽

Failure Modes ◽

Computational Cost ◽

Language Models ◽

Software Systems ◽

Resource Cost ◽

Low Computational Cost

Building production ML applications is difficult because of their resource cost and complex failure modes. I will discuss these challenges from two perspectives: the Stanford DAWN Lab and experience with large-scale commercial ML users at Databricks. I will then present two emerging ideas to help address these challenges. The first is "ML platforms", an emerging class of software systems that standardize the interfaces used in ML applications to make them easier to build and maintain. I will give a few examples, including the open-source MLflow system from Databricks [3]. The second idea is models that are more "production-friendly" by design. As a concrete example, I will discuss retrieval-based NLP models such as Stanford's ColBERT [1, 2] that query documents from an updateable corpus to perform tasks such as question-answering, which gives multiple practical advantages, including low computational cost, high interpretability, and very fast updates to the model's "knowledge". These models are an exciting alternative to large language models such as GPT-3.
