The unreasonable effectiveness of traditional information retrieval in crash report deduplication

2016 ◽  
Author(s):  
Joshua Charles Campbell ◽  
Eddie Antonio Santos ◽  
Abram Hindle

Organizations like Mozilla, Microsoft, and Apple are flooded with thousands of automated crash reports per day. Although crash reports contain valuable information for debugging, there are often too many for developers to examine individually. Therefore, in industry, crash reports are often automatically grouped together in buckets. Ubuntu’s repository contains crashes from hundreds of software systems available with Ubuntu. A variety of crash report bucketing methods are evaluated using data collected by Ubuntu’s Apport automated crash reporting system. The trade-off between precision and recall of numerous scalable crash deduplication techniques is explored. A set of criteria that a crash deduplication method must meet is presented, and several methods that meet these criteria are evaluated on a new dataset. The evaluations presented in this paper show that off-the-shelf information retrieval techniques, which were not designed to be used with crash reports, outperform other techniques that are specifically designed for the task of crash bucketing at realistic industrial scales. This research indicates that automated crash bucketing still has a lot of room for improvement, especially in terms of identifier tokenization.
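To make the idea of bucketing crash reports with generic information retrieval concrete, the following is a minimal sketch and not the authors' implementation. It assumes a crash report is reduced to a string of stack-frame names, tokenizes identifiers on underscores and camelCase boundaries, weights tokens with a smoothed TF-IDF, and greedily groups reports by cosine similarity. The function names, the tokenization scheme, the sample traces, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(stack_trace):
    """Lowercase tokens from a trace, split on non-alphanumerics and camelCase.

    This splitting scheme is an illustrative assumption, not the paper's.
    """
    tokens = []
    for word in re.findall(r"[A-Za-z0-9]+", stack_trace):
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", word)
        tokens.extend(part.lower() for part in parts)
    return tokens


def tfidf_vectors(token_lists):
    """Build sparse TF-IDF vectors (dicts) using a smoothed, always-positive IDF."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bucket_reports(stack_traces, threshold=0.5):
    """Greedily assign each report to the most similar existing bucket.

    A report is compared against the first report of every bucket; if the best
    score is below the threshold, the report starts a new bucket.
    Returns a list of buckets, each a list of report indices.
    """
    vectors = tfidf_vectors([tokenize(trace) for trace in stack_traces])
    buckets = []
    for i, vec in enumerate(vectors):
        best_bucket, best_score = None, threshold
        for bucket in buckets:
            score = cosine(vec, vectors[bucket[0]])
            if score >= best_score:
                best_bucket, best_score = bucket, score
        if best_bucket is None:
            buckets.append([i])
        else:
            best_bucket.append(i)
    return buckets


if __name__ == "__main__":
    # Hypothetical stack traces, innermost frame first.
    reports = [
        "g_free gtk_widget_destroy main",
        "g_free gtk_widget_destroy gtk_main main",
        "parseXMLDocument xmlReadFile main",
    ]
    print(bucket_reports(reports))
```

Raising the threshold in this sketch splits reports into more, purer buckets, and lowering it merges more reports per bucket, which is one simple way to see the precision and recall trade-off the abstract mentions. The sketch computes document frequencies over the whole batch; a deployment that receives reports continuously would need an incremental index instead.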


Author(s):  
Henk Ernst Blok ◽  
Djoerd Hiemstra ◽  
Sunil Choenni ◽  
Franciska de Jong ◽  
Henk M. Blanken ◽  
...  

2018 ◽  
Vol 51 (3) ◽  
pp. 598-616 ◽  
Author(s):  
Jaewoo Cho ◽  
Jae Hong Kim ◽  
Yonsu Kim

While much scholarly attention has been paid to ways in which metropolitan areas are politically structured and operated to achieve the dual goal of economic growth and equality, relatively less is known about the complex relationship between metropolitan governance structures and growth–inequality dynamics. This study investigates how and to what extent metropolitan governance structures shape regional economic growth and inequality trajectories using data for 267 US metropolitan areas from 1990 to 2010. Findings from a two-stage least squares regression analysis suggest that economic growth is associated with governance structures in a nonlinear fashion, with relatively more rapid growth rates in both highly centralized and decentralized metropolitan areas. However, these regions are also found to experience a larger increase in income inequality, indicating an important trade-off to be considered carefully in exploring ways to reform existing governance settings. These findings further suggest that the so-called growth–inequality trade-off may exist not only in their direct interactions but also through their connections via governance or other variables.


2017 ◽  
Vol 62 (1) ◽  
Author(s):  
Erica Yookyung Lee ◽  
Aisling R. Caffrey

ABSTRACT Several studies have suggested the risk of thrombocytopenia with tedizolid, a second-in-class oxazolidinone antibiotic (approved June 2014), is less than that observed with linezolid (first-in-class oxazolidinone). Using data from the Food and Drug Administration adverse event reporting system (July 2014 through December 2016), we observed significantly increased risks of thrombocytopenia of similar magnitudes with both antibiotics: linezolid reporting odds ratio [ROR], 37.9 (95% confidence interval [CI], 20.78 to 69.17); tedizolid ROR, 34.0 (95% CI, 4.67 to 247.30).
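For readers unfamiliar with the reporting odds ratio, the sketch below shows the standard 2x2 calculation with a Wald-type 95% confidence interval on the log scale. It is an illustration under stated assumptions, not a reproduction of the study's analysis: the function name and all counts are hypothetical, and the abstract does not report the underlying cell counts.

```python
import math


def reporting_odds_ratio(a, b, c, d, z=1.96):
    """Return (ROR, lower, upper) for a 2x2 table of spontaneous reports.

    a: reports naming the drug of interest with the event of interest
    b: reports naming the drug of interest with any other event
    c: reports naming other drugs with the event of interest
    d: reports naming other drugs with any other event
    z: normal quantile for the interval (1.96 gives roughly 95%)
    """
    ror = (a * d) / (b * c)
    # Standard error of ln(ROR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - z * se)
    upper = math.exp(math.log(ror) + z * se)
    return ror, lower, upper


if __name__ == "__main__":
    # Hypothetical counts chosen only to show the arithmetic.
    ror, lower, upper = reporting_odds_ratio(a=20, b=80, c=100, d=9800)
    print(f"ROR = {ror:.1f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Because the standard error includes the term 1/a, a drug with only a handful of reports for the event yields a much wider interval than a drug with many reports, even when the point estimates are similar.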


Author(s):  
Lerina Aversano ◽  
Carmine Grasso ◽  
Maria Tortorella

The evaluation of the alignment level existing between a business process and the supporting software systems is a critical concern for an organization, as the higher the alignment level is, the better the process performance is. Monitoring the alignment implies characterizing all the items it involves and defining measures for evaluating it. This is a complex task, and the availability of automatic tools for supporting evaluation and evolution activities may be valuable. This chapter presents the ALBIS Environment (Aligning Business Processes and Information Systems), designed to support software maintenance tasks. In particular, the proposed environment allows the modeling and tracing between business and software entities and the measurement of their alignment degree. An information retrieval approach, based on two processing phases including syntactic and semantic analysis, is embedded in ALBIS. The usefulness of the environment is discussed through two case studies.


Author(s):  
Andreas Bolfing

Chapter 5 considers distributed systems by their properties. The first section studies the classification of software systems, which are usually divided into centralized, decentralized and distributed systems. It studies the differences between these three major approaches, showing that the classification is multidimensional rather than linear. The most important case is that of distributed systems, which enable the spreading of computational tasks across several autonomous, independently acting computational entities. A very important result for this case is the CAP theorem, which considers the trade-off between consistency, availability and partition tolerance. The last section deals with the possibility of reaching consensus in distributed systems, discussing how fault-tolerant consensus mechanisms enable mutual agreement among the individual entities in the presence of failures. One special case is that of so-called Byzantine failures, which are discussed in great detail. The main result is the so-called FLP Impossibility Result, which states that there is no deterministic algorithm that guarantees a solution to the consensus problem in the asynchronous case. The chapter concludes by considering practical solutions that circumvent the impossibility result in order to reach consensus.


2007 ◽  
Vol 41 (5) ◽  
pp. 633-643 ◽  
Author(s):  
Alan M. Hochberg ◽  
Stephanie J. Reisinger ◽  
Ronald K. Pearson ◽  
Donald J. O’Hara ◽  
Kevin Hall
