Automatic generation of high quality test sets via CBMC

Software testing is the most widely used technique for software verification in industry. For safety-critical software, the test set may be required to cover a high percentage (up to 100%) of the code according to some metric. Unfortunately, attaining such high percentages is not easy with standard automatic test generation tools, and manual generation by domain experts is often necessary, which significantly increases the associated costs.

In previous papers, we showed how to automate test generation for C programs via the bounded model checker CBMC. In particular, we showed how CBMC can be used productively to automatically generate test sets covering 100% of the branches of 5 modules of ERTMS/ETCS, a safety-critical industrial software system by Ansaldo STS. Unfortunately, the automatically generated test sets are of lower "quality" than the test sets manually generated by domain experts: both attain the desired 100% branch coverage, but the automatically generated test sets are roughly twice the size of the corresponding manually generated ones. Indeed, the automatically generated test sets contain redundant tests, i.e. tests that do not contribute to reaching the desired 100% branch coverage. These redundant tests are useless from the perspective of branch coverage, are not easy to detect and eliminate a posteriori, and, if retained, imply additional costs during the verification process.

In this paper we present a new methodology for the automatic generation of "high quality" test sets guaranteeing full branch coverage. Given an initially empty test set T, the basic idea is to extend T with a test covering as many as possible of the branches not yet covered by T. This requires an analysis of the control flow graph of the program to first identify a path p with the desired property, and then a run of a tool (CBMC in our case) able to return either a test causing the execution of p or the answer that no such test exists (under the given assumptions). We evaluated the methodology on 31 modules of the Ansaldo STS ERTMS/ETCS software, thus greatly extending the benchmark set. For 27 of the 31 modules we succeeded in automatically generating "high quality" test sets attaining full branch coverage: all feasible branches are executed by at least one test, and our test sets are significantly smaller than the test sets manually generated by domain experts (and thus also significantly smaller than the test sets automatically generated with our previous methodology). However, for 4 modules we were unable to automatically generate test sets attaining full branch coverage: these modules contain complex functions that fall outside CBMC's capacity.

Our analysis of 31 modules greatly extends our previous analysis based on 5 modules, confirming that automatic test generation tools based on CBMC can be used productively in industry to attain full branch coverage. Further, the methodology presented in this paper increases productivity further by substantially reducing the number of generated tests and thus the costs of the testing phase.
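
The greedy idea described in the abstract (extend T with a test that covers as many still-uncovered branches as possible) can be illustrated with a minimal sketch. This is not the authors' implementation: paths are abstracted as sets of branch identifiers, the candidate paths are assumed to be given up front rather than derived from a control flow graph, and `find_test` is a hypothetical stand-in for a call to a tool such as CBMC that returns either a test or `None` when the path is infeasible.

```python
from typing import Callable, FrozenSet, Iterable, List, Optional, Set, Tuple

Path = FrozenSet[str]  # a path, abstracted as the set of branch ids it executes


def greedy_test_set(
    paths: Iterable[Path],
    find_test: Callable[[Path], Optional[str]],
) -> Tuple[List[str], Set[str]]:
    """Build a test set by always trying the path with the most uncovered branches."""
    remaining = list(paths)
    covered: Set[str] = set()
    tests: List[str] = []
    while remaining:
        # Pick the candidate path that would add the most new branches.
        best = max(remaining, key=lambda p: len(p - covered))
        if not best - covered:
            break  # no remaining path adds coverage
        remaining.remove(best)
        test = find_test(best)  # hypothetical oracle: a test, or None if infeasible
        if test is not None:
            tests.append(test)
            covered |= best
    return tests, covered


# Toy data (assumed for illustration): four candidate paths,
# one of which the oracle reports as infeasible.
oracle = {
    frozenset({"b1", "b2"}): "t_a",
    frozenset({"b1", "b3"}): "t_b",
    frozenset({"b2", "b3", "b4"}): "t_c",
    frozenset({"b5"}): None,
}
tests, covered = greedy_test_set(oracle.keys(), oracle.get)
print(tests, sorted(covered))
```

In this toy run the path through `b1` and `b3` never yields a test, because by the time it is considered it adds no new branch; this is the kind of redundancy the abstract says the earlier approach left in the test set.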

Test Pattern Ordering and Selection for High Quality Test Set under Constraints

IEICE Transactions on Information and Systems ◽ 10.1587/transinf.e95.d.3001 ◽ 2012 ◽ Vol E95.D (12) ◽ pp. 3001-3009 ◽ Cited By ~ 3

Author(s): Michiko INOUE ◽ Akira TAKETANI ◽ Tomokazu YONEDA ◽ Hideo FUJIWARA

Keyword(s): Test Pattern ◽ Quality Test ◽ High Quality ◽ Test Set ◽ Selection For

Diverse, High-Quality Test Set for the Validation of Protein−Ligand Docking Performance

Journal of Medicinal Chemistry ◽ 10.1021/jm061277y ◽ 2007 ◽ Vol 50 (4) ◽ pp. 726-741 ◽ Cited By ~ 386

Author(s): Michael J. Hartshorn ◽ Marcel L. Verdonk ◽ Gianni Chessari ◽ Suzanne C. Brewerton ◽ Wijnand T. M. Mooij ◽ ...

Keyword(s): Ligand Docking ◽ Quality Test ◽ High Quality ◽ Test Set

Automatic generation of a large dictionary with concreteness/abstractness ratings based on a small human dictionary

Journal of Intelligent & Fuzzy Systems ◽ 10.3233/jifs-219240 ◽ 2021 ◽ pp. 1-9

Author(s): Vladimir Ivanov ◽ Valery Solovyev

Keyword(s): State Of The Art ◽ Research Question ◽ Automatic Generation ◽ Expert Assessment ◽ High Quality ◽ Test Set ◽ Expert Assessments ◽ Abstract Words ◽ Expert Ratings

Concrete/abstract words are used in a growing number of psychological and neurophysiological studies. For a few languages, large dictionaries have been created manually, which is a very time-consuming and costly process. To generate large, high-quality dictionaries of concrete/abstract words automatically, one needs to extrapolate the expert assessments obtained on smaller samples. The research question that arises is how small such samples can be while still allowing a good enough extrapolation. In this paper, we present a method for automatically ranking the concreteness of words and propose an approach that significantly decreases the amount of expert assessment required. The method has been evaluated on a large test set for English. The quality of the constructed dictionaries is comparable to that of the expert ones. The correlation between predicted and expert ratings is higher than that of state-of-the-art methods.

Excitation, observation, and ELF-MD: optimization criteria for high quality test sets

22nd IEEE VLSI Test Symposium, 2004. Proceedings. ◽ 10.1109/vtest.2004.1299219 ◽ 2004 ◽ Cited By ~ 6

Author(s): J. Dworak ◽ D. Dorsey ◽ A. Wang ◽ M.R. Mercer

Keyword(s): Quality Test ◽ High Quality ◽ Optimization Criteria ◽ Test Sets

Feature-Weighted Sampling for Proper Evaluation of Classification Models

Applied Sciences ◽ 10.3390/app11052039 ◽ 2021 ◽ Vol 11 (5) ◽ pp. 2039

Author(s): Hyunseok Shin ◽ Sejong Oh

Keyword(s): Random Sampling ◽ Sampling Method ◽ Classification Model ◽ Training Set ◽ Test Set ◽ Feature Importance ◽ Proper Training ◽ Machine Learning Applications ◽ Test Sets ◽ The Given

In machine learning applications, classification schemes have been widely used for prediction tasks. Typically, to develop a prediction model, the given dataset is divided into training and test sets; the training set is used to build the model and the test set is used to evaluate it. Random sampling is traditionally used to divide datasets. The problem, however, is that the measured performance of the model varies depending on how the training and test sets are divided. Therefore, in this study, we propose an improved sampling method for the accurate evaluation of a classification model. We first generate numerous candidate train/test splits using the R-value-based sampling method. We then evaluate how similar the distribution of each candidate is to that of the whole dataset, and select the candidate with the smallest distribution difference as the final train/test split. Histograms and feature importance are used to evaluate the similarity of distributions. The proposed method produces more suitable training and test sets than previous sampling methods, including random and non-random sampling.
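
The selection step described above (generate many candidate splits, keep the one whose distribution is closest to the whole dataset) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's method: candidates are produced by plain random shuffling instead of R-value-based sampling, similarity is an unweighted per-feature histogram distance without feature importance, and all function names and defaults (`select_split`, `bins=5`, `n_candidates=50`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Row = Sequence[float]


def histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Normalized histogram of values over [lo, hi] with equal-width bins."""
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant features
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / len(values) for c in counts]


def distribution_difference(subset: Sequence[Row], full: Sequence[Row], bins: int = 5) -> float:
    """Sum over features of the L1 distance between subset and full-data histograms."""
    total = 0.0
    for j in range(len(full[0])):
        column = [row[j] for row in full]
        lo, hi = min(column), max(column)
        h_full = histogram(column, lo, hi, bins)
        h_sub = histogram([row[j] for row in subset], lo, hi, bins)
        total += sum(abs(a - b) for a, b in zip(h_full, h_sub))
    return total


def select_split(
    rows: Sequence[Row], test_fraction: float = 0.3, n_candidates: int = 50, seed: int = 0
) -> Tuple[float, List[Row], List[Row]]:
    """Return (difference, train, test) for the candidate closest to the full data."""
    if len(rows) < 2:
        raise ValueError("need at least two rows to form a train/test split")
    rng = random.Random(seed)
    n_test = min(len(rows) - 1, max(1, int(len(rows) * test_fraction)))
    best = None
    for _ in range(n_candidates):
        idx = list(range(len(rows)))
        rng.shuffle(idx)  # assumption: random candidates, not R-value-based sampling
        test = [rows[i] for i in idx[:n_test]]
        train = [rows[i] for i in idx[n_test:]]
        diff = distribution_difference(test, rows) + distribution_difference(train, rows)
        if best is None or diff < best[0]:
            best = (diff, train, test)
    return best


# Toy dataset (assumed): 40 rows with two numeric features.
rows = [(float(i), float(i % 4)) for i in range(40)]
diff, train, test = select_split(rows, n_candidates=50, seed=1)
print(round(diff, 3), len(train), len(test))
```

With the same seed, the first candidate examined is identical regardless of `n_candidates`, so examining more candidates can only keep or lower the selected difference.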

Concurrent Core Test Based on Partial Test Set Reusing

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.529.359 ◽ 2014 ◽ Vol 529 ◽ pp. 359-363

Author(s): Xi Lei Huang ◽ Mao Xiang Yi ◽ Lin Wang ◽ Hua Guo Liang

Keyword(s): Minimum Size ◽ Application Time ◽ Logical Circuit ◽ Test Set ◽ Test Cost ◽ Test Application ◽ Sharing Strategy ◽ Data Volume ◽ Set Sharing ◽ Test Sets

A novel concurrent core test approach is proposed to reduce the test cost of SoCs. Before testing, a test set sharing strategy obtains a merged test set of minimum size by merging the test sets of the cores under test (CUT). Moreover, it can be used in conjunction with general compression/decompression techniques to further reduce test data volume (TDV). During testing, the proposed vector separating device, composed of a set of simple combinational logic circuits (CLC), separates the vectors of the merged test set and routes them to the corresponding core. This approach does not add any test vector for any core and can test the cores synchronously, reducing test application time (TAT). Experimental results on the ISCAS'89 benchmarks demonstrate the efficiency of the proposed approach.

Deep learning to predict subtypes of poorly differentiated lung cancer from biopsy whole slide images.

Journal of Clinical Oncology ◽ 10.1200/jco.2021.39.15_suppl.8536 ◽ 2021 ◽ Vol 39 (15_suppl) ◽ pp. 8536-8536

Author(s): Gouji Toyokawa ◽ Fahdi Kanavati ◽ Seiya Momosaki ◽ Kengo Tateishi ◽ Hiroaki Takeoka ◽ ...

Keyword(s): Lung Cancer ◽ Deep Learning ◽ Learning Model ◽ Test Set ◽ Cancer Subtypes ◽ Independent Test ◽ Poorly Differentiated ◽ Test Sets ◽ Deep Learning Model ◽ Whole Slide Images

Background: Lung cancer is the leading cause of cancer-related death in many countries, and its prognosis remains unsatisfactory. Since treatment approaches differ substantially based on the subtype, such as adenocarcinoma (ADC), squamous cell carcinoma (SCC) and small cell lung cancer (SCLC), an accurate histopathological diagnosis is of great importance. However, if the specimen is solely composed of poorly differentiated cancer cells, distinguishing between histological subtypes can be difficult. The present study developed a deep learning model to classify lung cancer subtypes from whole slide images (WSIs) of transbronchial lung biopsy (TBLB) specimens, in particular with the aim of using this model to evaluate a challenging test set of indeterminate cases.

Methods: Our deep learning model consisted of two separately trained components: a convolutional neural network tile classifier and a recurrent neural network tile aggregator for the WSI diagnosis. We used a training set consisting of 638 WSIs of TBLB specimens to train a deep learning model to classify lung cancer subtypes (ADC, SCC and SCLC) and non-neoplastic lesions. The training set consisted of 593 WSIs for which the diagnosis had been determined by pathologists based on the visual inspection of Hematoxylin-Eosin (HE) slides and of 45 WSIs of indeterminate cases (64 ADCs and 19 SCCs). We then evaluated the models using five independent test sets. For each test set, we computed the area under the receiver operating characteristic curve (ROC AUC).

Results: We applied the model to an indeterminate test set of WSIs obtained from TBLB specimens that pathologists had not been able to conclusively diagnose by examining the HE-stained specimens alone. Overall, the model achieved ROC AUCs of 0.993 (confidence interval [CI] 0.971-1.0) and 0.996 (0.981-1.0) for ADC and SCC, respectively. We further evaluated the model using five independent test sets consisting of both TBLB and surgically resected lung specimens (combined total of 2490 WSIs) and obtained highly promising results, with ROC AUCs ranging from 0.94 to 0.99.

Conclusions: In this study, we demonstrated that a deep learning model could be trained to predict lung cancer subtypes in indeterminate TBLB specimens. The results show that, if deployed in clinical practice, a deep learning model capable of aiding pathologists in diagnosing indeterminate cases would be extremely beneficial, as it would allow a diagnosis to be obtained sooner and reduce the costs that would result from further investigations.
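
The ROC AUC reported per test set in this abstract has a simple interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal sketch of that pairwise computation is shown below; the function name `roc_auc` and the toy labels and scores are assumptions for illustration and are unrelated to the study's data or code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative case")
    # A correctly ordered pair counts 1, a tie counts 0.5, a misordered pair counts 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (assumed data): 1 marks a positive case, 0 a negative case.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 3 of the 4 positive/negative pairs are ordered correctly
```

The pairwise form is quadratic in the number of cases; it is used here because it makes the definition explicit, whereas production code would normally use a rank-based implementation from an established library.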

ETW - High quality test performance in cryogenic environment

10.2514/6.2000-2206 ◽ 2000 ◽ Cited By ~ 19

Author(s): Juergen Quest

Keyword(s): Test Performance ◽ Quality Test ◽ High Quality

Generating Collaborative Systems for Digital Libraries: a Model-Driven Approach

Information Technology and Libraries ◽ 10.6017/ital.v29i4.3128 ◽ 2010 ◽ Vol 29 (4) ◽ pp. 171 ◽ Cited By ~ 2

Author(s): Alessio Malizia ◽ Paolo Bottoni ◽ S. Levialdi

Keyword(s): Digital Library ◽ User Interfaces ◽ Digital Libraries ◽ Automatic Generation ◽ Visual Language ◽ Domain Experts ◽ Model Driven ◽ Neutral Models ◽ Definition Of ◽ High Level

The design and development of a digital library involves different stakeholders, such as information architects, librarians, and domain experts, who need to agree on a common language to describe, discuss, and negotiate the services the library has to offer. To this end, high-level, language-neutral models have to be devised. Metamodeling techniques favor the definition of domain-specific visual languages through which stakeholders can share their views and directly manipulate representations of the domain entities. This paper describes CRADLE (Cooperative-Relational Approach to Digital Library Environments), a metamodel-based framework and visual language for the definition of notions and services related to the development of digital libraries. A collection of tools allows the automatic generation of several services, defined with the CRADLE visual language, and of the graphical user interfaces that give the final user access to them. The effectiveness of the approach is illustrated by presenting digital libraries generated with CRADLE, while the CRADLE environment has been evaluated using the cognitive dimensions framework.

Issues for the automatic generation of safety critical software

Proceedings ASE 2000. Fifteenth IEEE International Conference on Automated Software Engineering ◽ 10.1109/ase.2000.873677 ◽ 2000 ◽ Cited By ~ 5

Author(s): C. O'Halloran

Keyword(s): Automatic Generation ◽ Safety Critical
