code size
Recently Published Documents

IoT firmware oftentimes incorporates third-party components, such as network-oriented middleware and media encoders/decoders. These components consist of large and mature codebases, shipping with a variety of non-critical features. Feature bloat increases code size, complicates auditing/debugging, and reduces stability. This is problematic for IoT devices, which are severely resource-constrained and must remain operational in the field for years. Unfortunately, identification and complete removal of code related to unwanted features requires familiarity with codebases of interest, cumbersome manual effort, and may introduce bugs. We address these difficulties by introducing PRAT, a system that takes as input the codebase of software of interest, identifies and maps features to code, presents this information to a human analyst, and removes all code belonging to unwanted features. PRAT solves the challenge of identifying feature-related code through a novel form of differential dynamic analysis and visualizes results as user-friendly feature graphs . Evaluation on diverse codebases shows superior code removal compared to both manual feature deactivation and state-of-art debloating tools, and generality across programming languages. Furthermore, a user study comparing PRAT to manual code analysis shows that it can significantly simplify the feature identification workflow.

Download Full-text

Device Hopping

ACM Transactions on Architecture and Code Optimization ◽

10.1145/3471909 ◽

2021 ◽

Vol 18 (4) ◽

pp. 1-25

Author(s):

Paul Metzger ◽

Volker Seeker ◽

Christian Fensch ◽

Murray Cole

Keyword(s):

Programming Model ◽

Heterogeneous Systems ◽

Code Size ◽

Fine Grained ◽

Scheduling Policy ◽

High Level ◽

Many Core ◽

Execution Models ◽

Current Systems

Existing OS techniques for homogeneous many-core systems make it simple for single and multithreaded applications to migrate between cores. Heterogeneous systems do not benefit so fully from this flexibility, and applications that cannot migrate in mid-execution may lose potential performance. The situation is particularly challenging when a switch of language runtime would be desirable in conjunction with a migration. We present a case study in making heterogeneous CPU + GPU systems more flexible in this respect. Our technique for fine-grained application migration, allows switches between OpenMP, OpenCL, and CUDA execution, in conjunction with migrations from GPU to CPU, and CPU to GPU. To achieve this, we subdivide iteration spaces into slices, and consider migration on a slice-by-slice basis. We show that slice sizes can be learned offline by machine learning models. To further improve performance, memory transfers are made migration-aware. The complexity of the migration capability is hidden from programmers behind a high-level programming model. We present a detailed evaluation of our mid-kernel migration mechanism with the First Come, First Served scheduling policy. We compare our technique in a focused evaluation scenario against idealized kernel-by-kernel scheduling, which is typical for current systems, and makes perfect kernel to device scheduling decisions, but cannot migrate kernels mid-execution. Models show that up to 1.33× speedup can be achieved over these systems by adding fine-grained migration. Our experimental results with all nine applicable SHOC and Rodinia benchmarks achieve speedups of up to 1.30× (1.08× on average) over an implementation of a perfect but kernel-migration incapable scheduler when migrated to a faster device. Our mechanism and slice size choices introduce an average slowdown of only 2.44% if kernels never migrate. Lastly, our programming model reduces the code size by at least 88% if compared to manual implementations of migratable kernels.

Download Full-text

Constraint-based Diversification of JOP Gadgets

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.12848 ◽

2021 ◽

Vol 72 ◽

pp. 1471-1505

Author(s):

Rodothea Myrsini Tsoupidi ◽

Roberto Castañeda Lozano ◽

Benoit Baudry

Keyword(s):

Large Scale ◽

Distance Measure ◽

Neighborhood Search ◽

Software Systems ◽

Code Size ◽

Assembly Code ◽

Precise Control ◽

Application Specific ◽

Specific Distance ◽

Binary Programs

Modern software deployment process produces software that is uniform, and hence vulnerable to large-scale code-reuse attacks, such as Jump-Oriented Programming (JOP) attacks. Compiler-based diversification improves the resilience and security of software systems by automatically generating different assembly code versions of a given program. Existing techniques are efficient but do not have a precise control over the quality, such as the code size or speed, of the generated code variants. This paper introduces Diversity by Construction (DivCon), a constraint-based compiler approach to software diversification. Unlike previous approaches, DivCon allows users to control and adjust the conflicting goals of diversity and code quality. A key enabler is the use of Large Neighborhood Search (LNS) to generate highly diverse assembly code efficiently. For larger problems, we propose a combination of LNS with a structural decomposition of the problem. To further improve the diversification efficiency of DivCon against JOP attacks, we propose an application-specific distance measure tailored to the characteristics of JOP attacks. We evaluate DivCon with 20 functions from a popular benchmark suite for embedded systems. These experiments show that DivCon's combination of LNS and our application-specific distance measure generates binary programs that are highly resilient against JOP attacks (they share between 0.15% to 8% of JOP gadgets) with an optimality gap of 10%. Our results confirm that there is a trade-off between the quality of each assembly code version and the diversity of the entire pool of versions. In particular, the experiments show that DivCon is able to generate binary programs that share a very small number of gadgets, while delivering near-optimal code. For constraint programming researchers and practitioners, this paper demonstrates that LNS is a valuable technique for finding diverse solutions. For security researchers and software engineers, DivCon extends the scope of compiler-based diversification to performance-critical and resource-constrained applications.

Download Full-text

Using machine learning to predict the code size impact of duplication heuristics in a dynamic compiler

10.1145/3475738.3480943 ◽

2021 ◽

Author(s):

Raphael Mosaner ◽

David Leopoldseder ◽

Lukas Stadler ◽

Hanspeter Mössenböck

Keyword(s):

Machine Learning ◽

Code Size

Download Full-text

New Optimization Sequences for Code-Size Reduction for the LLVM Compilation Infrastructure

10.1145/3475061.3475085 ◽

2021 ◽

Author(s):

Anderson Faustino ◽

Edson Borin ◽

Fernando Pereira ◽

Otávio Nápoli ◽

Vanderson Rosário

Keyword(s):

Size Reduction ◽

Code Size

Download Full-text

Inlining for Code Size Reduction

10.1145/3475061.3475081 ◽

2021 ◽

Author(s):

Thaís Damásio ◽

Vinícius Pacheco ◽

Fabrício Goes ◽

Fernando Pereira ◽

Rodrigo Rocha

Keyword(s):

Size Reduction ◽

Code Size

Download Full-text

FORMALIZATION CODING METHODS OF INFORMATION UNDER TOROIDAL COORDINATE SYSTEMS

Radio Electronics Computer Science Control ◽

10.15588/1607-3274-2021-2-15 ◽

2021 ◽

pp. 144-153

Author(s):

V. V. Riznyk

Keyword(s):

Coordinate System ◽

Information Technologies ◽

Optimal Placement ◽

Data Sets ◽

Structural Elements ◽

Vector Data ◽

Coordinate Systems ◽

Code Size ◽

Data Coding ◽

Coding Systems

Contents. Coding and processing large information content actualizes the problem of formalization of interdependence between information parameters of vector data coding systems on a single mathematical platform. Objective. The formalization of relationships between information parameters of vector data coding systems in the optimized basis of toroidal coordinate systems with the achievement of a favorable compromise between contradictory goals. Method. The method involves the establishing harmonious mutual penetration of symmetry and asymmetry as the remarkable property of real space, which allows use decoded information for forming the mathematical principle relating to the optimal placement of structural elements in spatially or temporally distributed systems, using novel designs based on the concept of Ideal Ring Bundles (IRB)s. IRBs are cyclic sequences of positive integers which dividing a symmetric sphere about center of the symmetry. The sums of connected sub-sequences of an IRB enumerate the set of partitions of a sphere exactly R times. Two-and multidimensional IRBs, namely the “Glory to Ukraine Stars”, are sets of t-dimensional vectors, each of them as well as all modular sums of them enumerate the set node points grid of toroid coordinate system with the corresponding sizes and dimensionality exactly R times. Moreover, we require each indexed vector data “category-attribute” mutually uniquely corresponds to the point with the eponymous set of the coordinate system. Besides, a combination of binary code with vector weight discharges of the database is allowed, and the set of all values of indexed vector data sets are the same that a set of numerical values. The underlying mathematical principle relates to the optimal placement of structural elements in spatially and/or temporally distributed systems, using novel designs based on tdimensional “star” combinatorial configurations, including the appropriate algebraic theory of cyclic groups, number theory, modular arithmetic, and IRB geometric transformations. Results. The relationship of vector code information parameters (capacity, code size, dimensionality, number of encodingvectors) with geometric parameters of the coordinate system (dimension, dimensionality, and grid sizes), and vector data characteristic (number of attributes and number of categories, entity-attribute-value size list) have been formalized. The formula system is derived as a functional dependency between the above parameters, which allows achieving a favorable compromise between the contradictory goals (for example, the performance and reliability of the coding method). Theorem with corresponding corollaries about the maximum vector code size of conversion methods for t-dimensional indexed data sets “category-attribute” proved. Theoretically, the existence of an infinitely large number of minimized basis, which give rise to numerous varieties of multidimensional “star” coordinate systems, which can find practical application in modern and future multidimensional information technologies, substantiated. Conclusions. The formalization provides, essentially, a new conceptual model of information systems for optimal coding and processing of big vector data, using novel design based on the remarkable properties and structural perfection of the “Glory to Ukraine Stars” combinatorial configurations. Moreover, the optimization has been embedded in the underlying combinatorial models. The favorable qualities of the combinatorial structures can be applied to vector data coded design of multidimensional signals, signal compression and reconstruction for communications and radar, and other areas to which the GUS-model can be useful. There are many opportunities to apply them to numerous branches of sciences and advanced systems engineering, including information technologies under the toroidal coordinate systems. A perfection, harmony and beauty exists not only in the abstract models but in the real world also.

Download Full-text

Transformers-sklearn: a toolkit for medical language understanding with transformer-based models

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-021-01459-0 ◽

2021 ◽

Vol 21 (S2) ◽

Author(s):

Feihong Yang ◽

Xuwen Wang ◽

Hetong Ma ◽

Jiao Li

Keyword(s):

Language Processing ◽

Pearson Correlation ◽

Fine Tuning ◽

Entity Recognition ◽

Training Dataset ◽

Training Methods ◽

Code Size ◽

Model Framework ◽

Language Understanding ◽

Medical Language

Abstract Background Transformer is an attention-based architecture proven the state-of-the-art model in natural language processing (NLP). To reduce the difficulty of beginning to use transformer-based models in medical language understanding and expand the capability of the scikit-learn toolkit in deep learning, we proposed an easy to learn Python toolkit named transformers-sklearn. By wrapping the interfaces of transformers in only three functions (i.e., fit, score, and predict), transformers-sklearn combines the advantages of the transformers and scikit-learn toolkits. Methods In transformers-sklearn, three Python classes were implemented, namely, BERTologyClassifier for the classification task, BERTologyNERClassifier for the named entity recognition (NER) task, and BERTologyRegressor for the regression task. Each class contains three methods, i.e., fit for fine-tuning transformer-based models with the training dataset, score for evaluating the performance of the fine-tuned model, and predict for predicting the labels of the test dataset. transformers-sklearn is a user-friendly toolkit that (1) Is customizable via a few parameters (e.g., model_name_or_path and model_type), (2) Supports multilingual NLP tasks, and (3) Requires less coding. The input data format is automatically generated by transformers-sklearn with the annotated corpus. Newcomers only need to prepare the dataset. The model framework and training methods are predefined in transformers-sklearn. Results We collected four open-source medical language datasets, including TrialClassification for Chinese medical trial text multi label classification, BC5CDR for English biomedical text name entity recognition, DiabetesNER for Chinese diabetes entity recognition and BIOSSES for English biomedical sentence similarity estimation. In the four medical NLP tasks, the average code size of our script is 45 lines/task, which is one-sixth the size of transformers’ script. The experimental results show that transformers-sklearn based on pretrained BERT models achieved macro F1 scores of 0.8225, 0.8703 and 0.6908, respectively, on the TrialClassification, BC5CDR and DiabetesNER tasks and a Pearson correlation of 0.8260 on the BIOSSES task, which is consistent with the results of transformers. Conclusions The proposed toolkit could help newcomers address medical language understanding tasks using the scikit-learn coding style easily. The code and tutorials of transformers-sklearn are available at https://doi.org/10.5281/zenodo.4453803. In future, more medical language understanding tasks will be supported to improve the applications of transformers_sklearn.

Download Full-text

PICO

ACM Transactions on Architecture and Code Optimization ◽

10.1145/3460434 ◽

2021 ◽

Vol 18 (4) ◽

pp. 1-27

Author(s):

Tina Jung ◽

Fabian Ritter ◽

Sebastian Hack

Keyword(s):

Execution Time ◽

Code Size ◽

Memory Safety ◽

Common Solution ◽

Buffer Overflows ◽

Code Instrumentation ◽

Spec Benchmarks ◽

Memory Accesses ◽

The Impact ◽

Time Overhead

Memory safety violations such as buffer overflows are a threat to security to this day. A common solution to ensure memory safety for C is code instrumentation. However, this often causes high execution-time overhead and is therefore rarely used in production. Static analyses can reduce this overhead by proving some memory accesses in bounds at compile time. In practice, however, static analyses may fail to verify in-bounds accesses due to over-approximation. Therefore, it is important to additionally optimize the checks that reside in the program. In this article, we present PICO, an approach to eliminate and replace in-bounds checks. PICO exactly captures the spatial memory safety of accesses using Presburger formulas to either verify them statically or substitute existing checks with more efficient ones. Thereby, PICO can generate checks of which each covers multiple accesses and place them at infrequently executed locations. We evaluate our LLVM-based PICO prototype with the well-known SoftBound instrumentation on SPEC benchmarks commonly used in related work. PICO reduces the execution-time overhead introduced by SoftBound by 36% on average (and the code-size overhead by 24%). Our evaluation shows that the impact of substituting checks dominates that of removing provably redundant checks.

Download Full-text

Software energy consumption prediction using software code metrics

10.32920/ryerson.14666424.v1 ◽

2021 ◽

Author(s):

Sedef Akinli Koçak

Keyword(s):

Energy Efficiency ◽

Energy Consumption ◽

Prediction Models ◽

Code Size ◽

Research Attention ◽

Software Products ◽

Code Metrics ◽

Software Code ◽

Energy Consumption Prediction ◽

Consumption Prediction

In recent years, a significant amount of energy consumption of ICT products has resulted in environmental concerns. Growing demand for mobile devices, personal computers, and the widespread adaptation of cloud computing and data centers are the main drivers for the energy consumption of the ICT systems. Finding solutions for improving the energy efficiency of the systems has become an important objective for both industry and academia. In order to address the increase in ICT energy consumption, hardware technology, such as production of energy efficient processors, has been substantially improved. However, demand for energy is growing faster than improvements are being made on these energy-aware technologies. Therefore, in addition to hardware, software technologies must also be a focus of research attention. Although software does not consume energy by itself, its characteristics determine which hardware resources are made available and how much electrical energy is used. Current literature on the energy efficiency of software, highlights, in particular, a lack of measurements and models. In this dissertation, first, the relationship between software code properties and energy consumption is explored. Second, using static code metrics regression based energy consumption prediction models are investigated. Finally, the models performance are assessed using within product and cross-product energy consumption prediction approaches. For this purpose, a quantitative based retrospective cohort study was employed. As research methods, observational data collection, mining software repositories, and regression analysis were utilized. This research results show inconsistent relationships between energy consumption and code size and complexity attributes considering different types of software products. Such results provide a foundation of knowledge that static code attributes may give some insights but would not be the sole predictors of energy consumption of software products.

Download Full-text

code sizeRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Guided Feature Identification and Removal for Resource-constrained Firmware

Device Hopping

Constraint-based Diversification of JOP Gadgets

Using machine learning to predict the code size impact of duplication heuristics in a dynamic compiler

New Optimization Sequences for Code-Size Reduction for the LLVM Compilation Infrastructure

Inlining for Code Size Reduction

FORMALIZATION CODING METHODS OF INFORMATION UNDER TOROIDAL COORDINATE SYSTEMS

Transformers-sklearn: a toolkit for medical language understanding with transformer-based models

PICO

Software energy consumption prediction using software code metrics

code size
Recently Published Documents