instruction sets
Recently Published Documents


TOTAL DOCUMENTS

145
(FIVE YEARS 13)

H-INDEX

15
(FIVE YEARS 1)

2021 ◽  
Vol 7 ◽  
pp. e769
Author(s):  
Bérenger Bramas

The way developers implement their algorithms, and how these implementations behave on modern CPUs, are governed by the design and organization of those CPUs. The vectorization units (SIMD) are among the few parts of a CPU that can and must be explicitly controlled. In the HPC community, the x86 CPUs and their vectorization instruction sets were the de facto standard for decades. Each new release of an instruction set usually doubled the vector length and added new operations, and each generation pushed developers to adapt and improve previous implementations. The release of the ARM scalable vector extension (SVE) changed things radically for several reasons. First, we expect ARM processors to equip many supercomputers in the coming years. Second, SVE's interface differs in several aspects from the x86 extensions: it provides different instructions, uses a predicate to control most operations, and has a vector size that is only known at execution time. Therefore, using SVE opens new challenges on how to adapt algorithms, including the ones that are already well-optimized on x86. In this paper, we port a hybrid sort based on the well-known Quicksort and Bitonic-sort algorithms. We use a Bitonic sort to process small partitions/arrays and a vectorized partitioning implementation to divide the partitions. We explain how we use the predicates and how we manage the non-static vector size. We also explain how we efficiently implement the sorting kernels. Our approach only needs an array of size O(log N) for the recursive calls in the partitioning phase, both in the sequential and in the parallel case. We test the performance of our approach on a modern ARMv8.2 (A64FX) CPU and assess the different layers of our implementation by sorting/partitioning integers, double floating-point numbers, and key/value pairs of integers. Our results show that our approach is faster than the GNU C++ sort algorithm by a speedup factor of 4 on average.
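The structure of the hybrid sort described in this abstract can be illustrated with a minimal scalar sketch in Python. It is not the paper's SVE implementation: there are no vector instructions or predicates here, and the cutoff `SMALL`, the middle-element pivot, and the function names are assumptions chosen for illustration. It only shows the two layers (a bitonic network for small ranges, partitioning for large ones) and how an explicit stack can stay at O(log N) entries by always deferring the larger side.

```python
import math

SMALL = 16  # hypothetical cutoff below which the bitonic network is used


def bitonic_sort_small(values):
    """Sort a short list with a bitonic network, padding to a power of two."""
    n = len(values)
    if n <= 1:
        return list(values)
    size = 1 << (n - 1).bit_length()
    data = list(values) + [math.inf] * (size - n)  # padding ends up last
    k = 2
    while k <= size:  # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:  # distance between compared elements
            for i in range(size):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data[:n]


def hybrid_sort(values):
    """Partition large ranges, hand small ranges to the bitonic network."""
    data = list(values)
    stack = [(0, len(data))]  # half-open ranges still to be sorted
    while stack:
        lo, hi = stack.pop()
        while hi - lo > SMALL:
            pivot = data[(lo + hi) // 2]
            lt, i, gt = lo, lo, hi
            while i < gt:  # three-way partition around the pivot
                if data[i] < pivot:
                    data[lt], data[i] = data[i], data[lt]
                    lt += 1
                    i += 1
                elif data[i] > pivot:
                    gt -= 1
                    data[gt], data[i] = data[i], data[gt]
                else:
                    i += 1
            # Defer the larger side and continue with the smaller one,
            # which keeps the explicit stack at O(log N) entries.
            if lt - lo < hi - gt:
                stack.append((gt, hi))
                hi = lt
            else:
                stack.append((lo, lt))
                lo = gt
        data[lo:hi] = bitonic_sort_small(data[lo:hi])
    return data
```

In a vectorized version, the compare-and-swap steps of the network and the partitioning loop are the parts that would map onto SIMD operations; the control structure around them stays the same.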


2021 ◽  
Vol 5 (OOPSLA) ◽  
pp. 1-31
Author(s):  
Guy L. Steele Jr. ◽  
Sebastiano Vigna

In 2014, Steele, Lea, and Flood presented SplitMix, an object-oriented pseudorandom number generator (prng) that is quite fast (9 64-bit arithmetic/logical operations per 64 bits generated) and also splittable. A conventional prng object provides a generate method that returns one pseudorandom value and updates the state of the prng; a splittable prng object also has a second operation, split, that replaces the original prng object with two (seemingly) independent prng objects, by creating and returning a new such object and updating the state of the original object. Splittable prng objects make it easy to organize the use of pseudorandom numbers in multithreaded programs structured using fork-join parallelism. This overall strategy still appears to be sound, but the specific arithmetic calculation used for generate in the SplitMix algorithm has some detectable weaknesses, and the period of any one generator is limited to 2⁶⁴. Here we present the LXM family of prng algorithms. The idea is an old one: combine the outputs of two independent prng algorithms, then (optionally) feed the result to a mixing function. An LXM algorithm uses a linear congruential subgenerator and an F₂-linear subgenerator; the examples studied in this paper use a linear congruential generator (LCG) of period 2¹⁶, 2³², 2⁶⁴, or 2¹²⁸ with one of the multipliers recommended by L’Ecuyer or by Steele and Vigna, and an F₂-linear xor-based generator (XBG) of the xoshiro family or xoroshiro family as described by Blackman and Vigna. For mixing functions we study the MurmurHash3 finalizer function; variants by David Stafford, Doug Lea, and degski; and the null (identity) mixing function. Like SplitMix, LXM provides both a generate operation and a split operation. Also like SplitMix, LXM requires no locking or other synchronization (other than the usual memory fence after instance initialization), and is suitable for use with simd instruction sets because it has no branches or loops.
We analyze the period and equidistribution properties of LXM generators, and present the results of thorough testing of specific members of this family, using the TestU01 and PractRand test suites, not only on single instances of the algorithm but also for collections of instances, used in parallel, ranging in size from 2 to 2²⁴. Single instances of LXM that include a strong mixing function appear to have no major weaknesses, and LXM is significantly more robust than SplitMix against accidental correlation in a multithreaded setting. We believe that LXM, like SplitMix, is suitable for “everyday” scientific and machine-learning applications (but not cryptographic applications), especially when concurrent threads or distributed processes are involved.
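The L, X, and M stages named in this abstract can be sketched as follows: a 64-bit LCG and a 128-bit xoroshiro-style XBG are advanced independently, and their combined value goes through a mixing function. The class name `LxmSketch`, the specific constants (an LCG multiplier, a Lea-style mixer constant, and the rotation amounts 24/16/37), and the simplified `split` are assumptions for illustration; they are not claimed to reproduce any particular generator studied in the paper.

```python
MASK64 = (1 << 64) - 1
LCG_MULTIPLIER = 0xD1342543DE82EF95  # assumed 64-bit LCG multiplier
MIX_MULTIPLIER = 0xDABA0B6EB09322E3  # assumed constant for the mixing function


def rotl64(x, k):
    """Rotate a 64-bit value left by k bits."""
    return ((x << k) | (x >> (64 - k))) & MASK64


def mix_lea64(z):
    """Branch-free 64-bit mixing function (two xorshift-multiply rounds)."""
    z = ((z ^ (z >> 32)) * MIX_MULTIPLIER) & MASK64
    z = ((z ^ (z >> 32)) * MIX_MULTIPLIER) & MASK64
    return z ^ (z >> 32)


class LxmSketch:
    """Illustrative LCG + XBG + mixer generator; not a vetted implementation."""

    def __init__(self, a, s, x0, x1):
        self.a = (a | 1) & MASK64  # additive LCG parameter, forced odd
        self.s = s & MASK64        # LCG state
        self.x0 = x0 & MASK64      # XBG state, must not be all zero
        self.x1 = x1 & MASK64
        if self.x0 == 0 and self.x1 == 0:
            self.x0 = 1            # hypothetical fallback for an all-zero seed

    def generate(self):
        # M: mix the combination of the two subgenerator states.
        z = mix_lea64((self.s + self.x0) & MASK64)
        # L: advance the linear congruential subgenerator.
        self.s = (LCG_MULTIPLIER * self.s + self.a) & MASK64
        # X: advance the xor-based subgenerator (xoroshiro128-style step).
        q0, q1 = self.x0, self.x1
        q1 ^= q0
        q0 = rotl64(q0, 24) ^ q1 ^ ((q1 << 16) & MASK64)
        q1 = rotl64(q1, 37)
        self.x0, self.x1 = q0, q1
        return z

    def split(self):
        # Simplified split: seed a new generator from four outputs of this one.
        return LxmSketch(self.generate(), self.generate(),
                         self.generate(), self.generate())
```

Note that `generate` contains no branches or loops, which is the property the abstract points to when it calls the design suitable for SIMD instruction sets.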


Author(s):  
Sam L. Thomas ◽  
Jan Van den Herrewegen ◽  
Georgios Vasilakis ◽  
Zitai Chen ◽  
Mihai Ordean ◽  
...  

Performing security analysis of embedded devices is a challenging task. They present many difficulties not usually found when analyzing commodity systems: undocumented peripherals, esoteric instruction sets, and limited tool support. Thus, a significant amount of reverse engineering is almost always required to analyze such devices. In this paper, we present Incision, an architecture and operating-system agnostic reverse engineering framework. Incision tackles the problem of reducing the upfront effort to analyze complex end-user devices. It combines static and dynamic analyses in a feedback loop, enabling information from each to be used in tandem to improve our overall understanding of the firmware analyzed. We use Incision to analyze a variety of devices and firmware. Our evaluation spans firmware based on three RTOSes, an automotive ECU, and a 4G/LTE baseband. We demonstrate that Incision does not introduce significant complexity to the standard reverse engineering process and requires little manual effort to use. Moreover, its analyses produce correct results with high confidence and are robust across different OSes and ISAs.


2021 ◽  
Vol 5 (2) ◽  
pp. 18-24
Author(s):  
Renas Rajab Asaad

In this article, we cover the concept of instruction formats in computer organization. The types of CPU organization, classified by how ALU operands are made available, are also described. When the assembler processes an instruction, it converts the instruction from its mnemonic form to a standard machine-language layout called the "instruction format". During this conversion, the assembler must determine the type of instruction, convert symbolic labels and explicit notation to a base/displacement format, determine the lengths of certain operands, and parse any literals and constants. An instruction format defines the layout of the bits of an instruction in terms of its constituent parts. An instruction format must include an opcode and, implicitly or explicitly, zero or more operands. Each explicit operand is referenced using one of the addressing modes. The format must, implicitly or explicitly, indicate the addressing mode for each operand. Most instruction sets use more than one instruction format.
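As a concrete illustration of a bit layout with an opcode, an addressing-mode field, and operand fields, here is a minimal sketch of a made-up 16-bit format. The field widths, the opcode table, and the mode names are all hypothetical and do not correspond to any real instruction set or to the article's own examples.

```python
from typing import NamedTuple

# Hypothetical 16-bit layout: [15:12] opcode, [11:10] mode, [9:7] reg, [6:0] operand
OPCODES = {"LOAD": 0x1, "STORE": 0x2, "ADD": 0x3}
MODES = {"immediate": 0, "direct": 1, "indirect": 2, "register": 3}
MNEMONICS = {code: name for name, code in OPCODES.items()}
MODE_NAMES = {code: name for name, code in MODES.items()}


class Instruction(NamedTuple):
    mnemonic: str
    mode: str
    reg: int
    operand: int


def encode(ins):
    """Pack the mnemonic form into one 16-bit machine word."""
    if not 0 <= ins.reg < 8:
        raise ValueError("register field is 3 bits wide")
    if not 0 <= ins.operand < 128:
        raise ValueError("operand field is 7 bits wide")
    return (OPCODES[ins.mnemonic] << 12) | (MODES[ins.mode] << 10) \
        | (ins.reg << 7) | ins.operand


def decode(word):
    """Split a 16-bit machine word back into its constituent fields."""
    return Instruction(
        mnemonic=MNEMONICS[(word >> 12) & 0xF],
        mode=MODE_NAMES[(word >> 10) & 0x3],
        reg=(word >> 7) & 0x7,
        operand=word & 0x7F,
    )
```

For example, `encode(Instruction("ADD", "immediate", 2, 5))` yields `0x3105`, and `decode(0x3105)` recovers the same four fields. A real instruction set would typically define several such layouts and select among them by opcode.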


2020 ◽  
Vol 36 (16) ◽  
pp. 4399-4405 ◽  
Author(s):  
Marco Oliva ◽  
Franco Milicchio ◽  
Kaden King ◽  
Grace Benson ◽  
Christina Boucher ◽  
...  

Motivation: Oxford Nanopore technologies (ONT) add miniaturization and real time to high-throughput sequencing. All available software for ONT data analytics run on cloud/clusters or personal computers. Instead, a linchpin to true portability is software that works on mobile devices of internet connections. Smartphones’ and tablets’ chipset/memory/operating systems differ from desktop computers, but software can be recompiled. We sought to understand how portable current ONT analysis methods are. Results: Several tools, from base-calling to genome assembly, were ported and benchmarked on an Android smartphone. Out of 23 programs, 11 succeeded. Recompilation failures included lack of standard headers and unsupported instruction sets. Only DSK, BCALM2 and Kraken were able to process files up to 16 GB, with linearly scaling CPU-times. However, peak CPU temperatures were high. In conclusion, the portability scenario is not favorable. Given the fast market growth, attention of developers to ARM chipsets and Android/iOS is warranted, as well as initiatives to implement mobile-specific libraries. Availability and implementation: The source code is freely available at: https://github.com/marco-oliva/portable-nanopore-analytics.


Author(s):  
Ben Simner ◽  
Shaked Flur ◽  
Christopher Pulte ◽  
Alasdair Armstrong ◽  
Jean Pichon-Pharabod ◽  
...  

Computing relies on architecture specifications to decouple hardware and software development. Historically these have been prose documents, with all the problems that entails, but research over the last ten years has developed rigorous and executable-as-test-oracle specifications of mainstream architecture instruction sets and “user-mode” concurrency, clarifying architectures and bringing them into the scope of programming-language semantics and verification. However, the system semantics, of instruction-fetch and cache maintenance, exceptions and interrupts, and address translation, remains obscure, leaving us without a solid foundation for verification of security-critical systems software. In this paper we establish a robust model for one aspect of system semantics: instruction fetch and cache maintenance for ARMv8-A. Systems code relies on executing instructions that were written by data writes, e.g. in program loading, dynamic linking, JIT compilation, debugging, and OS configuration, but hardware implementations are often highly optimised, e.g. with instruction caches, linefill buffers, out-of-order fetching, branch prediction, and instruction prefetching, which can affect programmer-observable behaviour. It is essential, both for programming and verification, to abstract from such microarchitectural details as much as possible, but no more. We explore the key architecture design questions with a series of examples, discussed in detail with senior Arm staff; capture the architectural intent in operational and axiomatic semantic models, extending previous work on “user-mode” concurrency; make these models executable as test oracles for small examples; and experimentally validate them against hardware behaviour (finding a bug in one hardware device). We thereby bring these subtle issues into the mathematical domain, clarifying the architecture and enabling future work on system software verification.


2019 ◽  
Vol 9 (3) ◽  
pp. 416-428
Author(s):  
Nuhu Amin ◽  
Dawn D. Sagerman ◽  
Fosiul A. Nizame ◽  
Kishor K. Das ◽  
Md Nuruzzaman ◽  
...  

Handwashing instructions vary in complexity, with some recommending multiple steps. To assess whether complex handwashing instructions changed handwashing procedure replication, we conducted a randomized non-inferiority trial in a low-income area of Dhaka. We randomly assigned mothers and children aged 5–10 years to one of three handwashing instruction sets: simple (N = 85 mothers/134 children), moderate (N = 75 mothers/148 children), or complex (N = 84 mothers/147 children). Simple instructions had three steps: wet, lather, and rinse hands. Moderate instructions included the simple instructions plus steps to scrub palms, scrub backs of hands, and dry hands in the air. Complex instructions included the moderate instructions plus steps to scrub between fingers, scrub under nails, and lather for 20 s. After baseline, cue cards were used to promote handwashing instructions, and adherence after 2 weeks of interventions was evaluated. Compliance with all instructions in the simple, moderate, and complex sets increased after the intervention among mothers and children. Compliance with all instructions was higher in the simple group (100%) than in the moderate (47%) and complex (38%) groups. Simple handwashing steps are easier to remember over long periods than complex steps.

