SHAKTI: An Open-Source Processor Ecosystem

Processors have become ubiquitous in the appliances and machines we use, in both consumer and industrial settings. They range from extremely small, low-power microcontrollers (used in motor control, home robots and appliances) to high-performance multi-core processors (used in servers and supercomputers). However, the growth of modern AI/ML frameworks (such as Caffe [Jia et al. 2014] and TensorFlow [Abadi et al. 2016]) and the need for features such as enhanced security have pushed the industry to look beyond general-purpose solutions and toward domain-specific customizations. Many companies today can develop custom ASICs (Application-Specific Integrated Circuits) and license specific silicon blocks from chip vendors to build customized SoCs (Systems on Chip). Even so, at the heart of every design are the processor and its associated hardware. To serve modern workloads better, these processors also need to be customized, upgraded, redesigned and augmented. This requires that vendors and consumers have access to suitable processor variants, along with the flexibility to modify them and ship them at an affordable cost.

Rack Server Solution in Data Center

Volume 1: Thermal Management

10.1115/ipack2015-48258

2015

Cited By ~ 2

Author(s): Sheng Kang, Guofeng Chen, Chun Wang, Ruiquan Ding, Jiajun Zhang, ...

Keyword(s): Power Consumption, Low Power, Power Supply, Data Center, Power Efficiency, High Performance, High Efficiency, High Growth, General Purpose, Power Supplies

With the advent of big data and cloud computing, enterprise demand for servers is increasing, with especially high growth for Intel-based x86 server platforms. Today's data centers constantly pursue high-performance, high-availability computing combined with low power consumption, low heat generation, and the ability to manage all of this through advanced telemetry. This paper presents one such solution: an updated rack and server architecture that promises these improvements. Managing server and data center power consumption and cooling more completely is critical to controlling data center costs and reducing the data center's PUE (Power Usage Effectiveness). Traditional Intel-based 1U and 2U servers have existed in data centers for decades, and the general-purpose x86 designs of the major OEMs are, for all practical purposes, very similar in power consumption and thermal output. In the past, server power supplies and thermal designs have not been optimized for high efficiency. In addition, IT managers need more information about their servers to optimize data center cooling and power use. An improved server/rack design is therefore needed. It should take advantage of more efficient power supplies or PDUs and cool server compute resources more efficiently than traditional internal server fans. Corporations constantly look for such ways to improve efficiency and gain a competitive advantage. One way to optimize power consumption and improve cooling is to completely redesign the traditional server rack by extracting the internal server power supplies and fans and centralizing them within the rack. This design reaches a new low-power target by using centralized, high-efficiency PDUs that power all servers in the rack.
Cooling is also improved by using large, efficient rack-level fans to provide airflow to all servers, and the server design is opened up to allow greater airflow across server components. The centralized power supply breaks through traditional per-server power limits, and rack-level PDUs can operate closer to their optimal efficiency point. This is combined with online and offline modes within a single power supply: keeping backup units in cold standby lets the data center's power system run at its optimal efficiency. Finally, unifying the rack's mechanical structure and thermal definitions for server cooling and PSU information allows IT to collect all server power and thermal information centrally, making it easier to analyze and process.
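
To make the cold-backup idea concrete, the sketch below picks how many centralized rack PSUs to keep online for a given load. The remaining units stay in cold standby, so each online unit runs near the peak of its efficiency curve. It is a minimal illustration under stated assumptions, not the design from the paper. The efficiency curve in psu_efficiency(), the 3 kW unit rating and the 6 kW rack load are all hypothetical. The load is assumed to be shared equally among online units, and redundancy requirements such as N+1 are ignored.

```python
import math
from typing import Optional, Tuple


def psu_efficiency(load_fraction: float) -> float:
    """Hypothetical PSU efficiency curve peaking at 50% load.

    The coefficients are illustrative only, not measured data.
    """
    return 0.94 - 0.20 * (load_fraction - 0.5) ** 2


def best_online_count(total_load_w: float, psu_capacity_w: float,
                      installed: int) -> Tuple[int, float]:
    """Choose how many rack PSUs to keep online; the rest stay in cold standby.

    Returns (online_count, input_watts) minimizing input power, assuming the
    load is shared equally among the online units.
    """
    min_needed = max(1, math.ceil(total_load_w / psu_capacity_w))
    if min_needed > installed:
        raise ValueError("load exceeds installed PSU capacity")
    best: Optional[Tuple[int, float]] = None
    for online in range(min_needed, installed + 1):
        per_unit_fraction = total_load_w / (online * psu_capacity_w)
        input_w = total_load_w / psu_efficiency(per_unit_fraction)
        if best is None or input_w < best[1]:
            best = (online, input_w)
    return best


# Example: a 6 kW rack load served by up to six 3 kW PSUs (hypothetical numbers).
online, input_w = best_online_count(6000, 3000, 6)
# Baseline without cold backup: all six units online, each lightly loaded.
all_online_w = 6000 / psu_efficiency(6000 / (6 * 3000))
print(online, round(input_w), round(all_online_w))
```

With these made-up numbers, keeping four of the six units online draws less input power than running all six lightly loaded. Real efficiency curves come from PSU datasheets.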

High-Performance Reconfigurable Computing

Advances in Computer and Electrical Engineering - Advanced Methodologies and Technologies in Network Architecture, Mobile Computing, and Data Analytics

10.4018/978-1-5225-7598-6.ch053

2019

pp. 731-744

Author(s): Mário Pereira Vestias

Keyword(s): Power Consumption, Integrated Circuit, Reconfigurable Computing, High Performance, General Purpose, Reconfigurable Hardware, Coarse Grained, Lower Power, Fine Grained, Application Specific

High-performance reconfigurable computing systems integrate reconfigurable technology into the computing architecture to improve performance. Besides higher performance, reconfigurable hardware devices also achieve lower power consumption than general-purpose processors. Even better performance and lower power consumption could be achieved with application-specific integrated circuit (ASIC) technology. However, ASICs are not reconfigurable, so each one is tied to a specific application. Reconfigurable logic becomes a major advantage when its hardware flexibility lets the same hardware module accelerate whatever application is running. The first and most common devices used for reconfigurable computing are fine-grained FPGAs, which offer great hardware flexibility. To reduce the performance and area overhead associated with reconfigurability, coarse-grained reconfigurable solutions have been proposed as a way to achieve better performance and lower power consumption. In this chapter, the authors describe reconfigurable hardware for high-performance computing.
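
The flexibility of fine-grained FPGAs comes from elements such as the look-up table (LUT). A k-input LUT stores 2**k configuration bits and can therefore implement any Boolean function of its k inputs. The Python sketch below is a behavioral model only. The LUT class and its methods are hypothetical names, and real FPGAs add routing, flip-flops and carry logic. It shows the same hardware element being reconfigured from a parity function to a majority function.

```python
from typing import Callable, List


class LUT:
    """A k-input look-up table, the basic logic element of fine-grained FPGAs.

    Its configuration is simply 2**k stored bits: one output bit for every
    combination of input bits.
    """

    def __init__(self, k: int) -> None:
        self.k = k
        self.truth_table: List[int] = [0] * (1 << k)

    def configure(self, fn: Callable[..., int]) -> None:
        """Reconfigure the LUT so that it implements the Boolean function fn."""
        for index in range(1 << self.k):
            bits = [(index >> i) & 1 for i in range(self.k)]
            self.truth_table[index] = fn(*bits) & 1

    def evaluate(self, *inputs: int) -> int:
        """Look up the output; input 0 is the least significant bit of the index."""
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        index = sum((bit & 1) << i for i, bit in enumerate(inputs))
        return self.truth_table[index]


lut = LUT(4)
lut.configure(lambda a, b, c, d: a ^ b ^ c ^ d)                # 4-input parity
parity = lut.evaluate(1, 0, 1, 1)
lut.configure(lambda a, b, c, d: (a & b) | (a & c) | (b & c))  # majority of a, b, c
majority = lut.evaluate(1, 1, 0, 0)
print(parity, majority)
```

Only the stored bits change between the two configurations. The evaluation path, standing in for the physical hardware, stays the same.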

High-Performance Reconfigurable Computing

Encyclopedia of Information Science and Technology, Fourth Edition

10.4018/978-1-5225-2255-3.ch348

2018

pp. 4018-4029

Author(s): Mário Pereira Vestias

Keyword(s): Power Consumption, Integrated Circuit, Reconfigurable Computing, High Performance, General Purpose, Reconfigurable Hardware, Coarse Grained, Lower Power, Fine Grained, Application Specific

High-performance reconfigurable computing systems integrate reconfigurable technology into the computing architecture to improve performance. Besides higher performance, reconfigurable hardware devices also achieve lower power consumption than general-purpose processors. Even better performance and lower power consumption could be achieved with application-specific integrated circuit (ASIC) technology. However, ASICs are not reconfigurable, so each one is tied to a specific application. Reconfigurable logic becomes a major advantage when its hardware flexibility lets the same hardware module accelerate whatever application is running. The first and most common devices used for reconfigurable computing are fine-grained FPGAs, which offer great hardware flexibility. To reduce the performance and area overhead associated with reconfigurability, coarse-grained reconfigurable solutions have been proposed as a way to achieve better performance and lower power consumption. In this chapter we describe reconfigurable hardware for high-performance computing.
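
Coarse-grained architectures trade some of that flexibility for efficiency. Instead of bit-level look-up tables, they use word-level processing elements (PEs) whose configuration is just a choice of operation. The sketch below is a hypothetical behavioral model. The ProcessingElement class, its opcode set and the 16-bit datapath width are assumptions, not taken from the chapter. It chains two PEs into a multiply-accumulate datapath and then reconfigures one of them.

```python
import operator
from typing import Callable, Dict

# Hypothetical word-level operation set for one coarse-grained processing element.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "max": max,
    "min": min,
}


class ProcessingElement:
    """A word-level PE: configuring it means choosing one opcode, not 2**k bits."""

    def __init__(self, width: int = 16) -> None:
        self.mask = (1 << width) - 1  # results wrap to the datapath width
        self.op: Callable[[int, int], int] = OPS["add"]

    def configure(self, opcode: str) -> None:
        self.op = OPS[opcode]

    def compute(self, a: int, b: int) -> int:
        return self.op(a, b) & self.mask


# Two PEs chained into a tiny datapath: out = pe1(pe0(a, b), c)
pe0, pe1 = ProcessingElement(), ProcessingElement()


def run(a: int, b: int, c: int) -> int:
    return pe1.compute(pe0.compute(a, b), c)


pe0.configure("mul")
pe1.configure("add")
mac = run(3, 4, 5)        # multiply-accumulate: a*b + c
pe0.configure("max")
max_add = run(3, 4, 5)    # reconfigured datapath: max(a, b) + c
print(mac, max_add)
```

Reconfiguring the datapath touches a single opcode per PE, which is why coarse-grained fabrics carry less configuration overhead than bit-level LUT fabrics.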

An Empirical Investigation into the Adoption of Open Source Software in Hospitals

Software Applications

10.4018/978-1-60566-060-8.ch095

2009

pp. 1608-1627

Author(s): Gilberto Munoz-Cornejo, Carolyn B. Seaman, A. Günes Koru

Keyword(s): Open Source, Open Source Software, Empirical Model, Mixed Method, Empirical Investigation, Critical Factor, General Purpose, Research Approach, Mixed Method Research, Domain Specific

Open source software (OSS) has recently gained considerable attention in healthcare. Yet how and why OSS is adopted within hospitals in particular remains poorly understood. This research attempts to further this understanding. A mixed-method research approach was used to explore the extent of OSS adoption in hospitals, as well as the factors facilitating and inhibiting adoption. The findings suggest very limited adoption of OSS in hospitals, and hospitals tend to adopt general-purpose rather than domain-specific OSS. We found that software vendors are the critical factor facilitating OSS adoption in hospitals. Conversely, a lack of in-house development and a perceived lack of security, quality, and accountability in OSS products were factors inhibiting adoption. An empirical model is presented to illustrate the factors facilitating and inhibiting the adoption of OSS in hospitals.

Using Metaprogramming to Parallelize Functional Specifications

Parallel Processing Letters

10.1142/s0129626402000926

2002

Vol 12 (02)

pp. 193-210

Cited By ~ 3

Author(s): Christoph A. Herrmann, Christian Lengauer

Keyword(s): Parallel Computing, Programming Language, High Performance, Parallel Implementation, General Purpose, Functional Language, Application Domain, Domain Specific, Domain Independent

Metaprogramming is a paradigm for enhancing a general-purpose programming language with features catering to a special-purpose application domain, without having to reimplement the language. In staged compilation, the special-purpose features are translated and optimised by a domain-specific preprocessor, which then hands over to the general-purpose compiler for translation of the domain-independent part of the program. The domain we work in is high-performance parallel computing. We use metaprogramming to enhance the functional language Haskell with features for the efficient parallel implementation of certain computational patterns, called skeletons.
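
A skeleton packages a parallel computational pattern as a higher-order function, so the programmer supplies only sequential customizing functions. The Python sketch below shows a divide-and-conquer skeleton instantiated as merge sort. It is an analogy under assumptions, not the authors' Haskell system. Here the skeleton is an ordinary runtime function (dc and merge_sort are hypothetical names), whereas the paper translates skeletons at compile time through metaprogramming. Parallelism uses concurrent.futures.ThreadPoolExecutor at the top level only. Because of CPython's global interpreter lock, a process-based executor (with picklable functions instead of lambdas) would be needed for real speedups on CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, TypeVar

P = TypeVar("P")  # problem type
S = TypeVar("S")  # solution type


def dc(trivial: Callable[[P], bool],
       solve: Callable[[P], S],
       divide: Callable[[P], List[P]],
       combine: Callable[[List[S]], S],
       problem: P,
       pool: Optional[ThreadPoolExecutor] = None) -> S:
    """Divide-and-conquer skeleton: the user supplies four sequential
    functions; the skeleton decides how subproblems are evaluated."""
    if trivial(problem):
        return solve(problem)
    subproblems = divide(problem)
    if pool is not None:
        # Top level only: farm subproblems out to the pool. Each worker then
        # recurses sequentially (pool=None), so no worker waits on the pool.
        subresults = list(pool.map(
            lambda p: dc(trivial, solve, divide, combine, p), subproblems))
    else:
        subresults = [dc(trivial, solve, divide, combine, p) for p in subproblems]
    return combine(subresults)


def merge(halves: List[List[int]]) -> List[int]:
    """Combine step: merge two sorted lists."""
    left, right = halves
    out: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]


def merge_sort(xs: List[int], pool: Optional[ThreadPoolExecutor] = None) -> List[int]:
    """Merge sort expressed purely as an instance of the dc skeleton."""
    return dc(lambda p: len(p) <= 1,                       # trivial
              list,                                        # solve: copy 0/1 elements
              lambda p: [p[:len(p) // 2], p[len(p) // 2:]],  # divide into halves
              merge,                                       # combine
              xs, pool)


with ThreadPoolExecutor(max_workers=2) as pool:
    sorted_with_pool = merge_sort([5, 3, 8, 1, 9, 2], pool)
print(sorted_with_pool)
```

The user-facing merge_sort contains no parallel code at all. That separation between the pattern and its parallel implementation is what makes skeletons a good target for a domain-specific preprocessor.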

ProjectQ: an open source software framework for quantum computing

Quantum

10.22331/q-2018-01-31-49

2018

Vol 2

pp. 49

Cited By ~ 66

Author(s): Damian S. Steiger, Thomas Häner, Matthias Troyer

Keyword(s): Quantum Computing, Open Source, Open Source Software, High Performance, Quantum Algorithms, Cloud Service, Domain Specific Language, Resource Estimation, Domain Specific, Compiler Framework

We introduce ProjectQ, an open-source software effort for quantum computing. The first release features a compiler framework capable of targeting various types of hardware, a high-performance simulator with emulation capabilities, and compiler plug-ins for circuit drawing and resource estimation. We introduce our Python-embedded domain-specific language, present its features, and provide example implementations of quantum algorithms. The framework allows quantum algorithms to be tested through simulation and run on actual quantum hardware via a back-end connecting to the IBM Quantum Experience cloud service. Through extension mechanisms, users can provide back-ends for additional quantum hardware, and scientists working on quantum compilation can provide plug-ins for additional compilation, optimization, gate synthesis, and layout strategies.
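
ProjectQ's embedded language applies a gate with Python's | operator (for example, H | qubit). The sketch below illustrates both the embedded-DSL idea and what a state-vector simulator does. It overloads __or__ on toy gate classes and simulates a two-qubit Bell-state circuit. Everything here (Engine, Qubit, OneQubitGate, ControlledNot) is a hypothetical teaching model, not ProjectQ's API. The high-performance simulator described in the abstract is far more optimized than this pure-Python loop.

```python
import math
from typing import List, Sequence, Tuple


class Engine:
    """Toy state-vector simulator for n qubits (qubit i = bit i of the basis index)."""

    def __init__(self, n_qubits: int) -> None:
        self.state: List[complex] = [0j] * (1 << n_qubits)
        self.state[0] = 1 + 0j  # start in |0...0>

    def apply_1q(self, matrix: Sequence[Sequence[complex]], target: int) -> None:
        bit = 1 << target
        for i in range(len(self.state)):
            if not i & bit:  # visit each (target=0, target=1) amplitude pair once
                a0, a1 = self.state[i], self.state[i | bit]
                self.state[i] = matrix[0][0] * a0 + matrix[0][1] * a1
                self.state[i | bit] = matrix[1][0] * a0 + matrix[1][1] * a1

    def apply_cnot(self, control: int, target: int) -> None:
        c, t = 1 << control, 1 << target
        for i in range(len(self.state)):
            if i & c and not i & t:  # control set, target clear: swap with partner
                self.state[i], self.state[i | t] = self.state[i | t], self.state[i]

    def probabilities(self) -> List[float]:
        return [abs(a) ** 2 for a in self.state]


class Qubit:
    def __init__(self, engine: Engine, index: int) -> None:
        self.engine, self.index = engine, index


class OneQubitGate:
    def __init__(self, matrix: Sequence[Sequence[complex]]) -> None:
        self.matrix = matrix

    def __or__(self, qubit: Qubit) -> None:  # enables the `Gate | qubit` syntax
        qubit.engine.apply_1q(self.matrix, qubit.index)


class ControlledNot:
    def __or__(self, qubits: Tuple[Qubit, Qubit]) -> None:
        control, target = qubits
        control.engine.apply_cnot(control.index, target.index)


_s = 1 / math.sqrt(2)
H = OneQubitGate([[_s, _s], [_s, -_s]])
CNOT = ControlledNot()

# Prepare the Bell state (|00> + |11>) / sqrt(2).
eng = Engine(2)
q0, q1 = Qubit(eng, 0), Qubit(eng, 1)
H | q0
CNOT | (q0, q1)
print([round(p, 3) for p in eng.probabilities()])
```

Operator overloading lets circuit code read like the notation used on paper, while the engine behind it can be swapped for an optimized simulator or a hardware back-end.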

Design space exploration in near-data co-processors for general-purpose acceleration, in high-performance and low-power processing environments

10.12681/eadd/49517

2021

Author(s): Αθανάσιος Τζιουβάρας

Keyword(s): Low Power, High Performance, Design Space Exploration, Design Space, Space Exploration, General Purpose

Modern computer architectures face a serious performance-scaling problem, because the bottleneck has shifted from the processor core to main memory and to data-transfer operations. This can partly be attributed to the end of Dennard scaling and the continued shrinking of transistors. As a result, the power density of integrated circuits has risen so much that multi-core processors must operate at voltages close to the threshold voltage. To overcome this problem, researchers are moving away from classical von Neumann architectures and turning their attention to new processing models. The last decade has seen renewed interest in the near-data processing (NDP) paradigm, in which instructions are executed in the main-memory circuitry instead of the central processor. This significantly reduces the number of data transfers between main memory and the processor, which benefits both power consumption and achievable system performance. Following this direction, this thesis explores the NDP paradigm for both high-performance and low-power processors. For high-performance processors, we propose an approach that targets the execution of general-purpose loops. The proposed architecture uses an instruction-scheduling methodology in which each instruction of the loop is issued to a custom integrated circuit that acts as a loop-execution accelerator. This circuit is placed on the logic layer of a Hybrid Memory Cube (HMC) main memory.
On this layer, instructions execute iteratively and in parallel, in a manner reminiscent of software pipelining, while intermediate results are forwarded through an on-chip interconnection network. For low-power architectures, we develop a novel timing-analysis methodology that is based on the principles of static timing analysis (STA) and is tailored specifically to low-end, low-energy systems. This methodology considers the timing paths excited by each instruction supported by the processor's instruction set architecture (ISA) and computes the worst-case delay of each instruction separately. As a result, we obtain instruction-level delay information and exploit it to scale the clock frequency dynamically, according to the type of instruction executing in the circuit at each moment. We then use this methodology to co-design an architecture around dynamic clock-frequency scaling at the granularity of a single clock cycle. Again focusing on general-purpose code execution, we also implement this architecture on the logic layer of an HMC-type memory, enabling our system to execute instructions next to the random-access memory. We evaluate both implemented architectures (high-performance and low-power) at the integrated-circuit implementation level, following industry standards, to strengthen the validity of our methodology.
Our results show a large performance gain, in terms of speedup over the baseline architecture, while power consumption drops to very low levels.
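
The instruction-level clock scaling described above can be illustrated with a little arithmetic. With a fixed clock, every cycle must accommodate the slowest instruction. With cycle-by-cycle scaling, each cycle lasts only as long as the worst-case delay of the instruction it executes. The sketch assumes a simple non-pipelined core that executes one instruction per cycle, and it ignores the overhead of switching clock periods. The delay table and instruction trace are hypothetical values, not results from the thesis.

```python
from typing import Dict, List

# Hypothetical per-instruction-class worst-case delays in nanoseconds, standing
# in for the output of an instruction-level static timing analysis.
WORST_CASE_DELAY_NS: Dict[str, float] = {
    "alu": 0.6,
    "shift": 0.7,
    "load": 0.9,
    "mul": 1.0,
    "branch": 0.5,
}


def fixed_clock_time(trace: List[str], delays: Dict[str, float]) -> float:
    """Conventional design: every cycle is as long as the slowest instruction."""
    period = max(delays.values())
    return period * len(trace)


def dynamic_clock_time(trace: List[str], delays: Dict[str, float]) -> float:
    """Cycle-level scaling: each cycle lasts only as long as its own instruction."""
    return sum(delays[op] for op in trace)


trace = ["alu", "alu", "load", "mul", "branch", "alu"]
t_fixed = fixed_clock_time(trace, WORST_CASE_DELAY_NS)
t_dynamic = dynamic_clock_time(trace, WORST_CASE_DELAY_NS)
print(f"fixed {t_fixed:.1f} ns, dynamic {t_dynamic:.1f} ns, "
      f"speedup {t_fixed / t_dynamic:.2f}x")
```

The gain depends entirely on the instruction mix. A trace made only of the slowest instruction class would see no speedup.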

Open-source implementation of an ad-hoc IEEE802.11a/g/p software-defined radio on low-power and low-cost general purpose processors

Radioengineering

10.13164/re.2017.1083

2017

Vol 26 (4)

pp. 1083-1095

Cited By ~ 1

Author(s): S. Ciccia, G. Giordanengo, G. Vecchi

Keyword(s): Low Power, Open Source, Software Defined Radio, Ad Hoc, Low Cost, General Purpose, General Purpose Processors

Behavioral synthesis of high performance, low cost, and low power application specific processors for linear computations

Proceedings of IEEE International Conference on Application Specific Array Processors (ASAP'94)

10.1109/asap.1994.331817

2002

Cited By ~ 2

Author(s): M. Potkonjak, M.B. Srivastava

Keyword(s): Low Power, High Performance, Low Cost, Behavioral Synthesis, Power Application, Application Specific

Novel Nanometric Reversible Low Power Bidirectional Universal Logarithmic Barrel Shifter with Overflow and Zero Flags

Journal of Circuits, Systems and Computers

10.1142/s0218126615500498

2015

Vol 24 (04)

pp. 1550049

Cited By ~ 3

Author(s): Nayereh Hosseininia, Soudabeh Boroumand, Majid Haghparast

Keyword(s): Low Power, High Speed, High Performance, Error Control, Digital Signal, General Purpose, VLSI Circuits, Quantum Cost, Hardware Complexity, Barrel Shifter

Power consumption is one of the most important issues in designing VLSI circuits. Reversible logic, which is widely used in quantum computing, low-power CMOS design, optical information processing, bioinformatics and nanotechnology-based systems, reduces power loss. Because a reversible circuit does not lose information, it avoids the energy dissipation associated with erasing information, so its internal power dissipation is ideally zero. Reversible barrel shifters are required to build reversible embedded digital signal processors and general-purpose processors. Data shifting is often used in high-speed/low-power error-control applications, floating-point normalization, address decoding and bit indexing. This paper proposes a novel reversible bidirectional universal barrel shifter for high-speed, high-performance applications. The proposed barrel shifter is designed as a single circuit with overflow and zero flags. It performs three operations (rotation, logical shifting and arithmetic shifting) and shifts data in both directions. The design is evaluated in terms of the number of garbage outputs, the number of constant inputs, quantum cost, the number of reversible gates and hardware complexity. All of these measures are evaluated for a nanometric-scale design.
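
A behavioral model helps pin down what the three operations and two flags mean. The Python sketch below mirrors the logarithmic structure in the title: log2(width) stages, where stage i shifts by 2**i when bit i of the shift amount is set. It covers rotation, logical shifts and arithmetic shifts in both directions. It models function only, not reversible gates, garbage outputs or quantum cost. The overflow-flag semantics are an assumption: the flag is set when a logical left shift discards a 1 bit, or when an arithmetic left shift changes the signed value.

```python
from typing import Tuple


def barrel_shift(value: int, amount: int, width: int = 8,
                 op: str = "logical", direction: str = "left") -> Tuple[int, bool, bool]:
    """Behavioral model of a logarithmic bidirectional barrel shifter.

    op is "rotate", "logical" or "arithmetic"; direction is "left" or "right".
    Returns (result, overflow_flag, zero_flag).
    """
    if width < 2 or width & (width - 1):
        raise ValueError("width must be a power of two >= 2")
    if not 0 <= amount < width:
        raise ValueError("amount must be in [0, width)")
    if op not in ("rotate", "logical", "arithmetic") or direction not in ("left", "right"):
        raise ValueError("unsupported op or direction")
    mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)
    value &= mask
    result = value

    # log2(width) stages: stage i shifts by 2**i when bit i of amount is set.
    for i in range(width.bit_length() - 1):
        if not (amount >> i) & 1:
            continue
        k = 1 << i
        if op == "rotate":
            if direction == "left":
                result = ((result << k) | (result >> (width - k))) & mask
            else:
                result = ((result >> k) | (result << (width - k))) & mask
        elif direction == "left":
            # Logical and arithmetic left shifts both fill with zeros.
            result = (result << k) & mask
        elif op == "arithmetic":
            # Replicate the sign bit into the k vacated high positions.
            fill = mask ^ (mask >> k) if result & sign_bit else 0
            result = (result >> k) | fill
        else:
            result >>= k

    def to_signed(x: int) -> int:
        return x - (1 << width) if x & sign_bit else x

    # Assumed flag semantics (not taken from the paper).
    if op == "logical" and direction == "left":
        overflow = (value << amount) > mask  # a 1 bit was shifted out
    elif op == "arithmetic" and direction == "left":
        overflow = to_signed(result) != to_signed(value) * (1 << amount)
    else:
        overflow = False
    return result, overflow, result == 0


print(barrel_shift(0b10010110, 1, op="rotate"))
print(barrel_shift(0b10000000, 3, op="arithmetic", direction="right"))
print(barrel_shift(0b11000000, 2))
```

Each loop iteration corresponds to one column of multiplexers in a hardware barrel shifter, which is why the number of stages grows with log2 of the word width rather than with the width itself.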
