Byte-Select Compression

2021 ◽  
Vol 18 (4) ◽  
pp. 1-27
Author(s):  
Matthew Tomei ◽  
Shomit Das ◽  
Mohammad Seyedzadeh ◽  
Philip Bedoukian ◽  
Bradford Beckmann ◽  
...  

Cache-block compression is a highly effective technique for both reducing accesses to lower levels in the memory hierarchy (cache compression) and minimizing data transfers (link compression). While many effective cache-block compression algorithms have been proposed, their design is largely ad hoc and manual, relying on human recognition of patterns. In this article, we take an entirely different approach. We introduce a class of “byte-select” compression algorithms, as well as an automated methodology for generating compression algorithms in this class. We argue that, based on upper bounds within the class, the study of byte-select algorithms has the potential to yield algorithms with better performance than existing cache-block compression algorithms. The upper bound we establish on the compression ratio is 2X that of any existing algorithm. We then offer a generalized representation of a subset of byte-select compression algorithms and search through the resulting space guided by a set of training data traces. Using this automated process, we find efficient and effective algorithms for various hardware applications. The resulting algorithms exploit novel patterns that can inform future algorithm designs. The generated byte-select algorithms are evaluated against a separate set of traces; these evaluations show that Byte-Select achieves a 23% higher compression ratio on average. While no previous algorithm performs best across all of our data sets, which include CPU and GPU applications, our generated algorithms do. Using an automated hardware generator for these algorithms, we show that their decompression and compression latencies are one and two cycles, respectively, much lower than those of any existing algorithm with a competitive compression ratio.
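The abstract does not specify the generated algorithms' encodings, but the following toy Python sketch illustrates the general byte-select structure it describes: every decompressed byte is chosen, via a small selector index, from a short list of stored bytes, so decompression reduces to one table lookup (a byte-wide multiplexer) per output byte. The dictionary size and selector layout here are arbitrary assumptions for illustration only.

```python
# Toy illustration (not the paper's generated algorithms): a "byte-select"-style
# scheme in which every decompressed byte is chosen, by a small selector index,
# from a short list of stored bytes. Decompression is a single table lookup per
# byte, which is why hardware decompressors in this class can be very shallow.

def compress_byte_select(block: bytes, max_dict: int = 8):
    """Return (dictionary_bytes, selectors) or None if the block is incompressible."""
    dictionary = []
    selectors = []
    for b in block:
        if b not in dictionary:
            if len(dictionary) == max_dict:
                return None                    # too many distinct bytes: leave uncompressed
            dictionary.append(b)
        selectors.append(dictionary.index(b))  # per-byte selector into the dictionary
    return bytes(dictionary), selectors

def decompress_byte_select(dictionary: bytes, selectors):
    # One mux/lookup per output byte: the selector picks which stored byte appears here.
    return bytes(dictionary[s] for s in selectors)

if __name__ == "__main__":
    block = bytes([0x00, 0x00, 0xFF, 0x00, 0x7F, 0xFF, 0x00, 0x7F] * 8)  # 64-byte cache block
    result = compress_byte_select(block)
    assert result is not None
    d, sel = result
    assert decompress_byte_select(d, sel) == block
    # 3 dictionary bytes + 64 x 2-bit selectors = 19 bytes vs. 64 bytes uncompressed.
    print(f"dictionary bytes: {len(d)}, selectors: {len(sel)}")
```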

Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1573
Author(s):  
Loris Nanni ◽  
Giovanni Minchio ◽  
Sheryl Brahnam ◽  
Gianluca Maguolo ◽  
Alessandra Lumini

Traditionally, classifiers are trained to predict patterns within a feature space. The image classification system presented here trains classifiers to predict patterns within a vector space built by combining the dissimilarity spaces generated by a large set of Siamese Neural Networks (SNNs). A set of centroids is calculated from the patterns in the training data sets with supervised k-means clustering. The centroids are used to generate the dissimilarity space via the Siamese networks. The vector space descriptors are extracted by projecting patterns onto the dissimilarity spaces, and SVMs classify an image by its dissimilarity vector. The versatility of the proposed approach to image classification is demonstrated by evaluating the system on different types of images across two domains: two medical data sets and two animal audio data sets with vocalizations represented as images (spectrograms). Results show that the proposed system is competitive with the best-performing methods in the literature, obtaining state-of-the-art performance on one of the medical data sets, and does so without ad hoc optimization of the clustering methods on the tested data sets.
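A minimal sketch of the dissimilarity-space construction described above, under simplifying assumptions (random toy data, an identity stub in place of a trained Siamese branch, a plain RBF SVM): each pattern is described by its distances to class-wise k-means centroids, and the SVM operates on those dissimilarity vectors.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def embed(x: np.ndarray) -> np.ndarray:
    """Placeholder for one branch of a trained Siamese network."""
    return x  # identity embedding, purely for illustration

def dissimilarity_descriptor(x, centroids):
    # One distance per centroid: the pattern is described by how unlike each prototype it is.
    return np.array([np.linalg.norm(embed(x) - embed(c)) for c in centroids])

# Toy data: 200 samples, 32-dim features, binary labels.
X = rng.normal(size=(200, 32))
y = (X[:, 0] > 0).astype(int)

# Centroids computed per class ("supervised" k-means), then pooled.
centroids = np.vstack([
    KMeans(n_clusters=5, n_init=10, random_state=0).fit(X[y == c]).cluster_centers_
    for c in (0, 1)
])

D = np.vstack([dissimilarity_descriptor(x, centroids) for x in X])
clf = SVC(kernel="rbf").fit(D, y)
print("training accuracy on the toy data:", clf.score(D, y))
```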


Author(s):  
Sharada Guptha M N ◽  
H. S. Pradeep ◽  
M Z Kurian

Speed is one of the most important criteria customers consider when selecting an electronic component. The speed of a microprocessor-based system depends mainly on the speed of the microprocessor, which in turn depends on memory access time. Accessing off-chip memory takes considerably more time than accessing on-chip memory. Because of this, memory system designers may find cache compression an advantageous way to speed up a microprocessor-based system, as it increases effective cache capacity and off-chip bandwidth. However, most past work, and all work on cache compression, has made unsubstantiated assumptions about the performance, power consumption, and area overheads of the proposed compression algorithms and hardware. Without understanding these costs, it is not possible to determine whether compression at the levels of the memory hierarchy closest to the processor is beneficial. The proposed hardware compression algorithm falls into the dictionary-based category, which depends on building a dictionary and using its entries to encode repeated data values. The proposed algorithm has a number of novel features, including combining pairs of compressed lines into one cache line and allowing parallel compression of multiple words using a single dictionary without degrading the compression ratio.
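A hedged sketch of the dictionary-based idea this abstract refers to (not the proposed hardware algorithm itself): each word of a cache line is either replaced by an index into a small dictionary of previously seen values or stored verbatim with an escape tag. In hardware, all words could be matched against the same dictionary in parallel; the Python version below simply loops for clarity.

```python
def dict_compress(words, dict_size=4):
    dictionary, encoded = [], []
    for w in words:
        if w in dictionary:
            encoded.append(("idx", dictionary.index(w)))   # small index instead of full word
        else:
            encoded.append(("raw", w))                     # escape: store the word itself
            if len(dictionary) < dict_size:
                dictionary.append(w)
    return encoded

def dict_decompress(encoded, dict_size=4):
    dictionary, words = [], []
    for tag, v in encoded:
        if tag == "idx":
            words.append(dictionary[v])
        else:
            words.append(v)
            if len(dictionary) < dict_size:
                dictionary.append(v)
    return words

line = [0x0, 0x0, 0xDEADBEEF, 0x0, 0xDEADBEEF, 0x1, 0x1, 0x0]  # eight 32-bit words
enc = dict_compress(line)
assert dict_decompress(enc) == line
```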


2021 ◽  
Vol 16 (1) ◽  
pp. 1-24
Author(s):  
Yaojin Lin ◽  
Qinghua Hu ◽  
Jinghua Liu ◽  
Xingquan Zhu ◽  
Xindong Wu

In multi-label learning, label correlations commonly exist in the data. Such correlation not only provides useful information but also imposes significant challenges for multi-label learning. Recently, label-specific feature embedding has been proposed to explore label-specific features from the training data and to use features highly customized to the multi-label set for learning. While such feature embedding methods have demonstrated good performance, the creation of the feature embedding space is based on a single label only, without considering label correlations in the data. In this article, we propose to combine multiple label-specific feature spaces, using label correlation, for multi-label learning. The proposed algorithm, multi-label-specific feature space ensemble (MULFE), takes label-specific features, label correlation, and the weighted ensemble principle into consideration to form a learning framework. By conducting clustering analysis on each label’s negative and positive instances, MULFE first creates features customized to each label. After that, MULFE utilizes the label correlation to optimize the margin distribution of the base classifiers induced by the related label-specific feature spaces. By combining multiple label-specific features, label-correlation-based weighting, and ensemble learning, MULFE achieves the maximum-margin multi-label classification goal through the underlying optimization framework. Empirical studies on 10 public data sets demonstrate the effectiveness of MULFE.
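The following sketch, on assumed toy data, illustrates the label-specific feature construction step mentioned above in the spirit of LIFT-style methods: for each label, its positive and negative instances are clustered separately, every instance is re-described by its distances to the resulting cluster centers, and a base classifier is trained per label. MULFE's correlation-based weighting and margin-distribution optimization are not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC
from sklearn.metrics import pairwise_distances

def label_specific_features(X, y_label, k=3, seed=0):
    """Distances to per-label cluster centers, computed separately for positives and negatives."""
    pos, neg = X[y_label == 1], X[y_label == 0]
    centers = np.vstack([
        KMeans(n_clusters=min(k, len(pos)), n_init=10, random_state=seed).fit(pos).cluster_centers_,
        KMeans(n_clusters=min(k, len(neg)), n_init=10, random_state=seed).fit(neg).cluster_centers_,
    ])
    return pairwise_distances(X, centers)   # one feature per cluster center

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
Y = (rng.random(size=(300, 4)) < 0.3).astype(int)    # toy multi-label targets

# One base classifier per label, each induced by its own label-specific feature space.
base_classifiers = []
for j in range(Y.shape[1]):
    Fj = label_specific_features(X, Y[:, j])
    base_classifiers.append(LinearSVC().fit(Fj, Y[:, j]))
```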


2021 ◽  
Vol 10 (1) ◽  
pp. 105
Author(s):  
I Gusti Ayu Purnami Indryaswari ◽  
Ida Bagus Made Mahendra

Many Indonesian people, especially in Bali, raise pigs as livestock. Pigs are susceptible to various diseases, and there have been many cases of pig deaths that cause losses to breeders. Therefore, the authors developed an Android-based application that can predict the type of disease in pigs by applying the C4.5 algorithm. C4.5 is a classification algorithm that derives rules from data which can then be used for prediction. In this study, 50 training records covering 8 types of pig disease and 31 symptoms were entered into the system, which processes the data so that the resulting Android application can predict the type of disease. Testing on 15 test records produced an accuracy of 86.7%. The application features, built using the Kotlin programming language and the SQLite database, ran as expected during testing.
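As a small illustration of the split criterion at the heart of C4.5 (the gain ratio), the sketch below scores two hypothetical symptom attributes on a made-up symptom/disease table; it is not the authors' application, which was built in Kotlin with an SQLite database.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, attr, target):
    labels = [r[target] for r in rows]
    base = entropy(labels)
    split_info, remainder = 0.0, 0.0
    for value in set(r[attr] for r in rows):
        subset = [r[target] for r in rows if r[attr] == value]
        p = len(subset) / len(rows)
        remainder += p * entropy(subset)
        split_info -= p * math.log2(p)
    gain = base - remainder
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical records: each row maps observed symptoms to a diagnosed disease.
rows = [
    {"fever": "yes", "appetite_loss": "yes", "disease": "hog_cholera"},
    {"fever": "yes", "appetite_loss": "no",  "disease": "hog_cholera"},
    {"fever": "no",  "appetite_loss": "yes", "disease": "worms"},
    {"fever": "no",  "appetite_loss": "no",  "disease": "worms"},
]
# C4.5 splits on the attribute with the highest gain ratio.
print({a: round(gain_ratio(rows, a, "disease"), 3) for a in ("fever", "appetite_loss")})
```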


Author(s):  
Hilal Bahlawan ◽  
Mirko Morini ◽  
Michele Pinelli ◽  
Pier Ruggero Spina ◽  
Mauro Venturini

This paper documents the set-up and validation of nonlinear autoregressive exogenous (NARX) models of a heavy-duty single-shaft gas turbine. The considered gas turbine is a General Electric PG 9351FA located in Italy. The data used for model training are time series data sets of several different maneuvers measured experimentally during the start-up procedure, covering cold, warm and hot start-ups. The trained NARX models are used to predict other experimental data sets, and comparisons are made between the outputs of the models and the corresponding measured data. This paper therefore addresses the challenge of setting up robust and reliable NARX models by means of a sound selection of training data sets and a sensitivity analysis on the number of neurons. Moreover, a new performance function for the training process is defined to give more weight to the most rapid transients. The final aim of this paper is the set-up of a powerful, easy-to-build and very accurate simulation tool with good generalization capability, which can be used for both control logic tuning and gas turbine diagnostics.
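A hedged sketch of the NARX structure described above, on synthetic start-up-like data rather than the paper's gas turbine measurements: the next output sample is predicted from lagged outputs and lagged exogenous inputs, here with a small feed-forward network. The paper's custom transient-weighted performance function is not reproduced.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def make_narx_dataset(u, y, nu=3, ny=3):
    """Stack [y(t-1..t-ny), u(t-1..t-nu)] as regressors for y(t)."""
    start = max(nu, ny)
    X = np.array([np.r_[y[t - ny:t], u[t - nu:t]] for t in range(start, len(y))])
    return X, y[start:]

t = np.linspace(0, 10, 500)
u = np.clip(t / 5, 0, 1)                                   # ramped "fuel command" input (synthetic)
y = 1 - np.exp(-t / 2) + 0.01 * np.random.default_rng(0).normal(size=t.size)  # lagged response

X, target = make_narx_dataset(u, y)
model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=3000, random_state=0).fit(X, target)
print("one-step-ahead R^2:", round(model.score(X, target), 3))
```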


2020 ◽  
Vol 12 (11) ◽  
pp. 1746
Author(s):  
Salman Ahmadi ◽  
Saeid Homayouni

In this paper, we propose a novel approach based on the active contours model for change detection from synthetic aperture radar (SAR) images. To increase the accuracy of the proposed approach, a new operator was introduced to generate a difference image from the before- and after-change images. A new active contours model was then developed to accurately detect changed regions in the difference image. The proposed model extracts the changed areas as a target feature from the difference image based on training data from changed and unchanged regions. In this research, we used the Otsu histogram thresholding method to produce the training data automatically. In addition, the training data are updated while minimizing the energy function of the model. To evaluate its accuracy, we applied the proposed method to three benchmark SAR data sets. The proposed model obtains Kappa coefficients of 84.65%, 87.07%, and 96.26% for the Yellow River Estuary, Bern, and Ottawa data sets, respectively. These results demonstrate the effectiveness of the proposed approach compared to other methods. Another advantage of the proposed model is its high speed in comparison with conventional methods.
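The sketch below illustrates only the automatic training-data seeding step described above, under the common log-ratio assumption for SAR difference images (the paper's own difference operator and active-contour model are not reproduced): a difference image is computed from the before/after acquisitions, and Otsu's threshold provides initial changed/unchanged labels.

```python
import numpy as np
from skimage.filters import threshold_otsu

rng = np.random.default_rng(0)
before = rng.gamma(shape=4.0, scale=20.0, size=(128, 128))      # speckled SAR-like images
after = before.copy()
after[40:80, 40:80] *= 3.0                                      # simulated changed region

diff = np.abs(np.log(after + 1.0) - np.log(before + 1.0))       # log-ratio difference image
t = threshold_otsu(diff)
initial_changed = diff > t          # seed labels for the active-contour model
print("threshold:", round(float(t), 3), "changed pixels:", int(initial_changed.sum()))
```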


2021 ◽  
Author(s):  
Louise Bloch ◽  
Christoph M. Friedrich

Abstract Background: Predicting whether subjects with Mild Cognitive Impairment (MCI) will prospectively develop Alzheimer's Disease (AD) is important for the recruitment and monitoring of subjects for therapy studies. Machine Learning (ML) is suitable for improving early AD prediction. The etiology of AD is heterogeneous, which leads to noisy data sets. Additional noise is introduced by multicentric study designs and varying acquisition protocols. This article examines whether an automatic and fair data valuation method based on Shapley values can identify subjects with noisy data. Methods: An ML workflow was developed and trained for a subset of the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. The validation was executed for an independent ADNI test data set and for the Australian Imaging, Biomarker and Lifestyle Flagship Study of Ageing (AIBL) cohort. The workflow included volumetric Magnetic Resonance Imaging (MRI) feature extraction, subject sample selection using data Shapley, Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) for model training, and Kernel SHapley Additive exPlanations (SHAP) values for model interpretation. This model interpretation enables clinically relevant explanation of individual predictions. Results: On the independent ADNI test data set, the XGBoost models that excluded 116 of the 467 subjects from the training data set, based on their Logistic Regression (LR) data Shapley values, outperformed the models trained on the entire training data set, which reached a mean classification accuracy of 58.54 %, by 14.13 % (8.27 percentage points). The XGBoost models trained on the entire training data set reached a mean accuracy of 60.35 % for the AIBL data set. An improvement of 24.86 % (15.00 percentage points) could be reached for the XGBoost models if the 72 subjects with the smallest RF data Shapley values were excluded from the training data set. Conclusion: The data Shapley method was able to improve the classification accuracies for the test data sets. Noisy data were associated with the number of ApoEϵ4 alleles and volumetric MRI measurements. Kernel SHAP showed that the black-box models learned biologically plausible associations.
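A rough sketch of the data-valuation step on synthetic data, with a plain logistic regression as the valuation model, a small permutation-sampling budget, and scikit-learn's gradient boosting standing in for XGBoost (none of which match the paper's ADNI/AIBL setup): each training subject's data Shapley value is estimated by permutation sampling, and the lowest-valued subjects are dropped before training the final classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier  # stand-in for XGBoost

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))
y = (X[:, 0] + 0.3 * rng.normal(size=120) > 0).astype(int)
y[:15] = 1 - y[:15]                                   # simulate noisy subjects (flipped labels)
X_val = rng.normal(size=(80, 10))
y_val = (X_val[:, 0] > 0).astype(int)

def valuation_score(idx):
    """Validation accuracy of a cheap model trained on the subset `idx`."""
    if len(set(y[idx])) < 2:
        return 0.5                                    # one-class subset: random-guess score
    return LogisticRegression(max_iter=1000).fit(X[idx], y[idx]).score(X_val, y_val)

def permutation_data_shapley(n_perms=10):
    values = np.zeros(len(y))
    for _ in range(n_perms):
        perm = rng.permutation(len(y))
        prev = 0.5                                    # score of the empty coalition
        for k in range(1, len(perm) + 1):
            cur = valuation_score(perm[:k])
            values[perm[k - 1]] += cur - prev         # marginal contribution of subject perm[k-1]
            prev = cur
    return values / n_perms

vals = permutation_data_shapley()
keep = vals > np.quantile(vals, 0.25)                 # drop the lowest-valued quarter
model = GradientBoostingClassifier().fit(X[keep], y[keep])
print("validation accuracy after filtering:", round(model.score(X_val, y_val), 3))
```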


2021 ◽  
Author(s):  
Ying Hou ◽  
Yi-Hong Zhang ◽  
Jie Bao ◽  
Mei-Ling Bao ◽  
Guang Yang ◽  
...  

Abstract Purpose: Balancing the preservation of urinary continence against the achievement of negative surgical margins is clinically relevant but difficult to implement. Preoperatively accurate detection of extracapsular extension (ECE) of prostate cancer (PCa) is thus crucial for determining appropriate treatment options. We aimed to develop and clinically validate an artificial intelligence (AI)-assisted tool for the detection of ECE in patients with PCa using multiparametric MRI. Methods: 849 patients with localized PCa who underwent multiparametric MRI before radical prostatectomy were retrospectively included from two medical centers. The AI tool was built on a ResNeXt network embedded with a spatial attention map encoding experts’ prior knowledge (PAGNet), using 596 training data sets. The tool was validated on 150 internal and 103 external data sets, and its clinical applicability was compared with expert-based interpretation and AI-expert interaction. Results: An index PAGNet model using a single-slice image yielded areas under the receiver operating characteristic curve (AUC) of 0.857 (95% confidence interval [CI], 0.827-0.884), 0.807 (95% CI, 0.735-0.867) and 0.728 (95% CI, 0.631-0.811) in the training, internal test and external test cohorts, respectively, higher than those of the conventional ResNeXt networks. For the experts, inter-reader agreement was observed in only 437/849 (51.5%) patients, with a Kappa value of 0.343. The performance of the two experts (AUC, 0.632 to 0.741) was lower than that of the AI assessment (AUC, 0.715 to 0.857; paired comparison, all p values < 0.05). When the experts’ interpretations were adjusted by the AI assessments, the performance of both experts improved. Conclusion: Our AI tool, showing improved accuracy, offers a promising alternative to human experts for imaging staging of PCa ECE using multiparametric MRI.
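The general idea of injecting an expert prior as a spatial attention map can be sketched as follows (PyTorch, with assumed tensor shapes; the actual PAGNet architecture and trained weights are not reproduced): a prior map at image resolution is resized to the backbone's feature-map resolution and used to re-weight the features before pooling and classification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PriorAttentionHead(nn.Module):
    def __init__(self, in_channels: int, num_classes: int = 2):
        super().__init__()
        self.classifier = nn.Linear(in_channels, num_classes)

    def forward(self, features: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        # features: (B, C, H, W) backbone output; prior: (B, 1, H0, W0) expert map in [0, 1].
        attn = F.interpolate(prior, size=features.shape[-2:], mode="bilinear", align_corners=False)
        weighted = features * (1.0 + attn)            # emphasize regions the prior points at
        pooled = weighted.mean(dim=(-2, -1))          # global average pooling
        return self.classifier(pooled)

# Toy usage with random tensors standing in for a ResNeXt feature map and a prior mask.
head = PriorAttentionHead(in_channels=256)
feats = torch.randn(4, 256, 12, 12)
prior = torch.rand(4, 1, 96, 96)
logits = head(feats, prior)
print(logits.shape)   # torch.Size([4, 2])
```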

