ML-LOO: Detecting Adversarial Examples with Feature Attribution

2020 ◽  
Vol 34 (04) ◽  
pp. 6639-6647 ◽  
Author(s):  
Puyudi Yang ◽  
Jianbo Chen ◽  
Cho-Jui Hsieh ◽  
Jane-Ling Wang ◽  
Michael Jordan

Deep neural networks obtain state-of-the-art performance on a series of tasks. However, they are easily fooled by adding a small adversarial perturbation to the input. The perturbation is often imperceptible to humans on image data. We observe a significant difference in feature attributions between adversarially crafted examples and original examples. Based on this observation, we introduce a new framework to detect adversarial examples by thresholding a scale estimate of feature attribution scores. Furthermore, we extend our method to include multi-layer feature attributions in order to tackle attacks with mixed confidence levels. As demonstrated in extensive experiments, our method achieves superior performance in distinguishing adversarial examples generated by popular attack methods on a variety of real data sets compared to state-of-the-art detection methods. In particular, our method can detect adversarial examples of mixed confidence levels and transfers between different attack methods. We also show that our method achieves competitive performance even when the attacker has complete access to the detector.
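A minimal sketch of the detection idea described above, under stated assumptions: `model` is a hypothetical callable returning a scalar score for a flat feature vector, attributions are computed leave-one-out style, and the scale statistic is an interquartile range. This is an illustration of the thresholding scheme, not the authors' released code.

```python
import numpy as np

def loo_attributions(model, x, baseline=0.0):
    """Attribution of feature i = f(x) - f(x with feature i set to a baseline)."""
    base_score = model(x)
    attributions = np.empty(x.shape[0])
    for i in range(x.shape[0]):
        x_masked = x.copy()
        x_masked[i] = baseline  # mask one feature at a time
        attributions[i] = base_score - model(x_masked)
    return attributions

def dispersion(attributions):
    """IQR of the attribution map, used as the scale estimate."""
    q75, q25 = np.percentile(attributions, [75, 25])
    return q75 - q25

def is_adversarial(model, x, threshold):
    """Flag x if its attribution dispersion exceeds a threshold calibrated on clean data."""
    return dispersion(loo_attributions(model, x)) > threshold
```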

2021 ◽  
Vol 17 (3) ◽  
pp. e1008256
Author(s):  
Shuonan Chen ◽  
Jackson Loper ◽  
Xiaoyin Chen ◽  
Alex Vaughan ◽  
Anthony M. Zador ◽  
...  

Modern spatial transcriptomics methods can target thousands of different types of RNA transcripts in a single slice of tissue. Many biological applications demand a high spatial density of transcripts relative to the imaging resolution, leading to partial mixing of transcript rolonies in many voxels; unfortunately, current analysis methods do not perform robustly in this highly mixed setting. Here we develop a new analysis approach, BARcode DEmixing through Non-negative Spatial Regression (BarDensr): we start with a generative model of the physical process that leads to the observed image data and then apply sparse convex optimization methods to estimate the underlying (demixed) rolony densities. We apply BarDensr to simulated and real data and find that it achieves state-of-the-art signal recovery, particularly in densely labeled regions or data with low spatial resolution. Finally, BarDensr is fast and parallelizable. We provide open-source code as well as an implementation for the ‘NeuroCAAS’ cloud platform.
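A toy illustration of the non-negative regression at the core of this approach (a sketch only; the released BarDensr also models point-spread functions and sparsity, which are omitted here): each voxel's multi-round, multi-channel intensity vector is treated as a non-negative mixture of known barcode signatures, and per-barcode densities are recovered with non-negative least squares.

```python
import numpy as np
from scipy.optimize import nnls

def demix_voxels(observations, barcodes):
    """
    observations: (n_voxels, n_frames) intensity readouts per voxel.
    barcodes:     (n_barcodes, n_frames) expected signature of each barcode.
    Returns (n_voxels, n_barcodes) non-negative density estimates.
    """
    densities = np.zeros((observations.shape[0], barcodes.shape[0]))
    for v in range(observations.shape[0]):
        # Solve min ||B^T d - y||_2 subject to d >= 0 for each voxel.
        densities[v], _ = nnls(barcodes.T, observations[v])
    return densities
```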


2021 ◽  
Vol 25 (1) ◽  
pp. 27-50
Author(s):  
Tsung-Lin Li ◽  
Chen-An Tsai

Time series forecasting is a challenging task of interest in many disciplines. A variety of techniques have been developed to deal with the problem by combining different disciplines. Although various studies have demonstrated the success of hybrid models, none of them backed the comparisons with a solid statistical test. This paper proposes a new stepwise model determination method for artificial neural networks (ANN) and a novel hybrid model combining the autoregressive integrated moving average (ARIMA) model, ANN and the discrete wavelet transformation (DWT). Simulation studies are conducted to compare the performance of different models, including ARIMA, ANN, ARIMA-ANN, DWT-ARIMA-ANN and the proposed method, ARIMA-DWT-ANN. Two real data sets, Lynx data and cabbage data, are also used to demonstrate the applications. Our proposed method, ARIMA-DWT-ANN, outperforms the other methods on both the simulated data sets and the Lynx data, while ANN shows better performance on the cabbage data. We conducted a two-way ANOVA test to compare the performances of the methods; the results showed a significant difference between the methods. In brief, we suggest trying both ANN and ARIMA-DWT-ANN, given their robustness and high accuracy. Since the performance of hybrid models may vary across data sets depending on whether the data are more ARIMA-like or ANN-like in nature, all of these models should be considered when encountering a new data set in order to reach optimal performance.
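A hedged sketch of one plausible ARIMA-DWT-ANN pipeline; the paper's exact architecture, lag structure and thresholding rule are not reproduced here. ARIMA captures the linear component, its residuals are denoised with a wavelet transform, and an ANN models the remaining nonlinear structure from lagged residuals.

```python
import numpy as np
import pywt
from statsmodels.tsa.arima.model import ARIMA
from sklearn.neural_network import MLPRegressor

def arima_dwt_ann_forecast(y, order=(1, 1, 1), wavelet="db4", lags=4):
    # 1. Linear component via ARIMA.
    arima = ARIMA(y, order=order).fit()
    residuals = arima.resid

    # 2. Wavelet denoising of the residuals (soft-threshold the detail coefficients).
    coeffs = pywt.wavedec(residuals, wavelet)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745      # robust noise estimate
    thresh = sigma * np.sqrt(2 * np.log(len(residuals)))
    coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    denoised = pywt.waverec(coeffs, wavelet)[: len(residuals)]

    # 3. Nonlinear component: ANN on lagged denoised residuals.
    X = np.column_stack([denoised[i: len(denoised) - lags + i] for i in range(lags)])
    target = denoised[lags:]
    ann = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000).fit(X, target)

    # 4. Combine one-step-ahead forecasts from both components.
    linear_part = arima.forecast(1)[0]
    nonlinear_part = ann.predict(denoised[-lags:].reshape(1, -1))[0]
    return linear_part + nonlinear_part
```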


2021 ◽  
Author(s):  
Enrico Gaffo ◽  
Alessia Buratin ◽  
Anna Dal Molin ◽  
Stefania Bortoluzzi

Current methods for identifying circular RNAs (circRNAs) suffer from low discovery rates and inconsistent performance across diverse data sets. The applied detection algorithm can therefore bias the findings of high-throughput studies by missing relevant circRNAs. Here, we show that our bioinformatics tool CirComPara2 (https://github.com/egaffo/CirComPara2), by combining multiple circRNA detection methods, consistently achieves high recall rates without loss of precision in simulated and various real data sets.
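A minimal sketch of the method-combination idea; CirComPara2 itself wraps several detection tools, and only the consensus step is illustrated here. Backsplice-junction calls from multiple detectors are pooled, and circRNAs reported by at least `min_methods` of them are kept.

```python
from collections import Counter

def consensus_circrnas(calls_per_method, min_methods=2):
    """
    calls_per_method: list of sets of circRNA coordinates, e.g.
        [{"chr1:100-500:+", ...}, {"chr1:100-500:+", ...}, ...]
    Returns the set of circRNAs supported by >= min_methods detectors.
    """
    support = Counter(c for calls in calls_per_method for c in set(calls))
    return {circ for circ, n in support.items() if n >= min_methods}
```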


2019 ◽  
Vol 9 (18) ◽  
pp. 3801 ◽  
Author(s):  
Hyuk-Yoon Kwon

In this paper, we propose a method to construct a lightweight key-value store based on native Windows features. The main idea is to provide a thin wrapper for the key-value store on top of a built-in Windows storage facility, the Windows registry. First, we define a mapping of the components of the key-value store onto the components of the Windows registry. Then, we present a hash-based multi-level registry index that distributes the key-value data evenly and accesses them efficiently. Third, we implement the basic operations of the key-value store (i.e., Get, Put, and Delete) by manipulating the Windows registry through the native Windows APIs. We call the proposed key-value store WR-Store. Finally, we propose an efficient ETL (Extract-Transform-Load) method to migrate data stored in WR-Store into any other environment that supports existing key-value stores. Because the performance of the Windows registry has not been studied much, we perform an empirical study to understand the characteristics of WR-Store and then tune its performance to find the best parameter setting. Through extensive experiments using synthetic and real data sets, we show that the performance of WR-Store is comparable to or even better than that of state-of-the-art systems (i.e., RocksDB, BerkeleyDB, and LevelDB). In particular, we show the scalability of WR-Store: it becomes much more efficient than the other key-value stores as the size of the data set increases. In addition, we show that the performance of WR-Store is maintained even under intensive registry workloads in which 1000 processes actively accessing the registry run concurrently.
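A hedged Python sketch of the WR-Store idea (the paper's implementation calls the native Windows APIs directly; the registry path and fan-out below are illustrative assumptions). Keys are hashed into a multi-level registry subkey path so that data spreads evenly, and Get/Put/Delete map onto registry value operations. Runs on Windows only.

```python
import hashlib
import winreg

ROOT = r"Software\WRStoreDemo"  # hypothetical demo location, not the paper's

def _index_path(key, levels=2):
    """Hash the key into a multi-level subkey path, e.g. Software\WRStoreDemo\a\3.
    One hex character per level gives a fan-out of 16 subkeys per level."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return ROOT + "\\" + "\\".join(digest[:levels])

def put(key, value):
    with winreg.CreateKey(winreg.HKEY_CURRENT_USER, _index_path(key)) as h:
        winreg.SetValueEx(h, key, 0, winreg.REG_SZ, value)

def get(key):
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, _index_path(key)) as h:
        value, _type = winreg.QueryValueEx(h, key)
        return value

def delete(key):
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, _index_path(key), 0,
                        winreg.KEY_SET_VALUE) as h:
        winreg.DeleteValue(h, key)
```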


Author(s):  
Bahareh Khozaei ◽  
Mahdi Eftekhari

In this paper, two novel approaches for unsupervised feature selection are proposed based on spectral clustering. In the first proposed method, spectral clustering is applied to the features, and the cluster centers, together with their nearest neighbors, are selected. These features have minimal similarity (redundancy) among themselves, since they belong to different clusters. Next, the samples are clustered with spectral clustering, and the samples of each cluster are assigned a specific pseudo-label. The information gain of each feature is then computed with respect to these pseudo-labels, which secures maximum relevancy. Finally, the intersection of the features selected in the two previous steps is taken, simultaneously guaranteeing maximum relevancy and minimum redundancy. Our second approach is very similar to the first; its only, but significant, difference is that it selects one feature from each cluster and sorts all features by relevancy. The selected features are appended to a sorted list and excluded from the next iteration, and the algorithm continues with the remaining features until all features have been appended to the sorted list. Both of our proposed methods are compared with state-of-the-art methods, and the obtained results confirm the effectiveness of our approaches, especially the second one.
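A rough sketch of the first scheme under stated assumptions; the paper's exact similarity graph, neighbor rule and information-gain measure are not reproduced, and mutual information is used here as a stand-in for information gain. Step 1 clusters the features and keeps each cluster's most central feature (low redundancy), step 2 pseudo-labels the samples and scores feature relevancy, and step 3 intersects the two sets.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.feature_selection import mutual_info_classif

def select_features(X, n_feature_clusters=10, n_sample_clusters=5, top_k=10):
    # Step 1: cluster the features (rows of X.T); keep the feature nearest each cluster mean.
    f_labels = SpectralClustering(n_clusters=n_feature_clusters,
                                  affinity="nearest_neighbors").fit_predict(X.T)
    low_redundancy = set()
    for c in range(n_feature_clusters):
        members = np.where(f_labels == c)[0]
        center = X[:, members].mean(axis=1)
        low_redundancy.add(members[np.argmin(
            np.linalg.norm(X[:, members] - center[:, None], axis=0))])

    # Step 2: pseudo-label the samples, then rank features by mutual information.
    pseudo = SpectralClustering(n_clusters=n_sample_clusters,
                                affinity="nearest_neighbors").fit_predict(X)
    relevancy = mutual_info_classif(X, pseudo)
    high_relevancy = set(np.argsort(relevancy)[::-1][:top_k])

    # Step 3: keep features that are both non-redundant and relevant.
    return sorted(low_redundancy & high_relevancy)
```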


2015 ◽  
Vol 24 (04) ◽  
pp. 1540016 ◽  
Author(s):  
Muhammad Hussain ◽  
Sahar Qasem ◽  
George Bebis ◽  
Ghulam Muhammad ◽  
Hatim Aboalsamh ◽  
...  

As digital image processing techniques mature, many tools can forge an image easily without leaving visible traces, raising the problem of authenticating digital images. Based on the assumption that forgery alters the texture micro-patterns in a digital image and that texture descriptors can model this change, we employed two state-of-the-art local texture descriptors, the multi-scale Weber's law descriptor (multi-WLD) and the multi-scale local binary pattern (multi-LBP), for splicing and copy-move forgery detection. Because tamper traces are not visible to the naked eye, the chrominance components of an image, which encode these traces, were used for modeling them with the texture descriptors. To reduce the dimension of the feature space and get rid of redundant features, we employed a locally learning based (LLB) algorithm. For identifying an image as authentic or tampered, a support vector machine (SVM) was used. This paper presents a thorough investigation validating this forgery detection method. The experiments were conducted on three benchmark image data sets, namely, CASIA v1.0, CASIA v2.0, and Columbia color. The experimental results showed that the accuracy of the multi-WLD-based method was 94.19% on CASIA v1.0, 96.52% on CASIA v2.0, and 94.17% on the Columbia data set. It is not only significantly better than the multi-LBP-based method but also outperforms other state-of-the-art forgery detection methods.
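A simplified sketch of the multi-LBP branch of this pipeline; the WLD features and the LLB feature-selection step are omitted, and the scale and bin choices below are assumptions. Multi-scale LBP histograms are computed on the chrominance channels, concatenated, and classified with an SVM.

```python
import numpy as np
from skimage.color import rgb2ycbcr
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def multi_lbp_features(rgb_image, radii=(1, 2, 3)):
    ycbcr = rgb2ycbcr(rgb_image)
    features = []
    for channel in (ycbcr[..., 1], ycbcr[..., 2]):     # Cb and Cr only
        for r in radii:                                # multiple scales
            lbp = local_binary_pattern(channel, P=8 * r, R=r, method="uniform")
            # "uniform" LBP with P points yields P + 2 distinct codes.
            hist, _ = np.histogram(lbp, bins=np.arange(8 * r + 3), density=True)
            features.append(hist)
    return np.concatenate(features)

# Hypothetical usage with `images` and binary `labels` (0 = authentic, 1 = tampered):
# X = np.stack([multi_lbp_features(img) for img in images])
# clf = SVC(kernel="rbf").fit(X, labels)
```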


2010 ◽  
Vol 66 (7) ◽  
pp. 783-788 ◽  
Author(s):  
Pavol Skubák ◽  
Willem-Jan Waterreus ◽  
Navraj S. Pannu

Density modification is a standard technique in macromolecular crystallography that can significantly improve an initial electron-density map. To obtain optimal results, the initial and density-modified maps are combined. Current methods assume that these two maps are independent and propagate the initial map information and its accuracy indirectly through previously determined coefficients. A multivariate equation has been derived that no longer assumes independence between the initial and density-modified maps, considers the observed diffraction data directly and refines the errors that can occur in a single-wavelength anomalous diffraction experiment. The equation has been implemented and tested on over 100 real data sets. The results are dramatic: the method provides significantly improved maps over the current state of the art and leads to many more structures being built automatically.


2015 ◽  
Vol 24 (03) ◽  
pp. 1550003 ◽  
Author(s):  
Armin Daneshpazhouh ◽  
Ashkan Sami

The task of semi-supervised outlier detection is to find the instances that deviate markedly from the rest of the data, using some labeled examples. This issue is especially important in applications such as fraud detection and intrusion detection. Most existing techniques are unsupervised. Semi-supervised approaches, on the other hand, use both negative and positive instances to detect outliers. However, in many real-world applications, very few positive labeled examples are available. This paper proposes an innovative approach to address this problem. The proposed method works as follows. First, some reliable negative instances are extracted by a kNN-based algorithm. Afterwards, fuzzy clustering using both negative and positive examples is utilized to detect outliers. Experimental results on real data sets demonstrate that the proposed approach outperforms previous unsupervised state-of-the-art methods in detecting outliers.
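A loose sketch of the two-stage idea under stated assumptions; the paper's exact kNN rule and fuzzy objective may differ. Stage 1 marks as reliable negatives the unlabeled points farthest, on average, from the few known outliers; stage 2 runs a small fuzzy c-means over both groups and flags points whose membership leans toward the outlier cluster.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def fuzzy_cmeans(X, c=2, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))        # random initial memberships
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted cluster centers
        d = pairwise_distances(X, centers) + 1e-12
        U = 1.0 / d ** (2 / (m - 1))                  # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return U, centers

def detect_outliers(X_unlabeled, X_positive, k=5, n_negatives=50):
    # Stage 1: reliable negatives = unlabeled points with the largest mean
    # distance to their k nearest known outliers.
    d = np.sort(pairwise_distances(X_unlabeled, X_positive), axis=1)[:, :k]
    negatives = X_unlabeled[np.argsort(d.mean(axis=1))[-n_negatives:]]

    # Stage 2: fuzzy clustering over negatives + positives; the cluster whose
    # center is nearest the positives is treated as the outlier cluster.
    data = np.vstack([negatives, X_positive])
    _, centers = fuzzy_cmeans(data)
    outlier_cluster = np.argmin(pairwise_distances(centers, X_positive).mean(axis=1))
    U_all = 1.0 / (pairwise_distances(X_unlabeled, centers) + 1e-12) ** 2
    U_all /= U_all.sum(axis=1, keepdims=True)
    return U_all[:, outlier_cluster] > 0.5            # boolean outlier flags
```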


2013 ◽  
Vol 11 (02) ◽  
pp. 1250014 ◽  
Author(s):  
MARÍA M. ABAD-GRAU ◽  
NURIA MEDINA-MEDINA ◽  
SERAFÍN MORAL ◽  
ROSANA MONTES-SOLDADO ◽  
SERGIO TORRES-SÁNCHEZ ◽  
...  

It is already known that the power of multimarker transmission/disequilibrium tests may improve with the number of markers, as some associations may require several markers to be captured. However, a mechanism such as haplotype grouping must be used to avoid incremental complexity with the number of markers. 2G, a state-of-the-art transmission/disequilibrium test, implements this mechanism to its maximum extent by grouping haplotypes into only two groups, high- and low-risk haplotypes, so that the test has only one degree of freedom regardless of the number of markers. The test checks whether those haplotypes more often transmitted from parents to offspring are truly high-risk haplotypes. In this paper we use haplotype similarity as prior knowledge to classify haplotypes as high or low risk, starting with those haplotypes on which the prior will have the lowest impact, i.e., those with the largest differences between transmission and non-transmission counts. If these counts are very different, the prior knowledge has little effect and haplotypes are classified as low or high risk just as 2G does. We show a substantial gain in power achieved by this approach on both simulated and real data sets.
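An illustrative sketch only; the actual extension of 2G uses a principled statistical prior, whereas the `min_gap` cutoff and Hamming similarity below are hypothetical simplifications. Haplotypes are processed from the largest to the smallest gap between transmission (T) and non-transmission (NT) counts; clear-cut ones are classified by their counts alone, while ambiguous ones borrow the label of the most similar already-classified haplotype.

```python
def hamming_similarity(h1, h2):
    """Fraction of matching alleles between two equal-length haplotype strings."""
    return sum(a == b for a, b in zip(h1, h2)) / len(h1)

def classify_haplotypes(counts, min_gap=5):
    """counts: dict mapping haplotype string -> (transmitted, not_transmitted)."""
    ordered = sorted(counts, key=lambda h: abs(counts[h][0] - counts[h][1]),
                     reverse=True)
    labels = {}
    for h in ordered:
        t, nt = counts[h]
        if abs(t - nt) >= min_gap or not labels:
            labels[h] = "high" if t > nt else "low"   # counts dominate
        else:                                         # similarity prior kicks in
            nearest = max(labels, key=lambda g: hamming_similarity(h, g))
            labels[h] = labels[nearest]
    return labels
```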


2020 ◽  
Vol 34 (05) ◽  
pp. 8384-8391
Author(s):  
Hui Liu ◽  
Yongzheng Zhang ◽  
Yipeng Wang ◽  
Zheng Lin ◽  
Yige Chen

Text classification is a basic task in natural language processing, but small character perturbations in words can greatly decrease the effectiveness of text classification models; this is known as a character-level adversarial example attack. There are two main challenges in defending against character-level adversarial examples: out-of-vocabulary words in the word embedding model and the distribution difference between training and inference. Both challenges make character-level adversarial examples difficult to defend against. In this paper, we propose a framework that jointly uses character embedding and adversarial stability training to overcome these two challenges. Our experimental results on five text classification data sets show that models based on our framework can effectively defend against character-level adversarial examples: our models can defend against 93.19% of gradient-based adversarial examples and 94.83% of natural adversarial examples, outperforming state-of-the-art defense models.
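A schematic sketch of joint character embedding plus adversarial stability training; the model architecture, character encoding and perturbation generator are placeholders rather than the paper's. The loss combines the clean-example loss, the perturbed-example loss, and a stability term tying the two output distributions together.

```python
import torch
import torch.nn.functional as F

def stability_training_step(model, chars, chars_perturbed, labels,
                            optimizer, alpha=1.0, beta=1.0):
    """
    chars / chars_perturbed: character-index tensors for clean and
    character-perturbed versions of the same batch (hypothetical encoding);
    `model` is assumed to embed characters internally and return logits.
    """
    optimizer.zero_grad()
    logits_clean = model(chars)
    logits_adv = model(chars_perturbed)
    loss = (F.cross_entropy(logits_clean, labels)                  # clean loss
            + alpha * F.cross_entropy(logits_adv, labels)          # perturbed loss
            + beta * F.kl_div(F.log_softmax(logits_adv, dim=-1),   # stability term
                              F.softmax(logits_clean, dim=-1),
                              reduction="batchmean"))
    loss.backward()
    optimizer.step()
    return loss.item()
```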

