Neuronal Subcompartment Classification and Merge Error Correction

AbstractRecent advances in 3d electron microscopy are yielding ever larger reconstructions of brain tissue, encompassing thousands of individual neurons interconnected by millions of synapses. Interpreting reconstructions at this scale demands advances in the automated analysis of neuronal morphologies, for example by identifying morphological and functional subcompartments within neurons. We present a method that for the first time uses full 3d input (voxels) to automatically classify reconstructed neuron fragments as axon, dendrite, or somal subcompartments. Based on 3d convolutional neural networks, this method achieves a mean f1-score of 0.972, exceeding the previous state of the art of 0.955. The resulting predictions can support multiple analysis and proofreading applications. In particular, we leverage finely localized subcompartment predictions for automated detection and correction of merge errors in the volume reconstruction, successfully detecting 90.6% of inter-class merge errors with a false positive rate of only 2.7%.

Download Full-text

A Partial Correlation Screening Approach for Controlling the False Positive Rate in Sparse Gaussian Graphical Models

Scientific Reports ◽

10.1038/s41598-019-53795-x ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 1

Author(s):

Ginette Lafit ◽

Francis Tuerlinckx ◽

Inez Myin-Germeys ◽

Eva Ceulemans

Keyword(s):

Graphical Models ◽

False Positive ◽

Partial Correlation ◽

State Of The Art ◽

False Positive Rate ◽

False Positives ◽

Gaussian Graphical Models ◽

Undirected Network ◽

Partial Correlations ◽

Positive Rate

AbstractGaussian Graphical Models (GGMs) are extensively used in many research areas, such as genomics, proteomics, neuroimaging, and psychology, to study the partial correlation structure of a set of variables. This structure is visualized by drawing an undirected network, in which the variables constitute the nodes and the partial correlations the edges. In many applications, it makes sense to impose sparsity (i.e., some of the partial correlations are forced to zero) as sparsity is theoretically meaningful and/or because it improves the predictive accuracy of the fitted model. However, as we will show by means of extensive simulations, state-of-the-art estimation approaches for imposing sparsity on GGMs, such as the Graphical lasso, ℓ1 regularized nodewise regression, and joint sparse regression, fall short because they often yield too many false positives (i.e., partial correlations that are not properly set to zero). In this paper we present a new estimation approach that allows to control the false positive rate better. Our approach consists of two steps: First, we estimate an undirected network using one of the three state-of-the-art estimation approaches. Second, we try to detect the false positives, by flagging the partial correlations that are smaller in absolute value than a given threshold, which is determined through cross-validation; the flagged correlations are set to zero. Applying this new approach to the same simulated data, shows that it indeed performs better. We also illustrate our approach by using it to estimate (1) a gene regulatory network for breast cancer data, (2) a symptom network of patients with a diagnosis within the nonaffective psychotic spectrum and (3) a symptom network of patients with PTSD.

Download Full-text

Website Fingerprinting with Website Oracles

Proceedings on Privacy Enhancing Technologies ◽

10.2478/popets-2020-0013 ◽

2020 ◽

Vol 2020 (1) ◽

pp. 235-255 ◽

Cited By ~ 2

Author(s):

Tobias Pulls ◽

Rasmus Dahlberg

Keyword(s):

False Positive ◽

State Of The Art ◽

False Positive Rate ◽

Open World ◽

Wide Range ◽

Online Advertisement ◽

Positive Rate ◽

Access Logs ◽

Attacks And Defenses ◽

Security Notion

AbstractWebsite Fingerprinting (WF) attacks are a subset of traffic analysis attacks where a local passive attacker attempts to infer which websites a target victim is visiting over an encrypted tunnel, such as the anonymity network Tor. We introduce the security notion of a Website Oracle (WO) that gives a WF attacker the capability to determine whether a particular monitored website was among the websites visited by Tor clients at the time of a victim’s trace. Our simulations show that combining a WO with a WF attack—which we refer to as a WF+WO attack—significantly reduces false positives for about half of all website visits and for the vast majority of websites visited over Tor. The measured false positive rate is on the order one false positive per million classified website trace for websites around Alexa rank 10,000. Less popular monitored websites show orders of magnitude lower false positive rates.We argue that WOs are inherent to the setting of anonymity networks and should be an assumed capability of attackers when assessing WF attacks and defenses. Sources of WOs are abundant and available to a wide range of realistic attackers, e.g., due to the use of DNS, OCSP, and real-time bidding for online advertisement on the Internet, as well as the abundance of middleboxes and access logs. Access to a WO indicates that the evaluation of WF defenses in the open world should focus on the highest possible recall an attacker can achieve. Our simulations show that augmenting the Deep Fingerprinting WF attack by Sirinam et al. [60] with access to a WO significantly improves the attack against five state-of-the-art WF defenses, rendering some of them largely ineffective in this new WF+WO setting.

Download Full-text

Var-CNN: A Data-Efficient Website Fingerprinting Attack Based on Deep Learning

Proceedings on Privacy Enhancing Technologies ◽

10.2478/popets-2019-0070 ◽

2019 ◽

Vol 2019 (4) ◽

pp. 292-310 ◽

Cited By ~ 10

Author(s):

Sanjit Bhat ◽

David Lu ◽

Albert Kwon ◽

Srinivas Devadas

Keyword(s):

Deep Learning ◽

State Of The Art ◽

False Positive Rate ◽

True Positive Rate ◽

Training Data ◽

Open World ◽

Prior Art ◽

Lower False Positive Rate ◽

Positive Rate ◽

Fingerprinting Attack

Abstract In recent years, there have been several works that use website fingerprinting techniques to enable a local adversary to determine which website a Tor user visits. While the current state-of-the-art attack, which uses deep learning, outperforms prior art with medium to large amounts of data, it attains marginal to no accuracy improvements when both use small amounts of training data. In this work, we propose Var-CNN, a website fingerprinting attack that leverages deep learning techniques along with novel insights specific to packet sequence classification. In open-world settings with large amounts of data, Var-CNN attains over 1% higher true positive rate (TPR) than state-of-the-art attacks while achieving 4× lower false positive rate (FPR). Var-CNN’s improvements are especially notable in low-data scenarios, where it reduces the FPR of prior art by 3.12% while increasing the TPR by 13%. Overall, insights used to develop Var-CNN can be applied to future deep learning based attacks, and substantially reduce the amount of training data needed to perform a successful website fingerprinting attack. This shortens the time needed for data collection and lowers the likelihood of having data staleness issues.

Download Full-text

Toward optimal fingerprint indexing for large scale genomics

10.1101/2021.11.04.467355 ◽

2021 ◽

Author(s):

Clement Agret ◽

Bastien Cazaux ◽

Antoine Limasset

Keyword(s):

Large Scale ◽

State Of The Art ◽

False Positive Rate ◽

Bacterial Genomes ◽

Genomic Databases ◽

Novel Structure ◽

Fingerprint Indexing ◽

Positive Rate ◽

Inverted Indexes ◽

User Friendly

Motivation: To keep up with the scale of genomic databases, several methods rely on local sensitive hashing methods to efficiently find potential matches within large genome collections. Existing solutions rely on Minhash or Hyperloglog fingerprints and require reading the whole index to perform a query. Such solutions can not be considered scalable with the growing amount of documents to index. Results: We present NIQKI, a novel structure using well-designed fingerprints that lead to theoretical and practical query time improvements, outperforming state-of-the-art by orders of magnitude. Our contribution is threefold. First, we generalize the concept of Hyperminhash fingerprints in (h,m)-HMH fingerprints that can be tuned to present the lowest false positive rate given the expected sub-sampling applied. Second, we provide a structure able to index any kind of fingerprints based on inverted indexes that provide optimal queries, namely linear with the size of the output. Third, we implemented these approaches in a tool dubbed NIQKI that can index and calculate pairwise distances for over one million bacterial genomes from GenBank in a matter of days on a small cluster. We show that our approach can be orders of magnitude faster than state-of-the-art with comparable precision. We believe that this approach can lead to tremendous improvement allowing fast query, scaling on extensive genomic databases. Availability and implementation: We wrote the NIQKI index as an open-source C++ library under the AGPL3 license available at https://github.com/Malfoy/ NIQKI. It is designed as a user-friendly tool and comes along with usage sample

Download Full-text

Matrix Completion of World Trade

10.21203/rs.3.rs-1030693/v1 ◽

2021 ◽

Author(s):

Giorgio Gnecco ◽

Federico Nutarelli ◽

Massimo Riccaboni

Keyword(s):

State Of The Art ◽

Symmetric Matrix ◽

Matrix Completion ◽

False Positive Rate ◽

Relative Advantage ◽

Binary Classifier ◽

Economic Complexity ◽

Machine Learning Methods ◽

The Matrix ◽

Positive Rate

Abstract This work applies Matrix Completion (MC) – a class of machine-learning methods commonly used in the context of recommendation systems – to analyze economic complexity. MC is applied to reconstruct the Revealed Comparative Advantage (RCA) matrix, whose elements express the relative advantage of countries in given classes of products, as evidenced by yearly trade flows. A high-accuracy binary classifier is derived from the MC application, with the aim of discriminating between elements of the RCA matrix that are, respectively, higher/lower than one. We introduce a novel Matrix cOmpletion iNdex of Economic complexitY (MONEY) based on MC, and related to the degree of predictability of the RCA entries of different countries (the lower the predictability, the higher the complexity). Differently from previously-developed economic complexity indices, MONEY takes into account several singular vectors of the matrix reconstructed by MC, whereas other indices are based only on one/two eigenvectors of a suitable symmetric matrix, derived from the RCA matrix. Finally, MC is compared with a state-of-the-art economic complexity index (GENEPY), showing that the false positive rate per country of a binary classifier constructed starting from the average entry-wise output of MC is a proxy of GENEPY.

Download Full-text

Preservation of Urine for Flow Cytometric and Visual Microscopic Testing

Clinical Chemistry ◽

10.1093/clinchem/48.6.900 ◽

2002 ◽

Vol 48 (6) ◽

pp. 900-905 ◽

Cited By ~ 18

Author(s):

Timo Kouri ◽

Lotta Vuotari ◽

Simo Pohjavaara ◽

Pekka Laippala

Keyword(s):

False Positive Rate ◽

Practical Importance ◽

Automated Analysis ◽

Test Strip ◽

Particle Analysis ◽

Flow Cytometric ◽

Contrast Microscopy ◽

Positive Rate ◽

Urine Specimens ◽

Supravital Staining

Abstract Background: Preservatives that could prevent destruction of cells, casts, and bacteria in urine are of great practical importance because they allow centralization and improvement of accuracy of urine particle counting. We compared two in-house mixtures and one commercial solution, as well as refrigeration, for their ability to preserve urine for both automated analysis (flow cytometry) and visual microscopy. Methods: Urine specimens were preserved by refrigeration at 4 °C without preservatives (procedure 1); in a lyophilized solution intended to preserve specimens for bacterial culture (Urine C&S tubes; BD Preanalytical Solutions; procedure 2); in 10 mL/L formalin–0.15 mol/L NaCl (procedure 3); in 80 mL/L ethanol–20 g/L polyethylene glycol (procedure 4); and by storage at 20 °C without preservatives (procedure 5). Test strip measurements were used to select specimens positive for leukocyte esterase, hemoglobin, albumin, or nitrite. For 106 consecutive strip-positive specimens, urinalysis was performed by UF-100TM (Sysmex) and by phase-contrast microscopy after Sternheimer supravital staining. Automated analysis was performed at arrival in the morning, on the same day in the afternoon, and after 1 and 3 days. Visual microscopy was performed at arrival and 3 days later. Results: Urine bacterial counts were well preserved with procedures 1–3, with a false-positive rate of 0.0–3.4% at day 3 vs 28% without preservation (procedure 5). Erythrocytes were poorly preserved for 3 days (κ coefficients, 0.24–0.61); after 1 day, fair preservation was seen with procedure 2 (κ = 0.78), compared with less favorable preservation with procedure 1 (κ = 0.61) or procedure 5 (κ = 0.66). Leukocytes were well preserved by all five procedures in the acidic adult urines investigated. Counts of casts and large epithelial cells were artifactually increased by procedure 3. Procedure 2 performed at least as well as refrigeration for specimens analyzed with visual microscopy. Conclusions: Urine specimens from adults can be stabilized at room temperature for both automated particle analysis and visual microscopy.

Download Full-text

IntaRNAhelix-composing RNA–RNA interactions from stable inter-molecular helices boosts bacterial sRNA target prediction

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720019400092 ◽

2019 ◽

Vol 17 (05) ◽

pp. 1940009 ◽

Cited By ~ 1

Author(s):

Rick Gelhausen ◽

Sebastian Will ◽

Ivo L. Hofacker ◽

Rolf Backofen ◽

Martin Raden

Keyword(s):

Prediction Accuracy ◽

Prediction Models ◽

State Of The Art ◽

False Positive Rate ◽

Dynamic Programming Algorithm ◽

Target Prediction ◽

Programming Algorithm ◽

Positive Rate ◽

Kinetic Effects ◽

Rna Interaction

Efficient computational tools for the identification of putative target RNAs regulated by prokaryotic sRNAs rely on thermodynamic models of RNA secondary structures. While they typically predict RNA–RNA interaction complexes accurately, they yield many highly-ranked false positives in target screens. One obvious source of this low specificity appears to be the disability of current secondary-structure-based models to reflect steric constraints, which nevertheless govern the kinetic formation of RNA–RNA interactions. For example, often — even thermodynamically favorable — extensions of short initial kissing hairpin interactions are kinetically prohibited, since this would require unwinding of intra-molecular helices as well as sterically impossible bending of the interaction helix. Another source is the consideration of instable and thus unlikely subinteractions that enable better scoring of longer interactions. In consequence, the efficient prediction methods that do not consider such effects show a high false positive rate. To increase the prediction accuracy we devise IntaRNAhelix, a dynamic programming algorithm that length-restricts the runs of consecutive inter-molecular base pairs (perfect canonical stackings), which we hypothesize to implicitly model the steric and kinetic effects. The novel method is implemented by extending the state-of-the-art tool IntaRNA. Our comprehensive bacterial sRNA target prediction benchmark demonstrates significant improvements of the prediction accuracy and enables more than 40-times faster computations. These results indicate — supporting our hypothesis — that stable helix composition increases the accuracy of interaction prediction models compared to the current state-of-the-art approach.

Download Full-text

A four-dimensional analysis of partitioned approximate filters

Proceedings of the VLDB Endowment ◽

10.14778/3476249.3476286 ◽

2021 ◽

Vol 14 (11) ◽

pp. 2355-2368

Author(s):

Tobias Schmidt ◽

Maximilian Bandle ◽

Jana Giceva

Keyword(s):

False Positive ◽

State Of The Art ◽

False Positive Rate ◽

Bloom Filters ◽

Use Case ◽

Positive Rate ◽

Key Dimensions ◽

The Many ◽

Data Disk ◽

The Impact

With today's data deluge, approximate filters are particularly attractive to avoid expensive operations like remote data/disk accesses. Among the many filter variants available, it is non-trivial to find the most suitable one and its optimal configuration for a specific use-case. We provide open-source implementations for the most relevant filters (Bloom, Cuckoo, Morton, and Xor filters) and compare them in four key dimensions: the false-positive rate, space consumption, build, and lookup throughput. We improve upon existing state-of-the-art implementations with a new optimization, radix partitioning, which boosts the build and lookup throughput for large filters by up to 9x and 5x. Our in-depth evaluation first studies the impact of all available optimizations separately before combining them to determine the optimal filter for specific use-cases. While register-blocked Bloom filters offer the highest throughput, the new Xor filters are best suited when optimizing for small filter sizes or low false-positive rates.

Download Full-text

Feasibility of using template-based and object-based automated detection methods for quantifying black and hybrid imported fire ant (Solenopsis invicta and S. invicta×richteri) mounds in aerial digital imagery

The Rangeland Journal ◽

10.1071/rj08007 ◽

2008 ◽

Vol 30 (3) ◽

pp. 291 ◽

Cited By ~ 3

Author(s):

James T. Vogt ◽

Bradley Wallet

Keyword(s):

False Positive ◽

False Positive Rate ◽

Bare Soil ◽

Automated Detection ◽

Probability Of Detection ◽

Fire Ant ◽

Detection Methods ◽

Object Based ◽

Positive Rate ◽

A Site

Imported fire ants construct earthen nests (mounds) that exhibit many characteristics which make them potentially good targets for remote sensing programs, including geographical orientation, topography, and bare soil surrounded by actively growing vegetation. Template-based features and object-based features extracted from aerial multispectral imagery of fire ant infested pastures were used to construct classifiers for automated fire ant mound detection. A classifier constructed using template-based features alone yielded a 79% probability of detection with a corresponding false positive rate of 9%. Addition of object-based features (compactness and symmetry) to the classifier yielded a 79% probability of detection with a corresponding false positive rate of 4%. Maintaining a 79% detection rate when applying the classifier to a second, unique pasture dataset with different seasonal and other environmental factors resulted in a false positive rate of 17.5%. Data demonstrate that automated detection of mounds with classifiers incorporating template- and object-based features is feasible, but it may be necessary to construct unique classifiers on a site-specific basis.

Download Full-text

BLATTA: Early Exploit Detection on Network Traffic with Recurrent Neural Networks

Security and Communication Networks ◽

10.1155/2020/8826038 ◽

2020 ◽

Vol 2020 ◽

pp. 1-15 ◽

Cited By ~ 1

Author(s):

Baskoro A. Pratomo ◽

Pete Burnap ◽

George Theodorakopoulos

Keyword(s):

Network Traffic ◽

Recurrent Neural Networks ◽

State Of The Art ◽

False Positive Rate ◽

Warning System ◽

Deep Packet Inspection ◽

Application Layer ◽

Detection Mechanism ◽

Positive Rate ◽

Packet Inspection

Detecting exploits is crucial since the effect of undetected ones can be devastating. Identifying their presence on the network allows us to respond and block their malicious payload before they cause damage to the system. Inspecting the payload of network traffic may offer better performance in detecting exploits as they tend to hide their presence and behave similarly to legitimate traffic. Previous works on deep packet inspection for detecting malicious traffic regularly read the full length of application layer messages. As the length varies, longer messages will take more time to analyse, during which time the attack creates a disruptive impact on the system. Hence, we propose a novel early exploit detection mechanism that scans network traffic, reading only 35.21% of application layer messages to predict malicious traffic while retaining a 97.57% detection rate and a 1.93% false positive rate. Our recurrent neural network- (RNN-) based model is the first work to our knowledge that provides early prediction of malicious application layer messages, thus detecting a potential attack earlier than other state-of-the-art approaches and enabling a form of early warning system.

Download Full-text