Timely Reporting of Heavy Hitters Using External Memory

2021 ◽  
Vol 46 (4) ◽  
pp. 1-35
Author(s):  
Shikha Singh ◽  
Prashant Pandey ◽  
Michael A. Bender ◽  
Jonathan W. Berry ◽  
Martín Farach-Colton ◽  
...  

Given an input stream S of size N, a ɸ-heavy hitter is an item that occurs at least ɸN times in S. The problem of finding heavy hitters is extensively studied in the database literature. We study a real-time heavy-hitters variant in which an element must be reported shortly after we see its T = ɸN-th occurrence (and hence it becomes a heavy hitter). We call this the Timely Event Detection (TED) Problem. The TED problem models the needs of many real-world monitoring systems, which demand accurate (i.e., no false negatives) and timely reporting of all events from large, high-speed streams with a low reporting threshold (high sensitivity). Like the classic heavy-hitters problem, solving the TED problem without false positives requires large space (Ω(N) words). Thus, in-RAM heavy-hitters algorithms typically sacrifice accuracy (i.e., allow false positives), sensitivity, or timeliness (i.e., use multiple passes). We show how to adapt heavy-hitters algorithms to external memory to solve the TED problem on large, high-speed streams while guaranteeing accuracy, sensitivity, and timeliness. Our data structures are limited only by I/O bandwidth (not latency) and support a tunable tradeoff between reporting delay and I/O overhead. With a small bounded reporting delay, our algorithms incur only a logarithmic I/O overhead. We implement and validate our data structures empirically using the Firehose streaming benchmark. Multi-threaded versions of our structures can scale to process 11M observations per second before becoming CPU bound. In comparison, a naive adaptation of the standard heavy-hitters algorithm to external memory would be limited by the storage device’s random I/O throughput, i.e., ≈100K observations per second.
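
The reporting condition itself is simple to state in code. Below is a minimal in-RAM Python sketch of timely reporting with exact counts, which is exactly the bookkeeping that costs Ω(N) words of space; the paper's actual contribution, doing this in external memory with bounded reporting delay, is not reproduced here.

```python
import math
from collections import defaultdict

def timely_heavy_hitters(stream, phi, n):
    """Yield each item as soon as its count reaches T = phi*N.

    Exact counting gives accuracy (no false positives or negatives) and
    timeliness, but may need Omega(N) words of memory, which is why the
    paper moves this bookkeeping to external memory.
    """
    threshold = math.ceil(phi * n)      # the T = phi*N-th occurrence
    counts = defaultdict(int)
    for item in stream:
        counts[item] += 1
        if counts[item] == threshold:   # report immediately: zero delay
            yield item

# Example: phi = 0.3 over a stream of length 10, so T = 3
print(list(timely_heavy_hitters("aababcabcc", phi=0.3, n=10)))  # ['a', 'b', 'c']
```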

2020 ◽  
Vol 2020 (14) ◽  
pp. 378-1-378-7
Author(s):  
Tyler Nuanes ◽  
Matt Elsey ◽  
Radek Grzeszczuk ◽  
John Paul Shen

We present a high-quality sky segmentation model for depth refinement and investigate residual-architecture performance to inform optimal shrinking of the network. We describe a model that runs in near real-time on a mobile device, present a new, high-quality dataset, and detail a unique weighting to trade off false positives and false negatives in binary classifiers. We show how the optimizations improve bokeh rendering by correcting stereo depth mispredictions in sky regions. We detail techniques used to preserve edges, reject false positives, and ensure generalization to the diversity of sky scenes. Finally, we present a compact model and compare the performance of four popular residual architectures (ShuffleNet, MobileNetV2, ResNet-101, and ResNet-34-like) at constant computational cost.
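
The abstract does not spell out the weighting scheme. As a generic illustration of trading off false positives against false negatives in a binary classifier, a class-weighted cross-entropy loss might look like the Python sketch below; the names fn_weight and fp_weight are hypothetical, not the paper's.

```python
import numpy as np

def weighted_bce(y_true, p_pred, fn_weight=2.0, fp_weight=1.0, eps=1e-7):
    """Binary cross-entropy with separate penalties for the two error types.

    fn_weight scales the loss on positive (sky) pixels the model misses;
    fp_weight scales the loss on negative pixels wrongly labeled sky.
    This is a generic illustration, not the paper's specific weighting.
    """
    p = np.clip(p_pred, eps, 1 - eps)
    loss = -(fn_weight * y_true * np.log(p)
             + fp_weight * (1 - y_true) * np.log(1 - p))
    return loss.mean()

# Raising fn_weight pushes the classifier toward higher recall (fewer
# false negatives) at the cost of more false positives, and vice versa.
y = np.array([1.0, 1.0, 0.0, 0.0])
p = np.array([0.9, 0.4, 0.2, 0.6])
print(weighted_bce(y, p, fn_weight=2.0, fp_weight=1.0))
```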


2019 ◽  
Author(s):  
Zachary VanAernum ◽  
Florian Busch ◽  
Benjamin J. Jones ◽  
Mengxuan Jia ◽  
Zibo Chen ◽  
...  

It is important to assess the identity and purity of proteins and protein complexes during and after protein purification to ensure that samples are of sufficient quality for further biochemical and structural characterization, as well as for use in consumer products, chemical processes, and therapeutics. Native mass spectrometry (nMS) has become an important tool in protein analysis due to its ability to retain non-covalent interactions during measurements, making it possible to obtain protein structural information with high sensitivity and at high speed. Interference from the presence of non-volatiles is typically alleviated by offline buffer exchange, which is time-consuming and difficult to automate. We provide a protocol for rapid online buffer exchange (OBE) nMS to directly screen structural features of pre-purified proteins, protein complexes, or clarified cell lysates. Information obtained by OBE nMS can be used for fast (<5 min) quality control and can further guide protein expression and purification optimization.


2020 ◽  
Author(s):  
Stuart Yeates

A brief introduction to acronyms is given and motivation for extracting them in a digital library environment is discussed. A technique for extracting acronyms is given, with an analysis of the results. The technique is found to have a low number of false negatives and a high number of false positives.

Introduction
Digital library research seeks to build tools that enable access to content while making as few assumptions about the content as possible, since assumptions limit the range of applicability of the tools. Generally, the broader the assumptions, the more widely applicable the tools. For example, keyword-based indexing [5] is based on communications theory and applies to all natural human textual languages (allowances for differences in character sets and similar localisation issues notwithstanding). The algorithm described in this paper makes much stronger assumptions about the content. It assumes textual content that contains acronyms, an assumption which is known to hold for...
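
The excerpt does not reproduce the extraction technique itself. A much-simplified Python sketch of one common heuristic, matching a parenthesised candidate acronym against the initial letters of the preceding words, is shown below; the paper's actual algorithm and its error profile differ.

```python
import re

def extract_acronyms(text):
    """Find parenthesised acronyms and check them against preceding words.

    A simplified illustration of initial-letter matching, not the
    paper's algorithm: it will miss acronyms defined any other way.
    """
    found = {}
    for match in re.finditer(r'\(([A-Z]{2,})\)', text):
        acronym = match.group(1)
        # Take the words immediately before the parenthesis as candidates.
        preceding = re.findall(r'\w+', text[:match.start()])
        window = preceding[-len(acronym):]
        initials = ''.join(w[0] for w in window).upper()
        if initials == acronym:
            found[acronym] = ' '.join(window)
    return found

print(extract_acronyms("The Timely Event Detection (TED) problem ..."))
# {'TED': 'Timely Event Detection'}
```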


Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1399
Author(s):  
Taepyeong Kim ◽  
Sangun Park ◽  
Yongbeom Cho

In this study, a simple and effective memory system required for the implementation of an AI chip is proposed. The use of internal or external memory is essential when implementing an AI chip, because memory reads and writes occur very frequently. Currently used memory systems are large in design size and complex to implement because they must handle high speeds and wide bandwidths. Therefore, depending on the AI application, the circuit size of the memory system can exceed that of the AI core. In this study, we used SDRAM, which has lower performance than currently used memory systems but is sufficient to run AI workloads, and implemented all circuits digitally for a simple and efficient implementation. In particular, a delay controller was designed to reduce errors due to data skew on the memory bus, ensuring stability when reading and writing data. We first verified the memory system on an FPGA using the You Only Look Once (YOLO) algorithm, confirming that the proposed memory system works efficiently for AI. Based on the verified memory system, we implemented a chip using Samsung Electronics’ 65 nm process and tested it. As a result, we designed a simple and efficient memory system for AI chip implementation and verified it in hardware.
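
As a rough illustration of what such a delay controller does, the hypothetical Python sketch below scans delay taps on one data line and settles on the centre of the window that reads a known training pattern correctly. The actual design is digital hardware; sample_line is an assumed stand-in for a per-line read at a given tap.

```python
def calibrate_delay_taps(sample_line, taps=range(16), pattern=0b10100101):
    """Pick a per-data-line delay tap that compensates bus skew.

    Hypothetical illustration of the idea behind a delay controller:
    scan the available delay taps, record which ones read a known
    training pattern correctly, and settle on the centre of the passing
    window for maximum timing margin.
    """
    passing = [tap for tap in taps if sample_line(tap) == pattern]
    if not passing:
        raise RuntimeError("no tap reads the training pattern reliably")
    return passing[len(passing) // 2]   # centre of the passing window

# Toy stand-in for hardware: taps 5..10 happen to sample the eye correctly.
print(calibrate_delay_taps(lambda t: 0b10100101 if 5 <= t <= 10 else 0))
# -> 8 (centre of the 5..10 window)
```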



Author(s):  
Jonas Wallström ◽  
Kjell Geterud ◽  
Kimia Kohestani ◽  
Stephan E. Maier ◽  
Marianne Månsson ◽  
...  

Abstract

Objectives: The PIRADS Steering Committee has called for “higher quality data before making evidence-based recommendations on MRI without contrast enhancement as an initial diagnostic work up,” while recognizing biparametric (bp) MRI as a reasonable option in a low-risk setting such as screening. With bpMRI, more men can undergo MRI at a lower cost, and they can be spared the invasiveness of intravenous access. The aim of this study was to assess cancer detection with bpMRI vs mpMRI in sequential screening for prostate cancer (PCa).

Methods: Within the ongoing Göteborg PCa screening 2 trial, we assessed cancer detection in 551 consecutive participants undergoing prostate MRI. In the same session, readers first assessed bpMRI and then mpMRI. Four targeted biopsies were performed for lesions scored PIRADS 3–5 with bpMRI and/or mpMRI.

Results: Cancer was detected in 84/551 cases (15.2%; 95% CI: 12.4–18.4%) with mpMRI and in 83/551 cases (15.1%; 95% CI: 12.3–18.2%) with bpMRI. The relative risk (RR) for cancer detection with bpMRI compared to mpMRI was 0.99 (95% one-sided CI: > 0.948); bpMRI was non-inferior to mpMRI (10% non-inferiority margin). bpMRI resulted in fewer false positives, 45/128 (35.2%), compared to mpMRI, 52/136 (38.2%); RR = 0.92; 95% CI: 0.84–0.98. Of 8 lesions scored positive only with mpMRI, 7 were false positives. The PPV for MRI and targeted biopsy was 83/128 (64.8%) for bpMRI and 84/136 (61.8%) for mpMRI; RR = 1.05, 95% CI: 1.01–1.10.

Conclusions: In a PSA-screened population, bpMRI was non-inferior to mpMRI for cancer detection and resulted in fewer false positives.

Key Points
• In screening for prostate cancer with PSA followed by MRI, biparametric MRI allows radiologists to detect an almost similar number of prostate cancers while scoring fewer false-positive lesions compared to multiparametric MRI.
• In a screening program, high sensitivity should be weighed against cost and risks for healthy men; a large number of men can be spared exposure to gadolinium contrast medium by adopting biparametric MRI, while at the same time allowing for a higher turnover in the MRI room.
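
As a quick arithmetic check, the headline proportions in the Results follow directly from the reported counts:

```python
# Reproducing the abstract's proportions from the raw counts it reports.
detected_mp, detected_bp, n = 84, 83, 551   # cancers detected per arm
pos_mp, pos_bp = 136, 128                   # cases scored positive (PIRADS 3-5)
fp_mp, fp_bp = 52, 45                       # false positives among those

print(f"Detection: mpMRI {detected_mp/n:.1%}, bpMRI {detected_bp/n:.1%}")      # 15.2%, 15.1%
print(f"RR for detection (bp vs mp): {detected_bp/detected_mp:.2f}")           # 0.99
print(f"False-positive share: bp {fp_bp/pos_bp:.1%}, mp {fp_mp/pos_mp:.1%}")   # 35.2%, 38.2%
print(f"PPV: bpMRI {detected_bp/pos_bp:.1%}, mpMRI {detected_mp/pos_mp:.1%}")  # 64.8%, 61.8%
```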

