High-Quality and Low-Memory-Footprint Progressive Decoding of Large-Scale Particle Data

Author(s):  
Duong Hoang
Harsh Bhatia
Peter Lindstrom
Valerio Pascucci
2021
Vol 15 (5)
pp. 1-52

Author(s):  
Lorenzo De Stefani
Erisa Terolli
Eli Upfal

We introduce Tiered Sampling, a novel technique for estimating the count of sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass over the data and uses a memory of fixed size M, which can be orders of magnitude smaller than the number of edges. Our methods address the challenging task of counting sparse motifs—sub-graph patterns—that have a low probability of appearing in a sample of M edges, the maximum amount of data available to the algorithms at each step. To obtain an unbiased, low-variance estimate of the count, we partition the available memory into tiers (layers) of reservoir samples. While the base layer is a standard reservoir sample of edges, the other layers are reservoir samples of sub-structures of the desired motif. By storing the more frequent sub-structures of the motif, we increase the probability of detecting an occurrence of the sparse motif being counted, thus decreasing the variance and error of the estimate. While we focus on the design and analysis of algorithms for counting 4-cliques, we present a method that generalizes Tiered Sampling to obtain high-quality estimates of the number of occurrences of any sub-graph of interest, while reducing the analysis effort by exploiting specific properties of the pattern of interest. We present a complete analytical study and an extensive experimental evaluation of our proposed method using both synthetic and real-world data. Our results demonstrate the advantage of our method in obtaining high-quality approximations of the number of 4- and 5-cliques in large graphs using a very limited amount of memory, significantly outperforming the single-edge-sample approach for counting sparse motifs in large-scale graphs.
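The tiering idea is easy to sketch in code. Below is a minimal, illustrative Python sketch of a two-tier estimator, shown for triangles (a base tier of edges plus a second tier of wedges, i.e. two-edge paths) rather than the paper's 4-clique algorithms; the probability rescaling is a simplified stand-in for the paper's exact unbiased estimator, and all names are hypothetical.

```python
import random

class TieredTriangleSketch:
    """Two-tier reservoir sketch of the Tiered Sampling idea.

    Illustrative only: the paper targets 4-cliques and derives exact
    unbiased estimators; here the rescaling below is a simplified
    approximation of the inclusion probabilities.
    """

    def __init__(self, m_edges, m_wedges, seed=0):
        self.rng = random.Random(seed)
        self.m1, self.m2 = m_edges, m_wedges
        self.edges, self.wedges = [], []  # tier 1, tier 2 reservoirs
        self.t = 0          # edges seen on the stream
        self.w_seen = 0     # wedges offered to tier 2
        self.estimate = 0.0

    def _insert(self, res, cap, seen, item):
        # Standard reservoir sampling: keep each offered item w.p. cap/seen.
        if len(res) < cap:
            res.append(item)
        elif self.rng.randrange(seen) < cap:
            res[self.rng.randrange(cap)] = item

    def add_edge(self, u, v):
        self.t += 1
        # 1) Close stored wedges into triangles; rescale the count by an
        #    approximate probability of having observed the wedge at all.
        p_edge = min(1.0, self.m1 / max(1, self.t - 1))
        p_wedge = min(1.0, self.m2 / max(1, self.w_seen))
        for end1, _center, end2 in self.wedges:
            if {end1, end2} == {u, v}:
                self.estimate += 1.0 / (p_edge * p_wedge)
        # 2) Combine the new edge with sampled edges to form new wedges.
        for a, b in self.edges:
            shared = {u, v} & {a, b}
            if len(shared) == 1:
                center = shared.pop()
                e1, e2 = sorted(({u, v} | {a, b}) - {center})
                self.w_seen += 1
                self._insert(self.wedges, self.m2, self.w_seen, (e1, center, e2))
        # 3) Tier-1 reservoir update with the new edge.
        self._insert(self.edges, self.m1, self.t, (u, v))
```

Feeding the stream once through add_edge leaves a rough running triangle estimate in self.estimate; the paper's analysis replaces the heuristic p_edge * p_wedge term with exact inclusion probabilities for the 4-clique tiers.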


Geosciences
2021
Vol 11 (4)
pp. 174
Author(s):  
Marco Emanuele Discenza
Carlo Esposito
Goro Komatsu
Enrico Miccadei

The availability of high-quality surface data acquired by recent Mars missions and the development of increasingly accurate methods of analysis have made it possible to identify, describe, and analyze many geological and geomorphological processes previously unknown or unstudied on Mars. Among these, the slow, large-scale slope deformational phenomena generally known as Deep-Seated Gravitational Slope Deformations (DSGSDs) are of particular interest. Since the early 2000s, several studies have been conducted to identify and analyze Martian large-scale gravitational processes. As on Earth, these phenomena apparently occur under diverse morpho-structural conditions on Mars. Nevertheless, the difficulty of directly studying the geological, structural, and geomorphological characteristics of the planet makes the analysis of these phenomena particularly complex, leaving numerous questions unanswered. This paper presents a synthesis of all studies on large-scale deformational processes on Mars known to date, in order to provide a complete and exhaustive picture of these phenomena. Following this synthesis of the literature, the specific characteristics of the phenomena are analyzed and the main remaining open issues are described.


2020
Vol 8 (Suppl 3)
pp. A62-A62
Author(s):  
Dattatreya Mellacheruvu
Rachel Pyke
Charles Abbott
Nick Phillips
Sejal Desai
...  

Background: Accurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model, based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms, has strong performance. We have extended this work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions have led to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies.

Methods: In-house immunopeptidomic data were generated using stably transfected, HLA-null K562 cell lines that each express a single HLA allele of interest, followed by immunoprecipitation with the W6/32 antibody and LC-MS/MS. Public immunopeptidomics data were downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at a 1% false discovery rate. Other metrics (features) were either extracted from the source data or generated internally by re-processing samples on the ImmunoID NeXT Platform.

Results: We generated large-scale, high-quality immunopeptidomics data from approximately 60 mono-allelic cell lines, which unambiguously assign peptides to their presenting alleles, to create our primary models. Briefly, our primary 'binding' model captures MHC-peptide binding using peptide and binding-pocket features, while our primary 'presentation' model uses additional features to model antigen processing and presentation. Both primary models achieve significantly higher precision across all recall values on multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve performance, we expanded the diversity of our training set with high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data were integrated by resolving peptide-to-allele mappings with our primary models. We then trained a new model on the expanded training data using a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples.

Conclusions: Improving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance than a state-of-the-art public algorithm and furthers this objective.
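As context for the 1% false-discovery-rate filtering mentioned in the Methods, here is a minimal sketch of the standard target-decoy procedure commonly used to filter peptide-spectrum matches; it is not Personalis's in-house pipeline, and the function name and inputs are hypothetical.

```python
def filter_psms_at_fdr(psms, fdr=0.01):
    """Keep target peptide-spectrum matches at an estimated FDR threshold.

    psms: iterable of (score, is_decoy) pairs, higher score = better match.
    Target-decoy estimation: at a given score cutoff, FDR is approximated
    by (#decoy hits above cutoff) / (#target hits above cutoff).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    decoys = targets = keep = 0
    for i, (_score, is_decoy) in enumerate(ranked, start=1):
        decoys += is_decoy
        targets += not is_decoy
        if targets and decoys / targets <= fdr:
            keep = i  # deepest cutoff still meeting the FDR bound
    return [p for p in ranked[:keep] if not p[1]]
```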


Toxins
2021
Vol 13 (6)
pp. 420
Author(s):  
Yi Ma
Liu Cui
Meng Wang
Qiuli Sun
Kaisheng Liu
...  

Bacterial ghosts (BGs) are empty cell envelopes that retain native extracellular structures but lack cytoplasm and genetic material. BGs hold significant promise in biomedical research as vaccines or delivery carriers. Their applications are often limited by inefficient bacterial lysis and low yield. To address these problems, we compared the lysis efficiency of the wild-type protein E (EW) from phage ΦX174 with that of a screened mutant protein E (EM) in the Escherichia coli BL21(DE3) strain. The results show that lysis mediated by protein EM was more efficient. Introducing the pLysS plasmid allowed nearly 100% lysis efficiency at initial cell densities as high as OD600 = 2.0, exceeding what commonly used BG preparation methods achieve. Western blot and immunofluorescence analyses indicate that the expression level of protein EM was significantly higher than that obtained without the pLysS plasmid. High-quality BGs were observed by SEM and TEM. To verify the applicability of this method to other bacteria, the T7 RNA polymerase expression system was successfully constructed in Salmonella enterica (SE). A pET vector carrying EM and the pLysS plasmid were introduced to obtain high-quality SE ghosts, which could provide effective protection for humans and animals. This paper describes, for the first time, a novel and generally applicable method for producing high-quality BGs on a large scale.


Author(s):  
Christopher Pagano
Flavia Tauro
Salvatore Grimaldi
Maurizio Porfiri

Large-scale particle image velocimetry (LSPIV) is a nonintrusive environmental monitoring methodology that allows for continuous characterization of surface flows in natural catchments. Despite its promise, the implementation of LSPIV in natural environments has been limited to areas accessible to human operators. In this work, we propose a novel experimental configuration that allows for unsupervised LSPIV over large water bodies. Specifically, we design, develop, and characterize a lightweight, low-cost, and stable quadricopter hosting a digital acquisition system. An active gimbal maintains the camera lens orthogonal to the water surface, thus preventing severe image distortions. Field experiments are performed to characterize the vehicle and assess the feasibility of the approach. We demonstrate that the quadricopter can hover above a 1 × 1 m² area for 4-5 minutes with a payload of 500 g. Further, LSPIV measurements on a natural stream confirm that the methodology can be reliably used for surface flow studies.
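The core LSPIV computation is a windowed cross-correlation between consecutive frames. The sketch below shows the standard FFT-based displacement estimate for a single interrogation window; it is a generic PIV step under stated assumptions, not the authors' exact processing chain.

```python
import numpy as np

def piv_displacement(window_a, window_b):
    """Estimate the integer-pixel displacement between two interrogation
    windows (equal-shape 2-D float arrays) by locating the peak of their
    FFT-based cross-correlation."""
    a = window_a - window_a.mean()
    b = window_b - window_b.mean()
    # Cross-correlation via the convolution theorem (circular).
    corr = np.fft.irfft2(np.fft.rfft2(a).conj() * np.fft.rfft2(b), s=a.shape)
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Indices past half the window wrap around to negative shifts.
    dy, dx = (p - s if p > s // 2 else p for p, s in zip(peak, a.shape))
    return dx, dy  # pixels; scale by ground resolution / frame interval
```

Applied over a grid of windows on two consecutive frames, this yields the surface velocity field; sub-pixel refinement (e.g. Gaussian peak fitting) is typically added on top, and the gimbal keeping the lens orthogonal to the surface fixes the pixel-to-metre scale.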


2006
Vol 19 (11)
pp. 1118-1123
Author(s):  
T Zilbauer
P Berberich
A Lümkemann
K Numssen
T Wassner
...  

2017
Vol 814
pp. 592-613
Author(s):  
Andras Nemes
Teja Dasari
Jiarong Hong
Michele Guala
Filippo Coletti

We report on optical field measurements of snow settling in atmospheric turbulence at $Re_\lambda = 940$. It is found that the snowflakes exhibit hallmark features of inertial particles in turbulence. The snow motion is analysed in both Eulerian and Lagrangian frameworks by large-scale particle imaging, while sonic anemometry is used to characterize the flow field. Additionally, the snowflake size and morphology are assessed by digital in-line holography. The low volume fraction and mass loading imply a one-way interaction with the turbulent air. Acceleration probability density functions show wide exponential tails consistent with laboratory and numerical studies of homogeneous isotropic turbulence. Invoking the assumption that the particle acceleration has a stronger dependence on the Stokes number than on the specific features of the turbulence (e.g. precise Reynolds number and large-scale anisotropy), we make inferences on the snowflakes' aerodynamic response time. In particular, we observe that their acceleration distribution is consistent with that of particles of Stokes number in the range $St = 0.1$–$0.4$ based on the Kolmogorov time scale. The still-air terminal velocities estimated for the resulting range of aerodynamic response times are significantly smaller than the measured snow particle fall speed. This is interpreted as a manifestation of settling enhancement by turbulence, which is observed here for the first time in a natural setting.
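To make the inference chain concrete: given the Kolmogorov time scale $\tau_\eta$ of the flow, the Stokes-number range maps to an aerodynamic response time $\tau_p = St\,\tau_\eta$ and, under Stokes drag, a still-air terminal velocity $v_t = \tau_p g$. The $\tau_\eta$ value below is hypothetical (the abstract does not report it); the point is only that the resulting $v_t$ is far below typical measured snowflake fall speeds of order 1 m/s, which is the settling-enhancement signature described above.

```python
# Back-of-the-envelope chain from the abstract, with an assumed
# Kolmogorov time scale (illustrative value, not from the paper).
g = 9.81        # gravitational acceleration, m/s^2
tau_eta = 0.01  # hypothetical Kolmogorov time scale, s

for St in (0.1, 0.4):
    tau_p = St * tau_eta  # aerodynamic response time, tau_p = St * tau_eta
    v_t = tau_p * g       # still-air terminal velocity under Stokes drag
    print(f"St = {St}: tau_p = {tau_p * 1e3:.0f} ms, v_t = {v_t:.3f} m/s")
```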

