Tiered Sampling

2021 ◽  
Vol 15 (5) ◽  
pp. 1-52
Author(s):  
Lorenzo De Stefani ◽  
Erisa Terolli ◽  
Eli Upfal

We introduce Tiered Sampling, a novel technique for estimating the count of sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass over the data and uses a memory of fixed size M, which can be orders of magnitude smaller than the number of edges. Our methods address the challenging task of counting sparse motifs (sub-graph patterns) that have a low probability of appearing in a sample of M edges of the graph, which is the maximum amount of data available to the algorithms at each step. To obtain an unbiased, low-variance estimate of the count, we partition the available memory into tiers (layers) of reservoir samples. While the base layer is a standard reservoir sample of edges, the other layers are reservoir samples of sub-structures of the desired motif. By storing more frequent sub-structures of the motif, we increase the probability of detecting an occurrence of the sparse motif being counted, thus decreasing the variance and error of the estimate. While we focus on the design and analysis of algorithms for counting 4-cliques, we present a method that generalizes Tiered Sampling to obtain high-quality estimates of the number of occurrences of any sub-graph of interest, while reducing the analysis effort required by the specific properties of each pattern. We present a complete theoretical analysis and an extensive experimental evaluation of the proposed method on both synthetic and real-world data. Our results demonstrate the advantage of our method in obtaining high-quality approximations of the number of 4- and 5-cliques in large graphs using a very limited amount of memory, significantly outperforming the single edge sample approach for counting sparse motifs in large-scale graphs.
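For reference, the following is a minimal Python sketch of the kind of single edge sample estimator the abstract compares against, written for triangles rather than 4-cliques to keep it short. It is a swapped-in technique in the style of TRIÈST-type estimators, not Tiered Sampling itself: a single reservoir holds at most M edges, and each triangle closed by an arriving edge is weighted by the inverse probability that its other two edges are both still in the reservoir. The class and method names are hypothetical, and the stream is assumed to be a simple undirected graph without duplicate edges or self-loops.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[int, int]


class SingleSampleTriangleEstimator:
    """One-pass triangle-count estimate from a reservoir of at most M edges.

    Hypothetical sketch: assumes a simple undirected graph streamed without
    duplicate edges or self-loops.
    """

    def __init__(self, M: int, seed: Optional[int] = None) -> None:
        if M < 2:
            raise ValueError("M must be at least 2")
        self.M = M
        self.t = 0                                         # edges seen so far
        self.sample: List[Edge] = []                       # reservoir of edges
        self.adj: Dict[int, Set[int]] = defaultdict(set)   # adjacency of sampled edges
        self.estimate = 0.0
        self.rng = random.Random(seed)

    def _weight(self) -> float:
        # 1 / P(two given earlier edges are both in the reservoir) when edge t arrives.
        t, M = self.t, self.M
        return max(1.0, (t - 1) * (t - 2) / (M * (M - 1)))

    def _insert(self, u: int, v: int) -> None:
        self.sample.append((u, v))
        self.adj[u].add(v)
        self.adj[v].add(u)

    def _evict(self, idx: int) -> None:
        u, v = self.sample[idx]
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        self.sample[idx] = self.sample[-1]   # swap-remove keeps the list compact
        self.sample.pop()

    def process(self, u: int, v: int) -> None:
        self.t += 1
        # Triangles closed by (u, v) whose other two edges are in the sample,
        # each weighted by the inverse probability of having kept both edges.
        closed = len(self.adj[u] & self.adj[v])
        self.estimate += closed * self._weight()
        # Standard reservoir sampling: keep edge t with probability M / t,
        # evicting a uniformly random sampled edge when the reservoir is full.
        if self.t <= self.M:
            self._insert(u, v)
        elif self.rng.random() < self.M / self.t:
            self._evict(self.rng.randrange(self.M))
            self._insert(u, v)
```

When M is at least the number of edges, nothing is evicted and the count is exact (a streamed 4-clique yields an estimate of 4.0). Tiered Sampling, as described above, adds further reservoirs of motif sub-structures on top of such an edge reservoir; those layers are not modeled in this sketch.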


2015 ◽  
Vol 25 (2) ◽  
pp. 281-293 ◽  
Author(s):  
Miloš Kudělka ◽  
Šárka Zehnalová ◽  
Zdeněk Horák ◽  
Pavel Krömer ◽  
Václav Snášel

Many real-world data and processes have a network structure and can usefully be represented as graphs. Network analysis focuses on the relations among nodes in order to explore the properties of each network. We introduce a method for measuring the strength of the relationship between two nodes of a network and for ranking nodes. This method is applicable to all kinds of networks, including directed and weighted networks. The approach extracts dependency relations among the network’s nodes from the structure of the local surroundings of individual nodes. For the tasks we deal with in this article, the key technical parameter is locality: since only the surroundings of the examined nodes are used in computations, there is no need to analyze the entire network. This allows our approach to be applied to large-scale networks. We present several experiments on small networks as well as on large-scale artificial and real-world networks. The results show high effectiveness owing to the locality of our approach, as well as high-quality node ranking comparable to PageRank.
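The abstract does not give the dependency formula, so the following Python sketch only illustrates the locality principle under a hypothetical stand-in measure, not the authors' method. The assumed strength of node a's relation to node b is the share of a's total edge weight that goes to b directly or to neighbors that a and b have in common. It is asymmetric (so it also makes sense for directed graphs) and reads only the neighborhoods of a and b. The names `local_strength` and `rank_neighbors` are hypothetical.

```python
from typing import Dict, List

# Weighted graph as an adjacency map: node -> {neighbor: weight}.
# For a directed graph, store out-neighbors only.
Graph = Dict[str, Dict[str, float]]


def local_strength(g: Graph, a: str, b: str) -> float:
    """Hypothetical asymmetric strength of a's relation to b.

    Only g[a] and g[b] are read, so the cost depends on the two
    neighborhoods, not on the size of the whole network.
    """
    out_a = g.get(a, {})
    total = sum(out_a.values())
    if total == 0:
        return 0.0
    direct = out_a.get(b, 0.0)
    shared = (set(out_a) & set(g.get(b, {}))) - {a, b}
    # Direct tie plus a's ties to neighbors that a and b have in common.
    return (direct + sum(out_a[v] for v in shared)) / total


def rank_neighbors(g: Graph, a: str) -> List[str]:
    """Rank a's neighbors by decreasing hypothetical strength."""
    return sorted(g.get(a, {}), key=lambda b: local_strength(g, a, b), reverse=True)
```

With `g = {"a": {"b": 1, "c": 1, "d": 3}, "b": {"a": 1, "c": 1}, "c": {"a": 1, "b": 1}, "d": {"a": 3}}`, the strength of a toward b is 0.4 while that of b toward a is 1.0, showing the asymmetry a dependency-style measure can capture.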



Author(s):  
Miguel Ángel Hernández-Rodríguez ◽  
Ermengol Sempere-Verdú ◽  
Caterina Vicens-Caldentey ◽  
Francisca González-Rubio ◽  
Félix Miguel-García ◽  
...  

We aimed to identify and compare medication profiles in populations with polypharmacy between 2005 and 2015. We conducted a cross-sectional study using information from the Computerized Database for Pharmacoepidemiologic Studies in Primary Care (BIFAP, Spain). We estimated the prevalence of therapeutic subgroups, classified at level 4 of the Anatomical Therapeutic Chemical classification system, in all individuals aged 15 years and older with polypharmacy (≥5 drugs during ≥6 months), by sex and age group, for both calendar years. The most prescribed drugs were proton-pump inhibitors (PPIs), statins, antiplatelet agents, benzodiazepine derivatives, and angiotensin-converting enzyme inhibitors. The greatest increases between 2005 and 2015 were observed in PPIs, statins, other antidepressants, and β-blockers, while the prevalence of antiepileptics almost tripled. We observed increases in psychotropic drugs in women and in cardiovascular medications in men. By age group, there were notable increases in antipsychotics, antidepressants, and antiepileptics (15–44 years); antidepressants, PPIs, and selective β-blockers (45–64 years); selective β-blockers, biguanides, PPIs, and statins (65–79 years); and statins, selective β-blockers, and PPIs (80 years and older). Our results revealed important increases in the use of specific therapeutic subgroups, such as PPIs, statins, and psychotropic drugs, highlighting opportunities to design and implement strategies to analyze the appropriateness of such prescriptions.



Geosciences ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 174
Author(s):  
Marco Emanuele Discenza ◽  
Carlo Esposito ◽  
Goro Komatsu ◽  
Enrico Miccadei

The availability of high-quality surface data acquired by recent Mars missions and the development of increasingly accurate analysis methods have made it possible to identify, describe, and analyze many geological and geomorphological processes previously unknown or unstudied on Mars. Among these, slow, large-scale slope deformation phenomena, generally known as Deep-Seated Gravitational Slope Deformations (DSGSDs), are of particular interest. Since the early 2000s, several studies have been conducted to identify and analyze Martian large-scale gravitational processes. As on Earth, these phenomena apparently occur in diverse morpho-structural conditions on Mars. Nevertheless, the difficulty of directly studying the geological, structural, and geomorphological characteristics of the planet makes the analysis of these phenomena particularly complex, leaving numerous questions unanswered. This paper presents a synthesis of all known studies of large-scale deformational processes on Mars to date, in order to provide a complete and exhaustive picture of the phenomena. Following this synthesis of the literature, the specific characteristics of the phenomena are analyzed, and the main remaining open issues are described.



Author(s):  
Haotian Yang ◽  
Hao Zhu ◽  
Yanru Wang ◽  
Mingkai Huang ◽  
Qiu Shen ◽  
...  


2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A62-A62
Author(s):  
Dattatreya Mellacheruvu ◽  
Rachel Pyke ◽  
Charles Abbott ◽  
Nick Phillips ◽  
Sejal Desai ◽  
...  

Background: Accurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms has strong performance. We have extended this work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions led to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies.

Methods: In-house immunopeptidomics data were generated using stably transfected HLA-null K562 cell lines that express a single HLA allele of interest, followed by immunoprecipitation with the W6/32 antibody and LC-MS/MS. Public immunopeptidomics data were downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at a 1% false discovery rate. Other metrics (features) were either extracted from source data or generated internally by reprocessing samples with the ImmunoID NeXT Platform.

Results: We generated large-scale, high-quality immunopeptidomics data from approximately 60 mono-allelic cell lines, which unambiguously assign peptides to their presenting alleles, to create our primary models. Briefly, our primary ‘binding’ algorithm models MHC-peptide binding using peptide and binding-pocket information, while our primary ‘presentation’ model uses additional features to model antigen processing and presentation. Both primary models have significantly higher precision across all recall values in multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve performance, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Multi-allelic data were also integrated by resolving peptide-to-allele mappings with our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples.

Conclusions: Improving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance than a state-of-the-art public algorithm and furthers this objective.
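The abstract does not say how the 1% false discovery rate filter is computed. A widely used approach in mass-spectrometry proteomics is target-decoy estimation, and the following Python sketch assumes that approach purely for illustration; it is not the in-house pipeline. Matches are ranked by score, the FDR at each cutoff is estimated as decoys divided by targets above it, each match receives a q-value (the smallest FDR estimate at its cutoff or any looser one), and target peptides with q ≤ 1% are kept. The function name and the `(peptide, score, is_decoy)` input format are hypothetical.

```python
from typing import List, Tuple

# (peptide, score, is_decoy); a higher score means a more confident match.
PSM = Tuple[str, float, bool]


def filter_at_fdr(psms: List[PSM], fdr: float = 0.01) -> List[str]:
    """Return target peptides whose target-decoy q-value is <= fdr (hypothetical sketch)."""
    ranked = sorted(psms, key=lambda p: p[1], reverse=True)
    # Estimated FDR at each score cutoff: decoys / targets seen so far.
    estimates = []
    decoys = targets = 0
    for _, _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        estimates.append(decoys / max(targets, 1))
    # q-value: the smallest FDR estimate at this cutoff or any looser one.
    q_values = [0.0] * len(estimates)
    best = float("inf")
    for i in range(len(estimates) - 1, -1, -1):
        best = min(best, estimates[i])
        q_values[i] = best
    return [pep for (pep, _, is_decoy), q in zip(ranked, q_values)
            if not is_decoy and q <= fdr]
```

Taking the running minimum from the low-score end makes q-values monotone, so loosening the threshold never drops a peptide that a stricter threshold kept.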



2021 ◽  
Vol 15 (3) ◽  
pp. 1-28
Author(s):  
Xueyan Liu ◽  
Bo Yang ◽  
Hechang Chen ◽  
Katarzyna Musial ◽  
Hongxu Chen ◽  
...  

The stochastic blockmodel (SBM) is a widely used statistical network representation model, with good interpretability, expressiveness, generalization, and flexibility, which has become prevalent and important in the field of network science in recent years. However, learning an optimal SBM for a given network is an NP-hard problem. This severely limits the application of SBMs to large-scale networks, because of the substantial computational overhead of existing SBM models and their learning methods. Reducing the cost of SBM learning and making it scalable to large-scale networks, while maintaining the good theoretical properties of the SBM, remains an unresolved problem. In this work, we address this challenging task from the novel perspective of model redefinition. We propose a redefined SBM based on the Poisson distribution, together with a block-wise learning algorithm that can efficiently analyse large-scale networks. Extensive validation on both artificial and real-world data shows that our proposed method significantly outperforms state-of-the-art methods in achieving a reasonable trade-off between accuracy and scalability.
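The abstract does not define the redefined model, so the following Python sketch shows only the textbook Poisson SBM objective, as an assumption about the kind of quantity an SBM learner maximizes; it is not the authors' model or their block-wise algorithm. Each node pair (i, j) with i < j has an edge count A_ij ~ Poisson(ω_{g_i g_j}). Substituting the maximum-likelihood rates ω_rs = m_rs / n_rs, where m_rs is the number of edges between blocks r and s and n_rs the number of node pairs between them, gives the profile log-likelihood Σ_{r≤s} [m_rs log(m_rs / n_rs) − m_rs] for a simple graph. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def poisson_sbm_loglik(edges: List[Tuple[int, int]], labels: Sequence[int]) -> float:
    """Profile log-likelihood of a simple undirected graph under a Poisson SBM.

    labels[i] is the block of node i. Each pair (i, j), i < j, has
    A_ij ~ Poisson(omega[g_i, g_j]); omega is set to its maximum-likelihood
    value m_rs / n_rs, and the constant -sum log(A_ij!) (zero for 0/1 entries)
    is dropped. Assumes no self-loops or multi-edges.
    """
    sizes = Counter(labels)
    m = Counter()  # edge counts between each unordered pair of blocks
    for u, v in edges:
        r, s = sorted((labels[u], labels[v]))
        m[(r, s)] += 1
    ll = 0.0
    for (r, s), m_rs in m.items():
        # Number of node pairs between blocks r and s (within-block if r == s).
        n_rs = sizes[r] * sizes[s] if r != s else sizes[r] * (sizes[r] - 1) // 2
        omega = m_rs / n_rs
        ll += m_rs * math.log(omega) - n_rs * omega  # block pairs with m_rs = 0 add 0
    return ll
```

For two triangles joined by one edge, labelling each triangle as its own block scores higher than an alternating labelling, which is the comparison a learning algorithm would make while searching over partitions.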



Toxins ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 420
Author(s):  
Yi Ma ◽  
Liu Cui ◽  
Meng Wang ◽  
Qiuli Sun ◽  
Kaisheng Liu ◽  
...  

Bacterial ghosts (BGs) are empty cell envelopes that retain native extracellular structures but lack cytoplasm and genetic material. BGs are proposed to have significant prospects in biomedical research as vaccines or delivery carriers. However, the applications of BGs are often limited by inefficient bacterial lysis and low yield. To solve these problems, we compared the lysis efficiency of the wild-type protein E (EW) from phage ΦX174 and a screened mutant protein E (EM) in the Escherichia coli BL21(DE3) strain. The results show that lysis efficiency mediated by protein EM was improved. Introducing the pLysS plasmid enabled nearly 100% lysis efficiency even at an initial cell density as high as OD600 = 2.0, which is higher than that used in the commonly used BG preparation method. Western blot analysis and immunofluorescence indicate that the expression level of protein EM was significantly higher with the pLysS plasmid than without it. High-quality BGs were observed by SEM and TEM. To verify the applicability of this method to other bacteria, the T7 RNA polymerase expression system was successfully constructed in Salmonella enterica (S. enterica, SE). A pET vector containing EM, together with pLysS, was introduced to obtain high-quality SE ghosts, which could provide efficient protection for humans and animals. This paper describes, for the first time, a novel and generally applicable method for producing high-quality BGs on a large scale.



2006 ◽  
Vol 19 (11) ◽  
pp. 1118-1123 ◽  
Author(s):  
T Zilbauer ◽  
P Berberich ◽  
A Lümkemann ◽  
K Numssen ◽  
T Wassner ◽  
...  


2018 ◽  
Vol 15 ◽  
pp. 31-36 ◽  
Author(s):  
Sufu Liu ◽  
Xinhui Xia ◽  
Shengjue Deng ◽  
Liyuan Zhang ◽  
Yuqian Li ◽  
...  


Author(s):  
Xiaoyin Bai ◽  
Huimin Zhang ◽  
Gechong Ruan ◽  
Hong Lv ◽  
Yue Li ◽  
...  

Background: There is a lack of real-world data on disease behavior and surgery in Crohn’s disease (CD) from large-scale Chinese cohorts.

Methods: Patients hospitalized with a diagnosis of CD at our center were consecutively included from January 2000 to December 2018. Disease behavior progression was defined as progression from an initial classification of B1 to B2 or B3. Clinical characteristics, including demographics, disease classification and activity, medical therapy, development of cancers, and death, were collected.

Results: Overall, 504 patients were included. Two hundred thirty-one (45.8%) patients were initially classified as B1; of these, 30 (13.0%), 71 (30.7%), and 95 (41.1%) had disease progression at the 1-year follow-up, at the 5-year follow-up, and overall, respectively. Patients without location transition before behavior transition were less likely to experience behavior progression, whereas patients without previous exposure to a corticosteroid, immunomodulator, or biological agent were more likely to experience behavior progression. In the evaluation of long-term prognosis, 211 (41.9%) patients underwent at least one CD-related surgery; 108 (21.4%) and 120 (23.8%) patients underwent surgery before and after their diagnosis, respectively. An initial classification as B1, no behavior transition, no surgery prior to diagnosis, and previous corticosteroid exposure during follow-up were associated with a lower risk of undergoing surgery.

Conclusions: This study describes the clinical features of, and factors associated with, behavior progression and surgery among hospitalized CD patients at a Chinese center. Behavior progression is associated with a higher probability of CD-related surgery, and intensified therapy is needed for these patients in the early phase of the disease.


