small subset
Recently Published Documents





2022 ◽  
Vol 54 (7) ◽  
pp. 1-35
Uttam Chauhan ◽  
Apurva Shah

We cannot work with a mammoth text corpus without summarizing it into a relatively small subset. A computational tool is sorely needed to understand such a gigantic pool of text. Probabilistic topic modeling discovers and explains an enormous collection of documents by reducing it to a topical subspace. In this work, we study the background and advancement of topic modeling techniques. We first introduce the preliminaries of topic modeling techniques and review their extensions and variations, such as topic modeling over various domains, hierarchical topic modeling, word-embedded topic models, and topic models in multilingual perspectives. In addition, research on topic modeling in distributed environments and on topic visualization approaches is explored. We also briefly cover the implementation and evaluation techniques for topic models. Comparison matrices are shown over the experimental results of the various categories of topic modeling. Diverse technical challenges and future directions are discussed.

2022 ◽  
Vol 13 (1) ◽  
pp. 1-28
Mohammad Ehsan Shahmi Chowdhury ◽  
Chowdhury Farhan Ahmed ◽  
Carson K. Leung

Graph datasets now have a vast number of applications. As a result, graph mining (mining graph datasets to extract frequent subgraphs) has proven crucial in numerous respects. It is important to perform correlation analysis among the subparts (i.e., elements) of the frequent subgraphs generated by graph mining in order to observe interesting information. However, the majority of existing works focus on the complexities of dealing with graph structures, and little work aims to perform correlation analysis. For instance, one previous work in this regard used a very naive approach and dealt with only a small subset of the problem. Hence, in this article, a new measure is proposed to aid in the analysis of large subgraphs mined from various types of graph transactions in the dataset. These subgraphs are immense in terms of their structural composition, and thus parallel the entire set of graphs in the real world. A complete framework for discovering the relations among parts of a frequent subgraph is proposed using our new method. Evaluation results show the usefulness and accuracy of the newly defined measure on real-life graph datasets.

F1000Research ◽  
2022 ◽  
Vol 11 ◽  
pp. 38
Mikhail Raevskiy ◽  
Anna Kondrashina ◽  
Yulia Medvedeva

Identification of transcription factors (TFs) that could induce and direct cell conversion remains a challenge. Though several hundred TFs are usually transcribed in each cell type, the identity of a cell is controlled by, and can be achieved through the ectopic overexpression of, only a small subset of so-called core TFs. Currently, the experimental identification of the core TFs for a broad spectrum of cell types remains challenging. Computational solutions to this problem would provide a better understanding of the mechanisms controlling cell identity during natural embryonic or malignant development, as well as give a foundation for cell-based therapy. Herein, we propose a computational approach based on over-enrichment of transcription factor binding sites (TFBS) in differentially accessible chromatin regions that could identify the potential core TFs for a variety of primary human cells involved in hematopoiesis. Our approach enables the integration of both transcriptomic (single-cell RNA sequencing, scRNA-seq) and epigenomic (single-cell assay for transposase-accessible chromatin, scATAC-seq) data at single-cell resolution to search for core TFs, and can be scaled to predict subsets of core TFs and their role in a given conversion between cells.

2022 ◽  
Vol 18 (1) ◽  
pp. e1009628
Zhi Ming Xu ◽  
Sina Rüeger ◽  
Michaela Zwyer ◽  
Daniela Brites ◽  
Hellen Hiza ◽  

Genome-wide association studies rely on the statistical inference of untyped variants, called imputation, to increase the coverage of genotyping arrays. However, the results are often suboptimal in populations underrepresented in existing reference panels and array designs, since the selected single nucleotide polymorphisms (SNPs) may fail to capture population-specific haplotype structures, hence the full extent of common genetic variation. Here, we propose to sequence the full genomes of a small subset of an underrepresented study cohort to inform the selection of population-specific add-on tag SNPs and to generate an internal population-specific imputation reference panel, such that the remaining array-genotyped cohort could be more accurately imputed. Using a Tanzania-based cohort as a proof-of-concept, we demonstrate the validity of our approach by showing improvements in imputation accuracy after the addition of our designed add-on tags to the base H3Africa array.

Funsho J Ogunshola ◽  
Werner Smidt ◽  
Anneta F Naidoo ◽  
Thandeka Prudence Nkosi ◽  
Thandekile Ngubane ◽  

CD8+ T-cells play an important role in HIV control. However, in human lymph nodes (LNs), only a small subset of CD8+ T-cells expresses CXCR5, the chemokine receptor required for cell migration into B cell follicles, which are major sanctuaries for HIV persistence in individuals on therapy. Here, we investigate the impact of HIV infection on follicular CD8+ T-cell (fCD8) frequencies, trafficking patterns and CXCR5 regulation. We show that, although HIV infection results in a marginal increase of fCD8s in LNs, the majority of HIV-specific CD8+ T-cells are CXCR5 negative (non-fCD8s) (p<0.003). Mechanistic investigations using ATAC-seq showed that non-fCD8s have closed chromatin at the CXCR5 transcriptional start site (TSS). DNA bisulfite sequencing identified DNA hypermethylation at the CXCR5 TSS as the most probable cause of closed chromatin. Transcription factor footprint analysis revealed enrichment of transforming growth factors (TGFs) at the TSS of fCD8s. In vitro stimulation of non-fCD8s with recombinant TGF-β resulted in a significant increase in CXCR5 expression (fCD8s). Thus, this study identifies TGF-β signaling as a viable strategy for increasing fCD8 frequencies in follicular areas of the LN where they are needed to eliminate HIV-infected cells, with implications for HIV cure strategies.

Children ◽  
2022 ◽  
Vol 9 (1) ◽  
pp. 56
Yi-Ting Cheng ◽  
Yu-Shin Lee ◽  
Jainn-Jim Lin ◽  
Hung-Tao Chung ◽  
Yhu-Chering Huang ◽  

Kawasaki disease (KD) is an acute systemic vasculitis of unknown cause that mainly affects infants and children and can result in coronary artery complications if left untreated. A small subset of KD patients with fever and cervical lymphadenitis has been reported as node-first-presenting KD (NFKD). This type of KD commonly affects the older pediatric population with a more intense inflammatory process. Considering its unusual initial presentation, a delay in diagnosis and treatment increases the risk of coronary artery complications. Herein, we report the case of a 9-year-old female with fever and a neck mass who rapidly deteriorated into shock. A diagnosis of KD was made after the signs and symptoms fulfilled the principal diagnostic criteria. The patient’s heart failure and blood pressure improved dramatically after a single dose of intravenous immunoglobulin. This case reminds us that NFKD could be the initial manifestation of Kawasaki disease shock syndrome (KDSS), which is a potentially fatal condition. We review the literature to identify the overlapping characteristics of NFKD and KDSS, and to highlight the importance of early recognition of atypical KD regardless of age. We conclude that unusually high C-reactive protein, neutrophilia, and thrombocytopenia serve as supplemental laboratory indicators for early identification of KDSS in patients with NFKD.

Jingwei Yun ◽  
Erin Evoy ◽  
Soleil Worthy ◽  
Melody Fraser ◽  
Daniel Veber ◽  

Ice nucleating particles (INPs) are a small subset of atmospheric particles that can initiate the formation of ice in mixed-phase clouds. Here we report concentrations of INPs during October and...

2022 ◽  
Vol 258 (1) ◽  
pp. 15
S. Everett ◽  
B. Yanny ◽  
N. Kuropatkin ◽  
E. M. Huff ◽  
Y. Zhang ◽  

We describe an updated calibration and diagnostic framework, Balrog, used to directly sample the selection and photometric biases of the Dark Energy Survey (DES) Year 3 (Y3) data set. We systematically inject onto the single-epoch images of a random 20% subset of the DES footprint an ensemble of nearly 30 million realistic galaxy models derived from DES Deep Field observations. These augmented images are analyzed in parallel with the original data to automatically inherit measurement systematics that are often too difficult to capture with generative models. The resulting object catalog is a Monte Carlo sampling of the DES transfer function and is used as a powerful diagnostic and calibration tool for a variety of DES Y3 science, particularly for the calibration of the photometric redshifts of distant “source” galaxies and magnification biases of nearer “lens” galaxies. The recovered Balrog injections are shown to closely match the photometric property distributions of the Y3 GOLD catalog, particularly in color, and capture the number density fluctuations from observing conditions of the real data within 1% for a typical galaxy sample. We find that Y3 colors are extremely well calibrated, typically within ∼1–8 mmag, but for a small subset of objects, we detect significant magnitude biases correlated with large overestimates of the injected object size due to proximity effects and blending. We discuss approaches to extend the current methodology to capture more aspects of the transfer function and reach full coverage of the survey footprint for future analyses.

2021 ◽  
Vol 9 (4) ◽  
pp. 1-39
Paul Gölz ◽  
Anson Kahng ◽  
Simon Mackenzie ◽  
Ariel D. Procaccia

Liquid democracy is the principle of making collective decisions by letting agents transitively delegate their votes. Despite its significant appeal, it has become apparent that a weakness of liquid democracy is that a small subset of agents may gain massive influence. To address this, we propose to change the current practice by allowing agents to specify multiple delegation options instead of just one. Much like in nature, where—fluid mechanics teaches us—liquid maintains an equal level in connected vessels, we seek to control the flow of votes in a way that balances influence as much as possible. Specifically, we analyze the problem of choosing delegations to approximately minimize the maximum number of votes entrusted to any agent by drawing connections to the literature on confluent flow. We also introduce a random graph model for liquid democracy and use it to demonstrate the benefits of our approach both theoretically and empirically.
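The influence concentration described in this abstract can be illustrated with a minimal sketch. It assumes the single-delegation setting the abstract calls current practice, where each agent either votes directly or names exactly one delegate. The function name `resolve_votes`, the example agents, and the choice to discard votes caught in a delegation cycle are all hypothetical choices made for illustration; this is not the authors' algorithm, which concerns multiple delegation options and confluent flow.

```python
from collections import Counter
from typing import Dict, Optional


def resolve_votes(delegate_to: Dict[str, Optional[str]]) -> Counter:
    """Count the votes held by each direct voter under transitive delegation.

    delegate_to maps every agent to the agent they delegate to, or to None
    if they vote directly. Votes caught in a delegation cycle are discarded
    (an assumption of this sketch, not something stated in the abstract).
    """
    weights: Counter = Counter()
    for agent in delegate_to:
        seen = set()
        current: Optional[str] = agent
        # Follow the delegation chain until reaching a direct voter.
        while delegate_to[current] is not None:
            if current in seen:  # cycle detected: this vote is lost
                current = None
                break
            seen.add(current)
            current = delegate_to[current]
        if current is not None:
            weights[current] += 1
    return weights


# Hypothetical example: a, b and d all end up delegating to c.
delegations = {"a": "b", "b": "c", "c": None, "d": "c", "e": None, "f": "e"}
weights = resolve_votes(delegations)
max_weight = max(weights.values())  # the quantity the abstract aims to keep small
print(dict(weights), max_weight)
```

In this toy instance two of the six agents hold every vote, and one of them holds four. Allowing several delegation options per agent, as the abstract proposes, gives a mechanism room to choose chains that lower this maximum.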

2021 ◽  
Hanieh Falahati ◽  
Yumei Wu ◽  
Vanessa Feuerer ◽  
Pietro De Camilli

The spine apparatus is a specialization of the neuronal ER in dendritic spines consisting of stacks of interconnected cisterns separated by a dense matrix. Synaptopodin, a specific actin binding protein of the spine apparatus, is essential for its formation, but the underlying mechanisms remain unknown. We show that synaptopodin, when expressed in fibroblasts, forms actin-rich structures with connections to the ER, and that an ER-tethered synaptopodin assembles into liquid condensates. We also identified protein neighbors of synaptopodin in spines by in vivo proximity biotinylation. We validated a small subset of such proteins and showed that they co-assemble with synaptopodin in living cells. One of them is Pdlim7, an actin binding protein not previously identified in spines, and we show its precise colocalization with synaptopodin. We suggest that the matrix of the spine apparatus has the property of a liquid protein condensate generated by a multiplicity of low affinity interactions.
