SGI: Automatic clinical subgroup identification in omics datasets

Mapping Intimacies ◽

10.1101/2021.03.12.435108 ◽

2021 ◽

Author(s):

Mustafa Buyukozkan ◽

Karsten Suhre ◽

Jan Krumsiek

Keyword(s):

Large Scale ◽

R Package ◽

Metabolomics Data ◽

Source Codes ◽

Link Type ◽

Association Testing ◽

Subgroup Identification ◽

Hands On ◽

Control Study

Summary: The ‘Subgroup Identification’ (SGI) toolbox provides an algorithm to automatically detect clinical subgroups of samples in large-scale omics datasets. It is based on hierarchical clustering trees in combination with a specifically designed association testing and visualization framework that can process an arbitrary number of clinical parameters and outcomes in a systematic fashion. A multi-block extension allows for the simultaneous use of multiple omics datasets on the same samples. In this paper, we describe the functionality of the toolbox and demonstrate an application example on a blood metabolomics dataset with various clinical biochemistry readouts in a type 2 diabetes case-control study.

Availability and implementation: SGI is an open-source package implemented in R. Package source codes and hands-on tutorials are available at https://github.com/krumsieklab/sgi. The QMdiab metabolomics data is included in the package and can be downloaded from https://doi.org/10.6084/m9.figshare.5904022.

Haplotype Structures and Large-Scale Association Testing of the 5' AMP-Activated Protein Kinase Genes PRKAA2, PRKAB1, and PRKAB2 With Type 2 Diabetes

Diabetes ◽

10.2337/diabetes.55.03.06.db05-1418 ◽

2006 ◽

Vol 55 (3) ◽

pp. 849-855 ◽

Cited By ~ 16

Author(s):

M. W. Sun ◽

J. Y. Lee ◽

P. I.W. de Bakker ◽

N. P. Burtt ◽

P. Almgren ◽

...

Keyword(s):

Type 2 Diabetes ◽

Protein Kinase ◽

Large Scale ◽

Association Testing ◽

Amp Activated Protein Kinase

HierCC: A multi-level clustering scheme for population assignments based on core genome MLST

10.1101/2020.11.25.397539 ◽

2020 ◽

Author(s):

Zhemin Zhou ◽

Jane Charlesworth ◽

Mark Achtman

Keyword(s):

Disease Surveillance ◽

Large Scale ◽

Core Genome ◽

Supplementary Information ◽

Source Codes ◽

Link Type ◽

Scalable Clustering ◽

Population Structures ◽

Multi Level ◽

Level Cluster

Abstract:

Motivation: Routine infectious disease surveillance is increasingly based on large-scale whole genome sequencing databases. Real-time surveillance would benefit from immediate assignments of each genome assembly to hierarchical population structures. Here we present HierCC, a scalable clustering scheme based on core genome multi-locus typing that allows incremental, static, multi-level cluster assignments of genomes. We also present HCCeval, which identifies optimal thresholds for assigning genomes to cohesive HierCC clusters. HierCC was implemented in EnteroBase in 2018, and has since genotyped >400,000 genomes from Salmonella, Escherichia, Yersinia and Clostridioides.

Availability: Implementation: http://enterobase.warwick.ac.uk/ and Source codes: https://github.com/zheminzhou/[email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
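As a rough illustration of multi-level cluster assignment from allelic profiles, the sketch below labels genomes by single-linkage clustering at several allelic-distance thresholds. The abstract does not spell out the linkage rule, so single linkage is an assumption here, as are the function names, the toy profiles, and the thresholds. The sketch recomputes all assignments from scratch and ignores missing loci, so it does not capture the incremental or static behaviour described above.

```python
def allelic_distance(p, q):
    """Number of loci at which two allelic profiles differ."""
    return sum(1 for x, y in zip(p, q) if x != y)


def single_linkage_clusters(profiles, threshold):
    """Label each genome with the lowest index in its single-linkage cluster.

    Two genomes are linked when their allelic distance is <= threshold.
    """
    parent = list(range(len(profiles)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if allelic_distance(profiles[i], profiles[j]) <= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smaller index as the root so labels are stable.
                    parent[max(ri, rj)] = min(ri, rj)
    return [find(i) for i in range(len(profiles))]


def multi_level_assignments(profiles, thresholds):
    """Cluster labels for every genome at every threshold."""
    return {t: single_linkage_clusters(profiles, t) for t in thresholds}


# Toy allelic profiles over five loci (hypothetical data).
profiles = [
    (1, 1, 1, 1, 1),
    (1, 1, 1, 1, 2),
    (1, 1, 1, 3, 2),
    (4, 4, 4, 4, 4),
]
levels = multi_level_assignments(profiles, thresholds=[0, 1, 5])
for threshold, labels in levels.items():
    print(threshold, labels)
```

Reading the labels column by column gives each genome a tuple of cluster identifiers, one per level, from the strictest threshold to the loosest.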

dsSwissKnife: An R package for federated data analysis

10.1101/2020.11.17.386813 ◽

2020 ◽

Author(s):

Iulian Dragan ◽

Thomas Sparsø ◽

Dmitry Kuznetsov ◽

Roderick Slieker ◽

Mark Ibberson

Keyword(s):

Type 2 Diabetes ◽

Data Analysis ◽

General Public ◽

Biomarker Discovery ◽

R Package ◽

Link Type ◽

General Public License

Abstract:

Summary: dsSwissKnife is an R package that enables several powerful analyses to be performed on federated datasets. The package works alongside DataSHIELD and extends its functionality. We have developed and implemented dsSwissKnife in a large IMI project on type 2 diabetes, RHAPSODY, where data from 10 observational cohorts have been harmonised and federated in CDISC SDTM format and made available for biomarker discovery.

Availability and implementation: dsSwissKnife is freely available online at https://github.com/sib-swiss/dsSwissKnife. The package is distributed under the GNU General Public License version [email protected]

Genetic variation in IL6 gene and type 2 diabetes: tagging-SNP haplotype analysis in large-scale case–control study and meta-analysis

Human Molecular Genetics ◽

10.1093/hmg/ddl113 ◽

2006 ◽

Vol 15 (11) ◽

pp. 1914-1920 ◽

Cited By ~ 66

Author(s):

Lu Qi ◽

Rob M. van Dam ◽

James B. Meigs ◽

JoAnn E. Manson ◽

David Hunter ◽

...

Keyword(s):

Type 2 Diabetes ◽

Large Scale ◽

Haplotype Analysis ◽

Case Control Study ◽

Meta Analysis ◽

Case Control ◽

Tagging Snp ◽

Control Study ◽

Il6 Gene

pmparser and PMDB: resources for large-scale, open studies of the biomedical literature

10.1101/2020.09.07.285924 ◽

2020 ◽

Author(s):

Joshua L. Schoenbachler ◽

Jacob J. Hughey

Keyword(s):

Relational Database ◽

Large Scale ◽

R Package ◽

Biomedical Literature ◽

Complex Queries ◽

Link Type ◽

Biomedical Community

Abstract:

PubMed is an invaluable resource for the biomedical community. Although PubMed is freely available, the existing API is not designed for large-scale analyses and the XML structure of the underlying data is inconvenient for complex queries. We developed an R package called pmparser to convert the data in PubMed to a relational database. Our implementation of the database, called PMDB, currently contains data on over 31 million PubMed Identifiers (PMIDs) and is updated regularly. Together, pmparser and PMDB can enable large-scale, reproducible, and transparent analyses of the biomedical literature. pmparser is licensed under GPL-2 and available at https://pmparser.hugheylab.org. PMDB is stored in PostgreSQL and compressed dumps are available on Zenodo (https://doi.org/10.5281/zenodo.4008109).

HiCAGE: an R package for large-scale annotation and visualization of 3C-based genomic data

10.1101/315234 ◽

2018 ◽

Author(s):

Michael J. Workman ◽

Tiago C. Silva ◽

Simon G. Coetzee ◽

Dennis J. Hazelett

Keyword(s):

Large Scale ◽

R Package ◽

Rna Seq ◽

Regulatory Interactions ◽

Link Type ◽

Chromatin Interactions ◽

Ready Access ◽

Web App ◽

Genome Annotations ◽

Gene Ontologies

Abstract:

Chromatin interactions measured by the 3C-based family of next generation technologies are becoming increasingly important for measuring the physical basis for regulatory interactions between different classes of functional domains in the genome. Software is needed to streamline analyses of these data and integrate them with custom genome annotations, RNA-seq, and gene ontologies. We introduce a new R package compatible with Bioconductor, Hi-C Annotation and Graphics Ensemble (HiCAGE), to perform these tasks with minimum effort. In addition, the package contains a shiny/R web app interface to provide ready access to its functions.

Availability and implementation: The software is implemented in R and is freely available under GPLv3. HiCAGE runs in R (version 3.4) and is freely available through GitHub (https://github.com/mworkman13/HiCAGE) or on the web (https://junkdnalab.shinyapps.io/hicage).

dropClust2: An R package for resource efficient analysis of large scale single cell RNA-Seq data

10.1101/596924 ◽

2019 ◽

Author(s):

Debajyoti Sinha ◽

Pradyumn Sinha ◽

Ritwik Saha ◽

Sanghamitra Bandyopadhyay ◽

Debarka Sengupta

Keyword(s):

Single Cell ◽

Programming Languages ◽

Large Scale ◽

Principal Component ◽

Cell Types ◽

R Package ◽

Locality Sensitive Hashing ◽

Rna Seq ◽

Link Type ◽

Component Selection

Abstract:

DropClust leverages Locality Sensitive Hashing (LSH) to speed up clustering of large-scale single cell expression data. It makes ingenious use of structure-preserving sampling and modality-based principal component selection to rescue minor cell types. The existing implementation of dropClust involves interfacing with multiple programming languages, viz. R, Python and C, hindering seamless installation and portability. Here we present dropClust2, a complete R package that’s not only fast but also minimally resource intensive. DropClust2 features a novel batch effect removal algorithm that allows integrative analysis of single cell RNA-seq (scRNA-seq) datasets.

Availability and implementation: dropClust2 is freely available at https://debsinha.shinyapps.io/dropClust/ as an online web service and at https://github.com/debsin/dropClust as an R package.
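To give a feel for why Locality Sensitive Hashing speeds up neighbour search, the sketch below hashes expression vectors with random hyperplanes so that vectors pointing in similar directions tend to share a bucket. Random-hyperplane hashing is one common LSH family and is used here as an assumption; the abstract does not say which family dropClust relies on. All names and the toy vectors are hypothetical.

```python
import random


def make_hyperplanes(n_planes, dim, seed=0):
    """Draw n_planes random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def bucket_cells(cells, hyperplanes):
    """Group cell indices by signature; candidates for neighbours share a bucket."""
    buckets = {}
    for idx, cell in enumerate(cells):
        buckets.setdefault(lsh_signature(cell, hyperplanes), []).append(idx)
    return buckets


# Toy expression vectors: cell 1 is a scaled copy of cell 0, cell 2 points the opposite way.
planes = make_hyperplanes(n_planes=8, dim=3)
cells = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [-1.0, -2.0, -3.0]]
buckets = bucket_cells(cells, planes)
print(buckets)
```

Exact distances then only need to be computed within a bucket, or across several hash tables built with different seeds, rather than between all pairs of cells.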

ECHO: an Application for Detection and Analysis of Oscillators Identifies Metabolic Regulation on Genome-Wide Circadian Output

10.1101/690941 ◽

2019 ◽

Cited By ~ 1

Author(s):

Hannah De los Santos ◽

Emily J. Collins ◽

Catherine Mann ◽

April W. Sagan ◽

Meaghan S. Jankowski ◽

...

Keyword(s):

Metabolic Regulation ◽

Large Scale ◽

R Package ◽

Supplementary Information ◽

Link Type ◽

Time Courses ◽

Synthetic Datasets ◽

Mass Spring ◽

Genome Scale ◽

The Impact

Abstract:

Motivation: Time courses utilizing genome scale data are a common approach to identifying the biological pathways that are controlled by the circadian clock, an important regulator of organismal fitness. However, the methods used to detect circadian oscillations in these datasets are not able to accommodate changes in the amplitude of the oscillations over time, leading to an underestimation of the impact of the clock on biological systems.

Results: We have created a program to efficaciously identify oscillations in large-scale datasets, called the Extended Circadian Harmonic Oscillator application, or ECHO. ECHO utilizes an extended solution of the fixed amplitude mass-spring oscillator that incorporates the amplitude change coefficient. Employing synthetic datasets, we determined that ECHO outperforms existing methods in detecting rhythms with decreasing oscillation amplitudes and recovering phase shift. Rhythms with changing amplitudes identified from published biological datasets revealed distinct functions from those oscillations that were harmonic, suggesting purposeful biologic regulation to create this subtype of circadian rhythms.

Availability: ECHO’s full interface is available at https://github.com/delosh653/ECHO. An R package for this functionality, echo.find, can be downloaded at https://CRAN.R-project.org/[email protected]

Supplementary information: Supplementary data are available
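An oscillator with an amplitude change coefficient can be written in the standard damped mass-spring form x(t) = A·exp(−γt/2)·cos(ωt + φ) + y. The sketch below evaluates that form. The exact parametrization used by ECHO is not given in the abstract, so this form, the parameter names, and the 24-hour period are assumptions. The sign of γ then separates damped (γ > 0), harmonic (γ = 0), and growing (γ < 0) rhythms.

```python
import math


def extended_oscillator(t, amplitude, gamma, omega, phase, offset):
    """Cosine oscillation whose envelope changes as exp(-gamma * t / 2)."""
    return amplitude * math.exp(-gamma * t / 2.0) * math.cos(omega * t + phase) + offset


def values_at_cycle_starts(gamma, n_cycles=3, period=24.0):
    """Model value at the start of each cycle, with unit amplitude and no offset."""
    omega = 2.0 * math.pi / period
    return [
        extended_oscillator(k * period, 1.0, gamma, omega, 0.0, 0.0)
        for k in range(n_cycles)
    ]


damped = values_at_cycle_starts(gamma=0.05)     # envelope shrinks each cycle
harmonic = values_at_cycle_starts(gamma=0.0)    # envelope stays constant
growing = values_at_cycle_starts(gamma=-0.05)   # envelope grows each cycle
print(damped, harmonic, growing)
```

Fitting such a model to a measured time course would estimate all five parameters per gene, for example by nonlinear least squares, and then classify the rhythm by the fitted γ.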
