SGI: Automatic clinical subgroup identification in omics datasets

2021 ◽  
Author(s):  
Mustafa Buyukozkan ◽  
Karsten Suhre ◽  
Jan Krumsiek

Summary: The ‘Subgroup Identification’ (SGI) toolbox provides an algorithm to automatically detect clinical subgroups of samples in large-scale omics datasets. It is based on hierarchical clustering trees in combination with a specifically designed association testing and visualization framework that can process an arbitrary number of clinical parameters and outcomes in a systematic fashion. A multi-block extension allows for the simultaneous use of multiple omics datasets on the same samples. In this paper, we describe the functionality of the toolbox and demonstrate an application example on a blood metabolomics dataset with various clinical biochemistry readouts in a type 2 diabetes case-control study.

Availability and implementation: SGI is an open-source package implemented in R. Package source code and hands-on tutorials are available at https://github.com/krumsieklab/sgi. The QMdiab metabolomics data are included in the package and can be downloaded from https://doi.org/10.6084/m9.figshare.5904022.

Diabetes ◽  
2006 ◽  
Vol 55 (3) ◽  
pp. 849-855 ◽  
Author(s):  
M. W. Sun ◽  
J. Y. Lee ◽  
P. I.W. de Bakker ◽  
N. P. Burtt ◽  
P. Almgren ◽  
...  

2020 ◽  
Author(s):  
Zhemin Zhou ◽  
Jane Charlesworth ◽  
Mark Achtman

Abstract

Motivation: Routine infectious disease surveillance is increasingly based on large-scale whole genome sequencing databases. Real-time surveillance would benefit from immediate assignments of each genome assembly to hierarchical population structures. Here we present HierCC, a scalable clustering scheme based on core genome multi-locus typing that allows incremental, static, multi-level cluster assignments of genomes. We also present HCCeval, which identifies optimal thresholds for assigning genomes to cohesive HierCC clusters. HierCC was implemented in EnteroBase in 2018, and has since genotyped >400,000 genomes from Salmonella, Escherichia, Yersinia and Clostridioides.

Availability: Implementation: http://enterobase.warwick.ac.uk/ and source code: https://github.com/zheminzhou/

Supplementary information: Supplementary data are available at Bioinformatics online.
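The idea of multi-level cluster assignments from allelic profiles, as described in the HierCC abstract above, can be illustrated with a small sketch. This is not the HierCC implementation: it assumes plain single-linkage clustering on pairwise allelic distances, skips loci with missing calls, and recomputes everything from scratch rather than assigning genomes incrementally. The function names `allelic_distance` and `single_linkage_levels` and the toy profiles are hypothetical.

```python
from itertools import combinations


def allelic_distance(a, b):
    """Count loci whose allele calls differ; loci missing (None) in either profile are skipped."""
    return sum(1 for x, y in zip(a, b)
               if x is not None and y is not None and x != y)


def find(parent, i):
    """Union-find lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def single_linkage_levels(profiles, thresholds):
    """Return {threshold: cluster labels}, one label per profile.

    Assumption for illustration: two genomes share a cluster at level t when
    they are connected by a chain of pairs whose allelic distance is <= t.
    Each label is the smallest profile index in that cluster.
    """
    n = len(profiles)
    pairs = sorted(
        (allelic_distance(profiles[i], profiles[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))
    assignments = {}
    k = 0
    for t in sorted(thresholds):
        # Merge every pair not yet processed whose distance is within t.
        while k < len(pairs) and pairs[k][0] <= t:
            _, i, j = pairs[k]
            ri, rj = find(parent, i), find(parent, j)
            if ri != rj:
                # Keep the smaller index as root so labels stay stable.
                parent[max(ri, rj)] = min(ri, rj)
            k += 1
        assignments[t] = [find(parent, i) for i in range(n)]
    return assignments


# Toy allelic profiles (four loci each); values are arbitrary allele numbers.
profiles = [
    [1, 1, 1, 1],
    [1, 1, 1, 2],
    [1, 1, 2, 2],
    [5, 5, 5, 5],
]
levels = single_linkage_levels(profiles, thresholds=[0, 1, 4])
```

Because clusters at a larger threshold are built by continuing to merge the clusters of the smaller threshold, the levels are nested, which is what makes the assignment hierarchical.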


2020 ◽  
Author(s):  
Iulian Dragan ◽  
Thomas Sparsø ◽  
Dmitry Kuznetsov ◽  
Roderick Slieker ◽  
Mark Ibberson

Abstract

Summary: dsSwissKnife is an R package that enables several powerful analyses to be performed on federated datasets. The package works alongside DataSHIELD and extends its functionality. We have developed and implemented dsSwissKnife in a large IMI project on type 2 diabetes, RHAPSODY, where data from 10 observational cohorts have been harmonised and federated in CDISC SDTM format and made available for biomarker discovery.

Availability and implementation: dsSwissKnife is freely available online at https://github.com/sib-swiss/dsSwissKnife. The package is distributed under the GNU General Public License.


2006 ◽  
Vol 15 (11) ◽  
pp. 1914-1920 ◽  
Author(s):  
Lu Qi ◽  
Rob M. van Dam ◽  
James B. Meigs ◽  
JoAnn E. Manson ◽  
David Hunter ◽  
...  

2020 ◽  
Author(s):  
Joshua L. Schoenbachler ◽  
Jacob J. Hughey

Abstract

PubMed is an invaluable resource for the biomedical community. Although PubMed is freely available, the existing API is not designed for large-scale analyses and the XML structure of the underlying data is inconvenient for complex queries. We developed an R package called pmparser to convert the data in PubMed to a relational database. Our implementation of the database, called PMDB, currently contains data on over 31 million PubMed Identifiers (PMIDs) and is updated regularly. Together, pmparser and PMDB can enable large-scale, reproducible, and transparent analyses of the biomedical literature. pmparser is licensed under GPL-2 and available at https://pmparser.hugheylab.org. PMDB is stored in PostgreSQL and compressed dumps are available on Zenodo (https://doi.org/10.5281/zenodo.4008109).


2018 ◽  
Author(s):  
Michael J. Workman ◽  
Tiago C. Silva ◽  
Simon G. Coetzee ◽  
Dennis J. Hazelett

Abstract

Chromatin interactions measured by the 3C-based family of next-generation technologies are becoming increasingly important for measuring the physical basis for regulatory interactions between different classes of functional domains in the genome. Software is needed to streamline analyses of these data and integrate them with custom genome annotations, RNA-seq, and gene ontologies. We introduce a new R package compatible with Bioconductor, Hi-C Annotation and Graphics Ensemble (HiCAGE), to perform these tasks with minimum effort. In addition, the package contains a shiny/R web app interface to provide ready access to its functions.

Availability and implementation: The software is implemented in R and is freely available under GPLv3. HiCAGE runs in R (version 3.4) and is available through GitHub (https://github.com/mworkman13/HiCAGE) or on the web (https://junkdnalab.shinyapps.io/hicage).


2019 ◽  
Author(s):  
Debajyoti Sinha ◽  
Pradyumn Sinha ◽  
Ritwik Saha ◽  
Sanghamitra Bandyopadhyay ◽  
Debarka Sengupta

Abstract

DropClust leverages Locality Sensitive Hashing (LSH) to speed up clustering of large-scale single-cell expression data. It makes ingenious use of structure-preserving sampling and modality-based principal component selection to rescue minor cell types. The existing implementation of dropClust involves interfacing with multiple programming languages, viz. R, Python and C, hindering seamless installation and portability. Here we present dropClust2, a complete R package that is not only fast but also minimally resource intensive. DropClust2 features a novel batch effect removal algorithm that allows integrative analysis of single-cell RNA-seq (scRNA-seq) datasets.

Availability and implementation: dropClust2 is freely available at https://debsinha.shinyapps.io/dropClust/ as an online web service and at https://github.com/debsin/dropClust as an R package.


2019 ◽  
Author(s):  
Hannah De los Santos ◽  
Emily J. Collins ◽  
Catherine Mann ◽  
April W. Sagan ◽  
Meaghan S. Jankowski ◽  
...  

Abstract

Motivation: Time courses utilizing genome-scale data are a common approach to identifying the biological pathways that are controlled by the circadian clock, an important regulator of organismal fitness. However, the methods used to detect circadian oscillations in these datasets are not able to accommodate changes in the amplitude of the oscillations over time, leading to an underestimation of the impact of the clock on biological systems.

Results: We have created a program to efficaciously identify oscillations in large-scale datasets, called the Extended Circadian Harmonic Oscillator application, or ECHO. ECHO utilizes an extended solution of the fixed amplitude mass-spring oscillator that incorporates the amplitude change coefficient. Employing synthetic datasets, we determined that ECHO outperforms existing methods in detecting rhythms with decreasing oscillation amplitudes and recovering phase shift. Rhythms with changing amplitudes identified from published biological datasets revealed distinct functions from those oscillations that were harmonic, suggesting purposeful biologic regulation to create this subtype of circadian rhythms.

Availability: ECHO’s full interface is available at https://github.com/delosh653/ECHO. An R package for this functionality, echo.find, can be downloaded at https://CRAN.R-project.org/

Supplementary information: Supplementary data are available.
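To make the notion of an oscillator with an amplitude change coefficient concrete, here is a minimal sketch of one common way to write such a model: a cosine whose amplitude is scaled by an exponential term. The exact parameterisation is an assumption for illustration and is not taken from the abstract; the function name `extended_oscillator` and all parameter names are hypothetical.

```python
import math


def extended_oscillator(t, amplitude, gamma, omega, phase, baseline):
    """Cosine oscillation whose amplitude changes exponentially over time.

    Assumed form: x(t) = A * exp(-gamma * t / 2) * cos(omega * t + phase) + y
      gamma == 0 -> fixed-amplitude (harmonic) oscillation
      gamma  > 0 -> amplitude decreases over time
      gamma  < 0 -> amplitude increases over time
    """
    envelope = amplitude * math.exp(-gamma * t / 2.0)
    return envelope * math.cos(omega * t + phase) + baseline


# A 24-hour rhythm evaluated at two consecutive peaks (t = 0 h and t = 24 h).
omega = 2.0 * math.pi / 24.0
harmonic_peaks = [extended_oscillator(t, 1.0, 0.0, omega, 0.0, 0.0) for t in (0.0, 24.0)]
damped_peaks = [extended_oscillator(t, 1.0, 0.05, omega, 0.0, 0.0) for t in (0.0, 24.0)]
```

With `gamma = 0` both peaks have the same height, while a positive `gamma` makes the second peak lower than the first, which is the kind of decreasing-amplitude rhythm a fixed-amplitude model cannot represent.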


2016 ◽  
Vol 22 ◽  
pp. 183
Author(s):  
Shahjada Selim ◽  
Shahabul Chowdhury ◽  
Mohammad Saifuddin ◽  
Marufa Mustary ◽  
...  
