STAT: a fast, scalable, MinHash-based k-mer tool to assess Sequence Read Archive next-generation sequence submissions

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Kenneth S. Katz ◽  
Oleg Shutov ◽  
Richard Lapoint ◽  
Michael Kimelman ◽  
J. Rodney Brister ◽  
...  

Sequence Read Archive submissions to the National Center for Biotechnology Information often lack useful metadata, which limits the utility of these submissions. We describe the Sequence Taxonomic Analysis Tool (STAT), a scalable k-mer-based tool for fast assessment of taxonomic diversity intrinsic to submissions, independent of metadata. We show that our MinHash-based k-mer tool is accurate and scalable, offering reliable criteria for efficient selection of data for further analysis by the scientific community, at once validating submissions while also augmenting sample metadata with reliable, searchable, taxonomic terms.

2021 ◽  
Author(s):  
Kenneth S Katz ◽  
Oleg Shutov ◽  
Richard Lapoint ◽  
Michael Kimelman ◽  
J Rodney Brister ◽  
...  

Sequence Read Archive submissions to the National Center for Biotechnology Information often lack useful metadata, which limits the utility of these submissions. We describe a scalable k-mer-based tool for fast assessment of taxonomic diversity intrinsic to submissions, independent of metadata. We show that our MinHash-based k-mer tool is accurate and scalable, offering reliable criteria for efficient selection of data for further analysis by the scientific community, at once validating submissions while also augmenting sample metadata with reliable, searchable, taxonomic terms.


2021 ◽  
Vol 9 (2) ◽  
pp. 416
Author(s):  
Charles Dumolin ◽  
Charlotte Peeters ◽  
Evelien De Canck ◽  
Nico Boon ◽  
Peter Vandamme

Culturomics-based bacterial diversity studies benefit from the implementation of MALDI-TOF MS to remove genomically redundant isolates from isolate collections. We previously introduced SPeDE, a novel tool designed to dereplicate spectral datasets at an infraspecific level into operational isolation units (OIUs) based on unique spectral features. However, biological and technical variation may result in methodology-induced differences in MALDI-TOF mass spectra and hence provoke the detection of genomically redundant OIUs. In the present study, we used three datasets to analyze the extent to which hierarchical clustering and network analysis could eliminate redundant OIUs arising from biological and technical sample variation, and to describe the diversity within a set of spectra obtained from 134 unknown soil isolates. Overall, network analysis based on unique spectral features in MALDI-TOF mass spectra enabled a superior selection of genomically diverse OIUs compared to hierarchical clustering analysis and provided a better understanding of the inter-OIU relationships.


PLoS ONE ◽  
2020 ◽  
Vol 15 (2) ◽  
pp. e0228483
Author(s):  
Akira Iguchi ◽  
Miyuki Nishijima ◽  
Yuki Yoshioka ◽  
Aika Miyagi ◽  
Ryuichi Miwa ◽  
...  

2003 ◽  
Vol 83 (4) ◽  
pp. 695-712 ◽  
Author(s):  
Ronaldo F. Hashimoto ◽  
Edward R. Dougherty ◽  
Marcel Brun ◽  
Zheng-Zheng Zhou ◽  
Michael L. Bittner ◽  
...  

2011 ◽  
Vol 76 (1) ◽  
pp. 88-94 ◽  
Author(s):  
Jamie S. Sanderlin ◽  
Nicole Lazar ◽  
Michael J. Conroy ◽  
Jaxk Reeves
