Mass Spectrometry Dataset: Latest Research Papers

Abstract: Non-target analysis (NTA) employing high-resolution mass spectrometry is a commonly applied approach for the detection of novel chemicals of emerging concern in complex environmental samples. NTA typically results in large and information-rich datasets that require computer-aided (ideally automated) strategies for their processing and interpretation. Such strategies do, however, raise the challenge of reproducibility between and within different processing workflows. An effective strategy to mitigate such problems is the implementation of inter-laboratory studies (ILS) that aim to evaluate different workflows and agree on harmonized/standardized quality control procedures. Here we present the data generated during such an ILS. This study was organized through the Norman Network and included 21 participants from 11 countries. A set of samples based on the passive sampling of drinking water pre- and post-treatment was shipped to all the participating laboratories for analysis, using one pre-defined method and one locally (i.e. in-house) developed method. The data generated represent a valuable resource (i.e. a benchmark) for the future development of algorithms and workflows for NTA experiments.


A facile immunopeptidomics workflow for capturing the HLA-I ligandome with PEAKS XPro

2021, DOI: 10.1101/2021.05.20.444976

Author(s): Kyle S. Hoffman, Baozhen Shan, Jonathan R. Krieger

Keyword(s): Cancer Vaccines, De Novo, Consensus Sequence, Human Leukocyte, Sequence Motif, Cell Surfaces, Leukocyte Antigen, UniProt Database, HLA Ligands, Mass Spectrometry Dataset

Abstract: Identifying antigens displayed specifically on tumour cell surfaces by human leukocyte antigen (HLA) proteins is important for the development of immunotherapies and cancer vaccines. The difficulty in capturing an HLA ligandome stems from the fact that many HLA ligands are derived from splicing events or contain mutations, hindering their identification in a standard database search. To address this challenge, we developed an immunopeptidomics workflow with PEAKS XPro that uses de novo sequencing to uncover such peptides and identifies mutations for neoantigen discovery. We demonstrate the utility of this workflow by re-analyzing HLA-I ligandome datasets and reveal a vast diversity in peptide sequences among clones derived from a colorectal cancer tumour. Over 8000 peptides predicted to bind HLA-I molecules were identified by de novo sequencing only (not found in the UniProt database) and made up over 50% of the identified peptides from each sample. Lastly, tumour-specific mutations and consensus sequence motif characteristics are defined. This workflow is applicable to any immunopeptidomic mass spectrometry dataset and does not require custom database generation for neoantigen discovery.


Energy Efficiency of Inference Algorithms for Medical Datasets: A Green AI study (Preprint)

2021, DOI: 10.2196/preprints.28036

Author(s): Jia-Ruei Yu, Chun-Hsien Chen, Tsung-Wei Huang, Jang-Jih Lu, Chia-Ru Chung, ...

Keyword(s): Mass Spectrometry, Energy Efficiency, Power Consumption, Optimization Techniques, Support Vector, Time Consumption, Inference Algorithms, Medical Domain, Hidden Layer, Mass Spectrometry Dataset

BACKGROUND: Harnessing artificial intelligence (AI) in the medical domain has raised considerable interest recently. An AI model must be energy-efficient if it is to be used for inference applications in the medical domain. Unlike other types of data in visual AI, data in the medical domain are usually composed of features with strong signals. Numerous energy optimization techniques have been developed to relieve the burden on the hardware required to deploy a complex learning model. However, the energy efficiencies of different AI models used for medical applications have not yet been studied.

OBJECTIVE: To explore and compare the energy efficiencies of widely used machine learning (ML) algorithms, including logistic regression (LR), k-nearest neighbors (kNN), support vector machine (SVM), random forest (RF), extreme gradient boosting (XGB), and two different neural networks (NN), on medical datasets.

METHODS: We applied the algorithms above to two distinct medical datasets: mass spectrometry data of Staphylococcus aureus for predicting methicillin resistance (“Mass spectrometry” dataset: 3338 cases; 268 features), and urinalysis data for predicting Trichomonas vaginalis infection (“Urinalysis” dataset: 839,164 cases; 9 features). We compared the performance of these seven inference algorithms in terms of accuracy, area under the receiver operating characteristic curve (AUROC), time consumption, and power consumption. Time and power consumption were determined using the performance counter data from Intel Power Gadget 3.5.

RESULTS: The RF and XGB algorithms achieved the two highest AUROC scores on both datasets (84.7% and 83.9%, respectively, on the “Mass spectrometry” dataset, and 91.1% and 91.4%, respectively, on the “Urinalysis” dataset). The XGB, 1-hidden-layer NN, and LR algorithms exhibited the lowest time consumption on both datasets. With RF as the reference baseline, XGB, 1-hidden-layer NN, and LR achieved a 45% reduction in inference time on the “Mass spectrometry” dataset and a 53-60% reduction on the “Urinalysis” dataset. In terms of energy efficiency, XGB, LR, SVM, and RF consumed the least power. With the 5-hidden-layer NN as the reference baseline, XGB, LR, SVM, and RF achieved a 24-32% reduction in power consumption on the “Mass spectrometry” dataset and a 20-53% reduction on the “Urinalysis” dataset. Among all experiments, XGB achieved the best performance across accuracy, runtime, and energy efficiency.

CONCLUSIONS: In the current study, XGB attained a balanced performance across accuracy, runtime, and energy efficiency on the medical datasets. The results indicate that XGB would be an ideal algorithm for applying ML to real-world medical scenarios.
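The comparisons reported above are relative reductions against a baseline model. The following is a minimal sketch of that arithmetic, plus a simple wall-clock timing helper, assuming nothing about the study's actual tooling: the function names, the toy model, and the numbers in the usage section are all hypothetical, and power readings (which the study took from Intel Power Gadget) are treated here as an externally supplied value.

```python
import time
from typing import Callable, Sequence


def percent_reduction(baseline: float, value: float) -> float:
    """Relative reduction of `value` versus `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - value) / baseline


def mean_inference_seconds(predict: Callable[[Sequence[float]], int],
                           rows: Sequence[Sequence[float]],
                           repeats: int = 5) -> float:
    """Average wall-clock time for one pass of `predict` over all rows."""
    start = time.perf_counter()
    for _ in range(repeats):
        for row in rows:
            predict(row)
    return (time.perf_counter() - start) / repeats


def energy_joules(mean_power_watts: float, seconds: float) -> float:
    """Energy equals mean power multiplied by elapsed time."""
    return mean_power_watts * seconds


if __name__ == "__main__":
    # Hypothetical stand-in model: thresholds the sum of the features.
    def toy_predict(row: Sequence[float]) -> int:
        return int(sum(row) > 1.0)

    rows = [[0.1 * i, 0.2] for i in range(1000)]
    elapsed = mean_inference_seconds(toy_predict, rows)
    print(f"mean pass time: {elapsed:.6f} s")

    # Illustrative numbers only, not taken from the study.
    print(percent_reduction(baseline=200.0, value=110.0))
```

Averaging over several passes smooths out timer noise; a real benchmark would also need warm-up runs and a hardware power reading rather than an assumed wattage.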


Using metacommunity ecology to understand environmental metabolomes

Nature Communications, 2020, Vol 11 (1), DOI: 10.1038/s41467-020-19989-y

Author(s): Robert E. Danczak, Rosalie K. Chu, Sarah J. Fansler, Amy E. Goldman, Emily B. Graham, ...

Keyword(s): Ecological Systems, Molecular Properties, Mechanistic Models, Biogeochemical Processes, Focus Attention, Molecular Processes, Active Metabolites, Metacommunity Ecology, Underlying Mechanisms, Mass Spectrometry Dataset

Abstract: Environmental metabolomes are fundamentally coupled to microbially linked biogeochemical processes within ecosystems. However, significant gaps exist in our understanding of their spatiotemporal organization, limiting our ability to uncover transferable principles and predict ecosystem function. We propose that a theoretical paradigm that integrates concepts from metacommunity ecology is necessary to reveal the underlying mechanisms governing metabolomes. We call this synthesis of ecology and metabolomics ‘meta-metabolome ecology’ and demonstrate its utility using a mass spectrometry dataset. We developed three relational metabolite dendrograms using molecular properties and putative biochemical transformations and performed ecological null modeling. Based on the null modeling results, we show that stochastic processes drove molecular properties while biochemical transformations were structured deterministically. We further suggest that potentially biochemically active metabolites were more deterministically assembled than less active metabolites. Understanding variation in the influences of stochasticity and determinism provides a way to focus attention on which meta-metabolomes, and which parts of meta-metabolomes, are most likely to be important to consider in mechanistic models. We propose that this paradigm will allow researchers to study the connections between ecological systems and their molecular processes in previously inaccessible detail.
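The abstract does not specify the null model used, so the following is only a generic illustration of the idea behind ecological null modeling: compare an observed statistic with its distribution under random assembly and express the difference as a standardized effect size. Everything here is an assumption made for illustration, including the function names, the use of a single numeric trait per metabolite in place of dendrogram distances, and the random-draw null.

```python
import random
import statistics
from itertools import combinations
from typing import Sequence


def mean_pairwise_distance(values: Sequence[float]) -> float:
    """Mean absolute difference over all pairs of trait values."""
    pairs = list(combinations(values, 2))
    if not pairs:
        raise ValueError("need at least two values")
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def standardized_effect_size(observed_members: Sequence[float],
                             pool: Sequence[float],
                             n_null: int = 999,
                             seed: int = 0) -> float:
    """(observed - mean(null)) / sd(null) under random draws from the pool.

    Strongly negative values mean the observed members are more similar
    to each other than expected by chance; values near zero are
    consistent with random (stochastic) assembly.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance(observed_members)
    k = len(observed_members)
    # Null distribution: same number of members, drawn at random.
    null = [mean_pairwise_distance(rng.sample(list(pool), k))
            for _ in range(n_null)]
    sd = statistics.stdev(null)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    return (observed - statistics.fmean(null)) / sd


if __name__ == "__main__":
    # Hypothetical trait values for a regional pool of 100 metabolites.
    pool = [float(i) for i in range(100)]
    clustered_sample = [10.0, 11.0, 12.0, 13.0, 14.0]
    print(standardized_effect_size(clustered_sample, pool))
```

A fixed seed keeps the null distribution reproducible between runs; published analyses typically use distances on a dendrogram and compare pairs of samples rather than a single sample against the pool.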
