Mass spectra alignment using virtual lock-masses

2018 ◽  
Author(s):  
Francis Brochu ◽  
Pier-Luc Plante ◽  
Alexandre Drouin ◽  
Dominic Gagnon ◽  
Dave Richard ◽  
...  

ABSTRACT Mass spectrometry is a valued method to evaluate the metabolomics content of a biological sample. The recent advent of rapid ionization technologies such as Laser Diode Thermal Desorption (LDTD) and Direct Analysis in Real Time (DART) has rendered high-throughput mass spectrometry possible. It is used for large-scale comparative analysis of populations of samples. In practice, many factors resulting from the environment, the protocol, and even the instrument itself can lead to minor discrepancies between spectra, rendering automated comparative analysis difficult. In this work, a pipeline of algorithms to correct variations between spectra is proposed. The algorithms correct multiple spectra by identifying peaks that are common to all and, from those, compute a spectrum-specific correction. We show that these algorithms increase comparability within large datasets of spectra, facilitating comparative analysis, such as machine learning.
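The abstract's core idea (find peaks shared by every spectrum, then derive one correction per spectrum from them) can be illustrated with a minimal Python sketch. This is not the authors' virtual lock-mass algorithm: the function names, the 50 ppm matching window, the use of the mean m/z as the reference point, and the single multiplicative factor per spectrum are all simplifying assumptions made for illustration. Spectra are represented here as plain lists of peak m/z values, with intensities ignored.

```python
from statistics import mean


def nearest_peak(spectrum, target_mz, tol_ppm):
    """Return the m/z in `spectrum` closest to `target_mz`, or None if it
    lies outside the +/- tol_ppm window (hypothetical helper)."""
    best = min(spectrum, key=lambda mz: abs(mz - target_mz))
    if abs(best - target_mz) / target_mz * 1e6 <= tol_ppm:
        return best
    return None


def common_peaks(spectra, tol_ppm=50.0):
    """List the peaks found in every spectrum.

    Each entry holds the matched m/z from each spectrum, in spectrum order.
    Candidates are taken from the first spectrum (an assumption of this sketch).
    """
    matches = []
    for candidate in spectra[0]:
        hits = [nearest_peak(s, candidate, tol_ppm) for s in spectra]
        if all(h is not None for h in hits):
            matches.append(hits)
    return matches


def correction_factors(spectra, tol_ppm=50.0):
    """Compute one multiplicative m/z correction per spectrum."""
    matches = common_peaks(spectra, tol_ppm)
    if not matches:
        raise ValueError("no peaks common to all spectra")
    # Reference position of each shared peak: the mean across spectra.
    references = [mean(hits) for hits in matches]
    factors = []
    for i in range(len(spectra)):
        ratios = [ref / hits[i] for ref, hits in zip(references, matches)]
        factors.append(mean(ratios))
    return factors


def align(spectra, tol_ppm=50.0):
    """Apply each spectrum's correction factor to all of its peaks."""
    factors = correction_factors(spectra, tol_ppm)
    return [[mz * f for mz in s] for s, f in zip(spectra, factors)]
```

A single factor per spectrum is the simplest spectrum-specific correction; a fuller treatment could interpolate the correction between neighbouring shared peaks so that it varies along the m/z axis.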

IAWA Journal ◽  
2020 ◽  
Vol 41 (4) ◽  
pp. 720-739 ◽  
Author(s):  
Rene J. Carmona ◽  
Michael C. Wiemann ◽  
Pieter Baas ◽  
Cristobal Barros ◽  
Gabriela D. Chavarria ◽  
...  

Abstract Alerce (Fitzroya cupressoides (Mol.) Johnst.) and Guaitecas cypress (Pilgerodendron uviferum (Don) Florin) are two of the three closely-related species of conifers in the Cupressaceae that are endemic to southern Chile and Argentina. Both are listed in Appendix I of the Convention on International Trade in Endangered Species of Fauna and Flora (CITES). The presence or absence of nodular (conspicuously pitted) end walls in the parenchyma cells provides a good diagnostic character to separate the two species by wood anatomy, but the latter is sometimes difficult to distinguish. Therefore, a collaborative project was designed to study the chemical-molecular expression of these species by analyzing the heartwood using DART TOFMS (Direct Analysis in Real-Time (DART) Time-of-Flight Mass Spectrometry (TOFMS)). This study compares the anatomical features of heartwood for both species and demonstrates that anatomy in conjunction with chemistry can separate them. DART TOFMS analysis combined with PCA was able to unequivocally determine taxonomic source with a statistical certainty of 99%. The mass spectra results obtained from heartwood demonstrated that identification is feasible after a few seconds, using a very small sample. DART TOFMS is a robust tool for reliable species identification and is useful to identify the taxonomic source of finished products or timber that are suspected of being illegally harvested.


2020 ◽  
Author(s):  
Rolando A. Gittens ◽  
Alejandro Almanza ◽  
Eric Álvarez ◽  
Kelly L. Bennett ◽  
Luis C. Mejía ◽  
...  

Abstract Matrix-assisted laser desorption/ionization (MALDI) time-of-flight mass spectrometry is an analytical method that detects macromolecules that can be used as biomarkers for taxonomic identification in arthropods. The conventional MALDI approach uses fresh laboratory-reared arthropod specimens to build a reference mass spectra library with high-quality standards required to achieve reliable identification. However, this may not be possible to accomplish in some arthropod groups that are difficult to rear under laboratory conditions, or for which only alcohol preserved samples are available. Here, we generated MALDI mass spectra of highly abundant proteins from the legs of 18 Neotropical species of adult field-collected hard ticks, several of which had not been analyzed by mass spectrometry before. We then used their mass spectra as fingerprints to identify each tick species by applying machine learning and pattern recognition algorithms that combined unsupervised and supervised clustering approaches. Both principal component analysis (PCA) and linear discriminant analysis (LDA) classification algorithms were able to identify spectra from different tick species, with LDA achieving the best performance when applied to field-collected specimens that did have an existing entry in a reference library of arthropod protein spectra. These findings contribute to the growing literature that ascertains mass spectrometry as a rapid and effective method for taxonomic identification of disease vectors, which is the first step to predict and manage arthropod-borne pathogens.

Author Summary Hard ticks (Ixodidae) are external parasites that feed on the blood of almost every species of terrestrial vertebrate on earth, including humans. Due to a complete dependency on blood, both sexes, and even immature stages, are capable of transmitting disease agents to their hosts, causing distress and sometimes death. Despite the public health significance of ixodid ticks, accurate species identification remains problematic. Vector species identification is core to developing effective vector control schemes. Herein, we provide the first report of MALDI identification of several species of field-collected Neotropical tick specimens preserved in ethanol for up to four years. Our methodology shows that identification does not depend on a commercial reference library of lab-reared samples, but with the help of machine learning it can rely on a self-curated reference library. In addition, our approach offers greater accuracy and lower cost per sample than conventional and modern identification approaches such as morphology and molecular barcoding.


Author(s):  
Kirti Magudia ◽  
Christopher P. Bridge ◽  
Katherine P. Andriole ◽  
Michael H. Rosenthal

Abstract With vast interest in machine learning applications, more investigators are proposing to assemble large datasets for machine learning applications. We aim to delineate multiple possible roadblocks to exam retrieval that may present themselves and lead to significant time delays. This HIPAA-compliant, institutional review board–approved, retrospective clinical study required identification and retrieval of all outpatient and emergency patients undergoing abdominal and pelvic computed tomography (CT) at three affiliated hospitals in the year 2012. If a patient had multiple abdominal CT exams, the first exam was selected for retrieval (n=23,186). Our experience in attempting to retrieve 23,186 abdominal CT exams yielded 22,852 valid CT abdomen/pelvis exams and identified four major categories of challenges when retrieving large datasets: cohort selection and processing, retrieving DICOM exam files from PACS, data storage, and non-recoverable failures. The retrieval took 3 months of project time and at minimum 300 person-hours of time between the primary investigator (a radiologist), a data scientist, and a software engineer. Exam selection and retrieval may take significantly longer than planned. We share our experience so that other investigators can anticipate and plan for these challenges. We also hope to help institutions better understand the demands that may be placed on their infrastructure by large-scale medical imaging machine learning projects.
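The cohort rule stated in the abstract (keep only the first exam when a patient has several) and the reported retrieval yield can be sketched in a few lines of Python. The record layout `(patient_id, exam_date, accession)` and both function names are hypothetical, chosen only for illustration; the abstract does not describe the authors' actual tooling.

```python
from datetime import date


def first_exam_per_patient(exams):
    """Keep the earliest exam for each patient.

    `exams` is an iterable of (patient_id, exam_date, accession) tuples;
    this layout is an assumption, not the study's data model.
    Returns a dict mapping patient_id to the accession of the first exam.
    """
    first = {}
    for patient_id, exam_date, accession in exams:
        kept = first.get(patient_id)
        if kept is None or exam_date < kept[0]:
            first[patient_id] = (exam_date, accession)
    return {pid: acc for pid, (_, acc) in first.items()}


def retrieval_yield(requested, valid):
    """Return (number of exams not retrieved as valid, fraction valid)."""
    return requested - valid, valid / requested


# Figures reported in the abstract: 23,186 requested, 22,852 valid.
missing, rate = retrieval_yield(23186, 22852)
print(f"{missing} exams missing, {rate:.2%} valid")
```

With the abstract's figures, 334 of the 23,186 requested exams (about 1.4%) did not result in a valid CT abdomen/pelvis exam.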


2019 ◽  
Vol 411 (30) ◽  
pp. 8133-8142 ◽  
Author(s):  
Wen Dong ◽  
Jian Liang ◽  
Isabella Barnett ◽  
Paul C. Kline ◽  
Elliot Altman ◽  
...  
