rawR - Direct access to raw mass spectrometry data in R

2020 ◽  
Author(s):  
Tobias Kockmann ◽  
Christian Panse

Abstract: The Bioconductor project has shown that the R statistical environment is a highly valuable tool for genomics data analysis [1], but with respect to proteomics we are still missing low-level infrastructure to enable performant and robust analysis workflows in R. Fundamentally important are libraries that provide raw data access. Our R package rawDiag provided a proof of principle for how access to mass spectrometry raw files can be realized by wrapping vendor-provided APIs, but it focused on metadata analysis and visualization [2]. Our novel package rawR now provides complete, OS-independent access to all spectral data logged in Thermo Fisher Scientific raw files. In this technical note we present implementation details and describe the main functionality provided by the rawR package. In addition, we report two use cases inspired by real-world research tasks that demonstrate the application of the package. Availability: https://github.com/fgcz/rawR

2021 ◽  
Vol 80 (Suppl 1) ◽  
pp. 206.1-207
Author(s):  
C. Grönwall ◽  
L. Liljefors ◽  
H. Bang ◽  
A. Hensvold ◽  
M. Hansson ◽  
...  

Background: Seropositive rheumatoid arthritis (RA) is characterized by the presence of rheumatoid factor (RF) and anti-citrullinated protein autoantibodies (ACPA) with different fine-specificities. Yet, other serum anti-modified protein autoantibodies (AMPA), e.g. anti-carbamylated (Carb), anti-acetylated (KAc), and anti-malondialdehyde acetaldehyde (MAA) modified protein antibodies, have been described. By using RA patient single-cell derived monoclonal antibodies we have previously shown that individual ACPA clones recognize small distinct citrulline-containing epitopes, giving them extensive multireactivity when these epitopes are found in many peptides and proteins. Moreover, certain CCP2+ multireactive ACPA clones also bind to carbamylated and acetylated autoantigens [1]. Objectives: To provide a comprehensive evaluation of serum IgG and IgA autoreactivity to different post-translational modifications in RA. Methods: We analyzed 30 different IgG and IgA AMPA reactivities to modified antigens by ELISA and autoantigen arrays, in N=1985 newly diagnosed RA patients and population controls. The study utilized both previously established (i.e., IgG and IgA CCP2; IgG ACPA fine-specificities; IgG anti-Carb fibrinogen and Carb FCS; IgG and IgA Cit/Carb/KAc/Orn(Ac)-vimentin) and novel assays (e.g. IgG anti-MAA and IgG anti-acetylated histones). Associations with patient characteristics such as smoking and disease activity were explored. The newly developed assays were also evaluated in SLE disease controls and CCP2+ RA-risk individuals without arthritis. Results: Carb and KAc reactivities by different assays were primarily seen in patients also positive for citrulline-reactivity. Modified vimentin (mod-Vim) peptides were used for direct comparison of different AMPA reactivities, revealing that IgA AMPA recognizing mod-Vim was mainly detected in subsets of patients with high IgG anti-Cit-Vim levels and a history of smoking.
IgG acetylation reactivity was mainly detected in a subset of patients with Cit and Carb reactivity. Anti-acetylated histone 2B reactivity was RA-specific and associated with high anti-CCP2 IgG levels, multiple ACPA fine-specificities, and smoking. This reactivity was also found to be present in CCP2+ RA-risk individuals without arthritis. Our data further demonstrate that IgG autoreactivity to MAA was increased in RA compared to controls with highest levels in CCP2+ RA, but was not RA-specific, and showed low correlation with other AMPA. Anti-MAA was instead associated with disease activity and was not significantly increased in CCP2+ individuals at risk of RA. Notably, RA patients could be subdivided into four different subsets based on their AMPA IgG and IgA reactivity profiles. Conclusion: We conclude that autoantibodies exhibiting different patterns of ACPA fine-specificities as well as Carb and KAc reactivity are present in RA and may be derived from multireactive B-cell clones. Anti-Carb and anti-KAc could be considered reactivities within the “Cit-umbrella” similar to ACPA fine-specificities, while MAA is distinctly different. References: [1] Sahlström P, Hansson M, Steen J, Amara K, Titcombe PJ, Forsström B, Stålesen R, Israelsson L, Piccoli L, Lundberg K, Klareskog L, Mueller DL, Catrina AI, Skriner K, Malmström V, Grönwall C. Different Hierarchies of Anti-Modified Protein Autoantibody Reactivities in Rheumatoid Arthritis. Arthritis Rheumatol. 2020 Oct;72(10):1643-1657.
PMID: 32501655
Disclosures: Caroline Grönwall: None declared, Lisa Liljefors: None declared, Holger Bang Employee of: Employee at ORGENTEC Diagnostika GmbH, Aase Hensvold: None declared, Monika Hansson: None declared, Linda Mathsson-Alm Employee of: Employee at Thermo Fisher Scientific, Lena Israelsson: None declared, Anna Svärd: None declared, Cyril CLAVEL: None declared, Elisabet Svenungsson: None declared, Iva Gunnarsson: None declared, Guy Serre: None declared, Saedis Saevarsdottir: None declared, Alf Kastbom: None declared, Lars Alfredsson: None declared, Vivianne Malmström: None declared, Johan Rönnelid: None declared, Anca Catrina: None declared, Karin Lundberg: None declared, Lars Klareskog: None declared


2017 ◽  
Vol 16 (7) ◽  
pp. 2645-2652 ◽  
Author(s):  
Mathieu Courcelles ◽  
Jasmin Coulombe-Huntington ◽  
Émilie Cossette ◽  
Anne-Claude Gingras ◽  
Pierre Thibault ◽  
...  

2010 ◽  
Vol 25 (2) ◽  
pp. 118-122
Author(s):  
K. Barrial ◽  
T. Le Bricon ◽  
F. Courtier ◽  
M.-H. Tourvieille ◽  
S. Hilaire ◽  
...  

2021 ◽  
Author(s):  
Scott A. Jarmusch ◽  
Justin J. J. van der Hooft ◽  
Pieter C. Dorrestein ◽  
Alan K. Jarmusch

This review covers the current and potential use of mass spectrometry-based metabolomics data mining in natural products. Public data, metadata, databases and data analysis tools are critical. The value and success of data mining rely on community participation.


Blood ◽  
2021 ◽  
Vol 138 (Supplement 1) ◽  
pp. 1881-1881
Author(s):  
Geoffrey Lowman ◽  
Landon Pastushok ◽  
Karen Mochoruk ◽  
Wayne Hill ◽  
Michelle Toro ◽  
...  

Abstract. Introduction: B cell repertoire analysis by next-generation sequencing (NGS) is at the forefront of leukemia and lymphoma research. Some advantages provided by NGS-based techniques include a lower limit-of-detection and simpler paths to standardization compared to other methods. Importantly, in research of post-germinal B cell disorders, such as multiple myeloma (MM), NGS methods allow for the study of clonal lineage based on somatic hypermutation (SHM) patterns. Current targeted NGS assays require multiple libraries to survey each B cell receptor chain (IGH, IgK, IgL), and this fact is highlighted when initial clonality detection fails due to mutations under primer binding sites. This issue is especially relevant in MM, which has a high rate of SHM. To address these issues, we have developed an assay for B cell analysis, based on Ion AmpliSeq™ technology, which enables efficient detection of IGH, IgK, and IgL chain rearrangements in a single reaction. Methods: The B cell pan-clonality panel (Oncomine™ BCR Pan-Clonality Assay) targets the framework 3 (FR3) portion of the variable gene and the joining gene region of heavy- and light-chain loci (IGH, IgK, IgL) for all alleles found within the IMGT database, enabling readout of the complementarity-determining region 3 (CDR3) sequence of each immunoglobulin chain. To maximize sensitivity, we included primers to amplify IgK loci rearrangements involving Kappa deletion element and the constant region intron. To evaluate assay performance, we conducted reproducibility studies and clonality assessment using gDNA from a total of 45 MM research samples. All MM cases examined in this work were confirmed clonal previously by light chain restriction via flow cytometry or IHC/ISH in tissue sections; 16 of the 45 MM samples were identified as lambda light chain restricted. For comparison, a small cohort of 12 B-ALL samples was also included in the study.
Sequencing and repertoire analyses were performed using the Ion GeneStudio S5 System and Ion Reporter 5.16 analysis software. Results: Clonality assessment of MM clinical research samples shows a 93% overall positive detection rate by an assay which combines the IGH, IgK, and IgL chains in a single reaction using published guidelines for clonality assignment. Thirty-four of 45 samples showed positive detection of an IGH rearrangement, while 41 of 45 showed positive detection of at least one light chain receptor. In total, 42 of 45 samples were deemed clonal by the single tube assay based on detection of one or more receptors. Clonality results for this sample set are well correlated with orthogonal data from flow, IHC/ISH, or alternate NGS assays. A clonal lambda light chain was identified in 14 of 16 samples determined to be lambda restricted by flow cytometry. In two of the lambda restricted samples only a clonal lambda rearrangement was identified, showing the benefit of including primers targeting both the kappa and lambda light chains in a pan-clonality NGS assay. Both the MM and B-ALL cohorts were evaluated for biased IGHV gene usage. IGHV3-11 was observed in 5 of 45 MM and 5 of 12 B-ALL samples. IGHV4-34, typically linked to autoreactive antibodies and underrepresented in germinal center and memory B-cells, was nonetheless found in 5 of 45 MM samples surveyed. Estimates of somatic hypermutation rates were calculated using the BCR pan-clonality assay. Most MM samples, as expected, contained some somatic hypermutation with 6 of 45 samples showing greater than 10% mutation rates. Automated lineage analysis, based on somatic hypermutation signatures within each sample, identified 8 of 45 MM samples which contained 5 or more clones in the primary clonal lineage, with one case containing a lineage with 23 clones. Two MM samples showed no somatic hypermutation as measured using the FR3 primers contained in the BCR pan-clonality assay.
These samples were also evaluated using an FR1-J targeted NGS assay, which confirmed relatively low mutation rates for these MM samples at 0.44% and 1.3%, respectively. Conclusions: These results demonstrate the utility of a novel assay for combined repertoire analysis of B cell receptor heavy and light chains in a single library preparation reaction. We expect this assay to simplify laboratory workflows, and the inclusion of analysis tools such as automated somatic hypermutation rate calculation and clonal lineage identification may open new paths for research in lymphoid cell disorders. For research use only. Disclosures: Lowman: Thermo Fisher Scientific: Current Employment. Toro: Thermo Fisher Scientific: Current Employment. Pickle: Thermo Fisher Scientific: Current Employment. Ostresh: Thermo Fisher Scientific: Current Employment. Sarda: Thermo Fisher Scientific: Current Employment. Yang: Thermo Fisher Scientific: Current Employment.


Author(s):  
Larry Svenson

Background: The Province of Alberta, Canada, maintains a mature data environment with linkable administrative and clinical data dating back up to 30 years. Alberta has a single payer, publicly funded and administered, universal health system, which maintains multiple administrative data sets. Main Aim: The main aim of the strategy is to fully maximize the data assets in the province to drive health system innovation, with a focus on improving health outcomes and quality of life. Methods/Approach: The Alberta Ministry of Health has created the Secondary Use Data Access (SUDA) initiative to leverage its administrative health data. SUDA envisions strengthening partnerships between the public and private sectors through two main data access approaches. The first is direct access to de-identified data held within the Alberta Health data warehouse by key health system stakeholders (e.g. academic institutions, professional associations, regulatory colleges). The second is indirect access for private and not-for-profit organizations, using a data access safe haven (DASH) approach. Indirect access is achieved through private sector investments to a trusted third party that hires analysts placed within the Ministry of Health offices. Results: Staffing agreements and privacy impact assessments are in place. Indirect access includes a multiple stakeholder steering committee to vet and prioritize projects. Private and not-for-profit stakeholders do not have access to raw data, but rather receive access to aggregated data and statistical models. All data disclosures are done by Ministry staff to ensure compliance with Alberta's Health Information Act. Direct access has been established for one professional organization and one academic institution, with access restricted to de-identified data. Conclusion: The Secondary Use Data Access initiative uses a safe haven approach to leveraging data to provide a more secure approach to data access.
It reduces the need to provision data outside of the data warehouse while improving timely access to data. The approach provides assurances that people's health information is held secure, while also being used to create health system improvements.


2020 ◽  
Vol 245 ◽  
pp. 06042
Author(s):  
Oliver Gutsche ◽  
Igor Mandrichenko

A columnar data representation is known to be an efficient way to store data, specifically in cases when the analysis is often done based only on a small fragment of the available data structures. A data representation like Apache Parquet is a step forward from a columnar representation, which splits data horizontally to allow for easy parallelization of data analysis. Based on the general idea of columnar data storage, working on the [LDRD Project], we have developed a striped data representation, which, we believe, is better suited to the needs of High Energy Physics data analysis. A traditional columnar approach allows for efficient data analysis of complex structures. While keeping all the benefits of columnar data representations, the striped mechanism goes further by enabling easy parallelization of computations without requiring special hardware. We will present an implementation and some performance characteristics of such a data representation mechanism using a distributed NoSQL database or a local file system, unified under the same API and data representation model. The representation is efficient and at the same time simple, so that it allows for a common data model and APIs for a wide range of underlying storage mechanisms such as distributed NoSQL databases and local file systems. Striped storage adopts NumPy arrays as its basic data representation format, which makes it easy and efficient to use in Python applications. The Striped Data Server is a web service that hides the server implementation details from the end user, easily exposes data to WAN users, and makes it possible to use well-known and mature data caching solutions to further increase data access efficiency. We are considering the Striped Data Server as the core of an enterprise-scale data analysis platform for High Energy Physics and similar areas of data processing.
We have been testing this architecture with a 2TB dataset from a CMS dark matter search and plan to expand it to multiple 100 TB or even PB scale. We will present the striped format, Striped Data Server architecture and performance test results.
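
The striping idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (make_stripes, count_selected, parallel_count), the column names "pt" and "eta", and the sample values are all hypothetical, and the standard-library array type stands in for the NumPy arrays the abstract mentions so that the example has no dependencies. Each column is cut into horizontal stripes, and a query that needs only one column is evaluated stripe by stripe in parallel.

```python
from array import array
from concurrent.futures import ThreadPoolExecutor


def make_stripes(columns, stripe_size):
    """Cut every column into horizontal stripes of at most stripe_size rows.

    Hypothetical helper: returns a list of dicts, one per stripe, each
    mapping a column name to that column's slice for the stripe.
    """
    n_rows = len(next(iter(columns.values())))
    stripes = []
    for start in range(0, n_rows, stripe_size):
        stop = start + stripe_size
        stripes.append({name: col[start:stop] for name, col in columns.items()})
    return stripes


def count_selected(stripe, threshold):
    """Count rows in one stripe whose 'pt' value exceeds the threshold.

    Only the 'pt' column is read, which is the benefit of a columnar layout.
    """
    return sum(1 for value in stripe["pt"] if value > threshold)


def parallel_count(stripes, threshold):
    """Evaluate the selection on all stripes in parallel and sum the partial counts."""
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(lambda s: count_selected(s, threshold), stripes))


# Illustrative data only (assumed values, not from the paper).
columns = {
    "pt": array("d", [10.0, 25.0, 40.0, 5.0, 60.0, 31.0, 12.0]),
    "eta": array("d", [0.1, -1.2, 2.0, 0.5, -0.3, 1.1, 0.0]),
}
stripes = make_stripes(columns, stripe_size=3)
total = parallel_count(stripes, threshold=30.0)
print(len(stripes), total)
```

Because each stripe is self-contained, the same partial-count function could in principle run on separate workers or servers; the thread pool here is only the simplest stand-in for that.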


2018 ◽  
Author(s):  
Kimberly Megan Scott ◽  
Melissa Kline

As more researchers make their datasets openly available, the potential of secondary data analysis to address new questions increases. However, the distinction between primary and secondary data analysis is unnecessarily confounded with the distinction between confirmatory and exploratory research. We propose a framework, akin to library book checkout records, for logging access to datasets in order to support confirmatory analysis where appropriate. This would support a standard form of preregistration for secondary data analysis, allowing authors to demonstrate that their plans were registered prior to data access. We discuss the critical elements of such a system, its strengths and limitations, and potential extensions.
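
A checkout-style access log of the kind proposed here could be sketched as follows. This is an assumption-laden illustration, not the system the authors describe: the AccessLog class, the is_confirmatory function, and the user and data set names are hypothetical. The log records each access with a timestamp, and an analysis counts as confirmatory only if the plan was registered before the user's first logged access to that data set.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessLog:
    """Append-only record of (user, dataset, timestamp) checkouts (hypothetical design)."""

    entries: list = field(default_factory=list)

    def record(self, user, dataset, when):
        """Log that a user accessed a data set at a given time."""
        self.entries.append((user, dataset, when))

    def first_access(self, user, dataset):
        """Return the earliest logged access, or None if the user never accessed the data set."""
        times = [t for u, d, t in self.entries if u == user and d == dataset]
        return min(times) if times else None


def is_confirmatory(log, user, dataset, preregistered_at):
    """True if the plan was registered before the user's first logged access."""
    first = log.first_access(user, dataset)
    return first is None or preregistered_at < first


def utc(year, month, day):
    """Small helper for timezone-aware timestamps."""
    return datetime(year, month, day, tzinfo=timezone.utc)


log = AccessLog()
log.record("alice", "ds1", utc(2019, 3, 1))  # accessed after registering
log.record("bob", "ds1", utc(2019, 1, 15))   # accessed before registering

alice_ok = is_confirmatory(log, "alice", "ds1", preregistered_at=utc(2019, 2, 1))
bob_ok = is_confirmatory(log, "bob", "ds1", preregistered_at=utc(2019, 2, 1))
print(alice_ok, bob_ok)
```

The sketch treats a user with no logged access as eligible for confirmatory analysis; a real system would also need the log itself to be tamper-evident, which is outside the scope of this example.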


2020 ◽  
Vol 21 (5) ◽  
pp. 1840 ◽  
Author(s):  
Cristina A. Martinez ◽  
Manuel Alvarez-Rodriguez ◽  
Dominic Wright ◽  
Heriberto Rodriguez-Martinez

Spermatozoa need to undergo a series of biochemical changes termed capacitation in order to fertilize. In vivo, capacitation is sequentially achieved during sperm transport and interaction with the female genital tract, by mechanisms yet undisclosed in detail. However, when boar spermatozoa are stored in the tubal reservoir pre-ovulation, most appear to be in a non-capacitated state. This study aimed at deciphering the transcriptomics of capacitation-related genes in the pig pre-ovulatory oviduct, following the entry of semen or of sperm-free seminal plasma (SP). Ex-vivo samples of the utero-tubal junction (UTJ) and isthmus were examined with a microarray chip (GeneChip® Porcine Gene 1.0 ST Array, Thermo Fisher Scientific) followed by bioinformatics for enrichment analysis of functional categories (GO terms) and restrictive statistics. The results confirmed that entry of semen or of relative amounts of sperm-free SP modifies gene expression of these segments, pre-ovulation. They further show that enriched genes are differentially associated with pathways relating to sperm motility, acrosome reaction, single fertilization, and the regulation of signal transduction GO terms. In particular, the pre-ovulation oviduct stimulates the Catsper channels for sperm Ca2+ influx, with AKAPs, CATSPERs, and CABYR genes being positive regulators while PKIs and CRISP1 genes appear to be inhibitors of the process. We postulate that the stimulation of PKIs and CRISP1 genes in the pre-ovulation sperm reservoir/adjacent isthmus, mediated by SP, acts to prevent premature massive capacitation prior to ovulation.


2019 ◽  
Vol 2 (1) ◽  
pp. 45-54 ◽  
Author(s):  
Kimberly M. Scott ◽  
Melissa Kline

As more researchers make their data sets openly available, the potential of secondary data analysis to address new questions increases. However, the distinction between primary and secondary data analysis is unnecessarily confounded with the distinction between confirmatory and exploratory research. We propose a framework, akin to library-book checkout records, for logging access to data sets in order to support confirmatory analysis when appropriate. This system would support a standard form of preregistration for secondary data analysis, allowing authors to demonstrate that their plans were registered prior to data access. We discuss the critical elements of such a system, its strengths and limitations, and potential extensions.

