MS2DeepScore: a novel deep learning similarity measure to compare tandem mass spectra

2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Florian Huber ◽  
Sven van der Burg ◽  
Justin J. J. van der Hooft ◽  
Lars Ridder

Mass spectrometry data is one of the key sources of information in many workflows in medicine and across the life sciences. Mass fragmentation spectra are generally considered to be characteristic signatures of the chemical compound they originate from, yet the chemical structure itself usually cannot be easily deduced from the spectrum. Often, spectral similarity measures are used as a proxy for structural similarity, but this approach is strongly limited by a generally poor correlation between the two. Here, we propose MS2DeepScore: a novel Siamese neural network to predict the structural similarity between two chemical structures solely based on their MS/MS fragmentation spectra. Using a cleaned dataset of >100,000 mass spectra of about 15,000 unique known compounds, we trained MS2DeepScore to predict structural similarity scores for spectrum pairs with high accuracy. In addition, sampling different model varieties through Monte-Carlo Dropout is used to further improve the predictions and assess the model's prediction uncertainty. On 3,600 spectra of 500 unseen compounds, MS2DeepScore is able to identify highly reliable structural matches and to predict Tanimoto scores for pairs of molecules based on their fragment spectra with a root mean squared error of about 0.15. Furthermore, the prediction uncertainty estimate can be used to select a subset of predictions with a root mean squared error of about 0.1. We also demonstrate that MS2DeepScore outperforms classical spectral similarity measures in retrieving chemically related compound pairs from large mass spectral datasets, thereby illustrating its potential for spectral library matching. Finally, MS2DeepScore can also be used to create chemically meaningful mass spectral embeddings that could be used to cluster large numbers of spectra. Together with the recently introduced unsupervised Spec2Vec metric, we believe that machine learning-supported mass spectral similarity measures have great potential for a range of metabolomics data processing pipelines.
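As a rough illustration of the two quantities named in this abstract, the sketch below computes a Tanimoto score between two molecular fingerprints and the root mean squared error between predicted and reference scores. Fingerprints are represented here as sets of on-bit indices; the function names and the toy values are assumptions made for illustration and are not taken from the MS2DeepScore code.

```python
import math


def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) score between two sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / len(union)


def rmse(predicted, reference):
    """Root mean squared error between two equally long score lists."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Toy fingerprints (hypothetical bit indices).
fp_1 = {1, 4, 7, 9}
fp_2 = {1, 4, 8, 9, 12}
score = tanimoto(fp_1, fp_2)  # 3 shared bits / 6 distinct bits = 0.5

# Toy predicted vs. reference Tanimoto scores for two spectrum pairs.
error = rmse([0.5, 0.8], [0.6, 0.6])
```

In this framing, the reported error of about 0.15 would be the value of `rmse` over all evaluated spectrum pairs, with the reference scores computed from the known structures.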


2012 ◽  
Vol 61 (2) ◽  
pp. 277-290 ◽  
Author(s):  
Ádám Csorba ◽  
Vince Láng ◽  
László Fenyvesi ◽  
Erika Michéli

There is a growing demand today for the development and application of technologies and methods that enable fast, cost-effective, and environmentally friendly soil data collection and evaluation. Reflectance spectroscopy meets these needs; it is based on reflectance measurements in the visible (VIS) and near-infrared (NIR) range (350–2500 nm) of the electromagnetic spectrum. Because the reflectance spectrum recorded from soils is very rich in information, and many soil constituents have a characteristic spectral "fingerprint" in the range studied, a single curve makes it possible to determine a large number of key soil parameters simultaneously. In this paper we present the first steps of a methodological development, founded on reflectance spectroscopy, that aims to determine soil composition. We built and tested predictive models, based on multivariate mathematical-statistical methods (partial least squares regression, PLSR), for estimating the organic carbon and CaCO3 content of soils. Testing the models showed that the procedure gave high R2 values for both soil parameters [R2(organic carbon) = 0.815; R2(CaCO3) = 0.907]. The root mean squared error (RMSE), which indicates the accuracy of the estimate, can be considered moderate for both parameters [RMSE(organic carbon) = 0.467; RMSE(CaCO3) = 3.508], and could be improved significantly by standardizing the reflectance measurement protocols. Based on our investigations, we conclude that the combined use of reflectance spectroscopy and multivariate chemometric procedures yields a fast and cost-effective method of data collection and evaluation.


2021 ◽  
pp. 1-21
Author(s):  
Elsa Arrua-Duarte ◽  
Marta Migoya-Borja ◽  
Igor Barahona ◽  
Lena C. Quilty ◽  
Sakina J. Rizvi ◽  
...  

Objective: The Dimensional Anhedonia Rating Scale (DARS) is a recently validated questionnaire for assessing anhedonia. In this work we aim to study the equivalence between the traditional paper-and-pencil format and the digital format of the DARS. Methods: 69 patients completed the DARS in both paper-based and digital versions. We assessed differences between formats (Wilcoxon test), validity of the scales (Kappa and Intraclass Correlation Coefficients), and reliability (Cronbach's alpha and Guttman's coefficient). We calculated the Comparative Fit Index and the Root Mean Squared Error associated with the proposed one-factor structure. Results: Total scores were higher for the paper-based format. Significant differences between the two formats were found for three items. The weighted Kappa coefficient was approximately 0.40 for most of the items. Internal consistency was greater than 0.94, and the Intraclass Correlation Coefficient was 0.95 for the digital version and 0.94 for the paper-and-pencil version (F = 16.7, p < 0.001). The Comparative Fit Index was 0.97 for both the digital and the paper-and-pencil DARS, and the Root Mean Squared Error was 0.11 for the digital DARS and 0.10 for the paper-and-pencil DARS. Conclusion: The digital DARS is consistent in many respects with the paper-and-pencil questionnaire, but equivalence with this format cannot be assumed without caution.


2018 ◽  
Vol 4 (1) ◽  
pp. 24
Author(s):  
Imam Halimi ◽  
Wahyu Andhyka Kusuma

Stock investment is a familiar idea and a common practice. There are various stock indices in Indonesia, one of which is the Indeks Harga Saham Gabungan (IHSG), known in English as the Indonesia Composite Index, ICI, or IDX Composite. The IHSG is an important parameter to consider when investing, given that it is a composite of stocks. This study aims to predict the movement of the IHSG with data mining techniques using a neural network algorithm, compared against a linear regression algorithm, so that the results can serve as a reference for investors. The results take the form of Root Mean Squared Error (RMSE) values, together with an additional label holding the predicted figures, obtained after validation with sliding windows validation. The best results reported were, for the neural network algorithm, 37.786 with windowing and 13.597 without windowing, and, for the linear regression algorithm, 35.026 with windowing and 12.657 without windowing. A subsequent T-Test showed that the difference between the neural network and linear regression was not significant, with the same T-Test value of 1.000 both with and without windowing.
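The abstract does not spell out how windowing was configured. As a minimal sketch, assuming that windowing means turning the index series into fixed-length runs of past values, each paired with the next value as the prediction target, it could look like the following. The function name `make_windows` and the toy closing values are hypothetical.

```python
def make_windows(series, window_size):
    """Pair each run of `window_size` past values with the value that follows."""
    if window_size < 1 or window_size >= len(series):
        raise ValueError("window_size must be between 1 and len(series) - 1")
    pairs = []
    for start in range(len(series) - window_size):
        features = list(series[start:start + window_size])
        target = series[start + window_size]
        pairs.append((features, target))
    return pairs


# Toy closing values (made up for illustration, not IHSG data).
closing = [100.0, 102.0, 101.0, 105.0, 107.0]
pairs = make_windows(closing, 3)
# pairs[0] is ([100.0, 102.0, 101.0], 105.0)
# pairs[1] is ([102.0, 101.0, 105.0], 107.0)
```

Each pair can then be fed to a regressor, whether a neural network or linear regression, and the RMSE computed on windows held out from training.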


2014 ◽  
Vol 590 ◽  
pp. 321-325
Author(s):  
Li Chen ◽  
Chang Huan Kou ◽  
Kuan Ting Chen ◽  
Shih Wei Ma

In this study, a two-run genetic programming (GP) method is proposed to estimate the slump flow of high-performance concrete (HPC) from several significant concrete ingredients. GP optimizes functions and their associated coefficients simultaneously and is well suited to discovering relationships in nonlinear systems automatically. Basic-GP usually suffers from premature convergence, fails to reach satisfactory solutions, and performs well only on low-dimensional problems. It was therefore extended with an automatically incremental procedure to improve the search ability and avoid local optima. The results demonstrate that two-run GP generates an accurate formula and achieves a 7.5% improvement in root mean squared error (RMSE) over Basic-GP when predicting the slump flow of HPC.


2020 ◽  
Vol 12 (18) ◽  
pp. 3098
Author(s):  
Jongmin Park ◽  
Barton A. Forman ◽  
Rolf H. Reichle ◽  
Gabrielle De Lannoy ◽  
Saad B. Tarik

L-band brightness temperature (Tb) is one of the key remotely sensed variables that provide information regarding surface soil moisture conditions. In order to harness the information in Tb observations, a radiative transfer model (RTM) is investigated for eventual inclusion into a data assimilation framework. In this study, Tb estimates from the RTM implemented in the NASA Goddard Earth Observing System (GEOS) were evaluated against the nearly four-year record of daily Tb observations collected by L-band radiometers onboard the Aquarius satellite. Statistics between the modeled and observed Tb were computed over North America as a function of soil hydraulic properties and vegetation types. Overall, statistics showed good agreement between the modeled and observed Tb, with a relatively low domain-average bias (0.79 K (ascending) and −2.79 K (descending)), root mean squared error (11.0 K (ascending) and 11.7 K (descending)), and unbiased root mean squared error (8.14 K (ascending) and 8.28 K (descending)). In terms of soil hydraulic parameters, large porosity and large wilting point both lead to high uncertainty in modeled Tb due to the large variability in dielectric constant and surface roughness used by the RTM. The performance of the RTM as a function of vegetation type suggests better agreement in regions with broadleaf deciduous and needleleaf forests, while grassland regions exhibited the worst accuracy among the five vegetation types.
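The three statistics quoted above are related. The sketch below, written under the common definition in which the unbiased RMSE is the RMSE after the mean difference has been removed, computes all three for one modeled and one observed series. The function name and the toy Tb values are assumptions for illustration, not data from the study.

```python
import math


def error_statistics(modeled, observed):
    """Return (bias, RMSE, unbiased RMSE) for two equally long series."""
    if len(modeled) != len(observed) or not modeled:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(modeled)
    diffs = [m - o for m, o in zip(modeled, observed)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    # Remove the mean difference before squaring to get the unbiased RMSE.
    ubrmse = math.sqrt(sum((d - bias) ** 2 for d in diffs) / n)
    return bias, rmse, ubrmse


# Toy brightness temperatures in kelvin (made up for illustration).
modeled_tb = [250.0, 262.0, 271.0]
observed_tb = [248.0, 265.0, 266.0]
bias, rmse, ubrmse = error_statistics(modeled_tb, observed_tb)
```

For a single series the identity RMSE² = bias² + ubRMSE² holds exactly. Domain-averaged values, such as those reported in the abstract, are averages of per-location statistics and need not satisfy it.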


Proceedings ◽  
2020 ◽  
Vol 59 (1) ◽  
pp. 2
Author(s):  
Benoit Figuet ◽  
Raphael Monstein ◽  
Michael Felux

In this paper, we present an aircraft localization solution developed in the context of the Aircraft Localization Competition and applied to the OpenSky Network real-world ADS-B data. The developed solution is based on a combination of machine learning and multilateration using data provided by time-synchronized ground receivers. A gradient boosting regression technique is used to obtain an estimate of the geometric altitude of the aircraft, as well as a first guess of the 2D aircraft position. Then, a triplet-wise and an all-in-view multilateration technique are implemented to obtain an accurate estimate of the aircraft latitude and longitude. A sensitivity analysis of the accuracy as a function of the number of receivers is conducted and used to optimize the proposed solution. The obtained predictions have a 2D root mean squared error below 25 m and a geometric altitude error below 35 m.


Energies ◽  
2019 ◽  
Vol 12 (22) ◽  
pp. 4291 ◽  
Author(s):  
Lu-Tao Zhao ◽  
Guan-Rong Zeng ◽  
Wen-Jing Wang ◽  
Zhi-Gang Zhang

International oil price forecasting is a complex and important issue in energy economics research. In this paper, a new model based on web-based sentiment analysis is proposed. For the oil market, sentiment analysis is used to extract key information from web texts from four perspectives: compound, negative, neutral, and positive sentiment. These are constructed as features and fed into oil price forecasting models together with the oil price itself. Finally, we analyze the effect from several angles and report several findings. The results show that the root mean squared error can be reduced by about 0.2 and the error variance by 0.2, which means that both accuracy and stability improve. Furthermore, we find that the different types of sentiment all improve performance, and by similar amounts. Last but not least, text with strong intensity supports oil price forecasting better than weaker text: with it, the root mean squared error can be reduced by up to 0.5 and the number of bad cases by 20%, indicating that text with strong intensity can correct the original oil price forecast. We believe that our research will play a strong supporting role in future research on using web information for oil price forecasting.


2015 ◽  
Vol 377 ◽  
pp. 719-727 ◽  
Author(s):  
Neha Garg ◽  
Clifford A. Kapono ◽  
Yan Wei Lim ◽  
Nobuhiro Koyama ◽  
Mark J.A. Vermeij ◽  
...  

2013 ◽  
Vol 475-476 ◽  
pp. 978-982 ◽  
Author(s):  
Rui Ping Song ◽  
Bo Wang ◽  
Guo Ming Huang ◽  
Qi Dong Liu ◽  
Rong Jing Hu ◽  
...  

Recommendation systems have achieved widespread success in e-commerce. There are several evaluation metrics for recommender systems, such as accuracy, diversity, computational efficiency, and coverage. Accuracy is one of the most important measurement criteria. In this paper, to improve accuracy, we propose a hybrid recommender algorithm based on an improved similarity method (ISM) that combines demographic recommendation techniques with user-based collaborative filtering (CF). Experiments were performed on the MovieLens dataset to compare the present approach with other classical similarity measures. The Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) values show the superiority of the proposed algorithm.
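For orientation, here is a minimal sketch of the classical user-based collaborative filtering baseline that such work compares against. It is not the improved similarity method (ISM) proposed in the paper, whose details the abstract does not give. It assumes cosine similarity over co-rated items and a similarity-weighted average for prediction; the function names and the toy ratings are hypothetical.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two users over the items both have rated."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(target_user, item, ratings):
    """Similarity-weighted average of other users' ratings for `item`."""
    numerator = 0.0
    denominator = 0.0
    for user, user_ratings in ratings.items():
        if user == target_user or item not in user_ratings:
            continue
        sim = cosine_similarity(ratings[target_user], user_ratings)
        numerator += sim * user_ratings[item]
        denominator += abs(sim)
    # None signals that no other user with nonzero similarity rated the item.
    return numerator / denominator if denominator else None


# Toy ratings: user -> {item: rating} (made up for illustration).
ratings = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 5, "b": 3, "c": 4},
    "u3": {"a": 1, "b": 5, "c": 2},
}
predicted = predict_rating("u1", "c", ratings)
```

Because `u1` agrees with `u2` on every co-rated item, the prediction is pulled toward the rating given by `u2`. MAE and RMSE would then be computed between such predictions and held-out true ratings.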

