Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships.

Bibliographic Details
Title:	Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships.
Authors:	Huber, Florian¹ (AUTHOR) f.huber@esciencecenter.nl, Ridder, Lars¹ (AUTHOR), Verhoeven, Stefan¹ (AUTHOR), Spaaks, Jurriaan H.¹ (AUTHOR), Diblen, Faruk¹ (AUTHOR), Rogers, Simon² (AUTHOR), van der Hooft, Justin J. J.³ (AUTHOR)
Source:	PLoS Computational Biology. 2/16/2021, Vol. 17 Issue 2, p1-18. 18p. 1 Diagram, 5 Graphs.
Subject Terms:	TANDEM mass spectrometry, MASS spectrometry, BIG data, STATISTICAL reliability, *TASK analysis
Abstract:	Spectral similarity is used as a proxy for structural similarity in many tandem mass spectrometry (MS/MS) based metabolomics analyses such as library matching and molecular networking. Although weaknesses in the relationship between spectral similarity scores and the true structural similarities have been described, little development of alternative scores has been undertaken. Here, we introduce Spec2Vec, a novel spectral similarity score inspired by a natural language processing algorithm, Word2Vec. Spec2Vec learns fragmental relationships within a large set of spectral data to derive abstract spectral embeddings that can be used to assess spectral similarities. Using data derived from GNPS MS/MS libraries including spectra for nearly 13,000 unique molecules, we show how Spec2Vec scores correlate better with structural similarity than cosine-based scores. We demonstrate the advantages of Spec2Vec in library matching and molecular networking. Spec2Vec is computationally more scalable, allowing structural analogue searches in large databases within seconds. Author summary: Most metabolomics analyses rely upon matching observed fragmentation mass spectra to library spectra for structural annotation or compare spectra with each other through network analysis. As a key part of such processes, scoring functions are used to assess the similarity between pairs of fragment spectra. No studies have so far proposed scores fundamentally different to the popular cosine-based similarity score, despite the fact that its limitations are well understood. We propose a novel spectral similarity score known as Spec2Vec which adapts algorithms from natural language processing to learn relationships between peaks from co-occurrences across large spectral datasets.
We find that similarities computed with Spec2Vec i) correlate better with structural similarity than cosine-based scores, ii) subsequently give better performance in library matching tasks, and iii) are computationally more scalable than cosine-based scores. Given the central place of similarity scoring in key metabolomics analysis tasks such as library matching and spectral networking, we expect Spec2Vec to make a broad impact in all fields that rely upon untargeted metabolomics. [ABSTRACT FROM AUTHOR]
Database:	Academic Search Index

Description
ISSN:	1553734X
DOI:	10.1371/journal.pcbi.1008724