Can Pretrained Models Really Learn Better Molecular Representations for AI-Aided Drug Discovery?

Bibliographic Details
Title: Can Pretrained Models Really Learn Better Molecular Representations for AI-Aided Drug Discovery?
Authors: Zhang, Ziqiao; Bian, Yatao; Xie, Ailin; Han, Pengju; Zhou, Shuigeng
Source: Journal of Chemical Information and Modeling; April 2024, Vol. 64, Issue 7, pp. 2921-2930 (10 pp.)
Abstract: Self-supervised pretrained models are increasingly popular in AI-aided drug discovery, and a growing number of such models promise to extract better feature representations for molecules. Yet the quality of the learned representations has not been fully explored. In this work, inspired by the two phenomena of Activity Cliffs (ACs) and Scaffold Hopping (SH) in traditional Quantitative Structure–Activity Relationship analysis, we propose a method named Representation-Property Relationship Analysis (RePRA) to evaluate the quality of the representations extracted by a pretrained model and to visualize the relationship between the representations and properties. The concepts of ACs and SH are generalized from the structure–activity context to the representation-property context, and the underlying principles of RePRA are analyzed theoretically. Two scores are designed to measure the generalized ACs and SH detected by RePRA, so that the quality of representations can be evaluated. In experiments, representations of molecules from 10 target tasks generated by 7 pretrained models are analyzed. The results indicate that state-of-the-art pretrained models can overcome some shortcomings of canonical Extended-Connectivity FingerPrints, while the correlation between the basis of the representation space and specific molecular substructures is not explicit. Thus, some representations could be even worse than the canonical fingerprints. Our method enables researchers to evaluate the quality of molecular representations generated by their proposed self-supervised pretrained models. Our findings can guide the community to develop better pretraining techniques to regularize the occurrence of ACs and SH.
Database: Supplemental Index
Description
ISSN: 1549-9596
1549-960X
DOI: 10.1021/acs.jcim.3c01707