BERT4CTR: An Efficient Framework to Combine Pre-trained Language Model with Non-textual Features for CTR Prediction

Bibliographic Details
Title: BERT4CTR: An Efficient Framework to Combine Pre-trained Language Model with Non-textual Features for CTR Prediction
Authors: Wang, Dong; Salamatian, Kavé; Xia, Yunqing; Deng, Weiwei; Zhang, Qi
Contributors: Microsoft Research Asia; Laboratoire d'Informatique, Systèmes, Traitement de l'Information et de la Connaissance (LISTIC); Université Savoie Mont Blanc (USMB / Université de Savoie / Université de Chambéry)
Source: KDD '23: The 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; https://hal.science/hal-04219746; KDD '23: The 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Aug 2023, Long Beach CA USA, France. pp. 5039-5050, ⟨10.1145/3580305.3599780⟩
Publication Information: HAL CCSD; ACM
Publication Year: 2023
Collection: Université Savoie Mont Blanc: HAL
Subject Terms: CCS Concepts: Information systems → Online advertising; Recommender systems; Language models; Non-textual features; Multi-modal inputs; Pre-trained language model; CTR prediction; Uni-Attention; [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]
Geographic Terms: Long Beach CA USA, France
Description: International audience; Although deep pre-trained language models have shown promising benefits in a large set of industrial scenarios, including Click-Through-Rate (CTR) prediction, integrating pre-trained language models, which handle only textual signals, into a prediction pipeline with non-textual features is challenging. Up to now, two directions have been explored to integrate multi-modal inputs into the fine-tuning of pre-trained language models. The first fuses the output of the language model and the non-textual features through an aggregation layer, resulting in an ensemble framework in which the cross-information between textual and non-textual inputs is learned only in the aggregation layer. The second splits and transforms non-textual features into fine-grained tokens that are fed, along with the textual tokens, directly into the transformer layers of the language model. However, by adding extra tokens, this approach increases the complexity of both learning and inference. In this paper we propose a novel framework, BERT4CTR, that addresses these limitations. The new framework leverages a Uni-Attention mechanism to benefit from the interactions between non-textual and textual features, while keeping training and inference time-costs low through dimensionality reduction. We demonstrate through comprehensive experiments on both public and commercial data that BERT4CTR significantly outperforms the state-of-the-art approaches for handling multi-modal inputs and is applicable to CTR prediction. In comparison with the ensemble framework, BERT4CTR brings more than 0.4% AUC gain on both tested data sets with only a 7% increase in latency.
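The abstract describes the Uni-Attention idea only at a high level. The sketch below is one plausible reading of it, not the paper's reference implementation: non-textual feature embeddings act as low-dimensional queries that cross-attend to the BERT token representations, so textual/non-textual interactions are learned without appending extra tokens to the transformer input. All class and parameter names here (UniAttentionCTR, feat_dim, cat_cardinalities, the mean-pooling and scoring head) are hypothetical assumptions for illustration.

```python
# Minimal sketch, assuming cross-attention from non-textual feature slots to BERT
# token outputs stands in for the paper's Uni-Attention; not the authors' code.
import torch
import torch.nn as nn

class UniAttentionCTR(nn.Module):
    def __init__(self, text_dim=768, num_numeric=8, cat_cardinalities=(100, 50),
                 feat_dim=64, num_heads=4):
        super().__init__()
        # Embed categorical features and project numeric features to a shared, reduced width.
        self.cat_embeds = nn.ModuleList(nn.Embedding(c, feat_dim) for c in cat_cardinalities)
        self.num_proj = nn.Linear(num_numeric, feat_dim)
        # Cross-attention: non-textual embeddings are queries, text tokens are keys/values.
        self.cross_attn = nn.MultiheadAttention(embed_dim=feat_dim, num_heads=num_heads,
                                                kdim=text_dim, vdim=text_dim,
                                                batch_first=True)
        self.scorer = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                    nn.Linear(feat_dim, 1))

    def forward(self, text_hidden, text_mask, numeric_feats, cat_feats):
        # text_hidden: (B, T, text_dim) token outputs of a BERT encoder (frozen or fine-tuned).
        # text_mask:   (B, T) boolean, True where a token is padding.
        # numeric_feats: (B, num_numeric); cat_feats: (B, len(cat_cardinalities)) integer ids.
        cats = [emb(cat_feats[:, i]) for i, emb in enumerate(self.cat_embeds)]
        queries = torch.stack([self.num_proj(numeric_feats)] + cats, dim=1)  # (B, Q, feat_dim)
        fused, _ = self.cross_attn(queries, text_hidden, text_hidden,
                                   key_padding_mask=text_mask)
        # Pool the attended feature slots and predict the click probability.
        logit = self.scorer(fused.mean(dim=1))
        return torch.sigmoid(logit).squeeze(-1)
```

Because the queries live in the reduced feat_dim space and only a handful of feature slots attend to the text, the attention cost stays small relative to feeding extra tokens through every transformer layer, which is the trade-off the abstract attributes to BERT4CTR.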
Document Type: conference object
Language: English
Relation: hal-04219746; https://hal.science/hal-04219746; https://hal.science/hal-04219746/document; https://hal.science/hal-04219746/file/3580305.3599780.pdf
DOI: 10.1145/3580305.3599780
Availability: https://doi.org/10.1145/3580305.3599780
https://hal.science/hal-04219746
https://hal.science/hal-04219746/document
https://hal.science/hal-04219746/file/3580305.3599780.pdf
Rights: info:eu-repo/semantics/OpenAccess
Accession Number: edsbas.438938C7
Database: BASE