User Action Sequence Modeling: From Attention to Transformers and Beyond | by Samuel Flender | Jul, 2024


The quest to LLM-ify recommender systems


User action sequences are among the most powerful inputs in recommender systems: your next click, read, watch, play, or purchase is likely at least somewhat related to what you’ve clicked on, read, watched, played, or purchased minutes, hours, days, months, or even years ago.

Historically, the status quo for modeling such user engagement sequences has been pooling: for example, a classic 2016 YouTube paper describes a system that takes the latest 50 watched videos, collects their embeddings from an embedding table, and pools these into a single feature vector with sum pooling. To save memory, the embedding table for these sequence videos is shared with the embedding table for candidate videos themselves.

YouTube’s recommender system sum-pools the sequence of watched videos for a user. Covington et al., 2016
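To make the idea concrete, here is a minimal sketch of that sum-pooling step in NumPy. All names, the vocabulary size, and the embedding dimension are illustrative assumptions, not YouTube's actual code; the two properties it mirrors from the paper are the shared embedding table and the fixed-length history window of 50 watches.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 1000  # number of videos in the catalog (hypothetical)
EMBED_DIM = 16     # embedding dimension (hypothetical)

# A single embedding table, shared between watch-history videos and
# candidate videos, as described in Covington et al., 2016.
embedding_table = rng.normal(size=(VOCAB_SIZE, EMBED_DIM))

def pool_watch_history(video_ids: list[int]) -> np.ndarray:
    """Look up embeddings for the latest 50 watched videos and
    sum-pool them into a single fixed-size feature vector."""
    history = video_ids[-50:]                    # keep the latest 50 watches
    return embedding_table[history].sum(axis=0)  # shape: (EMBED_DIM,)

user_history = [3, 17, 942, 17, 55]
feature = pool_watch_history(user_history)
print(feature.shape)  # (16,)
```

Note that the output is a single fixed-size vector regardless of history length, which is exactly what makes pooling convenient as a model input, and also what throws away the ordering information.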

This simplistic approach corresponds roughly to a bag-of-words model in the NLP domain: it works, but it’s far from ideal. Pooling ignores the sequential order of the inputs, ignores how relevant each item in the user history is to the candidate item we need to rank, and discards all temporal information: an…
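The relevance limitation above is what target-aware attention addresses: instead of every history item contributing equally, each item is weighted by its similarity to the candidate being ranked. A minimal sketch of this idea (in the spirit of attention-based rankers; the function names and dot-product scoring are illustrative assumptions, not a specific production system):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(history_embs: np.ndarray, candidate_emb: np.ndarray) -> np.ndarray:
    """Pool a user's history, weighting each item by its relevance
    to the candidate item.

    history_embs: (seq_len, dim) embeddings of watched items
    candidate_emb: (dim,) embedding of the item being ranked
    """
    scores = history_embs @ candidate_emb  # dot-product relevance per history item
    weights = softmax(scores)              # normalize to a distribution, (seq_len,)
    return weights @ history_embs          # weighted pooling -> (dim,)

rng = np.random.default_rng(0)
history = rng.normal(size=(5, 16))   # 5 watched items (hypothetical)
candidate = rng.normal(size=(16,))   # one candidate to score
pooled = attention_pool(history, candidate)
print(pooled.shape)  # (16,)
```

Unlike sum pooling, the pooled vector now changes depending on which candidate we are ranking: items similar to the candidate dominate the summary, which is the key mechanism behind attention-based user sequence models.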


