User Action Sequence Modeling: From Attention to Transformers and Beyond | by Samuel Flender | Jul, 2024


The quest to LLM-ify recommender systems


User action sequences are among the most powerful inputs in recommender systems: your next click, read, watch, play, or purchase is likely at least somewhat related to what you’ve clicked on, read, watched, played, or purchased minutes, hours, days, months, or even years ago.

Historically, the status quo for modeling such user engagement sequences has been pooling: for example, a classic 2016 YouTube paper describes a system that takes the latest 50 watched videos, collects their embeddings from an embedding table, and pools these into a single feature vector with sum pooling. To save memory, the embedding table for these sequence videos is shared with the embedding table for candidate videos themselves.

YouTube’s recommender system sum-pools the sequence of watched videos for a user. Covington et al., 2016
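To make the idea concrete, here is a minimal sketch of that sum-pooling step in NumPy. All names, the vocabulary size, and the embedding dimension are illustrative assumptions, not YouTube's actual code; the two properties it mirrors from the paper are the shared embedding table and the fixed-length history window of 50 watches.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 1000  # number of videos in the catalog (hypothetical)
EMBED_DIM = 16     # embedding dimension (hypothetical)

# A single embedding table, shared between watch-history videos and
# candidate videos, as described in Covington et al., 2016.
embedding_table = rng.normal(size=(VOCAB_SIZE, EMBED_DIM))

def pool_watch_history(video_ids: list[int]) -> np.ndarray:
    """Look up embeddings for the latest 50 watched videos and
    sum-pool them into a single fixed-size feature vector."""
    history = video_ids[-50:]                    # keep the latest 50 watches
    return embedding_table[history].sum(axis=0)  # shape: (EMBED_DIM,)

user_history = [3, 17, 942, 17, 55]
feature = pool_watch_history(user_history)
print(feature.shape)  # (16,)
```

Note that the output is a single fixed-size vector regardless of history length, which is exactly what makes pooling convenient as a model input, and also what throws away the ordering information.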

This simplistic approach corresponds roughly to a bag-of-words model in the NLP domain: it works, but it’s far from ideal. Pooling ignores the sequential order of the inputs, ignores how relevant each item in the user history is to the candidate item we need to rank, and discards all temporal information: an…
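The relevance limitation above is what target-aware attention addresses: instead of every history item contributing equally, each item is weighted by its similarity to the candidate being ranked. A minimal sketch of this idea (in the spirit of attention-based rankers; the function names and dot-product scoring are illustrative assumptions, not a specific production system):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(history_embs: np.ndarray, candidate_emb: np.ndarray) -> np.ndarray:
    """Pool a user's history, weighting each item by its relevance
    to the candidate item.

    history_embs: (seq_len, dim) embeddings of watched items
    candidate_emb: (dim,) embedding of the item being ranked
    """
    scores = history_embs @ candidate_emb  # dot-product relevance per history item
    weights = softmax(scores)              # normalize to a distribution, (seq_len,)
    return weights @ history_embs          # weighted pooling -> (dim,)

rng = np.random.default_rng(0)
history = rng.normal(size=(5, 16))   # 5 watched items (hypothetical)
candidate = rng.normal(size=(16,))   # one candidate to score
pooled = attention_pool(history, candidate)
print(pooled.shape)  # (16,)
```

Unlike sum pooling, the pooled vector now changes depending on which candidate we are ranking: items similar to the candidate dominate the summary, which is the key mechanism behind attention-based user sequence models.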


