Memory-Efficient Embeddings
by Dr. Robert Kübler, Jan 2024

Creating smaller models with a new kind of embedding layer


When dealing with categorical data, beginners often resort to one-hot encoding. This is fine for a handful of categories, but with thousands or even millions of categories the approach becomes infeasible, for two reasons:

  1. Increased dimensionality: Each category adds another feature, which invites the curse of dimensionality. The data becomes sparser, and the model may suffer from higher computational cost and worse generalization.
  2. Loss of semantics: One-hot encoding treats each category as an independent feature, ignoring any semantic relationships between categories. Meaningful relationships present in the original categorical variable are lost.
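To make the first point concrete, here is a minimal sketch (with hypothetical vocabulary size and item ids) of how one-hot encoding explodes the feature dimension while storing almost nothing but zeros:

```python
import numpy as np

# Hypothetical example: encode 5 items drawn from a vocabulary of
# 1,000,000 categories. One-hot encoding gives each row as many
# columns as there are categories, almost all of them zero.
num_categories = 1_000_000
item_ids = np.array([3, 17, 42, 99, 3])

one_hot = np.zeros((len(item_ids), num_categories), dtype=np.float32)
one_hot[np.arange(len(item_ids)), item_ids] = 1.0

# Each row holds a single 1 among a million zeros: extreme sparsity.
nonzeros_per_row = one_hot.sum(axis=1)  # every entry is 1.0
```

Five rows already occupy 5,000,000 float32 values (about 20 MB), of which exactly five are nonzero.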

These problems arise in natural language processing (many words) and in recommendation systems (many customers and/or articles), and embeddings can overcome them. However, if you have many of these embeddings, the memory requirements of your model can skyrocket to several gigabytes.
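A quick back-of-the-envelope calculation (with hypothetical but realistic numbers) shows where those gigabytes come from: an embedding table is a dense matrix of shape (number of items) × (embedding dimension), stored as 4-byte floats.

```python
# Hypothetical sizes: an embedding table for 10 million items with
# d = 64 dimensions, stored as float32 (4 bytes per value).
num_items = 10_000_000
d = 64
bytes_per_float = 4

table_bytes = num_items * d * bytes_per_float
table_gib = table_bytes / 2**30  # well over 2 GiB for a single table
```

A model with several such tables (one per categorical feature) quickly exceeds the memory of a typical GPU.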

In this article, I want to show you several ways to decrease this memory footprint. One of these ways comes from an interesting paper Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems by Shi et al. We will also do some experiments to see how these methods fare in a rating prediction task.

In short, instead of long, sparse vectors, we want short, dense vectors of some length d: our embeddings. The embedding dimension d is a hyperparameter we can choose freely.
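Mechanically, a plain embedding layer is just a lookup table: a trainable matrix with one row of length d per category, indexed by the integer category id. A minimal sketch with hypothetical sizes:

```python
import numpy as np

# Hypothetical sizes for illustration: 1,000 categories, d = 16.
num_categories = 1_000
d = 16

# The embedding table: one dense row of length d per category.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(num_categories, d)).astype(np.float32)

# "Embedding" a batch of ids is a plain row lookup.
item_ids = np.array([3, 17, 42])
dense_vectors = embedding_table[item_ids]  # shape (3, 16)
```

This table of num_categories × d floats is exactly the memory cost the rest of the article tries to reduce.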
