The History of Open-Source LLMs: Better Base Models (Part Two) | by Cameron R. Wolfe, Ph.D.

How LLaMA, MPT, Falcon, and LLaMA-2 put open-source LLMs on the map…

16 min read

13 hours ago

Open-source research on large language models (LLMs) is incredibly valuable, as it aims to democratize a powerful and influential technology. Although open-source LLMs are now commonly used and widely studied, this area of research saw some initial struggles that were difficult to overcome. Namely, open-source LLMs performed poorly at first and were heavily criticized. Within this overview, we will study a line of research that changed this narrative by making high-performing pre-trained LLMs available to everyone. Given that pre-training a language model is so expensive, the models we will study here are especially impactful. After these high-performing base models were created and released, many people could conduct research using these models at marginal added cost.

“The capabilities of LLMs are remarkable considering the seemingly straightforward nature of the training methodology.” — from [14]

The current series. This overview is part two of a three part series on the history of open-source LLMs. The first part in the series overviewed initial attempts at creating open-source LLMs. Here, we will study the most popular open-source base models (i.e., language models that have been pre-trained but not fine-tuned or aligned) that are currently available. Next time, we will go over how these models can be fine-tuned or aligned to create a variety of useful applications.

In part one of this series, we saw that the early days of research on open-source LLMs resulted in the proposal of several important base models, such as OPT and BLOOM. However, these models were widely considered to perform quite poorly compared to closed-source pre-trained models (e.g., GPT-3). How do we solve this? First, we need to take a deeper look at the LLM training process.

Training pipeline. LLMs are trained in several steps, as shown in the figure below. First, we pre-train the model…

Source link