Orca: Properly Imitating Proprietary LLMs | by Cameron R. Wolfe, Ph.D. | Sep, 2023

Leveraging imitation to create high-quality, open-source LLMs…

(Photo by Thomas Lipke on Unsplash)

As research progresses on large language models (LLMs), one key question remains unanswered: can an existing, high-quality LLM be used to effectively train another LLM? This topic is currently the subject of considerable debate. The recent explosion of open-source imitation models initially suggested that proprietary LLMs like ChatGPT could be replicated easily and at low cost. However, subsequent research concluded that the evaluation of such models was incomplete and misleading, and that these models actually have large gaps in their comprehension. In this overview, we will study work [1] that aims to address the limitations of open-source replicas of proprietary LLMs via a more robust approach. In particular, we will see that imitation learning can be made more effective by curating a larger dataset with more detailed information.

“As these models continue to evolve and become more powerful, an intriguing question arises: Can we use the model itself to supervise its own behavior or that of other AI models?” — from [1]

(from [1])

Before diving into the overview, we will cover a few ideas related to both LLMs and deep learning in general. These concepts are often not described explicitly in the papers that we read. Rather, they are referenced via a citation or assumed to be common knowledge. So, getting a basic grasp of these concepts will make this overview, and the papers it considers, easier to understand.

Instruction Tuning

(from [12])

Instruction tuning was originally proposed in FLAN [12] as a form of training that teaches LLMs to solve language-based tasks in general, rather than one specific task. In particular, this is done by fine-tuning an LLM over sets of “instructions”, or input prompts, including a…
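To make the idea of fine-tuning over "instructions" a bit more concrete, here is a minimal sketch of how one might organize such data as (prompt, target) pairs. The class name, field names, and prompt template below are all hypothetical choices made for illustration; they are not the format used by FLAN [12] or by the work in [1], and the sketch covers only data preparation, not the fine-tuning itself.

```python
from dataclasses import dataclass


@dataclass
class InstructionExample:
    """One supervised example (hypothetical schema, not taken from [12])."""
    instruction: str  # natural-language description of what to do
    input_text: str   # optional input the instruction applies to
    target: str       # the output we want the model to produce


def build_prompt(example: InstructionExample) -> str:
    """Render the instruction (and input, if present) as one prompt string.

    The "Instruction:/Input:/Response:" template is an assumption made
    for this sketch; real instruction datasets use many templates.
    """
    parts = [f"Instruction: {example.instruction}"]
    if example.input_text:
        parts.append(f"Input: {example.input_text}")
    parts.append("Response:")
    return "\n".join(parts)


def to_training_pair(example: InstructionExample) -> tuple[str, str]:
    """Return (prompt, target): the text shown to the model and the text it should learn to produce."""
    return build_prompt(example), example.target


# Two different tasks expressed in the same instruction format.
examples = [
    InstructionExample("Translate the sentence to French.", "The cat sleeps.", "Le chat dort."),
    InstructionExample("Is the review positive or negative?", "I loved this film.", "positive"),
]
pairs = [to_training_pair(e) for e in examples]

for prompt, target in pairs:
    print(prompt)
    print(target)
    print("---")
```

The point of the sketch is that different tasks (translation, sentiment classification) share one input format, which is what lets a single fine-tuning run cover many tasks at once.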
