Understanding Foundation Models
What This Chapter Can and Can't Do
Training a foundation model is an incredibly complex and costly process. Those who know how to do this well are likely prevented by confidentiality agreements from disclosing the secret sauce.
With the growing lack of transparency in the training process of foundation models, it's difficult to know all the design decisions that go into making a model. In general, however, differences among foundation models can be traced back to decisions about training data, model architecture and size, and how the models are post-trained to align with human preferences.
The Decisions That Shape a Model
Training Data
Architecture
Size
Post-Training
Why Architecture Still Matters
Given the dominance of the transformer architecture, it might seem that model architecture is less of a choice.
From Capability to Usability
As mentioned in Chapter 1, a model's training process is often divided into pre-training and post-training.
Pre-Training
Post-Training
But what exactly is human preference? How can it be represented in a way that a model can learn? The way a model developer aligns their model has a significant impact on the model's usability, and will be discussed in this chapter.
The Underrated Role of Sampling
While most people understand the impact of training on a model's performance, the impact of sampling is often overlooked. Sampling is how a model chooses an output from all possible options. It is perhaps one of the most underrated concepts in AI.
For this reason, sampling is the section I was most excited to write in this chapter.
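To make the idea of choosing an output from all possible options concrete, here is a minimal sketch in Python. It assumes the model has already produced a raw score (a logit) for each candidate token. The three-token vocabulary, the scores, and the function names are made up for illustration, and temperature-scaled sampling is only one common strategy, shown here as an example rather than as the method any particular model uses.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(tokens, logits):
    """Always choose the highest-scoring token (deterministic)."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return tokens[best]

def sample_pick(tokens, logits, temperature=1.0, rng=random):
    """Choose a token at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical vocabulary and scores, for illustration only.
tokens = ["red", "green", "blue"]
logits = [2.0, 1.0, 0.1]

if __name__ == "__main__":
    print("greedy:", greedy_pick(tokens, logits))
    for t in (0.1, 1.0, 10.0):
        rounded = [round(p, 3) for p in softmax(logits, t)]
        print(f"temperature={t}: probabilities={rounded}, "
              f"sampled={sample_pick(tokens, logits, t)}")
```

Greedy picking returns the same token every time, while sampling can return any token with nonzero probability. In this sketch, a low temperature concentrates probability on the top-scoring token, and a high temperature spreads it more evenly across the options.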
How to Use This Chapter
Feel free to skip any concept that you're confident about. If you encounter a confusing concept later on, you can revisit this chapter.