[{"data":1,"prerenderedAt":256},["ShallowReactive",2],{"navigation_docs_en":3,"\u002Fen\u002Fai-engineering\u002Funderstanding-foundation-models\u002Fch02":77,"\u002Fen\u002Fai-engineering\u002Funderstanding-foundation-models\u002Fch02-surround":251},[4],{"title":5,"icon":6,"path":7,"stem":8,"children":9,"page":45},"AI Engineering",null,"\u002Fen\u002Fai-engineering","en\u002F1.ai-engineering",[10,46],{"title":11,"icon":12,"path":13,"stem":14,"children":15,"page":45},"Introduction to Building AI Applications with Foundation Models","i-lucide-brain-circuit","\u002Fen\u002Fai-engineering\u002Fintro","en\u002F1.ai-engineering\u002F1.intro",[16,20,25,30,35,40],{"title":11,"path":17,"stem":18,"icon":19},"\u002Fen\u002Fai-engineering\u002Fintro\u002Fch01","en\u002F1.ai-engineering\u002F1.intro\u002Fch01","i-lucide-sparkles",{"title":21,"path":22,"stem":23,"icon":24},"The Rise of AI Engineering","\u002Fen\u002Fai-engineering\u002Fintro\u002Fch011-the-rise-of-ai-engineering","en\u002F1.ai-engineering\u002F1.intro\u002Fch011-the-rise-of-ai-engineering","i-lucide-history",{"title":26,"path":27,"stem":28,"icon":29},"Foundation Model Use Cases","\u002Fen\u002Fai-engineering\u002Fintro\u002Fch012-foundation-model-use-cases","en\u002F1.ai-engineering\u002F1.intro\u002Fch012-foundation-model-use-cases","i-lucide-layout-grid",{"title":31,"path":32,"stem":33,"icon":34},"Planning AI Applications","\u002Fen\u002Fai-engineering\u002Fintro\u002Fch013-planning-ai-applications","en\u002F1.ai-engineering\u002F1.intro\u002Fch013-planning-ai-applications","i-lucide-clipboard-list",{"title":36,"path":37,"stem":38,"icon":39},"The AI Engineering 
Stack","\u002Fen\u002Fai-engineering\u002Fintro\u002Fch014-the-ai-engineering-stack","en\u002F1.ai-engineering\u002F1.intro\u002Fch014-the-ai-engineering-stack","i-lucide-layers",{"title":41,"path":42,"stem":43,"icon":44},"Summary","\u002Fen\u002Fai-engineering\u002Fintro\u002Fch015-summary","en\u002F1.ai-engineering\u002F1.intro\u002Fch015-summary","i-lucide-flag",false,{"title":47,"icon":6,"path":48,"stem":49,"children":50,"page":45},"Understanding Foundation Models","\u002Fen\u002Fai-engineering\u002Funderstanding-foundation-models","en\u002F1.ai-engineering\u002F2.understanding-foundation-models",[51,54,59,64,69,74],{"title":47,"path":52,"stem":53,"icon":12},"\u002Fen\u002Fai-engineering\u002Funderstanding-foundation-models\u002Fch02","en\u002F1.ai-engineering\u002F2.understanding-foundation-models\u002Fch02",{"title":55,"path":56,"stem":57,"icon":58},"Training Data","\u002Fen\u002Fai-engineering\u002Funderstanding-foundation-models\u002Fch02-1-training-data","en\u002F1.ai-engineering\u002F2.understanding-foundation-models\u002Fch02-1-training-data","i-lucide-database",{"title":60,"path":61,"stem":62,"icon":63},"Modeling","\u002Fen\u002Fai-engineering\u002Funderstanding-foundation-models\u002Fch02-2-modeling","en\u002F1.ai-engineering\u002F2.understanding-foundation-models\u002Fch02-2-modeling","i-lucide-network",{"title":65,"path":66,"stem":67,"icon":68},"Post-Training","\u002Fen\u002Fai-engineering\u002Funderstanding-foundation-models\u002Fch02-3-post-training","en\u002F1.ai-engineering\u002F2.understanding-foundation-models\u002Fch02-3-post-training","i-lucide-sliders-horizontal",{"title":70,"path":71,"stem":72,"icon":73},"Sampling","\u002Fen\u002Fai-engineering\u002Funderstanding-foundation-models\u002Fch02-4-sampling","en\u002F1.ai-engineering\u002F2.understanding-foundation-models\u002Fch02-4-sampling","i-lucide-dices",{"title":41,"path":75,"stem":76,"icon":44},"\u002Fen\u002Fai-engineering\u002Funderstanding-foundation-models\u002Fch02-5-summary","en\u002
F1.ai-engineering\u002F2.understanding-foundation-models\u002Fch02-5-summary",{"id":78,"title":47,"body":79,"description":245,"extension":246,"links":6,"meta":247,"navigation":248,"path":52,"seo":249,"stem":53,"__hash__":250},"docs_en\u002Fen\u002F1.ai-engineering\u002F2.understanding-foundation-models\u002Fch02.md",{"type":80,"value":81,"toc":236},"minimark",[82,97,102,106,114,117,121,144,148,152,155,172,176,179,196,204,208,215,223,226,230,233],[83,84,85,89],"u-page-hero",{},[86,87,47],"template",{"v-slot:title":88},"",[86,90,91,92,96],{"v-slot:description":88},"To build applications with foundation models, you first need ",[93,94,95],"strong",{},"foundation models",". While you don't need to know how to develop a model to use it, a high-level understanding will help you decide what model to use and how to adapt it to your needs.",[98,99,101],"h2",{"id":100},"what-this-chapter-can-and-cant-do","What This Chapter Can and Can't Do",[103,104,105],"p",{},"Training a foundation model is an incredibly complex and costly process. Those who know how to do this well are likely prevented by confidentiality agreements from disclosing the secret sauce.",[107,108,109,110,113],"warning",{},"This chapter won't be able to tell you how to build a model to compete with ChatGPT. Instead, it'll focus on ",[93,111,112],{},"design decisions with consequential impact on downstream applications",".",[103,115,116],{},"With the growing lack of transparency in the training process of foundation models, it's difficult to know all the design decisions that go into making a model. 
In general, however, differences in foundation models can be traced back to decisions about training data, model architecture and size, and how they are post-trained to align with human preferences.",[98,118,120],{"id":119},"the-decisions-that-shape-a-model","The Decisions That Shape a Model",[122,123,124,132,136,141],"card-group",{},[125,126,127,128,131],"card",{"icon":58,"title":55},"Since models learn from data, their training data reveals a great deal about their ",[93,129,130],{},"capabilities and limitations",". This chapter begins with how model developers curate training data, focusing on the distribution of training data.",[125,133,135],{"icon":63,"title":134},"Architecture","Given the dominance of the transformer architecture, it might seem that model architecture is less of a choice. You might be wondering what makes the transformer architecture so special that it continues to dominate.",[125,137,140],{"icon":138,"title":139},"i-lucide-maximize","Size","Whenever a new model is released, one of the first things people want to know is its size. This chapter will explore how a model developer might determine the appropriate size for their model.",[125,142,143],{"icon":68,"title":65},"Pre-training makes a model capable, but not necessarily safe or easy to use. 
Post-training aligns the model with human preferences, which has a significant impact on the model's usability.",[145,146,147],"note",{},"Chapter 8 explores dataset engineering techniques in detail, including data quality evaluation and data synthesis.",[98,149,151],{"id":150},"why-architecture-still-matters","Why Architecture Still Matters",[103,153,154],{},"Given the dominance of the transformer architecture, it might seem that model architecture is less of a choice.",[156,157,158,164,168],"accordion",{},[159,160,163],"accordion-item",{"icon":161,"label":162},"i-lucide-circle-help","What makes the transformer architecture so special that it continues to dominate?","This chapter will address this question.",[159,165,167],{"icon":161,"label":166},"How long until another architecture takes over?","This chapter will address this question, too.",[159,169,171],{"icon":161,"label":170},"What might this new architecture look like?","This chapter will also address what a future alternative architecture might look like.",[98,173,175],{"id":174},"from-capability-to-usability","From Capability to Usability",[103,177,178],{},"As mentioned in Chapter 1, a model's training process is often divided into pre-training and post-training.",[122,180,181,189],{},[125,182,184,185,188],{"icon":19,"title":183},"Pre-Training","Pre-training makes a model ",[93,186,187],{},"capable",", but not necessarily safe or easy to use.",[125,190,192,193,113],{"icon":191,"title":65},"i-lucide-user-check","Post-training aims to align the model with ",[93,194,195],{},"human preferences",[103,197,198,199,203],{},"But what exactly is ",[200,201,202],"em",{},"human preference","? How can it be represented in a way that a model can learn? 
The way a model developer aligns their model has a significant impact on the model's usability, and will be discussed in this chapter.",[98,205,207],{"id":206},"the-underrated-role-of-sampling","The Underrated Role of Sampling",[103,209,210,211,214],{},"While most people understand the impact of training on a model's performance, the impact of ",[200,212,213],{},"sampling"," is often overlooked. Sampling is how a model chooses an output from all possible options. It is perhaps one of the most underrated concepts in AI.",[216,217,218,219,222],"tip",{},"Sampling explains many seemingly baffling AI behaviors, including ",[93,220,221],{},"hallucinations and inconsistencies",". Choosing the right sampling strategy can also significantly boost a model's performance with relatively little effort.",[103,224,225],{},"For this reason, sampling is the section that I was the most excited to write about in this chapter.",[98,227,229],{"id":228},"how-to-use-this-chapter","How to Use This Chapter",[145,231,232],{},"Concepts covered in this chapter are fundamental for understanding the rest of the book. However, because these concepts are fundamental, you might already be familiar with them.",[103,234,235],{},"Feel free to skip any concept that you're confident about. 
If you encounter a confusing concept later on, you can revisit this chapter.",{"title":88,"searchDepth":237,"depth":237,"links":238},2,[239,240,241,242,243,244],{"id":100,"depth":237,"text":101},{"id":119,"depth":237,"text":120},{"id":150,"depth":237,"text":151},{"id":174,"depth":237,"text":175},{"id":206,"depth":237,"text":207},{"id":228,"depth":237,"text":229},"A guide to how training data, architecture, size, post-training, and sampling shape foundation model behavior.","md",{},{"icon":12},{"title":47,"description":245},"ZqqcQ2Fgc22Bjz50TxD9joHKkaw7QUCjjeyKL_clKGI",[252,254],{"title":41,"path":42,"stem":43,"description":253,"icon":44,"children":-1},"A recap of how foundation models gave rise to AI engineering, the application patterns enabled, and the framework this book provides.",{"title":55,"path":56,"stem":57,"description":255,"icon":58,"children":-1},"How training data quality, language coverage, and domain coverage shape foundation model capability, cost, and reliability.",1779363441985]