In this episode we'll see how OpenAI built the models behind ChatGPT from the decoder part of the Transformer. We'll see how Generative Pre-Trained Transformers are fine-tuned for instruction following in natural language and how they are aligned using RLHF (Reinforcement Learning from Human Feedback).
Series website: https://llm-chronicles.com/
🖹 Download the mindmap for this episode here:
- GPT, Instruction Fine-tuning, RLHF: https://llm-chronicles.com/pdfs/llm-chronicles-5.4_gpt_instruction_finetuning_rlhf.pdf
🕤 Timestamps:
00:35 - Decoder Recap
00:54 - Generative Pre-Trained Transformer
01:38 - Next-Word Prediction
02:47 - GPT-1, GPT-2, GPT-3
03:54 - Emergent Abilities
04:53 - In-Context Learning (zero, one, few shot)
06:59 - Instruction Fine-Tuning (InstructGPT)
09:54 - RLHF (Reinforcement Learning from Human Feedback)
13:04 - Map of LLMs (OpenAI, Google, Anthropic, Meta, Microsoft, Mistral)
References:
- Improving Language Understanding by Generative Pre-Training: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
- GPT (Generative Pre-trained Transformer) – A Comprehensive Review on Enabling Technologies, Potential Applications, Emerging Challenges, and Future Directions: https://arxiv.org/pdf/2305.10435.pdf
- Zero-Shot, One-Shot and Few-Shot Learning with Examples: https://rahulrajpvr7d.medium.com/zero-shot-one-shot-and-few-shot-learning-with-examples-8a3efdcbb158
- The Story of RLHF: Origins, Motivations, Techniques, and Modern Applications: https://cameronrwolfe.substack.com/p/the-story-of-rlhf-origins-motivations