🔑ChatGPT is a language model built for interactive, multi-turn dialogue, exchanging messages back and forth with the user.
📚ChatGPT is trained in three stages: generative pre-training, supervised fine-tuning, and reinforcement learning from human feedback.
🎮Language models like ChatGPT can still make mistakes, owing to distributional shift and imperfect reward optimization.
📝To avoid over-optimizing the learned reward, the PPO algorithm is used with an additional term that penalizes the KL divergence between the updated policy and the supervised fine-tuned model.
💡The combination of supervised fine-tuning and reinforcement learning improves the model's performance and makes it more suitable for interactive tasks.
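
The KL-penalized reward mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions, not the actual training code: the function name `kl_penalized_reward`, the coefficient `beta`, its default value, and the per-token log-probability inputs are all hypothetical, and the KL term is approximated by a single-sample estimate (the summed log-probability difference over one response).

```python
def kl_penalized_reward(reward, logp_policy, logp_ref, beta=0.1):
    """Return the reward-model score minus a KL penalty.

    reward      -- scalar score from the reward model (assumed given)
    logp_policy -- per-token log-probs of the response under the RL policy
    logp_ref    -- per-token log-probs under the supervised fine-tuned model
    beta        -- penalty strength (hypothetical value, chosen for illustration)
    """
    # Single-sample KL estimate: sum over tokens of log pi(y|x) - log pi_ref(y|x).
    kl_estimate = sum(lp - lr for lp, lr in zip(logp_policy, logp_ref))
    # The further the policy drifts from the reference, the larger the penalty.
    return reward - beta * kl_estimate


# A response the policy finds more likely than the reference model is penalized.
penalized = kl_penalized_reward(1.0, [-1.0, -2.0], [-1.5, -2.5], beta=0.1)

# If both models assign identical log-probs, the reward is unchanged.
unchanged = kl_penalized_reward(1.0, [-1.0, -2.0], [-1.0, -2.0], beta=0.1)
```

A larger `beta` keeps the policy closer to the supervised model at the cost of less reward improvement; a smaller one allows more drift and therefore more room to exploit flaws in the reward model.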