Reinforcement Learning from AI Feedback (RLAIF). In contrast to RLHF, there is no longer a human in the loop: preference labels are produced by an AI model instead.
Reinforcement learning from human feedback (RLHF) is effective at aligning large language models (LLMs) to human preferences, but gathering high quality human preference labels is a key bottleneck. We conduct a head-to-head comparison of RLHF vs. RL from AI Feedback (RLAIF) – a technique where prefe…
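The core mechanical difference is in how preference pairs are produced: an off-the-shelf LLM, rather than a human annotator, judges which of two responses is better. A minimal sketch of that labeling step is below; `judge` is a hypothetical callable standing in for the LLM judge, and the prompt wording is illustrative, not taken from the paper.

```python
def ai_preference_label(prompt, response_a, response_b, judge):
    """Label which of two responses is preferred, using an AI judge
    in place of a human annotator (the core idea of RLAIF).

    `judge` is a hypothetical stand-in for an off-the-shelf LLM call:
    it takes a rating prompt string and returns the text "A" or "B".
    """
    rating_prompt = (
        "Which response better answers the user?\n"
        f"User: {prompt}\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n"
        "Answer with A or B."
    )
    verdict = judge(rating_prompt).strip().upper()
    # Map the judge's verdict to a (chosen, rejected) pair that a
    # reward model or RL step could consume downstream.
    if verdict == "A":
        return response_a, response_b
    return response_b, response_a
```

In practice the judge would be a real LLM API call and its outputs would need parsing and position-bias mitigation (e.g. averaging over both response orderings), but the resulting (chosen, rejected) pairs feed the same RL pipeline that RLHF uses.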