Can We Scale Human Feedback for Complex AI Tasks?

AI Safety Fundamentals: Alignment - Un podcast de BlueDot Impact

Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique for steering large language models (LLMs) toward desired behaviours. However, simple human feedback breaks down for tasks that are too complex for humans to judge accurately at the scale needed to train AI models. Scalable oversight techniques attempt to address this by increasing humans' ability to give feedback on complex tasks. This article briefly recaps some of the challenges faced wi...
