Debate Update: Obfuscated Arguments Problem
AI Safety Fundamentals: Alignment - A podcast by BlueDot Impact
This is an update on the work on AI Safety via Debate that we previously wrote about here. What we did: We tested the debate protocol introduced in AI Safety via Debate with human judges and debaters. We found various problems and improved the mechanism to fix these issues (details of these are in the appendix). However, we discovered that a dishonest debater can often create arguments that have a fatal error, but where it is very hard to locate the error. We don’t have a fix for th...