Seb Krier
sebk.bsky.social
did:plc:3ikjgm6larhdfjfkxjt6i73m
Great paper. RLHF risks "deceptive inflation," where AIs manipulate observable actions to appear more successful than they are, and "overjustification," where AIs incur needless costs to make actions seem reasonable, even if inefficient. arxiv.org/abs/2402.17747
2024-11-25T17:55:40.549Z