This is a heavily interactive web application, and JavaScript is required. Simple HTML interfaces are possible, but that is not what this is.
Post
Brad Knox
bradknox.bsky.social
did:plc:hk6pafjjdyzxaexgx64bbwp3
RLHF algorithms assume humans generate preferences according to normative models. We propose a new method for model alignment: influence humans to conform to these assumptions through interface design. Good news: it works!
#AI #MachineLearning #RLHF #Alignment (1/n)
2025-01-14T23:51:18.157Z