Eugene Yan
eugeneyan.com
did:plc:ilhs5ksfpze2l5rqgig4ycl6
Evaluating LLM output is hard. For many teams, it's the bottleneck to scaling AI-powered products.
A key mistake is defining eval criteria w/o actually LOOKING AT THE DATA. This leads to irrelevant / unrealistic criteria + lots of wasted effort.
Thus I built AlignEval: https://AlignEval.com
2024-10-31T02:11:35.556Z