@valentinapy.bsky.social on Bluesky

Valentina Pyatkin

valentinapy.bsky.social

did:plc:foein5wpdem766qyzttbxiqd

We introduce a training pipeline consisting of SFT, DPO, and a novel method we call Reinforcement Learning with Verifiable Rewards (RLVR). Read the paper for more information: https://allenai.org/papers/tulu-3-report.pdf

2024-11-21T17:29:20.326Z