Dmytro Mishkin
ducha-aiki.bsky.social
did:plc:tabffuudc5r2igv3xhuusvxz
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
Hongzhi Huang, Defa Zhu, Banggu Wu, Yutao Zeng, Ya Wang, Qiyang Min, Xun Zhou
tl;dr: increasing the input vocabulary is always good; increasing the output vocabulary is good for bigger models.
arxiv.org/abs/2501.16975
2025-02-05T15:38:21.255Z