François Fleuret
francois.fleuret.org
did:plc:3x6fjk6uqc5lynzzjecmetzh
- Prenorm: normalization inside the residual blocks, applied before the attention operation and before the FFN
- GQA (Grouped-Query Attention): more Q heads than (K, V) heads, so several query heads share each (K, V) head
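
A minimal pure-Python sketch of the two items above, under stated assumptions: the function names (`layer_norm`, `prenorm_residual`, `kv_head_for`, `gqa_attention`) are hypothetical, the normalization has no learned scale or shift, the attention is non-causal, and consecutive query heads are assumed to share one (K, V) head. It is meant to illustrate the ideas, not to reproduce any particular model's implementation.

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize one token vector to zero mean and unit variance
    # (assumption: no learned scale/shift, to keep the sketch short).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def prenorm_residual(x, sublayer):
    # Prenorm: the sublayer (attention or FFN) sees the normalized input,
    # and its output is added back to the un-normalized x.
    y = sublayer(layer_norm(x))
    return [a + b for a, b in zip(x, y)]

def kv_head_for(q_head, n_q_heads, n_kv_heads):
    # GQA: query heads are split into n_kv_heads groups of equal size;
    # every query head in a group reads the same (K, V) head.
    assert n_q_heads % n_kv_heads == 0
    return q_head // (n_q_heads // n_kv_heads)

def softmax(scores):
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    return [v / z for v in e]

def gqa_attention(q, k, v):
    # q: [n_q_heads][T][d]; k and v: [n_kv_heads][T][d], with n_kv_heads <= n_q_heads.
    n_q, n_kv, d = len(q), len(k), len(q[0][0])
    out = []
    for h in range(n_q):
        g = kv_head_for(h, n_q, n_kv)  # shared (K, V) head for this query head
        head_out = []
        for q_t in q[h]:
            scores = [sum(a * b for a, b in zip(q_t, k_t)) / math.sqrt(d) for k_t in k[g]]
            w = softmax(scores)
            head_out.append([sum(w_i * v_t[j] for w_i, v_t in zip(w, v[g])) for j in range(d)])
        out.append(head_out)
    return out
```

With 8 query heads and 2 (K, V) heads, `kv_head_for` maps heads 0 to 3 onto (K, V) head 0 and heads 4 to 7 onto (K, V) head 1, so the K and V tensors are a quarter of the size they would have with one (K, V) head per query head.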
2025-04-28T06:47:52.580Z