Alex Wettig
awettig.bsky.social
did:plc:n7jgqqwe6ohsyxzbcoqmgii7
Our domains also shine a light on which types of content are implicitly upsampled when using quality filters!
💡 FineWeb-Edu, DCLM-fasttext, and our RegMix predictions share similarities (e.g. all upsample Science topics) but also diverge (e.g. DCLM is more balanced across topics)
2025-02-18T12:31:50.648Z