An experimentally heavy paper on how to train smaller language models (LMs) while still remaining effective
Musing 24: Pre-training Small Base Language…