Dispersion loss counteracts embedding condensation in small language models

38 points | by E-Reverance 14 hours ago

8 comments