Steering interpretable language models with concept algebra

33 points | by luulinh90s a day ago

3 comments