Author here.
I spent the last weekend thinking about continual learning. A lot of people think that we can solve long-term memory and learning in LLMs by simply extending the context length to infinity. I analyse a different perspective that challenges this assumption.
Let me know how you think about this.
Your conclusion touches on this, but I think the brain analogy is stronger than the hardware/software dichotomy.
It is also my very uninformed intuition: https://news.ycombinator.com/item?id=44910353
Also interesting to think about: could a single system be generally intelligent, or is a certain bias actually a strength? Could we have billions of models, each with their own "experience"?
> Let me know how you think about this.
Well, I think of every Large Language Model as if it were a spectacularly faceted diamond.
More along these lines in a recent-ish "thinking in public" attempt by yours truly, a lay programmer, to interpret what an LLM-machine might be.
Riff: LLMs are Software Diamonds
https://www.evalapply.org/posts/llms-are-diamonds/