We’ve already done a lot to lengthen those context windows, and yet we still have to cache because of the economics of attention. We may spend more time on this branch of the tech tree because caching is baked into the inference stack, but when we let go we will be free.
One under-discussed takeaway from this piece is that, from what I understand, RLMs and other large context mgmt techniques (e.g., pruning, re-hydration) completely break this.

If caching is king then RLMs may face much worse economics vs. simply lengthening the context window. https://t.co/sg4V7CP9o6

— corsaren (@corsaren) February 23, 2026
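To put toy numbers on why "caching is king": here is a minimal back-of-the-envelope sketch, not a benchmark. The prices, the 10x cache discount, and the `conversation_cost` helper are made-up illustration values, not any provider's actual rates. The point it shows: when context only grows, each turn's prefix is a cache hit and only the new tokens are processed at full price; when a pruning pass or an RLM rewrites the context, the prefix no longer matches the cache and the whole window is re-processed at the uncached rate.

```python
# Rough cost sketch only. Prices and the 10x cache discount are assumptions
# for illustration; real provider pricing and cache behavior vary.

UNCACHED_PRICE = 1.0   # cost units per 1K input tokens processed from scratch
CACHED_PRICE = 0.1     # cost units per 1K input tokens served from a prefix cache


def conversation_cost(turns: int, tokens_per_turn: int, rewrite_context: bool) -> float:
    """Input-token cost over a multi-turn conversation (hypothetical model).

    rewrite_context=False: the context only grows, so every turn's existing
    prefix is a cache hit and only the newly added tokens cost full price.
    rewrite_context=True:  the context is edited each turn (pruning,
    re-hydration, an RLM rewriting its own window), so the prefix no longer
    matches the cache and the entire context is re-processed at full price.
    """
    total = 0.0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn
        if rewrite_context:
            # Cache miss on the whole (edited) context.
            total += context / 1000 * UNCACHED_PRICE
        else:
            # Prefix cache hit on everything before this turn's new tokens.
            cached = context - tokens_per_turn
            total += cached / 1000 * CACHED_PRICE
            total += tokens_per_turn / 1000 * UNCACHED_PRICE
    return total


if __name__ == "__main__":
    grow = conversation_cost(turns=50, tokens_per_turn=2000, rewrite_context=False)
    edit = conversation_cost(turns=50, tokens_per_turn=2000, rewrite_context=True)
    print(f"append-only context (caching works): {grow:.0f} cost units")
    print(f"rewritten context (cache broken):    {edit:.0f} cost units")
```

Under these assumed numbers the append-only conversation costs roughly 345 units versus roughly 2550 when the context is rewritten every turn, about a 7x gap, which is the economic pressure the tweet is pointing at.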