We’ve already done a lot to lengthen those context windows, and yet we still have to cache because of the economics of attention. We may spend more time on this branch of the tech tree because caching is baked into the inference stack, but when we let go we will be free.
One under-discussed takeaway from this piece is that, from what I understand, RLMs and other large context mgmt techniques (e.g., pruning, re-hydration) completely break this.

If caching is king then RLMs may face much worse economics vs. simply lengthening the context window. https://t.co/sg4V7CP9o6

— corsaren (@corsaren) February 23, 2026
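To put toy numbers on why "caching is king": here is a minimal back-of-the-envelope sketch, not a benchmark. The prices, the 10x cache discount, and the `conversation_cost` helper are made-up illustration values, not any provider's actual rates. The point it shows: when context only grows, each turn's prefix is a cache hit and only the new tokens are processed at full price; when a pruning pass or an RLM rewrites the context, the prefix no longer matches the cache and the whole window is re-processed at the uncached rate.

```python
# Rough cost sketch only. Prices and the 10x cache discount are assumptions
# for illustration; real provider pricing and cache behavior vary.

UNCACHED_PRICE = 1.0   # cost units per 1K input tokens processed from scratch
CACHED_PRICE = 0.1     # cost units per 1K input tokens served from a prefix cache


def conversation_cost(turns: int, tokens_per_turn: int, rewrite_context: bool) -> float:
    """Input-token cost over a multi-turn conversation (hypothetical model).

    rewrite_context=False: the context only grows, so every turn's existing
    prefix is a cache hit and only the newly added tokens cost full price.
    rewrite_context=True:  the context is edited each turn (pruning,
    re-hydration, an RLM rewriting its own window), so the prefix no longer
    matches the cache and the entire context is re-processed at full price.
    """
    total = 0.0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn
        if rewrite_context:
            # Cache miss on the whole (edited) context.
            total += context / 1000 * UNCACHED_PRICE
        else:
            # Prefix cache hit on everything before this turn's new tokens.
            cached = context - tokens_per_turn
            total += cached / 1000 * CACHED_PRICE
            total += tokens_per_turn / 1000 * UNCACHED_PRICE
    return total


if __name__ == "__main__":
    grow = conversation_cost(turns=50, tokens_per_turn=2000, rewrite_context=False)
    edit = conversation_cost(turns=50, tokens_per_turn=2000, rewrite_context=True)
    print(f"append-only context (caching works): {grow:.0f} cost units")
    print(f"rewritten context (cache broken):    {edit:.0f} cost units")
```

Under these assumed numbers the append-only conversation costs roughly 345 units versus roughly 2550 when the context is rewritten every turn, about a 7x gap, which is the economic pressure the tweet is pointing at.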