Meta-Learned Memory
Automatically discovering how LLMs should manage their context window
How an LLM uses its context window matters a lot: the difference between a good and a bad context-management strategy can be as much as 6x on downstream tasks [1]. Right now, these strategies are hand-designed. We're trying to automate that.
The idea
We represent context management strategies as executable Python programs: each program decides what to store, how to retrieve it, and how to format it for the model. Then we search over the space of programs using LLM-guided evolution. The search loop maintains a population of strategy programs, evaluates them on a task suite, and uses an LLM to propose mutations informed by execution logs. The result is a fully automatic pipeline that discovers context strategies without human design effort.
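The search loop described above can be sketched as follows. This is an illustrative sketch only: `evaluate` (which scores a program on the task suite and returns a score plus execution logs) and `propose_mutation` (which stands in for an LLM call that rewrites a program given its logs) are hypothetical interfaces, not the actual pipeline's API.

```python
def evolve(seed_programs, evaluate, propose_mutation,
           generations=10, population_size=8):
    """LLM-guided evolutionary search over strategy programs (sketch).

    `evaluate(program)` -> (score, logs) on the task suite.
    `propose_mutation(program, logs)` -> a mutated program, proposed by
    an LLM that has seen the execution logs.
    """
    # Score the initial population once up front.
    population = [(p, *evaluate(p)) for p in seed_programs]
    for _ in range(generations):
        # Keep the best-scoring half of the population as parents.
        population.sort(key=lambda entry: entry[1], reverse=True)
        parents = population[: max(1, population_size // 2)]
        children = []
        for program, score, logs in parents:
            # Ask the LLM for a mutation informed by the execution logs.
            child = propose_mutation(program, logs)
            children.append((child, *evaluate(child)))
        # Parents compete with their children for the next generation.
        population = (parents + children)[:population_size]
    population.sort(key=lambda entry: entry[1], reverse=True)
    return population[0][0]  # best program found
```

In this skeleton the only LLM involvement is `propose_mutation`; everything else is ordinary evolutionary search, which is what makes the pipeline fully automatic once a task suite and a scoring function exist.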
Results
What we found
- The search discovers qualitatively distinct strategies. Across runs, 7 different strategy families emerged, not just minor variants of each other. Some resemble known approaches; others are genuinely novel.
- Novel strategies include an online bandit that selects among retrieval functions based on recent reward, and a dual-pool diversity scheme that maintains two separate memory pools and merges their candidates using maximal marginal relevance (MMR).
- Less context can be better. The top strategies aggressively compress and filter, suggesting that current models are hurt more by irrelevant context than by missing information.
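To make the bandit-over-retrieval idea concrete, here is a minimal epsilon-greedy sketch. The arm interface, reward signal, and hyperparameters are assumptions for illustration, not the exact strategy the search discovered.

```python
import random

class RetrievalBandit:
    """Epsilon-greedy bandit over retrieval functions (sketch).

    Each arm is a retrieval function `fn(query, memory) -> results`.
    `update` feeds back a scalar reward (e.g. downstream task success)
    and the bandit tracks a running mean reward per arm.
    """

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = [0] * len(self.arms)
        self.means = [0.0] * len(self.arms)
        self.rng = random.Random(seed)

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best mean.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.arms))
        return max(range(len(self.arms)), key=lambda i: self.means[i])

    def retrieve(self, query, memory):
        i = self.select()
        return i, self.arms[i](query, memory)

    def update(self, i, reward):
        # Incremental running mean of reward for arm i.
        self.counts[i] += 1
        self.means[i] += (reward - self.means[i]) / self.counts[i]
```

The appeal of this shape is that the choice of retrieval method becomes part of the strategy's state rather than a fixed design decision: if, say, embedding search stops paying off on a task, the bandit drifts toward whichever arm is currently earning reward.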
Links
- Paper (coming soon)
- Code
[1] Measured on the SWE-bench Verified suite across 64 agent configurations. See the full benchmark data for details.