Meta-Learned Memory

Automatically discovering how LLMs should manage their context window

Updated 2026-02-10

How an LLM uses its context window matters a lot: the difference between a good and a bad context management strategy can be as large as 6x on downstream tasks¹. Right now, these strategies are hand-designed. We’re trying to automate that.

The idea

We represent context management strategies as executable Python programs: each program decides what to store, how to retrieve it, and how to format it for the model. Then we search over the space of programs using LLM-guided evolution. The search loop maintains a population of strategy programs, evaluates them on a task suite, and uses an LLM to propose mutations informed by execution logs. The result is a fully automatic pipeline that discovers context strategies without human design effort.
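To make this concrete, here is a minimal sketch of what a strategy program and the search loop could look like. All names here (`Strategy`, `evaluate`, `llm_propose_mutation`, `run_logs`) are illustrative stand-ins, not the actual pipeline; the evaluation and mutation functions are stubbed so the sketch runs:

```python
import random

# Illustrative sketch, not the real implementation.

class Strategy:
    """A context management strategy as an executable program."""

    def store(self, memory: list[str], event: str) -> list[str]:
        # Decide what to keep after each new observation.
        return memory + [event]

    def retrieve(self, memory: list[str], query: str) -> list[str]:
        # Decide which stored items are relevant to the current query.
        return memory[-10:]  # naive seed strategy: the 10 most recent items

    def render(self, items: list[str]) -> str:
        # Decide how retrieved items are formatted for the model.
        return "\n".join(items)

def evaluate(strategy_src: str) -> float:
    """Stand-in: execute the strategy program on the task suite and
    return average accuracy. Random here so the sketch runs."""
    return random.random()

def run_logs(strategy_src: str) -> str:
    """Stand-in for execution traces collected during evaluation."""
    return "(execution logs)"

def llm_propose_mutation(parent_src: str, logs: str) -> str:
    """Stand-in: prompt an LLM with the parent program's source and its
    execution logs, asking for an edited program. Identity here."""
    return parent_src

def search(seeds: list[str], generations: int = 100, pop_size: int = 16):
    # Maintain a population of (program source, score) pairs.
    population = [(src, evaluate(src)) for src in seeds]
    for _ in range(generations):
        # Tournament selection: mutate a strong parent.
        parent_src, _ = max(random.sample(population, k=min(4, len(population))),
                            key=lambda p: p[1])
        child_src = llm_propose_mutation(parent_src, logs=run_logs(parent_src))
        population.append((child_src, evaluate(child_src)))
        # Keep only the fittest programs.
        population = sorted(population, key=lambda p: p[1], reverse=True)[:pop_size]
    return population[0]
```

Because the unit of variation is program source rather than a fixed set of knobs, the LLM mutator can restructure a strategy entirely, not just retune its parameters.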

Results

[Main results table]
Reference baselines (top) vs. meta-learned strategies (bottom). The best strategy reaches 55.8% average accuracy, 8.3 points above the hand-designed ACE baseline, while using 3x fewer context characters.

What we found

  • The search discovers qualitatively distinct strategies. Across runs, 7 different strategy families emerged, not just minor variants of each other. Some resemble known approaches; others are genuinely novel.
  • Novel strategies include an online bandit over retrieval methods that dynamically selects between different retrieval functions based on recent reward, and a dual-pool MMR diversity scheme that maintains separate memory pools and merges them using maximal marginal relevance. A sketch of both mechanisms follows this list.
  • Less context can be better. The top strategies aggressively compress and filter, suggesting that current models are hurt more by irrelevant context than by missing information.
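The sketch below shows a minimal form of the two novel mechanisms above. Everything here is our illustrative guess at the idea, not the discovered programs themselves: the epsilon-greedy selection rule, the reward window, and the 0.7 relevance/diversity trade-off are assumptions.

```python
import math
import random

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (na * nb)

class RetrievalBandit:
    """Epsilon-greedy bandit over retrieval functions, scored by
    recent reward (e.g., downstream task success)."""

    def __init__(self, arms: dict, epsilon: float = 0.1, window: int = 50):
        self.arms = arms            # name -> retrieval function
        self.epsilon = epsilon
        self.window = window
        self.history = {name: [] for name in arms}  # recent rewards per arm

    def select(self) -> str:
        if random.random() < self.epsilon:
            return random.choice(list(self.arms))  # explore
        def recent_mean(name: str) -> float:
            rs = self.history[name][-self.window:]
            return sum(rs) / len(rs) if rs else float("inf")  # try unused arms first
        return max(self.arms, key=recent_mean)  # exploit the best recent arm

    def update(self, name: str, reward: float) -> None:
        self.history[name].append(reward)

def mmr_merge(query_vec: list[float], pools: list, k: int = 8, lam: float = 0.7):
    """Dual-pool MMR merge: `pools` is a list of memory pools, each a
    list of (text, embedding) pairs. Greedily picks items relevant to
    the query but dissimilar to what is already selected."""
    candidates = [item for pool in pools for item in pool]
    selected = []
    while candidates and len(selected) < k:
        def score(item):
            _, emb = item
            relevance = cosine(query_vec, emb)
            redundancy = max((cosine(emb, s[1]) for s in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [text for text, _ in selected]
```

The bandit keeps per-arm reward histories and exploits whichever retrieval function has been paying off lately; the MMR merge trades relevance against redundancy across the two pools, which matches the finding that filtering out near-duplicate context helps more than adding coverage.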
Links

  • Paper (coming soon)
  • Code
¹ Measured on the SWE-bench Verified suite across 64 agent configurations. See the full benchmark data for details.