
I’m a Ph.D. candidate at Stanford CS, advised by Chelsea Finn. My research is supported by OpenAI and KFAS.

My research focuses on operationalizing text as a substrate for learning. As tasks grow more complex, low-bandwidth scalar signals can’t keep up; such tasks require higher-bandwidth feedback that preserves the structure of what went wrong. I develop methods that enable models to extract large amounts of information from direct experience through structured textual feedback, such as natural-language corrections, pairwise comparisons with “why better” explanations, and reasoning traces.

Rather than treating text as throwaway scaffolding, I view it as a persistent store to be optimized, where models accumulate experience at increasing levels of abstraction, much as humans write papers and books. This approach combines parametric models (for inductive biases and in-context understanding) with nonparametric text storage (for persistence and interpretability). Looking forward, I’m focused on scaling these methods to scientific discovery and other open-ended domains that require continual learning over long horizons.
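
To make the idea concrete, here is a toy sketch of such a loop: candidates are compared pairwise, and the natural-language “why better” explanation is appended to a persistent text memory that future generations can condition on. All names here are illustrative, and the toy `generate`/`compare` stand-ins below take the place of what would really be LLM calls; this is a schematic, not the actual method.

```python
import random

def optimize_with_text_memory(generate, compare, n_iters=10, seed=0):
    """Toy text-optimization loop: each iteration draws two candidates,
    keeps the better one, and appends the 'why better' explanation to a
    persistent, human-readable text memory."""
    rng = random.Random(seed)
    memory = []   # nonparametric store: accumulates structured feedback
    best = None
    for _ in range(n_iters):
        a = generate(memory, rng)
        b = generate(memory, rng)
        winner, why_better = compare(a, b)
        memory.append(why_better)            # persist the comparison signal
        if best is None:
            best = winner
        else:
            best, _ = compare(winner, best)  # keep the running best
    return best, memory

# Toy stand-ins: a real system would use an LLM for both, with `generate`
# conditioning on the accumulated memory (ignored here).
def generate(memory, rng):
    return rng.random()

def compare(a, b):
    hi, lo = (a, b) if a >= b else (b, a)
    return hi, f"{hi:.3f} beats {lo:.3f}"
```

The memory returned at the end is itself an interpretable artifact: a log of every comparison’s explanation, which is what distinguishes this from optimizing against a scalar reward.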

Recent papers along these lines:

2025 · ICLR 2026 submission
Operationalizes the core text-optimization loop, accumulating “why better” signals from pairwise comparisons over up to a thousand iterations.

2025 · ICML 2025 workshops: AI for Math, PRAL, ES-FoMo
A hierarchical RL framework that trains LLMs to discover and use textual abstractions for complex reasoning, demonstrating that the information needed to solve such problems can be represented purely in text.

2025 · ICML 2025 Workshop PUT
Test-time alignment by reweighting ensemble members using a small set of labeled examples from the target distribution, adapting without retraining any weights.
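
A minimal sketch of this reweighting idea, under my own assumptions: weight each member by a softmax over its log-likelihood on the small labeled target set, then predict with the weighted mixture. The function names and the softmax rule are illustrative choices, not necessarily the paper’s exact method.

```python
import numpy as np

def reweight_ensemble(member_probs, labels, temperature=1.0):
    """Weight ensemble members by log-likelihood on a small labeled set
    drawn from the target distribution (no weight retraining).

    member_probs: (n_members, n_examples, n_classes) predicted probabilities
    labels: (n_examples,) integer labels from the target distribution
    Returns a (n_members,) weight vector summing to 1.
    """
    n_members, n_examples, _ = member_probs.shape
    # Log-likelihood each member assigns to the observed target labels.
    ll = np.log(member_probs[:, np.arange(n_examples), labels] + 1e-12).sum(axis=1)
    # Softmax over members: better-fitting members receive more weight.
    z = (ll - ll.max()) / temperature
    w = np.exp(z)
    return w / w.sum()

def ensemble_predict(member_probs, weights):
    # Weighted mixture of member predictive distributions.
    return np.einsum("m,mnc->nc", weights, member_probs)
```

Because adaptation touches only the mixture weights, it needs just a handful of labeled target examples and no gradient steps through the models themselves.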

2024 · UIST 2024; NeurIPS 2023 workshops XAIA and ICBINB
A natural-language interface that lets humans teach vision models through corrections instead of manual labels, showing how language provides higher-bandwidth feedback that communicates what went wrong.

2023 · ICLR 2023
Learns from structured disagreement signals between diverse models, working at a higher level of abstraction than individual datapoints by “choosing the best model” among different functions that fit the training data.

My name (윤호) is pronounced roughly like ‘you-know’ said quickly, with stress on ‘you’.

Feel free to reach out via email—I’m always happy to connect! I plan to be on the academic and industry job markets in the late 2026-early 2027 cycle, so please let me know if you think I’d be a good fit for your organization.