
I’m a final-year PhD student at Stanford CS, advised by Chelsea Finn. My research is supported by OpenAI and KFAS.

My research focuses on building systems that learn from rich textual feedback. Standard approaches reduce learning to optimizing a scalar “loss” or “reward”, but this single number necessarily discards useful information present in the environment’s feedback.

In many real settings, much richer feedback is available: stack traces, natural-language corrections, explanations accompanying pairwise comparisons, or long-form reflections on failed attempts. I develop methods that leverage these signals to drive continual improvement, enabling models to refine their behavior based on rich feedback rather than a single number.

For a technical overview, see my blog post or the selected papers below.

2025

ICLR 2026 Workshop on Memory for LLM-Based Agentic Systems (MemAgents)
ICLR 2026 Workshop on AI with Recursive Self-Improvement

Operationalizes the core text optimization loop, accumulating "why better" signals from pairwise comparisons across up to a thousand iterations.
2026

ICLR 2026 · ES-FoMo @ ICML 2025 (Spotlight) · Ram2 @ CoLM 2025 (Oral)

A hierarchical RL framework for training LLMs to discover and use textual abstractions for solving complex reasoning problems. Demonstrates that useful information for solving reasoning problems can be represented in pure text form.
2025

ICML 2025 Workshop PUT

Test-time alignment by reweighting ensemble members using a small set of labeled examples from the target distribution, enabling adaptation without retraining any weights.
2024

UIST 2024 · XAIA @ NeurIPS 2023 · ICBINB @ NeurIPS 2023

Built an interface for humans to teach vision models using natural-language corrections instead of manual labels. Demonstrates how natural language can provide higher-bandwidth feedback that communicates what went wrong.
2023

ICLR 2023

Learns from structured disagreement signals between diverse models, working at a higher level of abstraction than individual datapoints by "choosing the best model" among the different functions that fit the training data.

My name (윤호) is pronounced approximately like ‘you-know’ said quickly, with stress on ‘you’.

Feel free to reach out via email if you’d like to chat. I’m planning to be on the job market in early 2027, for both academic and industry positions.