In Defense of Prompting Papers
We tend to treat loss functions as clean and prompts as hacky.
A loss function lives inside a familiar story. You specify an objective, run optimization, and get a model that minimizes it. There are convergence guarantees in some settings. There are generalization results in some settings. The whole setup has the feel of mathematics, which gives it a kind of legitimacy. Prompting, by contrast, often feels improvised. You change some words, the behavior changes, and the whole procedure can look embarrassingly contingent.
I think this contrast is overstated. Prompts are not some lower, less principled form of specification. They are another way of telling a computational system what we want.
More strongly, I think we often understate how contingent ordinary loss-based learning already is. A “loss function” is rarely just a single equation. In practice it is a much larger specification bundle: model class, architecture, data distribution, preprocessing, augmentation, optimizer, regularization, hyperparameters, and the actual loss equation itself. We choose this whole object, run optimization, inspect what it gives us, and modify the specification until the resulting system behaves well enough.
That process is not so different from prompting.
Why Losses Feel More Principled
There are good reasons loss-based learning feels cleaner.
One is that optimization gives us a stable inner loop. If I fix the model class, the data, and the objective, gradient descent gives me a reproducible procedure for improving training loss. In some special cases, we can say much more than that. Convexity gives convergence results. Statistical learning theory gives relationships between train and test performance under particular assumptions. Even when these theorems do not directly apply to modern deep learning, they still shape our intuitions about what a proper objective should look like.
Another is that loss-based systems scale well. If you have a framework that works, you can often add more data, more compute, or a larger model and get a predictable amount of improvement. This is one of the deepest reasons gradient-based learning has been so successful. It is not just that the objective feels mathematically natural. It is that the whole recipe has turned out to be an extremely effective way of turning scale into competence.
All of this is real. But it is easy to infer from it something stronger than the evidence supports: that the loss itself is the uniquely pure part of the story, and that other ways of specifying behavior are somehow less legitimate.
The Loss Is Not the Objective
What we really care about is not minimizing cross-entropy or mean squared error. What we care about is behavior.
The loss is an interface. It is a way of communicating desired behavior to a learning system through a training process. But it is only one such interface, and it is less canonical than it first appears.
Suppose I say I am training a model with a particular loss. That leaves out almost everything that matters. Which data did I choose? Which negatives? What architecture made that loss useful? Which regularizers? Which optimizer? Which sampling scheme? Which temperature, clipping threshold, or weighting coefficient? Each of these changes what the model learns. The realized objective is the whole training specification, not the symbolic loss alone.
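To make the point concrete, here is a minimal sketch of a "realized objective" as a full training specification, in which the symbolic loss is only one field among many. All of the names, fields, and values are illustrative, not drawn from any particular framework.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingSpec:
    loss: str              # the symbolic loss, e.g. "cross_entropy"
    architecture: str      # model class / architecture choice
    dataset: str           # which data, which negatives
    optimizer: str         # e.g. "adamw"
    learning_rate: float
    weight_decay: float    # one of many regularizers
    batch_size: int
    grad_clip: float       # clipping threshold
    label_smoothing: float # a weighting-style knob folded into the objective

# Two runs that "use the same loss" but are different objectives in practice:
spec_a = TrainingSpec("cross_entropy", "transformer-small", "web_corpus_v1",
                      "adamw", 3e-4, 0.1, 256, 1.0, 0.0)
spec_b = TrainingSpec("cross_entropy", "transformer-small", "web_corpus_v2",
                      "sgd", 1e-2, 0.0, 256, 0.0, 0.1)

# Same symbolic loss, different realized objective.
assert spec_a.loss == spec_b.loss and spec_a != spec_b
```

Saying "I trained with cross-entropy" pins down one field of this object; the model's learned behavior depends on all of them.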
Once you view things this way, loss design starts to look less like writing down truth and more like constructing a communication channel. We are trying to get our preferences into the system in a form that optimization can use.
Prompting is also that.
Prompts as Specification
A prompt is a specification language for model behavior. It tells the model what task it is in, what matters, what style is desired, what constraints to satisfy, and what tradeoffs to make. We write one, run the model, inspect the result, and revise the prompt until the behavior improves. The loop is familiar because it is the same loop we use for training setups: specify, run, inspect, modify.
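The specify-run-inspect-modify loop can be sketched in a few lines. Here `run_model` is a stand-in for a real model call, and the check and revision are deliberately toy; the point is the shape of the loop, not the particular fix.

```python
def run_model(prompt: str) -> str:
    # Stand-in for an actual model call; behavior here is hard-coded
    # purely to make the loop below executable.
    if "one sentence" in prompt:
        return "Paris is the capital of France."
    return "Paris is the capital of France. It is known for the Eiffel Tower."

def meets_spec(output: str) -> bool:
    # Inspect: here the desired behavior is a single-sentence answer.
    return output.count(".") == 1

prompt = "What is the capital of France?"
for _ in range(3):                        # specify -> run -> inspect -> modify
    output = run_model(prompt)
    if meets_spec(output):
        break
    prompt += " Answer in one sentence."  # revise the specification

assert meets_spec(output)
```

Replace the string edit with a change to the dataset, the regularizer, or the learning rate, and this is the same loop we run over training specifications.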
What makes prompts feel less pure is not that they are fundamentally more arbitrary. It is that their contingency is easier to see. If I change three words in a prompt and the output changes, the arbitrariness is visible. But training specifications are also full of such choices. They are just spread across code, data pipelines, model architecture, and optimization settings, so they feel less like language and more like engineering.
I do not mean that prompts and losses are identical. They operate at different levels.
Losses are amortized specifications. You pay the cost of training once and get a reusable object that can be applied across many future inputs. Prompts are often local specifications. They steer behavior at inference time, per task or per distribution slice, and they are much easier to edit. One buys reusability, the other buys flexibility.
But both are ways of injecting information about what we want into a computation.
The Real Currency Is Information
I think the more useful abstraction here is not “losses versus prompts” but “different channels for communicating intent.”
One channel works through optimization in parameter space. Another works through text in context space. In both cases we are trying to specify a behavior we cannot directly write down in executable form. We provide hints, constraints, examples, preferences, and structure, then let a system transform that specification into behavior.
From this perspective, it is not surprising that prompts can generalize well. Text is a rich medium for specifying tasks. We already use language to coordinate humans on extremely complex objectives. There is nothing inherently impure about using language to coordinate models too. If anything, prompting is appealing precisely because it lets us specify intent in the medium we are best at using.
In some settings, prompts may even be the more natural interface. If the desired behavior is subtle, changing quickly, or hard to reduce to a stable dataset and objective, then text may be a better carrier of information than a fixed loss. A prompt can state exceptions, emphasize priorities, point to analogies, describe failure modes, and encode context that would be awkward to bake into a scalar objective.
This does not make prompts universally better. It just means they belong in the same conceptual category: they are both mechanisms for turning information about intent into behavior.
Two Modes of Control
I think we should view these as two major modes of controlling computation.
One is training-time specification. We define a learning problem, choose the model and data, choose a loss and optimizer, and let optimization compress our intent into the model weights.
The other is inference-time specification. We communicate through prompts, context, examples, retrieved documents, and other textual artifacts that shape what the model does at run time.
These modes have different tradeoffs. Training-time control is more amortized and often more robust. Inference-time control is more editable, more expressive, and easier to adapt on the fly. Real systems increasingly use both. We train a capable base model, then steer it with prompts, retrieval, tools, and scaffolding. That is not a dirty workaround layered on top of the real method. It is just a second specification channel.
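The two channels can be put side by side for a single intent, say "respond politely." This is a toy sketch; the data, prompt, and function names are invented for illustration.

```python
# Channel 1: training-time specification. The intent is compressed into
# (input, target) pairs and a loss, paid for once during fine-tuning.
train_examples = [
    ("reply to: thanks!", "You're very welcome!"),
    ("reply to: this is broken", "Sorry about that; let me help."),
]
# ... fine-tune on these pairs with e.g. cross-entropy, then reuse the weights ...

# Channel 2: inference-time specification. The same intent, stated directly
# in context, attached to every call: editable, local, not amortized.
system_prompt = "Always respond politely, even to frustrated users."

def build_request(user_message: str) -> dict:
    # The specification travels with the request rather than the weights.
    return {"system": system_prompt, "user": user_message}

req = build_request("this is broken")
assert req["system"].startswith("Always respond politely")
```

Editing the prompt string takes seconds; editing the trained behavior means rebuilding the dataset and retraining. That asymmetry is the tradeoff the section describes.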
Why This Matters
I think this matters because the language we use shapes what kinds of methods we take seriously.
If we keep treating losses as principled and prompts as hacks, we will underestimate a large part of the design space. We will keep searching for the “real” objective in one place while ignoring the fact that much of modern capability comes from better communication with the model rather than cleaner equations alone.
The real question is not whether a specification is written as a loss or as text. The real question is whether it transmits useful information about the behavior we want, and whether the system can turn that information into generalization.
Loss functions have earned our respect because that style of specification has scaled extraordinarily well.
Prompts may deserve the same respect for exactly the same reason.