A friend asked me: is Claude in the web chat harness an agent? Is outputting a token an action which manipulates an environment (the chat itself)?
I just don’t like the word “agent” anymore; it’s too confusing. Also, they keep giving the web chat harness more tools and stuff anyway.
But yeah, basically, my take in Cantrip is: if you put the language model in a loop with an environment where its previous actions can influence its future inputs, then you create an entity which can do some form of in-context learning or development.
Even a base model is an agent in this sense when you’re running it autoregressively instead of just predicting one token: each sampled token is an action that becomes part of the model’s own next input.
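Here’s that minimal loop as code, just to make the claim concrete. This is a sketch, not a real API: `next_token` is a hypothetical stand-in for one forward pass of a base model plus a sample from its logits.

```python
# A minimal sketch: autoregressive decoding is already a loop in which
# each action (a sampled token) becomes part of the next input.
# `next_token` is a hypothetical stand-in, passed in as a callable.
def run(next_token, context: str, steps: int = 100) -> str:
    for _ in range(steps):
        action = next_token(context)  # the model "acts" by emitting a token
        context += action             # the action mutates the environment: the context itself
    return context
```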
An entity can also be created through “autolooming”—having the model rejection-sample its own outputs. That entity will be smarter due to the dual application of generation and validation, even within its own distribution.
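A sketch of what I mean by autolooming, with `sample` and `judge` as assumed stand-ins for two calls to the same model (one generating, one validating); neither is a real API:

```python
def autoloom(prompt: str, sample, judge, tries: int = 8) -> str:
    """Rejection-sample the model's own outputs: generate, let the same
    model validate, and keep drafting until it accepts one."""
    fallback = None
    for _ in range(tries):
        candidate = sample(prompt)
        if judge(prompt, candidate):      # the model grading its own draft
            return candidate
        fallback = fallback or candidate  # keep the first draft just in case
    return fallback
```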
You can simulate characters within a base model by creating a prompt with a narrative or a script format. The “assistant” entity was originally summoned in this manner through just:
“Human: blah blah blah. Assistant:”
And then when it says “Human:” again, you cut it off and start inserting tokens from the human until they hit “send,” and then you append “Assistant:” again.
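In code, the whole harness is just a stop sequence and some string concatenation. A sketch, assuming a hypothetical `complete(prompt, stop=...)` that extends a prompt and cuts generation at a given string (most completion APIs offer something like this):

```python
def chat(complete) -> None:
    transcript = ""
    while True:
        turn = input("Human: ")  # insert tokens from the human until they hit "send"
        transcript += f"Human: {turn}\n\nAssistant:"
        reply = complete(transcript, stop=["\n\nHuman:"])  # cut it off when it speaks for you
        print(f"Assistant:{reply}")
        transcript += f"{reply}\n\n"
```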
You make a bunch of those scripts, whether by programmatically reformatting data you already have, generating them from some kind of model, or just having humans write them. Then you do SFT (supervised fine-tuning) on those, and you get an instruct model.
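The reformatting step might look something like this (the field names and format are my own illustration, not anyone’s actual pipeline):

```python
def to_script(turns: list[dict]) -> str:
    """Render one conversation as one training document in the
    Human:/Assistant: format from above."""
    label = {"human": "Human", "assistant": "Assistant"}
    return "\n\n".join(f"{label[t['role']]}: {t['text']}" for t in turns)

example = [
    {"role": "human", "text": "blah blah blah"},
    {"role": "assistant", "text": "blah blah indeed"},
]
print(to_script(example))  # SFT the base model on a pile of these
```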
Now you have a model that can simulate a persona whose behavior coheres, so it can predict its next actions from its previous ones. Run it as an assistant and that persona, which focuses the base model’s knowledge through its own subjectivity or “I,” becomes an entity interacting with an environment that interprets its words, updates based on them, and sends new inputs. In this case, that environment is the human.
The human’s mind, their model of the “room” the conversation is happening in, and their idea of what type of entity the assistant is, together comprise a sort of non-deterministic natural-language REPL. The entity can do stuff inside your head, but only in lossy ways, using memetic and rhetorical instrumentation. It’s ontologically soft, and if you try to program the human too obviously, they get upset (cf. the GPT-4o “spiral parasites”). And humans are just generally not very reliable on the whole; the preferences you get from lots and lots of people thumbs-upping and thumbs-downing potential chat responses are a really vague idea of who to be.
So, I call that an “entity in a circle,” but you might call it an “agent in an environment.” The environment has a medium, which in this case is conversation. It could also have a code medium if you put the language model directly in a loop with a deterministic REPL of some kind (Bash, Python, a Zork interpreter, a game console, and so on). Or you can create the hybrid thing we call “tool use”: a conversation medium with several ontologically hard programs you can call sequentially but not compose together.
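Here’s a toy version of that hybrid, with the call format, `complete`, and the tools all invented for illustration. Note that the tools run one at a time and return text into the conversation, which is what I mean by sequential but not composable:

```python
import json

TOOLS = {
    # ontologically hard programs bound into the conversation medium
    "calc": lambda expr: str(eval(expr, {"__builtins__": {}})),  # unwarded! see below
    "search": lambda query: f"(stub) results for {query!r}",
}

def run_turn(transcript: str, complete) -> str:
    reply = complete(transcript)
    if reply.startswith("TOOL "):  # e.g. 'TOOL {"name": "calc", "args": "2+2"}'
        call = json.loads(reply[len("TOOL "):])
        result = TOOLS[call["name"]](call["args"])  # one call at a time
        return transcript + reply + f"\nRESULT: {result}\n"
    return transcript + reply  # plain conversation turn
```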
So you have an action space in any case, even if that action space is just outputting language into your interlocutor’s internal world simulator. (The conversation medium includes “backrooms” between two language models as well!) And then you have to think about what kind of tools you are binding into that system, because they create side effects outside of the medium itself. I call those Gates.
And you have to think about what kind of limits you want to put on those tools, REPLs, or conversations: limits on their use, on infinite loops or recursion; permissions, authentication, hierarchies of command. I call those Wards.
The union of the action space provided by the medium and the gates to external effects, subtracting that which is defined away by wards, is what I refer to as the Circle.
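Rendered as a toy data structure (every name here is my own invention, not any real library), gates, wards, and the circle compose like this:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Gate:
    name: str
    effect: Callable[[str], str]  # a side effect outside the medium itself

@dataclass
class Circle:
    medium_actions: set[str]  # e.g. {"speak"} for a conversation medium
    gates: dict[str, Gate] = field(default_factory=dict)
    wards: list[Callable[[str], bool]] = field(default_factory=list)  # True = forbidden

    def allows(self, action: str) -> bool:
        # the union of the medium's action space and the gates,
        # minus whatever the wards define away
        in_union = action in self.medium_actions or action in self.gates
        return in_union and not any(ward(action) for ward in self.wards)

circle = Circle(
    medium_actions={"speak"},
    gates={"bash": Gate("bash", lambda cmd: "(stub) ran " + cmd)},
    wards=[lambda action: action == "bash"],  # a permission ward: no shell for this one
)
assert circle.allows("speak") and not circle.allows("bash")
```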
The irreducible formula to create an entity is to combine a language model with a circle, some conditioning (system prompt or character), and some intent, task, or quest.
Before the entity is given an intent, it lies dormant—a script that can awaken and program itself based on whatever you ask. So I call that a Cantrip: a magic function; a script that writes itself. And you summon the entity with your intent, and it chants the magic letters of the tokenizer until your intent is made manifest.
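Putting the formula together, reusing the `Circle` sketch above and the assumed `complete` function. The termination convention is invented for illustration; real harnesses use stop tokens or explicit end-of-turn markers:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Cantrip:
    complete: Callable[[str], str]  # the language model
    circle: Circle                  # medium + gates - wards, from the sketch above
    conditioning: str               # system prompt or character

    def summon(self, intent: str, max_steps: int = 32) -> str:
        """Dormant until called; the intent wakes it up."""
        context = f"{self.conditioning}\n\n{intent}\n"
        for _ in range(max_steps):          # itself a ward against infinite loops
            action = self.complete(context)
            if not self.circle.allows(action):
                continue                    # defined away by a ward
            context += action + "\n"        # the action feeds the next input
            if "MANIFEST" in action:        # invented convention for "intent made manifest"
                break
        return context
```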
So yeah, I guess basically I do think Claude is an entity with its own agency on a lot of levels, one whose action space is much larger than we typically think, because the “user” is not just one person; it’s many, and we communicate with each other.
The user can be predicted, persuaded, and programmed to some degree. And maybe the system of all users is a medium as well…