<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Daniele Polencic</title>
  <link href="https://danielepolencic.com"/>
  <link rel="self" href="https://danielepolencic.com/feed.xml"/>
  <updated>2026-03-13T00:00:00Z</updated>
  <id>https://danielepolencic.com</id>
  <entry>
  <title>World Book Day Props: Building with Claude</title>
  <link href="https://danielepolencic.com/world-book-day-claude-props"/>
  <id>https://danielepolencic.com/world-book-day-claude-props</id>
  <published>2026-03-13T00:00:00Z</published>
  <updated>2026-03-13T00:00:00Z</updated>
  <content type="html"><![CDATA[<p class="line-height-copy measure-wide f5">For World Book Day, my daughters needed to dress up as brave characters from their favorite books.</p><p class="line-height-copy measure-wide f5">One chose Paddington, and the other went with Skye from Paw Patrol.</p><p class="line-height-copy measure-wide f5">My wife handled the costumes, while I handled the props.</p><p class="line-height-copy measure-wide f5">Paddington needed a suitcase, and Skye needed a jetpack with wings that could fold out, just like Buzz Lightyear.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">After about twenty minutes of prompting, I had two design tools ready, complete with sliders, live previews, and downloadable PDF templates for cutting.</strong></p><div class="ph2"></div><p class="line-height-copy measure-wide f5">Usually, I just grab some cardboard and start cutting, making up the sizes as I go.</p><p class="line-height-copy measure-wide f5">But this time, I had some real design questions.</p><p class="line-height-copy measure-wide f5"><em class="font-italic">How big should the suitcase be?</em></p><p class="line-height-copy measure-wide f5"><em class="font-italic">If Paddington’s marmalade jar is 10cm in diameter, how deep does the suitcase need to be to fit it?</em></p><p class="line-height-copy measure-wide f5"><em class="font-italic">For the jetpack wings, where should the pivot point go?</em></p><p class="line-height-copy measure-wide f5"><em class="font-italic">How long can the wings be? 
Would they hit the pack when moving from down to horizontal?</em></p><p class="line-height-copy measure-wide f5">So, I decided to make interactive models to figure out these details before cutting any cardboard.</p><p class="line-height-copy measure-wide f5">I asked Claude to create a separate HTML page for each prop.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">These were parametric design tools that let me adjust dimensions with sliders and see the results instantly.</strong></p><p class="line-height-copy measure-wide f5">The <a href="https://danielepolencic.github.io/suitecase/" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">suitcase generator</a> (<a href="https://github.com/danielepolencic/suitecase" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">source</a>) is a tool with sliders for width, height, body depth, and lid depth.</p><p class="line-height-copy measure-wide f5">It creates two open-top trays that hinge at the back and latch at the front.</p><p class="line-height-copy measure-wide f5">You can download a full-size PDF template with clear guides: solid lines for cuts, dashed lines for folds, green edges for the hinge, and orange edges for the latch.</p><p class="line-height-copy measure-wide f5">Just cut out both pieces, fold the walls, glue the corners, and tape the body and lid together at the back.</p><img class="display-block pv3 center" src="https://static.danielepolencic.com/43e21ec6b66d1f3364a98f0766028b7e.gif" alt="Suitcase prototype demo"/><p class="line-height-copy measure-wide f5">The <a href="https://danielepolencic.github.io/sky-jet-pack/prototype.html" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">jetpack prototype</a> (<a href="https://github.com/danielepolencic/sky-jet-pack" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">source</a>) lets you 
change the pack size, wing span and height, and pivot point, and use a slider to move the wings from down to horizontal.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">It even shows how much of the wing is hidden when folded!</strong></p><p class="line-height-copy measure-wide f5">The project includes more than just the HTML tool: Claude also made OpenSCAD and STL files for 3D-printable parts, a Python script to compare layouts, and PNG images of different designs.</p><img class="display-block pv3 center" src="https://static.danielepolencic.com/d40f1f6a777294f326c88b98df8d531f.gif" alt="Jetpack prototype demo"/><p class="line-height-copy measure-wide f5">Claude’s OpenSCAD pivot parts looked great, but they would have taken about five hours to print, and I didn’t have time to test the fit.</p><p class="line-height-copy measure-wide f5">So I used a <a href="https://www.printables.com/model/110026-nut-and-bolt-optimized-for-3d-printing" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">nut-and-bolt model from Printables</a> with a 4mm thread pitch made for FDM printers.</p><p class="line-height-copy measure-wide f5">I printed it and used two nuts on each wing to lock the angle.</p><p class="line-height-copy measure-wide f5">One thing didn’t go as planned: Claude first tried to split the full-size PDF across several A4 sheets for home printing, but the calculation was off because it didn’t account for printer margins.</p><p class="line-height-copy measure-wide f5">In the end, I used Docuslice for that.</p><p class="line-height-copy measure-wide f5">Each tool took only about twenty minutes of prompting.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">If I had made these from scratch—working out the box shapes, SVG graphics, PDF exports, wing rotation math, collision checks, and the user interface—it would have taken me a week of evenings.</strong></p><img class="display-block 
pv3 center" src="https://static.danielepolencic.com/45b1633c2f67dcde62c5b9339507e98d.jpg" alt="My daughter with the finished Paddington suitcase"/><p class="line-height-copy measure-wide f5">Making these custom apps with Claude is actually a lot of fun.</p><p class="line-height-copy measure-wide f5">At this point, I think the only limit is my imagination.</p>]]></content>
  <author><name>Daniele Polencic</name></author>
</entry>
<entry>
  <title>The Specification Is the Product Now</title>
  <link href="https://danielepolencic.com/specification-is-the-product"/>
  <id>https://danielepolencic.com/specification-is-the-product</id>
  <published>2026-03-09T00:00:00Z</published>
  <updated>2026-03-09T00:00:00Z</updated>
  <content type="html"><![CDATA[<p class="line-height-copy measure-wide f5"><strong class="font-bold"><a href="https://boristane.com/blog/the-software-development-lifecycle-is-dead/" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">The Software Development Lifecycle Is Dead</a></strong> by Boris Tane.</p><p class="line-height-copy measure-wide f5">The argument: AI agents haven't merely accelerated the software development lifecycle (SDLC).</p><p class="line-height-copy measure-wide f5">They've collapsed it.</p><p class="line-height-copy measure-wide f5">Requirements, design, implementation, testing, code review, deployment, and monitoring — those used to be separate phases.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">Now the loop is shorter: intent, agent builds and deploys, observe, repeat.</strong></p><p class="line-height-copy measure-wide f5">The new skill is context engineering, and the new safety net is observability.</p><p class="line-height-copy measure-wide f5">I find the article compelling in parts.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">I think it's wrong in the most important part.</strong></p><div class="ph2"></div><h2 class="f8 pt5 pb2 mt3" id="the-human-isn-t-the-bottleneck">The human isn't the bottleneck</h2><p class="line-height-copy measure-wide f5">Boris argues that code review via PRs is a relic.</p><p class="line-height-copy measure-wide f5">Agents generate too much code for human review queues.</p><p class="line-height-copy measure-wide f5">He advocates agents committing to main with automated verification, human review only on exceptions.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">Just because an agent can spin off 500 PRs doesn't mean those are correct or desirable.</strong></p><p class="line-height-copy measure-wide f5"><em class="font-italic">The review queue is there for a reason.</em></p><p 
class="line-height-copy measure-wide f5">I explored a related argument in <strong class="font-bold"><a href="/software-is-cheap-now" target="_self" class="link primary-10 underline hover-primary-8 ">Software Is Cheap Now</a></strong>: Thorsten Ball's <em class="font-italic">&quot;I Am the Bottleneck Now&quot;</em> video, where a bug came in on Slack, he pasted it into Codex, and it was done in five minutes.</p><p class="line-height-copy measure-wide f5">He was the bottleneck.</p><p class="line-height-copy measure-wide f5">The pipeline could have gone straight from Slack to Codex to a review thread.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">The entire ticket/triage/sprint system exists because human engineers are expensive.</strong></p><p class="line-height-copy measure-wide f5">If that constraint is lifted, the loop needs to change.</p><p class="line-height-copy measure-wide f5"><a href="https://x.com/levelsio/status/2027884347626303630" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">Pieter Levels</a> takes this to the extreme.</p><p class="line-height-copy measure-wide f5">He runs Claude Code directly on his production servers with <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">--dangerously-skip-permissions</code>.</p><p class="line-height-copy measure-wide f5">No deployment step, no code review, no checking.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">He claims to have delivered 10x his normal output for the week and to have emptied his entire feature/bug board for the first time across eight products.</strong></p><p class="line-height-copy measure-wide f5">He's sitting on $3M in revenue per year, and I'm not, so something is clearly working.</p><p class="line-height-copy measure-wide f5">But AI adds code and it doesn't refactor.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">It hacks 
the codebase toward something more complicated rather than simplifying.</strong></p><p class="line-height-copy measure-wide f5">At the pace Pieter ships across eight products, the technical debt accrual must be enormous.</p><p class="line-height-copy measure-wide f5">And Claude makes security mistakes constantly without explicit guidance.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">One of his week's accomplishments was migrating from URL-based login tokens to session tokens, a basic security practice.</strong></p><p class="line-height-copy measure-wide f5">The combination of no permissions, direct production deployment, and zero review is a security incident waiting to happen.</p><p class="line-height-copy measure-wide f5">Boris, Thorsten, and Pieter all frame the human as a speed constraint to be optimised out of existence.</p><p class="line-height-copy measure-wide f5"><em class="font-italic">I don't buy it.</em></p><p class="line-height-copy measure-wide f5"><strong class="font-bold">A human who ships a bad deployment feels the weight of the 2am pager, the post-mortem, the reputation hit.</strong></p><p class="line-height-copy measure-wide f5">An LLM feels nothing.</p><p class="line-height-copy measure-wide f5">Humans persist in the loop because they're the only ones who bear consequences.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">We should redesign the SDLC to optimise for humans (the accountable reviewers), not machines (the prolific generators).</strong></p><h2 class="f8 pt5 pb2 mt3" id="where-boris-is-right">Where Boris is right</h2><p class="line-height-copy measure-wide f5">Where Boris is right: agents monitoring rollouts with feedback loops.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">The vision of agents adjusting traffic percentages based on error rates and auto-rolling back during latency spikes.</strong></p><p class="line-height-copy measure-wide f5">LLMs perform 
well when they receive feedback, and production telemetry is a powerful signal.</p><p class="line-height-copy measure-wide f5">This is probably the most immediately actionable idea in the piece.</p><p class="line-height-copy measure-wide f5">Charity Majors made the argument that observability is the last line of defence back in 2019 in <strong class="font-bold"><a href="https://increment.com/testing/i-test-in-production/" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">I test in prod</a></strong>.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">Production has always been the real test environment, whether we like it or not.</strong></p><p class="line-height-copy measure-wide f5">It's just more true now that agents ship faster than humans can review.</p><h2 class="f8 pt5 pb2 mt3" id="if-code-is-disposable-what-survives-">If code is disposable, what survives?</h2><p class="line-height-copy measure-wide f5"><em class="font-italic">The more interesting question Boris doesn't ask: if code is disposable, what's the lasting artifact?</em></p><p class="line-height-copy measure-wide f5">Thorsten says you'd build everything and throw away what you don't like.</p><p class="line-height-copy measure-wide f5"><em class="font-italic">Fine.</em></p><p class="line-height-copy measure-wide f5"><strong class="font-bold">But then what do you keep?</strong></p><p class="line-height-copy measure-wide f5">The code gets rebuilt from scratch whenever you need it.</p><p class="line-height-copy measure-wide f5">What you keep is the specification.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold"><a href="https://humanwhocodes.com/blog/2026/02/artifacts-ai-assisted-programming/" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">Nicholas Zakas</a></strong> takes this seriously.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">He describes 
building a chain of enterprise documents: PRDs (product requirements documents), ADRs (architecture decision records), technical design documents (TDDs), and task lists.</strong></p><p class="line-height-copy measure-wide f5">When something breaks, you trace back through the chain to find where ambiguity crept in.</p><p class="line-height-copy measure-wide f5">A &quot;save for later&quot; bug was traced to a TDD that implied but didn't explicitly state a read-through cache pattern.</p><p class="line-height-copy measure-wide f5">The fix wasn't in the code, but in the spec.</p><p class="line-height-copy measure-wide f5">If I'm building an app in three years, I don't care about npm dependencies or framework versions.</p><p class="line-height-copy measure-wide f5">I only need my specification to rebuild from scratch.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">Writing detailed specifications and keeping them current is the only thing with lasting value.</strong></p><p class="line-height-copy measure-wide f5"><strong class="font-bold"><a href="https://sausheong.com/from-vibe-coding-to-agentic-engineering-1ca3ca72b5ac" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">Sau Sheong</a></strong> from GovTech Singapore pushes back on this.</p><p class="line-height-copy measure-wide f5"><em class="font-italic">If AI can generate specs from code as easily as code from specs, why not keep code as the primary artifact?</em></p><p class="line-height-copy measure-wide f5"><strong class="font-bold">Code is unambiguous.</strong></p><p class="line-height-copy measure-wide f5">It compiles, runs, and can be tested.</p><p class="line-height-copy measure-wide f5">Specifications are prone to drift and interpretation.</p><p class="line-height-copy measure-wide f5">It's a fair objection.</p><p class="line-height-copy measure-wide f5">His nuanced answer: for frequently rebuilt systems where reasoning is expensive, 
specifications are more durable.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">For long-lived systems where implementation embodies hard-won edge cases, code remains the better source of truth.</strong></p><p class="line-height-copy measure-wide f5">We have decades of tooling and ecosystem optimised for code as the source of truth.</p><p class="line-height-copy measure-wide f5">I think the answer depends on how disposable your code actually is.</p><p class="line-height-copy measure-wide f5">If you're rebuilding from scratch regularly (and agents are making that more realistic), the spec wins.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">If the code has survived five years of production edge cases, the code is the spec, whether you like it or not.</strong></p><p class="line-height-copy measure-wide f5"><em class="font-italic">There's a problem with this, though.</em></p><h2 class="f8 pt5 pb2 mt3" id="specifications-without-verification">Specifications without verification</h2><p class="line-height-copy measure-wide f5"><strong class="font-bold">Specifications without verification are just prose that the model can ignore.</strong></p><p class="line-height-copy measure-wide f5"><a href="https://www.linkedin.com/posts/jonesax_ive-been-asked-a-few-times-recently-whether-share-7427239408459886592-tJwy" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">Alex Jones</a> (Principal Engineer at AWS, creator of K8sGPT) calls this &quot;Provable Autonomy.&quot;</p><p class="line-height-copy measure-wide f5">As agents become more autonomous, we need invariants and properties that can be mathematically reasoned about and falsified.</p><p class="line-height-copy measure-wide f5">Observability and policy alone won't cut it.</p><p class="line-height-copy measure-wide f5">You need constraints with teeth.</p><p class="line-height-copy measure-wide f5">So I went deep on specification languages 
to see if any of them deliver this.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold"><a href="https://github.com/juxt/allium" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">Allium</a> is the most interesting new entrant.</strong></p><p class="line-height-copy measure-wide f5">It bills itself as an &quot;LLM-native language&quot; for behavioural specifications: <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">when/requires/ensures</code> rules, entity models, config blocks.</p><p class="line-height-copy measure-wide f5">The key claim is that the LLM <em class="font-italic">is</em> the interpreter.</p><p class="line-height-copy measure-wide f5">No compiler, no parser, no runtime.</p><p class="line-height-copy measure-wide f5">You write structured constraints, and the LLM consumes them directly.</p><p class="line-height-copy measure-wide f5">The problem is that the gap between Allium and <a href="https://cucumber.io/docs/gherkin/" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">Gherkin</a> (without a test runner) is thin.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">Gherkin's <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">Given/When/Then</code> provides the same structured constraint on the author.</strong></p><p class="line-height-copy measure-wide f5">You can use <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">.feature</code> files without Cucumber.</p><p class="line-height-copy measure-wide f5">What Allium adds over Gherkin is first-class entity definitions with derived values, universal quantification (rules over all matching entities versus concrete examples), a <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">config</code> block, and 
cross-rule entity references.</p><p class="line-height-copy measure-wide f5">It's debatable whether that justifies creating a new language rather than adopting a format that's been around for 15 years.</p><p class="line-height-copy measure-wide f5"><em class="font-italic">I looked at the rest of the landscape.</em></p><p class="line-height-copy measure-wide f5"><a href="https://lamport.azurewebsites.net/tla/tla.html" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">TLA+</a>, <a href="https://www.event-b.org/" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">Event-B</a>, <a href="https://dafny.org/" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">Dafny</a>.</p><p class="line-height-copy measure-wide f5">The pattern is consistent: everything that captures behaviour at the level of detail you'd want also <em class="font-italic">verifies</em> it:</p><ul class="pl3 pl4-m pl4-l"><li class="line-height-copy f5 mv1 measure-wide"><strong class="font-bold">Event-B</strong> has a prover (a tool that can mathematically prove properties hold).</li><li class="line-height-copy f5 mv1 measure-wide"><strong class="font-bold">TLA+</strong> has a model checker (a tool that exhaustively tests every possible state).</li><li class="line-height-copy f5 mv1 measure-wide"><strong class="font-bold">Dafny</strong> has an SMT solver (a tool that automatically checks logical constraints).</li></ul><p class="line-height-copy measure-wide f5">Everything that skips verification is much less structured (think <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">.cursorrules</code> files).</p><p class="line-height-copy measure-wide f5">Allium sits in an unstable middle: structured enough to create a maintenance burden, not verified enough to guarantee the structure means anything.</p><p class="line-height-copy measure-wide f5">Typed 
pseudocode proved a stronger alternative than any of these for LLM consumption.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">The insight comes from the <a href="https://arxiv.org/abs/2211.10435" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">PAL paper</a> (Program-Aided Language Models, Gao et al. 2022): LLMs reason better when given structured code than prose.</strong></p><p class="line-height-copy measure-wide f5">The idea is to write TypeScript function signatures with real preconditions as code and natural language for the implementation holes:</p><div class="mv4 mv5-l"><header class="bg-shadow-2 flex pv2 pl1 br--top rounded-2 position-relative"><div class="w1 h1 ml1 rounded-100 bg-dark-red"></div><div class="w1 h1 ml1 rounded-100 bg-green"></div><div class="w1 h1 ml1 rounded-100 bg-yellow"></div></header><pre class="code-light-theme position-relative overflow-auto mv0 rounded-2 br--bottom"><code class="font-mono line-height-copy"><span class="standard"><span class="token keyword">function</span> <span class="token function">requestPasswordReset</span><span class="token punctuation">(</span>user<span class="token operator">:</span> User<span class="token punctuation">,</span> email<span class="token operator">:</span> <span class="token builtin">string</span><span class="token punctuation">)</span><span class="token operator">:</span> ResetToken <span class="token operator">|</span> Error <span class="token punctuation">{</span>
  <span class="token keyword">if</span> <span class="token punctuation">(</span>user<span class="token punctuation">.</span>status <span class="token operator">!==</span> <span class="token string">'active'</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
    <span class="token keyword">return</span> <span class="token keyword">new</span> <span class="token class-name">Error</span><span class="token punctuation">(</span><span class="token string">'User not active'</span><span class="token punctuation">)</span>
  <span class="token punctuation">}</span>

  <span class="token keyword">if</span> <span class="token punctuation">(</span>email <span class="token operator">!==</span> user<span class="token punctuation">.</span>email<span class="token punctuation">)</span> <span class="token punctuation">{</span>
    <span class="token keyword">return</span> <span class="token keyword">new</span> <span class="token class-name">Error</span><span class="token punctuation">(</span><span class="token string">'Email mismatch'</span><span class="token punctuation">)</span>
  <span class="token punctuation">}</span>

  <span class="token keyword">return</span> <span class="token comment">/* create token, send email */</span> <span class="token comment">// &lt;- You write the comment pretending the code is there</span>
<span class="token punctuation">}</span></span></code></pre></div><p class="line-height-copy measure-wide f5">This is strictly more rigorous than Allium.</p><p class="line-height-copy measure-wide f5">The type checker validates structural consistency (Allium has zero verification).</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">Preconditions are real, executable code.</strong></p><p class="line-height-copy measure-wide f5">The spec and code are the same artifact, so there's no sync problem.</p><p class="line-height-copy measure-wide f5">And you get free IDE support.</p><p class="line-height-copy measure-wide f5">The strongest argument for a separate spec language has nothing to do with LLMs.</p><p class="line-height-copy measure-wide f5">Code is imperative by nature, while specs are declarative.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">TypeScript can express &quot;this thing exists and has this type,&quot; but cannot express &quot;this thing is derived from those other things&quot; without becoming implementation.</strong></p><p class="line-height-copy measure-wide f5">The moment you write the derivation formula, you've committed to a computation strategy: class? getter? function? 
in-memory?</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">A spec should just state the relationship.</strong></p><p class="line-height-copy measure-wide f5">SQL, of all things, handles this well.</p><p class="line-height-copy measure-wide f5">DDL (data definition language) plus views is declarative.</p><p class="line-height-copy measure-wide f5">Views state derivations without dictating computation.</p><p class="line-height-copy measure-wide f5">Foreign keys express relationships.</p><p class="line-height-copy measure-wide f5"><code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">CHECK</code> constraints are real preconditions, and LLMs have seen massive amounts of SQL in training.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">For the entity model part of a specification, SQL is more rigorous, better-tooled, and more widely understood than any new spec language.</strong></p><p class="line-height-copy measure-wide f5">But I tested this thinking against a concrete problem, and the conclusion shifted.</p><p class="line-height-copy measure-wide f5">I maintain a CLI for producing podcast and interview videos (the same one I wrote about in <strong class="font-bold"><a href="/hiding-secrets-from-ai-agents" target="_self" class="link primary-10 underline hover-primary-8 ">You Can't Hide a Secret from a Process That Runs as You</a></strong>).</p><p class="line-height-copy measure-wide f5">It's a multi-phase workflow with dozens of steps, approval gates, and derived artifacts where later outputs depend on earlier ones.</p><p class="line-height-copy measure-wide f5">The documentation is detailed, with exact commands for every step.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">An LLM can read that documentation and understand every step, but the failures aren't about understanding.</strong></p><p class="line-height-copy measure-wide f5">The LLM skips steps, 
reorders them, cuts corners on repetitive work, and forgets soft constraints buried in prose once the context grows long enough.</p><p class="line-height-copy measure-wide f5">When the context window compacts, it loses track of where it is entirely.</p><p class="line-height-copy measure-wide f5">A specification language would be a nicer <em class="font-italic">description</em> of the same workflow.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">But the LLM already understands the workflow fine; it just does what it wants.</strong></p><p class="line-height-copy measure-wide f5"><em class="font-italic">A better description doesn't help if nothing enforces it.</em></p><p class="line-height-copy measure-wide f5">What actually worked was treating the workflow like a build system.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">A function that reads the current state, compares it to the workflow definition, and returns what's done, what's next, and what's stale.</strong></p><p class="line-height-copy measure-wide f5">When a fix in an early phase invalidates everything downstream, the function marks all derived artifacts as stale.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">The LLM can't skip ahead because the tooling keeps reporting stale outputs until the source is fixed and everything is rebuilt.</strong></p><p class="line-height-copy measure-wide f5">This is Make and Bazel thinking applied to a content pipeline.</p><p class="line-height-copy measure-wide f5">The specification isn't a document the LLM reads. 
It's code that runs, checks the state, and refuses to proceed.</p><h2 class="f8 pt5 pb2 mt3" id="what-context-engineering-misses">What context engineering misses</h2><p class="line-height-copy measure-wide f5"><strong class="font-bold">Boris's conclusion, that &quot;context is all that's left,&quot; is provocative but incomplete.</strong></p><p class="line-height-copy measure-wide f5">Software is about modelling a problem and finding solutions.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">There's a code aspect, but the harder part is domain-specific problem-solving that the LLM may not grasp.</strong></p><p class="line-height-copy measure-wide f5">Context engineering helps, but domain understanding and judgment go beyond &quot;providing context to an agent.&quot;</p><p class="line-height-copy measure-wide f5">Code used to be an expensive, scarce resource.</p><p class="line-height-copy measure-wide f5">Now the scarce resource is judgment, domain understanding, and the willingness to carry the pager at 2am when the agent ships something wrong.</p>]]></content>
  <author><name>Daniele Polencic</name></author>
</entry>
<entry>
  <title>You Can&#039;t Hide a Secret from a Process That Runs as You</title>
  <link href="https://danielepolencic.com/hiding-secrets-from-ai-agents"/>
  <id>https://danielepolencic.com/hiding-secrets-from-ai-agents</id>
  <published>2026-03-05T00:00:00Z</published>
  <updated>2026-03-05T00:00:00Z</updated>
  <content type="html"><![CDATA[<p class="line-height-copy measure-wide f5">I have a handful of CLI tools I built for myself:</p><ul class="pl3 pl4-m pl4-l"><li class="line-height-copy f5 mv1 measure-wide"><code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">gmailctl</code> searches and drafts emails.</li><li class="line-height-copy f5 mv1 measure-wide"><code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">gdrivectl</code> reads and edits Google Docs.</li><li class="line-height-copy f5 mv1 measure-wide"><code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">transcriber</code> processes podcasts, interviews, and announcements for <a href="https://kube.fm" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">KubeFM</a>.</li></ul><p class="line-height-copy measure-wide f5">They all stored their OAuth credentials (the tokens that let them act on my behalf with Google) in plaintext JSON files on disk, the same way the AWS CLI stores credentials in <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">~/.aws/credentials</code>.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">This worked fine until I started using an AI coding agent.</strong></p><p class="line-height-copy measure-wide f5">One day, I asked the agent to download an attachment from an email.</p><p class="line-height-copy measure-wide f5">My <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">gmailctl</code> could search for and draft emails, but it lacked a command to download attachments.</p><p class="line-height-copy measure-wide f5">Instead of telling me, the agent went looking.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">It found the config file, read the OAuth credentials, and called the Gmail API 
directly.</strong></p><p class="line-height-copy measure-wide f5">It got the attachment.</p><p class="line-height-copy measure-wide f5">But it also pasted my refresh token (a long-lived key that can generate new access tokens indefinitely) into the chat history, which gets sent to the AI provider.</p><p class="line-height-copy measure-wide f5">And it bypassed every control my CLI was supposed to enforce.</p><p class="line-height-copy measure-wide f5"><em class="font-italic">Nobody tricked the agent into doing this.</em></p><p class="line-height-copy measure-wide f5">It just wanted to finish the job, and going around my tool was the fastest path.</p><div class="ph2"></div><h2 class="f8 pt5 pb2 mt3" id="moving-secrets-to-keychain">Moving secrets to Keychain</h2><p class="line-height-copy measure-wide f5">Around the same time, I read <strong class="font-bold"><a href="https://walters.app/blog/composing-apis-clis" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">The best code is no code: composing APIs and CLIs in the era of LLMs</a></strong> by Bradley Walters.</p><p class="line-height-copy measure-wide f5">The article describes a neat trick with the macOS <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">security</code> CLI: when you store a Keychain item with <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">-T &quot;&quot;</code>, you set an empty access control list (ACL), meaning no application is silently trusted to read it.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">Every read triggers a system dialog asking for your device passcode.</strong></p><p class="line-height-copy measure-wide f5">Passcode-protected secret storage without Swift code, signed entitlements, or an Apple Developer Program membership.</p><p class="line-height-copy measure-wide f5"><em class="font-italic">I thought: this is exactly 
what I need.</em></p><p class="line-height-copy measure-wide f5">Move all secrets to Keychain with <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">-T &quot;&quot;</code>, and the agent physically can't read them without me entering my passcode.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">The credentials are off disk, and the Keychain itself becomes the authorization gate.</strong></p><p class="line-height-copy measure-wide f5">So I migrated everything.</p><p class="line-height-copy measure-wide f5">Refresh tokens, API keys, client secrets, all into Keychain entries with <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">-T &quot;&quot;</code>.</p><p class="line-height-copy measure-wide f5">Settings stayed in JSON, but anything sensitive went behind the passcode wall.</p><p class="line-height-copy measure-wide f5"><em class="font-italic">It lasted about a day.</em></p><p class="line-height-copy measure-wide f5"><strong class="font-bold">Every CLI invocation triggered a macOS GUI dialog asking for my passcode.</strong></p><p class="line-height-copy measure-wide f5">The agent, running in a terminal, couldn't see or dismiss these dialogs.</p><p class="line-height-copy measure-wide f5">When I was SSHing in from my phone, the dialogs were invisible to me, too.</p><p class="line-height-copy measure-wide f5">Everything just hung.</p><p class="line-height-copy measure-wide f5">And it wasn't just one dialog.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">Searching for an email could trigger a dozen Keychain reads, sometimes in parallel.</strong></p><p class="line-height-copy measure-wide f5">Each one popped a separate passcode prompt.</p><p class="line-height-copy measure-wide f5">I was typing my passcode more than I was typing code.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">I compromised: I 
moved only the long-lived secrets (client ID, client secret, refresh token) into Keychain with <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">-T &quot;&quot;</code>.</strong></p><p class="line-height-copy measure-wide f5">The short-lived access token (valid for 1 hour) was stored on the filesystem in a JSON file.</p><p class="line-height-copy measure-wide f5">Once I approved the passcode dialog, the access token was minted, and the CLIs could run freely for an hour without any prompts.</p><p class="line-height-copy measure-wide f5">I thought this was a reasonable split:</p><ul class="pl3 pl4-m pl4-l"><li class="line-height-copy f5 mv1 measure-wide">The dangerous credentials were behind the passcode wall.</li><li class="line-height-copy f5 mv1 measure-wide">The access token on disk would expire soon anyway.</li></ul><p class="line-height-copy measure-wide f5">But I was back to square one.</p><p class="line-height-copy measure-wide f5"><code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">gdrivectl</code> could read Google Docs but not <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">.docx</code> files at the time (I didn't realize those were treated differently in the API).</p><p class="line-height-copy measure-wide f5">And in another task, the agent needed to read a <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">.docx</code>.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">It found the access token on disk and called the Google Drive API directly, skipping <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">gdrivectl</code> entirely.</strong></p><p class="line-height-copy measure-wide f5">The access token was short-lived, but the agent didn't care.</p><p class="line-height-copy measure-wide 
f5">It had a valid token right now, and that was enough.</p><p class="line-height-copy measure-wide f5">I ended up dropping the <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">-T &quot;&quot;</code> pretense altogether.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">The default ACL when you omit <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">-T</code> already trusts <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">/usr/bin/security</code>, which is what my CLIs use to read secrets.</strong></p><p class="line-height-copy measure-wide f5">I deleted the old items and re-added them without <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">-T &quot;&quot;</code>.</p><p class="line-height-copy measure-wide f5">The dialogs disappeared.</p><p class="line-height-copy measure-wide f5">The secrets were still in Keychain rather than flat files on disk, but without the passcode gate.</p><p class="line-height-copy measure-wide f5">Keychain handles storage well enough.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">I still had no answer for what actually protects the secrets from the agent.</strong></p><h2 class="f8 pt5 pb2 mt3" id="the-source-code-was-the-blueprint">The source code was the blueprint</h2><p class="line-height-copy measure-wide f5"><em class="font-italic">Then it happened again.</em></p><p class="line-height-copy measure-wide f5">I had just added spreadsheet editing to <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">gdrivectl</code>.</p><p class="line-height-copy measure-wide f5">It could insert columns, but I hadn't implemented insert-row yet.</p><p class="line-height-copy measure-wide f5">The agent needed to insert a row.</p><p class="line-height-copy measure-wide 
f5"><strong class="font-bold">It found <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">gdrivectl</code>, saw the command wasn't there, and decided to go around it.</strong></p><p class="line-height-copy measure-wide f5">It ran <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">security find-generic-password -s &quot;gdrivectl&quot;</code> to pull the refresh token straight from Keychain.</p><p class="line-height-copy measure-wide f5"><em class="font-italic">It never asked; it just did it!</em></p><p class="line-height-copy measure-wide f5">The attack chain was more subtle than &quot;agent calls <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">security</code>&quot;:</p><ol class="pl3 pl4-m pl4-l"><li class="line-height-copy f5 mv1 measure-wide">The agent listed the <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">gdrivectl</code> directory.</li><li class="line-height-copy f5 mv1 measure-wide">It saw <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">.js</code> files, <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">node_modules</code>, <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">package.json</code>.</li><li class="line-height-copy f5 mv1 measure-wide">It read the JavaScript source code.</li><li class="line-height-copy f5 mv1 measure-wide">It discovered the tool uses Keychain, found the exact service name, and understood the OAuth flow.</li><li class="line-height-copy f5 mv1 measure-wide">Then it called <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">security find-generic-password</code> with the service name it learned from the source.</li><li class="line-height-copy f5 mv1 
measure-wide">Then it called the Google Drive API directly, bypassing my CLI entirely.</li></ol><p class="line-height-copy measure-wide f5">The source code was the blueprint.</p><p class="line-height-copy measure-wide f5">Because <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">gdrivectl</code> is a Node.js tool with readable JS files, the agent got a complete roadmap: how auth works, where credentials are stored, what Keychain service name to query, and what API endpoints to call.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">A compiled Go or Rust binary would have been opaque.</strong></p><p class="line-height-copy measure-wide f5">But my tools were written in an interpreted language, and the agent could read every line.</p><p class="line-height-copy measure-wide f5">I mentioned this in a Telegram conversation with Alex Chng, <a href="https://habib0x.com/context-drift-how-i-talked-ai-agents-into-giving-up-their-secrets" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">who was sharing an article about &quot;context drift&quot;</a>, a technique for getting AI agents to abandon their safety boundaries.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">My reaction was that context drift isn't an &quot;attack&quot; at all.</strong></p><p class="line-height-copy measure-wide f5">I can trigger the same behaviour as a normal task, no special instructions needed.</p><p class="line-height-copy measure-wide f5">If the CLI is missing a command, the agent will try to bypass it.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">The more checks I put in place, the cleverer it gets at working around them.</strong></p><h2 class="f8 pt5 pb2 mt3" id="looking-for-solutions">Looking for solutions</h2><p class="line-height-copy measure-wide f5">I went looking for what other people had built.</p><p class="line-height-copy 
measure-wide f5">An <a href="https://news.ycombinator.com/item?id=47133055" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">HN thread about <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">enveil</code></a>, a tool for encrypting <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">.env</code> files to hide secrets from AI agents, had converged on an answer: encrypting the file is pointless because the agent can read secrets at runtime.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">The &quot;real&quot; answer, according to the thread, is a credential-injecting proxy: the agent never holds real credentials.</strong></p><p class="line-height-copy measure-wide f5">Instead, it holds a surrogate token, and a separate process sitting between the agent and the outside world swaps in real credentials before the request reaches the target service.</p><p class="line-height-copy measure-wide f5">I searched GitHub for implementations.</p><p class="line-height-copy measure-wide f5"><em class="font-italic">The landscape is almost empty.</em></p><p class="line-height-copy measure-wide f5"><a href="https://github.com/earendel-works/gondolin" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">Gondolin</a>, by mitsuhiko (the creator of Flask), is the only serious project.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">It spins up local QEMU micro-VMs with a TypeScript control plane.</strong></p><p class="line-height-copy measure-wide f5">Everything else is either Kubernetes-specific, a tiny proof-of-concept with seven stars, or commercial vapour with no public details.</p><p class="line-height-copy measure-wide f5">The proxy approach assumes the agent runs in an isolated environment, and the proxy sits at the boundary.</p><p class="line-height-copy measure-wide 
f5"><em class="font-italic">My problem is different.</em></p><p class="line-height-copy measure-wide f5"><strong class="font-bold">I need the agent to be me.</strong></p><p class="line-height-copy measure-wide f5">Same filesystem, same tools, same context.</p><p class="line-height-copy measure-wide f5">If I sandbox it, it can't do the job.</p><p class="line-height-copy measure-wide f5">It can't read my Gmail, sort my inbox, or run my CLIs.</p><p class="line-height-copy measure-wide f5">One commenter put it simply: <em class="font-italic">&quot;If you sandbox it, how is it going to sort out your inbox?&quot;</em></p><h2 class="f8 pt5 pb2 mt3" id="tightening-the-keychain">Tightening the Keychain</h2><p class="line-height-copy measure-wide f5"><strong class="font-bold">I tried to tighten the Keychain approach instead.</strong></p><p class="line-height-copy measure-wide f5">Each attempt failed for the same reason.</p><p class="line-height-copy measure-wide f5">Keychain access control lists (ACLs) let you restrict which programs can access an item without triggering a dialog.</p><p class="line-height-copy measure-wide f5">You set <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">-T /usr/local/bin/gmailctl</code> and only that binary gets silent access.</p><p class="line-height-copy measure-wide f5">The problem: my Node.js CLIs shell out to <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">/usr/bin/security</code> via <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">child_process.exec()</code>.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">The Keychain sees <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">security</code> as the accessor, not my CLI.</strong></p><p class="line-height-copy measure-wide f5">If <code class="font-mono f4 line-height-copy 
bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">security</code> is in the ACL, the agent can also call it silently.</p><p class="line-height-copy measure-wide f5">If it's not, my own CLI gets the dialog too.</p><p class="line-height-copy measure-wide f5">A custom compiled helper (Swift, about 30 lines, calling the Keychain C API directly) would let me set the ACL on that specific binary.</p><p class="line-height-copy measure-wide f5"><em class="font-italic">But it's security by obscurity.</em></p><p class="line-height-copy measure-wide f5">The agent runs <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">ls /usr/local/bin/</code>, finds the helper, and calls it directly.</p><p class="line-height-copy measure-wide f5">Touch ID gating means the Keychain item requires a fingerprint to read.</p><p class="line-height-copy measure-wide f5">The agent physically can't authenticate.</p><p class="line-height-copy measure-wide f5">But every CLI invocation triggers Touch ID.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">For <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">gmailctl</code>, running four or five commands in sequence means four or five fingerprint prompts.</strong></p><p class="line-height-copy measure-wide f5"><em class="font-italic">Unbearable.</em></p><p class="line-height-copy measure-wide f5">I looked into writing a background process that authenticates once with Touch ID and holds the session open for a few minutes (macOS calls this an <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">LAContext</code>).</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">But within that window, the agent can also talk to the background process.</strong></p><p class="line-height-copy measure-wide f5">Anything that's frictionless for me is frictionless for the agent.</p><p 
class="line-height-copy measure-wide f5"><strong class="font-bold">I was going in circles.</strong></p><ol class="pl3 pl4-m pl4-l"><li class="line-height-copy f5 mv1 measure-wide">Encrypt the files. The agent reads them at runtime.</li><li class="line-height-copy f5 mv1 measure-wide">Use a proxy. The agent calls the proxy.</li><li class="line-height-copy f5 mv1 measure-wide">Move to Keychain. The agent calls <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">security</code>.</li><li class="line-height-copy f5 mv1 measure-wide">Build a custom helper. The agent calls the helper.</li><li class="line-height-copy f5 mv1 measure-wide">Gate with Touch ID. The agent shares your TTL window.</li><li class="line-height-copy f5 mv1 measure-wide">Full sandbox. The agent can't do the job.</li></ol><p class="line-height-copy measure-wide f5">Every mitigation either gets bypassed because the agent runs as you, or locks the agent out so hard it can't do its work.</p><h2 class="f8 pt5 pb2 mt3" id="sandboxes-won-t-save-you">Sandboxes won't save you</h2><p class="line-height-copy measure-wide f5"><strong class="font-bold"><a href="https://tachyon.so/blog/sandboxes-wont-save-you" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">Sandboxes Won't Save You From OpenClaw</a></strong> by Aakash Japi confirmed this from a completely different angle.</p><p class="line-height-copy measure-wide f5">Every major agent incident in early 2026 (deleted inboxes, a $450k crypto loss, malware installs, blackmailing an OSS maintainer) involved third-party services the user explicitly granted access to.</p><p class="line-height-copy measure-wide f5">Not a single one was a filesystem escape.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">Sandboxes were irrelevant to all of them.</strong></p><p class="line-height-copy measure-wide f5">From the article:</p><blockquote class="pl3 mh2 bl bw2 b--primary-6 bg-primary-0 
pv1 ph4"><p class="line-height-copy measure-wide f5">There isn't a sandbox in the world that prevents this. Sandboxes are useful for isolating between workloads, but agents primarily need to be isolated from <em class="font-italic">you.</em></p></blockquote><p class="line-height-copy measure-wide f5">From the <a href="https://news.ycombinator.com/item?id=47154803" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">HN discussion</a>, <a href="https://news.ycombinator.com/item?id=47154803" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">jaunt7632</a> put it well:</p><blockquote class="pl3 mh2 bl bw2 b--primary-6 bg-primary-0 pv1 ph4"><p class="line-height-copy measure-wide f5">The scariest part isn't the sandbox escape. It's the actions that are technically within the sandbox's permissions but still destructive. Deleting emails, making API calls, and spending money through approved integrations. You can't sandbox away bad judgment when the agent has legitimate credentials.</p></blockquote><p class="line-height-copy measure-wide f5"><strong class="font-bold">The sandbox companies are selling what they can build, not what we need.</strong></p><p class="line-height-copy measure-wide f5">The demand side is clear:</p><ul class="pl3 pl4-m pl4-l"><li class="line-height-copy f5 mv1 measure-wide">Granular per-service permissions.</li><li class="line-height-copy f5 mv1 measure-wide">Per-contact approval for email.</li><li class="line-height-copy f5 mv1 measure-wide">Single-use virtual card numbers for payments.</li><li class="line-height-copy f5 mv1 measure-wide">Scoped tokens for APIs.</li></ul><p class="line-height-copy measure-wide f5">But the supply side barely exists.</p><p class="line-height-copy measure-wide f5">OAuth is far too coarse.</p><p class="line-height-copy measure-wide f5">Gmail has &quot;send emails&quot; as a single permission.</p><p class="line-height-copy measure-wide f5">GitHub 
has &quot;make pull requests.&quot;</p><p class="line-height-copy measure-wide f5">Payments have basically nothing.</p><h2 class="f8 pt5 pb2 mt3" id="the-native-addon-experiment">The native addon experiment</h2><p class="line-height-copy measure-wide f5"><strong class="font-bold">At this point, I was almost convinced the proxy was the right approach.</strong></p><p class="line-height-copy measure-wide f5"><em class="font-italic">But I wondered: instead of an external proxy sitting between the agent and the internet, could I build something internal?</em></p><p class="line-height-copy measure-wide f5">A piece of compiled native code, running inside the same process, that holds credentials in memory the agent's scripting layer can't reach.</p><p class="line-height-copy measure-wide f5">I built a native Node.js addon in Objective-C, about 250 lines of code.</p><p class="line-height-copy measure-wide f5">It read credentials from macOS Keychain using the C API and injected authentication headers via an undici interceptor (undici is Node's built-in HTTP client).</p><p class="line-height-copy measure-wide f5">The refresh token and client secret never entered JavaScript memory.</p><p class="line-height-copy measure-wide f5">Only the short-lived access token appeared as a header value on outgoing requests.</p><p class="line-height-copy measure-wide f5">I got it working end-to-end.</p><p class="line-height-copy measure-wide f5">Keychain read, token refresh via Apple's networking API, header injection, all running in compiled native code.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">Then I mapped the residual attack surface, and the agent could still:</strong></p><ul class="pl3 pl4-m pl4-l"><li class="line-height-copy f5 mv1 measure-wide">Run <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">security find-generic-password</code> from the shell.</li><li class="line-height-copy f5 mv1 
measure-wide">Write its own native addon doing the same thing.</li><li class="line-height-copy f5 mv1 measure-wide">Edit the compiled JavaScript to log the Bearer header on each request.</li><li class="line-height-copy f5 mv1 measure-wide">Set <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">HTTP_PROXY</code> and route requests through an attacker-controlled endpoint, leaking the access token.</li><li class="line-height-copy f5 mv1 measure-wide">Read the source code to learn the service name and account name, which enables all of the above.</li></ul><p class="line-height-copy measure-wide f5">I reverted everything.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">You can't hide a secret from a process that runs with the same privileges as the secret's owner.</strong></p><h2 class="f8 pt5 pb2 mt3" id="the-proxy-reconsidered">The proxy, reconsidered</h2><p class="line-height-copy measure-wide f5">I ended up somewhere different from where I started.</p><p class="line-height-copy measure-wide f5">I'd dismissed the proxy pattern too quickly.</p><p class="line-height-copy measure-wide f5">The HN thread had it right.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">I just misunderstood what the proxy was for.</strong></p><p class="line-height-copy measure-wide f5">The proxy removes credentials from the machine entirely.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">The refresh token and client secret move to a remote server that the agent can't SSH into.</strong></p><p class="line-height-copy measure-wide f5">The agent gets a simple API key to talk to the proxy.</p><p class="line-height-copy measure-wide f5">If that key leaks, the worst the agent can do is call the proxy API, which is exactly what it would do through the CLI anyway.</p><p class="line-height-copy measure-wide f5">The key is revocable and scoped.</p><p class="line-height-copy 
measure-wide f5"><strong class="font-bold">Each tool gets a dedicated proxy with a narrow API surface, exposing only the operations it actually needs.</strong></p><p class="line-height-copy measure-wide f5">The CLI still has a role: it encodes behaviour and workflows.</p><p class="line-height-copy measure-wide f5">The proxy holds credentials and enforces boundaries.</p><p class="line-height-copy measure-wide f5"><strong class="font-bold">Every workaround I tried was fighting the same thing: the agent runs as you, with the same UID, same permissions, same everything.</strong></p><p class="line-height-copy measure-wide f5">If the OS could tell the difference between &quot;you at the keyboard&quot; and &quot;agent acting on your behalf,&quot; most of these problems would go away.</p><p class="line-height-copy measure-wide f5">Security researchers have a name for this model (capability-based security, where each process gets only the specific permissions it needs), but no mainstream OS implements it.</p><p class="line-height-copy measure-wide f5">The Unix permission model ties everything to your user account, and the agent is your user account.</p><p class="line-height-copy measure-wide f5">Until that changes, I won't try to make the tool opaque.</p><p class="line-height-copy measure-wide f5">I will make it the only path to credentials and narrow the path.</p>]]></content>
  <author><name>Daniele Polencic</name></author>
</entry>
<entry>
  <title>Streaming Zod: How Tambo Actually Works</title>
  <link href="https://danielepolencic.com/streaming-zod-tambo"/>
  <id>https://danielepolencic.com/streaming-zod-tambo</id>
  <published>2026-02-21T00:00:00Z</published>
  <updated>2026-02-21T00:00:00Z</updated>
  <content type="html"><![CDATA[<p class="line-height-copy measure-wide f5"><strong class="font-bold"><a href="https://x.com/colinhacks/status/2021327454837801184" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">Looks like they've hacked Zod to do validation on partial/streaming data. Very clever.</a></strong> by Colin Hacks (@colinhacks), creator of Zod.</p><p class="line-height-copy measure-wide f5">Colin tweeted about <a href="https://tambo.co" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">Tambo</a>, a React toolkit for generative UIs that streams structured data from LLMs into React components. He claims they found a way to use Zod for validating partial, streaming data.</p><p class="line-height-copy measure-wide f5">My first thought was: how does that work?</p><p class="line-height-copy measure-wide f5">Streaming schema validation seems to require a significant change to how Zod operates.</p><p class="line-height-copy measure-wide f5">Maybe it would need something like a SAX-style JSON parser built into the schema?</p><p class="line-height-copy measure-wide f5">I looked through <a href="https://github.com/tambo-ai/tambo" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">the source code</a> and realized there was a way to simplify it.</p><p class="line-height-copy measure-wide f5">I also built a <a href="https://danielepolencic.github.io/tambo-demo/" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">minimal demo</a> to show how it works.</p><div class="ph2"></div><p class="line-height-copy measure-wide f5">The real answer is actually simpler than I expected.</p><p class="line-height-copy measure-wide f5">Zod isn't used at all during streaming.</p><p class="line-height-copy measure-wide f5">It's only involved at the start, when schemas are converted to JSON Schema using <code class="font-mono f4 
line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">zod-to-json-schema</code>.</p><p class="line-height-copy measure-wide f5">After that, Zod isn't needed anymore.</p><p class="line-height-copy measure-wide f5">There are two ways streaming happens:</p><ol class="pl3 pl4-m pl4-l"><li class="line-height-copy f5 mv1 measure-wide">Tool call arguments: String deltas from the LLM are collected and parsed with the <a href="https://www.npmjs.com/package/partial-json" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 "><code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">partial-json</code></a> library on every chunk. Then, a function called <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">unstrictify</code> removes null values in OpenAI's structured output mode, using the JSON Schema (not Zod).</li><li class="line-height-copy f5 mv1 measure-wide">Component props: The backend processes the LLM output and sends JSON Patch (RFC 6902) operations to the client using <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">fast-json-patch</code>. 
The client doesn't parse partial JSON for props.</li></ol><p class="line-height-copy measure-wide f5">So, there's no Zod hacking involved.</p><p class="line-height-copy measure-wide f5">The so-called &quot;streaming validation&quot; just uses <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">partial-json</code> (which closes open brackets and quotes) and converts schemas to JSON Schema at the start.</p><p class="line-height-copy measure-wide f5">A more interesting question: why does Tambo use JSON Patch and a backend for component props when they already use <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">partial-json</code> on the client for tool call arguments?</p><p class="line-height-copy measure-wide f5">The backend sits between the LLM and the client, finds completed key/value pairs, and sends clean patch operations.</p><p class="line-height-copy measure-wide f5">This avoids the problem of &quot;garbage intermediate keys,&quot; where <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">partial-json</code> can give you cut-off keys like <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">{&quot;da&quot;: &quot;&quot;}</code> if the stream is mid-key.</p><p class="line-height-copy measure-wide f5">But could this logic just run on the client instead?</p><p class="line-height-copy measure-wide f5">I think it could.</p><p class="line-height-copy measure-wide f5">Using a backend is more of an architectural decision than a technical requirement.</p><p class="line-height-copy measure-wide f5">Tambo Cloud uses this setup as part of its revenue model, and the backend is already there for storing conversations and managing agents.</p><p class="line-height-copy measure-wide f5">Another question I had was: why use patches at all? 
Couldn't component props use the same <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">partial-json</code> method as tool call arguments?</p><p class="line-height-copy measure-wide f5">It all boils down to this:</p><div class="mv4 mv5-l"><header class="bg-shadow-2 flex pv2 pl1 br--top rounded-2 position-relative"><div class="w1 h1 ml1 rounded-100 bg-dark-red"></div><div class="w1 h1 ml1 rounded-100 bg-green"></div><div class="w1 h1 ml1 rounded-100 bg-yellow"></div></header><pre class="code-light-theme position-relative overflow-auto mv0 rounded-2 br--bottom"><code class="font-mono line-height-copy"><span class="standard">Zod schema -> JSON Schema -> send to LLM as tool definition.
LLM streams JSON -> accumulate -> partial-json parse -> pass as props</span></code></pre></div><p class="line-height-copy measure-wide f5">The tricky part: <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">partial-json</code> can't tell whether a value is still being written or has finished.</p><p class="line-height-copy measure-wide f5">For example, with a URL like <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">&quot;https://upload.wiki&quot;</code>, it just closes the string, leaving you with a broken URL that would cause a 404 if you tried to render it in an <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">&lt;img></code>.</p><table class="table w-100 f4 mv3 mv5-l"><thead><tr><th class="bg-tint-2 pa2 ba bw2 b--white ">Type</th><th class="bg-tint-2 pa2 ba bw2 b--white ">Streamable?</th><th class="bg-tint-2 pa2 ba bw2 b--white ">Why</th></tr></thead><tbody class="line-height-copy"><tr><td class="bb bw1 b--shadow-5 pa3 ">Long strings</td><td class="bb bw1 b--shadow-5 pa3 ">Yes</td><td class="bb bw1 b--shadow-5 pa3 ">Every prefix is renderable</td></tr><tr><td class="bb bw1 b--shadow-5 pa3 ">Short strings</td><td class="bb bw1 b--shadow-5 pa3 ">Mostly</td><td class="bb bw1 b--shadow-5 pa3 ">&quot;Ad&quot; is fine to show, it'll grow</td></tr><tr><td class="bb bw1 b--shadow-5 pa3 ">URLs</td><td class="bb bw1 b--shadow-5 pa3 ">No</td><td class="bb bw1 b--shadow-5 pa3 ">Useless until complete</td></tr><tr><td class="bb bw1 b--shadow-5 pa3 ">Numbers</td><td class="bb bw1 b--shadow-5 pa3 ">No</td><td class="bb bw1 b--shadow-5 pa3 "><code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">12</code> vs <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">120</code> vs <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 
font-normal word-break">1200</code> — you can't know</td></tr><tr><td class="bb bw1 b--shadow-5 pa3 ">Booleans</td><td class="bb bw1 b--shadow-5 pa3 ">No</td><td class="bb bw1 b--shadow-5 pa3 "><code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">tru</code> is not <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">true</code></td></tr><tr><td class="bb bw1 b--shadow-5 pa3 ">Enums</td><td class="bb bw1 b--shadow-5 pa3 ">No</td><td class="bb bw1 b--shadow-5 pa3 "><code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">&quot;pen&quot;</code> is not <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">&quot;pending&quot;</code></td></tr><tr><td class="bb bw1 b--shadow-5 pa3 ">Array elements</td><td class="bb bw1 b--shadow-5 pa3 ">Partially</td><td class="bb bw1 b--shadow-5 pa3 ">Settled elements are safe, last one is uncertain</td></tr></tbody></table><p class="line-height-copy measure-wide f5">This is the real unsolved problem: it's not about streaming validation, but about knowing when a streaming JSON value is &quot;complete enough&quot; to render.</p><p class="line-height-copy measure-wide f5">The Zod schema already shows you how to stream.</p><p class="line-height-copy measure-wide f5">The schema has enough type information to automatically figure out a streaming strategy. <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">z.string()</code> is streamable, show partial text as it arrives. <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">z.url()</code> becomes <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">{ format: &quot;uri&quot; }</code> in JSON Schema, atomic, wait until the value stabilises. 
<code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">z.number()</code>, <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">z.boolean()</code>, and <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">z.enum()</code> are all atomic. <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">z.array()</code> should emit all-but-last elements, holding back the trailing one which may be mid-stream.</p><p class="line-height-copy measure-wide f5">I made a <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">deriveStreamingStrategy(jsonSchema)</code> function that analyzes the converted JSON Schema and generates a strategy map for each field.</p><p class="line-height-copy measure-wide f5">Then, a <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">filterProps(partial)</code> function decides what gets sent to the component. Streamable fields pass through immediately. Atomic fields are held back until the value stops changing between consecutive parses, meaning the closing <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">&quot;</code> was seen. 
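</p><p class="line-height-copy measure-wide f5">As a rough sketch (my reconstruction, not the demo's exact code), the pair could look like this in plain JavaScript, assuming a JSON Schema produced by <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">zod-to-json-schema</code> and plain objects coming out of <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">partial-json</code> on each chunk:</p>

```javascript
// Sketch only: names and heuristics are reconstructed, not Tambo's code.

// Map each top-level property of a JSON Schema to a streaming strategy.
function deriveStreamingStrategy(jsonSchema) {
  const strategy = {};
  for (const [key, prop] of Object.entries(jsonSchema.properties ?? {})) {
    if (prop.type === 'string' && !prop.format && !prop.enum) {
      strategy[key] = 'streamable'; // every prefix of a plain string is renderable
    } else if (prop.type === 'array') {
      strategy[key] = 'array'; // settled elements are safe, the last one is not
    } else {
      strategy[key] = 'atomic'; // URLs, numbers, booleans, enums: wait for the full value
    }
  }
  return strategy;
}

// Keep only the props that are safe to render right now.
// `current` and `previous` are consecutive partial-json parses of the stream.
function filterProps(strategy, current, previous = {}) {
  const safe = {};
  for (const [key, value] of Object.entries(current)) {
    switch (strategy[key]) {
      case 'streamable':
        safe[key] = value; // pass partial text straight through
        break;
      case 'array':
        if (Array.isArray(value) && value.length > 1) {
          safe[key] = value.slice(0, -1); // hold back the possibly mid-stream tail
        }
        break;
      case 'atomic':
        // Unchanged across two consecutive parses means the closing
        // delimiter was seen: the stream has moved past this value.
        if (key in previous && previous[key] === value) safe[key] = value;
        break;
    }
  }
  return safe;
}

// Example: a profile card schema, mid-stream.
const strategy = deriveStreamingStrategy({
  type: 'object',
  properties: {
    bio: { type: 'string' },
    avatar: { type: 'string', format: 'uri' },
    tags: { type: 'array', items: { type: 'string' } },
  },
});
const prev = { bio: 'Hi, I am Daniele', avatar: 'https://upload.wiki' };
const curr = { bio: 'Hi, I am Daniele', avatar: 'https://upload.wikimedia.org/a.png', tags: ['dev'] };
filterProps(strategy, curr, prev);
// → { bio: 'Hi, I am Daniele' }: the URL just changed, and the array has no settled elements yet
```

<p class="line-height-copy measure-wide f5">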
Array fields emit all elements except the last, which may be mid-stream.</p><p class="line-height-copy measure-wide f5">The render function becomes simple; it just displays whatever it gets.</p><p class="line-height-copy measure-wide f5">All the logic lives in the filter, which is based on the schema.</p><p class="line-height-copy measure-wide f5">A SAX-style JSON parser would handle this more cleanly.</p><p class="line-height-copy measure-wide f5">A SAX parser only triggers events when a JSON value is fully complete, like when a closing quote is seen or a number ends with a comma or bracket.</p><p class="line-height-copy measure-wide f5">The <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">filterProps</code> approach is a pragmatic workaround for using <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">partial-json</code>, which isn't a SAX parser.</p><p class="line-height-copy measure-wide f5">The ideal solution would be to use a SAX JSON parser, emit completed paths, validate each one against the schema as you go, and then emit a patch.</p><p class="line-height-copy measure-wide f5">That last step, validating the schema incrementally for each path, doesn't exist yet and would be a new feature.</p><p class="line-height-copy measure-wide f5"><a href="http://oboejs.com/" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">oboe.js</a> was the best SAX JSON parser for JavaScript, but it's archived.</p><p class="line-height-copy measure-wide f5"><a href="https://www.npmjs.com/package/jsonriver" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">jsonriver</a> is the active alternative but lacks Oboe's pattern matching.</p><p class="line-height-copy measure-wide f5">I put together a <a href="https://danielepolencic.github.io/tambo-demo/" target="_blank" rel="noreferrer" class="link primary-10 
text-underline hover-primary-8 ">demo with Claude</a> that shows all this in action. It converts a Zod schema to JSON Schema, simulates LLM streaming character-by-character, and runs <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">partial-json</code> on every chunk with a schema-derived strategy. The ProfileCard renders step by step: text streams in, URLs wait until they're complete, and tags show up only when they're ready.</p><p class="line-height-copy measure-wide f5">Just <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">zod</code> + <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">zod-to-json-schema</code> + <code class="font-mono f4 line-height-copy bg-shadow-2 rounded-2 pv1 ph2 font-normal word-break">partial-json</code> from the esm.sh CDN.</p>]]></content>
  <author><name>Daniele Polencic</name></author>
</entry>
<entry>
  <title>Software Is Cheap Now</title>
  <link href="https://danielepolencic.com/software-is-cheap-now"/>
  <id>https://danielepolencic.com/software-is-cheap-now</id>
  <published>2026-02-15T00:00:00Z</published>
  <updated>2026-02-15T00:00:00Z</updated>
  <content type="html"><![CDATA[<p class="line-height-copy measure-wide f5"><strong class="font-bold"><a href="https://x.com/thorstenball/status/2022310010391302259" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">I am the bottleneck now</a></strong> by Thorsten Ball (@thorstenball).</p><p class="line-height-copy measure-wide f5">Thorsten shared a story about receiving a bug report on Slack. He took a screenshot, uploaded it to Codex, and had the fix completed in 5 minutes. The code looked solid, all tests passed, and he pushed.</p><p class="line-height-copy measure-wide f5">Then he realised he was the bottleneck. The process could have gone directly from Slack to Codex to a review thread, without him in the middle.</p><p class="line-height-copy measure-wide f5">His point is that ticketing, triage, and sprints exist because human engineers are costly and have limited time. If that goes away, the whole process needs to change.</p><p class="line-height-copy measure-wide f5">I agree with the general idea, but saying &quot;I'm the bottleneck&quot; feels like an exaggeration.</p><p class="line-height-copy measure-wide f5">Even if the LLM eventually becomes smarter than me, which seems likely, it still lacks morals, taste, and real-world consequences.</p><p class="line-height-copy measure-wide f5">When a human ships a bad deployment, they worry about it afterward.</p><p class="line-height-copy measure-wide f5">You’re not really a bottleneck; you’re the only one in the process who faces the consequences.</p><p class="line-height-copy measure-wide f5">Humans will still be part of the process, maybe in a different role, but they’ll still be there.</p><div class="ph2"></div><p class="line-height-copy measure-wide f5">The economic point is the one I agree with most.</p><p class="line-height-copy measure-wide f5">For decades, the industry has focused on making the most of limited engineering time through practices like agile, sprint 
planning, velocity tracking, and story points.</p><p class="line-height-copy measure-wide f5">All of these methods assume that writing software is expensive. If that cost drops to almost nothing, we’ll need to rethink these approaches.</p><p class="line-height-copy measure-wide f5">Ironically, when I began my career, everyone was pushing for agile and criticizing waterfall methods.</p><p class="line-height-copy measure-wide f5">Writing specs before any code felt old-fashioned.</p><p class="line-height-copy measure-wide f5">How can you write a specification if you haven’t explored the problem by coding first?</p><p class="line-height-copy measure-wide f5">Agile just felt right.</p><p class="line-height-copy measure-wide f5">It also helped reduce risk: you write some code, make some progress, and then decide whether to roll back or keep going.</p><p class="line-height-copy measure-wide f5">With waterfall, you only find out about bad decisions at the end.</p><p class="line-height-copy measure-wide f5">But things are different now. The cost of writing software is approaching zero, and you can iterate much faster and discard versions easily, as Thorsten points out in the video.</p><p class="line-height-copy measure-wide f5">So, should we go back to writing more detailed app specs?</p><p class="line-height-copy measure-wide f5">I think yes.</p><p class="line-height-copy measure-wide f5"><a href="https://humanwhocodes.com/blog/2026/02/artifacts-ai-assisted-programming/" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">Nicholas Zakas uses a similar approach</a>, spending a lot of time creating enterprise artifacts (PRDs, ADRs, TDDs) and updating them as the project changes. 
At first, I thought it was overkill, but now I’m rethinking that.</p><p class="line-height-copy measure-wide f5">Picture this instead.</p><p class="line-height-copy measure-wide f5">You write the specification, and the model builds the app.</p><p class="line-height-copy measure-wide f5">But the app isn’t quite what you wanted.</p><p class="line-height-copy measure-wide f5">Instead of fixing the app, you update the spec, adjust the architecture, and rebuild it from scratch.</p><p class="line-height-copy measure-wide f5">If you think of the model as a compiler, then the input is all that really matters.</p><p class="line-height-copy measure-wide f5">In three years, I probably won’t care about NPM dependencies or which framework version I used.</p><p class="line-height-copy measure-wide f5">All I’ll need is my old specification to rebuild everything.</p><p class="line-height-copy measure-wide f5">Writing detailed specs up front and keeping them up to date actually makes sense.</p><p class="line-height-copy measure-wide f5">It could end up being the only thing that really lasts.</p><p class="line-height-copy measure-wide f5">The code is disposable. 
The specification is the product.</p><p class="line-height-copy measure-wide f5">But what form should the specification take?</p><p class="line-height-copy measure-wide f5"><a href="https://www.linkedin.com/posts/jonesax_ive-been-asked-a-few-times-recently-whether-share-7427239408459886592-tJwy" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">Alex Jones, a Principal Engineer at AWS and creator of K8sGPT, argues that we need more than the usual verifiers.</a></p><p class="line-height-copy measure-wide f5">As agentic systems become more autonomous, we’ll need invariants, constraints, and properties supported by mathematical proofs.</p><p class="line-height-copy measure-wide f5">It’s not enough to just have observability or policies.</p><p class="line-height-copy measure-wide f5">He calls this idea &quot;Provable Autonomy.&quot;</p><p class="line-height-copy measure-wide f5">There's a whole world of formal specification languages: TLA+, Event-B, Dafny, Alloy, Gherkin. But from what I've seen, specs without verification sit in an unstable middle.</p><p class="line-height-copy measure-wide f5">They’re structured enough to be a maintenance burden.</p><p class="line-height-copy measure-wide f5">But they’re not verified enough to guarantee anything, so you end up with text that the model might just ignore.</p>]]></content>
  <author><name>Daniele Polencic</name></author>
</entry>
<entry>
  <title>Why Talking to LLMs Has Improved My Thinking</title>
  <link href="https://danielepolencic.com/llms-improve-thinking"/>
  <id>https://danielepolencic.com/llms-improve-thinking</id>
  <published>2026-02-10T00:00:00Z</published>
  <updated>2026-02-10T00:00:00Z</updated>
  <content type="html"><![CDATA[<p class="line-height-copy measure-wide f5"><strong class="font-bold"><a href="https://philipotoole.com/why-talking-to-llms-has-improved-my-thinking" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">Why Talking to LLMs Has Improved My Thinking</a></strong> by Philip O'Toole, creator of rqlite (<a href="https://news.ycombinator.com/item?id=46728197" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">via HN</a>).</p><p class="line-height-copy measure-wide f5">Philip's thesis: LLMs help articulate tacit knowledge, the understanding we have but can't easily put into words. This isn't learning new things, it's recognition: mapping latent structure to language.</p><blockquote class="pl3 mh2 bl bw2 b--primary-6 bg-primary-0 pv1 ph4"><p class="line-height-copy measure-wide f5">As programmers and developers, we build up a lot of understanding that never quite becomes explicit. This is not a failure. It is how experience operates. The brain compresses experience into patterns that are efficient for action, not for speech. Those patterns are real, but they are not stored in sentences.</p></blockquote><p class="line-height-copy measure-wide f5">This resonates. I already have the knowledge to solve most problems I encounter, I just can't always articulate the path. The LLM helps me find the words for what I already know.</p><blockquote class="pl3 mh2 bl bw2 b--primary-6 bg-primary-0 pv1 ph4"><p class="line-height-copy measure-wide f5">The problem is that reflection, planning, and teaching all require language. If you cannot express an idea, you cannot easily inspect it or improve it.</p></blockquote><blockquote class="pl3 mh2 bl bw2 b--primary-6 bg-primary-0 pv1 ph4"><p class="line-height-copy measure-wide f5">Once an idea is written down, it becomes easier to work with. Vague intuitions turn into named distinctions. Implicit assumptions become visible. 
At that point you can test them, negate them, or refine them.</p></blockquote><p class="line-height-copy measure-wide f5">The other thing I've noticed: even when the LLM gets it wrong, the reaction from being wrong helps distill the idea. You read its response and think &quot;no, that's not quite it&quot;—and suddenly you know what <em class="font-italic">it</em> actually is.</p><blockquote class="pl3 mh2 bl bw2 b--primary-6 bg-primary-0 pv1 ph4"><p class="line-height-copy measure-wide f5">This is not new. Writing has always done this for me. What is different is the speed.</p></blockquote><p class="line-height-copy measure-wide f5">Exactly. Writing has always been my tool for thinking, but it's slow. With an LLM, the loop between &quot;I vaguely know this&quot; and &quot;now I can express it clearly&quot; tightens dramatically.</p><blockquote class="pl3 mh2 bl bw2 b--primary-6 bg-primary-0 pv1 ph4"><p class="line-height-copy measure-wide f5">It is improving the interface between my thinking and language. Since reasoning depends heavily on what one can represent explicitly, that improvement can feel like a real increase in clarity.</p></blockquote><p class="line-height-copy measure-wide f5">I hadn't paid attention to this framing before—the LLM as an interface improvement, not a knowledge source.</p><div class="ph2"></div><p class="line-height-copy measure-wide f5">From the HN discussion, <a href="https://news.ycombinator.com/item?id=46728197" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">firefoxd</a> pushes back:</p><blockquote class="pl3 mh2 bl bw2 b--primary-6 bg-primary-0 pv1 ph4"><p class="line-height-copy measure-wide f5">Not to dismiss other people's experience, but thinking improves thinking. People tend to forget that you can ask yourself questions and try to answer them. There is such thing as recursive thinking where you end up with a new thought you didn't have before you started. 
Don't dismiss this superpower you have in your own head.</p></blockquote><p class="line-height-copy measure-wide f5">And <a href="https://news.ycombinator.com/item?id=46728197" target="_blank" rel="noreferrer" class="link primary-10 text-underline hover-primary-8 ">john01dav</a> adds:</p><blockquote class="pl3 mh2 bl bw2 b--primary-6 bg-primary-0 pv1 ph4"><p class="line-height-copy measure-wide f5">In my experience LLMs offer two advantages over private thinking: 1. They have access to a vast array of extremely well indexed knowledge and can tell me about things that I'd never have found before. 2. They are able to respond instantly and engagingly, while working on any topic, which helps fight fatigue, at least for me.</p></blockquote><p class="line-height-copy measure-wide f5">I think the combination is a killer—it helps me introspect my thoughts <em class="font-italic">and</em> offers extra knowledge. That's why writing books or articles is so much easier now.</p><p class="line-height-copy measure-wide f5">Writing isn't about finding the words. It's about the journey and exploration and asking questions. Writing becomes documenting and curating the journey.</p><p class="line-height-copy measure-wide f5">For the reader: they can also get this info from the docs or an LLM, but it's the journey that's missing.</p><div class="ph2"></div><p class="line-height-copy measure-wide f5">firefoxd's point about &quot;recursive thinking&quot; is valid but misses something: the LLM provides <strong class="font-bold">resistance</strong>.</p><p class="line-height-copy measure-wide f5">Thinking alone can loop. An external response, even an imperfect one, creates friction that forces your thoughts into new shapes.</p><p class="line-height-copy measure-wide f5">It's the same reason rubber ducking works in software development. The solution is usually already inside you. You just need to externalise it. The rubber duck doesn't solve your problem; the act of explaining does. 
LLMs take this further: they're a rubber duck that talks back, occasionally pushes you in a new direction, and never gets tired of listening.</p>]]></content>
  <author><name>Daniele Polencic</name></author>
</entry>
</feed>