We built the product at her desk

Most of my work happens in code. But the best product decision I made this month happened at someone else’s desk.
My last piece was about what your legal ops dashboard is actually measuring. This one is about what you can only learn by being in the room.
Hi btw, I’m Paul. I build AI agents for in-house legal teams at Flank.

The thing that still surprises me

Earlier this month I spent a day in a room with our CTO, Martin, and a legal service provider. Two people on their side: the founder and one of the practitioners who actually does the work. Two laptops on ours. We walked through their manual workflows, watched them use the product, and by the end of the day we had designed and validated a meaningful change to how our supervision experience works. The practitioner used the new version inside the same session, and told us whether it solved the problem we had spotted twenty minutes earlier.

We did not ship to production in the room. That is not what changed. What changed is that the design and learning loop, the part that used to take weeks of customer interviews, design rounds, internal reviews and prototypes, closed in a single afternoon. The productionisation that sits after it is a matter of days. Integration, security review, the standards an enterprise product has to meet. That part is easy by comparison.

I want to write about what that day taught me, because I think the collapse of the design and learning loop is a bigger shift in how product work gets done than the question of how quickly any single change reaches production.

⚡ The thing that still surprises me

After years of building products with lawyers, the part that still surprises me is how much I learn from sitting next to one and watching them work.

You can read transcripts of customer calls, scroll through the Slack channel where they ask questions, study the requests they have raised on the roadmap. None of it tells you what an hour next to a lawyer at her desk tells you. The friction is not in the artefacts. It is in the small motions between them. The pause before approving an agent’s output. The way she keeps the original contract open in one window while she scrolls back and forth looking for the bit that anchors the change in front of her. The half-second of hesitation as she tries to remember which clause governs the redline the agent has just proposed.

I find that the highest-value insights almost never come from what the customer says they want. They come from what she does without saying anything at all. The hand briefly hovering over the approve button. The scroll back to page nineteen, then back to page two, then back to the supervision view. These are not feature requests. They are the seams in the workflow, and you can only see them if you are in the room.

🔍 What we actually changed

The specific friction we hit that day was on supervision. The agent’s work was good. The problem was that the lawyer reviewing it could not easily see the context she needed to approve it. To validate a single proposed change, she was hunting through a long, dense contract for the surrounding clauses, the defined terms, the cross-references, the bits that determined whether the agent’s call was right. The context she needed to make the judgement was scattered across the document. The supervision view was not bringing it to her. So the “approval” was, in practice, a hunt.

We sat next to her and built a working surface in flight. The CTO had the codebase open and the model running alongside him. Within the session we had a version that anchored the supervisor in the relevant context, surfacing the parts of the document she needed in the place she needed them, at the moment she needed them. The approval stopped being a hunt and became a glance. Same agent, same output. A completely different supervision experience.
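For illustration only, here is a minimal Python sketch of what "bringing the context to the supervisor" could look like. It is not the code from the session. The names `ReviewContext` and `gather_context`, the clause numbering scheme and the sample contract text are all assumptions made up for this example. Given the clause an agent has proposed to change, it collects the defined terms that clause uses and the clauses it cross-references, so they can be shown next to the redline instead of being hunted for.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ReviewContext:
    """Hypothetical bundle of context shown beside a proposed change."""
    defined_terms: dict[str, str] = field(default_factory=dict)
    cross_references: dict[str, str] = field(default_factory=dict)


def gather_context(
    clauses: dict[str, str],
    definitions: dict[str, str],
    clause_id: str,
) -> ReviewContext:
    """Collect defined terms and cross-referenced clauses for one clause."""
    text = clauses[clause_id]

    # Defined terms that appear in the clause under review (whole-word match).
    terms = {
        term: meaning
        for term, meaning in definitions.items()
        if re.search(rf"\b{re.escape(term)}\b", text)
    }

    # Clauses cited as "clause 9.4" or "Clause 4", excluding the clause itself.
    refs = {}
    for ref in re.findall(r"[Cc]lause\s+(\d+(?:\.\d+)*)", text):
        if ref != clause_id and ref in clauses:
            refs[ref] = clauses[ref]

    return ReviewContext(defined_terms=terms, cross_references=refs)


# Made-up sample data, purely for illustration.
clauses = {
    "2.1": "The Supplier shall deliver the Services in accordance with clause 9.4.",
    "9.4": "Service levels are set out in Schedule 2.",
    "19.2": "Liability is capped as described in clause 2.1.",
}
definitions = {
    "Supplier": "Acme Ltd",
    "Services": "the services described in Schedule 1",
    "Customer": "Beta GmbH",
}

context = gather_context(clauses, definitions, "2.1")
print(sorted(context.defined_terms))
print(context.cross_references)
```

A real contract would need far more than two regular expressions (nested definitions, schedules, references by heading rather than number). The point of the sketch is only the shape of the idea: the context is assembled around the change, rather than the supervisor assembling it herself.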

What we built that afternoon was not the production version. It was the surface through which we could test the assumption in real life, on real work, with the person whose problem it was. That is the part that used to be slow. In the old shape of product work, the equivalent change would have moved through a workshop, a follow-up document, a Linear ticket, a sprint conversation, a build, a release, and eventually a user test where the customer’s memory of the original friction had already faded. You would ship the fix and hear “oh yes, that thing, we worked around it.” Most of those weeks were not engineering. They were the design and learning loop, the slow back-and-forth between thinking we had understood the problem and actually finding out.

That loop has collapsed. The customer’s first experience of the change happened in the room, on her own work, while we watched. When the first version was not quite right, we tweaked it. When the second one landed, she said so, and we moved on. By the time we left, we did not need another round of customer interviews, design reviews or prototypes to know what to build for production. We had already tested it on the work that actually matters.

Vibe coding next to the customer

I do not want to overclaim what vibe coding can do. The thing we built in the room was a prototype on a working surface, not a production release. Shipping it properly still requires the things shipping it properly has always required: integration, enterprise security review, the standards we hold ourselves to before anything touches a customer’s environment. That work is real, but it is also relatively quick and well understood. The bottleneck used to be everything that came before it. The design rounds, the assumption testing, the rebuilds after the user research told us we had read the problem wrong. That bottleneck is what has compressed.

What is now genuinely iterable in the moment, with the customer in the room, is the design itself. The surface the supervisor sees, the way context is presented to the approver, the wording of a follow-up question. The customer co-creates the design in the moment it is needed, on real work, and gives feedback you cannot get any other way.

The unlock is not the tooling on its own. It is what the tooling does to the value of being physically present. When the design and learning loop is short enough to close inside a single session, in-person time becomes the densest learning surface a product team has access to. You do not need to schedule a separate research sprint. You do not need to go away for three weeks to design a v1. The v1 happens at the desk. The customer’s reaction to the v1 happens five minutes later. The v2 happens before the meeting ends. The production version follows a few days later, because by then you already know exactly what you are building.

Why I think this is now our operating model

I keep hearing the argument that AI makes physical proximity less important. The reasoning goes: if agents can synthesise customer calls and surface patterns automatically, why does it matter whether anyone from the product team sat in the room?

I think the argument has it backwards. Faster synthesis raises the marginal value of being there in person, not the other way round. When the cost of shipping the response was high, the bottleneck was on the build side, and the question of how you gathered the insight was less interesting because most of the insight died on the way to production anyway. Now that the build side has compressed, the bottleneck moves upstream. The question is whether you have the right insight in the first place, and whether you have it in a form that survives contact with the actual workflow. Both of those are easier to get right in a room than over Zoom or via tickets.

There is a related point I want to make carefully, because it is easy to slip into the wrong register. The day was not “user research.” We were not there to validate hypotheses. We were there to do the work alongside them and improve our product in flight. That framing matters. It changes what the customer expects from the meeting, what we expect from ourselves, and what is on the table to be changed. A research interview asks “would you use this.” A working session asks “let’s make this work, now.” Vibe coding with the customer in the room is what makes the second question answerable in real time.

This is now our operating model. Not a workshop we run when something is broken. A working pattern we default to whenever there is a meaningful question about how the product fits into the customer’s day. We do them more often, with more of the team, with the codebase open. The frequency itself is part of the design. Compressing the learning loop only works if you are actually inside it on a regular cadence.

⚖️ Workflows rule

If I had to pick the single biggest takeaway from the day, it would be the same takeaway I keep arriving at from every workshop I have done in this category. Workflows rule.

You can produce an agentic output that is technically perfect: accurate, well reasoned, drafted in the right tone, with the right fallback positions applied. And the customer will still struggle if it fits clumsily into her workflow. The context for the approval scattered around the document. The supervisor having to hunt for the clauses that justify the agent’s call. The bit that should anchor the decision sitting three sections away from the bit that needs the decision. Each of these is, on its own, a small detail. Together they are the difference between a supervision experience that takes seconds and one that takes minutes, and at the scale enterprise legal teams work at, that difference is the whole product.

I think this is the most underrated truth in legal AI right now. Most of the public conversation is about model accuracy, benchmarks, retrieval architectures. These matter. But they matter as inputs to a much harder question, which is whether the output of the agent fits into the shape of the work the human is actually doing. The shape of the work was built up over years of habits, conventions, tool choices, and shortcuts that nobody in the legal team can fully articulate. You cannot derive it from a transcript. You can only see it when you sit with the lawyer and watch the work happen.

The agent we left the room with was functionally indistinguishable from the agent we walked in with, if you measure it on accuracy alone. What changed was the surface the supervisor saw. The new version puts the context where her eyes are already looking. The old version did not. From her perspective the difference is enormous. From a benchmarking perspective it is invisible.

I think the teams that win in this space over the next two years will be the ones that take this seriously. Not “workflow integration” as a feature flag, but workflow as the unit of design. The agent is not a thing you build and then plumb in afterwards. The plumbing is the product. The agent is one of its components.

Where this leaves me

I find myself with two convictions and an open question.

The first conviction is that there is no substitute for sitting next to the customer while she does the work. Not for diligence. Not for empathy. For product accuracy. I think a product team that does not have a regular cadence of physical proximity to its users in 2026 is leaving a startling amount of value on the table.

The second conviction is that the value of that proximity is now meaningfully higher than it was a year ago, because the design and learning loop has shortened to the point that the customer can be part of the iteration in real time. The build itself still takes the build’s worth of time. What has changed is that you arrive at the build with the design already validated by the person who will use it. The two trends compound. AI does not replace the workshop. It is what finally makes the workshop a design environment rather than a research environment.

The open question is how this scales. A handful of customers can be served this way without changing how the team operates. Fifty cannot. I do not yet have a clean answer to what the model looks like when the number of accounts we work with this way grows by an order of magnitude. I suspect part of the answer is that the workshop-and-ship pattern becomes the template for a particular kind of customer touch, and other patterns sit around it. But I want to be honest that this is not solved yet, and the fact that it is not solved is one of the more interesting problems on our plate.

For now, the next workshop is in the calendar and the laptops will be open.
