I’m Paul. I build AI agents for in-house legal teams at Flank.

I wrote last time about why legal AI doesn’t scale without a front door. This piece is about the layer on top: the surface that turns demand data into decisions.

More on this as we learn. The interface problem is real, and we’re deep in it.

The surface is the product

I spent a few hours last week playing with Claude Design after it shipped. I am not a designer. I am a product person, and occasionally I vibe my way through a Figma file or a Tailwind prototype when I am trying to think through a flow. Some colleagues had framed Design as “a wrapper around Co-work, which is itself a wrapper around Code,” with the implication that the wrappers were cosmetic and the model underneath is what does the work.

Technically, that is true. The intelligence is the same. What changes is where you do your iteration. With Code, you iterate in code. With Design, you iterate at the design layer, before anything gets committed to a component tree or a codebase. You see the thing, adjust it, see it again, adjust it again. The loop is tighter because you are iterating on the same surface the decision is being made on.

The thought I could not shake afterwards: the intelligence is necessary but it is not the point. The point is the surface that makes the intelligence useful to a human trying to make a decision. Strip the surface off and you still have all the capability. You just cannot do anything useful with it.

I have been thinking about that a lot in the context of the front door.

A quick backward glance

I wrote recently about why legal AI does not scale without a front door. The short version: most legal teams are operating in the dark. They feel the pressure of too much work but cannot see the shape of it: what is coming in, from where, how often, how quickly anyone picks it up. The fix is a front door that sits inside the channels the business is already using, generates demand data from day one, and turns the expansion decision from a pitch into a recommendation grounded in the team’s own numbers.

The argument in that post was about the data. This one is about what you do with it.

🔒 “I could not give you the numbers”

I have had three or four conversations in the past month where a GC or Head of Legal Ops has said some version of the same thing. We ask them what their inbound volume looks like, which types of work dominate, where the time is going. They say they cannot tell us. Not because they are not thinking about it. They are thinking about it constantly. It is the basis of every hiring case, every outside counsel conversation, every planning exercise. But the data to answer it with precision is not there.

This is not, I think, a failure of legal teams to try. The past decade of legal technology is full of serious attempts to fix exactly this. Ticketing systems. CLM portals. Intake forms. Workflow tools that route requests through a web interface and tag them on the way through. Every one of those tools could produce the numbers. In theory.

The problem is that they all required the business to change where it submits work. A sales rep who used to email the legal team is now being asked to log into a portal, pick a category from a dropdown, and fill in a form. That friction is where the data capture lives, and it is also where adoption dies. What happens in practice is that the high-performing rep sends an email anyway, because email is how things get done when they need to get done. The tool captures the requests that were already low-stakes enough to wait for a form. The urgent and commercially important stuff continues to flow through the channels that were there before, and those channels are still invisible.

So the GC ends up in a strange position. They have bought tools that were meant to give them visibility. The tools work, technically. The demand data just is not in them, because the demand did not go there. The shared inbox, the Teams DMs, the quick word with legal at the coffee machine: that is where the work actually arrives, and none of those channels are instrumented. The dashboard shows what the business was willing to submit through the portal. For most of the work that matters, the business has not been willing.

The front door is a different proposition because it does not ask the business to change anything. You plug the agent into the channels the business is already using, primarily email for most of our customers, and the agent sits there. Whether it is doing execution, routing, or just observing, the demand becomes visible as a side effect of being in the right place. Every request gets categorised. Every request gets timestamped. Every request gets logged with a destination. The business never has to notice that anything changed. The legal team, for the first time, has data.

But the data is not the insight. The surface is.

⚡ Data is not visibility

Here is where I think it is easy to get confused. A lot of legal ops tools will tell you they give you visibility. What they mean is that they capture the data. Whether you can actually see anything useful in it is a separate question. I have seen plenty of dashboards with every metric you could possibly want and no signal in any of them. The data was all there. The surface was not.

What the surface has to do, if it is going to earn its place, is take the demand data and do three things with it. None of them are trivial.

The first is surfacing what needs attention. Of all the requests flowing through the front door, some will be clean routing decisions the agent can make on its own. Some will be executable tasks the agent can run end-to-end. But some will be edge cases where the agent does not know what to do, or items where the agent has a view but wants human approval before acting. Those are the exceptions that need to rise to the top, filtered from the noise, so that a lawyer checking the agent’s work can deal with them in minutes rather than hunting through a queue. That reviewer might be one of the customer’s own lawyers, a partner firm providing the supervision layer, or our own supervision team at Flank. Whichever it is, they need the surface to do the filtering for them, because the whole point of agents is that the human only touches the things that need a human, and the surface is how the human knows which things those are.
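
To make the filtering concrete, here is a minimal sketch in Python. It assumes a hypothetical request record in which the agent tags every item with one of the four outcomes described above. The names (`AgentOutcome`, `Request`, `review_queue`) are made up for illustration and are not how Flank's system is built. The rule it encodes is the one in the paragraph: anything the agent routed or executed on its own stays out of the reviewer's way, and only the edge cases and approval requests surface.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class AgentOutcome(Enum):
    # Hypothetical labels for the four cases described in the post.
    ROUTED = "routed"                  # clean routing decision made by the agent alone
    EXECUTED = "executed"              # task run end-to-end by the agent
    NEEDS_APPROVAL = "needs_approval"  # agent has a view but wants sign-off first
    UNSURE = "unsure"                  # edge case: the agent does not know what to do


@dataclass
class Request:
    request_id: str
    category: str
    outcome: AgentOutcome
    received_at: datetime


# Only these outcomes should ever reach a human reviewer.
NEEDS_HUMAN = {AgentOutcome.NEEDS_APPROVAL, AgentOutcome.UNSURE}


def review_queue(requests: list[Request]) -> list[Request]:
    """Return the items a reviewer must touch: edge cases first, then oldest first."""
    flagged = [r for r in requests if r.outcome in NEEDS_HUMAN]
    # False sorts before True, so UNSURE items lead; ties break on arrival time.
    return sorted(
        flagged,
        key=lambda r: (r.outcome is not AgentOutcome.UNSURE, r.received_at),
    )


# Illustrative input, not real data.
inbox = [
    Request("r1", "NDA", AgentOutcome.EXECUTED, datetime(2024, 5, 1, 9)),
    Request("r2", "MSA", AgentOutcome.NEEDS_APPROVAL, datetime(2024, 5, 1, 8)),
    Request("r3", "DPA", AgentOutcome.UNSURE, datetime(2024, 5, 1, 10)),
    Request("r4", "NDA", AgentOutcome.ROUTED, datetime(2024, 5, 1, 7)),
]
print([r.request_id for r in review_queue(inbox)])  # ['r3', 'r2']
```

Putting the agent's edge cases ahead of its approval requests is one possible ordering, not the only sensible one. A real reviewer might prefer to sort by commercial urgency instead.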

The second is zooming out. The same system that surfaces individual exceptions needs to aggregate them. How much demand is coming in, broken down by work type. Where it is coming from. How long different categories of work sit before someone picks them up. Which teams are the heaviest users. What the shape of the backlog is. This is the demand funnel view, and it is the thing most legal ops leaders have never had. It is also the view that changes the conversation about what to do next, because the conversation becomes “here is what the data shows” rather than “here is what we think is happening.”
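
The aggregation side can be sketched the same way. This version assumes a hypothetical log where each request carries the category the agent assigned, the team it came from, when it arrived, and when someone picked it up (if anyone has yet). The field and function names are illustrative only. The function produces the cuts listed above: volume by work type, heaviest teams, time to pickup, and backlog.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class LoggedRequest:
    category: str                     # work type assigned by the agent
    source_team: str                  # where in the business the request came from
    received_at: datetime
    picked_up_at: Optional[datetime]  # None means it is still in the backlog


def demand_funnel(log: list[LoggedRequest]) -> dict:
    """Aggregate a request log into a simple demand-funnel summary."""
    total = len(log)
    by_category = Counter(r.category for r in log)
    by_team = Counter(r.source_team for r in log)
    backlog = Counter(r.category for r in log if r.picked_up_at is None)

    # Hours from arrival to pickup, per category, for requests already picked up.
    waits: dict[str, list[float]] = defaultdict(list)
    for r in log:
        if r.picked_up_at is not None:
            hours = (r.picked_up_at - r.received_at).total_seconds() / 3600
            waits[r.category].append(hours)

    return {
        "total": total,
        "share_by_category": {c: n / total for c, n in by_category.items()} if total else {},
        "median_pickup_hours": {c: median(h) for c, h in waits.items()},
        "heaviest_teams": by_team.most_common(3),
        "backlog_by_category": dict(backlog),
    }


# Illustrative input, not real data.
log = [
    LoggedRequest("NDA", "Sales", datetime(2024, 5, 1, 9), datetime(2024, 5, 4, 9)),
    LoggedRequest("NDA", "Sales", datetime(2024, 5, 2, 9), None),
    LoggedRequest("MSA", "Procurement", datetime(2024, 5, 1, 9), datetime(2024, 5, 1, 15)),
]
print(demand_funnel(log)["median_pickup_hours"])  # {'NDA': 72.0, 'MSA': 6.0}
```

The median is used for pickup time so that one forgotten request does not distort the picture. That is a design choice for the sketch, not a claim about what any particular product reports.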

The third is turning the data into recommendations. Raw numbers are a start, but what a GC actually needs is not a dashboard. It is a next step. “31% of your inbound is straightforward NDA reviews, currently averaging three days to pick up. Here is what happens if you deploy an agent on them.” Or, a bit further down the maturity curve, “your MSA agent has accepted this particular redline every time a counterparty has proposed it, and counterparties propose it on 64% of contracts. You might want to update the template and stop having the conversation.” Those recommendations are where the data stops being an observation and starts being a lever. Without them, you have a reporting tool. With them, you have a system that tells the team what the highest-value decision in front of them is.
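
A recommendation can start life as a simple rule over those aggregates. The sketch below is an assumption about how such rules might look, not a description of a shipped feature. The thresholds (`min_share`, `min_pickup_days`, `min_rate`) are arbitrary placeholders, the function names are hypothetical, and the example inputs reuse the illustrative figures from the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class CategoryStats:
    category: str
    share: float            # fraction of all inbound, between 0 and 1
    avg_pickup_days: float  # average days before someone picks the request up


def automation_candidates(
    stats: list[CategoryStats], min_share: float = 0.2, min_pickup_days: float = 2.0
) -> list[str]:
    """Hypothetical rule: flag work types that are both high-volume and slow to start."""
    picks = [s for s in stats if s.share >= min_share and s.avg_pickup_days >= min_pickup_days]
    # Largest share first, so the highest-volume opportunity leads.
    picks.sort(key=lambda s: s.share, reverse=True)
    return [
        f"{s.share:.0%} of your inbound is {s.category}, averaging "
        f"{s.avg_pickup_days:g} days to pick up. Consider deploying an agent on it."
        for s in picks
    ]


def template_candidate(
    times_proposed: int, times_accepted: int, contracts_reviewed: int, min_rate: float = 0.5
) -> bool:
    """Hypothetical rule: a redline accepted every time it is proposed, on a large
    share of contracts, is a candidate for moving into the template."""
    if contracts_reviewed == 0 or times_proposed == 0:
        return False
    always_accepted = times_accepted == times_proposed
    proposal_rate = times_proposed / contracts_reviewed
    return always_accepted and proposal_rate >= min_rate


stats = [CategoryStats("NDA reviews", 0.31, 3.0), CategoryStats("DPAs", 0.05, 6.0)]
for line in automation_candidates(stats):
    print(line)
print(template_candidate(times_proposed=64, times_accepted=64, contracts_reviewed=100))  # True
```

Rules this blunt will produce false positives. The point of the sketch is only that a recommendation carries its evidence (the share, the pickup time, the acceptance rate) alongside the suggestion.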

None of those three things is easy to build. I want to be honest about that.

🏗️ The ways this goes wrong

The easiest mistake is to build a dashboard that performs comprehensiveness. Every metric visible, every filter available, every cut of the data selectable. It looks serious. It is the product surface equivalent of answering “what should we automate next” with a 40-slide deck of options. A real surface chooses. It highlights the things that matter and hides the things that do not, because the user’s time is scarce and their attention is the bottleneck. A GC opening this at 8am before a board meeting should see the two or three items that need them personally, the one or two patterns worth noticing, and nothing else.

The second mistake is forgetting that the business users never see any of this. This is important. The people sending requests, the sales reps and procurement leads and HR partners, do not want to log into a portal. They want to send an email and have their contract back. The interface surface for them is their own inbox, possibly with a status label, and nothing more. The demand intelligence layer is for the legal team. If the front door starts leaking change-management overhead into the business, the whole model breaks. Invisibility on one side, comprehension on the other.

The third mistake, and this is the one I think about most, is optimising the surface for the wrong loop.

If the dashboard is a thing people look at once a quarter during a planning exercise, it will be polished and useless. The loop that matters is the daily one. The lawyer who opens the review queue in the morning, deals with the handful of items the agent has flagged, and closes it inside ten minutes. The legal ops lead who glances at the trend view once a week and notices that procurement volume has jumped by 40% for no obvious reason. The GC who sees a recommendation, understands the evidence behind it, and makes a call about whether to act. If the surface does not fit into those rhythms, it will be ignored, regardless of how intelligent the system underneath is.

Back to Claude Design

The reason the Claude Design experience stuck with me is that it collapsed the distance between intention and artefact. I could see what I was thinking. I could change it in the same place I was thinking about it. The surface was not a view onto the intelligence; it was where I did the work.

I think legal front door visibility has to feel like that. Not a reporting layer bolted onto the execution system. Not a tab you open when someone asks a question. The place where a legal leader actually sees what the function is being asked to do, understands the patterns, and decides what to deploy next. The intelligence has to be there, obviously. The routing, the categorisation, the execution, the supervision logic. But the surface is what turns it into decisions.

I do not think anyone has fully solved this yet, ourselves included. There are a lot of things about the right interface that we do not know, and the honest answer is that we will learn them by building versions of the surface, deploying them, watching which decisions they help with and which they do not, and iterating. That is a substantial product problem, and I suspect it will take longer to get right than the execution layer did.

But I think it is the thing that determines whether the front door becomes the demand intelligence layer we think it can be, or just a triage tool that happens to generate some data. The difference is not in the intelligence. It is in the surface.

✳️

Paul Lacey builds AI agents for in-house legal teams at Flank, focused on the gap between rising contract volume and flat headcount.