Legal AI doesn't scale without a front door

Every legal team I talk to knows they're behind.
None of them can tell me exactly where, or by how much. That gap between feeling the pressure and seeing the pattern is what this piece is about.
In the coming weeks we will dig into what actually changes when a legal team goes from seeing the data to acting on it.

The data that legal teams have never had

Last month I sat through a call with a prospective customer, a head of legal ops at a large enterprise, and I asked the question I always ask early: what does your intake process look like right now?

She laughed. Not bitterly, just the resigned laugh of someone who has answered this question before and knows the answer isn’t good. “It’s email,” she said. “It’s a shared inbox. Sometimes it’s Teams messages. Sometimes someone walks over. We have no idea how much comes in, what types, how long things sit, or who picks them up. We just know we’re behind.”

I have had that conversation many times in the past year. The details change. The shape of it never does. Legal teams are operating without any picture of what the business is actually asking them to do, how often, and at what cost. They feel the pressure. They cannot see the pattern.

I work on three core pillars of the product at Flank: the legal front door, orchestrations, and supervision. All three matter. But the front door is the one I find myself thinking about at odd hours, the one I get most animated about in planning sessions, and I want to explain why. Because I think it solves something that the rest of the industry has been getting wrong.

Where the detail lives, and where the leverage lives

I should be honest about the other two pillars first, because they’re genuinely absorbing work.

Orchestrations are where the precision lives. This is the engine room. When an agent drafts a franchise agreement or redlines a supplier MSA against a client’s playbook, the quality of that output depends on how carefully the orchestration has been configured. That means getting the conditional logic right on a complex template, making sure the agent applies the correct fallback language for a specific jurisdiction, and ensuring it catches edits a counterparty has slipped in without tracking them. This is painstaking, detailed work, and I find it deeply satisfying in the way that any craft is satisfying when you get the tolerances right.

Supervision is where the operating model becomes real. We build for three different models of supervision: the customer’s own lawyers reviewing outputs, a partner firm managing oversight on the customer’s behalf, and Flank’s internal legal team providing supervision as part of the service. Designing a product surface that works for all three, where a GC reviewing ten items before their first meeting of the day and a dedicated supervision team processing hundreds of items a week both have what they need, is a genuinely hard design problem. The core shift it enables, moving lawyers from doing the work to checking the work, is transformative when it clicks.

But the front door is different. The front door is where scale comes from. And I think scale, specifically the ability to deploy agents on the right use cases with conviction, is the thing the market is missing most.

⚡ The problem with the pilot

Here is the pattern I see over and over. A legal team gets excited about AI. They pick a use case. Often it’s NDAs, because NDAs feel safe and contained. They run a pilot. The pilot works reasonably well. Then they try to figure out what to do next, and the conversation stalls. Should they expand to MSAs? Procurement contracts? Policy Q&A? They don’t know, because the decision is based on intuition and whatever the vendor suggested. There is no data. The expansion case is a pitch deck, not an evidence base.

I find this depressing, honestly, because I have watched good pilots die at exactly this point. The technology performed. The supervision held up. But the customer had no systematic way to identify which of their other processes were candidates for agent execution, so the pilot stayed a pilot. The GC couldn’t build a business case for expanding because the numbers didn’t exist. What should have been the start of something became the whole thing.

The industry has a name for this: “pilot purgatory.” I think the cause is underdiagnosed. The problem isn’t that buyers don’t believe in the technology. It’s that they have no demand-side data to act on. They know they’re busy. They don’t know they’re getting forty-seven NDA requests a month with an average pickup time of fourteen hours, that a third of inbound work is misdirected, or that three people on the team are handling sixty percent of all contract reviews.

🔍 What changes when the lights come on

The legal front door, as we build it, embeds in the customer’s existing email. An address like legal@company.com or contracts@company.com. The business sends requests the same way they always have. Nothing changes for the requestor. But every request that comes through is now categorised, timestamped, and tracked. For the first time, the legal team sees its own demand pattern.

Volume: what’s actually coming in, broken down by work type. Review requests, drafting requests, policy questions, ad hoc queries. The shape of the demand.

Latency: how long each request sits before someone picks it up. This is the number I find most powerful, because it’s the one that surprises people. Legal ops leaders who guessed their NDA turnaround was “a day or two” discover that the initial pickup alone takes closer to fourteen hours, before any substantive work has started.

Routing: where work actually goes, who handles which types, who’s overloaded, what gets bounced between people before it lands.

And then the one that matters most for what comes next: opportunity. Which work types are high-volume, template-driven, and low-complexity? Which processes are the obvious candidates for agent execution?
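
To make those four measures concrete, here is a minimal sketch in Python of how they could be computed from a log of tracked requests. It is an illustration, not Flank's implementation: the `Request` fields, the work-type labels, the sample dates, and the volume-times-pickup-delay ranking are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Request:
    # All field names are hypothetical; a real intake log will differ.
    work_type: str       # e.g. "nda_review", "policy_question"
    received: datetime   # when the request arrived
    picked_up: datetime  # when someone first took it on
    handler: str         # who picked it up
    reassignments: int   # how many times it bounced before landing


def demand_summary(requests):
    """Summarise volume, pickup latency, and routing per work type."""
    by_type = defaultdict(list)
    for r in requests:
        by_type[r.work_type].append(r)

    summary = {}
    for work_type, items in by_type.items():
        pickup_hours = [
            (r.picked_up - r.received).total_seconds() / 3600 for r in items
        ]
        top_handler, top_count = Counter(r.handler for r in items).most_common(1)[0]
        summary[work_type] = {
            "volume": len(items),
            "avg_pickup_hours": round(mean(pickup_hours), 1),
            "top_handler": top_handler,
            "top_handler_share": round(top_count / len(items), 2),
            "bounced_share": round(
                sum(r.reassignments > 0 for r in items) / len(items), 2
            ),
        }
    return summary


def rank_candidates(summary, template_driven):
    """Order template-driven work types by volume x average pickup delay.

    The scoring rule is an assumed heuristic, chosen only for illustration.
    """
    scored = [
        (work_type, stats["volume"] * stats["avg_pickup_hours"])
        for work_type, stats in summary.items()
        if work_type in template_driven
    ]
    return [wt for wt, _ in sorted(scored, key=lambda pair: pair[1], reverse=True)]


# Made-up sample log, purely to show the shape of the output.
log = [
    Request("nda_review", datetime(2024, 3, 4, 9), datetime(2024, 3, 4, 23), "alex", 0),
    Request("nda_review", datetime(2024, 3, 5, 10), datetime(2024, 3, 6, 0), "alex", 1),
    Request("policy_question", datetime(2024, 3, 5, 11), datetime(2024, 3, 5, 13), "sam", 0),
]
summary = demand_summary(log)
candidates = rank_candidates(summary, {"nda_review", "policy_question"})
```

On this toy log, NDA reviews show two requests with a fourteen-hour average pickup, so they rank ahead of policy questions. Which work types count as template-driven is passed in by hand here; in practice that judgement would come from the legal team.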

I keep describing this to the team as “the lights coming on.” Most legal teams are operating in the dark. They can feel the walls, they know roughly where things are, but they’ve never seen the room. The front door shows them the room. And once they can see it, the conversation about what to automate stops being theoretical.

Data-led deployment, not pitch-led deployment

This is the thing that genuinely excites me, and it is why the front door is my favourite part of the product to work on.

When the front door has been running for a few weeks, the customer’s own data tells them what to do next. Not our pitch deck. Not an ROI calculator with assumptions we picked to make the numbers work. Their actual request volumes, their actual latency numbers, their actual routing patterns.

The expansion conversation changes completely. Instead of “we think agents could help with your NDAs,” it becomes “your data shows forty-seven NDA requests per month at an average pickup time of fourteen hours. Want to see what happens when the agent handles them end-to-end, with lawyer supervision on the output?”

That’s not a pitch. It’s a recommendation based on evidence the customer generated themselves. I think this distinction matters enormously for trust, for retention, and for the pace at which a customer expands their use of agents.

I have watched the alternative play out too many times. A vendor sells a customer on an NDA automation use case. The customer runs it. It works. Then the vendor suggests expanding to MSA reviews. The customer asks: why MSAs? The vendor says: because that’s our next strongest use case. The customer says: but how do I know MSAs are my biggest problem? And the conversation gets stuck, because nobody has the demand data to answer the question.

The front door eliminates that deadlock. The customer doesn’t need to believe our analysis of which processes to automate. They can see it in their own data.

Why this kills the valueless pilot

I think the legal AI industry has a pilot problem, and the front door is the cleanest solution I’ve seen to it.

The valueless pilot works like this. A customer spends twelve weeks testing an AI tool on a single, sandboxed use case. The pilot “succeeds” in the narrow sense that the tool performed the task. But the pilot was disconnected from the customer’s real operating environment, ran on a sample that doesn’t represent their actual demand, and produced no data about what to do next. The customer is no closer to deploying agents at scale than they were before the pilot started. They just spent three months proving a technology works in a vacuum.

The front door is the opposite of this. It is live, in production, from week one. It handles real inbound work from real business users in the customer’s real email. It generates real demand data from the first day. And it doesn’t require the customer to trust agents with substantive legal execution until the data gives them reason to. Start with triage and routing. See what the demand looks like. Then deploy execution on the work types where the evidence is clearest.

There is no sandbox. There is no sample data. There is no twelve-week evaluation period that produces nothing actionable. The front door is the production system, and it is generating the expansion business case from the moment it goes live.

I find this much more honest than the typical pilot model. We’re not asking the customer to take our word for where the value is. We’re giving them a tool that shows them, in their own environment, with their own data.

🏗️ The thing I don’t know yet

I want to be direct about an open question, because pretending it’s resolved would be dishonest.

The front door generates demand data. That data shows which work types are candidates for agent execution. But the transition from “the front door shows you NDAs are your biggest bottleneck” to “the agent is now drafting your NDAs end-to-end with supervision” requires something else entirely. It requires the orchestration layer to be configured for that specific customer’s templates, playbooks, fallback positions, and escalation rules. It requires the supervision model to be set up and working. It requires onboarding work that takes time and expertise.

We’re building toward making that transition as seamless as possible, where expanding from front-door-only to full agent execution on a specific work type is a configuration change, not a new deployment. But I want to acknowledge that the gap between “seeing the opportunity” and “executing on the opportunity” is where a lot of the real product and operational complexity sits. The front door surfaces the right question. The answer still requires careful work.

I think this is fine, and I think customers understand it. The point of the front door isn’t to pretend that deploying execution agents is trivial. It’s to make sure the decision about where to deploy them is grounded in evidence rather than intuition.

Where this is heading

I said at the start that I think the front door solves something the rest of the industry has been getting wrong. Let me be more specific about what I mean.

Most legal AI products are sold use-case-first. Pick NDAs, or pick MSA review, or pick procurement, and then deploy the tool. The customer has to decide upfront which process to automate, usually based on a combination of gut feel and vendor suggestion. If they pick the wrong one, the value doesn’t materialise, and confidence drops.

I think the front door inverts this. You don’t need to know which process to automate before you start. You start with the front door, which handles all intake regardless of type. The data accumulates. The picture forms. And the customer deploys execution agents on the processes where the evidence is strongest, in the order that makes the most sense for their team.

This is a fundamentally different way to approach agentic deployment at scale. It’s demand-led rather than supply-led. The customer’s operating reality determines the expansion path, not the vendor’s product roadmap.

I think about that head of legal ops from the call I mentioned at the start. She knew she was behind. She could feel the pressure. But she couldn’t see the pattern, and without seeing the pattern she couldn’t make a data-driven decision about how to respond. The front door gives her that visibility. What she does with it is up to her. But at least the decision is informed.

That’s the part that excites me. Not the triage logic or the categorisation engine or the routing rules, though all of those matter. What excites me is the shift from guessing to knowing. The shift from vendors pitching solutions to customers seeing their own problems clearly enough to choose the right ones. The shift from pilots that prove a technology works in a vacuum to deployments that expand because the evidence demands it.

I think that’s how legal AI actually gets adopted at enterprise scale. Not through better demos or sharper pitch decks, but through products that generate the conviction the customer needs to act.

Paul Lacey builds AI agents for in-house legal teams at Flank, focused on the gap between rising contract volume and flat headcount.

✳️

Seeing your demand is step one. Acting on it is where things get interesting. Next week, I’m writing about what actually changes when a legal team moves from triage to full agent execution on their first work type.
