The First Step in Legal AI Evaluations: Identify the Artifacts

By Mohamed Al Mamari

Legal AI evaluation is maturing.

The community is shifting the conversation away from marketing hype and feature comparisons toward structure, risk, and capability.

That progress matters.

For any framework to produce useful results, legal teams must answer one question:

“What exactly creates value in our legal work?”

This article is about what legal teams must define before evaluation begins.

Legal Artifacts

Legal work is usually talked about in abstract categories: contracting, advisory, compliance, disputes, and so on. These categories are great for organising teams, planning, and reporting, but they are far less useful for describing results, which is where the value of legal work sits.

Value is created at the level of artifacts.

In this context, an artifact is a discrete legal output that embodies legal judgment, moves through a defined lifecycle, and carries risk when the business or client relies on it.

Human Attention

There’s a common promise around Legal AI: that it will eventually take over legal work end to end. In practice, that doesn’t work, because legal artifacts carry legal and commercial consequences. The more useful question is:

“How much human attention does this artifact deserve at each stage?”

Asking how much human attention an artifact deserves only makes sense if we know where that attention is being applied. Let’s break it down.

Every artifact moves through a lifecycle with four stages:

1. Design: deciding what the thing should be before anyone builds it.

2. Development: turning the design blueprint into something tangible.

3. Verification: checking the output before anyone relies on it; this is where the lawyer stays in the loop.

4. Delivery: the artifact is sent, filed, and stored.

Treating all stages the same, either by keeping humans everywhere or by forcing autonomy everywhere, doesn’t work.

Legal teams need to decide how much assistance is appropriate at each stage of the lifecycle, given the artifact’s risk (how bad things could turn out) and frequency (how often it is produced).

The goal is the right distribution of human attention.

The Right Level of AI Assistance

Level 1

Very high human involvement across all stages.

The lawyer manually explores options, tests ideas, and generates early drafts or outlines. Tools do not act on their own. The lawyer decides what to ask, what to use, and what to discard.

Tools at this level are typically delivered as standalone interfaces, such as web-based assistants or chat-style applications.

Level 2

High human involvement across all stages, but less so in development.

The tool meaningfully assists the lawyer with drafting, suggestions, and revisions, but it does not decide what is acceptable.

These tools are usually delivered inside existing work environments (e.g. Google Docs or Microsoft Word), operating in context and responding to what the lawyer is doing.

Level 3

Lower human involvement at the design and development stages; verification and delivery remain human-led.

Tools actively generate and assemble content within defined boundaries. Lawyers remain responsible for verification.

Interaction shifts from open-ended prompting to configuration, review, and approval.

This level works where legal judgment has already been translated into templates, playbooks, or rules, and variation is expected but controlled.

Level 4

Human involvement remains high at the design and verification stages, while development and delivery run with far less intervention within pre-approved boundaries.

Levels 5 (semi-autonomous) and 6 (fully autonomous) do not fit the operational reality of most lawyers.
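To make the risk-and-frequency logic concrete, here is a minimal sketch of one possible heuristic in Python. Everything in it, the Stage and Risk enums, the suggest_level function, and its thresholds, is hypothetical and illustrative rather than an established rule; each team would set its own boundaries.

```python
from enum import Enum

class Stage(Enum):
    DESIGN = "design"
    DEVELOPMENT = "development"
    VERIFICATION = "verification"
    DELIVERY = "delivery"

class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

def suggest_level(risk: Risk, per_month: int, stage: Stage) -> int:
    """Hypothetical heuristic: pick an AI-assistance level (1-4) for one
    lifecycle stage, given an artifact's risk and production frequency."""
    # Verification stays human-led regardless of volume.
    if stage is Stage.VERIFICATION:
        return 1 if risk is Risk.HIGH else 2
    # High-risk artifacts keep heavy human attention at every stage.
    if risk is Risk.HIGH:
        return 2
    # Low-risk, high-volume work is where higher assistance pays off.
    if risk is Risk.LOW and per_month >= 20:
        return 4 if stage in (Stage.DEVELOPMENT, Stage.DELIVERY) else 3
    # Medium risk or low volume: assist development, keep the rest close.
    return 3 if stage is Stage.DEVELOPMENT else 2

# Example: a routine NDA produced ~50 times a month.
for stage in Stage:
    print(f"{stage.value}: Level {suggest_level(Risk.LOW, 50, stage)}")
```

Run against a routine, high-volume NDA, the sketch pushes development and delivery toward Level 4 while keeping verification human-led, which matches the distribution of attention described above.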

Bringing Everything Together

Before evaluating any Legal AI solution, legal teams need to work through four steps, in order (a rough sketch in code follows the list):

  • Identify the legal artifacts: Focus on the outputs that concentrate risk, volume, or business dependency.
  • Map each artifact across its lifecycle: Understand where true opportunities for operational improvements exist.
  • Define required human involvement: Decide how much human judgment each stage deserves.
  • Set the appropriate level of AI assistance: Use these boundaries to determine where AI can assist safely and where it cannot.
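
As a rough illustration, the output of these four steps can be captured as a simple artifact inventory. The Artifact record and the example entries below are invented for this sketch; in practice, the levels come from the team’s own judgment, not from any tool.

```python
from dataclasses import dataclass, field

STAGES = ("design", "development", "verification", "delivery")

@dataclass
class Artifact:
    """One row of a hypothetical artifact inventory: the outcome of
    steps 1-4 for a single legal output."""
    name: str        # step 1: identify the artifact
    risk: str        # "low" / "medium" / "high"
    per_month: int   # frequency: how often it is produced
    ai_level: dict = field(default_factory=dict)  # steps 2-4: level per stage

inventory = [
    Artifact("Standard NDA", "low", 50,
             {"design": 3, "development": 4, "verification": 2, "delivery": 4}),
    Artifact("Regulatory filing", "high", 2,
             {"design": 1, "development": 2, "verification": 1, "delivery": 2}),
]

for a in inventory:
    print(f"{a.name} ({a.risk} risk, {a.per_month}/month)")
    for stage in STAGES:
        print(f"  {stage}: Level {a.ai_level[stage]}")
```

Keeping the inventory explicit like this turns it into the yardstick: a candidate tool is then evaluated against the levels the team has already set.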

Do this work first.

From there, tools can be evaluated against a clear operating design rather than being allowed to shape it.

About the Author

Mohamed Al Mamari

Mohamed Al Mamari is part of the founding legal team at Vodafone (Oman), advising go-to-market teams across consumer and enterprise. He previously co-founded a legal tech startup in the document automation space and is interested in how in-house counsel can do more with less by leveraging AI and building systems that improve productivity.