What Most People Missed in the Claude for Word Demo (A Lawyer's Take)


By Anna Guo


Lately, every Claude announcement has landed like a bomb in the legal tech world, with many thinking it spells doom and gloom for legal tech builders.


As a lawyer focused on legal AI research, I felt compelled to check it out.

In this article, I'll break down what the demo gets right, where there are gaps, and share my take on what this means for:

  • Legal teams choosing between general-purpose and purpose-built solutions, and
  • Legal tech builders wondering whether all of these announcements spell doom and gloom for them.
Spoiler: I don't think they do. In fact, I would share this video with my clients if I were a legal tech founder.

More on why below.

This article is adapted from my YouTube review, which goes into more detail on each section. You can watch the full video here: https://www.youtube.com/watch?v=dbn7VJLiA3s

Disclaimer: I'm only reacting based on the demo and my own experience using Claude Cowork, which has the same functions. I'm not on the enterprise or team plan, so I don't have access to the Word add-in right now.

Before We Get Into Substance: 2 Things I Noticed

An open dig at Microsoft Copilot?

Anthropic drops an Easter egg on the very first demo screen. Look at the top right corner of the toolbar: there's a Copilot add-in.


This feels intentional.

Many companies already have Copilot installed but are still adding Claude, because Copilot, even though it's native to Microsoft, isn't cutting it for what lawyers need.

I am not surprised, because both Legal Benchmarks studies, on information extraction and on contract drafting, show that Copilot performs poorly on legal tasks.


(Copilot was only 50% accurate in the Contract Drafting Benchmark)

What is "OpenClaude"?

I also noticed "OpenClaude" in the toolbar right next to Copilot.


What is it?

My theory: either a new model Anthropic hasn't announced yet, or the supercharged internal version the Anthropic team uses that's too powerful for the rest of us.

This is pure speculation, but if anyone at Anthropic is reading this, I'd love an answer.

With that out of the way, let's get into the substance.

What's Actually Being Demoed

The demo showcases a more sophisticated use case for Claude than what Anthropic has shown before with Claude Cowork's legal plugin. It covers:

  • Key issue spotting (with redlines),
  • Plain English instruction-based redline application, and
  • Document proofreading.

Taken together, that maps out the full workflow a lawyer would follow for most contract reviews, which shows that Anthropic understands how Claude can be part of an entire legal workflow, not just a single step.

On the surface, this seems to overlap with what a lot of contract AI solutions already offer: redlining, review, and drafting.

But claiming a capability and performing well on that capability are two very different things. So I went through each part of the demo to see how Claude actually performed.

Claude's Demo Performance

The Document Itself Doesn't Feel Real

Before getting into the substance, the document used in the demo raises questions.

The file name says this is the 4th version of a draft mutual NDA between Acme and Vertex, two companies. Yet the document header reads "Morrison & Callaway LLP," a law firm. If two companies are negotiating a mutual NDA, they would likely never use a template with a law firm's header on it.

Additionally, the overall feel of the document is that it's artificial.


This matters to lawyers because the quality of the demo document directly affects how much lawyers trust the tool. If the test scenario doesn't feel realistic, the results are harder to take seriously.

The Liability Clause Is Legally Incoherent

The proposed redline from the counterparty under the liability cap section 5 feels equally unrealistic.

In the demo, the counterparty added:

"Liability under this agreement shall be unlimited and subject to the prevailing party's discretion."


There are several things wrong with this sentence.

First, it makes no sense for a counterparty to ask for unlimited liability for both parties. The typical move is to push for unlimited liability on the other side, not to expose yourself as well.

Second, if liability is unlimited, why is it still "subject to the prevailing party's discretion"? "Prevailing party" is a concept that belongs in dispute resolution; it refers to the party that wins. But a limitation of liability clause is about setting a pre-agreed cap.

The sentence contradicts itself internally.

Whoever wrote it used terms like "subject to" and "prevailing party," language that sounds legal on the surface, but the result is legally incoherent.

This was clearly not written by a lawyer, which makes it even harder to accept the document as realistic.

Surgical Redlines: The Gap Claude Hasn't Closed

The word "surgical" appears constantly in legal tech marketing, and for good reason. It reflects a real user demand that Claude hasn't addressed yet.

Take Section 3 as an example. On the surface, Claude deleted the counterparty's proposed subsection C and replaced it with the user's accepted language. But the markup deleted the entirety of what the counterparty proposed and block-inserted the replacement. I've noticed the same behavior when using Claude Cowork for redlining.


You might think: so what? It achieved the goal.

But if you're a contracts lawyer and you receive a markup that completely deletes your language and replaces it wholesale, you'd be frustrated. And so would the other side.

There's an etiquette to redlining. And it will matter until it's no longer humans reviewing these documents.

The principle is to minimize edits: improve readability, reduce friction, and assert your position with as few changes as possible.

That's what "surgical edits" means. Legal tech companies invest significant effort into fine-tuning their systems to produce surgical redlines, because they're obsessed with solving this exact problem. Claude, as a general-purpose tool serving a hundred different professions simultaneously, simply isn't there yet.

A Missed Opportunity: Comments to the Counterparty

Claude includes a comment bubble for the user explaining what was redlined and why. But it would be even more valuable to insert comments explaining the user's rationale for pushing back on a clause: the kind of comment you'd leave for the counterparty's counsel.


Claude is already capable of inserting those comments; I have used Cowork to add counterparty-facing comments before.

If I were making this demo for lawyers, I would absolutely include that, because explaining why you're pushing back is a critical part of the negotiation workflow.

I hope the Anthropic team picks this up for the next iteration.

Proofreading: The Simplest Task, and Still Not Right

The demo also showed Claude proofreading a legal document. On the surface, it demonstrated the concept.

But the execution was disappointing. It flagged two issues, and both were wrong.

  • The first was a cross-reference correction: changing a reference from Section 4(c) to 4(d). Except that neither Section 4(c) nor 4(d) actually exists in the document. So Claude replaced one broken cross-reference with a phantom one.
  • The second was an entity-name correction. The boilerplate clause referenced "Vertex Laboratories" while the signature block said "Vertex Labs." Claude amended it to "Vertex Labs Inc.", which matched neither the defined term nor the signature-block entity name. A proofreading tool that introduces a new inconsistency while claiming to fix one has made the document less reliable, not more.

AI proofreading should be table stakes. If a tool can't get that right consistently, it undermines confidence in everything else.

What This Tells Us About Legal AI Right Now

What struck me most about this demo is how important human experts still are, on both sides.

On the building side:

Claude clearly lacks legal awareness, both in creating a demo document that looks realistic and in its performance on core legal tasks.

I have no doubt that Anthropic's technical team can build out every feature that legal tech companies offer. But do they understand enough about lawyers and legal workflows to build something that feels tailored?

If the demo, which is where you first build trust with legal buyers, already contains this many issues, that's a gap where legal domain expertise adds real value.

On the user side:

This was a simple legal task: review, redline, and proofread a basic NDA. And AI still wasn't perfect. This isn't unique to Claude. In my own use of Claude Cowork and other legal tech tools, they hallucinate and sometimes lack legal nuance.

That's where human lawyers remain essential: for verification, and for setting up these systems so they deliver not just fast outputs but high-quality ones. This becomes even more critical as agentic systems capable of executing multi-step workflows become more common. Without robust verification at each stage, errors don't just persist. They compound.

Final Takeaways

Claude first moved to meet engineers where they are: in the terminal. Now it has moved to meet lawyers where they are: in Microsoft Word.

What's worth paying attention to is the dual positioning. Claude's recent developments show us that frontier model labs are just as interested in being the supplier to legal tech companies as they are in being their competitors.

So what does this mean for everyone else? Here's my take:

For legal teams:

General-purpose solutions like Claude are a sensible starting point. They can get you a meaningful distance on common tasks like review and drafting. But there's a gap, because Claude currently lacks legal awareness. If you can cultivate legal engineers internally, or bring them in externally, to customize these tools and adapt your workflows, that can help close some of that gap, particularly around prompt design, playbook configuration, and output verification.

But some gaps, like surgical redlining, are deeper than configuration. If you need something off the shelf that's already adapted to legal workflows, a purpose-built solution is worth exploring.

Get comfortable with uncertainty and stay willing to experiment.

For legal tech builders:

This is definitely not doom and gloom. To the contrary, this Claude Word add-in demo is something you can show your clients, because so many details in it showcase a lack of legal awareness. You can point to it and say: "We do this better."

But only if you actually do.

Surgical redlining, drafting hygiene, and accurate proofreading: these are the areas where deep legal knowledge translates directly into product quality.

Anthropic has billions of dollars behind it and still produced a demo with gaps.

For the founders and overthinkers: Anthropic isn't perfect, and they still generate massive attention with every release.

So if you have an idea, don't wait to make it perfect.


Just ship it.

About the Author


Anna Guo

Anna is the founder of Legal Benchmarks.