
Vendor Due Diligence for Legal AI: 12 Questions Tax & Crypto Practices Must Ask

Jordan Ellis
2026-05-09
19 min read

A practitioner checklist for vetting legal AI vendors on provenance, security, explainability, compliance, and support.

The legal AI market is scaling fast, and that speed is exactly why tax and crypto practices need a disciplined vendor review process. When a platform can help teams draft faster, search deeper, and analyze more documents, it can also create new exposure if its data sources, security controls, training methods, or support model are weak. Recent growth in legal AI adoption shows how quickly firms are moving from experimentation to procurement, but procurement without rigorous evidence from tech vendors is how risk enters quietly. For tax controversy, SALT, IRS defense, and crypto compliance work, the right approach is a practical vendor evaluation checklist built around model risk, data provenance, legal AI security, explainability, and service levels.

This guide is designed for practitioners, not technologists. It gives you 12 questions to ask any legal AI vendor before you sign, renew, or expand use. It also shows how the answers should be reflected in your hybrid cloud architecture, captured in your internal policy set, and locked down in your contract checklist. If you are evaluating AI for tax memo drafting, transaction review, subpoena analysis, wallet tracing, or regulatory research, the goal is simple: reduce manual work without increasing confidentiality, negligence, or compliance exposure. That means being as demanding about provenance and support as you are about the product demo.

1. Why rigorous vendor review matters for tax and crypto practices

Confidentiality is not optional

Tax and crypto matters often contain the most sensitive data a practice handles: SSNs, EINs, filing histories, transaction records, wallet addresses, bank activity, audit workpapers, engagement letters, and privileged strategy discussions. A general-purpose tool may be impressive in a demo, but if it cannot clearly explain retention, segregation, encryption, and access controls, it is not ready for practice use. In this space, a vendor’s marketing language is less important than its actual operating model, especially when attorneys and staff may paste in client facts during a live matter. This is why your review should go beyond the brochure and include secure AI operating models and written answers tied to the system architecture.

Hallucinations carry professional liability

In tax and crypto work, hallucinations are not just a quality issue; they can become a filing problem, an advice problem, or a sanctions problem. A wrong citation in a memo, a missed reporting threshold, or a mistaken interpretation of a token transaction can cost real money and create professional liability. That is why model risk management matters as much as feature comparison. You should ask how the vendor tests for accuracy, whether outputs are grounded in source documents, and what guardrails reduce unsupported conclusions. For a useful mental model, think of vendor selection the way you would think about measuring AI agents: if you cannot define the metrics, you cannot manage the risk.

Regulatory drift is constant

Crypto guidance changes quickly, IRS procedure shifts often, and state tax rules can diverge in meaningful ways. A vendor that was acceptable six months ago may already be outdated if its legal content pipeline and update cadence are weak. Your diligence should therefore examine how the vendor monitors regulatory updates, whether it differentiates federal, state, and international rules, and how it handles fast-changing topics like digital assets, broker reporting, and cross-border information returns. Firms that underestimate this step often end up with a tool that is fast but stale, which is worse than no tool at all. The right standard is not novelty; it is reliability under change.

2. The 12 due diligence questions every practice should ask

Question 1: What is your data provenance?

Start with the most fundamental issue: where does the model get its knowledge, and how is that knowledge maintained? You want a precise answer that distinguishes between licensed legal databases, public web content, customer-uploaded materials, vendor-curated datasets, and synthetic training data. Ask for the date ranges, jurisdictions, and document types included, along with any exclusion criteria. If the vendor cannot explain provenance clearly, you should treat every output as unverified. For a broader perspective on evidence quality, see our guide on demanding evidence from tech vendors.
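
One practical way to make provenance answers comparable across vendors is to capture them in a fixed structure rather than free-form meeting notes. The sketch below is a minimal example of such a record; the field names and red-flag thresholds are illustrative assumptions, not any vendor's actual disclosure format.

```python
from dataclasses import dataclass, field

@dataclass
class ProvenanceDisclosure:
    """Hypothetical record of a vendor's written data-provenance answer."""
    vendor: str
    licensed_sources: list[str] = field(default_factory=list)  # named legal databases
    customer_data_in_training: bool = False
    synthetic_data_used: bool = False
    jurisdictions: list[str] = field(default_factory=list)     # e.g., ["US-federal", "NY"]
    update_cadence_days: int | None = None                     # None = no answer given

def provenance_red_flags(d: ProvenanceDisclosure) -> list[str]:
    """Flag vague or risky answers so they can be escalated before signature."""
    flags = []
    if not d.licensed_sources:
        flags.append("No named sources: treat every output as unverified.")
    if d.customer_data_in_training:
        flags.append("Customer data used in training: confidentiality review required.")
    if d.update_cadence_days is None or d.update_cadence_days > 90:
        flags.append("Weak or undefined update cadence: risk of stale law.")
    if not d.jurisdictions:
        flags.append("No jurisdiction list: cannot scope to your practice areas.")
    return flags

print(provenance_red_flags(ProvenanceDisclosure(vendor="ExampleCo")))
```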

Question 2: How was the model trained and fine-tuned?

Training-set transparency is essential because it shapes bias, confidence, and error patterns. Ask whether the system is a general foundation model, a retrieval-augmented workflow, a fine-tuned legal model, or a hybrid. The answer should tell you whether the model was optimized for legal text, how it was evaluated, and whether it was exposed to privileged or customer data during training. In practice, you want a vendor who can separate training data from your firm’s live matter data and explain whether your prompts are used for future model improvement. If the answer is vague, the risk is that your client content becomes a future learning input.

Question 3: What security controls protect client data?

Security due diligence should include encryption, identity controls, logging, access segmentation, vendor employee access limits, breach notification, and data deletion practices. Ask for SOC 2, ISO 27001, penetration testing summaries, and subprocessor lists. Also ask how the product handles file uploads, chat history, document indexing, and exports. Security controls should be documented in contract language, not merely described in a sales deck. For practical handling of sensitive files on mobile devices, our mobile security checklist for signing and storing contracts offers a helpful framework.

Question 4: Can you explain how outputs are grounded and cited?

Explainability is critical in legal AI because attorneys need to know where a conclusion came from. Ask whether the system returns source citations, quote-level attribution, document lineage, and confidence indicators. If the vendor uses retrieval augmented generation, request a demonstration using a difficult tax or crypto issue, not a simple one. You want to see whether the system can distinguish authority from commentary, federal from state law, and current from superseded guidance. If it cannot show its work, it should not be relied on for legal analysis or client-facing memos.
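
If a vendor claims quote-level attribution, you can spot-check that claim mechanically during the demo: every substantive statement in an output should map to a citation whose quoted text actually appears in the cited source. A minimal sketch, assuming a hypothetical output format in which the vendor returns claims paired with citation snippets:

```python
def unsupported_claims(answer: list[dict]) -> list[str]:
    """Return claims that lack a citation, or whose quoted snippet does not
    actually appear in the source text it points to. The answer format here
    is an assumption for illustration, not a real vendor schema."""
    problems = []
    for item in answer:
        cite = item.get("citation")
        if not cite:
            problems.append(f"No citation: {item['claim']!r}")
        elif cite["quote"] not in cite["source_text"]:
            problems.append(f"Quote not found in source: {item['claim']!r}")
    return problems

# Example spot check during a demo:
demo = [
    {"claim": "Broker reporting applies from 2025.",
     "citation": {"source_text": "Reporting begins in 2025 for brokers.",
                  "quote": "begins in 2025 for brokers"}},
    {"claim": "State X conforms to the federal rule.", "citation": None},
]
print(unsupported_claims(demo))  # flags the uncited second claim
```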

Question 5: What are your service levels and support commitments?

Support matters because AI issues often become urgent at the worst possible time: during an audit response, injunction review, closing deadline, or filing rush. Ask about uptime, response times, escalation paths, named technical contacts, and incident updates. Service levels should reflect business criticality, not generic software standards. For practices that use AI in deadline-sensitive environments, our guide to cutting costs before the deadline is a reminder that timing is often the hidden constraint, even when the issue is not software.

Question 6: How do you handle confidentiality, retention, and deletion?

Many vendors promise “no training on your data,” but that statement is incomplete unless it covers logs, backups, analytics, human review, and third-party subprocessors. Ask how long prompts and documents are retained, whether deletion is immediate or delayed, and what legal hold procedures exist. Also ask whether there is a client-level partitioning model for matter data and whether exports can be fully purged. For tax and crypto matters, retention policy should align with your own document management obligations and engagement terms. If the vendor’s deletion process is weak, the platform may create long-tail exposure that outlives the matter itself.

Question 7: Does the system support jurisdictional and topic filtering?

A strong legal AI product should let you constrain results by jurisdiction, date, practice area, or source type. This matters immensely in tax and crypto work, where the wrong state rule or outdated federal interpretation can mislead a matter team. Ask whether you can limit answers to IRS materials, tax court opinions, state administrative guidance, or recognized crypto regulatory sources. Better systems allow you to build matter-specific workspaces with approved sources only. If the vendor cannot demonstrate source controls, you are relying on generic language models to do what your research workflow should control explicitly.
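
Under the hood, source control is just filtering before retrieval. The sketch below shows the behavior you should ask the vendor to demonstrate, using an assumed document schema; real products expose this through workspace settings rather than code.

```python
from datetime import date

def filter_sources(docs: list[dict], jurisdictions: set[str],
                   not_before: date, allowed_types: set[str]) -> list[dict]:
    """Keep only documents from approved jurisdictions, source types, and
    dates. The document fields are assumptions for illustration."""
    return [
        d for d in docs
        if d["jurisdiction"] in jurisdictions
        and d["source_type"] in allowed_types
        and d["published"] >= not_before
    ]

docs = [
    {"jurisdiction": "US-federal", "source_type": "irs_guidance",
     "published": date(2024, 3, 1), "title": "Notice on digital assets"},
    {"jurisdiction": "NY", "source_type": "blog",
     "published": date(2020, 1, 1), "title": "Commentary post"},
]
approved = filter_sources(docs, {"US-federal", "NY"},
                          date(2023, 1, 1), {"irs_guidance", "tax_court"})
print([d["title"] for d in approved])  # only the IRS notice survives
```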

Question 8: How do you manage model updates and change control?

AI vendors frequently update models, prompts, retrieval layers, or safety filters without enough customer visibility. That creates version drift, meaning the tool you approved last quarter may behave differently after the next release. Ask whether the vendor provides release notes, testing windows, rollback options, and advance notice for material changes. Practices should insist on a formal change-management process similar to what they expect in enterprise systems. Without that discipline, it becomes hard to defend why a prior output changed, especially when the issue is later scrutinized by a client, auditor, or malpractice carrier.
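
One lightweight control your own team can run without vendor cooperation is a golden-prompt regression check: keep a fixed set of prompts with approved answers, re-run them after each release, and flag any that changed. A minimal sketch follows; the exact-match hashing is deliberately crude, and in practice you would diff or score answers rather than hash them.

```python
import hashlib
import json

def snapshot(prompt_answers: dict[str, str]) -> dict[str, str]:
    """Hash each approved answer so later drift is detectable."""
    return {p: hashlib.sha256(a.encode()).hexdigest()
            for p, a in prompt_answers.items()}

def detect_drift(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return prompts whose output no longer matches the approved baseline."""
    return [p for p, digest in baseline.items()
            if hashlib.sha256(current[p].encode()).hexdigest() != digest]

# At approval time: store the baseline alongside the diligence file.
baseline = snapshot({"broker-reporting-q1": "Answer approved 2026-01-15."})
with open("golden_baseline.json", "w") as f:
    json.dump(baseline, f)

# After a vendor release: re-run the same prompts and compare.
changed = detect_drift(baseline, {"broker-reporting-q1": "A different answer."})
print(changed)  # any changed prompt gets re-reviewed before production use
```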

Question 9: What third parties can access the data?

Subprocessors, cloud providers, labeling vendors, and analytics tools all expand the risk surface. Your review should identify every entity that can touch data, where those entities operate, and what contractual restrictions govern their use. Ask whether customer data is used for debugging, whether support staff can view matter content, and whether customer data leaves approved regions. This is especially important for international tax and digital asset clients with cross-border implications. If you need a practical reference point for third-party governance, our article on building a competitive intelligence pipeline for identity verification vendors shows how to track external dependencies systematically.

Question 10: How do you test accuracy, bias, and hallucination rates?

Demand to see benchmark results, internal quality assurance methods, and failure-mode testing. The vendor should be able to explain how often the system produces unsupported claims, what categories of questions trigger weaker performance, and how it prevents confident but wrong outputs. Ask whether tests were run against tax, crypto, and legal scenarios that resemble your own practice. If the vendor only shows generic productivity metrics, you still do not know whether the tool is trustworthy for substantive work. This is a classic model risk question, and it belongs in every contract checklist.
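
You can hold the vendor to the same standard internally by scoring a labeled sample of outputs from your own matters. The sketch below computes a simple unsupported-claim rate broken down by topic; the labels come from attorney review, not from the tool itself, and the sample data is illustrative.

```python
from collections import defaultdict

def unsupported_rate_by_topic(reviews: list[dict]) -> dict[str, float]:
    """reviews: attorney-labeled outputs, e.g.
    {"topic": "crypto-reporting", "supported": False}.
    Returns the share of unsupported outputs per topic."""
    totals, bad = defaultdict(int), defaultdict(int)
    for r in reviews:
        totals[r["topic"]] += 1
        if not r["supported"]:
            bad[r["topic"]] += 1
    return {t: bad[t] / totals[t] for t in totals}

sample = [
    {"topic": "crypto-reporting", "supported": True},
    {"topic": "crypto-reporting", "supported": False},
    {"topic": "salt", "supported": True},
]
print(unsupported_rate_by_topic(sample))
# {'crypto-reporting': 0.5, 'salt': 0.0} -- weak topics get extra review
```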

Question 11: What compliance obligations do you support?

For tax and crypto work, compliance may include confidentiality, data privacy, records management, AML-adjacent workflows, sanctions screening support, and jurisdiction-specific regulatory review. Ask whether the vendor can support GDPR, U.S. privacy laws, data residency commitments, and regulated data handling. For crypto matters, ask how the product handles chain analytics context, exchange records, wallet attribution, and enforcement references without over-claiming certainty. You are not asking the vendor to practice law, but you are asking whether the software respects the compliance reality of your practice. A compliance-aware vendor will answer carefully and document the limits of its product.

Question 12: What post-deployment support and training do you provide?

The best vendors do not disappear after the signature. Ask what onboarding looks like, who trains your team, how prompt libraries are maintained, and whether there are office hours or usage reviews. Post-deployment support should include policy templates, admin dashboards, matter-level permissions, and periodic adoption reviews. If your firm is building a repeatable rollout, think of it as operational change management rather than a software install. That mindset is similar to how teams approach scaling analytics in other regulated workflows, such as secure BI architectures or other data-heavy decision systems.

3. What good answers look like in practice

Strong answers are specific, not promotional

The difference between a useful answer and a marketing answer is easy to spot. A strong answer says where the data came from, how it was validated, how it is updated, and who can access it. A weak answer says the vendor has “enterprise-grade security” or “industry-leading intelligence” without naming controls. For legal AI, specificity is the test. If the vendor cannot explain what happens to a matter file after upload, that should be a red flag, not a minor omission.

Ask for demonstrations on your hardest use cases

Do not judge the platform by a simple demo using a generic lease review or basic memo prompt. Instead, test it with a tax controversy issue, a cross-border crypto reporting question, or a document set with mixed authorities and competing timelines. A good system should show source attribution, identify ambiguity, and avoid hallucinating certainty. This is similar to how procurement teams in other categories separate polished messaging from actual performance. The mindset is captured well in our article on evaluating a platform before you commit.

Document answers in writing

Verbal assurances do not survive staff turnover, vendor acquisition, or dispute. Keep a written diligence file that includes security reports, privacy terms, subprocessor lists, sample outputs, and meeting notes. Require the vendor to confirm any material promise in the contract or order form. For firms handling urgent IRS or exchange-related matters, documented answers can be the difference between a managed risk and an unexplained exposure. If a tool is ever questioned, your diligence record becomes evidence of a reasonable procurement process.

4. Contract checklist: clauses that matter before signature

Data use and training restrictions

First, make sure the contract says customer data is not used to train or fine-tune models unless you expressly opt in, in writing. Clarify whether prompts, files, embeddings, transcripts, logs, and feedback are covered. If the vendor offers an enterprise plan, the contract should define what that means in practice. Also require data deletion timelines and a clear right to verify deletion upon termination. A vague privacy policy is not enough when client confidentiality is on the line.

Security, audit, and incident terms

Contract language should require ongoing security controls, breach notice windows, and cooperation with forensic investigation if an incident occurs. Ask for audit rights or at least a current security package on request. In regulated practices, you may also need commitments on subprocessor notice, data location, and employee background screening. These clauses are not just legal formalities; they are the operational backbone of vendor risk management. For a parallel example of how buyers should distinguish quality from low-cost risk, see our article on cheap versus quality cables.

Performance, uptime, and remedies

Service levels should be explicit and tied to practical remedies. If the platform is down during a filing deadline or a client deadline, you need to know what compensation, credits, escalation, or support path applies. Include response times for priority incidents and a named escalation contact for enterprise customers. The best contracts also define maintenance windows and material changes that trigger notice. In short, the contract should reflect how the tool actually affects the delivery of legal services.

| Due Diligence Area | What to Ask | What Good Looks Like | Red Flags | Why It Matters for Tax & Crypto |
| --- | --- | --- | --- | --- |
| Data provenance | Where do training and retrieval sources come from? | Named sources, date ranges, jurisdiction list, update cadence | "Proprietary" with no detail | Outdated or incomplete law leads to bad advice |
| Security | What protects prompts, files, and matter data? | Encryption, access controls, logging, SOC 2, deletion policy | No subprocessor list or weak retention terms | Client confidentiality and privilege exposure |
| Explainability | Can outputs be traced to authority? | Citations, quotes, source links, confidence notes | Answers without support | Supports defensible legal analysis |
| Model risk | How are hallucinations and errors tested? | Benchmarking, QA samples, use-case testing | No published testing method | Prevents unreliable filing or memo content |
| Service levels | What happens if the platform fails? | Uptime commitments, escalation, response times | Generic support email only | Deadlines in tax and crypto are unforgiving |

5. How tax and crypto teams should run the review process

Start with a risk-tiered use case list

Not every AI use case has the same exposure. Drafting internal research notes is different from generating client-facing advice, and summarizing documents is different from creating citation-backed analysis. Start by listing your use cases and assigning risk tiers based on confidentiality, legal significance, and deadline sensitivity. Then match vendor controls to each tier. If the platform cannot support your highest-risk use cases, it may still be useful for lower-risk administrative work.
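
A simple scoring rule is enough to make tiering consistent across the firm. The cutoffs below are illustrative assumptions, not a standard; calibrate them to your own risk appetite.

```python
def risk_tier(confidential: bool, client_facing: bool,
              deadline_sensitive: bool) -> str:
    """Assign a use case to a tier from three yes/no questions.
    The thresholds are illustrative; adjust to firm policy."""
    score = sum([confidential, client_facing, deadline_sensitive])
    return {0: "low", 1: "medium", 2: "high", 3: "high"}[score]

use_cases = {
    "internal research notes": (True, False, False),
    "client-facing tax memo": (True, True, True),
    "marketing copy draft": (False, False, False),
}
for name, flags in use_cases.items():
    print(f"{name}: {risk_tier(*flags)}")
# high-tier use cases get the full control set; low-tier may proceed sooner
```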

Make the review cross-functional

Vendor review fails when it sits entirely with one department. Legal can assess privilege and workflow fit, IT can validate security and integration, and operations can measure support quality and user friction. The most effective reviews combine all three, because legal AI is not merely software; it is a business process with legal consequences. That approach is consistent with how enterprises evaluate data-heavy systems like secure BI architectures and other mission-critical tools. Shared review also helps avoid the common trap where a flashy demo wins before the control requirements are even defined.

Pilot before full deployment

A short pilot with defined success criteria is the best way to verify vendor claims. Use real matters, but limit access to a controlled group and a limited data set. Measure output quality, user behavior, citation quality, and support responsiveness. Then compare the results to your internal standards and document the findings. A controlled pilot is far better than a blind rollout because it gives you evidence, not assumptions. If you need a good frame for evaluating an emerging platform before broad commitment, see our article on how to evaluate a platform before you commit.
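
Defined success criteria only work if they are written down before the pilot starts. A minimal scorecard sketch follows; the metric names and thresholds are placeholder assumptions to be replaced with your own.

```python
# Targets agreed before the pilot begins; the numbers are placeholders.
TARGETS = {
    "citation_accuracy": 0.95,    # share of citations that check out
    "answer_usable_rate": 0.80,   # outputs usable after attorney review
    "support_response_hours": 4,  # max first-response time on tickets
}

def pilot_verdict(measured: dict[str, float]) -> dict[str, bool]:
    """Compare measured pilot results against the pre-agreed targets."""
    return {
        "citation_accuracy":
            measured["citation_accuracy"] >= TARGETS["citation_accuracy"],
        "answer_usable_rate":
            measured["answer_usable_rate"] >= TARGETS["answer_usable_rate"],
        "support_response_hours":
            measured["support_response_hours"] <= TARGETS["support_response_hours"],
    }

results = pilot_verdict({"citation_accuracy": 0.97,
                         "answer_usable_rate": 0.74,
                         "support_response_hours": 6.0})
print(results)  # any False becomes a renegotiation item, not a surprise
```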

Use the contract to turn promises into obligations

Many vendors are happy to make strong statements in sales calls, but the legal value lies in what the contract says. If a promise matters to confidentiality, accuracy, or uptime, get it in writing. Ask for definitions, not slogans. If a vendor says it provides “enterprise security,” ask which standards, certifications, and controls that phrase actually includes. That discipline reduces later disputes and gives your firm a clearer basis for reliance.

Require transparency on product changes

One of the most overlooked risks in AI procurement is silent drift. A model update or retrieval change can alter outputs in ways that are difficult to detect. Ask for change logs, advance notice for material updates, and a mechanism to review major version changes before they hit production. This is especially important for tax and crypto practices because technical accuracy and source currency matter every day. A vendor that respects your review process is a vendor that understands legal risk.

Pro Tip: If the vendor cannot answer your due diligence questions in writing within 48 hours, assume the same response speed will apply when there is an incident, outage, or urgent support issue.

Keep ownership of your workflows

The best AI implementations are ones you can exit cleanly. Preserve prompt libraries, export configurations, approved sources, and internal review notes so the firm owns the workflow even if the vendor changes strategy or pricing. This is a practical hedge against vendor lock-in, and it is especially important in fast-growing markets where consolidation is common. Legal AI is evolving quickly, as the rapid scale achieved by firms serving thousands of customers shows, but your firm’s operating model should not depend on a single black box. To understand the pace of adoption, note how fast the market has moved from pilot to scale in reports on legal AI revenue growth.

6. Frequently asked questions

Do we need due diligence if the tool only drafts internal notes?

Yes. Internal notes can still contain privileged information, taxpayer data, transaction histories, and client strategy. Once data is entered into a vendor system, the risks of retention, leakage, and misuse still exist. The level of scrutiny can be lower than for client-facing use, but it should never be zero.

What is the single most important question to ask?

There is no single question, but if forced to choose, ask about data provenance and data use. If the vendor cannot clearly explain where its answers come from and how your information is protected, every other feature becomes less trustworthy. Provenance and retention are the foundation of model risk management.

How much explainability is enough?

Enough explainability means your team can trace a conclusion to a source, understand the reasoning path, and verify whether the source is current and authoritative. For tax and crypto work, that usually means source links, document citations, and the ability to confirm jurisdiction and date. If the system only provides a polished paragraph without traceability, that is not enough for professional use.

Should we allow staff to use public AI tools for client work?

Generally, not without a written policy and approved controls. Public tools may retain prompts, use inputs for training, or lack the security and audit features your practice needs. If your firm permits them at all, it should only be for clearly defined low-risk tasks and only after legal and IT approval.

How often should we re-review a vendor?

At minimum, review annually and after any major product, security, or ownership change. In fast-moving AI markets, a once-approved vendor can become a different risk profile very quickly. Re-review is especially important when the vendor changes model providers, subprocessor relationships, or privacy terms.

What should be in the rollout policy?

Your rollout policy should define approved use cases, prohibited content, review requirements for client-facing outputs, escalation procedures, and retention expectations. It should also explain who can access the tool, what data can be uploaded, and how outputs are validated before reliance. Policies only work when they are specific enough for staff to follow under deadline pressure.

Conclusion: Buy the vendor risk controls, not just the software

Legal AI can create real efficiency in tax and crypto practices, but only if the vendor is treated like a critical professional service provider rather than a productivity app. The right procurement process asks hard questions about provenance, training, security, explainability, support, and contractual safeguards before a single matter file is uploaded. That diligence protects clients, reduces malpractice exposure, and improves the odds that the platform will actually make the practice more effective. In a market moving this quickly, the advantage goes to firms that evaluate with rigor and implement with discipline.

If you are reviewing a vendor now, use this checklist to structure your next demo, security review, and contract negotiation. Pair it with practical internal controls, risk tiering, and a controlled pilot, and you will be far better positioned to adopt AI without inheriting avoidable exposure. For deeper operational guidance on secure implementation, revisit our resources on secure contract handling, evidence-based vendor review, and secure AI deployment architecture.

Related Topics

#vendor selection · #AI governance · #cybersecurity

Jordan Ellis

Senior Legal Tech Editor

Senior editor and content strategist. Writing about technology, design, and the future of digital media. Follow along for deep dives into the industry's moving parts.

Last updated: 2026-05-13