When you send a document to counsel, you know exactly where it goes. When you share a file with a client, you know who can see it. When someone in your firm types a legal argument into ChatGPT, do you know where that goes, who can access it, and what happens to it next?
This article answers those questions.
Three things happen to your data when you use a publicly available AI tool
The following applies to consumer and enterprise tiers of publicly available tools such as ChatGPT, Claude, Gemini, Grok, and similar products. Other options, including purpose-built commercial tools, self-hosted deployments, and on-premises models, handle data differently. We cover those later in this article.
1. The data is uploaded to the provider's servers
The moment you submit a prompt, that data leaves your firm's environment and is transmitted to the provider's servers, typically in the United States. It is no longer within your control or your firm's security perimeter. You have no visibility into who operates those servers, what other systems they connect to, or what security incidents might affect them.
2. The data is retained for a period after your session ends
Your conversation does not disappear when you close the window. Providers keep it for a defined retention period. During that time, the data is subject to automated safety scanning and may be reviewed by human labellers as part of the training process. If a conversation is flagged for safety or abuse concerns, it can be escalated to a human review team. There is no contractual restriction on how the provider uses that data internally, no audit trail you can request, and no obligation to notify you if your data is reviewed. The length of the retention period, and what can happen to the data during it, depend entirely on which product and tier you are using.
3. The data may be used to train the model
Training means your prompt is used as an example to improve the model's future responses. In practice, this means it may be reviewed by human trainers, it could influence what the model outputs to other users, and it cannot be retrieved or deleted once the training cycle has run. Whether this happens, and whether you can prevent it, depends on the product and tier.
What changes dramatically between a free consumer account and a properly contracted enterprise deployment is how each of those three steps is governed, what contractual protections are in place, and where the data actually sits.
The free tool problem
On a free consumer account, all three of those steps happen with almost no protections around them.
On ChatGPT Free, your data is transmitted to US servers, held for at least 30 days even if you disable training, and subject to automated scanning and potential human review, with no contractual restrictions on internal access and no audit trail available to you. OpenAI suffered a platform-level breach in March 2023, when a caching bug exposed users' conversation histories and partial payment details to other users.
On Claude's free tier, the stakes rose in late 2025 when Anthropic updated its terms. Users who consent to training now face a five-year retention period on their conversations; those who decline still face 30 days. Training opt-outs are a settings toggle, not a contractual guarantee.
Neither free account comes with a Data Processing Agreement, the written contract UK GDPR requires you to have with any third party handling personal data on your behalf. Using them with client data puts the firm in breach before the first prompt is typed.
This is not a grey area. The firm has no contract with the provider, no control over what happens to the data, and no legal basis for transferring personal data to a third party in a foreign jurisdiction. The SRA Code of Conduct requires you to keep client affairs confidential; UK GDPR requires the written contract. Free tools offer neither.
One more common trap: Claude Pro at £20 per month is a consumer account. Paying for a subscription buys more usage, not better data protection.
What enterprise tiers actually offer
The enterprise versions of the same tools operate under fundamentally different contracts and technical arrangements. ChatGPT Enterprise, Claude for Work, and Microsoft Copilot for Microsoft 365 all share three core protections: they contractually prohibit training on your data, they provide a UK GDPR-compliant Data Processing Agreement, and they give firm administrators control over access, audit logs, and data retention.
ChatGPT Enterprise offers data residency options including the UK, so data does not have to leave the country. Microsoft Copilot processes prompts within the Microsoft 365 boundary, handled by Azure rather than OpenAI's public service, and Microsoft has opted out of human review for abuse monitoring. Anthropic's enterprise tier can be configured for zero data retention, meaning prompts are scanned for safety and then immediately discarded.
These protections are genuine and meaningful, but two caveats are worth knowing. First, the protections attach to specific tiers and features, so you need to know exactly what your firm has contracted for: OpenAI's Team tier, which sits below Enterprise, does not offer data residency controls or zero-retention options, and Microsoft Copilot routes web searches through Bing under different and less protective terms. Second, enterprise agreements are not available to everyone. Anthropic requires a minimum of 50 users for Claude for Work at enterprise level, which puts it out of reach for the majority of law firms. If you are a smaller firm, your realistic options are narrower than the market conversation often implies.
Legal-specific tools go further, but ask the right questions
Platforms built specifically for law firms, such as Harvey, Lexis+ with Protégé, and Thomson Reuters CoCounsel, offer meaningfully stronger protections than using a general-purpose AI tool directly. But it is worth understanding how they actually work, because the picture is more layered than the marketing suggests.
None of these platforms has built its own AI model; all run on top of the same foundation models from OpenAI, Anthropic, and Google that power everything else. Harvey, for example, routes work across GPT, Claude, and Gemini depending on the task, accessed via enterprise API arrangements through AWS Bedrock and Google Vertex AI. Lexis+ deploys third-party models in private cloud environments. CoCounsel is built on OpenAI models under a dedicated enterprise agreement with Thomson Reuters.
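To make that architecture concrete, the sketch below shows roughly what an enterprise API call to a foundation model looks like, using AWS Bedrock's Python SDK. The region, model ID, and prompt are illustrative assumptions, not any platform's actual configuration; the point is that the request goes to an endpoint governed by an enterprise agreement rather than a consumer chat product.

```python
# Illustrative sketch only: calling a foundation model through AWS Bedrock.
# The region and model ID are assumptions for the example, not any legal
# platform's real configuration.
import json
import boto3

client = boto3.client("bedrock-runtime", region_name="eu-west-2")  # London region

response = client.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # example model ID
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Summarise this clause."}],
    }),
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```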
The difference is in how those models are accessed and what agreements govern the data. Enterprise API arrangements are contractually different from consumer accounts. The model providers do not use API customer data to train their public models by default. Harvey states that customer data is not used to train the underlying models and is not shared externally without explicit permission.
Where it gets more complicated is retention. Harvey advertises configurable retention at the platform level, as short as three hours. But data passing through to the underlying model provider, whether that is OpenAI, AWS Bedrock, or Google Vertex AI, is subject to those providers' own retention terms. OpenAI's default for API usage is 30 days, even for enterprise customers. Zero data retention is available from OpenAI but requires specific prior approval and is not automatic. What retention terms Harvey, Lexis+, and CoCounsel have negotiated with their respective model providers at the API layer is not publicly disclosed. It is a question every firm should ask directly before signing any agreement.
All three hold SOC 2 Type II and ISO 27001 certifications, which is a meaningful baseline. But certifications do not answer the subprocessor question. Ask the provider to identify every subprocessor that touches your data, confirm where each one processes and stores it, and confirm what retention terms apply at each layer of the chain.
What the law requires
UK GDPR Article 28 requires a written Data Processing Agreement with any third party that processes personal data on your behalf. That agreement must confirm the processor acts only on your instructions, keeps data confidential, implements appropriate security, notifies you of breaches, and deletes or returns all data when the relationship ends. Free-tier AI tools offer none of this, which makes using them with client personal data a breach of Article 28 from the outset.
Where client data includes health records, criminal history, financial details, or other special category data, Article 9 applies stricter conditions. The ICO also states that AI systems processing personal data almost always require a Data Protection Impact Assessment.
The SRA's Code of Conduct paragraph 6.3 requires you to keep client affairs confidential. The SRA has not yet issued specific AI guidance comparable to that published by the American Bar Association or Australian legal regulators, but its Risk Outlook report makes clear that all normal confidentiality and data protection rules apply to AI. The burden of demonstrating compliance sits with the firm.
What you can actually do
Understanding the risk is one thing. Knowing what you can do about it is more useful. The options sit on a spectrum, roughly in order of how much control you have over your data.
A Team plan or individual paid subscription from a major provider is a step up from a free account in terms of features, but it does not solve the data governance problem on its own. You still need to check whether a DPA is in place and what the training and retention terms actually say for that tier.
A properly contracted enterprise agreement with a major provider gives you the contractual protections, the DPA, and usually the ability to configure retention and residency. It is a reasonable solution for larger firms, but as noted above, minimum user thresholds mean it is not available to everyone.
Purpose-built legal AI tools like Harvey, Lexis+ with Protégé, and CoCounsel are designed from the ground up for professional use. They tend to offer tighter data handling, shorter retention windows, and certifications specifically relevant to regulated industries. For firms doing high volumes of document-intensive work, they are worth evaluating seriously.
Custom-built systems are a further step. Some firms are building their own AI tools by connecting to a model through an API and deploying it within their own infrastructure. This gives considerably more control over where data sits, how it is processed, how long it is retained, and what it is used for. The data governance questions do not disappear, but they become questions your firm answers rather than questions you accept the provider's answer to. A well-designed bespoke system can be tailored precisely to how your firm works and what your clients require.
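As a hedged illustration of what answering those questions yourself means in practice, the sketch below wraps an API call in a firm-controlled function, using the official OpenAI Python SDK. The audit-logging function and the decision to record metadata only are hypothetical design choices, not a prescribed pattern.

```python
# A minimal sketch of a firm-controlled wrapper around a model API.
# record_usage is a hypothetical in-house audit hook; the point is that
# logging, retention, and access decisions live in the firm's own code.
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable


def record_usage(matter_id: str, prompt_chars: int) -> None:
    # Hypothetical audit hook: writes to the firm's own log store.
    # Deliberately records metadata only, never prompt content.
    print(f"matter={matter_id} prompt_chars={prompt_chars}")


def ask_model(prompt: str, matter_id: str) -> str:
    record_usage(matter_id=matter_id, prompt_chars=len(prompt))
    response = client.chat.completions.create(
        model="gpt-4o",  # example model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```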
At the far end of the spectrum are on-premises models, where the model itself runs on servers your firm owns and operates. Data never leaves your environment. This eliminates the transfer and retention questions entirely and is the strongest possible position from a data protection standpoint. The trade-offs are real: on-premises deployments are more complex and expensive to set up and maintain, and the models currently available for on-premises use do not match the capabilities of the leading commercial models. For most firms this will not be the right answer today. But for practices handling the most sensitive matter types, or for firms that want to use AI seriously and on their own terms, it is an option worth understanding.
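For a sense of what on-premises use looks like in practice, here is a minimal sketch that queries a locally hosted open-weight model, assuming an Ollama server running on its default port; the model name is an example. Nothing in the exchange crosses the firm's network boundary.

```python
# Illustrative sketch: querying a model that runs entirely on local hardware,
# assuming an Ollama server on its default port. The prompt and response
# never leave the machine.
import json
import urllib.request

payload = json.dumps({
    "model": "llama3",  # example local model name
    "prompt": "Summarise this clause in plain English.",
    "stream": False,
}).encode("utf-8")

request = urllib.request.Request(
    "http://localhost:11434/api/generate",  # Ollama's default local endpoint
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    print(json.loads(response.read())["response"])
```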
What to check before any AI tool touches client data
Before authorising use of any AI tool on client matters, a firm should be able to answer yes to the following:
Does the provider offer a UK GDPR Article 28-compliant Data Processing Agreement? A settings toggle is not a contract.
Is there a contractual guarantee that client data will not be used to train the model? An opt-out in a consumer settings panel does not count.
Where is data processed and stored? Does the provider offer UK or EEA data residency, or is it certified under the UK-US Data Bridge?
Does the tool support the access controls your firm needs: single sign-on, role-based permissions, audit logging, and the ability to enforce information barriers?
Does the provider hold SOC 2 Type II and ISO 27001 certifications that cover the specific product tier your firm is using?