AI and Data Privacy: Does the Law Protect Your Personal Information?

AI models are being trained on individuals’ personal data. The EU has responded with sweeping legislation to protect residents. Will the U.S. follow its lead and pass its own AI data privacy law?

By , Attorney, University of North Carolina School of Law
Updated 10/21/2025

Data privacy is one of the biggest concerns with emerging artificial intelligence technology. The most prominent AI models—large language models (LLMs) like OpenAI's ChatGPT, Google's Gemini, and Meta's Llama—are trained on vast quantities of data. The more data these AI models consume, the better they become at simulating human thought and conversation.

People naturally wonder how much data these AI models have access to. How much should they have access to? And what are the risks if they have our personal information?

Lawmakers in and outside the U.S. have started to limit how the data used to train AI, including personal information, is collected, stored, processed, and disclosed.

Personal Information in AI Training Data

AI models are trained on large swaths of text, mostly from websites, books, and newspapers. But where exactly does that data come from? It's a straightforward question that usually gets a less-than-forthcoming answer. Generally, the owners of popular LLMs vaguely declare that their data comes from public sources. For example:

  • OpenAI's ChatGPT: On its website, OpenAI says that its foundational models, including the ones that power ChatGPT, are trained on "(1) information that is publicly available on the internet, (2) information that [OpenAI] partners with third parties to access, and (3) information that [OpenAI] users, human trainers, and researchers provide or generate."
  • Google's Gemini: Google's privacy policy says that Gemini is trained on "publicly available, crawlable data from the internet."
  • Meta's Llama: Meta, on its Llama website, says that Llama 2 models are "pretrained on publicly available online data sources."

Data gathered from public sources usually contains personal information like names, email addresses, and birthdates. This information can be taken from databases, articles, blogs, forums, and social media. And people whose personal data is being fed to AI models often don't know that what they've shared online is being used in these training sets. Again, AI developers haven't given details about what kinds of personal data have been collected and from whom.

Should the fact that data is publicly available mean that anyone is allowed to use it for any reason? Many say no, worrying that training data can and inevitably will be revealed to anyone who asks the AI the right questions.

AI Data Privacy Laws: Protecting Personal Information

Data privacy is the area of law that governs how companies access and use our personal information. Lawmakers try to protect people's personal information through consumer protection and privacy laws. The idea is to stop businesses from using consumer data in ways that would be unfair, deceptive, or harmful.

In data privacy laws, "personal information" or "personal data" usually refers to information that can directly or indirectly identify someone. Typically, personal information or data can include (a few of these categories appear in the illustrative sketch after this list):

  • names
  • email addresses
  • phone numbers
  • dates of birth
  • gender
  • religion
  • national origin
  • Social Security numbers
  • passport numbers
  • driver's license numbers
  • biometric data
  • web cookies and IP addresses
  • financial information, and
  • employment information.
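
To make a couple of these categories concrete, here's a minimal, hypothetical Python sketch of how a text pipeline might flag email addresses, U.S. phone numbers, and Social Security numbers in scraped text using simple pattern matching. The patterns and the `find_pii` function are invented for illustration; real-world PII detection relies on far more sophisticated techniques.

```python
import re

# Hypothetical, simplified patterns for a few of the identifiers listed
# above. Real PII-detection tools use much more robust methods (named
# entity recognition, checksum validation, context analysis).
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def find_pii(text):
    """Return any matches for each PII category found in `text`."""
    hits = {}
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits

sample = "Reach Jane Doe at jane.doe@example.com or 555-867-5309."
print(find_pii(sample))
# {'email': ['jane.doe@example.com'], 'us_phone': ['555-867-5309']}
```

Even this toy example shows why "publicly available" text so often carries identifying details: the identifiers follow predictable formats and are trivially easy to extract at scale.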

How Federal Law Protects Personal Information: FTC and Congress

As of October 2025, Congress has yet to pass a comprehensive data privacy law. Instead, U.S. residents must rely primarily on existing consumer protection laws to protect their personal information. Additionally, people in select states can rely on new and emerging state laws specifically tailored to AI data use (as discussed later).

The FTC's Role in AI and Data Privacy

In the U.S., the Federal Trade Commission (FTC) is tasked with protecting consumers' privacy and security. Under Section 5 of the FTC Act, the FTC is responsible for preventing people and businesses from using "unfair or deceptive acts or practices" while conducting business in the U.S. (15 U.S.C. § 45 (2025).)

The FTC can specify the kinds of business practices that are considered unfair or deceptive, such as the unreasonable collection and processing of personal information. The federal agency can also launch investigations and charge businesses with violating consumer protection laws. Businesses that violate the law can be forced to pay civil penalties and restitution to consumers.

FTC Investigation Into OpenAI's Use of Personal Data

In July 2023, the FTC opened an investigation into OpenAI. In its letter to the company, the FTC said it was investigating whether, through the use of its LLMs (like ChatGPT), OpenAI has violated Section 5 of the FTC Act.

The federal agency demanded that OpenAI provide information about its LLMs to see whether the tech company has engaged in unfair or deceptive practices with regard to:

  • privacy or data security, or
  • risks of harm to consumers, including in terms of their reputations.

The FTC is particularly interested in the use of personal data to train ChatGPT and ChatGPT's ability to generate statements containing people's personal information. The agency asked OpenAI whether the company had taken any steps to address risks related to ChatGPT generating statements with actual personal information.

The law enforcement agency seems to be looking for information about how OpenAI collects, processes, and generates personal data. The inquiry should reveal what efforts the company has made to protect personal data and whether those efforts go far enough to protect consumers.

In October 2024, the Electronic Privacy Information Center (EPIC) filed its own complaint with the FTC about OpenAI. EPIC is a public interest research center that advocates for the protection of consumers' privacy rights. In its complaint, EPIC urges the FTC to investigate OpenAI for developing and using its AI systems in ways that fall short of public policy standards. The complaint specifically takes issue with OpenAI's "unprecedented webscraping," which gave the company access to millions of consumer data points, including personal information.

As of October 2025, the FTC's investigation into OpenAI for violations of Section 5 of the FTC Act remains open.

FTC Issues Guidelines for Using Consumer Data With AI

In a February 2024 blog post, the FTC cautioned companies against altering their privacy and data security policies to gain further access to existing consumer data. The post noted that companies developing AI products depend on large amounts of data to train their models. As a result, these companies might be tempted to tap into the data that consumers have already provided them.

Unfortunately for these companies, their privacy policies usually limit how they can use their consumers' data. Typically, the privacy policy doesn't leave room for companies to use consumer data to train and develop AI products. The FTC warns that companies can't covertly change their privacy policies to retroactively grant themselves access to data consumers have already provided. When the consumers provided the data, they didn't consent to their data being used in ways that weren't outlined in the privacy policy, such as training AI systems.

If a company changes its privacy policy to use previously unusable consumer data, it could be engaging in unfair or deceptive practices. In that case, the company would have violated the FTC Act. In the past, the FTC has pursued other companies that have rewritten their data practices to stealthily use customers' data beyond what was originally agreed to.

Congress's Attempts to Pass a Federal AI Data Privacy Law

Although it has the FTC to enforce the FTC Act, the U.S. doesn't have a comprehensive data privacy law. But some lawmakers are trying to change that.

In June 2022, the House Energy and Commerce Committee introduced the American Data Privacy and Protection Act. However, Congress never acted on the bill. In a subsequent attempt, legislators proposed the American Privacy Rights Act in April 2024. But after some controversial revisions, many privacy and civil liberties organizations, including the American Civil Liberties Union (ACLU), withdrew their support. Consequently, the bill never progressed.

As of October 2025, the United States still lacks comprehensive federal data privacy legislation.

How States Protect Personal Information

Whereas federal laws apply throughout the country, state laws apply only to businesses that operate within a given state and to consumers who reside there. States have approached data privacy in varying ways. Some have no consumer data privacy laws, while a handful have comprehensive privacy laws.

For example, California has the California Consumer Privacy Act of 2018 (CCPA), one of the most protective state consumer privacy measures. It was amended by a ballot proposition (the California Privacy Rights Act) in 2020, with the amendments taking effect on January 1, 2023. The CCPA gives consumers the rights to:

  • know the personal information a business collects about you and how that information is used, shared, and sold
  • opt out of the sale or sharing of your personal information
  • limit the use and disclosure of sensitive personal information collected about you
  • delete your personal information collected by a business, and
  • correct inaccurate personal information that a business has about you.

(Cal. Civ. Code §§ 1798.140 and following (2025).)

Moreover, the CCPA's definition of "personal information" now covers personal information held in AI systems that are capable of outputting it.
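
To illustrate what honoring the opt-out right can look like in practice, below is a hypothetical Python (Flask) sketch of a web handler that checks for the Global Privacy Control (GPC) signal, a browser setting that California regulators have said businesses must honor as a valid request to opt out of the sale or sharing of personal information. The route and the `record_opt_out` helper are invented for this example.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def record_opt_out(visitor_id):
    # Stub for the example. A real implementation would persist the
    # opt-out and propagate it to ad-tech and analytics vendors.
    print(f"Opt-out of sale/sharing recorded for {visitor_id}")

@app.route("/api/content")
def serve_content():
    # Browsers with Global Privacy Control enabled send "Sec-GPC: 1".
    opted_out = request.headers.get("Sec-GPC") == "1"
    if opted_out:
        record_opt_out(request.remote_addr)
    # Suppress features that rely on selling or sharing the visitor's
    # data when the opt-out signal is present.
    return jsonify({"personalized_ads": not opted_out})
```

The point of the sketch is simply that an opt-out right only works if a business's systems actually detect and act on the signal; the legal obligation ultimately lands in code like this.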

As of October 2025, 19 states have passed comprehensive data privacy laws.

Despite California's stricter regulations and the FTC's investigation into OpenAI, the U.S. is generally considered behind other nations when it comes to consumer protection and data privacy laws.

Data Privacy Laws Outside the U.S.: The EU AI Act and GDPR

The European Union (EU) has some of the most comprehensive and protective data privacy laws in the world. The EU AI Act and the General Data Protection Regulation (GDPR) are the two key laws that govern how AI companies treat user data.

Personal Information Under the EU AI Act

The Council of the European Union adopted the EU AI Act in May 2024, and the law is being implemented in stages through August 2, 2027. The EU AI Act is the first comprehensive AI law in the world, and it has wide applicability beyond data privacy and personal data protection. But for this article, we'll focus on its effects on data protection.

For some background, the EU AI Act separates AI systems into four categories based on risk level:

  • unacceptable risk
  • high risk
  • limited risk, and
  • minimal or no risk.

The EU AI Act also imposes special requirements on general purpose AI (GPAI). GPAI systems can perform a wide range of tasks and are built on GPAI models. GPAI systems include ChatGPT, Gemini, and other LLMs that the public uses for general purposes. Most GPAI providers must document their training and testing processes and publish a summary of the content their models were trained on.

To facilitate and encourage the development of AI systems, the Act requires EU Member States to establish at least one AI regulatory sandbox by August 2, 2026. An "AI regulatory sandbox" is a controlled framework where AI providers can develop, train, test, and validate AI systems under regulatory supervision before the system is released.

AI providers can use personal data that was lawfully collected for other purposes to develop, test, or train their AI systems in an AI regulatory sandbox, but only if all of the following conditions are met (simplified in the sketch after the citation below):

  • The AI system is developed for safeguarding a specified public interest, including public health, energy sustainability, transport systems, and public services.
  • The data must be processed to meet the high-risk AI system requirements identified in the Act, and those requirements can't be satisfied using anonymized, synthetic, or nonpersonal data.
  • The AI system has monitoring and response mechanisms to identify and mitigate any high risks to the rights and freedoms of the owners of the personal data during the sandbox experimentation.
  • The personal data is processed in an isolated, protected environment under the control of the provider, and only authorized persons can access the data.
  • AI providers can only share personal data in line with EU data protection law (i.e., the GDPR).
  • The processing of the personal data doesn't lead to measures or decisions that affect the owners of the personal data or the owners' rights.
  • The processed personal data is protected by technical and organizational measures and deleted at the end of the retention or participation period.
  • The processing logs are kept during the time the personal data is processed unless required otherwise by EU or national law.
  • A complete and detailed description of the process and rationale behind the training, testing, and validation of the AI system is kept along with the testing results.
  • A summary of the AI project, including its objectives and expected results, is published on the competent authorities' website.

(Art. 59 of the EU AI Act (2025).)
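
The sketch below maps these conditions onto a simple Python checklist. It's purely illustrative: the field names are invented paraphrases of Article 59, and real compliance is a documented legal analysis, not a set of yes/no flags.

```python
from dataclasses import dataclass, fields

@dataclass
class SandboxDataUseChecklist:
    """Invented, simplified paraphrase of the Art. 59 conditions."""
    safeguards_specified_public_interest: bool
    necessary_for_high_risk_requirements: bool  # no anonymized/synthetic alternative
    monitoring_and_risk_mitigation: bool
    isolated_environment_access_controlled: bool
    data_sharing_complies_with_gdpr: bool
    no_decisions_affecting_data_subjects: bool
    technical_safeguards_and_deletion_plan: bool
    processing_logs_retained: bool
    process_documentation_kept: bool
    project_summary_published: bool

    def all_conditions_met(self):
        # Every condition must hold before personal data collected for
        # other purposes may be reused in the sandbox.
        return all(getattr(self, f.name) for f in fields(self))
```

What the structure conveys is that the conditions are conjunctive: failing any single one means the personal data can't be reused in the sandbox.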

The Act's risk-based approach to AI systems and stringent conditions for personal data use, together with the pre-existing protections of the GDPR (discussed below), create an accountable and navigable path for AI providers while addressing privacy concerns.

Personal Information Under the GDPR

Perhaps the most widely known data protection law is the GDPR, a relatively strict EU law that protects personal data and privacy. (It went into effect in May 2018.) While the law applies in EU member states (and reaches some companies outside the EU that serve people there), many countries have also used it as a model and put similar regulations into place.

The GDPR applies to most businesses that process personal data. Under the GDPR, companies can collect and process personal data only under limited circumstances and have to follow strict protocols for collecting, storing, and processing that data.

The GDPR allows companies to process—for example, collect, record, store, organize, or use—personal data only if one of the following is true:

  • the person (known as the "data subject") gives their consent
  • the company is executing a contract that the data subject is a party to (for example, a background check agreement)
  • the company has to process the data to comply with legal obligations (like a court order)
  • the company needs to process the data to protect someone's life
  • the company needs to process the data to perform a task that's in the public interest, or
  • the company has a legitimate interest to process the data.

Most of these situations are relatively easy to identify. Proving that you have a legitimate interest in processing personal data, however, is trickier. To determine whether you have a legitimate interest, you must (as sketched after the citation below):

  • show the interest is a legitimate one
  • prove that you need to process the personal information to achieve that interest (in other words, there's no better way to achieve the interest), and
  • weigh that legitimate interest (like fraud prevention or IT security) against the data subject's interests, rights, and freedoms.

(Art. 6 of the GDPR (2025).)
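
As a rough illustration, the six lawful bases and the three-part legitimate-interest test can be sketched in Python as follows. The names are invented for the example, and an actual legitimate interest assessment is a documented legal analysis rather than three yes/no inputs.

```python
from enum import Enum, auto

class LawfulBasis(Enum):
    # The six lawful bases for processing under Art. 6 of the GDPR.
    CONSENT = auto()
    CONTRACT = auto()
    LEGAL_OBLIGATION = auto()
    VITAL_INTERESTS = auto()
    PUBLIC_TASK = auto()
    LEGITIMATE_INTERESTS = auto()

def legitimate_interest_justified(interest_is_legitimate,
                                  processing_is_necessary,
                                  interest_outweighs_subject_rights):
    """Simplified three-part test: purpose, necessity, balancing."""
    return (interest_is_legitimate
            and processing_is_necessary
            and interest_outweighs_subject_rights)

# Example: fraud prevention that can't be achieved less intrusively and
# doesn't override the data subject's rights and freedoms.
print(legitimate_interest_justified(True, True, True))  # True
```

The balancing prong is the one that trips companies up in practice: even a genuine interest pursued by necessary means fails if it overrides the data subject's rights.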

Whatever the justification for processing personal data, the GDPR requires companies to make sure the data is accurate, up to date, and secure. Companies also need to be transparent with the data subject about the processing of their personal data. For example, the company must let the person know generally why their personal data was collected and for how long their personal information will be stored.

Some EU member countries have already taken action against AI companies to enforce consumer rights under the GDPR. For example, Italy temporarily banned ChatGPT in March 2023 over concerns about the chatbot's potential GDPR violations. Italy took issue with how OpenAI was collecting training data from Italian consumers and with the possibility that inappropriate content could reach underage users. OpenAI made regional changes to ChatGPT, such as verifying users' ages at sign-up and providing a way for people to remove their personal information from ChatGPT's training data. In response to OpenAI's improvements, Italy lifted its ban.

In addition, Ireland stalled Google's EU launch of Bard (the precursor to Gemini). In an attempt to comply with GDPR rules and provide more transparency, Google made various changes to Bard before launch, including requiring users to create a Google account and confirm that they're 18 years of age or older.

The United Kingdom, on the other hand, is taking a more relaxed approach to AI regulation. Even though the UK is no longer an EU member state, it has incorporated the GDPR into its domestic law through the Data Protection Act 2018. The UK has said that it doesn't plan to create new data privacy laws geared toward AI but will instead give voluntary guidance on existing laws. For example, the UK Information Commissioner's Office has provided companies with best practices and principles to consider when adopting AI within their industries.

What's Next for AI Data Privacy?

In the U.S., states appear to be taking the lead in protecting consumer data. Bills addressing AI and privacy concerns have stalled in Congress, and the federal government is moving at a relatively slow pace to regulate AI. The FTC's investigation into OpenAI has also been slow to produce findings or changes. Still, the advancement of AI regulation outside the U.S. and the introduction of AI privacy bills in Congress could be encouraging signs for the years ahead.

If lawmakers decide to get serious about the issue, they could create data protection laws that provide ways for Americans to better control their personal information. If the U.S. and other countries follow the EU's lead, companies will have to reconsider how they use personal information to train AI.
