Data privacy is one of the biggest concerns with emerging artificial intelligence technology. The most prominent AI models—large language models (LLMs) like OpenAI's ChatGPT, Google's Gemini, and Meta's Llama—are trained on vast quantities of data. The more data these models consume, the better they become at simulating human thought and conversation.
People naturally wonder how much data these AI models have access to. How much should they have access to? And what are the risks if they have our personal information?
Lawmakers in and outside the U.S. have started to limit how the data used to train AI, and in particular personal information, is collected, stored, processed, and delivered.
AI models are trained on large swaths of text, much of it from websites, books, and newspapers. But where exactly does that data come from? It's a straightforward question that usually gets a less-than-forthcoming answer. Generally, the owners of popular LLMs vaguely declare that their data comes from public sources. For example:
Data gathered from public sources usually contains personal information like names, email addresses, and birthdates. This information can be taken from databases, articles, blogs, forums, and social media. And the people whose personal data is being fed to AI models often don't know that what they've shared online is being used in these training sets. Again, AI developers haven't given details about what kinds of personal data have been collected and from whom.
Should the fact that data is publicly available mean that anyone is allowed to use it for any reason? Many say no, worrying that training data can and inevitably will be revealed to anyone who asks the AI the right questions.
Data privacy is an area of law that relates to how companies access our personal information. Lawmakers try to protect people's personal information through consumer protection and privacy laws. The idea is to stop businesses from using consumer data in ways that would be unfair, deceptive, or harmful.
In data privacy laws, "personal information" or "personal data" usually refers to information that can directly or indirectly identify someone. Typically, personal information or data can include:
As of October 2025, Congress has yet to pass a comprehensive data privacy law. Instead, U.S. residents must rely primarily on existing consumer protection laws to protect their personal information. Additionally, people in select states can rely on new and emerging state laws specifically tailored to AI data use (as discussed later).
In the U.S., the Federal Trade Commission (FTC) is tasked with protecting consumers' privacy and security. Under Section 5 of the FTC Act, the FTC is responsible for preventing people and businesses from using "unfair or deceptive acts or practices" while conducting business in the U.S. (15 U.S.C. § 45 (2025).)
The FTC can specify the kinds of business practices that are considered unfair or deceptive, such as the unreasonable collection and processing of personal information. The federal agency can also launch investigations and charge businesses with violating consumer protection laws. Businesses that violate the law can be forced to pay civil penalties and restitution to consumers.
In July 2023, the FTC opened an investigation into OpenAI. In its letter to the company, the FTC said it was investigating whether, through the use of its LLMs (like ChatGPT), OpenAI has violated Section 5 of the FTC Act.
The federal agency demanded that OpenAI provide information about its LLMs to see whether the tech company has engaged in unfair or deceptive practices with regard to:
The FTC is particularly interested in the use of personal data to train ChatGPT and ChatGPT's ability to generate statements containing people's personal information. The agency asked OpenAI whether the company had taken any steps to address risks related to ChatGPT generating statements with actual personal information.
The agency appears to be looking for information about how OpenAI collects, processes, and generates personal data. The inquiry should reveal what efforts the company has made to protect personal data and whether those efforts adequately protect consumers.
In October 2024, the Electronic Privacy Information Center (EPIC) filed its own complaint with the FTC about OpenAI. EPIC is a public interest research center that advocates for the protection of consumers' privacy rights. In its complaint, EPIC urges the FTC to investigate OpenAI for developing and deploying its AI systems in ways that fall short of public policy standards. The complaint specifically takes issue with OpenAI's "unprecedented webscraping," which gave the company access to millions of consumer data points that included personal information.
As of October 2025, the FTC's investigation into OpenAI for violations of Section 5 of the FTC Act remains open.
In a February 2024 blog post, the FTC cautioned companies against altering their privacy and data security policies to get further access to existing consumer data. The post noted that companies developing AI products depend on large amounts of data to train their AI. As a result, these companies might be tempted to tap into the data that consumers have already provided them.
Unfortunately for these companies, they usually have privacy policies that limit how they can use their consumers' data. Typically, the privacy policy doesn't leave room for companies to use consumer data to train and develop AI products. The FTC warns that the companies can't covertly change their privacy policy to retroactively grant them access to data consumers have already provided them. When the consumers provided the data, they didn't consent to their data being used in ways that weren't outlined in the privacy policy—for example, to train AI systems.
If a company changes its privacy policy to use previously unusable consumer data, it could be engaging in unfair or deceptive practices. In that case, the company would have violated the FTC Act. In the past, the FTC has pursued other companies that have rewritten their data practices to stealthily use customers' data beyond what was originally agreed to.
Although it has the FTC to enforce the FTC Act, the U.S. doesn't have a comprehensive data privacy law. But some lawmakers are trying to change that.
In June 2022, the House Energy and Commerce Committee introduced the American Data Privacy and Protection Act. However, Congress never acted on the bill. In a subsequent attempt, legislators proposed the American Privacy Rights Act in April 2024. But after some controversial revisions, many privacy and civil liberties organizations, including the American Civil Liberties Union (ACLU), withdrew their support. Consequently, the bill never progressed.
As of October 2025, the United States still lacks comprehensive federal data privacy legislation.
Whereas federal laws apply throughout the country, state laws apply only to businesses that operate within the state and to consumers who reside there. States have approached data privacy in varying ways. Some have no consumer data privacy laws. A handful have comprehensive privacy laws.
For example, California has the California Consumer Privacy Act of 2018 (CCPA), one of the most protective state measures for consumer privacy. It was amended by a ballot proposition (called the "California Privacy Rights Act") in 2020, with the amendments taking effect on January 1, 2023. The CCPA includes the rights to:
(Cal. Civ. Code §§ 1798.140 and following (2025).)
Moreover, the CCPA's definition of "personal information" extends to personal information held in AI systems that are capable of outputting it.
As of October 2025, 19 states have passed comprehensive data privacy laws.
Despite California's stricter regulations and the FTC's investigation into OpenAI, the U.S. is generally considered to be behind other nations when it comes to consumer protection and data privacy laws.
The European Union (EU) has some of the most comprehensive and protective data privacy laws in the world. The EU AI Act and the General Data Protection Regulation (GDPR) are the two key laws that govern how AI companies treat user data.
The Council of the EU gave final approval to the EU AI Act in May 2024, and the law is being implemented in stages through August 2, 2027. The EU AI Act is the first comprehensive AI law in the world. The Act has wide applicability beyond data privacy and personal data protection, but for this article, we'll focus on its effects on data protection.
For some background, the EU AI Act separates AI systems into four categories based on risk level:
The EU AI Act also imposes special requirements on general-purpose AI (GPAI). GPAI systems can perform a wide range of tasks and are based on GPAI models; they include ChatGPT, Gemini, and other LLMs used by the public for general purposes. Most GPAI providers must document their training and testing processes and publish a summary of the content the GPAI model was trained on.
To facilitate and encourage the development of AI systems, the Act requires EU Member States to establish at least one AI regulatory sandbox by August 2, 2026. An "AI regulatory sandbox" is a controlled framework where AI providers can develop, train, test, and validate AI systems under regulatory supervision before the system is released.
AI providers can use personal data that was lawfully collected for other purposes to develop, test, or train their AI systems in an AI regulatory sandbox as long as all of the following conditions are met:
(Art. 59 of the EU AI Act (2025).)
The Act's risk-based approach to AI systems and stringent conditions for personal data use, together with the pre-existing protections of the GDPR (discussed below), create an accountable and navigable path for AI providers while addressing privacy concerns.
Perhaps the most widely known data protection law is the GDPR. The GDPR is a relatively strict EU law that protects personal data and privacy. (It went into effect in May 2018.) While the law protects people in the EU, many countries have used it as a model and put similar regulations into place.
The GDPR applies to most businesses that process personal data. Under the GDPR, companies can collect and process personal data only under limited circumstances and have to follow strict protocols for collecting, storing, and processing that data.
The GDPR allows companies to process—for example, collect, record, store, organize, or use—personal data only if one of the following is true:
Most of these situations are relatively easy to identify, except the last one: proving that you have a legitimate interest in processing personal data is tricky. To determine whether you have a legitimate interest, you must:
(Art. 6 of the GDPR (2025).)
Whatever the justification for processing personal data, the GDPR requires companies to make sure the data is accurate, up to date, and secure. Companies also need to be transparent with the data subject about the processing of their personal data. For example, the company must let the person know generally why their personal data was collected and for how long their personal information will be stored.
Some EU member countries have already taken action against AI companies to enforce consumer rights under the GDPR. For example, Italy temporarily banned ChatGPT in March 2023 due to concerns about the chatbot's potential GDPR violations. Italy took issue with how OpenAI was collecting its training data from Italian consumers and how inappropriate data could reach underage users. OpenAI made regional changes to ChatGPT, such as verifying users' ages when they sign up and providing a way for people to remove their personal information from ChatGPT's training data. In response to OpenAI's improvements, Italy lifted its ban.
In addition, Ireland stalled Google's EU launch of Bard (the precursor to Gemini). In an attempt to comply with GDPR rules and to provide more transparency, Google made various changes to Bard before launch, including requiring users to create a Google account to use Bard and to confirm that they're 18 years of age or older.
The United Kingdom, on the other hand, is taking a more relaxed approach to AI regulation. Even though the UK is no longer an EU member state, it incorporates the GDPR into its Data Protection Act. The UK has said that it doesn't plan to create new data privacy laws geared toward AI but will give voluntary guidance on existing laws. For example, the UK Information Commissioner's Office has provided companies with best practices and principles to consider when adopting AI within their industry.
In the U.S., it appears that states are taking the lead in protecting consumer data. Bills addressing AI and privacy concerns have stalled in Congress, and the federal government is moving at a relatively slow pace to regulate AI. Additionally, the FTC's investigation into OpenAI has been slow to produce findings or changes. Still, the advancement of AI regulations outside the U.S. and the introduction of AI bills in Congress could be encouraging signs for the years ahead.
If lawmakers decide to get serious about the issue, they could create data protection laws that provide ways for Americans to better control their personal information. If the U.S. and other countries follow the EU's lead, companies will have to reconsider how they use personal information to train AI.