DeepSeek’s latest language model, R1, has caught attention for performing on par with OpenAI’s o1 despite being developed with significantly fewer resources. This unexpected competition has fueled discussions about data usage and the legal regulation of large language models (LLMs). Recent legal scrutiny, particularly from European data protection authorities, raises the question of whether legal hurdles could slow advancements in AI.

What Happened?

DeepSeek has found itself under regulatory scrutiny in Europe. On January 28, 2025, Italy’s data protection authority (the Garante) requested details on DeepSeek’s data processing practices, legal justifications, and storage locations, giving the company 20 days to respond.[1] Shortly after, Ireland’s Data Protection Commission (DPC) and France’s CNIL launched their own inquiries.[2] By January 30, the Garante had determined that DeepSeek’s responses were insufficient and ordered the company to stop processing Italian users’ data, which led to DeepSeek’s removal from Italian app stores. A similar ban was imposed on ChatGPT in 2023 and later lifted after OpenAI made compliance commitments. OpenAI was also fined €15 million in December 2024 and ordered to run a public awareness campaign.[3]

Meanwhile, intellectual property disputes continue to heat up. OpenAI and other AI firms face legal battles worldwide, with recent lawsuits from The New York Times and India’s ANI news agency over unauthorized content use.[4] OpenAI has also accused DeepSeek of unfair competition, claiming it used OpenAI’s models to train its own system.[5]

What Do Legal Regulations Say?

There is no single legal framework governing LLM data usage. Instead, different regulations apply depending on the type of data being processed. However, two key areas of concern stand out: data protection and intellectual property rights.

Under the GDPR, which EU authorities reference in their investigations, companies must provide legal justifications for processing personal data. The U.S., in contrast, lacks a unified federal data protection law, making commercial data usage more flexible. When it comes to intellectual property, laws in the U.S. and EU share core similarities, but the U.S. allows broader “fair use” exceptions.[6]

Can Publicly Available Data Be Freely Used?

AI models require massive datasets, often sourced from publicly available online content. However, just because data is accessible does not mean it is free to use. Under both data protection and copyright laws, public availability does not eliminate legal restrictions.[7] For example, a publicly shared phone number cannot be repurposed for marketing, nor can a social media photo be used for AI training, without a valid legal basis. The GDPR requires a “legitimate interest” analysis before publicly available personal data may be processed on that ground.[8]

Protecting Data in the AI Race

These developments highlight that AI competition is not just about technical advancement but also about control of data. Intellectual property regulation likewise plays a critical role in shaping the AI landscape. Stricter IP laws could help create a more ethical environment by ensuring that AI models do not exploit copyrighted content without authorization. On the other hand, excessive legal barriers might stifle innovation, making it harder for new competitors to enter the market and potentially consolidating power within a few dominant players. The challenge for regulators is to strike a balance: one that fosters both technological progress and ethical responsibility in the AI race.


[1] Garante per la protezione dei dati personali, ‘Garante Privacy Requests Information from DeepSeek AI on Data Processing’ (28 January 2025) <https://www.garanteprivacy.it/web/guest/home/docweb/-/docweb-display/docweb/10096856#english> accessed 28 February 2025.

[2] Mathieu Rosemain, ‘French Privacy Watchdog to Quiz DeepSeek AI on Data Protection’ (Reuters, 30 January 2025) <https://www.reuters.com/technology/artificial-intelligence/french-privacy-watchdog-quiz-deepseek-ai-data-protection-2025-01-30/> accessed 28 February 2025; Natasha Lomas, ‘Italy Sends First Data Watchdog Request to DeepSeek: “The Data of Millions of Italians Is at Risk”’ (TechCrunch, 29 January 2025) <https://techcrunch.com/2025/01/29/italy-sends-first-data-watchdog-request-to-deepseek-the-data-of-millions-of-italians-is-at-risk/> accessed 28 February 2025.

[3] Garante per la protezione dei dati personali, ‘Garante Privacy Requests Information from DeepSeek AI on Data Processing’ (28 January 2025) <https://www.garanteprivacy.it/web/guest/home/docweb/-/docweb-display/docweb/10096856#english> accessed 28 February 2025.

[4] Aditya Kalra, ‘Indian News Agency ANI Sues OpenAI for Unsanctioned Content Use in AI Training’ (Reuters, 19 November 2024) <https://www.reuters.com/technology/artificial-intelligence/indian-news-agency-ani-sues-openai-unsanctioned-content-use-ai-training-2024-11-19/> accessed 28 February 2025.

[5] DeepSeek, ‘Privacy Policy’ (DeepSeek) <https://cdn.deepseek.com/policies/en-US/deepseek-privacy-policy.html> accessed 28 February 2025.

[6] Decision Foundry, ‘Complete Guide to US Data Protection Laws’ (Decision Foundry) <https://www.decisionfoundry.com/data/articles/complete-guide-us-data-protection-laws/> accessed 28 February 2025.

[7] Aditya Kalra, ‘Indian News Agency ANI Sues OpenAI for Unsanctioned Content Use in AI Training’ (Reuters, 19 November 2024) <https://www.reuters.com/technology/artificial-intelligence/indian-news-agency-ani-sues-openai-unsanctioned-content-use-ai-training-2024-11-19/> accessed 28 February 2025.

[8] European Parliamentary Research Service, The Impact of the General Data Protection Regulation (GDPR) on Artificial Intelligence (European Parliament, 2020) <https://www.europarl.europa.eu/RegData/etudes/STUD/2020/641530/EPRS_STU(2020)641530_EN.pdf> accessed 28 February 2025.