GitHub's new Section J: your Copilot data trains their models unless you say no

GitHub added a new Section J to its Terms of Service this week. It says that starting April 24, inputs, outputs, code snippets, and associated context from Copilot Free, Pro, and Pro+ users will be used to train AI models, unless those users manually opt out.

Notice the carve-out: Business and Enterprise customers are excluded. So are students and teachers accessing Copilot Pro for free. If your GitHub account is a member of, or an outside collaborator on, a paid organization, your interaction data is excluded from training too, even if you're on a personal Copilot plan. The protection follows the org relationship, not just the subscription tier.
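For readers who want the carve-outs as a checklist, here is a minimal Python sketch of the rules as described above. It is an illustration of this article's reading of the policy, not GitHub's implementation: the `Account` type, its field names, and the plan labels are all hypothetical and do not correspond to any GitHub API.

```python
from dataclasses import dataclass

# Hypothetical plan labels for this sketch; these are not GitHub API values.
INDIVIDUAL_PLANS = {"free", "pro", "pro+"}
ORG_PLANS = {"business", "enterprise"}


@dataclass
class Account:
    plan: str                         # one of the labels above
    free_education_pro: bool = False  # student or teacher getting Copilot Pro for free
    in_paid_org: bool = False         # member or outside collaborator of a paid organization
    opted_out: bool = False           # training toggle turned off in settings


def used_for_training(account: Account) -> bool:
    """Return True if, under the rules as described in this article,
    the account's Copilot interaction data would be used for training."""
    if account.plan in ORG_PLANS:
        return False  # Business and Enterprise are excluded
    if account.free_education_pro:
        return False  # students and teachers on free Pro are excluded
    if account.in_paid_org:
        return False  # protection follows the org relationship
    if account.opted_out:
        return False  # the user turned the setting off
    return account.plan in INDIVIDUAL_PLANS
```

Under these assumptions, `used_for_training(Account("pro"))` is true, while the same account with `in_paid_org=True` comes back false, which is the point of the paragraph above: the relationship to a paid organization matters more than the plan name.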

The people who pay the most get contractual data protections. Everyone else gets a default toggle flipped to "on" and 30 days' notice.

What GitHub is actually collecting

The blog post from Mario Rodriguez, GitHub's Chief Product Officer, lays out the scope. The interaction data collected includes prompts sent to Copilot, generated suggestions, accepted or modified outputs, code context around your cursor, comments and documentation, file names, repository structure, navigation patterns, and thumbs up/down feedback.

This is not metadata. This is the substance of how developers write code.

Here's the part that redefines "private": if you have model training enabled and you're actively using Copilot in a private repository, code snippets from that repo can be collected during your session. GitHub's FAQ clarifies they don't pull code from private repos "at rest," but that distinction matters less than it sounds. If Copilot is running while you code, your private repo content is in play.

The legal scaffolding

The updated Terms of Service are doing real structural work here. The new Section J consolidates all AI-related terms and creates an explicit license grant: unless you opt out, you're granting GitHub and its affiliates (read: Microsoft) the right to collect and use your inputs and outputs to develop, train, and improve AI models.

The Privacy Statement update is equally pointed. For users in the EEA and UK, GitHub is claiming "legitimate interest" as the lawful basis for processing data for AI development. That's a GDPR mechanism that lets companies process personal data without explicit consent, provided that interest is not outweighed by the user's fundamental rights and freedoms. It's legal. It's also the kind of basis that data protection authorities tend to scrutinize when it involves large-scale data collection from millions of users.

GitHub also expanded data sharing with affiliates. The Privacy Statement now allows Microsoft to use shared data for "developing and improving artificial intelligence and machine learning technologies." The company says third-party AI model providers don't get access. But Microsoft isn't a third party here. Microsoft is the parent company that owns GitHub and builds the models Copilot runs on.

The opt-out and its limits

To disable training, go to github.com/settings/copilot/features and turn off "Allow GitHub to use my data for AI model training" under the Privacy heading. If you previously disabled the older "prompt and suggestion collection" setting, your preference carries over automatically.

The opt-out stops collection going forward. GitHub hasn't said anything about deleting data already collected before you flip the switch. That's a gap worth watching, particularly for EEA users who may have deletion rights under GDPR Article 17.

The industry precedent play

GitHub's FAQ explicitly names Anthropic, JetBrains, and Microsoft as companies with similar opt-out training policies. This is a deliberate framing: we're not doing anything unusual, the industry has already moved here.

That framing is mostly accurate. Opt-out data collection for consumer AI products has become the default playbook. But the comparison glosses over a key difference. GitHub isn't a chat assistant where users type casual prompts. It's the platform where developers store and write production code, including proprietary business logic, internal APIs, and security-sensitive implementations. The data surface area is fundamentally different.

Developer reaction so far

The community response on GitHub's own discussion thread has been overwhelmingly negative. According to The Register's reporting, the thread had collected 59 thumbs-down emoji votes against just three rocket ships by the time of publication, with no community members (other than GitHub VP of Developer Relations Martin Woodward) endorsing the change.

The topic is also trending on Hacker News, a sign of how much attention the change is drawing from developers.

What builders should do right now

First, go opt out if you don't want your code in training data. Do it today, not April 23. The setting is at github.com/settings/copilot/features under Privacy.

Second, if you're at a company where developers use personal GitHub accounts alongside org repos, check the fine print. GitHub says members and outside collaborators of paid organizations are excluded from training data collection. Verify that each developer's account actually has one of those relationships before relying on that protection.

Third, for anyone operating under GDPR, start documenting your position now. The "legitimate interest" basis GitHub is claiming will likely face challenges from European DPAs. Whether you're building on Copilot or competing with it, the regulatory outcome matters.

The policy change itself isn't surprising. Every major AI company is moving toward opt-out training on user data. What's worth paying attention to is the precision of who gets protected and who doesn't. Enterprise customers negotiated their protections into contracts. Individual developers got a blog post and a settings toggle.

Paul Menon covers AI policy for The Daily Vibe.
