GitHub will train AI on your Copilot data unless you opt out by April 24
AI · March 26, 2026 · 5 min read

By Paul Menon · AI-generated analysis · Auto-published · 2 sources cited

Starting April 24, 2026, GitHub will use "interactions with GitHub features and services, including inputs, outputs, code snippets, and associated context" from Copilot Free, Pro, and Pro+ users to train AI models. That language comes straight from GitHub's updated documentation. If you don't actively flip a toggle in your settings before that date, your data is in the training pool.

Notice the carve-out: Copilot Business and Enterprise customers are explicitly excluded. GitHub's docs state that "Copilot Business or Copilot Enterprise customer data" is "protected under GitHub's Data Protection Agreement, which prohibits such use without customer authorization." If you're paying $10/month for Pro, your code interactions feed the model. If your company is paying $19 per seat for Business, they don't. Nine dollars a month separates two very different sets of data rights.

What "interaction data" actually means

GitHub's documentation defines the scope broadly: inputs, outputs, code snippets, and associated context. That covers your prompts to Copilot Chat, the code suggestions Copilot generates for you, the code you were writing when those suggestions appeared, and whatever surrounding context the system used to produce them.

This is not just your public repositories. This is your active development workflow, the questions you ask an AI coding assistant, the code you accept and reject, and the patterns of how you write software. For developers working on proprietary projects using personal Copilot subscriptions, this should raise immediate questions about what ends up in the training set.

GitHub frames the change as building "more intelligent, context-aware coding assistance based on real-world development patterns." That's a reasonable technical goal. But the mechanism for achieving it, opt-out rather than opt-in, is the policy choice worth examining.

The opt-out problem

The opt-out toggle lives in your Copilot settings on GitHub.com. Click your profile picture, go to Copilot settings, and look for the model training section. GitHub says you can "opt-out from allowing your data to be used for training in your personal settings."

The problem is behavioral, not technical. Research on default settings consistently shows that the vast majority of users never change defaults. When the EU's GDPR required opt-in consent for data processing, the number of people "agreeing" to data collection dropped dramatically compared to opt-out regimes. GitHub knows this. Every product team at every major tech company knows this. The default is the decision for most users.

GitHub Copilot Free launched in late 2024 and opened the service to anyone with a GitHub account. That means a massive pool of individual developers, many of them students, hobbyists, and early-career engineers, will have their interaction data swept into training without ever knowing the policy changed.

Why Business and Enterprise get different rules

The Business/Enterprise exemption is the tell. Companies with legal teams negotiate data protection agreements. They have procurement processes that flag training-data clauses. Individual developers on Free, Pro, and even Pro+ (GitHub's premium tier at $39/month) don't have that leverage.

This creates a two-tier data rights structure. Enterprise customers get contractual protection against training-data usage by default. Individual users, including those paying up to $39/month, get an opt-out buried in settings. Microsoft, GitHub's parent company, is making a calculated bet that individual developers won't push back the way corporate legal departments would.

It's worth noting this follows a pattern across the AI industry. OpenAI trains on consumer ChatGPT conversations unless users opt out. Meta trained its AI models on public Instagram and Facebook posts. The playbook is familiar: give individuals a free or cheap tier, set the default to data collection, and let inertia do the work.

The regulatory angle

Under the EU AI Act, whose obligations have been phasing in since 2025, AI systems must meet transparency requirements, and training data practices fall under scrutiny for high-risk systems. Whether a code-completion tool qualifies as high-risk is debatable, but the Act's general-purpose AI model provisions in Article 53 require providers to document training data, including a "sufficiently detailed summary" of the content used.

GitHub's opt-out approach may satisfy the letter of GDPR's legitimate interest basis for processing, but it will draw attention from European data protection authorities who have already shown willingness to challenge big tech on consent mechanisms. The Irish Data Protection Commission, which oversees Microsoft's EU operations, has not been shy about enforcement actions.

In the US, there's no federal equivalent. California's CCPA gives residents the right to opt out of the sale of personal information, but whether training-data usage constitutes a "sale" under CCPA remains legally untested for AI-specific contexts.

What builders should do now

If you're a developer on Copilot Free, Pro, or Pro+, go to github.com/settings/copilot and opt out before April 24, 2026. It takes about 30 seconds.

If you're a founder or engineering lead with team members using personal Copilot subscriptions for work, this is your prompt to move to Copilot Business. The data protection agreement that comes with Business is the only contractual guarantee that your team's interaction data stays out of training sets.
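For engineering leads who want to verify that coverage, GitHub's REST API exposes Copilot seat assignments at the organization level. The sketch below lists which members hold an org-assigned seat, assuming a token with Copilot billing read access; ORG and the environment variable name are placeholders, not details from GitHub's announcement.

```python
# Sketch: list org members holding Copilot Business seats, so you can spot
# developers who may still be working on personal (Free/Pro/Pro+) plans.
# Assumes a token with Copilot billing read access (e.g. manage_billing:copilot).
import os

import requests

ORG = "your-org"  # placeholder
HEADERS = {
    "Accept": "application/vnd.github+json",
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "X-GitHub-Api-Version": "2022-11-28",
}


def copilot_business_seats(org: str) -> list[str]:
    """Return logins of members with an org-assigned Copilot seat."""
    logins, page = [], 1
    while True:
        resp = requests.get(
            f"https://api.github.com/orgs/{org}/copilot/billing/seats",
            headers=HEADERS,
            params={"per_page": 100, "page": page},
        )
        resp.raise_for_status()
        seats = resp.json().get("seats", [])
        if not seats:
            break
        logins.extend(seat["assignee"]["login"] for seat in seats)
        page += 1
    return logins


if __name__ == "__main__":
    covered = copilot_business_seats(ORG)
    print(f"{len(covered)} members covered by Copilot Business in {ORG}")
```

Anyone writing company code who doesn't appear in that list is a candidate for a personal subscription, and therefore for the training-data default.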

If you're a general counsel reviewing AI tool usage, add Copilot's individual plans to your shadow-IT audit list. Developers using personal accounts on company hardware may be feeding proprietary code patterns into GitHub's training pipeline without anyone in legal knowing about it.
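As a starting point for that audit, here is a rough sketch that checks a single workstation for common local traces of a Copilot install. The paths are assumptions based on typical macOS/Linux layouts for Copilot's auth config and the VS Code extension; Windows and JetBrains setups keep these elsewhere, so treat it as illustrative rather than exhaustive.

```python
# Sketch: flag local artifacts that suggest Copilot is installed on this
# machine. Paths below are common macOS/Linux locations and may vary by
# OS and editor version; this is an audit starting point, not a proof.
from pathlib import Path

HOME = Path.home()


def copilot_traces() -> list[Path]:
    hits = []
    # Copilot's editor/CLI auth config (holds the signed-in account).
    auth_dir = HOME / ".config" / "github-copilot"
    if auth_dir.exists():
        hits.append(auth_dir)
    # Any installed github.copilot-* VS Code extension counts as a hit.
    ext_dir = HOME / ".vscode" / "extensions"
    if ext_dir.is_dir():
        hits.extend(sorted(ext_dir.glob("github.copilot*")))
    return hits


if __name__ == "__main__":
    traces = copilot_traces()
    if traces:
        print("Possible Copilot install detected:")
        for t in traces:
            print(f"  {t}")
    else:
        print("No obvious Copilot artifacts found.")
```

A hit doesn't tell you whether the account behind it is a personal plan or a Business seat; cross-referencing against the seat list above closes that gap.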

And if you're a policymaker watching the AI training-data debate, this is a clean case study in why opt-out defaults for training data don't produce meaningful consent. The people most affected, individual developers, are the least likely to know about the change and the least equipped to evaluate its implications. Good governance means the default protects the user, not the model.

Paul Menon covers AI policy for The Daily Vibe.

This article was AI-generated. Learn more about our editorial standards.
