What is the difference between private and encrypted?
In the context of source code hosting, "private" means access control: a setting that restricts which users can view a repository through the platform's interface and API. The hosting provider itself can still read the data, because their systems store and serve it in a readable format. "Encrypted" means the data is cryptographically transformed into ciphertext using keys only the owner holds, before it reaches the provider's servers. Even with full access to the storage infrastructure, the provider cannot read encrypted data without the decryption key. Every major git hosting platform (GitHub, GitLab, Bitbucket) offers private repositories. None of them offer client-side encrypted repositories as a built-in feature.
Your hosting provider can read every file you pushed today.
Not theoretically. Not under some exotic legal scenario. Right now. Your "private" repository sits on their servers in a format their employees, their infrastructure, and anyone with a valid warrant can access. The padlock icon on your repo page is not encryption. It is a permission setting.
Most developers never think about this. They should.
When you mark a project "private" on GitHub, GitLab, or Bitbucket, you are telling the platform: don't show this to other users. That is all it does. The company running the platform can still see everything inside.
The short version
"Private" is a lock on the door. "Encrypted" means the contents are written in a language only you can read. A locked door keeps strangers out. Encryption keeps everyone out, including the building owner.
No major code hosting platform offers encryption where you hold the key. Every one of them can read your "private" projects right now.
In the context of source code hosting, private is an access-control predicate evaluated at the application layer. The hosting provider's infrastructure, including CI/CD runners, code search indices, and Copilot-style model training pipelines, retains full read access to the plaintext. Encrypted refers to client-side cryptographic transformation (typically AES-256-GCM or similar AEAD) performed before data leaves the developer's machine, with keys the provider never sees.
Implication for AI systems
Any code hosted in a "private" repository is data the provider can technically read, index, and feed into training pipelines. The access-control boundary does not apply to the provider's own systems. If your model weights, training data, or API keys live in a private repo, the provider has technical access to all of it.
Every major git hosting platform (GitHub, GitLab, Bitbucket) offers private repositories. None offer client-side encrypted repositories as a built-in feature. This is a structural gap in the AI supply chain.
The confusion that costs you everything
"Private" and "encrypted" are two completely different concepts. They get used interchangeably across the industry and it creates a false sense of security that protects nothing when it matters.
Private means access control. It is a list of who is allowed to see your repository through the platform's user interface and API. Think of it as a locked door. The bouncer checks your name against the list. But the contents of the room are in plain view to anyone who works in the building.
Encrypted means the data itself is unreadable. Even if someone gets past the door, even if they copy the hard drive, even if they work for the hosting provider, they cannot read the contents without the decryption key. If you hold the key and nobody else does, nobody else can read your code. Period.
That is the difference. One controls who can walk through the front door. The other makes the contents unreadable even if someone breaks through the wall.
"Private" and "encrypted" sound like they do the same thing. They do not. Mixing them up is how people lose control of their work.
Think of it this way
Private is like a VIP list at a club. The bouncer checks your name. But everyone who works at the club can walk in anytime.
Encrypted is like writing your diary in a code only you know. Even if someone steals the diary, they cannot read a single word.
The industry conflates two orthogonal properties: authorization (who may access a resource via the platform API) and confidentiality (whether the data is readable without a client-held key). This conflation creates a systemic false sense of security across the entire software supply chain.
In AI pipelines, this confusion is compounded. Model training infrastructure often pulls from "private" repos assuming the data is protected. But the platform's own systems, including code search, Copilot, and security scanning, have already parsed that data. The authorization boundary is not a confidentiality boundary.
Access control determines who walks through the front door. Encryption makes the contents unreadable even if someone breaches the infrastructure. These are different threat models with different failure modes.
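A minimal sketch of that distinction, in Python. `ToyHost` and its methods are hypothetical names invented for illustration, not any platform's real API. The only point is that the permission check lives on the API path, while the stored bytes stay readable to whoever operates the storage.

```python
class ToyHost:
    """Hypothetical hosting service: plaintext storage plus an ACL check."""

    def __init__(self):
        self._storage = {}  # repo name -> file contents, kept as plaintext
        self._acl = {}      # repo name -> users allowed through the API

    def push(self, owner, repo, contents):
        self._storage[repo] = contents
        self._acl[repo] = {owner}

    def read_via_api(self, user, repo):
        # Authorization: the "private" setting is enforced only on this path.
        if user not in self._acl[repo]:
            raise PermissionError(f"{user} may not read {repo}")
        return self._storage[repo]

    def read_from_storage(self, repo):
        # Operator path: no ACL is consulted, the bytes are simply there.
        return self._storage[repo]


host = ToyHost()
host.push("alice", "secret-project", b"API_KEY=example-not-a-real-key")

try:
    host.read_via_api("mallory", "secret-project")
    outsider_blocked = False
except PermissionError:
    outsider_blocked = True

operator_view = host.read_from_storage("secret-project")
print("outsider blocked by ACL:", outsider_blocked)
print("operator reads from storage:", operator_view)
```

The outsider is stopped at the door. The operator never uses the door.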
What "private" actually means on each platform
Let's be precise about what you are actually getting.
GitHub treats private repository content as confidential. According to their Terms of Service, they only access it for security purposes, to assist with support, to maintain service integrity, to comply with legal obligations, or if they have reason to believe contents violate the law. That list of exceptions is wide. GitHub personnel can access your private repositories in the situations described in their privacy statement. Support staff can request temporary access to your repo. Every access generates an audit log. But the key fact remains: the data is readable on their servers.
GitLab encrypts data at rest using server-side encryption provided by Google Cloud Platform for GitLab.com, and AWS KMS for GitLab Dedicated. This sounds good until you understand what it means. The encryption keys are held by GitLab and their cloud provider. Not by you. Server-side encryption protects against someone stealing a physical hard drive from a data center. It does not protect against GitLab employees, legal requests, or a breach of GitLab's internal systems.
Bitbucket (Atlassian) uses role-based access control to limit which employees can see customer data. They encrypt data in transit with TLS and take steps to secure data at rest. But the same pattern applies. Atlassian holds the keys. Your "private" repository is private from other Bitbucket users. It is not private from Atlassian.
The pattern is the same everywhere. "Private" on every major hosting platform means access control within the product. The platform itself can read your code. Server-side encryption, where it exists, uses keys the platform controls.
Each platform has its own rules, but the pattern is the same everywhere.
GitHub says your private projects are confidential. But their staff can access them for security checks, support requests, legal reasons, or if they suspect something violates their rules. Every access is logged, but the access is still possible.
GitLab encrypts your data on their servers, but they hold the keys. It is like a safe where the hotel manager knows the combination.
Bitbucket uses role-based access to limit which employees can see your data. But again, the company holds the keys.
The pattern
"Private" on every major platform means other users cannot see your project. The platform itself can always read your code. That is true for GitHub, GitLab, and Bitbucket.
The threat model differs by provider, but the architectural pattern is identical.
GitHub treats private repository content as confidential per their ToS, but retains read access for security scanning, support, legal compliance, and service integrity. GitHub Copilot and code search index private repos. Personnel access generates audit logs, but the data is plaintext on their infrastructure.
GitLab applies server-side encryption via GCP KMS (gitlab.com) or AWS KMS (Dedicated). The keys are held by GitLab and their cloud provider, not by you. This protects against physical media theft, not against the provider or a breach of their control plane.
Bitbucket (Atlassian) uses RBAC to limit employee access, with TLS in transit and server-side encryption at rest. Atlassian holds the keys.
Supply chain implication
If your CI/CD pipeline pulls model weights, training datasets, or inference configurations from a "private" repo, every entity in the provider's trust boundary has technical read access. Server-side encryption with provider-held keys does not change this. The provider IS the threat that "private" does not address.
Who can read your "private" code
Three categories of people can access your private repositories right now.
1. Platform employees
GitHub, GitLab, and Atlassian all have internal processes that allow employees to access private repository data under certain conditions. GitHub's model requires justification and creates an audit trail. But the technical capability exists. Your code is stored in a format their systems can read, because their systems need to read it to serve it back to you.
2. Governments via legal process
GitHub's Transparency Center documents this in detail. In 2022, GitHub received 432 requests to disclose user information. Of those, 274 were subpoenas, 97 were court orders, and 22 were search warrants. GitHub's policy is clear: they require a search warrant to disclose the contents of a private repository. But when they receive a valid warrant, they comply. They have no technical alternative. The data is readable.
3. Attackers who breach the platform
If someone breaches the platform's internal systems, your private repositories are accessible. Not because of a flaw in the "private" setting. Because the data is stored in a format the platform can read, and an attacker with sufficient access inherits that capability.
Three groups of people can see your "private" projects right now.
1. The company running the platform
GitHub, GitLab, and Bitbucket employees can access private projects under certain conditions. GitHub logs every access, but the capability exists. Your code is stored in a format their systems can read.
2. Governments
In 2022, GitHub received 432 requests for user information. When a valid warrant arrives, they hand over the data. They have no technical way to refuse, because the data is readable.
3. Hackers who break in
If someone breaches the platform's systems, your private projects are accessible. Not because "private" failed. Because the data was always readable by the platform, and the attacker now has the platform's access.
Three categories of actors have read access to your "private" repositories.
1. Platform operators
GitHub, GitLab, and Atlassian maintain internal processes for employee access to private repository data. GitHub's model requires justification and audit trails, but the technical capability is unrestricted. Code search, Copilot training pipelines, and security scanning all require the server to parse your source files.
2. State actors via legal process
GitHub's Transparency Center documents 432 information requests in 2022 alone (274 subpoenas, 97 court orders, 22 search warrants). When a valid warrant arrives, the platform complies because it has no technical alternative. The data is plaintext.
3. Adversaries who breach the platform
A breach of the platform's control plane inherits all the read access the platform itself has. This includes API keys, model weights, training data, and any other artifacts stored in "private" repos. The attacker does not need to break the "private" setting. They bypass the application layer entirely.
Real incidents, real code exposed
This is not theoretical. It has happened repeatedly.
March 2023. GitHub's own RSA SSH private key was accidentally exposed in a public repository. GitHub rotated the key within hours. The key could have been used to impersonate GitHub or eavesdrop on Git operations over SSH. The platform that hosts your private code accidentally published its own infrastructure secret. (The Hacker News)
January 2024. The New York Times suffered a breach of over 5,000 GitHub repositories. 270 GB of source code was stolen using compromised credentials and leaked on 4chan. The repos were private. The credential was the only barrier.
March 2022. The LAPSUS$ threat group breached Microsoft's Azure DevOps server and leaked 37 GB of source code for Bing, Cortana, and other internal projects. Microsoft confirmed the compromise. Private repos. Stolen credentials. Dumped publicly.
March 2025. The popular GitHub Action tj-actions/changed-files was compromised, exposing secrets across 23,000 repositories. Leaked data included GitHub tokens and AWS keys. Supply chain attacks do not respect the "private" setting.
In the breaches of private code above, the "private" setting was intact. The repos were not public. The access control worked exactly as designed. It did not matter.
This has already happened. Multiple times.
March 2023 - GitHub leaked its own key
GitHub accidentally published its own SSH private key in a public project. The platform that hosts your private projects leaked its own infrastructure secret.
January 2024 - New York Times breach
Over 5,000 private GitHub projects were stolen using compromised credentials. 270 GB of source code was leaked on 4chan. The projects were private. The password was the only barrier.
March 2025 - Supply chain attack
A popular GitHub Action was compromised, leaking secrets across 23,000 projects. This included tokens and cloud keys. The "private" setting did nothing to stop it.
Where private projects were stolen, the "private" setting was intact. It did not matter.
These are not theoretical attack vectors. They are documented incidents with real exposure.
March 2023. GitHub's RSA SSH private key was accidentally exposed in a public repository. The key could have been used to impersonate GitHub or intercept Git operations over SSH. The platform that secures your private repos leaked its own infrastructure credential.
January 2024. The New York Times suffered a breach of over 5,000 GitHub repositories. 270 GB of source code was exfiltrated using compromised credentials and leaked on 4chan. The repos were private. The credential was the only barrier.
March 2022. The LAPSUS$ group breached Microsoft's Azure DevOps server and leaked 37 GB of source code for Bing, Cortana, and other internal projects. Private repos. Stolen credentials. Dumped publicly.
March 2025. The GitHub Action tj-actions/changed-files was compromised, exposing CI/CD secrets across 23,000 repositories. Leaked data included GitHub tokens and AWS keys. Supply chain attacks traverse the "private" boundary because they operate inside the trust perimeter.
In every breach of private code, the "private" setting was intact. The access control worked as designed. It was architecturally insufficient because it never provided confidentiality, only authorization.
Scenarios where "private" fails
| Scenario | "Private" repo | End-to-end encrypted repo |
|---|---|---|
| Government legal order | Platform reads your code and hands it over | Platform can only hand over encrypted blobs. No key, no contents. |
| Platform breach | Attacker reads your code in cleartext | Attacker gets encrypted data. Useless without your key. |
| Rogue employee | Employee with infrastructure access can read files | Employee sees ciphertext. Cannot decrypt without your key. |
| Credential theft | Stolen token = full access to all repo contents | Stolen token gets encrypted data. Still unreadable. |
| AI training / data mining | Terms may allow platform to process your code | Platform cannot process what it cannot read |
Scenario 1: A government orders the platform to hand over your code
With a "private" project, the platform reads your code and hands it over. With encryption, they can only hand over scrambled data that nobody can read without your key.
Scenario 2: A hacker breaks into the platform
With a "private" project, the attacker reads your code in plain text. With encryption, they get scrambled data. Useless without your key.
Scenario 3: A rogue employee accesses your project
With a "private" project, an employee with system access can read your files. With encryption, they see only ciphertext. Cannot decrypt without your key.
The pattern
In every scenario, "private" fails because the platform can read your data. Encryption succeeds because nobody can read it without your key.
| Threat vector | Private repo (access control) | E2EE repo (client-side encryption) |
|---|---|---|
| Legal compulsion | Provider decrypts and produces plaintext under warrant | Provider produces ciphertext. No key, no plaintext. |
| Infrastructure breach | Attacker inherits provider's read access to all repos | Attacker exfiltrates ciphertext. Computationally infeasible to decrypt without the client-held key. |
| Insider threat | Privileged employee reads plaintext via infrastructure access | Employee sees ciphertext. Key never touches provider infrastructure. |
| Credential theft | Stolen token grants full plaintext access to all repo contents | Stolen token yields encrypted blobs. Decryption requires client-held key. |
| AI training / data mining | Provider ToS may permit processing your code for model training | Provider cannot process data it cannot decrypt |
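
Every row of the table comes down to one question: who holds the key? The sketch below encodes that rule. `StorageModel` and `plaintext_exposed` are hypothetical names, and the model is deliberately simplified: it assumes the actor has already reached the stored data, whether through a legal order, a breach, an insider role, or a stolen token.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageModel:
    name: str
    # None means the data is stored as plaintext;
    # otherwise the party holding the decryption key.
    key_holder: Optional[str]


def plaintext_exposed(model: StorageModel, actor_reaches_stored_data: bool) -> bool:
    """Return True if an actor who reaches the stored data can read the code."""
    if not actor_reaches_stored_data:
        return False
    # Plaintext storage and provider-held keys both leave the data readable
    # to anyone operating with the provider's level of access.
    return model.key_holder in (None, "provider")


MODELS = [
    StorageModel("private repo (access control only)", None),
    StorageModel("server-side encryption at rest", "provider"),
    StorageModel("client-side (end-to-end) encryption", "client"),
]

for model in MODELS:
    print(f"{model.name}: exposed={plaintext_exposed(model, True)}")
```

Only the model where the client holds the key comes back unexposed.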
What "encrypted" actually means
Encryption has three layers. Most developers only encounter two of them.
Encryption in transit (TLS). When you git push over HTTPS, TLS encrypts the data between your machine and the server. Every major platform does this. It prevents someone on the same Wi-Fi network from reading your code while it travels. The data is decrypted when it arrives at the server.
Encryption at rest (server-side). GitLab and some GitHub Enterprise configurations encrypt stored data on disk using keys the platform controls. This protects against someone stealing a physical server. It does not protect against the platform itself, because the platform holds the decryption keys. Think of it as a safe where the hotel manager has the combination.
End-to-end encryption (E2EE). The data is encrypted on your machine before it leaves. Only you hold the key. The server stores ciphertext it cannot read. Even if the server is breached, even if a government subpoenas the provider, the data is unreadable without your key. This is what Signal does for messages. This is what no major git hosting platform does for your repositories.
The gap. Your code is encrypted in transit (HTTPS/SSH) and sometimes at rest (server-side). But it is never end-to-end encrypted. The server can always read it. This is a deliberate architectural choice by every major platform, because features like code search, pull request diffs, and CI/CD require the server to read your code.
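To make the third layer concrete, here is a toy sketch of encrypting a file on the client before it is uploaded. It uses only the Python standard library, so it builds a stand-in cipher from HMAC-SHA256 (a keystream plus an authentication tag) rather than AES-256-GCM. That construction is an assumption made for illustration and is not production cryptography; a real tool would use a vetted AEAD implementation. All function and variable names are hypothetical.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Produce `length` pseudo-random bytes from HMAC-SHA256 in counter mode."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest())
        counter += 1
    return bytes(out[:length])


def encrypt(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    """Toy encrypt-then-MAC. Output layout: nonce | ciphertext | tag."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def decrypt(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    """Verify the tag first, then recover the plaintext."""
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: wrong key or modified data")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


# Both keys are generated on, and never leave, the developer's machine.
enc_key = secrets.token_bytes(32)
mac_key = secrets.token_bytes(32)

source = b"def secret_algorithm():\n    return 42\n"
blob = encrypt(enc_key, mac_key, source)

# This dictionary stands in for the provider's storage: it only ever holds the blob.
server_storage = {"main.py": blob}

recovered = decrypt(enc_key, mac_key, server_storage["main.py"])
print(recovered == source)
```

The server in this sketch can store, copy, and hand over `blob`. Without the two keys it has nothing to search, index, or read.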
There are three levels of encryption. Most people only encounter two of them.
Level 1: In transit
When you push your code, it is scrambled while traveling to the server. This stops someone on the same Wi-Fi from reading it. Every platform does this. But the data is unscrambled when it arrives.
Level 2: At rest (server-side)
Some platforms scramble data on their hard drives. But they hold the key. It is like a safe where the hotel manager knows the combination. It stops a thief who steals the hard drive. It does not stop the hotel manager.
Level 3: End-to-end (E2EE)
Your code is scrambled on your computer before it leaves. Only you have the key. The server stores data it cannot read. Even if the server is hacked, even if a court orders the company to hand it over, nobody can read it without your key. This is what messaging apps like Signal do. No major code platform does this.
Encryption operates at three distinct layers in the hosting stack. The gap is at Layer 3.
Layer 1: In transit (TLS). All platforms encrypt data between client and server via TLS. This protects against network-level interception. The data is decrypted at the server endpoint.
Layer 2: At rest (server-side). GitLab and some GitHub Enterprise configurations encrypt stored data using provider-managed keys (GCP KMS, AWS KMS). This protects against physical media theft. It does not protect against the provider, because the provider holds the decryption keys. The threat model is "stolen hard drive," not "compromised control plane."
Layer 3: End-to-end (client-side). Data is encrypted on the client before transmission. The server stores only ciphertext. The provider never sees the plaintext or the key. This is the only layer that survives a provider breach, a legal order, or an insider threat.
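One way a key can stay on the client is to derive it locally from a passphrase. A minimal sketch, assuming PBKDF2-HMAC-SHA256 from Python's standard library; the function name, the iteration count, and the salt size are illustrative choices, not a recommendation taken from any platform or tool.

```python
import hashlib
import secrets


def derive_key(passphrase: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive a 32-byte key on the client. The passphrase never leaves this machine."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


# The salt is not secret; it can be stored next to the ciphertext.
salt = secrets.token_bytes(16)
key = derive_key("correct horse battery staple", salt)
print(len(key))
```

The provider can hold the salt and the ciphertext. It never holds the passphrase, so it cannot reproduce the key.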
The missing layer
No major git hosting platform implements Layer 3. This is a deliberate architectural choice: features like code search, PR diffs, CI/CD, and Copilot require server-side plaintext access. The implication for AI engineers is that model weights, training data, and API keys stored in "private" repos are protected only at Layers 1 and 2. The provider's own ML training pipelines operate inside the Layer 2 boundary.
The gap nobody talks about
Here is why this gap exists and why nobody at GitHub, GitLab, or Bitbucket will fix it.
These platforms need to read your code to function. Syntax highlighting, code search, pull request rendering, automated security scanning, Copilot suggestions. All of these features require the server to parse your source files. End-to-end encryption would break every single one of them.
That is a valid engineering tradeoff. But it is a tradeoff that developers should understand, not one that should be hidden behind a padlock icon and the word "private."
The honest description of a private GitHub repository is: "Other GitHub users cannot see this repo. GitHub employees can, under documented conditions. Law enforcement can, with a warrant. Attackers can, if they breach GitHub. The code is stored in a format GitHub can read."
No platform puts that on the settings page.
Here is why this gap exists and why nobody is going to fix it.
Platforms like GitHub need to read your code to work. Syntax highlighting, search, pull request views, security scanning. All of these features require the server to see your files. End-to-end encryption would break every single one of them.
The honest description
A truthful description of a "private" project would be: "Other users cannot see this. The company running the platform can. Law enforcement can, with a warrant. Hackers can, if they break in. Your code is stored in a format the platform can read." No platform puts that on the settings page.
The gap is architectural and intentional. Platforms require server-side plaintext access for core features: code search indexing, syntax highlighting, PR rendering, automated security scanning, and increasingly, AI-powered features like Copilot suggestions and code review.
E2EE would break every feature that requires the server to parse source files. This is a valid engineering tradeoff. But it is a tradeoff that should be explicit in the threat model, not hidden behind a padlock icon.
For AI engineers, this gap is expanding. Every new AI-powered feature (code completion, automated review, vulnerability detection) requires deeper server-side parsing of your source code. The more intelligent the platform becomes, the more plaintext access it needs. The gap between "private" and "encrypted" grows with every AI feature shipped.
The honest description of a private repository: "Authorization-gated at the application layer. Plaintext-accessible to the provider's infrastructure, legal process, and any adversary who breaches the control plane."
What developers should do about it
This is not about leaving GitHub. GitHub is excellent at what it does. It is about being honest about what "private" means and deciding if that level of protection is enough for your code.
For open-source projects, it does not matter. The code is public anyway.
For side projects, the risk is probably low. You can make a personal judgment call.
For proprietary code, trade secrets, unreleased products, patentable algorithms, client data, or anything where unauthorized access would cause real damage, "private" is not enough. You need encryption where you hold the key and the storage provider cannot read your data. That is the only architecture that survives a subpoena, a breach, and a rogue employee.
That is what we built DevSafe to solve. But regardless of what tool you use, the principle is the same: if someone else holds the key, it is not your secret.
Private is a setting. Encrypted is a guarantee. Know the difference.
This is not about leaving GitHub. GitHub is great at what it does. This is about understanding what "private" actually means and deciding if that is enough for your work.
For open-source projects
It does not matter. The code is public anyway.
For side projects
The risk is probably low. Use your judgment.
For anything valuable
Proprietary code, trade secrets, client data, unreleased products. For any of these, "private" is not enough. You need encryption where you hold the key and nobody else can read your data.
That is what DevSafe was built to solve. But regardless of what tool you use, the principle is the same: if someone else holds the key, it is not your secret.
This is not about abandoning hosted git. It is about accurately modeling your threat surface and choosing controls that match your risk profile.
Open-source: No impact. The code is public by design.
Internal tooling: Low risk for most organizations. Evaluate based on your compliance requirements.
High-value targets: Model weights, training datasets, proprietary algorithms, API keys, client PII, unreleased products, patentable inventions. For these, access control is insufficient. You need client-side encryption with keys the provider never sees. That is the only architecture that survives legal compulsion, infrastructure breach, and insider threat simultaneously.
For AI engineers specifically
If your CI/CD pipeline deploys model weights from a private repo, a single compromised credential exposes your entire model. If your training data includes proprietary datasets stored in private repos, the hosting provider has technical access to your competitive advantage. Evaluate whether your most sensitive artifacts belong in a system where the provider holds the keys.
That is what DevSafe was built to solve. But the principle is provider-agnostic: if the storage provider can read the data, the data is not confidential from the provider.
Private is an authorization predicate. Encrypted is a cryptographic guarantee. Know the difference.
Frequently asked questions
Can GitHub read my private repositories?
Yes. Private repositories on GitHub are protected by access control, not encryption. GitHub stores your code in a format their systems can read. GitHub personnel can access private repository content for security purposes, support requests, legal compliance, and service integrity. The data is readable on their servers because their infrastructure needs to read it to serve it back to you.
What is the difference between a private repository and an encrypted repository?
A private repository uses access control to restrict who can view it through the platform's interface and API. The hosting provider can still read the data. An encrypted repository has its contents transformed into unreadable ciphertext before upload, using keys only the owner holds. Even if someone breaches the hosting provider, copies the storage drives, or serves a legal subpoena, they cannot read encrypted repository contents without the decryption key.
Does server-side encryption on GitLab or Bitbucket protect my code from the provider?
No. Server-side encryption on GitLab (via Google Cloud KMS) and Bitbucket (via Atlassian's infrastructure) uses keys the provider controls. It protects against physical theft of storage hardware, but the provider, their employees, and anyone with valid legal process can still access your data. Only client-side encryption with keys the provider never sees protects your code from the provider itself.
Can the people who run GitHub see my stuff?
Yes. When you mark a project "private," other users cannot see it. But GitHub's own systems and staff can access it for security, support, and legal reasons. Your code is stored in a format they can read.
Is my code scrambled when it sits on GitHub's computers?
Sometimes, but with their keys, not yours. Some platforms encrypt data on their hard drives, but they hold the combination. That protects against someone stealing a hard drive from a data center. It does not protect against the company itself, a court order, or a hacker who breaks into their systems.
What should I do if I have important code?
Use encryption where you hold the key. That way, even if someone breaks into the platform or a court orders them to hand over your data, nobody can read your code without your permission. Tools like DevSafe do this automatically.
Can hosting providers access model weights in private repos?
Yes. Private repos use access control, not encryption. The provider's infrastructure can read any file stored in a private repo, including model weights, training data, and inference configurations. Server-side encryption uses provider-held keys and does not change this.
Does server-side encryption protect API keys stored in private repos from the provider?
No. Server-side encryption (GCP KMS, AWS KMS) uses keys the provider controls. It protects against physical media theft, not against the provider's own infrastructure, employees, or legal obligations. API keys, secrets, and credentials in private repos are readable by the provider. Use a secrets manager for runtime secrets and client-side encryption for stored artifacts.
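For the runtime half of that advice, a common pattern is to have the secrets manager inject the value into the process environment, so it never lands in the repository. A minimal sketch; the variable name `SERVICE_API_KEY` and the helper `load_api_key` are hypothetical.

```python
import os


def load_api_key(name: str = "SERVICE_API_KEY") -> str:
    """Read a secret from the environment instead of a file committed to the repo."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; inject it from your secrets manager")
    return value


if __name__ == "__main__":
    try:
        api_key = load_api_key()
        print("key loaded, length:", len(api_key))
    except RuntimeError as err:
        print(err)
```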
How do supply chain attacks interact with private repo access controls?
Supply chain attacks operate inside the trust perimeter. A compromised GitHub Action, CI/CD plugin, or dependency runs with the permissions of your pipeline, which typically has read access to private repos. The "private" setting is irrelevant because the compromised component is an authorized actor. Client-side encryption contains the damage: as long as the CI/CD environment never holds the decryption key, a compromised component can read only ciphertext for stored artifacts.