What is filesystem backup for git?
Filesystem backup is any backup method that copies files from disk without understanding the application that created them. Tools like Time Machine, Backblaze, Dropbox, and iCloud Drive use filesystem backup. They traverse directories, detect changed files, and copy them to a backup destination one at a time. This works correctly for documents, images, and most application data. It fails for git repositories because a .git directory is a transactional database, not a folder of independent files. Git's internal structures (objects, refs, pack files, indexes) have strict ordering and consistency requirements that filesystem backup cannot honor. When a backup tool copies these files out of order or mid-write, the result is a repository that appears intact but is structurally corrupt.
When you use Time Machine, Backblaze, Dropbox, or iCloud to back up your Mac, those tools copy your files one at a time. That works for documents, photos, and most things on your computer.
What this means
It does not work for code projects. Your project has a hidden folder called .git that tracks every change you have ever made. That folder is not a simple collection of files. It is a structured database, and copying it file by file can break it.
The backup tool does not know which files depend on each other inside that folder. It copies them in whatever order it finds them. If anything changes while it is copying, you end up with a backup that looks fine but is actually broken inside.
Filesystem backup operates at the inode level. It enumerates directory entries, detects mtime changes, and copies modified files individually. This model is correct for self-contained files. It produces undefined behavior when applied to multi-file transactional stores.
Failure pattern
A .git directory is a content-addressed object store with referential integrity constraints. Refs point to commits, commits point to trees, trees point to blobs. Pack files use delta compression with a paired index mapping hashes to byte offsets. Filesystem backup tools (Time Machine, Backblaze, Dropbox, iCloud) copy these files without respecting transactional boundaries. The result is a structurally corrupt snapshot that passes existence checks but fails integrity verification.
Systemic implication: Any CI/CD pipeline, agent workspace, or MCP server that relies on filesystem-level backup for git state recovery is operating on an unverified assumption. The backup may exist. The repository inside it may not be valid.
The assumption everyone makes
I have Time Machine. My code is backed up.
I have Backblaze running in the background. I have my projects in iCloud Drive. I sync my home directory with Dropbox. Whatever the tool, the logic is the same: all my files are being copied somewhere safe, so all my code is backed up. This feels true. For every other file on your machine, it is true. But for git repositories, it is wrong for a specific technical reason that most developers never think about.
The reason is that a .git directory is not a collection of files. It looks like one in Finder. It shows up as one in ls. But it is a transactional database that happens to use the filesystem as its storage layer. And the difference between "a folder of files" and "a database stored as files" is the difference between a backup that works and one that silently corrupts your repository.
You probably think your code is backed up. Time Machine runs every hour. Dropbox syncs in the background. iCloud copies everything. It feels safe.
What happens
For photos, documents, and most files, that backup does work. But your code projects have a hidden tracking system (the .git folder) that is more like a database than a folder. Copying a database file by file can silently scramble it.
The scary part: your backup tool will say everything is fine. The files are all there. You will not discover the problem until the day you actually need to restore, which is the worst possible time to find out.
The default assumption in most development environments: if the machine is backed up, the repositories are backed up. This assumption is false for a specific, structural reason.
Observable signal
A .git directory presents as a filesystem subtree but enforces database-level invariants: referential integrity across objects, atomic ref updates, paired pack/index files. Filesystem backup tools operate below the abstraction layer where these invariants exist. They see files. They do not see transactions.
For agent workspaces and CI runners: a corrupted restore produces git errors that are indistinguishable from repository bugs. Automated systems retry, fail, and escalate without identifying the root cause as backup corruption.
A .git directory is a database, not a folder
When you run git commit, git does not just save a file somewhere. It writes content-addressed objects (blobs, trees, commits), updates an index file, moves ref pointers, appends to the reflog, and potentially triggers a repack that consolidates loose objects into a single pack file with a separate index. These structures depend on each other. A ref points to a commit. That commit points to a tree. That tree points to blobs. The pack index maps object hashes to byte offsets inside the pack file. Remove or corrupt any link in that chain, and the entire history becomes unreadable.
This is the fundamental problem: filesystem backup tools copy files one at a time. They do not understand the relationships between those files. They do not know that .git/refs/heads/main must point to a commit object that actually exists in .git/objects/. They do not know that a pack file and its .idx file are a matched pair that must be captured together. They just copy files in whatever order they encounter them.
It is like photocopying a spreadsheet while someone is editing it. You get half the old version and half the new version. The result looks like a spreadsheet but the numbers do not add up.
A database administrator would never back up PostgreSQL by copying the data directory file by file. They use pg_dump or take a filesystem snapshot while the database is quiesced. The same principle applies to git. But because .git lives inside a normal-looking folder, backup tools treat it like any other directory. And that is where things break.
When you save your work in a code project, git does not just drop a file into a folder. It writes several connected pieces at once: the actual content, a record of what changed, a pointer to the latest version, and a log of everything that has happened.
What this means
All those pieces depend on each other. If your backup tool copies one piece now and another piece a few seconds later, the pieces might not match anymore. It is like taking a photo of a puzzle while someone is rearranging the pieces. The photo shows a puzzle, but the picture does not make sense.
A database administrator would never back up a database by copying its files one by one. They use special tools that understand the database. Your code projects need the same kind of care.
A single git commit writes content-addressed objects (blobs, trees, commits), updates the index, moves ref pointers, appends to the reflog, and may trigger a repack consolidating loose objects into a pack file with a paired .idx. These structures form a directed acyclic graph with strict referential integrity.
Failure pattern
Filesystem backup performs a non-atomic traversal of a transactional store. Each file is captured at a different wall-clock time. If any git operation modifies the object graph between file copies, the backup contains a state that never existed: refs pointing to missing objects, pack indexes with stale byte offsets, truncated index files. This is the database equivalent of a dirty read, but with no rollback mechanism.
The PostgreSQL analogy is exact: no DBA would back up Postgres by copying the data directory file by file. They use pg_dump. Git repositories require the same discipline, but because .git looks like a folder, the requirement is invisible.
Five ways filesystem backup breaks git
1. Mid-operation capture
Git operations are not instantaneous. A git gc (garbage collection) can take seconds or minutes on a large repository. During that time, git is actively moving objects between loose storage and pack files, deleting redundant copies, and rewriting indexes. If your backup tool captures the .git directory during this window, it gets a snapshot of a half-finished operation: some objects in the old location, some in the new, some in neither.
The same applies to git repack, git merge, git rebase, and even git commit on large repos. Any operation that writes multiple files creates a window where the on-disk state is temporarily inconsistent. Databases call this a "dirty read." Filesystem backup tools have no mechanism to avoid it.
2. Lockfile races
Git uses lockfiles (like .git/index.lock) to prevent concurrent access. When git writes to the index, it creates a lockfile, writes the new data, then atomically renames the lockfile to replace the original. Cloud sync services like Dropbox and iCloud also try to read and sync these files. This creates a race condition: the sync service reads the lockfile before git finishes writing, or the sync service holds a file handle that prevents git from completing the rename. The result is a corrupted index or a git operation that fails with "Unable to create lock file."
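A leftover lock file in a synced or restored repository is a quick warning sign of the race described above. The following is a minimal sketch, assuming a standard non-bare layout with metadata in `<repo>/.git`; `find_git_locks` is a hypothetical helper name, not a git command.

```shell
# find_git_locks REPO
# Prints every *.lock file under REPO/.git (for example .git/index.lock).
# Assumption: REPO is a non-bare repository whose metadata lives in REPO/.git.
find_git_locks() (
    repo=$1
    [ -d "$repo/.git" ] || { echo "not a git repository: $repo" >&2; exit 2; }
    find "$repo/.git" -type f -name '*.lock'
)
```

Usage: `find_git_locks ~/Projects/myrepo`. A lock that exists while a git command is running is normal. A lock that is still present when no git process is active suggests an interrupted write or a copy taken mid-operation.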
3. Ref pointer overwrites
iCloud has a well-documented behavior when it encounters filename conflicts: it appends a numeric suffix. If two devices write to .git/refs/heads/main, iCloud does not merge the content or raise an error. It creates main 2. Now your repository has a file called main 2 in its refs directory, which git does not recognize as a branch. Meanwhile, the original main file may contain the wrong commit hash, pointing to a state from the other device. Your branch history is silently forked in a way git cannot detect or repair.
4. Pack file corruption
Pack files are git's compressed storage format. A single .pack file can contain thousands of objects, many stored as deltas (differences against other objects in the same pack). Each pack file has a corresponding .idx file that maps object hashes to byte offsets. If the backup captures the .pack file at one point in time and the .idx file at another (because objects were being repacked between the two copies), the index points to the wrong byte offsets. Every object lookup fails. The repository is unrecoverable from that backup without manual surgery.
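The cheapest check on a restored copy is whether every pack still has its partner. This sketch only detects a missing half of a pair; it assumes packs live in `<repo>/.git/objects/pack`, and `check_pack_pairs` is a hypothetical helper name.

```shell
# check_pack_pairs REPO
# Prints any .pack file without a matching .idx file, and vice versa.
# Returns 0 when every pack is paired, 1 otherwise.
# Assumption: packs live in REPO/.git/objects/pack (standard non-bare layout).
check_pack_pairs() (
    dir=$1/.git/objects/pack
    status=0
    [ -d "$dir" ] || exit 0              # no pack directory, nothing to check
    for f in "$dir"/*.pack; do
        [ -e "$f" ] || continue          # glob matched nothing
        [ -e "${f%.pack}.idx" ] || { echo "missing index for $f"; status=1; }
    done
    for f in "$dir"/*.idx; do
        [ -e "$f" ] || continue
        [ -e "${f%.idx}.pack" ] || { echo "missing pack for $f"; status=1; }
    done
    exit "$status"
)
```

A pair that exists but was captured at two different moments will pass this check. To test the contents, run `git verify-pack -v` on each `.idx` file, which validates the index against its pack.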
5. Index conflicts
The git index (.git/index) is a single binary file that tracks the staging area. It maps every tracked file to its current blob hash, file mode, and stat data. When a backup tool copies this file while git is writing to it, the result is a truncated or partially-written binary file. Git will refuse to read a corrupt index, which means you cannot run git status, git diff, or git commit after restoring from that backup. The fix is to delete the index and rebuild it from HEAD, which loses all your staged changes.
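The rebuild described above can be sketched as follows. This assumes a non-bare repository with metadata in `<repo>/.git` and an intact HEAD commit; `rebuild_index` is a hypothetical helper name, and the sketch moves the damaged file aside rather than deleting it.

```shell
# rebuild_index REPO
# Moves a damaged index aside and rebuilds it from HEAD.
# Staged-but-uncommitted changes are lost; working tree files are untouched.
# Assumption: REPO is a non-bare repository with metadata in REPO/.git.
rebuild_index() (
    repo=$1
    [ -d "$repo/.git" ] || { echo "not a git repository: $repo" >&2; exit 2; }
    # Keep the damaged file for inspection instead of deleting it outright.
    if [ -e "$repo/.git/index" ]; then
        mv "$repo/.git/index" "$repo/.git/index.corrupt" || exit 1
    fi
    # A mixed reset repopulates the index from the HEAD commit.
    git -C "$repo" reset --quiet
)
```

After it runs, `git status` works again, and any file that differs from HEAD shows up as an unstaged modification.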
All five of these failures are silent. The backup tool reports success. The backup file exists. You do not discover the corruption until you try to restore, which is the exact moment you cannot afford to discover it.
1. Backup catches your project mid-save
When your code project saves changes, it writes several files in sequence. If your backup runs at that exact moment, it catches some files in the old state and some in the new state. Neither version is complete.
2. Cloud sync fights with your project
Dropbox and iCloud try to sync files the instant they change. Your code project also needs to update those same files. Both tools grab for the same files at the same time, and one of them loses. Sometimes your project shows an error. Sometimes the file gets quietly scrambled.
3. iCloud renames your work
When iCloud sees the same file changed on two devices, it does not merge the changes. It creates a second copy with a number added to the name. Your code project does not recognize the renamed copy, and the original may now contain outdated information.
4. Compressed storage gets corrupted
Your code project stores older history in a compressed format that uses two paired files. If the backup copies one file before the other gets updated, the pair no longer matches. The compressed history becomes unreadable.
5. Your staging area breaks
Your project keeps a single file that tracks what you are about to save next. If the backup copies that file while it is being written, the copy is incomplete. After restoring, your project cannot tell what state your files are in.
What this means
All five of these problems are invisible. Your backup tool will say it succeeded. The backup file will exist. You will not find out anything is wrong until you try to restore, which is the worst possible time to discover it.
1. Mid-operation capture
Failure pattern
git gc and git repack move objects between loose storage and pack files over multi-second windows, and operations such as git rebase can trigger an automatic gc. A file-by-file copy taken during this window can capture a state in which some objects appear in neither location. This is a dirty read with no rollback.
2. Lockfile races
Failure pattern
Git uses atomic rename via lockfiles (.git/index.lock). Cloud sync services hold file handles on these lockfiles, preventing the atomic rename from completing. Result: corrupted index or "Unable to create lock file" errors in CI runners and agent workspaces.
3. Ref pointer overwrites
Failure pattern
iCloud resolves filename conflicts by appending numeric suffixes: main 2. Git does not recognize suffixed refs. The original ref may contain a stale commit hash from a different device. Branch history forks silently with no merge path.
4. Pack file corruption
Failure pattern
Pack files (.pack) and their indexes (.idx) are a paired data structure. Delta-compressed objects reference byte offsets. If the backup captures the pair at different times during a repack, the index maps to invalid offsets. Every object lookup fails. The repository is unrecoverable without manual surgery.
5. Index corruption
Failure pattern
The git index (.git/index) is a single binary file mapping tracked files to blob hashes and stat data. A partial copy produces a truncated binary that git refuses to parse. Recovery requires deleting the index and rebuilding from HEAD, losing all staged changes.
All five failure modes are silent. The backup tool reports success. The corruption surfaces only at restore time. For automated systems, this means disaster recovery procedures that have never been validated against actual repository integrity.
Real examples from real developers
These are not theoretical failure modes. They happen to developers regularly. I tested them. Here is what I found.
iCloud renames your branches
Place a git repository in iCloud Drive. Work on it from two Macs. Within hours, you will find files like .git/refs/heads/main 2 and .git/refs/heads/develop 2 in your refs directory. iCloud's conflict resolution creates these renamed files silently. Git ignores them (they are not valid ref names), but the original refs may now contain stale data from whichever device wrote last. Run git log and you may see a different history than you expect. Run git push and you may overwrite remote work with a stale local state.
main
main 2
develop
develop 2
# iCloud conflict resolution. Git does not know
# "main 2" exists. The original "main" may be stale.
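Conflict copies like these are easy to detect, because git ref names cannot contain spaces. The following is a minimal sketch, assuming refs are stored as loose files under `<repo>/.git/refs`; `find_conflict_refs` is a hypothetical helper name.

```shell
# find_conflict_refs REPO
# Prints files under REPO/.git/refs whose names contain a space,
# such as "refs/heads/main 2". Git ref names cannot contain spaces,
# so git never treats these files as refs.
# Assumption: standard non-bare layout with refs stored as loose files.
find_conflict_refs() (
    refs=$1/.git/refs
    [ -d "$refs" ] || { echo "no refs directory: $refs" >&2; exit 2; }
    find "$refs" -type f -name '* *'
)
```

Any output means a sync service has written into the refs directory. Compare the hash in each conflict copy with the hash in the original ref before deciding which one reflects your latest work.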
Dropbox sync conflicts in .git/objects/
Dropbox monitors file changes and syncs them across devices. When git writes loose objects to .git/objects/, Dropbox tries to sync each one as it appears. On a large commit that creates many objects, Dropbox can fall behind and create sync conflict files. It can also attempt to sync a partially-written object before git has finished flushing it to disk. The result: objects that fail SHA verification. Git's response to a corrupted object is to refuse to read anything that depends on it, which can cascade through your entire commit history.
Time Machine captures mid-gc state
Time Machine takes hourly snapshots. A git gc on a moderately large repository takes 5 to 30 seconds. If Time Machine's snapshot lands during that window, it captures a .git directory where some objects exist only in the old loose format, some exist only in the new pack file, and some are in the process of being deleted. Restoring from this snapshot gives you a repository with "missing objects" errors. The objects are not missing. They were simply in transit between two storage formats when the snapshot fired.
$ git fsck --full
error: object file .git/objects/3a/8f... is empty
error: object file .git/objects/7c/12... is empty
fatal: loose object 3a8f... (stored in .git/objects/3a/8f...) is corrupt
# These objects were being repacked when the snapshot fired.
# The old loose files were already truncated. The new pack
# file was not yet complete. Both copies are unusable.
These are not guesses. These are things that actually happen to people. Here is what they look like.
iCloud renames your work
What happens
If you work on the same project from two Macs with iCloud, it creates duplicate files with numbers added to the name (like "main 2"). Your code project does not recognize these renamed files. The original file might now point to old, outdated information from the other device.
Dropbox creates conflict files
What happens
Dropbox tries to sync your project files as soon as they change. When your code project is saving multiple files at once, Dropbox can grab a file before it is fully written. The result: files that fail verification checks, which can cascade and make your entire project history unreadable.
Time Machine catches a bad moment
What happens
Time Machine takes snapshots every hour. If it captures your project while it is reorganizing its internal storage (which can take 5 to 30 seconds), the snapshot contains files that were in the middle of being moved. When you restore, your project reports "missing" files that are not actually missing. They were just being relocated when the snapshot fired.
Verified failure cases, tested in controlled environments.
iCloud: ref namespace pollution
Observable signal
iCloud conflict resolution appends numeric suffixes to ref files: refs/heads/main 2. Git ignores invalid ref names. The original main ref may contain a commit hash from a different device. Branch history silently diverges with no detectable merge base.
Dropbox: object integrity failure
Observable signal
Dropbox syncs individual object files as they are written to .git/objects/. During multi-object commits, partially-written objects are synced before flush completes. Objects fail SHA verification. Git refuses to traverse any commit that references a corrupt object, cascading through the DAG.
Time Machine: mid-gc snapshot
Observable signal
Hourly snapshots intersect with git gc windows (5-30s on moderate repos). The snapshot captures objects mid-migration between loose storage and pack files. Restored repositories report empty object files and corrupt loose objects. git fsck --full confirms the damage but cannot repair it.
Systemic implication: These are not edge cases. Any repository stored in a cloud-synced directory will encounter at least one of these failure modes over a sufficient time window. The probability increases with repository size, commit frequency, and number of devices.
What a git-aware backup looks like
The solution is not "better filesystem backup." The solution is a backup tool that understands git's internal consistency model.
A git-aware backup tool does not copy files one by one from the .git directory. Instead, it reads from git's own internal structures. It asks git for the objects, the refs, the commit graph. It creates a consistent snapshot that represents a valid point-in-time state of the repository, regardless of what operations are in progress on the filesystem.
The distinction matters. A filesystem backup copies the container. A git-aware backup reads the content through git's own API. The container can be in a temporary, inconsistent state. The content, read through git, is always consistent.
This is the same principle that makes pg_dump reliable while copying PostgreSQL's data directory is not. The dump tool understands the database. It reads through the database's own interfaces. It produces a consistent output regardless of concurrent activity. A git-aware backup tool does the same thing for repositories.
What does this look like in practice? A git-aware backup tool should:
- Read through git, not the filesystem. Use git's own internal commands to extract objects, refs, and state. Never copy raw files from .git/.
- Produce a single, atomic output. The backup is either complete and valid, or it does not exist. No partial captures. No half-written states.
- Tolerate concurrent git operations. If the developer is running git gc or git rebase while the backup runs, the backup still produces a consistent snapshot.
- Capture uncommitted work. Committed history is important, but the work you lose in a crash is always the work you have not committed yet. A good backup captures staged changes, working tree modifications, untracked files, and stashes.
- Verify before storing. The backup should be validated as restorable before it is considered complete. A verified restore proves the backup works. A file size check does not.
This is what DevSafe does. It is git-aware by design. It reads from git's internal structures, not the filesystem. It creates a consistent, encrypted, verified backup regardless of what operations are in progress. And it captures uncommitted work without creating commits in your repository.
The fix is not a better version of Time Machine or Dropbox. The fix is a backup tool that actually understands how code projects work.
Good news
Instead of copying files one by one, a git-aware tool reads your project through git's own internal system. It asks git directly: "What are all the commits, branches, and changes?" and creates a single, complete snapshot. It does not matter if something is being reorganized in the background. The snapshot is always consistent.
A good git-aware backup should:
- Read your project through git's own tools, not by copying files
- Create a single backup file that is either complete or does not exist (no half-finished backups)
- Work even while you are actively saving or reorganizing your project
- Capture your unsaved work (not just what you have committed)
- Verify the backup actually works before calling it done
This is what DevSafe does. It reads from your project's internal structure, creates encrypted verified backups, and captures your unsaved work without interfering with your workflow.
The solution is not a more frequent or more granular filesystem backup. The solution is a backup tool that operates at the git abstraction layer, not the filesystem layer.
Observable signal
A git-aware backup reads from the object database using git plumbing commands. It produces a self-consistent snapshot (git bundle) that represents a valid repository state regardless of concurrent operations. The pg_dump analogy is exact: read through the application's own transactional interface, not the storage layer.
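As an illustration of the bundle approach, here is a minimal sketch using stock git commands. It is not a description of how any particular product works. `bundle_backup` is a hypothetical helper name, and the sketch covers committed history only.

```shell
# bundle_backup REPO OUT
# Writes every ref and its history to a single bundle file at OUT.
# OUT must be an absolute path, because git runs with REPO as its directory.
# The bundle is written under a temporary name, verified, then renamed into
# place, so OUT is never left half-written.
# Note: a bundle holds committed history only, not staged or untracked work.
bundle_backup() (
    repo=$1
    out=$2
    tmp=$out.tmp.$$
    git -C "$repo" bundle create "$tmp" --all >/dev/null 2>&1 \
        && git -C "$repo" bundle verify "$tmp" >/dev/null 2>&1 \
        && mv "$tmp" "$out" \
        || { rm -f "$tmp"; exit 1; }
)
```

Usage: `bundle_backup ~/Projects/myrepo /Volumes/Backup/myrepo.bundle`. To restore, clone from the bundle file as if it were a remote: `git clone /Volumes/Backup/myrepo.bundle restored`. Both paths here are placeholders.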
Requirements for a git-aware backup system in production environments:
- Read through git plumbing, never raw filesystem traversal of .git/
- Atomic output: the backup is valid or does not exist. No partial states.
- Concurrent-operation tolerance: safe during gc, repack, rebase
- Five-namespace capture: index, working tree, untracked, stash, operation state
- Pre-store verification: prove restorability before marking complete
For CI/CD and agent workspaces: DevSafe reads from git's internal structures, produces encrypted verified bundles, and captures uncommitted state without creating commits. It is designed for environments where repository integrity is a pipeline dependency.
Check your backup right now
Open a terminal. Run this:
# Are any of your repos in a cloud sync folder?
$ find ~/Library/Mobile\ Documents \
    ~/Dropbox \
    ~/OneDrive \
    -name ".git" -type d 2>/dev/null
# If this returns anything, those repos are at risk.
If that command finds any .git directories, you have repositories sitting in cloud sync folders. Every one of them is exposed to the corruption mechanisms described in this post. The sync service is actively monitoring those files, competing with git for file access, and applying conflict resolution rules that git does not understand.
This is not a theoretical risk. It is an active one. The longer a repository sits in a cloud sync folder, the higher the probability that a background sync, a garbage collection, or a concurrent edit triggers one of the five failure modes above. You may already have a corrupted backup and not know it. You will not know it until the day you need to restore.
Git's own FAQ warns against storing repositories in cloud sync folders. This is a known problem, documented by the git project itself. Cloud sync tools were not designed to handle transactional databases. Using them as git backup is not a supported use case.
What to do about it
- Run devsafe scan (free). It finds every git repository on your machine and flags the ones in cloud sync danger zones. Takes about 30 seconds.
- Read How Cloud Sync Services Destroy Git Repositories. The full technical breakdown of all the corruption mechanisms, with test results.
- Move your repos out of sync folders. Keep your repositories in ~/Projects or ~/src, not in iCloud Drive, Dropbox, or OneDrive.
- Use a git-aware backup tool. Filesystem backup is not git backup. Use a tool that understands git's consistency model and creates verified, encrypted snapshots.
Your code deserves better than a backup that might be corrupt. Check yours today.
You might have code projects sitting in a cloud sync folder right now without realizing it. Here is how to find out.
Quick check
Run devsafe scan (free). It finds every code project on your machine and flags the ones sitting in danger zones like iCloud Drive, Dropbox, or OneDrive. Takes about 30 seconds.
If any of your projects are in cloud sync folders, they are exposed to the corruption problems described above. The sync service is actively monitoring those files, competing with your code tools for access, and applying conflict resolution that your code tools do not understand.
What to do
Move your projects to a regular folder like ~/Projects that is not synced by iCloud, Dropbox, or OneDrive.
Read How Cloud Sync Services Destroy Git Repositories for the full breakdown.
Use a git-aware backup tool that understands your code projects and creates verified, encrypted backups.
Your code deserves better than a backup that might be broken. Check yours today.
Immediate diagnostic for any development environment, CI runner, or agent workspace.
Diagnostic
Scan for repositories in cloud-synced directories: ~/Library/Mobile Documents (iCloud), ~/Dropbox, ~/OneDrive. Any .git directory found in these paths is at risk. Use devsafe scan (free) for automated detection and risk assessment.
Repositories in cloud sync folders are under active contention. The sync service monitors file changes, competes for file handles, and applies conflict resolution rules that violate git's transactional model. Probability of corruption increases with repository size, commit frequency, and device count.
Remediation
1. Relocate repositories to non-synced paths (~/Projects, ~/src).
2. Replace filesystem backup with git-aware backup that reads from the object database.
3. Validate existing backups with git fsck --full on restored copies. Do not assume backup integrity from file existence.
For MCP servers and agent orchestration: if your workspace recovery depends on filesystem backup, your disaster recovery is unverified. Test a restore. Run git fsck. The result will tell you whether your backup strategy is real or assumed.
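The restore test can be scripted. This is a minimal sketch that wraps `git fsck --full` and reports a pass or fail for a restored copy; `verify_restore` is a hypothetical helper name.

```shell
# verify_restore REPO
# Returns 0 only if git can check the whole object store of REPO without
# errors. A path that is not a git repository also counts as a failure.
# Run this against a restored copy, not the live working repository.
verify_restore() (
    repo=$1
    [ -d "$repo" ] || { echo "no such directory: $repo" >&2; exit 2; }
    if git -C "$repo" fsck --full >/dev/null 2>&1; then
        echo "OK: $repo"
    else
        echo "FAILED: $repo"
        exit 1
    fi
)
```

Usage: restore a repository from backup into a scratch directory, then run `verify_restore /path/to/restored/repo`. The non-zero exit status on failure makes it usable as a gate in a scheduled job or CI step.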
Frequently asked questions
Does Time Machine back up git repositories correctly?
No. Time Machine copies files one at a time without understanding the relationships between git's internal structures. A .git directory is a transactional database where refs, pack files, and indexes depend on each other. Copying them independently creates a snapshot that looks complete but contains broken cross-references, making the repository unreadable.
Why does Dropbox corrupt .git folders?
Dropbox syncs files individually as they change, but git operations write multiple interdependent files in sequence. When Dropbox captures a ref update before the corresponding object is written, or syncs a pack file without its matching .idx index, the result is a structurally broken repository. Dropbox also creates conflict files inside .git that confuse git's internal state.
What is a git-aware backup and how is it different from filesystem backup?
A git-aware backup reads from git's object database using git's own tools (like git bundle) instead of copying files from the .git directory. This produces a self-consistent snapshot that respects the relationships between objects, refs, and pack files. Unlike filesystem backup, a git-aware backup is immune to partial-write corruption because it reads from git's transactional layer, not the raw filesystem.
Does Time Machine back up code projects correctly?
No. Time Machine copies files one at a time without understanding how they connect to each other. Your code project's internal tracking system has files that depend on each other. Copying them separately creates a backup that looks complete but is broken inside.
Why does Dropbox cause problems with code projects?
Dropbox syncs files individually as they change. But when your code project saves, it writes several connected files in sequence. Dropbox can grab one file before the matching file is ready, which breaks the connection between them. It also creates conflict files that confuse your code tools.
What is a git-aware backup?
Instead of copying files from your project's hidden folder, a git-aware backup reads from your project's internal tracking system using the project's own tools. This creates a backup that is always consistent and complete, no matter what else is happening on your computer at that moment.
Does Time Machine produce valid git backups?
No. Time Machine performs file-level snapshots without transactional awareness. Git's internal structures (refs, pack files, indexes) have referential integrity constraints that span multiple files. Independent file copies produce a snapshot with broken cross-references. The repository structure appears intact. The object graph is not.
Why does Dropbox corrupt .git directories?
Dropbox syncs files individually on mtime change. Git operations write multiple interdependent files in sequence. Dropbox captures ref updates before corresponding objects are written, syncs pack files without matching .idx files, and creates conflict files inside the .git namespace that pollute git's state. Each mechanism independently produces corruption.
What distinguishes git-aware backup from filesystem backup?
Abstraction layer. Filesystem backup operates below git's transactional boundary. Git-aware backup reads through git's own plumbing commands (git bundle, git rev-list), producing a self-consistent snapshot that respects the object DAG. Concurrent git operations do not affect output consistency. The backup is verified as restorable before storage.