A new engineer joins the team on Monday morning. By 11am she's still waiting on git lfs pull to finish downloading 12 GB of training-data fixtures and model checkpoints that the rest of the team needs to run the test suite. The clone itself took three minutes. The LFS fetch has been running for an hour. Her welcome message in chat is "is this normal?" The team's answer, slightly embarrassed: "yeah, that's just how Mondays work."
This post describes the pattern that some teams have quietly moved to instead. Store the large binaries in a cloud bucket, mount the bucket inside the repo at the path the code expects, .gitignore the mount, and treat the binaries as normal local files. No LFS server, no LFS bandwidth fees, no broken working copies. The new engineer's clone takes three minutes total and she's running the test suite by 9:15.
The setup, end to end
The pattern is simpler than it sounds. Most of the setup is in the choice of where to put the bucket; the rest is twenty lines of config.
Step 1 — the bucket. Pick a cloud provider. The cheap object stores (Wasabi, Backblaze B2, Cloudflare R2) work well for this and charge little or nothing for the egress the team's daily reads will generate, though each has its own usage policy worth checking. S3 works too if you're already in the AWS ecosystem. Google Drive works if the binaries are more team-asset shaped than production-data shaped; Drive's per-user storage model and its sharing semantics fit some teams better than an object store does.
Create one bucket — team-name-assets or whatever your team's naming convention says — and lay out the directories the way the repo expects to find them. If your repo references assets/models/, assets/data/, assets/videos/, those become folders inside the bucket.
Step 2 — the repo layout. In the repo, the path the code references becomes a mount point. Concretely:
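- The repo reserves the path assets/ (or wherever the binaries live). Git does not track empty directories, so the directory itself appears only once the mount is linked in.
- The repo's .gitignore lists assets so git never tracks the contents. If assets is a symlink, write the pattern without a trailing slash, because a trailing-slash pattern matches only real directories.
- The repo's README has a section that says "before running the test suite, mount the assets bucket at assets/."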
That's all the repo changes. Code that references assets/models/foo.bin keeps working because, at runtime, assets/models/foo.bin is a real file that resolves through the mount.
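A small preflight check can make a missing mount fail loudly instead of surfacing as a confusing test error. The following is a minimal sketch, not part of any tool mentioned here: the function name is hypothetical, and the folder names are simply the example layout from Step 1 (assets/models/, assets/data/, assets/videos/).

```python
from pathlib import Path

# Assumption: the example layout from Step 1. Adjust to your own bucket folders.
EXPECTED_SUBDIRS = ("models", "data", "videos")


def missing_asset_dirs(repo_root, subdirs=EXPECTED_SUBDIRS):
    """Return the expected asset folders that are not visible under <repo_root>/assets.

    An empty list means the mount (or symlink) is in place and resolves.
    """
    assets = Path(repo_root) / "assets"
    # is_dir() follows symlinks, so this works for a symlinked mount too.
    return [name for name in subdirs if not (assets / name).is_dir()]


# Example use at the top of a test runner (hypothetical):
#   missing = missing_asset_dirs(".")
#   if missing:
#       raise SystemExit(f"assets/ is not mounted (missing: {missing}); see the README")
```

Calling it from the test suite's entry point turns "the README said to mount first" into an error message the new engineer actually sees.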
Step 3 — the mount. Each developer installs ExpanDrive once, then adds a connection to the team bucket once. ExpanDrive presents the bucket as a drive on their machine; they bind-mount or symlink the relevant subdirectory of that drive into the repo's assets/ path. On macOS and Linux a symlink is the simplest version; on Windows you can use the mklink /D equivalent.
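The symlink step can be scripted so it behaves the same on every OS. This is a sketch under assumptions: the helper name `link_assets` is made up for illustration, and the path of the mounted drive is whatever ExpanDrive shows on your machine. It uses only Python's standard `os.symlink`, which on Windows may need Developer Mode or elevated rights.

```python
import os
from pathlib import Path


def link_assets(mounted_dir, repo_root, name="assets"):
    """Point <repo_root>/<name> at mounted_dir and return the link path.

    Replaces an existing symlink or an empty placeholder directory, and
    refuses to touch anything that holds real files.
    """
    target = Path(mounted_dir).resolve()
    link = Path(repo_root) / name
    if not target.is_dir():
        raise FileNotFoundError(f"mount not available: {target}")
    if link.is_symlink():
        link.unlink()                      # stale link: safe to replace
    elif link.is_dir():
        if any(link.iterdir()):
            raise FileExistsError(f"{link} contains files; refusing to replace it")
        link.rmdir()                       # empty placeholder directory
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a directory or symlink")
    # target_is_directory only matters on Windows, where directory links are a distinct type.
    os.symlink(target, link, target_is_directory=True)
    return link
```

Running it twice is harmless, which matters when the helper ends up in a setup script that people rerun.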
Step 4 — the build pipeline. CI runners mount the same bucket the same way. Most CI systems (GitHub Actions, GitLab CI, Buildkite, CircleCI) support installing arbitrary userspace agents in a setup step. ExpanDrive has an exfs CLI that handles the headless-mount case. The CI step looks like exfs mount s3://team-assets /opt/assets at the start of the job and exfs unmount at the end, with /opt/assets linked into the checkout's assets/ path just as on a laptop. Builds then read from the mount the same way developer laptops do.
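One detail worth getting right in CI is that the unmount runs even when the tests fail. The sketch below wraps the two commands quoted above in a context manager. The wrapper itself (`mounted`, and the injectable `run` parameter) is hypothetical, and the exfs command lines are copied from this post rather than checked against the CLI's documentation.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def mounted(mount_cmd, unmount_cmd, run=subprocess.run):
    """Run mount_cmd, hand control to the job, then always run unmount_cmd."""
    run(mount_cmd, check=True)          # fail the job early if the mount fails
    try:
        yield
    finally:
        run(unmount_cmd, check=False)   # best-effort cleanup, even after a test failure


# Example use in a CI script (commands as quoted in the post; the test command is a placeholder):
#   with mounted(["exfs", "mount", "s3://team-assets", "/opt/assets"], ["exfs", "unmount"]):
#       subprocess.run(["your-test-command"], check=True)
```

Passing `run` as a parameter is only there so the wrapper can be exercised without the real CLI installed.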
Step 5 — onboarding. New engineer joins. Day one looks like:
- git clone the repo. Three minutes.
- Install ExpanDrive. Two minutes.
- Add the team bucket connection (copy the credentials from a 1Password share). One minute.
- Symlink the mount into the repo. One command, run from the repo root.
- Run the test suite. Files stream from the cloud as the tests reference them.
Total time to productive: ten minutes. Compare that with git lfs pull running for an hour; the difference is most of the argument.
What this beats: Git LFS, concretely
Git LFS is the obvious comparison, and it's the comparison this pattern is designed to win.
Bandwidth costs. GitHub's long-standing LFS pricing includes 1 GB of storage and 1 GB of bandwidth per month on the free tier, counted per account rather than per repo, then charges $5 per 50 GB data pack above that (check current pricing before budgeting). A team of 10 engineers each pulling 12 GB of binaries a week moves 120 GB a week, so it burns through the included quota immediately and starts paying real money. The cloud bucket approach uses storage you're already paying for, and providers like Wasabi don't charge for the reads at all.
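To put a number on that, here is the arithmetic as a short sketch. Every input is a figure quoted in this post (team size, weekly pull volume, included quota, pack size and price), not current GitHub pricing, and the 52/12 weeks-per-month conversion is an assumption of the sketch.

```python
import math


def monthly_lfs_bandwidth_cost(
    engineers=10,
    gb_per_engineer_per_week=12,
    included_gb_per_month=1,
    pack_gb=50,
    pack_price_usd=5,
):
    """Return (monthly_gb, packs_needed, monthly_cost_usd) for LFS bandwidth."""
    # Average month = 52 weeks / 12 months.
    monthly_gb = engineers * gb_per_engineer_per_week * 52 / 12
    overage_gb = max(0.0, monthly_gb - included_gb_per_month)
    packs = math.ceil(overage_gb / pack_gb)   # packs are bought whole
    return monthly_gb, packs, packs * pack_price_usd


monthly_gb, packs, cost = monthly_lfs_bandwidth_cost()
print(f"{monthly_gb:.0f} GB/month -> {packs} packs -> ${cost}/month")
# prints "520 GB/month -> 11 packs -> $55/month"
```

The function takes the inputs as parameters so you can substitute your own team size and your host's current rates.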
Self-hosting LFS. Yes, you can run your own LFS server. The git-lfs-test-server project is a reference implementation, not a production tool. Maintaining a real LFS server (auth, garbage collection, repair, backup, scaling) is a real engineering project that nobody on a product team actually wants to own.
Working-copy corruption. When an LFS fetch fails mid-stream (network blip, server timeout, disk full), the working copy ends up with placeholder text files where the binaries should be. Running the test suite against placeholders produces confusing failures ("the model file is 132 bytes, why is it not loading?"). The fix is to rerun git lfs pull (git lfs fetch on its own downloads objects but leaves the placeholders in the working tree), which retries the same large download over the same connection that just failed. Teams have lost half-days to this.
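For teams still on LFS, the placeholder problem is at least easy to detect, because every LFS pointer file begins with the same version line defined by the Git LFS pointer spec. The sketch below scans a working copy for files that are still pointers; the function names are hypothetical.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the Git LFS pointer specification.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if the file starts with the LFS pointer version line."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(root):
    """Return sorted paths under root that are still pointers, skipping .git."""
    root = Path(root)
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file()
        and ".git" not in p.relative_to(root).parts
        and is_lfs_pointer(p)
    )
```

Run against the repo root before the test suite, a non-empty result explains the "132-byte model file" failure before anyone has to debug it.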
The git lfs command surface. track, untrack, pull, fetch, prune, push, migrate, ls-files. Nobody on the team remembers all of these correctly. Half the team has at some point typed git rm on an LFS-tracked file and accidentally deleted the original. The LFS surface is its own learning curve on top of git, and the failure modes when someone gets it wrong are bad.
The repo-history pollution. LFS pointer files are checked into git history. Once a file is LFS-tracked, removing the pointers cleanly is non-trivial. Migrating off LFS later means rewriting history (git lfs migrate export or git filter-repo), which means a force-push, which means coordinating with everyone who has the repo cloned. Teams that adopt LFS hastily and want to back out a year later discover that the back-out is expensive.
The mount pattern has none of these problems. The binaries aren't in git history at all, the cloud provider handles transfer, working copies don't corrupt, and there's no special command surface for the team to learn.
What this looks like across roles
A few concrete role-by-role pictures:
ML team with model checkpoints. Checkpoints are large (50 MB to several GB each), they're produced by training runs, and they need to be referenced by inference code. The bucket holds a checkpoints/ directory organized by experiment ID. The repo's eval scripts read from assets/checkpoints/<experiment>/<version>/. When a new checkpoint is trained, the training job writes it directly to the bucket via the cloud provider's SDK. Every developer's mount sees the new checkpoint shortly afterwards (typically within seconds, depending on how often the mount refreshes its directory listing), with no git lfs pull step needed.
Game studio with art assets. Texture files, audio samples, mesh files, animation clips. The repo has the game engine's code; the bucket has the assets. Artists work directly against the bucket via the mount; programmers build against whatever's in the bucket at any given moment. Atomic asset replacement isn't needed here, because nobody is trying to swap two interdependent files atomically mid-build. The mount is what lets the artist workflow and the programmer workflow share a single source of truth without either of them having to think about git.
Web team with media assets. Video, hero images, large PDFs that ship as part of the marketing site. The bucket holds the media; the repo holds the HTML and the build config. The deploy pipeline reads media from the bucket and copies it to the CDN. The dev environment reads from the same bucket so local previews show real assets. No LFS, no Git anywhere near the media files.
Honest about the edges
This isn't a perfect pattern. A few real trade-offs:
File-level versioning lives in the cloud, not in git. S3 (once bucket versioning is enabled) and Drive keep their own version histories. Git diff doesn't work on binary files anyway, so the loss is mostly notional, but if you've been treating LFS as a way to roll back to "the binary from three commits ago," that workflow doesn't exist here. You roll back through the cloud provider's UI, or through a versioning convention in the bucket layout (models/v1/, models/v2/, etc.).
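With a folder-per-version convention, rolling back becomes a matter of which folder the code resolves. A minimal sketch, assuming the models/v1/, models/v2/ naming mentioned above; the helper name and the `pinned` argument are hypothetical.

```python
import re
from pathlib import Path

_VERSION_DIR = re.compile(r"^v(\d+)$")


def resolve_model_dir(models_dir, pinned=None):
    """Return the pinned version folder, or the highest-numbered vN folder.

    Pass pinned="v1" to roll back; omit it to follow the latest version.
    """
    models_dir = Path(models_dir)
    if pinned is not None:
        chosen = models_dir / pinned
        if not chosen.is_dir():
            raise FileNotFoundError(f"no such model version: {chosen}")
        return chosen
    versions = []
    for child in models_dir.iterdir():
        match = _VERSION_DIR.match(child.name)
        if match and child.is_dir():
            versions.append((int(match.group(1)), child))
    if not versions:
        raise FileNotFoundError(f"no vN folders under {models_dir}")
    # Compare numerically so v10 sorts after v2.
    return max(versions, key=lambda v: v[0])[1]
```

Keeping the pinned version in a small text file that is committed to git is one way to recover "the binary that went with this commit" without storing the binary in git.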
Branch-specific large files don't have a clean answer. If feature/x references a different model than main, both branches' code will resolve to the same file in the bucket. The workaround is branch-aware folders in the bucket — assets/branch/feature-x/models/ and assets/branch/main/models/ — and a .envrc-style mechanism that points the symlink at the right subfolder based on the branch. It works, but it's a convention the team has to maintain and document. Most teams don't actually have branch-specific binaries; they have main-line binaries that update over time, which the simple pattern handles.
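The branch-aware convention can be reduced to one small lookup. The sketch below maps a branch name to its folder using the naming from the example above (feature/x becomes feature-x). The fallback to main when a branch has no folder of its own is an assumption of this sketch, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def current_branch(repo_root="."):
    """Return the checked-out branch name (git reports "HEAD" when detached)."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        cwd=repo_root, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def branch_assets_dir(mount_root, branch, fallback="main"):
    """Return <mount_root>/assets/branch/<branch-folder>, or the fallback folder.

    Slashes in the branch name become dashes, so feature/x maps to feature-x.
    """
    base = Path(mount_root) / "assets" / "branch"
    candidate = base / branch.replace("/", "-")
    # Assumption: branches without their own folder share main's binaries.
    return candidate if candidate.is_dir() else base / fallback
```

A post-checkout hook or an .envrc could call `branch_assets_dir(mount_root, current_branch())` and repoint the repo's assets symlink at the result.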
Backup discipline shifts from git to the cloud provider. Git's distributed model means every developer's machine is a backup of the code. The bucket isn't backed up by every developer's machine. Whoever owns the cloud account also owns the backup story — bucket versioning enabled, lifecycle policies preserving old versions, a disaster-recovery plan for the bucket itself. Most teams already have this for their other cloud infrastructure; it's the same playbook applied to a new bucket.
Access control is separate from the repo's. Git LFS inherits its access control from the git host: anyone with repo access can read LFS objects. The bucket has its own access control (IAM, sharing settings). The bucket-level controls are usually fine, but they're not the same as the repo controls, so a team member with repo read access doesn't automatically have bucket read access. Onboarding is one extra step.
Some content genuinely belongs in git. Source-controlled config files, schema migrations, fixture files small enough to read with cat. The mount pattern is for binaries the team can't usefully diff and won't usefully version-by-commit. Use git for the things git is good at; the mount handles the rest.
Where this pattern doesn't fit
Three cases where you should keep using LFS or something else:
Distributed-team requirement for offline reads. A team where engineers work in environments without reliable cloud connectivity (some research environments, some defense contexts) genuinely needs the binaries cloned locally. LFS or git-annex is the right answer there.
Small repos where LFS overhead is fine. A repo with 200 MB of LFS-tracked files and a team of three doesn't have the pain that motivates this post. The mount pattern adds a setup step; LFS doesn't. If the pain isn't there, don't add infrastructure to solve a problem you don't have.
Compliance environments that need git-tracked everything. Some regulated workflows require every byte of every artifact to be in version control. The mount pattern explicitly takes binaries out of git history, which is a non-starter for those teams. LFS or a sealed-archive workflow is the right shape there.
What to do next
If you're running Git LFS today and onboarding takes an hour, try the mount pattern on one large file. Pick a single asset (the largest LFS-tracked file in the repo), move it to a cloud bucket, mount the bucket, point the code at the mount path, and remove the LFS tracking for that file. Time the onboarding for the next new engineer. If it's faster, expand.
Download ExpanDrive — same install on Mac, Windows, and Linux. The major cloud backends are all supported: S3, Wasabi, Backblaze B2, Google Drive, OneDrive, Box, Dropbox, SFTP. S3 is the safest default; Wasabi is the cheapest at-scale; Drive is the right pick for teams whose assets are already living there.