Despite my last post about blogging on more normal topics, I came across another weird edge case today. It had me googling "how does git clone work", only to discover that most of the documentation on the topic is either wrong or incomplete. The gaps probably won't matter to most of you, but they did matter to me today.
Cloning a repository in Git consists of a few different operations:
- Initialise the repository.
- Add the remote server as a remote named "origin".
- Fetch all of the branch heads from the remote.
- Find the default branch and perform a checkout of that branch.
This is usually summarised as:
git init
git remote add origin $REMOTE_URL
git fetch --all
git checkout main
Git is smart enough to figure out what the remote's default branch is, so the checkout can pick main, master, or something else entirely, depending on how the remote is configured.
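For the curious, you can see the remote's default branch yourself. git ls-remote --symref <url> HEAD prints a line such as "ref: refs/heads/main<TAB>HEAD" naming the branch the remote's HEAD points to. Below is a minimal Python sketch that parses that output. The function name is hypothetical, and this is not how git clone is implemented internally: clone gets the same information from the remote while negotiating the fetch.

```python
from typing import Optional

def default_branch_from_ls_remote(output: str) -> Optional[str]:
    """Return the branch the remote's HEAD points to (hypothetical helper).

    Expects the text printed by `git ls-remote --symref <url> HEAD`,
    where the symref line looks like "ref: refs/heads/main\tHEAD".
    """
    prefix = "ref: refs/heads/"
    for line in output.splitlines():
        # Symref lines are "ref: <target>\t<name>"; we only want HEAD's target.
        if line.startswith(prefix) and line.endswith("\tHEAD"):
            target, _, _name = line.partition("\t")
            return target[len(prefix):]
    # No symref line: HEAD is detached or the remote did not advertise it.
    return None
```

Branch names containing slashes (for example release/1.0) come back intact, because only the refs/heads/ prefix is stripped.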
It turns out that the third step above is subtly wrong in every post I have ever read. Running git fetch (or git fetch --all) is not the same as what happens during a clone. Instead, git clone calls into git fetch-pack or git index-pack, and the refs it fetches are stored as packed refs. Packed refs live in .git/packed-refs rather than in .git/refs/remotes/origin/*.
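To make the difference concrete, .git/packed-refs is a single text file with these parts:

- an optional "# pack-refs with:" header
- one "<object id> <refname>" line per ref
- "^<object id>" lines giving the peeled commit for an annotated tag on the line above

A ref name is just a string on a line, so nothing in this format creates directories. Below is a minimal Python sketch of a reader for that format. parse_packed_refs is a hypothetical helper, and it assumes a well-formed file rather than handling every edge case Git does.

```python
from typing import Dict, Optional, Tuple

def parse_packed_refs(text: str) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each ref name to (object id, peeled object id or None)."""
    refs: Dict[str, Tuple[str, Optional[str]]] = {}
    last_ref: Optional[str] = None
    for line in text.splitlines():
        if not line or line.startswith("#"):
            # Skip blanks and the header, e.g. "# pack-refs with: peeled sorted".
            continue
        if line.startswith("^"):
            # Peeled line: the commit the annotated tag on the previous line points to.
            if last_ref is None:
                raise ValueError("peeled line without a preceding ref")
            oid, _ = refs[last_ref]
            refs[last_ref] = (oid, line[1:])
            continue
        # Ordinary line: "<object id> <refname>"; the name may contain slashes.
        oid, sep, name = line.partition(" ")
        if not sep:
            raise ValueError(f"malformed packed-refs line: {line!r}")
        refs[name] = (oid, None)
        last_ref = name
    return refs
```

Because every ref is a key in one dictionary (one file on disk), names like refs/remotes/origin/FOO/abc and refs/remotes/origin/foo can coexist without any filesystem involvement.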
Now you might be wondering: what practical difference does this make?
Usually, nothing. But consider the following scenario:
- The $REMOTE_URL host is a Linux machine with a case-sensitive filesystem, or is backed by a database that does not care about filesystem semantics (GitHub, Azure DevOps, etc.).
- There is a branch on the remote named FOO/abc.
- There is also a branch on the remote named foo.
- The machine performing the clone/fetch operation has a case-insensitive filesystem.
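Before cloning on a case-insensitive machine, you could check a remote's branch names for this kind of clash. The sketch below is a hypothetical helper, not a Git API. It flags pairs of ref names that cannot both exist as loose ref files once case is ignored: either they fold to the same path, or one folds to a directory the other needs as a file. It uses Python's str.casefold as an approximation. APFS and NTFS apply their own case-folding rules, so treat the result as a heuristic.

```python
from itertools import combinations
from typing import List, Tuple

def loose_ref_conflicts(refnames: List[str]) -> List[Tuple[str, str]]:
    """Return pairs of refs that would collide as loose ref files on a
    case-insensitive filesystem (approximated with str.casefold)."""
    conflicts: List[Tuple[str, str]] = []
    # Compare every pair; quadratic, which is fine for a sketch.
    for a, b in combinations(refnames, 2):
        fa, fb = a.casefold(), b.casefold()
        # Same path, or one ref's path is a directory prefix of the other's.
        if fa == fb or fa.startswith(fb + "/") or fb.startswith(fa + "/"):
            conflicts.append((a, b))
    return conflicts
```

For example, feeding it the branch names from git ls-remote --heads for the scenario above flags FOO/abc and foo, while names like feature/a and feature/b pass.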
Oh no. Oh dear.
In this case, git clone will succeed, and .git/packed-refs will look something like this:
aaaf6de65826aa995773f7034b0766c20edbf062 FOO/abc
dc55f0a5ac13219830235c0c1ffbad0415fc9f5e foo
while .git/refs/remotes will look like this, with no loose branch refs at all:
$ find .git/refs/remotes
.git/refs/remotes
.git/refs/remotes/origin
.git/refs/remotes/origin/HEAD
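On the other hand, if we were to run git fetch, it would fail. Fetch writes loose refs, and on a case-insensitive filesystem the directory .git/refs/remotes/origin/FOO, created for FOO/abc, is the same path as the file needed for foo: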
* [new branch] FOO/abc -> origin/FOO/abc
error: cannot lock ref 'refs/remotes/origin/foo': there is a non-empty directory '.git/refs/remotes/origin/foo' blocking reference 'refs/remotes/origin/foo'
! [new branch] foo -> origin/foo (unable to update local ref)
* [new branch] master -> origin/master
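Whether a given ref fails depends on the order in which loose ref files are written. Below is a minimal Python simulation that models a case-insensitive filesystem with case-folded paths. simulate_loose_ref_writes is hypothetical and not part of Git. Writing FOO/abc first creates a directory that folds to foo, so the later file write for foo is blocked, matching the output above. For simplicity, the sketch also counts two names that fold to the same file as a failure, which may not match what Git reports in that case.

```python
from typing import Dict, List, Set

def simulate_loose_ref_writes(refnames: List[str]) -> Dict[str, bool]:
    """Simulate writing refs as files under refs/remotes/origin/ on a
    case-insensitive filesystem; return whether each write succeeded."""
    files: Set[str] = set()  # case-folded paths that are ref files
    dirs: Set[str] = set()   # case-folded paths that are directories
    results: Dict[str, bool] = {}
    for name in refnames:
        path = name.casefold()
        parts = path.split("/")
        # Every leading component ("foo" for "foo/abc") must be a directory.
        parents = ["/".join(parts[:i]) for i in range(1, len(parts))]
        ok = (
            path not in dirs            # a directory already occupies this path
            and path not in files       # same name in a different case
            and not any(p in files for p in parents)  # a file blocks a needed directory
        )
        if ok:
            dirs.update(parents)
            files.add(path)
        results[name] = ok
    return results
```

Run in the order the refs appear in the fetch output above (FOO/abc, foo, master), only foo fails. Reverse the first two and FOO/abc fails instead.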
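Linux users will, by default, never see this, because common Linux filesystems are case-sensitive. macOS and Windows clients, whose default filesystems (APFS and NTFS) are case-insensitive, will suffer.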
And that is the difference I discovered today between fetching with git clone and fetching with git fetch, after way too much debugging and hair-pulling.