How Git Clone Actually Works

Despite my last post about blogging about more normal topics, I came across another weird edge case today that had me googling "how does git clone work" and discovering that most of the documentation on the topic is either wrong or incomplete in ways that probably won't matter to most of you, but did matter for me today.

Cloning a repository in Git consists of a few different operations:

1. Initialise the repository.
2. Add the remote server as a remote named "origin".
3. Fetch all of the branch heads from the remote.
4. Find the default branch and perform a checkout of that branch.

This is usually summarised as:

git init
git remote add origin $REMOTE_URL
git fetch --all
git checkout main

Git is smart enough to figure out what the remote's default branch is, so the checkout can pick main or master or something else entirely, depending on your configuration.
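For the curious, here is a rough sketch of that manual sequence with the default branch looked up rather than hard-coded. It is only an approximation of what a clone does, not a description of git's internals. The scratch directory, the throwaway local "remote", and the branch name trunk are all made up for the demo; the lookup uses git ls-remote --symref, which asks the remote what its HEAD points at.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)   # hypothetical scratch directory for the demo

# Throwaway "remote" whose default branch is named trunk (an assumption
# chosen so that nothing here depends on main or master).
git init --quiet "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/trunk
git -C "$work/remote" -c user.name=demo -c user.email=demo@example.invalid \
    commit --quiet --allow-empty -m "initial"

# The four steps, done by hand.
git init --quiet "$work/manual"
cd "$work/manual"
git remote add origin "$work/remote"
git fetch --quiet origin

# Ask the remote which branch its HEAD points at.
# The relevant line looks like: "ref: refs/heads/trunk<TAB>HEAD"
default_branch=$(git ls-remote --symref origin HEAD |
    awk '/^ref:/ { sub("refs/heads/", "", $2); print $2 }')

# With no local branch of that name yet, checkout creates one that
# tracks origin/<name>.
git checkout --quiet "$default_branch"
echo "default branch: $default_branch"
```

Pointing REMOTE_URL at a real server instead of the throwaway directory should behave the same way, as long as the server advertises where its HEAD points.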

It turns out that step (3) above is subtly wrong in all of the posts I have ever read. git fetch or git fetch --all is not the same as what happens in a clone. Instead, git clone calls into git fetch-pack or git index-pack, which fetches packed refs.

Packed refs live in .git/packed-refs rather than .git/refs/remotes/origin/*.
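If you want to see where the refs end up on your own machine, something like the following should do it. It is a sketch under a few assumptions: the scratch directory, the throwaway local "remote" and the branch name trunk are invented for the demo, and it assumes the classic files-based ref storage (a repository using a different ref backend will not have these files at all). It clones one copy, builds another with init and fetch, and lists both the loose ref files and the packed-refs entries for each.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)   # hypothetical scratch directory for the demo

# Throwaway "remote" with a single commit on a branch named trunk.
git init --quiet "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/trunk
git -C "$work/remote" -c user.name=demo -c user.email=demo@example.invalid \
    commit --quiet --allow-empty -m "initial"

# Path A: git clone.
git clone --quiet "$work/remote" "$work/cloned"

# Path B: init + remote add + fetch.
git init --quiet "$work/fetched"
git -C "$work/fetched" remote add origin "$work/remote"
git -C "$work/fetched" fetch --quiet origin

for repo in cloned fetched; do
    echo "== $repo =="
    # Loose refs are individual files under .git/refs/remotes.
    find "$work/$repo/.git/refs/remotes" -type f 2>/dev/null || true
    # Packed refs are lines in one file, if that file exists.
    grep 'refs/remotes/' "$work/$repo/.git/packed-refs" 2>/dev/null || true
done

# However the ref is stored, it resolves to the same commit.
cloned_tip=$(git -C "$work/cloned" rev-parse refs/remotes/origin/trunk)
fetched_tip=$(git -C "$work/fetched" rev-parse refs/remotes/origin/trunk)
echo "cloned:  $cloned_tip"
echo "fetched: $fetched_tip"
```

The storage location is invisible to commands like git rev-parse, which is why the difference goes unnoticed until the filesystem gets involved.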

Now you might be wondering, מאי נפקא מינה? What is the practical difference here?

Usually, nothing. But consider the following scenario:

The $REMOTE_URL host is a Linux machine with a case-sensitive filesystem, or is backed by a database that does not care about filesystem semantics (GitHub, Azure DevOps, etc.).
There is a branch on the remote named FOO/abc.
There is also a branch on the remote named foo.
The machine performing the clone/fetch operation has a case-insensitive filesystem.

Oh no. Oh dear.

In this case, git clone will succeed, and .git/packed-refs will look something like this:

aaaf6de65826aa995773f7034b0766c20edbf062 FOO/abc
dc55f0a5ac13219830235c0c1ffbad0415fc9f5e foo

and .git/refs/remotes will look like this:

$ find .git/refs/remotes
.git/refs/remotes
.git/refs/remotes/origin
.git/refs/remotes/origin/HEAD

On the other hand, if we were to run git fetch against the same remote, it would fail:

 * [new branch]      FOO/abc    -> origin/FOO/abc
error: cannot lock ref 'refs/remotes/origin/foo': there is a non-empty directory '.git/refs/remotes/origin/foo' blocking reference 'refs/remotes/origin/foo'
 ! [new branch]      foo        -> origin/foo  (unable to update local ref)
 * [new branch]      master     -> origin/master

Linux users will by default never see this, but macOS and Windows clients will suffer.
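One way to spot the problem before it bites is to scan the branch names for pairs that only coexist thanks to case sensitivity. The helper below is a hypothetical sketch of my own, not something git ships: find_case_conflicts reads one branch name per line and reports pairs where the names are identical apart from case, or where one name, ignoring case, is a directory prefix of the other.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: reads branch names on stdin, one per line, and
# prints pairs that would collide on a case-insensitive filesystem.
find_case_conflicts() {
    awk '
        { names[NR] = $0; lower[NR] = tolower($0) }
        END {
            for (i = 1; i <= NR; i++)
                for (j = 1; j <= NR; j++) {
                    if (i == j) continue
                    if (lower[i] == lower[j] && names[i] != names[j] && i < j)
                        # Same name apart from case; report each pair once.
                        print names[i] " <-> " names[j]
                    else if (index(lower[j], lower[i] "/") == 1 &&
                             index(names[j], names[i] "/") != 1)
                        # names[i] would have to be both a file and a directory.
                        print names[i] " <-> " names[j]
                }
        }'
}

# The branch names from the scenario above.
printf 'FOO/abc\nfoo\nmaster\n' | find_case_conflicts
```

Against a real remote, the input could come from something like git ls-remote --heads origin | cut -f2 | sed 's|^refs/heads/||', piped into the same function.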

And that is the difference I discovered today between fetching with git clone and fetching with git fetch, after way too much debugging and hair-pulling.
