Repairing Git commit graphs

Recently, a GitLab instance I maintain alerted me to a repository that didn't pass git fsck. While investigating, I saw that Git refused to operate on this repository almost entirely. Even git log immediately failed with "fatal: commit-graph requires overflow generation data but has none". So something in the commit graph is broken. What is the commit graph? Its documentation describes it as follows:

The commit-graph file is a supplemental data structure that accelerates commit graph walks. If a user downgrades or disables the core.commitGraph config setting, then the existing ODB is sufficient.
...
The commit-graph file stores the commit graph structure along with some extra metadata to speed up graph walks.

So it is essentially a persisted cache that speeds up certain operations. There is even a command to write a new commit graph, git commit-graph write. In theory this should write a new, valid commit graph. Sadly, it fails with the exact same error about the existing commit graph being broken. It appears that the broken commit graph has to be removed before a new one can be generated, because the broken one interferes with the generation of the new one.

Some investigation into how Git stores its commit graphs shows that there are two ways they can be stored in a repository (paths are relative to the .git directory, or to the repository itself if it is bare). There can be either a single commit graph stored at objects/info/commit-graph, or a newline-separated list of commit graph hashes at objects/info/commit-graphs/commit-graph-chain. Once both of those files are removed (most likely only one of them will exist), Git no longer has a commit graph cache and should work normally again. Optionally, you can now write a new commit graph with git commit-graph write; GitLab, for example, creates a new commit graph automatically when it runs housekeeping for a repository.
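The removal step can be sketched in a few lines of Python. This is only an illustration of the two paths named above, not a tool that Git or GitLab ships; the function name remove_commit_graphs is made up for this example, and it assumes you pass it the .git directory (or the bare repository directory).

```python
from pathlib import Path


def remove_commit_graphs(git_dir):
    """Delete the two commit-graph cache files named in the text, if present.

    git_dir is the repository's .git directory, or the repository itself
    if it is bare. Returns the list of paths that were actually removed.
    (Hypothetical helper for illustration, not part of Git.)
    """
    info = Path(git_dir) / "objects" / "info"
    candidates = [
        info / "commit-graph",                          # single-file commit graph
        info / "commit-graphs" / "commit-graph-chain",  # list of split graph hashes
    ]
    removed = []
    for path in candidates:
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed
```

Calling remove_commit_graphs("/path/to/repo/.git") returns whichever of the two files existed. Any other files inside objects/info/commit-graphs are left untouched by this sketch. Make a backup of the files first if you want to inspect the broken graph later.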

Annex: Investigating the broken commit graph

The affected repository was stored on ZFS, so filesystem corruption is very unlikely. Looking at where the error is triggered in the Git source code, it appears that Git is hitting a commit date offset with the CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW flag set, which according to the commit message that introduced it means that the commit date offset exceeds 2³¹ seconds (~68 years). Such oversized date offsets are stored in a special block (GDOV), which is not present in the broken commit graph. Git therefore cannot recover the original commit date offset, since it would be stored in the missing GDOV block, and that's why it aborts.
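To make the flag check concrete, here is a minimal Python sketch of how a 32-bit offset entry splits into the overflow flag and the remaining 31 bits. It assumes the flag is the highest bit (1 << 31), which matches my reading of the Git source, and that with the flag set the low bits are an index into the GDOV block rather than an offset. The names OVERFLOW_FLAG and decode_offset_entry are invented for this example.

```python
# Assumed value of CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW: the highest bit
# of a 32-bit entry.
OVERFLOW_FLAG = 1 << 31


def decode_offset_entry(raw):
    """Split a 32-bit offset entry into (needs_gdov_lookup, low_31_bits).

    If the flag is clear, low_31_bits is the offset in seconds. If it is
    set, low_31_bits is assumed to be an index into the GDOV block.
    """
    if not 0 <= raw <= 0xFFFFFFFF:
        raise ValueError("expected an unsigned 32-bit value")
    needs_gdov_lookup = bool(raw & OVERFLOW_FLAG)
    low_31_bits = raw & (OVERFLOW_FLAG - 1)
    return needs_gdov_lookup, low_31_bits
```

An entry with the flag set but no GDOV block to look up is exactly the situation in which Git gives up with the error above.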

Considering that this is a fairly normal repository, it seems unlikely that such a date offset would be present, and a quick check of all commits confirms that no such offset exists. The newly written commit graph also has no GDOV block and works just fine. So what happened? A binary diff of the broken and the new commit graph shows few differences, except for one section which is empty (all zero bytes) in the new commit graph but in the broken one counts up from 0x80000000 (2³¹, i.e. only the highest bit of a 32-bit value set) to 0x8000009D, with a few 0x00000000 entries sprinkled in between. Sadly the binary format is relatively compact, and because the file is broken we cannot easily get Git to decode it, so I do not know exactly what that section was supposed to be. But it seems clear that Git was interpreting the highest bit (bit 31) of these values as the offset overflow flag.
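One way to run such a check over all commits is sketched below. It is not necessarily the check used here. It assumes the definition from Git's commit-graph documentation: a commit's corrected commit date is the larger of its own commit date and one more than the largest corrected commit date among its parents, and the offset is the corrected date minus the commit date. The input is assumed to be the output of git log --all --format='%H %ct %P', one commit per line; the function name is made up.

```python
def corrected_date_offsets(log_lines):
    """Map commit id -> (corrected commit date - commit date) in seconds.

    Each input line is "<hash> <committer timestamp> [<parent hash> ...]".
    Parents that do not appear in the input are ignored.
    """
    dates, parents = {}, {}
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue
        commit, timestamp, *parent_ids = fields
        dates[commit] = int(timestamp)
        parents[commit] = parent_ids

    corrected = {}
    for start in dates:
        # Iterative depth-first walk so deep histories do not hit the
        # recursion limit; a commit is resolved once all parents are.
        stack = [start]
        while stack:
            commit = stack[-1]
            if commit in corrected:
                stack.pop()
                continue
            known = [p for p in parents[commit] if p in dates]
            pending = [p for p in known if p not in corrected]
            if pending:
                stack.extend(pending)
                continue
            floor = max((corrected[p] + 1 for p in known), default=0)
            corrected[commit] = max(dates[commit], floor)
            stack.pop()
    return {commit: corrected[commit] - dates[commit] for commit in dates}
```

If max(corrected_date_offsets(lines).values()) stays below 2**31, no commit in the repository should need the overflow block.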

I could investigate further and parse the file fully, but I think I'll end it here. It seems likely that this was either caused by a freak accident (bit flip, ...) or a bug in an older version of Git when writing the commit graph.

If anyone else wants to parse the files, I've attached both the newly written and the broken graph files.
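As a starting point for parsing them, the following Python sketch lists the blocks a commit-graph file contains. It follows my understanding of the documented file layout: an 8-byte header (the signature CGPH, a version byte, a hash version byte, the number of chunks and the number of base graphs), followed by a lookup table of 12-byte entries, each a 4-byte ID and an 8-byte offset, closed by an all-zero ID whose offset marks the end of the last block. Treat the layout as an assumption to verify against the format documentation; list_chunks is a name I made up.

```python
import struct


def list_chunks(data):
    """Return [(chunk_id, start, end), ...] for the bytes of a commit-graph file."""
    signature, _version, _hash_version, num_chunks, _num_base = struct.unpack_from(
        ">4sBBBB", data, 0
    )
    if signature != b"CGPH":
        raise ValueError("not a commit-graph file")
    # num_chunks table entries plus one terminating entry, starting right
    # after the 8-byte header. All integers are big-endian.
    entries = [
        struct.unpack_from(">4sQ", data, 8 + 12 * i) for i in range(num_chunks + 1)
    ]
    chunks = []
    # Each block ends where the next table entry says the next one starts.
    for (chunk_id, start), (_next_id, end) in zip(entries, entries[1:]):
        chunks.append((chunk_id.decode("ascii"), start, end))
    return chunks
```

With the attached files read via open(path, "rb").read(), checking any(name == "GDOV" for name, _, _ in list_chunks(data)) should show whether the overflow block is present, and the start and end offsets tell you which block the differing section of the binary diff falls into.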

Update: Root cause found

Shortly after this article was written, Will Chandler figured out the actual cause of this issue on the Git mailing list. See https://public-inbox.org/git/DD88D523-0ECA-4474-9AA5-1D4A431E532A@wfchandler.org/ for his write-up. It turned out to be a bug in Git: an upgrade of the commit graph data from v1 to v2 caused an underflow, which set the overflow flag and made the commit graph unreadable.