Package managers keep using git as a database, it never works out

lemmydividebyzero@reddthat.com · 4 hours ago

Package managers keep using git as a database, it never works out

tal@lemmy.today · edit-2 3 hours ago

GitHub explicitly asked Homebrew to stop using shallow clones. Updating them was “an extremely expensive operation” due to the tree layout and traffic of homebrew-core and homebrew-cask.

I’m not going through the PR to understand what’s breaking, since it’s not immediately apparent from a quick skim. But three possible problems based on what people are mentioning there.

The problem is the cost of the shallow clone

Assuming that the workload here is always --depth=1 and they aren’t doing commits at a high rate relative to clones, and that’s an expensive operation for git, I feel like for GitHub, a better solution would be some patch to git that allows it to cache a shallow clone for depth=1 for a given hashref.

The problem is the cost of unshallowing the shallow clone

If the actual problem isn’t the shallow clone, that a regular clone would be fine, but that unshallowing is a problem, then a patch to git that allows more-efficient unshallowing should be a better solution. I mean, I’d think that unshallowing should only need a time-ordered index of commits referenced blobs up to a given point. That shouldn’t be that expensive for git to maintain an index of, if it doesn’t already have it.

The problem is that Homebrew has users repeatedly unshallowing a clone off GitHub and then blowing it away and repeating

If the problem is that people keep repeatedly doing a clone off GitHub — that is, a regular, non-shallow clone would also be problematic — I’d think that a better solution would be to have Homebrew do a local bare clone as a cache, and then just do a pull on that cache and then use it as a reference to create the new clone. If Homebrew uses the fresh clone as read-only and the cache can be relied upon to remain, then they could use --reference alone. If not, then add --dissociate. I’d think that that’d lead to better performance anyway.