Package managers keep using git as a database, it never works out

lemmydividebyzero@reddthat.com · 4 hours ago


cadekat@pawb.social · edit-2 3 hours ago

They aren’t using git as a database, they’re using it as revision history. The database is whatever they decide to store in git. For crates.io, for example, they use JSON files in directories.

If you put an sqlite database in git, you wouldn’t say “git is the database”, and that’s true here too.

That said, yeah, you shouldn’t roll your own database. Take your source code (JSON from crates.io) from git, and compile it into an sqlite file (for example) for download.
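To make the "compile it into an sqlite file" idea concrete, here is a minimal sketch in Python. It assumes a crates.io-style checkout where every file other than `config.json` holds one JSON object per line with at least `name` and `vers` fields. The function name `build_index_db` and the table layout are made up for illustration, not anything crates.io ships.

```python
import json
import sqlite3
from pathlib import Path


def build_index_db(index_root: Path, db_path: str) -> int:
    """Compile line-delimited JSON index files into one SQLite file.

    Returns the number of index entries processed.
    Hypothetical helper: the schema below is an assumption for illustration.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS versions ("
            " name TEXT NOT NULL,"
            " vers TEXT NOT NULL,"
            " yanked INTEGER NOT NULL,"
            " raw TEXT NOT NULL,"
            " PRIMARY KEY (name, vers))"
        )
        processed = 0
        for path in sorted(index_root.rglob("*")):
            rel_parts = path.relative_to(index_root).parts
            # Skip directories, git metadata, and the registry config file.
            if not path.is_file() or ".git" in rel_parts:
                continue
            if path.name == "config.json":
                continue
            for line in path.read_text(encoding="utf-8").splitlines():
                if not line.strip():
                    continue
                entry = json.loads(line)
                conn.execute(
                    "INSERT OR REPLACE INTO versions VALUES (?, ?, ?, ?)",
                    (
                        entry["name"],
                        entry["vers"],
                        int(entry.get("yanked", False)),
                        line,  # keep the original JSON for other fields
                    ),
                )
                processed += 1
        conn.commit()
        return processed
    finally:
        conn.close()
```

Clients would then download the one generated file and query it, while git stays the place where the JSON source is reviewed and versioned.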

tal@lemmy.today · edit-2 3 hours ago

GitHub explicitly asked Homebrew to stop using shallow clones. Updating them was “an extremely expensive operation” due to the tree layout and traffic of homebrew-core and homebrew-cask.

I’m not going through the PR to understand what’s breaking, since it’s not immediately apparent from a quick skim. But here are three possible problems, based on what people are mentioning there.

The problem is the cost of the shallow clone

Assuming that the workload here is always --depth=1, that commits happen at a low rate relative to clones, and that a shallow clone is an expensive operation for git, I feel like a better solution for GitHub would be some patch to git that lets it cache the depth=1 shallow clone for a given commit hash.
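The caching idea can be sketched as plain memoization keyed by the tip commit. This is only an illustration of the concept, not how git or GitHub actually serve clones: `ShallowPackCache` and the `build_pack` callback are hypothetical names, and the "pack" here is just opaque bytes.

```python
from typing import Callable, Dict


class ShallowPackCache:
    """Memoize an expensive depth=1 pack build, keyed by tip commit hash.

    Hypothetical sketch: build_pack stands in for whatever the server
    does today to assemble the objects for a shallow clone.
    """

    def __init__(self, build_pack: Callable[[str], bytes]) -> None:
        self._build_pack = build_pack
        self._packs: Dict[str, bytes] = {}
        self.builds = 0  # how many times the expensive path actually ran

    def get(self, tip_commit: str) -> bytes:
        pack = self._packs.get(tip_commit)
        if pack is None:
            # First request for this commit: pay the cost once.
            pack = self._build_pack(tip_commit)
            self._packs[tip_commit] = pack
            self.builds += 1
        return pack
```

If clones far outnumber commits, nearly every request hits the cache, and the expensive build runs roughly once per new tip commit.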

The problem is the cost of unshallowing the shallow clone

If the actual problem isn’t the shallow clone itself (that is, a regular clone would be fine) but the unshallowing, then a patch to git that allows more-efficient unshallowing should be a better solution. I mean, I’d think that unshallowing should only need a time-ordered index of the blobs referenced by commits up to a given point. That shouldn’t be that expensive an index for git to maintain, if it doesn’t already have it.

The problem is that Homebrew has users repeatedly unshallowing a clone off GitHub and then blowing it away and repeating

If the problem is that people keep repeatedly doing a clone off GitHub (that is, a regular, non-shallow clone would also be problematic), I’d think that a better solution would be to have Homebrew keep a local bare clone as a cache, then just do a fetch into that cache (a bare repository has no working tree to pull into) and use it as a reference to create the new clone. If Homebrew uses the fresh clone as read-only and the cache can be relied upon to remain, then they could use --reference alone. If not, then add --dissociate. I’d think that that’d lead to better performance anyway.
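Roughly, the cache-plus-reference flow could look like the sketch below. The git options used (`clone --bare`, `fetch --prune`, `clone --reference`, `--dissociate`) are real. The helper names, paths, and the explicit refspec are my own assumptions, not anything Homebrew does today. The cache is updated with `fetch` because a bare repository has no working tree.

```python
import subprocess
from typing import List


def cache_init_cmd(url: str, cache_dir: str) -> List[str]:
    # One-time setup: a bare clone that serves as the local object cache.
    return ["git", "clone", "--bare", url, cache_dir]


def cache_update_cmd(cache_dir: str) -> List[str]:
    # Refresh the cache. The explicit refspec updates the cache's own
    # branches, since a plain bare clone has no default fetch refspec.
    return [
        "git", "-C", cache_dir, "fetch", "--prune",
        "origin", "+refs/heads/*:refs/heads/*",
    ]


def clone_from_cache_cmd(
    url: str, cache_dir: str, dest: str, dissociate: bool
) -> List[str]:
    # Borrow objects from the cache instead of downloading them again.
    cmd = ["git", "clone", "--reference", cache_dir]
    if dissociate:
        # Copy borrowed objects so the clone survives the cache going away.
        cmd.append("--dissociate")
    cmd += [url, dest]
    return cmd


def run(cmd: List[str]) -> None:
    # Raises CalledProcessError if git exits with a non-zero status.
    subprocess.run(cmd, check=True)
```

With `dissociate=False` the new clone keeps pointing at the cache's objects, so it is only safe if the cache sticks around. With `dissociate=True` the clone is self-contained at the cost of a local copy.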