
High Performance Git

151 points - today at 12:32 AM

Source
  • nananana9

    today at 7:46 AM

    Git is industry standard because, for what it gives you, it's a remarkably robust and simple program to use. We're all vaguely aware that the internals are complex, but the UX is clean and usable enough that the complexity usually doesn't leak out.

    But the day this breaks down and I have to deal with bloom filters, packfiles, maintaining the git garbage collector or rerere cleanup, is the day I switch our codebase to a centralized VCS.

    This stuff is cool to learn about; but it's 5 layers removed from anything I want to be thinking about in my day to day work.

      • codesnik

        today at 9:17 AM

        I think it is the other way around. Git is pretty simple internally, and its UI is just knobs and levers to reach into that simple, reliable internal structure. This is why for some people it seems like a mess: they want a "do what I want" button (and all people and their needs are different). For other people it's clean: open the throttle and the engine will rev.

        • thfuran

          today at 8:17 AM

          I'm pretty sure git is industry standard almost entirely because GitHub exists. And I very much disagree that the UX is clean. The CLI is more than a bit of a mess.

            • stingraycharles

              today at 8:24 AM

              > I'm pretty sure git is industry standard almost entirely because GitHub exists.

              Nah, I remember that time vividly. GitHub became a thing about a year or two after git was already very much taking the lead.

              GitHub became GitHub because git was the winner. There were alternative hubs that supported bazaar and mercurial and whatnot, but git won because for most people, Linus and the kernel team being behind it was reason enough to trust it.

              (and I say this as someone who liked hg more than git)

              • BerislavLopac

                today at 8:21 AM

                Anyone who has ever used Mercurial knows very well what a good versioning tool UX looks like...

                  • windward

                    today at 9:38 AM

                    No. When I left a job using Mercurial, I made a vow never to start a job that used it again. And that employer was seeking to move on from it.

                    • miroljub

                      today at 8:28 AM

                      > Anyone who has ever used Mercurial knows very well what a good versioning tool UX looks like...

                      So true. I used Mercurial back in the day and also used Darcs before it, and it helped me realize that the best versioning tool UX that exists is still the one Git provides.

                      PS: I also used CVS, SVN, Perforce, and ClearCase professionally, and gave Fossil a try. None of them came even close to Git usability-wise.

          • alxgsv

            today at 8:40 AM

            I never faced git performance issues when working with code. Guess my repos weren't big. But when I tried to use git as a versioned database of changes in my pet project, I learned a lot about indexes, compacting, etc. The article covers a lot and is very helpful!

            • ergl

              today at 9:09 AM

              Surprise, surprise, another piece of LLM-generated slop on the front page of HN.

              From chapter 1:

              > When Git slows down, engineers adapt in bad ways. They stop asking questions the history could answer. They batch work to avoid sync cost. They keep messy branches alive longer, postpone cleanup, and treat the repository like something slightly dangerous.

              From https://gitperf.com/epilogue.html

              > Once machines start producing code at machine cadence, the model from this book does not break. What changes is the pace: more branches, more commits, more automation, and more surrounding metadata. The traffic gets louder, and the features that keep Git legible under pressure move from "nice to have" to "essential."

              > These stop looking like side optimizations. They are what keep machine-scale Git traffic usable.

              • hmpc

                today at 9:03 AM

                Similarly, if not performance-focused, I can wholeheartedly recommend Building Git[0], which walks you through building your own git clone in Ruby (although the language is immaterial).

                [0]: https://shop.jcoglan.com/building-git/

                • anitil

                  today at 5:54 AM

                  I'm only on to chapter two and already it's explained some plumbing details that I somehow have missed all these years. This is great

                  • normie3000

                    today at 4:11 AM

                    > LFS adds its own operational overhead.

                    Seemingly seconds on every remote-touching command, even on a very small repo.

                      • Hendrikto

                        today at 7:42 AM

                        What is worse is that for about half a year or so, I now have to authenticate my ed25519-sk key with my Yubikey thrice (!) when using LFS. On every push.

                        • fragmede

                          today at 9:06 AM

                          That they didn't go with git-annex was such an NIH-driven mistake.

                            • Hendrikto

                              today at 10:04 AM

                              Both have their advantages and disadvantages. git-annex is not strictly better, LFS just chose different tradeoffs.

                      • aa-jv

                        today at 8:04 AM

                        I've always wanted to see a book that describes git for the common man and gives them tons of examples for how to use it to do productive things.

                        Even for a small office, git can be immensely useful. Entire production line workflows can be implemented with git .. if only folks would learn to use it productively.

                        It's not just for development. Writers can use it productively. Accountants too.

                        It always kind of irks me that Git hasn't just been folded into the OS front-end UI by any of the OS vendors .. it'd be so revolutionary to give common folks an easy way to manage the timeline/history of their computer use using git.

                          • awesan

                            today at 8:56 AM

                            The obvious reason is that most file formats used by writers, accountants, etc. are binary files, which do not benefit much from git.

                              • fragmede

                                today at 9:08 AM

                                Microsoft Office files are zipped XML these days, there's a standard and everything.

                        • snthpy

                          today at 2:38 AM

                          I've been wanting to ask this:

                          Why isn't

                              git clone --depth 1 ... 
                          
                          the default?

                          I would guess that for at least 90% of the repos I clone, I just want to install something. Even for the rest, I might hack on the code but seldom look into the history. If I do, then I could do a `git fetch --unshallow` at that point and save the bandwidth and disk space the rest of the time.
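
                          For what it's worth, a shallow clone can be deepened later. A plain `git fetch` keeps it shallow; `git fetch --unshallow` is what pulls in the rest of the history. A minimal sketch, assuming a throwaway local repository stands in for the real remote (the paths and the `demo` identity are made up for illustration):

                          ```shell
                          set -eu

                          # Hypothetical scratch area; nothing here touches a real remote.
                          work=$(mktemp -d)
                          export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
                          export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

                          # Build a small "upstream" repository with three commits.
                          git init -q "$work/upstream"
                          for i in 1 2 3; do
                              echo "revision $i" > "$work/upstream/file.txt"
                              git -C "$work/upstream" add file.txt
                              git -C "$work/upstream" commit -q -m "commit $i"
                          done

                          # Shallow clone. The file:// URL matters: --depth is ignored for plain local paths.
                          git clone -q --depth 1 "file://$work/upstream" "$work/shallow"
                          shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)

                          # Later, when the history is wanted after all, fetch the rest of it.
                          git -C "$work/shallow" fetch -q --unshallow
                          full_count=$(git -C "$work/shallow" rev-list --count HEAD)

                          echo "commits before: $shallow_count, after: $full_count"
                          ```

                          Against a real project the only lines that matter are the `clone --depth 1` and the later `fetch --unshallow`; the rest just builds something to clone.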

                            • dwattttt

                              today at 2:56 AM

                              A question: why is git involved at all in this? You don't want a repository.

                                • skydhash

                                  today at 3:13 AM

                                  This! The default was to have a link to download a tarball of the source. And if the user wanted to contribute (or check the devel version), you would add a link to the vcs.

                                    • kingstnap

                                      today at 4:07 AM

                                      Grabbing git repos instead of just tarballs is useful.

                                      A) You can update them, because you can git pull to fetch changes.

                                      B) If you want to apply patches on top, it's better to have version control so you can keep track of what you changed, which is especially useful if you want to rebase.
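
                                      Point B can be sketched roughly like this, assuming a scratch upstream and a clone created on the spot (all paths, file names, and commit messages are hypothetical). A local patch is committed, upstream moves on, and `git pull --rebase` replays the patch on top of the new upstream commit:

                                      ```shell
                                      set -eu

                                      # Hypothetical scratch area and identity for the demo commits.
                                      work=$(mktemp -d)
                                      export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
                                      export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

                                      # Stand-in for the project you installed from.
                                      git init -q "$work/upstream"
                                      echo "v1" > "$work/upstream/app.txt"
                                      git -C "$work/upstream" add app.txt
                                      git -C "$work/upstream" commit -q -m "upstream v1"

                                      git clone -q "$work/upstream" "$work/mine"

                                      # Your own patch, kept as a commit on top of upstream.
                                      echo "my tweak" > "$work/mine/local.txt"
                                      git -C "$work/mine" add local.txt
                                      git -C "$work/mine" commit -q -m "local patch"

                                      # Upstream publishes a new version.
                                      echo "v2" > "$work/upstream/app.txt"
                                      git -C "$work/upstream" commit -q -am "upstream v2"

                                      # Fetch the update and replay the local patch on top of it.
                                      git -C "$work/mine" pull -q --rebase

                                      top=$(git -C "$work/mine" log -1 --format=%s)
                                      below=$(git -C "$work/mine" log -1 --skip=1 --format=%s)
                                      echo "history, newest first: $top / $below"
                                      ```

                                      With a tarball you would have to re-apply the tweak by hand after every download; here the patch stays a visible commit at the tip of the history.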

                                      • eddythompson80

                                        today at 4:01 AM

                                        I think gitignore solves a problem that is hard to solve with the traditional tarball approach.

                                        Downloading a tarball, running ./configure or make, editing a config file here or there, and then running `make install` is the most common flow. Nowadays I find myself frequently editing the Dockerfile to make it to my liking. With a git repo, the owners of the repo have excluded all the local files, build caches, etc., and you can keep pulling to get updates, stashing and reapplying your local changes. With tarballs, you have to figure it out all over again: you lose your build cache (language dependent, maybe), lose a change you made here or there, etc.
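
                                        A rough sketch of that flow, assuming a scratch upstream that ignores a `build/` directory (the paths, file names, and the Dockerfile contents are all made up for illustration). The ignored build cache is left alone, and the uncommitted Dockerfile edit is stashed, carried across the update, and reapplied:

                                        ```shell
                                        set -eu

                                        # Hypothetical scratch area and identity for the demo commits.
                                        work=$(mktemp -d)
                                        export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
                                        export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

                                        # Upstream ships a .gitignore that excludes build output.
                                        git init -q "$work/upstream"
                                        printf 'build/\n' > "$work/upstream/.gitignore"
                                        echo "FROM alpine" > "$work/upstream/Dockerfile"
                                        echo "v1" > "$work/upstream/app.txt"
                                        git -C "$work/upstream" add .
                                        git -C "$work/upstream" commit -q -m "v1"

                                        git clone -q "$work/upstream" "$work/mine"

                                        # Local state: an ignored build cache plus an uncommitted Dockerfile tweak.
                                        mkdir "$work/mine/build"
                                        echo "cached" > "$work/mine/build/cache.o"
                                        echo "RUN echo tweaked" >> "$work/mine/Dockerfile"

                                        # Upstream releases an update that touches a different file.
                                        echo "v2" > "$work/upstream/app.txt"
                                        git -C "$work/upstream" commit -q -am "v2"

                                        # Set the local edit aside, update, then reapply it.
                                        git -C "$work/mine" stash -q
                                        git -C "$work/mine" pull -q --ff-only
                                        git -C "$work/mine" stash pop -q

                                        echo "app.txt is now: $(cat "$work/mine/app.txt")"
                                        ```

                                        If upstream had changed the same Dockerfile lines, `stash pop` would report a conflict instead of applying cleanly, which is still more than a fresh tarball tells you.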

                                • joshka

                                  today at 2:59 AM

                                  try `git clone --filter=blob:none` instead

                                  https://github.blog/open-source/git/get-up-to-speed-with-par...

                                  https://gitperf.com/chapter-11.html

                                    • snthpy

                                      today at 8:39 AM

                                      Thanks. That's great! I especially like that it then lazy loads the blobs as you need them.

                                      I was going to ask if there's a way to set that as the default but I guess I'll just set up an alias like I have for most of the subcommands I use daily.
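
                                      One way to set up such an alias, as a sketch: the alias name `bclone` is made up, and the demo points `HOME` at a scratch directory so it does not edit a real `~/.gitconfig`. The two `uploadpack` settings are only there so the local stand-in remote will serve a filtered clone; hosted remotes that support partial clone should not need them.

                                      ```shell
                                      set -eu

                                      # Hypothetical scratch area; HOME is redirected so the real global config is untouched.
                                      work=$(mktemp -d)
                                      export HOME="$work/home"
                                      unset XDG_CONFIG_HOME
                                      mkdir -p "$HOME"
                                      export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
                                      export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

                                      # Stand-in remote with one commit.
                                      git init -q "$work/upstream"
                                      echo "hello" > "$work/upstream/file.txt"
                                      git -C "$work/upstream" add file.txt
                                      git -C "$work/upstream" commit -q -m "initial"

                                      # Let the stand-in remote honor --filter and serve blobs requested by ID.
                                      git -C "$work/upstream" config uploadpack.allowFilter true
                                      git -C "$work/upstream" config uploadpack.allowAnySHA1InWant true

                                      # The alias itself: "git bclone <url>" becomes "git clone --filter=blob:none <url>".
                                      git config --global alias.bclone 'clone --filter=blob:none'

                                      git bclone -q "file://$work/upstream" "$work/blobless"

                                      # The clone records the filter it was created with.
                                      filter=$(git -C "$work/blobless" config remote.origin.partialclonefilter)
                                      echo "partial clone filter: $filter"
                                      ```

                                      On a real machine the single `git config --global alias.bclone ...` line is the whole setup.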

                                  • jurakovic

                                    today at 4:24 AM

                                    What if that's only you? Git isn't made only for those who "just want to install something"

                                      • aa-jv

                                        today at 8:07 AM

                                        It's not the default because that'd be counter-productive to developers who use git with larger repositories, which is how git started life in the first place. A default clone depth would be entirely useless for Linux kernel developers, for example.

                                    • wadefletch

                                      today at 2:14 AM

                                      ted nyman: #1 most knowledgeable college football fan in sf

                                      and also git

                                      which makes more sense i guess

                                    • ruuda

                                      today at 7:32 AM

                                      The text reads like an LLM was involved in this.

                                      • nikhilpareek13

                                        today at 7:10 AM

                                        [dead]
