Show HN: Pgit – A Git-like CLI backed by PostgreSQL

123 points - last Tuesday at 6:11 AM

  • smartmic

    last Wednesday at 8:12 AM

    Of course, we can’t leave out a mention of Fossil here — the SCM system built by and for SQLite.

    https://fossil-scm.org/

      • wps

        last Wednesday at 4:32 PM

        I use Fossil for all of my long term projects. It can even import Git repositories if you want to try it out.

        Today I was working on a semester paper for a non-technical class. It is versioned in fossil and I have all my miscellaneous ideas, initial outline, and the paper guidelines in the Wiki. The branching also makes much more sense, and I’ve used it for major revisions of the paper or its structure.

        Fossil is legitimately awesome, and I lament the fact that Git gained popularity over it.

        • ndegruchy

          last Wednesday at 12:57 PM

          Fossil is great. Not only is it a full suite of tools associated with the repository (discussions, tickets, wiki), but the tool is a single <10 MB binary and can run as a web server (or CGI-like interface) for remote hosting.

            • wps

              last Wednesday at 4:34 PM

              The web server that powers fossil was also written by its author! It’s nice that unlike git instaweb you don’t need to install an additional web server just to see a read only view of your commits.

          • thunderbong

            last Wednesday at 9:09 AM

            And fossil itself is an SQLite database!

              • cbluth

                last Wednesday at 3:36 PM

                > fossil itself is an SQLite database

                Can anyone explain what this means and how it works?

                  • wps

                    last Wednesday at 4:26 PM

                    Fossil itself is a C binary, not a database. Maybe they meant that Fossil’s source code is hosted in Fossil, or that Fossil repositories are SQLite files? I don’t exactly know either.
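For readers with the same question: a Fossil repository is a single SQLite database file, so any SQLite client can open it. The sketch below only shows how to list the tables in an arbitrary SQLite file. The demo database and its table names (`artifacts`, `timeline`) are hypothetical stand-ins, not Fossil's real schema; point `list_tables` at an actual repository file to see what it really contains.

```python
import os
import sqlite3
import tempfile


def list_tables(db_path):
    """Return the names of the tables stored in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [r[0] for r in rows]
    finally:
        conn.close()


# Stand-in database with made-up tables (assumption: no Fossil repo at hand).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
conn = sqlite3.connect(demo_path)
conn.execute("CREATE TABLE artifacts (id INTEGER PRIMARY KEY, data BLOB)")
conn.execute("CREATE TABLE timeline (id INTEGER PRIMARY KEY, comment TEXT)")
conn.commit()
conn.close()

tables = list_tables(demo_path)
```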

          • Pay08

            last Wednesday at 10:23 AM

            How much does it take advantage of being a DB underneath?

            • ImGajeed76

              last Wednesday at 8:15 AM

              yeah fossil is great, but can fossil import the linux kernel (already working on the next post)

          • aljgz

            last Wednesday at 8:14 AM

            Still halfway through reading, but what you've made can unlock a lot of use cases.

            > I tried SQLite first, but its extension API is limited and write performance with custom storage was painfully slow

            For many use cases, write performance does not matter much. Other than the initial import, in many cases we don't change text that fast. But the simpler logistics of having a sqlite database, with the dual (git+SQL) access to text is huge.

            That said, for the specific use case I have in mind, postgres is perfectly fine

              • hrmtst93837

                last Wednesday at 10:49 AM

                SQLite is fine right up until you want concurrent writers. Once you need multiple users, cross-host access, or anything that looks like shared infra instead of a local cache, the file-locking model stops being cute and starts setting the rules for the whole design. For collaborative versioning, Postgres makes more sense.

                  • brigandish

                    last Wednesday at 11:44 AM

                    For a distributed VCS, what would be the need for such things? Even if it were a really big project, how many writes could be going on that this becomes a bottleneck? I don't see it but maybe you have a situation in mind.

                      • lelanthran

                        last Wednesday at 1:05 PM

                        In the current environment, even a distributed VCS may have concurrent agents modifying it on different branches.

                        • ImGajeed76

                          last Wednesday at 12:03 PM

                          The problem i faced is mostly importing large repos. But normal use should be fine.

                  • babarot

                    last Wednesday at 2:50 PM

                    The single-file simplicity of SQLite is a huge win for self-hosted apps. I've been using SQLite in WAL mode for a single-user app and it handles concurrent reads from the API while background workers write without issues. Backup is just cp. For anything that doesn't need multi-user concurrent writes, it's hard to justify the operational overhead of Postgres.
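A minimal sketch of the WAL setup described above, using Python's standard `sqlite3` module. The file names and the `notes` table are hypothetical. It shows a reader that keeps seeing the last committed state while a writer has an open transaction, and it takes the backup through `Connection.backup` rather than a plain file copy, since copying a live WAL-mode database without its `-wal` file can miss recent commits.

```python
import os
import sqlite3
import tempfile

# WAL needs a file-backed database; ":memory:" would not switch modes.
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "app.db")  # hypothetical file name


def count_notes(conn):
    """Number of rows visible to this connection right now."""
    return conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]


writer = sqlite3.connect(db_path)
# The pragma returns the journal mode that is now in effect.
mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
writer.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
writer.execute("INSERT INTO notes (body) VALUES ('first')")
writer.commit()

# A second connection plays the role of the API reader.
reader = sqlite3.connect(db_path)

# The background worker starts a write and has not committed yet...
writer.execute("INSERT INTO notes (body) VALUES ('second')")

# ...and the reader is not blocked; it sees the last committed snapshot.
rows_during_write = count_notes(reader)

writer.commit()
rows_after_commit = count_notes(reader)

# Online backup into a second file, safe while the database is in use.
backup_conn = sqlite3.connect(os.path.join(workdir, "backup.db"))
writer.backup(backup_conn)
rows_in_backup = count_notes(backup_conn)
```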

                      • ImGajeed76

                        last Wednesday at 3:10 PM

                        Yeah, I get that, and I'm fully on your side. SQLite would have been a nice fit. The only downside is the delta compression problem. Creating an extension for SQLite works, but it's slow. I had two options:

                        1) Do the delta compression and caching and so on on the pgit side and lose SQL queryability (or I need to do my own), or

                        2) Use postgres
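To illustrate why option 1 costs SQL queryability, here is a toy delta store. It is not pg-xpatch's algorithm, just a sketch built on Python's `difflib`; the function names and sample versions are made up. Each version after the first is stored as copy/insert operations against its predecessor, so the stored rows no longer contain the file text and a plain SQL query cannot search it unless something reconstructs the content first.

```python
from difflib import SequenceMatcher


def make_delta(old: str, new: str) -> list:
    """Encode `new` as copy/insert operations against `old`."""
    ops = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # literal new text
        # "delete" needs no operation: the old range is simply not copied
    return ops


def apply_delta(old: str, ops: list) -> str:
    """Rebuild the newer text from the older text plus a delta."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


versions = [
    "def greet():\n    print('hello')\n",
    "def greet(name):\n    print('hello', name)\n",
    "def greet(name):\n    print('hello', name)\n\ngreet('world')\n",
]

# Keep the first version in full, every later one only as a delta.
base = versions[0]
deltas = [make_delta(a, b) for a, b in zip(versions, versions[1:])]


def reconstruct(n: int) -> str:
    """Return version n by replaying the first n deltas on top of the base."""
    text = base
    for ops in deltas[:n]:
        text = apply_delta(text, ops)
    return text
```

Reading version `n` means replaying `n` deltas, which is where caching starts to matter for long histories.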

                        • swaminarayan

                          yesterday at 2:57 AM

                          If you want a key-value store on top of SQLite, you can try: https://github.com/hash-anu/snkv

                          It accesses the b-tree layer directly and bypasses the query layer.

                          For KV workloads it is much faster compared to SQL.

                          And yeah, you get the same benefits of the SQLite storage engine.

                      • nasretdinov

                        last Wednesday at 10:33 AM

                        Also SQLite in WAL/WAL2 mode is definitely not any slower for writing than Postgres either.

                        • ImGajeed76

                          last Wednesday at 8:16 AM

                          sounds great yes. maybe an SQLite version will come in the future

                        • taneliv

                          last Wednesday at 3:24 PM

                          Hey, I tried to import Linux kernel master branch from https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin... to pgit. My laptop is not the beefiest (some Ryzen 7 with 16G RAM and about 300G disk free), so that did not quite work. It died when trying to rebuild indexes (after bulk import), due to Postgres running out of disk space.

                          I guess this could have been expected, but it didn't quite occur to me since plain git has had no issues with that repository. Either way, the import process was quite slow: the failure happened after 3h30m. I'm not sure if it would be possible to speed it up, or estimate resource consumption ahead of time and warn the user? The laptop also had gone almost 2G into swap at some point, so there was quite a bit of memory pressure as well, but I don't quite know at which point this happened.

                            • ImGajeed76

                              last Wednesday at 3:48 PM

                              haha, great that you tried! i also imported it multiple times now and it does work. but it's huge. the times actually match quite well, i also had around 3 hours, i'm surprised you managed to do it that fast actually. so yeah, i'm currently working on multiple things to improve the speed for importing and then also for analysing the kernel. but that will be something for the next post. stay tuned! as a quick teaser: it imported the 123GB uncompressed master branch into 2.98 GB pgit actual data while git aggressive puts it into 1.95 GB. but keep in mind, pgit was never meant to beat git in any terms. it really started as a demo XD

                                • taneliv

                                  last Wednesday at 4:34 PM

                                  Ok, cheers! I occasionally need to investigate older releases and compare to out-of-tree things, and was thinking pgit might be of help there. I put up a reminder for myself to check pgit again next time I need to do that sort of stuff!

                                    • ImGajeed76

                                      last Wednesday at 6:37 PM

                                      Sounds great! Yeah i have been working on a 3 layer cache in pg-xpatch so it's not only an in-memory cache but a little more sophisticated and hopefully uses less ram... haha. but it's still not quite what i want.

                          • aljgz

                            last Wednesday at 11:18 AM

                            How well does this support random-access queries to the file names and content at a certain revision? Like:

                            - "Checking out" a specific branch (which can be reasonably slow)

                            - Query all files and folders in path `/src`

                            - Query all files and folders in path `/src/*` (and maybe with extra pattern matches)

                            - Be able to read contents of a file from a certain offset for a certain length

                            These are similar to file system queries to a working directory
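The queries in this list could look roughly like the sketch below. The thread does not show pgit's schema, so the `tree_entries` table (one row per file per revision) is purely hypothetical, and Python's built-in `sqlite3` stands in for Postgres to keep the example runnable. In Postgres the same idea would use `substring(content from ... for ...)` instead of `substr`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per file per revision.
conn.execute("CREATE TABLE tree_entries (rev TEXT, path TEXT, content TEXT)")
conn.executemany(
    "INSERT INTO tree_entries VALUES (?, ?, ?)",
    [
        ("r1", "README.md", "hello"),
        ("r1", "src/main.c", "int main(void) { return 0; }"),
        ("r2", "README.md", "hello"),
        ("r2", "src/main.c", "int main(void) { return 1; }"),
        ("r2", "src/util/log.c", "void log_msg(void) {}"),
    ],
)


def list_under(rev, prefix):
    """All paths below `prefix` as they exist at revision `rev`."""
    rows = conn.execute(
        "SELECT path FROM tree_entries WHERE rev = ? AND path LIKE ? ORDER BY path",
        (rev, prefix + "%"),
    )
    return [r[0] for r in rows]


def read_range(rev, path, offset, length):
    """Read `length` characters starting at 0-based `offset` of a file."""
    row = conn.execute(
        "SELECT substr(content, ?, ?) FROM tree_entries WHERE rev = ? AND path = ?",
        (offset + 1, length, rev, path),  # substr() is 1-indexed
    ).fetchone()
    return row[0] if row else None


src_at_r2 = list_under("r2", "src/")
snippet = read_range("r2", "src/main.c", 4, 4)
```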

                              • ImGajeed76

                                last Wednesday at 12:01 PM

                                Accessing specific files is very fast. For sure sub-second, and most of the time it's just a few milliseconds.

                            • drob518

                              last Wednesday at 3:34 PM

                              I’m confused by the benchmark detail. It says that the “on disk” size for pgit is always larger than the git aggressive size, but then it breaks out just the pgit data size and says that’s typically smaller. If you’re using PG to implement this, don’t you have to account for the PG storage, too, in your comparison? My takeaway is that pgit always has a larger storage requirement than git aggressive compression. Or am I reading that wrong? Obviously, pgit also brings features like SQL querying that git doesn’t have that you might prioritize more highly. But the author seems to be pushing the storage benefit highly.

                                • ImGajeed76

                                  last Wednesday at 3:41 PM

                                  good question! the "pgit actual" column tries to compare just the compression algorithms, similar to how the git side only counts the .pack file and not .idx/.rev/.bitmap or filesystem overhead. so both sides strip their "container" overhead to make it a fair comparison. but you're totally right that in practice the on-disk size is what you actually pay. that's why both numbers are in the table. and yes, pgit on-disk is usually larger than git aggressive. the tradeoff is that you get SQL queryability over your entire history, which git just can't do natively.

                                    • drob518

                                      last Wednesday at 9:28 PM

                                      Okay, thanks. I would revise the write-up then. It makes it sound like there’s a storage benefit here when there really isn’t. The real message might be that it’s very close to git’s aggressive optimization and it also gives you the sql benefits. I’m also a bit confused by all the write up on delta compression. That’s interesting for the size comparison, but if the real benefit to most users is going to be the sql features, then I’m not sure why all the talk of delta compression, which I’m guessing slows things down slightly. I’m assuming you could do all the sql features without any of the delta compression.

                                        • ImGajeed76

                                          last Wednesday at 10:35 PM

                                          yeah i get that. sorry if it comes across as too salesy. but keep in mind that pgit was only meant to be a demo of pg-xpatch and wasn't built with beating git in mind. the fact that it's SQL queryable and comes close to git's compression was a nice side-effect. so the whole thing was really just built for showcasing xpatch's compression and evolved into what it is now. but yes, in theory you could also just store the git history uncompressed, which would actually solve quite a lot of issues i had :)

                              • lmuscat

                                last Wednesday at 1:13 PM

                                Would be cool to populate the DB and keep it in sync by pointing to postgres as an upstream remote inside of git itself. That would probably require a custom postgres extension and a way to accept traffic from git.

                                  • ImGajeed76

                                    last Wednesday at 3:11 PM

                                    sounds interesting

                                • Fire-Dragon-DoL

                                  last Wednesday at 8:59 AM

                                  Wouldn't duckdb be better suited for this? Forgive the stupid question. I just connected "csv as sql" to "git as sql" and duckdb comes to mind

                                    • ImGajeed76

                                      last Wednesday at 9:03 AM

                                      I did actually look into writing the extension for duckdb. But similar to SQLite the extension possibilities are not great for what I needed. Though duckdb is a great database.

                                    • Terretta

                                      last Wednesday at 11:44 AM

                                      Why a custom LLM prompt for what appears to be the default 'report' you'd want? Wouldn't the CLI just do this for a report command?

                                      Is there an example of the tool enabling LLM 'discovering' something non-deterministic and surprising?

                                        • ImGajeed76

                                          last Wednesday at 12:00 PM

                                          Yes, there are also analysis commands the AI can use. I just did the prompt example before they existed.

                                      • killingtime74

                                        last Wednesday at 8:08 AM

                                        I love it. I love having agents write SQL. It's a very efficient use of context, and it doesn't try to reinvent the information retrieval part of following the context.

                                        Did you find you needed to give agents the schema produced by this or they just query it themselves from postgres?

                                          • ImGajeed76

                                            last Wednesday at 8:19 AM

                                            so most analyses already have a CLI function you can just call with parameters. for those that don't, in my case, the agent just looked at the --help of the commands and was able to perform the queries.

                                        • dmonterocrespo

                                          last Wednesday at 1:59 PM

                                          What would be the general purpose of storing the history in a remote database? Is it for use by agents? It's not the same as agents cloning the project and running "git log".

                                            • ImGajeed76

                                              last Wednesday at 2:05 PM

                                              1) In the case of pgit, the "remote" database is a local docker container

                                              2) You can do more complex analyses faster and easier (you don't need to pipe the git outputs) since it's just SQL

                                              but pgit is not meant to replace git.
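As an illustration of point 2, a "which files change most, and how many people touch them" analysis is a single query once history sits in tables. The `commits` and `file_changes` tables below are hypothetical, not pgit's actual schema, and Python's built-in `sqlite3` stands in for Postgres so the sketch runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per commit, one row per file touched by it.
conn.executescript(
    """
    CREATE TABLE commits (id TEXT PRIMARY KEY, author TEXT, message TEXT);
    CREATE TABLE file_changes (commit_id TEXT REFERENCES commits(id), path TEXT);
    """
)
conn.executemany(
    "INSERT INTO commits VALUES (?, ?, ?)",
    [
        ("c1", "alice", "Add parser"),
        ("c2", "bob", "Fix parser crash on empty input"),
        ("c3", "alice", "Update docs"),
    ],
)
conn.executemany(
    "INSERT INTO file_changes VALUES (?, ?)",
    [
        ("c1", "src/parser.py"),
        ("c2", "src/parser.py"),
        ("c2", "tests/test_parser.py"),
        ("c3", "README.md"),
    ],
)

# Churn per file: number of commits touching it and distinct authors involved.
churn = conn.execute(
    """
    SELECT fc.path, COUNT(*) AS changes, COUNT(DISTINCT c.author) AS authors
    FROM file_changes AS fc
    JOIN commits AS c ON c.id = fc.commit_id
    GROUP BY fc.path
    ORDER BY changes DESC, fc.path
    """
).fetchall()
```

Getting the same table out of plain git typically means post-processing `git log` output in a script.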

                                          • Pay08

                                            last Wednesday at 10:22 AM

                                            This is incredibly neat and might actually become a part of my toolbox.

                                              • ImGajeed76

                                                last Wednesday at 11:00 AM

                                                thanks! but it might still need some releases until it's really good. just don't rely on it ;)

                                            • kardianos

                                              last Wednesday at 3:37 PM

                                              This could be great for larger repos.

                                              If you couple this with an optional FUSE provider, server side user branches, and gerrit like change sets, that would be awesome.

                                                • ImGajeed76

                                                  last Wednesday at 3:42 PM

                                                  thanks! FUSE is actually a really cool idea, hadn't thought about that. would basically let you mount a repo as a filesystem backed by postgres. server side branches and change sets are interesting too, postgres already handles concurrent access well so that could work nicely. definitely adding these to the ideas list!

                                                    • kardianos

                                                      last Wednesday at 4:06 PM

                                                      I've already spun up claude to make a POC for this.

                                                      I like gerrit, but the server is such a pain to handle (java plus FS). PG would be the only server side component required, though you could have an optional review server that would act like a PG client as well.

                                                      The FUSE would be extremely nice for CI/CD for instant cloning with a local resource cache, which is much harder to do with a FS based git.

                                                        • nulltrace

                                                          yesterday at 4:36 AM

                                                          The FUSE angle is what got me. Our monorepo takes about 90 seconds just to clone in CI, and most jobs only touch two or three packages. Shallow clone helps with history but you basically still pull the entire working tree. Something that could mount the tree and fetch files on demand would cut that to almost nothing for most pipeline steps.

                                                          • ImGajeed76

                                                            last Wednesday at 4:26 PM

                                                            fire

                                                • swaminarayan

                                                  yesterday at 2:53 AM

                                                  What would you do if your entire Git history was instantly queryable with SQL?

                                                  • Toby11

                                                    last Wednesday at 10:47 AM

                                                    why do agents need to know these meta details about git history to perform their coding functions though?

                                                    even humans don’t do this unless there’s a crazy bug causing them to search around every possible angle.

                                                    that said, this sounds like a great and fun project to work on.

                                                      • ImGajeed76

                                                        last Wednesday at 10:58 AM

                                                        but the difference between you and an agent is that you naturally know the history of the project if you have worked on it. the AI doesn't.

                                                          • tomhallett

                                                            last Wednesday at 1:44 PM

                                                            so true!

                                                            1) commit messages often capture the "why" something changed - versus the code/tests which focus on the what/how for right now.

                                                            2) when you have a regression being able to see the code before it was introduced and the code which was changed at the same time is very helpful in understanding the developer's intent, blindspots in their approach, etc.

                                                        • nsonha

                                                          last Wednesday at 2:38 PM

                                                          debugging and operational investigations. I would say half of my sessions with agents involve those

                                                            • ImGajeed76

                                                              last Wednesday at 3:10 PM

                                                              hahaha i feel that

                                                      • quickrefio

                                                        last Wednesday at 3:30 PM

                                                        Feels like swapping filesystem complexity for database complexity.

                                                          • zadikian

                                                            last Wednesday at 7:22 PM

                                                            I would choose a database for this kind of analysis

                                                            • ImGajeed76

                                                              last Wednesday at 4:01 PM

                                                              haha yeah pretty much. but postgres already solves most of that complexity for you, so you get SQL queryability almost for free.

                                                          • Zardoz84

                                                            last Wednesday at 7:51 AM

                                                            Interesting... could it be used to store multiple git repos and do a full text search across them?

                                                              • ImGajeed76

                                                                last Wednesday at 8:21 AM

                                                                in theory yes. you just need to do the full text search across the databases. pgit doesn't support it, but in the end it's just postgres under the hood.

                                                            • jauntywundrkind

                                                              last Wednesday at 6:10 PM

                                                              Andrew Nesbitt's gitgres is also adjacent. And a real git. https://github.com/andrew/gitgres

                                                              There's a nice write up on "why" too. https://nesbitt.io/2026/02/26/git-in-postgres.html

                                                              • waffletower

                                                                last Wednesday at 3:55 PM

                                                                I feel it would be more ergonomic to utilize SQLite as a backend, for the scale of repos I tend to interact with (small-medium sized repos). Yet it might be interesting for all the repos to share a single PostgreSQL db for cross-comparisons -- though that isn't a use case I have seen a need for.

                                                                  • ImGajeed76

                                                                    last Wednesday at 3:58 PM

                                                                    yeah totally get that. the main blocker was delta compression. sqlite's extension api made it really slow for custom storage. i either had to do all the compression on the pgit side (and lose native SQL queryability) or just use postgres which handles it natively. but an sqlite version isn't off the table for smaller repos where that tradeoff makes more sense.
