Show HN: Self-host Reddit – 2.38B posts, works offline, yours forever

260 points - yesterday at 3:35 PM


Reddit's API is effectively dead for archival. Third-party apps are gone. Reddit has threatened to cut off access to the Pushshift dataset multiple times. But 3.28TB of Reddit history exists as a torrent right now, and I built a tool to turn it into something you can browse on your own hardware.

The key point: This doesn't touch Reddit's servers. Ever. Download the Pushshift dataset, run my tool locally, get a fully browsable archive. Works on an air-gapped machine. Works on a Raspberry Pi serving your LAN. Works on a USB drive you hand to someone.

What it does: Takes compressed data dumps from Reddit (.zst), Voat (SQL), and Ruqqus (.7z) and generates static HTML. No JavaScript, no external requests, no tracking. Open index.html and browse. Want search? Run the optional Docker stack with PostgreSQL – still entirely on your machine.
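A minimal sketch of the dump-to-static-HTML idea described above, not the project's actual code (which uses Jinja2 templates). It assumes the `.zst` dump has already been decompressed into newline-delimited JSON, and that each record carries `id`, `title`, `author`, and `selftext` fields. The helper names `render_post` and `build_site` are hypothetical.

```python
import html
import json
from pathlib import Path


def render_post(post: dict) -> str:
    """Render one post record as a standalone HTML page with no scripts."""
    # Escape every user-supplied field so archived text cannot inject markup.
    title = html.escape(post.get("title", "(untitled)"))
    author = html.escape(post.get("author", "[unknown]"))
    body = html.escape(post.get("selftext", ""))
    return (
        '<!doctype html>\n'
        f'<html><head><meta charset="utf-8"><title>{title}</title></head>\n'
        f'<body><h1>{title}</h1><p>by {author}</p><pre>{body}</pre></body></html>\n'
    )


def build_site(ndjson_lines, out_dir: Path) -> int:
    """Write one page per post plus an index.html; return the number of posts."""
    out_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the dump
        post = json.loads(line)
        page = f"{post['id']}.html"
        (out_dir / page).write_text(render_post(post), encoding="utf-8")
        link_text = html.escape(post.get("title", page))
        links.append(f'<li><a href="{page}">{link_text}</a></li>')
    index = (
        "<!doctype html>\n<html><body><ul>\n"
        + "\n".join(links)
        + "\n</ul></body></html>\n"
    )
    (out_dir / "index.html").write_text(index, encoding="utf-8")
    return len(links)
```

Calling `build_site(open("posts.ndjson", encoding="utf-8"), Path("site"))` on a hypothetical decompressed file would produce a folder that opens directly in a browser, with no server and no external requests.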

API & AI Integration: Full REST API with 30+ endpoints – posts, comments, users, subreddits, full-text search, aggregations. Also ships with an MCP server (29 tools) so you can query your archive directly from AI tools.

Self-hosting options:
- USB drive / local folder (just open the HTML files)
- Home server on your LAN
- Tor hidden service (2 commands, no port forwarding needed)
- VPS with HTTPS
- GitHub Pages for small archives

Why this matters: Once you have the data, you own it. No API keys, no rate limits, no ToS changes can take it away.

Scale: Tens of millions of posts per instance. PostgreSQL backend keeps memory constant regardless of dataset size. For the full 2.38B post dataset, run multiple instances by topic.
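To illustrate the general idea behind constant memory use, here is a rough sketch that streams records in fixed-size batches instead of loading a whole dump at once. It is an assumption-laden illustration, not how the project's PostgreSQL backend is implemented. The input is assumed to be newline-delimited JSON with a `subreddit` field, and `batched_records` and `count_by_subreddit` are hypothetical names.

```python
from __future__ import annotations

import json
from itertools import islice
from typing import Iterable, Iterator


def batched_records(lines: Iterable[str], batch_size: int = 1000) -> Iterator[list[dict]]:
    """Yield parsed records in batches of at most batch_size input lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return  # input exhausted
        # Blank lines are dropped, so a batch may hold fewer records than lines read.
        yield [json.loads(line) for line in chunk if line.strip()]


def count_by_subreddit(lines: Iterable[str], batch_size: int = 1000) -> dict[str, int]:
    """Count posts per subreddit while holding only one batch of records at a time."""
    counts: dict[str, int] = {}
    for batch in batched_records(lines, batch_size):
        for record in batch:
            name = record.get("subreddit", "")
            counts[name] = counts.get(name, 0) + 1
    return counts
```

Only one batch of parsed records is alive at a time, so peak memory depends on `batch_size` rather than on the size of the dump. The counts dictionary still grows with the number of distinct subreddits. A per-subreddit count like this could also help decide how to split the full dataset into topic-based instances.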

How I built it: Python, PostgreSQL, Jinja2 templates, Docker. Used Claude Code throughout as an experiment in AI-assisted development. Learned that the workflow is "trust but verify" – it accelerates the boring parts but you still own the architecture.

Live demo: https://online-archives.github.io/redd-archiver-example/

GitHub: https://github.com/19-84/redd-archiver (Public Domain)

Pushshift torrent: https://academictorrents.com/details/1614740ac8c94505e4ecb9d...

  • Aurornis

    yesterday at 7:16 PM

    Cool way to self-host archives.

    What I'd really like is a plugin that automatically pulls from archives somewhere and replaces deleted comments and those bot-overwritten comments with the original context.

    Reddit is becoming maddening to use because half the old links I click have comments overwritten with garbage out of protest for something. Ironically the original content is available in these archives (which are used for AI training) but now missing for actual users like me just trying to figure out how someone fixed their printer driver 2 years ago.

      • anonymous908213

        yesterday at 8:35 PM

        That would only really be ironic if the reason for people overwriting their comments was out of protest for LLM training, but the main reason that resulted in by far the biggest wave of deletions was Reddit locking down their API. If the result of their protest is that the site is less useful for you, the user, then in fact it served its purpose, as the entire point was an attempt to boycott Reddit, ie. get people to stop using it by removing the user contributions that give the site its only value in the first place.

          • Aurornis

            yesterday at 8:43 PM

            > If the result of their protest is that the site is less useful for you, the user, then in fact it served its purpose, as the entire point was an attempt to boycott Reddit, ie. get people to stop using it by removing the user contributions that give the site its only value in the first place.

            In practice I just give them more page views because I have to view more threads before I find the answer.

            Reddit's DAU numbers have only gone up since the protest.

              • swed420

                yesterday at 11:58 PM

                > Reddit's DAU numbers have only gone up since the protest.

                And so has the bot activity.

                • anonymous908213

                  yesterday at 8:49 PM

                  I did phrase it as "an attempt". In the end the protest probably wasn't as effective as protestors might have hoped, and it didn't get Reddit to change course on their enshittification decisions. I do think it was good that there was an attempt at pushback, at least, when most software users just accept enshittification as normal and continue tolerating whatever abuse their masters throw at them.

          • accrual

            today at 12:49 AM

            Just offering another perspective because I see those missing comments too. The author decided they didn't want to participate in public discourse anymore and their comment is gone. So be it. I don't search archives or use tools to undermine their effort. I move on to the next thing.

            I read "it's maddening because ... they decided to use their autonomy and..." and I stop there. So be it.

              • hrimfaxi

                today at 1:00 AM

                People use their autonomy to maddening ends - how does the fact that it is of their own volition offer you any comfort? I ask genuinely. Is it something along the lines of recognizing the things you can't change?

                  • dzelzs

                    today at 3:48 AM

                    In this case - recognition of an attempt at doing something. Downplaying that is similar to downplaying protests for not achieving anything. At the very least it might have brought the topic of contention to more people's attention, which can be a spark for change. If you have apathy and disdain for attempts at change - it might be worth evaluating what the consequences might be at a societal level when that apathy is the norm for harder-to-change things (like politics, big corp practices etc.)

        • NickNaraghi

          yesterday at 6:58 PM

          Data is available via torrent in this section: https://github.com/19-84/redd-archiver?tab=readme-ov-file#-g...

        • m463

          yesterday at 10:24 PM

          I wonder if you could use this to "Seed" a new distributed social media thing and just take over from there.

          sort of like forking a project.

        • feconroses

          yesterday at 11:40 PM

          Very cool project! Quick question: is the underlying Pushshift dataset updated with new Reddit data on any regular cadence (daily/weekly/monthly), or is this essentially a fixed historical snapshot up to a certain date? Just want to understand if self-hosters would need to periodically re-download for fresh content or if it's archival-only.

        • twobitshifter

          yesterday at 11:13 PM

          If reddit was a squeaky clean place, or if I could pick certain subs, maybe I would be interested, but I really wouldn't want ALL of reddit on my machine even temporarily.

            • 19-84

              yesterday at 11:52 PM

              the torrent has data for the top 40,000 subs on reddit. thanks to watchful1 splitting the data by subreddit, you can download only the subreddit you want from the torrent

                • Imustaskforhelp

                  today at 11:45 AM

                  I am going to be honest and this looks really cool.

                  40,000 subs are good numbers and I hope that the number can be spread to even more subreddits

                  Perhaps we can finally migrate all or much of the data to Lemmy instances to get them up and running.

                  Thank you for creating this. It opens up a lot of interesting opportunities.

          • diggings

            yesterday at 9:49 PM

            This is a neat project, nice work.

            You've probably come across this already but there are alternative archives to PushShift that may have differing sets of posts and comments (perhaps depending on removal request coverage?)

            One is Arctic Shift: https://github.com/ArthurHeitmann/arctic_shift/releases

            Another is PullPush: https://pullpush.io/

            • elSidCampeador

              yesterday at 7:02 PM

              I wonder if this can be hooked up with the now-dead Apollo app in some way, to get back a slice of time that is forever lost now?

                • 19-84

                  yesterday at 7:07 PM

                  the API should allow for a lot of different integrations

              • alcroito

                yesterday at 8:57 PM

                I tried spinning up the local approach with docker compose, but it fails.

                There's no `.env.example` file to copy from. And even if the env vars are set manually, there are issues with the mentioned volumes not existing locally.

                Seems like this needs more polish.

              • vivzkestrel

                today at 3:06 AM

                - slightly offtopic here but does anyone have a similar data set of all youtube channels out there?

                - details probably include the 400 million youtube accounts, channel id, name, creator url, etc

              • bkovacev

                yesterday at 10:02 PM

                Is there any way to check if a subreddit that was made private (2-3 years ago) is in the data dump?

              • blks

                today at 1:29 AM

                Does it also contain countless NSFW content?

                • blks

                  today at 1:33 AM

                  Opened the live demo, went into programming subreddit, felt like I was showered with liquid shit. I tend to forget what kind of edgelord hellhole Reddit was (and still is sometimes).

                    • dvngnt_

                      yesterday at 7:47 PM

                      I want to do the same thing for tiktok. I have 5k videos starting from the pandemic downloaded. want to find a way to use AI to tag and categorize the videos to scroll locally.

                      • drob518

                        yesterday at 11:05 PM

                        This is a great way to participate in arguments you missed three years ago.

                        • kylehotchkiss

                          yesterday at 7:39 PM

                          _Hacker News collectively grabs the dataset to train their models on how to become effective reddit trolls_

                            • layer8

                              yesterday at 8:52 PM

                              Don’t we have enough of those already? ;)

                              • 19-84

                                yesterday at 7:43 PM

                                the API and MCP server are very powerful ;)

                            • justsomehnguy

                              yesterday at 11:08 PM

                              Appreciated.

                              EDIT: Is there any cheap way to search? I have MS TechNet archive which is useless without search, so I really want to know a way to have a cheap local search w/o grepping everything.

                                • 19-84

                                  yesterday at 11:54 PM

                                  redd-archiver uses postgres full text search. for static search you could use lunr.js

                              • inquirerGeneral

                                today at 4:50 AM

                                [dead]

                                • Jordan-117

                                  yesterday at 7:44 PM

                                  [flagged]

                                    • 19-84

                                      yesterday at 7:52 PM

                                      thank you for your comment, I will support any platform that has a complete dataset available. I will take submissions for any complete datasets through github issues. https://github.com/19-84/redd-archiver/blob/main/.github/ISS...

                                      • apstls

                                        yesterday at 9:34 PM

                                        There are certainly things to be learned from analysis of the dataset. Keep your friends close but your enemies as JSON, or something...

                                        • devilsdata

                                          yesterday at 8:57 PM

                                          Might be good for researchers to be able to perform studies on.

                                          • metaPushkin

                                            yesterday at 9:33 PM

                                            It seems you have no understanding of the term neo-fascism, and yes, it's not what your propaganda talks about.

                                              • nozzlegear

                                                yesterday at 9:59 PM

                                                Can you explain for the class? Don't just say that and leave us wondering.

                                                  • metaPushkin

                                                    today at 7:05 AM

                                                    Of course, if you pay me for that job

                                                      • nozzlegear

                                                        today at 3:46 PM

                                                        You'll drop vague hand wavey allusions for free, but you charge for elucidation? What kind of business are you in?

                                            • diggyhole

                                              yesterday at 8:25 PM

                                              Wat?

                                                • Jordan-117

                                                  yesterday at 8:48 PM

                                                  It sold itself as a healthier alternative to Reddit, but by the end of its run virtually every post sitewide was some flavor of virulently racist, misogynistic, anti-semitic, fringe conspiratorial, etc.

                                                    • lisdexan

                                                      today at 6:47 AM

                                                      As far as my recollection goes Voat was actually pretty nice content wise until /r/fatpeoplehate got banned. After that it became the dumping ground for every shithole community that was too awful for Reddit, quite an achievement TBH. It was inevitable with 20/20 hindsight, but I would've loved if it had had more breathing room to actually develop as a healthier alternative.

                                                      Even if it failed the same way it could've influenced the redesign to implement RES-style QoL changes instead of the clusterfuck that New Reddit still is.

                                                      • 4782626292283

                                                        today at 4:12 AM

                                                        Sounds an awful lot like reddit. The cesspool whose users universally celebrate murders committed by far-left terrorists.

                                                          • lisdexan

                                                            today at 7:08 AM

                                                            While Reddit always has someone celebrating / inciting everything from the benign to the macabre, this is just cope.

                                                            Kirk was in a process of sanctification by the right as a knight for debate and reason, as if he wasn't gleeful about Paul Pelosi almost being pulped to death with a hammer. People not dropping everything to make him MLK is not celebrating his murder. As for Brian Thompson, he was the modern equivalent of a British tax man in the Irish Famine, people being indifferent to his death is expected. If Mangione was more palatable in his politics to MAGA (instead of just being a RFK fan) and Thompson had photos with Pelosi or something, he would've been a perfect Trump pardon candidate.

                                                        • tempsaasexample

                                                          yesterday at 8:56 PM

                                                          [flagged]

                                                            • Jordan-117

                                                              yesterday at 9:10 PM

                                                              Funny defense to use for a crowd that spent all their time regularly, hatefully dehumanizing people. The front page was routinely plastered with shit like "Why interracial children are an abomination," Hitler-did-nothing-wrong propaganda, usernames that echoed Nazi slogans and fantasized about mass-murdering non-white people, etc. It was an utter cesspool, and preserving and perpetuating that is a really weird use of dev time and effort.

                                                                • lazyasciiart

                                                                  yesterday at 9:12 PM

                                                                  It’ll be useful to have when the posters pop up as presidential advisors.

                                                                  • tempsaasexample

                                                                    yesterday at 9:17 PM

                                                                    [flagged]

                                                • syngrog66

                                                  yesterday at 8:08 PM

                                                  Did you pay all the people who created its content?

                                                    • nullandvoid

                                                      yesterday at 9:43 PM

                                                      Did anyone ever comment on reddit with an expectation of pay?

                                                      It's an open forum - similar to here, whatever I post is in the public forum and therefore I expect it to be used / remixed however anyone wants.

                                                        • nozzlegear

                                                          yesterday at 9:58 PM

                                                          > Did anyone ever comment on reddit with an expectation of pay?

                                                          Maybe Gallowboob

                                                            • Sohcahtoa82

                                                              yesterday at 10:54 PM

                                                              That's a name I haven't seen in a LONG time.

                                                      • devilsdata

                                                        yesterday at 8:57 PM

                                                        I have no problem with this being downloaded for personal use, in fact that's a good thing. But of course we both know it'll be used to train AI.

                                                        • antisthenes

                                                          yesterday at 10:37 PM

                                                          Reddit didn't pay me for posting either. Not that I posted in the last decade.