SQLite is all you need for durable workflows

206 points - today at 5:54 PM

  • bitexploder

    today at 6:17 PM

    I started setting up my workflows using Temporal. It deploys as a relatively lightweight local app. For an isolated local installation it uses SQLite. It makes the process of dealing with API retries and organizing workflows and tasks really simple. I recommend giving it a try. It is, philosophically, exactly what this article is suggesting, but it adds an incredibly rich and flexible interface for agents to work with. Additionally, the web UI makes it very easy to inspect workflows, review agent execution, etc. Temporal also encodes much higher reliability into your system, almost for free. Distributed and reliable systems are hard, don't reinvent the wheel IMO.

    If you find yourself wanting things like an easy way to then introspect your SQLite database, figure out what is happening in the workflow, compose individual tasks, make workflows trivially callable, etc, give Temporal a look.

    Alongside this, I have mostly moved away from files for agents. Markdown and JSON are great, but also feel like traps when building out smaller local apps. LLMs are great at SQLite and you can render anything you want out of it (Markdown, JSON, etc). It saves a lot of tokens when an agent can just query a specific row instead of having to fire up jq or grep through Markdown. You get a nice, portable, self-contained data management system that encourages agents to be more disciplined about how they structure their data than a bunch of files does. It also continues to scale into MySQL/Postgres if your little local projects start to outgrow it or become more formal: you already have a schema and discipline around data.
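
    A minimal sketch of the "query one row, render whatever you want" idea, using Python's standard sqlite3 module. The `tasks` table, its columns, and the helper names are assumptions made up for illustration; nothing here comes from Temporal or from the article.

```python
import json
import sqlite3

# Hypothetical schema: one row per task an agent is tracking.
conn = sqlite3.connect(":memory:")  # pass a file path instead for a persistent store
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE tasks ("
    "id INTEGER PRIMARY KEY, "
    "title TEXT NOT NULL, "
    "status TEXT NOT NULL CHECK (status IN ('todo', 'doing', 'done')), "
    "notes TEXT)"
)
conn.executemany(
    "INSERT INTO tasks (title, status, notes) VALUES (?, ?, ?)",
    [
        ("Write schema", "done", "v1 landed"),
        ("Add retries", "doing", "backoff still missing"),
        ("Ship report", "todo", None),
    ],
)
conn.commit()


def get_task(task_id):
    """Fetch exactly one row instead of scanning a whole file."""
    row = conn.execute("SELECT * FROM tasks WHERE id = ?", (task_id,)).fetchone()
    return dict(row) if row else None


def render_markdown():
    """Render the table as a Markdown checklist on demand."""
    rows = conn.execute("SELECT title, status FROM tasks ORDER BY id")
    return "\n".join(
        f"- [{'x' if r['status'] == 'done' else ' '}] {r['title']} ({r['status']})"
        for r in rows
    )


def render_json():
    """Render the same rows as JSON."""
    rows = conn.execute("SELECT * FROM tasks ORDER BY id")
    return json.dumps([dict(r) for r in rows], indent=2)
```

    The CHECK constraint is the "discipline" part: a row with an unknown status is rejected at write time rather than discovered later by whoever greps the file.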

      • svara

        today at 9:10 PM

        Word on HN is that you're either paying more money than you expected for temporal's managed solution or taking on substantial ops burden ultimately running their very heavy system yourself.

        I wouldn't know, I've not done either, but I'd like to learn more from your or others' experience.

        • jawns

          today at 6:23 PM

          Could you give an example of a case where you'd use SQLite instead of jq or grep through Markdown?

            • gopalv

              today at 9:01 PM

              > an example of a case where you'd use SQLite instead of jq or grep through Markdown?

              Usually we end up writing a script to incrementally refresh a data-set I'm analyzing (or have someone send me a copy after they pull it).

              I've been using sqlite for anything which needs an UPDATE - modifying a row deep inside the data-set with jsonl is a pain.

              My github is full of java programs which update sqlite3 files with threadpools and a single big lock around the UPDATE (& then I write or have an agent write code to analyze it).
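
              The programs described above are Java; as a rough sketch of the same pattern in Python (an assumption about the shape, not the commenter's actual code), a thread pool can do the slow work in parallel while one lock serializes every touch of the shared connection. The `items` table and `expensive_score` are hypothetical.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor

# check_same_thread=False lets worker threads share the connection;
# the lock below is what actually serializes access to it.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, raw TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO items (id, raw) VALUES (?, ?)",
    [(i, "x" * i) for i in range(1, 21)],
)
conn.commit()

db_lock = threading.Lock()  # the "single big lock"


def expensive_score(raw):
    """Hypothetical stand-in for slow per-row work (API call, parsing, ...)."""
    return len(raw) * 2


def refresh(item_id):
    # Read under the lock, compute outside it, write under the lock again.
    with db_lock:
        (raw,) = conn.execute(
            "SELECT raw FROM items WHERE id = ?", (item_id,)
        ).fetchone()
    score = expensive_score(raw)
    with db_lock:
        conn.execute("UPDATE items SET score = ? WHERE id = ?", (score, item_id))
        conn.commit()


with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(refresh, range(1, 21)))  # list() surfaces worker exceptions
```

              Updating row 7 here is a single statement; the JSONL equivalent is rewriting the file.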

              DuckDB is slowly replacing it in the context of python, simply because of the ease of pushing a UDF into the SQL.

              Also because I really like expressing things as LEAD/LAG with a UDF on top.
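
              To make the "LAG with a UDF on top" shape concrete, here is a sketch that swaps in Python's built-in sqlite3 module rather than DuckDB, so it runs without extra dependencies. It assumes the bundled SQLite is 3.25 or newer (window functions); the `readings` table and the `classify` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def classify(delta):
    """Scalar UDF: label the change since the previous row."""
    if delta is None:  # LAG() yields NULL for the first row
        return "first"
    if delta > 0:
        return "up"
    if delta < 0:
        return "down"
    return "flat"


# Register the Python function so SQL can call it as classify(x).
conn.create_function("classify", 1, classify)

conn.execute("CREATE TABLE readings (ts INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 10.0), (2, 12.5), (3, 12.5), (4, 9.0)],
)

# LAG() looks one row back in ts order; the UDF interprets the difference.
trend = conn.execute(
    "SELECT ts, classify(value - LAG(value) OVER (ORDER BY ts)) "
    "FROM readings ORDER BY ts"
).fetchall()
```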

              • phamilton

                today at 7:43 PM

                My favorite lens on SQLite is that it is actually two things:

                1. A robust durability implementation
                2. A library of high-performance data structures and algorithms

                The fact that it's SQL is nice, but those two attributes are what make it great.

                For example, I'm implementing an in-process event log that I want to be durable. I started simple, but soon saw some edge cases, and instead of playing whack-a-mole I just swapped to using SQLite as an ordered KV store that gives me ACID.
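
                One way such an event log could look, as a sketch under assumptions: the `EventLog` class, the `events` table, and the method names are invented for illustration, not taken from the commenter's code. The integer primary key provides the ordering, and the transaction provides all-or-nothing appends.

```python
import sqlite3


class EventLog:
    """Append-only log keyed by a monotonically increasing sequence number."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
            "payload BLOB NOT NULL)"
        )
        self.conn.commit()

    def append_batch(self, payloads):
        """Append all payloads atomically; return the last sequence number."""
        # The connection context manager commits on success and rolls back
        # if any statement raises, so a batch is never half-written.
        last = None
        with self.conn:
            for payload in payloads:
                cur = self.conn.execute(
                    "INSERT INTO events (payload) VALUES (?)", (payload,)
                )
                last = cur.lastrowid
        return last

    def read_from(self, after_seq=0, limit=100):
        """Return up to `limit` events with seq > after_seq, in order."""
        return self.conn.execute(
            "SELECT seq, payload FROM events WHERE seq > ? ORDER BY seq LIMIT ?",
            (after_seq, limit),
        ).fetchall()
```

                A consumer remembers the last `seq` it processed and calls `read_from(last_seq)` to resume after a restart.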

                Another example: ingesting multiple inter related datasets. Instead of a dozen hash maps in memory, I load them up into sqlite (no persistence) and then slice and dice as I need to.
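
                A small sketch of the in-memory variant, with made-up `users` and `orders` data standing in for the inter-related datasets. The point is that the join, grouping, and index come for free instead of being hand-rolled over hash maps.

```python
import sqlite3

# Hypothetical datasets that would otherwise live in separate dicts.
users = [(1, "ada"), (2, "grace"), (3, "linus")]
orders = [(10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0)]

db = sqlite3.connect(":memory:")  # nothing is written to disk
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
db.execute("CREATE INDEX orders_by_user ON orders (user_id)")
db.executemany("INSERT INTO users VALUES (?, ?)", users)
db.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

# Order count and total spend per user, including users with no orders.
spend = db.execute(
    "SELECT u.name, COUNT(o.id), COALESCE(SUM(o.total), 0) "
    "FROM users u LEFT JOIN orders o ON o.user_id = u.id "
    "GROUP BY u.id ORDER BY u.id"
).fetchall()
```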

                It's a super useful tool.

                • chaps

                  today at 7:40 PM

                  The moment my JSON has any sort of depth and I need to write a parser for it and potentially account for unspecified behavior. JSON's nice when it's nice, but it's terrible when it's terrible. It's 100x easier to write SQL than writing jq and... dear god if I have to use grep -A or -B, I'm doing something wrong. Constraints are actually a good thing!

                  The underlying database isn't the most important thing. Just use SQL. Its namespacing (eg, through CTEs) is good and you're more likely to have colleagues who know SQL compared to jq.

                  • fragmede

                    today at 7:18 PM

                    Honest answer is: whenever your markdown or json files get to be big enough that grep/jq takes long enough that you get bored waiting for it.

                      • embedding-shape

                        today at 8:57 PM

                        > get to be big enough that grep/jq takes long enough

                        On a modern processor, that's about GBs of data typically, right?

                • rick1290

                  today at 7:59 PM

                  Interesting about the files vs db approach. I have been going back and forth. I landed on db as well.

                  • peterson_lock

                    today at 7:14 PM

                    This reads like an advertisement for Temporal :)

                      • switchbak

                        today at 8:02 PM

                        I'm someone else who has inherited a bunch of ad-hoc orchestration systems and also used Temporal quite heavily. The latter does certainly come with some overhead (not so bad in the age of LLMs), but it also guides you along a well-trodden path of good practices. The latter being very important - it means that when you want to take on more advanced capabilities, you probably haven't painted yourself into a corner too badly and can take that on fairly easily. Think: retries, multi-tenancy, multi-lang, observability, etc.

                        • pzduniak

                          today at 8:46 PM

                          I can vouch for them too, being a super early adopter. One of the best early bets I've ever made. Awesome OSS product, glad the team decided to leave Uber to commercialize it.

                          • baq

                            today at 9:04 PM

                            Low key amazing tech, kinda like clickhouse - nobody is bragging it’s running their business

                    • levkk

                      today at 6:54 PM

                      I don't understand this obsession with SQLite for real, production apps. SQLite is an embedded database, completely unsuitable for managing concurrency. This is what database _servers_ are for, e.g., Postgres, MySQL, etc. Their entire job is to allow you to modify data from multiple processes, on different machines, at the same time.

                      This is a foundational principle of computer science. It seems to me that the "SQLite for everything" crowd is a little bit inexperienced.

                        • jph00

                          today at 7:26 PM

                          You seem to have a rather limited understanding of what kinds of concurrency exist and how those needs are best met. Whether something is a server or not is not very relevant to this discussion.

                          SQLite is an excellent production db for many real world workloads, as has been widely documented. It is very different to Postgres, so requires learning a whole new skill set.

                          One way to think about it is that SQLite can work well for the parts of your system where there is naturally strong partitioning.

                            • tasuki

                              today at 8:59 PM

                              > SQLite can work well for the parts of your system where there is naturally strong partitioning.

                              Or the parts of your system that don't have big data and have no need for massively concurrent writes. And that's the vast majority of systems!

                          • lanstin

                            today at 7:30 PM

                            I had very good results giving one SQLite DB to each goroutine, so the accesses were serialized up front, on a very high-volume (130K requests/second) service. Exact transactionality was not a product goal, and SQLite was just there to back up the in-memory state. If we lost a little due to an abend or something, that was OK. For normal maintenance it caught SIGTERM, stopped listening, waited for in-flight calls, and then flushed the remaining changes to SQLite. On startup it would read the SQLite file into memory before it started listening. That gave persistent storage across container runs, with never both reads and writes to the same file at the same time. It also just closed the DB and opened a new one when it hit some limit of rows, so as not to fill the disk. The max size of the SQLite file corresponded to the max size of the LRU map being served from memory, so it just flipped A/B between "a full memory's worth of data stored" and "the currently updating state." A lot easier than having to write out protobufs to disk or whatever I would have done for transient (during restarts/maintenance) persistence.

                          • abtinf

                            today at 7:37 PM

                            There are many cases where SQLite plus a concurrent front end (like a Go net/http server) can handle all the load that a service might ever conceivably have to handle, especially if allowed to scale up hardware over time. You can trivially scale up SQLite to, what, hundreds of thousands of tps?

                            The only thing you really give up is HA/failover and DR. But there are solutions to deal with those. And single-server systems are generally surprisingly robust (since, in the absence of very complex control planes, uptime goes down with more systems).

                            • rpdillon

                              today at 7:59 PM

                              Sqlite is good for lots of stuff, but you're probably focusing your days on high-scale webapps that want sharding with networked DBs. That's one domain, and an interesting one, but there are lots of others.

                              I'm a big fan of re-evaluating prior "best practices" in light of technology changes, especially in ways that improve simplicity. Running my family's social media site off a single sqlite DB on a VPS is great. ~15 users, almost zero maintenance. I run my FreshRSS instance off of sqlite, as well as my "now" page. At work, I used sqlite for all kinds of things over the past decades: as an ad hoc job queue, as a quick way to ingest and query lots of logs locally, and present/filter in realtime with simonw's excellent https://github.com/simonw/datasette.
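
                              For the "ad hoc job queue" case, a minimal sketch of how one might look. The `jobs` table and the function names are assumptions for illustration. `BEGIN IMMEDIATE` takes the write lock before the SELECT, so two workers sharing the file cannot claim the same job.

```python
import sqlite3


def open_queue(path):
    # isolation_level=None puts the connection in autocommit mode so the
    # code below can issue BEGIN/COMMIT explicitly.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY, "
        "payload TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'queued', "
        "attempts INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def enqueue(conn, payload):
    return conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,)).lastrowid


def claim(conn):
    """Atomically pick the oldest queued job and mark it running."""
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock up front
    try:
        row = conn.execute(
            "SELECT id, payload FROM jobs "
            "WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE jobs SET status = 'running', attempts = attempts + 1 "
                "WHERE id = ?",
                (row[0],),
            )
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise


def finish(conn, job_id, ok):
    """Mark a job done, or put it back on the queue for a retry."""
    conn.execute(
        "UPDATE jobs SET status = ? WHERE id = ?",
        ("done" if ok else "queued", job_id),
    )
```

                              Because the state lives in a file, a crashed worker leaves its job visibly stuck in `running`, which a periodic sweep could requeue; that sweep is left out here.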

                              I don't think it's ever "sqlite for everything" as much as it is "sqlite in lots of places you probably didn't think to apply it."

                              kentonv/Cloudflare's work on sqlite at the edge might have made this thinking a bit more popular, but it was always around. https://blog.cloudflare.com/sqlite-in-durable-objects/

                              I suspect being aware of all those little neat cases and wanting to leverage sqlite for them may be an indicator of experience, rather than the opposite.

                                • droidjj

                                  today at 8:40 PM

                                  > Running my family's social media site off a single sqlite DB on a VPS is great. ~15 users, almost zero maintenance.

                                  Details, please!

                              • peterspath

                                today at 7:32 PM

                                That’s why there are billions of SQLite databases right?

                                SQLite is likely used more than all other database engines combined. Billions and billions of copies of SQLite exist in the wild. SQLite is found in:

                                Every Android device
                                Every iPhone and iOS device
                                Every Mac
                                Every Windows 10/11 installation
                                Every Firefox, Chrome, and Safari web browser
                                Every instance of Skype
                                Every instance of iTunes
                                Every Dropbox client
                                Every TurboTax and QuickBooks
                                PHP and Python
                                Most television sets and set-top cable boxes
                                Most automotive multimedia systems
                                Countless millions of other applications

                                https://sqlite.org/mostdeployed.html

                                  • mr_toad

                                    today at 7:39 PM

                                    That’s a comprehensive list of single user devices.

                                      • larubbio

                                        today at 8:55 PM

                                        'production' doesn't equal 'multi-user concurrent access'. There are production uses where sqlite is a valid choice even if it may not be the best choice for multi-user production use cases.

                                        • ibejoeb

                                          today at 8:25 PM

                                          Single-user, a single natural person, doesn't strictly mean single-accessor though. I don't think anyone here is suggesting that sqlite is a viable replacement for any networked client/server postgresql system, but it is certainly capable of handling more than the most basic 1:1 tasks. Beyond that, the point is that you only need a file, so when you have natural data boundaries, a lot of problems decompose to that single user/single concern paradigm.

                                      • pibaker

                                        today at 7:51 PM

                                        GP calls out concurrency as a weakness of SQLite. Most of the examples here don't experience the load that even a moderately sized web service experiences day to day.

                                        And no, being a part of the python standard library doesn't mean it is being used by the average python user. These days I'd say at least half of them are just there for machine learning.

                                        • ksd482

                                          today at 7:52 PM

                                          levkk is talking about concurrency. The list you gave doesn't explain high concurrency requirements for usage.

                                            • rpdillon

                                              today at 8:02 PM

                                              My read is that levkk is conflating concurrency with "real production apps" and this whole thread is starting to surface that "real production apps" and "high concurrency" are not measuring the same thing at all.

                                              Sqlite is used in real production apps more than any other database.

                                              Sqlite is also weak at any sort of write concurrency.

                                              Both can be true.

                                          • petcat

                                            today at 7:57 PM

                                            sqlite is great for the contacts app on your phone, but that's it.

                                            Hipp even said that it is not a replacement for a real multi-user, concurrent RDBMS. Its primary competitor is "fsync".

                                        • bborud

                                          today at 7:26 PM

                                          Computer science no more gets its hands dirty with concrete software than physics is primarily about building bridges.

                                          It is not «a foundational principle of computer science».

                                          • bastardoperator

                                            today at 8:57 PM

                                            Are you one of my enterprise customers? What if your workload does not require write concurrency?

                                            • MagicMoonlight

                                              today at 8:51 PM

                                              How many production apps do you think have enough users to justify these huge DB servers?

                                              • sevenzero

                                                today at 7:05 PM

                                                Isn't concurrency also limited by your machine's disk speed for writes? What difference does it make if you write sequentially vs concurrently? Why does concurrency even matter for databases?

                                                  • malisper

                                                    today at 7:31 PM

                                                    > Isn't concurrency also limited by your machine's disk speed for writes? What difference does it make if you write sequentially vs concurrently? Why does concurrency even matter for databases?

                                                    For a simplified example, having three processes reading blocks X, Y, Z in parallel is much faster than having a single process read block X, wait for the read to finish, read block Y, wait for the read to finish, read block Z and wait for the read to finish.

                                                    • refulgentis

                                                      today at 7:10 PM

                                                      > Isn't concurrency also limited by your machine's disk speed for writes

                                                      Yes, in theory: given a large enough database, and a disk that can only do one operation at a time, and a large enough operation that touches enough of the database. In practice, in a SQLite single tenant scenario? No, not at all.

                                                      > what difference does it make if you write sequentially vs concurrently. Why does concurrency even matter for databases?

                                                      As soon as your codebase involves reacting to events independently of a user taking action it becomes a practical concern. Generally, this is a broad question and has 1,000,000 answers.

                                                      EDIT: Originally I had "I think you understand generally, no?" appended but realized that's not helpful at all, if you did, you wouldn't be asking.

                                                      Something that may help is imagining what'd happen if a DB wasn't thread safe / didn't allow multiple writers. E.g., in SQLite's case, it allows multiple write operations to take place but there's a one-at-a-time queue. If we didn't have databases that were able to execute multiple writes simultaneously, you'd need a separate database for each concurrent writer you expect, and you'd effectively have a global lock. Orderly scaling would be ~impossible unless you did something crazy like have a single server per user.

                                                        • sevenzero

                                                          today at 7:20 PM

                                                          I guess I need to dive deeper into this as I do not understand the implications you gave me, but I appreciate the attempt. Generally I understand why concurrency is good in many cases, I just don't get why it's important for database stuff too.

                                                          Edit: thanks for clarifying in the edit, makes a lot more sense.

                                                            • strbean

                                                              today at 8:00 PM

                                                              Imagine if every tweet had to go through a one-at-a-time queue before being persisted. There are about 6000 tweets per second, so you would have to be able to save them at <0.17ms per tweet or else you would become backlogged. If you are getting backlogged, you have to buffer those incoming tweets somewhere until they can be written, and eventually that buffer gets full and you start losing tweets.
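
                                                              The arithmetic behind that budget, as a small sketch. It assumes a perfectly steady arrival rate and a fixed per-write latency, which real traffic does not have; the function names are made up.

```python
def per_write_budget_ms(writes_per_second):
    """Longest each write may take if writes are strictly one at a time."""
    return 1000.0 / writes_per_second


def backlog_after(seconds, arrival_rate, write_latency_ms):
    """Items still waiting after `seconds` of sustained load."""
    service_rate = 1000.0 / write_latency_ms  # writes completed per second
    return max(0.0, (arrival_rate - service_rate) * seconds)
```

                                                              At 6000 writes per second the budget is 1000 / 6000, roughly 0.167 ms each. If each write took 0.25 ms instead, the queue would drain only 4000 per second and fall behind by 2000 items every second, or 120,000 after one minute.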

                                                                • goobatrooba

                                                                  today at 9:01 PM

                                                                  Maybe that too is a naive question, but there's a large scale between single user and 6000 tweets per second - most of our apps will never reach anything approaching even one save a second. So where to draw the line? So far I have gone the sqlite route for my hobby apps as it's so easy to handle and doesn't require setting up two docker containers for a single app. Am I painting myself into a corner in case my apps ever do become relevant?

                                                                  • sevenzero

                                                                    today at 8:21 PM

                                                                    While I understand your point and like the explanation, I gotta make the joke that some Tweets should be lost

                                                    • dboreham

                                                      today at 8:59 PM

                                                      And of course there are now several responses proving your point.

                                                      • O3marchnative

                                                        today at 7:21 PM

                                                        > This is a foundational principle of computer science

                                                        How exactly is this a foundational principle of computer science?

                                                        • teaearlgraycold

                                                          today at 6:56 PM

                                                          Well if you run a tiny single-threaded app then SQLite is a nice simplification over spinning up a separate machine for Postgres.

                                                            • eterm

                                                              today at 8:20 PM

                                                              Or you can run postgres on the same machine as the application, which lets you much more easily migrate if the time comes when you need to scale to multiple application servers.

                                                              There's a world between "local file" and "network DB server", running a DB server locally has lots of benefits from being able to easily query from outside if needed to forcing you to consider concurrency without the latency overhead of a network hop.

                                                                • s_ting765

                                                                  today at 8:43 PM

                                                                  This decision tree doesn't make much sense to me. Why would someone forgo performance today in favor of adding a completely unnecessary network layer to every DB query in order to "satisfy" future imaginary "scaling concerns"?

                                                                  • eddd-ddde

                                                                    today at 8:48 PM

                                                                    That's still orders of magnitude more complexity for no real benefit. A migration from sqlite to postgres, if really required, is not that hard.

                                                                • ai_fry_ur_brain

                                                                  today at 7:38 PM

                                                                  I use postgres for very simple apps. I have a Dockerfile I use in my boilerplate repo. It takes a single make command for me to build, start and run migrations. It's as simple as using sqlite.

                                                                    • tasuki

                                                                      today at 9:05 PM

                                                                      But now you have another process to babysit. How do you keep it healthy? And you have to ensure the client-server communication won't break.

                                                                      For me the main benefit of sqlite is that it's a library rather than an app.

                                                              • onlyrealcuzzo

                                                                today at 6:56 PM

                                                                It's almost as if Postgres isn't perfect, and one size shoe doesn't fit all.

                                                                Some people want some of the benefits you get from SQLite.

                                                                SQLite is obviously not perfect, but it's an incredible piece of software, and people regularly find good ways to make use of excellent pieces of software.

                                                                  • switchbak

                                                                    today at 8:06 PM

                                                                    I mean - I agree for the typical multi-user, SaaS webapp. But I don't think that's what these folks are proposing. If they are - yeesh, count me out.

                                                                    If on the other hand they're talking about single-user, software in the small - hell yeah. In fact, I'd also promote DuckDB in this regard (mostly for analytics) - with the power of a single machine these days, you can do a surprising amount and never have to worry about distribution. Unless you know you'll have to, in which case you're probably just digging yourself a hole?

                                                                      • lunar_mycroft

                                                                        today at 8:32 PM

                                                                        The typical multi-user SaaS webapp doesn't have anywhere near enough users to overwhelm a single SQLite instance. Of the few that do succeed to the point where that's no longer true, a significant fraction can use techniques like sharding to stretch SQLite further.

                                                                    • fragmede

                                                                      today at 7:04 PM

                                                                      So teach them. If you want to bring up computer science fundamentals, the question is where does SQLite sit with regards to the CAP theorem. Consistency, Availability, and Partition tolerance. SQLite isn't a distributed system, so there are no partitions to tolerate, so it's a CA system. Other databases make different tradeoffs. For systems that don't need concurrent writes, SQLite is pretty great! There are no users to manage, no permissions, no daemon to run, no server and port to mix up. Just open a file on disk using a library.

                                                                        • refulgentis

                                                                          today at 7:06 PM

                                                                          Strawman, no? "run an Obelisk server with a SQLite database", now we're distributed.

                                                                          SQLite is a nice local store. It's this server stuff that I don’t grok, well, yet. :)

                                                                            • 9rx

                                                                              today at 8:37 PM

                                                                              In the beginning apps and SQL were co-mingled. Oracle eventually came along and noticed that people wanted SQL on the network so that many different apps, running on different computers, could all access the same data. But then people realized that clients really want rich, 'tree'-like data, not simple rows and columns, so people started sticking networked databases in front of networked databases to serve as a transformation system. And now people are realizing that the second networked database layer is redundant and never used beyond what is required for the client-facing network database, so they are moving the storage back into the first network database layer, just like Oracle did all those years ago. What is old is new again.

                                                                              • fragmede

                                                                                today at 7:26 PM

                                                                                What changed is SSDs. SSDs mean that local access is faster than hitting the network. An expensive SAN stopped making sense because of this in specific cases. So for read-heavy, or even read-only, database loads, you copy the SQLite file to the node that's processing the file, and just update that file whenever the data does get changed.
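
                                                                                A sketch of the "copy the file to the node" step using the online backup API that Python's sqlite3 exposes as `Connection.backup`. The paths, the `kv` table, and the helper names are hypothetical, and how the snapshot actually gets shipped between machines is left out.

```python
import os
import sqlite3
import tempfile


def publish_snapshot(source_path, dest_path):
    """Write a consistent copy of the source DB, then swap it into place."""
    tmp_path = dest_path + ".tmp"
    src = sqlite3.connect(source_path)
    dst = sqlite3.connect(tmp_path)
    try:
        src.backup(dst)  # consistent copy even if the source is being written
    finally:
        dst.close()
        src.close()
    # Rename so readers never observe a half-written file.
    os.replace(tmp_path, dest_path)


def open_read_only(path):
    """Open a snapshot so that accidental writes fail."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


# Demo in a throwaway directory standing in for "primary" and "node".
workdir = tempfile.mkdtemp()
primary = os.path.join(workdir, "primary.db")
replica = os.path.join(workdir, "replica.db")

writer = sqlite3.connect(primary)
writer.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
writer.execute("INSERT INTO kv VALUES ('greeting', 'hello')")
writer.commit()

publish_snapshot(primary, replica)
reader = open_read_only(replica)
```

                                                                                A long-lived reader keeps seeing the snapshot it opened, so it has to reopen the file to pick up a newer one.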

                                                                        • pstuart

                                                                          today at 8:37 PM

                                                                          Sure, SQLite doesn't solve every problem -- but in many cases it solves the need at hand with the reward of one less piece of infra required to support it.

                                                                          I see obsessions with tooling/solutions constantly from experienced devs who fall in love with the original solution and think it's the only way to do things -- so the experience part cuts both ways.

                                                                          • BoredPositron

                                                                            today at 7:45 PM

                                                                            I worked on an app that had sqlite databases per user... it was fine.
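
                                                                            Roughly what a database-per-user layout can look like, as a sketch: `DATA_DIR`, the `notes` table, and the id rule are all assumptions for illustration, not a description of that app.

```python
import os
import re
import sqlite3
import tempfile

DATA_DIR = tempfile.mkdtemp()  # hypothetical; a real app would use a fixed directory


def user_db(user_id):
    """Open (creating if needed) the database file that belongs to one user."""
    # The id becomes part of a file name, so restrict it to a safe alphabet.
    if not re.fullmatch(r"[A-Za-z0-9_-]{1,64}", user_id):
        raise ValueError("unsafe user id")
    conn = sqlite3.connect(os.path.join(DATA_DIR, f"{user_id}.db"))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn
```

                                                                            Each user's writes only contend with that user's own file, and exporting or deleting an account is a file copy or a file delete.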

                                                                            • doctorpangloss

                                                                              today at 7:19 PM

                                                                              sqlite is more like a file format than a database. it competes with .xlsx.

                                                                              > "SQLite for everything" crowd is a little bit inexperienced.

                                                                              every time i see it in a real application, it becomes a huge focus of issues (for example: jellyfin, hermes, openwebui, comfyui)

                                                                                • fragmede

                                                                                  today at 7:28 PM

                                                                                  What kind of issues commonly arise?

                                                                              • refulgentis

                                                                                today at 7:03 PM

                                                                                I absolutely 100% do not understand it either. At all. Every time I have tried over the last year or two, I come away with the conclusion that it's something that sounds cool (to me too!) but is guaranteed to cause more problems than more obvious solutions.

                                                                                That being said, I'd kill for someone who used it and benefited to explain it to me in a practical sense. (Specifically where syncing is involved, and syncing a subset of the SQLite database is necessary. If it's "just" a document store that's treated like a blob for syncing/backup, that's familiar. If it's all in one storage but only local, that's familiar.)

                                                                                Re: TFA, I guess it would have helped if I knew what Obelisk was, which is on me, and a more in-depth explanation of how this ties into AI/agents, which is on the industry/writer.

                                                                            • m2f2

                                                                              today at 8:13 PM

                                                                              There's a wide gap between files and multipartition databases. Sorry, but running databases in a container is not for me whenever real production stuff is on the table.

                                                                              Personally, lots of ETL can just be taken care of locally without involving enterprise databases. In such cases, DuckDB is 5x-10x better than SQLite and orders of magnitude simpler/faster than spinning up a dedicated Postgres database.

                                                                              For general scripting, there's no comparison between a 20-line awk script and a much cleaner, more robust, more maintainable equivalent SQL script based on DuckDB.

                                                                              I just hope MotherDuck doesn't need to pump and dump for an IPO; it would be sad to lose that tool to the usual corporate greed.

                                                                                • szarnyasg

                                                                                  today at 8:46 PM

                                                                                  Hello, DuckDB devrel here. First, thanks for the kind words :)

                                                                                  Second, it's funny you should mention the 20-line awk script. I was making a very similar argument yesterday at the Ubuntu Summit: at some point, using shell scripts with GNU coreutils becomes impractical, while DuckDB SQL scripts scale better in terms of complexity and maintainability (and often also performance). My slides are here: https://blobs.duckdb.org/slides/duckdb-ubuntu-summit-2026.pd... (pages 32 to 36)

                                                                                  Third, MotherDuck develops a closed-source DBaaS on DuckDB. They build on DuckDB, and you connect to MotherDuck with DuckDB, but they are a separate VC-funded company headquartered in Seattle. DuckDB is developed by DuckDB Labs, a bootstrapped (revenue-funded) company in Amsterdam. And the IP of the project is in a third organization: a Dutch non-profit called the DuckDB Foundation. For details, see https://duckdb.org/faq#how-are-duckdb-the-duckdb-foundation-...

                                                                              • shukantpal

                                                                                today at 6:47 PM

                                                                                SQLite is surprisingly performant for single node applications even when comparing to Postgres. Postgres consumes a lot more memory and requires IO to hop through IPC whereas you can keep everything in process in SQLite with a shared connection pool.

                                                                                I've been testing different storage engines for my agent harness and I can get up to 7.5k concurrent sessions on a single vCPU with SQLite, whereas Postgres crashes or runs out of connections.

                                                                                [0] https://github.com/impalasys/talon/pull/23#issuecomment-4577...

                                                                                  • bob1029

                                                                                    today at 7:24 PM

                                                                                    When used properly, SQLite is effectively an in-process method invoke. If the only remaining things in the way are your runtime, kernel, file system and a local NVMe storage device, you may find it massively outperforms hosted alternatives.

                                                                                    Leaving the current thread is where you lose the game in terms of latency. SQLite can work on timescales measured in microseconds if you don't force interthread communication.

                                                                                    • onlyrealcuzzo

                                                                                      today at 6:53 PM

                                                                                      > SQLite is surprisingly performant for single node applications even when comparing to Postgres.

                                                                                      In the context of SQLite being understood to be a quite excellent piece of software - shouldn't we expect it to be?

                                                                                      In the context of a single-node, Postgres is overkill. It should not be expected to be competitive with SQLite.

                                                                                      This is almost like benchmarking an in-memory HashMap against Redis and being surprised that it performs well in ideal conditions.

                                                                                        • shukantpal

                                                                                          today at 6:59 PM

                                                                                          Yes, agreed on SQLite/Postgres. But I'm going to benchmark RocksDB next and see what the performance characteristics are. I suspect the LSM tree storage engine of RocksDB might perform better since agents are so write heavy when running highly concurrent workloads. After all, you are streaming LLM tokens into disk and fanning them out to subscribed clients.

                                                                                            • onlyrealcuzzo

                                                                                              today at 7:01 PM

                                                                                              You might want to start here: https://docs.cozodb.org/en/latest/releases/v0.3.html

                                                                                                • andriy_koval

                                                                                                  today at 7:15 PM

                                                                                                  That project has 0 commits for 2 years.

                                                                                                    • onlyrealcuzzo

                                                                                                      today at 8:02 PM

                                                                                                      What does that have to do with their research on the exact topic OP was looking into?

                                                                                                        • andriy_koval

                                                                                                          today at 8:09 PM

                                                                                                          Abandoned research of unknown quality is a strong signal to deprioritize that direction.

                                                                                                      • recursive

                                                                                                        today at 7:37 PM

                                                                                                        Sounds pretty stable

                                                                                    • mburaksayici

                                                                                      today at 8:56 PM

                                                                                      Agreed on that point. I needed a NoSQL version for similar uses, so I used TinyDB: https://mburaksayici.com/blog/2024/09/21/easy-to-use-nosql-p...

                                                                                      • stephenlf

                                                                                        today at 7:44 PM

                                                                                        Can’t wait to see the next iteration of this idea with “Logs are all you need for durable workflows.”

                                                                                          • gchamonlive

                                                                                            today at 7:49 PM

                                                                                            Are logs all you need for durable workflows? I'm confused here. How would you persist and query nested or related data over logs? By logs I assume you mean something like Elasticsearch or Meilisearch?

                                                                                              • deathanatos

                                                                                                today at 8:48 PM

                                                                                                I assume they meant a log like a WAL. A WAL should be (quite literally?) all you need for durable workflows.

                                                                                                A distributed WAL (to survive a machine death) would also probably be something I'd want, and … something I'm not sure you're getting directly from SQLite.

                                                                                                • wolttam

                                                                                                  today at 8:03 PM

                                                                                                  Pretty much every durable system has an intent log of some sort. The log provides durability, the database system just integrates that log into a more queryable format.
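
                                                                                                  A minimal sketch of that split, using only the Python standard library: an append-only intent log that is fsynced before the write is acknowledged, plus a replay step that folds the log into a queryable form (a plain dict here). The record shape (`op`, `key`, `value`) and the function names are made up for illustration; real systems add checksums, log compaction and so on.

```python
import json
import os


def append_intent(path, record):
    """Append one JSON record and force it to stable storage before returning."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())  # the record is only "durable" once this returns


def replay(path):
    """Rebuild the queryable state by replaying the log in order."""
    state = {}
    if not os.path.exists(path):
        return state
    with open(path, encoding="utf-8") as f:
        for line in f:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                break  # torn final write after a crash: ignore the tail
            if record["op"] == "set":
                state[record["key"]] = record["value"]
            elif record["op"] == "delete":
                state.pop(record["key"], None)
    return state
```

                                                                                                  The log is the source of truth; the dict (or a database table) is just a view that can be thrown away and rebuilt from it.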

                                                                                                    • notawhitemale

                                                                                                      today at 8:32 PM

                                                                                                      [dead]

                                                                                                  • fourside

                                                                                                    today at 8:26 PM

                                                                                                    I read the parent's comment as sarcasm and not a serious suggestion.

                                                                                                • this_user

                                                                                                  today at 8:15 PM

                                                                                                  Shortly followed by:

                                                                                                  "Sockets are all you need for durable workflows" and then finally "Kernel primitives are all you need for durable workflows."

                                                                                                  But seriously, part of being a professional is using the right tool for the job.

                                                                                              • golem14

                                                                                                today at 7:09 PM

                                                                                                Litestream releases 5.9 and newer have a bug that causes instances to sync an insane amount of data. A DB with <10K of data in it and practically no writes/reads causes something like 10GB of daily replication traffic. For my toy project, that got needlessly expensive.

                                                                                                • skybrian

                                                                                                  today at 8:42 PM

                                                                                                  Instead of "just use Litestream," I'd like to see a review of different object stores one could use and which ones work well with Litestream. Is there a nice object store I could run in another Linux VM? As a hobbyist, which services providing an S3-like API make the most sense?

                                                                                                • yokoprime

                                                                                                  today at 7:08 PM

                                                                                                  If you're just doing workflows from a single node, I guess it can be OK as long as there's a single writer. But for scaling across multiple servers, it clearly is not all you need.

                                                                                                  • kubik369

                                                                                                    today at 6:25 PM

                                                                                                    Meta comment: This is a domain under my country's TLD (Slovakia), and it is one of the handful of words that form a real word together with the TLD in my language (and, coincidentally, also in English). Every now and then, I check a retrograde dictionary for domains that have this property, and the root of this particular domain used to have a Roundcube email server on it (can be checked on archive.org). After further checking, the local company actually named themselves Obeli s.r.o. (s.r.o. is Ltd), presumably so that they could use a domain that is a real word when said together with the TLD. (EDIT:) Forgot to write the thing I wanted to mention in the first place: it appears the domain must have lapsed and/or the author bought it from the company that was using it.

                                                                                                    Another fascinating fact: our country's TLD was stolen Ocean's 11 style (I am not kidding). After Czechoslovakia split into the Czech Republic and the Slovak Republic, the newly created Slovak .sk TLD was under the care of people from the local university. The university also had some offices that it was leasing out. Someone leased this office space (EDIT: this is important, as it means they had the same physical address) and created a company with the same name as the NGO that was taking care of the domain, so e.g. the NGO was named "My Company o.z." and the perpetrator created a "My Company s.r.o." (our country's version of the American Ltd). This person then wrote to ICANN to change the address to that of "My Company s.r.o.", presumably under the pretense that this was just an administrative error, and from that point on they functionally had custody of the TLD. I was not able to find out how they did it technically, but I presume they persuaded ICANN to then point to their servers instead of the real ones. After this happened, it seems that no one noticed for some time. When people did notice, they tried taking it back, but they weren't able to. For some inexplicable reason, the government of that time (Šuster era, early 2000s) gave the new company a contract that was functionally uncancellable from the government side. Later governments made this even more uncancellable, and in 2017 the then Minister of IT (and, as of today, president!) Pellegrini made the contract literally uncancellable. As a result, we have one of the most expensive domains around (€18/year, rising each year for no good reason). (EDIT:) The company running our country's TLD is now a foreign entity, since the whole thing has been sold (multiple owners over time), and we as a country have no control over it, if I understand correctly.

                                                                                                    I might have gotten some details wrong as I am writing this from my memory of researching it a couple of years back, but you get the idea, crazy stuff. Here is an article in Czech [0] that tells the story a bit better, but you have to translate it.

                                                                                                    [0] https://www.root.cz/clanky/pribeh-domeny-sk-aneb-kradez-za-b...

                                                                                                    // EDIT: I have found that the article actually links the movement to return the TLD back [1]. It also has a story tab [2], so they have something much more precise than the paraphrasing I wrote.

                                                                                                    [1] https://www.nasadomena.sk/

                                                                                                    [2] https://www.nasadomena.sk/historia/

                                                                                                    • Xcelerate

                                                                                                      today at 6:05 PM

                                                                                                      Haha, I just started doing this on my own. Found it helps the agents preserve state better. I typically ask them to design a DAG first based on a set of specifications and then execute it (each step stores something in a SQLite DB). Iteration is pretty simple then because I just ask for a tweak to one or two steps of the DAG, and then to re-run.
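
                                                                                                      A minimal sketch of that pattern, assuming the steps are already listed in topological order and their outputs are JSON-serializable. The `step_results` table and the `run_dag` helper are hypothetical names, not anything from the article.

```python
import json
import sqlite3


def run_dag(conn, steps, force=()):
    """Run steps in order, skipping any whose result is already stored.

    `steps` is a list of (name, dependency_names, function) tuples in
    topological order; `force` names steps to re-run even if cached.
    Returns the names of the steps that were actually executed.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS step_results ("
        "name TEXT PRIMARY KEY, output TEXT NOT NULL)"
    )
    stale = set(force)  # steps whose stored output must not be reused
    executed = []
    for name, deps, fn in steps:
        if stale.intersection(deps):
            stale.add(name)  # an upstream step was re-run, so re-run this too
        cached = conn.execute(
            "SELECT 1 FROM step_results WHERE name = ?", (name,)
        ).fetchone()
        if cached is not None and name not in stale:
            continue
        inputs = {}
        for dep in deps:
            row = conn.execute(
                "SELECT output FROM step_results WHERE name = ?", (dep,)
            ).fetchone()
            inputs[dep] = json.loads(row[0])
        result = fn(inputs)
        with conn:  # commit each step's output in its own transaction
            conn.execute(
                "INSERT OR REPLACE INTO step_results (name, output) VALUES (?, ?)",
                (name, json.dumps(result)),
            )
        stale.add(name)
        executed.append(name)
    return executed
```

                                                                                                      Tweaking one step then amounts to calling `run_dag(conn, steps, force=["that_step"])`: everything upstream is read back from the database, and only the changed step and its dependents run again.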

                                                                                                      Funny how people are independently converging on similar patterns of "what works" here. Still feels like we're in the wild west with all these ad-hoc patterns of agent orchestration that people are coming up with.

                                                                                                        • zrail

                                                                                                          today at 7:58 PM

                                                                                                          Same. The prompt was essentially: every checkbox in this PLAN.md should be a task in SQLite.
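
                                                                                                          Roughly what that ends up looking like, as a sketch: parse Markdown task-list lines and insert one row per checkbox. The `tasks` schema and the helper names are assumptions for illustration; after the first import the database, not the Markdown file, is treated as the source of truth.

```python
import re
import sqlite3

# Matches Markdown task-list items such as "- [ ] Write schema" or "* [x] Done".
CHECKBOX = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*\S)\s*$")


def load_plan(conn, plan_text):
    """Create one task row per checkbox; tasks that already exist are left alone."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL UNIQUE, "
        "done INTEGER NOT NULL DEFAULT 0)"
    )
    with conn:
        for line in plan_text.splitlines():
            match = CHECKBOX.match(line)
            if match:
                done = 0 if match.group(1) == " " else 1
                conn.execute(
                    "INSERT OR IGNORE INTO tasks (title, done) VALUES (?, ?)",
                    (match.group(2), done),
                )


def next_task(conn):
    """Return (id, title) of the first unfinished task, or None."""
    return conn.execute(
        "SELECT id, title FROM tasks WHERE done = 0 ORDER BY id LIMIT 1"
    ).fetchone()


def mark_done(conn, task_id):
    """Record completion so a restarted agent does not repeat the task."""
    with conn:
        conn.execute("UPDATE tasks SET done = 1 WHERE id = ?", (task_id,))
```

                                                                                                          The agent loop is then just `next_task` and `mark_done` until `next_task` returns `None`.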

                                                                                                      • orliesaurus

                                                                                                        today at 8:24 PM

                                                                                                        Surprised no one has mentioned Turbopuffer yet [1] which natively supports dense vector similarity and BM25 keyword indexes out of the box

                                                                                                        [1]. https://turbopuffer.com/

                                                                                                        • localhoster

                                                                                                          today at 7:01 PM

                                                                                                          Idk if this article was vibe written or the author just "got adjusted", but it clearly reads that way, and it's unreadable. Man, this is becoming annoying.

                                                                                                          • sgloutnikov

                                                                                                            today at 6:14 PM

                                                                                                            It's close enough that DBOS does support SQLite. [0] The default for prototyping is SQLite, but sure, you can run it in production if you want.

                                                                                                            Obligatory list of workflow engines and libraries because it's such a common need that a lot have rolled their own. [1]

                                                                                                            [0] https://docs.dbos.dev/python/tutorials/database-connection

                                                                                                            [1] https://github.com/meirwah/awesome-workflow-engines

                                                                                                            • 0x59

                                                                                                              today at 7:07 PM

                                                                                                              Big complex data model with ambiguous query patterns? Postgres

                                                                                                              Small, well defined, data model with known query patterns? Bespoke model

                                                                                                              There probably is a place for sqlite, but my project space so far hasn't aligned well with it.

                                                                                                                • asdff

                                                                                                                  today at 7:22 PM

                                                                                                                  Probably going to get some winces for this but I do everything with flat files. Maybe my data aren't massive enough, but I mean I can do the relational thing by just having these metadata in some column, and returning rows that contain my desired information in these columns. Even if the file were too big to fit into memory one could just subset chunks of it and chew through. All this can be done with no dependencies, just base libraries of a lot of languages.
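
                                                                                                                  For what it's worth, a sketch of what I mean, using only the standard library. The file layout (a CSV with a header row) and the `select_rows` name are just an example.

```python
import csv


def select_rows(path, column, wanted):
    """Yield rows whose `column` equals `wanted`, reading one row at a time.

    Because this is a generator over a streaming reader, the file never has
    to fit in memory.
    """
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row[column] == wanted:
                yield row
```

                                                                                                                  Something like `for row in select_rows("samples.csv", "project", "alpha"): ...` then chews through the matching rows without loading the rest.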

                                                                                                              • netik

                                                                                                                today at 7:19 PM

                                                                                                                Until you scale past one machine…

                                                                                                                • bze12

                                                                                                                  today at 7:33 PM

                                                                                                                  Isn’t this very similar to cloudflare durable objects & workflows?

                                                                                                                  • ChrisArchitect

                                                                                                                    today at 7:20 PM

                                                                                                                    Related:

                                                                                                                    Building durable workflows on Postgres

                                                                                                                    https://news.ycombinator.com/item?id=48313530

                                                                                                                  • EGreg

                                                                                                                    today at 6:03 PM

                                                                                                                    Files is all you need.

                                                                                                                    https://xkcd.com/378/

                                                                                                                      • tclancy

                                                                                                                        today at 6:06 PM

                                                                                                                        Post It Notes will do if you have a good system.

                                                                                                                        • contingencies

                                                                                                                          today at 6:49 PM

                                                                                                                          Those who don't understand Unix are condemned to reinvent it, poorly. - Henry Spencer .. via https://github.com/globalcitizen/taoup

                                                                                                                      • orf

                                                                                                                        today at 6:04 PM

                                                                                                                        > The caveat is that Litestream replication is asynchronous. A restore can miss the newest local writes if the SQLite volume disappears before they are copied. That is fine for many AI and experimentation workflows

                                                                                                                        In short: SQLite is not all you need, unless you’re just experimenting and don’t actually care about durability, in which case you also need Litestream + object storage.

                                                                                                                        Right.

                                                                                                                          • gwking

                                                                                                                            today at 6:24 PM

                                                                                                                            The suitability of Litestream for production disaster recovery is also an open question in my mind. I used 0.3.x for several years and when I tried to upgrade to the 0.5.x series there were runaway disk usage problems that would have caused downtime had they made it to prod. As far as I can tell these have not been entirely addressed, although recent bug reports suggest that they might be getting closer.

                                                                                                                            I want to love it, and I don't take open source projects like this for granted. But during my last production upgrade I chose to decommission Litestream in favor of a dumber, less granular solution using sqlite3_rsync and nightly backups because there is no point in using a backup system that is not rock solid.

                                                                                                                            • 0cf8612b2e1e

                                                                                                                              today at 6:24 PM

                                                                                                                              Postgres also does not synchronously replicate for free. You can set up both to get a confirmation write if you require that durability.

                                                                                                                                • orf

                                                                                                                                  today at 6:28 PM

                                                                                                                                  > postgresql also does not synchronously replicate

                                                                                                                                  By default. Generally your primary database is in a completely different failure category than a kubernetes node running an ephemeral workflow pod.

                                                                                                                                    • 0cf8612b2e1e

                                                                                                                                      today at 6:45 PM

                                                                                                                                      Either you have durable storage or you do not. SQLite and Postgres can both ensure local durability of commits. If you want distributed durability, you need to ship that data elsewhere. Whether that is another Postgres node, an object store, or whatever, it's still an external dependency.

                                                                                                                                  • paulddraper

                                                                                                                                    today at 6:31 PM

                                                                                                                                    Not for free, but without needing additional software.

                                                                                                                                      synchronous_commit = on

                                                                                                                                      • 0cf8612b2e1e

                                                                                                                                        today at 6:38 PM

                                                                                                                                        That’s about the local transaction, not replication. SQLite WAL also gives you strict durability.

                                                                                                                                          PRAGMA synchronous = full
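For anyone who wants to try the SQLite side of this, here is a minimal sketch using Python's standard `sqlite3` module. The `steps` table, the `open_durable` helper, and the file name are made up for illustration. It only covers local durability (in WAL mode, `synchronous = FULL` syncs the WAL on each commit); it says nothing about replication.

```python
import os
import sqlite3
import tempfile


def open_durable(path: str) -> sqlite3.Connection:
    """Open a SQLite database in WAL mode with synchronous=FULL (hypothetical helper)."""
    conn = sqlite3.connect(path)
    # WAL mode is persistent: it is recorded in the database file itself.
    conn.execute("PRAGMA journal_mode = WAL")
    # synchronous is per-connection and must be set every time the database is opened.
    conn.execute("PRAGMA synchronous = FULL")
    return conn


# Use a throwaway directory so the sketch does not touch real data.
db_path = os.path.join(tempfile.mkdtemp(), "workflow.db")
conn = open_durable(db_path)

# Read the settings back to confirm they took effect.
journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]  # 2 means FULL

# Illustrative table for workflow step state.
conn.execute(
    "CREATE TABLE IF NOT EXISTS steps (id INTEGER PRIMARY KEY, name TEXT, status TEXT)"
)
with conn:  # commits on success, rolls back on exception
    conn.execute(
        "INSERT INTO steps (name, status) VALUES (?, ?)", ("charge_card", "done")
    )

rows = conn.execute("SELECT name, status FROM steps").fetchall()
conn.close()
```

If the machine or disk is lost, the committed rows go with it, which is the point being argued in this subthread.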

                                                                                                                                • bootsmann

                                                                                                                                  today at 6:22 PM

                                                                                                                                  S3 is strongly consistent; if you need it anyway, you can just use S3 keys to deconflict and store the workflow state.
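One possible reading of "use S3 keys to deconflict", sketched with an in-memory stand-in rather than a real S3 client. It assumes the object store offers an atomic create-if-absent write; `InMemoryStore`, `put_if_absent`, `claim_step`, and the key layout are all hypothetical names for illustration.

```python
import json


class InMemoryStore:
    """Hypothetical stand-in for an object store with create-if-absent writes."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put_if_absent(self, key: str, body: bytes) -> bool:
        # Returns False instead of overwriting when the key already exists.
        if key in self._objects:
            return False
        self._objects[key] = body
        return True

    def get(self, key: str) -> bytes:
        return self._objects[key]


def claim_step(store: InMemoryStore, workflow_id: str, step: str, worker: str) -> bool:
    """Try to claim a workflow step; only the first writer of the key wins."""
    key = f"workflows/{workflow_id}/steps/{step}"  # assumed key layout
    body = json.dumps({"worker": worker}).encode()
    return store.put_if_absent(key, body)


store = InMemoryStore()
first = claim_step(store, "wf-1", "charge_card", "worker-a")
second = claim_step(store, "wf-1", "charge_card", "worker-b")
owner = json.loads(store.get("workflows/wf-1/steps/charge_card"))["worker"]
```

The key itself acts as the lock: the second worker sees the claim fail and can skip the step. A real store would need the create-if-absent check to be atomic on the server side for this to hold across machines.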

                                                                                                                                    • orf

                                                                                                                                      today at 6:25 PM

                                                                                                                                      Yes, but directly using s3 as a key-value database is completely different from using SQLite + litestream.

                                                                                                                                  • paulddraper

                                                                                                                                    today at 6:32 PM

                                                                                                                                    "Durable workflows without the durability"

                                                                                                                                    That's distributed workflows :)

                                                                                                                                    • dilyevsky

                                                                                                                                      today at 7:14 PM

                                                                                                                                      i mean it's durable as long as nothing crashes or litestream has a data corruption bug which only happens every other release...
