
SQLite is all you need for durable workflows

206 points - today at 5:54 PM


bitexploder
today at 6:17 PM
I started setting up my workflows using Temporal. It deploys as a relatively lightweight local app, and for an isolated local installation it uses SQLite. It makes dealing with API retries and organizing workflows and tasks really simple. I recommend giving it a try. It is, philosophically, exactly what this article is suggesting, but it adds an incredibly rich and flexible interface for agents to work with. Additionally, the web UI makes it very easy to inspect workflows, review agent execution, etc. Temporal also builds much higher reliability into your system, almost for free. Distributed and reliable systems are hard; don't reinvent the wheel, IMO.
If you find yourself wanting things like an easy way to introspect your SQLite database, figure out what is happening in a workflow, compose individual tasks, make workflows trivially callable, etc., give Temporal a look.
Alongside this, I have mostly moved away from files for agents. Markdown and JSON are great, but they also feel like traps when building smaller local apps. LLMs are great at SQLite, and you can render anything you want out of it (Markdown, JSON, etc.). It saves a lot of tokens when an agent can just query a specific row instead of having to fire up jq or grep through Markdown. You get a nice, portable, self-contained data management system that encourages agents to be more disciplined about how they structure their data than a pile of files does. It also scales into MySQL/Postgres if your little local projects outgrow it or become more formal, since you already have a schema and discipline around your data.
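A minimal sketch of the "query a row instead of grepping Markdown" idea, using Python's built-in sqlite3. The `tasks` table, its columns, and the helper names are hypothetical, not from Temporal or any other tool; the point is that one primary-key lookup returns exactly the record an agent needs, and JSON or Markdown can be rendered from it on demand.

```python
import json
import sqlite3

# Hypothetical schema for agent task state (illustrative only).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows behave like read-only mappings
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT NOT NULL, "
    "status TEXT NOT NULL, notes TEXT)"
)
conn.executemany(
    "INSERT INTO tasks (title, status, notes) VALUES (?, ?, ?)",
    [
        ("Fetch API data", "done", "Retried twice"),
        ("Summarize results", "pending", None),
    ],
)
conn.commit()


def get_task(task_id: int) -> dict:
    """Fetch a single row by primary key instead of scanning a file."""
    row = conn.execute("SELECT * FROM tasks WHERE id = ?", (task_id,)).fetchone()
    if row is None:
        raise KeyError(task_id)
    return dict(row)


def render_json(task: dict) -> str:
    return json.dumps(task, sort_keys=True)


def render_markdown(task: dict) -> str:
    lines = [f"## {task['title']}", f"- status: {task['status']}"]
    if task["notes"]:
        lines.append(f"- notes: {task['notes']}")
    return "\n".join(lines)


print(render_json(get_task(2)))
print(render_markdown(get_task(1)))
```

The same rows can back both a machine-readable view and a human-readable one, so the files become outputs rather than the source of truth.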
levkk
today at 6:54 PM
I don't understand this obsession with SQLite for real, production apps. SQLite is an embedded database, completely unsuitable for managing concurrency. This is what database _servers_ are for, e.g., Postgres, MySQL, etc. Their entire job is to allow you to modify data from multiple processes, on different machines, at the same time.
This is a foundational principle of computer science. It seems to me that the "SQLite for everything" crowd is a little bit inexperienced.
m2f2
today at 8:13 PM
There's a wide gap between files and multi-partition databases. Sorry, but running databases in a container is not for me whenever real production stuff is on the table.
Personally, lots of ETL can just be taken care of locally without involving enterprise databases. In such cases, DuckDB is 5x-10x better than SQLite and orders of magnitude simpler/faster than spinning up a dedicated Postgres database.
For general scripting, a 20-line awk script is no match for a much cleaner, more robust, more maintainable equivalent SQL script based on DuckDB.
I just hope MotherDuck doesn't need to pump and dump for an IPO; it would be sad to lose that tool to the usual corporate greed.
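A minimal sketch of swapping an awk-style group-and-sum for a SQL query. It uses Python's built-in sqlite3 as a stand-in so it runs without extra packages; DuckDB, which the comment recommends, runs the same kind of `GROUP BY` and can also query a CSV file directly by path, which would remove the load step. The CSV content and column names are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical input; in practice this would be a file on disk.
RAW_CSV = """region,amount
eu,10
us,5
eu,7
us,3
apac,4
"""


def totals_by_region(raw_csv: str) -> list[tuple[str, int]]:
    """Roughly: awk -F, 'NR>1 {s[$1]+=$2} END {for (k in s) print k, s[k]}'"""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    reader = csv.DictReader(io.StringIO(raw_csv))
    conn.executemany(
        "INSERT INTO sales VALUES (:region, :amount)",
        ({"region": r["region"], "amount": int(r["amount"])} for r in reader),
    )
    # Unlike the awk version, the output order is explicit and deterministic.
    rows = conn.execute(
        "SELECT region, SUM(amount) AS total FROM sales "
        "GROUP BY region ORDER BY total DESC, region"
    ).fetchall()
    conn.close()
    return rows


print(totals_by_region(RAW_CSV))
```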
shukantpal
today at 6:47 PM
SQLite is surprisingly performant for single node applications even when comparing to Postgres. Postgres consumes a lot more memory and requires IO to hop through IPC whereas you can keep everything in process in SQLite with a shared connection pool.
I've been testing different storage engines for my agent harness, and I can get up to 7.5k concurrent sessions on a single vCPU with SQLite, whereas Postgres crashes or runs out of connections.
[0] https://github.com/impalasys/talon/pull/23#issuecomment-4577...
mburaksayici
today at 8:56 PM
Agreed. I needed a NoSQL equivalent for similar use cases, so I used TinyDB: https://mburaksayici.com/blog/2024/09/21/easy-to-use-nosql-p...
stephenlf
today at 7:44 PM
Can’t wait to see the next iteration of this idea with “Logs are all you need for durable workflows.”
golem14
today at 7:09 PM
Litestream releases 5.9 and newer have a bug that causes instances to sync an insane amount of data. A DB with <10K of data in it and practically no reads or writes generates something like 10GB of daily replication traffic. For my toy project, that got needlessly expensive.
skybrian
today at 8:42 PM
Instead of "just use Litestream," I'd like to see a review of different object stores one could use and which ones work well with Litestream. Is there a nice object store I could run in another Linux VM? As a hobbyist, which services providing an S3-like API make the most sense?
yokoprime
today at 7:08 PM
If you're just running workflows from a single node, I guess it can be OK as long as there's a single writer. But for scaling across multiple servers, it clearly is not all you need.
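The single-writer arrangement mentioned above can be sketched, on one node, as a dedicated writer thread that owns the only write connection and drains a queue, so producer threads never contend for SQLite's write lock. The `SingleWriter` class, the `jobs` table, and the sentinel-based shutdown are illustrative assumptions, not an API from any library.

```python
import os
import queue
import sqlite3
import tempfile
import threading

_STOP = object()  # sentinel telling the writer thread to exit


class SingleWriter:
    """Hypothetical helper: serializes all writes through one thread and one connection."""

    def __init__(self, path: str) -> None:
        self._q: queue.Queue = queue.Queue()
        self._path = path
        self._ready = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        self._ready.wait()  # don't accept writes until the table exists

    def _run(self) -> None:
        conn = sqlite3.connect(self._path)  # created and used only in this thread
        conn.execute("CREATE TABLE IF NOT EXISTS jobs (name TEXT, state TEXT)")
        conn.commit()
        self._ready.set()
        while True:
            item = self._q.get()
            if item is _STOP:
                break
            sql, params = item
            with conn:  # one transaction per queued write
                conn.execute(sql, params)
        conn.close()

    def submit(self, sql: str, params: tuple) -> None:
        self._q.put((sql, params))

    def close(self) -> None:
        # Items are processed FIFO, so everything submitted before close() is written.
        self._q.put(_STOP)
        self._thread.join()


path = os.path.join(tempfile.mkdtemp(), "jobs.db")
writer = SingleWriter(path)
producers = [
    threading.Thread(
        target=writer.submit,
        args=("INSERT INTO jobs VALUES (?, ?)", (f"job-{i}", "queued")),
    )
    for i in range(50)
]
for p in producers:
    p.start()
for p in producers:
    p.join()
writer.close()

count = sqlite3.connect(path).execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
print(count)
```

This only serializes writers inside one process; it does nothing for multiple servers, which is the limit the comment points out.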
kubik369
today at 6:25 PM
Meta comment: This is a domain under my country's TLD (Slovakia), and it is one of a handful of domains that form a real word together with the TLD, both in my language and, coincidentally, in English. Every now and then, I check a retrograde dictionary for domains with this property, and the root of this particular domain used to host a Roundcube email server (you can check on archive.org). After further digging, it turned out the local company had actually named itself Obeli s.r.o. (s.r.o. is Ltd), presumably so that it could use a domain that is a real word when read together with the TLD. (EDIT:) Forgot to write the thing I wanted to mention in the first place: it appears the domain lapsed and/or the author bought it from the company that was using it.
Another fascinating fact: our country's TLD was stolen Ocean's 11 style (I am not kidding). After Czechoslovakia split into the Czech Republic and the Slovak Republic, the newly created Slovak .sk TLD was under the care of people from the local university. The university also had some offices that it leased out. Someone leased this office space (EDIT: this is important, as it means they had the same physical address) and created a company with the same name as the NGO that was taking care of the domain; e.g. if the NGO was named "My Company o.z.", the perpetrator created "My Company s.r.o." (our country's version of the American Ltd). This person then wrote to ICANN to change the name on record to "My Company s.r.o.", presumably under the pretense that this was just an administrative error, and from that point on they functionally had custody of the TLD. I was not able to find out how they did it technically, but I presume they persuaded ICANN to point to their servers instead of the real ones. After this happened, it seems that no one noticed for some time. When people did notice, they tried to take it back, but they weren't able to. For some inexplicable reason, the government at the time (Šuster era, early 2000s) gave the new company a contract that was functionally uncancellable from the government side. Later governments made it even more uncancellable, and in 2017 the then Minister of IT (and, as of today, president!) Pellegrini made the contract literally uncancellable. As a result, we have one of the most expensive domains around (€18/year, rising each year for no good reason). (EDIT:) The company running our country's TLD is now a foreign entity that the whole thing has been sold to (multiple owners over time), and if I understand it correctly, we as a country have no control over it.
I might have gotten some details wrong, as I am writing this from memory of researching it a couple of years back, but you get the idea: crazy stuff. Here is an article in Czech [0] that tells the story a bit better, but you will have to translate it.
[0] https://www.root.cz/clanky/pribeh-domeny-sk-aneb-kradez-za-b...
// EDIT: I have found that the article actually links to the movement to return the TLD [1]. It also has a story tab [2], so they have a much more precise account than my paraphrasing.
[1] https://www.nasadomena.sk/
[2] https://www.nasadomena.sk/historia/
Xcelerate
today at 6:05 PM
Haha, I just started doing this on my own. Found it helps the agents preserve state better. I typically ask them to design a DAG first based on a set of specifications and then execute it (each step stores something in a SQLite DB). Iteration is pretty simple then because I just ask for a tweak to one or two steps of the DAG, and then to re-run.
Funny how people are independently converging on similar patterns of "what works" here. Still feels like we're in the wild west with all these ad-hoc patterns of agent orchestration that people are coming up with.
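A minimal sketch of the DAG-plus-SQLite pattern described above, assuming each step is a pure function of its dependencies' outputs: `graphlib.TopologicalSorter` (Python 3.9+) orders the steps, each result is checkpointed as JSON in a `steps` table, and a re-run skips any step that already has a stored result. Tweaking a step means deleting its row and everything downstream. The step names, lambdas, and helpers are hypothetical.

```python
import json
import sqlite3
from graphlib import TopologicalSorter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE steps (name TEXT PRIMARY KEY, output TEXT NOT NULL)")

# Hypothetical DAG: step name -> (dependencies, function of their outputs).
DAG = {
    "load": ((), lambda: [3, 1, 2]),
    "sort": (("load",), lambda xs: sorted(xs)),
    "total": (("load",), lambda xs: sum(xs)),
    "report": (("sort", "total"), lambda s, t: {"sorted": s, "total": t}),
}

executed: list[str] = []  # which steps actually ran, for inspection


def load_output(name: str):
    row = conn.execute("SELECT output FROM steps WHERE name = ?", (name,)).fetchone()
    return json.loads(row[0])


def run_dag() -> dict:
    graph = {name: set(deps) for name, (deps, _) in DAG.items()}
    for name in TopologicalSorter(graph).static_order():
        if conn.execute("SELECT 1 FROM steps WHERE name = ?", (name,)).fetchone():
            continue  # checkpoint hit: reuse the stored result
        deps, fn = DAG[name]
        result = fn(*[load_output(d) for d in deps])
        with conn:
            conn.execute("INSERT INTO steps VALUES (?, ?)", (name, json.dumps(result)))
        executed.append(name)
    return load_output("report")


def invalidate(name: str) -> None:
    """Drop a step's checkpoint and every checkpoint downstream of it."""
    stale, frontier = {name}, [name]
    while frontier:
        current = frontier.pop()
        for other, (deps, _) in DAG.items():
            if current in deps and other not in stale:
                stale.add(other)
                frontier.append(other)
    with conn:
        conn.executemany("DELETE FROM steps WHERE name = ?", [(s,) for s in stale])


first = run_dag()

# Tweak one step, invalidate it and its dependents, then re-run.
DAG["total"] = (("load",), lambda xs: max(xs))
invalidate("total")
executed.clear()
second = run_dag()
print(first, second, executed)
```

On the second run only `total` and `report` execute; `load` and `sort` come straight from their stored rows.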
orliesaurus
today at 8:24 PM
Surprised no one has mentioned Turbopuffer yet [1], which natively supports dense vector similarity and BM25 keyword indexes out of the box.
[1] https://turbopuffer.com/
localhoster
today at 7:01 PM
Idk if this article was vibe-written or the author just "got adjusted", but it clearly reads that way, and it's unreadable. Man, this is getting annoying.
sgloutnikov
today at 6:14 PM
It's close enough, and DBOS does support SQLite [0]. SQLite is the default for prototyping, but sure, you can run it in production if you want.
Obligatory list of workflow engines and libraries, because this is such a common need that a lot of people have rolled their own [1].
[0] https://docs.dbos.dev/python/tutorials/database-connection
[1] https://github.com/meirwah/awesome-workflow-engines
0x59
today at 7:07 PM
Big complex data model with ambiguous query patterns? Postgres
Small, well defined, data model with known query patterns? Bespoke model
There probably is a place for SQLite; my project space just hasn't aligned well with it so far.
netik
today at 7:19 PM
Until you scale past one machine…
bze12
today at 7:33 PM
Isn’t this very similar to cloudflare durable objects & workflows?
ChrisArchitect
today at 7:20 PM
Related:
Building durable workflows on Postgres
https://news.ycombinator.com/item?id=48313530
EGreg
today at 6:03 PM
Files is all you need.
https://xkcd.com/378/
orf
today at 6:04 PM
> The caveat is that Litestream replication is asynchronous. A restore can miss the newest local writes if the SQLite volume disappears before they are copied. That is fine for many AI and experimentation workflows
In short: SQLite is not all you need, unless you're just experimenting and don't actually care about durability, in which case you also need Litestream + object storage.
Right.
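A minimal simulation of the quoted caveat, not of Litestream itself: Python's `sqlite3.Connection.backup` stands in for one asynchronous replication pass. Any row committed locally after the last copy is absent when the database is restored from the replica.

```python
import sqlite3

local = sqlite3.connect(":memory:")
local.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, status TEXT)")
with local:
    local.execute("INSERT INTO runs (status) VALUES ('started')")

# Stand-in for a periodic async replication pass: snapshot local state.
replica = sqlite3.connect(":memory:")
local.backup(replica)

# A write that commits locally after the last replication pass.
with local:
    local.execute("INSERT INTO runs (status) VALUES ('finished')")

# The local volume disappears; restore from the replica.
local.close()
restored = sqlite3.connect(":memory:")
replica.backup(restored)

rows = restored.execute("SELECT status FROM runs ORDER BY id").fetchall()
print(rows)
```

The 'finished' row was durable on the local disk the moment it committed, but it only becomes durable off-machine once the next replication pass copies it.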