> For end-users, this dichotomy implies that agents are reliable for rapid prototyping but remain unreliable for production-grade backend development.
Time to start writing linting tools that check the architecture and spoon-feed the LLM exactly what it's doing wrong.
I reckon something like this would be good for every project out there: https://www.archunit.org/getting-started
They expand a bit more on the reasoning behind it: https://www.archunit.org/motivation
(I also wrote a simple linter for architecture/code checks that single-file linters don't cover well. It uses Go + goja so rules can be written in ECMAScript, runs the read-only rules in parallel, and also allows rules that change files where necessary. It's meant to run in addition to Ruff / Oxlint / Oxfmt / whatever is present in each stack, though it's still in development and not as good or as focused an example as ArchUnit is.)
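A minimal sketch of what such a cross-file rule could look like, in Python for brevity. This is neither ArchUnit's API nor the Go/goja linter mentioned above; the `app.*` package layout, the layer names, and the `check_layers` helper are all hypothetical. The point is just that the output is a list of precise, line-numbered messages that can be pasted straight back to a dev or an LLM.

```python
import ast
from typing import Dict, List, Optional, Tuple

# Hypothetical layering rule: each layer may import only from the layers listed.
ALLOWED = {
    "api": {"service"},
    "service": {"repository"},
    "repository": set(),
}


def layer_of(module: str) -> Optional[str]:
    """Map a dotted module path to its layer, e.g. 'app.api.users' -> 'api'."""
    parts = module.split(".")
    if len(parts) > 1 and parts[0] == "app" and parts[1] in ALLOWED:
        return parts[1]
    return None  # third-party or stdlib module: not covered by the rule


def imported_modules(source: str) -> List[Tuple[int, str]]:
    """Collect (line number, module name) for every absolute import."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found += [(node.lineno, alias.name) for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.append((node.lineno, node.module))
    return found


def check_layers(files: Dict[str, str]) -> List[str]:
    """files maps dotted module names to source text; returns violation messages."""
    violations = []
    for module, source in sorted(files.items()):
        src_layer = layer_of(module)
        if src_layer is None:
            continue
        for lineno, target in sorted(imported_modules(source)):
            dst_layer = layer_of(target)
            if dst_layer is None or dst_layer == src_layer:
                continue
            if dst_layer not in ALLOWED[src_layer]:
                violations.append(
                    f"{module}:{lineno}: layer '{src_layer}' must not import "
                    f"'{target}' (layer '{dst_layer}'); allowed: "
                    f"{sorted(ALLOWED[src_layer]) or 'none'}"
                )
    return violations


if __name__ == "__main__":
    project = {
        "app.api.users": "from app.service.users import get_user\n"
                         "import app.repository.users\n",
        "app.service.users": "from app.repository.users import load\n",
    }
    for message in check_layers(project):
        print(message)
```

A real tool would read the files from disk and load the rules from config, but even this much catches an API handler reaching straight into the repository layer, which no per-file linter will flag.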
If we write software specification docs, bother describing how the software evolves with ADRs, enforce code style automatically, and require certain test coverage automatically (or at least should), why couldn't we go a step further, formalize those specs, and ensure that any new code is also up to snuff? I don't think that's any more of a job for an LLM than telling it how to format code is. Also, I'm in the camp that believes that many of your ORM mappings and similar stuff should be the output of codegen, since you've already gone through the trouble of describing the schema/migrations to get there.
I don't think this would only be good for LLMs, though - I've seen projects that have like 3 different audit systems built in, not because of some fancy business requirement, but because the devs either didn't know about the previous one(s) or just didn't feel like following what should have been the pre-established conventions, even when there were docs in place (nobody read those).