Let me give you a little anecdote. I use ChatGPT to learn Spanish. The prompt I use is below.
It gets things wrong about half the time, and I have to tell it that it’s wrong. If I can’t trust an LLM to follow simple instructions, why would I trust it “agentically” with business-critical decision making?
I work in cloud consulting, specializing in app dev, and every project I’ve done in the last year and a half has had a Bedrock-based LLM somewhere in the process, i.e. in the running system. But I know what to trust it for and what not to trust it for, and I guide my clients accordingly.
The prompt I use for studying Spanish that ChatGPT gets wrong:
---
I am learning Spanish at an A2 level. When I ask you to do a lightning round, I will give you a list of sentences first. You will give me each English sentence one by one and I will translate it to Spanish. If I get one wrong, save it for the next round.
When I ask you to create sentences from a verb, create one sentence each for first, second, and third person singular and for first and third person plural, in both the present and the simple past, plus three sentences in the progressive. Each sentence must be at least five words long.
These are some words and phrases I need to review. Only use them in the first through third person singular present sentences, and only when they make sense. If a target word does not fit naturally, skip it and prioritize a natural sentence; don’t force yourself to use these words. When I list a verb, it means I need to practice it in the present and simple past:
<a relatively short list of words>
Never use:
<a relatively short list of words>
That’s a very real example of the core problem: LLMs don’t reliably honor constraints, even when they’re explicit and simple. Instruction drift shows up fast in learning tasks, and quietly in production systems.
That’s why trusting them “agentically” is risky. The safer approach is to assume outputs are unreliable and validate them after generation.
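For what it’s worth, “validate after generation” can start very small: re-check the model’s output against the same constraints the prompt already stated, and reject or retry on failure. Here is a minimal sketch in Python, using made-up names and only the constraints from your Spanish prompt; it’s an illustration of the idea, not how Verdic Guard actually works:

```python
# Minimal sketch: treat model output as untrusted and re-check it against
# the constraints the prompt asked for (illustrative only, not a real API).

MIN_WORDS = 5  # "Each sentence must be at least five words."

def validate_sentences(sentences: list[str], banned: set[str]) -> list[str]:
    """Return human-readable violations; an empty list means the batch passed."""
    violations = []
    for i, sentence in enumerate(sentences, start=1):
        # Naive tokenization by whitespace after stripping basic punctuation.
        words = sentence.lower().replace(",", " ").replace(".", " ").split()
        if len(words) < MIN_WORDS:
            violations.append(f"sentence {i} has only {len(words)} words: {sentence!r}")
        used_banned = banned.intersection(words)
        if used_banned:
            violations.append(f"sentence {i} uses banned word(s) {sorted(used_banned)}: {sentence!r}")
    return violations

if __name__ == "__main__":
    # Hypothetical model output and banned list, for illustration only.
    output = [
        "Yo como una manzana roja cada mañana.",
        "Tú hablas.",  # too short: should be flagged
    ]
    problems = validate_sentences(output, banned={"coche"})
    for p in problems:
        print("REJECT:", p)
    if not problems:
        print("OK: all constraints satisfied")
```

A rejected batch can then be retried or surfaced to the user instead of silently accepted, which is the whole point of putting the check outside the model.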
I’m working on this exact gap with Verdic Guard (verdic.dev): treating LLM output as untrusted input and enforcing scope and correctness outside the model. It’s less about smarter prompts and more about predictable behavior.
Your Spanish example is basically the small-scale version of the same failure mode.