From the article:
In distributed systems, there’s a common understanding that
it is not possible to guarantee exactly-once delivery of
messages.
This is not only a common understanding, it is a provably correct axiom. For a detailed discussion regarding the concepts involved, see the "two general's problem"[0].
To guarantee exactly once processing requires a Single Point of Truth (SPoT) enforcing uniqueness shared by all consumers, such as a transactional persistent store. Any independently derived or generated "idempotency keys" cannot provide the same guarantee.
The author goes on to discuss using the PostgreSQL transaction log to create "idempotency keys", which is a specialization of the aforementioned SPoT approach. A more performant variation of this approach is the "hi/low" algorithm[1], which can reduce SPoT allocation of a unique "hi value" to 1 in 2,147,483,648 times when both are 32-bit signed integers having only positive values.
Still and all, none of the above establishes logical message uniqueness. This is a trait of the problem domain, in that whether two or more messages having the same content are considered distinct (thus mandating different "idempotentcy keys") or duplicates (thus mandating identical "idempotency keys").
0 - https://en.wikipedia.org/wiki/Two_Generals'_Problem
1 - https://en.wikipedia.org/wiki/Hi/Lo_algorithm
> it is a provably correct axiom.
Pedantically, axioms by definition are assumed/defined without proof and not provable; if it is provable from axioms/definitions, it is a theorem, not an axiom.
[flagged]
It's just not what the word axiom means nor how anyone uses it. An axiom is unprovable by definition - is it a thing we accept to be true because it is useful to do so (e.g. there exists an empty set)
"Provably Correct Axiom" is nonsense. An axiom is unprovable.
Just "provably correct" would've been fine. This chess stuff is hilariously pretentious.
> Just "provably correct" would've been fine. This chess stuff is hilariously pretentious.
Apparently a bit of humour when responding to a self-identified pedantic response didn't come across as I thought it would.
Lesson learned.
It's grok-level cringe is what it is.
I just nod and keep playing checkers.
Sorry but the entire line of argumentation and all it’s chess flavor is miles off the mark. This is not the sound of arguing with someone who studied what they talked about.
> Sorry but the entire line of argumentation and all it’s chess flavor is miles off the mark.
Apparently a bit of humour when responding to a self-identified pedantic response didn't come across as I thought it would.
Lesson learned.
> And, of course, a hypothesis is capable of being proven.
No, that's just you not understanding the definition of 'postulate'.
Username checks out
> Username checks out
Of all possible replies in the set of "meaningful and/or contributory", this does not have membership thereof.
At least the other responders had the common courtesy of putting forth some intellectual effort and not resort to a trite Reddit meme.
> A more performant variation of this approach is the "hi/low" algorithm
I am discussing this approach, just not under that name:
> Gaps in the sequence are fine, hence it is possible to increment the persistent state of the sequence or counter in larger steps, and dispense the actual values from an in-memory copy.
In that model, a database sequence (e.g. fetched in 100 increments) represents the hi value, and local increments to the fetched sequence value are the low value.
However, unlike the log-based approach, this does not ensure monotonicity across multiple concurrent requests.
> However, unlike the log-based approach, this does not ensure monotonicity across multiple concurrent requests.
Is this a functional requirement when a system is multi-process? Specifically, a single multi-threaded process could provide monotonic guarantees across all messages processed.
Once 2+ processes are introduced, it is impossible to guarantee monotonicity globally unless a SPoT (a.k.a. "queue") is used for delivery due to variations in message delivery times, performance of machines running each process, etc. An additional implication is there is no provable way to determine if a message received by process "A" logically precedes a message received by process "B" without authoritative information in the messages.
In short, an answer which generalizes the question:
Can we get the space efficiency for consumers when using
monotonically increasing idempotency keys, without
hampering performance of multi-threaded producers?
Is to use the "hi/low" algorithm for both a single process multi-threaded solution, by only needing the process to allocate the "hi" value periodically and provide verifiable monotonic values, and also for multi-process (potentially multi-threaded as well) solutions, which ensures each process does not produce duplicate keys while providing a per-process partial ordering guarantee.
It should be noted that when you have bad actors in your system almost all guarantees of all kinds go out the window.