eveningtree
today at 9:02 AM
Rather than answering directly, I'm thinking about this problem from the other end altogether ever since I saw the dbricks rt demo. Apologies for the rambling response, as I haven't yet finished thinking about this problem...
We ended up with 'hot' data in oltp and 'cold/archival' data in olap because the storage size of oltp has always been limited.
(1) Limited by computation - there's only so much data that we can store on disks and nvme
(2) Limited by wallet - disks and nvme are EXPENSIVE
Also, the tight coupling of compute and data didn't help. It limited the size of databases on the individual expensive compute nodes.
So, another question will be -
What's currently stopping me from keeping the scd history tables right in my oltp db? what's forcing me to copy state into my etl/elt pipeline and the process it into scd into a dedicated olap db?
To some extent,the answer is still the same - the oltp cannot scale for the storage size required for keeping historical data. So, I've had to take out the 'cold' historical data and keep it in my olap freezer.
Now, if oltp itself is scaling, I'm not gonna bother with the copying step. I'll just prefer to store the history in oltp itself.
In my perspective (majorly from handling IoT systems), I need olap for 2 reasons - (1) storage scalability, and (2) analytical processing speed
I now consider (1) to be a solved problem
As for (2), I'm still not sure how this architecture ends up matching the query processing speeds of column-oriented storages. But again, I need to study more.
The SCD pipeline still remains in some form. Either in the form of (1) scd rows that we currently keep (etl pipeline)
, or (2) as older lsn rows that simply don't get deleted (existing db engine).
I've done quite a lot of experimentation with (2), and it is a pretty solid concept to work with.
I've spent quite a lot of years hammering my brain at databases and datastores in general. And I've now got a feeling that this is it.
Finally.