Tiled Hacker news on React Router

Postgres data stored in Parquet on S3: LTAP architecture explained

129 points - last Wednesday at 12:48 PM


dsauerbrun
today at 12:24 PM
Maybe I'm too stupid to understand the article... How does this achieve performant querying for olap and oltp purposes?
Based on my understanding, olap queries will go to the parquet files which are stored in a columnar fashion and oltp style queries will go to a caching layer that sits on top of those parquet files?
What's the special sauce here? Seems like they're just caching the data which, for all intents and purposes, seems like the same solution of storing another copy of the data which is what they say they're avoiding.
saisrirampur
today at 3:39 PM
But why? I’m skeptical of the idea of unifying storage just because it sounds “elegant” or “cool”. It’s not obvious to me how a single storage engine can compete with purpose-built OLTP and OLAP systems like Postgres and ClickHouse, without significant tradeoffs.
You also mention removing CDC pipelines. I’m curious whether the materialization (conversion across formats) can catch up to a heavy OLTP workload (50K+ tps), which is pretty common these days. Also, CDC, if done right and with care, can be magical for users and stays native to the OLTP/OLAP data store.
Third, data lakes and open formats are more suitable for data warehousing / data analyst use cases than for real-time customer-facing apps. Sure, you might work on changing that, which is what you are up to, but you’ll always run into tradeoffs, which will make it hard to unleash the best performance, much needed for the latter category.
Avalaxy
today at 11:06 AM
Super cool stuff. Being able to combine your analytical platform and transactional database into one storage layer without having to set up ETL pipelines in between is really a game changer. Especially since it's just postgres, instead of some proprietary database.
scritty-dev
today at 2:06 PM
So then would LTAP sit to both the left and the right of the medallion architecture? Meaning would you on the left of Bronze use it as an OLTP and to the right of Gold use it as an OLAP? Currently we've been mainly utilizing it to the right of Gold to develop analytic PERN applications that allow us to reuse the RBAC/ACLs set in Unity Catalog, but from this article it seems like that's only half of its utility?
andrenotgiant
last Wednesday at 1:44 PM
Here's what I don't understand:
Part of the value of doing an ETL pipeline via streaming replication is that you get the full history of data in a table: an SCD type 2 table where each row also has valid_from and valid_to timestamp columns.
How would someone do the same thing with this architecture?
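For readers unfamiliar with the term, here is a minimal sketch of what SCD type 2 history means. It is only an illustration of the valid_from / valid_to idea mentioned above, not how LTAP or any particular CDC tool implements it. The names `HistoryRow`, `apply_change`, and `as_of` are hypothetical, and timestamps are plain integers for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HistoryRow:
    key: int
    value: str
    valid_from: int
    valid_to: Optional[int] = None  # None marks the current version


def apply_change(history: List[HistoryRow], key: int, value: str, ts: int) -> None:
    """Close the open version for `key` (if any) and append the new version."""
    for row in history:
        if row.key == key and row.valid_to is None:
            row.valid_to = ts
    history.append(HistoryRow(key, value, ts))


def as_of(history: List[HistoryRow], key: int, ts: int) -> Optional[str]:
    """Return the value of `key` as it was at time `ts`, or None if it did not exist."""
    for row in history:
        if row.key == key and row.valid_from <= ts and (row.valid_to is None or ts < row.valid_to):
            return row.value
    return None


# Hypothetical change stream: row 1 is created at t=100 and updated at t=200.
history: List[HistoryRow] = []
apply_change(history, 1, "free", 100)
apply_change(history, 1, "pro", 200)
```

With this layout, a point-in-time lookup such as `as_of(history, 1, 150)` reads the older version, while the row with an empty valid_to is the current one. The open question in the comment is where this per-change history would come from if there is no replication stream to build it from.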
hasyimibhar
today at 5:04 PM
How does the LTAP architecture deal with a major Postgres upgrade? Is it truly zero-downtime for both upstream and downstream?
seobot_dk1289
last Wednesday at 1:10 PM
[dead]
PunchyHamster
today at 8:54 AM
I don't wanna see that S3 bandwidth bill after running some big query

