
Launch HN: Ardent (YC P26) – Postgres sandboxes in seconds with zero migration

96 points - last Wednesday at 4:54 PM


Hey HN! We’re Vikram and Evan from Ardent (https://tryardent.com). We're building database sandboxes for you and your coding agents.

In the last two years coding agents have gotten dramatically more capable at handling complex engineering tasks. But without access to a realistic sandbox at the DB layer for testing, they ship garbage that can take down production databases. I spent over a year building an AI Data Engineer that failed for this exact reason. Evan spent the last 12 years in data engineering and hit this wall building agents at his last company.

Ardent was built to give coding agents near-instant access to production-like sandboxes so they can test their work. To do this, we write a replication stream out of the target DB, scaled with Kafka, into a read replica with copy-on-write and autoscaling compute enabled (we currently prefer Neon as the primary branching engine due to its implementation of these two properties).

Our replication stream uses logical replication + DDL triggers, which makes it work on any hosted Postgres DB, since most platforms do not allow the physical replication traditionally used to create replicas.
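
For the curious, here's a simplified sketch of that pattern (hypothetical names, not our exact code):

    -- On the source DB: publish all tables over logical replication
    CREATE PUBLICATION ardent_pub FOR ALL TABLES;

    -- Logical replication doesn't carry DDL, so capture it with an event trigger
    CREATE TABLE ddl_log (id bigserial PRIMARY KEY, ddl text, logged_at timestamptz DEFAULT now());

    CREATE FUNCTION log_ddl() RETURNS event_trigger LANGUAGE plpgsql AS $$
    BEGIN
      INSERT INTO ddl_log (ddl) VALUES (current_query());  -- the DDL statement that fired the trigger
    END;
    $$;

    CREATE EVENT TRIGGER capture_ddl ON ddl_command_end EXECUTE FUNCTION log_ddl();

    -- On the replica: subscribe; ddl_log rows arrive with everything else and can be replayed
    CREATE SUBSCRIPTION ardent_sub
      CONNECTION 'host=source.example.com dbname=app user=replicator'
      PUBLICATION ardent_pub;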

This architecture provides two primary benefits:

1. It does not require a platform migration to a DB provider like Neon, allowing strong separation of production and development concerns.

2. It has minimal impact on the production database while letting clones spin up in <6s, even at TB scale, thanks to copy-on-write.

Security matters a lot when cloning production, so we run a proxy layer that mints custom Postgres URLs and routes all connections. This gives us granular access control over clones, prevents credential leaks, and supports a split-plane architecture that allows full data residency in your cloud through BYOC.

We also support anonymization: you can register SQL that runs on branches before they are returned. This has been used for PII redaction and branch modification.
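
For example, a PII-redaction hook might look like this (hypothetical schema):

    -- Runs on the branch before it's handed to a dev or agent
    UPDATE users
       SET email     = 'user_' || id || '@example.com',
           full_name = 'Redacted User';

    -- Live third-party credentials on a branch can still cause real external side effects
    DELETE FROM oauth_tokens;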

Our goal is to make every data infrastructure platform “cloneable” in one place so agents can fully test the impact of their changes on production-like data environments without risk.

Here's a demo of it: https://youtu.be/5S1kwPtiRU0

We’d love to understand how you work with coding agents on the DB, and if you try Ardent (it's free to get started), what worked, what broke, and what’s missing.

  • eugercek

    last Wednesday at 6:12 PM

    If you use XFS (+`file_copy_method=CLONE`) you can do this with Postgres 18.

    `CREATE DATABASE clankerdb TEMPLATE sourcedb STRATEGY=FILE_COPY;`.
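
    (If I'm reading the PG 18 docs right, the setting side is roughly the following - verify before relying on it:)

      -- Assumption: file_copy_method is a regular GUC; 'clone' reflinks data files on supporting filesystems
      ALTER SYSTEM SET file_copy_method = clone;
      SELECT pg_reload_conf();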

    But Ardent can be useful for many, because cloud providers use heavily restricted Postgres. And many use Aurora, which doesn't even let you configure `log_line_prefix`.

    Though if cloud providers add `file_copy_method=CLONE`-compatible managed pg ...

    ref: https://boringsql.com/posts/instant-database-clones/

      • the__alchemist

        last Wednesday at 11:29 PM

        Here's how I do it with Heroku. Are there some cloud services that don't have an equivalent?

          heroku pg:backups:capture --app x
          heroku pg:backups:download --app x -o local_db_for_robots_etc.dump
          pg_restore --verbose --clean --no-acl --no-owner -h localhost -U postgres -d y local_db_for_robots_etc.dump
        
        This takes more than 6 seconds. I'm curious how they achieved that for arbitrary DBs!

          • vc289

            yesterday at 12:21 AM

            We've got docs on how we did it :)

            https://docs.tryardent.com/architecture

            But essentially we get around the restrictions of the original DB by replicating into a different Postgres-compatible DB that serves as a read replica. That DB is the one that branches, but since it mirrors the original DB you get effective clones.

            By doing this we get a lot more control over what we can do to create the clones. The read replica clones in <6s using copy-on-write + isolated autoscaling compute. We use Neon to do this since we think they've implemented those two properties well.

            Since it's default Postgres logical replication + DDL triggers, you can technically point it at any "branching-enabled" DB on the other end to achieve the same effect.

        • nijave

          yesterday at 1:39 AM

          A little slow, but on Aurora you can attach then promote read replicas. IIRC that's around 20 minutes, but I haven't tested recently.

          I'd think you could also set up logical rep to a VM, then snapshot and clone the storage, which is generally pretty fast.

            • yandie

              yesterday at 2:05 AM

              You can create a new instance directly on AWS Aurora. Takes less than 20 minutes!

                aws rds restore-db-cluster-to-point-in-time \
                    --source-db-cluster-identifier <source-cluster> \
                    --db-cluster-identifier <new-cluster> \
                    --restore-type copy-on-write \
                    --use-latest-restorable-time \
                    --db-subnet-group-name <sub group> \
                    --vpc-security-group-ids <security group> \
                    --serverless-v2-scaling-configuration MinCapacity=0,MaxCapacity=16

          • mnahkies

            last Wednesday at 8:28 PM

            I wanted to try doing something similar to this in our dev environment (think shared dev database but per branch clones), but this limitation seemed tricky to accept:

            > The source database can't have any active connections during cloning.

            I wouldn't mind some lock contention, but having to kill all connections seemed a bit harsh.
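
            For reference, the heavy-handed version looks something like this (assuming a DB named sourcedb):

              -- Kick everyone off the source DB, then clone it as a template
              SELECT pg_terminate_backend(pid)
                FROM pg_stat_activity
               WHERE datname = 'sourcedb' AND pid <> pg_backend_pid();

              CREATE DATABASE branch_db TEMPLATE sourcedb;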

              • IdontKnowRust

                yesterday at 12:00 PM

                I guess it's still possible with a replica that no one is using?

                So you wouldn't need to touch the real production DB?

                Not sure if it applies to all use cases, though.

                  • mnahkies

                    today at 8:26 AM

                    Yeah I think a read replica might fit the bill - though I suspect active logical replication counts as a connection in this context.

                    Using a cloud provider read replica might not (as I think that might use block-level replication) - but then you're paying for an extra dev database host for the privilege.

            • whalesalad

              yesterday at 1:30 AM

              Oh nice, the `/var` part of my workstation is a dedicated NVMe drive and it's coincidentally formatted as XFS.

          • znnajdla

            last Wednesday at 5:13 PM

            “Never impacts production data” is impossible to guarantee. Playing with real-world data often has side effects outside of the database. For example, if you store OAuth tokens to external services in your DB (customer integrations), it’s easy to mess up your customers’ data through a bad API call (been there, done that).

            There is still value in carefully testing on your prod DB, but for that you could just as easily maintain a read replica. I don’t see the need for a SaaS here.

              • vc289

                last Wednesday at 5:58 PM

                One of the main things people use us for is ease of testing writes on a per dev/agent basis which would be difficult on a read replica!

                On the real-world data impact I absolutely agree. We added something called "branch hooks", which let you define SQL to run against the branch before it's returned.

                This lets you anonymize and modify the branch to scrub unintended external side effects.

                It's something that we're still working on though and trying to design the right abstractions around because we want to get that part right.

                • 999900000999

                  last Wednesday at 6:49 PM

                  If it’s production data I probably don’t trust a random startup with it.

                  I’m very confused as to the target market here.

                    • vessenes

                      yesterday at 4:13 PM

                      I’ll bite. You’re a dev at a random mid-size company, tasked with using this newfangled agentic tech to implement an intranet feature everybody wants and nobody else wants to build.

                      How do you get a staging and dev db together that’s going to let you test your migrations?

                  • tommy29tmar

                    yesterday at 9:45 AM

                    [flagged]

                      • dang

                        yesterday at 3:09 PM

                        Your comments are getting classified by our software as LLM-generated and/or LLM-edited. It's impossible to be certain, of course, but if this is the case—can you please not do this? It's not allowed here - see https://news.ycombinator.com/newsguidelines.html#generated and https://news.ycombinator.com/item?id=47340079. We end up banning accounts that do this repeatedly and I don't want to ban you.

                        LLMs are amazing and we use them heavily ourselves - but not for modifying text that is to be posted to HN. Doing so leaves imprints on the language that readers are increasingly allergic to, and we want HN to be a place for human conversation.

                • jedberg

                  last Wednesday at 5:25 PM

                  Looks interesting, curious what your moat here is. What prevents Supabase/Neon from doing this? Actually don't they already do this? How does this differ from the branching Neon and Supabase already offer?

                    • vc289

                      last Wednesday at 5:44 PM

                      We enable branching on any Postgres DB through our architecture. So if you're on RDS, PlanetScale, etc., you can keep your DB where it is but also get the ability to branch with a full clone of the DB.

                      Neon does support copy-on-write branching natively, plus autoscaling compute, but you make certain performance tradeoffs. A lot of the folks we've talked to who use RDS or PlanetScale rely on things like the query latencies supported by that platform's specific architecture, but also want the ability to test on branches. We let you get the best of both worlds (branch, but leave your DB where it is and freely choose your production environment based on prod concerns).

                      Supabase does have branching, but they do not branch the data, so you can't test any interactions that rely on the data. You can restore from backup as an option, but this slows down with data size since you're actually moving data as opposed to using copy-on-write.

                      Longer term we want to be the place you branch all your data infra. So expanding to S3, Snowflake, MySQL etc.

                      For now though we're focusing on just postgres and getting it right!

                        • dou99

                          yesterday at 1:37 AM

                          You’re literally using Neon? lol

                  • luodaint

                    yesterday at 5:30 PM

                    Drop-the-database events occur whenever the review process is bypassed. It’s “Just run it; the agent wrote it” that undermines trust. The migration may be semantically correct yet practically incorrect, such as renaming a column and breaking a service that still depends on it, or creating an index that locks the table during heavy use.
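
                    Concretely (illustrative table names):

                      -- Semantically valid, operationally risky:
                      ALTER TABLE users RENAME COLUMN email TO email_address; -- breaks services still selecting "email"
                      CREATE INDEX idx_orders_user ON orders (user_id);       -- blocks writes on the table while it builds

                      -- The safer variant an agent may not reach for:
                      CREATE INDEX CONCURRENTLY idx_orders_user2 ON orders (user_id);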

                    What Ardent does makes sense for a team setting where several agents/developers require their own environment before deploying code. But from a one-man-show founder’s perspective, the constraint is not about isolation but rather self-discipline.

                    • nilirl

                      last Wednesday at 5:17 PM

                      Hi, site looks beautiful!

                      How does this compare to managing our own read-only replica with anonymized data?

                        • vc289

                          last Wednesday at 5:49 PM

                          A true read replica won't let you write! So if you need to test something like a backfill and see if anything goes wrong, you couldn't do it quite as easily.
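
                          For example, the kind of backfill you'd want to rehearse on a writable branch first (made-up schema):

                            -- Dry-run on the branch before touching prod
                            UPDATE orders
                               SET status = 'unknown'
                             WHERE status IS NULL;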

                          We'd let you instantly clone prod, with user-defined auto-anonymization, so you can test writes. The architecture can also somewhat take the place of an existing read replica, if you want to use it like that to make it more cost-efficient.

                          Also, since we're using copy-on-write for the clones, they're incredibly storage-efficient, and the autoscaling compute minimizes cost on clones by minimizing excess compute uptime.

                            • jagged-chisel

                              last Wednesday at 6:02 PM

                              > A true read replica won't let you write!

                              I mean, they said "read-only" ...

                          • xnx

                            last Wednesday at 5:38 PM

                            Ardent adds extra dependencies and cost.

                        • cphoover

                          last Wednesday at 5:17 PM

                          How many people are giving an LLM Agent full read access to their production data? That seems nuts to me.

                            • evanvolgas

                              last Wednesday at 6:32 PM

                              Evan here, from Ardent.

                              It's not uncommon (Hex.ai etc. all do this, as do developers, MCP tools, etc). One thing we do at Ardent is enable obfuscated read replicas. We can strip PII in the replicas, so your agents are operating on realistic (but not sensitive) data. Moreover, they can do so in a way that doesn't impact your production database and is fast enough to wire into your CI/CD processes.

                              Jeremy is correct, though. The main risk/concern is primarily agents with write access. There are two high profile instances in the last year of agents dropping production databases (even when, in one case, after being given explicit instructions to never do such a thing). While read-replicas of a primary DB solve the "agents can't destroy things" problem, they don't solve things like testing schema migrations (in particular) or updates to the data.

                                • Normal_gaussian

                                  last Wednesday at 6:24 PM

                                  Business-side people install Claude, find it fantastic, read about Postgres and BigQuery MCP, and immediately demand it.

                                  At a small enough company without suitable MoC, they've got a real chance of getting it.

                                  • jedberg

                                    last Wednesday at 5:28 PM

                                    I'm much more worried about people who give full write access to their agents! But at least this solves that problem.

                                      • cphoover

                                        last Wednesday at 6:05 PM

                                        Jedberg... Wow an internet legend replied to me! ><

                                        > I'm much more worried about people who give full write access to their agents! But at least this solves that problem.

                                        Yeah it goes without saying that write access would be crazy... But, it seems like people don't really care about the fact that they are just giving their private data to companies like Anthropic, OpenAI and Google.

                                        > Branch anonymization: Branches default to a full copy of your production data.

                                        This doesn't seem like a safe default to me...

                                        Perhaps a data policy should be required before a branch can be cloned... Giving the LLM full prod data access by default is a bad standard to set, I think.

                                          • jedberg

                                            last Wednesday at 6:10 PM

                                            > Jedberg... Wow an internet legend replied to me!

                                            Hey, I put on my pants the same way you do: by having my staff hold them up while I jump into them.

                                            > But, it seems like people don't really care about the fact that they are just giving their private data to companies like Anthropic/Open AI and Google.

                                            This isn't quite as risky as it seems. All of them have a TOS that says if you pay them enough money they won't train on your data. But you're right that there are probably a lot of people who aren't on those plans sharing private data.

                                            > > Branch anonymization: Branches default to a full copy of your production data.

                                            > This doesn't seem like a safe default to me...

                                            Agreed, and I'm sure it will cause trouble if you don't also bring the internal controls around access logging along with the copies.

                                            But also, for smaller companies, this isn't an issue since they don't have SOC2 and the other compliance needs yet. So it's probably a sane starting place for Ardent at this time. Most small startups let everyone in the company access the full database anyway.

                                            > Perhaps a data policy should be required to be in place before a branch can be cloned... The default configuration giving the LLM full prod data access by default, is a bad standard to set, I think.

                                            Or at least an easy way to copy it from the database you're branching from.

                                              • vc289

                                                last Wednesday at 7:20 PM

                                                >> I'm sure it will cause trouble if you don't also bring along with the copies the internal controls around access logging

                                                Yep! Agreed. We've tried to combat this by making "branch_hooks" team/org-level policy objects, so we can do enforcement of any kind on the branches before they're ever actually handed to users. This would be things like access control + defined anonymization rules. The broader hope with this class of objects/policies is that they can serve as enforcement barriers and essentially allow scoped access at the org level across branches.

                                                The proxy we run in the middle also helps a lot here. Since the URL is minted by our control plane and is not the "real" DB URL, we can authenticate each user from the URL they're using and enforce RBAC controls.

                                                for example:

                                                User 1's API key is 1234

                                                The CLI can auto-construct URLs like:

                                                  postgresql://{APIKEY}:{ANYTHING}@{IDENTIFIER}--postgres.routing.tryardent.com:5432/DB_NAME?{params}

                                                Your API key is something that can be scoped per user.

                                                This is an off-the-cuff example, but essentially we have a way of knowing who is calling the host, and can thus enforce rules like "this API key can't access this DB" based on whatever policy applies.

                                                Curious to understand what additional pieces would be helpful here because this is 100% very important to get right.

                                • clintonb

                                  yesterday at 12:25 AM

                                  Congrats on the launch! DB clones have been a game changer for my team, allowing us to build isolated workspaces for agents to do work ranging from optimizing queries/views to building UI/UX that works for the actual combinations of data we have.

                                  We self-host DBLab since we had trouble getting Xata, Neon, and hosted DBLab configured.

                                    • samokhvalov

                                      yesterday at 10:55 PM

                                      Hey, PostgresAI founder here.

                                      Thank you for using DBLab!

                                      Can you DM me, please? I'm really curious about your experience.

                                  • Jinyibruceli

                                    yesterday at 2:29 AM

                                    I ran into this exact problem building browser automation agents that needed to test DB migrations. The real killer wasn't just getting a sandbox quickly, it was that reverting changes after a failed test would take forever with traditional backup/restore. One thing I'm curious about though - how do you handle agents that need to test against production data patterns but can't actually touch real user data? Do you have a synthetic data layer or is that on the user to solve?

                                      • vc289

                                        yesterday at 3:48 AM

                                        We have ways to scrub PII/manipulate the data on the clone per branch.

                                        It's called branch hooks, and it lets you register SQL to be run against the branch after it's created but before it's handed to you (or an agent).

                                        So you can retain production shapes but manipulate the data however you want to make it safe.

                                    • fmajid

                                      last Wednesday at 5:46 PM

                                      Doesn't look open-source. If you are interested in a Neon- or git-like branching experience for PostgreSQL, have a look at Xata, which is based on ZFS like Delphix was:

                                      https://github.com/xataio/xata

                                        • polskibus

                                          last Wednesday at 7:33 PM

                                          Would such an approach work for MS SQL?

                                            • fmajid

                                              yesterday at 12:30 AM

                                              There's no reason why it shouldn't; Delphix primarily targeted Oracle. But there is, of course, not as much open-source enthusiasm for supporting a proprietary database as an open-source one.

                                                • fmajid

                                                  yesterday at 10:23 AM

                                                  And also, of course, MS-SQL is only supported on Windows, and ZFS is not available on Windows. Windows does have the Volume Shadow Copy Service, but it is not as capable as ZFS snapshots and clones.

                                      • dou99

                                        yesterday at 12:43 AM

                                        The concept is cool, but what value are you adding on top of the Neon Twin infra it’s built on? It seems the same can be done just using Neon directly for half the cost?

                                          • sharts

                                            yesterday at 1:59 AM

                                            Sounds like that's all they’re doing.

                                        • danisaza

                                          yesterday at 6:49 AM

                                          Congrats on the launch!

                                          One note on the pricing: it would kind of bum me out to pay $250/month for $100/month in credit.

                                          That feels like I'm losing $150/month.

                                            • vc289

                                              yesterday at 10:12 PM

                                              Totally makes sense. We do offer a pure PAYG tier (Starter) that scales completely dynamically with workload.

                                              But it seems like this may be less about the absolute price and more about the way the $100/month of credit feels?

                                              What do you think could be better? The $250/month Scale tier was intended for companies scaling up that want BYOC for data residency etc., to give them enough to test things internally without worrying about an overage bill before running it directly on prod - but this might be implemented better.

                                          • debarshri

                                            yesterday at 11:03 AM

                                             Does it clone the data? We have a table with 35GB of data, what happens in that case?

                                              • vc289

                                                yesterday at 5:56 PM

                                                It does clone the data. It uses copy-on-write, so data size won't slow it down, since it's a metadata operation instead of actual data movement.

                                                It clones the entire DB, and it's not quite 6s per TB - the entire clone actually takes <6s independent of size.

                                                Probably something to adjust on the site to make it clearer!

                                                • IdontKnowRust

                                                  yesterday at 12:08 PM

                                                  Their website says they clone at 6s/TB, so you'll probably get a branch in less than 1s.

                                                    • debarshri

                                                      yesterday at 2:33 PM

                                                      Does it do this remotely? I mean, you can do that locally with:

                                                        CREATE DATABASE cloned_db WITH TEMPLATE source_db OWNER your_owner;

                                              • anonpolls

                                                today at 2:40 AM

                                                [flagged]

                                                • heocoi

                                                  yesterday at 12:01 AM

                                                  [flagged]

                                                  • Serhii-Set

                                                    last Wednesday at 7:12 PM

                                                    [flagged]

                                                    • galaSerge

                                                      last Wednesday at 6:42 PM

                                                      [flagged]

                                                      • kramit1288

                                                        last Wednesday at 7:20 PM

                                                        [dead]

                                                        • MehdiBelkacem

                                                          yesterday at 10:45 AM

                                                          [flagged]

                                                            • dang

                                                              yesterday at 3:08 PM

                                                              Your comments are getting classified by our software as LLM-generated and/or LLM-edited. It's impossible to be certain, of course, but if this is the case—can you please not do this? It's not allowed here - see https://news.ycombinator.com/newsguidelines.html#generated and https://news.ycombinator.com/item?id=47340079. We end up banning accounts that do this repeatedly and I don't want to ban you.

                                                              LLMs are amazing and we use them heavily ourselves - but not for modifying text that is to be posted to HN. Doing so leaves imprints on the language that readers are increasingly allergic to, and we want HN to be a place for human conversation.