
Show HN: Drive any macOS app in the background without stealing the cursor

150 points - yesterday at 4:03 PM


Hi HN, Francesco from Cua here. I hacked this project together last weekend, inspired by the Codex Computer-Use release and lessons learned from deploying GUI-operating agents for our customers.

The main problem: when a UI automation process controls a desktop app today, it usually takes over the human’s session. Your cursor moves, keyboard focus gets stolen, windows jump to the front, and you have to stop working until the agent is done. That is why we have historically avoided encouraging users to run these processes directly on their host machine, instead relying on VMs or GUI containers for concurrency and background execution.

But computer-use - the tools we give agents to operate computers like humans - does not scale cleanly that way. As models get smarter, agents need to share hosts safely, run in the background, and avoid collisions with the human or other agents using the same machine.

We realized macOS has no first-class API for "drive this app without touching the cursor". CGEventPost routes through the hardware input stream, so it moves your cursor. CGEvent.postToPid avoids the cursor warp, but Chromium treats those events as untrusted and silently drops clicks at the renderer boundary. Activating the target app first raises the window and pulls focus, defeating the point of background execution.

Cua Driver is our attempt at a real fix: a background computer-use driver for macOS that lets an agent click, type, scroll, and read native apps while your cursor, frontmost app, and Space stay where they are. The default interface is a CLI, so it is easy to script or call from any coding agent shell.

Try it on macOS 14+:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-d...)"

The first internal use case was delegated demo recording. We ask Claude Code to drive an app while 'cua-driver recording start' captures the trajectory, screenshots, actions, and click markers. The result is a Screen Studio-style product demo, generated by the agent.

Other things we have used it for:

- Replacing Vercel’s agent-browser and other browser-use CLIs. With Claude Code and Cua Driver, you do not need the Chrome DevTools Protocol at all.

- A dev-loop QA agent that reproduces a visual bug, edits code, rebuilds, and verifies the UI while my editor stays frontmost.

- Personal-assistant flows that use iMessage from Claude Code, Hermes, or other general-purpose agent CLIs.

- Pulling visual context from Chrome, Figma, Preview, or YouTube windows I am not looking at, without relying on their APIs.

What made this harder than expected:

- CGEventPost warps the cursor because it goes through the HID stream.

- CGEvent.postToPid does not warp the cursor, but Chromium drops it at the renderer IPC boundary.

- Activating the target first raises the window and can drag you across Spaces.

- Without a private remote-aware SPI, Electron apps stop keeping useful AX trees alive once their windows are occluded.

The unlock was SkyLight. SLEventPostToPid is a sibling of the public per-PID call, but it travels through a WindowServer channel Chromium accepts as trusted. Pair it with yabai’s focus-without-raise pattern, plus an off-screen primer click at (-1, -1), and the click lands without the window ever raising.
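
For the curious, the core trick can be sketched in a few lines of Swift. SLEventPostToPid is private SkyLight SPI, so the signature below is an assumption (we take it to mirror the public CGEventPostToPid, per the description above); treat this as an illustration, not a supported API:

```swift
import CoreGraphics
import Darwin

// Assumption: SLEventPostToPid mirrors the public CGEventPostToPid
// signature, (pid_t, CGEventRef). It is private SPI and can change
// or break across macOS releases.
typealias EventPostToPid = @convention(c) (pid_t, CGEvent) -> Void

func loadSLEventPostToPid() -> EventPostToPid? {
    let skylight = "/System/Library/PrivateFrameworks/SkyLight.framework/SkyLight"
    guard let handle = dlopen(skylight, RTLD_NOW),
          let sym = dlsym(handle, "SLEventPostToPid") else { return nil }
    return unsafeBitCast(sym, to: EventPostToPid.self)
}

// Post a left click at a window-local point to a target process,
// leaving the global cursor untouched.
func backgroundClick(pid: pid_t, at point: CGPoint) {
    guard let post = loadSLEventPostToPid() else { return }
    for type: CGEventType in [.leftMouseDown, .leftMouseUp] {
        guard let event = CGEvent(mouseEventSource: nil, mouseType: type,
                                  mouseCursorPosition: point,
                                  mouseButton: .left) else { continue }
        post(pid, event)
    }
}
```

The focus-without-raise step and the off-screen primer click described above would layer on top of this; the sketch covers only the event-posting path.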

One thing we learned: the right addressing mode depends on the app. Native macOS apps usually have rich AX trees, Chromium-family apps often need a hybrid of AX and screenshots, and apps like Blender or CAD tools may expose almost no useful AX surface. The mistake is defaulting to pixels everywhere - or defaulting to AX everywhere.

Long technical writeup: https://github.com/trycua/cua/blob/main/blog/inside-macos-wi...

I would like feedback from people building Mac automation, agent harnesses, or accessibility tooling. If it breaks on a macOS app you care about, that is useful data for us.

Source
  • LatencyKills

    yesterday at 4:18 PM

    Ex-Apple engineer here. I really like your implementation. A few years ago I built a similar tool to help me automate the testing of some of my native macOS apps. Being able to run multiple UI automation tests simultaneously was the big win in my case.

    My only criticism is enabling telemetry by default. I'm a fan of having people opt-in.

      • jorvi

        yesterday at 8:58 PM

        The problem with opt-in telemetry is that 95% of users don't change defaults, and the 5% who do are your power users. They're not representative of the average user, and only a subset of them will turn it on.

        Ironically enough the opposite happens with opt-out telemetry, for the same reason: a lot of power users will turn off telemetry, thus you will never see their usage patterns and will have to infer them. Dogfooding helps.

          • crazygringo

            yesterday at 9:22 PM

            I'm confused.

            You claim power users opt in to telemetry, and then immediately say power users opt out.

              • jorvi

                yesterday at 11:07 PM

                A subset of power users want their usage to be profiled (me, if I trust the company: Brave, Mozilla, Mullvad, 1Password, Bitwarden, Valve, companies like that). But most power users will not want that because of privacy worries.

                From that you get two situations.

                Opt-in:

                - Regular users: click all 'ok' through setup at lightning speed, no telemetry enabled.

                - Most power users: consciously don't check the box to opt-in because of privacy worries.

                - Big picture power users: consciously check the opt-in box given they trust you (because they want their usage patterns to be profiled and optimized for).

                Opt-out:

                - Regular users: click all 'ok' through setup at lightning speed, telemetry enabled.

                - Most power users: consciously check the box to opt-out because of privacy worries.

                - Big picture power users: consciously don't check the opt-out box given they trust you (because they want their usage patterns to be profiled and optimized for).

                • awwaiid

                  yesterday at 11:11 PM

                  power users opt in to opt in telemetry, and power users opt out of opt out telemetry. Power users click all the buttons.

              • pnw_throwaway

                yesterday at 9:30 PM

                The problem with opt-in telemetry is that 95% of users are sick and tired of being spied on with every little thing they do.

                  • eddyg

                    today at 3:32 AM

                    Telemetry (if it’s truly telemetry) is nowhere close to “tracking”. People conflate the two all the time. One can provide useful, anonymous metrics (e.g. “user enabled feature X”) without doing anything but incrementing the counter for “feature X”.

                    The “Firefox Problem” is that all the power users disable telemetry, so all the “cool” features that power users like (but never get used by “regular people”) get ignored or removed instead of improved because, according to the metrics, “nobody uses them”.

                      • hilariously

                        today at 8:48 AM

                        The user doesn't conflate the two, the developers do, and that's why we turn off telemetry: because it's damn close to tracking.

                        Knowing what (vulnerable) version of software a user is running, transmitted in the clear, was absolutely a part of the NSA monitoring error information from Windows crash logs https://www.schneier.com/blog/archives/2017/08/nsa_collects_... - so forgive me if I do not trust the developer to know what makes me unsafe or not.

                        If you enable telemetry by default I will do my best to never use your product.

                    • jorvi

                      yesterday at 10:57 PM

                      If they really were, they would turn it off. And stop using Gmail and Android.

                      The overwhelming majority of people don't care about digital privacy because the cost is opaque to them.

                      Also, telemetry when done right isn't "spying". Again, it is anonymized and used to see, for example, where the hot paths and paper cuts in applications are.

                        • lukewarm707

                          today at 9:34 AM

                          i think that in a free society, you should be able to sell the product you want to sell. but you should give the customer information about what you are selling.

                          if it has telemetry, then it is a tool the customer buys that also has the function of listening and reporting to others how it is being used.

                          you want to sell it - no problem. but tell the customer, "look, this is bugged, and it's going to tell me what you are doing. but it's a great product." anything with opt-out telemetry needs a big version of that warning on the top of the page.

                          personally i am not a buyer. but that's my preference.

                          • hilariously

                            today at 8:49 AM

                            And how would I know if you did it right or not?

                        • dewey

                          today at 2:56 AM

                          As you can see with TikTok / Instagram usage…regular people who are not on HN could not care less about that.

                      • jonhohle

                        today at 12:18 AM

                        If Charmin put sensors in toilet paper rolls to optimize the wiping experience, it would be dystopian. Why do we give software a pass? Privacy is a right not a telemetry problem and opt-out by default is non-consensual surveillance.

                          • cyberrock

                            today at 12:07 PM

                            In fairness Charmin is probably backed by millions of dollars of market research on simple user questions like softness, tendency to crumble, size, etc., while free software faces more criticism for issues that are exponentially more difficult to express.

                            • lukewarm707

                              today at 9:42 AM

                              i think it's not so much non-consensual, it's misrepresentation.

                              it's bugged. the same as a mole in your company. or a sculpture with a listening device in it.

                              tell the user that your thing is bugged!

                      • frabonacci

                        yesterday at 4:33 PM

                        Fair criticism. We took a similar approach to established dev tools like Homebrew, with anonymous, opt-out telemetry to understand install issues, crashes, and high-level usage. For cua-driver specifically, telemetry is limited to command/tool-level events and basic environment metadata. We don’t send screenshots, recordings, app contents, prompts, typed text, file paths, or tool arguments. That said, we should make the opt-out path clearer.

                        • kveykva

                          yesterday at 8:49 PM

                          Would you be open to sharing what you built for running the automation tests? I could really use this right now.

                            • frabonacci

                              yesterday at 9:26 PM

                              We don't have a specific testing framework yet; cua-driver is closer to an automation interface than a test runner. That said, you could definitely build one on top of it. For reference, these are some of our integration tests: https://github.com/trycua/cua/tree/main/libs/cua-driver/Test...

                              One useful trick is to use cua-driver's 'launch_app' instead of the default 'open' or other osascript, since it can start the app without raising/focusing it, so the tests don't disturb your active desktop while they run.

                      • krackers

                        yesterday at 9:11 PM

                        Nice! Thanks for the technical writeup, ~2 weeks from me wondering how it's implemented [1] to being able to play with a replicated version!

                        [1] https://news.ycombinator.com/item?id=47799128

                      • dtran

                        yesterday at 10:00 PM

                        This is one of the coolest hacks I've seen recently. Having done some much less involved macOS hacking, I can't help but wonder if we may finally see momentum behind some flavor of agent-friendly Linux/Android if Apple doesn't give us more ways to let agents interact with our machines.

                          • frabonacci

                            yesterday at 10:17 PM

                            Really appreciate it. macOS has powerful primitives already, but they weren’t designed as one coherent agent API, so you end up stitching them together and hitting roadblocks. If Apple doesn't make this more first-class, Linux/Android-style environments may move faster because they’re easier to instrument. I think the OpenAI/Jony Ive AI hardware rumors are yet another signal that people may start building agent-native CUA devices instead of retrofitting agents onto existing desktops.

                        • pimlottc

                          yesterday at 11:31 PM

                          What is specific about this for using with agents? As opposed to offering it as a general automation library for any use?

                        • j-conn

                          yesterday at 11:39 PM

                          Incredible! I’m interested in doing something similar on Windows, have you looked into that at all? Apparently Codex computer use plans to support this on Windows in the future. Were you able to see how Codex was doing it, or was the inspiration just “they’ve shown it’s possible”?

                            • hexmiles

                              today at 11:55 AM

                              I did something similar on Windows by creating a "virtual desktop," where I can give the app focus without stealing it from another one. The idea was to basically reimplement RemoteApp without needing a dedicated Windows server. However, in that case, the app is not visible to the user unless you "connect" to the virtual desktop; to do that, I implemented (WIP) a simple VNC server in C#.

                              • frabonacci

                                today at 12:32 AM

                                Thanks! We haven't gone deep on Windows yet because we're still focused on polishing the macOS release. We want to go deeper on the Mac experience before going broader across platforms, and there are still a lot of features we want to ship and use cases we want to share.

                              • alsetmusic

                                yesterday at 9:17 PM

                                I tried out their Loom vm software a couple of months back. Worked well, fwiw. I'm not using it anymore because I decided to just give agents direct (supervised) access to my devices.

                                  • frabonacci

                                    yesterday at 9:46 PM

                                    Thanks for trying out Lume! We definitely haven't given up on the idea of sandboxing GUI agents in local macOS VMs. Cua Driver is aimed at a different use case though, letting coding agents and general agents use the Mac you're already on, asynchronously and in the background. That also makes the economics better since multiple agents can share the same machine instead of each needing its own VM.

                                    • prashant3210

                                      today at 7:00 AM

                                      Same here. I give agents supervised direct access on my Mac for a side project. Session stealing is annoying, and a VM feels like overkill for solo dev, but I hate that the cursor jumps around while I try to do other things. A background driver sounds like the missing middle ground.

                                        • fragmede

                                          today at 7:14 AM

                                          http://tart.run makes the VM part easy. So what if it's overkill?

                                  • BenFranklin100

                                    today at 3:39 AM

                                    Being new to the idea of using agents to run programs on one’s computer, could someone provide several use cases?

                                      • frabonacci

                                        today at 5:59 AM

                                        A few examples I'm excited about:

                                        - Closing the coding feedback loop by having agents verify their own changes in a real app

                                        - Automating repetitive workflows across apps that don't have good APIs

                                        - Agents recording product demos of them using software. One compelling use case here: https://x.com/trycua/status/2047383207612645426

                                        - Creating CLIs and APIs for apps by reverse-engineering their GUIs, e.g. see: https://github.com/HKUDS/CLI-Anything

                                    • davey2wavey

                                      yesterday at 8:59 PM

                                      It's looking great.

                                      The audit trail question is interesting and I haven't seen it come up much. When an agent clicks through an ERP or edits a file, you've got logs, but how do you explain the "why" behind each decision to, say, a compliance team?

                                      Curious if that's something you're thinking about or if it's too early.
