Show HN: PageAgent, A GUI agent that lives inside your web app

62 points - today at 5:01 PM


Hi HN,

I'm building PageAgent, an open-source (MIT) library that embeds an AI agent directly into your frontend.

I built this because I believe there's a massive design space for deploying general agents natively inside the web apps we already use, rather than treating the web merely as a dumb target for isolated bots.

Currently, most AI agents operate from external clients or server-side programs, effectively leaving web development out of the AI ecosystem. I'm experimenting with an "inside-out" paradigm instead. By dropping the library into a page, you get a client-side agent that interacts natively with the live DOM tree and inherits the user's active session out of the box, which works perfectly for SPAs.

To handle cross-page tasks, I built an optional browser extension that acts as a "bridge". This allows the web-page agent to control the entire browser with explicit user authorization. Instead of a desktop app controlling your browser, your web app is empowered to act as a general agent that can navigate the broader web.

I'd love to start a conversation about the viability of this architecture, and what you all think about the future of in-app general agents. Happy to answer any questions!

  • moehj

    today at 10:49 PM

    "Interesting architecture — embedding the agent inside the app context rather than outside it makes sense for session-aware tasks. One question: how do you handle output validation before the agent acts on the DOM? Client-side agents acting on live state without a certification layer seems like a reliability risk in production. We've been building ARU (aru-runtime.com) as a runtime certification layer for exactly this — curious if you've thought about that boundary."

    • simon_luv_pho

      today at 5:07 PM

      This is highly experimental right now, but here are some quick links for anyone wanting to dig deeper:

      - GitHub: https://github.com/alibaba/page-agent

      - Live Demo (No sign-up): https://alibaba.github.io/page-agent/ (you can drag the bookmarklet from here to try it on other sites)

      - Browser Extension: https://chromewebstore.google.com/detail/page-agent-ext/akld...

      I'd be really interested in feedback on the security model of client-side agents giving extension-bridge access, and taking questions on the implementation!

        • jadbox

          today at 10:34 PM

          I tried setting the LLM to "http://0.0.0.0:8080" and the extension crashed and now continues to crash at startup.

      • mentalgear

        today at 6:59 PM

        > Data processed via servers in Mainland China

        Appreciate the transparency, but maybe you could add some European (preferably) alternatives ?

          • simon_luv_pho

            today at 7:08 PM

            Please use your own LLM API instead!

            The free testing LLM is Qwen hosted by Aliyun. Qwen and DeepSeek are the only ones I can afford to offer for free. It's just there to lower the try-out barrier; please DO NOT rely on it.

            The library itself does NOT include any backend service. Your data only goes to the LLM API you configured.

            I tested it with local Ollama models and it works fine.

            • simon_luv_pho

              today at 7:25 PM

              I'm looking into a European testing endpoint. The legal and compliance requirements are quite a hassle, and persuading my company to pay for that infrastructure is gonna be a tough sell.

          • general_reveal

            today at 7:18 PM

            I’ve been thinking about something like this. If it’s just a one-line script import, how the heck are you trusting natural language to translate to commands for an arbitrary UI?

            The only thing I can think of is you had the AI rewrite and embed selectors on the entire build file and work with that?

              • simon_luv_pho

                today at 7:54 PM

                Everything happens at runtime, on the HTML level.

                It uses a process similar to `browser-use`, but entirely inside the web page. A script parses the live HTML, strips it down to its semantic essentials (HTML dehydration), and indexes every interactive element. That snapshot goes to the LLM, which returns actions referencing elements by index. The agent then simulates mouse/keyboard events on those elements via JS.

                This works best on pages with proper semantic HTML and accessibility markup. You can test it right now on any page using the bookmarklet on the homepage (unless that page's CSP blocks script injection, of course).
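
To make the dehydrate-and-index step concrete, here is a minimal sketch of the idea, not PageAgent's actual code. Everything in it is an assumption made for illustration: the `SimpleNode` shape (a stand-in for a real DOM element so the sketch runs outside a browser), the list of interactive tags and kept attributes, the `[index]<tag ...>` line format, and the `Action` shape the LLM is imagined to return.

```typescript
// Hypothetical stand-in for a DOM element; in a page this would be a real Element.
interface SimpleNode {
  tag: string;
  attrs?: Record<string, string>;
  text?: string;
  children?: SimpleNode[];
}

// Assumed lists; a real implementation would likely cover more cases.
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);
const KEPT_ATTRS = ["aria-label", "placeholder", "role", "type", "name"];

interface Snapshot {
  lines: string[];        // compact text sent to the LLM
  elements: SimpleNode[]; // elements[i] is the element shown as [i]
}

function isInteractive(node: SimpleNode): boolean {
  return INTERACTIVE_TAGS.has(node.tag) || node.attrs?.role === "button";
}

// Walk the tree, drop non-semantic nodes, and give each interactive element an index.
function dehydrate(root: SimpleNode): Snapshot {
  const snapshot: Snapshot = { lines: [], elements: [] };
  const visit = (node: SimpleNode): void => {
    if (node.tag === "script" || node.tag === "style") return;
    if (isInteractive(node)) {
      const index = snapshot.elements.push(node) - 1;
      const attrs = node.attrs ?? {};
      const kept = KEPT_ATTRS.filter((k) => k in attrs)
        .map((k) => ` ${k}="${attrs[k]}"`)
        .join("");
      snapshot.lines.push(`[${index}]<${node.tag}${kept}>${node.text ?? ""}`);
    } else if (node.text) {
      snapshot.lines.push(node.text);
    }
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return snapshot;
}

// Hypothetical action shape: the LLM refers to elements only by index.
interface Action {
  type: "click" | "input";
  index: number;
  text?: string;
}

// Map an action back to the element it targets, rejecting unknown indices.
function resolveAction(snapshot: Snapshot, action: Action): SimpleNode {
  const target = snapshot.elements[action.index];
  if (!target) throw new Error(`No interactive element at index ${action.index}`);
  return target;
}
```

In a browser, the resolved element would then receive the simulated mouse or keyboard events. Referring to elements by index keeps the LLM from having to invent selectors, which fits the point above that everything happens at runtime.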

            • jadbox

              today at 10:21 PM

              Firefox support?

              • dzink

                today at 6:53 PM

                Is this affiliated with the Chinese company Alibaba? Any chance data goes there too?

                  • simon_luv_pho

                    today at 7:23 PM

                    Full transparency: I work at Alibaba and published this under Alibaba's open-source org. I sometimes maintain it during work hours, so yes, Alibaba technically pays me for it. That said, this is my project: it's MIT-licensed, includes no backend service, and is open for anyone to audit.

                    The free testing LLM endpoint is hosted on Alibaba Cloud because I happen to have some company quota to spend, but it's not part of the library. Bring your own LLM and there is zero data transmission to Alibaba or anywhere else you haven't configured yourself.

                    I highly recommend using it with a local Ollama setup.

                • pscanf

                  today at 5:59 PM

                  Very cool!

                  I'm particularly impressed by the bookmark "trick" to install it on a page. Despite having spent 15 years developing for the browser, I had somehow missed that feature of the bookmarks bar. But awesome UX for people to try out the tool. Congrats!

                    • simon_luv_pho

                      today at 6:11 PM

                      Thanks!

                      Bookmarklets are such an underrated feature. It's super convenient to inject and test scripts on any page. Seemed like the perfect low-friction entry point for people to try it out.

                      Spent some time on that UX because the concept is a bit hard to explain. Glad it worked!
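
For anyone who has not built one, a bookmarklet is just a `javascript:` URL saved as a bookmark. The sketch below shows the general technique under stated assumptions: `makeBookmarklet` is a hypothetical helper, the script URL is whatever you pass in, and this is not how PageAgent's own bookmarklet is generated.

```typescript
// Build a javascript: URL that injects a <script> tag into the current page.
// scriptUrl is supplied by the caller; nothing here is specific to PageAgent.
function makeBookmarklet(scriptUrl: string): string {
  const loader =
    "(function(){" +
    "var s=document.createElement('script');" +
    `s.src=${JSON.stringify(scriptUrl)};` + // JSON.stringify quotes and escapes the URL
    "document.head.appendChild(s);" +
    "})();";
  // Percent-encode so the code survives being stored as a bookmark URL.
  return "javascript:" + encodeURIComponent(loader);
}
```

Dragging a link with that `href` to the bookmarks bar and clicking it on another site runs the loader there, unless the page's Content Security Policy blocks the injected script.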

                    • Mnexium

                      today at 7:21 PM

                      Curious - how does it perform with captchas and other "are you human" stuff on the web?

                        • simon_luv_pho

                          today at 7:38 PM

                          I added in the system prompt that it should skip CAPTCHAs and hand control back to the user. Currently working on a proper human-in-the-loop feature. That's actually one of the key advantages of running the agent inside your own browser.

                            • CloakHQ

                              today at 10:04 PM

                              the CAPTCHA question points to a deeper issue: even before the CAPTCHA, most sites are already fingerprinting the browser that's running the agent. if the underlying browser leaks automation signals - navigator.webdriver, Canvas fingerprint deviations, WebGL anomalies - the session gets flagged or soft-blocked long before a CAPTCHA is even served.

                              the "inside your own browser" angle is actually the right intuition here. a real user's browser has built up a consistent fingerprint profile across sessions. the moment you run an agent in a context where those signals differ from that baseline, you're detectable. curious whether you've run into this on sites with aggressive bot detection, or whether the use case has mostly been internal/enterprise apps where that's not a concern?

                              • Mnexium

                                today at 9:28 PM

                                Makes sense.

                                For curiosity's sake, have you had it try to attempt captchas?

                                If so, what were the results?

                                  • simon_luv_pho

                                    today at 9:43 PM

                                    I haven’t. I don’t think it will work well.

                                    I use a text-based approach. Captchas like “crossroad” usually need a screenshot, a visual model and coordinate-based mouse events.

                        • coreylane

                          today at 6:52 PM

                          Looks cool! Are you open to adding AWS Bedrock or LiteLLM support?

                            • simon_luv_pho

                              today at 7:30 PM

                              Thanks!

                              It supports any OpenAI-compatible API out of the box, so AWS Bedrock, LiteLLM, Ollama, etc. should all work. The free testing LLM is just there for a quick demo. Please bring your own LLM for long-term use.
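
As a rough illustration of what "OpenAI-compatible" means in practice, the sketch below builds a chat-completions request that any such endpoint should accept. The `LlmConfig` field names and the `buildChatRequest` helper are hypothetical and are not PageAgent's configuration API; only the `/chat/completions` path, the `model` and `messages` body fields, and the `Bearer` header follow the OpenAI convention.

```typescript
// Hypothetical config shape, for illustration only.
interface LlmConfig {
  baseURL: string; // e.g. a local Ollama server exposes http://localhost:11434/v1
  model: string;   // whatever model name your endpoint serves
  apiKey?: string; // local servers often need none
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Assemble the URL and fetch-style options for a chat-completions call.
function buildChatRequest(config: LlmConfig, messages: ChatMessage[]): ChatRequest {
  // Trim trailing slashes so the path is joined exactly once.
  const url = `${config.baseURL.replace(/\/+$/, "")}/chat/completions`;
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (config.apiKey) headers["Authorization"] = `Bearer ${config.apiKey}`;
  return {
    url,
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({ model: config.model, messages }),
    },
  };
}
```

The result can be passed to `fetch(request.url, request.init)`. Swapping providers then only means changing `baseURL`, `model`, and `apiKey`, which is why a gateway such as LiteLLM can sit in front of other backends.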

                          • MeteorMarc

                            today at 6:41 PM

                            Confusing name because of the existence of pageant, the putty agent.

                              • simon_luv_pho

                                today at 6:57 PM

                                Darn. Pageant would've been a nice name though. Maybe `page-agent.js` is more relevant in the web dev community.

                                  • graypegg

                                    today at 9:01 PM

                                    I think every successful Show HN post ends up with a "thought this was about X" or "didn't look up the name first?" comment. Consider it a win! I don't think anyone will mistake a tool for putty with your tool, but you might share a google search page with it.

                                    • mmarian

                                      today at 7:54 PM

                                      I think page agent is good. I've never heard of putty's pageant. And I think it's better to distinguish it from the general meaning of pageant (for beauty).

                                        • simon_luv_pho

                                          today at 8:02 PM

                                          Thanks!

                                  • kirth_gersen

                                    today at 6:49 PM

                                    Came here to say missed opportunity to call it "PAgent". Rolls off the tongue better than Page Agent.

                                      • simon_luv_pho

                                        today at 7:59 PM

                                        I'm 2 years too late for that one...

                                • popalchemist

                                  today at 7:32 PM

                                  Does it support long-click / click-and-drag?

                                    • simon_luv_pho

                                      today at 7:42 PM

                                      Not yet. Currently focused on the more common interaction patterns. PRs welcome though!

                                        • popalchemist

                                          today at 7:46 PM

                                          Gotcha. Still very cool! Congrats on the release.

                                            • simon_luv_pho

                                              today at 8:28 PM

                                              Thanks!

                                    • jauntywundrkind

                                      today at 5:43 PM

                                      Not exactly the same but I'd also point to Paul Kinlan's FolioLM as a very interesting project in this space. A very nice browser extension,

                                      > Collect and query content from tabs, bookmarks, and history - your AI research companion. FolioLM helps you collect sources from tabs, bookmarks, and history, then query and transform that content using AI.

                                      https://github.com/PaulKinlan/NotebookLM-Chrome https://chromewebstore.google.com/detail/foliolm/eeejhgacmlh...

                                        • klueinc

                                          today at 6:54 PM

                                          I've been trying to arrive at something like this with my own sidepanel extension called Klue, but it's more of a user notes + web page context approach. Nice to see another take on this! https://chromewebstore.google.com/detail/cackjmmgcmnkjnffabk...

                                          • simon_luv_pho

                                            today at 6:23 PM

                                            Thanks for sharing! We need more projects like this in the JS ecosystem.