Launching today

Openclick
macOS agent that turns prompts into automated clicks
19 followers
openclick is an experimental open-source CLI that drives macOS UI from a prompt. An LLM produces a plan of UI actions; openclick executes them via the macOS Accessibility APIs. Early and rough. MIT-licensed.





Podcast List
Filliny
Riccardo, congrats on the launch. Screenshot-driven planners are interesting territory. One thing we keep getting tripped up on in browser automation: the screenshot the LLM saw at step N is stale by step N+1 if the page re-rendered. How does Openclick handle that? Does it re-anchor each step from a fresh AX tree, or trust the original plan and verify at the end? Curious because the AX-tree-vs-pixel-coords delta is exactly where most of our agent failures hide.
Podcast List
@whateverneveranywhere Thanks!
Yeah, that staleness gap is exactly where we got burned early too. The planner picks something like “click element 47” and by the time it runs, the page has re-rendered and 47 is now a completely different button.
What we do in OpenClick is basically two layers.
Within a batch: every AX action (click, type, etc.) re-resolves the target right before execution using a fresh AX snapshot, not the one the planner saw. We never rely on element indices. Everything is matched via more stable signals like __ax_id, title, or role + frame.
If an action is likely to change state, we force an AX refresh before the next step, since that’s where things usually drift. Pixel coords are only a fallback for things like canvas or WebGL where AX is useless.
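To make the within-batch layer concrete, here is a rough Python sketch of re-resolving a target against a fresh snapshot just before acting. It is not openclick's actual code: `AXElement`, `Target`, `resolve`, and the `max_drift` threshold are hypothetical names picked for illustration, and the snapshot is modeled as a plain list rather than a real AX tree.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Frame = Tuple[float, float, float, float]  # x, y, width, height


@dataclass(frozen=True)
class AXElement:
    """One element from a (hypothetical) flattened AX snapshot."""
    ax_id: Optional[str]
    role: str
    title: Optional[str]
    frame: Frame


@dataclass(frozen=True)
class Target:
    """What the planner recorded about the element it wants to act on."""
    ax_id: Optional[str] = None
    role: Optional[str] = None
    title: Optional[str] = None
    frame: Optional[Frame] = None


def _center(frame: Frame) -> Tuple[float, float]:
    x, y, w, h = frame
    return (x + w / 2, y + h / 2)


def resolve(target: Target, snapshot: List[AXElement],
            max_drift: float = 40.0) -> Optional[AXElement]:
    """Find the target in a FRESH snapshot. Never uses list positions."""
    # 1. Stable identifier wins outright.
    if target.ax_id is not None:
        for el in snapshot:
            if el.ax_id == target.ax_id:
                return el

    # 2. Narrow by role, then by title; a unique title match is accepted.
    candidates = list(snapshot)
    if target.role is not None:
        candidates = [el for el in candidates if el.role == target.role]
    if target.title is not None:
        titled = [el for el in candidates if el.title == target.title]
        if len(titled) == 1:
            return titled[0]
        if titled:
            candidates = titled

    # 3. Otherwise take the nearest remaining candidate to the remembered
    #    frame, but only if it has not drifted more than max_drift points.
    if target.frame is not None:
        tx, ty = _center(target.frame)
        best, best_dist = None, max_drift
        for el in candidates:
            cx, cy = _center(el.frame)
            dist = math.hypot(cx - tx, cy - ty)
            if dist <= best_dist:
                best, best_dist = el, dist
        return best

    return None  # Caller decides: refresh again, replan, or fall back.
```

In this sketch a `None` result is the signal to refresh or replan, not to click at the old coordinates. The caller would take a new snapshot right before each action and pass it to `resolve`, so an inserted banner or reordered row cannot shift the click onto a different button.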
Between batches: we take a fresh screenshot and AX snapshot, then run a verifier model that checks if what we intended actually happened. If not, or only partially, we replan with the new state plus a short critique of what drifted.
So we don’t really trust plans beyond a single batch, and we keep batches small (usually 3–5 actions) for that reason.
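The between-batch loop could look roughly like the Python sketch below. Again, this is an illustration under assumptions and not the real implementation: `run_goal`, `Verdict`, and the four injected callables (`plan`, `execute`, `observe`, `verify`) are hypothetical stand-ins for the planner, the AX executor, the screenshot-plus-AX capture, and the verifier model.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Verdict:
    """Verifier output: did the goal happen, and if not, what drifted."""
    done: bool
    critique: str = ""


def run_goal(
    goal: str,
    plan: Callable[[str, Any, str], List[Any]],   # (goal, state, critique) -> actions
    execute: Callable[[Any], None],               # performs one action
    observe: Callable[[], Any],                   # fresh screenshot + AX snapshot
    verify: Callable[[str, Any], Verdict],        # checks goal against new state
    batch_size: int = 4,
    max_batches: int = 10,
) -> bool:
    """Plan a small batch, run it, verify on fresh state, replan if needed."""
    state = observe()
    critique = ""
    for _ in range(max_batches):
        # Trust the plan only for one small batch.
        batch = plan(goal, state, critique)[:batch_size]
        if not batch:
            return False
        for action in batch:
            execute(action)

        # Never reuse the state the planner saw; capture it again.
        state = observe()
        verdict = verify(goal, state)
        if verdict.done:
            return True
        # Feed the critique into the next planning round.
        critique = verdict.critique
    return False
```

Capping the batch at a handful of actions bounds how far execution can run on a stale view before the verifier looks at the screen again.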
Honestly, the hardest cases now aren’t AX drift, but apps that expose AX inconsistently or lazily. Gmail is a classic. Message rows can be weird, so we sometimes force an AX refresh right before clicking them. Otherwise you get cases where a coord click “works” but the row never actually activates.
Curious to hear what approach you ended up taking here.