building a browser that thinks
notes on letting agents drive chromium without losing your mind — selector drift, viewport ghosts, and the one trick that actually works.
The first time I handed an agent a browser, it broke in twenty different ways before lunch. Buttons it knew about yesterday were gone. Pages loaded in iframes nested in iframes. A modal opened, the agent typed into the document behind it, and nothing happened.
selector drift
The biggest lie in browser automation is the CSS selector. #submit-button works for a week, then a designer renames it to #cta-primary, and your test suite quietly reports success anyway, because the page navigation it was checking for still happens to fire. The agent has no idea anything changed.
The fix that actually works: don't trust the selector. Trust the intent. Describe what the button does, not what it's called. Then re-resolve every time.
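Here is a minimal sketch of what "trust the intent" could look like, assuming the driver can hand you a fresh list of elements with a role and a visible label. The Intent class, the resolve function, and the element dict shape are all hypothetical names made up for illustration, not any library's API. The scoring is deliberately naive keyword matching.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Intent:
    """What the element does, not what it is called in the markup."""
    role: str         # e.g. "button"
    keywords: tuple   # words expected in the visible label


def resolve(intent: Intent, elements: list) -> Optional[dict]:
    """Pick the best match for an intent from a fresh element snapshot.

    Each element is a plain dict with "role", "label", and "selector" keys
    (a hypothetical shape; adapt it to whatever your driver returns).
    """
    best, best_score = None, 0
    for el in elements:
        if el["role"] != intent.role:
            continue
        label = el["label"].lower()
        score = sum(1 for kw in intent.keywords if kw.lower() in label)
        if score > best_score:
            best, best_score = el, score
    return best  # None means nothing matched: fail loudly, don't guess


submit = Intent(role="button", keywords=("submit", "order"))

# The same page before and after the designer's rename.
monday = [{"role": "button", "label": "Submit order", "selector": "#submit-button"}]
friday = [
    {"role": "link", "label": "Submit feedback", "selector": "#feedback"},
    {"role": "button", "label": "Submit order", "selector": "#cta-primary"},
]

# Re-resolve against each snapshot instead of caching a selector.
print(resolve(submit, monday)["selector"])
print(resolve(submit, friday)["selector"])
```

The point of the design is that the selector is an output of resolution, never an input. A real resolver would probably lean on accessibility roles and names rather than substring matching, but the re-resolve-every-time loop stays the same.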
viewport ghosts
Half of browser bugs aren't bugs in the page; they're in the headless viewport. Headless Chrome defaults to an 800×600 window, and plenty of responsive designs render an entirely different tree at that size.
Set a real viewport. 1440×900 is a good default. Then check what the page actually rendered before you start clicking.
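A rough sketch of both halves, setting the size and then checking it. The helper names (chrome_args, viewport_problem) are made up for this post. The evaluate parameter is assumed to be any callable that runs a JavaScript expression in the page and returns the value; with Playwright for Python, page.evaluate would likely fit. The window size you request and the viewport the page reports are not guaranteed to match, which is the reason to check.

```python
EXPECTED = (1440, 900)


def chrome_args(width: int, height: int) -> list:
    """Launch flags for headless Chrome with an explicit window size."""
    return ["--headless=new", f"--window-size={width},{height}"]


def viewport_problem(evaluate, expected=EXPECTED):
    """Return a message if the rendered viewport differs from expected, else None.

    `evaluate` is any callable that runs a JavaScript expression in the
    page and returns its value (an assumption about your driver).
    """
    actual = (evaluate("window.innerWidth"), evaluate("window.innerHeight"))
    if actual != expected:
        return (
            f"viewport is {actual[0]}x{actual[1]}, "
            f"expected {expected[0]}x{expected[1]}"
        )
    return None


# Stand-in for a real page that is still at the headless default.
fake_page = {"window.innerWidth": 800, "window.innerHeight": 600}

print(chrome_args(*EXPECTED))
print(viewport_problem(fake_page.get))
```

Run the check once after the first navigation and refuse to click anything until it returns None.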
the one trick that actually works
Snapshot the DOM before every action. Diff after. If the diff is empty, your click did nothing — you clicked a div that looked like a button.
That single check has caught more silent failures than any other thing I've tried.
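The snapshot-and-diff check can be sketched in a few lines with the standard library. The names here (act_and_diff, NoEffectError) are hypothetical, and get_dom is assumed to be whatever returns the page's serialized HTML as a string; in Playwright for Python that could be page.content. A toy dict stands in for the live page so the sketch runs on its own.

```python
import difflib


class NoEffectError(RuntimeError):
    """Raised when an action leaves the DOM snapshot unchanged."""


def act_and_diff(get_dom, action):
    """Snapshot the DOM, run the action, and return the changed lines.

    `get_dom` returns the page's serialized HTML as a string; `action`
    performs the click or keystroke. Both are supplied by the caller.
    """
    before = get_dom().splitlines()
    action()
    after = get_dom().splitlines()
    diff = list(difflib.unified_diff(before, after, lineterm="", n=0))
    # Drop the two file-header lines and the @@ hunk markers,
    # keeping only the added and removed lines.
    changes = [line for line in diff[2:] if not line.startswith("@@")]
    if not changes:
        raise NoEffectError("DOM unchanged after action")
    return changes


# Toy page: one string stands in for the live DOM.
page = {"html": "<button>Save</button>\n<p>idle</p>"}


def real_click():
    page["html"] = page["html"].replace("idle", "saved")


print(act_and_diff(lambda: page["html"], real_click))

try:
    act_and_diff(lambda: page["html"], lambda: None)  # a click on a dead div
except NoEffectError as err:
    print("caught:", err)
```

Two caveats on this sketch: serialized HTML does not capture everything (typed input values live in properties, not attributes), and pages with timers or animations will produce a non-empty diff on their own, so a real version probably needs to wait for the page to settle and ignore known noisy regions.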