Beyond Chat and Clicks: GUI Agents for In-Situ Assistance via Live Interface Transformation

Pan Hao, Rishi Selvakumaran*, Jacob Sun*, Qianwen Wang

University of Minnesota, Minneapolis, MN, USA


What acts beyond chat and clicks?

Complex visual interfaces are powerful but have a steep learning curve: users must navigate feature-rich controls while reasoning about domain-specific operations. Existing approaches either deliver assistance through a separate chat-based interaction or require substantial application-specific engineering to build support natively into each interface.

To address these gaps, we propose in-situ assistance: a mode of support delivered directly within any live web interface through lightweight, browser-level interventions on the Document Object Model (DOM), without rebuilding the application or modifying its underlying logic.

We contribute a design space and a computational pipeline for DOM-mediated in-situ assistance, characterizing how GUI agents can insert, mutate, or recompose web elements to make the interface easier for users to understand, use, and navigate. We instantiate in-situ assistance in DOMSteer, a Chrome extension that interprets a user's help request and the live interface context, grounds the request in the relevant UI elements, and executes reversible DOM manipulations directly on the live page. The resulting assistance includes contextual tooltips, control highlighting, and layout reorganization.

Design Space

DOMSteer design space

Our design space organizes in-situ assistance by how it intervenes in an existing interface. Insert adds new assistance content. Mutate modifies existing elements to make them easier to notice, understand, or operate. Recompose reorganizes elements to better match the user's task. We exclude removal to preserve the original interface's full functionality.
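The three intervention types can be sketched as a small set of reversible operations. The snippet below is a minimal illustration under stated assumptions, not DOMSteer's implementation: `UINode` is a hypothetical in-memory stand-in for a DOM element, and `Operation`, `findNode`, and `applyOperation` are names invented for this example. Each operation returns an undo function, and there is deliberately no removal operation.

```typescript
// Hypothetical in-memory stand-in for a DOM element (a real extension would use Element).
interface UINode {
  id: string;
  attrs: Record<string, string>;
  children: UINode[];
}

// The three intervention types from the design space; removal is intentionally absent.
type Operation =
  | { kind: "insert"; parentId: string; node: UINode }
  | { kind: "mutate"; targetId: string; attr: string; value: string }
  | { kind: "recompose"; parentId: string; order: string[] };

type Undo = () => void;

// Depth-first lookup of a node by id.
function findNode(root: UINode, id: string): UINode | undefined {
  if (root.id === id) return root;
  for (const child of root.children) {
    const hit = findNode(child, id);
    if (hit) return hit;
  }
  return undefined;
}

// Applies one operation and returns a function that restores the previous state.
function applyOperation(root: UINode, op: Operation): Undo {
  switch (op.kind) {
    case "insert": {
      const parent = findNode(root, op.parentId);
      if (!parent) throw new Error(`unknown parent: ${op.parentId}`);
      const node = op.node;
      parent.children.push(node);
      return () => {
        parent.children = parent.children.filter((c) => c !== node);
      };
    }
    case "mutate": {
      const target = findNode(root, op.targetId);
      if (!target) throw new Error(`unknown target: ${op.targetId}`);
      const attr = op.attr;
      const previous = target.attrs[attr];
      target.attrs[attr] = op.value;
      return () => {
        if (previous !== undefined) target.attrs[attr] = previous;
        else delete target.attrs[attr];
      };
    }
    case "recompose": {
      const parent = findNode(root, op.parentId);
      if (!parent) throw new Error(`unknown parent: ${op.parentId}`);
      const original = [...parent.children];
      const order = op.order;
      // Reorder only: every existing child must be listed, so nothing is removed.
      const keepsAll =
        order.length === original.length &&
        original.every((c) => order.includes(c.id));
      if (!keepsAll) throw new Error("recompose must keep every child");
      parent.children = order.map((id) => original.find((c) => c.id === id)!);
      return () => {
        parent.children = original;
      };
    }
  }
}
```

Collecting the returned undo functions on a stack and calling them in reverse order restores the original tree, which is one simple way to keep every intervention reversible.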

Pipeline

DOMSteer pipeline

DOMSteer is a Chrome Manifest V3 extension. Knowledge Acquisition captures the interface state and indexes elements. Assistance Recommendation generates grounded DOM operations from user context. Assistance Delivery applies the transformations directly to the live page.
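The three stages can be read as a simple data flow. The sketch below is only an illustration under stated assumptions: `ElementRecord`, `PlannedOperation`, `buildIndex`, `recommend`, and `deliver` are hypothetical names, and a plain keyword lookup stands in for whatever model or retrieval step the real Assistance Recommendation stage uses.

```typescript
// Snapshot of one captured element (fields are illustrative assumptions).
interface ElementRecord {
  selector: string;
  label: string;
}

// A planned DOM operation, grounded in a selector from the captured snapshot.
interface PlannedOperation {
  kind: "insert" | "mutate" | "recompose";
  selector: string;
  detail: string;
}

// Splits text into lower-cased word tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\W+/).filter((t) => t.length > 0);
}

// Stage 1, Knowledge Acquisition: index captured elements by label token.
function buildIndex(elements: ElementRecord[]): Map<string, ElementRecord[]> {
  const index = new Map<string, ElementRecord[]>();
  for (const el of elements) {
    for (const token of tokenize(el.label)) {
      const bucket = index.get(token) ?? [];
      bucket.push(el);
      index.set(token, bucket);
    }
  }
  return index;
}

// Stage 2, Assistance Recommendation: keyword lookup stands in for the model call.
function recommend(
  request: string,
  index: Map<string, ElementRecord[]>,
): PlannedOperation[] {
  const seen = new Set<string>();
  const plan: PlannedOperation[] = [];
  for (const token of tokenize(request)) {
    for (const el of index.get(token) ?? []) {
      if (seen.has(el.selector)) continue; // one operation per element
      seen.add(el.selector);
      plan.push({ kind: "mutate", selector: el.selector, detail: "highlight" });
    }
  }
  return plan;
}

// Stage 3, Assistance Delivery: apply only operations grounded in a known selector.
function deliver(
  plan: PlannedOperation[],
  known: Set<string>,
  apply: (op: PlannedOperation) => void,
): number {
  let applied = 0;
  for (const op of plan) {
    if (!known.has(op.selector)) continue; // skip ungrounded operations
    apply(op);
    applied++;
  }
  return applied;
}
```

In a content script, the `apply` callback might resolve `op.selector` with `document.querySelector` and adjust the element's style. It is passed in as a parameter here so that the sketch runs without a browser.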

More Cases of In-Situ Assistance

Citation

Cite This Work

@misc{hao2026insitu-assistance,
      title={Beyond Chat and Clicks: GUI Agents for In-Situ Assistance via Live Interface Transformation},
      author={Pan Hao and Rishi Selvakumaran and Jacob Sun and Qianwen Wang},
      year={2026},
      eprint={2604.14668},
      archivePrefix={arXiv},
      primaryClass={cs.HC},
      url={https://arxiv.org/abs/2604.14668},
}