SGDR: Dynamic Skill Retrieval for Web Agents

Olivia Hughes

June 5, 2026


SGDR (State-Grounded Dynamic Retrieval) is an online skill learning method for web agents that addresses the limitations of static skill policies. By dynamically retrieving and reusing skills step-by-step based on real-time web page states, SGDR allows agents to adapt to evolving online environments. This approach, developed by researchers at Carnegie Mellon and Microsoft, significantly improves task success rates on benchmarks like Mind2Web and WebArena, offering a more robust solution for web automation.

Language agents are becoming increasingly vital for automating tasks across the web. Historically, these agents would learn skills from past interactions and then apply them statically. This meant an agent would lock into a predefined set of skills based on the initial instruction and stick with it throughout the entire task. The problem? The web is anything but static. User clicks trigger new elements, forms, or pop-ups, and a fixed skill set often fails when the page state shifts unexpectedly. This 'define skills first, then execute' model clearly falls short in real-world scenarios.

The Need for Dynamic Adaptation

Imagine an agent trying to fill out a complex online shopping form. Initially, it might retrieve a 'fill address' skill. But after submission, a new pop-up appears, asking for a discount code – a step not included in its initial skill set. At this point, the agent either gets stuck or has to rely on an expensive, large language model to re-reason the entire process. Researchers from Carnegie Mellon University and Microsoft Research pinpointed this exact pain point, introducing SGDR (State-Grounded Dynamic Retrieval). This online skill learning method empowers agents to dynamically retrieve and reuse skills at each step, directly informed by the current web page state.

SGDR operates on a three-step core process. First, it uses a sliding window extraction technique to break down completed task segments into atomic-level skills. Second, during runtime, it encodes the current web page's DOM structure alongside the task objective to retrieve the most relevant skill from its library. Finally, after executing a new skill, it feeds that skill back into the library, creating a continuous learning loop. While the 'learn-as-you-go' concept isn't entirely new, SGDR's innovation lies in reducing the retrieval granularity from 'task-level' to 'step-level' and, crucially, integrating real-time page states into the retrieval conditions.
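The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Skill`, `SkillLibrary`, and `extract_skills` names are hypothetical, the bag-of-words `embed` function stands in for whatever learned encoder the authors use, and the fixed window size and string representation of DOM states and actions are chosen only for readability.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (assumption: a real system would use a learned encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Skill:
    actions: List[str]  # short action sequence that makes up the skill
    context: str        # page state plus objective under which it was observed


@dataclass
class SkillLibrary:
    skills: List[Skill] = field(default_factory=list)

    def add(self, skill: Skill) -> None:
        """Step 3: feed a skill back into the library."""
        self.skills.append(skill)

    def retrieve(self, dom: str, objective: str) -> Optional[Skill]:
        """Step 2: pick the skill whose context best matches the current state and objective."""
        query = embed(dom + " " + objective)
        return max(self.skills, key=lambda s: cosine(query, embed(s.context)), default=None)


def extract_skills(trajectory: List[Tuple[str, str]], objective: str, window: int = 2) -> List[Skill]:
    """Step 1: slide a fixed-size window over the (dom, action) pairs of a completed task."""
    skills = []
    for i in range(len(trajectory) - window + 1):
        chunk = trajectory[i:i + window]
        # The skill is indexed by the page state at the start of the window.
        skills.append(Skill(actions=[action for _, action in chunk],
                            context=chunk[0][0] + " " + objective))
    return skills


# Usage: learn from one finished checkout, then retrieve at a later step.
library = SkillLibrary()
past = [
    ("<form id='address'> street city zip </form>", "type street"),
    ("<form id='address'> street city zip </form>", "click submit"),
    ("<div class='popup'> discount code input </div>", "type discount code"),
    ("<div class='popup'> discount code input </div>", "click apply"),
]
for skill in extract_skills(past, "checkout order"):
    library.add(skill)

# The query uses the page state at this step, not just the initial instruction.
best = library.retrieve("<div class='popup'> enter discount code </div>", "checkout order")
print(best.actions)
```

The point of the sketch is the retrieval key: because the query is built from the current page state at every step, the discount pop-up selects a different skill than the address form did, even though the task objective is unchanged.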

Real-World Implications and Practicalities

The practical impact of this work primarily benefits two groups: automation testing engineers and developers building personal browser assistants. Test engineers, who traditionally write manual assertions for every possible page state, could see significantly lower script maintenance costs with an agent capable of dynamic skill reuse. Browser assistant developers, on the other hand, could create far more flexible tools. Think of an automated expense-report assistant that copes with varied expense-form layouts, rather than needing separate training for each one. Experiments on benchmarks like Mind2Web and WebArena show SGDR improving task success rates by over 8% compared to baseline methods, with the skill library continuously growing as tasks are executed.

Of course, SGDR isn't a silver bullet. Dynamic retrieval inherently adds latency to each decision, so latency-sensitive applications might need caching optimizations. Furthermore, the quality of the skill library depends heavily on the initial extraction algorithm; noisy trajectories could introduce suboptimal skills. Even so, this 'state-grounded' approach offers a more pragmatic path for deploying robust web agents.
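One possible form of the caching mentioned above is to memoize retrieval results by a hash of the page state and objective. The sketch below is an assumption about how that could look, not something the article or paper describes: `CachedRetriever`, its whitespace normalization, and the FIFO eviction policy are all hypothetical choices.

```python
import hashlib
from typing import Callable, Dict, Optional


class CachedRetriever:
    """Wraps a retrieval function and memoizes results per (page state, objective)."""

    def __init__(self, retrieve: Callable[[str, str], Optional[str]], max_entries: int = 1024):
        self._retrieve = retrieve
        self._cache: Dict[str, Optional[str]] = {}
        self._max_entries = max_entries
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(dom: str, objective: str) -> str:
        # Collapse whitespace so trivially different serializations share one entry.
        normalized = " ".join(dom.split())
        return hashlib.sha256((normalized + "\x00" + objective).encode("utf-8")).hexdigest()

    def __call__(self, dom: str, objective: str) -> Optional[str]:
        key = self._key(dom, objective)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        result = self._retrieve(dom, objective)
        if len(self._cache) >= self._max_entries:
            # Dicts preserve insertion order, so this evicts the oldest entry (FIFO).
            self._cache.pop(next(iter(self._cache)))
        self._cache[key] = result
        return result

    def invalidate(self) -> None:
        """Drop all entries, e.g. after new skills are added to the library."""
        self._cache.clear()


# Usage with a stand-in retrieval function that records how often it runs.
calls = []

def slow_retrieve(dom: str, objective: str) -> Optional[str]:
    calls.append(dom)
    return "fill_address"

retriever = CachedRetriever(slow_retrieve)
retriever("<form>  street </form>", "checkout")
retriever("<form> street\n</form>", "checkout")  # same state after normalization
print(len(calls), retriever.hits)
```

Because the skill library keeps growing as tasks run, a cached answer can go stale; calling `invalidate()` whenever a skill is added is the simplest way to keep the cache consistent, at the cost of a cold cache afterwards.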

Key Takeaways for Developers

Prioritize Page State Encoding: SGDR's effectiveness hinges on the DOM structure as a grounding signal. Pages rendered dynamically by frameworks like React can produce large, frequently changing DOM states that might require careful preprocessing.
Skill Library Visualization: For practical deployment, consider building a human-review interface for the accumulated skill library to filter out anomalous or inefficient skills.
Integrate with Existing Frameworks: Developers can wrap SGDR logic around tools like Playwright or Puppeteer, persisting the skill library in a vector database for scalable access.
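As an illustration of the page-state preprocessing mentioned in the first takeaway, the sketch below reduces raw HTML to a compact list of interactive elements using Python's standard `html.parser`. Which tags count as interactive, which attributes are kept, and the `encode_state` name itself are assumptions made for this example; the article does not say how SGDR encodes the DOM. In practice the HTML string could come from a browser automation tool such as Playwright or Puppeteer.

```python
from html.parser import HTMLParser

# Assumptions for illustration: the tags and attributes worth keeping.
INTERACTIVE = {"a", "button", "input", "select", "textarea", "form"}
KEPT_ATTRS = {"id", "name", "type", "placeholder", "aria-label", "href"}
SKIPPED = {"script", "style"}
VOID = {"input"}  # elements with no closing tag and no text content


class StateEncoder(HTMLParser):
    """Collects interactive elements and their visible text, ignoring scripts and styles."""

    def __init__(self):
        super().__init__()
        self.elements = []    # one dict per interactive element, in document order
        self._skip = 0        # nesting depth inside <script>/<style>
        self._current = None  # element currently collecting text

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self._skip += 1
        elif not self._skip and tag in INTERACTIVE:
            kept = {k: v for k, v in attrs if k in KEPT_ATTRS and v}
            element = {"tag": tag, "attrs": kept, "text": ""}
            self.elements.append(element)
            if tag not in VOID:
                self._current = element

    def handle_endtag(self, tag):
        if tag in SKIPPED and self._skip:
            self._skip -= 1
        elif self._current is not None and tag == self._current["tag"]:
            self._current = None

    def handle_data(self, data):
        if not self._skip and self._current is not None:
            self._current["text"] += data


def encode_state(html: str) -> str:
    """Return one line per interactive element: tag, kept attributes, normalized text."""
    parser = StateEncoder()
    parser.feed(html)
    parser.close()
    lines = []
    for el in parser.elements:
        attrs = "".join(f' {k}="{v}"' for k, v in el["attrs"].items())
        text = " ".join(el["text"].split())
        lines.append(f"<{el['tag']}{attrs}>{text}")
    return "\n".join(lines)


page = """<div class="app-root"><script>var x = 1;</script>
<form id="checkout"><input type="text" name="code" placeholder="Discount code">
<button type="submit" class="btn btn-primary">  Apply </button></form></div>"""
print(encode_state(page))
```

Dropping styling classes, wrapper elements, and script content keeps the encoded state short and more stable across re-renders, which matters when that string is what retrieval is conditioned on. The trade-off is that any signal carried by the discarded markup is lost, so the kept tag and attribute sets would need tuning per site.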

The SGDR paper is currently available on arXiv, with code expected to follow. Instead of chasing a mythical, all-capable general AI, SGDR focuses on solving a very specific, persistent problem in web automation: adapting to state changes. This kind of grounded, incremental improvement is often more impactful than grand, abstract promises.


