Open Source · Built with Gemini + Playwright

I Built an Agentic QA Framework.
Here's Exactly How It Works.

If you want to implement an autonomous AI QA agent, one that discovers, plans, explores, executes, and self-heals, this is the post for you.

5-Phase Autonomous Pipeline
WebMCP · Gemini · Playwright
Self-Healing · Zero Maintenance

What I Built

🤖

Autonomous Agents

Discovery · Planner · Explorer · Executor · Feedback: 5 specialized AI agents, each with a single job.

⚡

WebMCP Polyfill

Injects semantic tools into the browser. No more fragile CSS selectors. 95% execution cost reduction.

🧬

Closed-Loop Learning

Failures train the system. Every broken run updates the RAG Knowledge Bank. It gets smarter every time.

One command. It explores your app, writes the test, runs it, and fixes itself when it fails.
The Goal

You write a goal. The AI writes the tests.

No page objects. No locators. No scripts.

Your Goal (Natural Language):

“Log in, sort products by price low-to-high, add the cheapest item to cart, and checkout.”

The Command:

bash
python orchestrator.py \
  --project projects/my_app \
  --base_url "https://www.saucedemo.com/" \
  --goal "Log in, sort by price low-to-high, add cheapest item, checkout" \
  --headed

What happens next (automatically):

🔍 Discovery → 📋 Planning → 🤖 Exploration → ⚡ Execution → 📊 HTML Report → 🔧 Self-Healing
Under the Hood

5 Phases. 5 Agents. Zero Manual Work.

Phase 1: Discovery

  • 🕷️ Crawls the DOM and maps every page structure
  • 🗺️ Builds a semantic sitemap (login forms, grids, navbars)
  • 🧠 Runs only in --deep mode for regression suites
Run: --deep
Secret Weapon

WebMCP: No Selectors. Ever.

The biggest win in this framework. Instead of brittle CSS selectors, the agent uses semantic tool calls injected directly into the browser context.

100% reliable: no more dropdown failures
95% cost reduction: zero vision API calls during execution
Polyfill works on any browser: no Chrome 145+ required
bash
# Selector-less execution with WebMCP
python execute_with_webmcp.py \
  --project projects/my_ecommerce \
  --headed

❌ Old Way

page.click('[data-test="sort-container"]')

Breaks when the data-test attribute changes. Constant maintenance.

✅ WebMCP Way

call_tool('sort_products_by_price', {'direction': 'low_to_high'})

Semantic. Resilient. Self-describing. Never breaks.
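
To make the contrast concrete, here is a minimal Python sketch of how a semantic tool call could be dispatched by name. The `TOOLS` registry, the `register_tool` decorator, and the handler body are assumptions for illustration; only the `call_tool` name and the `sort_products_by_price` tool come from the example above, and the framework's actual implementation may look different.

```python
from typing import Any, Callable, Dict

# Hypothetical in-process registry: semantic tool name -> handler function.
TOOLS: Dict[str, Callable[..., Any]] = {}


def register_tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that registers a handler under a semantic tool name."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = func
        return func
    return decorator


def call_tool(name: str, args: Dict[str, Any]) -> Any:
    """Look up a registered tool by name and invoke it with keyword arguments."""
    if name not in TOOLS:
        raise KeyError(f"No semantic tool registered as {name!r}")
    return TOOLS[name](**args)


@register_tool("sort_products_by_price")
def sort_products_by_price(direction: str) -> list:
    # Stand-in price list; a real handler would act on the page instead.
    prices = [29.99, 7.99, 15.99]
    if direction not in ("low_to_high", "high_to_low"):
        raise ValueError(f"Unsupported direction: {direction!r}")
    return sorted(prices, reverse=(direction == "high_to_low"))


print(call_tool("sort_products_by_price", {"direction": "low_to_high"}))
```

The caller only names an intent and its parameters, so a markup change affects the handler in one place rather than every test step that sorts products.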

Power Mode

Banking? Healthcare? SaaS?
Use --deep mode.

Full semantic discovery + regression suite generation + security audit. One flag.

bash
python orchestrator.py \
  --project projects/parabank \
  --base_url "https://parabank.parasoft.com/parabank/" \
  --goal "Register new user and transfer funds" \
  --deep --security
🏦
Banking
KYC Β· Transfers Β· Compliance
πŸ›οΈ
E-Commerce
Cart Β· Checkout Β· Inventory
πŸ’»
SaaS
CRUD Β· Auth Β· Billing
πŸ“ž
Telecom
Plans Β· Bills Β· Support
The Brain

orchestrator.py: The Hub

The orchestrator doesn't do any testing itself. It coordinates every agent, tracks phase checkpoints, retries failures, and triggers the Feedback loop.

--project

Path to your project folder. All outputs (workflow.json, trace, report) land here. Required.

--goal

Natural language objective. Passed directly to PlannerAgent for scenario decomposition.

--base_url

Root URL of the app under test. Used by DiscoveryAgent and PlannerAgent for navigation grounding.

--deep

Enables a full BFS semantic crawl (depth 3, up to 50 pages). Activates DiscoveryAgent before planning.

--force

Clears .checkpoint.json so every phase re-runs from scratch. Use when the app has changed significantly.

--headed

Launches Chromium in visible mode. Great for debugging or recording demos of the agent working.

--security

Runs SecurityAuditor after execution. Scans for XSS, SQL injection patterns, and session exposure.

--phase

Run a single phase only: planning | exploration | execution | security. Perfect for iterating fast.
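
The flags above could be declared with `argparse` roughly as follows. This is a sketch under assumptions: the help strings are paraphrased from the descriptions above, only `--project` is marked required because that is the only flag described that way, and the real orchestrator.py may define its parser differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser mirroring the flags documented above."""
    parser = argparse.ArgumentParser(prog="orchestrator.py")
    parser.add_argument("--project", required=True,
                        help="Project folder that receives all outputs")
    parser.add_argument("--goal", help="Natural language objective")
    parser.add_argument("--base_url", help="Root URL of the app under test")
    parser.add_argument("--deep", action="store_true",
                        help="Run semantic discovery before planning")
    parser.add_argument("--force", action="store_true",
                        help="Clear the checkpoint and re-run every phase")
    parser.add_argument("--headed", action="store_true",
                        help="Launch the browser in visible mode")
    parser.add_argument("--security", action="store_true",
                        help="Run the security audit after execution")
    parser.add_argument("--phase",
                        choices=["planning", "exploration", "execution", "security"],
                        help="Run a single phase only")
    return parser


# Parse an explicit argument list so the example runs without a shell.
args = build_parser().parse_args([
    "--project", "projects/my_app",
    "--base_url", "https://www.saucedemo.com/",
    "--goal", "Log in and checkout",
    "--headed",
])
print(args.project, args.headed, args.deep, args.phase)
```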

Checkpoint System (never re-run what already passed):

discovery ✓ · planning ✓ · exploration ✓ · execution ✓

Each phase writes to .checkpoint.json. Re-running skips completed phases automatically.
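
A minimal sketch of the skip-completed-phases idea, assuming `.checkpoint.json` holds a flat mapping of phase name to `true`. The file layout, the `run_pipeline` function, and the `runners` mapping are hypothetical; the source only states that each phase writes to `.checkpoint.json` and that `--force` clears it.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List

PHASES = ["discovery", "planning", "exploration", "execution"]


def load_checkpoint(path: Path) -> Dict[str, bool]:
    """Return the saved phase map, or an empty one if no checkpoint exists."""
    if path.exists():
        return json.loads(path.read_text())
    return {}


def run_pipeline(project_dir: Path,
                 runners: Dict[str, Callable[[], None]],
                 force: bool = False) -> List[str]:
    """Run phases in order, skipping any already marked as done."""
    checkpoint_path = project_dir / ".checkpoint.json"
    if force and checkpoint_path.exists():
        checkpoint_path.unlink()  # assumed behaviour of --force
    done = load_checkpoint(checkpoint_path)
    executed = []
    for phase in PHASES:
        if phase not in runners or done.get(phase):
            continue
        runners[phase]()  # an exception here leaves the phase unmarked
        done[phase] = True
        # Persist after every phase so a crash keeps earlier progress.
        checkpoint_path.write_text(json.dumps(done, indent=2))
        executed.append(phase)
    return executed
```

Writing the file after each phase, rather than once at the end, is what lets a failed run resume from the phase that broke.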

The Team

6 Specialized AI Agents

Each agent has one job. Five core agents cover the five phases; DeepExplorerAgent is the --deep variant of ExplorerAgent.

DiscoveryAgent

core/agents/discovery.py
Phase 1 · --deep only
Outputs

sitemap.json

Parallel BFS crawler (3 concurrent tab workers). Visits up to 50 pages, classifies each by DOM patterns (login, product_list, checkout, etc.), extracts all interactive elements, forms, and business rules.

Key Innovation

LLM semantically filters out ghost/hidden elements so the planner never plans against phantom DOM nodes.
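
The bounded crawl described above can be sketched as a breadth-first search with a depth limit and a page budget. This version is sequential, not the 3-worker parallel crawler the agent uses, and `get_links` is a hypothetical callable standing in for real link extraction from a page.

```python
from collections import deque
from typing import Callable, Iterable, List


def bfs_crawl(start_url: str,
              get_links: Callable[[str], Iterable[str]],
              max_depth: int = 3,
              max_pages: int = 50) -> List[str]:
    """Visit pages breadth-first, bounded by depth and total page count."""
    visited: List[str] = []
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)  # "visit" the page
        if depth == max_depth:
            continue  # do not expand links beyond the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Toy link graph used in place of a live site.
site = {
    "/": ["/login", "/products"],
    "/login": ["/"],
    "/products": ["/item/1", "/item/2"],
    "/item/1": ["/cart"],
    "/cart": ["/checkout"],
}
print(bfs_crawl("/", lambda url: site.get(url, [])))
```

With the default depth of 3, `/checkout` is never visited because it sits four links away from the start page.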

PlannerAgent

core/agents/planner.py
Phase 2 · Always runs
Outputs

workflow.json

Takes your natural language goal + sitemap, detects the application domain (ecommerce, finance, saas, ...), then prompts Gemini with strict grounding rules to decompose the goal into a structured keyword-driven workflow. Falls back gracefully if the LLM response fails to parse.

Key Innovation

Anti-hallucination rules prevent the LLM from inventing elements not in the discovered sitemap.
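
One way to enforce that kind of grounding after the LLM responds is to check every planned step against the sitemap. The sketch below assumes a hypothetical schema (a sitemap mapping page names to element names, and steps with an optional `target` key); the real `sitemap.json` and `workflow.json` formats are not shown in this post.

```python
from typing import Dict, List, Optional, Set, Tuple

Step = Dict[str, Optional[str]]


def known_elements(sitemap: Dict[str, List[str]]) -> Set[str]:
    """Flatten the sitemap into the set of element names that really exist."""
    return {element for elements in sitemap.values() for element in elements}


def split_grounded_steps(steps: List[Step],
                         sitemap: Dict[str, List[str]]) -> Tuple[List[Step], List[Step]]:
    """Separate steps that reference discovered elements from invented ones."""
    allowed = known_elements(sitemap)
    grounded: List[Step] = []
    rejected: List[Step] = []
    for step in steps:
        target = step.get("target")
        # Steps without a target (e.g. plain navigation) pass through unchanged.
        if target is None or target in allowed:
            grounded.append(step)
        else:
            rejected.append(step)
    return grounded, rejected


sitemap = {"login": ["username", "password", "login-button"],
           "inventory": ["sort-dropdown", "add-to-cart"]}
plan = [{"action": "fill", "target": "username"},
        {"action": "click", "target": "promo-banner"},  # not in the sitemap
        {"action": "navigate", "target": None}]
ok, invented = split_grounded_steps(plan, sitemap)
print(len(ok), [s["target"] for s in invented])
```

Rejected steps could be dropped or sent back to the model with the list of valid element names, depending on how strict the planner should be.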

ExplorerAgent

core/agents/explorer.py
Phase 3 · Core Agent
Outputs

trace.json + screenshots

Executes each workflow step live in Playwright. At each step, it reads the DOM, calls Gemini to decide the next action, executes it, and handles multi-tab switching, lazy loading, scrolling (keyboard + mouse + JS), and autonomous registration flows.

Key Innovation

SmartLocator tries 5 fallback strategies before failing: data-test, aria-label, text content, CSS, and XPath.
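
A fallback chain like this can be sketched as an ordered list of candidate selectors plus a function that returns the first one that matches. The function names and the `exists` predicate are hypothetical, not SmartLocator's actual interface; the selector strings follow Playwright's selector syntax.

```python
from typing import Callable, List, Optional


def candidate_selectors(name: str, css: str, xpath: str) -> List[str]:
    """Build selectors in priority order: data-test, aria-label, text, CSS, XPath."""
    return [
        f'[data-test="{name}"]',
        f'[aria-label="{name}"]',
        f'text="{name}"',
        css,
        f"xpath={xpath}",
    ]


def first_match(selectors: List[str],
                exists: Callable[[str], bool]) -> Optional[str]:
    """Return the first selector the page reports as present, else None."""
    for selector in selectors:
        if exists(selector):
            return selector
    return None


selectors = candidate_selectors("Add to cart", "button.btn_inventory", "//button[1]")
# Simulated page: only the text and CSS strategies would match.
present = {'text="Add to cart"', "button.btn_inventory"}
print(first_match(selectors, lambda s: s in present))
```

With a live Playwright page, `exists` could be `lambda s: page.locator(s).count() > 0`.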

DeepExplorerAgent

core/agents/deep_explorer.py
Phase 3 · --deep mode
Outputs

extended trace.json + regression scenarios

Extended version of ExplorerAgent that generates regression scenarios on the fly while navigating. Explores branches autonomously, identifies untested paths, and appends new scenarios to the workflow for comprehensive coverage.

Key Innovation

Uses graph memory to avoid revisiting pages and prioritize unexplored branches.
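
As a rough illustration of graph memory, the sketch below records which pages were visited and which links were seen, then picks the next unexplored page. The `GraphMemory` class and its heuristic (prefer the page linked from the most visited pages) are assumptions; the post does not describe how DeepExplorerAgent actually ranks branches.

```python
from typing import Dict, Iterable, List, Optional, Set


class GraphMemory:
    """Tracks visited pages and the links discovered on each of them."""

    def __init__(self) -> None:
        self.edges: Dict[str, Set[str]] = {}
        self.visited: Set[str] = set()

    def record(self, page: str, links: Iterable[str]) -> None:
        """Mark a page as visited and remember its outgoing links."""
        self.visited.add(page)
        self.edges.setdefault(page, set()).update(links)

    def unexplored(self) -> List[str]:
        """Pages that were linked to but never visited, in sorted order."""
        targets: Set[str] = set().union(*self.edges.values())
        return sorted(targets - self.visited)

    def next_page(self) -> Optional[str]:
        """Pick the unexplored page with the most inbound links (assumed heuristic)."""
        candidates = self.unexplored()
        if not candidates:
            return None
        return max(candidates,
                   key=lambda page: sum(page in links for links in self.edges.values()))


memory = GraphMemory()
memory.record("/", ["/a", "/b", "/c"])
memory.record("/a", ["/c", "/"])
print(memory.unexplored(), memory.next_page())
```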

ExecutorAgent

core/agents/executor.py
Phase 4 · Always runs
Outputs

execution.json + HTML report

Replays the trace as a deterministic Playwright test. Supports WebMCP tool interception: if a step matches a registered semantic tool (e.g. sort_products_by_price), it executes the tool directly instead of hunting DOM selectors. Auto-retries on failure.

Key Innovation

WebMCP coverage stat: tracks what % of steps used semantic tools vs. traditional selectors.
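
The coverage figure is a simple ratio. Here is a sketch assuming each executed step records how it ran in a `via` field; that field and its values are hypothetical, since the layout of `execution.json` is not shown here.

```python
from typing import Dict, List


def webmcp_coverage(steps: List[Dict[str, str]]) -> float:
    """Percentage of steps that ran through a semantic tool, to one decimal."""
    if not steps:
        return 0.0
    tool_steps = sum(1 for step in steps if step.get("via") == "webmcp_tool")
    return round(100.0 * tool_steps / len(steps), 1)


run = [
    {"name": "login", "via": "webmcp_tool"},
    {"name": "sort_products_by_price", "via": "webmcp_tool"},
    {"name": "add_to_cart", "via": "selector"},
    {"name": "checkout", "via": "webmcp_tool"},
]
print(webmcp_coverage(run))
```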

FeedbackAgent

core/agents/feedback_agent.py
Post-run · On failure
Outputs

Updated knowledge/sites/{domain}/locators.json + rules.md

Post-mortem analyst. When execution fails, it parses qa_session_logs.json, sends the failure trace to Gemini for root cause analysis, extracts bad locators + learned rules, penalizes unstable selectors in the Knowledge Bank, and saves positive/negative rules to rules.md.

Key Innovation

Filters out generic programming errors; only domain-specific UI lessons are persisted.
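
Two of those ideas, penalizing unstable locators and discarding generic programming errors, can be sketched in a few lines. The score format, the halving factor, and the list of exception names are all assumptions for illustration; the actual structure of `locators.json` and the agent's filtering logic are not described in this post.

```python
import re
from typing import Dict, List

# Hypothetical filter: messages naming these Python errors are treated as generic.
GENERIC_ERROR = re.compile(
    r"\b(TypeError|KeyError|AttributeError|ImportError|SyntaxError)\b")


def is_domain_lesson(message: str) -> bool:
    """True when a failure message looks like a UI lesson, not a code bug."""
    return GENERIC_ERROR.search(message) is None


def penalize(scores: Dict[str, float],
             bad_locators: List[str],
             factor: float = 0.5) -> Dict[str, float]:
    """Return a copy of the scores with each failing locator's score reduced."""
    updated = dict(scores)
    for locator in bad_locators:
        # Unknown locators start from a neutral score of 1.0 (assumed).
        updated[locator] = round(updated.get(locator, 1.0) * factor, 3)
    return updated


scores = {"#sort": 1.0, "[data-test='cart']": 0.8}
print(penalize(scores, ["#sort"]))
print(is_domain_lesson("TypeError: 'NoneType' object is not subscriptable"))
```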

Visual Blueprint

Architecture Diagram

How the orchestrator, agents, browser, and LLM all connect.

[Figure: Agentic QA architecture diagram. Legend: Discovery, Planning, Exploration, Execution, Feedback, Knowledge.]
The Stack

Tools & Libraries

Every tool chosen for a specific reason. No bloat.

Playwright (Python)

Browser Automation

Drives Chromium headlessly (or headed). Handles clicks, fills, multi-tab context switching, screenshot capture after every step, and network idle detection for slow production sites.

Google Gemini

LLM Engine

Powers all AI reasoning: page summarization, semantic element classification, goal decomposition, and failure post-mortem analysis. Uses the google-genai SDK with structured JSON response mode enabled.

WebMCP Polyfill

Protocol Bridge

Injects a navigator.modelContext polyfill into the browser page context. Registers semantic tool functions that the ExecutorAgent calls directly, eliminating the need for CSS/XPath selectors on those steps.

python-dotenv

Config

Loads GOOGLE_API_KEY and other secrets from .env at startup. Keeps credentials out of source code. All agents call load_dotenv() on init.

Pillow + OpenCV

Vision

Used by the visual locator for screenshot analysis. Pillow handles image loading and resizing; OpenCV applies template matching and edge detection for element identification when DOM selectors fail.

Faker

Test Data

Generates realistic random test data at runtime: emails, names, phone numbers, addresses. Replaces placeholders like {random_email} in workflow steps. No hardcoded test users needed.
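
Placeholder substitution of this kind could look like the sketch below. Only `{random_email}` appears in the post; the other placeholder names, the regex, and both function names are assumptions. Faker is imported lazily inside `faker_generators` so the substitution logic can be exercised without the library installed.

```python
import re
from typing import Callable, Dict

# Matches tokens such as {random_email}; the naming pattern is assumed.
PLACEHOLDER = re.compile(r"\{(random_\w+)\}")


def fill_placeholders(text: str,
                      generators: Dict[str, Callable[[], str]]) -> str:
    """Replace each known placeholder with a freshly generated value."""
    def replace(match: "re.Match[str]") -> str:
        key = match.group(1)
        # Leave unknown placeholders untouched rather than guessing.
        return generators[key]() if key in generators else match.group(0)
    return PLACEHOLDER.sub(replace, text)


def faker_generators() -> Dict[str, Callable[[], str]]:
    """Map placeholder names to Faker providers (requires `pip install Faker`)."""
    from faker import Faker
    fake = Faker()
    return {
        "random_email": fake.email,
        "random_name": fake.name,
        "random_phone": fake.phone_number,
        "random_address": fake.address,
    }


# Stub generator keeps this example deterministic.
stub = {"random_email": lambda: "qa.user@example.test"}
print(fill_placeholders("Register with {random_email} as {random_name}", stub))
```

In a real run, `fill_placeholders(step_text, faker_generators())` would produce a different value on every call.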

PyYAML

Knowledge Bank

Reads and writes the domain knowledge files (banking.yaml, ecommerce.yaml). Stores stable locator patterns, business rules, and compliance checks that the PlannerAgent injects into prompts.

termcolor

DX

Color-codes terminal output by phase: cyan for discovery, green for success, red for failures, yellow for retries. Makes it instantly clear which agent is running and whether it passed or failed.

🧠

Start Automating in 1 Command.

Clone the framework, set your goal, and watch the agent explore, plan, test, and self-heal, completely autonomously.

About the Author

Vishvas Dhengula, Lead SDET

Vishvas is a highly accomplished Software Development Engineer in Test (SDET) with 15+ years of experience architecting enterprise test automation frameworks for Fortune 500 companies across the United States and India. His expertise spans a wide range of industry-leading automation tools, including UFT, Selenium, Cypress, Protractor, and Playwright.