Open Source · Built with Gemini + Playwright

I Built an Agentic QA Framework.
Here's Exactly How It Works.

If you're looking to implement an Autonomous AI QA Agent — one that discovers, plans, explores, executes, and self-heals — this is the post for you.

5-Phase Autonomous Pipeline

WebMCP · Gemini · Playwright

Self-Healing · Zero Maintenance

What I Built → inner-event

🤖

Autonomous Agents

Discovery · Planner · Explorer · Executor · Feedback — 5 specialized AI agents, each with a single job.

⚡

WebMCP Polyfill

Injects semantic tools into the browser. No more fragile CSS selectors. 95% lower execution cost, because execution makes no vision API calls.

🧬

Closed-Loop Learning

Failures train the system. Every broken run updates the RAG Knowledge Bank. It gets smarter every time.

One command. It explores your app, writes the test, runs it, and fixes itself when it fails.

The Goal

You write a goal. The AI writes the tests.

No page objects. No locators. No scripts.

Your Goal (Natural Language):

“Log in, sort products by price low-to-high, add the cheapest item to cart, and checkout.”

The Command:

bash

python orchestrator.py \
  --project projects/my_app \
  --base_url "https://www.saucedemo.com/" \
  --goal "Log in, sort by price low-to-high, add cheapest item, checkout" \
  --headed

What happens next (automatically):

🔍 Discovery → 📋 Planning → 🤖 Exploration → ⚡ Execution → 📊 HTML Report → 🔧 Self-Healing

Under the Hood

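5 Phases. 5 Agents. Zero Manual Work.

Semantic Discovery · AI Goal Planning · Dynamic Exploration · Self-Healing Execution · Closed-Loop Learning

Semantic Discovery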

🕷️ Crawls the DOM and maps every page structure
🗺️ Builds a semantic sitemap (login forms, grids, navbars)
🧠 Runs only in --deep mode for regression suites

Run: --deep

Semantic Discovery Agent mapping web app
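
The --deep flag is described further down as a breadth-first crawl capped at depth 3 and 50 pages. Here is a minimal sketch of that traversal, assuming a hypothetical get_links callback in place of a real browser and an in-memory SITE map in place of a real app. It illustrates the technique only and is not the framework's DiscoveryAgent.

```python
from collections import deque
from typing import Callable, Dict, Iterable


def bfs_crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 3,
    max_pages: int = 50,
) -> Dict[str, int]:
    """Breadth-first crawl returning {url: depth} for every page reached."""
    visited: Dict[str, int] = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        depth = visited[url]
        if depth >= max_depth:
            continue  # pages at the depth limit are recorded but not expanded
        for link in get_links(url):
            if link in visited:
                continue
            if len(visited) >= max_pages:
                return visited  # page budget exhausted
            visited[link] = depth + 1
            queue.append(link)
    return visited


# Hypothetical site graph (assumption): each key links to the listed pages.
SITE = {
    "/": ["/login", "/inventory"],
    "/login": ["/inventory"],
    "/inventory": ["/item/1", "/item/2", "/cart"],
    "/cart": ["/checkout"],
    "/checkout": ["/done"],
}


def site_links(url: str) -> Iterable[str]:
    return SITE.get(url, [])


sitemap = bfs_crawl("/", site_links)
print(sitemap)
```

With the default limits, /checkout is reached at depth 3 and is not expanded, so /done never enters the sitemap. A real crawler would swap site_links for a function that loads the page and collects its links.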

Secret Weapon

WebMCP — No Selectors. Ever.

The biggest win in this framework. Instead of brittle CSS selectors, the agent uses semantic tool calls injected directly into the browser context.

100% reliable — no more dropdown failures

95% cost reduction — zero vision API calls during execution

Polyfill works on any browser — no Chrome 145+ required

bash

# Selector-less execution with WebMCP
python execute_with_webmcp.py \
  --project projects/my_ecommerce \
  --headed

❌ Old Way

page.click('[data-test="sort-container"]')

Breaks when the data-test attribute or the surrounding markup changes. Constant maintenance.

✅ WebMCP Way

call_tool('sort_products_by_price', {'direction': 'low_to_high'})

Semantic. Resilient. Self-describing. Never breaks.
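
To make the idea concrete, here is a rough sketch of a semantic tool registry in plain Python. The ToolRegistry class, its method names, and the PRODUCTS list are assumptions made for illustration. The real polyfill runs inside the browser page, and its API may look quite different.

```python
from typing import Any, Callable, Dict, List

ToolHandler = Callable[[Dict[str, Any]], Any]


class ToolRegistry:
    """Maps semantic tool names to handlers so callers never touch selectors."""

    def __init__(self) -> None:
        self._tools: Dict[str, Dict[str, Any]] = {}

    def register(self, name: str, description: str, handler: ToolHandler) -> None:
        self._tools[name] = {"description": description, "handler": handler}

    def describe(self) -> Dict[str, str]:
        """Self-describing catalogue that an LLM could be shown."""
        return {name: tool["description"] for name, tool in self._tools.items()}

    def call_tool(self, name: str, args: Dict[str, Any]) -> Any:
        if name not in self._tools:
            raise KeyError(f"Unknown tool: {name}")
        return self._tools[name]["handler"](args)


# Stand-in for page state (assumption); a real handler would drive the browser.
PRODUCTS: List[Dict[str, Any]] = [
    {"name": "Backpack", "price": 29.99},
    {"name": "Onesie", "price": 7.99},
    {"name": "Bike Light", "price": 9.99},
]


def sort_products_by_price(args: Dict[str, Any]) -> List[Dict[str, Any]]:
    direction = args.get("direction", "low_to_high")
    if direction not in ("low_to_high", "high_to_low"):
        raise ValueError(f"Unsupported direction: {direction}")
    return sorted(
        PRODUCTS,
        key=lambda product: product["price"],
        reverse=(direction == "high_to_low"),
    )


registry = ToolRegistry()
registry.register(
    "sort_products_by_price",
    "Sort the product grid by price.",
    sort_products_by_price,
)
cheapest = registry.call_tool("sort_products_by_price", {"direction": "low_to_high"})[0]
print(cheapest["name"])
```

The caller only knows the tool name and its arguments. If the page markup changes, only the handler behind the name needs updating.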

Power Mode

Banking? Healthcare? SaaS?
Use `--deep` mode.

Full semantic discovery + regression suite generation with --deep. Add --security for the security audit.

bash

python orchestrator.py \
  --project projects/parabank \
  --base_url "https://parabank.parasoft.com/parabank/" \
  --goal "Register new user and transfer funds" \
  --deep --security

🏦

Banking

KYC · Transfers · Compliance

🛍️

E-Commerce

Cart · Checkout · Inventory

💻

SaaS

CRUD · Auth · Billing

📞

Telecom

Plans · Bills · Support

The Brain

orchestrator.py — The Hub

The orchestrator doesn't do any testing itself. It coordinates every agent, tracks phase checkpoints, retries failures, and triggers the Feedback loop.
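
The coordination loop can be sketched roughly as follows. This is an assumption-heavy illustration, not the real orchestrator.py: the phase names, the run_pipeline signature, the retry count, and the checkpoint file layout (a JSON list of finished phases) are all made up for the example. Only the .checkpoint.json filename comes from the --force flag description below.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# Assumed phase order; the feedback step is modelled as the on_failure callback.
PHASES = ["discovery", "planning", "exploration", "execution"]


def run_pipeline(
    project_dir: str,
    agents: Dict[str, Callable[[], None]],
    on_failure: Callable[[str, Exception], None],
    max_retries: int = 2,
    force: bool = False,
) -> List[str]:
    """Run phases in order, skipping checkpointed ones and retrying failures."""
    checkpoint_file = Path(project_dir) / ".checkpoint.json"
    if force and checkpoint_file.exists():
        checkpoint_file.unlink()  # start every phase from scratch
    done = json.loads(checkpoint_file.read_text()) if checkpoint_file.exists() else []
    executed: List[str] = []
    for phase in PHASES:
        if phase in done:
            continue
        for attempt in range(1, max_retries + 2):
            try:
                agents[phase]()
                break
            except Exception as error:
                if attempt > max_retries:
                    on_failure(phase, error)  # hand the failure to feedback
                    return executed
        done.append(phase)
        executed.append(phase)
        checkpoint_file.write_text(json.dumps(done))
    return executed


# Demo with stub agents: execution fails once, then succeeds on retry.
calls = {"execution": 0}


def flaky_execution() -> None:
    calls["execution"] += 1
    if calls["execution"] == 1:
        raise RuntimeError("element not found")


agents = {
    "discovery": lambda: None,
    "planning": lambda: None,
    "exploration": lambda: None,
    "execution": flaky_execution,
}
failures: List[str] = []
with tempfile.TemporaryDirectory() as project:
    first_run = run_pipeline(project, agents, lambda p, e: failures.append(p))
    second_run = run_pipeline(project, agents, lambda p, e: failures.append(p))
print(first_run, second_run)
```

The second call does nothing because every phase is already checkpointed. Passing force=True would delete the checkpoint and re-run all phases.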

--project

Path to your project folder. All outputs (workflow.json, trace, report) land here. Required.

--goal

Natural language objective. Passed directly to PlannerAgent for scenario decomposition.

--base_url

Root URL of the app under test. Used by DiscoveryAgent and PlannerAgent for navigation grounding.

--deep

Enables full BFS semantic crawl (depth 3, 50 pages). Activates DiscoveryAgent before planning.

--force

Clears .checkpoint.json so every phase re-runs from scratch. Use when the app has changed significantly.

--headed

Launches Chromium in visible mode. Great for debugging or recording demos of the agent working.

--security

Runs SecurityAuditor after execution. Scans for XSS, SQL injection patterns, and session exposure.

--phase

FeedbackAgent

core/agents/feedback_agent.py

Post-run · On failure

Outputs

Updated knowledge/sites/{domain}/locators.json + rules.md

Post-mortem analyst. When execution fails, it parses qa_session_logs.json, sends the failure trace to Gemini for root cause analysis, extracts bad locators + learned rules, penalizes unstable selectors in the Knowledge Bank, and saves positive/negative rules to rules.md.

Key Innovation

Filters generic programming errors — only domain-specific UI lessons are persisted.
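
A simplified sketch of that filter-and-penalize step follows. The log entry fields (step, locator, error), the GENERIC_ERRORS list, the score scale, and the penalty size are all hypothetical. The real agent asks Gemini for root-cause analysis, which is left out here.

```python
from typing import Any, Dict, List

# Assumed markers of generic programming errors that carry no UI lesson.
GENERIC_ERRORS = ("TypeError", "KeyError", "AttributeError", "SyntaxError", "ImportError")


def is_domain_lesson(entry: Dict[str, Any]) -> bool:
    """True when the failure looks UI-related rather than a coding bug."""
    return not any(marker in entry["error"] for marker in GENERIC_ERRORS)


def apply_feedback(
    failures: List[Dict[str, Any]],
    locator_scores: Dict[str, float],
    penalty: float = 0.2,
) -> List[str]:
    """Penalize failing locators in place and return the rules worth saving."""
    rules: List[str] = []
    for entry in failures:
        if not is_domain_lesson(entry):
            continue  # generic errors are not persisted
        locator = entry.get("locator")
        if not locator:
            continue
        current = locator_scores.get(locator, 1.0)
        locator_scores[locator] = round(max(0.0, current - penalty), 2)
        rules.append(f"AVOID {locator} for step '{entry['step']}': {entry['error']}")
    return rules


# Hypothetical failure entries standing in for parsed session logs.
failures = [
    {"step": "sort products", "locator": "#sort", "error": "TimeoutError: locator not visible"},
    {"step": "load config", "locator": None, "error": "KeyError: 'base_url'"},
]
scores = {"#sort": 0.9}
rules = apply_feedback(failures, scores)
print(scores, rules)
```

Only the timeout on the sort control produces a rule and a lower score. The KeyError is dropped because it says nothing about the app's UI.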

Visual Blueprint

Architecture Diagram

How the orchestrator, agents, browser, and LLM all connect.

Diagram components: Discovery · Planning · Exploration · Execution · Feedback · Knowledge

The Stack

Tools & Libraries

Every tool chosen for a specific reason. No bloat.

Playwright (Python)

Browser Automation

Drives Chromium headlessly (or headed). Handles clicks, fills, multi-tab context switching, screenshot capture after every step, and network idle detection for slow production sites.

Google Gemini

LLM Engine

Powers all AI reasoning — page summarization, semantic element classification, goal decomposition, and failure post-mortem analysis. Uses google-genai SDK with structured JSON response mode enabled.
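
For reference, a JSON-mode request with the google-genai SDK might look roughly like this. The prompt wording, the plan shape ({"steps": [...]}), the parse_plan helper, and the model name are assumptions for illustration. The framework's actual prompts and schemas are not shown in this post.

```python
import json
from typing import Any, Dict


def parse_plan(raw_json: str) -> Dict[str, Any]:
    """Validate the model's JSON reply before any agent acts on it."""
    plan = json.loads(raw_json)
    if not isinstance(plan, dict) or not isinstance(plan.get("steps"), list):
        raise ValueError("Expected a JSON object with a 'steps' list")
    return plan


def decompose_goal(goal: str, model: str = "gemini-2.0-flash") -> Dict[str, Any]:
    """Ask Gemini for a JSON plan. Needs network access and an API key."""
    # Imported lazily so the parsing helper works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads the API key from the environment
    response = client.models.generate_content(
        model=model,  # assumed model name; use whichever model you have access to
        contents=f'Return JSON of the form {{"steps": ["..."]}} for this QA goal: {goal}',
        config=types.GenerateContentConfig(response_mime_type="application/json"),
    )
    return parse_plan(response.text)


# Offline check of the parsing path with a hand-written reply.
plan = parse_plan('{"steps": ["log in", "sort by price", "add cheapest item", "checkout"]}')
print(plan["steps"])
```

Validating the reply before use matters even in JSON mode, since a syntactically valid response can still have the wrong shape.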

WebMCP Polyfill

Protocol Bridge

Injects a navigator.modelContext polyfill into the browser page context. Registers semantic tool functions that the ExecutorAgent calls directly — eliminating the need for CSS/XPath selectors entirely.

python-dotenv

Config

Loads GOOGLE_API_KEY and other secrets from .env at startup. Keeps credentials out of source code. All agents call load_dotenv() on init.

Pillow + OpenCV

Vision

Used by the visual locator for screenshot analysis. Pillow handles image loading and resizing; OpenCV applies template matching and edge detection for element identification when DOM selectors fail.

Faker

Test Data

Generates realistic random test data at runtime — emails, names, phone numbers, addresses. Replaces placeholders like {random_email} in workflow steps. No hardcoded test users needed.
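
Placeholder substitution can be sketched like this. Only {random_email} appears in the post. The other placeholder names, the {random_<kind>} pattern, and both function names are assumptions. faker_generators needs the Faker package installed, while the demo below uses a fixed stub so the output is predictable.

```python
import re
from typing import Callable, Dict

PLACEHOLDER = re.compile(r"\{random_(\w+)\}")


def faker_generators() -> Dict[str, Callable[[], str]]:
    """Generators backed by Faker (imported lazily; requires the Faker package)."""
    from faker import Faker

    fake = Faker()
    return {
        "email": fake.email,
        "name": fake.name,
        "phone": fake.phone_number,
        "address": fake.address,
    }


def fill_placeholders(text: str, generators: Dict[str, Callable[[], str]]) -> str:
    """Replace each {random_<kind>} token with a freshly generated value."""

    def substitute(match: "re.Match[str]") -> str:
        kind = match.group(1)
        if kind not in generators:
            return match.group(0)  # leave unknown placeholders untouched
        return generators[kind]()

    return PLACEHOLDER.sub(substitute, text)


# Deterministic stub in place of Faker for the demo.
stub = {"email": lambda: "user@example.com"}
step = fill_placeholders("Fill the email field with {random_email} and {random_zip}", stub)
print(step)
```

In a real run you would pass faker_generators() instead of the stub, so each workflow step gets fresh data.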

PyYAML

Knowledge Bank

Reads and writes the domain knowledge files (banking.yaml, ecommerce.yaml). Stores stable locator patterns, business rules, and compliance checks that the PlannerAgent injects into prompts.

termcolor

Color-codes terminal output by phase — cyan for discovery, green for success, red for failures, yellow for retries. Makes it instantly clear which agent is running and whether it passed or failed.

🧠

Start Automating in 1 Command.

Clone the framework, set your goal, and watch the agent explore, plan, test, and self-heal — completely autonomously.


About the Author

Vishvas Dhengula — Lead SDET

Vishvas is a Software Development Engineer in Test (SDET) with 15+ years of experience architecting enterprise test automation frameworks for Fortune 500 companies across the United States and India. His expertise spans a wide range of industry-leading automation tools, including UFT, Selenium, Cypress, Protractor, and Playwright.
