Scenario 96 of 100
E2E Scenarios
Advanced
Verifying PDF Files Opened Inside the Web Browser
Key Takeaways & Cheat Sheet
- Configure browser profile options so that PDF files download directly instead of opening in the built-in viewer
- Parse the downloaded PDF with the Apache PDFBox Java library
- Extract the document text and run standard assertions against it
- Avoid fragile visual layout analysis of the rendered canvas
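The first takeaway can be sketched as a small helper that builds the Chrome profile preferences. This is a minimal sketch, not a definitive setup: the class name `PdfDownloadPrefs` and the download directory are hypothetical, and the preference keys shown are the ones commonly used with Chrome; other browsers use different keys.

```java
import java.util.HashMap;
import java.util.Map;

final class PdfDownloadPrefs {

    private PdfDownloadPrefs() {}

    /**
     * Builds Chrome profile preferences that save PDFs to disk instead of
     * opening them in the built-in viewer. The directory is an assumption
     * supplied by the caller.
     */
    static Map<String, Object> chromePrefs(String downloadDir) {
        Map<String, Object> prefs = new HashMap<>();
        // Where downloaded files are written
        prefs.put("download.default_directory", downloadDir);
        // Suppress the "Save as" dialog, which automation cannot easily drive
        prefs.put("download.prompt_for_download", false);
        // Download PDFs rather than rendering them in the viewer
        prefs.put("plugins.always_open_pdf_externally", true);
        return prefs;
    }
}
```

With Selenium 4, the resulting map would typically be passed to a `ChromeOptions` instance via `options.setExperimentalOption("prefs", prefs)` before the driver is created.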
Short Direct Answer
Web browsers render PDFs in a built-in viewer, typically on canvas elements or through an embedded plugin, so the content is not exposed as regular DOM text and is difficult to automate reliably. The best strategy is to configure the browser to download PDFs directly, then use a PDF-parsing library such as `Apache PDFBox` to extract the file's actual text content and assert against it.
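When the browser saves the PDF instead of rendering it, the test has to wait for the file to finish downloading before parsing it. The following is a hedged sketch of one way to do that with plain `java.nio`: the class name `DownloadWaiter` is hypothetical, it assumes the download directory is dedicated to the test, and it treats the `.crdownload` suffix (Chrome's in-progress download marker) as the sign of an incomplete file.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

final class DownloadWaiter {

    private DownloadWaiter() {}

    /**
     * Polls downloadDir until it contains a .pdf file and no partial
     * Chrome download (.crdownload), or until the timeout elapses.
     * If several PDFs are present, one of them is returned.
     */
    static Path waitForPdf(Path downloadDir, Duration timeout)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        do {
            Path pdf = null;
            boolean partial = false;
            try (DirectoryStream<Path> files = Files.newDirectoryStream(downloadDir)) {
                for (Path file : files) {
                    String name = file.getFileName().toString().toLowerCase();
                    if (name.endsWith(".crdownload")) {
                        partial = true;   // a download is still in progress
                    } else if (name.endsWith(".pdf")) {
                        pdf = file;       // candidate completed file
                    }
                }
            }
            if (pdf != null && !partial) {
                return pdf;
            }
            Thread.sleep(250);            // short pause between polls
        } while (deadline - System.nanoTime() > 0);
        throw new IllegalStateException(
                "No completed PDF in " + downloadDir + " within " + timeout);
    }
}
```

The returned `Path` can then be handed to the PDF parser, for example by opening it with `Files.newInputStream(path)`.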
Senior Warning (Red Flag)
Do not write image-matching scripts to verify PDF contents inside the browser. Pixel-based visual checks are highly sensitive to changes in resolution, zoom level, and font rendering, which makes the tests brittle.
STAR Deep Dive Explanation & Pro Tip
Parsing the file directly with PDFBox requires no rendering or display, so it is fast, deterministic, and behaves the same in local and headless CI runs.
SeleniumAutomation.java
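Selenium 4 + Java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.BufferedInputStream;
import java.io.InputStream;
import java.net.URL;

public class SeleniumAutomation {

    // Uses the PDFBox 2.x API; PDFBox 3.x replaces PDDocument.load with Loader.loadPDF.
    public void verifyPdfContent(String pdfUrl) throws Exception {
        // 1. Open a network stream to the PDF URL (no browser session or cookies involved)
        URL url = new URL(pdfUrl);

        // 2. Parse the PDF; try-with-resources closes the stream and document even on failure
        try (InputStream fileStream = new BufferedInputStream(url.openStream());
             PDDocument doc = PDDocument.load(fileStream)) {
            String pdfText = new PDFTextStripper().getText(doc);

            // 3. Verify content with explicit checks: the Java `assert` keyword
            //    is skipped unless the JVM runs with -ea
            System.out.println("Total Pages: " + doc.getNumberOfPages());
            if (!pdfText.contains("Tax Invoice")) {
                throw new AssertionError("Invoice verification failed!");
            }
            if (!pdfText.contains("Total Due: $1,250.00")) {
                throw new AssertionError("Expected total not found in PDF text");
            }
        }
    }
}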
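Text extracted from a PDF often contains line breaks and irregular spacing where the page layout wrapped a phrase, so exact `contains` checks like the ones above can fail on content that is visibly present. As a hedged sketch, the checks could normalize whitespace first; the class `PdfTextAssertions` and its method names are hypothetical helpers, not part of PDFBox or Selenium.

```java
final class PdfTextAssertions {

    private PdfTextAssertions() {}

    /**
     * Collapses every run of whitespace (spaces, tabs, line breaks,
     * non-breaking spaces) into a single space and trims the ends.
     */
    static String normalize(String text) {
        return text.replaceAll("[\\s\\u00A0]+", " ").trim();
    }

    /** Throws AssertionError if the normalized text lacks the expected phrase. */
    static void assertContains(String pdfText, String expected) {
        if (!normalize(pdfText).contains(normalize(expected))) {
            throw new AssertionError("PDF text does not contain: " + expected);
        }
    }
}
```

A call such as `PdfTextAssertions.assertContains(pdfText, "Total Due: $1,250.00")` would then pass even if the extracted text split the label and the amount across two lines.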