Evaluation & Testing
Interview Prep Portal
Master Large Language Models (LLMs), RAG pipelines, vector semantic search, embedding geometries, prompt engineering methodologies, and autonomous tool-calling AI agents.
Evaluation & Testing | Intermediate | Q1
AI Agent Evaluation
Evaluation & Testing | Beginner | Q2
What is evaluation-driven development for AI applications?
Evaluation & Testing | Intermediate | Q3
How do you evaluate LLM outputs? What metrics do you use?
Evaluation & Testing | Intermediate | Q4
Explain BLEU, ROUGE, and BERTScore. When would you use each?
Evaluation & Testing | Advanced | Q5
What is G-Eval, and how does it use LLMs for evaluation?
Evaluation & Testing | Advanced | Q6
What is LLM-as-a-judge evaluation, and what are its limitations?
Evaluation & Testing | Intermediate | Q7
How do you conduct human evaluation for AI systems?
Evaluation & Testing | Advanced | Q8
What is red teaming, and how do you red team an LLM application?
Evaluation & Testing | Advanced | Q9
How do you detect and measure hallucinations in LLM outputs?
Evaluation & Testing | Intermediate | Q10
What is adversarial testing for AI systems?
Evaluation & Testing | Intermediate | Q11
How do you build a regression test suite for AI applications?
Evaluation & Testing | Beginner | Q12
What are benchmark suites (MMLU, HumanEval, GSM8K), and how do you interpret them?
Evaluation & Testing | Advanced | Q13
How do you evaluate a RAG system end-to-end?
Evaluation & Testing | Advanced | Q14
How do you evaluate the quality of AI agents?
Evaluation & Testing | Intermediate | Q15
What is the difference between offline and online evaluation for AI systems?
Evaluation & Testing | Advanced | Q16
How do you measure factual consistency in LLM outputs?
Evaluation & Testing | Intermediate | Q17
How do you evaluate multi-turn conversation quality?
Evaluation & Testing | Beginner | Q18
What is the role of golden datasets in AI evaluation?
Evaluation & Testing | Advanced | Q19
How do you implement continuous evaluation for production AI systems?
Evaluation & Testing | Advanced | Q20
How do you evaluate bias in AI model outputs?
Evaluation & Testing | Advanced | Q21
How do you compare two models or prompts in a statistically rigorous way?
Evaluation & Testing | Intermediate | Q22
How do you evaluate the robustness of an LLM application across input variations?
Evaluation & Testing | Beginner | Q23
What are the key differences between evaluating traditional ML vs LLM applications?
Evaluation & Testing | Intermediate | Q24
How do you set up an evaluation framework from scratch for a new LLM application?
Evaluation & Testing | Advanced | Q25
Your model passes one fairness metric but fails another. How do you handle conflicting audit results?
Evaluation & Testing | Advanced | Q26
Your model was fair at deployment, but became biased 6 months later. How do you monitor continuously?
Evaluation & Testing | Intermediate | Q27
An external auditor cannot reproduce your model's results. How do you ensure audit reproducibility?
Evaluation & Testing | Advanced | Q28
How do you structure red teaming for an LLM chatbot before launch?
Evaluation & Testing | Advanced | Q29