Testing & Evaluation
Overview
`@prompd/test` is the testing framework for `.prmd` prompt files. It uses colocated `.test.prmd` sidecar files to define test cases with assertions against compiled and executed prompts.
- Colocated discovery: `summarize.prmd` automatically discovers `summarize.test.prmd` in the same directory
- Three evaluator types, ordered by cost: NLP (local/free), Script (custom logic), Prmd (LLM-based)
- Fail-fast execution: evaluators run in cost order; if a cheap assertion fails, expensive evaluators are skipped
- CI-friendly: the `--no-llm` flag runs only compilation and NLP assertions with zero API costs
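Colocated discovery means each test suite sits next to the prompt it exercises. A minimal layout sketch, using the filenames from the bullet above:

```
prompts/
  summarize.prmd        # the prompt under test
  summarize.test.prmd   # auto-discovered sidecar test suite
```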
.test.prmd File Format
Test files use YAML frontmatter to define a test suite with one or more test cases. The Markdown content block is optional and serves as the default evaluator prompt for `prmd` assertions.
Frontmatter Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | No | Test suite name. Defaults to the filename. |
| `description` | string | No | Human-readable description of the test suite. |
| `target` | string | No | Relative path to the source `.prmd`. Auto-discovered from the filename if omitted. |
| `max_tokens` | number | No | Default max tokens for LLM execution across all test cases. |
| `tests` | array | Yes | Array of test case definitions. |
Test Case Fields
Each entry in the `tests` array supports:
| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | No | Test case name. Defaults to `test_N`. |
| `params` | object | No | Parameters passed to the target `.prmd` for compilation. |
| `assert` | array | No | Array of assertion definitions. |
| `expect_error` | boolean | No | If `true`, the test passes when compilation fails. |
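For instance, a negative test built on `expect_error` (test name and the omitted parameter are hypothetical) could look like:

```yaml
tests:
  - name: "fails without required params"
    params: {}           # deliberately omit a required parameter
    expect_error: true   # the test passes only if compilation fails
```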
Evaluator Types
Assertions use one of three evaluator types. They execute in cost order: NLP first, then Script, then Prmd. If an assertion fails, the remaining evaluators in the chain are skipped.
NLP Evaluator
Local, fast, free, and deterministic. Runs string and token checks against the LLM response without any external calls.

```yaml
- evaluator: nlp
  check: contains
  value: "expected text"
```

Check Types
| Check | Value Type | Description |
|---|---|---|
| `contains` | string or string[] | Response contains all values (case-insensitive). |
| `not_contains` | string or string[] | Response contains none of the values. |
| `matches` | string | Response matches the given regex pattern. |
| `max_tokens` | number | Estimated token count is at most this value. |
| `min_tokens` | number | Estimated token count is at least this value. |
| `starts_with` | string | Response starts with this value (case-insensitive). |
| `ends_with` | string | Response ends with this value (case-insensitive). |
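Several NLP checks can be combined in one `assert` array. A hypothetical example mixing the check types above (all values illustrative):

```yaml
assert:
  - evaluator: nlp
    check: contains
    value: ["summary", "key points"]   # every value must appear (case-insensitive)
  - evaluator: nlp
    check: matches
    value: "^#+ "                      # response begins with a Markdown heading
  - evaluator: nlp
    check: max_tokens
    value: 200                         # keep the response short
```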
Script Evaluator
Runs a custom script for validation logic that goes beyond string matching.

```yaml
- evaluator: script
  run: ./validators/schema-check.ts
```

The script receives a JSON object on stdin with the following shape:

```ts
interface ScriptInput {
  prompt: string;    // The compiled prompt sent to the LLM
  response: string;  // The LLM's response
  params: object;    // The test case parameters
  metadata: object;  // Additional execution metadata
}
```

- Exit code 0 = PASS
- Exit code 1 = FAIL
- stdout = reason string (displayed in test output)
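The docs show a TypeScript validator path; the same contract can be sketched in any language. A minimal Python sketch (the schema check itself is hypothetical), assuming the runner pipes the `ScriptInput` JSON to stdin and reads the exit code:

```python
import json
import sys

def validate(data: dict) -> tuple[bool, str]:
    """Hypothetical schema check: the LLM response must itself be valid JSON."""
    try:
        json.loads(data["response"])
        return True, "response is valid JSON"
    except (KeyError, ValueError) as exc:
        return False, f"schema check failed: {exc}"

def main() -> None:
    # ScriptInput arrives as JSON on stdin; stdout becomes the reason string.
    ok, reason = validate(json.load(sys.stdin))
    print(reason)
    sys.exit(0 if ok else 1)  # exit code 0 = PASS, 1 = FAIL
```

A real sidecar script would invoke `main()` at module level.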
Prmd Evaluator
Uses an LLM to evaluate the response. The evaluator prompt can come from the content block of the `.test.prmd` file, a local `.prmd` file, or a registry package.

```yaml
# Use the content block of this .test.prmd as the evaluator prompt
- evaluator: prmd

# Use a local .prmd file as the evaluator
- evaluator: prmd
  prompt: ./evaluators/my-evaluator.prmd

# Use a registry package as the evaluator
- evaluator: prmd
  prompt: "@prompd/eval-coherence@^1.0.0"

# Override provider and model for this evaluator
- evaluator: prmd
  provider: anthropic
  model: claude-sonnet-4-20250514
```

Template Variables
The following variables are available inside evaluator prompts:
| Variable | Description |
|---|---|
| `{{ prompt }}` | The compiled prompt that was sent to the LLM. |
| `{{ response }}` | The LLM's response being evaluated. |
| `{{ params }}` | The full test case parameters object (JSON). |
| `{{ params.key }}` | An individual parameter value via dot notation. |
The `params` parameter type must be `object` (not `string`) to enable dot-notation access like `{{ params.name }}`.
The evaluator must respond with `PASS` or `FAIL` as the first word, followed by a reason.
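The first-word contract can be illustrated with a small parser. This is a sketch of the contract, not the framework's actual implementation:

```python
def parse_verdict(text: str) -> tuple[bool, str]:
    """Split an evaluator response into (passed, reason) per the first-word contract."""
    first, _, rest = text.strip().partition(" ")
    word = first.strip(".,:!").upper()
    if word not in ("PASS", "FAIL"):
        raise ValueError(f"evaluator response must start with PASS or FAIL, got: {first!r}")
    return word == "PASS", rest.strip()
```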
Execution
Evaluator Ordering
Evaluators always run in cost order, regardless of how they appear in the `assert` array:
1. NLP: local string/token checks (free)
2. Script: custom validation scripts (free, but with process overhead)
3. Prmd: LLM-based evaluation (costs API tokens)
If any assertion at a given tier fails, all remaining evaluators are skipped (fail-fast).
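The cost-ordered, fail-fast loop can be sketched as follows; this is a simplification of the real engine, with `evaluate` standing in for the per-evaluator dispatch:

```python
COST_TIER = {"nlp": 0, "script": 1, "prmd": 2}

def run_assertions(assertions, evaluate):
    """Run assertions cheapest-first; stop at the first failure (fail-fast)."""
    for assertion in sorted(assertions, key=lambda a: COST_TIER[a["evaluator"]]):
        passed, reason = evaluate(assertion)
        if not passed:
            return False, reason
    return True, "all assertions passed"
```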
No-LLM Mode
The `--no-llm` flag restricts execution to compilation and NLP assertions only. Prmd evaluators are skipped entirely. This is useful for CI pipelines where you want fast, deterministic checks without API costs.
Provider and Model Resolution
When executing test cases, the provider and model are resolved in priority order:
1. Source `.prmd` frontmatter (`provider`/`model` fields)
2. Test run options (UI selector or CLI flags)
3. User config defaults (`~/.prompd/config.yaml`)
4. Built-in fallback
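This first-match-wins resolution can be sketched as a simple lookup (function name and model values are hypothetical):

```python
from typing import Optional

def resolve_model(*candidates: Optional[str], fallback: str) -> str:
    """Return the first non-empty candidate in priority order, else the fallback."""
    for candidate in candidates:
        if candidate:
            return candidate
    return fallback
```

For example, a model set in the source frontmatter wins over a user-config default, and the built-in fallback is used only when every other source is empty.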
Complete Example
Given a prompt file `hello.prmd` that accepts a `name` parameter, the following `hello.test.prmd` defines two test cases with mixed evaluator types:
```markdown
---
id: hello-test
name: "hello.test"
version: 0.0.1
description: "Tests for Hello World prompt"
target: ./hello.prmd
max_tokens: 64
tests:
  - name: "greets Alice"
    params:
      name: "Alice"
    assert:
      - evaluator: nlp
        check: contains
        value: "Alice"
      - evaluator: nlp
        check: min_tokens
        value: 5
  - name: "output is friendly"
    params:
      name: "Bob"
    assert:
      - evaluator: nlp
        check: not_contains
        value: ["error", "fail", "sorry"]
      - evaluator: prmd
---

# Evaluator

You are evaluating a greeting message.

- **Name provided:** {{ params.name }}
- Addresses the person by name: "{{ params.name }}"
- Is friendly and warm in tone
- Is 2-3 sentences

**Prompt:** {{ prompt }}

**Response:** {{ response }}

Respond with PASS or FAIL followed by a one-sentence reason.
```

The first test case (“greets Alice”) uses only NLP assertions: it checks that the response contains “Alice” and is at least 5 tokens. No LLM evaluation is needed.
The second test case (“output is friendly”) combines an NLP assertion with a Prmd evaluator. The NLP check runs first to verify that no error-like words appear. If it passes, the Prmd evaluator uses the content block (the `# Evaluator` section) as its prompt, with `{{ params.name }}`, `{{ prompt }}`, and `{{ response }}` substituted at evaluation time.
Package Architecture
```
@prompd/test
  TestRunner          # Orchestrates test execution
  TestParser          # Parses .test.prmd files
  TestDiscovery       # Finds colocated test files
  EvaluatorEngine     # Routes assertions to evaluators
  evaluators/
    NlpEvaluator      # String and token checks
    ScriptEvaluator   # External script runner
    PrmdEvaluator     # LLM-based evaluation
  reporters/
    ConsoleReporter   # Terminal output
    JsonReporter      # Machine-readable JSON
    JunitReporter     # JUnit XML for CI integration
```
Peer dependency: `@prompd/cli`

CLI Integration
The `TestHarness` interface in `@prompd/cli` allows test framework registration as a plugin:
```ts
cli.registerTestHarness(harness);
```

```shell
# Run a specific test file
prompd test ./hello.test.prmd

# Run tests without LLM evaluation (NLP assertions only)
prompd test ./hello.test.prmd --no-llm
```