Testing & Evaluation

@prompd/test is the testing framework for .prmd prompt files. It uses colocated .test.prmd sidecar files to define test cases with assertions against compiled and executed prompts.

  • Colocated discovery: summarize.prmd automatically discovers summarize.test.prmd in the same directory
  • Three evaluator types ordered by cost: NLP (local/free), Script (custom logic), Prmd (LLM-based)
  • Fail-fast execution: Evaluators run in cost order — if a cheap assertion fails, expensive evaluators are skipped
  • CI-friendly: --no-llm flag runs only compilation and NLP assertions with zero API costs
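The colocated lookup described above can be sketched in a few lines. This is an illustrative helper (the name `sidecarFor` is not part of the package's actual API), assuming only the documented naming convention:

```typescript
import * as path from "node:path";

// Illustrative sketch of colocated discovery: for a given source .prmd,
// the sidecar test file has the same basename plus ".test.prmd" in the
// same directory (summarize.prmd -> summarize.test.prmd).
function sidecarFor(promptPath: string): string {
  const dir = path.dirname(promptPath);
  const base = path.basename(promptPath, ".prmd");
  return path.join(dir, `${base}.test.prmd`);
}
```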

Test files use YAML frontmatter to define a test suite with one or more test cases. The Markdown content block is optional and serves as the default evaluator prompt for prmd assertions.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `name` | string | No | Test suite name. Defaults to the filename. |
| `description` | string | No | Human-readable description of the test suite. |
| `target` | string | No | Relative path to the source `.prmd`. Auto-discovered from the filename if omitted. |
| `max_tokens` | number | No | Default max tokens for LLM execution across all test cases. |
| `tests` | array | Yes | Array of test case definitions. |

Each entry in the tests array supports:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `name` | string | No | Test case name. Defaults to `test_N`. |
| `params` | object | No | Parameters passed to the target `.prmd` for compilation. |
| `assert` | array | No | Array of assertion definitions. |
| `expect_error` | boolean | No | If `true`, the test passes when compilation fails. |

Assertions use one of three evaluator types. They execute in cost order: NLP first, then Script, then Prmd. If any assertion fails, remaining evaluators in the chain are skipped.

The NLP evaluator is local, fast, free, and deterministic. It runs string and token checks against the LLM response without any external calls.

```yaml
- evaluator: nlp
  check: contains
  value: "expected text"
```
| Check | Value Type | Description |
| --- | --- | --- |
| `contains` | string or string[] | Response contains all values (case-insensitive). |
| `not_contains` | string or string[] | Response contains none of the values. |
| `matches` | string | Response matches the given regex pattern. |
| `max_tokens` | number | Estimated token count is at most this value. |
| `min_tokens` | number | Estimated token count is at least this value. |
| `starts_with` | string | Response starts with this value (case-insensitive). |
| `ends_with` | string | Response ends with this value (case-insensitive). |
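A minimal sketch of how the case-insensitive `contains` and `not_contains` checks behave. The function names are illustrative, not the package's internals; this only demonstrates the documented semantics (all values must match for `contains`, none for `not_contains`):

```typescript
// Normalize both sides to lowercase, per the documented case-insensitivity.
function contains(response: string, values: string | string[]): boolean {
  const haystack = response.toLowerCase();
  const needles = Array.isArray(values) ? values : [values];
  return needles.every((v) => haystack.includes(v.toLowerCase()));
}

function notContains(response: string, values: string | string[]): boolean {
  const haystack = response.toLowerCase();
  const needles = Array.isArray(values) ? values : [values];
  return needles.every((v) => !haystack.includes(v.toLowerCase()));
}
```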

The Script evaluator runs a custom script for validation logic that goes beyond string matching.

```yaml
- evaluator: script
  run: ./validators/schema-check.ts
```

The script receives a JSON object on stdin with the following shape:

```typescript
interface ScriptInput {
  prompt: string;   // The compiled prompt sent to the LLM
  response: string; // The LLM's response
  params: object;   // The test case parameters
  metadata: object; // Additional execution metadata
}
```
  • Exit code 0 = PASS
  • Exit code 1 = FAIL
  • stdout = reason string (displayed in test output)
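Putting the contract together, a validator script might look like the sketch below. The validation rule (requiring the response to be valid JSON) and the `validate` helper are hypothetical examples, not part of the framework; only the stdin shape, stdout reason, and exit-code convention come from the documentation above:

```typescript
import { readFileSync } from "node:fs";

interface ScriptInput {
  prompt: string;   // The compiled prompt sent to the LLM
  response: string; // The LLM's response
  params: object;   // The test case parameters
  metadata: object; // Additional execution metadata
}

// Example rule: the response must be valid JSON. Any custom logic works here.
function validate(input: ScriptInput): { pass: boolean; reason: string } {
  try {
    JSON.parse(input.response);
    return { pass: true, reason: "response is valid JSON" };
  } catch {
    return { pass: false, reason: "response is not valid JSON" };
  }
}

// Entry point when invoked as a script evaluator: read ScriptInput from
// stdin, print the reason to stdout, and signal PASS/FAIL via exit code.
if (process.argv[1]?.endsWith("schema-check.ts")) {
  const input: ScriptInput = JSON.parse(readFileSync(0, "utf8"));
  const result = validate(input);
  console.log(result.reason); // stdout becomes the displayed reason string
  process.exit(result.pass ? 0 : 1);
}
```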

The Prmd evaluator uses an LLM to evaluate the response. The evaluator prompt can come from the content block of the `.test.prmd` file, a local `.prmd` file, or a registry package.

```yaml
# Use the content block of this .test.prmd as the evaluator prompt
- evaluator: prmd

# Use a local .prmd file as the evaluator
- evaluator: prmd
  prompt: ./evaluators/my-evaluator.prmd

# Use a registry package as the evaluator
- evaluator: prmd
  prompt: "@prompd/eval-coherence@^1.0.0"

# Override provider and model for this evaluator
- evaluator: prmd
  provider: anthropic
  model: claude-sonnet-4-20250514
```

The following variables are available inside evaluator prompts:

| Variable | Description |
| --- | --- |
| `{{ prompt }}` | The compiled prompt that was sent to the LLM. |
| `{{ response }}` | The LLM's response being evaluated. |
| `{{ params }}` | The full test case parameters object (JSON). |
| `{{ params.key }}` | Individual parameter value via dot notation. |

The `params` parameter type must be `object` (not `string`) to enable dot-notation access like `{{ params.name }}`.

The evaluator prompt must respond with PASS or FAIL as the first word, followed by a reason.
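The verdict contract (first word `PASS` or `FAIL`, rest is the reason) can be parsed as sketched below. The helper name `parseVerdict` is illustrative, not the framework's actual internals:

```typescript
// Split on whitespace: first token is the verdict, the rest is the reason.
function parseVerdict(text: string): { pass: boolean; reason: string } {
  const [first, ...rest] = text.trim().split(/\s+/);
  const word = first.toUpperCase();
  if (word !== "PASS" && word !== "FAIL") {
    throw new Error(`Unrecognized verdict: ${first}`);
  }
  return { pass: word === "PASS", reason: rest.join(" ") };
}
```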

Evaluators always run in cost order regardless of how they appear in the assert array:

  1. NLP — local string/token checks (free)
  2. Script — custom validation scripts (free, but has process overhead)
  3. Prmd — LLM-based evaluation (costs API tokens)

If any assertion at a given tier fails, all remaining evaluators are skipped (fail-fast).
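The cost-ordered, fail-fast scheme above can be sketched as follows. Type and function names are illustrative assumptions, not the package's actual implementation:

```typescript
type Evaluator = "nlp" | "script" | "prmd";

interface Assertion {
  evaluator: Evaluator;
}

// Tier costs: nlp is free, script has process overhead, prmd spends tokens.
const COST_ORDER: Record<Evaluator, number> = { nlp: 0, script: 1, prmd: 2 };

// Stable sort by tier, regardless of order in the assert array.
function orderAssertions(asserts: Assertion[]): Assertion[] {
  return [...asserts].sort(
    (a, b) => COST_ORDER[a.evaluator] - COST_ORDER[b.evaluator],
  );
}

// Fail fast: the first failing assertion skips all remaining (costlier) ones.
function runAll(
  asserts: Assertion[],
  run: (a: Assertion) => boolean,
): boolean {
  for (const a of orderAssertions(asserts)) {
    if (!run(a)) return false;
  }
  return true;
}
```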

The --no-llm flag restricts execution to compilation and NLP assertions only. Prmd evaluators are skipped entirely. This is useful for CI pipelines where you want fast, deterministic checks without API costs.

When executing test cases, the provider and model are resolved in priority order:

  1. Source .prmd frontmatter (provider / model fields)
  2. Test run options (UI selector or CLI flags)
  3. User config defaults (~/.prompd/config.yaml)
  4. Built-in fallback
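The four-step resolution can be expressed as a chain of nullish fallbacks, sketched below under the assumption that each source exposes optional `provider`/`model` fields (the names `ModelChoice` and `resolveModel` are illustrative):

```typescript
interface ModelChoice {
  provider?: string;
  model?: string;
}

// Each field is resolved independently in documented priority order:
// frontmatter -> run options -> user config -> built-in fallback.
function resolveModel(
  frontmatter: ModelChoice,
  runOptions: ModelChoice,
  userConfig: ModelChoice,
  fallback: Required<ModelChoice>,
): Required<ModelChoice> {
  return {
    provider:
      frontmatter.provider ?? runOptions.provider ??
      userConfig.provider ?? fallback.provider,
    model:
      frontmatter.model ?? runOptions.model ??
      userConfig.model ?? fallback.model,
  };
}
```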

Given a prompt file hello.prmd that accepts a name parameter, the following hello.test.prmd defines two test cases with mixed evaluator types:

```markdown
---
id: hello-test
name: "hello.test"
version: 0.0.1
description: "Tests for Hello World prompt"
target: ./hello.prmd
max_tokens: 64
tests:
  - name: "greets Alice"
    params:
      name: "Alice"
    assert:
      - evaluator: nlp
        check: contains
        value: "Alice"
      - evaluator: nlp
        check: min_tokens
        value: 5
  - name: "output is friendly"
    params:
      name: "Bob"
    assert:
      - evaluator: nlp
        check: not_contains
        value: ["error", "fail", "sorry"]
      - evaluator: prmd
---

# Evaluator

You are evaluating a greeting message.

- **Name provided:** {{ params.name }}
- Addresses the person by name: "{{ params.name }}"
- Has a friendly, warm tone
- Is 2-3 sentences

**Prompt:** {{ prompt }}

**Response:** {{ response }}

Respond with PASS or FAIL followed by a one-sentence reason.
```

The first test case (“greets Alice”) uses only NLP assertions — it checks that the response contains “Alice” and is at least 5 tokens. No LLM evaluation is needed.

The second test case (“output is friendly”) combines an NLP assertion with a Prmd evaluator. The NLP check runs first to verify no error-like words appear. If that passes, the Prmd evaluator uses the content block (the # Evaluator section) as its prompt, with {{ params.name }}, {{ prompt }}, and {{ response }} substituted at evaluation time.

```
@prompd/test
  TestRunner         # Orchestrates test execution
  TestParser         # Parses .test.prmd files
  TestDiscovery      # Finds colocated test files
  EvaluatorEngine    # Routes assertions to evaluators
  evaluators/
    NlpEvaluator     # String and token checks
    ScriptEvaluator  # External script runner
    PrmdEvaluator    # LLM-based evaluation
  reporters/
    ConsoleReporter  # Terminal output
    JsonReporter     # Machine-readable JSON
    JunitReporter    # JUnit XML for CI integration
```

Peer dependency: `@prompd/cli`

The TestHarness interface in @prompd/cli allows test framework registration as a plugin:

```typescript
cli.registerTestHarness(harness);
```

```sh
# Run a specific test file
prompd test ./hello.test.prmd

# Run tests without LLM evaluation (NLP assertions only)
prompd test ./hello.test.prmd --no-llm
```