# Visualization
Workspace-Bench includes a built-in visualization dashboard for browsing evaluation results, inspecting task outputs, and analyzing agent performance.
## Getting Started
### Prerequisites
- Node.js ≥ 18
### Install and Run
This starts both the Vite React frontend and the Express API backend concurrently:
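A plausible invocation, assuming the conventional npm workflow (the `viz/` working directory is taken from the build-output path below; the `dev` script name is an assumption — check `viz/package.json` for the actual targets):

```shell
cd viz
npm install    # install frontend and backend dependencies
npm run dev    # start the Vite dev server and the Express API together
```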
- Frontend: http://localhost:5173
- API: http://localhost:3000
The dashboard automatically discovers run directories under evaluation/output/ relative to the project root.
### Build for Production
The production build outputs static files to viz/dist/.
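A likely build command, assuming a standard Vite setup (the `build` and `preview` script names are assumptions):

```shell
cd viz
npm run build      # emit static assets to viz/dist/
npm run preview    # optional: serve the production bundle locally for a sanity check
```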
## Dashboard Pages
### Home
Overview of all completed benchmark runs. Each run card shows:
- Harness and model name
- Dataset split (Smoke / Lite / Full)
- Pass / fail / error / timeout counts
- Total duration
### Run Detail
Drill into a specific run to see:
- Task list with per-task status, duration, and token usage
- Rubric judgment summary (passed vs failed criteria)
- Output files produced by the agent
- Dependency graph extracted from tool calls
### File View
Side-by-side file comparison with syntax highlighting (powered by Monaco Editor):
- Workspace files — Original inputs from the task
- Agent outputs — Files produced by the agent
- Ground truth — Gold-standard reference outputs (if available)
### Statistics
Aggregate analysis across runs:
- Rubric success rate by task type and difficulty
- Token histogram — Prompt vs completion distribution
- Tool call frequency — Which tools agents use most
## Tech Stack
- Frontend: React 18 + TypeScript + Vite + Tailwind CSS + Zustand
- Backend: Express + TypeScript
- Editor: Monaco Editor
## Customizing the Data Path
By default, the API server scans ../evaluation/output/ for run directories. To point it elsewhere, set the environment variable before starting the server:
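For example — the variable name `WORKSPACE_BENCH_OUTPUT_DIR` below is a placeholder; consult the Express server source for the name it actually reads:

```shell
# Hypothetical variable name — check the API server code for the real one.
export WORKSPACE_BENCH_OUTPUT_DIR=/absolute/path/to/your/runs
# ...then start the server as usual (e.g. npm run dev from viz/).
```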