// instruments · laserbrain judge
the judge
A model can't always be benchmarked, but its output can be judged. Paste a task and a response — or two, to compare — pick what you're grading on, and a hosted model returns a scored verdict. It's a rubric run on a model, the way alice is a persona run on a model.
// the task they answered
// response A
// response B
// judge on
// what this is, and isn't
This is one hosted model's scored opinion (Meta's llama-4-scout on Cloudflare), not a benchmark and not the last word. LLM judging is a real technique, but a noisy one: a judge can be swayed by a response's length, by the order in which responses are shown, and by a confident tone. This one is told to watch for those biases, and pairwise judging (A vs B) is steadier than scoring a single response in a vacuum. Treat it as a fast second read, not a final ruling. The laserbrain oscillator field isn't doing the judging; a language model is. It's the same honest split as alice.
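One common way to check a pairwise judge for order sensitivity is to ask twice with the responses swapped and only trust the result when both runs agree. The sketch below illustrates that idea; it is not how this page's judge is implemented. The `judge` callable, its `"first"`/`"second"`/`"tie"` return values, and the two toy judges are all hypothetical stand-ins for a real model call.

```python
from typing import Callable

# Hypothetical judge signature (an assumption, not this page's API):
# (task, first_response, second_response) -> "first" | "second" | "tie"
Judge = Callable[[str, str, str], str]


def pairwise_both_orders(judge: Judge, task: str, a: str, b: str) -> str:
    """Run the judge with A first, then with B first, and compare.

    Returns "A", "B", or "tie" when both orderings agree,
    and "inconsistent" when the verdict flips with the order.
    """
    forward = judge(task, a, b)   # A is shown in the first slot
    backward = judge(task, b, a)  # B is shown in the first slot

    # Map slot-based answers back to the fixed labels A and B.
    forward_label = {"first": "A", "second": "B", "tie": "tie"}[forward]
    backward_label = {"first": "B", "second": "A", "tie": "tie"}[backward]

    return forward_label if forward_label == backward_label else "inconsistent"


# Toy judges for illustration only; a real judge would call a model.
def first_slot_judge(task: str, first: str, second: str) -> str:
    """Always prefers whatever is shown first (pure position bias)."""
    return "first"


def longer_judge(task: str, first: str, second: str) -> str:
    """Always prefers the longer response (pure length bias)."""
    if len(first) == len(second):
        return "tie"
    return "first" if len(first) > len(second) else "second"


if __name__ == "__main__":
    print(pairwise_both_orders(first_slot_judge, "task", "x", "y"))
    print(pairwise_both_orders(longer_judge, "task", "short", "a longer answer"))
```

The swap exposes the position-biased toy judge, which comes back as `inconsistent`. The length-biased one still returns a consistent winner, so agreement across orders rules out one kind of noise, not all of it.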