Steven Willmott

Builder at Safe Intelligence

About

Previously CEO of 3scale; helped run a multi-agent systems lab in the early 2000s. Now CEO at Safe Intelligence and product lead on /Spec27.

Badges

Tastemaker
Veteran
Gone streaking

Forums

What kind of Agent validation are you doing today?

Everything started with model evals and benchmarks (which model is better?), then evolved into prompt management, and from there into analyzing traces. What are people doing today, and how are they sourcing test datasets?