2 days ago

How to Bootstrap Agent Evals with Synthetic Queries

Checking agent outputs isn't enough. The real failures hide in trajectories: which tools got called, in what order, with what inputs. This article walks through a pattern for building evals when you don't have production data yet. You define the dimensions your agent varies along, generate structured tuples across them, and turn those into natural-language test queries. Run them, read the traces, write down what broke. Those notes become goals that shape the next batch of queries. Repeat until the failures vanish.

Source: HackerNoon
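The generation step described above — define dimensions, expand them into structured tuples, render each tuple as a natural-language query — can be sketched in a few lines. The dimension names, templates, and order number below are hypothetical illustrations, not taken from the article:

```python
# Sketch of synthetic query generation for agent evals.
# All dimensions, templates, and values are made-up examples.
from itertools import product

# Dimensions the agent is assumed to vary along (hypothetical).
DIMENSIONS = {
    "intent": ["refund", "order_status", "cancel"],
    "tone": ["terse", "verbose"],
    "has_order_id": [True, False],
}

# One natural-language template per intent (hypothetical).
TEMPLATES = {
    "refund": "I want my money back{suffix}.",
    "order_status": "Where is my package{suffix}?",
    "cancel": "Please cancel my order{suffix}.",
}

def generate_queries(dims=DIMENSIONS):
    """Expand the dimensions into structured tuples, then render
    each tuple as a natural-language test query."""
    keys = list(dims)
    for values in product(*(dims[k] for k in keys)):
        case = dict(zip(keys, values))
        suffix = " (order #12345)" if case["has_order_id"] else ""
        query = TEMPLATES[case["intent"]].format(suffix=suffix)
        if case["tone"] == "verbose":
            query = "Hi, sorry to bother you, but " + query.lower()
        yield case, query

queries = list(generate_queries())
# 3 intents x 2 tones x 2 id flags -> 12 test cases
```

Each yielded pair keeps the structured tuple alongside the rendered query, so when a trace later fails you can group failures by dimension value (e.g. "everything with `has_order_id=False` broke") and feed that note back into the next batch.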
