RESEARCH · LLM EVAL · GRAPHQL FUZZER · SF

avery k

@averyk · ex-anthropic / indep

ML engineer. Open-source eval harnesses, weird agent experiments, occasional papers. Previously at Anthropic, now independent.

NOW SHIPPING
v0.7 · MIT

evalkit_

Open-source LLM eval harness with side-by-side judge mode. 4.2k stars in 8 weeks. Used by 11 labs.

STARS 4,218 · VER 0.7 · CONTRIB 32 · LICENSE MIT
STARS · LAST 30 DAYS +1,247
SHIPPED 2 LIVE · 1 ARCHIVED · 6.4k ★
evalkit · LIVE · ★ 4,218 · Apr 26
gqfuzz · LIVE · ★ 1,847 · '25
promptd · ARCHIVED · ★ 412 · '23