RESEARCH · LLM EVAL · GRAPHQL FUZZER · SF
avery k
@averyk · ex-anthropic / indep
ML engineer. Open-source eval harnesses, weird agent experiments, occasional papers. Previously at Anthropic, now independent.
NOW SHIPPING
v0.7 · MIT
evalkit
Open-source LLM eval harness with side-by-side judge mode. 4.2k stars in 8 weeks. Used by 11 labs.
| STARS | 4,218 |
| VER | 0.7 |
| CONTRIB | 32 |
| LICENSE | MIT |
STARS · LAST 30 DAYS
+1,247
SHIPPED
2 LIVE · 1 ARCHIVED · 6.4k ★
| evalkit | LIVE | ★ 4,218 | Apr 26 |
| gqfuzz | LIVE | ★ 1,847 | '25 |
| promptd | ARCHIVED | ★ 412 | '23 |