DEV Community

Maya Andersson

Just a bored curious dev

We put confidence intervals on our LLM-judge scores. The error bars ate three weeks of "trend"


Comments
2 min read

More eval traces will not stabilize your kappa. Stratify the ones you have


1
Comments
2 min read
Calibration set size for LLM-as-judge: when 50 traces is enough and when 200 is mandatory


1
Comments
9 min read
why Cohen's kappa drifts week to week (and what to do about it)


7
Comments 1
1 min read
Your LLM-as-judge eval set is too small. Here is the math


7
Comments 1
4 min read