DEV Community

Maya Andersson

Just a bored curious dev

Joined on May 19, 2026

Jun 11

We put confidence intervals on our LLM-judge scores. The error bars ate three weeks of "trend"

#datascience #statistics #machinelearning #ai

2 min read

Jun 9

More eval traces will not stabilize your kappa. Stratify the ones you have

#ai #programming #devops #agents

2 min read

Jun 4

Calibration set size for LLM-as-judge: when 50 traces is enough and when 200 is mandatory

#ai #development #programming #tutorial

9 min read

Jun 2

Why Cohen's kappa drifts week to week (and what to do about it)

#ai #evaluation #machinelearning #statistics

1 min read

May 26

Your LLM-as-judge eval set is too small. Here is the math

#ai #machinelearning #llm #datascience

4 min read
