I kept running into the same problem with AI metrics: we were measuring usage, not impact. License counts told us who had access. Prompt volume told us who was active. But neither answered the question that actually mattered to me: are we becoming better, more effective engineers because of AI?
So I created a composite Productivity Signal: a daily 0–100 score per builder (not just per engineer, since AI enables everyone to build). The goal is not to gamify output, but to measure whether AI use is translating into meaningful engineering work.
The model is intentionally opinionated:
- 50 points: PR merge rate. If code does not ship, it should not count.
- 25 points: AI shipping days. AI use without shipping is just noise.
- 25 points: Code reviews given. Because acceleration without review usually turns into rework somewhere else.
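
To make the weighting concrete, here is a minimal sketch of how the three components could combine into a 0–100 score. Only the 50/25/25 weights come from the model above. The field names, the trailing window, and `REVIEW_TARGET` (how many reviews earn full review points) are hypothetical choices made for illustration, not the actual implementation.

```python
from dataclasses import dataclass


@dataclass
class BuilderWindow:
    """Activity for one builder over a trailing window (hypothetical field names)."""
    prs_opened: int
    prs_merged: int
    ai_shipping_days: int  # days with both AI use and shipped work
    window_days: int
    reviews_given: int


# Assumption: number of reviews in the window that earns the full 25 points.
REVIEW_TARGET = 5


def productivity_signal(w: BuilderWindow) -> float:
    """Combine the three components using the 50/25/25 weights."""
    # Each component is normalized to the range 0..1 before weighting.
    merge_rate = w.prs_merged / w.prs_opened if w.prs_opened else 0.0
    ai_ship_rate = w.ai_shipping_days / w.window_days if w.window_days else 0.0
    review_rate = w.reviews_given / REVIEW_TARGET

    score = (
        50 * min(merge_rate, 1.0)
        + 25 * min(ai_ship_rate, 1.0)
        + 25 * min(review_rate, 1.0)
    )
    return round(score, 1)


example = BuilderWindow(
    prs_opened=4, prs_merged=3, ai_shipping_days=2, window_days=5, reviews_given=5
)
print(productivity_signal(example))
```

With these assumed inputs, a builder who merges three of four PRs, ships with AI on two of five days, and hits the review target lands in the low 70s. Heavy AI use with nothing merged and no reviews given would score 0, which is the point of the weighting.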
The score sorts builders into four tiers of maturity: Manual (L1), Assisted (L2), Augmented (L3), and Autonomous (L4). I have also added a quality cap: CI pass rate and review pass rate must clear a threshold before speed gets rewarded. Otherwise you end up measuring throughput while quietly degrading quality.

And yes, the score is biased by design. It penalizes usage without shipping, shipping without review, and speed without quality. Because if we are serious about understanding AI’s value in building, we have to be honest about what counts.
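
The quality cap and the tiers could be wired together roughly as follows. This is a sketch under stated assumptions: the tier names come from the post, but the threshold values, the capped ceiling, and the tier boundaries are placeholders invented for illustration.

```python
# All numeric values below are assumptions for illustration only.
MIN_CI_PASS_RATE = 0.90       # hypothetical CI pass-rate threshold
MIN_REVIEW_PASS_RATE = 0.80   # hypothetical review pass-rate threshold
CAPPED_SCORE = 49.0           # hypothetical ceiling when quality falls short

# Hypothetical score floors for each tier, checked from highest to lowest.
TIER_FLOORS = [
    (75, "Autonomous (L4)"),
    (50, "Augmented (L3)"),
    (25, "Assisted (L2)"),
    (0, "Manual (L1)"),
]


def apply_quality_cap(score: float, ci_pass_rate: float, review_pass_rate: float) -> float:
    """Hold the score at a ceiling unless both quality rates clear their thresholds."""
    if ci_pass_rate < MIN_CI_PASS_RATE or review_pass_rate < MIN_REVIEW_PASS_RATE:
        return min(score, CAPPED_SCORE)
    return score


def tier_for(score: float) -> str:
    """Map a 0-100 score to the first tier whose floor it meets."""
    for floor, name in TIER_FLOORS:
        if score >= floor:
            return name
    return TIER_FLOORS[-1][1]


fast_but_flaky = apply_quality_cap(80.0, ci_pass_rate=0.70, review_pass_rate=0.95)
print(fast_but_flaky, tier_for(fast_but_flaky))
```

Under these placeholder numbers, a raw score of 80 with a 70% CI pass rate is held below the Augmented tier: speed only counts once quality clears the bar.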