I’d split it into gates: deterministic lint and type checks always on, smoke/e2e tests for the core journeys, and unit tests only where interfaces are stable. For agentic builds the missing layer is trace review: what changed, why the tests changed, and which user journey proved it. Evals can score tests, but they can't replace repros.
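
To make the trace review part concrete, here's a rough sketch of what I mean. Everything in it is made up for illustration: `TraceEntry`, its fields, and `review_trace` are hypothetical names, not from any real tool, and a real setup would likely pull this from the agent's run log rather than a hand-built object.

```python
from dataclasses import dataclass, field


@dataclass
class TraceEntry:
    """One agent change as a reviewer would want to see it (hypothetical schema)."""
    files_changed: list[str]
    reason: str = ""                      # why the change was made
    tests_changed: list[str] = field(default_factory=list)
    test_change_reason: str = ""          # why the tests themselves were touched
    journey: str = ""                     # smoke/e2e user journey that exercised the change


def review_trace(entry: TraceEntry) -> list[str]:
    """Return review findings; an empty list means the entry passes."""
    findings = []
    if not entry.reason.strip():
        findings.append("no reason given for the change")
    if entry.tests_changed and not entry.test_change_reason.strip():
        findings.append("tests changed without explanation")
    if not entry.journey.strip():
        findings.append("no user journey proves the change")
    return findings


if __name__ == "__main__":
    entry = TraceEntry(
        files_changed=["checkout.py"],
        reason="fix rounding",
        tests_changed=["test_checkout.py"],
    )
    for finding in review_trace(entry):
        print(finding)
```

The point is that this runs as a gate next to lint and types: an agent that edits tests without saying why, or can't point at a journey, gets flagged before anyone looks at eval scores.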