ID.me seeks a Staff Software Engineer to define and lead the discipline of testing AI agents, evaluating LLM behavior, and ensuring the reliability of agentic systems. You will build evaluation infrastructure, production observability, and developer tooling for AI features, while mentoring engineers and establishing quality standards across the organization.
Responsibilities
Define AI quality standards and own the evaluation framework for AI agents
Build and maintain evaluation pipelines for LLM outputs and agent behavior
Instrument agentic systems for production observability and behavioral drift detection
Lead the design of test suites for non-deterministic AI outputs
Champion developer experience by building internal tooling and feedback loops
Drive an AI-first engineering culture and mentor engineers on AI testing best practices
Collaborate with Security, Platform, Product, and AI/ML teams to embed quality gates
Requirements
Bachelor's degree in Computer Science or Engineering, or equivalent experience
8+ years building and operating production software systems
Demonstrated experience evaluating or testing LLM-powered features or autonomous agents in production
Proficiency with AI-assisted development tools (Claude Code, Cursor, or equivalent)
ID.me is an online identity network company that provides a digital identity wallet for secure identity verification and authentication. It allows users to prove their identity once and access various services across government agencies, healthcare organizations, and commercial retailers.