About the Role

ThirdLaw builds the control layer for enterprise AI. This role involves designing and building real-time evaluation logic that enforces AI safety policies using LLMs, semantic similarity, and classifiers. Engineers will also integrate with foundation models and build tools for monitoring and debugging evaluators.

Responsibilities

Design and build real-time evaluation logic that determines whether LLM prompts or outputs violate enterprise policies.
Implement evaluation strategies using a mix of semantic similarity, foundation model scoring, rule-based systems, and statistical checks.
Integrate model outputs with downstream enforcement actions (e.g. redaction, escalation, blocking).
Prototype, tune, and productize small language models and prompt templates for classification, labeling, or scoring.
Collaborate with data infrastructure engineers to connect evaluation logic with ingestion and storage layers.
Build tools to observe, debug, and improve evaluator performance across real-world data distributions.
Define abstractions for reusable evaluation components that can scale across use cases.

Requirements

7+ years of experience in ML systems or AI engineering roles, with at least 1–2 years working directly with LLMs, NLP pipelines, or semantic search.
Deep understanding of foundation models (e.g. OpenAI, Claude, Mistral, Llama) and how to work with them via APIs or open-source releases.

Wfh

San Francisco · US · 8+ employees

WFH.team provides remote job intelligence for candidates and employers, offering confirmed remote job listings and resume-based matching. It also provides employer-facing hiring tools and public resources for remote-work workflows.
