AppFolio is building Realm-X, an AI-native platform for property management. The company is hiring a Staff Machine Learning Engineer to develop the ML platform that supports training, fine-tuning, inference, RAG, and evaluation. The role focuses on ML infrastructure on AWS, cost optimization, multi-provider LLM access, and productionizing research prototypes.
Responsibilities
Design and operate AppFolio's ML infrastructure on AWS — ECS, SageMaker, GPU fleets, model serving, autoscaling, and cost controls.
Drive AI cost discipline: optimize cost across all AI applications, including provider routing, caching, batch vs. real-time processing, model size selection, and inference economics.
Maintain reliable, multi-provider LLM access across Google, OpenAI, and Anthropic with sensible fallbacks and abstractions.
Build the training and fine-tuning stack for Small Language Models, including data pipelines, GPU orchestration, and evaluation.
Partner with Voice & Agents and Research ML engineers to harden their prototypes into production systems with SLOs, on-call rotations, and observability.
Operate AppFolio's AI safety and authorization layer — guardrails on AWS, scoped tool permissions, and human-in-the-loop gates for autonomous agent actions.
Requirements
ML infrastructure at scale: you have built and operated production ML infrastructure on AWS, including ECS, SageMaker, GPUs, autoscaling, and cost controls.