Join our AI team at Prosus, the largest consumer internet company in Europe and one of the biggest tech investors in the world. You'll be working on the team that drives growth and innovation across the company, with your work directly impacting how millions of people shop online.

Who We're Looking For

We're seeking an experienced Senior MLOps Engineer to build and operate the infrastructure that powers our LLM systems at scale. You'll own the deployment pipelines, serving infrastructure, and production APIs that enable our ML teams to ship models with confidence. You have deep expertise in model serving optimization with vLLM, understand how to balance latency, throughput, and cost at scale, and excel at building reliable systems that enable rapid experimentation. You're motivated by seeing ML research make it to production efficiently, and you thrive in environments where infrastructure quality directly impacts business outcomes.

What You'll Do

ML Pipelines
- Build ML pipelines for data ingestion, processing, model deployment, and evaluation
- Own CI/CD for ML systems, including automated testing, model versioning, and deployment workflows
- Implement monitoring for model performance, latency, throughput, and costs, with budget alerting
- Set up experiment tracking and model registry systems (MLflow, Weights & Biases, or similar)
- Define and monitor SLIs/SLOs for production model serving

Infrastructure & Orchestration
- Manage Kubernetes and Slurm clusters for GPU workloads with multi-tenant resource allocation
- Optimize GPU utilization and implement cost controls across training and inference workloads
- Own CI/CD pipelines, model versioning, and deployment automation

Model Serving & APIs
- Deploy and optimize LLM serving infrastructure using vLLM
- Apply inference optimizations (quantization, continuous batching, PagedAttention, KV cache management) to maximize throughput and minimize latency
- Design and build production-grade async API services (FastAPI or similar) with pre/post-processing, business logic, and strict latency SLAs
- Continuously optimize serving costs through model compression, batching strategies, and infrastructure tuning
- Implement A/B testing infrastructure and canary deployments for safe model rollouts

Enablement & Best Practices
- Create templates and documentation to accelerate team productivity
- Establish MLOps best practices and guide teams in their adoption
- Support model training experiments when needed

Minimum Qualifications
- 5+ years in MLOps, DevOps, or platform engineering with a focus on ML workloads
- Expert-level experience deploying and optimizing LLM serving infrastructure
- Strong Python skills and experience building production APIs (FastAPI or similar)
- Proven experience with cost optimization for GPU-intensive workloads: tracking, budgeting, alerting, and resource efficiency
- Hands-on experience with Kubernetes and Docker for GPU workloads
- Experience with job orchestration systems (Slurm, Ray, Argo, Kubeflow, or similar)
- Solid understanding of monitoring and observability for production ML systems
- Naturally curious, with a track record of proactively identifying and implementing improvements

Preferred Qualifications
- Deep knowledge of GPU architectures and their performance implications for inference optimization
- Expertise in model compression techniques for production deployment: quantization (INT8, INT4, FP8), pruning, and distillation
- Understanding of security best practices for ML serving: authentication, authorization, rate limiting, and model access controls
- Experience managing multi-tenant GPU clusters with fair scheduling and resource isolation
- Proficiency with infrastructure-as-code tools
- Experience supporting distributed training infrastructure: multi-node job orchestration, checkpoint management, and debugging training failures
- Contributions to open-source MLOps tools or serving frameworks

What We Offer
- Critical infrastructure ownership for high-impact AI projects that are strategically vital to the company, with direct visibility to senior leadership, including the CEO
- State-of-the-art GPU infrastructure: an H200 fleet, a vLLM serving stack, and cutting-edge optimization tools
- An expert ML team that has released top Hugging Face models, published at NeurIPS, and built production systems that will run on your infrastructure
- Significant autonomy in designing MLOps solutions, choosing tools, and shaping infrastructure strategy for LLM serving
- Modern tooling: the latest MLOps frameworks, coding assistants, and a best-in-class development environment
- A hybrid work model based at our Amsterdam office, home to the AI House, which brings together 200+ AI professionals through events and collaborations
- Competitive compensation, a top-spec MacBook Pro, and an environment genuinely built for professional growth and learning

If you're passionate about building scalable, high-performance infrastructure that enables cutting-edge AI deployment and want to see your work impact millions of users globally, let's talk.

Our Diversity & Inclusion Commitment

We respect the dignity and human rights of individuals and communities wherever we operate in the world. Building an inclusive workplace where everyone feels welcome and can thrive is critical for us. We provide access to education, which helps everyone understand the important role they play and the positive impact they can have.

For a deeper look at our journey and future plans, explore our latest Annual Report. Stay up to date with our latest news to see what makes Prosus stand out. Learn more at www.prosus.com.