Join our AI team at Prosus, the largest consumer internet company in Europe and one of the biggest tech investors in the world. You'll be working on the team that drives growth and innovation across the company, with your work directly impacting how millions of people shop online.

Who We're Looking For

We're seeking a Senior Machine Learning Engineer to train domain-specific language models and provide technical leadership to the team. You'll own critical parts of our training infrastructure, mentor engineers, and drive technical decisions from data preparation through production deployment. You have deep hands-on experience training language models at scale, lead by example through rigorous experimentation and high-quality code, and are motivated by seeing your work deployed to millions of users. You thrive in fast-paced environments where you balance technical depth with practical business impact.

What You'll Do

- Analyze model performance and training data, formulate hypotheses, and design and execute rigorous experiments to systematically improve model quality, training and inference efficiency, and downstream task performance
- Drive technical decision-making for model architecture, training strategies, and infrastructure choices
- Provide technical leadership and mentorship to ML engineers and interns, conducting code reviews, sharing best practices, and accelerating team growth
- Train large language models through continued pre-training and full-parameter fine-tuning on proprietary datasets
- Build and optimize distributed training infrastructure across multi-node GPU clusters using frameworks like DeepSpeed, FSDP, Megatron-LM, or Axolotl
- Own large-scale data preparation: filtering, quality assessment, deduplication, and data mixture strategies for training corpora at 100B+ token scale
- Generate and curate high-quality synthetic data for instruction fine-tuning and capability enhancement
- Debug training stability issues, optimize training and inference throughput (quantization, distillation, serving optimization), and monitor model performance throughout long-running distributed jobs
- Build robust evaluation frameworks and establish metrics to measure model quality and guide decisions
- Write production-grade, well-tested code and set engineering standards for the team

Minimum Qualifications

- 7+ years of ML engineering experience
- Technical leadership experience: mentoring engineers, conducting code reviews, making architecture decisions, and delivering projects with measurable business impact
- Proven experience training and deploying language models to production (embedding models, encoder models, or large language models), including pre-training, continued pre-training, or fine-tuning with rigorous evaluation and inference optimization
- Experience preparing large-scale training datasets: data filtering, quality assessment, deduplication strategies, and data mixture design
- Hands-on experience with distributed training frameworks (DeepSpeed, FSDP, Megatron-LM, or Axolotl), including orchestrating multi-node jobs, debugging failures, and optimizing throughput
- Strong understanding of training dynamics at scale: debugging loss instabilities, tuning learning-rate schedules, and managing training stability across long-running multi-node jobs
- Expert-level Python and PyTorch, with production experience using training libraries (Transformers, DeepSpeed, Accelerate)

Preferred Qualifications

- Published research at ML conferences (NeurIPS, ICML, ICLR, ACL, EMNLP), released models on Hugging Face, created public benchmarks, or contributed to open-source projects
- Experience with post-training methods: RLHF, DPO, GRPO, or other reinforcement learning approaches for alignment and instruction-following
- Experience optimizing models for production inference, including quantization, model compression, distillation, and serving frameworks (vLLM, TensorRT-LLM)
- Understanding of memory optimization: gradient checkpointing, mixed-precision training (FP16, BF16, FP8), and ZeRO optimization
- Deep knowledge of GPU architectures (A100, H100, H200) and their implications for training and inference optimization
- Track record of building synthetic data generation pipelines for instruction tuning or domain adaptation

What We Offer

- High-impact AI projects that are strategically vital to the company, with direct engagement from senior leadership including the CEO
- State-of-the-art infrastructure: an H200 GPU fleet, massive proprietary datasets, and access to frontier models (OpenAI, Anthropic, Google, Together.ai) for evaluation and baselines
- Expert colleagues who have released top Hugging Face models, authored papers at NeurIPS, created well-known benchmarks, and built multiple production AI systems
- Significant autonomy and freedom to test ideas, experiment with new approaches, and drive technical decisions
- Modern tooling: the latest ML frameworks, coding assistants, and a best-in-class development environment
- Hybrid work model with our Amsterdam office, home to the AI House, which brings together 200+ AI professionals through events, meetups, and startup collaborations
- Competitive compensation, a top-spec MacBook Pro, and an environment genuinely built for professional growth and learning

If you're excited to apply your LLM training expertise to high-impact applications at scale, lead technical initiatives, and grow the next generation of ML engineers, let's talk.

Our Diversity & Inclusion Commitment

We respect the dignity and human rights of individuals and communities wherever we operate in the world. Building an inclusive workplace where everyone feels welcome and can thrive is critical for us. We provide access to education, which helps everyone understand the important role they play and the positive impact they can have.

For a deeper look at our journey and future plans, explore our latest Annual Report. Stay up to date with our latest news to see what makes Prosus stand out.
Learn more at www.prosus.com.