Machine Learning Engineer (LLMs + RL) - Job Board X


AgileRL

11-50 employees

Machine Learning Engineer (LLMs + RL)

London

Full-time

Not specified

Salary

Sponsorship

Posted on: March 3, 2026

15% more than your current base salary


Job Description

At AgileRL, we are on a mission to accelerate reinforcement learning for building superhuman artificial intelligence systems.

We believe that reinforcement learning will form a part of every sophisticated AI system of the future. It already impacts the world we live in, from its use in creating LLMs with reasoning capabilities, to enabling autonomous vehicles to make decisions. Reinforcement learning enables AI models to plan and achieve objectives, but currently very few companies or individuals have the resources to leverage this powerful machine learning paradigm.

AgileRL offers Arena, an enterprise-grade reinforcement learning operations (RLOps) platform and a state-of-the-art open-source framework to eliminate these barriers to entry. Our framework has already achieved 10x faster training and hyperparameter optimisation than leading RL libraries. Arena, built on top of our open-source framework, is focused on four key areas: simulation, training, deployment and monitoring.

We work closely with companies across industries including finance, defence, and technology to deliver best-in-class autonomous solutions. We are looking for talented engineers to join the team and develop the systems and tools that will enable the next wave of impactful AI.

As a member of the AgileRL team, you will have the opportunity to be at the forefront of reinforcement learning innovation. We value curiosity, creativity, and a passion for pushing boundaries. Together, we will build not only state-of-the-art software but also a culture of excellence, collaboration, and continuous learning.

We are seeking a talented and experienced Machine Learning Engineer to join our team and contribute to the development of a first-of-its-kind RLOps platform. As a Machine Learning Engineer, you will be responsible for designing, implementing, and maintaining the infrastructure, tools, and services that enable businesses to build and deploy reinforcement learning models efficiently and effectively.

Responsibilities:

- Collaborate with the team to understand requirements and design the architecture of the Arena platform and open-source framework.
- Develop scalable and reliable infrastructure to support LLM training, reinforcement fine-tuning, model deployment, and management.
- Integrate existing machine learning frameworks and libraries into the platform and open-source framework, providing a range of algorithms, environments, and tools for AI model development.
- Stay up-to-date with the latest advancements in AI, MLOps, reinforcement learning algorithms, tools, and techniques, and incorporate them into the platform as appropriate.
- Provide technical guidance and support to internal users and external customers using the Arena platform and open-source framework.

Requirements:

- Master's or Ph.D. degree in Computer Science, Engineering, or a related field, or 3+ years of relevant industry experience.
- Solid understanding of LLM training, reinforcement learning algorithms and concepts, with hands-on experience in building and training AI models.
- Strong programming skills, with experience using ML frameworks and libraries (e.g. PyTorch, TensorFlow, Ray, Gym, TRL, DeepSpeed, VLLM), and MLOps tools.
- Experience in building machine learning platforms or tooling for industrial or enterprise settings.
- Proficiency in data management techniques, including storage, retrieval, and pre-processing of large-scale datasets.
- Familiarity with model deployment and integration, including the development of APIs and deployment pipelines, and performance optimisation.
- Experience in designing and developing cloud-based infrastructure for distributed computing and scalable data processing.
- Deep understanding of software engineering and machine learning principles and best practices.
- Strong problem-solving and communication skills, and the ability to work independently as well as in a team environment.

Compensation:

- Competitive salary + significant stock options.
- 30 days of holiday, plus bank holidays, per year.
- Flexible working from home and 6-month remote working policies.
- Enhanced parental leave.
- Learning budget of £500 per calendar year for books, training courses and conferences.
- Company pension scheme.
- Regular team socials and quarterly all-company parties.
- Cycle-to-work scheme.

Join the fast-growing AgileRL team and play a key role in the development of cutting-edge reinforcement learning tooling and infrastructure. Learn more about AgileRL at https://agilerl.com