London
Full-time
Not specified
Mid-Senior level
Salary
Sponsorship
15% more than your current base salary

Job Description

Principal Platform Engineer / Team Lead - Azure Data & AI Platform

The Role

You'll lead a team of senior platform engineers building and operating our Azure data and AI platform. This is a technical leadership role, not a people manager hiding from the work. You'll be 60% hands-on (architecture, complex problems, unblocking the team) and 40% leadership (mentoring, standards, strategy, team development).

You have a mandate for continual improvement. Your job is to make the platform better, the team stronger, and the engineering practices more mature, month over month. You'll define what "good" looks like and hold the line on quality while keeping delivery moving.

This role reports to the Head of Data Platform Engineering & Operations and has direct reports.

What You'll Actually Do

Technical Leadership & Architecture (35%)
- Run a full Agile workstream
- Own the technical vision and roadmap for the Azure platform
- Work with the Head of Data Platform Engineering & Operations to make architectural decisions on network design, data architecture, MLOps patterns, and security models
- Review and approve significant infrastructure changes and RFCs
- Solve the hardest technical problems the team faces - the ones that require years of hard-won experience
- Represent platform engineering in leadership discussions about technical direction and priorities
- Stay hands-on: you're still writing Terraform, reviewing PRs, and debugging production issues
- Evaluate and pilot new technologies and services that could improve the platform
- Make build-vs-buy decisions and challenge vendor claims with evidence

Standards, Best Practice & Quality (25%)
- Define and document platform engineering standards: Terraform patterns, pipeline structures, security controls, documentation requirements
- Establish and enforce code review quality bars without creating review bottlenecks
- Implement and monitor SLIs/SLOs for platform services - what does "platform is working" mean?
- Drive adoption of DevSecOps practices: security scanning, vulnerability management, secrets rotation, least privilege
- Run retrospectives and blameless post-mortems - turn incidents into improvements, not finger-pointing
- Champion technical debt management: maintain the backlog, prioritise paydown, prevent accumulation
- Build consensus around best practices while remaining pragmatic about exceptions
- Create and maintain architecture decision records (ADRs) and RFCs for significant choices

Team Development & Mentoring (20%)
- Mentor the senior engineers: career development, technical growth, and leadership skills for those who want them
- Run 1-on-1s that matter: career conversations, skill gaps, blockers, workload balance
- Create growth plans and ensure people have challenging work that develops them
- Build a learning culture: lunch-and-learns, RFC reviews, pair programming, knowledge sharing
- Identify skill gaps in the team and address them through hiring, training, or reorganisation
- Coach engineers on communication, stakeholder management, and influence without authority
- Develop succession planning so there's always someone to cover when needed
- Handle performance issues directly and promptly - no festering problems

Process & Continual Improvement (15%)
- Drive platform maturity: move from reactive to proactive, from manual to automated, from tribal knowledge to documented
- Implement and iterate on team processes: sprint planning (if you sprint), incident response, on-call rotation, knowledge management
- Track and improve key metrics: deployment frequency, lead time, MTTR, change failure rate
- Run quarterly improvement initiatives based on team retros and pain points
- Establish platform team rituals that add value: design reviews, demo days, incident reviews
- Remove blockers and organisational friction that slow the team down
- Build relationships with stakeholder teams (data engineering, ML, security, compliance) to smooth collaboration
- Push back on unreasonable demands and protect the team from organisational chaos

Stakeholder Management & Communication (5%)
- Translate technical work into business value for leadership
- Communicate platform roadmap, incidents, and status to stakeholders
- Manage expectations and negotiate priorities with product, data science, and other engineering teams
- Escalate and resolve cross-team conflicts and dependencies
- Advocate for platform investment (budget, headcount, tooling) with data-driven arguments

What We Need From You

Required Experience
- 10+ years in platform/infrastructure engineering, with at least 3 years leading technical teams
- Deep Azure expertise - you've architected multi-subscription environments with complex networking, security, and governance requirements
- Databricks and data platform experience - you understand data architecture, not just infrastructure
- MLOps/AI platform knowledge - you've built production ML systems and know the operational challenges
- Proven track record of establishing standards and best practices that teams followed
- Mentoring and developing senior engineers - you've grown people into better engineers
- Infrastructure as Code mastery - Terraform (or Bicep) at scale, with modules, state management, and testing
- DevSecOps implementation - you've built secure pipelines and embedded security into engineering workflows
- Incident response and production operations - you've been on-call, managed incidents, and improved systems based on failures

Critical Leadership Qualities
- Technical credibility - the senior engineers respect your technical judgment because you've earned it
- Clear decision-making - you gather input, make decisions, explain your reasoning, and commit
- Comfortable with conflict - you have hard conversations about quality, performance, and standards
- Servant leadership mindset - your job is to make the team successful, not to be the hero
- Teaching ability - you can explain complex technical concepts and help people develop mastery
- Intellectual humility - you admit mistakes, change your mind with new evidence, and don't have ego tied to being right

About The Role

Howden Group Services is expanding its AI & Data Science capabilities and is looking for an AI Deployment Engineer to help accelerate our transformation and build enterprise-grade solutions with AI at their core that will be used by hundreds of colleagues across the Group. You will have a dual reporting line into the Group Head of Data Science and the Group Head of Data Operations, and will bring deep technical expertise in cloud engineering and SRE with a focus on AI applications. You will be given freedom to experiment, test, and bring in new technologies that push the envelope on using AI to solve enterprise problems, and to apply them to the complex business domain of commercial insurance.

Role Responsibilities
- Design and develop scalable and secure infrastructure and CI/CD pipelines for AI solutions
- Engineer and maintain production-ready RAG infrastructure and vector databases for our AI use cases, and implement efficient data retrieval strategies
- Work with our AI Engineers and Data Scientists to develop and maintain a highly reliable Model Serving Layer that makes our models available as scalable and reliable services, including for LLM access
- Act as a technical and platform authority for AI solutions and provide thought leadership to the rest of the Data Science and Data Platform team on the latest AI technologies and solutions
- Engineer and maintain infrastructure and data pipelines for custom model fine-tuning
- Implement and manage standardised agent frameworks, multi-agent systems, and autonomous decision-making frameworks
- Work with AI Engineers and Data Scientists to build appropriate observability, logging, and monitoring solutions for our AI use cases, including model performance KPIs, token usage, drift, and hallucination detection
- Develop, build, and maintain the Howden Enterprise AI Platform, including robust security, networking, and access strategies
- Implement a robust FinOps framework for our AI use cases, allowing for precise cost controls and chargebacks
- Contribute to an excellent developer experience by building robust SDKs and APIs, as well as drafting clear documentation and knowledge-sharing artefacts
