Inference Engineer - Acceleration
Description
Kaiko is building a next-generation agentic clinical AI assistant that helps clinicians reason across patient data, guidelines, and diagnostics. The serving stack runs open-weight MoE bases in the hundreds-of-billions to trillion-parameter range. You own cost-per-token on our Blackwell inference cluster, identify utilization gaps, and ship optimizations that improve throughput, latency, and uptime. You work alongside product and research as new workloads and models land. The hard problems are disaggregated prefill/decode with RDMA KV transfer, KV-cache hierarchy across memory tiers, low-precision MoE serving, and long-context attention.
You will be based in either the Netherlands or Switzerland, with the expectation of spending at least 50% of your time in the office.
Responsibilities
- Instrument and analyze the inference stack on Blackwell (token cost, throughput, latency, uptime) and own the path to the cost target.
- Tune scheduling and admission control to hold the stack at its cost floor across ramp-up and steady-state regimes.
- Own the KV-cache hierarchy and the prefill / decode split.
- Drive low-precision MoE serving with quality regression gates.
Qualifications
- Deep GPU systems experience, with kernel-level CUDA / Triton work and comfort with CUTLASS, FlashInfer / FlashAttention, and Nsight profiling.
- Shipped a production inference stack at scale (vLLM, SGLang, TensorRT-LLM, or equivalent).
- Roofline literacy: arithmetic intensity, critical batch, prefill vs decode, KV-cache cost.
- Tracks the relevant systems literature and brings it into the stack.
Nice to have:
- Quantisation kernel work (FP8 / FP4 expert weights, AWQ / GPTQ, custom dequant paths).
- MoE serving experience: expert parallelism, routing, batching with imbalanced experts.
- Experience scheduling shared training and inference on the same fleet.
- Healthcare or other regulated-deployment exposure.
Why kaiko
At kaiko, we believe the best ideas come from collaboration, ownership and ambition. We've built a team of international experts where your work has a direct impact. Here's what we value:
- Ownership: You'll have the autonomy to set your own goals, make critical decisions, and see the direct impact of your work.
- Collaboration: You'll have to approach disagreement with curiosity, build on common ground, and create solutions together.
- Ambition: You'll be surrounded by people who set high standards for themselves and others, who see obstacles as opportunities, and who are relentless in their work to create better outcomes for patients.
In addition, we offer:
- An attractive and competitive salary, a good pension plan, and 25 vacation days per year.
- Great offsites and team events to strengthen the team and celebrate successes together.
- A EUR 1000 learning and development budget to help you grow.
- Autonomy to do your work the way that works best for you, whether you have kids or prefer early mornings.
- An annual commuting subsidy.