Internship — Hybrid Models and Inference Systems for LLMs
Description
IBM Research Zurich is one of IBM’s leading global research laboratories and is at the forefront of research shaping the future of information technology. We foster close collaboration with academia and industry and offer a unique environment combining long-term research with real-world impact. Our AI Platform research team is offering a paid internship position (3 months, flexible timing with a start date in July) for a highly motivated student to work on modern inference systems for large language models (LLMs).
Context
Large language models are increasingly deployed in latency- and throughput-critical environments, driving the need for new model architectures and highly efficient inference systems. Recent hybrid and sparse architectures move beyond full quadratic attention by combining multiple computational paradigms within a single model. These include mixtures of full attention with state-space models or linear attention, as well as explicitly sparse attention mechanisms such as sliding-window attention, block-sparse or pattern-based attention, and structured sparse schemes as used in frontier LLMs. Such hybrid architectures aim to balance model quality, context length, and computational efficiency, but pose significant challenges for inference systems, which must efficiently execute heterogeneous operators and sparsity patterns at scale. At the same time, modern LLM inference platforms must support these models across diverse hardware accelerators while maintaining high performance and portability.
In this project, you will work at the intersection of model architecture, sparse and hybrid attention mechanisms, inference optimisation, and system design, closely connected to the vLLM open-source project, one of the most widely used LLM inference engines. Possible research directions include:
- Hybrid LLM architectures leveraging sparse attention mechanisms such as sliding-window attention (SWA), linear attention, or state-space models.
- Inference optimisations for large-scale serving and post-training techniques.
- Efficient execution of hybrid models in production-grade inference engines.
- Portability of modern LLM inference systems across heterogeneous hardware accelerators.
You will collaborate with a global research team at IBM Research and contribute directly to open-source development. A successful internship is expected to result in a paper submission to a top-tier conference, open-source contributions to vLLM, or both.
Requirements
- Enrolled in or in possession of a Master’s degree in computer science or a closely related field.
- Strong interest in computer systems, machine learning systems, or AI infrastructure.
- Excellent programming skills, particularly in Python and PyTorch.
- Familiarity with Linux environments and modern software development tools (git/GitHub, containers, virtual environments).
- Strong analytical thinking, creativity, and problem-solving ability.
Preferred Qualifications
- Experience with LLMs, transformer architectures, or hybrid model variants.
- Experience with LLM inference frameworks, such as vLLM, or large-scale model serving.
- Background in systems programming, performance optimisation, or hardware-aware software design.
- Prior experience contributing to open-source projects.
- Excellent written and spoken English with good presentation skills.
- Strong interpersonal and collaboration skills.
What We Offer
An exciting research internship in a world-class research environment, close collaboration with experienced research scientists and engineers, and exposure to one of the most active open-source ecosystems in modern AI systems.
Diversity & Work Environment
IBM is committed to fostering diversity and inclusion in the workplace. You will join an open, multicultural research environment that values different perspectives and supports flexible working arrangements. Our goal is to help all genders and backgrounds thrive professionally while maintaining a healthy work–life balance.
How to Apply
If you are interested in this position, please submit your application through the link below.
Apply
Interview process
After an initial screening based on the uploaded documents, shortlisted candidates will be contacted for a first technical discussion on their experience, background, and motivation, followed by a coding interview and an AI/ML interview. Further selection steps might be added based on the candidates’ skills and project needs.
If you have any questions related to this position, please contact Dr. Cezar Zota at zot@zurich.ibm.com.