vLLM Blog

Disaggregated Serving for Hybrid SSM Models in vLLM

Apr 21, 2026·15 min read

Hybrid architectures that interleave Mamba-style SSM layers with standard full-attention (FA) layers — such as NVIDIA Nemotron-H — are gaining traction as a way to combine the linear-time...

#mamba
