Research · LMSYS Blog · 06-26

Improving DeepEP MoE Load Balance in SGLang with Waterfill and LPLB

Mixture-of-Experts models rely on Expert Parallelism to scale inference across multiple GPUs; this post covers Waterfill and LPLB in SGLang.
