vllm.compilation.passes.fusion.rms_quant_fusion
FusedRMSQuantKey
Bases: NamedTuple
Named tuple for identifying the type of RMSNorm + quant fusion.
quant: the type of quantization.
fused_add: whether the op also performs the residual add.
Source code in vllm/compilation/passes/fusion/rms_quant_fusion.py
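As a rough illustration of how such a key can be used, the sketch below defines a stand-in named tuple with the two documented fields and uses it to look up a fused op. The class name `FusionKeySketch`, the string-valued `quant` field, the registry, and the op names are all hypothetical; the real `FusedRMSQuantKey` may store a richer quantization descriptor.

```python
from typing import NamedTuple


class FusionKeySketch(NamedTuple):
    """Hypothetical stand-in for FusedRMSQuantKey (not the vLLM class)."""

    quant: str       # assumption: a plain string naming the quantization type
    fused_add: bool  # True if the op also performs the residual add


# Hypothetical registry: the op names below are made up for illustration.
FUSED_OPS = {
    FusionKeySketch("fp8_static", False): "rms_norm_quant_sketch",
    FusionKeySketch("fp8_static", True): "fused_add_rms_norm_quant_sketch",
}


def lookup_fused_op(quant: str, fused_add: bool) -> str:
    """Return the fused op registered for this (quant, fused_add) pair."""
    return FUSED_OPS[FusionKeySketch(quant, fused_add)]
```

Because a named tuple is hashable and compares by value, two keys built from the same field values resolve to the same registry entry.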
RMSNormQuantFusionPass
Bases: VllmPatternMatcherPass
This pass fuses the rms_norm and quant custom ops into a single fused rms_norm_quant op. It also supports fused_add_rms_norm.
Source code in vllm/compilation/passes/fusion/rms_quant_fusion.py
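To make the idea of the fusion concrete, here is a minimal pure-Python sketch of what an unfused rms_norm followed by quantization computes, next to a single-pass fused version. It is a numerical illustration only, not the vLLM kernels: the function names, the `eps` default, and the symmetric integer quantization with a static `scale` are assumptions chosen for simplicity.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Unfused step 1: normalize by the root mean square, then scale by weight."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def quantize(y, scale, qmin=-128, qmax=127):
    """Unfused step 2: divide by a static scale, round, and clamp (assumed scheme)."""
    return [max(qmin, min(qmax, round(v / scale))) for v in y]


def fused_rms_norm_quant(x, weight, scale, eps=1e-6, qmin=-128, qmax=127):
    """Fused sketch: same arithmetic, without materializing the normalized values."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [
        max(qmin, min(qmax, round(v * inv_rms * w / scale)))
        for v, w in zip(x, weight)
    ]


def fused_add_rms_norm_quant(x, residual, weight, scale):
    """fused_add variant: add the residual first, then normalize and quantize."""
    summed = [a + b for a, b in zip(x, residual)]
    return fused_rms_norm_quant(summed, weight, scale), summed
```

The fused functions return the same values as calling `rms_norm` and then `quantize`; the benefit of fusing in a real kernel is avoiding the intermediate tensor and the extra memory round trip.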
_rms_input_weight_dtype_match
_rms_input_weight_dtype_match(match: Match) -> bool
Prevent fusion when rms_norm input and weight dtypes differ.
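The check described above can be sketched as a predicate that rejects a match when the two dtypes disagree. Everything below is a hypothetical stand-in: `FakeNode`, `FakeMatch`, the `"input"` and `"weight"` keys, and the string dtypes are assumptions for illustration, and the real function operates on a pattern-matcher `Match` object whose layout is not shown here.

```python
from dataclasses import dataclass


@dataclass
class FakeNode:
    """Hypothetical stand-in for a graph node that carries a dtype."""

    dtype: str


@dataclass
class FakeMatch:
    """Hypothetical stand-in for a Match; the kwargs layout is an assumption."""

    kwargs: dict


def dtypes_match_sketch(match: FakeMatch) -> bool:
    """Return True only if the rms_norm input and weight share a dtype."""
    return match.kwargs["input"].dtype == match.kwargs["weight"].dtype
```

A predicate of this shape returns `False` for a bfloat16 input paired with a float32 weight, so the candidate is left unfused rather than rewritten.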