
feat: support quickgelu operator on metax #1343

Open
LindseyMei wants to merge 1 commit into InfiniTensor:main from LindseyMei:feat/metax-quickgelu

@LindseyMei


This PR adds the MetaX backend for the quickgelu elementwise operator, mirroring the existing silu MetaX implementation.

What changed

  • Added src/infiniop/ops/quickgelu/metax/quickgelu_metax.h and quickgelu_metax.maca.
  • Reused the existing op::quickgelu::cuda::QuickGeluOp kernel from quickgelu/cuda/kernel.cuh through the elementwise MetaX descriptor.
  • Wired MetaX into quickgelu/operator.cc with 5 #ifdef ENABLE_METAX_API blocks (include + CREATE / GET / CALCULATE / DELETE).
  • Cleaned up quickgelu/cuda/kernel.cuh: removed the nvidia-specific elementwise_nvidia.cuh include and changed __nv_bfloat16 to cuda_bfloat16 so the kernel can be reused by both NVIDIA and MetaX backends.
  • Updated quickgelu/nvidia/quickgelu_nvidia.cu to use cuda_bfloat16 consistently.
  • Registered quickgelu ctypes bindings in test/infiniop/libinfiniop/op_register.py.
  • Added test/infiniop/quickgelu.py for correctness verification.
  • Supports BF16 / F16 / F32 / F64.
  • No xmake changes needed: xmake/metax.lua already globs ops/*/metax/*.maca.

Verification

  • Built successfully on MetaX C500 (MACA 3.3.0.15) with XMAKE_ROOT=y xmake -y -j4 and installed to ~/.infini.
  • python3 test/infiniop/quickgelu.py --metax passes accuracy checks against x * sigmoid(1.702 * x) for F16 / F32 / BF16 across multiple shapes, strides, and inplace / out-of-place modes.
  • clang-format clean (--dry-run --Werror) on all modified C/C++ files.
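
For reviewers who want to sanity-check the reference formula outside the test harness, here is a minimal stdlib-only sketch of `x * sigmoid(1.702 * x)` and an elementwise tolerance check. It is not the contents of test/infiniop/quickgelu.py; the names `quickgelu_ref` and `allclose`, and the tolerance values, are hypothetical and chosen only for illustration.

```python
import math

QUICKGELU_ALPHA = 1.702  # scaling constant quoted in the PR description


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def quickgelu_ref(x: float) -> float:
    """Reference QuickGELU: x * sigmoid(1.702 * x)."""
    return x * sigmoid(QUICKGELU_ALPHA * x)


def allclose(actual, expected, rtol, atol):
    """Elementwise check: |a - e| <= atol + rtol * |e| for every pair."""
    return all(abs(a - e) <= atol + rtol * abs(e)
               for a, e in zip(actual, expected))


# Hypothetical usage: compare a backend's output against the reference.
inputs = [-3.0, -1.0, 0.0, 0.5, 2.0]
expected = [quickgelu_ref(v) for v in inputs]
backend_output = list(expected)  # stand-in for values read back from the device
# Tolerances here are assumed values, not the ones used by the PR's test.
print(allclose(backend_output, expected, rtol=1e-3, atol=1e-3))
```

In a real run, `backend_output` would hold the values produced by the MetaX kernel, and the tolerances would typically be looser for F16 / BF16 than for F32.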

Add MetaX backend for the quickgelu elementwise operator, reusing the existing cuda::QuickGeluOp kernel through the elementwise MetaX descriptor.

Changes:

- Add quickgelu/metax/quickgelu_metax.{h,maca}

- Wire MetaX into quickgelu/operator.cc

- Clean up quickgelu/cuda/kernel.cuh: remove nvidia-specific elementwise include and use cuda_bfloat16 for cross-backend compatibility

- Update nvidia/quickgelu_nvidia.cu to use cuda_bfloat16

- Register quickgelu ctypes bindings in test/infiniop/libinfiniop/op_register.py

- Add test/infiniop/quickgelu.py for correctness verification

Verified with test/infiniop/quickgelu.py --metax on MetaX C500: passes accuracy check against torch reference (x * sigmoid(1.702 * x)) across shapes/strides and inplace/out-of-place for F16/F32/BF16.

Signed-off-by: LindseyMei <648816901@qq.com>
LindseyMei requested a review from a team on June 26, 2026, 10:26