feat: support quickgelu operator on metax #1343
Open
LindseyMei wants to merge 1 commit into
Add MetaX backend for the quickgelu elementwise operator, reusing the existing cuda::QuickGeluOp kernel through the elementwise MetaX descriptor.
Changes:
- Add quickgelu/metax/quickgelu_metax.{h,maca}
- Wire MetaX into quickgelu/operator.cc
- Clean up quickgelu/cuda/kernel.cuh: remove nvidia-specific elementwise include and use cuda_bfloat16 for cross-backend compatibility
- Update nvidia/quickgelu_nvidia.cu to use cuda_bfloat16
- Register quickgelu ctypes bindings in test/infiniop/libinfiniop/op_register.py
- Add test/infiniop/quickgelu.py for correctness verification
Verified with test/infiniop/quickgelu.py --metax on MetaX C500: passes accuracy check against torch reference (x * sigmoid(1.702 * x)) across shapes/strides and inplace/out-of-place for F16/F32/BF16.
Signed-off-by: LindseyMei <648816901@qq.com>
This PR adds the MetaX backend for the quickgelu elementwise operator, mirroring the existing silu MetaX implementation.

What changed
- Added src/infiniop/ops/quickgelu/metax/quickgelu_metax.h and quickgelu_metax.maca.
- Reused the op::quickgelu::cuda::QuickGeluOp kernel from quickgelu/cuda/kernel.cuh through the elementwise MetaX descriptor.
- Wired MetaX into quickgelu/operator.cc with 5 #ifdef ENABLE_METAX_API blocks (include + CREATE / GET / CALCULATE / DELETE).
- Cleaned up quickgelu/cuda/kernel.cuh: removed the nvidia-specific elementwise_nvidia.cuh include and changed __nv_bfloat16 to cuda_bfloat16 so the kernel can be reused by both NVIDIA and MetaX backends.
- Updated quickgelu/nvidia/quickgelu_nvidia.cu to use cuda_bfloat16 consistently.
- Registered quickgelu ctypes bindings in test/infiniop/libinfiniop/op_register.py.
- Added test/infiniop/quickgelu.py for correctness verification.
- xmake/metax.lua already globs ops/*/metax/*.maca.

Verification
- Built with XMAKE_ROOT=y xmake -y -j4 and installed to ~/.infini.
- python3 test/infiniop/quickgelu.py --metax passes accuracy checks against x * sigmoid(1.702 * x) for F16 / F32 / BF16 across multiple shapes, strides, and inplace / out-of-place modes.
- Ran a --dry-run --Werror check on all modified C/C++ files.
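
For reviewers who want to sanity-check the reference formula without a MetaX device, here is a minimal pure-Python sketch of x * sigmoid(1.702 * x) with an in-place / out-of-place comparison. It is not the code in test/infiniop/quickgelu.py. The helper names (quickgelu_ref, quickgelu_inplace, all_close) and the tolerance values are assumptions made for illustration only.

```python
import math


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def quickgelu_ref(x: float) -> float:
    """Reference QuickGELU as stated in the PR: x * sigmoid(1.702 * x)."""
    return x * _sigmoid(1.702 * x)


def quickgelu_out_of_place(xs):
    """Return a new list; the input is left untouched."""
    return [quickgelu_ref(x) for x in xs]


def quickgelu_inplace(xs):
    """Overwrite the input list element by element."""
    for i, x in enumerate(xs):
        xs[i] = quickgelu_ref(x)
    return xs


def all_close(actual, expected, atol=1e-3, rtol=1e-3):
    """Hypothetical tolerance check; atol/rtol are illustrative values,
    not the thresholds used by the real test script."""
    return all(
        abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected)
    )


if __name__ == "__main__":
    data = [-3.0, -1.0, 0.0, 0.5, 2.0]
    expected = quickgelu_out_of_place(data)
    buffer = list(data)  # copy, so the in-place run does not clobber data
    quickgelu_inplace(buffer)
    print(all_close(buffer, expected))
```

The real test additionally covers multiple shapes, strides, and the F16 / F32 / BF16 dtypes on device, which this sketch does not attempt.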