Aditya Rajagopal

We're building the tool to allow anyone to try their hand at building GPU kernels

Hey everyone! I think writing GPU code is going to become ubiquitous, even though today it's a tough and time-consuming process. Claude Code and Cursor agents have gotten great at writing GPU kernels once they know what to target. What's missing is the tooling to get them there.

We're building that tooling at https://ncompass.tech. Our agent analyzes a profile of your kernel and gives back a list of bottlenecks and ways to solve them. Once you have that list, you can feed it to Claude Code / Cursor and they will implement the changes.

Using Claude Code and the nCompass agent, we implemented a Hopper GEMM kernel that outperformed NVIDIA's CUTLASS GEMMs by 3% within a day. This kind of work once took us months.

I'd love for you to try this for yourselves and let me know how it goes! If you have access to an NVIDIA GPU:

  1. Install our extension in VSCode / Cursor with the docs here (https://docs.ncompass.tech/ai-assistant) and enable our agent MCP

  2. Create a skills file that tells Claude Code / Cursor to:

    • Create a basic matrix multiplication in CUDA along with an environment to build it and profile it using Nsight Compute - this is where you would normally start.

    • Create a reference which uses torch.matmul - this is your reference kernel.

  3. Create a separate skills file that tells Claude Code / Cursor to:

    • Optimize the naive kernel to match / beat the torch.matmul kernel

    • Use the environment created above to profile for both correctness and performance

    • Use the nCompass agent to analyze .ncu-rep files to get suggestions on bottlenecks

Leave this to run and you should end up with a kernel of your own that's as performant as NVIDIA's kernel.
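
To make "profile for both correctness and performance" concrete, here is a minimal sketch of the checks the skills file could ask the agent to run on each iteration. It is an illustration, not our actual harness. In practice the reference would be torch.matmul on the GPU and the timings would come from Nsight Compute. Plain Python stands in here so the sketch runs anywhere. All function names, the tolerances, and the definition of "percent faster" as a ratio of runtimes are assumptions.

```python
import math


def naive_matmul(a, b):
    """Triple-loop matmul on nested lists.

    Same arithmetic as a naive CUDA kernel, where each thread
    would compute one c[i][j] by walking the shared k dimension.
    """
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c


def matrices_close(x, y, rtol=1e-3, atol=1e-5):
    """Correctness gate: same shape and element-wise agreement within tolerance.

    The tolerances are illustrative defaults, not recommended values.
    """
    if len(x) != len(y) or any(len(rx) != len(ry) for rx, ry in zip(x, y)):
        return False
    return all(
        math.isclose(u, v, rel_tol=rtol, abs_tol=atol)
        for rx, ry in zip(x, y)
        for u, v in zip(rx, ry)
    )


def gemm_tflops(m, n, k, elapsed_ms):
    """Achieved throughput of an (m x k) @ (k x n) GEMM.

    Counts 2*m*n*k floating-point operations: one multiply and
    one add per step of the inner loop.
    """
    return 2.0 * m * n * k / (elapsed_ms * 1e-3) / 1e12


def speedup_pct(reference_ms, candidate_ms):
    """How much faster the candidate is than the reference, in percent.

    Assumes "X% faster" means the ratio of runtimes minus one.
    """
    return (reference_ms / candidate_ms - 1.0) * 100.0
```

A candidate kernel should only be timed after `matrices_close(candidate_output, reference_output)` passes, so that a fast but wrong kernel never counts as a win.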

Would love to hear your feedback!
