Brave Search

Struggling with CUDA, Clang and LLVM IR, and getting: CUDA failure: 'Invalid device function'

stackoverflow.com › questions › 67070926 › struggling-with-cuda-clang-and-llvm-ir-and-getting-cuda-failure-invalid-dev

The problem was not related to the PowerPC architecture. I needed to pass the fatbin file to the host-side compilation command with -Xclang -fcuda-include-gpubinary -Xclang axpy.fatbin, so that the manual pipeline embeds the GPU binary in the host object the way a single clang invocation would.

Here is the corrected Makefile:

BIN_FILE=axpy
SRC_FILE=$(BIN_FILE).cu

main: $(BIN_FILE)

# Host Side
$(BIN_FILE).ll: $(SRC_FILE) $(BIN_FILE).fatbin
    clang++ -stdlib=libc++ -Wall -Werror $(BIN_FILE).cu -march=ppc64le --cuda-host-only -relocatable-pch \
        -Xclang -fcuda-include-gpubinary -Xclang $(BIN_FILE).fatbin -S -g -c -emit-llvm

$(BIN_FILE).o: $(BIN_FILE).ll
    llc -march=ppc64le $(BIN_FILE).ll -o $(BIN_FILE).s
    clang++ -c -Wall $(BIN_FILE).s -o $(BIN_FILE).o

# GPU Side
$(BIN_FILE)-cuda-nvptx64-nvidia-cuda-sm_70.ll: $(SRC_FILE)
    clang++ -x cuda -stdlib=libc++ -Wall -Werror $(BIN_FILE).cu --cuda-device-only \
        --cuda-gpu-arch=sm_70 -S -g -emit-llvm

$(BIN_FILE).ptx: $(BIN_FILE)-cuda-nvptx64-nvidia-cuda-sm_70.ll
    llc -march=nvptx64 -mcpu=sm_70 -mattr=+ptx64 $(BIN_FILE)-cuda-nvptx64-nvidia-cuda-sm_70.ll -o $(BIN_FILE).ptx

$(BIN_FILE).ptx.o: $(BIN_FILE).ptx
    ptxas -m64 --gpu-name=sm_70 $(BIN_FILE).ptx -o $(BIN_FILE).ptx.o

$(BIN_FILE).fatbin: $(BIN_FILE).ptx.o
    fatbinary --64 --create $(BIN_FILE).fatbin --image=profile=sm_70,file=$(BIN_FILE).ptx.o \
        --image=profile=compute_70,file=$(BIN_FILE).ptx -link

$(BIN_FILE)_dlink.o: $(BIN_FILE).fatbin
    nvcc $(BIN_FILE).fatbin -gencode arch=compute_70,code=sm_70 \
        -dlink -o $(BIN_FILE)_dlink.o -lcudart -lcudart_static -lcudadevrt

# Link both object files together (either nvcc or clang works here):
$(BIN_FILE): $(BIN_FILE).o $(BIN_FILE)_dlink.o
    nvcc $(BIN_FILE).o $(BIN_FILE)_dlink.o -o $(BIN_FILE) -arch=sm_70 -lc++

Figure 1 in this link includes the creation steps of the fatbinary file.
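
The Makefile lists the host-side rules first, but make resolves the dependencies so that the device-side steps run before them. As a rough sketch (the function name print_pipeline is made up for illustration, and the commands are copied from the Makefile with BIN_FILE expanded), a dry run that only prints the commands in the order a serial make would execute them could look like this:

```shell
#!/bin/sh
# Dry run: print the Makefile's commands in the order a serial `make`
# would execute them. Nothing is compiled.
# print_pipeline is a hypothetical name, not part of the original answer.
print_pipeline() {
    b=${1:-axpy}                                   # base name (BIN_FILE in the Makefile)
    dev_ll="$b-cuda-nvptx64-nvidia-cuda-sm_70.ll"  # device-side IR file name
    cat <<EOF
clang++ -x cuda -stdlib=libc++ -Wall -Werror $b.cu --cuda-device-only --cuda-gpu-arch=sm_70 -S -g -emit-llvm
llc -march=nvptx64 -mcpu=sm_70 -mattr=+ptx64 $dev_ll -o $b.ptx
ptxas -m64 --gpu-name=sm_70 $b.ptx -o $b.ptx.o
fatbinary --64 --create $b.fatbin --image=profile=sm_70,file=$b.ptx.o --image=profile=compute_70,file=$b.ptx -link
clang++ -stdlib=libc++ -Wall -Werror $b.cu -march=ppc64le --cuda-host-only -relocatable-pch -Xclang -fcuda-include-gpubinary -Xclang $b.fatbin -S -g -c -emit-llvm
llc -march=ppc64le $b.ll -o $b.s
clang++ -c -Wall $b.s -o $b.o
nvcc $b.fatbin -gencode arch=compute_70,code=sm_70 -dlink -o ${b}_dlink.o -lcudart -lcudart_static -lcudadevrt
nvcc $b.o ${b}_dlink.o -o $b -arch=sm_70 -lc++
EOF
}

print_pipeline axpy
```

The point of the ordering is that the fatbinary step has to finish before the host-side clang++ call that embeds axpy.fatbin.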

Answer from AmirSojoodi on Stack Overflow

NVIDIA Developer

developer.nvidia.com › cuda-llvm-compiler

CUDA LLVM Compiler | NVIDIA Developer

NVIDIA has worked with the LLVM organization to contribute the CUDA compiler source code changes to the LLVM core and parallel thread execution backend, enabling full support of NVIDIA GPUs.

LLVM

llvm.org › docs › CompileCudaWithLLVM.html

Compiling CUDA with clang — LLVM 23.0.0git documentation

This document describes how to compile CUDA code with clang, and gives some details about LLVM and clang’s CUDA implementations.

Discussions

Struggling with CUDA, Clang and LLVM IR, and getting: CUDA failure: 'Invalid device function' - Stack Overflow

I am trying to optimize a CUDA code with LLVM passes on a PowerPC system (RHEL 7.6 with no root access) equipped with V100 GPUs, CUDA 10.1, and LLVM 11 (built from source). Also, I tested clang, ll...

stackoverflow.com

Full CUDA support

Hi Folks, I love what you're doing here and would like to make use of oneAPI at my company for realtime image processing. We currently use CUDA and would like to move to a platform with less ve...

github.com

NVIDIA Open Sources CUDA, LLVM-Based Compiler

now just OPENSOURCE YOUR DRIVERS !!!!!!!!

r/programming

December 14, 2011

compiler construction - how to compile CUDA to llvm IR? - Stack Overflow

I've been trying for three days to compile a CUDA kernel into llvm IR and I couldn't do it. I've changed langoptions.cpp and added CUDA=1; in the constructor , but still the clang give me Error me...

stackoverflow.com

Videos

2022 LLVM Dev Mtg: CuPBoP: CUDA for Parallelized and Broad-range ...

YouTube, 18:54, December 9, 2022

2022 LLVM Dev Mtg: CUDA-OMP — Or, Breaking the Vendor ...

2020 LLVM Developers’ Meeting: “SYCL for CUDA: An overview ...

YouTube, 03:47

2020 LLVM Developers’ Meeting: S. Ehrig “Adding CUDA® Support ...

October 27, 2020

CppCon 2016: “Bringing Clang and C++ to GPUs: An Open-Source, ...

YouTube, 59:34, October 5, 2016

AMD ROCm

rocm.docs.amd.com › projects › llvm-project › en › latest › LLVM › llvm › html › CompileCudaWithLLVM.html

Compiling CUDA with clang — LLVM 22.0.0git documentation

CUDA is supported since llvm 3.9. Clang currently supports CUDA 7.0 through 12.1.
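
The version range quoted in this snippet can be turned into a quick sanity check before building. This is only a sketch: the function name cuda_version_supported is hypothetical, the 7.0 to 12.1 bounds are taken from this snippet alone (other clang releases may document a different range), and the input is assumed to be in MAJOR.MINOR form.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a CUDA version string lies within the
# 7.0 .. 12.1 range quoted above. Assumes input like "10.1" or "10.1.243".
cuda_version_supported() {
    case $1 in
        *.*) major=${1%%.*}; rest=${1#*.}; minor=${rest%%.*} ;;
        *)   major=$1; minor=0 ;;          # bare "11" is treated as 11.0
    esac
    v=$((major * 100 + minor))             # encode 10.1 as 1001
    [ "$v" -ge 700 ] && [ "$v" -le 1201 ]
}

for ver in 6.5 10.1 12.4; do
    if cuda_version_supported "$ver"; then
        echo "CUDA $ver: inside the quoted range"
    else
        echo "CUDA $ver: outside the quoted range"
    fi
done
```

The CUDA 10.1 toolkit mentioned in the Stack Overflow question on this page falls inside that range.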

LLVM

releases.llvm.org › 3.9.1 › docs › CompileCudaWithLLVM.html

Compiling CUDA C/C++ with LLVM — LLVM 3.9 documentation

This document contains the user guides and the internals of compiling CUDA C/C++ with LLVM. It is aimed at both users who want to compile CUDA with LLVM and developers who want to improve LLVM for GPUs. This document assumes a basic familiarity with CUDA.

LLVM

libc.llvm.org › gpu › building.html

Building libs for GPUs - The LLVM C Library

cmake -G Ninja -S llvm -B $HOST_BUILD_DIR \
    -DLLVM_ENABLE_PROJECTS="clang" \
    -DCMAKE_C_COMPILER=$HOST_C_COMPILER \
    -DCMAKE_CXX_COMPILER=$HOST_CXX_COMPILER \
    -DLLVM_LIBC_FULL_BUILD=ON \
    -DCMAKE_BUILD_TYPE=Release
...

Once this has finished, use the newly built compiler to build the C library for the GPU. Select your target architecture (amdgcn-amd-amdhsa or nvptx64-nvidia-cuda).

LLVM

llvm.org › docs › NVPTXUsage.html

User Guide for NVPTX Back-end — LLVM 23.0.0git documentation

To support GPU programming, the ... the back-end, including a description of the conventions used and the set of accepted LLVM IR. ... This document assumes a basic familiarity with CUDA and the PTX assembly language....

GitHub

github.com › llvm › llvm-project › blob › main › clang › include › clang › Basic › Cuda.h

llvm-project/clang/include/clang/Basic/Cuda.h at main · llvm/llvm-project

//===--- Cuda.h - Utilities for compiling CUDA code ------------*- C++ -*-===//

Author llvm

LLVM

releases.llvm.org › 3.8.0 › docs › CompileCudaWithLLVM.html

Compiling CUDA C/C++ with LLVM — LLVM 3.8 documentation

GitHub

github.com › intel › llvm › discussions › 4832

Full CUDA support · intel/llvm · Discussion #4832

Author intel

Top answer

1 of 1

Hi @dahubley, Codeplay is supporting and maintaining the CUDA backend for Nvidia GPU support in DPC++ for Windows and Linux. You can find instructions for how to build DPC++ with the CUDA backend and how to use it, and there are some video guides on our web page (https://www.codeplay.com/solutions/oneapi/for-cuda/).

On a related note, we are also contributing to a HIP backend for AMD GPU support in DPC++, which is currently experimental and supported on Linux only. There are instructions for how to build with the HIP backend on the DPC++ getting started guide (https://intel.github.io/llvm-docs/GetStartedGuide.html#build-dpc-toolchain-with-support-for-hip-amd).

We are planning to make a binary package available in the first quarter of 2022. It won't be part of the Intel oneAPI releases but we will make it available to download via the Codeplay website and will update it on a regular basis. This binary release will also come with more documentation and guides to help you get the most out of it.

If you would like to talk to us directly about our plans or have any questions you can also email us at sycl@codeplay.com

GitHub

github.com › apc-llc › nvcc-llvm-ir

GitHub - apc-llc/nvcc-llvm-ir: Enabling on-the-fly manipulations with LLVM IR code of CUDA sources · GitHub

What is the best set of GPU-specific LLVM optimizations and how to continue modifying IR after applying them? The first question arises because no open-source CUDA frontend is available. In fact, the EDG frontend (by Edison Design Group Inc.) used by the NVIDIA CUDA compiler is the only frontend that is able to translate CUDA source into LLVM IR.

Starred by 123 users

Forked by 25 users

Phoronix

phoronix.com › news › Compile-CUDA-LLVM

How To Compile CUDA Code With LLVM - Phoronix

November 11, 2015 - Building CUDA codes with LLVM/Clang still currently requires one out-of-tree patch, still obviously requires the CUDA driver/runtime from NVIDIA Corp, and setting various Clang arguments for generating the NVPTX code to then be consumed by NVIDIA's driver stack.

GitHub

github.com › lennyerik › cutransform

GitHub - lennyerik/cutransform: CUDA kernels in any language supported by LLVM · GitHub

Are you tired of having to write your CUDA kernel code in C++? This project aims to make it possible to compile CUDA kernels written in any language supported by LLVM without much hassle.

Author lennyerik

LLVM

prereleases.llvm.org › 18.1.0 › rc3 › docs › CompileCudaWithLLVM.html

Compiling CUDA with clang — LLVM 18.1.0rc documentation

February 22, 2024 - CUDA is supported since llvm 3.9. Clang currently supports CUDA 7.0 through 12.1.

reddit.com › r/programming › nvidia open sources cuda, llvm-based compiler

r/programming on Reddit: NVIDIA Open Sources CUDA, LLVM-Based Compiler

December 14, 2011 - The real question is what the front-end is, not what the backend is (LLVM is the obvious choice there). Their previous CUDA compiler work has used EDG as the front-end, which has a restricted license. The other alternative is clang.

Stack Overflow

stackoverflow.com › questions › 9117109 › how-to-compile-cuda-to-llvm-ir

compiler construction - how to compile CUDA to llvm IR? - Stack Overflow

Top answer

1 of 1

clang -x cuda in the public LLVM trunk is something experimental done outside of NVIDIA; you should contact the llvm-dev alias with questions.

NVCC doesn't support emitting LLVM IR.
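
For comparison, the Makefile in the first result on this page emits device-side LLVM IR with clang++ rather than nvcc. A minimal sketch of that one step, assuming a hypothetical helper name cuda_device_ir_cmd and sm_70 as the default architecture; it prints the command instead of running it:

```shell
#!/bin/sh
# Hypothetical helper: prints the clang++ command that emits device-side
# LLVM IR for one CUDA source file. It does not invoke the compiler.
# Usage: cuda_device_ir_cmd <source.cu> [gpu-arch]
cuda_device_ir_cmd() {
    src=$1
    arch=${2:-sm_70}   # assumed default; pick the value matching your GPU
    printf '%s\n' "clang++ -x cuda $src --cuda-device-only --cuda-gpu-arch=$arch -S -emit-llvm"
}

cuda_device_ir_cmd kernel.cu
```

Going by the Makefile's rule names, the resulting file for axpy.cu would be axpy-cuda-nvptx64-nvidia-cuda-sm_70.ll.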

Stack Overflow

stackoverflow.com › questions › 12099684 › compiling-cuda-with-clang

Compiling CUDA with clang - Stack Overflow

Top answer

1 of 2

Thanks to contributions from Google and others, Clang now supports building CUDA. Command line parameters are slightly different from nvcc, though. According to the official documentation, assuming your file is named axpy.cu, the basic usage is:

$ clang++ axpy.cu -o axpy --cuda-gpu-arch=<GPU arch>  \
    -L<CUDA install path>/<lib64 or lib>              \
    -lcudart_static -ldl -lrt -pthread

Note that using Clang for compiling CUDA still requires that you have the proprietary CUDA runtime from the NVIDIA CUDA toolkit installed.
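
The placeholders in that command can be filled in by a small wrapper. The following is a sketch under stated assumptions: clang_cuda_cmd is a made-up name, /usr/local/cuda is an assumed default install path, and lib64 is assumed (the answer notes it may be lib on some systems). It only prints the command so it can be inspected first.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the basic clang++ command
# for building one CUDA source file into an executable.
# Usage: clang_cuda_cmd <source.cu> <gpu-arch> [cuda-install-path]
clang_cuda_cmd() {
    src=$1
    arch=$2
    cuda_path=${3:-/usr/local/cuda}   # assumed default install location
    out=${src%.cu}                    # axpy.cu -> axpy
    printf '%s\n' "clang++ $src -o $out --cuda-gpu-arch=$arch -L$cuda_path/lib64 -lcudart_static -ldl -lrt -pthread"
}

clang_cuda_cmd axpy.cu sm_70
```

Once the printed line looks right, it can be executed with eval "$(clang_cuda_cmd axpy.cu sm_70)".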

2 of 2

2016-05-01 Update: clang now supports CUDA. See @rivanvx' answer.

The CUDA compiler is based on LLVM. Clang, though also based on LLVM, does not support CUDA.

NVIDIA Developer Forums

forums.developer.nvidia.com › accelerated computing › cuda › cuda programming and performance

CUDA LLVM compiler samples - CUDA Programming and Performance - NVIDIA Developer Forums

December 15, 2013 - Hello, the page https://developer.nvidia.com/cuda-llvm-compiler says there are samples for building compilers targeting NVidia GPUs. I joined the program, but they said this is now distributed with the SDK. All I found in SDK is /usr/local/cuda/nvvm/libnvvm-samples which has some codes but ...

NVIDIA Developer Forums

forums.developer.nvidia.com › accelerated computing › cuda › cuda nvcc compiler

Using nvcc with the latest version of LLVM/clang - CUDA NVCC Compiler - NVIDIA Developer Forums

March 2, 2023 - Hi, I guess this is a bit more of a “bug report” than a help request. (As by now I do know how to work around this issue.) Also, “bug report” is in quotes as the thing not working is not actually something that NVIDIA w…

Hackage

hackage.haskell.org › package › accelerate-llvm-ptx

accelerate-llvm-ptx: Accelerate backend for NVIDIA GPUs

April 2, 2026 - This library implements a backend for the Accelerate language which generates LLVM IR targeting CUDA capable GPUs.