How to use an NVIDIA GPU with Podman (RHEL 9 / Fedora 41)
Podman is a container engine for developing, managing, and running containers on your Linux system. With support for NVIDIA GPUs, you can easily run GPU-accelerated workloads in your containers, making it a great option for machine learning and other high-performance computing tasks.
In this guide, we will cover the necessary steps to set up your server, including installing the necessary drivers and software, configuring the system to recognize the GPU, and running your first container with GPU support.
Installing NVIDIA drivers
1. Make sure you have third-party packages enabled:
https://docs.fedoraproject.org/en-US/workstation-working-group/third-party-repos/
2. Then, install the `akmod-nvidia` package:
sudo dnf install akmod-nvidia
3. Great. Now, make sure you restart your machine.
4. After restart, install the `xorg-x11-drv-nvidia-cuda` package:
sudo dnf install xorg-x11-drv-nvidia-cuda
5. Test if your NVIDIA GPU is working:
nvidia-smi -L
# You should get a result like this:
# GPU 0: NVIDIA GeForce RTX 3070 (UUID: GPU-...)
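If nvidia-smi complains that it cannot communicate with the driver, the akmod kernel module may not have finished building or loading yet. A quick sanity check (standard commands, nothing specific to this setup):
# Print the version of the nvidia kernel module that akmods built
# (an error here means the module isn't there yet)
modinfo -F version nvidia
# Confirm the module is actually loaded
lsmod | grep nvidia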
Install the nvidia-container-toolkit
I wrote an Ansible playbook that sets everything up automatically. Just save ./playbook-install-nvidia-container-toolkit-podman.yaml and execute it with the ansible-playbook CLI as shown below:
# Install Ansible
sudo dnf install -y ansible
# Run the playbook (you need to provide the "sudo" password)
ansible-playbook playbook-install-nvidia-container-toolkit-podman.yaml --ask-become-pass
#./playbook-install-nvidia-container-toolkit-podman.yaml
---
- name: Install nvidia-container-toolkit for podman
  hosts: localhost
  connection: local
  vars:
    # Repo branch to use; the RHEL 9 repo is also used on Fedora
    distribution: rhel9.0
    # Image used to test nvidia-container-toolkit with podman
    test_container_image: docker.io/nvidia/cuda:11.6.2-base-ubuntu20.04
  tasks:
    # -- Preflight checks
    - name: Preflight checks (GPU found)
      block:
        - name: Check if GPU is available
          ansible.builtin.shell: nvidia-smi -L
          register: nvidia_smi_L
          changed_when: false
          failed_when: "'UUID: GPU-' not in nvidia_smi_L.stdout"
      rescue:
        - name: ERROR NVIDIA GPU not found
          ansible.builtin.fail:
            msg: "ERROR: NVIDIA GPU not found. Please check if the GPU is available."
    # -- Install
    - name: Install nvidia-container-toolkit and podman
      block:
        - name: Add nvidia-docker repo
          become: true
          ansible.builtin.get_url:
            url: https://nvidia.github.io/nvidia-docker/{{ distribution }}/nvidia-docker.repo
            dest: /etc/yum.repos.d/nvidia-container-toolkit.repo
            mode: '0644'
        - name: Install xorg-x11-drv-nvidia
          block:
            - name: Install xorg-x11-drv-nvidia
              become: true
              ansible.builtin.package:
                name: xorg-x11-drv-nvidia
                state: present
          rescue:
            - name: ERROR package couldn't be installed
              ansible.builtin.fail:
                msg: "ERROR: package xorg-x11-drv-nvidia couldn't be installed. Did you enable RPM Fusion? Check https://rpmfusion.org/Configuration"
        - name: Install nvidia-container-toolkit and podman
          become: true
          ansible.builtin.package:
            name:
              - nvidia-container-toolkit
              - podman
            state: present
        - name: Set no-cgroups to true
          become: true
          ansible.builtin.lineinfile:
            path: /etc/nvidia-container-runtime/config.toml
            regexp: '^#no-cgroups = false'
            line: 'no-cgroups = true'
            state: present
    # -- Test
    - name: Check if the GPU is visible from the container
      ansible.builtin.shell: >-
        podman run --rm --security-opt=label=disable
        --hooks-dir=/usr/share/containers/oci/hooks.d/
        {{ test_container_image }}
        nvidia-smi -L
      register: container_nvidia_smi_L
      changed_when: false
      failed_when: "'UUID: GPU-' not in container_nvidia_smi_L.stdout"
If you get no errors, then everything should be ready on your machine.
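If you prefer not to use Ansible, the playbook boils down to a few shell commands. The rough manual equivalent is sketched below; it assumes the same rhel9.0 repo branch and config path that the playbook uses:
# Add the NVIDIA container toolkit repository (same URL the playbook fetches)
sudo curl -o /etc/yum.repos.d/nvidia-container-toolkit.repo \
  https://nvidia.github.io/nvidia-docker/rhel9.0/nvidia-docker.repo
# Install the toolkit and podman
sudo dnf install -y nvidia-container-toolkit podman
# Flip no-cgroups to true in the runtime config (the same edit the playbook makes)
sudo sed -i 's/^#no-cgroups = false/no-cgroups = true/' /etc/nvidia-container-runtime/config.toml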
How to use
You will need to pass two flags to allow the Podman container to access your host's GPU:
# You must provide the following flags:
# --security-opt=label=disable
# --hooks-dir=/usr/share/containers/oci/hooks.d/
podman run --rm -it \
--security-opt=label=disable \
--hooks-dir=/usr/share/containers/oci/hooks.d/ \
docker.io/nvidia/cuda:11.6.2-base-ubuntu20.04
# (inside the container)
nvidia-smi
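For a quick end-to-end check beyond nvidia-smi, you can run an actual CUDA workload. The example below is only an illustration: it assumes the docker.io/pytorch/pytorch image from Docker Hub; any CUDA-enabled image with Python and PyTorch should behave the same way.
# Hypothetical workload check; swap in whatever CUDA-enabled image you actually use
podman run --rm -it \
  --security-opt=label=disable \
  --hooks-dir=/usr/share/containers/oci/hooks.d/ \
  docker.io/pytorch/pytorch:latest \
  python -c "import torch; print(torch.cuda.is_available())"
# Expected output: True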